Fix performance regression of ChaCha20 on LoongArch64
author Lin Runze <lrzlin@163.com>
Sun, 14 Jan 2024 12:21:49 +0000 (20:21 +0800)
committer Tomas Mraz <tomas@openssl.org>
Wed, 17 Jan 2024 08:40:04 +0000 (09:40 +0100)
The regression was introduced in PR #22817.

In that pull request, the input length check was moved forward,
but the related ori instruction (which sets $t3 to 64) was not moved
with it, so input of any length fell back to the much slower scalar
implementation.
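
The effect can be modelled with a small C sketch. This is only an
illustration, not the actual assembly: the function names, the enum,
and the exact form of the comparison (len <= $t3 selects the scalar
path) are assumptions, as is the idea that the stale $t3 value is
larger than any realistic input length.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the dispatch at the top of ChaCha20_ctr32.
 * All names here are illustrative and do not appear in OpenSSL. */
enum path { PATH_NONE, PATH_SCALAR, PATH_VECTOR };

/* t3 models the content of register $t3 when the length check runs. */
enum path dispatch(size_t len, uint64_t t3, int has_simd)
{
    if (len == 0)
        return PATH_NONE;      /* beqz $len,.Lno_data */
    if (len <= t3)
        return PATH_SCALAR;    /* assumed: short input takes the scalar path */
    return has_simd ? PATH_VECTOR : PATH_SCALAR;
}

/* After the fix: ori $t3,$zero,64 runs before the check. */
enum path dispatch_fixed(size_t len, int has_simd)
{
    return dispatch(len, 64, has_simd);
}

/* Before the fix: $t3 holds whatever the caller left in it
 * (assumed here to be a large stale value). */
enum path dispatch_regressed(size_t len, uint64_t stale_t3, int has_simd)
{
    return dispatch(len, stale_t3, has_simd);
}
```

With a stale value such as UINT64_MAX, dispatch_regressed() picks the
scalar path even for a 4096-byte input on SIMD-capable hardware, while
dispatch_fixed() picks the vector path for anything longer than 64 bytes.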

Fixes #23300

CLA: trivial

Reviewed-by: Shane Lontis <shane.lontis@oracle.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/23301)

crypto/chacha/asm/chacha-loongarch64.pl

index 9eed5860de9a710ba92d876117c0778b0ace0f13..0a194dd799009458fc3e1236d5e4910ff75cfeeb 100644
@@ -71,6 +71,7 @@ ChaCha20_ctr32:
        # $a4 = arg #5 (counter array)
 
        beqz            $len,.Lno_data
+       ori                     $t3,$zero,64
        la.pcrel        $t0,OPENSSL_loongarch_hwcap_P
        ld.w            $t0,$t0,0
 
@@ -461,7 +462,6 @@ EOF
 $code .= <<EOF;
 .align 6
 .LChaCha20_4x:
-       ori                     $t3,$zero,64
        addi.d          $sp,$sp,-128
 
        # Save the initial block counter in $t4
@@ -886,7 +886,6 @@ EOF
 $code .= <<EOF;
 .align 6
 .LChaCha20_8x:
-       ori                     $t3,$zero,64
        addi.d          $sp,$sp,-128
 
        # Save the initial block counter in $t4