Acceleration of chacha20 on aarch64 by SVE
author Daniel Hu <Daniel.Hu@arm.com>
Mon, 7 Feb 2022 10:17:06 +0000 (10:17 +0000)
committer Pauli <pauli@openssl.org>
Tue, 3 May 2022 04:37:46 +0000 (14:37 +1000)
commit b1b2146ded9ce5a84c62f30c6c4a922b449f6c90
tree 969d007a0e310df537f7f9495b353bbad4e984d4
parent 04904a0fff639c058d38b355d75485ca5dde0a89

This patch accelerates ChaCha20 on aarch64 when the CPU supports the
Scalable Vector Extension (SVE). Tested on a modern micro-architecture with
256-bit SVE, it has the potential to improve performance by up to 20%.

The solution takes a hybrid approach: SVE handles the blocks that fit the
SVE vector length, and the Neon/scalar code processes any remaining tail data.
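
The split described above can be sketched in C as follows. This is only an
illustration, not code from the patch: the names chacha_split and
split_for_sve are hypothetical, and the sketch assumes one ChaCha20 block per
32-bit vector lane, so that a 256-bit vector covers 8 blocks (512 bytes) per
pass. The real assembly may group blocks differently.

```c
#include <stddef.h>

#define CHACHA_BLOCK_BYTES 64u  /* ChaCha20 block size, fixed by the algorithm */

/* Hypothetical helper type, not taken from the patch. */
typedef struct {
    size_t sve_bytes;   /* leading part handled by the SVE path */
    size_t tail_bytes;  /* remainder left for the Neon/scalar path */
} chacha_split;

/*
 * Assumption: one ChaCha20 block per 32-bit lane, so one pass over a
 * vector of vl_bits bits covers (vl_bits / 32) blocks.
 */
static chacha_split split_for_sve(size_t len, unsigned vl_bits)
{
    size_t blocks_per_pass = vl_bits / 32u;
    size_t pass_bytes = blocks_per_pass * CHACHA_BLOCK_BYTES;
    chacha_split s;

    if (pass_bytes == 0) {
        /* Vector too short to hold a lane: everything goes to the tail. */
        s.sve_bytes = 0;
        s.tail_bytes = len;
        return s;
    }

    /* Round the length down to a whole number of SVE passes. */
    s.sve_bytes = len - (len % pass_bytes);
    s.tail_bytes = len - s.sve_bytes;
    return s;
}
```

Under these assumptions, a 1000-byte input with 256-bit vectors gives 512
bytes to the SVE path and leaves 488 bytes for the Neon/scalar path.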

Test results:
With SVE
type            1024 bytes   8192 bytes  16384 bytes
ChaCha20        1596208.13k  1650010.79k  1653151.06k

Without SVE (by Neon/Scalar)
type            1024 bytes   8192 bytes  16384 bytes
chacha20        1355487.91k  1372678.83k  1372662.44k
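
For reference, the relative gain implied by the two tables can be computed
with a small helper. The function name speedup_percent is hypothetical and
only illustrates the arithmetic; the inputs are the throughput figures
listed above.

```c
/*
 * Relative gain of the SVE path over the Neon/scalar path, in percent.
 * Both arguments must use the same throughput unit.
 */
static double speedup_percent(double with_sve, double without_sve)
{
    return (with_sve / without_sve - 1.0) * 100.0;
}
```

Applied to the figures above, this gives a gain of roughly 18% at 1024 bytes
and roughly 20% at 8192 and 16384 bytes.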

The assembly code has been reviewed internally by
ARM engineer Fangming.Fang@arm.com

Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17916)
crypto/arm64cpuid.pl
crypto/arm_arch.h
crypto/armcap.c
crypto/chacha/asm/chacha-armv8-sve.pl [new file with mode: 0755]
crypto/chacha/asm/chacha-armv8.pl
crypto/chacha/build.info