Use scalar ALU and vector ALU together for chacha20 stream cipher
Fixes #24070
Use scalar ALU for 1 chacha block with rvv ALU simultaneously.
The tail elements(non-multiple of block length) will be handled by
the scalar logic.
Use rvv path if the input length > chacha_block_size.
And we have about 1.2x improvement comparing with the original code.
Reviewed-by: Hongren Zheng <i@zenithal.me>
Reviewed-by: Paul Dale <ppzgs1@gmail.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/24097)