sha/asm/keccak1600-armv4.pl: improve non-NEON performance by ~10%.
This is achieved mostly by ~10% reduction of amount of instructions
per round thanks to a) switch to KECCAK_2X variant; b) merge of
almost 1/2 rotations with logical instructions. Performance is
improved on all observed processors except on Cortex-A15. This is
because it's capable of exploiting more parallelism and can execute
original code for same amount of time.
Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
(Merged from https://github.com/openssl/openssl/pull/4057)