This change improves MD5 performance by 10% to 27% in the measurements
below by replacing the inner loop of the MD5 computation with a
hand-optimized assembly implementation. The instructions are ordered to
keep dependent operations as far apart as possible, so that independent
work can overlap.
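
To illustrate what separating data dependencies means for an MD5 step, here
is a minimal C sketch. It is not the assembly from this change; the function
names (rotl32, step_naive, step_reordered) are hypothetical and used only for
illustration. Both functions compute the same round-1 step from RFC 1321,
a = b + ((a + F(b,c,d) + m + k) <<< s), but the second one groups the work
that does not depend on b, which is the value produced by the previous step.

```c
#include <stdint.h>

/* Rotate left by s bits; valid for 1 <= s <= 31 (MD5 uses 4..23). */
static uint32_t rotl32(uint32_t x, unsigned s)
{
    return (x << s) | (x >> (32 - s));
}

/*
 * Straightforward form of one round-1 step.
 * F(b,c,d) = (b & c) | (~b & d), and every operation is written in
 * one chain that starts from b.
 */
static uint32_t step_naive(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                           uint32_t m, uint32_t k, unsigned s)
{
    uint32_t f = (b & c) | (~b & d);

    return b + rotl32(a + f + m + k, s);
}

/*
 * Same step with the independent work pulled out.
 * (a + m + k) and (c ^ d) do not use b, so they can be computed while
 * the previous step, which produces b, is still in flight. Only the
 * AND, XOR, ADD, rotate and final ADD remain on the path from b.
 * F is rewritten with the identity F(b,c,d) = d ^ (b & (c ^ d)).
 */
static uint32_t step_reordered(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                               uint32_t m, uint32_t k, unsigned s)
{
    uint32_t t  = a + m + k;   /* independent of b */
    uint32_t cd = c ^ d;       /* independent of b */
    uint32_t f  = d ^ (b & cd);

    return b + rotl32(t + f, s);
}
```

Both functions return the same value for any inputs; the difference is only
in how much work sits on the chain from one step to the next. A compiler may
perform similar reordering on its own, so this sketch shows the idea rather
than a measured gain.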

Test with:

    $ openssl speed md5

Throughput is in 1000s of bytes per second; higher is better.

AWS Graviton 2

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5             46990.60k   132778.65k   270376.96k   364718.08k   405962.75k   409201.32k
md5-modified    51725.23k   152236.22k   323469.14k   453869.57k   514102.61k   519056.04k
change               +10%         +15%         +20%         +24%         +27%         +27%

Apple M1

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5             74634.39k   195561.25k   375434.45k   491004.23k   532361.40k   536636.48k
md5-modified    84637.11k   229017.09k   444609.62k   588069.50k   655114.24k   660850.56k
change               +13%         +17%         +18%         +20%         +23%         +23%

Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/16928)