md5: add assembly implementation for aarch64
authorJonathan Swinney <jswinney@amazon.com>
Wed, 27 Oct 2021 16:50:30 +0000 (16:50 +0000)
committerPauli <pauli@openssl.org>
Tue, 3 May 2022 04:36:17 +0000 (14:36 +1000)
This change improves md5 performance significantly by using a hand-optimized
assembly implementation of the inner loop of md5 calculation. The instructions
are carefully ordered to separate data dependencies as much as possible.

Test with:
$ openssl speed md5

AWS Graviton 2
type             16 bytes    64 bytes     256 bytes    1024 bytes   8192 bytes   16384 bytes
md5              46990.60k   132778.65k   270376.96k   364718.08k   405962.75k   409201.32k
md5-modified     51725.23k   152236.22k   323469.14k   453869.57k   514102.61k   519056.04k
                 +10%        +15%         +20%         +24%         +27%         +27%

Apple M1
type             16 bytes    64 bytes     256 bytes    1024 bytes   8192 bytes   16384 bytes
md5              74634.39k   195561.25k   375434.45k   491004.23k   532361.40k   536636.48k
md5-modified     84637.11k   229017.09k   444609.62k   588069.50k   655114.24k   660850.56k
                 +13%        +17%         +18%         +20%         +23%         +23%

Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/16928)


No differences found