* This might appear controversial, but the fact is that generic
* prime method was observed to deliver better performance even
* for NIST primes on a range of platforms, e.g.: 60%-15%
- * improvement on IA-64, 50%-20% on ARM, 30%-90% on P4, 20%-25%
+ * improvement on IA-64, ~25% on ARM, 30%-90% on P4, 20%-25%
* in 32-bit build and 35%--12% in 64-bit build on Core2...
* Coefficients are relative to optimized bn_nist.c for most
* intensive ECDSA verify and ECDH operations for 192- and 521-
- * bit keys respectively. What effectively happens is that loop
- * with bn_mul_add_words is put against bn_mul_mont, and latter
- * wins on short vectors. Correct solution should be implementing
- * dedicated NxN multiplication subroutines for small N. But till
- * it materializes, let's stick to generic prime method...
+ * bit keys respectively. The choice of these boundary values is
+ * arguable, because the dependency of the improvement coefficient
+ * on key length is not a monotonic curve. For example, while the
+ * 571-bit result is 23% on ARM, the 384-bit one is -1%. But it is
+ * generally faster, sometimes "respectably" faster, or
+ * "tolerably" slower... What effectively happens is that a loop
+ * with bn_mul_add_words is pitted against bn_mul_mont, and the
+ * latter "wins" on short vectors. The correct solution would be
+ * to implement dedicated NxN multiplication subroutines for
+ * small N. But until that materializes, let's stick to the
+ * generic prime method...
* <appro>
*/
meth = EC_GFp_mont_method();