[openssl-dev] ARM optimised montgomery multiplication (armv4-mont)

Jonathan Larmour jifl at eCosCentric.com
Tue Jun 16 20:41:36 UTC 2015


Hi,

Thanks for the reply.

On 16/06/15 13:09, Andy Polyakov wrote:
>>
>> With some experimentation, it turns out that if I *stop* using the
>> crypto/bn/asm/armv4-mont.pl generated asm "optimised" version, the time for
>> a simplish test to establish and close a simple SSL connection went from 28
>> seconds to 18. (It's quite a slow target at any time).
>>
>> In other words, this "optimised" version has slowed things down dramatically.
>> Has anyone queried the value of the asm of armv4-mont.pl any time in the last
>> few years?
> 
> Yes, of course. For reference, here are speed rsa2048 dsa2048 results
> from Cortex-A8. Numbers are operations per second, so that higher is better.
> 
> Without armv4-mont.pl:
> 
>                   sign    verify    sign/s verify/s
> rsa 2048 bits 0.052684s 0.001421s     19.0    703.5
> dsa 2048 bits 0.014576s 0.017526s     68.6     57.1
> 
> With armv4-mont.pl but without NEON (ARM SIMD extension):
> 
> rsa 2048 bits 0.039255s 0.001140s     25.5    877.3
> dsa 2048 bits 0.011630s 0.013900s     86.0     71.9


Wow, I get very different results on my ARM9 target. Without armv4-mont.pl:
                  sign    verify    sign/s verify/s
rsa 2048 bits 2.567500s 0.072826s      0.4     13.7
dsa 2048 bits 0.722857s 0.865833s      1.4      1.2

With armv4-mont.pl:
                  sign    verify    sign/s verify/s
rsa 2048 bits 3.433333s 0.104896s      0.3      9.5
dsa 2048 bits 1.058000s 1.253750s      0.9      0.8

What's more, I dug out a Cortex-A9 target (Altera Cyclone V board, operating
with a single core only) and got this without armv4-mont.pl:
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.127342s 0.003628s      7.9    275.6
dsa 2048 bits 0.035971s 0.042778s     27.8     23.4

and this with armv4-mont.pl:
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.172931s 0.005222s      5.8    191.5
dsa 2048 bits 0.052565s 0.061350s     19.0     16.3

As you can see, in both cases using armv4-mont.pl makes things roughly 30%
slower in terms of operations per second. So whatever is going on, it isn't
specific to one CPU; there must be something else at work. I'll get back to you.
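
For anyone who wants to check the arithmetic, here is a rough sketch that
recomputes the slowdown from the per-operation times in my tables above. The
timings are copied from those tables; the names (TIMES, throughput_drop,
extra_time, report) are just ones I made up for illustration and are not part
of OpenSSL.

```python
# Seconds per operation, copied from the tables above.
# Each tuple is (without armv4-mont.pl, with armv4-mont.pl).
TIMES = {
    "ARM9": {
        "rsa2048 sign": (2.567500, 3.433333),
        "rsa2048 verify": (0.072826, 0.104896),
        "dsa2048 sign": (0.722857, 1.058000),
        "dsa2048 verify": (0.865833, 1.253750),
    },
    "Cortex-A9": {
        "rsa2048 sign": (0.127342, 0.172931),
        "rsa2048 verify": (0.003628, 0.005222),
        "dsa2048 sign": (0.035971, 0.052565),
        "dsa2048 verify": (0.042778, 0.061350),
    },
}


def throughput_drop(t_without, t_with):
    """Fraction of operations per second lost when the asm is enabled."""
    return 1.0 - t_without / t_with


def extra_time(t_without, t_with):
    """Fractional increase in time per operation when the asm is enabled."""
    return t_with / t_without - 1.0


def report():
    """Return (cpu, operation, throughput drop, extra time) for every row."""
    rows = []
    for cpu, ops in TIMES.items():
        for op, (t_without, t_with) in ops.items():
            rows.append((cpu, op,
                         throughput_drop(t_without, t_with),
                         extra_time(t_without, t_with)))
    return rows


if __name__ == "__main__":
    for cpu, op, drop, extra in report():
        print(f"{cpu:10s} {op:15s} ops/s down {drop:5.1%}, time up {extra:5.1%}")
```

Note that the two measures differ: a given loss in operations per second
always corresponds to a larger percentage increase in time per operation, so
it matters which one is being quoted.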

Jifl
-- 
eCosCentric Limited      http://www.eCosCentric.com/     The eCos experts
Barnwell House, Barnwell Drive, Cambridge, UK.       Tel: +44 1223 245571
Registered in England and Wales: Reg No 4422071.
------["Si fractum non sit, noli id reficere"]------       Opinions==mine

