[openssl-users] BN_MUL_MONT for ARM64 v8

Vijay Chander vijay.chander at gmail.com
Mon Feb 6 18:41:39 UTC 2017


  Is big number montogomery multiplication as optimized as it can be for
ARM64 as compared to X86-64 from the latest openssl github ?
  We are not seeing vmull ( or pmull/pmull2) instructions in armv8-mont.pl.


   On an ARM cortex-A72 (1GHz)  and E5-2620 (2.1 Ghz)  we are seeing an
order of 10 difference in RSA signing perf for 2048 bit keys.


  Ran

          openssl speed rsa2048


Here are the openssl speed numbers.

*x86-64*

[root at nuosrv2 openssl]# ./apps/openssl speed rsa2048
Doing 2048 bit private rsa's for 10s: 13134 2048 bit private RSA's in 9.97s
Doing 2048 bit public rsa's for 10s: 379019 2048 bit public RSA's in 9.98s
OpenSSL 1.1.1-dev  xx XXX xxxx
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int)
blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS
-DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2
-DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m
-DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM
-DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM
-DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\""
-DENGINESDIR="\"/usr/local/lib64/engines-1.1\""
-Wa,--noexecstack
                  sign    verify    sign/s verify/s
*rsa 2048 bits 0.000759s 0.000026s   1317.4  37977.9*


*arm64:*

[root at juno openssl]# ./apps/openssl speed rsa2048
Doing 2048 bit private rsa's for 10s: 1319 2048 bit private RSA's in 9.92s
Doing 2048 bit public rsa's for 10s: 49209 2048 bit public RSA's in 9.93s
OpenSSL 1.1.1-dev  xx XXX xxxx
built on: reproducible build, date unspecified
options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS
-DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DSHA1_ASM
-DSHA256_ASM -DSHA512_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM
-DOPENSSLDIR="\"/usr/local/ssl\""
-DENGINESDIR="\"/usr/local/lib/engines-1.1\""
-Wa,--noexecstack
                  sign    verify    sign/s verify/s
*rsa 2048 bits 0.007521s 0.000202s    133.0   4955.6*



   * ARM64 heavy hitters*



    69.70%  openssl        libcrypto.so.1.1         [.] __bn_sqr8x_mont
    18.64%  openssl        libcrypto.so.1.1         [.] __bn_mul4x_mont
     4.92%  openssl        libcrypto.so.1.1         [.]
MOD_EXP_CTIME_COPY_FROM_PREBUF
     1.50%  openssl        libcrypto.so.1.1         [.] bn_mul_add_words


   * x86-64 heavy hitters*



    30.93%  openssl          libcrypto.so.1.1         [.]
__bn_sqrx8x_reduction
    17.65%  openssl          libcrypto.so.1.1         [.] bn_sqrx8x_internal
    12.65%  openssl          libcrypto.so.1.1         [.] mulx4x_internal
     8.91%  openssl          libcrypto.so.1.1         [.] bn_mul_add_words
     7.14%  openssl          libcrypto.so.1.1         [.] bn_mulx4x_mont


Code looks different between x86 and ARM64. Is it due to the ISA or ARM64
not yet catching up with
super efficient X86-64.

Basically are we stuck with 1:5 (if we extrapolate A72 to 2Ghz) or is there
an optimal code that
we need to pick up for ARM64.  I compiled openssl from github (latest).





Any pointers will be extremely helpful.


Thanks,
-vijay
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mta.openssl.org/pipermail/openssl-users/attachments/20170206/f09abc63/attachment-0001.html>


More information about the openssl-users mailing list