[openssl-users] BN_MUL_MONT for ARM64 v8
Vijay Chander
vijay.chander at gmail.com
Mon Feb 6 18:41:39 UTC 2017
Is big-number Montgomery multiplication for ARM64 as optimized as it can be,
compared to x86-64, in the latest OpenSSL from GitHub?
We are not seeing vmull (or pmull/pmull2) instructions in armv8-mont.pl.
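For context, here is a minimal Python sketch of word-by-word Montgomery multiplication, the kind of computation that a bn_mul_mont routine accelerates. It is not OpenSSL's implementation: `mont_mul`, `WORD_BITS`, and the parameter names are made up for illustration, and Python's big integers stand in for 64-bit limb arithmetic.

```python
# Illustrative word-by-word Montgomery multiplication (not OpenSSL's code).
# All names here (mont_mul, WORD_BITS, ...) are hypothetical.

WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def mont_mul(a, b, n, num_words):
    """Return a * b * R^-1 mod n, with R = 2**(WORD_BITS * num_words).

    Requires n odd and a, b < n < R. Needs Python 3.8+ for pow(x, -1, m).
    """
    # n0 = -n^-1 mod 2^64: the per-modulus constant used in every step.
    n0 = (-pow(n, -1, 1 << WORD_BITS)) & WORD_MASK
    t = 0
    for i in range(num_words):
        a_i = (a >> (WORD_BITS * i)) & WORD_MASK  # i-th 64-bit word of a
        t += a_i * b                              # multiply-accumulate
        m = ((t & WORD_MASK) * n0) & WORD_MASK    # makes low word of t + m*n zero
        t = (t + m * n) >> WORD_BITS              # exact division by 2^64
    if t >= n:                                    # t < 2n here, one subtraction suffices
        t -= n
    return t


if __name__ == "__main__":
    n = (1 << 127) - 1          # an odd modulus that fits in two 64-bit words
    R = 1 << (WORD_BITS * 2)
    a, b = 123456789, 987654321
    a_m, b_m = (a * R) % n, (b * R) % n   # convert to Montgomery form
    ab_m = mont_mul(a_m, b_m, n, 2)       # product, still in Montgomery form
    ab = mont_mul(ab_m, 1, n, 2)          # convert back
    print(ab == (a * b) % n)
```

The inner work is a chain of 64x64-bit integer multiply-accumulates with carry propagation. As the ARMv8 ISA defines them, pmull/pmull2 are polynomial (carry-less) multiplies, so they would not be a drop-in fit for this integer arithmetic.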
On an ARM Cortex-A72 (1 GHz) and an E5-2620 (2.1 GHz) we are seeing roughly
a factor-of-10 difference in RSA signing performance for 2048-bit keys.
We ran
openssl speed rsa2048
on both machines. Here are the openssl speed numbers.
*x86-64:*
[root@nuosrv2 openssl]# ./apps/openssl speed rsa2048
Doing 2048 bit private rsa's for 10s: 13134 2048 bit private RSA's in 9.97s
Doing 2048 bit public rsa's for 10s: 379019 2048 bit public RSA's in 9.98s
OpenSSL 1.1.1-dev xx XXX xxxx
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS
-DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2
-DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m
-DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM
-DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM
-DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\""
-DENGINESDIR="\"/usr/local/lib64/engines-1.1\""
-Wa,--noexecstack
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.000759s 0.000026s   1317.4  37977.9
*arm64:*
[root@juno openssl]# ./apps/openssl speed rsa2048
Doing 2048 bit private rsa's for 10s: 1319 2048 bit private RSA's in 9.92s
Doing 2048 bit public rsa's for 10s: 49209 2048 bit public RSA's in 9.93s
OpenSSL 1.1.1-dev xx XXX xxxx
built on: reproducible build, date unspecified
options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS
-DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DSHA1_ASM
-DSHA256_ASM -DSHA512_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM
-DOPENSSLDIR="\"/usr/local/ssl\""
-DENGINESDIR="\"/usr/local/lib/engines-1.1\""
-Wa,--noexecstack
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.007521s 0.000202s    133.0   4955.6
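As a cross-check, the sign/s and verify/s columns can be recomputed from the raw operation counts and elapsed times in the two runs, which also gives the raw (not clock-adjusted) ratio between the machines. This is a small sketch; the dictionary layout and the helper name `rates` are just for illustration, and the numbers are copied from the output above.

```python
# Raw counts and elapsed seconds copied from the two `openssl speed rsa2048` runs.
runs = {
    "x86-64": {"signs": 13134, "sign_secs": 9.97,
               "verifies": 379019, "verify_secs": 9.98},
    "arm64": {"signs": 1319, "sign_secs": 9.92,
              "verifies": 49209, "verify_secs": 9.93},
}


def rates(run):
    """Return (sign/s, verify/s) for one run."""
    return (run["signs"] / run["sign_secs"],
            run["verifies"] / run["verify_secs"])


x86_sign, x86_verify = rates(runs["x86-64"])
arm_sign, arm_verify = rates(runs["arm64"])

# Raw throughput ratios, x86-64 over arm64, with no clock adjustment.
sign_ratio = x86_sign / arm_sign
verify_ratio = x86_verify / arm_verify

print(f"x86-64: {x86_sign:.1f} sign/s, {x86_verify:.1f} verify/s")
print(f"arm64:  {arm_sign:.1f} sign/s, {arm_verify:.1f} verify/s")
print(f"ratio:  {sign_ratio:.2f}x sign, {verify_ratio:.2f}x verify")
```

The recomputed rates match the reported columns after rounding, and the signing gap comes out at a little under 10x, with a somewhat smaller gap for verification.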
*ARM64 heavy hitters:*
69.70% openssl libcrypto.so.1.1 [.] __bn_sqr8x_mont
18.64% openssl libcrypto.so.1.1 [.] __bn_mul4x_mont
4.92% openssl libcrypto.so.1.1 [.] MOD_EXP_CTIME_COPY_FROM_PREBUF
1.50% openssl libcrypto.so.1.1 [.] bn_mul_add_words
*x86-64 heavy hitters:*
30.93% openssl libcrypto.so.1.1 [.] __bn_sqrx8x_reduction
17.65% openssl libcrypto.so.1.1 [.] bn_sqrx8x_internal
12.65% openssl libcrypto.so.1.1 [.] mulx4x_internal
8.91% openssl libcrypto.so.1.1 [.] bn_mul_add_words
7.14% openssl libcrypto.so.1.1 [.] bn_mulx4x_mont
The code looks different between x86-64 and ARM64. Is that due to the ISA,
or has the ARM64 code not yet caught up with the highly optimized x86-64
code?
Basically, are we stuck with roughly 1:5 (if we extrapolate the A72 to
2 GHz), or is there more optimal code that we need to pick up for ARM64?
I compiled OpenSSL from GitHub (latest).
Any pointers would be extremely helpful.
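The 1:5 figure can be reproduced with a few lines of arithmetic. The sketch below assumes that throughput scales linearly with clock frequency and that both parts run at the nominal clocks quoted above (no turbo or throttling). Both are simplifications, and the variable names are illustrative only.

```python
# sign/s and nominal clocks as quoted in this message.
x86_sign_per_s, x86_ghz = 1317.4, 2.1   # E5-2620
arm_sign_per_s, arm_ghz = 133.0, 1.0    # Cortex-A72

# Signatures per second per GHz (assumption: linear scaling with clock).
x86_per_ghz = x86_sign_per_s / x86_ghz
arm_per_ghz = arm_sign_per_s / arm_ghz
per_clock_ratio = x86_per_ghz / arm_per_ghz

# Hypothetical A72 at 2 GHz, again assuming linear scaling.
arm_at_2ghz = arm_per_ghz * 2.0
ratio_at_2ghz = x86_sign_per_s / arm_at_2ghz

# Approximate clock cycles spent per 2048-bit signature.
x86_cycles_per_sign = x86_ghz * 1e9 / x86_sign_per_s
arm_cycles_per_sign = arm_ghz * 1e9 / arm_sign_per_s

print(f"per-clock ratio (x86-64 : arm64): {per_clock_ratio:.2f}")
print(f"ratio with A72 scaled to 2 GHz:   {ratio_at_2ghz:.2f}")
print(f"cycles per sign: x86-64 {x86_cycles_per_sign:,.0f}, "
      f"arm64 {arm_cycles_per_sign:,.0f}")
```

Under these assumptions the gap is a bit under 5x either way, which is where the 1:5 estimate comes from.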
Thanks,
-vijay