[openssl-users] vpaes performance problems on SSSE3 capable Amd and Intel Baytrail cpus
Arne Fitzenreiter
Arne.Fitzenreiter at ipfire.org
Fri May 8 07:46:38 UTC 2015
Hi,
I have a performance problem with aes-xxx-cbc in EVP mode on some CPUs:
throughput drops from 70 MB/s to 30 MB/s. It seems that the vpaes
implementation is not a good choice for every CPU that supports SSSE3.
(I know that it gives a large speedup on many Intel CPUs.)
Tested CPUs that have the problem:
AMD E-350
AMD E2-1800
AMD A4-5000 (only noticeable when disabling AES-NI)
AMD FX8150 (only noticeable when disabling AES-NI)
Intel Celeron J1900
Intel Celeron N2930
I will add some output from an older OpenSSL on a Linux Mint system, but
the result is the same with the current 1.0.2a on an IPFire build.
Any ideas on how to solve this without disabling vpaes for all CPUs?
I already have a patch that disables it for AMD, because I have not found
any AMD CPU that is faster with vpaes, but on Intel Core2 it brings a lot
of speed.
http://git.ipfire.org/?p=ipfire-2.x.git;a=blob;f=src/patches/openssl-1.0.2a_disable_ssse3_for_amd.patch;h=097cc80713ffc592dfe708ba9155591407c34c14;hb=0e2f9b011b8945dbfdfd3cac9fe1a486c48732e1
Regards,
Arne Fitzenreiter
Maintainer IPFire 2.x
----------------------------------------------------------------------------------------
arne@hp-e2 ~ $ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 20
model : 2
model name : AMD E2-1800 APU with Radeon(tm) HD Graphics
stepping : 0
microcode : 0x500010d
cpu MHz : 850.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid
aperfmperf pni monitor ssse3 cx16 popcnt lahf_lm cmp_legacy svm extapic
cr8_legacy abm sse4a misalignsse 3dnowprefetch ibs skinit wdt arat
hw_pstate npt lbrv svm_lock nrip_save pausefilter
bogomips : 3393.76
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
---- other 4 cores removed ----
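To check other machines for the same combination (SSSE3 present, AES-NI absent), the cpuinfo dump above can be scanned with a few lines of Python. This is only a rough sketch: the function names and the trimmed sample text are made up for illustration, and the idea that the EVP code prefers AES-NI and otherwise takes an SSSE3 path is an assumption based on the observations in this mail, not something read out of the OpenSSL source.

```python
def parse_cpuinfo(text):
    """Return (vendor_id, flags) for the first processor entry in
    /proc/cpuinfo-style text. flags is a set of lowercase flag names."""
    vendor, flags = None, set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key : value" line
        key = key.strip()
        if key == "vendor_id" and vendor is None:
            vendor = value.strip()
        elif key == "flags" and not flags:
            flags = set(value.split())
    return vendor, flags


def ssse3_without_aesni(flags):
    # Assumption: with AES-NI ("aes" flag) the hardware AES path is used,
    # so the SSSE3 code only matters when "aes" is missing.
    return "ssse3" in flags and "aes" not in flags


# Trimmed sample modelled on the E2-1800 dump (hypothetical shortening).
SAMPLE = """\
vendor_id : AuthenticAMD
model name : AMD E2-1800 APU with Radeon(tm) HD Graphics
flags : fpu sse sse2 pni monitor ssse3 cx16 popcnt sse4a
"""

vendor, flags = parse_cpuinfo(SAMPLE)
print(vendor, "SSSE3 without AES-NI:", ssse3_without_aesni(flags))
```

On a real system the text would come from reading /proc/cpuinfo instead of SAMPLE.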
For reference, without -evp:
hp-e2 ~ # openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 4735277 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 1244427 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 256 size blocks: 316282 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 1024 size blocks: 209266 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 8192 size blocks: 26337 aes-256 cbc's in 2.99s
OpenSSL 1.0.1f 6 Jan 2014
built on: Thu Mar 19 15:12:02 UTC 2015
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: cc -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT
-DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2
-fstack-protector --param=ssp-buffer-size=4 -Wformat
-Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions
-Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM
-DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256 cbc      25254.81k    26636.56k    27079.66k    71668.36k    72158.09k
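For anyone who wants to check the arithmetic: the summary row is just count * block size / elapsed seconds, expressed in thousands of bytes per second. A minimal sketch that recomputes it from the "Doing ..." lines of this run (the function and variable names are my own, only the numbers come from the output above):

```python
def throughput_k(count, block_size, seconds):
    """Throughput in 1000s of bytes per second, the unit used in the
    openssl speed summary."""
    return count * block_size / seconds / 1000.0


# (block size in bytes, operations completed, elapsed seconds)
RUN_WITHOUT_EVP = [
    (16, 4735277, 3.00),
    (64, 1244427, 2.99),
    (256, 316282, 2.99),
    (1024, 209266, 2.99),
    (8192, 26337, 2.99),
]

for size, count, secs in RUN_WITHOUT_EVP:
    print(f"{size:5d} bytes: {throughput_k(count, size, secs):10.2f}k")
```

The printed values should agree with the summary row to within rounding.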
Now with -evp:
hp-e2 ~ # openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 4915660 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 64 size blocks: 1278970 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 256 size blocks: 324633 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 1024 size blocks: 81472 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 8192 size blocks: 10196 aes-256-cbc's in 2.98s
OpenSSL 1.0.1f 6 Jan 2014
built on: Thu Mar 19 15:12:02 UTC 2015
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: cc -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT
-DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2
-fstack-protector --param=ssp-buffer-size=4 -Wformat
-Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions
-Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM
-DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc      26392.81k    27467.81k    27794.66k    27995.75k    28028.74k
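To make the size of the drop explicit, here is a small sketch that divides the -evp figures by the figures from the run without -evp, per block size. The list and function names are mine; the values are copied from the two summary rows above.

```python
SIZES = [16, 64, 256, 1024, 8192]

# Summary rows in 1000s of bytes per second, copied from the two runs.
WITHOUT_EVP = [25254.81, 26636.56, 27079.66, 71668.36, 72158.09]
WITH_EVP = [26392.81, 27467.81, 27794.66, 27995.75, 28028.74]


def evp_ratios(baseline, evp):
    """Ratio of -evp throughput to baseline throughput per block size."""
    return [e / b for b, e in zip(baseline, evp)]


for size, ratio in zip(SIZES, evp_ratios(WITHOUT_EVP, WITH_EVP)):
    print(f"{size:5d} bytes: -evp runs at {ratio:.0%} of the non-evp speed")
```

For the 1024 and 8192 byte blocks the ratio comes out a bit under 0.4, while for the small blocks -evp is slightly ahead.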
Now with SSSE3 hidden, so it has to fall back:
hp-e2 ~ # OPENSSL_ia32cap=~0x20000000000 openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 4594852 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 64 size blocks: 1232170 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 256 size blocks: 314750 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 1024 size blocks: 207284 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 8192 size blocks: 26242 aes-256-cbc's in 2.99s
OpenSSL 1.0.1f 6 Jan 2014
built on: Thu Mar 19 15:12:02 UTC 2015
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: cc -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT
-DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2
-fstack-protector --param=ssp-buffer-size=4 -Wformat
-Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions
-Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM
-DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc      24587.84k    26462.71k    26948.49k    71227.79k    71897.81k
hp-e2 ~ #
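In case the mask value used for the last run looks arbitrary: SSSE3 is reported in bit 9 of ECX for CPUID leaf 1, and OPENSSL_ia32cap keeps the ECX word in the upper 32 bits of its 64-bit vector, so SSSE3 ends up as bit 41. A leading "~" tells OpenSSL to clear the given bits instead of setting them. A small sketch of that derivation (helper names are made up for illustration):

```python
SSSE3_ECX_BIT = 9  # CPUID leaf 1, ECX bit 9 = SSSE3


def ia32cap_mask_for_ecx_bit(ecx_bit):
    """Mask for an ECX feature bit inside the 64-bit OPENSSL_ia32cap
    vector, where ECX occupies the upper 32 bits."""
    return 1 << (32 + ecx_bit)


def clear_bits(capability_vector, mask):
    """Effect of the '~mask' form: the listed bits are cleared."""
    return capability_vector & ~mask


def env_setting(mask):
    """Environment assignment that hides the masked capability."""
    return f"OPENSSL_ia32cap=~{mask:#x}"


mask = ia32cap_mask_for_ecx_bit(SSSE3_ECX_BIT)
print(env_setting(mask))
```

The same approach works for other capability bits, but masking SSSE3 may also affect other algorithms that have SSSE3 code paths, so it is better suited to testing than to a permanent setting.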