[openssl-users] vpaes performance problems on SSSE3 capable Amd and Intel Baytrail cpus

Arne Fitzenreiter Arne.Fitzenreiter at ipfire.org
Fri May 8 07:46:38 UTC 2015


Hi,

i have a performance problem with aes-xxx-cbc in evp mode on some cpus. 
Drop from 70MB/s to 30MB/s. It seems that the vpaes implemention is not 
good for all cpus that support ssse3. (I know that it speed up a lot on 
many Intel cpu's)

Tested cpu's that have the problem:
AMD E-⁠350
AMD E2-⁠1800
AMD A4-⁠5000 (only noticeable when disabling AES-⁠NI)
AMD FX8150 (only noticeable when disabling AES-⁠NI)

Intel Celeron J1900
Inter Celeron N2930

I will add some output with older OpenSSL from a Linux-Mint system but 
it is the same with current 1.0.2a on IPFire build.

Any Ideas to solve this without disabling vpaes for all cpu's.

I already have a patch to disable it for Amd because i have not found 
any Amd that are faster with vpaes, but for Intel Core2 it brings a lot 
of speed.

http://git.ipfire.org/?p=ipfire-2.x.git;a=blob;f=src/patches/openssl-1.0.2a_disable_ssse3_for_amd.patch;h=097cc80713ffc592dfe708ba9155591407c34c14;hb=0e2f9b011b8945dbfdfd3cac9fe1a486c48732e1

Regards,
Arne Fitzenreiter
Maintainer IPFire 2.x

-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠-⁠
arne at hp-e2 ~ $ cat /⁠proc/⁠cpuinfo
processor    : 0
vendor_id    : AuthenticAMD
cpu family    : 20
model        : 2
model name    : AMD E2-⁠1800 APU with Radeon(tm) HD Graphics
stepping    : 0
microcode    : 0x500010d
cpu MHz        : 850.000
cache size    : 512 KB
physical id    : 0
siblings    : 2
core id        : 0
cpu cores    : 2
apicid        : 0
initial apicid    : 0
fpu        : yes
fpu_exception    : yes
cpuid level    : 6
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt 
pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid 
aperfmperf pni monitor ssse3 cx16 popcnt lahf_lm cmp_legacy svm extapic 
cr8_legacy abm sse4a misalignsse 3dnowprefetch ibs skinit wdt arat 
hw_pstate npt lbrv svm_lock nrip_save pausefilter
bogomips    : 3393.76
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 36 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
-⁠-⁠-⁠-⁠ other 4 cores removed -⁠-⁠-⁠-⁠

For reference without -⁠evp
hp-⁠e2 ~ # openssl speed aes-⁠256-⁠cbc
Doing aes-256 cbc for 3s on 16 size blocks: 4735277 aes-256 cbc's in 
3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 1244427 aes-256 cbc's in 
2.99s
Doing aes-256 cbc for 3s on 256 size blocks: 316282 aes-256 cbc's in 
2.99s
Doing aes-256 cbc for 3s on 1024 size blocks: 209266 aes-256 cbc's in 
2.99s
Doing aes-256 cbc for 3s on 8192 size blocks: 26337 aes-256 cbc's in 
2.99s
OpenSSL 1.0.1f 6 Jan 2014
built on: Thu Mar 19 15:12:02 UTC 2015
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) 
blowfish(idx)
compiler: cc -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT 
-DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 
-fstack-protector --param=ssp-buffer-size=4 -Wformat 
-Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions 
-Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int 
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM 
-DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 
bytes
aes-256 cbc      25254.81k    26636.56k    27079.66k    71668.36k    
72158.09k

now with -⁠evp
hp-⁠e2 ~ # openssl speed -⁠evp aes-⁠256-⁠cbc
Doing aes-256-cbc for 3s on 16 size blocks: 4915660 aes-256-cbc's in 
2.98s
Doing aes-256-cbc for 3s on 64 size blocks: 1278970 aes-256-cbc's in 
2.98s
Doing aes-256-cbc for 3s on 256 size blocks: 324633 aes-256-cbc's in 
2.99s
Doing aes-256-cbc for 3s on 1024 size blocks: 81472 aes-256-cbc's in 
2.98s
Doing aes-256-cbc for 3s on 8192 size blocks: 10196 aes-256-cbc's in 
2.98s
OpenSSL 1.0.1f 6 Jan 2014
built on: Thu Mar 19 15:12:02 UTC 2015
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) 
blowfish(idx)
compiler: cc -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT 
-DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 
-fstack-protector --param=ssp-buffer-size=4 -Wformat 
-Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions 
-Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int 
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM 
-DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 
bytes
aes-256-cbc      26392.81k    27467.81k    27794.66k    27995.75k    
28028.74k

now with hided ssse3 so it has to fallback...
hp-⁠e2 ~ # OPENSSL_ia32cap=~0x20000000000 openssl speed -⁠evp 
aes-⁠256-⁠cbc
Doing aes-256-cbc for 3s on 16 size blocks: 4594852 aes-256-cbc's in 
2.99s
Doing aes-256-cbc for 3s on 64 size blocks: 1232170 aes-256-cbc's in 
2.98s
Doing aes-256-cbc for 3s on 256 size blocks: 314750 aes-256-cbc's in 
2.99s
Doing aes-256-cbc for 3s on 1024 size blocks: 207284 aes-256-cbc's in 
2.98s
Doing aes-256-cbc for 3s on 8192 size blocks: 26242 aes-256-cbc's in 
2.99s
OpenSSL 1.0.1f 6 Jan 2014
built on: Thu Mar 19 15:12:02 UTC 2015
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) 
blowfish(idx)
compiler: cc -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT 
-DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 
-fstack-protector --param=ssp-buffer-size=4 -Wformat 
-Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions 
-Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int 
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM 
-DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 
bytes
aes-256-cbc      24587.84k    26462.71k    26948.49k    71227.79k    
71897.81k
hp-⁠e2 ~ #


More information about the openssl-users mailing list