[openssl-users] Multithreading: Global locks causing bottleneck in parallel SSL_write calls
dipakgaigole
dipakgaigole at rediffmail.com
Wed Apr 12 10:54:28 UTC 2017
Hi,

I have a multi-threaded SSL server application on Windows that handles each client request in a new thread. The server handles different types of requests. One request type is “send file”, where the server thread reads a file from the local filesystem and sends its content to the client.

Server configuration:
FIPS: Enabled
SSL Protocol: TLSv1.2
Cipher: AES256-SHA

We observed that throughput decreases as the number of parallel threads increases.

To profile the server, I recompiled the OpenSSL and FIPS sources with debug symbol information. When the server is run under the statistical profiler “verysleepy” (http://www.codersnotes.com/sleepy), the stack below (hotspot) is reported as consuming most of the time.

###################################
WaitForSingleObjectEx  KERNELBASE  [unknown]  0  0x7fefd2610dc
CRYPTO_lock  LIBEAY64  c:\openssl_src\openssl-1.0.2f\crypto\cryptlib.c  597  0xfb0bb26
FIPS_lock  LIBEAY64  c:\fips_src\openssl-fips-2.0.10\fips\utl\fips_lck.c  69  0xfceb291
fips_drbg_bytes  LIBEAY64  c:\fips_src\openssl-fips-2.0.10\fips\rand\fips_drbg_rand.c  86  0xfcfe868
RAND_bytes  LIBEAY64  c:\openssl_src\openssl-1.0.2f\crypto\rand\rand_lib.c  159  0xfc0dbe5
tls1_enc  SSLEAY64  c:\openssl_src\openssl-1.0.2f\ssl\t1_enc.c  786  0x3b6675c
do_ssl3_write  SSLEAY64  c:\openssl_src\openssl-1.0.2f\ssl\s3_pkt.c  1042  0x3b4c336
ssl3_write_bytes  SSLEAY64  c:\openssl_src\openssl-1.0.2f\ssl\s3_pkt.c  830  0x3b4badd
ssl3_write  SSLEAY64  c:\openssl_src\openssl-1.0.2f\ssl\s3_lib.c  4404  0x3b4796c
SSL_write  SSLEAY64  c:\openssl_src\openssl-1.0.2f\ssl\ssl_lib.c  1047  0x3b7a3e4
###################################

To check whether this behavior can be seen outside of our code, I wrote a standalone multi-threaded SSL server that performs the same task as “send file”. Profiling the standalone server points to a similar stack, so I was able to reproduce this behavior in a standalone program.

File size used: 340 MB

To find out how the bottleneck varies with increasing parallel thread count
in the standalone SSL server program, I analyzed the behavior of one thread at different levels of parallelism. Here are the results:

######################
“Parallel thread count” -> “% of time spent waiting for global lock”
1 -> 1 %
2 -> 2 %
5 -> 5 %
10 -> 40 %
15 -> 46 %
20 -> 65 %
25 -> 68 %
30 -> 70 %
######################

After digging into the FIPS code, I found that there is a global lock around the random number generation code, which causes the bottleneck when multiple threads want to perform SSL_write operations in parallel.

Code snippet from fips/rand/fips_drbg_rand.c:

######################
/* Since we only have one global PRNG used at any time in OpenSSL use a global
 * variable to store context.
 */
static DRBG_CTX ossl_dctx;
….
….
static int fips_drbg_bytes(unsigned char *out, int count)
{
    DRBG_CTX *dctx = &ossl_dctx;
    int rv = 0;
    unsigned char *adin = NULL;
    size_t adinlen = 0;
    CRYPTO_w_lock(CRYPTO_LOCK_RAND);
    ….
    ….
    CRYPTO_w_unlock(CRYPTO_LOCK_RAND);
######################

As the comment in fips_drbg_rand.c says, OpenSSL uses one global PRNG at any time. Do we really need that? Does anyone have a suggestion for how the starvation of parallel SSL_write calls (due to the global lock) can be reduced?

Any suggestions are welcome :)

Thanks,
Dipak
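
P.S. To make the locking pattern concrete without OpenSSL, here is a minimal model of it. This is a sketch under stated assumptions, not the FIPS module code: it uses POSIX threads rather than the Windows primitives seen in the profile, and every name in it (model_rand_bytes, model_writer, model_run, MODEL_MAX_THREADS) is hypothetical. The generator is a non-cryptographic placeholder. The 16-byte request per record is my assumption about why tls1_enc calls RAND_bytes (an explicit IV for the CBC cipher). The only point it illustrates is that every writer thread has to pass through the same lock once per record.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_THREADS 64

/* Hypothetical stand-in for the single global DRBG context (ossl_dctx). */
static uint64_t model_state = 88172645463325252ULL;

/* Hypothetical stand-in for CRYPTO_LOCK_RAND: one lock shared by all callers. */
static pthread_mutex_t model_rand_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts how many times the global lock was taken (updated under the lock). */
static unsigned long model_lock_acquisitions = 0;

/* Models fips_drbg_bytes(): the whole generation step runs under one lock. */
static int model_rand_bytes(unsigned char *out, size_t count)
{
    pthread_mutex_lock(&model_rand_lock);
    model_lock_acquisitions++;
    for (size_t i = 0; i < count; i++) {
        /* xorshift-style placeholder; NOT cryptographically secure. */
        model_state ^= model_state << 13;
        model_state ^= model_state >> 7;
        model_state ^= model_state << 17;
        out[i] = (unsigned char)(model_state & 0xff);
    }
    pthread_mutex_unlock(&model_rand_lock);
    return 1;
}

struct model_writer_args {
    int records; /* number of records this thread "writes" */
};

/* Models one sending thread: each record asks for 16 random bytes
 * (assumed here to be the per-record IV). */
static void *model_writer(void *arg)
{
    const struct model_writer_args *a = arg;
    unsigned char iv[16];

    for (int i = 0; i < a->records; i++)
        model_rand_bytes(iv, sizeof iv);
    return NULL;
}

/* Runs nthreads writers in parallel and returns the total number of
 * global-lock acquisitions they performed. */
static unsigned long model_run(int nthreads, int records_per_thread)
{
    pthread_t tids[MODEL_MAX_THREADS];
    struct model_writer_args args = { records_per_thread };
    int started = 0;

    if (nthreads > MODEL_MAX_THREADS)
        nthreads = MODEL_MAX_THREADS;
    model_lock_acquisitions = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, model_writer, &args) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return model_lock_acquisitions;
}
```

In this model, model_run(4, 1000) sends all 4 x 1000 requests through the one mutex, so adding threads adds waiters on the lock rather than parallel work inside it. That is the shape of the problem I am asking about; the model says nothing about what a safe fix inside the FIPS module would look like.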