libssl 1.1 blocking with multi-forking application

Daniel-Constantin Mierla miconda at gmail.com
Mon Apr 1 07:40:43 UTC 2019


Hello,

we have been used libssl 0.9.x/1.0.x with no issue for more than 15
years in a multi-forking application, respectively Kamailio SIP server.
The application is initializing in the main process (including loading
data from database like mysql or postgress) then creates a pool of
processes. We are setting our memory management functions as well as
locking functions for 0.9.x/1.0.x.

With the internal changes in the libssl 1.1, the locking functions
cannot be set any more. However, it appears that it ends up in a
deadlock, or at least long time blocking. I received several such
reports from different people running Debian stable. Few of them
switched back to compile against libssl 1.0.x and no such issue happened
again.

Has anyone else here experienced something similar?

For a better understanding of how we use libssl, here is what happens:

  1) kamailio starts and sets the memory management functions for libssl
  2) it loads data from backend/database into memory -- this can create
connections to database servers using external libs (e.g., mysql) that
may use libssl
  3) once the initialization is done (connections to backends should be
also closed), it knows how many child processes will be forked and
creates a dedicated SSL_CTX for each of them
  4) kamailio forks and each process is using its own SSL_CTX structure
for accepting or connecting over tls -- each child process will also
reconnect to backend, if it needs it for runtime
  5) after a while, several processes try to acquire same mutex inside
libssl/libcrypto, but that seems to be already acquired -- I paste next
two partial backtraces from different processes taken when blocking
happen, the others are similar and the rest of processes are waiting for
traffic on the network, their backtrace don't show they do anything with
libssl at that moment -- see all at
https://sip.antisip.com/kamailio-trap-tcp-down.txt

Note that we do not create threads in our application, but we cannot
control if an used external library (e.g., mysql client) does it. Also,
the tls connection can be used by different child processes.

Digging a bit in the libssl code, my first thoughts of a possible issue
went to the type of locks created by libcrypto, because they are not
process shared. Normally, operations to read/write for a connections
should happen in one process, then when no traffic, the child process
can move to another connection and handle traffic on it and other
process can get to the previous connection once it has new traffic to
handle.

Anyone having hints about what can be wrong there? Is libssl 1.1
supposed to be initialized/used in a different way for a multi-forking
application with use of a tls connection between child processes?

Thanks,
Daniel

#0  0x00007ff8eedb7470 in futex_wait (private=<optimized out>, expected=18780, futex_word=0x7ff8de86130c) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
        __ret = -512
        err = <optimized out>
#1  futex_wait_simple (private=<optimized out>, expected=18780, futex_word=0x7ff8de86130c) at ../sysdeps/nptl/futex-internal.h:135
No locals.
#2  __pthread_rwlock_wrlock_slow (rwlock=0x7ff8de861300) at pthread_rwlock_wrlock.c:67
        waitval = 18780
        result = 0
        futex_shared = <optimized out>
#3  0x00007ff8ef189ee9 in CRYPTO_THREAD_write_lock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#4  0x00007ff8ef158c08 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#5  0x00007ff8ef4a3caf in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#6  0x00007ff8ef4994ff in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#7  0x00007ff8ef491f61 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1


#0  0x00007ff8eedb7470 in futex_wait (private=<optimized out>, expected=918, futex_word=0x7ff8de86130c) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
        __ret = -512
        err = <optimized out>
#1  futex_wait_simple (private=<optimized out>, expected=918, futex_word=0x7ff8de86130c) at ../sysdeps/nptl/futex-internal.h:135
No locals.
#2  __pthread_rwlock_wrlock_slow (rwlock=0x7ff8de861300) at pthread_rwlock_wrlock.c:67
        waitval = 918
        result = 0
        futex_shared = <optimized out>
#3  0x00007ff8ef189ee9 in CRYPTO_THREAD_write_lock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#4  0x00007ff8ef158756 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#5  0x00007ff8ef4a7465 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#6  0x00007ff8ef4997ee in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#7  0x00007ff8ef491f61 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.

-- 
Daniel-Constantin Mierla -- www.asipto.com
www.twitter.com/miconda -- www.linkedin.com/in/miconda

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mta.openssl.org/pipermail/openssl-users/attachments/20190401/4b0c9f1e/attachment.html>


More information about the openssl-users mailing list