<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hello,</p>
<p>We have been using libssl 0.9.x/1.0.x without issues for more than
15 years in a multi-process (forking) application, namely the Kamailio
SIP server. The application initializes in the main process
(including loading data from a database such as MySQL or PostgreSQL)
and then creates a pool of processes. For 0.9.x/1.0.x we set our own
memory management functions as well as the locking functions.</p>
<p>With the internal changes in libssl 1.1, the locking
functions can no longer be set. Since then, it appears that the
application ends up in a deadlock, or at least blocks for a long
time. I have received several such reports from different people
running Debian stable. A few of them switched back to compiling
against libssl 1.0.x, and the issue did not happen again.<br>
</p>
<p>Has anyone else here experienced something similar?</p>
<p>For a better understanding of how we use libssl, here is what
happens:</p>
<p> 1) Kamailio starts and sets the memory management functions for
libssl<br>
2) it loads data from the backend/database into memory -- this can
create connections to database servers using external libraries
(e.g., the MySQL client) that may themselves use libssl<br>
3) once the initialization is done (connections to backends
should also be closed by then), it knows how many child processes
will be forked and creates a dedicated SSL_CTX for each of them<br>
4) Kamailio forks and each process uses its own SSL_CTX
structure for accepting or connecting over TLS -- each child
process also reconnects to the backend if it needs it at runtime<br>
5) after a while, several processes try to acquire the same mutex
inside libssl/libcrypto, but it appears to be already held --
two partial backtraces from different processes, taken while they
were blocked, are pasted at the end of this message; the other
blocked processes look similar, while the remaining processes are
waiting for network traffic and their backtraces show nothing
libssl-related at that moment -- the full set is
at <a href="https://sip.antisip.com/kamailio-trap-tcp-down.txt">https://sip.antisip.com/kamailio-trap-tcp-down.txt</a><br>
</p>
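<p>To make steps 3 and 4 concrete, here is a minimal sketch of the
"one context per child, created before fork" pattern. It is written in
Python rather than C only to keep it short (Python's ssl module is
built on OpenSSL); it is not Kamailio code, and the function name
run_workers and the pipe-based reporting are made up for this
illustration.</p>
<pre>
```python
import os
import ssl


def run_workers(n_children):
    """Create one TLS context per planned child, fork, and collect reports."""
    # Step 3: the parent builds a dedicated context for every child up front.
    contexts = [ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER) for _ in range(n_children)]

    read_fd, write_fd = os.pipe()
    pids = []
    for index in range(n_children):
        pid = os.fork()
        if pid == 0:
            # Step 4: the child touches only the context reserved for it.
            os.close(read_fd)
            own_ctx = contexts[index]
            report = "%d:%s\n" % (index, type(own_ctx).__name__)
            os.write(write_fd, report.encode())
            os._exit(0)
        pids.append(pid)

    os.close(write_fd)
    for pid in pids:
        os.waitpid(pid, 0)
    with os.fdopen(read_fd) as reader:
        return sorted(reader.read().split())
```
</pre>
<p>Calling run_workers(3) forks three children, each of which reports
the index of the context it used. Every child also inherits a copy of
all the other contexts and of any library state that existed at fork
time, which is the part that matters for the question in this message.</p>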
<p>Note that we do not create threads in our application, but we
cannot control whether an external library we use (e.g., the MySQL
client) does. Also, the same TLS connection can be handled by
different child processes.<br>
</p>
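<p>As an illustration of why threads created by an external library
could matter, here is a small sketch of a general hazard of mixing
threads and fork(): a lock held by another thread at fork time is
copied into the child in the locked state, and the thread that would
release it does not exist there. This is only an assumption about what
might be relevant, not a finding about the reported blocking; the
helper thread stands in for a hypothetical library thread, and all
names are invented for the example.</p>
<pre>
```python
import os
import threading


def child_sees_inherited_lock_as_held():
    """Fork while a helper thread holds a lock; report whether the child is stuck."""
    lock = threading.Lock()
    holding = threading.Event()
    release = threading.Event()

    def holder():
        # Stands in for a thread started by an external library.
        with lock:
            holding.set()
            release.wait()

    helper = threading.Thread(target=holder)
    helper.start()
    holding.wait()  # make sure the lock is held before forking

    pid = os.fork()
    if pid == 0:
        # Only the forking thread exists in the child, so nobody can
        # unlock the copied lock; the timeout keeps the demo from hanging.
        acquired = lock.acquire(timeout=0.5)
        os._exit(0 if acquired else 1)

    _, status = os.waitpid(pid, 0)
    release.set()
    helper.join()
    return os.WEXITSTATUS(status) == 1
```
</pre>
<p>The function returns True when the child failed to take the lock
within the timeout. Without the timeout the child would wait
indefinitely, which from the outside looks like a process blocked on a
futex.</p>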
<p>Digging a bit into the libssl code, my first suspicion was the
type of locks created by libcrypto, because they are not process
shared. Normally, read/write operations for a connection should
happen in one process; when there is no more traffic, that child
process can move on to another connection and handle its traffic,
and another process can pick up the previous connection once it
has new traffic to handle.</p>
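<p>The difference between a process-private and a process-shared lock
can be shown with a short sketch. Here threading.Lock plays the role
of a process-private lock and multiprocessing.Lock the role of a
process-shared one; this only illustrates the two lock types and says
nothing about what libcrypto does internally. The function name is
made up for the example.</p>
<pre>
```python
import multiprocessing
import os
import threading


def parent_can_take_lock_held_by_child(lock):
    """Return True if the parent can take `lock` while a forked child holds it."""
    ready_r, ready_w = os.pipe()
    done_r, done_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        lock.acquire()
        os.write(ready_w, b"x")   # tell the parent the lock is held
        os.read(done_r, 1)        # keep holding until the parent has tested
        os._exit(0)

    os.read(ready_r, 1)
    taken = lock.acquire(False)   # non-blocking attempt in the parent
    if taken:
        lock.release()
    os.write(done_w, b"x")
    os.waitpid(pid, 0)
    for fd in (ready_r, ready_w, done_r, done_w):
        os.close(fd)
    return taken
```
</pre>
<p>With threading.Lock() the parent can still take the lock, because
after fork each process works on its own copy and there is no mutual
exclusion between processes. With multiprocessing.Lock() the attempt
fails while the child holds it, because the lock state is shared.</p>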
<p>Does anyone have hints about what could be wrong here? Is libssl 1.1
supposed to be initialized or used differently in a multi-process
(forking) application where a TLS connection is handled by several
child processes?</p>
<p>Thanks,<br>
Daniel<br>
</p>
<pre style="color: rgb(0, 0, 0); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial; overflow-wrap: break-word; white-space: pre-wrap;">#0 0x00007ff8eedb7470 in futex_wait (private=&lt;optimized out&gt;, expected=18780, futex_word=0x7ff8de86130c) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
__ret = -512
err = &lt;optimized out&gt;
#1 futex_wait_simple (private=&lt;optimized out&gt;, expected=18780, futex_word=0x7ff8de86130c) at ../sysdeps/nptl/futex-internal.h:135
No locals.
#2 __pthread_rwlock_wrlock_slow (rwlock=0x7ff8de861300) at pthread_rwlock_wrlock.c:67
waitval = 18780
result = 0
futex_shared = &lt;optimized out&gt;
#3 0x00007ff8ef189ee9 in CRYPTO_THREAD_write_lock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#4 0x00007ff8ef158c08 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#5 0x00007ff8ef4a3caf in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#6 0x00007ff8ef4994ff in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#7 0x00007ff8ef491f61 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1

#0 0x00007ff8eedb7470 in futex_wait (private=&lt;optimized out&gt;, expected=918, futex_word=0x7ff8de86130c) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
__ret = -512
err = &lt;optimized out&gt;
#1 futex_wait_simple (private=&lt;optimized out&gt;, expected=918, futex_word=0x7ff8de86130c) at ../sysdeps/nptl/futex-internal.h:135
No locals.
#2 __pthread_rwlock_wrlock_slow (rwlock=0x7ff8de861300) at pthread_rwlock_wrlock.c:67
waitval = 918
result = 0
futex_shared = &lt;optimized out&gt;
#3 0x00007ff8ef189ee9 in CRYPTO_THREAD_write_lock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#4 0x00007ff8ef158756 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#5 0x00007ff8ef4a7465 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#6 0x00007ff8ef4997ee in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#7 0x00007ff8ef491f61 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
</pre>
<pre class="moz-signature" cols="72">--
Daniel-Constantin Mierla -- <a class="moz-txt-link-abbreviated" href="http://www.asipto.com">www.asipto.com</a>
<a class="moz-txt-link-abbreviated" href="http://www.twitter.com/miconda">www.twitter.com/miconda</a> -- <a class="moz-txt-link-abbreviated" href="http://www.linkedin.com/in/miconda">www.linkedin.com/in/miconda</a>
</pre>
</body>
</html>