Server application hangs on SSL_read, even when client disconnects

Fri Nov 13 12:06:25 UTC 2020

Hello,

I have developed a client-server application with OpenSSL, and I have a
recurring bug where the server instance sometimes seems to be permanently
stuck in an SSL_read call.

More details are given below, but it seems that in some rare cases the
server calls SSL_read, the client disconnects in the meantime, and the
server never detects the disconnection and remains stuck in SSL_read.

My server runs on a Debian 6.3, and my version of openssl is 1.1.0l.

Here is an extract of the code that manages the SSL connection on the
server side:

   ctx = SSL_CTX_new(SSLv23_server_method());

   BIO* bio = BIO_new_file("dhkey.pem", "r");
   if (bio == NULL) ...
   DH* ret = PEM_read_bio_DHparams(bio, NULL, NULL, NULL);
   BIO_free(bio);
   if (SSL_CTX_set_tmp_dh(ctx, ret) < 0) ...

   SSL_CTX_set_default_passwd_cb_userdata(ctx, (void*)key);
   if (SSL_CTX_use_PrivateKey_file(ctx, "server.key", SSL_FILETYPE_PEM) <= 0) ...
   if (SSL_CTX_use_certificate_file(ctx, "server.crt", SSL_FILETYPE_PEM) <= 0) ...
   if (SSL_CTX_check_private_key(ctx) == 0) ...
   SSL_CTX_set_cipher_list(ctx, "ALL");

   ssl_in = SSL_new(ctx);
   BIO* sslclient_in = BIO_new_socket(in_sock, BIO_NOCLOSE);
   SSL_set_bio(ssl_in, sslclient_in, sslclient_in);
   int r_in = SSL_accept(ssl_in);
   if (r_in != 1) ...

   ...

   /* Place where program hangs : */
   int read = SSL_read(ssl_in, &(((char*)ptr)[nb_read]), size-nb_read);

Here is the full stack-trace where the program hangs :

#0  0x00007f836575d210 in __read_nocancel () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f8365c8ccec in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#2  0x00007f8365c8772b in BIO_read () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#3  0x00007f83659879a2 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#4  0x00007f836598b70d in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#5  0x00007f8365989113 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#6  0x00007f836598eff6 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#7  0x00007f8365998dc9 in SSL_read () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#8  0x000055b7b3e98289 in Socket::SslRead (this=0x7ffdc6131900, size=4, ptr=0x7ffdc613066c)
    at ../../Utilities/Database/Sync/server/Communication/Socket.cpp:80

Here is the result of "netstat -natp | grep <pid of hanging process>" :

tcp       32      0 5.196.111.132:5412      109.133.193.70:51822    CLOSE_WAIT  19218/./MabeeServer
tcp       32      0 5.196.111.132:5412      109.133.193.70:51696    CLOSE_WAIT  19218/./MabeeServer
tcp       32      0 5.196.111.132:5412      109.133.193.70:51658    CLOSE_WAIT  19218/./MabeeServer
tcp        0      0 5.196.111.132:5413      85.27.92.8:25856        ESTABLISHED 19218/./MabeeServer
tcp       32      0 5.196.111.132:5412      109.133.193.70:51818    CLOSE_WAIT  19218/./MabeeServer
tcp       32      0 5.196.111.132:5412      109.133.193.70:51740    CLOSE_WAIT  19218/./MabeeServer
tcp        0      0 5.196.111.132:5412      85.27.92.8:26305        ESTABLISHED 19218/./MabeeServer
tcp6       0      0 ::1:36448               ::1:5432                ESTABLISHED 19218/./MabeeServer

From this log, I can see that I have two established connections with the
remote client machine on IP 85.27.92.8. Note that it is normal to have
two connections, because my client-server protocol relies on two distinct
TCP connections.

I then logged the output of "tcpdump -i any -nn host 85.27.92.8"
for two days (and during those two days, my server instance remained
stuck in SSL_read). In this log, I see no packets exchanged on ports
85.27.92.8:25856 or 85.27.92.8:26305. I see some bursts of packets exchanged
on other client TCP ports, but these are probably due to the client
performing other requests to the server (and thus to the server forking new
instances with connections on other client ports).

This leads me to think that the connection on which SSL_read is waiting is
permanently dead (no more TCP keepalive), and that, for a reason I do not
understand, SSL_read stays blocked on it.
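
To make the keepalive point concrete, here is a minimal sketch of how TCP
keepalive could be enabled on an accepted socket, so that the kernel probes
an idle peer and a blocked read eventually fails if the peer has vanished.
The helper name enable_keepalive and its parameters are assumptions made for
illustration and are not part of the code above; TCP_KEEPIDLE, TCP_KEEPINTVL
and TCP_KEEPCNT are Linux-specific socket options.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (not from the original code): enable TCP keepalive
 * on fd. Returns 0 on success, -1 on failure (errno set by setsockopt). */
int enable_keepalive(int fd, int idle_s, int interval_s, int probes)
{
    int on = 1;

    /* Ask the kernel to send keepalive probes on this connection. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    /* Seconds of idleness before the first probe (Linux-specific). */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
        return -1;
    /* Seconds between successive probes (Linux-specific). */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) < 0)
        return -1;
    /* Unanswered probes before the connection is declared dead. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) < 0)
        return -1;
    return 0;
}
```

In the server code above, this would be called on in_sock before
BIO_new_socket, for example enable_keepalive(in_sock, 60, 10, 5); the
values are purely illustrative.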

Note that the normal behavior of my application is: the client connects, the
server daemon forks a new instance, communication with the forked server
instance lasts a few seconds, the client disconnects, and the forked process
exits.

Note also that the client normally performs a proper disconnection
(SSL_shutdown, etc.), but I cannot guarantee that it is never interrupted in
a more abrupt way (connection lost, client crash, etc.).
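
One possible guard against such abrupt interruptions, sketched here under
assumptions, is a receive timeout on the underlying socket. With SO_RCVTIMEO
set, a blocking read that receives nothing within the timeout fails with
EAGAIN/EWOULDBLOCK instead of waiting forever, so an SSL_read on a socket
BIO built on that descriptor should return rather than hang, and the result
can then be inspected with SSL_get_error. The helper name
set_recv_timeout_ms is hypothetical and not part of the code above.

```c
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper (not from the original code): limit how long a
 * blocking read on fd may wait for data.
 * Returns 0 on success, -1 on failure (errno set by setsockopt). */
int set_recv_timeout_ms(int fd, long timeout_ms)
{
    struct timeval tv;

    tv.tv_sec = timeout_ms / 1000;            /* whole seconds */
    tv.tv_usec = (timeout_ms % 1000) * 1000;  /* remainder in microseconds */
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}
```

In the server code above, this would be applied to in_sock before the
handshake, for example set_recv_timeout_ms(in_sock, 30000); the timeout
value is illustrative and would have to exceed any legitimate idle period
of the protocol.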

Any advice on what is going wrong?

Many thanks,

Brice