Server application hangs on SSL_read, even when client disconnects

Jakob Bohm jb-openssl at wisemo.com
Tue Nov 17 03:13:54 UTC 2020


(Top posting to match what Mr. André does):

TCP without keepalive will still time out the connection some minutes
after sending any data that never gets acknowledged, once the
retransmissions are exhausted.

TCP without keepalive with no outstanding send (so only a blocking
recv) and nothing outstanding at the other end will probably hang
almost forever as there is nothing indicating that there is actual
data lost in transit.
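
For illustration, here is a rough sketch of turning keepalive on for an
accepted socket on Linux. The function name enable_keepalive and the
timing values in the usage note are made up for this example and are not
taken from Mr. André's code; TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT
are Linux-specific options.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable TCP keepalive on a socket and set the
 * Linux-specific probe timings.
 *   idle_secs     - idle time before the first probe is sent
 *   interval_secs - time between unanswered probes
 *   probes        - unanswered probes before the connection is dropped
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int enable_keepalive(int fd, int idle_secs, int interval_secs, int probes)
{
    int on = 1;

    assert(fd >= 0);

    /* Keepalive is off until the application asks for it. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof idle_secs) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &interval_secs, sizeof interval_secs) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probes, sizeof probes) < 0)
        return -1;
    return 0;
}
```

Calling something like enable_keepalive(fd, 60, 10, 5) on the descriptor
returned by accept(), before handing it to OpenSSL, should let the stack
notice a vanished peer after roughly 60 + 5*10 = 110 seconds, at which
point the blocked read fails instead of waiting forever. Note that on
Linux the net.ipv4.tcp_keepalive_* sysctls only provide default timings;
keepalive itself stays off until SO_KEEPALIVE is set on the socket.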

On 2020-11-13 17:13, Brice André wrote:
> Hello,
>
> And many thanks for the answer.
>
> "Does the server parent process close its copy of the conversation
> socket?": I checked my code, and it seems it does not. Is that needed?
> Could it explain my problem?
>
> "Do you have keepalives enabled?": To be honest, I did not know it was
> possible not to enable them. I checked with the command "netstat -tnope"
> and it tells me that keepalive is not enabled.
>
> I suppose that if, for some reason, the communication with the client
> is lost (client crash, loss of network, etc.) and keepalive is not
> enabled, this may fully explain my problem?
>
> If yes, do you have an idea why keepalive is not enabled? I
> thought it was enabled by default on Linux.
>
> Many thanks,
> Brice
>
>
> On Fri, 13 Nov 2020 at 15:43, Michael Wojcik
> <Michael.Wojcik at microfocus.com> wrote:
>
> > From: openssl-users <openssl-users-bounces at openssl.org> On Behalf Of Brice André
> > Sent: Friday, 13 November, 2020 05:06
>
> > ... it seems that in some rare execution cases, the server performs an SSL_read,
> > the client disconnects in the meantime, and the server never detects the
> > disconnection and remains stuck in the SSL_read operation.
>
> ...
>
> > #0  0x00007f836575d210 in __read_nocancel () from /lib/x86_64-linux-gnu/libpthread.so.0
> > #1  0x00007f8365c8ccec in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> > #2  0x00007f8365c8772b in BIO_read () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>
> So OpenSSL is in a blocking read of the socket descriptor.
>
> > tcp        0      0 5.196.111.132:5413      85.27.92.8:25856        ESTABLISHED 19218/./MabeeServer
> > tcp        0      0 5.196.111.132:5412      85.27.92.8:26305        ESTABLISHED 19218/./MabeeServer
>
> > From this log, I can see that I have two established connections with the remote
> > client machine on IP 109.133.193.70. Note that it's normal to have two connections
> > because my client-server protocol relies on two distinct TCP connections.
>
> So the client has not, in fact, disconnected.
>
> When a system closes one end of a TCP connection, the stack will
> send a TCP packet
> with either the FIN or the RST flag set. (Which one you get
> depends on whether the
> stack on the closing side was holding data for the conversation
> which the application
> hadn't read.)
>
> The sockets are still in ESTABLISHED state; therefore, no FIN or
> RST has been
> received by the local stack.
>
> There are various possibilities:
>
> - The client system has not in fact closed its end of the
> conversation. Sometimes
> this happens for reasons that aren't immediately apparent; for
> example, if the
> client forked and allowed the descriptor for the conversation
> socket to be inherited
> by the child, and the child still has it open.
>
> - The client system shut down suddenly (crashed) and so couldn't
> send the FIN/RST.
>
> - There was a failure in network connectivity between the two
> systems, and consequently
> the FIN/RST couldn't be received by the local system.
>
> - The connection is in a state where the peer can't send the
> FIN/RST, for example
> because the local side's receive window is zero. That shouldn't be
> the case, since
> OpenSSL is (apparently) blocked in a receive on the connection,
> but as I don't have
> the complete picture I can't rule it out.
>
> > This makes me think that the connection on which the SSL_read is listening is
> > definitively dead (no more TCP keepalive)
>
> "definitely dead" doesn't have any meaning in TCP. That's not one
> of the TCP states,
> or part of the other TCP or IP metadata associated with the local
> port (which is
> what matters).
>
> Do you have keepalives enabled?
>
> > and that, for a reason I do not understand, the SSL_read stays blocked on it.
>
> The reason is simple: The connection is still established, but
> there's no data to
> receive. The question isn't why SSL_read is blocking; it's why you
> think the
> connection is gone, but the stack thinks otherwise.
>
> > Note that the normal behavior of my application is: client connects, server
> > daemon forks a new instance,
>
> Does the server parent process close its copy of the conversation
> socket?
>
>
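
On the question of the parent closing its copy of the conversation
socket: the stack only sends the FIN once the last descriptor referring
to the connection has been closed, so a forking server normally closes
the accepted socket in the parent right after fork(). A minimal sketch of
that pattern follows; serve_in_child and greet are hypothetical names
used only for this example, and reaping of child processes (SIGCHLD or
waitpid) is left out.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-connection handler used for the example:
 * sends a short greeting on the connection. */
void greet(int fd)
{
    if (write(fd, "hi\n", 3) != 3)
        _exit(1);
}

/* Hypothetical helper: serve one accepted connection in a forked child.
 * Returns the child's pid in the parent, or -1 if fork() failed. */
pid_t serve_in_child(int listen_fd, int conn_fd, void (*handler)(int))
{
    pid_t pid;

    assert(handler != NULL);

    pid = fork();
    if (pid < 0) {
        close(conn_fd);      /* cannot serve it, so do not leak it */
        return -1;
    }
    if (pid == 0) {
        close(listen_fd);    /* the child never accepts new connections */
        handler(conn_fd);
        close(conn_fd);
        _exit(0);
    }

    /* Parent: drop our copy. Otherwise the connection stays open (and
     * no FIN goes out) even after the child has closed it and exited. */
    close(conn_fd);
    return pid;
}
```

In an accept() loop this would be called as
serve_in_child(listen_fd, conn_fd, handler) for each new connection. If
the parent skipped its close(conn_fd), the client would not see the
server side of the connection close when the child finished, and the
parent would slowly run out of descriptors.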


Enjoy

Jakob
-- 
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded


