SSL_read() returning SSL_ERROR_SYSCALL with errno 11 EAGAIN

Thu May 2 16:10:31 UTC 2019

Please note that the connection was established successfully, and many operations and responses completed before the failure.

> Do you wait for the non-blocking connect to complete at this point?
We connect in blocking mode then switch to non-blocking.
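
For reference, the "connect blocking, then switch" step can be sketched as below. This is only an illustration using plain POSIX fcntl(); the helper name set_nonblocking is hypothetical and is not taken from the application's code.

```c
#include <fcntl.h>

/* Hypothetical helper: put an already-connected socket into non-blocking
 * mode. Returns 0 on success, -1 on failure (errno is set by fcntl). */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);   /* read the current file status flags */
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}
```

Because connect() has already returned by the time this runs, there is no pending non-blocking connect to wait for.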

> Are multiple threads writing to the same SSL connection?  How do you ensure orderly use of the SSL connection?  Sharing connections across threads without application level synchronization is not supported in OpenSSL.

We use mutexes to synchronize, of course.

> How are further requests locked out when you're performing reads?
> What is the granularity of the relevant locks?

The mutex only allows one SSL call at a time.

> At this point you'd be calling SSL_get_error(), is there a lock that prevents writes between SSL_read() and SSL_get_error()?

The mutex does not protect SSL_get_error() calls.

> I gather the protocol is full-duplex and multiple outstanding requests can be written before the corresponding replies are read?  Or is it strict half-duplex request-response?

It is full duplex and there can be multiple operations in progress.

Regards,
John.

-----Original Message-----
From: openssl-users <openssl-users-bounces at openssl.org> On Behalf Of Viktor Dukhovni
Sent: 02 May 2019 15:56
To: openssl-users at openssl.org
Subject: Re: SSL_read() returning SSL_ERROR_SYSCALL with errno 11 EAGAIN

> On May 2, 2019, at 5:56 AM, John Unsworth <John.Unsworth at synchronoss.com> wrote:
>
> Create a non-blocking TCP socket
>       socket() for a sock_stream.
>       connect().

Do you wait for the non-blocking connect to complete at this point?

>       SSL_new(), SSL_set_fd(), SSL_connect().
>
> The application sends LDAP operations from many threads.

Are multiple threads writing to the same SSL connection?  How do you ensure orderly use of the SSL connection?  Sharing connections across threads without application level synchronization is not supported in OpenSSL.

> We have just one thread that reads LDAP results.

How are further requests locked out when you're performing reads?
What is the granularity of the relevant locks?

> If an operation is outstanding then the result thread does (simplified):
>
> SSL_read()
> If > 0 return data.

At this point you'd be calling SSL_get_error(), is there a lock that prevents writes between SSL_read() and SSL_get_error()?

> Else if SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE then poll(); back to SSL_read() when data available.
> Else return error and disconnect.
>
> Don't know what protocol was negotiated or what the server does in terms of returned data. TCP/OpenSSL handle that.
> Both ends OpenSSL 1.1.0h.
> Problem seems to occur at random - only reproducible on customer site and after a long time running their soak test.

It would be helpful if the customer could gather more diagnostic information from that "soak test".  With 1.1.0, presumably they negotiate TLS 1.2, since TLS 1.3 is not available, while 1.2 is available on both ends.

IIRC OpenSSL will normally send the record layer header in the same segment as the payload, so running into EAGAIN is unlikely after the initial 5 bytes of record header, unless the TCP receive window was nearly full.
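
For context, the 5 bytes in question are the TLS record header: one byte of content type, two bytes of protocol version, and a two-byte big-endian payload length. A small illustrative parser is below; it is useful mainly when inspecting packet captures, and the struct and function names are hypothetical rather than OpenSSL API.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the 5-byte TLS record header. */
struct tls_record_header {
    uint8_t  content_type;   /* e.g. 22 = handshake, 23 = application data */
    uint8_t  version_major;  /* 3 for all TLS versions */
    uint8_t  version_minor;  /* 3 on the wire for TLS 1.2 */
    uint16_t length;         /* payload length that follows the header */
};

/* Parse a record header from buf. Returns 0 on success, or -1 if
 * fewer than 5 bytes are available (the header is still incomplete). */
static int parse_tls_record_header(const uint8_t *buf, size_t len,
                                   struct tls_record_header *out)
{
    if (len < 5)
        return -1;
    out->content_type  = buf[0];
    out->version_major = buf[1];
    out->version_minor = buf[2];
    out->length = (uint16_t)(((unsigned)buf[3] << 8) | buf[4]);
    return 0;
}
```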

I gather the protocol is full-duplex and multiple outstanding requests can be written before the corresponding replies are read?  Or is it strict half-duplex request-response?

--
        Viktor.