SSL_read() returning SSL_ERROR_SYSCALL with errno 11 EAGAIN

Thu May 2 10:01:13 UTC 2019

OpenSSL 1.1.0h
We have implemented the workaround: if SSL_get_error() returns SSL_ERROR_SYSCALL and errno is EAGAIN, we treat it as SSL_ERROR_WANT_READ/SSL_ERROR_WANT_WRITE. This seems to work fine so far: there are no subsequent problems and everything continues correctly.

Regards,
John

-----Original Message-----
From: openssl-users <openssl-users-bounces at openssl.org> On Behalf Of Matt Caswell
Sent: 01 May 2019 08:42
To: openssl-users at openssl.org
Subject: Re: SSL_read() returning SSL_ERROR_SYSCALL with errno 11 EAGAIN

On 30/04/2019 23:37, Viktor Dukhovni wrote:
> On Tue, Apr 30, 2019 at 03:23:23PM -0700, Erik Forsberg wrote:
>
>>> Is the handshake explicit, or does the application just call 
>>> SSL_read(), with OpenSSL performing the handshake as needed?
>>
>> I occasionally (somewhat rarely) see the issue mentioned by the OP.
>> Ignoring the error, or mapping it to what WANT_READ/WANT_WRITE
>> does, effectively hides the issue and the connection works fine. I
>> predominantly run on Solaris 11. In my case, I open the socket
>> myself, set non-blocking mode and associate it with an SSL object using SSL_set_fd().
>> The initial handshake is done explicitly.
>
> Recoverable errors should not result in SSL_ERROR_SYSCALL.  This feels 
> like a bug.  I'd like to hear from Matt Caswell on this one.
> Perhaps someone should open an issue on Github...
>

SSL_ERROR_SYSCALL should not be raised as a result of a recoverable error. It should always be considered fatal. If you are getting this but errno says EAGAIN, then a number of possibilities spring to mind:

1) If a fatal error has occurred, SSL_get_error() checks to see if there is an error on the OpenSSL error queue. If there is, it returns SSL_ERROR_SSL (unless the error type is ERR_LIB_SYS). If there is no error on the queue at all, but libssl doesn't think the error is recoverable, then it will return SSL_ERROR_SYSCALL by default.
It is possible that libssl has encountered some non-syscall related error but neglected to push an error onto the error queue. Thus the return value incorrectly indicates SSL_ERROR_SYSCALL when it should have been SSL_ERROR_SSL.
This would be an OpenSSL bug - but quite tricky to find since we'd have to locate the spot where no error is being pushed...but because there is no error we don't have a lot to go on!

2) A second possibility is that it really was a syscall that failed but something (either in libssl or possibly in application code) made some subsequent syscall that changed errno in the meantime. If that "something" was in libssl then that's probably also a libssl bug. (Also quite tricky to track down)

3) A third possibility is that it really is a retryable error but libssl failed to properly set its state to note that. I think this is quite a lot less likely than (1) or (2) but would also be a libssl bug.

So my guess is that, except in the case where the application itself has accidentally changed errno, this most likely indicates an OpenSSL bug. The safest thing to do in such circumstances is to treat this as a fatal error. It is very unwise to retry a connection where the library has indicated a fatal error (e.g. see CVE-2019-1559).

What OpenSSL version is this?

Matt