problems with too many ssl_read and ssl_write errors
Michael.Wojcik at microfocus.com
Wed Aug 25 23:16:52 UTC 2021
> From: Kamala Ayyar <kamala.ayyar at gmail.com>
> Sent: Monday, 23 August, 2021 09:22
> We get the SSL_ERROR_SYSCALL from SSL_Read and SSL_Write quite often.
You'll get SSL_ERROR_SYSCALL any time OpenSSL makes a system call (including, on Windows, a Winsock call) and gets an error.
> It seems the handshake is done correctly and over a period of time (few hours
> to 2-3 days random) the SSL_Read /SSL_Write fails. We do not get the
> WSAEWOULDBLOCK error code
What is the underlying error, then? Are you logging the result of WSAGetLastError immediately after you get SSL_ERROR_SYSCALL? What about the SSL error stack (with ERR_print_errors_fp or similar)?
> nor the OpenSSL's version of SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE error.
SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE are not related to WSAEWOULDBLOCK, so I'm not sure why you're mentioning them here.
> We get WSAETIMEDOUT on Receive more often and a few times on the Send.
That's typically the case. Generally speaking, a timeout is more likely when receiving, where you are at the mercy of the peer sending data, than when sending, where you simply need the peer to open its receive window and then ACK the sent data. Both of those are often possible even if the peer application is not behaving, depending on the amount of data and other variables.
> We are not using SO_KEEPALIVE but using application specific heartbeat TO to
> keep the socket alive.
That could certainly cause send or receive timeouts on the socket if the peer becomes unresponsive. The same is true of any application-data transmission, of course.
> Based on blogs and googling we have seen that OpenSSL quite often issues a
> SSL_ERROR_SYSCALL when a Timeout is encountered
Yes, that's what it should do, if "when a timeout is encountered" means "a socket-API function returns an error due to a timeout". SSL_ERROR_SYSCALL means exactly that: a system call returned an error.
I suspect one of the following:
- A client application is hanging (or blocking for some other reason), and consequently:
  - Not sending data, so the server's not receiving data until it times out, or
  - Not receiving data that the server is sending; that will cause its receive window to fill, and eventually the server's send will time out.
- Network issues are transiently preventing data and/or ACK reception by one side or the other. That will also eventually lead to timeouts.