How to debug a TLSv1.3 protocol problem?

Wed May 20 20:36:23 UTC 2020

On Wed, May 20, 2020 at 09:21:25PM +0100, Matt Caswell wrote:

> > Explanation:
> > want_write = BIO_ctrl_pending(network_bio)
> > want_read = BIO_ctrl_get_read_request(network_bio)
> > 
> > I didn't instrument the write part, only the read part.
> > want_read>0 -> invoke read, check result:  read=STATUS, num_read=N
> > So the last read does not get the data the library wants,
> > and hence the client fails and closes the connection.
> 
> This sounds odd. Why does the client fail because it hasn't read the
> expected data yet? Normally (with non-blocking sockets), a failure to
> read the expected data will result in SSL_get_error() returning
> SSL_ERROR_WANT_READ - indicating that the data is not currently
> available and we should retry again later. That isn't a fatal error, so
> the connection should not be closed.
> 
> So - when you say the client fails - what exactly happens? What does
> SSL_get_error() return at that point?

Speaking of which, I recently ran into a documented interface
landmine. The obvious sequence:

    status = SSL_read(ssl, ...);
    err = SSL_get_error(ssl, status);

is an anti-pattern, because the "correct" usage is:

    ERR_clear_error();
    status = SSL_read(ssl, ...);
    err = SSL_get_error(ssl, status);

Without ERR_clear_error(), SSL_get_error() can report spurious
connection failures, depending on what's left over on the
error stack:

    http://postfix.1071664.n5.nabble.com/quot-SSL-Shutdown-shutdown-while-in-init-quot-while-sending-and-receiving-td105822.html

My take is that this is a sufficiently nasty problem to warrant some
changes in SSL_read(), SSL_write(), SSL_accept(), ... to internally
memoize the error status before returning, in a manner that does not
depend on the prior state of the error stack. SSL_get_error() would
then look only at the given (SSL *) handle and not at the error stack.
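
To make the difference concrete, here is a toy sketch in C. It is not
OpenSSL code and makes no OpenSSL calls: every name in it (toy_conn,
toy_read, toy_get_error_queue, toy_get_error_memo, the TOY_ERROR_*
values) is hypothetical, and the "error stack" is reduced to a simple
counter. It only models the retry case, assuming that a leftover entry
on the stack is what turns a harmless "want read" into an apparent
fatal error.

```c
#include <stddef.h>

/* Toy model only. All names are hypothetical; none of this is OpenSSL API. */

enum toy_err { TOY_ERROR_NONE, TOY_ERROR_WANT_READ, TOY_ERROR_SSL };

/* Stand-in for the per-thread error stack: a count of pending entries. */
static int toy_queue_len = 0;

static void toy_queue_push(void)  { toy_queue_len++; }
static void toy_queue_clear(void) { toy_queue_len = 0; }

struct toy_conn {
    int data_available;     /* bytes currently readable from the peer */
    enum toy_err last_err;  /* outcome of the most recent I/O call, memoized */
};

/*
 * Toy I/O call: returns the number of bytes read, or -1 when no data is
 * available yet and the call should be retried. It never touches the
 * error stack, and it records its own outcome in the handle.
 */
static int toy_read(struct toy_conn *c)
{
    if (c->data_available > 0) {
        int n = c->data_available;
        c->data_available = 0;
        c->last_err = TOY_ERROR_NONE;
        return n;
    }
    c->last_err = TOY_ERROR_WANT_READ;
    return -1;
}

/*
 * Stack-dependent classification: on failure, ANY pending entry on the
 * error stack is taken as a fatal error, even one left behind by
 * unrelated earlier code. Only with an empty stack is the handle consulted.
 */
static enum toy_err toy_get_error_queue(const struct toy_conn *c, int ret)
{
    if (ret > 0)
        return TOY_ERROR_NONE;
    if (toy_queue_len > 0)
        return TOY_ERROR_SSL;
    return c->last_err;
}

/*
 * Handle-only classification, in the spirit of the proposal: the answer
 * depends solely on what the I/O call memoized in the handle.
 */
static enum toy_err toy_get_error_memo(const struct toy_conn *c, int ret)
{
    if (ret > 0)
        return TOY_ERROR_NONE;
    return c->last_err;
}
```

With one stale entry pushed via toy_queue_push(), a retryable toy_read()
is classified as TOY_ERROR_SSL by toy_get_error_queue() but as
TOY_ERROR_WANT_READ by toy_get_error_memo(). Calling toy_queue_clear()
first (the analogue of ERR_clear_error()) makes the two agree.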

-- 
    Viktor.