How to debug a TLSv1.3 protocol problem?

Matt Caswell matt at openssl.org
Wed May 20 13:01:59 UTC 2020



On 20/05/2020 13:44, Claus Assmann wrote:
> On Wed, May 20, 2020, Matt Caswell wrote:
> 
>> SSL_accept:TLSv1.3 early data
> 
>> What happens in the application code? What was the function being called
>> (SSL_accept?) and what return value do you get? What does
>> SSL_get_error() return at this point?
> 
> It's:
> 	r = SSL_accept(srv_ssl);
> 	if (r <= 0)
> 		ssl_err = SSL_get_error(srv_ssl, r);
> 
> return value=-1
> ssl_err=5 SSL_ERROR_SYSCALL
> errno=0
> 
> It seems to me server and client get "out of sync" at the I/O layer
> if I understand the SSL traces correctly:
> 
> S8: sends 2 records at the end:
>   - handshake
>   - ChangeCipherSpec
> 
> M1: receives
>   handshake
>   but seemingly not
>   ChangeCipherSpec
>   Instead it sends only its own
>   ChangeCipherSpec
>   then its handshake again
>   and only then it receives ChangeCipherSpec
> 
> and S8 seemingly tries to interpret the out-of-sync data as TLSv1.3
> early data and fails, thus returning an error from SSL_accept().
> 
> If that analysis is correct (can someone check please?), then I
> need to look at the I/O layers of both programs -- they are rather
> different :-(
> 

The "early data" here is a red herring. It is normal for the internal
libssl state machine to (briefly) transition through the "early data"
state even though there is no early data being sent. In this case the
early data state is the last state that occurs after having written
change cipher spec, but before reading the second ClientHello following
a HelloRetryRequest (HRR).

I *think* what is happening here is that the server side state machine
has finished its writing tasks, and is now attempting to read data from
the client again (i.e. the 2nd ClientHello). Normally what would happen
at this point is that it would read the header of the next record that
is received from the client to check that the message type it has
received is sane. It is expecting the ClientHello message type, and only
at that point will it move the state machine on from the "early data"
state into the "SSLv3/TLS read client hello" state.
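
To make the "is the next record sane?" check concrete, here is a minimal
sketch in plain C. It is not libssl's code; looks_like_client_hello() and
the constant names are hypothetical, and only the numeric values (record
content types 20-23, handshake message type 1 for ClientHello, 5-byte
record header) come from the TLS specifications. It could be run over
bytes captured on the wire to see what the server was actually handed.

```c
#include <stddef.h>
#include <stdint.h>

/* Record content types as defined by the TLS specifications. */
enum {
    REC_CHANGE_CIPHER_SPEC = 20,
    REC_ALERT              = 21,
    REC_HANDSHAKE          = 22,
    REC_APPLICATION_DATA   = 23
};

#define REC_HEADER_LEN  5  /* type (1) + legacy version (2) + length (2) */
#define HS_CLIENT_HELLO 1  /* handshake message type of a ClientHello */

/*
 * Hypothetical helper, not an OpenSSL API.
 * Returns  1 if buf starts with a plaintext handshake record whose first
 *            message is a ClientHello,
 *          0 if it starts with anything else,
 *         -1 if there are too few bytes to decide.
 */
int looks_like_client_hello(const uint8_t *buf, size_t len)
{
    size_t rec_len;

    if (buf == NULL || len < REC_HEADER_LEN + 1)
        return -1;

    /* Bytes 3-4 carry the record payload length, big-endian. */
    rec_len = ((size_t)buf[3] << 8) | buf[4];

    if (buf[0] != REC_HANDSHAKE || rec_len == 0)
        return 0;

    /* The first payload byte of a handshake record is the message type. */
    return buf[REC_HEADER_LEN] == HS_CLIENT_HELLO ? 1 : 0;
}
```

Note that a client using TLSv1.3 middlebox compatibility mode may send a
dummy ChangeCipherSpec record (type 20) just before its second
ClientHello. This sketch reports 0 for such a record; a caller would
need to skip it and look at the record that follows.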

For some reason it is experiencing a failure while reading the
ClientHello from the client. Therefore it never makes the state
transition out of the "early data" state.

The SSL_ERROR_SYSCALL return suggests to me that the underlying system
call to read the data has failed for some reason. However, errno being 0
indicates otherwise. There is actually a known scenario in OpenSSL 1.1.1
where this can occur: if EOF is unexpectedly encountered on the socket.
We briefly fixed this (you should never normally get SSL_ERROR_SYSCALL
with errno == 0) but had to back the fix out because it broke some
applications which were written to expect this buggy behaviour.
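
As a rough illustration of how an application might tell these cases
apart, here is a small sketch in plain C. The names
classify_accept_failure(), diag_t and SKETCH_SSL_ERROR_SYSCALL are
hypothetical; the value 5 is simply the one shown in the trace above,
and real code would use SSL_ERROR_SYSCALL from <openssl/ssl.h> instead.
It assumes errno was saved immediately after SSL_accept() returned,
before any other call could overwrite it.

```c
#include <errno.h>

/* Stand-in for SSL_ERROR_SYSCALL so the sketch builds without OpenSSL. */
#define SKETCH_SSL_ERROR_SYSCALL 5

typedef enum {
    DIAG_OTHER,           /* not SSL_ERROR_SYSCALL; handle elsewhere */
    DIAG_SYSCALL_FAILED,  /* a real I/O error; saved errno says which */
    DIAG_UNEXPECTED_EOF   /* 1.1.1 quirk: peer closed without warning */
} diag_t;

/*
 * Hypothetical helper: interpret the result of SSL_get_error() together
 * with the errno value saved right after the failed SSL_accept().
 */
diag_t classify_accept_failure(int ssl_err, int saved_errno)
{
    if (ssl_err != SKETCH_SSL_ERROR_SYSCALL)
        return DIAG_OTHER;

    /* SSL_ERROR_SYSCALL with errno == 0 means EOF on the socket. */
    return saved_errno == 0 ? DIAG_UNEXPECTED_EOF : DIAG_SYSCALL_FAILED;
}

/* Short text for logging the classification. */
const char *diag_text(diag_t d)
{
    switch (d) {
    case DIAG_SYSCALL_FAILED: return "system call failed, check errno";
    case DIAG_UNEXPECTED_EOF: return "unexpected EOF from peer";
    default:                  return "other SSL error";
    }
}
```

With the values reported above (ssl_err=5, errno=0),
classify_accept_failure(5, 0) yields DIAG_UNEXPECTED_EOF, i.e. something
closed the connection while the server was waiting for the second
ClientHello.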

I wonder if there could be some middlebox in between these two peers
that is interfering with the connection in some way and arbitrarily
closing it down?

Matt


