[openssl-dev] [openssl.org #4025] Bug? DTLS server does not respond if HelloVerifyRequest message lost

Sat Aug 29 12:38:02 UTC 2015

On 29/08/15 13:11, Ken Ballou via RT wrote:
> I'm sorry if I did not explain myself well.
> 
> The message flow I observed and am trying to describe is:
> 
> ClientHello ---------------------->
>                             x----------------- HelloVerifyRequest
> ClientHello ---------------------->
> ClientHello ---------------------->
> ClientHello ---------------------->
> ClientHello ---------------------->
> (and the sound of crickets chirping on the server side is deafening).
> 
> I do understand that the server does not RETRANSMIT the 
> HelloVerifyRequest (and I do understand the reason for that).  What I am 
> describing is a failure to RESPOND to a retransmitted ClientHello.

The reasons for this are a little complicated I think.

When DTLS was first integrated into OpenSSL it was (and still is)
largely based on the TLS code. In particular the whole handshake is very
connection oriented. Each message in the handshake is given a message
sequence number (not to be confused with the record sequence number). So
typically you get:

ClientHello (no cookie, seq = 0) ---------->
                                 <---------- HelloVerifyRequest (seq 0)
ClientHello (cookie, seq = 1)    ---------->
                                 <---------- ServerHello (seq 1)
                                             etc

In this very connection oriented way of looking at things the server
receives the original ClientHello and responds with the
HelloVerifyRequest. It then increments its counter to expect message
sequence number 1 next from the client. Anything less than sequence
number 1 is ignored as a retransmit of a message that the server has
already dealt with. Normally (for other messages) a lost message isn't a
problem because the server will retransmit the messages it has already sent
if it doesn't get the message it's expecting from the client within a
given timeout period. In this case, however, the RFC expressly prohibits
the retransmission of the HelloVerifyRequest. Therefore where the
original HelloVerifyRequest gets lost the server is stuck waiting for
message sequence 1 to arrive, whilst the client is stuck waiting for the
HelloVerifyRequest to be retransmitted (which is never going to happen),
whilst retransmitting its original seq=0 ClientHello which the server is
ignoring.
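
As a rough illustration of that deadlock, here is a toy model of the
connection-oriented sequence check. It is only a sketch and not the
libssl code: the struct, the function names and the "drop anything out
of order" simplification are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only, not OpenSSL code: all names here are hypothetical. */
struct conn_server {
    unsigned next_expected_seq; /* next handshake message_seq expected */
};

/* Returns true if the server replies to a ClientHello with this
 * message sequence number, false if it silently drops it. */
static bool conn_server_on_client_hello(struct conn_server *s,
                                        unsigned msg_seq)
{
    if (msg_seq < s->next_expected_seq)
        return false;       /* looks like a retransmit: ignored */
    if (msg_seq > s->next_expected_seq)
        return false;       /* out of order: dropped in this toy model */
    s->next_expected_seq++; /* reply sent; now expect the next message */
    return true;
}

/* The client sends `attempts` ClientHellos, all with seq = 0, because
 * it never saw the (lost) HelloVerifyRequest. Counts server replies. */
static int replies_to_retransmits(int attempts)
{
    struct conn_server s = { 0 };
    int replies = 0;

    assert(attempts >= 0);
    for (int i = 0; i < attempts; i++)
        if (conn_server_on_client_hello(&s, 0))
            replies++;
    return replies;
}
```

However many times the client retransmits, replies_to_retransmits()
never counts more than the one reply to the first ClientHello, which is
the reply that got lost.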

This connection oriented perspective was never the way that a
HelloVerifyRequest was intended to be used. The server is supposed to
handle these in a completely stateless way, i.e. it waits for a
ClientHello (seq=0) to arrive; sends back a HelloVerifyRequest; and then
instantly forgets anything about that and waits for the next ClientHello
(seq=0) to arrive. Only when it sees a ClientHello (seq=1) with a valid
cookie does the server form a connection and the handshake proceeds as
normal.
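
A minimal sketch of that stateless decision, again as a toy model
rather than anything taken from OpenSSL. The function names, the
32-bit "address" and the cookie function are assumptions made up for
illustration, and the cookie is deliberately simplistic and NOT secure
(a real server would use a keyed MAC over the client's address).

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only, not OpenSSL code: all names here are hypothetical. */
enum reply { REPLY_HELLO_VERIFY_REQUEST, REPLY_START_HANDSHAKE };

/* Placeholder cookie, NOT secure. It only shows that the cookie can be
 * recomputed from the client's address and a server secret, so the
 * server does not have to remember anything per client. */
static uint32_t toy_cookie(uint32_t client_addr, uint32_t secret)
{
    return (uint32_t)((client_addr ^ secret) * 2654435761u);
}

/* Decide how to answer a ClientHello. There is no per-client state:
 * the same inputs always give the same answer, so a retransmitted
 * cookie-less ClientHello simply gets another HelloVerifyRequest. */
static enum reply stateless_on_client_hello(uint32_t client_addr,
                                            uint32_t secret,
                                            bool has_cookie,
                                            uint32_t cookie)
{
    if (has_cookie && cookie == toy_cookie(client_addr, secret))
        return REPLY_START_HANDSHAKE;     /* only now form a connection */
    return REPLY_HELLO_VERIFY_REQUEST;    /* reply, then forget */
}
```

Because nothing is stored between calls, a lost HelloVerifyRequest costs
nothing: the next cookie-less ClientHello is answered exactly like the
first one.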

To deal with this stateless way of handling the
ClientHello/HelloVerifyRequest a new function was introduced (way back
in OpenSSL 0.9.8) called DTLSv1_listen. In the original connection
oriented way of doing things a server would just call SSL_accept (or
some other function that ultimately calls that) and wait for the
handshake to complete. In this stateless way of operating a server
application is supposed to call DTLSv1_listen until a ClientHello with a
valid cookie is received, and only *then* is it supposed to call SSL_accept.

The bug here I think is that when DTLS support was originally added to
s_server the DTLSv1_listen function had not been created yet. When
DTLSv1_listen was later added, s_server was never updated.

So, in summary, I believe this to be an s_server bug, not a libssl bug.
We should update s_server to use DTLSv1_listen (and coincidentally I
have a patch for the master branch knocking around somewhere to do just
that...I'll dig it out).

Matt