[openssl-dev] [openssl.org #4119] DTLS resets handshake hash too frequently for ClientHello

Tue Nov 3 17:43:05 UTC 2015

On Tue, Nov 3, 2015 at 11:16 AM Matt Caswell via RT <rt at openssl.org> wrote:

> Whilst investigating this I noticed another bug which is actually
> probably more significant. My eyeball only look at the BoringSSL source
> suggests that it is there too, so I'm not sure why you haven't seen it
> in the test that you mentioned.
>
> If a retry occurs on a second or subsequent fragment of a handshake
> message then when we do the retry the wrong fragment offset and length
> is used. The impact isn't that great because the messages got dropped by
> the peer, and then when they get retransmitted they have the correct
> values inserted...so the handshake succeeds - but it gets delayed.
> Perhaps that is why you don't see it in your test.
>

Our tests are pretty strict and deterministic (clock is mocked out, Go DTLS
half is intentionally much stricter than a real impl), so it actually fails
if packets are dropped that aren't supposed to be. I think it's actually
because my async tests, uh, didn't stress the low MTU case at all. :-)
Thanks!

(The handshake hash bug requires a fragmented ClientHello to trigger in
OpenSSL because you only update the handshake hash as it gets written,
whereas BoringSSL updates it up-front to simplify a few things. I actually
didn't think much about fragmentation when working on it our end and only
realized it was a precondition for you later.)

In fact, apparently I'd noticed frag_off was messed up before, added TODOs,
and forgot about it. :-) I guess this must have been the other reason I'd
previously assumed retrying writes in the DTLS code didn't work at all. It
mostly doesn't.
https://boringssl.googlesource.com/boringssl/+/master/ssl/d1_both.c#285

I'm not sure that fix quite works though. If BIO_flush completes
asynchronously (hrm, it's missing an rwstate update), then I believe you'll
be in a state where you *do* want to repeat the init_off / init_num adjust.
You might be able to get away with using init_off/init_num with some minor
tweaks? Another problem: because the fragment headers clobber whatever was
already written, msg_callback sees garbage. Also this function is used for
the unfragmented ChangeCipherSpec, so it's even messier.

I dunno, this code is too stateful by several orders of magnitude. I think
I'm going to toy with rewriting it now rather than think too hard about the
existing mess.

> This second issue occurs in all branches.
>
> One other related point is that fragmenting ClientHellos is probably a
> bad idea. The whole ClientHello/HelloVerifyRequest mechanism is meant to
> be implemented without storing state on the server. That isn't possible
> if you have to deal with fragment reassembly. In the new DTLSv1_listen
> implementation in master we drop fragmented ClientHellos.
>

Yeah, this part of DTLS (like much of it) is woefully underspecified. We
keep stuffing things into ClientHellos, so if one does need to support
fragmented ones, I think the right way to do stateless HelloVerifyRequest
is to silently drop all non-initial ClientHello fragments and require the
initial one be large enough to include the client_random and whatever else
you figure into the cookie.

David