[openssl-project] [TLS] Yet more TLS 1.3 deployment updates

Wed Jan 23 10:24:18 UTC 2019

On 22/01/2019 20:41, David Benjamin wrote:
> On Tue, Jan 22, 2019 at 1:48 PM Viktor Dukhovni <openssl-users at dukhovni.org
> <mailto:openssl-users at dukhovni.org>> wrote:
> 
> 
> 
>     > On Jan 22, 2019, at 2:06 PM, Adam Langley <agl at imperialviolet.org
>     <mailto:agl at imperialviolet.org>> wrote:
>     >
>     > (This is another installment of our experiences with deploying the
>     > RFC-final TLS 1.3—previous messages: [1][2]. We share these with the
>     > community to hopefully avoid other people hitting the same issues.)
>     >
>     > [...]
>     >
>     > However, OpenSSL 1.1.1a signals SSL_CB_HANDSHAKE_START when TLS 1.3
>     > post-handshake messages are received[5], including KeyUpdate. This
>     > causes KeyUpdate messages to break with, at least, HAProxy, and with
>     > NGINX prior to this commit[6]. (There may well be more, but that level
>     > of breakage was enough to drown any other signal.)
>     >
>     > Lastly, OpenSSL 1.1.1a imposes a hard limit of 32 KeyUpdate messages
>     > per connection[7]. Therefore clients that send periodic KeyUpdates
>     > based on elapsed time or transmitted bytes will eventually hit that
>     > limit, which is fatal to the connection.
>     >
>     > Therefore KeyUpdate messages are not currently viable on the web, at
>     > least when client initiated.
>     >
>     > [1] https://mailarchive.ietf.org/arch/msg/tls/PLtOD4kROZFfNtPKzSoMyIUOzuE
>     > [2] https://mailarchive.ietf.org/arch/msg/tls/pixg5cBXHuwd3MtMIn_xIhWmGGQ
>     > [3] https://bugs.openjdk.java.net/browse/JDK-8211806
>     > [4] https://bugs.openjdk.java.net/browse/JDK-8213202
>     > [5] https://github.com/openssl/openssl/issues/8069
>     > [6]
>     https://trac.nginx.org/nginx/changeset/e3ba4026c02d2c1810fd6f2cecf499fc39dde5ee/nginx/src/event/ngx_event_openssl.c
>     > [7] https://github.com/openssl/openssl/issues/8068
>     > [8] https://twitter.com/__subodh/status/1085642001595265024
> 
>     I think we should remediate the reported issues in the 1.1.1b release.
>     We should probably clear the keyUpdate count when sufficient application
>     data has been received from the peer.  Where sufficient could be as little
>     as 1 byte, or could be something more reasonable (say 1MB, allowing for
>     up to 32 rekeys per MB, which is plenty).
> 
>     As for applications mishandling "SSL_CB_HANDSHAKE_START", not quite sure
>     what to do there, but perhaps we could define a new even for keyUpdates
>     that does not mislead applications into assuming a new "handshake".
> 
> 
> I think this is clearly the right option. This is a compatibility break, IMO.
> 
> Prior to SSL_OP_NO_RENEGOTIATION (new in the same release that added 1.3), this
> was the only way to disable renegotiations. I've seen a lot of codebases do
> this. I don't think one could even call it a mishandling. The natural
> interpretation of "SSL_CB_HANDSHAKE_START" is that a handshake has started.
> Thus, if you wish to block renegotiations, absent a more direct API, it's
> natural to count those and fail if you see more than one.
> 
> A KeyUpdate is not a handshake.

That's depends on your perspective. One peer sends a message, and the other peer
(may) respond with a message. Sounds like a handshake to me. KeyUpdate gets sent
in handshake records, with a HandshakeType value of 24 and is defined in section
4 of the RFC ("Handshake Protocol").

To me it makes perfect sense to signal it this way. In fact it was a deliberate
decision to do so and is documented accordingly:

 =item SSL_CB_HANDSHAKE_START

 Callback has been called because a new handshake is started. In TLSv1.3 this is
 also used for the start of post-handshake message exchanges such as for the
 exchange of session tickets, or for key updates. It also occurs when resuming a
 handshake following a pause to handle early data.

 =item SSL_CB_HANDSHAKE_DONE           0x20

 Callback has been called because a handshake is finished. In TLSv1.3 this is
 also used at the end of an exchange of post-handshake messages such as for
 session tickets or key updates. It also occurs if the handshake is paused to
 allow the exchange of early data.

Ironically it was done this way with a view to *avoiding* a compatibility break.
The thinking was that applications written with TLSv1.2 in mind may find it
surprising to start receiving events after the initial connection setup that are
not enclosed in HANDSHAKE_START/HANDSHAKE_DONE. The info callback is intended as
a tracing API IMO. It's only the fact that it has been used in ways other than
we might have expected that we have this problem. In fact if your application is
using the API to detect renegs because it wants to report when key updates are
happening (rather than blocking them)....well that application will continue to
work and will stop working if we change the events.

Given this is documented that way, some applications may already be using it -
so changing the events now should be done with caution. That itself could cause
a break. We know that this is causing some problems now because of the way that
some applications are (mis)using this API. What we don't know is what new
problems would surface if we changed its semantics in a letter release.

Really the root of this problem is not in the info callback at all. It is the
fact that in earlier 1.1.0 releases we did not have an effective way of blocking
renegs. We now have SSL_OP_NO_RENEGOTIATION which is the "right" way to do this.
I'd also note that this was backported to the 1.1.0 branch in 1.1.0h (March
2018) and so has been available to all 1.1.x users for some while.

Matt