[ech] would a callback for ECH retry-configs be useful?

David Benjamin davidben at google.com
Tue Apr 18 15:06:43 UTC 2023

I don't think using the most recent config is right. We started with that
in an early prototype, but quickly realized that doesn't work. For
BoringSSL's API, when we make a config set, we just let you mark which
configs are to be sent as retry configs and which aren't. I don't think you
need a callback.

First, consider a service backed by multiple servers, as in most large
deployments. It is impossible to atomically deploy something across all
servers concurrently, but you also need to rotate keys, so your deployment
model needs to take all this into account. You'll need to satisfy the following:

- New ECHConfigs, when they're generated, do not end up in DNS until all
[or almost all] of your servers support them
- DNS caches should be assumed stale, so servers also need to support the
last few generations of keys
- The recovery flow makes a new connection, so it may hit a new server.
Thus retry configs should also be ones that are expected to be supported by
all servers
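A minimal C sketch of the last two constraints, assuming (hypothetically) that each server reports the contiguous range of config generations it currently holds. The only generation that is safe to put in DNS or send as a retry config is the newest one that every server holds. The `server_window` struct and `newest_common_generation` function are illustrative names, not part of any TLS library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: each server holds a contiguous range of config
 * generations, from `oldest` to `newest` inclusive. */
struct server_window {
  uint32_t newest;
  uint32_t oldest;
};

/* Returns the newest generation held by every server, or -1 if the windows
 * share no generation (or there are no servers). Only that generation is safe
 * to publish in DNS or send as a retry config, since the retry connection may
 * land on any server. */
int64_t newest_common_generation(const struct server_window *servers,
                                 size_t n) {
  if (n == 0) return -1;
  uint32_t min_newest = servers[0].newest;
  uint32_t max_oldest = servers[0].oldest;
  for (size_t i = 1; i < n; i++) {
    if (servers[i].newest < min_newest) min_newest = servers[i].newest;
    if (servers[i].oldest > max_oldest) max_oldest = servers[i].oldest;
  }
  if (min_newest < max_oldest) return -1; /* ranges do not overlap */
  return (int64_t)min_newest;
}
```

For example, if one server has not yet received generation 8 while its peers have, generation 7 is the newest one safe to advertise.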

Put all this together and you get something of this rough shape:

- Servers keep a window of N configs. The most recent one or so is in the
process of being rolled out and may not be available on all its peers yet.
Then there's one config that the server believes is rolled out to all its
peers. *That* one should be the retry config. Then it also retains N-2 or
so configs past that to deal with stale caches.

- Periodically, some provisioning process generates a new ECHConfig,
prepends it to this list, retires the oldest one, and deploys the new
window to all servers. The new config is *not* put in DNS yet because not
all servers have it.

- Once the new config is sufficiently rolled out, put the new config in
DNS, replacing the previous config. Possibly also do a round of rollout to
all the servers to tell them they can bump the retry config forward by one
generation, though the generation before that should also work fine
provided N is large enough.
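The window above can be sketched as a small fixed-size array, assuming a hypothetical window size of N = 4 and integer generation numbers standing in for real ECHConfigs. Index 0 is the config still rolling out, `retry_index` points at the one believed rolled out everywhere, and the remaining entries cover stale DNS caches. All names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define WINDOW_N 4 /* hypothetical window size N */

/* Hypothetical per-server state: gen[0] is the newest config generation
 * (possibly still rolling out), gen[WINDOW_N - 1] the oldest retained. */
struct ech_window {
  uint32_t gen[WINDOW_N];
  size_t retry_index; /* config believed to be rolled out to all peers */
};

/* Provisioning step: prepend a new generation and retire the oldest.
 * The retry config stays on the same generation, so its index shifts by one
 * (if it was already the oldest, the index now points at its successor). */
void window_rotate(struct ech_window *w, uint32_t new_gen) {
  for (size_t i = WINDOW_N - 1; i > 0; i--) w->gen[i] = w->gen[i - 1];
  w->gen[0] = new_gen;
  if (w->retry_index + 1 < WINDOW_N) w->retry_index++;
}

/* Optional follow-up rollout once the newer config is in DNS: bump the
 * retry config forward by one generation. */
void window_bump_retry(struct ech_window *w) {
  if (w->retry_index > 0) w->retry_index--;
}

uint32_t window_retry_generation(const struct ech_window *w) {
  return w->gen[w->retry_index];
}
```

For example, starting from generations {10, 9, 8, 7} with the retry config at 9, rotating in 11 gives {11, 10, 9, 8} with the retry config still at 9; the optional bump then moves it to 10.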

Second, there may be more than one retry config, hence why it's an
ECHConfigList. ECH's extensibility model is that the server presents
multiple ECHConfigs and the client picks one. Suppose an ECH server
supports both P-256 and X25519 KEMs. It would then provision pairs of
(P-256, X25519) configs every time it rotates. So while I said N configs
above, it's really N pairs (or more depending on what your generation size
is). The thing that goes in DNS and retry configs is one full generation of
configs. That means your API should be able to express that.
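To illustrate that one generation can hold several configs, here is a hedged C sketch assuming a hypothetical table of provisioned configs, each tagged with a generation number and an HPKE KEM ID (0x0010 is DHKEM(P-256, HKDF-SHA256) and 0x0020 is DHKEM(X25519, HKDF-SHA256) in RFC 9180). The struct and function names are illustrative only. Selecting a generation returns the whole (P-256, X25519) pair, which is what belongs in DNS and in the retry ECHConfigList.

```c
#include <stddef.h>
#include <stdint.h>

/* HPKE KEM identifiers from RFC 9180. */
#define KEM_P256_HKDF_SHA256 0x0010
#define KEM_X25519_HKDF_SHA256 0x0020

/* Hypothetical record for one provisioned ECHConfig. */
struct ech_config_entry {
  uint32_t generation;
  uint16_t kem_id;
};

/* Stores pointers to every config in `generation` into `out` (capacity
 * `out_cap`) and returns how many were stored. A rotation that provisions a
 * (P-256, X25519) pair yields two entries per generation, and both belong in
 * DNS and in the retry list. */
size_t select_generation(const struct ech_config_entry *all, size_t n,
                         uint32_t generation,
                         const struct ech_config_entry **out,
                         size_t out_cap) {
  size_t found = 0;
  for (size_t i = 0; i < n; i++) {
    if (all[i].generation == generation && found < out_cap) {
      out[found++] = &all[i];
    }
  }
  return found;
}
```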

Put another way: the retry config should be the answer to "what would I, an
individual server instance, want to put in DNS, as of the information I
have right now?" Having a simple boolean associated with each config
suffices to let the caller control this, which is what we've done. (Though
looking back, I see we didn't document the details here as well as I
thought we had. I'll see about fixing that...)
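The per-config boolean can be sketched as follows. This is a minimal C illustration, not BoringSSL's actual code (there, the flag is the `is_retry_config` argument to `SSL_ECH_KEYS_add`); the `ech_key_entry` struct and `marshal_retry_configs` function are hypothetical. It assumes each entry already holds a serialized ECHConfig and builds the retry ECHConfigList by prefixing the concatenation of the flagged configs with its 2-byte big-endian length.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical server-side entry: an already-encoded ECHConfig plus the
 * per-config retry flag. */
struct ech_key_entry {
  const uint8_t *ech_config; /* serialized ECHConfig (version, length, body) */
  size_t ech_config_len;
  int is_retry_config;
};

/* Writes an ECHConfigList (2-byte big-endian length, then the concatenated
 * ECHConfigs) containing only the entries flagged as retry configs.
 * Returns the number of bytes written, or 0 if nothing is flagged or the
 * result does not fit. */
size_t marshal_retry_configs(const struct ech_key_entry *keys, size_t n,
                             uint8_t *out, size_t out_cap) {
  size_t body = 0;
  for (size_t i = 0; i < n; i++) {
    if (keys[i].is_retry_config) body += keys[i].ech_config_len;
  }
  if (body == 0 || body > 0xffff || body + 2 > out_cap) return 0;
  out[0] = (uint8_t)(body >> 8);
  out[1] = (uint8_t)(body & 0xff);
  size_t off = 2;
  for (size_t i = 0; i < n; i++) {
    if (!keys[i].is_retry_config) continue;
    memcpy(out + off, keys[i].ech_config, keys[i].ech_config_len);
    off += keys[i].ech_config_len;
  }
  return off;
}
```

Because the caller sets the flags, marking a whole generation (for example both configs of a P-256/X25519 pair) as retry configs needs no callback.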

On Sun, Apr 16, 2023 at 9:08 PM Stephen Farrell <stephen.farrell at cs.tcd.ie> wrote:

> Hiya,
> I've been adding code for testing badly encoded ECH stuff
> to my branch, esp. for EncodedClientHelloInner which is the
> new thing that could cause server bugs. That's in [1] and
> seems like a reasonable start to doing that well. And that
> approach (for testing) also seems to work ok for badly
> constructed values for the ECH acceptance signal in SH.random
> or within HRRs.
> One problem I've not solved (within the test harness) is
> how to do similarly for the retry-config values returned
> by a server when the wrong ECH public value is used by a
> client (or if a client GREASEs). Right now, a server (that
> has some ECH private values loaded) will return the ECHConfig
> corresponding to the most recently loaded ECH private value,
> which I think is reasonable.
> However, for testing, it might be useful to enable a server
> to trigger a callback, so that it could return a borked
> retry-config value, to check that doesn't result in badness
> for a client.
> My question is: would it be useful for real servers to be
> able to choose the retry-config value to return via a new
> callback? I guess that might be useful for servers that
> use multiple CDNs, but I'm not at all sure, since I don't
> get near such servers... hence asking:-)
> Secondary question: if useful, then what params might such
> a callback need?
> Opinions welcome!
> Thanks,
> S.
> [1]
> https://github.com/sftcd/openssl/blob/ECH-draft-13c/test/echcorrupttest.c#L41
