SSL_ERROR_WANT_TIME: Pause SSL_connect to fetch intermediate certificates

Wed Aug 19 09:29:19 UTC 2020

On 18/08/2020 22:31, Alex Rousskov wrote:
> As you know, OpenSSL provides the certificate verification callback that
> can discover that the origin certificate chain is incomplete. An
> application using threads or blocking I/O can probably "pause" its
> verification callback execution, fetch the intermediate certificates,
> and then complete validation before happily returning to the
> SSL_connect() caller. Life is easy when you can use threads or block
> thousands of concurrent transactions!

I suspect this is the way most people do it.

> What can we do? How can we pause the SSL_connect() progress after the
> origin certificate is fetched but before it is validated?

We should really have a proper callback for this purpose. PRs welcome!
(Doesn't help you right now though).

> I am aware of the ASYNC_pause_job() and related async APIs in OpenSSL.
> If I interpret related documentation, discussions, and our test results
> correctly, that API is not meant as the correct answer for our problem.
> Today, abusing that API will probably work. Tomorrow,
> internal/unpredictable OpenSSL changes might break our Squid
> enhancements beyond repair as detailed below.
> 
> Somewhat counter-intuitively, the OpenSSL async API is meant for
> activities that can work correctly _without_ becoming asynchronous (i.e.
> without being paused to temporary give way to other activities). Squid
> cannot fetch the missing intermediate certificates without pausing TLS
> negotiations with the server...
> 
> The async API was added to support custom OpenSSL engines, not
> application callbacks. The API does not guarantee that an
> ASYNC_pause_job() will actually pause processing and return to the
> SSL_connect() caller! That will only happen if OpenSSL internal code
> does not call ASYNC_block_pause(), effectively converting all subsequent
> ASYNC_pause_job() calls into a no-op. That pause-nullification was added
> to work around deadlocks, but it effectively places the API off limits
> to user-level code that cannot control the timing of those
> ASYNC_block_pause() calls.

The async API is meant for any scenario where user code may want to
perform async processing. Its design is NOT restricted to engines -
although that is certainly where it is normally used. Nothing in it
assumes that it will be used exclusively by engines.

ASYNC_block_pause() is intended as a user level API, and a quick search
of the codebase reveals that the only place we use it internally is in
our tests - it does not appear in the library code. The intention is
that you should be able to rely on being inside a job in any callbacks,
if you've started the connection inside one.

"Somewhat counter-intuitively, the OpenSSL async API is meant for
activities that can work correctly _without_ becoming asynchronous (i.e.
without being paused to temporary give way to other activities)"

I have no idea what you mean by this. The whole point of
ASYNC_pause_job() is to temporarily give way to other activities.

One issue you might encounter with the ASYNC APIs is that they are not
available on some less-common platforms: basically anything without
setcontext/swapcontext support (IIRC Android may fall into this
category).
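
To make the pause/resume idea concrete without pulling in OpenSSL, here
is a rough sketch of the context-switching mechanism itself, written
directly against POSIX ucontext. It is not OpenSSL code: pause_job() and
start_or_resume() are hypothetical stand-ins for ASYNC_pause_job() and
ASYNC_start_job(), and verify_entry() stands in for a verify callback
that discovers a missing intermediate certificate. It assumes a platform
that provides <ucontext.h>.

```c
#include <stddef.h>
#include <string.h>
#include <ucontext.h>

enum { JOB_ERROR = -1, JOB_PAUSED = 0, JOB_FINISHED = 1 };

/* Hypothetical job object; all names here are illustrative only. */
struct job {
    ucontext_t caller;       /* where to go back to on pause or finish */
    ucontext_t fibre;        /* the paused job itself */
    char stack[64 * 1024];   /* private stack for the job */
    int started;
    int finished;
    int chain_complete;      /* set by the event loop after the fetch */
    int verify_result;
};

static struct job *current;  /* job running on this thread, if any */

/* Stand-in for ASYNC_pause_job(): hand control back to the caller. */
static void pause_job(void)
{
    struct job *j = current;
    swapcontext(&j->fibre, &j->caller);
}

/* Stand-in for a verify callback that needs an intermediate cert. */
static void verify_entry(void)
{
    struct job *j = current;

    while (!j->chain_complete)
        pause_job();         /* the event loop fetches the cert meanwhile */
    j->verify_result = 1;    /* "verify for real" would happen here */
    j->finished = 1;
    /* returning resumes uc_link, i.e. the caller context */
}

/* Stand-in for ASYNC_start_job(): start the job or resume it. */
int start_or_resume(struct job *j)
{
    if (j->finished)
        return JOB_FINISHED;
    if (!j->started) {
        if (getcontext(&j->fibre) != 0)
            return JOB_ERROR;
        j->fibre.uc_stack.ss_sp = j->stack;
        j->fibre.uc_stack.ss_size = sizeof(j->stack);
        j->fibre.uc_link = &j->caller;
        makecontext(&j->fibre, verify_entry, 0);
        j->started = 1;
    }
    current = j;
    if (swapcontext(&j->caller, &j->fibre) != 0)
        return JOB_ERROR;
    current = NULL;
    return j->finished ? JOB_FINISHED : JOB_PAUSED;
}

/* Toy event loop: the "fetch" completes after the second pause.
 * Returns the number of pauses seen, or -1 on failure. */
int run_demo(void)
{
    static struct job j;     /* static keeps the 64 KiB stack off ours */
    int pauses = 0;

    memset(&j, 0, sizeof(j));
    while (start_or_resume(&j) == JOB_PAUSED) {
        if (++pauses == 2)
            j.chain_complete = 1;
    }
    return j.verify_result == 1 ? pauses : -1;
}
```

The point of the sketch is only that the paused code keeps its own stack
and picks up exactly where it left off, which is why platforms without
this kind of context switching cannot offer the ASYNC APIs.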

> Can you think of another trick?

One possibility that springs to mind (which is also an ugly hack) is to
defer the validation of the certificates. So, you have a verify callback
that always says "ok". But any further reads on the underlying BIO
always return with "retry" until such time as any intermediate
certificates have been fetched and the chain has been verified "for
real". The main problem I can see with this approach is that there is
no easy way to send the right alert back to the server in the event of
failure.
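
A rough model of that idea, in plain C rather than real OpenSSL code.
Everything named here (gated_transport, gated_read(), the enums) is
hypothetical. In a real implementation the gate would presumably live in
a custom BIO whose read function calls BIO_set_retry_read() and returns
-1 while the fetch is outstanding.

```c
#include <stddef.h>
#include <string.h>

/* State of the deferred "for real" chain verification. */
enum chain_state { CHAIN_PENDING, CHAIN_OK, CHAIN_BAD };

/* Hypothetical return codes standing in for "retry" and hard failure. */
enum { READ_RETRY = -1, READ_FAIL = -2 };

/* Hypothetical transport that holds back bytes from the TLS stack. */
struct gated_transport {
    enum chain_state chain;  /* updated by the application's fetch code */
    const char *wire;        /* bytes already received from the server */
    size_t wire_len;
    size_t off;
};

/* The verify callback accepts everything; the real check is deferred. */
int verify_cb_always_ok(void)
{
    return 1;
}

/* Read gate: report "retry" until the chain has been verified. */
int gated_read(struct gated_transport *t, char *buf, size_t len)
{
    size_t n;

    if (t->chain == CHAIN_PENDING)
        return READ_RETRY;   /* handshake stalls, caller retries later */
    if (t->chain == CHAIN_BAD)
        return READ_FAIL;    /* no clean way to send the right alert */

    n = t->wire_len - t->off;
    if (n > len)
        n = len;
    memcpy(buf, t->wire + t->off, n);
    t->off += n;
    return (int)n;
}

/* Toy run: the first read stalls, then the fetch completes and the
 * second read delivers the data. Returns the bytes read, or -1. */
int gate_demo(void)
{
    struct gated_transport t = { CHAIN_PENDING, "hello", 5, 0 };
    char buf[8];

    if (!verify_cb_always_ok())
        return -1;
    if (gated_read(&t, buf, sizeof(buf)) != READ_RETRY)
        return -1;
    t.chain = CHAIN_OK;      /* intermediates fetched, chain verified */
    return gated_read(&t, buf, sizeof(buf));
}
```

The CHAIN_BAD branch is where the alert problem shows up: by the time
the real verification fails, all the transport can do is fail the read.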

> P.S. Squid does not support BoringSSL, but BoringSSL's
> SSL_ERROR_WANT_CERTIFICATE_VERIFY result of the certificate validation
> callback seemingly addresses our use case. I do not know whether OpenSSL
> decision makers would be open to adding something along those lines and
> decided to ask for existing solutions here before proposing adding
> SSL_ERROR_WANT_TIME :-).

I'd definitely be open to adding it - although it wouldn't be backported
to a stable branch.

Matt