SSL_ERROR_WANT_TIME: Pause SSL_connect to fetch intermediate certificates

Thu Aug 20 14:37:48 UTC 2020

On 19/08/2020 20:35, Alex Rousskov wrote:
> Does this clarify what I meant? Do you agree that OpenSSL async API is
> not suitable for callbacks that _require_ ASYNC_pause_job() to return
> control to the application?

Yes, it clarifies what you meant. And, yes, it's true that strictly
speaking that *could* happen. ASYNC_block_pause() was introduced to
handle the problem where we are holding a lock and therefore must not
return control to the user without releasing that lock. As a general
rule we want to keep the sections of code that perform work under a lock
to an absolute minimum. It would not seem like a great idea to me to
call user callbacks from libssl while holding such a lock. We have no
idea what those callbacks are going to do, and which APIs they will
call. The chances of a deadlock occurring seem very high under those
circumstances, unless restrictions are placed on what the callback can
do, and those restrictions are very clearly documented.

So, yes, you are right. But in practice I'm not sure how much I'd really
worry about this theoretical restriction. That's of course for you to
decide.
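The general rule above (keep work under a lock to a minimum, and do not call unknown user code while holding one) can be illustrated outside OpenSSL with plain pthreads. This is a hypothetical sketch, not how libssl is structured: `struct registry`, `registry_get()`, `registry_set()` and `record_value()` are invented names. It only shows why a callback invoked under a non-recursive lock cannot safely call back into the same API, and how releasing the lock first avoids that.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state protected by a non-recursive mutex. */
struct registry {
    pthread_mutex_t lock;
    int value;
    void (*on_change)(struct registry *reg, int value); /* user callback */
};

/* Public accessor: it takes the same lock, so a callback that calls it
 * would deadlock if it were invoked while the lock was still held. */
int registry_get(struct registry *reg)
{
    pthread_mutex_lock(&reg->lock);
    int v = reg->value;
    pthread_mutex_unlock(&reg->lock);
    return v;
}

/* Keep the critical section minimal and call the user callback only
 * after the lock has been released. */
void registry_set(struct registry *reg, int value)
{
    void (*cb)(struct registry *, int);

    assert(reg != NULL);
    pthread_mutex_lock(&reg->lock);
    reg->value = value;
    cb = reg->on_change;          /* snapshot what is needed after unlock */
    pthread_mutex_unlock(&reg->lock);

    if (cb != NULL)
        cb(reg, value);           /* no lock held: callback may re-enter */
}

/* Example callback that re-enters the API via registry_get(). */
static int last_seen = -1;
static void record_value(struct registry *reg, int value)
{
    (void)value;
    last_seen = registry_get(reg);
}
```

If `registry_set()` invoked `cb` before unlocking, `record_value()` would block forever on the mutex its own caller still holds.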

> If you think that fears about something inside OpenSSL/engines
> preventing our callback from returning control to the application are
> unfounded, then using async API may be the best long-term solution for
> Squid. Short-term, it does not work "as is" because OpenSSL STACKSIZE
> appears to be too small (leading to weird crashes that disappear if we
> increase STACKSIZE from 32768 to 524288 bytes), but perhaps we can
> somehow hack around that.

Hmm. Yes, this is a problem with the current implementation. The
selection of STACKSIZE is somewhat arbitrary. It would be nice if the
stack size grew as required, but I'm not sure if that's even technically
possible. A workaround might be for us to expose some API to set it,
but exposing such internal details is also quite horrible.
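Both limitations discussed so far can be reproduced in a small POSIX `ucontext` model: a pause request that is silently ignored while pauses are blocked, and a job stack whose size is fixed when the job is created. OpenSSL's POSIX async backend is also built on the ucontext functions (makecontext() and friends) with a compile-time STACKSIZE, but the `mini_*` functions below are hypothetical and are not OpenSSL's API. Unlike ASYNC_pause_job(), `mini_pause()` returns 0 when it did not actually pause, which makes that case visible. The `stack_size` argument stands in for the kind of setter mentioned above and does not exist in OpenSSL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <ucontext.h>

/* Hypothetical miniature of an async job; NOT OpenSSL's ASYNC_* API. */
struct mini_job {
    ucontext_t caller;            /* where mini_pause()/job exit returns to */
    ucontext_t fibre;             /* the job's own execution context */
    void *stack;                  /* job stack, sized once at creation */
    int blocked;                  /* > 0 while pauses are blocked */
    int finished;
    void (*fn)(struct mini_job *job);
};

static struct mini_job *current_job;   /* single-threaded sketch */

static void mini_entry(void)
{
    struct mini_job *job = current_job;
    job->fn(job);
    job->finished = 1;            /* returning resumes uc_link (the caller) */
}

/* Creates a job with a caller-chosen stack size (compare OpenSSL's
 * compile-time STACKSIZE). Returns 1 on success, 0 on failure. */
int mini_job_init(struct mini_job *job, void (*fn)(struct mini_job *),
                  size_t stack_size)
{
    assert(job != NULL && fn != NULL);
    job->stack = malloc(stack_size);
    if (job->stack == NULL)
        return 0;
    if (getcontext(&job->fibre) != 0) {
        free(job->stack);
        job->stack = NULL;
        return 0;
    }
    job->fibre.uc_stack.ss_sp = job->stack;
    job->fibre.uc_stack.ss_size = stack_size;
    job->fibre.uc_link = &job->caller;
    job->blocked = 0;
    job->finished = 0;
    job->fn = fn;
    makecontext(&job->fibre, mini_entry, 0);
    return 1;
}

/* Runs or resumes the job; returns 1 once it has finished, 0 if it paused. */
int mini_job_run(struct mini_job *job)
{
    struct mini_job *prev = current_job;
    if (job->finished)
        return 1;
    current_job = job;
    swapcontext(&job->caller, &job->fibre);
    current_job = prev;
    return job->finished;
}

/* Returns control to mini_job_run()'s caller. Outside a job, or while
 * pauses are blocked, it returns 0 immediately without pausing. */
int mini_pause(void)
{
    struct mini_job *job = current_job;
    if (job == NULL || job->blocked > 0)
        return 0;
    swapcontext(&job->fibre, &job->caller);
    return 1;                     /* paused, then resumed */
}

void mini_block_pause(void)   { if (current_job) current_job->blocked++; }
void mini_unblock_pause(void) { if (current_job && current_job->blocked > 0) current_job->blocked--; }
void mini_job_free(struct mini_job *job) { free(job->stack); job->stack = NULL; }

/* Example "callback" that wants to pause twice, the second time while
 * pauses are blocked (as if a lock were held). */
static int pauses_taken;
static void example_job(struct mini_job *job)
{
    (void)job;
    if (mini_pause())
        pauses_taken++;           /* caller regained control here */
    mini_block_pause();
    if (mini_pause())             /* silently does not pause */
        pauses_taken++;
    mini_unblock_pause();
}
```

The second "pause" never returns control to the application, which is exactly the case where a callback that _requires_ a pause (for example, to fetch intermediate certificates) would be stuck. Anything the job calls also has to fit in the fixed `stack_size` bytes.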

> 
> 
>> One possibility that springs to mind (which is also an ugly hack) is to
>> defer the validation of the certificates. So, you have a verify callback
>> that always says "ok". But any further reads on the underlying BIO
>> always return with "retry" until such time as any intermediate
>> certificates have been fetched and the chain has been verified "for
>> real". The main problem I can see with this approach is there is no easy
>> way to send the right alert back to the server in the event of failure.
> 
> We were also concerned that X509_verify_cert() is not enough to fully
> mimic the existing OpenSSL certificate validation procedure because the
> internal OpenSSL ssl_verify_cert_chain() does not just call
> X509_verify_cert(). It also does some DANE-related manipulations, for
> example. Are those fears unfounded? In other words, is calling
> X509_verify_cert() directly always enough to make the right certificate
> validation decision?
> 

Does Squid use the DANE APIs? If not, I'm not sure it makes much
difference. In any case, the "manipulation" seems limited to setting DANE
information in the X509_STORE_CTX, which presumably could be replicated by:

X509_STORE_CTX_set0_dane(ctx, SSL_get0_dane(ssl));

However, I'm not really the person to ask about the DANE implementation.
Maybe Viktor Dukhovni will chip in with his thoughts.

Matt