Thread sanitiser problems

Thu Aug 1 04:27:19 UTC 2019

The locks aren’t being held for long.

I have noticed that the lock in the OPENSSL_CTX only seems to be locked for reading and only in one place.  crypto/context.c in openssl_ctx_get_data().  Writes seem to go via CRYPTO_alloc_ex_data() which uses a global write lock (on a different lock).  Would it be feasible to drop the OPENSSL_CTX lock completely?  This would address the issue.  Matt?  I think you know this code the best.

CRYPTO_alloc_ex_data() is calling the new_func pointer under the read lock and the new function can in turn cause a search of the providers for implementations which grabs the provider lock.  Conversely, searching the providers for implementations grabs the provider lock and can lead to openssl_ctx_get_data() grabbing the context lock.

We don’t appear to have recursion at the moment, but the setup seems a little fragile.  We do have two different ordering for grabbing locks which is also bad.

Pauli
-- 
Dr Paul Dale | Cryptographer | Network Security & Encryption 
Phone +61 7 3031 7217
Oracle Australia

> On 31 Jul 2019, at 2:10 pm, Viktor Dukhovni <openssl-users at dukhovni.org> wrote:
> 
>> On Jul 30, 2019, at 10:02 PM, Dr Paul Dale <paul.dale at oracle.com> wrote:
>> 
>> The #9454 description includes thread sanitisizer logs showing different lock orderings — this has the potential to dead lock.  Agreed with Rich that giving up the lock would make sense, but I don’t see a way for this to be easily done.
> 
> My take is that we should never hold any lock long enough to even consider
> acquiring another lock.  No more than one lock should be held at any one
> time, and only long enough to bump a reference count to ensure that the
> object of interest is no deallocated before we (or our caller in a "get1"
> type interface) is done with it.
> 
> I don't know what "long-term" locks we're holding, but it would be great
> if it were possible to never (or never recursively) hold any such locks.
> 
> -- 
> 	Viktor.
>