Thread sanitiser problems
Dr Paul Dale
paul.dale at oracle.com
Thu Aug 1 04:27:19 UTC 2019
The locks aren’t being held for long.
I have noticed that the lock in the OPENSSL_CTX only seems to be locked for reading and only in one place. crypto/context.c in openssl_ctx_get_data(). Writes seem to go via CRYPTO_alloc_ex_data() which uses a global write lock (on a different lock). Would it be feasible to drop the OPENSSL_CTX lock completely? This would address the issue. Matt? I think you know this code the best.
CRYPTO_alloc_ex_data() is calling the new_func pointer under the read lock and the new function can in turn cause a search of the providers for implementations which grabs the provider lock. Conversely, searching the providers for implementations grabs the provider lock and can lead to openssl_ctx_get_data() grabbing the context lock.
We don’t appear to have recursion at the moment, but the setup seems a little fragile. We do have two different ordering for grabbing locks which is also bad.
Dr Paul Dale | Cryptographer | Network Security & Encryption
Phone +61 7 3031 7217
> On 31 Jul 2019, at 2:10 pm, Viktor Dukhovni <openssl-users at dukhovni.org> wrote:
>> On Jul 30, 2019, at 10:02 PM, Dr Paul Dale <paul.dale at oracle.com> wrote:
>> The #9454 description includes thread sanitisizer logs showing different lock orderings — this has the potential to dead lock. Agreed with Rich that giving up the lock would make sense, but I don’t see a way for this to be easily done.
> My take is that we should never hold any lock long enough to even consider
> acquiring another lock. No more than one lock should be held at any one
> time, and only long enough to bump a reference count to ensure that the
> object of interest is no deallocated before we (or our caller in a "get1"
> type interface) is done with it.
> I don't know what "long-term" locks we're holding, but it would be great
> if it were possible to never (or never recursively) hold any such locks.
More information about the openssl-project