Thread sanitiser problems

Tue Jul 30 02:42:33 UTC 2019

Bringing the discussions over to the project list.

The problem was first mentioned in #9454.  Follow-up issues with infinite recursion and use after free are also mentioned in #9455.  These are addressed by #9477, which clears the flush flag before flushing and removes the dependence on the RAND call.  Reproduction seems to require gcc-8; gcc-7 doesn’t have the appropriate thread sanitisation.

Overly simplified, the problem boils down to the CTR DRBG needing an AES CTR cipher context to work.  When creating the former, a recursive call is made to get the latter.  A deadlock results due to a cycle in the locking.  The RAND/DRBG code will not be the only place where this occurs.  KDF, MAC and some public key operations will suffer a similar issue.
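
To make the shape of the problem concrete, here is a minimal sketch of one way such a cycle can arise.  It is not the actual OpenSSL code path: store_lock, store_lock_init() and demo_fetch() are hypothetical names, and a single pthread mutex stands in for whatever locks are really involved.  The mutex is created as PTHREAD_MUTEX_ERRORCHECK so that the recursive acquisition reports EDEADLK instead of hanging.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical lock protecting an algorithm store. */
static pthread_mutex_t store_lock;

/* Create the lock as error-checking so a relock by the owner fails loudly. */
static int store_lock_init(void)
{
    pthread_mutexattr_t attr;
    int ok;

    if (pthread_mutexattr_init(&attr) != 0)
        return 0;
    ok = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK) == 0
         && pthread_mutex_init(&store_lock, &attr) == 0;
    pthread_mutexattr_destroy(&attr);
    return ok;
}

/*
 * Hypothetical fetch: takes the store lock, and for the DRBG recursively
 * fetches the cipher it depends on while still holding that lock.
 * Returns 0 on success or the error from pthread_mutex_lock().
 */
static int demo_fetch(const char *name)
{
    int ret = 0;
    int rc = pthread_mutex_lock(&store_lock);

    if (rc != 0)
        return rc;              /* EDEADLK on the recursive call */
    if (strcmp(name, "CTR-DRBG") == 0)
        ret = demo_fetch("AES-256-CTR");
    rc = pthread_mutex_unlock(&store_lock);
    assert(rc == 0);
    (void)rc;
    return ret;
}
```

After store_lock_init(), fetching "AES-256-CTR" on its own succeeds, while fetching "CTR-DRBG" fails on the nested acquisition.  With a default (non error-checking) mutex the same call would typically just block forever.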

This is almost certainly going to hurt moving forward.

On to possible mitigations:

1. Make our locks recursive.  Add a reference counter and a thread ID to the lock structure.  There would be no need to make either use atomic operations.  Getting a read lock after a write lock would be considered an extra write lock; handing out a write lock while holding a read lock could be problematic.  This seems very messy.

2. Return dependent algorithms as part of the registration process.  The particular algorithm could be preloaded somehow.  I’m not sure how ugly this will become but it will need names (nids) for each possible DRBG type.
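
For option 1, a rough sketch of what a recursive lock with a thread ID and a counter might look like follows.  This is only an illustration under simplifying assumptions: it covers exclusive (write) locking only and ignores the read/write interaction mentioned above, and the demo_rlock type and its functions are hypothetical names, not existing OpenSSL API.  The owner and depth fields are only ever touched while holding an internal guard mutex, so neither needs to be atomic.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical recursive lock: an owner thread ID plus a depth counter. */
typedef struct {
    pthread_mutex_t guard;    /* protects owner and depth */
    pthread_cond_t released;  /* signalled when depth drops to zero */
    pthread_t owner;          /* meaningful only while depth > 0 */
    unsigned int depth;       /* how many times the owner has locked */
} demo_rlock;

static int demo_rlock_init(demo_rlock *l)
{
    l->depth = 0;
    if (pthread_mutex_init(&l->guard, NULL) != 0)
        return 0;
    if (pthread_cond_init(&l->released, NULL) != 0) {
        pthread_mutex_destroy(&l->guard);
        return 0;
    }
    return 1;
}

static void demo_rlock_lock(demo_rlock *l)
{
    pthread_t self = pthread_self();

    pthread_mutex_lock(&l->guard);
    if (l->depth > 0 && pthread_equal(l->owner, self)) {
        /* Re-entry by the owning thread: just count it. */
        l->depth++;
    } else {
        /* Wait until no thread owns the lock, then claim it. */
        while (l->depth > 0)
            pthread_cond_wait(&l->released, &l->guard);
        l->owner = self;
        l->depth = 1;
    }
    pthread_mutex_unlock(&l->guard);
}

static void demo_rlock_unlock(demo_rlock *l)
{
    pthread_mutex_lock(&l->guard);
    /* Only the owning thread may unlock. */
    assert(l->depth > 0 && pthread_equal(l->owner, pthread_self()));
    if (--l->depth == 0)
        pthread_cond_signal(&l->released);
    pthread_mutex_unlock(&l->guard);
}
```

With this, a thread that already holds the lock can call demo_rlock_lock() again from a nested fetch without blocking, provided every lock is paired with an unlock.  Extending the idea to read/write locks is where the messiness mentioned in option 1 would come in.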

Thoughts anyone?
Any better solutions?
Any other solutions?

Pauli
-- 
Dr Paul Dale | Cryptographer | Network Security & Encryption 
Phone +61 7 3031 7217
Oracle Australia