[openssl/openssl] f5b5a3: Ensure proper memory barriers around ossl_rcu_dere...
Neil Horman
noreply at github.com
Wed Apr 10 07:20:57 UTC 2024
Branch: refs/heads/master
Home: https://github.com/openssl/openssl
Commit: f5b5a35c84626823364b0c8535b968c106690a56
https://github.com/openssl/openssl/commit/f5b5a35c84626823364b0c8535b968c106690a56
Author: Neil Horman <nhorman at openssl.org>
Date: 2024-04-10 (Wed, 10 Apr 2024)
Changed paths:
M crypto/threads_pthread.c
Log Message:
-----------
Ensure proper memory barriers around ossl_rcu_deref/ossl_rcu_assign_ptr
Since the addition of macos14 M1 runners in our CI jobs we've been
seeing periodic random failures in the test_threads CI job.
Specifically we've seen instances in which the shared pointer in the
test (which points to a monotonically incrementing uint64_t went
backwards.
>From taking a look at the disassembled code in the failing case, we see
that __atomic_load_n when emitted in clang 15 looks like this
0000000100120488 <_ossl_rcu_uptr_deref>:
100120488: f8bfc000 ldapr x0, [x0]
10012048c: d65f03c0 ret
Notably, when compiling with gcc on the same system we get this output
instead:
0000000100120488 <_ossl_rcu_uptr_deref>:
100120488: f8bfc000 ldar x0, [x0]
10012048c: d65f03c0 ret
Checking the arm docs for the difference between ldar and ldapr:
https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/LDAPR--Load-Acquire-RCpc-Register-
https://developer.arm.com/documentation/dui0802/b/A64-Data-Transfer-Instructions/LDAR
It seems that the ldar instruction provides a global cpu fence, not
completing until all writes in a given cpus writeback queue have
completed
Conversely, the ldapr instruction attmpts to achieve performance
improvements by honoring the Local Ordering register available in the
system coprocessor, only flushing writes in the same address region as
other cpus on the system.
I believe that on M1 virtualized cpus the ldapr is not properly ordering
writes, leading to an out of order read, despite the needed fencing.
I've opened an issue with apple on this here:
https://developer.apple.com/forums/thread/749530
I believe that it is not safe to issue an ldapr instruction unless the
programmer knows that the Local order registers are properly configured
for use on the system.
So to fix it I'm proposing with this patch that we, in the event that:
1) __APPLE__ is defined
AND
2) __clang__ is defined
AND
3) __aarch64__ is defined
during the build, that we override the ATOMIC_LOAD_N macro in the rcu
code such that it uses a custom function with inline assembly to emit
the ldar instruction rather than the ldapr instruction. The above
conditions should get us to where this is only used on more recent MAC
cpus, and only in the case where the affected clang compiler emits the
offending instruction.
I've run this patch 10 times in our CI and failed to reproduce the
issue, whereas previously I could trigger it within 5 runs routinely.
Reviewed-by: Tom Cosgrove <tom.cosgrove at arm.com>
Reviewed-by: Paul Dale <ppzgs1 at gmail.com>
Reviewed-by: Tomas Mraz <tomas at openssl.org>
(Merged from https://github.com/openssl/openssl/pull/23974)
To unsubscribe from these emails, change your notification settings at https://github.com/openssl/openssl/settings/notifications
More information about the openssl-commits
mailing list