[openssl-dev] [openssl.org #4116] [PATCH] Reimplement non-asm OPENSSL_cleanse()

Sat Oct 31 21:50:31 UTC 2015

On Sat, Oct 31, 2015 at 07:59:03PM +0000, Brian Smith via RT wrote:
> Alessandro Ghedini via RT <rt at openssl.org> wrote:
> 
> > I was also wondering whether it would make sense to just drop the asm
> > implementations. Does the speed-up justify the added complexity?
> >
> 
> IMO, it should work like this:
> * memset_s when memset_s is available.

Given the current OpenSSL build system, there's no way to detect if memset_s is
available at build time (unless it's defined as a macro).

In any case memset_s is not available anywhere anyway, so that doesn't really
matter.

> * Otherwise, SecureZeroMemory, when SecureZeroMemory is available.
> * Otherwise, if a flag OPENSSL_REQUIRE_SECURE_ZERO is set, fail.

In 99% of the cases (e.g. Linux with glibc or any *BSD) that would fail, so I
don't see the point in that.

> * Otherwise, use an assembly language implementation, if available.
> * Otherwise, emit a warning and use the C implementation.
> 
> Note in particular that the C compiler is allowed to completely defeat the
> purpose of the function unless SecureZeroMemory or memset_s is used, even
> if you use "volatile" or other tricks.

I don't think that is true (regarding the volatile pointer). But assuming that
a broken compiler decided to optimize that call away, what's stopping it to
optimize the call to the asm implementation as well? Also, such broken compiler
probably wouldn't know about C11 either, so even if memset_s() was available it
could optimize that as well.

I don't know how compilers are supposed to treat memset_s(), but if I define a
memset_s() function which just calls memset() internally, GCC (v5.2.1 20151028)
optimizes that away just as it does with plain memset(), so libc
implementations would probably need to adopt "tricks" to avoid the optimization
as well.

FWIW OpenSSH implements the portable explicit_bzero() using the volatile
pointer as well (unless memset_s() is detected at build time). On OpenBSD it
just uses OpenBSD's explicit_bzero() which is implemented using memset() and a
weak function pointer. But that (as used in LibreSSL) seems to have problems in
relation to LTO, unless optimizations are specifically disabled:
https://github.com/libressl-portable/openbsd/issues/5

On the other hand BoringSSL uses memset() with an explicit memory barrier
implemented as inline assembly, which is possibly better than the volatile
implementation because GCC (and probably other compilers) can inline the
memset() call. But if OPENSSL_NO_ASM is defined, the barrier is omitted and
memset() is optimized away, so that's potentially dangerous.

The above is just to give an overview on how other projects solved this
problem. Notably none of them uses full assembly implementations like OpenSSL
does.

> The primary purpose of the assembly language implementations is to reduce the
> possibility that the C compiler will do the weird things that C compilers love
> to do.

According to the changelog and git log, the primary purpose of the asm
implementations was to improve performance (see commit b2dba9b). Using the
volatile pointer implementation would IMO make these optimizations useless,
hence the proposal to drop them and make things simpler.

Cheers