[openssl-dev] [openssl.org #4116] [PATCH] Reimplement non-asm OPENSSL_cleanse()

Andy Polyakov via RT rt at openssl.org
Fri Nov 13 12:50:59 UTC 2015


> The contract of volatile means that the compiler can't cache it.
> But I think that's only when it actually generates code for it,
> not when it can optimize it away.

Well, if that were the case, then you wouldn't be able to talk to
hardware from C. Formally, volatile references are not subject to being
optimized away under any circumstances. Just like a call to, say,
fwrite(3) isn't. Or maybe write(2) makes an even more illustrative
example, because in the fwrite case you have a FILE * that is not
declared const, so you could argue that the compiler assumes the FILE
object is modified. Well, it is, but that's not the reason the compiler
would never optimize away a call to fwrite, is it? Such calls may not be
optimized away because opaque function calls, just like references to
volatile objects[!], are specified as *side effects*. In other words, if
a reference to a volatile object is optimized away, that is a violation
of the standard and has to be treated as a compiler bug. And exactly
this last point was the original concern: is there anything one can do
to eliminate the very *room* for the compiler to screw things up? That's
what the assembly OPENSSL_cleanse was, and still is, about.
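To make the side-effect argument concrete, here is a minimal sketch of a non-asm cleanse in the spirit of the discussion (this is an illustration, not the actual patch under review): every store goes through a volatile-qualified pointer, so each write is a side effect the compiler is not permitted to elide.

```c
#include <stddef.h>

/* Illustrative sketch only: storing through a pointer to volatile
 * makes each byte write a side effect in the sense of the C standard,
 * so eliding it would be a compiler bug. */
void cleanse_sketch(void *ptr, size_t len)
{
    volatile unsigned char *p = ptr;

    while (len--)
        *p++ = 0;
}
```

Whether this leaves the compiler *room* to screw up in practice is exactly the question; the assembly implementation removes that room altogether.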

A couple of points in the overall context.

As for zeroing memory vs. filling it with non-zero values: the swap
argument doesn't really apply, because if memory was swapped out at some
point, the swapped copy can persist even after the original is modified.
That is even more likely if you cleanse just before terminating the
application. In other words, if you are set on securing the swap file,
then cleansing memory is not really the solution.
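If swap is the actual concern, the usual answer is to keep the secret's pages resident in the first place. A hedged sketch using POSIX mlock(2) (the helper name is hypothetical; the call can fail under RLIMIT_MEMLOCK, so the failure path must be handled):

```c
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: pin the secret's pages so they are never
 * written to swap, wipe before unlocking. mlock(2)/munlock(2) are
 * POSIX; mlock can fail, e.g. when RLIMIT_MEMLOCK is exceeded. */
int with_locked_secret(unsigned char *secret, size_t len)
{
    if (mlock(secret, len) != 0)
        return -1;              /* could not pin the pages */

    /* ... use the secret ... */

    memset(secret, 0, len);     /* wipe while still locked */
    return munlock(secret, len);
}
```

Note that this addresses swap specifically; it does nothing about copies the compiler or the kernel may have made elsewhere.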

As for LTO: I argue that LTO, and even over-aggressive IPA, doesn't
belong in OpenSSL. Remember the week when UltraSPARC held the SPEC
benchmark record? Well, I don't expect you to remember the week (I
actually don't), but rather how it happened. The compiler decomposed an
array of structures into multiple arrays of the elements comprising the
structure and thus dramatically improved cache locality. With this in
mind, ask yourself whether you would appreciate a side-channel-attack
countermeasure falling victim to a similar optimization.
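The transformation in question can be sketched by hand (the names here are illustrative, not from any real benchmark source): the two functions below compute the same sum, but over very different memory layouts, which is exactly the kind of data-layout change a side-channel countermeasure cannot survive unnoticed.

```c
#include <stddef.h>

struct point { double x, y, z; };

/* Array-of-structures: x, y and z are interleaved, so summing the
 * x components drags y and z through the cache as well. */
double sum_x_aos(const struct point *pts, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += pts[i].x;
    return s;
}

/* Structure-of-arrays: the layout the optimizer rewrote the program
 * into; the x values are contiguous, so every cache line fetched is
 * fully useful. Same result, different memory access pattern. */
double sum_x_soa(const double *xs, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += xs[i];
    return s;
}
```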

One can refer to what others are doing, but you would also have to
recognize that others are consciously limiting their scope and are in
effect saying "we make it work exactly where we are at the moment." I'm
not saying that's not a viable strategy, only that it doesn't have to be
universal.




More information about the openssl-dev mailing list