[openssl-dev] Removing gcm128_context->H for non-1-bit builds

Brian Smith brian at briansmith.org
Wed Jun 8 11:51:20 UTC 2016


On Wed, Jun 8, 2016 at 12:40 AM, Andy Polyakov <appro at openssl.org> wrote:
>> I noticed that the `H` member of `gcm128_context` seems to be
>> unnecessary for builds that aren't using the 1-bit GCM math.
>
> Not true. It is actually used in s390x assembly module. And I mean both
> H and Htable.

I see. I admit I didn't look at the s390x code, mostly because I can't
fit one in my small apartment.

>> Could somebody who understands the assembly code (probably Andy)
>> modify it to use symbolic names for the offsets that are used to
>> access Xi, H, Htable? If so, then I can write the patch to
>> conditionally exclude `H` on platforms that don't need it after
>> `CRYPTO_gcm128_init` finishes executing.

> But going the
> line of taking into consideration all corner cases is a stretch and
> should be weighed against 16 out of ~380[!] bytes waste. I'd say it's
> not worth it.

I see it both as an *optimization* and also a way to ensure
*correctness*. In particular, if the code doesn't expect H to be there
in configurations that don't use H, then some tricks that people might
use (in particular, a trick I am using) become safer.

In particular, notice that in the gcm128_context structure, there are
three kinds of state (again, only talking about non-s390x, non-1-bit
platforms):

1. State that is only used in the _init function: H.

2. State that needs to be preserved in between
authenticated-and-encrypted messages. This is `Htable`, `EK0`,
`gmult`, `ghash`, `block`, etc.

3. State that needs to be preserved only between the time you start an
authenticated-and-encrypted message and the time you end it. This
includes `len`, `EKi`, `mres`, `ares`, etc. currently. In theory this
could also include `gmult`, `ghash` and `block`, if the code were
refactored to recompute them for each message and/or if things like
the OPENSSL_STATIC_ARMCAP-type optimization allowed one to omit them
from the structure completely in some configurations where there is no
way they could vary at runtime. Also, Htable is 256 bytes on its own
(on the platforms I care about), but actually in some
platforms/configurations not all of Htable is used.
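
To make the three categories concrete, here is a rough sketch in C.
The `toy_*` names, the member types, and the 16-entry table are my
assumptions for illustration only; this is not the real
`gcm128_context` layout, just the members named above regrouped by
lifetime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper types; not OpenSSL's own definitions. */
typedef struct { uint8_t c[16]; } toy_block;      /* one 128-bit block */
typedef struct { uint64_t hi, lo; } toy_u128;     /* one table entry */
typedef void (*toy_gmult_fn)(uint64_t Xi[2], const toy_u128 Htable[16]);
typedef void (*toy_ghash_fn)(uint64_t Xi[2], const toy_u128 Htable[16],
                             const uint8_t *in, size_t len);
typedef void (*toy_block_fn)(const uint8_t in[16], uint8_t out[16],
                             const void *key);

/* Category 1: only needed while the _init function runs. */
struct toy_init_only {
    toy_block H;
};

/* Category 2: must be preserved between messages (as listed above). */
struct toy_between_messages {
    toy_u128 Htable[16];
    toy_block EK0;
    toy_gmult_fn gmult;
    toy_ghash_fn ghash;
    toy_block_fn block;
};

/* Category 3: only needed from the start to the end of one message. */
struct toy_within_message {
    toy_block len, EKi, Xi;
    unsigned int mres, ares;
};

/* How much H would add, as a percentage of the category 2 state. */
unsigned toy_h_overhead_percent(void) {
    assert(sizeof(struct toy_between_messages) > 0);
    return (unsigned)(100 * sizeof(struct toy_init_only)
                      / sizeof(struct toy_between_messages));
}
```

With a grouping like this, only `struct toy_between_messages` would
have to live in long-term storage; the exact sizes depend on the
platform's pointer width and on how much of the table is really used.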

In my code, after I call the _init function, I extract out all the
values in category #2 and store them in my per-connection context
structure. Then, when I need to encrypt/decrypt a message, I
construct a full gcm128_context *on the stack*, zero it out, and then
fill in the values from category #2. Then I encrypt/decrypt the
message, and then throw away the gcm128_context.
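
Roughly, the trick looks like the sketch below. Everything named
`toy_*` is hypothetical: the struct is a simplified stand-in for
`gcm128_context` (with no `H` member at all), and the saved state
holds only a subset of category #2, so treat this as an outline of
the idea rather than my actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the full context; note there is no H. */
struct toy_gcm_ctx {
    uint8_t Yi[16], EKi[16], EK0[16], len[16], Xi[16];
    uint64_t Htable[32];          /* 256 bytes of precomputed table */
    unsigned int mres, ares;
    const void *key;              /* assumed pointer to the cipher key */
};

/* Hypothetical per-connection copy of the category #2 values. */
struct toy_saved_state {
    uint64_t Htable[32];
    uint8_t EK0[16];
    const void *key;
};

/* After init: pull out what has to survive between messages. */
void toy_save(struct toy_saved_state *out, const struct toy_gcm_ctx *ctx) {
    memcpy(out->Htable, ctx->Htable, sizeof out->Htable);
    memcpy(out->EK0, ctx->EK0, sizeof out->EK0);
    out->key = ctx->key;
}

/* Per message: zero a fresh context, then fill in the saved values. */
void toy_rebuild(struct toy_gcm_ctx *ctx, const struct toy_saved_state *saved) {
    memset(ctx, 0, sizeof *ctx);
    memcpy(ctx->Htable, saved->Htable, sizeof ctx->Htable);
    memcpy(ctx->EK0, saved->EK0, sizeof ctx->EK0);
    ctx->key = saved->key;
}
```

The caller would declare a `struct toy_gcm_ctx` as a local variable,
call `toy_rebuild` on it, process one message, and let it go out of
scope. This is only safe if nothing reads `H` after init, which is
exactly the property I would like the code to make obvious.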

> One can argue that "the most popular embedded three-letter
> platform" deserves this 4% space optimization [by being so popular], but
> then question would be if OpenSSL can actually execute in such
> constrained environment where 4% per GCM context (i.e. something that
> itself is percentage of something else) makes a difference. Aren't we
> likely to be talking about ripping out single component and using in the
> said environment? But then question is why this specific case would have
> to complicate maintenance for OpenSSL as whole?

In other words, there are *lots* of bytes in gcm128_context that could
be thrown away in between messages, if one really needed to save
memory. And then the size of `H` does matter quite a bit more as a
percentage of the size of this inter-message state. And, also, whether
or not `H` belongs in category #1 or category #2 is important for
correctness. Thus, my suggestion in this thread is an attempt to
clarify the code to make it more obvious that it is in category #1
(except on 1-bit-mult and s390x platforms).

Note, in contrast, Poly1305 only requires 32 bytes of state to be
preserved between messages. My goal is to bring the GCM inter-message
state storage requirements closer to this 32 bytes.

> However, I can tell that assembly module
> for "the most popular embedded three-letter platform" does *not* depend
> on relative position of Xi, H and Htable. One can *probably* discuss
> that it would be appropriate to *facilitate* omission of H in context
> *other than* OpenSSL by avoiding H during most of the setup procedure.
> See attached patch for example. But do note that I'm not saying that it
> works or suggesting to include it right away, I only want to show what
> *might* be matter of discussion.

Nice. Thanks for the patch. That is actually very similar to what I've
done in my experiments. But, I was also trying to get it to work on
x86 and x86-64, where the relative position does matter. The
x86/x86-64 code is where I got confused about the math in the
assembler modules.
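
For reference, on the C side the symbolic-offset idea I asked about
could look something like the sketch below. The `toy_*` struct and
the `TOY_OFF_*` names are made up for illustration, and the layout is
an assumption, not the real one; the point is only that the offsets
are derived from the structure instead of being hard-coded numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; the real member order and sizes may differ. */
struct toy_gcm_layout {
    uint8_t Xi[16];
    uint8_t H[16];
    uint64_t Htable[32];
};

/* Named offsets that generated assembly could use instead of literals. */
enum {
    TOY_OFF_XI     = offsetof(struct toy_gcm_layout, Xi),
    TOY_OFF_H      = offsetof(struct toy_gcm_layout, H),
    TOY_OFF_HTABLE = offsetof(struct toy_gcm_layout, Htable)
};

/* Locate Htable from the context base using only the named offset. */
const uint64_t *toy_htable_at(const void *ctx) {
    const unsigned char *base = (const unsigned char *)ctx;
    return (const uint64_t *)(const void *)(base + TOY_OFF_HTABLE);
}
```

If `H` were later removed from the structure on some platform, the
named offsets would follow automatically, whereas literal offsets in
the assembly would silently point at the wrong member.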

>> Also, I wonder how important it is to keep the 1-bit math code?
>
> Look at it as insurance. The moment they come and say table-driven
> approach is no-go, we have 1-bit code to switch to.

Understood.

Thanks a ton!

Cheers,
Brian
-- 
https://briansmith.org/

