[OTC VOTE PROPOSAL] Don't merge PR#14759 (blinding=yes and similar properties)

Thu Apr 8 22:50:03 UTC 2021

Nicola, you are (in my view) conflating multiple items.

For the provider and property design approach it is rather simple.
- Algorithm implementations can vary.
- Selection between algorithm implementations when multiple providers are
available is performed by properties.

Algorithm implementations should declare whatever set of properties they
feel is appropriate for their implementation.
Applications (in this context, most likely through end-user
configuration) should be able to select which properties are considered
most important for their context.
That decision capability must be left to the end user as only the end user
knows the security context in which they are operating - we don't know that
ourselves.
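
As a rough illustration of that model, here is a toy sketch in C. It is
not the OpenSSL property API: every type, function and provider name in
it (struct prop, select_impl, provX, provY) is hypothetical, and it
assumes the simplest possible rule, namely that an implementation
matches only if it declares every name=value pair that was asked for.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these types or functions exist in OpenSSL. */
struct prop {
    const char *name;
    const char *value;
};

struct impl {
    const char *algorithm;      /* e.g. "RSA" */
    const char *label;          /* human-readable description */
    const struct prop *props;   /* properties the implementation declares */
    size_t nprops;
};

/* Return the declared value of a property, or NULL if it is not declared. */
static const char *declared_value(const struct impl *im, const char *name)
{
    for (size_t i = 0; i < im->nprops; i++)
        if (strcmp(im->props[i].name, name) == 0)
            return im->props[i].value;
    return NULL;
}

/* An implementation matches if it declares every queried name=value pair. */
static int impl_matches(const struct impl *im,
                        const struct prop *query, size_t nquery)
{
    for (size_t i = 0; i < nquery; i++) {
        const char *v = declared_value(im, query[i].name);
        if (v == NULL || strcmp(v, query[i].value) != 0)
            return 0;
    }
    return 1;
}

/* Return the first registered implementation of the algorithm that matches. */
static const struct impl *select_impl(const struct impl *impls, size_t n,
                                      const char *algorithm,
                                      const struct prop *query, size_t nquery)
{
    assert(impls != NULL && algorithm != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(impls[i].algorithm, algorithm) == 0
                && impl_matches(&impls[i], query, nquery))
            return &impls[i];
    return NULL;
}

/* Two hypothetical providers offering the same algorithm. */
static const struct prop provx_props[] = { { "provider", "provX" } };
static const struct prop provy_props[] = {
    { "provider", "provY" }, { "blinding", "yes" }
};
static const struct impl registry[] = {
    { "RSA", "provX RSA", provx_props, 1 },
    { "RSA", "provY RSA with blinding", provy_props, 2 },
};

/* A hypothetical end-user requirement. */
static const struct prop want_blinding[] = { { "blinding", "yes" } };
```

With an empty query, select_impl(registry, 2, "RSA", NULL, 0) returns
the first registered implementation; with want_blinding it returns the
provY entry. Without the declared property there would be nothing for
the second call to select on.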

The vast majority of your lengthy email below is actually focused on one
issue - what should the default behaviour be for selection of
implementations - and your viewpoint that we should not mark different
properties at all that might impact security. I don't think that position
is supportable - in that you are basically arguing that we should never
declare anything about properties of implementations and should never
select between different implementations except at a "provider level"
approach. Your approach is that all implementations should be considered
equal and that pretty much defies logic in my view.

Different implementations have different characteristics. Even looking at
something like constant time - not all of our implementations are constant
time.

Your statement that the approach of declaring properties "promotes insecure
by default" is simply flawed logic - and following the same logic I could
state that your approach "promotes ignorance by default" as it effectively
states that users shouldn't know the properties of the implementation in a
manner that allows selection to be performed.

Not all implementations are the same and different implementations can
implement different mitigations. Properties allow us to declare those
mitigations and allow users to make different decisions on the basis of the
properties that providers declare for algorithms. Having that information
available has to be a better thing than having nothing available - as with
nothing available then no selection between alternatives is possible.

Arguing separately about what the default set of properties should be (i.e.
which mitigations we should configure as required by default) would make
sense - but arguing that the information for making such decisions
shouldn't be present simply makes no sense.
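
To make the separation between "what is declared" and "what is required
by default" concrete, here is another toy sketch in C. Again, this is
not OpenSSL code: the names (struct clause, required_value,
default_policy, opt_out) are hypothetical, and it assumes that an
explicit request simply overrides the default for the same property.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: these names are hypothetical, not OpenSSL API. */
struct clause {
    const char *name;
    const char *value;
};

/* Return the value a query asks for, or NULL if it does not mention name. */
static const char *lookup(const struct clause *q, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(q[i].name, name) == 0)
            return q[i].value;
    return NULL;
}

/*
 * Value required for a property once the defaults and an explicit
 * request are combined: the explicit request wins, the default applies
 * otherwise, and NULL means there is no requirement at all.
 */
static const char *required_value(const struct clause *defaults, size_t ndef,
                                  const struct clause *request, size_t nreq,
                                  const char *name)
{
    assert(name != NULL);
    const char *v = lookup(request, nreq, name);
    return v != NULL ? v : lookup(defaults, ndef, name);
}

/* A hypothetical default policy and a hypothetical explicit opt-out. */
static const struct clause default_policy[] = { { "blinding", "yes" } };
static const struct clause opt_out[] = { { "blinding", "no" } };
```

Which entries belong in default_policy is the debate worth having. In
this sketch, changing that policy does not touch what implementations
declare; it only changes what is demanded of them when the user says
nothing.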

Tim.

On Fri, Apr 9, 2021 at 3:02 AM Nicola Tuveri <nic.tuv at gmail.com> wrote:

> Background
> ==========
>
> [PR#14759](https://github.com/openssl/openssl/pull/14759) (Set
> blinding=yes property on some algorithm implementations) is a fix for
> [Issue#14654](https://github.com/openssl/openssl/issues/14654) which
> itself is a spin-off of
> [Issue#14616](https://github.com/openssl/openssl/issues/14616).
>
> The original issue is about _Deprecated low level key API's that have no
> replacements_, among which the following received special attention and
> were issued a dedicated issue during an OTC meeting:
>
> ~~~c
> // RSA functions on RSA_FLAG_*
> void RSA_clear_flags(RSA *r, int flags);
> int RSA_test_flags(const RSA *r, int flags);
> void RSA_set_flags(RSA *r, int flags);
>
> // RSA_FLAG_* about blinding
> #define RSA_FLAG_BLINDING
> #define RSA_FLAG_NO_BLINDING
>
> // RSA functions directly on blinding
> int RSA_blinding_on(RSA *rsa, BN_CTX *ctx);
> void RSA_blinding_off(RSA *rsa);
> BN_BLINDING *RSA_setup_blinding(RSA *rsa, BN_CTX *ctx);
> ~~~
>
> The decision that spawned Issue#14616 and PR#14759 was to use the
> propquery mechanism to let providers advertise algorithms as
> `blinding=yes` to select secure implementations if there are insecure
> ones present as well.
>
> Similarly, a potential `consttime=yes` property was discussed that
> would work in a similar way: if applied properly for our current
> implementations that are not fully constant time it would allow a
> user/sysadmin/developer to prefer a third party implementation for the
> same algorithm with better security guarantees.
> In some contexts the consttime implementation might carry a serious
> performance penalty, and in contexts where constant time is not
> required this would allow users to select accordingly.
>
> Definition for the blinding property
> ------------------------------------
>
> The current definition of the `blinding` property applies to
> provider-native algorithm implementations for the `asym_cipher` and
> `signature` operations:
>
> ```pod
> =head2 Properties
>
> The following property is set by some of the OpenSSL signature
> algorithms.
>
> =over 4
>
> =item "blinding"
>
> This boolean property is set to "yes" if the implementation performs
> blinding to prevent some side-channel attacks.
> ```
>
> Rationale
> =========
>
> Property queries are our decision making process for implementation
> selection, and have been part of the design for 3.0 since the Brisbane
> design meetings: I am not opposing the use of the property query
> mechanism to select algorithms here, but the semantics of the properties
> we decide to adopt (and thus endorse and encourage also for use by 3rd
> parties).
> In particular I see the following issues with choices like
> `blinding=yes` or `consttime=yes`.
>
> Design promotes insecure by default
> -----------------------------------
>
> This design is a slippery slope into patterns that go against the
> "secure by default" best practices.
> Users/administrators/developers should not have to opt-in for the safer
> implementation, but the other way around: the opt-in should be for the
> less safe but more performant implementations after evaluating the
> consequences of such a choice in specific contexts.
> We shouldn't have users having to query for `consttime=yes` algorithms
> but rather for `consttime=no` explicitly in specific conditions.
> So if this was the only issue with PR#14759 my recommendation would be
> to rather flag the non-blinding implementations as such rather than the
> other way around as is currently done.
>
> The scenario in which 3rd party providers offer insecure algorithms not
> flagged as such with `consttime=no` or `blinding=no` IMO would then fall
> under the "know your providers" umbrella: if you are installing provider
> X you should be well aware that you are trusting X's authors "to do the
> right thing".
>
> The project history has shown us that the risk of insecure-by-default
> designs and their consequences are not just a hypothetical matter of
> debate: for example, `BN_FLG_CONSTTIME` has been with us since version
> 0.9.8f (and even before that under the `BN_FLG_EXP_CONSTTIME` name),
> well over 14 years, in which the decision to have the flag off by
> default and to require enabling it manually when desired has been the
> cause of many vulnerabilities and fixes. These are still a concern
> today, as we fixed yet another instance of forgetting to set it just
> some weeks ago (PR#13889).
>
> In this case, though, I expect that just flipping the default is not
> enough to accept this PR: the problem of going with the negative
> versions of these flags is that such design can't be made future proof
> as we wish to do: we can't know in advance all possible properties that
> should be declared as `=no` by an algorithm released today as part of
> our providers when new relevant properties might be defined in the
> future.
> E.g., what if after 3.0 is released an ECCKiila provider was released and
> opted to tag its implementations as `formally_verified`?
> They could add `formally_verified=yes` but then they would fall into the
> "insecure by default" pattern: users would need to opt-in for the safer
> version instead of getting it by default, but there is no other option
> given that our and others' providers' offerings were not preemptively
> flagged with `formally_verified=no`.
>
> Scope of propquery semantics
> ----------------------------
>
> This follows directly from the previous example. The design of the
> propquery mechanism seems (IMO) to be limited to providing
> __well-defined__ semantics for the properties only within the scope of a
> single provider (and a single provider version at that).
>
> If a given provider offered at runtime for the same algorithm two or
> more versions, one flagged as `consttime=no` and one with
> consttime guarantees, then it could provide documentation to its
> userbase on how to set up default propqueries in the configuration file
> at installation time or to application developers to embed them in their
> code depending on the use case, to select the `consttime=no` offering
> when such a thing is perceived as desirable for whatever (ill-guided :P)
> reason.
>
> But doing this across providers does not really work, because other
> provider authors might not have flagged their insecure implementations
> with `consttime=no`, or because there might be very different opinions
> on what qualifies as "fully consttime" (that evolve over time with
> different threat models and attacker capabilities):
> provX's `consttime=yes` might be "more consttime" than provY's
> `consttime=yes`.
> For example provX is similar to our default provider, and
> offers an EC algorithm built on top of a generic BIGNUM module so its
> generic implementation supports any valid EC curve named or custom and
> orgX made sure that the ECDH/ECDSA and ECmath layers of the
> implementation follow state-of-the-art practices regarding
> secret-dependent code execution or data access so they flag their
> EC implementation as `consttime=yes`, even though the underlying BIGNUM
> module has limited consttime guarantees, so in certain circumstances a
> dynamic `realloc` or some other seemingly minor thing could leak a few
> secret bits to a motivated attacker.
> provY is similar to our default provider, and provides an EC algorithm,
> but instead of having a generic codepath they only support a subset of
> named curves (e.g. only P256 and P521) each with a dedicated
> implementation that removes the need for a generic BIGNUM layer and
> instead embeds its specific consttime field arithmetic in each curve
> implementation (e.g. they provide our `ecp_nistz256.c` and
> `ecp_nistp521.c` implementations only). provY's ECDH/ECDSA, ECmath and
> FFmath layers all follow best consttime practices so they flag as
> `consttime=yes` their EC offering.
> Now `consttime=yes` is not well-defined anymore across providers, making
> it quite useless for users and providers to use `consttime=yes` or
> `consttime=no` to pick among offerings from different providers.
>
> The same would be true within the same organization and provider, across
> provider versions. Let's say tomorrow we were to merge a PR that made
> all of BIGNUM resistant against all currently known timing side-channel
> techniques. Coincidentally we decided to mark all our pubkey providers
> as `consttime=yes` after verifying that all the layers above BIGNUM are
> also timing-resistant to current techniques.
> Fast-forward a year and a new CPU design flaw is discovered, giving a
> remote attacker a side-channel with unprecedented resolution and
> SNR: our previous `consttime=yes` semantic would not stand the
> test of time and the disruption of semantics could have ripple-effects
> within our providers and to the stack of applications that embedded
> `consttime=yes` as a selection mechanism to pick among offerings from
> various provider versions with now ill-defined qualifiers.
>
> If we can agree, even just on some level or some specific cases like
> these, that the definitions of properties in propqueries have their
> scope limited to the single provider version, what is the point of
> exposing `blinding=*` to our users, given that at runtime we don't offer
> alternative implementations for the same algorithm with different
> properties?
> (I mean, yes we have alternative choices at runtime in some cases, but
> not at the algorithm level in the provider API, they are buried well
> within each algorithm implementation that offers such choices)
>
> Too vague/coarse semantics
> --------------------------
>
> Elaborating further from the examples above, there is also a conceptual
> issue at the base of the OTC discussion that led to PR#14759, namely
> that "blinding is a property of an algorithm: either an implementation
> does blinding or it doesn't".
>
> But this is not entirely true: blinding is applied to individual
> operations at various levels.
> RSA is a special case that has the luxury of resting its security on
> basically a single mathematical operation (modular exponentiation) over
> a single mathematical layer of abstraction (finite field math provided
> by BIGNUM): so it is enough to apply blinding to the modular
> exponentiation to say "this algorithm implementation does blinding".
> That is not true anymore already for DSA signing (that relies on the
> same single finite field math abstraction provided by BIGNUM, but
> already rests on two operations, exponentiation and modular inversion)
> and one more layer deep with ECDH/ECDSA where you have two operations
> (scalar multiplication and modular inversion), of which the first
> is composed of math operations coming from the EC arithmetic layer of
> abstraction, and each of those operations is itself built as the
> composition of many operations from the same FF arithmetic layer from
> BIGNUM.
> The number of levels of abstraction involved in an implementation keeps
> rising steeply when we move into the realm of upcoming post-quantum
> algorithms.
> When you have multiple operations from multiple layers, applying blinding
> efficiently is non-trivial: one might blind only the EC scalarmul but
> not the FF modinv, because the first takes the bulk of the computation
> time and common consensus at some point in time is that there is little
> point in blinding the modinv if its SNR is deemed too low to be
> exploitable. But what happens when this does not hold anymore? What is
> the meaning of `blinding=yes` for an EC implementation that applied
> blinding only to scalarmul and not to modinv because that was the best
> practice at the time it was written?
> What if blinding is applied to the scalarmul but the FF operation to
> perform the blinding already leaks a small amount because BIGNUM?
> A small leakage might not be considered a problem with 2048+ RSA
> keys (shared opinion might be that 1 or 3 bits of knowledge wouldn't
> give the attacker a noticeable boost compared to other tools in their
> arsenal) but the same few bits have been proven sufficient to break even
> P384 or P521 keys (which are above the security level of RSA 2048) with
> limited effort and costs: what is the meaning of `blinding=yes` in the
> case where the blinding implementation itself reveals information on the
> secret it should protect?
>
> What if the blinding operation itself is not leaking a single bit by
> itself, but it magnifies some effect from the underlying level of
> abstraction (e.g., blinding scalarmul could make the nonce bigger than
> the prime order subgroup size `n`, and the specific scalarmul
> implementation must guarantee that its scalar input is less than `n` to
> compute correctly, so it has a special branch to be robust and handle
> bigger scalars coming from blinding, creating a magnifying lens for an
> attacker when the branch is taken, as that says something on the
> probability of the high order bits of the scalar being set)?
> This might seem to have nothing to do with the semantics of
> `blinding=yes`, but it does for providers like ours where we apply
> blinding at the top abstraction layer (and in the case of certain paths also
> in strategic points in the second layer) but then have a number of
> different sub-algorithm implementations that behave very differently:
> depending on the curve name (and encoding format when parsing a key) you
> might end up in the generic prime/binary curves implementation, or in
> one of the specific ones like nistz256 or nistp521.
>
> Applying `blinding=yes` like this:
>
> ```c
> {"ECDSA","provider=default,blinding=yes",ossl_ecdsa_signature_functions}
> ```
>
> is already making the semantics of the `blinding` property ill-defined,
> and affecting its effectiveness/purpose as a decision mechanism to be
> exposed to users: we have different codepaths under the same umbrella
> flagged as applying blinding, but even within it that blinding has very
> different shades of "yes".
>
> As @t8m said "Security is not a boolean value" and my point here is that
> neither is `blinding` or `consttime` and similar: they come in shades
> (even the `formally_verified` one: maybe the EC layer is formally
> verified by tools, but the underlying FF module might come with
> different guarantees, etc.) and the threshold by which SOTA deems the
> quality to become "yes*" or "no*" changes over time (e.g., it was fine
> to only blind scalarmul yesterday, so implementation X was considered as
> a blinding implementation, but today also modinv blinding is required so
> the same implementation X is now deemed non-blinding).
>
> Alternative
> -----------
>
> Support for the specific `RSA_FLAG_BLINDING`&co functionality could
> instead just be dropped.
> My understanding is that its usefulness is already limited to legacy
> compatibility with engines which anyway would not play nicely with the
> new provider/core architecture.
>
> More in general, as a mechanism to select secure implementations if
> there are insecure ones present as well, the problem can already be
> solved applying the "know your providers" mantra in combination with the
> use of the `provider=default`, `provider=provX` properties to pick
> across the offerings of multiple providers.
>
> This wouldn't cover the use case (is this even supported/tested right
> now?) of a single provider registering the same algorithm twice with
> different alternative implementations and properties: in this case
> though it would be possible to use a property definition local to the
> specific provider version, and its documentation would inform the users
> of how to use propquery to select what and the related conditions and
> caveats. Luckily this last case does not affect our implementations, as
> we don't offer competing algorithms at the provider level.
>
> Although I wouldn't propose to do it, if we really really wanted to
> retain the functionality of `RSA_FLAG_BLINDING` and have
> algorithms exposing knobs with the capability of turning on/off (or
> 1000/500/0) whatever degree of blinding they support, we could achieve
> this with existing ctrls/params as we do for many other tunable knobs.
> Using ctrls/params over doing this on a propquery level has the
> advantage that the definition/semantics can be scoped with more
> granularity than offered at the provider level.
> As a user/sysadmin, if I have done the evaluation of the specific
> context that could lead me to responsibly decide to pick no-blinding,
> no-consttime, no-secure or similar hazardous choices, I would expect to
> be anyway in a situation where I have so much control over my
> system, software and environment that patching the code to pass a param
> or call a ctrl function deeply inside my application shouldn't be a
> blocker at all.
>
> Summary
> -------
>
> I am opposed to merging PR#14759 for several reasons:
>
> - it lends itself to insecure-by-default patterns, repeating mistakes
>   from the past
> - the validity scope for similar properties is somewhat limited to a
>   single provider version, so its effectiveness against its purpose of
>   being a selection mechanism for users to pick among competing
>   algorithms offered by different providers is limited (and has the
>   potential to come back to haunt us in the future)
> - its definition and semantics are vague, its applicability as an
>   algorithm property too coarse to be well-defined even inside our own
>   providers
> - `secure`, `blinding`, `consttime`, etc. are not boolean properties of
>   algorithms (even if I really wish they could be)
> - it's one more vague concept for everyone to understand, and one more
>   item with not-too-clear logic for the project to handle and maintain
>   for the foreseeable future
> - we have already established alternatives in the form of
>   per-implementation params or ctrls to tune some security parameters
>   that can alter the execution flow of our implementations
>
> Also, from a more generic usability perspective and thinking of
> supporting those in our Community developing 3rd party providers, I
> think we should also
>
> - [in the vote] take a position on what are good/bad ideas when devising
>   provider-specific properties, and have some official guidelines in the
>   documentation about it by the time final 3.0.0 is released, and
> - [not in the vote, could be a separate discussion] document what
>   mechanisms we plan to put in place to coordinate efforts
>   which will allow for some carefully evaluated properties to have
>   well-defined semantics (with versioning?) across providers/vendors
>   (some form of registry of properties, with well-defined criteria for
>   maintaining it?)
>
> Proposed vote text
> ==================
>
>     Do not merge PR#14759, prevent declaring properties similar to
>     `blinding=yes` or `consttime=yes` in our implementations and
>     discourage 3rd parties from adopting similar designs.
>
>
>
>
> Please provide feedback on the vote text, I would like to open the vote
> on Monday and have some public debate about its points via the list
> rather than compress it into the OTC meeting, which already has enough
> items on the agenda!
>
> Thanks,
>
> Nicola
>