[openssl-project] Help deciding on PR 6341 (facilitate reading PKCS#12 objects in OSSL_STORE)

Sat Jun 2 18:27:42 UTC 2018

> On Jun 2, 2018, at 2:36 AM, Richard Levitte <levitte at openssl.org> wrote:
> 
>> Canonicalize when importing for use with the store API.
> 
> Yup.
> 
>> Not sure whether wchar_t though, just octet string in UTF-8 seems saner.
> 
> Dunno about that, really.  The aim, to quote David W, is to make it
> *hard* for applications to get it wrong, and we all know that an octet
> string is merely an octet string...

Octet strings are by *definition* not wide characters; they are an
opaque string of *octets* (an array of uint8).  The purpose of
wchar_t and friends is to process non-ASCII *character strings*,
with the wide versions of strlen(), strchr(), ...  We do none of
this.  We pass the opaque input to a key-derivation function that
treats it as an opaque octet string.

> We cannot know with absolute certainty that it's UTF-8 encoded.

Indeed someone could pass us an octet string that is not derived
from the UTF-8 encoding of some actual character string entered
by a user.  That does not matter.  What matters is that all
user input is canonically encoded, in a *platform-independent*
way.  And for that the application is responsible for converting
user input to UTF-8.  If the application does not do it right,
it will get incorrect (fail to decrypt) or non-portable (fail
to decrypt in the future on other platforms) behaviour.
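
To make the failure mode concrete, here is a minimal sketch of why
the encoding has to be canonical.  toy_digest() is a hypothetical
stand-in of my own (32-bit FNV-1a, which is NOT a KDF and not an
OpenSSL function); all it shares with a real KDF is that it sees
nothing but octets and a length.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a key-derivation function: 32-bit FNV-1a.
 * It is NOT a KDF.  Like a real KDF, it consumes opaque octets plus an
 * explicit length and knows nothing about characters or locales.
 */
uint32_t toy_digest(const unsigned char *octets, size_t len)
{
    uint32_t h = 2166136261u;           /* FNV offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= octets[i];
        h *= 16777619u;                 /* FNV prime, wraps mod 2^32 */
    }
    return h;
}

/* The same user-visible character, U+00E9, in two encodings. */
static const unsigned char E_ACUTE_LATIN1[] = { 0xE9 };
static const unsigned char E_ACUTE_UTF8[]   = { 0xC3, 0xA9 };

/* Returns 1 if both encodings derive the same value, 0 otherwise. */
int encodings_agree(void)
{
    return toy_digest(E_ACUTE_LATIN1, sizeof(E_ACUTE_LATIN1))
        == toy_digest(E_ACUTE_UTF8, sizeof(E_ACUTE_UTF8));
}
```

encodings_agree() returns 0: the user typed the same character, but
the octets differ, so whatever is derived from them differs too.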

> The way I saw it is that UTF-8
> really means Unicode, and a way to codify that is wchar_t.

NO.  That's not the point.  UTF-8 yields a canonical encoding
of what the user typed to an opaque octet string.  That
encoding is the application's responsibility.  We must not
treat the password as a character string; that's not portable.

> openssl-users> That is the password is an opaque byte string, not a character
> openssl-users> string in the platform's encoding of i18n strings.
> 
> Here is, unfortunately, where standards differ.  PKCS#12 has a
> requirement that makes the pass phrase anything but opaque.

OK, looking at:

  https://tools.ietf.org/html/rfc7292#appendix-B.1

we see that PKCS#5 v2.1 sensibly defines passwords as opaque strings
in some unspecified standard encoding (ASCII or UTF-8 for example).

PKCS#12, however, sadly requires a 16-bit BMPString encoding
(instead of UTF-8), presumably for backwards compatibility.

> With that, the characters have meaning and need to be interpreted
> correctly to form a standard compliant BMPString.

Well, in that case for PKCS#12 we must require a well-formed
UTF-8 input, which we can convert to BMPString without any
need for locale-specific information.  The ASN.1 library
presumably can convert from UTF-8 to BMP, or code can be
added to do that if missing.
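
A rough sketch of such a conversion, to show that it needs nothing
locale-specific.  utf8_to_bmp() is a hypothetical helper of my own,
not an existing OpenSSL function, and the choice to reject anything
outside the BMP is an assumption of this sketch.  It also leaves the
two-octet NULL terminator that RFC 7292 B.1 calls for to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an OpenSSL API): convert well-formed UTF-8
 * to big-endian 16-bit BMP code units.  Returns the number of octets
 * written to out, or -1 if the input is malformed, holds a code point
 * outside the BMP, or does not fit in outcap octets.
 */
long utf8_to_bmp(const unsigned char *in, size_t inlen,
                 unsigned char *out, size_t outcap)
{
    size_t i = 0, o = 0;

    assert(out != NULL);
    assert(in != NULL || inlen == 0);

    while (i < inlen) {
        unsigned char c = in[i++];
        uint32_t cp;
        size_t need;                    /* continuation octets expected */

        if (c < 0x80) {
            cp = c;
            need = 0;
        } else if ((c & 0xE0) == 0xC0) {
            cp = c & 0x1F;
            need = 1;
        } else if ((c & 0xF0) == 0xE0) {
            cp = c & 0x0F;
            need = 2;
        } else {
            return -1;  /* stray continuation, 4-octet (non-BMP), or invalid */
        }

        if (inlen - i < need)
            return -1;                  /* truncated sequence */
        for (size_t k = 0; k < need; k++) {
            unsigned char cc = in[i++];

            if ((cc & 0xC0) != 0x80)
                return -1;              /* not a continuation octet */
            cp = (cp << 6) | (cc & 0x3F);
        }

        /* Reject overlong forms and UTF-16 surrogate code points. */
        if ((need == 1 && cp < 0x80) || (need == 2 && cp < 0x800))
            return -1;
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return -1;

        if (outcap - o < 2)
            return -1;                  /* output buffer too small */
        out[o++] = (unsigned char)(cp >> 8);
        out[o++] = (unsigned char)(cp & 0xFF);
    }
    return (long)o;
}
```

For example, the six UTF-8 octets 61 C3 A9 E2 82 AC ("a", U+00E9,
U+20AC) come out as the six BMP octets 00 61 00 E9 20 AC, on any
platform and under any locale.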

> (it would have been smarter to have the PKCS12 routines take wchar_t
> strings rather than char strings...  hindsight is what it is...)

No, wchar_t is not defined to be a 16-bit BMPString compatible
encoding.  It is AFAIK a platform-specific string representation
that is not canonical.
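
A small sketch of the point, for anyone who wants to check their own
platform.  wchar_layout_is_bmp() is a hypothetical name of my own; it
asks whether a wchar_t holding the value 0x20AC has the same in-memory
octets as the big-endian 16-bit unit 20 AC that a BMPString would
carry.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical check (not an OpenSSL API): returns 1 only if wchar_t is
 * exactly two octets wide AND stored big-endian, i.e. only if its raw
 * memory could be used as BMPString content without conversion.
 */
int wchar_layout_is_bmp(void)
{
    const wchar_t sample = (wchar_t)0x20AC;
    unsigned char raw[sizeof(wchar_t) + 1] = { 0 };  /* at least 2 octets */

    memcpy(raw, &sample, sizeof(sample));
    return sizeof(wchar_t) == 2 && raw[0] == 0x20 && raw[1] == 0xAC;
}
```

Where wchar_t is four octets (common on Linux with glibc) this
returns 0 because of the width; where it is two octets but
little-endian (Windows on x86) it returns 0 because of the byte
order.  Either way an explicit conversion is still needed.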

-- 
	Viktor.