[openssl-dev] Making assembly language optimizations working on Cortex-M3

Brian Smith brian at briansmith.org
Wed Jun 8 01:48:15 UTC 2016


Andy Polyakov <appro at openssl.org> wrote:
>>     > Cortex-M platforms are so limited that every bit of performance and
>>     > space savings matters. So, I think it is definitely worthwhile to
>>     > support the non-NEON ARMv7-M configuration. One easy way to do this
>>     > would be to avoid building NEON code when __TARGET_PROFILE_M is defined.
>>
>>     I don't see no __TARGET_PROFILE_M defined by gcc
>>
>>
>> I see. I didn't realize that GCC didn't emulate this ARM compiler
>> feature. Never mind.
>
> But gcc defines __ARM_ARCH_7M__, which can be used to e.g.

Thanks. That's useful to know.

>> I can try to make a patch to bring BoringSSL's OPENSSL_STATIC_ARMCAP
>> mechanism to OpenSSL, if you think that is an OK approach.
>
> I don't understand. Original question was about conditional *omission*
> of NEON code (which incidentally means even omission of run-time
> switches), while BoringSSL's OPENSSL_STATIC_ARMCAP is about *keeping*
> NEON as well as run-time switch *code*, just setting OPENSSL_armcap_P to
> a chosen value at compile time... I mean it looks like we somehow
> started to talk about different things... When I wrote "care to make
> suggestion" I was thinking about going through all #if __ARM_ARCH__>=7
> and complementing some of them with !defined(something_M)...

> Compiler might remove dead code it would generate itself, but it still
> won't omit anything from assembly module. Linker takes them in as
> monolithic blocks.

If the target is Cortex-M4, there is no NEON. So, with the
OPENSSL_STATIC_ARMCAP mechanism, we won't define
OPENSSL_STATIC_ARMCAP_NEON, and so that bit of the armcap variable
won't be set.
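
To make that concrete, here is a minimal sketch of how the capability
word could be fixed at build time. The bit values and the
`static_armcap` helper are illustrative assumptions, not the actual
BoringSSL or OpenSSL code; the idea is only that each
OPENSSL_STATIC_ARMCAP_* flag passed with -D contributes one bit.

```c
#include <stdint.h>

/* Illustrative bit values (assumption); the real ones live in arm_arch.h. */
#define ARMV7_NEON (1u << 0)
#define ARMV8_AES  (1u << 2)

/* Hypothetical helper: returns the capability word chosen at build time.
 * A flag that is not defined simply leaves its bit cleared. */
uint32_t static_armcap(void) {
    return
#if defined(OPENSSL_STATIC_ARMCAP_NEON)
        ARMV7_NEON |
#endif
#if defined(OPENSSL_STATIC_ARMCAP_AES)
        ARMV8_AES |
#endif
        0u;
}
```

For a Cortex-M build no flag would be passed, so the NEON bit stays
clear.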

I think what you're trying to say is that, if we just stop there, then
all the NEON code will still get linked in. That's true. But, what I
mean is that we should then also change all the tests of the NEON bit
of OPENSSL_armcap_P (and, more generally, all tests of
OPENSSL_armcap_P) to use code that the C compiler can do constant
propagation and dead code elimination on. We can do this, for example,
by defining `OPENSSL_armcap_P` to be a macro that can be seen to have
a constant compile-time value, when using the OPENSSL_STATIC_ARMCAP
mechanism. And/or, we can surround the relevant code with
`#if !defined(OPENSSL_STATIC_ARMCAP) || defined(OPENSSL_STATIC_ARMCAP_NEON)`,
etc. This latter technique would (IIUC) work even in the assembly
language files.
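
A rough sketch of both techniques together, under some assumptions:
the `demo_hash*` functions are hypothetical stand-ins for a real
NEON/non-NEON pair, the bit value is illustrative, and the `#define`
at the top stands in for passing -DOPENSSL_STATIC_ARMCAP on the
command line.

```c
#include <stdint.h>

/* Stand-in for -DOPENSSL_STATIC_ARMCAP; NEON flag deliberately not set. */
#define OPENSSL_STATIC_ARMCAP

#define ARMV7_NEON (1u << 0) /* illustrative bit value (assumption) */

#if defined(OPENSSL_STATIC_ARMCAP)
/* Static build: the capability word is a constant expression, so the
 * compiler can fold every test of it. */
# if defined(OPENSSL_STATIC_ARMCAP_NEON)
#  define OPENSSL_armcap_P (ARMV7_NEON)
# else
#  define OPENSSL_armcap_P (0u)
# endif
#else
/* Dynamic build: would be filled in by run-time detection (not shown). */
static uint32_t OPENSSL_armcap_P = 0;
#endif

/* The NEON path exists only if NEON is possible in this build. */
#if !defined(OPENSSL_STATIC_ARMCAP) || defined(OPENSSL_STATIC_ARMCAP_NEON)
static int demo_hash_neon(void) { return 2; }
#endif

static int demo_hash_scalar(void) { return 1; }

/* Dispatch: with the guard, the NEON call is never even referenced in a
 * static non-NEON build, so nothing pulls the NEON code into the link. */
int demo_hash(void) {
#if !defined(OPENSSL_STATIC_ARMCAP) || defined(OPENSSL_STATIC_ARMCAP_NEON)
    if (OPENSSL_armcap_P & ARMV7_NEON) {
        return demo_hash_neon();
    }
#endif
    return demo_hash_scalar();
}
```

The same `#if` condition could wrap the NEON routines in a
preprocessed assembly file, which is what keeps them out of the
object file rather than relying on the linker.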

In this way, if we know at build time that NEON will be available, we
can avoid compiling/linking the non-NEON code. Conversely, if we know
that NEON will NOT be available, we can avoid compiling/linking the
NEON code.

I hope this clarifies my suggestion.

Cheers,
Brian
-- 
https://briansmith.org/

