[openssl-dev] Making assembly language optimizations working on Cortex-M3
Brian Smith
brian at briansmith.org
Wed Jun 8 01:48:15 UTC 2016
Andy Polyakov <appro at openssl.org> wrote:
>> > Cortex-M platforms are so limited that every bit of performance and
>> > space savings matters. So, I think it is definitely worthwhile to
>> > support the non-NEON ARMv7-M configuration. One easy way to do this
>> > would be to avoid building NEON code when __TARGET_PROFILE_M is defined.
>>
>> I don't see no __TARGET_PROFILE_M defined by gcc
>>
>>
>> I see. I didn't realize that GCC didn't emulate this ARM compiler
>> feature. Never mind.
>
> But gcc defines __ARM_ARCH_7M__, which can be used to e.g.
Thanks. That's useful to know.
>> I can try to make a patch to bring BoringSSL's OPENSSL_STATIC_ARMCAP
>> mechanism to OpenSSL, if you think that is an OK approach.
>
> I don't understand. Original question was about conditional *omission*
> of NEON code (which incidentally means even omission of run-time
> switches), while BoringSSL's OPENSSL_STATIC_ARMCAP is about *keeping*
> NEON as well as run-time switch *code*, just setting OPENSSL_armcap_P to
> a chosen value at compile time... I mean it looks like we somehow
> started to talk about different things... When I wrote "care to make
> suggestion" I was thinking about going through all #if __ARM_ARCH__>=7
> and complementing some of them with !defined(something_M)...
> Compiler might remove dead code it would generate itself, but it still
> won't omit anything from assembly module. Linker takes them in as
> monolithic blocks.
If the target is Cortex-M4, there is no NEON. So then, with the
OPENSSL_STATIC_ARMCAP mechanism, we won't define
OPENSSL_STATIC_ARMCAP_NEON, and so that bit of the armcap variable
won't be set.
I think what you're trying to say is that, if we just stop there, then
all the NEON code will still get linked in. That's true. But, what I
mean is that we should then also change all the tests of the NEON bit
of OPENSSL_armcap_P (and, more generally, all tests of
OPENSSL_armcap_P) to use code that the C compiler can do constant
propagation and dead code elimination on. We can do this, for example,
by defining `OPENSSL_armcap_P` to be a macro that can be seen to have
a constant compile-time value, when using the OPENSSL_STATIC_ARMCAP
mechanism. And/or, we can surround the relevant code with
`#if !defined(OPENSSL_STATIC_ARMCAP) || defined(OPENSSL_STATIC_ARMCAP_NEON)`,
etc. This latter technique would (IIUC) work even in the assembly
language files.
In this way, if we know at build time that NEON will be available, we
can avoid compiling/linking the non-NEON code. Conversely, if we know
that NEON will NOT be available, we can avoid compiling/linking the
NEON code.
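A minimal sketch of the two techniques described above, assuming a
Cortex-M build with static capabilities and no NEON. The names
hash_block, hash_block_neon, hash_block_generic, and enum impl are
hypothetical and exist only for illustration; the value chosen for
ARMV7_NEON and the exact shape of the macro are assumptions, not the
actual BoringSSL or OpenSSL definitions.

```c
#include <stdint.h>

/* Assumed bit value for the NEON capability flag (illustrative). */
#define ARMV7_NEON (1u << 0)

/* Build-time configuration, normally passed by the build system
 * (e.g. -DOPENSSL_STATIC_ARMCAP). OPENSSL_STATIC_ARMCAP_NEON is
 * deliberately left undefined here to model a target without NEON. */
#define OPENSSL_STATIC_ARMCAP

#if defined(OPENSSL_STATIC_ARMCAP)
/* Technique 1: make OPENSSL_armcap_P a compile-time constant so the
 * compiler can fold every test of it. */
# if defined(OPENSSL_STATIC_ARMCAP_NEON)
#  define OPENSSL_armcap_P (ARMV7_NEON)
# else
#  define OPENSSL_armcap_P (0u)
# endif
#else
/* Run-time detection: the variable is set elsewhere at startup. */
extern uint32_t OPENSSL_armcap_P;
#endif

/* Hypothetical result type so the chosen path is observable. */
enum impl { IMPL_GENERIC = 0, IMPL_NEON = 1 };

/* Technique 2: guard the NEON path so it is not compiled at all when
 * the build statically rules NEON out. The same #if condition could
 * wrap the NEON routines in an assembly source that is run through
 * the C preprocessor. */
#if !defined(OPENSSL_STATIC_ARMCAP) || defined(OPENSSL_STATIC_ARMCAP_NEON)
static enum impl hash_block_neon(void) { return IMPL_NEON; }
#endif

static enum impl hash_block_generic(void) { return IMPL_GENERIC; }

/* Hypothetical dispatcher: picks the NEON path only if it was built
 * and the capability bit is set. */
static enum impl hash_block(void) {
#if !defined(OPENSSL_STATIC_ARMCAP) || defined(OPENSSL_STATIC_ARMCAP_NEON)
    if (OPENSSL_armcap_P & ARMV7_NEON)
        return hash_block_neon();
#endif
    return hash_block_generic();
}
```

With this configuration hash_block() can only reach the generic path,
and hash_block_neon() is never emitted. A symmetric guard around the
generic path (compiled only when NEON is not statically guaranteed)
would cover the opposite case.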
I hope this clarifies my suggestion.
Cheers,
Brian
--
https://briansmith.org/