[openssl-dev] Making assembly language optimizations working on Cortex-M3

Andy Polyakov appro at openssl.org
Thu May 26 19:02:21 UTC 2016


>     > Cortex-M platforms are so limited that every bit of performance and
>     > space savings matters. So, I think it is definitely worthwhile to
>     > support the non-NEON ARMv7-M configuration. One easy way to do this
>     > would be to avoid building NEON code when __TARGET_PROFILE_M is defined.
> 
>     I don't see __TARGET_PROFILE_M defined by gcc.
> 
> 
> I see. I didn't realize that GCC didn't emulate this ARM compiler
> feature. Never mind.

But gcc defines __ARM_ARCH_7M__, which can be used to, e.g., define the
macro ourselves:

#if !defined(__TARGET_PROFILE_M) && defined(__ARM_ARCH_7M__)
# define __TARGET_PROFILE_M
#endif

>     Anyway, care to make a suggestion in the form of a patch, one that
>     would be suitable even for gcc? [Just in case: no, I don't have ARM's
>     compiler, only its manual.]
> 
> 
> I can try to make a patch to bring BoringSSL's OPENSSL_STATIC_ARMCAP
> mechanism to OpenSSL, if you think that is an OK approach.

I don't understand. The original question was about conditional *omission*
of NEON code (which incidentally means omitting the run-time switches as
well), while BoringSSL's OPENSSL_STATIC_ARMCAP is about *keeping* both the
NEON code and the run-time switch *code*, and merely setting
OPENSSL_armcap_P to a chosen value at compile time. It looks like we
somehow started talking about different things. When I wrote "care to
make a suggestion" I was thinking of going through all the
#if __ARM_ARCH__>=7 conditions and complementing some of them with
!defined(something_M).
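
A minimal sketch of what such a complemented guard could look like. The
SKETCH_* names and the function are hypothetical and only illustrate the
idea; __ARM_ARCH is the compiler-provided architecture macro, used here
as a stand-in for OpenSSL's own __ARM_ARCH__.

```c
/* Hypothetical illustration only; not taken from the OpenSSL tree. */

/* Map gcc's ARMv7-M macro onto the ARM compiler's profile macro. */
#if !defined(__TARGET_PROFILE_M) && defined(__ARM_ARCH_7M__)
# define __TARGET_PROFILE_M
#endif

/* Stand-in for __ARM_ARCH__: use the compiler's value, or 0 off ARM. */
#ifdef __ARM_ARCH
# define SKETCH_ARM_ARCH __ARM_ARCH
#else
# define SKETCH_ARM_ARCH 0
#endif

/* The existing ">= 7" test, complemented with the M-profile exclusion. */
#if SKETCH_ARM_ARCH >= 7 && !defined(__TARGET_PROFILE_M)
# define SKETCH_HAVE_NEON_PATH 1
#else
# define SKETCH_HAVE_NEON_PATH 0
#endif

/* Reports which code would be built for the current target. */
const char *sketch_selected_path(void)
{
#if SKETCH_HAVE_NEON_PATH
    return "neon+runtime-switch";   /* NEON code and its switch are kept */
#else
    return "integer-only";          /* both are omitted from the build */
#endif
}
```

With a guard of this shape, a Cortex-M3 build would drop the NEON code and
the run-time switch together, while an ARMv7-A build would be unaffected.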

>     > Alternatively, similar to what BoringSSL did, you could have an option
>     > that says "instead of doing runtime feature detection, detect
>     > features at compile time based on __ARM_NEON__ and the like." I think
>     > such a configuration would also help the C compiler do whole-program
>     > optimization better.
> 
>     I doubt that, because the compiler doesn't look at assembly modules.
> 
> 
> For example, in the AES-GCM code, there is a runtime check to decide
> between various implementations. With the OPENSSL_STATIC_ARMCAP-like
> approach, in theory the compiler's constant propagation and dead code
> elimination can work together to automatically optimize away the code
> paths that aren't applicable to the current configuration, without
> needing to maintain lots of #ifdefs.

The compiler might remove dead code that it generates itself, but it still
won't omit anything from an assembly module. The linker takes those in as
monolithic blocks.
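
To illustrate the distinction, here is a rough sketch under stated
assumptions: every sketch_* and SKETCH_* name is hypothetical, and the two
"gmult" routines are plain C stand-ins for what would really be entry
points in an assembly module.

```c
#include <stdint.h>

#define SKETCH_ARMV7_NEON (1u << 0)   /* hypothetical capability bit */

#ifdef SKETCH_STATIC_ARMCAP
/* Compile-time capability word: the compiler sees a constant here. */
static inline uint32_t sketch_armcap(void)
{
# if defined(__ARM_NEON__) || defined(__ARM_NEON)
    return SKETCH_ARMV7_NEON;
# else
    return 0;
# endif
}
#else
/* Run-time capability word, filled in by a probe at start-up. */
uint32_t sketch_armcap_P = 0;
static inline uint32_t sketch_armcap(void) { return sketch_armcap_P; }
#endif

/* C stand-ins for routines that would live in an assembly module. */
static int sketch_gmult_neon(int x) { return x * 2; }
static int sketch_gmult_4bit(int x) { return x + x; }

/* The C-level switch: the only part the compiler is able to fold. */
int sketch_gmult(int x)
{
    if (sketch_armcap() & SKETCH_ARMV7_NEON)
        return sketch_gmult_neon(x);
    return sketch_gmult_4bit(x);
}
```

With SKETCH_STATIC_ARMCAP defined, the compiler can fold the branch in
sketch_gmult() and drop the unused C call. If the two routines were labels
in one assembly object, as they are in practice, both would still be
linked in, because the linker does not split that object.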


