[openssl-dev] Usage of assembler code on ARM architectures

Tue Mar 17 12:37:17 UTC 2015

> My mistake, it looks like my memory was wrong on two counts.  First,
> it was AES, not SHA, where I observed the no-asm was faster.  Second, it
> was on the PowerPC cross-compiled target, not ARM.  The results from
> "openssl speed aes-128-cbc" are:
> 
> type               16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> w/o no-asm        31010.47k    32988.82k    33549.41k    33693.05k    33825.67k
> no-asm            42431.46k    46485.14k    47479.20k    47874.86k    47829.36k
> 
> This is using a Freescale 8548.
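
To put a number on the gap in the quoted figures, here is a small sketch that divides the no-asm throughput by the assembly ("w/o no-asm") throughput for each block size. The values are copied from the table above and are in thousands of bytes per second, as reported by openssl speed; the variable names are just for illustration.

```python
# Throughput from "openssl speed aes-128-cbc" on a Freescale 8548, as quoted
# above, in thousands of bytes per second.
block_sizes = [16, 64, 256, 1024, 8192]
asm = [31010.47, 32988.82, 33549.41, 33693.05, 33825.67]     # "w/o no-asm"
no_asm = [42431.46, 46485.14, 47479.20, 47874.86, 47829.36]  # "no-asm"

# Speed-up of the C (no-asm) build relative to the assembly build.
ratios = {size: c / a for size, a, c in zip(block_sizes, asm, no_asm)}

for size, r in ratios.items():
    print(f"{size:>5} bytes: no-asm runs at {r:.2f}x the assembly throughput")
```

With these figures the no-asm build comes out roughly 1.37 to 1.42 times faster, depending on block size.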

This is no mystery at all, and it is, in a sense, intentional. If you
examine the commentary in aes-ppc.pl you'll notice that it relies on
"compact" subroutines, those that use 256-byte S-boxes, which require
more computation. It mentions that "compact" encrypt is ~2 times slower
than "traditional" encrypt. On the other side of the scale is the
insecurity of the "traditional" subroutine, which is susceptible to
cache-timing attacks.
Well, it's not that "compact" is immune, but it's *much* more resistant.
Indeed, vulnerability can be quantified by the probability of a cache
line not being accessed as a result of a block operation, and in the
"compact" case it is as low as (1-32/256)^160 = 5e-10 vs.
(1-4/256)^160 = 0.08 for the processor in question. Note that the C
version is even worse than the "non-compact" assembly subroutine.
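The two probabilities can be reproduced with a short sketch. It assumes each of the 160 table lookups per block (taken here to mean AES-128's 10 rounds x 16 bytes, an assumption about where 160 comes from) is independent and uniformly distributed over a 256-entry table, with 32 entries sharing a cache line in the compact case and 4 in the traditional case (consistent with a 32-byte line and 1-byte vs. 8-byte entries, again an assumption). The helper p_line_untouched is hypothetical.

```python
def p_line_untouched(entries_per_line: int,
                     table_entries: int = 256,
                     lookups: int = 160) -> float:
    """Probability that one particular cache line of a lookup table is never
    touched by `lookups` independent, uniformly random table lookups.

    Hypothetical helper; lookups=160 assumes AES-128 (10 rounds x 16 bytes).
    """
    # Chance that a single lookup lands outside the given cache line.
    p_miss_once = 1 - entries_per_line / table_entries
    # All lookups must miss that line for it to remain untouched.
    return p_miss_once ** lookups


compact = p_line_untouched(32)     # 256-byte S-box, 32 entries per line (assumed)
traditional = p_line_untouched(4)  # 4 entries per line (assumed 8-byte entries)

print(f"compact:     {compact:.1e}")      # compact:     5.3e-10
print(f"traditional: {traditional:.2f}")  # traditional: 0.08
```

The lower this probability, the less often an observer monitoring the cache would see a table line go untouched during a block operation, which is the signal a cache-timing attack exploits.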

You might argue that there is no room for an adversary in *your*
application and that performance should be favoured. By "no room" I
mean that it's probably a locked-down embedded system, and an adversary
with the ability to execute their own code is considered a big enough
problem already. Yes, but you have to *argue* in favour. Maybe it
should be a compile option...