[openssl-dev] [openssl.org #3615] [PATCH] ChaCha20 with Poly1305 TLS Cipher Suites via the EVP interface

Andy Polyakov via RT rt at openssl.org
Sat May 30 15:00:47 UTC 2015


>> More coming in.
> 
> Consider 32-bit results. The first column is assembly results for base 2^32
> integer-only code in comparison to compiler-generated code. The second column
> is my result for NEON, and the last column is results for Andrew Moon's
> NEON implementation; both are base 2^26.
> 
> #                       IALU/gcc-4.4    NEON    poly1305-opt
> #
> # Cortex-A5             6.30/+130%      2.96    4.90
> # Cortex-A8             6.25/+115%      2.40    2.36
> # Cortex-A9             5.10/+95%       2.56    2.25
> # Cortex-A15            3.79/+85%       1.30    1.53
> # Snapdragon S4         5.70/+100%      1.48    7.58(?)
> 
> As mentioned earlier, the goal is "all-round" performance, i.e. near-optimal
> performance across a *range* of platforms. Judging from the Cortex-A9 result,
> I have some room for improvement; hopefully it will benefit all
> processors.

After experimenting, I'm leaning toward settling for the above results,
slightly improved on a couple of CPUs but with the same approach. What
were the alternatives? When loading input data and performing the
necessary conversion to base 2^26, it's possible to

a) do it completely in NEON (above results);
b) do it with integer-only instructions and move data to NEON with
inter-register vmov;
c) do it with integer-only instructions and transfer data to NEON
through memory.
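
To make the conversion in a) to c) concrete, here is a scalar C sketch of splitting one 16-byte block, already loaded as four little-endian base 2^32 words, into five base 2^26 limbs. It is an illustration under assumptions, not the code in the attached poly1305-armv4.pl: the function name is hypothetical, and it assumes a full block, so the Poly1305 padding bit 2^128 is always set. Five 26-bit limbs cover 130 bits, which leaves headroom for 26x26-bit products to be accumulated in 64-bit lanes.

```c
#include <stdint.h>

/* Hypothetical helper (illustration only): convert a full 16-byte block,
 * given as four little-endian 32-bit words w[0..3], into five 26-bit limbs.
 * Bit 128 (the Poly1305 pad bit for a full block) is 4*26 + 24, so it
 * becomes bit 24 of the top limb. */
static void poly1305_block_to_base26(const uint32_t w[4], uint32_t h[5])
{
    const uint32_t mask = 0x3ffffff;               /* 2^26 - 1 */

    h[0] =   w[0]                         & mask;  /* bits   0..25  */
    h[1] = ((w[0] >> 26) | (w[1] << 6))   & mask;  /* bits  26..51  */
    h[2] = ((w[1] >> 20) | (w[2] << 12))  & mask;  /* bits  52..77  */
    h[3] = ((w[2] >> 14) | (w[3] << 18))  & mask;  /* bits  78..103 */
    h[4] =  (w[3] >> 8) | (1u << 24);              /* bits 104..127 + pad */
}
```

Each shift pair stitches the high bits of one 32-bit word to the low bits of the next; in option a) the equivalent shifts and masks run in NEON registers, while b) and c) do them with integer instructions and then move the limbs over.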

I found that b) gives ~8% improvement on Cortex-A15 and Snapdragon S4,
but hurts low-end Cortex-A5 and Cortex-A7 by 15% and 12% respectively.
Option c) performs like b) on Cortex-A15 and S4 and improves Cortex-A9
by 10%, but its losses on low-end cores exceed 20%. Keep in mind that
there is a certain asymmetry in how losses and gains are presented: for
example, a measured 25% regression means that the original is 33% faster.
Anyway, the all-NEON approach appears to provide the best "all-round"
performance.
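
The asymmetry between losses and gains can be checked with a one-line formula. This sketch assumes a regression is expressed as the fraction of throughput lost relative to the original, so new = (1 - r) * old and old/new = 1 / (1 - r); the function name is hypothetical.

```c
/* Hypothetical helper: given a throughput regression r (0.25 for "25%
 * slower"), return how much faster the original is than the new code,
 * also as a fraction. Assumes new_speed = (1 - r) * old_speed. */
static double original_speedup(double regression)
{
    return 1.0 / (1.0 - regression) - 1.0;
}
```

With r = 0.25 this gives 1/0.75 - 1, roughly 0.333, i.e. the 33% quoted above; a 20% loss likewise corresponds to the original being 25% faster.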


-------------- next part --------------
A non-text attachment was scrubbed...
Name: poly1305-armv4.pl
Type: application/x-perl
Size: 25164 bytes
URL: <http://mta.openssl.org/pipermail/openssl-dev/attachments/20150530/e136c151/attachment-0001.bin>

