[openssl-dev] [openssl.org #4346] poly1305-x86.pl's AVX2 code

David Benjamin via RT rt at openssl.org
Thu Feb 25 23:16:29 UTC 2016


There seems to be a bug in the AVX2 codepath in poly1305-x86.pl. I have not
attempted to debug this, but I have attached a test file which produces
different output in normal and AVX2 codepaths. Our existing poly1305
implementation agrees with the former.

$ OPENSSL_ia32cap=0 ./poly1305_test
PASS
$ ./poly1305_test
Poly1305 test failed.
got:      2e65f0054e36505687d937ff5e8ed112
expected: 69d28f73dd09d39a92aa179da354b7ea

You may wish to generalize that Poly1305_Update pattern into your own
tests. This is what I did to catch this:
https://boringssl-review.googlesource.com/#/c/7223/
From looking at valgrind, this pattern seems to give good coverage. I
used valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes and
then kcachegrind to inspect the output. (kcachegrind is a bit heavy for
this. I'm hoping I can find or write a better annotator here. Something
which looks like, say, LCOV would be ideal.)
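
For illustration, a rough sketch of what generalizing an update pattern
could look like: feed the same message through every three-way split of
Update calls and compare each result with the one-shot result. This does
not reproduce the linked change. The toy_* functions are hypothetical
stand-ins (not Poly1305, not OpenSSL API) that only share the
Init/Update/Final shape and the 16-byte block buffering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* Toy streaming checksum, NOT Poly1305: a hypothetical stand-in with the
 * same Init/Update/Final shape and 16-byte block buffering. */
typedef struct {
    uint64_t h;
    uint8_t  buf[BLOCK];
    size_t   num;               /* bytes currently buffered */
} toy_ctx;

static void toy_block(toy_ctx *c, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c->h = (c->h ^ p[i]) * 0x100000001b3ULL;
    c->h += n;                  /* mix in the length so block boundaries matter */
}

void toy_init(toy_ctx *c)
{
    c->h = 0xcbf29ce484222325ULL;
    c->num = 0;
}

void toy_update(toy_ctx *c, const uint8_t *in, size_t len)
{
    assert(c->num < BLOCK);
    if (c->num) {
        size_t want = BLOCK - c->num;
        if (len < want) {       /* still not a full block: keep buffering */
            memcpy(c->buf + c->num, in, len);
            c->num += len;
            return;
        }
        memcpy(c->buf + c->num, in, want);
        toy_block(c, c->buf, BLOCK);
        in += want;
        len -= want;
        c->num = 0;
    }
    for (; len >= BLOCK; in += BLOCK, len -= BLOCK)
        toy_block(c, in, BLOCK);
    if (len) {                  /* stash the partial block for later */
        memcpy(c->buf, in, len);
        c->num = len;
    }
}

uint64_t toy_final(toy_ctx *c)
{
    if (c->num)
        toy_block(c, c->buf, c->num);
    return c->h;
}

/* Returns 1 if every three-way split of msg matches the one-shot result. */
int all_splits_agree(const uint8_t *msg, size_t len)
{
    toy_ctx c;
    toy_init(&c);
    toy_update(&c, msg, len);
    uint64_t want = toy_final(&c);

    for (size_t i = 0; i <= len; i++) {
        for (size_t j = i; j <= len; j++) {
            toy_init(&c);
            toy_update(&c, msg, i);
            toy_update(&c, msg + i, j - i);
            toy_update(&c, msg + j, len - j);
            if (toy_final(&c) != want)
                return 0;
        }
    }
    return 1;
}
```

A real test would call the Poly1305 routines in place of toy_*, compare
16-byte tags rather than a uint64_t, and use messages long enough to cross
whatever cutoffs the SIMD code uses.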

By the way, this assembly code is quite complicated. I wasn't able to find
problems in the others (I tested armv4, armv8, x86, and x86_64), but I'm
far from confident I've covered all the cases.

With the caveat that I'm no assembly programmer, much of the complexity
seems to come from the SIMD code needing a multiple of 2 or 4 blocks, the
implementation converting internal state back and forth between base 2^26
and base 2^64, and excess blocks being handled slightly differently in
different cases. (I counted nine distinct codepaths to test in the x86_64
AVX codepath alone.)

The C code already buffers up to 16-byte blocks. Did you consider buffering
up to 32 or 64 bytes in C when the SIMD code called for it? I think it
could be simpler. You'd only need to handle excess blocks at the end. This
would also simplify the SIMD upgrade on long inputs, so long as the buffer
exceeds the cutoff. (You'll never process input before the upgrade.)
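
A minimal sketch of the buffering idea, assuming a 64-byte SIMD
granularity (4 blocks of 16 bytes). All names here (buffered_ctx,
blocks_fn, the wide and tail callbacks) are hypothetical and stand in for
the assembly routines; the trace_* helpers do no arithmetic and only
record how they are called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIMD_CHUNK 64   /* assumed granularity: 4 blocks of 16 bytes */

/* Hypothetical block routine; stands in for the assembly entry points. */
typedef void (*blocks_fn)(void *state, const uint8_t *in, size_t len);

typedef struct {
    uint8_t   buf[SIMD_CHUNK];
    size_t    num;      /* bytes currently buffered, always < SIMD_CHUNK */
    void     *state;
    blocks_fn wide;     /* only ever called with a multiple of SIMD_CHUNK */
    blocks_fn tail;     /* called at most once, at the end, with the excess */
} buffered_ctx;

void buffered_update(buffered_ctx *c, const uint8_t *in, size_t len)
{
    assert(c->num < SIMD_CHUNK);
    if (c->num) {
        size_t want = SIMD_CHUNK - c->num;
        if (len < want) {               /* not enough yet: keep buffering */
            memcpy(c->buf + c->num, in, len);
            c->num += len;
            return;
        }
        memcpy(c->buf + c->num, in, want);
        c->wide(c->state, c->buf, SIMD_CHUNK);
        in += want;
        len -= want;
        c->num = 0;
    }
    size_t full = len - len % SIMD_CHUNK;
    if (full) {                         /* whole chunks go straight through */
        c->wide(c->state, in, full);
        in += full;
        len -= full;
    }
    if (len) {                          /* stash the excess for later */
        memcpy(c->buf, in, len);
        c->num = len;
    }
}

void buffered_finish(buffered_ctx *c)
{
    if (c->num)
        c->tail(c->state, c->buf, c->num);
    c->num = 0;
}

/* Demo stand-ins: record byte counts and flag any misaligned wide call. */
typedef struct { size_t wide_bytes, tail_bytes, bad_wide_calls; } trace;

void trace_wide(void *st, const uint8_t *in, size_t len)
{
    trace *t = st;
    (void)in;
    t->wide_bytes += len;
    if (len == 0 || len % SIMD_CHUNK)
        t->bad_wide_calls++;
}

void trace_tail(void *st, const uint8_t *in, size_t len)
{
    trace *t = st;
    (void)in;
    t->tail_bytes += len;
}
```

With this shape the wide routine never sees a partial chunk, so the only
excess-block handling left is the single tail call at the end. The extra
memcpy on unaligned updates is the cost that would need measuring.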

I haven't tried this, so perhaps the performance costs are prohibitive, but
if the costs are modest, the simplifications may be worth it.

David

-- 
Ticket here: http://rt.openssl.org/Ticket/Display.html?id=4346
Please log in as guest with password guest if prompted

-------------- next part --------------
A non-text attachment was scrubbed...
Name: poly1305_test.c
Type: text/x-csrc
Size: 7376 bytes
Desc: not available
URL: <http://mta.openssl.org/pipermail/openssl-dev/attachments/20160225/2fe26067/attachment.c>

