[openssl-dev] [openssl.org #3897] request: add BLAKE2 hash function (let's kill md5sum!)

Bill Cox waywardgeek at google.com
Wed Jun 10 23:36:27 UTC 2015


On Wed, Jun 10, 2015 at 3:19 PM, Zooko Wilcox-OHearn <
zooko at leastauthority.com> wrote:

> On Tue, Jun 9, 2015 at 10:00 PM, Bill Cox <waywardgeek at google.com> wrote:
> >>
> >> I would suggest that we move ahead with the last option — the
> >> reference implementation of BLAKE2sp.
>
> > I have trouble agreeing with this.  First, BLAKE2sp is more than 10X
> > slower than BLAKE2s for 256 bit inputs on my machine.
>
> Wow! I didn't measure that. What implementation of BLAKE2sp did you
> try? What kind of machine is yours?


My machine is a desktop with a Xeon E5-1650 v2 @ 3.50GHz.  I've attached
the code.  It's Samuel Neves' SSE4.1-optimized version, with a simple
wrapper to measure the speed.

> > On my machine, BLAKE2sp only wins for data sizes over 1 KiB.
>
> Hm, yeah, I'm mostly driven by the "b2sum" use case here, which almost
> never gets used on inputs as small as 1 KiB, and often gets used on
> inputs 3 or even 6 orders of magnitude bigger!
>

That's a _very_ important use case!


> I agree with you that the short-input use case is really important. I
> wonder how well we could do with an optimized, single-threaded
> implementation of BLAKE2sp. Could you do an experiment of the
> single-threaded implementation of BLAKE2sp with floodyberry's
> optimized implementation: https://github.com/floodyberry/blake2b-opt ?


I could benchmark it, but the basic problem is that BLAKE2sp initializes 8
BLAKE2s contexts, calls 8 finish methods, and then hashes the 8 resulting
256-bit digests (256*8 bits) to generate the 256-bit result.  There is no
way to make that competitive in speed for small inputs on the order of
256 bits.
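
A rough way to see the fixed overhead is to count compression-function
calls.  The sketch below is only a cost model, not an implementation of
either hash.  It assumes unkeyed hashing, the 64-byte BLAKE2s block size,
input striped across the 8 leaves one block at a time, and at least one
compression per leaf at finish time even when the leaf received no data.
The function names are made up for illustration.

```python
BLOCK_BYTES = 64    # BLAKE2s compresses 64-byte blocks
DIGEST_BYTES = 32   # 256-bit leaf digests
LEAVES = 8          # BLAKE2sp uses 8 BLAKE2s leaf contexts


def ceil_div(a, b):
    return -(-a // b)


def blake2s_compressions(n_bytes):
    # Finalizing always costs one compression, even for empty input.
    return max(1, ceil_div(n_bytes, BLOCK_BYTES))


def blake2sp_compressions(n_bytes):
    # Assumption: blocks are dealt round-robin to the leaves, and a leaf
    # that received nothing still compresses once when it is finished.
    blocks = ceil_div(n_bytes, BLOCK_BYTES)
    leaf_calls = blocks + max(0, LEAVES - blocks)
    # The root hashes the 8 leaf digests (8 * 32 = 256 bytes).
    root_calls = blake2s_compressions(LEAVES * DIGEST_BYTES)
    return leaf_calls + root_calls


if __name__ == "__main__":
    for n in (32, 512, 1024, 1 << 20):
        print(n, blake2s_compressions(n), blake2sp_compressions(n))
```

Under these assumptions a 32-byte input costs 1 compression with BLAKE2s
and 12 with BLAKE2sp.  The count is total work, not wall-clock time: a
SIMD implementation runs the 8 leaf compressions side by side, which is
why the parallel version can still win once the input is large enough.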

However, we could get fancier if we wanted a threaded algorithm with a
simple API, and no num_threads parameter.  I love these sorts of
optimizations :-)  Start out by hashing the first 512 bytes with 1 thread,
then switch to 2 threads for the next 1 KiB.  After that, hash the next
2 KiB with 4 threads, and the rest with 8 threads.

This way, for large files, you would get the full 8-thread speed, while for
tiny hashes like 256 bits, you get the raw BLAKE2s speed.  Honestly,
BLAKE2sp and BLAKE2bp should already work this way.  I think I've just
talked myself into thinking that any OpenSSL parallel hash wrappers should
also work this way (possibly with a max_threads parameter).
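
The ramp-up described above can be sketched as a small planning function.
This is only an illustration of the segment boundaries.  It does not say
how the per-segment digests would be combined, and the names
thread_schedule, RAMP and max_threads are hypothetical, not part of any
existing BLAKE2 or OpenSSL API.

```python
# (segment size in bytes, threads used for that segment)
RAMP = [(512, 1), (1024, 2), (2048, 4)]
FULL_THREADS = 8  # everything after the ramp uses 8 threads


def thread_schedule(total_bytes, max_threads=FULL_THREADS):
    """Return a list of (offset, length, threads) segments for an input."""
    segments = []
    offset = 0
    for size, threads in RAMP:
        if offset >= total_bytes:
            break
        length = min(size, total_bytes - offset)
        segments.append((offset, length, min(threads, max_threads)))
        offset += length
    if offset < total_bytes:
        # The remainder of the input runs at full width (or the cap).
        rest = total_bytes - offset
        segments.append((offset, rest, min(FULL_THREADS, max_threads)))
    return segments


if __name__ == "__main__":
    for n in (32, 4096):
        print(n, thread_schedule(n))
```

With this plan a 32-byte input stays entirely in the single-thread
segment, while anything past the first 3584 bytes (512 + 1024 + 2048)
runs at the full width.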

Samuel Neves' SSE version is the one we all played with in the Password
Hashing Competition.  The speed is amazing.  Is there a faster version
available now?  Which version should we integrate into OpenSSL?

BLAKE2 rocks.  I'm looking forward to using it in many applications.

Bill
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blake2-sse.tar.gz
Type: application/x-gzip
Size: 9061 bytes
Desc: not available
URL: <http://mta.openssl.org/pipermail/openssl-dev/attachments/20150610/2eb9e482/attachment.bin>

