[openssl-users] BN_MUL_MONT for ARM64 v8
Vijay Chander
vijay.chander at gmail.com
Wed Feb 8 00:50:33 UTC 2017
Yes. Already took Andy's word from his previous replies for precisely this
reason.
GMP exercise was easy enough to get it out of the way.
Thanks,
Vijay
On Feb 7, 2017 4:46 PM, "Jakob Bohm" <jb-openssl at wisemo.com> wrote:
> OpenSSL also has a lot of handwritten assembly language for ARM,
> x86 etc. Most of it written by Andy Polyakov.
>
> His response about what can and cannot be done on various ARM CPU
> models is most probably a result of this work.
>
> Also, OpenSSL has a more permissive license than GMP, so using
> GMP in OpenSSL would cause problems for many applications that
> use OpenSSL.
>
> On 08/02/2017 00:31, Mike Mohr wrote:
>
>> Have you considered using GMP as a big integer backend for OpenSSL? It
>> has support for several ARM variants using handwritten assembly code,
>> and the developers go to great lengths to optimize runtime on all
>> supported platforms.
>>
>> On Feb 7, 2017 2:26 PM, "Vijay Chander" <vijay.chander at gmail.com> wrote:
>>
>> Andy,
>> 1:2.5 is pretty good in my opinion for ARM!
>>
>> We will check out Mongoose.
>>
>> Hmm - will try to get to the bottom of those cache misses (at a
>> lower priority).
>>
>> Thanks,
>> -vijay
>>
>>
>> On Tue, Feb 7, 2017 at 11:07 AM, Andy Polyakov <appro at openssl.org> wrote:
>>
>> > A72 is running at 1GHz compared to x86 at 2.1GHz. So that should
>> > hopefully get down to -1:5.
>>
>> And Mongoose will take you to ~1:2.5 (scaled to same frequency, that
>> is). Which I'd say is a fair result. Well, it still could have been a
>> bit better, but it's not unreasonable given ISA differences. Keep in
>> mind that the presented x86_64 result is for code utilizing
>> Intel-specific code extensions.
>>
>> > There is no L3 cache on the A72 eval board and performance counters
>> > do show 9x more DRAM accesses for ARM compared to x86.
>>
>> This is unexpected, because it takes *fewer* references to memory to
>> perform it on ARMv8, because it has a larger register bank. And the
>> cache requirement is not so high that L3 would kick in... But in any
>> case memory is not the bottleneck here...
>>
>>
>
> --
> Jakob Bohm, CIO, partner, WiseMo A/S. https://www.wisemo.com
> Transformervej 29, 2860 Soborg, Denmark. direct: +45 31 13 16 10
> This message is only for its intended recipient, delete if misaddressed.
> WiseMo - Remote Service Management for PCs, Phones and Embedded
>
>
> Enjoy
>
> Jakob
> --
> Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
> Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
> This public discussion message is non-binding and may contain errors.
> WiseMo - Remote Service Management for PCs, Phones and Embedded