[openssl-dev] Partially- vs. full- reduced inputs to ecp_nistz256_neg

Tue Aug 16 19:41:30 UTC 2016

Andy Polyakov <appro at openssl.org> wrote:
> And it's not only that multiplication (and squaring) result is fully
> reduced, it, multiplication (ans squaring) subroutine can actually
> manage partially reduced input. On related note one can point out that
> result of addition (and mul_by_[2|3]) is partially reduced. But it's
> multiplication's ability to handle it that ties things up. One should
> also remember that it always ends with multiplication when result is
> converted from Montgomery representation. As well as that it starts with
> multiplication when input is converted to Montgomery representation...

Andy, thanks! Let's rewind a bit:

Originally I was writing tests and the first function I tested the
reduction for was ecp_nistz256_add.

I was surprised to find that the result is not necessarily unreduced.
I emailed the original authors from Intel (Vlad and Shay) asking if
this was expected and whether any other functions return unreduced
results. Vlad's reply was that *all* of the functions are
expected/assumed to return unreduced inputs. I then followed up asking
specifically about whether the result of the multiplication could be
only partially reduced. I received another reply that the result of
the multiplication could be only partially reduced and that some code
I'd written for BoringSSL that assumes that the multiplication result
was always fully reduced was wrong.

After that, I re-read the code for the conditional subtraction at the
end of ecp_nistz256_mul_mont (__ecp_nistz256_mul_montq, actually) and
I couldn't convince myself that the result was always fully reduced.

My concern is that what you say and what Vlad said is contradictory.
You both understand the code way better than me, so I feel like I'm
not so useful in resolving the contradiction. But, I will try anyway:

    sbb $poly3, $acc1 # .Lpoly[3]
    sbb \$0, $acc2

    cmovc $t0, $acc4
    cmovc $t1, $acc5

My understand after talking with Vlad that the "sbb \$0, $acc2" makes
this equivalent to (r >= 2**256) ? (r - q) : r. If the "sbb \$0,
$acc2" line were removed then it would be equivalent to (r >= q) ? (r
- q) : r. My understanding is that the difference in semantics is
exactly the difference between partially reduced results and fully
reduced results.

Another way to see this is that there are 5 sbb instructions issued
for the conditional subtraction, which means 5 limbs are involved.
But, a full reduction mod q should only have 4 sbb instructions,
right?

Again, I could very well be horrifically misunderstanding things.
Perhaps it would be a good idea to ask Vlad to weigh in again?

Cheers,
Brian
-- 
https://briansmith.org/