[openssl] OpenSSL_1_1_1-stable update

nic.tuv at gmail.com
Sun Jan 5 08:22:23 UTC 2020


The branch OpenSSL_1_1_1-stable has been updated
       via  940c5888a2510403ae6178581d9280f0b8ef700b (commit)
       via  38be93f6bf86ef51fc832cadecaad8f7c1f8feb9 (commit)
       via  ec7c1fd322ede90783b626facc5252133a6dfa90 (commit)
       via  75738b9433185f44103073fd29ec63ae72e24e63 (commit)
       via  969ee511822bfdc5b25c7edbb9968d3746a402c4 (commit)
       via  6f52dbb68fb88d693b4e49a2a5d6638534cd8a79 (commit)
       via  1f60c1c788559ef69ee39ad490ba89b13733e1ca (commit)
      from  2c52a36400345d999c8ee3604fe1ea93fddd5cb7 (commit)


- Log -----------------------------------------------------------------
commit 940c5888a2510403ae6178581d9280f0b8ef700b
Author: Fangming.Fang <fangming.fang at arm.com>
Date:   Mon Dec 30 12:15:37 2019 +0000

    Fix side channel in ecp_nistz256-armv8.pl
    
    This change addresses a potential side-channel vulnerability in
    the internals of the nistz256 low-level operations for armv8.
    
    Reviewed-by: Nicola Tuveri <nic.tuv at gmail.com>
    Reviewed-by: Matt Caswell <matt at openssl.org>
    Reviewed-by: Bernd Edlinger <bernd.edlinger at hotmail.de>
    (Merged from https://github.com/openssl/openssl/pull/9239)
    
    (cherry picked from commit f5a659b6dfcc735a62c712dcca64d116d2289b97)

commit 38be93f6bf86ef51fc832cadecaad8f7c1f8feb9
Author: Bernd Edlinger <bernd.edlinger at hotmail.de>
Date:   Thu Aug 29 22:45:18 2019 +0200

    Fix side channel in the ecp_nistz256.c reference implementation
    
    This is only used if configured with
    ./config -DECP_NISTZ256_REFERENCE_IMPLEMENTATION
    
    Reviewed-by: Nicola Tuveri <nic.tuv at gmail.com>
    Reviewed-by: Matt Caswell <matt at openssl.org>
    (Merged from https://github.com/openssl/openssl/pull/9239)
    
    (cherry picked from commit 7d4716648e8348dea862e198b9395478fae01907)

commit ec7c1fd322ede90783b626facc5252133a6dfa90
Author: Bernd Edlinger <bernd.edlinger at hotmail.de>
Date:   Sun Aug 25 03:47:01 2019 +0200

    Improve side channel fix in ecp_nistz256-x86_64.pl
    
    Reviewed-by: Nicola Tuveri <nic.tuv at gmail.com>
    Reviewed-by: Matt Caswell <matt at openssl.org>
    (Merged from https://github.com/openssl/openssl/pull/9239)
    
    (cherry picked from commit e9fe87950db2e6169029b2ecf3ed09d64265bc9c)

commit 75738b9433185f44103073fd29ec63ae72e24e63
Author: Bernd Edlinger <bernd.edlinger at hotmail.de>
Date:   Sun Aug 25 03:45:31 2019 +0200

    Fix side channel in ecp_nistz256-armv4.pl
    
    Reviewed-by: Nicola Tuveri <nic.tuv at gmail.com>
    Reviewed-by: Matt Caswell <matt at openssl.org>
    (Merged from https://github.com/openssl/openssl/pull/9239)
    
    (cherry picked from commit 3d139746ca72f0906c036d0a4a3e176c7b61ed1b)

commit 969ee511822bfdc5b25c7edbb9968d3746a402c4
Author: Bernd Edlinger <bernd.edlinger at hotmail.de>
Date:   Sun Aug 25 03:03:55 2019 +0200

    Fix side channel in ecp_nistz256-x86.pl
    
    Reviewed-by: Nicola Tuveri <nic.tuv at gmail.com>
    Reviewed-by: Matt Caswell <matt at openssl.org>
    (Merged from https://github.com/openssl/openssl/pull/9239)
    
    (cherry picked from commit 0de3399b691f025153c8001045d5eeb0909dfd7a)

commit 6f52dbb68fb88d693b4e49a2a5d6638534cd8a79
Author: David Benjamin <davidben at google.com>
Date:   Fri Jun 14 17:06:52 2019 -0400

    Avoid leaking intermediate states in point doubling special case.
    
    Cherry picked from
    https://github.com/google/boringssl/commit/12d9ed670da3edd64ce8175cfe0e091982989c18
    
    Reviewed-by: Nicola Tuveri <nic.tuv at gmail.com>
    Reviewed-by: Matt Caswell <matt at openssl.org>
    Reviewed-by: Bernd Edlinger <bernd.edlinger at hotmail.de>
    (Merged from https://github.com/openssl/openssl/pull/9239)
    
    (cherry picked from commit 2baea7c7e0896658b74956cac6084dd7e82e8c1b)

commit 1f60c1c788559ef69ee39ad490ba89b13733e1ca
Author: Nicola Tuveri <nic.tuv at gmail.com>
Date:   Sat Jun 8 12:48:47 2019 +0300

    Fix potential SCA vulnerability in some EC_METHODs
    
    This commit addresses a potential side-channel vulnerability in the
    internals of some elliptic curve low-level operations.
    The side-channel leakage appears to be tiny, so the severity of this
    issue is rather low.
    
    The issue was reported by David Schrammel and Samuel Weiser.
    
    Reviewed-by: Matt Caswell <matt at openssl.org>
    Reviewed-by: Bernd Edlinger <bernd.edlinger at hotmail.de>
    (Merged from https://github.com/openssl/openssl/pull/9239)
    
    (cherry picked from commit 3cb914c463ed1c9e32cfb773d816139a61b6ad5f)

-----------------------------------------------------------------------

Summary of changes:
 crypto/ec/asm/ecp_nistz256-armv4.pl  | 79 +++++++++++++++---------------------
 crypto/ec/asm/ecp_nistz256-armv8.pl  | 65 +++++++++++++----------------
 crypto/ec/asm/ecp_nistz256-x86.pl    | 76 ++++++++++++++++------------------
 crypto/ec/asm/ecp_nistz256-x86_64.pl | 26 ++++--------
 crypto/ec/ecp_nistp224.c             | 35 ++++++++++++++--
 crypto/ec/ecp_nistp256.c             | 22 +++++++++-
 crypto/ec/ecp_nistp521.c             | 20 ++++++++-
 crypto/ec/ecp_nistz256.c             | 49 ++++++++++++++++++----
 8 files changed, 216 insertions(+), 156 deletions(-)
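
For context, the C changes below (ecp_nistp224.c, ecp_nistp256.c,
ecp_nistp521.c, ecp_nistz256.c) all apply the same transformation: the
"points are equal, so double instead of add" check no longer
short-circuits through &&/! on secret-derived flags; each flag is
computed unconditionally, the flags are combined with bitwise
operators, and only the single combined value is branched on. A
minimal sketch of that pattern, assuming the 0/-1 all-ones truth
convention used by ecp_nistp256.c and ecp_nistp521.c; limb,
is_zero_mask and the flag names here are stand-ins for illustration,
not the actual OpenSSL declarations:

    #include <assert.h>
    #include <stdint.h>

    typedef uint64_t limb;

    /* Stand-in helper: all-ones when x is zero, 0 otherwise, mirroring
     * the 0/-1 truth convention of ecp_nistp256.c/ecp_nistp521.c. */
    static limb is_zero_mask(limb x)
    {
        return (limb)0 - (limb)(x == 0);
    }

    int main(void)
    {
        /* Flags as they would be produced inside point_add(). */
        limb x_equal    = is_zero_mask(0);  /* ftmp  == 0 -> all-ones */
        limb y_equal    = is_zero_mask(0);  /* ftmp3 == 0 -> all-ones */
        limb z1_is_zero = is_zero_mask(5);  /* z1 != 0 -> 0           */
        limb z2_is_zero = is_zero_mask(7);  /* z2 != 0 -> 0           */

        /*
         * Before: if (x_equal && y_equal && !z1_is_zero && !z2_is_zero)
         * short-circuited (and therefore branched) on each flag.
         * After: combine everything with bitwise AND/NOT first, so the
         * one remaining branch depends only on the combined value.
         */
        limb points_equal = x_equal & y_equal & ~z1_is_zero & ~z2_is_zero;

        assert(points_equal != 0);  /* equal, finite points -> double */
        return 0;
    }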

diff --git a/crypto/ec/asm/ecp_nistz256-armv4.pl b/crypto/ec/asm/ecp_nistz256-armv4.pl
index 83abbdd895..f5e7a88163 100755
--- a/crypto/ec/asm/ecp_nistz256-armv4.pl
+++ b/crypto/ec/asm/ecp_nistz256-armv4.pl
@@ -1394,7 +1394,7 @@ my ($Z1sqr, $Z2sqr) = ($Hsqr, $Rsqr);
 # 256-bit vectors on top. Then note that we push
 # starting from r0, which means that we have copy of
 # input arguments just below these temporary vectors.
-# We use three of them for !in1infty, !in2intfy and
+# We use three of them for ~in1infty, ~in2infty and
 # result of check for zero.
 
 $code.=<<___;
@@ -1424,7 +1424,7 @@ ecp_nistz256_point_add:
 #endif
 	movne	r12,#-1
 	stmia	r3,{r4-r11}
-	str	r12,[sp,#32*18+8]	@ !in2infty
+	str	r12,[sp,#32*18+8]	@ ~in2infty
 
 	ldmia	$a_ptr!,{r4-r11}	@ copy in1_x
 	add	r3,sp,#$in1_x
@@ -1445,7 +1445,7 @@ ecp_nistz256_point_add:
 #endif
 	movne	r12,#-1
 	stmia	r3,{r4-r11}
-	str	r12,[sp,#32*18+4]	@ !in1infty
+	str	r12,[sp,#32*18+4]	@ ~in1infty
 
 	add	$a_ptr,sp,#$in2_z
 	add	$b_ptr,sp,#$in2_z
@@ -1510,33 +1510,20 @@ ecp_nistz256_point_add:
 	orr	$a0,$a0,$a2
 	orr	$a4,$a4,$a6
 	orr	$a0,$a0,$a7
-	orrs	$a0,$a0,$a4
+	orr	$a0,$a0,$a4		@ ~is_equal(U1,U2)
 
-	bne	.Ladd_proceed		@ is_equal(U1,U2)?
+	ldr	$t0,[sp,#32*18+4]	@ ~in1infty
+	ldr	$t1,[sp,#32*18+8]	@ ~in2infty
+	ldr	$t2,[sp,#32*18+12]	@ ~is_equal(S1,S2)
+	mvn	$t0,$t0			@ -1/0 -> 0/-1
+	mvn	$t1,$t1			@ -1/0 -> 0/-1
+	orr	$a0,$t0
+	orr	$a0,$t1
+	orrs	$a0,$t2			@ set flags
 
-	ldr	$t0,[sp,#32*18+4]
-	ldr	$t1,[sp,#32*18+8]
-	ldr	$t2,[sp,#32*18+12]
-	tst	$t0,$t1
-	beq	.Ladd_proceed		@ (in1infty || in2infty)?
-	tst	$t2,$t2
-	beq	.Ladd_double		@ is_equal(S1,S2)?
+	@ if(~is_equal(U1,U2) | in1infty | in2infty | ~is_equal(S1,S2))
+	bne	.Ladd_proceed
 
-	ldr	$r_ptr,[sp,#32*18+16]
-	eor	r4,r4,r4
-	eor	r5,r5,r5
-	eor	r6,r6,r6
-	eor	r7,r7,r7
-	eor	r8,r8,r8
-	eor	r9,r9,r9
-	eor	r10,r10,r10
-	eor	r11,r11,r11
-	stmia	$r_ptr!,{r4-r11}
-	stmia	$r_ptr!,{r4-r11}
-	stmia	$r_ptr!,{r4-r11}
-	b	.Ladd_done
-
-.align	4
 .Ladd_double:
 	ldr	$a_ptr,[sp,#32*18+20]
 	add	sp,sp,#32*(18-5)+16	@ difference in frame sizes
@@ -1601,15 +1588,15 @@ ecp_nistz256_point_add:
 	add	$b_ptr,sp,#$S2
 	bl	__ecp_nistz256_sub_from	@ p256_sub(res_y, res_y, S2);
 
-	ldr	r11,[sp,#32*18+4]	@ !in1intfy
-	ldr	r12,[sp,#32*18+8]	@ !in2intfy
+	ldr	r11,[sp,#32*18+4]	@ ~in1infty
+	ldr	r12,[sp,#32*18+8]	@ ~in2infty
 	add	r1,sp,#$res_x
 	add	r2,sp,#$in2_x
-	and	r10,r11,r12
+	and	r10,r11,r12		@ ~in1infty & ~in2infty
 	mvn	r11,r11
 	add	r3,sp,#$in1_x
-	and	r11,r11,r12
-	mvn	r12,r12
+	and	r11,r11,r12		@ in1infty & ~in2infty
+	mvn	r12,r12			@ in2infty
 	ldr	$r_ptr,[sp,#32*18+16]
 ___
 for($i=0;$i<96;$i+=8) {			# conditional moves
@@ -1617,11 +1604,11 @@ $code.=<<___;
 	ldmia	r1!,{r4-r5}		@ res_x
 	ldmia	r2!,{r6-r7}		@ in2_x
 	ldmia	r3!,{r8-r9}		@ in1_x
-	and	r4,r4,r10
+	and	r4,r4,r10		@ ~in1infty & ~in2infty
 	and	r5,r5,r10
-	and	r6,r6,r11
+	and	r6,r6,r11		@ in1infty & ~in2infty
 	and	r7,r7,r11
-	and	r8,r8,r12
+	and	r8,r8,r12		@ in2infty
 	and	r9,r9,r12
 	orr	r4,r4,r6
 	orr	r5,r5,r7
@@ -1656,7 +1643,7 @@ my $Z1sqr = $S2;
 # 256-bit vectors on top. Then note that we push
 # starting from r0, which means that we have copy of
 # input arguments just below these temporary vectors.
-# We use two of them for !in1infty, !in2intfy.
+# We use two of them for ~in1infty, ~in2infty.
 
 my @ONE_mont=(1,0,0,-1,-1,-1,-2,0);
 
@@ -1687,7 +1674,7 @@ ecp_nistz256_point_add_affine:
 #endif
 	movne	r12,#-1
 	stmia	r3,{r4-r11}
-	str	r12,[sp,#32*15+4]	@ !in1infty
+	str	r12,[sp,#32*15+4]	@ ~in1infty
 
 	ldmia	$b_ptr!,{r4-r11}	@ copy in2_x
 	add	r3,sp,#$in2_x
@@ -1714,7 +1701,7 @@ ecp_nistz256_point_add_affine:
 	it	ne
 #endif
 	movne	r12,#-1
-	str	r12,[sp,#32*15+8]	@ !in2infty
+	str	r12,[sp,#32*15+8]	@ ~in2infty
 
 	add	$a_ptr,sp,#$in1_z
 	add	$b_ptr,sp,#$in1_z
@@ -1796,15 +1783,15 @@ ecp_nistz256_point_add_affine:
 	add	$b_ptr,sp,#$S2
 	bl	__ecp_nistz256_sub_from	@ p256_sub(res_y, res_y, S2);
 
-	ldr	r11,[sp,#32*15+4]	@ !in1intfy
-	ldr	r12,[sp,#32*15+8]	@ !in2intfy
+	ldr	r11,[sp,#32*15+4]	@ ~in1infty
+	ldr	r12,[sp,#32*15+8]	@ ~in2infty
 	add	r1,sp,#$res_x
 	add	r2,sp,#$in2_x
-	and	r10,r11,r12
+	and	r10,r11,r12		@ ~in1infty & ~in2infty
 	mvn	r11,r11
 	add	r3,sp,#$in1_x
-	and	r11,r11,r12
-	mvn	r12,r12
+	and	r11,r11,r12		@ in1infty & ~in2infty
+	mvn	r12,r12			@ in2infty
 	ldr	$r_ptr,[sp,#32*15]
 ___
 for($i=0;$i<64;$i+=8) {			# conditional moves
@@ -1812,11 +1799,11 @@ $code.=<<___;
 	ldmia	r1!,{r4-r5}		@ res_x
 	ldmia	r2!,{r6-r7}		@ in2_x
 	ldmia	r3!,{r8-r9}		@ in1_x
-	and	r4,r4,r10
+	and	r4,r4,r10		@ ~in1infty & ~in2infty
 	and	r5,r5,r10
-	and	r6,r6,r11
+	and	r6,r6,r11		@ in1infty & ~in2infty
 	and	r7,r7,r11
-	and	r8,r8,r12
+	and	r8,r8,r12		@ in2infty
 	and	r9,r9,r12
 	orr	r4,r4,r6
 	orr	r5,r5,r7
diff --git a/crypto/ec/asm/ecp_nistz256-armv8.pl b/crypto/ec/asm/ecp_nistz256-armv8.pl
index 887ddfb1ea..e32d0d9344 100644
--- a/crypto/ec/asm/ecp_nistz256-armv8.pl
+++ b/crypto/ec/asm/ecp_nistz256-armv8.pl
@@ -722,7 +722,7 @@ $code.=<<___;
 .align	5
 ecp_nistz256_point_double:
 	.inst	0xd503233f		// paciasp
-	stp	x29,x30,[sp,#-80]!
+	stp	x29,x30,[sp,#-96]!
 	add	x29,sp,#0
 	stp	x19,x20,[sp,#16]
 	stp	x21,x22,[sp,#32]
@@ -855,7 +855,7 @@ ecp_nistz256_point_double:
 	add	sp,x29,#0		// destroy frame
 	ldp	x19,x20,[x29,#16]
 	ldp	x21,x22,[x29,#32]
-	ldp	x29,x30,[sp],#80
+	ldp	x29,x30,[sp],#96
 	.inst	0xd50323bf		// autiasp
 	ret
 .size	ecp_nistz256_point_double,.-ecp_nistz256_point_double
@@ -872,7 +872,7 @@ my ($res_x,$res_y,$res_z,
 my ($Z1sqr, $Z2sqr) = ($Hsqr, $Rsqr);
 # above map() describes stack layout with 12 temporary
 # 256-bit vectors on top.
-my ($rp_real,$ap_real,$bp_real,$in1infty,$in2infty,$temp)=map("x$_",(21..26));
+my ($rp_real,$ap_real,$bp_real,$in1infty,$in2infty,$temp0,$temp1,$temp2)=map("x$_",(21..28));
 
 $code.=<<___;
 .globl	ecp_nistz256_point_add
@@ -880,12 +880,13 @@ $code.=<<___;
 .align	5
 ecp_nistz256_point_add:
 	.inst	0xd503233f		// paciasp
-	stp	x29,x30,[sp,#-80]!
+	stp	x29,x30,[sp,#-96]!
 	add	x29,sp,#0
 	stp	x19,x20,[sp,#16]
 	stp	x21,x22,[sp,#32]
 	stp	x23,x24,[sp,#48]
 	stp	x25,x26,[sp,#64]
+	stp	x27,x28,[sp,#80]
 	sub	sp,sp,#32*12
 
 	ldp	$a0,$a1,[$bp,#64]	// in2_z
@@ -899,7 +900,7 @@ ecp_nistz256_point_add:
 	orr	$t2,$a2,$a3
 	orr	$in2infty,$t0,$t2
 	cmp	$in2infty,#0
-	csetm	$in2infty,ne		// !in2infty
+	csetm	$in2infty,ne		// ~in2infty
 	add	$rp,sp,#$Z2sqr
 	bl	__ecp_nistz256_sqr_mont	// p256_sqr_mont(Z2sqr, in2_z);
 
@@ -909,7 +910,7 @@ ecp_nistz256_point_add:
 	orr	$t2,$a2,$a3
 	orr	$in1infty,$t0,$t2
 	cmp	$in1infty,#0
-	csetm	$in1infty,ne		// !in1infty
+	csetm	$in1infty,ne		// ~in1infty
 	add	$rp,sp,#$Z1sqr
 	bl	__ecp_nistz256_sqr_mont	// p256_sqr_mont(Z1sqr, in1_z);
 
@@ -950,7 +951,7 @@ ecp_nistz256_point_add:
 
 	orr	$acc0,$acc0,$acc1	// see if result is zero
 	orr	$acc2,$acc2,$acc3
-	orr	$temp,$acc0,$acc2
+	orr	$temp0,$acc0,$acc2	// ~is_equal(S1,S2)
 
 	add	$bp,sp,#$Z2sqr
 	add	$rp,sp,#$U1
@@ -971,32 +972,21 @@ ecp_nistz256_point_add:
 
 	orr	$acc0,$acc0,$acc1	// see if result is zero
 	orr	$acc2,$acc2,$acc3
-	orr	$acc0,$acc0,$acc2
-	tst	$acc0,$acc0
-	b.ne	.Ladd_proceed		// is_equal(U1,U2)?
+	orr	$acc0,$acc0,$acc2	// ~is_equal(U1,U2)
 
-	tst	$in1infty,$in2infty
-	b.eq	.Ladd_proceed		// (in1infty || in2infty)?
+	mvn	$temp1,$in1infty	// -1/0 -> 0/-1
+	mvn	$temp2,$in2infty	// -1/0 -> 0/-1
+	orr	$acc0,$acc0,$temp1
+	orr	$acc0,$acc0,$temp2
+	orr	$acc0,$acc0,$temp0
+	cbnz	$acc0,.Ladd_proceed	// if(~is_equal(U1,U2) | in1infty | in2infty | ~is_equal(S1,S2))
 
-	tst	$temp,$temp
-	b.eq	.Ladd_double		// is_equal(S1,S2)?
-
-	eor	$a0,$a0,$a0
-	eor	$a1,$a1,$a1
-	stp	$a0,$a1,[$rp_real]
-	stp	$a0,$a1,[$rp_real,#16]
-	stp	$a0,$a1,[$rp_real,#32]
-	stp	$a0,$a1,[$rp_real,#48]
-	stp	$a0,$a1,[$rp_real,#64]
-	stp	$a0,$a1,[$rp_real,#80]
-	b	.Ladd_done
-
-.align	4
 .Ladd_double:
 	mov	$ap,$ap_real
 	mov	$rp,$rp_real
 	ldp	x23,x24,[x29,#48]
 	ldp	x25,x26,[x29,#64]
+	ldp	x27,x28,[x29,#80]
 	add	sp,sp,#32*(12-4)	// difference in stack frames
 	b	.Ldouble_shortcut
 
@@ -1081,14 +1071,14 @@ ___
 for($i=0;$i<64;$i+=32) {		# conditional moves
 $code.=<<___;
 	ldp	$acc0,$acc1,[$ap_real,#$i]	// in1
-	cmp	$in1infty,#0			// !$in1intfy, remember?
+	cmp	$in1infty,#0			// ~$in1intfy, remember?
 	ldp	$acc2,$acc3,[$ap_real,#$i+16]
 	csel	$t0,$a0,$t0,ne
 	csel	$t1,$a1,$t1,ne
 	ldp	$a0,$a1,[sp,#$res_x+$i+32]	// res
 	csel	$t2,$a2,$t2,ne
 	csel	$t3,$a3,$t3,ne
-	cmp	$in2infty,#0			// !$in2intfy, remember?
+	cmp	$in2infty,#0			// ~$in2intfy, remember?
 	ldp	$a2,$a3,[sp,#$res_x+$i+48]
 	csel	$acc0,$t0,$acc0,ne
 	csel	$acc1,$t1,$acc1,ne
@@ -1102,13 +1092,13 @@ ___
 }
 $code.=<<___;
 	ldp	$acc0,$acc1,[$ap_real,#$i]	// in1
-	cmp	$in1infty,#0			// !$in1intfy, remember?
+	cmp	$in1infty,#0			// ~$in1intfy, remember?
 	ldp	$acc2,$acc3,[$ap_real,#$i+16]
 	csel	$t0,$a0,$t0,ne
 	csel	$t1,$a1,$t1,ne
 	csel	$t2,$a2,$t2,ne
 	csel	$t3,$a3,$t3,ne
-	cmp	$in2infty,#0			// !$in2intfy, remember?
+	cmp	$in2infty,#0			// ~$in2intfy, remember?
 	csel	$acc0,$t0,$acc0,ne
 	csel	$acc1,$t1,$acc1,ne
 	csel	$acc2,$t2,$acc2,ne
@@ -1122,7 +1112,8 @@ $code.=<<___;
 	ldp	x21,x22,[x29,#32]
 	ldp	x23,x24,[x29,#48]
 	ldp	x25,x26,[x29,#64]
-	ldp	x29,x30,[sp],#80
+	ldp	x27,x28,[x29,#80]
+	ldp	x29,x30,[sp],#96
 	.inst	0xd50323bf		// autiasp
 	ret
 .size	ecp_nistz256_point_add,.-ecp_nistz256_point_add
@@ -1166,7 +1157,7 @@ ecp_nistz256_point_add_affine:
 	orr	$t2,$a2,$a3
 	orr	$in1infty,$t0,$t2
 	cmp	$in1infty,#0
-	csetm	$in1infty,ne		// !in1infty
+	csetm	$in1infty,ne		// ~in1infty
 
 	ldp	$acc0,$acc1,[$bp]	// in2_x
 	ldp	$acc2,$acc3,[$bp,#16]
@@ -1180,7 +1171,7 @@ ecp_nistz256_point_add_affine:
 	orr	$t0,$t0,$t2
 	orr	$in2infty,$acc0,$t0
 	cmp	$in2infty,#0
-	csetm	$in2infty,ne		// !in2infty
+	csetm	$in2infty,ne		// ~in2infty
 
 	add	$rp,sp,#$Z1sqr
 	bl	__ecp_nistz256_sqr_mont	// p256_sqr_mont(Z1sqr, in1_z);
@@ -1290,14 +1281,14 @@ ___
 for($i=0;$i<64;$i+=32) {		# conditional moves
 $code.=<<___;
 	ldp	$acc0,$acc1,[$ap_real,#$i]	// in1
-	cmp	$in1infty,#0			// !$in1intfy, remember?
+	cmp	$in1infty,#0			// ~$in1intfy, remember?
 	ldp	$acc2,$acc3,[$ap_real,#$i+16]
 	csel	$t0,$a0,$t0,ne
 	csel	$t1,$a1,$t1,ne
 	ldp	$a0,$a1,[sp,#$res_x+$i+32]	// res
 	csel	$t2,$a2,$t2,ne
 	csel	$t3,$a3,$t3,ne
-	cmp	$in2infty,#0			// !$in2intfy, remember?
+	cmp	$in2infty,#0			// ~$in2intfy, remember?
 	ldp	$a2,$a3,[sp,#$res_x+$i+48]
 	csel	$acc0,$t0,$acc0,ne
 	csel	$acc1,$t1,$acc1,ne
@@ -1314,13 +1305,13 @@ ___
 }
 $code.=<<___;
 	ldp	$acc0,$acc1,[$ap_real,#$i]	// in1
-	cmp	$in1infty,#0			// !$in1intfy, remember?
+	cmp	$in1infty,#0			// ~$in1intfy, remember?
 	ldp	$acc2,$acc3,[$ap_real,#$i+16]
 	csel	$t0,$a0,$t0,ne
 	csel	$t1,$a1,$t1,ne
 	csel	$t2,$a2,$t2,ne
 	csel	$t3,$a3,$t3,ne
-	cmp	$in2infty,#0			// !$in2intfy, remember?
+	cmp	$in2infty,#0			// ~$in2intfy, remember?
 	csel	$acc0,$t0,$acc0,ne
 	csel	$acc1,$t1,$acc1,ne
 	csel	$acc2,$t2,$acc2,ne
diff --git a/crypto/ec/asm/ecp_nistz256-x86.pl b/crypto/ec/asm/ecp_nistz256-x86.pl
index 0c6fc665bf..1a44c639d7 100755
--- a/crypto/ec/asm/ecp_nistz256-x86.pl
+++ b/crypto/ec/asm/ecp_nistz256-x86.pl
@@ -1388,7 +1388,7 @@ for ($i=0;$i<7;$i++) {
 
 	# above map() describes stack layout with 18 temporary
 	# 256-bit vectors on top, then we take extra words for
-	# !in1infty, !in2infty, result of check for zero and
+	# ~in1infty, ~in2infty, result of check for zero and
 	# OPENSSL_ia32cap_P copy. [one unused word for padding]
 	&stack_push(8*18+5);
 						if ($sse2) {
@@ -1419,7 +1419,7 @@ for ($i=0;$i<7;$i++) {
 	&sub	("eax","ebp");
 	&or	("ebp","eax");
 	&sar	("ebp",31);
-	&mov	(&DWP(32*18+4,"esp"),"ebp");	# !in2infty
+	&mov	(&DWP(32*18+4,"esp"),"ebp");	# ~in2infty
 
 	&lea	("edi",&DWP($in1_x,"esp"));
     for($i=0;$i<96;$i+=16) {
@@ -1441,7 +1441,7 @@ for ($i=0;$i<7;$i++) {
 	&sub	("eax","ebp");
 	&or	("ebp","eax");
 	&sar	("ebp",31);
-	&mov	(&DWP(32*18+0,"esp"),"ebp");	# !in1infty
+	&mov	(&DWP(32*18+0,"esp"),"ebp");	# ~in1infty
 
 	&mov	("eax",&DWP(32*18+12,"esp"));	# OPENSSL_ia32cap_P copy
 	&lea	("esi",&DWP($in2_z,"esp"));
@@ -1516,23 +1516,19 @@ for ($i=0;$i<7;$i++) {
 	&or	("eax",&DWP(0,"edi"));
 	&or	("eax",&DWP(4,"edi"));
 	&or	("eax",&DWP(8,"edi"));
-	&or	("eax",&DWP(12,"edi"));
+	&or	("eax",&DWP(12,"edi"));		# ~is_equal(U1,U2)
 
-	&data_byte(0x3e);			# predict taken
-	&jnz	(&label("add_proceed"));	# is_equal(U1,U2)?
-
-	&mov	("eax",&DWP(32*18+0,"esp"));
-	&and	("eax",&DWP(32*18+4,"esp"));
-	&mov	("ebx",&DWP(32*18+8,"esp"));
-	&jz	(&label("add_proceed"));	# (in1infty || in2infty)?
-	&test	("ebx","ebx");
-	&jz	(&label("add_double"));		# is_equal(S1,S2)?
+	&mov	("ebx",&DWP(32*18+0,"esp"));	# ~in1infty
+	&not	("ebx");			# -1/0 -> 0/-1
+	&or	("eax","ebx");
+	&mov	("ebx",&DWP(32*18+4,"esp"));	# ~in2infty
+	&not	("ebx");			# -1/0 -> 0/-1
+	&or	("eax","ebx");
+	&or	("eax",&DWP(32*18+8,"esp"));	# ~is_equal(S1,S2)
 
-	&mov	("edi",&wparam(0));
-	&xor	("eax","eax");
-	&mov	("ecx",96/4);
-	&data_byte(0xfc,0xf3,0xab);		# cld; stosd
-	&jmp	(&label("add_done"));
+	# if (~is_equal(U1,U2) | in1infty | in2infty | ~is_equal(S1,S2))
+	&data_byte(0x3e);			# predict taken
+	&jnz	(&label("add_proceed"));
 
 &set_label("add_double",16);
 	&mov	("esi",&wparam(1));
@@ -1614,34 +1610,34 @@ for ($i=0;$i<7;$i++) {
 	&lea	("edi",&DWP($res_y,"esp"));
 	&call	("_ecp_nistz256_sub");		# p256_sub(res_y, res_y, S2);
 
-	&mov	("ebp",&DWP(32*18+0,"esp"));	# !in1infty
-	&mov	("esi",&DWP(32*18+4,"esp"));	# !in2infty
+	&mov	("ebp",&DWP(32*18+0,"esp"));	# ~in1infty
+	&mov	("esi",&DWP(32*18+4,"esp"));	# ~in2infty
 	&mov	("edi",&wparam(0));
 	&mov	("edx","ebp");
 	&not	("ebp");
-	&and	("edx","esi");
-	&and	("ebp","esi");
-	&not	("esi");
+	&and	("edx","esi");			# ~in1infty & ~in2infty
+	&and	("ebp","esi");			# in1infty & ~in2infty
+	&not	("esi");			# in2infty
 
 	########################################
 	# conditional moves
     for($i=64;$i<96;$i+=4) {
-	&mov	("eax","edx");
+	&mov	("eax","edx");			# ~in1infty & ~in2infty
 	&and	("eax",&DWP($res_x+$i,"esp"));
-	&mov	("ebx","ebp");
+	&mov	("ebx","ebp");			# in1infty & ~in2infty
 	&and	("ebx",&DWP($in2_x+$i,"esp"));
-	&mov	("ecx","esi");
+	&mov	("ecx","esi");			# in2infty
 	&and	("ecx",&DWP($in1_x+$i,"esp"));
 	&or	("eax","ebx");
 	&or	("eax","ecx");
 	&mov	(&DWP($i,"edi"),"eax");
     }
     for($i=0;$i<64;$i+=4) {
-	&mov	("eax","edx");
+	&mov	("eax","edx");			# ~in1infty & ~in2infty
 	&and	("eax",&DWP($res_x+$i,"esp"));
-	&mov	("ebx","ebp");
+	&mov	("ebx","ebp");			# in1infty & ~in2infty
 	&and	("ebx",&DWP($in2_x+$i,"esp"));
-	&mov	("ecx","esi");
+	&mov	("ecx","esi");			# in2infty
 	&and	("ecx",&DWP($in1_x+$i,"esp"));
 	&or	("eax","ebx");
 	&or	("eax","ecx");
@@ -1668,7 +1664,7 @@ for ($i=0;$i<7;$i++) {
 
 	# above map() describes stack layout with 15 temporary
 	# 256-bit vectors on top, then we take extra words for
-	# !in1infty, !in2infty, and OPENSSL_ia32cap_P copy.
+	# ~in1infty, ~in2infty, and OPENSSL_ia32cap_P copy.
 	&stack_push(8*15+3);
 						if ($sse2) {
 	&call	("_picup_eax");
@@ -1698,7 +1694,7 @@ for ($i=0;$i<7;$i++) {
 	&sub	("eax","ebp");
 	&or	("ebp","eax");
 	&sar	("ebp",31);
-	&mov	(&DWP(32*15+0,"esp"),"ebp");	# !in1infty
+	&mov	(&DWP(32*15+0,"esp"),"ebp");	# ~in1infty
 
 	&lea	("edi",&DWP($in2_x,"esp"));
     for($i=0;$i<64;$i+=16) {
@@ -1724,7 +1720,7 @@ for ($i=0;$i<7;$i++) {
 	 &lea	("ebp",&DWP($in1_z,"esp"));
 	&sar	("ebx",31);
 	 &lea	("edi",&DWP($Z1sqr,"esp"));
-	&mov	(&DWP(32*15+4,"esp"),"ebx");	# !in2infty
+	&mov	(&DWP(32*15+4,"esp"),"ebx");	# ~in2infty
 
 	&call	("_ecp_nistz256_mul_mont");	# p256_sqr_mont(Z1sqr, in1_z);
 
@@ -1823,14 +1819,14 @@ for ($i=0;$i<7;$i++) {
 	&lea	("edi",&DWP($res_y,"esp"));
 	&call	("_ecp_nistz256_sub");		# p256_sub(res_y, res_y, S2);
 
-	&mov	("ebp",&DWP(32*15+0,"esp"));	# !in1infty
-	&mov	("esi",&DWP(32*15+4,"esp"));	# !in2infty
+	&mov	("ebp",&DWP(32*15+0,"esp"));	# ~in1infty
+	&mov	("esi",&DWP(32*15+4,"esp"));	# ~in2infty
 	&mov	("edi",&wparam(0));
 	&mov	("edx","ebp");
 	&not	("ebp");
-	&and	("edx","esi");
-	&and	("ebp","esi");
-	&not	("esi");
+	&and	("edx","esi");			# ~in1infty & ~in2infty
+	&and	("ebp","esi");			# in1infty & ~in2infty
+	&not	("esi");			# in2infty
 
 	########################################
 	# conditional moves
@@ -1848,11 +1844,11 @@ for ($i=0;$i<7;$i++) {
 	&mov	(&DWP($i,"edi"),"eax");
     }
     for($i=0;$i<64;$i+=4) {
-	&mov	("eax","edx");
+	&mov	("eax","edx");			# ~in1infty & ~in2infty
 	&and	("eax",&DWP($res_x+$i,"esp"));
-	&mov	("ebx","ebp");
+	&mov	("ebx","ebp");			# in1infty & ~in2infty
 	&and	("ebx",&DWP($in2_x+$i,"esp"));
-	&mov	("ecx","esi");
+	&mov	("ecx","esi");			# in2infty
 	&and	("ecx",&DWP($in1_x+$i,"esp"));
 	&or	("eax","ebx");
 	&or	("eax","ecx");
diff --git a/crypto/ec/asm/ecp_nistz256-x86_64.pl b/crypto/ec/asm/ecp_nistz256-x86_64.pl
index 533da506ad..676c908cff 100755
--- a/crypto/ec/asm/ecp_nistz256-x86_64.pl
+++ b/crypto/ec/asm/ecp_nistz256-x86_64.pl
@@ -3625,29 +3625,19 @@ $code.=<<___;
 	call	__ecp_nistz256_sub_from$x	# p256_sub(H, U2, U1);
 
 	or	$acc5, $acc4			# see if result is zero
+	or	$acc0, $acc4
+	or	$acc1, $acc4			# !is_equal(U1, U2)
+
+	movq	%xmm2, $acc0			# in1infty | in2infty
+	movq	%xmm3, $acc1			# !is_equal(S1, S2)
+
 	or	$acc0, $acc4
 	or	$acc1, $acc4
 
+	# if (!is_equal(U1, U2) | in1infty | in2infty | !is_equal(S1, S2))
 	.byte	0x3e				# predict taken
-	jnz	.Ladd_proceed$x			# is_equal(U1,U2)?
-	movq	%xmm2, $acc0
-	movq	%xmm3, $acc1
-	test	$acc0, $acc0
-	jnz	.Ladd_proceed$x			# (in1infty || in2infty)?
-	test	$acc1, $acc1
-	jz	.Ladd_double$x			# is_equal(S1,S2)?
+	jnz	.Ladd_proceed$x
 
-	movq	%xmm0, $r_ptr			# restore $r_ptr
-	pxor	%xmm0, %xmm0
-	movdqu	%xmm0, 0x00($r_ptr)
-	movdqu	%xmm0, 0x10($r_ptr)
-	movdqu	%xmm0, 0x20($r_ptr)
-	movdqu	%xmm0, 0x30($r_ptr)
-	movdqu	%xmm0, 0x40($r_ptr)
-	movdqu	%xmm0, 0x50($r_ptr)
-	jmp	.Ladd_done$x
-
-.align	32
 .Ladd_double$x:
 	movq	%xmm1, $a_ptr			# restore $a_ptr
 	movq	%xmm0, $r_ptr			# restore $r_ptr
diff --git a/crypto/ec/ecp_nistp224.c b/crypto/ec/ecp_nistp224.c
index 36edcd5130..5a3812d04e 100644
--- a/crypto/ec/ecp_nistp224.c
+++ b/crypto/ec/ecp_nistp224.c
@@ -907,6 +907,7 @@ static void point_add(felem x3, felem y3, felem z3,
     felem ftmp, ftmp2, ftmp3, ftmp4, ftmp5, x_out, y_out, z_out;
     widefelem tmp, tmp2;
     limb z1_is_zero, z2_is_zero, x_equal, y_equal;
+    limb points_equal;
 
     if (!mixed) {
         /* ftmp2 = z2^2 */
@@ -963,15 +964,41 @@ static void point_add(felem x3, felem y3, felem z3,
     felem_reduce(ftmp, tmp);
 
     /*
-     * the formulae are incorrect if the points are equal so we check for
-     * this and do doubling if this happens
+     * The formulae are incorrect if the points are equal, in affine coordinates
+     * (X_1, Y_1) == (X_2, Y_2), so we check for this and do doubling if this
+     * happens.
+     *
+     * We use bitwise operations to avoid potential side-channels introduced by
+     * the short-circuiting behaviour of boolean operators.
      */
     x_equal = felem_is_zero(ftmp);
     y_equal = felem_is_zero(ftmp3);
+    /*
+     * The special case of either point being the point at infinity (z1 and/or
+     * z2 are zero), is handled separately later on in this function, so we
+     * avoid jumping to point_double here in those special cases.
+     */
     z1_is_zero = felem_is_zero(z1);
     z2_is_zero = felem_is_zero(z2);
-    /* In affine coordinates, (X_1, Y_1) == (X_2, Y_2) */
-    if (x_equal && y_equal && !z1_is_zero && !z2_is_zero) {
+
+    /*
+     * Compared to `ecp_nistp256.c` and `ecp_nistp521.c`, in this
+     * specific implementation `felem_is_zero()` returns truth as `0x1`
+     * (rather than `0xff..ff`).
+     *
+     * This implies that `~true` in this implementation becomes
+     * `0xff..fe` (rather than `0x0`): for this reason, to be used in
+     * the if expression, we mask out only the last bit in the next
+     * line.
+     */
+    points_equal = (x_equal & y_equal & (~z1_is_zero) & (~z2_is_zero)) & 1;
+
+    if (points_equal) {
+        /*
+         * This is obviously not constant-time but, as mentioned before, this
+         * case never happens during single point multiplication, so there is no
+         * timing leak for ECDH or ECDSA signing.
+         */
         point_double(x3, y3, z3, x1, y1, z1);
         return;
     }
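
One file-specific subtlety noted in the ecp_nistp224.c comment above:
in this implementation felem_is_zero() reports truth as 0x1 rather
than as an all-ones mask, so ~"true" is 0xff..fe rather than 0, and
the combined flags are therefore masked with "& 1" before being used
in the if. A tiny illustration of that convention (plain C with
made-up values, not the real felem types):

    #include <assert.h>
    #include <stdint.h>

    typedef uint64_t limb;

    int main(void)
    {
        /* nistp224-style truth is 0x1, not an all-ones mask, so
         * ~"true" is 0xff..fe (still nonzero) and ~"false" is
         * 0xff..ff. */
        limb t = 1, f = 0;
        assert(~t != 0);          /* ~true is not a clean "false"...  */
        assert((~t & 1) == 0);    /* ...but its low bit is            */
        assert((~f & 1) == 1);

        /* Hence the combined flags are masked with "& 1" before the
         * if, keeping only the intended truth bit. */
        limb x_equal = 1, y_equal = 1, z1_is_zero = 0, z2_is_zero = 1;
        limb points_equal =
            (x_equal & y_equal & ~z1_is_zero & ~z2_is_zero) & 1;
        assert(points_equal == 0);  /* z2 at infinity: do not double  */
        return 0;
    }
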
diff --git a/crypto/ec/ecp_nistp256.c b/crypto/ec/ecp_nistp256.c
index 8265d574de..3c3769873e 100644
--- a/crypto/ec/ecp_nistp256.c
+++ b/crypto/ec/ecp_nistp256.c
@@ -1241,6 +1241,7 @@ static void point_add(felem x3, felem y3, felem z3,
     longfelem tmp, tmp2;
     smallfelem small1, small2, small3, small4, small5;
     limb x_equal, y_equal, z1_is_zero, z2_is_zero;
+    limb points_equal;
 
     felem_shrink(small3, z1);
 
@@ -1340,7 +1341,26 @@ static void point_add(felem x3, felem y3, felem z3,
     felem_shrink(small1, ftmp5);
     y_equal = smallfelem_is_zero(small1);
 
-    if (x_equal && y_equal && !z1_is_zero && !z2_is_zero) {
+    /*
+     * The formulae are incorrect if the points are equal, in affine coordinates
+     * (X_1, Y_1) == (X_2, Y_2), so we check for this and do doubling if this
+     * happens.
+     *
+     * We use bitwise operations to avoid potential side-channels introduced by
+     * the short-circuiting behaviour of boolean operators.
+     *
+     * The special case of either point being the point at infinity (z1 and/or
+     * z2 are zero), is handled separately later on in this function, so we
+     * avoid jumping to point_double here in those special cases.
+     */
+    points_equal = (x_equal & y_equal & (~z1_is_zero) & (~z2_is_zero));
+
+    if (points_equal) {
+        /*
+         * This is obviously not constant-time but, as mentioned before, this
+         * case never happens during single point multiplication, so there is no
+         * timing leak for ECDH or ECDSA signing.
+         */
         point_double(x3, y3, z3, x1, y1, z1);
         return;
     }
diff --git a/crypto/ec/ecp_nistp521.c b/crypto/ec/ecp_nistp521.c
index 14cd409162..b10f4c8672 100644
--- a/crypto/ec/ecp_nistp521.c
+++ b/crypto/ec/ecp_nistp521.c
@@ -1158,6 +1158,7 @@ static void point_add(felem x3, felem y3, felem z3,
     felem ftmp, ftmp2, ftmp3, ftmp4, ftmp5, ftmp6, x_out, y_out, z_out;
     largefelem tmp, tmp2;
     limb x_equal, y_equal, z1_is_zero, z2_is_zero;
+    limb points_equal;
 
     z1_is_zero = felem_is_zero(z1);
     z2_is_zero = felem_is_zero(z2);
@@ -1242,7 +1243,24 @@ static void point_add(felem x3, felem y3, felem z3,
     felem_scalar64(ftmp5, 2);
     /* ftmp5[i] < 2^61 */
 
-    if (x_equal && y_equal && !z1_is_zero && !z2_is_zero) {
+    /*
+     * The formulae are incorrect if the points are equal, in affine coordinates
+     * (X_1, Y_1) == (X_2, Y_2), so we check for this and do doubling if this
+     * happens.
+     *
+     * We use bitwise operations to avoid potential side-channels introduced by
+     * the short-circuiting behaviour of boolean operators.
+     *
+     * The special case of either point being the point at infinity (z1 and/or
+     * z2 are zero), is handled separately later on in this function, so we
+     * avoid jumping to point_double here in those special cases.
+     *
+     * Notice the comment below on the implications of this branching for timing
+     * leaks and why it is considered practically irrelevant.
+     */
+    points_equal = (x_equal & y_equal & (~z1_is_zero) & (~z2_is_zero));
+
+    if (points_equal) {
         /*
          * This is obviously not constant-time but it will almost-never happen
          * for ECDH / ECDSA. The case where it can happen is during scalar-mult
diff --git a/crypto/ec/ecp_nistz256.c b/crypto/ec/ecp_nistz256.c
index 6da62c3b91..1e45a81b76 100644
--- a/crypto/ec/ecp_nistz256.c
+++ b/crypto/ec/ecp_nistz256.c
@@ -358,16 +358,47 @@ static void ecp_nistz256_point_add(P256_POINT *r,
     ecp_nistz256_sub(H, U2, U1);                /* H = U2 - U1 */
 
     /*
-     * This should not happen during sign/ecdh, so no constant time violation
+     * The formulae are incorrect if the points are equal so we check for
+     * this and do doubling if this happens.
+     *
+     * Points here are in Jacobian projective coordinates (Xi, Yi, Zi)
+     * that are bound to the affine coordinates (xi, yi) by the following
+     * equations:
+     *     - xi = Xi / (Zi)^2
+     *     - y1 = Yi / (Zi)^3
+     *
+     * For the sake of optimization, the algorithm operates over
+     * intermediate variables U1, U2 and S1, S2 that are derived from
+     * the projective coordinates:
+     *     - U1 = X1 * (Z2)^2 ; U2 = X2 * (Z1)^2
+     *     - S1 = Y1 * (Z2)^3 ; S2 = Y2 * (Z1)^3
+     *
+     * It is easy to prove that is_equal(U1, U2) implies that the affine
+     * x-coordinates are equal, or either point is at infinity.
+     * Likewise is_equal(S1, S2) implies that the affine y-coordinates are
+     * equal, or either point is at infinity.
+     *
+     * The special case of either point being the point at infinity (Z1 or Z2
+     * is zero), is handled separately later on in this function, so we avoid
+     * jumping to point_double here in those special cases.
+     *
+     * When both points are inverse of each other, we know that the affine
+     * x-coordinates are equal, and the y-coordinates have different sign.
+     * Therefore since U1 = U2, we know H = 0, and therefore Z3 = H*Z1*Z2
+     * will equal 0, thus the result is infinity, if we simply let this
+     * function continue normally.
+     *
+     * We use bitwise operations to avoid potential side-channels introduced by
+     * the short-circuiting behaviour of boolean operators.
      */
-    if (is_equal(U1, U2) && !in1infty && !in2infty) {
-        if (is_equal(S1, S2)) {
-            ecp_nistz256_point_double(r, a);
-            return;
-        } else {
-            memset(r, 0, sizeof(*r));
-            return;
-        }
+    if (is_equal(U1, U2) & ~in1infty & ~in2infty & is_equal(S1, S2)) {
+        /*
+         * This is obviously not constant-time but it should never happen during
+         * single point multiplication, so there is no timing leak for ECDH or
+         * ECDSA signing.
+         */
+        ecp_nistz256_point_double(r, a);
+        return;
     }
 
     ecp_nistz256_sqr_mont(Rsqr, R);             /* R^2 */
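
The assembly changes follow the same idea at the register level: the
~in1infty/~in2infty words are kept as 0/-1 masks (csetm on armv8,
sar 31 on x86), flipped with MVN/NOT, and OR-ed together with the two
"is the subtraction result zero" words, so a single conditional branch
replaces the previous three; the explicit zeroing of the result for
the "inverse points" case is dropped because, as the comment above
explains, letting the addition continue yields H = 0 and hence
Z3 = H*Z1*Z2 = 0, i.e. the point at infinity. A rough C rendering of
that flag combination (stand-in names, not the generated assembly):

    #include <assert.h>
    #include <stdint.h>

    typedef uint64_t limb;

    /*
     * not_equal_u / not_equal_s are nonzero iff U1 != U2 / S1 != S2;
     * not_in1infty / not_in2infty are all-ones masks iff the
     * corresponding input is NOT the point at infinity.  The generic
     * addition path is taken unless both points are finite and equal
     * (only then is the doubling shortcut used).
     */
    static int must_proceed(limb not_equal_u, limb not_equal_s,
                            limb not_in1infty, limb not_in2infty)
    {
        limb flags = not_equal_u | ~not_in1infty | ~not_in2infty |
                     not_equal_s;

        return flags != 0;
    }

    int main(void)
    {
        /* Equal, finite inputs: fall through to the doubling path. */
        assert(must_proceed(0, 0, ~(limb)0, ~(limb)0) == 0);
        /* First input at infinity: take the generic addition path. */
        assert(must_proceed(0, 0, 0, ~(limb)0) != 0);
        return 0;
    }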

