[openssl-dev] #GP happens in do_sse3_after_all
Andy Polyakov
appro at openssl.org
Fri Oct 20 08:14:02 UTC 2017
Hi,
> I ran into an issue in crypto/chacha/chacha-x86_64.S. Could you be so
> kind as to have a look at it? Thanks very much.
>
> Currently it gets stuck in the function *do_sse3_after_all*, and a #GP
> occurs due to the following instruction:
>
> “movdqa %xmm0,0(%rsp)” needs 16-byte alignment. However, after going
> through the code in detail, I find that it already adjusts rsp with
> “subq $64+8,%rsp”. I simply tried changing it to “subq $64,%rsp”, and
> then it works correctly.
>
> I don’t know whether this is an issue. If I have made a mistake,
> please correct me. :)
>
> I suppose that “subq $64+8,%rsp” is used to align the stack to 16
> bytes, but in my case, if RSP is already 16-byte aligned to begin
> with, then after executing it the stack becomes only 8-byte aligned,
> so the #GP happens. :( So could you please help to check it?
All known x86_64 ABIs specify that the top of the stack is to be aligned
at 16 bytes. Obviously it can't be aligned at every given moment, not on
x86_64, so the question is *when* does it have to be aligned? It has to
be aligned at least at the moment of a call to another subroutine. Since
the x86_64 call instruction pushes the return address onto the stack,
this means that upon entry to a function the stack is actually
misaligned by 8 bytes. Hence a compliant function has to allocate a
16*n+8 frame. And that's what we see in the code, 64+8 in the referred
case. Now, if you experience a crash at the point in question, it can
only mean one thing: the caller is not compliant with the ABI.
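
To make the arithmetic concrete, here is a minimal sketch that models the stack pointer as a plain integer. The addresses and the helper names (rsp_after_prologue, movdqa_ok) are made up for illustration and are not taken from the OpenSSL sources; only the 8-byte return address, the 64+8 frame and the 16-byte requirement of movdqa come from the discussion above.

```python
# Model of the x86_64 stack pointer around a call; addresses are made up.
RET_ADDR_SIZE = 8    # the call instruction pushes an 8-byte return address
FRAME_SIZE = 64 + 8  # the "subq $64+8,%rsp" in the function in question


def rsp_after_prologue(rsp_at_call: int, frame_size: int = FRAME_SIZE) -> int:
    """rsp after 'call' pushed the return address and the callee
    subtracted its frame size."""
    rsp_on_entry = rsp_at_call - RET_ADDR_SIZE
    return rsp_on_entry - frame_size


def movdqa_ok(rsp: int) -> bool:
    """movdqa to 0(%rsp) needs a 16-byte aligned address, otherwise #GP."""
    return rsp % 16 == 0


compliant_caller = 0x7FFF_FFFF_E000         # 16-byte aligned at the call
noncompliant_caller = compliant_caller - 8  # off by 8 at the call

print(movdqa_ok(rsp_after_prologue(compliant_caller)))         # True
print(movdqa_ok(rsp_after_prologue(noncompliant_caller)))      # False
print(movdqa_ok(rsp_after_prologue(noncompliant_caller, 64)))  # True
print(movdqa_ok(rsp_after_prologue(compliant_caller, 64)))     # False
```

The third line is the reported experiment: with a caller that is off by 8, changing the frame to 64 happens to restore alignment. The fourth line shows that the same change would fault under a compliant caller.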

There is an ambiguity, though, and it might be wrong to blame the direct
caller, for the following reason. Customarily compilers don't explicitly
align the stack in each subroutine, but instead assume that the caller
aligned it. In other words, stack alignment is a kind of collective
effort, with each subroutine relying on its caller. So all subroutines
can be compliant and it would still be a problem. This would be the case
when the stack was *initially* misaligned [upon its creation].

To summarize, either one of the subroutines in the chain of calls
leading to ChaCha20_ctr32 is not compliant with the ABI, or the stack
was initially seeded misaligned.
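
The "collective effort" can be sketched the same way. The following is only an illustrative model, with a hypothetical call chain and made-up frame sizes: each function subtracts its own frame, its call pushes 8 more bytes, and the last function in the chain allocates the 64+8 frame and stores to 0(%rsp). The names rsp_at_leaf_store and entry_rsp are assumptions for this sketch; entry_rsp stands for rsp on entry to the outermost function, which under the ABI should be 8 modulo 16.

```python
LEAF_FRAME = 64 + 8  # frame of the function doing the movdqa store


def rsp_at_leaf_store(entry_rsp: int, frames: list[int]) -> int:
    """Walk a chain of calls and return rsp at the leaf's aligned store.

    entry_rsp is rsp on entry to the outermost function; frames holds the
    frame size each intermediate function allocates before calling on.
    """
    rsp = entry_rsp
    for frame in frames:
        rsp -= frame  # this function's own frame (16*n+8 if compliant)
        rsp -= 8      # return address pushed by its call instruction
    return rsp - LEAF_FRAME


good_seed = 0x7FFF_FFFF_E000 - 8  # 8 mod 16 on entry, as the ABI expects
bad_seed = 0x7FFF_FFFF_E000       # stack seeded misaligned

all_compliant = [24, 40, 8]  # every frame is 16*n+8
one_offender = [24, 32, 8]   # the middle function allocates 16*n

print(rsp_at_leaf_store(good_seed, all_compliant) % 16)  # 0, store is fine
print(rsp_at_leaf_store(good_seed, one_offender) % 16)   # 8, #GP
print(rsp_at_leaf_store(bad_seed, all_compliant) % 16)   # 8, #GP
```

The last two cases are indistinguishable at the crash site, which is why the direct caller is not necessarily the one to blame.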