[openssl-dev] bug in openssl 1.0.2a, s3_pkt.c ssl3_write_bytes()

Tue May 19 10:45:40 UTC 2015

On 18/05/15 23:58, George Wang wrote:
> The variable causing problem is
> 
> tot = 32768
> 
> tot should be less than "len", while, len = 332
> 
> analyzing the code change between 1.0.1m and 1.0.2a, the following code
> should be the source of the problem,
> 
>     /*
>      * first check if there is a SSL3_BUFFER still being written out.  This
>      * will happen with non blocking IO
>      */
>     if (wb->left != 0) {
>         i = ssl3_write_pending(s, type, &buf[tot], s->s3->wpend_tot);
>         if (i <= 0) {
>             /* XXX should we ssl3_release_write_buffer if i<0? */
>             s->s3->wnum = tot;
>             return i;
>         }
>         tot += i;               /* this might be last fragment */
>     }
> 
> 
> looks like ssl3_write_pending() return 16384, increase "tot" value by
> 16384, made it greater than "len", end up causing "n = 4294934860".
> 
> It only happens when there are pending data flushed. 1.0.1m code does
> not have the logic to flush pending data.

>From a user application perspective application data is being written
using SSL_write(). That ultimately gets passed down to this function,
ssl3_write_bytes(), where |buf| is the pointer to the user buffer and
|len| is the length of that buffer.

It may be that the length of the application data is too long to fit
into a single TLS record, and therefore we may have to split that data
up. The variable |tot| holds the amount of data that we have
successfully written out so far in complete records. The current record
being written is held in an SSL3_BUFFER structure, and it is that
current buffer that ssl3_write_pending() is trying to write out.

Of course with non-blocking IO at some point during the attempt to write
out data we may not be able to (i.e. where the write would block in
blocking IO). In that instance ssl3_write_pending() returns a value <=0
and control is returned to the user application. The user application
(at some later point) is expected to call SSL_write() again with exactly
the *same* arguments as the previous call, i.e. the same |buf| and the
same value for |len|.

The function ssl3_write_pending() only actually has two possible types
of return. It can return <=0 to indicate an error (which can also mean
that a non-blocking IO attempt would normally block), or it can return
the length of the pending data in the SSL3_BUFFER that it successfully
wrote out. This value is |s->s3->wpend_ret| which (for the purposes of
this discussion) always holds the same value as |s->s3->wpend_tot|.

When we return from the call to ssl3_write_pending() we add the amount
of data we wrote out to |tot|.

If in the call to ssl3_write_bytes() we discover that |wb->left != 0|
this means we are being called again after a previous call to
SSL_write() would have blocked, i.e. there is data still in the
SSL3_BUFFER to be written out. Therefore we call ssl3_write_pending() to
write out that data. We do some sanity checks along the way though.
First here:

    if (len < tot) {
        SSLerr(SSL_F_SSL3_WRITE_BYTES, SSL_R_BAD_LENGTH);
        return (-1);
    }

It should never be the case that we have written out more data than the
length of the application buffer passed in the call to SSL_write(). If
we have then something has gone wrong....almost certainly the
application has changed the value of |len| in the retry attempt.

Next there is this check within ssl3_write_pending():

    if ((s->s3->wpend_tot > (int)len)
        || ((s->s3->wpend_buf != buf) &&
            !(s->mode & SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER))
        || (s->s3->wpend_type != type)) {
        SSLerr(SSL_F_SSL3_WRITE_PENDING, SSL_R_BAD_WRITE_RETRY);
        return (-1);
    }

So this checks that the amount of data in the SSL3_BUFFER
|s->s3->wpend_tot| isn't greater than the length of the application
buffer left to write out. Again if this check fails then something has
gone wrong....also probably because the application has changed the
value of |len| in the retry attempt. The other checks are similar...we
check that the application hasn't changed its buffer location. The final
check is to ensure that the type of data hasn't changed (e.g. we've got
handshake data in the SSL3_BUFFER, but now we're trying to write
application data). If any of these sanity checks fail then we get an
SSL_R_BAD_WRITE_RETRY error returned.

*But* it looks like there is a bug here. The value of |len| in this last
set of checks is the third argument passed to ssl3_write_pending(). It
is supposed to represent the amount of application data that is in the
application buffer still to be written out. Our call to
ssl3_write_pending() looks like this:

        i = ssl3_write_pending(s, type, &buf[tot], s->s3->wpend_tot);

What we are actually passing is |s->s3->wpend_tot|. This is the amount
of data pending in the SSL3_BUFFER structure! It probably should look
like this:

        i = ssl3_write_pending(s, type, &buf[tot], len - tot);

Due to this problem it means that the sanity check for the length of
data in ssl3_write_pending() always passes. So, in the event that the
application calls SSL_write() again when it should be passing the same
values again, but instead passes a much reduced value for |len|, then
our sanity checks will not detect this issue. The result will be that
the return from ssl3_write_pending() will be too large and hence tot
will end up > len...eventually leading to a crash.

The attached patch for 1.0.2a corrects this issue. Please could you try
it out and see if it resolves your issue.

*If* all of the above is what is happening in your case *then* what this
means is that there is a second bug here, probably in the application
code, which means that SSL_write() is being called with different
arguments in a retry attempt. Most likely a previous call to SSL_write()
has failed to detect the error return and act on it accordingly. If
that's the case then hopefully this patch will help you track down where
it is occurring.

> 
> I hope someone really know how it should work can help to fix this.
> 
> 1.0.2a is not very stable, we see some other strange behaviors as well.

I'd be interested to know what other issues you are encountering.
Possibly they are related?

Hope that helps. Let me know how you get on,

Matt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bad-write-retry.patch
Type: text/x-patch
Size: 1324 bytes
Desc: not available
URL: <http://mta.openssl.org/pipermail/openssl-dev/attachments/20150519/9ab09fd4/attachment.bin>