ssl3_get_record:decryption failed on some machines

Mon Nov 18 20:33:47 UTC 2019

The writer is my own code, but I can also reproduce the problem when the server is nginx and the client is my app.

In my code I do not use OpenSSL socket BIOs; instead I do reads/writes through a BIO pair:

  pairBase = BIO_new(BIO_s_bio());
  pairInt  = BIO_new(BIO_s_bio());

  [...]

  BIO_make_bio_pair(pairBase, pairInt);

  [...]

  sslBIO = BIO_new_ssl(ssl_ctx, 1 /* Client */);

  [...]

  BIO_push(sslBIO, pairInt);

After each BIO_read/BIO_write on sslBIO I move any pending data between the network and pairBase.

I think I'm handling partial writes correctly:

  SSL_CTX_set_mode(ssl_ctx, SSL_MODE_AUTO_RETRY | SSL_MODE_ENABLE_PARTIAL_WRITE | SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER);

  [...]

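  ret = BIO_write(sslBIO, buf, (int)length);

  /* A non-positive result is fatal only if the BIO does not ask for a retry. */
  if (ret <= 0 && !BIO_should_retry(sslBIO))
  {
      /* Handle error */
      return;
  }

  /* Partial write: skip the bytes already accepted and keep the remainder. */
  if (ret > 0)
  {
      buf = ((uint8_t *)buf) + (size_t)ret;
      length -= (size_t)ret;
  }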

but again the problem reproduces even if the writer is nginx.

Thanks

On Mon, Nov 18, 2019 at 02:19:30PM -0500, Viktor Dukhovni wrote:
> > On Nov 18, 2019, at 1:44 PM, Fernando Gutierrez Mendez <fergtm at hyperion.io> wrote:
> > 
> > I use non-blocking IO with an SSL BIO, so a call to BIO_read eventually returns -1. When this happens I call BIO_should_retry to test whether this is due to an error or to the underlying non-blocking transport.
> 
> Is the writer side also non-blocking?  Is it your own code?
> 
> > This code works correctly, but after transferring between 1Mb and 5Mb (it varies every time) BIO_should_retry returns false and SSL_get_error returns SSL_ERROR_SSL. The error is "139964546914112:error:1408F119:SSL routines:ssl3_get_record:decryption failed or bad record mac:../ssl/record/ssl3_record.c:677"
> 
> One way to get a decryption integrity failure is for a non-blocking
> writer to handle partial writes incorrectly.  If, on an incomplete
> write, the writer resends the whole buffer rather than only what
> it failed to send last time, the TCP stream ends up stuttering
> ciphertext, and the reader sees data integrity errors.
> 
> This can be seen by looking for unexpected runs of repeated
> ciphertext in a PCAP capture of the data.
> 
> Whether the data sent to a particular reader ever ends up
> blocked at the TCP layer for a given writer can depend on
> various network-layer issues making some machines more
> prone to problems than others.
> 
> -- 
> 	Viktor.
>