Check NULL pointers or not...

Fri Nov 29 08:34:49 UTC 2019

The way I view the issue is to look at what happens when things go wrong -
what is the impact - and evaluate the difference in behaviour between the
approaches.
You have to start from the premise that (in general) software is not tested
for all possible usage models - i.e. test coverage isn't at 100% - so there
are always code paths and circumstances that can occur that simply haven't
been hit during testing.

That means that having a special assert enabled build doesn't solve the
problem - in that a gap in testing always remains - so the "middle ground"
doesn't alter the fundamentals as such and does also lead to basically
testing something other than what is running in a default (non-debug)
build. Testing precisely what you will actually run is a fairly normal
principle to work from - but many overlook those sorts of differences.

In a production environment, it is almost never appropriate to simply crash
in an uncontrolled manner (i.e. dereferencing a NULL pointer).
There are many contexts that generate real issues where a more graceful
failure mode (i.e. anything other than crashing) results in a much better
user outcome.
Denial-of-service attacks often rely on this sort of issue - basically
recognising that testing doesn't cover everything and finding unusual
circumstances that create some form of load on a system that impacts other
things.
You are effectively adding in a whole pile of abort() references in the
code by not checking - that is the actual effect - and abort isn't
something that should ever be called in production code.

The other rather practical thing is that when you do check for incoming
NULL pointers, you end up with a lot less reports of issues in a library -
as you aren't sitting in the call chain of a crash - and that in itself
saves a lot of time when dealing with users. Many people will report issues
like this - but if they get an error return rather than a crash they do
(generally) keep looking. And developer time for OpenSSL (like many
projects) is the most scarce resource that we have. Anything that reduces
the impact on looking at crashes enables people to perform more useful work
rather than debugging a coding error of an OpenSSL user.

Another simple argument that helps to make a choice is that whatever we do
we need to be consistent in our handling of things - and at the moment many
functions check and many functions do not.
And if we make things consistent we certainly don't really have the option
of undoing any of the null checks because that will break code - it is
effectively part of the API contract - that it is safe to call certain
functions with NULL arguments. Adding extra checks from an API contract
point of view is harmless, removing them isn't an option. And if we want
consistency then we pretty much have only one choice - to check everywhere.

There are *very few places* where such checks will have a substantial
performance impact in real-world contexts.

Arguments about the C runtime library not checking simply aren't relevant.
The C runtime library doesn't set the API contracts and usage model that we
use. And there are C runtime functions that do check.

What we should be looking at in my view is the impact when things go wrong
- and a position that basically says the caller isn't allowed to make
mistakes ever for simple things like this which are easy to check isn't
appropriate. That it does also reduce the burden on the project (and it
also increases the users perception of the quality because they are always
looking at their own bugs and not thinking that OpenSSL has a major issue
with an uncontrolled crash - even if they don't report it publicly).

Tim.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mta.openssl.org/pipermail/openssl-project/attachments/20191129/6ab081a3/attachment-0001.html>