Vote proposal: Technical items still to be done

Thu Oct 8 04:57:36 UTC 2020

On Wed, Oct 07, 2020 at 12:35:28PM +0100, Matt Caswell wrote:
> I had an action from the OTC meeting today to raise a vote on the OTC
> list of technical items still to be done. Here is my proposed vote text.
> There will be a subsequent vote on the "beta readiness checklist" which
> is a separate list.

Thanks, Matt; this is a fine list of desirable things.  (I tried to resist
commenting on specific items, but building the 1.1.1 apps+tests against 3.0
libraries is a solid way to help meet our stability goals, and should
arguably be something that run-checker just does, all the time.)

I've trimmed the list, though, to make a broader point, which could be
summarized as "the perfect is the enemy of the good".

It's a natural consequence for a software project that has long-term
supported releases, strong API stability guarantees, and infrequent
releases, to feel that each major release is a critical threshold and that
we need to do a bunch of tidying before the release to wrap it up nicely.
Paying off technical debt is often a bit part of the tidying that is
perceived necessary, as is attempting to be future looking -- missing the
release, after all, is the difference between needing to support some
bad/annoying thing for (say) 8 years instead of "only" 5.

There is value in doing this fixup, yes, but there is also cost in keeping
all the good things (and other fixup) that is already done out of the hands
of those who wish to consume it.  I've seen this phenomenon bite various
projects over time with effects of different magnitude, varying from
FreeBSD 5.0+SMP that had nearly existential consequences on the project, to
OpenAFS 1.6.0 where release delays produced enough frustration to lead to a
bit of a rush-job final release that was a bit unstable; I was lucky enough
to have missed the worst of this effect for krb5, and managed to do a
little better (but still not great) for OpenAFS 1.8.0.

Projects that get hit really badly by this phenomenon tend to correct for
it and end up on a fixed time-based cadence of releases, and in order to
stick to that cadence end up having to get comfortable shipping releases
without a desired feature or that are known to have incomplete parts of one
feature or another.  It also requires discipline to keep the main
development branch always (or nearly so) in a releasable state, but in my
opinion we are already doing a pretty good job of that with our policies
for code review and unit tests.  (We could, of course, do better about
monitoring the extended tests, run checker, and the like, but when we do
have regressions getting them fixed is not an invasive mess.)

To list some examples:

FreeBSD aims for new major ("dot zero") releases every two years.
MIT krb5 puts out new releases annually.
Google Chrome puts out releases every 6 weeks.
[Okay, I haven't exactly moved OpenAFS over to this schedule yet, but I did
just today cut a 1.9.0 and am going to try to put out regular 1.9.x
versions.]

I would urge the OTC (and OMC) to be careful around the pitfalls of making
the perfect the enemy of the good, and be willing to accept some level of
"uncleanliness" in the interest of getting all the good things we have
already done out there in production releases.  (And also trying to not
slip the published schedule too much.)  It is unpleasant, yes, but
sometimes it is the best choice overall.

Thanks,

Ben