notes on bitrot avoidance for on-disk data (including code + APIs)

As a long-term archival project, the choices we make for the
usability and accessibility of our data are of the utmost
importance.

While history is no guarantee of the future, it does seem to be an
important data point in choosing formats for data we hope will be
in use decades or centuries from now.  Data formats include the
programming languages and APIs of our implementation.

* git - great history of data compatibility since its first year of
  existence.  As a programming API, the only major plumbing change
  was the removal of the dashed `git-foo' form from the install path
  in the early years.

* SQLite 3 - good on-disk format and one of the few formats
  recommended by the Library of Congress[1].

  However, we only depend on its stability to maintain a stable,
  bidirectional mapping of Message-IDs to NNTP article numbers
  in msgmap.sqlite3.  lei uses it to maintain mail source mappings,
  but lei itself is not yet ready for reliably storing private mail.

  [1] https://www.loc.gov/preservation/digital/formats/fdd/fdd000461.shtml

* POSIX, Linux + *BSD kernel APIs - the only relevant OS APIs

  As good as it gets with no other practical choices available.

  When relying on the `syscall' perlop, be sure to hard code the
  actual numbers used for syscalls instead of relying on the
  symbolic name => number mapping at compilation time.  FreeBSD (and
  probably others) will assign different numbers to the same name
  across releases (e.g. SYS_kevent changed from 363 to 560, while
  SYS_freebsd11_kevent continues to map to 363 in FreeBSD 12+).

* Perl 5 - probably accidentally stable due to the focus on Perl 6
  (now Raku), but it seems to have the strongest record of backwards
  compatibility of all scripting languages suitable for systems and
  network programming on POSIX-like systems.  The scare we got from
  the Perl 7 proposal in 2020 will not be forgotten, however.
  Additional independent implementations would improve our trust
  of the language going forward.

* Xapian - A search index, not suitable for long-term archival (and
  it need not be).  There have been several DB format changes
  which required migrations across the years.  The Xapian Perl API
  has gone through incompatible changes migrating from XS to the
  SWIG API.  Its native API is C++, which seems to have its own
  share of bitrot problems from forward/backwards compatibility.

  We need to provide a migration/backup path for tags and labels in
  lei/store before lei can be trusted to store private mail.

  The behavior of the Xapian query parser does leak into public
  interfaces (lei, WWW) so unexpected changes can affect cronjobs,
  bookmarks, and such.  Fortunately, the query parser seems to
  have remained stable for many years.  This type of dependency
  appears unavoidable with any search engine which seeks to
  emulate the behavior of existing websites and tools (e.g.
  mairix(1) and notmuch(1)).

* POSIX shell - standardized by POSIX, but many tools are not and
  GNU-isms can creep in.  Perl is typically a nicer and more
  powerful language for anything longer than a few lines.

* C - Two major and several minor Free implementations supporting
  various standards with a reasonable history of forwards/backwards
  compatibility.  Build systems and non-POSIX dependencies are a
  significantly bigger bitrot problem than the language itself.
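
The advice above on hard-coding numbers for the `syscall' perlop
could be sketched roughly as follows.  This is only an illustration
and not the project's actual code: the %SYSCALL_NR table and the
syscall_nr() helper are hypothetical names, and the only number in it
is the FreeBSD kevent example quoted above.

```perl
use strict;
use warnings;

# Hypothetical table (an assumption for illustration).  The number is a
# literal taken from the FreeBSD example above; nothing is looked up
# from syscall.ph or <sys/syscall.h> when this file is compiled.
my %SYSCALL_NR = (
	freebsd => {
		# the number SYS_kevent had before it changed to 560,
		# which FreeBSD 12+ keeps as SYS_freebsd11_kevent
		kevent => 363,
	},
);

# Returns the hard-coded number for $name on $os (default: $^O).
# Dies on unknown platforms rather than guessing a number.
sub syscall_nr {
	my ($name, $os) = @_;
	$os = $^O unless defined $os;
	my $nr = $SYSCALL_NR{$os} ? $SYSCALL_NR{$os}->{$name} : undef;
	defined($nr) or die "no hard-coded number for $name on $os\n";
	$nr;
}

my $nr = eval { syscall_nr('kevent') };
print defined($nr) ? "kevent is syscall $nr\n" : "unsupported: $@";
```

A caller would then pass the result to syscall($nr, ...) with
arguments matching what that particular syscall number expects.
Dying on an unknown platform is a deliberate choice in this sketch:
a missing entry is easier to notice than a wrong number.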

Things to avoid:

* autoconf + automake - Several backwards and forwards compatibility
  problems in the past.  Use Perl 5 and possibly POSIX make, instead.

* newer Perl 5 features - We need to support users on LTS distros and
  will never encourage the use of 3rd-party or custom Perl installs.

* GNU (awk|make|*) - Stick to POSIX features as much as possible due
  to a few instances of backwards compatibility problems.  Perl's
  standard ExtUtils::MakeMaker does tend to use GNU-isms in the
  generated Makefile, unfortunately.

* bash - Use POSIX shell for portability, or use Perl.

* C++ - BDFL isn't smart enough to understand it, but it appears more
  subject to bitrot than C.  Avoid it unless required for small pieces
  such as the native Xapian API.  Compilation is slow and the language
  seems surprising to inexperienced users, so it's unpleasant to work
  with on old hardware.

* Markdown - 927 subtly incompatible flavors and counting!  perlpod(1)
  is more appropriate for manpages, but use plain UTF-8 text for
  everything else.
