How to forget everything

Recently I was studying the list of digital formats recommended by the Library of Congress for archiving data—meaning, formats that are likely (though not guaranteed) to be well supported in the future.

One of the many lurking horrors of the digital age: though it’s become much easier to publish and retrieve information in the present—e.g., we can put material on a website instead of printing & shipping a book—the long-term archiving of information has arguably become more difficult. That printed book, you can put on a shelf and it will probably last hundreds of years. The data, you can … do what, exactly?

Turning data into physical artifacts is possible. But other complications arise. Like many who owned a computer in the ’90s, I archived a lot of data using CD-R discs. At the time, these were projected to last 100–200 years. It’s turned out, however, that many of these discs are going bad after about 10 years. This means that many CD-R data archives will annihilate the information they were entrusted to protect. (I was able to recover all the data from my own pile of CD-Rs because I made multiple copies. But indeed, a number of individual discs had already failed.)
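As an illustration of why redundant copies help, here is a minimal sketch (in Python, with hypothetical paths) that hashes every file in two copies of an archive and reports any that disagree; whatever one copy has lost, the other can usually still supply.

```python
# Hypothetical sketch, not a record of what was actually done in the '90s:
# compare two copies of the same archive by hashing every file in each.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def compare_copies(copy_a: Path, copy_b: Path) -> list[str]:
    """Return relative paths that are missing or differ between two copies."""
    mismatches = []
    for file_a in sorted(p for p in copy_a.rglob("*") if p.is_file()):
        rel = file_a.relative_to(copy_a)
        file_b = copy_b / rel
        if not file_b.is_file() or sha256_of(file_a) != sha256_of(file_b):
            mismatches.append(str(rel))
    return mismatches

# Example (paths are placeholders):
# print(compare_copies(Path("archive/copy-1"), Path("archive/copy-2")))
```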

One major advantage of printed material is that it’s self-contained: you take a book from the shelf, and it works the way the creator intended. Not so for software. Any digital file needs an app to read it & display it; the app needs a certain operating system; the operating system needs certain hardware; the hardware is complicated and always physically deteriorating.

The Internet Archive has been taking this problem seriously for 20+ years. It’s maybe best known for the Wayback Machine, its historical archive of web pages. But it’s also been figuring out how to preserve other software. The best idea is to at least get rid of the hardware, and run the old operating system as a “virtual machine” on the current system (the Internet Arcade being one example). In this way, everything that is needed to run a file—the OS, the app, etc.—is packaged together, and thus, like a book, becomes something closer to self-contained.

As a software developer, I use virtual machines like these to test my work on older systems. I don’t want to keep an actual Windows XP machine in my office. But I can run it anytime from a hard drive attached to my Mac. It’s easy and it works.
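As a rough sketch of how self-contained that setup can be (using QEMU as a stand-in for whichever virtualization tool one prefers, with a made-up image path), the whole old system is a single disk-image file, and booting it is a single command:

```python
# A rough sketch, not necessarily the setup described above: boot an archived
# Windows XP disk image with QEMU. The image path and memory size are
# placeholders; the point is that the entire old system lives in one file
# that can sit on an external drive.
import subprocess
from pathlib import Path

DISK_IMAGE = Path("/Volumes/ArchiveDrive/winxp.img")  # hypothetical location

def boot_archived_system(image: Path, memory_mb: int = 512) -> None:
    """Boot a self-contained OS image in an emulator (requires QEMU to be installed)."""
    if not image.is_file():
        raise FileNotFoundError(f"disk image not found: {image}")
    subprocess.run(
        [
            "qemu-system-i386",    # 32-bit PC emulator
            "-m", str(memory_mb),  # guest RAM in MB
            "-hda", str(image),    # the archived image is the guest's hard drive
        ],
        check=True,
    )

if __name__ == "__main__":
    boot_archived_system(DISK_IMAGE)
```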

One of the less-noted side effects of the cloud-computing era is that this form of archivability is starting to disappear. When you run a cloud-connected piece of software, it’s dependent as usual on a certain operating system and certain hardware. But there’s also a new ingredient: a server elsewhere on the internet that determines whether the program can run. (Usually based on whether you have a subscription, etc.)

This outside server makes it impossible to archive the software. Why? First, the dependency on the outside server can’t be encapsulated within a virtual machine. Second, there’s no guarantee that the outside server will keep operating. (On the contrary: every server gets unplugged, eventually.) And because the software environment cannot be archived, the data files that rely on that environment also cannot be archived.

What does this mean for the culture and knowledge that’s captured in digital data? I’m far from a software-freedom absolutist. But I think the rise of subscription-based software is terrible for the longer arc of computing, because it forces so much to become temporary.

update, 344 days later

The idea of software longevity through minimalistic use of computing resources is starting to percolate upward. See, e.g., Collapse OS, permacomputing, and the Uxn stack.