Since I started using Lisp languages about 10 years ago, I’ve also become a fan of functional programming. I’ve taken these habits with me whenever I return to non-Lisp languages—e.g., JavaScript, Python, Swift. No longer will I write a loop in an imperative style when I can use map or forEach or filter.
As my projects have gotten bigger, however, I’ve discovered two recurring problems that resist the functional-programming idiom: what I’ll call the large-object problem and the parent–child problem. (Yes, the first one takes a hyphen; the second, an en dash.)
Interestingly, both these problems arise from the mismatch between the functional-programming model of the world and the underlying hardware reality of the computer.
The core principle of functional programming is to compose the program from functions that are maximally self-contained:
On input, all the information a function needs should be passed in as a list of arguments. Functions should not rely on state, meaning extra data maintained outside the function. For example: global variables.
On output, functions should not mutate (aka change) the arguments received as input, nor induce other side effects outside the function. Instead, functions should return a value that comprises fresh data. For instance, if a function needs to change the third element of a five-element list, it would return a newly constructed list. It would not change the existing list and return nothing.
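For instance, here’s that five-element-list rule in miniature, sketched in Swift (the function names are hypothetical):

```swift
// Pure: everything it needs comes in as arguments; fresh data comes out.
func replacingThird(in list: [Int], with value: Int) -> [Int] {
    var copy = list    // Swift arrays are values; the mutation below copies
    copy[2] = value
    return copy        // the caller’s original list is untouched
}

// Impure: mutates its argument in place and returns nothing.
func replaceThird(in list: inout [Int], with value: Int) {
    list[2] = value
}

let a = [1, 2, 3, 4, 5]
let b = replacingThird(in: a, with: 99)   // a is still [1, 2, 3, 4, 5]
```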
To be clear, there’s nothing inherently superior about functional programming. (Despite the fact that these self-contained functions are also known as pure functions.) It is not the way computers work at a hardware level. Though it is the way math works. Compared to imperative programming, functional programming is arguably a worse representation of low-level computing (meaning, the part that contends with the mechanical characteristics of the computer), but a better representation of high-level computing (meaning, the part that models ideas through data structures and algorithms).
In practice, functional programming manifests as a stylistic discipline. By making functions entirely self-contained as to input and output, it becomes easier to reason about the behavior of each individual function. And in turn, compositions of multiple functions. As the program grows in complexity, this per-function tidiness pays increasing dividends.
Functional-programming tutorials often focus on cute little values like numbers & strings and simple lists thereof. Sometimes you see an example of recursion using a tree-shaped data structure. Seems fancy, but that’s just what happens when you allow lists to contain other lists. Otherwise, not that different.
For simple scripts that handle simple data, that’s all there is to the functional-programming story. So it’s usually about here that these tutorials clap you on the back and say “best wishes on your journey”. The rest, apparently, is just details.
For a while, that is true.
But then you start trying more elaborate projects. I got my first sinking feeling while working on the bf language tutorial for my book Beautiful Racket. In bf, every program initializes a 30K byte array. All instructions in bf operate on one byte in the array.
Consistent with my loyalty to the functional-programming idiom, I designed my initial bf implementation so that each instruction would copy the existing bytes into a new array, change the byte of interest, return this new byte array, and abandon the old array (which would eventually be garbage-collected).
Did this implementation work? Yes. Did it run fast? No. It was glacial. After noodling with the code for a while, I had my galaxy-brain moment: maybe the incessant duplication and deletion of 30K data arrays had something to do with it? I refactored so that a single byte array was created for each run of a program—oh no, a global variable!—and individual bytes were mutated within that array as needed. Result? Faster. Mind? Blown. Surprising? No. Avoiding work has always been a great way to speed up a program.
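Schematically, the two designs looked something like this (a sketch in Swift rather than my original Racket, so not the real implementation):

```swift
// Functional style: every instruction copies the whole 30K tape.
func incrementingByte(in tape: [UInt8], at pointer: Int) -> [UInt8] {
    var fresh = tape       // copy-on-write: the next line duplicates 30,000 bytes
    fresh[pointer] &+= 1   // bf arithmetic wraps at 255
    return fresh           // the old tape awaits garbage collection
}

// Mutating style: one tape per program run (oh no, a global variable!)
var tape = [UInt8](repeating: 0, count: 30_000)
func incrementByte(at pointer: Int) {
    tape[pointer] &+= 1    // one byte changes; nothing is allocated
}
```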
Alan Perlis was adapting an Oscar Wilde epigram when he said that “a Lisp programmer knows the value of everything, but the cost of nothing.” This was exactly my mistake. Lured by the glittering beauty of functional programming, I had casually committed to creating and destroying 30K blocks of memory zillions of times per program. The value was that it allowed me to avoid state and mutation. But allocating and destroying memory isn’t free. So the cost turned out to be far greater than I could really afford.
Indeed, Perlis’s quip gestures toward the ultimate tension within all programming languages: how to connect the gossamer realm of human ideas to the grubby reality of a computing machine. Arguably the whole point of programming languages is to insulate us from those details, and allow us to express ourselves naturally. But the farther we drift from the reality of the machine, the less efficiently we can use its resources.
(Right—I understand that in the internet age, wasting computing resources has become the primary engine of the US economy. I predict, perhaps naively, that as the rate of improvement of processor and storage speeds continues to flatten out, there will be increasing interest in languages that allow more granular control of those resources.)
This is the heart of the large-object problem. Big, complex programs tend to create big, complex data structures to represent a certain slice of reality. Functional programming, however, suggests we should create and destroy these structures willy-nilly. But memory operations are never free. Therefore, this principle of functional programming will always become impractically wasteful for sufficiently large memory structures.
For instance, my Archetype app is a GUI editor for fonts (written in Swift). Under the hood, a font is a graph of smaller elements: glyph shapes of course, but also spacing data, layout instructions, and many other fiddly fields. Altogether, a font data structure might be multiple megabytes. When I change the width of a certain glyph—a single integer—it wouldn’t make sense to throw away the whole font structure and create a new one.
Thus, my technique within Archetype has been to prefer functional programming “in the small” (i.e., when dealing with simple values or small structures) and prefer mutation “in the large” (i.e., when dealing with bigger structures).
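In sketch form, the division of labor looks like this (hypothetical types, not Archetype’s actual code):

```swift
// “In the small”: pure functions over small value types.
struct Point {
    var x: Double
    var y: Double
}

func shifted(_ p: Point, by dx: Double) -> Point {
    Point(x: p.x + dx, y: p.y)   // cheap to copy; no reason to mutate
}

// “In the large”: the multi-megabyte font is a reference type,
// updated in place rather than rebuilt.
final class Font {
    var widths: [String: Int] = [:]

    func setWidth(_ width: Int, ofGlyph name: String) {
        widths[name] = width     // one integer changes; the megabytes stay put
    }
}
```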
Another bugaboo of fitting functional programming into a GUI app is that the GUI itself is a form of mutable global state. In Archetype, most changes to a font trigger some visible change in the GUI. Likewise, when the user makes changes in the GUI, it triggers an immediate update to the underlying font data. I couldn’t casually throw away a font data structure and regenerate it even if I wanted to, because there’s a whole apparatus on screen that depends on that particular structure remaining alive and available.
Which brings us to the doorstep of the other vexing problem—
Programming languages include numerous ways of creating custom data structures. Within the language, these may be called structs or objects or something else. But it boils down to the same idea: a collection of items that we can pass around our program as a single item.
The cost of this abstraction, however, is a layer of indirection: roughly, the language is allocating memory for the structure (and its collection of items) and returning a reference to that structure. This becomes the single item that we can use in the rest of the program. Each reference points to a specific chunk of memory.
But this structure reference behaves differently from other values we handle in the program. As programmers we tend to internalize these shifts between value semantics and reference semantics well enough that they become second nature.
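Swift, for one, makes the split explicit: structs get value semantics, classes get reference semantics.

```swift
// Value semantics: assignment copies the data.
struct ValuePoint { var x = 0 }
var a = ValuePoint()
var b = a      // b is an independent copy
b.x = 99       // a.x is still 0

// Reference semantics: assignment shares one chunk of memory.
final class ReferencePoint { var x = 0 }
let c = ReferencePoint()
let d = c      // d points at the same structure as c
d.x = 99       // c.x is now 99 too
```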
Once we have enough of these structures, we often find that we want to make a structure that refers to other structures—the parent–child relationship.
(I’m using the term parent–child because it’s a common name for this pattern. But it unhelpfully implies some generative connection between the two. There is none. It is just a way of hierarchically composing two structures with a cross-reference. Like the real world, a parent can refer to any number of children; unlike the real world, a child can also be referred to by any number of parents.)
For instance, in Archetype, each font contains an array of glyphs; in turn, each glyph contains an array of contours; in turn, each contour contains an array of points; and so on. This example is multilayered. But the problem arises even in the simplest case of one structure holding a reference to another.
Let’s suppose in Archetype that I want to define an operation on a glyph in a functional-programming style. How would I do it? Easy—I write a function that takes a glyph as input, allocates a new glyph with the new characteristics, and returns that as its result. The old glyph is released from memory.
That’s fine for a glyph standing in isolation. But in Archetype, every glyph is a child of a font. When we invoke this functional update on a child glyph, we get a new glyph in return, but the parent font still holds a reference to the old glyph.
In essence, the parent becomes external state that relies on the child. To complete our functional update of the child glyph, we have to also update the reference within the parent to the new glyph. That is, our delightful functional operation still incurs some imperative housekeeping.
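Here’s the trap in miniature (hypothetical types again):

```swift
final class Glyph {
    var width: Int
    init(width: Int) { self.width = width }
}

final class Font {
    var glyphs: [Glyph] = []
}

// The functional update: a fresh glyph out, the old glyph untouched.
func widened(_ glyph: Glyph, by amount: Int) -> Glyph {
    Glyph(width: glyph.width + amount)
}

let font = Font()
font.glyphs.append(Glyph(width: 500))

let newGlyph = widened(font.glyphs[0], by: 20)
// font.glyphs[0] still points at the old 500-unit glyph…
font.glyphs[0] = newGlyph   // …hence the imperative housekeeping
```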
“I know—let’s redefine our child operation to take both a parent and child as input, then we can functionally modify both.” Two problems with this:
It entangles an operation on a child structure with a parent structure. Though it may be worthwhile to tolerate this untidiness.
(Indeed, the idea of passing parent and child as arguments is pretty much how object methods are implemented in many languages: notation like parent.foo(child) is rewritten as foo(parent, child), and within the body of the function, any uses of this are rewritten as parent. Wait, did I just call OOP untidy? Yes. Though I can’t be the only functional programmer who struggles with OOP the way vegetarians struggle with bacon.)
The deeper lurking horror is that there may be any number of other things that hold a reference to the old glyph. After a functional update, all of them will retain their reference to the outdated glyph. That is, when we say “parent–child problem”, there could be any number of parents referring to a certain child. Whatever operation we contemplate on the child, all the parents have to be updated.
Functional programming has no solution for this. We end up having to adopt one of three patterns:
Parent–child links are bidirectional. That way, when a child is updated, it can trace back to all its parents and update them too. This is awful, because it creates even more entanglement between structures that were intended to be separate.
Children can issue notification events. This is like shooting a flare gun into the program, and other structures can take whatever action is appropriate. This is possibly better than the first option because children don’t have to know or care who their parents are. But it still requires introducing a new control mechanism into the program. It also pushes the program farther away from a functional orientation, because each notification event triggers side effects that can become hard to reason about.
I use notification events within Archetype. It was gnarlier than I supposed. One ends up building a second graph representing the flow of notification events, one that sort of, but not quite, matches the data graph, and then keeping the two graphs in sync as the data changes.
Or we can avoid all this complexity by mutating the existing child structure. The parents retain their existing references, but when they access the structure on the other end of that reference, they get the updated version.
In my experience, option (3) usually ends up being most attractive because it’s simple and contained. Though as that pattern spreads, we end up with less and less functional programming.
My best idea—which I have yet to widely adopt—is to introduce a proxy that I’ll call a mutable mediator. Each child has a mediator. Instead of parents pointing directly at a child, the parents point at the mediator, and the mediator points at the child.
When a functional update on a child structure occurs, the mediator is updated with the reference to the new child. But all the parents still point at the same mediator. So the next time they reference the child structure (via the mediator), they get the new version. In sum, the idea is that if some mutation is inevitable, we can at least centralize and simplify it.
Furthermore, the mediator does not preclude mutable access to the child. Any parent can just reach through the mediator to the child and change it. The change will immediately be visible to every parent also holding a reference to that mediator.
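A minimal sketch of the idea (simplified, and untested at scale):

```swift
// The mediator is a reference type; the child can stay a plain value.
final class Mediator<Child> {
    var child: Child
    init(_ child: Child) { self.child = child }
}

struct Glyph { var width: Int }

let mediator = Mediator(Glyph(width: 500))
let parentA = mediator   // every parent holds the mediator,
let parentB = mediator   // never the glyph itself

// Functional update: swap a fresh child in behind the mediator.
mediator.child = Glyph(width: 520)
print(parentA.child.width, parentB.child.width)   // 520 520

// Mutable access: reach through the mediator and change in place.
mediator.child.width = 540
print(parentB.child.width)                        // 540
```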
Why have I not adopted this? Ideally parents shouldn’t need to care whether a child is accessed directly or through a mediator. It should just be an implementation detail. But this means a mediator needs to be able to offer the same calling API as the structure it mediates. This is easier said than done, especially if we don’t want to create a whole army of mediators, one for each type of child, with a lot of boilerplate wrapper functions.
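In Swift, at least, dynamic member lookup gets partway there: one generic mediator can forward property access to any child type. But it doesn’t forward method calls or protocol conformances, which is roughly where the army of boilerplate wrappers starts mustering.

```swift
@dynamicMemberLookup
final class Mediator<Child> {
    var child: Child
    init(_ child: Child) { self.child = child }

    // Forward every property of the child through a key path.
    subscript<T>(dynamicMember keyPath: WritableKeyPath<Child, T>) -> T {
        get { child[keyPath: keyPath] }
        set { child[keyPath: keyPath] = newValue }
    }
}

struct Glyph { var width = 500 }

let m = Mediator(Glyph())
print(m.width)   // 500, forwarded through the key path
m.width = 520    // mutates the mediated glyph in place
```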
Here’s my hunch. (Which could be terrible.) Because the mutable mediator is essentially a pointer to a pointer—aka double pointer—it belongs to the realm of lower-level memory management. Thus it is probably best handled within the language compiler and exposed as a language feature, or just as the default way user-defined data structures are stored and accessed.
“Dude, that’s exactly the difference between how mutable and immutable structures are implemented in certain languages.” Maybe so. But AFAIK it’s never the case that a mutable structure, once created, can also be treated as immutable. Once we choose, we’re stuck with it. The mutable-mediator pattern, by contrast, would allow a structure to behave as both mutable and immutable. Duality.
I’m not a Rust user, but roughly I understand that its borrow checker lets you “check out” a memory object using either a mutable or immutable reference. If so, then my hunch isn’t completely terrible, because better programmers have also hatched the idea of maintaining the mutable–immutable duality. No idea, however, whether the Rust borrow checker resolves the parent–child problem. I gather not. Rather, it appears to ensure memory safety (by guaranteeing that only one mutable reference can circulate at a time). Cunningham’s Law is exceptionally strong for Rust, so I’m sure I’ll have an answer shortly.
And coming full circle—how do we solve the large-object problem?
I have no idea. As we saw earlier, baked into functional programming is the notion that for some sufficiently small size, memory allocation and destruction can be treated as if it were free. In practice, it’s easy to exceed that threshold faster than we expect.
The most productive angle would be to decompose large objects into smaller ones that are closer to that basically-free threshold. So instead of a binary blob representing a million foos, introduce a bar structure that holds a million individual foo structures that can be updated individually. Keep subdividing until you get to the structure size you like.
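In sketch form, with the foo and bar from the previous sentence:

```swift
struct Foo { var value: Int }   // small: cheap enough to copy freely

final class Bar {               // the parent holding a million foos
    var foos: [Foo]
    init(count: Int) {
        foos = Array(repeating: Foo(value: 0), count: count)
    }
}

let bar = Bar(count: 1_000_000)
// Updating one child creates one fresh Foo, not a fresh million…
bar.foos[42] = Foo(value: 99)   // …but notice: the parent had to be mutated
```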
Of course, that should sound familiar. We’ve merely transmuted the large-object problem into the parent–child problem. So solving the parent–child problem is the key.