Two vexing problems in functional programming

Since I started using Lisp languages about 10 years ago, I’ve also become a fan of functional programming. I’ve taken these habits with me whenever I return to non-Lisp languages—e.g., JavaScript, Python, Swift. No longer will I write a loop in an imperative style when I can use map or forEach or filter.

As my projects have gotten bigger, however, I’ve discovered two recurring problems that resist the functional-programming idiom: what I’ll call the large-object problem and the parent–child problem. (Yes, the first one takes a hyphen; the second, an en dash.)

Interestingly, both these problems arise from the mismatch between the functional-programming model of the world and the underlying hardware reality of the computer.

Functional programming, briefly

The core principle of functional programming is to compose the program from functions that are maximally self-contained: each function takes everything it needs through its input arguments, delivers everything it produces through its return value, and has no side effects.

To be clear, there’s nothing inherently superior about functional programming. (Despite the fact that these self-contained functions are also known as pure functions.) It is not the way computers work at a hardware level. Though it is the way math works. Compared to imperative programming, functional programming is arguably a worse representation of low-level computing (meaning, the part that contends with the mechanical characteristics of the computer), but a better representation of high-level computing (meaning, the part that models ideas through data structures and algorithms).

In practice, functional programming manifests as a stylistic discipline. By making functions entirely self-contained as to input and output, it becomes easier to reason about the behavior of each individual function. And in turn, compositions of multiple functions. As the program grows in complexity, this per-function tidiness pays increasing dividends.

The large-object problem

Functional-programming tutorials often focus on cute little values like numbers & strings and simple lists thereof. Sometimes you see an example of recursion using a tree-shaped data structure. Seems fancy, but that’s just what happens when you allow lists to contain other lists. Otherwise, not that different.

For simple scripts that handle simple data, that’s all there is to the functional-programming story. So it’s usually about here that these tutorials clap you on the back and say “best wishes on your journey”. The rest, apparently, is just details.

For a while, that is true.

But then you start trying more elaborate projects. I got my first sinking feeling while working on the bf language tutorial for my book Beautiful Racket. In bf, every program initializes a 30K byte array. All instructions in bf operate on one byte in the array.

Consistent with my loyalty to the functional-programming idiom, I designed my initial bf implementation so that each instruction would copy the existing bytes into a new array, change the byte of interest, return this new byte array, and abandon the old array (which would eventually be garbage-collected).

Did this implementation work? Yes. Did it run fast? No. It was glacial. After noodling with the code for a while, I had my galaxy-brain moment: maybe the incessant duplication and deletion of 30K data arrays had something to do with it? I refactored so that a single byte array was created for each run of a program—oh no, a global variable!—and individual bytes were mutated within that array as needed. Result? Faster. Mind? Blown. Surprising? No. Avoiding work has always been a great way to speed up a program.
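To make the contrast concrete, here’s a sketch in Python rather than my actual Racket implementation, with invented function names:

```python
TAPE_SIZE = 30_000  # bf's standard tape length

def increment_functional(tape: bytes, ptr: int) -> bytes:
    """Functional style: copy the whole tape, bump one byte, return the copy."""
    new_tape = bytearray(tape)            # a full 30K copy on every instruction
    new_tape[ptr] = (new_tape[ptr] + 1) % 256
    return bytes(new_tape)

def increment_mutating(tape: bytearray, ptr: int) -> None:
    """Mutating style: bump one byte in place; no allocation at all."""
    tape[ptr] = (tape[ptr] + 1) % 256

# Functional: every instruction allocates a new tape and abandons the old one.
tape = bytes(TAPE_SIZE)
tape = increment_functional(tape, 0)      # old tape becomes garbage

# Mutating: one tape per program run, updated in place.
mtape = bytearray(TAPE_SIZE)
increment_mutating(mtape, 0)
```

Either way the program computes the same bytes; the difference is that the functional version pays for a 30K allocation and a 30K copy on every instruction.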

Alan Perlis was adapting an Oscar Wilde epigram when he said that “a Lisp programmer knows the value of everything, but the cost of nothing.” This was exactly my mistake. Lured by the glittering beauty of functional programming, I had casually committed to creating and destroying 30K blocks of memory zillions of times per program. The value was that it allowed me to avoid state and mutation. But allocating and destroying memory isn’t free. So the cost turned out to be far greater than I could really afford.

Indeed, Perlis’s quip gestures toward the ultimate tension within all programming languages: how to connect the gossamer realm of human ideas to the grubby reality of a computing machine. Arguably the whole point of programming languages is to insulate us from those details, and allow us to express ourselves naturally. But the farther we drift from the reality of the machine, the less efficiently we can use its resources.

(Right—I understand that in the internet age, wasting computing resources has become the primary engine of the US economy. I predict, perhaps naively, that as the rate of improvement of processor and storage speeds continues to flatten out, there will be an increasing interest in languages that allow more granular control of those resources.)

This is the heart of the large-object problem. Big, complex programs tend to create big, complex data structures to represent a certain slice of reality. Functional programming, however, suggests we should create and destroy these structures willy-nilly. But memory operations are never free. Therefore, this principle of functional programming will always become impractically wasteful for sufficiently large memory structures.

For instance, my Archetype app is a GUI editor for fonts (written in Swift). Under the hood, a font is a graph of smaller elements: glyph shapes of course, but also spacing data, layout instructions, and many other fiddly fields. Altogether, a font data structure might be multiple megabytes. When I change the width of a certain glyph—a single integer—it wouldn’t make sense to throw away the whole font structure and create a new one.

Thus, my technique within Archetype has been to prefer functional programming “in the small” (i.e., when dealing with simple values or small structures) and prefer mutation “in the large” (i.e., when dealing with bigger structures).
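Archetype is written in Swift and its real types aren’t public, so here’s a toy Python sketch of that split (Glyph, Font, and set_width are invented names):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Glyph:
    name: str
    width: int           # a small value type: cheap to copy functionally

class Font:
    """The big structure: mutated in place rather than rebuilt."""
    def __init__(self, glyphs):
        self.glyphs = {g.name: g for g in glyphs}

    def set_width(self, name: str, width: int) -> None:
        # Functional "in the small": replace() allocates a fresh Glyph...
        new_glyph = replace(self.glyphs[name], width=width)
        # ...mutation "in the large": the Font updates its own slot in place.
        self.glyphs[name] = new_glyph

font = Font([Glyph("a", 500), Glyph("b", 520)])
font.set_width("a", 510)
```

The glyph stays immutable and easy to reason about; only the multi-megabyte container absorbs the mutation.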

Another bugaboo of fitting functional programming into a GUI app is that the GUI itself is a form of mutable global state. In Archetype, most changes to a font trigger some visible change in the GUI. Likewise, when the user makes changes in the GUI, it triggers an immediate update to the underlying font data. I couldn’t casually throw away a font data structure and regenerate it even if I wanted to, because there’s a whole apparatus on screen that depends on that particular structure remaining alive and available.

Which brings us to the doorstep of the other vexing problem—

The parent–child problem

Programming languages include numerous ways of creating custom data structures. Within the language, these may be called structs or objects or something else. But it boils down to the same idea: a collection of items that we can pass around our program as a single item.

The cost of this abstraction, however, is a layer of indirection: roughly, the language is allocating memory for the structure (and its collection of items) and returning a reference to that structure. This becomes the single item that we can use in the rest of the program. Each reference points to a specific chunk of memory.

But this structure reference behaves differently from other values we handle in the program: copying the reference doesn’t copy the underlying structure, so two copies of the reference still denote the same chunk of memory. As programmers we tend to internalize these shifts between value semantics and reference semantics well enough that they become second nature.
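For instance, in Python, numbers behave with value semantics while lists (like custom structures) are handled by reference:

```python
# Value semantics: assignment copies the value.
a = 5
b = a
b += 1           # a is unaffected

# Reference semantics: assignment copies the reference.
xs = [1, 2, 3]
ys = xs          # ys points at the same chunk of memory as xs
ys.append(4)     # the change is visible through xs too
```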

Once we have enough of these structures, we often find that we want to make a structure that refers to other structures—the parent–child relationship.

(I’m using the term parent–child because it’s a common name for this pattern. But it unhelpfully implies some generative connection between the two. There is none. It is just a way of hierarchically composing two structures with a cross-reference. Like the real world, a parent can refer to any number of children; unlike the real world, a child can also be referred to by any number of parents.)

For instance, in Archetype, each font contains an array of glyphs; in turn, each glyph contains an array of contours; in turn, each contour contains an array of points; and so on. This example is multilayered. But the problem arises even in the simplest case of one structure holding a reference to another.

Let’s suppose in Archetype that I want to define an operation on a glyph in a functional-programming style. How would I do it? Easy—I write a function that takes a glyph as input, allocates a new glyph with the new characteristics, and returns that as its result. The old glyph is released from memory.

That’s fine for a glyph standing in isolation. But in Archetype, every glyph is a child of a font. When we invoke this functional update on a child glyph, we get a new glyph in return, but the parent font still holds a reference to the old glyph.

In essence, the parent becomes external state that relies on the child. To complete our functional update of the child glyph, we have to also update the reference within the parent to the new glyph. That is, our delightful functional operation still incurs some imperative housekeeping.
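A toy Python sketch of that mismatch (Glyph and Font are stand-ins, not Archetype’s real types):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Glyph:
    name: str
    width: int

class Font:
    def __init__(self, glyphs):
        self.glyphs = list(glyphs)

old = Glyph("a", 500)
font = Font([old])

# The functional update: a brand-new glyph comes back...
new = replace(old, width=510)

# ...but the parent still points at the old one.
assert font.glyphs[0] is old

# The imperative housekeeping: patch the parent's reference by hand.
font.glyphs[font.glyphs.index(old)] = new
```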

“I know—let’s redefine our child operation to take both a parent and child as input, then we can functionally modify both.” Two problems with this:

  1. It entangles an operation on a child structure with a parent structure. Though it may be worthwhile to tolerate this untidiness.

    (Indeed, the idea of passing parent and child as arguments is pretty much how object methods are implemented in many languages: notation like parent.foo(child) is rewritten as foo(parent, child), and within the body of the function, any uses of this are rewritten as parent. Wait, did I just call OOP untidy? Yes. Though I can’t be the only functional programmer who struggles with OOP the way vegetarians struggle with bacon.)

  2. The deeper lurking horror is that there may be any number of other things that hold a reference to the old glyph. After a functional update, all of them will retain their reference to the outdated glyph. That is, when we say “parent–child problem” there could be any number of parents referring to a certain child. Whatever operation we contemplate on the child, all the parents have to be updated.
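The rewriting mentioned in point 1 is visible in Python, where a method is just an ordinary function whose first parameter (self, i.e., this) is the parent. A tiny sketch with invented names:

```python
class Parent:
    def __init__(self):
        self.children = []

    def adopt(self, child):         # "this" is the explicit first parameter
        self.children.append(child)

p = Parent()
p.adopt("glyph")             # the sugared method call...
Parent.adopt(p, "contour")   # ...is the same as foo(parent, child)
```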

Functional programming has no solution for this. We end up having to adopt one of three patterns:

  1. Parent–child links are bidirectional. That way, when a child is updated, it can trace back to all its parents and update them too. This is awful, because it creates even more entanglement between structures that were intended to be separate.

  2. Children can issue notification events. This is like shooting a flare gun into the program, and other structures can take whatever action is appropriate. This is possibly better than the first option because children don’t have to know or care who their parents are. But it still requires introducing a new control mechanism into the program. It also pushes the program farther away from a functional orientation, because each notification event triggers side effects that can become hard to reason about.

    I use notification events within Archetype. It was gnarlier than I supposed. One ends up needing to build a new graph representing the flow of notification events that sort of but not quite matches the data graph, and keep these in sync as the data changes.

  3. Or we can avoid all this complexity by mutating the existing child structure. The parents retain their existing references, but when they access the structure on the other end of that reference, they get the updated version.

In my experience, option (3) usually ends up being most attractive because it’s simple and contained. Though as that pattern spreads, we end up with less and less functional programming.
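A minimal Python sketch (invented names, not Archetype’s actual mechanism) of options (2) and (3) working together: the child mutates in place, then fires notification events so anything depending on it, such as a GUI, can react:

```python
class Glyph:
    def __init__(self, width):
        self.width = width
        self._observers = []        # the "flare gun": anyone may subscribe

    def subscribe(self, callback):
        self._observers.append(callback)

    def set_width(self, width):
        self.width = width          # option (3): mutate in place...
        for notify in self._observers:
            notify(self)            # ...option (2): fire a notification

log = []
g = Glyph(500)
g.subscribe(lambda glyph: log.append(glyph.width))
g.set_width(510)
```

Parents keep their references valid for free; the price is side effects rippling out of every update.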

How could the parent–child problem be solved?

My best idea—which I have yet to widely adopt—is to introduce a proxy that I’ll call a mutable mediator. Each child has a mediator. Instead of parents pointing directly at a child, the parents point at the mediator, and the mediator points at the child.

When a functional update on a child structure occurs, the mediator is updated with the reference to the new child. But all the parents still point at the same mediator. So the next time they reference the child structure (via the mediator), they get the new version. In sum, the idea is that if some mutation is inevitable, we can at least centralize and simplify it.

Furthermore, the mediator does not preclude mutable access to the child. Any parent can just reach through the mediator to the child and change it. The change will immediately be visible to every parent also holding a reference to that mediator.
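Here’s a minimal Python sketch of the mutable-mediator idea; the Mediator class, its swap method, and the glyph dictionaries are all invented for illustration:

```python
class Mediator:
    """Pointer-to-pointer: parents hold the mediator, never the child."""
    def __init__(self, child):
        self.child = child

    def swap(self, new_child):
        # The one sanctioned mutation: repoint at the new child.
        self.child = new_child

glyph_v1 = {"name": "a", "width": 500}
med = Mediator(glyph_v1)

# Two parents share the same mediator.
font = {"glyphs": [med]}
layout = {"line": [med]}

# Functional update: build a new child, then swap it into the mediator.
glyph_v2 = {**glyph_v1, "width": 510}
med.swap(glyph_v2)

# Both parents now see the new version through the shared mediator,
# without either parent being touched.
```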

Why have I not adopted this? Ideally parents shouldn’t need to care whether a child is accessed directly or through a mediator. It should just be an implementation detail. But this means a mediator needs to be able to offer the same calling API as the structure it mediates. This is easier said than done, especially if we don’t want to create a whole army of mediators, one for each type of child, with a lot of boilerplate wrapper functions.
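That said, in a dynamic language the boilerplate can be dodged with generic forwarding. In Python, for instance, __getattr__ lets a single mediator class impersonate any child type; a statically typed language like Swift makes this much harder:

```python
class Mediator:
    def __init__(self, child):
        self._child = child

    def swap(self, new_child):
        self._child = new_child

    def __getattr__(self, name):
        # Called only for attributes not found on the mediator itself:
        # forward the lookup to the current child, so parents can't tell
        # a mediator from the real structure.
        return getattr(self._child, name)

class Glyph:
    def __init__(self, width):
        self.width = width

med = Mediator(Glyph(500))
assert med.width == 500      # same calling API as the child itself
med.swap(Glyph(510))
```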

Here’s my hunch. (Which could be terrible.) Because the mutable mediator is essentially a pointer to a pointer—aka a double pointer—it belongs to the realm of lower-level memory management. Thus it is probably best handled within the language compiler and exposed as a language feature, or just the default way user-defined data structures are stored and accessed.

“Dude, that’s exactly the difference between how mutable and immutable structures are implemented in certain languages.” Maybe so. But AFAIK it’s never the case that a mutable structure, once created, can also be treated as immutable. Once we choose, we’re stuck with it. The mutable-mediator pattern, by contrast, would allow a structure to behave as both mutable and immutable. Duality.

I’m not a Rust user, but roughly I understand that its borrow checker lets you “check out” a memory object using either a mutable or immutable reference. If so, then my hunch isn’t completely terrible, because better programmers have also hatched the idea of maintaining the mutable–immutable duality. No idea, however, whether the Rust borrow checker resolves the parent–child problem. I gather not. Rather, it appears to ensure memory safety (by guaranteeing that only one mutable reference can circulate at a time). Cunningham’s Law is exceptionally strong for Rust, so I’m sure I’ll have an answer shortly.

And coming full circle—

How could the large-object problem be solved?

I have no idea. As we saw earlier, baked into functional programming is the notion that for some sufficiently small size, memory allocation and destruction can be treated as if it were free. In practice, it’s easy to exceed that threshold faster than we expect.

The most productive angle would be to decompose large objects into smaller ones that are closer to that basically-free threshold. So instead of a binary blob representing a million foos, introduce a bar structure that holds a million individual foo structures that can be updated individually. Keep subdividing until you get to the structure size you like.
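A sketch of that subdivision in Python, with foo and bar promoted to invented Foo and Bar classes:

```python
# Before: one big binary blob; a functional update to any foo
# means copying the whole thing.
blob = bytearray(1_000_000)

# After: a bar holding many individually-updatable foo structures.
class Foo:
    def __init__(self, value=0):
        self.value = value

class Bar:
    def __init__(self, n):
        self.foos = [Foo() for _ in range(n)]

bar = Bar(1000)           # subdivide as far as you like
bar.foos[42].value = 7    # update one foo without touching the rest
```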

Of course, that should sound familiar. We’ve merely transmuted the large-object problem into the parent–child problem. So solving the parent–child problem is the key.