I'm implementing something similar to a Spreadsheet engine in Haskell.
ETables, which have rows of cells containing expressions in the form of ASTs (e.g.
BinOp + 2 2), which can contain references to other cells of
The main function should convert these
VTables, which contain a fully resolved value in the cell (e.g. the cell
BinOp + 2 2 should be resolved to
IntValue 4). This is pretty easy when cells have no external references, because you can just build the value bottom up from the expression AST of the cell (e.g.
eval (BinOpExpr op l r) = IntValue $ (eval l) op (eval r), minus unboxing and typechecking) all the way to the table (
evalTable = (map . map) eval rows)
However, I can't think of a "natural" way of handling this when external references are thrown into the mix. Am I correct to assume that I can't just call
eval on the referenced cell and use its value, because Haskell is not smart enough to cache the result and re-use it when that cell is independently evaluated?
The best thing I came up with is using a
State [VTable] which is progressively filled, so the caching is explicit (each eval call updates the state with the return value before returning). This should work, however it feels "procedural". Is there a more idiomatic approach available that I'm missing?
Haskell doesn't memoization by default because that would generally take up too much memory, so you can't just rely on
eval doing the right thing. However, the nature of lazy evaluation means that data structures are, in a sense, memoized: each thunk in a large lazy structure is only evaluated once. This means that you can memoize a function by defining a large lazy data structure internally and replacing recursive calls with accesses into the structure—each part of the structure will be evaluated at most once.
I think the most elegant approach to model your spreadsheet would be a large, lazy directed graph with the cells as nodes and references as edges. Then you would need to define the
VTable graph in a recursive way such that all recursion goes through the graph itself, which will memoize the result in the way I described above.
There are a couple of handy ways to model a graph. One option would be to use an explicit map with integers as node identifiers—
IntMap or even an array of some sort could work. Another option is to use an existing graph library; this will save you some work and ensure you have a nice graph abstraction, but will take some effort up front to understand. I'm a big fan of the fgl, the "functional graph library", but it does take a bit of up-front reading and thinking to understand. The performance isn't going to be very different because it's also implemented in terms of
Tooting my own horn a bit, I've written a couple of blog posts expanding on this answer: one about memoization with lazy structures (with pictures!) and one about the functional graph library. Putting the two ideas together should get you what you want, I believe.