I need to write a plugin for an application which is extensible using Python 2.7. It needs to execute a rather complex dynamic algorithm which works on a rectengular matrix of integers.
The default Python installation that comes with that application does not include a numeric library like numpy, so unfortunately I have to implement this using only the Python stdlib.
I tried several different approaches to represent the matrix in memory:
values = defaultdict(int)
values = [[0 for _ in range(width)] for _ in range(height)]
values =  * (width * height) # access like values[j*width + i] later
values = [ * width for _ in range(height)]
The dict approach is only there for completeness, it's actually not very beneficial because every element is accessed.
From my measurements, the last of those seems the fastest to build and access. However, I am surprised that there is no built-in matrix functionality. From what I have learned about Python so far, if you don't find some obvious functionality in the stdlib, the most likely reason is that you haven't looked hard enough.
So I wonder whether this can be optimized even further, for example using the
array module or some other feature that I'm not aware of.
array module might be faster when the matrix gets large, because it can pack values more compactly; it can be used with the
values[j*width + i] convention. But no, there's no multidimensional array in the Python standard library, probably because (a) Numpy already fills that niche effectively and (b) you can always make a list of lists if performance is not paramount.
The fastest option really depends on the algorithm. The
dict-based approach might actually be the fastest when the matrices you're handling are very sparse (which in DP algorithms they're usually not, at least not in the ones I saw).
You could either use a default dictionary, and use 2-tuples as indexes, or implement a custom class and implement the
__setitem__ methods to deal with 2-tuples as indexes and store the result in a Python array:
from array import array class Array2D(object): def __init__(self, w, h): self.data = array("f",  * w * h) self.width = w def __getitem__(self, index): return self.data[index * self.width + index] def __setitem__(self, index, value): self.data[index * self.width + index] = value
Or, using a defaultdict, you might get better performance:
>>> from collections import defaultdict >>> d = defaultdict(lambda : 0) >>> d[0,0] 0 >>> d[0,0] = 2.5 >>> d[0,0] 2.5 >>>