# Python 中的向量化运算

1. 使用 Python for 循环计算矩阵乘法

import sys
import time
import random
def matmul(a, b, c, m, n, r):
for i in range(m):
for j in range(n):
for k in range(r):
a[i][j] += b[i][k]*c[k][j]
n = int(sys.argv[1])
a = [[0.0 for _ in range(n)] for _ in range(n)]
b = [[random.random() for _ in range(n)] for _ in range(n)]
c = [[random.random() for _ in range(n)] for _ in range(n)]
matmul(a, b, c, n, n, n) # warm up
a = [[0.0 for _ in range(n)] for _ in range(n)]
b = [[random.random() for _ in range(n)] for _ in range(n)]
c = [[random.random() for _ in range(n)] for _ in range(n)]
tic = time.time()
matmul(a, b, c, n, n, n)
toc = time.time()
print(toc-tic)

N = 300 时的花费时间为 5.774713993s。

2. 使用 NumPy for 循环计算矩阵乘法

def matmul(a, b, c, m, n, r):
for i in range(m):
for j in range(n):
for k in range(r):
a[i][j] += b[i][k]*c[k][j]
a = np.zeros((n, n))
b = np.random.rand(n, n)
c = np.random.rand(n, n)

N = 300 时的花费时间为 22.72896314s。

3. 使用库函数的矩阵乘法

def matmul(a, b, c, m, n, r):
a = np.matmul(b, c)

N = 300 时的花费时间为 0.002428055s。

>>> B = np.array([[0, 1]])# shape: 1x2
>>> C = np.array([[0],
... [1],
... [2]])# shape: 3x1
>>> B + C
array([[0, 1],
[1, 2],
[2, 3]])

>>> a = np.array([1.0, 2.0, 3.0])
>>> b = np.array([2.0, 2.0, 2.0])
>>> a * b
array([ 2.,4.,6.])

>>> a = np.array([1.0, 2.0, 3.0])
>>> b = 2.0
>>> a * b
array([ 2.,4.,6.])

A(2d array):5 x 4
B(1d array):1
Result (2d array):5 x 4
A(2d array):5 x 4
B(1d array):4
Result (2d array):5 x 4
A(3d array):15 x 3 x 5
B(3d array):15 x 1 x 5
Result (3d array):15 x 3 x 5
A(3d array):15 x 3 x 5
B(2d array): 3 x 5
Result (3d array):15 x 3 x 5
A(3d array):15 x 3 x 5
B(2d array): 3 x 1
Result (3d array):15 x 3 x 5

A(1d array):3
B(1d array):4 # trailing dimensions do not match
A(2d array):2 x 1
B(3d array):8 x 4 x 3 # second from last dimensions mismatched
4. 向量化的矩阵乘法计算

def matmul(a, b, c, m, n, r):
m_r_n = b.reshape(m, r, 1) * c.reshape(1, r, n)
a = np.sum(m_r_n, axis=1)

N = 300 时的花费时间为 0.14920187s。

References
https://meijun.github.io/vectorized-computing/