Matrix function estimation with primate
The introduction covered the basics of implicit trace estimation, showing both in theory and in code how primate can be used to estimate the trace of matrix functions:
f(A) = U f(\Lambda) U^T, \quad f: [a,b] \to \mathbb{R}
In particular, the introduction briefly covered how the Lanczos method is intimately connected to Gaussian Quadrature, and how this connection enables fast randomized trace estimation of f(A).
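For concreteness, the spectral definition above can be realized directly (if expensively) with a dense eigendecomposition. The sketch below is purely illustrative: the random test matrix and the choice f = sqrt are arbitrary placeholders, and avoiding exactly this O(n^3) computation is the motivation for everything that follows.

```python
import numpy as np

## Naive route: form f(A) = U f(Lambda) U^T explicitly from a dense eigendecomposition
rng = np.random.default_rng(1234)
A = rng.normal(size=(50, 50))
A = A @ A.T                                # symmetric PSD test matrix
Lambda, U = np.linalg.eigh(A)              # A = U diag(Lambda) U^T
f_A = U @ np.diag(np.sqrt(Lambda)) @ U.T   # e.g. f = sqrt, so f(A) is the matrix square root
assert np.allclose(f_A @ f_A, A)           # sanity check: sqrt(A) @ sqrt(A) == A
```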
In this post, I’ll cover how to approximate the action v \mapsto f(A)v for an arbitrary function f using primate’s matrix_function API, and how to compose this functionality with other trace estimators.
Matrix function approximation
If A \in \mathbb{R}^{n \times n} and n is large, obtaining f(A) \in \mathbb{R}^{n \times n} explicitly can be very expensive. One way to sidestep this difficulty is to approximate v \mapsto f(A)v using the degree-k Lanczos expansion:
Q^T A Q = T \quad \Leftrightarrow \quad f(A)v \approx \lVert v \rVert \cdot Q f(T) e_1
It’s been shown by Musco, Musco, and Sidford (2018) that this approximation \mathbf{y} \approx f(\mathbf{A})\mathbf{x} satisfies the following guarantee:
\|f(\mathbf{A}) \mathbf{x}-\mathbf{y}\| \leq 2\|\mathbf{x}\| \cdot \min _{\substack{\text { polynomial } p \\ \text { with degree }<k}}\left(\max _{x \in\left[\lambda_{\min }(\mathbf{A}), \lambda_{\max }(\mathbf{A})\right]}|f(x)-p(x)|\right)
In other words, up to a factor of 2, the error \|f(\mathbf{A}) \mathbf{x}-\mathbf{y}\| is bounded by the uniform error of the best polynomial approximation to f with degree < k. For general matrix functions, this implies that finite-precision Lanczos essentially matches the strongest known exact-arithmetic bounds.
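To make the expansion concrete, here is a bare-bones NumPy sketch of the degree-k Lanczos approximation to f(A)v. It omits reorthogonalization, breakdown handling, and other numerical safeguards, so treat it purely as an illustration of the mechanics rather than a reflection of how primate implements this internally.

```python
import numpy as np

def lanczos_fA_v(A, v, f, k):
    """Illustrative degree-k Lanczos approximation of f(A) @ v for symmetric A."""
    n = v.shape[0]
    Q = np.zeros((n, k))                        # Lanczos basis vectors
    alpha, beta = np.zeros(k), np.zeros(k - 1)  # diagonal / off-diagonal of T
    q_prev, q = np.zeros(n), v / np.linalg.norm(v)
    for j in range(k):
        Q[:, j] = q
        w = A @ q - (beta[j - 1] * q_prev if j > 0 else 0.0)
        alpha[j] = np.dot(w, q)
        w -= alpha[j] * q
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            q_prev, q = q, w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)  # Q^T A Q = T (tridiagonal)
    theta, Y = np.linalg.eigh(T)                               # T = Y diag(theta) Y^T
    e1 = np.zeros(k); e1[0] = 1.0
    return np.linalg.norm(v) * (Q @ (Y @ (f(theta) * (Y.T @ e1))))

## Quick check against a dense ground truth, using an arbitrary symmetric matrix
A = np.random.normal(size=(100, 100)); A = (A + A.T) / (2 * np.sqrt(A.shape[0]))
v = np.random.normal(size=100)
Lam, U = np.linalg.eigh(A)
print(np.linalg.norm(lanczos_fA_v(A, v, np.exp, 30) - U @ (np.exp(Lam) * (U.T @ v))))
```

Note that each iteration costs one matrix-vector product with A, and storing Q requires only O(nk) memory, which is what makes the approach attractive for large sparse or implicit operators.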
This suggests an idea: can we convert any matrix-like object into a LinearOperator that transparently converts matrix-vector products Av into products with matrix functions f(A)v?
Establishing a baseline
primate allows the user to construct a LinearOperator directly from an existing matrix-like object A and a callable fun through its matrix_function API.
As a baseline example, consider the action that adds an \epsilon amount of mass to the diagonal of A:
v \mapsto (A + \epsilon I)v
For any fixed \epsilon, imitating this matrix action can be done in three different ways:
1. Obtain u = Av and then add u \gets u + \epsilon \cdot v
2. Form (A + \epsilon I) and explicitly carry out the multiplication
3. Multiply v by f(A) induced by f(\lambda) = \lambda + \epsilon
Let’s see what the code to accomplish this using (3) looks like:
import numpy as np
from primate.random import symmetric
from primate.operator import matrix_function

## Random matrix + input vector
A = symmetric(150, pd = True)
v = np.random.uniform(size=A.shape[1])

## Ground truth v |-> f(A)v
Lambda, U = np.linalg.eigh(A)
v_truth = (U @ np.diag(Lambda + 0.10) @ U.T) @ v

## Lanczos approximation
M = matrix_function(A, fun = lambda x: x + 0.10)
v_test = np.ravel(M @ v)

vn, vn_approx = np.linalg.norm(v_truth), np.linalg.norm(v_test)
print(f"f(A)v norm: {vn:.6f}")
print(f"Approx norm: {vn_approx:.6f}")
print(f"Max diff: {np.max(np.abs(v_truth - np.ravel(v_test))):.6e}")
print(f"cosine sim: {np.dot(v_test, v_truth) / (vn * vn_approx):6e}")
f(A)v norm: 4.392660
Approx norm: 4.392660
Max diff: 5.218048e-15
cosine sim: 1.000000e+00
Observe that M matches the ground truth v \mapsto (A + \epsilon I)v. In this way, one benefit of using matrix_function is that it allows one to approximate f(A) by thinking only about what is happening at the spectral level (as opposed to the matrix level). Of course, this example isn’t very convincing on its own, as there are simpler ways of achieving the same result, as shown by (1) and (2); for example:
np.allclose(A @ v + 0.10 * v, v_truth)
True
Baseline established.
When v \mapsto f(A)v is not known
On the other hand, there are plenty of situations where one doesn’t have access to these simpler expressions. For example, consider the map:
v \mapsto (A + \epsilon I)^{-1} v
Such expressions pop up in a variety of settings, such as in Tikhonov regularization, in Schatten-norm estimation (Ubaru, Saad, and Seghouane 2017), in the Cholesky factorization of PSD matrices, and so on. Unlike the previous setting, we cannot readily access v \mapsto f(A)v unless we explicitly compute the full spectral decomposition of A or the inverse of A, both of which are expensive to obtain.
## Alternative: v_truth = np.linalg.inv((A + 0.10 * np.eye(A.shape[0]))) @ v
v_truth = (U @ np.diag(np.reciprocal(Lambda + 0.10)) @ U.T) @ v

## Lanczos approximation
M = matrix_function(A, fun = lambda x: 1.0 / (x + 0.10))
v_test = np.ravel(M @ v)

vn, vn_approx = np.linalg.norm(v_truth), np.linalg.norm(v_test)
print(f"f(A)v norm: {vn:.6f}")
print(f"Approx norm: {vn_approx:.6f}")
print(f"Max diff: {np.max(np.abs(v_truth - np.ravel(v_test))):.6e}")
print(f"cosine sim: {np.dot(v_test, v_truth) / (vn * vn_approx):6e}")
f(A)v norm: 32.982583
Approx norm: 32.982583
Max diff: 1.626085e-05
cosine sim: 1.000000e+00
There is a larger degree of error compared to the baseline, as evidenced by the \lVert \cdot \rVert_\infty-norm difference between v_truth and v_test. However, this is to be expected: f(\lambda) = \lambda + \epsilon is itself a degree-one polynomial, so the Lanczos approximation reproduces it essentially exactly, whereas f(\lambda) = 1/(\lambda + \epsilon) is not a polynomial at all. More generally, approximating the action v \mapsto A^{-1} v will always be more difficult than v \mapsto A v, even if A is well-conditioned.
Back to trace estimation
There are several use cases wherein one might be interested in the output f(A)v itself; see Musco, Musco, and Sidford (2018) and references therein for some examples. One such use case is the extension of existing matvec-dependent implicit trace estimators to matrix-function trace estimators.
from primate.trace import hutch, xtrace

M = matrix_function(A, fun="log")
print(f"Logdet exact: {np.sum(np.log(np.linalg.eigvalsh(A))):6e}")
print(f"Logdet Hutch: {hutch(A, fun='log'):6e}")
print(f"Logdet XTrace: {xtrace(M):6e}")
Logdet exact: -1.483218e+02
Logdet Hutch: -1.521959e+02
Logdet XTrace: -1.483155e+02
As with the hutch estimators applied to matrix functions, note that the action v \mapsto f(A)v is subject to the approximation errors studied by Musco, Musco, and Sidford (2018), making such extensions limited to functions that are well-approximated by the Lanczos method.