brainiak.matnormal
Michael Shvartsman (m.shvartsman.work@gmail.com)
Shvartsman, M., Sundaram, N., Aoi, M., Charles, A., Willke, T. L., & Cohen, J. D. (2018). Matrix-normal models for fMRI analysis. International Conference on Artificial Intelligence and Statistics, AISTATS 2018, 1914–1923. Extended version available at link
Describes how to formulate a number of common fMRI analysis methods available in BrainIAK
as matrix-normal models, and shows some benefits of this formulation.
Cai, M. B., Shvartsman, M., Wu, A., Zhang, H., & Zhu, X. (2020). Incorporating structured assumptions with probabilistic graphical models in fMRI data analysis. Neuropsychologia, 144, 1–23. link
Provides an alternate framing of the matrix normal model focusing on the modeling of structured residuals.
Magnus, J. R., & Neudecker, H. (1988). Matrix differential calculus with applications in statistics and econometrics. This is the standard reference for matrix calculus. A summary of some important identities may also be found on Wikipedia at link.
Katanoda, K., Matsuda, Y., & Sugishita, M. (2002). A spatio-temporal regression model for the analysis of functional MRI data. NeuroImage, 17(3), 1415–1428. link
Example of a regression model for fMRI with separable residuals.
Hartvig, N. V. (2002). A stochastic geometry model for functional magnetic resonance images. Scandinavian Journal of Statistics, 29(3), 333–353. link
Example of adding a separable residual covariance to a spatial activation model for fMRI data.
Kia, S. M., Beckmann, C. F., & Marquand, A. F. (2018). Scalable multi-task Gaussian process tensor regression for normative modeling of structured variation in neuroimaging data. Retrieved from link
Example of using tensor regression models for analyzing fMRI data.
Unlike many of the other tools in brainiak, the brainiak.matnormal package is only a little bit about specific methods and a lot about letting you try new ideas and method variants quickly. If the variants are useful, they can be sped up and made neater for broader consumption. To understand the idea behind matrix-normal or Kronecker-separable models, consider the following figure:
Matrix normal models simultaneously model spatial and temporal residuals. [A]: a schematic view of a vectorized data matrix, where each voxel's time series is vertically concatenated (in orange), and the covariance of every voxel at every timepoint with every other voxel at every other timepoint is modeled. Modeling all of these elements independently is intractable, and some structure needs to be imposed -- in this case, kronecker-separable structure. [B]: the un-vectorized data matrix (orange rectangle), and its spatial and temporal covariances on the right and bottom. [C]: A matrix-normal distribution with the mean M and row/column covariances R, C is equivalent to the large structure in [A], but can be much more tractable to estimate. Figure and caption reused under CC-BY-NC-ND from doi:10.1016/j.neuropsychologia.2020.107500.
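To make the equivalence between panels [A] and [C] concrete, here is a minimal, self-contained numpy/scipy sketch (it does not use brainiak, and all variable names are made up for illustration): if $Y \sim \mathcal{MN}(M, R, C)$ with temporal (row) covariance $R$ and spatial (column) covariance $C$, then stacking the columns of $Y$ into one long vector, as in panel [A], yields a multivariate normal with covariance $C \otimes R$, and the two densities agree.
import numpy as np
from scipy.stats import multivariate_normal
rng = np.random.default_rng(0)
T, V = 4, 3  # tiny "TRs by voxels" example
M = np.zeros((T, V))  # mean
R = np.eye(T) + 0.5 * np.ones((T, T))  # temporal (row) covariance
C = np.diag(rng.uniform(1.0, 2.0, V))  # spatial (column) covariance
Y_toy = rng.normal(size=(T, V))  # an arbitrary data matrix
# density of vec(Y_toy) under the "big" Kronecker covariance of panel [A]
logp_vec = multivariate_normal(
    mean=M.flatten("F"), cov=np.kron(C, R)
).logpdf(Y_toy.flatten("F"))
# matrix-normal log-density of panel [C], written out directly
E = Y_toy - M
quad = np.trace(np.linalg.solve(C, E.T) @ np.linalg.solve(R, E))
logdet = V * np.linalg.slogdet(R)[1] + T * np.linalg.slogdet(C)[1]
logp_mat = -0.5 * (quad + logdet + T * V * np.log(2 * np.pi))
assert np.isclose(logp_vec, logp_mat)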
To see how easy it is to prototype new matrix-normal models, first consider simple linear regression:
$$ Y \sim \mathcal{MN}(XB, I, I), $$
where $Y$ is a centered TRs by voxels matrix of brain data, $X$ is a TRs by conditions design matrix, and $B$ is a conditions by voxels coefficient matrix. Notated this way, the model is the conventional massively univariate GLM: the activation at every voxel and every timepoint is a linear combination (i.e., a weighted sum) of the condition coding, weighted by the coefficients -- or, equivalently, a weighted sum of the coefficients, with the condition codes as weights. We can generate a simple dataset and show how simple it is to fit such a model by maximum likelihood using automatically computed gradients.
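Before writing any code, a quick worked step (added here for exposition, not from the cited papers) shows why the identity-covariance case is exactly the familiar GLM: with both covariances set to $I$, the matrix-normal log-likelihood collapses to the ordinary sum-of-squares objective,
$$ \log p(Y \mid B) = -\tfrac{1}{2}\,\mathrm{tr}\!\left[(Y - XB)^\top (Y - XB)\right] - \tfrac{TV}{2}\log 2\pi = -\tfrac{1}{2}\,\lVert Y - XB \rVert_F^2 + \mathrm{const}, $$
where $T$ and $V$ are the numbers of TRs and voxels, so its maximizer is the least-squares estimate $\hat{B} = (X^\top X)^{-1} X^\top Y$. The payoff of the matrix-normal notation is that the identity covariances can later be swapped for structured ones without changing anything else in the fitting code.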
# General imports and setup
import numpy as np
import scipy.linalg
from numpy.linalg import cholesky
import seaborn as sns
import tensorflow as tf
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.special import expit as inv_logit
from scipy.stats import norm, pearsonr
from sklearn.datasets import make_friedman1
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
np.random.seed(1)
tf.random.set_seed(1)
sns.set_theme()
# brainiak.matnormal.covs provides implementations of various
# structured covariances. We will discuss this shortly.
from brainiak.matnormal.covs import (
    CovAR1,
    CovDiagonal,
    CovDiagonalGammaPrior,
    CovIdentity,
    CovUnconstrainedCholesky,
)
# brainiak.matnormal.matnormal_likelihoods provides efficient implementations
# of matrix-normal likelihoods, including marginals and conditionals
from brainiak.matnormal.matnormal_likelihoods import (
    matnorm_logp,
    matnorm_logp_marginal_row,
    matnorm_logp_marginal_col,
)
# brainiak.matnormal provides implementations of MN variants of some existing analyses
from brainiak.matnormal.mnrsa import MNRSA
from brainiak.matnormal.regression import MatnormalRegression
# brainiak.matnormal.utils provides helpers that (mostly) make it easy
# to interface between tensorflow and scipy optimizers
from brainiak.matnormal.utils import (
    make_val_and_grad,
    pack_trainable_vars,
    rmn,
    unpack_trainable_vars,
)
# this is used in the MNRSA example
from brainiak.utils.utils import cov2corr
# Generate data
conditions = 5
voxels = 100
TRs = 50
X, _ = make_friedman1(n_samples=TRs, n_features=conditions, random_state=None)
B = 2 * np.random.random(size=(conditions, voxels)) # pretty generous SNR of 2
Y = X @ B + np.random.normal(size=(TRs, voxels))
# Now we create the tensorflow objects we need. The covariances are independent
# since this is the univariate GLM.
space_cov = CovIdentity(size=voxels)
time_cov = CovIdentity(size=TRs)
# this is our estimate of B, and the wrappers for X and Y.
B_hat = tf.Variable(np.random.normal(size=(conditions, voxels)), name="beta")
Y_tf = tf.constant(Y)
X_tf = tf.constant(X)
# construct loss (negative log likelihood)
# note that params are ignored by this function but implicitly
# tracked by tensorflow to compute gradients
def loss(params):
    return -matnorm_logp(Y_tf - X_tf @ B_hat, time_cov, space_cov)
val_and_grad = make_val_and_grad(lossfn=loss, train_vars=[B_hat])
initial_guess = pack_trainable_vars([B_hat])
opt_results = minimize(fun=val_and_grad, x0=initial_guess, jac=True, method="L-BFGS-B")
print(opt_results.message) # check that we converged
_ = B_hat.assign(unpack_trainable_vars(opt_results.x, trainable_vars=[B_hat])[0])
plt.plot(B.flatten(), B_hat.numpy().flatten(), "bo")
plt.xlabel("True B")
plt.ylabel("Estimated B")
plt.title("True vs estimated B")
print(
    f"True and estimated B correlation: {pearsonr(B_hat.numpy().flatten(), B.flatten())[0]}"
)
b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
True and estimated B correlation: 0.78999786325132
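Following the worked step above, a quick sanity check (not part of the original analysis; B_hat_ols is a name introduced here) confirms that with identity covariances the matrix-normal maximum-likelihood estimate agrees with ordinary least squares up to optimizer tolerance:
# closed-form OLS solution on the same data
B_hat_ols = np.linalg.solve(X.T @ X, X.T @ Y)
print(f"Max |MN - OLS| difference: {np.max(np.abs(B_hat.numpy() - B_hat_ols)):.2e}")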
In practice, a simple model like this could just as easily be fit with more standard tools. Things get more interesting if we (reasonably) assume that the noise in our data is structured and we want to model that structure rather than prewhiten it away, for example:
$$ \sigma_v^{-1} \sim \mathcal{InvGamma}(\alpha, \beta) \\ Y \sim \mathcal{MN}(XB, \Sigma_{AR1}, \mathrm{diag}(\sigma_v)), $$
where $\Sigma_{AR1}$ is a covariance matrix with AR(1) structure, and the spatial covariance is diagonal (independent) but with a separate scale for each voxel, which we regularize with an inverse-gamma prior. Estimating this model is very similar:
# generate AR(1) temporal noise that is independent in space
true_time_cov = CovAR1(size=TRs)
true_space_cov = CovDiagonalGammaPrior(size=voxels)
noise = rmn(rowcov=true_time_cov._cov, colcov=true_space_cov._cov)
Y = X @ B + noise
Y_tf = tf.constant(Y)  # refresh the tensorflow wrapper so the loss below sees the new data
# This time the temporal covariance is AR(1) and the spatial covariance is
# diagonal with a prior on the voxelwise scales
space_cov = CovDiagonalGammaPrior(size=voxels)
time_cov = CovAR1(size=TRs)
# Reset B_hat
B_hat = tf.Variable(np.random.normal(size=(conditions, voxels)), name="beta")
# Now we estimate B_hat, but also the AR parameters and the
# voxelwise residual variances (which the relevant
# cov objects know about and expose to us)
train_vars = [B_hat] + time_cov.get_optimize_vars() + space_cov.get_optimize_vars()
# now this loss incorporates the log-likelihood of our covariance parameters
def loss(params):
    return (
        -matnorm_logp(Y_tf - X_tf @ B_hat, time_cov, space_cov)
        - space_cov.logp
        - time_cov.logp
    )
val_and_grad = make_val_and_grad(lossfn=loss, train_vars=train_vars)
initial_guess = pack_trainable_vars(train_vars)
opt_results = minimize(fun=val_and_grad, x0=initial_guess, jac=True, method="L-BFGS-B")
print(opt_results.message) # check that we converged
# assign the AR parameters as well as B
unpacked_theta = unpack_trainable_vars(opt_results.x, trainable_vars=train_vars)
for var, val in zip(train_vars, unpacked_theta):
    var.assign(val)
plt.plot(B.flatten(), B_hat.numpy().flatten(), "bo")
plt.xlabel("True B")
plt.ylabel("Estimated B")
plt.title("True vs estimated B")
print(
    f"True and estimated B correlation: {pearsonr(B_hat.numpy().flatten(), B.flatten())[0]}"
)
b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
True and estimated B correlation: 0.7891954003335606
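Beyond the coefficients, we can also peek at the recovered noise structure. The sketch below is not in the original notebook; it reuses the private ._cov attribute that the data-generating cell above already relies on (so it may change in future brainiak versions) to compare the fitted temporal covariance against the one used to generate the noise:
est_T = np.asarray(time_cov._cov)  # fitted AR(1) temporal covariance
gen_T = np.asarray(true_time_cov._cov)  # covariance used to generate the noise
print(
    f"Correlation between generating and fitted temporal covariance entries: "
    f"{pearsonr(gen_T.flatten(), est_T.flatten())[0]:.3f}"
)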
brainiak.matnormal provides a convenience wrapper for such regression models (as MatnormalRegression).
model = MatnormalRegression(time_cov=time_cov, space_cov=space_cov)
model.fit(X, Y, naive_init=False)
plt.plot(B.flatten(), model.beta_.flatten(), "bo")
plt.xlabel("True B")
plt.ylabel("Estimated B")
plt.title("True vs estimated B")
print(
    f"True and estimated B correlation (MN): {pearsonr(model.beta_.flatten(), B.flatten())[0]}"
)
True and estimated B correlation (MN): 0.6194865860962584
The covariance-modeling approach should work better than either ignoring the covariance structure in the data or pre-whitening by dividing out the correlations, so we now compare against both of those approaches on the same synthetic dataset.
glm_ignore_cov = LinearRegression()
glm_ignore_cov.fit(X, Y)
# prewhiten
W = np.linalg.cholesky(np.cov(Y))
Y_whitened = np.linalg.solve(W, Y)
X_whitened = np.linalg.solve(W, X)
# closed-form OLS on the whitened data (kept for reference; not used in the plots below)
coef_whitened = np.linalg.solve(X_whitened.T @ X_whitened, X_whitened.T @ Y_whitened)
glm_whitened = LinearRegression()
glm_whitened.fit(X_whitened, Y_whitened)
fig, axes = plt.subplots(1, 3, sharex=False, sharey=True, figsize=(18, 6))
axes[0].plot(B.flatten(), model.beta_.flatten(), "bo", label="MN")
axes[0].set_title("True vs estimated B (MN)")
axes[1].plot(
    B.flatten(),
    glm_ignore_cov.coef_.T.flatten(),
    "go",
    label="GLM (ignore cov)",
)
axes[1].set_title("True vs estimated B (GLM, ignore cov)")
axes[2].plot(
    B.flatten(),
    glm_whitened.coef_.T.flatten(),
    "ro",
    label="GLM (pre-whiten)",
)
axes[2].set_title("True vs estimated B (GLM, prewhiten cov)")
for ax in axes:
    ax.set_xlabel("True B")
    ax.set_ylabel("Estimated B")
plt.xlim(0, 2)
plt.ylim(-5, 5)
print(
    f"True and estimated B correlation (MN): {pearsonr(B.flatten(), model.beta_.flatten())[0]}"
)
print(
    f"True and estimated B correlation (GLM, ignore cov): {pearsonr(B.flatten(), glm_ignore_cov.coef_.T.flatten())[0]}"
)
print(
    f"True and estimated B correlation (GLM, prewhiten cov): {pearsonr(B.flatten(), glm_whitened.coef_.T.flatten())[0]}"
)
True and estimated B correlation (MN): 0.6194865860962584
True and estimated B correlation (GLM, ignore cov): 0.38985685962097216
True and estimated B correlation (GLM, prewhiten cov): 0.5315137290009893