# brainiak.reprsimil package¶

A Bayesian method to perform Representational Similarity Analysis

## brainiak.reprsimil.brsa module¶

Bayesian Representational Similarity Analysis (BRSA)

This implementation is based on [Cai2016] and [Cai2019]:

Cai2016

“A Bayesian method for reducing bias in neural representational similarity analysis”, M.B. Cai, N.W. Schuck, J.W. Pillow, Y. Niv, Advances in Neural Information Processing Systems 29, 2016, 4952–4960 Available at: http://papers.nips.cc/paper/6131-a-bayesian-method-for-reducing-bias-in-neural-representational-similarity-analysis.pdf

Cai2019

“Representational structure or task structure? Bias in neural representational similarity analysis and a Bayesian method for reducing bias”, M.B. Cai, N.W. Schuck, J.W. Pillow, Y. Niv, PLoS computational biology 15.5 (2019): e1006299. https://doi.org/10.1371/journal.pcbi.1006299

BRSA is based on [Cai2016] with additional consideration of spatial noise correlation proposed in [Cai2019]. GBRSA is based on [Cai2019]. GBRSA may perform better than BRSA due to marginalization of all voxel-wise parameters. It can be use for single participant as well.

class brainiak.reprsimil.brsa.BRSA(n_iter=100, rank=None, auto_nuisance=True, n_nureg=None, nureg_zscore=True, nureg_method='PCA', baseline_single=False, GP_space=False, GP_inten=False, space_smooth_range=None, inten_smooth_range=None, tau_range=5.0, tau2_prior=<function prior_GP_var_inv_gamma>, eta=0.0001, init_iter=20, optimizer='L-BFGS-B', random_state=None, anneal_speed=10, tol=0.0001, minimize_options={'disp': False, 'gtol': 0.0001, 'maxiter': 6})

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Bayesian representational Similarity Analysis (BRSA)

Given the time series of neural imaging data in a region of interest (ROI) and the hypothetical neural response (design matrix) to each experimental condition of interest, calculate the shared covariance matrix U of the voxels(recording unit)’ response profiles beta_i to each condition, and the relative SNR of each voxels. The relative SNR could be considered as the degree of contribution of each voxel to this shared covariance matrix. A correlation matrix converted from the covariance matrix U will be provided as a quantification of neural representational similarity.

\begin{align}\begin{aligned}Y = X \cdot \beta + X_0 \cdot \beta_0 + \epsilon\\\beta_i \sim N(0,(s_{i} \sigma_{i})^2 U)\\\epsilon_i \sim AR(1)\end{aligned}\end{align}

Please note that the model assumes that the covariance matrix U which all beta_i follow is zero-meaned. This assumption does not imply there must be both positive and negative responses across voxels. However, it means that Bayesian RSA treats the task-evoked activity against baseline BOLD level as signal, while in other RSA tools the deviation of task-evoked activity in each voxel from the average task-evoked activity level across voxels may be considered as signal of interest. Due to this assumption in BRSA, relatively high degree of similarity may be expected when the activity patterns of two task conditions both include strong sensory driven signals regardless of their specific stimuli. When two task conditions elicit exactly the same activity patterns but only differ in their global magnitudes, under the assumption in BRSA, their similarity is 1; under the assumption that only deviation of pattern from average patterns is signal of interest, their similarity should be -1.

Parameters
• n_iter (int.) – Number of maximum iterations to run the algorithm.

• rank (int. Default: None) – The rank of the covariance matrix. If not provided, the covariance matrix will be assumed to be full rank. When you have many conditions (e.g., calculating the similarity matrix of responses to each event), you might try specifying a lower rank.

• auto_nuisance (boolean.) – In order to model spatial correlation between voxels that cannot be accounted for by common response captured in the design matrix, we assume that a set of time courses not related to the task conditions are shared across voxels with unknown amplitudes. One approach is for users to provide time series which they consider as nuisance but exist in the noise (such as head motion). The other way is to take the first n_nureg principal components in the residual after subtracting the response to the design matrix from the data, and use these components as the nuisance regressor. This flag is for the second approach. If turned on, PCA or factor analysis will be applied to the residuals to obtain new nuisance regressors in each round of fitting. These two approaches can be combined. If the users provide nuisance regressors and set this flag as True, then the first n_nureg principal components of the residuals after subtracting both the responses to design matrix and the user-supplied nuisance regressors will be used in addition to the nuisance regressors provided by the users. Note that nuisance regressor is not required from user. If it is not provided, DC components for each run will be included as nuisance regressor regardless of the auto_nuisance parameter.

• n_nureg (Optional[int]) – Number of nuisance regressors to use in order to model signals shared across voxels not captured by the design matrix. This number is in addition to any nuisance regressor that the user has already provided. If set to None, the number of nuisance regressors will be automatically determined based on M Gavish and D Donoho’s approximate estimation of optimal hard threshold for singular values. This only takes effect if auto_nuisance is True.

• nureg_zscore (boolean.) – A flag to tell the algorithm whether data is z-scored before estimating the number of nuisance regressor components necessary to account for spatial noise correlation. It also determinie whether the residual noise is z-scored before estimating the nuisance regressors from residual. This only takes effect if auto_nuisance is True.

• nureg_method (string, naming a method from sklearn.decomposition.) – ‘PCA’, ‘ICA’, ‘FA’ or ‘SPCA’ are currently supported. The method to estimate the shared component in noise across voxels. This only takes effect if auto_nuisance is True.

• baseline_single (boolean.) – A time course of constant 1 will be included to the nuisance regressor regardless of whether the user requests. If baseline_single is set to False, one such regressor is included for each fMRI run, but a single component in beta0_ will be computed as the average of the weight maps corresponding to these regressors. This might cause underestimation of noise variance. If baseline_single is True, only one regressor of constant 1 will be used for the whole dataset. This might be desirable if you believe the average image intensity might not scale with the same proportion for different voxels across scan. In other words, it is possible that some part of the brain is more vulnerable to change in baseline intensity due to facts such as field inhomogeneity. Setting baseline_single to True will force the nuisance regressors automatically estimated from residuals to capture this. However, when each task condition only occurs in one run and when the design matrix in each run sums together close to a flat line, this option can cause the estimated similarity to be extremely high between conditions occuring in the same run.

• GP_space (boolean.) – Whether to impose a Gaussion Process (GP) prior on the log(pseudo-SNR). If true, the GP has a kernel defined over spatial coordinate of each voxel. The idea behind this option is that adjacent voxels should have similar SNRs. This is relatively slow for big ROI. We find that when SNR is generally low, smoothness can be overestimated. But such regularization may reduce variance in the estimated SNR map and similarity matrix.

• GP_inten (boolean.) – Whether to include a kernel defined over the intensity of image. GP_space should be True as well if you want to use this, because the smoothness should be primarily in space. Smoothness in intensity is just complementary. The idea behind this option is that voxels should have similar SNRs when they are both adjacent (imposed by GP_space) and are of the same tissue type (when their image intensities are close). If you accept the second assumption, then you can set GP_inten as True and provide an array to the inten variable, expressing the intensities (brightness) for each voxel.

• space_smooth_range (float.) – The distance (in unit the same as what you would use when supplying the spatial coordiates of each voxel, typically millimeter) which you believe is the maximum range of the length scale parameter of Gaussian Process defined over voxel location. This is used to impose a half-Cauchy prior on the length scale. If set to None, the program will default to half of the maximum distance between all voxels.

• inten_smooth_range (float.) – The difference in image intensity which you believe is the maximum range of plausible length scale for the Gaussian Process defined over image intensity. Length scales larger than this are allowed, but will be penalized. If set to None, this parameter will default to half of the maximal intensity difference.

• tau_range (float.) – The reasonable range of the standard deviation of log(SNR). This range should not be too large. 5 is a loose range. When a Gaussian Process is imposed on the log(SNR), this parameter is used in a half-Cauchy prior on the standard deviation, or an inverse-Gamma prior on the variance of the GP.

• tau2_prior (Callable[[float, int, float]], [float, float]],) –

Default: prior_GP_var_inv_gamma. Can be prior_GP_var_inv_gamma or prior_GP_var_half_cauchy, or a custom function. The function which impose a prior for tau^2, the variance of the GP prior on log(SNR), and returns the MAP estimate of tau^2. It can be either prior_GP_var_inv_gamma for inverse-Gamma or prior_GP_var_half_cauchy for half-Cauchy. half-Cauchy prior is in fact imposed on tau. But tau_range describes the range of tau in the prior in both cases. Both functions are part of brsa module. See also prior_GP_var_inv_gamma and prior_GP_var_half_cauchy To use the default inverse-Gamma prior, you can ignore this argument:

from brainiak.reprsimil.brsa import BRSA
brsa = BRSA()


If you want to try the alternative half-Cauchy prior, then you need to import it in addition to BRSA:

from brainiak.reprsimil.brsa import BRSA, prior_GP_var_half_cauchy
brsa = BRSA(tau2_prior=prior_GP_var_half_cauchy)


• eta (float.) – A small number added to the diagonal element of the covariance matrix in the Gaussian Process prior. This is to ensure that the matrix is invertible.

• init_iter (int.) – How many initial iterations to fit the model without introducing the GP prior before fitting with it, if GP_space or GP_inten is requested. This initial fitting is to give the parameters a good starting point.

• optimizer (str or callable.) – The optimizer to use for minimizing cost function which scipy.optimize.minimize can accept. We use ‘L-BFGS-B’ as a default. Users can try other strings corresponding to optimizer provided by scipy.optimize.minimize, or a custom optimizer, such as ‘BFGS’ or ‘CG’. Note that BRSA fits a lot of parameters. So a chosen optimizer should accept gradient (Jacobian) of the cost function. Otherwise the fitting is likely to be unbarely slow. We do not calculate Hessian of the objective function. So an optimizer which requires Hessian cannot be used.

• random_state (RandomState or an int seed.) – A random number generator instance to define the state of the random permutations generator whenever the module needs to generate random number (e.g., initial parameter of the Cholesky factor).

• anneal_speed (float.) – Annealing is introduced in fitting of the Cholesky decomposition of the shared covariance matrix. The amount of perturbation decays exponentially. This parameter sets the ratio of the maximum number of iteration to the time constant of the exponential. anneal_speed=10 means by n_iter/10 iterations, the amount of perturbation is reduced by 2.713 times.

• minimize_options (dictionary.) – Default: {‘gtol’: 1e-4, ‘disp’: False, ‘maxiter’: 6} This is the dictionary passed as the options argument to scipy.optimize.minize which minimizes the cost function during fitting. Notice that the minimization is performed for many times, alternating between optimizing the covariance matrix U underlying the pattern similarity matrix, and SNR. At most n_iter times of this alternation is performed. So within each step of fitting, the step of iteration performed by scipy.optimize.minize does not have to be very large. In other words, scipy.optimize.minize does not need to converge within each step of the alternating fitting procedure.

• tol (float.) – Tolerance parameter passed to scipy.optimize.minimize. It is also used for determining convergence of the alternating fitting procedure.

U_

The shared covariance matrix.

Type

numpy array, shape=[condition,condition]

L_

The Cholesky factor of the shared covariance matrix (lower-triangular matrix).

Type

numpy array, shape=[condition,rank]

C_

The correlation matrix derived from the shared covariance matrix. This is the estimated similarity matrix between neural patterns to your task conditions. Notice that it is recommended that you also check U_, which is the covariance matrix underlying this correlation matrix. In cases there is almost no response to your task conditions, the diagonal values of U_ would become very small and C_ might contain many correlation coefficients close to 1 or -1. This might not reflect true strong correlation or strong negative correlation, but a result of lack of task-related neural activity, design matrix that does not match true neural response, or not enough data. It is also recommended to check nSNR_ after mapping it back to the brain. A “reasonable” map should at least have higher values in gray matter in than white matter.

Type

numpy array, shape=[condition,condition]

nSNR_

The normalized pseuso-SNR of all voxels. They are normalized such that the geometric mean is 1. Note that this attribute can not be interpreted as true SNR, but the relative ratios between voxel indicates the contribution of each voxel to the representational similarity structure.

Type

numpy array, shape=[voxels,]

sigma_

The estimated standard deviation of the noise in each voxel Assuming AR(1) model, this means the standard deviation of the innovation noise.

Type

numpy array, shape=[voxels,]

rho_

The estimated autoregressive coefficient of each voxel

Type

numpy array, shape=[voxels,]

bGP_

The standard deviation of the GP prior

Type

float, only if GP_space or GP_inten is True.

lGPspace_

The length scale of Gaussian Process prior of log(SNR)

Type

float, only if GP_space or GP_inten is True

lGPinten_

The length scale in fMRI intensity of the GP prior of log(SNR)

Type

float, only if GP_inten is True

beta_

The maximum a posterior estimation of the response amplitudes of each voxel to each task condition.

Type

array, shape=[conditions, voxels]

beta0_

The loading weights of each voxel for the shared time courses not captured by the design matrix. This helps capture the structure of spatial covariance of task-unrelated signal. n_base is the number of columns of the user-supplied nuisance regressors plus one for DC component

Type

numpy array, shape=[n_nureg + n_base, voxels]

X0_

The estimated time course that is shared across voxels but unrelated to the events of interest (design matrix).

Type

numpy array, shape=[time_points, n_nureg + n_base]

beta0_null_

The equivalent of beta0_ in a null model which does not include the design matrix and response pattern beta.

Type

numpy array, shape=[n_nureg + n_base, voxels]

X0_null_

The equivalent of X0_ in a null model which does not include the design matrix and response pattern beta

Type

numpy array, shape=[time_points, n_nureg + n_base]

n_nureg_

Number of nuisance regressor in addition to such regressors provided by the user (if any), if auto_nuisance is set to True. If n_nureg is set to ‘opt’, this will be estimated from data. ‘opt’ will use M Gavish and D Donoho’s approximate estimation of optimal hard threshold for singular values.

Type

int

random_state_

Random number generator initialized using random_state.

Type

RandomState

fit(X, design, nuisance=None, scan_onsets=None, coords=None, inten=None)

Compute the Bayesian RSA

Parameters
• X (numpy array, shape=[time_points, voxels]) – If you have multiple scans of the same participants that you want to analyze together, you should concatenate them along the time dimension after proper preprocessing (e.g. spatial alignment), and specify the onsets of each scan in scan_onsets.

• design (numpy array, shape=[time_points, conditions]) – This is the design matrix. It should only include the hypothetic response for task conditions. You should not include regressors for a DC component or motion parameters, unless you want to estimate their pattern similarity with response patterns to your task conditions. If you want to model head motion, you should include them in nuisance regressors. If you have multiple run, the design matrix of all runs should be concatenated along the time dimension, with every column for one condition across runs. For example, if you have 3 runs of experiment of one participant, with each run lasting 200 TR. And you have 4 conditions, then design should be a 600 x 4 numpy array.

• nuisance (optional, numpy array, shape=[time_points, nuisance_factors]) – The responses to these regressors will be marginalized out from each voxel, which means they are considered, but won’t be assumed to share the same pseudo-SNR map with the design matrix. Therefore, the pseudo-SNR map will only reflect the relative contribution of design matrix to each voxel. You can provide time courses such as those for head motion to this parameter. Note that if auto_nuisance is set to True, the first n_nureg principal components of residual (excluding the response to the design matrix and the user-provided nuisance regressors and a constant baseline) will be included as additional nuisance regressor after the first round of fitting. If auto_nuisance is set to False, the nuisance regressors supplied by the users together with DC components will be used as nuisance time series. Please do not include time course of constant baseline in nuisance.

• scan_onsets (optional, numpy array, shape=[runs,]) – This specifies the indices of X which correspond to the onset of each scanning run. For example, if you have two experimental runs of the same subject, each with 100 TRs, then scan_onsets should be [0,100]. If you do not provide the argument, the program will assume all data are from the same run. The effect of them is to make the inverse matrix of the temporal covariance matrix of noise block-diagonal.

• coords (optional, numpy array, shape=[voxels,3]) – This is the coordinate of each voxel, used for implementing Gaussian Process prior.

• inten (optional, numpy array, shape=[voxel,]) – This is the average fMRI intensity in each voxel. It should be calculated from your data without any preprocessing such as z-scoring. Because it should reflect whether a voxel is bright (grey matter) or dark (white matter). A Gaussian Process kernel defined on both coordinate and intensity imposes a smoothness prior on adjcent voxels but with the same tissue type. The Gaussian Process is experimental and has shown good performance on some visual datasets.

score(X, design, scan_onsets=None)
Use the model and parameters estimated by fit function

from some data of a participant to evaluate the log likelihood of some new data of the same participant. Design matrix of the same set of experimental conditions in the testing data should be provided, with each column corresponding to the same condition as that column in the design matrix of the training data. Unknown nuisance time series will be marginalized, assuming they follow the same spatial pattern as in the training data. The hypothetical response captured by the design matrix will be subtracted from data before the marginalization when evaluating the log likelihood. For null model, nothing will be subtracted before marginalization.

There is a difference between the form of likelihood function used in fit() and score(). In fit(), the response amplitude beta to design matrix X and the modulation beta0 by nuisance regressor X0 are both marginalized, with X provided and X0 estimated from data. In score(), posterior estimation of beta and beta0 from the fitting step are assumed unchanged to testing data and X0 is marginalized. The logic underlying score() is to transfer as much as what we can learn from training data when calculating a likelihood score for testing data.

If you z-scored your data during fit step, you should z-score them for score function as well. If you did not z-score in fitting, you should not z-score here either.

Parameters
• X (numpy arrays, shape=[time_points, voxels]) – fMRI data of new data of the same subject. The voxels should match those used in the fit() function. If data are z-scored (recommended) when fitting the model, data should be z-scored as well when calling transform()

• design (numpy array, shape=[time_points, conditions]) – Design matrix expressing the hypothetical response of the task conditions in data X.

• scan_onsets (numpy array, shape=[number of runs]) – A list of indices corresponding to the onsets of scans in the data X. If not provided, data will be assumed to be acquired in a continuous scan.

Returns

• ll (float.) – The log likelihood of the new data based on the model and its parameters fit to the training data.

• ll_null (float.) – The log likelihood of the new data based on a null model which assumes the same as the full model for everything except for that there is no response to any of the task conditions.

transform(X, y=None, scan_onsets=None)
Use the model to estimate the time course of response to

each condition (ts), and the time course unrelated to task (ts0) which is spread across the brain. This is equivalent to “decoding” the design matrix and nuisance regressors from a new dataset different from the training dataset on which fit() was applied. An AR(1) smooth prior is imposed on the decoded ts and ts0 with the AR(1) parameters learnt from the corresponding time courses in the training data. Notice: if you set the rank to be lower than the number of experimental conditions (number of columns in the design matrix), the recovered task-related activity will have collinearity (the recovered time courses of some conditions can be linearly explained by the recovered time courses of other conditions).

Parameters
• X (numpy arrays, shape=[time_points, voxels]) – fMRI data of new data of the same subject. The voxels should match those used in the fit() function. If data are z-scored (recommended) when fitting the model, data should be z-scored as well when calling transform()

• y (not used (as it is unsupervised learning)) –

• scan_onsets (numpy array, shape=[number of runs]) – A list of indices corresponding to the onsets of scans in the data X. If not provided, data will be assumed to be acquired in a continuous scan.

Returns

• ts (numpy arrays, shape = [time_points, condition]) – The estimated response to the task conditions which have the response amplitudes estimated during the fit step.

• ts0 (numpy array, shape = [time_points, n_nureg]) – The estimated time course spread across the brain, with the loading weights estimated during the fit step.

class brainiak.reprsimil.brsa.GBRSA(n_iter=100, rank=None, auto_nuisance=True, n_nureg=None, nureg_zscore=True, nureg_method='PCA', baseline_single=False, logS_range=1.0, SNR_prior='exp', SNR_bins=21, rho_bins=20, tol=0.0001, optimizer='L-BFGS-B', minimize_options={'disp': False, 'gtol': 0.0001, 'maxiter': 20}, random_state=None, anneal_speed=10)

Group Bayesian representational Similarity Analysis (GBRSA)

Given the time series of neural imaging data in a region of interest (ROI) and the hypothetical neural response (design matrix) to each experimental condition of interest, calculate the shared covariance matrix of the voxels(recording unit)’ response to each condition, and the relative SNR of each voxels. The relative SNR could be considered as the degree of contribution of each voxel to this shared covariance matrix. A correlation matrix converted from the covariance matrix will be provided as a quantification of neural representational similarity. Both tools provide estimation of SNR and noise parameters at the end, and both tools provide empirical Bayesian estimates of activity patterns beta, together with weight map of nuisance signals beta0.

The differences of this tool from BRSA are: (1) It allows fitting a shared covariance matrix (which can be converted to similarity matrix) across multiple subjects. This is analogous to SRM under funcalign submodule. Because of using multiple subjects, the result is less noisy. (2) In the fitting process, the SNR and noise parameters are marginalized for each voxel. Therefore, this tool should be faster than BRSA when analyzing an ROI of hundreds to thousands voxels. It does not provide a spatial smoothness prior on SNR though. (3) The voxel-wise pseudo-SNR and noise parameters estimated are posterior mean estimates, while those estimated by BRSA are maximum-a-posterior estimates. If your goal is to perform searchlight RSA with relatively fewer voxels on single subject, BRSA should be faster. However, GBRSA can in principle be used together with searchlight in a template space such as MNI.

\begin{align}\begin{aligned}Y = X \cdot \beta + X_0 \cdot \beta_0 + \epsilon\\\beta_i \sim N(0,(s_{i} \sigma_{i})^2 U)\end{aligned}\end{align}

See also BRSA.

Please note that the model assumes that the covariance matrix U which all beta_i follow is zero-meaned. For more details of its implication, see documentation of BRSA

Parameters
• n_iter (int.) – Number of maximum iterations to run the algorithm.

• rank (int.) – The rank of the covariance matrix. If not provided, the covariance matrix will be assumed to be full rank. When you have many conditions (e.g., calculating the similarity matrix of responses to each event), you might want to start with specifying a lower rank and use metrics such as AIC or BIC to decide the optimal rank. The log likelihood for the fitted data can be retrieved through private attributes _LL_train_. Note that this log likelihood score is only used here for selecting hyperparameters such as rank. For any formal model comparison, we recommend using score() function on left-out data.

• auto_nuisance (Boolean.) – In order to model spatial correlation between voxels that cannot be accounted for by common response captured in the design matrix, we assume that a set of time courses not related to the task conditions are shared across voxels with unknown amplitudes. One approach is for users to provide time series which they consider as nuisance but exist in the noise (such as head motion). The other way is to take the first n_nureg principal components in the residual after subtracting the response to the design matrix from the data, and use these components as the nuisance regressor. This flag is for the second approach. If turned on, PCA or factor analysis will be applied to the residuals to obtain new nuisance regressors in each round of fitting. These two approaches can be combined. If the users provide nuisance regressors and set this flag as True, then the first n_nureg principal components of the residuals after subtracting both the responses to design matrix and the user-supplied nuisance regressors will be used in addition to the nuisance regressors provided by the users. Note that nuisance regressor is not required from user. If it is not provided, DC components for each run will be included as nuisance regressor regardless of the auto_nuisance parameter.

• n_nureg (Optional[int]) – Number of nuisance regressors to use in order to model signals shared across voxels not captured by the design matrix. This number is in addition to any nuisance regressor that the user has already provided. If set to None, the number of nuisance regressors will be automatically determined based on M Gavish and D Donoho’s approximate estimation of optimal hard threshold for singular values. (Gavish & Donoho, IEEE Transactions on Information Theory 60.8 (2014): 5040-5053.) This only takes effect if auto_nuisance is True.

• nureg_zscore (Boolean.) – A flag to tell the algorithm whether data is z-scored before estimating the number of nuisance regressor components necessary to account for spatial noise correlation. It also determinie whether the residual noise is z-scored before estimating the nuisance regressors from residual. This only takes effect if auto_nuisance is True.

• nureg_method (string, naming a method from sklearn.decomposition.) – ‘PCA’, ‘ICA’, ‘FA’ or ‘SPCA’ are currently supported. The method to estimate the shared component in noise across voxels. This only takes effect if auto_nuisance is True.

• baseline_single (Boolean.) – A time course of constant 1 will be included to the nuisance regressor for each participant. If baseline_single is set to False, one such regressor is included for each fMRI run, but at the end of fitting, a single component in beta0_ will be computed as the average of the weight maps corresponding to these regressors. This might cause underestimation of noise variance. If baseline_single is True, only one regressor of constant 1 will be used for the whole dataset. This might be desirable if you believe the average image intensity might not scale with the same proportion for different voxels across scan. In other words, it is possible that some part of the brain is more vulnerable to change in baseline intensity due to facts such as field inhomogeneity. Setting baseline_single to True will force the nuisance regressors automatically estimated from residuals to capture this. However, when each task condition only occurs in one run and when the design matrix in each run sums together close to a flat line, this option can cause the estimated similarity to be extremely high between conditions occuring in the same run.

• SNR_prior (string.) – The type of prior for pseudo-SNR. If set to ‘exp’, truncated exponential distribution with scale parameter of 1 is imposed on pseudo-SNR. If set to ‘lognorm’, a truncated log normal prior is imposed. In this case, the standard deviation of log(SNR) is set by the parameter logS_range. If set to ‘unif’, a uniform prior in [0,1] is imposed. In all above cases, SNR is numerically marginalized on a grid of parameters. So the parameter SNR_bins determines how accurate the numerical integration is. The more number of bins are used, the more accurate the numerical integration becomes. If set to ‘equal’, all voxels are assumed to have the same fixed SNR. Pseudo-SNR is 1.0 for all voxels. In all the cases, the grids used for pseudo-SNR do not really set an upper bound for SNR, because the real SNR is determined by both pseudo-SNR and U, the shared covariance structure.

• logS_range (float.) – The reasonable range of the spread of SNR in log scale. This parameter only takes effect if SNR_prior is set to ‘lognorm’. It is effectively the s parameter of scipy.stats.lognorm, or the standard deviation of the distribution in log scale. logS_range specifies how variable you believe the SNRs to vary across voxels in log scale. This range should not be set too large, otherwise the fitting may encounter numerical issue. If it is set too small, the estimated SNR will turn to be too close to each other and the estimated similarity matrix might overfit to voxels of low SNR. If you increase logS_range, it is recommended to increase SNR_bins accordingly, otherwise the pseudo-SNR values evaluated might be too sparse, causing the posterior pseudo-SNR estimations to be clustered around the bins.

• SNR_bins (integer.) – The number of bins used to numerically marginalize the pseudo-SNR parameter. In general, you should try to choose a large number to the degree that decreasing SNR_bins does not change the result of fitting result. However, very large number of bins also causes slower computation and larger memory consumption. For SNR_prior=’lognorm’, the default value 21 is based on the default value of logS_range=1.0 and bin width of 0.3 on log scale. But it is also a reasonable choice for the other two options for SNR_prior.

• rho_bins (integer.) – The number of bins to divide the region of (-1, 1) for rho. This only takes effect for fitting the marginalized version. If set to 20, discrete numbers of {-0.95, -0.85, …, 0.95} will be used to numerically integrate rho from -1 to 1.

• optimizer (str or callable.) – The optimizer to use for minimizing cost function which scipy.optimize.minimize can accept. We use ‘L-BFGS-B’ as a default. Users can try other strings corresponding to optimizer provided by scipy.optimize.minimize, or a custom optimizer, such as ‘BFGS’ or ‘CG’. Note that BRSA fits a lot of parameters. So a chosen optimizer should accept gradient (Jacobian) of the cost function. Otherwise the fitting is likely to be unbarely slow. We do not calculate Hessian of the objective function. So an optimizer which requires Hessian cannot be used.

• minimize_options (dictionary.) – This is the dictionary passed as the options argument to scipy.optimize.minize which minimizes the cost function during fitting. Notice that the minimization is performed for up to n_iter times, with the nuisance regressor re-estimated each time. So within each of the n_iter steps of fitting, scipy.optimize.minize does not need to fully converge. The key ‘maxiter’ in this dictionary determines the maximum number of iteration done by scipy.optimize.minimize within each of the n_iter steps of fitting.

• tol (float.) – Tolerance parameter passed to scipy.optimize.minimize. It is also used for determining convergence of the alternating fitting procedure.

• random_state (RandomState or an int seed.) – A random number generator instance to define the state of the random permutations generator whenever the module needs to generate random number (e.g., initial parameter of the Cholesky factor).

• anneal_speed (float.) – Annealing is introduced in fitting of the Cholesky decomposition of the shared covariance matrix. The amount of perturbation decays exponentially. This parameter sets the ratio of the maximum number of iteration to the time constant of the exponential. anneal_speed=10 means by n_iter/10 iterations, the amount of perturbation is reduced by 2.713 times.

U_

The shared covariance matrix

Type

numpy array, shape=[condition,condition]

L_

The Cholesky factor of the shared covariance matrix (lower-triangular matrix).

Type

numpy array, shape=[condition,condition]

C_

The correlation matrix derived from the shared covariance matrix. This is the estimated similarity matrix between neural patterns to your task conditions. Notice that it is recommended that you also check U_, which is the covariance matrix underlying this correlation matrix. In cases there is almost no response to your task conditions, the diagonal values of U_ would become very small and C_ might contain many correlation coefficients close to 1 or -1. This might not reflect true strong correlation or strong negative correlation, but a result of lack of task-related neural activity, design matrix that does not match true neural response, or not enough data. It is also recommended to check nSNR_ after mapping it back to the brain. A “reasonable” map should at least have higher values in gray matter in than white matter.

Type

numpy array, shape=[condition,condition]

nSNR_

The pseuso-SNR of all voxels. If SNR_prior=’lognormal’, the geometric mean of nSNR_ would be approximately 1. If SNR_prior=’unif’, all nSNR_ would be in the range of (0,1). If SNR_prior=’exp’ (default), the range of values would vary depending on the data and SNR_bins, but many should have low values with few voxels with high values. Note that this attribute can not be interpreted as true SNR, but the relative ratios between voxels indicate the contribution of each voxel to the representational similarity structure.

Type

list of numpy arrays, shape=[voxels,] for each subject in the list.

sigma_

The estimated standard deviation of the noise in each voxel Assuming AR(1) model, this means the standard deviation of the innovation noise.

Type

list of numpy arrays, shape=[voxels,] for each subject.

rho_

The estimated autoregressive coefficient of each voxel

Type

list of numpy arrays, shape=[voxels,] for each subject.

beta_

The posterior mean estimation of the response amplitudes of each voxel to each task condition.

Type

list of numpy arrays, shape=[conditions, voxels] for each subject.

beta0_

for each subject. The loading weights of each voxel for the shared time courses not captured by the design matrix. n_base is the number of columns of the user-supplied nuisance regressors plus one for DC component.

Type

list of numpy arrays, shape=[n_nureg + n_base, voxels]

X0_

for each subject. The estimated time course that is shared across voxels but unrelated to the events of interest (design matrix).

Type

list of numpy arrays, shape=[time_points, n_nureg + n_base]

beta0_null_

for each subject. The equivalent of beta0_ in a null model which does not include the design matrix and response pattern beta

Type

list of numpy arrays, shape=[n_nureg + n_base, voxels]

X0_null_

for each subject. The equivalent of X0_ in a null model which does not include the design matrix and response pattern beta

Type

list of numpy arrays, shape=[time_points, n_nureg + n_base]

n_nureg_

Number of nuisance regressor used to model the spatial noise correlation of each participant.

Type

1-d numpy array

random_state_

Random number generator initialized using random_state.

Type

RandomState

fit(X, design, nuisance=None, scan_onsets=None)

Fit the model to data of all participants jointly.

Parameters
• X (list of numpy arrays, shape=[time_points, voxels] for each entry.) – Data to be fitted. Each participant corresponds to one item in the list. If you have multiple scans of the same participants that you want to analyze together, you should concatenate them along the time dimension after proper preprocessing (e.g. spatial alignment), and specify the onsets of each scan in scan_onsets.

• design (list of numpy arrays, shape=[time_points, conditions] for each.) – This is the design matrix of each participant. It should only include the hypothetic response for task conditions. You should not include regressors for a DC component or motion parameters, unless with a strong reason. If you want to model head motion, you should include them in nuisance regressors. If you have multiple run, the design matrix of all runs should be concatenated along the time dimension for each participant, with every column for one condition across runs. If the design matrix is the same for all subjects, either provide a list as required, or provide single numpy array.

• nuisance (optional, list of numpy arrays,) – shape=[time_points, nuisance_factors] for each subject in the list. Nuisance regressors of each participant. The responses to these regressors will be marginalized out from each voxel, which means they are considered, but won’t be assumed to share the same pseudo-SNR map with the design matrix. Therefore, the pseudo-SNR map will only reflect the relative contribution of design matrix to each voxel. You can provide time courses such as those for head motion to this parameter. Note that if auto_nuisance is set to True, the first n_nureg principal components of residual (excluding the response to the design matrix and the user-provided nuisance regressors) will be included as additional nuisance regressor after the first round of fitting. If auto_nuisance is set to False, the nuisance regressors supplied by the users together with DC components will be used as nuisance time series.

• scan_onsets (optional, list numpy arrays, shape=[runs,] for each.) – Each item in the list specifies the indices of X which correspond to the onset of each scanning run for one participant. For example, if you have two experimental runs of the first participant, each with 100 TRs, and one run of the second participant, with 150 TR, then scan_onsets should be [ndarry([0, 100]), ndarry([150])]. The effect of this argument is to make the inverse matrix of the temporal covariance matrix of noise block-diagonal. If you do not provide the argument, the program will assume all data are from the same run for each participant.

score(X, design, scan_onsets=None)
After fit() is applied to the data of a group of participants,

use the parameters estimated by fit() function to evaluate from some data of a set of participants to evaluate the log likelihood of some new data of the same participants given these estimated parameters. Design matrices of the same set of experimental conditions in the testing data should be provided, with each column corresponding to the same condition as that column in the design matrix of the training data. Unknown nuisance time series will be marginalized, assuming they follow the same spatial pattern as in the training data. The hypothetical response captured by the design matrix will be subtracted from data before the marginalization when evaluating the log likelihood. For null model, nothing will be subtracted before marginalization.

There is a difference between the form of likelihood function used in fit() and score(). In fit(), the response amplitude beta to design matrix X and the modulation beta0 by nuisance regressor X0 are both marginalized, with X provided and X0 estimated from data. In score(), posterior estimation of beta and beta0 from the fitting step are assumed unchanged in testing data; X is assumed given by the user, and X0 is marginalized. The logic underlying score() is to transfer as much as what we can learn from training data when calculating a likelihood score for testing data. This is done at the cost of using point estimation for beta and beta0.

If you z-scored your data during fit step, you should z-score them for score function as well. If you did not z-score in fitting, you should not z-score here either.

Parameters
• X (List of 2-D arrays. For each item, shape=[time_points, voxels]) – fMRI data of new data of the same participants. The voxels of each participants should match those used in the fit() function. If data are z-scored (recommended) when fitting the model, data should be z-scored as well when calling transform()

• design (List of 2-D arrays. shape=[time_points, conditions] for each) – Each corresponds to one participant. Design matrices expressing the hypothetical response of the task conditions in data X.

• scan_onsets (List of 2-D arrays, shape=[#fMRI runs] for each) – Each array corresponds to one participant. Lists of indices corresponding to the onsets of scans in the data X. If not provided, data will be assumed to be acquired in a continuous scan.

Returns

• ll (list, shape=[number of participants]) – The log likelihoods of the new data based on the model and its parameters fit to the training data. If data of some participants are not provided, the corresponding entry will be None.

• ll_null (list, shape=[number of participants]) – The log likelihood of the new data based on a null model which assumes the same as the full model for everything except for that there is no response to any of the task conditions.

transform(X, y=None, scan_onsets=None)
Use the model to estimate the time course of response to

each condition (ts), and the time course unrelated to task (ts0) which is spread across the brain. This is equivalent to “decoding” the design matrix and nuisance regressors from a new dataset different from the training dataset on which fit() was applied. An AR(1) smooth prior is imposed on the decoded ts and ts0 with the AR(1) parameters learnt from the corresponding time courses in the training data.

Parameters
• X (list of 2-D arrays. For each item, shape=[time_points, voxels]) – New fMRI data of the same subjects. The voxels should match those used in the fit() function. The size of the list should match the size of the list X fed to fit(), with each item in the list corresponding to data from the same subject in the X fed to fit(). If you do not need to transform some subjects’ data, leave the entry corresponding to that subject as None. If data are z-scored when fitting the model, data should be z-scored as well when calling transform()

• y (not used (as it is unsupervised learning)) –

• scan_onsets (list of 1-D numpy arrays,) – Each array corresponds to the onsets of scans in the data X for the particular subject. If not provided, data will be assumed to be acquired in a continuous scan.

Returns

• ts (list of 2-D arrays. For each, shape = [time_points, condition]) – The estimated response to the cognitive dimensions （task dimensions) whose response amplitudes were estimated during the fit step. One item for each subject. If some subjects’ data are not provided, None will be returned.

• ts0 (list of 2-D array. For each, shape = [time_points, n_nureg]) – The estimated time courses spread across the brain, with the loading weights estimated during the fit step. One item for each subject. If some subjects’ data are not provided, None will be returned.

brainiak.reprsimil.brsa.Ncomp_SVHT_MG_DLD_approx(X, zscore=True)
This function implements the approximate calculation of the

optimal hard threshold for singular values, by Matan Gavish and David L. Donoho: “The optimal hard threshold for singular values is 4 / sqrt(3)” http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6846297

Parameters
• X (2-D numpy array of size [n_T, n_V]) – The data to estimate the optimal rank for selecting principal components.

• zscore (Boolean) – Whether to z-score the data before calculating number of components.

Returns

ncomp – The optimal number of components determined by the method of MG and DLD

Return type

integer

brainiak.reprsimil.brsa.prior_GP_var_half_cauchy(y_invK_y, n_y, tau_range)

Imposing a half-Cauchy prior onto the standard deviation (tau) of the Gaussian Process which is in turn a prior imposed over a function y = f(x). The scale parameter of the half-Cauchy prior is tau_range. The function returns the MAP estimate of tau^2 and log(p(tau|tau_range)) for the MAP value of tau^2, where tau_range describes the reasonable range of tau in the half-Cauchy prior. An alternative form of prior is inverse-Gamma prior on tau^2. Inverse-Gamma prior penalizes for both very small and very large values of tau, while half-Cauchy prior only penalizes for very large values of tau. For more information on usage, see description in BRSA class: BRSA

brainiak.reprsimil.brsa.prior_GP_var_inv_gamma(y_invK_y, n_y, tau_range)
Imposing an inverse-Gamma prior onto the variance (tau^2)

parameter of a Gaussian Process, which is in turn a prior imposed over an unknown function y = f(x). The inverse-Gamma prior of tau^2, tau^2 ~ invgamma(shape, scale) is described by a shape parameter alpha=2 and a scale parameter beta=tau_range^2. tau_range describes the reasonable range of tau in the inverse-Gamma prior. The data y’s at locations x’s are assumed to follow Gaussian Process: f(x, x’) ~ N(0, K(x, x’) / 2 tau^2), where K is a kernel function defined on x. For n observations, K(x1, x2, …, xn) is an n by n positive definite matrix. Given the prior parameter tau_range, number of observations n_y, and y_invK_y = y * inv(K) * y’, the function returns the MAP estimate of tau^2 and the log posterior probability of tau^2 at the MAP value: log(p(tau^2|tau_range)). This function is written primarily for BRSA but can also be used elsewhere. y in this case corresponds to the log of SNR in each voxel. GBRSA does not rely on this function. An alternative form of prior is half-Cauchy prior on tau. Inverse-Gamma prior penalizes for both very small and very large values of tau, while half-Cauchy prior only penalizes for very large values of tau. For more information on usage, see description in BRSA class: BRSA

See also: prior_GP_var_half_cauchy

Parameters
• y_invK_y (float) – y * inv(K) * y^T, where y=f(x) is a vector of observations of unknown function f at different locations x. K is correlation matrix of f between different locations, based on a Gaussian Process (GP) describing the smoothness property of f. K fully incorporates the form of the kernel and the length scale of the GP, but not the variance of the GP (the purpose of this function is to estimate the variance).

• n_y (int, number of observations) –

• tau_range (float,) – The reasonable range of tau, the standard deviation of the Gaussian Process imposed on y=f(x). tau_range is parameter of the inverse-Gamma prior. Say, if you expect the standard deviation of the Gaussian process to be around 3, tau_range can be set to 3. The smaller it is, the more penalization is imposed on large variation of y.

Returns

• tau2 (The MAP estimation of tau^2 based on the prior on tau) – and y_invK_y.

• log_ptau (log(p(tau)) of the returned tau^2 based on the) – inverse-Gamma prior.