# brainiak.fcma package¶

Full correlation matrix analysis

The implementation is based on the work in [Wang2015-1] and [Wang2015-2].

 [Wang2015-1] Full correlation matrix analysis (FCMA): An unbiased method for task-related functional connectivity”, Yida Wang, Jonathan D Cohen, Kai Li, Nicholas B Turk-Browne. Journal of Neuroscience Methods, 2015.
 [Wang2015-2] “Full correlation matrix analysis of fMRI data on Intel® Xeon Phi™ coprocessors”, Yida Wang, Michael J. Anderson, Jonathan D. Cohen, Alexander Heinecke, Kai Li, Nadathur Satish, Narayanan Sundaram, Nicholas B. Turk-Browne, Theodore L. Willke. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2015.

## brainiak.fcma.classifier module¶

Full Correlation Matrix Analysis (FCMA)

Correlation-based training and prediction

class brainiak.fcma.classifier.Classifier(clf, num_processed_voxels=2000, epochs_per_subj=0)

Bases: sklearn.base.BaseEstimator

Correlation-based classification component of FCMA

The classifier first computes correlation of the input data, and normalizes them if needed, then uses the given classifier to train and/or predict the correlation data. NOTE: if the classifier is sklearn.svm.SVC with precomputed kernel, the test data may be provided in the fit method to compute the kernel matrix together with the training data to save the memory usage, but the test data will NEVER be seen in the model training.

Parameters: clf (class) – The classifier used, normally a classifier class of sklearn num_processed_voxels (int, default 2000) – Used for SVM with precomputed kernel, every time it only computes correlation between num_process_voxels and the whole mask to aggregate the kernel matrices. This is to save the memory so as to handle correlations at a larger scale. epochs_per_subj (int, default 0) – The number of epochs of each subject within-subject normalization will be performed during classifier training if epochs_per_subj is specified default 0 means no within-subject normalization
training_data_

2D numpy array in shape [num_samples, num_features] – training_data_ is None except clf is SVM.SVC with precomputed kernel, in which case training data is needed to compute the similarity vector for each sample to be classified. However, if the test samples are also provided during the fit, the similarity vectors can be precomputed too and then training_datais None

test_raw_data_

a list of 2D array in shape [num_TRs, num_voxels] – default None test_raw_data_ is set after a prediction is called, if the new input data equals test_raw_data_, test_data_ can be reused

test_data_

2D numpy array in shape [num_samples, num_features] – default None test_data_ is set after a prediction is called, so that the test data does not need to be regenerated in the subsequent operations, e.g. getting decision values of the prediction. test_data_ may also be set in the fit method if sklearn.svm.SVC with precomputed kernel and the test samples are known. NOTE: the test samples will never be used to fit the model.

num_voxels_

int – The number of voxels of the first brain region used in the classifier. The first brain region is always large. When training, this region may be divided to compute the correlation portion by portion. The brain regions are defined by the applied mask, e.g. the top voxels selected by FCMA voxel selection

num_features_

int – The dimension of correlation data, normally is the product of the number of voxels of brain region 1 and the number of voxels of brain region 2. num_features_ must be consistent in both training and classification

num_samples_

int – The number of samples

num_digits_

int – The number of digits of the first value of the kernel matrix, for normalizing the kernel values accordingly

decision_function(X=None)

Output the decision value of the prediction.

if X is not equal to self.test_raw_data_, i.e. predict is not called, first generate the test_data after getting the test_data, get the decision value via self.clf. if X is None, test_data_ is ready to be used

Parameters: X (Optional[list of tuple (data1, data2)]) – data1 and data2 are numpy array in shape [num_TRs, num_voxels] to be computed for correlation. default None, meaning that the data to be predicted have been processed in the fit method. Otherwise, X contains the activity data filtered by ROIs and prepared for correlation computation. len(X) is the number of test samples. if len(X) > 1: normalization is done on all test samples. Within list, all data1s must have the same num_voxels value, all data2s must have the same num_voxels value. confidence the predictions confidence values of X, in shape [len(X),]
fit(X, y, num_training_samples=None)

Use correlation data to train a model.

First compute the correlation of the input data, and then normalize within subject if more than one sample in one subject, and then fit to a model defined by self.clf.

Parameters: X (list of tuple (data1, data2)) – data1 and data2 are numpy array in shape [num_TRs, num_voxels] to be computed for correlation. They contain the activity data filtered by ROIs and prepared for correlation computation. Within list, all data1s must have the same num_voxels value, all data2s must have the same num_voxels value. y (1D numpy array) – labels, len(X) equals len(y) num_training_samples (Optional[int]) – The number of samples used in the training. Set it to construct the kernel matrix portion by portion so the similarity vectors of the test data have to be computed here. Only set num_training_samples when sklearn.svm.SVC with precomputed kernel is used. If it is set, only those samples will be used to fit the model. self. Classifier
predict(X=None)

Use a trained model to predict correlation data.

first compute the correlation of the input data, and then normalize across all samples in the list if there are more than one sample, and then predict via self.clf. If X is None, use the similarity vectors produced in fit to predict

Parameters: X (Optional[list of tuple (data1, data2)]) – data1 and data2 are numpy array in shape [num_TRs, num_voxels] to be computed for correlation. default None, meaning that the data to be predicted have been processed in the fit method. Otherwise, X contains the activity data filtered by ROIs and prepared for correlation computation. len(X) is the number of test samples. if len(X) > 1: normalization is done on all test samples. Within list, all data1s must have the same num_voxels value, all data2s must have the same num_voxels value. y_pred the predicted label of X, in shape [len(X),]
score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

NOTE: In the condition of sklearn.svm.SVC with precomputed kernel when the kernel matrix is computed portion by portion, the function will ignore the first input argument X.

Parameters: X (list of tuple (data1, data2)) – data1 and data2 are numpy array in shape [num_TRs, num_voxels] to be computed for correlation. They are test samples. They contain the activity data filtered by ROIs and prepared for correlation computation. Within list, all data1s must have the same num_voxels value, all data2s must have the same num_voxels value. len(X) is the number of test samples. y (1D numpy array) – labels, len(X) equals len(y), which is num_samples sample_weight (1D array in shape [num_samples], optional) – Sample weights. score – Mean accuracy of self.predict(X) wrt. y. float

## brainiak.fcma.cython_blas module¶

brainiak.fcma.cython_blas.compute_corr_vectors()

use blas API gemm wrapped by scipy to construct a correlation vector

The correlation vector is essentially correlation matrices computed from two activity matrices. It will be placed in the corresponding place of the resulting correlation data set. The blas APIs process matrices in column-major, but our matrices are in row-major, so we play the transpose trick here, i.e. A*B=(B^T*A^T)^T

py_trans_a: str do transpose or not for the first matrix A

py_trans_b: str do transpose or not for the first matrix B

py_m: int the row of the resulting matrix C

py_n: int the column of the resulting matrix C

py_k: int the collapsed dimension of the multiplying matrices i.e. the column of the first matrix after transpose if necessary the row of the second matrix after transpose if necessary

py_alpha: float the weight applied to the input matrix A

py_a: 2D array

py_lda: int the stride of the input matrix A

py_b: 2D array

py_ldb: int the stride of the input matrix B

py_beta: float the weight applied to the resulting matrix C

py_c: 2D array in shape [py_m, py_n] of column-major in fact it is in shape [py_n, py_m] of row-major

py_ldc: int the stride of the resulting matrix

py_start_voxel: int the starting voxel of assigned voxels used to locate the second matrix B

py_start_sample: int the processed sample used to locate the resulting matrix C

Returns: py_c (2D array) in shape [py_m, py_n] of column-major write the resulting matrix to the place indicated by py_start_sample
brainiak.fcma.cython_blas.compute_kernel_matrix()

use blas API syrk wrapped by scipy to compute kernel matrix of SVM

The blas APIs process matrices in column-major, but our matrices are in row-major, so we play the transpose trick here, i.e. A*B=(B^T*A^T)^T

In SVM with linear kernel, the distance of two samples is essentially the dot product of them. Therefore, the kernel matrix can be obtained by matrix multiplication. Since the kernel matrix is symmetric, ssyrk is used, the other half of the matrix is assigned later. In our case, the dimension of samples is much larger than the number samples, so we proportionally shrink the values of the kernel matrix for getting more robust alpha values in SVM iteration.

Parameters: py_uplo (str) – the upper or lower triangle of the matrix (getting) – py_trans (str) – transpose or not for the input matrix A (do) – py_n (int) – row and column of the resulting matrix C (the) – our case, is num_epochs (in) – py_k (int) – collapsed dimension of the multiplying matrices (the) – the column of the first matrix after transpose if necessary (i.e.) – row of the second matrix after transpose if necessary (the) – our case, is num_voxels (in) – py_alpha (float) – weight applied to the input matrix A (the) – py_a (3D array in shape [num_assigned_voxels, num_epochs, num_voxels]) – our case the normalized correlation values of a voxel (in) – py_start_voxel (int) – processed voxel (the) – to locate the input matrix A (used) – py_lda (int) – stride of the input matrix A (the) – py_beta (float) – weight applied to the resulting matrix C (the) – py_c (2D array in shape [num_epochs, num_epochs]) – to store the resulting kernel matrix (place) – py_ldc (int) – stride of the resulting matrix (the) – py_c (2D array in shape [num_epochs, num_epochs]) write the resulting kernel_matrix for the processing voxel
brainiak.fcma.cython_blas.compute_self_corr_for_voxel_sel()

use blas API sgemm wrapped by scipy to compute correlation

This method is limited to process self-correlation. The blas APIs process matrices in column-major, but our matrices are in row-major, so we play the transpose trick here, i.e. A*B=(B^T*A^T)^T. The resulting matrix in shape [num_assigned_voxels, num_voxels] is stored in an alternate way to make sure that the correlation vectors of the same voxel stored continuously

Parameters: py_trans_a (str) – transpose or not for the first matrix A (do) – py_trans_b (str) – transpose or not for the first matrix B (do) – py_m (int) – row of the resulting matrix C (the) – our case, is num_voxels (in) – py_n (int) – column of the resulting matrix C (the) – our case, is num_assigned_voxels (in) – py_k (int) – collapsed dimension of the multiplying matrices (the) – the column of the first matrix after transpose if necessary (i.e.) – row of the second matrix after transpose if necessary (the) – py_alpha (float) – weight applied to the first matrix A (the) – py_a (2D array in shape [epoch_length, num_voxels]) – is the activity data of an epoch, part 1 of the data to be (It) – with. Note that py_a can point to the same location of py_b. (correlated) – py_lda (int) – stride of the first matrix A (the) – py_start_voxel (int) – starting voxel of assigned voxels (the) – to locate the second matrix B (used) – py_b (2D array in shape [epoch_length, num_voxels]) – is the activity data of an epoch, part 2 of the data to be (It) – with. Note that py_a can point to the same location of py_b. – py_ldb (int) – stride of the second matrix B (the) – py_beta (float) – weight applied to the resulting matrix C (the) – py_c (3D array in shape [num_selected_voxels, num_epochs, num_voxels]) – to store the resulting correlation values (place) – py_ldc (int) – stride of the resulting matrix (the) – our case, num_voxels*num_epochs (in) – py_start_epoch (int) – epoch over which the correlation is computed (the) – py_c (3D array in shape [num_selected_voxels, num_epochs, num_voxels]) write the resulting correlation values in an alternate way for the processing epoch
brainiak.fcma.cython_blas.compute_single_matrix_multiplication()

use blas API gemm wrapped by scipy to do matrix multiplication

This is to compute the matrix multiplication. The blas APIs process matrices in column-major, but our matrices are in row-major, so we play the transpose trick here, i.e. A*B=(B^T*A^T)^T

Parameters: py_trans_a (str) – transpose or not for the first matrix A (do) – py_trans_b (str) – transpose or not for the first matrix B (do) – py_m (int) – row of the resulting matrix C (the) – py_n (int) – column of the resulting matrix C (the) – py_k (int) – collapsed dimension of the multiplying matrices (the) – the column of the first matrix after transpose if necessary (i.e.) – row of the second matrix after transpose if necessary (the) – py_alpha (float) – weight applied to the input matrix A (the) – py_a (2D array) – py_lda (int) – stride of the input matrix A (the) – py_b (2D array) – py_ldb (int) – stride of the input matrix B (the) – py_beta (float) – weight applied to the resulting matrix C (the) – py_c (2D array) – shape [py_m, py_n] of column-major (in) – fact it is (in) – shape [py_n, py_m] of row-major (in) – py_ldc (int) – stride of the resulting matrix (the) – py_c (2D array) in shape [py_m, py_n] of column-major write the resulting matrix
brainiak.fcma.cython_blas.compute_single_self_corr_gemm()

use blas API gemm wrapped by scipy to compute correlation matrix

This is to compute the correlation between selected voxels for final training and classification. Although the resulting correlation matrix is symmetric, in most cases, gemm performs better than syrk. Here we assume that the resulting matrix is stored in a compact way, i.e. py_ldc == py_n.

Parameters: py_trans_a (str) – transpose or not for the first matrix A (do) – py_trans_b (str) – transpose or not for the first matrix B (do) – py_m (int) – row of the resulting matrix C (the) – our case, is num_selected_voxels (in) – py_n (int) – column of the resulting matrix C (the) – our case, is num_selected_voxels – py_k (int) – collapsed dimension of the multiplying matrices (the) – the column of the first matrix after transpose if necessary (i.e.) – row of the second matrix after transpose if necessary (the) – our case, is num_TRs (in) – py_alpha (float) – weight applied to the input matrix A (the) – py_a (2D array in shape [num_TRs, num_selected_voxels]) – our case the normalized activity values (in) – multipliers are specified here as the same one (both) – py_lda (int) – stride of the input matrix A (the) – py_ldb (int) – stride of the input matrix B (the) – our case, the same as py_lda (in) – py_beta (float) – weight applied to the resulting matrix C (the) – py_c (3D array) – shape [num_samples, num_selected_voxels, num_selected_voxels] (in) – to store the resulting kernel matrix (place) – py_ldc (int) – stride of the resulting matrix (the) – py_start_sample (int) – processed sample (the) – to locate the resulting matrix C (used) – py_c (3D array) in shape [num_samples, num_selected_voxels, num_selected_voxels] write the resulting correlation matrices for the processed sample
brainiak.fcma.cython_blas.compute_single_self_corr_syrk()

use blas API syrk wrapped by scipy to compute correlation matrix

This is to compute the correlation between selected voxels for final training and classification. Since the resulting correlation matrix is symmetric, syrk is used. However, it looks like that in most cases, syrk performs much worse than gemm (the next function). Here we assume that the resulting matrix is stored in a compact way, i.e. py_ldc == py_n.

Parameters: py_uplo (str) – the upper or lower triangle of the matrix (getting) – py_trans (str) – transpose or not for the input matrix A (do) – py_n (int) – row and column of the resulting matrix C (the) – our case, is num_selected_voxels (in) – py_k (int) – collapsed dimension of the multiplying matrices (the) – the column of the first matrix after transpose if necessary (i.e.) – row of the second matrix after transpose if necessary (the) – our case, is num_TRs (in) – py_alpha (float) – weight applied to the input matrix A (the) – py_a (2D array in shape [num_TRs, num_selected_voxels]) – our case the normalized activity values (in) – py_lda (int) – stride of the input matrix A (the) – py_beta (float) – weight applied to the resulting matrix C (the) – py_c (3D array) – shape [num_samples, num_selected_voxels, num_selected_voxels] (in) – to store the resulting kernel matrix (place) – py_ldc (int) – stride of the resulting matrix (the) – py_start_sample (int) – processed sample (the) – to locate the resulting matrix C (used) – py_c (3D array) in shape [num_samples, num_selected_voxels, num_selected_voxels] write the resulting correlation matrices for the processed sample

## brainiak.fcma.mvpa_voxelselector module¶

Full Correlation Matrix Analysis (FCMA)

Activity-based voxel selection

class brainiak.fcma.mvpa_voxelselector.MVPAVoxelSelector(data, mask, labels, num_folds, sl)

Bases: object

Activity-based voxel selection component of FCMA

Parameters: data (4D array in shape [brain 3D + epoch]) – contains the averaged and normalized brain data epoch by epoch. It is generated by .io.prepare_searchlight_mvpa_data mask (3D array) – labels (1D array) – contains the labels of the epochs. It is generated by .io.prepare_searchlight_mvpa_data num_folds (int) – the number of folds to be conducted in the cross validation sl (Searchlight) – the distributed Searchlight object
run(clf)

run activity-based voxel selection

Sort the voxels based on the cross-validation accuracy of their activity vectors within the searchlight

Parameters: clf (classification function) – the classifier to be used in cross validation result_volume (3D array of accuracy numbers) – contains the voxelwise accuracy numbers obtained via Searchlight results (list of tuple (voxel_id, accuracy)) – the accuracy numbers of all voxels, in accuracy descending order the length of array equals the number of voxels

## brainiak.fcma.preprocessing module¶

FCMA preprocessing.

class brainiak.fcma.preprocessing.RandomType

Bases: enum.Enum

Define the random types as enumeration

NORANDOM means do not randomize the data; REPRODUCIBLE means randomize the data with a fixed seed so that the permutation holds between different runs; UNREPRODUCIBLE means truly randomize the data which returns different results in different runs.

NORANDOM = 0
REPRODUCIBLE = 1
UNREPRODUCIBLE = 2
brainiak.fcma.preprocessing.prepare_fcma_data(images, conditions, mask1, mask2=None, random=<RandomType.NORANDOM: 0>, comm=<mpi4py.MPI.Intracomm object>)

Prepare data for correlation-based computation and analysis.

Generate epochs of interests, then broadcast to all workers.

Parameters: images (Iterable[SpatialImage]) – Data. conditions (List[UniqueLabelConditionSpec]) – Condition specification. mask1 (np.ndarray) – Mask to apply to each image. mask2 (Optional[np.ndarray]) – Mask to apply to each image. If it is not specified, the method will assign None to the returning variable raw_data2 and the self-correlation on raw_data1 will be computed random (Optional[RandomType]) – Randomize the data within subject or not. Default NORANDOM comm (MPI.Comm) – MPI communicator to use for MPI operations. raw_data1 (list of 2D array in shape [epoch length, nVoxels]) – the data organized in epochs, specified by the first mask. len(raw_data) equals the number of epochs raw_data2 (Optional, list of 2D array in shape [epoch length, nVoxels]) – the data organized in epochs, specified by the second mask if any. len(raw_data2) equals the number of epochs labels (list of 1D array) – the condition labels of the epochs len(labels) labels equals the number of epochs
brainiak.fcma.preprocessing.generate_epochs_info(epoch_list)

use epoch_list to generate epoch_info defined below

Parameters: epoch_list (list of 3D (binary) array in shape [condition, nEpochs, nTRs]) – Contains specification of epochs and conditions, assuming 1. all subjects have the same number of epochs; 2. len(epoch_list) equals the number of subjects; 3. an epoch is always a continuous time course. epoch_info – label is the condition labels of the epochs; sid is the subject id, corresponding to the index of raw_data; start is the start TR of an epoch (inclusive); end is the end TR of an epoch(exclusive). Assuming len(labels) labels equals the number of epochs and the epochs of the same sid are adjacent in epoch_info list of tuple (label, sid, start, end)
brainiak.fcma.preprocessing.prepare_mvpa_data(images, conditions, mask)

Prepare data for activity-based model training and prediction.

Average the activity within epochs and z-scoring within subject.

Parameters: images (Iterable[SpatialImage]) – Data. conditions (List[UniqueLabelConditionSpec]) – Condition specification. mask (np.ndarray) – Mask to apply to each image. processed_data (2D array in shape [num_voxels, num_epochs]) – averaged epoch by epoch processed data labels (1D array) – contains labels of the data
brainiak.fcma.preprocessing.prepare_searchlight_mvpa_data(images, conditions, data_type=<class 'numpy.float32'>, random=<RandomType.NORANDOM: 0>)

obtain the data for activity-based voxel selection using Searchlight

Average the activity within epochs and z-scoring within subject, while maintaining the 3D brain structure. In order to save memory, the data is processed subject by subject instead of reading all in before processing. Assuming all subjects live in the identical cube.

Parameters: images (Iterable[SpatialImage]) – Data. conditions (List[UniqueLabelConditionSpec]) – Condition specification. data_type – Type to cast image to. random (Optional[RandomType]) – Randomize the data within subject or not. processed_data (4D array in shape [brain 3D + epoch]) – averaged epoch by epoch processed data labels (1D array) – contains labels of the data

## brainiak.fcma.util module¶

Full Correlation Matrix Analysis (FCMA)

Correlation related high performance routines

brainiak.fcma.util.compute_correlation(matrix1, matrix2)

compute correlation between two sets of variables

Correlate the rows of matrix1 with the rows of matrix2. If matrix1 == matrix2, it is auto-correlation computation resulting in a symmetric correlation matrix. The number of columns MUST agree between set1 and set2. The correlation being computed here is the Pearson’s correlation coefficient, which can be expressed as

$corr(X, Y) = \frac{cov(X, Y)}{\sigma_X\sigma_Y}$

where cov(X, Y) is the covariance of variable X and Y, and

$\sigma_X$

is the standard deviation of variable X

Reducing the correlation computation to matrix multiplication and using BLAS GEMM API wrapped by Scipy can speedup the numpy built-in correlation computation (numpy.corrcoef) by one order of magnitude

$\begin{split}corr(X, Y) &= \frac{\sum\limits_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{(n-1) \sqrt{\frac{\sum\limits_{j=1}^n x_j^2-n\bar{x}}{n-1}} \sqrt{\frac{\sum\limits_{j=1}^{n} y_j^2-n\bar{y}}{n-1}}}\\ &= \sum\limits_{i=1}^n(\frac{(x_i-\bar{x})} {\sqrt{\sum\limits_{j=1}^n x_j^2-n\bar{x}}} \frac{(y_i-\bar{y})}{\sqrt{\sum\limits_{j=1}^n y_j^2-n\bar{y}}})\end{split}$
Parameters: matrix1 (2D array in shape [r1, c]) – MUST be continuous and row-major matrix2 (2D array in shape [r2, c]) – MUST be continuous and row-major corr_data – continuous and row-major in np.float32 2D array in shape [r1, r2]

## brainiak.fcma.voxelselector module¶

Full Correlation Matrix Analysis (FCMA)

Correlation-based voxel selection

class brainiak.fcma.voxelselector.VoxelSelector(labels, epochs_per_subj, num_folds, raw_data, raw_data2=None, voxel_unit=64, process_num=4, master_rank=0)

Bases: object

Correlation-based voxel selection component of FCMA.

Parameters: labels (list of 1D array) – the condition labels of the epochs len(labels) labels equals the number of epochs epochs_per_subj (int) – The number of epochs of each subject num_folds (int) – The number of folds to be conducted in the cross validation raw_data (list of 2D array in shape [epoch length, nVoxels]) – Assumption: 1. all activity data contains the same number of voxels the activity data has been z-scored, ready to compute correlation as matrix multiplication all subjects have the same number of epochs epochs belonging to the same subject are adjacent in the list if MPI jobs are running on multiple nodes, the path used must be on a filesystem shared by all nodes raw_data2 (Optional, list of 2D array in shape [epoch length, nVoxels]) – raw_data2 shares the data structure of the assumptions of raw_data If raw_data2 is None, the correlation will be computed as raw_data by raw_data. If raw_data2 is specified, len(raw_data) MUST equal len(raw_data2), the correlation will be computed as raw_data by raw_data2. voxel_unit (int, default 64) – The number of voxels assigned to a worker each time process_num (Optional[int]) – The maximum number of processes used in cross validation. If None, the number of processes will equal the number of available hardware threads, considering cpusets restrictions. If 0, cross validation will not use python multiprocessing. master_rank (int, default 0) – The process which serves as the master
run(clf)

Run correlation-based voxel selection in master-worker model.

Sort the voxels based on the cross-validation accuracy of their correlation vectors

Parameters: clf (classification function) – the classifier to be used in cross validation results – the accuracy numbers of all voxels, in accuracy descending order the length of array equals the number of voxels list of tuple (voxel_id, accuracy)