brainiak.fcma package
Full correlation matrix analysis
The implementation is based on the work in [Wang2015-1] and [Wang2015-2].
- Wang2015-1
“Full correlation matrix analysis (FCMA): An unbiased method for task-related functional connectivity”, Yida Wang, Jonathan D Cohen, Kai Li, Nicholas B Turk-Browne. Journal of Neuroscience Methods, 2015.
- Wang2015-2
“Full correlation matrix analysis of fMRI data on Intel® Xeon Phi™ coprocessors”, Yida Wang, Michael J. Anderson, Jonathan D. Cohen, Alexander Heinecke, Kai Li, Nadathur Satish, Narayanan Sundaram, Nicholas B. Turk-Browne, Theodore L. Willke. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2015.
Submodules
brainiak.fcma.classifier module
Full Correlation Matrix Analysis (FCMA)
Correlation-based training and prediction
- class brainiak.fcma.classifier.Classifier(clf, num_processed_voxels=2000, epochs_per_subj=0)
Bases:
BaseEstimator
Correlation-based classification component of FCMA
The classifier first computes the correlation of the input data, normalizes it if needed, and then uses the given classifier to train on and/or predict from the correlation data. NOTE: if the classifier is sklearn.svm.SVC with precomputed kernel, the test data may be provided in the fit method so that the kernel matrix is computed together with the training data to save memory, but the test data will NEVER be seen in the model training.
- Parameters
clf (class) – The classifier used, normally a classifier class of sklearn
num_processed_voxels (int, default 2000) – Used for SVM with precomputed kernel. At each step, only the correlation between num_processed_voxels voxels and the whole mask is computed, and the resulting kernel matrices are aggregated. This saves memory so that correlations can be handled at a larger scale.
epochs_per_subj (int, default 0) – The number of epochs of each subject. Within-subject normalization will be performed during classifier training if epochs_per_subj is specified; the default 0 means no within-subject normalization.
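The memory-saving idea behind num_processed_voxels can be illustrated with a linear kernel: because such a kernel is a sum over features, it can be accumulated one block of voxels at a time. The following is a minimal NumPy sketch, not BrainIAK's implementation; the helper name chunked_linear_kernel and the random data are hypothetical.

```python
import numpy as np


def chunked_linear_kernel(samples, num_processed_voxels):
    """Accumulate a linear kernel over blocks of region-1 voxels.

    samples: list of (data1, data2) tuples, each array [num_TRs, num_voxels].
    Hypothetical helper, for illustration only.
    """
    n = len(samples)
    num_voxels = samples[0][0].shape[1]
    kernel = np.zeros((n, n))
    for start in range(0, num_voxels, num_processed_voxels):
        stop = min(start + num_processed_voxels, num_voxels)
        width = stop - start
        # Correlate one block of region-1 voxels with all region-2 voxels,
        # then flatten the block into a partial feature vector per sample.
        block = np.stack([
            np.corrcoef(d1[:, start:stop].T, d2.T)[:width, width:].ravel()
            for d1, d2 in samples
        ])
        kernel += block @ block.T
    return kernel


rng = np.random.default_rng(0)
samples = [(rng.standard_normal((12, 5)), rng.standard_normal((12, 4)))
           for _ in range(6)]
kernel = chunked_linear_kernel(samples, num_processed_voxels=2)

# Reference: build all 5 * 4 correlation features at once.
full = np.stack([np.corrcoef(d1.T, d2.T)[:5, 5:].ravel() for d1, d2 in samples])
```

Only one block of correlation features is held in memory at a time, yet the accumulated kernel matches the one computed from the full feature matrix.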
- training_data_
training_data_ is None unless clf is sklearn.svm.SVC with precomputed kernel, in which case the training data is needed to compute the similarity vector for each sample to be classified. However, if the test samples are also provided during fit, the similarity vectors can be precomputed too, and then training_data_ is None.
- Type
2D numpy array in shape [num_samples, num_features]
- test_raw_data_
Default None. test_raw_data_ is set after a prediction is called; if the new input data equals test_raw_data_, test_data_ can be reused.
- Type
a list of 2D array in shape [num_TRs, num_voxels]
- test_data_
Default None. test_data_ is set after a prediction is called, so that the test data does not need to be regenerated in subsequent operations, e.g. getting the decision values of the prediction. test_data_ may also be set in the fit method if sklearn.svm.SVC with precomputed kernel is used and the test samples are known. NOTE: the test samples will never be used to fit the model.
- Type
2D numpy array in shape [num_samples, num_features]
- num_voxels_
The number of voxels of the first brain region used in the classifier. The first brain region is always large. When training, this region may be divided to compute the correlation portion by portion. The brain regions are defined by the applied mask, e.g. the top voxels selected by FCMA voxel selection
- Type
int
- num_features_
The dimension of the correlation data, normally the product of the number of voxels of brain region 1 and the number of voxels of brain region 2. num_features_ must be consistent in both training and classification.
- Type
int
- num_samples_
The number of samples
- Type
int
- num_digits_
The number of digits of the first value of the kernel matrix, for normalizing the kernel values accordingly
- Type
int
- decision_function(X=None)
Output the decision value of the prediction.
If X is not equal to self.test_raw_data_, i.e. predict has not been called on X, first generate test_data_; once the test data is available, get the decision value via self.clf. If X is None, test_data_ is ready to be used.
- Parameters
X (Optional[list of tuple (data1, data2)]) – data1 and data2 are numpy arrays in shape [num_TRs, num_voxels] to be computed for correlation. The default is None, meaning that the data to be predicted have been processed in the fit method. Otherwise, X contains the activity data filtered by ROIs and prepared for correlation computation. len(X) is the number of test samples. If len(X) > 1, normalization is done on all test samples. Within the list, all data1s must have the same num_voxels value, and all data2s must have the same num_voxels value.
- Returns
confidence
- Return type
the prediction confidence values of X, in shape [len(X),]
- fit(X, y, num_training_samples=None)
Use correlation data to train a model.
First compute the correlation of the input data, then normalize within subject if a subject has more than one sample, and then fit a model defined by self.clf.
- Parameters
X (list of tuple (data1, data2)) – data1 and data2 are numpy arrays in shape [num_TRs, num_voxels] to be computed for correlation. They contain the activity data filtered by ROIs and prepared for correlation computation. Within the list, all data1s must have the same num_voxels value, and all data2s must have the same num_voxels value.
y (1D numpy array) – labels, len(X) equals len(y)
num_training_samples (Optional[int]) – The number of samples used in the training. Set it when the kernel matrix is constructed portion by portion, in which case the similarity vectors of the test data have to be computed here. Only set num_training_samples when sklearn.svm.SVC with precomputed kernel is used. If it is set, only those samples will be used to fit the model.
- Returns
self.
- Return type
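The within-subject normalization mentioned for fit can be pictured as follows. This is a minimal sketch that assumes the normalization is a z-score of each feature across the epochs of one subject; the exact transformation BrainIAK applies may differ, and the helper name zscore_within_subject is hypothetical.

```python
import numpy as np


def zscore_within_subject(features, epochs_per_subj):
    """Z-score each feature across the epochs of one subject at a time.

    features: [num_samples, num_features]; samples of a subject are adjacent.
    epochs_per_subj == 0 means no within-subject normalization.
    """
    if epochs_per_subj == 0:
        return features
    num_samples, num_features = features.shape
    if num_samples % epochs_per_subj != 0:
        raise ValueError("num_samples must be a multiple of epochs_per_subj")
    grouped = features.reshape(-1, epochs_per_subj, num_features)
    mean = grouped.mean(axis=1, keepdims=True)
    std = grouped.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant features stay at zero instead of dividing by 0
    return ((grouped - mean) / std).reshape(num_samples, num_features)


rng = np.random.default_rng(1)
# Two subjects with four epochs each; the second subject has a large offset.
features = rng.standard_normal((8, 3)) + np.repeat([[0.0], [5.0]], 4, axis=0)
normalized = zscore_within_subject(features, epochs_per_subj=4)
```

After normalization, the offset between the two subjects is gone, so the classifier sees condition-related differences rather than subject-level baselines.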
- predict(X=None)
Use a trained model to predict correlation data.
First compute the correlation of the input data, then normalize across all samples in the list if there is more than one sample, and then predict via self.clf. If X is None, use the similarity vectors produced in fit to predict.
- Parameters
X (Optional[list of tuple (data1, data2)]) – data1 and data2 are numpy arrays in shape [num_TRs, num_voxels] to be computed for correlation. The default is None, meaning that the data to be predicted have been processed in the fit method. Otherwise, X contains the activity data filtered by ROIs and prepared for correlation computation. len(X) is the number of test samples. If len(X) > 1, normalization is done on all test samples. Within the list, all data1s must have the same num_voxels value, and all data2s must have the same num_voxels value.
- Returns
y_pred
- Return type
the predicted label of X, in shape [len(X),]
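When the test samples are passed to fit together with num_training_samples, a single kernel matrix over all samples can serve both training and prediction. The sketch below shows that block structure with NumPy, assuming a linear kernel and training samples placed first; the variable names are hypothetical and this is not BrainIAK's internal code.

```python
import numpy as np

rng = np.random.default_rng(2)
num_training_samples, num_test_samples, num_features = 6, 3, 10
features = rng.standard_normal(
    (num_training_samples + num_test_samples, num_features))

# One kernel matrix over all samples (training samples first, then test).
kernel = features @ features.T

# Only the training-by-training block is used to fit the model.
train_kernel = kernel[:num_training_samples, :num_training_samples]

# Each test sample is described by its similarity to the training samples.
test_similarity = kernel[num_training_samples:, :num_training_samples]
```

These are the shapes that sklearn.svm.SVC(kernel='precomputed') expects: [n_train, n_train] in fit and [n_test, n_train] in predict. The training block depends only on the training samples, which is why the test data is never seen in model training.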
- score(X, y, sample_weight=None)
Returns the mean accuracy on the given test data and labels.
NOTE: In the condition of sklearn.svm.SVC with precomputed kernel when the kernel matrix is computed portion by portion, the function will ignore the first input argument X.
- Parameters
X (list of tuple (data1, data2)) – data1 and data2 are numpy arrays in shape [num_TRs, num_voxels] to be computed for correlation. They are test samples and contain the activity data filtered by ROIs and prepared for correlation computation. Within the list, all data1s must have the same num_voxels value, and all data2s must have the same num_voxels value. len(X) is the number of test samples.
y (1D numpy array) – labels, len(X) equals len(y), which is num_samples
sample_weight (1D array in shape [num_samples], optional) – Sample weights.
- Returns
score – Mean accuracy of self.predict(X) with respect to y.
- Return type
float
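For reference, the mean accuracy returned by score can be reproduced by hand. The following is a small sketch with a hypothetical helper mean_accuracy; it shows the quantity being computed, with and without sample weights, and does not call BrainIAK.

```python
import numpy as np


def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Fraction of correct predictions, optionally weighted per sample."""
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    return float(np.average(correct, weights=sample_weight))


y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0])
unweighted = mean_accuracy(y_true, y_pred)                      # 3 of 4 correct
weighted = mean_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1])
```

With the weights above, the single misclassified sample counts double, so the weighted accuracy is 3/5 instead of 3/4.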
- set_fit_request(*, num_training_samples: Union[bool, None, str] = '$UNCHANGED$') Classifier
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters
num_training_samples (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for num_training_samples parameter in fit.
- Returns
self – The updated object.
- Return type
object
- set_score_request(*, sample_weight: Union[bool, None, str] = '$UNCHANGED$') Classifier
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns
self – The updated object.
- Return type
object
brainiak.fcma.mvpa_voxelselector module
Full Correlation Matrix Analysis (FCMA)
Activity-based voxel selection
- class brainiak.fcma.mvpa_voxelselector.MVPAVoxelSelector(data, mask, labels, num_folds, sl)
Bases:
object
Activity-based voxel selection component of FCMA
- Parameters
data (4D array in shape [brain 3D + epoch]) – contains the averaged and normalized brain data epoch by epoch. It is generated by brainiak.fcma.preprocessing.prepare_searchlight_mvpa_data
mask (3D array) –
labels (1D array) – contains the labels of the epochs. It is generated by brainiak.fcma.preprocessing.prepare_searchlight_mvpa_data
num_folds (int) – the number of folds to be conducted in the cross validation
sl (Searchlight) – the distributed Searchlight object
- run(clf)
Run activity-based voxel selection.
Sort the voxels based on the cross-validation accuracy of their activity vectors within the searchlight.
- Parameters
clf (classification function) – the classifier to be used in cross validation
- Returns
result_volume (3D array of accuracy numbers) – contains the voxelwise accuracy numbers obtained via Searchlight
results (list of tuple (voxel_id, accuracy)) – the accuracy numbers of all voxels, in descending order of accuracy. The length of the list equals the number of voxels.
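The relationship between the two return values can be sketched as follows: the accuracy volume is read out inside the mask and sorted in descending order. This is an illustration only; the helper rank_voxels is hypothetical, and it assumes that voxel_id is the index of a voxel within the mask.

```python
import numpy as np


def rank_voxels(accuracy_volume, mask):
    """Return (voxel_id, accuracy) tuples sorted by descending accuracy."""
    accuracies = accuracy_volume[mask]  # 1D, one value per masked voxel
    order = np.argsort(-accuracies, kind="stable")
    return [(int(v), float(accuracies[v])) for v in order]


mask = np.zeros((2, 2, 2), dtype=bool)
mask[0, 0, 0] = mask[0, 1, 1] = mask[1, 1, 0] = True

accuracy_volume = np.zeros((2, 2, 2))
accuracy_volume[0, 0, 0] = 0.55
accuracy_volume[0, 1, 1] = 0.80
accuracy_volume[1, 1, 0] = 0.65

results = rank_voxels(accuracy_volume, mask)
```

The first entries of results are the voxels one would typically keep as the "top voxels" for a later classification step.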
brainiak.fcma.preprocessing module
FCMA preprocessing.
- class brainiak.fcma.preprocessing.RandomType(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
Define the random types as enumeration
NORANDOM means do not randomize the image data; REPRODUCIBLE means randomize the image data with a fixed seed so that the permutation holds between different runs; UNREPRODUCIBLE means truly randomize the image data which returns different results in different runs.
- NORANDOM = 0
- REPRODUCIBLE = 1
- UNREPRODUCIBLE = 2
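The difference between the three random types can be sketched with a seeded and an unseeded generator. The example below is an illustration under stated assumptions: it defines a local copy of the enumeration, assumes that randomizing means permuting the voxels of a subject, and uses a hypothetical helper shuffle_voxels with an arbitrary seed of 0.

```python
from enum import Enum

import numpy as np


class RandomType(Enum):
    """Local mirror of the three documented members, for illustration."""
    NORANDOM = 0
    REPRODUCIBLE = 1
    UNREPRODUCIBLE = 2


def shuffle_voxels(data, random, seed=0):
    """Permute the voxel axis of data ([num_voxels, num_TRs])."""
    if random is RandomType.NORANDOM:
        return data
    # A fixed seed gives the same permutation in every run; None draws
    # fresh entropy from the operating system each time.
    rng = np.random.default_rng(
        seed if random is RandomType.REPRODUCIBLE else None)
    return data[rng.permutation(data.shape[0])]


data = np.arange(12).reshape(6, 2)
first = shuffle_voxels(data, RandomType.REPRODUCIBLE)
second = shuffle_voxels(data, RandomType.REPRODUCIBLE)
```

Here first and second hold the same permutation, whereas two calls with RandomType.UNREPRODUCIBLE would generally differ.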
- brainiak.fcma.preprocessing.generate_epochs_info(epoch_list)
Use epoch_list to generate epoch_info, defined below.
- Parameters
epoch_list (list of 3D (binary) array in shape [condition, nEpochs, nTRs]) – Contains specification of epochs and conditions, assuming 1. all subjects have the same number of epochs; 2. len(epoch_list) equals the number of subjects; 3. an epoch is always a continuous time course.
- Returns
epoch_info – label is the condition label of the epoch; sid is the subject id, corresponding to the index of raw_data; start is the start TR of an epoch (inclusive); end is the end TR of an epoch (exclusive). It is assumed that the number of labels equals the number of epochs and that the epochs of the same sid are adjacent in epoch_info.
- Return type
list of tuple (label, sid, start, end).
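The mapping from a binary [condition, nEpochs, nTRs] array to (label, sid, start, end) tuples can be sketched as below. This follows the description above rather than BrainIAK's source; the function name sketch_epochs_info is hypothetical, and the condition index is used as the label.

```python
import numpy as np


def sketch_epochs_info(epoch_list):
    """Build (label, sid, start, end) tuples from binary epoch arrays."""
    epoch_info = []
    for sid, epochs in enumerate(epoch_list):
        num_conditions, num_epochs, _ = epochs.shape
        for label in range(num_conditions):
            for eid in range(num_epochs):
                trs = np.flatnonzero(epochs[label, eid])
                if trs.size:  # this epoch belongs to this condition
                    # An epoch is a continuous time course, so its first
                    # and last TR bound it; end is exclusive.
                    epoch_info.append(
                        (label, sid, int(trs[0]), int(trs[-1]) + 1))
    return epoch_info


# One subject, two conditions, two epochs, ten TRs.
subject = np.zeros((2, 2, 10), dtype=int)
subject[0, 0, 1:4] = 1  # epoch 0 is condition 0 and covers TRs 1 to 3
subject[1, 1, 6:9] = 1  # epoch 1 is condition 1 and covers TRs 6 to 8
info = sketch_epochs_info([subject])
```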
- brainiak.fcma.preprocessing.prepare_fcma_data(images, conditions, mask1, mask2=None, random=RandomType.NORANDOM, comm=<mpi4py.MPI.Intracomm object>)
Prepare data for correlation-based computation and analysis.
Generate epochs of interest, then broadcast to all workers.
- Parameters
images (Iterable[SpatialImage]) – Data.
conditions (List[UniqueLabelConditionSpec]) – Condition specification.
mask1 (np.ndarray) – Mask to apply to each image.
mask2 (Optional[np.ndarray]) – Mask to apply to each image. If it is not specified, the method will assign None to the returned variable raw_data2, and the self-correlation on raw_data1 will be computed.
random (Optional[RandomType]) – Randomize the image data within subject or not.
comm (MPI.Comm) – MPI communicator to use for MPI operations.
- Returns
raw_data1 (list of 2D array in shape [epoch length, nVoxels]) – the data organized in epochs, specified by the first mask. len(raw_data1) equals the number of epochs
raw_data2 (Optional, list of 2D array in shape [epoch length, nVoxels]) – the data organized in epochs, specified by the second mask if any. len(raw_data2) equals the number of epochs
labels (list of 1D array) – the condition labels of the epochs. len(labels) equals the number of epochs
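The shape of the returned data can be illustrated with plain NumPy: a 3D mask selects voxels from a 4D image, and each epoch becomes one [epoch length, nVoxels] array. This sketch uses synthetic arrays instead of SpatialImage objects, skips any normalization and the MPI broadcast, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
image = rng.standard_normal((3, 3, 2, 10))  # [x, y, z, num_TRs]
mask = np.zeros((3, 3, 2), dtype=bool)
mask[1, :, :] = True                        # keep six voxels

# Boolean indexing over the spatial axes gives [nVoxels, num_TRs].
masked = image[mask]

# (label, sid, start, end) with an exclusive end, as in epoch_info.
epoch_info = [(0, 0, 1, 4), (1, 0, 6, 9)]

# One [epoch length, nVoxels] array per epoch, plus its condition label.
raw_data = [masked[:, start:end].T for _, _, start, end in epoch_info]
labels = [label for label, _, _, _ in epoch_info]
```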
- brainiak.fcma.preprocessing.prepare_mvpa_data(images, conditions, mask)
Prepare data for activity-based model training and prediction.
Average the activity within epochs and z-score within subject.
- Parameters
images (Iterable[SpatialImage]) – Data.
conditions (List[UniqueLabelConditionSpec]) – Condition specification.
mask (np.ndarray) – Mask to apply to each image.
- Returns
processed_data (2D array in shape [num_voxels, num_epochs]) – averaged epoch by epoch processed data
labels (1D array) – contains labels of the data
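The two processing steps, averaging within epochs and z-scoring within subject, can be sketched for a single subject as follows. The helper average_and_zscore is hypothetical, and the sketch assumes that the z-score is taken per voxel across the epochs of the subject.

```python
import numpy as np


def average_and_zscore(masked, epochs):
    """Average within epochs, then z-score each voxel across epochs.

    masked: [num_voxels, num_TRs] for one subject.
    epochs: list of (start, end) TR ranges with an exclusive end.
    """
    averaged = np.stack(
        [masked[:, s:e].mean(axis=1) for s, e in epochs], axis=1)
    mean = averaged.mean(axis=1, keepdims=True)
    std = averaged.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid dividing by zero for constant voxels
    return (averaged - mean) / std  # [num_voxels, num_epochs]


rng = np.random.default_rng(4)
masked = rng.standard_normal((5, 20))
processed = average_and_zscore(masked, [(0, 4), (5, 9), (10, 14), (15, 19)])
```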
- brainiak.fcma.preprocessing.prepare_searchlight_mvpa_data(images, conditions, data_type=<class 'numpy.float32'>, random=RandomType.NORANDOM)
Obtain the data for activity-based voxel selection using Searchlight.
Average the activity within epochs and z-score within subject, while maintaining the 3D brain structure. To save memory, the data is processed subject by subject instead of being read in all at once before processing. All subjects are assumed to live in an identical cube.
- Parameters
images (Iterable[SpatialImage]) – Data.
conditions (List[UniqueLabelConditionSpec]) – Condition specification.
data_type – Type to cast image to.
random (Optional[RandomType]) – Randomize the image data within subject or not.
- Returns
processed_data (4D array in shape [brain 3D + epoch]) – averaged epoch by epoch processed data
labels (1D array) – contains labels of the data
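The subject-by-subject processing that keeps the 3D structure can be sketched like this. The helper epoch_means_4d and its arguments are hypothetical; the sketch only averages within epochs and casts to data_type, and it leaves out the z-scoring step.

```python
import numpy as np


def epoch_means_4d(subject_images, subject_epochs, data_type=np.float32):
    """Build a [x, y, z, epoch] array while visiting one subject at a time.

    subject_images: iterable of [x, y, z, num_TRs] arrays (may be lazy).
    subject_epochs: per subject, a list of (start, end) TR ranges.
    """
    volumes = []
    for image, epochs in zip(subject_images, subject_epochs):
        data = np.asarray(image, dtype=data_type)
        for start, end in epochs:
            # Averaging over the TR axis keeps the 3D brain layout.
            volumes.append(data[..., start:end].mean(axis=-1))
    return np.stack(volumes, axis=-1)


rng = np.random.default_rng(6)
subjects = [rng.standard_normal((2, 2, 2, 8)) for _ in range(2)]
epochs = [[(0, 3), (4, 7)], [(1, 4), (5, 8)]]
processed = epoch_means_4d(iter(subjects), epochs)
```

Because subject_images is consumed as an iterator, a lazy loader could be passed instead of a list, so that only one subject's time series needs to be in memory at once.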
brainiak.fcma.util module
Full Correlation Matrix Analysis (FCMA)
Correlation related high performance routines
- brainiak.fcma.util.compute_correlation(matrix1, matrix2, return_nans=False)
Compute correlation between two sets of variables.
Correlate the rows of matrix1 with the rows of matrix2. If matrix1 == matrix2, this is an auto-correlation computation, resulting in a symmetric correlation matrix. The number of columns MUST agree between matrix1 and matrix2. The correlation computed here is Pearson’s correlation coefficient, which can be expressed as
\[corr(X, Y) = \frac{cov(X, Y)}{\sigma_X\sigma_Y}\]
where cov(X, Y) is the covariance of variables X and Y, and \(\sigma_X\) is the standard deviation of variable X.
Reducing the correlation computation to matrix multiplication and using the BLAS GEMM API wrapped by SciPy can speed up the computation by one order of magnitude compared with NumPy's built-in correlation computation (numpy.corrcoef):
\[\begin{split}corr(X, Y) &= \frac{\sum\limits_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{(n-1) \sqrt{\frac{\sum\limits_{j=1}^n x_j^2-n\bar{x}^2}{n-1}} \sqrt{\frac{\sum\limits_{j=1}^{n} y_j^2-n\bar{y}^2}{n-1}}}\\ &= \sum\limits_{i=1}^n\left(\frac{(x_i-\bar{x})} {\sqrt{\sum\limits_{j=1}^n x_j^2-n\bar{x}^2}} \frac{(y_i-\bar{y})}{\sqrt{\sum\limits_{j=1}^n y_j^2-n\bar{y}^2}}\right)\end{split}\]
By default (return_nans=False), returns zeros for vectors with NaNs. If return_nans=True, convert zeros to NaNs (np.nan) in output.
- Parameters
matrix1 (2D array in shape [r1, c]) – MUST be contiguous and row-major
matrix2 (2D array in shape [r2, c]) – MUST be contiguous and row-major
return_nans (bool, default:False) – If False, return zeros for NaNs; if True, return NaNs
- Returns
corr_data – contiguous and row-major in np.float32
- Return type
2D array in shape [r1, r2]
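The reduction of Pearson correlation to a matrix multiplication can be checked with a few lines of NumPy. This is a sketch of the technique only: it uses the @ operator rather than the BLAS GEMM wrapper, works in float64, and the helper name correlation_by_matmul is hypothetical.

```python
import numpy as np


def correlation_by_matmul(matrix1, matrix2):
    """Pearson correlation between the rows of matrix1 and of matrix2."""
    def normalize(m):
        # Center each row, then scale it to unit length, so that a dot
        # product of two rows equals their correlation coefficient.
        centered = m - m.mean(axis=1, keepdims=True)
        norm = np.sqrt((centered ** 2).sum(axis=1, keepdims=True))
        norm[norm == 0] = 1.0  # constant rows give zeros instead of NaNs
        return centered / norm

    return normalize(matrix1) @ normalize(matrix2).T


rng = np.random.default_rng(5)
a = rng.standard_normal((4, 30))  # [r1, c]
b = rng.standard_normal((3, 30))  # [r2, c]
corr = correlation_by_matmul(a, b)

# Cross block of the stacked correlation matrix, used as a reference.
reference = np.corrcoef(a, b)[:4, 4:]
```

Normalizing each matrix once and then multiplying is what makes the approach attractive when the same rows take part in many correlations.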
brainiak.fcma.voxelselector module
Full Correlation Matrix Analysis (FCMA)
Correlation-based voxel selection
- class brainiak.fcma.voxelselector.VoxelSelector(labels, epochs_per_subj, num_folds, raw_data, raw_data2=None, voxel_unit=64, process_num=4, master_rank=0)
Bases:
object
Correlation-based voxel selection component of FCMA.
- Parameters
labels (list of 1D array) – the condition labels of the epochs. len(labels) equals the number of epochs
epochs_per_subj (int) – The number of epochs of each subject
num_folds (int) – The number of folds to be conducted in the cross validation
raw_data (list of 2D array in shape [epoch length, nVoxels]) – Assumptions:
1. all activity data contains the same number of voxels
2. the activity data has been z-scored, ready to compute correlation as matrix multiplication
3. all subjects have the same number of epochs
4. epochs belonging to the same subject are adjacent in the list
5. if MPI jobs are running on multiple nodes, the path used must be on a filesystem shared by all nodes
raw_data2 (Optional, list of 2D array in shape [epoch length, nVoxels]) – raw_data2 shares the data structure and the assumptions of raw_data. If raw_data2 is None, the correlation will be computed as raw_data by raw_data. If raw_data2 is specified, len(raw_data) MUST equal len(raw_data2), and the correlation will be computed as raw_data by raw_data2.
voxel_unit (int, default 64) – The number of voxels assigned to a worker each time
process_num (Optional[int]) – The maximum number of processes used in cross validation. If None, the number of processes will equal the number of available hardware threads, considering cpusets restrictions. If 0, cross validation will not use python multiprocessing.
master_rank (int, default 0) – The process which serves as the master
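The role of voxel_unit in the master-worker model can be pictured as splitting the voxels into fixed-size tasks that the master hands out one at a time. The helper voxel_tasks below is a hypothetical illustration of that partitioning and involves no MPI.

```python
def voxel_tasks(num_voxels, voxel_unit=64):
    """Split voxel indices into (start, count) tasks of at most voxel_unit."""
    return [(start, min(voxel_unit, num_voxels - start))
            for start in range(0, num_voxels, voxel_unit)]


tasks = voxel_tasks(150, voxel_unit=64)
```

A smaller voxel_unit gives finer-grained load balancing across workers at the cost of more master-worker messages; a larger one does the opposite.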
- run(clf)
Run correlation-based voxel selection in the master-worker model.
Sort the voxels based on the cross-validation accuracy of their correlation vectors.
- Parameters
clf (classification function) – the classifier to be used in cross validation
- Returns
results – the accuracy numbers of all voxels, in descending order of accuracy. The length of the list equals the number of voxels.
- Return type
list of tuple (voxel_id, accuracy)