brainiak.funcalign package

Functional alignment of volumes from different subjects.

Submodules

brainiak.funcalign.fastsrm module

Fast Shared Response Model (FastSRM)

The implementation is based on the following publications:

Richard2019

“Fast Shared Response Model for fMRI data” H. Richard, L. Martin, A. Pinho, J. Pillow, B. Thirion, 2019 https://arxiv.org/pdf/1909.12537.pdf

class brainiak.funcalign.fastsrm.FastSRM(atlas=None, n_components=20, n_iter=100, temp_dir=None, low_ram=False, seed=None, n_jobs=1, verbose='warn', aggregate='mean')

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

SRM decomposition that requires little memory and computational power because the data are first projected onto an atlas, as described in [Richard2019].

Given multi-subject data, factorize it as a shared response S among all subjects and an orthogonal transform (basis) W per subject:

\[X_i \approx W_i S, \forall i=1 \dots N\]

Parameters
  • atlas (array, shape=[n_supervoxels, n_voxels] or array, shape=[n_voxels] or str or None, default=None) – Probabilistic or deterministic atlas on which to project the data. A deterministic atlas is an array of shape [n_voxels,] whose values range from 1 to n_supervoxels; voxels labelled 0 are ignored. If atlas is a str, the corresponding array is loaded with numpy.load and the expected shape is (n_voxels,) for a deterministic atlas and (n_supervoxels, n_voxels) for a probabilistic atlas.

  • n_components (int) – Number of timecourses of the shared coordinates

  • n_iter (int) – Number of iterations to perform

  • temp_dir (str or None) – Path to the directory where temporary results are stored. If None, temporary results are stored in memory, which can result in memory errors when the number of subjects and/or sessions is large.

  • low_ram (bool) – If True and temp_dir is not None, reduced_data is saved on disk. This increases the number of I/O operations but reduces memory complexity when the number of subjects and/or sessions is large.

  • seed (int) – Seed used for random sampling.

  • n_jobs (int, optional, default=1) – The number of CPUs to use to do the computation. -1 means all CPUs, -2 all CPUs but one, and so on.

  • verbose (bool or "warn") – If True, logs are enabled. If False, logs are disabled. If "warn", only warnings are printed.

  • aggregate (str or None, default="mean") – If "mean", shared_response is the mean shared response from all subjects. If None, shared_response contains all subject-specific responses in shared space.
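To make the deterministic-atlas case concrete, here is a minimal NumPy sketch that reduces one session of shape [n_voxels, n_timeframes] to one timecourse per supervoxel. It assumes that each supervoxel's timecourse is the mean of its voxels' timecourses and that label 0 is skipped; `reduce_with_deterministic_atlas` is a hypothetical helper, not part of brainiak.

```python
import numpy as np


def reduce_with_deterministic_atlas(data, atlas):
    """Hypothetical helper: average voxel timecourses within each atlas region.

    data  : array, shape [n_voxels, n_timeframes]
    atlas : array, shape [n_voxels], integer labels; 0 marks ignored voxels
    Returns an array of shape [n_supervoxels, n_timeframes].
    """
    labels = np.unique(atlas)
    labels = labels[labels != 0]  # voxels labelled 0 are ignored
    # One row per region: the mean timecourse of the voxels carrying that label
    return np.stack([data[atlas == label].mean(axis=0) for label in labels])


rng = np.random.default_rng(0)
data = rng.standard_normal((6, 4))    # 6 voxels, 4 timeframes
atlas = np.array([1, 1, 2, 2, 0, 3])  # voxel 4 (label 0) is dropped
reduced = reduce_with_deterministic_atlas(data, atlas)
print(reduced.shape)  # (3, 4)
```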

`basis_list`
  • If basis_list is a list of arrays, element i is the basis of subject i.

  • If basis_list is a list of str, element i is the path to the basis of subject i, which is loaded with np.load and yields an array of shape [n_components, n_voxels].

Note that any call to the clean method erases this attribute.

Type

list of array, element i has shape=[n_components, n_voxels] or list of str

Note

References: H. Richard, L. Martin, A. Pinho, J. Pillow, B. Thirion, 2019: Fast shared response model for fMRI data (https://arxiv.org/pdf/1909.12537.pdf)

add_subjects(imgs, shared_response)

Add subjects to the current fit. Each new basis is appended at the end of the list of bases (accessible through self.basis_list).

Parameters
  • imgs (array of str, shape=[n_subjects, n_sessions] or list of list of arrays or list of arrays) –

    Element i, j of the array is a path to the data of subject i collected during session j. Data are loaded with numpy.load and the expected shape is [n_voxels, n_timeframes]. n_timeframes and n_voxels are assumed to be the same across subjects; n_timeframes can vary across sessions. Each voxel’s timecourse is assumed to have mean 0 and variance 1.

    imgs can also be a list of list of arrays, where element i, j is a numpy array of shape [n_voxels, n_timeframes] that contains the data of subject i collected during session j.

    imgs can also be a list of arrays, where element i is a numpy array of shape [n_voxels, n_timeframes] that contains the data of subject i (the number of sessions is implicitly 1).

  • shared_response (list of arrays, list of list of arrays or array) –

    • if imgs is a list of arrays and self.aggregate="mean": shared response is an array of shape (n_components, n_timeframes).

    • if imgs is a list of arrays and self.aggregate=None: shared response is a list of arrays; element i is the projection of the data of subject i in shared space.

    • if imgs is an array or a list of list of arrays and self.aggregate="mean": shared response is a list of arrays; element j is the shared response during session j.

    • if imgs is an array or a list of list of arrays and self.aggregate=None: shared response is a list of list of arrays; element i, j is the projection of the data of subject i collected during session j in shared space.
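add_subjects keeps the shared response fixed, so each new basis only has to fit the new subject's data. One way to do that, sketched below under the assumption that the new data are approximated by W.T @ shared_response with orthonormal rows in W, is an orthogonal Procrustes step computed from an SVD. `procrustes_basis` is a hypothetical helper and the data are synthetic; this is not claimed to be FastSRM's internal code.

```python
import numpy as np


def procrustes_basis(X, shared_response):
    """Hypothetical helper: basis W, shape [n_components, n_voxels], minimizing
    ||X - W.T @ shared_response||_F subject to W @ W.T = I.

    X               : array, shape [n_voxels, n_timeframes] (new subject)
    shared_response : array, shape [n_components, n_timeframes] (kept fixed)
    """
    # SVD of the cross-product shared_response @ X.T, shape [n_components, n_voxels]
    U, _, Vt = np.linalg.svd(shared_response @ X.T, full_matrices=False)
    return U @ Vt


rng = np.random.default_rng(0)
n_voxels, n_timeframes, n_components = 50, 30, 5
shared_response = rng.standard_normal((n_components, n_timeframes))
# Ground-truth basis with orthonormal rows, shape [n_components, n_voxels]
W_true = np.linalg.qr(rng.standard_normal((n_voxels, n_components)))[0].T
X_new = W_true.T @ shared_response + 0.01 * rng.standard_normal((n_voxels, n_timeframes))
W_new = procrustes_basis(X_new, shared_response)
```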

clean()

Erases temporary files and the basis_list attribute to free memory. Call this method when the fitted model is no longer needed.

fit(imgs)

Computes the bases of all subjects from input imgs.

Parameters

imgs (array of str, shape=[n_subjects, n_sessions] or list of list of arrays or list of arrays) –

Element i, j of the array is a path to the data of subject i collected during session j. Data are loaded with numpy.load and the expected shape is [n_voxels, n_timeframes]. n_timeframes and n_voxels are assumed to be the same across subjects; n_timeframes can vary across sessions. Each voxel’s timecourse is assumed to have mean 0 and variance 1.

imgs can also be a list of list of arrays, where element i, j is a numpy array of shape [n_voxels, n_timeframes] that contains the data of subject i collected during session j.

imgs can also be a list of arrays, where element i is a numpy array of shape [n_voxels, n_timeframes] that contains the data of subject i (the number of sessions is implicitly 1).
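The following sketch builds the array-of-paths form of imgs from synthetic data: each session is standardized per voxel (mean 0, variance 1) and saved with numpy.save, so that numpy.load recovers an array of shape [n_voxels, n_timeframes]. The file names, sizes and temporary directory are hypothetical.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_sessions, n_voxels = 3, 2, 100
n_timeframes = [40, 55]  # may vary across sessions, identical across subjects

tmp_dir = tempfile.mkdtemp()
imgs = np.empty((n_subjects, n_sessions), dtype=object)
for i in range(n_subjects):
    for j in range(n_sessions):
        data = rng.standard_normal((n_voxels, n_timeframes[j]))
        # Standardize each voxel's timecourse to mean 0 and variance 1
        data = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)
        path = os.path.join(tmp_dir, f"sub{i}_sess{j}.npy")  # hypothetical file names
        np.save(path, data)
        imgs[i, j] = path
imgs = imgs.astype(str)  # array of str, shape [n_subjects, n_sessions]
```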

Returns

self – Returns the instance itself. Contains attributes listed at the object level.

Return type

object

fit_transform(imgs, subjects_indexes=None)

Computes the bases of all subjects and the shared response from input imgs, and returns the shared response.

Parameters
  • imgs (array of str, shape=[n_subjects, n_sessions] or list of list of arrays or list of arrays) –

    Element i, j of the array is a path to the data of subject i collected during session j. Data are loaded with numpy.load and the expected shape is [n_voxels, n_timeframes]. n_timeframes and n_voxels are assumed to be the same across subjects; n_timeframes can vary across sessions. Each voxel’s timecourse is assumed to have mean 0 and variance 1.

    imgs can also be a list of list of arrays, where element i, j is a numpy array of shape [n_voxels, n_timeframes] that contains the data of subject i collected during session j.

    imgs can also be a list of arrays, where element i is a numpy array of shape [n_voxels, n_timeframes] that contains the data of subject i (the number of sessions is implicitly 1).

  • subjects_indexes (list or None) – If None, imgs[i] is transformed using basis_list[i]. Otherwise, imgs[i] is transformed using basis_list[subjects_indexes[i]].

Returns

shared_response

  • if imgs is a list of arrays and self.aggregate="mean": shared response is an array of shape (n_components, n_timeframes).

  • if imgs is a list of arrays and self.aggregate=None: shared response is a list of arrays; element i is the projection of the data of subject i in shared space.

  • if imgs is an array or a list of list of arrays and self.aggregate="mean": shared response is a list of arrays; element j is the shared response during session j.

  • if imgs is an array or a list of list of arrays and self.aggregate=None: shared response is a list of list of arrays; element i, j is the projection of the data of subject i collected during session j in shared space.

Return type

list of arrays, list of list of arrays or array

inverse_transform(shared_response, subjects_indexes=None, sessions_indexes=None)

Reconstructs subjects’ data from the shared response and the bases learned on training data.

Parameters
  • shared_response (list of arrays, list of list of arrays or array) –

    • if imgs is a list of arrays and self.aggregate="mean": shared response is an array of shape (n_components, n_timeframes).

    • if imgs is a list of arrays and self.aggregate=None: shared response is a list of arrays; element i is the projection of the data of subject i in shared space.

    • if imgs is an array or a list of list of arrays and self.aggregate="mean": shared response is a list of arrays; element j is the shared response during session j.

    • if imgs is an array or a list of list of arrays and self.aggregate=None: shared response is a list of list of arrays; element i, j is the projection of the data of subject i collected during session j in shared space.

  • subjects_indexes (list or None) – If None, reconstructs the data of all subjects used during training. Otherwise, reconstructs the data of the subjects specified by subjects_indexes.

  • sessions_indexes (list or None) – If None, reconstructs the data of all sessions. Otherwise, reconstructs the data of the sessions specified by sessions_indexes.

Returns

reconstructed_data

  • if reconstructed_data is a list of list: element i, j is the reconstructed data for subject subjects_indexes[i] and session sessions_indexes[j], as an np array of shape [n_voxels, n_timeframes]

  • if reconstructed_data is a list: element i is the reconstructed data for subject subjects_indexes[i], as an np array of shape [n_voxels, n_timeframes]

Return type

list of list of arrays or list of arrays
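Because each basis has orthonormal rows, a natural reconstruction, assumed here, multiplies the transposed basis by the shared response. The sketch covers the list-of-list case with explicit subjects_indexes and sessions_indexes; `reconstruct` is a hypothetical helper, not the library method.

```python
import numpy as np


def reconstruct(basis_list, shared_response, subjects_indexes, sessions_indexes):
    """Hypothetical helper mirroring the list-of-list case described above.

    basis_list      : list of arrays, element i has shape [n_components, n_voxels]
    shared_response : list of arrays, element j has shape [n_components, n_timeframes_j]
    Returns a list of list; element i, j has shape [n_voxels, n_timeframes_j].
    """
    # The transposed basis maps shared-space timecourses back to voxel space
    return [[basis_list[i].T @ shared_response[j] for j in sessions_indexes]
            for i in subjects_indexes]


rng = np.random.default_rng(0)
n_components, n_voxels = 4, 30
# Orthonormal bases stored row-wise, shape [n_components, n_voxels]
basis_list = [np.linalg.qr(rng.standard_normal((n_voxels, n_components)))[0].T
              for _ in range(3)]
shared_response = [rng.standard_normal((n_components, t)) for t in (20, 25)]
rec = reconstruct(basis_list, shared_response, subjects_indexes=[2, 0], sessions_indexes=[1])
```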

transform(imgs, subjects_indexes=None)

From data in imgs and basis from training data, computes shared response.

Parameters
  • imgs (array of str, shape=[n_subjects, n_sessions] or list of list of arrays or list of arrays) –

    Element i, j of the array is a path to the data of subject i collected during session j. Data are loaded with numpy.load and the expected shape is [n_voxels, n_timeframes]. n_timeframes and n_voxels are assumed to be the same across subjects; n_timeframes can vary across sessions. Each voxel’s timecourse is assumed to have mean 0 and variance 1.

    imgs can also be a list of list of arrays, where element i, j is a numpy array of shape [n_voxels, n_timeframes] that contains the data of subject i collected during session j.

    imgs can also be a list of arrays, where element i is a numpy array of shape [n_voxels, n_timeframes] that contains the data of subject i (the number of sessions is implicitly 1).

  • subjects_indexes (list or None) – If None, imgs[i] is transformed using basis_list[i]. Otherwise, imgs[i] is transformed using basis_list[subjects_indexes[i]].

Returns

shared_response

  • if imgs is a list of arrays and self.aggregate="mean": shared response is an array of shape (n_components, n_timeframes).

  • if imgs is a list of arrays and self.aggregate=None: shared response is a list of arrays; element i is the projection of the data of subject i in shared space.

  • if imgs is an array or a list of list of arrays and self.aggregate="mean": shared response is a list of arrays; element j is the shared response during session j.

  • if imgs is an array or a list of list of arrays and self.aggregate=None: shared response is a list of list of arrays; element i, j is the projection of the data of subject i collected during session j in shared space.

Return type

list of arrays, list of list of arrays or array
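A sketch of the projection behind transform, assuming each session is mapped to shared space by basis_list[k] @ data and that aggregate="mean" averages these projections over subjects for each session. `project` is a hypothetical helper that works on in-memory arrays only.

```python
import numpy as np


def project(basis_list, imgs, subjects_indexes=None, aggregate="mean"):
    """Hypothetical helper: map list-of-list data into shared space.

    basis_list : list of arrays, element k has shape [n_components, n_voxels]
    imgs       : list of list of arrays, element i, j has shape [n_voxels, n_timeframes_j]
    """
    if subjects_indexes is None:
        subjects_indexes = range(len(imgs))
    # imgs[i] is projected with basis_list[subjects_indexes[i]]
    projections = [[basis_list[k] @ session for session in sessions]
                   for k, sessions in zip(subjects_indexes, imgs)]
    if aggregate is None:
        return projections  # element i, j: subject i, session j
    # aggregate="mean": element j is the average over subjects for session j
    n_sessions = len(imgs[0])
    return [np.mean([proj[j] for proj in projections], axis=0)
            for j in range(n_sessions)]


rng = np.random.default_rng(0)
basis_list = [rng.standard_normal((3, 10)) for _ in range(2)]
imgs = [[rng.standard_normal((10, t)) for t in (8, 12)] for _ in range(2)]
mean_sr = project(basis_list, imgs)
per_subject = project(basis_list, imgs, aggregate=None)
```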

brainiak.funcalign.rsrm module

Robust Shared Response Model (RSRM)

The implementation is based on the following publications:

Turek2017

“Capturing Shared and Individual Information in fMRI Data”, J. Turek, C. Ellis, L. Skalaban, N. Turk-Browne, T. Willke, under review, 2017.

class brainiak.funcalign.rsrm.RSRM(n_iter=10, features=50, gamma=1.0, rand_seed=0)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Robust Shared Response Model (RSRM)

Given multi-subject data, factorize it as a shared response R among all subjects, an orthogonal transform W per subject, and an individual (outlying) sparse component S per subject:

\[X_i \approx W_i R + S_i, \forall i=1 \dots N\]

This unsupervised model learns idiosyncratic information for each subject while simultaneously improving the shared response estimation. The model has similar properties to the Shared Response Model (SRM), with the addition of the individual components.

The model is estimated solving the following optimization problem:

\[\min_{W_i, S_i, R} \sum_i \left( \frac{1}{2}\|X_i - W_i R - S_i\|_F^2 + \gamma\|S_i\|_1 \right) \quad \text{s.t.} \quad W_i^T W_i = I \quad \forall i=1 \dots N\]

The solution to this problem is obtained by applying a Block-Coordinate Descent procedure. More details can be found in [Turek2017].
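Each block of the objective above has a closed-form minimizer when the others are fixed: an orthogonal Procrustes problem for W_i, an average over subjects for R, and elementwise soft-thresholding at gamma for S_i (which is why larger gamma values zero out more entries). The NumPy sketch below alternates these updates; the random orthonormal initialization and the update order are assumptions, and this is not brainiak's implementation.

```python
import numpy as np


def soft_threshold(A, threshold):
    # Proximal operator of threshold * ||A||_1 (elementwise shrinkage)
    return np.sign(A) * np.maximum(np.abs(A) - threshold, 0.0)


def procrustes(M):
    # Matrix with orthonormal columns maximizing trace(W.T @ M)
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt


def rsrm_objective(X, W, R, S, gamma):
    return sum(0.5 * np.linalg.norm(x - w @ R - s) ** 2 + gamma * np.abs(s).sum()
               for x, w, s in zip(X, W, S))


def rsrm_bcd(X, features, gamma, n_iter=10, seed=0):
    """Sketch of block-coordinate descent; X[i] has shape [voxels_i, timepoints]."""
    rng = np.random.default_rng(seed)
    W = [np.linalg.qr(rng.standard_normal((x.shape[0], features)))[0] for x in X]
    S = [np.zeros_like(x) for x in X]
    R = np.mean([w.T @ (x - s) for x, w, s in zip(X, W, S)], axis=0)
    history = [rsrm_objective(X, W, R, S, gamma)]
    for _ in range(n_iter):
        # Each update minimizes the objective exactly over its block
        W = [procrustes((x - s) @ R.T) for x, s in zip(X, S)]
        R = np.mean([w.T @ (x - s) for x, w, s in zip(X, W, S)], axis=0)
        S = [soft_threshold(x - w @ R, gamma) for x, w in zip(X, W)]
        history.append(rsrm_objective(X, W, R, S, gamma))
    return W, R, S, history


rng = np.random.default_rng(1)
features, timepoints = 3, 40
R_true = rng.standard_normal((features, timepoints))
X = []
for voxels in (20, 25, 30):
    W_true = np.linalg.qr(rng.standard_normal((voxels, features)))[0]
    outliers = np.zeros((voxels, timepoints))
    outliers[rng.integers(voxels), rng.integers(timepoints)] = 8.0  # one sparse spike
    X.append(W_true @ R_true + 0.05 * rng.standard_normal((voxels, timepoints)) + outliers)

W, R, S, history = rsrm_bcd(X, features=features, gamma=0.5)
```

Because every step is an exact block minimization, the recorded objective should never increase from one iteration to the next.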

Parameters
  • n_iter (int, default: 10) – Number of iterations to run the algorithm.

  • features (int, default: 50) – Number of features to compute.

  • gamma (float, default: 1.0) – Regularization parameter for the sparseness of the individual components. Higher values yield sparser individual components.

  • rand_seed (int, default: 0) – Seed for initializing the random number generator.

w_

The orthogonal transforms (mappings) for each subject.

Type

list of array, element i has shape=[voxels_i, features]

r_

The shared response.

Type

array, shape=[features, timepoints]

s_

The individual components for each subject.

Type

list of array, element i has shape=[voxels_i, timepoints]

random_state_

Random number generator initialized using rand_seed

Type

RandomState

Note

The number of voxels may be different between subjects. However, the number of timepoints for the alignment data must be the same across subjects.

The Robust Shared Response Model is approximated using the Block-Coordinate Descent (BCD) algorithm proposed in [Turek2017].

This is a single node version.

fit(X)

Compute the Robust Shared Response Model

Parameters

X (list of 2D arrays, element i has shape=[voxels_i, timepoints]) – Each element in the list contains the fMRI data of one subject.

transform(X)

Use the model to transform new data to Shared Response space

Parameters

X (list of 2D arrays, element i has shape=[voxels_i, timepoints_i]) – Each element in the list contains the fMRI data of one subject.

Returns

  • r (list of 2D arrays, element i has shape=[features, timepoints_i]) – Shared responses from input data (X)

  • s (list of 2D arrays, element i has shape=[voxels_i, timepoints_i]) – Individual data obtained from fitting model to input data (X)

transform_subject(X)

Transform a new subject using the existing model

Parameters

X (2D array, shape=[voxels, timepoints]) – The fMRI data of the new subject.

Returns

  • w (2D array, shape=[voxels, features]) – Orthogonal mapping \(W_{new}\) for the new subject

  • s (2D array, shape=[voxels, timepoints]) – Individual term \(S_{new}\) for the new subject
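For a new subject, the shared response stays fixed and only W_new and S_new are estimated. Assuming the same alternating Procrustes and soft-thresholding updates as in the block-coordinate descent, a sketch could look like this; `rsrm_transform_subject` and the synthetic data are hypothetical, and R stands in for a fitted r_.

```python
import numpy as np


def rsrm_transform_subject(X, R, gamma, n_iter=10):
    """Hypothetical sketch: estimate W_new and S_new with the shared response R fixed.

    X : array, shape [voxels, timepoints]; R : array, shape [features, timepoints]
    Returns W_new, shape [voxels, features], and S_new, shape [voxels, timepoints].
    """
    S = np.zeros_like(X)
    for _ in range(n_iter):
        # Orthogonal Procrustes step for W_new given the current individual term
        U, _, Vt = np.linalg.svd((X - S) @ R.T, full_matrices=False)
        W = U @ Vt
        # Soft-thresholding step for S_new given W_new
        residual = X - W @ R
        S = np.sign(residual) * np.maximum(np.abs(residual) - gamma, 0.0)
    return W, S


rng = np.random.default_rng(2)
R = rng.standard_normal((4, 60))  # stands in for a fitted r_
W_true = np.linalg.qr(rng.standard_normal((40, 4)))[0]
X_new = W_true @ R + 0.01 * rng.standard_normal((40, 60))
X_new[0, 0] += 3.0  # two sparse, subject-specific spikes
X_new[5, 7] -= 3.0
W_new, S_new = rsrm_transform_subject(X_new, R, gamma=1.0)
```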

brainiak.funcalign.srm module

Shared Response Model (SRM)

The implementations are based on the following publications:

Chen2015

“A Reduced-Dimension fMRI Shared Response Model”, P.-H. Chen, J. Chen, Y. Yeshurun-Dishon, U. Hasson, J. Haxby, P. Ramadge, Advances in Neural Information Processing Systems (NIPS), 2015. http://papers.nips.cc/paper/5855-a-reduced-dimension-fmri-shared-response-model

Anderson2016

“Enabling Factor Analysis on Thousand-Subject Neuroimaging Datasets”, Michael J. Anderson, Mihai Capotă, Javier S. Turek, Xia Zhu, Theodore L. Willke, Yida Wang, Po-Hsuan Chen, Jeremy R. Manning, Peter J. Ramadge, Kenneth A. Norman, IEEE International Conference on Big Data, 2016. https://doi.org/10.1109/BigData.2016.7840719

class brainiak.funcalign.srm.DetSRM(n_iter=10, features=50, rand_seed=0)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Deterministic Shared Response Model (DetSRM)

Given multi-subject data, factorize it as a shared response S among all subjects and an orthogonal transform W per subject:

\[X_i \approx W_i S, \forall i=1 \dots N\]

Parameters
  • n_iter (int, default: 10) – Number of iterations to run the algorithm.

  • features (int, default: 50) – Number of features to compute.

  • rand_seed (int, default: 0) – Seed for initializing the random number generator.

w_

The orthogonal transforms (mappings) for each subject.

Type

list of array, element i has shape=[voxels_i, features]

s_

The shared response.

Type

array, shape=[features, samples]

random_state_

Random number generator initialized using rand_seed

Type

RandomState

Note

The number of voxels may be different between subjects. However, the number of samples must be the same across subjects.

The Deterministic Shared Response Model is approximated using the Block Coordinate Descent (BCD) algorithm proposed in [Chen2015].

This is a single node version.

The run-time complexity is \(O(I (V T K + V K^2))\) and the memory complexity is \(O(V T)\), where I is the number of iterations, V is the sum of voxels over all subjects, T is the number of samples, K is the number of features (typically, \(V \gg T \gg K\)), and N is the number of subjects.
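The block-coordinate descent mentioned above alternates two closed-form steps: an orthogonal Procrustes solution for each W_i given S, and S as the average of the projected data W_i^T X_i given the W_i. The sketch below records the objective after each iteration; its random initialization is an assumption and it is not the brainiak source.

```python
import numpy as np


def det_srm(X, features, n_iter=10, seed=0):
    """Sketch of deterministic SRM fitted by block-coordinate descent.

    X : list of arrays, element i has shape [voxels_i, samples]
    Returns w (element i has shape [voxels_i, features]), s (shape
    [features, samples]) and the objective sum_i ||X_i - W_i S||_F^2 per iteration.
    """
    rng = np.random.default_rng(seed)
    # Random orthonormal initialization of each W_i
    w = [np.linalg.qr(rng.standard_normal((x.shape[0], features)))[0] for x in X]
    s = np.mean([wi.T @ x for wi, x in zip(w, X)], axis=0)
    history = []
    for _ in range(n_iter):
        # W_i step: orthogonal Procrustes between X_i and the current S
        w = []
        for x in X:
            U, _, Vt = np.linalg.svd(x @ s.T, full_matrices=False)
            w.append(U @ Vt)
        # S step: average of the data projected into the shared space
        s = np.mean([wi.T @ x for wi, x in zip(w, X)], axis=0)
        history.append(sum(np.linalg.norm(x - wi @ s) ** 2 for wi, x in zip(w, X)))
    return w, s, history


rng = np.random.default_rng(3)
S_true = rng.standard_normal((3, 50))
X = [np.linalg.qr(rng.standard_normal((v, 3)))[0] @ S_true
     + 0.1 * rng.standard_normal((v, 50)) for v in (30, 40, 35)]
w, s, history = det_srm(X, features=3)
```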

fit(X, y=None)

Compute the Deterministic Shared Response Model

Parameters
  • X (list of 2D arrays, element i has shape=[voxels_i, samples]) – Each element in the list contains the fMRI data of one subject.

  • y (not used) –

transform(X, y=None)

Use the model to transform data to the Shared Response subspace

Parameters
  • X (list of 2D arrays, element i has shape=[voxels_i, samples_i]) – Each element in the list contains the fMRI data of one subject.

  • y (not used) –

Returns

s – Shared responses from input data (X)

Return type

list of 2D arrays, element i has shape=[features, samples_i]

transform_subject(X)

Transform a new subject using the existing model. The subject is assumed to have received equivalent stimulation.

Parameters

X (2D array, shape=[voxels, timepoints]) – The fMRI data of the new subject.

Returns

w – Orthogonal mapping \(W_{new}\) for the new subject

Return type

2D array, shape=[voxels, features]
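With orthonormal columns in each W_i, projecting data into the shared subspace can be written as W_i^T X_i, and a mapping for a new subject that received the same stimulation can be estimated by an orthogonal Procrustes step against the fitted shared response. The sketch below assumes both; the function names mirror the methods above but are standalone hypothetical helpers.

```python
import numpy as np


def transform(w, X):
    # Project each subject's data into the shared subspace: W_i^T X_i
    return [wi.T @ x for wi, x in zip(w, X)]


def transform_subject(X_new, s):
    # Orthogonal Procrustes: W_new with orthonormal columns maximizing trace(W_new^T X_new s^T)
    U, _, Vt = np.linalg.svd(X_new @ s.T, full_matrices=False)
    return U @ Vt


rng = np.random.default_rng(4)
s = rng.standard_normal((5, 60))  # stands in for a fitted s_
W_true = np.linalg.qr(rng.standard_normal((80, 5)))[0]
X_new = W_true @ s  # noise-free new subject with the same samples
W_new = transform_subject(X_new, s)
s_new = transform([W_new], [X_new])[0]
```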

class brainiak.funcalign.srm.SRM(n_iter=10, features=50, rand_seed=0, comm=<mpi4py.MPI.Intracomm object>)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Probabilistic Shared Response Model (SRM)

Given multi-subject data, factorize it as a shared response S among all subjects and an orthogonal transform W per subject:

\[X_i \approx W_i S, \forall i=1 \dots N\]

Parameters
  • n_iter (int, default: 10) – Number of iterations to run the algorithm.

  • features (int, default: 50) – Number of features to compute.

  • rand_seed (int, default: 0) – Seed for initializing the random number generator.

  • comm (mpi4py.MPI.Intracomm) – The MPI communicator containing the data

w_

The orthogonal transforms (mappings) for each subject.

Type

list of array, element i has shape=[voxels_i, features]

s_

The shared response.

Type

array, shape=[features, samples]

sigma_s_

The covariance of the shared response Normal distribution.

Type

array, shape=[features, features]

mu_

The voxel means over the samples for each subject.

Type

list of array, element i has shape=[voxels_i]

rho2_

The estimated noise variance \(\rho_i^2\) for each subject

Type

array, shape=[subjects]

comm

The MPI communicator containing the data

Type

mpi4py.MPI.Intracomm

random_state_

Random number generator initialized using rand_seed

Type

RandomState

Note

The number of voxels may be different between subjects. However, the number of samples must be the same across subjects.

The probabilistic Shared Response Model is approximated using the Expectation Maximization (EM) algorithm proposed in [Chen2015]. The implementation follows the optimizations published in [Anderson2016].

This is a single node version.

The run-time complexity is \(O(I (V T K + V K^2 + K^3))\) and the memory complexity is \(O(V T)\), where I is the number of iterations, V is the sum of voxels over all subjects, T is the number of samples, and K is the number of features (typically, \(V \gg T \gg K\)).
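The following is a compact sketch of one EM derivation for this model, written from the model definition rather than from brainiak's source: each sample is x_i = W_i s + mu_i + noise, with s ~ N(0, sigma_s) and noise ~ N(0, rho2_i I). Because W_i^T W_i = I, the posterior precision of s reduces to inv(sigma_s) plus the sum of 1/rho2_i times the identity. It runs in a single process and omits both the MPI communicator and the optimizations of [Anderson2016]; the names and the initialization are assumptions.

```python
import numpy as np


def prob_srm_em(X, features, n_iter=10, seed=0):
    """Sketch of EM for the probabilistic SRM.

    X : list of arrays, element i has shape [voxels_i, samples]
    Returns w, the posterior mean of the shared response, sigma_s, mu and rho2.
    """
    rng = np.random.default_rng(seed)
    samples = X[0].shape[1]
    mu = [x.mean(axis=1) for x in X]              # voxel means over samples
    Xc = [x - m[:, None] for x, m in zip(X, mu)]  # demeaned data
    w = [np.linalg.qr(rng.standard_normal((x.shape[0], features)))[0] for x in X]
    rho2 = np.ones(len(X))
    sigma_s = np.eye(features)
    for _ in range(n_iter):
        # E-step: Gaussian posterior of each shared-response sample
        precision = np.linalg.inv(sigma_s) + np.sum(1.0 / rho2) * np.eye(features)
        post_cov = np.linalg.inv(precision)
        es = post_cov @ sum(wi.T @ xc / r for wi, xc, r in zip(w, Xc, rho2))
        # M-step: shared-response covariance from the posterior second moment
        sigma_s = post_cov + es @ es.T / samples
        for i, xc in enumerate(Xc):
            a = xc @ es.T
            # Orthogonal Procrustes step for W_i
            U, _, Vt = np.linalg.svd(a, full_matrices=False)
            w[i] = U @ Vt
            # Expected squared residual per voxel and sample gives the noise variance
            rho2[i] = (np.sum(xc ** 2) - 2 * np.sum(w[i] * a)
                       + samples * np.trace(sigma_s)) / (samples * xc.shape[0])
    return w, es, sigma_s, mu, rho2


rng = np.random.default_rng(5)
S_true = rng.standard_normal((3, 100))
X = [np.linalg.qr(rng.standard_normal((v, 3)))[0] @ S_true
     + 0.2 * rng.standard_normal((v, 100)) + 1.0 for v in (20, 30)]
w, es, sigma_s, mu, rho2 = prob_srm_em(X, features=3)
```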

fit(X, y=None)

Compute the probabilistic Shared Response Model

Parameters
  • X (list of 2D arrays, element i has shape=[voxels_i, samples]) – Each element in the list contains the fMRI data of one subject.

  • y (not used) –

save(file)

Save fitted SRM to .npz file.

Parameters

file (str, file-like object, or pathlib.Path) – Filename (string), open file (file-like object) or pathlib.Path where the fitted SRM will be saved. If file is a string or a Path, the .npz extension will be appended to the filename if it is not already there.

Return type

None
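As an illustration of the .npz behavior described above, the sketch below stores stand-ins for fitted attributes with numpy.savez, which appends the .npz extension to a string path that lacks it. The key names and the object-array layout for the per-subject w_ list are assumptions and do not describe the exact file written by save.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for fitted attributes (synthetic values, not a real fit)
w = [np.linalg.qr(rng.standard_normal((v, 5)))[0] for v in (30, 40)]
s = rng.standard_normal((5, 20))

# Subjects can have different voxel counts, so w_ is stored as an object array
w_arr = np.empty(len(w), dtype=object)
for i, wi in enumerate(w):
    w_arr[i] = wi

path = os.path.join(tempfile.mkdtemp(), "srm_model")  # no extension given
np.savez(path, w_=w_arr, s_=s)                        # writes srm_model.npz

# Object arrays are pickled, so loading them requires allow_pickle=True
with np.load(path + ".npz", allow_pickle=True) as data:
    w_loaded = list(data["w_"])
    s_loaded = data["s_"]
```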

transform(X, y=None)

Use the model to transform data to the Shared Response space

Parameters
  • X (list of 2D arrays, element i has shape=[voxels_i, samples_i]) – Each element in the list contains the fMRI data of one subject. Note that the number of voxels and samples can vary across subjects.

  • y (not used (as it is unsupervised learning)) –

Returns

s – Shared responses from input data (X)

Return type

list of 2D arrays, element i has shape=[features, samples_i]

transform_subject(X)

Transform a new subject using the existing model. The subject is assumed to have received equivalent stimulation.

Parameters

X (2D array, shape=[voxels, timepoints]) – The fMRI data of the new subject.

Returns

w – Orthogonal mapping \(W_{new}\) for the new subject

Return type

2D array, shape=[voxels, features]

brainiak.funcalign.sssrm module

Semi-Supervised Shared Response Model (SS-SRM)

The implementations are based on the following publications:

Turek2016

“A Semi-Supervised Method for Multi-Subject fMRI Functional Alignment”, J. S. Turek, T. L. Willke, P.-H. Chen, P. J. Ramadge, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 1098-1102. https://doi.org/10.1109/ICASSP.2017.7952326

class brainiak.funcalign.sssrm.SSSRM(n_iter=10, features=50, gamma=1.0, alpha=0.5, rand_seed=0)

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin, sklearn.base.TransformerMixin

Semi-Supervised Shared Response Model (SS-SRM)

Given multi-subject data, factorize it as a shared response S among all subjects and an orthogonal transform W per subject, using also labeled data to train a Multinomial Logistic Regression (MLR) classifier (with l2 regularization) in a semi-supervised manner:

\[(1-\alpha)\, Loss_{SRM}(W_i, S; X_i) + \alpha/\gamma\, Loss_{MLR}(\theta, bias; \{(W_i^T \times Z_i, y_i)\}) + R(\theta) \tag{1}\]

(see Equations (1) and (4) in [Turek2016]).

Parameters
  • n_iter (int, default: 10) – Number of iterations to run the algorithm.

  • features (int, default: 50) – Number of features to compute.

  • gamma (float, default: 1.0) – Regularization parameter for the classifier.

  • alpha (float, default: 0.5) – Balance parameter between the SRM term and the MLR term.

  • rand_seed (int, default: 0) – Seed for initializing the random number generator.

w_

The orthogonal transforms (mappings) for each subject.

Type

list of array, element i has shape=[voxels_i, features]

s_

The shared response.

Type

array, shape=[features, samples]

theta_

The MLR class plane parameters.

Type

array, shape=[classes, features]

bias_

The MLR class biases.

Type

array, shape=[classes]

classes_

Mapping table from each class index to the original class label.

Type

array of int, shape=[classes]

random_state_

Random number generator initialized using rand_seed

Type

RandomState

Note

The number of voxels may be different between subjects. However, the number of samples for the alignment data must be the same across subjects. The number of labeled samples per subject can be different.

The Semi-Supervised Shared Response Model is approximated using the Block-Coordinate Descent (BCD) algorithm proposed in [Turek2016].

This is a single node version.

fit(X, y, Z)

Compute the Semi-Supervised Shared Response Model

Parameters
  • X (list of 2D arrays, element i has shape=[voxels_i, n_align]) – Each element in the list contains the fMRI data for alignment of one subject. There are n_align samples for each subject.

  • y (list of arrays of int, element i has shape=[samples_i]) – Each element in the list contains the labels for the data samples in Z.

  • Z (list of 2D arrays, element i has shape=[voxels_i, samples_i]) – Each element in the list contains the fMRI data of one subject for training the MLR classifier.

predict(X)

Predict class labels for the given data

Parameters

X (list of 2D arrays, element i has shape=[voxels_i, samples_i]) – Each element in the list contains the fMRI data of one subject. The number of voxels for each subject must match the number used for that subject when training the model.

Returns

p – Predictions for each data sample.

Return type

list of arrays, element i has shape=[samples_i]
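A sketch of the prediction step, assuming each subject's data are projected with W_i^T and classified by the MLR scores theta_ @ features + bias_, with the winning index mapped through classes_. Taking the argmax of the scores gives the same class as the argmax of the softmax probabilities. `mlr_predict` is a hypothetical helper.

```python
import numpy as np


def mlr_predict(w, X, theta, bias, classes):
    """Hypothetical helper: classify data in the shared space.

    w       : list of arrays, element i has shape [voxels_i, features]
    X       : list of arrays, element i has shape [voxels_i, samples_i]
    theta   : array, shape [classes, features]
    bias    : array, shape [classes]
    classes : array, shape [classes], maps a class index to the original label
    """
    predictions = []
    for wi, x in zip(w, X):
        # MLR scores for every class and sample, shape [classes, samples_i]
        scores = theta @ (wi.T @ x) + bias[:, None]
        # Softmax is monotone, so the best score is also the most probable class
        predictions.append(classes[np.argmax(scores, axis=0)])
    return predictions


w = [np.eye(3)[:, :2]]                                # 3 voxels, 2 features
X = [np.array([[5.0, 0.0], [0.0, 5.0], [1.0, 1.0]])]  # 2 samples
theta = np.eye(2)
bias = np.zeros(2)
classes = np.array([3, 7])
print(mlr_predict(w, X, theta, bias, classes))  # [array([3, 7])]
```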

transform(X, y=None)

Use the model to transform data to the Shared Response space

Parameters
  • X (list of 2D arrays, element i has shape=[voxels_i, samples_i]) – Each element in the list contains the fMRI data of one subject. Note that the number of voxels and samples can vary across subjects.

  • y (not used as it only applies the mappings) –

Returns

s – Shared responses from input data (X)

Return type

list of 2D arrays, element i has shape=[features, samples_i]