Full Correlation Matrix Analysis


The connectivity analyses we performed previously examined the coupling of different regions of the brain. This coupling can capture information that activity-based univariate or multivariate techniques do not detect. For instance, imagine that voxel A and B are coupled during one task condition and voxel A and C are coupled during another task condition. In such a case, voxel A may always be active (and thus task differences are likely to be invisible to activity-based approaches including MVPA), but its connectivity with other voxels will reveal task information.

However, this creates a problem: to find these diagnostic voxels in an unbiased, data-driven fashion it is necessary to correlate every voxel in the brain with every other voxel and somehow analyze these millions or billions of pairwise correlations. This computation can take a very long time. Simplifying assumptions can be made to reduce the complexity of this analysis. For instance, it is possible to downsample the voxels (e.g., using parcellation as in the previous session) in order to reduce the total number of correlations that are made, but in doing so we introduce the assumption that those parcellations divide functional units of brain activity.

Fortunately, advances in machine learning, parallel computing and efficient coding have made it possible to calculate and analyze billions of correlations rapidly. These tools are part of the BrainIAK toolbox and are used for Full Correlation Matrix Analysis (FCMA). This method is outlined in detail in Wang et al. (2015). In what follows we will learn how to run FCMA and actually perform it on real data on a reasonable timescale.

The logic of FCMA is as follows: take every voxel in the brain and correlate it with every other voxel in the brain for each trial and participant. Then, take each big correlation matrix and turn it into a vector so that every pairwise relationship is represented as an element in that vector. This vector is a list of features that can be used for classification, just like we used voxels or principal components in past exercises. The vectors for each trial/condition can then be stacked so that we get an example by feature matrix. We can then feed that matrix into a classifier and determine whether the pattern of information across voxel pairs discriminates between conditions.

We will use FCMA to determine if there exist brain regions, undetectable by MVPA, that show differential processing of attending to faces vs. scenes. We can compare our results from FCMA to the MVPA results that were generated in the searchlight notebook.

Goal of this script

  1. Run FCMA feature selection
  2. Run FCMA classification
  3. Learn plotting tools for FCMA

Table of Contents

1. The FCMA workflow

1.1 Data preparation
1.2 Create an epoch file
1.3 Normalize data
1.4 Compute correlations and voxel selection
1.5 Create voxel selection masks
1.6 Classification

1.6.1 Pre-computed kernel

2. FCMA batch scripts

2.1 Inner loop (three subjects)
2.2 Outer loop (all subjects)
2.3 Permutation testing

3. Plotting the results

3.1 Plot the connectome
3.2 Plotting circos

4. MVPA and FCMA


1 2 3 4 5 6 7 8 9 Novel contribution


For this script we will use the face/scene dataset from Wang et al. (2015), who in turn used localizer data from Turk-Browne et al. (2012). Localizer details (Turk-Browne et al. (2012):

"Subjects also completed a functional localizer run lasting 6 min 6 s to define bilateral PPA regions of interest (ROIs). An ROI approach was used to minimize multiple comparisons and improve statistical power. The localizer run alternated between six repetitions each of scene and face blocks (Turk-Browne et al., 2010). Blocks contained 12 stimuli presented for 500 ms every 1500 ms. The 18 s of stimulation was followed by 12 s of fixation before the next block. Subjects made the same indoor/outdoor judgment for scenes, and a male/female judgment for faces."

Self-study: Explore the data

In [2]:
import warnings
import sys 
if not sys.warnoptions:
# Import libraries
import nibabel as nib
import numpy as np
import time
import os
from scipy.stats import sem

from nilearn import plotting
from nilearn.image import coord_transform

import brainiak.utils.fmrisim as sim
from brainiak.fcma.voxelselector import VoxelSelector
from brainiak.fcma.preprocessing import prepare_fcma_data
from brainiak.fcma.preprocessing import RandomType
from brainiak.fcma.util import compute_correlation
from brainiak import io

import networkx as nx
from nxviz.plots import CircosPlot

import matplotlib.pyplot as plt
import seaborn as sns 

# %matplotlib inline 
%matplotlib notebook
%autosave 5
sns.set(style = 'white', context='poster', rc={"lines.linewidth": 2.5})
Autosaving every 5 seconds

1. The FCMA workflow

The following sequence of steps are necessary for successfully running FCMA using BrainIAK.

  1. Data preparation

  2. Create an epoch file

  3. Normalize data

  4. Correlation and voxel selection

  5. Create voxel selection masks

  6. Classification

1.1 Data preparation

FCMA has tools to efficiently read in brain data but to do this, directories must be set up appropriately. Specifically, FCMA takes as an input a directory with the fMRI data you want to analyze. You can specify the suffix of the files you want to read in. Usually the suffix will just be '.nii.gz' but if you want to specify loading in only data from a certain condition then you might want a different kind of suffix. All data within the directory with the matching suffix will be loaded in and analyzed.

In [4]:
# Set paths
from utils import fs_data_dir, results_path

print('data dir = %s' % (fs_data_dir)) 
suffix = 'bet.nii.gz'
mask_file =  os.path.join(fs_data_dir, 'mask.nii.gz') # whole brain data
epoch_file = os.path.join(fs_data_dir, 'fs_epoch_labels.npy')

# Make directory to save results
output_dir = results_path + '/fcma_results'
if not os.path.exists(output_dir):

print('output dir = %s' % (output_dir)) 
data dir = /your_home_folder/brainiak_datasets/face_scene
output dir = your_home_folder/brainiak_results/fcma_results
In [7]:
# Create an image object that can be used by FCMA to load in data efficiently.
images = io.load_images_from_dir(fs_data_dir, suffix)

# Use a path to a mask file to create a binary mask for FCMA
mask = io.load_boolean_mask(mask_file)

1.2 Load an epoch file

What is an epoch? For the purposes of the FCMA package, an epoch is a time-interval that you want to carve up for analysis. FCMA needs to know which timepoints in the data correspond to which events, much like using labels in the previous notebooks. In notebook-08 (seed-based connectivity) our epochs were the entire run of data but in the code here this is just each block of faces or scenes. The stimuli were shown only for 12 TRs (18s) followed by 6 TRs (12s) of rest. We will use 12 TRs as the epoch for FCMA.

The epoch file has a very specific structure that makes it readable by BrainIAK. The epoch file is a list of participants/runs (here the same thing) in which each entry is a 3-D matrix of booleans, with the value =1 for all TRs for the epoch of interest and zero everywhere else. The dimensions of the 3-D matrix are condition (face or scene) by epoch (trial 1, 2, etc.) by TR. This means that for each condition and epoch, there is a vector of TR-length that represents whether any time points from this data ought to be included.

Note: the epoch file in this example isn't actually a list of participants, it is a 4-D array with the first dimension representing the participants. However, it could be a list and would be interpreted the same way. The reason you might want a list instead of a 4-D volume is that different participants may have different numbers of TRs.

Exercise 1: Crop the epoch file to extract the first 3 subjects. Do this by first loading in the current epoch file using numpy tools. Then edit this file to only include 3 participants and save this cropped version as fs_epoch_labels_3subj.npy. Then read in the epoch file with the io.load_labels tool.

How many epochs are there per person and how many epochs of each condition there are?

In [8]:
# Insert code here
# load 'epoch_file'

# uncomment the following lines and fill-in the blanks
# # crop the numpy array
# new_epoch_array = 

# # save the cropped array 
# new_epoch_file = os.path.join(output_dir, 'fs_epoch_labels_3subj.npy')
# np.save(new_epoch_file, new_epoch_array)

# # read in the epoch file using io.load_labels
# epoch_list = io.load_labels(new_epoch_file) # It has 3 subjects

# # print out useful information
# epochs_per_subj = epoch_list[0].shape[1]
# print('Number of participants:', len(epoch_list))
# print('Number of epochs per participant:', epochs_per_subj)


1.3 Normalize data

BrainIAKs FCMA function takes in the brain images, mask and epoch file via the prepare_fcma_data function to format and normalizea the data for analysis. You might wonder why this step is not included in the actual FCMA function. Why bother with a dedicated preparation step? The rationale behind this is that it helps in speeding up the correlation computation. As such it is a necessary step to use BrainIAKs optimized processing routines.

Note that this function has a nice feature: if your input epoch_list has fewer entries than your input images then it will only take the first N images for which entries in the epoch_list exist. This is useful for what we are doing here: we input an epoch file for three participants only, while inputting an BrainIAK image object with fMRI data for all participants. In other words, if the epoch_list is subsampled then your images will also be subsampled. However, make sure you have the order right: It will only take the first N participants so make sure your epoch file corresponds to those same participants.

In [11]:
# Normalize and format the data.
raw_data, _, labels = prepare_fcma_data(images, epoch_list, mask)

Exercise 2: What does each list element of raw_data and labels represent? What does the dimensionality of the array in each raw_data list element represent?


1.4 Compute correlations and voxel selection

The computational workhorse behind FCMA is the code in compute_correlation. This is C code written with cython (a Python binding for C code) that allows for extremely efficient computation of correlations.

The following example extracts the data for one epoch, for one subject and computes the correlation using this procedure.

In [12]:
# Extract data for one subject, for one epoch (take only the every 10th voxel)
epoch_data = raw_data[0][:,::10]

# Make the data c contiguous (the type of ordering used in C in which the last dimensions are stored first (opposite of fortran/matlab))
mat = np.ascontiguousarray(epoch_data.T, dtype=np.float32)  # Voxels x TRs for one epoch

begin_time = time.time()
epoch_corr = compute_correlation(mat, mat)  # correlation of every voxel with every other voxel
end_time = time.time()
print("Analysis duration: %0.5fs" % (end_time - begin_time))

plt.title('Correlations for one epoch');
Analysis duration: 0.79793s