Full Correlation Matrix Analysis

V.0.1 - Alpha testing, contributions

The connectivity analyses we performed previously examined the coupling of different regions of the brain. This coupling can capture information that activity-based univariate or multivariate techniques do not detect. For instance, imagine that voxels A and B are coupled during one task condition and voxels A and C are coupled during another task condition. In such a case, voxel A may always be active (and thus task differences are likely to be invisible to activity-based approaches), but its connectivity with other voxels will reveal task information.
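
To see why connectivity can carry information that mean activity misses, here is a toy simulation (all numbers are made up): voxel A is equally active in both conditions, but it tracks voxel B in condition 1 and voxel C in condition 2.

import numpy as np

rng = np.random.RandomState(1)
n_trs = 100

# Condition 1: A follows B; Condition 2: A follows C
B1, C1 = rng.randn(n_trs), rng.randn(n_trs)
A1 = B1 + 0.1 * rng.randn(n_trs)
B2, C2 = rng.randn(n_trs), rng.randn(n_trs)
A2 = C2 + 0.1 * rng.randn(n_trs)

# Mean activity of A barely differs between conditions...
print('Mean of A:', A1.mean().round(2), 'vs', A2.mean().round(2))

# ...but its coupling with B and C flips between conditions
print('corr(A, B):', np.corrcoef(A1, B1)[0, 1].round(2), 'vs', np.corrcoef(A2, B2)[0, 1].round(2))
print('corr(A, C):', np.corrcoef(A1, C1)[0, 1].round(2), 'vs', np.corrcoef(A2, C2)[0, 1].round(2))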

However, this creates a problem: to find these diagnostic voxels in an unbiased, data-driven fashion it is necessary to correlate every voxel in the brain with every other voxel. Simplifying assumptions can be made to reduce the complexity of this analysis. For instance, it is possible to downsample the voxels (e.g., using parcellation as in the previous session) in order to reduce the total number of correlations that are made.

Fortunately, advances in machine learning, parallel computing and efficient coding have made it possible to calculate and analyze billions of correlations rapidly. These tools are part of the BrainIAK toolbox and are used for Full Correlation Matrix Analysis (FCMA). This method is outlined in detail in Wang et al. (2015, Journal of Neuroscience Methods). In what follows we will learn how to run FCMA and actually perform it on real data on a reasonable timescale.

The logic of FCMA is as follows: take every voxel in the brain and correlate it with every other voxel in the brain for each trial and participant. Then, take each big correlation matrix and turn it into a vector, so that every pairwise relationship is represented as a dimension of that vector. The vectors for each trial/condition can then be stacked to give an example-by-feature matrix. We can then feed that matrix into a classifier and determine whether the pattern of information across voxel pairs discriminates between conditions.
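
Here is a minimal sketch of that logic with toy data (made-up sizes, random data, and alternating labels; the real pipeline uses BrainIAK's optimized routines described below):

import numpy as np

n_voxels, n_trs, n_epochs = 50, 20, 8  # made-up sizes
rng = np.random.RandomState(0)

X = []  # will become the example-by-feature matrix (one row per epoch)
y = []  # condition label for each epoch
for epoch in range(n_epochs):
    data = rng.randn(n_voxels, n_trs)        # voxels x TRs for this epoch
    corr = np.corrcoef(data)                 # voxels x voxels correlation matrix
    iu = np.triu_indices(n_voxels, k=1)      # keep each voxel pair once
    X.append(corr[iu])                       # vectorize the matrix into features
    y.append(epoch % 2)                      # pretend conditions alternate

X = np.vstack(X)  # epochs x voxel-pairs, e.g. (8, 1225)
y = np.array(y)
# X and y could now be fed to any classifier (e.g., an SVM) to ask whether
# the pattern of pairwise correlations discriminates the conditions.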

For this script we will use the face/scene dataset from Wang et al. (2015), who in turn used localizer data from Turk-Browne et al. (2012, JNeurosci).

Goal of this script

  1. Run FCMA feature selection
  2. Run FCMA classification
  3. Learn plotting tools for FCMA

Table of Contents

1. The FCMA Workflow

1.1 Data Preparation
1.2 Create an epoch file
1.3 Normalize data
1.4 Compute correlations and voxel selection
1.5 Create voxel selection masks
1.6 Classification

2. FCMA Batch Scripts

2.1 Inner loop (subsample)
2.2 Outer loop (full sample)
2.3 Permutation testing

3. Plotting the results

3.1 Plot the connectome
3.2 Plotting circos

4. MVPA and FCMA

Exercises:

Exercise 1
Exercise 2
Exercise 3
Exercise 4
Exercise 5
Exercise 6
Exercise 7
Exercise 8
Exercise 9
Novel contribution

1. The FCMA Workflow

The following sequence of steps are necessary for successfully running FCMA using BrainIAK.

  1. Data Preparation

  2. Create an epoch file

  3. Normalize data

  4. Correlation and voxel selection

  5. Create voxel selection masks

  6. Classification

In [ ]:
import nibabel as nib
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import time
import sys
import os

from nilearn import plotting
from nilearn.image import coord_transform

import brainiak.utils.fmrisim as sim
from brainiak.fcma.voxelselector import VoxelSelector
from brainiak.fcma.preprocessing import prepare_fcma_data
from brainiak.fcma.preprocessing import RandomType
from brainiak.fcma.util import compute_correlation
from brainiak import io

1.1 Data Preparation

FCMA has tools to read in brain data efficiently, but to use them the directories must be set up appropriately. Specifically, FCMA takes as input a directory with the fMRI data you want to analyze. You can specify the suffix of the files you want to read in. Usually the suffix will just be '.nii.gz', but if you only want to load data from a certain condition then you might use a more specific suffix. All data within the directory that match the suffix will be loaded and analyzed.

Previous data we have analyzed were shared internally on Milgram. However, there is an increasing trend toward public data sharing and transparency (buzzword Open Science), which may come into play for your final project... As an initial taste, we will use a public dataset for today's exercise. Although it is great that this data is available through the BrainIAK fcma package, it would be even better if it were stored in a recognized repository like NeuroVault, OpenfMRI, or the Open Science Framework.

In [ ]:
# To begin, let's download the data
!./09-fcma/download_data.sh
In [ ]:
# Set paths
data_dir = './face_scene/'
suffix = 'bet.nii.gz'
mask_file =  data_dir + 'mask.nii.gz'
epoch_file = data_dir + 'fs_epoch_labels.npy'

# Print the directory to see its contents
!ls $data_dir

# Create an image object that can be used by FCMA to load in data efficiently.
images = io.load_images_from_dir(data_dir, suffix)

# Use a path to a mask file to create a binary mask for FCMA
mask = io.load_boolean_mask(mask_file)

1.2 Create an epoch file

What is an epoch? For the purposes of the FCMA package, an epoch is a time interval that you want to carve out of the timeseries for analysis. FCMA needs to know which timepoints in the data correspond to which events, much like the labels used in the previous notebooks. In the connectivity notebook our epochs were the entire run of data, but here an epoch is a single block of faces or scenes. The stimuli were shown for 12 TRs (18 s), followed by 6 TRs (9 s) of rest.

The epoch file has a very specific structure that makes it readable by BrainIAK. The epoch file is a list in which each entry is a 3-D boolean array. Each entry in this list corresponds to a participant/run (since there was one run per participant). The dimensions of the 3-D array are condition (face or scene) by epoch (trial 1, 2, etc.) by TR.
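
For orientation, here is a toy sketch of what a single entry of that list might look like (the sizes are made up; the real file contains the actual number of epochs and TRs):

import numpy as np

n_conditions, n_epochs, n_trs = 2, 4, 72  # made-up sizes for illustration
sub_epoch = np.zeros((n_conditions, n_epochs, n_trs), dtype=bool)

# Mark TRs 6-17 as belonging to the first epoch of the first condition
sub_epoch[0, 0, 6:18] = True

# The epoch file is then a list with one such array per participant/run
toy_epoch_list = [sub_epoch]
print(sub_epoch.shape)  # (condition, epoch, TR)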

Exercise 1: Read in the epoch data from "fs_epoch_labels.npy". Use plt.imshow to display the epoch by TR sequence for the face and scene conditions in the first participant, separately. Make sure the figure is big enough to read.

In [ ]:
# Insert code here

Exercise 2: Create an epoch file that includes only the first 3 participants and use numpy.save to save it in NumPy format here: './face_scene/fs_epoch_labels_3sub.npy'

In [ ]:
# Insert code here

Now let's load in the new epoch file with the io.load_labels tool. We can also pull out some useful information from the epoch file such as how many epochs there are per person.

In [ ]:
new_epoch_file = data_dir + 'fs_epoch_labels_3sub.npy'
epoch_list = io.load_labels(new_epoch_file)

# Parse the epoch data for useful dimensions
epochs_per_subj = epoch_list[0].shape[1]
num_subjs = len(epoch_list)

print('Number of participants:', num_subjs)
print('Number of epochs per participant:', epochs_per_subj)

1.3 Normalize data

BrainIAK's FCMA workflow takes in the brain images, mask, and epoch file via the prepare_fcma_data function, which formats and normalizes the data for analysis. You might wonder why this step is not included in the actual FCMA function. Why bother with a dedicated preparation step? Preparing the data in advance speeds up the correlation computation, and it is a required step for using BrainIAK's optimized processing routines.

Note that this function has a nice feature: if your input epoch_list has fewer entries than your input images, then it will only take the first N images for which entries in the epoch_list exist. This applies to what we do here: we input an epoch file for three participants only, while inputting a BrainIAK image object with fMRI data for all participants. In other words, if the epoch_list is subsampled then your images will also be subsampled. However, make sure you have the order right: it will only take the first N participants, so make sure your epoch file corresponds to those same participants.

We are initially going to work with the first three participants to illustrate feature selection. This could take 20-30 mins on 2 cores. We will then use the results of feature selection for all participants (which we already calculated) to perform final classification.

In [ ]:
# Normalize and format the data.
raw_data, _, labels = prepare_fcma_data(images, epoch_list, mask)

Exercise 3: What does each list element of raw_data and labels represent? What does the dimensionality of the array in each raw_data list element represent?

A:

1.4 Compute correlations and voxel selection

The computational workhorse behind FCMA is the code in compute_correlation. This is C code generated with Cython (a Python-like language that compiles to C extensions), which allows for extremely efficient computation of correlations.

The following example extracts the data for one epoch, for one subject and computes the correlation using this procedure.

In [ ]:
# Extract the data of one epoch for one subject (taking only every 10th voxel)
epoch_data = raw_data[0][:, ::10]

# Make the data C-contiguous (row-major ordering, as used in C, in which the last dimension varies fastest; the opposite of Fortran/MATLAB)
mat = np.ascontiguousarray(epoch_data.T, dtype=np.float32)  # Voxels x TRs for one epoch

begin_time = time.time()
epoch_corr = compute_correlation(mat, mat)  # correlation of every voxel with every other voxel
end_time = time.time()
print("Analysis duration: %0.5fs" % (end_time - begin_time))

plt.figure()
plt.title('Correlations for one epoch');
plt.xlabel('voxels');
plt.imshow(epoch_corr);

Exercise 4: What are the inputs to the compute_correlation function and what is the main trick this function pulls to speed up computation?

A:

Voxel Selection

To understand how voxel selection in FCMA works, remember week five's exercise on classifier optimization, in which we learned different ways to perform cross-validation. One option is to leave out some unit or units of your data, such as a run or a participant, while you fit parameters or select voxels. In FCMA, it is typical to perform nested cross-validation: we leave out a participant for final testing on an outer loop (e.g., participant 18) and perform our voxel selection with cross-validation on an inner loop, with one participant left out to quantify feature selection performance (e.g., participant 17) and the rest used to conduct feature selection (e.g., participants 1-16). Thus, for each of 18 outer-loop folds, there are 17 inner-loop folds. To use this approach we simply ignore one subject's data before executing the voxel selection procedure (the inner loop itself), and reserve that data for final classifier testing. Then, on subsequent iterations of the outer loop, we rotate which participant is ignored.

The inner loop, i.e. the voxel selection procedure, consists of computing correlation matrices and the classification of correlation patterns. This is highly computationally demanding and has been optimized in BrainIAK, where the VoxelSelector method implements this massive computation (all voxels x all subjects x all epochs). How does it work?

  1. First, VoxelSelector uses the compute_correlation function to compute the correlation of every voxel with every other voxel in the mask provided for each epoch and participant in the training set.
  2. It then trains a classifier for each voxel on a training set consisting of all epochs of n-1 participants. Each voxel's classifier uses the voxel's correlation values with all other voxels in the mask as features (as if it were a seed).
  3. Finally, it tests the classifier on the nth participant in the training set.

This is repeated n times as a result of cross-validation and the accuracies are averaged across folds for each voxel. We can then rank voxels in terms of performance and use this ranking as a way to select which voxels have discriminative connectivity in the training set.
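
To make steps 1-3 concrete, here is a schematic of scoring a few voxels by hand. This is not the BrainIAK implementation: it is heavily subsampled to run quickly, uses a simple 3-fold cross-validation rather than leave-one-subject-out, and uses a plain linear SVM rather than the precomputed kernel.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

sub = 100  # take only every 100th voxel so this runs quickly

# For every epoch, correlate each (subsampled) voxel with all the others
epoch_features = []
for epoch_data in raw_data:
    mat = np.ascontiguousarray(epoch_data[:, ::sub].T, dtype=np.float32)
    epoch_features.append(compute_correlation(mat, mat))  # voxels x voxels

# Score each voxel by classifying its correlation row across epochs
y = np.array(labels)
for v in range(5):  # just the first 5 voxels as a demonstration
    X = np.vstack([corr[v, :] for corr in epoch_features])  # epochs x voxels
    score = cross_val_score(SVC(kernel='linear'), X, y, cv=3).mean()
    print('Voxel %d accuracy: %0.2f' % (v, score))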

Exercise 5: Excluding one participant from the training set is arbitrary: we could leave out some blocks from all participants, multiple participants or any combination of these choices (as long as the training and testing data are independent). In the space below, write code using the raw_data and labels variables created by prepare_fcma_data to input into VoxelSelector with N-2 participants (i.e., 2 participants held out for final classifier testing). Print out the new shape of raw_data and labels to confirm your answer.

In [ ]:
# Insert code here

How FCMA works on a computer or cluster

The FCMA implementation in BrainIAK efficiently parallelizes the computation and distributes it across the available resources. Below is a figure detailing the FCMA workflow, with its caption quoted underneath:

[Figure 1: FCMA workflow overview (Wang et al., 2015)]

"Fig. 1. Workflow overview. FCMA uses a controller/worker architecture, in which each worker first loads the full data into memory. The full data consist of a matrix with V voxels in rows and T timepoints in columns; the timepoints can be subdivided into E epochs, each with TE timepoints (inset depicts two voxels and epochs). The controller process does the following: assigns a subset S of voxels to each of W workers; instructs the worker to compute the correlation between each of these voxels and the rest of the brain in each epoch; instructs the worker to analyze the correlation vectors for each voxel across epochs with MVPA and supplied condition labels; collects the analysis result (i.e., cross-validation accuracy) for each voxel and loads it into memory; and returns to the first step to assign another subset of voxels until there are none left. Finally, the controller writes the results to disk." Source: Wang et al. (2015)

This computation cannot be run within this notebook because the controller/worker architecture needs at least two processes, whereas we are currently using only one.

Instead we provide some batch scripts to execute the whole sequence of the FCMA workflow. However, the code isn't scary and actually does something quite similar to what we have done before with scikit-learn for classification: we create a classification object with the inputs we desire, specify the classification kernel and then run it, like this:

from sklearn.svm import SVC

vs = VoxelSelector(labels, epochs_per_subj, num_subjs, raw_data)
clf = SVC(kernel='precomputed', shrinking=False, C=1)  # FCMA precomputes the kernel matrix from the correlation patterns
results = vs.run(clf)
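
Although we cannot run this here, it helps to know roughly what happens to results afterwards. In the BrainIAK examples, results is a list of (voxel_index, accuracy) tuples sorted from best to worst, and the voxel selection script writes it back out as brain volumes, approximately like this (a sketch, not the script's exact code):

# score and rank for each in-mask voxel, indexed in mask order
score = np.zeros(len(results), dtype=np.float32)
seq = np.zeros(len(results), dtype=np.int32)
for rank, (voxel_idx, accuracy) in enumerate(results):
    score[voxel_idx] = accuracy
    seq[voxel_idx] = rank  # rank 0 = most diagnostic voxel

# Put the in-mask values back into 3-D brain volumes
score_volume = np.zeros(mask.shape, dtype=np.float32)
seq_volume = np.zeros(mask.shape, dtype=np.int32)
score_volume[mask] = score   # saved as the *_result_score.nii.gz file
seq_volume[mask] = seq       # saved as the *_result_seq.nii.gz file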

1.5 Create voxel selection masks

When we have labeled each voxel with its average accuracy across inner-loop folds, we can choose how many voxels to include in the classifier model used to test the left-out outer-loop participant. This is a feature selection step, just like the one we did with classifiers in notebook 03. Unlike there, where feature selection was relatively easy, here the analyses take a long time, so we need to save our intermediate steps. Regardless, the logic is the same: find the voxels that we think will be most useful for classifying held-out, never-before-seen data.

The bash script we use here calls FSL commands to choose a subset of voxels (the top N) and create a binary mask for each training set/outer-loop fold. As you can see in the script printed below, it creates three types of files.

  1. For each input volume, a mask of the top N voxels is created (${pref}_top${voxel_number}.nii.gz)
  2. These individual masks are then concatenated into a single 4-D file in which each volume is one top-N-voxel mask (all_top${voxel_number}.nii.gz)
  3. This concatenated file is used to create one map giving, for each voxel, the proportion of volumes in which it is included in the top N (prop_top${voxel_number}.nii.gz)

Self-study: Try to understand what happens in the script. Here are some hints: ${voxel_number} is the number of voxels included. $subj is the outer-loop subject number. fslmerge has the following input structure: 'fslmerge -t $output_name $input_1 $input_2 ... $input_N'. The last line, which calls fslmaths, creates the prop_top volume by doing a computation on the all_top volumes.

In [ ]:
!cat ./09-fcma/make_top_voxel_mask.sh
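
The script relies on FSL, but the core idea can also be expressed in a few lines of Python/nibabel. Below is a rough sketch (a made-up helper function, not the script's actual logic) that keeps the N best-scoring voxels from a voxel selection map and saves them as a binary mask:

import numpy as np
import nibabel as nib

def top_voxel_mask(score_file, voxel_number, out_file):
    # Load the voxel selection accuracy map
    score_nii = nib.load(score_file)
    score = score_nii.get_data()

    # Find the accuracy cutoff that keeps the best `voxel_number` voxels
    cutoff = np.sort(score[score > 0])[-voxel_number]
    top_mask = (score >= cutoff).astype(np.int16)

    # Save the binary mask with the same affine as the input
    nib.save(nib.Nifti1Image(top_mask, score_nii.affine), out_file)

# Hypothetical example call:
# top_voxel_mask('./09-fcma/voxel_selection_subsample/fc_no0_result_score.nii.gz', 100, 'top100_mask.nii.gz')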

1.6 Classification

On this subset of top N voxels we perform classification of the correlation patterns by first training on the subjects we used for voxel selection (all participants in the training set) and then testing on the held-out outer-loop participant that wasn't used for any previous analyses.

BrainIAK has a special method, called Classifier, that will compute correlations on the selected voxels and perform classification in a computationally efficient manner. We have created the script fcma_classify.py as a wrapper. This script is similar to the voxel selection script: data are loaded into memory on each processor, then normalized and prepared for input to FCMA, a Classifier object is made from the data and then fit. Like voxel selection, this can be called in a similar way to the classification tools from scikit-learn.

Critically, the mask input to the fcma_classify.py is not the whole brain mask but is instead the top N voxels that are deemed appropriate for this outer-loop fold.

As before the code is relatively tractable and familiar; however, again we cannot run this in the notebook because of the need for parallelism. Once our data and labels are arranged into training and test sets we create objects that are ready to be read by FCMA. We then create our SVM kernel and Classifier object. This Classifier object is then fit to the training data before being tested on the never-before-seen test data.

from sklearn.svm import SVC
from brainiak.fcma.classifier import Classifier

training_obj = list(zip(training_data, training_data))
testing_obj = list(zip(testing_data, testing_data))

svm_clf = SVC(kernel='precomputed', shrinking=False, C=1)
clf = Classifier(svm_clf, epochs_per_subj=epochs_per_subj)

clf.fit(training_obj, training_labels)
predict = clf.predict(testing_obj)

If you look at the fcma_classify.py script you will notice that there is an alternative, more memory-efficient way to organize the data and feed it into the classifier: concatenate the training and testing data and zip them into a single object. The FCMA code then takes into account how many training samples there are and never looks at the test data when fitting the model. This procedure is better, although a little harder to follow.
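
A sketch of that alternative arrangement is below. It assumes, following the BrainIAK examples, that Classifier.fit accepts a num_training_samples argument and that predict() with no arguments scores the remaining samples; the variables are the same hypothetical ones as in the snippet above.

# Concatenate training and test epochs into a single object
all_data = training_data + testing_data          # test epochs go last
all_obj = list(zip(all_data, all_data))
all_labels = training_labels + testing_labels

# Tell the classifier how many of the leading samples are for training
svm_clf = SVC(kernel='precomputed', shrinking=False, C=1)
clf = Classifier(svm_clf, epochs_per_subj=epochs_per_subj)
clf.fit(all_obj, all_labels, num_training_samples=len(training_data))

# The model never saw the trailing test samples during fitting
predict = clf.predict()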

Users of FCMA have found it useful to distinguish between intrinsic and extrinsic classification.

Intrinsic classification, demonstrated above, is when the correlations of only the voxels in the top_n_mask are used for classification. In other words this method only cares about correlations among those voxels that were selected because they contain information.

Extrinsic classification is when the correlations used for final classification are between voxels within the top_n_mask and voxels outside of this mask. In other words, this method examines information captured in the connectivity of these nodes with the rest of the brain.

The scripts we provide allow you to specify whether you wish to perform intrinsic or extrinsic FCMA.
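
At the level of compute_correlation, the difference between the two is simply which pair of matrices you correlate. A rough sketch, where top_data and rest_data are hypothetical TR-by-voxel arrays for voxels inside and outside the top-N mask:

top = np.ascontiguousarray(top_data.T, dtype=np.float32)    # top-N voxels x TRs
rest = np.ascontiguousarray(rest_data.T, dtype=np.float32)  # remaining voxels x TRs

intrinsic_corr = compute_correlation(top, top)    # top-N voxels x top-N voxels
extrinsic_corr = compute_correlation(top, rest)   # top-N voxels x rest-of-brain voxels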

2. FCMA Batch Scripts

We have covered the main functions needed to run a FCMA analysis. Some of these methods require multiple cores for fast execution and thus cannot be executed from cells in a Jupyter notebook. The scripts are described below.

Note that if you run the scripts on milgram but are not enrolled in the course, you will at some point run into this error: 'sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified'. This happens because the .sh scripts are set up to run on a dedicated course partition. To solve the error, edit the .sh script (e.g., using the nano command), change the line '#SBATCH --partition cmhn-s18' to '#SBATCH --partition short', and delete the line '#SBATCH -A cmhn-s18' before running the scripts.

./09-fcma/run_fcma_voxel_selection_cv.sh

This runs ./09-fcma/fcma_voxel_selection_cv.py, which loads in and formats the fMRI and epoch data, performs normalization, and runs voxel selection (VoxelSelector). The bash script takes six inputs (an example call is shown after the list):

  1. data_dir=What is the directory containing data?
     e.g. "./face_scene/"
  2. suffix=What is the extension of the data you're loading
     e.g. ".nii.gz"
  3. mask_file=What is the path to the whole brain mask
     e.g. "./face_scene/mask.nii.gz"
  4. epoch_file=What is the path to the epoch file
     e.g. "./face_scene/fs_epoch_labels_3sub.npy"
  5. left_out_subj=Which participant (as an integer) are you leaving out for this cv?
     e.g. "0"
  6. output_dir=Where do you want to save the data
     e.g. "./09-fcma/voxel_selection_subsample"

./09-fcma/make_top_voxel_mask.sh

Creates binary masks of the top N voxels for each file generated by voxel selection in a given folder. This creates a mask for each participant (i.e., each outer-loop fold) and then aggregates them. It takes the following inputs:

  1. input_dir=What is the path to the data?
     e.g. "./09-fcma/voxel_selection_subsample/"
  2. voxel_number=What voxel threshold would you like to set
     e.g. "100"
  3. output_dir=Where do you want to put the data
     e.g. "./09-fcma/top_n_masks_subsample/"

./09-fcma/run_fcma_classify.sh

This runs fcma_classify.py, which performs final classification on the voxels selected during voxel selection, using the Classifier method. It takes the following inputs (an example call is shown after the list):

  1. data_dir=What is the directory containing data?
     e.g. "./face_scene/"
  2. suffix=What is the extension of the data you're loading
     e.g. ".nii.gz"
  3. top_n_mask_file=What is the path to the top N mask file (this is not the whole-brain mask)
     e.g. "./09-fcma/top_n_masks_all/fc_no0_result_seq_top1000.nii.gz"
  4. epoch_file=What is the path to the epoch file
     e.g. "./face_scene/fs_epoch_labels.npy"
  5. left_out_subj=Which participant (as an integer) are you using for testing?
     e.g. "0"
  6. second_mask=Do you want to use a second mask to compare the data with? Necessary for extrinsic analyses. Otherwise set to None.
     e.g. "None"

2.1 Inner loop (subsample)

Although the FCMA tools have greatly reduced the computation time required for these analyses, it still takes a long time to compute a trillion (that's 1,000,000,000,000) correlations, as is needed for voxel selection with this dataset. Hence we are only going to run voxel selection on three participants.

Exercise 6: Perform the following steps for your toy-sample of three participants:

  1. Check that you have created the epoch file with the first three participants in Excercise 2 and it is called "./face_scene/fs_epoch_labels_3sub.npy"
  2. Run the voxel selection (sbatch ./09-fcma/run_fcma_voxel_selection_cv.sh), leaving one of the participants out each time. You can use the example inputs specified above for the first participant. This can take a couple of minutes with only two cores.
  3. Create a mask of the top 100 voxels (sbatch ./09-fcma/make_top_voxel_mask.sh). You can use the example inputs specified above. This will only take seconds. Remember that this script uses FSL functions, so it will only work if you have set up and sourced FSL.
  4. Plot the mask of the top 100 voxels for each participant (saved in ./09-fcma/top_n_masks_subsample) using the nilearn.plotting tools you discovered in week 08.
In [ ]:
# Insert code here

2.2 Outer loop (all)

The above exercise was to familiarize you with how the voxel selection step works. This step is slow because we are calculating the full correlation matrix and performing a separate cross-validation for every voxel's connectivity. However, the outer loop is relatively fast, at least for intrinsic analyses, because we are only training and testing on a subset of voxels. Hence we will use voxel selection data we prepared earlier (/gpfs/milgram/data/cmhn-s18/datasets/face_scene/voxel_selection_all/) to run the outer loop, i.e., final classification, for each participant. Note that the selected features will be slightly different for each outer-loop participant because the training sets differ.

If you don't have access to this data then you will need to run this yourself. Simply create an epoch file with all participants included and run the voxel selection bash script again (create and use a new output directory for this: ./09-fcma/voxel_selection_all). This will take considerably longer than what we did for only three subjects before.

In [ ]:
! ls /gpfs/milgram/data/cmhn-s18/datasets/face_scene/voxel_selection_all/

Self-study: Visualize the masks that were previously created. Note, these masks will be different from the ones that you created because they are trained on substantially more data. You can use the code:

module load Apps/FSL/5.0.9

fslview /gpfs/milgram/data/cmhn-s18/datasets/face_scene/voxel_selection_all/fc_no0_result_score.nii.gz

Exercise 7: Create a mask of the top 1000 voxels using the make_top_voxel_mask.sh script. Use the voxel_selection_all folder (full path above) as an input and ./09-fcma/top_n_masks_all as the output destination.

Exercise 8: Perform intrinsic classification on all participants using the run_fcma_classify.sh script. You can use all of the example inputs specified above. Once this has finished, you can print out the 18 outer-loop classification accuracies in the cell below. Use these to calculate the overall mean performance.

In [ ]:
!cat classify_result.txt
In [ ]:
# Insert code here

2.3 Permutation testing

FCMA also has tools to make permutation of the data easy. By running the analysis many times on permuted data, you can determine the distribution of classification accuracies under the null hypothesis and see how the real effect stacks up against it. The code below uses the random argument of the prepare_fcma_data function we already know to permute the image data (not the labels). Setting the random argument produces random voxel selection results for non-parametric statistical analysis. There are three random options:

  1. RandomType.NORANDOM is the default
  2. RandomType.REPRODUCIBLE permutes the voxels in the same way every run
  3. RandomType.UNREPRODUCIBLE permutes the voxels differently across runs
In [ ]:
# Note that you must run this every time you wish to call prepare_fcma_data
images = io.load_images_from_dir(data_dir, suffix)

# include the random argument
permuted_raw_data, _, permuted_labels = prepare_fcma_data(images, epoch_list, mask, random=RandomType.REPRODUCIBLE)
print(permuted_labels)

Exercise 9: Perform a permutation analysis by doing the following steps:

  1. Create a new version of the fcma_classify.py script that permutes the data (like the code above) and name it fcma_classify_permuted.py.
  2. Have this script output the accuracy to a text file called classify_result_permuted.txt.
  3. Create a new version of the run_fcma_classify.sh script that executes fcma_classify_permuted.py and name it run_fcma_classify_permuted.sh.
  4. On the first outer-loop participant, run this analysis 10 times to get a distribution of null classification accuracies. Print the results below and calculate the mean.

Note that as of April 2018 there seemed to be a bug in the random argument of the prepare_fcma_data function, so there was no effect of permutation. An issue was raised on the BrainIAK github repository and will hopefully be resolved soon.

In [ ]:
# Insert code here
In [ ]:
!cat classify_result_permuted.txt

3. Plotting the results

As always, it is useful to visualize our results to get an idea of where in the brain information is contained. However, in the case of FCMA, remember that the information isn't localized to a voxel but is instead captured in the relationships between voxels. Hence we typically want to plot a connectome rather than a heatmap. Fortunately, as you might remember from week 08, nilearn has great tools for plotting connectomes.

3.1 Plot the connectome

Below we are going to load in a mask that we have created and then compute correlations of every voxel in the mask with every other voxel in the mask (intrinsic). We are then going to create a connectome based on the strongest correlations.

In [ ]:
# Load in the data
epoch_data = raw_data[0] # Just load a single subject and a single epoch
mask_top_n_file = './09-fcma/top_n_masks_all/fc_no0_result_seq_top1000.nii.gz'
mask_top_n_nii = nib.load(mask_top_n_file)  # Load the mask that leaves that participant out
mask_top_n = mask_top_n_nii.get_data()

# Convert the top n mask into a vector with the same number of elements as the whole brain
mask_vec = mask.reshape(np.prod(mask.shape))
mask_top_vec = mask_top_n.reshape(np.prod(mask_top_n.shape))[mask_vec]

# Mask the epoch data
epoch_data_masked = epoch_data[:, mask_top_vec==1]

# Make the data C-contiguous
epoch_data_masked = np.ascontiguousarray(epoch_data_masked.T, dtype=np.float32)

# Create the internal correlation
epoch_corr = compute_correlation(epoch_data_masked, epoch_data_masked)

# Pull out the coordinates of the mask (in numpy space)
coord_x, coord_y, coord_z = np.where(mask_top_n == 1)

# Convert from the input space into the same space as the mask_top_n_nii file
# This is not MNI space, so the output will not match the glass brain 
coords = coord_transform(coord_x, coord_y, coord_z, mask_top_n_nii.affine)

# Save for later
np.save("epoch_corr", epoch_corr)
np.save("epoch_corr_coords", coords)

# Plot the connectome (only connections above the specified percentile will be plotted)
plotting.plot_connectome(epoch_corr, 
                         np.transpose(coords), 
                         edge_threshold="99.9%", 
                         node_size=1);

3.2 Plotting circos

As you can see, the plot above is not very informative due to the high number of nodes, so we need other tools for visualizing such complex data. A Circos plot is well suited to our purpose here. This is what one looks like (if someone tells you that science isn't art, they have never seen a Circos plot):

[Figure: example Circos plot]

How does it work? Circos plots take in a correlation matrix and treat each row/column of the matrix as a point around the circle. They then draw a connection between pairs of points whose correlation exceeds a certain threshold. These plots can be used to display high-dimensional ROI data like you saw above.

Self-study: This website allows you to upload a table of data and make a circos plot. If you would like, you can use the script below to save a correlation matrix as a table that can be read into the website. You could then upload that into your git repo and render it in this notebook or you could download the file directly from the repo. Mileage may vary with the website but when it works it is great. Note that you cannot have a matrix that is bigger than 75 * 75 so figure out how to select your data first.

In [ ]:
# Print circos compatible information
fid = open('circos_table.txt', 'w')

# First write the header row with one label per voxel
line = 'Labels\t'
for voxel_col in range(epoch_corr.shape[1]):
    line += "vox_" + str(voxel_col) + '\t'
fid.write(line + '\n')

# Then write one row per voxel containing its correlation with every voxel
for voxel_row in range(epoch_corr.shape[0]):
    line = "vox_" + str(voxel_row) + '\t'
    for voxel_col in range(epoch_corr.shape[1]):
        weight = int(abs(epoch_corr[voxel_row, voxel_col] * 100))  # Must be positive integers
        line += str(weight) + '\t'
    fid.write(line + '\n')

# Close the text file that was created
fid.close()
In [ ]:
# displaying this file might take a while, as it has a lot of entries
!cat circos_table.txt

Circos packages are becoming more available now in the python community. For those really interested you should look at the nxviz github repository.

Self-study: On milgram a module has been created to let you make circos plots. However, to be able to use it you need to load different modules (the nxviz package has different dependencies than BrainIAK). So, do not attempt to re-run everything in this notebook in that environment, since the circos environment does not have BrainIAK. Instead, open a new tunnel with a new environment and run only the contents of the cell below; it will work (assuming every other exercise in this notebook is done). To do this:

  1. run sbatch ./09-fcma/run_jupyter_circos.sh to set up an appropriate environment.
  2. Note the JOBID that was assigned to your job ("Submitted batch job JOBID") in your terminal and open the corresponding .txt file (jupyter-log-JOBID.txt) to see the instructions for opening a new tunnel to run your notebook through (analogous to what you do after executing ./launch_jupyter.sh)
  3. You might be prompted to provide a password or token. The token is also printed in the jupyter-log-JOBID.txt file (on the last line, the string after 'token=')
In [ ]:
# Import the necessary libraries (and not extra)
import matplotlib.pyplot as plt
import numpy as np
import networkx as nx
from nxviz.plots import CircosPlot
%matplotlib inline

# What is the (absolute) correlation threshold
threshold = 0.75

# Load in the data
epoch_corr = np.load("epoch_corr.npy")
epoch_coords = np.load("epoch_corr_coords.npy")

# Preset the graph
G = nx.Graph()

# Create the edge list
nodelist = []
edgelist = []
for row_counter in range(epoch_corr.shape[0]):
    nodelist.append(str(row_counter))  # Set up the node names
    
    for col_counter in range(epoch_corr.shape[1]):
        
        # Determine whether to include the edge based on whether it exceeds the threshold
        if abs(epoch_corr[row_counter, col_counter]) > threshold:
            # Add a tuple specifying the voxel pairs being compared and the weight of the edge
            edgelist.append((str(row_counter), str(col_counter), {'weight': epoch_corr[row_counter, col_counter]}))
        
# Create the nodes in the graph
G.add_nodes_from(nodelist)

# Add the edges
G.add_edges_from(edgelist)

# Set the colors and grouping (specify a key in a dictionary that can then be referenced)
for n, d in G.nodes(data=True):
    
    # Group nodes into quadrants based on the x (left/right) and y (posterior/anterior) coordinates
    if epoch_coords[0][int(n)] < 0:  # left
        if epoch_coords[1][int(n)] < 0:  # posterior
            G.node[n]['grouping'] = 'posterior_left'
        else:
            G.node[n]['grouping'] = 'anterior_left'
    else:  # right
        if epoch_coords[1][int(n)] < 0:  # posterior
            G.node[n]['grouping'] = 'posterior_right'
        else:
            G.node[n]['grouping'] = 'anterior_right'

# plot the data
c = CircosPlot(graph=G, node_grouping='grouping', node_color='grouping')
c.draw()
plt.title('Circos plot of epoch data');

4. MVPA and FCMA

The FCMA workflow is intentionally set up in BrainIAK to be parallel to MVPA. Rather than looking at correlations, this simply uses patterns of activity across different trials to discriminate conditions. As before there are scripts available for voxel selection and final classification. Here voxel selection is based on a searchlight analysis in n-1 participants.

Self-study: With a few line changes it is possible to run MVPA with the code from section 2. Investigate what you need to change to make this happen, and feel free to run this analysis (at least on a subset of participants) for your novel contribution.

Novel contribution: be creative and make one new discovery by adding an analysis, visualization, or optimization.

Contributions

B. Hutchinson provided data
B. Hutchinson and Y. Wang provided initial code
M. Kumar, C. Ellis and N. Turk-Browne produced the initial notebook 3/27/18