By Jeremy R. Manning (jeremy.r.manning@dartmouth.edu) and Paxton C. Fitzpatrick (Paxton.C.Fitzpatrick@dartmouth.edu)
In this demonstration, we'll be using the BrainIAK Python toolbox to apply Hierarchical Topographic Factor Analysis (HTFA) to an fMRI dataset.
The demo will comprise three main steps: loading an example fMRI dataset, fitting HTFA to the data, and visualizing the resulting network dynamics. For additional background on the models used here, see the following papers:
Manning JR, Ranganath R, Norman KA, Blei DM (2014). Topographic Factor Analysis: a Bayesian model for inferring brain networks from neural data. PLoS One, 9(5): e94914. Describes a single-subject model (TFA) for inferring brain network hubs and applies it to a semantic decoding dataset.
Manning JR, Zhu X, Willke TL, Ranganath R, Stachenfeld K, Hasson U, Blei DM, Norman KA (2018). A probabilistic approach to discovering dynamic full-brain functional connectivity patterns. NeuroImage, 180: 243-252. Describes a multi-subject (hierarchical) model (HTFA) for inferring shared brain network hubs and applies it to a story listening and movie viewing dataset.
Owen LLW, Chang TH, Manning JR (2020). High-level cognition during story listening is reflected in high-order dynamic correlations in neural activity patterns. bioRxiv. Describes a model for inferring network dynamics from timeseries data and applies it to HTFA fits to a story listening dataset.
The easiest way to run this notebook is to install Docker on your local machine and then build the Docker image in this folder. Building the image installs the necessary toolboxes and dependencies, and also downloads the data you'll be analyzing. Follow the instructions for your platform to download and install Docker, then start the Docker Desktop application.
Once Docker is installed and running, build the Docker image by navigating to this folder and running:
docker build --rm --force-rm -t htfa .
To start the image for the first time, run:
docker run -it -p 8888:8888 --name htfa -v $PWD:/mnt htfa
and on subsequent times, run:
docker start htfa && docker attach htfa
When the container starts, it automatically launches a Jupyter notebook server. Copy and paste the third link printed to the terminal into a browser to interact with this notebook.
To stop running the container, run:
docker stop htfa
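If you later want to rebuild the image from scratch (for example, after the code or data change), you can optionally remove the stopped container and the image first. The commands below assume the container and image names used in the steps above:
docker rm htfa
docker rmi htfa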
Run the cells below (in sequence) to load in the example dataset, fit HTFA to the data, and visualize the resulting network dynamics.
Import libraries and helper functions and load the dataset. The dataset we'll be analyzing is a subset of the story listening dataset collected by Simony et al. (2016).
# the next two lines suppress a (meaningless) warning about the graphics backend
import warnings
warnings.simplefilter('ignore')
# data science and visualization libraries
import numpy as np
import pandas as pd
import nibabel as nb
import nilearn as nl
import nltools as nlt
import timecorr as tc
import seaborn as sns
from IPython.display import HTML
# system libraries
from mpi4py import MPI
import os, sys
from glob import glob as lsdir
import pickle as pkl
# brainiak
from brainiak.factoranalysis.htfa import HTFA
# convenience functions from helpers.py
from helpers import nii2cmu, cmu2nii, animate_chord, animate_connectome, opts, opts2str, htfa2dict, dict2htfa, local_params, global_params, plot_nodes
# set to True for a faster run (~20ish minutes)
# set to False for a more accurate run (overnight on a powerful MPI-enabled multi-processor
# machine or several days on a recent single-core machine or VM)
debug_mode = True
# data (change paths if running outside of Docker)
intact = lsdir('/data/Pieman2/sub-*/func/*intact*.nii.gz')
scrambled = lsdir('/data/Pieman2/sub-*/func/*word*.nii.gz')
if debug_mode:
    intact = intact[:3]
    scrambled = scrambled[:3]
combined_fnames = intact + scrambled
First we need to convert the dataset into CMU format. Consistent with CMU format, nilearn expects data matrices with number-of-timepoints rows and number-of-voxels columns. BrainIAK expects the transpose of that format: number-of-voxels by number-of-timepoints matrices. We can easily convert between the two formats as shown below.
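For reference, the conversion performed by nii2cmu (defined in helpers.py) looks roughly like the sketch below. The exact implementation may differ, but the idea is to extract a timepoints-by-voxels data matrix ('Y') together with each voxel's scanner-space coordinates ('R'), which HTFA uses to place network nodes in the brain:
# minimal sketch of a NIfTI-to-CMU conversion (the real helper is nii2cmu in helpers.py)
import numpy as np
import nibabel as nb

def nii_to_cmu_sketch(fname):
    img = nb.load(fname)
    data = img.get_fdata()                    # x-by-y-by-z-by-time array
    mask = np.std(data, axis=3) > 0           # keep voxels with non-constant signal
    vox_ijk = np.array(np.nonzero(mask)).T    # voxel indices (n_voxels by 3)
    Y = data[mask].T                          # timepoints by voxels
    R = nb.affines.apply_affine(img.affine, vox_ijk)  # voxel coordinates in scanner space
    return {'Y': Y, 'R': R}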
After wrangling the data, we'll fit HTFA to the full dataset to identify network nodes.
# convert data (.nii or .nii.gz files) to CMU format
cmu_data = [nii2cmu(f) for f in combined_fnames]
# convert to HTFA format
htfa_data = [{'R': x['R'], 'Z': x['Y'].T} for x in cmu_data]
# configure MPI:
# - if the local environment supports MPI, do computations in parallel (fast!)
# - if MPI is not supported, do computations in serial (slower)
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
if rank == 0:
    import logging
    logging.basicConfig(stream=sys.stdout, level=logging.INFO)
# set up HTFA model
params = opts(debug=debug_mode)
nvoxels, ntimepoints = htfa_data[0]['Z'].shape
# API specification: https://brainiak.org/docs/brainiak.factoranalysis.html#module-brainiak.factoranalysis.htfa
htfa = HTFA(K=params['K'],
            n_subj=len(htfa_data),
            max_global_iter=params['max_global_iter'],  # decrease for speed, increase for greater accuracy (min: 1)
            max_local_iter=params['max_local_iter'],    # decrease for speed, increase for greater accuracy (min: 1)
            voxel_ratio=params['voxel_ratio'],          # decrease for speed, increase for greater accuracy (positive; max: 1.0)
            tr_ratio=params['tr_ratio'],                # decrease for speed, increase for greater accuracy (positive; max: 1.0)
            max_voxel=int(nvoxels * params['max_voxel_scale']),   # decrease for speed, increase for greater accuracy (max: number of voxels)
            max_tr=int(ntimepoints * params['max_tr_scale']),     # decrease for speed, increase for greater accuracy (max: number of TRs)
            verbose=params['verbose'])
# load or save a pickled model
savedir = os.path.join('/data', 'htfa')
if not os.path.exists(savedir):
    os.makedirs(savedir)

htfa_fname = os.path.join(savedir, 'htfa_' + opts2str(params) + '.pkl')

if not os.path.exists(htfa_fname):
    # fit HTFA to htfa_data and save the result
    htfa.fit([x['Z'] for x in htfa_data], [x['R'] for x in htfa_data])
    with open(htfa_fname, 'wb') as f:
        pkl.dump(htfa2dict(htfa), f)

with open(htfa_fname, 'rb') as f:
    htfa_dict = pkl.load(f)
htfa = dict2htfa(htfa_dict)  # overwrite htfa object with saved copy
We'll generate a plot where the global node locations are shown in black, and each subject's "local" node locations are shown in color (each subject is assigned a different color). The nodes should be in similar (but not identical) locations across subjects. Note: if the number of nodes or iterations is small, or if the voxel and/or timepoint subsampling is high, the final result will tend to be close to the initialized values. Increase max_global_iter, max_local_iter, max_voxel, and max_tr to achieve greater accuracy.
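For example, one way to trade speed for accuracy is to start from the default options and bump up individual settings before rebuilding and re-fitting the model. This is just a sketch: the keys match the parameters passed to the HTFA constructor above, and re-running the fitting cell with the modified options would cache the new fit under a different filename.
# sketch: increase accuracy-related settings before re-fitting (slower!)
params = opts(debug=debug_mode)
params['max_global_iter'] *= 2   # more updates to the shared (global) template
params['max_local_iter'] *= 2    # more updates to each subject's local nodes
params['max_voxel_scale'] = 1.0  # use every voxel when estimating node placements
params['max_tr_scale'] = 1.0     # use every timepoint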
n_timepoints = [x['Z'].shape[1] for x in htfa_data] # number of timepoints for each person
plot_nodes(htfa, n_timepoints, cmap='Spectral', global_scale=100, local_scale=25)
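As a quick sanity check, we can also quantify how far each subject's local node centers sit from the shared global template. The sketch below assumes that local_params returns one K-by-3 array of node centers per subject, with nodes in the same order as the global template returned by global_params:
# sketch: per-subject displacement of local node centers from the global template
centers, widths, weights = local_params(htfa, n_timepoints)
global_centers, global_widths = global_params(htfa)
for s, c in enumerate(centers):
    # Euclidean distance between each local node center and its global counterpart
    d = np.sqrt(np.sum((np.array(c) - np.array(global_centers)) ** 2, axis=1))
    print(f'subject {s}: mean node displacement = {d.mean():.2f}')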
The timeseries of node activations for each participant provide a low-dimensional embedding of the original data that we can use to efficiently examine dynamic connectivity patterns. Obtaining these embeddings requires some data wrangling.
centers, widths, weights = local_params(htfa, n_timepoints)
# split the weights into intact vs. (word) scrambled subjects
intact_weights = [w for i, w in enumerate(weights) if 'intact' in combined_fnames[i]]
scrambled_weights = [w for i, w in enumerate(weights) if 'word' in combined_fnames[i]]
# compute dynamic ISFC for intact, (word) scrambled
intact_disfc = tc.timecorr(intact_weights, cfun=tc.isfc, combine=tc.corrmean_combine)
scrambled_disfc = tc.timecorr(scrambled_weights, cfun=tc.isfc, combine=tc.corrmean_combine)
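Before visualizing the results, we can get a rough sense of how the two conditions relate by correlating corresponding timepoints' connectivity patterns (each row of a dISFC matrix holds one timepoint's vectorized node-by-node correlations). This is just an illustrative summary, not part of the original analysis:
# sketch: timepoint-by-timepoint similarity of intact vs. scrambled connectivity patterns
n = min(intact_disfc.shape[0], scrambled_disfc.shape[0])
pattern_similarity = np.array([np.corrcoef(intact_disfc[t], scrambled_disfc[t])[0, 1]
                               for t in range(n)])
print(f'mean intact vs. scrambled connectivity pattern similarity: {pattern_similarity.mean():.3f}')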
The cells below generate interactive figures. After running each cell, move the sliders to change which timepoints are displayed. The next cell generates a figure that displays network dynamics during the "intact" story listening experimental condition, and the subsequent cell generates a figure that displays network dynamics during the "word-scrambled" experimental condition.
# intact
if debug_mode:
    pthresh = 50
    max_frames = np.min([20, intact_disfc.shape[0], scrambled_disfc.shape[0]])
else:
    pthresh = 95
    max_frames = np.min([intact_disfc.shape[0], scrambled_disfc.shape[0]])
cthresh = np.min([np.percentile(np.abs(scrambled_disfc), pthresh), np.percentile(np.abs(intact_disfc), pthresh)])
max_frames = np.min([max_frames, 100]) # comment out this line to display the full animation (takes a long time!)
animate_chord(intact_disfc[:max_frames, :], cthresh=cthresh)
# scrambled
animate_chord(scrambled_disfc[:max_frames, :], cthresh=cthresh)
Run the cells below to generate the animations. Individual animated frames may be found in the intact_frames and scrambled_frames sub-folders of this directory. The frames are stitched together into an mp4 file so that the animation can be displayed in the notebook. You can right-click on the animations to save the files.
The next cell generates an animation for the "intact" experimental condition, and the subsequent cell generates an animation for the "word-scrambled" experimental condition.
# intact
global_centers, global_widths = global_params(htfa)
intact_ani = animate_connectome(global_centers, intact_disfc[:max_frames, :], cthresh=cthresh, figdir='intact_frames')
HTML(intact_ani.to_html5_video())
# scrambled
scrambled_ani = animate_connectome(global_centers, scrambled_disfc[:max_frames, :], cthresh=cthresh, figdir='scrambled_frames')
HTML(scrambled_ani.to_html5_video())
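If you'd like standalone copies of the videos (in addition to right-clicking on the in-notebook animations), matplotlib's Animation.save can write them to disk directly. This assumes ffmpeg is available (it is installed in the Docker image); the filenames below are arbitrary:
# sketch: save the animations as mp4 files
intact_ani.save('intact_connectome.mp4', writer='ffmpeg')
scrambled_ani.save('scrambled_connectome.mp4', writer='ffmpeg')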
Using HTFA, we were able to quickly and easily examine and compare dynamic network patterns in a large fMRI dataset using only modest computing resources. The resulting networks are intuitive and straightforward to visualize.