brainiak.utils package

Utilities used by multiple subpackages.


brainiak.utils.fmrisim module

fMRI Simulator

Simulate fMRI data for a single subject.

This code provides a set of functions necessary to produce realistic simulations of fMRI data. There are two main steps: characterizing the signal and generating the noise model, which are then combined to simulate brain data. Tools are included to support the creation of different types of signal, such as region specific differences in univariate activity. To create the noise model, the parameters can either be set manually or be estimated from real fMRI data with reasonable accuracy (this works best when the fMRI data has not been preprocessed).


generate_signal: Create a volume with activity, of a specified shape and either multivariate or univariate pattern, in a specific region to represent the signal in the neural data.

generate_stimfunction: Create a timecourse of the signal activation. This can be specified using event onsets and durations from a timing file. This is the time course before convolution and therefore can be at any temporal precision.

export_3_column: Generate a three column timing file that can be used with software like FSL to represent event onsets and durations.

export_epoch_file: Generate an epoch file from the time course, which can be used as an input to brainiak functions.

convolve_hrf: Convolve the signal timecourse with the HRF to model the expected evoked activity.

apply_signal: Combine the signal volume with the HRF, thus giving the signal the temporal properties of the HRF (such as smoothing and lag).

calc_noise: Estimate the noise properties of a given fMRI volume. Prominently, estimate the smoothing and SFNR of the data.

generate_noise: Create the noise for this run. This creates temporal, spatial, task and white noise. Various parameters can be tuned depending on need.

mask_brain: Create a mask volume that has similar contrast to an fMRI image. Defaults to using an MNI grey matter atlas, but any image can be supplied to create an estimate.

plot_brain: Display the brain, timepoint by timepoint, with above threshold voxels highlighted against the outline of the brain.

Authors: Cameron Ellis (Princeton) 2016-2017 Chris Baldassano (Princeton) 2016-2017
brainiak.utils.fmrisim.generate_signal(dimensions, feature_coordinates, feature_size, feature_type, signal_magnitude=[1], signal_constant=1)

Generate volume containing signal

Generate signal, of a specific shape in specific regions, for a single volume. This will then be convolved with the HRF across time

  • dimensions (1d array, ndarray) – What are the dimensions of the volume you wish to create
  • feature_coordinates (multidimensional array) – What are the feature_coordinates of the signal being created. Be aware of clipping: features far from the centre of the brain will be clipped. If you wish to have multiple features then list these as a features x 3 array. To create a feature of a unique shape then supply all the individual feature_coordinates of the shape and set the feature_size to 1.
  • feature_size (list, int) – How big is the signal? If there is only one feature then only one value is accepted; if there are m > 1 features then either one value (used for every feature) or m values must be supplied
  • feature_type (list, string) – What feature_type of signal is being inserted? Options are cube, loop, cavity, sphere. If there is only one feature then only one value is accepted; if there are m > 1 features then either one value (used for every feature) or m values must be supplied
  • signal_magnitude (list, float) – What is the (average) magnitude of the signal being generated? A value of 1 means that the signal is one standard deviation from the noise
  • signal_constant (list, bool) – Is the signal constant across the feature (for univariate activity) or is it a random pattern of a given magnitude across the feature (for multivariate activity)

volume_signal – Creates a single volume containing the signal

Return type:

3 dimensional array, float
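To make the idea of a feature and the clipping warning concrete, here is a minimal NumPy sketch that places a constant cube in an empty volume. The helper name cube_signal_sketch and its arguments are hypothetical and only illustrate the concept; this is not how generate_signal is implemented.

```python
import numpy as np


def cube_signal_sketch(dimensions, center, size, magnitude=1.0):
    """Hypothetical helper: put a constant cube of side `size` around `center`."""
    volume = np.zeros(dimensions)
    half = size // 2
    # Clamp each slice to the volume bounds, so features near an edge are clipped
    slices = tuple(
        slice(max(c - half, 0), min(c - half + size, d))
        for c, d in zip(center, dimensions)
    )
    volume[slices] = magnitude
    return volume


cube = cube_signal_sketch((10, 10, 10), center=(5, 5, 5), size=3)
clipped = cube_signal_sketch((10, 10, 10), center=(0, 0, 0), size=3)
print(cube.sum(), clipped.sum())  # the corner feature keeps fewer voxels
```

The second call shows clipping: a 3 x 3 x 3 feature centred on a corner voxel only keeps the part that lies inside the volume.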

brainiak.utils.fmrisim.generate_stimfunction(onsets, event_durations, total_time, weights=[1], timing_file=None, temporal_resolution=100.0)

Return the function for the timecourse events

When do stimuli onset, how long for and to what extent should you resolve the fMRI time course. There are two ways to create this, either by supplying onset, duration and weight information or by supplying a timing file (in the three column format used by FSL).

  • onsets (list, int) – What are the timestamps (in s) for when an event you want to generate onsets?
  • event_durations (list, int) – What are the durations (in s) of the events you want to generate? If there is only one value then this will be assigned to all onsets
  • total_time (int) – How long (in s) is the experiment in total.
  • weights (list, float) – What is the weight for each event (how high is the box car)? If there is only one value then this will be assigned to all onsets
  • timing_file (string) – The filename (with path) to a three column timing file (FSL) to make the events. Still requires total_time to work
  • temporal_resolution (float) – How many elements per second are you modeling for the timecourse. This is useful when you want to model the HRF at an arbitrarily high resolution (and then downsample to your TR later).

stim_function – The time course of stimulus evoked activation. This is sampled at temporal_resolution elements per second

Return type:

1 by timepoint array, float
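The boxcar described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not brainiak's implementation: the helper name boxcar_sketch is hypothetical, and the output is laid out as a timepoint by 1 column (the layout that export_3_column expects).

```python
import numpy as np


def boxcar_sketch(onsets, durations, total_time, weights=(1.0,),
                  temporal_resolution=100.0):
    """Hypothetical boxcar builder: one weighted box per event."""
    n_events = len(onsets)
    # A single duration or weight is assigned to every onset
    if len(durations) == 1:
        durations = list(durations) * n_events
    if len(weights) == 1:
        weights = list(weights) * n_events
    n_samples = int(round(total_time * temporal_resolution))
    stim = np.zeros((n_samples, 1))
    for onset, duration, weight in zip(onsets, durations, weights):
        start = int(round(onset * temporal_resolution))
        stop = int(round((onset + duration) * temporal_resolution))
        stim[start:stop, 0] = weight
    return stim


# Two 3 s events at 2 s and 10 s in a 20 s run, 10 samples per second
stim = boxcar_sketch(onsets=[2, 10], durations=[3], total_time=20,
                     temporal_resolution=10.0)
print(stim.shape)
```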

brainiak.utils.fmrisim.export_3_column(stimfunction, filename, temporal_resolution=100.0)

Output a tab separated three column timing file

This produces a three column tab separated text file, with the three columns representing onset time (s), event duration (s) and weight, respectively. Useful if you want to run the simulated data through FEAT analyses. In a way, this is the reverse of generate_stimfunction

  • stimfunction (timepoint by 1 array) – The stimulus function describing the time course of events. For instance output from generate_stimfunction.
  • filename (str) – The name of the three column text file to be output
  • temporal_resolution (float) – How many elements per second are you modeling with the stimfunction?
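
Going from a stimulus function back to onset, duration and weight rows amounts to finding runs of constant non-zero value. The sketch below shows one way to do that in plain Python; the helper name events_from_stimfunction is hypothetical, the input is a flat sequence rather than a timepoint by 1 array, and nothing is written to disk.

```python
def events_from_stimfunction(stimfunction, temporal_resolution=100.0):
    """Hypothetical helper: return (onset_s, duration_s, weight) per event."""
    rows = []
    start = None
    previous = 0.0
    # The trailing 0.0 is a sentinel that closes an event running to the end
    for idx, value in enumerate(list(stimfunction) + [0.0]):
        if value != previous:
            if previous != 0:
                rows.append((start / temporal_resolution,
                             (idx - start) / temporal_resolution,
                             previous))
            start = idx if value != 0 else None
            previous = value
    return rows


rows = events_from_stimfunction([0, 0, 1, 1, 1, 0, 2, 2, 0, 0],
                                temporal_resolution=1.0)
# Tab separated lines in onset, duration, weight order
lines = ["{:.3f}\t{:.3f}\t{:.3f}".format(*row) for row in rows]
print("\n".join(lines))
```

Note that two adjacent events with the same weight and no gap between them would be merged into one row by this approach.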
brainiak.utils.fmrisim.export_epoch_file(stimfunction, filename, tr_duration, temporal_resolution=100.0)

Output an epoch file, necessary for some inputs into brainiak

This takes in the time course of stimulus events and outputs the epoch file used in Brainiak. The epoch file is a way to structure the timing information in fMRI that allows you to flexibly input different stimulus sequences. This is a list with each entry a 3d matrix corresponding to a participant. The dimensions of the 3d matrix are condition by epoch by time

  • stimfunction (list of timepoint by condition arrays) – The stimulus function describing the time course of events. Each list entry is from a different participant, each row is a different timepoint (with the given temporal precision), each column is a different condition. export_epoch_file is looking for differences in the value of stimfunction to identify the start and end of an epoch. If epochs in stimfunction are coded with the same weight and there is no time between blocks then export_epoch_file won’t be able to label them as different epochs
  • filename (str) – The name of the epoch file to be output
  • tr_duration (float) – How long is each TR in seconds
  • temporal_resolution (float) – How many elements per second are you modeling with the stimfunction?
brainiak.utils.fmrisim.convolve_hrf(stimfunction, tr_duration, hrf_type='double_gamma', scale_function=True, temporal_resolution=100.0)

Convolve the specified hrf with the timecourse. The output of this is a downsampled convolution of the stimfunction and the HRF function. If temporal_resolution is 1 / tr_duration then the output will be the same length as stimfunction. This time course assumes that slice time correction has occurred and all slices have been aligned to the middle time point in the TR.

Be aware that if scaling is on and event durations are less than the duration of a TR then the hrf may or may not come out as anticipated. This is because very short events would evoke a small absolute response after convolution but if there are only short events and you scale then this will look similar to a convolution with longer events. In general scaling is useful, which is why it is the default, but be aware of this edge case and if it is a concern, set the scale_function to false.

  • stimfunction (timepoint by timecourse array) – What is the time course of events to be modelled in this experiment. This can specify one or more timecourses of events. The events can be weighted or binary
  • tr_duration (float) – How long (in s) between each volume onset
  • hrf_type (str or list) – Takes in a string describing the hrf that ought to be created. Can instead take in a vector describing the HRF as it was specified by any function. The default is ‘double_gamma’ in which an initial rise and an undershoot are modelled.
  • scale_function (bool) – Do you want to scale the function to a range of 1
  • temporal_resolution (float) – How many elements per second are you modeling for the stimfunction

signal_function – The time course of the HRF convolved with the stimulus function. This can have multiple time courses specified as different columns in this array.

Return type:

timepoint by timecourse array
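
The convolve, then downsample, then optionally scale sequence can be sketched as follows. Everything here is an assumption made for illustration: the function names are hypothetical, the HRF is a generic difference of two gamma-like curves (its parameter values are borrowed from the hrf_para defaults listed for gen_design), the HRF is truncated at 30 s, and scaling is taken to mean division by the maximum. It handles a single 1D timecourse only.

```python
import numpy as np


def double_gamma_sketch(t, response_delay=6.0, undershoot_delay=12.0,
                        response_dispersion=0.9, undershoot_dispersion=0.9,
                        undershoot_scale=0.035):
    """Illustrative double-gamma shape: initial rise minus a small undershoot."""
    response_peak = response_delay * response_dispersion
    undershoot_peak = undershoot_delay * undershoot_dispersion
    response = ((t / response_peak) ** response_delay
                * np.exp(-(t - response_peak) / response_dispersion))
    undershoot = ((t / undershoot_peak) ** undershoot_delay
                  * np.exp(-(t - undershoot_peak) / undershoot_dispersion))
    return response - undershoot_scale * undershoot


def convolve_and_downsample(stim, tr_duration, temporal_resolution=100.0,
                            scale=True):
    """Hypothetical helper: convolve a 1D stimfunction and sample once per TR."""
    t = np.arange(0, 30, 1.0 / temporal_resolution)  # assumed 30 s HRF window
    full = np.convolve(stim, double_gamma_sketch(t))[:len(stim)]
    stride = int(round(tr_duration * temporal_resolution))
    sampled = full[stride // 2::stride]  # take the middle of each TR
    if scale and sampled.max() > 0:
        sampled = sampled / sampled.max()
    return sampled


stim = np.zeros(400)   # 40 s at 10 samples per second
stim[20:50] = 1.0      # one 3 s event starting at 2 s
bold = convolve_and_downsample(stim, tr_duration=2.0, temporal_resolution=10.0)
print(bold.shape)
```

Because the output is divided by its own maximum, a run containing only very short events would come out with the same peak height as a run with long events, which is the edge case the note on scaling warns about.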

brainiak.utils.fmrisim.apply_signal(signal_function, volume_signal)

Combine the signal volume with its timecourse

Apply the convolution of the HRF and stimulus time course to the volume.

  • signal_function (timepoint by timecourse array, float) – The timecourse of the signal over time. If there is only one column then the same timecourse is applied to all non-zero voxels in volume_signal. If there is more than one column then each column is paired with a non-zero voxel in the volume_signal (a 3d numpy array generated in generate_signal).
  • volume_signal (multi dimensional array, float) – The volume containing the signal to be convolved, with the same dimensions as the output volume. The elements in volume_signal indicate how strongly each signal in signal_function is modulated in the output volume

signal – The convolved signal volume with the same 3d as volume signal and the same 4th dimension as signal_function

Return type:

multidimensional array, float
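
For the single-column case, combining a volume with a timecourse is a broadcast multiplication. The snippet below is a sketch of that idea in NumPy with made-up values; it does not cover the case where each non-zero voxel has its own column, and it is not brainiak's implementation.

```python
import numpy as np

# A 4 x 4 x 4 volume with a 2 x 2 x 2 feature of magnitude 2
volume_signal = np.zeros((4, 4, 4))
volume_signal[1:3, 1:3, 1:3] = 2.0

# One timecourse (three timepoints, one column)
signal_function = np.array([[0.0], [0.5], [1.0]])

# Append a time axis to the volume and scale it by the timecourse
signal = volume_signal[..., np.newaxis] * signal_function[:, 0]
print(signal.shape)  # three spatial dimensions plus time
```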

brainiak.utils.fmrisim.calc_noise(volume, mask=None, noise_dict=None)

Calculates the noise properties of the volume supplied. This estimates what noise properties the volume has. For instance it determines the spatial smoothness, the autoregressive noise, system noise etc. Read the doc string for generate_noise to understand how these different types of noise interact.

  • volume (4d numpy array, float) – A functional volume (either the file name or the numpy array) whose noise properties are to be estimated
  • mask (3d numpy array, binary) – A binary mask of the brain, the same size as the volume

noise_dict – Return a dictionary of the calculated noise parameters of the provided dataset

Return type:


brainiak.utils.fmrisim.generate_noise(dimensions, stimfunction_tr, tr_duration, template, mask=None, noise_dict=None)

Generate the noise to be added to the signal. Default noise parameters will create a noise volume with a standard deviation of 0.1 (where the signal defaults to a value of 1). This has built-in estimates of how different types of noise mix. All noise values can be set by the user or estimated with calc_noise.

  • dimensions (nd array) – What is the shape of the volume to be generated
  • stimfunction_tr (Iterable, list) – When do the stimuli events occur. Each element is a TR
  • tr_duration (float) – What is the duration, in seconds, of each TR?
  • template (3d array, float) – A continuous (0 -> 1) volume describing the likelihood a voxel is in the brain. This can be used to contrast the brain and non brain.
  • mask (3d array, binary) – The mask of the brain volume, distinguishing brain from non-brain
  • noise_dict (dictionary, float) – This is a dictionary which describes the noise parameters of the data. If there are no other variables provided then it will use default values

noise – Generates the noise volume for these parameters

Return type:

multidimensional array, float

brainiak.utils.fmrisim.mask_brain(volume, template_name=None, mask_threshold=None, mask_self=0)

Mask the simulated volume. This creates a mask specifying, approximately, the likelihood that a voxel is part of the brain. All values are bounded to the range of 0 to 1. An appropriate threshold to isolate brain voxels is >0.2

  • volume (multidimensional array) – Either numpy array of a volume or a tuple describing the dimensions of the mask to be created
  • template_name (str) – What is the path to the template to be loaded? If empty then it defaults to an MNI152 grey matter mask. This is ignored if mask_self is True.
  • mask_threshold (float) – What is the threshold (0 -> 1) for including a voxel in the mask? If None then the program will try to identify the last wide peak in a histogram of the template (assumed to be the brain voxels) and take the minimum before that peak as the threshold. This won’t work when the data is not bimodal.
  • mask_self (bool) – If set to true then it makes a mask from the volume supplied (by averaging across time points and changing the range).

  • mask (3 dimensional array, binary) – The masked brain, thresholded to distinguish brain and non-brain
  • template (3 dimensional array, float) – A continuous (0 -> 1) volume describing the likelihood a voxel is in the brain. This can be used to contrast the brain and non brain.

brainiak.utils.fmrisim.plot_brain(fig, brain, mask=None, percentile=99)

Display the brain that has been generated with a given threshold. This will display the voxels above the given percentile and then a shadow of all voxels in the mask

  • fig (matplotlib object) – The figure to be displayed, generated from matplotlib. import matplotlib.pyplot as plt; fig = plt.figure()
  • brain (3d array) – This is a 3d array with the neural data
  • mask (3d array) – A binary mask describing the location that you want to specify as
  • percentile (float) – What percentage of voxels will be included? Based on the values supplied

ax – Object with the information to be plotted

Return type:

matplotlib object

brainiak.utils.utils module

class brainiak.utils.utils.ReadDesign(fname=None, include_orth=True, include_pols=True)

Bases: object

A class which can read in a design matrix from a .1D file generated by AFNI’s 3dDeconvolve.
  • fname (string, the address of the file to read.) –
  • include_orth (Boolean, whether to include "orthogonal" regressors in) – the nuisance regressors which are usually head motion parameters. All the columns of the design matrix are still going to be read in, but the attribute cols_used will reflect whether these orthogonal regressors are to be included for further analysis. Note that these are not entered into the design_task attribute, which includes only regressors related to task conditions.
  • include_pols (Boolean, whether to include polynomial regressors in) – the nuisance regressors which are used to capture slow drift of signals.

2d array. The design matrix read in from the csv file.


2d array. The part of design matrix corresponding to – task conditions.


number of total columns in the design matrix.


1d array. the types of each column in the design matrix. – 0 for orthogonal regressors (usually head motion parameters), -1 for polynomial basis (capturing slow drift of signals), values > 0 for stimulus conditions


scalar. The number of polynomial bases in the design matrix.


scalar. The number of stimulus conditions.


scalar. The number of orthogonal regressors (usually head – motions)


list. The names of each column in the design matrix.

brainiak.utils.utils.center_mass_exp(interval, scale=1.0)
Calculate the center of mass of negative exponential distribution
p(x) = exp(-x / scale) / scale in the interval of (interval_left, interval_right). scale is the same scale parameter as scipy.stats.expon.pdf
  • interval (size 2 tuple, float) – interval must be in the form of (interval_left, interval_right), where interval_left/interval_right is the starting/end point of the interval in which the center of mass is calculated for exponential distribution. Note that interval_left must be non-negative, since exponential is not supported in the negative domain, and interval_right must be bigger than interval_left (thus positive) to form a well-defined interval.
  • scale (float, positive) – The scale parameter of the exponential distribution. See above.

m – The center of mass in the interval of (interval_left, interval_right) for exponential distribution.
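
For reference, integrating x * p(x) over the interval and dividing by the probability mass in the interval gives a closed form: m = interval_left + scale - width / (exp(width / scale) - 1), where width = interval_right - interval_left. The sketch below evaluates that expression in plain Python. The function name is hypothetical and the code is derived from the definition above, not copied from brainiak.

```python
import math


def center_mass_exp_sketch(interval, scale=1.0):
    """Center of mass of exp(-x / scale) / scale restricted to (left, right)."""
    left, right = interval
    if not (0 <= left < right) or scale <= 0:
        raise ValueError("need 0 <= left < right and scale > 0")
    if math.isinf(right):
        # Memoryless property: the tail beyond `left` has mean left + scale
        return left + scale
    width = right - left
    # expm1 keeps precision when the interval is narrow relative to scale
    return left + scale - width / math.expm1(width / scale)


print(center_mass_exp_sketch((0.0, 1.0)))
print(center_mass_exp_sketch((2.0, math.inf), scale=3.0))
```

The result always lies below the midpoint of the interval, because the density decreases with x.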

Return type:


brainiak.utils.utils.concatenate_not_none(l, axis=0)

Construct a numpy array by stacking not-None arrays in a list

  • data (list of arrays) – The list of arrays to be concatenated. Arrays have same shape in all but one dimension or are None, in which case they are ignored.
  • axis (int, default = 0) – Axis for the concatenation

data_stacked – The resulting concatenated array.

Return type:


Calculate the correlation matrix based on a covariance matrix
Parameters:cov (2D array) –
Returns:corr – correlation converted from the covariance matrix
Return type:2D array
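
The conversion divides each covariance entry by the product of the two corresponding standard deviations. A minimal NumPy sketch (hypothetical function name, not brainiak's code):

```python
import numpy as np


def cov_to_corr_sketch(cov):
    """corr[i, j] = cov[i, j] / (std[i] * std[j])."""
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))  # standard deviations from the diagonal
    return cov / np.outer(std, std)


corr = cov_to_corr_sketch([[4.0, 2.0], [2.0, 9.0]])
print(corr)
```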

Empirical cumulative distribution function

Given a 1D array of values, returns a function f(q) that outputs the fraction of values less than or equal to q.

Parameters:x (1D array) – values for which to compute CDF
Returns:ecdf_fun – function that returns the value of the CDF at a given point
Return type:Callable[[float], float]
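
One way to build such a function is to sort the values once and then count, by binary search, how many are less than or equal to q. The sketch below uses only the standard library; the name ecdf_sketch is hypothetical.

```python
from bisect import bisect_right


def ecdf_sketch(x):
    """Return f(q) = fraction of values in x that are <= q."""
    values = sorted(x)
    n = len(values)

    def ecdf_fun(q):
        # bisect_right counts the elements less than or equal to q
        return bisect_right(values, q) / n

    return ecdf_fun


f = ecdf_sketch([3, 1, 2, 2])
print(f(0), f(1.5), f(2), f(3))
```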
Convert a 2D symmetric matrix to an upper triangular matrix in 1D format
Parameters:symm (2D array) – Symmetric matrix
Returns:tri – Contains elements of upper triangular matrix
Return type:1D array
brainiak.utils.utils.from_tri_2_sym(tri, dim)
Convert an upper triangular matrix in 1D format to a 2D symmetric matrix
  • tri (1D array) – Contains elements of upper triangular matrix
  • dim (int) – The dimension of target matrix.

symm – Symmetric matrix in shape=[dim, dim]

Return type:

2D array
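
The two conversions are inverses of each other. The NumPy sketch below shows the round trip under an assumption that is not stated in this reference: the 1D format is taken to hold the upper triangle in row-major order, diagonal included. The function names are hypothetical.

```python
import numpy as np


def sym_to_tri_sketch(symm):
    """Flatten the upper triangle (with diagonal), row by row."""
    symm = np.asarray(symm)
    return symm[np.triu_indices(symm.shape[0])]


def tri_to_sym_sketch(tri, dim):
    """Rebuild the dim x dim symmetric matrix from its upper triangle."""
    tri = np.asarray(tri)
    symm = np.zeros((dim, dim), dtype=tri.dtype)
    rows, cols = np.triu_indices(dim)
    symm[rows, cols] = tri
    symm[cols, rows] = tri  # mirror across the diagonal
    return symm


symm = np.array([[1, 2, 3],
                 [2, 4, 5],
                 [3, 5, 6]])
tri = sym_to_tri_sketch(symm)
restored = tri_to_sym_sketch(tri, dim=3)
print(tri)
```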

brainiak.utils.utils.gen_design(stimtime_files, scan_duration, TR, style='FSL', temp_res=0.01, hrf_para={'undershoot_dispersion': 0.9, 'undershoot_delay': 12, 'response_dispersion': 0.9, 'undershoot_scale': 0.035, 'response_delay': 6})
Generate design matrix based on a list of names of stimulus timing files. The function will read each file, and generate a numpy array of size [time_points * condition], where time_points equals scan_duration / TR, and condition is the number of files in stimtime_files. Each column is the hypothetical fMRI response based on the stimulus timing in the corresponding file of stimtime_files. This function uses generate_stimfunction and double_gamma_hrf of brainiak.utils.fmrisim.
  • stimtime_files (a string or a list of string.) – Each string is the name of the file storing the stimulus timing information of one task condition. The contents of the files will be interpreted based on the style parameter. Details are explained under the style parameter.
  • scan_duration (float or a list (or a 1D numpy array) of numbers.) – Total duration of each fMRI scan, in unit of seconds. If there are multiple runs, the duration should be a list (or 1-d numpy array) of numbers. If it is a list, then each number in the list represents the duration of the corresponding scan in the stimtime_files. If only a number is provided, it is assumed that there is only one fMRI scan lasting for scan_duration.
  • TR (float.) – The sampling period of fMRI, in unit of seconds.
  • style (string, default: 'FSL') –

    Acceptable inputs: ‘FSL’, ‘AFNI’. The formatting style of the stimtime_files. ‘FSL’ style has one line for each event of the same condition. Each line contains three numbers. The first number is the onset of the event relative to the onset of the first scan, in units of seconds. (Multiple scans should be treated as a concatenated long scan for the purpose of calculating onsets. However, the design matrix from one scan won’t leak into the next.) The second number is the duration of the event, in units of seconds. The third number is the amplitude modulation (or weight) of the response. It is acceptable to not provide the weight, or not provide both duration and weight. In such cases, these parameters will default to 1.0. This code will accept timing files with only 1 or 2 columns for convenience, but please note that the FSL package does not allow this.

    ‘AFNI’ style has one line for each scan (run). Each line has a few triplets in the format of stim_onsets*weight:duration (or simpler, see below), separated by spaces. For example, 3.2*2.0:1.5 means that one event starts at 3.2s, is modulated by a weight of 2.0 and lasts for 1.5s. If some run does not include a single event of a condition (stimulus type), then you can put *, or a negative number, or a very large number in that line. Either duration or weight can be omitted. In such cases, they will default to 1.0. For example, 3.0, 3.0*1.0, 3.0:1.0 and 3.0*1.0:1.0 all mean an event starting at 3.0s, lasting for 1.0s, with amplitude modulation of 1.0.

  • temp_res (float, default: 0.01) – Temporal resolution of fMRI, in seconds.
  • hrf_para (dictionary) – The parameters of the double-Gamma hemodynamic response function. To set different parameters, supply a dictionary with the same set of keys as the default, and replace the corresponding values with the new values.

design – design matrix. Each time row represents one TR (fMRI sampling time point) and each column represents one experiment condition, in the order in stimtime_files

Return type:

2D numpy array
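
The ‘AFNI’ token format described under the style parameter can be parsed with two string splits. The sketch below is a plain-Python illustration with hypothetical function names, not gen_design's parser. It treats * and negative onsets as "no event"; the "very large number" placeholder is not handled because this reference gives no threshold for it.

```python
def parse_afni_token(token):
    """Return (onset, duration, weight), or None for a 'no event' placeholder."""
    if token == "*":
        return None
    # Split off ':duration' first, then '*weight' from what remains
    onset_part, _, duration_text = token.partition(":")
    onset_text, _, weight_text = onset_part.partition("*")
    onset = float(onset_text)
    if onset < 0:
        return None
    duration = float(duration_text) if duration_text else 1.0
    weight = float(weight_text) if weight_text else 1.0
    return onset, duration, weight


def parse_afni_line(line):
    """Parse one run (one line) into a list of events."""
    events = [parse_afni_token(token) for token in line.split()]
    return [event for event in events if event is not None]


print(parse_afni_line("3.2*2.0:1.5 10 20:4"))
print(parse_afni_line("*"))
```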

brainiak.utils.utils.p_from_null(X, two_sided=False)

Compute p value of true result from null distribution

Given an array containing both a real result and a set of null results, computes the fraction of null results larger than the real result (or, if two_sided=True, the fraction of null results more extreme than the real result in either the positive or negative direction).

Note that all real results are compared to a pooled null distribution, which is the max/min over all null results, providing multiple comparisons correction.

  • X (ndarray with arbitrary number of dimensions) – The last dimension of X should contain the real result in X[…, 0] and the null results in X[…, 1:]
  • two_sided (bool, default:False) – Whether the p value should be one-sided (testing only for being above the null) or two-sided (testing for both significantly positive and significantly negative values)

p – p values for each true X value under the null distribution

Return type:

ndarray the same shape as X, without the last dimension
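
The pooled null distribution can be illustrated for the one-sided case: for each null sample, take the maximum over all leading dimensions, then compare every real value against that single set of maxima. This is a sketch with a hypothetical function name; it uses a strict "larger than" comparison and omits the two-sided branch, so edge-case behaviour may differ from p_from_null.

```python
import numpy as np


def p_from_null_sketch(X):
    """One-sided p values against the max-pooled null distribution."""
    X = np.asarray(X, dtype=float)
    real = X[..., 0]
    null = X[..., 1:]
    leading_axes = tuple(range(X.ndim - 1))
    # One maximum per null sample, pooled over e.g. all voxels
    max_null = null.max(axis=leading_axes) if leading_axes else null
    # Fraction of pooled null maxima larger than each real value
    return (max_null > real[..., np.newaxis]).mean(axis=-1)


# Two voxels, one real result followed by four null results each
X = np.array([[3.0, 1.0, 2.0, 0.0, 1.0],
              [0.5, 0.0, 1.0, 4.0, 0.2]])
p = p_from_null_sketch(X)
print(p)
```

Here the pooled maxima are [1, 2, 4, 1], so the first voxel is beaten by one of four null maxima and the second by all of them.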

brainiak.utils.utils.phase_randomize(D, random_state=0)

Randomly shift signal phases

For each timecourse (from each voxel and each subject), computes its DFT and then randomly shifts the phase of each frequency before inverting back into the time domain. This yields timecourses with the same power spectrum (and thus the same autocorrelation) as the original timecourses, but will remove any meaningful temporal relationships between the timecourses.

This procedure is described in: Simony E, Honey CJ, Chen J, Lositsky O, Yeshurun Y, Wiesel A, Hasson U (2016) Dynamic reconfiguration of the default mode network during narrative comprehension. Nat Commun 7.

  • D (voxel by time by subject ndarray) – fMRI data to be phase randomized
  • random_state (RandomState or an int seed (0 by default)) – A random number generator instance to define the state of the random permutations generator.

phase randomized timecourses

Return type:

ndarray of same shape as D
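
For a single 1D timecourse the procedure can be sketched with NumPy's real FFT. This is an illustration only: the function name is hypothetical, it handles one timecourse instead of a voxel by time by subject array, and drawing new phase offsets uniformly from [0, 2*pi) is an assumption about how the shift is chosen.

```python
import numpy as np


def phase_randomize_1d(x, seed=0):
    """Shift the phase of every frequency, keeping the power spectrum."""
    rng = np.random.RandomState(seed)
    n = len(x)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, size=spectrum.shape)
    phases[0] = 0.0        # leave the mean (DC term) untouched
    if n % 2 == 0:
        phases[-1] = 0.0   # the Nyquist term must stay real for even n
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)


x = np.random.RandomState(1).randn(64)
y = phase_randomize_1d(x)
# Amplitudes per frequency are unchanged, the time course itself is not
print(np.allclose(np.abs(np.fft.rfft(x)), np.abs(np.fft.rfft(y))))
```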


Compute the sum of exponents for a list of samples

Parameters:data (array, shape=[features, samples]) – A data array containing samples.
  • result_sum (array, shape=[samples,]) – The sum of exponents for each sample divided by the exponent of the maximum feature value in the sample.
  • max_value (array, shape=[samples,]) – The maximum feature value for each sample.
  • result_exp (array, shape=[features, samples]) – The exponent of each element in each sample divided by the exponent of the maximum feature value in the sample.


This function is more stable than computing the sum(exp(v)). It is useful for computing the softmax_i(v)=exp(v_i)/sum(exp(v)) function.
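
The stabilisation is to subtract the per-sample maximum before exponentiating, so that the largest exponent is exp(0) = 1. The NumPy sketch below follows the three outputs listed above; the function name is hypothetical and this is not brainiak's code.

```python
import numpy as np


def sumexp_stable_sketch(data):
    """data has shape [features, samples]; the maximum is taken per sample."""
    data = np.asarray(data, dtype=float)
    max_value = data.max(axis=0)
    result_exp = np.exp(data - max_value)  # every entry is <= 1, no overflow
    result_sum = result_exp.sum(axis=0)
    return result_sum, max_value, result_exp


# exp(1000) would overflow if computed directly
data = np.array([[1000.0, 0.0],
                 [1001.0, 0.0]])
result_sum, max_value, result_exp = sumexp_stable_sketch(data)
softmax = result_exp / result_sum  # the shift by max_value cancels out
print(softmax)
```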


Get number of CPUs usable by the current process.

Takes into consideration cpusets restrictions.

Return type:int
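
One common way to respect cpuset restrictions in Python is to ask for the scheduler affinity of the current process and fall back to the total CPU count on platforms that lack that call. The sketch below shows this pattern with the standard library; the function name is hypothetical and this is an assumption about the approach, not a copy of brainiak's implementation.

```python
import os


def usable_cpu_count_sketch():
    """Number of CPUs the current process is allowed to run on."""
    try:
        # Available on Linux; reflects cpuset and taskset restrictions
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: use the total CPU count
        return os.cpu_count() or 1


n_cpus = usable_cpu_count_sketch()
print(n_cpus)
```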