brainiak.utils package

Utilities used by multiple subpackages.


brainiak.utils.fmrisim module

fMRI Simulator

Simulate fMRI data for a single subject.

This code provides a set of functions necessary to produce realistic simulations of fMRI data. There are two main steps: characterizing the signal and generating the noise model, which are then combined to simulate brain data. Tools are included to support the creation of different types of signal, such as region specific differences in univariate activity. To create the noise model the parameters can either be set manually or can be estimated from real fMRI data with reasonable accuracy ( works best when fMRI data has not been preprocessed)


generate_signal Create a volume with activity, of a specified shape and either multivariate or univariate pattern, in a specific region to represent the signal in the neural data.

generate_stimfunction Create a timecourse of the signal activation. This can be specified using event onsets and durations from a timing file. This is the time course before convolution and therefore can be at any temporal precision.

export_3_column: Generate a three column timing file that can be used with software like FSL to represent event event onsets and duration

export_epoch_file: Generate an epoch file from the time course which can be used as an input to brainiak functions

convolve_hrf Convolve the signal timecourse with the HRF to model the expected evoked activity

apply_signal Combine the signal volume with the HRF, thus giving the signal the temporal properties of the HRF (such as smoothing and lag)

calc_noise Estimate the noise properties of a given fMRI volume. Prominently, estimate the smoothing and SFNR of the data

generate_noise Create the noise for this run. This creates temporal, spatial task and white noise. Various parameters can be tuned depending on need

mask_brain Create a mask volume that has similar contrast as an fMRI image. Defaults to use an MNI grey matter atlas but any image can be supplied to create an estimate.

compute_signal_change Convert the signal function into useful metric units according to metrics used by others (Welvaert & Rosseel, 2013)

Authors: Cameron Ellis (Princeton & Yale) 2016-2018 Chris Baldassano (Princeton) 2016-2017 Mingbo Cai (Princeton) 2017
brainiak.utils.fmrisim.generate_signal(dimensions, feature_coordinates, feature_size, feature_type, signal_magnitude=[1], signal_constant=1)

Generate volume containing signal

Generate signal, of a specific shape in specific regions, for a single volume. This will then be convolved with the HRF across time

  • dimensions (1d array, ndarray) – What are the dimensions of the volume you wish to create
  • feature_coordinates (multidimensional array) – What are the feature_coordinates of the signal being created. Be aware of clipping: features far from the centre of the brain will be clipped. If you wish to have multiple features then list these as a features x 3 array. To create a feature of a unique shape then supply all the individual feature_coordinates of the shape and set the feature_size to 1.
  • feature_size (list, int) – How big is the signal. If feature_coordinates=1 then only one value is accepted, if feature_coordinates>1 then either one value must be supplied or m values
  • feature_type (list, string) – What feature_type of signal is being inserted? Options are cube, loop, cavity, sphere. If feature_coordinates = 1 then only one value is accepted, if feature_coordinates > 1 then either one value must be supplied or m values
  • signal_magnitude (list, float) – What is the (average) magnitude of the signal being generated? A value of 1 means that the signal is one standard deviation from the noise
  • signal_constant (list, bool) – Is the signal constant across the feature (for univariate activity) or is it a random pattern of a given magnitude across the feature (for multivariate activity)

volume_signal – Creates a single volume containing the signal

Return type:

3 dimensional array, float

brainiak.utils.fmrisim.generate_stimfunction(onsets, event_durations, total_time, weights=[1], timing_file=None, temporal_resolution=100.0)

Return the function for the timecourse events

When do stimuli onset, how long for and to what extent should you resolve the fMRI time course. There are two ways to create this, either by supplying onset, duration and weight information or by supplying a timing file (in the three column format used by FSL).

  • onsets (list, int) – What are the timestamps (in s) for when an event you want to generate onsets?
  • event_durations (list, int) – What are the durations (in s) of the events you want to generate? If there is only one value then this will be assigned to all onsets
  • total_time (int) – How long (in s) is the experiment in total.
  • weights (list, float) – What is the weight for each event (how high is the box car)? If there is only one value then this will be assigned to all onsets
  • timing_file (string) – The filename (with path) to a three column timing file (FSL) to make the events. Still requires total_time to work
  • temporal_resolution (float) – How many elements per second are you modeling for the timecourse. This is useful when you want to model the HRF at an arbitrarily high resolution (and then downsample to your TR later).

stim_function – The time course of stimulus evoked activation. This has a temporal resolution of temporal resolution / 1.0 elements per second

Return type:

1 by timepoint array, float

brainiak.utils.fmrisim.export_3_column(stimfunction, filename, temporal_resolution=100.0)

Output a tab separated three column timing file

This produces a three column tab separated text file, with the three columns representing onset time (s), event duration (s) and weight, respectively. Useful if you want to run the simulated data through FEAT analyses. In a way, this is the reverse of generate_stimfunction

  • stimfunction (timepoint by 1 array) – The stimulus function describing the time course of events. For instance output from generate_stimfunction.
  • filename (str) – The name of the three column text file to be output
  • temporal_resolution (float) – How many elements per second are you modeling with the stimfunction?
brainiak.utils.fmrisim.export_epoch_file(stimfunction, filename, tr_duration, temporal_resolution=100.0)

Output an epoch file, necessary for some inputs into brainiak

This takes in the time course of stimulus events and outputs the epoch file used in Brainiak. The epoch file is a way to structure the timing information in fMRI that allows you to flexibly input different stimulus sequences. This is a list with each entry a 3d matrix corresponding to a participant. The dimensions of the 3d matrix are condition by epoch by time. For the i-th condition, if its k-th epoch spans time points t_m to t_n-1, then [i, k, t_m:t_n] are 1 in the epoch file.

  • stimfunction (list of timepoint by condition arrays) – The stimulus function describing the time course of events. Each list entry is from a different participant, each row is a different timepoint (with the given temporal precision), each column is a different condition. export_epoch_file is looking for differences in the value of stimfunction to identify the start and end of an epoch. If epochs in stimfunction are coded with the same weight and there is no time between blocks then export_epoch_file won’t be able to label them as different epochs
  • filename (str) – The name of the epoch file to be output
  • tr_duration (float) – How long is each TR in seconds
  • temporal_resolution (float) – How many elements per second are you modeling with the stimfunction?
brainiak.utils.fmrisim.convolve_hrf(stimfunction, tr_duration, hrf_type='double_gamma', scale_function=True, temporal_resolution=100.0)

Convolve the specified hrf with the timecourse. The output of this is a downsampled convolution of the stimfunction and the HRF function. If temporal_resolution is 1 / tr_duration then the output will be the same length as stimfunction. This time course assumes that slice time correction has occurred and all slices have been aligned to the middle time point in the TR.

Be aware that if scaling is on and event durations are less than the duration of a TR then the hrf may or may not come out as anticipated. This is because very short events would evoke a small absolute response after convolution but if there are only short events and you scale then this will look similar to a convolution with longer events. In general scaling is useful, which is why it is the default, but be aware of this edge case and if it is a concern, set the scale_function to false.

  • stimfunction (timepoint by feature array) – What is the time course of events to be modelled in this experiment. This can specify one or more timecourses of events. The events can be weighted or binary
  • tr_duration (float) – How long (in s) between each volume onset
  • hrf_type (str or list) – Takes in a string describing the hrf that ought to be created. Can instead take in a vector describing the HRF as it was specified by any function. The default is ‘double_gamma’ in which an initial rise and an undershoot are modelled.
  • scale_function (bool) – Do you want to scale the function to a range of 1
  • temporal_resolution (float) – How many elements per second are you modeling for the stimfunction

signal_function – The time course of the HRF convolved with the stimulus function. This can have multiple time courses specified as different columns in this array.

Return type:

timepoint by timecourse array

brainiak.utils.fmrisim.apply_signal(signal_function, volume_signal)

Combine the signal volume with its timecourse

Apply the convolution of the HRF and stimulus time course to the volume.

  • signal_function (timepoint by timecourse array, float) – The timecourse of the signal over time. If there is only one column then the same timecourse is applied to all non-zero voxels in volume_signal. If there is more than one column then each column is paired with a non-zero voxel in the volume_signal (a 3d numpy array generated in generate_signal).
  • volume_signal (multi dimensional array, float) – The volume containing the signal to be convolved with the same dimensions as the output volume. The elements in volume_signal indicate how strong each signal in signal_function are modulated by in the output volume

signal – The convolved signal volume with the same 3d as volume signal and the same 4th dimension as signal_function

Return type:

multidimensional array, float

brainiak.utils.fmrisim.calc_noise(volume, mask, template, noise_dict=None)

Calculates the noise properties of the volume supplied. This estimates what noise properties the volume has. For instance it determines the spatial smoothness, the autoregressive noise, system noise etc. Read the doc string for generate_noise to understand how these different types of noise interact.

  • volume (4d numpy array, float) – Take in a functional volume (either the file name or the numpy array) to be used to estimate the noise properties of this
  • mask (3d numpy array, binary) – A binary mask of the brain, the same size as the volume
  • template (3d array, float) – A continuous (0 -> 1) volume describing the likelihood a voxel is in the brain. This can be used to contrast the brain and non brain.
  • noise_dict (dict) – The initialized dictionary of the calculated noise parameters of the provided dataset (usually it is only the voxel size)

noise_dict – Return a dictionary of the calculated noise parameters of the provided dataset

Return type:


brainiak.utils.fmrisim.generate_noise(dimensions, stimfunction_tr, tr_duration, template, mask=None, noise_dict=None, temporal_proportion=0.5, iterations=None, fit_thresh=0.05, fit_delta=0.5)

Generate the noise to be added to the signal. Default noise parameters will create a noise volume with a standard deviation of 0.1 (where the signal defaults to a value of 1). This has built into estimates of how different types of noise mix. All noise values can be set by the user or estimated with calc_noise.

  • dimensions (nd array) – What is the shape of the volume to be generated
  • stimfunction_tr (Iterable, list) – When do the stimuli events occur. Each element is a TR
  • tr_duration (float) – What is the duration, in seconds, of each TR?
  • template (3d array, float) – A continuous (0 -> 1) volume describing the likelihood a voxel is in the brain. This can be used to contrast the brain and non brain.
  • mask (3d array, binary) – The mask of the brain volume, distinguishing brain from non-brain
  • noise_dict (dictionary, float) –

    This is a dictionary which describes the noise parameters of the data. If there are no other variables provided then it will use default values. The noise variables are as follows:

    snr [float]: Ratio of MR signal to the spatial noise sfnr [float]: Ratio of the MR signal to the temporal noise. This is the total variability that the following sigmas ‘sum’ to:

    task_sigma [float]: Size of the variance of task specific noise drift_sigma [float]: Size of the variance of drift noise auto_reg_sigma [float]: Size of the variance of autoregressive noise. This is an ARMA process where the AR and MA components can be separately specified physiological_sigma [float]: Size of the variance of physiological noise

    auto_reg_rho [list]: The coefficients of the autoregressive components you are modeling ma_rho [list]:The coefficients of the moving average components you are modeling max_activity [float]: The max value of the averaged brain in order to reference the template voxel_size [list]: The mm size of the voxels fwhm [float]: The gaussian smoothing kernel size (mm) matched [bool]: Specify whether you are fitting the noise parameters

    The volumes of brain noise that are generated have smoothness specified by ‘fwhm’

  • float (temporal_proportion,) – What is the proportion of the temporal variance (as specified by the SFNR noise parameter) that is accounted for by the system noise. If this number is high then all of the temporal variability is due to system noise, if it is low then all of the temporal variability is due to brain variability.
  • iterations (list, int) – The first element is how many steps of fitting the SFNR and SNR values will be performed. Usually converges after < 5. The second element is the number of iterations for the AR fitting. This is much more time consuming (has to make a new timecourse on each iteration) so be careful about setting this appropriately.
  • fit_thresh (float) – What proportion of the target parameter value is sufficient error to warrant finishing fit search.
  • fit_delta (float) – How much are the parameters attenuated during the fitting process, in terms of the proportion of difference between the target parameter and the actual parameter

noise – Generates the noise volume for these parameters

Return type:

multidimensional array, float

brainiak.utils.fmrisim.mask_brain(volume, template_name=None, mask_threshold=None, mask_self=True)

Mask the simulated volume This creates a mask specifying the approximate likelihood that a voxel is part of the brain. All values are bounded to the range of 0 to 1. An appropriate threshold to isolate brain voxels is >0.2. Critically, the data that should be used to create a template shouldn’t already be masked/skull stripped. If it is then it will give in accurate estimates of non-brain noise and corrupt estimations of SNR.

  • volume (multidimensional array) – Either numpy array of a volume or a tuple describing the dimensions of the mask to be created
  • template_name (str) – What is the path to the template to be loaded? If empty then it defaults to an MNI152 grey matter mask. This is ignored if mask_self is True.
  • mask_threshold (float) – What is the threshold (0 -> 1) for including a voxel in the mask? If None then the program will try and identify the last wide peak in a histogram of the template (assumed to be the brain voxels) and takes the minima before that peak as the threshold. Won’t work when the data is not bimodal.
  • mask_self (bool or None) – If set to true then it makes a mask from the volume supplied (by averaging across time points and changing the range). If it is set to false then it will use the template_name as an input.

  • mask (3 dimensional array, binary) – The masked brain, thresholded to distinguish brain and non-brain
  • template (3 dimensional array, float) – A continuous (0 -> 1) volume describing the likelihood a voxel is in the brain. This can be used to contrast the brain and non brain.

brainiak.utils.fmrisim.compute_signal_change(signal_function, noise_function, noise_dict, magnitude, method='PSC')

Rescale the signal to be a given magnitude, based on a specified metric (e.g. percent signal change). Metrics are heavily inspired by Welvaert & Rosseel (2013). The rescaling is based on the maximal activity in the timecourse. Importantly, all values within the signal_function are scaled to have a min of -1 or max of 1, meaning that the voxel value will be the same as the magnitude.

  • signal_function (timepoint by voxel array) – The signal time course to be altered. This can have multiple time courses specified as different columns in this array. Conceivably you could use the output of generate_stimfunction as the input but the temporal variance will be incorrect. Critically, different values across voxels are considered relative to each other, not independently. E.g., if the voxel has a peak signal twice as high as another voxel’s, then this means that the signal after these transformations will still be twice as high (according to the metric) in the first voxel relative to the second
  • noise_function (timepoint by voxel numpy array) – The time course of noise (a voxel created from generate_noise) for each voxel specified in signal_function. This is necessary for computing the mean evoked activity and the noise variability
  • noise_dict (dict) – A dictionary specifying the types of noise in this experiment. The noise types interact in important ways. First, all noise types ending with sigma (e.g. motion sigma) are mixed together in _generate_temporal_noise. The sigma values describe the proportion of mixing of these elements. However critically, SFNR is the parameter that describes how much noise these components contribute to the brain. If you set the noise dict to matched then it will fit the parameters to match the participant as best as possible.
  • magnitude (list of floats) – This specifies the size, in terms of the metric choosen below, of the signal being generated. This can be a single number, and thus apply to all signal timecourses, or it can be array and thus different for each voxel.
  • method (str) – Select the procedure used to calculate the signal magnitude, some of which are based on the definitions outlined in Welvaert & Rosseel (2013): - ‘SFNR’: Change proportional to the temporal variability, as represented by the (desired) SFNR - ‘CNR_Amp/Noise-SD’: Signal magnitude relative to the temporal noise - ‘CNR_Amp2/Noise-Var_dB’: Same as above but converted to decibels - ‘CNR_Signal-SD/Noise-SD’: Standard deviation in signal relative to standard deviation in noise - ‘CNR_Signal-Var/Noise-Var_dB’: Same as above but converted to decibels - ‘PSC’: Calculate the percent signal change based on the average activity of the noise (mean / 100 * magnitude)

signal_function_scaled – The new signal volume with the appropriately set signal change

Return type:

4d numpy array

brainiak.utils.utils module

brainiak.utils.utils.from_tri_2_sym(tri, dim)
convert a upper triangular matrix in 1D format
to 2D symmetric matrix
  • tri (1D array) – Contains elements of upper triangular matrix
  • dim (int) – The dimension of target matrix.

symm – Symmetric matrix in shape=[dim, dim]

Return type:

2D array

convert a 2D symmetric matrix to an upper
triangular matrix in 1D format
Parameters:symm (2D array) – Symmetric matrix
Returns:tri – Contains elements of upper triangular matrix
Return type:1D array

Compute the sum of exponents for a list of samples

Parameters:data (array, shape=[features, samples]) – A data array containing samples.
  • result_sum (array, shape=[samples,]) – The sum of exponents for each sample divided by the exponent of the maximum feature value in the sample.
  • max_value (array, shape=[samples,]) – The maximum feature value for each sample.
  • result_exp (array, shape=[features, samples]) – The exponent of each element in each sample divided by the exponent of the maximum feature value in the sample.


This function is more stable than computing the sum(exp(v)). It useful for computing the softmax_i(v)=exp(v_i)/sum(exp(v)) function.

brainiak.utils.utils.concatenate_not_none(l, axis=0)

Construct a numpy array by stacking not-None arrays in a list

  • data (list of arrays) – The list of arrays to be concatenated. Arrays have same shape in all but one dimension or are None, in which case they are ignored.
  • axis (int, default = 0) – Axis for the concatenation

data_stacked – The resulting concatenated array.

Return type:


Calculate the correlation matrix based on a
covariance matrix
Parameters:cov (2D array) –
Returns:corr – correlation converted from the covarince matrix
Return type:2D array
class brainiak.utils.utils.ReadDesign(fname=None, include_orth=True, include_pols=True)

Bases: object

A class which has the ability of reading in design matrix in .1D file,
generated by AFNI’s 3dDeconvolve.
  • fname (string, the address of the file to read.) –
  • include_orth (Boollean, whether to include "orthogonal" regressors in) – the nuisance regressors which are usually head motion parameters. All the columns of design matrix are still going to be read in, but the attribute cols_used will reflect whether these orthogonal regressors are to be included for furhter analysis. Note that these are not entered into design_task attribute which include only regressors related to task conditions.
  • include_pols (Boollean, whether to include polynomial regressors in) – the nuisance regressors which are used to capture slow drift of signals.

2d array. The design matrix read in from the csv file.


2d array. The part of design matrix corresponding to – task conditions.


number of total columns in the design matrix.


1d array. the types of each column in the design matrix. – 0 for orthogonal regressors (usually head motion parameters), -1 for polynomial basis (capturing slow drift of signals), values > 0 for stimulus conditions


scalar. The number of polynomial bases in the designn matrix.


scalar. The number of stimulus conditions.


scalar. The number of orthogoanal regressors (usually head – motions)


list. The names of each column in the design matrix.

brainiak.utils.utils.gen_design(stimtime_files, scan_duration, TR, style='FSL', temp_res=0.01, hrf_para={'undershoot_scale': 0.035, 'undershoot_delay': 12, 'undershoot_dispersion': 0.9, 'response_delay': 6, 'response_dispersion': 0.9})
Generate design matrix based on a list of names of stimulus
timing files. The function will read each file, and generate a numpy array of size [time_points * condition], where time_points equals duration / TR, and condition is the size of stimtime_filenames. Each column is the hypothetical fMRI response based on the stimulus timing in the corresponding file of stimtime_files. This function uses generate_stimfunction and double_gamma_hrf of brainiak.utils.fmrisim.
  • stimtime_files (a string or a list of string.) – Each string is the name of the file storing the stimulus timing information of one task condition. The contents in the files will be interpretated based on the style parameter. Details are explained under the style parameter.
  • scan_duration (float or a list (or a 1D numpy array) of numbers.) – Total duration of each fMRI scan, in unit of seconds. If there are multiple runs, the duration should be a list (or 1-d numpy array) of numbers. If it is a list, then each number in the list represents the duration of the corresponding scan in the stimtime_files. If only a number is provided, it is assumed that there is only one fMRI scan lasting for scan_duration.
  • TR (float.) – The sampling period of fMRI, in unit of seconds.
  • style (string, default: 'FSL') –

    Acceptable inputs: ‘FSL’, ‘AFNI’ The formating style of the stimtime_files. ‘FSL’ style has one line for each event of the same condition. Each line contains three numbers. The first number is the onset of the event relative to the onset of the first scan, in units of seconds. (Multiple scans should be treated as a concatenated long scan for the purpose of calculating onsets. However, the design matrix from one scan won’t leak into the next). The second number is the duration of the event, in unit of seconds. The third number is the amplitude modulation (or weight) of the response. It is acceptable to not provide the weight, or not provide both duration and weight. In such cases, these parameters will default to 1.0. This code will accept timing files with only 1 or 2 columns for convenience but please note that the FSL package does not allow this

    ’AFNI’ style has one line for each scan (run). Each line has a few triplets in the format of stim_onsets*weight:duration (or simpler, see below), separated by spaces. For example, 3.2*2.0:1.5 means that one event starts at 3.2s, modulated by weight of 2.0 and lasts for 1.5s. If some run does not include a single event of a condition (stimulus type), then you can put *, or a negative number, or a very large number in that line. Either duration or weight can be neglected. In such cases, they will default to 1.0. For example, 3.0, 3.0*1.0, 3.0:1.0 and 3.0*1.0:1.0 all means an event starting at 3.0s, lasting for 1.0s, with amplitude modulation of 1.0.

  • temp_res (float, default: 0.01) – Temporal resolution of fMRI, in second.
  • hrf_para (dictionary) – The parameters of the double-Gamma hemodynamic response function. To set different parameters, supply a dictionary with the same set of keys as the default, and replace the corresponding values with the new values.

design – design matrix. Each time row represents one TR (fMRI sampling time point) and each column represents one experiment condition, in the order in stimtime_files

Return type:

2D numpy array


Get number of CPUs usable by the current process.

Takes into consideration cpusets restrictions.

Return type:int
brainiak.utils.utils.center_mass_exp(interval, scale=1.0)
Calculate the center of mass of negative exponential distribution
p(x) = exp(-x / scale) / scale in the interval of (interval_left, interval_right). scale is the same scale parameter as scipy.stats.expon.pdf
  • interval (size 2 tuple, float) – interval must be in the form of (interval_left, interval_right), where interval_left/interval_right is the starting/end point of the interval in which the center of mass is calculated for exponential distribution. Note that interval_left must be non-negative, since exponential is not supported in the negative domain, and interval_right must be bigger than interval_left (thus positive) to form a well-defined interval.
  • scale (float, positive) – The scale parameter of the exponential distribution. See above.

m – The center of mass in the interval of (interval_left, interval_right) for exponential distribution.

Return type:


brainiak.utils.utils.phase_randomize(D, random_state=0)

Randomly shift signal phases

For each timecourse (from each voxel and each subject), computes its DFT and then randomly shifts the phase of each frequency before inverting back into the time domain. This yields timecourses with the same power spectrum (and thus the same autocorrelation) as the original timecourses, but will remove any meaningful temporal relationships between the timecourses.

This procedure is described in: Simony E, Honey CJ, Chen J, Lositsky O, Yeshurun Y, Wiesel A, Hasson U (2016) Dynamic reconfiguration of the default mode network during narrative comprehension. Nat Commun 7.

  • D (voxel by time by subject ndarray) – fMRI data to be phase randomized
  • random_state (RandomState or an int seed (0 by default)) – A random number generator instance to define the state of the random permutations generator.

phase randomized timecourses

Return type:

ndarray of same shape as D


Empirical cumulative distribution function

Given a 1D array of values, returns a function f(q) that outputs the fraction of values less than or equal to q.

Parameters:x (1D array) – values for which to compute CDF
Returns:ecdf_fun – function that returns the value of the CDF at a given point
Return type:Callable[[float], float]
brainiak.utils.utils.p_from_null(X, two_sided=False, max_null_input=None, min_null_input=None)

Compute p value of true result from null distribution

Given an array containing both a real result and a set of null results, computes the fraction of null results larger than the real result (or, if two_sided=True, the fraction of null results more extreme than the real result in either the positive or negative direction).

Note that all real results are compared to a pooled null distribution, which is the max/min over all null results, providing multiple comparisons correction.

  • X (ndarray with arbitrary number of dimensions) – The last dimension of X should contain the real result in X[…, 0] and the null results in X[…, 1:] If max_null_input and min_null_input are provided, X should contain only the real result
  • two_sided (bool, default:False) – Whether the p value should be one-sided (testing only for being above the null) or two-sided (testing for both significantly positive and significantly negative values)
  • max_null_input (ndarray with num_perm (see brainiak.isfc) entries) – By default this array is derived from the X input array, which can be very large and takes up huge memory space. To save memory, the function which calls p_from_null should provide this array as input.
  • min_null_input (ndarray with num_perm (see brainiak.isfc) entries) – See max_null_input

p – p values for each true X value under the null distribution

Return type:

ndarray the same shape as X, without the last dimension