Contributing
Goals
We’re building a Python toolkit to support neuroscience. This effort is complementary to others in this space, such as nilearn, nipy, and similar. We are primarily working on new research efforts in processing and understanding functional Magnetic Resonance Imaging (fMRI) data sets.
We provide high performance algorithms, typically implemented in C/C++, along with convenient Python wrapper modules that make these advanced tools available for easy use. These are some of the design goals that contributors should keep in mind:
We do not intend to duplicate existing functionality, either in the C++ side, or the Python side. For example, we do not provide any tools for parsing Nifti files, even though BrainIAK heavily depends on them. Nibabel already has perfectly good tools for this.
We try to make the C++ libraries usable outside of Python as well, so that they could be used directly in your C++ project, or accessed from other languages. However this is a secondary goal, our primary goal is to produce a solid, highly usable, very high performance toolkit for Python.
Every algorithm should be capable of running on a single machine, and if there is an appropriate distributed algorithm, it should also be capable of running at cluster scale. It is understood that the single-machine version of an algorithm will need to work with smaller datasets than the cluster version.
How to contribute
We use GitHub pull requests (PRs) to make improvements to the repository. You should make a fork, create a new branch for each new feature you develop, and make a PR to merge your branch into the master branch of the official repository. There are several workflows you could follow. Here is a concise step-by-step description of our recommended workflow:
Fork the official BrainIAK repository on GitHub.
Clone your fork:
git clone https://github.com/yourgithubusername/brainiak
Add the official BrainIAK repository as the
upstream
remote:git remote add upstream https://github.com/brainiak/brainiak
Set the
master
branch to track theupstream
remote:git fetch upstream git branch -u upstream/master
Whenever there are commits in the official repository, pull them to keep your
master
branch up to date:git pull --ff-only
Always create a new branch when you start working on a new feature; we only update the
master
branch via pull requests from feature branches; never commit directly to themaster
branch:git checkout -b new-feature
Make changes and commit them. Include a news fragment for the release notes in
docs/newsfragments
if your changes are visible to users (see Pip’s documentation and our news types inpyproject.toml
).Push your feature branch to your fork:
git push --set-upstream origin new-feature # only for the first push git push # for all subsequent pushes
When your feature is ready, make a PR on GitHub. If you collaborate with others on the code, credit them using Co-authored-by; if you are merging a PR, credit all authors using Co-authored-by in the PR squash message. After your PR is merged, update your
master
branch and delete your feature branch:git checkout master git pull --ff-only git branch -d new-feature git push --delete origin new-feature # or use delete button in GitHub PR
Please see the GitHub help for collaborating on projects using issues and pull requests for more information.
All pull requests are automatically tested using the pr-check.sh
script.
You should test you contributions yourself on your computer using
pr-check.sh
before creating a PR. The script performs several checks in a
Python virtual environment, which is isolated from your normal Python
environment for reproducibility.
During development, you may wish to run some of the individual checks in
pr-check.sh
repeatedly until you get everything right, without waiting for
the virtual environment to be set up every time. You can run the individual
checks from pr-check.sh
using the steps bellow:
# do not run this if using Anaconda, because Anaconda is not compatible with
# venv; instead, look at pr-check.sh to see how to run the individual
# checks that are part of pr-check.sh using Anaconda
# optional, but highly recommended: create a virtual environment
python3 -m venv ../brainiak_pr_venv
source ../brainiak_pr_venv/bin/activate
# install brainiak in editable mode and developer dependencies
python3 -m pip install -U -r requirements-dev.txt
# static analysis
./run-checks.sh
# run tests
./run-tests.sh
# build documentation
cd docs
make
cd -
# optional: remove the virtual environment, if you created one
deactivate
rm -r ../brainiak_pr_venv
When you are ready to submit your PR, run pr-check.sh
even if you were
using the steps above to run the individual checks in pr-check.sh
during
development.
If you want to obtain early feedback for your work, ask people to look at your
fork. Alternatively, you can open a PR before your work is ready; in this case,
you should start the PR title with WIP:
, to let people know your PR is work
in progress.
Tools
We primarily use PyCharm (or equivalently, IDEA with Python plugin). You’re free to use whatever you like to develop, but bear in mind that if you use the same tools as the rest of the group, more people will be able to help if something goes wrong.
The development requirements are listed in requirements-dev.txt
. You can
install them with:
python3 -m pip install -U -r requirements-dev.txt
Standards
Python code should follow the Scikit-learn coding guidelines with the exception that we target Python 3 only.
Python docstrings should be formatted according to the NumPy docstring standard as implemented by the Sphinx Napoleon extension (see also the Sphinx NumPy example). In particular, note that type annotations must follow PEP 484. Please also read the NumPy documentation guide, but note that we consider Sphinx authoritative.
C++ code should follow the WebKit code style guidelines.
All code exposed through public APIs must have documentation that explains what the code does, what its parameters mean, and what its return values can be, at a minimum.
All code must have repeatable automated unit tests, and most code should have integration tests as well.
Where possible, transformations and classifiers should be made compatible with Scikit-learn Pipelines by implementing
fit
,transform
andfit_transform
methods as described in the Scikit-learn pipeline documentation.
All code using random numbers should allow reproducible execution using the Scikit-learn random numbers guidelines.
Use
logging
to record debug messages with a logger obtained using:logging.getLogger(__name__)
Use
warnings
to show warning messages to users. Do not useprint
. See the Python Logging Tutorial for details.
Create usage examples for new modules in the examples directory. Add a
requirements.txt
file to help users install the packages your examples require. If your example requires software that is not available in PyPI, document it in aREADME.rst
and also update the BrainIAK Dockerfile.Remove the output of example Jupyter notebooks before committing them, using nbstripout.
Testing
Unit tests are small tests that execute very quickly, seconds or less. They are
the first line of defense against software errors, and you must include some
whenever you add code to BrainIAK. We use a tool called “pytest” to run tests;
please read the Pytest documentation. You should put your tests in a
test_*.py
file in the test folder, following the structure of the
brainiak
folder. So for example, if you have your code in
brainiak/funcalign/srm.py
you should have tests in
tests/funcalign/test_srm.py
. The unit tests for a subpackage should not
take more than one minute in total on our testing service, Travis CI.
You must install the package in editable mode before running the tests:
python3 -m pip install -e .
You can run ./run-tests.sh
to run all the unit tests, or you can use the
py.test <your-test-file.py>
command to run your tests only, at a more
granular level.
Next to the test results, you will also see a code coverage report. New code should have at least 90% coverage.
Note that you can only obtain test coverage data when the package is installed
in editable mode or the test command is called from the test
directory. If
the package is installed normally and the test command is called from the
project root directory, the coverage program will fail to report the coverage
of the installed code, because it will look for the code in the current
directory, which is not executed.
Folder layout
Since BrainIAK is primarily published as a Python package, it is largely organized according to the Python Packaging User Guide.
Python code goes in the brainiak
package, usually with a subpackage for
each major research initiative. If an algorithm can be implemented in a single
module, place the module directly in the brainiak
package, do not create a
subpackage.
Name subpackages and modules using short names describing their functionality,
e.g., tda
for the subpackage containing topological data analysis work and
htfa.py
for the module implementing hierarchical topographical factor
analysis.
Making a release
This information is only of interest to the core contributors who have the right to make releases.
Before making a release, ensure that:
The following environment variables are set:
export $CONDA_HOME=/path/to/miniconda3
The following Conda channels are enabled:
conda config --add channels conda-forge --add channels defaults
The following Conda packages are installed:
conda install conda-build anaconda-client
To make a release:
Choose a release number,
v
. We follow Semantic Versioning, although we omit the patch number when it is 0:export v=0.11
Prepare the release notes; may require manual additions for PRs without release notes in
docs/newsfragments
:git checkout -b release-v$v towncrier --version $v ./pr-check.sh git commit -a -m "Add release notes for v$v" git push --set-upstream origin release-v$v <Create a PR; merge the PR.> git checkout master git pull --ff-only git branch -d release-v$v
Tag the release:
git tag v$v
Create and test the source distribution package:
# build python3 setup.py sdist # test tar -xf dist/brainiak-$v.tar.gz cd brainiak-$v ./pr-check.sh cd -
Create and test Conda packages (repeat command for all OSes and Python versions); requires the
conda-build
Conda package; make sure you create the tag on all the machines:# build and test in a single script # On Linux machine .conda/bin/build 3.7 .conda/bin/build 3.8 # On macOS machine git tag v$v .conda/bin/build 3.7 .conda/bin/build 3.8
Build the Docker image (requires brainiak-tutorials checkout):
# build # clone if needed git clone git@github.com:brainiak/brainiak-tutorials.git tutorials cd tutorials && git pull --ff-only && cd - docker build --no-cache -t brainiak/brainiak . docker tag v$v-$(date +%Y%m%d) brainiak/brainiak # test # run pr-check.sh in docker # cleanup rm -r brainiak-$v
Build the documentation:
cd docs make cd -
Push tag:
git push upstream v$v
Upload sdist:
twine upload dist/brainiak-$v.tar.gz
Upload Conda package (repeat command for all OSes and Python versions); requires the
anaconda-client
Conda package:anaconda upload -u brainiak \ $CONDA_HOME/conda-bld/<OS>/brainiak-$v-<python_version>.tar.bz2
Push Docker image:
docker push brainiak/brainiak
Publish the documentation:
cd ../brainiak.github.io git checkout -b release-v$v rm -r docs cp -ir ../brainiak/docs/_build/html docs git commit -a -m "Update docs to v$v" # use your fork git push --set-upstream origin release-v$v <Create a PR; merge the PR.> git checkout master git pull --ff-only git branch -d release-v$v cd -