Introduction to sharing code

Open analysis refers to the practice of making the methods and tools used in a research study, in particular any code used to analyze data, available alongside the study's publication. Sharing methods and tools makes a study much easier to replicate than it would otherwise be, and also “facilitate[s] progress in reuse, adaptation, and extension for new research” (Nosek et al., 2012).

At the most basic level, sharing analysis code is straightforward: uploading the assorted scripts used to perform a study's analysis to a public repository of some kind is generally preferable to not sharing any code at all (Barnes, 2010). However, a number of more subtle best practices can maximize the utility of shared methods and tools. Benureau and Rougier (2018) articulated five characteristics that a piece of software should have to be maximally useful as part of a publication: it should be re-runnable, repeatable, reproducible, reusable, and replicable. Software development is an increasingly important skill in academia, and sharing code publicly is a good motivator to improve your research software.

Repositories

There are many ways to share code, with different advantages and disadvantages.

Distributed version control systems like git, and hosting platforms built on them like GitHub, are useful tools for collaborative development of code, even without sharing that code publicly. A simple way to share research code is to post a link to an existing GitHub repository in which that code is being developed. However, even if the link references a specific version of the code, the history of a repository can be rewritten, or the repository deleted entirely, so additional tools are ideal for sharing the code that was used to generate a particular result.
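To make such a link as precise as possible, it helps to reference an immutable commit rather than a branch. The sketch below builds a throwaway repository purely to illustrate; the tag name and messages are arbitrary placeholders:

```shell
# Build a throwaway repository just for demonstration purposes
tmpdir=$(mktemp -d)
cd "$tmpdir"
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "analysis code as used in the paper"

# An annotated tag marks the exact state of the code at publication time
git tag -a v1.0-paper -m "Code as used for the published analyses"

# The tag resolves to an immutable commit ID, which is what a link should cite
git rev-parse v1.0-paper
```

In a real repository you would then push the tag (`git push origin v1.0-paper`) and link to it, or to the commit ID itself. Note that even a commit ID does not survive deletion of the repository, which is why archival services like those below are still needed.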

Zenodo is a repository that can be used to share all kinds of research outputs, including code. Zenodo has a useful GitHub integration, which archives a repository every time a new release is created. This makes it easy to refer to a specific, immutable version of some code in a paper.

More generally, most repositories that accept research data can also be used to archive specific versions of research software.

Reproducibility and containers

Reproducing results from a paper can be a challenging endeavor, even when the software tools and data used to produce those results are publicly available. One reason that reproducing results may be challenging is that it can be difficult to recreate the execution environment in which the analysis was first performed. Installed software dependencies, operating systems used, and hardware on which a piece of software is run can all affect the results in ways that are difficult to predict, so any methods of standardizing those execution environments are useful.

One such standardization tool that is seeing increasingly broad use is the container, of which Docker and Singularity are two examples. Containers allow a process to isolate its own filesystem, network, and process list from the rest of the machine it’s running on, meaning that process can have its own isolated operating system and dependencies installed. So, if the developer of some research software releases a container with their software and its dependencies included, others can more easily run the software and reproduce any results associated with it. While containers can certainly make it easier to reproduce results, it is possible to use a container as a crutch for code that does not sufficiently document its dependencies and other important information. Ideally, containers should be an addition to already high-quality code rather than a replacement for that quality.
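As an illustration, a minimal Dockerfile for bundling an analysis with its dependencies might look like the following sketch; the script name and package versions are hypothetical placeholders, and the point is that every version is pinned explicitly:

```dockerfile
# Pin the base image to a specific version rather than "latest",
# so the operating system and interpreter are fixed
FROM python:3.11-slim

# Install the exact dependency versions used for the published analysis
RUN pip install numpy==1.26.4 pandas==2.2.2

# Copy in the analysis code (hypothetical script name)
COPY analysis.py /app/analysis.py

# Running the container runs the analysis
ENTRYPOINT ["python", "/app/analysis.py"]
```

Anyone with Docker installed could then build and run the analysis with `docker build -t my-analysis .` followed by `docker run my-analysis`, without installing any of the dependencies on their own machine.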

BIDS Apps are a neuroimaging-specific technology for sharing research software: containerized neuroimaging applications that take BIDS-formatted input data and expose a common command-line interface. The containerization and common interface are designed to make it as easy as possible to run software that has been packaged as a BIDS App.
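Running a BIDS App therefore tends to follow the same pattern regardless of the app. A sketch, assuming Docker and a local BIDS dataset (the image name and paths here are illustrative):

```shell
# Run a BIDS App on a local BIDS dataset. The positional arguments —
# input directory, output directory, and analysis level — are the
# common interface shared across BIDS Apps.
docker run -ti --rm \
    -v /path/to/bids_dataset:/data:ro \
    -v /path/to/outputs:/out \
    bids/example /data /out participant
```

Because every BIDS App accepts this same shape of invocation, swapping one analysis tool for another mostly means changing the image name.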

Boutiques and the CONP Portal

Building on Zenodo and containers, Boutiques is a framework for publishing, integrating, and executing command-line applications across computing platforms; it is primarily designed for neuroinformatics applications.

Boutiques works by associating applications with a JSON-formatted descriptor that specifies their required inputs, produced outputs, and command-line interface. The descriptor can also identify a container that encapsulates the application, allowing the full execution environment to be bundled with the command-line interface. In this way, Boutiques allows applications to be executed in a variety of environments in a unified way.
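To give a sense of what such a descriptor looks like, here is an abbreviated sketch; the tool name, container image, and command line are hypothetical, and a real descriptor would be checked against the Boutiques schema (for example with `bosh validate`):

```json
{
  "name": "example-tool",
  "description": "A hypothetical analysis tool",
  "tool-version": "0.1.0",
  "schema-version": "0.5",
  "command-line": "example_tool [INPUT_FILE] [OUTPUT_DIR]",
  "container-image": {
    "type": "docker",
    "image": "someuser/example-tool:0.1.0"
  },
  "inputs": [
    {
      "id": "input_file",
      "name": "Input file",
      "type": "File",
      "value-key": "[INPUT_FILE]"
    },
    {
      "id": "output_dir",
      "name": "Output directory",
      "type": "String",
      "value-key": "[OUTPUT_DIR]"
    }
  ],
  "output-files": [
    {
      "id": "results",
      "name": "Results",
      "path-template": "[OUTPUT_DIR]"
    }
  ]
}
```

The value-keys in `command-line` show how Boutiques substitutes concrete input values into the invocation, which is what lets any Boutiques-aware platform construct and run the command automatically.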

The CONP Portal makes use of Boutiques to expose neuroinformatics tools in that unified way. If you’ve created a Boutiques descriptor for a tool you’ve developed, Boutiques can publish it to Zenodo from the command line. Publishing a tool to Zenodo this way automatically adds it to the CONP Portal, where others can access and run it.
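The Boutiques command-line client, bosh, handles both validation and publication. A sketch of that workflow, assuming a descriptor file named example-tool.json (the filename is hypothetical, and publishing prompts for a Zenodo access token):

```shell
# Check that the descriptor conforms to the Boutiques schema
bosh validate example-tool.json

# Publish the descriptor to Zenodo; --sandbox targets Zenodo's
# test instance, useful as a dry run before publishing for real
bosh publish example-tool.json --sandbox
```

Once published without the sandbox flag, the tool receives a Zenodo DOI and becomes discoverable through Boutiques-aware platforms such as the CONP Portal.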