Accessing OpenNeuro datasets on the CBS Server

Hi everyone,

OpenNeuro is a good source of open, BIDS-formatted neuroimaging datasets. However, accessing those datasets can be tricky if you’re not a regular DataLad user. Here I’ll describe how to access an OpenNeuro dataset for use on the CBS Server without needing to know any DataLad commands beyond those used in this tutorial. (I will note that DataLad has a lot of nice features for accessing others’ data and managing your own data, so it’s a useful tool to know and I’m happy to help you learn it for your own projects!)

I’ll use text formatted like this: {placeholder} to refer to placeholder values that depend on your specific situation.

  1. Copy the clone URL (generally of the form https://github.com/OpenNeuroDatasets/{dataset-id}.git) from the “Clone” dropdown on the dataset page.
  2. Navigate to /localscratch on the CBS Server (DataLad generally doesn’t work with the networked filesystems used by directories under /home or /srv):
    $ cd /localscratch
    
  3. Clone the dataset you’re interested in using the URL you copied earlier:
    $ datalad clone https://github.com/OpenNeuroDatasets/{dataset-id}.git
    
  4. Download the dataset content (datalad clone only sets up placeholders for larger files like images):
    $ cd /localscratch/{dataset-id}
    $ datalad get .
    
  5. Export an archive of the dataset (this will make the content available without using DataLad):
    $ datalad export-archive /localscratch/{dataset-id}.tar.gz
    
  6. Copy the dataset archive to persistent network storage (probably your PI’s /srv share).
    Important: If you leave the dataset in /localscratch, it will disappear when you log out of the CBS Server, so be sure to do this step if you’d like to reuse the dataset without downloading it again.
    $ cp /localscratch/{dataset-id}.tar.gz /srv/{pi-name}/{username}
    
  7. Extract the dataset archive.
    $ cd /srv/{pi-name}/{username}
    $ tar -xf {dataset-id}.tar.gz
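
If you’d rather run steps 2 through 7 in one go, they can be sketched as a small Bash function. This is only a minimal sketch under a few assumptions: `fetch_openneuro` and its optional third argument (the scratch directory) are names I made up for illustration, `datalad` is assumed to be on your PATH, the destination directory is assumed to exist already, and the archive is assumed to unpack into a {dataset-id} directory as described in step 7.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DataLad): clone an OpenNeuro dataset,
# download its content, export a plain archive, and unpack it on
# persistent storage.
#
# Example call (placeholders as in the steps above):
#   fetch_openneuro {dataset-id} /srv/{pi-name}/{username}

fetch_openneuro() (
    # The body runs in a subshell, so the cd calls below do not change
    # the working directory of the calling shell.
    dataset_id="$1"             # the {dataset-id} part of the clone URL
    dest_dir="$2"               # persistent storage, e.g. your PI's /srv share
    scratch="${3:-/localscratch}"  # local, non-networked working directory

    if [ -z "$dataset_id" ] || [ -z "$dest_dir" ]; then
        echo "usage: fetch_openneuro DATASET_ID DEST_DIR [SCRATCH_DIR]" >&2
        exit 2
    fi

    # Steps 2-3: clone into local scratch space.
    cd "$scratch" || exit 1
    datalad clone "https://github.com/OpenNeuroDatasets/${dataset_id}.git" || exit 1

    # Step 4: download the actual file content.
    cd "${scratch}/${dataset_id}" || exit 1
    datalad get . || exit 1

    # Step 5: export an archive that can be used without DataLad.
    datalad export-archive "${scratch}/${dataset_id}.tar.gz" || exit 1

    # Steps 6-7: copy the archive to persistent storage and extract it.
    cp "${scratch}/${dataset_id}.tar.gz" "$dest_dir" || exit 1
    cd "$dest_dir" || exit 1
    tar -xf "${dataset_id}.tar.gz"
)
```

Each command stops the function on failure, so a failed download won’t leave you with a partial archive on your /srv share. Running the steps by hand is still the better way to see what each one does.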
    

And with that, your dataset will be persistently available at /srv/{pi-name}/{username}/{dataset-id} with no need to use DataLad for access. That said, DataLad has a lot of nice features, so I’ll write a future post on how to maintain the DataLad metadata with the dataset in an RIA store.