Datalad sharing on Github

Hi all,
I have started to use Datalad for curating and updating the cerebellar atlases from our lab.


The goals are to:
  • integrate the atlases (a growing collection) with the atlas viewer: http://www.diedrichsenlab.org/imaging/AtlasViewer/index.html
  • automatically generate the files needed for FSLeyes.

My current problem is that I pushed the DataLad repo to GitHub, but the files are only present there as links, not as actual content (I tried configuring rclone + Dropbox, but couldn’t get it to work).

@tristankk, @akhanf, @switt4: Have you had success setting up DataLad on GitHub, and would you be willing to lend me a hand?

I haven’t tried rclone+dropbox before, but I was able to get things shared via GitHub, with OSF for storage, by following these instructions:

http://docs.datalad.org/projects/osf/en/latest/settingup.html

@tristankk may be able to help to get it working with rclone+dropbox though?

I’ve worked with the CONP datasets, which are published on GitHub but stored in a variety of other locations, though I haven’t set up rclone+dropbox before. I’m happy to dig into it!

I’ll try to get a small test dataset published to dropbox with DataLad from my machine and report back. Please let me know if there’s anything else I can do in the meantime.

We talked about this offline and probably won’t be using rclone+dropbox because it’s pretty fiddly, but here’s the procedure for the record (drawn from the relevant section of the DataLad manual):

  1. Install rclone, probably from your distribution’s package manager if you’re on Linux.
  2. Configure an rclone Dropbox remote: run rclone config, enter n for a new remote, choose dropbox as the storage type, and follow the prompts.
    • Importantly, you need to note and document the name you give the rclone remote, because anyone using DataLad to get the dataset will need to set up an rclone remote with the same name.
    • It’s named dropbox-for-friends in this procedure.
  3. Install git-annex-remote-rclone. You can git clone this repo and add the resulting directory to your PATH: export PATH="/home/user-bob/repos/git-annex-remote-rclone:$PATH"
  4. Add the rclone Dropbox remote as a git-annex special remote, which DataLad treats as a sibling (run this in your existing DataLad dataset’s directory):
    • git annex initremote dropbox-for-friends type=external externaltype=rclone chunk=50MiB encryption=none target=dropbox-for-friends prefix=my_dataset
  5. At this point, you can push the data to Dropbox with datalad push --to dropbox-for-friends.
  6. If you now create a GitHub sibling for your dataset, you can make the Dropbox remote a publish dependency, so that every push to GitHub first pushes the annexed data to Dropbox:
    • datalad create-sibling-github -d . dropbox-test-dataset --publish-depends dropbox-for-friends
  7. Finally, datalad push --to github will push the dataset information (but not the annexed data) to GitHub.
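
For anyone who wants the steps above in one place, here is a rough sketch as a shell script. It assumes rclone config has already been run interactively and that git-annex-remote-rclone was cloned to the location in HELPER_DIR (that path, and the function and variable names, are just placeholders I made up for illustration). By default it only prints the commands; set DRY_RUN=0 to actually run them. The fetch function is my guess at what a collaborator would do on their side, using git annex enableremote once their own rclone remote with the same name exists.

```shell
#!/bin/sh
# Sketch of the rclone + Dropbox procedure described above.
# Commands are only printed unless DRY_RUN=0 is set in the environment.
set -e

DRY_RUN="${DRY_RUN:-1}"
REMOTE="dropbox-for-friends"      # rclone remote name used in the procedure
PREFIX="my_dataset"               # folder prefix on Dropbox
GH_REPO="dropbox-test-dataset"    # GitHub repository name
# Assumed clone location of git-annex-remote-rclone (hypothetical path).
HELPER_DIR="${HELPER_DIR:-$HOME/repos/git-annex-remote-rclone}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

publish_via_dropbox() {
    # Run from inside the existing DataLad dataset.
    PATH="$HELPER_DIR:$PATH"; export PATH
    run git annex initremote "$REMOTE" type=external externaltype=rclone \
        chunk=50MiB encryption=none target="$REMOTE" prefix="$PREFIX"
    run datalad push --to "$REMOTE"
    run datalad create-sibling-github -d . "$GH_REPO" --publish-depends "$REMOTE"
    run datalad push --to github
}

fetch_from_dropbox() {
    # Consumer side (assumption): $1 is the GitHub clone URL, $2 the target
    # directory. Requires an rclone remote with the same name as $REMOTE.
    PATH="$HELPER_DIR:$PATH"; export PATH
    run datalad clone "$1" "$2"
    run git -C "$2" annex enableremote "$REMOTE"
    run datalad get -d "$2" "$2"
}

publish_via_dropbox
```

Running it as-is just lists the four publishing commands in order, which is a handy way to double-check the remote name before anything touches Dropbox or GitHub.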

Ok - I guess here is what I learned for the record:

  • If you create a DataLad repo with the --no-annex option, it behaves like a normal git repo
  • You can create online versions on github in your organisation like so:
    datalad create-sibling-github --github-organization DiedrichsenLab cerebellar_atlases
  • .git/config, .noannex, and .gitattributes seem to hold the key to changing the behavior of the DataLad repo afterwards
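
Putting those notes together, a minimal sketch of the no-annex route might look like the script below. The function names are made up for illustration, and it again only prints commands unless DRY_RUN=0 is set. One caveat: --github-organization is the option I used here; newer DataLad releases may expect the repository name in the form organisation/repo instead, so check datalad create-sibling-github --help for your version. The second function is a read-only look at the files mentioned above.

```shell
#!/bin/sh
# Sketch: create a DataLad dataset without an annex and publish it to a
# GitHub organisation. Commands are only printed unless DRY_RUN=0 is set.
set -e

DRY_RUN="${DRY_RUN:-1}"
ORG="DiedrichsenLab"
DATASET="cerebellar_atlases"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

create_plain_github_repo() {
    # --no-annex makes the dataset behave like a normal git repository.
    run datalad create --no-annex "$DATASET"
    run datalad create-sibling-github -d "$DATASET" \
        --github-organization "$ORG" "$DATASET"
    run datalad push -d "$DATASET" --to github
}

show_annex_settings() {
    # Inspect the files that seem to control annex behaviour.
    # "|| true" keeps the script going when a file or setting is absent.
    run ls -a "$DATASET/.noannex" || true
    run cat "$DATASET/.gitattributes" || true
    run git -C "$DATASET" config --local --get-regexp '^annex\.' || true
}

create_plain_github_repo
```

Since everything in such a dataset is tracked by plain git, the files show up on GitHub with their real content rather than as links, at the cost of GitHub's usual file size limits.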