I won’t be writing a detailed summary of the Bioinformatics Open Source Conference (BOSC) 2015 in Dublin, since there is already an excellent one available (https://smallchangebio.wordpress.com/2015/07/11/bosc2015day2b/). However, I thought I’d write down some thoughts on the trends that I think I saw.
Before going into the technical stuff, I’d just like to add that the picture above was taken at the panel on diversity in the bioinformatics open source community. It was nice to see the issue addressed, as it is as important a challenge as any technical ones we might currently be facing as a community. This, in addition to the cards used to take questions for speakers (instead of the usual stand-up-and-talk way), shows that the BOSC organizers are willing to take this on. Kudos to them for doing so!
Workflows, workflows, workflows….
It’s clear that handling workflows is still an unsolved problem. Having spent considerable time and effort setting up and managing pipelines myself, I truly applaud the ongoing efforts to make things a bit more standardized and interoperable using the Common Workflow Language (CWL). If in the future it were actually possible to download and run somebody’s pipeline on more than one platform, that would be truly amazing.
There still seems to be some confusion about the exact nature of CWL and what it aims to do. My understanding is that it will provide a specification consisting of tool definitions and workflow descriptions that platform developers can implement in order to make it possible to migrate these between platforms. As of yet it seems to be somewhat lacking on the implementation side (which is to be expected, since it was announced to the public at this BOSC, if I understand things correctly). I really hope that things will take off on the implementation front, and once they do I want to try my hand at translating some of the things that we have set up into CWL.
In my wildest dreams, the CWL could also serve as a starting point for building a community that could also collaborate on and provide other things, such as:
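To make the idea of a platform-neutral tool definition a bit more concrete, here is a minimal Python sketch. Note that this is not CWL syntax: the dictionary layout, the `build_command` helper and the choice of `bwa mem` as the example tool are all my own assumptions, purely for illustration. The point is only that the description of the tool lives in data, so any platform that understands the format can turn it into a command line.

```python
import shlex

# Hypothetical, simplified tool definition (NOT real CWL syntax).
# Inputs are listed in the order they should appear on the command line.
TOOL = {
    "base_command": ["bwa", "mem"],
    "inputs": [
        {"id": "threads", "prefix": "-t"},
        {"id": "reference"},
        {"id": "reads"},
    ],
}


def build_command(tool, values):
    """Turn a tool definition plus input values into an argument list."""
    argv = list(tool["base_command"])
    for spec in tool["inputs"]:
        if spec["id"] not in values:
            continue  # treat inputs that were not supplied as optional
        if "prefix" in spec:
            argv.append(spec["prefix"])
        argv.append(str(values[spec["id"]]))
    return argv


if __name__ == "__main__":
    argv = build_command(
        TOOL, {"threads": 4, "reference": "ref.fa", "reads": "reads.fq"}
    )
    print(shlex.join(argv))
```

A real specification obviously has to cover much more (file staging, outputs, resource requirements and so on), but the separation between "what the tool needs" and "how a given platform runs it" is the part I find appealing.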
- tools repositories (like a Docker Hub for bioinformatics tools), providing containers and tool definitions.
- collaborative workflow repositories (that are actually possible to deploy outside their exact environment – no more re-implementing the GATK best practice pipeline yet another time).
- reference data repositories – something that could be to bioinformatics what Maven Central is to Java. A single place from which e.g. reference genomes could be downloaded automatically based on a configuration file. (While writing this I realized that I’d seen something similar to what I was describing: Cosmid – so folks, let’s just go and adopt this now!)
Docker was mentioned so many times that eventually it became a joke. It does seem to provide a convenient solution to the software packaging problem. However, my own limited experience with Docker tells me that using it would have to become simpler before it can be adopted outside a limited group.
What’s needed is something that allows you to access the tool you want to use with minimal overhead. Something like:
run-in-docker <my-favorite-tool> <args>
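As a rough sketch of what such a wrapper could look like, here is a small Python version. Everything in it is an assumption on my part: `run-in-docker` is not an existing tool, the `IMAGES` mapping and the image name in it are placeholders, and mounting the current directory at `/data` is just one possible convention. The only real pieces are the standard `docker run` options `--rm`, `-v` and `-w`.

```python
import os
import subprocess
import sys

# Hypothetical mapping from tool name to container image. The image
# name below is a placeholder, not a real published image.
IMAGES = {
    "samtools": "example/samtools:latest",
}


def build_docker_command(tool, args, workdir=None):
    """Build the `docker run` argument list for a tool invocation."""
    workdir = workdir or os.getcwd()
    return [
        "docker", "run", "--rm",   # remove the container when it exits
        "-v", f"{workdir}:/data",  # expose the working directory
        "-w", "/data",             # and run the tool from inside it
        IMAGES[tool],
        tool, *args,
    ]


def main(argv):
    """Entry point: argv[1] is the tool name, the rest are its arguments."""
    if len(argv) < 2 or argv[1] not in IMAGES:
        known = ", ".join(sorted(IMAGES))
        print(f"usage: run-in-docker <tool> <args>  (known tools: {known})",
              file=sys.stderr)
        return 2
    # Hand over to Docker and pass its exit code back to the caller.
    return subprocess.call(build_docker_command(argv[1], argv[2:]))
```

Saved as a script that ends with `sys.exit(main(sys.argv))`, this would let you type `run-in-docker samtools view -H sample.bam` without thinking about images or mounts. It does nothing about file ownership or security, which is exactly where I suspect the hard part lies.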
Until this is possible, I don’t think that we are going to see widespread adoption outside platform solutions like Arvados, Galaxy, etc. I guess there are also security issues that would need to be resolved before sysadmins at HPC centers would be willing to install it.
Going to BOSC was a rewarding experience and an excellent opportunity to get a feel for where the community is heading in the future. A warm thanks to the organizers as well as all the speakers.