Having worked with setting up so called best practice pipelines for different genomics analysis one question that I’ve heard a lot is how long does it take to run? At this point I like to give an answer that’s something along the lines of “it takes 40 hours and 640 core-hours”, and I’d like to argue that the later is the more interesting of the two. Why is that? I’d like try to explain this in layman’s terms so that I myself (and possibly others) can point here to provide an explanation for this in the future.
For some cases what you want to look at is the actual run time (or the “wall time” as it’s sometimes called). For example if you’re setting up a clinical diagnosis pipeline this number might matter a great deal as it is means that you can deliver a result to the doctor quicker. However, for many cases in research this is not the most interesting thing to look at, assuming that you don’t have infinite resources.
Taking a step back and explaining this in a bit more detail one first needs to understand that a computer today typically has more than one processor (also known as a core), and many programs will make use of multiple cores at the same time. This gives a simple relation between normal hours and core hours:
core hours = number of hours passed * number of cores used
However few programs will scale linearly with the number of cores, meaning that you will end up in a situation with diminishing returns.
A computer cluster (such as is used by many genomics researchers) consists of multiple such machines where you can run your programs. Typically you will book a number of cores on such a cluster to run your program on, and you will be charged for the number of cores you booked, as if your were using them 100 percent throughout the entire run-time of the program.
So unless you have for infinite resources you want to use those hours as efficiently as possible, right? Looking at the core hours used rather than the actual run time is one way to make sure that you squeeze out as much as possible from the resources you have. Maybe you have to wait a little longer, but as you’re moving into for example scaling to handle the analysis of thousands whole human genomes core-hours becomes an important factor.
If you didn’t get what those computer folks meant when they were talking about core hours, now you know.