Notes on parallelism in Microbase
There are two forms of parallelism used by Microbase: global and local. To make the most efficient use of your available hardware, it is important to understand both these definitions and the characteristics of your responders’ workloads.
Global parallelism refers to the ability to execute multiple responder instances across a set of machines. Jobs from different types of responder can often be executed in parallel, provided there are no dependencies between them. For example, two different tools can perform different analyses of the same input data simultaneously.
Another form of global parallelism is the parallel execution of different jobs from the same type of responder. For example, performing the same type of analysis on two independent datasets simultaneously.
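As an illustrative sketch only (the `analyse` function and dataset names are made up, and a real deployment would distribute jobs across machines rather than threads), this second form of global parallelism amounts to running the same analysis over independent inputs concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

def analyse(dataset):
    # Stand-in for a real analysis tool; here we just compute a mean.
    return sum(dataset) / len(dataset)

# Two independent datasets can be analysed simultaneously because
# neither job depends on the other's output.
datasets = {"sample_a": [1, 2, 3], "sample_b": [4, 5, 6]}

with ThreadPoolExecutor() as pool:
    results = dict(zip(datasets, pool.map(analyse, datasets.values())))

# results == {"sample_a": 2.0, "sample_b": 5.0}
```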
Local parallelism refers to the ability of a single, multi-threaded program instance to exploit multiple CPU cores of the same machine to achieve a degree of speedup. With the pervasiveness of multi-core machines, ever more analysis tools are becoming multi-threaded. Microbase provides job scheduling functions at the global level and does not interfere with local parallelism, other than to specify the number of cores that a particular instance of a program should use for a specific execution. This determination is based partly on user-specified configuration and partly on the current job workload and the capabilities of the available hardware.
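A common pattern is for the responder to pass the core count it was given straight through to the underlying tool's thread-count option. The sketch below uses hypothetical names throughout (`build_command`, the `--threads` flag, and the tool path are not Microbase API):

```python
def build_command(tool_path, input_file, allocated_cores):
    # Pass the allocation through unchanged: the tool should use exactly
    # as many threads as it was granted, no more.
    return [tool_path, "--threads", str(allocated_cores), input_file]

cmd = build_command("/opt/tools/aligner", "reads.fq", 8)
# cmd == ["/opt/tools/aligner", "--threads", "8", "reads.fq"]
```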
Examples of where you might wish to limit global or local parallelism:
- If your responders or analysis tools access a centralised resource, such as a relational database or FTP server, you need to determine the optimal global number of concurrent jobs such that the centralised resource isn’t swamped with requests.
- If your machines have many CPUs, but some tasks are disk-intensive, then it makes sense to limit the local parallelism of the disk-intensive jobs. Depending on your workflow, this may improve hardware utilisation by: a) spreading the disk-intensive jobs thinly over a larger number of machines; b) enabling CPU-intensive tasks to execute in parallel on an otherwise disk-saturated machine.
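One general way to cap the number of jobs hitting a centralised resource, regardless of how many jobs run overall, is a counting semaphore. The sketch below is illustrative only; the limit of 2 and all names are assumptions, not Microbase configuration:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# At most 2 jobs may talk to the central resource at once, even though
# 8 jobs execute in parallel overall.
db_slots = threading.Semaphore(2)
lock = threading.Lock()
in_flight = 0
peak = 0

def job(i):
    global in_flight, peak
    with db_slots:              # blocks while 2 jobs already hold a slot
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.01)        # stand-in for querying the resource
        with lock:
            in_flight -= 1

with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(job, range(20)))

# peak never exceeds the semaphore's limit of 2
```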
When a responder executes, it is assigned a number of CPU cores by the minion. This number of cores will be at least one, but less than or equal to the number of cores reported by the operating system the minion is executing within. The number of cores assigned to a particular responder instance depends on which other processes are running in parallel. For example, on a 16-core machine, your responder may receive all 16 cores if it is the only task running, or 8 cores if the minion decides to run two 8-core tasks. The responder configuration property ‘preferred cores’ can influence this number to a certain extent, but the final decision is made by the minion. To ensure the most efficient use of available hardware, your responder should:
- never use more cores than allocated by the minion;
- never request more cores than it can make use of.
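The two rules above can be captured in a tiny helper. This is a sketch, not Microbase API; `effective_cores` and the `preferred` cap are hypothetical names:

```python
def effective_cores(allocated, preferred=None):
    # Rule 1: never use more cores than the minion allocated.
    # Rule 2: never request more cores than the tool can actually use;
    #         'preferred' models a per-tool cap, if one is known.
    cores = allocated
    if preferred is not None:
        cores = min(cores, preferred)
    return max(1, cores)

effective_cores(16)               # sole task on a 16-core machine -> 16
effective_cores(16, preferred=4)  # tool stops scaling beyond 4 cores -> 4
```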
Bear in mind that many bioinformatics tools support multi-core (local) parallelism, but the benefit of assigning additional cores drops off after some threshold. For example, if you have a 32-core machine available, you will need to perform adequate benchmarking to determine whether you should run 8×4-core jobs, 2×16-core jobs, or one job that uses all 32 cores.
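As a first-order model before real benchmarking, Amdahl's law can rank the candidate configurations. The sketch below assumes, purely for illustration, that 90% of each job parallelises perfectly; real tools must be measured:

```python
def speedup(cores, parallel_fraction):
    # Amdahl's law: the serial part runs at full cost, while the
    # parallel part divides across the cores.
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

def machine_throughput(jobs, cores_per_job, parallel_fraction):
    # Jobs completed per unit of single-core job time, per machine.
    return jobs * speedup(cores_per_job, parallel_fraction)

p = 0.9  # assumed parallel fraction; benchmark this for your own tools
configs = {
    "8 x 4-core":  machine_throughput(8, 4, p),
    "2 x 16-core": machine_throughput(2, 16, p),
    "1 x 32-core": machine_throughput(1, 32, p),
}
# With p = 0.9, many small jobs win: roughly 24.6 vs 12.8 vs 7.8
```

Under this model the diminishing returns are stark: doubling a job's cores from 16 to 32 yields only a small speedup, so the spare cores are better spent on additional jobs.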