The computational problem

Statistical analyses are usually based on performing operations on large data matrices, whose dimensions grow with increasing sample sizes.

These analyses are usually performed on a single compute node (e.g. a desktop, a server, or one node of a large computing cluster).

Single compute nodes have a small number of compute cores and a limited amount of memory.

This limits both the dimensions of the matrices that a single compute node can analyse and the time required to analyse them, which in turn constrains the sample sizes that can be used in common genomic analyses.

How to address it

To overcome the memory and computational capacity limitations, DISSECT decomposes the matrices into blocks.

The blocks are distributed among networked compute nodes.

Each node performs computations on local data and shares data with other nodes through the network connection when the algorithm requires it.

Although this approach requires software specifically designed to handle the data distribution, it is not restricted by the computational limits of a single compute node.

As a consequence the approach scales well: larger datasets can be analysed simply by increasing the number of compute nodes available for the analysis.
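The idea of operating on a matrix block by block can be sketched in a few lines. The following is a conceptual illustration only, not DISSECT's actual code: a matrix product is computed one block at a time, and each block task is independent, so it could be assigned to a different compute node (DISSECT distributes such tasks via MPI).

```python
# Conceptual sketch (not DISSECT's implementation): compute C = A * B
# block by block. Each (i, j) output block depends only on one block-row
# of A and one block-column of B, so the tasks can run on separate nodes.

def matmul(a, b):
    """Plain dense matrix multiply, used here on small blocks."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def block(m, i, j, s):
    """Extract the s-by-s block at block coordinates (i, j)."""
    return [row[j * s:(j + 1) * s] for row in m[i * s:(i + 1) * s]]

def blocked_matmul(a, b, s):
    """C = A * B for square matrices, computed in s-by-s blocks."""
    nb = len(a) // s  # number of blocks per dimension
    c = [[0] * len(a) for _ in range(len(a))]
    for i in range(nb):
        for j in range(nb):
            acc = [[0] * s for _ in range(s)]
            for t in range(nb):
                acc = add(acc, matmul(block(a, i, t, s), block(b, t, j, s)))
            for r in range(s):
                c[i * s + r][j * s:(j + 1) * s] = acc[r]
    return c

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 2]]
# The blocked result matches the whole-matrix product.
assert blocked_matmul(A, B, 2) == matmul(A, B)
```

In a distributed setting, only the blocks a node needs reside in its memory, and the partial products are exchanged over the network, which is what removes the single-node memory limit.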

What do I need to use DISSECT?

You can use DISSECT on a single computer or on a large computing cluster.

To take advantage of DISSECT's scalability, you only need a cluster of networked compute nodes with MPI libraries installed. These libraries are standard and widely available.
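MPI programs are typically started with a launcher such as `mpirun` or `mpiexec`, which spawns one process per requested slot across the cluster's nodes. The command below is only an illustration of that general pattern: the binary name and option shown are placeholders, not DISSECT's actual command line, so consult the DISSECT manual for the real invocation.

```shell
# Illustrative only: launch 16 MPI processes across the cluster.
# "./dissect" and "--params analysis.params" are hypothetical placeholders.
mpirun -np 16 ./dissect --params analysis.params
```

On managed clusters this command is usually placed inside a job script submitted to the scheduler (e.g. SLURM or SGE), which decides on which nodes the processes run.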