# The computational problem

## Statistical analyses are usually based on performing operations on large data matrices, which grow with increasing sample sizes.

## These analyses are usually performed on a single compute node (e.g. a desktop, a server, or a node in a large computing cluster).

## Single compute nodes have a small number of compute cores and a limited amount of memory.

## This limits both the dimensions of the matrices that a single compute node can analyse and the time required to analyse them, which in turn restricts the sample sizes that can be used in common genomic analyses.
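To make the memory limit concrete, here is a back-of-the-envelope sketch (an illustration, not taken from DISSECT's documentation): a dense n-by-n matrix of double-precision values needs 8·n² bytes, so memory grows quadratically with sample size.

```python
def dense_matrix_gib(n: int) -> float:
    """Return the memory (in GiB) needed to hold a dense n x n
    matrix of 8-byte double-precision values."""
    return 8 * n * n / 2**30

# A relationship matrix for 50,000 samples already needs ~18.6 GiB,
# and doubling the sample size quadruples the memory requirement.
print(f"{dense_matrix_gib(50_000):.1f} GiB")
print(f"{dense_matrix_gib(100_000):.1f} GiB")
```

Even a well-equipped single server runs out of memory long before reaching the sample sizes of modern genomic datasets.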

# How to address it

## To overcome the memory and computational capacity limitations, DISSECT decomposes the matrices into blocks.

## The blocks are distributed across networked compute nodes.

## Each node performs computations on local data and shares data with other nodes through the network connection when the algorithm requires it.

## Although this approach requires software specifically designed to handle the data redistribution, it is not restricted by the computational limits of a single compute node.

## As a consequence it is highly scalable: larger datasets can be analysed simply by increasing the number of compute nodes available for the analysis.
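The pattern described above can be sketched with a blocked matrix multiplication (a single-process simulation of the idea, not DISSECT's actual implementation; in the real software each block lives on a different MPI process and the accumulation happens over the network):

```python
import numpy as np

def blocked_matmul(a: np.ndarray, b: np.ndarray, block: int = 2) -> np.ndarray:
    """Compute a @ b by decomposing both matrices into blocks.

    In a distributed setting each block would be held by a different
    compute node; here the "nodes" are just slices of local arrays.
    """
    n = a.shape[0]
    c = np.zeros((n, n))
    for i in range(0, n, block):
        for j in range(0, n, block):
            for k in range(0, n, block):
                # Each node multiplies only the blocks it holds locally;
                # the partial products are then accumulated (in MPI this
                # accumulation is a reduction across the network).
                c[i:i+block, j:j+block] += (
                    a[i:i+block, k:k+block] @ b[k:k+block, j:j+block]
                )
    return c

a = np.arange(16.0).reshape(4, 4)
b = np.eye(4)
assert np.allclose(blocked_matmul(a, b), a @ b)
```

Because no node ever needs the whole matrix in memory, the maximum problem size is bounded by the aggregate memory of the cluster rather than by any single machine.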

# What do I need to use DISSECT?

## You can use DISSECT either on a single computer or on a large computing cluster.

## To take advantage of DISSECT's scalability, you just need a cluster of networked compute nodes with MPI libraries available. These libraries are standard and widely available.
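On a cluster, MPI programs are typically started through the MPI launcher. The fragment below is illustrative only: the binary name `dissect` and the argument placeholder are assumptions, and the launcher on your cluster may be `mpiexec` or `srun` instead of `mpirun`.

```shell
# Run across 16 MPI processes (hypothetical invocation; consult the
# DISSECT documentation for the actual command-line options).
mpirun -np 16 dissect ...
```

The same binary can be run with a single process on a desktop, since MPI does not require more than one node to work.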