.. _pipeline-layout:

Pipeline layout
===============

The pipeline system is designed to be organised in a standard directory
structure. Insofar as is possible, this contains all information needed to
manage a cluster and assosciated set of pipeline tasks. It is not designed to
contain the actual data that is being processed. It is assumed that
this directory will be available to all the various cluster nodes, presumably
using NFS.

The root directory of this structure is the ``runtime`` directory. This
contains all the information about a given "cluster" -- that is, all the
disks, compute nodes, management node, etc described by a given
``clusterdesc`` file. This top level directory contains that ``clusterdesc``,
and, if appropriate, information about an associated IPython cluster: 

* A PID file (``ipc.pid``) and log files from the IPython controller (named
  according to the pattern ``ipc.log${pid}.log``)

* An ``engines`` directory, which contains PID (``ipengine${N}.pid``, where
  ``N`` simply increments according to the number of engines on the host)
  files and log files (``log${PID}.log``) from the various engines in the
  cluster, organised into subdirectories by hostname.

* The files ``engine.furl``, ``multiengine.furl`` and ``task.furl``, which
  provide access to the IPython engines, multiengine client and task client
  resepectively. See the IPython documentation for details.

Of course, a single cluster (and hence runtime directory) may process many
different jobs. These are stored in the ``jobs`` subdirectory, and are
organised by job name -- an arbitrary user-supplied string.

Within each job directory, three further subdirectories are found:

``logs``
    Processing logs; where appropriate, these are filed by sub-band name.

``parsets``
    Paramaeter sets providing configuration information for the various
    pipeline components. These should provide the static parameters used by
    tasks such as ``DPPP`` and the imager; dynamic parameters, such as the
    name of the files to be processed, can be added by the pipeline at
    runtime.

``vds``
    Contains VDS and GDS files pointing to the location of the data to be
    processed on the cluster.