.. _framework-dependencies:
*******************
External components
*******************
The LOFAR pipeline system is built using the `Python
<http://www.python.org/>`_ programming language. Certain features build upon
the following libraries and tools. The short descriptions given here should
serve as background material for those who simply wish to use the framework:
directly interacting with these components should rarely be necessary.
Developers, of course, will wish to learn in detail about all of these
libraries.
.. _ipython-blurb:
IPython
=======
`IPython <http://ipython.org/>`_, billed as "an enhanced interactive
Python", also provides a comprehensive and easy-to-use suite of tools for
parallel processing across a cluster of compute nodes using Python. This
capability may be used when writing recipes in the pipeline system.
The parallel computing capabilities are only available in recent (post-0.9)
releases of IPython. The reader may wish to refer to the IPython
documentation for more information or, for a summary of the capabilities of
the system, to the "Notes on IPython" document on the LOFAR wiki.
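As a rough illustration of this task-farming interface, the sketch below
uses the IPython 0.9 ``client`` API to dispatch a task to whichever engine
is free; the expression and variable names are invented for this example::

    # Assumes an IPython controller and engines are already running.
    from IPython.kernel import client

    tc = client.TaskClient()      # connect to the running controller
    task = client.StringTask(
        "result = 2 * x",         # expression executed on an engine
        push=dict(x=21),          # data shipped to the engine first
        pull="result",            # variable retrieved when done
    )
    task_id = tc.run(task)        # scheduled on any available engine
    result = tc.get_task_result(task_id, block=True)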
A slight enhancement to the standard 0.9 IPython release is included with the
pipeline system. We subclass :class:`IPython.kernel.task.StringTask` to create
:class:`pipeline.support.LOFARTask`. This adds the ``dependargs`` named
argument to the standard :class:`~IPython.kernel.task.StringTask`, which, in
turn, is fed to the task's :meth:`depend` method. This makes the dependency
system significantly more useful. See the :ref:`dppp-recipe` recipe for an
example of its use.
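For instance, ``dependargs`` makes it possible to pass the name of the host
holding a given piece of data to a dependency function, so that a task only
runs on the matching engine. The sketch below is illustrative only: the
``node_has_data`` helper, the ``hostname`` property, and the file paths are
assumptions for this example, not framework API::

    from pipeline.support import LOFARTask

    def node_has_data(properties, hostname):
        # Hypothetical dependency function: run only on the engine
        # whose "hostname" property matches the given name.
        return properties.get("hostname") == hostname

    task = LOFARTask(
        "result = process(vds_file)",           # run on the engine
        push=dict(vds_file="/data/sb001.vds"),  # invented example path
        pull="result",
        depend=node_has_data,                   # checked per engine
        dependargs=("lce001",),                 # fed through to depend()
    )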
.. _distproc-blurb:
distproc
========
An alternative method of starting a distributed process across the cluster is
to use the ``distproc`` system by Ger van Diepen. This system is used
internally by various pipeline components, such as the MWImager; the
interested reader is referred to the MWImager Manual for an overview of its
operation.
Infrastructure for supporting the ``distproc`` system is well embedded within
various pipeline components, so the new framework has been designed to make
use of that infrastructure where possible. In particular, the reader's
attention is drawn to two file types:
``clusterdesc``
   A clusterdesc file describes the cluster configuration. It defines a
   control node and various processing nodes, and describes the disks and
   other resources to which they have access (see the sketch below).
``VDS``
A VDS file describes the contents of a particular dataset and where it may
be found on the cluster. For the standard imaging pipeline, data is
distributed across different nodes by subband; each subband is described
   by a single VDS file (generated by the ``makevds`` command). The VDS
   files describing the subbands of a given observation may be combined
   (using ``combinevds``) to generate a description of all the available
   data, often known as a GDS file; example invocations follow below.
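As a rough illustration, a clusterdesc file is a simple parset-style text
file; the keys, node names, and disk paths shown here are invented for this
example, so consult the ``distproc`` documentation for the definitive
format::

    ClusterName = imaging
    Head.Nodes = [ lfe001 ]
    Compute.Nodes = [ lce001, lce002, lce003 ]
    Compute.LocalDisks = [ /data ]

Producing the dataset descriptions follows the same pattern on the command
line. The invocations below assume the usual ``makevds <clusterdesc>
<dataset>`` and ``combinevds <output> <inputs>`` argument order, with
invented file names::

    $ makevds imaging.clusterdesc /data/L12345_SB001.MS
    $ combinevds L12345.gds *.vds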
The information contained in these files is used by both task distribution
systems to schedule jobs on the appropriate compute nodes.