External components

The LOFAR pipeline system is built using the Python programming language. Certain features build upon the following libraries and tools. The short descriptions given here should serve as background material for those who simply wish to use the framework: directly interacting with these components should rarely be necessary. Developers, of course, will wish to learn in detail about all of these libraries.

IPython

IPython, billed as “an enhanced interactive Python”, also provides a comprehensive and easy-to-use suite of tools for parallel processing across a cluster of compute nodes using Python. This capability may be used for writing recipes in the pipeline system.

The parallel computing capabilities are only available in recent (0.9 and later) releases of IPython. The reader may wish to refer to the IPython documentation for more information, or, for a summary of the capabilities of the system, to the Notes on IPython document on the LOFAR wiki.
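As a brief illustration, a recipe might farm work out to the cluster along the following lines. This is a minimal sketch assuming the 0.9-series kernel API, with an ipcontroller and its ipengines already running; consult the IPython documentation for the authoritative interface:

    # Minimal sketch, assuming the IPython 0.9-series kernel API and a
    # running ipcontroller with attached ipengines.
    from IPython.kernel import client
    from IPython.kernel.task import StringTask

    tc = client.TaskClient()  # connect to the running controller

    # A StringTask wraps Python source to be executed on some engine;
    # 'push' supplies input variables, 'pull' names the values to return.
    task = StringTask("result = sum(range(n))",
                      push={'n': 10},
                      pull=['result'])

    task_id = tc.run(task)
    task_result = tc.get_task_result(task_id, block=True)
    print task_result  # the pulled 'result' value is carried by this object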

A slight enhancement to the standard 0.9 IPython release is included with the pipeline system. We subclass IPython.kernel.task.StringTask to create pipeline.support.LOFARTask. This adds the dependargs named argument to the standard StringTask, which, in turn, is fed to the task’s depend() method. This makes the dependency system significantly more useful. See the DPPP recipe for an example of its use.
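For reference, the enhancement amounts to little more than the following. This is an illustrative sketch only (the real implementation lives in pipeline.support), assuming the 0.9-series StringTask, whose check_depend() method is consulted with an engine's properties when the scheduler considers placing the task:

    from IPython.kernel.task import StringTask

    class LOFARTask(StringTask):
        """StringTask whose depend() callable also receives extra arguments."""
        def __init__(self, expression, dependargs=None, **kwargs):
            self.dependargs = dependargs
            super(LOFARTask, self).__init__(expression, **kwargs)

        def check_depend(self, properties):
            # The standard task calls self.depend(properties); here the
            # stored dependargs are passed along too, so one generic
            # dependency function can be parameterised per task.
            if self.depend is not None:
                return self.depend(properties, self.dependargs)
            return True

    # A dependency function might then restrict a task to the engine
    # holding the relevant data (the property names here are hypothetical):
    def data_on_host(properties, dependargs):
        return properties.get('hostname') == dependargs['hostname']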

distproc

An alternative method of starting a distributed process across the cluster is to use the distproc system by Ger van Diepen. This system is used internally by various pipeline components, such as the MWImager; the interested reader is referred to the MWImager Manual for an overview of the operation of this system.

Infrastructure for supporting the distproc system is well embedded within various pipeline components, so the new framework has been designed to make use of it where possible. In particular, the reader’s attention is drawn to two file types:

clusterdesc
A clusterdesc file describes the cluster configuration. It defines a control node, various processing nodes, and describes what disks and other resources they have access to.
VDS
A VDS file describes the contents of a particular dataset and where it may be found on the cluster. For the standard imaging pipeline, data is distributed across different nodes by subband; each subband is described by a single VDS file (generated by the makevds command). The VDS files describing the subbands of a given observation may be combined (using combinevds) to generate a description of all the available data, often known as a GDS file; see the sketch below this list.

The information contained in these files is used by both task distribution systems to schedule jobs on the appropriate compute nodes.
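By way of illustration, the per-subband VDS files and the combined GDS description for an observation might be produced as follows. This is a hedged sketch: the paths are hypothetical, and the command invocations assume the usual forms makevds <clusterdesc> <measurementset> [<vds>] and combinevds <output> <vds> ...; check the tools’ help output for the definitive syntax:

    # Hedged sketch: describe each subband with makevds, then combine the
    # descriptions into a single GDS file. Paths are hypothetical.
    import glob
    import subprocess

    clusterdesc = "/opt/lofar/etc/cep.clusterdesc"
    measurementsets = glob.glob("/data/scratch/L2009_12345/*.MS")

    vds_files = []
    for ms in measurementsets:
        vds = ms + ".vds"
        subprocess.check_call(["makevds", clusterdesc, ms, vds])
        vds_files.append(vds)

    # Combine the per-subband descriptions into a description of all the
    # available data for the observation (the "GDS" file).
    subprocess.check_call(["combinevds", "L2009_12345.gds"] + vds_files)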
