Design of the pipeline code structure

This document outlines the basic design of the pipeline code, which aims for maximum flexibility and extensibility.

Processors

All pipeline processing should be done through classes that inherit from a common base class, PipelineProcessor, which should provide a uniform interface for configuring things like logging and crash recovery. Operations that are shared among multiple processors, but are so atomic that they could not issue useful log messages or could only be recovered after a crash by discarding all progress, should avoid this mechanism and be implemented as stand-alone functions.
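As a rough illustration of this split, the following is a minimal sketch and not the actual implementation. Only the name PipelineProcessor comes from this document. The constructor arguments (log_level, checkpoint_path), the methods process and run, the helper normalise_whitespace and the subclass TitleCaseProcessor are all hypothetical names chosen for the example.

```python
import logging
from abc import ABC, abstractmethod
from typing import Any, Optional


class PipelineProcessor(ABC):
    """Common base class; only the class name comes from the design text."""

    def __init__(self, log_level: int = logging.INFO,
                 checkpoint_path: Optional[str] = None) -> None:
        # Hypothetical configuration hooks for logging and crash recovery.
        self.logger = logging.getLogger(type(self).__name__)
        self.logger.setLevel(log_level)
        self.checkpoint_path = checkpoint_path

    @abstractmethod
    def process(self, data: Any) -> Any:
        """Carry out this processor's step (hypothetical method name)."""

    def run(self, data: Any) -> Any:
        # Uniform entry point: every processor logs in the same way.
        self.logger.info("starting")
        result = self.process(data)
        self.logger.info("finished")
        return result


def normalise_whitespace(text: str) -> str:
    """Atomic shared helper: nothing useful to log, nothing to checkpoint."""
    return " ".join(text.split())


class TitleCaseProcessor(PipelineProcessor):
    """Hypothetical concrete processor, used only for illustration."""

    def process(self, data: str) -> str:
        return normalise_whitespace(data).title()
```

In this sketch the shared helper stays a plain function because wrapping it in a processor would add logging and recovery configuration it could never make use of.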

Code layout

Each main-level step in Pipeline Steps (i.e. those with a single number) sits in a separate Python package, with each first-level sub-step implemented as a class in its own .py file. To avoid excessively long import statements, the __init__.py file of each main-level package should import the individual classes from their respective .py files and add them to its __all__ variable.
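The layout described above can be tried out with the following self-contained sketch, which writes a throwaway package to a temporary directory and imports it. The package name demo_step1 and the classes LoadData and CleanData are placeholders invented for the example and do not correspond to real pipeline steps.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: one package per main-level step, one class per file.
FILES = {
    "demo_step1/load_data.py": "class LoadData:\n    pass\n",
    "demo_step1/clean_data.py": "class CleanData:\n    pass\n",
    # The __init__.py re-exports each class and lists it in __all__.
    "demo_step1/__init__.py": (
        "from .load_data import LoadData\n"
        "from .clean_data import CleanData\n"
        "\n"
        "__all__ = ['LoadData', 'CleanData']\n"
    ),
}


def build_demo_package(root: Path) -> None:
    """Write the placeholder package files below the given directory."""
    for rel_path, source in FILES.items():
        path = root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(source)


root = Path(tempfile.mkdtemp())
build_demo_package(root)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Short import made possible by the re-exports in __init__.py ...
from demo_step1 import LoadData, CleanData

# ... instead of the longer per-file form:
# from demo_step1.load_data import LoadData
demo_step1 = importlib.import_module("demo_step1")
```

With the re-exports in place, calling code depends only on the package name, so a class can later be moved to a different file inside the package without changing any import statements elsewhere.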