Introduction

A workflow aims to accomplish a complex piece of work. In Astrogrid terms this would be an astronomical investigation. If the work could be done in one simple step (eg: by executing a single program from the command line), then it would hardly be a workflow. A prime dimension of a workflow is therefore that the objective will take more than one step to accomplish - it has a degree of complexity.

A simple example of a workflow might consist of the following steps:

  1. Make a query against against one catalog (say a cone search of one area of the sky).
  2. Make a similar query against another catalog.
  3. Merge the contents.
  4. Analyse the resulting data.
Each step is usually the invocation of a separate computer program (a tool), each tool requiring its own inputs and outputs. Some inputs to a step will be astronomical data, and some inputs will be control information for the selected tool. Outputs will normally be processed data destined as a final result, or as intermediate data to be used as inputs for subsequent steps within the workflow. For Astrogrid, the final result of executing a workflow is usually a file held in MySpace. From that viewpoint, Astrogrid workflow is itself an intermediate tool, since we would envisage the final results file being subsequently loaded into another tool, for example, for visualization, or even for input into a different workflow.

Astrogrid tries to deal with the complexity of workflows. But it is not an easy thing to eliminate. The following tries to give the salient points.

Format and Structure

We hold workflow designs as XML documents. This is a particularly verbose way of describing an individual workflow, but is convenient for its rigour. Examples are shown here . There is no necessity to interact directly with a workflow xml document and therefore there is no necessity to be intimately equainted with it; the portal design tool hides most of the intricacies. However, some exposure to it is probably useful. Workflow structure is described by its XML schema: every workflow document has to obey the rules embodied in the schema. The details of the workflow schema are described here , but come back to this later.

Persistence Mechanism

Workflow designs are held as files (as XML documents) and stored in MySpace for Astrogrid purposes. They can be copied or shared. At the end of the day they are just files.

Design Time and Execution Time

A workflow must be designed by an astronomer with a particular goal in mind. The process of design may be anything from the straightforward to the immensely complex. Even a relatively simple workflow consisting of a small number of steps may involve an astronomer in an iterative process of design, running the workflow against data, examining the results for feedback, then refining the design. A good workflow may therefore be something of a capital investment, and worth sharing with others.

There are two aspects worth noting.

  1. Workflows are involved in two processes: a design t ime process and an execution time process. A single workflow (ie, a design) may be executed many times. Usually it is obvious from the context whether a workflow is meant as a design or as a record of a specific execution of that design. Just be aware of the difference. We try to be clear by using a separate term (ie: 'job') for an executed workflow.
  2. Any workflow will refer to data outside of itself (files, catalogs). If you copy, amend or share a workflow, some of these external references may need to be taken into account.

Metadata

In designing a workflow, we need specialized information regarding each of the steps we design. We tend to refer to data that itself describes astronomical data, or data that describes the working of an astronomical tool, by the term metadata. Most often you hear the terms catalogue metadata and tools metadata. A workflow design contains metadata or references to metadata within it.

We embody metadata within a workflow in various forms:

  • As a query to be submitted against an astronomical archive
  • As a reference to a file which itself is to be used as control data (eg, a configuration file for a given tool)
  • As settings embedded directly in a workflow document, where the setting is used to control the execution of a tool (eg, an integer value).
But even the name of a tool is derived from access to metadata, so there is no hard and fast line. This control material must be collected together in the design process. Metadata is stored in the Astrogrid registry.

Job and Job Submission

When a workflow begins execution, we treat it as a job. A workflow becomes a job by being submitted to the Job Execution System. Each submission represents a separate job. If you submit one workflow fifty times, you have fifty jobs, and one workflow. We keep a separate record for each job, a fact which is reflected in the workflow schema. In effect, a job is a specialized workflow document, which contains not only the design (as it was when it was submitted) but run-time information about a specific execution down to the individual step level.

Job Execution System

The Job Execution System is the part of Astrogrid that controls the execution of a workflow. Otherwise known as JES, it is Astrogrid's workflow engine. JES is an engine that can manage jobs consisting of multiple steps, where individual steps can be run on different computers across a network. JES will attempt to run steps in a controlled fashion (eg: in parallel or sequentially depending upon the workflow definition). Because steps can execute at different locations, both control flow and data flow have to be managed. A partner component, known as the Common Execution Controller (CEC), manages step execution and any data flow required.

All steps in Astrogrid workflow execute asynchronously. That means that a step dispatched by JES is done so on a non-blocking basis.

CEA and CEC

CEA stands for Commom Execution Architecture. It is our standard for formalizing the web-interface provided by all tools, no matter whether they are implemented as Java code, commandline applications, or another SOAP / HTTP-GET based web service. The set of CEA applications may be split into datacenters and processing tools . Datacenters support complex queries against astronomical archives, while processing tools consume files of data and reduce them in some way. Although these are two very different animals, both can be thought of as tools.

CEC implementations inform JES when a step has completed. This is because steps are dispatched on an asynchronous basis: there is no waiting for a step to finish once it has been dispatched.

A Unit of Work

If you want to consider things as units of work, the smallest unit of work undertaken in a workflow is embodied in a single step. So a job will contain all the units of work sucessfully completed by the steps that got executed by JES. However, usually the appearance of a final result's file (eg, a specific VOTable), or a set of such files , within MySpace will be indicative. At the end of the job, or even at strategic points within it, an email can be sent. Whether a job is a formal success or failure can be programmed into the workflow design and interpreted by JES. Certainly from this point of view you can definitely tell when a job has failed. It will tell whether things have gone wrong, but not necessarilly whether the results produced in a successful run are what were expected. (At present we also have some problems with steps that never return.)