This note describes a proposal for a Common Execution Architecture
(CEA) within the Virtual Observatory. It discusses the general
motivation behind the design as well as detailed schema and WSDL
definitions of the architecture. The scope of this document covers areas
of interest to the Registry and Grid Working Groups as well as the
Applications Special Interest Group.
This is an IVOA Working Draft for review by IVOA members and other
interested parties. It is a draft document and may be updated,
replaced, or obsoleted by other documents at any time. It is
inappropriate to use IVOA Working Drafts as reference materials or to
cite them as other than "work in progress." A list of current IVOA
Recommendations and other technical documents can be found at
http://www.ivoa.net/Documents/.
The Common Execution Architecture (CEA) is an attempt to create a
reasonably small set of interfaces and schema to model how to execute a
typical Astronomical application within the Virtual Observatory (VO).
In this context an application can be any process that consumes or
produces data, so in existing terminology could include
- A unix command line application
- A database query
- A web service
The CEA has been primarily designed to work within a web services
calling mechanism, although it is possible to have specific language
bindings using the same interfaces. For example, Astrogrid has a Java
implementation of the interfaces that can be called directly from a
Java executable.
The primary requirements motivating the creation of this
architecture are:
- To create a uniform interface and model for an application and
its parameters. This has twin benefits:
- It gives VO infrastructure writers a single model of an
application that they have to code for.
- Application writers know what they have to implement to be
compatible with a VO Infrastructure.
- To provide a higher level description than WSDL 1.1 can offer.
- Restrict the almost limitless possibilities allowed by WSDL
into a manageable subset.
- Provide specific semantics for some astronomical quantities.
- Provide extra information not allowed in WSDL - e.g. default
values, descriptions for use in a GUI etc.
- To provide extensions to the VOResource schema (see
the IVOA WG) that can describe a general application.
- To provide asynchronous operation of an application. This is
essential as the call tree that invokes the application cannot be
expected to remain active for extremely long-lasting operations, e.g. a
user invokes a data-mining operation from a web browser that takes days.
- Provide callback for notification of finishing.
- Provide polling mechanisms for status.
- To allow the data flow to diverge from the call tree. In a typical
application execution the results are returned to the invoking process.
In a VO scenario, it can be useful if the application can be instructed
to pass the results on to a different location.
The design for this architecture has evolved from the requirements
for the Workflow/Job Execution System components within AstroGrid. It
was desirable for the job execution system to have a single model for
an application, so that it could deal with the (already complex)
problems of scheduling, looping, conditional execution etc. without
needing to have specializations for all the different types of service
(SIA, Database query, Cone Search, etc.) that it might be required to
invoke.
Amongst the VO specifications there was no existing model for
applications defined at the level that this design
attempts to address. In the VOResource schema an application is defined
as a Service with the interface definition. The interface definition
either relies on referring to a WSDL definition of the service, or on
other schema extending the service definition to provide some specific
detail as in the case of a Simple Image Access service. There is no
general definition of an application in the resource.
It is clear that the WSDL model of an interface has had a large
influence on the design of the CEA, but it should be remembered that
the CEA is intentionally layered on top of WSDL, so that CEA
controls the scope and semantics of operations. There is only one WSDL
definition for all applications, so as far as web services are concerned
the interface is constant. CEA works by transporting meta
information about the application interface within this constant WSDL
interface.
- Application - This is the process that is to be executed. It is
defined as a process that can consume or create data, so this can
include unix command line tools, database queries, web services etc.
- Common Execution Controller - this is the component that
implements the CommonExecutionConnector interface, and actually
controls the execution of the application. There can be various
specialisms of this service, such as the
CommandLineApplicationController, which can be configured to invoke a
general unix command line tool, or a WebServiceApplicationController, which
can be configured to act as a proxy to call a general web service in a
uniform manner.
- Invoking process
- Monitoring Service - This is a service that the Common Execution
Controller can report status to - it can of course be
- Storage Service - this is the mechanism by which the application
can return its results in the indirect parameter mode (see indirect parameters).
- Results Service
2.2 Interactions
The above sequence diagram illustrates how the various components of
the CEA system interact when an application is executed. The
steps are
- The invoking process calls the init method of the CommonExecutionConnector
interface, which is implemented by the component known as the CommonExecutionController.
This will set up the execution environment for the application and will return
immediately with an executionID, which is the identifier by which the
CommonExecutionController keeps track of this particular execution instance.
The parameters to this call are
- A Tool object - This is described in more detail below.
- JobIdentifier - this is the identifier that the invoking process
uses to keep track of this particular execution instance.
- The invoking process then has the opportunity to register two classes of listener
- Status Monitor - this is the endpoint of the service that implements
the JobMonitor interface that the ExecutionController can call
to inform the monitoring process of the status of the execution instance.
- Results Listener - this is the endpoint of a service that implements the ResultsListener port so that the ExecutionController can report the results of the application execution once they are ready
- Then the execute operation should be invoked and the CommonExecutionController will then start the application.
- The application can then optionally return status information to the CommonExecutionController
which will then pass this on to the Monitor Service.
- When the application completes it will inform the CommonExecutionController,
which will then pass the indirect results on to the storage service, pass the direct results back to any results listeners, and inform the
monitor service that the application has finished.
Some points of note:
- The monitoring/resultListening services could equally be the same as the invoking
service - they are shown as conceptually separate, as the endpoint of
this service is passed in as an argument to the registering call. Indeed if required there could be many status and results listeners for a single application execution.
- The only guaranteed status message that the monitoring service
will receive is the one informing it that the application has finished
(or failed). The application might be capable of sending intermediate
messages whilst it is still executing, but this is not required.
- The results of the application are not necessarily returned directly
to the invoking process. For "indirect" output parameters, the final destination for the result data is
implicit in the specification of the output parameters, and it is the
responsibility of the ExecutionController to ensure that they get
to the desired storage service.
- The results will also always be passed to the resultsListener if registered. In the case of an indirect parameter, only the URI that specifies the location will be returned; otherwise the full value will be returned.
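The interaction sequence described in this section can be sketched as a small in-memory simulation. This is only an illustrative sketch, not the Astrogrid implementation: the class `ToyExecutionController`, the snake_case method names, the status strings "COMPLETED" and "ERROR", and the use of plain dictionaries for the Tool and the results are all assumptions made for the example. The real execute operation is asynchronous, whereas this toy runs the application synchronously so that the order of the callbacks is easy to follow.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Results = Dict[str, str]


@dataclass
class Execution:
    """Book-keeping for one execution instance (hypothetical structure)."""
    job_id: str                       # identifier chosen by the invoking process
    tool: Dict[str, str]              # stand-in for the Tool object
    status_listeners: List[Callable[[str, str], None]] = field(default_factory=list)
    results_listeners: List[Callable[[str, Results], None]] = field(default_factory=list)


class ToyExecutionController:
    """In-memory stand-in for a CommonExecutionController (assumption)."""

    def __init__(self, application: Callable[[Dict[str, str]], Results]) -> None:
        self._application = application
        self._executions: Dict[str, Execution] = {}
        self._counter = itertools.count(1)

    def init(self, tool: Dict[str, str], job_id: str) -> str:
        # Set up the execution environment and return immediately with an id.
        execution_id = f"exec-{next(self._counter)}"
        self._executions[execution_id] = Execution(job_id, tool)
        return execution_id

    def register_progress_listener(self, execution_id: str, listener) -> None:
        self._executions[execution_id].status_listeners.append(listener)

    def register_results_listener(self, execution_id: str, listener) -> None:
        self._executions[execution_id].results_listeners.append(listener)

    def execute(self, execution_id: str) -> None:
        # Synchronous here for simplicity; the real operation is asynchronous.
        execution = self._executions[execution_id]
        try:
            results = self._application(execution.tool)
        except Exception:
            for notify in execution.status_listeners:
                notify(execution.job_id, "ERROR")
            return
        # Results go to the results listeners, then the final status is reported.
        for deliver in execution.results_listeners:
            deliver(execution.job_id, results)
        for notify in execution.status_listeners:
            notify(execution.job_id, "COMPLETED")


events = []
controller = ToyExecutionController(
    lambda tool: {"sum": str(int(tool["a"]) + int(tool["b"]))}
)
execution_id = controller.init({"a": "2", "b": "3"}, job_id="job-42")
controller.register_progress_listener(
    execution_id, lambda job, status: events.append(("status", job, status))
)
controller.register_results_listener(
    execution_id, lambda job, results: events.append(("results", job, results))
)
controller.execute(execution_id)
print(events)
```

Note that the listeners are addressed by the JobIdentifier supplied by the invoking process, while the controller itself tracks the run by its own executionID.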
2.3 Interfaces
CommonExecutionConnector
This is the main port that is used to communicate with the
application. The main operations in this port are:
- init - this will initialize the application environment and return an executionId by which the CommonExecutionController keeps track of the execution instance
- registerResultsListener - any number of services can register themselves as wanting to receive the results from the run when they are available as long as they implement the ResultsListener port below
- registerProgressListener - any number of services can register themselves as wanting to receive status messages during the run as long as they implement the JobMonitor port below
- execute - will actually start the asynchronous execution of the application specified in the init call.
- queryExecutionStatus - this call can be used to actively obtain the execution status of a running application, rather than passively waiting for it as a JobMonitor
- abort - will attempt to abort the execution of an application
- getExecutionSummary - request summary information about the application execution
- getResults - actively request the results of the application execution, rather than passively waiting for them as a ResultsListener.
- returnRegistryEntry - this returns the registry entry for the particular CommonExecutionConnector instance - this will probably be removed from this interface to be replaced by the equivalent operation in the standard VO service definitions.
The WSDL definition of this interface is stored in cvs at http://www.astrogrid.org/viewcvs/*checkout*/astrogrid/workflow-objects/wsdl/CommonExecutionConnnector.wsdl?rev=HEAD
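As an illustration of the active (polling) style offered by queryExecutionStatus, the following sketch polls a status function until a finished state is seen. It is a minimal sketch under stated assumptions: `poll_until_finished`, its parameters, and the status strings are hypothetical, and `query_status` stands in for whatever client stub is generated from the WSDL; none of these names come from the CEA definitions.

```python
import time
from typing import Callable, Sequence


def poll_until_finished(
    query_status: Callable[[str], str],
    execution_id: str,
    finished_states: Sequence[str] = ("COMPLETED", "ERROR"),  # assumed names
    interval_s: float = 1.0,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call query_status repeatedly until it reports a finished state."""
    for _ in range(max_polls):
        status = query_status(execution_id)
        if status in finished_states:
            return status
        sleep(interval_s)
    raise TimeoutError(f"{execution_id} did not finish after {max_polls} polls")


# Usage with a fake status source standing in for a remote service.
_fake_statuses = iter(["INITIALIZED", "RUNNING", "RUNNING", "COMPLETED"])
final_status = poll_until_finished(
    lambda _execution_id: next(_fake_statuses),
    "exec-1",
    sleep=lambda _seconds: None,  # skip real waiting in the example
)
print(final_status)
```

Bounding the number of polls is a design choice of the sketch; for operations that last days, registering a JobMonitor callback is likely to be the more economical option.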
JobMonitor
The WSDL definition of this interface is stored in cvs at http://www.astrogrid.org/viewcvs/*checkout*/astrogrid/workflow-objects/wsdl/JobMonitor.wsdl?rev=HEAD
The only operation in the JobMonitor port is the monitorJob operation, which expects to receive a message with the job-identifier-type (as specified in the original init operation of the CommonExecutionConnector port) and a status message.
ResultsListener
The WSDL definition of this interface is stored in cvs at http://www.astrogrid.org/viewcvs/*checkout*/astrogrid/workflow-objects/wsdl/CeaResultsListener.wsdl?rev=HEAD
The only operation is the putResults on the ResultsListener port. This accepts a message that contains a job-identifier-type and a result-list-type, which is just a list of parameterValues.
2.4 Objects
The objects that participate in CEA can be split into two groups
- Those used to describe the application in the registry
- Application - the overall application, which has a series of parameters.
- Parameter - the detailed description of a parameter and its type.
- Those used to describe the application in the WSDL interface
- Tool - an instance of an application with real parameter values.
- ParameterValue - used to pass values to a Tool.
These are described in more detail in the following sections.
2.4.1 Application
uml model
As this model depicts, an application in CEA is really quite a simple
entity consisting of 1 or more interfaces, each of which consists of 0 or more
input parameters and 0 or more output parameters.
The schema representation is shown below, and is essentially a
representation of the UML model that has been coded to recognise that
the same parameter can occur in several interfaces.
This diagram also shows a number of specialized elements all within the substitution
group which has Parameter as the head. These are
implementation details where extra information is needed to specify how to
use the parameters - for example in the case of a command line parameter it
is necessary to know the command line switch or position that the parameter
appears at.
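The application model described in this section can be sketched with a few data classes. This is an illustrative sketch only: the class and field names, the type strings "VOTable" and "double", and the example application are assumptions and do not reproduce the CEA schema. Parameters are defined once per application and interfaces refer to them by name, which is one way of letting the same parameter occur in several interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Parameter:
    name: str
    type: str              # hypothetical type label
    description: str = ""


@dataclass
class Interface:
    name: str
    inputs: List[str] = field(default_factory=list)    # references by name
    outputs: List[str] = field(default_factory=list)   # references by name


@dataclass
class Application:
    name: str
    parameters: Dict[str, Parameter]
    interfaces: List[Interface]

    def __post_init__(self) -> None:
        # An application has 1 or more interfaces.
        if not self.interfaces:
            raise ValueError("an application needs at least one interface")
        # Every reference must point at a parameter defined on the application.
        for iface in self.interfaces:
            for ref in iface.inputs + iface.outputs:
                if ref not in self.parameters:
                    raise ValueError(
                        f"interface {iface.name!r} references unknown parameter {ref!r}"
                    )


# A made-up application with two interfaces that share the "catalogue" input.
app = Application(
    name="HypotheticalCrossMatch",
    parameters={
        "catalogue": Parameter("catalogue", "VOTable"),
        "radius": Parameter("radius", "double", "match radius in arcsec"),
        "matches": Parameter("matches", "VOTable"),
    },
    interfaces=[
        Interface("simple", inputs=["catalogue"], outputs=["matches"]),
        Interface("tuned", inputs=["catalogue", "radius"], outputs=["matches"]),
    ],
)
shared = set(app.interfaces[0].inputs) & set(app.interfaces[1].inputs)
print(shared)
```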
2.4.2 Parameter
The description of the parameters and the parameter values are
probably the heart of the CEA. It is the model for the parameters that
allows us to add semantic meaning, and gives flexibility in how
the parameters are transported. The implementation is still in its
infancy, but it is hoped that the parameter definition will be extended
to encompass any data models that the VO produces.
The basic parameter definition from the schema is shown below
2.4.3 Tool
The tool represents
the full collection of parameters that are passed to a particular interface
of an application and the results that are returned.
2.4.4 ParameterValue
The parameterValue
model is a simple but powerful representation of the parameters that are passed
to an application. The parameterValue element has 2 attributes
- name
- indirect - This describes whether the value element of the parameter should be used as is (indirect="false"), or whether the value of the parameter represents a URI from which the actual value should be fetched (indirect="true"). The minimum set of transport mechanisms that a service should understand to be CEA compliant has not been defined, but the different sorts of transport
mechanism are expected to include
- SOAP messages
- http get/put
- SOAP attachments
- ftp/gridftp
- MySpace
- local filestore
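The effect of the indirect attribute can be illustrated with a short sketch that either uses a value as is or fetches it from a URI. The `ParameterValue` class and `resolve` function below are hypothetical names chosen for the example, and the sketch only covers the URL schemes that Python's standard `urllib.request.urlopen` understands (for example http, ftp, file and data); transports such as SOAP attachments, gridftp or MySpace would need their own handlers.

```python
from dataclasses import dataclass
from urllib.request import urlopen


@dataclass
class ParameterValue:
    name: str
    value: str
    indirect: bool = False   # mirrors the indirect attribute


def resolve(parameter: ParameterValue) -> bytes:
    """Return the actual content that the parameter value stands for."""
    if not parameter.indirect:
        # indirect="false": the value element is used as is.
        return parameter.value.encode("utf-8")
    # indirect="true": the value is a URI from which the content is fetched.
    with urlopen(parameter.value) as response:
        return response.read()


direct = resolve(ParameterValue("target", "M31"))
# A data: URI keeps the example self-contained (no network access needed).
fetched = resolve(ParameterValue("target", "data:text/plain,M31", indirect=True))
print(direct, fetched)
```

In both cases the consumer ends up with the same bytes, which is the point of the model: the application logic does not need to know how the value was transported.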
2.5 Schema
The schema associated with the CEA fall into two categories
- The schema used to define the messages within the
CommonExecutionConnector interface.
- The schema that is used to define the VOResource extension for
CEA.
These schema are strongly interrelated (as they are imported in both
the WSDL and Registry Schema), which aids programming with
automated object generation tools, as there are many common objects.
The schema associated with CEA are
described below with links to their documentation.
| Filename (with cvs link) | Description | x3sp Documentation |
| AGApplicationBase.xsd | This schema defines most of the basic CEA objects that are imported into both the WSDL and the Registry Schema | Documentation |
| CEATypes.xsd | This defines the message types that are passed in queryStatus operations in the CommonExecutionConnector interface and in the MonitorJob operation of the Job Monitor interface. | Documentation |
| VOCEA.xsd | This defines the VOResource extensions of CeaApplication and CeaService that are used in the registry | Documentation |
| AGParameterDefinition.xsd | Contains the basic parameter definition and parameter value elements used in the other schema | Documentation |
| Workflow.xsd | This schema actually describes an astrogrid workflow document in full, but part of this is the tool element that is passed in as a parameter to the execute method in the CommonExecutionConnector interface. This tool element will be factored out into its own CEA specific schema in future. | Documentation |
2.5.1 Discussion of the VOResource Extension
It is a valid question to ask whether there needed to be a specific VOResource
extension to accommodate the CEA. The standard Service
element expects the interface to the service to be described in WSDL,
so given that CEA has constant WSDL definitions for different applications
there needs to be a way of expressing the fact that a particular CeaService can
run a particular set of applications. The method that was chosen was to extend
Service with an element that is just an aggregation
of pointers to the actual application definitions defined in CeaApplication, which
is an extension of the standard ResourceType.
These relationships are illustrated in the UML below.

For a particular application there should be only one CeaApplication entry
in the registry. This entry will define everything that is necessary to run
the application except for the endpoint of the service. This implies that
finding a particular instance of a particular application is a two-stage registry
query.
- Query the registry to find the application of interest - note the parameter
data and the IVOA identifier for the application.
- Query a second time to find the CeaService(s) that can run the application
with that IVOA identifier.
The diagram illustrates the point that one CeaService may run several
CeaApplications and that a particular CeaApplication can be run by
several CeaServices.
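The two-stage lookup can be sketched against a toy in-memory registry. Everything in this sketch is an assumption made for illustration: the field names, the `find_applications` and `find_services_for` helpers, the identifiers and the endpoints are invented, and a real registry would be queried through its own search interface rather than Python lists.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CeaApplication:
    identifier: str          # IVOA identifier of the application
    title: str
    parameters: List[str]


@dataclass
class CeaService:
    identifier: str
    endpoint: str
    managed_applications: List[str]   # pointers to CeaApplication identifiers


def find_applications(applications: List[CeaApplication], keyword: str) -> List[CeaApplication]:
    """Stage 1: find the application of interest."""
    return [a for a in applications if keyword.lower() in a.title.lower()]


def find_services_for(services: List[CeaService], application_id: str) -> List[CeaService]:
    """Stage 2: find the services that can run that application."""
    return [s for s in services if application_id in s.managed_applications]


# Made-up registry content: one application run by two services.
applications = [
    CeaApplication("ivo://example.org/apps/crossmatch", "Catalogue cross match", ["catalogue", "radius"]),
    CeaApplication("ivo://example.org/apps/mosaic", "Image mosaic", ["images"]),
]
services = [
    CeaService("ivo://example.org/svc/a", "http://a.example.org/cea",
               ["ivo://example.org/apps/crossmatch", "ivo://example.org/apps/mosaic"]),
    CeaService("ivo://example.org/svc/b", "http://b.example.org/cea",
               ["ivo://example.org/apps/crossmatch"]),
]

application = find_applications(applications, "cross match")[0]
endpoints = [s.endpoint for s in find_services_for(services, application.identifier)]
print(application.identifier, endpoints)
```

The sample data shows both directions of the relationship: service "a" runs two applications, and the cross match application is offered by two services.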
3 Deployment
3.1 Typical Scenario
This deployment shows some of the features of using the CEA
- On the right hand side of the diagram there are command line
applications that are wrapped by specialized CommonExecutionControllers
that allow the workflow engine to use the CommonExecutionConnector
interface to communicate
- There is a webservices proxy component that can act as an adaptor
between a generic web service and the CommonExecutionConnector interface
- On the left of the diagram the webservices proxy is localised
with a web service so that the results returned by the webservice can
be stored locally thus minimising network traffic
3.2 What it means for an Application to be CEA compliant
- Implement the CommonExecutionConnector interfaces.
- Be prepared to make a call back to the Job Monitor interface with intermediate
and final execution status.
- Understand how to interpret all the parameter types.
- Be able to support all the transport mechanisms for output parameters.
3.3 Astrogrid Implementation
The CEA is implemented in the following astrogrid components
- Applications Integration (maven documentation). This currently implements a specialized CommonExecutionController
that can execute unix command line applications. In iteration 6 of Astrogrid,
there are plans to create a CommonExecutionController that can execute arbitrary
web services.
- Workflow Common Objects (maven documentation). This project holds all of the schema and WSDL definitions
that are used by CEA based services in Astrogrid. Additionally it contains
Castor generated object bindings for the schema and the Axis generated web
services stubs for the service.
- Job Execution System (maven documentation). This is the engine of the Astrogrid
workflow.
4 Future Directions
4.1 What needs to be done to make this suitable for adoption by the IVOA
- Refactor the schema to remove some of the astrogrid specific
parts - particularly the tool element needs to be removed from workflow
and placed in AGApplicationBase.xsd schema.
- The status callback can be defined in the same WSDL as the main
interface.
- Renaming of components/files to not include astrogrid references.
- Bring into line with the new VOResource
4.2 Extensions
- AsynchronousActivityProposal
- Including work from the DM workgroup on basic parameter types - perhaps
extend the number of types that CEA "understands".
- Think about Capability/ontology.
Note The AuthorityID in this example is set to an illegal value of
@REGAUTHORITY@ which is a token that is replaced by the astrogrid installation
system.
These are all in-line links at the moment.