Robert's Geospatial Gibberish: Workflow


Friday, 17 October 2008

NEDF Overview

Report to NZ Geospatial Office.

National Elevation Data Framework (NEDF) workshop, 18th March 2008, AAS, Shine Dome, Canberra, sponsored by the Australian Academy of Science (AAS) and ANZLIC.

Since this is rather a long report compared to typical blog entries, I've split it into 4 posts to make it more manageable.

Contents
NEDF Part 1: The Australian National Elevation Data Framework
NEDF Part 2: Implications for New Zealand
NEDF Part 3: Strawman NZ Elevation Data Framework
NEDF Part 4: Recommendations for a Plan of Action

Acknowledgements

I would like to thank NZGO staff for inviting me and the other NZ participants to attend the NEDF workshop, and thank the other participants for sporadic discussions since then. Where available I have attached their individual reports as appendices. I have also attached an independent, otherwise unpublished proposal for KiwiDEM from Paul Hughes at DoC. Having said that, any failings, shortcomings or omissions in the report are mine and not theirs. Finally, I acknowledge that the PGSF-funded SpInfo II research project has funded my time to write this report, which is an additional output beyond the original terms of the contract.

NEDF Part 4: Recommendations for a Plan of Action

The following recommendations should be seen as a checklist of actions that collectively will move NZ's elevation infrastructure fully into a digitally wired Web 2.0 world. For this to be achieved, every contributor needs to move their elevation assets and knowledge into a digital, web-enabled form. Standards need to shift from official published documents, which describe the circumstances for the standard and contain formulae and data references, to authoritative web-services that actively support and embody best practice of the standard in use. While this might traditionally be approached in a grand-design, top-down, organised way, it can also be approached as a bottom-up grass-roots movement where each contributor progressively establishes a suite of web-services associated with their own elevation assets and knowledge. Such an approach is anathema to traditionalists who need to organise. The beauty of Web 2.0, however, is that provided each participant approaches the solution to their part of the problem using appropriate standards (eg the OGC WFS and WCS standards), and expects everything to be in a state of continuous evolution as they incrementally respond to market needs (ie the development principle of continuous just-in-time beta releases rather than occasional massive version changes), then collectively we will converge on a working solution with minimal grand-design overhead and significantly reduced risk of failure. Success in a Web 2.0 environment is directly related to shortened time to market. Don’t ‘talk and plan’, just ‘do it and do it again’.

There are three types of action in the following recommendations: those associated with new improved data, those associated with licensing and pricing, and those associated with web-service enabling existing and new digital elevation assets. All are important in the long run, but provision of web-services is actually the easiest to achieve quickly and will drive the imperative for the other two, by generating demand and, equally importantly, making the need more transparent. So wherever there are digital assets and process knowledge already in the public domain (eg central govt data and standards), there is the opportunity to make a very significant start. Each agency with elevation assets will know its digital assets better than I do, and will be able to take the principles outlined here and convert them into an appropriate implementation plan that will almost certainly deviate from the details I have suggested below. The most important thing is that each agency takes on board the principles above and considers its assets in the light of Web 2.0 thinking.

Access to Existing Data

With most high resolution data owned by local government, with a range of different licensing arrangements for access to data for other than the original purpose, work is needed to:

1. establish a web-service based on-line catalogue of all elevation data primary sources, their ownership and licensing. This includes LiDAR data and previous data sources such as contours and spot height measurements.

2. negotiate licensing arrangements for access to existing data where possible

3. establish web-services for on-demand data access and delivery

4. establish protocols for ensuring that future high resolution elevation data is licensed for widest possible access.

5. encourage all owners of elevation data assets to participate in making their data available. This includes non-traditional contributors such as Transit NZ and road engineering contractors who have very detailed before and after data associated with highway construction, road realignment etc. or such as architects and construction companies who build buildings whose outside dimensions (footprint and height) are needed to convert Digital Surface Models (DSM) to bare-earth Digital Elevation Models (DEM).

Reference Frame Solutions

Precise conversion between existing reference frames is limited by the state of our current knowledge of the reference frames, so a programme of work is required to resolve at least the uncertainties in existing knowledge and establish protocols for continuing refinement of our reference frame knowledge.

geoid reference: current knowledge of the geoid reference is based on a set of disconnected historic high-precision level surveys that followed the roads of the day, predating, for instance, the Haast Pass road. Two possible solutions present themselves (items 1 and 2), each supported by the web-services in items 3 and 4:
1. extend the high precision surveys, using modern equipment, to close the loops that are currently open, and link neighbouring surveys. This will allow the existing survey data to be recomputed, reducing the uncertainty in the existing data.
2. investigate the option of adding a levelling payload to the existing road (& rail) condition surveying equipment. This equipment regularly traverses all major roads, recording road pavement condition as a function of location. If the survey vehicle had level recording gear added to its payload and all data from successive surveys were accumulated, the frequency of the measurements would probably mean that even lower-precision individual measurements could result in greater overall precision.
3. establish web-services for on-demand data access and delivery of all the historic and real-time raw data gathered
4. establish web-processing services to provide on-demand standard reference analysis of this data.
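
The argument in option 2 rests on the standard error of a mean: if each pass yields an independent height with standard deviation σ, the mean of n passes has standard error σ/√n. The following is a minimal sketch of that arithmetic, assuming independent and unbiased errors (a systematic error in vehicle-mounted gear would not average out). The precision figures are illustrative only, not measured values.

```python
import math


def standard_error(sigma_single: float, n_passes: int) -> float:
    """Standard error of the mean of n independent measurements."""
    if n_passes < 1:
        raise ValueError("need at least one pass")
    return sigma_single / math.sqrt(n_passes)


def passes_needed(sigma_single: float, sigma_target: float) -> int:
    """Smallest number of passes whose mean reaches the target precision."""
    if sigma_single <= 0 or sigma_target <= 0:
        raise ValueError("precisions must be positive")
    return math.ceil((sigma_single / sigma_target) ** 2)


# Illustrative numbers only: a 50 mm single-pass error against a 5 mm target.
print(passes_needed(50.0, 5.0))
print(standard_error(50.0, 100))
```

The quadratic growth in the number of passes is the practical limit: halving the error needs four times as many traverses.
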
sea-level reference: the key to precision in sea-level based reference frames is the time-span of the measured baseline, coupled with the quality of the reference to the associated land-based bench-mark(s). A number of the existing sea-level stations are based on relatively short baselines of under a year. Two years of intensive measurement is normally considered the minimum to properly model the tidal pattern. Modelling sea-level change requires continuous, but less frequent, monitoring. The suggested solution is to:
1. determine the configuration of an optimal network of port and open-coast monitoring stations
2. establish permanent sea-level monitoring stations with data-loggers
3. establish web-services for on-demand data access and delivery of all the historic and real-time raw data gathered
4. establish web-processing services to provide on-demand standard reference analysis of this data.
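
As an illustration of what a "standard reference analysis" of tide-gauge data might involve, here is a minimal sketch of harmonic analysis: a least-squares fit of a mean level plus a single tidal constituent (M2, period about 12.42 hours). This is a deliberate simplification. A real analysis fits many constituents, some very close in frequency, which is one reason long records are needed. The function names are hypothetical.

```python
import math

M2_PERIOD_HOURS = 12.4206012  # principal lunar semidiurnal constituent


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - tail) / m[r][r]
    return x


def fit_single_constituent(times_h, heights, period_h=M2_PERIOD_HOURS):
    """Fit h(t) = z0 + a*cos(w*t) + b*sin(w*t) by least squares.

    Returns (mean level z0, amplitude, phase lag in radians).
    """
    w = 2.0 * math.pi / period_h
    basis = [(1.0, math.cos(w * t), math.sin(w * t)) for t in times_h]
    # Normal equations: (B^T B) x = B^T h
    btb = [[sum(r[i] * r[j] for r in basis) for j in range(3)] for i in range(3)]
    bth = [sum(r[i] * h for r, h in zip(basis, heights)) for i in range(3)]
    z0, a, b = solve3(btb, bth)
    return z0, math.hypot(a, b), math.atan2(b, a)


# Synthetic hourly record, 30 days long, with known parameters.
w = 2.0 * math.pi / M2_PERIOD_HOURS
times = [float(i) for i in range(24 * 30)]
levels = [1.5 + 0.8 * math.cos(w * t - 0.3) for t in times]
print(fit_single_constituent(times, levels))
```
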
ellipsoidal reference: New Zealand uses many 'standard ellipsoids', some unique to NZ and others that are also used widely internationally. Unlike the geoid and sea-level references, ellipsoids are generally mathematically defined and not subject to ongoing refinement through measurement. The one exception is the family of ellipsoids based on NZGD2000, that are designed to allow for differential tectonic movement resulting in/from distortions to the NZ landmass. NZ has a network of permanent highest precision differential GPS stations established to monitor and define these distortions.
1. establish web-services for on-demand data access and delivery of all the historic and real-time raw data gathered
2. establish web-processing services to provide on-demand standard reference conversions between the ellipsoids used in NZ
3. establish web-processing to provide the standard reference reduction of the data from the GPS stations, so that people can use the difference between the standard ellipsoid and the distortion of the NZ terrain at any date within the range of the observations.
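
Because ellipsoids are mathematically defined, the core of a conversion service is closed-form geometry. The sketch below converts between geodetic coordinates (latitude, longitude, ellipsoidal height) and Earth-centred Cartesian coordinates on the GRS80 ellipsoid, which underlies NZGD2000. It is a minimal sketch only: moving between datums also needs a datum transformation (and, for NZGD2000, a deformation model), neither of which is attempted here, and the inverse is not valid at the poles.

```python
import math

# GRS80, the ellipsoid underlying NZGD2000.
GRS80_A = 6378137.0            # semi-major axis, metres
GRS80_F = 1.0 / 298.257222101  # flattening


def geodetic_to_ecef(lat_deg, lon_deg, h, a=GRS80_A, f=GRS80_F):
    """Geodetic (degrees, metres) to Earth-centred Cartesian (metres)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    e2 = f * (2.0 - f)  # first eccentricity squared
    n = a / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)  # prime vertical radius
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - e2) + h) * math.sin(lat)
    return x, y, z


def ecef_to_geodetic(x, y, z, a=GRS80_A, f=GRS80_F, tol=1e-12, max_iter=20):
    """Iterative inverse of geodetic_to_ecef (not valid at the poles)."""
    e2 = f * (2.0 - f)
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1.0 - e2))  # exact when h == 0
    for _ in range(max_iter):
        n = a / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - n
        new_lat = math.atan2(z, p * (1.0 - e2 * n / (n + h)))
        converged = abs(new_lat - lat) < tol
        lat = new_lat
        if converged:
            break
    # Recompute the height for the final latitude.
    n = a / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)
    h = p / math.cos(lat) - n
    return math.degrees(lat), math.degrees(lon), h


# Round trip for a point near Wellington (illustrative coordinates).
print(ecef_to_geodetic(*geodetic_to_ecef(-41.3, 174.8, 100.0)))
```
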

Elevation Surface Interpolation Solutions

There are many of these, some geared to particular source data types (eg contour to DEM, and stereo image to DSM), others geared to production of elevation models with particular characteristics (eg drainage enforcement, optimising height and/or slope accuracy, or removal of certain subtle artefacts). Ultimately, the wider the selection the better. Some are available as open-source codes, others are licensed. Obviously the open-source ones are more amenable to being published as a web-service, but the important thing is to get the codes in use.
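
To make "interpolation to a raster for a nominated extent and resolution" concrete, here is a minimal sketch using inverse distance weighting (IDW). IDW is chosen only because it is short; it is not one of the specialised methods mentioned above and does no drainage enforcement or artefact removal. The function name and argument layout are hypothetical.

```python
def idw_grid(points, xmin, ymin, xmax, ymax, cell, power=2.0):
    """Interpolate scattered (x, y, z) points onto a raster.

    Returns a list of rows ordered north to south; each value is the
    estimate at the centre of its cell.
    """
    points = list(points)
    if not points:
        raise ValueError("need at least one source point")
    ncols = int(round((xmax - xmin) / cell))
    nrows = int(round((ymax - ymin) / cell))
    grid = []
    for r in range(nrows):
        yc = ymax - (r + 0.5) * cell
        row = []
        for c in range(ncols):
            xc = xmin + (c + 0.5) * cell
            num = den = 0.0
            exact = None
            for x, y, z in points:
                d2 = (x - xc) ** 2 + (y - yc) ** 2
                if d2 == 0.0:
                    exact = z  # a source point sits on the cell centre
                    break
                weight = 1.0 / d2 ** (power / 2.0)
                num += weight * z
                den += weight
            row.append(exact if exact is not None else num / den)
        grid.append(row)
    return grid


# Two spot heights on a 2 x 2 grid of 1 m cells.
print(idw_grid([(0.5, 0.5, 10.0), (1.5, 1.5, 20.0)], 0, 0, 2, 2, 1))
```

Every cell visits every source point, so this only suits small inputs; a production service would use a spatial index or one of the purpose-built codes.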

1. establish web-services using open-source codes for interpolation of raw elevation data into a raster elevation model for a user nominated extent and resolution.

2. stand up existing ‘best of breed’ derived elevation datasets as web-services, eg as an OGC WCS-compliant service, so users can extract subsets as needed. Initially these datasets will be disconnected from their source data and codes, but in the longer term, as the full processing workflow becomes available, they will become pre-computed elevation datasets that are constantly updated from all the available web-based primary data sources and software codes.
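
Extracting a subset from a WCS service comes down to a single HTTP request. The sketch below builds a WCS 1.0.0 `GetCoverage` request in key-value form. The endpoint URL and coverage name are placeholders, and the accepted `FORMAT` and `CRS` values depend on the server, so a real client would read them from the service's `GetCapabilities` and `DescribeCoverage` responses first.

```python
from urllib.parse import urlencode


def wcs_getcoverage_url(endpoint, coverage, bbox, width, height,
                        crs="EPSG:4326", fmt="GeoTIFF"):
    """Build a WCS 1.0.0 GetCoverage URL for a bounding-box subset.

    bbox is (minx, miny, maxx, maxy) in the units of crs; width and
    height are the size of the returned raster in cells.
    """
    params = {
        "SERVICE": "WCS",
        "VERSION": "1.0.0",
        "REQUEST": "GetCoverage",
        "COVERAGE": coverage,
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return endpoint + "?" + urlencode(params)


# Placeholder endpoint and coverage name, for illustration only.
print(wcs_getcoverage_url("https://example.org/wcs", "nz_dem",
                          (174.7, -41.4, 174.9, -41.2), 200, 200))
```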

Reduction from surface model to bare-earth model

As has been noted earlier, this is a particular issue with processing LiDAR datasets and can account for up to 30% of the total cost of production of a bare earth elevation model. It is also often the most contentious part of the data delivery contract and therefore where most gain can potentially be made, and where there is least precedent for how to approach an optimal solution. In other words this is likely to be the hardest part to achieve.

1. establish web-services for known surface objects. With LiDAR, it is usually thought that surface objects (eg buildings, bridges) can be automatically identified from the raw LiDAR data and then removed. To a certain extent this is true, but if a city council, for instance, already has 3D models of downtown buildings at a dimensional precision that exceeds the precision of the LiDAR, then it makes sense to use that data source. Also, if a city utility already has data about assets in its drainage network (eg pipes and culverts under roads that can’t be directly observed in the LiDAR), then that can be very useful input to a drainage enforcement algorithm when attempting to create a surface elevation model for drainage or flood modelling. So data describing all of these known objects should also be available as web-services.
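
As a rough illustration of using known objects instead of detecting them, the sketch below subtracts a known building height from every DSM cell whose centre falls inside that building's footprint. It assumes flat roofs and heights measured above ground, both simplifications, and the function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def remove_buildings(dsm, xmin, ymax, cell, buildings):
    """Return a bare-earth copy of dsm (rows ordered north to south).

    buildings is a list of (footprint polygon, height above ground).
    """
    dem = [row[:] for row in dsm]
    for r, row in enumerate(dsm):
        yc = ymax - (r + 0.5) * cell
        for c, z in enumerate(row):
            xc = xmin + (c + 0.5) * cell
            for footprint, height in buildings:
                if point_in_polygon(xc, yc, footprint):
                    dem[r][c] = z - height
                    break
    return dem


# A 12 m building on flat 10 m ground, occupying the centre cell.
dsm = [[10.0, 10.0, 10.0],
       [10.0, 22.0, 10.0],
       [10.0, 10.0, 10.0]]
footprint = [(1, 1), (2, 1), (2, 2), (1, 2)]
print(remove_buildings(dsm, 0, 3, 1, [(footprint, 12.0)]))
```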

Wednesday, 10 September 2008

AHM08/RS1: Regular Session

Jeremy Cohen: ICENI II

Coordinate forms:

declarative workflow language
.. describe what not how
.. much easier to logically analyze the flow
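
A small illustration of "what not how", not taken from the talk: the workflow below is declared purely as a set of dependencies, and the execution order (including which steps could run in parallel) is derived from it. The step names are invented, and the sketch uses Python's standard `graphlib` module (Python 3.9+) rather than any ICENI component.

```python
from graphlib import TopologicalSorter

# Each step declares what it needs, not when it runs.
workflow = {
    "fetch_points": set(),
    "fetch_footprints": set(),
    "interpolate": {"fetch_points"},
    "remove_buildings": {"interpolate", "fetch_footprints"},
    "publish": {"remove_buildings"},
}


def execution_order(spec):
    """One valid serial order for the declared steps."""
    return list(TopologicalSorter(spec).static_order())


def parallel_waves(spec):
    """Group steps into waves whose members could run concurrently."""
    sorter = TopologicalSorter(spec)
    sorter.prepare()  # raises graphlib.CycleError if the spec is cyclic
    waves = []
    while sorter.is_active():
        wave = sorted(sorter.get_ready())
        waves.append(wave)
        sorter.done(*wave)
    return waves


print(parallel_waves(workflow))
```

Because the declaration carries no ordering of its own, checks such as cycle detection fall out of analysing the dependency graph.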

use of coordination forms for matching

workflow execution .. bpel, scufl etc

declarative workflow generation tuned to user's normal activities

.. automated workflow generation
.. extract from a user's real-time use of their natural software - matlab etc

workflow execution with performance .. performance repository .. used to drive planning of optimal execution plan

ICENI II plan

Daniel Goodman: Decentralised Middleware and Workflow Enactment for the Martlet Workflow Language

Middleware comprises:
.. Process Coordinator
.. Data Store
.. Data Processor

Essentially introduces an efficient protocol for P2P communication between PCs and DPs, such that each node becomes aware of changes in the state and availability of the network as a whole in a decentralised, robust and efficient way.

Ahmed Algaoud: Workflow Interoperability

API for workflow interoperability providing direct interaction
.. based on WS-eventing .. asynchronous
.. look to implement in eg Triana Taverna Kepler

WS-Eventer set up with four types
.. subscriber, sink service, subscription manager, source service
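
To show how the four roles relate, here is an in-process sketch. It is an analogy only: real WS-Eventing exchanges SOAP messages between separate services and delivers notifications asynchronously, whereas this delivers them with direct method calls. Class and method names are invented.

```python
from collections import defaultdict


class Sink:
    """Receives notifications (the sink service)."""

    def __init__(self):
        self.received = []

    def notify(self, topic, message):
        self.received.append((topic, message))


class SubscriptionManager:
    """Tracks live subscriptions and lets them be cancelled."""

    def __init__(self):
        self._subs = defaultdict(dict)  # topic -> {subscription id: sink}
        self._next_id = 0

    def subscribe(self, topic, sink):
        self._next_id += 1
        self._subs[topic][self._next_id] = sink
        return self._next_id

    def unsubscribe(self, topic, sub_id):
        self._subs[topic].pop(sub_id, None)

    def sinks_for(self, topic):
        return list(self._subs[topic].values())


class Source:
    """Emits events to whoever is currently subscribed (the source service)."""

    def __init__(self, manager):
        self.manager = manager

    def emit(self, topic, message):
        for sink in self.manager.sinks_for(topic):
            sink.notify(topic, message)


# The subscriber is whoever registers a sink; here, this script.
manager = SubscriptionManager()
source = Source(manager)
sink = Sink()
sub_id = manager.subscribe("workflow.finished", sink)
source.emit("workflow.finished", "run 1 complete")
manager.unsubscribe("workflow.finished", sub_id)
source.emit("workflow.finished", "run 2 complete")
print(sink.received)
```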

also use WSPeer & working with NAT issues.

Asif Akram: Dynamic Workflow in GRID Environment

Imperial College

part of ICENI project
GridCC incl QoS, BPEL, ActiveBPEL

introduce QoS language

QoS criteria incl security, performance (from performance criteria)

Used WS Addressing engine (WSA) to achieve dynamic redefinition of the BPEL partner link within the BPEL.

BPEL Editor / Monitor

Conclusion .. QoS can be injected into BPEL which makes dynamic workflow much easier to achieve, and this can be achieved within existing standard specification.

Jos Koetsier: A RAPID approach to enabling domain specific applications

User prefers domain specific portlet, but there is quite a lot of work creating domain specific portlets so ..
OK so approach is to build a custom portlet generator ..

have written one based on jsdl and jsdl xml file (GridSAM)

Uses OMII.uk s/w

obtain at http://research.nesc.ac.uk/rapid

Martin Dove: MaterialsGrid: An end-to-end approach for computational projects
3yr 5fte project www.materialsgrid.org

based on CASTEP to simulate the behaviour of materials to predict the properties of material.

results are contributed to a database .. which may also hold measured properties.

so database content is computed on demand for groups of users that don't want to know the computational under the hood stuff.

workflow using scitegic pipeline pilot instead of bpel, partly because the bpel std wasn't uniformly implemented.

cml.sourceforge.net .. chemical ml from cmlcomp.org

cml2sql & www.lexical.org golem to construct cml

.. jquery allows mix of pulldown and autocompletion & constrains to allowed values ..

AHM08/W9-2 : The Global Data Centric View

Jon Blower: A Framework to Enable Harmonisation of Globally-Distributed Environmental Data holdings using Climate Science Modeling Language

How we use the climate science modeling language.

data from many instruments .. need to combine them all to:
.. validate numerical models
.. calibrate instruments
.. data assimilation - formal method for combining data and model ..
.. making predictions - eg floods, climate, drift at sea and search and rescue

The need for harmonisation: scientists spend lots of time (up to 80% for some post-docs) dealing with low-level technical issues .. need a common view onto all appropriate datasets

OGC aim to describe all geographic data . mandated by inspire .. but fiendishly complex evolved from maps

Need to bridge the gap: CSML
both abstract data model & xml encoding

provides a new view of existing data, doesn't actually change it.

14 feature types ..
classified by geometry not their content

First way to harmonise two datasets: CSML plugs into GeoServer (like GeoSciML)

Second way via Java-CSML
.. aim to reduce the cost of doing analysis
.. high-level analysis/vis routines completely decoupled from the data

Java-CSML Design attempts
.. transform CSML xml schema to java code using automated tool
.. leads to v complex code
.. OGC geoapi but incomprehensible & geoapi is a moving target

.. based on well-known java concepts
.. reduce the users code
.. you can always wrap something
.. wrappers for wfs, netcdf, opendap etc to make them all look the same
.. also have plotting routines
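
The "wrap everything so it looks the same" idea can be sketched as follows. This is Python rather than Java, and it is not the Java-CSML API: the interface and the two wrappers are invented stand-ins that wrap in-memory data instead of real WFS, NetCDF or OPeNDAP sources. The point is that the analysis routine is written once, against the interface only.

```python
from typing import Dict, Iterable, List, Protocol, Sequence, Tuple


class Series(Protocol):
    """The single interface that analysis code is written against."""

    def times(self) -> List[float]: ...

    def values(self) -> List[float]: ...


class RowSeries:
    """Wraps row-oriented data, eg (time, value) records from a feature service."""

    def __init__(self, rows: Iterable[Tuple[float, float]]):
        self._rows = list(rows)

    def times(self) -> List[float]:
        return [t for t, _ in self._rows]

    def values(self) -> List[float]:
        return [v for _, v in self._rows]


class ColumnSeries:
    """Wraps column-oriented data, eg named arrays from an array file format."""

    def __init__(self, columns: Dict[str, Sequence[float]],
                 time_key: str, value_key: str):
        self._t = list(columns[time_key])
        self._v = list(columns[value_key])

    def times(self) -> List[float]:
        return self._t

    def values(self) -> List[float]:
        return self._v


def mean_value(series: Series) -> float:
    """An analysis routine that never sees the underlying format."""
    vals = series.values()
    return sum(vals) / len(vals)


print(mean_value(RowSeries([(0.0, 1.0), (1.0, 3.0)])))
print(mean_value(ColumnSeries({"time": [0.0, 1.0], "sst": [2.0, 4.0]}, "time", "sst")))
```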

Problem is that the more you abstract the more info you lose, so need some more specific profiles that inherit the parent profile and add the extra knowledge for a specific instance.

Wider lessons ..

.. interoperable data formats not necessarily suitable for storage
.. trade-offs between scope and complexity
.. symbiotic relationship between stds, tools & applications.

Aside: there are more OPeNDAP services than WCS services for raster data.

Alistair Grant: Bio-Surveillance: Towards a Grid Enabled Health Monitoring System

Problem .. SQL SELECT blah, count from databases where diagnosis = 'X'

databases is a set of databases with non-std schemas

OGSA-DAI used to solve this.

RODSA-DAI was one solution ..
Views can be implemented in a database, but Views can also be hosted at an ogsa-dai service layer

.. this allows both security to be implemented remote from the database, also allows remote organisations to see a view without requiring hosts to support a particular view or set of views
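
The idea of hosting views outside the databases can be sketched with two in-memory SQLite databases standing in for sites with non-standard schemas. This does not use OGSA-DAI itself; the schemas, table names and data are invented. Each source gets a mapping query (its "view") held in the service layer, and the aggregate query is run against that view, so neither site has to create anything in its own database.

```python
import sqlite3


def make_db(schema_sql, insert_sql, rows):
    """Create an in-memory database standing in for one remote site."""
    db = sqlite3.connect(":memory:")
    db.execute(schema_sql)
    db.executemany(insert_sql, rows)
    return db


# Two sites with different schemas (invented for illustration).
db_a = make_db("CREATE TABLE visits (diag TEXT, region TEXT)",
               "INSERT INTO visits VALUES (?, ?)",
               [("flu", "north"), ("flu", "north"), ("cold", "south")])
db_b = make_db("CREATE TABLE cases (diagnosis_code TEXT, area TEXT)",
               "INSERT INTO cases VALUES (?, ?)",
               [("flu", "north"), ("flu", "south")])

# The views live in the service layer: one mapping query per source.
VIEWS = [
    (db_a, "SELECT region AS area, diag AS diagnosis FROM visits"),
    (db_b, "SELECT area, diagnosis_code AS diagnosis FROM cases"),
]


def count_by_area(diagnosis):
    """Count matching records per area across every source."""
    counts = {}
    for db, view_sql in VIEWS:
        sql = ("SELECT area, COUNT(*) FROM (" + view_sql + ") "
               "WHERE diagnosis = ? GROUP BY area")
        for area, n in db.execute(sql, (diagnosis,)):
            counts[area] = counts.get(area, 0) + n
    return counts


print(count_by_area("flu"))
```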

.. output transformed as required to google maps/earth

.. ogsa-dai views are slower, but not so much slower as to outweigh the advantages.
cf www.ogsadai.org.uk
www.phgrid.net
www.omii.ac.uk

Chris Higgins reports that SEE GEO has implemented OGSA-DAI wrapper for WFS.

Lourens E Veen: Virtual Lab ECOGrid: Turning Field Observations into Ecological Understanding

ECOGrid
also www.science.uva.nl/ibed-cge

Species Behaviour
Biotic and abiotic data, incl human behavior
Field Data
Statistical analyses

Organisations incl govt, infrastructure & conservation, & private volunteers

Different datamodels:
Approach incorporated a hierarchical approach of
.. Core data
.. Extended attribute
.. Set Specific extensions to preserve original data
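
One way to picture the three-level model is as a record type with a fixed core, an optional shared layer, and a catch-all that keeps whatever else the original dataset recorded. This is a guess at the shape, not the ECOGrid schema; the field names, the choice of core fields and the sample record are all invented.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Observation:
    # Core data: fields every contributing dataset is assumed to supply.
    species: str
    date: str  # ISO 8601 date
    x: float
    y: float
    # Extended attributes: shared vocabulary, present only in some datasets.
    extended: Dict[str, Any] = field(default_factory=dict)
    # Set-specific extensions: leftover original fields, keyed by source.
    source_specific: Dict[str, Dict[str, Any]] = field(default_factory=dict)


def from_record(source, record, core_map, extended_keys=()):
    """Build an Observation from one source record.

    core_map maps each core field name to the source's own column name.
    Anything not claimed by the core or extended layers is preserved
    unchanged under the source's name.
    """
    used = set(core_map.values()) | set(extended_keys)
    return Observation(
        species=record[core_map["species"]],
        date=record[core_map["date"]],
        x=float(record[core_map["x"]]),
        y=float(record[core_map["y"]]),
        extended={k: record[k] for k in extended_keys if k in record},
        source_specific={
            source: {k: v for k, v in record.items() if k not in used}
        },
    )


record = {"soort": "Vulpes vulpes", "datum": "1998-05-01",
          "x": "120.5", "y": "480.2",
          "habitat": "dune", "observer_note": "tracks only"}
obs = from_record("survey_a", record,
                  {"species": "soort", "date": "datum", "x": "x", "y": "y"},
                  extended_keys=("habitat",))
print(obs)
```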

info goes back at least to the 50s, but also earlier data if available.

Tamas Kukla, Tamas Kiss, Gabor Terstyanszky: Integrating OGSA-DAI into Computational Grid Workflows

University of Westminster

want to expand workflows in two ways ...

Major problem: data access in all the common systems is limited - mainly file or v limited database
eg Triana, Taverna, Kepler, P-Grade Portal

Workflow level interoperation of grid data resources

OGSA-DAI is sufficiently generic for it to be a good candidate.

Data staging
Static vs semi dynamic vs dynamic

static staging - in spec and access before and out spec and access after but not during
semi-dynamic - in and out specified before and in out executed during
dynamic - all access during the workflow **

ogsa-dai integration , tool, workflow editor vs workflow engine

only integration into the engine provides fully dynamic access

either implemented at the port or within the node - chosen within the node - which provides better integration

required functionality .. everything is too complex.

more specific support tool &/or totally generic - chose to support both styles of access.

Chose P-Grade Portal workflow engine, based on GridSphere with extended DAG workflow engine
in P-Grade nodes are jobs, ports represent files and links file transfer

direct submission not possible .. need an application repository so

Chose GEMLCA application repository, which is also a job submitter part of Globus.

The advantage of this approach is that GEMLCA is sufficiently generic that it can be used in a range of other workflow systems.

cf http://ngs-portal.cpc.wmin.ac.uk/

Saturday, 6 September 2008

Workflows dissected

In New Zealand the concept of web-service or grid workflow is very new, with a morass of new nomenclature that I have found difficult to grasp all at once. So I have attempted to relate objects, names and concepts in the workflow world to their functional equivalents in traditional programming development and execution environments, which are more widely known. This is not to pretend that a web service and a file, for example, are the same, but instead to recognise that within the two different domains they fulfil functionally equivalent roles. By seeing things in this way, it becomes easier to understand how all the new nomenclature fits together. Of course, sometimes the functional fit is very loose and at other times the equivalence is very close. So this is the conclusion that I have come to. If it helps you as well, then that is useful; if I have missed something fundamental, then I'm happy to be corrected and to adjust the table. So if you are an expert feel free to comment, but bear in mind that this is a table to emphasize functional similarities from the perspective of newbies to the workflow space. Following blogs will hopefully expand on key differences.

OK first attempt at the table - as yet incomplete:

Functional Role	Traditional Environment	Web-service based Workflow - Taverna	Grid based Workflow - Triana	Web-service based Workflow - Sedna
Scripting tools	AML, shell script	SCUFL	?	Domain PEL & Scientific PEL
Programming Language	C++, Fortran, Java	n/a	?	BPEL
Integrated Development Environment	MS Visual Studio	Taverna	Triana	Sedna plugin to Eclipse IDE
Callable object	DLL file	Web Service	Java Unit	Web Service
Executable Object	EXE file	Taverna workflow	Triana workflow	BPEL bpr archives
Process launch & control, or enactment	Windows, Linux	Freefluo workflow enactor	GAP	ActiveBPEL engine
File/data objects	File, database	Web service	Grid service protocol GridFTP	Web service

table v0.1, Sep 5th, 2008
