Wednesday, 10 September 2008

AHM08/W9-2 : The Global Data Centric View

Jon Blower: A Framework to Enable Harmonisation of Globally-Distributed Environmental Data holdings using Climate Science Modeling Language


How we use the climate science modeling language.

data from many instruments .. need to combine them all to:
.. validate numerical models
.. calibrate instruments
.. data assimilation - formal method for combining data and model ..
.. making predictions - eg floods, climate, drift at sea and search and rescue

The need for harmonisation: scientists spend lots of time (up to 80% for some post docs) dealing with low-level technical issues .. need a common view onto all appropriate datasets

OGC aim to describe all geographic data .. mandated by INSPIRE .. but fiendishly complex, having evolved from maps

Need to bridge the gap: CSML
both abstract data model & XML encoding

provides a new view of existing data, doesn't actually change it.

14 feature types ..
classified by geometry not their content

Harmonise two datasets with CSML, which plugs into GeoServer (like GeoSciML)

Second way via Java-CSML
.. aim to reduce the cost of doing analysis
.. high-level analysis/vis routines completely decoupled from the data

Java-CSML Design attempts
.. transform CSML XML schema to Java code using an automated tool
.. leads to v complex code
.. OGC GeoAPI, but incomprehensible & GeoAPI is a moving target

.. based on well-known Java concepts
.. reduce the user's code
.. you can always wrap something
.. wrappers for WFS, NetCDF, OPeNDAP etc to make them all look the same
.. also have plotting routines
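The wrapper idea can be sketched in a few lines. This is an illustration only, written in Python rather than Java, and it is not the Java-CSML API: `TimeSeriesView`, `RowsWrapper`, `ArrayWrapper` and `mean_value` are all hypothetical names, and the two wrappers are stand-ins for real WFS or NetCDF readers.

```python
from abc import ABC, abstractmethod


class TimeSeriesView(ABC):
    """Hypothetical common view: a series of values at one fixed location."""

    @abstractmethod
    def times(self) -> list:
        ...

    @abstractmethod
    def values(self) -> list:
        ...


class RowsWrapper(TimeSeriesView):
    """Wraps (time, value) rows, as a tabular service might return them."""

    def __init__(self, rows):
        self._rows = rows

    def times(self):
        return [t for t, _ in self._rows]

    def values(self):
        return [float(v) for _, v in self._rows]


class ArrayWrapper(TimeSeriesView):
    """Wraps parallel arrays, as an array-based file might hold them."""

    def __init__(self, time_axis, data):
        self._time_axis = time_axis
        self._data = data

    def times(self):
        return list(self._time_axis)

    def values(self):
        return [float(v) for v in self._data]


def mean_value(series: TimeSeriesView) -> float:
    """Analysis routine written once against the common view, not the storage."""
    vals = series.values()
    return sum(vals) / len(vals)


tide_gauge = RowsWrapper([("2008-09-10T00:00", "1.0"), ("2008-09-10T01:00", "3.0")])
model_run = ArrayWrapper(["2008-09-10T00:00", "2008-09-10T01:00"], [2.0, 4.0])
print(mean_value(tide_gauge), mean_value(model_run))  # prints 2.0 3.0
```

The analysis routine never sees where the numbers came from, which is the decoupling the talk was after.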

Problem is that the more you abstract the more info you lose, so need some more specific profiles that inherit the parent profile and add the extra knowledge for a specific instance.
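A rough sketch of the inheritance idea, under my own assumptions: the class names, the Kelvin units and the `to_celsius` helper are hypothetical and do not come from CSML. The parent knows geometry only; the child puts back the knowledge the abstraction dropped.

```python
class GenericGrid:
    """Hypothetical parent profile: knows the geometry, nothing about content."""

    def __init__(self, axes, values):
        self.axes = tuple(axes)
        self.values = list(values)

    def describe(self):
        return "grid over " + ", ".join(self.axes)


class SeaSurfaceTemperatureGrid(GenericGrid):
    """Hypothetical specific profile: inherits the parent, adds extra knowledge."""

    units = "K"  # assumption for this sketch: values are stored in Kelvin

    def describe(self):
        return f"{super().describe()} holding sea surface temperature in {self.units}"

    def to_celsius(self):
        # Only possible because this profile knows what the numbers mean.
        return [v - 273.15 for v in self.values]


sst = SeaSurfaceTemperatureGrid(["lat", "lon"], [273.15, 283.15])
print(isinstance(sst, GenericGrid))  # prints True
print(sst.describe())
```

Generic tools can still treat `sst` as a plain grid, while specialised tools can use the extra methods.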

Wider lessons ..

.. interoperable data formats not necessarily suitable for storage
.. trade-offs between scope and complexity
.. symbiotic relationship between stds, tools & applications.


Aside: more OPeNDAP services than WCS services for raster data.

Alistair Grant: Bio-Surveillance: Towards a Grid Enabled Health Monitoring System

Problem .. SQL: SELECT blah, COUNT(*) FROM databases WHERE diagnosis = 'X'

databases is a set of databases with non-std schemas

OGSA-DAI used to solve this.

RODSA-DAI was one solution ..
Views can be implemented in a database, but views can also be hosted at an OGSA-DAI service layer

.. this allows security to be implemented remote from the database, and also allows remote organisations to see a view without requiring hosts to support a particular view or set of views
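A minimal sketch of views held at a service layer rather than in the databases. It uses Python's built-in sqlite3 module and does not use the OGSA-DAI API; the two hospital databases, their table and column names, and `count_by_region` are all made up for illustration.

```python
import sqlite3

# Two hypothetical databases with different, non-standard schemas.
hospital_a = sqlite3.connect(":memory:")
hospital_a.execute("CREATE TABLE visits (region TEXT, diagnosis TEXT)")
hospital_a.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("North", "flu"), ("North", "flu"), ("South", "asthma")],
)

hospital_b = sqlite3.connect(":memory:")
hospital_b.execute("CREATE TABLE admissions (area_name TEXT, icd_label TEXT)")
hospital_b.executemany(
    "INSERT INTO admissions VALUES (?, ?)",
    [("North", "flu"), ("South", "flu")],
)

# The "views" live here, in the service layer. Nothing is created in the
# databases themselves, so the hosts do not have to support any view.
SERVICE_VIEWS = [
    (hospital_a, "SELECT region, diagnosis FROM visits"),
    (hospital_b, "SELECT area_name AS region, icd_label AS diagnosis FROM admissions"),
]


def count_by_region(diagnosis):
    """Run the same logical query through each view and merge the counts."""
    totals = {}
    for db, view_sql in SERVICE_VIEWS:
        query = (
            f"SELECT region, COUNT(*) FROM ({view_sql}) "
            "WHERE diagnosis = ? GROUP BY region"
        )
        for region, n in db.execute(query, (diagnosis,)):
            totals[region] = totals.get(region, 0) + n
    return totals


print(count_by_region("flu"))  # prints {'North': 3, 'South': 1}
```

Access rules could be applied in `count_by_region` too, which is the "security remote from the database" point.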

.. output transformed as required to Google Maps/Earth

.. OGSA-DAI views are slower, but not so much slower as to outweigh the advantages.
cf www.ogsadai.org.uk
www.phgrid.net
www.omii.ac.uk

Chris Higgins reports that SEE GEO has implemented an OGSA-DAI wrapper for WFS.

Lourens E Veen: Virtual Lab ECOGrid: Turning Field Observations into Ecological Understanding

ECOGrid
also www.science.uva.nl/ibed-cge

Species Behaviour
Biotic and abiotic data, incl human behavior
Field Data
Statistical analyses

Organisations incl govt, infrastructure & conservation, & private volunteers

Different data models:
Approach used a hierarchy of
.. Core data
.. Extended attributes
.. Set-specific extensions to preserve original data
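The three layers might look something like the sketch below. Which fields count as core, the `volunteer_survey` key and the `from_volunteer_record` helper are all my assumptions; the talk did not give the actual ECOGrid schema.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    # Core data: assumed here to be what every source can supply.
    species: str
    date: str
    location: tuple
    # Extended attributes: shared vocabulary, but optional.
    extended: dict = field(default_factory=dict)
    # Set-specific extensions: original fields kept verbatim, keyed by data set.
    set_specific: dict = field(default_factory=dict)


def from_volunteer_record(rec):
    """Map one record from a hypothetical volunteer survey onto the layers."""
    mapped = {"species", "date", "lat", "lon", "count"}
    return Observation(
        species=rec["species"],
        date=rec["date"],
        location=(rec["lat"], rec["lon"]),
        extended={"count": rec["count"]} if "count" in rec else {},
        # Anything the common model does not understand is preserved as-is.
        set_specific={
            "volunteer_survey": {k: v for k, v in rec.items() if k not in mapped}
        },
    )


obs = from_volunteer_record(
    {"species": "Vulpes vulpes", "date": "1957-05-01", "lat": 52.4, "lon": 4.9,
     "count": 2, "observer_note": "near dune"}
)
print(obs.set_specific)  # prints {'volunteer_survey': {'observer_note': 'near dune'}}
```

Queries across organisations can rely on the core fields, and nothing from the original record is thrown away.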

info goes back at least to the 50s, but also earlier data if available.

Tamas Kukla, Tamas Kiss, Gabor Terstyanszky: Integrating OGSA-DAI into Computational Grid Workflows

University of Westminster

want to expand workflows in two ways ...

Major problem: data access in all the common systems is limited - mainly files, or v limited database support
eg Triana, Taverna, Kepler, P-Grade Portal

Workflow level interoperation of grid data resources

OGSA-DAI is sufficiently generic for it to be a good candidate.

Data staging
Static vs semi dynamic vs dynamic

static staging - inputs specified and accessed before, outputs specified and accessed after, but nothing during
semi-dynamic - inputs and outputs specified before, but the access is executed during
dynamic - all access during the workflow **
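A toy illustration of the three staging modes, as I understood them. It is not P-Grade or OGSA-DAI code: `SOURCE`, `run` and the station names are hypothetical, and the "workflow" is just two steps that log when each data access happens.

```python
SOURCE = {"station_1": [1, 2, 3], "station_2": [10, 20, 30]}


def run(mode, source=SOURCE):
    """Toy two-job workflow: returns a log of when each data access happens."""
    if mode not in ("static", "semi-dynamic", "dynamic"):
        raise ValueError(f"unknown staging mode: {mode}")

    log = []
    planned_key = "station_1"  # request written into the workflow before it starts

    if mode == "static":
        data = source[planned_key]
        log.append(f"before run: read {planned_key}")

    log.append("job 1: pick a station")
    chosen_key = "station_2"  # stands in for a result only known once job 1 has run

    if mode == "semi-dynamic":
        data = source[planned_key]
        log.append(f"during run: read {planned_key}")
    elif mode == "dynamic":
        data = source[chosen_key]
        log.append(f"during run: read {chosen_key}")

    log.append(f"job 2: sum = {sum(data)}")
    log.append("after run: write result" if mode == "static" else "during run: write result")
    return log


for mode in ("static", "semi-dynamic", "dynamic"):
    print(mode, run(mode))
```

Only the dynamic mode can read `station_2`, because only there is the request itself decided while the workflow is running.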

OGSA-DAI integration: tool, workflow editor vs workflow engine

only integration into the engine provides fully dynamic access

either implemented at the port or within the node - chosen within the node - which provides better integration

required functionality .. everything is too complex.

more specific support tool &/or totally generic - chose to support both styles of access.

Chose P-Grade Portal workflow engine, based on GridSphere with extended DAG workflow engine
in P-Grade, nodes are jobs, ports represent files and links represent file transfers
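To make the nodes/ports/links picture concrete, here is a small sketch using Python's standard graphlib module. It is not P-Grade code, and the job and file names are invented: each link carries one file from a producing job to a consuming job, and the DAG gives a valid run order.

```python
from graphlib import TopologicalSorter

# Hypothetical three-job workflow. Each link is (producer job, file, consumer job).
links = [
    ("extract", "raw.csv", "clean"),
    ("clean", "tidy.csv", "plot"),
    ("extract", "meta.txt", "plot"),
]

# Build "job -> jobs it waits for" from the links.
graph = {}
for producer, _file, consumer in links:
    graph.setdefault(producer, set())
    graph.setdefault(consumer, set()).add(producer)

order = list(TopologicalSorter(graph).static_order())
print(order)  # prints ['extract', 'clean', 'plot']

# File transfers that must finish before each job can start.
for job in order:
    incoming = [f for _p, f, c in links if c == job]
    print(job, "needs", incoming)
```

A data access node, as opposed to a data access port, would simply be one more job in this graph.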

direct submission not possible .. need an application repository so

Chose GEMLCA application repository, which is also a job submitter part of Globus.

The advantage of this approach is that GEMLCA is sufficiently generic that it can be used in a range of other workflow systems.

cf http://ngs-portal.cpc.wmin.ac.uk/
