Jon Blower: A Framework to Enable Harmonisation of Globally-Distributed Environmental Data holdings using Climate Science Modeling Language
How we use the Climate Science Modelling Language (CSML).
data from many instruments .. need to combine them all to:
.. validate numerical models
.. calibrate instruments
.. data assimilation - formal method for combining data and model ..
.. making predictions - eg floods, climate, drift at sea and search and rescue
The need for harmonisation: scientists spend lots of time (up to 80% for some postdocs) dealing with low-level technical issues .. need a common view onto all appropriate datasets
OGC standards aim to describe all geographic data .. mandated by INSPIRE .. but fiendishly complex, having evolved from maps
Need to bridge the gap: CSML
both an abstract data model & an XML encoding
provides a new view of existing data, doesn't actually change it.
14 feature types ..
classified by geometry not their content
First way to harmonise two datasets: CSML plugs into GeoServer (like GeoSciML)
Second way via Java-CSML
.. aim to reduce the cost of doing analysis
.. high-level analysis/vis routines completely decoupled from the data
Java-CSML Design attempts
.. transform CSML XML schema to Java code using an automated tool
.. leads to very complex code
.. OGC GeoAPI, but incomprehensible & GeoAPI is a moving target
.. based on well-known Java concepts
.. reduce the user's code
.. you can always wrap something
.. wrappers for WFS, NetCDF, OPeNDAP etc to make them all look the same
.. also have plotting routines
Problem is that the more you abstract, the more info you lose, so need some more specific profiles that inherit the parent profile and add the extra knowledge for a specific instance.
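A minimal sketch of the wrapper-plus-profile idea, in Python rather than Java for brevity. None of these class or function names come from Java-CSML; they are hypothetical and only illustrate how an analysis routine can stay decoupled from the storage format while a more specific profile inherits the generic view and adds its own knowledge.

```python
from abc import ABC, abstractmethod


class PointSeries(ABC):
    """Generic view (hypothetical): values at one place over time.

    Classified by geometry only; says nothing about what the values mean.
    """

    @abstractmethod
    def times(self):
        ...

    @abstractmethod
    def values(self):
        ...


class RowSeries(PointSeries):
    """Wrapper over a list of (time, value) rows, e.g. parsed from a text file."""

    def __init__(self, rows):
        self._rows = sorted(rows)  # order by time

    def times(self):
        return [t for t, _ in self._rows]

    def values(self):
        return [v for _, v in self._rows]


class ColumnSeries(PointSeries):
    """Wrapper over parallel arrays, e.g. variables read from a gridded file."""

    def __init__(self, columns):
        self._columns = columns

    def times(self):
        return list(self._columns["t"])

    def values(self):
        return list(self._columns["v"])


class TideGaugeSeries(ColumnSeries):
    """More specific profile: inherits the generic view, adds domain knowledge."""

    def __init__(self, columns, datum):
        super().__init__(columns)
        self.datum = datum  # extra information the generic view would lose


def mean_value(series):
    """Analysis routine written once against the generic view."""
    vals = series.values()
    return sum(vals) / len(vals)
```

The same `mean_value` call works on any wrapper, while code that needs the datum can ask for a `TideGaugeSeries` specifically.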
Wider lessons ..
.. interoperable data formats not necessarily suitable for storage
.. trade-offs between scope and complexity
.. symbiotic relationship between stds, tools & applications.
Aside: more OPeNDAP services than WCS services for raster data.
Alistair Grant: Bio-Surveillance: Towards a Grid Enabled Health Monitoring System
Problem .. SQL: SELECT blah, COUNT(*) FROM databases WHERE diagnosis = 'X'
where "databases" is a set of databases with non-standard schemas
OGSA-DAI used to solve this.
RODSA-DAI was one solution ..
Views can be implemented in a database, but views can also be hosted at an OGSA-DAI service layer
.. this allows security to be implemented remote from the database, and also allows remote organisations to see a view without requiring hosts to support a particular view or set of views
.. output transformed as required to Google Maps/Earth
.. OGSA-DAI views are slower, but not so much slower as to outweigh the advantages.
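A rough sketch of the view-at-the-service-layer idea, using in-memory SQLite databases from the Python standard library. This is not the OGSA-DAI API; the table names, column names and sample rows are all invented for illustration. Each source keeps its own schema, and the mapping onto a common (region, diagnosis) shape lives outside the databases.

```python
import sqlite3


def make_db(create_sql, insert_sql, rows):
    """Build a throwaway in-memory database standing in for one remote host."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_sql)
    conn.executemany(insert_sql, rows)
    return conn


# Two hosts with different, non-standard schemas (invented for this sketch).
host_a = make_db(
    "CREATE TABLE visits (region TEXT, diagnosis TEXT)",
    "INSERT INTO visits VALUES (?, ?)",
    [("north", "flu"), ("north", "flu"), ("south", "cold")],
)
host_b = make_db(
    "CREATE TABLE cases (area_name TEXT, dx_code TEXT)",
    "INSERT INTO cases VALUES (?, ?)",
    [("north", "flu"), ("south", "flu")],
)

# The "views" are held at the service layer, not created inside the databases.
VIEWS = [
    (host_a, "SELECT region AS region, diagnosis AS diagnosis FROM visits"),
    (host_b, "SELECT area_name AS region, dx_code AS diagnosis FROM cases"),
]


def count_by_region(diagnosis):
    """Run the same aggregate over every host's view and merge the counts."""
    totals = {}
    for conn, view_sql in VIEWS:
        query = (
            f"SELECT region, COUNT(*) FROM ({view_sql}) "
            "WHERE diagnosis = ? GROUP BY region"
        )
        for region, n in conn.execute(query, (diagnosis,)):
            totals[region] = totals.get(region, 0) + n
    return totals
```

Because the mapping is applied on top of each host's data at query time, a host does not need to create or maintain any view itself, at the cost of some extra work per query.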
cf www.ogsadai.org.uk
www.phgrid.net
www.omii.ac.uk
Chris Higgins reports that SEE GEO has implemented an OGSA-DAI wrapper for WFS.
Lourens E Veen: Virtual Lab ECOGrid: Turning Field Observations into Ecological Understanding
ECOGrid
also www.science.uva.nl/ibed-cge
Species Behaviour
Biotic and abiotic data, incl human behavior
Field Data
Statistical analyses
Organisations incl govt, infrastructure & conservation, & private volunteers
Different data models:
Approach used a hierarchy of
.. Core data
.. Extended attribute
.. Set-specific extensions to preserve original data
Info goes back at least to the 1950s, with earlier data included if available.
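One way to picture the three-level hierarchy is the sketch below. The field names and the mapping dictionaries are assumptions made for illustration, not the ECOGrid schema: a small set of core fields every record must have, optional extended attributes shared by some datasets, and the untouched original record kept alongside.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Observation:
    # Core data: assumed here to be what every dataset can supply.
    species: str
    date: str
    latitude: float
    longitude: float
    # Extended attributes: shared by some datasets, absent in others.
    extended: Dict[str, Any] = field(default_factory=dict)
    # Set-specific extension: the original record, preserved as delivered.
    original: Dict[str, Any] = field(default_factory=dict)


def from_record(record, core_map, extended_map):
    """Map one source record onto the hierarchy.

    core_map and extended_map give {source column: target name}; any
    column they do not mention survives only in `original`.
    """
    core = {target: record[source] for source, target in core_map.items()}
    extended = {
        target: record[source]
        for source, target in extended_map.items()
        if source in record
    }
    return Observation(**core, extended=extended, original=dict(record))
```

For example, a volunteer dataset with columns `sp`, `when`, `lat`, `lon` would use a `core_map` such as `{"sp": "species", "when": "date", "lat": "latitude", "lon": "longitude"}`, and any free-text notes it carries stay available through `original`.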
Tamas Kukla, Tamas Kiss, Gabor Terstyanszky: Integrating OGSA-DAI into Computational Grid Workflows
University of Westminster
want to expand workflows in two ways ...
Major problem: data access in all the common systems is limited - mainly files, or very limited database support
eg Triana, Taverna, Kepler, P-Grade Portal
Workflow level interoperation of grid data resources
OGSA-DAI is sufficiently generic for it to be a good candidate.
Data staging
Static vs semi dynamic vs dynamic
static staging - inputs specified and accessed before, outputs specified and accessed after, but nothing during
semi-dynamic - inputs and outputs specified before, access executed during
dynamic - all access during the workflow **
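The difference between the three staging styles can be sketched as the order of events each one produces. This is only one reading of the notes above; the event labels and function name are made up, and no real workflow engine is involved.

```python
from enum import Enum


class Staging(Enum):
    STATIC = "static"
    SEMI_DYNAMIC = "semi-dynamic"
    DYNAMIC = "dynamic"


def event_trace(mode, jobs):
    """Return the order of data-access events for a list of job names."""
    trace = []
    if mode is Staging.STATIC:
        # Inputs handled before any job runs, outputs after the last one.
        trace += ["specify inputs", "read inputs"]
        trace += [f"run {job}" for job in jobs]
        trace += ["specify outputs", "write outputs"]
    elif mode is Staging.SEMI_DYNAMIC:
        # Specification is fixed up front, but access happens while running.
        trace.append("specify inputs and outputs")
        for job in jobs:
            trace += [f"read for {job}", f"run {job}", f"write for {job}"]
    else:
        # Fully dynamic: even the specification is decided during the run.
        for job in jobs:
            trace += [
                f"specify for {job}",
                f"read for {job}",
                f"run {job}",
                f"write for {job}",
            ]
    return trace
```

Only the dynamic case lets a later job's data request depend on what an earlier job produced.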
OGSA-DAI integration: tool, workflow editor vs workflow engine
only integration into the engine provides fully dynamic access
can be implemented either at the port or within the node - chose within the node, which provides better integration
required functionality .. everything is too complex.
more specific support tool &/or totally generic - chose to support both styles of access.
Chose P-Grade Portal workflow engine, based on GridSphere with extended DAG workflow engine
in P-Grade, nodes are jobs, ports represent files and links represent file transfers
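As a small illustration of the jobs/files/links picture, the sketch below derives a valid run order for a DAG workflow with the standard-library `graphlib` module (Python 3.9+). The job and file names are invented, and this is not how P-Grade itself is implemented.

```python
from graphlib import TopologicalSorter

# Each link is a file transfer: (producing job, file, consuming job).
LINKS = [
    ("fetch", "raw.csv", "clean"),
    ("clean", "clean.csv", "plot"),
    ("clean", "clean.csv", "stats"),
]


def execution_order(links):
    """Order jobs so every job runs after the jobs whose files it consumes."""
    predecessors = {}
    for producer, _file, consumer in links:
        predecessors.setdefault(consumer, set()).add(producer)
        predecessors.setdefault(producer, set())
    return list(TopologicalSorter(predecessors).static_order())
```

In this picture a database query could be modelled as just another node in the graph, in line with the node-level integration chosen above.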
direct submission not possible .. need an application repository so
Chose GEMLCA application repository, which is also a job submitter part of Globus.
The advantage of this approach is that GEMLCA is sufficiently generic that it can be used in a range of other workflow systems.
cf http://ngs-portal.cpc.wmin.ac.uk/