Dymond & Associates  
 

 

White Paper:
Metadata-Ontology System (MOS)



Overview

Metadata
XML and the Web Ontology Language (OWL)
Some Examples

Overview

Metadata today allows both humans and automated systems to access documentation describing the enterprise.  We introduce a metadata-ontology system (MOS) that is based on XML and the new Web Ontology Language (OWL) standard.  The MOS provides human-readable XML and allows metadata to be stored in both XML and database tables and to be queried with conventional SQL.


Metadata

Metadata is documentation that describes and explains.  Almost any piece of information sits in a halo of descriptive and explanatory information.  For example, the variable "patientName" is surrounded by metadata as shown below.  The same can be said for functions and workflows.

                location(s)                                 date

        value                                                    value type

                                 patientName 

        comment                                                       label

             owner             instructions             validation


Metadata can support both business and technical needs.  Conventional metadata is accessed by humans, but there is growing interest today in using metadata to allow machines to communicate.  This is the notion behind the semantic web where machines discover, contract, and utilize services provided by other machines over the Internet.

When and how to deploy metadata is a business decision.  There are many situations today where metadata is needed to comply with legislation such as Sarbanes-Oxley or the Patriot Act.  If it is not mandated, metadata deployment should be driven by a return on investment (ROI) decision.  Resources are required to install, update, and validate metadata.  There is an optimal range between having inadequate metadata and having the metadata effort turn into a mini-industry within the enterprise.


XML and the Web Ontology Language (OWL)

Traditionally, metadata has been used in businesses to describe data contained in relational tables.  Many of the tools used to implement metadata are themselves based on relational tables.  This is advantageous in that relational tables are widely used and well understood in business.  However, conventional metadata tables are challenged when describing complex structures and by open data exchange across the Internet.

Data exchange across the Internet is most often done today using text-based XML (eXtensible Markup Language).  The RDF (Resource Description Framework) standard defines how statements about resources are written in XML.  Using XML and RDF is slightly analogous to saying we will communicate using the Western European alphabet to write Spanish.  This describes a syntax, but does not yet provide the complete semantics explaining how to embed information in the language.  This requirement is completed by the Web Ontology Language (OWL).

An ontology is a much more powerful information capture tool than conventional metadata.  Suppose the problem is to describe auto makers from different countries that make different vehicle types composed of frames, bodies, and engines, where engines have valves, fuel injectors, and pistons, and pistons have a number of their own characteristics.  To capture this, we need an ontology.

The OWL standard comes from the same standards group that provided RDF.  OWL uses RDF, RDF uses XML, and it would seem that this is close to the complete package needed to provide the advanced ontology-based metadata needed by the semantic web.  This is in fact true, but in practice it is seriously compromised by the overwhelming complexity of RDF.  RDF is incomprehensible to humans.  Information embedded in its convoluted structures is not accessible by conventional business tools.  New tools to query RDF-based metadata are being developed, but adding new tools is an obstacle for business.  On top of this is the conceptual learning curve to master the idea of an ontology.  Regardless of its technical merits, as it comes from the standards group, the XML/RDF/OWL stack is not a very attractive business proposition.

A practical solution to the core issue of RDF's complexity is contained in the standard itself, which states that RDF is built on the concept of triples.  Triples are statements of the form:

      Subject - Predicate - Object

For example:

     John - likes - apples.

All of RDF and OWL can be expressed as triples.  Expression as triples will be more verbose than the original RDF syntax but it will become readable by humans.  It can also be stored in tables and queried with existing and well understood tools.

Below is an example using a simple statement about the play Hamlet.  The sentence is shown as triples, as triples in a relational table, and as triples in XML.

The sentence is:

     “Hamlet was created by Shakespeare in the English language.”

This can be written in triples as:

   Hamlet-creator-Shakespeare
   Hamlet-language-English

Triples are easily stored in a relational table as:

Subject      Predicate    Object
Hamlet       creator      Shakespeare
Hamlet       language     English


Triples can be expressed in XML as:

<row>
   <subject>Hamlet</subject>
   <predicate>creator</predicate>
   <object>Shakespeare</object>
</row>
<row>
   <subject>Hamlet</subject>
   <predicate>language</predicate>
   <object>English</object>
</row>
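
The conversion between triples and the XML rows above can be done mechanically.  Here is a minimal Python sketch using only the standard library.  The wrapper element name "triples" and the function names are assumptions made for illustration; they are not taken from the MOS design document.

```python
import xml.etree.ElementTree as ET

# Triples from the Hamlet example, as (subject, predicate, object) tuples.
TRIPLES = [
    ("Hamlet", "creator", "Shakespeare"),
    ("Hamlet", "language", "English"),
]


def triples_to_xml(triples):
    """Serialize triples as <row> elements under a single root element."""
    # The root name "triples" is hypothetical; XML requires one root element.
    root = ET.Element("triples")
    for subject, predicate, obj in triples:
        row = ET.SubElement(root, "row")
        ET.SubElement(row, "subject").text = subject
        ET.SubElement(row, "predicate").text = predicate
        ET.SubElement(row, "object").text = obj
    return ET.tostring(root, encoding="unicode")


def xml_to_triples(xml_text):
    """Parse XML produced by triples_to_xml back into a list of triples."""
    root = ET.fromstring(xml_text)
    return [
        (row.findtext("subject"), row.findtext("predicate"), row.findtext("object"))
        for row in root.findall("row")
    ]


if __name__ == "__main__":
    xml_text = triples_to_xml(TRIPLES)
    print(xml_text)
    # A round trip should give back exactly the triples we started with.
    print(xml_to_triples(xml_text) == TRIPLES)
```

Because every row has the same three child elements, the receiving machine needs no knowledge of the subject matter to load the file back into a three-column table.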



This Metadata-Ontology System (MOS) approach retains compatibility with the XML/RDF/OWL stack while avoiding the complexities of RDF.  Data can be entered, edited, and queried using relational tables.  It can then be converted to XML files and transported over the Internet to another machine.


Some Examples

The MOS will support many interesting data operations.  Here are a few brief examples of query, taxonomy, dependency, and inference. 

1)  The data structure shown above for Hamlet can provide information to conventional queries such as:

   Query:  Who wrote Hamlet?
   Answer:  Shakespeare

   Query: Hamlet is written in what language?
   Answer: English

   Query: Name a play written in English by Shakespeare.
   Answer:  Hamlet
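
The three queries above can be sketched as conventional SQL against a three-column table.  The example below uses Python's built-in sqlite3 module with an in-memory database.  The table name, column names, and function names are assumptions made for illustration.  Note that the stored triples do not say that Hamlet is a play, so the third query returns any work by the given creator in the given language.

```python
import sqlite3

HAMLET_TRIPLES = [
    ("Hamlet", "creator", "Shakespeare"),
    ("Hamlet", "language", "English"),
]


def build_store(triples):
    """Create an in-memory SQLite table holding one triple per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


def objects_of(conn, subject, predicate):
    """Answer questions of the form 'what is the <predicate> of <subject>?'."""
    rows = conn.execute(
        "SELECT object FROM triples "
        "WHERE subject = ? AND predicate = ? ORDER BY object",
        (subject, predicate),
    )
    return [row[0] for row in rows]


def works_by_in(conn, creator, language):
    """Find subjects that have both the given creator and the given language."""
    # A self-join matches two triples that share the same subject.
    rows = conn.execute(
        """
        SELECT c.subject
        FROM triples AS c
        JOIN triples AS l ON l.subject = c.subject
        WHERE c.predicate = 'creator' AND c.object = ?
          AND l.predicate = 'language' AND l.object = ?
        ORDER BY c.subject
        """,
        (creator, language),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_store(HAMLET_TRIPLES)
    print(objects_of(conn, "Hamlet", "creator"))        # who wrote Hamlet?
    print(objects_of(conn, "Hamlet", "language"))       # written in what language?
    print(works_by_in(conn, "Shakespeare", "English"))  # works by author and language
```

Queries that combine several conditions on the same subject become self-joins on the triple table, one join per extra condition.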



2)  Taxonomies organize information based on a parent-child linkage.  Some datasets can be accurately described by more than one taxonomic structure, and it would be advantageous to have data structures that support spontaneous taxonomy formation.  If we look at animals and attach enough metadata to the instance of each animal, the structure could automatically organize itself as:

Mammal
|
|-PlacentalMammal
|   |-Primate
|   |   |-Baboon
|   |   |-Orangutan
|   |   |-Gorilla
|   |
|   |-Rodent
|   |   |-Squirrel
|   |   |-Gopher
|   |   |-Mouse
|   |
|   |-Carnivore
|       |-Wolf
|       |-Bear
|       |-Cat
|
|-MarsupialMammal
        |-Kangaroo
        |-Opossum
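
A tree like the one above can be derived from triples alone.  The Python sketch below is a simplified illustration: it assumes each parent-child link is stated explicitly with a hypothetical "subClassOf" predicate, which is simpler than the spontaneous classification from instance metadata described in the text.  Only part of the animal list is included to keep the example short.

```python
from collections import defaultdict

# A subset of the taxonomy, one parent-child link per triple.
SUBCLASS_TRIPLES = [
    ("PlacentalMammal", "subClassOf", "Mammal"),
    ("MarsupialMammal", "subClassOf", "Mammal"),
    ("Primate", "subClassOf", "PlacentalMammal"),
    ("Rodent", "subClassOf", "PlacentalMammal"),
    ("Baboon", "subClassOf", "Primate"),
    ("Gorilla", "subClassOf", "Primate"),
    ("Mouse", "subClassOf", "Rodent"),
    ("Kangaroo", "subClassOf", "MarsupialMammal"),
]


def build_children(triples, predicate="subClassOf"):
    """Map each parent to the list of its direct children, in input order."""
    children = defaultdict(list)
    for subject, pred, obj in triples:
        if pred == predicate:
            children[obj].append(subject)
    return children


def render(children, node, depth=0):
    """Return the taxonomy below `node` as a list of indented text lines."""
    lines = ["    " * depth + node]
    for child in children.get(node, []):
        lines.extend(render(children, child, depth + 1))
    return lines


if __name__ == "__main__":
    tree = build_children(SUBCLASS_TRIPLES)
    print("\n".join(render(tree, "Mammal")))
```

Adding a second predicate, for example one describing habitat or diet, would let the same triples be rendered as a different taxonomy by passing a different `predicate` argument.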



3)  Processes have sequences of actions and products with an individual item depending on one or more other items.  For example, a cup of tea depends on hot water and a tea bag, and the hot water depends on a teapot.  We can declare a property "dependsOn" and use this in our metadata as:

Subject        Predicate    Object
cup of tea     dependsOn    hot water
cup of tea     dependsOn    tea bag
hot water      dependsOn    teapot


Now we can query the metadata as:

Query:  What does hot water depend on?
Answer: teapot

Query: What are the consequences of not having a tea bag?
Answer:  No cup of tea!
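
Both dependency queries can be sketched in a few lines of Python over the same triples.  The function names are assumptions made for illustration.  The second function follows dependsOn links backwards, so it also finds items that are affected indirectly.

```python
DEPENDENCIES = [
    ("cup of tea", "dependsOn", "hot water"),
    ("cup of tea", "dependsOn", "tea bag"),
    ("hot water", "dependsOn", "teapot"),
]


def depends_on(triples, item):
    """Return the direct dependencies of `item`."""
    return sorted(o for s, p, o in triples if p == "dependsOn" and s == item)


def affected_by(triples, missing):
    """Return everything that directly or indirectly depends on `missing`."""
    affected = set()
    frontier = [missing]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            # Walk the dependsOn links in reverse: who needs `current`?
            if p == "dependsOn" and o == current and s not in affected:
                affected.add(s)
                frontier.append(s)
    return sorted(affected)


if __name__ == "__main__":
    print(depends_on(DEPENDENCIES, "hot water"))   # what does hot water depend on?
    print(affected_by(DEPENDENCIES, "tea bag"))    # consequences of no tea bag
    print(affected_by(DEPENDENCIES, "teapot"))     # consequences of no teapot
```

With no teapot there is no hot water, and therefore no cup of tea either, even though the cup of tea never mentions the teapot directly.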



4)  Inference allows discovering information that is not explicitly stated.  If we declare a property "locatedIn" as being a transitive property, meaning that if A is locatedIn B and B is locatedIn C then A is also locatedIn C, then we can declare some metadata as:

Subject      Predicate              Object
locatedIn    transitive property    true
elephants    locatedIn              Congo
Congo        locatedIn              Africa


Now we can query the metadata as:

Query:  Are elephants located in Africa?
Answer:  true

The fact that elephants are located in Africa is not explicitly stated in the metadata.  We have discovered this information based on applying inference to the transitive property locatedIn.
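
The inference step can be sketched in Python as a search that follows locatedIn links from one subject to the next.  This is a minimal illustration under stated assumptions: the function names are hypothetical, and the transitivity declaration is read from the table exactly as written above rather than from OWL syntax.

```python
FACTS = [
    ("locatedIn", "transitive property", "true"),
    ("elephants", "locatedIn", "Congo"),
    ("Congo", "locatedIn", "Africa"),
]


def is_transitive(triples, predicate):
    """True if the metadata declares `predicate` to be transitive."""
    return (predicate, "transitive property", "true") in triples


def holds(triples, subject, predicate, obj):
    """True if the triple is stated, or follows by declared transitivity."""
    if (subject, predicate, obj) in triples:
        return True
    if not is_transitive(triples, predicate):
        return False
    seen = {subject}
    frontier = [subject]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == predicate:
                if o == obj:
                    return True
                if o not in seen:  # avoid looping forever on cyclic data
                    seen.add(o)
                    frontier.append(o)
    return False


if __name__ == "__main__":
    print(holds(FACTS, "elephants", "locatedIn", "Africa"))
    # Without the transitivity declaration the same fact cannot be inferred.
    print(holds(FACTS[1:], "elephants", "locatedIn", "Africa"))
```

The second call shows that the answer depends on the declaration in the metadata and not on the query code: remove the "transitive property" row and the inferred fact disappears.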






--------------------------------------------------------------------------------------

A technical draft design document is available that describes how the MOS is implemented at the XML level.

 

 
©2007 Dymond and Associates, LLC