White Paper:
Metadata-Ontology System (MOS)
Overview
Metadata
XML and the Web Ontology Language (OWL)
Some Examples
Overview
Metadata today allows both humans and automated
systems to access documentation describing the enterprise.
We introduce a metadata-ontology system (MOS) based on XML
and the new Web Ontology Language (OWL) standard. The MOS
produces human-readable XML, stores metadata both as XML and in
database tables, and allows that metadata to be queried with conventional
SQL.
Metadata
Metadata is documentation that describes and explains other information.
Almost any piece of information sits in a halo of
descriptive and explanatory information; for example, the variable
"patientName" is surrounded by the metadata shown below. The same
can be said for functions and workflows.
[Figure: patientName at the center, surrounded by the metadata items location(s), date, value, value type, comment, label, owner, instructions, and validation.]
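As a minimal sketch, assuming Python and entirely hypothetical values, the halo around patientName could be held as a simple record whose fields mirror the metadata items in the figure. The class name `VariableMetadata`, the `as_triples` helper, and every value below are illustrative only; `as_triples` anticipates the subject-predicate-object triples discussed later in this paper.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Tuple


@dataclass
class VariableMetadata:
    """Hypothetical record for the metadata 'halo' around one variable."""
    name: str
    value: Optional[str] = None
    value_type: Optional[str] = None
    label: Optional[str] = None
    comment: Optional[str] = None
    owner: Optional[str] = None
    date: Optional[str] = None
    locations: List[str] = field(default_factory=list)
    instructions: Optional[str] = None
    validation: Optional[str] = None

    def as_triples(self) -> List[Tuple[str, str, str]]:
        """Flatten the filled-in metadata into (subject, predicate, object) triples."""
        triples = []
        for key, val in asdict(self).items():
            # Skip the subject itself and any metadata item left empty.
            if key == "name" or val is None or val == []:
                continue
            # A multi-valued item such as locations becomes one triple per value.
            for item in (val if isinstance(val, list) else [val]):
                triples.append((self.name, key, item))
        return triples


# Hypothetical example values; none of these come from a real system.
patient_name = VariableMetadata(
    name="patientName",
    value_type="string",
    label="Patient name",
    owner="Admissions",
    locations=["PATIENT.NAME"],
    validation="must not be empty",
)

for triple in patient_name.as_triples():
    print(triple)
```

Keeping the halo as named fields makes missing metadata easy to spot, while the triple form is what would be stored and exchanged.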
Metadata can support both business and technical needs.
Conventional metadata is accessed by humans, but there is growing interest
today in using metadata to allow machines to communicate.
This is the notion behind the semantic web, in which machines
discover, contract for, and use services provided by other
machines over the Internet.
When and how to deploy metadata is a business decision.
There are many situations today where metadata is needed to comply
with legislation such as Sarbanes-Oxley or the Patriot Act.
Where it is not mandated, metadata deployment should be driven by a
return on investment (ROI) decision, because resources are required
to install, update, and validate metadata.
There is an optimal tuning range between not having adequate metadata and
having the metadata effort turn into a mini-industry within the
enterprise.
XML and the Web Ontology Language (OWL)
Traditionally, metadata has been used in businesses to
describe data contained in relational tables. Many of the
tools used to implement metadata are themselves based on
relational tables. This is advantageous in that relational
tables are widely used and well understood in business.
However, conventional metadata tables struggle to describe
complex structures and to support open data exchange across the
Internet.
Data exchange across the Internet is most often done today using
text-based XML (eXtensible Markup Language). The RDF (Resource
Description Framework) standard defines a model for statements about
resources, together with a standard XML syntax for writing them. Using XML and RDF
is somewhat analogous to agreeing to communicate by writing Spanish with the
Western European alphabet: this describes a syntax, but does not yet provide complete semantics
explaining how to embed information in the language. That requirement is
met by the Web Ontology Language (OWL).
An ontology is a much more powerful information capture tool than
conventional metadata. Suppose the problem is to describe automakers
from different countries that make different vehicle types
composed of frames, bodies, and engines, where engines have
valves, fuel injectors, and pistons, and pistons have a
number of characteristics of their own. Capturing these classes, parts, and
relationships requires an ontology.
The OWL standard comes from the same standards group that provided
RDF, the World Wide Web Consortium (W3C). OWL uses RDF, RDF uses XML, and it would seem that this
is close to the complete package needed to provide the advanced
ontology-based metadata required by the semantic web. This is
in fact true, but in practice it is seriously compromised by the
overwhelming complexity of RDF. In its XML form, RDF is very hard for
humans to read, and information embedded in its convoluted structures is
not accessible to conventional business tools. New tools to
query RDF-based metadata are being developed, but adopting new tools is an
obstacle for business. On top of this is the conceptual
learning curve of mastering the idea of an ontology. Whatever
its technical merits, the XML/RDF/OWL stack, as delivered by the standards group,
is not a very attractive business proposition.
A practical solution to the core issue of RDF's complexity
is contained in the standard itself, which states that RDF is built
on the concept of triples. Triples are statements
of the form:
Subject - Predicate - Object
For example:
John - likes - apples.
All of RDF and OWL can be expressed as triples. Expressed
as triples, the information is more verbose than in the original RDF syntax, but it
becomes readable by humans. It can also be stored in
tables and queried with existing, well-understood tools.
Below is an example using a simple statement about the play
Hamlet. The sentence is shown as triples, as triples in a
relational table, and as triples in XML.
The sentence is:
“Hamlet was created by Shakespeare
in the English language.”
This can be written in triples as:
Hamlet-creator-Shakespeare
Hamlet-language-English
Triples are easily stored in a relational table as:
| Subject | Predicate | Object |
| Hamlet | creator | Shakespeare |
| Hamlet | language | English |
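To illustrate that such a table can be handled by conventional tools, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table name `triples` and its column names are assumptions chosen to match the headings of the table above, not part of any MOS specification.

```python
import sqlite3

# Hypothetical table name "triples"; any relational database would do.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triples (subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO triples (subject, predicate, object) VALUES (?, ?, ?)",
    [
        ("Hamlet", "creator", "Shakespeare"),
        ("Hamlet", "language", "English"),
    ],
)

# Everything the table says about Hamlet, using ordinary SQL.
rows = conn.execute(
    "SELECT predicate, object FROM triples WHERE subject = ? ORDER BY predicate",
    ("Hamlet",),
).fetchall()
for predicate, obj in rows:
    print(f"Hamlet - {predicate} - {obj}")
```

Because every fact has the same three-column shape, new kinds of statements need new rows, not new tables.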
Triples can be expressed in XML as:
<row>
<subject>Hamlet</subject>
<predicate>creator</predicate>
<object>Shakespeare</object>
</row>
<row>
<subject>Hamlet</subject>
<predicate>language</predicate>
<object>English</object>
</row>
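A minimal sketch, assuming Python's standard xml.etree.ElementTree module, of writing triples in the row format above and reading them back. Because the snippet has two top-level row elements, it is not by itself a well-formed XML document; the sketch therefore wraps the rows in a hypothetical `triples` root element. The function names `to_xml` and `from_xml` are likewise illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Triple = Tuple[str, str, str]


def to_xml(triples: List[Triple]) -> str:
    """Serialize triples as <row> elements under a hypothetical <triples> root."""
    root = ET.Element("triples")
    for subject, predicate, obj in triples:
        row = ET.SubElement(root, "row")
        ET.SubElement(row, "subject").text = subject
        ET.SubElement(row, "predicate").text = predicate
        ET.SubElement(row, "object").text = obj
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text: str) -> List[Triple]:
    """Read triples back from the same <row> layout."""
    root = ET.fromstring(xml_text)
    return [
        (row.findtext("subject"), row.findtext("predicate"), row.findtext("object"))
        for row in root.findall("row")
    ]


hamlet = [("Hamlet", "creator", "Shakespeare"), ("Hamlet", "language", "English")]
xml_text = to_xml(hamlet)
print(xml_text)
print(from_xml(xml_text) == hamlet)  # the round trip should lose nothing
```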
This Metadata-Ontology System (MOS) approach retains compatibility
with the XML/RDF/OWL stack while avoiding the complexities of
RDF. Data can be entered, edited, and queried using
relational tables. It can then be converted to XML files and
transported over the Internet to another machine.
Some Examples
The MOS will support many interesting data operations. Here
are a few brief examples of query, taxonomy, dependency, and
inference.
1) The data structure shown above for Hamlet can answer
conventional queries such as:
Query: Who wrote Hamlet?
Answer: Shakespeare
Query: In what language is Hamlet written?
Answer: English
Query: Name a play written in English by Shakespeare.
Answer: Hamlet
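A minimal sketch of these three queries in conventional SQL, assuming the Hamlet triples sit in a hypothetical `triples` table (here an in-memory SQLite database via Python's sqlite3 module). The first two queries are simple lookups; the third needs a self-join so that both facts hold for the same subject.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [("Hamlet", "creator", "Shakespeare"), ("Hamlet", "language", "English")],
)

# Query: Who wrote Hamlet?
(author,) = conn.execute(
    "SELECT object FROM triples WHERE subject = ? AND predicate = ?",
    ("Hamlet", "creator"),
).fetchone()

# Query: In what language is Hamlet written?
(language,) = conn.execute(
    "SELECT object FROM triples WHERE subject = ? AND predicate = ?",
    ("Hamlet", "language"),
).fetchone()

# Query: Name a play written in English by Shakespeare.
# The self-join requires both statements to share the same subject.
plays = [
    subject
    for (subject,) in conn.execute(
        """
        SELECT a.subject
        FROM triples AS a
        JOIN triples AS b ON a.subject = b.subject
        WHERE a.predicate = 'creator' AND a.object = 'Shakespeare'
          AND b.predicate = 'language' AND b.object = 'English'
        """
    )
]

print(author, language, plays)  # Shakespeare English ['Hamlet']
```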
2) Taxonomies organize information based on a
parent-child linkage. Some datasets can be accurately
described by more than one taxonomic structure, and it would be
advantageous to have data structures that support spontaneous
taxonomy formation. If we look at animals and attach enough
metadata to the instance of each animal, the structure could
automatically organize itself as:
Mammal
|
|-PlacentalMammal
|  |-Primate
|  |  |-Baboon
|  |  |-Orangutan
|  |  |-Gorilla
|  |
|  |-Rodent
|  |  |-Squirrel
|  |  |-Gopher
|  |  |-Mouse
|  |
|  |-Carnivore
|     |-Wolf
|     |-Bear
|     |-Cat
|
|-MarsupialMammal
   |-Kangaroo
   |-Opossum
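One way such a hierarchy could be assembled automatically is sketched below in Python, assuming each animal or group carries a single hypothetical subClassOf triple naming its parent (the name mirrors the RDF Schema term rdfs:subClassOf). The tree is computed from those triples rather than stored as a fixed structure, so adding a triple for a new animal places it in the tree without editing anything else. This is an illustration, not the MOS algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical metadata: one (child, "subClassOf", parent) triple per entry.
TRIPLES: List[Tuple[str, str, str]] = [
    ("PlacentalMammal", "subClassOf", "Mammal"),
    ("MarsupialMammal", "subClassOf", "Mammal"),
    ("Primate", "subClassOf", "PlacentalMammal"),
    ("Rodent", "subClassOf", "PlacentalMammal"),
    ("Carnivore", "subClassOf", "PlacentalMammal"),
    ("Baboon", "subClassOf", "Primate"),
    ("Orangutan", "subClassOf", "Primate"),
    ("Gorilla", "subClassOf", "Primate"),
    ("Squirrel", "subClassOf", "Rodent"),
    ("Gopher", "subClassOf", "Rodent"),
    ("Mouse", "subClassOf", "Rodent"),
    ("Wolf", "subClassOf", "Carnivore"),
    ("Bear", "subClassOf", "Carnivore"),
    ("Cat", "subClassOf", "Carnivore"),
    ("Kangaroo", "subClassOf", "MarsupialMammal"),
    ("Opossum", "subClassOf", "MarsupialMammal"),
]


def children_index(triples: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Map each parent to its children, preserving the order of the triples."""
    children: Dict[str, List[str]] = defaultdict(list)
    for child, predicate, parent in triples:
        if predicate == "subClassOf":
            children[parent].append(child)
    return children


def roots(children: Dict[str, List[str]]) -> List[str]:
    """Parents that never appear as anyone's child are the tops of the taxonomy."""
    all_children = {c for kids in children.values() for c in kids}
    return [parent for parent in children if parent not in all_children]


def render(node: str, children: Dict[str, List[str]], depth: int = 0) -> List[str]:
    """Indented outline of the subtree rooted at node (two spaces per level)."""
    lines = ["  " * depth + node]
    for child in children.get(node, []):
        lines.extend(render(child, children, depth + 1))
    return lines


index = children_index(TRIPLES)
for top in roots(index):
    print("\n".join(render(top, index)))
```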
3) Processes consist of sequences of actions and products, in which an
individual item depends on one or more other items. For
example, a cup of tea depends on hot water and a tea bag, and the
hot water depends on a teapot. We can declare a property
"dependsOn" and use it in our metadata as:
| Subject | Predicate | Object |
| cup of tea | dependsOn | hot water |
| cup of tea | dependsOn | tea bag |
| hot water | dependsOn | teapot |
Now we can query the metadata as:
Query: What does hot water depend on?
Answer: teapot
Query: What are the consequences of not having a tea bag?
Answer: No cup of tea!
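Both queries can be answered mechanically. The Python sketch below, with the dependsOn triples held in a plain list, is one minimal way to do it: direct dependencies are a simple lookup, while the consequences of a missing item are found by walking dependsOn links backwards with a breadth-first search, so a missing teapot also rules out the cup of tea. The function names are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("cup of tea", "dependsOn", "hot water"),
    ("cup of tea", "dependsOn", "tea bag"),
    ("hot water", "dependsOn", "teapot"),
]


def depends_on(item: str, triples: List[Triple]) -> List[str]:
    """Direct dependencies: every object O in (item, dependsOn, O)."""
    return [o for s, p, o in triples if s == item and p == "dependsOn"]


def consequences_of_missing(item: str, triples: List[Triple]) -> List[str]:
    """Everything that depends on item, directly or through a chain of dependsOn links."""
    affected: List[str] = []
    seen = {item}
    queue = deque([item])
    while queue:
        current = queue.popleft()
        for s, p, o in triples:
            if p == "dependsOn" and o == current and s not in seen:
                seen.add(s)
                affected.append(s)
                queue.append(s)
    return affected


print(depends_on("hot water", TRIPLES))               # ['teapot']
print(consequences_of_missing("tea bag", TRIPLES))    # ['cup of tea']
print(consequences_of_missing("teapot", TRIPLES))     # ['hot water', 'cup of tea']
```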
4) Inference allows discovering information that is not explicitly
stated. If we declare the property "locatedIn" to be a
transitive property, meaning that whenever A is locatedIn B and B is locatedIn C,
A is also locatedIn C, then we can declare some metadata as:
| Subject | Predicate | Object |
| locatedIn | transitive property | true |
| elephants | locatedIn | Congo |
| Congo | locatedIn | Africa |
Now we can query the metadata as:
Query: Are elephants located in Africa?
Answer: true
The fact that elephants are located in Africa is not explicitly
stated in the metadata. We have discovered this information
by applying inference to the transitive property locatedIn.
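A minimal sketch of this inference in Python, assuming the triples from the table above and a naive forward-chaining loop (not a full OWL reasoner): for every property marked as a transitive property, it keeps adding the implied triple (A, p, C) whenever (A, p, B) and (B, p, C) are present, until a full pass adds nothing new. The names `infer` and `ask` are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

TRIPLES: Set[Triple] = {
    ("locatedIn", "transitive property", "true"),
    ("elephants", "locatedIn", "Congo"),
    ("Congo", "locatedIn", "Africa"),
}


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Return the input triples plus every triple implied by transitive properties."""
    transitive = {s for s, p, o in triples if p == "transitive property" and o == "true"}
    closed = set(triples)
    changed = True
    while changed:  # repeat until a full pass adds nothing new (a fixpoint)
        changed = False
        for a, p, b in list(closed):
            if p not in transitive:
                continue
            for b2, p2, c in list(closed):
                if p2 == p and b2 == b and (a, p, c) not in closed:
                    closed.add((a, p, c))
                    changed = True
    return closed


def ask(subject: str, predicate: str, obj: str, triples: Set[Triple]) -> bool:
    """Answer a yes/no query against the stated plus inferred triples."""
    return (subject, predicate, obj) in infer(triples)


print(ask("elephants", "locatedIn", "Africa", TRIPLES))  # True
```

Recomputing the closure on every query keeps the sketch short; a real system would more likely materialize the inferred triples once and store them alongside the stated ones.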
--------------------------------------------------------------------------------------
A technical draft
design document is available that describes how the MOS is
implemented at the XML level.