Overview

JUMP-OWL is a markup language for specifying ontologies, schemas, and profiles in databases. It provides an intuitive language structure for describing the structure of a relational database and its table data in an easily understandable framework.

The information defined in this specification provides a detailed overview of the JUMP meta-language framework. It is an open specification subject to review by our open-source peers. We invite your ideas and comments; please contact us with feedback regarding JUMP.


JUMP Specification

JUMP-OWL is a Semantic Web-based representation format for relational data and schema components. It is a lightweight metadata interchange format that captures the metadata available in structured data sources as knowledge. The JUMP metadata language is composed in an OWL framework. The JUMP-OWL specification is an extension of the Web Ontology Language W3C recommendation. It utilizes the OWL Full format and follows RDF Schema in having a higher-order syntax (although first-order semantics). OWL Full does not enforce a strict separation of classes, properties, individuals, datatypes, or data values. JUMP-OWL captures specific metadata that makes it particularly appropriate for complex, ad-hoc, cross-domain search of databases spanning multiple data domains and even organizations.

JUMP-OWL files create a knowledge base about large collections of structured or semi-structured data residing anywhere, formatted in any structure, and defined using any model. OWL, originally created for the Semantic Web, enables us to represent not only the relational data itself, but also a part of its interpretation, i.e. knowledge about its format, its origin, its usage, and its structure in specific frameworks. To represent relevant metadata of a relational database using Semantic Web techniques, we require an OWL ontology which describes the schema and technical metadata of a relational database in an abstract way.

This OWL representation can be used as a schema representation itself and as a novel ontology for creating a representation format that is suitable for the corresponding data items. To describe the schema of a relational database with the techniques provided by OWL, we have to define reference OWL classes centrally, to which a Jumper Index describing such a database can refer. The abstract representation of classes like Table or Column, as well as varChar length and Constraints on the data, becomes a central part of the knowledge representation process realized within JUMP-OWL. These classes represent the taxonomic roots of all data in your organization. Domain-specific root classes are defined by simply declaring a named class. Each user-defined class will also consist of related sub-classes. Additionally, we have to specify possible relationships among these classes, resulting in an ontology by which a relational database can easily be described. We call this central representation of abstract schema components and relationships JUMP-OWL.

JUMP-OWL Vocabulary

The following markup tags define the extended vocabulary of a JUMP-OWL file. The language provides the semantic objects, the low-level XML tags that correspond to each semantic object, and all descriptions, rules, and relationships associated with usage of the semantic object.

Body Types

Public Components <<public component>> - consists of the following identifiers:

Source Object Identifier <<source object identifier>> - Each data unit is captured as an object to be consumed by a Java service. The metadata contained within the table and identified as an object is given a unique identifier. A Source Object Identifier (SOI) is much like a GUID: it is a 128-bit integer that is considered statistically unique. The JUMP Framework provides an SOI structure that represents a single SOI value for every data unit. SOI strings are represented as a key-based hash, a secure method of turning an arbitrary data unit name into a (relatively) small number that may serve as a digital "fingerprint" of the data unit.
For example:

<<source object identifier>>29c04sdk509vid09...<</source object identifier>>
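
The key-based hash described above can be sketched in Python. This is a minimal illustration only, not the JUMP Framework's actual SOI algorithm: the choice of BLAKE2b, the example key, and the function name make_soi are all assumptions made for this sketch.

```python
import hashlib


def make_soi(data_unit_name, key):
    """Return a 128-bit keyed hash of a data unit name as 32 hex characters.

    Hypothetical helper: the specification does not name the hash function
    behind SOI values, so BLAKE2b is an assumption made for illustration.
    """
    digest = hashlib.blake2b(
        data_unit_name.encode("utf-8"),
        key=key,
        digest_size=16,  # 16 bytes = 128 bits
    )
    return digest.hexdigest()


# The key and the data unit name below are placeholders.
soi = make_soi("patient", key=b"example-secret-key")
```

The same name and key always yield the same fingerprint, while a different key yields a different one, which is what lets the value act as a stable identifier for the data unit.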

Source Service <<source service>> - A service in the JUMP Framework is any application that needs to consume data from a non-aligned data store. In the broader context it is part of an SOA that builds applications out of software services. Services are relatively large, intrinsically unassociated units of functionality, which have no calls to each other embedded in them. Instead of services embedding calls to each other in their source code, protocols are defined which describe how one or more services can talk to each other. This architecture then relies on a business process to link and sequence services, in a process known as orchestration, to meet a new or existing business system requirement.
For example:

<<source service>>registration.hospitalA.com<</source service>>

SuperClass <<class selector>> - A data unit may use a naming convention that is defined by a broader data model. These models are often industry standards that are defined and managed by an official governing body. By utilizing the class selector tag we can assign wider meaning to a data unit's object tag names. Thus a semantic, presentation, or physical tag can have a name that is identified by a broader class-based model. In the example provided, the semantic object class is defined in a MOF file called CDR provided by the HL7 spec.
For example:

<<class selector>>HL7:it108<</class selector>>
     <<class reference>> http:www.hl7.cdr.org<</class reference>>

Private Components <<private component>> - consists of the following identifiers:

Source Table <<source table>> - The basic component of a data unit is the source table. It consists of a set of data elements (values) composed in a database table. A table comprises an organized set of metadata identified by a specified number of named columns. The source table tag identifies the table name used to refer to the object throughout the database.
For example:

<<source table>>patient<</source table>>

Source Database <<source database>> - The source table resides in a source database. This is the physical location of the collection of records identified by the source table. The source database tag identifies the URL or IP address of the database where the data unit resides.
For example:

<<source database>>vertex1<</source database>>

Elements <<elements>> - attributes in the physical data table:

Semantic Layer <<semantic>> - The semantic tag is an abstract representation of metadata. It involves defining a meaning-preserving structure from one metadata notation into another metadata notation. Any solution to the mapping problem requires, by definition, a shared understanding of the object. Since such understanding is a prerequisite to the creation of any meaning-preserving transformation, the solution to the mapping problem requires a shared semantic. The semantic tag is used to abstractly name and classify a data value property.
For example:

<<semantic>>PaNa<</semantic>>

Presentation Layer <<presentation>> - This field is used to assign a presentation tag, such as XML markup, in the service's schema to the corresponding semantic or physical layer tag that it is related to in both meaning and structure. A semantic tag and a presentation tag are said to be equivalent, as are a physical tag and a presentation tag. Equivalent objects have the same instances. Equality is used to create synonymous objects for metadata mapping. Equality means the data value they represent has the same meaning and has synonymous properties. From the equivalence in a Semantic model you can deduce that if X is related to Y by the defined property, Y is also related to X by the defined property. The two syntactic tags (semantic and presentation) may be stated to represent the same thing. These constructs may be used to automate mapping between autonomous schemas that each map to the same semantic tag. Since X is equivalent to Y, and Z is equivalent to Y, then Z is equivalent to X, through Y, and vice versa.
For example:

<<presentation>>patient_name<</presentation>>
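
The symmetric and transitive reasoning above (X is equivalent to Y, Z is equivalent to Y, therefore Z is equivalent to X) can be sketched with a small union-find structure. This is an illustrative sketch, not part of the JUMP-OWL specification: the class name EquivalenceMap and the "layer:name" labelling of tags are assumptions made for the example.

```python
class EquivalenceMap:
    """Groups tags that have been declared equivalent (union-find)."""

    def __init__(self):
        self._parent = {}  # tag -> representative tag of its group

    def _find(self, tag):
        self._parent.setdefault(tag, tag)
        while self._parent[tag] != tag:
            # Path halving keeps later lookups short.
            self._parent[tag] = self._parent[self._parent[tag]]
            tag = self._parent[tag]
        return tag

    def declare_equivalent(self, a, b):
        """Merge the groups that contain tags a and b."""
        self._parent[self._find(a)] = self._find(b)

    def are_equivalent(self, a, b):
        """True when a and b have been linked directly or through other tags."""
        return self._find(a) == self._find(b)


tags = EquivalenceMap()
tags.declare_equivalent("presentation:patient_name", "semantic:PaNa")  # X ~ Y
tags.declare_equivalent("physical:name", "semantic:PaNa")              # Z ~ Y

# Z ~ X follows through Y without being declared explicitly.
linked = tags.are_equivalent("physical:name", "presentation:patient_name")
```

Two autonomous schemas that each map to the same semantic tag end up in the same group, which is the property that allows mapping between them to be automated.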

Physical Layer <<physical>> - This physical tag represents a single, implicitly structured name of a column header in a table that represents a field label or name. Every field in a database has a unique name defined by the column in which the cell resides. This name should be a characteristic of the subject of the table to which it belongs. If the column is appropriately named it should be easy to identify the characteristic that the field is supposed to represent. However, a column name is just as likely to be ambiguous, unclear or vague and not clearly suggest or identify the purpose of the field. It is very common to see names that are generic, or abbreviated, often with meaningless acronyms, or even just misleading nomenclature. The physical tag captures the name of the column in the database table described in the data unit.
For example:

<<physical>>name<</physical>>

Description <<description>> - The Description tag provides an expansive human readable, free-form natural language description of the metadata. It captures the description that is tied to the fields in many databases, it represents xsd annotations and @ annotations, and even the WSMO description. Descriptions are attached to the elements they describe. The Description tag provides the mechanism in the meta-model of efficiently capturing the full description of the field represented by metadata. It provides enough information to other data architects and developers to enable them to recognize and use the metadata correctly. This gives relevance to the metadata when viewed and provides the multiple layers of context that must be explicitly conveyed by the data.
For example:

<<description>>the full and proper name of the patient<</description>>

Properties <<property:>> - A metadata property is a parameter-value specifying a specific value associated with the data element. It describes the element in terms of its physical properties. The Property tag is comprised of a number of sub-property component values that express how the element is stored and presented. The data expressed in the metadata tags is more highly abstracted and thus easier to consume using the Property component.
For example:

<<property:value>>real name - The description of property provided in free 
  form natural language.

Derived from SQL this represents a decomposed attribute expression. It represents the user-defined data type value that further defines a column. The property value tag helps describe an instance of the type. When a typed table is not used the values field can simply be used to further define the element logically.

<<property:type>>Integer, Character, Float, etc. - Type of property, as 
defined by SQL.

Almost all programming languages explicitly include the notion of data type. Common data types include integers, floating point numbers, characters, etc. as defined by SQL. Most programming languages also allow the programmer to define additional data types, usually by combining multiple elements of other types and defining the valid operations of the new data type.

<<property:field>>varchar(30), or 0 to n. Defines field width, can be 
finite or infinite.

Derived from SQL and expressed in the PDM. The char is a fixed-length character data type, and the varchar is a variable-length character data type. The storage size of a char value is equal to the maximum size for the column. Because varchar is a variable-length data type, the storage size of a varchar value is the actual length of the data entered, not the maximum size for the column. You can use char when the data entries in a column are expected to be the same size. You can use varchar when the data entries in a column are expected to vary considerably in size.
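
The difference in storage size can be illustrated with a short Python sketch. It is a simplification made for this example: the function name storage_size is hypothetical, sizes are counted in characters, and any per-value length overhead that a real database engine adds is ignored.

```python
def storage_size(value, declared_length, variable_length):
    """Characters stored for a value in a char or varchar column.

    Simplified model: a char column always occupies its declared length,
    a varchar column occupies only the length of the data entered.
    """
    if len(value) > declared_length:
        raise ValueError("value exceeds the declared column length")
    return len(value) if variable_length else declared_length


char_size = storage_size("Ann", declared_length=30, variable_length=False)
varchar_size = storage_size("Ann", declared_length=30, variable_length=True)
```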

<<property:format>>first, last - Defines the format of property provided 
in free-form natural language.

Formatting data is always a challenge when validating data for exchange between systems. Often the existing data has no consistent format, being derived from many sources. The data must be digitally encoded in a way the program can understand. The specific structure in which it is encoded must be clearly expressed in the metamodel so that it can be correctly formatted between systems.

Property Class <<property class>> - The property tag can benefit from a broader class association. In large smodels, repeating identical property definitions can be made more efficient by the use of a property class association. The property class automates the process of populating the property fields that can be applied across a broad selection of objects. JUMP leverages the power of abstract classes to define class attributes that can be associated with the semantic name. In this way you can have one rule that defines properties for all patients from one location and one application, say a local clinic, and another set of properties for all patients from a second location and second application, say a big doctor's office. The semantic name defined is the same for all patients in the hospital system, but since each of these applications structures patient names differently, the differences can be easily accommodated and defined. Using the property class value avoids having to construct one semantic name for patients at the clinic <<semantic>>patient_clinic<</semantic>> and another semantic name for patients at the doctor's office <<semantic>>patient_office<</semantic>>, each with different property values. Now we can create a single semantic object <<semantic>>patient<</semantic>> and define separate property values for different usage. Manipulating data formats that fall under the same semantic tag is made easier and more flexible via the property class. Define the property class once and deploy anywhere in the smodel.
For example:

<<property class>>clinic format<</property class>>
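
The idea of one semantic name with several property classes can be sketched as a simple lookup. This is an illustration only: the "office format" class, all of the property values, and the function name resolve_properties are hypothetical and not defined by the specification.

```python
# Hypothetical property classes; names and values are examples only.
PROPERTY_CLASSES = {
    "clinic format": {"type": "Character", "field": "varchar(30)", "format": "first, last"},
    "office format": {"type": "Character", "field": "varchar(60)", "format": "last, first"},
}


def resolve_properties(semantic_name, property_class):
    """Return the property values for a semantic tag used under a property class."""
    # Copy the shared definition so callers cannot modify it by accident.
    properties = dict(PROPERTY_CLASSES[property_class])
    properties["semantic"] = semantic_name
    return properties


# One semantic object, two sets of property values.
clinic_patient = resolve_properties("patient", "clinic format")
office_patient = resolve_properties("patient", "office format")
```

Each property class is defined once and reused wherever the semantic tag appears, so no separate patient_clinic or patient_office semantic name is needed.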

Constraints <<constraint>> - Constraint defines a rule of restriction. A Constraint indicates a rule or set of rules used to define the specific context of a restriction that defines how the data element can be used when mapped. Business rules translate into data constraints. Constraints are of two basic types: integrity and conditional constraints. An integrity constraint is something that must be true, or it must be kept true. A condition is a test that, depending on its outcome, may invoke another constraint or an action. All Constraint values represent a pre-condition or post-condition restriction on elements defined only within the containing data unit. The constraint rule is defined in free-form language similar in structure to commonly used rule expressions. We have made a deliberate effort in this model to make constraints more usable and readable. A constraint consists of textual descriptions of why, when, and how to use the element defined, and the consequences of using the element.
For example:

<<constraint:required>>

When you make a field required, a value must always be entered in the field when creating a new record or editing an existing record. Required fields cannot be left empty.

<<constraint:optional>>

Whether an attribute or a relationship is required or not is based on the object's state. An attribute value may be initially optional (not required) but eventually must be filled in (required). It is common to not require entry of a value on a screen, but rather to notify the operator that one must be added before some other event can happen.

<<constraint:key>>

A key is one or more columns that are used to uniquely identify a row in a table. A key that is two or more attributes is called a composite key. A primary key is the preferred key for an entity type whereas an alternate key (also known as a secondary key) is an alternative way to access rows within a table.

<<constraint:derivation>>

A derivation is an attribute that is derived from other attributes or system variables. Derived attributes are, by convention, shown in parentheses. For example, if we have an attribute called "(Extended value)" that is computed by multiplying a "Quantity" attribute by a "Unit price" attribute we define this using the constraint:derivation.
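
The derivation in this example can be sketched in Python. The attribute names come from the text above; the function name derive_extended_value and the sample values are hypothetical.

```python
def derive_extended_value(record):
    """Compute the derived "(Extended value)" attribute from its source attributes."""
    return record["Quantity"] * record["Unit price"]


# Sample values are placeholders.
order_line = {"Quantity": 3, "Unit price": 5}
order_line["(Extended value)"] = derive_extended_value(order_line)
```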

<<constraint:role>>

The constraint:role can be used to constrain either an attribute or a data unit relationship. In the notation being used here, roles are defined within the definition and enforcement of rules and corresponding relationships. This is similar to UML's more general concept of inter-role constraints.

<<constraint:relation>>

A conceptual model shows the way information flows through an organization, the sequence of actions an organization will follow, the structure of its operating information, and so forth. These are defined by the constraint:relation rule.

<<constraint:cascade>>

If you delete or alter one element, this constraint specifies that all the rows with foreign keys pointing to the changed or deleted row are also changed or deleted.
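
Cascading behaves like the SQL ON DELETE CASCADE clause, which the following sketch demonstrates with Python's built-in sqlite3 module. It shows the underlying database behaviour rather than how JUMP processes the constraint:cascade tag; the table names are borrowed from other examples in this specification, and the column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE patient_procedure ("
    " id INTEGER PRIMARY KEY,"
    " patient_id INTEGER REFERENCES patient(id) ON DELETE CASCADE)"
)
conn.execute("INSERT INTO patient (id, name) VALUES (1, 'example patient')")
conn.execute("INSERT INTO patient_procedure (id, patient_id) VALUES (10, 1)")

# Deleting the parent row also removes the rows whose foreign key points to it.
conn.execute("DELETE FROM patient WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM patient_procedure").fetchone()[0]
```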

<<constraint:check>>

Provides an external source against which the column is vetted.

Vertical Tree <<vertical_tree>> - The vertical tag defines a rule of presentation. The vertical tag is stated on an element to define the position in the tree structure of the data element. The rule allows repeating information presented using parent/child relationships to be expressed in the model. The metamodel must be able to incorporate these data patterns from the service schema. This requires a method of recording the vertical location of an object in a tree structure so that it can be correctly presented. In the example provided below the root of the tree is the element <patientID>, while the mapped XML tag <patient_name> is a child of this root element. This is indicated by the stated position of :1. A root element is provided a weight of 1, a subsequent child element is provided a sequential weight of 2, and so on until we reach the leaf element. All parts of the tree based hierarchy that are vertically linked to one another are captured in the model for correct data presentation. The vertical tag is a simple method of ranking and organizing data elements, where each element of the schema (except for the root element) is subordinate to a single other element. Vertical hierarchy describes interdependencies expressed in an XSD.
For example:

<<vertical tree:patientID:1>>
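
The weighting scheme described above (1 for the root, 2 for its children, and so on down to the leaf) can be sketched as follows. This is one possible reading of the rule, not a reference implementation: the function name vertical_weights and the parent map used to describe the schema are assumptions made for the example.

```python
def vertical_weights(parent_of):
    """Map each element to its weight: 1 for the root, parent weight + 1 otherwise."""
    weights = {}

    def weight(element):
        if element not in weights:
            parent = parent_of.get(element)
            weights[element] = 1 if parent is None else weight(parent) + 1
        return weights[element]

    for element in parent_of:
        weight(element)
    return weights


# Hypothetical schema fragment: patientID is the root, patient_name its child.
parent_of = {"patientID": None, "patient_name": "patientID"}
weights = vertical_weights(parent_of)
```

Because every element except the root has exactly one parent, the weight of any element is fully determined by following its parent chain up to the root.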

The metamodel must also capture horizontally linked data. Objects may be associated in BPEL that are not directly linked in the tree but are still required in the process. It does this using the Relational rule.

Horizontal Relation <<horizontal_relation>> - The horizontal tag defines a rule of association. If a horizontal rule is stated on a data unit then it maintains an association with the other data unit(s). A horizontal condition indicates that the element must rely on a specified relation to one or more other data units. A relation is not an absolute rule. The relationship may be suggested or valid only under certain conditions. A horizontal relation defines assumptions about how specific elements are related and dependent, thus defining similar relationships that must be present in the mapping. It indicates how these elements work together and depend on one another. Horizontal relations describe interdependencies expressed as relationships in a conceptual or logical model. A relationship occurs between two data units and consists of two roles, one going in each direction. The model shows that each data unit may be connected via one or more roles.
For example:

<<horizontal relation:type=3 rule:required/>>
  <<relation=table:"patient_procedure"/>>
  <<relation=table:"attending_physician"/>>
  <<relation=table:"account_receivable"/>>
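
A required horizontal relation implies that the related data units must be present in a mapping. The check below is a minimal sketch of that idea: the function name missing_relations and the contents of the example mapping are hypothetical, and only the three related table names come from the example above.

```python
def missing_relations(required_tables, mapped_tables):
    """Return the required related tables that are absent from a mapping, in order."""
    return [table for table in required_tables if table not in mapped_tables]


required = ["patient_procedure", "attending_physician", "account_receivable"]
mapped = {"patient", "patient_procedure", "account_receivable"}  # hypothetical mapping
missing = missing_relations(required, mapped)
```

An empty result means every required relation is satisfied. For relations that are only suggested rather than required, the same list could be reported as a warning instead of an error.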


Jump is a trademark of Jumper Networks.
The Jumper Project is maintained and driven by the community and sponsored by Jumper Networks.
All Rights Reserved © 2008 Jumper Networks, Inc. visit us on the web at: www.jumpernetworks.com