
Data Rationalization – The Next Step in Semantic Resolution


With Web 2.0, ontologies are being used to improve search capabilities and to support inferences that improve human or computer reasoning. Because terms are related to each other in an ontology, the user doesn’t need to know the exact term actually stored in a document. Ontology is “the study of the categories of things that exist or may exist in some domain”1, and an ontology is composed of “a collection of taxonomies and thesauri”2 about a domain. Data Rationalization is a Managed Meta Data Environment (MME) enabled application that creates or extends an ontology for a domain into the structured data world, based on model objects stored in various models (of varying levels of detail, across model files and modeling tools) and other meta data. Data Models, often unknowingly, express many aspects of an ontology, even though they are not stored in OWL or RDF.
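To make the search benefit concrete, here is a minimal Python sketch, assuming a toy ontology held as a dictionary of related terms. The terms, the `ONTOLOGY` dict, and the `expand_query`/`search` helpers are all hypothetical and are not part of any OWL or RDF toolkit. A query for one term is expanded to every related term before matching, so the user does not need to know which term a document actually uses.

```python
# Hypothetical mini-ontology: each term maps to the terms it is related to
# (synonyms, narrower terms). A real ontology would typically live in OWL/RDF.
ONTOLOGY: dict[str, set[str]] = {
    "customer": {"client", "account holder", "patron"},
    "client": {"customer"},
}


def expand_query(term: str, ontology: dict[str, set[str]]) -> set[str]:
    """Return the term plus every term reachable through ontology relations."""
    start = term.lower()
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for related in ontology.get(current, set()):
            if related not in seen:  # avoid looping on cyclic relations
                seen.add(related)
                stack.append(related)
    return seen


def search(documents: dict[str, str], term: str,
           ontology: dict[str, set[str]]) -> list[str]:
    """Return IDs of documents containing the term or any related term."""
    terms = expand_query(term, ontology)
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if any(t in text.lower() for t in terms)  # naive substring match
    )


docs = {
    "d1": "Client onboarding process",
    "d2": "Patron loyalty card rules",
    "d3": "Invoice layout standards",
}
print(search(docs, "Customer", ONTOLOGY))  # ['d1', 'd2']
```

Neither matching document contains the word "customer"; they are found only through the related terms.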

At the end of the day, the primary reason for data modeling is to create physical data structures. Even so, a critical best practice for data modeling is to follow a phased approach, typically developing Conceptual, then Logical, and finally Physical Data Models. Conceptual Data Models are sometimes considered Semantic Models because they are expressed in business terminology and show how key business objects relate to each other, independent of technology or application. There are other types of data models (e.g. enterprise models) and other meta data that should be linked together to provide a more holistic view of a domain. Unfortunately, most modeling tools cannot handle all of the different levels of models effectively; it is not uncommon for more than one modeling tool to be used in an enterprise, and multiple model files are almost always a necessity. For example, a data modeling tool might be used for the Logical and Physical Models, while a UML class diagram might be used for the Conceptual Model.

Tying these model objects together, and visualizing the objects and their relationships, is called Data Rationalization. It is typically enabled as part of a Managed Meta Data Environment (MME) by leveraging a Meta Data Repository (MDR) tool. Data Rationalization can be thought of as “vertical data lineage”, as opposed to the horizontal data lineage used for data movement (i.e. the Information Supply Chain). With Data Rationalization, we’re not trying to find where an actual piece of data came from (i.e. source to target), but rather which higher-order model objects the data was conceived from or which help to explain it, or which downstream objects implement those higher-order model objects (see Figure 1 below).
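One way to picture the difference between vertical and horizontal lineage is to tag each link with its direction. The sketch below is illustrative only: the `Link` class, the object names, and the two link kinds are assumptions made for this example, not the meta-model of any particular MDR tool.

```python
from dataclasses import dataclass
from typing import Literal

# "vertical": conceived-from / implements (Data Rationalization)
# "horizontal": data moves from source to target (Information Supply Chain)
LinkKind = Literal["vertical", "horizontal"]


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    kind: LinkKind


# Hypothetical links; names are illustrative only.
LINKS = [
    Link("Conceptual.Customer", "Logical.Customer", "vertical"),
    Link("Logical.Customer", "Physical.CUSTOMER", "vertical"),
    Link("Physical.CUSTOMER", "crm_db.CUSTOMER", "vertical"),
    Link("crm_db.CUSTOMER", "dw_db.DIM_CUSTOMER", "horizontal"),
]


def one_hop(obj: str, links: list[Link], kind: LinkKind) -> list[str]:
    """Objects linked to obj by a link of the given kind, in either direction."""
    result = []
    for link in links:
        if link.kind != kind:
            continue
        if link.source == obj:
            result.append(link.target)
        elif link.target == obj:
            result.append(link.source)
    return result


# Rationalizing a database table reaches its physical model table...
print(one_hop("crm_db.CUSTOMER", LINKS, "vertical"))    # ['Physical.CUSTOMER']
# ...whereas data lineage reaches the warehouse table it feeds.
print(one_hop("crm_db.CUSTOMER", LINKS, "horizontal"))  # ['dw_db.DIM_CUSTOMER']
```

Keeping both kinds of link in one structure, but filtering by kind, lets the same repository answer both "where did this data come from?" and "what concept does this table implement?".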

1 John F. Sowa
2 Seth Earley, Taxonomies and Metadata

Figure 1 – Example: Data Rationalization versus Information Supply Chain

Benefits of Data Rationalization

What is the benefit of Data Rationalization? To effectively exploit, manage, reuse, and govern enterprise data assets (including the models that describe them), we first need to be able to find them. There is also (or should be) a wealth of semantics (e.g. business names, definitions, relationships) embedded within our models that can be exposed for improved analysis and knowledge transfer. By linking model objects (across or within models), we can find which higher-order model objects a given model object was conceptualized from. Conversely, we can identify which implementation artifacts implement a higher-order model object. For example, we can traverse from a conceptual model entity to a logical model entity to a physical model table to a database table, and so on. Similarly, we could use Data Rationalization to understand a database table by traversing up through the different model levels.
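The downward and upward traversal described above can be sketched as a breadth-first walk over the meta-relationships. This is a minimal sketch, assuming the links are stored as a hypothetical `IMPLEMENTS` mapping from each higher-order object to the objects that implement it; the object names are illustrative. Upward rationalization simply walks the inverted mapping.

```python
from collections import defaultdict, deque

# Hypothetical meta-relationships: higher-order object -> objects implementing it.
IMPLEMENTS: dict[str, list[str]] = {
    "Conceptual:Customer": ["Logical:Customer", "Logical:Prospect"],
    "Logical:Customer": ["Physical:CUST"],
    "Logical:Prospect": ["Physical:PRSPCT"],
    "Physical:CUST": ["Database:crm.cust"],
    "Physical:PRSPCT": ["Database:crm.prspct"],
}


def _walk(start: str, edges: dict[str, list[str]]) -> list[str]:
    """Breadth-first traversal; returns reachable objects (excluding start)."""
    seen = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def rationalize_down(obj: str, implements: dict[str, list[str]]) -> list[str]:
    """Find every downstream object that implements obj."""
    return _walk(obj, implements)


def rationalize_up(obj: str, implements: dict[str, list[str]]) -> list[str]:
    """Find every higher-order object that obj was conceived from."""
    inverse: dict[str, list[str]] = defaultdict(list)
    for parent, children in implements.items():
        for child in children:
            inverse[child].append(parent)
    return _walk(obj, inverse)


print(rationalize_up("Database:crm.cust", IMPLEMENTS))
# ['Physical:CUST', 'Logical:Customer', 'Conceptual:Customer']
```

Storing only one direction and deriving the other keeps the meta-relationships from drifting out of sync; an MDR tool would typically index both directions for performance.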

With today’s distributed systems, there are often dozens or even hundreds of models, and tens of thousands of data elements, spread across many heterogeneous systems. It is usually very difficult to find all the model objects or implementation artifacts (e.g. database tables) in the enterprise that express a concept such as “Customer”. Even with name matching, the same term may mean different things in different systems (i.e. homonyms), or the objects may have differing natural keys and therefore probably do NOT represent the exact same thing. Of course, different applications may also use different terms and different abbreviations, e.g. prospect, account, cust, cst…
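The pitfalls of pure name matching can be shown with a small sketch. Everything here is hypothetical: the abbreviation map, the table metadata, and the rule of grouping candidates by natural key are illustrative assumptions, not a prescribed matching algorithm. Tables whose names normalize to the same concept are grouped by natural key, so a group with a different key (such as a general ledger `ACCOUNT` table) stands out as a likely homonym needing review.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical glossary mapping name tokens/abbreviations to a business concept.
CONCEPT_OF_TOKEN = {
    "customer": "Customer",
    "cust": "Customer",
    "cst": "Customer",
    "prospect": "Customer",
    "account": "Customer",
}


@dataclass(frozen=True)
class TableMeta:
    system: str
    name: str
    natural_key: tuple[str, ...]  # business key columns


def concept_of(table_name: str) -> Optional[str]:
    """Map a table name to a concept using its first underscore-separated token."""
    token = table_name.lower().split("_")[0]
    return CONCEPT_OF_TOKEN.get(token)


def candidates_by_key(tables: list[TableMeta],
                      concept: str) -> dict[tuple[str, ...], list[str]]:
    """Group name-matched tables by natural key; each group is a distinct candidate meaning."""
    groups: dict[tuple[str, ...], list[str]] = {}
    for t in tables:
        if concept_of(t.name) == concept:
            groups.setdefault(t.natural_key, []).append(f"{t.system}.{t.name}")
    return groups


TABLES = [
    TableMeta("CRM", "CUST", ("customer_number",)),
    TableMeta("Billing", "CST_MASTER", ("customer_number",)),
    TableMeta("Marketing", "PROSPECT", ("email_address",)),
    TableMeta("GL", "ACCOUNT", ("gl_account_code",)),  # a ledger account, not a customer
    TableMeta("HR", "EMPLOYEE", ("employee_id",)),
]

for key, names in candidates_by_key(TABLES, "Customer").items():
    print(key, names)
```

Here `CRM.CUST` and `Billing.CST_MASTER` share a natural key and are strong candidates for the same concept, while `GL.ACCOUNT` matched on name alone and would be flagged rather than assumed equivalent.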

Data Rationalization is an enabler of effective Data Governance. How can you govern information assets if you don’t know where they are or what they mean? Similarly, Data Rationalization can aid in the development of Master Data Management (MDM) solutions. By identifying common data entities, and how these relate to other pieces of data (again, across many systems), MDM solutions will be better able to accommodate the needs of all the systems that require the master/reference data.

How does it work?

To rationalize your data, meta‐relationships between model objects (across model levels) must be established. These are not meant to supplant the normal relationships between model objects within the same model. Meta‐relationships can be established in multiple ways:


  1. Use automated modeling tool functionality (e.g. ERStudio Where Used, PowerDesigner Link and Sync)
  2. Use manual modeling tool functionality (e.g. ERStudio User Defined Mapping)
  3. Use modeling tool meta data fields (e.g. ERwin User Defined Properties (UDP))
  4. Use a Meta Data Repository tool (e.g. Rochade, Adaptive, Advantage Repository, etc.) to manually establish links using a GUI or other interface.
  5. Use a spreadsheet
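As an illustration of the spreadsheet option, the sketch below reads meta-relationships exported to CSV and validates them before they would be loaded into an MDR. The column names, the `MetaRelationship` record, and the sample rows are assumptions made for this example; real repository tools define their own import formats.

```python
import csv
import io
from dataclasses import dataclass

# Assumed spreadsheet layout: one meta-relationship per row.
REQUIRED_COLUMNS = ("from_model", "from_object", "to_model", "to_object")


@dataclass(frozen=True)
class MetaRelationship:
    from_model: str   # e.g. the logical model file
    from_object: str  # higher-order object, e.g. an entity
    to_model: str     # e.g. the physical model file
    to_object: str    # implementing object, e.g. a table


def load_meta_relationships(csv_text: str) -> list[MetaRelationship]:
    """Parse and validate meta-relationships exported from a spreadsheet as CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    result = []
    # Data starts on line 2 (line 1 is the header); assumes no multi-line cells.
    for line_no, row in enumerate(reader, start=2):
        # Short rows yield None for absent cells, so normalize to "" first.
        values = {c: (row[c] or "").strip() for c in REQUIRED_COLUMNS}
        if not all(values.values()):
            raise ValueError(f"incomplete mapping on line {line_no}")
        result.append(MetaRelationship(**values))
    return result


SHEET = (
    "from_model,from_object,to_model,to_object\n"
    "Sales LDM,Customer,Sales PDM,CUST\n"
    "Sales PDM,CUST,CRM database,crm.cust\n"
)
for rel in load_meta_relationships(SHEET):
    print(rel)
```

Rejecting incomplete rows up front matters because a spreadsheet has none of the referential checks a modeling tool or MDR would apply.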

The different ways of establishing rationalization meta‐relationships, and the pros and cons of each, will be discussed in a subsequent article.

Once the meta‐relationships are established, they need to be imported into the Meta Data Repository (unless they were established using the MDR tool itself). From there, analysts can search, retrieve, and visualize the meta data to perform Data Rationalization analysis. Analysts don’t need a modeling tool license to explore the models (assuming the higher order models can be found), nor do they need to rely on the data modeler to obtain access or export the model meta data.

A very simple example of Data Rationalization analysis is an analyst who wishes to understand the relationship between two tables. Assume the analyst doesn’t have a modeling tool license, doesn’t know which model to look for, or doesn’t have access to the network share where the models are stored. Also assume that foreign keys have been disabled in the physical model (valid in some cases, e.g. data warehousing). Using the MDR, the analyst could search on the table names and, once they are displayed, rationalize upwards to the logical entities (in a separate model file) that these tables originated from. The analyst can then see the relationship between those entities, understand its cardinality, optionality, and identification, and review the relationship verb phrase.
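The analyst scenario can be sketched as follows, assuming the MDR already holds the table-to-entity meta-relationships and the logical model’s relationships. The table names, entity names, and the `explain`/`describe` helpers are hypothetical and stand in for whatever search and visualization the repository tool provides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogicalRelationship:
    parent: str
    child: str
    verb_phrase: str
    cardinality: str   # e.g. "one-to-many"
    optionality: str   # e.g. "mandatory": each child must have a parent
    identifying: bool  # does the parent's key form part of the child's key?


# Hypothetical repository contents. Foreign keys are disabled in the
# physical model, so the only link between the tables is via the logical model.
TABLE_TO_ENTITY = {"CUST": "Customer", "ORD_HDR": "Order"}
RELATIONSHIPS = [
    LogicalRelationship("Customer", "Order", "places", "one-to-many",
                        "mandatory", identifying=False),
]


def explain(table_a: str, table_b: str) -> Optional[LogicalRelationship]:
    """Rationalize two tables up to their entities and find the logical relationship."""
    entity_a = TABLE_TO_ENTITY.get(table_a)
    entity_b = TABLE_TO_ENTITY.get(table_b)
    if entity_a is None or entity_b is None:
        return None  # no upward meta-relationship recorded for one of the tables
    for rel in RELATIONSHIPS:
        if {rel.parent, rel.child} == {entity_a, entity_b}:
            return rel
    return None


def describe(rel: LogicalRelationship) -> str:
    kind = "identifying" if rel.identifying else "non-identifying"
    return (f"{rel.parent} {rel.verb_phrase} {rel.child} "
            f"({rel.cardinality}, {rel.optionality}, {kind})")


rel = explain("ORD_HDR", "CUST")
if rel is not None:
    print(describe(rel))  # Customer places Order (one-to-many, mandatory, non-identifying)
```

The table order passed to `explain` does not matter, because the lookup compares the two entities as a set and reports the relationship from the logical model’s parent-to-child perspective.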

Figure 2 – Example Data Rationalization visualization using Rochade. The physical model was imported into the MDR using the RDM meta‐model, and the logical model using the CDM meta‐model.

Summary

Data Rationalization is a powerful means to unlock and exploit the semantics hidden in models, and to find, identify, and leverage the data we need. Tying a formal ontology (e.g. stored in OWL) to model objects can help bridge the semantic divide between unstructured and structured data. With Data Rationalization we can more effectively manage, govern, understand, and analyze our information assets. Significant time savings can be realized because we can improve reuse, identify opportunities for master data management solutions, and minimize the re‐analysis so often performed when changes or new applications are required.
