Ontological Excavation:
Conceptual Integrity


Introduction

In The Mythical Man-Month, Fred Brooks described a desirable quality of software that he called conceptual integrity. This property arises from a system that demonstrates design qualities that could only have been engineered under a unified vision of that system.

 “I will contend that conceptual integrity is the most important consideration in system design. It is better to have a system omit certain anomalous features and improvements, but to reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas.”

Brooks describes how conceptual integrity can be seen in the design of a computing application’s architecture, user interface, and functionality. He cites the cathedral at Reims in France as a structure with such conceptual integrity that it evokes joy in the beholder.

Cathédrale Notre-Dame de Chartres: Built from 1145 to 1220. Originally, the cathedral was designed in the Romanesque style, as can be seen on the right. In 1194, the left side burned down and the architect decided to rebuild it in the early Gothic style.

Cathédrale Notre-Dame de Reims: Designed by Jean d'Orbais in 1211 and completed in 1475. Shows a coherent and unified vision preserved across the centuries.

To date, we lack a clear understanding of how to design for conceptual integrity or how to measure it in a computing application (or any artifact, for that matter). Computing applications are particularly problematic since, unlike mechanical or electrical artifacts, their functions are not entirely dependent on their implementation. The function of a computing application does not have to follow form. In fact, one can imagine an application with horrible code and nonexistent architecture that performs all of its services perfectly, and vice versa. However, as programming and design are fundamentally cognitive activities, we hypothesize that conceptual integrity does have an impact on every layer of implementation.


Why is Ontology Important to Conceptual Integrity?

We offer the following argument (explained in detail here).

  • Conceptual integrity is a desirable quality in computing applications and is evidenced by a well-designed software architecture, user interface, and feature set.

  • Computing applications are developed to be useful to their users and to behave in specific problem domains.

  • These applications encode the user’s domain in a set of concepts and relationships that we call an ontology. The services and behaviors expressed by these concepts are accessed through an application's external interface that we call a morphology.

  • The concepts in the ontology determine what features the software implements.

  • The degree to which the ontology matches the problem domain of its users is its conceptual fitness.

  • An application possessing high conceptual fitness is more likely to be useful than one with a low conceptual fitness.

  • The morphology, architecture, and code must necessarily implement the features articulated by the ontology.

  • Thus, the ontology is the single most important factor in the conceptual integrity of the application.


Measuring Conceptual Integrity

I have been exploring metrics for measuring the conceptual integrity of computing applications. Thus far, I have identified two possible measures based on graph theory: conceptual coherence and conceptual complexity. I am also testing some combined calculations for the overall conceptual integrity. 

Conceptual Coherence - Conceptual coherence measures how interrelated an application's concepts are, using the average distance between nodes in a graph. The theory is that if a semantic network reflects potential data dependencies, then a completely connected network contains concepts that are all directly interrelated and has an average distance of 1.0. The less related the concepts, the greater the average distance. For example, Figure 1 shows a connected graph where the average distance is 1.6.

Figure 1: Average Distance = 1.6

Figure 2 shows the same graph with the central node removed causing the average distance to increase to 2.3.

Figure 2: Average Distance = 2.3

Core concepts support other concepts by direct (aggregation and generalization) and indirect data dependencies (associations and n-order interactions). Thus, the hypothesis is that removing those concepts essential to the application's domain model would make the resulting ontology less coherent, appearing as an increase in average distance. Conversely, removing peripheral concepts, not essential to the domain model, would make the resulting ontology more coherent, producing a decrease in average distance. Thus, conceptual coherence values reflect an ontology's "incoherence" where the higher the value, the more incoherent the ontology. 
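The average-distance calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used for the figures: the graph below is a hypothetical ontology (a core concept "H" linked to six concepts that also form a ring, plus a peripheral concept "P"), and the helper names build_graph, average_distance, and remove_node are assumptions introduced for the example. Distances are hop counts on an undirected graph.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_distances(graph, source):
    """Return hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_distance(graph):
    """Mean shortest-path length over all pairs of distinct nodes."""
    n = len(graph)
    total = 0
    for source in graph:
        dist = bfs_distances(graph, source)
        if len(dist) != n:
            raise ValueError("graph is disconnected; average distance is undefined")
        total += sum(dist.values())
    # Each unordered pair was counted twice, so divide by n * (n - 1).
    return total / (n * (n - 1))


def remove_node(graph, node):
    """Return a copy of the graph without the given node and its edges."""
    return {u: nbrs - {node} for u, nbrs in graph.items() if u != node}


# Hypothetical ontology (an assumption for illustration, not Figure 1).
RING = "ABCDEF"
EDGES = [("H", c) for c in RING]                             # core concept
EDGES += [(RING[i], RING[(i + 1) % 6]) for i in range(6)]    # ring of concepts
EDGES += [("A", "P")]                                        # peripheral concept
ontology = build_graph(EDGES)

full = average_distance(ontology)
without_core = average_distance(remove_node(ontology, "H"))
without_peripheral = average_distance(remove_node(ontology, "P"))
print(f"full={full:.3f} without_core={without_core:.3f} "
      f"without_peripheral={without_peripheral:.3f}")
```

For this toy graph the script prints full=1.643 without_core=2.000 without_peripheral=1.429, which is consistent with the hypothesis: removing the core concept raises the average distance, while removing the peripheral concept lowers it. The function raises an error for disconnected graphs, since the text does not say how such cases should be scored.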

Conceptual Complexity - An application's conceptual complexity reflects the average number of relationships per node (including attributes, which are modeled as nodes in the ontology). It is calculated as the average degree across all nodes in a graph, where the degree of a node is simply the number of edges on it. The theory (explained in detail here) is that a concept in a semantic network with many edges connecting it to its attributes or to other nodes has a higher complexity than a node with few edges. Thus, a complex concept is more likely to interact with many other concepts, raising the overall complexity of the ontology.

Degree of blue node = 7.0, Average Degree = 2.0

The idea of balancing a graph can already be found in data structures such as B-trees, which organize data elements to optimize search times. In ontologies, concepts that act as parents of subtypes, or containers that have many aggregation relationships, may serve as data balancers. For measuring conceptual complexity, the hypothesis is that removing the nodes that help to simplify the ontology by organizing concepts will increase the average degree, while removing inherently complex concepts will decrease it. In addition, identifying inherently complex concepts and "balancing them" using B-tree or similar heuristics may also reduce the overall complexity (but possibly at some cost to coherence).
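As a rough illustration of the average-degree calculation, the sketch below computes node degrees and the average degree for a small hypothetical ontology, then recomputes the average after removing its highest-degree concept. The graph and the helper names (build_graph, average_degree, remove_node) are assumptions made for the example and are not taken from the figure above.

```python
def build_graph(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree(graph, node):
    """Number of edges on a node."""
    return len(graph[node])


def average_degree(graph):
    """Average number of edges per node (equals 2 * |E| / |V|)."""
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)


def remove_node(graph, node):
    """Return a copy of the graph without the given node and its edges."""
    return {u: nbrs - {node} for u, nbrs in graph.items() if u != node}


# Hypothetical ontology: concept "H" relates to six concepts A..F that also
# form a ring, and "P" is attached only to "A".
RING = "ABCDEF"
EDGES = [("H", c) for c in RING]
EDGES += [(RING[i], RING[(i + 1) % 6]) for i in range(6)]
EDGES += [("A", "P")]
ontology = build_graph(EDGES)

# The most complex concept is the one with the highest degree.
most_complex = max(ontology, key=lambda node: degree(ontology, node))
before = average_degree(ontology)
after = average_degree(remove_node(ontology, most_complex))
print(most_complex, degree(ontology, most_complex), before, after)
```

Here the highest-degree concept is "H" with degree 6, the average degree of the full graph is 3.25 (13 edges over 8 nodes), and it drops to 2.0 once "H" is removed. This only illustrates the second half of the hypothesis (removing an inherently complex concept lowers the average degree); whether removing an organizing concept raises it depends on how the remaining concepts are reconnected, which the sketch does not model.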

Conceptual Integrity Metric - I am testing two calculations that combine conceptual coherence and complexity to approximate overall conceptual integrity. Currently they are labeled HZ1 and HZ2 (HZ stands for the Hsi-Zook measure). HZ1 is simply the product of coherence and complexity. HZ2 is the sum of the squares of coherence and complexity. These results are included mainly for completeness, as these structural metrics will only become meaningful with more data points and will probably have to be normalized against the size of the ontology.
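Read literally, the two definitions above amount to the following small functions. This is only a sketch of the stated formulas, without any normalization against ontology size; the function names hz1 and hz2 and the input values are illustrative assumptions, with coherence taken as an average distance and complexity as an average degree.

```python
def hz1(coherence, complexity):
    """HZ1: the product of coherence and complexity."""
    return coherence * complexity


def hz2(coherence, complexity):
    """HZ2: the sum of the squares of coherence and complexity."""
    return coherence ** 2 + complexity ** 2


# Illustrative inputs only: an average distance of 1.6 and an average degree of 2.0.
coherence, complexity = 1.6, 2.0
print(f"HZ1 = {hz1(coherence, complexity):.2f}")  # HZ1 = 3.20
print(f"HZ2 = {hz2(coherence, complexity):.2f}")  # HZ2 = 6.56
```

Because both inputs grow as an ontology becomes less coherent or more complex, lower values of either combined measure would correspond to higher conceptual integrity under this reading.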