Ontological Excavation:
Introduction
In The Mythical Man-Month, Fred Brooks described a desirable quality of software that he called conceptual integrity. This property arises when a system's design qualities could only have been engineered under a unified vision of that system. As Brooks puts it: “I will contend that conceptual integrity is the most important consideration in system design. It is better to have a system omit certain anomalous features and improvements, but to reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas.” Brooks describes how conceptual integrity can be seen in the design of a computing application’s architecture, user interface, and functionality. He cites the cathedral at Reims in France as a structure with such conceptual integrity that it evokes joy in the beholder.
To date, we lack a clear understanding of how to design for conceptual integrity or how to measure it in a computing application (or, for that matter, in any artifact). Computing applications are particularly problematic because, unlike mechanical or electrical artifacts, their functions are not entirely dependent on their implementation: in software, function does not have to follow form. One can imagine an application with horrible code and nonexistent architecture that performs all of its services perfectly, and vice versa. However, because programming and design are fundamentally cognitive activities, we hypothesize that conceptual integrity does affect every layer of implementation.
Why is Ontology Important to Conceptual Integrity?
We offer the following argument (summarized in detail here).
Measuring Conceptual Integrity
I have been exploring metrics for measuring the conceptual integrity of computing applications. Thus far, I have identified two possible measures based on graph theory: conceptual coherence and conceptual complexity. I am also testing some combined calculations for overall conceptual integrity.
Conceptual Coherence - Conceptual coherence measures how interrelated an application's concepts are, using the average distance (shortest-path length) between nodes in a graph. The theory is that if a semantic network reflects potential data dependencies, then in a complete network every concept is directly related to every other and the average distance is 1.0. The less related the concepts, the greater the average distance. For example, Figure 1 shows a connected graph with an average distance of 1.6. Figure 2 shows the same graph with the central node removed, which increases the average distance to 2.3.
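The coherence measure can be sketched in a few lines of Python. This is a minimal illustration, not the author's tooling: it assumes the ontology is an undirected, connected graph stored as an adjacency dictionary, and the helper names (`undirected`, `bfs_distances`, `average_distance`, `remove_node`) and the wheel-shaped example graph are hypothetical. It is not the graph shown in Figures 1 and 2, so the numbers differ, but removing the hub produces the same qualitative effect. Breadth-first search gives hop counts from each node, and the result is averaged over all unordered pairs. If removing a node disconnects the graph, the sketch raises an error rather than guessing how unreachable pairs should be treated.

```python
from collections import deque
from itertools import combinations

# Hypothetical representation: concept name -> set of directly related concepts.
Graph = dict[str, set[str]]


def undirected(edges: list[tuple[str, str]]) -> Graph:
    """Build an undirected adjacency dictionary from a list of edges."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph: Graph, source: str) -> dict[str, int]:
    """Hop counts from `source` to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_distance(graph: Graph) -> float:
    """Mean shortest-path length over all unordered pairs of nodes.

    Assumes a connected graph with at least two nodes; raises ValueError otherwise.
    """
    nodes = list(graph)
    if len(nodes) < 2:
        raise ValueError("need at least two nodes")
    dist = {n: bfs_distances(graph, n) for n in nodes}
    total = 0
    for a, b in combinations(nodes, 2):
        if b not in dist[a]:
            raise ValueError(f"graph is disconnected: no path from {a} to {b}")
        total += dist[a][b]
    pairs = len(nodes) * (len(nodes) - 1) // 2
    return total / pairs


def remove_node(graph: Graph, node: str) -> Graph:
    """Return a copy of the graph without `node` and its incident edges."""
    return {n: nbrs - {node} for n, nbrs in graph.items() if n != node}


# A complete graph: every concept is directly related to every other.
complete = undirected(list(combinations(["A", "B", "C", "D"], 2)))
print(average_distance(complete))  # 1.0

# Hypothetical "wheel": a hub concept linked to six rim concepts arranged in a cycle.
rim = [f"c{i}" for i in range(6)]
edges = [(rim[i], rim[(i + 1) % 6]) for i in range(6)] + [("hub", c) for c in rim]
wheel = undirected(edges)
print(round(average_distance(wheel), 2))                      # 1.43
print(round(average_distance(remove_node(wheel, "hub")), 2))  # 1.8
```

With the hub present, opposite rim concepts are two hops apart through the hub. Without it they are three hops apart around the cycle, which is why the average distance rises.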
Core concepts support other concepts through direct data dependencies (aggregation and generalization) and indirect ones (associations and n-order interactions). The hypothesis, therefore, is that removing concepts essential to the application's domain model would make the resulting ontology less coherent, which appears as an increase in average distance. Conversely, removing peripheral concepts, which are not essential to the domain model, would make the resulting ontology more coherent, producing a decrease in average distance. Conceptual coherence values therefore measure an ontology's "incoherence": the higher the value, the more incoherent the ontology.
Conceptual Complexity - An application's conceptual complexity reflects the average number of relationships per node (including attributes, which are modeled as nodes in the ontology), and uses the average degree across all nodes in a graph (where a node's degree is simply the number of edges incident to it). The theory (explained in detail here) is that a concept in a semantic network with many edges connecting it to its attributes or to other nodes is more complex than a node with few edges. A complex concept is therefore more likely to interact with many other concepts, raising the overall complexity of the ontology.
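The complexity measure is easy to sketch as well. The Python below assumes an undirected graph stored as an adjacency dictionary, with attributes modeled as their own nodes as described above. The Customer/Order fragment and the helper names are hypothetical. Because each undirected edge contributes to the degree of both of its endpoints, the average degree equals 2|E|/|V|.

```python
# Hypothetical representation: node name -> set of adjacent nodes.
Graph = dict[str, set[str]]


def undirected(edges: list[tuple[str, str]]) -> Graph:
    """Build an undirected adjacency dictionary from a list of edges."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def average_degree(graph: Graph) -> float:
    """Mean number of edges per node; equals 2|E| / |V| for an undirected graph."""
    if not graph:
        raise ValueError("empty graph")
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)


# Hypothetical fragment: two concepts, their attributes modeled as nodes,
# and one association between the two concepts.
ontology = undirected([
    ("Customer", "name"), ("Customer", "email"), ("Customer", "address"),
    ("Order", "date"), ("Order", "total"),
    ("Customer", "Order"),
])

degrees = {node: len(nbrs) for node, nbrs in ontology.items()}
print(degrees["Customer"], degrees["Order"])  # 4 3
print(round(average_degree(ontology), 3))     # 1.714  (12 edge endpoints / 7 nodes)
```

Here "Customer" is the most complex concept (degree 4), while each attribute node has degree 1 and pulls the average down.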
The idea of balancing a graph already appears in data structures such as B-trees, which organize data elements to optimize search times. In ontologies, concepts that act as parents of subtypes, or containers with many aggregation relationships, may serve as data balancers. For conceptual complexity, the hypothesis is that removing nodes that simplify the ontology by organizing concepts will increase the average degree, while removing inherently complex concepts will decrease it. In addition, identifying inherently complex concepts and "balancing" them using B-tree or similar heuristics may also reduce overall complexity (though possibly at some cost to coherence).
Conceptual Integrity Metric - I am testing two calculations that combine conceptual coherence and complexity into an approximation of overall conceptual integrity. They are currently labeled HZ1 and HZ2 (HZ stands for the Hsi-Zook measure). HZ1 is simply the product of coherence and complexity; HZ2 is the sum of their squares. These results are included mainly for completeness: these structural metrics will only prove meaningful with more data points and will probably have to be normalized against the size of the ontology.
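The two combined calculations can be stated directly. In this minimal Python sketch, coherence is the average distance and complexity is the average degree, both as defined in the text. The function names `hz1` and `hz2` simply follow the labels used above. No normalization by ontology size is applied, because the text leaves that question open. The input values are made up for illustration.

```python
def hz1(coherence: float, complexity: float) -> float:
    """HZ1: the product of conceptual coherence and conceptual complexity."""
    return coherence * complexity


def hz2(coherence: float, complexity: float) -> float:
    """HZ2: the sum of the squares of conceptual coherence and conceptual complexity."""
    return coherence ** 2 + complexity ** 2


# Illustrative, made-up inputs: average distance 2.0, average degree 3.0.
print(hz1(2.0, 3.0))  # 6.0
print(hz2(2.0, 3.0))  # 13.0
```

Both inputs grow as an ontology becomes less coherent or more complex. Under these conventions, a larger HZ value would presumably point toward weaker conceptual integrity. This reading is an inference from the definitions, not a result reported above.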