

12.2.2 Major Classes and Associations


   The CWM Data Mining metamodel consists of seven conceptual areas: a core Mining metamodel (upon which the other areas depend) and metamodels representing the data mining subdomains of Clustering, Association Rules, Supervised, Classification, Approximation, and Attribute Importance. Each area is represented by a metamodel package, as shown in Figure 12-1.


   Figure 12-1 CWM Data Mining Metamodel

   Collectively, the Data Mining packages provide the abstractions needed to model generic representations of data mining models (i.e., mathematical models produced by the execution of data mining algorithms).

   Included are representations of data mining tasks and models, of other entities common to most data mining applications and tools (such as the category matrix), of the relationships among them, and of their mappings to technical metadata.

   The Mining Core package consists of common Data Mining abstractions that are fundamental to, and reused by, the major conceptual areas. In particular, it contains several basic packages that are required to implement the CWM Data Mining interfaces. For compliance, an implementation must support at least this package and one other Data Mining package. The packages forming the Mining Core are shown in Figure 12-2.


   Figure 12-2 CWM Data Mining Metamodel: Mining Core Package

   The following subsections describe each component package of MiningCore, followed by subsections describing the packages for each major conceptual area.

   12.2.2.1 Mining Function Settings

   


   Figure 12-3 CWM Data Mining Metamodel: Mining Function Settings

   This package defines the objects that contain parameters specific to mining functions. The separation of mining functions from mining algorithms enables the user to specify the type of the desired result without being concerned with a particular algorithm. The Mining Function Settings metamodel is illustrated above.

   MiningFunctionSettings (MFS) is the superclass of all other function settings classes. An MFS instance references a set of MiningAttributes aggregated by a LogicalData instance. Its AttributeUsageSet defines how each of these MiningAttributes is to be used by the mining algorithm.
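   As a rough illustration of these relationships, the following Python sketch models a MiningFunctionSettings instance that refers to a LogicalData instance and an AttributeUsageSet, and checks that every usage entry names an attribute of the logical data. The class names mirror the metaclasses above; the field names (`attributes`, `usages`), the string usage labels, and the `validate` method are assumptions made for illustration, not CWM-defined interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MiningAttribute:
    name: str


@dataclass
class LogicalData:
    # Aggregates the MiningAttributes that a mining function may use.
    attributes: List[MiningAttribute] = field(default_factory=list)

    def names(self) -> Set[str]:
        return {a.name for a in self.attributes}


@dataclass
class AttributeUsageSet:
    # Hypothetical encoding: attribute name -> usage label such as "active".
    usages: Dict[str, str] = field(default_factory=dict)


@dataclass
class MiningFunctionSettings:
    logical_data: LogicalData
    attribute_usage_set: AttributeUsageSet

    def validate(self) -> List[str]:
        """Return usage entries that name no attribute of the logical data."""
        known = self.logical_data.names()
        return [n for n in self.attribute_usage_set.usages if n not in known]


data = LogicalData([MiningAttribute("age"), MiningAttribute("income")])
usage = AttributeUsageSet({"age": "active", "income": "active", "zip": "inactive"})
settings = MiningFunctionSettings(data, usage)
print(settings.validate())  # ['zip']
```

   Function-specific subclasses (for example, clustering or classification settings) would add their own parameters on top of this common structure.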

   12.2.2.2 Mining Model

   Figure 12-4 CWM Data Mining Metamodel: Mining Model

   This package defines the basic MiningModel class, from which all model objects produced by a mining build task inherit. The Mining Model metamodel is illustrated above.

   Each MiningModel has a signature that defines the characteristics of the data required by the model.
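   Because the signature states what data a model requires, a tool can check a candidate data set against it before using the model. The following Python sketch assumes a signature is simply a list of SignatureAttribute entries, each with a name and a type label; the `attr_type` values and the `missing_from` helper are hypothetical, not CWM-defined.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SignatureAttribute:
    name: str
    attr_type: str  # hypothetical type label, e.g. "numerical" or "categorical"


@dataclass
class ModelSignature:
    attributes: List[SignatureAttribute]

    def missing_from(self, columns: Dict[str, str]) -> List[str]:
        """List signature attributes that are absent from, or mistyped in, the columns."""
        return [a.name for a in self.attributes
                if columns.get(a.name) != a.attr_type]


@dataclass
class MiningModel:
    name: str
    signature: ModelSignature


model = MiningModel(
    "churn",
    ModelSignature([SignatureAttribute("age", "numerical"),
                    SignatureAttribute("plan", "categorical")]),
)
print(model.signature.missing_from({"age": "numerical"}))  # ['plan']
```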

   12.2.2.3 Mining Result

   Figure 12-5 CWM Data Mining Metamodel: Mining Result

   This package defines the basic MiningResult class, from which all objects produced by a specific mining task (other than build) inherit.

   12.2.2.4 Mining Data

   This package defines the objects that describe the input data, the way the input data is to be treated, and the mapping between the input data and the internal representation that mining algorithms can understand.

   PhysicalData effectively references an instance of a class or subclass (e.g., a table or a file). This allows JDM (Java Data Mining) to leverage the various row/column data representations expressible in CWM.

   Mining Data metaclasses representing the concepts of physical data are illustrated in Figure 12-6. Logical data metaclasses are illustrated in Figure 12-7. Attribute assignment and attribute usage metaclasses are illustrated in the two subsequent diagrams (Figure 12-8 and Figure 12-9, respectively).

   Finally, the metaclasses used to model category matrices and category taxonomies are presented in Figure 12-10 (Category Matrix) and Figure 12-11 (Category Taxonomy), respectively.

   Figure 12-6 CWM Data Mining Metamodel: Physical Data

   Figure 12-6 illustrates the elements of the Mining Data metamodel used to model physical data, whereas Figure 12-7 shows the elements that support the logical modeling of data.

   Figure 12-7 CWM Data Mining Metamodel: Logical Data

   Figure 12-7 contains the metaclasses that describe how the mining algorithm should logically interpret physical data.

   A LogicalAttribute can be categorical, numerical, or both, depending on its usage. Categorical attributes that have ordered category values are created as ordinal attributes.
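   To make the distinction concrete, here is a small Python sketch in which a logical attribute may carry numerical properties, categorical properties, or both, and in which categorical properties with an ordered category list are treated as ordinal. The property fields (`lower_bound`, `upper_bound`, `categories`) and the `kinds` and `compare` methods are illustrative assumptions, not the CWM attribute names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NumericalAttributeProperties:
    lower_bound: Optional[float] = None
    upper_bound: Optional[float] = None


@dataclass
class CategoricalAttributeProperties:
    categories: List[str]


@dataclass
class OrdinalAttributeProperties(CategoricalAttributeProperties):
    # Ordinal: a categorical attribute whose category list is meaningfully ordered.
    pass


@dataclass
class LogicalAttribute:
    name: str
    numerical: Optional[NumericalAttributeProperties] = None
    categorical: Optional[CategoricalAttributeProperties] = None

    def kinds(self) -> List[str]:
        kinds = []
        if self.numerical is not None:
            kinds.append("numerical")
        if isinstance(self.categorical, OrdinalAttributeProperties):
            kinds.append("ordinal")
        elif self.categorical is not None:
            kinds.append("categorical")
        return kinds

    def compare(self, a: str, b: str) -> int:
        """Negative if a precedes b; only defined for ordinal attributes."""
        if not isinstance(self.categorical, OrdinalAttributeProperties):
            raise TypeError(f"{self.name} has no ordered categories")
        order = self.categorical.categories
        return order.index(a) - order.index(b)


size = LogicalAttribute("size", categorical=OrdinalAttributeProperties(["S", "M", "L"]))
rating = LogicalAttribute("rating",
                          numerical=NumericalAttributeProperties(1, 5),
                          categorical=CategoricalAttributeProperties(["1", "2", "3", "4", "5"]))
print(size.kinds(), rating.kinds())  # ['ordinal'] ['numerical', 'categorical']
print(size.compare("S", "L") < 0)    # True
```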

   12-8 Common Warehouse Metamodel, v1.1 March 2003

   Figure 12-8 CWM Data Mining Metamodel: Attribute Assignment

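   Figure 12-8 illustrates the metaclasses that map physical data attributes to logical data mining attributes. The following attribute assignments are supported: DirectAttributeAssignment, PivotAttributeAssignment, SetAttributeAssignment, and ReversePivotAttributeAssignment.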
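   The Python sketch below contrasts two kinds of attribute assignment on in-memory rows. It assumes that a direct assignment copies one physical column into one logical attribute, and that a pivot assignment reads "transactional" rows of (set ID, name, value), using the name column to select the logical attribute and grouping rows by set ID. This is one reading of the role names setIdAttribute, nameAttribute, and valueAttribute in Figure 12-8; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def direct_assign(rows: List[Row], mapping: Dict[str, str]) -> List[Row]:
    """Direct assignment: each logical attribute takes one physical column.

    `mapping` is logical attribute name -> physical column name (assumed shape).
    """
    return [{logical: row[physical] for logical, physical in mapping.items()}
            for row in rows]


def pivot_assign(rows: List[Row], set_id: str, name: str, value: str) -> Dict[object, Row]:
    """Pivot assignment: each (set id, name, value) row sets one logical attribute.

    Rows that share a set id are collected into one logical record.
    """
    records: Dict[object, Row] = defaultdict(dict)
    for row in rows:
        records[row[set_id]][row[name]] = row[value]
    return dict(records)


physical = [{"CUST_AGE": 34, "CUST_ID": 7}]
print(direct_assign(physical, {"age": "CUST_AGE"}))  # [{'age': 34}]

transactional = [
    {"id": 1, "attr": "age", "val": 34},
    {"id": 1, "attr": "income", "val": 52000},
    {"id": 2, "attr": "age", "val": 41},
]
print(pivot_assign(transactional, "id", "attr", "val"))
# {1: {'age': 34, 'income': 52000}, 2: {'age': 41}}
```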

   March 2003 OMG-CWM, v1.1: Organization of the Data Mining Metamodel

   Figure 12-9 CWM Data Mining Metamodel: Attribute Usage

   Figure 12-9 illustrates the metaclasses that specify how a mining attribute should be used, interpreted, or preprocessed (e.g., missing value or outlier/invalid value treatment).
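   As a rough sketch of usage-driven preprocessing, the Python code below assumes each attribute usage carries a usage option (active, supplementary, or inactive) plus a missing-value replacement and an outlier clipping range. The enumeration values and both treatment fields are illustrative assumptions; the metamodel defines the options actually available.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class UsageOption(Enum):
    ACTIVE = "active"                # used when building the model
    SUPPLEMENTARY = "supplementary"  # carried along but not used for building
    INACTIVE = "inactive"            # ignored


@dataclass
class AttributeUsage:
    attribute: str
    usage: UsageOption = UsageOption.ACTIVE
    missing_replacement: Optional[float] = None          # hypothetical missing-value treatment
    clip_range: Optional[Tuple[float, float]] = None     # hypothetical outlier treatment

    def treat(self, value: Optional[float]) -> Optional[float]:
        if value is None:
            return self.missing_replacement
        if self.clip_range is not None:
            low, high = self.clip_range
            value = min(max(value, low), high)
        return value


def prepare(record: Dict[str, Optional[float]],
            usages: List[AttributeUsage]) -> Dict[str, Optional[float]]:
    """Keep only active attributes and apply each one's treatment."""
    return {u.attribute: u.treat(record.get(u.attribute))
            for u in usages if u.usage is UsageOption.ACTIVE}


usages = [
    AttributeUsage("age", missing_replacement=40.0, clip_range=(0.0, 120.0)),
    AttributeUsage("income", clip_range=(0.0, 1e6)),
    AttributeUsage("customer_id", usage=UsageOption.INACTIVE),
]
print(prepare({"age": None, "income": 2e6, "customer_id": 17}, usages))
# {'age': 40.0, 'income': 1000000.0}
```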


   Figure 12-10 CWM Data Mining Metamodel: Category Matrix

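   Figure 12-10 illustrates the metaclasses that generalize the representation of a cost matrix (a model build input) or a confusion matrix (a model test result). Two representations are supported: CategoryMatrixObject, which holds the matrix cells as CategoryMatrixEntry objects, and CategoryMatrixTable, which stores them in a table (a Core Class) identified through its rowAttribute, columnAttribute, and valueAttribute.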
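   Whichever representation is used, a category matrix is conceptually a mapping from (row category, column category) pairs to values. The Python sketch below stores such a matrix sparsely and uses it once as a confusion matrix (counts of actual versus predicted categories) and once as a cost matrix. The class and function names are hypothetical, and the convention that rows hold actual categories and columns hold predicted ones is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Cell = Tuple[str, str]  # (row category, column category)


class CategoryMatrix:
    """Sparse category matrix: cells that are not stored read as 0."""

    def __init__(self, entries: Optional[Dict[Cell, float]] = None):
        self.entries: Dict[Cell, float] = dict(entries or {})

    def get(self, row: str, col: str) -> float:
        return self.entries.get((row, col), 0.0)


def confusion_matrix(pairs: Iterable[Cell]) -> CategoryMatrix:
    """Rows = actual category, columns = predicted category (assumed convention)."""
    return CategoryMatrix(dict(Counter(pairs)))


def accuracy(cm: CategoryMatrix) -> float:
    total = sum(cm.entries.values())
    correct = sum(v for (r, c), v in cm.entries.items() if r == c)
    return correct / total


def total_cost(cm: CategoryMatrix, cost: CategoryMatrix) -> float:
    """Weight each confusion count by the cost of that (actual, predicted) cell."""
    return sum(v * cost.get(r, c) for (r, c), v in cm.entries.items())


results = [("yes", "yes"), ("yes", "no"), ("no", "no"), ("no", "no")]
cm = confusion_matrix(results)
cost = CategoryMatrix({("yes", "no"): 5.0, ("no", "yes"): 1.0})
print(cm.get("no", "no"), accuracy(cm), total_cost(cm, cost))  # 2 0.75 5.0
```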


   Figure 12-11 CWM Data Mining Metamodel: Category Taxonomy

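   Figure 12-11 illustrates the metaclasses that represent a taxonomy as a directed acyclic graph (DAG). Two representations are supported: CategoryMapObject, which holds the parent-child links as CategoryMapObjectEntry objects, and CategoryMapTable, which stores them in a table (a Core Class) identified through attributes such as childAttribute, parentAttribute, and graphIdAttribute.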
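   A taxonomy held as parent/child links can be checked for cycles and queried for a category's ancestors. The Python sketch below shows both operations; it assumes links arrive as (parent, child) pairs, and the function names are hypothetical. Because the structure is a DAG rather than a tree, a category may have several parents.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_parents(links: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each child category to its set of parents."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for parent, child in links:
        parents[child].add(parent)
    return parents


def ancestors(category: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All categories reachable by following parent links upward."""
    seen: Set[str] = set()
    stack: List[str] = list(parents.get(category, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def is_acyclic(parents: Dict[str, Set[str]]) -> bool:
    """The links form a valid DAG if no category is its own ancestor."""
    return all(c not in ancestors(c, parents) for c in list(parents))


links = [("food", "dairy"), ("food", "snacks"), ("dairy", "cheese"), ("snacks", "cheese")]
parents = build_parents(links)
print(sorted(ancestors("cheese", parents)), is_acyclic(parents))
# ['dairy', 'food', 'snacks'] True
```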

   Mining Task

   This package defines the objects that are related to mining tasks. A MiningTask object represents a specific mining operation to be performed on a given data set (i.e., physical data). Figure 12-12 illustrates the basic Mining Task metamodel.

   Figure 12-12 CWM Data Mining Metamodel: Mining Task

   Figure 12-12 illustrates MiningTask as referenced by a MiningTransformation. Where applicable (for example, in lift and test tasks), a MiningTask maps physical data to a model signature using an AttributeAssignmentSet.
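   Before running such a task, a tool would plausibly check that the assignment set covers the model signature and refers only to columns that exist in the physical data. The Python sketch below assumes the signature is a list of attribute names and the assignment set is a mapping from logical (signature) attribute to physical column; the `check` method and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MiningTask:
    signature: List[str]          # attribute names required by the input model
    physical_columns: List[str]   # columns available in the input PhysicalData
    assignment: Dict[str, str]    # logical attribute -> physical column (assumed shape)

    def check(self) -> Dict[str, List[str]]:
        """Report unassigned signature attributes and assignments to absent columns."""
        unassigned = [a for a in self.signature if a not in self.assignment]
        dangling = [col for col in self.assignment.values()
                    if col not in self.physical_columns]
        return {"unassigned": unassigned, "dangling": dangling}


task = MiningTask(
    signature=["age", "income", "plan"],
    physical_columns=["AGE", "INCOME", "REGION"],
    assignment={"age": "AGE", "income": "INCOME_USD"},
)
print(task.check())  # {'unassigned': ['plan'], 'dangling': ['INCOME_USD']}
```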

   Figure 12-13 CWM Data Mining Metamodel: Mining Build Task

   The model elements comprising the mining build task are shown in Figure 12-13. Figure 12-14 and Figure 12-15 illustrate, respectively, how the output of an apply operation is modeled and how the application of a data mining model to (new) data is modeled.

   Figure 12-14 CWM Data Mining Metamodel: Apply Output

   Figure 12-14 illustrates the metaclasses that define the output content of an apply task (data scoring using a model). This output includes source items, for example keys, and content items that carry the results of the apply, such as scores, probabilities, and rule IDs.

   An apply output may contain multiple source and content items.
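   The Python sketch below shows one way an apply output could be evaluated per input record: source items copy input columns (such as a key) into the output, and content items add values computed by the model. The item classes loosely follow ApplySourceItem, ApplyScoreItem, and ApplyProbabilityItem, but their fields, the choice to treat the score item as the predicted category, and the `toy_model` stand-in are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Prediction = Tuple[str, float]  # (predicted category, probability) from the model


@dataclass
class ApplySourceItem:
    column: str  # input column copied into the output unchanged, e.g. a key

    def value(self, record: Record, prediction: Prediction) -> object:
        return record[self.column]


@dataclass
class ApplyScoreItem:
    def value(self, record: Record, prediction: Prediction) -> object:
        return prediction[0]  # the predicted category


@dataclass
class ApplyProbabilityItem:
    def value(self, record: Record, prediction: Prediction) -> object:
        return prediction[1]  # probability of the predicted category


def run_apply(records: List[Record], model: Callable[[Record], Prediction],
              output: Dict[str, object]) -> List[Record]:
    """Produce one output row per input record, one column per output item."""
    rows = []
    for record in records:
        prediction = model(record)
        rows.append({name: item.value(record, prediction)
                     for name, item in output.items()})
    return rows


def toy_model(record: Record) -> Prediction:
    # Stand-in for a real mining model: short tenure predicts churn.
    return ("churn", 0.8) if record["tenure"] < 12 else ("stay", 0.9)


output = {"id": ApplySourceItem("cust_id"),
          "prediction": ApplyScoreItem(),
          "probability": ApplyProbabilityItem()}
rows = run_apply([{"cust_id": 1, "tenure": 3}, {"cust_id": 2, "tenure": 40}],
                 toy_model, output)
print(rows)
```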

   Figure 12-15 CWM Data Mining Metamodel: Mining Apply Task

   Figure 12-15 illustrates the metaclasses used to specify an apply task. An apply task requires a model, physical data, an apply output, and an attribute assignment set.

   Entry Point

   This package defines the top-level objects of the DataMining package, which can serve as entry points in application programming. These are illustrated in Figure 12-16.

   Figure 12-16 CWM Data Mining Metamodel: Entry Point

   Clustering

   This package contains the metamodel that represents clustering functions, models, and settings. The Clustering metamodel is illustrated in Figure 12-17. It contains the attribute usage and function settings subclasses that are specific to the clustering function.
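   To show how clustering settings of this kind might combine, the Python sketch below compares two records attribute by attribute (absolute difference divided by a similarity scale) and aggregates the per-attribute results with a weighted Euclidean function. The specific comparison and aggregation choices, the `weight` field, and the function names are assumptions; the metamodel's AttributeComparisonFunction and AggregationFunction types define the options actually available.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClusteringAttributeUsage:
    attribute: str
    similarity_scale: float = 1.0  # used here to normalize raw differences
    weight: float = 1.0            # hypothetical per-attribute weight


@dataclass
class ClusteringFunctionSettings:
    max_number_of_clusters: int
    min_cluster_size: int = 1      # default taken from the metamodel


def abs_diff(a: float, b: float, scale: float) -> float:
    """Per-attribute comparison: absolute difference divided by the scale."""
    return abs(a - b) / scale


def weighted_euclidean(diffs: List[float], weights: List[float]) -> float:
    """Aggregation: square root of the weighted sum of squared comparisons."""
    return math.sqrt(sum(w * d * d for d, w in zip(diffs, weights)))


def distance(x: Dict[str, float], y: Dict[str, float],
             usages: List[ClusteringAttributeUsage]) -> float:
    diffs = [abs_diff(x[u.attribute], y[u.attribute], u.similarity_scale)
             for u in usages]
    return weighted_euclidean(diffs, [u.weight for u in usages])


settings = ClusteringFunctionSettings(max_number_of_clusters=5)
usages = [ClusteringAttributeUsage("age", similarity_scale=10.0),
          ClusteringAttributeUsage("income", similarity_scale=10000.0)]
print(distance({"age": 30, "income": 50000}, {"age": 60, "income": 90000}, usages))  # 5.0
```

   Scaling each attribute before aggregation keeps an attribute measured in large units (income) from dominating one measured in small units (age).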


   (ClusteringAttributeUsage specializes AttributeUsage with attributeComparisonFunction : AttributeComparisonFunction, similarityScale : Double, and the derived /comparisonMatrix : CategoryMatrix. ClusteringFunctionSettings specializes MiningFunctionSettings with maxNumberOfClusters : Integer, minClusterSize : Integer = 1, and aggregationFunction : AggregationFunction.)

   Figure 12-17 CWM Data Mining Metamodel: Clustering

   Association Rules

   This package contains the metamodel that represents the constructs for frequent itemset, association rule, and sequence algorithms. The Association Rules metamodel is illustrated in Figure 12-18.
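   Frequent itemsets and association rules are conventionally scored by support and confidence. The Python sketch below computes both over a list of transactions, first removing excluded categories, which is one plausible reading of the `exclusion` association on FrequentItemSetFunctionSettings. The function names and parameters are hypothetical.

```python
from typing import Iterable, List, Set

Transaction = Set[str]


def clean(transactions: Iterable[Transaction], exclusion: Set[str]) -> List[Transaction]:
    """Remove excluded categories (items) from every transaction."""
    return [set(t) - exclusion for t in transactions]


def support(itemset: Set[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: Set[str], consequent: Set[str],
               transactions: List[Transaction]) -> float:
    """Estimate of P(consequent | antecedent) = support(A | C) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


baskets = clean([{"bread", "milk", "bag"}, {"bread", "butter"},
                 {"bread", "milk", "butter"}, {"milk"}], exclusion={"bag"})
print(support({"bread", "milk"}, baskets))       # 0.5
print(confidence({"bread"}, {"milk"}, baskets))  # about 0.667
```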

   Figure 12-18 CWM Data Mining Metamodel: Association Rules

   12.2.2.5 Supervised

   This package contains the metamodel that represents the constructs for supervised learning algorithms. Implementations of the Approximation, Attribute Importance, and Classification packages must also implement this package. Figure 12-19 illustrates the Supervised metamodel. It contains test and lift tasks, test and lift results, and a common superclass for supervised function settings.
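   Lift analysis is usually computed by sorting test records by the model's confidence in the positive target category, cutting the ranking into quantiles, and comparing the positive rate in each cumulative quantile with the overall positive rate. The Python sketch below follows that procedure; the function name, the quantile count, and the (fraction, lift) tuple layout of each point are assumptions, not the fields of LiftAnalysisPoint.

```python
from typing import List, Tuple


def lift_points(scored: List[Tuple[float, str]], positive: str,
                quantiles: int) -> List[Tuple[float, float]]:
    """Return (fraction of records, cumulative lift) for each quantile.

    `scored` holds (probability of the positive category, actual category).
    Assumes len(scored) >= quantiles so that no quantile is empty.
    """
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    n = len(ranked)
    base_rate = sum(1 for _, actual in ranked if actual == positive) / n
    points = []
    for q in range(1, quantiles + 1):
        top = ranked[: round(n * q / quantiles)]
        rate = sum(1 for _, actual in top if actual == positive) / len(top)
        points.append((len(top) / n, rate / base_rate))
    return points


scored_test = [(0.9, "yes"), (0.8, "yes"), (0.7, "no"), (0.6, "yes"),
               (0.4, "no"), (0.3, "no"), (0.2, "no"), (0.1, "no")]
for fraction, lift in lift_points(scored_test, positive="yes", quantiles=4):
    print(f"{fraction:.2f} {lift:.2f}")
```

   A lift above 1 in the early quantiles means the model concentrates positive cases near the top of its ranking; the final cumulative point always has lift 1.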

   Figure 12-19 CWM Data Mining Metamodel: Supervised

   Classification

   This package contains the metamodel that represents the classification function, models, and settings.

   Figure 12-20 CWM Data Mining Metamodel: Classification Function Settings

   Figure 12-20 shows the classification function settings, while Figure 12-21 illustrates the model elements used to represent attribute usage, which can include a prior probability specification. Figure 12-22 shows the portion of the Classification metamodel that models classification test tasks, test results, and apply output.
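   Prior probabilities are commonly used to correct a classifier's predicted probabilities when the class distribution in the build data differs from the distribution expected in use. The Python sketch below first checks that a set of (target value, prior) entries forms a probability distribution, then applies the standard Bayes reweighting p'(c|x) proportional to p(c|x) * prior(c) / build_rate(c). This correction is a swapped-in illustration: the metamodel records the priors, and how a given tool uses them is an assumption here, as are the function names.

```python
import math
from typing import Dict

Priors = Dict[str, float]  # target value -> prior probability


def validate_priors(priors: Priors) -> None:
    """Each prior must lie in [0, 1] and the priors must sum to 1."""
    if any(not 0.0 <= p <= 1.0 for p in priors.values()):
        raise ValueError("prior outside [0, 1]")
    if not math.isclose(sum(priors.values()), 1.0):
        raise ValueError("priors do not sum to 1")


def adjust(posterior: Dict[str, float], priors: Priors,
           build_rates: Priors) -> Dict[str, float]:
    """Reweight posteriors by prior / build-data rate, then renormalize."""
    validate_priors(priors)
    raw = {c: p * priors[c] / build_rates[c] for c, p in posterior.items()}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()}


# Model built on balanced data, but fraud is expected in only 10% of cases.
adjusted = adjust({"fraud": 0.6, "ok": 0.4},
                  priors={"fraud": 0.1, "ok": 0.9},
                  build_rates={"fraud": 0.5, "ok": 0.5})
print(adjusted)
```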

   Figure 12-21 CWM Data Mining Metamodel: Classification Attribute Usage

   Figure 12-22 CWM Data Mining Metamodel: Classification Test and Result

   Approximation

   This package contains the metamodel that represents the constructs for approximation modeling (also known as regression). The metamodel is shown in Figure 12-23.
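   An approximation test compares predicted and actual numeric target values. The Python sketch below computes three widely used error summaries: mean absolute error, root mean squared error, and R squared. Which measures an approximation test result actually carries is not shown in this section, so the metric choice, the dictionary keys, and the function name are assumptions.

```python
import math
from typing import Dict, Sequence


def approximation_test(actual: Sequence[float],
                       predicted: Sequence[float]) -> Dict[str, float]:
    """Summarize regression errors for paired actual/predicted values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                      # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)     # total sum of squares
    return {
        "mean_absolute_error": sum(abs(e) for e in errors) / n,
        "rms_error": math.sqrt(ss_res / n),
        "r_squared": 1.0 - ss_res / ss_tot,
    }


metrics = approximation_test([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(metrics)
```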


   Figure 12-23 CWM Data Mining Metamodel: Approximation

   Attribute Importance

   This package contains the metamodel that represents the constructs for attribute importance models (attribute importance is also known as feature selection). This metamodel is illustrated in Figure 12-24.
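   An attribute importance model ranks input attributes by how much they contribute to predicting the target. As a deliberately simple stand-in for whatever algorithm a tool actually uses, the Python sketch below scores numeric attributes by the absolute Pearson correlation with a numeric target and ranks them. The scoring measure and the function names are assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_attributes(columns: Dict[str, Sequence[float]],
                    target: Sequence[float]) -> List[Tuple[str, float]]:
    """Return (attribute, importance) pairs, most important first."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


target = [1.0, 2.0, 3.0, 4.0]
columns = {"noise": [2.0, 1.0, 2.0, 1.0], "signal": [2.0, 4.0, 6.0, 8.0]}
ranked = rank_attributes(columns, target)
print(ranked)
```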

   Figure 12-24 CWM Data Mining Metamodel: Attribute Importance