Creating rules for data analysis

You can define and run data rules, which evaluate or validate specific conditions associated with your data sources. Data rules can be used to extend your data profiling analysis, to test and evaluate data quality, or to improve your understanding of data integration requirements.

To work with data rules, click the Develop icon on the Navigator menu in the console and select Data Quality. The Data Quality workspace is the starting point for creating and working with data rules.

From the Data Quality workspace you can:

  • Create data rule definitions, rule set definitions, data rules, rule sets, and metrics
  • Build data rule definition, rule set definition, and metric logic
  • Create data rule definition and rule set definition associations
  • Associate a data rule definition, rule set definition, metric, data rule, or rule set with folders
  • Associate a data rule definition, rule set definition, metric, data rule, or rule set with IBM® InfoSphere™ Business Glossary terms, policies, and contacts
  • Build data rule definitions or rule set definitions by using the rule builder
  • Add a data rule definition with the free form editor

Characteristics of data rules

You can use data rules to evaluate and analyze conditions found during data profiling, to conduct a data quality assessment, to provide more information to a data integration effort, or to establish a framework for validating and measuring data quality over time.

Data rules are generated from data rule definitions, which describe the rule evaluation or condition. When you associate physical data sources with a definition, you can generate a data rule and run it to return analysis statistics and detailed results. The process of creating a data rule definition and generating the subsequent data rule is shown in the following figure:

Figure 1. Process of creating and running a data rule definition: create a data rule definition, generate a data rule, run the data rule, and view the output.
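To make the relationship between a definition and a data rule concrete, here is a minimal Python sketch. It is a hypothetical model only, not an InfoSphere Information Analyzer API: `RuleDefinition`, `DataRule`, the column names `ORDER_TOTAL` and `INV_AMT`, and the in-memory rows are all invented for illustration. The definition states a condition over a logical variable, and each data rule binds that variable to a physical column, so one definition is reused against two different sources.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence

# Hypothetical row type: one record from a physical data source.
Record = Mapping[str, object]


@dataclass(frozen=True)
class RuleDefinition:
    """Hypothetical logical rule: a condition over named variables, not tied to any source."""
    name: str
    variables: tuple[str, ...]
    condition: Callable[..., bool]


@dataclass(frozen=True)
class DataRule:
    """Hypothetical data rule: a definition plus a binding of each variable to a physical column."""
    definition: RuleDefinition
    bindings: Mapping[str, str]  # logical variable -> physical column name

    def run(self, rows: Sequence[Record]) -> dict[str, int]:
        # Count how many records meet the condition and how many do not.
        met = 0
        for row in rows:
            args = {var: row[self.bindings[var]] for var in self.definition.variables}
            if self.definition.condition(**args):
                met += 1
        return {"total": len(rows), "met": met, "not_met": len(rows) - met}


# One reusable definition...
positive_amount = RuleDefinition(
    "positive_amount",
    ("amount",),
    lambda amount: amount is not None and amount > 0,
)

# ...bound to two different (hypothetical) sources.
orders = DataRule(positive_amount, {"amount": "ORDER_TOTAL"})
invoices = DataRule(positive_amount, {"amount": "INV_AMT"})

order_rows = [{"ORDER_TOTAL": 10}, {"ORDER_TOTAL": -5}, {"ORDER_TOTAL": None}]
invoice_rows = [{"INV_AMT": 3}, {"INV_AMT": 7}]

print(orders.run(order_rows))      # {'total': 3, 'met': 1, 'not_met': 2}
print(invoices.run(invoice_rows))  # {'total': 2, 'met': 2, 'not_met': 0}
```

Keeping the condition separate from the column bindings is what lets the same definition be applied to many sources without rewriting its logic.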

IBM InfoSphere Information Analyzer data rules include the following characteristics:

Reusable
The definitions are not specific to one data source; you can apply them to many data sources.
Quickly evaluated
You can test them interactively as you create them to ensure that they deliver the expected information.
Produce flexible output
Data rules can produce a variety of statistics and results at both the summary and detail levels. You can evaluate data either positively (how many records meet your condition) or negatively (how many records violate your condition), and you can control which details are displayed so that you can investigate specific issues.
Historical
They capture and retain execution results over time, allowing you to view, monitor, and annotate trends.
Managed
Each data rule has a defined state, such as draft or accepted, so that you can identify the status of each rule.
Categorical
You can organize data rules within relevant categories and folders.
Deployable
You can export data rules and transfer them to another environment, such as a production environment.
Audit functionality
You can identify specific events associated with a rule, such as who last modified the rule and when.
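The flexible-output and historical characteristics can be pictured with another small Python sketch. Again, this is a hypothetical model rather than the product's result store: `RunResult`, `RuleHistory`, the dates, and the record counts are invented for illustration. Each run keeps both the positive count (records that meet the condition) and the negative count (records that violate it), and the retained runs form a trend over time.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class RunResult:
    """Hypothetical summary of one execution of a data rule."""
    run_date: date
    total: int
    met: int

    @property
    def not_met(self) -> int:
        # Negative evaluation: records that violate the condition.
        return self.total - self.met

    def met_pct(self) -> float:
        # Positive evaluation as a percentage; 0.0 for an empty run.
        return 100.0 * self.met / self.total if self.total else 0.0


@dataclass
class RuleHistory:
    """Hypothetical store that retains run results for one rule over time."""
    rule_name: str
    runs: list[RunResult] = field(default_factory=list)

    def record(self, result: RunResult) -> None:
        self.runs.append(result)

    def trend(self) -> list[tuple[date, float]]:
        # Chronological (date, percent met) pairs for monitoring.
        ordered = sorted(self.runs, key=lambda r: r.run_date)
        return [(r.run_date, round(r.met_pct(), 1)) for r in ordered]


history = RuleHistory("positive_amount")
history.record(RunResult(date(2024, 1, 1), total=200, met=180))
history.record(RunResult(date(2024, 2, 1), total=250, met=240))

print(history.trend())         # [(datetime.date(2024, 1, 1), 90.0), (datetime.date(2024, 2, 1), 96.0)]
print(history.runs[-1].not_met)  # 10
```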
