/DQ
Information WareHouse Administration Toolkit
/DQ is the data quality control module. Experience teaches, the hard way, that the data a Data Warehouse has to process is far from perfect. Source data can be delivered incomplete, scrambled, or late. References between tables can be incorrect, either because they are already wrong in the source system or because they are mixed up during the extraction process. Yet customer confidence stands or falls with the quality of the data in the Data Warehouse. Monitoring and safeguarding the health of the data is a continuous process and must be an integral part of daily processing in the Data Warehouse.
The quality of data also changes over time. Data that is ‘healthy’ in one period may become bad in the future, and vice versa. A fixed, one-off quality control solution is therefore not the answer. A Data Warehouse needs a solution that is adaptable and flexible in use.

It is the opinion of Marobi that a standard framework for data quality control should be in place in any Data Warehouse environment: a system that allows quality control to be configured independently of development. The framework must be built into all processing code, with a mechanism that allows configuration afterwards, during production. Such a system, in combination with a framework, has been realized.

The solution revolves around a validation engine in combination with validation rules. Data Warehouse processes trigger the execution of a particular set of validation rules, but they do not define or control those rules. That is the job of the separate validation engine. Data Warehouse processes and the engine communicate by exchanging textual identifiers and the result codes of validation rule executions.

The validation engine maintains all relevant information in a central repository and communicates with the ‘outside’ world through a very simple interface. Each validation request is uniquely identified by a textual identifier, which designates a group of rule definitions. The rule definitions are ordered within their group. When requested, the engine executes each rule in that order. When a rule fails, the engine aborts the execution and returns a failure code to the requestor. When all rules succeed, the engine returns success.
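The request/response behaviour described above can be illustrated with a minimal Python sketch. It is not the /DQ implementation: the names `ValidationEngine`, `Rule`, `register`, `validate`, the result codes `SUCCESS` and `FAILURE`, and the identifier `LOAD_CUSTOMER` are all hypothetical, and an in-memory dictionary stands in for the central repository.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical result codes; the source does not name the actual values.
SUCCESS = 0
FAILURE = 1


@dataclass
class Rule:
    name: str                    # label for the rule (assumed)
    order: int                   # position of the rule within its group
    check: Callable[[], bool]    # returns True when the rule passes


@dataclass
class ValidationEngine:
    # Stand-in for the central repository: textual identifier -> ordered rules.
    repository: Dict[str, List[Rule]] = field(default_factory=dict)
    # Names of the rules run by the most recent request, kept for inspection.
    executed: List[str] = field(default_factory=list)

    def register(self, group_id: str, rule: Rule) -> None:
        """Add a rule to a group and keep the group sorted by rule order."""
        rules = self.repository.setdefault(group_id, [])
        rules.append(rule)
        rules.sort(key=lambda r: r.order)

    def validate(self, group_id: str) -> int:
        """Run the group's rules in order and abort at the first failure.

        Assumption: an unknown identifier has no rules and returns SUCCESS.
        """
        self.executed.clear()
        for rule in self.repository.get(group_id, []):
            self.executed.append(rule.name)
            if not rule.check():
                return FAILURE
        return SUCCESS


# A Data Warehouse process only knows the textual identifier and the result code.
rows = [{"customer_id": 1}, {"customer_id": None}]
engine = ValidationEngine()
engine.register("LOAD_CUSTOMER", Rule("never_reached", 3, lambda: True))
engine.register("LOAD_CUSTOMER", Rule("not_empty", 1, lambda: len(rows) > 0))
engine.register(
    "LOAD_CUSTOMER",
    Rule("ids_present", 2, lambda: all(r["customer_id"] is not None for r in rows)),
)
result = engine.validate("LOAD_CUSTOMER")
```

In this sketch the rules are defined and changed only through the engine, so the calling process needs no code change when a rule is added, removed, or reordered. That is the separation between processing code and quality configuration that the text argues for.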
