Rule-Driven Data Quality Framework for a Multi-Source Azure Lakehouse

Challenges
Fragmented validation and reliance on the legacy Ataccama tool led to inconsistent, unreliable data across lakehouse layers, directly impacting reporting accuracy and timelines. The absence of pipeline-integrated quality checks meant issues were detected too late, forcing heavy manual reconciliation and reducing trust in reporting outputs.
Outcome
The implementation of a rule-driven framework using Soda improved data accuracy from ~60% to ~85% and identified ~150 data quality failures within the first 3 months. Early detection saved ~1,700 hours of manual effort.
Solution
M&S's multi-source reporting platform, built on an Azure lakehouse architecture, faced severe data quality issues caused by inconsistent validation across data layers.
Cloudaeon's data quality experts implemented a rule-driven data quality framework using Soda, replacing the legacy Ataccama tool. The new framework applied validation checkpoints across the bronze, silver, gold and platinum layers, improving reporting data accuracy, reducing incidents and significantly lowering manual reconciliation effort.
Client Problem
M&S operated a reporting platform built on Azure, where data from multiple source systems was ingested into a lakehouse environment and transformed across all layers before being used for reporting purposes.
Pain Points
Data ingestion and processing worked as expected; the real challenge was trust. Data quality validations were not applied consistently across layers, which resulted in:
Severe reporting issues
Duplicate or inconsistent records
Missing data files
Heavy manual effort by reporting teams
Data incidents impacting reporting timelines
In short, the challenge was not ingestion or processing, but ensuring data remained reliable as it moved across lakehouse layers.
Root Cause Analysis
The key issues identified were:
Multi-source data inconsistency: Data from different source systems arrived with different schemas, formats and patterns, leading to inconsistencies during transformation.
Lack of validation checkpoints between lakehouse layers: Data moved from bronze to silver to gold without structured quality gates.
Reactive data quality approach: Issues were often discovered only after data reached the reporting layers.
Manual reconciliation effort: Reporting teams spent significant time validating and fixing data issues.
Tool-based rather than pipeline-integrated quality checks: Quality checks existed, but they were not integrated into the lakehouse pipelines.
Initially, M&S maintained data quality with Ataccama, a legacy data quality tool. It could not meet the platform's data quality requirements because it executed outside the lakehouse transformation flow and was not integrated into layer-wise data processing. This made the approach reactive: issues were identified only after data had already reached the reporting datasets.
Solution Architecture
Cloudaeon’s solution introduced rule-driven data quality checks integrated directly into the Azure lakehouse architecture. Each transition between layers became a data quality validation checkpoint; a sketch of one such checkpoint follows the layer list below.
Validation by Layer:
Ingestion Layer
Schema validation
Mandatory field checks
Record integrity checks
Transformation Layers (Bronze → Silver → Gold)
Data standardisation checks
Referential integrity checks
Business rule validations
Refined Layer (Gold)
Reporting readiness checks
Completeness checks
Cross-dataset consistency checks
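For illustration, a checkpoint of this kind can be expressed in Soda's check language (SodaCL) and executed with the soda-core Python library. The sketch below is a minimal example under stated assumptions, not the production rule set: the data source name (silver), the table and column names (sales_orders, order_id, store_id, order_date) and the configuration file path are all hypothetical.

```python
# Minimal sketch of a layer-transition quality checkpoint using soda-core.
# All table, column and file names are illustrative, not M&S's actual rules.
from soda.scan import Scan

# SodaCL rules covering the checkpoint categories listed above:
# schema validation, mandatory field checks and record integrity checks.
CHECKS = """
checks for sales_orders:
  - schema:
      fail:
        when required column missing: [order_id, store_id, order_date]
  - missing_count(order_id) = 0      # mandatory field check
  - missing_count(store_id) = 0
  - duplicate_count(order_id) = 0    # record integrity check
  - row_count > 0                    # the layer must not land empty
"""

scan = Scan()
scan.set_data_source_name("silver")                    # defined in configuration.yml
scan.add_configuration_yaml_file("configuration.yml")
scan.set_scan_definition_name("silver_checkpoint")
scan.add_sodacl_yaml_str(CHECKS)

scan.execute()
scan.assert_no_checks_fail()   # raises if the checkpoint is breached
```

Repeating this pattern at every layer transition is what turns the medallion layer boundaries into enforced quality gates rather than implicit handoffs.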
Business Rule Validation Framework
To implement a robust data quality framework, Cloudaeon experts automated business-rule validations across revenue, transaction, pricing, stock and operational datasets, ensuring reporting accuracy and data reliability. Illustrative SodaCL equivalents of these rules are sketched after the list below.
Consistency checks
Revenue consistency: To flag abnormal daily revenue variance compared to historical trends.
Transaction volume consistency: To detect unexpected drops or spikes in partner transaction volumes.
Uniqueness checks
Transaction record uniqueness: To prevent duplicate transaction records.
Selling price uniqueness: To ensure only one active price per product and site.
Timeliness checks
Retailer file timeliness: To alert when expected retailer files are not received or processed.
Validity checks
Email format validation: To ensure email addresses meet defined format rules.
Stock quantity validation: To flag abnormal stock reductions beyond expected sales movement.
Accuracy checks
In-transit quantity accuracy: To validate in-transit stock against transfer order and goods receipt data.
These business rule checks ensured data quality issues were identified early in the pipeline rather than later at the reporting stage.
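To illustrate how such rules can be encoded, the sketch below maps several of the categories above onto SodaCL check types. It is an assumption-laden example: the dataset and column names (transactions, product_prices, daily_revenue and so on), the active-price filter, and the 7-day/±30% revenue variance window are placeholders, not the rules deployed for M&S.

```python
# Hypothetical SodaCL equivalents of the business rule categories above.
# Dataset/column names and thresholds are illustrative placeholders.
BUSINESS_RULE_CHECKS = """
checks for transactions:
  - duplicate_count(transaction_id) = 0   # transaction record uniqueness
  - invalid_count(customer_email) = 0:    # email format validity
      valid format: email
  - freshness(ingested_at) < 1d           # retailer file timeliness

checks for product_prices:
  - duplicate_count(product_id, site_id) = 0:  # one active price per product/site
      filter: price_status = 'active'

checks for daily_revenue:
  - failed rows:                           # revenue consistency vs recent trend
      name: Daily revenue within 30% of trailing 7-day average
      fail query: |
        -- SQL dialect (Spark-style here) depends on the warehouse
        SELECT revenue_date, total_revenue
        FROM daily_revenue
        WHERE total_revenue NOT BETWEEN
              0.7 * (SELECT AVG(total_revenue) FROM daily_revenue
                     WHERE revenue_date >= date_sub(current_date(), 7))
          AND 1.3 * (SELECT AVG(total_revenue) FROM daily_revenue
                     WHERE revenue_date >= date_sub(current_date(), 7))
"""
# Executed with the same soda-core Scan pattern shown in the earlier sketch.
```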
Ataccama (legacy) | Soda (new framework)
Tool-based validation | Pipeline-integrated validation
Reactive issue detection | Proactive validation
Checks mostly at the reporting layer | Checks at each lakehouse layer
Manual reconciliation | Automated rule enforcement
Low trust in reporting data | Trusted reporting datasets
How We Delivered
Cloudaeon’s Databricks and data quality experts delivered the entire solution in structured phases:
Mapped end-to-end reporting data flow from source systems to reporting.
Identified data quality failure points across lakehouse layers.
Defined validation rule categories: integrity, business, timeliness, accuracy, consistency, uniqueness and completeness.
Implemented Soda checks across ingestion and transformation pipelines (see the gating sketch after this list).
Integrated data quality checks with governance and lineage.
Established monitoring and incident tracking.
Transitioned the solution into POD ownership and managed operations.
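As a concrete illustration of step 4, one common pattern is to run the Soda scan as a gating task between layer transformations, so that a failed check blocks promotion to the next layer. This is a sketch assuming soda-core's Python API; the function name, file names and data source names are hypothetical.

```python
# Sketch of a pipeline-integrated quality gate: the scan runs as a task
# between layer transformations, and a failure halts promotion.
# File and data source names are illustrative.
from soda.scan import Scan

def quality_gate(data_source: str, checks_file: str, scan_name: str) -> None:
    """Run a Soda scan and raise if any check fails, stopping the pipeline."""
    scan = Scan()
    scan.set_data_source_name(data_source)
    scan.add_configuration_yaml_file("configuration.yml")
    scan.set_scan_definition_name(scan_name)
    scan.add_sodacl_yaml_files(checks_file)

    scan.execute()
    print(scan.get_logs_text())   # surfaced to monitoring / incident tracking
    if scan.has_check_fails():
        # Failing fast here is what makes the approach proactive:
        # bad data never reaches the next layer or the reporting datasets.
        raise RuntimeError(f"Data quality gate '{scan_name}' failed")

# Example: gate the silver -> gold promotion before gold transformations run.
quality_gate("gold", "gold_checks.yml", "gold_promotion_gate")
```

In an orchestrator such as Azure Data Factory or Databricks Workflows, the raised exception fails the task, which is what feeds the monitoring and incident tracking described in step 6.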
Technology Stack
Microsoft Azure (Cloud Platform)
Azure Lakehouse Architecture
Bronze / Silver / Gold / Platinum Data Layers
Soda (rule-driven data quality checks)
Ataccama (legacy data quality tool, replaced by Soda)
Microsoft Purview (Governance, Lineage, Monitoring)
Outcomes
Cloudaeon's implementation of a rule-driven data quality framework delivered measurable improvements in data accuracy and incident reduction, resulting in:
~150 data quality failures identified within the first 3 months
5 Sev2 incidents prevented, saving ~40 hours of effort
130 Sev3 incidents detected early, saving ~1,500 hours
15 Sev4 incidents identified, saving ~230 hours
Data accuracy improved from ~60% to ~85%
Significant reduction in manual reconciliation and reporting issues
Cloudaeon’s data quality framework automated validation and rule checks within the data platform itself, improving reporting trust and platform reliability.
POD & Managed Operations Transition
Cloudaeon’s engagement with M&S followed a structured transition model:
Solution → POD → Managed Operations
The solution phase implemented the data quality framework.
The POD team took ownership of rule updates, pipeline tuning and new data source onboarding.
Managed Operations provided SLA-based monitoring, incident response and continuous improvement.
Conclusion
Cloudaeon’s implementation of a rule-driven data quality framework, and the transition from Ataccama to Soda, transformed M&S’s reporting environment from a reactive setup into a controlled and reliable system. By introducing validation checkpoints across lakehouse layers and automating business rule checks, data accuracy improved significantly. The implementation also established a scalable data quality operating model to support future data growth.
