
Databricks Supply-Chain Intelligence Modernisation

Challenges

Logtek operates a large-scale RTP network for major retailers, where operations rely on accurate forecasting and strict SLA adherence. During peak periods, even minor data delays could disrupt supply-chain decisions.

Outcome

£6K–£7K per month was saved by eliminating third-party tools and improving compute and storage efficiency.

Solution

Databricks Modernisation


A retail transit-packaging provider exited a high-cost, batch-constrained Snowflake environment and rebuilt its supply-chain analytics platform on Databricks, ADLS and CDC ingestion. The rebuild delivered near real-time visibility, governed Gold-layer BI, a ~60% reduction in storage costs and £6K–£7K in monthly savings, while improving operational decision-making.


Client Problem


Logtek operates a large-scale returnable transit packaging (RTP) network serving major retailers. Day-to-day operations depend on accurate forecasting, tight SLA adherence and precise coordination across a complex logistics ecosystem, where even small data delays can cascade into physical supply-chain disruption. At peak trading times these risks intensify: the margin for error narrows, and operational decisions must be made on current data.

Technical Pain Points


Over time, the existing analytics platform began to work against operations:


Data freshness limitations: Data synchronisation operated on fixed, batch-style schedules. This imposed unavoidable latency on operational reporting, producing stale views during periods of volatility.


Rising cost and scalability pressure: As data volumes and query complexity grew, long-running workloads drove real-time analytics costs sharply upward, and the platform became increasingly uneconomical to scale.


Pipeline and tooling complexity: Legacy ETL patterns and heavy dependence on third-party tools increased operational complexity, slowing the delivery of new capabilities and making the platform harder to evolve safely.


Operational impact: During peak demand windows, minor inaccuracies in data propagated quickly into delivery and collection planning errors, resulting in store-level shortages, degraded customer experience and SLA exposure. The issue was not visibility alone, but trust in the timeliness and reliability of that visibility.


Root Cause Analysis


For a long time, these challenges were treated as a dashboard problem. In reality, they were an operating-model failure rooted in architecture and workload economics.


Architecture misaligned to operational reality: The logistics network required incremental change capture and very low-latency updates; batch synchronisation introduced delays precisely when fast response mattered most.


Cost drivers embedded in the workload shape: Rapid data growth, inefficient query patterns and long-running compute jobs pushed costs up, while third-party dependencies added both spend and fragility.


Governance as a hidden scaling constraint: As more teams consumed shared datasets, the lack of a central governance control plane made secure self-service BI harder to manage. Without consistent access controls and auditability, scale became risky. Unity Catalog was introduced to resolve this structural gap.

Solution Architecture


The target architecture was engineered to deliver near real-time operational data while resetting storage and compute costs.


Source systems → CDC ingestion


Change Data Capture (CDC) was implemented using Fivetran to move only inserts, updates and deletes, thereby eliminating repeated full reloads and dramatically improving both efficiency and freshness.
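To make the pattern concrete, the sketch below shows how CDC rows can be folded into a Delta table with a single MERGE. It is a minimal illustration, not the production pipeline: the path, the join key and the _fivetran_deleted flag are assumptions about the landing format.

```python
# Illustrative only: folding CDC rows (inserts, updates, deletes) into a
# Delta table with one MERGE. Path, key and the _fivetran_deleted flag
# are assumptions about the landing format.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

changes = spark.read.format("delta").load(
    "abfss://bronze@lake.dfs.core.windows.net/orders_changes"  # hypothetical path
)

orders = DeltaTable.forName(spark, "silver.orders")  # hypothetical target table

(orders.alias("t")
    .merge(changes.alias("c"), "t.order_id = c.order_id")
    .whenMatchedDelete(condition="c._fivetran_deleted = true")  # propagate deletes
    .whenMatchedUpdateAll()                                     # apply updates
    .whenNotMatchedInsertAll()                                  # apply inserts
    .execute())
```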


Landing and storage layer (ADLS + Medallion)


Azure Data Lake Storage became the core storage foundation, aligned to a Medallion Architecture (Bronze, Silver, Gold). This provided pay-per-use economics, reduced storage costs and a clear separation between raw ingestion and curated consumption layers.
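For readers new to the pattern, a minimal sketch of a Bronze-to-Silver promotion follows. The container, table and column names, and the quality rules, are assumptions for illustration.

```python
# Illustrative Bronze-to-Silver promotion. Container, table and column
# names, and the quality rules, are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

bronze = spark.read.format("delta").load(
    "abfss://bronze@lake.dfs.core.windows.net/rtp_movements"
)

silver = (bronze
    .dropDuplicates(["movement_id"])       # CDC replays can duplicate rows
    .filter("quantity IS NOT NULL"))       # basic quality gate

(silver.write.format("delta")
    .mode("overwrite")
    .save("abfss://silver@lake.dfs.core.windows.net/rtp_movements"))
```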


Lakehouse compute and transformation (Databricks + Delta)


Databricks was selected for its Spark-native performance and ability to unify batch and streaming workloads. Delta Lake enabled reliable, low-latency updates, while autoscaling and Photon were positioned as performance levers as workloads evolved.
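One practical consequence of that unification is that the same Delta table can be consumed as a batch source or a streaming source without changing the storage layer. The sketch below assumes hypothetical table and checkpoint names.

```python
# Illustrative: one Silver Delta table consumed as both a batch source
# and a streaming source. Table and checkpoint names are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Batch read, e.g. for a scheduled full rebuild of an aggregate.
batch_df = spark.read.table("silver.rtp_movements")

# Incremental streaming read over the same table. availableNow processes
# only what is new since the last checkpoint, then stops, which suits
# frequent low-latency micro-batch runs.
(spark.readStream.table("silver.rtp_movements")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/checkpoints/rtp_feed")  # hypothetical
    .trigger(availableNow=True)
    .toTable("gold.rtp_movement_feed"))
```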


Governance and access control (Unity Catalog)


Unity Catalog established a centralised governance plane, providing consistent permissions, policy enforcement and audit controls across curated datasets.
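As an illustration of what that control plane looks like in practice, the sketch below grants a BI group read-only access to Gold objects while engineering retains write access lower down. The catalog, schema, table and group names are assumptions.

```python
# Illustrative Unity Catalog grants, issued here via spark.sql. The
# catalog, schema, table and group names are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Business users see only curated Gold objects.
spark.sql("GRANT USE CATALOG ON CATALOG supply_chain TO `bi_users`")
spark.sql("GRANT USE SCHEMA ON SCHEMA supply_chain.gold TO `bi_users`")
spark.sql("GRANT SELECT ON TABLE supply_chain.gold.delivery_kpis TO `bi_users`")

# Engineering retains write access to the lower layers.
spark.sql("GRANT ALL PRIVILEGES ON SCHEMA supply_chain.silver TO `data_engineers`")
```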


Consumption (Power BI on Gold Layer)


Power BI was integrated exclusively against the Gold layer, ensuring that business users consumed refined and performance-optimised datasets rather than raw operational tables.
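A hypothetical example of the kind of Gold-layer table Power BI would connect to, pre-aggregated so reports never scan raw operational rows. All names are illustrative.

```python
# Illustrative Gold-layer build: a pre-aggregated, BI-ready table so
# Power BI never scans raw operational rows. All names are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

daily_kpis = (spark.read.table("silver.rtp_movements")
    .groupBy("store_id", F.to_date("moved_at").alias("movement_date"))
    .agg(
        F.sum("quantity").alias("units_moved"),
        F.countDistinct("delivery_id").alias("deliveries"),
    ))

(daily_kpis.write.format("delta")
    .mode("overwrite")
    .saveAsTable("gold.store_daily_kpis"))
```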


[DIAGRAM PLACEHOLDER]


Before vs After Architecture – components, data flow, governance layer, operations layer


How We Delivered (Step-by-Step Engineering)


Platform changes


Cloudaeon's Databricks experts re-architected the platform around Databricks, explicitly avoiding a lift-and-shift replication of legacy constraints, and established ADLS as the scalable landing and storage layer, structured around Medallion patterns.


Tooling decisions


Implemented Fivetran CDC to enable near real-time operational reporting without repeated full loads. Introduced Unity Catalog to provide consistent access control, governance and auditability from day one.


Automation introduced


Built a Snowflake → Databricks migration accelerator to automate the migration of tables, views and stored procedures. An estimated manual migration effort exceeding 5,000 hours was reduced by approximately 90% with automation.
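The accelerator itself is proprietary, but the toy sketch below illustrates the kind of rule-based DDL rewriting such tooling automates, here reduced to a few Snowflake-to-Databricks type mappings.

```python
# Toy illustration of the kind of rewrite such an accelerator automates:
# mapping Snowflake-specific DDL constructs to Databricks SQL. The real
# tool handled tables, views and stored procedures; this handles a few
# type mappings only.
import re

TYPE_MAP = {
    r"\bNUMBER\((\d+),\s*(\d+)\)": r"DECIMAL(\1,\2)",
    r"\bVARIANT\b": "STRING",           # simplification for illustration
    r"\bTIMESTAMP_NTZ\b": "TIMESTAMP",
}

def translate_ddl(snowflake_ddl: str) -> str:
    """Rewrite a Snowflake CREATE TABLE statement into Databricks SQL."""
    ddl = snowflake_ddl
    for pattern, replacement in TYPE_MAP.items():
        ddl = re.sub(pattern, replacement, ddl, flags=re.IGNORECASE)
    return ddl

print(translate_ddl(
    "CREATE TABLE orders (id NUMBER(10,0), payload VARIANT, ts TIMESTAMP_NTZ)"
))
# CREATE TABLE orders (id DECIMAL(10,0), payload STRING, ts TIMESTAMP)
```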


Developed additional accelerators for:


  • Snowflake → ADLS historical data migration
  • Ongoing SQL Server change processing into ADLS


Testing & validation


Migrated objects and reporting outputs were validated through iterative source-to-target comparison runs. Operational refresh cycles were stabilised before business-facing reporting was enabled, ensuring confidence in both correctness and cadence.
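A simplified sketch of one such comparison run follows, using row counts plus a content hash so value drift is caught as well as volume drift. Table names are assumptions.

```python
# Sketch of one source-to-target comparison: row count plus a content
# hash, so value drift is caught as well as volume drift. Table names
# are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

def table_fingerprint(table: str):
    """Return (row count, aggregate content hash) for a table."""
    df = spark.read.table(table)
    return df.select(
        F.count(F.lit(1)).alias("rows"),
        F.sum(F.xxhash64(*df.columns)).alias("content_hash"),
    ).first()

source = table_fingerprint("snowflake_mirror.orders")  # hypothetical staging copy
target = table_fingerprint("silver.orders")

assert source == target, f"Source/target mismatch: {source} vs {target}"
```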

Technology Stack


  • Databricks (Apache Spark) 

  • Delta Lake (ACID, low-latency updates), Photon (performance lever) 

  • Azure Data Lake Storage (ADLS) 

  • Fivetran Change Data Capture (CDC) 

  • Unity Catalog 

  • Power BI (Gold-layer reporting) 

  • Legacy/source context: Snowflake, SQL Server 

Outcomes


Cost reduction


  • £6K–£7K per month saved through elimination of third-party tools and improved compute and storage efficiency.

  • ~60% reduction in storage costs driven by ADLS pay-per-use economics.


Efficiency gains

Data synchronisation cycles were reduced from 24 runs per day to 6, thereby improving processing efficiency and server utilisation.


Decisioning improvements

2× improvement in decision-making efficiency, enabled by faster, more accurate Power BI reporting on near real-time data.


Governance uplift

Unity Catalog delivered clearer access management, stronger audit controls, and enforceable data policies across consumers.


POD & Managed Ops Transition

Delivery was deliberately structured to prevent regression, whether through freshness drift, cost creep or pipeline brittleness.


Solution → POD

A dedicated engineering POD assumed responsibility for ingestion reliability, Medallion layer integrity, governance changes and BI performance guardrails.


POD → Managed Ops

The platform transitioned into SLA-driven operations covering job health, CDC lag monitoring, cost controls and access and audit compliance, treating the lakehouse as an always-on operational service rather than a completed project.
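As a flavour of what such monitoring can look like, the sketch below flags a table whose last Delta write is older than an assumed four-hour freshness SLA. The table name and threshold are illustrative, not the production checks.

```python
# Sketch of a freshness check: alert when the last write to a Delta
# table is older than an assumed 4-hour SLA. Names and threshold are
# illustrative.
import datetime
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

FRESHNESS_SLA = datetime.timedelta(hours=4)

# Most recent commit timestamp from the Delta transaction log.
last_write = (spark.sql("DESCRIBE HISTORY silver.orders LIMIT 1")
              .select("timestamp")
              .first()[0])

lag = datetime.datetime.now() - last_write
if lag > FRESHNESS_SLA:
    print(f"ALERT: silver.orders is {lag} behind its freshness SLA")
```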


Conclusion


Many organisations face rising data volumes, escalating platform costs and operational decisions constrained by stale or poorly governed data. Cloudaeon helps teams break that pattern by re-engineering analytics platforms around real workload economics, near real-time data flow and sustainable operating models. From platform modernisation to POD-led delivery and managed operations, we focus on making data platforms reliable, cost-efficient and operationally durable, long after go-live.


If your supply-chain, operational or analytics platform is becoming a bottleneck rather than an enabler, talk to us about how we can help you reset it.

We're ready to help you!

Take the first step with a structured, engineering-led approach.
