
Transforming Data Operations with AI-Powered ETL

M&S unlocked £3.96 million in operational efficiency gains by transforming data operations with AI-powered ETL on Databricks


Introduction 

Marks & Spencer (M&S) is one of the UK’s most iconic retailers, trusted by millions of customers for quality food, clothing and home products. As a data-driven organisation, M&S continually seeks to enhance operational efficiency, improve decision-making and drive innovation across its extensive operations.


Faced with growing data complexity and the need for real-time insights, M&S partnered with Cloudaeon to modernise its ETL (Extract, Transform, Load) and data orchestration processes. Together, they implemented a scalable, AI-powered platform built on Databricks, delivering significant improvements in data speed, quality, operational efficiency, and business agility. 


 

Challenge 

As M&S’s operations expanded, so too did the volume, complexity, and fragmentation of its data. Data originated from diverse systems across multiple business units, including ERP platforms, CRM systems, supplier feeds and third-party sources.


This created three major challenges: 


1. Data Management Complexity 

Raw data arrived in a range of formats and sources, creating fragmented silos, inconsistent data quality and escalating operational costs. The lack of unified, governed data pipelines limited the generation of timely, actionable insights. 


2. Manual, Error-Prone Ingestion Processes

Critical sales and operational data, often stored in massive Excel files, was manually uploaded to SharePoint. Ingesting this data into downstream systems involved multiple manual steps, frequent failures and high operational overhead. Manual interventions slowed reporting cycles, reduced trust in outputs and diverted skilled resources away from strategic work. 


3. Lack of Real-Time Data Availability 

In a dynamic retail environment, the ability to respond swiftly to market shifts, customer demands and operational issues is essential. However, the legacy ETL architecture lacked real-time integration capabilities, leaving business teams dependent on delayed and sometimes outdated data. 


Despite having deployed Databricks as a core data platform, M&S had not yet fully leveraged its performance, scalability, or potential for real-time operations. 

 

The objective was clear: Design a future-ready ETL solution that could deliver accurate, scalable, real-time insights with minimal manual intervention. 
 

Solution 

Cloudaeon partnered with M&S to engineer a next-generation ETL and orchestration platform, fully exploiting the capabilities of Databricks, Prophecy and Apache Airflow.


Key elements of the solution included: 


1. Databricks as the Performance Core 

Databricks provided the cloud-native environment for high-speed, scalable, Spark-based data processing, enabling large datasets to be transformed efficiently and consistently.


2. Prophecy for Low-Code ETL Transformation 

Prophecy’s low-code platform enabled rapid, visual development of ETL pipelines on Apache Spark. This dramatically improved reusability, simplified maintenance and accelerated pipeline deployment, whilst maintaining the flexibility needed for complex business logic.


3. Apache Airflow for Orchestration 

Airflow was deployed to orchestrate and automate data ingestion workflows end-to-end. Automated scheduling, dependency management and real-time pipeline monitoring greatly improved operational reliability. 
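As a rough illustration of what dependency management means in practice, the sketch below orders a set of tasks so that each one runs only after its upstream tasks. It uses Python's standard library rather than Airflow's own API, and the task names are hypothetical, not taken from the M&S pipelines.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each key maps to the set of tasks it depends on.
PIPELINE = {
    "land_sharepoint_files": set(),
    "consume_kafka_events": set(),
    "transform_sales": {"land_sharepoint_files", "consume_kafka_events"},
    "validate_sales": {"transform_sales"},
    "refresh_dashboards": {"validate_sales"},
}


def run_order(dependencies):
    """Return one valid order in which every task follows its upstream tasks."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(PIPELINE)
print(order)
```

An orchestrator such as Airflow layers scheduling, retries and monitoring on top of this kind of ordering; the sketch shows only the ordering step.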


4. Streamlined Automation and Monitoring 

Manual data ingestion from SharePoint into Azure Data Lake Storage (ADLS) was fully automated, incorporating audit trails, error handling, notification systems and monitoring dashboards. Kafka was introduced for real-time data ingestion, enabling near-instantaneous data availability for analysis.
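The combination of error handling and audit trails described above could look something like the following minimal sketch. It assumes a caller-supplied copy function standing in for the real SharePoint-to-ADLS transfer; `AuditLog`, `ingest_with_audit`, `flaky_copy` and the file name are hypothetical names used only for illustration.

```python
import time
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditLog:
    """In-memory audit trail; a real system would persist these entries."""
    entries: list = field(default_factory=list)

    def record(self, file_name, status, detail=""):
        self.entries.append({
            "file": file_name,
            "status": status,
            "detail": detail,
            "at": datetime.now(timezone.utc).isoformat(),
        })


def ingest_with_audit(file_name, copy_fn, audit, max_attempts=3, delay_seconds=0.0):
    """Call copy_fn(file_name) up to max_attempts times, logging every attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            copy_fn(file_name)
        except Exception as exc:
            audit.record(file_name, "failed", f"attempt {attempt}: {exc}")
            if attempt < max_attempts:
                time.sleep(delay_seconds)
        else:
            audit.record(file_name, "succeeded", f"attempt {attempt}")
            return True
    return False


# Usage: a stand-in copy function that fails once, then succeeds.
calls = {"count": 0}


def flaky_copy(file_name):
    calls["count"] += 1
    if calls["count"] == 1:
        raise IOError("temporary network error")


audit = AuditLog()
ok = ingest_with_audit("weekly_sales.xlsx", flaky_copy, audit)
print(ok, [entry["status"] for entry in audit.entries])
```

Recording failed attempts as well as successes is what makes the trail useful for notifications and dashboards: a file that succeeded on the third try is visible as a reliability signal rather than hidden.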


5. Real-Time Reporting and Visualisation 

A new analytics layer was established over the transformed data, with Power BI dashboards providing business users with real-time, self-service insights. Role-based access controls maintained data governance and security compliance.


6. CI/CD and Version Control 

GitHub Actions was implemented to automate testing, version control, and deployment processes for ETL pipelines, enhancing development agility and governance. 


Through this combination of cloud-native platforms, AI-powered low-code tooling and intelligent orchestration, M&S was able to rapidly re-engineer its data operations.

Impact 


 


The modernised ETL solution delivered transformational results for M&S: 


1. 62% Reduction in Processing Time 

The time taken to extract, transform and load critical datasets was reduced by 62%, significantly accelerating data availability across business units. 


2. 96% Reduction in Manual Intervention 

Automation eliminated 96% of manual steps in data ingestion and processing. This unlocked £3.96 million in operational efficiency gains, enabling skilled team members to focus on higher-value, strategic data initiatives.


3. 98% Improvement in Data Quality 

Automated cleansing, validation and error detection processes improved data reliability by 98%, strengthening business confidence in reporting and insights.
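To make automated validation concrete, here is a minimal sketch of row-level checks that separate clean records from rejected ones and record why each was rejected. The schema (`store_id`, `sale_date`, `amount`) and the rules are assumptions chosen for illustration, not the checks M&S actually runs.

```python
from datetime import date

# Hypothetical schema for a sales record.
REQUIRED_FIELDS = ("store_id", "sale_date", "amount")


def validate_row(row):
    """Return a list of problems found in one record (empty means valid)."""
    problems = [
        f"missing {name}"
        for name in REQUIRED_FIELDS
        if row.get(name) in (None, "")
    ]
    if problems:
        return problems
    try:
        date.fromisoformat(row["sale_date"])
    except (TypeError, ValueError):
        problems.append("sale_date is not an ISO date")
    try:
        if float(row["amount"]) < 0:
            problems.append("amount is negative")
    except (TypeError, ValueError):
        problems.append("amount is not a number")
    return problems


def split_valid_invalid(rows):
    """Partition rows into valid records and (record, problems) rejections."""
    valid, rejected = [], []
    for row in rows:
        problems = validate_row(row)
        if problems:
            rejected.append((row, problems))
        else:
            valid.append(row)
    return valid, rejected


rows = [
    {"store_id": "S001", "sale_date": "2024-03-01", "amount": "120.50"},
    {"store_id": "", "sale_date": "2024-03-01", "amount": "15.00"},
    {"store_id": "S002", "sale_date": "01/03/2024", "amount": "-4"},
]
valid, rejected = split_valid_invalid(rows)
print(len(valid), [problems for _, problems in rejected])
```

Keeping the rejection reasons alongside each record, rather than silently dropping bad rows, is one way such checks can feed error reporting and build trust in the outputs.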


4. Expanded Business Capacity 

Improved operational efficiency enabled M&S’s central data team to dramatically accelerate delivery cycles. Work that previously took 9 weeks to deliver (such as business-critical data products) can now be completed in just 1 week, driving faster time to value and enabling the business to respond far more rapidly to opportunities.
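For a sense of scale, the move from 9 weeks to 1 week can be expressed as a percentage reduction and a speed-up factor. The short calculation below uses only the two figures quoted above; the function names are illustrative.

```python
def percent_reduction(before, after):
    """Percentage by which a duration shrank, relative to the original."""
    return (before - after) / before * 100


def speedup(before, after):
    """How many times faster the new duration is than the old one."""
    return before / after


weeks_before, weeks_after = 9, 1
print(round(percent_reduction(weeks_before, weeks_after), 1))  # 88.9
print(speedup(weeks_before, weeks_after))  # 9.0
```

In other words, delivery time fell by roughly 89%, which is a ninefold speed-up.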


5. Scalable, Future-Ready Infrastructure

Rather than simply migrating legacy Hadoop platforms to the cloud, M&S took a strategic leap forward, building its award-winning BEAM platform on Microsoft Azure and Databricks. This modern, cloud-native architecture enables real-time insights and seamless scaling, and supports the next generation of AI and advanced analytics initiatives across the business.


 

Conclusion 

Marks & Spencer’s partnership with Cloudaeon has redefined its data operations, moving from fragmented, manual ETL processes to a fully automated, AI-driven, real-time data ecosystem. 


By harnessing the full power of Databricks, Prophecy, and Airflow within its award-winning BEAM platform, M&S has delivered faster insights, superior data quality and dramatically improved operational efficiency, enabling substantial cost savings.


This transformation has freed capacity to focus on strategic data innovation, dramatically increased agility in responding to business needs and laid a resilient, scalable foundation for the future. 

Smarter data, smarter decisions.