
Building Real-time Invoice Data Processing

M&S cuts invoice processing times by 79% by streamlining data pipelines with Prophecy, Kafka and Databricks


Introduction 

Marks & Spencer (M&S), one of the UK’s most established retailers, relies on seamless data operations to support its complex supply chain and financial processes. With thousands of suppliers and daily invoice flows, managing large volumes of semi-structured data at speed and scale was critical to maintaining operational efficiency. 


However, legacy processes involving manual data handling and fragmented workflows created unnecessary delays, complexity and operational costs. 


In partnership with Cloudaeon, M&S re-engineered its invoice data processing pipelines by leveraging a low-code, AI-powered ETL approach, combining Prophecy, Kafka, and Databricks to deliver faster, scalable data ingestion and processing. 


 

Challenge 

M&S faced increasing challenges in efficiently managing and processing invoice data from its supplier ecosystem.  

Key issues included: 


1. Inefficient Multi-Step Processing 

The legacy process involved complex, multi-stage workflows: consuming Kafka topics, extracting file locations, making API calls and manually processing JSON files. This introduced high latency, slowed processing times and increased operational overhead. 
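For illustration, a minimal sketch of this kind of fragmented flow is shown below, assuming a standalone Python consumer; the topic name, endpoints, credentials and file paths are hypothetical, not M&S's actual implementation.

```python
# Hypothetical sketch of the legacy multi-step flow; all names, endpoints and
# paths are illustrative assumptions.
import json

import requests
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "supplier-invoices",                        # assumed topic name
    bootstrap_servers="kafka-broker:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    # Step 1: pull the file location referenced by the Kafka message.
    file_url = message.value["file_location"]

    # Step 2: authenticate and call an API to fetch the invoice JSON.
    token = requests.post(
        "https://auth.example.com/token",
        data={"client_id": "<client-id>", "client_secret": "<client-secret>"},
    ).json()["access_token"]
    invoice = requests.get(file_url, headers={"Authorization": f"Bearer {token}"}).json()

    # Step 3: manually land an intermediate copy for a separate, scheduled Spark job
    # to re-read later; this hand-off is where latency and operational overhead built up.
    with open(f"/tmp/invoices/{invoice['invoice_id']}.json", "w") as f:
        json.dump(invoice, f)
```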


2. High Operational Overhead 

Authentication handling, API interactions and metadata management consumed substantial resources within Databricks environments, adding further complexity and effort to already fragmented ETL processes. 


3. Redundant Data Processing 

The absence of a robust offset tracking mechanism meant previously processed Kafka messages were often re-ingested, wasting resources, duplicating data and reducing data quality. 


4. Skill Dependency 

Maintaining and scaling Spark-based ETL jobs required specialised engineering expertise, making the process resource-intensive, costly and less agile. 


The objective was clear: build an automated, low-code, high-performance ETL pipeline delivering real-time invoice ingestion with minimal manual intervention, improved data quality and scalable performance. 

 

Solution 

Cloudaeon partnered with M&S to design a modern, AI-powered ETL solution focused on automation, real-time ingestion, and operational efficiency. 


The solution architecture included: 


1. Kafka for Real-Time Ingestion 

Kafka was integrated to manage real-time streaming of supplier invoice data. Kafka’s message retention and offset tracking features were leveraged to eliminate redundant processing and ensure scalable, efficient data flows. 
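As a rough sketch of how this works in practice, the PySpark Structured Streaming snippet below (with assumed topic, broker and storage paths) shows how a checkpoint location tracks Kafka offsets so previously processed messages are not re-ingested.

```python
# Minimal PySpark Structured Streaming sketch: Kafka offsets are tracked in the
# checkpoint location, so already-processed messages are never re-ingested.
# Topic, broker and path names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()

invoice_schema = StructType([
    StructField("invoice_id", StringType()),
    StructField("supplier_id", StringType()),
    StructField("file_location", StringType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka-broker:9092")
    .option("subscribe", "supplier-invoices")
    .option("startingOffsets", "earliest")       # only applies on the very first run
    .load()
)

invoices = raw.select(
    from_json(col("value").cast("string"), invoice_schema).alias("invoice")
).select("invoice.*")

(
    invoices.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/supplier-invoices")  # offset tracking
    .outputMode("append")
    .start("/mnt/bronze/supplier_invoices")
)
```

Because the offsets live with the checkpoint, restarting the stream resumes exactly where the previous run stopped, which is what removes the redundant re-processing described in the Challenge section.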


2. Prophecy for Low-Code ETL and Process Automation 

Prophecy provided a low-code development environment for building visual ETL pipelines on top of Spark. 


 Using Prophecy: 

  • Complex, multi-step ETL processes were collapsed into streamlined, reusable visual pipelines.  

  • API calls, JSON ingestion, metadata tracking and error handling were automated.  

  • Manual storage of intermediate Kafka messages was eliminated by referencing the output of prior steps directly.  

  • Automation of file location extraction and JSON processing dramatically reduced the need for manual scripts (sketched in code after this list).  
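Prophecy generates the underlying Spark code from these visual pipelines. The snippet below is only an approximation in plain PySpark of the kind of steps the pipeline performs; the table, column and function names are assumptions for illustration, not Prophecy's generated output.

```python
# Illustrative PySpark equivalent of the visual pipeline steps; the real code is
# generated by Prophecy, and all names here are hypothetical.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql.functions import col, current_timestamp, input_file_name

spark = SparkSession.builder.getOrCreate()


def ingest_invoice_files(file_locations: DataFrame) -> DataFrame:
    """Read invoice JSON files referenced by the output of the upstream Kafka step."""
    paths = [row.file_location for row in file_locations.select("file_location").collect()]
    return (
        spark.read.json(paths)
        .withColumn("source_file", input_file_name())    # automated metadata tracking
        .withColumn("ingested_at", current_timestamp())
    )


def write_invoices(invoices: DataFrame) -> None:
    """Land validated invoices in a Delta table; incomplete records are quarantined."""
    valid = invoices.filter(col("invoice_id").isNotNull())
    invalid = invoices.filter(col("invoice_id").isNull())
    valid.write.format("delta").mode("append").saveAsTable("finance.bronze_invoices")
    invalid.write.format("delta").mode("append").saveAsTable("finance.quarantine_invoices")
```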


3. Databricks for High-Performance Processing 

Databricks served as the core engine for data transformation and storage. Prophecy-generated Spark jobs (packaged as .whl files) were deployed to Databricks, ensuring highly scalable and efficient execution of ingestion and transformation tasks. 
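As an illustration of what such a deployment can look like, the snippet below creates a Databricks job that runs a wheel through the Jobs API 2.1; the workspace URL, wheel path, package and entry-point names are assumptions, not the actual configuration used at M&S.

```python
# Illustrative Databricks Jobs API 2.1 payload running a Prophecy-generated wheel
# as a python_wheel_task; paths, package and entry-point names are assumptions.
import os

import requests

job_spec = {
    "name": "invoice-ingestion-pipeline",
    "tasks": [
        {
            "task_key": "run_pipeline",
            "python_wheel_task": {
                "package_name": "invoice_pipeline",      # assumed package name
                "entry_point": "main",
            },
            "libraries": [
                {"whl": "dbfs:/FileStore/wheels/invoice_pipeline-1.0.0-py3-none-any.whl"}
            ],
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        }
    ],
}

resp = requests.post(
    f"{os.environ['DATABRICKS_HOST']}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```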


4. GitHub Actions for Version Control and CI/CD 

GitHub Actions was integrated with source control to automate pipeline deployment and versioning, further enhancing maintainability and governance. 

The result was a single automated ETL pipeline orchestrating real-time invoice ingestion, transformation, storage and monitoring, which reduced complexity, increased reliability and accelerated performance. 


 

Impact 

The modernised invoice data processing solution delivered major operational benefits for M&S:


1. 79% Reduction in Processing Time 

End-to-end processing of invoice data was accelerated by 79%, eliminating redundant manual steps and optimising data workflows. 


2. 83% Improvement in Operational Efficiency 

Low-code automation significantly reduced manual coding effort for Spark-based transformations, freeing up skilled resources and improving development agility. 


3. Enhanced Data Quality 

By automating ingestion, offset tracking and validation, M&S ensured consistent, high-quality data ingestion, with easier debugging for rapid issue resolution. 


4. Reduced Resource Dependency 

The intuitive Prophecy environment reduced reliance on specialist Spark developers, allowing a broader set of analysts and engineers to manage and adapt data pipelines independently. 


5. Real-Time Data Availability 

Kafka integration and automated ETL pipelines enabled near-real-time visibility into invoice data, improving the accuracy and timeliness of financial operations and supplier engagement. 


 

Conclusion 

By modernising its invoice data processing pipelines with Prophecy, Kafka, and Databricks, Marks & Spencer transformed a complex, manual and resource-intensive workflow into an automated, high-performance, real-time ingestion system. 


The solution cut processing times by 79%, boosted operational efficiency by 83% and significantly improved data quality and business agility. 


This project demonstrates how intelligent automation, low-code development, and scalable cloud technologies can drive substantial operational improvements and strategic value for large-scale enterprises. 
