
SharePoint to Insights: Transforming Excel Files with Azure Data Factory and Databricks


Excel files stored in SharePoint are convenient for data entry, but their unstructured format often becomes a challenge for reporting and analytics. They lack connectivity to modern data platforms, which makes data-driven decision making difficult.

Because they are managed manually, challenges such as version confusion and a lack of real-time access limit their usefulness.

This insight shows how you can modernise Excel-heavy data processing, moving it seamlessly from SharePoint to Databricks. We’ll walk you through a real-world use case that combines Azure Data Factory (ADF) for orchestration, Azure Data Lake Storage (ADLS) for storage, and Databricks for transformation and analytics.

You will learn how to automate data ingestion and transformation, enabling real-time analytics across your business.

Author

Cloudaeon's Chief Architect, Raj is an alumnus of NTT Data with 20+ years' experience delivering enterprise transformation projects in Cloud, Data & AI.
Raj
Manoharan


Connect with 
Raj
Manoharan



Data Challenges with Excel Files 


No matter where the world is heading with data, AI and automation, Excel files are still a major part of every organisation.  


82% of organisations still rely on manual or Excel-based routing for tasks.  


Excel is known for being error prone, scaling poorly, and lacking security/version control compared to modern BI/data tools.  


Excel files are easy to use, but they come with several consequences:  


  • Inconsistent formats: Different users create sheets with varying headers, merged cells, and data types.  

  • Lack of governance: Files are spread across SharePoint folders with uncertain version control. 

  • Manual handling: Analysts download, clean, and re-upload files manually, leading to inefficiencies and possible errors. 

  • Historical data: Excel struggles to handle large volumes of historical data.

 

How to Transform Excel Files from SharePoint with ADF, ADLS and Databricks 


Step 1: Automate data ingestion with ADF 


Azure Data Factory connects your SharePoint with an analytics platform. Here’s how it automates data ingestion: 


  • Linked services: ADF creates a connection to SharePoint via REST APIs or HTTP connectors. 

  • Dataset definition: Excel files are registered as datasets with schema inference capabilities. 

  • Copy activity: ADF extracts the content from SharePoint and writes it to ADLS.  


The processed files are stored in a well-structured format such as Parquet in ADLS Gen2.
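To give a rough feel for what the linked service and copy activity do under the hood, the sketch below builds the kind of SharePoint REST endpoint an HTTP connector would call to stream a file's raw bytes into ADLS. The tenant, site, and file names are hypothetical, and a real pipeline would configure this in ADF rather than in code:

```python
def sharepoint_file_url(tenant_host: str, site: str, server_relative_path: str) -> str:
    """Build the SharePoint REST endpoint that returns a file's raw bytes.

    ADF's HTTP/REST linked service calls an endpoint of this shape; the copy
    activity then streams the response body into ADLS Gen2.
    """
    return (
        f"https://{tenant_host}/sites/{site}"
        f"/_api/web/GetFileByServerRelativeUrl('{server_relative_path}')/$value"
    )

# Hypothetical example: a monthly sales workbook in a document library.
url = sharepoint_file_url(
    "contoso.sharepoint.com",
    "finance",
    "/sites/finance/Shared Documents/sales.xlsx",
)
```

In ADF itself, the same endpoint would appear as the relative URL of an HTTP dataset, with authentication handled by the linked service.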


Step 2: Scalable storage and compute with ADLS + Databricks


The backbone of this solution is the architectural separation of storage and compute. 


What is separation of storage and compute? 


  • ADLS storage acts as a centralised, scalable repository for raw and curated data. This is one of the most cost-efficient ways to store data.  

  • Databricks compute provides distributed computing on demand, allowing teams to run transformations only when needed. Compute scales independently of the data volume, and compute costs are incurred only while jobs run. 


Databricks Medallion Architecture 


The solution follows the Databricks medallion architecture, which organises data in multiple layers:


  • Bronze layer: Raw, semi-structured data is ingested via ADF and stored with minimal processing.  

  • Silver layer: Semi-structured data is cleaned and enriched further. 

  • Gold layer: Business-level aggregates are applied for KPI-ready datasets, which are optimised for reporting and analytics.  
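In Databricks these layers would be Delta tables manipulated with PySpark; purely to illustrate the bronze → silver → gold flow, here is a minimal pure-Python sketch. The column names and cleaning rules are hypothetical:

```python
# Bronze: raw rows as landed from ADF (duplicates and missing values included).
bronze = [
    {"region": "EMEA", "product": "A", "units": 10},
    {"region": "EMEA", "product": "A", "units": 10},    # duplicate row
    {"region": "APAC", "product": "B", "units": None},  # missing value
    {"region": "APAC", "product": "B", "units": 7},
]

# Silver: cleaned and de-duplicated rows.
seen, silver = set(), []
for row in bronze:
    key = tuple(sorted(row.items()))
    if row["units"] is not None and key not in seen:
        seen.add(key)
        silver.append(row)

# Gold: business-level aggregate (total units per region), KPI-ready.
gold = {}
for row in silver:
    gold[row["region"]] = gold.get(row["region"], 0) + row["units"]
```

The same shape carries over to PySpark: the silver step becomes `dropna`/`dropDuplicates` on a DataFrame, and the gold step a `groupBy().agg()` written to a Delta table.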


Step 3: Enhanced data transformation in Databricks 


Databricks is built on Apache Spark. It offers a high-performance environment for processing large volumes of Excel data. Here’s what happens once the raw data lands in ADLS: 


  • Schema enforcement and validation: ensures consistent column types and naming conventions. 

  • Data joining: merges datasets from multiple sources into unified views. 

  • Enrichment: adds reference data, calculated fields, or business logic. 

  • Data quality checks: detect missing values and duplicates. 

  • Delta Lake integration: enables version control and ACID transactions for analytics-ready data. 
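The schema-enforcement and data-quality steps can be sketched as follows. This is a minimal illustration with pandas (the column names and frame are hypothetical); on Databricks the same checks would typically be written with PySpark DataFrames against Delta tables:

```python
import pandas as pd

# Hypothetical raw frame as it might land in the bronze layer.
raw = pd.DataFrame({
    "Order ID": ["1001", "1002", "1002", "1003"],
    "Amount":   ["250.0", "99.5", "99.5", None],
})

# Schema enforcement: consistent snake_case names and numeric types.
df = raw.rename(columns={"Order ID": "order_id", "Amount": "amount"})
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")

# Data quality checks: surface missing values and duplicate rows
# before they propagate to the silver layer.
missing = int(df["amount"].isna().sum())
dupes = int(df.duplicated().sum())

# Cleaned output, ready for the silver layer.
clean = df.drop_duplicates().dropna(subset=["amount"])
```

The counts (`missing`, `dupes`) would normally be logged or used to fail the job when a quality threshold is breached.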


Databricks notebooks enable collaborative development in Python, SQL, or Scala, making them ideal for data engineers and analysts alike. 


End-to-End Architecture Overview 


Let’s summarise the entire pipeline with a real-world lens: 


  • SharePoint Online: Source of raw Excel data 

  • ADF Linked Service: Connects to SharePoint 

  • ADF Pipeline: Extracts, formats, and lands data in ADLS (Bronze) 

  • Databricks Notebooks: Reads data, transforms it, stores Silver and Gold layers 

  • BI tools like Sigma and Power BI: Consume analytics-ready datasets 


This setup is fully automated, modular, and scalable. It supports batch or near-real-time processing, with integrated monitoring and logging in both ADF and Databricks. 


Real World Case Study 


ENVU, in business for over half a century, is a leader in the environmental science industry, known for consistent innovation.  


ENVU relied heavily on Excel and was unable to make full use of the large volumes of data generated every day. Data had to be managed at both global and regional levels, with a separate type of file maintained for each.  


Challenges 


Their process lacked integration and suffered from inefficient data management. This led to: 


  • Fragmented data 

  • Error prone data 

  • Decisions based on stale data 

  • Governance issues 


Collaboration with Cloudaeon


Cloudaeon implemented the Databricks medallion architecture. The global and regional Excel files were ingested from SharePoint using ADF’s copy activity and stored in the bronze layer of ADLS.  


Using the Databricks medallion architecture, data from these files was cleaned, processed, and stored in the silver layer of ADLS in Delta format. This data was further aggregated and stored as Delta tables in the gold layer.  


One Delta table was created per file, so updates to the Excel files flowed through to gold-layer tables ready for analytical reporting. 


These tables were then consumed by Sigma for reporting. 


Results 


  • 99% reduction in manual data processing 

  • 97% elimination of manual Excel tracking 

  • 95% reduction in FTE cost 

  • Instant, real-time decision making 

  • Seamless data consistency across operations 


Conclusion 


Almost every organisation has data stored in Excel. However, that shouldn’t stop you from leveraging modern analytics. With ADF, ADLS, and Databricks working in concert, your Excel data can be transformed from plain, disconnected files into high-quality, analytics-ready assets.  


If you are looking to automate reporting or bring structure and meaning to your Excel files, this architecture is the key. Interested in how it would work for your business?


Talk to our experts now!

Don’t forget to download or share with your colleagues and help your organisation navigate these trends.

Smarter data, smarter decisions.