SharePoint to Insights: Transforming Excel Files with Azure Data Factory and Databricks

Excel files stored in SharePoint are convenient for data entry, but their loosely structured format often becomes a challenge for reporting and analytics. They lack connectivity to modern data platforms, which makes data-driven decision making difficult.
Because they are managed manually, challenges such as version confusion and the lack of real-time access further limit their usefulness.
This insight shows how you can seamlessly modernise Excel-heavy data processing, moving it from SharePoint to Databricks. We’ll walk you through a real-world use case that combines Azure Data Factory (ADF) for orchestration, Azure Data Lake Storage (ADLS) for storage, and Databricks for transformation and analytics.
You will learn how to automate data ingestion and transformation, enabling near-real-time analytics across your business.
Data Challenges with Excel Files
No matter where the world is heading with data, AI, and automation, Excel files remain a staple of almost every organisation.
82% of organisations still rely on manual or Excel-based routing for tasks.
Yet Excel is notoriously error prone, scales poorly, and lacks the security and version control of modern BI and data tools.
Excel files are easy to use, but that convenience comes with several drawbacks:
Inconsistent formats: Each user structures sheets differently, with varying headers, merged cells, and data types.
Lack of governance: Files are scattered across SharePoint folders with little or no version control.
Manual handling: Analysts download, clean, and re-upload files by hand, which is inefficient and error prone.
Historical data: Storing and querying large volumes of historical data in Excel is impractical.
How to Transform Excel Files from SharePoint with ADF, ADLS and Databricks
Step 1: Automate data ingestion with ADF
Azure Data Factory connects SharePoint to your analytics platform. Here’s how it automates data ingestion:
Linked services: ADF connects to SharePoint via its REST API or an HTTP connector.
Dataset definition: Excel files are registered as datasets with schema inference capabilities.
Copy activity: ADF extracts the file content from SharePoint and writes it to ADLS.
The landed files are stored in a well-structured format such as Parquet in ADLS Gen2.
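To make the mechanics concrete, below is a minimal Python sketch of the kind of authenticated request ADF’s REST/HTTP connector issues against SharePoint. The site URL, file path, and token are placeholder assumptions; in a real pipeline, the linked service handles authentication and the copy activity streams the bytes to ADLS rather than to local disk.

```python
import requests

# Placeholder values - substitute your own tenant, site, and file path.
SITE_URL = "https://contoso.sharepoint.com/sites/finance"
FILE_PATH = "/sites/finance/Shared Documents/sales.xlsx"
ACCESS_TOKEN = "<bearer-token-from-azure-ad>"

# SharePoint's REST API serves raw file bytes via GetFileByServerRelativeUrl;
# ADF's REST/HTTP linked service issues an equivalent authenticated GET.
resp = requests.get(
    f"{SITE_URL}/_api/web/GetFileByServerRelativeUrl('{FILE_PATH}')/$value",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
resp.raise_for_status()

# Save the workbook locally for inspection; ADF's copy activity would
# instead write these bytes to ADLS Gen2, converting to Parquet at the sink.
with open("sales.xlsx", "wb") as f:
    f.write(resp.content)
```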
Step 2: Scalable storage and compute with ADLS + Databricks
The backbone of this solution is the architectural separation of storage and compute.
What is separation of storage and compute?
ADLS acts as a centralised, scalable repository for raw and curated data, and is one of the most cost-efficient ways to store it.
Databricks provides distributed compute on demand, so teams run transformations only when needed. Compute scales independently of the data volume involved, and costs are incurred only while jobs are running.
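As a minimal sketch (the storage account, container, and path are assumptions, and ADLS credentials are presumed configured on the cluster), a Databricks notebook reads the landed Parquet straight from ADLS Gen2 over the abfss protocol, spinning up compute only for the duration of the job:

```python
# 'spark' is predefined in Databricks notebooks; the abfss path below is
# a placeholder for your own storage account and container.
bronze_path = "abfss://bronze@mydatalake.dfs.core.windows.net/sharepoint/sales/"

# Compute attaches to storage only while this job runs - storage costs and
# compute costs stay independent of each other.
df = spark.read.parquet(bronze_path)
df.printSchema()
```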
Databricks Medallion Architecture
The solution follows the Databricks medallion architecture, which organises data into successive layers:

Bronze layer: Raw, semi-structured data as ingested by ADF, with minimal processing.
Silver layer: Data is cleaned, validated, and enriched.
Gold layer: Business-level aggregates produce KPI-ready datasets, optimised for reporting and analytics.
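The following PySpark sketch shows one plausible pass through the three layers. The column names (region, order_date, amount) and all paths are illustrative assumptions, not part of the original pipeline:

```python
from pyspark.sql import functions as F

# Bronze: Parquet files landed by ADF (path is a placeholder).
bronze = spark.read.parquet("abfss://bronze@mydatalake.dfs.core.windows.net/sales/")

# Silver: clean and standardise - drop duplicates, cast types, filter bad
# rows - then persist as Delta.
silver = (
    bronze.dropDuplicates()
    .withColumn("order_date", F.to_date("order_date"))
    .filter(F.col("amount").isNotNull())
)
silver.write.format("delta").mode("overwrite").save(
    "abfss://silver@mydatalake.dfs.core.windows.net/sales/"
)

# Gold: business-level aggregates, KPI-ready for BI tools.
gold = silver.groupBy("region", "order_date").agg(F.sum("amount").alias("total_sales"))
gold.write.format("delta").mode("overwrite").save(
    "abfss://gold@mydatalake.dfs.core.windows.net/sales_summary/"
)
```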
Step 3: Enhanced data transformation in Databricks
Databricks is built on Apache Spark and offers a high-performance environment for processing large volumes of Excel-sourced data. Once the raw data lands in ADLS, a typical transformation covers the following (sketched in code after this list):
Schema enforcement and validation: Ensures consistent column types and naming conventions.
Data joining: Merges datasets from multiple sources into unified views.
Enrichment: Adds reference data, calculated fields, or business logic.
Data quality checks: Detects missing values and duplicates.
Delta Lake integration: Provides version control and ACID transactions for analytics-ready data.
Databricks notebooks support collaborative development in Python, SQL, or Scala, making them ideal for data engineers and analysts alike.
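As referenced in the list above, here is a short sketch of schema enforcement and basic quality checks; the schema and path are assumptions for illustration:

```python
from pyspark.sql import functions as F
from pyspark.sql.types import (
    StructType, StructField, StringType, DateType, DoubleType,
)

# Hypothetical schema for the ingested workbook - adjust to your own columns.
expected_schema = StructType([
    StructField("region", StringType(), False),
    StructField("order_date", DateType(), True),
    StructField("amount", DoubleType(), True),
])

# Enforce the schema at read time instead of relying on inference.
df = spark.read.schema(expected_schema).parquet(
    "abfss://bronze@mydatalake.dfs.core.windows.net/sales/"
)

# Basic quality checks: per-column null counts and exact-duplicate rows.
null_counts = df.select(
    [F.sum(F.col(c).isNull().cast("int")).alias(c) for c in df.columns]
)
null_counts.show()

duplicate_rows = df.count() - df.dropDuplicates().count()
print(f"Duplicate rows: {duplicate_rows}")
```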
End-to-End Architecture Overview
Let’s summarise the entire pipeline with a real-world lens:
SharePoint Online: Source of raw Excel data
ADF Linked Service: Connects to SharePoint
ADF Pipeline: Extracts, formats, and lands data in ADLS (Bronze)
Databricks Notebooks: Read the data, transform it, and store the Silver and Gold layers
BI tools like Sigma and Power BI: Consume analytics-ready datasets
This setup is fully automated, modular, and scalable. It supports batch or near-real-time processing, with integrated monitoring and logging in both ADF and Databricks.
Real World Case Study
ENVU, a company with over half a century of history, is a leader in the environmental science industry, known for consistent innovation.
ENVU was a heavy Excel user but was not making proper use of the vast amounts of data generated every day. Data had to be managed at both global and regional levels, with a separate type of file maintained for each.
Challenges
Their process lacked integration and suffered from inefficient data management. This led to:
Fragmented data
Error-prone data
Decisions made on stale data
Governance issues
Collaboration with Cloudaeon
Cloudaeon implemented the Databricks medallion architecture. The global and regional Excel files were ingested from SharePoint using ADF’s copy activity and stored in the bronze layer of ADLS.
Data from these files was then cleaned, processed, and stored in the silver layer of ADLS in Delta format. It was further aggregated and stored as Delta tables in the gold layer.
One Delta table was created per file, so the data in the Excel files flowed through to gold-layer tables ready for analytical reporting.
These tables were then ingested into Sigma for reporting.
Results
99% reduction in manual data processing
97% elimination of manual Excel tracking
95% reduction in FTE cost
Instant, real-time decision making
Seamless data consistency across operations
Conclusion
Almost every organisation has data stored in Excel. However, that shouldn’t stop you from leveraging modern analytics. With ADF, ADLS, and Databricks working in concert, your Excel data can be transformed from plain, disconnected files into high-quality, analytics-ready assets.
If you are looking to automate reporting, or to bring structure and meaning to your Excel files, this architecture is the key. Interested in how it would work for your business?


