Automated Supply Chain Data Pipeline

About Client

DnK is a premium cannabis manufacturing and retail group based in Michigan. They operate two fast-growing brands serving both medical and recreational markets, managing hundreds of active SKUs across a complex production-to-retail supply chain.

What They Do

Manufacture & distribute premium cannabis products across Michigan
Operate retail locations for medical and recreational customers
Manage a multi-brand portfolio with hundreds of active SKUs
Fulfill orders across a multi-tier production-to-retail supply chain

Their Clients Include

Licensed dispensaries, medical providers, wholesale distributors, and 450+ active retail and wholesale customers across Michigan’s regulated cannabis market.

Problem Statement

DnK was scaling fast — but their data couldn’t keep up. Inventory was scattered with no single source of truth, orders passed through too many handoffs, and leadership had no reliable view of what was happening until it was already too late. They needed real-time supply chain visibility, not another dashboard running on stale data.

The Challenge

Every new brand, SKU, and retail partner added more complexity — but the data infrastructure was still held together by manual processes and fragmented systems. What worked at a few hundred orders per month completely broke at 20,000+ monthly transactions and 450+ active customers.

The operations team was spending 8+ hours a week just cleaning and reconciling data. Inventory counts were unreliable, fulfillment had too many failure points, and performance reports arrived late or in formats that needed hours of manual work before anyone could act. The business was running on instinct when it should have been running on data.

What Was Going Wrong

1. Zero Data Visibility for the Business

Pipeline failures were discovered hours or days after they happened — usually when a report looked wrong. There were no status alerts, no execution summaries, no audit trail. Data engineering was a black box to the people who needed the data most.

2. Inefficient Data Refresh Cycles

No mechanism existed to detect new vs. updated records. The default was full data reloads every time — burning API quotas, wasting compute, and still missing updates that fell between refresh windows. No one could trust whether the data was current.

3. Inconsistent and Unstructured API Data

Source data arrived as deeply nested JSON with no warning when the structure changed. Field names varied between endpoints, optional attributes appeared and disappeared, and schema mismatches regularly broke downstream processes.

4. Slow and Unreliable API Extraction

Paginated API calls ran sequentially — one at a time. Extraction jobs that should have taken minutes stretched to hours. Rate limits caused stalls, retries, and silent failures. Data freshness suffered across the board.

Solution Provided

Exillar designed and built a fully automated, asynchronous ETL pipeline in Python — replacing DnK’s fragile manual processes with a production-grade system deployed in 4 weeks.

The pipeline connects directly to REST APIs, flattens nested JSON into clean relational tables, and enforces consistent schemas regardless of how the source data shifts. Extraction runs asynchronously with built-in pagination and retry logic. A composite-key MERGE strategy detects only new and changed records — no wasteful full reloads. Hourly incremental runs keep data fresh all day; a full overnight refresh acts as the safety net. Every run is tracked through centralized logging and automated email summaries so the business always knows exactly what happened.

What Is an ETL Pipeline in Simple Terms?

Think of it like a postal sorting facility for your data — running automatically, every hour, without anyone pressing a button.

How the System Works:

01 Async Data Extraction

02 Schema Normalization & Flattening

03 Incremental Load with Change Detection

04 Monitoring, Alerts & Audit Trail

Results

System Capabilities

Capability	Detail
Monthly Transactions Processed	20,000+
Manual Work Eliminated	8+ hours per week
Data Freshness	Hourly (previously end-of-day)
Extraction Speed	Under 10 min (previously 3+ hours)
Deployment Time	4 weeks from kickoff to production

Operational Impact

8+ hours per week returned to the operations team — zero manual data wrangling
Data freshness went from end-of-day to every hour, automatically confirmed
Full pipeline transparency — business teams see health status without asking engineering
Scalable foundation built to grow with DnK's transaction volume — no re-engineering needed

Before vs After

Area	Before	After
Data Freshness	12–24 hrs stale (end-of-day batch)	Hourly automated refreshes
Extraction Speed	3+ hours (sequential calls)	Under 10 minutes (async parallel)
Manual Data Work	8+ hrs/week cleanup	Zero manual intervention
Schema Handling	Broke on API changes	Auto-adapts to schema drift
Change Detection	Full reloads every cycle	MERGE — new/changed records only
Error Handling	Failures found hours later	Real-time retry + alerts within minutes
Inventory Visibility	Fragmented, no single source of truth	Unified across all brands, SKUs & locations
Audit Trail	None	Full log of every record, error & retry

Tech Stack

Layer	Technology	Role
Core Language	Python	Pipeline logic, data processing, orchestration
Async HTTP	aiohttp	High-speed parallel API extraction
Cloud Infrastructure	AWS	Hosting, compute, storage, scheduling
Data Source	REST APIs (LeafLink)	Orders, inventory, customer & fulfillment data
Database	PostgreSQL	Structured storage, query layer, analytics
Load Strategy	Composite-Key MERGE	Incremental change detection & upsert logic
Monitoring	Centralized Logging + Email Alerts	Pipeline health, execution summaries, audit trail
Scheduling	Hourly Incremental + Daily Full Refresh	Automated, zero-touch execution