Automated Supply Chain Data Pipeline

We built the data backbone for Michigan's fastest-growing cannabis group — an automated ETL pipeline that processes 20,000+ monthly transactions, eliminates 8+ hours of weekly manual work, and delivers near-real-time supply chain visibility with zero human intervention.
Project Year
01 Jun 2025
Client Name
DnK Cannabis
Location
Michigan, USA
Industry

About Client

DnK is a premium cannabis manufacturing and retail group based in Michigan. They operate two fast-growing brands serving both medical and recreational markets, managing hundreds of active SKUs across a complex production-to-retail supply chain.

What They Do

Their Clients Include

Licensed dispensaries, medical providers, wholesale distributors, and 450+ active retail and wholesale customers across Michigan’s regulated cannabis market.

Problem Statement

DnK was scaling fast — but their data couldn’t keep up. Inventory was scattered with no single source of truth, orders passed through too many handoffs, and leadership had no reliable view of what was happening until it was already too late. They needed real-time supply chain visibility, not another dashboard running on stale data.

The Challenge

Every new brand, SKU, and retail partner added more complexity — but the data infrastructure was still held together by manual processes and fragmented systems. What worked at a few hundred orders per month completely broke at 20,000+ monthly transactions and 450+ active customers.
The operations team was spending 8+ hours a week just cleaning and reconciling data. Inventory counts were unreliable, fulfillment had too many failure points, and performance reports arrived late or in formats that needed hours of manual work before anyone could act. The business was running on instinct when it should have been running on data.

What Was Going Wrong

1. Zero Data Visibility for the Business

Pipeline failures were discovered hours or days after they happened — usually when a report looked wrong. There were no status alerts, no execution summaries, no audit trail. Data engineering was a black box to the people who needed the data most.

2. Inefficient Data Refresh Cycles

No mechanism existed to detect new vs. updated records. The default was full data reloads every time — burning API quotas, wasting compute, and still missing updates that fell between refresh windows. No one could trust whether the data was current.

3. Inconsistent and Unstructured API Data

Source data arrived as deeply nested JSON with no warning when the structure changed. Field names varied between endpoints, optional attributes appeared and disappeared, and schema mismatches regularly broke downstream processes.

4. Slow and Unreliable API Extraction

Paginated API calls ran sequentially — one at a time. Extraction jobs that should have taken minutes stretched to hours. Rate limits caused stalls, retries, and silent failures. Data freshness suffered across the board.

Solution Provided

Exillar designed and built a fully automated, asynchronous ETL pipeline in Python — replacing DnK’s fragile manual processes with a production-grade system deployed in 4 weeks.
The pipeline connects directly to REST APIs, flattens nested JSON into clean relational tables, and enforces consistent schemas regardless of how the source data shifts. Extraction runs asynchronously with built-in pagination and retry logic. A composite-key MERGE strategy detects only new and changed records — no wasteful full reloads. Hourly incremental runs keep data fresh all day; a full overnight refresh acts as the safety net. Every run is tracked through centralized logging and automated email summaries so the business always knows exactly what happened.

What Is an ETL Pipeline in Simple Terms?

Think of it like a postal sorting facility for your data — running automatically, every hour, without anyone pressing a button.

How the System Works:

01

Async Data Extraction

02

Schema Normalization & Flattening

03

Incremental Load with Change Detection

04

Monitoring, Alerts & Audit Trail

Results

System Capabilities

Capability Detail
Monthly Transactions Processed 20,000+
Manual Work Eliminated 8+ hours per week
Data Freshness Hourly (previously end-of-day)
Extraction Speed Under 10 min (previously 3+ hours)
Deployment Time 4 weeks from kickoff to production

Operational Impact

Before vs After

Area Before After
Data Freshness 12–24 hrs stale (end-of-day batch) Hourly automated refreshes
Extraction Speed 3+ hours (sequential calls) Under 10 minutes (async parallel)
Manual Data Work 8+ hrs/week cleanup Zero manual intervention
Schema Handling Broke on API changes Auto-adapts to schema drift
Change Detection Full reloads every cycle MERGE — new/changed records only
Error Handling Failures found hours later Real-time retry + alerts within minutes
Inventory Visibility Fragmented, no single source of truth Unified across all brands, SKUs & locations
Audit Trail None Full log of every record, error & retry

Tech Stack

Layer Technology Role
Core Language Python Pipeline logic, data processing, orchestration
Async HTTP aiohttp High-speed parallel API extraction
Cloud Infrastructure AWS Hosting, compute, storage, scheduling
Data Source REST APIs (LeafLink) Orders, inventory, customer & fulfillment data
Database PostgreSQL Structured storage, query layer, analytics
Load Strategy Composite-Key MERGE Incremental change detection & upsert logic
Monitoring Centralized Logging + Email Alerts Pipeline health, execution summaries, audit trail
Scheduling Hourly Incremental + Daily Full Refresh Automated, zero-touch execution