How We Replaced Fragmented Spreadsheets With an Automated Data Pipeline Across Six Cannabis Facilities and Cut Compliance Reporting From 3 Days to 2 Hours

Cannabis is one of the most data-intensive regulated industries in the world. Every plant has to be tracked from seed to sale. Every transfer between facilities has to be logged. Every gram of inventory has to reconcile against state or provincial reporting systems. And if the numbers don’t match, the consequences can go well beyond a fine: they can include licence revocation.
Most cannabis operators know this. What they don’t have is the data infrastructure to handle it at scale.
One of our clients — a multi-state cannabis operator running six cultivation, processing, and dispensary facilities — was managing all of this with spreadsheets, manual exports, and a compliance team that spent three full days every reporting cycle pulling data from disconnected systems, reconciling it by hand, and formatting reports for regulators.
They weren’t doing anything wrong. The spreadsheets were accurate — most of the time. But the process was fragile, slow, and entirely dependent on two people who knew where everything lived. When the business grew from three facilities to six, the process didn’t scale. The compliance team went from “stretched” to “one mistake away from a regulatory issue.”
They came to Exillar not for a dashboard or a new tool, but for the data layer underneath — an automated pipeline that would connect their seed-to-sale tracking, inventory management, and compliance reporting into a single, reliable system.
This is what we built, how it works, and what changed.

The Problem: Six Facilities, Seven Data Sources, Zero Automation

The client’s data problem wasn’t complexity — it was fragmentation. Every facility had its own systems, its own exports, and its own way of tracking things.
1. Seed-to-sale tracking system (Metrc)
The state-mandated tracking platform. Every plant, every transfer, every sale has to be logged here. But Metrc is a compliance tool, not an analytics platform. Getting data out of it for reporting or reconciliation required manual CSV exports.
2. Point-of-sale systems across four dispensaries
Each dispensary ran its own POS. Sales data, inventory movements, and customer transaction records lived in four separate databases with four different schemas.
3. Cultivation management software
Two cultivation facilities used different grow management platforms. Plant health data, harvest yields, and batch tracking were siloed in each.
4. Inventory management spreadsheets
Processing and packaging inventory was tracked in Excel. Updated manually. Version control was “whoever saved last wins.”
5. Accounting system
Financial data — COGS, revenue by facility, tax obligations — lived in QuickBooks. Reconciling financial data against operational data required manual cross-referencing.
6. Lab testing results
Third-party lab results for potency, terpenes, and contaminants came in as PDFs and were manually entered into spreadsheets for batch tracking.
7. State reporting templates
Each state had its own reporting format, its own data requirements, and its own submission schedule. The compliance team rebuilt reports from scratch for each jurisdiction every cycle.
The result: a compliance team of three spending three full days per reporting cycle — roughly every two weeks — manually pulling, cleaning, reconciling, and formatting data from seven sources across six facilities. The process worked. Until it didn’t scale.

Why Cannabis Data Infrastructure Is Different From Every Other Industry

Cannabis operators face data challenges that don’t exist in most other regulated industries. Understanding these constraints shaped every architectural decision in the pipeline.
Seed-to-sale traceability is legally mandatory
Unlike most supply chains where traceability is a best practice, cannabis traceability is a legal requirement. Every plant must be tracked from the moment it’s planted to the moment it’s sold to a customer. Gaps in the chain aren’t operational problems — they’re compliance violations.
Multi-state operators face different regulations in every market
A six-facility operator across three states has to comply with three different regulatory frameworks, three different reporting formats, and three different data submission requirements. There’s no federal standard. Every state is different.
Inventory discrepancies trigger audits
In most industries, a 2% inventory variance is a rounding error. In cannabis, any discrepancy between physical inventory and what’s reported in the seed-to-sale system can trigger a regulatory audit. The tolerance for error is effectively zero.
Data lives in state-mandated systems the operator doesn’t control
Metrc, BioTrack, and other seed-to-sale platforms are mandated by the state. Operators have to use them, but they don’t control the data model, the export format, or the API capabilities. Building a pipeline on top of these systems means working within constraints you can’t change.
Financial data and operational data have to reconcile perfectly
Tax authorities in legal cannabis markets require that financial reporting aligns with seed-to-sale tracking data. Revenue reported to the state tax authority has to match the sales reported to the cannabis regulatory body. If they don’t — both agencies come asking questions.
These constraints mean cannabis data infrastructure can’t be built with generic pipeline tools and default configurations. Every decision — ingestion frequency, validation rules, reconciliation logic, error handling — has to account for the regulatory reality.

What We Built: Architecture of the Supply Chain Data Pipeline

The pipeline has four layers, each designed to solve a specific part of the data fragmentation problem.
Layer 1 — Data Ingestion Layer
Automated connectors to all seven source systems: Metrc API integration, POS system database connectors (four dispensaries, two different POS platforms), cultivation management software API connections, automated ingestion of Excel-based inventory files, QuickBooks API for financial data, PDF parsing for lab test results (OCR + structured extraction), and state reporting template reverse-engineering for automated formatting. Data is pulled on a schedule matched to each source’s update frequency.
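The phrase “pulled on a schedule matched to each source’s update frequency” can be sketched minimally as follows. The sketch assumes each source is wrapped in a connector object with its own polling interval. `SourceConnector`, `run_due_syncs` and the stub `fetch` lambdas are hypothetical stand-ins for the real Metrc, POS and QuickBooks integrations. The intervals mirror the sync frequencies given in the FAQ: Metrc every 2 hours, POS every 30 minutes, financial data daily.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class SourceConnector:
    """Hypothetical connector: one per source system, each with its own polling interval."""
    name: str
    sync_interval: timedelta
    fetch: Callable[[], list]          # returns raw records from the source system
    last_synced: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        # Due if it has never synced, or its interval has fully elapsed.
        return self.last_synced is None or now - self.last_synced >= self.sync_interval


def run_due_syncs(connectors: list, now: datetime) -> dict:
    """Pull from every connector that is due and return a record count per source."""
    pulled = {}
    for connector in connectors:
        if connector.is_due(now):
            records = connector.fetch()
            connector.last_synced = now
            pulled[connector.name] = len(records)
    return pulled


# Usage with stub fetch functions standing in for real API/database calls.
start = datetime(2024, 1, 1, 9, 0)
connectors = [
    SourceConnector("metrc", timedelta(hours=2), lambda: [{"tag": "A1"}]),
    SourceConnector("pos_store_1", timedelta(minutes=30), lambda: [{"sale": 1}, {"sale": 2}]),
    SourceConnector("quickbooks", timedelta(days=1), lambda: []),
]
print(run_due_syncs(connectors, start))                          # first run: every source is due
print(run_due_syncs(connectors, start + timedelta(minutes=45)))  # only the POS connector is due
```

Every source sits behind the same small interface, so connectors are swappable. Replacing one POS platform means writing one new connector, while the scheduling loop and everything downstream stay the same.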
Layer 2 — Transformation and Normalisation Layer
Raw data from seven sources arrives in seven different formats. This layer normalises everything into a unified data model: plant lifecycle records from Metrc and cultivation software are merged into a single timeline per plant, inventory records are reconciled into a single inventory view per facility, financial records are mapped to operational data by facility and batch, and lab results are linked to specific batches and harvests. Every transformation is documented and testable. No black boxes.
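The following sketch shows how Metrc and cultivation-software records could be merged into one timeline per plant. It assumes both systems carry the same state-issued plant tag to join on. The export field names (`Label`, `Date`, `tag`, `stage` and so on) and the helper functions are hypothetical, because real exports differ by vendor and state.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PlantEvent:
    plant_tag: str         # assumed unified key: the plant's state-issued tag
    occurred_at: datetime
    event_type: str        # e.g. "planted", "harvested"
    source: str            # system that reported the event


# Hypothetical field names: real Metrc and grow-software exports will differ.
def from_metrc(row: dict) -> PlantEvent:
    return PlantEvent(row["Label"], datetime.fromisoformat(row["Date"]), row["Event"].lower(), "metrc")


def from_grow_software(row: dict) -> PlantEvent:
    return PlantEvent(row["tag"], datetime.fromisoformat(row["timestamp"]), row["stage"].lower(), "cultivation")


def build_timelines(events) -> dict:
    """Group events by plant tag and order each plant's events chronologically.

    Identical events (e.g. the same row ingested twice) collapse into one via the set.
    """
    by_plant = defaultdict(set)
    for event in events:
        by_plant[event.plant_tag].add(event)
    return {
        tag: sorted(evts, key=lambda e: (e.occurred_at, e.source, e.event_type))
        for tag, evts in by_plant.items()
    }


metrc_rows = [{"Label": "1A4000", "Date": "2024-03-01T10:00:00", "Event": "Harvested"}]
grow_rows = [{"tag": "1A4000", "timestamp": "2024-01-15T08:00:00", "stage": "Planted"}]
events = [from_metrc(r) for r in metrc_rows] + [from_grow_software(r) for r in grow_rows]
for event in build_timelines(events)["1A4000"]:
    print(event.occurred_at.date(), event.event_type, event.source)
```

Each source gets its own small normalising function, so the transformation for any one system can be unit-tested in isolation. That is one way to keep the “no black boxes” promise.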
Layer 3 — Data Warehouse
A single, structured source of truth storing complete plant lifecycle data (seed to sale) for every plant across all facilities, near-real-time inventory by facility and batch (as fresh as each source’s sync schedule allows), financial data reconciled against operational data, lab testing results linked to batches, and historical compliance reports for an audit trail.
Layer 4 — Reporting and Compliance Layer
Automated report generation for each state’s regulatory requirements. The system knows each state’s reporting format, data requirements, and submission schedule. Reports are generated automatically, validated against source data, and staged for the compliance team’s review before submission. The compliance team reviews and submits. The system does the assembly.
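The split between a unified data model and per-state output formats can be illustrated with one declarative configuration per state. This is a sketch under assumptions. `StateReportConfig`, `render_report`, the column names and the grams-to-kilograms rule are hypothetical placeholders, not any real state’s template.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class StateReportConfig:
    """Hypothetical per-state layout: which columns, in what order, in what units."""
    state: str
    columns: dict            # state's column header -> field name in the unified model
    weight_unit: str = "g"   # "g" or "kg"


def render_report(records: list, config: StateReportConfig) -> str:
    """Render unified-model records as CSV in one state's expected layout."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(list(config.columns.keys()))
    for record in records:
        row = []
        for field in config.columns.values():
            value = record[field]
            # Weights are stored in grams in the unified model; convert only on output.
            if field == "weight_g" and config.weight_unit == "kg":
                value = value / 1000
            row.append(value)
        writer.writerow(row)
    return buffer.getvalue()


unified = [{"package_tag": "PKG-001", "weight_g": 2500, "batch_id": "B42"}]
state_a = StateReportConfig("State A", {"Package": "package_tag", "Weight (g)": "weight_g"})
state_b = StateReportConfig("State B", {"Batch": "batch_id", "Package ID": "package_tag", "Net kg": "weight_g"}, "kg")
print(render_report(unified, state_a))
print(render_report(unified, state_b))
```

With this design, adding a state means adding a configuration rather than new pipeline logic. That is consistent with the FAQ’s estimate that adding a new state takes days, not weeks.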

Seed-to-Sale Tracking: How the Pipeline Handles Plant Lifecycle Data

Seed-to-sale tracking is the backbone of cannabis compliance. The pipeline maintains a complete lifecycle record for every plant across all facilities.
What gets tracked per plant:
How the pipeline maintains accuracy:
Every Metrc record is cross-referenced against the cultivation management software and POS data. If a transfer is logged in Metrc but doesn’t appear in the receiving facility’s POS system within 24 hours, the system flags it for investigation. If a batch’s lab results show a potency that’s statistically outside the range for that strain, the system flags it — not as an error, but as a data point that warrants verification. These validation rules were designed with the compliance team and reflect the specific checks that regulators perform during audits.
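The two cross-checks described above might look roughly like this. The 24-hour window comes from the text. The following are assumptions made for illustration: matching transfers on a manifest ID, the field names, and a three-standard-deviation threshold for “statistically outside the range”.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev

TRANSFER_GRACE = timedelta(hours=24)   # from the text: receipt expected within 24 hours


def unmatched_transfers(metrc_transfers: list, pos_receipts: list, now: datetime) -> list:
    """Return manifest IDs logged in Metrc that the receiving POS still hasn't recorded after 24 hours."""
    received = {receipt["manifest_id"] for receipt in pos_receipts}
    return [
        t["manifest_id"]
        for t in metrc_transfers
        if t["manifest_id"] not in received and now - t["logged_at"] > TRANSFER_GRACE
    ]


def potency_outliers(history_by_strain: dict, new_results: list, z_threshold: float = 3.0) -> list:
    """Return batch IDs whose THC % is more than z_threshold standard deviations from the strain's history.

    A flag means "verify this result", not "this result is wrong".
    """
    flagged = []
    for result in new_results:
        history = history_by_strain.get(result["strain"], [])
        if len(history) < 2:
            continue  # stdev needs at least two data points; too little history to judge
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(result["thc_pct"] - mu) / sigma > z_threshold:
            flagged.append(result["batch_id"])
    return flagged


now = datetime(2024, 5, 2, 12, 0)
transfers = [
    {"manifest_id": "M1", "logged_at": now - timedelta(hours=30)},  # overdue, never received
    {"manifest_id": "M2", "logged_at": now - timedelta(hours=30)},  # overdue but received
    {"manifest_id": "M3", "logged_at": now - timedelta(hours=2)},   # still inside the grace window
]
print(unmatched_transfers(transfers, [{"manifest_id": "M2"}], now))  # ['M1']
history = {"Strain A": [20.0, 21.0, 19.5, 20.5, 20.0]}
results = [{"strain": "Strain A", "thc_pct": 20.4, "batch_id": "B1"},
           {"strain": "Strain A", "thc_pct": 31.0, "batch_id": "B2"}]
print(potency_outliers(history, results))  # ['B2']
```

The potency check returns batches to verify rather than rejecting them. This matches the intent that an outlier is a data point to confirm, not an error.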

Inventory Reconciliation: Automated, Not Manual

Inventory reconciliation was the single most time-consuming task the compliance team performed manually. The pipeline automates it.
How reconciliation works:
Every 4 hours, the system compares three views of inventory:
Any discrepancy between these three views is flagged immediately with the specific product and facility, the size of the discrepancy (in grams), the likely source, and a recommended action.
Timing lag handling:
Most inventory discrepancies in cannabis aren’t theft or loss — they’re timing lags. A transfer logged in one system hasn’t synced to another yet. The pipeline distinguishes between “expected timing lag” discrepancies (which resolve within 24 hours) and “persistent discrepancies” (which need investigation). Before the pipeline, the compliance team spent an entire day per cycle reconciling inventory across six facilities by hand. Now the system does it continuously, and the team only intervenes when a persistent discrepancy is flagged.
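The following sketch shows the continuous reconciliation loop and the timing-lag distinction. It assumes each of the three inventory views can be reduced to grams per (facility, product). The view names `metrc`, `pos` and `warehouse`, the `reconcile` function and the `Discrepancy` fields are hypothetical. The 24-hour lag window comes from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

LAG_WINDOW = timedelta(hours=24)   # mismatches younger than this are treated as expected timing lag
TOLERANCE_G = 1e-6                 # guards against float rounding only; not a business tolerance


@dataclass
class Discrepancy:
    facility: str
    product: str
    by_view: dict        # view name -> grams recorded in that view
    spread_g: float      # largest minus smallest quantity across views
    first_seen: datetime
    status: str          # "timing_lag" or "persistent"


def reconcile(views: dict, first_seen: dict, now: datetime) -> list:
    """Compare grams per (facility, product) across every view and classify each mismatch.

    views: view name -> {(facility, product): grams}; a product missing from a view counts as 0 g.
    first_seen: (facility, product) -> when the mismatch was first observed; kept between
    runs and updated in place (entries are dropped once the views agree again).
    """
    keys = set().union(*(view.keys() for view in views.values()))
    found = []
    for key in sorted(keys):
        by_view = {name: view.get(key, 0.0) for name, view in views.items()}
        spread = max(by_view.values()) - min(by_view.values())
        if spread <= TOLERANCE_G:
            first_seen.pop(key, None)   # views agree: clear any earlier mismatch
            continue
        started = first_seen.setdefault(key, now)
        status = "persistent" if now - started >= LAG_WINDOW else "timing_lag"
        found.append(Discrepancy(key[0], key[1], by_view, spread, started, status))
    return found


t0 = datetime(2024, 6, 1, 8, 0)
views = {
    "metrc":     {("F1", "P1"): 100.0, ("F1", "P2"): 50.0},
    "pos":       {("F1", "P1"): 100.0, ("F1", "P2"): 45.0},
    "warehouse": {("F1", "P1"): 100.0, ("F1", "P2"): 50.0},
}
seen = {}
for run_at in (t0, t0 + timedelta(hours=28)):   # two runs of the 4-hourly job, 28 hours apart
    for d in reconcile(views, seen, run_at):
        print(run_at, d.facility, d.product, d.spread_g, d.status)
```

The `first_seen` map turns a one-off comparison into lag detection. A mismatch is escalated to “persistent” only if it survives repeated runs for 24 hours, and it is forgotten as soon as the views agree again.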

Compliance Reporting: From 3 Days to 2 Hours

The reporting layer is where the pipeline’s value is most visible to the compliance team.
Before: 3 days of manual work per reporting cycle
After: 2 hours of review per reporting cycle
What the reporting layer handles per state:
Seed-to-sale tracking reports: auto-generated from unified plant lifecycle data.
Inventory reconciliation reports: auto-generated from the continuous reconciliation engine.
Sales and tax reports: auto-generated from POS data reconciled against financial records.
Waste and destruction reports: auto-generated from Metrc waste records cross-referenced against processing data.
Transfer manifests: auto-generated from Metrc transfer data with facility-level validation.

Data Quality and Validation: Catching Errors Before Regulators Do

In cannabis, data quality isn’t a nice-to-have — it’s a compliance requirement. The pipeline includes 47 automated validation rules designed to catch errors before they reach a regulator’s desk.
Categories of validation:
Completeness checks
Timeliness checks
Anomaly detection
Every validation failure generates an alert to the appropriate team — operations for operational issues, compliance for regulatory issues, finance for financial discrepancies. The alerts include the specific data points, the rule that was violated, and a recommended action.
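Rule evaluation and team routing can be sketched as a list of small rule objects. Each rule carries its category, the team that owns it, and a recommended action. `Rule`, `Alert`, `run_rules`, the `ROUTING` table and the two example rules are hypothetical. They illustrate the shape of the 47 rules, not their content.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical routing table: issue domain -> team that receives the alert.
ROUTING = {"operational": "operations", "regulatory": "compliance", "financial": "finance"}


@dataclass
class Rule:
    name: str
    category: str                   # e.g. "completeness", "timeliness", "anomaly"
    domain: str                     # key into ROUTING
    check: Callable[[dict], bool]   # returns True when the record passes
    recommended_action: str


@dataclass
class Alert:
    team: str
    rule: str
    record: dict                    # the specific data points that failed
    recommended_action: str


def run_rules(records: list, rules: list) -> list:
    """Apply every rule to every record and route each failure to the owning team."""
    alerts = []
    for record in records:
        for rule in rules:
            if not rule.check(record):
                alerts.append(Alert(ROUTING[rule.domain], rule.name, record, rule.recommended_action))
    return alerts


rules = [
    Rule("batch_has_lab_result", "completeness", "regulatory",
         lambda r: r.get("lab_result_id") is not None,
         "Attach the lab certificate before the batch is reported."),
    Rule("synced_within_24h", "timeliness", "operational",
         lambda r: r["hours_since_sync"] <= 24,
         "Check the connector for this facility."),
]
batches = [{"batch_id": "B7", "lab_result_id": None, "hours_since_sync": 30}]
for alert in run_rules(batches, rules):
    print(alert.team, alert.rule, "->", alert.recommended_action)
```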

The Build Process: Discovery to Production in 5 Weeks

Week 1 (Discovery): Mapped all 7 source systems across 6 facilities. Documented every data flow, export format, and manual process. Defined validation rules with the compliance team.
Week 2 (Ingestion Layer): Built connectors to Metrc, POS systems, cultivation software, and QuickBooks. Set up automated file ingestion for Excel inventory files and PDF lab results.
Week 3 (Transformation & Warehouse): Built the unified data model. Created transformation logic to normalise data from all sources. Deployed the warehouse with facility-level and batch-level views.
Week 4 (Reconciliation & Validation): Built the automated inventory reconciliation engine. Implemented 47 validation rules. Configured alerting for discrepancies and violations.
Week 5 (Reporting, Testing & Handover): Built automated report generation for each state. End-to-end testing against two reporting cycles of historical data. Documentation of every component. Training for compliance and operations teams.
Five weeks. Seven source systems. Six facilities. Three states. Discovery to production.

Results: What Changed After Go-Live

Compliance reporting cut from 3 days to 2 hours
The compliance team reviews and submits. The pipeline does the assembly, reconciliation, and formatting. What used to consume three full days every two weeks now takes a focused two-hour review session.
Zero compliance discrepancies in the first four reporting cycles
Every report submitted in the first two months passed regulatory review without a single query or correction request. Previously, the team averaged 2-3 minor corrections per cycle.
Inventory discrepancies identified 6x faster
Persistent inventory variances that used to be discovered during the biweekly manual reconciliation are now flagged within 24 hours. Two significant discrepancies — both data entry errors at a processing facility — were caught and corrected within a day of occurring.
Compliance team redeployed from data assembly to strategic work
Three days of manual data work every two weeks (three of every ten working days) meant the compliance team was spending roughly 30% of their time on data assembly. That time is now spent on regulatory strategy, licence applications, and proactive audit preparation.
New facility onboarding dropped from 3 weeks to 3 days
When the client opened their seventh facility, connecting it to the pipeline took 3 days. Previously, integrating a new facility’s data into the manual reporting process took 2-3 weeks of building new spreadsheets and training.
Full audit trail from seed to sale
Every data point — from planting to sale — is traceable through the pipeline. When a regulator asks “show me the chain of custody for this batch,” the compliance team can produce the complete record in minutes, not hours.

What This Pipeline Replaced (And What It Didn't)

What it replaced:
What it didn’t replace:
The pipeline is infrastructure, not a product. The compliance team uses the same regulatory portals they always used. What changed is the data layer underneath — the manual, fragile, person-dependent process of getting data from seven systems into a usable, trustworthy format.

When an Automated Data Pipeline Makes Sense for Cannabis Operators

You operate 3+ facilities
Below three facilities, a well-organised spreadsheet process can work. Above three, the fragmentation becomes structural and the manual reconciliation time grows nonlinearly.
Your compliance team spends more than 2 days per reporting cycle on data assembly
If the time is going to pulling, cleaning, and formatting — not analysis and review — the process is the bottleneck, not the team.
You’ve had a compliance discrepancy in the last 12 months
Even a minor one. If the process is fragile enough to produce errors, it’ll produce bigger ones as you scale.
You’re expanding to new states or new facilities
Every new facility and every new state multiplies the complexity. If onboarding a new facility takes weeks instead of days, the architecture can’t scale with the business.
Inventory reconciliation depends on specific people
If two people leaving would create a compliance crisis, the knowledge isn’t in a system — it’s in their heads. That’s a risk, not a process.

Frequently Asked Questions

Does this replace our seed-to-sale tracking system (Metrc, BioTrack, etc.)?
No. The pipeline integrates with your state-mandated tracking system: it reads data from it and validates against it. You continue using Metrc, or whichever system your state requires, exactly as before.
How does the pipeline handle different reporting requirements in each state?
Each state’s reporting requirements are configured separately in the reporting layer. The pipeline maintains a unified data model underneath, but the reports it generates are formatted to each state’s specific requirements. When you expand to a new state, we add the state configuration, typically 2-3 days of work.
What happens if we switch POS systems?
The ingestion layer is built with modular connectors. Replacing a POS system means building a new connector to the new system (typically 1-2 days), not rebuilding the pipeline. The transformation, warehouse, and reporting layers don’t change.
Where does our data live, and who can access it?
The pipeline runs within your cloud environment. No operational data leaves your infrastructure. Access is role-based: the compliance team sees compliance data, operations sees operational data, and financial data is restricted to finance and leadership.
How often is the data updated?
It depends on the source. Metrc syncs every 2 hours. POS systems sync every 30 minutes during operating hours. Lab results are ingested as they arrive. Financial data syncs daily. These frequencies are configurable based on your needs.
Can we add new data sources later?
Yes. The ingestion layer is designed to be extensible. Adding a new data source typically takes 1-3 days depending on the source’s API capabilities.
How much does a pipeline like this cost?
It depends on the number of facilities, source systems, and states. A 3-facility pipeline with standard reporting typically costs between £25k and £45k. A 6+ facility multi-state build is £40-70k. We’ll give you a clear number after the discovery call, with no surprise invoices.
What if we only operate one or two facilities?
The pipeline still makes sense if your compliance reporting is consuming more than 2 days per cycle or if you’re planning to expand. For operators with 1-2 facilities and straightforward reporting, a well-organised manual process may still be sufficient, and we’ll tell you that honestly.