Cannabis is one of the most data-intensive regulated industries in the world. Every plant has to be
tracked from seed to sale. Every transfer between facilities has to be logged. Every gram of inventory
has to reconcile against state or provincial reporting systems. And if the numbers don’t match, the
consequence isn’t always just a fine. It can be a licence revocation.
Most cannabis operators know this. What they don’t have is the data infrastructure to handle it at scale.
One of our clients — a multi-state cannabis operator running six cultivation, processing, and dispensary
facilities — was managing all of this with spreadsheets, manual exports, and a compliance team that
spent three full days every reporting cycle pulling data from disconnected systems, reconciling it by
hand, and formatting reports for regulators.
They weren’t doing anything wrong. The spreadsheets were accurate — most of the time. But the
process was fragile, slow, and entirely dependent on two people who knew where everything lived.
When the business grew from three facilities to six, the process didn’t scale. The compliance team went
from “stretched” to “one mistake away from a regulatory issue.”
They came to Exillar not for a dashboard or a new tool, but for the data layer underneath — an
automated pipeline that would connect their seed-to-sale tracking, inventory management, and
compliance reporting into a single, reliable system.
This is what we built, how it works, and what changed.
The Problem: Six Facilities, Seven Data Sources, Zero Automation
The client’s data problem wasn’t complexity — it was fragmentation. Every facility had its own systems,
its own exports, and its own way of tracking things.
1. Seed-to-sale tracking system (Metrc)
The state-mandated tracking platform. Every plant, every transfer, every sale has to be logged here.
But Metrc is a compliance tool, not an analytics platform. Getting data out of it for reporting or
reconciliation required manual CSV exports.
2. Point-of-sale systems across four dispensaries
Each dispensary ran its own POS. Sales data, inventory movements, and customer transaction records
lived in four separate databases with four different schemas.
3. Cultivation management software
Two cultivation facilities used different grow management platforms. Plant health data, harvest yields,
and batch tracking were siloed in each.
4. Inventory management spreadsheets
Processing and packaging inventory was tracked in Excel. Updated manually. Version control was
“whoever saved last wins.”
5. Accounting system
Financial data — COGS, revenue by facility, tax obligations — lived in QuickBooks. Reconciling
financial data against operational data required manual cross-referencing.
6. Lab testing results
Third-party lab results for potency, terpenes, and contaminants came in as PDFs and were manually
entered into spreadsheets for batch tracking.
7. State reporting templates
Each state had its own reporting format, its own data requirements, and its own submission schedule.
The compliance team rebuilt reports from scratch for each jurisdiction every cycle.
The result: a compliance team of three spending three full days per reporting cycle — roughly every two
weeks — manually pulling, cleaning, reconciling, and formatting data from seven sources across six
facilities. The process worked. Until it didn’t scale.
Why Cannabis Data Infrastructure Is Different From Every Other Industry
Cannabis operators face data challenges that don’t exist in most other regulated industries.
Understanding these constraints shaped every architectural decision in the pipeline.
Seed-to-sale traceability is legally mandatory
Unlike most supply chains where traceability is a best practice, cannabis traceability is a legal
requirement. Every plant must be tracked from the moment it’s planted to the moment it’s sold to a
customer. Gaps in the chain aren’t operational problems — they’re compliance violations.
Multi-state operators face different regulations in every market
A six-facility operator across three states has to comply with three different regulatory frameworks,
three different reporting formats, and three different data submission requirements. There’s no federal
standard. Every state is different.
Inventory discrepancies trigger audits
In most industries, a 2% inventory variance is a rounding error. In cannabis, any discrepancy between
physical inventory and what’s reported in the seed-to-sale system can trigger a regulatory audit. The
tolerance for error is effectively zero.
Data lives in state-mandated systems the operator doesn’t control
Metrc, BioTrack, and other seed-to-sale platforms are mandated by the state. Operators have to use
them, but they don’t control the data model, the export format, or the API capabilities. Building a
pipeline on top of these systems means working within constraints you can’t change.
Financial data and operational data have to reconcile perfectly
Tax authorities in legal cannabis markets require that financial reporting aligns with seed-to-sale
tracking data. Revenue reported to the state tax authority has to match the sales reported to the
cannabis regulatory body. If they don’t — both agencies come asking questions.
These constraints mean cannabis data infrastructure can’t be built with generic pipeline tools and
default configurations. Every decision — ingestion frequency, validation rules, reconciliation logic, error
handling — has to account for the regulatory reality.
What We Built: Architecture of the Supply Chain Data Pipeline
The pipeline has four layers, each designed to solve a specific part of the data fragmentation problem.
Layer 1 — Data Ingestion Layer
Automated connectors to all seven source systems:
- Metrc API integration
- POS system database connectors (four dispensaries, two different POS platforms)
- Cultivation management software API connections
- Automated ingestion of Excel-based inventory files
- QuickBooks API for financial data
- PDF parsing for lab test results (OCR + structured extraction)
- State reporting templates, reverse-engineered for automated formatting
Data is pulled on a schedule matched to each source’s update frequency.
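A minimal sketch of the structured-extraction half of the lab-result step follows. It assumes OCR has already turned the PDF into plain text. The field labels (`Batch ID`, `Total THC`, `Total CBD`, `Overall Result`) are hypothetical. Real certificates differ from lab to lab, so each lab would likely need its own pattern set.

```python
import re
from dataclasses import dataclass

# Hypothetical label patterns; real certificates vary by lab.
PATTERNS = {
    "batch_id": re.compile(r"Batch\s*(?:ID|#)?\s*[:\-]\s*([A-Z0-9\-]+)", re.IGNORECASE),
    "thc_pct": re.compile(r"Total\s+THC\s*[:\-]?\s*([\d.]+)\s*%", re.IGNORECASE),
    "cbd_pct": re.compile(r"Total\s+CBD\s*[:\-]?\s*([\d.]+)\s*%", re.IGNORECASE),
    "result": re.compile(r"Overall\s+Result\s*[:\-]?\s*(PASS|FAIL)", re.IGNORECASE),
}


@dataclass
class LabResult:
    batch_id: str
    thc_pct: float
    cbd_pct: float
    passed: bool


def extract_lab_result(ocr_text: str) -> LabResult:
    """Turn OCR'd certificate text into a structured record, or raise if a field is missing."""
    found = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        if match is None:
            # Missing fields go to manual review instead of being guessed.
            raise ValueError(f"could not find {field!r} in lab certificate text")
        found[field] = match.group(1)
    return LabResult(
        batch_id=found["batch_id"],
        thc_pct=float(found["thc_pct"]),
        cbd_pct=float(found["cbd_pct"]),
        passed=found["result"].upper() == "PASS",
    )
```

The function raises on a missing field rather than guessing. That way an unreadable certificate goes to a person for review, and no silent gap is written into the batch record.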
Layer 2 — Transformation and Normalisation Layer
Raw data from seven sources arrives in seven different formats. This layer normalises everything into a
unified data model: plant lifecycle records from Metrc and cultivation software are merged into a single
timeline per plant, inventory records are reconciled into a single inventory view per facility, financial
records are mapped to operational data by facility and batch, and lab results are linked to specific
batches and harvests. Every transformation is documented and testable. No black boxes.
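The plant-timeline merge can be illustrated with the sketch below. It rests on two simplifying assumptions: both systems share the state-issued plant tag as a join key, and a duplicated event carries identical timestamps in both systems. In practice, matching usually needs a tolerance window. `PlantEvent` and `merge_timelines` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PlantEvent:
    plant_tag: str        # state-issued plant tag, assumed to be the join key
    occurred_at: datetime
    event_type: str       # e.g. "planted", "moved", "harvested"
    source: str           # "metrc" or "cultivation"


def merge_timelines(metrc_events, cultivation_events):
    """Merge both systems' events into one time-ordered timeline per plant.

    When both systems report the same event type for the same plant at the same
    timestamp, the Metrc record is kept, because Metrc is the legal record.
    """
    merged = {}
    # Metrc events are added first, so they win on (tag, time, type) collisions.
    for event in list(metrc_events) + list(cultivation_events):
        key = (event.plant_tag, event.occurred_at, event.event_type)
        merged.setdefault(key, event)

    timelines = defaultdict(list)
    for event in merged.values():
        timelines[event.plant_tag].append(event)
    for events in timelines.values():
        events.sort(key=lambda e: e.occurred_at)
    return dict(timelines)
```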
Layer 3 — Data Warehouse
A single, structured source of truth storing complete plant lifecycle data (seed to sale) for every plant
across all facilities, near-real-time inventory by facility and batch, financial data reconciled against
operational data, lab testing results linked to batches, and historical compliance reports for audit trail.
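The warehouse can be pictured as a few tables keyed by plant tag and batch. The sketch below uses SQLite only so that it runs anywhere. The table and column names are hypothetical and are not the client's schema. `chain_of_custody` returns every recorded event for every plant that fed a given batch, which is the kind of lookup a chain-of-custody request from a regulator needs.

```python
import sqlite3

# Hypothetical schema: one row per plant, one row per lifecycle event.
SCHEMA = """
CREATE TABLE plants (
    plant_tag   TEXT PRIMARY KEY,
    strain      TEXT NOT NULL,
    facility_id TEXT NOT NULL
);
CREATE TABLE plant_events (
    plant_tag   TEXT NOT NULL REFERENCES plants(plant_tag),
    occurred_at TEXT NOT NULL,   -- ISO-8601 timestamp, so text order = time order
    event_type  TEXT NOT NULL,   -- planted, transferred, harvested, sold, destroyed
    facility_id TEXT NOT NULL,
    batch_id    TEXT             -- set once the plant is assigned to a batch
);
CREATE TABLE lab_results (
    batch_id TEXT NOT NULL,
    thc_pct  REAL,
    passed   INTEGER NOT NULL    -- 1 = pass, 0 = fail
);
"""


def chain_of_custody(conn, batch_id):
    """Every event for every plant that went into a batch, in time order."""
    return conn.execute(
        """
        SELECT e.plant_tag, e.occurred_at, e.event_type, e.facility_id
        FROM plant_events AS e
        WHERE e.plant_tag IN (
            SELECT plant_tag FROM plant_events WHERE batch_id = ?
        )
        ORDER BY e.plant_tag, e.occurred_at
        """,
        (batch_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```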
Layer 4 — Reporting and Compliance Layer
Automated report generation for each state’s regulatory requirements. The system knows each state’s
reporting format, data requirements, and submission schedule. Reports are generated automatically,
validated against source data, and staged for the compliance team’s review before submission. The
compliance team reviews and submits. The system does the assembly.
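A per-state configuration can be as simple as a column mapping plus unit rules, applied to the same unified rows. The sketch below assumes CSV output and invents two states (`STATE_A`, `STATE_B`) with made-up headers. Real formats and submission rules come from each regulator.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class StateReportConfig:
    """Hypothetical per-state settings; real requirements come from each regulator."""
    state: str
    columns: dict[str, str]   # unified-model field -> column header the state expects
    weight_unit: str          # "g" or "kg"


def render_report(rows, config):
    """Render unified-model rows into one state's CSV layout."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(list(config.columns.values()))
    for row in rows:
        out = []
        for field in config.columns:
            value = row[field]
            # The unified model stores grams; convert only if this state wants kg.
            if field == "weight_g" and config.weight_unit == "kg":
                value = round(value / 1000, 3)
            out.append(value)
        writer.writerow(out)
    return buffer.getvalue()


# Two invented states: same data, different column order, headers, and units.
STATE_A = StateReportConfig("STATE_A", {"package_tag": "Package Tag", "weight_g": "Weight (g)"}, "g")
STATE_B = StateReportConfig("STATE_B", {"weight_g": "Net Weight (kg)", "package_tag": "Tag ID"}, "kg")
```

Because the unified data model never changes, adding a state in this design means adding a configuration, not new extraction logic.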
Seed-to-Sale Tracking: How the Pipeline Handles Plant Lifecycle Data
Seed-to-sale tracking is the backbone of cannabis compliance. The pipeline maintains a complete
lifecycle record for every plant across all facilities.
What gets tracked per plant:
- Planting date, strain, facility, and grow room
- Every transfer between facilities (cultivation to processing, processing to dispensary)
- Harvest date, wet weight, dry weight, and trim weight
- Processing records (extraction, infusion, packaging)
- Lab testing results (potency, terpenes, contaminants, pass/fail)
- Final product creation (which plants/batches went into which products)
- Sale records (which products sold, when, where, to whom)
- Waste and destruction records (unsold or failed product)
How the pipeline maintains accuracy:
Every Metrc record is cross-referenced against the cultivation management software and POS data. If
a transfer is logged in Metrc but doesn’t appear in the receiving facility’s POS system within 24 hours,
the system flags it for investigation. If a batch’s lab results show a potency that’s statistically outside the
range for that strain, the system flags it — not as an error, but as a data point that warrants verification.
These validation rules were designed with the compliance team and reflect the specific checks that
regulators perform during audits.
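The two checks described above could be sketched as follows. The 24-hour window comes from the text. The three-standard-deviation potency threshold, the field names, and the function names are assumptions made for illustration. The potency check returns a review flag, not a rejection.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev

RECEIPT_WINDOW = timedelta(hours=24)


def unreceived_transfers(metrc_transfers, pos_receipts, now):
    """Metrc transfers with no matching receipt in the receiving POS after 24 hours.

    metrc_transfers: list of dicts with "manifest_id" and "sent_at" (datetime)
    pos_receipts:    set of manifest IDs seen in the receiving facilities' POS data
    """
    return [
        t["manifest_id"]
        for t in metrc_transfers
        if t["manifest_id"] not in pos_receipts and now - t["sent_at"] > RECEIPT_WINDOW
    ]


def potency_needs_review(thc_pct, strain_history, z_limit=3.0):
    """Flag (not reject) a result more than z_limit standard deviations from the strain's history."""
    if len(strain_history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(strain_history), stdev(strain_history)
    if sigma == 0:
        return thc_pct != mu
    return abs(thc_pct - mu) / sigma > z_limit
```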
Inventory Reconciliation: Automated, Not Manual
Inventory reconciliation was the single most time-consuming task the compliance team performed
manually. The pipeline automates it.
How reconciliation works:
Every 4 hours, the system compares three views of inventory:
- Metrc inventory — What the state thinks you have (the legal record)
- POS/processing inventory — What your operational systems say you have
- Physical inventory — Entered during scheduled physical counts
Any discrepancy between these three views is flagged immediately with the specific product and
facility, the size of the discrepancy (in grams), the likely source, and a recommended action.
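Below is a minimal version of the three-way comparison. It assumes each view has already been reduced to grams keyed by `(facility_id, package_tag)`. It measures only the gap and does not attach a likely source or a recommended action, which the pipeline does. The default tolerance of zero grams is an assumption, chosen to reflect the near-zero tolerance described earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Discrepancy:
    facility_id: str
    package_tag: str
    metrc_g: float               # what Metrc (the legal record) says
    operational_g: float         # what POS/processing systems say
    physical_g: Optional[float]  # last physical count, if this item was counted

    @property
    def largest_gap_g(self) -> float:
        """Spread between the highest and lowest available views, in grams."""
        views = [self.metrc_g, self.operational_g]
        if self.physical_g is not None:
            views.append(self.physical_g)
        return max(views) - min(views)


def compare_views(metrc, operational, physical, tolerance_g=0.0):
    """Flag every (facility_id, package_tag) whose views disagree by more than tolerance_g."""
    flagged = []
    for key in sorted(set(metrc) | set(operational) | set(physical)):
        facility_id, package_tag = key
        d = Discrepancy(
            facility_id,
            package_tag,
            metrc.get(key, 0.0),        # missing from a system counts as zero on hand
            operational.get(key, 0.0),
            physical.get(key),          # None if not covered by the last count
        )
        if d.largest_gap_g > tolerance_g:
            flagged.append(d)
    return flagged
```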
Timing lag handling:
Most inventory discrepancies in cannabis aren’t theft or loss — they’re timing lags. A transfer logged in
one system hasn’t synced to another yet. The pipeline distinguishes between “expected timing lag”
discrepancies (which resolve within 24 hours) and “persistent discrepancies” (which need
investigation). Before the pipeline, the compliance team spent an entire day per cycle reconciling
inventory across six facilities by hand. Now the system does it continuously, and the team only
intervenes when a persistent discrepancy is flagged.
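Separating timing lags from persistent discrepancies mostly comes down to remembering when each discrepancy was first seen. The sketch assumes that each reconciliation run (every 4 hours) passes in the set of currently open discrepancy keys. `track_discrepancies` is a hypothetical name.

```python
from datetime import datetime, timedelta

LAG_WINDOW = timedelta(hours=24)  # discrepancies younger than this are treated as sync lag


def track_discrepancies(open_keys, first_seen, now):
    """Update first-seen times and return the keys that have become persistent.

    open_keys:  discrepancy keys found by the current reconciliation run
    first_seen: dict carried between runs, key -> datetime first flagged
    """
    # Anything no longer open resolved on its own: treat it as a timing lag and forget it.
    for key in list(first_seen):
        if key not in open_keys:
            del first_seen[key]

    persistent = []
    for key in open_keys:
        seen = first_seen.setdefault(key, now)
        if now - seen > LAG_WINDOW:
            persistent.append(key)  # still open past the window: needs investigation
    return sorted(persistent)
```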
Compliance Reporting: From 3 Days to 2 Hours
The reporting layer is where the pipeline’s value is most visible to the compliance team.
Before: 3 days of manual work per reporting cycle
- Day 1: Export data from Metrc, POS systems, cultivation software, and Excel files. Manually clean and format.
- Day 2: Cross-reference data sources. Investigate and resolve discrepancies. Rebuild reports in each state's required format.
- Day 3: Quality check reports against source data. Format for submission. Submit.
After: 2 hours of review per reporting cycle
- The pipeline assembles all required data automatically
- Reports are generated in each state's specific format
- Validation rules flag any data points that need human review
- The compliance team reviews flagged items, spot-checks the report, and submits
What the reporting layer handles per state:
| Requirement | How It's Handled |
|---|---|
| Seed-to-sale tracking reports | Auto-generated from unified plant lifecycle data |
| Inventory reconciliation reports | Auto-generated from continuous reconciliation engine |
| Sales and tax reports | Auto-generated from POS data reconciled against financial records |
| Waste and destruction reports | Auto-generated from Metrc waste records cross-referenced against processing data |
| Transfer manifests | Auto-generated from Metrc transfer data with facility-level validation |
Data Quality and Validation: Catching Errors Before Regulators Do
In cannabis, data quality isn’t a nice-to-have — it’s a compliance requirement. The pipeline includes 47
automated validation rules designed to catch errors before they reach a regulator’s desk.
Categories of validation:
Completeness checks
- Every plant has a complete lifecycle record (no gaps between planting and sale/destruction)
- Every batch has lab results attached before any product from that batch is sold
- Every transfer has matching records in both the sending and receiving facility
Accuracy checks
- Inventory quantities don't go negative (a common sign of data entry errors)
- Harvest yields are within expected ranges for the strain and grow conditions
- Sales revenue per facility reconciles against financial records within a defined tolerance
Timeliness checks
- Metrc records are updated within the legally required timeframe
- Lab results are attached to batches before product release deadlines
- Compliance reports are assembled and staged at least 48 hours before the submission deadline
Anomaly detection
- Unusual inventory movements (large transfers outside normal patterns)
- Yield variances significantly above or below historical averages
- Revenue per gram deviations by product category
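A rule registry keeps validation rules declarative and individually testable. The sketch shows two of the checks listed above. The `Rule` structure, the category labels, and the input shape are illustrative assumptions, not the pipeline's actual rule format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    category: str                       # illustrative label, e.g. "completeness"
    check: Callable[[dict], list[str]]  # returns one message per failure


def negative_inventory(data):
    # Negative quantities usually mean a data entry error somewhere upstream.
    return [f"{tag} is at {qty} g" for tag, qty in data["inventory_g"].items() if qty < 0]


def sold_without_lab_results(data):
    # Any batch with sales but no attached lab results is a completeness failure.
    missing = data["sold_batches"] - data["batches_with_results"]
    return [f"batch {b} sold before lab results attached" for b in sorted(missing)]


RULES = [
    Rule("inventory_not_negative", "accuracy", negative_inventory),
    Rule("lab_results_before_sale", "completeness", sold_without_lab_results),
]


def run_rules(data, rules=RULES):
    """Run every rule and return (rule, failure message) pairs."""
    return [(rule, msg) for rule in rules for msg in rule.check(data)]
```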
Every validation failure generates an alert to the appropriate team — operations for operational issues,
compliance for regulatory issues, finance for financial discrepancies. The alerts include the specific
data points, the rule that was violated, and a recommended action.
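Routing can be a lookup from issue type to the owning team. In this sketch the mapping mirrors the split described above. Defaulting unknown issue types to compliance is an assumption, chosen so that no alert goes unrouted. The names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical issue-type -> team mapping, following the split described above.
ROUTES = {
    "operational": "operations",
    "regulatory": "compliance",
    "financial": "finance",
}


@dataclass(frozen=True)
class Alert:
    team: str
    rule: str
    data_points: dict
    recommended_action: str


def route(issue_type, rule, data_points, recommended_action):
    """Build an alert addressed to the team that owns this kind of issue."""
    team = ROUTES.get(issue_type, "compliance")  # unknown types default to compliance (an assumption)
    return Alert(team, rule, data_points, recommended_action)
```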
The Build Process: Discovery to Production in 5 Weeks
| Week | Phase | What Happened |
|---|---|---|
| Week 1 | Discovery | Mapped all 7 source systems across 6 facilities. Documented every data flow, export format, and manual process. Defined validation rules with the compliance team. |
| Week 2 | Ingestion Layer | Built connectors to Metrc, POS systems, cultivation software, and QuickBooks. Set up automated file ingestion for Excel inventory files and PDF lab results. |
| Week 3 | Transformation & Warehouse | Built the unified data model. Created transformation logic to normalise data from all sources. Deployed the warehouse with facility-level and batch-level views. |
| Week 4 | Reconciliation & Validation | Built the automated inventory reconciliation engine. Implemented 47 validation rules. Configured alerting for discrepancies and violations. |
| Week 5 | Reporting, Testing & Handover | Built automated report generation for each state. End-to-end testing against two reporting cycles of historical data. Documentation of every component. Training for compliance and operations teams. |
Five weeks. Seven source systems. Six facilities. Three states. Discovery to production.
Results: What Changed After Go-Live
Compliance reporting cut from 3 days to 2 hours
The compliance team reviews and submits. The pipeline does the assembly, reconciliation, and
formatting. What used to consume three full days every two weeks now takes a focused two-hour
review session.
Zero compliance discrepancies in the first four reporting cycles
Every report submitted in the first two months passed regulatory review without a single query or
correction request. Previously, the team averaged 2-3 minor corrections per cycle.
Inventory discrepancies identified 6x faster
Persistent inventory variances that used to be discovered during the fortnightly manual reconciliation are
now flagged within 24 hours. Two significant discrepancies — both data entry errors at a processing
facility — were caught and corrected within a day of occurring.
Compliance team redeployed from data assembly to strategic work
Three days of manual data work every two weeks meant the compliance team was spending roughly
30% of their time on data assembly. That time is now spent on regulatory strategy, licence applications,
and proactive audit preparation.
New facility onboarding dropped from 2-3 weeks to 3 days
When the client opened their seventh facility, connecting it to the pipeline took 3 days. Previously,
integrating a new facility’s data into the manual reporting process took 2-3 weeks of building new
spreadsheets and training.
Full audit trail from seed to sale
Every data point — from planting to sale — is traceable through the pipeline. When a regulator asks
“show me the chain of custody for this batch,” the compliance team can produce the complete record in
minutes, not hours.
What This Pipeline Replaced (And What It Didn't)
What it replaced:
- Manual CSV exports from Metrc and POS systems
- Excel-based inventory tracking at processing facilities
- Manual data reconciliation across systems
- Hand-built compliance reports reformatted for each state
- The “two people who know where everything lives” dependency
What it didn’t replace:
- Metrc and state-mandated tracking systems — the pipeline reads from them, doesn't replace them
- The compliance team — they still review, verify, and submit every report
- Physical inventory counts — the pipeline reconciles against them, but someone still has to count
- Decision-making — the pipeline surfaces data; humans decide what to do with it
The pipeline is infrastructure, not a product. The compliance team uses the same regulatory portals
they always used. What changed is the data layer underneath — the manual, fragile,
person-dependent process of getting data from seven systems into a usable, trustworthy format.
When an Automated Data Pipeline Makes Sense for Cannabis Operators
You operate 3+ facilities
Below three facilities, a well-organised spreadsheet process can work. At three or more, the fragmentation
becomes structural and the manual reconciliation time grows nonlinearly.
Your compliance team spends more than 2 days per reporting cycle on data assembly
If the time is going to pulling, cleaning, and formatting — not analysis and review — the process is the
bottleneck, not the team.
You’ve had a compliance discrepancy in the last 12 months
Even a minor one. If the process is fragile enough to produce errors, it’ll produce bigger ones as you
scale.
You’re expanding to new states or new facilities
Every new facility and every new state multiplies the complexity. If onboarding a new facility takes
weeks instead of days, the architecture can’t scale with the business.
Inventory reconciliation depends on specific people
If two people leaving would create a compliance crisis, the knowledge isn’t in a system — it’s in their
heads. That’s a risk, not a process.
Frequently Asked Questions
Does this replace our seed-to-sale tracking system (Metrc, BioTrack, etc.)?
No. The pipeline integrates with your state-mandated tracking system — it reads data from it and
validates against it. You continue using Metrc or whichever system your state requires exactly as
before.
How do you handle different regulations across states?
Each state’s reporting requirements are configured separately in the reporting layer. The pipeline
maintains a unified data model underneath, but the reports it generates are formatted to each state’s
specific requirements. When you expand to a new state, we add the state configuration — typically 2-3
days of work.
What happens if our POS system or cultivation software changes?
The ingestion layer is built with modular connectors. Replacing a POS system means building a new
connector to the new system — typically 1-2 days — not rebuilding the pipeline. The transformation,
warehouse, and reporting layers don’t change.
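The modular-connector idea can be sketched as an interface that every POS connector implements. Each connector maps its vendor's fields onto the unified names. The vendor classes and field names below are invented for illustration.

```python
from abc import ABC, abstractmethod
from typing import Iterable


class PosConnector(ABC):
    """Hypothetical contract every POS connector meets; downstream layers only see this."""

    @abstractmethod
    def fetch_sales(self, since: str) -> Iterable[dict]:
        """Return sales since an ISO timestamp, already mapped to unified field names."""


class VendorAConnector(PosConnector):
    def __init__(self, rows):
        self.rows = rows  # stands in for vendor A's database or API

    def fetch_sales(self, since):
        for r in self.rows:
            if r["ts"] >= since:
                yield {"package_tag": r["tag"], "sold_at": r["ts"], "weight_g": r["grams"]}


class VendorBConnector(PosConnector):
    def __init__(self, rows):
        self.rows = rows  # vendor B reports weight in milligrams

    def fetch_sales(self, since):
        for r in self.rows:
            if r["timestamp"] >= since:
                yield {
                    "package_tag": r["item_label"],
                    "sold_at": r["timestamp"],
                    "weight_g": r["qty_mg"] / 1000,
                }


def ingest(connector: PosConnector, since: str) -> list[dict]:
    """The pipeline depends only on the interface, so swapping vendors swaps one class."""
    return list(connector.fetch_sales(since))
```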
Is our data secure?
The pipeline runs within your cloud environment. No operational data leaves your infrastructure. Access
is role-based — the compliance team sees compliance data, operations sees operational data, and
financial data is restricted to finance and leadership.
How often does the pipeline sync data?
It depends on the source. Metrc syncs every 2 hours. POS systems sync every 30 minutes during
operating hours. Lab results are ingested as they arrive. Financial data syncs daily. These frequencies
are configurable based on your needs.
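A scheduler needs only each source's interval and the time of its last successful run. The intervals below are the ones quoted above. The 09:00 to 21:00 operating window is a hypothetical stand-in for real dispensary hours. Lab results are left out because they are ingested on arrival rather than polled.

```python
from datetime import datetime, time, timedelta

# Intervals from the answer above; lab results are event-driven, so not polled here.
SYNC_INTERVALS = {
    "metrc": timedelta(hours=2),
    "pos": timedelta(minutes=30),
    "quickbooks": timedelta(days=1),
}
# Hypothetical dispensary opening hours used to gate POS syncs.
OPEN, CLOSE = time(9, 0), time(21, 0)


def due_sources(last_run, now):
    """Return the sources whose next sync is due at `now`."""
    due = []
    for source, interval in SYNC_INTERVALS.items():
        if source == "pos" and not (OPEN <= now.time() < CLOSE):
            continue  # POS only syncs during operating hours
        previous = last_run.get(source)
        if previous is None or now - previous >= interval:
            due.append(source)
    return due
```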
Can the pipeline handle additional data sources we add later?
Yes. The ingestion layer is designed to be extensible. Adding a new data source typically takes 1-3
days depending on the source’s API capabilities.
How much does this cost?
It depends on the number of facilities, source systems, and states. A 3-facility pipeline with standard
reporting typically falls between £25k and £45k. A multi-state build with six or more facilities typically runs £40k to £70k. We’ll give you a clear
number after the discovery call — no surprise invoices.
What if we only have 2-3 facilities?
The pipeline still makes sense if your compliance reporting is consuming more than 2 days per cycle or
if you’re planning to expand. For operators with 1-2 facilities and straightforward reporting, a
well-organised manual process may still be sufficient — and we’ll tell you that honestly.