Every company over 50 people has the same problem — institutional knowledge is scattered across
drives, inboxes, Slack threads, and the heads of three people who’ve been there since 2018. When
those people leave, the knowledge walks out with them. When a new hire needs an answer, they spend
40 minutes searching before they give up and tap someone on the shoulder.
This isn’t a culture problem. It’s an architecture problem. The information exists — it’s just not
connected, not searchable, and not structured in a way that makes retrieval instant.
One of our clients — a mid-market enterprise with 200+ employees across three offices — was living
with exactly this. SOPs in Google Drive. Onboarding docs in Notion. Technical specs in Confluence.
Policy updates buried in email threads. Training materials on a shared drive nobody could navigate.
When they came to Exillar, they weren’t looking for another tool. They’d already tried three. They
needed someone to build the layer underneath — the data architecture that would unify their scattered
knowledge into a single, searchable, AI-powered system.
This is what we built, how we built it, and what changed.
Table of Contents
- The Problem: Scattered Knowledge, Slow Retrieval, Zero Governance
- Why Off-the-Shelf Knowledge Tools Didn't Work
- What We Built: Architecture of the Knowledge Management System
- The AI Layer: How Retrieval Actually Works
- Data Ingestion: Connecting Every Source Without Breaking Workflows
- Governance and Access Control: Who Sees What, and Why It Matters
- The Build Process: Discovery to Production in 5 Weeks
- Results: What Changed After Go-Live
- What This System Replaced (And What It Didn't)
- When a Knowledge Management System Makes Sense for Your Business
- Frequently Asked Questions
The Problem: Scattered Knowledge, Slow Retrieval, Zero Governance
The client’s knowledge was spread across seven different platforms. That’s not unusual — most
mid-market companies accumulate tools organically over the years. What made it painful was the
compounding effect:
1. New hires took 3-4 weeks to become productive
Not because the work was complex, but because finding the right SOP, the right process doc, or the
right technical spec required asking three people and searching four platforms.
2. Duplicate and conflicting documents everywhere
The same onboarding guide existed in three versions across three platforms. None of them were
marked as current. Two of them contradicted each other on a compliance step.
3. No way to search across systems
Google Drive search doesn’t index Notion. Confluence search doesn’t surface Slack threads. The
company had search inside each silo, but no unified search across all of them.
4. Institutional knowledge was walking out the door
When a senior engineer left, six months of context about a critical integration went with them. The
replacement spent two months reverse-engineering what they’d built — because none of it was
documented in a place anyone could find.
5. Compliance risk from ungoverned documents
Regulated industry. No audit trail for document versions. No access controls based on role or
department. The compliance team flagged it as a risk in three consecutive quarterly reviews.
The client had tried SharePoint (too rigid), Guru (too limited), and a custom wiki (abandoned after
two months). The problem wasn’t the front-end tool. It was the absence of a data layer connecting
everything underneath.
Why Off-the-Shelf Knowledge Tools Didn't Work
The client had already evaluated and tried multiple knowledge management platforms before coming to
Exillar. Here’s why each one fell short:
| Tool | What It Promised | Where It Broke |
|---|---|---|
| SharePoint | Unified document management | Required manual migration from every other tool. Team adoption dropped to 15% within 6 weeks because the UX friction was too high. |
| Guru | AI-powered knowledge cards | Worked for small, discrete pieces of knowledge. Couldn't handle long-form SOPs, technical specs, or documents with embedded diagrams. Search quality degraded with scale. |
| Custom Wiki (Notion-based) | Flexible, team-maintained knowledge base | Required someone to manually transfer and maintain every document. No one did. The wiki was 40% complete and abandoned within 8 weeks. |
| Confluence | Team collaboration and documentation | Already in use for engineering docs. But marketing, operations, and HR refused to adopt it. Became another silo instead of the single source of truth. |
The pattern was consistent: every tool required the organisation to change how it worked in order to fit
the tool’s model. None of them could ingest knowledge from where it already lived and make it
searchable without forcing a migration.
That’s the gap Exillar filled — not with another tool, but with a data architecture layer that sits behind
whatever tools the team already uses.
What We Built: Architecture of the Knowledge Management System
The system has four layers. Each one was built to solve a specific part of the knowledge retrieval
problem.
Layer 1 — Data Ingestion Layer
Connectors to Google Drive, Notion, Confluence, Slack (archived channels), SharePoint, email
archives, and the company’s shared network drive. Documents are ingested in their native format,
parsed, and normalised into a unified schema.
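As a rough illustration of what a unified schema could look like, the sketch below maps one source payload onto a common record. The `KnowledgeDocument` fields, the `normalise_drive_file` helper, and the payload keys are all assumptions made for this example, not the client's actual schema or the Google Drive API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class KnowledgeDocument:
    """Hypothetical unified record that every connector would emit."""
    doc_id: str            # stable ID, namespaced by source system
    source: str            # e.g. "google_drive", "notion", "confluence"
    title: str
    text: str              # parsed plain text
    author: str
    modified_at: datetime  # always stored in UTC
    allowed_groups: frozenset = field(default_factory=frozenset)


def normalise_drive_file(raw: dict) -> KnowledgeDocument:
    """Map a made-up Drive-style payload onto the unified schema."""
    modified = datetime.fromisoformat(raw["modifiedTime"]).astimezone(timezone.utc)
    return KnowledgeDocument(
        doc_id=f"google_drive:{raw['id']}",
        source="google_drive",
        title=raw["name"].strip(),
        text=raw["body"],
        author=raw.get("owner", "unknown"),
        modified_at=modified,
        allowed_groups=frozenset(raw.get("sharedWith", [])),
    )
```

Each connector would have its own `normalise_*` function, so everything downstream only ever sees one record shape.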
Layer 2 — Processing and Embedding Layer
Every document is chunked into semantically meaningful sections rather than at arbitrary page breaks. Each
chunk is embedded using a vector embedding model optimised for enterprise document retrieval.
Metadata is extracted automatically: author, department, document type, last modified date, version,
and access classification.
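One simple way to approximate section-aware chunking is to split on headings first and only then on paragraph boundaries. The sketch below assumes the parsed text uses Markdown-style `#` headings; the function name and the `max_chars` limit are illustrative choices, not details from the build.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def chunk_by_heading(text: str, max_chars: int = 1200) -> list[dict]:
    """Split on headings, then split oversized sections on blank lines."""
    sections, title, buf = [], "Introduction", []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            if any(s.strip() for s in buf):
                sections.append((title, "\n".join(buf).strip()))
            title, buf = match.group(1).strip(), []
        else:
            buf.append(line)
    if any(s.strip() for s in buf):
        sections.append((title, "\n".join(buf).strip()))

    chunks = []
    for title, body in sections:
        current = ""
        for para in re.split(r"\n\s*\n", body):
            # Start a new chunk when adding this paragraph would exceed the limit.
            if current and len(current) + len(para) + 2 > max_chars:
                chunks.append({"section": title, "text": current})
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            chunks.append({"section": title, "text": current})
    return chunks
```

Keeping the section title on every chunk is what later allows an answer to cite a specific section rather than a whole document.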
Layer 3 — AI Retrieval Layer (RAG Architecture)
When a user asks a question, the system uses Retrieval-Augmented Generation (RAG) to find the most
relevant document chunks, assemble context, and generate a precise answer with source citations.
The user sees the answer and the exact document, section, and paragraph it came from.
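The retrieval half of a RAG pipeline can be sketched in a few lines. This is a toy version under stated assumptions: the vectors in `INDEX` are hand-written stand-ins for real embeddings, cosine similarity is one common ranking choice rather than a confirmed detail of this system, and the language model call that would consume the prompt is left out.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k=2):
    """Return the k most similar chunks, each carrying its citation."""
    scored = [(cosine(query_vec, c["vector"]), c) for c in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {"score": round(s, 3), "text": c["text"], "citation": f'{c["doc"]} > {c["section"]}'}
        for s, c in scored[:k]
    ]


def build_prompt(question, hits):
    """Assemble retrieved chunks into a prompt that asks for numbered citations."""
    context = "\n".join(
        f'[{i}] ({h["citation"]}) {h["text"]}' for i, h in enumerate(hits, 1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"{context}\n\nQuestion: {question}"
    )


# Toy index: three chunks with made-up three-dimensional "embeddings".
INDEX = [
    {"doc": "UK Client Onboarding SOP", "section": "Step 2",
     "text": "Collect signed terms before account setup.", "vector": [0.9, 0.1, 0.0]},
    {"doc": "Expenses Policy", "section": "Limits",
     "text": "Claims above the limit need approval.", "vector": [0.1, 0.9, 0.0]},
    {"doc": "VPN Setup Guide", "section": "Install",
     "text": "Install the client from the portal.", "vector": [0.0, 0.2, 0.9]},
]
```

Calling `retrieve([1.0, 0.0, 0.0], INDEX)` ranks the onboarding chunk first, and `build_prompt` keeps the citation attached to each piece of context so the generated answer can point back to it.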
Layer 4 — Governance and Access Control Layer
Every document inherits access permissions from its source system. If a document was restricted to
the HR team in Google Drive, it remains restricted in the knowledge system. Role-based access control
is enforced at query time — the AI only retrieves documents the user is authorised to see.
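Query-time enforcement can be as simple as filtering candidate chunks against the user's groups before anything reaches the language model. The sketch below is a minimal version of that idea; the `allowed_groups` field and the group names are hypothetical.

```python
def authorised_chunks(candidates, user_groups):
    """Keep only chunks whose inherited ACL shares a group with the user."""
    groups = set(user_groups)
    return [c for c in candidates if groups & set(c["allowed_groups"])]


CANDIDATES = [
    {"doc": "Salary Bands 2024", "allowed_groups": ["hr"]},
    {"doc": "Office Fire Procedure", "allowed_groups": ["all-staff"]},
]

# An engineer in "all-staff" and "engineering" never sees the HR-only chunk.
visible = authorised_chunks(CANDIDATES, ["all-staff", "engineering"])
```

Filtering before context assembly matters: a restricted chunk that never enters the prompt cannot leak into a generated answer.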
No new tools for the team to learn. No migration required. The system connects to where knowledge
already lives and makes all of it searchable from a single interface.
The AI Layer: How Retrieval Actually Works
This isn’t keyword search with a chatbot wrapper. The retrieval system was built to handle the way
people actually ask questions in a workplace:
Natural language queries
“What’s the process for onboarding a new client in the UK?” returns the exact SOP section, not a list of
40 documents with “onboarding” in the title.
Cross-source answers
A single query can pull context from a Notion doc, a Confluence page, and an archived Slack thread —
and synthesise them into a single, coherent answer.
Source citations on every answer
Every response includes clickable links to the original documents. Users can verify the answer against
the source in seconds.
Confidence scoring
When the system isn’t confident in its answer, it says so. It surfaces the closest relevant documents and
tells the user: “I found these related documents, but I’m not certain they fully answer your question.” No
hallucinated answers presented as fact.
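The article does not say how confidence is computed, so the following is only one plausible sketch: it treats the best retrieval score as a proxy for confidence and falls back to listing related documents below a threshold. The `respond` function, the `0.75` cut-off, and the hit structure are assumptions.

```python
FALLBACK = (
    "I found these related documents, but I am not certain "
    "they fully answer your question."
)


def respond(hits, draft_answer, threshold=0.75):
    """Return the drafted answer only when the best score clears the threshold."""
    best = max((h["score"] for h in hits), default=0.0)
    sources = [h["citation"] for h in hits]
    if best >= threshold:
        return {"mode": "answer", "text": draft_answer, "sources": sources}
    # Low confidence: surface the closest documents instead of an answer.
    return {"mode": "related_documents", "text": FALLBACK, "sources": sources}
```

Either way the sources are returned, so the user can always check the underlying documents.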
Automatic staleness detection
If a document hasn’t been updated in 12+ months and is being cited in answers, the system flags it for
review. The compliance and operations teams get a weekly digest of potentially stale documents being
actively used.
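The staleness rule described above combines two signals: document age and recent citations. A minimal sketch, assuming a `last_modified` date per document and a citation count taken from query logs (both hypothetical structures), with 365 days standing in for "12 months":

```python
from datetime import date, timedelta


def stale_but_cited(docs, citation_counts, today, max_age_days=365):
    """Flag documents that are old and still being cited in answers."""
    cutoff = today - timedelta(days=max_age_days)
    flagged = [
        d for d in docs
        if d["last_modified"] <= cutoff and citation_counts.get(d["doc_id"], 0) > 0
    ]
    # Most-cited first, so the weekly digest leads with the riskiest documents.
    return sorted(flagged, key=lambda d: citation_counts[d["doc_id"]], reverse=True)
```

Old documents that nobody cites are left alone, which keeps the digest short enough to act on.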
The RAG pipeline runs on the client’s own infrastructure. No document content leaves their
environment. The embedding model and language model both run within their cloud tenant — a
non-negotiable requirement for their compliance posture.
Data Ingestion: Connecting Every Source Without Breaking Workflows
The ingestion layer was the most technically complex part of the build. Seven source systems, each
with its own API, authentication model, and document format.
What we connected:
- Google Drive — Docs, Sheets, PDFs, Slides
- Notion — Pages, databases, embedded files
- Confluence — Spaces, pages, attachments
- Slack — Archived channels (not live channels — a deliberate governance decision)
- SharePoint — Document libraries, lists
- Network shared drive — Word docs, Excel files, PDFs, PowerPoints
- Email archives — Selected distribution list threads (HR policy updates, compliance notices)
How ingestion works:
- Initial full sync on setup — every document across every source is ingested, processed, and embedded
- Incremental sync runs every 4 hours — only changed or new documents are re-processed
- Deleted documents are soft-removed from the index (flagged, not purged) to maintain an audit trail
- Document format normalisation — PDFs, DOCX, PPTX, Sheets, and Markdown are all parsed into clean text with structural metadata preserved
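The incremental sync and soft-delete behaviour in the list above can be sketched as a comparison between what the source currently holds and what the index last saw. Content hashing is used here as an assumption for change detection; real connectors may rely on modification timestamps or change feeds instead, and the `plan_sync` name and index layout are illustrative.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(source_docs: dict, index: dict) -> dict:
    """Decide what to re-process and what to soft-delete.

    source_docs maps doc_id -> current text.
    index maps doc_id -> {"hash": str, "deleted": bool} and is updated in place.
    """
    reprocess = []
    for doc_id, text in source_docs.items():
        digest = content_hash(text)
        entry = index.get(doc_id)
        # New, changed, or previously deleted documents need re-processing.
        if entry is None or entry["hash"] != digest or entry["deleted"]:
            reprocess.append(doc_id)
            index[doc_id] = {"hash": digest, "deleted": False}

    soft_deleted = []
    for doc_id, entry in index.items():
        if doc_id not in source_docs and not entry["deleted"]:
            entry["deleted"] = True  # flagged, not purged, to keep the audit trail
            soft_deleted.append(doc_id)

    return {"reprocess": sorted(reprocess), "soft_deleted": sorted(soft_deleted)}
```

Running this every 4 hours means unchanged documents cost one hash comparison each, while deleted documents stay in the index with a flag that retrieval can exclude.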
What we deliberately excluded:
- Live Slack channels (too noisy, too ephemeral — knowledge isn't in real-time chat)
- Personal email inboxes (privacy boundary)
- Draft documents (only published/shared documents are indexed)
The team kept working in the same tools they were already using. The knowledge system ingests from
those tools automatically. No behaviour change required.
Governance and Access Control: Who Sees What, and Why It Matters
For a regulated enterprise, governance isn’t optional — it’s the reason the system exists.
Permission inheritance
The system inherits access permissions from each source platform. If an HR document is shared only
with the HR team in Google Drive, only HR team members can retrieve it through the knowledge
system.
Role-based access tiers
Three tiers of access: All Staff, Department-Restricted, and Leadership-Only. Every document is
auto-classified based on its source permissions, with the option for manual override by department
leads.
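Auto-classification with a manual override might look like the sketch below. The three tier names come from the text; the mapping rules and the `all-staff` and `leadership` group names are assumptions for illustration.

```python
TIERS = ("All Staff", "Department-Restricted", "Leadership-Only")


def classify_tier(source_groups, override=None):
    """Derive an access tier from source permissions; a manual override wins."""
    if override is not None:
        if override not in TIERS:
            raise ValueError(f"unknown tier: {override}")
        return override
    groups = set(source_groups)
    if "all-staff" in groups:
        return "All Staff"
    if groups and groups <= {"leadership"}:
        return "Leadership-Only"
    # Anything else, including documents with no recorded groups,
    # falls into the restricted middle tier rather than being opened up.
    return "Department-Restricted"
```

Defaulting unknown cases to a restricted tier is a deliberate fail-safe choice in this sketch: a misclassified document is hidden from too many people rather than shown to too many.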
Audit trail
Every query is logged: who asked, what was retrieved, which documents were cited, and when. The
compliance team can audit knowledge access patterns monthly.
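A query log entry covering the four fields listed above could be serialised as one JSON line per query. The field names and the `audit_record` helper are hypothetical; only the choice of what to record comes from the text.

```python
import json
from datetime import datetime, timezone


def audit_record(user, query, retrieved_ids, cited_ids, now=None):
    """Serialise one query as a JSON line: who, what, which documents, when."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return json.dumps(
        {
            "timestamp": timestamp,
            "user": user,
            "query": query,
            "retrieved": list(retrieved_ids),
            "cited": list(cited_ids),
        },
        sort_keys=True,
    )
```

Recording retrieved and cited documents separately lets an auditor distinguish what the system looked at from what it actually showed the user.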
Document lifecycle management
Documents are tagged with a review cadence (quarterly, annually, or on-change). When a review is
overdue, the document owner gets notified. If no action is taken within 30 days, the document is
flagged as potentially stale in search results.
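The review cadence and 30-day grace period can be expressed as a small status function. This is a sketch under assumptions: quarterly is approximated as 91 days and annually as 365, and "on-change" documents are treated as always current here because their reviews are triggered by edits rather than by the calendar.

```python
from datetime import date, timedelta

CADENCE_DAYS = {"quarterly": 91, "annually": 365}


def review_status(last_reviewed, cadence, today, grace_days=30):
    """Return 'current', 'owner_notified', or 'flagged_stale' for a document."""
    if cadence == "on-change":
        return "current"
    due = last_reviewed + timedelta(days=CADENCE_DAYS[cadence])
    if today <= due:
        return "current"
    if today <= due + timedelta(days=grace_days):
        return "owner_notified"  # review overdue, owner has been told
    return "flagged_stale"       # no action within the grace period
```

For a quarterly document last reviewed on 1 January 2024, the review falls due on 1 April 2024 and the stale flag would appear in search results after 1 May 2024.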
Version control
When a document is updated in its source system, the knowledge system re-ingests it and updates the
index. Previous versions are archived but remain accessible for audit purposes.
The Build Process: Discovery to Production in 5 Weeks
| Week | Phase | What Happened |
|---|---|---|
| Week 1 | Discovery | Mapped all 7 source systems. Identified 12,000+ documents. Classified by department, type, and access tier. Defined governance rules with compliance team. |
| Week 2 | Architecture & Ingestion | Built connectors to all 7 sources. Ran initial full sync. Validated document parsing across all formats. |
| Week 3 | AI Layer Build | Deployed embedding pipeline. Built RAG retrieval system. Tuned retrieval quality against 50 test queries from each department. |
| Week 4 | Governance & Access | Implemented permission inheritance. Built audit logging. Configured staleness detection and review cadence system. |
| Week 5 | Testing, Training & Handover | End-to-end testing with real users from 4 departments. Documentation of every component. Training sessions for department leads and the compliance team. |
Five weeks. Discovery to production. The client’s team was using the system by the end of Week 5.
Results: What Changed After Go-Live
70% reduction in information retrieval time
Average time to find a specific document or answer dropped from 22 minutes to under 7 minutes. For
frequently asked questions (onboarding, compliance, process), retrieval is under 60 seconds.
New hire productivity ramp cut from 4 weeks to 10 days
New employees get instant access to every SOP, process doc, and training material — searchable by
natural language, not file names.
Zero compliance flags in the first quarterly review post-launch
The governance layer — audit trails, access controls, version management — resolved the compliance
team’s concerns completely.
43% reduction in internal “how do I do this?” Slack messages
Measured over the first 8 weeks post-launch. People ask the knowledge system instead of interrupting
colleagues.
12 stale documents identified and updated in the first month
The staleness detection system surfaced documents that were being actively referenced but hadn’t
been updated in 18+ months. Three of them contained outdated compliance procedures.
What This System Replaced (And What It Didn't)
What it replaced:
- Manual searching across 7 platforms
- “Ask Sarah, she knows where it is” as a knowledge management strategy
- Duplicate documents with no version control
- Ungoverned access to sensitive documents
What it didn’t replace:
- The source tools themselves — Google Drive, Notion, Confluence, Slack all stayed in place
- Human judgement — the AI retrieves and cites, but doesn't make decisions
- Document creation — teams still create documents in their preferred tools; the system just makes them findable
This is important: the knowledge management system is an infrastructure layer, not a destination app.
The team doesn’t “use” it the way they use Notion or Confluence. They search it when they need an
answer, and the system pulls from everywhere.
When a Knowledge Management System Makes Sense for Your Business
You have 100+ employees and knowledge is spread across 3+ platforms
Below that threshold, a well-maintained Notion workspace or Confluence instance is usually enough.
Above it, the fragmentation becomes structural.
New hires take more than 2 weeks to become productive
If onboarding drag is measurable in weeks, the problem is usually information access, not training
quality.
You’ve tried and abandoned 2+ knowledge tools already
The pattern of tool-hopping usually means the problem isn’t the tool — it’s the layer underneath.
You’re in a regulated industry
If auditors ask “who accessed what, when, and was it the current version?” and you can’t answer — a
governed knowledge system isn’t optional.
Institutional knowledge is concentrated in a few people
If three people leaving would create a knowledge crisis, the knowledge isn’t managed — it’s hostage.
Frequently Asked Questions
How long does it take to build a knowledge management system like this?
Most builds take 4-6 weeks from discovery to production. The timeline depends on how many source
systems need to be connected and the complexity of your governance requirements. We’ve built these
in as few as 3 weeks for simpler environments.
Do we need to migrate our documents to a new platform?
No. The entire point of this architecture is that your team keeps working in the tools they already use.
The system connects to your existing platforms and indexes documents where they live.
Is the AI retrieval accurate enough to trust?
The RAG pipeline includes confidence scoring and source citations on every answer. When the system
isn’t confident, it tells you. Every answer links back to the original document so users can verify
instantly.
What happens to our data? Does it leave our environment?
No. The embedding model and retrieval system run within your cloud tenant. No document content is
sent to external APIs. This is a non-negotiable part of every build we do.
Can we control who sees what?
Yes. The system inherits access permissions from your source platforms and adds role-based access
tiers. If a document is restricted in Google Drive, it stays restricted in the knowledge system.
What if we add a new source system later?
The ingestion layer is built to be extensible. Adding a new connector typically takes 1-2 days of
development, depending on the source system’s API.
How much does a knowledge management system cost?
It depends on scope — number of source systems, document volume, governance complexity, and
whether AI retrieval is included. Most builds fall between £20k and £50k. We’ll give you a clear number after
the discovery call.