How We Built a Centralised Knowledge Management System That Cut Information Retrieval Time by 70%

Every company over 50 people has the same problem — institutional knowledge is scattered across drives, inboxes, Slack threads, and the heads of three people who’ve been there since 2018. When those people leave, the knowledge walks out with them. When a new hire needs an answer, they spend 40 minutes searching before they give up and tap someone on the shoulder.

This isn’t a culture problem. It’s an architecture problem. The information exists — it’s just not connected, not searchable, and not structured in a way that makes retrieval instant.

One of our clients — a mid-market enterprise with 200+ employees across three offices — was living with exactly this. SOPs in Google Drive. Onboarding docs in Notion. Technical specs in Confluence. Policy updates buried in email threads. Training materials on a shared drive nobody could navigate.

When they came to Exillar, they weren’t looking for another tool. They’d already tried three. They needed someone to build the layer underneath — the data architecture that would unify their scattered knowledge into a single, searchable, AI-powered system.

This is what we built, how we built it, and what changed.

The Problem: Scattered Knowledge, Slow Retrieval, Zero Governance

The client’s knowledge was spread across seven different platforms. That’s not unusual — most mid-market companies accumulate tools organically over the years. What made it painful was the compounding effect:

1. New hires took 3-4 weeks to become productive

Not because the work was complex, but because finding the right SOP, the right process doc, or the right technical spec required asking three people and searching four platforms.

2. Duplicate and conflicting documents everywhere

The same onboarding guide existed in three versions across three platforms. None of them were marked as current. Two of them contradicted each other on a compliance step.

3. No way to search across systems

Google Drive search doesn’t index Notion. Confluence search doesn’t surface Slack threads. The company had search inside each silo, but no unified search across all of them.

4. Institutional knowledge was walking out the door

When a senior engineer left, six months of context about a critical integration went with them. The replacement spent two months reverse-engineering what they’d built — because none of it was documented in a place anyone could find.

5. Compliance risk from ungoverned documents

Regulated industry. No audit trail for document versions. No access controls based on role or department. The compliance team flagged it as a risk in three consecutive quarterly reviews.

The client had tried SharePoint (too rigid), Guru (too limited), and a custom wiki (abandoned after two months). The problem wasn’t the front-end tool; it was the absence of a data layer connecting everything underneath.

Why Off-the-Shelf Knowledge Tools Didn't Work

The client had already evaluated and tried multiple knowledge management platforms before coming to Exillar. Here’s why each one fell short:

Tool	What It Promised	Where It Broke
SharePoint	Unified document management	Required manual migration from every other tool. Team adoption dropped to 15% within 6 weeks because the UX friction was too high.
Guru	AI-powered knowledge cards	Worked for small, discrete pieces of knowledge. Couldn't handle long-form SOPs, technical specs, or documents with embedded diagrams. Search quality degraded with scale.
Custom Wiki (Notion-based)	Flexible, team-maintained knowledge base	Required someone to manually transfer and maintain every document. No one did. The wiki was 40% complete and abandoned within 8 weeks.
Confluence	Team collaboration and documentation	Already in use for engineering docs. But marketing, operations, and HR refused to adopt it. Became another silo instead of the single source of truth.

The pattern was consistent: every tool required the organisation to change how it worked in order to fit the tool’s model. None of them could ingest knowledge from where it already lived and make it searchable without forcing a migration.

That’s the gap Exillar filled — not with another tool, but with a data architecture layer that sits behind whatever tools the team already uses.

What We Built: Architecture of the Knowledge Management System

The system has four layers. Each one was built to solve a specific part of the knowledge retrieval problem.

Layer 1 — Data Ingestion Layer

Connectors to Google Drive, Notion, Confluence, Slack (archived channels), SharePoint, email archives, and the company’s shared network drive. Documents are ingested in their native format, parsed, and normalised into a unified schema.
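To make "normalised into a unified schema" concrete, here is a minimal sketch of what such a schema and one connector mapping could look like. It is an illustration rather than the production code: the field names, the `normalise_notion_page` helper, and the shape of the input payload are all assumptions (the payload is a simplified stand-in, not the real Notion API response).

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class NormalisedDocument:
    """One document after parsing, whichever source it came from (illustrative fields)."""
    source: str              # e.g. "google_drive", "notion", "confluence"
    source_id: str           # the document's ID inside its source system
    title: str
    text: str                # clean text extracted from the native format
    last_modified: datetime
    allowed_groups: set[str] = field(default_factory=set)


def normalise_notion_page(page: dict) -> NormalisedDocument:
    """Map a hypothetical, already-fetched page payload onto the shared schema."""
    return NormalisedDocument(
        source="notion",
        source_id=page["id"],
        title=page["title"].strip(),
        # Drop empty blocks and trim whitespace so downstream chunking sees clean text.
        text="\n".join(b.strip() for b in page["blocks"] if b.strip()),
        last_modified=datetime.fromisoformat(page["edited"]),
        allowed_groups=set(page.get("groups", [])),
    )
```

Each connector would have its own mapping function, but everything downstream only ever sees `NormalisedDocument`.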

Layer 2 — Processing and Embedding Layer

Every document is chunked into semantically meaningful sections, not arbitrary page breaks. Each chunk is embedded using a vector embedding model optimised for enterprise document retrieval. Metadata is extracted automatically: author, department, document type, last modified date, version, and access classification.
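As a rough sketch of section-based chunking, the function below splits text at heading lines instead of fixed page or character counts, and copies the document's metadata onto every chunk. It assumes the parser has preserved headings as lines starting with `#`; that convention, the `chunk_by_heading` name, and the "Preamble" label for text before the first heading are assumptions for illustration.

```python
def chunk_by_heading(text: str, metadata: dict) -> list[dict]:
    """Split text at heading lines so each chunk covers one section."""
    chunks: list[dict] = []
    heading = "Preamble"          # label for any text before the first heading
    body: list[str] = []

    def flush() -> None:
        # Emit the section collected so far, skipping sections with no content.
        content = "\n".join(body).strip()
        if content:
            chunks.append({"heading": heading, "text": content, **metadata})

    for line in text.splitlines():
        if line.startswith("#"):
            flush()
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    flush()                        # emit the final section
    return chunks
```

Each resulting chunk would then be passed to the embedding model, with the metadata stored alongside the vector.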

Layer 3 — AI Retrieval Layer (RAG Architecture)

When a user asks a question, the system uses Retrieval-Augmented Generation (RAG) to find the most relevant document chunks, assemble context, and generate a precise answer with source citations. The user sees the answer and the exact document, section, and paragraph it came from.
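The "assemble context" step can be pictured as building a prompt in which every retrieved chunk carries a numbered source label, so the generated answer can cite it. The sketch below is an assumption about how that might look; the `build_prompt` name, the chunk keys, and the instruction wording are illustrative.

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Number each retrieved chunk so the answer can cite it as [n]."""
    context = "\n\n".join(
        f"[{i}] ({c['source']}, {c['section']})\n{c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered context below and cite sources as [n].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Because the numbering is deterministic, each `[n]` in the model's answer can be mapped back to the exact document and section shown to the user.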

Layer 4 — Governance and Access Control Layer

Every document inherits access permissions from its source system. If a document was restricted to the HR team in Google Drive, it remains restricted in the knowledge system. Role-based access control is enforced at query time — the AI only retrieves documents the user is authorised to see.
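Enforcing access at query time means the permission filter runs before ranking, so restricted text never reaches the language model. A minimal sketch under stated assumptions: each chunk carries an `allowed_groups` set inherited from its source, and a simple word-overlap score stands in for real vector similarity.

```python
def retrieve(query: str, chunks: list[dict], user_groups: set[str], top_k: int = 3) -> list[dict]:
    """Rank only the chunks this user is authorised to see."""
    # 1. Permission filter first: keep chunks whose groups overlap the user's.
    visible = [c for c in chunks if c["allowed_groups"] & user_groups]

    # 2. Rank what remains (word overlap is a placeholder for embedding similarity).
    terms = set(query.lower().split())
    scored = [(len(terms & set(c["text"].lower().split())), c) for c in visible]
    scored = [(score, c) for score, c in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]
```

Filtering after generation would be too late: by then the model has already read the restricted content.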

No new tools for the team to learn. No migration required. The system connects to where knowledge already lives and makes all of it searchable from a single interface.

The AI Layer: How Retrieval Actually Works

This isn’t keyword search with a chatbot wrapper. The retrieval system was built to handle the way people actually ask questions in a workplace:

Natural language queries

“What’s the process for onboarding a new client in the UK?” returns the exact SOP section, not a list of 40 documents with “onboarding” in the title.

Cross-source answers

A single query can pull context from a Notion doc, a Confluence page, and an archived Slack thread — and synthesise them into a single, coherent answer.

Source citations on every answer

Every response includes clickable links to the original documents. Users can verify the answer against the source in seconds.

Confidence scoring

When the system isn’t confident in its answer, it says so. It surfaces the closest relevant documents and tells the user: “I found these related documents, but I’m not certain they fully answer your question.” No hallucinated answers presented as fact.
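One simple way to implement this kind of behaviour is a similarity threshold: if the best-matching chunk scores below it, the system returns related documents instead of a generated answer. The sketch below assumes cosine similarity over embedding vectors and a threshold of 0.75; both the threshold value and the function names are illustrative assumptions, and a real system would tune the cutoff against test queries.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def answer_or_fallback(query_vec: list[float], chunks: list[dict], threshold: float = 0.75) -> dict:
    """Return answer context when confident, otherwise the closest related sources."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    if not ranked or cosine(query_vec, ranked[0]["vector"]) < threshold:
        return {
            "confident": False,
            "message": "I found these related documents, but I'm not certain "
                       "they fully answer your question.",
            "sources": [c["source"] for c in ranked[:3]],
        }
    best = ranked[0]
    return {"confident": True, "context": best["text"], "sources": [best["source"]]}
```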

Automatic staleness detection

If a document hasn’t been updated in 12+ months and is being cited in answers, the system flags it for review. The compliance and operations teams get a weekly digest of potentially stale documents being actively used.
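The staleness rule combines two signals: age and active use. A minimal sketch, assuming each document records a `last_modified` timestamp and that citation counts are tallied from answer logs; the names are hypothetical and 12 months is approximated as 365 days.

```python
from datetime import datetime, timedelta


def stale_but_cited(documents: list[dict], citation_counts: dict, now: datetime,
                    max_age_days: int = 365) -> list[dict]:
    """Documents older than the cutoff that are still being cited, most-cited first."""
    cutoff = now - timedelta(days=max_age_days)
    flagged = [
        d for d in documents
        if d["last_modified"] < cutoff and citation_counts.get(d["id"], 0) > 0
    ]
    return sorted(flagged, key=lambda d: citation_counts[d["id"]], reverse=True)
```

Sorting by citation count puts the documents doing the most potential damage at the top of the weekly digest.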

The RAG pipeline runs on the client’s own infrastructure. No document content leaves their environment. The embedding model and language model both run within their cloud tenant — a non-negotiable requirement for their compliance posture.

Data Ingestion: Connecting Every Source Without Breaking Workflows

The ingestion layer was the most technically complex part of the build. Seven source systems, each with its own API, authentication model, and document format.

What we connected:

Google Drive — Docs, Sheets, PDFs, Slides
Notion — Pages, databases, embedded files
Confluence — Spaces, pages, attachments
Slack — Archived channels (not live channels — a deliberate governance decision)
SharePoint — Document libraries, lists
Network shared drive — Word docs, Excel files, PDFs, PowerPoints
Email archives — Selected distribution list threads (HR policy updates, compliance notices)

How ingestion works:

Initial full sync on setup — every document across every source is ingested, processed, and embedded
Incremental sync runs every 4 hours — only changed or new documents are re-processed
Deleted documents are soft-removed from the index (flagged, not purged) to maintain audit trail
Document format normalisation — PDFs, DOCX, PPTX, Sheets, and Markdown are all parsed into clean text with structural metadata preserved
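The incremental sync and soft-delete steps above can be sketched as a comparison between the index and the current state of a source. This version detects changes by hashing content, which is an assumption for illustration; a connector could just as well rely on the source system's modified timestamps or change feeds. The `incremental_sync` name and the index layout are hypothetical.

```python
import hashlib


def incremental_sync(index: dict, source_docs: dict) -> dict:
    """Update the index in place and report what needs re-processing.

    index:       doc_id -> {"hash": str, "deleted": bool}
    source_docs: doc_id -> current text in the source system
    """
    changed, removed = [], []

    # New, modified, or restored documents get re-processed.
    for doc_id, text in source_docs.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        entry = index.get(doc_id)
        if entry is None or entry["hash"] != digest or entry["deleted"]:
            index[doc_id] = {"hash": digest, "deleted": False}
            changed.append(doc_id)

    # Documents missing from the source are flagged, not purged, to keep the audit trail.
    for doc_id, entry in index.items():
        if doc_id not in source_docs and not entry["deleted"]:
            entry["deleted"] = True
            removed.append(doc_id)

    return {"reprocess": changed, "soft_removed": removed}
```

Running this on a schedule (every 4 hours in this build) keeps embedding costs proportional to what actually changed rather than to the size of the corpus.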

What we deliberately excluded:

Live Slack channels: only archived channels are ingested, a deliberate governance decision.

The team kept working in the same tools they were already using. The knowledge system ingests from those tools automatically. No behaviour change required.

Governance and Access Control: Who Sees What, and Why It Matters

For a regulated enterprise, governance isn’t optional — it’s the reason the system exists.

Permission inheritance

The system inherits access permissions from each source platform. If an HR document is shared only with the HR team in Google Drive, only HR team members can retrieve it through the knowledge system.

Role-based access tiers

Three tiers of access: All Staff, Department-Restricted, and Leadership-Only. Every document is auto-classified based on its source permissions, with the option for manual override by department leads.

Audit trail

Every query is logged: who asked, what was retrieved, which documents were cited, and when. The compliance team can audit knowledge access patterns monthly.

Document lifecycle management

Documents are tagged with a review cadence (quarterly, annually, or on-change). When a review is overdue, the document owner gets notified. If no action is taken within 30 days, the document is flagged as potentially stale in search results.
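The review lifecycle is effectively a small state machine: current, owner notified, then flagged as stale after the 30-day grace period. A minimal sketch, assuming quarterly means roughly 91 days and annually 365, and that "on-change" documents have no timed review; the status labels and the `review_status` function are hypothetical.

```python
from datetime import date, timedelta

# Approximate cadence lengths in days (assumed values).
CADENCE_DAYS = {"quarterly": 91, "annually": 365}


def review_status(last_reviewed: date, cadence: str, today: date, grace_days: int = 30) -> str:
    """Return 'current', 'notify_owner', or 'flag_stale' for a document."""
    if cadence == "on-change":
        return "current"              # reviewed whenever the source changes, no timer
    due = last_reviewed + timedelta(days=CADENCE_DAYS[cadence])
    if today <= due:
        return "current"
    if today <= due + timedelta(days=grace_days):
        return "notify_owner"         # review overdue, owner has 30 days to act
    return "flag_stale"               # grace period elapsed, flag in search results
```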

Version control

When a document is updated in its source system, the knowledge system re-ingests it and updates the index. Previous versions are archived but remain accessible for audit purposes.

The Build Process: Discovery to Production in 5 Weeks

Week	Phase	What Happened
Week 1	Discovery	Mapped all 7 source systems. Identified 12,000+ documents. Classified by department, type, and access tier. Defined governance rules with compliance team.
Week 2	Architecture & Ingestion	Built connectors to all 7 sources. Ran initial full sync. Validated document parsing across all formats.
Week 3	AI Layer Build	Deployed embedding pipeline. Built RAG retrieval system. Tuned retrieval quality against 50 test queries from each department.
Week 4	Governance & Access	Implemented permission inheritance. Built audit logging. Configured staleness detection and review cadence system.
Week 5	Testing, Training & Handover	End-to-end testing with real users from 4 departments. Documentation of every component. Training sessions for department leads and the compliance team.

Five weeks. Discovery to production. The client’s team was using the system by the end of Week 5.

Results: What Changed After Go-Live

70% reduction in information retrieval time

Average time to find a specific document or answer dropped from 22 minutes to under 7 minutes. For frequently asked questions (onboarding, compliance, process), retrieval is under 60 seconds.

New hire productivity ramp cut from 4 weeks to 10 days

New employees get instant access to every SOP, process doc, and training material — searchable by natural language, not file names.

Zero compliance flags in the first quarterly review post-launch

The governance layer — audit trails, access controls, version management — resolved the compliance team’s concerns completely.

43% reduction in internal “how do I do this?” Slack messages

Measured over the first 8 weeks post-launch. People ask the knowledge system instead of interrupting colleagues.

12 stale documents identified and updated in the first month

The staleness detection system surfaced documents that were being actively referenced but hadn’t been updated in 18+ months. Three of them contained outdated compliance procedures.

What This System Replaced (And What It Didn't)

What it replaced:

What it didn’t replace:

This is important: the knowledge management system is an infrastructure layer, not a destination app. The team doesn’t “use” it the way they use Notion or Confluence. They search it when they need an answer, and the system pulls from everywhere.

When a Knowledge Management System Makes Sense for Your Business

You have 100+ employees and knowledge is spread across 3+ platforms

Below that threshold, a well-maintained Notion workspace or Confluence instance is usually enough. Above it, the fragmentation becomes structural.

New hires take more than 2 weeks to become productive

If onboarding drag is measurable in weeks, the problem is usually information access, not training quality.

You’ve tried and abandoned 2+ knowledge tools already

The pattern of tool-hopping usually means the problem isn’t the tool — it’s the layer underneath.

You’re in a regulated industry

If auditors ask “who accessed what, when, and was it the current version?” and you can’t answer — a governed knowledge system isn’t optional.

Institutional knowledge is concentrated in a few people

If three people leaving would create a knowledge crisis, the knowledge isn’t managed — it’s hostage.

Frequently Asked Questions

How long does it take to build a knowledge management system like this?

Most builds take 4-6 weeks from discovery to production. The timeline depends on how many source systems need to be connected and the complexity of your governance requirements. We’ve built these in as few as 3 weeks for simpler environments.

Do we need to migrate our documents to a new platform?

No. The entire point of this architecture is that your team keeps working in the tools they already use. The system connects to your existing platforms and indexes documents where they live.

Is the AI retrieval accurate enough to trust?

The RAG pipeline includes confidence scoring and source citations on every answer. When the system isn’t confident, it tells you. Every answer links back to the original document so users can verify instantly.

What happens to our data? Does it leave our environment?

No. The embedding model and retrieval system run within your cloud tenant. No document content is sent to external APIs. This is a non-negotiable part of every build we do.

Can we control who sees what?

Yes. The system inherits access permissions from your source platforms and adds role-based access tiers. If a document is restricted in Google Drive, it stays restricted in the knowledge system.

What if we add a new source system later?

The ingestion layer is built to be extensible. Adding a new connector typically takes 1-2 days of development, depending on the source system’s API.

How much does a knowledge management system cost?

It depends on scope: the number of source systems, document volume, governance complexity, and whether AI retrieval is included. Most builds fall between £20k and £50k. We’ll give you a clear number after the discovery call.
