Milvus Vector Database on the Intranet. How Does RAG Improve Searching in the Knowledge Base?

Modern corporate intranets store vast amounts of documents, procedures, instructions, and organizational knowledge. Traditional keyword-based search often fails when users search for information using terms other than those found in the documents.

Problem: an employee searches for "how to configure access to the payment system," but the document contains the phrase "payment integration configuration." Traditional search won’t find this document, even though it contains the answer to the question.

Solution: RAG (Retrieval-Augmented Generation) with a vector database enables semantic search. The system understands the meaning of the query and finds documents based on context, not just exact word matches.

In this article, we’ll show you how to integrate the Milvus vector database with Open Intranet on Drupal to create intelligent search in corporate knowledge bases.



What is RAG and why is it important for intranets?

RAG (Retrieval-Augmented Generation) is a technology that combines semantic search with AI-generated responses. In the context of corporate intranets, RAG offers many benefits.

Semantic search

Instead of searching for exact keywords, the system understands the user's intent. 

Example:

  • User query: "how to reset the administrator password."
  • Traditional search: searches for documents containing exactly those words.
  • Semantic search: finds documents about "recovering access," "changing credentials," or "restoring admin privileges," even if they don't contain those exact words.
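
The ranking idea behind this example can be sketched in a few lines of Python. This is only an illustration: the three-dimensional vectors are invented by hand, whereas a real embedding model produces them from the text, and cosine similarity is assumed as the similarity measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings", invented for illustration.
# Real models such as text-embedding-3-small return 1536 dimensions.
query_text = "how to reset the administrator password"
query_vector = [0.9, 0.1, 0.2]

documents = {
    "Recovering access to the admin account": [0.8, 0.2, 0.3],
    "Changing credentials": [0.7, 0.3, 0.1],
    "Cafeteria menu for this week": [0.1, 0.9, 0.4],
}

# Rank documents by similarity to the query, most similar first.
ranked = sorted(
    documents.items(),
    key=lambda item: cosine_similarity(query_vector, item[1]),
    reverse=True,
)

for title, vector in ranked:
    print(f"{cosine_similarity(query_vector, vector):.3f}  {title}")
```

None of the document titles contains the word "password", yet the two access-related documents rank above the unrelated one because their vectors point in a similar direction to the query vector.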

Better results for users

Analysis of client queries shows that 66% of organizations looking for intranet solutions require advanced search or AI search. This is no coincidence – in large organizations with thousands of documents, traditional search is no longer sufficient. Artificial intelligence understands the context and intent of the user, making it ideal for working with extensive knowledge bases.

Scalability

Vector databases, such as Milvus, can handle millions of documents while maintaining fast response times. This is crucial for organizations with extensive knowledge bases.

Performance

Similarity search stays fast even on large data sets. Milvus uses approximate nearest neighbor index types, such as HNSW and IVF, to speed up queries.
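
To give an intuition for why such indexes help, here is a toy sketch of the inverted-file (IVF) idea in plain Python. It is not how Milvus is implemented. The vectors and centroids are made up, and the centroids are fixed by hand, whereas a real IVF index learns them by clustering.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf_index(vectors, centroids):
    """Assign every vector id to the bucket of its nearest centroid."""
    buckets = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: distance(vec, centroids[i]))
        buckets[nearest].append(vec_id)
    return buckets

def ivf_search(query, vectors, centroids, buckets, nprobe=1, top_k=2):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    probe_order = sorted(range(len(centroids)), key=lambda i: distance(query, centroids[i]))
    candidates = [vec_id for i in probe_order[:nprobe] for vec_id in buckets[i]]
    candidates.sort(key=lambda vec_id: distance(query, vectors[vec_id]))
    return candidates[:top_k]

# Hypothetical 2-dimensional vectors forming two obvious groups.
vectors = {
    "doc_a": [0.1, 0.2], "doc_b": [0.2, 0.1], "doc_c": [0.15, 0.15],
    "doc_x": [0.9, 0.8], "doc_y": [0.8, 0.9],
}
centroids = [[0.15, 0.15], [0.85, 0.85]]  # fixed by hand for the example

buckets = build_ivf_index(vectors, centroids)
result = ivf_search([0.12, 0.18], vectors, centroids, buckets)
print(buckets)
print(result)
```

With nprobe=1 the search compares the query against three vectors instead of five. On millions of vectors the saving is what keeps response times low, at the price of occasionally missing a neighbor that sits in a bucket that was not probed.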

Flexibility

The setup can be extended with additional AI features, such as chatbots, content recommendations, and automatic tagging.

Open Intranet: starter kit for corporate intranets

Open Intranet is an open source starter kit on Drupal for building corporate intranets. 

It includes ready-made intranet features such as:

  • collaboration and communication,
  • news and events system,
  • document sharing,
  • knowledge base,
  • employee directory.

The system allows organizations to quickly launch a flexible internal portal without having to build everything from scratch.

Open Intranet system with a ready-made knowledge base

What is Milvus? Vector database for RAG

Milvus is an open source vector database designed specifically for storing, indexing, and searching vector representations of text (embeddings).

How does Milvus work in the context of RAG?

  1. Indexing: documents from the intranet are processed by an AI model (e.g., OpenAI text-embedding-3-small), which creates vectors representing the meaning of the text.
  2. Storage: the vectors are stored in Milvus along with metadata (title, URL, date).
  3. Search: when a user asks a question, the query is also converted into a vector, and Milvus finds the most similar documents based on vector distance.
  4. Return of results: the system returns documents sorted by semantic similarity.
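
The four steps can be sketched as a tiny in-memory stand-in for a vector database. Everything here is hypothetical: the embed function is a toy that counts words from hand-picked concept groups (a real setup would call an embedding model), and the titles, URLs, and dates are invented.

```python
import math

# Toy stand-in for an embedding model: each dimension counts words from one
# hand-picked concept group. This is an assumption made for the example only.
CONCEPTS = [
    {"password", "credentials", "login", "access"},
    {"payment", "invoice", "billing"},
    {"holiday", "vacation", "leave"},
]

def embed(text):
    words = [word.strip(".,?") for word in text.lower().split()]
    return [float(sum(word in group for word in words)) for group in CONCEPTS]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class InMemoryVectorStore:
    def __init__(self):
        self.rows = []

    def insert(self, text, metadata):
        # Steps 1 and 2: embed the document and store the vector with metadata.
        self.rows.append({"vector": embed(text), **metadata})

    def search(self, query, limit=1):
        # Step 3: embed the query. Step 4: return rows sorted by similarity.
        query_vector = embed(query)
        scored = [(cosine(query_vector, row["vector"]), row) for row in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:limit]

store = InMemoryVectorStore()
store.insert("Changing credentials and restoring login access",
             {"title": "Account recovery", "url": "/kb/account-recovery", "date": "2025-01-10"})
store.insert("Payment integration configuration and billing setup",
             {"title": "Payment integration", "url": "/kb/payments", "date": "2025-02-03"})
store.insert("How to request vacation or unpaid leave",
             {"title": "Leave requests", "url": "/kb/leave", "date": "2025-03-17"})

hits = store.search("how to reset the administrator password")
for score, row in hits:
    print(f"{score:.2f}  {row['title']}  {row['url']}")
```

The query shares no words with the "Account recovery" document, but both map to the same concept dimension, so that document is returned first. Milvus plays the role of the store class here, with persistent storage and indexes instead of a Python list.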

Why Milvus vector database?

  • Open Source: full control over data, no vendor lock-in.
  • Scalability: supports millions of vectors with fast response times.
  • Ready integration: the ai_vdb_provider_milvus module for Drupal facilitates integration.
  • Standalone mode: for smaller organizations, it can be run in standalone mode on a single server.
  • Ready for production use: scalable to a cluster for larger organizations.


What does the integration architecture look like? Open Intranet + Milvus RAG

The diagram below shows the complete integration architecture:

[Diagram, created with Mermaid: the complete integration architecture of Open Intranet and Milvus.]

What are the specific components of the integration system?

Each element of the architecture plays a specific role, ensuring smooth query processing and data management across the entire RAG environment. Below, we describe how the individual components work together within Open Intranet.

DDEV Application Stack

This development environment provides a ready-made infrastructure for running an intranet with Milvus, automating most of the configuration. This allows the entire system to be run locally in a matter of minutes.

Web Container (Drupal Application)

  • Drupal 11 with PHP 8.3.
  • nginx-fpm as a web server.
  • Ports: 80 (HTTP), 443 (HTTPS), 8025 (Mailpit).
  • Integration with Milvus via the ai_vdb_provider_milvus module.

MariaDB (Database)

  • Database for Drupal.
  • Version: MariaDB 10.11.
  • Stores all Drupal data (content, config, users).

Milvus RAG Stack

The set of services that make up the Milvus RAG Stack is responsible for storing vectors, metadata, and executing search queries. Each component of the system plays a distinct role in ensuring high performance and stability.

etcd (Storage Layer)

  • Metadata storage and coordination.
  • Port: 2379.
  • Stores: collection schemas, indexes, configurations.
  • Why etcd? It’s a distributed key-value store used by Milvus to store metadata and coordinate between components. Without etcd, Milvus cannot function.

MinIO (Storage Layer)

  • Object storage for vector data.
  • Ports: 9000 (S3 API), 9001 (Web Console).
  • Stores: vectors, segments, binary files.
  • Why MinIO? It’s an object data store compatible with the S3 API. Milvus uses it to store actual vector data and segments. MinIO allows for scaling and efficient management of large amounts of vector data.

Milvus (Core Engine)

  • The main vector search engine.
  • Ports: 19530 (API), 9091 (Health Check).
  • Functions:
    • storage of embeddings in the form of vectors,
    • semantic similarity search,
    • indexing and query optimization,
    • RESTful API for integration with Drupal.
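
For orientation, a search call against the RESTful API might be prepared as follows. This is a sketch under assumptions: the /v2/vectordb/entities/search path and the body keys follow the Milvus v2 RESTful API as I understand it, so check them against the reference for your Milvus version. The output field names are hypothetical, and the request that the Drupal module actually sends may look different.

```python
import json
import urllib.request

def build_search_request(base_url, collection, query_vector, limit=3, output_fields=None):
    """Prepare (but do not send) a vector search request."""
    body = {
        "collectionName": collection,
        "data": [query_vector],            # one query vector per search
        "limit": limit,                    # number of nearest neighbors to return
        "outputFields": output_fields or [],
    }
    return urllib.request.Request(
        url=f"{base_url}/v2/vectordb/entities/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_search_request(
    "http://milvus:19530",
    "openintranet_knowledge_base",
    [0.0] * 1536,                          # placeholder query embedding
    output_fields=["title", "url"],        # hypothetical field names
)
print(request.get_method(), request.full_url)

# Sending the request needs a running Milvus instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The query vector must have the same number of dimensions as the vectors stored in the collection, which is 1536 for text-embedding-3-small.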

Attu (Management UI)

  • Web interface for managing Milvus.
  • Port: 8521 (exposed by DDEV).
  • Features:
    • browsing collections and data,
    • performance monitoring,
    • index management,
    • visualization of search results.

What does data flow look like in an intranet integrated with the Milvus vector database?

Data flow between Drupal, the embeddings model, and the Milvus vector database involves several key steps that together create an intelligent search process. Below, we describe how it works from the moment a query is made to the presentation of results.

Semantic search

  1. The user asks a question in the intranet interface.
  2. Drupal converts the query into a vector using the embeddings model (OpenAI text-embedding-3-small).
  3. The query is sent to Milvus via the ai_vdb_provider_milvus module.
  4. Milvus searches for similar vectors in the database.
  5. Milvus returns results sorted by semantic similarity.
  6. Drupal displays the results to the user with the title, a snippet of content, and the similarity score.

Content indexing

  1. A new document is added to the knowledge base on the intranet.
  2. Drupal automatically generates an embedding using the OpenAI API.
  3. The embedding is saved in Milvus along with metadata (title, URL, date).
  4. The document is ready for semantic search.


How to install Open Intranet with the Milvus RAG option? Step by step

The installation process has been simplified as much as possible thanks to a ready-made script that automatically configures all the required components. Just follow a few commands to run a full RAG demo in your environment.

Prerequisites

Before you begin, make sure you have:

  1. Docker Desktop — running and active.
  2. DDEV — installed (brew install ddev/ddev/ddev on macOS).
  3. OpenAI API Key — required to generate embeddings.
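
A quick preflight check can confirm that the command-line tools are on your PATH before you start. The sketch below is not part of the demo repository. It also checks for git, which the installation commands use, and it cannot tell whether Docker Desktop is actually running.

```python
import shutil

REQUIRED_TOOLS = ["git", "docker", "ddev"]

def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]

missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required command-line tools were found.")
```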

Open Intranet RAG demo installation process

Run the following commands:

git clone https://github.com/droptica/openintranet_rag_demo.git
cd openintranet_rag_demo
./launch_openintranet_with_rag_demo.sh

The script automatically performs the following:

  1. Cloning Open Intranet from Drupal.org.
  2. Downloading the docker-compose configuration for Milvus VDB.
  3. Configuring DDEV (Drupal 11, PHP 8.3).
  4. Starting containers (web, db, Milvus).
  5. Installing Composer dependencies.
  6. Adding the drupal/ai_vdb_provider_milvus:^1.1@beta module.
  7. Copying the openintranet_milvus_rag recipe (a Drupal Recipe).
  8. Installing Drupal with demo content.
  9. Applying the Milvus RAG recipe configuration.
  10. Interactively requesting the OpenAI API key (with format validation).
  11. Saving the API key to the Key module in Drupal.
  12. Indexing Knowledge Base content in Milvus.
  13. Generating a one-time login link.
During installation, you’ll be asked to paste the OpenAI API key. The script validates the format and stores it securely.
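
The exact rule the script applies is not described here. As an illustration only, a format check could look like the sketch below, which assumes that a key starts with "sk-" followed by at least 20 letters, digits, hyphens, or underscores.

```python
import re

# Assumed pattern, not taken from the installation script.
KEY_PATTERN = re.compile(r"^sk-[A-Za-z0-9_-]{20,}$")

def looks_like_openai_key(value):
    """Return True if the value has the assumed shape of an API key."""
    return bool(KEY_PATTERN.match(value.strip()))

print(looks_like_openai_key("sk-" + "a" * 40))
print(looks_like_openai_key("not-a-key"))
```

A check like this only catches paste mistakes. Whether the key is accepted by the API is confirmed later, when the first embeddings are generated.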

Installation verification

After completing the installation, it’s worth making sure that all elements are working correctly and communicating with each other. A few simple commands will quickly verify that indexing and semantic search are working properly.

1. Checking the index status

cd openintranet_source_code/openintranet
ddev drush search-api:status

Expected result:

knowledge_base_content   Knowledge Base Content   100%         24        24

If you see 100%, all Knowledge Base content has been indexed.

2. Verifying the connection to Milvus

  1. Open Milvus Attu UI: check the port using ddev describe (search for the Attu service port).
  2. Connect to: http://milvus:19530.
  3. Find the collection: openintranet_knowledge_base.
  4. Check: Entity Count > 0.

3. OpenAI API test

cd openintranet_source_code/openintranet
ddev drush php:eval '
// Single quotes keep the shell from expanding $provider and $result.
$provider = \Drupal::service("ai.provider")->createInstance("openai");
$result = $provider->embeddings("test", "text-embedding-3-small", []);
echo count($result->getNormalized()) . " dimensions";
'

Expected result: 1536 dimensions

Screen with Milvus vector database running for Open Intranet


Need more technical information?

For more technical information, including detailed troubleshooting tips, see the project README on GitHub: https://github.com/droptica/openintranet_rag_demo.

How does RAG search work? Examples of use

Droptica's ready-made recipe for Drupal includes a sample RAG Search page at /search-rag-example. To test it:

  1. Open the page: https://your-site.ddev.site/search-rag-example.
  2. Enter a search query (e.g., "milvus configuration").
  3. Verify that each result shows:
    • a title (link to the source page),
    • a content snippet,
    • a similarity score.

Search example

To show how RAG works in practice, the following example illustrates the difference between traditional search and results obtained using the Milvus vector database.

User query: "how to configure access to the system."

Traditional search will only find documents containing exactly those words.

RAG search will find documents about:

  • permission configuration,
  • access management,
  • authorization system settings,
  • login instructions.

These documents are found even if they don’t contain the exact phrase "how to configure access to the system."

How can Milvus RAG be useful in organizations?

Milvus allows organizations to use RAG in various business scenarios, from document search to content analysis. Here are some examples.

1. Document search

Finding documents based on meaning and context rather than keywords. Example: an employee searches for "emergency procedure" and the system finds documents about "business continuity plans" and "crisis scenarios."

2. Chatbots with company knowledge

Creating chatbots with access to the organization's current knowledge. The chatbot can answer employee questions using documents from the intranet as a source of knowledge.

3. Content recommendations

Suggesting similar content to users based on semantic similarity. Example: after reading a document about "data security," the system suggests documents about "GDPR" and "privacy protection."

4. Automatic tagging

Automatically assigning tags based on document content. The system analyzes the meaning of the text and assigns appropriate categories without manual intervention.

5. Sentiment analysis

Analysis of sentiment in company content. The system can identify documents that need to be updated or those that can build a positive organizational culture.

What technologies were used in the Open Intranet + Milvus vector database demo?

Below is a detailed list of the technologies used.

Drupal 11

  • Version: 11.x
  • PHP: 8.3
  • Database: MariaDB 10.11
  • Web server: nginx-fpm

Milvus

  • Version: 2.5.18
  • Mode: Standalone (for development)
  • API: RESTful on port 19530
  • Embeddings: 1536 dimensions (text-embedding-3-small)

OpenAI

  • Embedding model: text-embedding-3-small
  • Dimensions: 1536
  • Cost: ~$0.01-0.10 for the entire demo

DDEV

  • Version: v1.24.10
  • Platform: Docker Desktop
  • Networking: ddev_default (external network)

Frequently asked questions (FAQ) about the Milvus vector database on the intranet

Check out the most frequently asked questions and answers about integrating Milvus with your intranet.

Does RAG require a constant internet connection for the OpenAI API?

In the demo version of the project on GitHub, a connection to the OpenAI API is required. However, the solution can be configured with other embedding models, depending on the needs of the organization: for example, local models (such as Sentence Transformers) that run without an internet connection, or other cloud embedding APIs.

What are the costs of using the OpenAI API for embeddings?

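The text-embedding-3-small model costs $0.02 per 1M tokens. A typical knowledge base of 1,000 documents (averaging 500 words each) amounts to roughly 650,000-700,000 tokens, so a single indexing pass costs on the order of $0.01-0.02. Chunk overlap and repeated reindexing raise the total, but it usually stays well below a dollar. Searching only requires generating an embedding for the query (a few words), so the per-query cost is minimal.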
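
The estimate can be reproduced with a short calculation. The tokens-per-word ratio is an assumption (roughly 0.75 English words per token), so treat the result as a lower bound for a single indexing pass. Chunk overlap and repeated reindexing increase the real total.

```python
PRICE_PER_MILLION_TOKENS = 0.02  # USD, text-embedding-3-small
TOKENS_PER_WORD = 4 / 3          # assumption: about 0.75 English words per token

def embedding_cost(documents, words_per_document):
    """Estimated cost in USD of embedding every document once."""
    tokens = documents * words_per_document * TOKENS_PER_WORD
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

cost = embedding_cost(1000, 500)
print(f"${cost:.4f}")
```

The same function can be reused for query traffic: a query of ten words is about 13 tokens under the same assumption, so even many thousands of searches cost a fraction of a cent.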

How to scale the solution for a larger organization?

For larger organizations, you can:

  • switch from standalone mode to a Milvus cluster (multiple nodes),
  • use larger MinIO instances for greater capacity,
  • split etcd into separate nodes for better performance,
  • add load balancers in front of the Milvus API.

Can other embedding models be used instead of OpenAI?

Yes, the ai_vdb_provider_milvus module is agnostic to the source of embeddings. You can use other embedding providers or local models, as long as the vectors they return match the dimension configured for the Milvus collection.

How often should the content be reindexed?

It depends on the frequency of changes in the knowledge base. For dynamic intranets with frequent updates, you can configure automatic reindexing with each content change. For more static databases, reindexing once a day or once a week is sufficient.

Does the solution work for organizations with compliance requirements (GDPR, healthcare sector)?

Yes. All storage and search components (Drupal, Milvus, etcd, MinIO) can run on-premises, so the indexed data stays within the organization's infrastructure. This is crucial for organizations with compliance requirements. Note, however, that generating embeddings with the OpenAI API requires sending document content to OpenAI, so for highly sensitive data, consider local embedding models instead.

What are the hardware requirements for Milvus in standalone mode?

For small organizations (up to 10,000 documents), the following is sufficient:

  • 4GB RAM
  • 2 CPU cores
  • 20GB disk

For larger organizations, the requirements increase proportionally to the number of documents and queries.
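
To put these numbers in perspective, the raw size of the vectors themselves is easy to estimate. The sketch assumes one 1536-dimensional float32 vector per document. In practice, documents are often split into several chunks, and indexes, metadata, and the Milvus services themselves need additional memory.

```python
def raw_vector_bytes(num_vectors, dimensions=1536, bytes_per_value=4):
    """Size of the raw vector data, without indexes or metadata."""
    return num_vectors * dimensions * bytes_per_value

size = raw_vector_bytes(10_000)
print(f"{size / 1024 ** 2:.1f} MiB")
```

Under these assumptions, the vectors for 10,000 documents take well under 100 MiB, so most of the recommended 4GB is headroom for the services, indexes, and query processing.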

Milvus vector database on the intranet – summary

The integration of Milvus RAG with Open Intranet opens up new possibilities for corporate platforms. The most important benefits include:

  • Intelligent search based on meaning, not just keywords.
  • Better user experience in the intranet thanks to understanding context and intent.
  • Scalability for organizations with large knowledge bases.
  • Flexibility in expanding with additional AI features.

The core components (Drupal, Open Intranet, Milvus, etcd, MinIO) are open source, which means full control over stored data and no vendor lock-in. The solution is ready for production use and can be scaled according to the needs of the organization.

Do you need to implement a vector database on your intranet?

At Droptica, we design and implement AI-based solutions using LLMs, vector databases, and advanced RAG pipelines. We help you choose the right technology, integrate semantic search, create corporate chatbots, and optimize the quality of generated responses. Check out our generative AI development service and see how we can support your organization in building intelligent data-driven solutions.
