Building a RAG-Enabled Internal Knowledge Base AI with Qwen3.5-9B and OpenClaw
Learn how to build a RAG-enabled internal knowledge base AI using Qwen3.5-9B and OpenClaw. This guide covers document ingestion from PDFs, Word files, and internal wikis, vector database integration, and best practices for achieving accurate information retrieval with natural dialogue. We provide AI agent implementation support for companies in Shinagawa-ku, Minato-ku, Shibuya-ku, Setagaya-ku, and Ota-ku to enhance operational efficiency.
The Need for RAG-Enabled Internal Knowledge Base AI
Companies possess vast amounts of internal documents, technical documentation, operational manuals, and past project materials, often scattered across file servers and internal wikis, making it difficult to quickly find needed information. RAG (Retrieval-Augmented Generation) has emerged as an effective approach to this challenge: it combines the generative capabilities of Large Language Models (LLMs) with high-precision retrieval from vector databases, enabling AI agents that efficiently leverage internal knowledge. By combining OpenClaw with Qwen3.5-9B, you can build a privacy-preserving internal knowledge base AI that runs locally on a Mac mini. Companies in central Tokyo areas such as Shinagawa-ku, Minato-ku, and Shibuya-ku have rated this on-premises AI solution highly because it can be deployed even in environments with strict security requirements.
RAG Architecture Overview with Qwen3.5-9B and OpenClaw
The architecture of a RAG-enabled internal knowledge base AI consists of four layers: document ingestion, vectorization, retrieval, and generation. The document ingestion layer parses documents in various formats, including PDF, Word, Markdown, HTML, and internal wikis, converting them into text data. The vectorization layer splits the extracted text into chunks and converts them into vector representations using embedding models (e.g., nomic-embed-text, multilingual-e5-large). The retrieval layer uses semantic search to find the document vectors most relevant to a user's query in vector databases such as ChromaDB, Qdrant, or Weaviate. In the generation layer, OpenClaw calls Qwen3.5-9B, providing the search results as context so it can generate accurate, contextually appropriate answers. This architecture suppresses LLM hallucinations and grounds answers in actual internal information.
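The four layers can be sketched end to end. This is a toy illustration, not OpenClaw code: the character-frequency "embedding" stands in for a real model such as nomic-embed-text, and generate() only assembles the augmented prompt rather than actually calling Qwen3.5-9B.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    vector: list[float]


def embed(text: str) -> list[float]:
    # Vectorization layer (toy): normalized character-frequency vector
    # standing in for a real embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - 97] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity of two already-normalized vectors.
    return sum(x * y for x, y in zip(a, b))


def retrieve(query: str, chunks: list[Chunk], top_k: int = 3) -> list[Chunk]:
    # Retrieval layer: rank chunks by semantic similarity to the query.
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(qv, c.vector), reverse=True)[:top_k]


def generate(query: str, context_chunks: list[Chunk]) -> str:
    # Generation layer: in production this prompt would be sent to
    # Qwen3.5-9B via OpenClaw; here we only build the augmented prompt.
    context = "\n".join(c.text for c in context_chunks)
    return f"Answer '{query}' using only:\n{context}"


# Document ingestion layer (toy): the "documents" are inline strings.
docs = ["Paid leave requests go through the HR portal.",
        "Expense reports are filed in the finance system."]
chunks = [Chunk(d, embed(d)) for d in docs]
top = retrieve("how do I request paid leave?", chunks, top_k=1)
print(top[0].text)
```

Even this crude similarity measure surfaces the leave-policy document for a leave-related query; swapping in a real embedding model and vector database changes only the embed() and retrieve() layers.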
Implementing Document Ingestion and Chunking
The accuracy of a RAG system heavily depends on the quality of document ingestion and chunking. Text extraction from PDFs uses libraries like pdfplumber, PyPDF2, or pdfminer to preserve layout information and table structures while converting to text. For Word documents (.docx), python-docx is used, and for internal wikis or Confluence, content retrieval is implemented through APIs. Chunking combines semantic chunking (splitting by meaningful units) with overlapping chunking (200-300 token overlap between adjacent chunks) to maintain contextual continuity. While chunk sizes of 512-1024 tokens are typically recommended, smaller chunks of around 256 tokens are more effective for improving search accuracy in technical documents. Metadata (filename, creation date, author, department, version) is also attached to each chunk and utilized for filtering during search and result ranking. Small and medium enterprises in Setagaya-ku and Ota-ku minimize operational burden by implementing workflows that automatically ingest existing documents from Google Drive, Dropbox, and SharePoint.
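The overlapping-chunking strategy with attached metadata can be sketched as follows. This is a minimal illustration: whitespace-split words stand in for a real tokenizer, and chunk_document is a hypothetical helper name, not a library function.

```python
def chunk_document(text: str, metadata: dict,
                   chunk_size: int = 512, overlap: int = 256) -> list[dict]:
    """Split text into overlapping chunks, attaching shared metadata.

    `chunk_size` and `overlap` are in "tokens"; here whitespace words
    approximate tokens for illustration.
    """
    tokens = text.split()
    step = chunk_size - overlap  # how far each chunk's start advances
    chunks = []
    for start in range(0, len(tokens), step):
        piece = tokens[start:start + chunk_size]
        if not piece:
            break
        chunks.append({
            "text": " ".join(piece),
            # Copy document-level metadata (filename, date, author, ...)
            # onto every chunk for search-time filtering and ranking.
            "metadata": {**metadata, "chunk_index": len(chunks)},
        })
        if start + chunk_size >= len(tokens):
            break  # last chunk reached the end of the document
    return chunks


chunks = chunk_document("word " * 2000, {"filename": "manual.pdf",
                                         "department": "HR"},
                        chunk_size=512, overlap=256)
print(len(chunks), chunks[0]["metadata"])
```

With overlap of half the chunk size, each chunk repeats the tail of its predecessor, so a sentence cut at a boundary still appears whole in at least one chunk.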
Vector Database Selection and Setup
Selecting the vector database, the heart of a RAG system, affects both performance and operability. ChromaDB is lightweight and easy to set up, making it ideal for prototypes and small-scale systems; install it with pip install chromadb, and in persistence mode it stores data in a local directory and runs stably on a Mac mini. Qdrant provides more advanced search capabilities (hybrid search, payload filtering, quantization) and suits medium-scale systems handling hundreds of thousands of documents; it can be deployed on a Mac mini via Docker or a standalone binary and supports both REST and gRPC APIs. Weaviate excels in large-scale systems that require complex queries and related-document exploration, thanks to its schema-based approach and graph-based knowledge representation. Using multilingual-e5-large or intfloat/multilingual-e5-base as the embedding model enables high-precision handling of documents mixing Japanese and English. When creating collections, set cosine similarity as the distance metric and tune the HNSW index's ef_construction parameter to balance search speed against accuracy.
OpenClaw Agent RAG Integration Configuration
To integrate RAG functionality into OpenClaw, first create an agent configuration file (~/.openclaw/agents/knowledge-base-agent.yaml) and define the retrieval_tools section. Specify ChromaDB, Qdrant, or Weaviate in the vector_store parameter and configure the connection endpoint, collection name, and authentication credentials. The top_k parameter controls how many top candidates are retrieved per search; 3-5 is typically appropriate, though raising it to 10-15 for highly specialized technical documents improves answer accuracy. The similarity_threshold parameter sets the minimum similarity score; a value of 0.7 or higher keeps irrelevant information out of the context. In OpenClaw's tool_call flow, the user query is first sent to the vector database, and the retrieved documents are formatted to fit within Qwen3.5-9B's context window (32K-128K tokens) and inserted into the prompt. Enabling the retrieval_augmented_generation: true flag lets OpenClaw automatically analyze query intent and distinguish questions that require search from those that don't. Startups in Shinagawa-ku and Minato-ku have built environments where the knowledge base AI is seamlessly available from internal chat through Slack integration.
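Putting those parameters together, the agent file might look like the sketch below. Only the names mentioned above (retrieval_tools, vector_store, top_k, similarity_threshold, retrieval_augmented_generation) come from this guide; the remaining field names are illustrative assumptions, so verify them against your OpenClaw version's schema.

```yaml
# ~/.openclaw/agents/knowledge-base-agent.yaml
# Illustrative sketch -- field names beyond those discussed above are assumed.
name: knowledge-base-agent
model: qwen3.5-9b
retrieval_augmented_generation: true   # let the agent decide when to search
retrieval_tools:
  - vector_store: qdrant               # or chromadb / weaviate
    endpoint: http://localhost:6333    # connection endpoint (assumed field name)
    collection: internal_docs
    api_key_env: QDRANT_API_KEY        # credential via env var (assumed field name)
    top_k: 5                           # raise to 10-15 for specialized documents
    similarity_threshold: 0.7          # drop weakly related chunks
```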
Query Expansion and Search Accuracy Optimization
To enhance the search accuracy of RAG systems, implementing query expansion and hybrid search is effective. Query expansion inputs user questions into Qwen3.5-9B to generate synonyms, related keywords, and more specific query variations. For example, the query "Where is the contract template?" is expanded into multiple variations like "contract template," "contract format," "contract sample," "NDA template," and vector search is executed for each. Hybrid search combines vector search (semantic similarity) with keyword search (statistical methods like BM25), integrating both scores through weighted averaging (typically 0.7:0.3). Re-ranking methods re-evaluate the top 20-30 candidates retrieved in the initial search using cross-encoder models (such as ms-marco-MiniLM-L-6-v2) to select the final top 5. Metadata filtering adds metadata conditions like department, creation date, or document type to queries to limit the search scope. These optimizations have improved search accuracy by an average of 30-50% for companies in Shibuya-ku and Setagaya-ku, significantly improving user satisfaction.
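The weighted score fusion at the core of hybrid search can be sketched as follows. Because vector similarities and BM25 scores live on different scales, each list is min-max normalized before the 0.7:0.3 weighted average; hybrid_rank and normalize are hypothetical helper names.

```python
def normalize(scores: list[float]) -> list[float]:
    """Min-max normalize scores to [0, 1] so the two scales are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(candidates: list[str], vec_scores: list[float],
                bm25_scores: list[float], alpha: float = 0.7,
                top_k: int = 5) -> list[str]:
    """Fuse vector and BM25 scores with weight alpha:(1-alpha), return top_k."""
    v = normalize(vec_scores)
    b = normalize(bm25_scores)
    fused = [alpha * vs + (1 - alpha) * bs for vs, bs in zip(v, b)]
    order = sorted(range(len(candidates)), key=lambda i: fused[i], reverse=True)
    return [candidates[i] for i in order[:top_k]]


docs = ["contract template", "NDA template", "travel policy"]
print(hybrid_rank(docs, vec_scores=[0.9, 0.5, 0.1],
                  bm25_scores=[0.0, 10.0, 5.0], top_k=2))
```

The same fused candidate list is what a cross-encoder re-ranker would then re-score to pick the final top 5.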
Improving Answer Quality Through Prompt Engineering
The final output quality of a RAG system heavily depends on the design of prompts given to Qwen3.5-9B. Effective RAG prompts consist of four elements: system role, search context, question, and answer constraints. The system role explicitly states "You are an internal knowledge base AI assistant. Please answer based only on the provided document information" to suppress hallucinations. The search context section organizes the document chunks retrieved from the vector DB in the format "Below is related information: [Document 1: filename, creation date] Content... [Document 2: ...]" and inserts them. The question section includes the original user query as is. Answer constraints specify rules like "If information is not found, honestly answer 'No relevant information was found'" and "Please specify the referenced document name at the end of your answer." Adding 3-5 few-shot examples lets the model learn the expected answer style (concise vs. detailed, bullet points vs. paragraphs). Manufacturing companies in Ota-ku have significantly improved answer specialization by preparing few-shot examples containing technical terminology for technical specification searches.
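Assembling those four elements into a single prompt string can be sketched like this; build_rag_prompt is a hypothetical helper, and the chunk dictionaries follow the metadata shape discussed earlier (filename, creation date, text).

```python
def build_rag_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a RAG prompt: system role, search context, question, constraints."""
    system = ("You are an internal knowledge base AI assistant. "
              "Please answer based only on the provided document information.")
    # Search context: each retrieved chunk labeled with its source metadata.
    context = "\n".join(
        f"[Document {i}: {c['filename']}, {c['created']}]\n{c['text']}"
        for i, c in enumerate(chunks, 1))
    constraints = ("If information is not found, honestly answer "
                   "'No relevant information was found.' "
                   "Please specify the referenced document name "
                   "at the end of your answer.")
    return (f"{system}\n\nBelow is related information:\n{context}\n\n"
            f"Question: {question}\n\n{constraints}")


chunks = [{"filename": "leave_policy.docx", "created": "2024-04-01",
           "text": "Apply for paid leave via the HR portal."}]
print(build_rag_prompt("How do I apply for paid leave?", chunks))
```

Few-shot examples, when used, would be appended between the constraints and the question in the same template.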
Application Case for Employee Onboarding
RAG-enabled knowledge base AI dramatically streamlines onboarding for new and mid-career hires. Traditionally, new employees had to read through large volumes of documents, including work rules, operational manuals, internal system usage guides, organizational charts, and benefits guides. With a RAG agent, they can instantly get answers to specific questions like "How do I apply for paid leave?" or "What's the login procedure for the expense system?" Configure the OpenClaw agent with a dedicated "New Employee Support" persona that uses a friendly, polite tone, explains technical terms plainly, and presents related links and images. Integration with onboarding checklists makes it possible to track the progress of items new employees should learn and proactively surface information on topics not yet covered. An IT company in Minato-ku introduced a RAG-based onboarding bot, improving new employees' self-resolution rate by 70% and cutting HR inquiry response time by 15 hours per week. Integration with Slack or Microsoft Teams lets new employees obtain information in a natural conversational format without interrupting their work.
Technical Documentation Search and Developer Support
For software development teams, accessing technical information such as past project documents, API specifications, design documents, troubleshooting guides, and code review comments is an everyday challenge. RAG-enabled OpenClaw agents index these technical documents and answer developers' specific questions like "What are implementation examples of error handling for the authentication API?" or "What are the precautions during database migration?", including actual code snippets and design diagrams. Integration with GitHub, GitLab, and Bitbucket brings README files, wikis, issue comments, and pull request discussions into the search scope. For code search, symbol information such as function names, class names, and variable names is also stored as metadata to handle specific queries like "What's the implementation of the calculateTotalPrice function?" A SaaS development company in Shinagawa-ku integrated a RAG agent into their development environment so that technical information can be searched directly from VS Code or JetBrains IDEs, reducing developers' information search time by an average of 45 minutes per day.
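For Python sources, attaching symbol metadata to code chunks can be sketched with the standard library's ast module; function_chunks is a hypothetical helper, and a production indexer would handle classes, methods, and other languages as well.

```python
import ast


def function_chunks(source: str, path: str) -> list[dict]:
    """Split a Python file into per-function chunks with symbol metadata.

    The function name is stored as metadata so queries like
    "What's the implementation of calculateTotalPrice?" can filter on it.
    """
    tree = ast.parse(source)
    chunks = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            chunks.append({
                "text": ast.get_source_segment(source, node),
                "metadata": {"file": path, "symbol": node.name,
                             "kind": "function"},
            })
    return chunks


src = "def calculateTotalPrice(items):\n    return sum(items)\n"
for chunk in function_chunks(src, "pricing.py"):
    print(chunk["metadata"]["symbol"])
```

These chunk dictionaries can then be embedded and stored in the vector database alongside document chunks, with the symbol field available for metadata filtering.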
Operational Monitoring and Continuous Knowledge Base Improvement
When operating RAG systems in production, monitoring search accuracy and continuous improvement are essential. Query log analysis records user-entered questions, search result similarity scores, ultimately referenced documents, and session times to identify unanswered questions and low-score queries. Implement user feedback functionality ("Was this answer helpful?") and store positive/negative ratings linked to queries. Queries with many negative ratings become targets for improvements such as reviewing chunk splitting, adding metadata, and adjusting prompts. Document update monitoring uses webhooks from internal wikis or file servers to automatically re-index the vector DB when documents are updated. Regular vector DB optimization (deleting old documents, rebuilding indexes, updating embedding models) maintains search performance. A consulting company in Shibuya-ku creates monthly RAG system performance reports and visualizes KPIs such as search accuracy, user satisfaction, and coverage rates on dashboards.
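A minimal query log with feedback tracking can be sketched with the standard library's sqlite3; the table layout and helper names (init_log, log_query, improvement_targets) are illustrative assumptions, not part of OpenClaw.

```python
import sqlite3


def init_log(conn: sqlite3.Connection) -> None:
    """Create the query log: question, top similarity score, outcome, rating."""
    conn.execute("""CREATE TABLE IF NOT EXISTS query_log (
        id INTEGER PRIMARY KEY,
        query TEXT,
        top_score REAL,          -- best similarity score from the search
        answered INTEGER,        -- 1 if an answer was produced
        helpful INTEGER)""")     # 1 = positive, 0 = negative, NULL = no rating


def log_query(conn, query, top_score, answered, helpful=None):
    conn.execute(
        "INSERT INTO query_log (query, top_score, answered, helpful) "
        "VALUES (?, ?, ?, ?)",
        (query, top_score, int(answered), helpful))


def improvement_targets(conn, score_floor=0.7):
    """Queries that scored low or were rated unhelpful: candidates for
    re-chunking, metadata additions, or prompt adjustments."""
    return [row[0] for row in conn.execute(
        "SELECT query FROM query_log WHERE top_score < ? OR helpful = 0",
        (score_floor,))]


conn = sqlite3.connect(":memory:")
init_log(conn)
log_query(conn, "contract template location", 0.55, answered=True)
log_query(conn, "paid leave procedure", 0.92, answered=True, helpful=1)
log_query(conn, "vpn setup", 0.85, answered=True, helpful=0)
print(improvement_targets(conn))
```

Aggregating this table monthly gives the search-accuracy and satisfaction KPIs mentioned above without any extra infrastructure.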
Implementation Support Services by Oflight Inc.
Oflight Inc. provides implementation support services for RAG-enabled internal knowledge base AI utilizing Qwen3.5-9B and OpenClaw for companies in Tokyo areas including Shinagawa-ku, Minato-ku, Shibuya-ku, Setagaya-ku, and Ota-ku. Our support services include interviewing current document management environments and needs, designing optimal RAG architecture, selecting and setting up vector databases, implementing document ingestion pipelines, customizing and tuning OpenClaw agents, integrating with internal systems (Slack, Teams, internal portals), building operational monitoring dashboards, and creating employee training and documentation. Implementation in on-premises environments utilizing Mac mini enables use of advanced AI functionality without sending confidential information to external clouds. We provide consistent support from initial construction to operational support, realizing your company's operational efficiency and knowledge utilization. When considering implementing a RAG-enabled internal knowledge base AI, please consult Oflight Inc. Our experienced engineering team will propose the optimal solution for your company's challenges.
Contact Us