Shivaraj Mulimani | August 18, 2025

AI Assistant for Cybersecurity: Performance Hacks

Building an AI assistant that performs well in a controlled environment is one thing; ensuring it remains robust and reliable, especially in a high-stakes domain like cybersecurity, is a different challenge altogether. We recently developed a self-hosted Retrieval-Augmented Generation (RAG) system tailored for cybersecurity use cases. Initial benchmarks were strong, and we had full control over the stack. However, as the system evolved and components such as models were upgraded, we began to notice a subtle drift in response quality, revealing how small, overlooked changes can gradually erode overall performance.

The assistant still answered fluently, but the relevance of its responses started to erode. This wasn’t due to obvious bugs or model failure—it stemmed from underappreciated nuances in how retrieval performance can drift as the system evolves over time.


Identifying Retrieval Drift in Evolving RAG Pipelines

Through evaluation, we uncovered subtle but impactful mismatches. An instruction-tuned embedding wrapper was inadvertently used with a non-instruction-tuned model, producing inconsistent vector representations: an easy oversight when models are upgraded over time without fully accounting for underlying framework changes or limitations in the tech stack (LangChain Community Instruct Embeddings). We also found that a subtle change in the similarity metric significantly affected how accurately relevant information was retrieved (Milvus Metric Docs). Additionally, missing filters in the retrieval process led to overly broad results, introducing irrelevant context into the responses. None of these issues caused outright failure, but together they quietly degraded performance, highlighting how small configuration gaps in RAG systems can have significant downstream effects.
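The wrapper mismatch is easiest to see with a toy example. The sketch below uses a deliberately simple character-bigram "embedding" (not a real model) to show the failure mode: an instruct-style wrapper silently prepends a task instruction to queries, so query and document vectors are computed from different text even when the content is identical. The prefix string is illustrative, not the one from our stack.

```python
from collections import Counter
import math

# Toy "embedding": character-bigram counts. Stands in for a real
# embedding model purely to illustrate the wrapper mismatch.
def embed(text: str) -> Counter:
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

doc = "detect lateral movement in the network"
query = "detect lateral movement in the network"

# Plain wrapper: query and document are embedded from the same text.
assert abs(cosine(embed(query), embed(doc)) - 1.0) < 1e-9

# Instruct-style wrapper: a task instruction is prepended to the query
# but not to the stored documents, skewing the similarity score.
prefix = "Represent this sentence for retrieval: "  # illustrative prefix
assert cosine(embed(prefix + query), embed(doc)) < 1.0
```

With a real non-instruction-tuned model, the fix is simply to use the plain embedding wrapper so queries and documents pass through the identical text pipeline.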

Retrieval Layer Enhancements for Improved AI Assistant Performance

Once the root causes were clear, we implemented targeted fixes: we switched to the correct non-instruction-tuned embedding model with a plain wrapper (LangChain HuggingFaceEmbeddings), adjusted the similarity metric to use inner product with proper vector normalization (Distance Metric Impact), and conducted controlled tests using FLAT indexing to isolate retrieval performance (Milvus Indexing Reference). These adjustments not only restored accuracy but slightly improved our baseline metrics, with Recall at 0.9036, MRR at 0.8730, and NDCG at 0.8864. To further enhance performance, we introduced reranking to reorder retrieved results based on deeper semantic relevance, and applied query expansion to break down complex queries into simpler parts. Together, these enhancements significantly improved response quality and enabled the assistant to handle nuanced cybersecurity queries more effectively.
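Why does normalization matter when using inner product? The short NumPy sketch below (illustrative vectors, not our data) shows that raw inner product rewards vector magnitude, so a long but off-topic vector can outrank a relevant one; after L2-normalization, inner product becomes cosine similarity and the relevant document wins.

```python
import numpy as np

query = np.array([1.0, 0.0])
doc_relevant = np.array([0.9, 0.1])  # points in the query's direction
doc_long = np.array([5.0, 5.0])      # off-topic but large magnitude
docs = np.stack([doc_relevant, doc_long])

# Raw inner product: magnitude dominates, the off-topic vector wins.
raw_ip = docs @ query
assert raw_ip.argmax() == 1

# L2-normalize both sides: inner product now equals cosine similarity,
# and the directionally relevant document ranks first.
docs_n = docs / np.linalg.norm(docs, axis=1, keepdims=True)
query_n = query / np.linalg.norm(query)
normed_ip = docs_n @ query_n
assert normed_ip.argmax() == 0
```

This is why a metric switch to inner product must be paired with normalization at both indexing and query time; Milvus applies whichever metric the collection is configured with, so the vectors themselves have to carry the normalization.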

The Role of Retrieval Metrics in RAG-Based AI Assistant Reliability

These details matter because metrics like Recall, MRR, and NDCG aren’t just academic—they act as essential guardrails for retrieval quality. Issues in retrieval logic rarely cause visible errors but often surface as subtle failures: hallucinated answers, buried insights, or off-target responses. In domains like cybersecurity, that can mean the difference between actionable intelligence and misleading output. To avoid these pitfalls, it’s crucial to understand your embedding models and their wrappers—especially the impact of instruction tuning—ensure alignment in similarity metrics, and monitor retrieval-specific metrics alongside generation quality. Enhancing your system with techniques like reranking and query expansion can further strengthen response accuracy and relevance.
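To make those guardrails concrete, here are minimal reference implementations of the three metrics, applied to a small made-up example (the ranked list and relevance labels are illustrative, not from our evaluation set).

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant docs that appear in the top k results."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant result (0 if none)."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance NDCG: discounted gain vs. the ideal ranking."""
    dcg = sum(1.0 / math.log2(r + 1)
              for r, doc_id in enumerate(retrieved[:k], start=1)
              if doc_id in relevant)
    ideal = sum(1.0 / math.log2(r + 1)
                for r in range(1, min(len(relevant), k) + 1))
    return dcg / ideal

retrieved = ["d3", "d1", "d7", "d2"]  # ranked retriever output
relevant = {"d1", "d2"}               # ground-truth relevant docs

assert recall_at_k(retrieved, relevant, 3) == 0.5  # only d1 in top 3
assert mrr(retrieved, relevant) == 0.5             # first hit at rank 2
assert round(ndcg_at_k(retrieved, relevant, 3), 3) == 0.387
```

Tracking these per query over time is what exposed our drift: generation quality looked fine while the retrieval-side numbers quietly slipped.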

RAG systems are powerful—but also fragile if not designed and monitored with care. If you’re building AI assistants for high-stakes domains, pay attention to the retrieval layer. That’s where the battle for accuracy is often won or lost.

For a deep technical dive, including all evaluation methods, architecture diagrams, and implementation tips, refer to our white paper.
