Self-Hosted RAG for Institutional FAQs: Two-Level Abstention, Reliability, and CPU-Only Performance

Chatbots based on Large Language Models (LLMs) are used in higher education to streamline the handling of frequently asked questions; however, their institutional adoption is constrained by the risk of ungrounded responses (hallucinations) and by the need for traceability and quality control in sensitive domains. This study presents a domain-restricted chatbot prototype built on a Retrieval-Augmented Generation (RAG) architecture. It combines semantic retrieval over publicly available institutional documentation with response generation by a self-hosted LLM, and adds control mechanisms that let the system abstain when sufficient evidence is unavailable. The evaluation comprises (i) quantitative testing on 451 queries (351 in-scope and 100 out-of-scope), (ii) end-to-end latency measurement by percentiles and load testing, and (iii) a user acceptance assessment (n = 80). The system achieves Macro-F1 = 0.8212, correctness on answered in-scope queries = 89.81\% (291/324; the 27 of 351 in-scope queries on which the system abstained are excluded), a hallucination rate of 4.17\% over evaluable generated responses (n = 336), abstention recall = 88\%, an in-scope non-rejection rate of 92.31\%, and P95 latency = 4.67 s. Overall, the results support the feasibility of a RAG-based approach with abstention control for automating institutional queries with adequate operational metrics in a higher education context.
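
The reported rates can be cross-checked from the counts given in the abstract. The following is a minimal sketch, not the authors' evaluation code. The constant and function names are hypothetical, and two counts are inferred rather than stated in the source: 88 abstentions on out-of-scope queries (from 88\% of 100, assuming abstention recall is computed over the out-of-scope set) and 14 hallucinated responses (from 4.17\% of 336).

```python
# Counts stated in the abstract.
IN_SCOPE = 351
OUT_OF_SCOPE = 100
IN_SCOPE_ABSTAINED = 27
CORRECT_ANSWERED = 291
EVALUABLE_GENERATED = 336

# Assumptions: inferred from the reported percentages, not stated in the source.
OUT_OF_SCOPE_ABSTAINED = 88  # 88% abstention recall over 100 out-of-scope queries
HALLUCINATED = 14            # 4.17% of 336 evaluable generated responses


def pct(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to two decimals."""
    return round(100.0 * numerator / denominator, 2)


# In-scope queries that received an answer instead of an abstention.
answered_in_scope = IN_SCOPE - IN_SCOPE_ABSTAINED

metrics = {
    "total_queries": IN_SCOPE + OUT_OF_SCOPE,
    "answered_in_scope": answered_in_scope,
    "correctness_pct": pct(CORRECT_ANSWERED, answered_in_scope),
    "non_rejection_pct": pct(answered_in_scope, IN_SCOPE),
    "abstention_recall_pct": pct(OUT_OF_SCOPE_ABSTAINED, OUT_OF_SCOPE),
    "hallucination_pct": pct(HALLUCINATED, EVALUABLE_GENERATED),
}

# Under the same assumptions, the generated responses are the answered
# in-scope queries plus the out-of-scope queries that were not rejected.
generated = answered_in_scope + (OUT_OF_SCOPE - OUT_OF_SCOPE_ABSTAINED)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value}")
    print(f"generated responses: {generated}")
```

Under these assumptions the generated-response count equals the reported n = 336, which suggests that the hallucination rate is measured over all generated answers, including those produced for out-of-scope queries that were not rejected.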

Erick Mora
Universidad de las Fuerzas Armadas ESPE
Ecuador

Marco Esparza
Universidad de las Fuerzas Armadas ESPE
Ecuador

Mayra Alvarez
Universidad de las Fuerzas Armadas ESPE
Ecuador

Geovanny Cudco
Universidad de las Fuerzas Armadas ESPE
Ecuador