Project CNTT

Guiding assistant for Vietnamese university admissions powered by Advanced RAG

The University Admission Assistant is an AI-powered system designed to support admission inquiries for 15 universities in Ho Chi Minh City. The system leverages Retrieval-Augmented Generation (RAG) techniques to enhance query responses by integrating structured data retrieval with generative AI models. This ensures accurate and context-aware answers to users' admission-related questions.

Link

System Architecture

This project presents two versions of an agent system, comparing their architecture, tools, and functionalities. The improvements in Version 2 result from extensive testing and experimentation.

Version 1

The initial system employs an embedding-based approach using recursive splitting and vector search for efficient querying. It includes the following tools:

Time: Provides the current time in Vietnamese.
getRetrieval: Inspired by self-RAG, retrieves relevant information.
Websearch: Conducts web searches for additional insights.
getScore: Converts queries into SQL and returns relevant results based on scoring.
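The recursive splitting used by Version 1 can be illustrated with a small, dependency-free sketch. It is not the project's actual splitter: the function name `recursive_split`, the separator list, and the 30-character limit are assumptions chosen for the example.

```python
from typing import List


def recursive_split(text: str, max_len: int, separators: List[str]) -> List[str]:
    """Split text into chunks of at most max_len characters.

    The first separator is tried first; any piece that is still too long is
    split again with the remaining separators. When no separators are left,
    the text is cut into fixed-size slices.
    """
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, rest = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_len:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current.strip():
            chunks.append(current)
        current = ""
        if len(piece) <= max_len:
            current = piece
        else:
            # The piece alone is too long: recurse with finer separators.
            chunks.extend(recursive_split(piece, max_len, rest))
    if current.strip():
        chunks.append(current)
    return chunks


sample = (
    "Admission scores.\n\n"
    "Tuition fees are listed per semester.\n\n"
    "Contact the office."
)
chunks = recursive_split(sample, max_len=30, separators=["\n\n", "\n", " "])
for chunk in chunks:
    print(repr(chunk))
```

Each resulting chunk would then be embedded and stored in the vector database for similarity search.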

Version 2 (Final Version)

Significant enhancements were made, including:

Semantic Chunking: Improved data segmentation for better understanding.
Hybrid Search: Combines multiple search strategies for enhanced accuracy.
- See also: Rerank_EmbeddingModel
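One common way to combine several search strategies is reciprocal rank fusion (RRF). The sketch below assumes that technique purely for illustration; the source does not state which fusion method the project uses, and the document IDs and the constant `k = 60` are placeholders.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in;
    documents are returned by descending total score (ties broken by ID).
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense (embedding) search and a keyword search.
dense_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
print(fused)
```

Documents found by both searches rise to the top, which is the main appeal of fusing rankings instead of relying on a single retriever.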

Updated Tools:

Time: Retained from Version 1.
Corrective_RAG: A new tool that routes queries through three channels:
- Retrieval: Fetches relevant information.
- SQL: Queries structured data.
- Websearch: Performs web searches for external data.

The Corrective_RAG tool optimizes query routing for more efficient and accurate responses.
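The three-channel routing can be pictured with a minimal sketch. It assumes simple keyword rules and stub handlers; the real tool may well use an LLM-based router, and every name here (`route_query`, `answer_via_route`, the keyword tuples) is hypothetical.

```python
from typing import Callable, Dict

# Hypothetical trigger words; a production router would likely ask an LLM.
SQL_KEYWORDS = ("score", "cutoff", "benchmark")
WEB_KEYWORDS = ("latest", "news", "today")


def route_query(query: str) -> str:
    """Pick one of the three channels: 'sql', 'websearch', or 'retrieval'."""
    q = query.lower()
    if any(word in q for word in SQL_KEYWORDS):
        return "sql"
    if any(word in q for word in WEB_KEYWORDS):
        return "websearch"
    return "retrieval"  # default: look the answer up in the vector store


# Stub handlers standing in for the real retrieval, SQL, and web search code.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "retrieval": lambda q: f"[vector store] {q}",
    "sql": lambda q: f"[database] {q}",
    "websearch": lambda q: f"[web] {q}",
}


def answer_via_route(query: str) -> str:
    """Dispatch the query to the handler chosen by the router."""
    return HANDLERS[route_query(query)](query)


print(answer_via_route("What was the cutoff score for IT in 2023?"))
print(answer_via_route("Any latest news on enrollment deadlines?"))
print(answer_via_route("How do I apply for a dormitory?"))
```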

Table of contents

Guiding assistant for Vietnamese university admissions powered by Advanced RAG
Link
System Architecture
- Version 1
- Version 2 (Final Version)
Table of contents
Project structure
Getting started
- Note:
- Run locally
  - Prepare environment
Evaluate
DEMO
- Version 1
- Version 2
References
- Paper
- Others
Contact

Project structure

├── UNIVERSITY_ADMISSIONS_ASSISTANT
│   ├── src                                      # Source code
│   │   ├── adaptive_rag.py                     # Adaptive retrieval-augmented generation
│   │   ├── corrective_rag.py                   # Corrective RAG implementation
│   │   ├── gemi_agent_v1.py                    # First version of GEMI agent
│   │   ├── gemi_agent_v2.py                    # Updated version of GEMI agent
│   │   ├── grader.py                           # Grading system logic
│   │   ├── load_key.py                         # API key loading utilities
│   │   ├── main.py                             # Main entry point for the application
│   │   ├── query_router.py                     # Routing queries to appropriate modules
│   │   ├── query_to_sql.py                     # Convert queries to SQL statements
│   │   ├── query_transformation.py             # Transform query formats
│   │   ├── retrieval_hybrid.py                 # Hybrid retrieval techniques
│   │   ├── retrieval_nv.py                     # Named-variant retrieval implementation
│   │   ├── serve.py                            # Server-related functionalities
│   │   ├── university_admissions.db            # Database for university admissions
│   │   ├── web_search.py                       # Web-based search utilities
│   ├── README.md                               # Project documentation
│   ├── requirements.txt                        # Dependencies for the project
│   ├── .env                                    # Environment variables (API keys, etc.)
│   ├── .env.example                            # Example of environment configuration
│   ├── .gitignore                              # Ignore files for Git
│   ├── creadientials_vertex.json               # JSON config file (possibly credentials)

Getting started

To get started with this project, do the following:

Configure all API keys listed in .env.example (Qdrant, Tavily, Groq, Gemini, LangSmith).

Note:

The reason for using two Qdrant APIs is that one is used for storing the database with recursive chunking, while the other is used for semantic chunking.
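Before starting the app, it can help to check that every key is actually set. The sketch below is only an illustration: the variable names in `REQUIRED_KEYS` are hypothetical, so replace them with the names that appear in your own .env.example.

```python
import os
from typing import Iterable, List, Mapping

# Hypothetical variable names; use the ones defined in .env.example.
REQUIRED_KEYS = (
    "QDRANT_API_KEY_RECURSIVE",  # collection built with recursive chunking
    "QDRANT_API_KEY_SEMANTIC",   # collection built with semantic chunking
    "TAVILY_API_KEY",
    "GROQ_API_KEY",
    "GOOGLE_API_KEY",
    "LANGCHAIN_API_KEY",
)


def missing_keys(env: Mapping[str, str], required: Iterable[str]) -> List[str]:
    """Return the names that are absent from env or set to an empty value."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys(os.environ, REQUIRED_KEYS)
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All API keys are set.")
```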

Additionally, during development, you can experiment with the free $300 Google Cloud Console credits to obtain API keys.

The agent also uses the ChatVertexAI LLM. Setting it up requires configuring a project in the Google Cloud Console and downloading a service account JSON key file, which the agent needs in order to load the LLM.

You can refer to the following link for a guide on setup and credential creation:
LangChain Google Vertex AI Setup

Alternatively, follow the steps below to obtain the credentials JSON file:
Step 1: Create a Google Cloud Project

Go to Google Cloud Console.
Click on "Select a project" (or "Create Project" if you don’t have one).
Enter a Project Name and click "Create".

Step 2: Enable Vertex AI API

In the Google Cloud Console, navigate to APIs & Services → Library.
Search for "Vertex AI API".
Click Enable.

Step 3: Create a Service Account

Go to APIs & Services → Credentials.
Click "Create Credentials" → "Service account".
Enter a Name, ID, and Description, then click Create.
Assign the Vertex AI User and Editor roles (or a custom role with sufficient permissions).
Click Done.

Step 4: Generate and Download the JSON Key

In the Credentials section, find the newly created Service Account.
Click on it → Navigate to the Keys tab.
Click "Add Key" → "Create new key".
Select "JSON" format and click "Create".
A JSON file will be downloaded to your system (e.g., tdtuchat-16614553b756.json).

Step 5: Update the JSON File Name

Update the name of the JSON file referenced in the two Python files, gemi_agent_v1.py and gemi_agent_v2.py, so that it matches the file you downloaded.
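A quick sanity check on the downloaded key can catch a wrong or truncated file early. The sketch below assumes only the standard fields of a Google service account key; the helper name `check_service_account_file` and the demo values are made up for the example.

```python
import json
import os
import tempfile

# Fields that a Google service account key file normally contains.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_file(path: str) -> str:
    """Return the project_id if the file looks like a service account key."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"{path} is missing fields: {', '.join(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"{path} is not a service account key")
    return data["project_id"]


# Demo with a fake key written to a temporary directory (placeholder values).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_credentials.json")
    fake_key = {
        "type": "service_account",
        "project_id": "demo-project",
        "private_key": "PLACEHOLDER",
        "client_email": "demo@example.com",
    }
    with open(demo_path, "w", encoding="utf-8") as fh:
        json.dump(fake_key, fh)
    project_id = check_service_account_file(demo_path)

print(project_id)
```

With a real key, pass the path of your downloaded file instead of the temporary demo file.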

Run locally

Prepare environment

Install the project's dependencies in a local virtual environment:

python -m venv .venv
source .venv/Scripts/activate      # Windows (Git Bash); on Linux/macOS use: source .venv/bin/activate
pip install -r requirements.txt

After configuring all JSON files, setting up all API keys in the .env file, and installing dependencies from requirements.txt, follow these steps to run the system:

Open Terminal

Activate Virtual Environment

Using venv:

source .venv/Scripts/activate      # On Windows (Git Bash)

Navigate to the Project Folder
```
cd UNIVERSITY_ADMISSIONS_ASSISTANT/src
```
Run the Application
```
streamlit run main.py
```

This will start the University Admission Assistant system, which will be accessible in your browser via the Streamlit interface. 🚀

Evaluate

In this experiment, we compare two information retrieval methods, Naive RAG and Hybrid Search, on data chunked with recursive splitting. Both methods are evaluated with standard ranking metrics: Precision@k (P@k), Recall@k, Mean Reciprocal Rank (MRR@k), Discounted Cumulative Gain (DCG@k), and Normalized DCG (NDCG@k), at k = 3 and k = 5.
See: Data evaluation
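For reference, the ranking metrics named above can be computed as in the following sketch. It assumes binary relevance (a document is either relevant or not) and uses made-up document IDs; it is not the project's evaluation script. MRR@k is the mean of the reciprocal rank over all evaluation queries.

```python
import math
from typing import List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(doc in relevant for doc in ranked[:k]) / k


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents found in the top-k results."""
    if not relevant:
        return 0.0
    return sum(doc in relevant for doc in ranked[:k]) / len(relevant)


def reciprocal_rank_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """1 / rank of the first relevant result in the top k, or 0 if none."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def dcg_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Discounted cumulative gain with binary gains."""
    return sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """DCG divided by the DCG of an ideal ranking."""
    ideal_hits = min(k, len(relevant))
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg_at_k(ranked, relevant, k) / ideal if ideal else 0.0


# Hypothetical result list for one query and its ground-truth documents.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2"}

for k in (3, 5):
    print(
        k,
        round(precision_at_k(ranked, relevant, k), 4),
        round(recall_at_k(ranked, relevant, k), 4),
        round(reciprocal_rank_at_k(ranked, relevant, k), 4),
        round(ndcg_at_k(ranked, relevant, k), 4),
    )
```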

Recursive chunking

Semantic chunking

Evaluation Using RAGAS

To evaluate the system, we tested two versions: Self-RAG (Version 1) and Corrective-RAG (Version 2). LangSmith was also used to assess workflow performance and processing times.

| Metric | Self-RAG (Version 1) | Corrective-RAG (Version 2) |
| --- | --- | --- |
| Faithfulness | 0.7241 | 0.834 |
| Answer Relevancy | 0.7161 | 0.666 |
| Context Recall | 0.3113 | 0.504 |
| Context Precision | 0.4167 | 0.571 |
| Semantic Similarity | 0.8835 | 0.886 |
| Answer Correctness | 0.4014 | 0.565 |

Performance Analysis

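Corrective-RAG (Version 2) improves faithfulness (0.834 vs. 0.7241), context recall (0.504 vs. 0.3113), context precision (0.571 vs. 0.4167), and answer correctness (0.565 vs. 0.4014).
Self-RAG (Version 1) scores slightly higher on answer relevancy (0.7161 vs. 0.666) but trails on context recall and answer correctness.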
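The differences between the two versions can be recomputed from the RAGAS scores reported in this section. The snippet below is just that arithmetic, with the values copied from the table; the variable names are illustrative.

```python
# (Version 1, Version 2) scores copied from the RAGAS results in this section.
scores = {
    "Faithfulness": (0.7241, 0.834),
    "Answer Relevancy": (0.7161, 0.666),
    "Context Recall": (0.3113, 0.504),
    "Context Precision": (0.4167, 0.571),
    "Semantic Similarity": (0.8835, 0.886),
    "Answer Correctness": (0.4014, 0.565),
}

# Absolute change from Version 1 to Version 2 for every metric.
deltas = {name: round(v2 - v1, 4) for name, (v1, v2) in scores.items()}
regressions = [name for name, delta in deltas.items() if delta < 0]

for name, delta in deltas.items():
    print(f"{name}: {delta:+.4f}")
print("Metrics where Version 2 is lower:", regressions)
```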

Processing Time Comparison

Self-RAG: Execution times range from 5.64 to 11.84 seconds, with extreme cases reaching 88+ seconds, showing high inconsistency.
Corrective-RAG: Execution times vary from 7.66 to 55.31 seconds, with some stable runs under 10 seconds and complex queries extending beyond 20 seconds.

Optimization Considerations

Self-RAG struggles with complex queries, requiring query optimization and system enhancements for stability.
Corrective-RAG still experiences variability, which could be improved through caching and processing pipeline optimizations.
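As one possible form of the caching mentioned above, repeated questions could be answered from an in-memory cache. This is a minimal sketch using `functools.lru_cache`; `_slow_pipeline` is a stand-in for the real RAG pipeline, and the normalization rule is an assumption.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the expensive pipeline really runs


def _slow_pipeline(query: str) -> str:
    """Stand-in for the retrieval and generation pipeline."""
    CALLS["count"] += 1
    return f"answer for: {query}"


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivial variants share a cache key."""
    return " ".join(query.lower().split())


@lru_cache(maxsize=256)
def _cached_answer(normalized_query: str) -> str:
    return _slow_pipeline(normalized_query)


def answer(query: str) -> str:
    """Answer a query, reusing the cached result for equivalent queries."""
    return _cached_answer(normalize(query))


first = answer("Tuition fees?")
second = answer("  tuition   FEES? ")  # same question, served from the cache
print(first == second, CALLS["count"])
```

A cache like this only helps with repeated questions, and cached answers go stale when the underlying admission data changes, so entries would need to be invalidated after each data update.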

In conclusion, Corrective-RAG (Version 2) provides better faithfulness, context handling, and correctness, making it a more reliable choice despite some runtime fluctuations.

DEMO

Version 1

Version 2

References

Paper

[1] Wang, H., Zhang, R., Tao, M., & Liu, Y. (2023). Retriever-Augmented Generation for Knowledge-Intensive NLP Tasks: A Survey. arXiv. https://arxiv.org/pdf/2307.06435

[2] Ram, O., Shreter, U., Shoham, N., & Levy, O. (2023). Corrective RAG: Intervention-Based Retrieval for Mitigating Hallucination in LLMs. arXiv. https://arxiv.org/pdf/2303.18223

[3] Yan, S.-Q., Gu, J.-C., Zhu, Y., & Ling, Z.-H. (2024). Corrective Retrieval Augmented Generation. arXiv. https://arxiv.org/abs/2401.15884

[4] Zhao, P., Zhang, H., Yu, Q., Wang, Z., Geng, Y., Fu, F., Yang, L., Zhang, W., Jiang, J., & Cui, B. (2024). Retrieval-Augmented Generation for AI-Generated Content: A Survey. arXiv. https://arxiv.org/abs/2402.19473

[5] Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., & Wang, H. (2023). Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv. https://arxiv.org/abs/2312.10997

[6] Ye, Q., Axmed, M., Pryzant, R., & Khani, F. (2023). Prompt Engineering a Prompt Engineer. arXiv. https://arxiv.org/abs/2311.05661

[7] Es, S., James, J., Espinosa-Anke, L., & Schockaert, S. (2023). RAGAS: Automated Evaluation of Retrieval Augmented Generation. arXiv. https://arxiv.org/abs/2309.15217

[8] Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2023). Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. arXiv. https://arxiv.org/abs/2310.11511

Others

Contact

If you need support, or the Qdrant API key and URL, contact us:

Member 1
- Hoang Dinh Quy Vu
- 0868245465
- hoangdinhquyvu.snape.22@gmail.com
Member 2
- Tran Quoc An
- 0383474552
- quocan1203it@gmail.com