RESEARCH
Technology Review
Pinecone Vector DB
Pinecone is a vector database designed for managing high-dimensional vector data.
A vector database is essential for storing and querying high-dimensional vector
embeddings, a core operation in RAG systems built on LLMs.
We chose Pinecone for three reasons: scalability, real-time search, and semantic
search. Pinecone is built to handle large-scale vector data, which raises the
practical limit on the number of input files. It provides real-time search,
which allows quick query responses. Lastly, Pinecone performs semantic search,
retrieving results that are contextually relevant rather than mere keyword
matches, which leads to higher consistency and relevance.
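The idea behind semantic search can be illustrated with a toy sketch: documents and queries become vectors, and relevance is measured by vector similarity (here cosine similarity) rather than shared keywords. The three-dimensional vectors below are made up for illustration; real embeddings have hundreds or thousands of dimensions, and Pinecone's internal index is far more sophisticated than this linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": the semantically related document shares no keywords with
# the query, but its vector points in nearly the same direction, so a
# similarity search still ranks it first.
query = [0.9, 0.1, 0.2]
doc_semantic_match = [0.8, 0.2, 0.1]  # contextually related, no shared keywords
doc_keyword_only = [0.1, 0.9, 0.7]    # superficially similar text, different meaning

assert cosine_similarity(query, doc_semantic_match) > \
       cosine_similarity(query, doc_keyword_only)
```

This is why retrieval quality depends on the embedding model as much as on the database: the database only ranks vectors it is given.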
Why Pinecone over other vector databases? We considered Weaviate; however,
Pinecone scales further without the need to manually configure the database.
Scalability is a key factor in our project because, unlike a chatbot or other
typical RAG applications, we deal specifically with a large number of papers.
Pinecone is also easier to use, as it is a fully managed service that requires
no maintenance or configuration, whereas Weaviate is open source and typically
self-hosted. We did not choose Weaviate because it would require more time and
effort to set up and maintain.
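Because Pinecone is a managed service, ingesting papers reduces to formatting records and calling the client, with no cluster administration. The helper below shows Pinecone's upsert record shape; the index name, chunk IDs, and metadata fields are hypothetical placeholders, not values from our actual codebase.

```python
def to_pinecone_records(paper_chunks):
    """Format (chunk_id, embedding, text) tuples into the record shape
    Pinecone's upsert call expects: id, values, and optional metadata."""
    return [
        {"id": chunk_id, "values": embedding, "metadata": {"text": text}}
        for chunk_id, embedding, text in paper_chunks
    ]

# With the managed service, the upload itself is a short client call
# (hypothetical index name; requires an API key, so shown as a comment):
#   from pinecone import Pinecone
#   pc = Pinecone(api_key="...")
#   index = pc.Index("systematic-review")
#   index.upsert(vectors=to_pinecone_records(chunks))

records = to_pinecone_records(
    [("paper1-chunk0", [0.1, 0.2, 0.3], "Background section text...")]
)
```

Storing the source text in metadata lets a later query return the passages themselves, not just vector IDs.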
OpenAI GPT-4o Mini
We used OpenAI GPT-4o Mini as the Large Language Model (LLM) of our prototype.
The LLM plays a crucial role in a RAG system: it processes the retrieved text
to generate sections of a review, which are then combined into a full
systematic review. We chose GPT-4o Mini because it follows prompts well and
produces output of decent quality. We could not invest in stronger models such
as GPT-4.5, or in fine-tuning a pre-trained LLM, due to resource and pricing
constraints.
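The per-section generation step can be sketched as assembling a chat prompt from the retrieved chunks and sending it to the model. The prompt wording and function names below are illustrative, not the exact prompts used in our prototype; the commented client call is the standard OpenAI Chat Completions API.

```python
def build_section_messages(section_title, retrieved_chunks):
    """Assemble a chat prompt asking the model to draft one review section,
    grounded only in the retrieved excerpts (illustrative wording)."""
    context = "\n\n".join(retrieved_chunks)
    return [
        {"role": "system",
         "content": "You are an assistant that writes sections of a systematic review."},
        {"role": "user",
         "content": f"Using only the excerpts below, write the '{section_title}' "
                    f"section of the review.\n\nExcerpts:\n{context}"},
    ]

# Sending the prompt to GPT-4o Mini (requires an API key, so shown as a comment):
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(
#       model="gpt-4o-mini",
#       messages=build_section_messages("Methods", chunks),
#   )
#   section_text = resp.choices[0].message.content

messages = build_section_messages("Methods", ["Excerpt A...", "Excerpt B..."])
```

Generating one section per call, then concatenating the sections, keeps each prompt focused and within the context window.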
So how is GPT-4o Mini better than GPT-3.5? GPT-4o Mini supports a context
window of up to 128k tokens, whereas GPT-3.5 Turbo supports only about 16.4k.
The larger window lets us pass more retrieved text into each call and generate
longer output of better quality. Moreover, GPT-4o Mini has a more recent
knowledge cut-off (October 2023) compared to GPT-3.5 (September 2021).
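The practical effect of the larger window can be sketched as a context budget: how many retrieved chunks fit into one call. The 4-characters-per-token ratio is a rough rule of thumb for English text, not an exact tokenizer, and the reserve for output is an assumed figure.

```python
# Context window sizes in tokens (GPT-3.5 Turbo's limit is 16,385).
GPT4O_MINI_CONTEXT = 128_000
GPT35_CONTEXT = 16_385

def approx_tokens(text):
    """Rough token estimate: ~4 characters per token for English text."""
    return len(text) // 4

def chunks_that_fit(chunks, context_limit, reserve_for_output=4_000):
    """Greedily select retrieved chunks until the context budget is spent."""
    budget = context_limit - reserve_for_output
    selected, used = [], 0
    for chunk in chunks:
        cost = approx_tokens(chunk)
        if used + cost > budget:
            break
        selected.append(chunk)
        used += cost
    return selected

# 200 chunks of roughly 1,000 tokens each: GPT-4o Mini's ~8x larger window
# admits far more retrieved text per call than GPT-3.5's.
chunks = ["x" * 4_000] * 200
assert len(chunks_that_fit(chunks, GPT4O_MINI_CONTEXT)) > \
       len(chunks_that_fit(chunks, GPT35_CONTEXT))
```

Under these assumptions, GPT-3.5's window admits only about a dozen such chunks per call, while GPT-4o Mini's admits over a hundred.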
Next.js
We have chosen Next.js, a React framework, with JavaScript to develop our website, as opposed to other web frameworks such as Angular or Vue.js, because React offers more flexibility. React provides plenty of tools, such as state-management hooks and support for the UI framework Tailwind CSS, allowing for more efficient development of the design. A vast ecosystem of libraries is available for React, including libraries for converting Markdown text to HTML. We specifically chose Next.js because it provides server-side rendering, which reduces load time, and file-based routing, which simplifies page navigation.
Technical Decisions Summary
Our technical choices evolved through iterative experimentation, balancing effectiveness with cost-efficiency. Decisions were made both pre-development and during implementation phases, with continuous optimization of approaches to minimize resource expenditure while maximizing system performance.