Retrieval-Augmented Generation (RAG) chatbots are valuable tools for providing users with relevant and up-to-date information. However, as the world evolves and data in your document sources (PDF, HTML, Word, Text) changes over time, it’s crucial to have a robust strategy in place for updating the underlying vector embeddings so that your RAG chatbot can continue delivering accurate responses. In this blog post, we will discuss various strategies for updating RAG data sources, including using transactional data stores, batch update processes, and implementing a Change Data Capture (CDC) approach.
One of the most effective ways to manage updates in your RAG chatbot’s data sources is by chunking documents into a transactional data store, using a framework such as LangChain or LlamaIndex to do the chunking. This allows you to replace chunks when they are updated while also tracking important metrics about their performance, including how often they are queried and the number of positive/negative responses they receive from users.
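As a rough illustration, a single chunk record in such a store might carry both the text and its usage metrics. Every field name below (query_count, thumbs_up, thumbs_down, and so on) is an assumption for the sketch, not a required schema:

```python
from datetime import datetime, timezone

# Illustrative chunk record for a transactional store such as MongoDB.
# All field names here are assumptions, not a prescribed schema.
chunk_doc = {
    "source": "policies/returns.pdf",      # document the chunk came from
    "chunk_id": 42,                        # position of the chunk within that document
    "text": "Items may be returned within 30 days of purchase...",
    "embedding": [0.012, -0.034, 0.101],   # dense vector (truncated here for readability)
    "query_count": 0,                      # how often this chunk was retrieved
    "thumbs_up": 0,                        # positive user feedback attributed to this chunk
    "thumbs_down": 0,                      # negative user feedback attributed to this chunk
    "updated_at": datetime.now(timezone.utc),
}
```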
When implementing a chunking application, it’s essential to support real-time edits through a Content Management System (CMS) built on top of your knowledge chunks. This will enable you to make corrections to poor or incorrect responses promptly and ensure that your chatbot provides accurate information at all times.
An efficient way to keep your RAG data sources up-to-date is by employing a batch update strategy, either through a timer-based system or by detecting changes in the documents themselves. This method may not be the most immediate, but it ensures that your chatbot always has access to current information.
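One way to drive such a batch job, assuming your source documents live on a file system, is to fingerprint each file and re-chunk only the ones whose content hash has changed. This is a minimal sketch; the directory layout and the idea of persisting known_hashes alongside your chunks are assumptions:

```python
import hashlib
from pathlib import Path

def file_fingerprint(path: Path) -> str:
    """Hash the file contents so the batch job can tell whether a document changed."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def find_changed_documents(doc_dir: str, known_hashes: dict[str, str]) -> list[Path]:
    """Return documents whose content no longer matches the stored fingerprint."""
    changed = []
    for path in Path(doc_dir).rglob("*"):
        if path.is_file() and known_hashes.get(str(path)) != file_fingerprint(path):
            changed.append(path)
    return changed

# Run this on a timer (cron, a scheduler) and re-chunk/re-embed only what changed.
```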
To stay informed about updates in your data sources, you can also leverage Change Data Capture (CDC), which involves monitoring changes within your database and triggering actions, such as re-embedding, based on those modifications. One example of this approach is VectorStream, an open-source tool that enables real-time vector updates in MongoDB Atlas.
In summary, developing a plan for updating RAG data sources is essential for maintaining the accuracy and reliability of your chatbot’s responses. By utilizing transactional data stores, employing batch update strategies, and leveraging CDC techniques, you can ensure that your chatbot remains well-informed and ready to assist users with up-to-date information.
Data in your document sources (PDF, HTML, Word, Text) will change over time. You should have a strategy for updating your vector embeddings so your RAG (retrieval augmented generation) chatbot can talk about up to date information
Consider a process of chunking your documents into a transactional data store (using LangChain or LlamaIndex), so you can replace chunks if they are updated. This also lets you store metrics about each chunk’s performance, such as how often it’s queried. You can also track how many times the chunk had a positive or negative response if you implement user-provided thumbs up/down on your chatbot responses.
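A minimal sketch of that chunk-and-replace step, assuming MongoDB as the transactional store and LangChain for splitting (the connection string, database, collection, and field names are placeholders, and the splitter import path varies by LangChain version):

```python
from pymongo import MongoClient
from langchain_text_splitters import RecursiveCharacterTextSplitter

client = MongoClient("mongodb+srv://...")   # placeholder connection string
chunks = client["rag"]["chunks"]            # database/collection names are assumptions

def rechunk_document(source: str, text: str) -> None:
    """Replace all chunks for a source document and reset its per-chunk metrics."""
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks.delete_many({"source": source})  # drop the stale chunks for this document
    chunks.insert_many([
        {
            "source": source,
            "chunk_id": i,
            "text": piece,
            "query_count": 0,
            "thumbs_up": 0,
            "thumbs_down": 0,
        }
        for i, piece in enumerate(splitter.split_text(text))
    ])
```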
A batch update strategy is recommended, either on a timer or when you detect changes to the documents. This is not super efficient, but it’ll ensure you have up to date knowledge in your chatbot
If you build a CMS system on top of your text/knowledge chunks you can edit them in real time to correct poor or incorrect responses from the chatbot.
Also consider that you need to run updated chunks through your text embedding model, so you have updated dense vectors in your vector search engine. This can be done using CDC (change data capture) and triggers, or as part of your chunking application. VectorStream (https://github.com/patw/VectorStream) is an example of how to do this in MongoDB Atlas
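For the CDC route, MongoDB change streams can drive the re-embedding. The sketch below is not VectorStream’s code, just an illustration of the pattern; the collection name and embedding model are assumptions:

```python
from pymongo import MongoClient
from openai import OpenAI

mongo = MongoClient("mongodb+srv://...")    # placeholder connection string
chunks = mongo["rag"]["chunks"]             # collection name is an assumption
ai = OpenAI()                               # reads OPENAI_API_KEY from the environment

# Watch for new or edited chunks (CDC via change streams) and refresh their vectors.
pipeline = [{"$match": {"operationType": {"$in": ["insert", "update"]}}}]
with chunks.watch(pipeline, full_document="updateLookup") as stream:
    for change in stream:
        if change["operationType"] == "update":
            updated = change.get("updateDescription", {}).get("updatedFields", {})
            if "text" not in updated:
                continue  # ignore writes (e.g. our own embedding updates) that didn't touch the text
        doc = change["fullDocument"]
        vector = ai.embeddings.create(
            model="text-embedding-3-small",  # embedding model is an assumption
            input=doc["text"],
        ).data[0].embedding
        chunks.update_one({"_id": doc["_id"]}, {"$set": {"embedding": vector}})
```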
Call to action: Make sure you have a plan in place for updating the knowledge your chatbot has, so it’s always giving solid responses.
In the realm of AI-driven chatbots, particularly those employing Large Language Models (LLMs) as their foundational technology, one crucial factor that often determines the success of these applications is the integration of user feedback. This blog post delves into the importance of user feedback in improving a chatbot’s performance, as well as the significance of storing question-answer pairs and implementing preference models for optimization.
A chatbot’s primary goal is to provide accurate and relevant responses to users while maintaining an engaging conversation. However, not all generated answers will be correct or satisfactory. This is where user feedback comes into play. By allowing users to rate the quality of responses, we can gain valuable insights into which parts of our LLM-driven chatbot are functioning well and which require improvement.
One effective way to gather data on a chatbot’s performance is by storing each question-answer pair alongside the user feedback provided. This approach enables us to analyze trends in response accuracy and identify areas where our chatbot may be struggling. Additionally, this information can be used for recall benchmarking, which compares the chatbot’s performance over time.
A simple yet powerful mechanism to collect user feedback is by incorporating a thumbs up or thumbs down feature in our chatbot interface. This allows users to quickly indicate whether they found the response helpful or not, providing us with valuable data on the overall accuracy of our LLM-driven chatbot. By assigning numerical scores based on user feedback, we can filter out underperforming text chunks and focus on improving their quality.
In the short to medium term future, advancements in supervised fine tuning techniques will enable us to optimize our chatbots more effectively than ever before. This method involves training a model using labeled data, which can be derived from the question-answer pairs collected through user feedback. By leveraging this approach, we can tailor our LLM-driven chatbot to better suit the needs of its users and deliver increasingly accurate responses over time.
In conclusion, the integration of user feedback is essential for continuously improving the performance and relevance of LLM-driven chatbots. By storing question-answer pairs and implementing a thumbs up/down mechanism, we can gain valuable insights into our chatbot’s accuracy and identify areas requiring improvement. Moreover, as supervised fine tuning techniques evolve, we will be able to optimize our chatbots more effectively, ensuring they remain an indispensable tool in the world of AI-powered communication.
When building your LLM (large language model) driven RAG (retrieval augmented generation) chatbot, you want user feedback to improve its performance.
Not all answers coming from the chatbot will be right. It’s important to store the question and the response for further analysis along with a mechanism like thumbs up/down on the response. This lets you “score” your text chunk accuracy and eventually filter out chunks that are not producing good results. Add a key in your mongo document for this feedback.
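A rough sketch of that storage and scoring, assuming MongoDB collections named responses and chunks (the collection names, field names, and feedback values are all illustrative):

```python
from datetime import datetime, timezone
from bson import ObjectId
from pymongo import MongoClient

client = MongoClient("mongodb+srv://...")   # placeholder connection string
responses = client["rag"]["responses"]      # collection names are assumptions
chunks = client["rag"]["chunks"]

def log_response(question: str, answer: str, chunk_ids: list) -> str:
    """Store the question/answer pair plus the chunks used, for later analysis."""
    result = responses.insert_one({
        "question": question,
        "answer": answer,
        "chunk_ids": chunk_ids,
        "feedback": None,                   # filled in when the user votes
        "created_at": datetime.now(timezone.utc),
    })
    return str(result.inserted_id)

def record_feedback(response_id: str, thumbs_up: bool) -> None:
    """Record the user's vote and roll it up onto the chunks that produced the answer."""
    doc = responses.find_one_and_update(
        {"_id": ObjectId(response_id)},
        {"$set": {"feedback": "up" if thumbs_up else "down"}},
    )
    field = "thumbs_up" if thumbs_up else "thumbs_down"
    chunks.update_many({"_id": {"$in": doc["chunk_ids"]}}, {"$inc": {field: 1}})
```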
Storing the question/answer pairs can also be used later for recall benchmarking and supervised fine tuning an instruct-tuned LLM on your data set. Currently supervised fine tuning and direct preference optimization model training techniques are expensive and painful to do, but this will change in the short to medium term future.
Large Language Models (LLMs) have rapidly become a staple in natural language processing, with their impressive ability to generate human-like responses and solve complex tasks. As these models continue to evolve, it’s tempting to rely on them as an all-encompassing source of knowledge. However, there are several factors that limit LLMs’ accuracy and reliability when used without proper precautions. In this blog post, we will delve into the reasons why simply querying the LLM directly may not be the best approach for obtaining accurate information.
LLMs are a blend of factual data and generalizations based on the vast amounts of text they have been trained on. While these models possess an extensive repository of facts in their neural weights, it is crucial to recognize that they do not represent all human knowledge verbatim. Instead, they provide a lossy representation that may not always be accurate or up-to-date.
LLMs are trained on data available at a specific point in time. This means that the model might not have access to information generated after its training set cutoff date. In addition, the model is unable to account for updates or changes that occurred since its training phase, which may lead to outdated information being provided as an answer.
One of the most significant limitations of LLMs is their tendency to “hallucinate” answers when they do not have the necessary facts in their training set. This can result in incorrect or misleading responses, especially when dealing with complex questions that require specific knowledge or context. To mitigate this issue, it’s essential to use prompting techniques like zero-shot summarization with augmentation, which involves sending a question along with possible answers and allowing the LLM to generate a “smooth” response using only the provided data.
To ensure that the information provided by an LLM is accurate, reliable, and relevant, it’s crucial to employ Retrieval Augmented Generation (RAG). This technique combines the strengths of retrieval models with LLMs, enabling the model to search for relevant information in external databases or knowledge bases. By incorporating RAG into your workflow, you can significantly improve the quality and credibility of the responses generated by your LLM.
In conclusion, while LLMs have an impressive range of capabilities, they are not infallible sources of knowledge. By understanding their limitations and employing techniques such as zero-shot summarization with augmentation and RAG, you can ensure that your LLM-powered applications provide accurate, up-to-date, and reliable information to users.
LLMs (large language models), after they have been trained, are a mixture of fact recall and generalization. They do indeed have tons of facts memorized in the neural weights, but they are a lossy representation of all human knowledge. They have also been trained at a specific point in time and have no data beyond that point. The exact details of your business and your documents are probably not trained into the model, or have been generalized to a point where they would be misrepresented, even if they were on the public internet and pulled into the training process. LLMs will "hallucinate" answers for questions you ask if they don't have the facts in the training set. To prevent this we rely on a prompting technique called zero-shot summarization with augmentation: we send the question along with the possible answer data and let the LLM provide a "smooth" response using only the exact data provided.
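A minimal sketch of that augmented prompt, assuming the OpenAI chat API (the model name and prompt wording are assumptions; any instruct-tuned LLM works the same way):

```python
from openai import OpenAI

client = OpenAI()   # reads OPENAI_API_KEY from the environment

def answer_with_augmentation(question: str, retrieved_chunks: list[str]) -> str:
    """Zero-shot summarization with augmentation: answer only from the supplied chunks."""
    context = "\n\n".join(retrieved_chunks)
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # model name is an assumption; use whatever LLM you deploy
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```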
See our other blog posts about RAG (retrieval augmented generation) for more details.
In today’s fast-paced business landscape, knowledge-driven chatbots play a crucial role in streamlining internal communication and enhancing organizational productivity. The Retrieval Augmented Generation (RAG) approach is the leading technique for building such chatbots, utilizing AI algorithms to retrieve and generate accurate responses based on a vast repository of information.
A company’s knowledge is often scattered across multiple systems in various formats. Identifying relevant data sources is an essential part of designing a chatbot. To begin, select a corpus of documents that represent the knowledge base you want your chatbot to answer questions about. This should be something immediately useful for specific groups within the organization.
The easiest data sources to start with are text documents, PDFs, and Word documents. They are well supported by frameworks such as LlamaIndex and LangChain. Using these tools, you can ingest and chunk the selected documents from a directory. A great tutorial on implementing RAG with Atlas Vector Search, LangChain, and OpenAI can be found here: RAG with Atlas Vector Search, LangChain, and OpenAI | MongoDB.
After setting up your chatbot, it is essential to test its performance extensively. Ask questions that the chatbot should be able to answer based on the source material. This will help you identify any inaccuracies or gaps in the knowledge chunks. Validate what your chatbot gets right and wrong so you can investigate and modify the knowledge chunks later if necessary.
While unstructured data such as text documents, PDFs, and Word documents are ideal for starting a RAG-based chatbot, structured data like tables, point form lists, spec sheets, XML, and JSON documents may not embed well. Therefore, they should not be the first data sources you attempt to use. In future blog posts, we will cover techniques for handling structured data using pre-summarization.
In conclusion, building a knowledge-driven chatbot using RAG involves selecting relevant data sources, starting with unstructured documents like text, PDFs, and Word files, testing your chatbot extensively, and handling structured data with caution. With these strategies in mind, you can create an effective RAG chatbot that enhances communication within your organization and streamlines daily operations.
RAG (Retrieval Augmented Generation) is the leading technique for building knowledge driven chatbots for your organization
Data source selection is an important part of the design for chatbots. A company’s knowledge is dispersed across dozens of different systems and in different formats.
The easiest data sources to start with are text documents, PDFs, and Word documents. These types are well supported by LlamaIndex and LangChain.
Start by identifying a set of documents that represents a corpus of knowledge that you want the chatbot to answer questions about. Pick something that has immediate utility to some group within the organization
Use LlamaIndex or LangChain to ingest and chunk these documents from a directory. A great tutorial can be found here: RAG with Atlas Vector Search, LangChain, and OpenAI | MongoDB
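The linked tutorial covers the full Atlas Vector Search setup; the fragment below is just the ingest-and-chunk step in LangChain, assuming a ./docs directory of plain text files (PDFs and Word documents need their own loaders, and import paths vary by LangChain version):

```python
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load every .txt file under ./docs; the directory and glob are assumptions for illustration.
loader = DirectoryLoader("./docs", glob="**/*.txt", loader_cls=TextLoader)
documents = loader.load()

# Split into overlapping chunks sized for typical text embedding models.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)

print(f"Loaded {len(documents)} documents into {len(chunks)} chunks")
```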
Test your chatbot extensively with questions it should be able to answer, based on the source material. Validate what it’s getting right and wrong so you can investigate and modify your knowledge chunks later if they are incorrect.
A word of warning: Structured data such as tables, point form lists, spec sheets, XML and JSON documents do not embed very well and should not be the first data sources you attempt to use.
We will cover structured data techniques using pre-summarization in later blog posts.
The advent of generative AI technology has opened up a world of possibilities for businesses to enhance their operations and streamline communication. OpenAI’s ChatGPT and image generation models like Stable Diffusion have demonstrated the potential of AI in solving business problems and improving efficiency across various sectors. In response, nearly every company worldwide is developing an AI strategy to integrate generative AI technologies into their existing business models. One of the most promising use cases for generative AI in businesses is retrieval augmented chatbots, built on Retrieval Augmented Generation (RAG), which leverage the power of Large Language Models (LLMs) to answer questions based on data provided within a prompt (augmentation).
Retrieval Augmented Generation (RAG) is an approach that combines retrieval methods with generative language models, such as GPT-3 or OpenAI’s ChatGPT. This technique allows chatbots to “talk to documents,” ingesting them, chunking the text, vectorizing those chunks using a text embedding model, and then employing semantic search to retrieve relevant chunks of text for answering questions posed by users.
The integration of RAG with Large Language Models (LLMs) has enabled chatbots to tap into the vast potential of retrieval-based augmentation. LLMs can comprehend complex queries and provide accurate responses based on the data provided in the prompt, further enhancing their capabilities to assist businesses in various aspects.
One significant use case for RAG-enabled chatbots is empowering customer support agents by granting them access to an entire knowledge base they work with through “chat-with-docs.” This feature allows agents to search and retrieve relevant information from internal documents, articles, or guides without the need for manual browsing. As a result, customer support teams can provide faster and more accurate responses to inquiries, improving overall customer satisfaction.
RAG-enabled chatbots can also be used directly by customers to answer questions about billing, insurance plans, coverage, or processes. By providing customers with easy access to essential information, businesses can streamline their communication channels and reduce the need for human intervention in basic inquiries. This not only improves customer experience but also frees up customer support agents to focus on more complex tasks.
Another crucial application of RAG-enabled chatbots is connecting them to internal code bases, allowing developers to ask questions about large and complex codebases. This feature is particularly useful for navigating through vast legacy codes, making it easier for developers to find relevant information quickly and efficiently. As a result, businesses can improve developer productivity and ensure that teams spend more time creating innovative solutions rather than wasting time searching for information within their codebase.
The future of most companies will be chatbot-based systems with vectorized knowledge spanning the entire history of the company and all internal knowledge. This intelligence augmentation approach aims to provide every employee with an AI-powered assistant, reducing friction in day-to-day activities and improving overall productivity. As businesses continue to integrate generative AI technologies into their operations, RAG-enabled chatbots will play a crucial role in shaping the future of work, streamlining communication, and fostering innovation across various industries.
While the potential of RAG-enabled chatbots is undeniable, it’s essential to consider several design factors when implementing these systems. These include:
Chunking knowledge into manageable pieces to ensure efficient retrieval and vectorization.
Selecting appropriate text embedding models to optimize semantic search capabilities.
Implementing robust security measures to protect sensitive data and prevent unauthorized access.
Designing intuitive user interfaces that allow users to interact with chatbots seamlessly.
Continuously training and updating the LLM to ensure it remains up-to-date with the latest information and trends.
In conclusion, Retrieval Augmented Chatbots (RAG) represent a game-changing approach to leveraging generative AI technologies in businesses. By combining retrieval methods with powerful Large Language Models (LLMs), these chatbots can revolutionize customer support, streamline communication channels, and empower employees to work more efficiently. As companies continue to embrace AI-driven solutions, RAG-enabled chatbots will undoubtedly play a pivotal role in shaping the future of business intelligence and productivity.
OpenAI’s ChatGPT and image generation models like Stable Diffusion have shown the promise of using AI to solve business problems
Nearly every company in the world is attempting to come up with an AI strategy to integrate generative AI to enhance their existing business
The most straightforward use case for generative AI in business is retrieval augmented chatbots using RAG (retrieval augmented generation) as the underlying technique
Chatbots can be used to “talk to documents”. Ingest the documents, chunk the text, vectorize the chunks using a text embedding model and then use semantic search to retrieve those chunks of text to send to a Large Language Model (LLM) to answer questions.
RAG takes advantage of an LLM’s ability to answer questions based on data provided in the prompt (augmentation).
Chatbots can be used to empower customer support agents by giving them chat-with-docs access to the entire knowledge base they work with.
Chatbots can be used by customers directly (with more guardrails in place) to answer questions about billing, insurance plans, coverage, and processes.
Chatbots can be connected to internal code bases to allow developers to ask questions about large, complex code bases. This is especially useful with large legacy code.
The future state for most companies will be chatbots that have vectorized knowledge for the entire history of the company and all internal knowledge. This gives every single employee an assistant to reduce friction in day to day activities. We call this the Intelligence Augmented Workforce.
Chatbots have many design considerations around how to chunk and vectorize knowledge, which will be addressed in future blog posts.
Large Language Models (LLMs) have revolutionized the field of natural language processing, providing us with powerful tools for text generation, understanding, and reasoning. However, as with any technology, there are limitations to what LLMs can achieve on their own. One such limitation is the issue of “hallucinations”, where an LLM may provide incorrect or fabricated information in response to a question that it has generalized and not retained specific details for. This blog post will explore Retrieval Augmented Generation (RAG), a technique that utilizes semantic search and augmented prompt engineering to enhance the performance of LLMs by providing them with contextually relevant information to answer questions more accurately.
LLMs are trained on massive corpora of text from various data sources, often reaching multiple trillions of tokens (words) of internet content. They combine memorization and generalization to provide information compression, which can be inherently lossy, resulting in the aforementioned hallucination phenomenon. Despite these limitations, LLMs are excellent summarization and reasoning machines that have the potential to revolutionize fields like AI-powered chatbots, virtual assistants, and content generation.
RAG is a technique that leverages an LLM’s strong summarization capabilities by allowing users to provide all the data required to answer a question, along with the question itself. This can be achieved through a combination of information retrieval and prompt engineering techniques. At its core, RAG aims to address the issue of hallucinations by providing contextually relevant knowledge to LLMs to help them ground their answers in reality.
A key aspect of RAG is prompt engineering, which involves crafting an effective prompt that includes both the user’s question and any relevant data required for the LLM to provide a reliable answer. This can involve techniques such as prompt augmentation, where additional information is added to the initial question. For example, if you asked an LLM “What is my name?”, providing the prompt “Hello, my name is Patrick. What is my name?” would help ground the response and prevent hallucinations.
A major challenge in implementing RAG is retrieving relevant information from vast databases or knowledge repositories to include in the LLM’s prompt. Traditional database techniques and simple lexicographical searches may not be effective, as users do not typically ask questions that match database query formats. Instead, semantic search is a more suitable approach for RAG implementations, as it allows the retrieval of information based on dense vector similarity. This enables the retrieval of chunks of knowledge that are semantically similar to the user’s question, providing contextually relevant information to the LLM without relying solely on lexical matching.
Retrieval Augmented Generation (RAG) is an innovative technique that harnesses the power of semantic search and prompt engineering to enhance the performance of large language models by providing them with contextually relevant information to answer questions more accurately. By addressing the issue of hallucinations and improving the grounding of LLM responses, RAG has the potential to revolutionize the way we interact with AI-powered chatbots, virtual assistants, and content generation tools. As we continue to explore the possibilities of RAG, it is essential to focus on refining data ingestion, prompt engineering, and text embedding models for optimal performance and accuracy in our LLM implementations.
Large Language Models (LLMs) are trained on massive corpora of text from various data sources
This can be up to multiple trillions of tokens (words) of text from the internet
LLMs are a combination of memorization and generalization and are a type of information compression
The information compression is inherently lossy and not all details are retained. If you ask an LLM a question that it has generalized and not retained details for, it can either tell you it doesn’t know (the ideal answer) or, worse, make up an answer. This is called a hallucination.
LLMs are excellent summarization and reasoning machines
Augmented generation takes advantage of an LLM’s strong summarization capability by allowing you to provide all the data required to answer the question, along with the question itself. If you combine that with an information retrieval mechanism, you have Retrieval Augmented Generation, or RAG.
A simple example is putting something in a prompt like “Hello my name is Patrick. What is my name?” This is the most basic example of a prompt augmentation technique.
In a perfect world you could put all your knowledge and data in a single document and provide that whole document in the LLM prompt to answer questions. This is slow and expensive with our current LLM technology.
Retrieving chunks of data from your own data sources solves this issue. It allows you to provide these chunks of knowledge to the LLM, in the prompt to get it to answer questions.
Retrieving information is difficult. Users don’t ask questions that look like SQL or MQL style queries. You can’t rely on traditional database techniques.
Lexical search is better, but you need to rely on the user’s question having tokens (words) that match something in the lexical search index. This is also not optimal.
Semantic search is a good match for the problem space because it allows you to search, semantically, using dense vector similarity, for chunks of knowledge that are similar to the question.
So the workflow for RAG is to intercept the user’s prompt, send it to a vector search engine, get a fixed number of results (usually 3-10), and “prompt stuff” these results into the LLM prompt along with the question. The LLM then has all the information it needs to reason about the question and provide a “grounded” answer instead of a hallucination.
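Putting that workflow into code, here is a hedged sketch using MongoDB Atlas Vector Search ($vectorSearch) and the OpenAI API; the index name (“vector_index”), collection name, embedding model, and chat model are assumptions rather than a prescribed setup:

```python
from pymongo import MongoClient
from openai import OpenAI

mongo = MongoClient("mongodb+srv://...")    # placeholder connection string
chunks = mongo["rag"]["chunks"]             # collection/index names are assumptions
ai = OpenAI()                               # reads OPENAI_API_KEY from the environment

def rag_answer(question: str, k: int = 5) -> str:
    """Embed the question, vector-search for similar chunks, and prompt-stuff them into the LLM."""
    query_vector = ai.embeddings.create(
        model="text-embedding-3-small",     # embedding model is an assumption
        input=question,
    ).data[0].embedding

    # Atlas Vector Search: assumes a vector index named "vector_index" on the "embedding" field.
    results = chunks.aggregate([
        {"$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 10 * k,
            "limit": k,
        }},
        {"$project": {"text": 1, "_id": 0}},
    ])
    context = "\n\n".join(doc["text"] for doc in results)

    prompt = (
        "Answer the question using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    completion = ai.chat.completions.create(
        model="gpt-4o-mini",                # chat model is an assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content
```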
The real complexity in RAG solutions is in how to ingest your data, how to chunk it, what text embedding model works best for your use case, the prompt engineering that nets you the most reliable and accurate answers and finally the guard rails that go around the inputs and outputs to prevent the LLM from providing undesirable answers. We will cover all these topics in future posts.