Graph Database or Natural Language Query?
Interest has grown in recent months in using graph databases together with large language models (LLMs) to query structured data. Graph-based retrieval and natural language query both have their uses, but they cope with structured data very differently, and that difference should guide the choice between them.
Querying Structured Data with LLMs
Querying structured data with LLMs through retrieval augmented generation (RAG) is difficult. Structured data does not vectorize well, and when an LLM summarizes it into text, details such as dates, times, and numeric values are poorly represented in the dense vector output of the embedding model.
Graph Databases for Structured Data
To address this issue, some graph database vendors propose querying structured data through graph structures, using vectorized text concepts as starting points for exploring the graph. The structured data is ingested and converted into a graph by LLMs that generate semantic triplets (subject, predicate, object). However, this method is not without its problems.
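To make the idea of semantic triplets concrete, here is a minimal sketch of how one row of structured data might be expressed as triplets. In the approach described above an LLM generates the triplets; the deterministic `row_to_triplets` function below is a hypothetical stand-in used only for illustration, and the table and column names are assumptions.

```python
def row_to_triplets(table, key, row):
    """Turn one row (a dict) into (subject, predicate, object) triplets.

    Hypothetical helper: the subject is built from the table name and the
    row's key value, and every other column becomes one edge.
    """
    subject = f"{table}:{row[key]}"
    return [(subject, column, value) for column, value in row.items() if column != key]


# Assumed example row; not taken from any real data set.
order = {"id": 7, "customer": "alice", "amount": 120.0, "order_date": "2024-01-15"}
for triplet in row_to_triplets("orders", "id", order):
    print(triplet)
```

Each non-key column becomes one edge leaving the row's node, so even a small table produces many edges once every row is ingested.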
The “Exploding Edge” Problem
One major challenge with using graphs to query structured data is the “exploding edge” problem. Any node in a graph can have an arbitrary number of edges, and each edge leads to another node that itself has an arbitrary number of edges. The amount of data pulled in can therefore grow very quickly with each additional level of traversal, and it is impossible to determine up front how many levels deep to traverse to get the information relevant to a user’s question. As a result, LLM context window limits are easily hit.
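The growth can be illustrated with a small sketch. It assumes a toy graph in which every node has exactly 10 outgoing edges (real graphs are irregular, so this is only an illustration), and counts how many new nodes a breadth-first traversal reaches at each depth. The function names are hypothetical.

```python
def build_toy_graph(fan_out, depth):
    """Build a toy adjacency dict where every non-leaf node has `fan_out` edges."""
    graph = {}
    next_id = 1
    frontier = [0]
    for _ in range(depth):
        new_frontier = []
        for node in frontier:
            children = list(range(next_id, next_id + fan_out))
            next_id += fan_out
            graph[node] = children
            new_frontier.extend(children)
        frontier = new_frontier
    return graph


def nodes_per_depth(graph, start, max_depth):
    """Count the new nodes reached at each depth of a breadth-first traversal."""
    seen = {start}
    frontier = [start]
    counts = [1]  # depth 0 is the starting node itself
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        counts.append(len(next_frontier))
        frontier = next_frontier
    return counts


graph = build_toy_graph(fan_out=10, depth=4)
counts = nodes_per_depth(graph, start=0, max_depth=4)
print(counts)       # [1, 10, 100, 1000, 10000]
print(sum(counts))  # 11111 nodes pulled in after only four hops
```

With a fan-out of 10, four hops already pull in more than eleven thousand nodes, and every one of them would need to be serialized into the prompt.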
Natural Language Query for Structured Data
A more practical approach for working with structured data is to use natural language query (NLQ). With NLQ, the LLM is given the schema of the database and samples of the data, which allows it to generate queries on behalf of the user. The generated query is then executed, and the LLM uses the resulting data set, along with the original question, to provide an answer based on the results.
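The flow described above can be sketched in a few lines. This is a minimal illustration using Python's built-in `sqlite3` module, not a definitive implementation. The two `fake_llm_*` functions are hypothetical placeholders that return canned output where a real system would call an LLM, and the `orders` table and its data are assumptions made up for the example.

```python
import sqlite3


def describe_database(conn, sample_rows=2):
    """Collect each table's CREATE statement plus a few sample rows."""
    parts = []
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for name, ddl in tables:
        rows = conn.execute(
            f'SELECT * FROM "{name}" LIMIT ?', (sample_rows,)
        ).fetchall()
        parts.append(f"{ddl}\n-- sample rows: {rows}")
    return "\n\n".join(parts)


def fake_llm_generate_sql(question, db_description):
    # Hypothetical placeholder: a real system would send the question and
    # db_description to an LLM and get a query back. Here it is hard-coded.
    return (
        "SELECT customer, SUM(amount) AS total FROM orders "
        "WHERE order_date >= '2024-01-01' "
        "GROUP BY customer ORDER BY total DESC LIMIT 1"
    )


def fake_llm_answer(question, rows):
    # Hypothetical placeholder: a real system would have the LLM phrase the
    # answer from the question and the result set.
    customer, total = rows[0]
    return f"{customer} spent the most: {total:.2f}"


def answer_question(conn, question):
    context = describe_database(conn)               # 1. schema and samples
    sql = fake_llm_generate_sql(question, context)  # 2. LLM writes the query
    rows = conn.execute(sql).fetchall()             # 3. execute it
    return fake_llm_answer(question, rows)          # 4. LLM answers from rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, "
    "amount REAL, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount, order_date) VALUES (?, ?, ?)",
    [
        ("alice", 120.0, "2024-01-15"),
        ("bob", 80.5, "2024-02-03"),
        ("alice", 40.0, "2024-03-09"),
        ("bob", 500.0, "2023-12-30"),
    ],
)
print(answer_question(conn, "Which customer spent the most in 2024?"))
```

Because the database does the filtering and aggregation, the dates and amounts are handled exactly, and only the small result set has to fit in the context window. In a real deployment the generated query should be run with read-only permissions, since it comes from a model rather than a reviewed code path.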
Conclusion
In summary, while graph databases and LLMs can be used to query structured data, they each have their limitations. Graph databases can help address some of the challenges of working with large, interconnected data sets, but the exploding edge problem can make them difficult to work with in practice. On the other hand, natural language query provides a more straightforward and effective way to query structured data using LLMs. Ultimately, the choice between graph databases and natural language query will depend on the specific use case and the data being worked with.
Keywords: Graph database, Natural Language Query, LLM, Retrieval Augmented Generation, structured data.
- Human Intervention: Minor
Facts Used:
- Querying structured data with LLMs (large language models) using RAG (retrieval augmented generation) is difficult.
- Structured data does not vectorize well, and if you summarize it with an LLM into text, details like dates/times and numeric values will be poorly represented in the dense vector output of the embedding model.
- Graph database vendors are proposing graph structures to query structured data, with vectorized text concepts as starting points for exploring the graph.
- Structured data is ingested and converted into a graph structure with LLMs, which generate semantic triplets.
- The problem with graphs is the “exploding edge” problem, where any node can have an arbitrary number of edges and each edge leads to another node with an arbitrary number of edges.
- It’s impossible to determine up front how many levels deep to traverse in the graph to get relevant information for the user’s question, so LLM context window limits can be hit very easily when pulling in too much data.
- Most use cases on structured data need Natural Language Query: give the LLM the schema of your database and samples of the data and let it generate queries for you.
- Execute the generated query, get a result set back, and have the LLM use the result set along with the original question to answer it based on the results.
- Graph databases are not required for working with structured data; Natural Language Query is a better choice.