Tool Series - Natralang

Tool Series - Natralang: Natural Language Query for MongoDB

Introduction

Natralang is a Natural Language Query (NQL) project for MongoDB. It lets users query structured data in natural language, without needing to learn SQL or MQL. In this blog post, we explore what Natralang does, how it works, and where it currently falls short.

What is Natralang?

Natralang is a tool designed to query MongoDB collections and approximate the natural language query functionality available in MongoDB Compass. It is a showcase of what is possible with Natural Language Query, and it would need some work to be production-ready. The project can be found on GitHub at https://github.com/patw/Natralang.

How does Natralang Work?

Natralang uses semantic search over vectors to match the user's question to the data set most likely to answer it. Only that data set's schema and examples are then passed to the LLM (Large Language Model). This lets Natralang work effectively with databases that have a large number of collections, without filling the LLM context window with schema and example data for every collection.
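To make the routing step concrete, here is a minimal sketch of picking a data set by similarity to the question. It is not Natralang's actual code: the collection names and descriptions are made up, and `embed` is a toy bag-of-words stand-in for a real embedding model.

```python
import math
import re
from collections import Counter

# Hypothetical data sets and descriptions, invented for illustration.
COLLECTIONS = {
    "sales": "orders revenue sales totals by quarter and region",
    "customers": "customer names emails addresses and signup dates",
    "inventory": "product stock levels warehouse and reorder counts",
}


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pick_collection(question, collections=COLLECTIONS):
    """Return the name of the data set whose description best matches the question."""
    question_vector = embed(question)
    return max(
        collections,
        key=lambda name: cosine(question_vector, embed(collections[name])),
    )


print(pick_collection("What were my sales like last quarter?"))
```

In a real setup the description vectors would be computed once and stored, so only the question needs to be embedded at query time.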

Each data source in Natralang can have a different connection string, allowing it to work across multiple MongoDB instances. Query generation is reliable close to 80% of the time for simple to moderately complex queries, but reliability can drop drastically for complex queries. Better LLMs could help here.
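One way to picture per-source connection strings is a small registry keyed by data source name. This is a sketch under assumptions, not Natralang's configuration format: the `DataSource` fields, the source names, and the host names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """Hypothetical record describing where one data set lives."""
    name: str
    connection_string: str  # each source may point at a different MongoDB instance
    database: str
    collection: str


# Invented examples: two sources on two different instances.
SOURCES = {
    "sales": DataSource("sales", "mongodb://localhost:27017", "shop", "orders"),
    "tickets": DataSource("tickets", "mongodb://support-host:27017", "helpdesk", "tickets"),
}


def resolve(name):
    """Return the connection details for a named data source."""
    try:
        return SOURCES[name]
    except KeyError:
        raise KeyError(f"unknown data source: {name}") from None


source = resolve("tickets")
print(source.connection_string, source.database, source.collection)
```

With PyMongo installed, the resolved record could be opened as `MongoClient(source.connection_string)[source.database][source.collection]`.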

Benefits of Using Natralang

  1. User-friendly: Natralang eliminates the need to learn SQL or MQL, making it easier for users to interact with databases using natural language.
  2. Scalable: Semantic search over vectors selects the relevant data set first, so Natralang stays usable on databases containing a large number of collections.
  3. Flexible: Natralang can connect to multiple MongoDB instances using different connection strings.
  4. Accessible: The project is openly available on GitHub, encouraging collaboration and innovation in the community.

Limitations

While Natralang offers several benefits, it currently serves as a showcase for what is possible rather than a fully production-capable tool. Users should consider this when incorporating Natralang into their projects.

Conclusion

Natralang is a promising tool that demonstrates the potential of Natural Language Query in the realm of MongoDB. By utilizing semantic search and vectors, Natralang enables users to interact with databases using natural language, simplifying the process and reducing the need for specialized knowledge. While currently a work in progress, the project holds great potential for future development and innovation. We encourage the community to explore Natralang and contribute to its growth on GitHub.


Resources

  • Human Intervention: It tried to put a fake Wikipedia article link at the bottom!

Facts Used:

    • Natralang is short for Natural Language Query (NQL)
    • NQL is a popular style of chatbot that is used to query structured data from databases.
    • In this case, Natralang is designed to query Mongo collections to approximate the functionality we have in the MongoDB Compass product.
    • Natural Language Query uses an LLM to parse a query like “What were my sales like last quarter?”, along with the schema for your database and some example data (few-shot learning), to produce a query in your data engine that will answer the question. You execute the query, and then feed the results back to the LLM, along with the original question, to get a natural language answer to a natural language question. No need to learn SQL or MQL.
    • Natralang uses semantic search and vectors to allow the initial query to find the most appropriate data set that could answer the question. This allows for databases with a large number of collections to be used effectively, without polluting the LLM context window with too much collection schema/examples.
    • Each data source can have a different connection string, allowing it to work across many mongo instances.
    • The reliability of the query generation is close to 80% for simple to moderately complex queries but can drop drastically for complex queries. Better LLM models could help here.
    • It’s a showcase for what is possible, but would need some work to be production capable.
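
The question-to-answer flow described in the facts above can be sketched as follows. This is an illustration under assumptions, not Natralang's implementation: `llm` and `run_query` are hypothetical callables supplied by the caller, and the prompt wording is invented.

```python
import json


def build_query_prompt(question, schema, examples):
    """Combine the schema, a few example documents, and the question into one prompt."""
    return "\n".join([
        "Schema:",
        json.dumps(schema, indent=2),
        "Example documents:",
        json.dumps(examples, indent=2),
        f"Question: {question}",
        "Reply with a MongoDB find filter as JSON.",
    ])


def answer(question, schema, examples, llm, run_query):
    """Run the two-step flow: generate a query, execute it, then summarize the results.

    llm: hypothetical callable taking a prompt string and returning text.
    run_query: hypothetical callable taking a filter dict and returning a list of documents.
    """
    # Step 1: the LLM turns the question plus schema and examples into a query.
    raw_query = llm(build_query_prompt(question, schema, examples))
    query = json.loads(raw_query)  # raises ValueError if the model did not return JSON

    # Step 2: execute the generated query against the data engine.
    results = run_query(query)

    # Step 3: feed the results and the original question back for a plain-language answer.
    followup = "\n".join([
        f"Question: {question}",
        f"Results: {json.dumps(results)}",
        "Answer the question in plain language using only these results.",
    ])
    return llm(followup)
```

Generated queries are not always valid, so a production version would need to validate the query before running it rather than letting `json.loads` fail.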