LLM Prompting Strategy

LLM Prompting Strategy: Enhancing Accuracy and Overcoming Guard Rails

In the ever-evolving landscape of natural language processing, Large Language Models (LLMs) have emerged as powerful tools for generating human-like responses to a wide range of questions. To harness their full potential, it is crucial to implement an effective LLM prompting strategy that maximizes accuracy and overcomes guard rail limitations. This blog post will delve into the best practices for prompt engineering and provide insights on how to bypass guard rails in sensitive domains like healthcare or law.

The Basics of LLM Prompting

Most prompts to an LLM follow a pattern such as: “Can you answer the following question ‘<question>’ based on the text below: <text>”. While this is a solid starting point, it may not always yield optimal results. To achieve higher accuracy, apply prompt engineering: take a known question/chunk pair, run it through your tool set's API or UI, and test how small changes to the prompt (for example, “Can you answer the following healthcare question”) affect the accuracy of the response.
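As a concrete illustration, here is a minimal Python sketch of that template and one prompt-engineering variant. The build_prompt helper, the sample chunk, and the “healthcare” qualifier are illustrative choices, not part of any particular tool's API.

```python
# Minimal sketch of the basic prompt template plus one variant to A/B test.
# build_prompt, the sample chunk, and the "healthcare" qualifier are
# illustrative assumptions, not a fixed API.
def build_prompt(question: str, context: str, domain: str = "") -> str:
    """Assemble the basic question + context prompt."""
    qualifier = f" {domain}" if domain else ""
    return (
        f"Can you answer the following{qualifier} question "
        f"'{question}' based on the text below:\n\n{context}"
    )

chunk = "MRI scans are covered at 80% after the annual deductible is met."

# Baseline prompt
print(build_prompt("What share of an MRI is covered?", chunk))

# Variant: naming the domain in the prompt is one change worth testing
print(build_prompt("What share of an MRI is covered?", chunk, domain="healthcare"))
```

Comparing the responses for a known question/chunk pair across variants like these is the core loop of prompt engineering.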

The Role of Chunks in Prompting

To improve the likelihood of obtaining an answer, send multiple chunks of data as part of the input, as long as the combined prompt fits within the LLM's token limit. More chunks give the model a more comprehensive view of the context and increase the chances that the relevant information is actually present in the text. Current best practice is 3-10 chunks per prompt.
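The sketch below shows one way to pack several retrieved chunks into a single prompt while staying under a token budget. The four-characters-per-token estimate and the 4,000-token budget are rough assumptions; a production pipeline should count tokens with the model's own tokenizer.

```python
# Rough sketch of packing 3-10 retrieved chunks into one prompt while
# staying under the model's token budget. The 4-chars-per-token estimate
# and the 4000-token budget are assumptions.
def pack_chunks(chunks: list[str], token_budget: int = 4000,
                max_chunks: int = 10) -> list[str]:
    selected, used = [], 0
    for chunk in chunks[:max_chunks]:
        est_tokens = len(chunk) // 4  # crude character-based estimate
        if selected and used + est_tokens > token_budget:
            break  # always keep at least one chunk, then stop at the budget
        selected.append(chunk)
        used += est_tokens
    return selected

retrieved = ["First relevant passage.", "Second relevant passage.",
             "Third relevant passage."]
context = "\n\n---\n\n".join(pack_chunks(retrieved))
prompt = ("Can you answer the following question "
          "'What share of an MRI is covered?' based on the text below:\n\n"
          + context)
print(prompt)
```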

Guard Rails and Overcoming Them

One challenge faced by LLM users is guard rail limitations that prevent certain types of questions from being answered. In sensitive domains such as healthcare or law this can be a serious problem, because the model may refuse to produce responses even for legitimate questions. Working around these blockers may require creative prompting strategies. Alternatively, the open-source ecosystem offers a selection of unguarded models that can generate more sensitive responses when traditional approaches fail.
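One practical pattern, sketched below, is to detect a refusal from the hosted model and retry the same prompt against a self-hosted open-source model. The refusal markers, endpoint URL, and payload shape are assumptions about a hypothetical local inference server, not any specific product.

```python
# Hedged sketch of a fallback path: if the hosted LLM refuses a sensitive
# healthcare/legal question, retry against a self-hosted open-source model.
# The refusal markers, endpoint URL, and payload shape are assumptions.
import requests

REFUSAL_MARKERS = ("i'm sorry", "i cannot", "i am unable")

def looks_like_refusal(answer: str) -> bool:
    return answer.strip().lower().startswith(REFUSAL_MARKERS)

def answer_with_fallback(prompt: str, hosted_answer: str) -> str:
    if not looks_like_refusal(hosted_answer):
        return hosted_answer
    # Retry against a locally hosted, unguarded open-source model.
    resp = requests.post(
        "http://localhost:8000/generate",          # hypothetical endpoint
        json={"prompt": prompt, "max_tokens": 512},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["text"]
```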

The Role of Documentation and URLs

When crafting an LLM prompt, there is no need to include the URL for the associated documentation (usually an HTML or PDF link) as part of the input; even if you ask for the URL to be included in the response, there is a high chance the LLM will ignore that instruction. Instead, store the URLs alongside the chunks in a MongoDB collection and append them to the final response generated by the LLM. This mirrors the approach taken by platforms such as Bing Chat.
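A minimal sketch of that flow might look like the following, assuming the chunks were stored in MongoDB with a url field; the database, collection, and field names here are placeholders.

```python
# Sketch of appending documentation URLs after generation instead of asking
# the LLM for them. The database ("docs"), collection ("chunks"), and field
# name ("url") are assumptions about how the chunks were stored.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
chunks = client["docs"]["chunks"]

def append_sources(llm_answer: str, chunk_ids: list) -> str:
    docs = chunks.find({"_id": {"$in": chunk_ids}}, {"url": 1})
    urls = sorted({d["url"] for d in docs if d.get("url")})
    if not urls:
        return llm_answer
    return llm_answer + "\n\nSources:\n" + "\n".join(f"- {u}" for u in urls)
```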

In conclusion, an effective LLM prompting strategy hinges on understanding the intricacies of prompt engineering, the benefits of using multiple chunks of data, overcoming guard rail limitations, and managing documentation URLs. By leveraging these techniques, users can unlock the full potential of LLMs and generate more accurate and useful responses for their specific needs.

  • Human Intervention: None

Facts Used:

  • Most prompts to the LLM (large language model) will follow a pattern like this: “Can you answer the following question ‘ ’ based on the text below: ”. This is a good starting place but you will most likely need to do prompt engineering to get the best possible result. Use the API or UI for your tool set to take a known question/chunk(s) pair and see what changes to the prompt (i.e., “Can you answer the following healthcare question”) can result in higher accuracy for the response.
  • You can send more than one chunk of data, as long as it fits in the token limit of the LLM. Sending more chunks means it’s more likely to have an answer to the question. The current best practice is 3-10 chunks.
  • Be aware that the guard rails on the LLM could prevent some questions from being answered. This can be a serious problem for healthcare or legal use cases, as the LLM will try to prevent producing responses for this style of question. This may require some creative prompting strategies to bypass these blockers. Alternatively, the open source LLMs have a selection of unguarded models that can be used to generate more sensitive responses.
  • You don’t need to provide the URL for the documentation (usually HTML or PDF links) as part of the prompt, there’s a high chance that even if you ask for it to be provided as part of the response, the LLM will ignore it. The URL for the documentation can be stored in the mongo collection and can be appended to the LLM response. This is similar to how Bing Chat works.