Prompt Truncation

LLMs have a limited context length: they can only accept so much text as input. This means you will often need to decide which document sections and chat history turns to keep, and which ones to omit.

To make this easier, the Chat API provides a prompt_truncation parameter. When prompt_truncation is set to AUTO, the API automatically breaks the documents into smaller chunks, reranks those chunks, and drops the minimum required number of the least relevant documents in order to stay within the model's context length limit.
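
As a rough illustration, here is how this might look with the Python SDK; the API key and document contents below are placeholders, and the document fields are just one common shape:

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

# Hypothetical long source documents that may not all fit in the context window.
documents = [
    {"title": "Report A", "snippet": "First section of a long report..."},
    {"title": "Report B", "snippet": "Another lengthy source document..."},
]

response = co.chat(
    message="Summarize the key findings across these reports.",
    documents=documents,
    # AUTO lets the API chunk, rerank, and drop the least relevant material as needed.
    prompt_truncation="AUTO",
)

print(response.text)
```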

Note: The last few messages in the chat history will never be truncated or dropped. The RAG API will throw a 400 Too Many Tokens error if it cannot fit those messages, together with a single document, within the context limit.
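
In other words, when the most recent messages plus at least one document still exceed the context limit, the call fails rather than silently dropping conversation turns. A minimal sketch of handling that case follows; the exact exception class depends on the SDK version, so a broad catch is used here purely for illustration:

```python
try:
    response = co.chat(
        message="Summarize the key findings across these reports.",
        chat_history=chat_history,  # assumed to be a long prior conversation
        documents=documents,
        prompt_truncation="AUTO",
    )
except Exception as err:  # the SDK surfaces the 400 Too Many Tokens error as an exception
    # Recover by shortening the chat history or the documents and retrying.
    print(f"Request rejected: {err}")
```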