How to increase Ollama models' context length from 2048

If you're seeing context loss with the mistral-nemo or llama3 models in Ollama, the likely cause is the default context length of 2048 tokens: prompts longer than that are truncated, so earlier context is lost. To use a model's full context window, raise the num_ctx parameter. Here's how:

  1. Create a Custom Model with an Increased Context Window:

    Export the Current Modelfile:

ollama show --modelfile mistral-nemo > Modelfile

Edit the Modelfile:

Open the Modelfile in a text editor. Locate the line starting with FROM (in an exported Modelfile it often points to a local blob path) and make sure it references the base model directly:

FROM mistral-nemo

Then add a new PARAMETER line to raise the context window:

PARAMETER num_ctx 120000
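After the edit, a minimal Modelfile looks something like this (a sketch: if the exported file also contains TEMPLATE or other PARAMETER lines, leave them as they are):

# Build the new model on top of the existing mistral-nemo model
FROM mistral-nemo

# Raise the context window from the 2048-token default
PARAMETER num_ctx 120000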

Create the New Model:

ollama create -f Modelfile mistral-nemo-120k

Verify the Context Length:

ollama show mistral-nemo-120k

Ensure num_ctx is set to 120000 under the Parameters section.
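As a quick command-line check (assuming a Unix-like shell with grep available), you can also filter the new model's Modelfile for the parameter; it should print the PARAMETER num_ctx 120000 line added above:

# Show only the num_ctx setting of the new model
ollama show --modelfile mistral-nemo-120k | grep num_ctx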

  2. Adjust the Context Window for API Calls:

    When making API requests, specify the desired context length in the options parameter:

    { "model": "mistral-nemo-120k", "messages": [ {"role": "system", "content": "Your system message"}, {"role": "user", "content": "Your user message"} ], "options": { "num_ctx": 120000 } }

    Note: The OpenAI-compatible endpoints do not support setting the context size directly. To use a larger context window with these endpoints, create a custom model as described above.
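    Because num_ctx is baked into mistral-nemo-120k, a request to the OpenAI-compatible endpoint only needs to name the custom model (again a sketch assuming the default localhost:11434 address):

    # The custom model's baked-in num_ctx applies; no per-request option is needed
    curl http://localhost:11434/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "mistral-nemo-120k",
        "messages": [
          {"role": "user", "content": "Your user message"}
        ]
      }'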

By following these steps, you can prevent context loss and fully utilize the extended context windows of the mistral-nemo and llama3 models in Ollama.