Hi MattGohmann, thanks for reaching out! To clarify, max_tokens is the maximum number of tokens the model is allowed to generate for each response message. In other words, max_tokens caps the output tokens, and it is set within AI21's container to max_tokens = 4096. The 256K context window, by contrast, is the maximum number of input tokens, which is why the model is well suited for long-context RAG applications: Understanding Large Language Models Context Windows | Appen, Context Window (LLMs) — Klu
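To make the distinction concrete, here's a minimal sketch of setting max_tokens on a call to a Jamba serverless deployment using the azure-ai-inference Python SDK. The endpoint URL and key are placeholders, and your deployment details may differ:

```python
# pip install azure-ai-inference
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key for a Jamba serverless deployment.
client = ChatCompletionsClient(
    endpoint="https://<your-deployment>.<region>.models.ai.azure.com",
    credential=AzureKeyCredential("<your-api-key>"),
)

# The prompt (including any RAG context) can use up to the 256K-token
# context window; max_tokens caps only the generated output, up to the
# container's limit of 4096.
response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Summarize the following long document..."),
    ],
    max_tokens=4096,  # maximum number of output tokens per response
)

print(response.choices[0].message.content)
```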
This might not be entirely clear in our docs or AI21's, so I've flagged the necessary changes on both sites - thanks for inspiring this change!
For more details, see: How to deploy AI21's Jamba family models with Azure AI Studio - Azure AI Studio | Microsoft Learn, Jamba 1.5 (ai21.com)