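Prompt caching can cut input-token costs by up to ~90% on cache hits for workloads that resend the same prompt prefix, such as a repeated system prompt (RAG, long-context chat, structured workflows). Models are ranked by cached-input price, ascending, so you can compare the rate each one actually charges for cached tokens.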
Active models that offer prompt caching and publish a cached-input price, ranked from the cheapest cached-input price upward. That is the rate you pay on cache hits, which is where the savings come from.
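To make the cache-hit arithmetic concrete, here is a minimal sketch of how a cached-input price translates into a per-request saving. The prices and token counts are hypothetical placeholders, not figures for any listed model, and the function names are illustrative. The sketch also assumes the simplest billing model: cached tokens are billed at the cached-input price, all other input tokens at the base input price, and any cache-write surcharge or output-token cost is ignored.

```python
def input_cost(prompt_tokens: int, cached_tokens: int,
               base_price_per_mtok: float, cached_price_per_mtok: float) -> float:
    """Dollar cost of one request's input tokens.

    Assumption: cached tokens are billed at the cached-input price and the
    remainder at the base input price (both quoted per million tokens).
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached_tokens = prompt_tokens - cached_tokens
    total = (uncached_tokens * base_price_per_mtok
             + cached_tokens * cached_price_per_mtok)
    return total / 1_000_000


def savings_rate(prompt_tokens: int, cached_tokens: int,
                 base_price_per_mtok: float, cached_price_per_mtok: float) -> float:
    """Fraction of input cost saved compared with sending the prompt uncached."""
    full = input_cost(prompt_tokens, 0, base_price_per_mtok, cached_price_per_mtok)
    actual = input_cost(prompt_tokens, cached_tokens,
                        base_price_per_mtok, cached_price_per_mtok)
    return 1 - actual / full


# Hypothetical example: $3.00 base and $0.30 cached per million input tokens,
# with 9,000 of a 10,000-token prompt served from cache.
rate = savings_rate(10_000, 9_000, base_price_per_mtok=3.00, cached_price_per_mtok=0.30)
print(f"Input cost saved: {rate:.0%}")
```

With these placeholder numbers the saving on the whole request is below 90%, because the uncached 10% of the prompt is still billed at the base price. The headline discount applies only to the tokens that hit the cache, so the share of the prompt that is cacheable matters as much as the cached-input price itself.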
Refreshed as we re-verify each model against its provider's official docs. Every value links to the page it came from.
Last updated Jun 3, 2026 20:23 UTC.