Fallback Models

Automatically switch to backup models when the primary model hits rate limits, outages, or context window limits.

Pass fallback_models to any Agent or Team. If the primary model fails after exhausting its retries, each fallback is tried in order until one succeeds.

from kern.agent import Agent
from kern.models.anthropic import Claude
from kern.models.openai import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    fallback_models=[Claude(id="claude-sonnet-4-20250514")],
)

If gpt-4o fails after exhausting its own retries, Claude is tried automatically.

Model strings work too:

from kern.agent import Agent

agent = Agent(
    model="openai:gpt-4o",
    fallback_models=["anthropic:claude-sonnet-4-20250514"],
)
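The strings above appear to follow a provider:model_id pattern. As a rough illustration of how such a string could be split, here is a minimal sketch; `split_model_string` is a hypothetical helper, not part of kern, and the library's actual parsing rules are not documented here.

```python
from typing import Tuple


def split_model_string(model: str) -> Tuple[str, str]:
    """Split "provider:model_id" at the first colon (assumed format, not kern's parser)."""
    provider, sep, model_id = model.partition(":")
    if not sep or not provider or not model_id:
        raise ValueError(f"expected 'provider:model_id', got {model!r}")
    return provider, model_id


primary = split_model_string("openai:gpt-4o")
fallback = split_model_string("anthropic:claude-sonnet-4-20250514")
```

Splitting at the first colon only means a model id that itself contains a colon stays intact.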

Usage with Teams

Fallback models apply to the team leader's model calls. Member agents keep their own models and are not affected by the leader's fallback config.

from kern.agent import Agent
from kern.models.anthropic import Claude
from kern.models.openai import OpenAIChat
from kern.team import Team

researcher = Agent(
    name="Researcher",
    role="You research topics and provide detailed findings.",
    model=OpenAIChat(id="gpt-4o-mini"),
)

writer = Agent(
    name="Writer",
    role="You write clear, concise summaries from research findings.",
    model=OpenAIChat(id="gpt-4o-mini"),
)

team = Team(
    name="Research Team",
    model=OpenAIChat(id="gpt-4o"),
    fallback_models=[Claude(id="claude-sonnet-4-20250514")],
    members=[researcher, writer],
    markdown=True,
)

Error-Specific Fallbacks

FallbackConfig lets you route different error types to different fallback models. Instead of a flat list, you specify which models to try for rate limits, context window overflows, and general errors separately.

from kern.agent import Agent
from kern.models.fallback import FallbackConfig
from kern.models.anthropic import Claude
from kern.models.openai import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    fallback_config=FallbackConfig(
        # On rate-limit (429/529) errors
        on_rate_limit=[
            OpenAIChat(id="gpt-4o-mini"),
            Claude(id="claude-sonnet-4-20250514"),
        ],
        # On context-window-exceeded errors
        on_context_overflow=[
            Claude(id="claude-sonnet-4-20250514"),
        ],
        # General fallback for any other retryable error
        on_error=[
            Claude(id="claude-sonnet-4-20250514"),
        ],
    ),
)

Error routing

When the primary model fails, the error is classified and routed to the matching fallback list:

| Error Type | Fallback List | Example |
| --- | --- | --- |
| Rate limit (429/529) | on_rate_limit | Provider throttling, Anthropic overloaded |
| Context window exceeded | on_context_overflow | Input too long for model's context window |
| Other retryable errors | on_error | Server errors (5xx), network failures |

If a specific list (like on_rate_limit) is empty, on_error is used as a catch-all.

Non-retryable client errors like 400, 401, 403, 404, and 422 are not caught by fallback. These indicate configuration problems (bad API key, invalid request) that need to be fixed rather than masked by switching models.
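The routing rules above can be sketched in plain Python. This is an illustration only, not kern's implementation: `ErrorKind`, `classify_status`, and `select_fallbacks` are hypothetical names, the config is modeled as a plain dict of model strings, and the `context_overflow` flag stands in for whatever provider-specific detection the library actually performs.

```python
from enum import Enum
from typing import Dict, List, Optional


class ErrorKind(Enum):
    # Each value is the name of the fallback list the error routes to.
    RATE_LIMIT = "on_rate_limit"
    CONTEXT_OVERFLOW = "on_context_overflow"
    OTHER = "on_error"


# Client errors the text lists as never triggering fallback.
NON_RETRYABLE = {400, 401, 403, 404, 422}


def classify_status(status: Optional[int], context_overflow: bool = False) -> Optional[ErrorKind]:
    """Return the error kind, or None when the error should not trigger fallback."""
    if context_overflow:  # assumption: overflow is detected separately from the status code
        return ErrorKind.CONTEXT_OVERFLOW
    if status in (429, 529):
        return ErrorKind.RATE_LIMIT
    if status in NON_RETRYABLE:
        return None
    return ErrorKind.OTHER  # 5xx responses, or network failures with no status


def select_fallbacks(kind: Optional[ErrorKind], config: Dict[str, List[str]]) -> List[str]:
    """Pick the matching list; an empty or missing list defers to on_error."""
    if kind is None:
        return []
    return config.get(kind.value) or config.get("on_error", [])


config = {
    "on_rate_limit": ["openai:gpt-4o-mini"],
    "on_context_overflow": [],
    "on_error": ["anthropic:claude-sonnet-4-20250514"],
}
```

With this config, a 429 routes to the rate-limit list, a context overflow falls through to on_error because its own list is empty, and a 401 selects no fallback at all.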

Fallback Callback

Use the callback parameter to get notified whenever a fallback model is activated. This is useful for logging, metrics, or alerting.

from kern.agent import Agent
from kern.models.fallback import FallbackConfig
from kern.models.anthropic import Claude
from kern.models.openai import OpenAIChat


def on_fallback(primary_model_id: str, fallback_model_id: str, error: Exception) -> None:
    print(f"[fallback] {primary_model_id} -> {fallback_model_id} (reason: {error})")


agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    fallback_config=FallbackConfig(
        on_error=[Claude(id="claude-sonnet-4-20250514")],
        callback=on_fallback,
    ),
)

The callback fires after the fallback model succeeds. For streaming calls, it fires after the full stream completes.
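For logging and metrics, a callback with the same three-argument signature might look like the sketch below. It uses only the standard library; the logger name and the `fallback_counts` tally are illustrative choices, not something kern provides.

```python
import logging
from collections import Counter

logger = logging.getLogger("fallback")

# In-process tally of fallback activations, keyed by (primary, fallback, error type).
fallback_counts: Counter = Counter()


def on_fallback(primary_model_id: str, fallback_model_id: str, error: Exception) -> None:
    """Log the switch and count it so repeated fallbacks show up in metrics."""
    fallback_counts[(primary_model_id, fallback_model_id, type(error).__name__)] += 1
    logger.warning(
        "fallback activated: %s -> %s (%s: %s)",
        primary_model_id,
        fallback_model_id,
        type(error).__name__,
        error,
    )


# Direct call for illustration; in practice the framework would invoke the callback.
on_fallback("gpt-4o", "claude-sonnet-4-20250514", TimeoutError("upstream timed out"))
```

Keying the counter by error type makes it easy to tell a burst of rate limits apart from a provider outage when reviewing the tally later.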

Retry vs. Fallback

Retry and fallback are separate layers. Retry happens inside each model. Fallback only triggers after the primary model's retry loop is fully exhausted.

Primary model
    _invoke_with_retry()                # retries N times (per model config)
On failure
    classify error type
    select matching fallback list
    try each fallback in order
        fallback._invoke_with_retry()   # each fallback retries independently

Each model controls its own retry behavior:

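from kern.agent import Agent
from kern.models.anthropic import Claude
from kern.models.openai import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o", retries=3, exponential_backoff=True),
    fallback_models=[
        Claude(id="claude-sonnet-4-20250514", retries=2),
    ],
)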

The primary model retries 3 times with exponential backoff. Only after those retries are exhausted does the fallback kick in, and it gets 2 retries of its own.
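The two layers can be sketched with plain callables standing in for models. This is a simplified illustration, not kern's code: `invoke_with_retry` and `run_with_fallbacks` are hypothetical names, error classification is omitted, and the sketch assumes `retries` counts re-attempts after the first call, which the library may define differently.

```python
import time
from typing import Callable, Sequence, Tuple

ModelCall = Tuple[Callable[[], str], int]  # (callable standing in for a model, retries)


def invoke_with_retry(call: Callable[[], str], retries: int, base_delay: float = 0.0) -> str:
    """Inner layer: one initial attempt, then up to `retries` more with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception:
            if attempt == retries:
                raise  # retry budget exhausted; the outer layer decides what happens next
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable for retries >= 0")


def run_with_fallbacks(primary: ModelCall, fallbacks: Sequence[ModelCall]) -> str:
    """Outer layer: move to the next model only after the current one exhausts its retries."""
    try:
        return invoke_with_retry(*primary)
    except Exception as primary_error:
        last_error: Exception = primary_error
    for fallback in fallbacks:
        try:
            return invoke_with_retry(*fallback)
        except Exception as error:
            last_error = error
    raise last_error  # every model failed; surface the most recent error


attempts = {"primary": 0, "fallback": 0}


def flaky_primary() -> str:
    attempts["primary"] += 1
    raise ConnectionError("primary unavailable")


def fallback_model() -> str:
    attempts["fallback"] += 1
    return "answer from fallback"


result = run_with_fallbacks((flaky_primary, 3), [(fallback_model, 2)])
```

Under the stated assumption, the primary is called four times (one attempt plus three retries) before the fallback is tried, and the fallback answers on its first call.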

Streaming

Fallback works with streaming responses. If the primary model fails mid-stream, the fallback model takes over and the response content is reset so the consumer receives a clean response from the fallback model only.
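One way to picture the reset is a stream that emits an explicit signal when it switches models, so the consumer can discard partial output. The sketch below is a guess at the shape of that behavior, not kern's streaming API: the `("reset", "")` event, `stream_with_fallback`, and `collect` are all hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Event = Tuple[str, str]  # (kind, text); kind is "chunk" or "reset"


def stream_with_fallback(
    primary: Callable[[], Iterable[str]],
    fallback: Callable[[], Iterable[str]],
) -> Iterator[Event]:
    """Stream from primary; on failure, signal a reset and restart from fallback."""
    try:
        for text in primary():
            yield ("chunk", text)
        return
    except Exception:
        yield ("reset", "")  # tell the consumer to discard partial output
    for text in fallback():
        yield ("chunk", text)


def collect(events: Iterable[Event]) -> str:
    """Consumer that honors reset events by clearing its buffer."""
    buffer: List[str] = []
    for kind, text in events:
        if kind == "reset":
            buffer.clear()
        else:
            buffer.append(text)
    return "".join(buffer)


def failing_primary() -> Iterator[str]:
    yield "Partial ans"
    raise ConnectionError("stream dropped")


def healthy_fallback() -> Iterator[str]:
    yield "Complete "
    yield "answer"


final_text = collect(stream_with_fallback(failing_primary, healthy_fallback))
```

Here the partial text from the failed primary stream is dropped, and `final_text` contains only what the fallback produced.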

Parameters

Available on both Agent and Team:

| Parameter | Type | Description |
| --- | --- | --- |
| fallback_models | List[Model \| str] | Models tried in order on any retryable failure. Shorthand for FallbackConfig(on_error=...). |
| fallback_config | FallbackConfig | Error-specific routing. Takes precedence over fallback_models if both are set. |
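The relationship between the two parameters can be sketched as a small resolution step. `FallbackConfigSketch` and `resolve_fallback` are hypothetical stand-ins written for this illustration; they mirror the documented precedence and shorthand but are not kern's classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FallbackConfigSketch:
    """Stand-in for FallbackConfig, with models represented as strings."""
    on_error: List[str] = field(default_factory=list)
    on_rate_limit: List[str] = field(default_factory=list)
    on_context_overflow: List[str] = field(default_factory=list)


def resolve_fallback(
    fallback_models: Optional[List[str]] = None,
    fallback_config: Optional[FallbackConfigSketch] = None,
) -> Optional[FallbackConfigSketch]:
    """An explicit config wins; otherwise a flat list becomes on_error."""
    if fallback_config is not None:
        return fallback_config
    if fallback_models:
        return FallbackConfigSketch(on_error=list(fallback_models))
    return None  # no fallback configured


shorthand = resolve_fallback(fallback_models=["anthropic:claude-sonnet-4-20250514"])
explicit = resolve_fallback(
    fallback_models=["anthropic:claude-sonnet-4-20250514"],
    fallback_config=FallbackConfigSketch(on_rate_limit=["openai:gpt-4o-mini"]),
)
```

In the second call the flat list is ignored entirely, which matches the stated rule that fallback_config takes precedence when both are set.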

FallbackConfig

| Field | Type | Description |
| --- | --- | --- |
| on_error | List[Model \| str] | General fallback for any retryable error. |
| on_rate_limit | List[Model \| str] | Fallback for rate-limit (429/529) errors. Falls back to on_error if empty. |
| on_context_overflow | List[Model \| str] | Fallback for context-window-exceeded errors. Falls back to on_error if empty. |
| callback | Callable[[str, str, Exception], None] | Called when a fallback model is activated. Receives (primary_model_id, fallback_model_id, error). |

Developer Resources