Fallback Models
Automatically switch to backup models when the primary model hits rate limits, outages, or context window limits.
Pass fallback_models to any Agent or Team. If the primary model fails after exhausting its retries, each fallback is tried in order until one succeeds.
```python
from kern.agent import Agent
from kern.models.anthropic import Claude
from kern.models.openai import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    fallback_models=[Claude(id="claude-sonnet-4-20250514")],
)
```
If gpt-4o fails after exhausting its own retries, Claude is tried automatically.
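The ordering rule can be sketched in plain Python without kern. This is only an illustration of "try each fallback in order until one succeeds", not kern's implementation: `call_with_fallbacks` and the stand-in model functions are hypothetical names, and each callable is assumed to have already exhausted its own retries before it raises.

```python
from typing import Callable, Optional, Sequence

# Hypothetical stand-in for "invoke a model with a prompt".
ModelCall = Callable[[str], str]


def call_with_fallbacks(
    primary: ModelCall, fallbacks: Sequence[ModelCall], prompt: str
) -> str:
    """Try the primary, then each fallback in order; return the first success."""
    last_error: Optional[Exception] = None
    for model in (primary, *fallbacks):
        try:
            return model(prompt)
        except Exception as exc:  # assumption: a real system catches only retryable errors
            last_error = exc
    # Every model failed, so surface the most recent error.
    assert last_error is not None
    raise last_error


def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary unavailable")


def backup(prompt: str) -> str:
    return f"backup answered: {prompt}"


result = call_with_fallbacks(flaky_primary, [backup], "hello")
```

If every model in the chain fails, the sketch re-raises the last error so the caller still sees a failure rather than a silent empty result.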
Model strings work too:
```python
from kern.agent import Agent

agent = Agent(
    model="openai:gpt-4o",
    fallback_models=["anthropic:claude-sonnet-4-20250514"],
)
```
Usage with Teams
Fallback models apply to the team leader's model calls. Member agents keep their own models and are not affected by the leader's fallback config.
```python
from kern.agent import Agent
from kern.models.anthropic import Claude
from kern.models.openai import OpenAIChat
from kern.team import Team

researcher = Agent(
    name="Researcher",
    role="You research topics and provide detailed findings.",
    model=OpenAIChat(id="gpt-4o-mini"),
)

writer = Agent(
    name="Writer",
    role="You write clear, concise summaries from research findings.",
    model=OpenAIChat(id="gpt-4o-mini"),
)

team = Team(
    name="Research Team",
    model=OpenAIChat(id="gpt-4o"),
    fallback_models=[Claude(id="claude-sonnet-4-20250514")],
    members=[researcher, writer],
    markdown=True,
)
```
Error-Specific Fallbacks
FallbackConfig lets you route different error types to different fallback models. Instead of a flat list, you specify which models to try for rate limits, context window overflows, and general errors separately.
```python
from kern.agent import Agent
from kern.models.fallback import FallbackConfig
from kern.models.anthropic import Claude
from kern.models.openai import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    fallback_config=FallbackConfig(
        # On rate-limit (429/529) errors
        on_rate_limit=[
            OpenAIChat(id="gpt-4o-mini"),
            Claude(id="claude-sonnet-4-20250514"),
        ],
        # On context-window-exceeded errors
        on_context_overflow=[
            Claude(id="claude-sonnet-4-20250514"),
        ],
        # General fallback for any other retryable error
        on_error=[
            Claude(id="claude-sonnet-4-20250514"),
        ],
    ),
)
```
Error routing
When the primary model fails, the error is classified and routed to the matching fallback list:
| Error Type | Fallback List | Example |
|---|---|---|
| Rate limit (429/529) | on_rate_limit | Provider throttling, Anthropic overloaded |
| Context window exceeded | on_context_overflow | Input too long for model's context window |
| Other retryable errors | on_error | Server errors (5xx), network failures |
If a specific list (like on_rate_limit) is empty, on_error is used as a catch-all.
Non-retryable client errors like 400, 401, 403, 404, and 422 are not caught by fallback. These indicate configuration problems (bad API key, invalid request) that need to be fixed rather than masked by switching models.
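The routing rules above can be approximated in a few lines of plain Python. This is a sketch under stated assumptions, not kern's classifier: `classify` and `select_fallbacks` are hypothetical helpers, errors are assumed to be identified by HTTP status code, and context overflow is assumed to be detected separately and passed in as a flag (providers often report it as a 400, so it is checked first).

```python
from typing import Dict, List, Optional

RATE_LIMIT_CODES = {429, 529}
NON_RETRYABLE_CODES = {400, 401, 403, 404, 422}


def classify(status_code: Optional[int], context_overflow: bool = False) -> Optional[str]:
    """Return the fallback list name for an error, or None if no fallback applies."""
    if context_overflow:
        return "on_context_overflow"
    if status_code in NON_RETRYABLE_CODES:
        return None  # configuration problem: fix it instead of switching models
    if status_code in RATE_LIMIT_CODES:
        return "on_rate_limit"
    return "on_error"  # 5xx responses, network failures (no status code), etc.


def select_fallbacks(
    config: Dict[str, List[str]],
    status_code: Optional[int],
    context_overflow: bool = False,
) -> List[str]:
    """Pick the matching list, using on_error as the catch-all when it is empty."""
    key = classify(status_code, context_overflow)
    if key is None:
        return []
    return config.get(key) or config.get("on_error", [])


config = {
    "on_rate_limit": ["openai:gpt-4o-mini"],
    "on_context_overflow": [],  # empty, so on_error is used instead
    "on_error": ["anthropic:claude-sonnet-4-20250514"],
}

rate_limited = select_fallbacks(config, 429)
overflowed = select_fallbacks(config, None, context_overflow=True)
unauthorized = select_fallbacks(config, 401)
```

Here `rate_limited` holds the dedicated rate-limit list, `overflowed` falls through to `on_error` because its own list is empty, and `unauthorized` is empty because a 401 is never routed to a fallback.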
Fallback Callback
Use the callback parameter to get notified whenever a fallback model is activated. This is useful for logging, metrics, or alerting.
```python
from kern.agent import Agent
from kern.models.fallback import FallbackConfig
from kern.models.anthropic import Claude
from kern.models.openai import OpenAIChat


def on_fallback(primary_model_id: str, fallback_model_id: str, error: Exception) -> None:
    print(f"[fallback] {primary_model_id} -> {fallback_model_id} (reason: {error})")


agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    fallback_config=FallbackConfig(
        on_error=[Claude(id="claude-sonnet-4-20250514")],
        callback=on_fallback,
    ),
)
```
The callback fires after the fallback model succeeds. For streaming calls, it fires after the full stream completes.
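For metrics rather than printing, a callback with the same `(primary_model_id, fallback_model_id, error)` signature could count activations and log them. The sketch below uses only the standard library; the `fallback_counts` counter and the logger name are illustrative choices, and the final line calls the function by hand to simulate what the framework would do.

```python
import logging
from collections import Counter

logger = logging.getLogger("fallback")

# Illustrative in-process metric: (primary, fallback) -> number of activations.
fallback_counts: Counter = Counter()


def on_fallback(primary_model_id: str, fallback_model_id: str, error: Exception) -> None:
    """Record one fallback activation and log why it happened."""
    fallback_counts[(primary_model_id, fallback_model_id)] += 1
    logger.warning(
        "fallback %s -> %s (%s: %s)",
        primary_model_id,
        fallback_model_id,
        type(error).__name__,
        error,
    )


# Simulated invocation, standing in for the framework calling the callback.
on_fallback("gpt-4o", "claude-sonnet-4-20250514", TimeoutError("upstream timed out"))
```

Keeping the callback free of slow or failure-prone work is a sensible default, since it runs as part of the request path.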
Retry vs. Fallback
Retry and fallback are separate layers. Retry happens inside each model. Fallback only triggers after the primary model's retry loop is fully exhausted.
```
Primary model
  └── _invoke_with_retry()                  # retries N times (per model config)
On failure
  └── classify error type
  └── select matching fallback list
  └── try each fallback in order
        └── fallback._invoke_with_retry()   # each fallback retries independently
```
Each model controls its own retry behavior:
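```python
from kern.agent import Agent
from kern.models.anthropic import Claude
from kern.models.openai import OpenAIChat

agent = Agent(
    # Primary: 3 retries with exponential backoff between attempts
    model=OpenAIChat(id="gpt-4o", retries=3, exponential_backoff=True),
    fallback_models=[
        # Fallback: its own, independent budget of 2 retries
        Claude(id="claude-sonnet-4-20250514", retries=2),
    ],
)
```
The primary model retries 3 times with exponential backoff. Only after all 3 attempts fail does the fallback kick in, and it gets 2 retries of its own.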
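The two layers can be modeled with a small stand-in class to make the attempt counts concrete. This is a sketch, not kern's retry loop: `StubModel` and `run` are hypothetical, `retries` is treated as the total number of attempts (matching the "all 3 attempts" wording above), and the backoff sleep is omitted.

```python
from typing import List, Optional


class StubModel:
    """Hypothetical model stand-in with its own retry budget."""

    def __init__(self, name: str, retries: int, fail: bool) -> None:
        self.name = name
        self.retries = retries  # assumption: total attempts for this model
        self.fail = fail
        self.attempts = 0

    def _invoke(self, prompt: str) -> str:
        if self.fail:
            raise ConnectionError(f"{self.name} unavailable")
        return f"{self.name}: {prompt}"

    def invoke_with_retry(self, prompt: str) -> str:
        last_error: Optional[ConnectionError] = None
        for _ in range(max(1, self.retries)):
            self.attempts += 1
            try:
                return self._invoke(prompt)
            except ConnectionError as exc:
                last_error = exc  # a real client would sleep with backoff here
        assert last_error is not None
        raise last_error


def run(primary: StubModel, fallbacks: List[StubModel], prompt: str) -> str:
    """Fallback layer: only entered once the primary's retry loop is exhausted."""
    try:
        return primary.invoke_with_retry(prompt)
    except ConnectionError as primary_error:
        for fallback in fallbacks:
            try:
                return fallback.invoke_with_retry(prompt)
            except ConnectionError:
                continue
        raise primary_error


primary = StubModel("gpt-4o", retries=3, fail=True)
fallback = StubModel("claude", retries=2, fail=False)
answer = run(primary, [fallback], "hi")
```

After the call, `primary.attempts` is 3 and `fallback.attempts` is 1: the fallback was not touched until the primary used up its whole budget, and it succeeded on its first try.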
Streaming
Fallback works with streaming responses. If the primary model fails mid-stream, the fallback model takes over and the response content is reset so the consumer receives a clean response from the fallback model only.
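One way to picture the reset is a stream of events in which a marker tells the consumer to discard what it has buffered so far. The sketch below is an assumption about how such a handoff could look, not kern's streaming protocol: the `("reset", reason)` event, `stream_with_fallback`, and `collect` are all hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# ("chunk", text) carries content; ("reset", reason) tells the consumer to start over.
Event = Tuple[str, str]


def stream_with_fallback(streams: List[Callable[[], Iterable[str]]]) -> Iterator[Event]:
    """Stream from the first source; on a mid-stream failure, reset and try the next."""
    for index, open_stream in enumerate(streams):
        try:
            for chunk in open_stream():
                yield ("chunk", chunk)
            return  # stream finished cleanly
        except ConnectionError as exc:
            if index == len(streams) - 1:
                raise  # no fallback left
            yield ("reset", str(exc))


def collect(events: Iterable[Event]) -> str:
    """Consumer that drops partial output whenever a reset arrives."""
    parts: List[str] = []
    for kind, payload in events:
        if kind == "reset":
            parts.clear()
        else:
            parts.append(payload)
    return "".join(parts)


def primary_stream() -> Iterator[str]:
    yield "Hello from pri"
    raise ConnectionError("connection dropped mid-stream")


def fallback_stream() -> Iterator[str]:
    yield "Hello from "
    yield "the fallback."


text = collect(stream_with_fallback([primary_stream, fallback_stream]))
```

The partial "Hello from pri" text is discarded, so `text` contains only the fallback's output.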
Parameters
Available on both Agent and Team:
| Parameter | Type | Description |
|---|---|---|
| fallback_models | List[Model \| str] | Models tried in order on any failure. Shorthand for FallbackConfig(on_error=...). |
| fallback_config | FallbackConfig | Error-specific routing. Takes precedence over fallback_models if both are set. |
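The relationship between the two parameters can be written as a small resolution step. This is a sketch of the documented behavior only: `SketchFallbackConfig` is a hypothetical dataclass standing in for `FallbackConfig`, and `resolve` is not a kern function.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchFallbackConfig:
    """Hypothetical stand-in mirroring the list fields of FallbackConfig."""

    on_error: List[str] = field(default_factory=list)
    on_rate_limit: List[str] = field(default_factory=list)
    on_context_overflow: List[str] = field(default_factory=list)


def resolve(
    fallback_models: Optional[List[str]],
    fallback_config: Optional[SketchFallbackConfig],
) -> Optional[SketchFallbackConfig]:
    """Apply the precedence rule: an explicit config wins over the flat list."""
    if fallback_config is not None:
        return fallback_config
    if fallback_models:
        # The flat list is shorthand for a config with only on_error set.
        return SketchFallbackConfig(on_error=list(fallback_models))
    return None


shorthand = resolve(["anthropic:claude-sonnet-4-20250514"], None)
explicit = resolve(
    ["anthropic:claude-sonnet-4-20250514"],
    SketchFallbackConfig(on_rate_limit=["openai:gpt-4o-mini"]),
)
```

In the second call the flat list is ignored entirely, which is why setting both parameters at once is best avoided.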
FallbackConfig
| Field | Type | Description |
|---|---|---|
| on_error | List[Model \| str] | General fallback for any retryable error. |
| on_rate_limit | List[Model \| str] | Fallback for rate-limit (429/529) errors. Falls back to on_error if empty. |
| on_context_overflow | List[Model \| str] | Fallback for context-window-exceeded errors. Falls back to on_error if empty. |
| callback | Callable[[str, str, Exception], None] | Called when a fallback model is activated. Receives (primary_model_id, fallback_model_id, error). |