Production Best Practices
Avoid common pitfalls, optimize costs, and ensure reliable memory behavior in production.
Memory is powerful, but without careful configuration, it can lead to unexpected token consumption, behavioral issues, and high costs. This guide shows you what to watch out for and how to optimize your memory usage for production.
Quick Reference
- Default to automatic memory (update_memory_on_run=True) unless you have a specific reason for agentic control
- Always provide user_id; don't rely on the default "default" user
- Use cheaper models for memory operations when using agentic memory
- Implement pruning for long-running applications
- Monitor token usage in production to catch memory-related cost spikes
- Test with realistic data: 100+ memories behave very differently than 5 memories
The Agentic Memory Token Trap
The Problem: When you use enable_agentic_memory=True, every memory operation triggers a separate, nested LLM call. This architecture can cause token usage to explode, especially as memories accumulate.
Here's what happens under the hood:
- User sends a message → Main LLM call processes it
- Agent decides to update memory → Calls the update_user_memory tool
- Nested LLM call fires with:
  - Detailed system prompt (~50 lines)
  - ALL existing user memories loaded into context
  - Memory management instructions and tools
- Memory LLM makes tool calls (add, update, delete)
- Control returns to main conversation
Real-world impact:

    # Scenario: User with 100 existing memories
    agent = Agent(
        db=db,
        enable_agentic_memory=True,
        model=OpenAIResponses(id="gpt-5.2")
    )

    # 10-message conversation where agent updates memory 7 times:
    # Normal conversation: 10 × 500 tokens = 5,000 tokens
    # With agentic memory: (10 × 500) + (7 × 5,000) = 40,000 tokens
    # Cost increase: 8x more expensive!

As memories accumulate, each memory operation gets more expensive. With 200 memories, a single memory update could consume 10,000+ tokens just loading context.
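The arithmetic above can be reproduced with a rough model. This is only a sketch: the function names are hypothetical, and the figure of about 50 tokens per stored memory is an assumption derived from the example (5,000 tokens for 100 memories), not a measured value.

```python
def memory_op_tokens(n_memories, tokens_per_memory=50, fixed_overhead=0):
    """Rough context size of one nested memory call.

    tokens_per_memory and fixed_overhead are assumptions for illustration;
    measure real values in your own deployment.
    """
    return fixed_overhead + n_memories * tokens_per_memory


def conversation_tokens(n_messages, tokens_per_message, n_memory_ops, n_memories):
    """Return (tokens without memory ops, tokens with agentic memory ops)."""
    base = n_messages * tokens_per_message
    memory = n_memory_ops * memory_op_tokens(n_memories)
    return base, base + memory


base, total = conversation_tokens(
    n_messages=10, tokens_per_message=500, n_memory_ops=7, n_memories=100
)
print(base, total, total / base)  # 5000 40000 8.0
print(memory_op_tokens(200))      # 10000
```

Because the per-operation cost scales with the number of stored memories, the multiplier grows as a user's memory count grows, even if conversation length stays the same.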
Mitigation Strategy #1: Use Automatic Memory
For most use cases, automatic memory is your best bet—it's significantly more efficient:

    # Recommended: Single memory processing after conversation
    agent = Agent(
        db=db,
        update_memory_on_run=True  # Processes memories once at end
    )

    # Only use agentic memory when you specifically need:
    # - Real-time memory updates during conversation
    # - User-directed memory commands ("forget my address")
    # - Complex memory reasoning within the conversation flow

Mitigation Strategy #2: Use a Cheaper Model for Memory Operations
If you do need agentic memory, use a less expensive model for memory management while keeping a powerful model for conversation:

    from kern.agent import Agent
    from kern.memory import MemoryManager
    from kern.models.openai import OpenAIResponses

    # Cheap model for memory operations (60x less expensive)
    # Use a smaller, cheaper model id here than the one used for the main agent
    memory_manager = MemoryManager(
        db=db,
        model=OpenAIResponses(id="gpt-5.2")
    )

    # Expensive model for main conversations
    agent = Agent(
        db=db,
        model=OpenAIResponses(id="gpt-5.2"),
        memory_manager=memory_manager,
        enable_agentic_memory=True
    )

With a memory model that is 60x cheaper per token, this approach can reduce memory-related costs by about 98% while maintaining conversation quality.
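The 98% figure follows from the 60x price difference, assuming the memory calls consume the same number of tokens on either model. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def memory_cost_reduction(price_ratio):
    """Fraction of memory-operation cost saved when the memory model is
    `price_ratio` times cheaper per token (token counts assumed unchanged)."""
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return 1 - 1 / price_ratio


print(f"{memory_cost_reduction(60):.1%}")  # 98.3%
```

The saving applies only to the nested memory calls; the main conversation still runs on the more expensive model.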
Mitigation Strategy #3: Guide Memory Behavior with Instructions
Add explicit instructions to prevent frivolous memory updates:

    agent = Agent(
        db=db,
        enable_agentic_memory=True,
        instructions=[
            "Only update memories when users share significant new information.",
            "Don't create memories for casual conversation or temporary states.",
            "Batch multiple memory updates together when possible."
        ]
    )

Mitigation Strategy #4: Implement Memory Pruning
Prevent memory bloat by periodically cleaning up old or irrelevant memories:

    from datetime import datetime, timedelta

    def prune_old_memories(db, user_id, days=90):
        """Remove memories last updated more than `days` days ago."""
        cutoff_timestamp = int((datetime.now() - timedelta(days=days)).timestamp())

        memories = db.get_user_memories(user_id=user_id)
        for memory in memories:
            # Memories without an updated_at timestamp are kept
            if memory.updated_at and memory.updated_at < cutoff_timestamp:
                db.delete_user_memory(memory_id=memory.memory_id)

    # Run periodically or before high-cost operations
    prune_old_memories(db, user_id="john_doe@example.com")

Mitigation Strategy #5: Set Tool Call Limits
Prevent runaway memory operations by limiting tool calls per conversation:

    agent = Agent(
        db=db,
        enable_agentic_memory=True,
        tool_call_limit=5  # Prevents excessive memory operations
    )

Common Pitfalls
The user_id Pitfall
The Problem: Forgetting to set user_id causes all memories to default to user_id="default", mixing different users' memories together.

    # ❌ Bad: All users share the same memories
    agent.print_response("I love pizza")
    agent.print_response("I'm allergic to dairy")

    # ✅ Good: Each user has isolated memories
    agent.print_response("I love pizza", user_id="user_123")
    agent.print_response("I'm allergic to dairy", user_id="user_456")

Best practice: Always pass user_id explicitly, especially in multi-user applications.
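One way to enforce this is to route every call through a small guard. The sketch below is an assumption about how an application might wrap the agent: `require_user_id` and `respond` are hypothetical helpers, not part of the library. Only `agent.print_response(..., user_id=...)` comes from the examples above.

```python
def require_user_id(user_id):
    """Return a cleaned user_id, rejecting missing, blank, or "default" values."""
    if not isinstance(user_id, str) or not user_id.strip():
        raise ValueError("user_id is required")
    cleaned = user_id.strip()
    if cleaned == "default":
        raise ValueError('user_id must not be the shared "default" value')
    return cleaned


def respond(agent, message, user_id):
    """Hypothetical wrapper: validate user_id, then forward to the agent."""
    return agent.print_response(message, user_id=require_user_id(user_id))
```

Failing loudly here is a design choice: a raised error during development is usually cheaper than discovering mixed memories in production.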
The Double-Enable Pitfall
The Problem: Using both update_memory_on_run=True and enable_agentic_memory=True doesn't give you both—agentic mode overrides automatic mode.
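Because this combination fails silently, a small check before constructing the agent can surface it early. The following is a sketch only: `check_memory_flags` is a hypothetical helper, and the mode names it returns are illustrative labels rather than library values.

```python
def check_memory_flags(update_memory_on_run=False, enable_agentic_memory=False):
    """Return the effective memory mode, raising if both flags are set."""
    if update_memory_on_run and enable_agentic_memory:
        raise ValueError(
            "Set only one of update_memory_on_run or enable_agentic_memory; "
            "agentic mode overrides automatic mode when both are enabled."
        )
    if enable_agentic_memory:
        return "agentic"
    if update_memory_on_run:
        return "automatic"
    return "disabled"


# Usage: validate the keyword arguments before passing them to Agent(db=db, **flags)
flags = {"update_memory_on_run": True}
print(check_memory_flags(**flags))  # automatic
```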

    # ❌ Doesn't work as expected - automatic memory is disabled
    agent = Agent(
        db=db,
        update_memory_on_run=True,
        enable_agentic_memory=True  # This disables automatic behavior
    )

    # ✅ Choose one approach
    agent = Agent(db=db, update_memory_on_run=True)  # Automatic
    # OR
    agent = Agent(db=db, enable_agentic_memory=True)  # Agentic

Memory Growth Monitoring
Track memory counts to catch issues early:

    from kern.agent import Agent

    agent = Agent(db=db, update_memory_on_run=True)

    # Check memory count for a user
    memories = agent.get_user_memories(user_id="user_123")
    print(f"User has {len(memories)} memories")

    # Alert if memory count is unusually high
    if len(memories) > 500:
        print("⚠️ Warning: User has excessive memories. Consider pruning.")
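The same check can be run across many users, for example from a scheduled job. This sketch assumes `agent.get_user_memories(user_id=...)` returns a list (or None), as in the example above. `find_heavy_users` and the source of `user_ids` are hypothetical.

```python
def find_heavy_users(agent, user_ids, threshold=500):
    """Return {user_id: memory_count} for users above the threshold."""
    heavy = {}
    for user_id in user_ids:
        # Treat a None result as "no memories" rather than failing
        memories = agent.get_user_memories(user_id=user_id) or []
        if len(memories) > threshold:
            heavy[user_id] = len(memories)
    return heavy
```

The returned mapping can feed an alert or a pruning job, so that cleanup targets only the users whose memory counts are actually driving cost.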