Agent with Streaming
Code
1from kern.agent import Agent2from kern.models.vllm import VLLM34agent = Agent(5 model=VLLM(id="Qwen/Qwen2.5-7B-Instruct", top_k=20, enable_thinking=False),6 markdown=True,7)8agent.print_response("Share a 2 sentence horror story", stream=True)Usage
Set up your virtual environment
1uv venv --python 3.122source .venv/bin/activate1uv venv --python 3.122.venv\Scripts\activateInstall Libraries
1uv pip install -U kern-ai openai vllmStart vLLM server
1vllm serve Qwen/Qwen2.5-7B-Instruct \2 --enable-auto-tool-choice \3 --tool-call-parser hermes \4 --dtype float16 \5 --max-model-len 8192 \6 --gpu-memory-utilization 0.9Run Agent
1python cookbook/11_models/vllm/basic_stream.py