Agent with Streaming

Code

1from kern.agent import Agent
2from kern.models.vllm import VLLM
3
4agent = Agent(
5    model=VLLM(id="Qwen/Qwen2.5-7B-Instruct", top_k=20, enable_thinking=False),
6    markdown=True,
7)
8agent.print_response("Share a 2 sentence horror story", stream=True)

Usage

Set up your virtual environment

1uv venv --python 3.12
2source .venv/bin/activate

1uv venv --python 3.12
2.venv\Scripts\activate

Install Libraries

1uv pip install -U kern-ai openai vllm

Start vLLM server

1vllm serve Qwen/Qwen2.5-7B-Instruct \
2    --enable-auto-tool-choice \
3    --tool-call-parser hermes \
4    --dtype float16 \
5    --max-model-len 8192 \
6    --gpu-memory-utilization 0.9

Run Agent

1python cookbook/11_models/vllm/basic_stream.py