Async Agent with Streaming

Code

import asyncio

from kern.agent import Agent
from kern.models.vllm import VLLM

agent = Agent(model=VLLM(id="Qwen/Qwen2.5-7B-Instruct"), markdown=True)
asyncio.run(agent.aprint_response("Share a 2 sentence horror story", stream=True))
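
To illustrate the general pattern behind asynchronous streaming, here is a minimal sketch that uses only the standard library. It is not how kern implements `aprint_response`; `fake_token_stream` and `collect_stream` are hypothetical names standing in for a model's streamed output and a consumer that prints chunks as they arrive.

```python
import asyncio
from typing import AsyncIterator


async def fake_token_stream(text: str, delay: float = 0.0) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streamed model reply: yields one word at a time."""
    for word in text.split():
        await asyncio.sleep(delay)  # simulate waiting on the network
        yield word + " "


async def collect_stream(chunks: AsyncIterator[str]) -> str:
    """Print each chunk as soon as it arrives, then return the full text."""
    parts = []
    async for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # finish the line once the stream is exhausted
    return "".join(parts).strip()
```

Running `asyncio.run(collect_stream(fake_token_stream("The door creaked open.", delay=0.1)))` prints the words one by one instead of waiting for the whole reply, which is the behaviour `stream=True` asks for in the example above.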

Usage

Set up your virtual environment

macOS/Linux:

uv venv --python 3.12
source .venv/bin/activate

Windows:

uv venv --python 3.12
.venv\Scripts\activate

Install Libraries

uv pip install -U kern-ai openai vllm

Start vLLM server

vllm serve Qwen/Qwen2.5-7B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --dtype float16 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.9
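
Before running the agent, it can help to confirm that the server has finished loading the model. The sketch below assumes vLLM is listening on its default OpenAI-compatible address, `http://localhost:8000/v1`, and queries the `/models` endpoint; adjust the base URL if you passed `--host` or `--port`. The helper names are illustrative and not part of kern or vLLM.

```python
import json
import urllib.error
import urllib.request


def models_url(base_url: str = "http://localhost:8000/v1") -> str:
    """Build the model-listing URL for an OpenAI-compatible server."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload: str) -> list[str]:
    """Extract model ids from a /v1/models JSON response body."""
    data = json.loads(payload)
    return [model["id"] for model in data.get("data", [])]


def served_models(base_url: str = "http://localhost:8000/v1", timeout: float = 5.0) -> list[str]:
    """Return the ids the server reports, or an empty list if it is unreachable."""
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            return parse_model_ids(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError):
        # Server not started yet, still loading weights, or on another address.
        return []
```

Calling `served_models()` once the server is up should return a list that includes the id passed to `vllm serve`; an empty list suggests the server is not reachable yet.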

Run Agent

python cookbook/11_models/vllm/async_basic_stream.py