Agentic Chunking
Agentic chunking is an intelligent method of splitting documents into smaller chunks by using a model to determine natural breakpoints in the text. Rather than splitting text at fixed character counts, it analyzes the content to find semantically meaningful boundaries like paragraph breaks and topic transitions.
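The idea can be sketched without any library. The snippet below is a minimal illustration, not the library's implementation: `agentic_style_chunks` and `paragraph_breakpoint` are hypothetical names, and `paragraph_breakpoint` is a simple heuristic standing in for the model call that would normally choose the breakpoint.

```python
from typing import Callable, List


def paragraph_breakpoint(window: str) -> int:
    """Hypothetical stand-in for the model.

    Returns the index just past the last paragraph break in the window,
    or the window length when no usable break exists.
    """
    cut = window.rfind("\n\n")
    return cut + 2 if cut > 0 else len(window)


def agentic_style_chunks(
    text: str,
    find_breakpoint: Callable[[str], int] = paragraph_breakpoint,
    max_chunk_size: int = 5000,
) -> List[str]:
    """Split text at breakpoints chosen by find_breakpoint, one window at a time."""
    if max_chunk_size < 1:
        raise ValueError("max_chunk_size must be at least 1")

    chunks: List[str] = []
    remaining = text
    while remaining:
        # The tail fits in one chunk, so no breakpoint decision is needed.
        if len(remaining) <= max_chunk_size:
            chunks.append(remaining.strip())
            break
        window = remaining[:max_chunk_size]
        cut = find_breakpoint(window)
        # Fall back to a hard cut if the answer is out of range.
        if cut <= 0 or cut > len(window):
            cut = len(window)
        chunks.append(window[:cut].strip())
        remaining = remaining[cut:]
    # Drop chunks that were only whitespace.
    return [chunk for chunk in chunks if chunk]
```

Swapping `find_breakpoint` for a function that asks a model for a position would give the model-driven behavior described above, while `max_chunk_size` still caps every chunk.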
Create a Python file
```python
import asyncio
from kern.agent import Agent
from kern.knowledge.chunking.agentic import AgenticChunking
from kern.knowledge.knowledge import Knowledge
from kern.knowledge.reader.pdf_reader import PDFReader
from kern.vectordb.pgvector import PgVector

# Connection string for the PgVector container started in the "Run PgVector" step
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

knowledge = Knowledge(
    vector_db=PgVector(table_name="recipes_agentic_chunking", db_url=db_url),
)

# Load the PDF and let the model choose the chunk boundaries
asyncio.run(knowledge.ainsert(
    url="https://kern-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
    reader=PDFReader(
        name="Agentic Chunking Reader",
        chunking_strategy=AgenticChunking(),
    ),
))

# The agent searches the chunked knowledge base when answering
agent = Agent(
    knowledge=knowledge,
    search_knowledge=True,
)

agent.print_response("How to make Thai curry?", markdown=True)
```

Set up your virtual environment
macOS/Linux:

```shell
uv venv --python 3.12
source .venv/bin/activate
```

Windows:

```shell
uv venv --python 3.12
.venv\Scripts\activate
```

Install dependencies
```shell
uv pip install -U kern-ai sqlalchemy psycopg pgvector
```

Run PgVector
```shell
docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  kern/pgvector:16
```

Run the script
```shell
python agentic_chunking.py
```

Custom Prompts
```python
AgenticChunking(
    custom_prompt="Split at major section boundaries. Keep complete clauses together.",
    max_chunk_size=3000,
)
```

Info
Custom prompts override default chunking behavior and are prioritized over default instructions.
Tip
Best Practices:
- Always set `max_chunk_size` when using `custom_prompt`.
- Focus `custom_prompt` on chunking logic only.
- The default instructions automatically handle the output format constraints.
See Agentic Chunking with Custom Prompt for a complete example.
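One way to picture the split of responsibilities described above is the sketch below. It is purely illustrative: `build_chunking_prompt`, `DEFAULT_INSTRUCTIONS`, and the prompt wording are hypothetical and are not taken from the library. The sketch assumes the custom rules are placed first so they take priority, while the default instructions keep handling the size limit and the output format.

```python
from typing import Optional

# Hypothetical default instructions: they own the size limit and output format.
DEFAULT_INSTRUCTIONS = (
    "Find a natural breakpoint within the first {max_chunk_size} characters "
    "of the text. Respond with only the character position as an integer."
)


def build_chunking_prompt(
    text: str,
    max_chunk_size: int = 5000,
    custom_prompt: Optional[str] = None,
) -> str:
    """Assemble an illustrative breakpoint prompt (not the library's real prompt)."""
    parts = []
    if custom_prompt:
        # Custom chunking rules come first so they are prioritized.
        parts.append("Follow these chunking rules first:\n" + custom_prompt.strip())
    # Default instructions always follow and constrain the answer format.
    parts.append(DEFAULT_INSTRUCTIONS.format(max_chunk_size=max_chunk_size))
    # Only the window the model may cut within is included.
    parts.append("Text:\n" + text[:max_chunk_size])
    return "\n\n".join(parts)
```

Under this assumption, a `custom_prompt` that also dictated an output format would compete with the default instructions, which is why the advice above is to keep it focused on chunking logic and to set `max_chunk_size` explicitly.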
Agentic Chunking Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | `Model` | `OpenAIChat` | The model to use for chunking. |
| `max_chunk_size` | `int` | `5000` | The maximum size of each chunk. |
| `custom_prompt` | `str` | `None` | Allows personalized instructions to determine chunk breakpoints. |