Agentic Chunking

Agentic chunking splits documents into smaller chunks by asking a model to choose natural breakpoints in the text. Rather than cutting at fixed character counts, it analyzes the content to find semantically meaningful boundaries such as paragraph breaks and topic transitions.
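The core loop can be sketched in plain Python. This is not the library's implementation: agentic_chunk and heuristic_breakpoint are hypothetical names, and heuristic_breakpoint is a rule-based stand-in for the model call. The sketch shows the idea: look at a window of at most max_chunk_size characters, ask "where should this chunk end?", cut there, and repeat on the remainder.

```python
from typing import Callable, List

# Hypothetical stand-in for the model: receives a window of text and returns
# the index at which the chunk should end. A real agentic chunker prompts an LLM here.
BreakpointFn = Callable[[str], int]


def heuristic_breakpoint(window: str) -> int:
    """Pick the last paragraph break, else the last sentence end, else the window end."""
    for marker in ("\n\n", ". "):
        idx = window.rfind(marker)
        if idx > 0:
            return idx + len(marker)
    return len(window)  # no natural boundary: hard cut at the size limit


def agentic_chunk(text: str, max_chunk_size: int, choose_break: BreakpointFn) -> List[str]:
    """Split text by repeatedly asking choose_break where to end the next chunk."""
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    chunks: List[str] = []
    remaining = text
    while remaining:
        if len(remaining) <= max_chunk_size:
            cut = len(remaining)  # the rest fits in one chunk
        else:
            window = remaining[:max_chunk_size]
            # Clamp the suggestion so each step makes progress and respects the limit.
            cut = max(1, min(choose_break(window), len(window)))
        chunk = remaining[:cut].strip()
        if chunk:
            chunks.append(chunk)
        remaining = remaining[cut:]
    return chunks


text = (
    "Thai curry starts with paste.\n\n"
    "Fry the paste in coconut cream. Add chicken and simmer.\n\n"
    "Serve with rice."
)
for chunk in agentic_chunk(text, max_chunk_size=60, choose_break=heuristic_breakpoint):
    print(repr(chunk))
```

With a 60-character limit, the sample text splits at its two paragraph breaks instead of mid-sentence. Replacing heuristic_breakpoint with a function that prompts a model is what makes the chunking "agentic".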

Create a Python file named agentic_chunking.py

import asyncio

from kern.agent import Agent
from kern.knowledge.chunking.agentic import AgenticChunking
from kern.knowledge.knowledge import Knowledge
from kern.knowledge.reader.pdf_reader import PDFReader
from kern.vectordb.pgvector import PgVector

# Connection string for the PgVector container started in a later step
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

# Store the chunks in a dedicated PgVector table
knowledge = Knowledge(
    vector_db=PgVector(table_name="recipes_agentic_chunking", db_url=db_url),
)

# Read the PDF from the URL, split it with AgenticChunking, and insert the chunks
asyncio.run(knowledge.ainsert(
    url="https://kern-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
    reader=PDFReader(
        name="Agentic Chunking Reader",
        chunking_strategy=AgenticChunking(),
    ),
))

# The agent searches the knowledge base before answering
agent = Agent(
    knowledge=knowledge,
    search_knowledge=True,
)

agent.print_response("How to make Thai curry?", markdown=True)

Set up your virtual environment

Mac / Linux:

uv venv --python 3.12
source .venv/bin/activate

Windows:

uv venv --python 3.12
.venv\Scripts\activate

Install dependencies

uv pip install -U kern-ai sqlalchemy psycopg pgvector
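To confirm the install landed in the active virtual environment, a small check like the one below can help. It is a convenience sketch, not part of the library: check_modules is a hypothetical helper, and it assumes the import names are kern (as used in agentic_chunking.py), sqlalchemy, psycopg, and pgvector. Package names on pip and import names are not always identical.

```python
import importlib.util

# Assumed import names for the packages installed above.
REQUIRED_MODULES = ["kern", "sqlalchemy", "psycopg", "pgvector"]


def check_modules(names):
    """Return a mapping of module name -> whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```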

Run PgVector

docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  kern/pgvector:16
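The db_url in the script has to agree with these flags: user, password, and database are all ai, and host port 5532 is mapped to the container's 5432. The sketch below, using only the standard library, splits the URL into those parts and tries a plain TCP connection to see whether anything is listening. describe_db_url and port_is_open are hypothetical helpers; an open port only shows that something accepts connections, not that it is the PgVector container.

```python
import socket
from urllib.parse import urlsplit

DB_URL = "postgresql+psycopg://ai:ai@localhost:5532/ai"


def describe_db_url(url):
    """Split the SQLAlchemy-style URL into the parts the docker flags must match."""
    parts = urlsplit(url)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


def port_is_open(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    info = describe_db_url(DB_URL)
    print(info)
    print("Port reachable:", port_is_open(info["host"], info["port"]))
```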

Run the script

python agentic_chunking.py

Custom Prompts

AgenticChunking(
    custom_prompt="Split at major section boundaries. Keep complete clauses together.",
    max_chunk_size=3000,
)
Info

When set, custom_prompt takes priority over the default chunking instructions.

Tip

Best Practices:

  • Always set max_chunk_size when using custom_prompt.
  • Focus custom_prompt on chunking logic only.
  • Leave output format rules out of custom_prompt; the default instructions already enforce them.

See Agentic Chunking with Custom Prompt for a complete example.
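One plausible way to combine a custom prompt with default instructions is sketched below, to show why the tips above recommend keeping custom_prompt focused on chunking logic. This is an assumption for illustration, not the library's actual prompt: DEFAULT_LOGIC, DEFAULT_FORMAT, build_prompt, and parse_breakpoint are hypothetical. The custom text replaces only the "where to split" logic, while the size limit and the output format rule are always appended, and the reply is validated so a malformed answer falls back to a hard cut.

```python
from typing import Optional

# Hypothetical default instructions; the real library's prompt text is not shown here.
DEFAULT_LOGIC = "Find a natural breakpoint such as a paragraph or topic change."
DEFAULT_FORMAT = "Return only the character index of the breakpoint as an integer."


def build_prompt(text: str, max_chunk_size: int, custom_prompt: Optional[str] = None) -> str:
    """Custom logic replaces the default logic; the size limit and format rule always stay."""
    logic = custom_prompt if custom_prompt else DEFAULT_LOGIC
    return (
        f"{logic}\n"
        f"The breakpoint must fall within the first {max_chunk_size} characters.\n"
        f"{DEFAULT_FORMAT}\n\n"
        f"Text:\n{text[:max_chunk_size]}"
    )


def parse_breakpoint(reply: str, window_len: int) -> int:
    """Parse the model reply; fall back to a hard cut if it is not a valid index."""
    try:
        idx = int(reply.strip())
    except ValueError:
        return window_len
    return idx if 0 < idx <= window_len else window_len


if __name__ == "__main__":
    print(build_prompt(
        "Section 1: Curry paste.\n\nSection 2: Cooking.",
        max_chunk_size=3000,
        custom_prompt="Split at major section boundaries. Keep complete clauses together.",
    ))
```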

Agentic Chunking Params

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | Model | OpenAIChat | The model to use for chunking. |
| max_chunk_size | int | 5000 | The maximum size of each chunk. |
| custom_prompt | str | None | Allows personalized instructions to determine chunk breakpoints. |