Recursive Chunking

Recursive chunking splits a document into smaller chunks by applying a chunking strategy recursively until each chunk fits within the configured size limit. This is useful when you want to process large documents in smaller, manageable pieces.
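To make the idea concrete, here is a minimal, library-independent sketch of one common way to chunk recursively. It is an illustration only, not the implementation behind RecursiveChunking: the function name recursive_chunks and the separator list are assumptions chosen for this example. The text is split on the coarsest separator first, and only pieces that are still too long are split again with finer separators.

```python
def recursive_chunks(text, chunk_size=5000, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most chunk_size characters.

    Hypothetical helper for illustration: tries the coarsest separator
    first and recurses with finer ones only for pieces that are too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)  # the separator at this boundary is dropped
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: recurse with the finer separators.
            chunks.extend(recursive_chunks(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "one two\n\nthree four five six"
    for chunk in recursive_chunks(sample, chunk_size=10):
        print(repr(chunk))
```

Packing adjacent pieces back together keeps chunks close to the size limit while still preferring paragraph and sentence boundaries over arbitrary cuts.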

Create a Python file

import asyncio

from kern.agent import Agent
from kern.knowledge.chunking.recursive import RecursiveChunking
from kern.knowledge.knowledge import Knowledge
from kern.knowledge.reader.pdf_reader import PDFReader
from kern.vectordb.pgvector import PgVector

# Connection string for the PgVector container started in the "Run PgVector" step
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

knowledge = Knowledge(
    vector_db=PgVector(table_name="recipes_recursive_chunking", db_url=db_url),
)

# Read the PDF, split it with the recursive chunking strategy,
# and insert the resulting chunks into the knowledge base
asyncio.run(knowledge.ainsert(
    url="https://kern-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
    reader=PDFReader(
        name="Recursive Chunking Reader",
        chunking_strategy=RecursiveChunking(),
    ),
))

# Create an agent that can search the knowledge base
agent = Agent(
    knowledge=knowledge,
    search_knowledge=True,
)

agent.print_response("How to make Thai curry?", markdown=True)

Set up your virtual environment

macOS / Linux:

uv venv --python 3.12
source .venv/bin/activate

Windows:

uv venv --python 3.12
.venv\Scripts\activate

Install dependencies

uv pip install -U kern-ai sqlalchemy psycopg pgvector

Run PgVector

docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  kern/pgvector:16

Run the script

python recursive_chunking.py

Recursive Chunking Params

Parameter    Type   Default   Description
chunk_size   int    5000      The maximum size of each chunk.
overlap      int    0         The number of characters to overlap between chunks.
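The sketch below shows how a size limit and an overlap typically interact. It assumes both values are measured in characters and uses a plain sliding window; the function window_chunks is hypothetical and is not part of the library, which may choose chunk boundaries differently. Each chunk starts chunk_size - overlap characters after the previous one, so the last overlap characters of one chunk are repeated at the start of the next.

```python
def window_chunks(text, chunk_size=5000, overlap=0):
    """Cut text into windows of chunk_size characters that share
    `overlap` characters with the previous window.

    Hypothetical helper for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # distance between consecutive chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    print(window_chunks("abcdefghij", chunk_size=4, overlap=0))
    print(window_chunks("abcdefghij", chunk_size=4, overlap=1))
```

A larger overlap preserves more context across chunk boundaries at the cost of storing more chunks. The overlap must stay below chunk_size, otherwise the window would never advance.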