Recursive Chunking
Recursive chunking splits a document into smaller chunks by applying a chunking strategy recursively: any piece that is still larger than the target size is split again until it fits. This is useful when you want to process large documents in smaller, manageable pieces.
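To make the idea concrete, here is a minimal pure-Python sketch of recursive splitting. It is an illustration only, not kern's implementation: the function name `recursive_chunks` and the separator order (paragraph break, line break, space) are assumptions chosen for the example.

```python
def recursive_chunks(text, chunk_size=5000, separators=("\n\n", "\n", " ")):
    """Split text into pieces of at most chunk_size characters.

    Hypothetical illustration: try the coarsest separator first, then
    recurse with finer separators on any piece that is still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to fixed-width slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        # Greedily pack pieces into the current chunk while they fit.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: split it with finer separators.
            chunks.extend(recursive_chunks(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa bbbb\n\ncccc dddd eeee"
print(recursive_chunks(sample, chunk_size=10))
# → ['aaaa bbbb', 'cccc dddd', 'eeee']
```

The first paragraph fits within the limit and is kept whole, while the second is too long and is split again at spaces.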
Create a Python file
```python
import asyncio
from kern.agent import Agent
from kern.knowledge.chunking.recursive import RecursiveChunking
from kern.knowledge.knowledge import Knowledge
from kern.knowledge.reader.pdf_reader import PDFReader
from kern.vectordb.pgvector import PgVector

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

knowledge = Knowledge(
    vector_db=PgVector(table_name="recipes_recursive_chunking", db_url=db_url),
)

asyncio.run(knowledge.ainsert(
    url="https://kern-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
    reader=PDFReader(
        name="Recursive Chunking Reader",
        chunking_strategy=RecursiveChunking(),
    ),
))

agent = Agent(
    knowledge=knowledge,
    search_knowledge=True,
)

agent.print_response("How to make Thai curry?", markdown=True)
```

Set up your virtual environment
macOS/Linux:

```bash
uv venv --python 3.12
source .venv/bin/activate
```

Windows:

```bash
uv venv --python 3.12
.venv\Scripts\activate
```

Install dependencies
```bash
uv pip install -U kern-ai sqlalchemy psycopg pgvector
```

Run PgVector
```bash
docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  kern/pgvector:16
```

Run the script
```bash
python recursive_chunking.py
```

Recursive Chunking Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| chunk_size | int | 5000 | The maximum size of each chunk. |
| overlap | int | 0 | The number of characters to overlap between chunks. |
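The sketch below illustrates how the two parameters might interact, using plain fixed-width windows rather than kern's own logic. The helper name `sliding_chunks` is hypothetical, and it assumes that `overlap` means the last `overlap` characters of one chunk are repeated at the start of the next.

```python
def sliding_chunks(text, chunk_size=5000, overlap=0):
    """Cut text into windows of chunk_size characters that share
    `overlap` characters with the previous window (illustrative only)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, overlap=0))
# → ['abcd', 'efgh', 'ij']
print(sliding_chunks("abcdefghij", chunk_size=4, overlap=1))
# → ['abcd', 'defg', 'ghij']
```

With a non-zero overlap, text that straddles a chunk boundary appears in both neighbouring chunks, at the cost of storing some characters twice.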