Fixed Size Chunking

Fixed size chunking splits a document into chunks of a specified maximum size, with an optional overlap between consecutive chunks. It is useful when you need to process large documents in smaller, manageable pieces.
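To make the idea concrete, here is a minimal sketch of fixed size chunking in plain Python. It is an illustration only, not kern's implementation: the function name `fixed_size_chunks` is hypothetical, sizes are assumed to be counted in characters, and the window is assumed to advance by `chunk_size - overlap` each step.

```python
def fixed_size_chunks(text: str, chunk_size: int = 5000, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Hypothetical helper for illustration; the library's own strategy
    may differ (for example, in how it treats whitespace).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


if __name__ == "__main__":
    # Each chunk repeats the last character of the previous one.
    print(fixed_size_chunks("abcdefghij", chunk_size=4, overlap=1))
```

With `overlap=1`, the last character of each chunk reappears at the start of the next, which is how overlap preserves context across chunk boundaries.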

Create a Python file named fixed_size_chunking.py

import asyncio

from kern.agent import Agent
from kern.knowledge.chunking.fixed import FixedSizeChunking
from kern.knowledge.knowledge import Knowledge
from kern.knowledge.reader.pdf_reader import PDFReader
from kern.vectordb.pgvector import PgVector

# Connection string for the PgVector container started in "Run PgVector"
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

# Knowledge base backed by a PgVector table
knowledge = Knowledge(
    vector_db=PgVector(table_name="recipes_fixed_size_chunking", db_url=db_url),
)

# Read the PDF, split it into fixed size chunks, and insert them
asyncio.run(knowledge.ainsert(
    url="https://kern-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
    reader=PDFReader(
        name="Fixed Size Chunking Reader",
        chunking_strategy=FixedSizeChunking(),
    ),
))

# Agent that searches the knowledge base when answering
agent = Agent(
    knowledge=knowledge,
    search_knowledge=True,
)

agent.print_response("How to make Thai curry?", markdown=True)

Set up your virtual environment

macOS/Linux:

uv venv --python 3.12
source .venv/bin/activate

Windows:

uv venv --python 3.12
.venv\Scripts\activate

Install dependencies

uv pip install -U kern-ai sqlalchemy psycopg pgvector

Run PgVector

docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  kern/pgvector:16

Run the script

python fixed_size_chunking.py

Fixed Size Chunking Params

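| Parameter  | Type | Default | Description                                         |
|------------|------|---------|-----------------------------------------------------|
| chunk_size | int  | 5000    | The maximum size of each chunk.                     |
| overlap    | int  | 0       | The number of characters to overlap between chunks. |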
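
When tuning these two parameters, it can help to estimate how many chunks a document will produce. The sketch below is a rough aid, not part of kern: `estimate_chunk_count` is a hypothetical helper, and it assumes sizes are measured in characters and that each chunk starts `chunk_size - overlap` characters after the previous one. The defaults mirror the table above.

```python
import math


def estimate_chunk_count(doc_length: int, chunk_size: int = 5000, overlap: int = 0) -> int:
    """Estimate the number of chunks for a document of doc_length characters.

    Hypothetical helper; assumes a simple sliding window, which may not
    match the library's exact splitting behavior.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if doc_length <= 0:
        return 0
    if doc_length <= chunk_size:
        return 1

    step = chunk_size - overlap  # new characters contributed by each later chunk
    return 1 + math.ceil((doc_length - chunk_size) / step)


if __name__ == "__main__":
    # Compare the chunk count with and without overlap for a 20,000-character document.
    print(estimate_chunk_count(20_000))
    print(estimate_chunk_count(20_000, overlap=1_000))
```

Under these assumptions, a larger overlap means more chunks, and therefore more rows to embed and store, in exchange for more shared context between neighboring chunks. Judging by the table, both values would presumably be passed to the strategy as keyword arguments, for example `FixedSizeChunking(chunk_size=1000, overlap=100)`.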