Markdown Chunking
Markdown chunking is a method of splitting documents into smaller chunks of a specified size, with optional overlap between chunks. This is useful when you want to process large documents in smaller, manageable pieces.
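To make "specified size" and "optional overlap" concrete, here is a minimal sketch of size-based splitting with overlap on a plain string. The function `chunk_text` is a hypothetical helper written for illustration. It is not the library's implementation, which may also take Markdown structure such as headings into account.

```python
def chunk_text(text: str, chunk_size: int = 5000, overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Each window after the first starts `overlap` characters before the
    previous window ended. Illustrative only; not the library's code.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


sample = "abcdefghij"
print(chunk_text(sample, chunk_size=4, overlap=1))
# prints ['abcd', 'defg', 'ghij']
```

With `overlap=1`, the last character of each chunk is repeated as the first character of the next, so text that straddles a boundary appears in both chunks.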
Create a Python file
```python
import asyncio

from kern.agent import Agent
from kern.knowledge.chunking.markdown import MarkdownChunking
from kern.knowledge.knowledge import Knowledge
from kern.knowledge.reader.markdown_reader import MarkdownReader
from kern.vectordb.pgvector import PgVector

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

knowledge = Knowledge(
    vector_db=PgVector(table_name="recipes_markdown_chunking", db_url=db_url),
)

asyncio.run(knowledge.ainsert(
    url="https://github.com/kern-ai/kern/blob/main/README.md",
    reader=MarkdownReader(
        name="Markdown Chunking Reader",
        chunking_strategy=MarkdownChunking(),
    ),
))
agent = Agent(
    knowledge=knowledge,
    search_knowledge=True,
)

agent.print_response("What is Kern?", markdown=True)
```
Set up your virtual environment
Mac/Linux:
```bash
uv venv --python 3.12
source .venv/bin/activate
```
Windows:
```
uv venv --python 3.12
.venv\Scripts\activate
```
Install dependencies
```bash
uv pip install -U kern-ai sqlalchemy psycopg pgvector
```
Run PgVector
```bash
docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  kern/pgvector:16
```
Run the script
```bash
python markdown_chunking.py
```
Markdown Chunking Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| `chunk_size` | int | 5000 | The maximum size of each chunk. |
| `overlap` | int | 0 | The number of characters to overlap between chunks. |
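
When choosing values for these two parameters, it can help to estimate how many chunks a document will produce. The sketch below assumes plain fixed-size windows that advance by `chunk_size - overlap` characters, and that `chunk_size` is measured in characters like `overlap`. `estimate_chunk_count` is a hypothetical helper, and a structure-aware Markdown chunker may produce a different count.

```python
import math


def estimate_chunk_count(doc_length: int, chunk_size: int = 5000, overlap: int = 0) -> int:
    """Estimate the number of chunks for a document of doc_length characters.

    Assumes fixed-size windows advancing by (chunk_size - overlap);
    this is an approximation, not the library's exact behavior.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if doc_length <= 0:
        return 0
    if doc_length <= chunk_size:
        return 1  # the whole document fits in one chunk

    step = chunk_size - overlap
    # One full first chunk, then one more chunk per step needed to cover the rest.
    return 1 + math.ceil((doc_length - chunk_size) / step)


# Defaults from the table above: chunk_size=5000, overlap=0.
print(estimate_chunk_count(20_000))                # prints 4
print(estimate_chunk_count(20_000, overlap=1000))  # prints 5
```

Under these assumptions, a larger `overlap` increases the number of chunks stored and embedded, in exchange for more shared context between neighboring chunks.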