Docling Reader
The Docling Reader processes multiple document formats using IBM's Docling library. It handles PDFs, documents, presentations, spreadsheets, images, audio, video and markup files.
Code
```python
from kern.agent import Agent
from kern.knowledge.knowledge import Knowledge
from kern.knowledge.reader.docling_reader import DoclingReader
from kern.vectordb.pgvector import PgVector

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

# Create a knowledge base with docling reader
knowledge = Knowledge(
    vector_db=PgVector(
        table_name="docling_documents",
        db_url=db_url,
    )
)

# Add documents using DoclingReader
knowledge.insert(
    path="documents/report.pdf",
    reader=DoclingReader(),
)

# Create an agent with the knowledge base
agent = Agent(
    knowledge=knowledge,
    search_knowledge=True,
)

# Query the knowledge base
agent.print_response(
    "Summarize the key findings from the report",
    markdown=True,
)
```
Usage
Set up your virtual environment
macOS/Linux:
```bash
uv venv --python 3.12
source .venv/bin/activate
```
Windows:
```
uv venv --python 3.12
.venv\Scripts\activate
```
Install dependencies
```bash
# Base dependencies
uv pip install -U docling sqlalchemy psycopg pgvector kern-ai openai

# For audio/video processing
uv pip install -U openai-whisper
```
Install ffmpeg (required for audio/video processing):
- macOS: `brew install ffmpeg`
- Ubuntu: `sudo apt-get install ffmpeg`
- Windows: Download from https://ffmpeg.org/download.html
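Since audio and video processing depends on ffmpeg, it can be useful to confirm the binary is actually on `PATH` before ingesting media files. The following is a minimal sketch using only the standard library; the helper name `find_missing_tools` is hypothetical and not part of Docling or kern.

```python
import shutil


def find_missing_tools(tools=("ffmpeg",)):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("ffmpeg found on PATH")
```

This only checks that an executable with that name is discoverable; it does not verify the ffmpeg version or its codecs.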
Set environment variables
```bash
export OPENAI_API_KEY=xxx
```
Run PgVector
```bash
docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  kern/pgvector:16
```
Run Agent
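Before running the script, it may help to check that `OPENAI_API_KEY` is set and that the `db_url` in the example agrees with the container settings (user `ai`, password `ai`, database `ai`, host port `5532`). The sketch below is an optional, hypothetical pre-flight helper (`preflight` is not part of kern); it only compares strings and does not open a database connection.

```python
import os
from urllib.parse import urlsplit

# Values taken from the docker run command for the PgVector container.
EXPECTED = {"username": "ai", "password": "ai", "port": 5532, "database": "ai"}


def preflight(db_url, env=None):
    """Return a list of problems found; an empty list means the checks passed."""
    env = os.environ if env is None else env
    problems = []

    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    # urlsplit accepts schemes such as "postgresql+psycopg".
    parts = urlsplit(db_url)
    found = {
        "username": parts.username,
        "password": parts.password,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }
    for key, expected in EXPECTED.items():
        if found[key] != expected:
            problems.append(f"{key}: expected {expected!r}, found {found[key]!r}")
    return problems


if __name__ == "__main__":
    issues = preflight("postgresql+psycopg://ai:ai@localhost:5532/ai")
    print("OK" if not issues else "\n".join(issues))
```

If no problems are reported, run the example: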
```bash
python examples/basics/knowledge/concepts/readers/overview/docling_reader_sync.py
```
Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| `output_format` | `str` | `"markdown"` | Export format (`"markdown"`, `"text"`, `"json"`, `"yaml"`, `"html"`, `"html_split_page"`, `"doctags"`, `"vtt"`) |
| `converter` | `Optional[DocumentConverter]` | `None` | Custom Docling converter configuration |
| `format_options` | `Optional[dict]` | `None` | Format options dictionary for `DocumentConverter` |
| `chunking_strategy` | `Optional[ChunkingStrategy]` | `DocumentChunking()` | Strategy for chunking the document |
| `allowed_hosts` | `Optional[List[str]]` | `None` | Hostnames the reader is allowed to fetch from. See Restricting URL Fetches. |
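
As an illustration of the `output_format` parameter, the sketch below checks a requested format against the values listed in the table before constructing the reader. The helpers `check_output_format` and `make_reader` are hypothetical names introduced here; whether `DoclingReader` performs its own validation is not stated in this page, so the explicit check is an assumption made for clearer error messages.

```python
# Export formats as listed in the Params table.
SUPPORTED_OUTPUT_FORMATS = (
    "markdown", "text", "json", "yaml",
    "html", "html_split_page", "doctags", "vtt",
)


def check_output_format(output_format="markdown"):
    """Return `output_format` unchanged, or raise ValueError if it is not listed."""
    if output_format not in SUPPORTED_OUTPUT_FORMATS:
        raise ValueError(
            f"Unsupported output_format {output_format!r}; "
            f"expected one of {', '.join(SUPPORTED_OUTPUT_FORMATS)}"
        )
    return output_format


def make_reader(output_format="markdown"):
    """Build a DoclingReader with a validated output format."""
    # Imported lazily so the validation helper can be used without kern installed.
    from kern.knowledge.reader.docling_reader import DoclingReader

    return DoclingReader(output_format=check_output_format(output_format))
```

A reader built this way can be passed as the `reader` argument to `knowledge.insert(...)` in place of the default `DoclingReader()`.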