PDF Reader
The PDF Reader processes PDF files synchronously and converts them into documents that can be used with Kern's knowledge system.
Code
```python
from kern.agent import Agent
from kern.knowledge.knowledge import Knowledge
from kern.knowledge.reader.pdf_reader import PDFReader
from kern.vectordb.pgvector import PgVector

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

# Create a knowledge base with PDF documents
knowledge = Knowledge(
    vector_db=PgVector(
        table_name="pdf_documents",
        db_url=db_url,
    )
)

# Add PDF content synchronously
knowledge.insert(
    path="cookbook/08_knowledge/testing_resources/cv_1.pdf",
    reader=PDFReader(),
)

# Create an agent with the knowledge base
agent = Agent(
    knowledge=knowledge,
    search_knowledge=True,
)

# Query the knowledge base
agent.print_response(
    "What skills does an applicant require to apply for the Software Engineer position?",
    markdown=True,
)
```

Usage
Set up your virtual environment
macOS/Linux:

```shell
uv venv --python 3.12
source .venv/bin/activate
```

Windows:

```shell
uv venv --python 3.12
.venv\Scripts\activate
```

Install dependencies
```shell
uv pip install -U pypdf sqlalchemy psycopg pgvector kern-ai openai
```

Set environment variables
```shell
export OPENAI_API_KEY=xxx
```

Run PgVector
```shell
docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  kern/pgvector:16
```

Run Agent
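Before running the script, a quick pre-flight check can catch a mismatch between the connection URL in the code and the container settings, or a missing API key. The sketch below is not part of Kern; `preflight` and `DB_URL` are hypothetical names, and the expected values are simply copied from the `docker run` flags (`-p 5532:5432`, `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_DB`). It uses only the standard library.

```python
import os
from urllib.parse import urlsplit

# Same URL as in the example code above.
DB_URL = "postgresql+psycopg://ai:ai@localhost:5532/ai"


def preflight(db_url, env):
    """Return a list of problems; an empty list means the setup looks consistent."""
    problems = []
    parts = urlsplit(db_url)
    # Expected values mirror the docker run flags (host port 5532, user/password/db "ai").
    expected = {"port": 5532, "username": "ai", "password": "ai", "database": "ai"}
    actual = {
        "port": parts.port,
        "username": parts.username,
        "password": parts.password,
        "database": parts.path.lstrip("/"),
    }
    for key, want in expected.items():
        if actual[key] != want:
            problems.append(f"{key}: expected {want!r}, got {actual[key]!r}")
    # The agent needs an OpenAI key in the environment.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight(DB_URL, os.environ):
        print("warning:", problem)
```

This only checks that the settings agree with each other; it does not confirm that the database container is actually reachable.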
```shell
python examples/basics/knowledge/concepts/readers/overview/pdf_reader_sync.py
```

Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| path | Path | Required | Path to PDF file or URL |
| split_on_pages | bool | True | Split the PDF into pages |
| page_start_numbering_format | Optional[str] | None | Format for page numbering at the start of each page |
| page_end_numbering_format | Optional[str] | None | Format for page numbering at the end of each page |
| password | Optional[str] | None | Password to unlock the PDF |
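
To make the splitting and numbering parameters concrete, here is a minimal standard-library sketch of how such options could shape the resulting documents. It is not Kern's implementation: `Doc` and `pages_to_docs` are hypothetical names, and the `{page_nr}` placeholder in the format strings is an assumption, since the table does not specify the format syntax.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a knowledge document (not Kern's class)."""
    content: str
    page: Optional[int] = None


def pages_to_docs(
    pages: List[str],
    split_on_pages: bool = True,
    page_start_numbering_format: Optional[str] = None,
    page_end_numbering_format: Optional[str] = None,
) -> List[Doc]:
    """Turn already-extracted page texts into documents.

    Assumption: a numbering format is a str.format template with a
    {page_nr} placeholder; the real reader may use another convention.
    """
    docs = []
    for page_nr, text in enumerate(pages, start=1):
        parts = []
        if page_start_numbering_format:
            parts.append(page_start_numbering_format.format(page_nr=page_nr))
        parts.append(text)
        if page_end_numbering_format:
            parts.append(page_end_numbering_format.format(page_nr=page_nr))
        docs.append(Doc(content="\n".join(parts), page=page_nr))
    if split_on_pages:
        return docs
    # With splitting disabled, merge everything into a single document.
    return [Doc(content="\n".join(d.content for d in docs))]


pages = ["Intro text", "Skills: Python"]
per_page = pages_to_docs(pages, page_start_numbering_format="<page {page_nr}>")
whole = pages_to_docs(pages, split_on_pages=False)
```

In this sketch `per_page` holds one document per page, each prefixed with its marker, while `whole` holds a single document for the entire file. Per-page documents tend to give more precise search hits; a single document keeps cross-page context together.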