PDF Reader

The PDF Reader processes PDF files synchronously and converts them into documents that can be used with Kern's knowledge system.

Code

from kern.agent import Agent
from kern.knowledge.knowledge import Knowledge
from kern.knowledge.reader.pdf_reader import PDFReader
from kern.vectordb.pgvector import PgVector

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

# Create a knowledge base with PDF documents
knowledge = Knowledge(
    vector_db=PgVector(
        table_name="pdf_documents",
        db_url=db_url,
    )
)

# Add PDF content synchronously
knowledge.insert(
    path="cookbook/08_knowledge/testing_resources/cv_1.pdf",
    reader=PDFReader(),
)

# Create an agent with the knowledge base
agent = Agent(
    knowledge=knowledge,
    search_knowledge=True,
)

# Query the knowledge base
agent.print_response(
    "What skills does an applicant require to apply for the Software Engineer position?",
    markdown=True,
)

Usage

Set up your virtual environment

macOS/Linux:

uv venv --python 3.12
source .venv/bin/activate

Windows:

uv venv --python 3.12
.venv\Scripts\activate

Install dependencies

uv pip install -U pypdf sqlalchemy psycopg pgvector kern-ai openai

Set environment variables

export OPENAI_API_KEY=xxx
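
Before running the example, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch using only the standard library; `missing_env_vars` is a hypothetical helper, not part of Kern.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment.

    Hypothetical helper for illustration; it is not part of Kern.
    """
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Report which of the required variables still need to be exported.
missing = missing_env_vars(["OPENAI_API_KEY"])
print("Missing:", ", ".join(missing) if missing else "none")
```

An empty value counts as missing here, since an empty key would fail later anyway.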

Run PgVector

docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  kern/pgvector:16
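
The container maps host port 5532 to Postgres's 5432, which is why the `db_url` in the example points at `localhost:5532`. As a rough preflight check, the sketch below parses that URL and tests whether the port accepts TCP connections. It uses only the standard library; `db_endpoint` and `is_reachable` are hypothetical helpers, not part of Kern.

```python
import socket
from urllib.parse import urlsplit


def db_endpoint(db_url):
    """Extract (host, port) from a SQLAlchemy-style URL; port defaults to 5432."""
    parts = urlsplit(db_url)
    return parts.hostname or "localhost", parts.port or 5432


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Same connection string as in the example above.
host, port = db_endpoint("postgresql+psycopg://ai:ai@localhost:5532/ai")
print(f"{host}:{port} reachable: {is_reachable(host, port)}")
```

A successful TCP connection only shows that the port is open. Postgres may still be initializing, so a failed first insert right after `docker run` is worth retrying.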

Run Agent

python examples/basics/knowledge/concepts/readers/overview/pdf_reader_sync.py

Params

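| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| path | Path | Required | Path to PDF file or URL |
| split_on_pages | bool | True | Split the PDF into pages |
| page_start_numbering_format | Optional[str] | None | Format for the numbering marker at the start of each page |
| page_end_numbering_format | Optional[str] | None | Format for the numbering marker at the end of each page |
| password | Optional[str] | None | Password to unlock the PDF |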
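
To make the interplay between `split_on_pages` and the two numbering formats more concrete, here is a plain-Python sketch of how such options might shape the output. It is not Kern's implementation: `render_pages` is a hypothetical function, and the `{page_nr}` placeholder is an assumed convention that the parameter table does not confirm.

```python
def render_pages(pages, split_on_pages=True, start_format=None, end_format=None):
    """Illustrative only: wrap each page's text in optional numbering markers.

    The "{page_nr}" placeholder is an assumption, not a documented Kern feature.
    """
    chunks = []
    for page_nr, text in enumerate(pages, start=1):
        parts = []
        if start_format:
            parts.append(start_format.format(page_nr=page_nr))
        parts.append(text)
        if end_format:
            parts.append(end_format.format(page_nr=page_nr))
        chunks.append("\n".join(parts))
    # One chunk per page when splitting, otherwise a single combined chunk.
    return chunks if split_on_pages else ["\n".join(chunks)]


pages = ["Summary and contact details", "Skills and experience"]
for chunk in render_pages(
    pages,
    start_format="<start page {page_nr}>",
    end_format="<end page {page_nr}>",
):
    print(chunk)
    print("---")
```

With `split_on_pages=False` the same call returns a single chunk in which the markers are the only remaining trace of page boundaries.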