LLMs.txt Reader

The LLMs.txt Reader reads an llms.txt file, follows the linked documentation pages, and turns them into documents for your knowledge base.

Code

from kern.agent import Agent
from kern.knowledge.knowledge import Knowledge
from kern.knowledge.reader.llms_txt_reader import LLMsTxtReader
from kern.vectordb.pgvector import PgVector

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

# Knowledge base backed by a PgVector table
knowledge = Knowledge(
    name="LLMs.txt Docs",
    vector_db=PgVector(table_name="llms_txt_docs", db_url=db_url),
)

# Read the llms.txt file and fetch at most 10 of the pages it links to
knowledge.insert(
    url="https://kern.ndx.rocks/llms.txt",
    reader=LLMsTxtReader(max_urls=10),
)

# Agent that searches the knowledge base when answering
agent = Agent(
    knowledge=knowledge,
    search_knowledge=True,
)

agent.print_response("What is Kern?", markdown=True)

Usage

Set up your virtual environment

Mac/Linux:

uv venv --python 3.12
source .venv/bin/activate

Windows:

uv venv --python 3.12
.venv\Scripts\activate

Install dependencies

uv pip install -U beautifulsoup4 sqlalchemy psycopg pgvector kern-ai openai

Set environment variables

export OPENAI_API_KEY=xxx

Run PgVector

docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  kern/pgvector:16
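
As a quick sanity check, the connection string from the Code section can be split into its parts and compared with the container flags above. This is only an illustrative sketch using Python's standard library; it does not connect to the database, and the `settings` dictionary is a name introduced here for illustration.

```python
from urllib.parse import urlsplit

# Connection string used in the Code section
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"
parts = urlsplit(db_url)

# Each value should line up with a flag passed to `docker run`
settings = {
    "user": parts.username,                  # POSTGRES_USER
    "password": parts.password,              # POSTGRES_PASSWORD
    "host": parts.hostname,                  # the machine running Docker
    "port": parts.port,                      # host side of -p 5532:5432
    "database": parts.path.lstrip("/"),      # POSTGRES_DB
}
print(settings)
```

If you change the published port or the credentials in the Docker command, `db_url` needs the same change.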

Run Agent

python examples/basics/knowledge/concepts/readers/overview/llms_txt_reader.py

Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | str | Required | URL of the llms.txt file to read |
| max_urls | int | 20 | Maximum number of linked URLs to fetch from the file |
| timeout | int | 60 | HTTP timeout in seconds |
| proxy | Optional[str] | None | Optional HTTP proxy URL |
| skip_optional | bool | False | Skip entries under the ## Optional section |
| chunking_strategy | Optional[ChunkingStrategy] | FixedSizeChunking() | Strategy for chunking content |
| allowed_hosts | Optional[List[str]] | None | Hostnames the reader is allowed to fetch from. See Restricting URL Fetches. |
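
To make the effect of max_urls, skip_optional, and allowed_hosts more concrete, here is a minimal standard-library sketch of how links in an llms.txt file could be selected. It is not the LLMsTxtReader implementation: the `select_links` function, the regular expression, and the sample file are all hypothetical, and the real reader may parse and filter differently.

```python
import re
from urllib.parse import urlsplit

# Matches markdown links of the form [title](http(s)://...)
LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def select_links(llms_txt, max_urls=20, skip_optional=False, allowed_hosts=None):
    """Return the URLs a reader with these settings might fetch (illustrative only)."""
    selected = []
    in_optional = False
    for line in llms_txt.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Track whether we are inside the "## Optional" section
            in_optional = stripped[3:].strip().lower() == "optional"
            continue
        if skip_optional and in_optional:
            continue
        for _title, url in LINK_RE.findall(stripped):
            if len(selected) >= max_urls:
                return selected
            host = urlsplit(url).hostname
            # Drop links whose host is not on the allow list
            if allowed_hosts is not None and host not in allowed_hosts:
                continue
            if url not in selected:
                selected.append(url)
    return selected


# Hypothetical llms.txt content used only for this example
sample = """# Example Project

> Short summary of the project.

## Docs

- [Quickstart](https://docs.example.com/quickstart): Getting started
- [API](https://docs.example.com/api): API reference
- [Blog](https://blog.example.org/post): External post

## Optional

- [Changelog](https://docs.example.com/changelog)
"""

urls = select_links(
    sample, max_urls=10, skip_optional=True, allowed_hosts=["docs.example.com"]
)
print(urls)
```

With these settings the sketch keeps the two docs.example.com links under Docs, drops the blog link because its host is not allowed, and ignores the Optional section.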