Website Reader

The Website Reader crawls a website starting from a given URL, follows the links it finds up to a configurable depth and link count, and returns the content of each page as documents for your knowledge base.

Code

from kern.knowledge.reader.website_reader import WebsiteReader

# Follow links up to 3 levels deep and crawl at most 10 links
reader = WebsiteReader(max_depth=3, max_links=10)

try:
    print("Starting read...")
    documents = reader.read("https://kern.ndx.rocks/introduction")
    if documents:
        # Print each crawled page with its name, content, and size
        for doc in documents:
            print(doc.name)
            print(doc.content)
            print(f"Content length: {len(doc.content)}")
            print("-" * 80)
    else:
        print("No documents were returned")

except Exception as e:
    # Report any failure raised while crawling or parsing
    print(f"Error type: {type(e)}")
    print(f"Error occurred: {str(e)}")
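Once documents come back, it can be handy to condense them into a short summary instead of printing every page in full. The sketch below assumes only what the example above relies on, namely that each document exposes `name` and `content` attributes. `FakeDocument` and `summarize` are hypothetical names introduced for illustration and are not part of kern.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class FakeDocument:
    """Hypothetical stand-in exposing only the two attributes used above."""
    name: str
    content: str


def summarize(documents: Iterable) -> List[Tuple[str, int]]:
    """Return (name, content length) pairs, skipping documents with empty content."""
    return [(doc.name, len(doc.content)) for doc in documents if doc.content]


# Stand-in data; in practice this would be the list returned by reader.read(...)
docs = [
    FakeDocument(name="introduction", content="Welcome to the docs."),
    FakeDocument(name="empty-page", content=""),
]

summary = summarize(docs)
for name, length in summary:
    print(f"{name}: {length} characters")
```

Because the helper only reads `name` and `content`, it should work unchanged on the reader's own documents, provided those attributes behave as they do in the example above.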

Usage

Set up your virtual environment

macOS/Linux:

uv venv --python 3.12
source .venv/bin/activate

Windows:

uv venv --python 3.12
.venv\Scripts\activate

Install dependencies

uv pip install -U requests beautifulsoup4 kern-ai openai

Set environment variables

export OPENAI_API_KEY=xxx

Run Agent

python examples/basics/knowledge/concepts/readers/overview/web_reader.py

Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | str | Required | URL of the website to crawl and read |
| max_depth | int | 3 | Maximum depth level for crawling links |
| max_links | int | 10 | Maximum number of links to crawl |
| allowed_hosts | Optional[List[str]] | None | Hostnames the reader is allowed to fetch from. See Restricting URL Fetches. |
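
To illustrate how the three limits above might interact, here is a minimal, self-contained sketch of a breadth-first crawl over an in-memory link graph. It is not kern's implementation: `bounded_crawl`, the `links` dictionary, and the exact semantics (the start page counting as depth 1 and counting toward `max_links`, and `None` meaning every host is allowed) are assumptions made for illustration only.

```python
from collections import deque
from typing import Dict, List, Optional
from urllib.parse import urlparse


def bounded_crawl(
    start_url: str,
    links: Dict[str, List[str]],
    max_depth: int = 3,
    max_links: int = 10,
    allowed_hosts: Optional[List[str]] = None,
) -> List[str]:
    """Return the URLs visited by a breadth-first crawl under the given limits.

    `links` is a hypothetical in-memory stand-in for the web: it maps each
    URL to the URLs its page links to, so no network access is needed.
    """

    def host_ok(url: str) -> bool:
        # Assumption: with no allow-list, every host is accepted.
        return allowed_hosts is None or urlparse(url).hostname in allowed_hosts

    visited: List[str] = []
    seen = {start_url}
    queue = deque([(start_url, 1)])  # assumption: the start page is depth 1

    while queue and len(visited) < max_links:
        url, depth = queue.popleft()
        if not host_ok(url):
            continue  # skip pages on hosts outside the allow-list
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))

    return visited


# A tiny hypothetical site with one off-site link.
site = {
    "https://example.com/": ["https://example.com/a", "https://other.test/x"],
    "https://example.com/a": ["https://example.com/b"],
    "https://example.com/b": ["https://example.com/c"],
}

pages = bounded_crawl(
    "https://example.com/",
    site,
    max_depth=3,
    max_links=10,
    allowed_hosts=["example.com"],
)
print(pages)
```

In this sketch the off-site link is dropped by the host check, and the page at `/c` is never reached because it sits one level beyond `max_depth`. Lowering `max_links` stops the crawl earlier, regardless of depth.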