Website Reader
The Website Reader crawls a website starting from a single URL, follows the links it finds up to configurable depth and link limits, and returns the fetched content as documents you can load into a knowledge base.
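To make the depth and link limits concrete, here is a minimal breadth-first crawler sketch in plain Python. It is not the WebsiteReader implementation. The breadth-first order, the same-host rule, and the convention that the start page sits at depth 0 are all assumptions. `fetch` is a hypothetical callable that returns a page's HTML (or `None`), so the sketch can run against an in-memory site instead of the network.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List, Optional, Tuple
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(
    start_url: str,
    fetch: Callable[[str], Optional[str]],  # hypothetical: url -> html or None
    max_depth: int = 3,
    max_links: int = 10,
) -> List[Tuple[str, str]]:
    """Breadth-first crawl bounded by link depth and total pages collected.

    Assumption: the start page is depth 0; links found on a depth-d page are depth d+1.
    """
    start_host = urlparse(start_url).netloc
    queue = deque([(start_url, 0)])
    seen = {start_url}
    pages: List[Tuple[str, str]] = []
    while queue and len(pages) < max_links:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable pages do not count toward max_links
        pages.append((url, html))
        if depth >= max_depth:
            continue  # collect this page but do not follow its links
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating
            link, _ = urldefrag(urljoin(url, href))
            # Assumption: stay on the starting host
            if urlparse(link).netloc == start_host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages


# A tiny in-memory "site" standing in for real HTTP fetches
site = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/c">C</a> <a href="https://other.org/">x</a>',
    "https://example.com/b": "<p>no links</p>",
    "https://example.com/c": '<a href="/">home</a>',
}
print([url for url, _ in crawl("https://example.com/", site.get, max_depth=1)])
# ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

With `max_depth=1` only the start page and the pages it links to directly are collected. Raising it to 2 also reaches `/c`, while `max_links` caps the total number of pages regardless of depth.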
Code
```python
from kern.knowledge.reader.website_reader import WebsiteReader

# Follow links up to 3 levels deep and crawl at most 10 links
reader = WebsiteReader(max_depth=3, max_links=10)

try:
    print("Starting read...")
    # Crawl starting from this URL; the result is an iterable of documents
    documents = reader.read("https://kern.ndx.rocks/introduction")
    if documents:
        for doc in documents:
            print(doc.name)
            print(doc.content)
            print(f"Content length: {len(doc.content)}")
            print("-" * 80)
    else:
        print("No documents were returned")

except Exception as e:
    print(f"Error type: {type(e)}")
    print(f"Error occurred: {str(e)}")
```
Usage
Set up your virtual environment
Mac / Linux:
```shell
uv venv --python 3.12
source .venv/bin/activate
```
Windows:
```shell
uv venv --python 3.12
.venv\Scripts\activate
```
Install dependencies
```shell
uv pip install -U requests beautifulsoup4 kern-ai openai
```
Set environment variables
```shell
export OPENAI_API_KEY=xxx
```
Run Agent
```shell
python examples/basics/knowledge/concepts/readers/overview/web_reader.py
```
Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | Required | URL of the website to crawl and read (passed to read()) |
| max_depth | int | 3 | Maximum link depth to follow from the starting URL |
| max_links | int | 10 | Maximum number of links to crawl |
| allowed_hosts | Optional[List[str]] | None | Hostnames the reader is allowed to fetch from. See Restricting URL Fetches. |
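The allowed_hosts parameter implies a hostname check before each fetch. The sketch below shows one way such a check could work. It assumes an exact, case-insensitive hostname match (subdomains and other hosts are rejected, ports are ignored) and that `None` disables the restriction. `is_allowed` is a hypothetical helper, and the reader's actual matching rules may differ; see Restricting URL Fetches for the authoritative behavior.

```python
from typing import List, Optional
from urllib.parse import urlparse


def is_allowed(url: str, allowed_hosts: Optional[List[str]]) -> bool:
    """Return True if the URL's hostname is in allowed_hosts.

    Assumptions: None means "no restriction"; matching is exact and
    case-insensitive, so "docs.example.com" does not match "example.com".
    """
    if allowed_hosts is None:
        return True
    # .hostname is lowercased and has any :port stripped; None if the URL has no host
    host = urlparse(url).hostname
    if host is None:
        return False
    return host in {h.lower() for h in allowed_hosts}


print(is_allowed("https://kern.ndx.rocks/introduction", ["kern.ndx.rocks"]))  # True
print(is_allowed("https://other.example/page", ["kern.ndx.rocks"]))  # False
```

An exact-match allowlist is the conservative choice: a crawler that follows links automatically can otherwise wander onto hosts you never intended to fetch from.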