Crawl4AI

`Crawl4aiTools` enables an Agent to crawl and scrape web pages using the crawl4ai library.

Prerequisites

The following example requires the crawl4ai library.

uv pip install -U crawl4ai

Example

The following agent will scrape the content from the https://github.com/kern-ai/kern webpage:

from kern.agent import Agent
from kern.tools.crawl4ai import Crawl4aiTools

agent = Agent(tools=[Crawl4aiTools(max_length=None)])
agent.print_response("Tell me about https://github.com/kern-ai/kern.")

Toolkit Params

Parameter	Type	Default	Description
`max_length`	`int`	`1000`	Specifies the maximum length of the text from the webpage to be returned.
`timeout`	`int`	`60`	Timeout in seconds for web crawling operations.
`use_pruning`	`bool`	`False`	Enable content pruning to remove less relevant content.
`pruning_threshold`	`float`	`0.48`	Threshold for content pruning relevance scoring.
`bm25_threshold`	`float`	`1.0`	BM25 scoring threshold for content relevance.
`headless`	`bool`	`True`	Run browser in headless mode.
`wait_until`	`str`	`"domcontentloaded"`	Browser wait condition before crawling (e.g., "domcontentloaded", "load", "networkidle").
`enable_crawl`	`bool`	`True`	Enable the web crawling functionality.
`all`	`bool`	`False`	Enable all available functions. When True, all enable flags are ignored.
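
As a rough sketch of how several of the parameters above might be combined, the snippet below collects them in a plain dictionary and passes them to `Crawl4aiTools`. The values are illustrative only, and the guarded import is an assumption added here so the snippet still runs when `kern` is not installed; it is not part of the toolkit.

```python
# Options drawn from the parameter table; the chosen values are examples, not recommendations.
crawl_options = {
    "max_length": 2000,           # cap the returned page text at a length of 2000
    "timeout": 30,                # stop a crawl after 30 seconds
    "use_pruning": True,          # remove less relevant content
    "pruning_threshold": 0.48,    # same as the documented default
    "headless": True,             # run the browser without a visible window
    "wait_until": "networkidle",  # one of the wait conditions listed in the table
}

try:
    from kern.agent import Agent
    from kern.tools.crawl4ai import Crawl4aiTools
except ImportError:
    # kern (or crawl4ai) is not available; the options can still be inspected.
    agent = None
else:
    agent = Agent(tools=[Crawl4aiTools(**crawl_options)])
```

Keeping the options in one dictionary makes it easy to reuse the same crawl settings across several agents.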

Toolkit Functions

Function	Description
`web_crawler`	Crawls a website using crawl4ai's WebCrawler. Takes `url`, the URL to crawl, and an optional `max_length` that limits the length of the extracted content (default: 1000).
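
To illustrate what a `max_length` limit means for the returned text, here is a minimal sketch. `truncate_content` is a hypothetical helper written for this page, not the toolkit's own code, and it assumes the limit is applied as a simple character cut, with `None` meaning no limit; the actual behavior of `web_crawler` may differ.

```python
from typing import Optional


def truncate_content(text: str, max_length: Optional[int] = 1000) -> str:
    """Return at most max_length characters of text; None disables the limit.

    Hypothetical helper: assumes a plain character-based cut.
    """
    if max_length is None:
        return text
    return text[:max_length]


page_text = "x" * 2500  # stand-in for extracted page content

print(len(truncate_content(page_text)))        # 1000 (default limit)
print(len(truncate_content(page_text, None)))  # 2500 (no limit)
```

Under this assumption, a limit of 1000 keeps only the beginning of a long page, which is why the example above passes `max_length=None` when the full content is wanted.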

Developer Resources

View Tools