Crawl4AI
The Crawl4aiTools toolkit lets an Agent perform web crawling and scraping tasks using the crawl4ai library.
Prerequisites
The following example requires the crawl4ai library.
```shell
uv pip install -U crawl4ai
```
Example
The following agent scrapes the content of the https://github.com/kern-ai/kern webpage:
```python
from kern.agent import Agent
from kern.tools.crawl4ai import Crawl4aiTools

agent = Agent(tools=[Crawl4aiTools(max_length=None)])
agent.print_response("Tell me about https://github.com/kern-ai/kern.")
```
Toolkit Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_length | int | 1000 | Specifies the maximum length of the text from the webpage to be returned. |
| timeout | int | 60 | Timeout in seconds for web crawling operations. |
| use_pruning | bool | False | Enable content pruning to remove less relevant content. |
| pruning_threshold | float | 0.48 | Threshold for content pruning relevance scoring. |
| bm25_threshold | float | 1.0 | BM25 scoring threshold for content relevance. |
| headless | bool | True | Run the browser in headless mode. |
| wait_until | str | "domcontentloaded" | Browser wait condition before crawling (e.g., "domcontentloaded", "load", "networkidle"). |
| enable_crawl | bool | True | Enable the web crawling functionality. |
| all | bool | False | Enable all available functions. When True, all enable flags are ignored. |
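To make the defaults in the table concrete, here is a minimal sketch of collecting Crawl4aiTools keyword arguments in plain Python before constructing the toolkit. The `CRAWL4AI_TOOLS_DEFAULTS` dictionary and the `build_crawl4ai_tools_kwargs` helper are hypothetical and are not part of kern or crawl4ai; only the parameter names and default values are taken from the table above. The snippet runs without either library installed.

```python
from typing import Any, Dict

# Hypothetical helper data, not part of kern or crawl4ai.
# Names and default values are copied from the Toolkit Params table.
CRAWL4AI_TOOLS_DEFAULTS: Dict[str, Any] = {
    "max_length": 1000,
    "timeout": 60,
    "use_pruning": False,
    "pruning_threshold": 0.48,
    "bm25_threshold": 1.0,
    "headless": True,
    "wait_until": "domcontentloaded",
    "enable_crawl": True,
    "all": False,
}


def build_crawl4ai_tools_kwargs(**overrides: Any) -> Dict[str, Any]:
    """Merge user overrides onto the documented defaults (hypothetical helper).

    Unknown parameter names raise TypeError so that a typo such as
    "timeot" is reported instead of silently falling back to a default.
    """
    unknown = set(overrides) - set(CRAWL4AI_TOOLS_DEFAULTS)
    if unknown:
        raise TypeError(f"Unknown Crawl4aiTools parameter(s): {sorted(unknown)}")
    kwargs = dict(CRAWL4AI_TOOLS_DEFAULTS)  # copy so the defaults stay untouched
    kwargs.update(overrides)
    return kwargs


# Enable pruning, wait for network activity to settle, and show the
# browser window (headless=False), for example while debugging a page.
kwargs = build_crawl4ai_tools_kwargs(
    use_pruning=True,
    wait_until="networkidle",
    headless=False,
)
print(kwargs["use_pruning"], kwargs["wait_until"], kwargs["timeout"])
# The resulting dict could then be passed as Crawl4aiTools(**kwargs).
```

Validating parameter names up front is a choice made for this sketch, so that a misspelled option surfaces immediately rather than depending on how the toolkit handles unexpected arguments.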
Toolkit Functions
| Function | Description |
|---|---|
| web_crawler | Crawls a website using crawl4ai's WebCrawler. Takes `url`, the URL to crawl, and an optional `max_length` (default 1000) that limits the length of the extracted content. |
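The `max_length` limit can be illustrated with a small stand-alone sketch. The `truncate_content` function below is hypothetical and is not the toolkit's implementation: it assumes that `max_length` counts characters and that `None` disables truncation, which is what the `max_length=None` in the example above suggests but the table does not state. The real `web_crawler` may truncate differently.

```python
from typing import Optional


def truncate_content(text: str, max_length: Optional[int] = 1000) -> str:
    # Hypothetical helper, not part of kern or crawl4ai.
    # Assumption: max_length counts characters and None means "no limit".
    if max_length is None:
        return text
    if max_length < 0:
        raise ValueError("max_length must be a non-negative int or None")
    # Keep only the first max_length characters of the extracted text.
    return text[:max_length]


# Stand-in for text extracted from a crawled page.
page_text = "x" * 2500

default_cut = truncate_content(page_text)      # documented default limit of 1000
short_cut = truncate_content(page_text, 120)   # tighter limit for a single call
full_text = truncate_content(page_text, None)  # assumed "no limit" behaviour
print(len(default_cut), len(short_cut), len(full_text))
```

A lower limit keeps the text handed to the model short, at the cost of dropping content that appears later on the page.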
Developer Resources
- View Tools