Spider
SpiderTools is an open-source web scraper and crawler that returns LLM-ready data. To start using Spider, you need an API key from the Spider dashboard.
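
Since the toolkit cannot work without a key, it can help to fail early when none is configured. The sketch below assumes the key is supplied through an environment variable named `SPIDER_API_KEY`; that name is an assumption not stated on this page, and `require_api_key` is a hypothetical helper, not part of any library.

```python
import os


def require_api_key(env=None, name="SPIDER_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    `name` is an assumed variable name; adjust it if your setup differs.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; create a key in the Spider dashboard.")
    return key
```

Calling `require_api_key()` before constructing the agent turns a missing key into an immediate, readable error instead of a failure partway through a run.
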
Prerequisites
The following example requires the spider-client library.
```shell
uv pip install -U spider-client
```

Example
The following agent runs a search query for the latest news in the USA and scrapes the first search result. It returns the scraped data in Markdown format.
```python
from kern.agent import Agent
from kern.tools.spider import SpiderTools

agent = Agent(tools=[SpiderTools()])
agent.print_response('Can you scrape the first search result from a search on "news in USA"?', markdown=True)
```

Toolkit Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_results` | `Optional[int]` | `None` | Default maximum number of results. |
| `url` | `Optional[str]` | `None` | Default URL for operations. |
| `optional_params` | `Optional[dict]` | `None` | Additional parameters for operations. |
| `enable_search` | `bool` | `True` | Enable web search functionality. |
| `enable_scrape` | `bool` | `True` | Enable web scraping functionality. |
| `enable_crawl` | `bool` | `True` | Enable web crawling functionality. |
| `all` | `bool` | `False` | Enable all tools. Overrides individual flags when `True`. |
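
As an illustration of these parameters, the following sketch configures a toolkit limited to search and scrape. It assumes the `SpiderTools` constructor accepts the table's parameters as keyword arguments; `spider_options` and `build_agent` are hypothetical names, and the import is deferred so the options can be inspected even where the library is not installed.

```python
# Keyword arguments taken from the Toolkit Params table.
spider_options = {
    "max_results": 3,       # default cap on the number of results
    "enable_search": True,
    "enable_scrape": True,
    "enable_crawl": False,  # turn crawling off for this agent
}


def build_agent(options=None):
    """Create an agent with a configured SpiderTools.

    Requires kern and spider-client to be installed; assumes the
    constructor takes the table's parameters as keyword arguments.
    """
    # Imported lazily so this module loads without kern installed.
    from kern.agent import Agent
    from kern.tools.spider import SpiderTools

    chosen = dict(spider_options if options is None else options)
    return Agent(tools=[SpiderTools(**chosen)])
```

Per the table, passing `all=True` instead would enable every tool regardless of the individual `enable_*` flags.
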
Toolkit Functions
| Function | Description |
|---|---|
| `search` | Searches the web for the given query. Parameters include `query` (`str`) for the search query and `max_results` (`int`, default=5) for maximum results. Returns search results in JSON format. |
| `scrape` | Scrapes the content of a webpage. Parameters include `url` (`str`) for the URL of the webpage to scrape. Returns markdown of the webpage. |
| `crawl` | Crawls the web starting from a URL. Parameters include `url` (`str`) for the URL to crawl and `limit` (`Optional[int]`, default=10) for maximum pages to crawl. Returns crawl results in JSON format. |
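
Because the three functions return different formats, code that post-processes their output may want to decode each one accordingly. The sketch below assumes each function returns a string (JSON text for `search` and `crawl`, markdown for `scrape`); `OUTPUT_FORMAT` and `decode_tool_output` are hypothetical names, and the shape of the decoded JSON is not specified here, so nothing is assumed about it.

```python
import json

# Output format per function, as listed in the Toolkit Functions table.
OUTPUT_FORMAT = {"search": "json", "scrape": "markdown", "crawl": "json"}


def decode_tool_output(function_name, raw):
    """Decode a tool's string output according to its documented format."""
    fmt = OUTPUT_FORMAT[function_name]  # raises KeyError for unknown names
    if fmt == "json":
        return json.loads(raw)  # structure of the decoded value is not assumed
    return raw  # markdown is passed through unchanged
```

For example, `decode_tool_output("scrape", text)` returns the markdown untouched, while the `search` and `crawl` outputs are parsed into Python objects.
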
Developer Resources
- View Tools