ChromaDB Vector Database

Use ChromaDB as a vector database for your Knowledge Base.

Setup

uv pip install chromadb

Example

import asyncio

from kern.agent import Agent
from kern.knowledge.knowledge import Knowledge
from kern.vectordb.chroma import ChromaDb

# Create Knowledge Instance with ChromaDB
knowledge = Knowledge(
    name="Basic SDK Knowledge Base",
    description="Kern 2.0 Knowledge Implementation with ChromaDB",
    vector_db=ChromaDb(
        collection="vectors", path="tmp/chromadb", persistent_client=True
    ),
)

asyncio.run(
    knowledge.ainsert(
        name="Recipes",
        url="https://kern-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
        metadata={"doc_type": "recipe_book"},
    )
)

# Create and use the agent
agent = Agent(knowledge=knowledge)
agent.print_response("List down the ingredients to make Massaman Gai", markdown=True)

# Delete operations examples
vector_db = knowledge.vector_db
vector_db.delete_by_name("Recipes")
# or
vector_db.delete_by_metadata({"user_tag": "Recipes from website"})

For hosted ChromaDB (Chroma Cloud)

from chromadb.config import Settings

from kern.vectordb.chroma import ChromaDb

# Replace the host and API key placeholders with your own values
vector_db = ChromaDb(
    collection="vectors",
    settings=Settings(
        chroma_api_impl="chromadb.api.fastapi.FastAPI",
        chroma_server_host="your-tenant-id.api.trychroma.com",
        chroma_server_http_port=443,
        chroma_server_ssl_enabled=True,
        chroma_client_auth_provider="chromadb.auth.token_authn.TokenAuthClientProvider",
        chroma_client_auth_credentials="your-api-key",
    ),
)
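
Hard-coding the API key as in the snippet above makes it easy to leak into version control. A minimal sketch of one alternative, assuming the host and key are exported as environment variables. The variable names `CHROMA_HOST` and `CHROMA_API_KEY` and the helper `hosted_settings_kwargs` are hypothetical and not part of Kern or ChromaDB:

```python
import os


def hosted_settings_kwargs(env=os.environ):
    """Build keyword arguments for chromadb.config.Settings from the environment.

    CHROMA_HOST and CHROMA_API_KEY are hypothetical variable names chosen
    for this sketch; use whatever names your deployment defines.
    """
    missing = [name for name in ("CHROMA_HOST", "CHROMA_API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "chroma_api_impl": "chromadb.api.fastapi.FastAPI",
        "chroma_server_host": env["CHROMA_HOST"],
        "chroma_server_http_port": 443,
        "chroma_server_ssl_enabled": True,
        "chroma_client_auth_provider": "chromadb.auth.token_authn.TokenAuthClientProvider",
        "chroma_client_auth_credentials": env["CHROMA_API_KEY"],
    }
```

The returned dictionary can then be unpacked into the settings object, for example `Settings(**hosted_settings_kwargs())`.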

Async Support ⚡

ChromaDB also supports asynchronous operations, which let inserts and queries run concurrently with other work and can improve throughput.

# install chromadb - `pip install chromadb`

import asyncio

from kern.agent import Agent
from kern.knowledge.knowledge import Knowledge
from kern.vectordb.chroma import ChromaDb

# Initialize ChromaDB
vector_db = ChromaDb(collection="recipes", path="tmp/chromadb", persistent_client=True)

# Create knowledge base
knowledge = Knowledge(
    vector_db=vector_db,
)

# Create the agent
agent = Agent(knowledge=knowledge)

if __name__ == "__main__":
    # Comment out after first run
    asyncio.run(
        knowledge.ainsert(url="https://kern.ndx.rocks/introduction/agents.md")
    )

    # Query the agent
    asyncio.run(
        agent.aprint_response("What is the purpose of a Kern Agent?", markdown=True)
    )

Tip

Use the ainsert() and aprint_response() methods, driven by asyncio.run() or an already running event loop, for non-blocking operations in high-throughput applications.
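
To illustrate why the async methods help, the sketch below runs several I/O-bound coroutines concurrently with `asyncio.gather`. The coroutine `fake_insert` is a hypothetical stand-in for an awaitable such as `knowledge.ainsert(...)`; it only sleeps, so the example runs without Kern or ChromaDB installed:

```python
import asyncio
import time


async def fake_insert(name: str, delay: float = 0.1) -> str:
    # Hypothetical stand-in for an awaitable such as knowledge.ainsert(...).
    await asyncio.sleep(delay)  # simulates waiting on network or disk I/O
    return name


async def insert_all(names):
    # gather() schedules every coroutine at once, so the waits overlap
    # instead of running back to back. Results keep the input order.
    return await asyncio.gather(*(fake_insert(n) for n in names))


def run_demo():
    start = time.perf_counter()
    results = asyncio.run(insert_all(["a", "b", "c"]))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_demo()
    print(results, f"{elapsed:.2f}s")
```

Note that `asyncio.run()` itself blocks until the coroutine it is given finishes; the concurrency comes from awaiting several operations inside a single event loop.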

Note

ChromaDB has a batch size limit due to SQLite constraints. When inserting documents that exceed this limit, Kern automatically splits them into smaller batches. The batch size is auto-detected from ChromaDB's server configuration.

You can also set batch_size to override the auto-detected value.
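
The splitting behaviour described above can be pictured with a small sketch. This is not Kern's actual implementation; the helper `split_into_batches` is hypothetical and only shows the general idea of chunking a document list so that no batch exceeds a maximum size:

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start : start + batch_size])


if __name__ == "__main__":
    docs = [f"doc-{i}" for i in range(250)]
    # 100 is used here only as an illustrative batch size.
    sizes = [len(batch) for batch in split_into_batches(docs, batch_size=100)]
    print(sizes)  # [100, 100, 50]
```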

ChromaDb Params

Parameter	Type	Default	Description
`collection`	`str`	-	The name of the collection to use.
`embedder`	`Embedder`	`OpenAIEmbedder()`	The embedder to use for embedding document contents.
`distance`	`Distance`	`cosine`	The distance metric to use.
`path`	`str`	`"tmp/chromadb"`	The path where ChromaDB data will be stored.
`persistent_client`	`bool`	`False`	Whether to use a persistent ChromaDB client.
`batch_size`	`int`	`None`	Maximum number of documents per batch operation. If not set, it is auto-detected from ChromaDB's server limit, falling back to `100` if auto-detection fails.
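
For readers unfamiliar with the default `cosine` metric, here is a minimal pure-Python sketch of cosine distance, commonly defined as one minus the cosine similarity of two vectors. The function `cosine_distance` is written for illustration only and is not part of Kern or ChromaDB:

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cos(angle between a and b); 0.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)
```

Under this definition, vectors pointing the same way have distance 0, orthogonal vectors have distance 1, and opposite vectors have distance 2, regardless of their lengths.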