ChromaDB Vector Database

Use ChromaDB as a vector database for your Knowledge Base.

Setup

uv pip install chromadb

Example

import asyncio

from kern.agent import Agent
from kern.knowledge.knowledge import Knowledge
from kern.vectordb.chroma import ChromaDb

# Create a Knowledge instance backed by ChromaDB
knowledge = Knowledge(
    name="Basic SDK Knowledge Base",
    description="Kern 2.0 Knowledge Implementation with ChromaDB",
    vector_db=ChromaDb(
        collection="vectors", path="tmp/chromadb", persistent_client=True
    ),
)

# Insert the PDF into the knowledge base
asyncio.run(
    knowledge.ainsert(
        name="Recipes",
        url="https://kern-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
        metadata={"doc_type": "recipe_book"},
    )
)

# Create and use the agent
agent = Agent(knowledge=knowledge)
agent.print_response("List down the ingredients to make Massaman Gai", markdown=True)

# Delete operations examples
vector_db = knowledge.vector_db
vector_db.delete_by_name("Recipes")
# or, using the metadata that was attached at insert time
vector_db.delete_by_metadata({"doc_type": "recipe_book"})

For hosted ChromaDB (Chroma Cloud)

from chromadb.config import Settings
from kern.vectordb.chroma import ChromaDb

# Replace the host and API key placeholders with your own values
vector_db = ChromaDb(
    collection="vectors",
    settings=Settings(
        chroma_api_impl="chromadb.api.fastapi.FastAPI",
        chroma_server_host="your-tenant-id.api.trychroma.com",
        chroma_server_http_port=443,
        chroma_server_ssl_enabled=True,
        chroma_client_auth_provider="chromadb.auth.token_authn.TokenAuthClientProvider",
        chroma_client_auth_credentials="your-api-key",
    ),
)
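
The snippet above hardcodes the API key, which is fine for a demo but is usually avoided in real deployments. The sketch below reads the host and key from environment variables and builds only the keyword arguments for Settings. The variable names CHROMA_SERVER_HOST and CHROMA_API_KEY and the helper name are assumptions made for this example; neither ChromaDB nor Kern defines them.

```python
import os


def chroma_cloud_settings_kwargs(env=None):
    """Build keyword arguments for chromadb.config.Settings from the environment.

    CHROMA_SERVER_HOST and CHROMA_API_KEY are hypothetical variable names
    chosen for this sketch, not names defined by ChromaDB or Kern.
    """
    env = os.environ if env is None else env
    required = ("CHROMA_SERVER_HOST", "CHROMA_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    # Same fields as the hardcoded example, with host and key taken from env.
    return {
        "chroma_api_impl": "chromadb.api.fastapi.FastAPI",
        "chroma_server_host": env["CHROMA_SERVER_HOST"],
        "chroma_server_http_port": 443,
        "chroma_server_ssl_enabled": True,
        "chroma_client_auth_provider": "chromadb.auth.token_authn.TokenAuthClientProvider",
        "chroma_client_auth_credentials": env["CHROMA_API_KEY"],
    }
```

With both variables set, the result could be passed along as settings=Settings(**chroma_cloud_settings_kwargs()).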

Async Support ⚡

ChromaDB can also be used through asynchronous methods, which lets I/O-bound work such as document ingestion run concurrently with other tasks and can improve throughput.

# Install chromadb first: uv pip install chromadb

import asyncio

from kern.agent import Agent
from kern.knowledge.knowledge import Knowledge
from kern.vectordb.chroma import ChromaDb

# Initialize ChromaDB
vector_db = ChromaDb(collection="recipes", path="tmp/chromadb", persistent_client=True)

# Create knowledge base
knowledge = Knowledge(
    vector_db=vector_db,
)

# Create the agent
agent = Agent(knowledge=knowledge)

if __name__ == "__main__":
    # Comment out after first run
    asyncio.run(
        knowledge.ainsert(url="https://kern.ndx.rocks/introduction/agents.md")
    )

    # Ask the agent a question
    asyncio.run(
        agent.aprint_response("What is the purpose of a Kern Agent?", markdown=True)
    )

Tip

Use ainsert() and aprint_response() to keep I/O non-blocking in high-throughput applications. In a standalone script, wrap them in asyncio.run(); inside an already running event loop, await them directly.
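
To illustrate where the gain from async methods comes from, the sketch below overlaps several I/O-bound calls with asyncio.gather. fake_ainsert is a hypothetical stand-in that only sleeps; in real code you would await knowledge.ainsert(...) in its place.

```python
import asyncio
import time


async def fake_ainsert(url: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for an awaitable insert; it only simulates I/O wait."""
    await asyncio.sleep(delay)
    return url


async def insert_all(urls):
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_ainsert(u) for u in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/doc-{i}.md" for i in range(5)]
    start = time.perf_counter()
    results = asyncio.run(insert_all(urls))
    elapsed = time.perf_counter() - start
    print(results)
    # The waits overlap, so the total should be near one delay rather than five.
    print(f"elapsed: {elapsed:.2f}s")
```

Awaiting the same calls one after another would take roughly the sum of the delays, which is the difference the async methods are meant to remove.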

Note

ChromaDB has a batch size limit due to SQLite constraints. When inserting documents that exceed this limit, Kern automatically splits them into smaller batches. The batch size is auto-detected from ChromaDB's server configuration.

You can also set batch_size to override the auto-detected value.
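
The batching behaviour described in this note can be sketched in plain Python. This is an illustration under assumptions, not Kern's actual implementation: resolve_batch_size and split_into_batches are hypothetical helper names, and detect stands in for whatever call reads the limit from the ChromaDB server. The fallback of 100 is the value documented for batch_size.

```python
from typing import Callable, Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")

FALLBACK_BATCH_SIZE = 100  # documented fallback when auto-detection fails


def resolve_batch_size(explicit: Optional[int], detect: Callable[[], int]) -> int:
    """Pick a batch size: explicit override, else detected limit, else fallback."""
    if explicit is not None:
        if explicit <= 0:
            raise ValueError("batch_size must be a positive integer")
        return explicit
    try:
        detected = detect()  # hypothetical probe of the server-side limit
    except Exception:
        return FALLBACK_BATCH_SIZE
    return detected if detected > 0 else FALLBACK_BATCH_SIZE


def split_into_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

For example, 250 documents with a resolved batch size of 100 would be written as three batches of 100, 100, and 50.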

ChromaDb Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| collection | str | - | The name of the collection to use. |
| embedder | Embedder | OpenAIEmbedder() | The embedder to use for embedding document contents. |
| distance | Distance | cosine | The distance metric to use. |
| path | str | "tmp/chromadb" | The path where ChromaDB data will be stored. |
| persistent_client | bool | False | Whether to use a persistent ChromaDB client. |
| batch_size | int | None | Maximum number of documents per batch operation. Auto-detected from ChromaDB's server limit if not set, falls back to 100 if auto-detect fails. |
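
For readers unfamiliar with the default distance metric, the sketch below computes cosine distance in plain Python, taken here as 1 minus cosine similarity. It illustrates the metric only and makes no claim about how ChromaDB or Kern computes it internally; cosine_distance is a name chosen for this example.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)
```

Vectors pointing in the same direction have a distance near 0, orthogonal vectors have a distance of 1, and opposite vectors have a distance of 2, regardless of magnitude.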