ClickHouse Vector Database

Use ClickHouse as a vector database for your Knowledge Base.

Setup

docker run -d \
  -e CLICKHOUSE_DB=ai \
  -e CLICKHOUSE_USER=ai \
  -e CLICKHOUSE_PASSWORD=ai \
  -e CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1 \
  -v clickhouse_data:/var/lib/clickhouse/ \
  -v clickhouse_log:/var/log/clickhouse-server/ \
  -p 8123:8123 \
  -p 9000:9000 \
  --ulimit nofile=262144:262144 \
  --name clickhouse-server \
  clickhouse/clickhouse-server
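Before running the examples, it can help to confirm that the container is accepting HTTP connections on port 8123. The sketch below uses only the Python standard library and calls ClickHouse's /ping health-check endpoint on the HTTP interface. The helper names ping_url and clickhouse_is_up are illustrative and are not part of kern.

```python
import http.client
import urllib.error
import urllib.request


def ping_url(host: str = "localhost", port: int = 8123) -> str:
    """Build the URL of the ClickHouse HTTP health-check endpoint."""
    return f"http://{host}:{port}/ping"


def clickhouse_is_up(
    host: str = "localhost", port: int = 8123, timeout: float = 2.0
) -> bool:
    """Return True if the server answers the ping request with HTTP 200."""
    try:
        with urllib.request.urlopen(ping_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a non-HTTP reply: treat as "not up".
        return False


if __name__ == "__main__":
    print("ClickHouse reachable:", clickhouse_is_up())
```

If this prints False right after `docker run`, the server may still be starting; retrying after a few seconds is usually enough.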

Example

from kern.agent import Agent
from kern.knowledge.knowledge import Knowledge
from kern.db.sqlite import SqliteDb
from kern.vectordb.clickhouse import Clickhouse

knowledge = Knowledge(
    vector_db=Clickhouse(
        table_name="recipe_documents",
        host="localhost",
        port=8123,
        username="ai",
        password="ai",
    ),
)

knowledge.insert(
    url="https://kern-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf"
)

agent = Agent(
    db=SqliteDb(db_file="kern.db"),
    knowledge=knowledge,
    # Enable the agent to search the knowledge base
    search_knowledge=True,
    # Enable the agent to read the chat history
    read_chat_history=True,
)
# Comment out after first run
agent.knowledge.load(recreate=False)  # type: ignore

agent.print_response("How do I make pad thai?", markdown=True)
agent.print_response("What was my last question?", stream=True)

Async Support ⚡

Clickhouse also supports asynchronous operations, so inserts and queries can run concurrently instead of blocking one another, which improves throughput under load.

import asyncio

from kern.agent import Agent
from kern.knowledge.knowledge import Knowledge
from kern.db.sqlite import SqliteDb
from kern.vectordb.clickhouse import Clickhouse

agent = Agent(
    db=SqliteDb(db_file="kern.db"),
    knowledge=Knowledge(
        vector_db=Clickhouse(
            table_name="recipe_documents",
            host="localhost",
            port=8123,
            username="ai",
            password="ai",
        ),
    ),
    # Enable the agent to search the knowledge base
    search_knowledge=True,
    # Enable the agent to read the chat history
    read_chat_history=True,
)

if __name__ == "__main__":
    # Comment out after first run
    asyncio.run(
        agent.knowledge.ainsert(
            url="https://kern-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf"
        )
    )

    # Create and use the agent
    asyncio.run(agent.aprint_response("How to make Tom Kha Gai", markdown=True))

Tip

Use the ainsert() and aprint_response() methods inside an asyncio event loop (for example via asyncio.run()) for non-blocking operations in high-throughput applications.
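To illustrate why asynchronous calls help with throughput, here is a minimal sketch in which several awaitable operations overlap under asyncio.gather(). The coroutine fake_query is a hypothetical stand-in for a call such as aprint_response(); it only sleeps and does not touch kern or ClickHouse.

```python
import asyncio
import time
from typing import List, Tuple


async def fake_query(question: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for an async agent call; simulates I/O wait."""
    await asyncio.sleep(delay)
    return f"answer to: {question}"


async def run_concurrently(questions: List[str]) -> List[str]:
    """Start all queries at once and collect results in input order."""
    return await asyncio.gather(*(fake_query(q) for q in questions))


def timed_run(questions: List[str]) -> Tuple[List[str], float]:
    """Run the batch in a fresh event loop and report elapsed seconds."""
    start = time.perf_counter()
    answers = asyncio.run(run_concurrently(questions))
    return answers, time.perf_counter() - start


if __name__ == "__main__":
    answers, elapsed = timed_run(["pad thai", "tom kha gai", "green curry"])
    print(answers)
    print(f"elapsed: {elapsed:.2f}s")
```

Because the three simulated waits overlap, the batch takes roughly as long as one query rather than the sum of all three. The same pattern applies when the awaited calls are real network requests.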

Clickhouse Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| table_name | str | None | Name of the table to store vectors and metadata in Clickhouse |
| host | str | None | Hostname of the Clickhouse server |
| username | Optional[str] | None | Username for Clickhouse authentication |
| password | str | "" | Password for Clickhouse authentication |
| port | int | 0 | Port number for Clickhouse connection |
| database_name | str | "ai" | Name of the database to use in Clickhouse |
| dsn | Optional[str] | None | DSN string for Clickhouse connection |
| compress | str | "lz4" | Compression algorithm to use |
| client | Optional[Client] | None | Optional pre-configured Clickhouse client |
| embedder | Optional[Embedder] | OpenAIEmbedder() | Embedder instance to generate embeddings |
| distance | Distance | Distance.cosine | Distance metric to use for similarity search |
| index | Optional[HNSW] | HNSW() | HNSW index configuration for vector similarity search |
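The distance parameter defaults to cosine distance. As a rough illustration of what that metric measures, the sketch below computes it in plain Python as one minus the cosine similarity of two vectors. This is not the library's or ClickHouse's implementation, and the function name cosine_distance is chosen here only for the example.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cos(angle between a and b); 0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    query = [0.2, 0.8, 0.1]
    documents = {
        "doc_a": [0.1, 0.9, 0.0],
        "doc_b": [0.9, 0.1, 0.3],
    }
    # Rank documents from most to least similar to the query embedding.
    ranked = sorted(documents, key=lambda k: cosine_distance(query, documents[k]))
    print(ranked)
```

Smaller values mean the embeddings point in more similar directions, which is why a similarity search returns the rows with the lowest distance first.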