Cassandra Vector Database

Use Cassandra as a vector database for your Knowledge Base.

Setup

Install the Cassandra driver

uv pip install cassandra-driver

Run Cassandra

docker run -d \
  --name cassandra-db \
  -p 9042:9042 \
  cassandra:latest
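
The container can take a little while after `docker run` returns before it accepts client connections on port 9042. As a minimal sketch, the hypothetical helper below (`wait_for_port` is not part of kern or cassandra-driver) polls the port using only the standard library. An open port is only a rough readiness signal, so the first `cluster.connect()` may still need a retry.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the container started above, calling `wait_for_port("127.0.0.1", 9042)` before creating the `Cluster` is one way to avoid connecting too early.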

Example

import os

from kern.agent import Agent
from kern.knowledge.knowledge import Knowledge
from kern.vectordb.cassandra import Cassandra
from kern.knowledge.embedder.mistral import MistralEmbedder
from kern.models.mistral import MistralChat
from cassandra.cluster import Cluster

# (Optional) Set up your Cassandra DB
# Cluster() with no arguments connects to 127.0.0.1 on port 9042
cluster = Cluster()

session = cluster.connect()
session.execute(
    """
    CREATE KEYSPACE IF NOT EXISTS testkeyspace
    WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }
    """
)

knowledge_base = Knowledge(
    vector_db=Cassandra(table_name="recipes", keyspace="testkeyspace", session=session, embedder=MistralEmbedder()),
)

# Download the PDF, embed it, and store the vectors in Cassandra
knowledge_base.insert(
    url="https://kern-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf"
)

agent = Agent(
    model=MistralChat(provider="mistral-large-latest", api_key=os.getenv("MISTRAL_API_KEY")),
    knowledge=knowledge_base,
)

agent.print_response(
    "What are the health benefits of Khao Niew Dam Piek Maphrao Awn?", markdown=True, show_full_reasoning=True
)
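
The `CREATE KEYSPACE` statement in the example hardcodes the keyspace name and a replication factor of 1, which suits a single-node development container. As a sketch, assuming you want to reuse the statement across environments, the hypothetical helper `build_keyspace_cql` (not part of kern or cassandra-driver) builds the same statement from parameters. It rejects names that are not plain unquoted CQL identifiers, because keyspace names cannot be passed as bind parameters and end up interpolated into the query string.

```python
import re

# Unquoted CQL identifiers start with a letter, followed by letters, digits, or underscores
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def build_keyspace_cql(keyspace: str, replication_factor: int = 1) -> str:
    """Build a CREATE KEYSPACE statement using SimpleStrategy (hypothetical helper)."""
    if not _IDENTIFIER.fullmatch(keyspace):
        raise ValueError(f"Invalid keyspace name: {keyspace!r}")
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        "WITH REPLICATION = "
        f"{{ 'class' : 'SimpleStrategy', 'replication_factor' : {replication_factor} }}"
    )


print(build_keyspace_cql("testkeyspace"))
```

The returned string can be passed to `session.execute(...)` in place of the inline statement.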

Async Support ⚡

The Cassandra vector database also supports asynchronous operations, so inserts and queries can run concurrently, which can improve performance for I/O-bound workloads.

import asyncio

from kern.agent import Agent
from kern.knowledge.embedder.mistral import MistralEmbedder
from kern.knowledge.knowledge import Knowledge
from kern.models.mistral import MistralChat
from kern.vectordb.cassandra import Cassandra

try:
    from cassandra.cluster import Cluster  # type: ignore
except (ImportError, ModuleNotFoundError):
    raise ImportError(
        "Could not import the cassandra-driver Python package. Please install it with `pip install cassandra-driver`."
    )

cluster = Cluster()

session = cluster.connect()
session.execute(
    """
    CREATE KEYSPACE IF NOT EXISTS testkeyspace
    WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }
    """
)

knowledge_base = Knowledge(
    vector_db=Cassandra(
        table_name="recipes",
        keyspace="testkeyspace",
        session=session,
        embedder=MistralEmbedder(),
    ),
)

agent = Agent(
    model=MistralChat(),
    knowledge=knowledge_base,
)

if __name__ == "__main__":
    # Insert the document into the knowledge base
    asyncio.run(
        knowledge_base.ainsert(
            url="https://kern-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
        )
    )

    # Create and use the agent
    asyncio.run(
        agent.aprint_response(
            "What are the health benefits of Khao Niew Dam Piek Maphrao Awn?",
            markdown=True,
        )
    )

Tip

Use the ainsert() and aprint_response() methods with asyncio.run() for non-blocking operations in high-throughput applications.
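
The async example calls `asyncio.run()` twice, which starts a separate event loop for each call. A common alternative is to put every await in one `main()` coroutine and use `asyncio.gather()` when several documents should be inserted concurrently. The sketch below shows only that pattern: `fake_ainsert` and the `example.com` URLs are hypothetical stand-ins for `knowledge_base.ainsert(url=...)`, so it runs without Cassandra or an API key.

```python
import asyncio


async def fake_ainsert(url: str) -> str:
    """Hypothetical stand-in for knowledge_base.ainsert(url=...)."""
    await asyncio.sleep(0.01)  # simulates network and database I/O
    return url


async def main(urls: list[str]) -> list[str]:
    # gather() schedules all inserts concurrently and returns results in input order
    inserted = await asyncio.gather(*(fake_ainsert(u) for u in urls))
    # With the real agent, `await agent.aprint_response(...)` would go here,
    # inside the same event loop
    return list(inserted)


urls = ["https://example.com/a.pdf", "https://example.com/b.pdf"]
print(asyncio.run(main(urls)))
```

Whether concurrent `ainsert()` calls are safe against a single Cassandra session is an assumption to confirm for your setup before relying on this pattern.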

Cassandra Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| table_name | str | None | Name of the table to store vectors and metadata in Cassandra |
| keyspace | str | None | Keyspace name in Cassandra where the table will be created |
| embedder | Optional[Embedder] | OpenAIEmbedder() | Embedder instance to generate embeddings |
| session | CassandraSession | None | Active Cassandra session object for database operations |