Hybrid Search

Combine vector similarity with keyword matching for better retrieval accuracy.

Hybrid search combines vector similarity (semantic meaning) with keyword matching (exact terms) to get the best of both approaches. It's the recommended search type for most production use cases.

from kern.knowledge.knowledge import Knowledge
from kern.vectordb.pgvector import PgVector, SearchType

# Postgres connection string (same value as in the full example below)
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

knowledge = Knowledge(
    vector_db=PgVector(
        table_name="docs",
        db_url=db_url,
        search_type=SearchType.hybrid,
    ),
)

How It Works

Hybrid search runs two searches in parallel and then fuses their rankings:

Vector search finds semantically similar content (meaning-based)
Keyword search finds exact term matches (text-based)
Fusion combines results using Reciprocal Rank Fusion (RRF)

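The RRF algorithm scores each document d with the formula: RRF(d) = Σ_r 1 / (k + rank_r(d)), where the sum runs over the result lists r (vector and keyword) that contain d, rank_r(d) is the position of d in list r, and k is a smoothing constant.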

This ensures documents that rank well in both searches appear at the top, while documents that only match one method still surface.
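
The fusion step can be sketched in plain Python. This is a minimal illustration of the RRF formula, not the library's implementation; the function name `rrf_fuse`, the sample document IDs, and the use of 1-based ranks are assumptions made for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Merge ranked lists of document IDs with Reciprocal Rank Fusion.

    Hypothetical helper for illustration; ranks are assumed to be 1-based.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list that contains the document adds 1 / (k + rank).
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["doc_a", "doc_b", "doc_c"]   # made-up semantic search results
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # made-up keyword search results

fused = rrf_fuse([vector_hits, keyword_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

With these inputs, doc_b and doc_a, which appear in both lists, outrank doc_d and doc_c, which each appear in only one list but are still returned.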

Note

Not all vector databases support hybrid search or RRF.

When to Use Hybrid Search

Scenario	Why Hybrid Helps
User queries vary in phrasing	Vector catches meaning, keywords catch exact terms
Technical content with specific terms	Keywords match error codes, product names exactly
Mixed content types	Balances conceptual and precise matching
Production systems	Best overall accuracy for diverse queries

Use vector-only if your queries are always conceptual with no specific terms. Use keyword-only if you need exact matching (e.g., search by ID or code).

Configuration

Basic Setup

from kern.vectordb.pgvector import PgVector, SearchType

# Postgres connection string
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

vector_db = PgVector(
    table_name="docs",
    db_url=db_url,
    search_type=SearchType.hybrid,
)

With Reranking

Add a reranker to improve result ordering after fusion:

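from kern.knowledge.reranker.cohere import CohereReranker
from kern.vectordb.pgvector import PgVector, SearchType

# Postgres connection string
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

vector_db = PgVector(
    table_name="docs",
    db_url=db_url,
    search_type=SearchType.hybrid,
    reranker=CohereReranker(),  # Reorders the fused results
)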

RRF Constant

The k constant in RRF controls how much weight lower-ranked results receive. Higher values (e.g., 60) smooth out rankings; lower values make top results more dominant.

from kern.vectordb.chroma import ChromaDb, SearchType

vector_db = ChromaDb(
    collection="docs",
    path="tmp/chromadb",
    search_type=SearchType.hybrid,
    hybrid_rrf_k=60,  # Default is 60
)
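
To see what the constant does, the sketch below compares the RRF contribution 1/(k + rank) of a first-place result with that of a tenth-place result for two values of k. It is plain arithmetic on the formula from How It Works and assumes 1-based ranks; the helper name `rank_ratio` is hypothetical and not part of the library.

```python
def rank_ratio(k, top_rank=1, low_rank=10):
    """Return how many times more a top-ranked hit contributes than a lower one."""
    top = 1.0 / (k + top_rank)
    low = 1.0 / (k + low_rank)
    return top / low


for k in (1, 60):
    print(f"k={k}: rank 1 contributes {rank_ratio(k):.2f}x as much as rank 10")
```

With k=1 the first-place result contributes 5.5 times as much as the tenth; with k=60 the ratio drops to about 1.15, so a larger k flattens the differences between positions.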

Example

from kern.knowledge.knowledge import Knowledge
from kern.vectordb.pgvector import PgVector, SearchType

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

knowledge = Knowledge(
    vector_db=PgVector(
        table_name="recipes",
        db_url=db_url,
        search_type=SearchType.hybrid,
    ),
)

# Load content
knowledge.insert(
    url="https://kern-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
)

# Search combines semantic similarity + keyword matching
results = knowledge.search("chicken coconut soup", max_results=5)
for doc in results:
    print(doc.content[:200])

Supported Vector Databases

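Hybrid search is available in only some of the supported vector databases, including PgVector and ChromaDb, which are used in the examples above. Check individual vector database docs for specific hybrid search capabilities.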