Jina Embedder

The JinaEmbedder class embeds text into vectors using the Jina AI API. You can get started with Jina AI here.

Get your API key, then either set it in the JINA_API_KEY environment variable or pass it to the embedder as api_key.

Usage

from kern.knowledge.knowledge import Knowledge
from kern.vectordb.pgvector import PgVector
from kern.knowledge.embedder.jina import JinaEmbedder

# Generate an embedding for a single piece of text
embeddings = JinaEmbedder(id="jina-embeddings-v3").get_embedding("The quick brown fox jumps over the lazy dog.")
# Print the first five values and the embedding's dimensionality
print(f"Embeddings: {embeddings[:5]}")
print(f"Dimensions: {len(embeddings)}")

# Use an embedder in a knowledge base
knowledge = Knowledge(
    vector_db=PgVector(
        db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
        table_name="jina_embeddings",
        embedder=JinaEmbedder(id="jina-embeddings-v3"),
    ),
    max_results=2,
)

Advanced Usage

import asyncio

from kern.knowledge.embedder.jina import JinaEmbedder

# Configure embedder with custom settings
embedder = JinaEmbedder(
    id="jina-embeddings-v3",
    dimensions=1024,
    embedding_type="float",
    late_chunking=True,
    batch_size=50,
    timeout=30.0,
)

# Use async methods for better performance
async def embed_texts():
    embedder = JinaEmbedder()
    texts = ["First text", "Second text", "Third text"]

    # Get embeddings and usage info for all texts, processed in batches
    embeddings, usage = await embedder.async_get_embeddings_batch_and_usage(texts)
    print(f"Generated {len(embeddings)} embeddings")
    print(f"Usage info: {usage[0]}")

# Run async example
asyncio.run(embed_texts())
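
The performance benefit of async methods comes from overlapping the network wait of several requests. The sketch below illustrates that idea with the standard library only. `fake_embed_batch` is a hypothetical stand-in for a real embedding call (it is not part of JinaEmbedder), and the dummy vectors it returns are just the text lengths.

```python
import asyncio
from typing import List


async def fake_embed_batch(texts: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for a network call: one dummy vector per text."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return [[float(len(text))] for text in texts]


async def embed_concurrently(batches: List[List[str]]) -> List[List[float]]:
    """Run all batch requests at once and flatten the results in input order."""
    # asyncio.gather returns results in the same order as the awaitables passed in.
    results = await asyncio.gather(*(fake_embed_batch(batch) for batch in batches))
    return [vector for batch in results for vector in batch]


def run_demo() -> List[List[float]]:
    batches = [["First text", "Second text"], ["Third text"]]
    return asyncio.run(embed_concurrently(batches))
```

In a real application, the stub would be replaced by an awaitable embedder call; whether concurrent requests are appropriate depends on your API rate limits.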

Params

Parameter	Type	Default	Description
`id`	`str`	`"jina-embeddings-v3"`	The model ID to use for generating embeddings.
`dimensions`	`int`	`1024`	The number of dimensions for the embedding vectors.
`embedding_type`	`Literal["float", "base64", "int8"]`	`"float"`	The format type of the returned embeddings.
`late_chunking`	`bool`	`False`	Whether to enable late chunking optimization.
`user`	`Optional[str]`	`None`	User identifier for tracking purposes. Optional.
`api_key`	`Optional[str]`	`JINA_API_KEY` env var	The Jina AI API key. Can be set via environment variable.
`base_url`	`str`	`"https://api.jina.ai/v1/embeddings"`	The base URL for the Jina API.
`headers`	`Optional[Dict[str, str]]`	`None`	Additional headers to include in API requests. Optional.
`request_params`	`Optional[Dict[str, Any]]`	`None`	Additional parameters to include in the API request. Optional.
`timeout`	`Optional[float]`	`None`	Timeout in seconds for API requests. Optional.
`enable_batch`	`bool`	`False`	Enable batch processing to reduce API calls and avoid rate limits.
`batch_size`	`int`	`100`	Number of texts to process in each API call for batch operations.
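
To make the effect of `batch_size` concrete, here is a minimal sketch of how a list of texts might be divided into per-request chunks. `split_into_batches` is a hypothetical helper written for illustration; it is not the library's internal implementation, which may differ.

```python
from typing import List


def split_into_batches(texts: List[str], batch_size: int = 100) -> List[List[str]]:
    """Split texts into consecutive chunks of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]


texts = [f"document {i}" for i in range(250)]
batches = split_into_batches(texts, batch_size=100)
print([len(batch) for batch in batches])  # prints [100, 100, 50]
```

With the default of 100, embedding 250 texts would take three requests under this scheme instead of 250 individual ones.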

Features

Async Support: Full async/await support for better performance in concurrent applications
Batch Processing: Efficient batch processing of multiple texts with configurable batch size
Late Chunking: Support for Jina's late chunking optimization technique
Flexible Output: Multiple embedding formats (float, base64, int8)
Usage Tracking: Get detailed usage information for API calls
Error Handling: Robust error handling with fallback mechanisms
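
When `embedding_type` is set to `"base64"`, the returned value must be decoded before it can be used as a numeric vector. The sketch below assumes the payload is a base64 string of packed little-endian float32 values; that layout is an assumption here and should be checked against the Jina AI documentation. Both helper functions are hypothetical and only use the Python standard library.

```python
import base64
import struct
from typing import List


def decode_base64_embedding(payload: str) -> List[float]:
    """Decode a base64 string into floats, assuming packed little-endian float32."""
    raw = base64.b64decode(payload)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


def encode_base64_embedding(values: List[float]) -> str:
    """Inverse of the decoder above, used here only to build a sample payload."""
    packed = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(packed).decode("ascii")


# Round trip with values that float32 represents exactly.
sample = [0.5, -1.25, 2.0]
decoded = decode_base64_embedding(encode_base64_embedding(sample))
```

The base64 form is more compact on the wire than a JSON list of floats, at the cost of this extra decoding step on the client.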

Developer Resources

View Cookbook
Jina AI Documentation