Jina Embedder

The JinaEmbedder class embeds text into vectors using the Jina AI API.

To use it, you need a Jina AI API key. The embedder reads the key from the JINA_API_KEY environment variable unless you pass api_key explicitly.

Usage

from kern.knowledge.knowledge import Knowledge
from kern.vectordb.pgvector import PgVector
from kern.knowledge.embedder.jina import JinaEmbedder

# Generate an embedding for a single text
embeddings = JinaEmbedder(id="jina-embeddings-v3").get_embedding("The quick brown fox jumps over the lazy dog.")
# Print the first five values and the number of dimensions
print(f"Embeddings: {embeddings[:5]}")
print(f"Dimensions: {len(embeddings)}")

# Use an embedder in a knowledge base
knowledge = Knowledge(
    vector_db=PgVector(
        db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
        table_name="jina_embeddings",
        embedder=JinaEmbedder(id="jina-embeddings-v3"),
    ),
    max_results=2,
)
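
Once texts are embedded, they are typically compared by the angle between their vectors. The following is a minimal sketch of cosine similarity, assuming an embedding is a plain sequence of floats (as the slicing and len() calls above suggest). The cosine_similarity helper and the toy vectors are illustrative and not part of the library; in the knowledge base setup above, the vector database performs this kind of comparison for you.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings.
query = [1.0, 0.0, 1.0]
close = [2.0, 0.0, 2.0]      # same direction as query, different length
unrelated = [0.0, 1.0, 0.0]  # orthogonal to query

print(cosine_similarity(query, close))
print(cosine_similarity(query, unrelated))
```

Vectors pointing in the same direction score close to 1 regardless of their length, while orthogonal vectors score 0.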

Advanced Usage

import asyncio

from kern.knowledge.embedder.jina import JinaEmbedder

# Configure embedder with custom settings
embedder = JinaEmbedder(
    id="jina-embeddings-v3",
    dimensions=1024,
    embedding_type="float",
    late_chunking=True,
    batch_size=50,
    timeout=30.0,
)

# Use the async methods in concurrent applications
async def embed_texts():
    embedder = JinaEmbedder()
    texts = ["First text", "Second text", "Third text"]

    # Get embeddings and usage information for all texts in batches
    embeddings, usage = await embedder.async_get_embeddings_batch_and_usage(texts)
    print(f"Generated {len(embeddings)} embeddings")
    print(f"Usage info: {usage[0]}")

# Run async example
asyncio.run(embed_texts())

Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| id | str | "jina-embeddings-v3" | The model ID to use for generating embeddings. |
| dimensions | int | 1024 | The number of dimensions for the embedding vectors. |
| embedding_type | Literal["float", "base64", "int8"] | "float" | The format type of the returned embeddings. |
| late_chunking | bool | False | Whether to enable late chunking optimization. |
| user | Optional[str] | None | User identifier for tracking purposes. Optional. |
| api_key | Optional[str] | JINA_API_KEY env var | The Jina AI API key. Can be set via environment variable. |
| base_url | str | "https://api.jina.ai/v1/embeddings" | The base URL for the Jina API. |
| headers | Optional[Dict[str, str]] | None | Additional headers to include in API requests. Optional. |
| request_params | Optional[Dict[str, Any]] | None | Additional parameters to include in the API request. Optional. |
| timeout | Optional[float] | None | Timeout in seconds for API requests. Optional. |
| enable_batch | bool | False | Enable batch processing to reduce API calls and avoid rate limits. |
| batch_size | int | 100 | Number of texts to process in each API call for batch operations. |
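
To make the parameters more concrete, here is a rough sketch of how they might translate into a single HTTP request. The build_request helper is hypothetical, and the JSON field names and the Bearer authorization header are assumptions based on the parameter names above; the embedder's actual request construction may differ.

```python
from typing import Any, Dict, List, Optional


def build_request(
    texts: List[str],
    api_key: str,
    model: str = "jina-embeddings-v3",
    dimensions: int = 1024,
    embedding_type: str = "float",
    late_chunking: bool = False,
    base_url: str = "https://api.jina.ai/v1/embeddings",
    headers: Optional[Dict[str, str]] = None,
    request_params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the URL, headers, and JSON body for one call."""
    # Assumed header layout: a Bearer token plus a JSON content type.
    all_headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if headers:
        all_headers.update(headers)  # caller-supplied headers take precedence

    # Assumed body layout: field names mirror the parameter names in the table.
    body: Dict[str, Any] = {
        "model": model,
        "input": texts,
        "dimensions": dimensions,
        "embedding_type": embedding_type,
        "late_chunking": late_chunking,
    }
    if request_params:
        body.update(request_params)  # extra parameters are merged into the body

    return {"url": base_url, "headers": all_headers, "json": body}


request = build_request(["First text", "Second text"], api_key="YOUR_API_KEY")
print(request["url"])
print(sorted(request["json"]))
```

Nothing is sent over the network here; the returned dictionary only shows where each setting would end up.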

Features

  • Async Support: Full async/await support for better performance in concurrent applications
  • Batch Processing: Efficient batch processing of multiple texts with configurable batch size
  • Late Chunking: Support for Jina's late chunking optimization technique
  • Flexible Output: Multiple embedding formats (float, base64, int8)
  • Usage Tracking: Get detailed usage information for API calls
  • Error Handling: Robust error handling with fallback mechanisms
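
The batch processing feature can be pictured as splitting the input list into fixed-size chunks, one API call per chunk. This is a minimal sketch that assumes simple sequential chunking; iter_batches is an illustrative helper, not a function exposed by the library, and the embedder's internal batching may work differently.

```python
from typing import Iterator, List


def iter_batches(texts: List[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of texts, each with at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


texts = [f"document {i}" for i in range(250)]
batches = list(iter_batches(texts, batch_size=100))

# 250 texts with batch_size=100 need three API calls instead of 250.
print([len(batch) for batch in batches])  # [100, 100, 50]
```

A smaller batch_size means more API calls with smaller payloads; a larger one means fewer calls, which is how batching helps with rate limits.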

Developer Resources