Jina Embedder
The JinaEmbedder class embeds text into vectors using the Jina AI API.
To use it, you need a Jina AI API key. By default the key is read from the JINA_API_KEY environment variable.
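Because the key is read from the `JINA_API_KEY` environment variable when `api_key` is not passed, it can help to fail early with a clear message if the variable is missing. The following is a minimal sketch: `require_jina_api_key` is a hypothetical helper written for illustration and is not part of the library.

```python
import os


def require_jina_api_key(env=None):
    """Return the Jina API key from the environment, or raise if it is unset.

    Hypothetical helper for illustration; not part of the library.
    """
    # Allow a dict to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    key = env.get("JINA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "JINA_API_KEY is not set; export it before creating a JinaEmbedder."
        )
    return key
```

Calling this once at startup surfaces a missing key before any embedding request is attempted.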
Usage
```python
from kern.knowledge.knowledge import Knowledge
from kern.vectordb.pgvector import PgVector
from kern.knowledge.embedder.jina import JinaEmbedder

# Generate an embedding for a single sentence
embeddings = JinaEmbedder(id="jina-embeddings-v3").get_embedding(
    "The quick brown fox jumps over the lazy dog."
)

# Print the first few values and the vector's dimensions
print(f"Embeddings: {embeddings[:5]}")
print(f"Dimensions: {len(embeddings)}")

# Use the embedder in a knowledge base
knowledge = Knowledge(
    vector_db=PgVector(
        db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
        table_name="jina_embeddings",
        embedder=JinaEmbedder(id="jina-embeddings-v3"),
    ),
    max_results=2,
)
```

Advanced Usage
```python
import asyncio

from kern.knowledge.embedder.jina import JinaEmbedder

# Configure embedder with custom settings
embedder = JinaEmbedder(
    id="jina-embeddings-v3",
    dimensions=1024,
    embedding_type="float",
    late_chunking=True,
    batch_size=50,
    timeout=30.0,
)


# Use async methods for better performance
async def embed_texts():
    embedder = JinaEmbedder()
    texts = ["First text", "Second text", "Third text"]

    # Get embeddings in batches, along with usage information
    embeddings, usage = await embedder.async_get_embeddings_batch_and_usage(texts)
    print(f"Generated {len(embeddings)} embeddings")
    print(f"Usage info: {usage[0]}")


# Run async example
asyncio.run(embed_texts())
```

Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| id | str | "jina-embeddings-v3" | The model ID to use for generating embeddings. |
| dimensions | int | 1024 | The number of dimensions for the embedding vectors. |
| embedding_type | Literal["float", "base64", "int8"] | "float" | The format type of the returned embeddings. |
| late_chunking | bool | False | Whether to enable late chunking optimization. |
| user | Optional[str] | None | User identifier for tracking purposes. Optional. |
| api_key | Optional[str] | JINA_API_KEY env var | The Jina AI API key. Can be set via environment variable. |
| base_url | str | "https://api.jina.ai/v1/embeddings" | The base URL for the Jina API. |
| headers | Optional[Dict[str, str]] | None | Additional headers to include in API requests. Optional. |
| request_params | Optional[Dict[str, Any]] | None | Additional parameters to include in the API request. Optional. |
| timeout | Optional[float] | None | Timeout in seconds for API requests. Optional. |
| enable_batch | bool | False | Enable batch processing to reduce API calls and avoid rate limits. |
| batch_size | int | 100 | Number of texts to process in each API call for batch operations. |
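
To make the effect of `batch_size` concrete, the sketch below splits a list of texts into consecutive groups of at most `batch_size` items and counts the resulting requests. It assumes batching works by simple consecutive slicing. `split_into_batches` is a hypothetical helper for illustration, not the library's implementation.

```python
import math


def split_into_batches(texts, batch_size=100):
    """Yield consecutive slices of `texts` with at most `batch_size` items each.

    Hypothetical helper; assumes batches are formed by simple slicing.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


texts = [f"document {i}" for i in range(230)]
batches = list(split_into_batches(texts, batch_size=100))

print([len(batch) for batch in batches])  # [100, 100, 30]
print(math.ceil(len(texts) / 100))        # 3 API calls instead of 230
```

Under this assumption, the number of API calls is the text count divided by `batch_size`, rounded up, so a larger `batch_size` means fewer requests at the cost of larger payloads per request.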
Features
- Async Support: Full async/await support for better performance in concurrent applications
- Batch Processing: Efficient batch processing of multiple texts with configurable batch size
- Late Chunking: Support for Jina's late chunking optimization technique
- Flexible Output: Multiple embedding formats (float, base64, int8)
- Usage Tracking: Get detailed usage information for API calls
- Error Handling: Robust error handling with fallback mechanisms
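
As an illustration of the non-default output formats, the sketch below decodes a `base64` embedding back into floats. It assumes the payload is a base64 string of packed little-endian float32 values; check this against the Jina API documentation before relying on it. `decode_base64_embedding` is a hypothetical helper, not part of the library.

```python
import base64
import struct


def decode_base64_embedding(encoded):
    """Decode a base64 string of little-endian float32 values into a list of floats.

    Hypothetical helper; the float32 layout is an assumption about the payload.
    """
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip a made-up vector to show the format (not real API output)
sample = [0.5, -1.25, 2.0]
encoded = base64.b64encode(struct.pack("<3f", *sample)).decode("ascii")
print(decode_base64_embedding(encoded))  # [0.5, -1.25, 2.0]
```

A base64 payload is more compact on the wire than a JSON array of floats, which is the usual reason to request it when transferring many vectors.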