Azure OpenAI Embedder

The AzureOpenAIEmbedder class embeds text into vectors using the Azure OpenAI API. It requires an API key, an endpoint, and a deployment name from your Azure OpenAI resource.

Setup

Set your API keys

export AZURE_EMBEDDER_OPENAI_API_KEY=xxx
export AZURE_EMBEDDER_OPENAI_ENDPOINT=xxx
export AZURE_EMBEDDER_DEPLOYMENT=xxx
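
Before constructing the embedder, it can help to confirm that the three variables are visible to the Python process. The following is a minimal sketch, not part of the library: the helper name `missing_embedder_vars` is hypothetical, and only the variable names come from the setup step above.

```python
import os

# Variable names taken from the setup step above.
REQUIRED_VARS = (
    "AZURE_EMBEDDER_OPENAI_API_KEY",
    "AZURE_EMBEDDER_OPENAI_ENDPOINT",
    "AZURE_EMBEDDER_DEPLOYMENT",
)


def missing_embedder_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ
    and accepts any mapping, which makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_embedder_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All Azure OpenAI embedder variables are set.")
```

Running the check in the same shell that exported the variables catches the common case where they were set in a different terminal session.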

Run PgVector

docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  agnohq/pgvector:16

Usage

from kern.knowledge.knowledge import Knowledge
from kern.vectordb.pgvector import PgVector
from kern.knowledge.embedder.azure_openai import AzureOpenAIEmbedder

# Embed a single sentence and get back its vector
embeddings = AzureOpenAIEmbedder(id="text-embedding-3-small").get_embedding(
    "The quick brown fox jumps over the lazy dog."
)

# Print the first few values of the embedding and its dimensionality
print(f"Embeddings: {embeddings[:5]}")
print(f"Dimensions: {len(embeddings)}")

# Use the embedder in a knowledge base backed by the PgVector container above
knowledge_base = Knowledge(
    vector_db=PgVector(
        db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
        table_name="azure_openai_embeddings",
        embedder=AzureOpenAIEmbedder(id="text-embedding-3-small"),
    ),
    max_results=2,
)
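
The usage example treats the result of `get_embedding` as a list of numbers. One common way to compare two such vectors is cosine similarity. The sketch below uses toy vectors rather than real API output, and the `cosine_similarity` helper is an illustration written for this page, not a function provided by the library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; report 0.0 instead of dividing by zero.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy vectors standing in for two embeddings of the same dimensionality.
    same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
    orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
    print(f"Same direction: {same_direction:.3f}")
    print(f"Orthogonal: {orthogonal:.3f}")
```

With real embeddings, both vectors would need to come from the same model so that their dimensions match.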

Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | `str` | `"text-embedding-ada-002"` | The name of the model used for generating embeddings. |
| `dimensions` | `int` | `1536` | The dimensionality of the embeddings generated by the model. |
| `encoding_format` | `Literal['float', 'base64']` | `"float"` | The format in which the embeddings are encoded. Options are "float" or "base64". |
| `user` | `str` | - | The user associated with the API request. |
| `api_key` | `str` | - | The API key used for authenticating requests. |
| `api_version` | `str` | `"2024-02-01"` | The version of the API to use for the requests. |
| `azure_endpoint` | `str` | - | The Azure endpoint for the API requests. |
| `azure_deployment` | `str` | - | The Azure deployment name for the API requests. |
| `base_url` | `str` | - | The base URL for the API endpoint. |
| `azure_ad_token` | `str` | - | The Azure Active Directory token for authentication. |
| `azure_ad_token_provider` | `Any` | - | The provider for obtaining the Azure AD token. |
| `organization` | `str` | - | The organization associated with the API request. |
| `request_params` | `Optional[Dict[str, Any]]` | - | Additional parameters to include in the API request. Optional. |
| `client_params` | `Optional[Dict[str, Any]]` | - | Additional parameters for configuring the API client. Optional. |
| `openai_client` | `Optional[AzureOpenAIClient]` | - | An instance of the AzureOpenAIClient to use for making API requests. Optional. |
| `enable_batch` | `bool` | `False` | Enable batch processing to reduce API calls and avoid rate limits. |
| `batch_size` | `int` | `100` | Number of texts to process in each API call for batch operations. |
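
To make the effect of `batch_size` concrete, the sketch below shows how a list of texts could be grouped so that each group corresponds to one API call. It only illustrates the parameter description above. The `make_batches` helper is hypothetical, and the library's actual batching logic may differ.

```python
def make_batches(texts, batch_size=100):
    """Split `texts` into consecutive groups of at most `batch_size` items.

    Hypothetical helper: each returned group stands for the texts that
    would be sent together in a single API call.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]


if __name__ == "__main__":
    documents = [f"document {i}" for i in range(250)]
    batches = make_batches(documents, batch_size=100)
    # 250 texts with batch_size=100 give groups of 100, 100, and 50.
    print([len(batch) for batch in batches])
```

Under this reading, embedding 250 texts takes three API calls instead of 250, which is the reduction that `enable_batch` is described as providing.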

Developer Resources