AWS Bedrock Embedder

The AwsBedrockEmbedder class is used to embed text data into vectors using the AWS Bedrock API. By default, it uses the Cohere Embed Multilingual V3 model for generating embeddings.

Setup

Set your AWS credentials

export AWS_ACCESS_KEY_ID=xxx
export AWS_SECRET_ACCESS_KEY=xxx
export AWS_REGION=xxx
Note

By default, this embedder uses the cohere.embed-multilingual-v3 model. You must enable access to this model from the AWS Bedrock model catalog before using this embedder.
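Before running the examples, it can help to confirm that the three variables from the credentials step are actually set in the current shell. The sketch below is a hypothetical helper (`missing_aws_vars` is not part of the library) that only inspects environment variables. boto3 can also resolve credentials from other sources, such as a shared profile or an IAM role, so a missing variable is not necessarily an error in every setup.

```python
import os

# The variable names exported in the setup step above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")


def missing_aws_vars(env=None):
    """Return the names of required AWS variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        print("Missing AWS settings: " + ", ".join(missing))
    else:
        print("All AWS settings are present.")
```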

Run PgVector

docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  agnohq/pgvector:16
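The container flags determine the connection string that PgVector is given in the Usage section: `-p 5532:5432` publishes the container's port 5432 on host port 5532, and the `POSTGRES_USER`, `POSTGRES_PASSWORD`, and `POSTGRES_DB` values supply the user, password, and database name. As a minimal sketch of that mapping, the hypothetical helper `build_db_url` below assembles the same URL from those pieces; it is not a function provided by the library.

```python
from urllib.parse import quote


def build_db_url(user, password, host, port, database):
    """Build a SQLAlchemy-style URL for the psycopg driver.

    Hypothetical helper for illustration. The user and password are
    percent-encoded so special characters do not break the URL.
    """
    return (
        f"postgresql+psycopg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# Values taken from the docker run command above.
db_url = build_db_url("ai", "ai", "localhost", 5532, "ai")
print(db_url)
```

If you change the published port or any `POSTGRES_*` value in the `docker run` command, the `db_url` passed to PgVector needs the same change.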

Usage

from kern.knowledge.embedder.aws_bedrock import AwsBedrockEmbedder
from kern.knowledge.knowledge import Knowledge
from kern.knowledge.reader.pdf_reader import PDFReader
from kern.vectordb.pgvector import PgVector

# Embed a single string
embeddings = AwsBedrockEmbedder().get_embedding(
    "The quick brown fox jumps over the lazy dog."
)
# Print the first five values of the embedding and its dimensionality
print(f"Embeddings: {embeddings[:5]}")
print(f"Dimensions: {len(embeddings)}")

# Use the embedder with a PgVector-backed knowledge base
knowledge = Knowledge(
    vector_db=PgVector(
        table_name="recipes",
        db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
        embedder=AwsBedrockEmbedder(),
    ),
)

knowledge.insert(
    url="https://kern-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
    reader=PDFReader(
        chunk_size=2048
    ),  # Required because Cohere has a fixed input size of 2048
)
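The `chunk_size=2048` setting keeps each piece of the PDF within the 2048 limit noted in the comment. To illustrate the idea only, the sketch below splits a string into fixed-size pieces, assuming the limit is counted in characters. `chunk_text` is a hypothetical function, and the chunking that `PDFReader` actually performs may differ (for example, it may avoid splitting mid-sentence).

```python
def chunk_text(text, chunk_size=2048):
    """Split text into consecutive pieces of at most chunk_size characters.

    Naive fixed-size slicing for illustration; not the library's chunker.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


document = "word " * 1000  # 5000 characters in total
chunks = chunk_text(document)
print([len(c) for c in chunks])  # [2048, 2048, 904]
```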

Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| id | str | "cohere.embed-multilingual-v3" | The model ID to use. You need to enable this model in your AWS Bedrock model catalog. |
| dimensions | int | 1024 | The dimensionality of the embeddings generated by the model (1024 for Cohere models). |
| input_type | str | "search_query" | Prepends special tokens to differentiate types. Options: 'search_document', 'search_query', 'classification', 'clustering'. |
| truncate | Optional[str] | None | How to handle inputs longer than the maximum token length. Options: 'NONE', 'START', 'END'. |
| embedding_types | Optional[List[str]] | None | Types of embeddings to return. Options: 'float', 'int8', 'uint8', 'binary', 'ubinary'. |
| aws_region | Optional[str] | None | The AWS region to use. If not provided, falls back to AWS_REGION env variable. |
| aws_access_key_id | Optional[str] | None | The AWS access key ID. If not provided, falls back to AWS_ACCESS_KEY_ID env variable. |
| aws_secret_access_key | Optional[str] | None | The AWS secret access key. If not provided, falls back to AWS_SECRET_ACCESS_KEY env variable. |
| session | Optional[Session] | None | A boto3 Session object to use for authentication. |
| request_params | Optional[Dict[str, Any]] | None | Additional parameters to pass to the API requests. |
| client_params | Optional[Dict[str, Any]] | None | Additional parameters to pass to the boto3 client. |
| client | Optional[AwsClient] | None | An instance of the AWS Bedrock client to use for making API requests. |
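To make the `input_type`, `truncate`, and `embedding_types` options more concrete, here is a minimal sketch of how they might be validated and assembled into a request body. The field names (`texts`, `input_type`, `truncate`, `embedding_types`) follow the Cohere Embed request format on Bedrock; `build_request_body` is a hypothetical function, and whether `AwsBedrockEmbedder` builds its request this way internally is an assumption.

```python
import json

# Allowed values, as listed in the parameter table above.
INPUT_TYPES = {"search_document", "search_query", "classification", "clustering"}
TRUNCATE_OPTIONS = {"NONE", "START", "END"}
EMBEDDING_TYPES = {"float", "int8", "uint8", "binary", "ubinary"}


def build_request_body(text, input_type="search_query", truncate=None,
                       embedding_types=None):
    """Assemble a Cohere-style embed request body (illustrative only)."""
    if input_type not in INPUT_TYPES:
        raise ValueError(f"Unsupported input_type: {input_type!r}")
    body = {"texts": [text], "input_type": input_type}

    # Optional fields are included only when the caller sets them.
    if truncate is not None:
        if truncate not in TRUNCATE_OPTIONS:
            raise ValueError(f"Unsupported truncate option: {truncate!r}")
        body["truncate"] = truncate
    if embedding_types is not None:
        unknown = set(embedding_types) - EMBEDDING_TYPES
        if unknown:
            raise ValueError(f"Unsupported embedding_types: {sorted(unknown)}")
        body["embedding_types"] = list(embedding_types)
    return body


print(json.dumps(build_request_body("hello", truncate="END"), indent=2))
```

A common pattern with Cohere models is to use `search_document` when indexing content and `search_query` when embedding a user's question, which is why `input_type` is exposed as a parameter.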

Developer Resources