
What is a Vector Database?

A vector database is a database that stores information as vectors: numerical representations of data objects, also known as vector embeddings.

It leverages the power of these vector embeddings to index and search across a massive dataset of unstructured and semi-structured data, such as images, text, or sensor data. Vector databases are built to manage vector embeddings and therefore offer a complete solution for the management of unstructured and semi-structured data.

What are Vector Embeddings?

Vector embeddings are a numerical representation of a subject, word, image, or any other piece of data. Vector embeddings — also known as embeddings — are generated by large language models and other AI models.

The distance between vector embeddings is what enables a vector database, or vector search engine, to determine the similarity between vectors. Distances may represent several dimensions of the underlying data objects, helping machine learning and AI models understand patterns, relationships, and underlying structures.
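Similarity between two embeddings is commonly measured with cosine similarity, the same metric the vector_cosine_ops index later in this post is built for. A minimal, self-contained sketch in Python (the toy 3-dimensional vectors are made up for illustration; real models such as Titan produce 1,536 dimensions):

```python
import math

def cosine_similarity(a, b):
    # cosine similarity = dot(a, b) / (|a| * |b|); values near 1.0 mean "more similar"
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for three concepts.
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.15, 0.05]
car = [0.0, 0.2, 0.95]

# "cat" should sit closer to "kitten" than to "car" in vector space.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # prints: True
```

A vector database performs this kind of comparison at scale, using an index (such as HNSW, used later in this post) instead of comparing every pair directly.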

What is Bedrock Knowledge Base?

Knowledge Bases for Amazon Bedrock is a fully managed capability that helps you implement the entire RAG workflow, from ingestion to retrieval and prompt augmentation, without having to build custom integrations to data sources or manage data flows.

How to Create a Knowledge Base (with Amazon Aurora)

Prerequisites

  • An Amazon Aurora cluster
  • PostgreSQL (with pgAdmin 4) installed locally
  • AWS Secrets Manager
  • An S3 bucket

Configure an Aurora PostgreSQL cluster

  1. On the Aurora console, create a new cluster.
  2. For Engine options, select Aurora (PostgreSQL Compatible).
  3. For the Engine version, choose your engine version.
  4. For Configuration options, select either Aurora Standard or Aurora I/O Optimized.

We selected Aurora I/O-Optimized, which provides improved performance with predictable pricing for I/O-intensive applications.

  5. For the DB instance class, select your instance class.

We opted to use Amazon Aurora Serverless v2, which automatically scales your compute based on your application workload, so you only pay based on the capacity used.

  6. Enable the RDS Data API, which is used by Amazon Bedrock to access your Aurora cluster.

Create the cluster.
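If you prefer to script this step, the console choices above map onto boto3's RDS create_db_cluster parameters roughly as follows. The identifiers, password, and region are placeholders, and the actual calls (commented out) require valid AWS credentials; treat this as a sketch to check against the current boto3 documentation:

```python
def cluster_params():
    # Console choices expressed as boto3 RDS.create_db_cluster keyword arguments.
    return {
        "DBClusterIdentifier": "bedrock-kb-cluster",   # placeholder name
        "Engine": "aurora-postgresql",
        "MasterUsername": "postgres",
        "MasterUserPassword": "ChangeMe123!",          # placeholder password
        "EnableHttpEndpoint": True,                    # enables the RDS Data API used by Bedrock
        "ServerlessV2ScalingConfiguration": {"MinCapacity": 0.5, "MaxCapacity": 4.0},
    }

params = cluster_params()

# With valid AWS credentials, the calls would look like:
# import boto3
# rds = boto3.client("rds", region_name="us-east-1")
# rds.create_db_cluster(**params)
# rds.create_db_instance(
#     DBInstanceIdentifier="bedrock-kb-instance",
#     DBClusterIdentifier=params["DBClusterIdentifier"],
#     DBInstanceClass="db.serverless",   # Aurora Serverless v2
#     Engine="aurora-postgresql",
# )
```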

  7. Log in to your Aurora cluster either as the admin user (for example, postgres) or as a user with the rds_superuser privilege, and run the following code. Note the password you create for bedrock_user, because you'll need it in a later step when configuring a secret in Secrets Manager. Also note the table and column names, because they'll be used in the knowledge base workflow on the Amazon Bedrock console.

To do this:

  1. Install PostgreSQL on your system.
  2. Open pgAdmin 4 and click Add New Server.
  3. Give the server a name.
  4. Click Connection and enter the endpoint of the Aurora cluster instance in the Host name/address field.
  5. Modify the Amazon Aurora instance to allow Public Access.
  6. Enter the password and make sure you remember it.
  7. Click Save, and pgAdmin will connect to the Amazon Aurora cluster.
  8. You should now be able to see your newly created server. Open the PSQL Tool.
  9. Paste the following code there.

CREATE EXTENSION IF NOT EXISTS vector;

CREATE SCHEMA bedrock_integration;

CREATE ROLE bedrock_user LOGIN;

-- Replace 'your_password' with a password of your choice, and note it:
-- you will need it when configuring the secret in Secrets Manager.
ALTER USER bedrock_user WITH PASSWORD 'your_password';

GRANT ALL ON SCHEMA bedrock_integration TO bedrock_user;

SET SESSION AUTHORIZATION bedrock_user;

CREATE TABLE bedrock_integration.bedrock_kb (
  id uuid PRIMARY KEY,
  embedding vector(1536),  -- 1536 dimensions matches Amazon Titan Embeddings G1 - Text
  chunks text,
  metadata json
);

CREATE INDEX ON bedrock_integration.bedrock_kb
  USING hnsw (embedding vector_cosine_ops);

The Aurora cluster is now set up to be used as a knowledge base for Amazon Bedrock. We'll now create a secret in Secrets Manager that Amazon Bedrock will use to connect to the cluster.

Create a secret in Secrets Manager

  1. On the Secrets Manager console, create a new secret.
  2. For Secret type, select Credentials for Amazon RDS database.
  3. Under Credentials, enter the user name (for this post, we use bedrock_user) and the password for that role. (The password must match the one you set for bedrock_user in the SQL above.)
  4. In the Database section, select the cluster you're using for the knowledge base.
  5. For Secret name, enter a name for your secret.
  6. Create the secret.
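This step can also be scripted. Amazon Bedrock expects the secret to contain the database user name and password; the secret name below is a placeholder, and the actual call (commented out) requires AWS credentials:

```python
import json

# The password must match the one set for bedrock_user with ALTER USER.
secret_value = json.dumps({"username": "bedrock_user", "password": "your_password"})

# With valid AWS credentials, the call would look like:
# import boto3
# sm = boto3.client("secretsmanager", region_name="us-east-1")
# sm.create_secret(Name="bedrock-kb-aurora-secret", SecretString=secret_value)  # placeholder name
print(json.loads(secret_value)["username"])  # prints: bedrock_user
```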

Now, create an S3 bucket and upload the text file you want to use, before creating the knowledge base.
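Scripted, the bucket-and-upload flow might look like the following sketch (the bucket and file names are placeholders; S3 bucket names must be globally unique, and the commented-out calls need AWS credentials):

```python
# Placeholder names -- choose your own globally unique, lowercase bucket name.
BUCKET = "my-bedrock-kb-source"
KEY = "docs/faq.txt"

# With valid AWS credentials, the calls would look like:
# import boto3
# s3 = boto3.client("s3", region_name="us-east-1")
# s3.create_bucket(Bucket=BUCKET)
# s3.upload_file("faq.txt", BUCKET, KEY)
print(f"s3://{BUCKET}/{KEY}")  # prints: s3://my-bedrock-kb-source/docs/faq.txt
```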

Create a knowledge base for Amazon Bedrock

  1. On the Amazon Bedrock console, choose Knowledge base under Orchestration in the navigation pane.
  2. Choose Create knowledge base.
  3. For Knowledge base name, enter a name.
  4. For the Runtime role, select Create and use a new service role and enter a service role name.
  5. Choose Next.
  6. For Choose an archive in S3, select the S3 bucket to use as a data source and choose Choose.
  7. For the Embeddings model, select your model (for this post, we use Amazon Titan Embeddings G1 – Text).
  8. For the Vector database, select Choose a vector store you have created and select Amazon Aurora.
  9. Provide the following additional information (note the examples we use for this post):
    1. For Amazon Aurora DB Cluster ARN, enter the ARN you saved when creating your Aurora cluster.
    2. For Database name, enter postgres.
    3. For Table name, enter bedrock_integration.bedrock_kb.
    4. For Secret ARN, enter the ARN you saved when creating the secret for bedrock_user.
    5. For Vector field, enter embedding.
    6. For Text field, enter chunks.
    7. For Bedrock-managed metadata field, enter metadata.
    8. For Primary key, enter id.
  10. Choose Next.
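If you later automate this with boto3's bedrock-agent create_knowledge_base API, the console fields above map onto a storage configuration like the following. The ARNs are placeholders, and the exact shape is an assumption to verify against the current boto3 documentation:

```python
def aurora_storage_configuration(cluster_arn, secret_arn):
    # Mirrors the console fields: database, table, and the four field mappings.
    return {
        "type": "RDS",
        "rdsConfiguration": {
            "resourceArn": cluster_arn,
            "credentialsSecretArn": secret_arn,
            "databaseName": "postgres",
            "tableName": "bedrock_integration.bedrock_kb",
            "fieldMapping": {
                "primaryKeyField": "id",
                "vectorField": "embedding",
                "textField": "chunks",
                "metadataField": "metadata",
            },
        },
    }

config = aurora_storage_configuration(
    "arn:aws:rds:us-east-1:111122223333:cluster:bedrock-kb-cluster",  # placeholder
    "arn:aws:secretsmanager:us-east-1:111122223333:secret:example",   # placeholder
)
print(config["rdsConfiguration"]["fieldMapping"]["vectorField"])  # prints: embedding
```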

  11. Review the summary page and choose Sync.

This begins the process of converting the unstructured data stored in the S3 bucket into embeddings and storing them in your Aurora cluster.

  12. Choose Select model and choose Anthropic Claude 2, then choose Sync. Once the sync completes, ask a question in the test window to get a response grounded in your data.
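Querying can also be done programmatically via boto3's bedrock-agent-runtime client. The sketch below builds the request; the knowledge base ID and model ARN are placeholders, and the actual call (commented out) requires valid AWS credentials:

```python
def build_rag_query(question, kb_id, model_arn):
    # Request shape for a retrieve-and-generate (RAG) call against a knowledge base.
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

params = build_rag_query(
    "What does the uploaded document say about pricing?",
    "ABCDEFGHIJ",  # placeholder knowledge base ID
    "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2",  # placeholder
)

# With valid AWS credentials, the call would look like:
# import boto3
# client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
# response = client.retrieve_and_generate(**params)
# print(response["output"]["text"])
```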