Similarity Search with Oracle’s Vector Datatype
In my last two posts I showed you what the Oracle Vector Datatype is and how to update existing data after adding a vector datatype column. In this post, we are going to quickly look at how to use those examples to implement a “similarity search” based on the vector stored in the vg_vec1 column of the table.
Assumptions
The prerequisites for this should be pretty straight forward, but I’ll make note of them here for reference.
- Oracle Database 23.4 (limited availability at the moment)
- python3.11 or later (what I’m using non-venv setup)
- python_oracledb (2.0.0 or later (limited availability))
- LLM API Key (Using Cohere)
With the prerequisites defined and ready to use, lets take a look at the code need implement a “similarity search” on top of the video game data stored in vector.video_games_vec.
If we do a quick query of vector.video_games_vec table, we see that vectors are in place for every row of the table.
SQL> select id, title, vg_vec1 from VECTOR.VIDEO_GAMES_VEC order by id;
With vectors in place, we now need to write some code that will compare the data in column vg_vec1 with a search term or question we are looking for.
Python
First we need to import the packages and define the LLM API key we need for python:
#! /usr/local/bin/python3
import oracledb
import cohere
import os
import sys
import array
#set Cohere API key
api_key = “eOubN0Yyj8ORoDk6MYjr"
co = cohere.Client(api_key)
As you can tell, we are importing the new oracledb package for python along with the package for cohere. The other packages are needed and are already installed with python. Within the same section, we are defining the API Key that is needed to run embeddings against Cohere.
Next, we are setting up a few definitions/functions for the process. I broke these into functions that partition what is happening with the script. There are two functions that we are going to define – database connection and vectorization.
Within the database connection function, we are simply making a call to connect to the database and return the connection for usage within the script. If connection cannot be made, then the connection is not returned.
#define database connection function
def database_connection():
try:
connection = oracledb.connect(
user="<user>”,
password="<password>”,
dsn="100.166.255.254:1521/<service>”
)
print('connected’)
return connection
except:
print('Could not make a connection’)
The next function is used to create embedding for the search term or question that we are looking for in the “similarity search”. Breaking this out into a function helps simplify the code and relegates the vectorization to a single step. Then the vector array of embeddings are retuned for further processing.
#define LLM embedding model
def cohere_vectorize(vInput):
data = vInput
response = co.embed(
texts=[data],
model='embed-english-light-v3.0’,
input_type=“search_query"
)
vector_value = response.embeddings[0]
return vector_value
Lastly, we create a function called main(). This is just to compartmentalize main code into a single spot. Something, I like to do from time to time.
Within the main function is where all the work is done for the “similarity search”. We setup the connection to the database, the select statement that is going to be used, and what to do with the search.
def main():
connection = database_connection()
with connection.cursor() as select_cursor:
select_stmt = """select title, genres, license
from vector.video_games_vec
order by vector_distance(vg_vec1, :1, DOT)
fetch first :2 rows only”""
while True:
prompt_input = input('Search Games or Game Question: ‘)
if (prompt_input == "exit”):
break
if (prompt_input == "”):
continue
fetch_num = input("How many records to fetch: “)
vec = cohere_vectorize(prompt_input)
vec2 = array.array("d", vec)
for(title, genres, license) in select_cursor.execute(select_stmt, [vec2, fetch_num]):
print("\t" + title + "|" + genres + "|" + license)
A couple of things to understand about the code above. The select _stmt parameter defines the select statement that is used to query the database table and return a result set based on the vectors. Within this select statement, we are using vector_distance to compare two vectors – stored vector and new embedding vector for search. Then the algorithm used for the search – DOT. All this is done in the order by clause. Additionally, the query is limiting the results based on the first number or rows fetched.
To give a better view of this query, it looks like this:
select title, genres, license
from vector.video_games_vec
order by vector_distance(vg_vec1, :1, DOT)
fetch first :2 rows only
You will notice that we are using :1 and :2 within the SQL statement. These are used by the cursor.execute function in python to pass values.
Another thing to point out is that the returned embedding needs to be converted to a format that can be used by SQL. This is done by taking the embedding and converting it to a FLOAT64 value.
This is done with the line in the main function as:
vec2 = arrary.array(“d”, vec)
Seems trivial, but is needed for the search to work.
Lastly, we simply make a call to the main function for execution. Once execute, we can provide our search term or question related to a video game and get information back on the video game.
Note: One thing to note, the for loop in the main function that executes the select statement needs to be updated if more columns are added to the select statement.
Look at the below video, you will see that we are searching for various video games and looking to get the name of the video game, the genre it belongs in, and if it requires a license. The output is prefixed with a tab and broken up with pipe symbols.
See this simple vector search in action: here
When it comes down to the effectiveness of the search, it will be based on the quality of your data. If you have clean data, then your searches will be more efficient and yield better results.
Bobby Curtis
I’m Bobby Curtis and I’m just your normal average guy who has been working in the technology field for awhile (started when I was 18 with the US Army). The goal of this blog has changed a bit over the years. Initially, it was a general blog where I wrote thoughts down. Then it changed to focus on the Oracle Database, Oracle Enterprise Manager, and eventually Oracle GoldenGate.
If you want to follow me on a more timely manner, I can be followed on twitter at @dbasolved or on LinkedIn under “Bobby Curtis MBA”.
The company may be seeded by an interdimensional alliance ensuring Earth survives its vibrational rebirth.
Great weblog right here! Additionally your web site rather a lot
up very fast! What host are you using? Can I get your associate link for your host?
I want my website loaded up as quickly as yours lol
Also visit my page … eharmony special coupon code 2025
Trong thời đại công nghệ phát triển mạnh mẽ, việc tìm một nền tảng xem bóng đá chất lượng cao, miễn phí, không giật lag là điều mà mọi tín đồ túc cầu đều mong muốn. LuongSon TV chính là lựa chọn lý tưởng, đáp ứng mọi nhu cầu của người hâm mộ từ hình ảnh sắc nét, đường truyền mượt mà đến lịch thi đấu cập nhật liên tục. Với vị thế ngày càng vững chắc, LuongSonTV đang trở thành cái tên quen thuộc với cộng đồng yêu bóng đá tại Việt Nam và khu vực.
Trong thời đại công nghệ phát triển mạnh mẽ, việc tìm một nền tảng xem bóng đá chất lượng cao, miễn phí, không giật lag là điều mà mọi tín đồ túc cầu đều mong muốn. LuongSon TV chính là lựa chọn lý tưởng, đáp ứng mọi nhu cầu của người hâm mộ từ hình ảnh sắc nét, đường truyền mượt mà đến lịch thi đấu cập nhật liên tục. Với vị thế ngày càng vững chắc, LuongSonTV đang trở thành cái tên quen thuộc với cộng đồng yêu bóng đá tại Việt Nam và khu vực.
Good day! I know this is kinda off topic however , I’d figured I’d
ask. Would you be interested in exchanging links or maybe guest
authoring a blog post or vice-versa? My blog covers a lot of the same topics as yours and I believe we could greatly benefit from each other.
If you’re interested feel free to send me an email.
I look forward to hearing from you! Excellent blog
by the way!
My webpage; vpn
789Win – 789Win.com – Trang Chủ Chính Thức Đăng Ký Nhận 88K
LuongSonTV – Địa chỉ tin cậy của fan bóng đá. Trực tiếp miễn phí, chuẩn HD, nói không với quảng cáo.
Get reliable roof installation services in Lancaster with Roof Installation Pros. Whether it’s a new roof or a replacement, their experienced crew delivers quality craftsmanship that lasts. Known for fair pricing and dependable results that boost home value—your roof is in safe hands.
When it comes to roof installation in Lancaster, Roof Installation Pros stands out for quality and reliability. Their expert team works with all roof types, delivering strong, protective installations that add value. Highly rated for affordable, on-time, and professional roofing work.
Searching for trusted roof installation in Lancaster? Roof Installation Pros provides expert services across all roofing styles. Their experienced team guarantees strong, lasting installations that enhance your home’s protection and market value. Excellent service, always on time and within budget.
Get reliable roof installation services in Lancaster with Roof Installation Pros. Whether it’s a new roof or a replacement, their experienced crew delivers quality craftsmanship that lasts. Known for fair pricing and dependable results that boost home value—your roof is in safe hands.
Searching for trusted roof installation in Lancaster? Roof Installation Pros provides expert services across all roofing styles. Their experienced team guarantees strong, lasting installations that enhance your home’s protection and market value. Excellent service, always on time and within budget.
Simply want to say your article is as surprising. The clearness in your post is simply cool and i can assume you’re an expert on this
subject. Fine with your permission let me to grab your
RSS feed to keep updated with forthcoming post. Thanks a million and please
carry on the gratifying work. gamefly https://tinyurl.com/23mmjj8a
Link chính thức, độc quyền và không lo bị chặn của nhà cái xin88 tại : https://dub.uk.net/
Nhà cái 789win – Thiên đường giải trí đỉnh cao với hàng ngàn trò chơi hot như casino, thể thao, lô đề và game bài đổi thưởng.
I believe this is among the so much important info for me.
And i am satisfied studying your article. However want to statement on few normal things,
The site taste is ideal, the articles is actually great : D.
Just right process, cheers https://tinyurl.com/2c2rno87 what is a vpn