[Definitive Guide] The World’s Easiest Explanation of Oracle AI Vector Search & Hands-on

26ai_en

Hello everyone. Have you tried the currently trending “Oracle Database 26ai (and 23ai)” yet? Today, I will explain the new feature “AI Vector Search,” which is being called “revolutionary” in the database world.

“AI and vectors sound like math and seem difficult…”

“I have to write Python, right?”

To those of you holding such anxieties: The theme of this article is “The world’s easiest and most polite explanation.”

Actually, this feature is a magical mechanism where “the database comes to understand the meaning of words using only the usual SQL.”

We will deliver a luxurious two-part structure: the first half is a “Thorough Explanation,” and the second half is a “Hands-on Verification” that you can run by copy-pasting. We have packed everything an Oracle engineer absolutely needs to know!

1. What is “AI Vector Search” anyway?

In a nutshell, it is a function that allows you to search for data not by “keyword (text)” matching, but by the closeness of “meaning and context.”

Let’s compare how this differs from traditional database searches.

Traditional Search (Keyword Search)

Checks if the “text” matches.

  • Goal: Want to find “delicious food”.
  • Search Condition: WHERE text LIKE '%delicious%'
  • Result: “Delicious Curry” is a hit, but “Exquisite Curry” or “Mouth-watering Hamburger” are not hits. (Because the characters are different)

AI Vector Search (Semantic Search)

Checks if the “meaning” is similar.

  • Goal: Want to find “delicious food”.
  • Search Condition: Use AI Vector Search.
  • Result: Data with close meanings such as “Exquisite,” “Yummy,” “Supreme,” and “3-stars” are all hits!

To realize this, Oracle Database has directly incorporated an “AI brain (ability to handle vectors)” inside itself. That is the true identity of this feature.

2. What is a “Vector”? (Let’s understand with an image)

How does a computer judge that “meanings are close”? Computers cannot understand the meaning of words themselves. Therefore, words and images are converted into a sequence of numbers (coordinates). This is called “vector data.”

Imagine a “huge map.”

The image is placing all words onto a map of meaning.

  • “Apple” and “Orange” are placed close together on the map. (Because both are fruits)
  • “Apple” and “iPhone” might also be somewhat close. (Because contexts can be similar due to Apple Inc.)
  • “Apple” and “Automobile” are placed far apart, like opposite ends of the map. (Because the relationship is thin)

AI Vector Search does not involve difficult calculations; it is simply “a technology that measures distances on this map and finds things that are nearby.”

3. Why is Oracle’s AI Vector Search Chosen?

There are also “databases dedicated to vector search (such as Pinecone)” in the world. However, there are three huge advantages to Oracle incorporating this as a standard feature.

(1) You can do AI search with just SQL (Learning cost is almost zero)

For Oracle engineers, this is the greatest benefit.

There is no need to learn Python or new APIs. You can perform AI searches just by adding a little to the “usual SQL” you are familiar with.

-- Image (Example of performing semantic search and traditional condition search simultaneously in SQL)
SELECT product_name, description
FROM   products_table
WHERE  price < 5000                                     -- 1. Normal SQL condition (Business Logic)
AND    VECTOR_DISTANCE(vector_col, :search_word) < 0.1; -- 2. AI Search (Closeness of meaning)

In this way, Oracle’s strength is that it can execute complex processing such as “under 5000 yen (business data search)” AND “items with a luxurious feel (AI semantic search)” at high speed with a single SQL statement.

(2) No Data Movement Required (Strongest Security)

Usually, to perform AI search, it is necessary to move data to an external dedicated DB, but moving data outside leads to security risks (information leakage).

With Oracle, you can vectorize and search within the same database containing confidential personal information without taking the data out. Since Oracle’s robust security (encryption, access control) cultivated over many years can be used as is, it is optimal for enterprise use.

(3) Building RAG becomes overwhelmingly easier

What companies are rushing to do right now is RAG (Retrieval-Augmented Generation). This is a mechanism to “let AI like ChatGPT refer to internal confidential manuals and customer data to give smart answers.”

  1. Put internal manuals into Oracle.
  2. Vectorize (convert to numbers) and save them inside Oracle.
  3. When a user asks a question, Oracle instantly finds “related manuals.”
  4. Pass that information to the Generative AI to create an accurate answer.

This entire flow is completed within a single Oracle Database. Since there is no need to combine other tools, development becomes incredibly easy.

4. Pro’s Perspective: Technical Points

As an engineer, let’s also organize the technical specifications you should know.

  • New Data Type VECTOR:Just like traditional NUMBER or VARCHAR2, a VECTOR type has been added. This stores arrays of numbers.
  • Blazing Fast Index Vector Index:Dedicated indexes (such as HNSW) are provided so that searches can be done in an instant even with hundreds of millions of records.
  • RAC / Exadata Support:This is the true value of Oracle. This AI function works as is on RAC (availability configuration) and Exadata (ultra-high-speed machine) used in mission-critical systems.

[Hands-on Verification] Let’s Build a “Taste Search AI” with Just SQL

Thank you for waiting! Let’s actually get our hands dirty. This time, we will put aside difficult AI model integration settings and manually set vector values to experiment and feel the “mechanism of vectors.”

The theme is “Lunch Menu Recommendation.”

Prerequisite: Definition of Vector (3 Dimensions)

Real AI uses vectors of over 1000 dimensions, but here we will express taste with 3 numbers (3 dimensions) to make it easy for humans to understand.

  • 1st Number: Sweetness (0.0 – 1.0)
  • 2nd Number: Spiciness (0.0 – 1.0)
  • 3rd Number: Temperature (0.0 – 1.0)

Step 1. Create Table (DDL)

First, we create a table. Here, the VECTOR type makes its appearance.

SQL is written in English (single byte) except for comments so that it can be executed in any environment.

-- Create menu table
CREATE TABLE menu_items (
    id          NUMBER PRIMARY KEY,
    name        VARCHAR2(100),
    description VARCHAR2(200),
    -- [Important] This is the new feature! Define a 3-dimensional vector column (FLOAT32)
    features    VECTOR(3, FLOAT32)
);

💡 Point Explanation: What is VECTOR(3, FLOAT32)?

  • 3 (Dimensions): We used “3” because we are handling only three elements: “Sweetness, Spiciness, Temperature,” but when using actual AI models (such as OpenAI’s text-embedding-3-small), large numbers like 1536 or 3072 will go here.
  • FLOAT32 (Data Type): This is the precision of the numbers constituting the vector. Usually, FLOAT32 (single-precision floating-point) is used, but it is also possible to use INT8 (quantization) to reduce the size and save data capacity.
SQL> DESCRIBE menu_items
Name Null? Type
------------------------------- -------- ----------------------------
ID NOT NULL NUMBER
NAME VARCHAR2(100)
DESCRIPTION VARCHAR2(200)
FEATURES VECTOR(3, FLOAT32, DENSE)

Step 2. Register Data (INSERT)

We quantify the menu according to the rules of “Sweetness, Spiciness, Temperature” and register it.

This is the image of the task “vectorizing data.”

-- 1. Spicy Curry (Not sweet, Extremely spicy, Hot) -> [0.1, 0.9, 0.9]
INSERT INTO menu_items VALUES (
    1, 'Spicy Curry', 'Authentic curry with rich spices', '[0.1, 0.9, 0.9]'
);

-- 2. Shortcake (Very sweet, Not spicy, Cold) -> [0.9, 0.1, 0.2]
INSERT INTO menu_items VALUES (
    2, 'Shortcake', 'Sweet cake with strawberries', '[0.9, 0.1, 0.2]'
);

-- 3. Mapo Tofu (Not sweet, Spicy, Hot) -> [0.1, 0.8, 0.9]
INSERT INTO menu_items VALUES (
    3, 'Mapo Tofu', 'Sichuan style spicy tofu', '[0.1, 0.8, 0.9]'
);

-- 4. Vanilla Ice Cream (Sweet, Not spicy, Very cold) -> [0.9, 0.0, 0.0]
INSERT INTO menu_items VALUES (
    4, 'Vanilla Ice Cream', 'Rich milk flavor', '[0.9, 0.0, 0.0]'
);

COMMIT;
SQL> SELECT * FROM menu_items;

ID NAME DESCRIPTION FEATURES
---------- -------------------- ----------------------------------- ----------------------------------------------------------------------
1 Spicy Curry Authentic curry with rich spices [1.00000001E-001,8.99999976E-001,8.99999976E-001]
2 Shortcake Sweet cake with strawberries [8.99999976E-001,1.00000001E-001,2.00000003E-001]
3 Mapo Tofu Sichuan style spicy tofu [1.00000001E-001,8.00000012E-001,8.99999976E-001]
4 Vanilla Ice Cream Rich milk flavor [8.99999976E-001,0,0]

Step 3. Let’s Search! (SELECT)

Here starts the real deal. When I am in the mood for “something spicy and hot anyway (sweetness is unnecessary),” the desire vector in my head is ‘[0.0, 1.0, 1.0]’.

Let’s find menus close to this.

Note: VECTOR_DISTANCE may calculate “Cosine distance” by default, but here we explicitly specify EUCLIDEAN to calculate “distance on the map (Euclidean distance).”

-- Search in order of closeness to "Spicy and Hot things" (Sweetness=0, Spiciness=1, Temperature=1)
SELECT 
    name,
    description,
    -- VECTOR_DISTANCE function: Calculates the "distance" between two vectors.
    -- Specify 'EUCLIDEAN' as the 3rd argument to calculate straight-line distance in space.
    ROUND(VECTOR_DISTANCE(features, '[0.0, 1.0, 1.0]', EUCLIDEAN), 3) AS distance
FROM 
    menu_items
ORDER BY 
    distance ASC; -- Sort by smaller distance (more similar)

💡 Point Explanation: Meaning of ROUND(VECTOR_DISTANCE(…, EUCLIDEAN), 3)

This single line is the heart of AI search.

  • VECTOR_DISTANCE: Calculates the distance between “Menu Features” and “Features I want to search (Spicy and Hot).”
  • EUCLIDEAN: Specifies the calculation method of distance. We use straight-line distance in space (Euclidean distance) this time. (By default, COSINE distance etc. may be used)
  • ROUND(…, 3): Since the calculated distance becomes a fine decimal, we round it off at the 3rd decimal place to make it easier for humans to read.

Execution Result Image

NAMEDESCRIPTIONDISTANCEExplanation
Spicy CurryAuthentic curry…0.173Closest! (Spicy/Hot)
Mapo TofuSichuan style…0.245Quite close! (Spicy/Hot)
ShortcakeSweet cake…1.503Far (Sweet/Cold)
Vanilla Ice CreamRich milk flavor…1.676Furthest (Opposite nature)

Great success!

Although I did not specify keywords like “Curry” or “Tofu,” menus close to my intention (vector) of “Spicy and Hot” came to the top.

SQL> SELECT
2 name,
3 description,
4 ROUND(VECTOR_DISTANCE(features, '[0.0, 1.0, 1.0]', EUCLIDEAN), 3) AS distance
5 FROM
6 menu_items
7 ORDER BY
8 distance ASC;

NAME DESCRIPTION DISTANCE
-------------------- ----------------------------------- ----------
Spicy Curry Authentic curry with rich spices .173
Mapo Tofu Sichuan style spicy tofu .245
Shortcake Sweet cake with strawberries 1.503
Vanilla Ice Cream Rich milk flavor 1.676

Application: Filtering unique to SQL

Furthermore, here is a powerful example showing the meaning of doing this in Oracle Database.

Let’s combine it with a traditional WHERE clause.

“I want something spicy, but I’m not in the mood for curry today (Exclude curry).”

SELECT name, description
FROM   menu_items
WHERE  name NOT LIKE '%Curry%'              -- Traditional SQL condition (Exclude keyword)
AND    VECTOR_DISTANCE(features, '[0.0, 1.0, 1.0]', EUCLIDEAN) < 0.5 -- Vector search condition (Only distance less than 0.5)
ORDER BY VECTOR_DISTANCE(features, '[0.0, 1.0, 1.0]', EUCLIDEAN);

Result: Mapo Tofu

In this way, smoothly executing “filtering by business logic” and “ambiguous search by AI” with a single SQL is the true value of Oracle AI Vector Search.

SQL> SELECT name, description
2 FROM menu_items
3 WHERE name NOT LIKE '%Curry%'
4 AND VECTOR_DISTANCE(features, '[0.0, 1.0, 1.0]', EUCLIDEAN) < 0.5
5 ORDER BY VECTOR_DISTANCE(features, '[0.0, 1.0, 1.0]', EUCLIDEAN);

NAME DESCRIPTION
-------------------- -----------------------------------
Mapo Tofu Sichuan style spicy tofu

Frequently Asked Questions (FAQ)

This is a Q&A corner for those who read this far and thought, “I want to know more details!”

Q1. Can I AI-search data in tables I am currently using?

A. Yes, you can!

By simply adding a VECTOR type column (ALTER TABLE) to an existing table and vectorizing the text of existing data to UPDATE it, AI search becomes possible. There is no need to migrate data to another system.

Q2. Is Japanese search also properly supported?

A. Of course, it is possible!

If the AI model (Embedding model) that performs vectorization supports Japanese, there is no problem. Since Oracle Database itself only handles numerical vector data, it does not depend on the language.

Q3. Won’t it become slow if there are hundreds of millions of records?

A. Please rest assured.

By creating a dedicated index called Vector Index, you can search at blazing speed even with large-scale data. This is an application of index technology that Oracle has excelled at for many years.

Q4. Can I try it for free?

A. Yes!

You can fully use this AI Vector Search feature in the free developer edition called “Oracle Database 23ai Free.” You can easily launch it with Docker containers or VirtualBox.

Summary

How was it? Oracle 26ai’s AI Vector Search is truly a feature that fuses “robust database technology to date” with “latest AI technology.”

Even if it looks difficult, what you do is simple.

  1. Create a VECTOR type column in the table.
  2. Insert data.
  3. Search with VECTOR_DISTANCE.

“There is no need to set up a separate dedicated AI database; you can build AI apps safely and quickly using only the Oracle you are used to.”

This is the greatest value that we Oracle engineers can propose to customers and within our companies.

Please try it out starting from this SQL!

[reference]
Oracle AI Vector Search

コメント

Copied title and URL