[26ai New Feature] What is Oracle AI Vector Search? Explanation of Mechanism and Implementation

English

The biggest highlight feature of Oracle Database 26ai (formerly 23ai) is “AI Vector Search.” Until now, this technology was thought to require “specialized AI knowledge” or a “dedicated vector database,” but it can now be achieved using only the familiar Oracle Database and SQL.

In this article, I will explain the concept of AI Vector Search using familiar examples and introduce the steps to implement it using SQL, step-by-step.


1. Understanding “Vector Search” with Familiar Examples

How is Vector Search different from traditional database search (Keyword Search)? Let’s consider the examples of “searching for a book in a library” or “online shopping.”

Traditional Search (Keyword Search)

  • User: Enters “Delicious” “Pasta” recipe book into the search machine.
  • System: Displays only items where the book title or description completely contains the words “Delicious” and “Pasta.”
  • Weakness: A book titled “Exquisite Spaghetti” will not be a hit because it does not contain the word “Pasta.”

AI Vector Search (Vector Search / Semantic Search)

  • User: Consults the librarian (AI), “I want to make stylish Italian food for the weekend.”
  • System (Librarian): Understands the meaning (nuance) of the question and searches for books close to that image, even if the words do not match.
  • Result: Hits such as “Secrets of Authentic Bolognese” or “Pasta and Wine Taught by Pros.”

What is a “Vector”?

A “Vector” is this “meaning” converted into a sequence of numbers (coordinates) so that a computer can understand it. For example, every word is quantified using countless parameters like “sweetness,” “spiciness,” and “heat,” and placed as a point on a map.

  • “Curry” and “Stew” are located close to each other on the map.
  • “Curry” and “Ice Cream” are located far apart.

Vector search is the technology that searches for data by calculating this “proximity of distance on the map.”


2. Implementation Steps for Oracle AI Vector Search

Now, let’s actually perform a vector search in the Oracle Database 26ai environment. Here, we will implement a “Movie Review Search” as an example.

Prerequisites

  • An Oracle Database 26ai (or 23ai) environment exists.
  • You can log in with administrator privileges (such as SYS).

Step 0: Preparation (Connection to PDB and User Creation)

Oracle Database 26ai requires the Multitenant Architecture (CDB/PDB). Normal table creation and data operations are performed by a user within a PDB (Pluggable Database).

Here, we will explain the procedure to create a working user in FREEPDB1, which is created by default, and connect to it.

1. Switch to PDB as Administrator

First, log in with sqlplus / as sysdba on the database server terminal and move to the PDB.

-- Check current container (Should be in CDB$ROOT initially)
SHOW CON_NAME;

-- Switch to PDB (FREEPDB1)
ALTER SESSION SET CONTAINER = FREEPDB1;

-- Check if switched
SHOW CON_NAME;
-- Result: OK if FREEPDB1 is displayed

2. Creation of Working User

Create a local user in the PDB and grant necessary privileges. Using the convenient role DB_DEVELOPER_ROLE introduced from 23ai/26ai allows you to grant privileges necessary for development all at once.

-- Create user (Username: vector_user / Password: password)
CREATE USER vector_user IDENTIFIED BY password;

-- Grant developer role and tablespace privileges
GRANT DB_DEVELOPER_ROLE TO vector_user;
GRANT UNLIMITED TABLESPACE TO vector_user;

3. Verification of Remote Connection (Easy Connect)

Check if you can connect with the created user. The information for connecting from a remote client (A5:SQL Mk-2, SQL Developer, VS Code, etc.) is as follows. Specify the service name freepdb1 which can be confirmed with the lsnrctl status command.

  • Host Name: Server IP address or Host Name (e.g., 26ai-single)
  • Port: 1521
  • Service Name: freepdb1
  • User: vector_user
  • Password: password

Connection example with SQL*Plus:

-- Format: sqlplus user/password@host:port/service_name
sqlplus vector_user/password@localhost:1521/freepdb1

Step 1: Creating a Table with Vector Type

From here on, execute using the created vector_user. In addition to traditional VARCHAR2 and NUMBER, the new VECTOR type is now available.

-- Create movie information table
CREATE TABLE movies (
    movie_id    NUMBER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    title       VARCHAR2(100),
    genre       VARCHAR2(50),
    description VARCHAR2(1000),
    -- VECTOR type: Defined here as a 3-dimensional vector for simplicity
    -- In reality, it would be hundreds to thousands of dimensions
    movie_vector VECTOR(3, FLOAT32)
);

💡 Syntax Explanation: VECTOR Type

  • VECTOR(Dimension, Data Type): A new data type used in column definitions.
  • 3 (Dimension): Specifies the length of the vector (number of values). In the example, it is simplified to 3, but actual AI models (such as OpenAI’s text-embedding-3-small) specify large numbers like 1536 or 1024.
  • FLOAT32(Data Type): Specifies the precision of each number.
    • FLOAT32: Standard single-precision floating-point (Default).
    • FLOAT64: High-precision double-precision.
    • INT8: Used when you want to compress data size (Quantization).

Step 2: Data Registration (Inserting Vector Data)

Normally, you would use an Embedding Model like OpenAI to convert text into vectors, but here we will insert dummy numerical data directly to understand the mechanism.

  • Action Movie (Intense/Explosion/Fight) → Tendency like [0.9, 0.1, 0.1]
  • Romance Movie (Touching/Quiet/Love) → Tendency like [0.1, 0.9, 0.2]
  • Sci-Fi Movie (Future/Space/Science) → Tendency like [0.1, 0.1, 0.9]

We will assume this and insert the data.

INSERT INTO movies (title, genre, description, movie_vector) VALUES 
('Explosive Chase 2026', 'Action', 'Intense car chases and continuous explosions', '[0.95, 0.05, 0.05]');

INSERT INTO movies (title, genre, description, movie_vector) VALUES 
('Promise of the Starry Sky', 'Romance', 'A quiet love story of two people met in a rural town', '[0.05, 0.95, 0.1]');

INSERT INTO movies (title, genre, description, movie_vector) VALUES 
('Galaxy War', 'Sci-Fi', 'Fierce battle between space fleets and aliens', '[0.6, 0.1, 0.8]'); -- Has Action elements and Sci-Fi elements

COMMIT;

💡 Syntax Explanation: Notation of Vector Data

  • '[0.95, 0.05, 0.05]': When handling vector data in SQL, separate numbers with commas and represent them as a string enclosed in brackets []. Oracle automatically converts and stores this into the VECTOR type.

Step 3: Execution of Vector Search (Similarity Search)

This is the highlight. Suppose a user searches: “I want to see a refreshing intense movie.” Let’s assume this search query, when vectorized, becomes [0.9, 0.1, 0.0] (Strong Action element).

To find movies close to this vector, use the VECTOR_DISTANCE function.

-- Search in order of closeness to "Refreshing intense movie ([0.9, 0.1, 0.0])"
SELECT 
    title,
    genre,
    description,
    -- Calculate distance from search vector (smaller number means more similar)
    VECTOR_DISTANCE(movie_vector, '[0.9, 0.1, 0.0]', COSINE) AS distance
FROM 
    movies
ORDER BY 
    distance ASC
FETCH FIRST 3 ROWS ONLY;

💡 Syntax Explanation: VECTOR_DISTANCE Function

  • VECTOR_DISTANCE(Target Column, Search Vector, Distance Metric): A function that calculates the “distance” between two vectors.
    • 1st Argument: The vector column in the table (here, movie_vector).
    • 2nd Argument: The vector of the search query you want to compare.
    • 3rd Argument: Calculation method for distance (Metric).
      • COSINE (Cosine Similarity): Measures the closeness of the “direction” of vectors. Most commonly used for text and semantic search.
      • EUCLIDEAN (Euclidean Distance): Measures the straight-line distance between two points. Suitable for physical positional relationships, etc.
      • DOT (Dot Product): Considers both the magnitude and direction of the vectors.
  • ORDER BY distance ASC: Since a “closer” distance means it is more similar, sorting in ascending order (ASC) is basic.

Image of Execution Result:

TITLE                      GENRE   DESCRIPTION                               DISTANCE
-------------------------- ------- ----------------------------------------- --------
Explosive Chase 2026       Action  Intense car chases and continuous...      0.002
Galaxy War                 Sci-Fi  Fierce battle between space fleets...     0.150
Promise of the Starry Sky  Romance A quiet love story of two people...       0.850

You can see that the “Action” movie comes to the top, followed by “Sci-Fi” which includes action elements, and “Romance,” which has a completely different tendency, has a far distance (low similarity). This is Vector Search.


3. Why Do It with Oracle Database?

Dedicated vector databases (Pinecone, Milvus, etc.) exist, but the merits of doing it with Oracle Database are as follows:

  • Unified Data Management: Since you can place vector data in the same place as customer data and inventory data (Relational Data), there is no effort in data movement or security risks.
  • Hybrid Search is Possible: Combinations with traditional SQL conditions, such as “Action movie (Vector Search)” AND “Released after 2020 (Relational Search),” can be written very fast and easily.SQLSELECT title FROM movies WHERE release_year >= 2020 -- Existing SQL condition ORDER BY VECTOR_DISTANCE(movie_vector, :query_vector, COSINE); -- Vector search
  • Utilization of Existing Skills: You can implement cutting-edge AI search functions using only the usual SQL, without needing to learn new languages or APIs.

4. Summary

Oracle AI Vector Search is a groundbreaking feature that lets the database understand “meaning.”

  • Intuitive Search: Even if keywords do not match, you can find data close to the user’s intent.
  • Complete with SQL: Implementable without complex AI programming.
  • Application to RAG: Optimal as a search foundation for providing internal data to Generative AI (such as ChatGPT).

I recommend you first use Oracle Database 26ai Free edition and actually try out the VECTOR type and VECTOR_DISTANCE function.

Would you like me to create a follow-up article on how to implement RAG (Retrieval-Augmented Generation) connecting this Vector Search with an LLM?

[reference]
Oracle AI Vector Search

コメント

Copied title and URL