Introduction to Oracle SQL Subqueries! From Basics to Speed Optimization

English

Do you ever struggle when writing SQL, thinking “I want to use the result of another aggregation in a condition” or “I want to complete a two-step process in a single execution”?

Utilizing Oracle subqueries allows you to describe complex data retrieval simply and can lead to improved application performance.

This article comprehensively explains everything from basic syntax to correlated subqueries and practical techniques using EXISTS and the WITH clause, assuming Oracle Database 19c.

Conclusion: Key Points for Utilizing Subqueries

For those short on time, here are the main points of this article.

  • What is a Subquery: A “nested” SELECT statement embedded within a SQL statement.
  • Usage Distinction:
    • Scalar: Result is “1 row, 1 column”. Compare using =.
    • Multiple-row: Result is a “list”. Compare using IN.
    • Correlated: Evaluated for each row of the outer query. Be careful as it can become heavy.
  • Best Practice: We recommend using the WITH clause to enhance readability and reusability.
  • Caution: NOT IN results in an empty set if it includes NULL (use NOT EXISTS instead).

1. Basics of Subqueries

Definition and Mechanism

A Subquery is a SELECT statement enclosed in parentheses () that is written inside the main SQL statement (outer query).

Usually, SQL executes the inner subquery first, and the outer main query is executed using that result (except for correlated subqueries).

  • Main Query: The parent SQL (Outer)
  • Subquery: The child SQL (Inner)

2. Creating Sample Data for Practice

We will create sample tables so you can verify the behavior in your own environment along with the explanation.

Note: We will create them in a format that does not include multi-byte characters (full-width) in the SQL.

Prerequisites

  • Connection User: A user with general creation privileges (SCOTT, etc.)
  • Environment: Oracle Database 19c Enterprise/Standard Edition 2

Please execute the following SQL to create EMPLOYEES and DEPARTMENTS.

-- Drop tables if they already exist (Optional)
-- DROP TABLE employees PURGE;
-- DROP TABLE departments PURGE;

-- 1. Create departments table
CREATE TABLE departments (
    dept_id   NUMBER PRIMARY KEY,
    dept_name VARCHAR2(50)
);

-- 2. Create employees table
CREATE TABLE employees (
    emp_id    NUMBER PRIMARY KEY,
    emp_name  VARCHAR2(50),
    dept_id   NUMBER,
    salary    NUMBER,
    FOREIGN KEY (dept_id) REFERENCES departments(dept_id)
);

-- 3. Insert data
-- Departments: 1=Sales, 2=Dev, 3=HR, 4=Marketing (Empty department)
INSERT INTO departments VALUES (1, 'Sales');
INSERT INTO departments VALUES (2, 'Dev');
INSERT INTO departments VALUES (3, 'HR');
INSERT INTO departments VALUES (4, 'Marketing');

-- Employees
INSERT INTO employees VALUES (1, 'Taro Tanaka', 1, 300000);
INSERT INTO employees VALUES (2, 'Jiro Suzuki', 2, 400000);
INSERT INTO employees VALUES (3, 'Saburo Sato', 1, 350000);
INSERT INTO employees VALUES (4, 'Shiro Yamada', 3, 500000);

COMMIT;

▼ Created Tables

SQL> SELECT * FROM departments;

DEPT_ID DEPT_NAME
---------- --------------------
1 Sales
2 Dev
3 HR
4 Marketing

SQL> SELECT * FROM employees;

EMP_ID EMP_NAME DEPT_ID SALARY
---------- -------------------- ---------- ----------
1 Taro Tanaka 1 300000
2 Jiro Suzuki 2 400000
3 Saburo Sato 1 350000
4 Shiro Yamada 3 500000

3. Practice: Types and Writing of Subqueries

3-1. Scalar Subquery (Single Row)

This is a pattern where the result of the subquery is “1 row and 1 column“. It is used with comparison operators such as = or <.

Example: Retrieve employees belonging to the ‘Sales’ department (ID unknown)

SELECT emp_name, salary
FROM   employees
WHERE  dept_id = (
           SELECT dept_id 
           FROM   departments 
           WHERE  dept_name = 'Sales'
       );

▼ Execution Result

SQL> SELECT emp_name, salary
2 FROM employees
3 WHERE dept_id = (
4 SELECT dept_id
5 FROM departments
6 WHERE dept_name = 'Sales'
7 );

EMP_NAME SALARY
-------------------------------------------------- ----------
Taro Tanaka 300000
Saburo Sato 350000

The SQL inside the parentheses is executed first, returning 1. Then, the main query is executed as WHERE dept_id = 1.

3-2. Multiple-row Subquery

This is a pattern where the result of the subquery consists of “multiple rows“. Connecting with = causes an error, so use the IN operator.

Example: Retrieve employees belonging to the ‘Sales’ or ‘Dev’ departments

SELECT emp_name, salary
FROM   employees
WHERE  dept_id IN (
           SELECT dept_id 
           FROM   departments 
           WHERE  dept_name IN ('Sales', 'Dev')
       );

▼ Execution Result

SQL> SELECT emp_name, salary
2 FROM employees
3 WHERE dept_id IN (
4 SELECT dept_id
5 FROM departments
6 WHERE dept_name IN ('Sales', 'Dev')
7 );

EMP_NAME SALARY
-------------------------------------------------- ----------
Taro Tanaka 300000
Jiro Suzuki 400000
Saburo Sato 350000

Image of Execution Result

The subquery returns 1 and 2. The main query is evaluated as WHERE dept_id IN (1, 2).

3-3. Correlated Subquery

This is a point that often becomes a hurdle for beginners.

This is a pattern where the subquery refers to columns of the outer table.

Example: Find employees who earn a higher salary than the “average salary” of their own department

SELECT e.emp_name, e.salary, e.dept_id
FROM   employees e
WHERE  e.salary > (
           SELECT AVG(s.salary) 
           FROM   employees s 
           WHERE  s.dept_id = e.dept_id
       );

▼ Execution Result

SQL> SELECT e.emp_name, e.salary, e.dept_id
2 FROM employees e
3 WHERE e.salary > (
4 SELECT AVG(s.salary)
5 FROM employees s
6 WHERE s.dept_id = e.dept_id
7 );

EMP_NAME SALARY DEPT_ID
-------------------------------------------------- ---------- ----------
Saburo Sato 350000 1

Explanation

  1. The main query reads 1 row from the employees table (e) (e.g., Taro Tanaka, Dept 1).
  2. The subquery is executed. At this time, 1 is substituted into e.dept_id in WHERE s.dept_id = e.dept_id.
  3. The average salary for Dept 1 (325,000) is calculated.
  4. It is compared with Taro’s salary (300,000) (300,000 > 325,000 is False).
  5. Proceeds to the next row and repeats.

4. Advanced: EXISTS and WITH Clause

4-1. EXISTS / NOT EXISTS (Existence Check)

This checks only “whether data exists“, not the row data itself. Especially when dealing with large amounts of data, it may operate faster than IN.

Example: Find departments with no employees (NOT EXISTS)

SELECT d.dept_name
FROM   departments d
WHERE  NOT EXISTS (
           SELECT 1
           FROM   employees e
           WHERE  e.dept_id = d.dept_id
       );

▼ Execution Result

SQL> SELECT d.dept_name
2 FROM departments d
3 WHERE NOT EXISTS (
4 SELECT 1
5 FROM employees e
6 WHERE e.dept_id = d.dept_id
7 );

DEPT_NAME
--------------------------------------------------
Marketing

(Because no one is assigned to the Marketing department)

4-2. WITH Clause (Common Table Expression)

When SQL becomes long, extracting the subquery as a WITH clause at the beginning dramatically improves readability. This is a recommended writing style in modern Oracle SQL development.

Example: Calculate the average salary for each department and list employees with a higher salary than that (Rewriting correlated subquery)

-- Pre-processing: Define average salary per department
WITH dept_avg AS (
    SELECT dept_id, AVG(salary) AS avg_sal
    FROM   employees
    GROUP  BY dept_id
)
-- Main processing
SELECT e.emp_name, e.salary, a.avg_sal
FROM   employees e
JOIN   dept_avg a ON e.dept_id = a.dept_id
WHERE  e.salary > a.avg_sal;

▼ Execution Result

SQL> WITH dept_avg AS (
2 SELECT dept_id, AVG(salary) AS avg_sal
3 FROM employees
4 GROUP BY dept_id
5 )
6 SELECT e.emp_name, e.salary, a.avg_sal
7 FROM employees e
8 JOIN dept_avg a ON e.dept_id = a.dept_id
9 WHERE e.salary > a.avg_sal;

EMP_NAME SALARY AVG_SAL
-------------------------------------------------- ---------- ----------
Saburo Sato 350000 325000

5. Troubleshooting (Common Errors)

Here are frequent errors and countermeasures when using subqueries.

Error CodeError MessageCause and Countermeasure
ORA-01427single-row subquery returns more than one rowA subquery being compared with = is returning multiple values. Change to IN or make the subquery condition stricter to narrow it down to 1 row.
ORA-00904invalid identifierThe column name in the subquery is incorrect, or the alias specification when referencing the outer column is incorrect.
No Data
(Result is 0 rows)
This is a pitfall of NOT IN. If the subquery result contains NULL, the entire result becomes empty. Please use NOT EXISTS.

6. Operations & Performance Considerations

Merits and Demerits

  • Merits: Complex logic can be completed in a single SQL statement. Reduces lines of code on the application side.
  • Demerits: If the nesting is too deep, it becomes “spaghetti code” that third parties cannot read.

Performance Pitfalls

Correlated subqueries execute the “subquery for the number of rows”, so if the outer table is huge, performance degrades significantly.

  • Countermeasure: Consider rewriting to a join process using JOIN or the WITH clause if possible.
  • Verification: Obtain an Execution Plan (Explain Plan) and check if indexes are being used.

7. FAQ (Frequently Asked Questions)

Q1. Which is faster, Subquery or Join (JOIN)?

A. It is case-by-case, but currently, the optimizer (Oracle’s optimization feature) is excellent, so often the same processing plan is used internally regardless of how it is written. However, since correlated subqueries tend to be slow, rewriting to JOIN or the WITH clause is recommended for large amounts of data.

Q2. Can I use subqueries in an UPDATE statement?

A. Yes, it is possible. It can be used for value specification in the SET clause or condition specification in the WHERE clause.

Example: UPDATE employees SET salary = (SELECT …)

Q3. Is there a limit to subquery nesting?

A. According to Oracle specifications, there is no strict upper limit on the nesting level of subqueries, but from the perspective of readability and memory consumption, it is best practice to keep it to about 3 levels at most.

8. Summary

Subqueries are an essential technique that expands the expressiveness of SQL.

  • Use = for single rows, IN for multiple rows.
  • Correlated subqueries are powerful but watch out for performance.
  • If it looks like it will become complex, organize it with the WITH clause.
  • It is dangerous if NULL is mixed in NOT IN. Get into the habit of using NOT EXISTS.

First, please copy the sample code and execute it, and try to experience the behavior by intentionally generating errors (ORA-01427).

This article explains concepts targeting Oracle Database 19c (screens and default values may differ in other versions).

[reference]
Oracle Database SQL Language Reference, 19c

コメント

Copied title and URL