Oracle SQL Aggregate Functions: Handling NULLs and Using DISTINCT

English

“The total sum is less than expected.” “The average value doesn’t match my manual calculation.”
When performing data analysis in Oracle Database, the biggest cause of such discrepancies lies in the handling of NULL.

This article explains the behavior of NULL in aggregate functions (SUM, AVG, MAX, MIN, COUNT) and the correct usage of the DISTINCT option to exclude duplicates, targeting the Oracle 19c environment. We will verify the behavior using a test table to aim for SQL without aggregation errors.


Conclusion: List of Rules During Aggregation

First, let’s grasp the basic rules of NULL and DISTINCT in aggregate functions.

1. Handling NULL (Basically “Ignored”)

Aggregate functions basically treat NULL as “non-existent data”. It is excluded from both the calculation target rows and the denominator (row count). The only exception is COUNT(*).

FunctionHandling NULLRemarks
SUM / AVGIgnoredRows where the value is NULL are not included in the formula.
MAX / MINIgnoredFinds max/min from non-NULL values.
COUNT(column)IgnoredCounts only rows containing a value.
COUNT(*)IncludedCounts as 1 row even if the row contains only NULLs.

2. Handling DISTINCT (Duplicate Elimination)

If you write DISTINCT inside the function, aggregation is performed after removing duplicate data.

  • Syntax Example: SUM(DISTINCT SAL)
  • Process Order: Duplicate Elimination → Aggregation

3. Building a Verification Environment

Create the EMP_TEST table for verification to understand the behavior correctly.
You can execute this in your own learning environment (SQL*Plus, SQL Developer, etc.).

Prerequisites:

  • Oracle Database 19c (Enterprise/Standard)
  • User requires CREATE TABLE privilege and tablespace quota.

Oracle Master Exam Tip:

Creating tables and manipulating data are fundamental skills often tested.

-- 1. Create verification table
CREATE TABLE EMP_TEST (
    EMP_ID  NUMBER PRIMARY KEY,    -- Employee ID
    NAME    VARCHAR2(50),          -- Name
    JOB     VARCHAR2(50),          -- Job Title
    SAL     NUMBER,                -- Salary (Has NULLs)
    DEPT_ID NUMBER,                -- Department ID
    BONUS   NUMBER                 -- Bonus (Has NULLs)
);

-- 2. Insert data (Intentionally including NULLs)
INSERT INTO EMP_TEST VALUES (1, 'Alice', 'Manager', 3000, 10, NULL);
INSERT INTO EMP_TEST VALUES (2, 'Bob', 'Clerk', NULL, 20, NULL);
INSERT INTO EMP_TEST VALUES (3, 'Charlie', 'Analyst', 4000, 10, 500);
INSERT INTO EMP_TEST VALUES (4, 'David', 'Clerk', 2500, 30, 300);
INSERT INTO EMP_TEST VALUES (5, 'Emma', 'Salesman', 2000, 30, 200);
INSERT INTO EMP_TEST VALUES (6, 'Frank', 'Manager', 3000, 20, NULL);
INSERT INTO EMP_TEST VALUES (7, 'Grace', 'Analyst', 4000, 20, 500);
INSERT INTO EMP_TEST VALUES (8, 'Helen', 'Salesman', 2000, 30, 200);
INSERT INTO EMP_TEST VALUES (9, 'Ian', 'Clerk', 2500, 20, NULL);
INSERT INTO EMP_TEST VALUES (10, 'Jane', 'Clerk', NULL, 10, NULL);

COMMIT;

4. Practice: Behavior of Aggregate Functions and NULL

Using the actual data, we will confirm how NULL affects calculation results.

4-1. SUM (Total) and COUNT (Number of items)

The SAL (Salary) column contains 2 NULLs (out of 10 rows, only 8 have values).

SELECT 
    SUM(SAL)   AS TOTAL_SAL,    -- Sum excluding NULL
    COUNT(SAL) AS COUNT_SAL,    -- Number of non-NULL rows
    COUNT(*)   AS COUNT_ALL     -- Total number of rows
FROM EMP_TEST;

Execution Result and Explanation

SQL> SELECT
  2      SUM(SAL)   AS TOTAL_SAL,
  3      COUNT(SAL) AS COUNT_SAL,
  4      COUNT(*)   AS COUNT_ALL
  5  FROM EMP_TEST;

 TOTAL_SAL  COUNT_SAL  COUNT_ALL
---------- ---------- ----------
     23000          8         10
  • TOTAL_SAL (23000): The result of adding the salaries of 8 people excluding NULL. NULL is not treated as 0, but simply ignored.
  • COUNT_SAL (8): The number of employees who have salary information.
  • COUNT_ALL (10): The number of rows in the entire table.

4-2. [Important] The Pitfall of AVG (Average)

When calculating average values, special attention is needed because the handling of NULL affects the “denominator”.

SELECT 
    AVG(SAL)           AS AVG_IGNORE_NULL,  -- A: Average ignoring NULL
    AVG(NVL(SAL, 0))   AS AVG_INCLUDE_NULL  -- B: Average considering NULL as 0
FROM EMP_TEST;

Execution Result

SQL> SELECT
  2      AVG(SAL)           AS AVG_IGNORE_NULL,
  3      AVG(NVL(SAL, 0))   AS AVG_INCLUDE_NULL
  4  FROM EMP_TEST;

AVG_IGNORE_NULL AVG_INCLUDE_NULL
--------------- ----------------
           2875             2300
  • A: 2875 (23000 ÷ 8 people): Average amount only for those receiving a salary.
  • B: 2300 (23000 ÷ 10 people): Since NULL was converted to 0 with NVL(SAL, 0), the denominator became the total number of employees (10). If you want to output “the average of everyone including those with undecided salaries,” this is the correct answer.

5. Practice: Aggregation Eliminating Duplicates Using DISTINCT

Using the DISTINCT keyword allows you to aggregate only using unique values, excluding duplicates. This is effective when you want to know the “number of types.”

5-1. SUM + DISTINCT

Count people with the same salary amount as 1 person and calculate the sum (This is not very practical, but we do it to understand the behavior).

SELECT 
    SUM(SAL)          AS NORMAL_SUM,    -- Normal sum
    SUM(DISTINCT SAL) AS UNIQUE_SUM     -- Sum excluding duplicates
FROM EMP_TEST;

Execution Result

SQL> SELECT
  2      SUM(SAL)          AS NORMAL_SUM,
  3      SUM(DISTINCT SAL) AS UNIQUE_SUM
  4  FROM EMP_TEST;

NORMAL_SUM UNIQUE_SUM
---------- ----------
     23000      11500

Breakdown of Calculation (UNIQUE_SUM):
Unique values: 2000, 2500, 3000, 4000
Sum: 11500

5-2. COUNT + DISTINCT (Frequently Used)

Often used when counting numbers of types, such as “How many departments exist?”.

SELECT 
    COUNT(DEPT_ID)          AS ALL_ROWS,    -- Count of all rows
    COUNT(DISTINCT DEPT_ID) AS UNIQUE_DEPTS -- Number of department types
FROM EMP_TEST;

Execution Result

SQL> SELECT
  2      COUNT(DEPT_ID)          AS ALL_ROWS,    -- Count of all rows
  3      COUNT(DISTINCT DEPT_ID) AS UNIQUE_DEPTS -- Number of department types
  4  FROM EMP_TEST;

  ALL_ROWS UNIQUE_DEPTS
---------- ------------
        10            3

Explanation:

DEPT_ID repeatedly contains 3 types of values: 10, 20, and 30. COUNT(DISTINCT …) returns “3”, which is the count of those types.


6. Troubleshooting & Notes

Impact on Performance

When DISTINCT is used, “sorting” and “hash processing” occur internally in the database, incurring costs to search for duplicates. Overusing COUNT(DISTINCT column) on tables with millions of rows can cause query response deterioration.

Avoiding ORA Errors

DISTINCT is written inside the parentheses of the aggregate function.

  • Correct: COUNT(DISTINCT col)
  • Incorrect: DISTINCT COUNT(col) … This results in a syntax error.

7. FAQ: Frequently Asked Questions

Q1. I want to count NULL as a specific value (e.g., ‘Unset’).

A. Use the NVL or COALESCE function.

By doing COUNT(DISTINCT NVL(JOB, ‘Unset’)), you can count NULL as one type called ‘Unset’.

Q2. Can I perform DISTINCT aggregation on multiple columns?

A. Basic aggregate functions only allow specifying one column (or expression).

You need to avoid this by concatenating strings like COUNT(DISTINCT col1 || col2) or using a subquery.

Q3. Should I calculate average treating NULL as 0 in AVG function?

A. It depends on the business requirements.

Check if it is “Average of actual values (Exclude NULL)” or “Overall expected value (NULL is 0)”. If it’s unclear, it is helpful to list both in the report.


Summary

  1. Basic Rule: All aggregate functions ignore NULL, except for COUNT(*).
  2. The Trap of Average: When using AVG, always consider whether to ignore NULL or turn it into 0 with NVL.
  3. Duplicate Elimination: Write DISTINCT inside parentheses (e.g., COUNT(DISTINCT column_name)).
  4. Actual Machine Verification: When calculations don’t match, check if NULL data is mixed in.

This article explains using Oracle Database 19c (screens and default values may differ in other versions).

[reference]
Oracle Database SQL Language Reference, 19c

コメント

Copied title and URL