“The total sum is less than expected.” “The average value doesn’t match my manual calculation.”
When performing data analysis in Oracle Database, the biggest cause of such discrepancies lies in the handling of NULL.
This article explains the behavior of NULL in aggregate functions (SUM, AVG, MAX, MIN, COUNT) and the correct usage of the DISTINCT option to exclude duplicates, targeting the Oracle 19c environment. We will verify the behavior using a test table to aim for SQL without aggregation errors.
Conclusion: List of Rules During Aggregation
First, let’s grasp the basic rules of NULL and DISTINCT in aggregate functions.
1. Handling NULL (Basically “Ignored”)
Aggregate functions basically treat NULL as “non-existent data”. It is excluded from both the calculation target rows and the denominator (row count). The only exception is COUNT(*).
| Function | Handling NULL | Remarks |
| SUM / AVG | Ignored | Rows where the value is NULL are not included in the formula. |
| MAX / MIN | Ignored | Finds max/min from non-NULL values. |
| COUNT(column) | Ignored | Counts only rows containing a value. |
| COUNT(*) | Included | Counts as 1 row even if the row contains only NULLs. |
2. Handling DISTINCT (Duplicate Elimination)
If you write DISTINCT inside the function, aggregation is performed after removing duplicate data.
- Syntax Example:
SUM(DISTINCT SAL) - Process Order: Duplicate Elimination → Aggregation
3. Building a Verification Environment
Create the EMP_TEST table for verification to understand the behavior correctly.
You can execute this in your own learning environment (SQL*Plus, SQL Developer, etc.).
Prerequisites:
- Oracle Database 19c (Enterprise/Standard)
- User requires
CREATE TABLEprivilege and tablespace quota.
Oracle Master Exam Tip:
Creating tables and manipulating data are fundamental skills often tested.
-- 1. Create verification table
CREATE TABLE EMP_TEST (
EMP_ID NUMBER PRIMARY KEY, -- Employee ID
NAME VARCHAR2(50), -- Name
JOB VARCHAR2(50), -- Job Title
SAL NUMBER, -- Salary (Has NULLs)
DEPT_ID NUMBER, -- Department ID
BONUS NUMBER -- Bonus (Has NULLs)
);
-- 2. Insert data (Intentionally including NULLs)
INSERT INTO EMP_TEST VALUES (1, 'Alice', 'Manager', 3000, 10, NULL);
INSERT INTO EMP_TEST VALUES (2, 'Bob', 'Clerk', NULL, 20, NULL);
INSERT INTO EMP_TEST VALUES (3, 'Charlie', 'Analyst', 4000, 10, 500);
INSERT INTO EMP_TEST VALUES (4, 'David', 'Clerk', 2500, 30, 300);
INSERT INTO EMP_TEST VALUES (5, 'Emma', 'Salesman', 2000, 30, 200);
INSERT INTO EMP_TEST VALUES (6, 'Frank', 'Manager', 3000, 20, NULL);
INSERT INTO EMP_TEST VALUES (7, 'Grace', 'Analyst', 4000, 20, 500);
INSERT INTO EMP_TEST VALUES (8, 'Helen', 'Salesman', 2000, 30, 200);
INSERT INTO EMP_TEST VALUES (9, 'Ian', 'Clerk', 2500, 20, NULL);
INSERT INTO EMP_TEST VALUES (10, 'Jane', 'Clerk', NULL, 10, NULL);
COMMIT;
4. Practice: Behavior of Aggregate Functions and NULL
Using the actual data, we will confirm how NULL affects calculation results.
4-1. SUM (Total) and COUNT (Number of items)
The SAL (Salary) column contains 2 NULLs (out of 10 rows, only 8 have values).
SELECT
SUM(SAL) AS TOTAL_SAL, -- Sum excluding NULL
COUNT(SAL) AS COUNT_SAL, -- Number of non-NULL rows
COUNT(*) AS COUNT_ALL -- Total number of rows
FROM EMP_TEST;
Execution Result and Explanation
SQL> SELECT
2 SUM(SAL) AS TOTAL_SAL,
3 COUNT(SAL) AS COUNT_SAL,
4 COUNT(*) AS COUNT_ALL
5 FROM EMP_TEST;
TOTAL_SAL COUNT_SAL COUNT_ALL
---------- ---------- ----------
23000 8 10
- TOTAL_SAL (23000): The result of adding the salaries of 8 people excluding NULL. NULL is not treated as 0, but simply ignored.
- COUNT_SAL (8): The number of employees who have salary information.
- COUNT_ALL (10): The number of rows in the entire table.
4-2. [Important] The Pitfall of AVG (Average)
When calculating average values, special attention is needed because the handling of NULL affects the “denominator”.
SELECT
AVG(SAL) AS AVG_IGNORE_NULL, -- A: Average ignoring NULL
AVG(NVL(SAL, 0)) AS AVG_INCLUDE_NULL -- B: Average considering NULL as 0
FROM EMP_TEST;
Execution Result
SQL> SELECT
2 AVG(SAL) AS AVG_IGNORE_NULL,
3 AVG(NVL(SAL, 0)) AS AVG_INCLUDE_NULL
4 FROM EMP_TEST;
AVG_IGNORE_NULL AVG_INCLUDE_NULL
--------------- ----------------
2875 2300
- A: 2875 (23000 ÷ 8 people): Average amount only for those receiving a salary.
- B: 2300 (23000 ÷ 10 people): Since NULL was converted to 0 with
NVL(SAL, 0), the denominator became the total number of employees (10). If you want to output “the average of everyone including those with undecided salaries,” this is the correct answer.
5. Practice: Aggregation Eliminating Duplicates Using DISTINCT
Using the DISTINCT keyword allows you to aggregate only using unique values, excluding duplicates. This is effective when you want to know the “number of types.”
5-1. SUM + DISTINCT
Count people with the same salary amount as 1 person and calculate the sum (This is not very practical, but we do it to understand the behavior).
SELECT
SUM(SAL) AS NORMAL_SUM, -- Normal sum
SUM(DISTINCT SAL) AS UNIQUE_SUM -- Sum excluding duplicates
FROM EMP_TEST;
Execution Result
SQL> SELECT
2 SUM(SAL) AS NORMAL_SUM,
3 SUM(DISTINCT SAL) AS UNIQUE_SUM
4 FROM EMP_TEST;
NORMAL_SUM UNIQUE_SUM
---------- ----------
23000 11500
Breakdown of Calculation (UNIQUE_SUM):
Unique values: 2000, 2500, 3000, 4000
Sum: 11500
5-2. COUNT + DISTINCT (Frequently Used)
Often used when counting numbers of types, such as “How many departments exist?”.
SELECT
COUNT(DEPT_ID) AS ALL_ROWS, -- Count of all rows
COUNT(DISTINCT DEPT_ID) AS UNIQUE_DEPTS -- Number of department types
FROM EMP_TEST;
Execution Result
SQL> SELECT
2 COUNT(DEPT_ID) AS ALL_ROWS, -- Count of all rows
3 COUNT(DISTINCT DEPT_ID) AS UNIQUE_DEPTS -- Number of department types
4 FROM EMP_TEST;
ALL_ROWS UNIQUE_DEPTS
---------- ------------
10 3
Explanation:
DEPT_ID repeatedly contains 3 types of values: 10, 20, and 30. COUNT(DISTINCT …) returns “3”, which is the count of those types.
6. Troubleshooting & Notes
Impact on Performance
When DISTINCT is used, “sorting” and “hash processing” occur internally in the database, incurring costs to search for duplicates. Overusing COUNT(DISTINCT column) on tables with millions of rows can cause query response deterioration.
Avoiding ORA Errors
DISTINCT is written inside the parentheses of the aggregate function.
- Correct:
COUNT(DISTINCT col) - Incorrect:
DISTINCT COUNT(col)… This results in a syntax error.
7. FAQ: Frequently Asked Questions
Q1. I want to count NULL as a specific value (e.g., ‘Unset’).
A. Use the NVL or COALESCE function.
By doing COUNT(DISTINCT NVL(JOB, ‘Unset’)), you can count NULL as one type called ‘Unset’.
Q2. Can I perform DISTINCT aggregation on multiple columns?
A. Basic aggregate functions only allow specifying one column (or expression).
You need to avoid this by concatenating strings like COUNT(DISTINCT col1 || col2) or using a subquery.
Q3. Should I calculate average treating NULL as 0 in AVG function?
A. It depends on the business requirements.
Check if it is “Average of actual values (Exclude NULL)” or “Overall expected value (NULL is 0)”. If it’s unclear, it is helpful to list both in the report.
Summary
- Basic Rule: All aggregate functions ignore NULL, except for
COUNT(*). - The Trap of Average: When using
AVG, always consider whether to ignore NULL or turn it into 0 withNVL. - Duplicate Elimination: Write
DISTINCTinside parentheses (e.g.,COUNT(DISTINCT column_name)). - Actual Machine Verification: When calculations don’t match, check if NULL data is mixed in.
This article explains using Oracle Database 19c (screens and default values may differ in other versions).
[reference]
Oracle Database SQL Language Reference, 19c

コメント