How to Use Oracle SQL GROUP BY and HAVING! Aggregation and Conditions

English

“I want to calculate the total salary for each department.” “I want to extract only classes with an average score of 80 or more.”

In Oracle SQL, GROUP BY and HAVING are used when you want to group data, aggregate it, and then specify conditions based on the results.

This article explains everything from the basic syntax of GROUP BY (grouping) and HAVING (filtering after aggregation) to the distinction from the WHERE clause, which beginners often stumble upon, using actual examples from the SCOTT schema.

Conclusion & Quick Reference for Syntax

The basic form and evaluation order of SQL when performing aggregation and filtering are as follows.

Basic Syntax

SELECT column_name, AGGREGATE_FUNCTION(column_name)
FROM table_name
WHERE filtering_condition_before_aggregation
GROUP BY column_name_to_group
HAVING filtering_condition_after_aggregation
ORDER BY sorting;

Order of Execution (Evaluation)

  1. FROM: Look at the target table
  2. WHERE: Filter row by row (Before aggregation)
  3. GROUP BY: Combine data into groups
  4. HAVING: Filter by group (After aggregation)
  5. SELECT: Display results
  6. ORDER BY: Sort

1. Background and Basics: How GROUP BY and HAVING Work

What is GROUP BY?

It combines data with the same value in specified columns into “one group.” By grouping, you can apply aggregate functions such as SUM (total) and AVG (average) for each unit.

What is HAVING?

It specifies conditions for the aggregated results. Specifications such as “groups where the total is XX or more” must be done in the HAVING clause, not the WHERE clause.

Definitive Difference Between WHERE and HAVING

  • WHERE: Filters original data (rows). Discards unnecessary data “before aggregation.”
  • HAVING: Filters aggregated results (groups). Judges conditions “after aggregation.”

2. Practice: Verification with the SCOTT Schema

From here, we will check the behavior using the EMP (Employee) table of the SCOTT schema, which is the standard sample for Oracle Database.

Prerequisite Environment:

  • Oracle Database 19c
  • Connected as SCOTT user (or a user with equivalent tables)

You can import and use the SCOTT sample schema by executing the following:

$ sqlplus / as sysdba
SQL> @?/rdbms/admin/utlsampl.sql
$ sqlplus scott/tiger

Check Data

First, let’s check the raw data before aggregation.

SELECT EMPNO, ENAME, JOB, SAL, DEPTNO FROM EMP ORDER BY DEPTNO;

Execution Result:

SQL> SELECT EMPNO, ENAME, JOB, SAL, DEPTNO FROM EMP ORDER BY DEPTNO;

EMPNO ENAME JOB SAL DEPTNO
---------- ---------- --------- ---------- ----------
7782 CLARK MANAGER 2450 10
7839 KING PRESIDENT 5000 10
7934 MILLER CLERK 1300 10
7566 JONES MANAGER 2975 20
7902 FORD ANALYST 3000 20
7876 ADAMS CLERK 1100 20
7369 SMITH CLERK 800 20
7788 SCOTT ANALYST 3000 20
7521 WARD SALESMAN 1250 30
7844 TURNER SALESMAN 1500 30
7499 ALLEN SALESMAN 1600 30
7900 JAMES CLERK 950 30
7698 BLAKE MANAGER 2850 30
7654 MARTIN SALESMAN 1250 30

14 rows selected.

3. Procedure/Implementation: From Basic to Advanced

Pattern 1: GROUP BY Basics (Total Salary by Department)

Group by department number (DEPTNO) and calculate the total salary (SAL) for each.

SELECT 
    DEPTNO, 
    SUM(SAL) AS TOTAL_SALARY
FROM EMP
GROUP BY DEPTNO
ORDER BY DEPTNO;

Execution Result:

SQL> SELECT
2 DEPTNO,
3 SUM(SAL) AS TOTAL_SALARY
4 FROM EMP
5 GROUP BY DEPTNO
6 ORDER BY DEPTNO;

DEPTNO TOTAL_SALARY
---------- ------------
10 8750
20 10875
30 9400

Explanation:

The scattered rows were consolidated into one row per department code, and the SAL within them was summed up.

Pattern 2: Adding HAVING (Extract Only Departments with High Total Salary)

If you want to know “only departments where the total salary exceeds 9,000”, add HAVING.

SELECT 
    DEPTNO, 
    SUM(SAL) AS TOTAL_SALARY
FROM EMP
GROUP BY DEPTNO
HAVING SUM(SAL) > 9000
ORDER BY DEPTNO;

Execution Result:

SQL> SELECT
2 DEPTNO,
3 SUM(SAL) AS TOTAL_SALARY
4 FROM EMP
5 GROUP BY DEPTNO
6 HAVING SUM(SAL) > 9000
7 ORDER BY DEPTNO;

DEPTNO TOTAL_SALARY
---------- ------------
20 10875
30 9400

Explanation:

“DEPTNO 10”, which had a total of 8750, was excluded by filtering after aggregation.

Pattern 3: Combination of WHERE and HAVING (Advanced)

This is the form closest to actual business use.

“Aggregate only employees whose job is ‘CLERK’, and display departments where the average salary within that group is 1000 or more.”

SELECT 
    DEPTNO, 
    COUNT(*) AS MEMBER_COUNT, 
    AVG(SAL) AS AVG_SAL
FROM EMP
WHERE JOB = 'CLERK'           -- 1. First, filter only CLERKs
GROUP BY DEPTNO               -- 2. Group by department
HAVING AVG(SAL) >= 1000;      -- 3. Keep average of 1000 or more

Execution Result:

SQL> SELECT
2 DEPTNO,
3 COUNT(*) AS MEMBER_COUNT,
4 AVG(SAL) AS AVG_SAL
5 FROM EMP
6 WHERE JOB = 'CLERK'
7 GROUP BY DEPTNO
8 HAVING AVG(SAL) >= 1000;

DEPTNO MEMBER_COUNT AVG_SAL
---------- ------------ ----------
10 1 1300

Explanation:

  1. WHERE: Extracts only CLERK from all employees.
  2. GROUP BY: Divides the remaining CLERKs by department.
  3. HAVING: Judges the average salary of that group.

4. Troubleshooting: Common Errors

Here are ORA errors that occur frequently in aggregation queries and how to deal with them.

Error CodeMessageCause and Solution
ORA-00934group function is not allowed hereCause: Using aggregate functions (SUM, AVG, etc.) in the WHERE clause.
Solution: Move conditions for aggregation results to the HAVING clause.
ORA-00979not a GROUP BY expressionCause: Columns in the SELECT list are not included in GROUP BY.
Solution: All columns not using aggregate functions must be described in the GROUP BY clause.

NG Example (ORA-00934):

-- Mistake: Trying to judge aggregation results with WHERE
SELECT DEPTNO, SUM(SAL) 
FROM EMP 
WHERE SUM(SAL) > 9000 
GROUP BY DEPTNO;

Execution Result:

SQL> SELECT DEPTNO, SUM(SAL)
2 FROM EMP
3 WHERE SUM(SAL) > 9000
4 GROUP BY DEPTNO;
WHERE SUM(SAL) > 9000
*
ERROR at line 3:
ORA-00934: group function is not allowed here

5. Operational, Monitoring, and Security Notes

Impact on Performance

GROUP BY performs data sorting and hash processing internally, so the load increases if there is a large amount of data.

Best Practice: Filter data with the WHERE clause beforehand as much as possible, then GROUP BY. Reducing the number of rows to be aggregated is the best tuning.

Handling NULL

If the GROUP BY column contains NULL, Oracle groups NULL as one group. If this is unintended, exclude it with WHERE column_name IS NOT NULL.


6. FAQ: Frequently Asked Questions

Q1. Can I GROUP BY a column not written in the SELECT clause?

A. Yes, it is possible. However, since that column is not displayed in the result, you may not be able to distinguish which group the aggregation result belongs to. Usually, match the columns in the SELECT clause and GROUP BY clause.

Q2. Can I use aliases in the HAVING clause?

A. As of Oracle Database 19c, in standard settings, you cannot directly use aliases (AS XX) defined in the SELECT clause in GROUP BY or HAVING clauses (it results in ORA-00904 error). Write the expression as is or use a subquery.

Q3. Can I group by multiple columns?

A. Yes, you can specify multiple columns separated by commas.

Example: GROUP BY DEPTNO, JOB

This aggregates by detailed groups of “Department AND Job”.


Summary

  1. Division of Roles: Filtering before aggregation is WHERE, filtering after aggregation is HAVING.
  2. Syntax Order: Write (and execute) in the order of WHEREGROUP BYHAVING.
  3. Error Avoidance: Write columns in SELECT in GROUP BY as well (except for aggregate functions).
  4. Practice: It is easier to understand if you execute with only GROUP BY first, look at the result, and then add HAVING.

This article explains using Oracle Database 19c (screens and default values may differ in other versions).


[reference]
Oracle Database SQL Language Reference, 19c

コメント

Copied title and URL