When manipulating data with SQL, you often encounter situations where you want to “eliminate duplicate data,” for example when extracting a unique customer list or the non-duplicated values of a specific item. The DISTINCT clause is useful in such cases. This article explains DISTINCT for beginners, covering everything from the basics to its applications and points of caution. It also covers common mistakes and error examples to provide the knowledge you need to master it!
Note: This article will explain using examples based on the SCOTT sample schema. The SCOTT sample schema can be imported and used by executing the following:
$ sqlplus / as sysdba
SQL> @?/rdbms/admin/utlsampl.sql
$ sqlplus scott/tiger
1. What is DISTINCT?
The DISTINCT clause is used in a SELECT statement to eliminate duplicate rows from the result, so that each value (or combination of values) of the selected columns appears only once.
Sample Data: EMP table
To retrieve only the unique department numbers from this data, use the following query.
SELECT DISTINCT DEPTNO FROM EMP;
Execution Result
SQL> SELECT DEPTNO FROM EMP;
DEPTNO
----------
20
30
30
20
30
30
10
20
10
30
20
30
20
10
14 rows selected.
SQL> SELECT DISTINCT DEPTNO FROM EMP;
DEPTNO
----------
30
10
20
As a result, only the department numbers “10”, “20”, and “30” were displayed.
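Note that the rows above came back as 30, 10, 20: DISTINCT removes duplicates but does not promise any particular order. If a sorted list is needed, an ORDER BY can be added, as in this small sketch against the same EMP table:

```sql
-- DISTINCT alone gives no ordering guarantee; ORDER BY makes the order explicit.
SELECT DISTINCT DEPTNO
FROM EMP
ORDER BY DEPTNO;
```

With the sample data this should list 10, 20, 30 in that order.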
2. Basic Syntax and Usage
The basic syntax for using DISTINCT is as follows:
SELECT DISTINCT column_name FROM table_name;
2.1 Applying to a Single Column
The following query retrieves the unique job titles from the JOB column.
SELECT DISTINCT JOB FROM EMP;
Execution Result
SQL> SELECT JOB FROM EMP;
JOB
---------
CLERK
SALESMAN
SALESMAN
MANAGER
SALESMAN
MANAGER
MANAGER
ANALYST
PRESIDENT
SALESMAN
CLERK
CLERK
ANALYST
CLERK
14 rows selected.
SQL> SELECT DISTINCT JOB FROM EMP;
JOB
---------
CLERK
SALESMAN
ANALYST
MANAGER
PRESIDENT
In this way, you can easily retrieve only the unique data from the specified column.
2.2 Applying to Multiple Columns
DISTINCT can also be used for combinations of multiple columns. The following query retrieves the unique combinations of JOB and DEPTNO.
SELECT DISTINCT JOB, DEPTNO FROM EMP;
Execution Result
SQL> SELECT JOB, DEPTNO FROM EMP;
JOB DEPTNO
--------- ----------
CLERK 20
SALESMAN 30
SALESMAN 30
MANAGER 20
SALESMAN 30
MANAGER 30
MANAGER 10
ANALYST 20
PRESIDENT 10
SALESMAN 30
CLERK 20
CLERK 30
ANALYST 20
CLERK 10
14 rows selected.
SQL> SELECT DISTINCT JOB, DEPTNO FROM EMP;
JOB DEPTNO
--------- ----------
CLERK 20
MANAGER 20
MANAGER 30
MANAGER 10
SALESMAN 30
PRESIDENT 10
ANALYST 20
CLERK 30
CLERK 10
9 rows selected.
DISTINCT eliminates duplicates based on all specified columns.
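One way to check how many unique combinations exist, without reading through the whole list, is to count the rows of the DISTINCT query in an inline view. This is only a sketch; the alias COMBINATION_COUNT is chosen here for illustration.

```sql
-- Count the distinct (JOB, DEPTNO) pairs by wrapping the query in an inline view.
SELECT COUNT(*) AS COMBINATION_COUNT
FROM (SELECT DISTINCT JOB, DEPTNO FROM EMP);
```

For the sample data this should match the 9 rows shown above.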
2.3 Combining DISTINCT and COUNT
By combining DISTINCT with the COUNT function, you can get the number of unique values.
SELECT COUNT(DISTINCT DEPTNO) AS UNIQUE_DEPT_COUNT FROM EMP;
Execution Result
SQL> SELECT COUNT(DEPTNO) FROM EMP;
COUNT(DEPTNO)
-------------
14
SQL> SELECT COUNT(DISTINCT DEPTNO) AS UNIQUE_DEPT_COUNT FROM EMP;
UNIQUE_DEPT_COUNT
-----------------
3
We can see that there are 3 unique department number values.
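COUNT(DISTINCT ...) can also be combined with GROUP BY to count unique values per group. The sketch below (the alias UNIQUE_JOB_COUNT is made up for this example) counts how many different job titles exist in each department:

```sql
-- Number of different JOB values within each department.
SELECT DEPTNO,
       COUNT(DISTINCT JOB) AS UNIQUE_JOB_COUNT
FROM EMP
GROUP BY DEPTNO
ORDER BY DEPTNO;
```

Judging from the JOB and DEPTNO combinations listed in section 2.2, each of the three departments should report 3 job titles.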
3. Advanced Applications
3.1 Handling NULL Values
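DISTINCT treats all NULLs as a single value, so NULL appears at most once in the result. In the sample EMP data, DEPTNO contains no NULLs, but COMM is NULL for most employees, so let’s confirm by running the following query.
SELECT DISTINCT COMM FROM EMP;
In the results, NULL will be displayed only once, alongside the distinct non-NULL commission values.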
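One related detail is that COUNT(DISTINCT ...) does not count NULL at all, whereas SELECT DISTINCT returns NULL as one row. The following sketch contrasts the two using the COMM column, which is NULL for most rows in the stock SCOTT data (an assumption: the sample data has not been modified). The aliases DISTINCT_ROWS and DISTINCT_VALUES are illustrative names.

```sql
-- SELECT DISTINCT keeps NULL as one row, so this count includes the NULL row.
SELECT COUNT(*) AS DISTINCT_ROWS
FROM (SELECT DISTINCT COMM FROM EMP);

-- COUNT(DISTINCT ...) ignores NULL, so this count excludes it.
SELECT COUNT(DISTINCT COMM) AS DISTINCT_VALUES
FROM EMP;
```

If COMM contains at least one NULL, the first count should be exactly one higher than the second.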
3.2 SELECT DISTINCT *
Using SELECT DISTINCT * retrieves all unique rows from the entire table. However, this can impact performance if there are many columns or a large amount of data.
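Whether SELECT DISTINCT * actually removes anything depends on the table. A quick way to check, sketched below, is to compare the total row count with the count of distinct rows. In the SCOTT schema, EMP has EMPNO as its primary key, so, assuming the stock sample data, no two rows can be identical and both counts should be the same.

```sql
-- Total number of rows in the table.
SELECT COUNT(*) AS TOTAL_ROWS FROM EMP;

-- Number of rows left after removing fully identical rows.
SELECT COUNT(*) AS DISTINCT_ROWS
FROM (SELECT DISTINCT * FROM EMP);
```

When the two counts match, DISTINCT * adds work without changing the result.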
4. Points of Caution and Common Misunderstandings
4.1 DISTINCT Applies to All Specified Columns
When DISTINCT is applied to multiple columns, unique data based on the combination of all columns is retrieved. Let’s look at the following query.
SELECT DISTINCT JOB, DEPTNO FROM EMP;
In this case, it retrieves only the unique combinations of JOB and DEPTNO, so the same JOB can still appear several times (once for each department it occurs in). If you want to get unique data for JOB alone, you must select only the JOB column.
4.2 Performance Impact
DISTINCT performs internal sort or hash operations, which can impact performance when dealing with large amounts of data.
Improvement Suggestions
- Write queries by narrowing down to only the necessary columns.
- Reduce the data volume using the WHERE clause.
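To see which operation Oracle chooses for a given DISTINCT query, the execution plan can be inspected. The following is a sketch that assumes the connected user can run EXPLAIN PLAN and call DBMS_XPLAN (normally available by default); the filter DEPTNO = 30 is only an illustrative way of reducing the data with WHERE.

```sql
-- Select only the needed column and filter rows before duplicates are removed.
EXPLAIN PLAN FOR
SELECT DISTINCT JOB
FROM EMP
WHERE DEPTNO = 30;

-- Display the plan that was just generated.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

In the plan, a step such as HASH UNIQUE or SORT UNIQUE typically corresponds to the duplicate elimination, though the exact plan depends on the Oracle version, statistics, and data.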
5. Common Error Examples
5.1 Specifying the Wrong Column
SELECT DISTINCT NON_EXISTING_COLUMN FROM EMP;
This query specifies a column that does not exist, so Oracle raises the error ORA-00904 (invalid identifier). Check the column names carefully.
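When unsure of a column name, the data dictionary can be consulted before writing the query. A minimal sketch, assuming you are connected as the owner of the EMP table (USER_TAB_COLUMNS only lists tables in your own schema):

```sql
-- List the columns of EMP in definition order.
SELECT COLUMN_NAME, DATA_TYPE
FROM USER_TAB_COLUMNS
WHERE TABLE_NAME = 'EMP'
ORDER BY COLUMN_ID;
```

In SQL*Plus, DESC EMP shows similar information.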
5.2 Unnecessary Use of DISTINCT
SELECT DISTINCT ENAME, ENAME FROM EMP;
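Specifying the same column multiple times is redundant: the result simply shows ENAME twice, and the set of unique rows is the same as with SELECT DISTINCT ENAME. Be careful when selecting columns.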
6. Summary
The DISTINCT clause is a powerful tool for eliminating data duplication and retrieving unique data. You can use it even more effectively by keeping the following points in mind:
- Use it by specifying only the necessary columns.
- Be mindful of performance and appropriately utilize WHERE clauses and indexes.
- Understand the characteristics of applying it to NULL values and multiple columns.
Let’s use DISTINCT correctly to achieve efficient data extraction!
[reference]
Oracle Database SQL Language Reference, 19c
