Monday, December 25, 2017

Joins and Subqueries



This article is the second in a series that helps you build on the fundamentals you learned in the 12-part SQL 101 series in Oracle Magazine. In the previous article in the Beyond SQL 101 series, you learned how to use table aliases to reduce the amount of code necessary to formulate a table join. You saw, too, that a Cartesian product can result from missing join criteria, and you learned how to double-check that you’ve included the correct number of necessary join conditions against the number of tables in your FROM clause. You also got an introduction to ANSI syntax for inner joins, and you learned how the USING and ON clauses can be used.
In this article, you’ll learn about
·    Outer joins and self-joins
·    Scalar subqueries
·    Correlated subqueries
Outer joins and self-joins help you perform more-complex actions to obtain results that are difficult to get with an ordinary equijoin. The results of an outer join include the rows returned by an equijoin plus certain rows from one table for which no rows in the other table satisfy the join condition. A self-join joins a table to itself. And subqueries—whether they are scalar or correlated—enable you to use output from one query as input to another query or SQL statement.
To try out the examples in this series, you need access to an Oracle Database instance. If necessary, download and install an Oracle Database edition for your operating system. I recommend installing Oracle Database, Enterprise Edition 12c Release 1 (12.1.0.2.0). If you install the Oracle Database software, choose the installation option that enables you to create and configure a database. A new database, including sample user accounts and their associated schemas, will be created for you. (Note that SQL_201 is the user account to use for the examples in this series; it’s also the schema in which you’ll create database tables and other objects.) When the installation process prompts you to specify schema passwords, enter and confirm passwords for the SYS and SYSTEM users and make a note of them.
Finally—whether you installed the database software from scratch or have access to an existing Oracle Database instance—download, unzip, and execute the SQL script to create the tables for the SQL_201 schema that are required for this article’s examples. (View the script in a text editor for execution instructions.)

All Inclusive
Suppose that your business requirement is to obtain a list of all employees in your company along with their assigned departments, if any. The list needs to include any employees who don’t have an assigned department value. The query in Listing 1 obtains a result set of all employees in the EMPLOYEE table alongside their DEPARTMENT_ID values. You can see that one employee, Frances Newton, does not have an assigned DEPARTMENT_ID value, so that employee’s DEPARTMENT_ID value is NULL.
Code Listing 1: Obtaining a list of employees with their DEPARTMENT_ID values
SQL> set feedback on
SQL> set lines 32000
SQL> select rtrim(first_name||' '||last_name) "Employee Name", department_id
  2    from employee
  3  order by last_name, first_name;

Employee Name                            DEPARTMENT_ID
———————————————————————————————————————— —————————————
Lori Dovichi                                        10
Emily Eckhardt                                      10
Roger Friedli                                       10
Betsy James                                         10
Thomas Jeffrey                                      30
Matthew Michaels                                    10
Donald Newton                                       10
Frances Newton
Theresa Wong                                        30
mark leblanc                                        20
michael peterson                                    20

11 rows selected.


A more meaningful result set, though, would include department names instead of DEPARTMENT_ID values. You can obtain the name of each employee’s department by joining the EMPLOYEE table to the DEPARTMENT table. But if you use an equijoin statement—one that joins the DEPARTMENT_ID column of the EMPLOYEE table with the DEPARTMENT_ID column of the DEPARTMENT table—you won’t meet your business requirement. An equijoin relies on the fact that a column value that exists in one column also exists in the column with which it is being joined. But as you learned in “United Relations: Accessing More Than One Table at Once,” a NULL value cannot be equal to another value—not even another NULL value. The row with a value of NULL in the DEPARTMENT_ID column would be ignored, so the result set would not include a row for Frances Newton.
You can get the full result set by using an outer join instead of an equijoin. Like an equijoin, an outer join returns all the records that match the join criteria. But an outer join can also return records from one table that have no matching records in another table. For example, you can use an outer join when you want to show records with NULL values for nonmatching rows. When an outer join operator is placed on a table.column combination, the query returns NULL column values when an equijoin match is not found.
One outer join syntax option uses Oracle’s outer join operator, (+). The query in Listing 2 uses this operator to obtain every employee record in the EMPLOYEE table alongside the name of the employee’s assigned department. The outer join operator is placed on the D.DEPARTMENT_ID table.column combination. Therefore, even if a DEPARTMENT_ID value from the EMPLOYEE table has no match in the DEPARTMENT table, the query result should include any records from the EMPLOYEE table with a NULL value for the unmatched DEPARTMENT_ID value. The result includes the record for Frances Newton, demonstrating that the requested outer join operation took place.
Code Listing 2: An outer join that uses Oracle’s (+) operator
SQL> select d.name "Department Name", rtrim(first_name||' '||last_name) "Employee Name"
  2    from department d, employee e
  3   where e.department_id = d.department_id(+)
  4  order by d.name, last_name, first_name;

Department Name          Employee Name
———————————————————————— ————————————————————————————————
Accounting               Lori Dovichi
Accounting               Emily Eckhardt
Accounting               Roger Friedli
Accounting               Betsy James
Accounting               Matthew Michaels
Accounting               Donald Newton
IT                       Thomas Jeffrey
IT                       Theresa Wong
Payroll                  mark leblanc
Payroll                  michael peterson
                         Frances Newton

11 rows selected.


Listings 3 and 4 demonstrate the same query and results as Listing 2, but they use ANSI outer join syntax instead of the Oracle-specific (+) operator. Listing 3 uses the ANSI RIGHT OUTER JOIN keywords to indicate that the outer join operation should be placed on the table that is listed to the right of the keywords (the EMPLOYEE table). Listing 4 reverses the order of the tables listed in the FROM clause, to demonstrate the use of the ANSI LEFT OUTER JOIN keywords. In this case, the outer join operation is placed on the EMPLOYEE table, because it is the table listed to the left of the outer join keywords.
Code Listing 3: An outer join that uses the ANSI RIGHT OUTER JOIN syntax
SQL> select d.name "Department Name", rtrim(first_name||' '||last_name) "Employee Name"
  2    from department d RIGHT OUTER JOIN employee e
  3      ON d.department_id = e.department_id
  4  order by d.name, last_name, first_name;

Department Name          Employee Name
———————————————————————— ————————————————————————————————
Accounting               Lori Dovichi
Accounting               Emily Eckhardt
Accounting               Roger Friedli
Accounting               Betsy James
Accounting               Matthew Michaels
Accounting               Donald Newton
IT                       Thomas Jeffrey
IT                       Theresa Wong
Payroll                  mark leblanc
Payroll                  michael peterson
                         Frances Newton

11 rows selected.

Code Listing 4: An outer join that uses the ANSI LEFT OUTER JOIN syntax
SQL> select d.name "Department Name", rtrim(first_name||' '||last_name) "Employee Name"
  2    from employee e LEFT OUTER JOIN department d
  3      ON d.department_id = e.department_id
  4  order by d.name, last_name, first_name;

Department Name          Employee Name
———————————————————————— ————————————————————————————————
Accounting               Lori Dovichi
Accounting               Emily Eckhardt
Accounting               Roger Friedli
Accounting               Betsy James
Accounting               Matthew Michaels
Accounting               Donald Newton
IT                       Thomas Jeffrey
IT                       Theresa Wong
Payroll                  mark leblanc
Payroll                  michael peterson
                         Frances Newton

11 rows selected.


Coming Full Circle
Whereas an equijoin always joins two or more different tables, a self-join joins a table to itself. You list the table multiple times in the FROM clause, using table aliases to distinguish the multiple instances of the table name from one another.
Suppose that your business requirement is to generate a list of managers alongside the employees who report to those managers. The EMPLOYEE table contains both an EMPLOYEE_ID column and a MANAGER column. A quick query of this table, shown in Listing 5, shows that some EMPLOYEE_ID values can be found in the MANAGER column for other employees. For example, you can see that Emily Eckhardt (28) is the manager of five other employees.
Code Listing 5: Query showing that certain employees are the managers of other employees
SQL> select employee_id, rtrim(first_name||' '||last_name) "Employee Name", manager
  2    from employee
  3  order by last_name, first_name;

EMPLOYEE_ID Employee Name                 MANAGER
——————————— ————————————————————————————— ———————
       6573 Lori Dovichi                       28
         28 Emily Eckhardt
       6567 Roger Friedli                      28
       6568 Betsy James                        28
       6571 Thomas Jeffrey
       7895 Matthew Michaels                   28
       1234 Donald Newton                      28
         37 Frances Newton
       6572 Theresa Wong                     6571
       6570 mark leblanc                     6569
       6569 michael peterson

11 rows selected.

Listing 6 uses a self-join of the EMPLOYEE table to display all managers alongside the names of the employees who report to them. Listing 7 uses both a self-join and an outer join to obtain a list of all employees, whether or not they have an assigned manager.
Code Listing 6: A self-join of the EMPLOYEE table to itself
SQL> select rtrim(mgr.first_name||' '||mgr.last_name) "Manager Name",
rtrim(emp.first_name||' '||emp.last_name) "Employee Name"
  2    from employee mgr, employee emp
  3   where mgr.employee_id = emp.manager
  4  order by emp.last_name, emp.first_name, mgr.last_name, mgr.first_name;

Manager Name             Employee Name
———————————————————————— ————————————————————————————————
Emily Eckhardt           Lori Dovichi
Emily Eckhardt           Roger Friedli
Emily Eckhardt           Betsy James
Emily Eckhardt           Matthew Michaels
Emily Eckhardt           Donald Newton
Thomas Jeffrey           Theresa Wong
michael peterson         mark leblanc

7 rows selected.

Code Listing 7: A self-join and outer join listing managers alongside their employees
SQL> select rtrim(mgr.first_name||' '||mgr.last_name) "Manager Name",
rtrim(emp.first_name||' '||emp.last_name) "Employee Name"
  2   from employee mgr, employee emp
  3  where mgr.employee_id (+) = emp.manager
  4  order by emp.last_name, emp.first_name, mgr.last_name, mgr.first_name;

Manager Name             Employee Name
———————————————————————— ————————————————————————————————
Emily Eckhardt           Lori Dovichi
                         Emily Eckhardt
Emily Eckhardt           Roger Friedli
Emily Eckhardt           Betsy James
                         Thomas Jeffrey
Emily Eckhardt           Matthew Michaels
Emily Eckhardt           Donald Newton
                         Frances Newton
Thomas Jeffrey           Theresa Wong
michael peterson         mark leblanc
                         michael peterson

11 rows selected.

In Listings 6 and 7, note that the EMPLOYEE table is listed twice in the FROM clause. Because the two table name instances have different aliases—mgr and emp—Oracle Database treats them as two different tables. Primary and foreign key constraints will be more fully outlined in a future article in this series. For now, note that in the EMPLOYEE table, the MANAGER column would typically be created as a foreign key to the primary key column of the EMPLOYEE_ID column, illustrating a self-referencing, or recursive, relationship between the two columns.
The Ins and Outs of Subqueries
Some problems are best solved when they are broken down into individual pieces. Subqueries can help you break a SQL statement into multiple components, in divide-and-conquer fashion, by nesting queries. A subquery is often referred to as an inner query, and the surrounding statement that invokes it is often called an outer query. In its simplest form, an inner query is conceptually executed once, before the outer query is executed. A scalar subquery, also known as a single-row subquery, returns a single column value with zero rows or one row.
Suppose you want to show all employees with the lowest salary. You could query the EMPLOYEE table and sort by the SALARY value in ascending order, listing NULLs last. This type of query would return the lowest-salary earners first, but the result set would also include data that is not meaningful to your goal.
Listings 8 and 9 illustrate another option for obtaining the result. The query in Listing 8 obtains the minimum SALARY value (60000) from the EMPLOYEE table as a first step. The query in Listing 9then uses the result from Listing 8 to finish answering the question WHERE SALARY = 60000. The ultimate answer is correct, and it doesn’t include unnecessary extra rows, but the two separate queries require more work than is necessary.
Code Listing 8: Obtaining the minimum salary value from the EMPLOYEE table
SQL> select min(salary)
  2    from employee;

MIN(SALARY)
———————————
      60000

1 row selected.

Code Listing 9: Obtaining a list of the employees who earn the known minimum salary
SQL> select rtrim(first_name||' '||last_name) "Employee Name", salary
  2    from employee
  3   where salary = 60000
  4  order by last_name, first_name;

Employee Name                SALARY
———————————————————————— ——————
Roger Friedli             60000
Betsy James               60000

2 rows selected.

By using a subquery, you can write both queries in one statement. The intermediate step of feeding the result of the first query to the second query is performed automatically as part of the statement. In Listing 10, which uses a subquery, the same ultimate result that is obtained in two parts in Listings 8 and 9 is obtained in one part. The inner query, which obtains the lowest salary value from the EMPLOYEE table, is conceptually executed first. Its result is fed to the outer query, which retrieves all rows that satisfy the minimum-salary-value condition.
Code Listing 10: Using a subquery to obtain the employees who earn the minimum salary

SQL> select rtrim(first_name||' '||last_name) "Employee Name", salary
  2    from employee
  3   where salary =
  4                  (select min(salary)
  5                     from employee)
  6  order by last_name, first_name;

Employee Name            SALARY
———————————————————————— ——————
Roger Friedli             60000
Betsy James               60000

2 rows selected.

Listing 10’s subquery uses the = operator. Subqueries can also use other operators, such as >, <, >=, <=, and <>. However, as with the = operator, such comparisons work only when the subquery returns at most a single row. Listing 11 demonstrates the type of error message you receive when you attempt a scalar subquery action on a query that does not return a single-row result. Because five employee records satisfy the condition of having salary values of less than 75000, the error occurs. You must use operators such as IN or NOT IN for subqueries that return multiple rows. By using IN instead of the single-row < operator that Listing 11 uses, Listing 12 generates the correct result set.
Code Listing 11: The error message returned when a scalar subquery returns more than one row

SQL> select rtrim(first_name||' '||last_name) "Employee Name", salary
  2    from employee
  3   where salary < (select salary
  4                     from employee
  5                    where salary < 75000)
  6  order by last_name, first_name;
where salary < (select salary
                 *
ERROR at line 3:
ORA-01427: single-row subquery returns more than one row

Code Listing 12: A subquery that uses the IN operator to return multiple rows

SQL> select rtrim(first_name||' '||last_name) "Employee Name", salary
  2    from employee
  3   where salary IN (select salary
  4                      from employee
  5                     where salary < 75000)
  6  order by last_name, first_name;

Employee Name            SALARY
———————————————————————— ——————
Roger Friedli             60000
Betsy James               60000
Matthew Michaels          70000
Theresa Wong              70000
mark leblanc              65000

5 rows selected.


Meeting Again and Again
Statements that include simple subqueries—both single-row and multirow—conceptually execute their inner queries first, feed the results of those queries to their outer queries, and then execute the outer queries. Statements that include correlated subqueries conceptually execute the inner query repeatedly, because correlated subqueries can reference columns from the outer query (hence the term correlated). A correlated subquery is useful when you need to compare every row of the outer query’s results with the result of the inner query.
Listing 13 displays the employees who earn the highest salary in their respective departments. The inner query is correlated with the outer query on the DEPARTMENT_ID value. For each outer query iteration, the DEPARTMENT_ID value from the outer query is compared with the DEPARTMENT_ID value from the inner query. Every time the DEPARTMENT_ID value in the outer query finds a match in the inner query, the maximum salary value for the matching department is obtained and fed to the outer query to obtain a row result. Thus, only the top-employee-salary earners for each department are returned.

Code Listing 13: A correlated subquery returning top salary earners by department

SQL> select rtrim(first_name||' '||last_name) "Employee Name", salary, d.name
  2    from employee e, department d
  3   where e.department_id = d.department_id
  4     and salary = (select max(salary)
  5                     from employee
  6                   where department_id = d.department_id);

Employee Name            SALARY NAME
———————————————————————— —————— ———————————————
Emily Eckhardt           100000 Accounting
michael peterson          90000 Payroll
Thomas Jeffrey           300000 IT

3 rows selected.

To test whether a correlated subquery returns at least one row, you can use the EXISTS operator. This operator returns TRUE or FALSE but never UNKNOWN. To obtain a list of only those employees who have assigned managers, you can use a correlated subquery similar to the one in Listing 14. Because you use the EXISTS operator only to test whether a row exists, the columns included in the SELECT list of the subquery have no relevance. Note that the NULL value is included in the SELECT list of the subquery in Listing 14. Conversely, the NOT EXISTS operator—a frequently used correlated subquery construct—tests whether a matching row cannot be found. The query in Listing 15 displays all employees not yet assigned to any department, so only the employee record for Frances Newton is returned.
Code Listing 14: A correlated subquery that uses the EXISTS operator

SQL> select rtrim(first_name||' '||last_name) "Employee Name"
  2    from employee e
  3   where EXISTS (select NULL
  4                   from employee
  5                  where manager is not null
  6                    and employee_id = e.employee_id)
  7  order by last_name, first_name;

Employee Name
————————————————————————
Lori Dovichi
Roger Friedli
Betsy James
Matthew Michaels
Donald Newton
Theresa Wong
mark leblanc

7 rows selected.


Code Listing 15: A correlated subquery that uses the NOT EXISTS operator

SQL> select rtrim(first_name||' '||last_name) "Employee Name"
  2    from employee e
  3   where NOT EXISTS (select 1
  4                       from employee
  5                      where department_id is not null
  6                        and employee_id = e.employee_id)
  7  order by last_name, first_name;

Employee Name
————————————————————————
Frances Newton

1 row selected.


Conclusion
This article introduced you to Oracle SQL outer joins, self-joins, scalar subqueries, multirow subqueries, and correlated subqueries. It also illustrated two types of outer join operators: the Oracle-specific (+) operator and the ANSI outer join keywords. You’ve seen how self-joins can be used to obtain results from tables with recursive relationships. You’ve learned how subqueries help solve a business problem in one SQL statement. You’ve discovered how a scalar subquery differs from a multirow subquery and which operators are appropriate for each. And you’ve also been introduced to correlated subqueries and the EXISTS and NOT EXISTS correlated subquery operators. The next article in this series introduces inline views and set operators.