Mastering OUTER JOIN with NULL in PostgreSQL: A Step-by-Step Guide

Understanding OUTER JOIN with NULL

When working with relational databases, joining tables is a fundamental operation that combines data from multiple tables based on common columns. An inner join returns only the rows that match in both tables; an OUTER JOIN additionally keeps the unmatched rows from one or both tables, depending on the type of join.

In this article, we’ll explore how to use OUTER JOIN with NULL in PostgreSQL and provide a step-by-step guide on how to achieve your desired result.

Introduction to OUTER JOIN

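PostgreSQL supports three kinds of outer join. A LEFT OUTER JOIN returns all records from the left table and the matching records from the right table; if there is no match, the right table’s columns are NULL. A RIGHT OUTER JOIN is the mirror image: it returns all records from the right table and fills the left table’s columns with NULL when there is no match. A FULL OUTER JOIN keeps the unmatched records from both tables.

The syntax for a RIGHT OUTER JOIN is as follows: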

SELECT column1, column2
FROM table1
RIGHT OUTER JOIN table2 ON table1.column = table2.column;

In this example, table1 and table2 are the two tables being joined, and column1 and column2 are the columns included in the result set. Every row of table2 appears at least once; where table1 has no matching row, its columns come back as NULL.
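To see the difference between the two directions, here is a small sketch with made-up rows supplied inline through VALUES. The id key and the sample values are assumptions for illustration only; table1, table2, column1, and column2 mirror the placeholders above.

```sql
-- RIGHT OUTER JOIN: every row of table2 is kept.
WITH table1 (id, column1) AS (VALUES (1, 'a1'), (2, 'a2')),
     table2 (id, column2) AS (VALUES (2, 'b2'), (3, 'b3'))
SELECT column1, column2
FROM table1
RIGHT OUTER JOIN table2 ON table1.id = table2.id
ORDER BY table2.id;
-- column1 | column2
-- a2      | b2
-- NULL    | b3      (no table1 row with id 3)

-- LEFT OUTER JOIN: every row of table1 is kept.
WITH table1 (id, column1) AS (VALUES (1, 'a1'), (2, 'a2')),
     table2 (id, column2) AS (VALUES (2, 'b2'), (3, 'b3'))
SELECT column1, column2
FROM table1
LEFT OUTER JOIN table2 ON table1.id = table2.id
ORDER BY table1.id;
-- column1 | column2
-- a1      | NULL    (no table2 row with id 1)
-- a2      | b2
```

Note that psql prints NULL as an empty cell by default; it is spelled out in the comments here for readability.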

Using RIGHT OUTER JOIN with NULL

Let’s consider an example from Stack Overflow. We have two tables: Nodes and Node_Properties. The Nodes table has an id, attr1, and attr2, while the Node_Properties table has an id, name, and value. (The query below also selects n.name, so it assumes the Nodes table has a name column as well.)
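For readers who want to follow along, the two tables might be declared roughly as below. This is only a sketch: the column types are guesses, and the name column on nodes is an assumption based on the query referencing n.name; none of it comes from the original question.

```sql
-- Hypothetical definitions; the types and the nodes.name column are assumptions.
CREATE TABLE nodes (
  id    integer PRIMARY KEY,
  name  text,    -- assumed: the query later selects n.name
  attr1 text,
  attr2 text
);

CREATE TABLE node_properties (
  id    integer, -- id of the node the property belongs to (joined to nodes.id)
  name  text,    -- property name, e.g. 'bike'
  value text     -- property value
);
```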

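We want to join these two tables on the id column so that every property row appears alongside its node. This is what a RIGHT OUTER JOIN from Nodes to Node_Properties does; written with Node_Properties on the left, it becomes the equivalent LEFT JOIN. In addition, a row for the 'bike' property must appear in the output even when no such property exists, in which case its node columns are NULL.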

The SQL query provided by the Stack Overflow answer uses a combination of UNION and LEFT JOIN to achieve this:

WITH np (id, name, value) AS (
  SELECT id, name, value
  FROM node_properties
  UNION
  -- bike must appear in the output so generate a row if it doesn't exist.
  SELECT NULL, 'bike', NULL
  WHERE NOT EXISTS
        (SELECT NULL
           FROM node_properties
          WHERE name = 'bike'
        )
)
SELECT n.id      AS node_id
     , n.name    AS node_name
     , np.name   AS properties_name
     , np.value  AS properties_value
  FROM np
  LEFT JOIN nodes n ON np.id = n.id
ORDER BY n.id NULLS FIRST;

Let’s break down what’s happening in this query:

  1. The WITH clause defines a temporary result set called np that contains all rows from the node_properties table. If none of those rows has the name 'bike', the second branch of the UNION adds one placeholder row with id NULL, name 'bike', and value NULL.
  2. The main SELECT joins np to the nodes table with a LEFT JOIN, so every row of np is kept whether or not a node matches it. For a row without a match, such as the placeholder, the columns taken from nodes are NULL.
  3. The WHERE NOT EXISTS clause sits inside the WITH clause and only controls whether the placeholder row is generated; it does not filter the joined result. Because np is the left side of the join, rows from the nodes table without a corresponding row in node_properties are not included in the result set.
  4. ORDER BY n.id NULLS FIRST sorts the output by node id and places the rows whose node id is NULL, such as the placeholder, at the top. It affects only the order of the rows, not which rows are returned; without it, PostgreSQL sorts NULLs last in ascending order.
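To make the steps concrete, here is the same query run against a few made-up rows. The sample data is hypothetical and is supplied inline through VALUES, so the statement needs no existing tables; only the np part and the final SELECT come from the query above.

```sql
WITH nodes (id, name) AS (
  -- Hypothetical sample nodes; node 3 has no properties.
  VALUES (1, 'node1'), (2, 'node2'), (3, 'node3')
),
node_properties (id, name, value) AS (
  -- Hypothetical sample properties; note there is no 'bike' row.
  VALUES (1, 'car', 'red'), (2, 'boat', 'blue')
),
np (id, name, value) AS (
  SELECT id, name, value
  FROM node_properties
  UNION
  -- Placeholder row, generated only because no 'bike' property exists.
  SELECT NULL, 'bike', NULL
  WHERE NOT EXISTS (SELECT NULL FROM node_properties WHERE name = 'bike')
)
SELECT n.id     AS node_id
     , n.name   AS node_name
     , np.name  AS properties_name
     , np.value AS properties_value
  FROM np
  LEFT JOIN nodes n ON np.id = n.id
ORDER BY n.id NULLS FIRST;
-- node_id | node_name | properties_name | properties_value
-- NULL    | NULL      | bike            | NULL
-- 1       | node1     | car             | red
-- 2       | node2     | boat            | blue
```

The placeholder comes first because its node_id is NULL, and node3 is missing because it has no row in np to attach to.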

Why does this work?

The key insight here is that the UNION operation combines two separate queries: one that returns all rows from the node_properties table, and another that generates a single row with id NULL, name 'bike', and value NULL, but only when node_properties contains no 'bike' row.
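The conditional branch relies on the fact that PostgreSQL allows a SELECT without a FROM clause to carry a WHERE condition, which yields either one row or none. A minimal sketch, using an inline list of property names (hypothetical data) in place of the real table:

```sql
-- No 'bike' in the list: NOT EXISTS is true, so one row is produced.
SELECT 'bike' AS name
WHERE NOT EXISTS (
  SELECT 1
  FROM (VALUES ('car'), ('boat')) AS p (name)
  WHERE p.name = 'bike'
);
-- returns 1 row: bike

-- 'bike' is already present: NOT EXISTS is false, so no row is produced.
SELECT 'bike' AS name
WHERE NOT EXISTS (
  SELECT 1
  FROM (VALUES ('car'), ('bike')) AS p (name)
  WHERE p.name = 'bike'
);
-- returns 0 rows
```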

By using UNION, we’re effectively creating a temporary result set that always contains a 'bike' row: either the real ones from node_properties or the generated placeholder. The LEFT JOIN then keeps every row of that result set and attaches the matching row from the nodes table where one exists.
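Because np is the left side of the join, nodes that have no property rows do not appear in the output. If those nodes are wanted as well, one option is to switch to a FULL OUTER JOIN. This variant is not part of the Stack Overflow answer; it is a sketch over hypothetical inline data:

```sql
WITH nodes (id, name) AS (
  -- Hypothetical sample nodes; node 3 has no properties.
  VALUES (1, 'node1'), (2, 'node2'), (3, 'node3')
),
node_properties (id, name, value) AS (
  VALUES (1, 'car', 'red'), (2, 'boat', 'blue')
),
np (id, name, value) AS (
  SELECT id, name, value
  FROM node_properties
  UNION
  SELECT NULL, 'bike', NULL
  WHERE NOT EXISTS (SELECT NULL FROM node_properties WHERE name = 'bike')
)
SELECT n.id     AS node_id
     , n.name   AS node_name
     , np.name  AS properties_name
     , np.value AS properties_value
  FROM np
  FULL OUTER JOIN nodes n ON np.id = n.id  -- keeps unmatched rows from both sides
ORDER BY n.id NULLS FIRST;
-- node_id | node_name | properties_name | properties_value
-- NULL    | NULL      | bike            | NULL
-- 1       | node1     | car             | red
-- 2       | node2     | boat            | blue
-- 3       | node3     | NULL            | NULL
```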

The WHERE NOT EXISTS condition ensures that the placeholder is added only when it is needed, while the NULLS FIRST option puts the rows that have no matching node at the beginning of the sorted result set.
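The effect of NULLS FIRST is easiest to see in isolation. The following sketch sorts three inline values (made up for illustration) with and without the option:

```sql
-- Default ascending order in PostgreSQL: NULLs sort last.
SELECT x
FROM (VALUES (2), (NULL), (1)) AS t (x)
ORDER BY x;
-- 1, 2, NULL

-- NULLS FIRST moves the NULL to the top; the set of rows is unchanged.
SELECT x
FROM (VALUES (2), (NULL), (1)) AS t (x)
ORDER BY x NULLS FIRST;
-- NULL, 1, 2
```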

Conclusion

In this article, we’ve explored how to use OUTER JOIN with NULL in PostgreSQL. By combining UNION with a LEFT JOIN (equivalent to a RIGHT OUTER JOIN with the tables swapped), we can achieve our desired result: every row of the Node_Properties table appears with its node, and a 'bike' row appears even when no such property exists.

Remember that when working with OUTER JOINs, it’s essential to understand how they return data even when there are no matches. By using the correct syntax and techniques, you can create powerful queries that combine data from multiple tables and produce meaningful results.


Last modified on 2023-05-11