A short focus on lateral join

Posted 2 years ago.

In the last article, focus was made on the full text search capabilities of Postgresql. Because real life businesses are complex, full text queries become complex with lot joins inside CTEs (covered by another previous article), lateral joins may then be an interesting tool to compute data.

Let’s consider the following example: we want to fetch the last measure from meteorology stations:

SELECT
  station.station_id,
  station.name,
  station.coordinates,
  (
    SELECT measure
    FROM   measure
    WHERE  measure.station_id = station.station_id
    ORDER BY measured_at DESC
    LIMIT 1
  ) AS last_measure
FROM
  station
WHERE
  station_id IN (…)

The result of such query would look like

┌────────────┬────────┬───────────────────────────┬─────────── …
│ station_id │  name  │        coordinates        │   last_measure
├────────────┼────────┼───────────────────────────┼─────────── …
│          1 │ alpha  │ (34.20348,-3.283465)      │ (4ba46d39- …
│          3 │ charly │ (47.234634,-140.34523543) │ (6521b7e3- …

This is exactly what was expected but in the case the query has a condition on the measure or to order the results by one of the measure’s data, it has to be composed in a more complex query.

What can LATERAL do ?

SELECT
  station.station_id,
  last_measure,
  station.name,
  station.coordinates
FROM
  station
  LEFT JOIN LATERAL
  (
    SELECT
      measure.measure_id,
      measure.station_id,
    …
      measure.rain
    FROM    measure
    WHERE   station.station_id = measure.station_id
    ORDER BY measured_at DESC
    LIMIT 1
  ) AS last_measure ON true
WHERE
  station.station_id IN (1, 3)

Exactly as the subselect in the projection part of the main query, a LATERAL query can reference a context in its WHERE clause. But instead of being evaluated in the last part of the plan (SELECT) it is evaluated in the first stages of the process (FROM / WHERE). This means it is now possible to filter upon one of the measure data. Since it is a LEFT JOIN here, different kind of results can be returned:

The obvious

WHERE
  station.station_id IN (1, 3)
  AND last_measure.rain > 0

will filter out all stations in the predefined set that have last results with no rain, but there is also an interesting

SELECTFROM
  station
  LEFT JOIN LATERAL
  (
    …
  ) AS last_measure ON last_measure.rain > 0
WHERE
  station.station_id IN (1, 3)

That will return all stations of the defined set but they will have a measure data only if the join condition is filled:

┌────────────┬──────────────────────┬────────┬───────────────────────────┐
│ station_id │     last_measure     │  name  │        coordinates        │
├────────────┼──────────────────────┼────────┼───────────────────────────┤
│          1 │ ø                    │ alpha  │ (34.20348,-3.283465)      │
│          3 │ (652 …               │ charly │ (47.234634,-140.34523543) │
└────────────┴──────────────────────┴────────┴───────────────────────────┘
(2 rows)

From a performance perspective, it is important to keep in mind that the cost of both kind of queries directly depends on the number of stations returned by the main query, it is okay to do that with a very limited number of stations whereas other queries will give better performances for a bigger set of stations. This will be the subject of a future article on performances.