A short focus on recursive queries

Posted 3 years ago.

In the last article we saw how to perform basic statistical operations in the database. Let’s go on a previous article about CTE and more precisely about recursive CTE.

In case you want to try it your side, here is the GIST with the data and the structures.. Be sure you download the file and insert it through a \i psql command to avoid paste & copy annoyances with the COPY commands where data are separated using tabs.

What do we want ?

Let’s say we want to fetch a single employee department but with the hierarchical list of its departments. The department table has a link to itself which indicates a parent relationship. By example, the employee id 56 (Dr. Ursula Davis II) belongs to the dba department which is part of the production department and ultimately the my_company department.

The problem here is that we have employees with different depth of departments so depending on the employee there might be a different number of joins to fetch all of its departments. The SQL99 standard defines a way to perform such recursive queries.

To understand how a recursive CTE works, let’s make a simple sequence generator:

with recursive
  inc (a) as (
    select 1::int4 as a
    union all
    select a + 1 as a from inc where a < 6
)
select a as generator from inc;
┌───────────┐
│ generator │
├───────────┤
│         1 │
│         2 │
│         3 │
│         4 │
│         5 │
│         6 │
└───────────┘
(6 rows)

A first set named inc is defined. It contains a union of two subqueries, the first one being the starting element. The recursive query is joining itself through the from statement until a reaches 6. The same kind of queries can be used to create generators like Fibonacci, there is even a Mandelbrot fractal generator written in SQL on Postgresql’s wiki.

Department recursion

How to follow the department path of a single employee:

with recursive
  sub_department as (
    select d.* from department d where d.department_id = 9
    union all
    select d.* from department d join sub_department sd on d.department_id = sd.department_parent_id
)
select sd.* from sub_department sd;
┌───────────────┬────────────┬──────────────────────┐
│ department_id │    name    │ department_parent_id │
├───────────────┼────────────┼──────────────────────┤
│             9 │ developer  │                    6 │
│             6 │ production │                    1 │
│             1 │ my_company │                    ¤ │
└───────────────┴────────────┴──────────────────────┘
(3 rows)

The recursive part fetch the asked department first then fetch the department where the department_id is the previous department’s parent until there is no parent. Let’s now add the employee’s informations.

with recursive
    emp_department as (select e.* from employee e where e.employee_id = 56),
    sub_department as (
        select d.* from department d join emp_department ed using (department_id)
        union all
        select d.* from department d join sub_department sd on sd.department_parent_id = d.department_id
    )
select
  ed.employee_id,
  ed.name,
  ed.birthdate,
  array_agg(sd) as departments
from
  emp_department ed,
  sub_department sd
group by
  ed.employee_id,
  ed.name,
  ed.birthdate
;

The code above first query all the informations about the given employee. Then it recursively traverses its departments using a join on department_id. At last, it projects the result reducing the departments as an array of tuples. The result is

┌─────────────┬─────────────────────┬────────────────────────┬────────────────────────────────────────────────────┐
│ employee_id │        name         │       birthdate        │                    departments                     │
├─────────────┼─────────────────────┼────────────────────────┼────────────────────────────────────────────────────┤
│          56 │ Dr. Ursula Davis II │ 1995-01-07 17:18:17+00 │ {"(7,dba,6)","(6,production,1)","(1,my_company,)"} │
└─────────────┴─────────────────────┴────────────────────────┴────────────────────────────────────────────────────┘
(1 row)

Depending on the interface’s needs, it may be needed to format directly the department hierarchy. This can be done using the string_agg aggregate function in place of array_agg:

  …
  ed.birthdate,
  string_agg(sd.name, ' > ') as hierarchy
from
  emp_department ed,
  …

┌─────────────┬─────────────────────┬────────────────────────┬───────────────────────────────┐
│ employee_id │        name         │       birthdate        │           hierarchy           │
├─────────────┼─────────────────────┼────────────────────────┼───────────────────────────────┤
│          56 │ Dr. Ursula Davis II │ 1995-01-07 17:18:17+00 │ dba > production > my_company │
└─────────────┴─────────────────────┴────────────────────────┴───────────────────────────────┘
(1 row)

What about Pomm ?

The last couple of articles were about statistical operations. The results returned by such queries are very often send to draw graphics and dashboards. Here we are more in more standard entity approach. Let’s use Pomm’s model manager to have Employee and Department entities. In an empty folder, add the following composer.json file:

{
    require: {
        pomm-project/cli: 2.0.*@dev,
        pomm-project/model-manager: 2.0.*@dev,
        pomm-project/foundation: 2.0.*@dev
    }
}

Then enter composer.phar install to enjoy composer’s awesomeness. Once the operation is finished, just create a file named .pomm_cli_bootstrap.php ( mind the starting dot ):

<?php
$loader = require __DIR__ . '/vendor/autoload.php';
$loader->add(null, __DIR__);

return new PommProject\Foundation\Pomm([
    'my_company' => [
        'dsn'                   => 'pgsql://greg/test',
        'class:session_builder' => '\PommProject\ModelManager\SessionBuilder'
    ]
]);

Now we can use Pomm’s CLI to check the connection and then generate:

  • model class
  • entity class
  • structure class (overwritten every time)

php vendor/bin/pomm.php pomm:generate:schema-all my_company public

This command will generate the classes for all relations in the public schema (the default schema). This creates a directory structure as follow:

MyCompany
└── PublicSchema
    ├── AutoStructure
    │   ├── Department.php
    │   └── Employee.php
    ├── DepartmentModel.php
    ├── Department.php
    ├── EmployeeModel.php
    └── Employee.php

2 directories, 6 files

Let’s implement the query in the EmployeeModel class:

    public function findByPkWithDepartmentHierarchy($employee_id)
    {
        $department_model = $this
            ->getSession()
            ->getModel('\MyCompany\PublicSchema\DepartmentModel')
            ;
        $sql = <<<SQL
with recursive
    emp_department as (select e.* from :employee e where e.employee_id = $*),
    sub_department as (
        select d.* from :department d join emp_department ed using (department_id)
        union all
        select d.* from :department d join sub_department sd on sd.department_parent_id = d.department_id
    )
select
  :projection
from
  emp_department ed,
  sub_department sd
group by
  :group_fields
SQL;
        $projection = $this->createProjection()
            ->unsetField('department_id')
            ->setField('departments', "array_agg(sd)", "public.department[]")
            ;
        $sql = strtr($sql, [
            ':employee'   => $this->structure->getRelation(),
            ':department' => $department_model->getStructure()->getRelation(),
            ':projection' => $projection->formatFieldsWithFieldAlias('ed'),
            ':group_fields' => $this
                ->createProjection()
                ->unsetField('department_id')
                ->formatFields('ed'),
        ]);

        return $this->query($sql, [$employee_id], $projection)->current();
    }

One can notice the user’s departments are fetched as objects. The converter is smart enough to convert this array as an array of Department entities.

And here is how to use it from a test.php file:

<?php
$pomm = require __DIR__ . '/.pomm_cli_bootstrap.php';

$employee = $pomm['my_company']
    ->getModel('MyCompany\PublicSchema\EmployeeModel')
    ->findByPkWithDepartmentHierarchy(56)
    ;

echo json_encode($employee->extract(), JSON_PRETTY_PRINT) . "\n";

And here is the result:

{
    employee_id: 56,
        name: Dr. Ursula Davis II,
        birthdate: {
            date: 1995-01-07 17:18:17.000000,
            timezone_type: 1,
            timezone: +00:00
        },
        departments: [
        {
            department_id: 7,
            name: dba,
            department_parent_id: 6
        },
        {
            department_id: 6,
            name: production,
            department_parent_id: 1
        },
        {
            department_id: 1,
            name: my_company,
            department_parent_id: null
        }
    ]
}

Enjoy !