A take on ORM limitations

When ORM appeared in the PHP world, they brought with them a lot of good practices in addition of sticking to a purely object oriented model which, at the time, was missing in most of PHP projects. If ORM are widely used today it is because they ease the development of large applications. But they also come with pitfalls that lead to badly designed data structures, complex processes and more generally, performances issues. The goal of this article is to present Pomm's underpinning choices and what they change for developers.

Imagine an ORM but upside down. That's what Pomm is.

Classes Vs sets

Object Relational Mappers try to grant developers with an object oriented API of the database. To do so they present database's tables as entities class definitions. Roughly, one class => one table so one record => one instance. This static structure definition puts handcuffs on the database engine since relational databases deal with sets and sets are extendible by definition. Being extendible does not mean «support for change», of course a database structure can change, extendible means sets can be transformed through SQL manipulation.

SELECT letter FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) AS my_set (letter, position);

The example below shows how to project a part of a static set. In a way, tables are just persistent sets. To return a set in a SQL database you need to use a projection operator known as SELECT which defines what set the database is going to return.

Enforcing the cast of a static class definition in the relational world means engraving projections in stone plates hence loosing query control. In some way, ORM can create simplistic joins to fetch several linked entities but still we are far from what SQL can do in terms of combining sets to shape collections of entities.

In Pomm, a set database structure is held by a mapping class instance. It defines a projection between the set and a schemaless entity class. In fact, each query defines a set developers choose to inject into typed schemaless entities.

For example, if a student in the database holds the student password, it is possible to change the projection not to hydrate Student entities with this information, it is also possible to add a new attribute named age computed by the database from the student's birth date.

This projection operator works with every builtin queries and there are helpers to use it in custom queries. It is even possible to create custom projection methods.

Postgresql also proposes another projection operator: RETURNING. It can turn every INSERT, UPDATE, DELETE into a projection. With Pomm you can do:

$student = $student_model
        ->deleteByPK(['student_id' => 2]);
It will delete the given student from the database and if it exists, return an according entity hydrated using the configured projection.

The cost of database abstraction (DBAL)

Database abstraction is a very interesting promise for developers: they can access and use all different databases the same way. It makes general purpose frameworks able to give developers access to any databases and it allows to write generic software (forum or blog plate-form) without the itch of managing database specific code.

But this has costs which are not often considered by ORM users:
it makes developers to have to learn a new specific query language simpler (and poorer) than SQL
it can only propose a small subset of types and functionalities of the underlying database
it hides a fast database under a slow and memory consuming PHP layer

In the vast majority of PHP web projects, the whole abstraction layer is interpreted for each request to access a single SQL database. In the same time, the DBAL reduces the database to a pool of entities identified by an auto-incremented id. In the end, developers use the relational database like a id => instance store in which rdbms only present poor added value. The only options presented to people building a database are poor and clumsy structures. When knowing the database engine below, using an abstraction layer on a project is like feet cuffing it.

This also leads DSL to be written in a non SQL format that is interpreted by the ORM and then imported to the database through the abstraction layer ignoring the specific goodnesses like constraints, tablespaces and almost all index types. (Oddly, the only rdbms specific features in PHP ORM are dedicated to mysql). Even though this can be considered as a feature since it brings easy migrations scripts, not using performances mechanisms while not having a hand on generated queries invariably leads to performances problems.

Postgresql as object oriented database

Pomm's choice was not to use an abstraction layer in favor of working with only one database engine: Postgresql.

Postgresql is a fast, full of features, free and robust database engine supporting SQL2008 standard augmented with some killing advantages. Knowing the database engine, PHP's PDO can be avoided in profit of native Pg library proposing even better performances and more functionalities (asynchronous event dispatcher system, scrollable cursors etc.).

There are types for business data

Postgresql comes with a large (and extensible) set of types to architecture data. It would be long to list them all here but here are the most common:

➥ JSON, XML, text, UUID
➥ timestamps with time zone, periods, timestamp ranges
➥ HSTore (key => value store), LTree (materialized paths), ts_vector (full text lexemes)
➥ point, segment, circle, box, polygon
➥ inet4, inet6 (IP addresses V4 and V6), macaddr
➥ custom types
Postgresql also proposes a bunch of functions and operators that interact with those types. The example below asks the database if a circle centered in (1,1) with radius 1 owns the point (0,0):
SELECT circle '((1,1), 1)' @> point '(0,0)'; -- returns false
This makes developers to easily query and sort results on distance, position etc.

The problem with PDO or most of the ORM is they do not know how to handle theses types. Pomm comes with a transparent and easily extensible converter layer that translates Postgresql types from / to PHP. Postgresql also proposes arrays of any supported types, so does Pomm. Custom types and table types can be converted directly into PHP object instances. It is not replacing the missing ORM's «relational» feature but it can create structured data in the database that is re-used in different entities.

Heading for performances

Like many ORM do today, in order to combine performances and security, all statements are prepared prior to execution. With Pomm, prepared statements are also pooled so if they are issued more than once, the prepared query is re-used automatically.

$model->findWhere('birth_date >= $* AND gender = $*', [ new \DateTime('1994-01-01'), 'M' ]);
$model->findWhere('birth_date >= $* AND gender = $*', [ new \DateTime('1996-05-31'), 'F' ]);
Both queries above use the same prepared statement.

Pomm 1.2 switched from PDO to native PHP's Postgresql API. This resulted in a real performances boost in addition of consuming less memory to fetch data from database. PDO's postgresql driver does not support scrollable cursor on results, collection had to keep results in memory to be iterated over more than once.

SQL as query language

Structured Query Language has 40 years of experience in manipulating sets and querying data. Programmers in the past had problem building dynamic queries. This has been replaced today with ORM's query builders. Of course using an object oriented API to build queries is easier and more testable than using old tedious string operations but in the end, the result shows same complexity with added tables as joins, added conditions in the WHERE clause, added fields in the SELECT part.

Pomm chose not to propose a query builder but a Where clause builder that could be used in generic methods and custom SQL queries instead.

$where = \Pomm\Query\Where::create('birth_date >= $*', [ new \DateTime('1994-01-01') ])
    ->andWhere('gender = $*', ['M']);

$students = $student_model->findWhere($where);

For complex queries, SQL is the best candidate for the job. Pomm proposes a smooth integration with the PHP layer using query helpers. The following example lists nearest cycle stations with available slots from a given station name. Fields list and table name are replaced by a placeholder wich is substitued at the end so developers keep concentrating on what the query does and this allow structure to change without changing the query. This query will return a collection (an iterator) on Station entities augmented with the distance information.

public function getNearestFrom($name)
{
    $sql = <<<SQL
SELECT
  :projection
FROM
  :station_table station
    LEFT JOIN :station_table near_station ON near_station.name <> station.name
WHERE
    station.name = $*
  AND
    near_station.available_slots > 0
ORDER BY
  distance ASC
SQL;

    $projection = $this->createProjection()
        ->setField('distance', '%:coord:% <-> near_station.coord', 'float4')
        ;

    $sql = strtr($sql,
        [
        ':projection'    => $projection->formatFieldsWithFieldAlias('station')
        ':station_table' => $this->getStructure()->getRelation()
        ]
    );
    return $this->query($sql, [$name], $projection);
}

The whole SQL language's features are available: WITH statements, window functions, foreign data wrappers, custom functions etc.

Pomm's ability to integrate SQL as query language allied with schemaless entities and the projection system make developers able to shape exactly the data they need to present with maximum performances.