A short focus on width_bucket function

Posted 3 years ago.

In the last short focus article we saw how to see how many sales were performed during each day of a given period of time. What if now we want to see the distribution of sales upon their prices? What do we want? To see how many sales have a total price between 0-999, 1,000-1,999 etc. up to 10,000?

This is where a Postgres’s mathematical function enters the scene: width_bucket. To be short, this function returns the position of each row in the defined buckets.

select
  sale_id,
  total_price_ct,
  width_bucket(total_price_ct, 0, 10000, 10) as bucket_position
from sale
where seller_id = 3
;

┌─────────┬────────────────┬─────────────────┐
│ sale_id │ total_price_ct │ bucket_position │
├─────────┼────────────────┼─────────────────┤
│      64 │           5420 │               6 │
│      65 │           3210 │               4 │
│      66 │           2510 │               3 │
│      67 │           6780 │               7 │
│      68 │           2640 │               3 │
│      69 │            280 │               1 │
│      70 │            130 │               1 │
│      71 │           6180 │               7 │
│      72 │           8580 │               9 │
│      73 │            160 │               1 │
└─────────┴────────────────┴─────────────────┘
(10 rows)
In the example above, ten buckets are defined between 0 and 10,000 so the function returns the bucket index each record has its total_price_ct value in. From here, it may appear easy to get the count of sales in each bucket hence the distribution of sales upon total price:

select
  width_bucket(total_price_ct, 0, 10000, 10) as bucket_position,
  count(sale) as sales_count
from sale
where seller_id = 3
group by 1
order by 1 asc
;

┌─────────────────┬─────────────┐
│ bucket_position │ sales_count │
├─────────────────┼─────────────┤
│               1 │           3 │
│               3 │           2 │
│               4 │           1 │
│               6 │           1 │
│               7 │           2 │
│               9 │           1 │
└─────────────────┴─────────────┘
(6 rows)

This looks familiar to those who followed the previous article. Some buckets do not have sales hence are not present in the result set. So they can not appear in the grouped clause.

with
  sale_distribution as (
    select
      s.sale_id,
      width_bucket(s.total_price_ct, 0, 10000, 10) as bucket_no,
      s.total_price_ct
    from
      sale s
    where
      s.seller_id = 3
)
select
  val.n as bucket_position,
  count(sd.*) as sales_count
from
    generate_series(1, 10) val (n)
    left join sale_distribution sd on val.n = sd.bucket_no
group by 1
order by 1 asc
    ;

┌─────────────────┬─────────────┐
│ bucket_position │ sales_count │
├─────────────────┼─────────────┤
│               1 │           3 │
│               2 │           0 │
│               3 │           2 │
│               4 │           1 │
│               5 │           0 │
│               6 │           1 │
│               7 │           2 │
│               8 │           0 │
│               9 │           1 │
│              10 │           0 │
└─────────────────┴─────────────┘
(10 rows)

What about a formatted graph directly in text mode ?

with
  sale_distribution as (
    select
      s.sale_id,
      width_bucket(s.total_price_ct, 0, 10000, 10) as bucket_no,
      s.total_price_ct
    from
      sale s
    where
      s.seller_id = 3
)
select
  format('%04s - %04s',(val.n - 1) * 1000, (val.n - 1) * 1000 + 999) as sale_price,
  repeat('#', 3 * count(sd.*)::int4) as sales_count
from
    generate_series(1, 10) val (n)
    left join sale_distribution sd on val.n = sd.bucket_no
group by 1
order by 1 asc
    ;

┌─────────────┬─────────────┐
│ sale_price  │ sales_count │
├─────────────┼─────────────┤
│    0 -  999 │ #########   │
│ 1000 - 1999 │             │
│ 2000 - 2999 │ ######      │
│ 3000 - 3999 │ ###         │
│ 4000 - 4999 │             │
│ 5000 - 5999 │ ###         │
│ 6000 - 6999 │ ######      │
│ 7000 - 7999 │             │
│ 8000 - 8999 │ ###         │
│ 9000 - 9999 │             │
└─────────────┴─────────────┘
(10 rows)

How to do that using Pomm ?

I have been asked to show how to perform theses queries with Pomm. Here is an example of CLI PHP script that will show nice formatted results (thanks to Symfony Console component).

<?php
$loader = require DIR . '/vendor/autoload.php';

use PommProject\Foundation\Pomm; use Symfony\Component\Console\Helper\Table; use Symfony\Component\Console\Output\ConsoleOutput;

$iterator = (new Pomm(['my_db' => ['dsn' => 'pgsql://user:pass@host/db_name']])) ->getDefaultSession() ->getQueryManager() ->query(<<<SQL select sale_id, total_price_ct from sales s where s.seller_id = $* SQL , [3]) ; $table = new Table(new ConsoleOutput()); $table->setHeaders(['sale ID', 'sale price (cent)']);

foreach ($iterator as $row) { $table->addRow(array_values($row)); }

$table->render();

Enjoy.