Merge Time Slices by union command

table product_1
product_idvalid_fromvalid_toprice
10022020-01-012020-01-0540
10022020-01-052020-01-0940
10022020-01-092020-01-1130
10022021-01-012021-01-0230
10022021-03-152021-03-3135
10022021-03-312021-04-0135
10022021-04-012022-01-0135
10022022-01-012022-05-0130
10022022-05-012022-06-0130
10022022-06-012022-07-0150

This means, for example, that the price of the product with product_id = 1002 will be $40 from 2020-01-01 up to and including 2020-01-08. For temporal table see Unitemporal Data.

The table product_1 with the key product_id and the time intervals from valid_from to valid_to serves as an example for merging time slices. The attribute price stands here representative for any number of attributes that say something about the product at a certain time interval.

table product_2
product_idvalid_fromvalid_toprice
10022020-01-012020-01-0940
10022020-01-092020-01-1130
10022021-01-012021-01-0230
10022021-03-152022-01-0135
10022022-01-012022-06-0130
10022022-06-012022-07-0150

The table product_2 is created by temporal normalisation from product_1. This means that two rows with adjacent time intervals are combined into one time slice if all the attached attributes have the same value. This is illustrated in the colour-coded tables product_1 and product_2 by the same colours.

This site shows a sequence of SQL commands that can be used to perform temporal normalisation. The algorithm works correct under two conditions:

   – The column pair product_id + valid_from can be used as primary key.
   – It is always valid_to >= valid_from.

Therefore, valid_from = valid_to is also permitted if each valid_from exists only once for a product_id. Since time intervals are open to the right, a time interval [valid_from, valid_to) comprises 0 days. These zero intervals are used practically to be able to store information about a product that once existed but whose validity has been completely revoked retroactively (see also Bitemporal Data).

Gaps are also allowed. The time slice starting on 2020-09-01 ends on 2020-01-11. Since the next time slice does not start until 2021-01-01, the product with the product_id 1002 is not defined from 2020-01-11 to 2020-12-31. The described algorithm for merging time slices works both with and without gaps in the temporal history.

The table product_1 was created with the following SQL commands

create table product_1 as 
select * from ( values
    (1002, '2020-01-01'::date, '2020-01-05'::date, 40), 
    (1002, '2020-01-05'::date, '2020-01-09'::date, 40), 
    (1002, '2020-01-09'::date, '2020-01-11'::date, 30), 
    (1002, '2021-01-01'::date, '2021-01-02'::date, 30),   
    (1002, '2021-03-15'::date, '2021-03-31'::date, 35), 
    (1002, '2021-03-31'::date, '2021-04-01'::date, 35), 
    (1002, '2021-04-01'::date, '2022-01-01'::date, 35), 
    (1002, '2022-01-01'::date, '2022-05-01'::date, 30), 
    (1002, '2022-05-01'::date, '2022-06-01'::date, 30), 
    (1002, '2022-06-01'::date, '2022-07-01'::date, 50) 
) as foo (product_id, valid_from, valid_to, price);

A merging of neighbouring time slices via the date valid is possible if all dependent attributes in the time slice which is valid up to the date valid have the same values as the attribute values in the time slice which is valid from the date valid.

In order not to have to compare the potentially very many attributes of different time slices with each other individually, the SQL command union is used. The command union without the option all condenses two identical data records into one data record. The possibility of temporal merging without compares is thus given by counting the data records for the key product_id + the border date valid after the union.

The transfer to product_2 is carried out in seven steps.

                                 

SQL 1

Code to extract all valid_from from table product_id.

The neighbour_missing attribute is only filled with “0” if both an earlier adjacent time slice and subsequent adjacent time slice exist for the date valid.

create table valid_from as 
select 
    a.*, a.valid_from as valid, 
    case 
        when b.product_id is null then 1 
        else 0 
    end as neighbour_missing 
from product_1 a left join product_1 b 
    on a.product_id = b.product_id and a.valid_from = b.valid_to 
       and b.valid_to > b.valid_from; 
alter table valid_from drop column valid_from, drop column valid_to;
table valid_from
product_idvalidpriceneighbour_missing
10022020-01-01401
10022020-01-05400
10022020-01-09300
10022021-01-01301
10022021-03-15351
10022021-03-31350
10022021-04-01350
10022022-01-01300
10022022-05-01300
10022022-06-01500
                                 

SQL 2

Code to extract valid_to from table product_id.

Only valid data are extracted that occur as valid_from and valid_to for the product_id. The attribute neighbour_missing is thus by definition always 0.

create table valid_to as 
select 
    a.*, a.valid_to as valid, 0 as neighbour_missung 
    from product_1 a join product_1 b 
       on a.product_id = b.product_id and a.valid_to = b.valid_from; 
alter table valid_to drop column valid_from, drop column valid_to;
table valid_to
product_idvalidpriceneighbour_missing
10022020-01-05400
10022020-01-09400
10022021-03-31350
10022021-04-01350
10022022-01-01350
10022022-05-01300
10022022-06-01300
                                 

SQL 3

Merge valid_from and valid_to via union.
create table valid as 
select * from valid_from union select * from valid_to;
table valid
product_idvalidpriceneighbour_missing
10022020-01-01401
10022020-01-05400
10022020-01-09400
10022020-01-09300
10022021-01-01301
10022021-03-15351
10022021-03-31350
10022021-04-01350
10022021-01-01350
10022022-01-01300
10022022-05-01300
10022022-06-01300
10022022-06-01500
                                 

SQL 4

Counting the time slices matching a key summed with the number of time gaps.
create table counting as 
select product_id, valid, 
    count(*) - 1 + max(neighbour_missing) as different 
from valid 
group by 1, 2;
/tr>
table counting
product_idvaliddifferent
10022020-01-011
10022020-01-050
10022020-01-091
10022021-01-011
10022021-03-151
10022021-03-310
10022021-04-010
10022022-01-011
10022022-05-010
10022022-06-011
                                 

SQL 5

Determine contiguous time slices of the same values by calculating “level”.
create table level as 
select product_id, valid, 
    sum(different) 
        over(partition by product_id order by valid) as level 
from counting;
/tr>
table level
product_idvalidlevel
10022020-01-011
10022020-01-051
10022020-01-092
10022021-01-013
10022021-03-154
10022021-03-314
10022021-04-014
10022022-01-015
10022022-05-015
10022022-06-016
                                 

SQL 6

Merge time slices with the same level.
create table compressor as 
select 
    product_id, level, 
    min(valid) as min_valid, 
    max(valid) as max_valid 
from level 
group by 1 , 2;
table compressor
product_idlevelmin_validmax_valid
100212020-01-012020-01-05
100222020-01-092020-01-09
100232021-01-012021-01-01
100242021-03-152021-04-01
100252022-01-012022-05-01
100262022-06-012022-06-01
                                 

SQL 7

Determine valid_from and valid_to using valid.
create table product_2 as 
select a.min_valid, b.* 
from compressor a join product_1 b 
    on a.product_id = b.product_id and a.max_valid = b.valid_from; 
alter table product_2 drop column valid_from; 
alter table product_2 rename column min_valid to valid_from;
table product_2
product_idvalid_fromvalid_toprice
10022020-01-012020-01-0940
10022020-01-092020-01-1130
10022021-01-012021-01-0230
10022021-03-152022-01-0135
10022022-01-012022-06-0130
10022022-06-012022-07-0150
                                 

SQL Summary

Use at your own responsibility!

create table product_1 as 
select * from ( values
    (1002, '2020-01-01'::date, '2020-01-05'::date, 40), 
    (1002, '2020-01-05'::date, '2020-01-09'::date, 40), 
    (1002, '2020-01-09'::date, '2020-01-11'::date, 30), 
    (1002, '2021-01-01'::date, '2021-01-02'::date, 30),   
    (1002, '2021-03-15'::date, '2021-03-31'::date, 35), 
    (1002, '2021-03-31'::date, '2021-04-01'::date, 35), 
    (1002, '2021-04-01'::date, '2022-01-01'::date, 35), 
    (1002, '2022-01-01'::date, '2022-05-01'::date, 30), 
    (1002, '2022-05-01'::date, '2022-06-01'::date, 30), 
    (1002, '2022-06-01'::date, '2022-07-01'::date, 50) 
) as foo (product_id, valid_from, valid_to, price);

create table valid_from as 
select 
    a.*, a.valid_from as valid, 
    case 
        when b.product_id is null then 1 
        else 0 
    end as neighbour_missing 
from product_1 a left join product_1 b 
    on a.product_id = b.product_id and a.valid_from = b.valid_to 
       and b.valid_to > b.valid_from; 
alter table valid_from drop column valid_from, drop column valid_to;

create table valid_to as 
select 
    a.*, a.valid_to as valid, 0 as neighbour_missung 
    from product_1 a join product_1 b 
       on a.product_id = b.product_id and a.valid_to = b.valid_from; 
alter table valid_to drop column valid_from, drop column valid_to;

create table valid as 
select * from valid_from union select * from valid_to;

create table counting as 
select product_id, valid, 
    count(*) - 1 + max(neighbour_missing) as different 
from valid 
group by 1, 2;

create table level as 
select product_id, valid, 
    sum(different) 
        over(partition by product_id order by valid) as level 
from counting;

create table compressor as 
select 
    product_id, level, 
    min(valid) as min_valid, 
    max(valid) as max_valid 
from level 
group by 1 , 2;

create table product_2 as 
select a.min_valid, b.* 
from compressor a join product_1 b 
    on a.product_id = b.product_id and a.max_valid = b.valid_from; 
alter table product_2 drop column valid_from; 
alter table product_2 rename column min_valid to valid_from;
%d bloggers like this: