Merge Time Slices by union command

table product_1
product_id	valid_from	valid_to	price
1002	2020-01-01	2020-01-05	40
1002	2020-01-05	2020-01-09	40
1002	2020-01-09	2020-01-11	30
1002	2021-01-01	2021-01-02	30
1002	2021-03-15	2021-03-31	35
1002	2021-03-31	2021-04-01	35
1002	2021-04-01	2022-01-01	35
1002	2022-01-01	2022-05-01	30
1002	2022-05-01	2022-06-01	30
1002	2022-06-01	2022-07-01	50

This means, for example, that the price of the product with product_id = 1002 will be $40 from 2020-01-01 up to and including 2020-01-08. For temporal table see Unitemporal Data.

The table product_1 with the key product_id and the time intervals from valid_from to valid_to serves as an example for merging time slices. The attribute price stands here representative for any number of attributes that say something about the product at a certain time interval.

table product_2
product_id	valid_from	valid_to	price
1002	2020-01-01	2020-01-09	40
1002	2020-01-09	2020-01-11	30
1002	2021-01-01	2021-01-02	30
1002	2021-03-15	2022-01-01	35
1002	2022-01-01	2022-06-01	30
1002	2022-06-01	2022-07-01	50

The table product_2 is created by temporal normalisation from product_1. This means that two rows with adjacent time intervals are combined into one time slice if all the attached attributes have the same value. This is illustrated in the colour-coded tables product_1 and product_2 by the same colours.

This site shows a sequence of SQL commands that can be used to perform temporal normalisation. The algorithm works correct under two conditions:

– The column pair product_id + valid_from can be used as primary key.
– It is always valid_to >= valid_from.

Therefore, valid_from = valid_to is also permitted if each valid_from exists only once for a product_id. Since time intervals are open to the right, a time interval [valid_from, valid_to) comprises 0 days. These zero intervals are used practically to be able to store information about a product that once existed but whose validity has been completely revoked retroactively (see also Bitemporal Data).

Gaps are also allowed. The time slice starting on 2020-09-01 ends on 2020-01-11. Since the next time slice does not start until 2021-01-01, the product with the product_id 1002 is not defined from 2020-01-11 to 2020-12-31. The described algorithm for merging time slices works both with and without gaps in the temporal history.

The table product_1 was created with the following SQL commands

create table product_1 as 
select * from ( values
    (1002, '2020-01-01'::date, '2020-01-05'::date, 40), 
    (1002, '2020-01-05'::date, '2020-01-09'::date, 40), 
    (1002, '2020-01-09'::date, '2020-01-11'::date, 30), 
    (1002, '2021-01-01'::date, '2021-01-02'::date, 30),   
    (1002, '2021-03-15'::date, '2021-03-31'::date, 35), 
    (1002, '2021-03-31'::date, '2021-04-01'::date, 35), 
    (1002, '2021-04-01'::date, '2022-01-01'::date, 35), 
    (1002, '2022-01-01'::date, '2022-05-01'::date, 30), 
    (1002, '2022-05-01'::date, '2022-06-01'::date, 30), 
    (1002, '2022-06-01'::date, '2022-07-01'::date, 50) 
) as foo (product_id, valid_from, valid_to, price);

A merging of neighbouring time slices via the date valid is possible if all dependent attributes in the time slice which is valid up to the date valid have the same values as the attribute values in the time slice which is valid from the date valid.

In order not to have to compare the potentially very many attributes of different time slices with each other individually, the SQL command union is used. The command union without the option all condenses two identical data records into one data record. The possibility of temporal merging without compares is thus given by counting the data records for the key product_id + the border date valid after the union.

The transfer to product_2 is carried out in seven steps.

SQL 1

Code to extract all valid_from from table product_id.

The neighbour_missing attribute is only filled with “0” if both an earlier adjacent time slice and subsequent adjacent time slice exist for the date valid.

create table valid_from as 
select 
    a.*, a.valid_from as valid, 
    case 
        when b.product_id is null then 1 
        else 0 
    end as neighbour_missing 
from product_1 a left join product_1 b 
    on a.product_id = b.product_id and a.valid_from = b.valid_to 
       and b.valid_to > b.valid_from; 
alter table valid_from drop column valid_from, drop column valid_to;

table valid_from
product_id	valid	price	neighbour_missing
1002	2020-01-01	40	1
1002	2020-01-05	40	0
1002	2020-01-09	30	0
1002	2021-01-01	30	1
1002	2021-03-15	35	1
1002	2021-03-31	35	0
1002	2021-04-01	35	0
1002	2022-01-01	30	0
1002	2022-05-01	30	0
1002	2022-06-01	50	0

SQL 2

Code to extract valid_to from table product_id.

Only valid data are extracted that occur as valid_from and valid_to for the product_id. The attribute neighbour_missing is thus by definition always 0.

create table valid_to as 
select 
    a.*, a.valid_to as valid, 0 as neighbour_missung 
    from product_1 a join product_1 b 
       on a.product_id = b.product_id and a.valid_to = b.valid_from; 
alter table valid_to drop column valid_from, drop column valid_to;

table valid_to
product_id	valid	price
1002	2020-01-05	40
1002	2020-01-09	40
1002	2021-03-31	35
1002	2021-04-01	35
1002	2022-01-01	35
1002	2022-05-01	30
1002	2022-06-01	30

SQL 3

Merge valid_from and valid_to via union.

create table valid as 
select * from valid_from union select * from valid_to;

table valid
product_id	valid	price	neighbour_missing
1002	2020-01-01	40	1
1002	2020-01-05	40	0
1002	2020-01-09	40	0
1002	2020-01-09	30	0
1002	2021-01-01	30	1
1002	2021-03-15	35	1
1002	2021-03-31	35	0
1002	2021-04-01	35	0
1002	2021-01-01	35	0
1002	2022-01-01	30	0
1002	2022-05-01	30	0
1002	2022-06-01	30	0
1002	2022-06-01	50	0

SQL 4

Counting the time slices matching a key summed with the number of time gaps.

create table counting as 
select product_id, valid, 
    count(*) - 1 + max(neighbour_missing) as different 
from valid 
group by 1, 2;

/tr>

table counting
product_id	valid	different
1002	2020-01-01	1
1002	2020-01-05	0
1002	2020-01-09	1
1002	2021-01-01	1
1002	2021-03-15	1
1002	2021-03-31	0
1002	2021-04-01	0
1002	2022-01-01	1
1002	2022-05-01	0
1002	2022-06-01	1

SQL 5

Determine contiguous time slices of the same values by calculating “level”.

create table level as 
select product_id, valid, 
    sum(different) 
        over(partition by product_id order by valid) as level 
from counting;

/tr>

table level
product_id	valid	level
1002	2020-01-01	1
1002	2020-01-05	1
1002	2020-01-09	2
1002	2021-01-01	3
1002	2021-03-15	4
1002	2021-03-31	4
1002	2021-04-01	4
1002	2022-01-01	5
1002	2022-05-01	5
1002	2022-06-01	6

SQL 6

Merge time slices with the same level.

create table compressor as 
select 
    product_id, level, 
    min(valid) as min_valid, 
    max(valid) as max_valid 
from level 
group by 1 , 2;

table compressor
product_id	level	min_valid	max_valid
1002	1	2020-01-01	2020-01-05
1002	2	2020-01-09	2020-01-09
1002	3	2021-01-01	2021-01-01
1002	4	2021-03-15	2021-04-01
1002	5	2022-01-01	2022-05-01
1002	6	2022-06-01	2022-06-01

SQL 7

Determine valid_from and valid_to using valid.

create table product_2 as 
select a.min_valid, b.* 
from compressor a join product_1 b 
    on a.product_id = b.product_id and a.max_valid = b.valid_from; 
alter table product_2 drop column valid_from; 
alter table product_2 rename column min_valid to valid_from;

table product_2
product_id	valid_from	valid_to	price
1002	2020-01-01	2020-01-09	40
1002	2020-01-09	2020-01-11	30
1002	2021-01-01	2021-01-02	30
1002	2021-03-15	2022-01-01	35
1002	2022-01-01	2022-06-01	30
1002	2022-06-01	2022-07-01	50

SQL Summary

Use at your own responsibility!

create table product_1 as 
select * from ( values
    (1002, '2020-01-01'::date, '2020-01-05'::date, 40), 
    (1002, '2020-01-05'::date, '2020-01-09'::date, 40), 
    (1002, '2020-01-09'::date, '2020-01-11'::date, 30), 
    (1002, '2021-01-01'::date, '2021-01-02'::date, 30),   
    (1002, '2021-03-15'::date, '2021-03-31'::date, 35), 
    (1002, '2021-03-31'::date, '2021-04-01'::date, 35), 
    (1002, '2021-04-01'::date, '2022-01-01'::date, 35), 
    (1002, '2022-01-01'::date, '2022-05-01'::date, 30), 
    (1002, '2022-05-01'::date, '2022-06-01'::date, 30), 
    (1002, '2022-06-01'::date, '2022-07-01'::date, 50) 
) as foo (product_id, valid_from, valid_to, price);

create table valid_from as 
select 
    a.*, a.valid_from as valid, 
    case 
        when b.product_id is null then 1 
        else 0 
    end as neighbour_missing 
from product_1 a left join product_1 b 
    on a.product_id = b.product_id and a.valid_from = b.valid_to 
       and b.valid_to > b.valid_from; 
alter table valid_from drop column valid_from, drop column valid_to;

create table valid_to as 
select 
    a.*, a.valid_to as valid, 0 as neighbour_missung 
    from product_1 a join product_1 b 
       on a.product_id = b.product_id and a.valid_to = b.valid_from; 
alter table valid_to drop column valid_from, drop column valid_to;

create table valid as 
select * from valid_from union select * from valid_to;

create table counting as 
select product_id, valid, 
    count(*) - 1 + max(neighbour_missing) as different 
from valid 
group by 1, 2;

create table level as 
select product_id, valid, 
    sum(different) 
        over(partition by product_id order by valid) as level 
from counting;

create table compressor as 
select 
    product_id, level, 
    min(valid) as min_valid, 
    max(valid) as max_valid 
from level 
group by 1 , 2;

create table product_2 as 
select a.min_valid, b.* 
from compressor a join product_1 b 
    on a.product_id = b.product_id and a.max_valid = b.valid_from; 
alter table product_2 drop column valid_from; 
alter table product_2 rename column min_valid to valid_from;