product_id | valid_from | valid_to | price |
---|---|---|---|
1002 | 2020-01-01 | 2020-01-05 | 40 |
1002 | 2020-01-05 | 2020-01-09 | 40 |
1002 | 2020-01-09 | 2020-01-11 | 30 |
1002 | 2021-01-01 | 2021-01-02 | 30 |
1002 | 2021-03-15 | 2021-03-31 | 35 |
1002 | 2021-03-31 | 2021-04-01 | 35 |
1002 | 2021-04-01 | 2022-01-01 | 35 |
1002 | 2022-01-01 | 2022-05-01 | 30 |
1002 | 2022-05-01 | 2022-06-01 | 30 |
1002 | 2022-06-01 | 2022-07-01 | 50 |
This means, for example, that the price of the product with product_id = 1002 will be $40 from 2020-01-01 up to and including 2020-01-08. For temporal table see Unitemporal Data.
The table product_1 with the key product_id and the time intervals from valid_from to valid_to serves as an example for merging time slices. The attribute price stands here representative for any number of attributes that say something about the product at a certain time interval.
product_id | valid_from | valid_to | price |
---|---|---|---|
1002 | 2020-01-01 | 2020-01-09 | 40 |
1002 | 2020-01-09 | 2020-01-11 | 30 |
1002 | 2021-01-01 | 2021-01-02 | 30 |
1002 | 2021-03-15 | 2022-01-01 | 35 |
1002 | 2022-01-01 | 2022-06-01 | 30 |
1002 | 2022-06-01 | 2022-07-01 | 50 |
The table product_2 is created by temporal normalisation from product_1. This means that two rows with adjacent time intervals are combined into one time slice if all the attached attributes have the same value. This is illustrated in the colour-coded tables product_1 and product_2 by the same colours.
This site shows a sequence of SQL commands that can be used to perform temporal normalisation. The algorithm works correct under two conditions:
– The column pair product_id + valid_from can be used as primary key.
– It is always valid_to >= valid_from.
Therefore, valid_from = valid_to is also permitted if each valid_from exists only once for a product_id. Since time intervals are open to the right, a time interval [valid_from, valid_to) comprises 0 days. These zero intervals are used practically to be able to store information about a product that once existed but whose validity has been completely revoked retroactively (see also Bitemporal Data).
Gaps are also allowed. The time slice starting on 2020-09-01 ends on 2020-01-11. Since the next time slice does not start until 2021-01-01, the product with the product_id 1002 is not defined from 2020-01-11 to 2020-12-31. The described algorithm for merging time slices works both with and without gaps in the temporal history.
The table product_1 was created with the following SQL commands
create table product_1 as select * from ( values (1002, '2020-01-01'::date, '2020-01-05'::date, 40), (1002, '2020-01-05'::date, '2020-01-09'::date, 40), (1002, '2020-01-09'::date, '2020-01-11'::date, 30), (1002, '2021-01-01'::date, '2021-01-02'::date, 30), (1002, '2021-03-15'::date, '2021-03-31'::date, 35), (1002, '2021-03-31'::date, '2021-04-01'::date, 35), (1002, '2021-04-01'::date, '2022-01-01'::date, 35), (1002, '2022-01-01'::date, '2022-05-01'::date, 30), (1002, '2022-05-01'::date, '2022-06-01'::date, 30), (1002, '2022-06-01'::date, '2022-07-01'::date, 50) ) as foo (product_id, valid_from, valid_to, price);
A merging of neighbouring time slices via the date valid is possible if all dependent attributes in the time slice which is valid up to the date valid have the same values as the attribute values in the time slice which is valid from the date valid.
In order not to have to compare the potentially very many attributes of different time slices with each other individually, the SQL command union is used. The command union without the option all condenses two identical data records into one data record. The possibility of temporal merging without compares is thus given by counting the data records for the key product_id + the border date valid after the union.
The transfer to product_2 is carried out in seven steps.
SQL 1
Code to extract all valid_from from table product_id.
The neighbour_missing attribute is only filled with “0” if both an earlier adjacent time slice and subsequent adjacent time slice exist for the date valid.
create table valid_from as select a.*, a.valid_from as valid, case when b.product_id is null then 1 else 0 end as neighbour_missing from product_1 a left join product_1 b on a.product_id = b.product_id and a.valid_from = b.valid_to and b.valid_to > b.valid_from; alter table valid_from drop column valid_from, drop column valid_to;
product_id | valid | price | neighbour_missing |
---|---|---|---|
1002 | 2020-01-01 | 40 | 1 |
1002 | 2020-01-05 | 40 | 0 |
1002 | 2020-01-09 | 30 | 0 |
1002 | 2021-01-01 | 30 | 1 |
1002 | 2021-03-15 | 35 | 1 |
1002 | 2021-03-31 | 35 | 0 |
1002 | 2021-04-01 | 35 | 0 |
1002 | 2022-01-01 | 30 | 0 |
1002 | 2022-05-01 | 30 | 0 |
1002 | 2022-06-01 | 50 | 0 |
SQL 2
Code to extract valid_to from table product_id.
Only valid data are extracted that occur as valid_from and valid_to for the product_id. The attribute neighbour_missing is thus by definition always 0.
create table valid_to as select a.*, a.valid_to as valid, 0 as neighbour_missung from product_1 a join product_1 b on a.product_id = b.product_id and a.valid_to = b.valid_from; alter table valid_to drop column valid_from, drop column valid_to;
product_id | valid | price | neighbour_missing |
---|---|---|---|
1002 | 2020-01-05 | 40 | 0 |
1002 | 2020-01-09 | 40 | 0 |
1002 | 2021-03-31 | 35 | 0 |
1002 | 2021-04-01 | 35 | 0 |
1002 | 2022-01-01 | 35 | 0 |
1002 | 2022-05-01 | 30 | 0 |
1002 | 2022-06-01 | 30 | 0 |
SQL 3
Merge valid_from and valid_to via union.
create table valid as select * from valid_from union select * from valid_to;
product_id | valid | price | neighbour_missing |
---|---|---|---|
1002 | 2020-01-01 | 40 | 1 |
1002 | 2020-01-05 | 40 | 0 |
1002 | 2020-01-09 | 40 | 0 |
1002 | 2020-01-09 | 30 | 0 |
1002 | 2021-01-01 | 30 | 1 |
1002 | 2021-03-15 | 35 | 1 |
1002 | 2021-03-31 | 35 | 0 |
1002 | 2021-04-01 | 35 | 0 |
1002 | 2021-01-01 | 35 | 0 |
1002 | 2022-01-01 | 30 | 0 |
1002 | 2022-05-01 | 30 | 0 |
1002 | 2022-06-01 | 30 | 0 |
1002 | 2022-06-01 | 50 | 0 |
SQL 4
Counting the time slices matching a key summed with the number of time gaps.
create table counting as select product_id, valid, count(*) - 1 + max(neighbour_missing) as different from valid group by 1, 2;/tr>
product_id | valid | different |
---|---|---|
1002 | 2020-01-01 | 1 |
1002 | 2020-01-05 | 0 |
1002 | 2020-01-09 | 1 |
1002 | 2021-01-01 | 1 |
1002 | 2021-03-15 | 1 |
1002 | 2021-03-31 | 0 |
1002 | 2021-04-01 | 0 |
1002 | 2022-01-01 | 1 |
1002 | 2022-05-01 | 0 |
1002 | 2022-06-01 | 1 |
SQL 5
Determine contiguous time slices of the same values by calculating “level”.
create table level as select product_id, valid, sum(different) over(partition by product_id order by valid) as level from counting;/tr>
product_id | valid | level |
---|---|---|
1002 | 2020-01-01 | 1 |
1002 | 2020-01-05 | 1 |
1002 | 2020-01-09 | 2 |
1002 | 2021-01-01 | 3 |
1002 | 2021-03-15 | 4 |
1002 | 2021-03-31 | 4 |
1002 | 2021-04-01 | 4 |
1002 | 2022-01-01 | 5 |
1002 | 2022-05-01 | 5 |
1002 | 2022-06-01 | 6 |
SQL 6
Merge time slices with the same level.
create table compressor as select product_id, level, min(valid) as min_valid, max(valid) as max_valid from level group by 1 , 2;
product_id | level | min_valid | max_valid |
---|---|---|---|
1002 | 1 | 2020-01-01 | 2020-01-05 |
1002 | 2 | 2020-01-09 | 2020-01-09 |
1002 | 3 | 2021-01-01 | 2021-01-01 |
1002 | 4 | 2021-03-15 | 2021-04-01 |
1002 | 5 | 2022-01-01 | 2022-05-01 |
1002 | 6 | 2022-06-01 | 2022-06-01 |
SQL 7
Determine valid_from and valid_to using valid.
create table product_2 as select a.min_valid, b.* from compressor a join product_1 b on a.product_id = b.product_id and a.max_valid = b.valid_from; alter table product_2 drop column valid_from; alter table product_2 rename column min_valid to valid_from;
product_id | valid_from | valid_to | price |
---|---|---|---|
1002 | 2020-01-01 | 2020-01-09 | 40 |
1002 | 2020-01-09 | 2020-01-11 | 30 |
1002 | 2021-01-01 | 2021-01-02 | 30 |
1002 | 2021-03-15 | 2022-01-01 | 35 |
1002 | 2022-01-01 | 2022-06-01 | 30 |
1002 | 2022-06-01 | 2022-07-01 | 50 |
SQL Summary
Use at your own responsibility!
create table product_1 as select * from ( values (1002, '2020-01-01'::date, '2020-01-05'::date, 40), (1002, '2020-01-05'::date, '2020-01-09'::date, 40), (1002, '2020-01-09'::date, '2020-01-11'::date, 30), (1002, '2021-01-01'::date, '2021-01-02'::date, 30), (1002, '2021-03-15'::date, '2021-03-31'::date, 35), (1002, '2021-03-31'::date, '2021-04-01'::date, 35), (1002, '2021-04-01'::date, '2022-01-01'::date, 35), (1002, '2022-01-01'::date, '2022-05-01'::date, 30), (1002, '2022-05-01'::date, '2022-06-01'::date, 30), (1002, '2022-06-01'::date, '2022-07-01'::date, 50) ) as foo (product_id, valid_from, valid_to, price); create table valid_from as select a.*, a.valid_from as valid, case when b.product_id is null then 1 else 0 end as neighbour_missing from product_1 a left join product_1 b on a.product_id = b.product_id and a.valid_from = b.valid_to and b.valid_to > b.valid_from; alter table valid_from drop column valid_from, drop column valid_to; create table valid_to as select a.*, a.valid_to as valid, 0 as neighbour_missung from product_1 a join product_1 b on a.product_id = b.product_id and a.valid_to = b.valid_from; alter table valid_to drop column valid_from, drop column valid_to; create table valid as select * from valid_from union select * from valid_to; create table counting as select product_id, valid, count(*) - 1 + max(neighbour_missing) as different from valid group by 1, 2; create table level as select product_id, valid, sum(different) over(partition by product_id order by valid) as level from counting; create table compressor as select product_id, level, min(valid) as min_valid, max(valid) as max_valid from level group by 1 , 2; create table product_2 as select a.min_valid, b.* from compressor a join product_1 b on a.product_id = b.product_id and a.max_valid = b.valid_from; alter table product_2 drop column valid_from; alter table product_2 rename column min_valid to valid_from;