Merge Time Slices

It can happen that adjacent time slices in a table contain the same values in all their attributes. This is typical for views on tables that contain fewer columns than their source table. Then it makes sense to merge these neighboring time slices into one time slice.

table product_1
product_idvalid_fromvalid_toprice
10022020-01-012020-01-0540
10022020-01-052020-01-0940
10022020-01-092020-01-1130
10022021-01-012021-01-0230
10022021-03-152021-03-3135
10022021-03-312021-04-0135
10022021-04-012022-01-0135
10022022-01-012022-05-0130
10022022-05-012022-06-0130
10022022-06-012022-07-0150

This means, for example, that the price of the product with product_id = 1002 will be $40 from 2020-01-01 up to and including 2020-01-08. For temporal table see Unitemporal Data.

The table product_1 with the key product_id and the time intervals from valid_from to valid_to serves as an example for merging time slices. The attribute price stands here representative for any number of attributes that say something about the product at a certain time interval.

table product_2
product_idvalid_fromvalid_toprice
10022020-01-012020-01-0940
10022020-01-092020-01-1130
10022021-01-012021-01-0230
10022021-03-152022-01-0135
10022022-01-012022-06-0130
10022022-06-012022-07-0150

The table product_2 is created by temporal normalisation from product_1. This means that two rows with adjacent time intervals are combined into one time slice if all the attached attributes have the same value. This is illustrated in the colour-coded tables product_1 and product_2 by the same colours.

This site shows SQL code that can be used to perform temporal normalisation. The algorithm works correct under two conditions:

   – The column pair product_id + valid_from can be used as primary key.
   – It is always valid_to >= valid_from.

Therefore, valid_from = valid_to is also permitted if each valid_from exists only once for a product_id. Since time intervals are open to the right, a time interval [valid_from, valid_to) comprises 0 days in this situation. These zero intervals are used practically to be able to store information about a product that once existed but whose validity has been completely revoked retroactively (see also Bitemporal Data).

Gaps are also allowed. The time slice starting on 2020-09-01 ends on 2020-01-11. Since the next time slice does not start until 2021-01-01, the product with the product_id 1002 is not defined from 2020-01-11 to 2020-12-31. The described algorithm for merging time slices works both with and without gaps in the temporal history.

The table product_1 was created with the following SQL commands

create table product_1 as 
select * from ( values
    (1002, '2020-01-01'::date, '2020-01-05'::date, 40), 
    (1002, '2020-01-05'::date, '2020-01-09'::date, 40), 
    (1002, '2020-01-09'::date, '2020-01-11'::date, 30), 
    (1002, '2021-01-01'::date, '2021-01-02'::date, 30),   
    (1002, '2021-03-15'::date, '2021-03-31'::date, 35), 
    (1002, '2021-03-31'::date, '2021-04-01'::date, 35), 
    (1002, '2021-04-01'::date, '2022-01-01'::date, 35), 
    (1002, '2022-01-01'::date, '2022-05-01'::date, 30), 
    (1002, '2022-05-01'::date, '2022-06-01'::date, 30), 
    (1002, '2022-06-01'::date, '2022-07-01'::date, 50) 
) as foo (product_id, valid_from, valid_to, price);

A merging of neighboring time slices via the date valid is possible if all dependent attributes in the time slice which is valid up to the date valid have the same values as the attribute values in the time slice which is valid from the date valid.

The transfer to product_2 is carried out by a method developed by Dr. Jörg Ising, an employee of R+V Versicherung in Wiesbaden:
The basic idea is to group by all attributes except valid_from and valid_to and take the minimum of valid_from and the maximum of valid_to. The problem with this is that attributes can change and later change back to the initial value. One must therefore construct a group ID that gives adjacent time slices with the same attribute values the same group ID. If you include this group ID in the grouping, the minimum of valid_from and maximum of valid_to extends only over these so connected range.

I have adapted this to our table example:

SQL

In this SQL the table columns product_id, price stands for all table columns of a table except the temporal attributes valid_from and valid_to.

create table product_2 as 
with temp1 as
(
select *
,(lag(valid_to,1,valid_from) over(w) < valid_from)::integer
as gap_flag
from product_1
window w as (partition by product_id, price order by valid_from)
),
temp2 as
(
select *
,sum(gap_flag) over(w) as group_sequence_number
from temp1
window w as
(
partition by product_id, price
order by valid_from
rows between unbounded preceding and current row
)
)
select
product_id,
price,
min(valid_from) as valid_from,
max(valid_to) as valid_to
from temp2
group by product_id, price, group_sequence_number
;
                                 

Another way to merge time slices uses the SQL command union

%d bloggers like this: