Mining Frequent Time Interval Related Patterns Using the VertTIRP Algorithm(SPMF documentation)

This example explains how to run the VertTIRP algorithm using the SPMF open-source data mining library.

How to run this example?

To run the VertTIRP algorithm:

What is VertTIRP?

VertTIRP is an algorithm for discovering time interval related patterns in databases containing multiple event sequences, and where each event is represented by a time interval (an event has a start time and end time, and thus a duration). VertTIRP algorithm was proposed in 2021 by Mordvanyuk et al..

What is the input data of VertTIRP?

The input of VertTIRP is a time-interval sequence database.

A time-interval sequence database is a set of sequences, where each sequence contains multiple events. Each event is represented by a time interval, that is a start time and an end time. In other words, each event has a duration.

For example, here is a picture of a time interval sequence database containing three sequences, here called S1, S2 and S3:.

time interval sequence database

The first sequence (S1) indicates that an event of type C started at time 8 and ended at time 11, that an event of type A started at time 8 and ended at time 12, and an event of type B started at time 10 and ended at time 12.

The second sequence (S2) indicates that an event of type C started at time 8 and ended at time 12, that an event of type A started at time 10 and ended at time 16, and an event of type B started at time 15 and ended at time 18.

The third sequence (S3) indicates that an event of type C started at time 10 and ended at time 16, that an event of type A started at time 15 and ended at time 19, an event of type B started at time 14 and ended at time 16, and another event of type B started at time 14 and ended at time 19.

Time interval sequence databases can be used inmany applications. For example, it can be used to encode data about daily activities (A = having a meeting, B = making a phone call, C = eating).

The above example is provided in the file test.csv of the SPMF distribution. The content is as follows:

8,12,1;10,12,2;8,11,3;
8,12,3;10,16,1;15,18,2;
10,16,3;14,19,2;15,19,1;14,16,2;

In this format, each event type is an integer (A = 1, B = 2, C =3).

Then, each line is a sequence. This file has three lines and thus three sequences.

In a sequence (line), each event is represented using the format X,Y,Z; where X is the start time, Y is the end time, and Z is the event type.

For instance, the first line indicates that an event of type 1 (which was called A in the above example) has started at time 8 and ended at time 12, that an event of type 2 (which was called B in the above example) started at time 10 and ended at time 12, and that an event of type 3 (called C in the above example) started at time 8 and ended at time 11. The other lines follow the same format.

What are the input parameters of VertTIRP?

The VertTIRP algorithm is designed to find patterns in such data. To apply VertTIRP, we need to set some input parameters.

Here is an intuitive explanation of these parameters:

What is the output of VertTIRP?

VertTIRP discovers patterns that appear frequently in a time interval sequence database.

These patterns called TIRP (time interval related patterns represents) contain 1 or more events and can reveal temporal relationships that frequently appear between events.

Generally, speaking, there are seven possible relationships between two events (the 7 core Allen relationships), which are illustrated below:

allen relationships

These relationships are called b (before), m (meets), o (overlap), c (contains) , f (finish by), e (equal) and s (starts).

If we set the input parameters as in the above example, the following output is produced:

[1] #SUP: 3 #MD: 4.666666666666667 #MHS: 1.0
[1, 2] #SUP: 2 #MD: 6.0 #MHS: 1.0
[2] #SUP: 3 #MD: 3.0 #MHS: 1.3333333333333333
[3] #SUP: 3 #MD: 4.333333333333333 #MHS: 1.0
[3, 1] #SUP: 2 #MD: 8.5 #MHS: 1.0
[3, 2] #SUP: 3 #MD: 7.25 #MHS: 1.3333333333333333

In this output, each line is a pattern. Thus, the output file contains six patterns.

Detailed output

Now, if we want to get more details about the patterns that are discovered, we can changed the parameter detailed output of the algorithm and set it to true. Then, the result will be an output file that contains much more details about each pattern that is discovered. The output will then be as follows:

[1] #SUP: 3 #MD: 4.666666666666667 #MHS: 1.0
#SID: 0 #EID: 1 #START: 8 #END: 12 #SOURCEINTERVALS: [8,12] #RELATIONS: #DURATION: 4
#SID: 1 #EID: 1 #START: 10 #END: 16 #SOURCEINTERVALS: [10,16] #RELATIONS: #DURATION: 6
#SID: 2 #EID: 3 #START: 15 #END: 19 #SOURCEINTERVALS: [15,19] #RELATIONS: #DURATION: 4
[1, 2] #SUP: 2 #MD: 6.0 #MHS: 1.0
#SID: 0 #EID: 2 #START: 8 #END: 12 #SOURCEINTERVALS: [8,12][10,12] #RELATIONS: F #DURATION: 4
#SID: 1 #EID: 2 #START: 10 #END: 18 #SOURCEINTERVALS: [10,16][15,18] #RELATIONS: M #DURATION: 8
[2] #SUP: 3 #MD: 3.0 #MHS: 1.3333333333333333
#SID: 0 #EID: 2 #START: 10 #END: 12 #SOURCEINTERVALS: [10,12] #RELATIONS: #DURATION: 2
#SID: 1 #EID: 2 #START: 15 #END: 18 #SOURCEINTERVALS: [15,18] #RELATIONS: #DURATION: 3
#SID: 2 #EID: 1 #START: 14 #END: 16 #SOURCEINTERVALS: [14,16] #RELATIONS: #DURATION: 2
#SID: 2 #EID: 2 #START: 14 #END: 19 #SOURCEINTERVALS: [14,19] #RELATIONS: #DURATION: 5
[3] #SUP: 3 #MD: 4.333333333333333 #MHS: 1.0
#SID: 0 #EID: 0 #START: 8 #END: 11 #SOURCEINTERVALS: [8,11] #RELATIONS: #DURATION: 3
#SID: 1 #EID: 0 #START: 8 #END: 12 #SOURCEINTERVALS: [8,12] #RELATIONS: #DURATION: 4
#SID: 2 #EID: 0 #START: 10 #END: 16 #SOURCEINTERVALS: [10,16] #RELATIONS: #DURATION: 6
[3, 1] #SUP: 2 #MD: 8.5 #MHS: 1.0
#SID: 1 #EID: 1 #START: 8 #END: 16 #SOURCEINTERVALS: [8,12][10,16] #RELATIONS: O #DURATION: 8
#SID: 2 #EID: 3 #START: 10 #END: 19 #SOURCEINTERVALS: [10,16][15,19] #RELATIONS: M #DURATION: 9
[3, 2] #SUP: 3 #MD: 7.25 #MHS: 1.3333333333333333
#SID: 0 #EID: 2 #START: 8 #END: 12 #SOURCEINTERVALS: [8,11][10,12] #RELATIONS: F #DURATION: 4
#SID: 1 #EID: 2 #START: 8 #END: 18 #SOURCEINTERVALS: [8,12][15,18] #RELATIONS: B #DURATION: 10
#SID: 2 #EID: 1 #START: 10 #END: 16 #SOURCEINTERVALS: [10,16][14,16] #RELATIONS: F #DURATION: 6
#SID: 2 #EID: 2 #START: 10 #END: 19 #SOURCEINTERVALS: [10,16][14,19] #RELATIONS: O #DURATION: 9

In this detailed output, we have additional information about each pattern that can be interesting. Lets take the example of the first pattern, described by the first four lines:

[1] #SUP: 3 #MD: 4.666666666666667 #MHS: 1.0
#SID: 0 #EID: 1 #START: 8 #END: 12 #SOURCEINTERVALS: [8,12] #RELATIONS: #DURATION: 4
#SID: 1 #EID: 1 #START: 10 #END: 16 #SOURCEINTERVALS: [10,16] #RELATIONS: #DURATION: 6
#SID: 2 #EID: 3 #START: 15 #END: 19 #SOURCEINTERVALS: [15,19] #RELATIONS: #DURATION: 4

Now lets look at another example:

[1, 2] #SUP: 2 #MD: 6.0 #MHS: 1.0
#SID: 0 #EID: 2 #START: 8 #END: 12 #SOURCEINTERVALS: [8,12][10,12] #RELATIONS: F #DURATION: 4
#SID: 1 #EID: 2 #START: 10 #END: 18 #SOURCEINTERVALS: [10,16][15,18] #RELATIONS: M #DURATION: 8

The other lines of the output file follow the same format.

Another example

Now let's see what happens if we find patterns that contain more than two events. For this, we need to run the example again, but change the parameter minVertSUP to 0.2 and use detailed output = true. Then, the output file will contain several patterns including this pattern:

[3, 2, 2] #SUP: 1 #MD: 9.0 #MHS: 1.0
#SID: 2 #EID: 2 #START: 10 #END: 19 #SOURCEINTERVALS: [10,16][14,16][14,19] #RELATIONS: F O S #DURATION: 9

The first line of this pattern indicates that this is the pattern 3, 2, 2 which means the events C, B and B in the example.

The second line indicates that this pattern appears in the third sequence (#SID: 2) and ends in the third event (#EID: 2), and that the occurrence starts at 10 and end at 19. It is also said that the intervals of C, B and B in that occurrences are [10,16][14,16] and [14,19], respectively. Moreover, it is said that the temporal relationships between C, B and B in that occurrence are F O S, which means:

Generally, if a pattern has a set of events { e1, e2, e3... en-1, en} the relationships will be listed in this order: relationship between e1 and e2, relationship between e1 and e3, ... relationship between e1 and en-1, relationship between e1 and en, relationship between e2 and e3, ... relationship between e2 and en-1, relationship between e2 and en, ... etc.

That is the general idea about the output of VertTIRP.

Optional parameter(s)

If you are using the source code version of SPMF, you also have the option to indicate which temporal relations you want to find. By uncommenting the following lines, you can remove some relations. Then, VertTIRP will ignore them.

// algo.removeRelation(Relation.B);
// algo.removeRelation(Relation.M);
// algo.removeRelation(Relation.O);
// algo.removeRelation(Relation.F);
// algo.removeRelation(Relation.S);
// algo.removeRelation(Relation.L);
// algo.removeRelation(Relation.C);
// algo.removeRelation(Relation.E);

For example, if you uncomment the first line, then VertTIRP will ignore all the relationships of type "B" (before). This feature is interesting as you can focus on finding only some types of relationships instead of all of them, according to your needs.

Performance

VertTIRP is one of the most efficient algorithm for TIRP mining. There exists an improve version of VertTIRP also offered in SPMF, which is called FastTIRP and was shown to be faster.

Where can I get more information about VertTIRP?

The VertTIRP algorithm is described in this article:

Mordvanyuk, N., Lopez, B., Bifet, A.: VertTIRP: Robust and efficient vertical frequent time interval-related pattern mining. Expert Syst. Appl. 168, 114276 (2021)