Mining Frequent Time Interval Related Patterns Using the FastTIRP Algorithm (SPMF documentation)

This example explains how to run the FastTIRP algorithm using the SPMF open-source data mining library.

How to run this example?

To run the FastTIRP algorithm:

What is FastTIRP?

FastTIRP is an algorithm for discovering time interval related patterns in databases containing multiple event sequences, and where each event is represented by a time interval (an event has a start time and end time, and thus a duration). The FastTIRP algorithm is an improved version of the VertTIRP algorithm, which was proposed in 2022 by Fournier-Viger et al..

What is the input data of FastTIRP?

The input of FastTIRP is a time-interval sequence database.

A time-interval sequence database is a set of sequences, where each sequence contains multiple events. Each event is represented by a time interval, that is a start time and an end time. In other words, each event has a duration.

For example, here is a picture of a time interval sequence database containing three sequences, here called S1, S2 and S3:.

time interval sequence database

The first sequence (S1) indicates that an event of type C started at time 8 and ended at time 11, that an event of type A started at time 8 and ended at time 12, and an event of type B started at time 10 and ended at time 12.

The second sequence (S2) indicates that an event of type C started at time 8 and ended at time 12, that an event of type A started at time 10 and ended at time 16, and an event of type B started at time 15 and ended at time 18.

The third sequence (S3) indicates that an event of type C started at time 10 and ended at time 16, that an event of type A started at time 15 and ended at time 19, an event of type B started at time 14 and ended at time 16, and another event of type B started at time 14 and ended at time 19.

Time interval sequence databases can be used inmany applications. For example, it can be used to encode data about daily activities (A = having a meeting, B = making a phone call, C = eating).

The above example is provided in the file test.csv of the SPMF distribution. The content is as follows:

8,12,1;10,12,2;8,11,3;
8,12,3;10,16,1;15,18,2;
10,16,3;14,19,2;15,19,1;14,16,2;

In this format, each event type is an integer (A = 1, B = 2, C =3).

Then, each line is a sequence. This file has three lines and thus three sequences.

In a sequence (line), each event is represented using the format X,Y,Z; where X is the start time, Y is the end time, and Z is the event type.

For instance, the first line indicates that an event of type 1 (which was called A in the above example) has started at time 8 and ended at time 12, that an event of type 2 (which was called B in the above example) started at time 10 and ended at time 12, and that an event of type 3 (called C in the above example) started at time 8 and ended at time 11. The other lines follow the same format.

What are the input parameters of FastTIRP?

The FastTIRP algorithm is designed to find patterns in such data. To apply FastTIRP, we need to set some input parameters.

Here is an intuitive explanation of these parameters:

What is the output of FastTIRP?

FastTIRP discovers patterns that appear frequently in a time interval sequence database.

These patterns called TIRP (time interval related patterns represents) contain 1 or more events and can reveal temporal relationships that frequently appear between events.

Generally, speaking, there are seven possible relationships between two events (the 7 core Allen relationships), which are illustrated below:

allen relationships

These relationships are called b (before), m (meets), o (overlap), c (contains) , f (finish by), e (equal) and s (starts).

If we set the input parameters as in the above example, the following output is produced:

[1] #SUP: 3 #MD: 4.666666666666667 #MHS: 1.0
[1, 2] #SUP: 2 #MD: 6.0 #MHS: 1.0
[2] #SUP: 3 #MD: 3.0 #MHS: 1.3333333333333333
[3] #SUP: 3 #MD: 4.333333333333333 #MHS: 1.0
[3, 1] #SUP: 2 #MD: 8.5 #MHS: 1.0
[3, 2] #SUP: 3 #MD: 7.25 #MHS: 1.3333333333333333

In this output, each line is a pattern. Thus, the output file contains six patterns.

Detailed output

Now, if we want to get more details about the patterns that are discovered, we can changed the parameter detailed output of the algorithm and set it to true. Then, the result will be an output file that contains much more details about each pattern that is discovered. The output will then be as follows:

[1] #SUP: 3 #MD: 4.666666666666667 #MHS: 1.0
#SID: 0 #EID: 1 #START: 8 #END: 12 #SOURCEINTERVALS: [8,12] #RELATIONS: #DURATION: 4
#SID: 1 #EID: 1 #START: 10 #END: 16 #SOURCEINTERVALS: [10,16] #RELATIONS: #DURATION: 6
#SID: 2 #EID: 3 #START: 15 #END: 19 #SOURCEINTERVALS: [15,19] #RELATIONS: #DURATION: 4
[1, 2] #SUP: 2 #MD: 6.0 #MHS: 1.0
#SID: 0 #EID: 2 #START: 8 #END: 12 #SOURCEINTERVALS: [8,12][10,12] #RELATIONS: F #DURATION: 4
#SID: 1 #EID: 2 #START: 10 #END: 18 #SOURCEINTERVALS: [10,16][15,18] #RELATIONS: M #DURATION: 8
[2] #SUP: 3 #MD: 3.0 #MHS: 1.3333333333333333
#SID: 0 #EID: 2 #START: 10 #END: 12 #SOURCEINTERVALS: [10,12] #RELATIONS: #DURATION: 2
#SID: 1 #EID: 2 #START: 15 #END: 18 #SOURCEINTERVALS: [15,18] #RELATIONS: #DURATION: 3
#SID: 2 #EID: 1 #START: 14 #END: 16 #SOURCEINTERVALS: [14,16] #RELATIONS: #DURATION: 2
#SID: 2 #EID: 2 #START: 14 #END: 19 #SOURCEINTERVALS: [14,19] #RELATIONS: #DURATION: 5
[3] #SUP: 3 #MD: 4.333333333333333 #MHS: 1.0
#SID: 0 #EID: 0 #START: 8 #END: 11 #SOURCEINTERVALS: [8,11] #RELATIONS: #DURATION: 3
#SID: 1 #EID: 0 #START: 8 #END: 12 #SOURCEINTERVALS: [8,12] #RELATIONS: #DURATION: 4
#SID: 2 #EID: 0 #START: 10 #END: 16 #SOURCEINTERVALS: [10,16] #RELATIONS: #DURATION: 6
[3, 1] #SUP: 2 #MD: 8.5 #MHS: 1.0
#SID: 1 #EID: 1 #START: 8 #END: 16 #SOURCEINTERVALS: [8,12][10,16] #RELATIONS: O #DURATION: 8
#SID: 2 #EID: 3 #START: 10 #END: 19 #SOURCEINTERVALS: [10,16][15,19] #RELATIONS: M #DURATION: 9
[3, 2] #SUP: 3 #MD: 7.25 #MHS: 1.3333333333333333
#SID: 0 #EID: 2 #START: 8 #END: 12 #SOURCEINTERVALS: [8,11][10,12] #RELATIONS: F #DURATION: 4
#SID: 1 #EID: 2 #START: 8 #END: 18 #SOURCEINTERVALS: [8,12][15,18] #RELATIONS: B #DURATION: 10
#SID: 2 #EID: 1 #START: 10 #END: 16 #SOURCEINTERVALS: [10,16][14,16] #RELATIONS: F #DURATION: 6
#SID: 2 #EID: 2 #START: 10 #END: 19 #SOURCEINTERVALS: [10,16][14,19] #RELATIONS: O #DURATION: 9

In this detailed output, we have additional information about each pattern that can be interesting. Lets take the example of the first pattern, described by the first four lines:

[1] #SUP: 3 #MD: 4.666666666666667 #MHS: 1.0
#SID: 0 #EID: 1 #START: 8 #END: 12 #SOURCEINTERVALS: [8,12] #RELATIONS: #DURATION: 4
#SID: 1 #EID: 1 #START: 10 #END: 16 #SOURCEINTERVALS: [10,16] #RELATIONS: #DURATION: 6
#SID: 2 #EID: 3 #START: 15 #END: 19 #SOURCEINTERVALS: [15,19] #RELATIONS: #DURATION: 4

Now lets look at another example:

[1, 2] #SUP: 2 #MD: 6.0 #MHS: 1.0
#SID: 0 #EID: 2 #START: 8 #END: 12 #SOURCEINTERVALS: [8,12][10,12] #RELATIONS: F #DURATION: 4
#SID: 1 #EID: 2 #START: 10 #END: 18 #SOURCEINTERVALS: [10,16][15,18] #RELATIONS: M #DURATION: 8

The other lines of the output file follow the same format.

Another example

Now let's see what happens if we find patterns that contain more than two events. For this, we need to run the example again, but change the parameter minVertSUP to 0.2 and use detailed output = true. Then, the output file will contain several patterns including this pattern:

[3, 2, 2] #SUP: 1 #MD: 9.0 #MHS: 1.0
#SID: 2 #EID: 2 #START: 10 #END: 19 #SOURCEINTERVALS: [10,16][14,16][14,19] #RELATIONS: F O S #DURATION: 9

The first line of this pattern indicates that this is the pattern 3, 2, 2 which means the events C, B and B in the example.

The second line indicates that this pattern appears in the third sequence (#SID: 2) and ends in the third event (#EID: 2), and that the occurrence starts at 10 and end at 19. It is also said that the intervals of C, B and B in that occurrences are [10,16][14,16] and [14,19], respectively. Moreover, it is said that the temporal relationships between C, B and B in that occurrence are F O S, which means:

Generally, if a pattern has a set of events {e1, e2, e3... en-1, en} the relationships will be listed in this order: relationship between e1 and e2, relationship between e1 and e3, ... relationship between e1 and en-1, relationship between e1 and en, relationship between e2 and e3, ... relationship between e2 and en-1, relationship between e2 and en, ... etc.

That is the general idea about the output of FastTIRP.

Optional parameter(s)

If you are using the source code version of SPMF, you also have the option to indicate which temporal relations you want to find. By uncommenting the following lines, you can remove some relations. Then, FastTIRP will ignore them.

// algo.removeRelation(Relation.B);
// algo.removeRelation(Relation.M);
// algo.removeRelation(Relation.O);
// algo.removeRelation(Relation.F);
// algo.removeRelation(Relation.S);
// algo.removeRelation(Relation.L);
// algo.removeRelation(Relation.C);
// algo.removeRelation(Relation.E);

For example, if you uncomment the first line, then FastTIRP will ignore all the relationships of type "B" (before). This feature is interesting as you can focus on finding only some types of relationships instead of all of them, according to your needs.

Performance

FastTIRP is one of the fastest algorithm for TIRP mining. It was shown to be faster than other state-of-the-art algorithms such as VertTIRP.

Where can I get more information about FastTIRP?

The FastTIRP algorithm is described in this article:

Fournier-Viger, P. Li, Y., Nawaz, M. S., He, Y. (2022) FastTIRP: Efficient discovery of Time-Interval Related Patterns. Proc. of 10th Intern. Conf. on Big Data Analytics (BDA 2022).