View a sequence utility database file with the Sequence Utility Database Viewer (SPMF documentation)

Sequence databases with utility information are a type of data taken as input by several data mining algorithms offered in SPMF such as HUSRM and USPAN.

SPMF offers a tool to view the content of a sequence database with utility information. This tool is called the SPMF Sequence Utility Database Viewer.

This page explains how to use this tool with an example.

How to run this example?

If you want to run this example from the graphical interface of SPMF, (1) choose the algorithm "Open_sequence_utility_database_file_with_sequence_db_viewer", (2) choose the file DataBase_HUSRM.txt as input, and then (3) click "run algorithm".


What is displayed?

After running the example, the content of the file will be displayed by the tool. The picture below shows the user interface of this viewer.

The window A) shown in the picture below is the main window. It displays the utility sequence database as a table. The table has four rows in this example, and each row is a sequence from the sequence database.

Assume that this database contains the sequence of purchases made by different customers.

Take the first row as an example.
The first cell of that row indicates that the ID of the first sequence is 0.
The second cell indicates that customer 0, who is represented by this sequence, bought items 1 and 2, and that those items generated a profit of $1 and $4, respectively.
The third cell indicates that customer 0 then bought item 3 for $10.
The fourth cell indicates that customer 0 then bought item 6 for $9.
The fifth cell indicates that customer 0 then bought item 7 for $2.
The sixth cell indicates that customer 0 then bought item 5 for $1.
The last cell of the first row gives the total utility (profit) generated by that sequence of transactions made by customer 0, which is $1 + $4 + $10 + $9 + $2 + $1 = $27.

The other sequences follow the same format.

This table view can be useful for understanding the content of a utility sequence database file.
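
As a rough illustration of how one table row is put together, the sketch below builds the cells for sequence 0 from an in-memory representation. The data structure and the function name row_cells are assumptions made for this illustration; they are not part of SPMF, and the exact way the viewer formats each cell may differ.

```python
# Hypothetical in-memory form of sequence 0: a list of itemsets,
# each itemset being a list of (item, utility) pairs.
sequence_0 = [
    [(1, 1), (2, 4)],
    [(3, 10)],
    [(6, 9)],
    [(7, 2)],
    [(5, 1)],
]


def row_cells(sequence_id, itemsets):
    """Return the cells of one row: ID, one cell per itemset, total utility."""
    cells = [str(sequence_id)]
    total = 0
    for itemset in itemsets:
        # Write each item as item[utility], the notation used in the input file.
        cells.append(" ".join(f"{item}[{utility}]" for item, utility in itemset))
        total += sum(utility for _, utility in itemset)
    cells.append(str(total))
    return cells


print(" | ".join(row_cells(0, sequence_0)))
# prints: 0 | 1[1] 2[4] | 3[10] | 6[9] | 7[2] | 5[1] | 27
```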

Besides, the viewer has three buttons that provide additional features.


What is the input?

The algorithm takes as input a sequence database with utility information, as used by algorithms such as HUSRM and USPAN.

The database used in this example is provided in the text file "DataBase_HUSRM.txt" in the package ca.pfv.spmf.tests of the SPMF distribution, which follows the file format for HUSRM and USPAN.

In that format, a sequence database contains multiple sequences, and each item appearing in a sequence has a utility value.

More precisely, the input file format of HUSRM is defined as follows. It is a text file.

For example, the file DataBase_HUSRM.txt contains the following content:

1[1] 2[4] -1 3[10] -1 6[9] -1 7[2] -1 5[1] -1 -2 SUtility:27
1[1] 4[12] -1 3[20] -1 2[4] -1 5[1] 7[2] -1 -2 SUtility:40
1[1] -1 2[4] -1 6[9] -1 5[1] -1 -2 SUtility:15
1[3] 2[4] 3[5] -1 6[3] 7[1] -1 -2 SUtility:16

For example, consider the first line. It means that the first customer bought items 1 and 2, and that those items generated a profit of $1 and $4, respectively. Then, the customer bought item 3 for $10. Then, the customer bought item 6 for $9. Then, the customer bought item 7 for $2. Then, the customer bought item 5 for $1. Thus, this customer has made 5 transactions. The total utility (profit) generated by that sequence of transactions is $1 + $4 + $10 + $9 + $2 + $1 = $27.
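
The reading of the first line given above can be sketched as a small parser. This is only an illustration, not code from SPMF: the function name parse_line is hypothetical, utilities are assumed to be integers as in this example, and the meaning of the separators (-1 closes an itemset, -2 closes the sequence) is inferred from the example and from the usual SPMF sequence conventions.

```python
def parse_line(line):
    """Parse one line of the file into (itemsets, declared_total)."""
    itemsets = []
    current = []
    declared_total = None
    for token in line.split():
        if token == "-1":
            # Assumed meaning: end of the current itemset (one transaction).
            itemsets.append(current)
            current = []
        elif token == "-2":
            # Assumed meaning: end of the sequence; nothing to store.
            continue
        elif token.startswith("SUtility:"):
            declared_total = int(token.split(":", 1)[1])
        else:
            # An item written as item[utility], for example 3[10].
            item, utility = token.rstrip("]").split("[")
            current.append((int(item), int(utility)))
    return itemsets, declared_total


lines = [
    "1[1] 2[4] -1 3[10] -1 6[9] -1 7[2] -1 5[1] -1 -2 SUtility:27",
    "1[1] 4[12] -1 3[20] -1 2[4] -1 5[1] 7[2] -1 -2 SUtility:40",
    "1[1] -1 2[4] -1 6[9] -1 5[1] -1 -2 SUtility:15",
    "1[3] 2[4] 3[5] -1 6[3] 7[1] -1 -2 SUtility:16",
]

for sequence_id, line in enumerate(lines):
    itemsets, declared = parse_line(line)
    # Recompute the sequence utility and compare it with the SUtility field.
    computed = sum(utility for itemset in itemsets for _, utility in itemset)
    print(sequence_id, len(itemsets), computed, computed == declared)
```

For the four lines of DataBase_HUSRM.txt, the recomputed totals (27, 40, 15 and 16) match the SUtility values stored in the file, and the first sequence is split into 5 transactions, as described above.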