View a transaction database file with the transaction database Viewer (SPMF documentation)

Transaction databases are a type of data taken as input by many data mining algorithms offered in SPMF such as FP-Growth, Apriori and Eclat.

SPMF offers a tool to view the content of a transaction database. This tool is called the SPMF transaction database Viewer.

This page explains how to use this tool with an example.

How to run this example?

If you want to run this example from the graphical interface of SPMF, (1) choose the algorithm "Open_transaction_database_file_with_transaction_db_viewer", (2) choose the contextPasquier99.txt file as input, and then (3) click "run algorithm".

[Figure: graph viewer open]

What is displayed?

After running the example, the content of the file will be displayed by the tool. The picture below shows the user interface of this viewer.

Window A), shown in the picture below, is the main window. It displays the transaction database as a table. In this example, the table has six rows. Each row except the last one is a transaction from the transaction database.

Take the first row as an example.
The cell in the first column indicates that the ID of this transaction is 0.
The cell in the second column indicates that this transaction 0 contains the item 1.
The cell in the third column indicates that this transaction 0 does not contain the item 2.
The cell in the fourth column indicates that this transaction 0 contains the item 3.
The cell in the fifth column indicates that this transaction 0 contains the item 4.
The cell in the sixth column indicates that this transaction 0 does not contain the item 5.

The other transactions follow the same format.

The last row of the table indicates the frequency of each item. For instance, the cell in the last row and second column indicates that the item 1 appears in 3 transactions.

This table view is useful for understanding the content of a transaction database file.
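The table described above can be reproduced outside the viewer with a few lines of code. The following is a minimal sketch in Python, not the viewer's actual implementation: the function name build_table and the "x" / "." cell markers are assumptions made for illustration, and the transactions are copied from the example file used on this page.

```python
from collections import Counter

# Transactions of contextPasquier99.txt, copied from this page.
transactions = [
    [1, 3, 4],
    [2, 3, 5],
    [1, 2, 3, 5],
    [2, 5],
    [1, 2, 3, 5],
]


def build_table(transactions):
    """Return (items, rows, frequencies) for a viewer-like table.

    Each row is [transaction_id, flag_for_item_1, flag_for_item_2, ...],
    where a flag is True when the transaction contains that item.
    """
    # One column per distinct item, in ascending order.
    items = sorted({item for t in transactions for item in t})
    rows = []
    for tid, t in enumerate(transactions):  # transaction IDs start at 0
        present = set(t)
        rows.append([tid] + [item in present for item in items])
    # Frequency of an item = number of transactions that contain it.
    counts = Counter(item for t in transactions for item in set(t))
    frequencies = [counts[item] for item in items]
    return items, rows, frequencies


items, rows, frequencies = build_table(transactions)

# Print the table; "x" / "." are illustrative markers chosen for this sketch.
print("ID", *items)
for row in rows:
    print(row[0], *("x" if cell else "." for cell in row[1:]))
print("freq", *frequencies)
```

The five transaction rows plus the final frequency row give the six rows mentioned above, and the frequency computed for item 1 is 3, as in the viewer.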

Besides, there are buttons that provide additional features:

[Figure: graph viewer database graph]

What is the input?

The algorithm takes as input a transaction database in SPMF format, as used by algorithms such as FP-Growth, Apriori and Eclat.

The database used in this example is provided in the text file "contextPasquier99.txt" in the package ca.pfv.spmf.tests of the SPMF distribution.

The input file format, which is the same as for Apriori, is defined as follows. It is a text file. An item is represented by a positive integer. A transaction is a line in the text file. In each line (transaction), items are separated by a single space. It is assumed that all items within the same transaction (line) are sorted according to a total order (e.g. ascending order) and that no item can appear twice within the same line.

For example, this is the content of the example file "contextPasquier99.txt":

1 3 4
2 3 5
1 2 3 5
2 5
1 2 3 5

For example, the first transaction represents the set of items 1, 3 and 4.
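To make the format rules concrete, here is a minimal sketch of a reader written in Python. It is not part of SPMF: the function name parse_transactions is hypothetical, and the sketch assumes that the total order is ascending numeric order, which is only the example order given above.

```python
def parse_transactions(text):
    """Parse transaction lines (items separated by spaces) into lists of ints.

    Raises ValueError when a line breaks one of the format rules:
    items must be positive integers, sorted in ascending order
    (an assumption of this sketch), with no item repeated in a line.
    """
    transactions = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        items = [int(token) for token in line.split()]
        if any(item <= 0 for item in items):
            raise ValueError(f"line {line_no}: items must be positive integers")
        # Strictly ascending order implies both "sorted" and "no duplicates".
        if any(a >= b for a, b in zip(items, items[1:])):
            raise ValueError(f"line {line_no}: items must be sorted, without duplicates")
        transactions.append(items)
    return transactions


# Content of contextPasquier99.txt, copied from this page.
SAMPLE = """1 3 4
2 3 5
1 2 3 5
2 5
1 2 3 5
"""

database = parse_transactions(SAMPLE)
print(database[0])  # the first transaction: items 1, 3 and 4
```

Checking the ordering while reading is a design choice of this sketch; it reports a malformed line early instead of silently accepting it.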

Optional feature: giving names to items

Some users have requested the feature of giving names to items instead of using numbers. This feature is offered in the user interface of SPMF and in the command line of SPMF. To use this feature, your file must include @CONVERTED_FROM_TEXT as its first line, followed by several lines that define the names of the items in your file. For example, consider the example database "contextPasquier99.txt". Here we have modified the file to give names to the items:

@CONVERTED_FROM_TEXT
@ITEM=1=apple
@ITEM=2=orange
@ITEM=3=tomato
@ITEM=4=milk
@ITEM=5=bread
1 3 4
2 3 5
1 2 3 5
2 5
1 2 3 5

In this file, called contextPasquier99WithNames.txt, the first line indicates that it is a file where names are given to items. Then, the second line indicates that the item 1 is called "apple". The third line indicates that the item 2 is called "orange". The next three lines name the items 3, 4 and 5 in the same way. Then the following lines define transactions in the SPMF format.
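The naming lines can be read with a small amount of code. The sketch below, in Python, is an illustration and not SPMF's own reader: the function name parse_named_database is hypothetical, and the sketch assumes that every line starting with "@" other than an @ITEM line can be skipped.

```python
def parse_named_database(text):
    """Return (names, transactions) from a file with @ITEM=id=name lines."""
    names = {}
    transactions = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("@ITEM="):
            # "@ITEM=1=apple" -> item 1 is called "apple"
            item_id, name = line[len("@ITEM="):].split("=", 1)
            names[int(item_id)] = name
        elif line.startswith("@"):
            continue  # e.g. @CONVERTED_FROM_TEXT (assumption: safe to skip)
        else:
            transactions.append([int(token) for token in line.split()])
    return names, transactions


# Content of contextPasquier99WithNames.txt, copied from this page.
SAMPLE_WITH_NAMES = """@CONVERTED_FROM_TEXT
@ITEM=1=apple
@ITEM=2=orange
@ITEM=3=tomato
@ITEM=4=milk
@ITEM=5=bread
1 3 4
2 3 5
1 2 3 5
2 5
1 2 3 5
"""

names, transactions = parse_named_database(SAMPLE_WITH_NAMES)

# Replace each item number by its name, falling back to the number itself.
named = [[names.get(item, str(item)) for item in t] for t in transactions]
print(named[0])
```

With this mapping, the first transaction (1 3 4) reads as apple, tomato and milk.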

Then, if we apply the algorithm using this file using the user interface of SPMF or the command line, the output file contains several patterns, including the following ones:

Using the Transaction database Viewer with this file, we obtain the following view:

[Figure: view sdb with names]