Mining High Average-Utility Itemsets with Multiple Thresholds in a Transaction Database using the HAUI-MMAU Algorithm (SPMF documentation)
This example explains how to run the HAUI-MMAU algorithm using the SPMF open-source data mining library.
How to run this example?
- If you are using the graphical interface, (1) choose the "HAUI-MMAU" algorithm, (2) select the input file "contextHAUIMMAU.txt", (3) set the output file name (e.g. "output.txt") (4) set the GLMAU to 0 and the MAUI file to "MAU_Utility.txt" and (5) click "Run algorithm".
- If you want to execute this example from the command line,
then execute this command:
java -jar spmf.jar run HAUI-MMAU contextHAUIMMAU.txt output.txt 0 MAU_Utility.txt in a folder containing spmf.jar and the example input file contextHAUIMMAU.txt. - If you are using the source code version of SPMF, launch the file "MainTestHAUI_MMAU_saveToFile.java" in the package ca.pfv.SPMF.tests.
What is HAUI-MMAU?
HAUI-MMAU is an algorithm for mining average-utility itemsets by using multiple minimum average-utility thresholds. Unlike some other algorithms such as HAUI-Miner, the HAUI-MMAU algorithms allows to set a different threshold for each item, rather than a single threshold to evaluate all items. Setting multiple thresholds is useful because it allows to set lower minimum average-utility thresholds for low profit items. Therefore, it allows discovering high average-utility itemsets containing low profit items.
What is the input?
The input of HAUI-MMAU is a transaction database and a list of minimum average-utility thresholds indicating the minimum average-utility threshold for each item.
A transaction database is a set of transactions, where each transaction is a list of distinct items (symbols). For example, let's consider the following transaction database. It consists of 5 transactions (t1, t2, ..., t5) and 8 items (1, 2, 3, 4, 5, 6). For instance, transaction t1 is the set of items {2, 3, 4, 5}. This database is provided in the file "contextHAUIMMAU.txt" of the SPMF distribution. It is important to note that an item is not allowed to appear twice in the same transaction.
Transaction ID |
Items |
Utilities |
t1 |
2 3 4 5 |
14 2 6 4 |
t2 |
2 3 4 |
8 3 6 |
t3 |
1 4 |
10 2 |
t4 |
1 3 6 |
5 6 4 |
t5 |
2 3 4 6 |
4 3 2 2 |
The list of minimum average-utility threshold is stored in a text file that is read as input by the algorithm. This is provided in the file "MAU_Utility.txt":
item |
minimum average-utility threshold |
1 |
5 |
2 |
2 |
3 |
1 |
4 |
2 |
5 |
4 |
6 |
1 |
This file indicated for example that the average-utility threshold to be used for item 1 is 5.
What is the output?
The output of HAUI-MMAU is the set of all high average-utility itemsets contained in the database.
What is a high average-utility itemset? The average-utility of an itemset is the sum of the average-utilities of transactions containing the itemset. An itemset is a high average-utility itemset if its average-utility is higher or equal to the smallest minimum average-utility threshold among the minimum average-utility thresholds of all its items. For example, the itemset {1, 4} is high average-utility because it appears in transactions (t3) and its average-utility (= 6)is higher than the average minimum average-utility item 1, item 4, which is (5+2)/2 = 3.5.
Why HAUI-MMAU is useful? It is useful because it permits setting lower minimum average-utility thresholds for low profit items. Therefore, it allows discovering high average-utility itemsets containing low profit items.
If we run HAUI-MMAU on the previous transaction database with the MAU_Utility.txt file previously described, we get the following result, where each line represents an itemsets followed by ":" and then its absolute average-utility and minimum average-utility count:
1 #AUTIL: 15 #mau: 5.0
2 #AUTIL: 26 #mau: 2.0
3 #AUTIL: 14 #mau: 1.0
4 #AUTIL: 16 #mau: 2.0
5 #AUTIL: 4 #mau: 4.0
6 #AUTIL: 6 #mau: 1.0
3 6 #AUTIL: 7 #mau: 1.0
3 2 #AUTIL: 17 #mau: 1.5
3 4 #AUTIL: 11 #mau: 1.5
3 5 #AUTIL: 3 #mau: 2.5
3 1 #AUTIL: 5 #mau: 3.0
6 2 #AUTIL: 3 #mau: 1.5
6 4 #AUTIL: 2 #mau: 1.5
6 1 #AUTIL: 4 #mau: 3.0
2 4 #AUTIL: 20 #mau: 2.0
2 5 #AUTIL: 9 #mau: 3.0
4 5 #AUTIL: 5 #mau: 3.0
4 1 #AUTIL: 6 #mau: 3.5
3 6 2 #AUTIL: 3 #mau: 1.3333334
3 6 4 #AUTIL: 2 #mau: 1.3333334
3 6 1 #AUTIL: 5 #mau: 2.3333333
3 2 4 #AUTIL: 16 #mau: 1.6666666
3 2 5 #AUTIL: 6 #mau: 2.3333333
3 4 5 #AUTIL: 4 #mau: 2.3333333
6 2 4 #AUTIL: 2 #mau: 1.6666666
2 4 5 #AUTIL: 8 #mau: 2.6666667
3 6 2 4 #AUTIL: 2 #mau: 1.5
3 2 4 5 #AUTIL: 6 #mau: 2.25
For example, the first line represent the itemset “3, 2” has average-utility 17 and its minimum average-utility count is 1.5. The other lines follows the same format.
Input file format
HAUI-MMAU takes two files as input, defined as follows.
The first file (e.g. contextHAUIMMAU.txt) is a text file containing transactions. Each lines represents a transaction. Each line is composed of three sections, as follows.
- First, the items contained in the transaction are listed. An item is represented by a positive integer. Each item is separated from the next item by a single space. It is assumed that all items within a same transaction (line) are sorted according to a total order (e.g. ascending order) and that no item can appear twice within the same transaction.
- Second, the symbol ":" appears and is followed by the transaction utility (an integer).
- Third, the symbol ":" appears and is followed by the utility of each item in this transaction (an integer), separated by single spaces.
2 3 4 5:26:14 2 6 4
2 3 4:17:8 3 6
1 4:12:10 2
1 3 6:15:5 6 4
2 3 4 6:11:4 3 2 2
Consider the first line. It means that the first transaction is the itemset {2, 3, 4, 5} with utilities {14, 2, 6, 4}. The following lines follow the same format.
The second file is a text file (e.g. MAU_Utility.txt) which provides the minimum average-utility to be used for each item. Each line indicate the minimum average-utility for an item and consists of two integer values separated by a single space. The first value is the item. The second value is the minimum average-utility value to be used for this item. For example, here is the file used in this example. The first line indicate that for item "1" and the minimum average-utility to be used is 1 (one transaction). The other lines follow the same format.
1 5
2 2
3 1
4 2
5 4
6 1
Output file format
The output file format is defined as follows. It is a text file, where each line represents a high average-utility itemset. On each line, the items of the itemset are first listed. Each item is represented by an integer, followed by a single space. After, all the items, the keyword " #AUTIL: " appears and is followed by the average-utility of the itemset. For example, we show below the output file for this example.
1 #AUTIL: 15 #mau: 5.0
2 #AUTIL: 26 #mau: 2.0
3 #AUTIL: 14 #mau: 1.0
4 #AUTIL: 16 #mau: 2.0
5 #AUTIL: 4 #mau: 4.0
6 #AUTIL: 6 #mau: 1.0
3 6 #AUTIL: 7 #mau: 1.0
3 2 #AUTIL: 17 #mau: 1.5
3 4 #AUTIL: 11 #mau: 1.5
3 5 #AUTIL: 3 #mau: 2.5
3 1 #AUTIL: 5 #mau: 3.0
6 2 #AUTIL: 3 #mau: 1.5
6 4 #AUTIL: 2 #mau: 1.5
6 1 #AUTIL: 4 #mau: 3.0
2 4 #AUTIL: 20 #mau: 2.0
2 5 #AUTIL: 9 #mau: 3.0
4 5 #AUTIL: 5 #mau: 3.0
4 1 #AUTIL: 6 #mau: 3.5
3 6 2 #AUTIL: 3 #mau: 1.3333334
3 6 4 #AUTIL: 2 #mau: 1.3333334
3 6 1 #AUTIL: 5 #mau: 2.3333333
3 2 4 #AUTIL: 16 #mau: 1.6666666
3 2 5 #AUTIL: 6 #mau: 2.3333333
3 4 5 #AUTIL: 4 #mau: 2.3333333
6 2 4 #AUTIL: 2 #mau: 1.6666666
2 4 5 #AUTIL: 8 #mau: 2.6666667
3 6 2 4 #AUTIL: 2 #mau: 1.5
3 2 4 5 #AUTIL: 6 #mau: 2.25
For example, the last line indicates that the itemset {3 2 4 5} has average utility of 6 which is larger than its minimum average-utility threshold 2.25. The other lines follows the same format.
Implementation details
This is the original implementation of the algorithm.
Where can I get more information about the HAUI-MMAU algorithm?
This is the reference of the article describing the HAUI-MMAU algorithm:
Jerry Chun-Wei Lin, Ting Li, Philippe Fournier-Viger, Tzung-Pei Hong, and Ja-Hwung Su. Efficient Mining of High Average-Utility Itemsets with Multiple Minimum Thresholds[C]//Proceedings of the Industrial Conference on Data Mining, 2016:14-28.
Besides, for a general overview of high utility itemset mining, you may read this survey paper.