Mining High Average-Utility Itemsets with Multiple Thresholds in a Transaction Database using the HAUI-MMAU Algorithm (SPMF documentation)

This example explains how to run the HAUI-MMAU algorithm using the SPMF open-source data mining library.

How to run this example?

What is HAUI-MMAU?

HAUI-MMAU is an algorithm for mining average-utility itemsets by using multiple minimum average-utility thresholds. Unlike some other algorithms such as HAUI-Miner, the HAUI-MMAU algorithms allows to set a different threshold for each item, rather than a single threshold to evaluate all items. Setting multiple thresholds is useful because it allows to set lower minimum average-utility thresholds for low profit items. Therefore, it allows discovering high average-utility itemsets containing low profit items.

What is the input?

The input of HAUI-MMAU is a transaction database and a list of minimum average-utility thresholds indicating the minimum average-utility threshold for each item.
A transaction database is a set of transactions, where each transaction is a list of distinct items (symbols). For example, let's consider the following transaction database. It consists of 5 transactions (t1, t2, ..., t5) and 8 items (1, 2, 3, 4, 5, 6). For instance, transaction t1 is the set of items {2, 3, 4, 5}. This database is provided in the file "contextHAUIMMAU.txt" of the SPMF distribution. It is important to note that an item is not allowed to appear twice in the same transaction.


Transaction ID

Items

Utilities

t1

2 3 4 5

14 2 6 4

t2

2 3 4

8 3 6

t3

1 4

10 2

t4

1 3 6

5 6 4

t5

2 3 4 6

4 3 2 2

The list of minimum average-utility threshold is stored in a text file that is read as input by the algorithm. This is provided in the file "MAU_Utility.txt":


item

minimum average-utility threshold

1

5

2

2

3

1

4

2

5

4

6

1

This file indicated for example that the average-utility threshold to be used for item 1 is 5.

What is the output?

The output of HAUI-MMAU is the set of all high average-utility itemsets contained in the database.

What is a high average-utility itemset? The average-utility of an itemset is the sum of the average-utilities of transactions containing the itemset. An itemset is a high average-utility itemset if its average-utility is higher or equal to the smallest minimum average-utility threshold among the minimum average-utility thresholds of all its items. For example, the itemset {1, 4} is high average-utility because it appears in transactions (t3) and its average-utility (= 6)is higher than the average minimum average-utility item 1, item 4, which is (5+2)/2 = 3.5.

Why HAUI-MMAU is useful? It is useful because it permits setting lower minimum average-utility thresholds for low profit items. Therefore, it allows discovering high average-utility itemsets containing low profit items.
If we run HAUI-MMAU on the previous transaction database with the MAU_Utility.txt file previously described, we get the following result, where each line represents an itemsets followed by ":" and then its absolute average-utility and minimum average-utility count:

1  #AUTIL: 15 #mau: 5.0
2  #AUTIL: 26 #mau: 2.0
3  #AUTIL: 14 #mau: 1.0
4  #AUTIL: 16 #mau: 2.0
5  #AUTIL: 4 #mau: 4.0
6  #AUTIL: 6 #mau: 1.0
3 6  #AUTIL: 7 #mau: 1.0
3 2  #AUTIL: 17 #mau: 1.5
3 4  #AUTIL: 11 #mau: 1.5
3 5  #AUTIL: 3 #mau: 2.5
3 1  #AUTIL: 5 #mau: 3.0
6 2  #AUTIL: 3 #mau: 1.5
6 4  #AUTIL: 2 #mau: 1.5
6 1  #AUTIL: 4 #mau: 3.0
2 4  #AUTIL: 20 #mau: 2.0
2 5  #AUTIL: 9 #mau: 3.0
4 5  #AUTIL: 5 #mau: 3.0
4 1  #AUTIL: 6 #mau: 3.5
3 6 2  #AUTIL: 3 #mau: 1.3333334
3 6 4  #AUTIL: 2 #mau: 1.3333334
3 6 1  #AUTIL: 5 #mau: 2.3333333
3 2 4  #AUTIL: 16 #mau: 1.6666666
3 2 5  #AUTIL: 6 #mau: 2.3333333
3 4 5  #AUTIL: 4 #mau: 2.3333333
6 2 4  #AUTIL: 2 #mau: 1.6666666
2 4 5  #AUTIL: 8 #mau: 2.6666667
3 6 2 4  #AUTIL: 2 #mau: 1.5
3 2 4 5  #AUTIL: 6 #mau: 2.25

For example, the first line represent the itemset “3, 2” has average-utility 17 and its minimum average-utility count is 1.5. The other lines follows the same format.

Input file format

HAUI-MMAU takes two files as input, defined as follows.

The first file (e.g. contextHAUIMMAU.txt) is a text file containing transactions. Each lines represents a transaction. Each line is composed of three sections, as follows.

2 3 4 5:26:14 2 6 4
2 3 4:17:8 3 6
1 4:12:10 2
1 3 6:15:5 6 4
2 3 4 6:11:4 3 2 2

Consider the first line. It means that the first transaction is the itemset {2, 3, 4, 5} with utilities {14, 2, 6, 4}. The following lines follow the same format.

The second file is a text file (e.g. MAU_Utility.txt) which provides the minimum average-utility to be used for each item. Each line indicate the minimum average-utility for an item and consists of two integer values separated by a single space. The first value is the item. The second value is the minimum average-utility value to be used for this item. For example, here is the file used in this example. The first line indicate that for item "1" and the minimum average-utility to be used is 1 (one transaction). The other lines follow the same format.
1 5
2 2
3 1
4 2
5 4
6 1

Output file format

The output file format is defined as follows. It is a text file, where each line represents a high average-utility itemset. On each line, the items of the itemset are first listed. Each item is represented by an integer, followed by a single space. After, all the items, the keyword " #AUTIL: " appears and is followed by the average-utility of the itemset. For example, we show below the output file for this example.

1  #AUTIL: 15 #mau: 5.0
2  #AUTIL: 26 #mau: 2.0
3  #AUTIL: 14 #mau: 1.0
4  #AUTIL: 16 #mau: 2.0
5  #AUTIL: 4 #mau: 4.0
6  #AUTIL: 6 #mau: 1.0
3 6  #AUTIL: 7 #mau: 1.0
3 2  #AUTIL: 17 #mau: 1.5
3 4  #AUTIL: 11 #mau: 1.5
3 5  #AUTIL: 3 #mau: 2.5
3 1  #AUTIL: 5 #mau: 3.0
6 2  #AUTIL: 3 #mau: 1.5
6 4  #AUTIL: 2 #mau: 1.5
6 1  #AUTIL: 4 #mau: 3.0
2 4  #AUTIL: 20 #mau: 2.0
2 5  #AUTIL: 9 #mau: 3.0
4 5  #AUTIL: 5 #mau: 3.0
4 1  #AUTIL: 6 #mau: 3.5
3 6 2  #AUTIL: 3 #mau: 1.3333334
3 6 4  #AUTIL: 2 #mau: 1.3333334
3 6 1  #AUTIL: 5 #mau: 2.3333333
3 2 4  #AUTIL: 16 #mau: 1.6666666
3 2 5  #AUTIL: 6 #mau: 2.3333333
3 4 5  #AUTIL: 4 #mau: 2.3333333
6 2 4  #AUTIL: 2 #mau: 1.6666666
2 4 5  #AUTIL: 8 #mau: 2.6666667
3 6 2 4  #AUTIL: 2 #mau: 1.5
3 2 4 5  #AUTIL: 6 #mau: 2.25

For example, the last line indicates that the itemset {3 2 4 5} has average utility of 6 which is larger than its minimum average-utility threshold 2.25. The other lines follows the same format.

Implementation details

This is the original implementation of the algorithm.

Where can I get more information about the HAUI-MMAU algorithm?

This is the reference of the article describing the HAUI-MMAU algorithm:

Jerry Chun-Wei Lin, Ting Li, Philippe Fournier-Viger, Tzung-Pei Hong, and Ja-Hwung Su. Efficient Mining of High Average-Utility Itemsets with Multiple Minimum Thresholds[C]//Proceedings of the Industrial Conference on Data Mining, 2016:14-28.

Besides, for a general overview of high utility itemset mining, you may read this survey paper.