Mining Quantiative High Utility Itemsets using the FHUQI-Miner Algorithm (SPMF documentation)

This example explains how to run the FHUQI-Miner algorithm using the SPMF open-source data mining library.

How to run this example?

What is FHUQI-Miner?

FHUQI-Miner is a fast algorithm for mining quantitative high utility itemsets proposed by Nouioua et al. (2021) . Unlike traditional high utility itemsets, the FHUQI-Miner algorithm will not only find patterns that have a high utility, but also give information about the ranges of purchase quantities (internal utility) that lead to this high utility. This is the original implementation and was shown to be faster than the previous best algorithm (FHUQI-Miner).

What is the input?

The input of FHUQI-Miner is a transaction database with purchase quantities (internal utility) and unit profit information (external utility). Moreover, three other parameter must be set:

For details about the meaning of these parameters, please see the FHUQI-Miner research paper.

A transaction database is a set of transactions, where each transaction is a list of distinct items (symbols). For example, let's consider the following transaction database. It consists of 4 transactions (t1, t2, ..., t4) and 4 items (1, 2, 3, 4). For instance, transaction t1 is the set of items {1, 2, 3, 4}. In a transaction database, each item has a corresponding purchase quantity. For examle, in the first transaction, the quantity of item 1 is 3, the quantity of item 2 is 3, the quantity of item 2 is 6, and the quantity of item 4 is 5. This database is provided in the file "dbHUQI.txt" of the SPMF distribution. It is important to note that an item is not allowed to appear twice in the same transaction, and that items are assumed to be sorted according to some order (e.g. ascending order of numbers).

Transaction ID

Items

Quantities

t1

1 2 3 4

3 3 2 5

t2

1 2 3

2 3 3

t3

1 2 4

7 6 4

t4

1 2 3 4

7 8 9 6

Besides a table indicating the unit profit of each item is required. For example, consider the following table:

Item

Unit profit

1

42

2

87

3

94

4

97

It indicates that the unit profit yield by selling the items 1, 2, 3 and 4 are 42, 87, 94 and 97, respectively. This table is provided in the file "dbHUQI_p.txt".

Note that in research papers, unit profit is often called "external utility" while purchase quantities are called "internal utility". This is the terminology used by researchers. Although in the above example, utility is used to represent quantities and profit, it could be more generally be used to represent the relative importance of items to the user. Thus, it can be also applied to other types of data besides shopping databases.

What is the output?

The output file format is defined as follows. It is a text file, where each line represents a quantitative high utility itemset. An itemset contains several items. Each item is represented in the form (x,y) where x is the item and y is its purchase quantity, or by the form (x,y,z) indicating that from y to z units of item x where purchased. At the end of each line, the utility of the itemset is indicated after the keyword #UTIL: For examle, here is a few line of the output file for the above example:

(4,4,6) #UTIL: 1455
(4,5,6) #UTIL: 1067
(3,9) (2,8) #UTIL: 1542
(3,9) (4,6) #UTIL: 1428
(3,9) (1,7) #UTIL: 1140
(3,9) (2,8) (4,6) #UTIL: 2124
(3,9) (2,8) (1,7) #UTIL: 1836
(3,9) (2,8) (4,6) (1,7) #UTIL: 2418
(3,9) (4,6) (1,7) #UTIL: 1722
(2,8) (4,6) #UTIL: 1278
(2,8) (4,6) (1,7) #UTIL: 1572
(4,4,6) (1,7) #UTIL: 1558
(2,6) (4,4) (1,7) #UTIL: 1204

For example, the third line indicates that the itemset containing 9 units of item 3 and 8 units of item 2 yield a utility of 1542 $. The second line indicates that the purchase of 5 to 6 units of item 4 yield a 1067 $ profit.

Input file format

FHUQI-Miner takes two files as input, defined as follows.

The first file (e.g. dbHUQI.txt) is a text file containing transactions. Each lines represents a transaction. Each line is a list of positive numbers where an item is followed by a space and then its quantity. Then, the next item appears, followed by a space, followed by its quantity, and so on... This is the file dbHUQI.txt used in the example.

1,3 2,3 3,2 4,5
1,2 2,3 3,3
1,7 2,6 4,4
1,7 2,8 3,9 4,6

The second file is a text file (e.g. dbHUQI.txt) which provides the unit profit of each item. Each is an item, followed by a space, followed by its unit profit. This is the file dbHUQI_p.txt used in the above example.

1, 42
2, 87
3, 94
4, 97

Output file format

The output file format is defined as follows. It is a text file, where each line represents a quantitative high utility itemset. An itemset contains several items. Each item is represented in the form (x,y) where x is the item and y is its purchase quantity, or by the form (x,y,z) indicating that from y to z units of item x where purchased. At the end of each line, the utility of the itemset is indicated after the keyword #UTIL: For examle, here is a few line of the output file for the above example:

(4,4,6) #UTIL: 1455
(4,5,6) #UTIL: 1067
(3,9) (2,8) #UTIL: 1542
(3,9) (4,6) #UTIL: 1428
(3,9) (1,7) #UTIL: 1140
(3,9) (2,8) (4,6) #UTIL: 2124
(3,9) (2,8) (1,7) #UTIL: 1836
(3,9) (2,8) (4,6) (1,7) #UTIL: 2418
(3,9) (4,6) (1,7) #UTIL: 1722
(2,8) (4,6) #UTIL: 1278
(2,8) (4,6) (1,7) #UTIL: 1572
(4,4,6) (1,7) #UTIL: 1558
(2,6) (4,4) (1,7) #UTIL: 1204

Implementation details

This is the original implementation of the algorithm.

Where can I get more information about the FHUQI-Miner algorithm?

This is the reference of the article describing the FHUQI-Miner algorithm:

Nouioua, M., Fournier-Viger, P., W. C.-W., Lin, J. C.-W., Gan, W. (2021). FHUQI-Miner: Fast High Utility Quantitative Itemset Mining. Applied Intelligence

Besides, for a general overview of high utility itemset mining, you may read this survey paper.