Mining Frequent Weighted Closed Itemsets using the NFWCI Algorithm (SPMF documentation)
This example explains how to run the NFWCI algorithm using the SPMF open-source data mining library.
How to run this example?
- If you are using the graphical interface, (1) choose the "NFWCI" algorithm, (2) select the input file "DB_RWFIM.txt", (3) set the weight file "weights_RWFIM.txt", (4) set the output file name (e.g. "output.txt"), (5) set the minimum weighted support to 0.40, and (6) click "Run algorithm".
- If you want to execute this example from the command line,
then execute this command:
java -jar spmf.jar run NFWCI DB_RWFIM.txt output.txt weights_RWFIM.txt 0.40 in a folder containing spmf.jar and the example input files DB_RWFIM.txt and weights_RWFIM.txt. - If you are using the source code version of SPMF, launch the file "MainTestNFWCI.java" in the package ca.pfv.spmf.algorithms.frequentpatterns.NFWI.
What is NFWCI?
NFWCI (Bui et al., Applied Intelligence 2021) is an algorithm for discovering frequent weighted closed itemsets in a transaction database where each item is associated with a weight representing its importance. NFWCI extends the NFWI algorithm by adding closedness checking to output only the closed itemsets among all frequent weighted itemsets.
An itemset X is closed if there exists no proper superset Y ⊃ X such that Y has the same weighted support as X. Mining closed itemsets produces a more compact result set while preserving complete information about all frequent weighted itemsets.
The weighted support of an itemset X is defined as:
wsup(X) = ΣT∈g(X) w(X,T) / ΣT∈D maxi∈T w(i)
where g(X) is the set of transactions containing X, w(X,T) is the weight of X in transaction T (computed as the sum of weights of items in X that appear in T), and the denominator is the total transaction weight across the entire database D. An itemset X is a Frequent Weighted Closed Itemset (FWCI) if (1) its weighted support is no less than a user-specified minimum weighted support threshold, and (2) it is closed.
What is the input?
NFWCI takes as input:
- A transaction database (a set of transactions, each being a set of items).
- A weight table associating a weight (a positive real number) to each item.
- A minimum weighted support mi
consisting of 10 transactions and 5 items (1, 2, 3, 4, 5). This database is provided in the text file "DB_RWFIM.txt" in the package ca.pfv.spmf.algorithms.frequentpatterns.NFWI of the SPMF distribution.
Transaction ID Items T1 1 2 3 4 5 T2 1 3 4 T3 2 3 5 T4 1 2 4 5 T5 2 3 4 5 T6 1 3 5 T7 2 4 T8 1 2 3 T9 3 4 5 T10 1 2 4 The weight table is provided in the file "weights_RWFIM.txt" and assigns a weight to each item as follows:
Item Weight 1 0.40 2 0.70 3 1.00 4 0.50 5 0.45 For example, item 3 has the highest weight (1.00), meaning it is the most important item, while item 1 has the lowest weight (0.40).
What is the output?
The output of NFWCI is the set of all frequent weighted closed itemsets whose weighted support is no less than the minimum weighted support threshold and that are closed (no proper superset has the same weighted support).
For example, if we run NFWCI on the database above with a minimum weighted support of 40% (i.e., minWS = 0.40), we obtain the following 11 frequent weighted closed itemsets:
-
Itemset Weighted Support (wsup) {1, 3} 0.4106 {1} 0.5783 {3, 5} 0.5222 {2, 5} 0.4012 {5} 0.6044 {3, 4} 0.4099 {2, 4} 0.4681 {4} 0.6739 {2, 3} 0.4313 {2} 0.6953 {3} 0.7360
For example, consider the first row of the table. It indicates that itemset {1,3} consisting of items 1 and 3, has a weighted support of 0.4106. It is a frequent weighted itemset because this value is no less than the minimum threshold set to 0.4 in this example.
Note: Compared to NFWI (which finds all 11 frequent weighted itemsets), NFWCI finds the same 11 itemsets in this example because all happen to be closed. In general, NFWCI produces a subset of the itemsets found by NFWI, containing only those that are closed.
Input file format
NFWCI takes two input files:
Transaction database file. The transaction database file is a text file where each line represents one transaction. Each transaction is a list of items (positive integers) separated by single spaces. Lines starting with #, %, or @ are treated as comments and ignored. For example, the file DB_RWFIM.txt is defined as follows:
1 2 3 4 5
1 3 4
2 3 5
1 2 4 5
2 3 4 5
1 3 5
2 4
1 2 3
3 4 5
1 2 4
Each line is a transaction. For example, the first line means that transaction T1 contains items 1, 2, 3, 4, and 5.
Weight table file. The weight table file is a text file where each line associates one item with its weight. Each line contains an item identifier (a positive integer) followed by a single space and the item weight (a positive real number). Lines starting with #, %, or @ are treated as comments and ignored. For example, the file weights_RWFIM.txt is defined as follows:
1 0.4
2 0.7
3 1.0
4 0.5
5 0.45
Each line associates one item with its weight. For example, item 1 has weight 0.4 and item 3 has the maximum weight of 1.0.
Output file format
The output file is a text file where each line represents one frequent weighted closed itemset. On each line, the items of the itemset are first listed as positive integers separated by single spaces. After all the items, the keyword "#WSUP:" appears, followed by the weighted support of the itemset (a real number formatted to 4 decimal places). For example, the output file for this example is:
1 3 #WSUP: 0.4106
1 #WSUP: 0.5783
3 5 #WSUP: 0.5222
2 5 #WSUP: 0.4012
5 #WSUP: 0.6044
3 4 #WSUP: 0.4099
2 4 #WSUP: 0.4681
4 #WSUP: 0.6739
2 3 #WSUP: 0.4313
2 #WSUP: 0.6953
3 #WSUP: 0.7360
For example, the first line indicates that the itemset {1, 3} is a frequent weighted closed itemset with weighted support 0.4106. The following lines follow the same format.
Performance
Frequent weighted closed itemset mining discovers a compact representation of all frequent weighted itemsets. NFWCI achieves efficiency by:
- Using a WN-tree data structure to compactly represent the weighted transaction database with parent-child relationships encoded via pre-order and post-order codes.
- Employing WN-lists that can be intersected in linear time without rescanning the database.
- Performing ancestral checking using WN-list structural codes to efficiently detect when an itemset is not closed without generating all supersets.
- Using global subsumption pruning to avoid exploring branches that will produce non-closed itemsets.
In practice, NFWCI is efficient and produces a much more compact output than NFWI while preserving all information about frequent weighted itemsets.
Implementation details
The version offered in SPMF is an implementation of the NFWCI algorithm based on the description provided in the original paper (Bui et al., Applied Intelligence 2021). The implementation builds a WN-tree with pre-order and post-order traversal codes, constructs WN-lists for single items, and then recursively mines weighted closed itemsets using equivalence class-based depth-first search with WN-list intersection and ancestral operations for closedness checking.
Where can I get more information about the NFWCI algorithm?
This is the reference of the article describing the NFWCI algorithm:
Bui, H., Vo, B., Nguyen-Hoang, T.-A., Yun, U. (2021). Mining frequent weighted closed itemsets using the WN-list structure and an early pruning strategy. Applied Intelligence, 51, 1439-1459.