Mining Frequent Weighted Itemsets using the NFWI Algorithm (SPMF documentation)

This example explains how to run the NFWI algorithm using the SPMF open-source data mining library.

How to run this example?

What is NFWI?

NFWI (Bui et al., Applied Intelligence 2021) is an algorithm for discovering frequent weighted itemsets in a transaction database where each item is associated with a weight representing its importance. NFWI uses a novel data structure called the WN-list (Weighted N-list) along with a WN-tree to efficiently mine weighted itemsets without candidate generation.

The weighted support of an itemset X is defined as:

wsup(X) = ΣT∈g(X) w(X,T) / ΣT∈D maxi∈T w(i)

where g(X) is the set of transactions containing X, w(X,T) is the weight of X in transaction T (computed as the sum of weights of items in X that appear in T), and the denominator is the total transaction weight across the entire database D. An itemset X is a Frequent Weighted Itemset (FWI) if its weighted support is no less than a user-specified minimum weighted support threshold.

What is the input?

NFWI takes as input:

Let's consider the following database consisting of 10 transactions and 5 items (1, 2, 3, 4, 5). This database is provided in the text file "DB_RWFIM.txt" in the package ca.pfv.spmf.algorithms.frequentpatterns.NFWI of the SPMF distribution.

Transaction ID Items
T11 2 3 4 5
T21 3 4
T32 3 5
T41 2 4 5
T52 3 4 5
T61 3 5
T72 4
T81 2 3
T93 4 5
T101 2 4

The weight table is provided in the file "weights_RWFIM.txt" and assigns a weight to each item as follows:

Item Weight
10.40
20.70
31.00
40.50
50.45

For example, item 3 has the highest weight (1.00), meaning it is the most important item, while item 1 has the lowest weight (0.40).

What is the output?

The output of NFWI is the set of all frequent weighted itemsets whose weighted support is no less than the minimum weighted support threshold.

For example, if we run NFWI on the database above with a minimum weighted support of 40% (i.e., minWS = 0.40), we obtain the following 11 frequent weighted itemsets:

Itemset Weighted Support (wsup)
{1}0.5783
{1, 3}0.4106
{5}0.6044
{2, 5}0.4012
{3, 5}0.5222
{4}0.6739
{2, 4}0.4681
{3, 4}0.4099
{2}0.6953
{2, 3}0.4313
{3}0.7360

For example, consider the second row of the table. It indicates that itemset {1,3} consisting of items 1 and 3, has a weighted support of 0.4106. It is a frequent weighted itemset because this value is no less than the minimum threshold set to 0.4 in this example.

Input file format

NFWI takes two input files as input:

Transaction database file: The transaction database file is a text file where each line represents one transaction. Each transaction is a list of items (positive integers) separated by single spaces. Lines starting with #, %, or @ are treated as comments and ignored. For example, the file DB_RWFIM.txt is defined as follows:

1 2 3 4 5
1 3 4
2 3 5
1 2 4 5
2 3 4 5
1 3 5
2 4
1 2 3
3 4 5
1 2 4

Each line is a transaction. For example, the first line means that transaction T1 contains items 1, 2, 3, 4, and 5.

Weight table file. The weight table file is a text file where each line associates one item with its weight. Each line contains an item identifier (a positive integer) followed by a single space and the item weight (a positive real number). Lines starting with #, %, or @ are treated as comments and ignored.

For example, the file weights_RWFIM.txt is defined as follows:

1 0.4
2 0.7
3 1.0
4 0.5
5 0.45

Each line associates one item with its weight. For example, item 1 has weight 0.4 and item 3 has the maximum weight of 1.0.

Output file format

The output file is a text file where each line represents one frequent weighted itemset. On each line, the items of the itemset are first listed as positive integers separated by single spaces. After all the items, the keyword "#WSUP:" appears, followed by the weighted support of the itemset (a real number formatted to 4 decimal places).

For example, the output file for this example is:

1 #WSUP: 0.5783
1 3 #WSUP: 0.4106
5 #WSUP: 0.6044
2 5 #WSUP: 0.4012
3 5 #WSUP: 0.5222
4 #WSUP: 0.6739
2 4 #WSUP: 0.4681
3 4 #WSUP: 0.4099
2 #WSUP: 0.6953
2 3 #WSUP: 0.4313
3 #WSUP: 0.7360

For example, the first line indicates that the itemset {1} has a weighted support of 0.5783. The following lines follow the same format.

Performance

Frequent weighted itemset mining is a generalization of traditional frequent itemset mining that introduces item weights to reflect the varying importance of items. NFWI achieves efficiency by:

In practice, NFWI is efficient for moderate to large databases and was presented as outperforming prior weighted itemset mining algorithms that use the same weighted support definition.

Implementation details

The version offered in SPMF is an implementation of the NFWI algorithm based on the description provided in the original paper (Bui et al., Applied Intelligence 2021). The implementation builds a WN-tree with pre-order and post-order traversal codes, constructs WN-lists for single items, and then recursively mines weighted itemsets using equivalence class-based depth-first search with WN-list intersection operations.

Where can I get more information about the NFWI algorithm?

This is the reference of the article describing the NFWI algorithm:

Huong Bui, Bay Vo, Ham Nguyen, Tu-Anh Nguyen-Hoang, Tzung-Pei Hong, A weighted N-list-based method for mining frequent weighted itemsets, Expert Systems with Applications, Volume 96, 2018, Pages 388-405