FPMax Wrong Support? #709

@shenxiangzhuang

Description

Test Dataset

I'm using the mushroom dataset, which can be found here.

Versions

Python=3.7.6
mlxtend=0.17.2

Test Code

# coding=utf-8
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpmax


# Load the mushroom dataset: one transaction per line, items separated by spaces
mushroomdata = []
with open("mushroom.txt", "r") as f:
    for line in f:
        nums = [int(c) for c in line.split()]
        mushroomdata.append(nums)
# One-hot encode the transactions
te = TransactionEncoder()
mushroomTeArray = te.fit(mushroomdata).transform(mushroomdata)
mushroomDf = pd.DataFrame(mushroomTeArray, columns=te.columns_)
# Maximal frequent itemsets (MFI), min-sup = 0.3
mushroomMfIDf = fpmax(mushroomDf, min_support=0.3, use_colnames=True)
# Convert relative support to an absolute transaction count
mushroomMfIDf["count"] = len(mushroomdata) * mushroomMfIDf.support
# .copy() avoids pandas' SettingWithCopyWarning on the assignment below
mushroomMFI = mushroomMfIDf[["itemsets", "count"]].copy()
mushroomMFI["itemsets"] = mushroomMFI["itemsets"].apply(lambda x: tuple(sorted(x)))

The last itemset in the output is (34, 39, 59, 63, 85, 86, 90), with a reported support count of 2696.0. However, the true support count of this itemset is 2504, as a manual count shows:

# Count the transactions that contain all 7 items of the itemset
i = 0
for row in mushroomdata:
    if len(set(row) & {34, 39, 59, 63, 85, 86, 90}) == 7:
        i += 1
print(i)
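
To see whether other itemsets are affected too, the manual count above can be generalized. This is only a minimal sketch: `support_count` and `find_mismatches` are hypothetical helper names (not part of mlxtend), and the toy transactions stand in for the mushroom data.

```python
def support_count(transactions, itemset):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for row in transactions if items.issubset(row))


def find_mismatches(transactions, reported):
    """Return (itemset, reported_count, true_count) for every disagreement.

    `reported` is an iterable of (itemset, count) pairs, e.g. built from
    the "itemsets" and "count" columns of the fpmax result.
    """
    mismatches = []
    for itemset, count in reported:
        true_count = support_count(transactions, itemset)
        # Reported counts are floats (support * n), so round before comparing
        if round(count) != true_count:
            mismatches.append((tuple(sorted(itemset)), count, true_count))
    return mismatches


# Toy data standing in for the mushroom transactions (assumption)
toy = [[1, 2, 3], [1, 2], [2, 3], [1, 2, 3]]
print(support_count(toy, {1, 2}))
print(find_mismatches(toy, [((1, 2), 3.0), ((2, 3), 4.0)]))
```

With the variables from the test code, the same check would presumably be `find_mismatches(mushroomdata, zip(mushroomMFI["itemsets"], mushroomMFI["count"]))`.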
