Closed
Description
Test Dataset
I'm using the mushroom dataset, which can be found here.
Versions
Python=3.7.6
mlxtend=0.17.2
Test Code
# coding=utf-8
import numpy as np
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpmax
# mushroom dataset
mushroomdata = []
with open("mushroom.txt", "r") as f:
    for line in f.readlines():
        nums = [int(c) for c in line.strip("\n").split(" ") if c != "" and c != "\r"]
        mushroomdata.append(nums)
# trans
te = TransactionEncoder()
mushroomTeArray = te.fit(mushroomdata).transform(mushroomdata)
mushroomDf = pd.DataFrame(mushroomTeArray, columns=te.columns_)
# MFI, min-sup = 0.3
mushroomMfIDf = fpmax(mushroomDf, min_support=0.3, use_colnames=True)
mushroomMfIDf["count"] = len(mushroomdata) * mushroomMfIDf.support
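# copy the slice so the assignment below does not trigger SettingWithCopyWarning
mushroomMFI = mushroomMfIDf[["itemsets", "count"]].copy()
mushroomMFI["itemsets"] = mushroomMFI["itemsets"].apply(lambda x: tuple(sorted(x)))
The last itemset in the output is (34, 39, 59, 63, 85, 86, 90), with a support count of 2696.0. But the true support count of this MFI is 2504, as a direct count over the transactions shows: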
i = 0
for row in mushroomdata:
    if len(set(row) & {34, 39, 59, 63, 85, 86, 90}) == 7:
        i += 1
print(i)
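The manual count above can be generalized to check every reported itemset at once. The following is only a minimal sketch: the helper names `support_count` and `find_mismatches` and the toy transactions are hypothetical and not part of mlxtend. It assumes the reported counts are given as (itemset, count) pairs, for example built from the "itemsets" and "count" columns above.

```python
from typing import Iterable, List, Sequence, Tuple


def support_count(transactions: Sequence[Iterable[int]], itemset: Iterable[int]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for row in transactions if items.issubset(row))


def find_mismatches(
    transactions: Sequence[Iterable[int]],
    reported: Iterable[Tuple[Iterable[int], float]],
) -> List[Tuple[Tuple[int, ...], float, int]]:
    """Return (itemset, reported_count, actual_count) for every disagreement."""
    mismatches = []
    for itemset, count in reported:
        actual = support_count(transactions, itemset)
        # Reported counts come from len(data) * support, so round away float noise.
        if actual != round(count):
            mismatches.append((tuple(sorted(itemset)), count, actual))
    return mismatches


# Hypothetical toy data, not the mushroom dataset.
toy_transactions = [[1, 2, 3], [1, 2], [2, 3], [1, 2, 3, 4]]
# The second reported count is deliberately wrong to show a detected mismatch.
toy_reported = [((1, 2), 3.0), ((2, 3), 4.0)]

print(find_mismatches(toy_transactions, toy_reported))
```

With the real data, `mushroomdata` would take the place of `toy_transactions`, and the pairs could be built with `zip(mushroomMFI["itemsets"], mushroomMFI["count"])`.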