Geektime - Easy Learning, Efficient Learning - Geekbang

mickey

2019-09-27

import numpy as np
import pandas as pd

title = ['milk', 'bread', 'diapers', 'cola', 'beer', 'eggs']
x = [[1, 1, 1, 0, 0, 0],
     [0, 1, 1, 1, 1, 0],
     [1, 0, 1, 0, 1, 1],
     [1, 1, 1, 0, 1, 0],
     [1, 1, 1, 1, 0, 0]]
df = pd.DataFrame(x, columns=title)

# Preparation tables: df1 holds support counts, df2 pair co-occurrence counts, df3 confidence
df1 = pd.DataFrame(np.zeros([1, 6]), index=['support'], columns=title)
df2 = pd.DataFrame(np.zeros([6, 6]), index=title, columns=title)
df3 = pd.DataFrame(np.zeros([6, 6]), index=title, columns=title)

# Compute support: count the transactions that contain each item
for i in x:
    for k in range(6):
        if not i[k]:
            continue
        df1.iloc[0, k] += 1

support = df1 / 5  # 5 transactions in total
# Print the support table
print(support)

# Compute confidence
for i in x:
    # Loop over all 6 items (range(5) would leave out eggs)
    for j in range(6):
        # Skip items that were not bought in this transaction
        if not i[j]:
            continue
        # Otherwise keep scanning; every other purchased item adds 1 to the pair count
        for k in range(j + 1, 6):
            if not i[k]:
                continue
            df2.iloc[j, k] += 1
            df2.iloc[k, j] += 1
# Row j: confidence(item j -> other item) = pair count / count of item j
for j in range(6):
    df3.iloc[j] = df2.iloc[j] / df.sum().iloc[j]
confidence = df3.round(2)  # round the confidence table to 2 decimal places
# Print the confidence table
print(confidence)

         milk  bread  diapers  cola  beer  eggs
support   0.8    0.8      1.0   0.4   0.6   0.2
         milk  bread  diapers  cola  beer  eggs
milk     0.00   0.75      1.0  0.25   0.5  0.25
bread    0.75   0.00      1.0  0.50   0.5  0.00
diapers  0.80   0.80      0.0  0.40   0.6  0.20
cola     0.50   1.00      1.0  0.00   0.5  0.00
beer     0.67   0.67      1.0  0.33   0.0  0.33
eggs     1.00   0.00      1.0  0.00   1.0  0.00
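
For comparison, the same two tables can be computed without explicit loops. This is only a sketch, not the commenter's code: it assumes the same five transactions with English item names, counts co-occurrences with the matrix product `df.T.dot(df)`, and zeroes the diagonal by choice so that an item is not paired with itself.

```python
import numpy as np
import pandas as pd

items = ['milk', 'bread', 'diapers', 'cola', 'beer', 'eggs']
baskets = [[1, 1, 1, 0, 0, 0],
           [0, 1, 1, 1, 1, 0],
           [1, 0, 1, 0, 1, 1],
           [1, 1, 1, 0, 1, 0],
           [1, 1, 1, 1, 0, 0]]
df = pd.DataFrame(baskets, columns=items)

# Support of each single item: fraction of transactions that contain it
support = df.mean()

# co[a][b] = number of transactions that contain both a and b
co_values = df.T.dot(df).to_numpy().copy()
np.fill_diagonal(co_values, 0)  # design choice: ignore item-with-itself pairs
pair_counts = pd.DataFrame(co_values, index=items, columns=items)

# confidence(a -> b) = count(a and b) / count(a); divide each row by its item's count
confidence = pair_counts.div(df.sum(), axis=0).round(2)

print(support)
print(confidence)
```

Row labels are the left-hand side of the rule, so `confidence.loc['milk', 'eggs']` reads as "of the transactions with milk, what share also contain eggs".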

Author reply: Nice work. Besides writing it yourself, you can also use the efficient_apriori or mlxtend packages:
from mlxtend.frequent_patterns import apriori
Both are convenient to use.



 8
ttttt

2019-09-27

Ran into this error: NotSupportedError: (mysql.connector.errors.NotSupportedError) Authentication plugin 'caching_sha2_password' is not supported (Background on this error at: http://sqlalche.me/e/tw8g)
Fix:
import sqlalchemy as sql
engine = sql.create_engine('mysql+pymysql://{}:{}@{}/{}'.format(user, passwd, host, database))
Changing mysql+mysqlconnector to mysql+pymysql is enough.
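
One caveat with building the connection string via `format`: a password containing characters such as `@` or `/` would break the URL. A possible alternative, assuming SQLAlchemy 1.4 or later, is `URL.create`, which escapes each part. The credentials below are placeholders, and the database name `wucai` is borrowed from another comment in this thread.

```python
from sqlalchemy.engine import URL

# Placeholder credentials for illustration only (assumptions, not real values)
url = URL.create(
    drivername="mysql+pymysql",   # the pymysql driver suggested in the comment
    username="root",
    password="p@ss/word",         # special characters are escaped when rendered
    host="localhost",
    database="wucai",
)

# Render the URL with the password masked, e.g. for logging
masked = url.render_as_string(hide_password=True)
print(masked)

# Creating the engine needs PyMySQL installed and a reachable MySQL server:
# import sqlalchemy
# engine = sqlalchemy.create_engine(url)
```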

ttttt

2019-09-27

Official efficient-apriori documentation:
https://efficient-apriori.readthedocs.io/en/stable/

Author reply: Thanks for sharing the documentation. Besides efficient-apriori, you can also use mlxtend.frequent_patterns.



 1
骑行的掌柜J

2019-12-06

In the comments, ttttt wrote:
"Ran into this error: (mysql.connector.errors.NotSupportedError) Authentication plugin 'caching_sha2_password' is not supported"
Switching to pymysql works, but I have another solution written up on my blog. I hope it helps. Thanks!
https://blog.csdn.net/weixin_41013322/article/details/103427293

Author reply: Nicely organized.




Destroy、

2019-09-27

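# orders_series comes from the lesson code: index = Transaction, value = Item
transactions = []
temp_index = 0
for i, v in orders_series.items():
    if i != temp_index:
        # New transaction id: start a new set and append it to the list
        temp_set = set()
        temp_index = i
        temp_set.add(v)
        transactions.append(temp_set)
        print(transactions)
    else:
        # Same transaction id: add the item to the set that was already appended
        temp_set.add(v)
Teacher, shouldn't each element of transactions = [] be the set of all items in one order? The code above doesn't seem to implement that.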

Author reply: transactions does hold the set of items for each order. You can run it and see that it works.
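
The reason it works is that `transactions.append(temp_set)` stores a reference to the set, so later `temp_set.add(v)` calls in the `else` branch change the set that is already inside the list. The sketch below replays the loop on a few made-up (transaction id, item) pairs standing in for `orders_series.items()`; the pairs are hypothetical, not taken from the dataset.

```python
# Hypothetical (transaction id, item) pairs standing in for orders_series.items()
pairs = [
    (1, 'bread'),
    (2, 'scandinavian'),
    (2, 'scandinavian'),
    (3, 'hot chocolate'),
    (3, 'jam'),
    (3, 'cookies'),
]

transactions = []
temp_index = 0
for i, v in pairs:
    if i != temp_index:
        temp_set = set()
        temp_index = i
        temp_set.add(v)
        transactions.append(temp_set)  # the list keeps a reference to temp_set
    else:
        temp_set.add(v)                # mutates the set already in the list

print(transactions)
```

Note that the loop relies on two things: rows of the same transaction must be adjacent, and the first transaction id must not equal the initial `temp_index` of 0, otherwise `temp_set` is used before it is created.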




ttttt

2019-09-27

# One-line conversion of the dataset format
# transactions = data.groupby('Transaction')['Item'].apply(set).tolist()
# Full code
from efficient_apriori import apriori
import sqlalchemy as sql
import pandas as pd

# Load the data
engine = sql.create_engine('mysql+pymysql://root:passwd@localhost/wucai')
query = 'SELECT * FROM bread_basket'
data = pd.read_sql_query(query, engine)

# Normalize item names to lowercase
data['Item'] = data['Item'].str.lower()
# Drop the 'none' items
data = data.drop(data[data.Item == 'none'].index)

# Build the Series orders_series with Transaction as the index and Item as the value
orders_series = data.set_index('Transaction')['Item']
# Convert the dataset: one set of items per transaction
transactions = data.groupby('Transaction')['Item'].apply(set).tolist()

# Mine frequent itemsets and association rules
itemsets, rules = apriori(transactions, min_support=0.02, min_confidence=0.5)
print('Frequent itemsets:', itemsets)
print('Association rules:', rules)

# ---------- Output ---------- #
Frequent itemsets: {1: {('alfajores',): 344, ('bread',): 3096, ('brownie',): 379, ('cake',): 983, ('coffee',): 4528, ('cookies',): 515, ('farm house',): 371, ('hot chocolate',): 552, ('juice',): 365, ('medialuna',): 585, ('muffin',): 364, ('pastry',): 815, ('sandwich',): 680, ('scandinavian',): 275, ('scone',): 327, ('soup',): 326, ('tea',): 1350, ('toast',): 318, ('truffles',): 192}, 2: {('bread', 'cake'): 221, ('bread', 'coffee'): 852, ('bread', 'pastry'): 276, ('bread', 'tea'): 266, ('cake', 'coffee'): 518, ('cake', 'tea'): 225, ('coffee', 'cookies'): 267, ('coffee', 'hot chocolate'): 280, ('coffee', 'juice'): 195, ('coffee', 'medialuna'): 333, ('coffee', 'pastry'): 450, ('coffee', 'sandwich'): 362, ('coffee', 'tea'): 472, ('coffee', 'toast'): 224}}
Association rules: [{cake} -> {coffee}, {cookies} -> {coffee}, {hot chocolate} -> {coffee}, {juice} -> {coffee}, {medialuna} -> {coffee}, {pastry} -> {coffee}, {sandwich} -> {coffee}, {toast} -> {coffee}]

Author reply: Good job.
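
To see where the eight rules come from, the confidence of each candidate rule can be recomputed by hand from the itemset counts printed above: confidence(A -> B) = count(A, B) / count(A). The sketch below copies only the counts needed for the two-item sets and applies the same 0.5 threshold. It is an illustration of the arithmetic, not the internals of efficient_apriori.

```python
# Counts copied from the frequent itemsets in the output above
item_counts = {
    'bread': 3096, 'cake': 983, 'coffee': 4528, 'cookies': 515,
    'hot chocolate': 552, 'juice': 365, 'medialuna': 585, 'pastry': 815,
    'sandwich': 680, 'tea': 1350, 'toast': 318,
}
pair_counts = {
    ('bread', 'cake'): 221, ('bread', 'coffee'): 852, ('bread', 'pastry'): 276,
    ('bread', 'tea'): 266, ('cake', 'coffee'): 518, ('cake', 'tea'): 225,
    ('coffee', 'cookies'): 267, ('coffee', 'hot chocolate'): 280,
    ('coffee', 'juice'): 195, ('coffee', 'medialuna'): 333,
    ('coffee', 'pastry'): 450, ('coffee', 'sandwich'): 362,
    ('coffee', 'tea'): 472, ('coffee', 'toast'): 224,
}

min_confidence = 0.5
rules = {}
for (a, b), both in pair_counts.items():
    # Each pair yields two candidate rules: a -> b and b -> a
    for lhs, rhs in ((a, b), (b, a)):
        conf = both / item_counts[lhs]
        if conf >= min_confidence:
            rules[(lhs, rhs)] = round(conf, 3)

for (lhs, rhs), conf in sorted(rules.items()):
    print(f'{{{lhs}}} -> {{{rhs}}}: confidence {conf}')
```

Every surviving rule points to coffee because coffee is by far the most frequent item, while the reverse direction divides by coffee's own count of 4528 and falls well below the threshold.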

 1


mickey

2019-09-27

         support
milk     0.8
bread    0.8
diapers  1
cola     0.4
beer     0.6
eggs     0.2

Author reply: Correct. In addition, the support of the itemset (milk, bread, diapers) should be 3/5 = 0.6.




学习

2019-09-27

Milk, bread, and diapers appear together in 3 transactions, so the support is 3/5 = 0.6.

Author reply: Correct.
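
The 3/5 figure can be checked with a few lines of plain Python. This is a minimal sketch that assumes the five transactions from the thread's example, written as sets with English item names; the helper name `support` is chosen here for illustration.

```python
# The five example transactions from this thread, as sets of items
transactions = [
    {'milk', 'bread', 'diapers'},
    {'bread', 'diapers', 'cola', 'beer'},
    {'milk', 'diapers', 'beer', 'eggs'},
    {'milk', 'bread', 'diapers', 'beer'},
    {'milk', 'bread', 'diapers', 'cola'},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)  # subset test
    return hits / len(transactions)

triple = support({'milk', 'bread', 'diapers'}, transactions)
print(triple)
```

The same helper reproduces the single-item values in the table, for example `support({'eggs'}, transactions)` for eggs.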



