We need the R package arules and the Python package rpy2, which connects Python to R. Start by creating a new conda environment.
To install arules, open R and run install.packages("arules").
To install rpy2 and pandas use:
conda install -c conda-forge rpy2
conda install -c conda-forge pandas
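Before continuing, it can help to confirm that the Python side of the environment is complete. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is made up for this example, and it does not check whether the R package arules is installed.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the module names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Check the two Python packages installed above.
missing = missing_modules(["rpy2", "pandas"])
if missing:
    print("Missing Python packages:", ", ".join(missing))
else:
    print("rpy2 and pandas are importable.")
```

If a package is reported as missing, the most likely cause is that the notebook is running in a different conda environment than the one the packages were installed into.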
The data need to be prepared as a pandas DataFrame. Here we have 10 transactions with three items called A, B and C. True means that a transaction contains the item.
import pandas as pd
df = pd.DataFrame(
    [
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, True],
        [True, True, True],
        [False, False, True],
        [False, True, True],
        [True, False, True],
    ],
    columns=list('ABC'))
df
|   | A | B | C |
|---|---|---|---|
| 0 | True | True | True |
| 1 | True | False | False |
| 2 | True | True | True |
| 3 | True | False | False |
| 4 | True | True | True |
| 5 | True | False | True |
| 6 | True | True | True |
| 7 | False | False | True |
| 8 | False | True | True |
| 9 | True | False | True |
from rpy2.robjects import pandas2ri
pandas2ri.activate()
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
arules = importr("arules")
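# some helper functions
def arules_as_matrix(x, what = "items"):
    # convert the result of items(x), lhs(x) or rhs(x) to a binary matrix
    return ro.r('function(x) as(' + what + '(x), "matrix")')(x)
def arules_as_dict(x, what = "items"):
    # convert to an R list of item labels and name the entries by 0-based position
    l = ro.r('function(x) as(' + what + '(x), "list")')(x)
    l.names = [*range(0, len(l))]
    return dict(zip(l.names, map(list,list(l))))
def arules_quality(x):
    # the quality measures are stored in the "quality" slot of the R object
    return x.slots["quality"]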
Mine frequent itemsets with a minimum support of 0.1.
itsets = arules.apriori(df,
    parameter = ro.ListVector({"supp": 0.1, "target": "frequent itemsets"}))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
         NA    0.1    1 none FALSE            TRUE       5     0.1      1
 maxlen            target  ext
     10 frequent itemsets TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 1

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[3 item(s), 10 transaction(s)] done [0.00s].
sorting and recoding items ... [3 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
sorting transactions ... done [0.00s].
writing ... [7 set(s)] done [0.00s].
creating S4 object ... done [0.00s].
print(arules.DATAFRAME(itsets))
    items support transIdenticalToItemsets count
1     {B}     0.5                      0.0     5
2     {A}     0.8                      0.2     8
3     {C}     0.8                      0.1     8
4   {A,B}     0.4                      0.0     4
5   {B,C}     0.5                      0.1     5
6   {A,C}     0.6                      0.2     6
7 {A,B,C}     0.4                      0.4     4
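The support values reported above can be cross-checked without R. The sketch below is a brute-force enumeration over all item combinations, not the Apriori algorithm that arules uses, and it is only practical for tiny examples like this one. The function names `support` and `frequent_itemsets` are made up for this illustration, and the data are repeated as plain lists so the block runs on its own.

```python
from itertools import combinations

items = ["A", "B", "C"]
rows = [
    [True, True, True],
    [True, False, False],
    [True, True, True],
    [True, False, False],
    [True, True, True],
    [True, False, True],
    [True, True, True],
    [False, False, True],
    [False, True, True],
    [True, False, True],
]

# Turn each boolean row into the set of items it contains.
transactions = [
    frozenset(item for item, present in zip(items, row) if present)
    for row in rows
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(items, transactions, min_support):
    """Enumerate all non-empty item combinations and keep the frequent ones."""
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result


for itemset, s in frequent_itemsets(items, transactions, 0.1).items():
    print(set(itemset), s)
```

With a minimum support of 0.1 all seven non-empty combinations of A, B and C are frequent, which agrees with the seven sets reported by arules.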
The frequent itemsets can be accessed as a binary matrix.
its = arules_as_matrix(itsets)
print(its)
[[0 1 0]
 [1 0 0]
 [0 0 1]
 [1 1 0]
 [0 1 1]
 [1 0 1]
 [1 1 1]]
Access the itemsets as a dictionary.
its = arules_as_dict(itsets)
print(its)
{'0': ['B'], '1': ['A'], '2': ['C'], '3': ['A', 'B'], '4': ['B', 'C'], '5': ['A', 'C'], '6': ['A', 'B', 'C']}
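The binary matrix and the dictionary carry the same information. As a small illustration, the sketch below rebuilds the dictionary from the matrix in plain Python. It assumes the matrix columns are in the order A, B, C, which matches the output shown here; the helper name `rows_to_itemsets` is made up for this example.

```python
items = ["A", "B", "C"]

# The binary matrix printed above, copied as a plain list of rows.
matrix = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]


def rows_to_itemsets(matrix, items):
    """Map each row to the labels of the columns that are set to 1."""
    return {
        str(i): [item for item, flag in zip(items, row) if flag]
        for i, row in enumerate(matrix)
    }


print(rows_to_itemsets(matrix, items))
```

The matrix form is convenient for numeric work, while the dictionary form is easier to read and to filter by item label.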
Access the quality measures.
arules_quality(itsets)
|   | support | transIdenticalToItemsets | count |
|---|---|---|---|
| 1 | 0.5 | 0.0 | 5 |
| 2 | 0.8 | 0.2 | 8 |
| 3 | 0.8 | 0.1 | 8 |
| 4 | 0.4 | 0.0 | 4 |
| 5 | 0.5 | 0.1 | 5 |
| 6 | 0.6 | 0.2 | 6 |
| 7 | 0.4 | 0.4 | 4 |
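Judging from the values in this table, transIdenticalToItemsets appears to be the fraction of transactions that contain exactly the items of the itemset and nothing else. The sketch below checks that reading against the example data in plain Python; the function name `identical_fraction` is made up for this illustration, and the interpretation is an assumption that happens to match the numbers shown.

```python
# The ten example transactions, written as strings of item labels.
transactions = [
    frozenset(t)
    for t in ["ABC", "A", "ABC", "A", "ABC", "AC", "ABC", "C", "BC", "AC"]
]


def identical_fraction(itemset, transactions):
    """Fraction of transactions whose item set equals `itemset` exactly."""
    target = frozenset(itemset)
    return sum(t == target for t in transactions) / len(transactions)


for itemset in ["B", "A", "C", "AB", "BC", "AC", "ABC"]:
    print(sorted(itemset), identical_fraction(itemset, transactions))
```

Unlike support, this measure does not count transactions that contain additional items, which is why {B} has a support of 0.5 but a value of 0.0 here.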
Mine association rules with a minimum support of 0.1 and a minimum confidence of 0.8.
rules = arules.apriori(df,
    parameter = ro.ListVector({"supp": 0.1, "conf": 0.8}))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.8    0.1    1 none FALSE            TRUE       5     0.1      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 1

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[3 item(s), 10 transaction(s)] done [0.00s].
sorting and recoding items ... [3 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [6 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
print(arules.DATAFRAME(rules))
    LHS RHS support confidence coverage lift count
1    {} {A}     0.8        0.8      1.0 1.00     8
2    {} {C}     0.8        0.8      1.0 1.00     8
3   {B} {A}     0.4        0.8      0.5 1.00     4
4   {B} {C}     0.5        1.0      0.5 1.25     5
5 {A,B} {C}     0.4        1.0      0.4 1.25     4
6 {B,C} {A}     0.4        0.8      0.5 1.00     4
Get the left-hand side, the right-hand side and the rule quality.
lhs = arules_as_matrix(rules, what = "lhs")
print(lhs)
[[0 0 0]
 [0 0 0]
 [0 1 0]
 [0 1 0]
 [1 1 0]
 [0 1 1]]
rhs = arules_as_matrix(rules, what = "rhs")
print(rhs)
[[1 0 0]
 [0 0 1]
 [1 0 0]
 [0 0 1]
 [0 0 1]
 [1 0 0]]
lhs = arules_as_dict(rules, what = "lhs")
print(lhs)
{'0': [], '1': [], '2': ['B'], '3': ['B'], '4': ['A', 'B'], '5': ['B', 'C']}
rhs = arules_as_dict(rules, what = "rhs")
print(rhs)
{'0': ['A'], '1': ['C'], '2': ['A'], '3': ['C'], '4': ['C'], '5': ['A']}
arules_quality(rules)
|   | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|
| 1 | 0.8 | 0.8 | 1.0 | 1.00 | 8 |
| 2 | 0.8 | 0.8 | 1.0 | 1.00 | 8 |
| 3 | 0.4 | 0.8 | 0.5 | 1.00 | 4 |
| 4 | 0.5 | 1.0 | 0.5 | 1.25 | 5 |
| 5 | 0.4 | 1.0 | 0.4 | 1.25 | 4 |
| 6 | 0.4 | 0.8 | 0.5 | 1.00 | 4 |
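To see where these numbers come from, the quality measures of a single rule can be recomputed by hand. The sketch below uses the standard definitions of support, confidence, coverage and lift on the ten example transactions; the function name `rule_quality` is made up for this illustration and is not part of arules or rpy2.

```python
# The ten example transactions, written as strings of item labels.
transactions = [
    frozenset(t)
    for t in ["ABC", "A", "ABC", "A", "ABC", "AC", "ABC", "C", "BC", "AC"]
]


def rule_quality(lhs, rhs, transactions):
    """Compute quality measures for the rule lhs => rhs by counting transactions."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    n = len(transactions)
    count = sum((lhs | rhs) <= t for t in transactions)  # contain lhs and rhs
    lhs_count = sum(lhs <= t for t in transactions)      # contain lhs
    rhs_count = sum(rhs <= t for t in transactions)      # contain rhs
    if lhs_count == 0 or rhs_count == 0:
        raise ValueError("lhs and rhs must each occur in at least one transaction")
    return {
        "support": count / n,
        "confidence": count / lhs_count,
        "coverage": lhs_count / n,
        # lift = confidence / support(rhs), rearranged to use whole counts
        "lift": (count * n) / (lhs_count * rhs_count),
        "count": count,
    }


print(rule_quality("B", "C", transactions))   # rule 4: {B} => {C}
print(rule_quality("", "A", transactions))    # rule 1: {} => {A}
```

An empty left-hand side is contained in every transaction, so its coverage is 1.0 and its confidence equals the support of the right-hand side. A lift of 1.0, as for {B} => {A}, means that the two sides occur together exactly as often as expected if they were independent.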