
Commit e290585 (initial commit, parent f8fcc5f)

14 files changed: +1903 / -1 lines

README.md

# KNI

TensorFlow implementation of the paper ``An End-to-End Neighborhood-based Interaction Model for Knowledge-enhanced Recommendation``.

This paper was accepted by, and won the ``best paper award`` of, the 1st International Workshop on Deep Learning Practice for High-Dimensional Sparse Data at KDD'19 (DLP-KDD'19), Anchorage, AK, USA.

See our [paper](todo), [poster](./material/kni_poster.pdf), and [slides](./material/kni_presentation.pdf).

If you have any questions, please contact ``kevinqu16@gmail.com`` directly or open a GitHub issue. I will reply ASAP.

### What is the KNI model?

In recommender systems, graph-based models build interaction graphs from historical feedback (e.g., user ratings) and side information (e.g., film tags, artists), and utilize the rich structural information to boost recommendation performance.

![graph-methods](./material/graph-methods.png)

Due to the complex structures and large scales of these graphs, it is hard to make predictions on them directly, so existing approaches instead encode the local structures into user/item embeddings. Since the rich structural information is thereby compressed into only 2 nodes and 1 edge, we are concerned that valuable local structures are not fully utilized in the previous literature, an issue we call the ``early summarization issue``.

After reviewing the existing methods, we derive a general architecture that covers them, and propose the ``Neighborhood Interaction`` (NI) model, which makes predictions from the graph structures directly by modeling the interactions between the user's neighborhood and the item's neighborhood (see the sketch below). NI is further integrated with graph neural networks (GNNs) and knowledge graphs (KGs), yielding the Knowledge-enhanced NI (KNI) model.
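
To make the idea concrete, here is a minimal NumPy sketch of neighborhood interaction. It is not the paper's exact formulation: the score of a (user, item) pair is an attention-weighted sum over all pairwise interactions between the two neighborhoods, so nothing is compressed into a single embedding first.

```python
import numpy as np

def ni_score(user_neighbors, item_neighbors):
    """Score a (user, item) pair from its two neighborhoods.

    user_neighbors: (n_u, k) embeddings of the user-side neighborhood.
    item_neighbors: (n_i, k) embeddings of the item-side neighborhood.
    """
    # Pairwise interactions between every user-side and item-side neighbor.
    interactions = user_neighbors @ item_neighbors.T        # (n_u, n_i)
    # Softmax attention over neighbor pairs, so informative pairs are not
    # averaged away (the early summarization issue).
    weights = np.exp(interactions - interactions.max())
    weights /= weights.sum()
    return float((weights * interactions).sum())

rng = np.random.default_rng(0)
user_nb = rng.normal(size=(3, 8))   # 3 user-side neighbors, dim 8
item_nb = rng.normal(size=(4, 8))   # 4 item-side neighbors, dim 8
print(ni_score(user_nb, item_nb))
```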

The KNI model is not only theoretically more expressive, but also achieves significant improvements (1.1% ~ 8.4%) over state-of-the-art models. We also provide statistical analysis and a case study to explain the early summarization issue and to compare the behaviors of different models.

For more details, please refer to our [paper](todo), [poster](./material/kni_poster.pdf), and [slides](./material/kni_presentation.pdf).

### Running step-by-step

Requirements:

- python3
- numpy
- scipy
- sklearn
- tqdm
- tensorflow-gpu
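
These can be installed with pip, for example as below (versions are not pinned in the repo; a TF 1.x release is assumed, since the code uses ``tf.app.flags``):

    pip3 install numpy scipy scikit-learn tqdm tensorflow-gpu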

Step 1. Download the data from [https://pan.baidu.com/s/1usnQtW-YodlPUQ1TNrrafw#list/path=%2Fdataset%2Fkg4rs](https://pan.baidu.com/s/1usnQtW-YodlPUQ1TNrrafw#list/path=%2Fdataset%2Fkg4rs) and uncompress ``pickled_data.tar.gz`` under ``./data/``, like:

    ./data
        ab.pkl
        bc.pkl
        ml-1m.pkl
        ml-20m.pkl
    ./process
        *.py

The data is processed and pickled with python3, up to 4 hops. Depending on your experiment settings, you can remove unreachable nodes and edges from the datasets. A quick way to inspect a pickle is sketched below.
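
Here is a minimal sketch for inspecting one of the pickled files; the exact contents of each ``.pkl`` (key names, array shapes) are assumptions here, so print the structure first.

```python
import pickle

# Load one of the pickled datasets (path assumes the layout above).
with open('./data/bc.pkl', 'rb') as f:
    data = pickle.load(f)

# Print the top-level structure to see what the pickle actually contains.
print(type(data))
if isinstance(data, dict):
    for key, value in data.items():
        print(key, type(value))
```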

Step 2. Run ``train.py`` with default parameters for the ``bc`` dataset:

    cd /path/to/code/
    python3 train.py --dataset=bc --model=ni

After a while, you will see logs like the following (the train/dev scores are disabled for speed):

    ...
    Epoch: 0057 test: auc=0.771917 ll=0.575557 acc=0.706051
    Epoch: 0058 test: auc=0.772221 ll=0.575257 acc=0.705843
    Epoch: 0059 test: auc=0.772380 ll=0.575088 acc=0.703685
    Epoch: 0060 test: auc=0.771758 ll=0.575617 acc=0.704059
    Epoch: 0061 test: auc=0.771504 ll=0.575559 acc=0.704017
    ...

The default script runs the same experiment 5 times with different random seeds. You should find that the runs early-stop at around 0.772 AUC and 0.706 ACC (+/- 0.002); a small sketch of that aggregation follows.
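
For instance, the +/- interval can be computed from the per-seed results like this (the AUC values below are hypothetical placeholders, not numbers from the paper):

```python
import numpy as np

# Final test AUC of each of the 5 seeded runs (hypothetical values).
aucs = np.array([0.7724, 0.7719, 0.7703, 0.7731, 0.7722])
print("AUC = %.4f +/- %.4f" % (aucs.mean(), aucs.std()))
```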

Now you have achieved the new ``state-of-the-art`` :-) (the most recently reported SOTA is RippleNet, at 0.729 AUC and 0.663 ACC).

### Stay Connected!

If you find this paper/data/code helpful or related to your work, please cite our paper with the following BibTeX entry:

    todo

layers.py

```python
from utils import zeros, glorot
import tensorflow as tf

flags = tf.app.flags
FLAGS = flags.FLAGS

# Maps layer names to counters, used to give repeated layers unique names.
_LAYER_UIDS = {}


def get_layer_uid(layer_name=''):
    """Return a unique id for the given layer name."""
    if layer_name not in _LAYER_UIDS:
        _LAYER_UIDS[layer_name] = 1
        return 1
    else:
        _LAYER_UIDS[layer_name] += 1
        return _LAYER_UIDS[layer_name]


class Layer(object):
    """Base layer: handles naming, variable tracking, and summary logging."""

    def __init__(self, name='layer', verbose=True, **kwargs):
        if not name:
            layer_name = self.__class__.__name__.lower()
        else:
            layer_name = name
        # Append a unique id so repeated layers get distinct variable scopes.
        self.name = layer_name + '_' + str(get_layer_uid(layer_name))
        self.vars = {}
        self.verbose = verbose

    def _call(self, inputs):
        return inputs

    def __call__(self, inputs=None):
        if self.verbose and inputs is not None:
            if not isinstance(inputs, list):
                tf.summary.histogram(self.name + '/inputs', inputs)
            else:
                for i, x in enumerate(inputs):
                    tf.summary.histogram(self.name + '/inputs_%d' % i, x)
        outputs = self._call(inputs)
        if self.verbose:
            tf.summary.histogram(self.name + '/outputs', outputs)
        return outputs

    def _log_vars(self):
        if self.verbose:
            for var in self.vars:
                tf.summary.histogram(self.name + '/vars/' + var, self.vars[var])


class UniformSampler(Layer):
    """Uniformly samples n_sample neighbors per node from a fixed adjacency list."""

    def __init__(self, name='uniform', verbose=False, adj_list=None):
        super(UniformSampler, self).__init__(name=name, verbose=verbose)
        self.adj_list = adj_list

    def _call(self, inputs):
        ids, n_sample = inputs
        # Look up each node's neighbor row: len(ids) * max_degree.
        neighbors = tf.nn.embedding_lookup(self.adj_list, ids)
        # tf.random_shuffle permutes the first dimension only, so transpose,
        # shuffle the neighbor dimension, and transpose back.
        neighbors = tf.transpose(
            tf.random_shuffle(
                tf.transpose(neighbors)))
        # Keep the first n_sample of the shuffled neighbors.
        neighbors = neighbors[:, :n_sample]
        return neighbors


class GCNAgg(Layer):
    """GCN-style aggregator: mean of self and neighbor vectors, then a dense layer."""

    def __init__(self, name='gcn_agg', verbose=False, input_dim=None, output_dim=None,
                 act=tf.nn.relu, weight=True, dropout=0.):
        super(GCNAgg, self).__init__(name=name, verbose=verbose)

        self.input_dim = input_dim
        self.output_dim = output_dim
        self.act = act
        self.weight = weight
        self.dropout = dropout

        with tf.variable_scope(self.name):
            if self.weight:
                self.vars['weights'] = glorot([input_dim, output_dim], name='weights')
            self.vars['bias'] = zeros([output_dim], name='bias')

        self._log_vars()

    def _call(self, inputs):
        # Shapes: n_sup * k, n_sup * n_sample * k, scalar n_sample.
        self_vecs, neigh_vecs, n_sample = inputs
        # TF1-style dropout: the second argument is the keep probability.
        neigh_vecs = tf.nn.dropout(neigh_vecs, 1 - self.dropout)
        self_vecs = tf.nn.dropout(self_vecs, 1 - self.dropout)

        # Average the self vector together with its sampled neighbors.
        hidden = tf.reduce_mean(tf.concat(
            [tf.expand_dims(self_vecs, axis=1), neigh_vecs], axis=1), axis=1)
        if self.weight:
            hidden = tf.matmul(hidden, self.vars['weights'])
        hidden += self.vars['bias']
        return self.act(hidden)


class GATAgg(Layer):
    """GAT-style aggregator: attention over the self vector and sampled neighbors."""

    def __init__(self, name='gat_agg', verbose=False, input_dim=None, output_dim=None,
                 act=tf.nn.relu, bias=True, weight=True, dropout=0., atn_type=1, atn_drop=False):
        super(GATAgg, self).__init__(name=name, verbose=verbose)

        self.input_dim = input_dim
        self.output_dim = output_dim
        self.act = act
        self.bias = bias
        self.weight = weight
        self.dropout = dropout
        self.atn_type = atn_type
        self.atn_drop = dropout if atn_drop else 0.

        with tf.variable_scope(self.name):
            if self.weight:
                self.vars['weights'] = glorot(shape=[input_dim, output_dim], name='weights')
            else:
                # Without a projection, input and output dims must match.
                assert input_dim == output_dim

            # As in GAT, two linear heads produce the attention logits: one
            # scores the center node, the other scores each neighbor.
            self.vars['atn_weights_1'] = glorot([output_dim, 1], name='atn_weights_1')
            self.vars['atn_weights_2'] = glorot([output_dim, 1], name='atn_weights_2')
            self.vars['atn_bias_1'] = zeros([1], name='atn_bias_1')
            self.vars['atn_bias_2'] = zeros([1], name='atn_bias_2')

            if self.bias:
                self.vars['bias'] = zeros([output_dim], name='bias')

        self._log_vars()

    def _call(self, inputs):
        # Shapes: n_sup * k, n_sup * n_sample * k.
        self_vecs, neigh_vecs, n_sample, _ = inputs
        neigh_vecs = tf.nn.dropout(neigh_vecs, 1 - self.dropout)
        self_vecs = tf.nn.dropout(self_vecs, 1 - self.dropout)

        if self.weight:
            # Project both the self and neighbor vectors to the output space.
            self_vecs = tf.matmul(self_vecs, self.vars['weights'])
            neigh_vecs = tf.reshape(
                tf.matmul(tf.reshape(neigh_vecs, [-1, self.input_dim]),
                          self.vars['weights']),
                [-1, n_sample, self.output_dim])

        # Prepend self_vecs to neigh_vecs so the center node attends to itself.
        neigh_vecs = tf.concat([tf.expand_dims(self_vecs, axis=1), neigh_vecs], axis=1)
        n_neigh = n_sample + 1

        # n_sup * 1: attention logits contributed by the center node.
        f_1 = tf.matmul(self_vecs, self.vars['atn_weights_1']) + self.vars['atn_bias_1']
        # n_sup * (n_sample + 1): attention logits contributed by each neighbor.
        f_2 = tf.reshape(
            tf.matmul(tf.reshape(neigh_vecs, [-1, self.output_dim]),
                      self.vars['atn_weights_2']),
            [-1, n_neigh]) + self.vars['atn_bias_2']
        # Broadcast-add, then a temperature-scaled softmax over neighbors
        # (FLAGS.temp is defined elsewhere in the repo).
        logits = f_1 + f_2
        scores = tf.nn.dropout(tf.nn.tanh(logits), 1 - self.atn_drop) / FLAGS.temp
        coefs = tf.nn.softmax(scores)
        # Attention-weighted sum over the (self + neighbors) vectors.
        output = tf.reduce_sum(tf.expand_dims(coefs, 2) * neigh_vecs, axis=1)

        if self.bias:
            output += self.vars['bias']
        return self.act(output)
```
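
A minimal usage sketch (not part of the repo) of wiring these layers together, assuming TensorFlow 1.x and the repo's ``utils`` module on the path; the shapes and sizes here are made up:

```python
import numpy as np
import tensorflow as tf

from layers import UniformSampler, GCNAgg

n_nodes, max_degree, k, n_sample = 100, 10, 16, 5
# Fixed adjacency list: row i holds the (padded) neighbor ids of node i.
adj_list = tf.constant(np.random.randint(0, n_nodes, (n_nodes, max_degree)))
embeddings = tf.get_variable('emb', [n_nodes, k])

ids = tf.constant([1, 2, 3])
sampler = UniformSampler(adj_list=adj_list)
neighbors = sampler((ids, n_sample))                          # (3, n_sample)

agg = GCNAgg(input_dim=k, output_dim=k)
self_vecs = tf.nn.embedding_lookup(embeddings, ids)           # (3, k)
neigh_vecs = tf.nn.embedding_lookup(embeddings, neighbors)    # (3, n_sample, k)
output = agg((self_vecs, neigh_vecs, n_sample))               # (3, k)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(output).shape)
```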

material/graph-methods.png (416 KB)

material/kni_poster.pdf (7.09 MB)

material/kni_presentation.pdf (5.46 MB)
