Netflix Prize Summary: Factorization Meets the Neighborhood

(Way back when, I went through all the Netflix prize papers. I’m now (very slowly) trying to clean up my notes and put them online. Eventually, I hope to have a more integrated tutorial, but here’s a rough draft for now.)



 

This is a summary of Koren’s 2008 Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model.



 

There are two approaches to collaborative filtering: neighborhood methods and latent factor models.



 

     
  • Neighborhood models are most effective at detecting very localized relationships (e.g., that people who like X-Men also like Spiderman), but poor at capturing the weak signals spread across the totality of a user's ratings.

  • Latent factor models are best at estimating overall structure (e.g., that a user likes horror movies), but poor at detecting strong associations among small sets of closely related items.



 

Since the two approaches have complementary strengths and weaknesses, we should integrate the two; this integration is the focus of this paper.



 

Preliminaries



 

As mentioned in previous papers, we should normalize out common effects before modeling. Throughout the rest of this paper, Koren uses a baseline estimate $b_{ui} = \mu + b_u + b_i$ (overall rating mean, plus user deviation from average, plus movie deviation from average) for the rating of user u on movie i; estimation of the latter two parameters is done by solving a regularized least squares problem.
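As a rough illustration, here is a minimal Python sketch of the simpler decoupled shrinkage estimates that are commonly used to approximate that regularized least squares solution. The `(user, item, rating)` triple format and the regularization constants `lam_i` and `lam_u` are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np
from collections import defaultdict

def baseline_estimates(ratings, lam_i=25.0, lam_u=10.0):
    """Estimate mu, b_u, b_i from (user, item, rating) triples.

    Uses decoupled shrinkage estimates that approximate the regularized
    least squares solution; lam_i and lam_u are illustrative constants.
    """
    mu = np.mean([r for _, _, r in ratings])

    # Item deviations, shrunk toward 0 when an item has few ratings.
    item_sums, item_counts = defaultdict(float), defaultdict(int)
    for u, i, r in ratings:
        item_sums[i] += r - mu
        item_counts[i] += 1
    b_i = {i: item_sums[i] / (lam_i + item_counts[i]) for i in item_sums}

    # User deviations, computed after removing the item effect.
    user_sums, user_counts = defaultdict(float), defaultdict(int)
    for u, i, r in ratings:
        user_sums[u] += r - mu - b_i[i]
        user_counts[u] += 1
    b_u = {u: user_sums[u] / (lam_u + user_counts[u]) for u in user_sums}

    return mu, b_u, b_i

def baseline(mu, b_u, b_i, u, i):
    # Baseline prediction b_ui = mu + b_u + b_i for one (user, item) pair.
    return mu + b_u.get(u, 0.0) + b_i.get(i, 0.0)
```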



 

Koren then describes using a binary matrix (1 for rated, 0 for not rated) as a source of implicit feedback. This is useful because the mere fact that a user rated many science fiction movies (say) suggests that the user likes science fiction, regardless of how those movies were rated.
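In the Netflix setting, where ratings are the only signal available, the implicit feedback set N(u) reduces to "which items did user u rate at all." A minimal sketch, reusing the hypothetical rating triples from above:

```python
from collections import defaultdict

def implicit_feedback(ratings):
    """N(u): items user u has provided implicit feedback for.

    With only the Netflix ratings available, this is just the binary
    rated/not-rated information; the rating value itself is ignored.
    """
    N = defaultdict(set)
    for u, i, _ in ratings:
        N[u].add(i)
    return N
```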



 

A Neighborhood Model



 

Recall from the previous paper that we modeled each rating $r_{ui}$ as



 

$$r_{ui} = b_{ui} + \sum_{j \in N(i; u)} (r_{uj} - b_{uj}) w_{ij},$$



 

where $N(i; u)$ is the set of the k items most similar to i among the items user u rated, and the $w_{ij}$ are parameters to be learned by solving a regularized least squares problem.



 

This paper makes several enhancements to that model. First, we replace $N(i; u)$ with $R^k(i; u)$, the set of the k items most similar to i (among all items) intersected with the items user u rated. Similarly, we denote by $N^k(i; u)$ the set of the k items most similar to i intersected with the items user u has provided implicit feedback for. This gives us



 

$$r_{ui} = b_{ui} + \sum_{j \in R^k(i; u)} (r_{uj} - b_{uj}) w_{ij} + \sum_{j \in N^k(i; u)} c_{ij},$$



 

where the $c_{ij}$ are another set of parameters to learn.



 

Notice that by taking the intersection of the k items most similar to i with the items user u rated (giving perhaps a set of size less than k), rather than taking the k items most similar to i among the items user u rated, we let our model be influenced not only by what a user rates, but also by what a user does not rate. For example, if a user does not rate LOTR 1 or LOTR 2, his predicted rating for LOTR 3 is penalized.
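To make the distinction concrete, here is a small sketch of the two constructions. It assumes a hypothetical precomputed `sim_order[i]`, listing all items in decreasing similarity to item i, and a set `rated_by_u` of the items user u rated:

```python
def neighbors_old(i, rated_by_u, sim_order, k):
    """N(i; u): the k items most similar to i among those user u rated."""
    ranked = [j for j in sim_order[i] if j in rated_by_u]
    return ranked[:k]  # always k items, as long as u rated at least k

def neighbors_new(i, rated_by_u, sim_order, k):
    """R^k(i; u): the overall top-k similar items, intersected with u's rated items."""
    return set(sim_order[i][:k]) & rated_by_u  # possibly fewer than k items
```

With the second construction, missing neighbors simply drop out of the sum, so skipping LOTR 1 and LOTR 2 leaves fewer positive adjustments contributing to the LOTR 3 prediction.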



 

This implies that our current model encourages greater deviations from baseline estimates for users that provided many ratings or plenty of implicit feedback. In other words, for well-modeled users with a lot of input, we are willing to predict quirkier and less common recommendations; users we have less information about, on the other hand, receive safer, baseline estimates.



 

Nonetheless, this dichotomy between power users and newbie users is perhaps overemphasized by the current model, so we moderate it by modifying the model to be



 

$$r_{ui} = b_{ui} + |R^k(i; u)|^{-0.5} \sum_{j \in R^k(i; u)} (r_{uj} - b_{uj}) w_{ij} + |N^k(i; u)|^{-0.5} \sum_{j \in N^k(i; u)} c_{ij}.$$



 

Parameters are determined by solving a regularized least squares problem.
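Putting the pieces together, here is a sketch of the prediction rule above. The baseline estimates `b`, the neighborhood sets, and the learned weights `w` and `c` are assumed to be available from the earlier steps and from training:

```python
def predict_neighborhood(u, i, b, Rk, Nk, ratings_u, w, c):
    """Neighborhood prediction with |.|^{-0.5} normalization of both sums.

    b[(u, i)]    -- baseline estimate b_ui
    Rk, Nk       -- the sets R^k(i; u) and N^k(i; u)
    ratings_u[j] -- user u's rating of item j
    w, c         -- learned weights, indexed by (i, j)
    """
    pred = b[(u, i)]
    if Rk:
        pred += len(Rk) ** -0.5 * sum(
            (ratings_u[j] - b[(u, j)]) * w[(i, j)] for j in Rk)
    if Nk:
        pred += len(Nk) ** -0.5 * sum(c[(i, j)] for j in Nk)
    return pred
```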



 

Latent Factor Models Revisited



 

Typical SVD approaches are based on the following rule:



 

$$r_{ui} = b_{ui} + p_u^T q_i,$$



 

where $p_u$ is a user-factors vector and $q_i$ is an item-factors vector. We describe two enhancements.
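In code, the basic rule is just a dot product on top of the baseline estimate; a minimal sketch, assuming learned factor dictionaries `p` and `q` mapping users and items to NumPy vectors:

```python
import numpy as np

def predict_svd(u, i, b_ui, p, q):
    # r_ui = b_ui + p_u^T q_i, with p[u] and q[i] learned factor vectors
    return b_ui + np.dot(p[u], q[i])
```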



 

Asymmetric-SVD



 

One suggestion is to replace $p_u$ with



 

$$|R(u)|^{-0.5} \sum_{j \in R(u)} (r_{uj} - b_{uj}) x_j + |N(u)|^{-0.5} \sum_{j \in N(u)} y_j,$$



 

where $R(u)$ is the set of items user u has rated, and $N(u)$ is the set of items user u has provided implicit feedback for. In other words, this model represents users through the items they prefer, rather than expressing users in a latent feature space. This model has several advantages:



 

     
  • Asymmetric-SVD does not parameterize users, so we do not need to retrain the model whenever a new user arrives. Instead, we can handle new users as soon as they provide feedback.

  • Predictions are a direct function of past feedback, so we can easily explain predictions. (When using a pure latent feature solution, however, explainability is difficult.)



 

As usual, parameters are learned via a regularized least-squares minimization.
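Here is a sketch of the Asymmetric-SVD prediction, assuming item-factor dictionaries `q`, `x`, and `y` (each mapping an item to a learned NumPy vector), along with the baseline estimates from earlier:

```python
import numpy as np

def predict_asymmetric_svd(u, i, b, q, x, y, R_u, N_u, ratings_u):
    """Asymmetric-SVD: the user factor is built from the user's own items.

    R_u / N_u    -- items u rated / items with implicit feedback from u
    ratings_u[j] -- u's rating of item j; b[(u, j)] -- baseline estimates
    q, x, y      -- learned item-factor vectors (dicts of np.ndarray)
    """
    dim = len(next(iter(q.values())))
    user_vec = np.zeros(dim)
    if R_u:
        user_vec += len(R_u) ** -0.5 * sum(
            (ratings_u[j] - b[(u, j)]) * x[j] for j in R_u)
    if N_u:
        user_vec += len(N_u) ** -0.5 * sum(y[j] for j in N_u)
    return b[(u, i)] + np.dot(q[i], user_vec)
```

Because the user vector is assembled on the fly from whatever feedback the user has provided so far, a brand-new user can be scored without retraining.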



 

SVD++



 

Another approach is to continue modeling users as latent features, while adding implicit feedback. Thus, we replace $p_u$ with $p_u + |N(u)|^{-0.5} \sum_{j \in N(u)} y_j$. While we lose the easy explainability and the immediate handling of new users offered by the Asymmetric-SVD model, this approach is likely more accurate.
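The corresponding sketch for SVD++ differs only in where the user representation comes from: an explicitly learned vector `p[u]`, plus the implicit feedback term:

```python
import numpy as np

def predict_svdpp(u, i, b_ui, p, q, y, N_u):
    # SVD++: user factor = p_u + |N(u)|^{-0.5} * sum_{j in N(u)} y_j
    user_vec = p[u].copy()
    if N_u:
        user_vec += len(N_u) ** -0.5 * sum(y[j] for j in N_u)
    return b_ui + np.dot(q[i], user_vec)
```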



 

An Integrated Model



 

An integrated model incorporating baseline estimates, the neighborhood approach, and the latent factor approach is as follows:



 

$$r_{ui} = \left[\mu + b_u + b_i\right] + \left[q_i^T \Big(p_u + |N(u)|^{-0.5} \sum_{j \in N(u)} y_j \Big)\right] + \left[|R^k(i;u)|^{-0.5} \sum_{j \in R^k(i; u)} (r_{uj} - b_{uj}) w_{ij} + |N^k(i;u)|^{-0.5} \sum_{j \in N^k(i; u)} c_{ij}\right].$$



 

Note that we have used $(\mu + b_u + b_i)$ as our baseline estimate. We also used the SVD++ model, but we could use the Asymmetric-SVD model instead.
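Here is a sketch of the full integrated prediction, combining the pieces above; all factor vectors and weights are assumed to come from training:

```python
import numpy as np

def predict_integrated(u, i, mu, b_u, b_i, b, p, q, y, w, c,
                       N_u, Rk, Nk, ratings_u):
    """Three tiers: baseline + SVD++ latent factors + neighborhood."""
    # Tier 1: baseline estimate
    pred = mu + b_u[u] + b_i[i]

    # Tier 2: latent factors with implicit feedback (SVD++)
    user_vec = p[u].copy()
    if N_u:
        user_vec += len(N_u) ** -0.5 * sum(y[j] for j in N_u)
    pred += np.dot(q[i], user_vec)

    # Tier 3: neighborhood adjustments
    if Rk:
        pred += len(Rk) ** -0.5 * sum(
            (ratings_u[j] - b[(u, j)]) * w[(i, j)] for j in Rk)
    if Nk:
        pred += len(Nk) ** -0.5 * sum(c[(i, j)] for j in Nk)
    return pred
```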



 

This rule provides a 3-tier model for recommendations:



 

     
  • The first, baseline tier describes general properties of the item and the user. For example, it may say that “The Sixth Sense” is known to be a good movie in general, and that Joe rates like the average user.

  • The next, latent factor tier may say that since both “The Sixth Sense” and Joe rate high on the Psychological Thrillers scale, Joe may like The Sixth Sense because he likes this genre of movies in general.

  • The final, neighborhood tier makes fine-grained adjustments that are hard to profile, such as the fact that Joe rated the movie “Signs” (a similar psychological thriller by the same director) low.



 

As usual, model parameters are determined by minimizing the regularized squared error function through gradient descent.
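As a rough sketch, here is what one stochastic gradient step might look like for the integrated model, given the prediction error on a single observed rating. The parameter containers are the same hypothetical dictionaries used in the sketches above, and the learning rate and regularization constant are illustrative, not the paper's tuned values:

```python
import numpy as np

def sgd_step(u, i, err, m, N_u, Rk, Nk, ratings_u, lr=0.007, reg=0.015):
    """One stochastic-gradient update on a single rating.

    err -- prediction error r_ui minus the model's current prediction.
    m   -- dict of parameter containers (b_u, b_i, b, p, q, y, w, c),
           assumed to be initialized before training starts.
    """
    # Bias updates
    m["b_u"][u] += lr * (err - reg * m["b_u"][u])
    m["b_i"][i] += lr * (err - reg * m["b_i"][i])

    # Latent factor updates (SVD++ part)
    implicit = sum((m["y"][j] for j in N_u), np.zeros_like(m["p"][u]))
    if N_u:
        implicit *= len(N_u) ** -0.5
    q_i_old = m["q"][i].copy()
    m["q"][i] += lr * (err * (m["p"][u] + implicit) - reg * m["q"][i])
    m["p"][u] += lr * (err * q_i_old - reg * m["p"][u])
    for j in N_u:
        m["y"][j] += lr * (err * len(N_u) ** -0.5 * q_i_old - reg * m["y"][j])

    # Neighborhood weight updates
    for j in Rk:
        grad = len(Rk) ** -0.5 * err * (ratings_u[j] - m["b"][(u, j)])
        m["w"][(i, j)] += lr * (grad - reg * m["w"][(i, j)])
    for j in Nk:
        m["c"][(i, j)] += lr * (len(Nk) ** -0.5 * err - reg * m["c"][(i, j)])
```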


 

