
User-based collaborative filtering: Intuition
People like things liked by other people with similar taste
• Search for similarities among users
– Two users, Jane and Bob, tend to like the same movies; they have similar taste in movies.
• Recommend items liked by users similar to the target user.
– Jane and Bob have similar rating behaviour (taste).
– If Jane liked Batman, then recommend Batman to Bob.
• Mathematically similar to Item-based methods.

User-based method
[Figure: target user]
Q1: how to measure similarity?
Q2: how to find similar users?
Q3: how to combine?

Q1: How to measure similarity between users
• Euclidean distance with mean imputation
        i1     i2     i3     i4     i5     i6
  u1    17     –      20     18     17     18.5
  u2    8      –      –      17     14     17.5
(– = item not rated by that user)
• d(u1, u2) = sqrt( (17 − 8)² + (18.1 − 14.1)² + (20 − 14.1)² + (18 − 17)² + (17 − 14)² + (18.5 − 17.5)² ) ≈ 11.9
• sim(u1, u2) = 1 / (1 + d(u1, u2)) = 1 / (1 + 11.9) ≈ 0.08
• Impute user u1's missing ratings with u1's mean rating (18.1).
• Impute user u2's missing ratings with u2's mean rating (14.1).
• Compute the Euclidean distance between the resulting rows.
• Convert the distance into a similarity (high similarity for a low distance, low similarity for a high distance), as in the sketch below.
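A minimal sketch of this worked example in Python, using the ratings from the table above (the helper names are illustrative, not from any particular library):

import numpy as np

# Ratings for items i1..i6; np.nan marks items the user has not rated.
u1 = np.array([17, np.nan, 20, 18, 17, 18.5])
u2 = np.array([8,  np.nan, np.nan, 17, 14, 17.5])

def mean_impute(ratings):
    """Replace a user's missing ratings with that user's mean rating."""
    filled = ratings.copy()
    filled[np.isnan(filled)] = np.nanmean(ratings)
    return filled

def euclidean_similarity(a, b):
    """Turn the Euclidean distance into a similarity: 1 / (1 + d)."""
    d = np.linalg.norm(mean_impute(a) - mean_impute(b))
    return 1.0 / (1.0 + d)

print(euclidean_similarity(u1, u2))  # ~0.08 (d ~ 11.9, as on the slide)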

User-based: Q2: How to find similar users? Q3: How to combine ratings?
• Selecting similar users and making prediction
• With respect to user a and item 𝑗:
– Choose the k most similar users who have rated item j.
– The predicted rating is the similarity-weighted average of item j's ratings from those top-k similar users (see the sketch below).
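A minimal sketch of this prediction step; the ratings dictionary (ratings[user][item] -> known rating) and the similarity function are illustrative assumptions:

def predict_rating(target_user, item, ratings, similarity, k=3):
    """Similarity-weighted average of the item's ratings from the k users
    most similar to target_user who have rated the item."""
    # Candidate neighbours: other users who have rated this item
    candidates = [u for u in ratings if u != target_user and item in ratings[u]]
    # Keep the k most similar to the target user
    neighbours = sorted(candidates, key=lambda u: similarity(target_user, u),
                        reverse=True)[:k]
    weights = [similarity(target_user, u) for u in neighbours]
    if not neighbours or sum(weights) == 0:
        return None  # no similar user has rated this item
    return sum(w * ratings[u][item] for w, u in zip(weights, neighbours)) / sum(weights)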

User-based method
• Mathematically similar to the item-based method.
• However:
– Item-based performs better in many practical cases: movies, books, etc.
– User preferences are dynamic, whereas item profiles are relatively static, so user-based methods need a high update frequency for offline-calculated information.
– Sparsity problem with the user-based method.
• No recommendations for new users (cold start).
• Scalability issues:
– As the number of users increases, it becomes more costly to find similar users.
– Offline clustering of users can help.

Scale up search of k-similar users
• Offline step: cluster users into groups of users with similar ratings.
• Online step: search for the k most similar users only within the target user's cluster (see the sketch below).
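One possible sketch of this offline/online split, here using k-means from scikit-learn; the library choice, the cluster count, and the assumption that R is a users-by-items matrix with missing ratings filled (e.g. with zeros) are illustrative, not prescribed by the slides:

import numpy as np
from sklearn.cluster import KMeans

def build_user_clusters(R, n_clusters=50):
    """Offline: group users by their rating vectors."""
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(R)

def candidate_neighbours(model, target_index):
    """Online: only users in the target user's cluster are searched for the k nearest."""
    labels = model.labels_
    same_cluster = np.where(labels == labels[target_index])[0]
    return same_cluster[same_cluster != target_index]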

Options for Q1: Similarity metrics
• Item-item: considers similarity between items.
• User-user: considers similarity between users.
• We looked at Euclidean-distance-based similarity.
• The other two popular similarity measures are:
– cosine similarity, and
– Pearson correlation (centred cosine similarity).

Cosine similarity
• Cosine similarity is a measure of similarity between two vectors X, Y.
– It is the dot product of X and Y, normalised by their lengths.
– User-user: X, Y are the rating vectors of user x and user y.
– Item-item: X, Y are the rating vectors of item x and item y.
• cos(X, Y) = (X · Y) / (|X| |Y|) = Σ_{i=1..n} Xi·Yi / ( sqrt(Σ_{i=1..n} Xi²) · sqrt(Σ_{i=1..n} Yi²) )
[Figure: vectors X, Y, Z and the angles θ1, θ2 between them; cos(X, Y) > cos(X, Z) because the angle between X and Y is smaller.]
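A minimal sketch of cosine similarity between two rating vectors, with missing ratings imputed with 0 (the convention discussed on the next slide); the function name is illustrative:

import numpy as np

def cosine_similarity(x, y):
    """cos(X, Y) = (X . Y) / (|X| |Y|); unrated entries (NaN) are treated as 0."""
    x = np.nan_to_num(np.asarray(x, dtype=float))
    y = np.nan_to_num(np.asarray(y, dtype=float))
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom else 0.0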

Centred cosine similarity
• Plain cosine similarity:
– Missing values in the vectors are imputed with the value 0.
– Issue: 0 can have very different meanings in different vectors:
• two users, one a tough rater and one an easy rater;
• two items, one with generally high and one with generally low ratings.
• This can give misleading results.
• Let X_norm and Y_norm be the mean-centred (normalised) versions of X and Y.
• centred_cos(X, Y) = cos(X_norm, Y_norm)
  = Σ_{i=1..n} (Xi − x̄)(Yi − ȳ) / ( sqrt(Σ_{i=1..n} (Xi − x̄)²) · sqrt(Σ_{i=1..n} (Yi − ȳ)²) )
• Centred cosine similarity is Pearson correlation.
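A minimal sketch of centred cosine similarity: each user's mean over their observed ratings is subtracted first, unrated entries become 0 after centring, and then ordinary cosine similarity is applied (the function name is illustrative):

import numpy as np

def centred_cosine(x, y):
    """Centred cosine (Pearson-style) similarity between two rating vectors."""
    def centre(v):
        v = np.asarray(v, dtype=float)
        return np.nan_to_num(v - np.nanmean(v))  # subtract the user's own mean; NaN -> 0
    xc, yc = centre(x), centre(y)
    denom = np.linalg.norm(xc) * np.linalg.norm(yc)
    return float(xc @ yc / denom) if denom else 0.0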

Summary
• We learnt:
– Popularity-based recommendation.
– Item-based and user-based collaborative filtering (Gen 1).
• Simple but reasonably powerful.
• Achieves some level of personalisation.
– Different measures of similarity.
– Some limitations with these approaches:
• Cold start problem: new items/users.
• Scalability issues.
