**Problem Detail:**

I'm trying to create a nodeJS application. It allows users to rate a bunch of songs and it stores them in their user profiles. I use this information to compare them to other users, and try to find users with similar interests in songs, and I suggest them new songs based on that.

Each user profile basically looks like this

`userID: [the userid of the user] songs: [the list of songs that the user has rated] ratings: [the corresponding ratings each user gave to the song] `

Each **song** is represented by a **9 digit whole number** and each **rating** is a **whole number from 1 to 6**. Basically put, I have to compare one user to the rest of the users to determine which of them match this user the best. By match, I mean gave the same songs similar ratings. To do this, I created a simple algorithm.

`Step 1) Create a list, in which each entry maps our target user to each of the other users. Step 2) Now consider each entry, determine which songs both of the users have rated, and store those songs (along with the corresponding ratings each user gave them) in this entry itself Step 3) Now iterate through each entry and perform the following operations a) let percentage = 0 b) let num = [the number of songs that both users rated] c) iterate through each song (that both the users rated) and perform the following operations i) determine the score our target user gave this song and store it in variable a ii) determing the score the other user gave this song and store it in variable b iii) map a and b to new values based on this 1 --> -3 2 --> -2 3 --> -1 4 --> 1 5 --> 2 6 --> 3 iv) now calculate sum as sum = |a| + |b| where || is the absolute values v) now calculate degree as degree = sum/2 vi) now if a * b is less than 0 then calculate (percentage - degree) and store that value again in percentage if not then calculate (percentage - degree) and store that value again in percentage d) now calculate (50 + (percentage / (6*num))*100) and store that value in this entry as match Step 4) Now that I have my list of entries (along with the match between each pair of target user and other user) I sort the list in descending order of match and from that, I can determine which users have the closest match in taste by selecting from the first entries `

Now for each user pair, I am completely neglecting the songs that one user rated and the other didn't, and that is okay for me.

There are several problems with this method though, the biggest one being that this algorithm is very time consuming for a large set of users (say around 1,000,000). And also that I have to load in all the users, every time I need to find a set of matching users for just 1 user. And I need to do this repeatedly for that user, to update his/her list.

Is there a way to make this more efficient? Can I assign a value to each user that takes into account all the songs that they rated, and use that number to compare the users? Is that even possible? I guess what I'm asking is, how can I compare and contrast this data, mathematically, to find similar users, efficiently. Also, can someone suggest a proper tag for this type of question?

**EDIT**

D.W. has suggested a solution, but how do I implement it? Can you provide more detail?

###### Asked By : Rockstar5645

###### Answered By : D.W.

One approach is to use low-rank matrix factorization to approximate the ratings matrix, then use a nearest neighbors data structure.

In particular, let $M$ be the $m \times n$ ratings matrix, where $M_{ij}$ is the rating that user $i$ has provided to user $j$. Look for a $m\times r$ matrix $U$ and a $r \times n$ matrix $V$ such that $M$ is well-approximated by $UV$ (e.g., $||M-UV||$ is minimized, except that missing entries of $M$ are ignored).

Once you've done this, each user can be mapped to a $r$-dimensional vector that summarizes their scores. Store the set of all of these vectors in a data structure that supports nearest neighbor search, e.g., a k-d tree. If you have chosen $r$ small enough (e.g., 10 or so), then nearest neighbor search should be very efficient. This will let you find the users who are most similar to a given user, and thus make a recommendation (e.g., using the k-nearest neighbors classifier).

There are many other approaches you could use: read the extensive literature on recommender systems and collaborative filtering.

As far as your other issue:

I have to load in all the users, every time I need to find a set of matching users for just 1 user

Keep the data structure in memory, rather than re-loading it into memory each time.

Question Source : http://cs.stackexchange.com/questions/60055

**3200 people like this**

## 0 comments:

## Post a Comment

Let us know your responses and feedback