I'm trying to figure out how to represent a Facebook user as a vector. I decided to stack the user's different attributes/parameters into one big vector. For example, age is a vector of size 100, where 100 is the maximum age you can have; if you are, let's say, 50, the first 50 values of the vector would be 1, just like a thermometer.
Now I want to represent the user's Facebook interests as a vector too, and I just can't figure out a way. They are a collection of words, and the space that represents all the words is huge, so I can't go for a model like bag of words or something similar. How should I proceed? I'm still new to this; any reference would be highly appreciated.
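To make the thermometer idea concrete, here is a minimal sketch of the age encoding described above. The function name `thermometer_encode` and its `size` parameter are just illustrative choices, not part of any library.

```python
def thermometer_encode(value, size=100):
    """Return a list of `size` 0/1 entries whose first `value` entries are 1."""
    if not 0 <= value <= size:
        raise ValueError("value must be between 0 and size")
    # Ones up to the value, zeros afterwards, like a thermometer filling up.
    return [1] * value + [0] * (size - value)


# A 50-year-old user: the first 50 positions are 1, the remaining 50 are 0.
age_vector = thermometer_encode(50)
```

Vectors built this way for each attribute can then be concatenated to form the one big user vector.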
Asked By : mabounassif
Answered By : Emre
The interests are categorical data and may be modeled as binary variables (a user either likes an interest or does not). You can subsume little-used categories under broader categories. For example, a user who likes a little-known horror movie can simply be marked as liking horror movies. You can even subsume an item under multiple categories if it belongs to several.
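As a rough illustration of this suggestion, the sketch below rolls specific interests up into broader categories and emits one binary variable per category. The interest names, the category list, and the `encode_interests` helper are all made up for the example; in practice the mapping would have to come from your own data or a taxonomy you choose.

```python
# Hypothetical mapping from specific interests to one or more broad categories.
INTEREST_TO_CATEGORIES = {
    "Obscure Slasher Film": ["horror movies"],
    "Zombie Musical": ["horror movies", "musicals"],
    "Local Jazz Trio": ["jazz"],
}

# Fixed category order, so every user vector has the same layout.
CATEGORIES = ["horror movies", "jazz", "musicals"]


def encode_interests(interests, mapping=INTEREST_TO_CATEGORIES, categories=CATEGORIES):
    """Return a 0/1 list with a 1 for every broad category the user likes."""
    liked = set()
    for interest in interests:
        # Interests missing from the mapping contribute nothing.
        liked.update(mapping.get(interest, []))
    return [1 if category in liked else 0 for category in categories]


# An item that belongs to two categories sets both bits.
user_vector = encode_interests(["Zombie Musical"])
```

Because the vector length equals the number of broad categories rather than the number of distinct interest strings, it stays manageable even when the raw vocabulary is huge.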
For what you can do with the data, see "A Review on Data Clustering Algorithms for Mixed Data".
Question Source : http://cs.stackexchange.com/questions/1394