Mo kobrosli 1:41 PM
Sent from Toronto Apache Spark
Hi Suhail. Sorry, yesterday was nuts. Answers below! They didn't fit inline in the comments (over the max :( ) and I didn't feel like breaking it all up.
Picking what type of 'session' to define really depends on the ultimate goal the recommendations are meant to achieve. If you want very serendipitous, discovery-based recommendations, then not putting any constraints on them would probably do just that. So just use a session cookie to break up your event streams over a period of your choice. Or just a user id if you have it.
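A minimal sketch of that "break up your event streams over a period of your choice" idea, splitting one user's events into sessions on an inactivity gap. The 30-minute cutoff is my assumption, not something Mo specified:

```python
from datetime import datetime, timedelta

# Assumed inactivity gap; tune for your domain.
SESSION_GAP = timedelta(minutes=30)

def sessionize(events, gap=SESSION_GAP):
    """events: list of (timestamp, item_id) tuples, sorted by time.
    Starts a new session whenever the gap between consecutive
    events exceeds `gap`. Returns a list of item_id lists."""
    sessions, current, last_ts = [], [], None
    for ts, item in events:
        if last_ts is not None and ts - last_ts > gap:
            sessions.append(current)
            current = []
        current.append(item)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions
```

With a session cookie you'd do the same thing per cookie; with a user id, per user.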
But depending on the domain, it may all just appear to be random 'noise'. On a horizontal classifieds site this is exactly what happens. However, if you have A LOT of covisitation for a given pair of items, that is a signal that those pairs are not random. So you could filter out, say, coviews < X, where X is a threshold of your choice.
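The coview count plus threshold filter can be sketched like this (the threshold value and the pair-counting details are assumptions on my part):

```python
from itertools import combinations
from collections import Counter

def coview_counts(sessions):
    """sessions: list of item_id lists.
    Counts how often each unordered pair of distinct items
    appears together in the same session."""
    counts = Counter()
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts

def filter_noise(counts, threshold):
    """Drop pairs seen together fewer than `threshold` times,
    i.e. the likely-random crossovers."""
    return {pair: n for pair, n in counts.items() if n >= threshold}
```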
Other approaches come from thinking of a session as more of an 'information seeking' session. So even if your actual web 'session' involved looking at guitars and then new homes, maybe it's best to SPLIT that session into two information sessions, i.e. pseudo-sessions: one for the guitar viewing and the other for the homes. This helps stop random crossover in your calculated covisitations. How many people could really be looking for homes and guitars at once? Is that a common crossover? If it were, then you probably wouldn't want to do that split. It really depends on your domain, deep analysis, and testing the ultimate success of your recommendations, i.e. are people engaging with them? That's really the only metric that matters at the end of the day. It's best not to spend TOO much time; get early A/B tests in to find the right approach, and track it.
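One simple way to form those pseudo-sessions, assuming each event carries a category label (the 'guitars'/'homes' labels here are just for illustration): cut the session wherever the category changes.

```python
from itertools import groupby

def split_by_category(session):
    """session: list of (category, item_id) tuples in time order.
    Returns one pseudo-session per run of consecutive
    same-category events."""
    return [[item for _, item in run]
            for _, run in groupby(session, key=lambda e: e[0])]
```

In a real system the split signal could be richer than a category field (topic models, embeddings, dwell time), but the shape of the idea is the same.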
If you had a single vertical market, this issue isn't as concerning. For example, autos or real estate only. But even then, you may get people just window shopping with no real intent. It's really the 'intent' you want to capture. You have to find the signal that captures the right level of intent to make coviews actually worth something.
This is indeed a tricky one. If your recommendations are for popular items, and those items are long-lived (like movies), then you can very well create a positive feedback loop here. The same problem can exist in covisitation, depending on your normalization function. Again, this is totally dependent on your goal and domain. If your inventory is short-lived and volatile, it's less of an issue. You could tune your normalization to handle it, but one of the most effective things to do in recommendation systems is to make sure you build a complete feedback loop. Most people solve this by not counting clicks on recommendations back into their algorithms, or by only taking a portion of those clicks, or by discounting their value. So something like 'trending' searches is determined by users' manual searches, not by clicks on the trending searches themselves. Or alternatively, discount (or sample) the searches that came from a recommendation click to level the playing field for NEW recommendations.
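A toy version of the "discount recommendation clicks" idea: clicks that arrived via a recommendation get down-weighted before feeding back into the counts. The 0.1 weight is an assumption; Mo doesn't give a number.

```python
from collections import Counter

# Assumed discount factor for clicks sourced from a recommendation,
# so the system doesn't just reinforce whatever it already shows.
REC_CLICK_WEIGHT = 0.1

def weighted_click_counts(clicks, rec_weight=REC_CLICK_WEIGHT):
    """clicks: list of (item_id, from_recommendation: bool).
    Organic clicks count fully; recommendation clicks are
    discounted. Returns item_id -> weighted count."""
    counts = Counter()
    for item, from_rec in clicks:
        counts[item] += rec_weight if from_rec else 1.0
    return counts
```

Setting `rec_weight=0.0` recovers the "don't count them at all" variant; sampling a fraction of recommendation clicks is the other option mentioned.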
Hopefully this answers your questions!
the above was in response to the following questions:
to summarize,
1] the means of comparison is some (unspecified) metric of user engagement and/or efficacy of recommendations (there are a few options in the literature). measurement is done, as expected, via user testing.
2] the issue here is the training feedback loop that exists in most "live" machine learning systems. there's a wealth of research on the topic, and under simplified assumptions, multi-armed bandits and reinforcement learning describe the associated exploration/exploitation dilemma. however, it's not clear from Mo's response what they used (perhaps it was all of those techniques and more).