Build a Better Netflix, Win a Million Dollars?
An anonymous reader writes "In a quest to better movie recommendations, Netflix is opening their database (nytimes, registration and first child required) to users to try to craft a better recommendation technology. The problem is not easy. Says one researcher: 'You're competing with 15 years of really smart people banging away at the problem.'" Recommender systems are really an interesting problem, and that is likely very interesting data to play with.
Seems like a free gift for Netflix to me... (Score:3, Insightful)
But if someone does win within a year they will still have the ability to use others' code, free of charge, as part of their product.
The article doesn't say, but how will you know whether your code is making better choices than their existing system? I wouldn't submit my code unless I was sure I was going to win. Then again, I'm not a gambler or a coder.
Re: (Score:3, Informative)
But that seems pretty reasonable...you only have to hand over your code if you win, otherwise you're only submitting the results of yo
Re: (Score:3, Informative)
Re: (Score:2)
Would you rent this movie again? 1 2 3 4 5
Would you recommend this to a friend? 1 2 3 4 5
Then choose why you liked or disliked it from 5-6 choices, such as: acting, plot, cinematography, moral values, hotness
Re: (Score:3, Insightful)
Re: (Score:2)
Challenge Accepted (Score:2)
So, we can then conclude (Score:4, Funny)
So, the professionals have been working at it for a long time. Is it safe to assume some teenage to early-college hacker will find success within two weeks?
Re: (Score:2)
Simply filter their existing result set to exclude titles that are in a genre that the user has NEVER rented anything from and that would be a huge improvement!
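The filter described above can be sketched in a few lines. This is only an illustration: the function name, the (title, genre) tuple layout, and the sample titles are assumptions made for the example, not anything Netflix exposes.

```python
def filter_by_rented_genres(recommendations, rental_history):
    """Drop recommended titles whose genre the user has never rented.

    Both arguments are lists of (title, genre) tuples (an assumed layout).
    """
    rented_genres = {genre for _title, genre in rental_history}
    return [(title, genre) for title, genre in recommendations
            if genre in rented_genres]


# Hypothetical sample data
history = [("Alien", "sci-fi"), ("Clerks", "comedy")]
recs = [("Blade Runner", "sci-fi"), ("Saw", "horror"), ("Mallrats", "comedy")]
print(filter_by_rented_genres(recs, history))
```

Note that a brand-new user with an empty history would get no recommendations at all under this rule, so a real system would need a fallback.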
Simple (Score:5, Funny)
if(user.getGender()==Person.MALE)
    recomendation=MovieGenre.PORN;
else
    recomendation=MovieGenre.CHICKFLICK;
And of course, slashdot must have sensed my post as my image word is "pervert"
Re:Simple (Score:5, Funny)
Old Version:
if(user.getGender()==Person.MALE)
    recomendation=MovieGenre.PORN;
else
    recomendation=MovieGenre.CHICKFLICK;
New Version, sure to win the million bucks:
if(user.getGender()==Person.MALE && user.getOrientation()==Person.STRAIGHT)
    recomendation=MovieGenre.PORN;
else
    recomendation=MovieGenre.CHICKFLICK;
Re: (Score:2)
Re: (Score:2)
Gay porn? or chick flick?
Re: (Score:2)
I had a thought like this a while back... (Score:5, Interesting)
Re: (Score:2)
go see porn sites (Score:3, Interesting)
Especially the newer blog-ish type pages where there's a gallery and a small selection underneath.
Not that I would know of course.
Re: (Score:2)
Suggestion (Score:5, Insightful)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Just a nitpick... If I mark, say, season 1 of series X as Not Interested, maybe it means I already own it and have no need to rent it, but still might want to see season 2. Of course, if I marked it as 1-star (Which I assume means "Utter crap"), then as you said, it should shut the hell up about the rest of the series.
I disagree. If you have it, you presumably have watched it and should give it a rating. You do have interest in it, or you would not have bought it. So things you mark as 1 star should prob
Re: (Score:2, Funny)
Re: (Score:2)
I didn't like Star Wars:Episode I very much. Episode 4 was great though.
Right, so you might mark Episode I (technically number 4 by release order, and since prequels generally suck I think this should be the ordering mechanism) as 2 stars or even 1. You wouldn't mark it as not interested, since from your comment you were interested enough to watch it. If, however, you were so uninterested in Episode I as to mark it as not interested (meaning you did not watch it and don't ever want to) then the chances
Re: (Score:2)
Damn prequels.
I think this is an awesome idea for development - let the whole world offer solutions and spend only a million dollars. Most companies would spend that much just on a consultant team to tell them their current system sucks.
Re: (Score:2)
Re: (Score:2)
Besides, any logical cataloging system would mark the SW prequels as a completely different series than the original.
Personally I think a good place to start is by director/writer... So if I like
Re: (Score:3, Insightful)
I was about to mention that I mark things as Not Interested when I own them, to avoid being recommended the rest (usually because I prefer to buy series I like, and rent actual movies), but then I realized that fits into what you said perfectly.
Point conceded.
Re:Suggestion (Score:5, Funny)
For the record, this is a turning point in slashdot history. I'll forever remember where I was when I first saw those words in a slashdot comment. (Which of course is at work, sitting through a boring meeting.)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Privacy issues? (Score:3, Interesting)
Re: (Score:2)
Re: (Score:3, Informative)
To prevent certain inferences being drawn about the Netflix customer base, some of the rating data for some customers in the training and qualifying sets have been deliberately perturbed in one or more of the following ways: deleting ratings; inserting alternative ratings and dates; and modifying rating dates.
Plus all the usual replacing of IDs and such you'd expect. Looks like they're trying to avoid a repeat of the AOL debacle at least.
Re: (Score:3, Insightful)
RSSTimes (Score:5, Insightful)
Why is it that the Slashdot editors are just too damn lazy to look up the RSS feed links to these pages?
While this may be true, I wouldn't let it deter you. Collaborative filtering is a field that is far from dead. The interesting thing about collaborative filtering is that on the surface it seems pretty straightforward, but once you dig into the mechanics of it, there is actually a lot of playing you can do. Ironically, the way you display the data to the end user is often what determines how good a job you did.
Allow me to take a naïve approach to this topic and say we generate a movie index of each person. I would have A Clockwork Orange and Koyaanisqatsi at 5 while The Ring 2 would be at the very low end. My friend might have similar movies. If he has A Clockwork Orange up there, you might be able to compute a Euclidean distance between us. However, this approach falls apart because no one has seen Koyaanisqatsi, and the 20 movies I've ranked highly are hard to find.
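To make the Euclidean-distance idea concrete, here is a minimal sketch. It assumes each user's ratings are a plain dict of title to stars and only compares movies both users have rated; the function name and the sample ratings are invented for illustration.

```python
import math


def euclidean_distance(ratings_a, ratings_b):
    """Distance over the movies both users rated; None if they share none."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return None  # the sparse-overlap case: no basis for comparison
    return math.sqrt(sum((ratings_a[m] - ratings_b[m]) ** 2 for m in shared))


# Hypothetical ratings on a 1-5 scale
me = {"A Clockwork Orange": 5, "Koyaanisqatsi": 5, "The Ring 2": 1}
friend = {"A Clockwork Orange": 4, "The Ring 2": 2}
stranger = {"Some Other Movie": 3}

print(euclidean_distance(me, friend))
print(euclidean_distance(me, stranger))
```

The `None` branch is exactly the failure mode described above: when two people share no rated titles, the distance is simply undefined.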
You don't have to stop there, however. You could also database the movies I marked as "uninterested" or the movies that were presented to me but I didn't vote on. Like if I had seen the offer to mark J-Lo's latest flop but didn't, wouldn't that tell you something about me?
So these caveats present themselves all along the way and, at the end of the computation, you have many different strategies for this data. For example, while you might not be able to link my friend and me through movies, how far apart are we on a node network? What I mean is, if you plotted every user in their own dimension depending on the movies they ranked and attempted to compute as good a distance as possible between all users, how far would I be away from my friend by hopping on these nodes? There's a lot of information to be gleaned in this sort of friend-of-a-friend collaborative approach.
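One rough way to picture the hop count is a graph in which two users are linked whenever they have rated at least one movie in common, searched breadth-first. That linking rule, the function names, and the sample users are all assumptions for this sketch; a real system would likely weight edges by how similar the ratings are.

```python
from collections import deque


def build_graph(ratings):
    """Link two users whenever they have rated at least one movie in common."""
    users = list(ratings)
    graph = {u: set() for u in users}
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            if set(ratings[u]) & set(ratings[v]):
                graph[u].add(v)
                graph[v].add(u)
    return graph


def hops_between(graph, start, goal):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        user, dist = queue.popleft()
        for neighbor in graph[user]:
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical users: "me" and "friend" share no movie, but "bridge" links them
ratings = {
    "me": {"Koyaanisqatsi": 5},
    "bridge": {"Koyaanisqatsi": 4, "Jaws": 3},
    "friend": {"Jaws": 5},
    "loner": {"Gigli": 1},
}
graph = build_graph(ratings)
print(hops_between(graph, "me", "friend"))
```

The pairwise graph construction is quadratic in the number of users, so this is a toy; it only shows that two people with no shared titles can still be connected through intermediaries.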
Now you need to present this information to the user. Do you just up and recommend him a movie? Do you take Amazon's approach and say "Other people did this -- so should you."? Or do you give them some sort of three dimensional flash plotting of you versus the people nearest to you? Do you allow the user to contact those closest to them? Those farthest away?
My point is that while 15 years of research has been done, it doesn't mean there's been 15 years of testing and implementation which, in the end of creating products, is where most of the importance lies.
About no-login links on /. (Score:3, Insightful)
I think it is the reason.
Slashdot can't send thousands of users with a fake referrer to NY Times. That link you provided is for people using RSS readers and subscribed to NY Times RSS feed.
I think they should talk with NY Times web team to allow slashdot readers with referrer=slashdot without needing login. They can arrange it for sure, this isn't a "no name" site.
It would be nice for NY Times for
Re: (Score:2)
Well...almost no one.
Re: (Score:3, Informative)
(FWIW, Powaqqatsi was a better flick, IMHO)
Re: (Score:2)
(FWIW, Powaqqatsi was a better flick, IMHO)
See, I liked the first one better, and I didn't have any trouble staying awake. I like Philip Glass and I like Francis Ford Coppola, though.
Psychology, extended data (Score:2)
They are leaving out a whole aspect of psychology. The problem is that they have a 2 day lead time to get the content to you. So, it's not just a matter of them asking what your mood is and then providing you with the movie. Instead, they have to predict how you will be feeling 2 days from now and send you the
I know...credit reports! (Score:2)
Copy the Music Genome Project (Score:5, Interesting)
What they need to do is copy the methods of the Music Genome Project (www.pandora.com), and list a larger set of attributes for the films. This way it can recommend films by checking many more characteristics, such as director, tone, writer, or subject.
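As a rough sketch of attribute-based matching: tag each film with a set of attributes and compare the sets. The Jaccard overlap used here is a stand-in chosen for simplicity, not a description of how the Music Genome Project actually works, and the catalog entries are made up.

```python
def jaccard(a, b):
    """Share of attributes two films have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(title, catalog):
    """Return the other film whose attribute set overlaps most with `title`."""
    target = catalog[title]
    scored = [(jaccard(target, attrs), name)
              for name, attrs in catalog.items() if name != title]
    return max(scored)[1]


# Hypothetical catalog; attribute labels are invented for the example
catalog = {
    "Film A": {"director:X", "tone:dark", "subject:crime"},
    "Film B": {"director:X", "tone:dark", "subject:war"},
    "Film C": {"director:Y", "tone:light", "subject:romance"},
}
print(most_similar("Film A", catalog))
```

The hard part in practice is not the comparison but getting someone to label every film consistently.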
Re:Copy the Music Genome Project (Score:5, Informative)
What they need to do is copy the methods of the Music Genome Project (www.pandora.com), and list a larger set of attributes for the films. This way it can recommend films by checking many more characteristics, such as director, tone, writer, or subject.
In this contest, you run your own code and submit the results to NetFlix to be scored. This means that you can use any other data (e.g. a Movie Genome project) you can compile to enhance your rankings. Netflix apparently specifically designed the contest to allow this.
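The thread does not say how a submission gets scored. One common yardstick for predicted ratings is root-mean-squared error (RMSE); the sketch below assumes that metric purely for illustration, and the numbers are invented.

```python
import math


def rmse(predicted, actual):
    """Root-mean-squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical held-out ratings and two competing sets of predictions
actual = [3, 5, 4, 1]
baseline = [4, 4, 4, 2]
candidate = [3, 5, 4, 2]

print(rmse(baseline, actual), rmse(candidate, actual))
```

Under a scheme like this, you only ever hand over a list of predictions, which is why any outside data source can feed into them.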
Not just the movie characteristics. (Score:2)
But they also need ways to identify the characteristics of people's choices. Right now, one NetFlix account can be used by a whole family. So instead of getting 1 person's characteristic choices (teenage emo goth girl), you get those combined with the other family members (Dad's action films, Mom's chick flicks, Jr's teenage sex comedies).
Eventually, you'd end up with a movie genome cross indexed to a sub-culture.
Re: (Score:2)
Movie folksonomy (Score:2)
Or leech off of IMDB's recommendation system, which seems to be quite good.
only a million? (Score:3, Interesting)
Re: (Score:2, Insightful)
So, you could take the money from Netflix, use it to start your business, then license it to the other players, too.
Same as all jobs (Score:2)
Many companies do offer incentive programs (more likely for upper level positions), but it is still just a percentage of the actual "value" that you created,
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
I know I'm sounding like I'm promoting "execs" but for the most part they put up with a lot. Granted some truly are heartless, but there are also nice people that have to fire hundreds (if not more) people that they may even be friends with. If it truly were a walk in the park and everyone could (or even would want to) do it, then ummm...things would be...different. I'm not sure how, but it would just be different. In an
Fix the problems with what they send me first (Score:5, Interesting)
On top of that, don't show me that it's available in my queue but send me something else instead. While I haven't asked netflix about this, I have asked blockbuster online, and I imagine they are both doing the same thing. The disc is "available" just not at the warehouse used to ship to me personally. Instead of basing one piece of information off of total stock and one off of local stock, base them both on the stock at the warehouse shipping to me.
Re: (Score:3, Funny)
I thought Netflix users just ripped the movies to their hard drive for later viewing anyway?
Re: (Score:2)
Re: (Score:2)
Remove Artificial Supply Limitations (Score:3, Insightful)
Re: (Score:2)
Re: (Score:2)
Difficulties on the data-gathering end (Score:5, Interesting)
Not that marketers have a better handle, but simply that people will swear up and down that they would buy a peanut-butter-filled hot dog, that they loved the one they tried, and then don't actually buy any.
Don't believe me? Go see Snakes on a Plane. Nobody else did. (Sure, $33 million seems like a lot, but that's chump change for a major studio release these days.)
The best improvements will come from insights gained between the lines. You may have rated The English Patient eleventeen stars, but if your next seven rentals were all episodes of The Girls Next Door, which you only rated 3 stars, it certainly looks like you want more Hugh Hefner and less Ralph Fiennes.
The best data is the data that the subject doesn't realize he's giving you. Once you start imposing conscious choice on the ratings, you get only what they say they like, not what they really like.
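Here is a small sketch of the gap between stated and revealed preference, using the example above. The (title, category, stars) tuple layout, the category labels, and both function names are assumptions for illustration.

```python
from collections import Counter


def revealed_preferences(rentals):
    """Rank categories by how often they were actually rented."""
    counts = Counter(category for _title, category, _stars in rentals)
    return [category for category, _n in counts.most_common()]


def stated_preferences(rentals):
    """Rank categories by the average star rating the user gave them."""
    totals, counts = Counter(), Counter()
    for _title, category, stars in rentals:
        totals[category] += stars
        counts[category] += 1
    return sorted(counts, key=lambda c: totals[c] / counts[c], reverse=True)


# One highly rated drama, then seven middling-rated episodes of a reality show
rentals = [("The English Patient", "drama", 5)] + [
    (f"The Girls Next Door ep. {n}", "reality", 3) for n in range(1, 8)
]

print(stated_preferences(rentals))
print(revealed_preferences(rentals))
```

The two rankings disagree, and the argument above is that the second one is the better predictor of the next rental.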
Re: (Score:2)
It's the difference between scientists and engineers trying to decide what activities seem more dangerous, and actuaries using real historical data to rate activities. Guess which method insurance companies use when their money is on the line...
Re: (Score:2)
If the actual data supports that then yes. Otherwise no.
Using actuarial statistics, we don't care what the explanations are, just the results.
If the data surprises us with a high level of deaths from nose-picking, then that would be a high risk activity.
Re: (Score:2)
an excellent book which covers, amongst many other things, how people do behave over how they say they'll behave is Freakonomics [amazon.co.uk].
for example, they cover how people behave about race and dating: whilst people SAY they have little preference, analysis from dating agencies shows the opposite. Even some game show stats are used to prove people are prejudiced.
Re: (Score:2)
Did anyone say they wanted to see it, though? I saw a lot of people mocking it, but I didn't see any non-ironic expressions of enthusiasm.
Intractable problem - liking the movie, not genre (Score:4, Interesting)
I stopped rating movies after I found that I got recommended a lot of crap. Say I rent a slasher movie that, for its genre, is artfully done. I rate it high. Now I have recommendations for a bunch of worthless, straight-to-video stuff that I really don't want to see.
This is the real nut to crack, IMO. How do you come up with an algorithm that rates 'quality,' an elusive concept that means different things to different people?
Not to mention, I'm fickle.
Re:Intractable problem - liking the movie, not gen (Score:2)
Re: (Score:2)
Movie A is an artful horror film (we'll even give it a bonus point for originality.) Movie B is a low budget straight to video rehash of Movie A.
What are the differences between the two?
A) Budget / production values.
B) Production company.
C) Actors.
D) Crew, particularly director / writer / producer.
E) Originality of script. (This is kind of subjective, but surely a remake of a movie is less original t
Re: (Score:2)
It all depends on how dedicated to the genre you are. If you liked everything from Event Horizon, to Evil Dead, to Bram Stoker's Dracula, then there probably aren't too many horror movies you wouldn't at least mildly enjoy.
But if you only ever rent Action movies if they star Tom Cruise, then your high War of the Worlds rating wouldn't necessarily imply you'd also enjoy Armageddon.
5 star rating is flawed (Score:4, Insightful)
The problem is that that is my rating system. It works for me. But it does little good to anybody else because they are rating based purely on something else.
I think they need to implement the ability to rate more aspects of the movie. I'm sure some people out there rate the movie poorly if their disc is scratched or the transfer quality is poor even. A simple 1 to 5 system doesn't cut it. People rate things that aren't "Was the (romance) plot good?", "Do you like this director?", "Do you like these actors?". People rate things that aren't on the box.
Re: (Score:2)
Re: (Score:2)
The major problem is, simply put, movies are not treated as a disposable commodity.
I worked at a radio station for several years, and one of my jobs was to review CDs for library status. Listening to an entire CD takes anywhere from 30 to 60 minutes - simply impossible. So, you resort to fast-forwarding, previewing, and generally getting a feel for an album. Quickly you separate the wheat from t
Re: (Score:3, Funny)
Re: (Score:3, Funny)
Just add racing stripes! (Score:2, Funny)
Define "better" (Score:2)
SELECT m.*
FROM tblMovies as m, tblAdvertisers as a
WHERE m.studio = a.studio
ORDER BY a.adRevenue DESC
I win.
And the winner is .... (Score:2)
BitTorrent!!!
Here's a problem to solve with much larger impact (Score:3, Interesting)
Re:Here's a problem to solve with much larger impa (Score:3, Informative)
This problem is already solved.
Wi
Common data (Score:3, Informative)
Of course, I'm biased since I had John Riedl as a professor in a few easy classes. I think he tried to spin off this research as a new company, but I'm not sure if it ever got off the ground.
One thing I'd really like to see has little to do with the quality of ratings, though. I'd like to be able to keep a common database of my ratings across multiple sites. At the moment, I've rated a number of movies at Netflix, MovieLens, and IMDb, but they aren't entirely consistent. Unfortunately, two of the sites use a ten-point system (IMDb has a ten-point scale, MovieLens goes up to 5 stars, but in half-star increments), while the other uses a five-point one (maybe six if you say "Not Interested"..).
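Keeping one set of ratings across sites mostly comes down to mapping each scale onto a common one. The sketch below uses a simple linear mapping; the exact endpoints (1 to 10, 1 to 5, and 0.5 to 5.0 for half-stars) are assumptions, as are all the names.

```python
def to_unit_scale(rating, low, high):
    """Map a rating from [low, high] onto [0.0, 1.0]."""
    if not low <= rating <= high:
        raise ValueError("rating outside scale")
    return (rating - low) / (high - low)


def convert(rating, src, dst):
    """Convert between (low, high) scales; the result is not rounded to valid steps."""
    unit = to_unit_scale(rating, *src)
    return dst[0] + unit * (dst[1] - dst[0])


# Assumed scale endpoints, for illustration only
TEN_POINT = (1, 10)
FIVE_STAR = (1, 5)
HALF_STAR = (0.5, 5.0)

print(convert(10, TEN_POINT, FIVE_STAR))
print(convert(3, FIVE_STAR, TEN_POINT))
```

A linear mapping is lossy in the coarse direction: several ten-point ratings collapse onto the same star once rounded, so round-tripping will not always give back the rating you started with.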
Well, I'll have to poke around a bit with this stuff. I wouldn't be able to do much, though, since my level of knowledge in this arena is very limited...
Netflix Stats (Score:2)
# Netflix has more than 1 billion movie ratings from customers. The average subscriber has rated more than 200 movies.
# Netflix members select approximately 60 percent of their movies based on movie recommendations tailored to their individual tastes.
# Netflix's members rent more than 95 percent of all titles in the Netflix library each quarter.
http://web.netflix.com/MediaCenter?id=1005&hnjr=8 [netflix.com]
1324 movies rated, 0 recommendations (Score:2)
Once, I rented and liked a Devo DVD, so it recommended every band with a concert movie, but I don't like every band and started marking things "Not interested". Then, I added a Sarah Silverman disc to my queue, which NetFlix took to mean that I love all stand-up comics, especially those on the B
uh huh (Score:2)
Did anyone realize that Netflix is releasing 100 million anonymous customer ratings? I thought
Netflix also needs to introduce a DVD ISO download service. In an age where most people own DVD burners, why the hell not open a DVD ISO download service?
here's the problem with one-dimensional recommend. (Score:2)
Combine tagging with rating and you'll find a much better recommendation system.
i.e.: "Shrek II: cartoon, comedy, satire, Mike Myers; 4.5 stars" vs. "Shrek II: 4.5 stars" and "The Motorcycle Diaries:
My problem (Score:2)
Consequently NetFlix thinks I like everything. While the system is smart enough to not recommend Martin Lawrence movies, it usually gives me movies I'm simply not interested in. Or at the other extreme, it gi
Re: (Score:2)
Hilarious. Seriously, I thought "oh, HELL no" when it was chosen as "the movie we are going to watch tonight", but I laughed my ass off. Le singe est dans l'arbre.
No Sweat (Score:2)
One problem, how many dimensions ARE there to human affin
I hope Amazon... (Score:2, Interesting)
Wow... I've seen this movie before! (Score:2)
The First one is the age olde "frame problem". This is IT taking a perfectly good database and expecting it to be an even better recommendation system.
Airlines hit the same wall decades ago. They had databases of flights, seats and routes - all excellent. But they really wanted a reservation system based upon ticketing against that database. They finally recognized that nothing less than a mainframe was needed.
The answers you get are all in how you frame the question. Starting with "dat
Re: (Score:2)
Anyone else have better luck?
Re:database? (Score:4, Informative)
Re: (Score:2)
Re: (Score:2)
Thanks, but I'm hoping the winner is a couple of smart college guys or girls, and not Stallman
Re: (Score:2)
Job Application (Score:2)
If you are looking for a job, I wouldn't view this as a competition to make you obsol
Re: (Score:2, Insightful)
Re: (Score:2, Informative)