Google, Bing, Yahoo Data Retention Doesn't Improve Search Quality, Study Claims (theregister.co.uk) 38

A new paper released on Monday via the National Bureau of Economic Research claims that retaining search log data doesn't do much for search quality. "Data retention has implications in the debate over Europe's right to be forgotten, the authors suggest, because retained data undermines that right," reports The Register. "It's also relevant to U.S. policy discussions about privacy regulations." From the report: To determine whether retention policies affected the accuracy of search results, Chiou and Tucker used data from metrics biz Hitwise to assess web traffic being driven by search sites. They looked at Microsoft Bing and Yahoo! Search during a period when Bing changed its search data retention period from 18 months to 6 months and when Yahoo! changed its retention period from 13 months to 3 months, as well as when Yahoo! had second thoughts and shifted to an 18-month retention period. According to Chiou and Tucker, data retention periods didn't affect the flow of traffic from search engines to downstream websites. "Our findings suggest that long periods of data storage do not confer advantages in search quality, which is an often-cited benefit of data retention by companies," their paper states. Chiou and Tucker observe that the supposed cost of privacy laws to consumers and to companies may be lower than perceived. They also contend that their findings weaken the claim that data retention affects search market dominance, which could make data retention less relevant in antitrust discussions of Google.

  • by AvitarX ( 172628 ) <me&brandywinehundred,org> on Tuesday September 19, 2017 @08:49PM (#55228801) Journal

    Because I bet the 3 month retention is a huge boost, if only in giving me history of older searches in auto complete.

    Much more than that doesn't seem too helpful though, three months is a whole lot of searches, and should give plenty of information about what I'm searching for right now.

    • by dgatwood ( 11270 )

      Much more than that doesn't seem too helpful though, three months is a whole lot of searches, and should give plenty of information about what I'm searching for right now.

      Maybe, but maybe not. There are very good reasons for keeping data longer than three months. Not all data will be valuable after three months, but it doesn't take much effort at all to come up with counterexamples in which longer retention could make a significant difference in search quality.

      For example, consider people who are in colle

      • by AvitarX ( 172628 )

        Valid points.

        The way you describe things seems to be how Google now handles my weekly routine.

        It did a very good job of finding out where I go on Friday and Sunday with no input from me, I just started getting travel times about 4 hours before I would go. I can definitely picture a world where they can find these long cycles (or with your examples of vacation and school figure it out the first time from words used and time of year, and be better year two).

        I don't know if they're that smart yet, but they very l

      • by olau ( 314197 )

        Won't argue about the study which may very well be flawed, but I don't think your last assertion is correct.

        The data centers certainly aren't full of search histories. Let's say each person generates 1 KB of data per day in search history (with compression) - that's 1 TB/day to store data from 1 billion. What's the marginal cost of storing that data per year? 100,000 dollars?

        One thing you need to keep in mind is that a company like Google ultimately isn't storing data because of the value it provides to the

        • by AmiMoJo ( 196126 )

          1 TB/day to store data from 1 billion. What's the marginal cost of storing that data per year? 100,000 dollars?

          Google charges me much less than $100k to store 1TB of data for a year, so I can assure you it costs them much less. Let's see, 1TB of HDD space is a few tens of dollars, multiplied by 3 for redundancy, electricity cost for a year, maintenance costs are going to be pretty low alongside the million other HDDs they have spinning... Maybe $100, max? I bet it's actually closer to $20 for someone like Google.

          • Don't forget to multiply by 365.2422. Still a lot less than $100K, of course.
            • by AvitarX ( 172628 )

              I'd even bet they spend more on the systems that analyze the data (especially people that figure out how to) than the storage, likely by close to an order of magnitude.
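
The arithmetic in this sub-thread can be checked with a rough sketch. Every input is a guess quoted by the commenters above (1 KB per person per day, 1 billion people, $20 to $100 per TB per year), not a measured figure, and the function names are made up for illustration. It also assumes every terabyte is billed for a full year, which overstates the cost somewhat.

```python
# Back-of-the-envelope storage cost, using only the figures quoted in the thread.
# All constants are the commenters' guesses, not measured values.

KB_PER_TB = 10**9                 # decimal units: 1 TB = 10^9 KB
DAYS_PER_YEAR = 365.2422          # the multiplier suggested in the thread


def tb_per_day(kb_per_user_per_day: float, users: int) -> float:
    """Daily volume of new search-log data, in TB."""
    return kb_per_user_per_day * users / KB_PER_TB


def yearly_cost(daily_tb: float, dollars_per_tb_year: float) -> float:
    """Upper-bound cost of holding one year's worth of logs.

    Simplification: every TB is billed for a full year, even though data
    written late in the year only sits on disk for part of it.
    """
    return daily_tb * DAYS_PER_YEAR * dollars_per_tb_year


daily = tb_per_day(kb_per_user_per_day=1, users=1_000_000_000)
low = yearly_cost(daily, dollars_per_tb_year=20)
high = yearly_cost(daily, dollars_per_tb_year=100)

print(f"{daily:.0f} TB/day, roughly ${low:,.0f} to ${high:,.0f} per year")
```

Under those assumptions a year of logs lands somewhere around $7,300 to $36,500, which fits the "still a lot less than $100K" remark.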

    • "Much more than that doesn't seem too helpful though, three months is a whole lot of searches, and should give plenty of information about what I'm searching for right now."

      I should have thought that three DAYS would be sufficient. But what do I know?

      • by AvitarX ( 172628 )

        I would think at least a month.

        Sometimes I try to find an article to share with someone or some such.

  • As they are being taken out of context. Many websites imported UseNet newsgroups, a popular one was one I frequented.
    Those would be best removed, yet none I regret; other than some of the websites they ended up at.

  • The only reason for data retention is tracking. 3 months ago I was searching for info on what DVDs came out recently. Yesterday I searched for what DVDs came out recently.

    My searches tend to be pretty random. Someone started showing The Incredible Hulk a few months ago. I searched for the show, Bill Bixby, the guy who played the reporter, and Lou Ferrigno. Why? Not cuz I want to buy them, but because I don't have it in me to just sit back and watch a TV show nowadays.

    Hey, Hill Street Blues! Never caught
  • by davecb ( 6526 ) <davecb@spamcop.net> on Tuesday September 19, 2017 @09:37PM (#55228979) Homepage Journal

    And then the suckers "have" their advertiser send me ads for something they know I like... because I just bought it, and the advertisers know they can prove my interest to their customers/suckers.

    Net result? You get ads for stuff you bought.

    • by rtb61 ( 674572 )

      There are areas of worthwhile data retention. One requires log in and that is of course blocking specific sites from turning up in your searches, the more that happens for specific sites, the further they drop down search rankings (it would require thousands of down votes). Next up of course is how to better align searches, i.e. locality based, and how local (country, state, city), and making that easy to use. Next is type of service you are searching, info, sales, repair, showroom, online only etc and how to

  • What impact does this have on I.T.?

    I regularly search for things three to five years old, sometimes I even find my own solutions on a website somewhere.

    If the data retention has no effect on searches three to five years apart, on well aged data, then I've no problem with lower data retention.
  • Over time I've noticed various programming-related phrases that come up as the first result if I'm on my account, but are buried if I'm not.

    So, I'd say it works good for me. Now, if things need to be stored long-term to get the same benefit versus, say, only a couple months, I have no idea.

    • ...although they could just be fiddling with the 'settings' for your search experience - they don't need data retention to do that - they can do it incrementally as you search/browse or whatever.

      Retention is useful (for them) because they can look for new patterns they don't have 'settings' for yet. They can also pigeon-hole you into new categories that they can sell to advertisers.

  • How's that again? (Score:4, Insightful)

    by 93 Escort Wagon ( 326346 ) on Tuesday September 19, 2017 @10:47PM (#55229235)

    What has data retention got to do with search results? Advertising is why they want to hold onto all your data.

    • by GuB-42 ( 2483988 )

      Both search results and advertising work the same : try to find the most relevant site for you. The fundamental difference is that one is paid and the other is not.
      And in both cases short term data retention definitely helps. Long term may give a marginal improvement. One area where long term may help is with periodic tasks. For example if you are doing your taxes, remember what you did the year before may be helpful for both you (ex: you found a great site listing deductibles) and advertisers (ex: you cons

      • Both search results and advertising work the same : try to find the most relevant site for you.

        And I really, really wish they'd stop doing that for both search results and advertising. It works poorly for both and entails a loss of privacy.

      • For example if you are doing your taxes, remember what you did the year before may be helpful for both you (ex: you found a great site listing deductibles) and advertisers (ex: you considered hiring an accountant).

        If I've found a useful site I may want to use in the future - I bookmark it.

        And, unless that "great site" was #1 on my initial search for information... I probably clicked on the links which were presented above it in the Google search results. So it seems unlikely Google is going to know that result #3 was actually the one I preferred rather than result #1 or #2.

    • Google's business is selling exposure to advertisers. Advertisers are, basically, nuts. Nonetheless, advertisers are nutcases with money burning holes in their pockets. (Why would anyone give real money to marketing people?) It doesn't matter if Google's vast trove of data has any real value in matching advertisers to potential buyers. (I'm guessing that it mostly doesn't).

      As long as the advertisers believe it is effective, it works.

  • For me, there were two articles from the 80's that I remember in either Popular Science or Popular Mechanics that were relevant to /. stories in the past couple months. Unfortunately, not even the respective websites could be of any help. I would have really gotten a kick out of reading both articles, but it wasn't to be.

    It comes down to information quality. Most forum answers to a question have a five-year or less value, and while an archive of my travel stories from a couple decades ago might be fun for

    • "the noise" is a failing of the search engine.

      Back in the day, Altavista or Yahoo or whoever used to show you a glorified 'grep' of the Internet. That ended up being a pretty poor experience because fledgling SEO hacks were promoting irrelevant content over more useful stuff. Then Google showed up, and did a far better job of it with their PageRank algorithm. Nowadays we're back to the 'noise' era of old, with a much bigger Internet and far more well funded, well motivated 'SEO hackers'. We need a new algo

  • A 12 month warranty, with the 13 month retention you can bet they'll be looking for the same product once it goes kaput shortly after the warranty expires... another brick in the wall...
  • Logging your searches has nothing whatsoever to do with improving the quality of search results because Google, Bing, and Yahoo don't give a damn about YOU, you're just a farm animal that produces data that they sell to so-called 'partner companies' that turn around and shove ads in your feed-box, expecting you to gobble them up, then defecate money that businesses scoop up to put in their pockets. I'm only half-surprised that they don't claim rights to our corpses when we die so they can sell our organs an
  • Data retention does improve the only thing they care about: monetization.

  • So I'm confused... the study is about "search quality", but I don't understand how they define that term. They were looking at search engines that changed their retention policy. They evaluated search quality before and after. That part sounds good.

    It seems that they counted the number of users coming from search engine A and landing at site B before and after. Can anyone explain how that's an indicator of search quality? Perhaps they want to measure if the search engine lost or gained users?
