
 




Trending Low-Volume Google Searches with Gootrude

michaelrash writes "The Google Trends project provides some visibility into how popular search terms like 'Myspace' or '2008 Election' change over time and points out relevant news articles that create jumps in search volume. This is a handy tool, but there are many search terms that Google Trends does not display any results for. Such terms (such as 'Linux Firewalls' — with the quotes) have insufficient search volumes to display graphs according to the error message that Google Trends generates. Fair enough. Google sets an internal threshold on search volume, and this threshold could be set for reasons that range anywhere from Google Trends is still experimental to Google not wanting to provide data on how it builds its massive search index for emerging search terms. Either way, I would like a way to see search term trends that Google doesn't currently make available to me. So, I've released an open source project called 'Gootrude' to do just this. For the past year Gootrude has collected a set of low-volume search terms and interfaced with Gnuplot to visualize them."

  • wow (Score:3, Insightful)

    by Gewalt ( 1200451 ) on Monday June 16, 2008 @09:09AM (#23810141)
    wow, um...congrats I think? I mean, after you get over your pat on the back, can anyone explain why this matters?
    • Re: (Score:1, Redundant)

      by Gewalt ( 1200451 )
      It's not a troll. His data is not what google trends reports, and isn't even remotely comparable to what google trends reports. In short, his results do not have any use at all. So really, can anyone explain why this matters?
  • I took the time to look through the work - looks impressive for a "hobby project".

    The only thing I feel is missing is more options to narrow the searches and statistics on geographical information.

    Does anybody have some thoughts on how reliable this tool is? And what are the terms for using (read: distributing) the data and results?

    - Jesper
    • Just did searches on all of the terms the author mentions and got a few different numbers:

      1. "iptables attack visualization" -- 19 results (~35) (close)
      2. "single packet authentication" -- 93 (1,300) -- off by more than 1 magnitude
      3. "linux firewalls attack detection" - 9290
      3a. "Linux Firewalls Attack Detection" - 9240 (~9000) (close)
      4. cipherdyne -- 85,200 (~70,000) ~off a bit
      4a. Cipherdyne -- 84,500 (~70,000)
      5. gpgdir (same)
      6. fwsnort (same)
      -------
      Note...caps vs. no caps made no difference on
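To put numbers on "close" versus "off by more than 1 magnitude", the pairs above can be compared on a log10 scale. This is only a minimal sketch: it assumes the first number in each pair is the count this commenter saw and the parenthesized number is the figure from the article, and the helper name `magnitude_gap` is made up for illustration.

```python
import math

def magnitude_gap(observed, reported):
    """Return |log10(reported / observed)|; 1.0 means a factor of ten apart."""
    return abs(math.log10(reported / observed))

# (term, count seen by the commenter, approximate parenthesized count)
pairs = [
    ("iptables attack visualization", 19, 35),
    ("single packet authentication", 93, 1300),
    ("Linux Firewalls Attack Detection", 9240, 9000),
    ("cipherdyne", 85200, 70000),
]

gaps = {term: magnitude_gap(seen, reported) for term, seen, reported in pairs}
for term, gap in gaps.items():
    print(f"{term}: {gap:.2f} orders of magnitude apart")
```

Only "single packet authentication" comes out above 1.0 on this measure; the others differ by well under half an order of magnitude.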
  • Is it only me.... (Score:5, Insightful)

    by vidarh ( 309115 ) <vidar@hokstad.com> on Monday June 16, 2008 @09:11AM (#23810165) Homepage Journal
    ... or does the author of this tool seemingly not realize that Google Trends reports volume of searches, while what he's tracking is the number of documents indexed for a search term, and that there's no basis for assuming the two are correlated in a meaningful way?
    • Re:Is it only me.... (Score:5, Interesting)

      by Gewalt ( 1200451 ) on Monday June 16, 2008 @09:15AM (#23810235)
      I find it highly unlikely that someone who can make the page in question would not be smart enough to also understand what it is that google/trend is really doing, and as such, I choose to believe instead that the author is being intentionally deceptive.
      • The perspective he seems to be taking is not so much 'what users search for' but more 'what users post about or publish' with a view to studying the correlation of a large site publishing something and then the number of other websites or pages picking it up and running with it.

        I'm pretty sure he understands what he's doing; the article summary is just a bit twisted.

        --
        Free Playstation 3, XBox 360 and Nintendo Wii [free-toys.co.uk]
      • I find it highly unlikely that someone who can make the page in question would not be smart enough to also understand what it is that google/trend is really doing, and as such, I choose to believe instead that the author is being intentionally deceptive.
        It's a trap!
    • by aleph42 ( 1082389 ) * on Monday June 16, 2008 @09:27AM (#23810403)
      Agreed, the summary is misleading, as is the comparison (from TFA) to Google Trends.

      This aside, the interest of "gootrude" is that it's not provided by Google, and so it's part of the many efforts to reverse engineer how Google comes up with its numbers.

      Specifically, it appears from TFA that the "number of results" stated by Google is a wild guess for low numbers (1,000-10,000), with very sharp variations which hint at an iterative process.

      So as I get it, it's not a tool for you and me, rather for google specialists.
  • Google Trends plots how popular a search phrase is. This mashup of Google results is not that at all. It is nothing more than a mashup of the count of pages in Google's database. It has nothing to do with how often a phrase is searched for.
  • Google Trends plots the frequency of queries, i.e. the number of times information is asked about a subject. Gootrude plots the number of pages found, or the quantity of information google can retrieve on this subject. These are completely different.
    • Many thanks for making this clear: this is also what I had fathomed from the very clear summary, but wasn't too sure.

      Well.. we might actually be the two wrong ones :)

      Al.

  • Such terms (such as "Linux Firewalls" - with the quotes) have insufficient search volumes to display graphs according to the error message that Google Trends generates.
    Try Linux Firewall [google.com] in quotes as the search term for some results.
  • Spore (Score:1, Offtopic)

    by Chemisor ( 97276 )
    Have you noticed how "spore demo" is the 77th top search? On the WHOLE INTERNET! :)
  • by molo ( 94384 )
    Google trends measures what people are searching for, while Gootrude measures how many results are in the google database for a given term. These are not even remotely the same thing.

    -molo
  • by swarsron ( 612788 ) on Monday June 16, 2008 @09:39AM (#23810545)
    Besides not being the same as google trends, this tool is not allowed by the TOS of google. Automatic querying of their services without prior permission is forbidden by google. But since it probably won't put any noticeable load on their network, they most likely won't care.
    • by Vectronic ( 1221470 ) on Monday June 16, 2008 @10:20AM (#23811105)
      Until there was an article posted on Slashdot that is.
    • Re: (Score:1, Offtopic)

      by vrmlguy ( 120854 )
      Since I'm always forgetting to log my business driving, I've got a program that uses Google maps to figure out the driving distance between various pairs of points. It uses two files, one consisting of about 250 lines like this:
      home, office, client-a, restaurant-x, client-b, home
      home, client-b, restaurant-y, client-b, home
      and the other listing street addresses for everyone. I'm sure it's a big violation of Google's ToS, but it tries to play fair: it caches the distances that
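The route-file idea described above can be sketched without touching any mapping service. This is a rough illustration only: `fake_lookup` is a stand-in with made-up distances, the helper names are hypothetical, and nothing here reflects how the commenter's actual program works beyond "sum the legs of each line and cache the pairs".

```python
from functools import lru_cache

def make_cached_lookup(lookup):
    """Wrap a (start, end) -> distance function so each pair is fetched once."""
    @lru_cache(maxsize=None)
    def cached(start, end):
        return lookup(start, end)
    return cached

def route_distance(line, distance):
    """Sum the legs of one comma-separated route line."""
    stops = [s.strip() for s in line.split(",") if s.strip()]
    return sum(distance(a, b) for a, b in zip(stops, stops[1:]))

# Stand-in for the real lookup; every leg is a made-up 5.0 units.
calls = []
def fake_lookup(start, end):
    calls.append((start, end))
    return 5.0

distance = make_cached_lookup(fake_lookup)
routes = [
    "home, office, client-a, restaurant-x, client-b, home",
    "home, client-b, restaurant-y, client-b, home",
]
totals = [route_distance(route, distance) for route in routes]
print(totals, len(calls))
```

The two sample lines contain nine legs in total, but the leg from client-b to home appears in both, so the wrapped lookup is only called eight times.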
    • Google has a relatively simple API you can apply for to allow for a fixed number of automated queries of their system. It doesn't actually give you new functionality but does make automated queries of their databases "authorized". Without the API license key, you run the risk of getting noticed by them and ban-hammered if they think you're just a bot scraping their data, something they do NOT like. I think this article just got in because it had both Google and Open Source as subjects. If they have figured a
      • Re: (Score:2, Informative)

        by swarsron ( 612788 )
        Google doesn't give out any more keys for this api, only old keys continue to work. So if you don't already have a key you're out of luck
  • to do something similar with my parody of google [librelogiciel.com] where search terms can be looked at in real time (empty or spammy search terms are replaced with fake words on display, but not in the history).
  • Everyone has already noted that this only tracks hits, not searches. I'd like to suggest a few code improvements.

    At a high level, use RRD (http://search.cpan.org/~nicolaw/RRD-Simple-1.43/lib/RRD/Simple.pm [cpan.org]) for the underlying database. RRD is used by MRTG to track time-varying data over multiple time scales, keeping details for recent data and summaries for historical data. RRD also comes with its own plotting module, although you could keep using Gnuplot if you wish.

    In the code itself, there are places wh
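The "details for recent data, summaries for historical data" idea can be illustrated with a toy structure. To be clear, this is not RRDtool or the RRD::Simple API; it is a small Python sketch of the round-robin concept, and the class name and parameters are invented for the example.

```python
from collections import deque

class TinyRoundRobin:
    """Toy two-level archive: recent raw samples plus averaged summaries."""

    def __init__(self, recent_size, step, summary_size):
        self.recent = deque(maxlen=recent_size)       # newest raw samples only
        self.summaries = deque(maxlen=summary_size)   # one average per `step` samples
        self.step = step
        self._pending = []

    def update(self, value):
        """Store a sample; emit a summary once `step` samples have accumulated."""
        self.recent.append(value)
        self._pending.append(value)
        if len(self._pending) == self.step:
            self.summaries.append(sum(self._pending) / self.step)
            self._pending = []

archive = TinyRoundRobin(recent_size=4, step=3, summary_size=2)
for count in [10, 20, 30, 40, 50, 60, 70, 80, 90]:
    archive.update(count)

print(list(archive.recent))     # the last four raw samples
print(list(archive.summaries))  # the last two three-sample averages
```

Both buffers have a fixed size, so storage stays constant no matter how long the collector runs; old detail is overwritten while coarse averages survive a little longer.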
  • This article has been on /. for almost 3 hours and "Linux Firewalls" still isn't a significant enough search query for Google Trends? Well THAT is surprising.
  • Every time I see graphs with a moving average, be it in TFA or some stock market graph, it makes me cringe. OK, the moving average isn't the best filter out there; there's a whole range of finite impulse response filters that have a more desirable frequency response than a moving average (which is convolution with a rectangle, which means its frequency response is essentially a sinc function, which means a shitload of ripples), but why on Earth don't they compensate for the delay induced by the convolution?

    Why
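The delay in question is easy to see on a small example. A trailing moving average over N samples lags the input by (N - 1) / 2 samples, so plotting each averaged point that many samples earlier lines the smoothed curve back up with the data. The sketch below uses made-up data and hypothetical helper names; it is not taken from Gootrude's plotting code.

```python
def trailing_average(values, window):
    """Average of the current sample and the window-1 samples before it."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

def peak_index(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

window = 5                     # odd, so the delay is a whole number of samples
delay = (window - 1) // 2
signal = [0, 0, 0, 0, 1, 2, 3, 2, 1, 0, 0, 0, 0]   # made-up data, peak at index 6

smoothed = trailing_average(signal, window)
# smoothed[k] is computed when input sample k + window - 1 arrives
trailing_x = [k + window - 1 for k in range(len(smoothed))]
# shifting each point back by the delay puts it at the middle of its window
centered_x = [x - delay for x in trailing_x]

k = peak_index(smoothed)
print(peak_index(signal), trailing_x[k], centered_x[k])
```

Plotted against `trailing_x`, the smoothed peak shows up two samples after the real one; plotted against `centered_x`, it sits on the same sample as the peak in the raw data.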

  • Privacy? (Score:3, Insightful)

    by Temporal ( 96070 ) on Monday June 16, 2008 @06:11PM (#23816627) Journal

    Google sets an internal threshold on search volume, and this threshold could be set for reasons that range anywhere from Google Trends is still experimental to Google not wanting to provide data on how it builds its massive search index for emerging search terms.
    Or maybe for privacy reasons? Some search queries implicitly reveal the identity of the person making them. Such queries are naturally low-volume, so refusing to show low-volume queries is an effective way to protect the privacy of the searchers.
  • I have updated my original post to address some of the comments made here on Slashdot. Peer review is always good, and thank you all for the insights.