If you're trying to have a pissing contest to prove you know how to spoof the system and create bad data - congratulations - you win.
But with over 8 million "hits", any statistician will tell you that the data is statistically significant within some margin of error. Any researcher worth their salt will take bad data and outliers into consideration.
The overwhelming majority (easily > 99%) of the hits are valid. Most people who use my website (i.e. the general public, not slashdotters or technical people) don't know an IP address from a street address, and don't know cookies from brownies. I know my demographics very, very well - I know the types of people that participate on the site - YOU DO NOT. For instance, do you know how many times I've gotten an email from a user with the question "Where's the zip code on the dollar bill?" or had someone sign on as "email@example.com" or "hotmial.com"? A LOT OF THEM. We're not dealing with technical people; I'm usually dealing with people who can barely figure out how to log on and send email.
The very few people who do attempt to spoof the system are *usually* detected. I've gotten very good at detecting the patterns of abuse after doing this for eight years. Do some slip bogus entries through? Sure they do, and that's why anyone using this dataset would take that into consideration.