Comment Re:Here is my clever idea... (Score 1) 86

Try explaining that to the legacy mainstream media dinosaurs that are still busy taking Google to court for spidering, indexing, and linking to their content, despite the debacle of Spain a few years back, and see how far it gets you. Common sense is in short supply in some corners of the Internet, and fairly large corners at that.

Comment Re:Cautiously saying yes to this (Score 1) 86

I think the law of averages would take care of that. Bandwidth is pretty cheap, and even if you are constrained by it, as might be the case for a smaller site on an "xGB/day" hosting plan, chances are there won't be many GB of content to spider in the first place. There are always exceptions though, and where there is a real problem there are still going to be workarounds, e.g. explicit opt-out clauses for spiders like IA's or, if all else fails, denying access based on User-Agent strings.
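
For the User-Agent fallback, here's a minimal sketch of what that looks like server-side, assuming a Python/Flask app purely for illustration (the only real token here is "ia_archiver", IA's documented crawler name; everything else is made up):

from flask import Flask, abort, request

app = Flask(__name__)

# Illustrative blocklist: "ia_archiver" is the IA crawler's User-Agent
# token; add whatever other spiders you want to refuse.
BLOCKED_AGENTS = ("ia_archiver",)

@app.before_request
def deny_blocked_spiders():
    agent = request.headers.get("User-Agent", "")
    if any(token in agent for token in BLOCKED_AGENTS):
        abort(403)  # refuse to serve the request at all

In practice you'd normally do this in the web server or CDN config rather than the application, but the principle is the same: match the string, return an error.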

It does clearly depend on what effect this might have on the value of "everyone" though. Spidering (for legit purposes and otherwise) is mostly just background noise at present; the real bad actors - cyber criminals - already ignore robots.txt, and not every good actor would significantly benefit from ignoring it. The only real reasons a good actor might have for ignoring it are better archiving (as with IA's proposals) or more complete search engine indices, but if the reason the content is excluded via robots.txt is that it is highly dynamic, transient, or just fodder for bad robots, then it's of minimal value to search engines anyway. Even if some (or all) of the search engines were to follow IA's lead on this, I think they'd still be looking at balancing that with more intelligence in their spidering, just to avoid the risk of cluttering up their databases with broken links and expired data, and that's likely to limit the bandwidth requirements considerably.

Comment Re:yeah (Score 3, Informative) 86

IA does still spider, but they seem to use a more nuanced system than the rudimentary "start at /, then recursively follow every link" approach used by more trivial site spidering algorithms. Firstly, they don't download an entire site in one go - they spread things out over time to avoid putting large spikes into the traffic pattern, which is friendlier to sites that are bandwidth-limited or on things like "xGB/month" plans. Secondly, they have a "popularity weighting" system that governs the order in which they spider and refresh sections of a given site, which is the main reason for the difference between the level of content for popular and less popular sites - although I have no idea whether that's based entirely on something like the site's Alexa ranking or is also weighted by how dynamic the content is (e.g. a highly dynamic site like Slashdot would get a bump up the priority, whereas a mostly static reference site might get downgraded). Combine the two approaches and you get the results you are seeing: major web homepages get spidered more or less every day with several levels of links retrieved, while some random personal blog only gets spidered every few weeks or more, with only the homepage and the first level or two of links ever getting looked at.
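
Purely as a guess at the shape of that scheduling (this is not IA's actual code; the weighting, depths and URLs are made-up placeholders), something like a popularity-weighted priority queue would do it:

import heapq

def crawl_priority(popularity_rank, is_dynamic):
    # Lower score = crawled sooner. A hypothetical blend of a
    # popularity rank (Alexa-style) and how dynamic the content is.
    score = float(popularity_rank)
    if is_dynamic:
        score *= 0.5  # bump highly dynamic sites up the queue
    return score

# (priority, url, crawl_depth): popular sites get deeper, more frequent crawls
queue = [
    (crawl_priority(1, True), "https://slashdot.org/", 4),
    (crawl_priority(500000, False), "https://example-blog.invalid/", 1),
]
heapq.heapify(queue)

while queue:
    priority, url, depth = heapq.heappop(queue)
    print(f"crawl {url} to depth {depth} (priority {priority:.1f})")
    # a real archiver would fetch here, then re-queue the site with a
    # "not before" timestamp days or weeks out to spread the load over time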

Comment Re:yeah (Score 4, Informative) 86

An even more specific robots.txt directive for this instance:

User-agent: ia_archiver
Disallow: /


As is often the case, Lauren is going off half-cocked with only part of the story. The IA already has a policy for removal requests (email info@), and is only considering extending its current practice of ignoring robots.txt - so far applied without problems to a "test zone" of .gov and .mil domains - to the rest of the web. They probably will do that (and for their archival purposes it's a good idea in principle), but I think it's only fair to see whether they listen to the feedback and provide a specific opt-out policy and technical mechanism, such as honoring at least one of the directives above, before they go live on the rest of the Internet and before anyone starts to scream and shout. It's going to be a two-way street anyway: they're going to find a lot more sites that feed multiple MB of pseudo-random crap to spiders that ignore robots.txt, to do things like poison spammers' address lists, so it's actually in their best interests to provide an opt-out they honor.
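
If they do keep honoring an explicit ia_archiver opt-out, checking it is trivial; here's a sketch using Python's standard-library robots.txt parser (the site URL is just a placeholder):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder site
rp.read()

# False if the site carries the "User-agent: ia_archiver / Disallow: /" block above
print(rp.can_fetch("ia_archiver", "https://example.com/some/page"))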

Besides, it's going to be interesting to see what kind of idiotic crap web admins who should know better think is safely hidden and/or secured because of robots.txt - it's useful to know who is particularly clueless so you can avoid them at all costs. :)

Submission + - Devuan Jessie 1.0.0 stable release candidate announced (devuan.org)

jaromil writes: Devuan 1.0.0-RC has been announced, following its beta 2 release last year. The Debian fork that spawned from the systemd controversy is reaching stability and plans long-term support. Devuan deploys an innovative continuous integration setup: with fallback on Debian packages, it overlays its own modifications and then uses the merged source repository to ship images for 11 ARM targets, desktop and minimal live images, Vagrant and QEMU virtual machines, and the classic installer ISOs. The release announcement contains several links to projects that have already adopted this distribution as a base OS.

Submission + - EFF Says Google Chromebooks Are Still Spying on Students (softpedia.com)

schwit1 writes: In the two years since a formal complaint was made against Google, not much has changed in the way the company handles student data.

Google still hasn't shed its "bad guy" clothes when it comes to the data it collects on underage students. In fact, the Electronic Frontier Foundation says the company continues to massively collect and store information on children without their consent or their parents'. Not even school administrators fully understand the extent of this operation, the EFF says.

According to the latest status report from the EFF, Google is still up to no good, eroding students' privacy without their parents' notice or consent and "without a real choice to opt out." This, they say, is done via the Chromebooks Google is selling to schools across the United States.

Submission + - 107 Cancer Papers Retracted Due To Peer Review Fraud (arstechnica.com)

An anonymous reader writes: The journal Tumor Biology is retracting 107 research papers after discovering that the authors faked the peer review process. This isn’t the journal’s first rodeo. Late last year, 58 papers were retracted from seven different journals; 25 came from Tumor Biology for the same reason. It’s possible to fake peer review because authors are often asked to suggest potential reviewers for their own papers. This is done because research subjects are often blindingly niche; a researcher working in a sub-sub-field may be more aware than the journal editor of who is best-placed to assess the work. But some journals go further and request, or allow, authors to submit the contact details of these potential reviewers. If the editor isn’t aware of the potential for a scam, they then merrily send the requests for review out to fake e-mail addresses, often using the names of actual researchers. And at the other end of the fake e-mail address is someone who’s in on the game and happy to send in a friendly review. This most recent avalanche of fake-reviewed papers was discovered because of extra screening at the journal. According to an official statement from Springer, the company that published Tumor Biology until this year, “the decision was made to screen new papers before they are released to production.” The extra screening turned up the names of fake reviewers that hadn’t previously been detected, and “in order to clean up our scientific records, we will now start retracting these affected articles... Springer will continue to proactively investigate these issues.”

 

Submission + - Britain Set For First Coal-Free Day Since Industrial Revolution (theguardian.com)

An anonymous reader writes: The UK is set to have its first ever working day without coal power generation since the Industrial Revolution, according to the National Grid. The control room tweeted the predicted milestone on Friday, adding that it is also set to be the first 24-hour coal-free period in Britain. The UK has had shorter coal-free periods in 2016, as gas and renewables such as wind and solar play an increasing role in the power mix. The longest continuous period until now was 19 hours – first achieved on a weekend last May, and matched on Thursday. Hannah Martin, head of energy at Greenpeace UK, said: “The first day without coal in Britain since the Industrial Revolution marks a watershed in the energy transition. A decade ago, a day without coal would have been unimaginable, and in 10 years’ time our energy system will have radically transformed again." Britain became the first country to use coal for electricity when Thomas Edison opened the Holborn Viaduct power station in London in 1882. It was reported in the Observer at the time that “a hundredweight of coal properly used will yield 50 horse power for an hour.” And that each horse power “will supply at least a light equivalent to 150 candles”.

Submission + - Ask Slashdot: How Do You Explain "Don't Improve My Software Syndrome" or DIMSS? 7

dryriver writes: I am someone who likes to post improvement suggestions on the internet for the different software tools I use. If I see a function in a piece of software that doesn't work well for me, or could work better for everyone else, I immediately post suggestions as to how that function could be improved and made to work better for everybody. A striking phenomenon I have come across in posting such suggestions is the sheer number of "why would you want that at all", "nobody needs that", or "the software is fine as it is" type responses from software users. What is particularly puzzling is that it's not the developers of the software rejecting the suggestions; it's users of the software who often react sourly to improvement suggestions that could, if implemented well, benefit a lot of people using the software in question. I have observed this happening online for years, even for really good software feature/function improvement ideas that actually wound up being implemented. My question is: what causes this behavior in software users on the internet? Why would a software user see a suggestion that would very likely benefit many other users of the software and object loudly to that suggestion, or even pretend that "the suggestion is a bad one"?

Submission + - Russia is better at encouraging women into tech?

randomErr writes: A new study from Microsoft, based on interviews with 11,500 girls and young women across Europe, finds their interest in engineering or technology subjects drops dramatically at age 15. The reasons found are that girls follow gender stereotypes, have few female role models, and face peer pressure and a lack of encouragement from parents and teachers. Russia is different. According to Unesco, women make up 29% of those in science research worldwide, compared with 41% in Russia. In the UK, about 4% of inventors are women, whereas the figure is 15% in Russia. Russian girls view STEM far more positively, with their interest starting earlier and lasting longer, says Julian Lambertin, managing director at KRC Research, the firm that oversaw the Microsoft interviews.

Submission + - Light Sail propulsion could reach Sirius sooner than Alpha Centauri (arxiv.org)

RockDoctor writes: A recent proposition to launch probes to other star systems, driven by lasers which remain in the Solar system, has garnered considerable attention. But recently published work suggests that there are unexpected complexities to the system.

One would think that the closest star systems would be the easiest to reach. But unless you are content with a fly-by examination of the star system, with much reduced science returns, you will need to decelerate the probe at the far end, without any infrastructure to assist with the braking.

By combining both light-pressure braking and gravitational slingshots, a team of German, French and Chilean astronomers finds that the brightness of the destination star can significantly increase deceleration, and thus reduce travel time (because higher flight velocities can be used). Sling-shotting around a companion star to lengthen deceleration times can help shed flight velocity to allow capture into a stable orbit.

The binary stars Alpha Centauri A and B, 4.37 light years distant, could be reached in 75 years from Earth. Covering the further 0.24 light year distance to Proxima Centauri depends on arriving at the correct relative orientation of Alpha Centauri A and B in their mutual 80-year orbit for the slingshot to work. Without a companion star, Proxima Centauri can only absorb a final leg velocity of about 1,280 km/s, so that leg of the trip would take an additional 46 years.

Using the same performance characteristics for the light sail, the corresponding duration for an approach to the Sirius system, almost twice as far away (8.58 light years), is a mere 68.9 years, making it (and its white dwarf companion) possibly a more attractive target.
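
As a sanity check on those figures, the implied average cruise speeds follow from simple division (a rough back-of-envelope only; it ignores the acceleration and deceleration phases modelled in the actual paper):

# Implied average speeds as a fraction of c, from the quoted figures.
targets = {
    "Alpha Centauri A/B": (4.37, 75.0),   # (light years, years)
    "Sirius A/B":         (8.58, 68.9),
}
for name, (distance_ly, years) in targets.items():
    print(f"{name}: average speed ~ {distance_ly / years:.3f} c")
# Alpha Centauri ~0.058 c, Sirius ~0.125 c: the brighter target can shed
# more arrival velocity, so a faster cruise is usable despite the distance.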

Of course, none of this addresses the question of how to get any data from there to here. Or, indeed, how to manage a project that will last longer than a working lifetime. There are also issues of aiming — the motion of the Alpha Centauri system isn't well-enough known at the moment to achieve the precise manoeuvring needed without course corrections (and so, data transmission from there to here) en route.

Submission + - Ambient Light Sensors Can Be Used to Steal Browser Data (bleepingcomputer.com)

An anonymous reader writes: Over the past decade, ambient light sensors have become quite common in smartphones, tablets, and laptops, where they are used to detect the level of surrounding light and automatically adjust a screen's intensity to optimize battery consumption... and other stuff. The sensors have become so prevalent that the World Wide Web Consortium (W3C) has developed a special API that allows websites (through a browser) to interact with a device's ambient light sensors. Browsers such as Chrome and Firefox have already shipped versions of this API with their products.

According to two privacy and security experts, malicious web pages can launch attacks using this new API and collect data on users, such as URLs they visited in the past, and can extract QR codes displayed on the screen. This is possible because the light coming from the screen is picked up by these sensors. Mitigating such attacks is quite easy, as it only requires browser makers and the W3C to adjust the default frequency at which the sensors report their readings. The researchers also recommend that browser makers quantize the result by limiting the precision of the sensor output to only a few values in a preset range. The two researchers filed bug reports with both Chrome and Firefox in the hopes their recommendations will be followed.
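
The quantization idea is simple enough; here's a sketch of the kind of coarsening being suggested, in Python rather than the browser API, with bucket boundaries that are purely illustrative rather than anything the researchers actually proposed:

def quantize_lux(lux):
    # Collapse a precise lux reading into a handful of coarse levels so a
    # page can still tell "dark room" from "daylight", but can no longer
    # recover information leaked by light from the screen itself.
    levels = [(10, "dark"), (100, "dim"), (1000, "indoor"), (float("inf"), "bright")]
    for threshold, label in levels:
        if lux < threshold:
            return label

print(quantize_lux(3.7))    # -> dark
print(quantize_lux(542.0))  # -> indoor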

Comment Re:Cool (Score 2) 128

Actually, this is entirely on ARIN rather than ICANN these days, and they absolutely allow transfer of IPv4 space for money (subject to a few criteria) and have done so for some time as part of their approach to dealing with IPv4 exhaustion. There's also nothing to say that these IPs have never been used by MIT - for all we know they were previously in use but have been freed up as part of MIT's IPv6 rollout - and since Amazon needs IPv4 space for their growing cloud platforms and can clearly afford this many IPs in one go it makes sense for MIT and Amazon to do a deal rather than parcel them out piecemeal to multiple users.

IPv4 space has been a resource with a sell-by date for some time; at some point (probably still some way off) IPv6 will gain critical mass and the value of IPv4 space will plummet, but until then it's basically a game of chicken against that unpredictable deadline. You can sell now, and maybe get $10/IP (for suitably large allocations), or you can wait a bit longer and gamble on either making more money for your IP space as people get more desperate, or wiping out because IPv6 has finally taken off and demand for IPv4 space has dropped. MIT could easily have held on to the IPs for a few more years, and would likely have made a lot more as a result, but by doing a deal now they've actually helped Amazon grow their cloud and put the IPs into productive use again. Sure, MIT likely made a lot of money on the deal, but that's still better than having the IP space sitting around doing nothing at all.

Submission + - Neuroscientists offer a reality check on Facebook's "typing by brain" project (ieee.org)

the_newsbeagle writes: Yesterday Facebook announced that it's working on a "typing by brain" project, promising a non-invasive technology that can decode signals from the brain's speech center and translate them directly to text (see the video beginning at 1:18:00). What's more, Facebook exec Regina Dugan said, the technology will achieve a typing rate of 100 words per minute.

Here, a few neuroscientists are asked: Is such a thing remotely feasible? One neuroscientist points out that his team set the current speed record for brain-typing earlier this year: They enabled a paralyzed man to type 8 words per minute, and that was using an invasive brain implant that could get high-fidelity signals from neurons. To date, all non-invasive methods that read brain signals through the scalp and skull have performed much worse.

Comment Re:RTFMA (Score 5, Informative) 128

Needs an "M" in there for "misleading". MIT hasn't released the entire /8 back to ARIN; AFAICT from whois queries they've transferred a whole bunch of /16s (20+) directly over to Amazon, all of which are above the 18.145.0.0 line. Given the highly non-contiguous allocations across the upper half of the /8 range, the most likely explanation is that they've received a chunk of cash for giving Amazon all the /16s that they were not currently actively using.
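
To put a rough number on that (purely illustrative: the count is just the "20+" visible in whois, and the $10/IP figure is the ballpark mentioned in the comment above, not the actual undisclosed price):

addresses_per_16 = 2 ** 16     # 65,536 addresses in a /16
transferred_16s = 20           # "20+" /16s visible in whois
price_per_ip = 10              # USD, rough market ballpark

total = addresses_per_16 * transferred_16s
print(f"{total:,} addresses, ~${total * price_per_ip:,}")
# -> 1,310,720 addresses, ~$13,107,200 (a lower bound if it's really "20+")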
