
Comment Hope you have friends inside (Score 5, Insightful) 341

I've been in IT at a major corp and had a supplier that I worked with personally come to me due to non-payment. I had to go pretty far up my chain of command before I found someone who would apply pressure to finance to pay up on the contract that they had signed and approved. Had I not been there to facilitate, it would have taken even longer, if they got paid at all. The supplier was international, so they got the runaround. Wish I had a better answer, but finance depts sometimes like to collect interest on their bank accounts even at the expense of the company's reputation.

Comment Re:Killing? (Score 1) 133

As far as I know it's not an "instead of" type thing. The site that they have now is basically an online video-on-demand service for existing subscribers. Previously it was only accessible via a computer, so for those of us without their rental cable boxes (go TiVo!) this is the first chance to have easy access to their VoD solution on the TV without running HDMI to a computer.

Nothing at all about killing cable, just more features for subscribers.

Comment Re:Sphinx or Lucene (Score 1) 134

The standard /. IANAL applies here, but I'm pretty sure that if you have legal access to the copyrighted text (i.e., you or someone you know owns a copy of the magazine) then it is OK to create a derivative work for the purposes of searching that work. This is the loophole that Google (name your favorite search engine here) uses, and they go so far as to offer cached versions of some sites.

Lucene, or a friendlier wrapper around it like Solr, has the option of building a search index from the original text in such a way that the original content cannot be extracted from the index (indexed=true, stored=false on a field), so that would seem to cover the case of finding an article without violating the rights of the author or the publisher.
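To make the indexed-but-not-stored idea concrete, here is a rough sketch in plain Python. It is not Lucene's or Solr's API, just a toy inverted index that I'm assuming is close enough in spirit: it keeps term-to-document postings plus a citation for each document, and throws the article text away after tokenizing. The class name, method names, and sample citations are all made up for illustration.

```python
import re
from collections import defaultdict


class IndexOnlySearch:
    """Toy inverted index: keeps term -> document-id postings, never the text."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.citations = {}               # doc id -> citation (the only stored data)

    @staticmethod
    def tokenize(text):
        # Lowercase and split into alphanumeric runs; a real analyzer does far more.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, citation, body):
        # Index the body's terms, then let the body go out of scope unsaved.
        self.citations[doc_id] = citation
        for term in self.tokenize(body):
            self.postings[term].add(doc_id)

    def search(self, query):
        # Return citations of documents containing every query term (AND search).
        terms = self.tokenize(query)
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(self.citations[i] for i in ids)


# Hypothetical sample data, not taken from any real magazine archive.
index = IndexOnlySearch()
index.add(1, "Issue 12, p. 4", "Building a search index with Lucene")
index.add(2, "Issue 13, p. 9", "Scraping archives for a search project")
print(index.search("search index"))  # ['Issue 12, p. 4']
```

A search tells you which issue and page to pull off the shelf, but the index only holds unordered sets of terms per document, so the article itself can't be read back out of it. A real Lucene index holds more than this sketch does, so check what your field settings actually keep before relying on that property.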

As for not having the text online, I'd suggest scraping the archive sites as part of building your search index; it's pretty hard to search something that isn't digitized.

Best of luck, as this sounds like a worthwhile project. I do think that the volume of data you're discussing would fit easily in a Solr instance that would need only very modest server resources to operate.
