Submission + - Theologian attempts censorship after losing public

RockDoctor writes: Theologian John Haught publicly debated prominent evolutionary scientist and atheist Jerry Coyne at the University of Kentucky back in October. Before the debate, both parties agreed to the debate being video-taped. Coyne is of the opinion that he convincingly won the debate over Haught. But we'll never know, because Haught, with the assistance of staff at University of Kentucky who sponsored the debate, is banning publication of the video of the event. They are even refusing to release the half of the debate containing Coyne's comments and questions, which is his intellectual property. And that latter is theft, plain and simple, in addition to Haught's cowardice.

Submission + - Collective Intelligence

aceydacey writes: "The web is the largest collection of information known, and it is open for anyone to analyze. So, while "hard" AGI (Artificial General Intelligence) may be in a three-decade-long slump, using the web as a vast database for data mining, analysis and machine learning is a fruitful field of endeavor. In addition, the web makes millions of users available to give rapid feedback to ingenious programmers. The book "Programming Collective Intelligence" is aimed squarely at using this collected data and feedback from the web to make intelligent applications.

I acquired the book when it first came out in October 2007. After reading the first third of it and scanning the rest, I "loaned" it out to my computer scientist nephew but soon began to miss it. Upon acquiring my second copy of the book, I began a more thorough reading and used tidbits of it in my own web scripting exploits. Thus, in contrast to my usual habit of devouring good books in a wild weekend orgy of reading, I have slowly absorbed the book's contents over a one year time period, and I find that to be a valuable way to use the book. It is a useful hands-on reference for active web programming.

The book covers many varieties of web data mining and analysis, and provides actual, working code to support each subject. The code is all in Python, which is especially useful to Python hackers, but is also quite readable and understandable to just about any programmer.

Many of the algorithms and techniques covered are accompanied by an open source tool for doing the analysis, which makes it very easy to experiment with the coding techniques. There are also discussions of open APIs for various web sites, such as Yahoo, Delicious, and MySpace. At the end of the day, however, I find the most useful tool to be Beautiful Soup, the Python library for screen scraping where no standard API is offered. It is covered along with the Python Imaging Library, pysqlite, NumPy, matplotlib, and Mark Pilgrim's Universal Feed Parser.

Author Toby Segaran states in the preface that no specialized mathematical knowledge is required, and he does a good job of explaining every concept he introduces. I have an undergraduate degree in math, and to my eye the coverage of statistical analysis is really the heart of the book. Mathematical and statistical algorithms covered include Euclidean distance, Pearson correlation coefficients, weighted means, Tanimoto coefficients, conditional probabilities, Gini impurity, entropy, variance, Gaussian functions, and dot products. I believe the math is accessible to most programmers, but YMMV. Do not buy the book unless you are interested in statistical analysis.
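To give a flavor of the math listed above, here is a minimal sketch of three of those measures in plain Python. It is not the book's code: the function names and the idea of storing each person's ratings as a dict of item to score are assumptions made for illustration.

```python
from math import sqrt

def euclidean_similarity(a, b):
    """Return 1 / (1 + Euclidean distance) over the items both dicts rate."""
    shared = [k for k in a if k in b]
    if not shared:
        return 0.0
    dist = sqrt(sum((a[k] - b[k]) ** 2 for k in shared))
    return 1.0 / (1.0 + dist)

def pearson(a, b):
    """Return the Pearson correlation coefficient over shared items."""
    shared = [k for k in a if k in b]
    n = len(shared)
    if n == 0:
        return 0.0
    xs = [a[k] for k in shared]
    ys = [b[k] for k in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for constant ratings
    return cov / sqrt(var_x * var_y)

def tanimoto(a, b):
    """Return |A intersect B| / |A union B| for two collections of items."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

# Hypothetical data: the second person rates everything twice as high.
low = {"x": 1, "y": 2, "z": 3}
high = {"x": 2, "y": 4, "z": 6}
print(pearson(low, high))
print(euclidean_similarity(low, high))
print(tanimoto(["a", "b", "c"], ["b", "c", "d"]))
```

Pearson correlation ignores the fact that one person consistently rates higher than the other, while the distance-based score penalizes it. That difference is the usual reason to prefer one measure over the other.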

Each chapter explores a different way of analyzing and using data collected from the web to solve a particular kind of problem. One chapter explores web crawlers and page rankings and includes a version of Google's PageRank. Another chapter shows how to make a site that recommends movies to a user based on how similar their likes and dislikes are to those in a vast database of other users' likes and dislikes, culled through that site's standard API. Other kinds of web apps covered include the use of Google Maps and the analysis of word frequency in textual material. There is also coverage of spam filtering software.
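The movie recommendation idea described above can be sketched in a few lines: score each unseen item by the average of other users' ratings, weighted by how similar each user is to you. This is only an illustration under stated assumptions, not the book's implementation. The names `similarity` and `recommend`, the distance-based similarity, and the tiny ratings table are all hypothetical.

```python
from math import sqrt

def similarity(a, b):
    """Distance-based similarity in (0, 1] over items both users rated."""
    shared = [item for item in a if item in b]
    if not shared:
        return 0.0
    dist = sqrt(sum((a[i] - b[i]) ** 2 for i in shared))
    return 1.0 / (1.0 + dist)

def recommend(ratings, user):
    """Rank items the user has not rated by similarity-weighted mean score."""
    totals = {}   # item -> sum of (similarity * rating)
    weights = {}  # item -> sum of similarities
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, score in other_ratings.items():
            if item in ratings[user]:
                continue  # only suggest items the user has not rated yet
            totals[item] = totals.get(item, 0.0) + sim * score
            weights[item] = weights.get(item, 0.0) + sim
    ranked = [(totals[item] / weights[item], item) for item in totals]
    ranked.sort(reverse=True)
    return ranked

# Hypothetical ratings: bob agrees with ann, cat disagrees with her.
ratings = {
    "ann": {"A": 5, "B": 1},
    "bob": {"A": 5, "B": 1, "C": 4, "D": 2},
    "cat": {"A": 1, "B": 5, "C": 1, "D": 5},
}
for score, item in recommend(ratings, "ann"):
    print(item, round(score, 2))
```

Because bob's tastes match ann's far better than cat's do, his ratings dominate the weighted mean, so the item he liked ranks above the item he did not.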

Algorithms that are given extensive coverage include Bayesian classifiers, decision tree classifiers, neural networks, genetic programming and genetic algorithms, Support Vector Machines, k-nearest neighbors, clustering, hierarchical clustering, k-means clustering, multidimensional scaling, non-negative matrix factorization, optimization, cost functions, and simulated annealing. By now you should be getting the idea that this is a very technical book, but don't let that scare you off. If you are interested in this sort of thing, then this book will walk you through it and help you use the working code the book freely offers without trying to make you a world class expert in each and every technique, which of course would be impossible in a book that summarizes such a broad area.

Web sites, blog entries, and similar data sets can be analyzed to find similarities and differences in many different ways, and filtered to pick out the significant features of the data from all the noise. One section of the book goes through political blogs to find these kinds of trends in the data. Another chapter analyzes stock market data from Wall Street to find trends and tendencies.
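Of the optimization techniques named in the list of algorithms, simulated annealing is probably the quickest to sketch: take a random step, always accept it if the cost drops, and accept a worse step with a probability that shrinks as the "temperature" cools. The version below is a generic illustration, not the book's code. The function name `anneal`, its parameters, and the toy cost function are assumptions.

```python
import math
import random

def anneal(cost, start, step=1.0, temp=10.0, cooling=0.95,
           min_temp=0.01, rng=None):
    """Minimize cost(vector) by simulated annealing; return (best, best_cost)."""
    rng = rng or random.Random()
    current = list(start)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    while temp > min_temp:
        # Perturb one randomly chosen coordinate.
        i = rng.randrange(len(current))
        candidate = list(current)
        candidate[i] += rng.uniform(-step, step)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / temp), which falls as temp cools.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temp *= cooling
    return best, best_cost

# Toy cost function (hypothetical): squared distance from the point (3, 3).
def toy_cost(v):
    return sum((x - 3) ** 2 for x in v)

best, best_cost = anneal(toy_cost, [0.0, 0.0], rng=random.Random(42))
print(best, best_cost)
```

Accepting the occasional worse move is what lets the search climb out of a local minimum early on, when the temperature is still high.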

Genetic programming and genetic algorithms are covered, and working code is introduced that can be used to play around with genetic programming in all sorts of situations. This is a real strength of the book: it walks you through the creation of object oriented, working code that you can apply to problems and applications completely unrelated to the specific problems explored in the book. For instance, the file created to use genetic algorithms is applied in the book to a simple AI game and also to analyzing a specific mathematical function. However, the classes and functions in that file can be used equally well to apply genetic programming to other subjects. The same is true for the book's coverage of neural networks and other techniques. The book creates jumping off points from which you can hack and explore in endless directions and for endless hours. I would not be surprised if the book starts a few intrepid souls down pathways that lead to significant new applications. For most of us, it will be an educational exercise that at least allows us to better appreciate the creations of those intrepid few.
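As a rough idea of what such reusable code looks like, here is a minimal sketch of a plain genetic algorithm that minimizes any cost function over a list of numbers. It is not the book's file and it is simpler than genetic programming, which evolves whole program trees rather than number lists. The name `evolve` and all of its parameters are assumptions made for this illustration.

```python
import random

def evolve(cost, length, low, high, pop_size=30, elite=0.2,
           mutation_rate=0.2, generations=50, rng=None):
    """Evolve a list of `length` floats in [low, high] that minimizes cost."""
    rng = rng or random.Random()
    population = [[rng.uniform(low, high) for _ in range(length)]
                  for _ in range(pop_size)]
    n_elite = max(2, int(elite * pop_size))
    for _ in range(generations):
        # Keep the best individuals unchanged (elitism).
        population.sort(key=cost)
        survivors = population[:n_elite]
        children = []
        while len(survivors) + len(children) < pop_size:
            if rng.random() < mutation_rate:
                # Mutation: nudge one gene of a random survivor.
                child = list(rng.choice(survivors))
                i = rng.randrange(length)
                child[i] = min(high, max(low, child[i] + rng.uniform(-1, 1)))
            else:
                # Crossover: splice the genes of two distinct survivors.
                p1, p2 = rng.sample(survivors, 2)
                cut = rng.randrange(1, length) if length > 1 else 0
                child = p1[:cut] + p2[cut:]
            children.append(child)
        population = survivors + children
    return min(population, key=cost)

# Toy problem (hypothetical): find a vector close to the origin.
def toy_cost(v):
    return sum(x * x for x in v)

best = evolve(toy_cost, length=3, low=-10, high=10, rng=random.Random(0))
print(best, toy_cost(best))
```

Nothing in `evolve` knows about the toy problem; swapping in a different cost function is all it takes to point the same loop at another subject, which is the kind of reuse praised above.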

I think the book is accessible to any programmer who is not afraid of a little math. If you are a Python programmer who is interested in Web 2.0 type applications, you really can't live without this book, so to speak. It was published in late 2007 and will stay fresh for at least a decade because the kind of statistics and mathematical analysis covered do not change over time. Even if you don't actually code Web 2.0 type apps, it is interesting to see how this stuff works behind the scenes, so reading the book is highly educational. For me, the book will be a source of scripting fun for many years. It is a book that I will turn to again and again. If you've ever wondered "how does Amazon make recommendations of books I might like?" then this is the book to answer your question.

For more about Python, try my Python411 podcast series."


Submission + - What to build into a new home?

OldButNotWise writes: I'm lucky enough to be planning to build a new home. Houses last decades, while technology changes drastically in just a few years. So what should I be planning to put into my new house? Wire for central DC power? Count on wireless, or bring a net connection to each room? What about entertainment? Central music/video server, wired how? Network controllable heating? What about security systems? If you were in my shoes, what would you build today to handle your needs in 10 or 20 years?

Submission + - IP Meets Physical Reality - Next Stop for Google?

An anonymous reader writes: While Google is blurring the line between the web and the desktop, a much, much smaller project is blurring the border between the Internet and physical reality: the newly released Contiki operating system version 2.2.1. Contiki runs on networked wireless sensors that are used for anything from road tunnel monitoring for fire rescue operations to collecting vital statistics from ice hockey players. These sensors typically have as little as a few kilobytes of memory and a few milliwatts of power budget, a thousandth of the resources of a typical PC, yet Contiki provides them with full TCP/IP connectivity. Meanwhile, San Francisco is monitoring parking spaces with wireless technology. If IP can run on anything and the Internet is about to reach out into physical reality, what happens when Google gets involved?

Submission + - Musical Taste Linked to Personality

Hugh Pickens writes: "An extensive psychological survey of more than 36,000 music lovers confirms that our musical tastes really do reflect our personality. The research, by the department of psychology at Heriot-Watt University in Edinburgh, asked people worldwide to describe their personality and then to list their favorite musical genres. The results showed a distinct correlation between people's personality traits and the style of music they enjoy. One of the study's most interesting discoveries was the similarity between the personalities of fans of classical music and heavy metal, both of whom are creative and at ease but not outgoing. Fans of indie music were found to have low self-esteem and little motivation, but described themselves as creative. Rap enthusiasts, on the other hand, tend to think a lot of themselves and are extremely outgoing. Professor Adrian North, who led the study, said: "People often define their sense of identity through their musical taste, wearing particular clothes, going to certain pubs, and using certain types of slang. It's not so surprising that personality should also be related to musical preference.""

Submission + - US DoD poll on leap seconds

@10u8 writes: "For time scales to leap or not to leap has been the question here before. The ITU-R will be considering leap seconds again in a few weeks. This week the USNO posted a survey about leap seconds by the US DoD. The issue has civil implications as well as technical ones, and there is a demonstrated way to respect the history, remove leaps from navigation and POSIX time, yet keep the sun overhead at noon."

Submission + - NetBSD gets a journaling FFS

jschauma writes: NetBSD's Simon Burge has added metadata journaling to the FFS (fast file system) code in NetBSD-current. The journaling code, known as WAPBL (Write Ahead Physical Block Logging), was originally written by Darrin B. Jewell for Wasabi Systems, Inc., and was contributed by Wasabi to the NetBSD community earlier this year. Wasabi has been shipping WAPBL-enabled products since 2003.

Submission + - Particle accelerator uncovers hidden paintings

Chatsubo writes: Paintings that were painted over by Vincent van Gogh are being uncovered by science. From the article: 'Dik and Janssens used high-intensity X-rays from a particle accelerator in Hamburg, Germany to compile a two-dimensional map of the metallic atoms on the painting beneath "Patch of Grass," which is part of the large Van Gogh collection in the Kröller-Müller Museum in the Netherlands. Knowing that mercury atoms were part of a red pigment and the antimony atoms were part of a yellow pigment, they were able to chart those colors in the underlying image.'

Submission + - OpenBSD and Linux code sharing issue resolved

An anonymous reader writes: According to an undeadly article: "All the copyright holders of the Linux ath5k-driver code, derived from ar5k, have been contacted and have agreed to license their changes under the ISC license, thus allowing improvements to be re-incorporated into OpenBSD." So, after much drama, much of it documented on Slashdot, it looks like OpenBSD will be able to benefit from Linux-specific changes to its Atheros code after all.

Submission + - OpenBSD patched BIND9 10 years ago

juct writes: "News from the "they-could-have-known-better-department": As heise Security reports, OpenBSD changed the buggy implementation of the pseudo random number generator before switching from BIND 8 to BIND 9 back in 1997. So OpenBSD was not affected by the recent cache poisoning problem in BIND 9. According to Theo de Raadt, the OpenBSD team even told ISC that their PRNG was flawed, but "they didn't listen"."