lamaditx - Slashdot User

Submission + - Collective Intelligence in Action

Submitted by lamaditx on Sunday February 01, 2009 @09:00PM

lamaditx writes: [EDITORS PLEASE NOTE: 1. I am not a native speaker so there might be mistakes. 2. You may reach me on ICQ: 304350346 or on Skype: lamaditx if I am online. Otherwise I will check my mail at least once a day. 3. I have read the "Slashdot Book Review Guidelines" and tried my best to meet your expectations. ]

The book "Collective Intelligence in Action" shows you how to apply theory from Machine Learning, Artificial Intelligence and Data Mining to your business case. The goal is to create systems that make use of data created by groups of people, e.g. in social networks, and abstract from it to gain new or additional information. Some of you might think "yet another kind of Web 2.0 topic". That is one possible application, but the input and output formats do not matter much. You can use these methods anywhere as long as you have enough data. You will find some examples related to the latest web technologies to explain the methods, but the code is rather generic. You also won't find a lot of distracting details about HTML, HTTP and the like.

There are three main parts. The first part explains how to gather data from external sources or internal repositories. The second part, "Deriving Intelligence", explains how to analyze the collected data. This is the part where you gain information and create new knowledge. That does not help you much unless you find a way to use it in your application. The third part, which is also the shortest, shows how to use the results to build user-centric applications. This is obviously the best way to set yourself apart, no matter what kind of service you want to provide.

I have to admit that I had been waiting for such a book for some time. After studying "Artificial Intelligence: A Modern Approach" (maybe THE book about AI) I felt that I knew a lot of theory but was missing the practical side; I was not sure how to apply the concepts. One more point is worth making: several AI concepts are used in this guide, but you don't create an AI system or an agent. Don't confuse the two even though they are similar.

The "in Action" series is supposed to show how things are done in practice. You can expect a lot of Java code samples (available here) and advice. Several open source tools are introduced to enable you to build your own system. These are also Java tools. It's up to you whether you use Java or some other language. From my perspective it does not really matter which language you choose because the concepts can be implemented in other languages as well. The main drawback is that you will not be able to use the Java Data Mining API (JDM), which is used extensively.

The first chapter introduces the main terms and concepts of the book. It is available here together with chapter 2 and the source code. One thing I consider to be an important prerequisite is mathematics. Most aspects are easy to read and understand if you have some knowledge of statistics and linear algebra. On the other hand, you can still follow along with basic maths because the explanations are well written. The same holds for standard concepts and algorithms like word stemming, decision trees, Bayesian networks or k-means. These are summarized with their most important properties so that you don't need prior knowledge. You will notice that the chapter, like the following ones, ends with a large number of references.

Personally I find it hard to read formulas when they are described in words (like: take the square root of x and multiply by y) instead of in mathematical notation. You cannot look up such a formula quickly because it does not stand out from the text. It might have been better to provide each formula both in words and in mathematical notation. You will find some formulas in mathematical notation, but some are really hard to read since they are printed in a font size of about 4 while the text is set in 10.

Coming back to the content: the other sections of the first part show you how to gather data from external online sources. Of course you can apply the same concepts to offline sources or other data repositories. The key is to collect usable data to derive intelligence from later on. One example is generating tags from a number of sources and associating each tag with a weight relative to how often the tag occurs. The result is one of the well-known tag clouds.
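
To make the tag weighting concrete, here is a minimal sketch in Java. It is not taken from the book: the class name TagWeights, the method relativeWeights and the choice to normalize by the count of the most frequent tag are my own assumptions for illustration. A tag cloud would then map each weight to a font size.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not from the book: derives tag cloud weights from raw tags.
public class TagWeights {

    /**
     * Counts how often each tag occurs (case-insensitively) and divides by the
     * count of the most frequent tag, so every weight lies in (0, 1].
     * The normalization scheme is an assumption made for this sketch.
     */
    public static Map<String, Double> relativeWeights(List<String> tags) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String tag : tags) {
            counts.merge(tag.toLowerCase(), 1, Integer::sum);
        }

        int max = 0;
        for (int count : counts.values()) {
            max = Math.max(max, count);
        }

        Map<String, Double> weights = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            weights.put(entry.getKey(), entry.getValue() / (double) max);
        }
        return weights;
    }

    public static void main(String[] args) {
        // Tags as they might be collected from several sources.
        List<String> tags = Arrays.asList("java", "ai", "Java", "web", "ai", "java");
        System.out.println(relativeWeights(tags));
    }
}
```

Normalizing by the maximum keeps the most frequent tag at weight 1.0 regardless of how much data you collect; dividing by the total number of tags would be an equally valid choice.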

You will need persistent data storage such as a database for the results so that you can access them in the second step. Unsurprisingly you will find several ER diagrams that help you create the right data structure. A big plus is that the author explicitly tells you the important facts that can be derived from formulas or (ER) diagrams. Reading the text is much more convenient this way. He also explains the implications for the database design when discussing ER diagrams. You can be sure that you do not miss the important points.

The second part starts with an introduction to data mining and machine learning terminology and concepts. You are also introduced to the JDM API, which proves to be helpful later on. You may want to start looking for a substitute if you choose not to use Java. The extensive use of design patterns in almost every aspect eases the move from Java to an alternative language. You get to know the common methods and how to implement them. I consider this part to be more or less craftsmanship after all. There is some magic to it if you have never heard anything about the methods used.

The only thing that caught my eye was the calculation of the inverse of a matrix. The notation is pretty common when solving linear equations, but you should never (except in rare cases) use the plain matrix inversion operation when implementing your solution. The reason is that the effort grows roughly with the cube of the matrix dimension, and explicitly forming the inverse costs more and is numerically less stable than solving the system directly. The more data is used the larger the matrix will be and thus the longer it will take to compute the inverse. Instead one should use, e.g., LU decomposition. The footnote points you to the weka.core.matrix.Matrix class, which uses LU decomposition, but make sure about that if you use some other package or some other language. Count it as a sign of quality that I do not have additional comments on this part.
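
The point about avoiding explicit inversion can be sketched as follows. This is a minimal illustration in plain Java and neither the book's code nor Weka's implementation; the class name LuSolve and its solve method are hypothetical. It solves A x = b by LU-style elimination with partial pivoting, so the inverse of A is never formed.

```java
// Hypothetical helper, not from the book or from Weka:
// solves A x = b without computing the inverse of A.
public class LuSolve {

    /**
     * Solves A x = b for a square matrix A using LU decomposition with
     * partial pivoting. The inputs are copied and left unmodified.
     */
    public static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        double[][] lu = new double[n][];
        for (int i = 0; i < n; i++) {
            lu[i] = a[i].clone();
        }
        double[] x = b.clone();

        for (int k = 0; k < n; k++) {
            // Partial pivoting: pick the row with the largest entry in column k.
            int p = k;
            for (int i = k + 1; i < n; i++) {
                if (Math.abs(lu[i][k]) > Math.abs(lu[p][k])) {
                    p = i;
                }
            }
            if (lu[p][k] == 0.0) {
                throw new ArithmeticException("Matrix is singular");
            }
            double[] tmpRow = lu[k]; lu[k] = lu[p]; lu[p] = tmpRow;
            double tmp = x[k]; x[k] = x[p]; x[p] = tmp;

            // Eliminate below the pivot; the multipliers (the L factor) are
            // stored in the lower triangle, the U factor in the upper one.
            for (int i = k + 1; i < n; i++) {
                lu[i][k] /= lu[k][k];
                for (int j = k + 1; j < n; j++) {
                    lu[i][j] -= lu[i][k] * lu[k][j];
                }
                // Forward substitution applied to the right-hand side.
                x[i] -= lu[i][k] * x[k];
            }
        }

        // Back substitution with the upper triangular factor U.
        for (int i = n - 1; i >= 0; i--) {
            for (int j = i + 1; j < n; j++) {
                x[i] -= lu[i][j] * x[j];
            }
            x[i] /= lu[i][i];
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] a = {{2, 1}, {1, 3}};
        double[] b = {3, 5};
        System.out.println(java.util.Arrays.toString(solve(a, b)));
    }
}
```

In a real project you would call a tested linear algebra library rather than maintain this by hand; the sketch only shows that a solve routine needs no inverse.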

The last 80 pages enable you to make use of the information you gained and integrate it into your application. This is also the shortest part, but that is because the heavy lifting was done in parts one and two. Application basically means querying your data in the correct way to generate the right recommendations for your users. One part of that is searching and the other is recommending. You can imagine the effort involved if you have ever taken a look at the way search engines work. The author deals with that by using the open source search engine Nutch together with Lucene in such a way that you just use the interfaces. This approach lets the author keep the last part as short as it is and leaves it to you to imagine what you want to do with the knowledge you discovered. I understood the last part as an example of how to gain from your data, one that enables you to do the same for your specific domain on your own.

I consider "Collective Intelligence in Action" to be a very good book. It is thought through from beginning to end. Examples are not just presented to the reader but evolve step by step. You know why things are done the way they are, which enables you to change every aspect as you need to. From my point of view this is the right way to do it because a copy-and-paste solution would not get the job done. I pointed out some issues that could be done better, such as fonts that are too small in graphics or missing literature references in the text. However, these are not major problems or content errors that should be blamed on the author. Finally, I think you will gain from this book because it addresses Web 2.0 to some degree but is generic enough for other applications as well. It might be a good supplement for understanding how the main methods of artificial intelligence are applied.

Adrian Lambeck is a graduate student in "Media and Information Technologies" and uses C# more often than Java.

Submission + - Programming .NET 3.5

Submitted by lamaditx on Monday November 03, 2008 @10:48AM

lamaditx writes: "[EDITORS PLEASE NOTE: 1. I received this book through the O'Reilly Linux User Group Programme (http://ug.oreilly.com/). I was asked to write this review but did not gain any benefit except the book itself. I was not influenced by O'Reilly in any way nor did I submit this review to anybody but Slashdot. Everything I wrote is my personal opinion. 2. Also I am not a native speaker so there might be mistakes. 3. You may include my e-mail address, something like this: "adrian [dot] lambeck [at] tu-harburg [dot] de". Please do not put it online in plain text as I want to avoid spam. 4. You may reach me on ICQ: 304350346 or on Skype: lamaditx if I am online. Otherwise I will check my mail at least once a day. 5. I have read the "Slashdot Book Review Guidelines" and tried my best to meet your expectations. ]

The world of the .NET framework is taken to the next level by the release of .NET 3.5. It is quite easy to get lost in all the different terms that come with it. The authors of "Programming .NET 3.5" provide you with the "why" for each technology and explain how it works. You also get some examples that make the technologies work together so that you understand the interfaces. In the end you have a good overview and know how to accomplish basic tasks.

The table of contents is available from O'Reilly, together with a chapter preview, here. The book does not come with any extras but includes the usual free 45-day access to the book on Safari.

The intended audience of this book is experienced .NET programmers. There are no sections that explain details of C#, SQL servers or the like. I don't recommend this book if you have never worked on a .NET project and don't know how to set up a SQL database. You should be aware that the code is written in C#. You might use one of the software code converters if you prefer Visual Basic. I think the code is still readable even if you do not know C#. I appreciate that the authors decided to use only one language because it keeps the book smaller. The authors assume you are using Visual Studio 2008. You don't necessarily need to buy an upgrade if you are working with an older edition because you can use the free Express Edition to get started.

This book covers the key technologies in .NET. You might wonder about that because there are separate books on each of these technologies, such as Windows Presentation Foundation (WPF), Windows Communication Foundation (WCF), XAML, AJAX, C# and Silverlight. The key concept of the book is to show you how everything is connected. As the authors note: "Our goal is to show you the 25% that you will use 85% of the time.".
From my point of view this is good because I have a .NET 2.0 background and wanted to know what is new in .NET 3.5 and how things are connected.

The book is divided into three main parts. The first is presentation, which covers XAML, WPF and AJAX. The second describes how to take advantage of the design pattern support in .NET. The last part covers the business layer, which includes LINQ, WCF, WF and CardSpace.

The first part starts with XAML. This is the eXtensible Application Markup Language from Microsoft for describing user interfaces. Now the title of the book reads "Programming .NET 3.5" and writing XML is not quite programming; it is declarative programming. If you know ASP.NET, which splits a web page into a presentation file (HTML) and a code file (C#, VB), then you won't have much difficulty getting started with XAML. Be prepared for pages and pages of code. The good thing about this is that you can run the examples. The disadvantage is that you don't gain a lot by reading six pages of XAML layout description. The important parts are explained in the text anyway, so I usually did not read the code. I think this may depend on your skill level.

The next main topic is using WPF which is the successor of Windows Forms. The authors explain how to connect data structures to the user interface which I consider to be one of the most important points when using WPF. You will also find a lot of code and XAML layout descriptions.

The chapter on Silverlight was not very helpful to me. Silverlight is the competitor of Adobe Flash. Laying out a Silverlight application is essentially the same as laying out a WPF application, so the samples mostly dive into more details of XAML. I missed the real Silverlight message, so this part did not meet my expectations.

The third technology you will learn about is AJAX, which leads us away from the desktop client to a web client. The explanation of how AJAX works is pretty good.
The authors show you step by step how to create a to-do list web application with a database backend using ASP.NET and AJAX. Again, this does not cover all AJAX controls or all of ASP.NET, but it shows you how the parts are interconnected and assumes that if you know how to handle one control, you can figure out how to handle the others.
Most web applications need some kind of access control. At this point the authors argue that it is faster to implement your own security tables instead of using the ASP.NET forms-based controls. My opinion is that you should never do something incorrectly in order to teach something else. There are always people who get it wrong in a way you did not anticipate. My recommendation: use the ASP.NET components and do not implement them yourself.

The second part, about design patterns, surprised me because I expected the common introduction to standard design patterns. The Model-View-Controller project implements the pattern for ASP.NET and allows developers to incorporate it easily. The advantage is that you get a comprehensive and easy-to-understand introduction to how .NET supports design pattern implementation. I guess this will lead some developers from the theory of design patterns to actually implementing them.

I consider the third part to be the really interesting content. It starts with LINQ, which bridges object-oriented code and relational databases. By explaining the new concepts, the authors show you how it differs from SQL and what advantages it provides. The examples are easy to understand and successfully make their point.

Windows Communication Foundation (WCF) covers the hot Service-Oriented-Architecture (SOA) topic. The authors explain what it is all about but you will need some knowledge about Web Services and XML to really get it. The introduction is rather short but more details are explained in the corresponding example.

The chapter about Windows Workflow Foundation (WF) starts with a short example of how you implement a workflow without WF. After that you get to see how you do the same with WF. This way the necessity for WF becomes clear and you understand how to take advantage of this technology.

CardSpace is the successor of Microsoft Passport, which was not successful as an authentication service with respect to user acceptance. User acceptance is also the key issue that will decide the success of CardSpace. Maybe the improved interoperability will help.
The chapter provides you with a short authenticate-yourself test and shows you how to offer CardSpace authentication in your ASP.NET application.

The book is a good entry to the world of .NET 3.5 because it gives you an idea of every part and what it is good for. Maybe you do not need all of it for your job, but at least you know that it exists and what it is good for.
I think it is understandable that a comprehensive introduction to .NET 3.5 cannot satisfy everybody because the range of topics is too broad. One can argue that this kind of information could also be retrieved from the net. I consider the book to be a better resource because it already summarizes the important information so that you do not drown in a flood of information.

There is also some criticism, as I pointed out earlier. Maybe I am just a little picky about the details, but if you print code download references in a book, they must be available. Most examples can be downloaded, but Alex Horovitz's site was not reachable when I tried to access it (several times), whereas the other one (http://www.jliberty.com/) pointed to the correct location a few days after my first try.
Another personal remark is that I do not like to see quotes from Wikipedia. Other people might think differently about that, so you just need to decide on your own.

I rate this book as a 7, which means it is a good book. The authors scratch the surface of every topic and choose an appropriate style to explain it. You can tell that they thought about how to explain and motivate each topic on its own. You expect to get the "how" but also get the "why".

Adrian Lambeck is a graduate student in "Media and Information Technologies" and worked with .NET for a few years."

Comment Re:Technology? (Score 1) 151

by lamaditx on Wednesday July 16, 2008 @04:23AM (#24209587) Attached to: Web 2.0: A Strategy Guide

right - this book is not for nerds ...
