Slashdot
Internet Data Mining for Investment Analysis
Posted by ScuttleMonkey on Wed Feb 15, 2006 08:31 AM
from the real-time-economic-snapshot dept.
CaroKann writes "Reuters is reporting on a Wall Street investment research company, Majestic Research, that is using web crawling techniques to track business performance. Instead of attempting to estimate business conditions by talking to company management, or pounding the pavement visiting stores, this company uses data mining systems to collect real-time sales data and other information on companies that have a web presence. Using this data, Majestic attempts to estimate company earnings more accurately than traditional research outfits."
This discussion has been archived.
No new comments can be posted.
74 comments
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Traditional Wall Street Research? (Score:5, Funny)
(http://slashdot.org/~eldavojohn/ | Last Journal: Tuesday October 16, @03:26PM)
Economics and future fiscal predictions are completely theoretical. There are just too many variables involved, folks.
Now that the companies know that... (Score:3, Insightful)
Cue the web spam... (Score:5, Insightful)
(http://robvincent.net/ | Last Journal: Tuesday October 09, @01:55PM)
My data mining results (Score:4, Funny)
I call Bull (Score:4, Insightful)
(http://sourceforge.net/projects/karekol/)
mining online news stories for word connotations (Score:3, Interesting)
(http://www.leftwingmediamachine.blogspot.com/ | Last Journal: Sunday January 15 2006, @10:04AM)
I posted the preliminary code online in the perl newsgroup.
google "data mining" "news" "perl" etc
Realtime News Analysis (Score:2, Interesting)
Will it work? Yes and no. (Score:2)
But the real problem with everything like this is that even if it works well for many things, there will be those who try to misuse it, and finding all of them will be very hard. Further, it only takes one major problem case for your nice product to become a laughingstock.
This is great (Score:2, Interesting)
This is a good thing for mankind.
the rise of the machines... (Score:3, Insightful)
(http://www.developeradvantage.com/)
I remember back in grad school in the late 90s I worked on a major project to design an intelligent-agent-based system with the same functionality. In addition to pulling information off the internet, it could also take into account whatever other information could be gathered and interfaced into it (for example, there is a lot of content on TV which could be fed into a system, in addition to the online data). It was a design project, though, and was never implemented; perhaps I will need to resurrect it!
I do think the whole area of quantitative or at least semi-quantitative analysis of information, both textual and numerical, is going to explode over the next few years, driven by vast amounts of incredibly cheap computing power and bandwidth. Computer applications do amazing stuff right now, but five years from now truly "intelligent" applications will exist. The term "artificial intelligence" has fallen out of fashion, perhaps a sign of how commonplace these systems have now become.
As an example, our local phone company has a voice recognition system which actually works reasonably well, much, much better than anything 5-10 years ago. We are certainly making progress.
Ties to Majestic 12? (Score:2, Interesting)
Will it work? (Score:2, Informative)
Several groups do this (Score:3, Interesting)
- An eBay crawler that could estimate the number of auctions and average selling price to predict whether eBay would make their earnings target or not. eBay quickly blacklisted their IP space, so they started using a bunch of open proxies they found.
- By analyzing client/server communication for the Sims Online, they discovered that each connection was assigned a sequentially incrementing connection ID number. By looking at the rate at which the connection ID numbers were increasing each time they logged in, they determined that the Sims Online wasn't going to be nearly as popular as Electronic Arts was forecasting.
- They talked about placing a camera somewhere in Union Square (in SF) to monitor the entrance to Tiffany's during the holiday shopping season, and doing image analysis to determine what percentage of shoppers left the store with a Tiffany's bag in hand.
- Monitoring wireless carriers' spectrum to determine what percentage of GSM/CDMA channels were in use for data vs. voice. The communication itself is encrypted of course, but you can still tell whether a channel is carrying voice or data. They wanted to determine if wireless carriers' forecasts about revenue from data services were accurate.
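The sequential connection ID trick from the Sims Online item above can be sketched in a few lines. This is only a minimal illustration in Python, assuming you have logged a few (timestamp, connection ID) pairs yourself; the function name and the sample numbers are made up for the example and do not come from the original poster.

```python
from datetime import datetime

def connections_per_hour(observations):
    """Estimate new connections per hour from (timestamp, connection_id) pairs.

    Assumes the server hands out IDs sequentially, so the difference between
    two observed IDs equals the number of connections made in between.
    """
    if len(observations) < 2:
        raise ValueError("need at least two observations")
    ordered = sorted(observations)  # sort by timestamp
    (t_first, id_first), (t_last, id_last) = ordered[0], ordered[-1]
    hours = (t_last - t_first).total_seconds() / 3600.0
    if hours <= 0:
        raise ValueError("observations must span a positive time interval")
    return (id_last - id_first) / hours

# Hypothetical sample data: one login per day at the same time.
samples = [
    (datetime(2003, 1, 6, 20, 0), 1_000_000),
    (datetime(2003, 1, 7, 20, 0), 1_012_000),
    (datetime(2003, 1, 8, 20, 0), 1_024_000),
]
rate = connections_per_hour(samples)
print(f"about {rate:.0f} new connections per hour")
```

Comparing a rate like this against the publisher's subscriber forecast is the whole idea; the estimate is only as good as the assumption that IDs really are sequential and never reset.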
Well, then they should analyze gold (Score:2)
Seriously, just watch what happens when the Fed decides to print money to try to stave off a cascading credit collapse. They will print some, but that will make things worse because it will drive up costs without driving up pay or driving down personal debt. So they will print more, and that will make things even worse for the same reasons, and so on. When it is all over, costs will likely be 10x higher while pay stays about the same. I wouldn't be surprised if the dollar stopped being a currency.
Differing Research Methods (Score:1)
(http://www.failuretolaunch.net/)
This is not new (Score:1)
Maybe they'll soon announce a deal with Google? (Score:1)
(http://thinkabdul.com/)
Difference between a Prediction and a Summary (Score:1)
I'd like to highlight that there is a difference between a Prediction and a Summary. From what I have read so far, the tool described in the article generates a summary, which may be used as a prediction.
Let s(t) be the Summary of a system (in this case, the economy) at any given time, then:
A prediction, p(S), would be a prediction based on a set of summaries S, where: S = {s(t), s(t-1), s(t-2), ...}
One can always make a prediction based on a very small number of summaries. |S| = 0 is a guess. |S| = 1 means that no past summaries are considered in the prediction, just the most up-to-date one. Presumably, the bigger |S| is, the more information is considered in that prediction.
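To make the |S| cases concrete, here is a rough Python sketch. The post does not say what p actually is, so the straight-line least-squares extrapolation used for |S| >= 2 is purely an assumption for illustration, as are the function name and the `guess` parameter.

```python
def predict(summaries, guess=0.0):
    """p(S): predict the next value from summaries ordered oldest to newest.

    The linear extrapolation below is an illustrative stand-in, not a method
    taken from the article or the post.
    """
    n = len(summaries)
    if n == 0:
        return guess          # |S| = 0: nothing to go on, so it is a guess
    if n == 1:
        return summaries[0]   # |S| = 1: only the latest summary is used
    # |S| >= 2: fit a line by least squares over t = 0..n-1, then read it at t = n
    mean_t = (n - 1) / 2
    mean_s = sum(summaries) / n
    cov = sum((t - mean_t) * (s - mean_s) for t, s in enumerate(summaries))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return mean_s + slope * (n - mean_t)

print(predict([]))               # falls back to the guess
print(predict([5.0]))            # repeats the only summary
print(predict([1.0, 2.0, 3.0]))  # extends the trend
```

A longer history only helps if the trend it captures still holds, which is why getting s(t) sooner can matter more than having a bigger S.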
The usefulness of such a tool lies in the value of t. Web-crawling allows one to collect much data in a small amount of time. If one is able to collect a summary quicker than everyone else, then presumably, someone using this summary tool would be able to stay ahead of the trend.
That being said, one of the inputs of s(t) is publicly available data. Financial reports describe events after the fact. Information based on actual financial transactions (which you could collect if you planted a spybot at the central booth of a major retailer, for example) is much better. At the end of the day, if you want to play a really cut-throat, high-profit game of stock trading, I think you are better off having insider info.
Cheers.
B. Pascal
Real-Time? (Score:1)
(http://snicks.bravehost.com/)
The more things change... (Score:1)
(http://www.brainbenc...ript.jsp?pid=4586726)
I have seen ones that scanned EDGAR filings (got canceled when the company was destroyed in the 9/11 attack), campaign contributions (works wonderfully for telcos and other highly regulated industries), patent filings (generally works surprisingly well, though no one knows why), job ads, and many others.
I even heard of one that analyzed free internet porn... (insert your favorite joke here), but it actually was a fairly good predictor. The cognitive psychology behind it was fascinating.
Using search engines and NORA text mining is basically a form of technical investing. If you have a data store of any kind whose contents influence or are influenced by members of the market niche of a company, it can tell you something about the future of that niche. That's just plain marketing 101...