I've been on the Data Science road for about 5 months. I initially became intrigued by the idea of Data Science on January 5th, 2015. This came about when I inquired about starting my master's degree in mathematics and was informed about a concentration in Data Science. "What is Data Science?" I wondered. So I started looking into it.
Here's my initial thought progression over time:
"I have no idea what that is"
"Sure looks trendy!"
"I bet this is one of those things everybody and their dog will want to do but without having the math chops to really be good... just like developers."
"Ooohhhhhh, look at all these tools! Hey! I've heard of a lot of these! Hadoop! CouchDB! MongoDB! Ummm, Spark? Dremel? Spanner? Voldemort?! Where does this list end?!?!?! This must be Data Science!"
"Oh, so Data Science is an umbrella term. Underneath that is Predictive Analytics, Machine Learning, Data Engineering, Data Architecture, Computer Vision, Natural Language Processing [list goes on]. Ok, so we're back to Computer Science."
Me now: "Why don't they just call it Data-centric Computer Science?" "Because it's not catchy enough and it wouldn't pay as well." Oh yeah.
So we've come full circle. It's always been Computer Science. Some of us just took more math classes. All of those tools I mentioned in my post Dear Gournal have as much to do with Data Science as MATLAB has to do with Mathematics. You wouldn't say MATLAB is Mathematics. You would say MATLAB is a mathematical tool. In the same way, all of those technologies are Data Science tools, but they are not Data Science. I'm glad I'm realizing that now.
I'm almost through with my Probability Models class. I am somehow riding on a low A and hope to finish strong, but the latest lessons on Poisson Processes and Renewal Theory are clouding my head. Still, it's been a very good class. Now I know what actuaries do! And I know I do not want to be one. Even so, probability will never not be useful. It is at the core of what I want to do. AI, for example, is heavily based on probability. Data Science, which is heavily stats-based, is inherently probability-based as well. Predictive Analytics, for example, would be impossible without probability theory. I hope to bring some of that to the table. Next semester, I'm taking Prob & Stat II. I think this class prepared me pretty well for it.
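For anyone else whose head gets cloudy at "Poisson process," a tiny simulation can help. This is only a rough sketch and not anything from the class: the rate, time horizon, and number of runs are made-up values, and the function name is just for illustration. It builds arrival times by adding up exponential waiting times, then checks that the average number of arrivals lands near rate × horizon.

```python
import random

def simulate_poisson_arrivals(rate, horizon, rng):
    """Return the arrival times of a Poisson process with the given rate on [0, horizon]."""
    arrivals = []
    t = rng.expovariate(rate)       # waiting time until the first arrival
    while t <= horizon:
        arrivals.append(t)
        t += rng.expovariate(rate)  # gaps between arrivals are exponential
    return arrivals

# Illustrative values only (assumptions, not from any real data set).
rng = random.Random(42)
rate, horizon, runs = 2.0, 10.0, 2000

counts = [len(simulate_poisson_arrivals(rate, horizon, rng)) for _ in range(runs)]
mean_count = sum(counts) / runs

print(f"average arrivals per run: {mean_count:.2f} (theory says {rate * horizon:.1f})")
```

The count of arrivals in a window of length t is Poisson with mean rate × t, so the simulated average should sit close to that value.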
But while I now understand, more or less, what Data Science is, that only makes me realize how much I don't know. It's not as simple as learning a few tools and technologies. It's about learning the fundamentals of statistical analysis and probability, and then the things that build on that, like machine learning and predictive analytics. I'm excited and scared at the same time. It's terrifying if you try to eat the elephant in one bite. So I'm trying to take it a byte at a time, starting with the toe.
In my spare time, I'm working on a sports database and a statistical analysis of a fantasy prognosticator. I've finally got the database together, and now I'm working toward the guru analysis. It's taken a little over a month, and I'm guessing it will take another month to finish up, but I'm proud of it nonetheless. It's stupid but has been a fun exercise in Data Science.
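One simple way a prognosticator analysis like this could start is to compare projected points against actual points and see whether the guru beats a naive guess. The sketch below is only an illustration under assumptions: the numbers are invented toy values, not from the real database, and "last week's score" as the naive baseline is one hypothetical choice among many.

```python
from statistics import mean

def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual values."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return mean(abs(p - a) for p, a in zip(predicted, actual))

# Toy numbers for four hypothetical players (made up for illustration).
guru_projection = [18.0, 12.5, 22.0, 9.0]   # what the prognosticator predicted
last_week_score = [16.0, 13.0, 17.0, 12.0]  # naive baseline: repeat last week
actual_score    = [15.0, 14.5, 20.0, 13.0]  # what actually happened

guru_mae = mean_absolute_error(guru_projection, actual_score)
baseline_mae = mean_absolute_error(last_week_score, actual_score)

print(f"guru MAE:     {guru_mae:.3f}")
print(f"baseline MAE: {baseline_mae:.3f}")
print("guru beats baseline" if guru_mae < baseline_mae else "baseline wins")
```

Mean absolute error is used here because it is easy to read in fantasy points; a real analysis would need many more players and weeks before saying anything about the guru.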
Not much has changed at work except that I now have a project, finally. I am tasked with bringing Instant Messaging to the company. I mean, we already have IM, but it's internal only. This has to be internal AND external. So I'm rolling with XMPP. At least I hope so. I have a meeting with our security team tomorrow to discuss the feasibility (security-wise, not technically). I think it's going to be a fun project. I'm planning on rolling with Openfire. Should be pretty straightforward.
Anyway, that's it for now. Gonna go read more Antifragile.