I was in the same position as OP about 5 years ago. I have a PhD in CS from many years back but in operating systems and programming languages. Around 2010 I wanted to get into machine learning and decided to enroll part-time in a university to take some classes. Currently, I am leading a small team of engineers that work on ML-related topics.
Here are some points that the OP needs to understand.
1. There are two different levels of expertise with working on machine learning: either as a library/tool user, or as a ML algorithm developer. It is EXACTLY analogous to how one approaches SQL: You can make a great living being a SQL user and knowing how to write efficient queries and build indexes, or you can go deeper and build the SQL engine itself along with its query optimizer, storage layer, etc. If you want to use ML as a library/tool user, you can have a great career as long as you know what tools and algorithms to use. If you want to be a ML algorithm developer, that means you want to work on the innards, such as using new SVM kernels or building new deep learning networks; for this role, you'll usually need a PhD-calibre background heavy in math. I personally started out as a library/tool user with Weka and Mallet, but as I used them more, I was able to understand the math behind them.
2. ML is an abstract field, and it's best to approach it from an applications point of view. Pick a problem that needs ML, such as natural language processing or image recognition. It's important to pick a problem that has an abundant amount of labelled data. There are some fields such as voice recognition where it is terribly difficult to get real labelled data. For NLP (aka computational linguistics), you can start with some basic problems such as document classification (e.g. for this document, is it about sports, business, entertainment, etc.?) or sentiment analytics (e.g. for this Twitter tweet, is it positive or negative?). There are lots of good datasets in the NLP field.
3. You can explore datasets from the Kaggle competitions and the University of California, Irvine, repository: http://archive.ics.uci.edu/ml/
4. Pick a tool and stick with it. I have used Weka, Mallet, and R. You can also use Python and Matlab.
5. When you read the literature, you will find two nearly-synonymous terms: "machine learning" and "data mining". Both are closely related. Machine learning historically comes from the AI community and generally focuses on building better ML algorithms and solving supervised ML problems. Data mining historically comes from the database community and generally focuses on using tools and solving unsupervised ML problems (e.g. finding clusters of similar customers).
6. At the end of the day, creating a better solution does not come down to the ML algorithms themselves. Rather, the better solution comes from the amount of data and what features you are able to extract. As for the many ML algorithms for supervised learning: at the end of the day, your main responsibility will come down to picking the one that best suits your application. It is just like picking which sorting algorithm to use: when do you use Quicksort, and when do you use Mergesort?
7. Here are some really good books that I have personally read:
- Programming Collective Intelligence by T. Segaran.
- Introduction to Data Mining by P.-N. Tan and M. Steinbach.
- Data Mining: Practical Machine Learning Tools by I. Witten and E. Frank. (goes with the Weka tool)
- Artificial intelligence: A Modern Approach by S. Russell and P. Norvig. (touches on all aspects of AI, such as tic-tac-toe algorithms with minimax and First Order Logic)
- Introduction to Machine Learning by E. Alpaydin
PROTIP: How to tell if you're reading an advanced machine learning book -- if the index contains reference to Vapnik–Chervonenkis dimension or shattering, then the book is hardcore.