For the past 10 years, the internet industry has been dominated by the LAMP stack, EJBs, and Ruby on Rails. Most major industries now have an online presence, large IT departments, and a good amount of business driven by the web. Major companies like, say, Macy’s have a massive and successful online business.
Now they want more. They want personalized recommendations. They want to compete with eBay, Amazon, Netflix, etc. They see the major players, who have been doing this for 10 years, eating their lunch.
And the standard technology and the standard coders just cannot deliver.
Make no mistake: large companies like Google, Amazon, and eBay have been doing machine learning for a long time. Amazon started developing its recommender 10 years ago.
And high-performance machine learning solutions have been around for over 10 years. For example, SVM Light is now at version 6.
Packages like Scikit Learn are, in large part, wrapper classes around long-standing academic codes.
Even Hadoop is ancient; Google was doing MapReduce 10 years ago.
I personally developed a transductive learning algorithm, based on ideas from quantum field theory, for personalized recommendations over 10 years ago (see: Applied Machine Learning For Search Engine Relevance).
But the market was not ready.
Most IT shops have been dominated by PHP and Ruby guys who have absolutely no idea how to use this stuff. Indeed, products like Ruby on Rails are flat-out useless even for traditional data mining (i.e., using star schemas), let alone large-scale MapReduce and/or machine learning applications. So the industry has been in a chokehold for the past few years. It has been necessary to set up entirely new data science departments that don’t report to IT and that can generate revenue, not just spend money.
IT is a cost sink; machine learning generates revenue.
A breakthrough came with Aardvark, the Lean Startup that really succeeded at using machine learning in a very different way. All of a sudden, small entrepreneurs, angels, and VCs realized the technology was ready.
I would also argue that companies like Demand Media opened the floodgates; we were the first $1B IPO since Google, and our technology was driven by NLP and machine learning. That’s $200M to $300M per year in pure ad arbitrage.
So a huge number of people started putting together great open-source packages like Scikit Learn, Apache Mahout, and GraphLab.
And the academic world has had 4 years to produce a whole new crop of engineers trained in ML and Python.
The time is now ripe.