Machine Learning has recently become one of the most talked-about topics in information science and technology. Although the subject has intrigued researchers and academicians for decades, only now is nearly every software engineer trying to get the hang of it. It is changing the face of computing forever and, in a way, accelerating the move from "Software Eating the World" to "Software Eating Software".
In this post, I will cover (or rather, list) some of the top Machine Learning libraries and services available to a Java developer and highlight the salient points of each. Please note that some of the most powerful machine learning libraries are written in Python; they are not covered in this post at all.
Before I get to the list of libraries, here is a short list of algorithms, or broad categories of problems, that most Machine Learning libraries cover either fully or partially:
- Classification
- Regression
- Clustering
- Ranking
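To make one of these categories concrete, here is a minimal sketch of regression in plain Java. It fits a straight line to a handful of points by ordinary least squares. The class name `SimpleLinearRegression` and its methods are made up for this illustration and are not the API of any library listed in this post; real libraries handle many features, regularisation and far larger data sets.

```java
// Illustrative sketch only: plain Java, not the API of any library in this post.
final class SimpleLinearRegression {
    final double slope;
    final double intercept;

    private SimpleLinearRegression(double slope, double intercept) {
        this.slope = slope;
        this.intercept = intercept;
    }

    /** Fits y = slope * x + intercept by ordinary least squares. */
    static SimpleLinearRegression fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // slope = covariance(x, y) / variance(x)
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        double slope = sxy / sxx;
        return new SimpleLinearRegression(slope, meanY - slope * meanX);
    }

    double predict(double x) {
        return slope * x + intercept;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {3, 5, 7, 9}; // generated from y = 2x + 1
        SimpleLinearRegression model = fit(x, y);
        System.out.println(model.predict(5.0)); // prints 11.0
    }
}
```

Classification follows the same train-then-predict shape, except that the prediction is a label rather than a number.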
Here are a few examples of applications:
- Outlier Detection
- Recommendation
- Natural Language Processing
- Neural Networks
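As a taste of the first application, outlier detection, here is a small plain-Java sketch that flags values whose z-score (distance from the mean, measured in standard deviations) exceeds a threshold. The class `ZScoreOutliers` and the threshold of 2.0 are assumptions chosen for this example; the libraries below offer more robust techniques.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: plain Java, not the API of any library in this post.
final class ZScoreOutliers {

    /** Returns the indices of values whose absolute z-score exceeds the threshold. */
    static List<Integer> findOutliers(double[] values, double threshold) {
        List<Integer> outliers = new ArrayList<>();
        if (values.length == 0) {
            return outliers;
        }
        double mean = 0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;

        // Population standard deviation of the sample.
        double variance = 0;
        for (double v : values) {
            variance += (v - mean) * (v - mean);
        }
        double std = Math.sqrt(variance / values.length);
        if (std == 0) {
            return outliers; // all values identical, so nothing stands out
        }
        for (int i = 0; i < values.length; i++) {
            if (Math.abs(values[i] - mean) / std > threshold) {
                outliers.add(i);
            }
        }
        return outliers;
    }

    public static void main(String[] args) {
        double[] readings = {10, 11, 9, 10, 12, 10, 11, 50};
        System.out.println(findOutliers(readings, 2.0)); // prints [7]
    }
}
```

On small samples a single extreme value inflates the standard deviation, which is why the threshold here is lower than the often-quoted value of 3.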
1. Apache Spark MLlib
- One of the leading ML libraries and growing fast
- Part of Apache Spark
- Scales horizontally by adding nodes to the Spark cluster
- On Premise - https://spark.apache.org/mllib
- As Service - https://databricks.com/product/databricks-cloud
2. Deeplearning4j - http://deeplearning4j.org
- One of the top ML Libraries in Java
- Integrates with Hadoop and Spark
- Use cases:
  - Face/image recognition
  - Voice search
  - Speech-to-text (transcription)
  - Spam filtering (anomaly detection)
  - E-commerce fraud detection
  - Regression
3. Apache Mahout - http://mahout.apache.org
- Runs on a Hadoop cluster, so it scales out by adding nodes
- Good for recommendations
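To give a feel for what a recommender does, here is a minimal user-based sketch in plain Java: it finds the user most similar to the target user by cosine similarity and suggests the items that neighbour has rated but the target has not. This is not Mahout's API; the class `TinyRecommender`, its method names and the convention that a rating of 0 means "not rated" are all assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: plain Java, not Mahout's API.
final class TinyRecommender {

    /** Cosine similarity between two rating vectors; 0 if either is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Index of the other user whose ratings are most similar, or -1 if there is none. */
    static int mostSimilarUser(double[][] ratings, int user) {
        int best = -1;
        double bestSimilarity = Double.NEGATIVE_INFINITY;
        for (int other = 0; other < ratings.length; other++) {
            if (other == user) {
                continue;
            }
            double similarity = cosine(ratings[user], ratings[other]);
            if (similarity > bestSimilarity) {
                bestSimilarity = similarity;
                best = other;
            }
        }
        return best;
    }

    /** Items the nearest neighbour has rated but the given user has not (0 = not rated). */
    static List<Integer> recommend(double[][] ratings, int user) {
        List<Integer> items = new ArrayList<>();
        int neighbour = mostSimilarUser(ratings, user);
        if (neighbour < 0) {
            return items;
        }
        for (int item = 0; item < ratings[user].length; item++) {
            if (ratings[user][item] == 0 && ratings[neighbour][item] > 0) {
                items.add(item);
            }
        }
        return items;
    }

    public static void main(String[] args) {
        double[][] ratings = {
            {5, 4, 0, 0}, // user 0
            {5, 5, 4, 0}, // user 1: similar taste to user 0
            {0, 1, 0, 5}, // user 2: different taste
        };
        System.out.println(recommend(ratings, 0)); // prints [2]
    }
}
```

A production recommender would blend many neighbours and weight their ratings, and doing that at scale is where a distributed library earns its keep.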
4. Google Prediction APIs (as a service) - https://cloud.google.com/prediction
Types of problems for which you may find a few ready-made examples:
- Classification
- Regression
5. IBM Watson + AlchemyAPI (as a service) - http://www.ibm.com/smarterplanet/us/en/ibmwatson
Alchemy API is one of IBM's newest and most significant acquisitions. You can try the Alchemy APIs at http://www.alchemyapi.com/products/demo . Here are the two primary offerings from Alchemy API:
- AlchemyLanguage - Text Analytics and Natural Language Processing.
- AlchemyVision - Leverages deep learning for photo and image processing.
6. MS Machine Learning (as a service) - http://azure.microsoft.com/en-in/services/machine-learning
Microsoft has put a lot of effort into an intuitive UI where users can create a model, train it and run the analytics, all by drag and drop. A new user will need some time to get accustomed to the various UI controls and to how the application works. Developers can create their own models and sell them in the Azure Marketplace.
Signing in with an Outlook account works seamlessly.
7. Amazon AWS Machine Learning - https://aws.amazon.com/machine-learning
- Provides visualisation tools for creating ML models
- Simple API support for the models created this way
- Highly scalable; can generate billions of predictions in a day
- Provides a graphical user interface, command line interface and Java API
- One of the most popular Java machine learning libraries
- Available under GPL License
- Statistical natural language processing, document classification, clustering, topic modeling and information extraction.
- In-memory data engine
- Designed for running various types of statistical computations (including Deep Learning)
- Works with Hadoop Distributed File System
Others that deserve a mention but did not make it into the top 10:
JSAT - https://code.google.com/p/java-statistical-analysis-tool
- Library for quickly getting started with Machine Learning problems
- Available under GPL 3, but the author is open to discussion
- The list of supported algorithms is impressive
- A one-man project, done in the creator's free time. The creator is Edward Raff (@EdwardRaffML)
LensKit - http://lenskit.org
- Focused on building recommender systems, primarily for research-based projects
- Good for trying things out. For scale and for production environments, one can move to Apache Mahout.
- Actively developed and maintained.
- Built on top of Mahout
- Supports streaming instead of batch jobs, making it real-time
- Still at an early stage.
- Provides a collection of algorithms
- No new release since 2012
Connect with me on Twitter: @satya_paul