For most businesses, machine learning seems close to rocket science, appearing expensive and talent-demanding. And, if you’re aiming at building another Netflix recommendation system, it really is. But the trend of making everything-as-a-service has affected this sophisticated sphere, too. You can jump-start an ML initiative without much investment, which would be the right move if you are new to data science and just want to grab the low-hanging fruit.
One of machine learning’s most inspiring stories is the one about a Japanese farmer who decided to sort cucumbers automatically to help his parents with this painstaking operation. Unlike the stories that abound about large enterprises, the guy had neither expertise in machine learning nor a big budget. But he did manage to get familiar with TensorFlow and employed deep learning to recognize different classes of cucumbers.
By using machine-learning cloud services, you can start building your first working models, yielding valuable insights from predictions with a relatively small team. We’ve already discussed machine learning strategy. Now let’s have a look at the best machine learning platforms on the market and consider some of the infrastructural decisions to be made.
What is machine learning as a service
Machine learning as a service (MLaaS) is an umbrella definition of automated and semi-automated cloud platforms that cover most infrastructure issues such as data pre-processing, model training, and model evaluation, with further prediction. Prediction results can be bridged with your internal IT infrastructure through REST APIs.
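As a rough illustration of that bridging step, the sketch below builds (but deliberately does not send) a JSON prediction request using only Python's standard library. The endpoint URL, the `features` payload shape, and the `build_prediction_request` helper are hypothetical; each vendor documents its own request format and authentication.

```python
import json
import urllib.request

# Hypothetical endpoint: every MLaaS vendor defines its own URL scheme and auth.
PREDICT_URL = "https://ml.example.com/v1/models/churn/predict"


def build_prediction_request(features, url=PREDICT_URL):
    """Wrap one record's features in a JSON POST request (assumed payload shape)."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_prediction_request({"plan": "basic", "monthly_visits": 12})

# Sending is left out on purpose. With a real endpoint it would look like:
# with urllib.request.urlopen(request) as response:
#     prediction = json.load(response)
```

In practice the internal system only needs to know the request and response formats, which is why these services are easy to plug into an existing IT infrastructure.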
Amazon Machine Learning services, Azure Machine Learning, and Google Cloud AI are three leading cloud MLaaS services that allow for fast model training and deployment with little to no data science expertise. These should be considered first if you assemble a homegrown data science team out of available software engineers. Have a look at our data science team structures story to have a better idea of roles distribution.
Within this article, we’ll first give an overview of the main machine-learning-as-a-service platforms by Amazon, Google, and Microsoft, and will follow it by comparing machine learning APIs that these vendors support. Please note that this overview isn’t intended to provide exhaustive instructions on when and how to use these platforms, but rather what to look for before you start reading through their documentation.
Machine learning services for custom predictive analytics tasks
Predictive analytics with Amazon ML
Amazon Machine Learning services are available on two levels: predictive analytics with Amazon ML and the SageMaker tool for data scientists.
Amazon Machine Learning for predictive analytics is one of the most automated solutions on the market and the best fit for deadline-sensitive operations. The service can load data from multiple sources, including Amazon RDS, Amazon Redshift, CSV files, etc. All data preprocessing operations are performed automatically: The service identifies which fields are categorical and which are numerical, and it doesn’t ask a user to choose the methods of further data preprocessing (dimensionality reduction and whitening).
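To give a feel for what automatic field typing involves, here is a minimal sketch that guesses whether each CSV column is numerical or categorical. It is a simple heuristic written for illustration, not Amazon's actual logic; the `infer_field_type` and `infer_schema` helpers and the sample data are assumptions.

```python
import csv
import io


def infer_field_type(values):
    """Return 'numerical' if every non-empty value parses as a number, else 'categorical'."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "categorical"
    try:
        for value in non_empty:
            float(value)
    except ValueError:
        return "categorical"
    return "numerical"


def infer_schema(csv_text):
    """Map each column name of a CSV document to its guessed field type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        name: infer_field_type([(row[name] or "") for row in rows])
        for name in reader.fieldnames
    }


# Made-up sample; the last row has a missing monthly_spend value.
sample = "age,plan,monthly_spend\n34,basic,19.99\n51,premium,49.50\n27,basic,\n"
schema = infer_schema(sample)
```

A production service has to handle far messier input (dates, free text, numeric IDs that are really categories), which is part of what the automation buys you.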
Prediction capacities of Amazon ML are limited to three options: binary classification, multiclass classification, and regression. That said, this Amazon ML service doesn’t support any unsupervised learning methods, and a user must select a target variable to label it in a training set. Also, a user isn’t required to know any machine learning methods because Amazon chooses them automatically after looking at the provided data.
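The three options map naturally onto the kind of target variable you pick. The sketch below is a simplified illustration of that mapping, not the service's real selection logic; the `choose_task` helper and its rules are assumptions.

```python
from numbers import Real


def choose_task(target_values):
    """Pick a prediction task from the values of the target column (simplified rules)."""
    distinct = set(target_values)
    if len(distinct) < 2:
        raise ValueError("a target needs at least two distinct values")
    if len(distinct) == 2:
        # Two possible outcomes, e.g. churned / stayed.
        return "binary classification"
    if all(isinstance(value, Real) for value in distinct):
        # Many numeric outcomes, e.g. a price or a temperature.
        return "regression"
    # Several named outcomes, e.g. product categories.
    return "multiclass classification"


print(choose_task([0, 1, 1, 0]))
print(choose_task(["red", "green", "blue", "red"]))
print(choose_task([12.5, 7.0, 3.2, 9.9]))
```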
This high automation level acts both as an advantage and disadvantage for Amazon ML use. If you need a fully automated yet limited solution, the service can match your expectations. If not, there’s SageMaker.
Amazon SageMaker and frameworks-based services
SageMaker is a machine learning environment that’s supposed to simplify the work of a fellow data scientist by providing tools for quick model building and deployment. For instance, it provides Jupyter, an authoring notebook, to simplify data exploration and analysis without server management hassle. Amazon also has built-in algorithms that are optimized for large datasets and computations in distributed systems. These include:
- Linear learner, a supervised method for classification and regression
- Factorization machines for classification and regression designed for sparse datasets
- XGBoost is a supervised boosted trees algorithm that increases prediction accuracy in classification, regression, and ranking by combining the predictions of simpler algorithms
- Image classification based on ResNet, which can also be applied for transfer learning
- Seq2seq is a supervised algorithm for predicting sequences (e.g. translating sentences, converting strings of words into shorter ones as a summary, etc.)
- K-means is an unsupervised learning method for clustering tasks
- Principal component analysis used for dimensionality reduction
- Latent Dirichlet allocation is an unsupervised method used for finding categories in documents
- Neural topic model (NTM) is an unsupervised method that explores documents, reveals top ranking words, and defines the topics (users can’t predefine topics, but they can set the expected number of them)
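To make one of the listed methods concrete, here is a plain-Python sketch of the K-means idea: assign each point to its nearest centroid, then move each centroid to the mean of its points. This is not SageMaker's implementation; the `k_means` helper, the toy points, and the choice to seed centroids with the first k points are simplifying assumptions.

```python
from math import dist          # Python 3.8+
from statistics import fmean   # Python 3.8+


def k_means(points, k, iterations=10):
    """Cluster points into k groups; returns (centroids, clusters)."""
    # Simplification: seed centroids with the first k points instead of random picks.
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: each point joins the cluster of its nearest centroid.
        for point in points:
            nearest = min(range(k), key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            tuple(fmean(axis) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
centroids, clusters = k_means(points, k=2)
```

Note that no target variable is involved: the algorithm only looks at distances between points, which is what makes it unsupervised.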
Built-in SageMaker methods largely intersect with the ML APIs that Amazon suggests, but here it allows data scientists to play with them and use their own datasets.
If you don’t want to use these, you can add your own methods and run models via SageMaker leveraging its deployment features. Or you can integrate SageMaker with TensorFlow and MXNet, deep learning libraries.
Generally, Amazon machine learning services provide enough freedom for both experienced data scientists and those who just need things done without digging deeper into dataset preparations and modeling. This would be a solid choice for companies that already use the Amazon environment and don’t plan to transition to another cloud provider.
Microsoft Azure Machine Learning Studio
Azure Machine Learning is aimed at setting a powerful playground both for newcomers and experienced data scientists. The roster of ML products from Microsoft is similar to the ones from Amazon, but Azure, as of today, seems more flexible in terms of out-of-the-box algorithms.
Services from Azure can be divided into two main categories: Azure Machine Learning Studio and Bot Service. Let’s find out what’s under the hood of Azure ML Studio. We’ll return to Bot Service in the section dedicated to specific APIs and tools.
ML Studio is the main MLaaS package to look at. Almost all operations in Azure ML Studio must be completed manually. This includes data exploration, preprocessing, choosing methods, and validating modeling results.
Approaching machine learning with Azure entails a learning curve. But it eventually leads to a deeper understanding of all major techniques in the field. On the other hand, Azure ML supports a graphical interface to visualize each step within the workflow. Perhaps the main benefit of using Azure is the variety of algorithms available to play with. The Studio supports around 100 methods that address classification (binary and multiclass), anomaly detection, regression, recommendation, and text analysis. It’s worth mentioning that the platform has one clustering algorithm (K-means).
Another big part of Azure ML is Cortana Intelligence Gallery. It’s a collection of machine learning solutions provided by the community to be explored and reused by data scientists. The Azure product is a powerful tool for starting with machine learning and introducing its capabilities to new employees.
Google Prediction API
Google provides AI services on two levels: a machine learning engine for savvy data scientists and the highly automated Google Prediction API. Unfortunately, Google Prediction API has been deprecated recently and Google is pulling the plug on April 30, 2018.
The doomed Prediction API resembles Amazon ML. Its minimalistic approach narrows down to solving two main issues: classification (both binary and multiclass) and regression. Trained models can be deployed through the REST API interface.
Google didn’t disclose exactly which algorithms were utilized for drawing predictions and didn’t allow engineers to customize models. On the other hand, Google’s environment was the best fit for running machine learning within tight deadlines and for the early launch of an ML initiative. But it seems that the product wasn’t nearly as popular as Google expected. It’s a shame that those who were using Prediction API will have to “recreate existing models” using other platforms as the end-of-life FAQ suggests.
So, what’s coming instead?
Google Cloud Machine Learning Engine
High automation of Prediction API was available at the cost of flexibility. Google ML Engine is the direct opposite. It caters to experienced data scientists, it’s very flexible, and it suggests using cloud infrastructure with TensorFlow as a machine learning driver. So, ML Engine is pretty similar to SageMaker in principle.
TensorFlow is another Google product, which is an open source machine learning library of various data science tools rather than ML-as-a-service. It doesn’t have a visual interface and the learning curve for TensorFlow would be quite steep. However, the library is also targeted at software engineers who plan to transition to data science. TensorFlow is quite powerful, but aimed mostly at deep neural network tasks.
Basically, the combination of TensorFlow and Google Cloud service suggests infrastructure-as-a-service and platform-as-a-service solutions according to the three-tier model of cloud services. We talked about this concept in our whitepaper on digital transformation. Have a look, if you aren’t familiar with it.
To wrap up machine-learning-as-a-service platforms, it seems that Azure currently has the most versatile toolset on the MLaaS market. It covers most ML-related tasks, provides a visualization interface for building custom models, and has a solid set of APIs for those who don’t want to nail data science with their bare hands. However, it still lacks automation capacities available at Amazon.
Comparison of machine learning APIs from Amazon, Microsoft, and Google
Besides full-blown platforms, you can use high-level APIs. These are the services with trained models under the hood that you can feed your data into and get results. APIs don’t require machine learning expertise at all. Currently, the APIs from these three vendors can be broadly divided into three large groups:
1) text recognition, translation, and textual analysis
2) image + video recognition and related analysis
3) other, that includes specific uncategorized services
Speech and text processing APIs: Amazon
Amazon provides multiple APIs that aim at popular tasks within text analysis. These are also highly automated in terms of machine learning and just need proper integration to work.
Amazon Lex. The Lex API is created to embed chatbots in your applications as it contains automatic speech recognition (ASR) and natural language processing (NLP) capacities. These are based on deep learning models. The API can recognize written and spoken text and the Lex interface allows you to hook the recognized inputs to various back-end solutions. Obviously, Amazon encourages use of its Lambda cloud environment. So, prior to subscribing to Lex, get acquainted with Lambda as well. Besides standalone apps, Lex currently supports deploying chatbots for Facebook Messenger, Slack, and Twilio.
Amazon Transcribe. While Lex is a complex chatbot-oriented tool, Transcribe is created solely for recognizing spoken text. The tool can recognize multiple speakers and works with low-quality telephony audio. This makes the API a go-to solution for cataloging audio archives or a good support for the further text analysis of call-center data.
Amazon Polly. The Polly service is kind of a reverse of Lex. It turns text into speech, which will allow your chatbots to respond with voice. It’s not going to compose the text though, just make the text sound close to human. If you’ve ever tried Alexa, you’ve got the idea. Currently, it supports both female and male voices for 25 languages, mostly English and Western European ones. Some languages have multiple female and male voices, so there’s even a variety to choose from. Like Lex, Polly is recommended for use with Lambda.
Amazon Comprehend. Comprehend is another NLP set of APIs that, unlike Lex and Transcribe, aim at different text analysis tasks. Currently, Comprehend supports:
- Entities extraction (recognizing names, dates, organizations, etc.)
- Key phrase detection
- Language recognition
- Sentiment analysis (how positive, neutral, or negative a text is)
- Topic modeling (defining dominant topics by analyzing keywords)
This service will help you analyze social media responses, comments, and other large volumes of text that aren’t amenable to manual analysis. For example, the combination of Comprehend and Transcribe will help analyze sentiment in your telephony-driven customer service.
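As a sketch of the sentiment step in that Comprehend and Transcribe combination, the helper below tallies the dominant sentiment label for a batch of call transcripts. It assumes the transcripts already exist as plain strings and that the client behaves like the Comprehend client from the boto3 SDK, whose `detect_sentiment` call takes `Text` and `LanguageCode` and returns a `Sentiment` label. The `tally_sentiment` helper, the `FakeComprehend` stand-in, and the sample transcripts are hypothetical and exist only so the example runs offline.

```python
from collections import Counter


def tally_sentiment(comprehend_client, texts, language_code="en"):
    """Count the dominant sentiment label returned for each text."""
    counts = Counter()
    for text in texts:
        response = comprehend_client.detect_sentiment(
            Text=text, LanguageCode=language_code
        )
        # The response also carries per-label scores; only the label is used here.
        counts[response["Sentiment"]] += 1
    return counts


# With AWS credentials configured, a real client would come from boto3:
#   import boto3
#   client = boto3.client("comprehend")


class FakeComprehend:
    """Offline stand-in for the real client (an assumption, not part of AWS)."""

    def detect_sentiment(self, Text, LanguageCode):
        label = "NEGATIVE" if "refund" in Text.lower() else "POSITIVE"
        return {"Sentiment": label}


transcripts = [
    "Thanks, the agent solved my issue quickly.",
    "I want a refund, this is the third outage this month.",
]
summary = tally_sentiment(FakeComprehend(), transcripts)
```

Passing the client in as an argument keeps the counting logic testable without network access and makes it easy to swap in the real service later.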
Amazon Translate. As the name states, the Translate service translates texts. Amazon claims that it uses neural networks, which provide better translation quality than rule-based translation approaches. Unfortunately, the current version supports translation from only six languages into English and from English into those six. The languages are Arabic, Chinese, French, German, Portuguese, and Spanish.
Speech and text processing APIs: Microsoft Azure Cognitive Services
Just like Amazon, Microsoft suggests high-level APIs, Cognitive Services, that can be integrated with your infrastructure and perform tasks with no data science expertise needed.
Speech. The speech set contains four APIs that apply different types of natural language processing (NLP) techniques for natural speech recognition and other operations:
- Translator Speech API
- Bing Speech API to convert text into speech and speech into text
- Speaker Recognition API for voice verification tasks
- Custom Speech Service to apply Azure NLP capacities using your own data and models
Language. The language group of APIs focuses on textual analysis similar to Amazon Comprehend:
- Language Understanding Intelligent Service is an API that analyzes intentions in text to be recognized as commands (e.g. “run YouTube app” or “turn on the living room lights”)
- Text Analysis API for sentiment analysis and defining topics
- Bing Spell Check
- Translator Text API
- Web Language Model API that estimates probabilities of word combinations and supports word autocompletion
- Linguistic Analysis API used for sentence separation, tagging the parts of speech, and dividing texts into labeled phrases
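The word-probability idea behind the Web Language Model API in the list above can be illustrated with a tiny bigram model: count which words follow which, then turn the counts into probabilities and suggestions. This is a toy sketch of the general technique, not Microsoft's model or API; the helper names and the three-sentence corpus are assumptions.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows another in the training sentences."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def next_word_probabilities(following, word):
    """Estimate P(next word | word) from the bigram counts."""
    counts = following.get(word.lower(), Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def autocomplete(following, word):
    """Suggest the most frequent follower of a word, or None if it is unseen."""
    counts = following.get(word.lower())
    return counts.most_common(1)[0][0] if counts else None


corpus = ["turn on the lights", "turn on the radio", "turn off the lights"]
model = train_bigrams(corpus)
probabilities = next_word_probabilities(model, "turn")
suggestion = autocomplete(model, "the")
```

A hosted service differs mainly in scale: it is trained on web-sized text and looks at longer word sequences, but the estimate it returns answers the same question.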