How to choose a Machine Learning algorithm

January 5, 2017

Some of the most common examples of machine learning are Netflix’s algorithms, which suggest movies based on what you have watched in the past, and Amazon’s algorithms, which recommend products based on what other customers have bought.

Choosing an algorithm generally comes down to the following questions:

How much data do you have, and is it continuous?
Is it a classification or a regression problem?
Is the data labeled (predefined variables), unlabeled, or a mix?
Are the classes skewed?
What is the goal: to predict or to rank?
Do the results need to be easy to interpret?
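
One of the questions above, whether the classes are skewed, can be checked in a few lines before any algorithm is chosen. The following is a minimal sketch in plain Python; the function names and the toy labels are hypothetical and only serve the illustration.

```python
from collections import Counter

def class_balance(labels):
    """Return each class's share of the data set, most common class first."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.most_common()}

def imbalance_ratio(labels):
    """Size of the most common class divided by the size of the rarest one."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)

# Hypothetical toy labels: 95 normal transactions and 5 fraudulent ones.
labels = ["ok"] * 95 + ["fraud"] * 5
shares = class_balance(labels)
ratio = imbalance_ratio(labels)
print(shares, ratio)
```

A large ratio suggests that plain accuracy will be a misleading measure, because a model that always predicts the majority class would already score well.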

Here are the most widely used algorithms for various business problems:

Decision Trees:
Decision tree output is easy to understand, even for people from a non-analytical background; reading and interpreting a tree does not require statistical knowledge. Trees are also one of the fastest ways to identify the most significant variables and the relationships between two or more variables. They are excellent tools for choosing between several courses of action. The most popular decision tree algorithms include CART, CHAID, and C4.5.

In general, decision trees can be used in real-world applications such as:

Investment decisions
Customer churn
Bank loan defaulters
Build vs Buy decisions
Company merger decisions
Sales lead qualifications
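
To make the idea of "identifying the most significant variable" concrete, here is a minimal sketch of the single step a CART-style tree repeats: trying every feature and threshold and keeping the split that reduces Gini impurity the most. The churn data, the feature names, and the function names are hypothetical; a real tree would apply this step recursively.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for mixed groups."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature index, threshold, impurity reduction) of the best split."""
    parent = gini(labels)
    best = (None, None, 0.0)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must put something on both sides
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            gain = parent - weighted
            if gain > best[2]:
                best = (feature, threshold, gain)
    return best

# Hypothetical churn data: [monthly_fee, support_calls] -> outcome
rows = [[20, 0], [25, 1], [30, 5], [22, 6], [28, 0], [24, 7]]
labels = ["stay", "stay", "churn", "churn", "stay", "churn"]

feature, threshold, gain = best_split(rows, labels)
# Feature 1 (support_calls) with threshold 1 separates the two groups perfectly.
print(feature, threshold, gain)
```

The chosen split reads directly as a rule ("more than one support call: likely to churn"), which is why tree output is easy to explain to a non-technical audience.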

Logistic Regression:
Logistic regression is a powerful statistical way of modeling a binomial outcome with one or more explanatory variables. It measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function, which is the cumulative logistic distribution.

In general, logistic regression can be used in real-world applications such as:

Predicting customer churn
Credit Scoring and Fraud Detection
Measuring the effectiveness of marketing campaigns
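
The description above can be sketched in plain Python: a logistic function turns a linear score into a probability, and the weight and bias are fitted by gradient descent on the log loss. This is an illustrative sketch with one explanatory variable; the data, the learning rate, and the number of epochs are assumptions, and in practice a statistics or machine learning library would normally do the fitting.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit a weight and a bias for one explanatory variable by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data: months since last purchase -> churned (1) or not (0)
months = [1, 2, 3, 4, 6, 7, 8, 9]
churned = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(months, churned)
p_low = sigmoid(w * 1 + b)    # estimated churn probability after 1 month
p_high = sigmoid(w * 9 + b)   # estimated churn probability after 9 months
print(round(p_low, 3), round(p_high, 3))
```

Because the output is a probability rather than a hard label, the same model can be used for scoring, for example ranking customers by churn risk.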

Support Vector Machines:
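Support Vector Machine (SVM) is a supervised machine learning technique that is widely used in pattern recognition and classification problems. It is most naturally suited to data with exactly two classes, although multi-class problems can be handled by combining several two-class SVMs.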

In general, SVM can be used in real-world applications such as:

Detecting people with common diseases such as diabetes
Handwritten character recognition
Text categorization, such as sorting news articles by topic
Stock market price prediction
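
For the two-class case, a linear SVM can be sketched as sub-gradient descent on the hinge loss with a small L2 penalty. This is a simplified illustration rather than how production libraries train SVMs, and it does not cover kernels; the data, the function names, and the hyperparameters (`lam`, `lr`, `epochs`) are assumptions.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=1000):
    """Minimise hinge loss plus an L2 penalty. Labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The L2 penalty always shrinks the weights slightly.
            w = [wi - lr * lam * wi for wi in w]
            # Points inside the margin (or misclassified) push the boundary away.
            if margin < 1:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical two-class data: [glucose_level, bmi], both scaled to roughly 0..1
points = [[0.2, 0.3], [0.3, 0.2], [0.25, 0.35], [0.8, 0.7], [0.7, 0.9], [0.9, 0.8]]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, x) for x in points]
print(predictions)
```

The `margin < 1` test is what distinguishes this from a plain perceptron: points are pushed not just to the correct side but a minimum distance away from the boundary.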

Naive Bayes:
Naive Bayes is a classification technique based on Bayes’ theorem, with the "naive" assumption that features are independent of one another given the class. It is easy to build and particularly useful for very large data sets. Despite its simplicity, Naive Bayes can outperform more sophisticated classification methods on some problems. It is also a good choice when CPU and memory resources are limiting factors.

In general, Naive Bayes can be used in real-world applications such as:

Sentiment analysis and text classification
Recommendation systems, like those of Netflix and Amazon
Marking an email as spam or not spam
Face recognition, as on Facebook
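
The spam example can be sketched with a small multinomial Naive Bayes classifier. The version below is a minimal illustration in plain Python: it counts words per class, applies add-one (Laplace) smoothing, and picks the class with the highest log posterior. The training messages and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents):
    """documents: list of (text, label). Returns the counts the model needs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in documents:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary

def classify(text, model):
    """Return the class with the highest log posterior for the given text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen class/word pairs from zeroing the score.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training messages
training = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("claim free prize now", "spam"),
    ("meeting at noon", "ham"),
    ("project status update", "ham"),
    ("lunch at noon tomorrow", "ham"),
]
model = train_naive_bayes(training)
print(classify("free money now", model))
print(classify("status meeting tomorrow", model))
```

Training is a single pass of counting, which is why the method stays cheap in CPU and memory even on very large data sets.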

Apriori:
The Apriori algorithm generates association rules from a given data set. An association rule states that if item A occurs, then item B also occurs with a certain probability.

In general, Apriori can be used in real-world applications such as:

Market basket analysis, like Amazon's suggestions of products purchased together
Autocomplete functionality, like Google's, which suggests words that commonly occur together
Identifying drugs and their effects on patients
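
The "if A occurs, then B occurs with a certain probability" idea can be sketched as follows. The code finds frequent itemsets level by level, using the Apriori pruning rule that a set can only be frequent if all of its subsets are, and then estimates the probability for one rule (its confidence). The baskets, the function names, and the `min_support` value are hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets that meet min_support."""
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]
    frequent = {}
    size = 1
    while current:
        kept = {}
        for candidate in current:
            s = support(candidate, transactions)
            if s >= min_support:
                kept[candidate] = s
        frequent.update(kept)
        size += 1
        # Join frequent sets into larger candidates, then prune any candidate
        # that has an infrequent subset.
        joined = {a | b for a in kept for b in kept if len(a | b) == size}
        current = [c for c in joined
                   if all(frozenset(sub) in kept for sub in combinations(c, size - 1))]
    return frequent

def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# Hypothetical shopping baskets
baskets = [frozenset(b) for b in (
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
)]

frequent = apriori(baskets, min_support=0.6)
rule_conf = confidence(frozenset({"bread"}), frozenset({"butter"}), baskets)
# Of the four baskets with bread, three also contain butter: roughly 0.75.
print(sorted(map(sorted, frequent)), rule_conf)
```

The pruning step is what keeps the search tractable: most combinations are discarded without ever being counted against the data.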

Random Forest:
Random Forest is an ensemble of decision trees. It can solve both regression and classification problems on large data sets. It also helps identify the most significant variables from thousands of input variables.

In general, Random Forest can be used in real-world applications such as:

Identify high-risk patients
Predict part failures in manufacturing
Predict loan defaulters
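
The ensemble idea can be sketched with very small trees. In the illustration below each "tree" is a one-split stump trained on a bootstrap sample with a random subset of features, and the forest predicts by majority vote. This is a deliberate simplification: real random forests grow deeper trees and resample features at every split. The sensor data, function names, and parameters are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def train_stump(rows, labels, features):
    """One-split tree: best (feature, threshold) among the given features."""
    best = None  # (weighted impurity, feature, threshold, left label, right label)
    for f in features:
        for thr in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            if not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, thr, majority(left), majority(right))
    if best is None:  # every sampled row had the same value: predict one label
        label = majority(labels)
        return (features[0], float("inf"), label, label)
    return best[1:]

def predict_stump(stump, row):
    f, thr, left_label, right_label = stump
    return left_label if row[f] <= thr else right_label

def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    k = max(1, int(n_features ** 0.5))  # features considered per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]       # bootstrap sample
        features = rng.sample(range(n_features), k)      # random feature subset
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], features))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return majority([predict_stump(s, row) for s in forest])

# Hypothetical sensor readings: [temperature, vibration] -> part status
rows = [[60, 0.2], [62, 0.3], [65, 0.25], [61, 0.1],
        [80, 0.9], [85, 0.8], [82, 0.95], [88, 0.7]]
labels = ["ok"] * 4 + ["fail"] * 4

forest = train_forest(rows, labels)
print(predict_forest(forest, [55, 0.05]), predict_forest(forest, [90, 1.0]))
```

Counting how often each feature is chosen across the trees gives a rough measure of which variables matter most, which is the basis of the variable-importance scores mentioned above.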

Deep learning is often described as the most powerful form of machine learning in use today.

In today’s Digital Transformation age, most businesses will tap into machine learning algorithms for their operational and customer-facing functions.
