In the earlier blog “text classification using machine learning”, we saw how difficult and time-consuming it is to select the best ML model and tune its parameters to achieve better accuracy. To overcome this problem, we will discuss an awesome Python library, “Lazy Predict”. This module helps us find the best model for classification and regression problems based on our data.
It provides LazyClassifier for classification problems and LazyRegressor for regression problems.
- Note: Lazy Predict requires a lot of computational power, and it was somewhat time-consuming for me to run it on high-dimensional data with many features.
Let us see how it works:
First, install this library on your local system:
pip install lazypredict
Dataset
Here we will not focus on the dataset or its feature extraction and transformation steps, as those were covered in the previous blog on “text classification using machine learning”.
To demonstrate Lazy Predict on classification and regression problems, we are using the “Drug type” and “Wine quality” datasets, both taken from kaggle.com.
Code
Importing required libraries
import lazypredict
import pandas as pd
from sklearn.model_selection import train_test_split
from lazypredict.Supervised import LazyClassifier, LazyRegressor
Importing data and LazyClassifier model fitting
classificationData = pd.read_csv("drugType.csv")
classificationData.head()
X = classificationData.drop(columns="Drug")
y = classificationData["Drug"]
# Splitting our data into a train and test set
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2,
random_state=42)
classifiers = LazyClassifier(ignore_warnings=True, custom_metric=None)
models, predictions = classifiers.fit(X_train, X_test, y_train, y_test)
print(models)
Here fit() returns two values; the first, models, is a table listing the different model names along with their accuracy metrics, which is what we print above.
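To get a quick shortlist, we can sort that table and keep the best performers. A minimal sketch, assuming the summary dataframe has an "Accuracy" column as in current Lazy Predict versions (column names may differ between versions):
top5 = models.sort_values(by="Accuracy", ascending=False).head(5)  # five best base models
print(top5)
print(list(top5.index))  # the index holds the model names, useful for later tuning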
Importing data and LazyRegressor model fitting
regressionData = pd.read_csv("winequality.csv")
regressionData.head()
X = regressionData.drop(columns="quality")
y = regressionData["quality"]
# Splitting our data into a train and test set
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2, random_state = 42)
regressors = LazyRegressor(ignore_warnings=True, custom_metric=None)
models, predictions = regressors.fit(X_train, X_test, y_train, y_test)
print(models)
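The custom_metric argument we left as None can optionally take a scoring function. A minimal sketch, under my assumption (based on the library's documented behaviour) that the function receives the true and predicted values and its result is appended as an extra column:
from sklearn.metrics import mean_absolute_error

# Hypothetical example: report MAE alongside the default regression metrics.
regressors_mae = LazyRegressor(ignore_warnings=True, custom_metric=mean_absolute_error)
models_mae, predictions_mae = regressors_mae.fit(X_train, X_test, y_train, y_test)
print(models_mae)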
Conclusion
Here, when we use the “Lazy Predict” library, many different models are fitted on our data, and the results give us accuracy metrics for each of them. From these results we can select, say, the top 5 base models by accuracy.
Later we can tune the parameters of those top models to get even better accuracy, as sketched below.
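As an illustration of that tuning step (a hedged sketch: the choice of RandomForestRegressor and its parameter grid are my own assumptions, not part of Lazy Predict's output), one of the top models from the wine-quality example could be tuned with scikit-learn's GridSearchCV:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical tuning of one top-ranked base model, reusing the wine-quality split from above.
param_grid = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 5, 10],
}
grid = GridSearchCV(RandomForestRegressor(random_state=42),
                    param_grid, cv=5, scoring="r2")
grid.fit(X_train, y_train)

print(grid.best_params_)           # best parameter combination found
print(grid.score(X_test, y_test))  # R-squared of the tuned model on the test set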
As this library runs many different models at once, it takes a lot of computational power. If your machine has limited resources, I would suggest using Google Colab.