Logistic regression is a simple supervised machine-learning algorithm used to predict a binary outcome such as “True” or “False”.
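At its core, logistic regression passes a weighted sum of the features through the sigmoid function, which squashes any real number into the range (0, 1); thresholding that probability at 0.5 gives the “True”/“False” answer. A minimal sketch:

```python
import numpy as np

# Sigmoid: maps any real-valued score z to a probability in (0, 1).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))                              # 0.5, the decision boundary
print(sigmoid(4.0) > 0.5, sigmoid(-4.0) > 0.5)   # True False
```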
I will walk through a simple “bank churn prediction” problem: predicting whether a customer will maintain the minimum balance in their account. The model relies on many features, such as the customer’s salary, age, etc.
I will be using a Jupyter notebook to run the project, with libraries such as pandas, NumPy, seaborn, matplotlib, and scikit-learn.
Step 1: Importing libraries and loading the dataset.
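This step can be sketched as follows; the small DataFrame below is a hypothetical stand-in for the real churn CSV, which in the notebook would be loaded with `pd.read_csv`:

```python
import pandas as pd
import numpy as np

# Stand-in for pd.read_csv("churn.csv") — a toy sample of the kind of
# columns a bank-churn dataset contains (filename is hypothetical).
df = pd.DataFrame({
    "age": [42, 35, np.nan, 51, 29],
    "salary": [50000.0, 62000.0, 58000.0, np.nan, 45000.0],
    "geography": ["France", "Spain", "France", "Germany", "Spain"],
    "churn": [1, 0, 0, 1, 0],   # target: 1 = customer left the bank
})
print(df.shape)   # (5, 4)
```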
Step 2: Cleaning the data.
Cleaning here means removing the null values present in the data by replacing them with new values. Those new values can be a placeholder such as “Unknown”, or values generated by statistical functions such as the mean or mode.
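A minimal sketch of that replacement, on a toy frame with missing values (numeric nulls filled with the column mean, categorical nulls with the mode):

```python
import pandas as pd
import numpy as np

# Toy frame with missing values, standing in for the real dataset.
df = pd.DataFrame({
    "age": [42.0, 35.0, np.nan, 51.0],
    "geography": ["France", None, "France", "Spain"],
})

# Numeric column: replace nulls with the mean of the non-null values.
df["age"] = df["age"].fillna(df["age"].mean())
# Categorical column: replace nulls with the most frequent value (mode).
df["geography"] = df["geography"].fillna(df["geography"].mode()[0])

print(df.isnull().sum().sum())   # 0 — no nulls remain
```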
Step 3: Encoding the data using a one-hot encoder.
A one-hot encoder is used to encode the columns with categorical data. The encoder creates a new binary column for every categorical value and marks a row with 1 in the column matching its value.
Step 4: Creating a new data frame by concatenating the encoded columns.
Only the categorical attributes should be encoded, so first we sort the columns into numerical and categorical. The target variable is the attribute that we have to predict, and it is kept aside from the features.
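Putting steps 3 and 4 together: encode only the categorical columns, then concatenate the result back onto the numeric ones. Here I use `pd.get_dummies`, a compact pandas equivalent of the one-hot encoder, on a toy frame:

```python
import pandas as pd

# Toy frame: two numeric columns (including the target) and one categorical.
df = pd.DataFrame({
    "age": [42, 35, 51],
    "geography": ["France", "Spain", "France"],
    "churn": [1, 0, 1],          # target variable
})

categorical_cols = ["geography"]
encoded = pd.get_dummies(df[categorical_cols])   # one column per category

# New data frame = original numeric columns + encoded columns.
df_encoded = pd.concat([df.drop(columns=categorical_cols), encoded], axis=1)
print(df_encoded.columns.tolist())
# ['age', 'churn', 'geography_France', 'geography_Spain']
```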
Step 5: Dividing the data into training and testing sets.
We train the model on data we call the training data, or training set. The training data already contains the actual values the model should predict, so the algorithm adjusts its parameters to fit the data in the training set.
But how do we know whether the model is good overall after training?
For that, we have a test set: separate data for which we also know the true values, but which was never shown to the model during training. If the trained model performs well on the test set too, we can say the machine-learning model generalizes well.
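The split is one call to scikit-learn’s `train_test_split`; here on a toy feature matrix, holding out 20% of the rows as the test set:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy feature matrix and binary target, standing in for the churn data.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# 80% train / 20% test; stratify keeps the class ratio equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)   # (8, 2) (2, 2)
```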
Step 6: Fitting the logistic regression model.
From scikit-learn’s linear model module we import logistic regression and fit the training data to the model.
Step 7: Predicting outputs from the model.
The model, trained on the training set, predicts the outputs for the test set. Each output is “0” or “1”.
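Fitting and predicting can be sketched together; the synthetic classification data below is a stand-in for the churn features:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data in place of the real churn set.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)       # learn coefficients from the training set

y_pred = model.predict(X_test)    # each prediction is 0 or 1
print(sorted(set(y_pred)))
```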
Step 8: Evaluation metrics.
The sklearn library provides predefined evaluation functions; I have imported and used some of them.
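For example, accuracy, the confusion matrix, and the classification report, shown here on hypothetical true labels and predictions:

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)

# Hypothetical true labels and model predictions for illustration.
y_test = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print(acc)   # 0.875 — 7 of 8 predictions are correct
print(cm)    # rows = true class, columns = predicted class
print(classification_report(y_test, y_pred))
```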
Step 9: Cross-validation.
Cross-validation is a technique in machine learning used to get a more reliable performance estimate: a single train/test split can give different accuracies depending on how the data happens to be divided. To tackle this, I have used the k-fold and stratified k-fold techniques.
Stratified cross-validation
These validations show that my model achieves about 82.5% accuracy on the given data, and is neither underfitted nor overfitted.
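The cross-validation step can be sketched as follows, again on synthetic stand-in data; stratification keeps the class ratio roughly equal in every fold:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=200, n_features=5, random_state=42)
model = LogisticRegression(max_iter=1000)

# Plain k-fold and stratified k-fold splitters with 5 folds each.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# One accuracy score per fold; their mean is the cross-validated estimate.
scores = cross_val_score(model, X, y, cv=skf)
print(scores.mean())
```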
The code can be accessed through the GitHub link: