How to Make New Good Friends Using Deep Learning

Sumeet More
6 min read · Dec 30, 2019

DISCLAIMER: Lately, we have seen people freak out over the AI vs. human debate. This article may strike some readers as offensive, disagreeable, or (worst of all) serious. It’s just a fun Medium article to demonstrate how one can identify problems that AI can solve. To make it a little more fun and relatable, we have used a friendship scenario as the example. Please adjust your expectations and interpretations accordingly. This article is purely for educational purposes.

This is my first time writing a Medium article collaboratively, and I am doing it with my very good friend Pranita.

So recently Pranita was telling me how hard she finds it to make new good friends now. When we are young we don’t have filters, so making friends in childhood was easy, and most of us still have a good connection with our school friends no matter how old we get. As we get older, making friends is still easy, but making good friends is a little tough, for multiple reasons. I found her points very valid. At the same time, I felt deep learning could help here by giving a second opinion. So I quickly asked Pranita whether she could give me a dataset whose columns are the qualities she considers important in a good friend, and she delivered.

Before I go ahead with the article, I would like to introduce the co-writer of this post, my talented friend Pranita. She holds a Bachelor’s in Electronics Engineering and recently graduated from the University of Illinois with an MIS degree. She specializes in Business/Data Analytics and Intelligence. The dataset for this article is purely her effort (so if you have questions about how to model your own datasets, she is the best person to talk to).

Let’s dive into how deep learning helped Pranita with her problem of finding good friends.

Approach

  • We framed this as a supervised learning problem: we have a small labelled dataset, and we want a deep learning model to learn the patterns in it.
  • Whether someone can easily sync up or relate with her or not is a classification problem.
  • Pranita provided the dataset.
  • I used an ANN (artificial neural network) as the deep learning model to solve this classification problem.

Enough talking 😋 Let’s help Pranita.

  • Load the dataset

You can clearly see what the dataset looks like. For her, someone counts as a good friend if that person respects others, is from the same college/school, loves pets, and so on.

Let’s first understand why we are using deep learning here in the first place. You might wonder: can’t we just hand-code each case for her columns and get the result? That makes sense, but let’s do the math. We have 8 columns, so even if every column were a simple yes/no there would be 2⁸ = 256 combinations, and in reality each column can take any value from 0.0 to 1.0. Things get wild quickly and we can’t hand-code every case, so deep learning is our answer to this problem. (I hope you are convinced now. And she had only 8 columns; some of us might have 100.)
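To put rough numbers on that argument, here is a tiny sketch. The choice of 11 levels (0.0, 0.1, ..., 1.0) per column is my own assumption for illustration; the real values are continuous, so the true number of cases is even larger.

```python
# Count how many distinct "cases" we would have to hand-code.
n_features = 8  # Pranita's dataset has 8 columns

# If every quality were a plain yes/no:
binary_cases = 2 ** n_features  # 256

# Assumption for illustration: each quality is rounded to one of
# 11 levels (0.0, 0.1, ..., 1.0). Real values are continuous.
levels_per_feature = 11
graded_cases = levels_per_feature ** n_features  # 214358881

print(f"yes/no columns:   {binary_cases} cases")
print(f"11-level columns: {graded_cases} cases")
```

Even with this coarse rounding we are past 200 million cases, which is why writing if/else rules by hand is not realistic.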

  • Visualize the dataset using pandas and matplotlib

In the first image, we check whether there are any missing or null values.

In the second image, we first check how many rows carry label 1 and how many carry label 0. From the bar chart we can see that she has given more data points for people she thinks might not sync well enough with her to become good friends, which is why the blue bar for label 0 is higher than the other one. Second, we look at which columns/qualities matter most to her, e.g. she thinks her good friends should respect others and help each other in their career growth.
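The checks described above can be sketched with pandas roughly like this. The column names and the five rows are made up for illustration (they are not Pranita's data), and using correlation with the label to rank the qualities is my assumption about one reasonable way to do it.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset (made-up names and values).
df = pd.DataFrame({
    "respect_for_others": [0.9, 0.2, 0.8, 0.4, 0.3],
    "cleanliness": [0.8, 0.1, 0.9, None, 0.2],
    "good_friend": [1, 0, 1, 0, 0],  # label column
})

# 1) Missing or null values per column.
missing_per_column = df.isnull().sum()

# 2) How many rows carry each label.
label_counts = df["good_friend"].value_counts()

# 3) Assumption: rank the qualities by their correlation with the label.
correlations = (
    df.corr()["good_friend"].drop("good_friend").sort_values(ascending=False)
)

print(missing_per_column)
print(label_counts)
print(correlations)

# With matplotlib installed, label_counts.plot(kind="bar") and
# correlations.plot(kind="bar") would draw the two bar charts.
```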

  • Preprocess the dataset

From this image, we can see that the dataset is divided into a training set and a test set. We also applied a scaler transformation so that all the features are on a comparable scale before they go into the model.
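A minimal sketch of this step with scikit-learn follows. The random data, the 70/30 split, and the choice of `MinMaxScaler` are all assumptions on my side; the article only says that the data was split and that a scaler was applied.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Random stand-in data: 50 people, 8 qualities, binary label.
rng = np.random.default_rng(42)
X = rng.random((50, 8))
y = rng.integers(0, 2, size=50)

# Assumption: hold out 30% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit the scaler on the training rows only, then reuse it on the
# test rows so no information from the test set leaks into training.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

print(X_train.shape, X_test.shape)
```

Fitting the scaler on the training set alone is the design choice worth copying, whichever scaler you pick.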

  • Build the model using TensorFlow 2 and Keras

I developed a basic ANN. The first dense layer has 8 neurons (any guesses why 8? Because her dataset has 8 columns). I added a second dense layer with 4 neurons and finally a third dense layer with 1 neuron, which gives 0 or 1 as the classification output. binary_crossentropy is used as the loss because this is a binary classification problem. The activation function is sigmoid. (In case you want to know why this activation function and not another one like 'relu', I can definitely write about it, but that is a little out of scope for this article.) Again, you can go crazy with hidden layers and add more. I will leave that up to you.

A careful reader might ask: what is this early_stop? One of the classic problems in building models is overfitting, a modeling error that occurs when a function is fit too closely to a limited set of data points. To avoid that problem, we use a mechanism called early stopping, which tells the model “dude, stop! Otherwise we might suffer from overfitting”.
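The network described above could look roughly like the sketch below. The layer sizes, the loss, the sigmoid activation, and the name early_stop come from the article; the `adam` optimizer, the patience of 25 epochs, and using sigmoid on every layer (not just the last one) are my assumptions.

```python
import tensorflow as tf


def build_model(n_features: int = 8) -> tf.keras.Model:
    """Small ANN: 8 -> 4 -> 1 neurons, all with sigmoid activation."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(8, activation="sigmoid"),
        tf.keras.layers.Dense(4, activation="sigmoid"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of label 1
    ])
    # Assumption: the optimizer is not named in the article.
    model.compile(loss="binary_crossentropy", optimizer="adam")
    return model


# Stop training once val_loss has not improved for `patience` epochs.
# Assumption: the patience value is not given in the article.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", mode="min", patience=25, verbose=1
)

model = build_model()
model.summary()

# Training would then look like this (arrays from the preprocessing step):
# model.fit(X_train, y_train, epochs=600,
#           validation_data=(X_test, y_test), callbacks=[early_stop])
```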

  • Visualize early stopping

Hey Sumeet, what is this val_loss? Basically, it is the loss on the validation data: it tells you how far the labels predicted by the model are from the correct labels, so lower is better. You can spot from the previous figure that we asked the model to train for 600 epochs, but in this figure the model stopped at approximately 250, because continuing to train on the given dataset beyond that point would risk overfitting.
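To make the stopping rule concrete, here is a plain-Python sketch of the idea behind patience-based early stopping. The helper name and the val_loss numbers are made up for illustration, and this is a simplification, not the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop,
    or None if the patience never runs out."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Made-up validation losses: improving at first, then getting worse.
val_losses = [0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.54]

stop = early_stopping_epoch(val_losses, patience=3)
print(stop)
```

With a larger patience the same curve would be allowed to recover at the last epoch, which is the trade-off: too little patience stops on noise, too much lets overfitting creep in.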

  • Model metrics and performance

If you look at the accuracy reported alongside the F1 scores, it is 83%, which is honestly not bad for a start, and we can definitely improve on it. Then we have the confusion matrix, which tells us that the model mispredicted label 0 five times and label 1 four times.
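For anyone wondering how accuracy and F1 fall out of a confusion matrix, here is a small sketch. Only the two error counts (5 and 4) come from the article; the counts of correct predictions below are made up, and reading "mispredicted label 0" as false positives is my assumption.

```python
def classification_metrics(tn, fp, fn, tp):
    """Compute basic metrics for the positive class (label 1)."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# fp=5 and fn=4 follow the article's error counts (under the assumption
# stated above); tn=30 and tp=14 are hypothetical.
metrics = classification_metrics(tn=30, fp=5, fn=4, tp=14)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```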

  • Take a new data point and predict on it. For this task, I gave the same data point to Pranita as well and compared her opinion with the model’s opinion.

Love for pets: 0.65, Same school/college: 0.65, Career: 0.65, Respect for others: 0.55, Movie enthusiast: 1, Music enthusiast: 0.34, Traveller: 0.7, Cleanliness: 0.25
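Here is a sketch of how a data point like this can be turned into model input and how the model's output becomes a 0/1 answer. The column names, their order, and the 0.5 threshold are my assumptions; the values are the ones listed above.

```python
# Assumption: the columns are in the same order as the list above.
FEATURE_ORDER = [
    "love_for_pets", "same_school_college", "career", "respect_for_others",
    "movie_enthusiast", "music_enthusiast", "traveller", "cleanliness",
]

new_person = {
    "love_for_pets": 0.65, "same_school_college": 0.65, "career": 0.65,
    "respect_for_others": 0.55, "movie_enthusiast": 1.0,
    "music_enthusiast": 0.34, "traveller": 0.7, "cleanliness": 0.25,
}


def to_feature_vector(person, order):
    """Arrange the values in the column order the model was trained on."""
    return [person[name] for name in order]


def to_label(probability, threshold=0.5):
    """Turn the sigmoid output into a 0/1 label (assumed 0.5 cut-off)."""
    return int(probability >= threshold)


features = to_feature_vector(new_person, FEATURE_ORDER)
print(features)
```

With a trained Keras model, `features` would first go through the same fitted scaler as the training data, then through `model.predict`, and the resulting probability through `to_label`.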

Pranita thought they might not sync well, and the model gave the same answer. Yay! I know Pranita in person and I know how much she cares about cleanliness, so this response is no surprise to me. Hahahah!

Just to recap, you can see which factors influence Pranita’s decision according to the model, and cleanliness is clearly one of the important ones for her.

This way the model can help Pranita with a second opinion.

I hope you guys enjoyed this article ✌️ Next time I am thinking of exploring GANs, another kind of deep learning model, with some cool use case 😎

Happy Learning and Coding.

You can extend this article to solve problems like which players should play together in a one-day or Test match to get the best team performance. How far you extend the technique described in this article is up to your imagination ☺
