Project One: Mode Develop

NOTE: This project is not perfect; I am still working on it as I learn Machine Learning.

With an upsurge in cyber-crime related to SIM card swap fraud in developing countries, fraud detection is a top priority. If we can estimate whether someone is likely to commit SIM card swap fraud, we can try to prevent it earlier. So I took up this challenge to see whether, given fake data, I can predict if someone will swap their SIM card or not. The project is not perfect, but I wanted to try it for learning purposes. I use only one model for this project: Logistic Regression (LR).

 

Intro

Predicting the likelihood of Sim Card Swap Fraud Occurrence.

  • Train and test the data samples
  • Normalize and summarize the data
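A minimal sketch of these two steps, using only the Python standard library. The helper names (`train_test_split`, `fit_min_max`, `apply_min_max`), the 25% test fraction, and the choice of min-max scaling are assumptions made for illustration; the project itself may split and normalize differently.

```python
import random

def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle indices with a fixed seed, then hold out a test portion."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [rows[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)

def fit_min_max(rows):
    """Record the per-column minimum and maximum (use training rows only)."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(rows, mins, maxs):
    """Rescale so the fitted minimum maps to 0 and the fitted maximum to 1.

    Constant columns map to 0. Rows outside the fitted range can fall
    outside [0, 1], which is expected for unseen test data.
    """
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]

# First eight rows of the sample table further down:
# age, complaints, monthly payment (USD), contacts.
X = [[30, 3, 120, 20], [18, 2, 60, 10], [60, 1, 180, 44], [25, 2, 200, 30],
     [30, 2, 300, 10], [45, 1, 100, 55], [50, 3, 120, 20], [78, 1, 60, 10]]
y = [0, 1, 0, 0, 1, 0, 0, 1]

(X_train, y_train), (X_test, y_test) = train_test_split(X, y)
mins, maxs = fit_min_max(X_train)
X_train_scaled = apply_min_max(X_train, mins, maxs)
X_test_scaled = apply_min_max(X_test, mins, maxs)
```

The scaling parameters are fitted on the training rows only and then reused for the test rows, so no information from the test set leaks into training.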

I decided to use Logistic Regression for this project.

  • Logistic Regression. Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, the logistic regression is a predictive analysis.
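To make the idea concrete, here is a small from-scratch sketch of logistic regression: the model passes a weighted sum of the features through the sigmoid function to get a probability, and the weights are fitted by gradient descent on the log loss. The function names, learning rate, epoch count, and toy data are all assumptions for illustration; this is not the code used in the project.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1), stably for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, weights, bias):
    """Estimated probability that the label is 1 for feature row x."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

def predict(x, weights, bias, threshold=0.5):
    """Binary decision: 1 if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(x, weights, bias) >= threshold else 0

def fit_logistic(rows, labels, learning_rate=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(x, weights, bias) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

# Made-up one-feature data: small values are class 0, large values are class 1.
X_toy = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y_toy = [0, 0, 0, 1, 1, 1]
weights, bias = fit_logistic(X_toy, y_toy)
low_class = predict([0.0], weights, bias)
high_class = predict([5.0], weights, bias)
```

In practice a library implementation is the more sensible choice; the sketch is only meant to show what the model computes.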

Data

Sample Dataset

There can be many reasons why someone would want to swap their SIM card; I will use just a few. A swap is represented by 1, and 0 represents not swapped. I created this data for this exercise.

Sample Output Representation:

 

| Swap | Not Swapped |
|------|-------------|
| 1    | 0           |


Graphing the features in a pair plot

 

 Fake Data 

    • No real data is available in this case, so I created my own. The table has a Location column, but I will not use Location as a feature, since many customers can live in the same location.
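| ID | Location | Age | Subscriber Complaints | Monthly Payments USD | Contacts | Swap Agent |
|----|----------|-----|-----------------------|----------------------|----------|------------|
| 1  | N/A | 30 | 3 | 120  | 20  | 0 |
| 2  | N/A | 18 | 2 | 60   | 10  | 1 |
| 3  | N/A | 60 | 1 | 180  | 44  | 0 |
| 4  | N/A | 25 | 2 | 200  | 30  | 0 |
| 5  | N/A | 30 | 2 | 300  | 10  | 1 |
| 6  | N/A | 45 | 1 | 100  | 55  | 0 |
| 7  | N/A | 50 | 3 | 120  | 20  | 0 |
| 8  | N/A | 78 | 1 | 60   | 10  | 1 |
| 9  | N/A | 26 | 1 | 180  | 44  | 0 |
| 10 | N/A | 23 | 2 | 200  | 30  | 0 |
| 11 | N/A | 33 | 2 | 300  | 10  | 1 |
| 12 | N/A | 45 | 1 | 120  | 55  | 0 |
| 13 | N/A | 30 | 2 | 800  | 100 | 0 |
| 14 | N/A | 33 | 6 | 60   | 90  | 1 |
| 15 | N/A | 26 | 1 | 180  | 44  | 0 |
| 16 | N/A | 23 | 2 | 200  | 30  | 0 |
| 17 | N/A | 33 | 2 | 30   | 10  | 1 |
| 18 | N/A | 45 | 1 | 1200 | 55  | 0 |
| 19 | N/A | 66 | 1 | 50   | 100 | 0 |
| 20 | N/A | 78 | 1 | 60   | 10  | 1 |
| 21 | N/A | 26 | 1 | 180  | 44  | 0 |
| 22 | N/A | 23 | 2 | 200  | 30  | 0 |
| 23 | N/A | 33 | 2 | 300  | 10  | 1 |
| 24 | N/A | 45 | 1 | 120  | 55  | 0 |
| 25 | N/A | 66 | 1 | 50   | 100 | 0 |
| 26 | N/A | 78 | 1 | 60   | 10  | 1 |
| 27 | N/A | 26 | 1 | 180  | 44  | 0 |
| 28 | N/A | 23 | 2 | 200  | 30  | 0 |
| 29 | N/A | 33 | 2 | 300  | 10  | 1 |
| 30 | N/A | 45 | 1 | 120  | 55  | 0 |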
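As a rough sketch, the table can be turned into feature rows and labels like this. Only the first five rows are reproduced to keep the example short. The short column names are placeholders chosen for this sketch, and treating the final 0/1 column as the swap label is an assumption based on the description above. ID and Location are skipped, since Location is not used.

```python
import csv
import io

# First five rows of the sample table, written as CSV.
# Column names are placeholders for this sketch, not the author's names.
SAMPLE_CSV = """id,location,age,complaints,monthly_payments_usd,contacts,swap
1,N/A,30,3,120,20,0
2,N/A,18,2,60,10,1
3,N/A,60,1,180,44,0
4,N/A,25,2,200,30,0
5,N/A,30,2,300,10,1
"""

FEATURES = ["age", "complaints", "monthly_payments_usd", "contacts"]

def load_rows(text):
    """Return (X, y): numeric feature rows and 0/1 labels; id and location are dropped."""
    X, y = [], []
    for record in csv.DictReader(io.StringIO(text)):
        X.append([int(record[name]) for name in FEATURES])
        y.append(int(record["swap"]))  # assumed: last column is the swap label
    return X, y

X, y = load_rows(SAMPLE_CSV)
```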

Results

  • 0.625. Not too bad, since the data is random.

ROC
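For readers new to ROC curves, the sketch below shows how the curve's points and the area under it (AUC) can be computed from true labels and predicted scores. The function names and the four example values are made up for illustration and are not this project's results.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs.

    The threshold is swept from the highest score to the lowest.
    Both classes must be present, otherwise a rate would divide by zero.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item in a run of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the piecewise-linear curve, by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Made-up labels and scores, for illustration only.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
points = roc_points(y_true, y_score)
area = auc(points)  # 0.75 for these four values
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of positives above negatives.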

 

Preview of Data

data.describe()
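The call above is the pandas summary method. As a hedged, library-free sketch of the kind of figures it reports, the helper below computes a subset of them (count, mean, sample standard deviation, min, median, max) for one column. `describe_column` is a hypothetical name, and the ages are taken from the first five rows of the sample table.

```python
import statistics

def describe_column(values):
    """Count, mean, sample standard deviation, min, median and max of one column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }

# Age values from the first five rows of the sample table.
ages = [30, 18, 60, 25, 30]
summary = describe_column(ages)
```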

Disclaimer:

This is not perfectly done; I am still learning Machine Learning. If you want to join me on this project, feel free to advise me or make suggestions.

 

License

Copyright [2018] [Madonah Syombua]

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

 

 

Get Code Here