[Kaggle Study] Titanic - Machine Learning from Disaster

I started a Kaggle study on my own.

I will write most of these posts in English to become more familiar with the language.

The first competition is ‘Titanic - Machine Learning from Disaster’, a very famous competition.

The kernel I am currently studying is ‘EDA To Prediction (DieTanic)’.

My ‘Today I Learned’ (TIL) notes will be updated in my kernel.

TIL

Sun Mar 14

  1. Check the competition overview and the contents of the kernel.
  2. Check the types of features:
    • Categorical Features : Sex, Embarked
    • Ordinal Features : Pclass
    • Continuous Features : Age
  3. Analyze a categorical feature (‘Sex’), an ordinal feature (‘Pclass’), and a continuous feature (‘Age’).
  4. Make a new categorical feature (‘Initial’) from ‘Name’.
  5. Fill missing continuous data (‘Age’) (see the sketch below).
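
A minimal sketch of steps 4 and 5 above, assuming the competition’s train.csv is in the working directory; the title-to-group mapping is abbreviated here for illustration.

```python
import pandas as pd

# Assumes the competition's train.csv has been downloaded to the working directory.
data = pd.read_csv('train.csv')

# Extract the title ('Initial') from 'Name', e.g. "Braund, Mr. Owen Harris" -> "Mr".
data['Initial'] = data['Name'].str.extract(r'([A-Za-z]+)\.', expand=False)

# Collapse rare titles into the common groups (abbreviated mapping for illustration).
data['Initial'] = data['Initial'].replace(
    ['Mlle', 'Mme', 'Ms', 'Dr', 'Major', 'Lady', 'Countess', 'Jonkheer',
     'Col', 'Rev', 'Capt', 'Sir', 'Don'],
    ['Miss', 'Miss', 'Miss', 'Mr', 'Mr', 'Mrs', 'Mrs', 'Other',
     'Other', 'Other', 'Mr', 'Mr', 'Mr'])

# Fill missing 'Age' values with the mean age of each title group.
data['Age'] = data['Age'].fillna(
    data.groupby('Initial')['Age'].transform('mean').round())

print(data.groupby('Initial')['Age'].mean())
print(data['Age'].isnull().sum())  # expect 0 after filling
```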

Mon Mar 15

  1. Analyze a categorical feature (‘Embarked’).
  2. Fill missing categorical data (‘Embarked’) (see the sketch below).
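
A minimal sketch of filling ‘Embarked’, again assuming train.csv is available locally; the kernel fills the two missing values with the most frequent port (‘S’).

```python
import pandas as pd

# Assumes the competition's train.csv is in the working directory.
data = pd.read_csv('train.csv')

# 'Embarked' has only two missing values, and most passengers boarded at 'S'
# (Southampton), so the gaps are filled with the mode.
print(data['Embarked'].value_counts(dropna=False))

data['Embarked'] = data['Embarked'].fillna(data['Embarked'].mode()[0])
print(data['Embarked'].isnull().sum())  # expect 0
```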

Tue Mar 16 ~ Wed Mar 18

  • I studied other topics.

Fri Mar 19

  1. Review what I have studied so far.

Sat Mar 20

  1. Analyze two discrete features (‘SibSp’, ‘Parch’).
  2. Practice feature engineering for categorical values (combine three categorical features (‘Pclass’, ‘Sex’, ‘Embarked’) -> make a new feature).
  3. Analyze a continuous feature (‘Fare’).
  4. Summarize observations.
  5. Check the correlations between the features (heatmap).
  6. Feature Engineering and Data Cleaning (see the sketch below)
    • Bin the ‘Age’ feature -> ‘Age_band’
    • Sum ‘Parch’ and ‘SibSp’ -> ‘Family_Size’
    • ‘Family_Size’ == 0 -> ‘Alone’
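
A minimal sketch of step 6 above, assuming train.csv; the five age bands follow the kernel’s cut points (ages 0 to 80 split into 16-year bands), but the binning is done here with pd.cut rather than the kernel’s explicit .loc assignments.

```python
import pandas as pd

# Assumes the competition's train.csv is in the working directory.
data = pd.read_csv('train.csv')

# Bin 'Age' into 5 bands of 16 years each (the maximum age in the data is 80).
data['Age_band'] = pd.cut(data['Age'],
                          bins=[0, 16, 32, 48, 64, 80],
                          labels=[0, 1, 2, 3, 4])

# Family size = siblings/spouses + parents/children aboard; 0 means travelling alone.
data['Family_Size'] = data['SibSp'] + data['Parch']
data['Alone'] = (data['Family_Size'] == 0).astype(int)

print(data[['Age', 'Age_band', 'SibSp', 'Parch', 'Family_Size', 'Alone']].head())
```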

Sun Mar 21

  1. Feature Engineering and Data Cleaning
    • ‘Fare’ -> ‘Fare_Range’ -> ‘Fare_cat’
  2. Label Encoding
    • ‘Sex’, ‘Embarked’, ‘Initial’
  3. Drop unneeded features
    • ‘Name’, ‘Age’, ‘Ticket’, ‘Fare’, ‘Cabin’, ‘Fare_Range’, ‘PassengerId’
  4. Heatmap
  5. Predictive Modeling (see the sketch below)
    • Logistic Regression
    • Support Vector Machines (linear and radial)
    • Random Forest
    • K-Nearest Neighbours
    • Naive Bayes
    • Decision Tree
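
A minimal sketch of the Fare binning, label encoding, feature dropping, and one of the baseline models (logistic regression), assuming train.csv; the preprocessing here is compressed, so features such as ‘Initial’ and ‘Age_band’ from the earlier steps are omitted.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Assumes the competition's train.csv is in the working directory.
data = pd.read_csv('train.csv')

# 'Fare' -> quartile ranges ('Fare_Range') -> ordinal 'Fare_cat' (0..3).
data['Fare_Range'] = pd.qcut(data['Fare'], 4)
data['Fare_cat'] = pd.qcut(data['Fare'], 4, labels=[0, 1, 2, 3]).astype(int)

# Label-encode the remaining categorical features.
data['Sex'] = data['Sex'].map({'male': 0, 'female': 1})
data['Embarked'] = data['Embarked'].fillna('S').map({'S': 0, 'C': 1, 'Q': 2})

# Drop features the model does not need (a shortened version of the kernel's list).
data = data.drop(['Name', 'Age', 'Ticket', 'Fare', 'Cabin', 'Fare_Range',
                  'PassengerId'], axis=1)

# Baseline model: logistic regression on a simple train/test split.
X = data.drop('Survived', axis=1)
y = data['Survived']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print('Accuracy:', accuracy_score(y_test, model.predict(X_test)))
```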

Mon Mar 22

  1. Cross Validation
    • K-Fold
    • Confusion Matrix
  2. Hyper-Parameter Tuning (see the sketch below)
    • GridSearchCV
  3. Ensembling
    • Voting Classifier
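
A minimal sketch of the K-Fold cross validation, grid search, and voting ensemble, assuming train.csv; the feature set and parameter grids are simplified for illustration.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score, cross_val_predict, GridSearchCV
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# Assumes the competition's train.csv; a compressed version of the earlier preprocessing.
data = pd.read_csv('train.csv')
data['Sex'] = data['Sex'].map({'male': 0, 'female': 1})
data['Embarked'] = data['Embarked'].fillna('S').map({'S': 0, 'C': 1, 'Q': 2})
data['Family_Size'] = data['SibSp'] + data['Parch']
X = data[['Pclass', 'Sex', 'Embarked', 'SibSp', 'Parch', 'Family_Size']]
y = data['Survived']

# 1. K-Fold cross validation (10 folds) and a confusion matrix from out-of-fold predictions.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
print('Random Forest CV accuracy:', cross_val_score(rf, X, y, cv=10).mean())
print(confusion_matrix(y, cross_val_predict(rf, X, y, cv=10)))

# 2. Hyper-parameter tuning with GridSearchCV (small illustrative grid for the rbf SVM).
grid = GridSearchCV(SVC(), {'C': [0.1, 1, 10], 'gamma': [0.1, 0.25, 0.5]}, cv=5)
grid.fit(X, y)
print('Best SVM params:', grid.best_params_, 'CV score:', grid.best_score_)

# 3. Ensembling: a soft-voting classifier over a few of the models above.
voting = VotingClassifier(
    estimators=[('rf', rf),
                ('svm', SVC(probability=True, **grid.best_params_)),
                ('lr', LogisticRegression(max_iter=1000))],
    voting='soft')
print('Voting CV accuracy:', cross_val_score(voting, X, y, cv=10).mean())
```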

Tue Mar 23

  1. Ensembling and Hyper-Parameter Tuning (see the sketch below)
    • Bagging
    • Boosting
    • Hyper-Parameter Tuning for AdaBoost
  2. Confusion Matrix for the Best Model
    • Feature Importance
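
A minimal sketch of bagging, boosting with AdaBoost (including a small grid search), and the confusion matrix and feature importances for the best model, assuming train.csv; the feature set and grids are simplified for illustration.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score, cross_val_predict, GridSearchCV
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

# Assumes the competition's train.csv; a compressed version of the earlier preprocessing.
data = pd.read_csv('train.csv')
data['Sex'] = data['Sex'].map({'male': 0, 'female': 1})
data['Embarked'] = data['Embarked'].fillna('S').map({'S': 0, 'C': 1, 'Q': 2})
data['Family_Size'] = data['SibSp'] + data['Parch']
X = data[['Pclass', 'Sex', 'Embarked', 'SibSp', 'Parch', 'Family_Size']]
y = data['Survived']

# Bagging: many KNN models trained on bootstrap samples, their predictions combined.
bagging = BaggingClassifier(KNeighborsClassifier(n_neighbors=3),
                            n_estimators=100, random_state=0)
print('Bagged KNN CV accuracy:', cross_val_score(bagging, X, y, cv=10).mean())

# Boosting: AdaBoost, with a small grid search over its main hyper-parameters.
grid = GridSearchCV(AdaBoostClassifier(random_state=0),
                    {'n_estimators': [100, 200], 'learning_rate': [0.05, 0.1, 0.5]},
                    cv=5)
grid.fit(X, y)
print('Best AdaBoost params:', grid.best_params_, 'CV score:', grid.best_score_)

# Confusion matrix for the best model (out-of-fold predictions) and its feature importances.
best = grid.best_estimator_
print(confusion_matrix(y, cross_val_predict(best, X, y, cv=10)))
print(pd.Series(best.feature_importances_, index=X.columns).sort_values(ascending=False))
```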

Sun Mar 28

  • Finish the ‘EDA To Prediction (DieTanic)’ kernel.
  • Add some content by referring to other blogs.
  • I will start again with the next kernel.
