[Kaggle Study] Titanic - Machine Learning from Disaster
I started a Kaggle study on my own.
I will write most of my posts in English to become more familiar with it.
The first competition is ‘Titanic - Machine Learning from Disaster’; it’s a very famous competition.
The kernel I’m currently studying is ‘EDA To Prediction (DieTanic)’.
What I learned each day (TIL) will be updated in my kernel.
TIL
Sun Mar 14
- Check the competition overview and the contents of the kernel.
- Check the types of features
- Categorical Features : Sex, Embarked
- Ordinal Features : Pclass
- Continuous Features : Age
- Analyze a categorical feature (‘Sex’), an ordinal feature (‘Pclass’), and a continuous feature (‘Age’).
- Make a new categorical feature (‘Initial’) from ‘Name’.
- Fill missing continuous data (‘Age’).
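The two steps above (extracting a title, ‘Initial’, from ‘Name’ and filling the missing ‘Age’ values) can be sketched in pandas. The rows below are toy stand-ins for the real training set, and filling by per-title mean is one simple variant of what the kernel does (the kernel hard-codes per-title mean ages):

```python
import pandas as pd
import numpy as np

# Toy data standing in for the Titanic training set (column names match the kernel).
df = pd.DataFrame({
    'Name': ['Braund, Mr. Owen Harris',
             'Cumings, Mrs. John Bradley',
             'Heikkinen, Miss. Laina',
             'Palsson, Master. Gosta Leonard'],
    'Age': [22.0, np.nan, 26.0, np.nan],
})

# Pull the title ("Initial") out of the name: the word right before the dot.
df['Initial'] = df['Name'].str.extract(r'([A-Za-z]+)\.', expand=False)

# Fill missing ages with the mean age of each title group,
# falling back to the overall mean when a title has no known ages.
per_title_mean = df.groupby('Initial')['Age'].transform('mean')
df['Age'] = df['Age'].fillna(per_title_mean.fillna(df['Age'].mean()))
```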
Mon Mar 15
- Analyze a categorical feature (‘Embarked’).
- Fill missing categorical data (‘Embarked’).
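Filling the missing ‘Embarked’ values with the most frequent port takes one line; the toy Series below is made up (the real column uses the same S/C/Q values):

```python
import pandas as pd
import numpy as np

# Toy 'Embarked' column with missing values (S/C/Q, as in the Titanic data).
embarked = pd.Series(['S', 'C', np.nan, 'S', 'Q', np.nan])

# Fill missing entries with the most frequent port, as the kernel does.
filled = embarked.fillna(embarked.mode()[0])
```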
Tue Mar 16 ~ Wed Mar 18
- I studied other topics.
Fri Mar 19
- Review what I have studied so far.
Sat Mar 20
- Analyze two discrete features (‘SibSp’, ‘Parch’).
- Practice feature engineering for categorical values (combine three categorical features (‘Pclass’, ‘Sex’, ‘Embarked’) -> make a new feature).
- Analyze a continuous feature (‘Fare’).
- Summarize observations.
- Check correlations between the features (heatmap)
- Feature Engineering and Data Cleaning
- Bin the ‘Age’ feature -> ‘Age_band’
- Sum ‘Parch’ and ‘SibSp’ -> ‘Family_Size’
- ‘Family_Size’ == 0 -> ‘Alone’
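The three feature-engineering steps above can be sketched on a few toy rows (the values are made up; the Age bins of 16 years match the kernel's ‘Age_band’):

```python
import pandas as pd

# Toy rows standing in for the training data.
df = pd.DataFrame({
    'Age': [4, 22, 38, 55, 71],
    'SibSp': [1, 0, 1, 0, 0],
    'Parch': [2, 0, 0, 0, 0],
})

# Bin 'Age' into five 16-year bands: (0,16], (16,32], ..., (64,80].
df['Age_band'] = pd.cut(df['Age'],
                        bins=[0, 16, 32, 48, 64, 80],
                        labels=[0, 1, 2, 3, 4]).astype(int)

# Family size = siblings/spouses + parents/children aboard.
df['Family_Size'] = df['SibSp'] + df['Parch']

# 'Alone' flags passengers travelling with no family.
df['Alone'] = (df['Family_Size'] == 0).astype(int)
```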
Sun Mar 21
- Feature Engineering and Data Cleaning
- ‘Fare’ -> ‘Fare_Range’ -> ‘Fare_cat’
- Label Encoding
- ‘Sex’, ‘Embarked’, ‘Initial’
- Drop unneeded features
- ‘Name’, ‘Age’, ‘Ticket’, ‘Fare’, ‘Cabin’, ‘Fare_Range’, ‘PassengerId’
- Heatmap
- Predictive Modeling
- Logistic Regression
- Support Vector Machines (linear and radial)
- Random Forest
- K-Nearest Neighbours
- Naive Bayes
- Decision Tree
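The cleaning steps of this day (fare categories, label encoding, dropping raw columns) feed directly into the models, and can be sketched on a toy frame. The values below are made up, and only one of the listed models is fitted here for brevity:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression

# Toy frame standing in for the cleaned training data.
df = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male', 'male', 'female'],
    'Embarked': ['S', 'C', 'S', 'Q', 'S', 'C'],
    'Fare': [7.25, 71.28, 7.92, 8.05, 53.10, 30.07],
    'Survived': [0, 1, 1, 0, 0, 1],
})

# 'Fare' -> quartile ranges ('Fare_Range') -> integer categories ('Fare_cat').
df['Fare_Range'] = pd.qcut(df['Fare'], 4)
df['Fare_cat'] = pd.qcut(df['Fare'], 4, labels=False)

# Label-encode the string features so the models can consume them.
for col in ['Sex', 'Embarked']:
    df[col] = LabelEncoder().fit_transform(df[col])

# Drop the raw columns that are no longer needed, then fit a model.
X = df.drop(['Fare', 'Fare_Range', 'Survived'], axis=1)
y = df['Survived']
model = LogisticRegression().fit(X, y)
```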
Mon Mar 22
- Cross Validation
- K-Fold
- Confusion Matrix
- Hyper-Parameter Tuning
- GridSearchCV
- Ensembling
- Voting Classifier
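The validation and tuning steps above can be sketched with scikit-learn; the data here is synthetic (a stand-in for the Titanic features), and the parameter grid is only an illustrative assumption:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score, GridSearchCV
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier

# Synthetic binary-classification data standing in for the Titanic features.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# K-Fold cross-validation (10 folds, as in the kernel).
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
svc_scores = cross_val_score(SVC(), X, y, cv=kfold)

# GridSearchCV over rbf-SVM hyper-parameters (C and gamma).
grid = GridSearchCV(SVC(), {'C': [0.1, 1, 10], 'gamma': [0.1, 0.5]}, cv=5)
grid.fit(X, y)

# Ensembling: a (hard) Voting Classifier over several base models.
vote = VotingClassifier([('lr', LogisticRegression(max_iter=1000)),
                         ('svc', SVC())])
vote.fit(X, y)
```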
Tue Mar 23
- Hyper-Parameter Tuning
- Bagging
- Boosting
- Hyper-Parameter Tuning for AdaBoost
- Confusion Matrix for the Best Model
- Feature Importance
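The AdaBoost tuning and feature-importance steps can be sketched as below, again on synthetic data; the parameter grid is an assumption, and a Random Forest stands in for the tree ensembles whose importances the kernel plots:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for the engineered Titanic features.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Hyper-parameter tuning for AdaBoost (n_estimators and learning_rate).
grid = GridSearchCV(AdaBoostClassifier(random_state=0),
                    {'n_estimators': [50, 100], 'learning_rate': [0.1, 1.0]},
                    cv=5)
grid.fit(X, y)
best = grid.best_estimator_  # the best model found by the search

# Feature importance from a tree ensemble (normalized to sum to 1).
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = rf.feature_importances_
```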
Sun Mar 28
- Finish the ‘EDA To Prediction (DieTanic)’ kernel.
- Add some content by referring to other blogs.
- I will move on to the next kernel.