Titanic dataset GitHub
The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. This data set provides information on the fate of passengers on the fatal maiden voyage of the ocean liner 'Titanic', summarized according to economic status (class), sex, age and survival.

Purpose: to perform a data analysis on a sample Titanic dataset with Python. This is a modified dataset from the datasets package. You can view a description of this dataset on the Kaggle website, where the data was obtained (https://www.kaggle.com/c/titanic/data); please refer to Kaggle for more details. I'm using this opportunity to explore a well-known set as a first post to my blog.

For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.

Training set (train.csv):
[ ] Update missing value for Cabin if some parent has Cabin information
[X] Convert Embarked from text to numeric
[X] Pack the families in groups (same cabin, same last name, ...)
[X] Feature engineering (new features from current ones)

Float and int missing values are replaced with -1; string missing values are replaced with 'Unknown'. The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
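The cleaning steps listed above (filling missing values with -1 or 'Unknown', converting Embarked to a number, and grouping families by last name) can be sketched in plain Python as follows. This is a minimal illustration, not the original author's code: the sample rows are invented, the helper names `fill_missing`, `encode_embarked` and `group_by_last_name` are hypothetical, and the specific integer codes for Embarked are an assumption, since the source does not say which numbers were used.

```python
from collections import defaultdict

# Invented sample rows (not real passenger records); None marks a missing value.
passengers = [
    {"Name": "Smith, Mr. John", "Age": 40.0, "Cabin": None, "Embarked": "S"},
    {"Name": "Smith, Mrs. Mary", "Age": 38.0, "Cabin": "C85", "Embarked": "C"},
    {"Name": "Jones, Miss. Anna", "Age": None, "Cabin": None, "Embarked": "Q"},
    {"Name": "Brown, Mr. Tom", "Age": 25.0, "Cabin": "B28", "Embarked": None},
]

# Assumed mapping: the source only says Embarked is converted to numeric.
EMBARKED_CODES = {"S": 0, "C": 1, "Q": 2}


def fill_missing(rows):
    """Replace missing numbers with -1 and missing strings with 'Unknown'."""
    # Infer each column's type from its first non-missing value.
    is_numeric = {}
    for row in rows:
        for key, value in row.items():
            if value is not None and key not in is_numeric:
                is_numeric[key] = isinstance(value, (int, float))
    filled = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if value is None:
                # Columns with no observed values fall back to 'Unknown'.
                value = -1 if is_numeric.get(key, False) else "Unknown"
            new_row[key] = value
        filled.append(new_row)
    return filled


def encode_embarked(rows):
    """Map Embarked letters to integers; unrecognised values become -1."""
    return [
        {**row, "Embarked": EMBARKED_CODES.get(row["Embarked"], -1)}
        for row in rows
    ]


def group_by_last_name(rows):
    """Group passenger names by the surname that precedes the comma."""
    groups = defaultdict(list)
    for row in rows:
        last_name = row["Name"].split(",")[0].strip()
        groups[last_name].append(row["Name"])
    return dict(groups)


cleaned = fill_missing(passengers)
encoded = encode_embarked(cleaned)
families = group_by_last_name(encoded)
print(encoded)
print(families)
```

Grouping on the last name alone is the simplest reading of "same lastname"; combining it with the cabin, as the checklist also suggests, would need an extra key and is left out here.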
They hope that Kagglers will help to create better models, find some unique insights and improve geo-analytics. This dataset has been analyzed to death with many more sophisticated measures than a logistic regression.

SMOTE: before balancing the data, we need to split the dataset into a training set (70%) and a testing set (30%); we'll apply SMOTE to the training set only.

Classification, Clustering, Causal-Discovery.

https://medium.com/@NotAyushXD/workflow-of-a-machine-learning-project-ec1dba419b94
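The 70/30 split followed by SMOTE on the training set only, as described above, might look roughly like the sketch below. It is a simplified, dependency-free illustration of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the implementation from any particular library. The function names `split_train_test` and `smote_oversample`, the toy data, and the choice of `k=3` are all assumptions made for this example.

```python
import math
import random


def split_train_test(X, y, test_fraction=0.3, seed=0):
    """Shuffle row indices and split into training and testing sets."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    rng.shuffle(indices)
    n_test = round(len(X) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [X[i] for i in train_idx],
        [X[i] for i in test_idx],
        [y[i] for i in train_idx],
        [y[i] for i in test_idx],
    )


def smote_oversample(X, y, minority_label, k=3, seed=0):
    """Add synthetic minority rows until both classes are the same size.

    Assumes a binary target. Each synthetic row lies on the line segment
    between a random minority row and one of its k nearest minority
    neighbours.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    X_out, y_out = list(X), list(y)
    n_needed = (len(X) - len(minority)) - len(minority)
    if len(minority) < 2:
        # Interpolation needs at least two minority rows.
        return X_out, y_out
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
        y_out.append(minority_label)
    return X_out, y_out


# Toy data: 20 rows, 2 numeric features, 5 positives (imbalanced).
X = [[float(i), float(i % 4)] for i in range(20)]
y = [1 if i % 4 == 0 else 0 for i in range(20)]

X_train, X_test, y_train, y_test = split_train_test(X, y)
X_bal, y_bal = smote_oversample(X_train, y_train, minority_label=1)
print(len(X_train), len(X_test), y_bal.count(0), y_bal.count(1))
```

Oversampling after the split keeps synthetic rows out of the testing set, so the model is evaluated on real observations only.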