├── 01_泰坦尼克号(titanic) ├── .ipynb_checkpoints │ └── kaggle泰坦尼克之灾-checkpoint.ipynb ├── README.md ├── datasets │ ├── test.csv │ └── train.csv └── kaggle泰坦尼克之灾.ipynb ├── 02_共享单车(Bike Sharing Demand) ├── .ipynb_checkpoints │ └── kaggle 共享单车项目可视化-checkpoint.ipynb ├── input │ ├── test.csv │ └── train.csv ├── kaggle 共享单车项目可视化.ipynb └── output │ ├── bike_predictions.csv │ └── sampleSubmission.csv ├── 03_Facebook广告投放(Sales Conversion Optimization) ├── README.md └── datasets │ └── KAG_conversion_data.csv └── README.md /01_泰坦尼克号(titanic)/.ipynb_checkpoints/kaggle泰坦尼克之灾-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# kaggle泰坦尼克之灾" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "步骤:\n", 15 | "\n", 16 | "一.导入数据包与数据集\n", 17 | "\n", 18 | "二.数据分析\n", 19 | "\n", 20 | "1. 总体预览:了解每列数据的含义,数据的格式等\n", 21 | "2. .数据初步分析,使用统计学与绘图:初步了解数据之间的相关性,为构造特征工程以及模型建立做准备\n", 22 | "\n", 23 | "三.特征工程\n", 24 | "\n", 25 | "1. 根据业务,常识,以及第二步的数据分析构造特征工程.\n", 26 | "2. 将特征转换为模型可以辨别的类型(如处理缺失值,处理文本进行等)\n", 27 | "\n", 28 | "四.模型选择\n", 29 | "\n", 30 | "1. 根据目标函数确定学习类型,是无监督学习还是监督学习,是分类问题还是回归问题等.\n", 31 | "2. 比较各个模型的分数,然后取效果较好的模型作为基础模型.\n", 32 | "\n", 33 | "五.修改特征和模型参数\n", 34 | "\n", 35 | "1. 可以通过添加或者修改特征,提高模型的上限.\n", 36 | "2. 通过修改模型的参数,是模型逼近上限" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "## 一.导入数据包与数据集" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": null, 49 | "metadata": { 50 | "collapsed": true 51 | }, 52 | "outputs": [], 53 | "source": [ 54 | "# 导入相关数据包\n", 55 | "import numpy as np\n", 56 | "import pandas as pd\n", 57 | "import seaborn as sns\n", 58 | "import matplotlib.pyplot as plt\n", 59 | "%matplotlib inline" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": null, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68 | "root_path = './datasets'\n", 69 | "\n", 70 | "train = pd.read_csv('%s/%s' % (root_path, 'train.csv'))\n", 71 | "test = pd.read_csv('%s/%s' % (root_path, 'test.csv'))" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "## 二.数据分析" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "### 1.总体预览\n", 86 | "了解每列数据的含义,数据的格式等" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": null, 92 | "metadata": {}, 93 | "outputs": [], 94 | "source": [ 95 | "train.head(5)" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": null, 101 | "metadata": {}, 102 | "outputs": [], 103 | "source": [ 104 | "# 返回每列列名,该列非nan值个数,以及该列类型\n", 105 | "train.info()" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": null, 111 | "metadata": {}, 112 | "outputs": [], 113 | "source": [ 114 | "test.info()" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": null, 120 | "metadata": {}, 121 | "outputs": [], 122 | "source": [ 123 | "# 返回数值型变量的统计量\n", 124 | "# train.describe(percentiles=[0.00, 0.25, 0.5, 0.75, 1.00])\n", 125 | "train.describe()" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": {}, 131 | "source": [ 132 | "## 2..数据初步分析,使用统计学与绘图\n", 133 | "目的:初步了解数据之间的相关性,为构造特征工程以及模型建立做准备" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": null, 139 | "metadata": {}, 140 | "outputs": [], 141 | "source": [ 142 | "#存活人数\n", 143 | "train['Survived'].value_counts()" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "#### 1)数值型数据协方差,corr()函数\n", 151 | "来个总览,快速了解个数据的相关性" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": null, 157 | "metadata": {}, 158 | "outputs": [], 159 | "source": [ 160 | "#相关性协方差表,corr()函数,返回结果接近0说明无相关性,大于0说明是正相关,小于0是负相关.\n", 161 | "train_corr = train.drop('PassengerId',axis=1).corr()\n", 162 | "train_corr" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": null, 168 | "metadata": {}, 169 | "outputs": [], 170 | "source": [ 171 | "#画出相关性热力图\n", 172 | "a = plt.subplots(figsize=(15,9))#调整画布大小\n", 173 | "a = sns.heatmap(train_corr, vmin=-1, vmax=1 , annot=True , square=True)#画热力图" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "#### 2)各个数据与结果的关系\n", 181 | "进一步探索分析各个数据与结果的关系" 182 | ] 183 | }, 184 | { 185 | "cell_type": "markdown", 186 | "metadata": {}, 187 | "source": [ 188 | "##### ①Pclass,乘客等级,1是最高级\n", 189 | "结果分析:可以看出Survived和Pclass在Pclass=1的时候有较强的相关性(>0.5),所以最终模型中包含该特征。" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "metadata": {}, 196 | "outputs": [], 197 | "source": [ 198 | "train.groupby(['Pclass'])['Pclass','Survived'].mean()" 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": null, 204 | "metadata": {}, 205 | "outputs": [], 206 | "source": [ 207 | "train[['Pclass','Survived']].groupby(['Pclass']).mean().plot.bar()" 208 | ] 209 | }, 210 | { 211 | "cell_type": "markdown", 212 | "metadata": {}, 213 | "source": [ 214 | "##### ②Sex,性别\n", 215 | "结果分析:女性有更高的活下来的概率(74%),保留该特征" 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": null, 221 | "metadata": {}, 222 | "outputs": [], 223 | "source": [ 224 | "train.groupby(['Sex'])['Sex','Survived'].mean()" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": null, 230 | "metadata": {}, 231 | "outputs": [], 232 | "source": [ 233 | "train[['Sex','Survived']].groupby(['Sex']).mean().plot.bar()" 234 | ] 235 | }, 236 | { 237 | "cell_type": "markdown", 238 | "metadata": {}, 239 | "source": [ 240 | "##### ③SibSp and Parch 兄妹配偶数/父母子女数\n", 241 | "结果分析:这些特征与特定的值没有相关性不明显,最好是由这些独立的特征派生出一个新特征或者一组新特征" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": null, 247 | "metadata": {}, 248 | "outputs": [], 249 | "source": [ 250 | "train[['SibSp','Survived']].groupby(['SibSp']).mean()" 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": null, 256 | "metadata": {}, 257 | "outputs": [], 258 | "source": [ 259 | "train[['Parch','Survived']].groupby(['Parch']).mean().plot.bar()" 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": {}, 265 | "source": [ 266 | "##### ④Age年龄与生存情况的分析.\n", 267 | "结果分析:由图,可以看到年龄是影响生存情况的. \n", 268 | "\n", 269 | "但是年龄是有大部分缺失值的,缺失值需要进行处理,可以使用填充或者模型预测." 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": null, 275 | "metadata": {}, 276 | "outputs": [], 277 | "source": [ 278 | "g = sns.FacetGrid(train, col='Survived',size=5)\n", 279 | "g.map(plt.hist, 'Age', bins=40)" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": null, 285 | "metadata": {}, 286 | "outputs": [], 287 | "source": [ 288 | "train.groupby(['Age'])['Survived'].mean().plot()" 289 | ] 290 | }, 291 | { 292 | "cell_type": "markdown", 293 | "metadata": {}, 294 | "source": [ 295 | "##### ⑤Embarked登港港口与生存情况的分析\n", 296 | "结果分析:C地的生存率更高,这个也应该保留为模型特征." 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": null, 302 | "metadata": {}, 303 | "outputs": [], 304 | "source": [ 305 | "sns.countplot('Embarked',hue='Survived',data=train)" 306 | ] 307 | }, 308 | { 309 | "cell_type": "markdown", 310 | "metadata": {}, 311 | "source": [ 312 | "⑥其他因素\n", 313 | "*在数据的Name项中包含了对该乘客的称呼,如Mr、Miss等,这些信息包含了乘客的年龄、性别、也有可能包含社会地位,如Dr、Lady、Major、Master等称呼。这一项不方便用图表展示,但是在特征工程中,我们会将其提取出来,然后放到模型中。\n", 314 | "\n", 315 | "*剩余因素还有船票价格、船舱号和船票号,这三个因素都可能会影响乘客在船中的位置从而影响逃生顺序,但是因为这三个因素与生存之间看不出明显规律,所以在后期模型融合时,将这些因素交给模型来决定其重要性。" 316 | ] 317 | }, 318 | { 319 | "cell_type": "markdown", 320 | "metadata": {}, 321 | "source": [ 322 | "## 三.特征工程" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": null, 328 | "metadata": { 329 | "collapsed": true 330 | }, 331 | "outputs": [], 332 | "source": [ 333 | "#先将数据集合并,一起做特征工程(注意,标准化的时候需要分开处理)\n", 334 | "#先将test补齐,然后通过pd.apped()合并\n", 335 | "test['Survived'] = 0\n", 336 | "train_test = train.append(test)" 337 | ] 338 | }, 339 | { 340 | "cell_type": "markdown", 341 | "metadata": {}, 342 | "source": [ 343 | "### ①Pclass,乘客等级,1是最高级\n", 344 | "两种方式:一是该特征不做处理,可以直接保留.二是再处理:也进行分列处理(比较那种方式模型效果更好,就选那种)" 345 | ] 346 | }, 347 | { 348 | "cell_type": "code", 349 | "execution_count": null, 350 | "metadata": { 351 | "collapsed": true 352 | }, 353 | "outputs": [], 354 | "source": [ 355 | "train_test = pd.get_dummies(train_test,columns=['Pclass'])" 356 | ] 357 | }, 358 | { 359 | "cell_type": "markdown", 360 | "metadata": {}, 361 | "source": [ 362 | "### ②Sex,性别\n", 363 | "无缺失值,直接分列" 364 | ] 365 | }, 366 | { 367 | "cell_type": "code", 368 | "execution_count": null, 369 | "metadata": { 370 | "collapsed": true 371 | }, 372 | "outputs": [], 373 | "source": [ 374 | "train_test = pd.get_dummies(train_test,columns=[\"Sex\"])" 375 | ] 376 | }, 377 | { 378 | "cell_type": "markdown", 379 | "metadata": {}, 380 | "source": [ 381 | "### ③SibSp and Parch 兄妹配偶数/父母子女数\n", 382 | "第一次直接保留:这两个都影响生存率,且都是数值型,先直接保存.\n", 383 | "\n", 384 | "第二次进行两项求和,并进行分列处理.(兄妹配偶数和父母子女数都是认识人的数量,所以总数可能也会更好)(模型结果提高到了)" 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": null, 390 | "metadata": { 391 | "collapsed": true 392 | }, 393 | "outputs": [], 394 | "source": [ 395 | "#这是剑豪模型后回来添加的新特征,模型的分数最终有所提高了.\n", 396 | "train_test['SibSp_Parch'] = train_test['SibSp'] + train_test['Parch']" 397 | ] 398 | }, 399 | { 400 | "cell_type": "code", 401 | "execution_count": null, 402 | "metadata": { 403 | "collapsed": true 404 | }, 405 | "outputs": [], 406 | "source": [ 407 | "train_test = pd.get_dummies(train_test,columns = ['SibSp','Parch','SibSp_Parch']) " 408 | ] 409 | }, 410 | { 411 | "cell_type": "markdown", 412 | "metadata": {}, 413 | "source": [ 414 | "### ④Embarked \n", 415 | "数据有极少量(3个)缺失值,但是在分列的时候,缺失值的所有列可以均为0,所以可以考虑不填充.\n", 416 | "\n", 417 | "另外,也可以考虑用测试集众数来填充.先找出众数,再采用df.fillna()方法" 418 | ] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "execution_count": null, 423 | "metadata": { 424 | "collapsed": true 425 | }, 426 | "outputs": [], 427 | "source": [ 428 | "train_test = pd.get_dummies(train_test,columns=[\"Embarked\"])" 429 | ] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "metadata": {}, 434 | "source": [ 435 | "### ⑤ Name\n", 436 | "1.在数据的Name项中包含了对该乘客的称呼,将这些关键词提取出来,然后做分列处理.(参考别人的)" 437 | ] 438 | }, 439 | { 440 | "cell_type": "code", 441 | "execution_count": null, 442 | "metadata": {}, 443 | "outputs": [], 444 | "source": [ 445 | "#从名字中提取出称呼: df['Name].str.extract()是提取函数,配合正则一起使用\n", 446 | "train_test['Name1'] = train_test['Name'].str.extract('.+,(.+)').str.extract( '^(.+?)\\.').str.strip()" 447 | ] 448 | }, 449 | { 450 | "cell_type": "code", 451 | "execution_count": null, 452 | "metadata": { 453 | "collapsed": true 454 | }, 455 | "outputs": [], 456 | "source": [ 457 | "#将姓名分类处理()\n", 458 | "train_test['Name1'].replace(['Capt', 'Col', 'Major', 'Dr', 'Rev'], 'Officer' , inplace = True)\n", 459 | "train_test['Name1'].replace(['Jonkheer', 'Don', 'Sir', 'the Countess', 'Dona', 'Lady'], 'Royalty' , inplace = True)\n", 460 | "train_test['Name1'].replace(['Mme', 'Ms', 'Mrs'], 'Mrs')\n", 461 | "train_test['Name1'].replace(['Mlle', 'Miss'], 'Miss')\n", 462 | "train_test['Name1'].replace(['Mr'], 'Mr' , inplace = True)\n", 463 | "train_test['Name1'].replace(['Master'], 'Master' , inplace = True)" 464 | ] 465 | }, 466 | { 467 | "cell_type": "code", 468 | "execution_count": null, 469 | "metadata": { 470 | "collapsed": true 471 | }, 472 | "outputs": [], 473 | "source": [ 474 | "#分列处理\n", 475 | "train_test = pd.get_dummies(train_test,columns=['Name1'])" 476 | ] 477 | }, 478 | { 479 | "cell_type": "markdown", 480 | "metadata": {}, 481 | "source": [ 482 | "##### 从姓名中提取出姓做特征" 483 | ] 484 | }, 485 | { 486 | "cell_type": "code", 487 | "execution_count": null, 488 | "metadata": { 489 | "collapsed": true 490 | }, 491 | "outputs": [], 492 | "source": [ 493 | "#从姓名中提取出姓\n", 494 | "train_test['Name2'] = train_test['Name'].apply(lambda x: x.split('.')[1])\n", 495 | "\n", 496 | "#计算数量,然后合并数据集\n", 497 | "Name2_sum = train_test['Name2'].value_counts().reset_index()\n", 498 | "Name2_sum.columns=['Name2','Name2_sum']\n", 499 | "train_test = pd.merge(train_test,Name2_sum,how='left',on='Name2')\n", 500 | "\n", 501 | "#由于出现一次时该特征时无效特征,用one来代替出现一次的姓\n", 502 | "train_test.loc[train_test['Name2_sum'] == 1 , 'Name2_new'] = 'one'\n", 503 | "train_test.loc[train_test['Name2_sum'] > 1 , 'Name2_new'] = train_test['Name2']\n", 504 | "del train_test['Name2']\n", 505 | "\n", 506 | "#分列处理\n", 507 | "train_test = pd.get_dummies(train_test,columns=['Name2_new'])" 508 | ] 509 | }, 510 | { 511 | "cell_type": "code", 512 | "execution_count": null, 513 | "metadata": { 514 | "collapsed": true 515 | }, 516 | "outputs": [], 517 | "source": [ 518 | "#删掉姓名这个特征\n", 519 | "del train_test['Name']" 520 | ] 521 | }, 522 | { 523 | "cell_type": "markdown", 524 | "metadata": {}, 525 | "source": [ 526 | "### ⑥ Fare\n", 527 | "该特征有缺失值,先找出缺失值的那调数据,然后用平均数填充" 528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": null, 533 | "metadata": {}, 534 | "outputs": [], 535 | "source": [ 536 | "#从上面的分析,发现该特征train集无miss值,test有一个缺失值,先查看\n", 537 | "train_test.loc[train_test[\"Fare\"].isnull()]" 538 | ] 539 | }, 540 | { 541 | "cell_type": "code", 542 | "execution_count": null, 543 | "metadata": {}, 544 | "outputs": [], 545 | "source": [ 546 | "#票价与pclass和Embarked有关,所以用train分组后的平均数填充\n", 547 | "train.groupby(by=[\"Pclass\",\"Embarked\"]).Fare.mean()" 548 | ] 549 | }, 550 | { 551 | "cell_type": "code", 552 | "execution_count": null, 553 | "metadata": { 554 | "collapsed": true 555 | }, 556 | "outputs": [], 557 | "source": [ 558 | "#用pclass=3和Embarked=S的平均数14.644083来填充\n", 559 | "train_test[\"Fare\"].fillna(14.435422,inplace=True)" 560 | ] 561 | }, 562 | { 563 | "cell_type": "markdown", 564 | "metadata": { 565 | "collapsed": true 566 | }, 567 | "source": [ 568 | "### ⑦ Ticket\n", 569 | "该列和名字做类似的处理,先提取,然后分列" 570 | ] 571 | }, 572 | { 573 | "cell_type": "code", 574 | "execution_count": null, 575 | "metadata": { 576 | "collapsed": true 577 | }, 578 | "outputs": [], 579 | "source": [ 580 | "#将Ticket提取字符列\n", 581 | "#str.isnumeric() 如果S中只有数字字符,则返回True,否则返回False\n", 582 | "train_test['Ticket_Letter'] = train_test['Ticket'].str.split().str[0]\n", 583 | "train_test['Ticket_Letter'] = train_test['Ticket_Letter'].apply(lambda x:np.nan if x.isnumeric() else x)\n", 584 | "train_test.drop('Ticket',inplace=True,axis=1)" 585 | ] 586 | }, 587 | { 588 | "cell_type": "code", 589 | "execution_count": null, 590 | "metadata": { 591 | "collapsed": true 592 | }, 593 | "outputs": [], 594 | "source": [ 595 | "#分列,此时nan值可以不做处理\n", 596 | "train_test = pd.get_dummies(train_test,columns=['Ticket_Letter'],drop_first=True)" 597 | ] 598 | }, 599 | { 600 | "cell_type": "markdown", 601 | "metadata": {}, 602 | "source": [ 603 | "### ⑧ Age\n", 604 | "1.该列有大量缺失值,考虑用一个回归模型进行填充.\n", 605 | "\n", 606 | "2.在模型修改的时候,考虑到年龄缺失值可能影响死亡情况,用年龄是否缺失值来构造新特征" 607 | ] 608 | }, 609 | { 610 | "cell_type": "code", 611 | "execution_count": null, 612 | "metadata": {}, 613 | "outputs": [], 614 | "source": [ 615 | "\"\"\"\n", 616 | "这是模型就好后回来增加的新特征\n", 617 | "考虑年龄缺失值可能影响死亡情况,数据表明,年龄缺失的死亡率为0.19.\n", 618 | "\"\"\"\n", 619 | "train_test.loc[train_test[\"Age\"].isnull()]['Survived'].mean()" 620 | ] 621 | }, 622 | { 623 | "cell_type": "code", 624 | "execution_count": null, 625 | "metadata": { 626 | "collapsed": true 627 | }, 628 | "outputs": [], 629 | "source": [ 630 | "# 所以用年龄是否缺失值来构造新特征\n", 631 | "train_test.loc[train_test[\"Age\"].isnull() ,\"age_nan\"] = 1\n", 632 | "train_test.loc[train_test[\"Age\"].notnull() ,\"age_nan\"] = 0\n", 633 | "train_test = pd.get_dummies(train_test,columns=['age_nan'])" 634 | ] 635 | }, 636 | { 637 | "cell_type": "markdown", 638 | "metadata": {}, 639 | "source": [ 640 | "### 利用其他组特征量,采用机器学习算法来预测Age" 641 | ] 642 | }, 643 | { 644 | "cell_type": "code", 645 | "execution_count": null, 646 | "metadata": {}, 647 | "outputs": [], 648 | "source": [ 649 | "train_test.info()" 650 | ] 651 | }, 652 | { 653 | "cell_type": "code", 654 | "execution_count": null, 655 | "metadata": {}, 656 | "outputs": [], 657 | "source": [ 658 | "#创建没有['Age','Survived']的数据集\n", 659 | "missing_age = train_test.drop(['Survived','Cabin'],axis=1)\n", 660 | "#将Age完整的项作为训练集、将Age缺失的项作为测试集。\n", 661 | "missing_age_train = missing_age[missing_age['Age'].notnull()]\n", 662 | "missing_age_test = missing_age[missing_age['Age'].isnull()]" 663 | ] 664 | }, 665 | { 666 | "cell_type": "code", 667 | "execution_count": null, 668 | "metadata": {}, 669 | "outputs": [], 670 | "source": [ 671 | "#构建训练集合预测集的X和Y值\n", 672 | "missing_age_X_train = missing_age_train.drop(['Age'], axis=1)\n", 673 | "missing_age_Y_train = missing_age_train['Age']\n", 674 | "missing_age_X_test = missing_age_test.drop(['Age'], axis=1)" 675 | ] 676 | }, 677 | { 678 | "cell_type": "code", 679 | "execution_count": null, 680 | "metadata": { 681 | "collapsed": true 682 | }, 683 | "outputs": [], 684 | "source": [ 685 | "# 先将数据标准化\n", 686 | "from sklearn.preprocessing import StandardScaler\n", 687 | "ss = StandardScaler()\n", 688 | "#用测试集训练并标准化\n", 689 | "ss.fit(missing_age_X_train)\n", 690 | "missing_age_X_train = ss.transform(missing_age_X_train)\n", 691 | "missing_age_X_test = ss.transform(missing_age_X_test)" 692 | ] 693 | }, 694 | { 695 | "cell_type": "code", 696 | "execution_count": null, 697 | "metadata": { 698 | "collapsed": true 699 | }, 700 | "outputs": [], 701 | "source": [ 702 | "#使用贝叶斯预测年龄\n", 703 | "from sklearn import linear_model\n", 704 | "lin = linear_model.BayesianRidge()" 705 | ] 706 | }, 707 | { 708 | "cell_type": "code", 709 | "execution_count": null, 710 | "metadata": {}, 711 | "outputs": [], 712 | "source": [ 713 | "lin.fit(missing_age_X_train,missing_age_Y_train)" 714 | ] 715 | }, 716 | { 717 | "cell_type": "code", 718 | "execution_count": null, 719 | "metadata": { 720 | "collapsed": true 721 | }, 722 | "outputs": [], 723 | "source": [ 724 | "#利用loc将预测值填入数据集\n", 725 | "train_test.loc[(train_test['Age'].isnull()), 'Age'] = lin.predict(missing_age_X_test)" 726 | ] 727 | }, 728 | { 729 | "cell_type": "code", 730 | "execution_count": null, 731 | "metadata": { 732 | "collapsed": true 733 | }, 734 | "outputs": [], 735 | "source": [ 736 | "#将年龄划分是个阶段10以下,10-18,18-30,30-50,50以上\n", 737 | "train_test['Age'] = pd.cut(train_test['Age'], bins=[0,10,18,30,50,100],labels=[1,2,3,4,5])\n", 738 | "\n", 739 | "train_test = pd.get_dummies(train_test,columns=['Age'])" 740 | ] 741 | }, 742 | { 743 | "cell_type": "markdown", 744 | "metadata": {}, 745 | "source": [ 746 | "### ⑨ Cabin\n", 747 | "cabin项缺失太多,只能将有无Cain首字母进行分类,缺失值为一类,作为特征值进行建模,也可以考虑直接舍去该特征" 748 | ] 749 | }, 750 | { 751 | "cell_type": "code", 752 | "execution_count": null, 753 | "metadata": { 754 | "collapsed": true 755 | }, 756 | "outputs": [], 757 | "source": [ 758 | "#cabin项缺失太多,只能将有无Cain首字母进行分类,缺失值为一类,作为特征值进行建模\n", 759 | "train_test['Cabin'] = train_test['Cabin'].apply(lambda x:str(x)[0] if pd.notnull(x) else x)\n", 760 | "train_test = pd.get_dummies(train_test,columns=['Cabin'])" 761 | ] 762 | }, 763 | { 764 | "cell_type": "code", 765 | "execution_count": null, 766 | "metadata": {}, 767 | "outputs": [], 768 | "source": [ 769 | "#cabin项缺失太多,只能将有无Cain首字母进行分类,\n", 770 | "train_test.loc[train_test[\"Cabin\"].isnull() ,\"Cabin_nan\"] = 1\n", 771 | "train_test.loc[train_test[\"Cabin\"].notnull() ,\"Cabin_nan\"] = 0\n", 772 | "train_test = pd.get_dummies(train_test,columns=['Cabin_nan'])\n", 773 | "train_test.drop('Cabin',axis=1,inplace=True)" 774 | ] 775 | }, 776 | { 777 | "cell_type": "markdown", 778 | "metadata": {}, 779 | "source": [ 780 | "### ⑩ 特征工程处理完了,划分数据集" 781 | ] 782 | }, 783 | { 784 | "cell_type": "code", 785 | "execution_count": null, 786 | "metadata": { 787 | "collapsed": true 788 | }, 789 | "outputs": [], 790 | "source": [ 791 | "train_data = train_test[:891]\n", 792 | "test_data = train_test[891:]\n", 793 | "train_data_X = train_data.drop(['Survived'],axis=1)\n", 794 | "train_data_Y = train_data['Survived']\n", 795 | "test_data_X = test_data.drop(['Survived'],axis=1)" 796 | ] 797 | }, 798 | { 799 | "cell_type": "markdown", 800 | "metadata": {}, 801 | "source": [ 802 | "## 四 建立模型" 803 | ] 804 | }, 805 | { 806 | "cell_type": "markdown", 807 | "metadata": {}, 808 | "source": [ 809 | "### 数据标准化\n", 810 | "1.线性模型需要用标准化的数据建模,而树类模型不需要标准化的数据\n", 811 | "\n", 812 | "2.处理标准化的时候,注意将测试集的数据transform到test集上" 813 | ] 814 | }, 815 | { 816 | "cell_type": "code", 817 | "execution_count": null, 818 | "metadata": { 819 | "collapsed": true 820 | }, 821 | "outputs": [], 822 | "source": [ 823 | "from sklearn.preprocessing import StandardScaler\n", 824 | "ss2 = StandardScaler()\n", 825 | "ss2.fit(train_data_X)\n", 826 | "train_data_X_sd = ss2.transform(train_data_X)\n", 827 | "test_data_X_sd = ss2.transform(test_data_X)" 828 | ] 829 | }, 830 | { 831 | "cell_type": "markdown", 832 | "metadata": {}, 833 | "source": [ 834 | "### 开始建模\n", 835 | "1.可选单个模型模型有随机森林,逻辑回归,svm,xgboost,gbdt等.\n", 836 | "\n", 837 | "2.也可以将多个模型组合起来,进行模型融合,比如voting,stacking等方法\n", 838 | "\n", 839 | "3.好的特征决定模型上限,好的模型和参数可以无线逼近上限.\n", 840 | "\n", 841 | "4.我测试了多种模型,模型结果最高的随机森林,最高有0.8." 842 | ] 843 | }, 844 | { 845 | "cell_type": "markdown", 846 | "metadata": {}, 847 | "source": [ 848 | "### 随机森林" 849 | ] 850 | }, 851 | { 852 | "cell_type": "code", 853 | "execution_count": null, 854 | "metadata": { 855 | "collapsed": true 856 | }, 857 | "outputs": [], 858 | "source": [ 859 | "from sklearn.ensemble import RandomForestClassifier\n", 860 | "\n", 861 | "rf = RandomForestClassifier(n_estimators=150,min_samples_leaf=3,max_depth=6,oob_score=True)\n", 862 | "rf.fit(train_data_X,train_data_Y)\n", 863 | "\n", 864 | "test[\"Survived\"] = rf.predict(test_data_X)\n", 865 | "RF = test[['PassengerId','Survived']].set_index('PassengerId')\n", 866 | "RF.to_csv('RF.csv')" 867 | ] 868 | }, 869 | { 870 | "cell_type": "code", 871 | "execution_count": null, 872 | "metadata": { 873 | "collapsed": true 874 | }, 875 | "outputs": [], 876 | "source": [ 877 | "#随机森林是随机选取特征进行建模的,所以每次的结果可能都有点小差异\n", 878 | "#如果分数足够好,可以将该模型保存起来,下次直接调出来使用0.81339 'rf10.pkl'\n", 879 | "from sklearn.externals import joblib\n", 880 | "joblib.dump(rf, 'rf10.pkl')" 881 | ] 882 | }, 883 | { 884 | "cell_type": "markdown", 885 | "metadata": {}, 886 | "source": [ 887 | "### LogisticRegression" 888 | ] 889 | }, 890 | { 891 | "cell_type": "code", 892 | "execution_count": null, 893 | "metadata": { 894 | "collapsed": true 895 | }, 896 | "outputs": [], 897 | "source": [ 898 | "from sklearn.linear_model import LogisticRegression\n", 899 | "\n", 900 | "lr = LogisticRegression()\n", 901 | "\n", 902 | "from sklearn.grid_search import GridSearchCV\n", 903 | "\n", 904 | "param = {'C':[0.001,0.01,0.1,1,10],\"max_iter\":[100,250]}\n", 905 | "\n", 906 | "clf = GridSearchCV(lr,param,cv=5,n_jobs=-1,verbose=1,scoring=\"roc_auc\")\n", 907 | "\n", 908 | "clf.fit(train_data_X_sd,train_data_Y)\n", 909 | "\n", 910 | "#打印参数的得分情况\n", 911 | "clf.grid_scores_\n", 912 | "\n", 913 | "#打印最佳参数\n", 914 | "clf.best_params_\n", 915 | "\n", 916 | "#将最佳参数传入训练模型\n", 917 | "lr = LogisticRegression(clf.best_params_)\n", 918 | "\n", 919 | "lr.fit(train_data_X_sd,train_data_Y)\n", 920 | "\n", 921 | "#预测结果\n", 922 | "lr.predict(test_data_X_sd)\n", 923 | "\n", 924 | "#打印结果\n", 925 | "test[\"Survived\"] = lr.predict(test_data_X_sd)\n", 926 | "LS = test[['PassengerId','Survived']].set_index('PassengerId')\n", 927 | "\n", 928 | "#输出结果\n", 929 | "LS.to_csv('LS5.csv')" 930 | ] 931 | }, 932 | { 933 | "cell_type": "markdown", 934 | "metadata": {}, 935 | "source": [ 936 | "### SVM" 937 | ] 938 | }, 939 | { 940 | "cell_type": "code", 941 | "execution_count": null, 942 | "metadata": { 943 | "collapsed": true 944 | }, 945 | "outputs": [], 946 | "source": [ 947 | "from sklearn import svm\n", 948 | "svc = svm.SVC()\n", 949 | "\n", 950 | "clf = GridSearchCV(svc,param,cv=5,n_jobs=-1,verbose=1,scoring=\"roc_auc\")\n", 951 | "clf.fit(train_data_X_sd,train_data_Y)\n", 952 | "\n", 953 | "clf.best_params_\n", 954 | "\n", 955 | "svc = svm.SVC(C=1,max_iter=250)\n", 956 | "\n", 957 | "#训练模型并预测结果\n", 958 | "svc.fit(train_data_X_sd,train_data_Y)\n", 959 | "svc.predict(test_data_X_sd)\n", 960 | "\n", 961 | "#打印结果\n", 962 | "test[\"Survived\"] = svc.predict(test_data_X_sd)\n", 963 | "SVM = test[['PassengerId','Survived']].set_index('PassengerId')\n", 964 | "SVM.to_csv('svm1.csv')" 965 | ] 966 | }, 967 | { 968 | "cell_type": "markdown", 969 | "metadata": {}, 970 | "source": [ 971 | "### xgboost" 972 | ] 973 | }, 974 | { 975 | "cell_type": "code", 976 | "execution_count": null, 977 | "metadata": { 978 | "collapsed": true 979 | }, 980 | "outputs": [], 981 | "source": [ 982 | "import xgboost as xgb\n", 983 | "\n", 984 | "xgb_model = xgb.XGBClassifier(n_estimators=150,min_samples_leaf=3,max_depth=6)\n", 985 | "\n", 986 | "xgb_model.fit(train_data_X,train_data_Y)\n", 987 | "\n", 988 | "test[\"Survived\"] = xgb_model.predict(test_data_X)\n", 989 | "XGB = test[['PassengerId','Survived']].set_index('PassengerId')\n", 990 | "XGB.to_csv('XGB5.csv')" 991 | ] 992 | }, 993 | { 994 | "cell_type": "markdown", 995 | "metadata": {}, 996 | "source": [ 997 | "### GBDT" 998 | ] 999 | }, 1000 | { 1001 | "cell_type": "code", 1002 | "execution_count": null, 1003 | "metadata": { 1004 | "collapsed": true 1005 | }, 1006 | "outputs": [], 1007 | "source": [ 1008 | "from sklearn.ensemble import GradientBoostingClassifier\n", 1009 | "\n", 1010 | "gbdt = GradientBoostingClassifier(learning_rate=0.7,max_depth=6,n_estimators=100,min_samples_leaf=2)\n", 1011 | "\n", 1012 | "gbdt.fit(train_data_X,train_data_Y)\n", 1013 | "\n", 1014 | "test[\"Survived\"] = gbdt.predict(test_data_X)\n", 1015 | "test[['PassengerId','Survived']].set_index('PassengerId').to_csv('gbdt3.csv')" 1016 | ] 1017 | }, 1018 | { 1019 | "cell_type": "markdown", 1020 | "metadata": {}, 1021 | "source": [ 1022 | "### 模型融合voting" 1023 | ] 1024 | }, 1025 | { 1026 | "cell_type": "code", 1027 | "execution_count": null, 1028 | "metadata": { 1029 | "collapsed": true 1030 | }, 1031 | "outputs": [], 1032 | "source": [ 1033 | "from sklearn.ensemble import VotingClassifier\n", 1034 | "\n", 1035 | "from sklearn.linear_model import LogisticRegression\n", 1036 | "lr = LogisticRegression(C=0.1,max_iter=100)\n", 1037 | "\n", 1038 | "import xgboost as xgb\n", 1039 | "xgb_model = xgb.XGBClassifier(max_depth=6,min_samples_leaf=2,n_estimators=100,num_round = 5)\n", 1040 | "\n", 1041 | "from sklearn.ensemble import RandomForestClassifier\n", 1042 | "rf = RandomForestClassifier(n_estimators=200,min_samples_leaf=2,max_depth=6,oob_score=True)\n", 1043 | "\n", 1044 | "from sklearn.ensemble import GradientBoostingClassifier\n", 1045 | "gbdt = GradientBoostingClassifier(learning_rate=0.1,min_samples_leaf=2,max_depth=6,n_estimators=100)\n", 1046 | "\n", 1047 | "vot = VotingClassifier(estimators=[('lr', lr), ('rf', rf),('gbdt',gbdt),('xgb',xgb_model)], voting='hard')\n", 1048 | "\n", 1049 | "vot.fit(train_data_X_sd,train_data_Y)\n", 1050 | "\n", 1051 | "vot.predict(test_data_X_sd)\n", 1052 | "\n", 1053 | "test[\"Survived\"] = vot.predict(test_data_X_sd)\n", 1054 | "test[['PassengerId','Survived']].set_index('PassengerId').to_csv('vot5.csv')" 1055 | ] 1056 | }, 1057 | { 1058 | "cell_type": "markdown", 1059 | "metadata": {}, 1060 | "source": [ 1061 | "### 模型融合stacking" 1062 | ] 1063 | }, 1064 | { 1065 | "cell_type": "code", 1066 | "execution_count": null, 1067 | "metadata": { 1068 | "collapsed": true 1069 | }, 1070 | "outputs": [], 1071 | "source": [ 1072 | "#划分train数据集,调用代码,把数据集名字转成和代码一样\n", 1073 | "X = train_data_X_sd\n", 1074 | "X_predict = test_data_X_sd\n", 1075 | "y = train_data_Y\n", 1076 | "\n", 1077 | "'''模型融合中使用到的各个单模型'''\n", 1078 | "from sklearn.linear_model import LogisticRegression\n", 1079 | "from sklearn import svm\n", 1080 | "import xgboost as xgb\n", 1081 | "from sklearn.ensemble import RandomForestClassifier\n", 1082 | "from sklearn.ensemble import GradientBoostingClassifier\n", 1083 | "\n", 1084 | "clfs = [LogisticRegression(C=0.1,max_iter=100),\n", 1085 | " xgb.XGBClassifier(max_depth=6,n_estimators=100,num_round = 5),\n", 1086 | " RandomForestClassifier(n_estimators=100,max_depth=6,oob_score=True),\n", 1087 | " GradientBoostingClassifier(learning_rate=0.3,max_depth=6,n_estimators=100)]\n", 1088 | "\n", 1089 | "#创建n_folds\n", 1090 | "from sklearn.cross_validation import StratifiedKFold\n", 1091 | "n_folds = 5\n", 1092 | "skf = list(StratifiedKFold(y, n_folds))\n", 1093 | "\n", 1094 | "#创建零矩阵\n", 1095 | "dataset_blend_train = np.zeros((X.shape[0], len(clfs)))\n", 1096 | "dataset_blend_test = np.zeros((X_predict.shape[0], len(clfs)))\n", 1097 | "\n", 1098 | "#建立模型\n", 1099 | "for j, clf in enumerate(clfs):\n", 1100 | " '''依次训练各个单模型'''\n", 1101 | " # print(j, clf)\n", 1102 | " dataset_blend_test_j = np.zeros((X_predict.shape[0], len(skf)))\n", 1103 | " for i, (train, test) in enumerate(skf):\n", 1104 | " '''使用第i个部分作为预测,剩余的部分来训练模型,获得其预测的输出作为第i部分的新特征。'''\n", 1105 | " # print(\"Fold\", i)\n", 1106 | " X_train, y_train, X_test, y_test = X[train], y[train], X[test], y[test]\n", 1107 | " clf.fit(X_train, y_train)\n", 1108 | " y_submission = clf.predict_proba(X_test)[:, 1]\n", 1109 | " dataset_blend_train[test, j] = y_submission\n", 1110 | " dataset_blend_test_j[:, i] = clf.predict_proba(X_predict)[:, 1]\n", 1111 | " '''对于测试集,直接用这k个模型的预测值均值作为新的特征。'''\n", 1112 | " dataset_blend_test[:, j] = dataset_blend_test_j.mean(1)\n", 1113 | "\n", 1114 | "# 用建立第二层模型\n", 1115 | "clf2 = LogisticRegression(C=0.1,max_iter=100)\n", 1116 | "clf2.fit(dataset_blend_train, y)\n", 1117 | "y_submission = clf2.predict_proba(dataset_blend_test)[:, 1]\n", 1118 | "\n", 1119 | "\n", 1120 | "test = pd.read_csv(\"test.csv\")\n", 1121 | "test[\"Survived\"] = clf2.predict(dataset_blend_test)\n", 1122 | "test[['PassengerId','Survived']].set_index('PassengerId').to_csv('stack3.csv')" 1123 | ] 1124 | } 1125 | ], 1126 | "metadata": { 1127 | "kernelspec": { 1128 | "display_name": "Python 3", 1129 | "language": "python", 1130 | "name": "python3" 1131 | }, 1132 | "language_info": { 1133 | "codemirror_mode": { 1134 | "name": "ipython", 1135 | "version": 3 1136 | }, 1137 | "file_extension": ".py", 1138 | "mimetype": "text/x-python", 1139 | "name": "python", 1140 | "nbconvert_exporter": "python", 1141 | "pygments_lexer": "ipython3", 1142 | "version": "3.6.1" 1143 | } 1144 | }, 1145 | "nbformat": 4, 1146 | "nbformat_minor": 2 1147 | } 1148 | -------------------------------------------------------------------------------- /01_泰坦尼克号(titanic)/README.md: -------------------------------------------------------------------------------- 1 | # **泰坦尼克号** 2 | 3 | ## 比赛说明 4 | 5 | * [**泰坦尼克号**](https://www.kaggle.com/c/titanic) 的沉没是历史上最臭名昭着的沉船之一。1912年4月15日,在首航期间,泰坦尼克号撞上一座冰山后沉没,2224名乘客和机组人员中有1502人遇难。这一耸人听闻的悲剧震撼了国际社会,并导致了更好的船舶安全条例。 6 | * 沉船导致生命损失的原因之一是乘客和船员没有足够的救生艇。虽然幸存下来的运气有一些因素,但有些人比其他人更有可能生存,比如妇女,儿童和上层阶级。 7 | * 在这个挑战中,我们要求你完成对什么样的人可能生存的分析。特别是,我们要求你运用机器学习的工具来预测哪些乘客幸存下来的悲剧。 8 | 9 | ## 比赛分析 10 | 11 | - 分类问题:预测的是`生`与`死`的问题 12 | 13 | - 常用算法: K近邻(KNN)、 逻辑回归(LogisticRegression)、 随机森林(RandomForest)、 支持向量机(SVM)、 xgboost、 GBDT 14 | 15 | 16 | #### 步骤: 17 | **一. 数据分析** 18 | 19 | 1.下载并加载数据 20 | 21 | 2.总体预览:了解每列数据的含义,数据的格式等 22 | 23 | 3.数据初步分析,使用统计学与绘图:初步了解数据之间的相关性,为构造特征工程以及模型建立做准备 24 | 25 | **二. 特征工程** 26 | 27 | 1.根据业务,常识,以及第二步的数据分析构造特征工程. 28 | 29 | 2.将特征转换为模型可以辨别的类型(如处理缺失值,处理文本进行等) 30 | 31 | **三. 模型选择** 32 | 33 | 1.根据目标函数确定学习类型,是无监督学习还是监督学习,是分类问题还是回归问题等. 34 | 35 | 2.比较各个模型的分数,然后取效果较好的模型作为基础模型. 36 | 37 | **四. 模型融合** 38 | 39 | **五. 修改特征和模型参数** 40 | 41 | 1.可以通过添加或者修改特征,提高模型的上限. 42 | 43 | 2.通过修改模型的参数,是模型逼近上限 44 | 45 | 46 | ## 一. 数据分析 47 | 48 | ### 数据下载和加载 49 | 50 | * 数据集下载地址: 51 | 52 | ---------- 53 | 54 | # 导入相关数据包 55 | import numpy as np 56 | import pandas as pd 57 | import seaborn as sns 58 | import matplotlib.pyplot as plt 59 | %matplotlib inline 60 | 61 | root_path = '/datasets' 62 | 63 | train = pd.read_csv('%s/%s' % (root_path, 'train.csv')) 64 | test = pd.read_csv('%s/%s' % (root_path, 'test.csv')) 65 | 66 | ### 特征说明 67 | 68 | - | survival | 生存 | 0 = No, 1 = Yes | 69 | - | pclass | 票类别-社会地位 | 1 = 1st, 2 = 2nd, 3 = 3rd | 70 | - | sex | 性别 | 71 | - | Age | 年龄 | 72 | - | sibsp | 兄弟姐妹/配偶 | 73 | - | parch | 父母/孩子的数量 | 74 | - | ticket | 票号 | 75 | - | fare | 乘客票价 | 76 | - | cabin | 客舱号码 | 77 | - | embarked | 登船港口 | C=Cherbourg, Q=Queenstown, S=Southampton | 78 | 79 | ---------- 80 | ### 特征详情 81 | 82 | train.head(5) 83 | 84 | ---------- 85 | 86 | # 返回每列列名,该列非nan值个数,以及该列类型 87 | train.info() 88 | 89 | 90 | RangeIndex: 891 entries, 0 to 890 91 | Data columns (total 12 columns): 92 | PassengerId 891 non-null int64 93 | Survived 891 non-null int64 94 | Pclass 891 non-null int64 95 | Name 891 non-null object 96 | Sex 891 non-null object 97 | Age 714 non-null float64 98 | SibSp 891 non-null int64 99 | Parch 891 non-null int64 100 | Ticket 891 non-null object 101 | Fare 891 non-null float64 102 | Cabin 204 non-null object 103 | Embarked 889 non-null object 104 | dtypes: float64(2), int64(5), object(5) 105 | memory usage: 83.6+ KB 106 | 107 | 108 | ---------- 109 | test.info() 110 | 111 | 112 | RangeIndex: 418 entries, 0 to 417 113 | Data columns (total 11 columns): 114 | PassengerId 418 non-null int64 115 | Pclass 418 non-null int64 116 | Name 418 non-null object 117 | Sex 418 non-null object 118 | Age 332 non-null float64 119 | SibSp 418 non-null int64 120 | Parch 418 non-null int64 121 | Ticket 418 non-null object 122 | Fare 417 non-null float64 123 | Cabin 91 non-null object 124 | Embarked 418 non-null object 125 | dtypes: float64(2), int64(4), object(5) 126 | memory usage: 36.0+ KB 127 | 128 | 129 | ---------- 130 | # 返回数值型变量的统计量 131 | # train.describe(percentiles=[0.00, 0.25, 0.5, 0.75, 1.00]) 132 | train.describe() 133 | 134 | ### 特征分析(统计学与绘图) 135 | 目的:初步了解数据之间的相关性,为构造特征工程以及模型建立做准备 136 | 137 | 138 | # 存活人数 139 | train['Survived'].value_counts() 140 | 141 | 0 549 142 | 1 342 143 | Name: Survived, dtype: int64 144 | 145 | 146 | **数值型数据协方差,corr()函数** 147 | 148 | 来个总览,快速了解个数据的相关性 149 | 150 | # 相关性协方差表,corr()函数,返回结果接近0说明无相关性,大于0说明是正相关,小于0是负相关. 151 | train_corr = train.drop('PassengerId',axis=1).corr() 152 | train_corr 153 | 154 | 155 | ---------- 156 | # 画出相关性热力图 157 | a = plt.subplots(figsize=(15,9))#调整画布大小 158 | a = sns.heatmap(train_corr, vmin=-1, vmax=1 , annot=True , square=True)#画热力图 159 | 160 | **各个数据与结果的关系** 161 | 进一步探索分析各个数据与结果的关系 162 | 163 | - ① Pclass,乘客等级,1是最高级 164 | 165 | 166 | 结果分析:可以看出Survived和Pclass在Pclass=1的时候有较强的相关性(>0.5),所以最终模型中包含该特征。 167 | 168 | train.groupby(['Pclass'])['Pclass','Survived'].mean() 169 | 170 | 171 | ---------- 172 | train[['Pclass','Survived']].groupby(['Pclass']).mean().plot.bar() 173 | 174 | - ② Sex,性别 175 | 176 | 结果分析:女性有更高的活下来的概率(74%),保留该特征 177 | 178 | train.groupby(['Sex'])['Sex','Survived'].mean() 179 | 180 | 181 | ---------- 182 | - ③ SibSp and Parch 兄妹配偶数/父母子女数 183 | 184 | 结果分析:这些特征与特定的值没有相关性不明显,最好是由这些独立的特征派生出一个新特征或者一组新特征 185 | 186 | train[['SibSp','Survived']].groupby(['SibSp']).mean() 187 | 188 | 189 | ---------- 190 | train[['Parch','Survived']].groupby(['Parch']).mean().plot.bar() 191 | 192 | ---------- 193 | 194 | - ④ Age年龄与生存情况的分析. 195 | 196 | 结果分析:由图,可以看到年龄是影响生存情况的. 197 | 198 | 但是年龄是有大部分缺失值的,缺失值需要进行处理,可以使用填充或者模型预测. 199 | 200 | g = sns.FacetGrid(train, col='Survived',size=5) 201 | g.map(plt.hist, 'Age', bins=40) 202 | 203 | 204 | 205 | 206 | ---------- 207 | train.groupby(['Age'])['Survived'].mean().plot() 208 | 209 | 210 | 211 | 212 | ---------- 213 | - ⑤ Embarked登港港口与生存情况的分析 214 | 215 | 216 | 结果分析:C地的生存率更高,这个也应该保留为模型特征. 217 | 218 | sns.countplot('Embarked',hue='Survived',data=train) 219 | 220 | 221 | 222 | - ⑥ 其他因素 223 | 224 | 在数据的Name项中包含了对该乘客的称呼,如Mr、Miss等,这些信息包含了乘客的年龄、性别、也有可能包含社会地位,如Dr、Lady、Major、Master等称呼。这一项不方便用图表展示,但是在特征工程中,我们会将其提取出来,然后放到模型中。 225 | 226 | 剩余因素还有船票价格、船舱号和船票号,这三个因素都可能会影响乘客在船中的位置从而影响逃生顺序,但是因为这三个因素与生存之间看不出明显规律,所以在后期模型融合时,将这些因素交给模型来决定其重要性。 227 | 228 | ## 二. 特征工程 229 | 230 | #先将数据集合并,一起做特征工程(注意,标准化的时候需要分开处理) 231 | #先将test补齐,然后通过pd.apped()合并 232 | test['Survived'] = 0 233 | train_test = train.append(test) 234 | 235 | ### 特征处理 236 | 237 | * ① Pclass,乘客等级,1是最高级 238 | 239 | 两种方式:一是该特征不做处理,可以直接保留.二是再处理:也进行分列处理(比较那种方式模型效果更好,就选那种) 240 | 241 | 242 | train_test = pd.get_dummies(train_test,columns=['Pclass']) 243 | 244 | 245 | * ② Sex,性别 246 | 247 | 无缺失值,直接分列 248 | 249 | 250 | train_test = pd.get_dummies(train_test,columns=["Sex"]) 251 | 252 | 253 | * ③ SibSp and Parch 兄妹配偶数/父母子女数 254 | 255 | 第一次直接保留:这两个都影响生存率,且都是数值型,先直接保存. 256 | 257 | 第二次进行两项求和,并进行分列处理.(兄妹配偶数和父母子女数都是认识人的数量,所以总数可能也会更好)(模型结果提高到了) 258 | 259 | #这是剑豪模型后回来添加的新特征,模型的分数最终有所提高了. 260 | train_test['SibSp_Parch'] = train_test['SibSp'] + train_test['Parch'] 261 | 262 | ---------- 263 | 264 | 265 | 266 | train_test = pd.get_dummies(train_test,columns['SibSp','Parch','SibSp_Parch']) 267 | 268 | 269 | * ④ Embarked 270 | 271 | 数据有极少量(3个)缺失值,但是在分列的时候,缺失值的所有列可以均为0,所以可以考虑不填充. 272 | 273 | 另外,也可以考虑用测试集众数来填充.先找出众数,再采用df.fillna()方法 274 | 275 | 276 | train_test = pd.get_dummies(train_test,columns=["Embarked"]) 277 | 278 | 279 | * ⑤ Name 280 | 281 | 1.在数据的Name项中包含了对该乘客的称呼,将这些关键词提取出来,然后做分列处理.(参考别人的) 282 | 283 | 284 | #从名字中提取出称呼: df['Name].str.extract()是提取函数,配合正则一起使用 285 | train_test['Name1'] = train_test['Name'].str.extract('.+,(.+)').str.extract( '^(.+?)\.').str.strip() 286 | 287 | 288 | ---------- 289 | 290 | #将姓名分类处理() 291 | train_test['Name1'].replace(['Capt', 'Col', 'Major', 'Dr', 'Rev'], 'Officer' , inplace = True) 292 | train_test['Name1'].replace(['Jonkheer', 'Don', 'Sir', 'the Countess', 'Dona', 'Lady'], 'Royalty' , inplace = True) 293 | train_test['Name1'].replace(['Mme', 'Ms', 'Mrs'], 'Mrs') 294 | train_test['Name1'].replace(['Mlle', 'Miss'], 'Miss') 295 | train_test['Name1'].replace(['Mr'], 'Mr' , inplace = True) 296 | train_test['Name1'].replace(['Master'], 'Master' , inplace = True) 297 | 298 | ---------- 299 | 300 | #分列处理 301 | train_test = pd.get_dummies(train_test,columns=['Name1']) 302 | 303 | ---------- 304 | 305 | 306 | 2.从姓名中提取出姓做特征 307 | 308 | 309 | ---------- 310 | 311 | #从姓名中提取出姓 312 | train_test['Name2'] = train_test['Name'].apply(lambda x: x.split('.')[1]) 313 | 314 | #计算数量,然后合并数据集 315 | Name2_sum = train_test['Name2'].value_counts().reset_index() 316 | Name2_sum.columns=['Name2','Name2_sum'] 317 | train_test = pd.merge(train_test,Name2_sum,how='left',on='Name2') 318 | 319 | #由于出现一次时该特征时无效特征,用one来代替出现一次的姓 320 | train_test.loc[train_test['Name2_sum'] == 1 , 'Name2_new'] = 'one' 321 | train_test.loc[train_test['Name2_sum'] > 1 , 'Name2_new'] = train_test['Name2'] 322 | del train_test['Name2'] 323 | 324 | #分列处理 325 | train_test = pd.get_dummies(train_test,columns=['Name2_new']) 326 | 327 | 328 | ---------- 329 | 330 | 331 | #删掉姓名这个特征 332 | del train_test['Name'] 333 | 334 | 335 | * ⑥ Fare 336 | 337 | 该特征有缺失值,先找出缺失值的那调数据,然后用平均数填充 338 | 339 | 340 | #从上面的分析,发现该特征train集无miss值,test有一个缺失值,先查看 341 | train_test.loc[train_test["Fare"].isnull()] 342 | 343 | 344 | ---------- 345 | #票价与pclass和Embarked有关,所以用train分组后的平均数填充 346 | train.groupby(by=["Pclass","Embarked"]).Fare.mean() 347 | 348 | Pclass Embarked 349 | 1 C 104.718529 350 | Q 90.000000 351 | S 70.364862 352 | 2 C 25.358335 353 | Q 12.350000 354 | S 20.327439 355 | 3 C 11.214083 356 | Q 11.183393 357 | S 14.644083 358 | Name: Fare, dtype: float64 359 | 360 | 361 | ---------- 362 | 363 | #用pclass=3和Embarked=S的平均数14.644083来填充 364 | train_test["Fare"].fillna(14.435422,inplace=True) 365 | 366 | 367 | ---------- 368 | 369 | * ⑦ Ticket 370 | 371 | 该列和名字做类似的处理,先提取,然后分列 372 | 373 | 374 | #将Ticket提取字符列 375 | #str.isnumeric() 如果S中只有数字字符,则返回True,否则返回False 376 | train_test['Ticket_Letter'] = train_test['Ticket'].str.split().str[0] 377 | train_test['Ticket_Letter'] = train_test['Ticket_Letter'].apply(lambda x:np.nan if x.isnumeric() else x) 378 | train_test.drop('Ticket',inplace=True,axis=1) 379 | 380 | 381 | ---------- 382 | 383 | #分列,此时nan值可以不做处理 384 | train_test = pd.get_dummies(train_test,columns=['Ticket_Letter'],drop_first=True) 385 | 386 | 387 | * ⑧ Age 388 | 389 | 1.该列有大量缺失值,考虑用一个回归模型进行填充. 390 | 391 | 2.在模型修改的时候,考虑到年龄缺失值可能影响死亡情况,用年龄是否缺失值来构造新特征 392 | 393 | ---------- 394 | 395 | 396 | """ 397 | 这是模型就好后回来增加的新特征 398 | 考虑年龄缺失值可能影响死亡情况,数据表明,年龄缺失的死亡率为0.19 399 | """ 400 | train_test.loc[train_test["Age"].isnull()]['Survived'].mean() 401 | 402 | 0.19771863117870722 403 | 404 | 405 | ---------- 406 | 407 | # 所以用年龄是否缺失值来构造新特征 408 | train_test.loc[train_test["Age"].isnull() ,"age_nan"] = 1 409 | train_test.loc[train_test["Age"].notnull() ,"age_nan"] = 0 410 | train_test = pd.get_dummies(train_test,columns=['age_nan']) 411 | 412 | 413 | 利用其他组特征量,采用机器学习算法来预测Age 414 | 415 | train_test.info() 416 | 417 | 418 | Int64Index: 1309 entries, 0 to 1308 419 | Columns: 187 entries, Age to age_nan_1.0 420 | dtypes: float64(2), int64(3), object(1), uint8(181) 421 | memory usage: 343.0+ KB 422 | 423 | 424 | ---------- 425 | 426 | 427 | #创建没有['Age','Survived']的数据集 428 | missing_age = train_test.drop(['Survived','Cabin'],axis=1) 429 | #将Age完整的项作为训练集、将Age缺失的项作为测试集。 430 | missing_age_train = missing_age[missing_age['Age'].notnull()] 431 | missing_age_test = missing_age[missing_age['Age'].isnull()] 432 | 433 | ---------- 434 | 435 | #构建训练集合预测集的X和Y值 436 | missing_age_X_train = missing_age_train.drop(['Age'], axis=1) 437 | missing_age_Y_train = missing_age_train['Age'] 438 | missing_age_X_test = missing_age_test.drop(['Age'], axis=1) 439 | 440 | ---------- 441 | 442 | # 先将数据标准化 443 | from sklearn.preprocessing import StandardScaler 444 | ss = StandardScaler() 445 | #用测试集训练并标准化 446 | ss.fit(missing_age_X_train) 447 | missing_age_X_train = ss.transform(missing_age_X_train) 448 | missing_age_X_test = ss.transform(missing_age_X_test) 449 | 450 | ---------- 451 | 452 | #使用贝叶斯预测年龄 453 | from sklearn import linear_model 454 | lin = linear_model.BayesianRidge() 455 | 456 | ---------- 457 | 458 | lin.fit(missing_age_X_train,missing_age_Y_train) 459 | 460 | BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, compute_score=False, copy_X=True, 461 | fit_intercept=True, lambda_1=1e-06, lambda_2=1e-06, n_iter=300, 462 | normalize=False, tol=0.001, verbose=False) 463 | 464 | ---------- 465 | 466 | #利用loc将预测值填入数据集 467 | train_test.loc[(train_test['Age'].isnull()), 'Age'] = lin.predict(missing_age_X_test) 468 | 469 | ---------- 470 | 471 | #将年龄划分是个阶段10以下,10-18,18-30,30-50,50以上 472 | train_test['Age'] = pd.cut(train_test['Age'], bins=[0,10,18,30,50,100],labels=[1,2,3,4,5]) 473 | 474 | train_test = pd.get_dummies(train_test,columns=['Age']) 475 | 476 | ---------- 477 | 478 | 479 | * ⑨ Cabin 480 | 481 | cabin项缺失太多,只能将有无Cain首字母进行分类,缺失值为一类,作为特征值进行建模,也可以考虑直接舍去该特征 482 | 483 | #cabin项缺失太多,只能将有无Cain首字母进行分类,缺失值为一类,作为特征值进行建模 484 | train_test['Cabin'] = train_test['Cabin'].apply(lambda x:str(x)[0] if pd.notnull(x) else x) 485 | train_test = pd.get_dummies(train_test,columns=['Cabin']) 486 | 487 | ---------- 488 | 489 | #cabin项缺失太多,只能将有无Cain首字母进行分类, 490 | train_test.loc[train_test["Cabin"].isnull() ,"Cabin_nan"] = 1 491 | train_test.loc[train_test["Cabin"].notnull() ,"Cabin_nan"] = 0 492 | train_test = pd.get_dummies(train_test,columns=['Cabin_nan']) 493 | train_test.drop('Cabin',axis=1,inplace=True) 494 | 495 | 496 | * ⑩ 特征工程处理完了,划分数据集 497 | 498 | ---------- 499 | 500 | train_data = train_test[:891] 501 | test_data = train_test[891:] 502 | train_data_X = train_data.drop(['Survived'],axis=1) 503 | train_data_Y = train_data['Survived'] 504 | test_data_X = test_data.drop(['Survived'],axis=1) 505 | 506 | ### 数据规约 507 | 508 | 1. 线性模型需要用标准化的数据建模,而树类模型不需要标准化的数据 509 | 2. 处理标准化的时候,注意将测试集的数据transform到test集上 510 | 511 | ---------- 512 | 513 | 514 | from sklearn.preprocessing import StandardScaler 515 | ss2 = StandardScaler() 516 | ss2.fit(train_data_X) 517 | train_data_X_sd = ss2.transform(train_data_X) 518 | test_data_X_sd = ss2.transform(test_data_X) 519 | 520 | 521 | ## 三. 建立模型 522 | 523 | ### 模型发现 524 | 525 | 1. 可选单个模型模型有随机森林,逻辑回归,svm,xgboost,gbdt等. 526 | 2. 也可以将多个模型组合起来,进行模型融合,比如voting,stacking等方法 527 | 3. 好的特征决定模型上限,好的模型和参数可以无线逼近上限. 528 | 4. 我测试了多种模型,模型结果最高的随机森林,最高有0.8. 529 | 530 | ### 构建模型 531 | 532 | - 随机森林 533 | 534 | ---------- 535 | 536 | from sklearn.ensemble import RandomForestClassifier 537 | 538 | rf = RandomForestClassifier(n_estimators=150,min_samples_leaf=3,max_depth=6,oob_score=True) 539 | rf.fit(train_data_X,train_data_Y) 540 | 541 | test["Survived"] = rf.predict(test_data_X) 542 | RF = test[['PassengerId','Survived']].set_index('PassengerId') 543 | RF.to_csv('RF.csv') 544 | 545 | ---------- 546 | 547 | # 随机森林是随机选取特征进行建模的,所以每次的结果可能都有点小差异 548 | # 如果分数足够好,可以将该模型保存起来,下次直接调出来使用0.81339 'rf10.pkl' 549 | from sklearn.externals import joblib 550 | joblib.dump(rf, 'rf10.pkl') 551 | 552 | ---------- 553 | 554 | 555 | - LogisticRegression 556 | 557 | ---------- 558 | from sklearn.linear_model import LogisticRegression 559 | from sklearn.grid_search import GridSearchCV 560 | 561 | lr = LogisticRegression() 562 | param = {'C':[0.001,0.01,0.1,1,10],"max_iter":[100,250]} 563 | clf = GridSearchCV(lr,param,cv=5,n_jobs=-1,verbose=1,scoring="roc_auc") 564 | clf.fit(train_data_X_sd,train_data_Y) 565 | 566 | # 打印参数的得分情况 567 | clf.grid_scores_ 568 | 569 | # 打印最佳参数 570 | clf.best_params_ 571 | 572 | # 将最佳参数传入训练模型 573 | lr = LogisticRegression(clf.best_params_) 574 | lr.fit(train_data_X_sd,train_data_Y) 575 | 576 | # 预测结果 577 | lr.predict(test_data_X_sd) 578 | 579 | # 打印结果 580 | test["Survived"] = lr.predict(test_data_X_sd) 581 | LS = test[['PassengerId','Survived']].set_index('PassengerId') 582 | 583 | # 输出结果 584 | LS.to_csv('LS5.csv') 585 | 586 | ---------- 587 | 588 | - SVM 589 | 590 | ---------- 591 | from sklearn import svm 592 | svc = svm.SVC() 593 | 594 | clf = GridSearchCV(svc,param,cv=5,n_jobs=-1,verbose=1,scoring="roc_auc") 595 | clf.fit(train_data_X_sd,train_data_Y) 596 | 597 | clf.best_params_ 598 | 599 | svc = svm.SVC(C=1,max_iter=250) 600 | 601 | # 训练模型并预测结果 602 | svc.fit(train_data_X_sd,train_data_Y) 603 | svc.predict(test_data_X_sd) 604 | 605 | # 打印结果 606 | test["Survived"] = svc.predict(test_data_X_sd) 607 | SVM = test[['PassengerId','Survived']].set_index('PassengerId') 608 | SVM.to_csv('svm1.csv') 609 | 610 | 611 | ---------- 612 | 613 | 614 | - GBDT 615 | 616 | ---------- 617 | 618 | from sklearn.ensemble import GradientBoostingClassifier 619 | 620 | gbdt = GradientBoostingClassifier(learning_rate=0.7,max_depth=6,n_estimators=100,min_samples_leaf=2) 621 | 622 | gbdt.fit(train_data_X,train_data_Y) 623 | 624 | test["Survived"] = gbdt.predict(test_data_X) 625 | test[['PassengerId','Survived']].set_index('PassengerId').to_csv('gbdt3.csv') 626 | 627 | ---------- 628 | 629 | - xgboost 630 | 631 | 632 | ---------- 633 | 634 | import xgboost as xgb 635 | 636 | xgb_model = xgb.XGBClassifier(n_estimators=150,min_samples_leaf=3,max_depth=6) 637 | xgb_model.fit(train_data_X,train_data_Y) 638 | 639 | test["Survived"] = xgb_model.predict(test_data_X) 640 | XGB = test[['PassengerId','Survived']].set_index('PassengerId') 641 | XGB.to_csv('XGB5.csv') 642 | 643 | 644 | ---------- 645 | 646 | 647 | ## 四 建立模型 648 | 649 | - 模型融合 voting 650 | 651 | 652 | ---------- 653 | 654 | from sklearn.ensemble import VotingClassifier 655 | 656 | from sklearn.linear_model import LogisticRegression 657 | lr = LogisticRegression(C=0.1,max_iter=100) 658 | 659 | import xgboost as xgb 660 | xgb_model = xgb.XGBClassifier(max_depth=6,min_samples_leaf=2,n_estimators=100,num_round = 5) 661 | 662 | from sklearn.ensemble import RandomForestClassifier 663 | rf = RandomForestClassifier(n_estimators=200,min_samples_leaf=2,max_depth=6,oob_score=True) 664 | 665 | from sklearn.ensemble import GradientBoostingClassifier 666 | gbdt = GradientBoostingClassifier(learning_rate=0.1,min_samples_leaf=2,max_depth=6,n_estimators=100) 667 | 668 | vot = VotingClassifier(estimators=[('lr', lr), ('rf', rf),('gbdt',gbdt),('xgb',xgb_model)], voting='hard') 669 | vot.fit(train_data_X_sd,train_data_Y) 670 | vot.predict(test_data_X_sd) 671 | 672 | test["Survived"] = vot.predict(test_data_X_sd) 673 | test[['PassengerId','Survived']].set_index('PassengerId').to_csv('vot5.csv') 674 | 675 | ---------- 676 | 677 | 678 | - 模型融合 stacking 679 | 680 | ---------- 681 | 682 | # 划分train数据集,调用代码,把数据集名字转成和代码一样 683 | X = train_data_X_sd 684 | X_predict = test_data_X_sd 685 | y = train_data_Y 686 | 687 | '''模型融合中使用到的各个单模型''' 688 | from sklearn.linear_model import LogisticRegression 689 | from sklearn import svm 690 | import xgboost as xgb 691 | from sklearn.ensemble import RandomForestClassifier 692 | from sklearn.ensemble import GradientBoostingClassifier 693 | 694 | clfs = [LogisticRegression(C=0.1,max_iter=100), 695 | xgb.XGBClassifier(max_depth=6,n_estimators=100,num_round = 5), 696 | RandomForestClassifier(n_estimators=100,max_depth=6,oob_score=True), 697 | GradientBoostingClassifier(learning_rate=0.3,max_depth=6,n_estimators=100)] 698 | 699 | # 创建n_folds 700 | from sklearn.cross_validation import StratifiedKFold 701 | n_folds = 5 702 | skf = list(StratifiedKFold(y, n_folds)) 703 | 704 | # 创建零矩阵 705 | dataset_blend_train = np.zeros((X.shape[0], len(clfs))) 706 | dataset_blend_test = np.zeros((X_predict.shape[0], len(clfs))) 707 | 708 | # 建立模型 709 | for j, clf in enumerate(clfs): 710 | '''依次训练各个单模型''' 711 | # print(j, clf) 712 | dataset_blend_test_j = np.zeros((X_predict.shape[0], len(skf))) 713 | for i, (train, test) in enumerate(skf): 714 | '''使用第i个部分作为预测,剩余的部分来训练模型,获得其预测的输出作为第i部分的新特征。''' 715 | # print("Fold", i) 716 | X_train, y_train, X_test, y_test = X[train], y[train], X[test], y[test] 717 | clf.fit(X_train, y_train) 718 | y_submission = clf.predict_proba(X_test)[:, 1] 719 | dataset_blend_train[test, j] = y_submission 720 | dataset_blend_test_j[:, i] = clf.predict_proba(X_predict)[:, 1] 721 | '''对于测试集,直接用这k个模型的预测值均值作为新的特征。''' 722 | dataset_blend_test[:, j] = dataset_blend_test_j.mean(1) 723 | 724 | # 用建立第二层模型 725 | clf2 = LogisticRegression(C=0.1,max_iter=100) 726 | clf2.fit(dataset_blend_train, y) 727 | y_submission = clf2.predict_proba(dataset_blend_test)[:, 1] 728 | 729 | test = pd.read_csv("test.csv") 730 | test["Survived"] = clf2.predict(dataset_blend_test) 731 | test[['PassengerId','Survived']].set_index('PassengerId').to_csv('stack3.csv') 732 | 733 | 734 | 735 | 736 | 737 | -------------------------------------------------------------------------------- /01_泰坦尼克号(titanic)/datasets/test.csv: -------------------------------------------------------------------------------- 1 | PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked 2 | 892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q 3 | 893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47,1,0,363272,7,,S 4 | 894,2,"Myles, Mr. Thomas Francis",male,62,0,0,240276,9.6875,,Q 5 | 895,3,"Wirz, Mr. Albert",male,27,0,0,315154,8.6625,,S 6 | 896,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22,1,1,3101298,12.2875,,S 7 | 897,3,"Svensson, Mr. Johan Cervin",male,14,0,0,7538,9.225,,S 8 | 898,3,"Connolly, Miss. Kate",female,30,0,0,330972,7.6292,,Q 9 | 899,2,"Caldwell, Mr. Albert Francis",male,26,1,1,248738,29,,S 10 | 900,3,"Abrahim, Mrs. Joseph (Sophie Halaut Easu)",female,18,0,0,2657,7.2292,,C 11 | 901,3,"Davies, Mr. John Samuel",male,21,2,0,A/4 48871,24.15,,S 12 | 902,3,"Ilieff, Mr. Ylio",male,,0,0,349220,7.8958,,S 13 | 903,1,"Jones, Mr. Charles Cresson",male,46,0,0,694,26,,S 14 | 904,1,"Snyder, Mrs. John Pillsbury (Nelle Stevenson)",female,23,1,0,21228,82.2667,B45,S 15 | 905,2,"Howard, Mr. Benjamin",male,63,1,0,24065,26,,S 16 | 906,1,"Chaffee, Mrs. Herbert Fuller (Carrie Constance Toogood)",female,47,1,0,W.E.P. 5734,61.175,E31,S 17 | 907,2,"del Carlo, Mrs. Sebastiano (Argenia Genovesi)",female,24,1,0,SC/PARIS 2167,27.7208,,C 18 | 908,2,"Keane, Mr. Daniel",male,35,0,0,233734,12.35,,Q 19 | 909,3,"Assaf, Mr. Gerios",male,21,0,0,2692,7.225,,C 20 | 910,3,"Ilmakangas, Miss. Ida Livija",female,27,1,0,STON/O2. 3101270,7.925,,S 21 | 911,3,"Assaf Khalil, Mrs. Mariana (Miriam"")""",female,45,0,0,2696,7.225,,C 22 | 912,1,"Rothschild, Mr. Martin",male,55,1,0,PC 17603,59.4,,C 23 | 913,3,"Olsen, Master. Artur Karl",male,9,0,1,C 17368,3.1708,,S 24 | 914,1,"Flegenheim, Mrs. Alfred (Antoinette)",female,,0,0,PC 17598,31.6833,,S 25 | 915,1,"Williams, Mr. Richard Norris II",male,21,0,1,PC 17597,61.3792,,C 26 | 916,1,"Ryerson, Mrs. Arthur Larned (Emily Maria Borie)",female,48,1,3,PC 17608,262.375,B57 B59 B63 B66,C 27 | 917,3,"Robins, Mr. Alexander A",male,50,1,0,A/5. 3337,14.5,,S 28 | 918,1,"Ostby, Miss. Helene Ragnhild",female,22,0,1,113509,61.9792,B36,C 29 | 919,3,"Daher, Mr. Shedid",male,22.5,0,0,2698,7.225,,C 30 | 920,1,"Brady, Mr. John Bertram",male,41,0,0,113054,30.5,A21,S 31 | 921,3,"Samaan, Mr. Elias",male,,2,0,2662,21.6792,,C 32 | 922,2,"Louch, Mr. Charles Alexander",male,50,1,0,SC/AH 3085,26,,S 33 | 923,2,"Jefferys, Mr. Clifford Thomas",male,24,2,0,C.A. 31029,31.5,,S 34 | 924,3,"Dean, Mrs. Bertram (Eva Georgetta Light)",female,33,1,2,C.A. 2315,20.575,,S 35 | 925,3,"Johnston, Mrs. Andrew G (Elizabeth Lily"" Watson)""",female,,1,2,W./C. 6607,23.45,,S 36 | 926,1,"Mock, Mr. Philipp Edmund",male,30,1,0,13236,57.75,C78,C 37 | 927,3,"Katavelas, Mr. Vassilios (Catavelas Vassilios"")""",male,18.5,0,0,2682,7.2292,,C 38 | 928,3,"Roth, Miss. Sarah A",female,,0,0,342712,8.05,,S 39 | 929,3,"Cacic, Miss. Manda",female,21,0,0,315087,8.6625,,S 40 | 930,3,"Sap, Mr. Julius",male,25,0,0,345768,9.5,,S 41 | 931,3,"Hee, Mr. Ling",male,,0,0,1601,56.4958,,S 42 | 932,3,"Karun, Mr. Franz",male,39,0,1,349256,13.4167,,C 43 | 933,1,"Franklin, Mr. Thomas Parham",male,,0,0,113778,26.55,D34,S 44 | 934,3,"Goldsmith, Mr. Nathan",male,41,0,0,SOTON/O.Q. 3101263,7.85,,S 45 | 935,2,"Corbett, Mrs. Walter H (Irene Colvin)",female,30,0,0,237249,13,,S 46 | 936,1,"Kimball, Mrs. Edwin Nelson Jr (Gertrude Parsons)",female,45,1,0,11753,52.5542,D19,S 47 | 937,3,"Peltomaki, Mr. Nikolai Johannes",male,25,0,0,STON/O 2. 3101291,7.925,,S 48 | 938,1,"Chevre, Mr. Paul Romaine",male,45,0,0,PC 17594,29.7,A9,C 49 | 939,3,"Shaughnessy, Mr. Patrick",male,,0,0,370374,7.75,,Q 50 | 940,1,"Bucknell, Mrs. William Robert (Emma Eliza Ward)",female,60,0,0,11813,76.2917,D15,C 51 | 941,3,"Coutts, Mrs. William (Winnie Minnie"" Treanor)""",female,36,0,2,C.A. 37671,15.9,,S 52 | 942,1,"Smith, Mr. Lucien Philip",male,24,1,0,13695,60,C31,S 53 | 943,2,"Pulbaum, Mr. Franz",male,27,0,0,SC/PARIS 2168,15.0333,,C 54 | 944,2,"Hocking, Miss. Ellen Nellie""""",female,20,2,1,29105,23,,S 55 | 945,1,"Fortune, Miss. Ethel Flora",female,28,3,2,19950,263,C23 C25 C27,S 56 | 946,2,"Mangiavacchi, Mr. Serafino Emilio",male,,0,0,SC/A.3 2861,15.5792,,C 57 | 947,3,"Rice, Master. Albert",male,10,4,1,382652,29.125,,Q 58 | 948,3,"Cor, Mr. Bartol",male,35,0,0,349230,7.8958,,S 59 | 949,3,"Abelseth, Mr. Olaus Jorgensen",male,25,0,0,348122,7.65,F G63,S 60 | 950,3,"Davison, Mr. Thomas Henry",male,,1,0,386525,16.1,,S 61 | 951,1,"Chaudanson, Miss. Victorine",female,36,0,0,PC 17608,262.375,B61,C 62 | 952,3,"Dika, Mr. Mirko",male,17,0,0,349232,7.8958,,S 63 | 953,2,"McCrae, Mr. Arthur Gordon",male,32,0,0,237216,13.5,,S 64 | 954,3,"Bjorklund, Mr. Ernst Herbert",male,18,0,0,347090,7.75,,S 65 | 955,3,"Bradley, Miss. Bridget Delia",female,22,0,0,334914,7.725,,Q 66 | 956,1,"Ryerson, Master. John Borie",male,13,2,2,PC 17608,262.375,B57 B59 B63 B66,C 67 | 957,2,"Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)",female,,0,0,F.C.C. 13534,21,,S 68 | 958,3,"Burns, Miss. Mary Delia",female,18,0,0,330963,7.8792,,Q 69 | 959,1,"Moore, Mr. Clarence Bloomfield",male,47,0,0,113796,42.4,,S 70 | 960,1,"Tucker, Mr. Gilbert Milligan Jr",male,31,0,0,2543,28.5375,C53,C 71 | 961,1,"Fortune, Mrs. Mark (Mary McDougald)",female,60,1,4,19950,263,C23 C25 C27,S 72 | 962,3,"Mulvihill, Miss. Bertha E",female,24,0,0,382653,7.75,,Q 73 | 963,3,"Minkoff, Mr. Lazar",male,21,0,0,349211,7.8958,,S 74 | 964,3,"Nieminen, Miss. Manta Josefina",female,29,0,0,3101297,7.925,,S 75 | 965,1,"Ovies y Rodriguez, Mr. Servando",male,28.5,0,0,PC 17562,27.7208,D43,C 76 | 966,1,"Geiger, Miss. Amalie",female,35,0,0,113503,211.5,C130,C 77 | 967,1,"Keeping, Mr. Edwin",male,32.5,0,0,113503,211.5,C132,C 78 | 968,3,"Miles, Mr. Frank",male,,0,0,359306,8.05,,S 79 | 969,1,"Cornell, Mrs. Robert Clifford (Malvina Helen Lamson)",female,55,2,0,11770,25.7,C101,S 80 | 970,2,"Aldworth, Mr. Charles Augustus",male,30,0,0,248744,13,,S 81 | 971,3,"Doyle, Miss. Elizabeth",female,24,0,0,368702,7.75,,Q 82 | 972,3,"Boulos, Master. Akar",male,6,1,1,2678,15.2458,,C 83 | 973,1,"Straus, Mr. Isidor",male,67,1,0,PC 17483,221.7792,C55 C57,S 84 | 974,1,"Case, Mr. Howard Brown",male,49,0,0,19924,26,,S 85 | 975,3,"Demetri, Mr. Marinko",male,,0,0,349238,7.8958,,S 86 | 976,2,"Lamb, Mr. John Joseph",male,,0,0,240261,10.7083,,Q 87 | 977,3,"Khalil, Mr. Betros",male,,1,0,2660,14.4542,,C 88 | 978,3,"Barry, Miss. Julia",female,27,0,0,330844,7.8792,,Q 89 | 979,3,"Badman, Miss. Emily Louisa",female,18,0,0,A/4 31416,8.05,,S 90 | 980,3,"O'Donoghue, Ms. Bridget",female,,0,0,364856,7.75,,Q 91 | 981,2,"Wells, Master. Ralph Lester",male,2,1,1,29103,23,,S 92 | 982,3,"Dyker, Mrs. Adolf Fredrik (Anna Elisabeth Judith Andersson)",female,22,1,0,347072,13.9,,S 93 | 983,3,"Pedersen, Mr. Olaf",male,,0,0,345498,7.775,,S 94 | 984,1,"Davidson, Mrs. Thornton (Orian Hays)",female,27,1,2,F.C. 12750,52,B71,S 95 | 985,3,"Guest, Mr. Robert",male,,0,0,376563,8.05,,S 96 | 986,1,"Birnbaum, Mr. Jakob",male,25,0,0,13905,26,,C 97 | 987,3,"Tenglin, Mr. Gunnar Isidor",male,25,0,0,350033,7.7958,,S 98 | 988,1,"Cavendish, Mrs. Tyrell William (Julia Florence Siegel)",female,76,1,0,19877,78.85,C46,S 99 | 989,3,"Makinen, Mr. Kalle Edvard",male,29,0,0,STON/O 2. 3101268,7.925,,S 100 | 990,3,"Braf, Miss. Elin Ester Maria",female,20,0,0,347471,7.8542,,S 101 | 991,3,"Nancarrow, Mr. William Henry",male,33,0,0,A./5. 3338,8.05,,S 102 | 992,1,"Stengel, Mrs. Charles Emil Henry (Annie May Morris)",female,43,1,0,11778,55.4417,C116,C 103 | 993,2,"Weisz, Mr. Leopold",male,27,1,0,228414,26,,S 104 | 994,3,"Foley, Mr. William",male,,0,0,365235,7.75,,Q 105 | 995,3,"Johansson Palmquist, Mr. Oskar Leander",male,26,0,0,347070,7.775,,S 106 | 996,3,"Thomas, Mrs. Alexander (Thamine Thelma"")""",female,16,1,1,2625,8.5167,,C 107 | 997,3,"Holthen, Mr. Johan Martin",male,28,0,0,C 4001,22.525,,S 108 | 998,3,"Buckley, Mr. Daniel",male,21,0,0,330920,7.8208,,Q 109 | 999,3,"Ryan, Mr. Edward",male,,0,0,383162,7.75,,Q 110 | 1000,3,"Willer, Mr. Aaron (Abi Weller"")""",male,,0,0,3410,8.7125,,S 111 | 1001,2,"Swane, Mr. George",male,18.5,0,0,248734,13,F,S 112 | 1002,2,"Stanton, Mr. Samuel Ward",male,41,0,0,237734,15.0458,,C 113 | 1003,3,"Shine, Miss. Ellen Natalia",female,,0,0,330968,7.7792,,Q 114 | 1004,1,"Evans, Miss. Edith Corse",female,36,0,0,PC 17531,31.6792,A29,C 115 | 1005,3,"Buckley, Miss. Katherine",female,18.5,0,0,329944,7.2833,,Q 116 | 1006,1,"Straus, Mrs. Isidor (Rosalie Ida Blun)",female,63,1,0,PC 17483,221.7792,C55 C57,S 117 | 1007,3,"Chronopoulos, Mr. Demetrios",male,18,1,0,2680,14.4542,,C 118 | 1008,3,"Thomas, Mr. John",male,,0,0,2681,6.4375,,C 119 | 1009,3,"Sandstrom, Miss. Beatrice Irene",female,1,1,1,PP 9549,16.7,G6,S 120 | 1010,1,"Beattie, Mr. Thomson",male,36,0,0,13050,75.2417,C6,C 121 | 1011,2,"Chapman, Mrs. John Henry (Sara Elizabeth Lawry)",female,29,1,0,SC/AH 29037,26,,S 122 | 1012,2,"Watt, Miss. Bertha J",female,12,0,0,C.A. 33595,15.75,,S 123 | 1013,3,"Kiernan, Mr. John",male,,1,0,367227,7.75,,Q 124 | 1014,1,"Schabert, Mrs. Paul (Emma Mock)",female,35,1,0,13236,57.75,C28,C 125 | 1015,3,"Carver, Mr. Alfred John",male,28,0,0,392095,7.25,,S 126 | 1016,3,"Kennedy, Mr. John",male,,0,0,368783,7.75,,Q 127 | 1017,3,"Cribb, Miss. Laura Alice",female,17,0,1,371362,16.1,,S 128 | 1018,3,"Brobeck, Mr. Karl Rudolf",male,22,0,0,350045,7.7958,,S 129 | 1019,3,"McCoy, Miss. Alicia",female,,2,0,367226,23.25,,Q 130 | 1020,2,"Bowenur, Mr. Solomon",male,42,0,0,211535,13,,S 131 | 1021,3,"Petersen, Mr. Marius",male,24,0,0,342441,8.05,,S 132 | 1022,3,"Spinner, Mr. Henry John",male,32,0,0,STON/OQ. 369943,8.05,,S 133 | 1023,1,"Gracie, Col. Archibald IV",male,53,0,0,113780,28.5,C51,C 134 | 1024,3,"Lefebre, Mrs. Frank (Frances)",female,,0,4,4133,25.4667,,S 135 | 1025,3,"Thomas, Mr. Charles P",male,,1,0,2621,6.4375,,C 136 | 1026,3,"Dintcheff, Mr. Valtcho",male,43,0,0,349226,7.8958,,S 137 | 1027,3,"Carlsson, Mr. Carl Robert",male,24,0,0,350409,7.8542,,S 138 | 1028,3,"Zakarian, Mr. Mapriededer",male,26.5,0,0,2656,7.225,,C 139 | 1029,2,"Schmidt, Mr. August",male,26,0,0,248659,13,,S 140 | 1030,3,"Drapkin, Miss. Jennie",female,23,0,0,SOTON/OQ 392083,8.05,,S 141 | 1031,3,"Goodwin, Mr. Charles Frederick",male,40,1,6,CA 2144,46.9,,S 142 | 1032,3,"Goodwin, Miss. Jessie Allis",female,10,5,2,CA 2144,46.9,,S 143 | 1033,1,"Daniels, Miss. Sarah",female,33,0,0,113781,151.55,,S 144 | 1034,1,"Ryerson, Mr. Arthur Larned",male,61,1,3,PC 17608,262.375,B57 B59 B63 B66,C 145 | 1035,2,"Beauchamp, Mr. Henry James",male,28,0,0,244358,26,,S 146 | 1036,1,"Lindeberg-Lind, Mr. Erik Gustaf (Mr Edward Lingrey"")""",male,42,0,0,17475,26.55,,S 147 | 1037,3,"Vander Planke, Mr. Julius",male,31,3,0,345763,18,,S 148 | 1038,1,"Hilliard, Mr. Herbert Henry",male,,0,0,17463,51.8625,E46,S 149 | 1039,3,"Davies, Mr. Evan",male,22,0,0,SC/A4 23568,8.05,,S 150 | 1040,1,"Crafton, Mr. John Bertram",male,,0,0,113791,26.55,,S 151 | 1041,2,"Lahtinen, Rev. William",male,30,1,1,250651,26,,S 152 | 1042,1,"Earnshaw, Mrs. Boulton (Olive Potter)",female,23,0,1,11767,83.1583,C54,C 153 | 1043,3,"Matinoff, Mr. Nicola",male,,0,0,349255,7.8958,,C 154 | 1044,3,"Storey, Mr. Thomas",male,60.5,0,0,3701,,,S 155 | 1045,3,"Klasen, Mrs. (Hulda Kristina Eugenia Lofqvist)",female,36,0,2,350405,12.1833,,S 156 | 1046,3,"Asplund, Master. Filip Oscar",male,13,4,2,347077,31.3875,,S 157 | 1047,3,"Duquemin, Mr. Joseph",male,24,0,0,S.O./P.P. 752,7.55,,S 158 | 1048,1,"Bird, Miss. Ellen",female,29,0,0,PC 17483,221.7792,C97,S 159 | 1049,3,"Lundin, Miss. Olga Elida",female,23,0,0,347469,7.8542,,S 160 | 1050,1,"Borebank, Mr. John James",male,42,0,0,110489,26.55,D22,S 161 | 1051,3,"Peacock, Mrs. Benjamin (Edith Nile)",female,26,0,2,SOTON/O.Q. 3101315,13.775,,S 162 | 1052,3,"Smyth, Miss. Julia",female,,0,0,335432,7.7333,,Q 163 | 1053,3,"Touma, Master. Georges Youssef",male,7,1,1,2650,15.2458,,C 164 | 1054,2,"Wright, Miss. Marion",female,26,0,0,220844,13.5,,S 165 | 1055,3,"Pearce, Mr. Ernest",male,,0,0,343271,7,,S 166 | 1056,2,"Peruschitz, Rev. Joseph Maria",male,41,0,0,237393,13,,S 167 | 1057,3,"Kink-Heilmann, Mrs. Anton (Luise Heilmann)",female,26,1,1,315153,22.025,,S 168 | 1058,1,"Brandeis, Mr. Emil",male,48,0,0,PC 17591,50.4958,B10,C 169 | 1059,3,"Ford, Mr. Edward Watson",male,18,2,2,W./C. 6608,34.375,,S 170 | 1060,1,"Cassebeer, Mrs. Henry Arthur Jr (Eleanor Genevieve Fosdick)",female,,0,0,17770,27.7208,,C 171 | 1061,3,"Hellstrom, Miss. Hilda Maria",female,22,0,0,7548,8.9625,,S 172 | 1062,3,"Lithman, Mr. Simon",male,,0,0,S.O./P.P. 251,7.55,,S 173 | 1063,3,"Zakarian, Mr. Ortin",male,27,0,0,2670,7.225,,C 174 | 1064,3,"Dyker, Mr. Adolf Fredrik",male,23,1,0,347072,13.9,,S 175 | 1065,3,"Torfa, Mr. Assad",male,,0,0,2673,7.2292,,C 176 | 1066,3,"Asplund, Mr. Carl Oscar Vilhelm Gustafsson",male,40,1,5,347077,31.3875,,S 177 | 1067,2,"Brown, Miss. Edith Eileen",female,15,0,2,29750,39,,S 178 | 1068,2,"Sincock, Miss. Maude",female,20,0,0,C.A. 33112,36.75,,S 179 | 1069,1,"Stengel, Mr. Charles Emil Henry",male,54,1,0,11778,55.4417,C116,C 180 | 1070,2,"Becker, Mrs. Allen Oliver (Nellie E Baumgardner)",female,36,0,3,230136,39,F4,S 181 | 1071,1,"Compton, Mrs. Alexander Taylor (Mary Eliza Ingersoll)",female,64,0,2,PC 17756,83.1583,E45,C 182 | 1072,2,"McCrie, Mr. James Matthew",male,30,0,0,233478,13,,S 183 | 1073,1,"Compton, Mr. Alexander Taylor Jr",male,37,1,1,PC 17756,83.1583,E52,C 184 | 1074,1,"Marvin, Mrs. Daniel Warner (Mary Graham Carmichael Farquarson)",female,18,1,0,113773,53.1,D30,S 185 | 1075,3,"Lane, Mr. Patrick",male,,0,0,7935,7.75,,Q 186 | 1076,1,"Douglas, Mrs. Frederick Charles (Mary Helene Baxter)",female,27,1,1,PC 17558,247.5208,B58 B60,C 187 | 1077,2,"Maybery, Mr. Frank Hubert",male,40,0,0,239059,16,,S 188 | 1078,2,"Phillips, Miss. Alice Frances Louisa",female,21,0,1,S.O./P.P. 2,21,,S 189 | 1079,3,"Davies, Mr. Joseph",male,17,2,0,A/4 48873,8.05,,S 190 | 1080,3,"Sage, Miss. Ada",female,,8,2,CA. 2343,69.55,,S 191 | 1081,2,"Veal, Mr. James",male,40,0,0,28221,13,,S 192 | 1082,2,"Angle, Mr. William A",male,34,1,0,226875,26,,S 193 | 1083,1,"Salomon, Mr. Abraham L",male,,0,0,111163,26,,S 194 | 1084,3,"van Billiard, Master. Walter John",male,11.5,1,1,A/5. 851,14.5,,S 195 | 1085,2,"Lingane, Mr. John",male,61,0,0,235509,12.35,,Q 196 | 1086,2,"Drew, Master. Marshall Brines",male,8,0,2,28220,32.5,,S 197 | 1087,3,"Karlsson, Mr. Julius Konrad Eugen",male,33,0,0,347465,7.8542,,S 198 | 1088,1,"Spedden, Master. Robert Douglas",male,6,0,2,16966,134.5,E34,C 199 | 1089,3,"Nilsson, Miss. Berta Olivia",female,18,0,0,347066,7.775,,S 200 | 1090,2,"Baimbrigge, Mr. Charles Robert",male,23,0,0,C.A. 31030,10.5,,S 201 | 1091,3,"Rasmussen, Mrs. (Lena Jacobsen Solvang)",female,,0,0,65305,8.1125,,S 202 | 1092,3,"Murphy, Miss. Nora",female,,0,0,36568,15.5,,Q 203 | 1093,3,"Danbom, Master. Gilbert Sigvard Emanuel",male,0.33,0,2,347080,14.4,,S 204 | 1094,1,"Astor, Col. John Jacob",male,47,1,0,PC 17757,227.525,C62 C64,C 205 | 1095,2,"Quick, Miss. Winifred Vera",female,8,1,1,26360,26,,S 206 | 1096,2,"Andrew, Mr. Frank Thomas",male,25,0,0,C.A. 34050,10.5,,S 207 | 1097,1,"Omont, Mr. Alfred Fernand",male,,0,0,F.C. 12998,25.7417,,C 208 | 1098,3,"McGowan, Miss. Katherine",female,35,0,0,9232,7.75,,Q 209 | 1099,2,"Collett, Mr. Sidney C Stuart",male,24,0,0,28034,10.5,,S 210 | 1100,1,"Rosenbaum, Miss. Edith Louise",female,33,0,0,PC 17613,27.7208,A11,C 211 | 1101,3,"Delalic, Mr. Redjo",male,25,0,0,349250,7.8958,,S 212 | 1102,3,"Andersen, Mr. Albert Karvin",male,32,0,0,C 4001,22.525,,S 213 | 1103,3,"Finoli, Mr. Luigi",male,,0,0,SOTON/O.Q. 3101308,7.05,,S 214 | 1104,2,"Deacon, Mr. Percy William",male,17,0,0,S.O.C. 14879,73.5,,S 215 | 1105,2,"Howard, Mrs. Benjamin (Ellen Truelove Arman)",female,60,1,0,24065,26,,S 216 | 1106,3,"Andersson, Miss. Ida Augusta Margareta",female,38,4,2,347091,7.775,,S 217 | 1107,1,"Head, Mr. Christopher",male,42,0,0,113038,42.5,B11,S 218 | 1108,3,"Mahon, Miss. Bridget Delia",female,,0,0,330924,7.8792,,Q 219 | 1109,1,"Wick, Mr. George Dennick",male,57,1,1,36928,164.8667,,S 220 | 1110,1,"Widener, Mrs. George Dunton (Eleanor Elkins)",female,50,1,1,113503,211.5,C80,C 221 | 1111,3,"Thomson, Mr. Alexander Morrison",male,,0,0,32302,8.05,,S 222 | 1112,2,"Duran y More, Miss. Florentina",female,30,1,0,SC/PARIS 2148,13.8583,,C 223 | 1113,3,"Reynolds, Mr. Harold J",male,21,0,0,342684,8.05,,S 224 | 1114,2,"Cook, Mrs. (Selena Rogers)",female,22,0,0,W./C. 14266,10.5,F33,S 225 | 1115,3,"Karlsson, Mr. Einar Gervasius",male,21,0,0,350053,7.7958,,S 226 | 1116,1,"Candee, Mrs. Edward (Helen Churchill Hungerford)",female,53,0,0,PC 17606,27.4458,,C 227 | 1117,3,"Moubarek, Mrs. George (Omine Amenia"" Alexander)""",female,,0,2,2661,15.2458,,C 228 | 1118,3,"Asplund, Mr. Johan Charles",male,23,0,0,350054,7.7958,,S 229 | 1119,3,"McNeill, Miss. Bridget",female,,0,0,370368,7.75,,Q 230 | 1120,3,"Everett, Mr. Thomas James",male,40.5,0,0,C.A. 6212,15.1,,S 231 | 1121,2,"Hocking, Mr. Samuel James Metcalfe",male,36,0,0,242963,13,,S 232 | 1122,2,"Sweet, Mr. George Frederick",male,14,0,0,220845,65,,S 233 | 1123,1,"Willard, Miss. Constance",female,21,0,0,113795,26.55,,S 234 | 1124,3,"Wiklund, Mr. Karl Johan",male,21,1,0,3101266,6.4958,,S 235 | 1125,3,"Linehan, Mr. Michael",male,,0,0,330971,7.8792,,Q 236 | 1126,1,"Cumings, Mr. John Bradley",male,39,1,0,PC 17599,71.2833,C85,C 237 | 1127,3,"Vendel, Mr. Olof Edvin",male,20,0,0,350416,7.8542,,S 238 | 1128,1,"Warren, Mr. Frank Manley",male,64,1,0,110813,75.25,D37,C 239 | 1129,3,"Baccos, Mr. Raffull",male,20,0,0,2679,7.225,,C 240 | 1130,2,"Hiltunen, Miss. Marta",female,18,1,1,250650,13,,S 241 | 1131,1,"Douglas, Mrs. Walter Donald (Mahala Dutton)",female,48,1,0,PC 17761,106.425,C86,C 242 | 1132,1,"Lindstrom, Mrs. Carl Johan (Sigrid Posse)",female,55,0,0,112377,27.7208,,C 243 | 1133,2,"Christy, Mrs. (Alice Frances)",female,45,0,2,237789,30,,S 244 | 1134,1,"Spedden, Mr. Frederic Oakley",male,45,1,1,16966,134.5,E34,C 245 | 1135,3,"Hyman, Mr. Abraham",male,,0,0,3470,7.8875,,S 246 | 1136,3,"Johnston, Master. William Arthur Willie""""",male,,1,2,W./C. 6607,23.45,,S 247 | 1137,1,"Kenyon, Mr. Frederick R",male,41,1,0,17464,51.8625,D21,S 248 | 1138,2,"Karnes, Mrs. J Frank (Claire Bennett)",female,22,0,0,F.C.C. 13534,21,,S 249 | 1139,2,"Drew, Mr. James Vivian",male,42,1,1,28220,32.5,,S 250 | 1140,2,"Hold, Mrs. Stephen (Annie Margaret Hill)",female,29,1,0,26707,26,,S 251 | 1141,3,"Khalil, Mrs. Betros (Zahie Maria"" Elias)""",female,,1,0,2660,14.4542,,C 252 | 1142,2,"West, Miss. Barbara J",female,0.92,1,2,C.A. 34651,27.75,,S 253 | 1143,3,"Abrahamsson, Mr. Abraham August Johannes",male,20,0,0,SOTON/O2 3101284,7.925,,S 254 | 1144,1,"Clark, Mr. Walter Miller",male,27,1,0,13508,136.7792,C89,C 255 | 1145,3,"Salander, Mr. Karl Johan",male,24,0,0,7266,9.325,,S 256 | 1146,3,"Wenzel, Mr. Linhart",male,32.5,0,0,345775,9.5,,S 257 | 1147,3,"MacKay, Mr. George William",male,,0,0,C.A. 42795,7.55,,S 258 | 1148,3,"Mahon, Mr. John",male,,0,0,AQ/4 3130,7.75,,Q 259 | 1149,3,"Niklasson, Mr. Samuel",male,28,0,0,363611,8.05,,S 260 | 1150,2,"Bentham, Miss. Lilian W",female,19,0,0,28404,13,,S 261 | 1151,3,"Midtsjo, Mr. Karl Albert",male,21,0,0,345501,7.775,,S 262 | 1152,3,"de Messemaeker, Mr. Guillaume Joseph",male,36.5,1,0,345572,17.4,,S 263 | 1153,3,"Nilsson, Mr. August Ferdinand",male,21,0,0,350410,7.8542,,S 264 | 1154,2,"Wells, Mrs. Arthur Henry (Addie"" Dart Trevaskis)""",female,29,0,2,29103,23,,S 265 | 1155,3,"Klasen, Miss. Gertrud Emilia",female,1,1,1,350405,12.1833,,S 266 | 1156,2,"Portaluppi, Mr. Emilio Ilario Giuseppe",male,30,0,0,C.A. 34644,12.7375,,C 267 | 1157,3,"Lyntakoff, Mr. Stanko",male,,0,0,349235,7.8958,,S 268 | 1158,1,"Chisholm, Mr. Roderick Robert Crispin",male,,0,0,112051,0,,S 269 | 1159,3,"Warren, Mr. Charles William",male,,0,0,C.A. 49867,7.55,,S 270 | 1160,3,"Howard, Miss. May Elizabeth",female,,0,0,A. 2. 39186,8.05,,S 271 | 1161,3,"Pokrnic, Mr. Mate",male,17,0,0,315095,8.6625,,S 272 | 1162,1,"McCaffry, Mr. Thomas Francis",male,46,0,0,13050,75.2417,C6,C 273 | 1163,3,"Fox, Mr. Patrick",male,,0,0,368573,7.75,,Q 274 | 1164,1,"Clark, Mrs. Walter Miller (Virginia McDowell)",female,26,1,0,13508,136.7792,C89,C 275 | 1165,3,"Lennon, Miss. Mary",female,,1,0,370371,15.5,,Q 276 | 1166,3,"Saade, Mr. Jean Nassr",male,,0,0,2676,7.225,,C 277 | 1167,2,"Bryhl, Miss. Dagmar Jenny Ingeborg ",female,20,1,0,236853,26,,S 278 | 1168,2,"Parker, Mr. Clifford Richard",male,28,0,0,SC 14888,10.5,,S 279 | 1169,2,"Faunthorpe, Mr. Harry",male,40,1,0,2926,26,,S 280 | 1170,2,"Ware, Mr. John James",male,30,1,0,CA 31352,21,,S 281 | 1171,2,"Oxenham, Mr. Percy Thomas",male,22,0,0,W./C. 14260,10.5,,S 282 | 1172,3,"Oreskovic, Miss. Jelka",female,23,0,0,315085,8.6625,,S 283 | 1173,3,"Peacock, Master. Alfred Edward",male,0.75,1,1,SOTON/O.Q. 3101315,13.775,,S 284 | 1174,3,"Fleming, Miss. Honora",female,,0,0,364859,7.75,,Q 285 | 1175,3,"Touma, Miss. Maria Youssef",female,9,1,1,2650,15.2458,,C 286 | 1176,3,"Rosblom, Miss. Salli Helena",female,2,1,1,370129,20.2125,,S 287 | 1177,3,"Dennis, Mr. William",male,36,0,0,A/5 21175,7.25,,S 288 | 1178,3,"Franklin, Mr. Charles (Charles Fardon)",male,,0,0,SOTON/O.Q. 3101314,7.25,,S 289 | 1179,1,"Snyder, Mr. John Pillsbury",male,24,1,0,21228,82.2667,B45,S 290 | 1180,3,"Mardirosian, Mr. Sarkis",male,,0,0,2655,7.2292,F E46,C 291 | 1181,3,"Ford, Mr. Arthur",male,,0,0,A/5 1478,8.05,,S 292 | 1182,1,"Rheims, Mr. George Alexander Lucien",male,,0,0,PC 17607,39.6,,S 293 | 1183,3,"Daly, Miss. Margaret Marcella Maggie""""",female,30,0,0,382650,6.95,,Q 294 | 1184,3,"Nasr, Mr. Mustafa",male,,0,0,2652,7.2292,,C 295 | 1185,1,"Dodge, Dr. Washington",male,53,1,1,33638,81.8583,A34,S 296 | 1186,3,"Wittevrongel, Mr. Camille",male,36,0,0,345771,9.5,,S 297 | 1187,3,"Angheloff, Mr. Minko",male,26,0,0,349202,7.8958,,S 298 | 1188,2,"Laroche, Miss. Louise",female,1,1,2,SC/Paris 2123,41.5792,,C 299 | 1189,3,"Samaan, Mr. Hanna",male,,2,0,2662,21.6792,,C 300 | 1190,1,"Loring, Mr. Joseph Holland",male,30,0,0,113801,45.5,,S 301 | 1191,3,"Johansson, Mr. Nils",male,29,0,0,347467,7.8542,,S 302 | 1192,3,"Olsson, Mr. Oscar Wilhelm",male,32,0,0,347079,7.775,,S 303 | 1193,2,"Malachard, Mr. Noel",male,,0,0,237735,15.0458,D,C 304 | 1194,2,"Phillips, Mr. Escott Robert",male,43,0,1,S.O./P.P. 2,21,,S 305 | 1195,3,"Pokrnic, Mr. Tome",male,24,0,0,315092,8.6625,,S 306 | 1196,3,"McCarthy, Miss. Catherine Katie""""",female,,0,0,383123,7.75,,Q 307 | 1197,1,"Crosby, Mrs. Edward Gifford (Catherine Elizabeth Halstead)",female,64,1,1,112901,26.55,B26,S 308 | 1198,1,"Allison, Mr. Hudson Joshua Creighton",male,30,1,2,113781,151.55,C22 C26,S 309 | 1199,3,"Aks, Master. Philip Frank",male,0.83,0,1,392091,9.35,,S 310 | 1200,1,"Hays, Mr. Charles Melville",male,55,1,1,12749,93.5,B69,S 311 | 1201,3,"Hansen, Mrs. Claus Peter (Jennie L Howard)",female,45,1,0,350026,14.1083,,S 312 | 1202,3,"Cacic, Mr. Jego Grga",male,18,0,0,315091,8.6625,,S 313 | 1203,3,"Vartanian, Mr. David",male,22,0,0,2658,7.225,,C 314 | 1204,3,"Sadowitz, Mr. Harry",male,,0,0,LP 1588,7.575,,S 315 | 1205,3,"Carr, Miss. Jeannie",female,37,0,0,368364,7.75,,Q 316 | 1206,1,"White, Mrs. John Stuart (Ella Holmes)",female,55,0,0,PC 17760,135.6333,C32,C 317 | 1207,3,"Hagardon, Miss. Kate",female,17,0,0,AQ/3. 30631,7.7333,,Q 318 | 1208,1,"Spencer, Mr. William Augustus",male,57,1,0,PC 17569,146.5208,B78,C 319 | 1209,2,"Rogers, Mr. Reginald Harry",male,19,0,0,28004,10.5,,S 320 | 1210,3,"Jonsson, Mr. Nils Hilding",male,27,0,0,350408,7.8542,,S 321 | 1211,2,"Jefferys, Mr. Ernest Wilfred",male,22,2,0,C.A. 31029,31.5,,S 322 | 1212,3,"Andersson, Mr. Johan Samuel",male,26,0,0,347075,7.775,,S 323 | 1213,3,"Krekorian, Mr. Neshan",male,25,0,0,2654,7.2292,F E57,C 324 | 1214,2,"Nesson, Mr. Israel",male,26,0,0,244368,13,F2,S 325 | 1215,1,"Rowe, Mr. Alfred G",male,33,0,0,113790,26.55,,S 326 | 1216,1,"Kreuchen, Miss. Emilie",female,39,0,0,24160,211.3375,,S 327 | 1217,3,"Assam, Mr. Ali",male,23,0,0,SOTON/O.Q. 3101309,7.05,,S 328 | 1218,2,"Becker, Miss. Ruth Elizabeth",female,12,2,1,230136,39,F4,S 329 | 1219,1,"Rosenshine, Mr. George (Mr George Thorne"")""",male,46,0,0,PC 17585,79.2,,C 330 | 1220,2,"Clarke, Mr. Charles Valentine",male,29,1,0,2003,26,,S 331 | 1221,2,"Enander, Mr. Ingvar",male,21,0,0,236854,13,,S 332 | 1222,2,"Davies, Mrs. John Morgan (Elizabeth Agnes Mary White) ",female,48,0,2,C.A. 33112,36.75,,S 333 | 1223,1,"Dulles, Mr. William Crothers",male,39,0,0,PC 17580,29.7,A18,C 334 | 1224,3,"Thomas, Mr. Tannous",male,,0,0,2684,7.225,,C 335 | 1225,3,"Nakid, Mrs. Said (Waika Mary"" Mowad)""",female,19,1,1,2653,15.7417,,C 336 | 1226,3,"Cor, Mr. Ivan",male,27,0,0,349229,7.8958,,S 337 | 1227,1,"Maguire, Mr. John Edward",male,30,0,0,110469,26,C106,S 338 | 1228,2,"de Brito, Mr. Jose Joaquim",male,32,0,0,244360,13,,S 339 | 1229,3,"Elias, Mr. Joseph",male,39,0,2,2675,7.2292,,C 340 | 1230,2,"Denbury, Mr. Herbert",male,25,0,0,C.A. 31029,31.5,,S 341 | 1231,3,"Betros, Master. Seman",male,,0,0,2622,7.2292,,C 342 | 1232,2,"Fillbrook, Mr. Joseph Charles",male,18,0,0,C.A. 15185,10.5,,S 343 | 1233,3,"Lundstrom, Mr. Thure Edvin",male,32,0,0,350403,7.5792,,S 344 | 1234,3,"Sage, Mr. John George",male,,1,9,CA. 2343,69.55,,S 345 | 1235,1,"Cardeza, Mrs. James Warburton Martinez (Charlotte Wardle Drake)",female,58,0,1,PC 17755,512.3292,B51 B53 B55,C 346 | 1236,3,"van Billiard, Master. James William",male,,1,1,A/5. 851,14.5,,S 347 | 1237,3,"Abelseth, Miss. Karen Marie",female,16,0,0,348125,7.65,,S 348 | 1238,2,"Botsford, Mr. William Hull",male,26,0,0,237670,13,,S 349 | 1239,3,"Whabee, Mrs. George Joseph (Shawneene Abi-Saab)",female,38,0,0,2688,7.2292,,C 350 | 1240,2,"Giles, Mr. Ralph",male,24,0,0,248726,13.5,,S 351 | 1241,2,"Walcroft, Miss. Nellie",female,31,0,0,F.C.C. 13528,21,,S 352 | 1242,1,"Greenfield, Mrs. Leo David (Blanche Strouse)",female,45,0,1,PC 17759,63.3583,D10 D12,C 353 | 1243,2,"Stokes, Mr. Philip Joseph",male,25,0,0,F.C.C. 13540,10.5,,S 354 | 1244,2,"Dibden, Mr. William",male,18,0,0,S.O.C. 14879,73.5,,S 355 | 1245,2,"Herman, Mr. Samuel",male,49,1,2,220845,65,,S 356 | 1246,3,"Dean, Miss. Elizabeth Gladys Millvina""""",female,0.17,1,2,C.A. 2315,20.575,,S 357 | 1247,1,"Julian, Mr. Henry Forbes",male,50,0,0,113044,26,E60,S 358 | 1248,1,"Brown, Mrs. John Murray (Caroline Lane Lamson)",female,59,2,0,11769,51.4792,C101,S 359 | 1249,3,"Lockyer, Mr. Edward",male,,0,0,1222,7.8792,,S 360 | 1250,3,"O'Keefe, Mr. Patrick",male,,0,0,368402,7.75,,Q 361 | 1251,3,"Lindell, Mrs. Edvard Bengtsson (Elin Gerda Persson)",female,30,1,0,349910,15.55,,S 362 | 1252,3,"Sage, Master. William Henry",male,14.5,8,2,CA. 2343,69.55,,S 363 | 1253,2,"Mallet, Mrs. Albert (Antoinette Magnin)",female,24,1,1,S.C./PARIS 2079,37.0042,,C 364 | 1254,2,"Ware, Mrs. John James (Florence Louise Long)",female,31,0,0,CA 31352,21,,S 365 | 1255,3,"Strilic, Mr. Ivan",male,27,0,0,315083,8.6625,,S 366 | 1256,1,"Harder, Mrs. George Achilles (Dorothy Annan)",female,25,1,0,11765,55.4417,E50,C 367 | 1257,3,"Sage, Mrs. John (Annie Bullen)",female,,1,9,CA. 2343,69.55,,S 368 | 1258,3,"Caram, Mr. Joseph",male,,1,0,2689,14.4583,,C 369 | 1259,3,"Riihivouri, Miss. Susanna Juhantytar Sanni""""",female,22,0,0,3101295,39.6875,,S 370 | 1260,1,"Gibson, Mrs. Leonard (Pauline C Boeson)",female,45,0,1,112378,59.4,,C 371 | 1261,2,"Pallas y Castello, Mr. Emilio",male,29,0,0,SC/PARIS 2147,13.8583,,C 372 | 1262,2,"Giles, Mr. Edgar",male,21,1,0,28133,11.5,,S 373 | 1263,1,"Wilson, Miss. Helen Alice",female,31,0,0,16966,134.5,E39 E41,C 374 | 1264,1,"Ismay, Mr. Joseph Bruce",male,49,0,0,112058,0,B52 B54 B56,S 375 | 1265,2,"Harbeck, Mr. William H",male,44,0,0,248746,13,,S 376 | 1266,1,"Dodge, Mrs. Washington (Ruth Vidaver)",female,54,1,1,33638,81.8583,A34,S 377 | 1267,1,"Bowen, Miss. Grace Scott",female,45,0,0,PC 17608,262.375,,C 378 | 1268,3,"Kink, Miss. Maria",female,22,2,0,315152,8.6625,,S 379 | 1269,2,"Cotterill, Mr. Henry Harry""""",male,21,0,0,29107,11.5,,S 380 | 1270,1,"Hipkins, Mr. William Edward",male,55,0,0,680,50,C39,S 381 | 1271,3,"Asplund, Master. Carl Edgar",male,5,4,2,347077,31.3875,,S 382 | 1272,3,"O'Connor, Mr. Patrick",male,,0,0,366713,7.75,,Q 383 | 1273,3,"Foley, Mr. Joseph",male,26,0,0,330910,7.8792,,Q 384 | 1274,3,"Risien, Mrs. Samuel (Emma)",female,,0,0,364498,14.5,,S 385 | 1275,3,"McNamee, Mrs. Neal (Eileen O'Leary)",female,19,1,0,376566,16.1,,S 386 | 1276,2,"Wheeler, Mr. Edwin Frederick""""",male,,0,0,SC/PARIS 2159,12.875,,S 387 | 1277,2,"Herman, Miss. Kate",female,24,1,2,220845,65,,S 388 | 1278,3,"Aronsson, Mr. Ernst Axel Algot",male,24,0,0,349911,7.775,,S 389 | 1279,2,"Ashby, Mr. John",male,57,0,0,244346,13,,S 390 | 1280,3,"Canavan, Mr. Patrick",male,21,0,0,364858,7.75,,Q 391 | 1281,3,"Palsson, Master. Paul Folke",male,6,3,1,349909,21.075,,S 392 | 1282,1,"Payne, Mr. Vivian Ponsonby",male,23,0,0,12749,93.5,B24,S 393 | 1283,1,"Lines, Mrs. Ernest H (Elizabeth Lindsey James)",female,51,0,1,PC 17592,39.4,D28,S 394 | 1284,3,"Abbott, Master. Eugene Joseph",male,13,0,2,C.A. 2673,20.25,,S 395 | 1285,2,"Gilbert, Mr. William",male,47,0,0,C.A. 30769,10.5,,S 396 | 1286,3,"Kink-Heilmann, Mr. Anton",male,29,3,1,315153,22.025,,S 397 | 1287,1,"Smith, Mrs. Lucien Philip (Mary Eloise Hughes)",female,18,1,0,13695,60,C31,S 398 | 1288,3,"Colbert, Mr. Patrick",male,24,0,0,371109,7.25,,Q 399 | 1289,1,"Frolicher-Stehli, Mrs. Maxmillian (Margaretha Emerentia Stehli)",female,48,1,1,13567,79.2,B41,C 400 | 1290,3,"Larsson-Rondberg, Mr. Edvard A",male,22,0,0,347065,7.775,,S 401 | 1291,3,"Conlon, Mr. Thomas Henry",male,31,0,0,21332,7.7333,,Q 402 | 1292,1,"Bonnell, Miss. Caroline",female,30,0,0,36928,164.8667,C7,S 403 | 1293,2,"Gale, Mr. Harry",male,38,1,0,28664,21,,S 404 | 1294,1,"Gibson, Miss. Dorothy Winifred",female,22,0,1,112378,59.4,,C 405 | 1295,1,"Carrau, Mr. Jose Pedro",male,17,0,0,113059,47.1,,S 406 | 1296,1,"Frauenthal, Mr. Isaac Gerald",male,43,1,0,17765,27.7208,D40,C 407 | 1297,2,"Nourney, Mr. Alfred (Baron von Drachstedt"")""",male,20,0,0,SC/PARIS 2166,13.8625,D38,C 408 | 1298,2,"Ware, Mr. William Jeffery",male,23,1,0,28666,10.5,,S 409 | 1299,1,"Widener, Mr. George Dunton",male,50,1,1,113503,211.5,C80,C 410 | 1300,3,"Riordan, Miss. Johanna Hannah""""",female,,0,0,334915,7.7208,,Q 411 | 1301,3,"Peacock, Miss. Treasteall",female,3,1,1,SOTON/O.Q. 3101315,13.775,,S 412 | 1302,3,"Naughton, Miss. Hannah",female,,0,0,365237,7.75,,Q 413 | 1303,1,"Minahan, Mrs. William Edward (Lillian E Thorpe)",female,37,1,0,19928,90,C78,Q 414 | 1304,3,"Henriksson, Miss. Jenny Lovisa",female,28,0,0,347086,7.775,,S 415 | 1305,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.05,,S 416 | 1306,1,"Oliva y Ocana, Dona. Fermina",female,39,0,0,PC 17758,108.9,C105,C 417 | 1307,3,"Saether, Mr. Simon Sivertsen",male,38.5,0,0,SOTON/O.Q. 3101262,7.25,,S 418 | 1308,3,"Ware, Mr. Frederick",male,,0,0,359309,8.05,,S 419 | 1309,3,"Peter, Master. Michael J",male,,1,1,2668,22.3583,,C 420 | -------------------------------------------------------------------------------- /01_泰坦尼克号(titanic)/datasets/train.csv: -------------------------------------------------------------------------------- 1 | PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked 2 | 1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S 3 | 2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C 4 | 3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S 5 | 4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S 6 | 5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S 7 | 6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q 8 | 7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S 9 | 8,0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S 10 | 9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S 11 | 10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14,1,0,237736,30.0708,,C 12 | 11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4,1,1,PP 9549,16.7,G6,S 13 | 12,1,1,"Bonnell, Miss. Elizabeth",female,58,0,0,113783,26.55,C103,S 14 | 13,0,3,"Saundercock, Mr. William Henry",male,20,0,0,A/5. 2151,8.05,,S 15 | 14,0,3,"Andersson, Mr. Anders Johan",male,39,1,5,347082,31.275,,S 16 | 15,0,3,"Vestrom, Miss. Hulda Amanda Adolfina",female,14,0,0,350406,7.8542,,S 17 | 16,1,2,"Hewlett, Mrs. (Mary D Kingcome) ",female,55,0,0,248706,16,,S 18 | 17,0,3,"Rice, Master. Eugene",male,2,4,1,382652,29.125,,Q 19 | 18,1,2,"Williams, Mr. Charles Eugene",male,,0,0,244373,13,,S 20 | 19,0,3,"Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)",female,31,1,0,345763,18,,S 21 | 20,1,3,"Masselmani, Mrs. Fatima",female,,0,0,2649,7.225,,C 22 | 21,0,2,"Fynney, Mr. Joseph J",male,35,0,0,239865,26,,S 23 | 22,1,2,"Beesley, Mr. Lawrence",male,34,0,0,248698,13,D56,S 24 | 23,1,3,"McGowan, Miss. Anna ""Annie""",female,15,0,0,330923,8.0292,,Q 25 | 24,1,1,"Sloper, Mr. William Thompson",male,28,0,0,113788,35.5,A6,S 26 | 25,0,3,"Palsson, Miss. Torborg Danira",female,8,3,1,349909,21.075,,S 27 | 26,1,3,"Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson)",female,38,1,5,347077,31.3875,,S 28 | 27,0,3,"Emir, Mr. Farred Chehab",male,,0,0,2631,7.225,,C 29 | 28,0,1,"Fortune, Mr. Charles Alexander",male,19,3,2,19950,263,C23 C25 C27,S 30 | 29,1,3,"O'Dwyer, Miss. Ellen ""Nellie""",female,,0,0,330959,7.8792,,Q 31 | 30,0,3,"Todoroff, Mr. Lalio",male,,0,0,349216,7.8958,,S 32 | 31,0,1,"Uruchurtu, Don. Manuel E",male,40,0,0,PC 17601,27.7208,,C 33 | 32,1,1,"Spencer, Mrs. William Augustus (Marie Eugenie)",female,,1,0,PC 17569,146.5208,B78,C 34 | 33,1,3,"Glynn, Miss. Mary Agatha",female,,0,0,335677,7.75,,Q 35 | 34,0,2,"Wheadon, Mr. Edward H",male,66,0,0,C.A. 24579,10.5,,S 36 | 35,0,1,"Meyer, Mr. Edgar Joseph",male,28,1,0,PC 17604,82.1708,,C 37 | 36,0,1,"Holverson, Mr. Alexander Oskar",male,42,1,0,113789,52,,S 38 | 37,1,3,"Mamee, Mr. Hanna",male,,0,0,2677,7.2292,,C 39 | 38,0,3,"Cann, Mr. Ernest Charles",male,21,0,0,A./5. 2152,8.05,,S 40 | 39,0,3,"Vander Planke, Miss. Augusta Maria",female,18,2,0,345764,18,,S 41 | 40,1,3,"Nicola-Yarred, Miss. Jamila",female,14,1,0,2651,11.2417,,C 42 | 41,0,3,"Ahlin, Mrs. Johan (Johanna Persdotter Larsson)",female,40,1,0,7546,9.475,,S 43 | 42,0,2,"Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott)",female,27,1,0,11668,21,,S 44 | 43,0,3,"Kraeff, Mr. Theodor",male,,0,0,349253,7.8958,,C 45 | 44,1,2,"Laroche, Miss. Simonne Marie Anne Andree",female,3,1,2,SC/Paris 2123,41.5792,,C 46 | 45,1,3,"Devaney, Miss. Margaret Delia",female,19,0,0,330958,7.8792,,Q 47 | 46,0,3,"Rogers, Mr. William John",male,,0,0,S.C./A.4. 23567,8.05,,S 48 | 47,0,3,"Lennon, Mr. Denis",male,,1,0,370371,15.5,,Q 49 | 48,1,3,"O'Driscoll, Miss. Bridget",female,,0,0,14311,7.75,,Q 50 | 49,0,3,"Samaan, Mr. Youssef",male,,2,0,2662,21.6792,,C 51 | 50,0,3,"Arnold-Franchi, Mrs. Josef (Josefine Franchi)",female,18,1,0,349237,17.8,,S 52 | 51,0,3,"Panula, Master. Juha Niilo",male,7,4,1,3101295,39.6875,,S 53 | 52,0,3,"Nosworthy, Mr. Richard Cater",male,21,0,0,A/4. 39886,7.8,,S 54 | 53,1,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",female,49,1,0,PC 17572,76.7292,D33,C 55 | 54,1,2,"Faunthorpe, Mrs. Lizzie (Elizabeth Anne Wilkinson)",female,29,1,0,2926,26,,S 56 | 55,0,1,"Ostby, Mr. Engelhart Cornelius",male,65,0,1,113509,61.9792,B30,C 57 | 56,1,1,"Woolner, Mr. Hugh",male,,0,0,19947,35.5,C52,S 58 | 57,1,2,"Rugg, Miss. Emily",female,21,0,0,C.A. 31026,10.5,,S 59 | 58,0,3,"Novel, Mr. Mansouer",male,28.5,0,0,2697,7.2292,,C 60 | 59,1,2,"West, Miss. Constance Mirium",female,5,1,2,C.A. 34651,27.75,,S 61 | 60,0,3,"Goodwin, Master. William Frederick",male,11,5,2,CA 2144,46.9,,S 62 | 61,0,3,"Sirayanian, Mr. Orsen",male,22,0,0,2669,7.2292,,C 63 | 62,1,1,"Icard, Miss. Amelie",female,38,0,0,113572,80,B28, 64 | 63,0,1,"Harris, Mr. Henry Birkhardt",male,45,1,0,36973,83.475,C83,S 65 | 64,0,3,"Skoog, Master. Harald",male,4,3,2,347088,27.9,,S 66 | 65,0,1,"Stewart, Mr. Albert A",male,,0,0,PC 17605,27.7208,,C 67 | 66,1,3,"Moubarek, Master. Gerios",male,,1,1,2661,15.2458,,C 68 | 67,1,2,"Nye, Mrs. (Elizabeth Ramell)",female,29,0,0,C.A. 29395,10.5,F33,S 69 | 68,0,3,"Crease, Mr. Ernest James",male,19,0,0,S.P. 3464,8.1583,,S 70 | 69,1,3,"Andersson, Miss. Erna Alexandra",female,17,4,2,3101281,7.925,,S 71 | 70,0,3,"Kink, Mr. Vincenz",male,26,2,0,315151,8.6625,,S 72 | 71,0,2,"Jenkin, Mr. Stephen Curnow",male,32,0,0,C.A. 33111,10.5,,S 73 | 72,0,3,"Goodwin, Miss. Lillian Amy",female,16,5,2,CA 2144,46.9,,S 74 | 73,0,2,"Hood, Mr. Ambrose Jr",male,21,0,0,S.O.C. 14879,73.5,,S 75 | 74,0,3,"Chronopoulos, Mr. Apostolos",male,26,1,0,2680,14.4542,,C 76 | 75,1,3,"Bing, Mr. Lee",male,32,0,0,1601,56.4958,,S 77 | 76,0,3,"Moen, Mr. Sigurd Hansen",male,25,0,0,348123,7.65,F G73,S 78 | 77,0,3,"Staneff, Mr. Ivan",male,,0,0,349208,7.8958,,S 79 | 78,0,3,"Moutal, Mr. Rahamin Haim",male,,0,0,374746,8.05,,S 80 | 79,1,2,"Caldwell, Master. Alden Gates",male,0.83,0,2,248738,29,,S 81 | 80,1,3,"Dowdell, Miss. Elizabeth",female,30,0,0,364516,12.475,,S 82 | 81,0,3,"Waelens, Mr. Achille",male,22,0,0,345767,9,,S 83 | 82,1,3,"Sheerlinck, Mr. Jan Baptist",male,29,0,0,345779,9.5,,S 84 | 83,1,3,"McDermott, Miss. Brigdet Delia",female,,0,0,330932,7.7875,,Q 85 | 84,0,1,"Carrau, Mr. Francisco M",male,28,0,0,113059,47.1,,S 86 | 85,1,2,"Ilett, Miss. Bertha",female,17,0,0,SO/C 14885,10.5,,S 87 | 86,1,3,"Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)",female,33,3,0,3101278,15.85,,S 88 | 87,0,3,"Ford, Mr. William Neal",male,16,1,3,W./C. 6608,34.375,,S 89 | 88,0,3,"Slocovski, Mr. Selman Francis",male,,0,0,SOTON/OQ 392086,8.05,,S 90 | 89,1,1,"Fortune, Miss. Mabel Helen",female,23,3,2,19950,263,C23 C25 C27,S 91 | 90,0,3,"Celotti, Mr. Francesco",male,24,0,0,343275,8.05,,S 92 | 91,0,3,"Christmann, Mr. Emil",male,29,0,0,343276,8.05,,S 93 | 92,0,3,"Andreasson, Mr. Paul Edvin",male,20,0,0,347466,7.8542,,S 94 | 93,0,1,"Chaffee, Mr. Herbert Fuller",male,46,1,0,W.E.P. 5734,61.175,E31,S 95 | 94,0,3,"Dean, Mr. Bertram Frank",male,26,1,2,C.A. 2315,20.575,,S 96 | 95,0,3,"Coxon, Mr. Daniel",male,59,0,0,364500,7.25,,S 97 | 96,0,3,"Shorney, Mr. Charles Joseph",male,,0,0,374910,8.05,,S 98 | 97,0,1,"Goldschmidt, Mr. George B",male,71,0,0,PC 17754,34.6542,A5,C 99 | 98,1,1,"Greenfield, Mr. William Bertram",male,23,0,1,PC 17759,63.3583,D10 D12,C 100 | 99,1,2,"Doling, Mrs. John T (Ada Julia Bone)",female,34,0,1,231919,23,,S 101 | 100,0,2,"Kantor, Mr. Sinai",male,34,1,0,244367,26,,S 102 | 101,0,3,"Petranec, Miss. Matilda",female,28,0,0,349245,7.8958,,S 103 | 102,0,3,"Petroff, Mr. Pastcho (""Pentcho"")",male,,0,0,349215,7.8958,,S 104 | 103,0,1,"White, Mr. Richard Frasar",male,21,0,1,35281,77.2875,D26,S 105 | 104,0,3,"Johansson, Mr. Gustaf Joel",male,33,0,0,7540,8.6542,,S 106 | 105,0,3,"Gustafsson, Mr. Anders Vilhelm",male,37,2,0,3101276,7.925,,S 107 | 106,0,3,"Mionoff, Mr. Stoytcho",male,28,0,0,349207,7.8958,,S 108 | 107,1,3,"Salkjelsvik, Miss. Anna Kristine",female,21,0,0,343120,7.65,,S 109 | 108,1,3,"Moss, Mr. Albert Johan",male,,0,0,312991,7.775,,S 110 | 109,0,3,"Rekic, Mr. Tido",male,38,0,0,349249,7.8958,,S 111 | 110,1,3,"Moran, Miss. Bertha",female,,1,0,371110,24.15,,Q 112 | 111,0,1,"Porter, Mr. Walter Chamberlain",male,47,0,0,110465,52,C110,S 113 | 112,0,3,"Zabour, Miss. Hileni",female,14.5,1,0,2665,14.4542,,C 114 | 113,0,3,"Barton, Mr. David John",male,22,0,0,324669,8.05,,S 115 | 114,0,3,"Jussila, Miss. Katriina",female,20,1,0,4136,9.825,,S 116 | 115,0,3,"Attalah, Miss. Malake",female,17,0,0,2627,14.4583,,C 117 | 116,0,3,"Pekoniemi, Mr. Edvard",male,21,0,0,STON/O 2. 3101294,7.925,,S 118 | 117,0,3,"Connors, Mr. Patrick",male,70.5,0,0,370369,7.75,,Q 119 | 118,0,2,"Turpin, Mr. William John Robert",male,29,1,0,11668,21,,S 120 | 119,0,1,"Baxter, Mr. Quigg Edmond",male,24,0,1,PC 17558,247.5208,B58 B60,C 121 | 120,0,3,"Andersson, Miss. Ellis Anna Maria",female,2,4,2,347082,31.275,,S 122 | 121,0,2,"Hickman, Mr. Stanley George",male,21,2,0,S.O.C. 14879,73.5,,S 123 | 122,0,3,"Moore, Mr. Leonard Charles",male,,0,0,A4. 54510,8.05,,S 124 | 123,0,2,"Nasser, Mr. Nicholas",male,32.5,1,0,237736,30.0708,,C 125 | 124,1,2,"Webber, Miss. Susan",female,32.5,0,0,27267,13,E101,S 126 | 125,0,1,"White, Mr. Percival Wayland",male,54,0,1,35281,77.2875,D26,S 127 | 126,1,3,"Nicola-Yarred, Master. Elias",male,12,1,0,2651,11.2417,,C 128 | 127,0,3,"McMahon, Mr. Martin",male,,0,0,370372,7.75,,Q 129 | 128,1,3,"Madsen, Mr. Fridtjof Arne",male,24,0,0,C 17369,7.1417,,S 130 | 129,1,3,"Peter, Miss. Anna",female,,1,1,2668,22.3583,F E69,C 131 | 130,0,3,"Ekstrom, Mr. Johan",male,45,0,0,347061,6.975,,S 132 | 131,0,3,"Drazenoic, Mr. Jozef",male,33,0,0,349241,7.8958,,C 133 | 132,0,3,"Coelho, Mr. Domingos Fernandeo",male,20,0,0,SOTON/O.Q. 3101307,7.05,,S 134 | 133,0,3,"Robins, Mrs. Alexander A (Grace Charity Laury)",female,47,1,0,A/5. 3337,14.5,,S 135 | 134,1,2,"Weisz, Mrs. Leopold (Mathilde Francoise Pede)",female,29,1,0,228414,26,,S 136 | 135,0,2,"Sobey, Mr. Samuel James Hayden",male,25,0,0,C.A. 29178,13,,S 137 | 136,0,2,"Richard, Mr. Emile",male,23,0,0,SC/PARIS 2133,15.0458,,C 138 | 137,1,1,"Newsom, Miss. Helen Monypeny",female,19,0,2,11752,26.2833,D47,S 139 | 138,0,1,"Futrelle, Mr. Jacques Heath",male,37,1,0,113803,53.1,C123,S 140 | 139,0,3,"Osen, Mr. Olaf Elon",male,16,0,0,7534,9.2167,,S 141 | 140,0,1,"Giglio, Mr. Victor",male,24,0,0,PC 17593,79.2,B86,C 142 | 141,0,3,"Boulos, Mrs. Joseph (Sultana)",female,,0,2,2678,15.2458,,C 143 | 142,1,3,"Nysten, Miss. Anna Sofia",female,22,0,0,347081,7.75,,S 144 | 143,1,3,"Hakkarainen, Mrs. Pekka Pietari (Elin Matilda Dolck)",female,24,1,0,STON/O2. 3101279,15.85,,S 145 | 144,0,3,"Burke, Mr. Jeremiah",male,19,0,0,365222,6.75,,Q 146 | 145,0,2,"Andrew, Mr. Edgardo Samuel",male,18,0,0,231945,11.5,,S 147 | 146,0,2,"Nicholls, Mr. Joseph Charles",male,19,1,1,C.A. 33112,36.75,,S 148 | 147,1,3,"Andersson, Mr. August Edvard (""Wennerstrom"")",male,27,0,0,350043,7.7958,,S 149 | 148,0,3,"Ford, Miss. Robina Maggie ""Ruby""",female,9,2,2,W./C. 6608,34.375,,S 150 | 149,0,2,"Navratil, Mr. Michel (""Louis M Hoffman"")",male,36.5,0,2,230080,26,F2,S 151 | 150,0,2,"Byles, Rev. Thomas Roussel Davids",male,42,0,0,244310,13,,S 152 | 151,0,2,"Bateman, Rev. Robert James",male,51,0,0,S.O.P. 1166,12.525,,S 153 | 152,1,1,"Pears, Mrs. Thomas (Edith Wearne)",female,22,1,0,113776,66.6,C2,S 154 | 153,0,3,"Meo, Mr. Alfonzo",male,55.5,0,0,A.5. 11206,8.05,,S 155 | 154,0,3,"van Billiard, Mr. Austin Blyler",male,40.5,0,2,A/5. 851,14.5,,S 156 | 155,0,3,"Olsen, Mr. Ole Martin",male,,0,0,Fa 265302,7.3125,,S 157 | 156,0,1,"Williams, Mr. Charles Duane",male,51,0,1,PC 17597,61.3792,,C 158 | 157,1,3,"Gilnagh, Miss. Katherine ""Katie""",female,16,0,0,35851,7.7333,,Q 159 | 158,0,3,"Corn, Mr. Harry",male,30,0,0,SOTON/OQ 392090,8.05,,S 160 | 159,0,3,"Smiljanic, Mr. Mile",male,,0,0,315037,8.6625,,S 161 | 160,0,3,"Sage, Master. Thomas Henry",male,,8,2,CA. 2343,69.55,,S 162 | 161,0,3,"Cribb, Mr. John Hatfield",male,44,0,1,371362,16.1,,S 163 | 162,1,2,"Watt, Mrs. James (Elizabeth ""Bessie"" Inglis Milne)",female,40,0,0,C.A. 33595,15.75,,S 164 | 163,0,3,"Bengtsson, Mr. John Viktor",male,26,0,0,347068,7.775,,S 165 | 164,0,3,"Calic, Mr. Jovo",male,17,0,0,315093,8.6625,,S 166 | 165,0,3,"Panula, Master. Eino Viljami",male,1,4,1,3101295,39.6875,,S 167 | 166,1,3,"Goldsmith, Master. Frank John William ""Frankie""",male,9,0,2,363291,20.525,,S 168 | 167,1,1,"Chibnall, Mrs. (Edith Martha Bowerman)",female,,0,1,113505,55,E33,S 169 | 168,0,3,"Skoog, Mrs. William (Anna Bernhardina Karlsson)",female,45,1,4,347088,27.9,,S 170 | 169,0,1,"Baumann, Mr. John D",male,,0,0,PC 17318,25.925,,S 171 | 170,0,3,"Ling, Mr. Lee",male,28,0,0,1601,56.4958,,S 172 | 171,0,1,"Van der hoef, Mr. Wyckoff",male,61,0,0,111240,33.5,B19,S 173 | 172,0,3,"Rice, Master. Arthur",male,4,4,1,382652,29.125,,Q 174 | 173,1,3,"Johnson, Miss. Eleanor Ileen",female,1,1,1,347742,11.1333,,S 175 | 174,0,3,"Sivola, Mr. Antti Wilhelm",male,21,0,0,STON/O 2. 3101280,7.925,,S 176 | 175,0,1,"Smith, Mr. James Clinch",male,56,0,0,17764,30.6958,A7,C 177 | 176,0,3,"Klasen, Mr. Klas Albin",male,18,1,1,350404,7.8542,,S 178 | 177,0,3,"Lefebre, Master. Henry Forbes",male,,3,1,4133,25.4667,,S 179 | 178,0,1,"Isham, Miss. Ann Elizabeth",female,50,0,0,PC 17595,28.7125,C49,C 180 | 179,0,2,"Hale, Mr. Reginald",male,30,0,0,250653,13,,S 181 | 180,0,3,"Leonard, Mr. Lionel",male,36,0,0,LINE,0,,S 182 | 181,0,3,"Sage, Miss. Constance Gladys",female,,8,2,CA. 2343,69.55,,S 183 | 182,0,2,"Pernot, Mr. Rene",male,,0,0,SC/PARIS 2131,15.05,,C 184 | 183,0,3,"Asplund, Master. Clarence Gustaf Hugo",male,9,4,2,347077,31.3875,,S 185 | 184,1,2,"Becker, Master. Richard F",male,1,2,1,230136,39,F4,S 186 | 185,1,3,"Kink-Heilmann, Miss. Luise Gretchen",female,4,0,2,315153,22.025,,S 187 | 186,0,1,"Rood, Mr. Hugh Roscoe",male,,0,0,113767,50,A32,S 188 | 187,1,3,"O'Brien, Mrs. Thomas (Johanna ""Hannah"" Godfrey)",female,,1,0,370365,15.5,,Q 189 | 188,1,1,"Romaine, Mr. Charles Hallace (""Mr C Rolmane"")",male,45,0,0,111428,26.55,,S 190 | 189,0,3,"Bourke, Mr. John",male,40,1,1,364849,15.5,,Q 191 | 190,0,3,"Turcin, Mr. Stjepan",male,36,0,0,349247,7.8958,,S 192 | 191,1,2,"Pinsky, Mrs. (Rosa)",female,32,0,0,234604,13,,S 193 | 192,0,2,"Carbines, Mr. William",male,19,0,0,28424,13,,S 194 | 193,1,3,"Andersen-Jensen, Miss. Carla Christine Nielsine",female,19,1,0,350046,7.8542,,S 195 | 194,1,2,"Navratil, Master. Michel M",male,3,1,1,230080,26,F2,S 196 | 195,1,1,"Brown, Mrs. James Joseph (Margaret Tobin)",female,44,0,0,PC 17610,27.7208,B4,C 197 | 196,1,1,"Lurette, Miss. Elise",female,58,0,0,PC 17569,146.5208,B80,C 198 | 197,0,3,"Mernagh, Mr. Robert",male,,0,0,368703,7.75,,Q 199 | 198,0,3,"Olsen, Mr. Karl Siegwart Andreas",male,42,0,1,4579,8.4042,,S 200 | 199,1,3,"Madigan, Miss. Margaret ""Maggie""",female,,0,0,370370,7.75,,Q 201 | 200,0,2,"Yrois, Miss. Henriette (""Mrs Harbeck"")",female,24,0,0,248747,13,,S 202 | 201,0,3,"Vande Walle, Mr. Nestor Cyriel",male,28,0,0,345770,9.5,,S 203 | 202,0,3,"Sage, Mr. Frederick",male,,8,2,CA. 2343,69.55,,S 204 | 203,0,3,"Johanson, Mr. Jakob Alfred",male,34,0,0,3101264,6.4958,,S 205 | 204,0,3,"Youseff, Mr. Gerious",male,45.5,0,0,2628,7.225,,C 206 | 205,1,3,"Cohen, Mr. Gurshon ""Gus""",male,18,0,0,A/5 3540,8.05,,S 207 | 206,0,3,"Strom, Miss. Telma Matilda",female,2,0,1,347054,10.4625,G6,S 208 | 207,0,3,"Backstrom, Mr. Karl Alfred",male,32,1,0,3101278,15.85,,S 209 | 208,1,3,"Albimona, Mr. Nassef Cassem",male,26,0,0,2699,18.7875,,C 210 | 209,1,3,"Carr, Miss. Helen ""Ellen""",female,16,0,0,367231,7.75,,Q 211 | 210,1,1,"Blank, Mr. Henry",male,40,0,0,112277,31,A31,C 212 | 211,0,3,"Ali, Mr. Ahmed",male,24,0,0,SOTON/O.Q. 3101311,7.05,,S 213 | 212,1,2,"Cameron, Miss. Clear Annie",female,35,0,0,F.C.C. 13528,21,,S 214 | 213,0,3,"Perkin, Mr. John Henry",male,22,0,0,A/5 21174,7.25,,S 215 | 214,0,2,"Givard, Mr. Hans Kristensen",male,30,0,0,250646,13,,S 216 | 215,0,3,"Kiernan, Mr. Philip",male,,1,0,367229,7.75,,Q 217 | 216,1,1,"Newell, Miss. Madeleine",female,31,1,0,35273,113.275,D36,C 218 | 217,1,3,"Honkanen, Miss. Eliina",female,27,0,0,STON/O2. 3101283,7.925,,S 219 | 218,0,2,"Jacobsohn, Mr. Sidney Samuel",male,42,1,0,243847,27,,S 220 | 219,1,1,"Bazzani, Miss. Albina",female,32,0,0,11813,76.2917,D15,C 221 | 220,0,2,"Harris, Mr. Walter",male,30,0,0,W/C 14208,10.5,,S 222 | 221,1,3,"Sunderland, Mr. Victor Francis",male,16,0,0,SOTON/OQ 392089,8.05,,S 223 | 222,0,2,"Bracken, Mr. James H",male,27,0,0,220367,13,,S 224 | 223,0,3,"Green, Mr. George Henry",male,51,0,0,21440,8.05,,S 225 | 224,0,3,"Nenkoff, Mr. Christo",male,,0,0,349234,7.8958,,S 226 | 225,1,1,"Hoyt, Mr. Frederick Maxfield",male,38,1,0,19943,90,C93,S 227 | 226,0,3,"Berglund, Mr. Karl Ivar Sven",male,22,0,0,PP 4348,9.35,,S 228 | 227,1,2,"Mellors, Mr. William John",male,19,0,0,SW/PP 751,10.5,,S 229 | 228,0,3,"Lovell, Mr. John Hall (""Henry"")",male,20.5,0,0,A/5 21173,7.25,,S 230 | 229,0,2,"Fahlstrom, Mr. Arne Jonas",male,18,0,0,236171,13,,S 231 | 230,0,3,"Lefebre, Miss. Mathilde",female,,3,1,4133,25.4667,,S 232 | 231,1,1,"Harris, Mrs. Henry Birkhardt (Irene Wallach)",female,35,1,0,36973,83.475,C83,S 233 | 232,0,3,"Larsson, Mr. Bengt Edvin",male,29,0,0,347067,7.775,,S 234 | 233,0,2,"Sjostedt, Mr. Ernst Adolf",male,59,0,0,237442,13.5,,S 235 | 234,1,3,"Asplund, Miss. Lillian Gertrud",female,5,4,2,347077,31.3875,,S 236 | 235,0,2,"Leyson, Mr. Robert William Norman",male,24,0,0,C.A. 29566,10.5,,S 237 | 236,0,3,"Harknett, Miss. Alice Phoebe",female,,0,0,W./C. 6609,7.55,,S 238 | 237,0,2,"Hold, Mr. Stephen",male,44,1,0,26707,26,,S 239 | 238,1,2,"Collyer, Miss. Marjorie ""Lottie""",female,8,0,2,C.A. 31921,26.25,,S 240 | 239,0,2,"Pengelly, Mr. Frederick William",male,19,0,0,28665,10.5,,S 241 | 240,0,2,"Hunt, Mr. George Henry",male,33,0,0,SCO/W 1585,12.275,,S 242 | 241,0,3,"Zabour, Miss. Thamine",female,,1,0,2665,14.4542,,C 243 | 242,1,3,"Murphy, Miss. Katherine ""Kate""",female,,1,0,367230,15.5,,Q 244 | 243,0,2,"Coleridge, Mr. Reginald Charles",male,29,0,0,W./C. 14263,10.5,,S 245 | 244,0,3,"Maenpaa, Mr. Matti Alexanteri",male,22,0,0,STON/O 2. 3101275,7.125,,S 246 | 245,0,3,"Attalah, Mr. Sleiman",male,30,0,0,2694,7.225,,C 247 | 246,0,1,"Minahan, Dr. William Edward",male,44,2,0,19928,90,C78,Q 248 | 247,0,3,"Lindahl, Miss. Agda Thorilda Viktoria",female,25,0,0,347071,7.775,,S 249 | 248,1,2,"Hamalainen, Mrs. William (Anna)",female,24,0,2,250649,14.5,,S 250 | 249,1,1,"Beckwith, Mr. Richard Leonard",male,37,1,1,11751,52.5542,D35,S 251 | 250,0,2,"Carter, Rev. Ernest Courtenay",male,54,1,0,244252,26,,S 252 | 251,0,3,"Reed, Mr. James George",male,,0,0,362316,7.25,,S 253 | 252,0,3,"Strom, Mrs. Wilhelm (Elna Matilda Persson)",female,29,1,1,347054,10.4625,G6,S 254 | 253,0,1,"Stead, Mr. William Thomas",male,62,0,0,113514,26.55,C87,S 255 | 254,0,3,"Lobb, Mr. William Arthur",male,30,1,0,A/5. 3336,16.1,,S 256 | 255,0,3,"Rosblom, Mrs. Viktor (Helena Wilhelmina)",female,41,0,2,370129,20.2125,,S 257 | 256,1,3,"Touma, Mrs. Darwis (Hanne Youssef Razi)",female,29,0,2,2650,15.2458,,C 258 | 257,1,1,"Thorne, Mrs. Gertrude Maybelle",female,,0,0,PC 17585,79.2,,C 259 | 258,1,1,"Cherry, Miss. Gladys",female,30,0,0,110152,86.5,B77,S 260 | 259,1,1,"Ward, Miss. Anna",female,35,0,0,PC 17755,512.3292,,C 261 | 260,1,2,"Parrish, Mrs. (Lutie Davis)",female,50,0,1,230433,26,,S 262 | 261,0,3,"Smith, Mr. Thomas",male,,0,0,384461,7.75,,Q 263 | 262,1,3,"Asplund, Master. Edvin Rojj Felix",male,3,4,2,347077,31.3875,,S 264 | 263,0,1,"Taussig, Mr. Emil",male,52,1,1,110413,79.65,E67,S 265 | 264,0,1,"Harrison, Mr. William",male,40,0,0,112059,0,B94,S 266 | 265,0,3,"Henry, Miss. Delia",female,,0,0,382649,7.75,,Q 267 | 266,0,2,"Reeves, Mr. David",male,36,0,0,C.A. 17248,10.5,,S 268 | 267,0,3,"Panula, Mr. Ernesti Arvid",male,16,4,1,3101295,39.6875,,S 269 | 268,1,3,"Persson, Mr. Ernst Ulrik",male,25,1,0,347083,7.775,,S 270 | 269,1,1,"Graham, Mrs. William Thompson (Edith Junkins)",female,58,0,1,PC 17582,153.4625,C125,S 271 | 270,1,1,"Bissette, Miss. Amelia",female,35,0,0,PC 17760,135.6333,C99,S 272 | 271,0,1,"Cairns, Mr. Alexander",male,,0,0,113798,31,,S 273 | 272,1,3,"Tornquist, Mr. William Henry",male,25,0,0,LINE,0,,S 274 | 273,1,2,"Mellinger, Mrs. (Elizabeth Anne Maidment)",female,41,0,1,250644,19.5,,S 275 | 274,0,1,"Natsch, Mr. Charles H",male,37,0,1,PC 17596,29.7,C118,C 276 | 275,1,3,"Healy, Miss. Hanora ""Nora""",female,,0,0,370375,7.75,,Q 277 | 276,1,1,"Andrews, Miss. Kornelia Theodosia",female,63,1,0,13502,77.9583,D7,S 278 | 277,0,3,"Lindblom, Miss. Augusta Charlotta",female,45,0,0,347073,7.75,,S 279 | 278,0,2,"Parkes, Mr. Francis ""Frank""",male,,0,0,239853,0,,S 280 | 279,0,3,"Rice, Master. Eric",male,7,4,1,382652,29.125,,Q 281 | 280,1,3,"Abbott, Mrs. Stanton (Rosa Hunt)",female,35,1,1,C.A. 2673,20.25,,S 282 | 281,0,3,"Duane, Mr. Frank",male,65,0,0,336439,7.75,,Q 283 | 282,0,3,"Olsson, Mr. Nils Johan Goransson",male,28,0,0,347464,7.8542,,S 284 | 283,0,3,"de Pelsmaeker, Mr. Alfons",male,16,0,0,345778,9.5,,S 285 | 284,1,3,"Dorking, Mr. Edward Arthur",male,19,0,0,A/5. 10482,8.05,,S 286 | 285,0,1,"Smith, Mr. Richard William",male,,0,0,113056,26,A19,S 287 | 286,0,3,"Stankovic, Mr. Ivan",male,33,0,0,349239,8.6625,,C 288 | 287,1,3,"de Mulder, Mr. Theodore",male,30,0,0,345774,9.5,,S 289 | 288,0,3,"Naidenoff, Mr. Penko",male,22,0,0,349206,7.8958,,S 290 | 289,1,2,"Hosono, Mr. Masabumi",male,42,0,0,237798,13,,S 291 | 290,1,3,"Connolly, Miss. Kate",female,22,0,0,370373,7.75,,Q 292 | 291,1,1,"Barber, Miss. Ellen ""Nellie""",female,26,0,0,19877,78.85,,S 293 | 292,1,1,"Bishop, Mrs. Dickinson H (Helen Walton)",female,19,1,0,11967,91.0792,B49,C 294 | 293,0,2,"Levy, Mr. Rene Jacques",male,36,0,0,SC/Paris 2163,12.875,D,C 295 | 294,0,3,"Haas, Miss. Aloisia",female,24,0,0,349236,8.85,,S 296 | 295,0,3,"Mineff, Mr. Ivan",male,24,0,0,349233,7.8958,,S 297 | 296,0,1,"Lewy, Mr. Ervin G",male,,0,0,PC 17612,27.7208,,C 298 | 297,0,3,"Hanna, Mr. Mansour",male,23.5,0,0,2693,7.2292,,C 299 | 298,0,1,"Allison, Miss. Helen Loraine",female,2,1,2,113781,151.55,C22 C26,S 300 | 299,1,1,"Saalfeld, Mr. Adolphe",male,,0,0,19988,30.5,C106,S 301 | 300,1,1,"Baxter, Mrs. James (Helene DeLaudeniere Chaput)",female,50,0,1,PC 17558,247.5208,B58 B60,C 302 | 301,1,3,"Kelly, Miss. Anna Katherine ""Annie Kate""",female,,0,0,9234,7.75,,Q 303 | 302,1,3,"McCoy, Mr. Bernard",male,,2,0,367226,23.25,,Q 304 | 303,0,3,"Johnson, Mr. William Cahoone Jr",male,19,0,0,LINE,0,,S 305 | 304,1,2,"Keane, Miss. Nora A",female,,0,0,226593,12.35,E101,Q 306 | 305,0,3,"Williams, Mr. Howard Hugh ""Harry""",male,,0,0,A/5 2466,8.05,,S 307 | 306,1,1,"Allison, Master. Hudson Trevor",male,0.92,1,2,113781,151.55,C22 C26,S 308 | 307,1,1,"Fleming, Miss. Margaret",female,,0,0,17421,110.8833,,C 309 | 308,1,1,"Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo)",female,17,1,0,PC 17758,108.9,C65,C 310 | 309,0,2,"Abelson, Mr. Samuel",male,30,1,0,P/PP 3381,24,,C 311 | 310,1,1,"Francatelli, Miss. Laura Mabel",female,30,0,0,PC 17485,56.9292,E36,C 312 | 311,1,1,"Hays, Miss. Margaret Bechstein",female,24,0,0,11767,83.1583,C54,C 313 | 312,1,1,"Ryerson, Miss. Emily Borie",female,18,2,2,PC 17608,262.375,B57 B59 B63 B66,C 314 | 313,0,2,"Lahtinen, Mrs. William (Anna Sylfven)",female,26,1,1,250651,26,,S 315 | 314,0,3,"Hendekovic, Mr. Ignjac",male,28,0,0,349243,7.8958,,S 316 | 315,0,2,"Hart, Mr. Benjamin",male,43,1,1,F.C.C. 13529,26.25,,S 317 | 316,1,3,"Nilsson, Miss. Helmina Josefina",female,26,0,0,347470,7.8542,,S 318 | 317,1,2,"Kantor, Mrs. Sinai (Miriam Sternin)",female,24,1,0,244367,26,,S 319 | 318,0,2,"Moraweck, Dr. Ernest",male,54,0,0,29011,14,,S 320 | 319,1,1,"Wick, Miss. Mary Natalie",female,31,0,2,36928,164.8667,C7,S 321 | 320,1,1,"Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone)",female,40,1,1,16966,134.5,E34,C 322 | 321,0,3,"Dennis, Mr. Samuel",male,22,0,0,A/5 21172,7.25,,S 323 | 322,0,3,"Danoff, Mr. Yoto",male,27,0,0,349219,7.8958,,S 324 | 323,1,2,"Slayter, Miss. Hilda Mary",female,30,0,0,234818,12.35,,Q 325 | 324,1,2,"Caldwell, Mrs. Albert Francis (Sylvia Mae Harbaugh)",female,22,1,1,248738,29,,S 326 | 325,0,3,"Sage, Mr. George John Jr",male,,8,2,CA. 2343,69.55,,S 327 | 326,1,1,"Young, Miss. Marie Grice",female,36,0,0,PC 17760,135.6333,C32,C 328 | 327,0,3,"Nysveen, Mr. Johan Hansen",male,61,0,0,345364,6.2375,,S 329 | 328,1,2,"Ball, Mrs. (Ada E Hall)",female,36,0,0,28551,13,D,S 330 | 329,1,3,"Goldsmith, Mrs. Frank John (Emily Alice Brown)",female,31,1,1,363291,20.525,,S 331 | 330,1,1,"Hippach, Miss. Jean Gertrude",female,16,0,1,111361,57.9792,B18,C 332 | 331,1,3,"McCoy, Miss. Agnes",female,,2,0,367226,23.25,,Q 333 | 332,0,1,"Partner, Mr. Austen",male,45.5,0,0,113043,28.5,C124,S 334 | 333,0,1,"Graham, Mr. George Edward",male,38,0,1,PC 17582,153.4625,C91,S 335 | 334,0,3,"Vander Planke, Mr. Leo Edmondus",male,16,2,0,345764,18,,S 336 | 335,1,1,"Frauenthal, Mrs. Henry William (Clara Heinsheimer)",female,,1,0,PC 17611,133.65,,S 337 | 336,0,3,"Denkoff, Mr. Mitto",male,,0,0,349225,7.8958,,S 338 | 337,0,1,"Pears, Mr. Thomas Clinton",male,29,1,0,113776,66.6,C2,S 339 | 338,1,1,"Burns, Miss. Elizabeth Margaret",female,41,0,0,16966,134.5,E40,C 340 | 339,1,3,"Dahl, Mr. Karl Edwart",male,45,0,0,7598,8.05,,S 341 | 340,0,1,"Blackwell, Mr. Stephen Weart",male,45,0,0,113784,35.5,T,S 342 | 341,1,2,"Navratil, Master. Edmond Roger",male,2,1,1,230080,26,F2,S 343 | 342,1,1,"Fortune, Miss. Alice Elizabeth",female,24,3,2,19950,263,C23 C25 C27,S 344 | 343,0,2,"Collander, Mr. Erik Gustaf",male,28,0,0,248740,13,,S 345 | 344,0,2,"Sedgwick, Mr. Charles Frederick Waddington",male,25,0,0,244361,13,,S 346 | 345,0,2,"Fox, Mr. Stanley Hubert",male,36,0,0,229236,13,,S 347 | 346,1,2,"Brown, Miss. Amelia ""Mildred""",female,24,0,0,248733,13,F33,S 348 | 347,1,2,"Smith, Miss. Marion Elsie",female,40,0,0,31418,13,,S 349 | 348,1,3,"Davison, Mrs. Thomas Henry (Mary E Finck)",female,,1,0,386525,16.1,,S 350 | 349,1,3,"Coutts, Master. William Loch ""William""",male,3,1,1,C.A. 37671,15.9,,S 351 | 350,0,3,"Dimic, Mr. Jovan",male,42,0,0,315088,8.6625,,S 352 | 351,0,3,"Odahl, Mr. Nils Martin",male,23,0,0,7267,9.225,,S 353 | 352,0,1,"Williams-Lambert, Mr. Fletcher Fellows",male,,0,0,113510,35,C128,S 354 | 353,0,3,"Elias, Mr. Tannous",male,15,1,1,2695,7.2292,,C 355 | 354,0,3,"Arnold-Franchi, Mr. Josef",male,25,1,0,349237,17.8,,S 356 | 355,0,3,"Yousif, Mr. Wazli",male,,0,0,2647,7.225,,C 357 | 356,0,3,"Vanden Steen, Mr. Leo Peter",male,28,0,0,345783,9.5,,S 358 | 357,1,1,"Bowerman, Miss. Elsie Edith",female,22,0,1,113505,55,E33,S 359 | 358,0,2,"Funk, Miss. Annie Clemmer",female,38,0,0,237671,13,,S 360 | 359,1,3,"McGovern, Miss. Mary",female,,0,0,330931,7.8792,,Q 361 | 360,1,3,"Mockler, Miss. Helen Mary ""Ellie""",female,,0,0,330980,7.8792,,Q 362 | 361,0,3,"Skoog, Mr. Wilhelm",male,40,1,4,347088,27.9,,S 363 | 362,0,2,"del Carlo, Mr. Sebastiano",male,29,1,0,SC/PARIS 2167,27.7208,,C 364 | 363,0,3,"Barbara, Mrs. (Catherine David)",female,45,0,1,2691,14.4542,,C 365 | 364,0,3,"Asim, Mr. Adola",male,35,0,0,SOTON/O.Q. 3101310,7.05,,S 366 | 365,0,3,"O'Brien, Mr. Thomas",male,,1,0,370365,15.5,,Q 367 | 366,0,3,"Adahl, Mr. Mauritz Nils Martin",male,30,0,0,C 7076,7.25,,S 368 | 367,1,1,"Warren, Mrs. Frank Manley (Anna Sophia Atkinson)",female,60,1,0,110813,75.25,D37,C 369 | 368,1,3,"Moussa, Mrs. (Mantoura Boulos)",female,,0,0,2626,7.2292,,C 370 | 369,1,3,"Jermyn, Miss. Annie",female,,0,0,14313,7.75,,Q 371 | 370,1,1,"Aubart, Mme. Leontine Pauline",female,24,0,0,PC 17477,69.3,B35,C 372 | 371,1,1,"Harder, Mr. George Achilles",male,25,1,0,11765,55.4417,E50,C 373 | 372,0,3,"Wiklund, Mr. Jakob Alfred",male,18,1,0,3101267,6.4958,,S 374 | 373,0,3,"Beavan, Mr. William Thomas",male,19,0,0,323951,8.05,,S 375 | 374,0,1,"Ringhini, Mr. Sante",male,22,0,0,PC 17760,135.6333,,C 376 | 375,0,3,"Palsson, Miss. Stina Viola",female,3,3,1,349909,21.075,,S 377 | 376,1,1,"Meyer, Mrs. Edgar Joseph (Leila Saks)",female,,1,0,PC 17604,82.1708,,C 378 | 377,1,3,"Landergren, Miss. Aurora Adelia",female,22,0,0,C 7077,7.25,,S 379 | 378,0,1,"Widener, Mr. Harry Elkins",male,27,0,2,113503,211.5,C82,C 380 | 379,0,3,"Betros, Mr. Tannous",male,20,0,0,2648,4.0125,,C 381 | 380,0,3,"Gustafsson, Mr. Karl Gideon",male,19,0,0,347069,7.775,,S 382 | 381,1,1,"Bidois, Miss. Rosalie",female,42,0,0,PC 17757,227.525,,C 383 | 382,1,3,"Nakid, Miss. Maria (""Mary"")",female,1,0,2,2653,15.7417,,C 384 | 383,0,3,"Tikkanen, Mr. Juho",male,32,0,0,STON/O 2. 3101293,7.925,,S 385 | 384,1,1,"Holverson, Mrs. Alexander Oskar (Mary Aline Towner)",female,35,1,0,113789,52,,S 386 | 385,0,3,"Plotcharsky, Mr. Vasil",male,,0,0,349227,7.8958,,S 387 | 386,0,2,"Davies, Mr. Charles Henry",male,18,0,0,S.O.C. 14879,73.5,,S 388 | 387,0,3,"Goodwin, Master. Sidney Leonard",male,1,5,2,CA 2144,46.9,,S 389 | 388,1,2,"Buss, Miss. Kate",female,36,0,0,27849,13,,S 390 | 389,0,3,"Sadlier, Mr. Matthew",male,,0,0,367655,7.7292,,Q 391 | 390,1,2,"Lehmann, Miss. Bertha",female,17,0,0,SC 1748,12,,C 392 | 391,1,1,"Carter, Mr. William Ernest",male,36,1,2,113760,120,B96 B98,S 393 | 392,1,3,"Jansson, Mr. Carl Olof",male,21,0,0,350034,7.7958,,S 394 | 393,0,3,"Gustafsson, Mr. Johan Birger",male,28,2,0,3101277,7.925,,S 395 | 394,1,1,"Newell, Miss. Marjorie",female,23,1,0,35273,113.275,D36,C 396 | 395,1,3,"Sandstrom, Mrs. Hjalmar (Agnes Charlotta Bengtsson)",female,24,0,2,PP 9549,16.7,G6,S 397 | 396,0,3,"Johansson, Mr. Erik",male,22,0,0,350052,7.7958,,S 398 | 397,0,3,"Olsson, Miss. Elina",female,31,0,0,350407,7.8542,,S 399 | 398,0,2,"McKane, Mr. Peter David",male,46,0,0,28403,26,,S 400 | 399,0,2,"Pain, Dr. Alfred",male,23,0,0,244278,10.5,,S 401 | 400,1,2,"Trout, Mrs. William H (Jessie L)",female,28,0,0,240929,12.65,,S 402 | 401,1,3,"Niskanen, Mr. Juha",male,39,0,0,STON/O 2. 3101289,7.925,,S 403 | 402,0,3,"Adams, Mr. John",male,26,0,0,341826,8.05,,S 404 | 403,0,3,"Jussila, Miss. Mari Aina",female,21,1,0,4137,9.825,,S 405 | 404,0,3,"Hakkarainen, Mr. Pekka Pietari",male,28,1,0,STON/O2. 3101279,15.85,,S 406 | 405,0,3,"Oreskovic, Miss. Marija",female,20,0,0,315096,8.6625,,S 407 | 406,0,2,"Gale, Mr. Shadrach",male,34,1,0,28664,21,,S 408 | 407,0,3,"Widegren, Mr. Carl/Charles Peter",male,51,0,0,347064,7.75,,S 409 | 408,1,2,"Richards, Master. William Rowe",male,3,1,1,29106,18.75,,S 410 | 409,0,3,"Birkeland, Mr. Hans Martin Monsen",male,21,0,0,312992,7.775,,S 411 | 410,0,3,"Lefebre, Miss. Ida",female,,3,1,4133,25.4667,,S 412 | 411,0,3,"Sdycoff, Mr. Todor",male,,0,0,349222,7.8958,,S 413 | 412,0,3,"Hart, Mr. Henry",male,,0,0,394140,6.8583,,Q 414 | 413,1,1,"Minahan, Miss. Daisy E",female,33,1,0,19928,90,C78,Q 415 | 414,0,2,"Cunningham, Mr. Alfred Fleming",male,,0,0,239853,0,,S 416 | 415,1,3,"Sundman, Mr. Johan Julian",male,44,0,0,STON/O 2. 3101269,7.925,,S 417 | 416,0,3,"Meek, Mrs. Thomas (Annie Louise Rowley)",female,,0,0,343095,8.05,,S 418 | 417,1,2,"Drew, Mrs. James Vivian (Lulu Thorne Christian)",female,34,1,1,28220,32.5,,S 419 | 418,1,2,"Silven, Miss. Lyyli Karoliina",female,18,0,2,250652,13,,S 420 | 419,0,2,"Matthews, Mr. William John",male,30,0,0,28228,13,,S 421 | 420,0,3,"Van Impe, Miss. Catharina",female,10,0,2,345773,24.15,,S 422 | 421,0,3,"Gheorgheff, Mr. Stanio",male,,0,0,349254,7.8958,,C 423 | 422,0,3,"Charters, Mr. David",male,21,0,0,A/5. 13032,7.7333,,Q 424 | 423,0,3,"Zimmerman, Mr. Leo",male,29,0,0,315082,7.875,,S 425 | 424,0,3,"Danbom, Mrs. Ernst Gilbert (Anna Sigrid Maria Brogren)",female,28,1,1,347080,14.4,,S 426 | 425,0,3,"Rosblom, Mr. Viktor Richard",male,18,1,1,370129,20.2125,,S 427 | 426,0,3,"Wiseman, Mr. Phillippe",male,,0,0,A/4. 34244,7.25,,S 428 | 427,1,2,"Clarke, Mrs. Charles V (Ada Maria Winfield)",female,28,1,0,2003,26,,S 429 | 428,1,2,"Phillips, Miss. Kate Florence (""Mrs Kate Louise Phillips Marshall"")",female,19,0,0,250655,26,,S 430 | 429,0,3,"Flynn, Mr. James",male,,0,0,364851,7.75,,Q 431 | 430,1,3,"Pickard, Mr. Berk (Berk Trembisky)",male,32,0,0,SOTON/O.Q. 392078,8.05,E10,S 432 | 431,1,1,"Bjornstrom-Steffansson, Mr. Mauritz Hakan",male,28,0,0,110564,26.55,C52,S 433 | 432,1,3,"Thorneycroft, Mrs. Percival (Florence Kate White)",female,,1,0,376564,16.1,,S 434 | 433,1,2,"Louch, Mrs. Charles Alexander (Alice Adelaide Slow)",female,42,1,0,SC/AH 3085,26,,S 435 | 434,0,3,"Kallio, Mr. Nikolai Erland",male,17,0,0,STON/O 2. 3101274,7.125,,S 436 | 435,0,1,"Silvey, Mr. William Baird",male,50,1,0,13507,55.9,E44,S 437 | 436,1,1,"Carter, Miss. Lucile Polk",female,14,1,2,113760,120,B96 B98,S 438 | 437,0,3,"Ford, Miss. Doolina Margaret ""Daisy""",female,21,2,2,W./C. 6608,34.375,,S 439 | 438,1,2,"Richards, Mrs. Sidney (Emily Hocking)",female,24,2,3,29106,18.75,,S 440 | 439,0,1,"Fortune, Mr. Mark",male,64,1,4,19950,263,C23 C25 C27,S 441 | 440,0,2,"Kvillner, Mr. Johan Henrik Johannesson",male,31,0,0,C.A. 18723,10.5,,S 442 | 441,1,2,"Hart, Mrs. Benjamin (Esther Ada Bloomfield)",female,45,1,1,F.C.C. 13529,26.25,,S 443 | 442,0,3,"Hampe, Mr. Leon",male,20,0,0,345769,9.5,,S 444 | 443,0,3,"Petterson, Mr. Johan Emil",male,25,1,0,347076,7.775,,S 445 | 444,1,2,"Reynaldo, Ms. Encarnacion",female,28,0,0,230434,13,,S 446 | 445,1,3,"Johannesen-Bratthammer, Mr. Bernt",male,,0,0,65306,8.1125,,S 447 | 446,1,1,"Dodge, Master. Washington",male,4,0,2,33638,81.8583,A34,S 448 | 447,1,2,"Mellinger, Miss. Madeleine Violet",female,13,0,1,250644,19.5,,S 449 | 448,1,1,"Seward, Mr. Frederic Kimber",male,34,0,0,113794,26.55,,S 450 | 449,1,3,"Baclini, Miss. Marie Catherine",female,5,2,1,2666,19.2583,,C 451 | 450,1,1,"Peuchen, Major. Arthur Godfrey",male,52,0,0,113786,30.5,C104,S 452 | 451,0,2,"West, Mr. Edwy Arthur",male,36,1,2,C.A. 34651,27.75,,S 453 | 452,0,3,"Hagland, Mr. Ingvald Olai Olsen",male,,1,0,65303,19.9667,,S 454 | 453,0,1,"Foreman, Mr. Benjamin Laventall",male,30,0,0,113051,27.75,C111,C 455 | 454,1,1,"Goldenberg, Mr. Samuel L",male,49,1,0,17453,89.1042,C92,C 456 | 455,0,3,"Peduzzi, Mr. Joseph",male,,0,0,A/5 2817,8.05,,S 457 | 456,1,3,"Jalsevac, Mr. Ivan",male,29,0,0,349240,7.8958,,C 458 | 457,0,1,"Millet, Mr. Francis Davis",male,65,0,0,13509,26.55,E38,S 459 | 458,1,1,"Kenyon, Mrs. Frederick R (Marion)",female,,1,0,17464,51.8625,D21,S 460 | 459,1,2,"Toomey, Miss. Ellen",female,50,0,0,F.C.C. 13531,10.5,,S 461 | 460,0,3,"O'Connor, Mr. Maurice",male,,0,0,371060,7.75,,Q 462 | 461,1,1,"Anderson, Mr. Harry",male,48,0,0,19952,26.55,E12,S 463 | 462,0,3,"Morley, Mr. William",male,34,0,0,364506,8.05,,S 464 | 463,0,1,"Gee, Mr. Arthur H",male,47,0,0,111320,38.5,E63,S 465 | 464,0,2,"Milling, Mr. Jacob Christian",male,48,0,0,234360,13,,S 466 | 465,0,3,"Maisner, Mr. Simon",male,,0,0,A/S 2816,8.05,,S 467 | 466,0,3,"Goncalves, Mr. Manuel Estanslas",male,38,0,0,SOTON/O.Q. 3101306,7.05,,S 468 | 467,0,2,"Campbell, Mr. William",male,,0,0,239853,0,,S 469 | 468,0,1,"Smart, Mr. John Montgomery",male,56,0,0,113792,26.55,,S 470 | 469,0,3,"Scanlan, Mr. James",male,,0,0,36209,7.725,,Q 471 | 470,1,3,"Baclini, Miss. Helene Barbara",female,0.75,2,1,2666,19.2583,,C 472 | 471,0,3,"Keefe, Mr. Arthur",male,,0,0,323592,7.25,,S 473 | 472,0,3,"Cacic, Mr. Luka",male,38,0,0,315089,8.6625,,S 474 | 473,1,2,"West, Mrs. Edwy Arthur (Ada Mary Worth)",female,33,1,2,C.A. 34651,27.75,,S 475 | 474,1,2,"Jerwan, Mrs. Amin S (Marie Marthe Thuillard)",female,23,0,0,SC/AH Basle 541,13.7917,D,C 476 | 475,0,3,"Strandberg, Miss. Ida Sofia",female,22,0,0,7553,9.8375,,S 477 | 476,0,1,"Clifford, Mr. George Quincy",male,,0,0,110465,52,A14,S 478 | 477,0,2,"Renouf, Mr. Peter Henry",male,34,1,0,31027,21,,S 479 | 478,0,3,"Braund, Mr. Lewis Richard",male,29,1,0,3460,7.0458,,S 480 | 479,0,3,"Karlsson, Mr. Nils August",male,22,0,0,350060,7.5208,,S 481 | 480,1,3,"Hirvonen, Miss. Hildur E",female,2,0,1,3101298,12.2875,,S 482 | 481,0,3,"Goodwin, Master. Harold Victor",male,9,5,2,CA 2144,46.9,,S 483 | 482,0,2,"Frost, Mr. Anthony Wood ""Archie""",male,,0,0,239854,0,,S 484 | 483,0,3,"Rouse, Mr. Richard Henry",male,50,0,0,A/5 3594,8.05,,S 485 | 484,1,3,"Turkula, Mrs. (Hedwig)",female,63,0,0,4134,9.5875,,S 486 | 485,1,1,"Bishop, Mr. Dickinson H",male,25,1,0,11967,91.0792,B49,C 487 | 486,0,3,"Lefebre, Miss. Jeannie",female,,3,1,4133,25.4667,,S 488 | 487,1,1,"Hoyt, Mrs. Frederick Maxfield (Jane Anne Forby)",female,35,1,0,19943,90,C93,S 489 | 488,0,1,"Kent, Mr. Edward Austin",male,58,0,0,11771,29.7,B37,C 490 | 489,0,3,"Somerton, Mr. Francis William",male,30,0,0,A.5. 18509,8.05,,S 491 | 490,1,3,"Coutts, Master. Eden Leslie ""Neville""",male,9,1,1,C.A. 37671,15.9,,S 492 | 491,0,3,"Hagland, Mr. Konrad Mathias Reiersen",male,,1,0,65304,19.9667,,S 493 | 492,0,3,"Windelov, Mr. Einar",male,21,0,0,SOTON/OQ 3101317,7.25,,S 494 | 493,0,1,"Molson, Mr. Harry Markland",male,55,0,0,113787,30.5,C30,S 495 | 494,0,1,"Artagaveytia, Mr. Ramon",male,71,0,0,PC 17609,49.5042,,C 496 | 495,0,3,"Stanley, Mr. Edward Roland",male,21,0,0,A/4 45380,8.05,,S 497 | 496,0,3,"Yousseff, Mr. Gerious",male,,0,0,2627,14.4583,,C 498 | 497,1,1,"Eustis, Miss. Elizabeth Mussey",female,54,1,0,36947,78.2667,D20,C 499 | 498,0,3,"Shellard, Mr. Frederick William",male,,0,0,C.A. 6212,15.1,,S 500 | 499,0,1,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25,1,2,113781,151.55,C22 C26,S 501 | 500,0,3,"Svensson, Mr. Olof",male,24,0,0,350035,7.7958,,S 502 | 501,0,3,"Calic, Mr. Petar",male,17,0,0,315086,8.6625,,S 503 | 502,0,3,"Canavan, Miss. Mary",female,21,0,0,364846,7.75,,Q 504 | 503,0,3,"O'Sullivan, Miss. Bridget Mary",female,,0,0,330909,7.6292,,Q 505 | 504,0,3,"Laitinen, Miss. Kristina Sofia",female,37,0,0,4135,9.5875,,S 506 | 505,1,1,"Maioni, Miss. Roberta",female,16,0,0,110152,86.5,B79,S 507 | 506,0,1,"Penasco y Castellana, Mr. Victor de Satode",male,18,1,0,PC 17758,108.9,C65,C 508 | 507,1,2,"Quick, Mrs. Frederick Charles (Jane Richards)",female,33,0,2,26360,26,,S 509 | 508,1,1,"Bradley, Mr. George (""George Arthur Brayton"")",male,,0,0,111427,26.55,,S 510 | 509,0,3,"Olsen, Mr. Henry Margido",male,28,0,0,C 4001,22.525,,S 511 | 510,1,3,"Lang, Mr. Fang",male,26,0,0,1601,56.4958,,S 512 | 511,1,3,"Daly, Mr. Eugene Patrick",male,29,0,0,382651,7.75,,Q 513 | 512,0,3,"Webber, Mr. James",male,,0,0,SOTON/OQ 3101316,8.05,,S 514 | 513,1,1,"McGough, Mr. James Robert",male,36,0,0,PC 17473,26.2875,E25,S 515 | 514,1,1,"Rothschild, Mrs. Martin (Elizabeth L. Barrett)",female,54,1,0,PC 17603,59.4,,C 516 | 515,0,3,"Coleff, Mr. Satio",male,24,0,0,349209,7.4958,,S 517 | 516,0,1,"Walker, Mr. William Anderson",male,47,0,0,36967,34.0208,D46,S 518 | 517,1,2,"Lemore, Mrs. (Amelia Milley)",female,34,0,0,C.A. 34260,10.5,F33,S 519 | 518,0,3,"Ryan, Mr. Patrick",male,,0,0,371110,24.15,,Q 520 | 519,1,2,"Angle, Mrs. William A (Florence ""Mary"" Agnes Hughes)",female,36,1,0,226875,26,,S 521 | 520,0,3,"Pavlovic, Mr. Stefo",male,32,0,0,349242,7.8958,,S 522 | 521,1,1,"Perreault, Miss. Anne",female,30,0,0,12749,93.5,B73,S 523 | 522,0,3,"Vovk, Mr. Janko",male,22,0,0,349252,7.8958,,S 524 | 523,0,3,"Lahoud, Mr. Sarkis",male,,0,0,2624,7.225,,C 525 | 524,1,1,"Hippach, Mrs. Louis Albert (Ida Sophia Fischer)",female,44,0,1,111361,57.9792,B18,C 526 | 525,0,3,"Kassem, Mr. Fared",male,,0,0,2700,7.2292,,C 527 | 526,0,3,"Farrell, Mr. James",male,40.5,0,0,367232,7.75,,Q 528 | 527,1,2,"Ridsdale, Miss. Lucy",female,50,0,0,W./C. 14258,10.5,,S 529 | 528,0,1,"Farthing, Mr. John",male,,0,0,PC 17483,221.7792,C95,S 530 | 529,0,3,"Salonen, Mr. Johan Werner",male,39,0,0,3101296,7.925,,S 531 | 530,0,2,"Hocking, Mr. Richard George",male,23,2,1,29104,11.5,,S 532 | 531,1,2,"Quick, Miss. Phyllis May",female,2,1,1,26360,26,,S 533 | 532,0,3,"Toufik, Mr. Nakli",male,,0,0,2641,7.2292,,C 534 | 533,0,3,"Elias, Mr. Joseph Jr",male,17,1,1,2690,7.2292,,C 535 | 534,1,3,"Peter, Mrs. Catherine (Catherine Rizk)",female,,0,2,2668,22.3583,,C 536 | 535,0,3,"Cacic, Miss. Marija",female,30,0,0,315084,8.6625,,S 537 | 536,1,2,"Hart, Miss. Eva Miriam",female,7,0,2,F.C.C. 13529,26.25,,S 538 | 537,0,1,"Butt, Major. Archibald Willingham",male,45,0,0,113050,26.55,B38,S 539 | 538,1,1,"LeRoy, Miss. Bertha",female,30,0,0,PC 17761,106.425,,C 540 | 539,0,3,"Risien, Mr. Samuel Beard",male,,0,0,364498,14.5,,S 541 | 540,1,1,"Frolicher, Miss. Hedwig Margaritha",female,22,0,2,13568,49.5,B39,C 542 | 541,1,1,"Crosby, Miss. Harriet R",female,36,0,2,WE/P 5735,71,B22,S 543 | 542,0,3,"Andersson, Miss. Ingeborg Constanzia",female,9,4,2,347082,31.275,,S 544 | 543,0,3,"Andersson, Miss. Sigrid Elisabeth",female,11,4,2,347082,31.275,,S 545 | 544,1,2,"Beane, Mr. Edward",male,32,1,0,2908,26,,S 546 | 545,0,1,"Douglas, Mr. Walter Donald",male,50,1,0,PC 17761,106.425,C86,C 547 | 546,0,1,"Nicholson, Mr. Arthur Ernest",male,64,0,0,693,26,,S 548 | 547,1,2,"Beane, Mrs. Edward (Ethel Clarke)",female,19,1,0,2908,26,,S 549 | 548,1,2,"Padro y Manent, Mr. Julian",male,,0,0,SC/PARIS 2146,13.8625,,C 550 | 549,0,3,"Goldsmith, Mr. Frank John",male,33,1,1,363291,20.525,,S 551 | 550,1,2,"Davies, Master. John Morgan Jr",male,8,1,1,C.A. 33112,36.75,,S 552 | 551,1,1,"Thayer, Mr. John Borland Jr",male,17,0,2,17421,110.8833,C70,C 553 | 552,0,2,"Sharp, Mr. Percival James R",male,27,0,0,244358,26,,S 554 | 553,0,3,"O'Brien, Mr. Timothy",male,,0,0,330979,7.8292,,Q 555 | 554,1,3,"Leeni, Mr. Fahim (""Philip Zenni"")",male,22,0,0,2620,7.225,,C 556 | 555,1,3,"Ohman, Miss. Velin",female,22,0,0,347085,7.775,,S 557 | 556,0,1,"Wright, Mr. George",male,62,0,0,113807,26.55,,S 558 | 557,1,1,"Duff Gordon, Lady. (Lucille Christiana Sutherland) (""Mrs Morgan"")",female,48,1,0,11755,39.6,A16,C 559 | 558,0,1,"Robbins, Mr. Victor",male,,0,0,PC 17757,227.525,,C 560 | 559,1,1,"Taussig, Mrs. Emil (Tillie Mandelbaum)",female,39,1,1,110413,79.65,E67,S 561 | 560,1,3,"de Messemaeker, Mrs. Guillaume Joseph (Emma)",female,36,1,0,345572,17.4,,S 562 | 561,0,3,"Morrow, Mr. Thomas Rowan",male,,0,0,372622,7.75,,Q 563 | 562,0,3,"Sivic, Mr. Husein",male,40,0,0,349251,7.8958,,S 564 | 563,0,2,"Norman, Mr. Robert Douglas",male,28,0,0,218629,13.5,,S 565 | 564,0,3,"Simmons, Mr. John",male,,0,0,SOTON/OQ 392082,8.05,,S 566 | 565,0,3,"Meanwell, Miss. (Marion Ogden)",female,,0,0,SOTON/O.Q. 392087,8.05,,S 567 | 566,0,3,"Davies, Mr. Alfred J",male,24,2,0,A/4 48871,24.15,,S 568 | 567,0,3,"Stoytcheff, Mr. Ilia",male,19,0,0,349205,7.8958,,S 569 | 568,0,3,"Palsson, Mrs. Nils (Alma Cornelia Berglund)",female,29,0,4,349909,21.075,,S 570 | 569,0,3,"Doharr, Mr. Tannous",male,,0,0,2686,7.2292,,C 571 | 570,1,3,"Jonsson, Mr. Carl",male,32,0,0,350417,7.8542,,S 572 | 571,1,2,"Harris, Mr. George",male,62,0,0,S.W./PP 752,10.5,,S 573 | 572,1,1,"Appleton, Mrs. Edward Dale (Charlotte Lamson)",female,53,2,0,11769,51.4792,C101,S 574 | 573,1,1,"Flynn, Mr. John Irwin (""Irving"")",male,36,0,0,PC 17474,26.3875,E25,S 575 | 574,1,3,"Kelly, Miss. Mary",female,,0,0,14312,7.75,,Q 576 | 575,0,3,"Rush, Mr. Alfred George John",male,16,0,0,A/4. 20589,8.05,,S 577 | 576,0,3,"Patchett, Mr. George",male,19,0,0,358585,14.5,,S 578 | 577,1,2,"Garside, Miss. Ethel",female,34,0,0,243880,13,,S 579 | 578,1,1,"Silvey, Mrs. William Baird (Alice Munger)",female,39,1,0,13507,55.9,E44,S 580 | 579,0,3,"Caram, Mrs. Joseph (Maria Elias)",female,,1,0,2689,14.4583,,C 581 | 580,1,3,"Jussila, Mr. Eiriik",male,32,0,0,STON/O 2. 3101286,7.925,,S 582 | 581,1,2,"Christy, Miss. Julie Rachel",female,25,1,1,237789,30,,S 583 | 582,1,1,"Thayer, Mrs. John Borland (Marian Longstreth Morris)",female,39,1,1,17421,110.8833,C68,C 584 | 583,0,2,"Downton, Mr. William James",male,54,0,0,28403,26,,S 585 | 584,0,1,"Ross, Mr. John Hugo",male,36,0,0,13049,40.125,A10,C 586 | 585,0,3,"Paulner, Mr. Uscher",male,,0,0,3411,8.7125,,C 587 | 586,1,1,"Taussig, Miss. Ruth",female,18,0,2,110413,79.65,E68,S 588 | 587,0,2,"Jarvis, Mr. John Denzil",male,47,0,0,237565,15,,S 589 | 588,1,1,"Frolicher-Stehli, Mr. Maxmillian",male,60,1,1,13567,79.2,B41,C 590 | 589,0,3,"Gilinski, Mr. Eliezer",male,22,0,0,14973,8.05,,S 591 | 590,0,3,"Murdlin, Mr. Joseph",male,,0,0,A./5. 3235,8.05,,S 592 | 591,0,3,"Rintamaki, Mr. Matti",male,35,0,0,STON/O 2. 3101273,7.125,,S 593 | 592,1,1,"Stephenson, Mrs. Walter Bertram (Martha Eustis)",female,52,1,0,36947,78.2667,D20,C 594 | 593,0,3,"Elsbury, Mr. William James",male,47,0,0,A/5 3902,7.25,,S 595 | 594,0,3,"Bourke, Miss. Mary",female,,0,2,364848,7.75,,Q 596 | 595,0,2,"Chapman, Mr. John Henry",male,37,1,0,SC/AH 29037,26,,S 597 | 596,0,3,"Van Impe, Mr. Jean Baptiste",male,36,1,1,345773,24.15,,S 598 | 597,1,2,"Leitch, Miss. Jessie Wills",female,,0,0,248727,33,,S 599 | 598,0,3,"Johnson, Mr. Alfred",male,49,0,0,LINE,0,,S 600 | 599,0,3,"Boulos, Mr. Hanna",male,,0,0,2664,7.225,,C 601 | 600,1,1,"Duff Gordon, Sir. Cosmo Edmund (""Mr Morgan"")",male,49,1,0,PC 17485,56.9292,A20,C 602 | 601,1,2,"Jacobsohn, Mrs. Sidney Samuel (Amy Frances Christy)",female,24,2,1,243847,27,,S 603 | 602,0,3,"Slabenoff, Mr. Petco",male,,0,0,349214,7.8958,,S 604 | 603,0,1,"Harrington, Mr. Charles H",male,,0,0,113796,42.4,,S 605 | 604,0,3,"Torber, Mr. Ernst William",male,44,0,0,364511,8.05,,S 606 | 605,1,1,"Homer, Mr. Harry (""Mr E Haven"")",male,35,0,0,111426,26.55,,C 607 | 606,0,3,"Lindell, Mr. Edvard Bengtsson",male,36,1,0,349910,15.55,,S 608 | 607,0,3,"Karaic, Mr. Milan",male,30,0,0,349246,7.8958,,S 609 | 608,1,1,"Daniel, Mr. Robert Williams",male,27,0,0,113804,30.5,,S 610 | 609,1,2,"Laroche, Mrs. Joseph (Juliette Marie Louise Lafargue)",female,22,1,2,SC/Paris 2123,41.5792,,C 611 | 610,1,1,"Shutes, Miss. Elizabeth W",female,40,0,0,PC 17582,153.4625,C125,S 612 | 611,0,3,"Andersson, Mrs. Anders Johan (Alfrida Konstantia Brogren)",female,39,1,5,347082,31.275,,S 613 | 612,0,3,"Jardin, Mr. Jose Neto",male,,0,0,SOTON/O.Q. 3101305,7.05,,S 614 | 613,1,3,"Murphy, Miss. Margaret Jane",female,,1,0,367230,15.5,,Q 615 | 614,0,3,"Horgan, Mr. John",male,,0,0,370377,7.75,,Q 616 | 615,0,3,"Brocklebank, Mr. William Alfred",male,35,0,0,364512,8.05,,S 617 | 616,1,2,"Herman, Miss. Alice",female,24,1,2,220845,65,,S 618 | 617,0,3,"Danbom, Mr. Ernst Gilbert",male,34,1,1,347080,14.4,,S 619 | 618,0,3,"Lobb, Mrs. William Arthur (Cordelia K Stanlick)",female,26,1,0,A/5. 3336,16.1,,S 620 | 619,1,2,"Becker, Miss. Marion Louise",female,4,2,1,230136,39,F4,S 621 | 620,0,2,"Gavey, Mr. Lawrence",male,26,0,0,31028,10.5,,S 622 | 621,0,3,"Yasbeck, Mr. Antoni",male,27,1,0,2659,14.4542,,C 623 | 622,1,1,"Kimball, Mr. Edwin Nelson Jr",male,42,1,0,11753,52.5542,D19,S 624 | 623,1,3,"Nakid, Mr. Sahid",male,20,1,1,2653,15.7417,,C 625 | 624,0,3,"Hansen, Mr. Henry Damsgaard",male,21,0,0,350029,7.8542,,S 626 | 625,0,3,"Bowen, Mr. David John ""Dai""",male,21,0,0,54636,16.1,,S 627 | 626,0,1,"Sutton, Mr. Frederick",male,61,0,0,36963,32.3208,D50,S 628 | 627,0,2,"Kirkland, Rev. Charles Leonard",male,57,0,0,219533,12.35,,Q 629 | 628,1,1,"Longley, Miss. Gretchen Fiske",female,21,0,0,13502,77.9583,D9,S 630 | 629,0,3,"Bostandyeff, Mr. Guentcho",male,26,0,0,349224,7.8958,,S 631 | 630,0,3,"O'Connell, Mr. Patrick D",male,,0,0,334912,7.7333,,Q 632 | 631,1,1,"Barkworth, Mr. Algernon Henry Wilson",male,80,0,0,27042,30,A23,S 633 | 632,0,3,"Lundahl, Mr. Johan Svensson",male,51,0,0,347743,7.0542,,S 634 | 633,1,1,"Stahelin-Maeglin, Dr. Max",male,32,0,0,13214,30.5,B50,C 635 | 634,0,1,"Parr, Mr. William Henry Marsh",male,,0,0,112052,0,,S 636 | 635,0,3,"Skoog, Miss. Mabel",female,9,3,2,347088,27.9,,S 637 | 636,1,2,"Davis, Miss. Mary",female,28,0,0,237668,13,,S 638 | 637,0,3,"Leinonen, Mr. Antti Gustaf",male,32,0,0,STON/O 2. 3101292,7.925,,S 639 | 638,0,2,"Collyer, Mr. Harvey",male,31,1,1,C.A. 31921,26.25,,S 640 | 639,0,3,"Panula, Mrs. Juha (Maria Emilia Ojala)",female,41,0,5,3101295,39.6875,,S 641 | 640,0,3,"Thorneycroft, Mr. Percival",male,,1,0,376564,16.1,,S 642 | 641,0,3,"Jensen, Mr. Hans Peder",male,20,0,0,350050,7.8542,,S 643 | 642,1,1,"Sagesser, Mlle. Emma",female,24,0,0,PC 17477,69.3,B35,C 644 | 643,0,3,"Skoog, Miss. Margit Elizabeth",female,2,3,2,347088,27.9,,S 645 | 644,1,3,"Foo, Mr. Choong",male,,0,0,1601,56.4958,,S 646 | 645,1,3,"Baclini, Miss. Eugenie",female,0.75,2,1,2666,19.2583,,C 647 | 646,1,1,"Harper, Mr. Henry Sleeper",male,48,1,0,PC 17572,76.7292,D33,C 648 | 647,0,3,"Cor, Mr. Liudevit",male,19,0,0,349231,7.8958,,S 649 | 648,1,1,"Simonius-Blumer, Col. Oberst Alfons",male,56,0,0,13213,35.5,A26,C 650 | 649,0,3,"Willey, Mr. Edward",male,,0,0,S.O./P.P. 751,7.55,,S 651 | 650,1,3,"Stanley, Miss. Amy Zillah Elsie",female,23,0,0,CA. 2314,7.55,,S 652 | 651,0,3,"Mitkoff, Mr. Mito",male,,0,0,349221,7.8958,,S 653 | 652,1,2,"Doling, Miss. Elsie",female,18,0,1,231919,23,,S 654 | 653,0,3,"Kalvik, Mr. Johannes Halvorsen",male,21,0,0,8475,8.4333,,S 655 | 654,1,3,"O'Leary, Miss. Hanora ""Norah""",female,,0,0,330919,7.8292,,Q 656 | 655,0,3,"Hegarty, Miss. Hanora ""Nora""",female,18,0,0,365226,6.75,,Q 657 | 656,0,2,"Hickman, Mr. Leonard Mark",male,24,2,0,S.O.C. 14879,73.5,,S 658 | 657,0,3,"Radeff, Mr. Alexander",male,,0,0,349223,7.8958,,S 659 | 658,0,3,"Bourke, Mrs. John (Catherine)",female,32,1,1,364849,15.5,,Q 660 | 659,0,2,"Eitemiller, Mr. George Floyd",male,23,0,0,29751,13,,S 661 | 660,0,1,"Newell, Mr. Arthur Webster",male,58,0,2,35273,113.275,D48,C 662 | 661,1,1,"Frauenthal, Dr. Henry William",male,50,2,0,PC 17611,133.65,,S 663 | 662,0,3,"Badt, Mr. Mohamed",male,40,0,0,2623,7.225,,C 664 | 663,0,1,"Colley, Mr. Edward Pomeroy",male,47,0,0,5727,25.5875,E58,S 665 | 664,0,3,"Coleff, Mr. Peju",male,36,0,0,349210,7.4958,,S 666 | 665,1,3,"Lindqvist, Mr. Eino William",male,20,1,0,STON/O 2. 3101285,7.925,,S 667 | 666,0,2,"Hickman, Mr. Lewis",male,32,2,0,S.O.C. 14879,73.5,,S 668 | 667,0,2,"Butler, Mr. Reginald Fenton",male,25,0,0,234686,13,,S 669 | 668,0,3,"Rommetvedt, Mr. Knud Paust",male,,0,0,312993,7.775,,S 670 | 669,0,3,"Cook, Mr. Jacob",male,43,0,0,A/5 3536,8.05,,S 671 | 670,1,1,"Taylor, Mrs. Elmer Zebley (Juliet Cummins Wright)",female,,1,0,19996,52,C126,S 672 | 671,1,2,"Brown, Mrs. Thomas William Solomon (Elizabeth Catherine Ford)",female,40,1,1,29750,39,,S 673 | 672,0,1,"Davidson, Mr. Thornton",male,31,1,0,F.C. 12750,52,B71,S 674 | 673,0,2,"Mitchell, Mr. Henry Michael",male,70,0,0,C.A. 24580,10.5,,S 675 | 674,1,2,"Wilhelms, Mr. Charles",male,31,0,0,244270,13,,S 676 | 675,0,2,"Watson, Mr. Ennis Hastings",male,,0,0,239856,0,,S 677 | 676,0,3,"Edvardsson, Mr. Gustaf Hjalmar",male,18,0,0,349912,7.775,,S 678 | 677,0,3,"Sawyer, Mr. Frederick Charles",male,24.5,0,0,342826,8.05,,S 679 | 678,1,3,"Turja, Miss. Anna Sofia",female,18,0,0,4138,9.8417,,S 680 | 679,0,3,"Goodwin, Mrs. Frederick (Augusta Tyler)",female,43,1,6,CA 2144,46.9,,S 681 | 680,1,1,"Cardeza, Mr. Thomas Drake Martinez",male,36,0,1,PC 17755,512.3292,B51 B53 B55,C 682 | 681,0,3,"Peters, Miss. Katie",female,,0,0,330935,8.1375,,Q 683 | 682,1,1,"Hassab, Mr. Hammad",male,27,0,0,PC 17572,76.7292,D49,C 684 | 683,0,3,"Olsvigen, Mr. Thor Anderson",male,20,0,0,6563,9.225,,S 685 | 684,0,3,"Goodwin, Mr. Charles Edward",male,14,5,2,CA 2144,46.9,,S 686 | 685,0,2,"Brown, Mr. Thomas William Solomon",male,60,1,1,29750,39,,S 687 | 686,0,2,"Laroche, Mr. Joseph Philippe Lemercier",male,25,1,2,SC/Paris 2123,41.5792,,C 688 | 687,0,3,"Panula, Mr. Jaako Arnold",male,14,4,1,3101295,39.6875,,S 689 | 688,0,3,"Dakic, Mr. Branko",male,19,0,0,349228,10.1708,,S 690 | 689,0,3,"Fischer, Mr. Eberhard Thelander",male,18,0,0,350036,7.7958,,S 691 | 690,1,1,"Madill, Miss. Georgette Alexandra",female,15,0,1,24160,211.3375,B5,S 692 | 691,1,1,"Dick, Mr. Albert Adrian",male,31,1,0,17474,57,B20,S 693 | 692,1,3,"Karun, Miss. Manca",female,4,0,1,349256,13.4167,,C 694 | 693,1,3,"Lam, Mr. Ali",male,,0,0,1601,56.4958,,S 695 | 694,0,3,"Saad, Mr. Khalil",male,25,0,0,2672,7.225,,C 696 | 695,0,1,"Weir, Col. John",male,60,0,0,113800,26.55,,S 697 | 696,0,2,"Chapman, Mr. Charles Henry",male,52,0,0,248731,13.5,,S 698 | 697,0,3,"Kelly, Mr. James",male,44,0,0,363592,8.05,,S 699 | 698,1,3,"Mullens, Miss. Katherine ""Katie""",female,,0,0,35852,7.7333,,Q 700 | 699,0,1,"Thayer, Mr. John Borland",male,49,1,1,17421,110.8833,C68,C 701 | 700,0,3,"Humblen, Mr. Adolf Mathias Nicolai Olsen",male,42,0,0,348121,7.65,F G63,S 702 | 701,1,1,"Astor, Mrs. John Jacob (Madeleine Talmadge Force)",female,18,1,0,PC 17757,227.525,C62 C64,C 703 | 702,1,1,"Silverthorne, Mr. Spencer Victor",male,35,0,0,PC 17475,26.2875,E24,S 704 | 703,0,3,"Barbara, Miss. Saiide",female,18,0,1,2691,14.4542,,C 705 | 704,0,3,"Gallagher, Mr. Martin",male,25,0,0,36864,7.7417,,Q 706 | 705,0,3,"Hansen, Mr. Henrik Juul",male,26,1,0,350025,7.8542,,S 707 | 706,0,2,"Morley, Mr. Henry Samuel (""Mr Henry Marshall"")",male,39,0,0,250655,26,,S 708 | 707,1,2,"Kelly, Mrs. Florence ""Fannie""",female,45,0,0,223596,13.5,,S 709 | 708,1,1,"Calderhead, Mr. Edward Pennington",male,42,0,0,PC 17476,26.2875,E24,S 710 | 709,1,1,"Cleaver, Miss. Alice",female,22,0,0,113781,151.55,,S 711 | 710,1,3,"Moubarek, Master. Halim Gonios (""William George"")",male,,1,1,2661,15.2458,,C 712 | 711,1,1,"Mayne, Mlle. Berthe Antonine (""Mrs de Villiers"")",female,24,0,0,PC 17482,49.5042,C90,C 713 | 712,0,1,"Klaber, Mr. Herman",male,,0,0,113028,26.55,C124,S 714 | 713,1,1,"Taylor, Mr. Elmer Zebley",male,48,1,0,19996,52,C126,S 715 | 714,0,3,"Larsson, Mr. August Viktor",male,29,0,0,7545,9.4833,,S 716 | 715,0,2,"Greenberg, Mr. Samuel",male,52,0,0,250647,13,,S 717 | 716,0,3,"Soholt, Mr. Peter Andreas Lauritz Andersen",male,19,0,0,348124,7.65,F G73,S 718 | 717,1,1,"Endres, Miss. Caroline Louise",female,38,0,0,PC 17757,227.525,C45,C 719 | 718,1,2,"Troutt, Miss. Edwina Celia ""Winnie""",female,27,0,0,34218,10.5,E101,S 720 | 719,0,3,"McEvoy, Mr. Michael",male,,0,0,36568,15.5,,Q 721 | 720,0,3,"Johnson, Mr. Malkolm Joackim",male,33,0,0,347062,7.775,,S 722 | 721,1,2,"Harper, Miss. Annie Jessie ""Nina""",female,6,0,1,248727,33,,S 723 | 722,0,3,"Jensen, Mr. Svend Lauritz",male,17,1,0,350048,7.0542,,S 724 | 723,0,2,"Gillespie, Mr. William Henry",male,34,0,0,12233,13,,S 725 | 724,0,2,"Hodges, Mr. Henry Price",male,50,0,0,250643,13,,S 726 | 725,1,1,"Chambers, Mr. Norman Campbell",male,27,1,0,113806,53.1,E8,S 727 | 726,0,3,"Oreskovic, Mr. Luka",male,20,0,0,315094,8.6625,,S 728 | 727,1,2,"Renouf, Mrs. Peter Henry (Lillian Jefferys)",female,30,3,0,31027,21,,S 729 | 728,1,3,"Mannion, Miss. Margareth",female,,0,0,36866,7.7375,,Q 730 | 729,0,2,"Bryhl, Mr. Kurt Arnold Gottfrid",male,25,1,0,236853,26,,S 731 | 730,0,3,"Ilmakangas, Miss. Pieta Sofia",female,25,1,0,STON/O2. 3101271,7.925,,S 732 | 731,1,1,"Allen, Miss. Elisabeth Walton",female,29,0,0,24160,211.3375,B5,S 733 | 732,0,3,"Hassan, Mr. Houssein G N",male,11,0,0,2699,18.7875,,C 734 | 733,0,2,"Knight, Mr. Robert J",male,,0,0,239855,0,,S 735 | 734,0,2,"Berriman, Mr. William John",male,23,0,0,28425,13,,S 736 | 735,0,2,"Troupiansky, Mr. Moses Aaron",male,23,0,0,233639,13,,S 737 | 736,0,3,"Williams, Mr. Leslie",male,28.5,0,0,54636,16.1,,S 738 | 737,0,3,"Ford, Mrs. Edward (Margaret Ann Watson)",female,48,1,3,W./C. 6608,34.375,,S 739 | 738,1,1,"Lesurer, Mr. Gustave J",male,35,0,0,PC 17755,512.3292,B101,C 740 | 739,0,3,"Ivanoff, Mr. Kanio",male,,0,0,349201,7.8958,,S 741 | 740,0,3,"Nankoff, Mr. Minko",male,,0,0,349218,7.8958,,S 742 | 741,1,1,"Hawksford, Mr. Walter James",male,,0,0,16988,30,D45,S 743 | 742,0,1,"Cavendish, Mr. Tyrell William",male,36,1,0,19877,78.85,C46,S 744 | 743,1,1,"Ryerson, Miss. Susan Parker ""Suzette""",female,21,2,2,PC 17608,262.375,B57 B59 B63 B66,C 745 | 744,0,3,"McNamee, Mr. Neal",male,24,1,0,376566,16.1,,S 746 | 745,1,3,"Stranden, Mr. Juho",male,31,0,0,STON/O 2. 3101288,7.925,,S 747 | 746,0,1,"Crosby, Capt. Edward Gifford",male,70,1,1,WE/P 5735,71,B22,S 748 | 747,0,3,"Abbott, Mr. Rossmore Edward",male,16,1,1,C.A. 2673,20.25,,S 749 | 748,1,2,"Sinkkonen, Miss. Anna",female,30,0,0,250648,13,,S 750 | 749,0,1,"Marvin, Mr. Daniel Warner",male,19,1,0,113773,53.1,D30,S 751 | 750,0,3,"Connaghton, Mr. Michael",male,31,0,0,335097,7.75,,Q 752 | 751,1,2,"Wells, Miss. Joan",female,4,1,1,29103,23,,S 753 | 752,1,3,"Moor, Master. Meier",male,6,0,1,392096,12.475,E121,S 754 | 753,0,3,"Vande Velde, Mr. Johannes Joseph",male,33,0,0,345780,9.5,,S 755 | 754,0,3,"Jonkoff, Mr. Lalio",male,23,0,0,349204,7.8958,,S 756 | 755,1,2,"Herman, Mrs. Samuel (Jane Laver)",female,48,1,2,220845,65,,S 757 | 756,1,2,"Hamalainen, Master. Viljo",male,0.67,1,1,250649,14.5,,S 758 | 757,0,3,"Carlsson, Mr. August Sigfrid",male,28,0,0,350042,7.7958,,S 759 | 758,0,2,"Bailey, Mr. Percy Andrew",male,18,0,0,29108,11.5,,S 760 | 759,0,3,"Theobald, Mr. Thomas Leonard",male,34,0,0,363294,8.05,,S 761 | 760,1,1,"Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)",female,33,0,0,110152,86.5,B77,S 762 | 761,0,3,"Garfirth, Mr. John",male,,0,0,358585,14.5,,S 763 | 762,0,3,"Nirva, Mr. Iisakki Antino Aijo",male,41,0,0,SOTON/O2 3101272,7.125,,S 764 | 763,1,3,"Barah, Mr. Hanna Assi",male,20,0,0,2663,7.2292,,C 765 | 764,1,1,"Carter, Mrs. William Ernest (Lucile Polk)",female,36,1,2,113760,120,B96 B98,S 766 | 765,0,3,"Eklund, Mr. Hans Linus",male,16,0,0,347074,7.775,,S 767 | 766,1,1,"Hogeboom, Mrs. John C (Anna Andrews)",female,51,1,0,13502,77.9583,D11,S 768 | 767,0,1,"Brewe, Dr. Arthur Jackson",male,,0,0,112379,39.6,,C 769 | 768,0,3,"Mangan, Miss. Mary",female,30.5,0,0,364850,7.75,,Q 770 | 769,0,3,"Moran, Mr. Daniel J",male,,1,0,371110,24.15,,Q 771 | 770,0,3,"Gronnestad, Mr. Daniel Danielsen",male,32,0,0,8471,8.3625,,S 772 | 771,0,3,"Lievens, Mr. Rene Aime",male,24,0,0,345781,9.5,,S 773 | 772,0,3,"Jensen, Mr. Niels Peder",male,48,0,0,350047,7.8542,,S 774 | 773,0,2,"Mack, Mrs. (Mary)",female,57,0,0,S.O./P.P. 3,10.5,E77,S 775 | 774,0,3,"Elias, Mr. Dibo",male,,0,0,2674,7.225,,C 776 | 775,1,2,"Hocking, Mrs. Elizabeth (Eliza Needs)",female,54,1,3,29105,23,,S 777 | 776,0,3,"Myhrman, Mr. Pehr Fabian Oliver Malkolm",male,18,0,0,347078,7.75,,S 778 | 777,0,3,"Tobin, Mr. Roger",male,,0,0,383121,7.75,F38,Q 779 | 778,1,3,"Emanuel, Miss. Virginia Ethel",female,5,0,0,364516,12.475,,S 780 | 779,0,3,"Kilgannon, Mr. Thomas J",male,,0,0,36865,7.7375,,Q 781 | 780,1,1,"Robert, Mrs. Edward Scott (Elisabeth Walton McMillan)",female,43,0,1,24160,211.3375,B3,S 782 | 781,1,3,"Ayoub, Miss. Banoura",female,13,0,0,2687,7.2292,,C 783 | 782,1,1,"Dick, Mrs. Albert Adrian (Vera Gillespie)",female,17,1,0,17474,57,B20,S 784 | 783,0,1,"Long, Mr. Milton Clyde",male,29,0,0,113501,30,D6,S 785 | 784,0,3,"Johnston, Mr. Andrew G",male,,1,2,W./C. 6607,23.45,,S 786 | 785,0,3,"Ali, Mr. William",male,25,0,0,SOTON/O.Q. 3101312,7.05,,S 787 | 786,0,3,"Harmer, Mr. Abraham (David Lishin)",male,25,0,0,374887,7.25,,S 788 | 787,1,3,"Sjoblom, Miss. Anna Sofia",female,18,0,0,3101265,7.4958,,S 789 | 788,0,3,"Rice, Master. George Hugh",male,8,4,1,382652,29.125,,Q 790 | 789,1,3,"Dean, Master. Bertram Vere",male,1,1,2,C.A. 2315,20.575,,S 791 | 790,0,1,"Guggenheim, Mr. Benjamin",male,46,0,0,PC 17593,79.2,B82 B84,C 792 | 791,0,3,"Keane, Mr. Andrew ""Andy""",male,,0,0,12460,7.75,,Q 793 | 792,0,2,"Gaskell, Mr. Alfred",male,16,0,0,239865,26,,S 794 | 793,0,3,"Sage, Miss. Stella Anna",female,,8,2,CA. 2343,69.55,,S 795 | 794,0,1,"Hoyt, Mr. William Fisher",male,,0,0,PC 17600,30.6958,,C 796 | 795,0,3,"Dantcheff, Mr. Ristiu",male,25,0,0,349203,7.8958,,S 797 | 796,0,2,"Otter, Mr. Richard",male,39,0,0,28213,13,,S 798 | 797,1,1,"Leader, Dr. Alice (Farnham)",female,49,0,0,17465,25.9292,D17,S 799 | 798,1,3,"Osman, Mrs. Mara",female,31,0,0,349244,8.6833,,S 800 | 799,0,3,"Ibrahim Shawah, Mr. Yousseff",male,30,0,0,2685,7.2292,,C 801 | 800,0,3,"Van Impe, Mrs. Jean Baptiste (Rosalie Paula Govaert)",female,30,1,1,345773,24.15,,S 802 | 801,0,2,"Ponesell, Mr. Martin",male,34,0,0,250647,13,,S 803 | 802,1,2,"Collyer, Mrs. Harvey (Charlotte Annie Tate)",female,31,1,1,C.A. 31921,26.25,,S 804 | 803,1,1,"Carter, Master. William Thornton II",male,11,1,2,113760,120,B96 B98,S 805 | 804,1,3,"Thomas, Master. Assad Alexander",male,0.42,0,1,2625,8.5167,,C 806 | 805,1,3,"Hedman, Mr. Oskar Arvid",male,27,0,0,347089,6.975,,S 807 | 806,0,3,"Johansson, Mr. Karl Johan",male,31,0,0,347063,7.775,,S 808 | 807,0,1,"Andrews, Mr. Thomas Jr",male,39,0,0,112050,0,A36,S 809 | 808,0,3,"Pettersson, Miss. Ellen Natalia",female,18,0,0,347087,7.775,,S 810 | 809,0,2,"Meyer, Mr. August",male,39,0,0,248723,13,,S 811 | 810,1,1,"Chambers, Mrs. Norman Campbell (Bertha Griggs)",female,33,1,0,113806,53.1,E8,S 812 | 811,0,3,"Alexander, Mr. William",male,26,0,0,3474,7.8875,,S 813 | 812,0,3,"Lester, Mr. James",male,39,0,0,A/4 48871,24.15,,S 814 | 813,0,2,"Slemen, Mr. Richard James",male,35,0,0,28206,10.5,,S 815 | 814,0,3,"Andersson, Miss. Ebba Iris Alfrida",female,6,4,2,347082,31.275,,S 816 | 815,0,3,"Tomlin, Mr. Ernest Portage",male,30.5,0,0,364499,8.05,,S 817 | 816,0,1,"Fry, Mr. Richard",male,,0,0,112058,0,B102,S 818 | 817,0,3,"Heininen, Miss. Wendla Maria",female,23,0,0,STON/O2. 3101290,7.925,,S 819 | 818,0,2,"Mallet, Mr. Albert",male,31,1,1,S.C./PARIS 2079,37.0042,,C 820 | 819,0,3,"Holm, Mr. John Fredrik Alexander",male,43,0,0,C 7075,6.45,,S 821 | 820,0,3,"Skoog, Master. Karl Thorsten",male,10,3,2,347088,27.9,,S 822 | 821,1,1,"Hays, Mrs. Charles Melville (Clara Jennings Gregg)",female,52,1,1,12749,93.5,B69,S 823 | 822,1,3,"Lulic, Mr. Nikola",male,27,0,0,315098,8.6625,,S 824 | 823,0,1,"Reuchlin, Jonkheer. John George",male,38,0,0,19972,0,,S 825 | 824,1,3,"Moor, Mrs. (Beila)",female,27,0,1,392096,12.475,E121,S 826 | 825,0,3,"Panula, Master. Urho Abraham",male,2,4,1,3101295,39.6875,,S 827 | 826,0,3,"Flynn, Mr. John",male,,0,0,368323,6.95,,Q 828 | 827,0,3,"Lam, Mr. Len",male,,0,0,1601,56.4958,,S 829 | 828,1,2,"Mallet, Master. Andre",male,1,0,2,S.C./PARIS 2079,37.0042,,C 830 | 829,1,3,"McCormack, Mr. Thomas Joseph",male,,0,0,367228,7.75,,Q 831 | 830,1,1,"Stone, Mrs. George Nelson (Martha Evelyn)",female,62,0,0,113572,80,B28, 832 | 831,1,3,"Yasbeck, Mrs. Antoni (Selini Alexander)",female,15,1,0,2659,14.4542,,C 833 | 832,1,2,"Richards, Master. George Sibley",male,0.83,1,1,29106,18.75,,S 834 | 833,0,3,"Saad, Mr. Amin",male,,0,0,2671,7.2292,,C 835 | 834,0,3,"Augustsson, Mr. Albert",male,23,0,0,347468,7.8542,,S 836 | 835,0,3,"Allum, Mr. Owen George",male,18,0,0,2223,8.3,,S 837 | 836,1,1,"Compton, Miss. Sara Rebecca",female,39,1,1,PC 17756,83.1583,E49,C 838 | 837,0,3,"Pasic, Mr. Jakob",male,21,0,0,315097,8.6625,,S 839 | 838,0,3,"Sirota, Mr. Maurice",male,,0,0,392092,8.05,,S 840 | 839,1,3,"Chip, Mr. Chang",male,32,0,0,1601,56.4958,,S 841 | 840,1,1,"Marechal, Mr. Pierre",male,,0,0,11774,29.7,C47,C 842 | 841,0,3,"Alhomaki, Mr. Ilmari Rudolf",male,20,0,0,SOTON/O2 3101287,7.925,,S 843 | 842,0,2,"Mudd, Mr. Thomas Charles",male,16,0,0,S.O./P.P. 3,10.5,,S 844 | 843,1,1,"Serepeca, Miss. Augusta",female,30,0,0,113798,31,,C 845 | 844,0,3,"Lemberopolous, Mr. Peter L",male,34.5,0,0,2683,6.4375,,C 846 | 845,0,3,"Culumovic, Mr. Jeso",male,17,0,0,315090,8.6625,,S 847 | 846,0,3,"Abbing, Mr. Anthony",male,42,0,0,C.A. 5547,7.55,,S 848 | 847,0,3,"Sage, Mr. Douglas Bullen",male,,8,2,CA. 2343,69.55,,S 849 | 848,0,3,"Markoff, Mr. Marin",male,35,0,0,349213,7.8958,,C 850 | 849,0,2,"Harper, Rev. John",male,28,0,1,248727,33,,S 851 | 850,1,1,"Goldenberg, Mrs. Samuel L (Edwiga Grabowska)",female,,1,0,17453,89.1042,C92,C 852 | 851,0,3,"Andersson, Master. Sigvard Harald Elias",male,4,4,2,347082,31.275,,S 853 | 852,0,3,"Svensson, Mr. Johan",male,74,0,0,347060,7.775,,S 854 | 853,0,3,"Boulos, Miss. Nourelain",female,9,1,1,2678,15.2458,,C 855 | 854,1,1,"Lines, Miss. Mary Conover",female,16,0,1,PC 17592,39.4,D28,S 856 | 855,0,2,"Carter, Mrs. Ernest Courtenay (Lilian Hughes)",female,44,1,0,244252,26,,S 857 | 856,1,3,"Aks, Mrs. Sam (Leah Rosen)",female,18,0,1,392091,9.35,,S 858 | 857,1,1,"Wick, Mrs. George Dennick (Mary Hitchcock)",female,45,1,1,36928,164.8667,,S 859 | 858,1,1,"Daly, Mr. Peter Denis ",male,51,0,0,113055,26.55,E17,S 860 | 859,1,3,"Baclini, Mrs. Solomon (Latifa Qurban)",female,24,0,3,2666,19.2583,,C 861 | 860,0,3,"Razi, Mr. Raihed",male,,0,0,2629,7.2292,,C 862 | 861,0,3,"Hansen, Mr. Claus Peter",male,41,2,0,350026,14.1083,,S 863 | 862,0,2,"Giles, Mr. Frederick Edward",male,21,1,0,28134,11.5,,S 864 | 863,1,1,"Swift, Mrs. Frederick Joel (Margaret Welles Barron)",female,48,0,0,17466,25.9292,D17,S 865 | 864,0,3,"Sage, Miss. Dorothy Edith ""Dolly""",female,,8,2,CA. 2343,69.55,,S 866 | 865,0,2,"Gill, Mr. John William",male,24,0,0,233866,13,,S 867 | 866,1,2,"Bystrom, Mrs. (Karolina)",female,42,0,0,236852,13,,S 868 | 867,1,2,"Duran y More, Miss. Asuncion",female,27,1,0,SC/PARIS 2149,13.8583,,C 869 | 868,0,1,"Roebling, Mr. Washington Augustus II",male,31,0,0,PC 17590,50.4958,A24,S 870 | 869,0,3,"van Melkebeke, Mr. Philemon",male,,0,0,345777,9.5,,S 871 | 870,1,3,"Johnson, Master. Harold Theodor",male,4,1,1,347742,11.1333,,S 872 | 871,0,3,"Balkic, Mr. Cerin",male,26,0,0,349248,7.8958,,S 873 | 872,1,1,"Beckwith, Mrs. Richard Leonard (Sallie Monypeny)",female,47,1,1,11751,52.5542,D35,S 874 | 873,0,1,"Carlsson, Mr. Frans Olof",male,33,0,0,695,5,B51 B53 B55,S 875 | 874,0,3,"Vander Cruyssen, Mr. Victor",male,47,0,0,345765,9,,S 876 | 875,1,2,"Abelson, Mrs. Samuel (Hannah Wizosky)",female,28,1,0,P/PP 3381,24,,C 877 | 876,1,3,"Najib, Miss. Adele Kiamie ""Jane""",female,15,0,0,2667,7.225,,C 878 | 877,0,3,"Gustafsson, Mr. Alfred Ossian",male,20,0,0,7534,9.8458,,S 879 | 878,0,3,"Petroff, Mr. Nedelio",male,19,0,0,349212,7.8958,,S 880 | 879,0,3,"Laleff, Mr. Kristo",male,,0,0,349217,7.8958,,S 881 | 880,1,1,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",female,56,0,1,11767,83.1583,C50,C 882 | 881,1,2,"Shelley, Mrs. William (Imanita Parrish Hall)",female,25,0,1,230433,26,,S 883 | 882,0,3,"Markun, Mr. Johann",male,33,0,0,349257,7.8958,,S 884 | 883,0,3,"Dahlberg, Miss. Gerda Ulrika",female,22,0,0,7552,10.5167,,S 885 | 884,0,2,"Banfield, Mr. Frederick James",male,28,0,0,C.A./SOTON 34068,10.5,,S 886 | 885,0,3,"Sutehall, Mr. Henry Jr",male,25,0,0,SOTON/OQ 392076,7.05,,S 887 | 886,0,3,"Rice, Mrs. William (Margaret Norton)",female,39,0,5,382652,29.125,,Q 888 | 887,0,2,"Montvila, Rev. Juozas",male,27,0,0,211536,13,,S 889 | 888,1,1,"Graham, Miss. Margaret Edith",female,19,0,0,112053,30,B42,S 890 | 889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S 891 | 890,1,1,"Behr, Mr. Karl Howell",male,26,0,0,111369,30,C148,C 892 | 891,0,3,"Dooley, Mr. Patrick",male,32,0,0,370376,7.75,,Q 893 | -------------------------------------------------------------------------------- /01_泰坦尼克号(titanic)/kaggle泰坦尼克之灾.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# kaggle泰坦尼克之灾" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "步骤:\n", 15 | "\n", 16 | "一.导入数据包与数据集\n", 17 | "\n", 18 | "二.数据分析\n", 19 | "\n", 20 | "1. 总体预览:了解每列数据的含义,数据的格式等\n", 21 | "2. .数据初步分析,使用统计学与绘图:初步了解数据之间的相关性,为构造特征工程以及模型建立做准备\n", 22 | "\n", 23 | "三.特征工程\n", 24 | "\n", 25 | "1. 根据业务,常识,以及第二步的数据分析构造特征工程.\n", 26 | "2. 将特征转换为模型可以辨别的类型(如处理缺失值,处理文本进行等)\n", 27 | "\n", 28 | "四.模型选择\n", 29 | "\n", 30 | "1. 根据目标函数确定学习类型,是无监督学习还是监督学习,是分类问题还是回归问题等.\n", 31 | "2. 比较各个模型的分数,然后取效果较好的模型作为基础模型.\n", 32 | "\n", 33 | "五.修改特征和模型参数\n", 34 | "\n", 35 | "1. 可以通过添加或者修改特征,提高模型的上限.\n", 36 | "2. 通过修改模型的参数,是模型逼近上限" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "## 一.导入数据包与数据集" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": null, 49 | "metadata": { 50 | "collapsed": true 51 | }, 52 | "outputs": [], 53 | "source": [ 54 | "# 导入相关数据包\n", 55 | "import numpy as np\n", 56 | "import pandas as pd\n", 57 | "import seaborn as sns\n", 58 | "import matplotlib.pyplot as plt\n", 59 | "%matplotlib inline" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": null, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68 | "root_path = './datasets'\n", 69 | "\n", 70 | "train = pd.read_csv('%s/%s' % (root_path, 'train.csv'))\n", 71 | "test = pd.read_csv('%s/%s' % (root_path, 'test.csv'))" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "## 二.数据分析" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "### 1.总体预览\n", 86 | "了解每列数据的含义,数据的格式等" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": null, 92 | "metadata": {}, 93 | "outputs": [], 94 | "source": [ 95 | "train.head(5)" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": null, 101 | "metadata": {}, 102 | "outputs": [], 103 | "source": [ 104 | "# 返回每列列名,该列非nan值个数,以及该列类型\n", 105 | "train.info()" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": null, 111 | "metadata": {}, 112 | "outputs": [], 113 | "source": [ 114 | "test.info()" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": null, 120 | "metadata": {}, 121 | "outputs": [], 122 | "source": [ 123 | "# 返回数值型变量的统计量\n", 124 | "# train.describe(percentiles=[0.00, 0.25, 0.5, 0.75, 1.00])\n", 125 | "train.describe()" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": {}, 131 | "source": [ 132 | "## 2..数据初步分析,使用统计学与绘图\n", 133 | "目的:初步了解数据之间的相关性,为构造特征工程以及模型建立做准备" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": null, 139 | "metadata": {}, 140 | "outputs": [], 141 | "source": [ 142 | "#存活人数\n", 143 | "train['Survived'].value_counts()" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "#### 1)数值型数据协方差,corr()函数\n", 151 | "来个总览,快速了解个数据的相关性" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": null, 157 | "metadata": {}, 158 | "outputs": [], 159 | "source": [ 160 | "#相关性协方差表,corr()函数,返回结果接近0说明无相关性,大于0说明是正相关,小于0是负相关.\n", 161 | "train_corr = train.drop('PassengerId',axis=1).corr()\n", 162 | "train_corr" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": null, 168 | "metadata": {}, 169 | "outputs": [], 170 | "source": [ 171 | "#画出相关性热力图\n", 172 | "a = plt.subplots(figsize=(15,9))#调整画布大小\n", 173 | "a = sns.heatmap(train_corr, vmin=-1, vmax=1 , annot=True , square=True)#画热力图" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "#### 2)各个数据与结果的关系\n", 181 | "进一步探索分析各个数据与结果的关系" 182 | ] 183 | }, 184 | { 185 | "cell_type": "markdown", 186 | "metadata": {}, 187 | "source": [ 188 | "##### ①Pclass,乘客等级,1是最高级\n", 189 | "结果分析:可以看出Survived和Pclass在Pclass=1的时候有较强的相关性(>0.5),所以最终模型中包含该特征。" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "metadata": {}, 196 | "outputs": [], 197 | "source": [ 198 | "train.groupby(['Pclass'])['Pclass','Survived'].mean()" 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": null, 204 | "metadata": {}, 205 | "outputs": [], 206 | "source": [ 207 | "train[['Pclass','Survived']].groupby(['Pclass']).mean().plot.bar()" 208 | ] 209 | }, 210 | { 211 | "cell_type": "markdown", 212 | "metadata": {}, 213 | "source": [ 214 | "##### ②Sex,性别\n", 215 | "结果分析:女性有更高的活下来的概率(74%),保留该特征" 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": null, 221 | "metadata": {}, 222 | "outputs": [], 223 | "source": [ 224 | "train.groupby(['Sex'])['Sex','Survived'].mean()" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": null, 230 | "metadata": {}, 231 | "outputs": [], 232 | "source": [ 233 | "train[['Sex','Survived']].groupby(['Sex']).mean().plot.bar()" 234 | ] 235 | }, 236 | { 237 | "cell_type": "markdown", 238 | "metadata": {}, 239 | "source": [ 240 | "##### ③SibSp and Parch 兄妹配偶数/父母子女数\n", 241 | "结果分析:这些特征与特定的值没有相关性不明显,最好是由这些独立的特征派生出一个新特征或者一组新特征" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": null, 247 | "metadata": {}, 248 | "outputs": [], 249 | "source": [ 250 | "train[['SibSp','Survived']].groupby(['SibSp']).mean()" 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": null, 256 | "metadata": {}, 257 | "outputs": [], 258 | "source": [ 259 | "train[['Parch','Survived']].groupby(['Parch']).mean().plot.bar()" 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": {}, 265 | "source": [ 266 | "##### ④Age年龄与生存情况的分析.\n", 267 | "结果分析:由图,可以看到年龄是影响生存情况的. \n", 268 | "\n", 269 | "但是年龄是有大部分缺失值的,缺失值需要进行处理,可以使用填充或者模型预测." 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": null, 275 | "metadata": {}, 276 | "outputs": [], 277 | "source": [ 278 | "g = sns.FacetGrid(train, col='Survived',size=5)\n", 279 | "g.map(plt.hist, 'Age', bins=40)" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": null, 285 | "metadata": {}, 286 | "outputs": [], 287 | "source": [ 288 | "train.groupby(['Age'])['Survived'].mean().plot()" 289 | ] 290 | }, 291 | { 292 | "cell_type": "markdown", 293 | "metadata": {}, 294 | "source": [ 295 | "##### ⑤Embarked登港港口与生存情况的分析\n", 296 | "结果分析:C地的生存率更高,这个也应该保留为模型特征." 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": null, 302 | "metadata": {}, 303 | "outputs": [], 304 | "source": [ 305 | "sns.countplot('Embarked',hue='Survived',data=train)" 306 | ] 307 | }, 308 | { 309 | "cell_type": "markdown", 310 | "metadata": {}, 311 | "source": [ 312 | "⑥其他因素\n", 313 | "*在数据的Name项中包含了对该乘客的称呼,如Mr、Miss等,这些信息包含了乘客的年龄、性别、也有可能包含社会地位,如Dr、Lady、Major、Master等称呼。这一项不方便用图表展示,但是在特征工程中,我们会将其提取出来,然后放到模型中。\n", 314 | "\n", 315 | "*剩余因素还有船票价格、船舱号和船票号,这三个因素都可能会影响乘客在船中的位置从而影响逃生顺序,但是因为这三个因素与生存之间看不出明显规律,所以在后期模型融合时,将这些因素交给模型来决定其重要性。" 316 | ] 317 | }, 318 | { 319 | "cell_type": "markdown", 320 | "metadata": {}, 321 | "source": [ 322 | "## 三.特征工程" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": null, 328 | "metadata": { 329 | "collapsed": true 330 | }, 331 | "outputs": [], 332 | "source": [ 333 | "#先将数据集合并,一起做特征工程(注意,标准化的时候需要分开处理)\n", 334 | "#先将test补齐,然后通过pd.apped()合并\n", 335 | "test['Survived'] = 0\n", 336 | "train_test = train.append(test)" 337 | ] 338 | }, 339 | { 340 | "cell_type": "markdown", 341 | "metadata": {}, 342 | "source": [ 343 | "### ①Pclass,乘客等级,1是最高级\n", 344 | "两种方式:一是该特征不做处理,可以直接保留.二是再处理:也进行分列处理(比较那种方式模型效果更好,就选那种)" 345 | ] 346 | }, 347 | { 348 | "cell_type": "code", 349 | "execution_count": null, 350 | "metadata": { 351 | "collapsed": true 352 | }, 353 | "outputs": [], 354 | "source": [ 355 | "train_test = pd.get_dummies(train_test,columns=['Pclass'])" 356 | ] 357 | }, 358 | { 359 | "cell_type": "markdown", 360 | "metadata": {}, 361 | "source": [ 362 | "### ②Sex,性别\n", 363 | "无缺失值,直接分列" 364 | ] 365 | }, 366 | { 367 | "cell_type": "code", 368 | "execution_count": null, 369 | "metadata": { 370 | "collapsed": true 371 | }, 372 | "outputs": [], 373 | "source": [ 374 | "train_test = pd.get_dummies(train_test,columns=[\"Sex\"])" 375 | ] 376 | }, 377 | { 378 | "cell_type": "markdown", 379 | "metadata": {}, 380 | "source": [ 381 | "### ③SibSp and Parch 兄妹配偶数/父母子女数\n", 382 | "第一次直接保留:这两个都影响生存率,且都是数值型,先直接保存.\n", 383 | "\n", 384 | "第二次进行两项求和,并进行分列处理.(兄妹配偶数和父母子女数都是认识人的数量,所以总数可能也会更好)(模型结果提高到了)" 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": null, 390 | "metadata": { 391 | "collapsed": true 392 | }, 393 | "outputs": [], 394 | "source": [ 395 | "#这是剑豪模型后回来添加的新特征,模型的分数最终有所提高了.\n", 396 | "train_test['SibSp_Parch'] = train_test['SibSp'] + train_test['Parch']" 397 | ] 398 | }, 399 | { 400 | "cell_type": "code", 401 | "execution_count": null, 402 | "metadata": { 403 | "collapsed": true 404 | }, 405 | "outputs": [], 406 | "source": [ 407 | "train_test = pd.get_dummies(train_test,columns = ['SibSp','Parch','SibSp_Parch']) " 408 | ] 409 | }, 410 | { 411 | "cell_type": "markdown", 412 | "metadata": {}, 413 | "source": [ 414 | "### ④Embarked \n", 415 | "数据有极少量(3个)缺失值,但是在分列的时候,缺失值的所有列可以均为0,所以可以考虑不填充.\n", 416 | "\n", 417 | "另外,也可以考虑用测试集众数来填充.先找出众数,再采用df.fillna()方法" 418 | ] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "execution_count": null, 423 | "metadata": { 424 | "collapsed": true 425 | }, 426 | "outputs": [], 427 | "source": [ 428 | "train_test = pd.get_dummies(train_test,columns=[\"Embarked\"])" 429 | ] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "metadata": {}, 434 | "source": [ 435 | "### ⑤ Name\n", 436 | "1.在数据的Name项中包含了对该乘客的称呼,将这些关键词提取出来,然后做分列处理.(参考别人的)" 437 | ] 438 | }, 439 | { 440 | "cell_type": "code", 441 | "execution_count": null, 442 | "metadata": {}, 443 | "outputs": [], 444 | "source": [ 445 | "#从名字中提取出称呼: df['Name].str.extract()是提取函数,配合正则一起使用\n", 446 | "train_test['Name1'] = train_test['Name'].str.extract('.+,(.+)').str.extract( '^(.+?)\\.').str.strip()" 447 | ] 448 | }, 449 | { 450 | "cell_type": "code", 451 | "execution_count": null, 452 | "metadata": { 453 | "collapsed": true 454 | }, 455 | "outputs": [], 456 | "source": [ 457 | "#将姓名分类处理()\n", 458 | "train_test['Name1'].replace(['Capt', 'Col', 'Major', 'Dr', 'Rev'], 'Officer' , inplace = True)\n", 459 | "train_test['Name1'].replace(['Jonkheer', 'Don', 'Sir', 'the Countess', 'Dona', 'Lady'], 'Royalty' , inplace = True)\n", 460 | "train_test['Name1'].replace(['Mme', 'Ms', 'Mrs'], 'Mrs')\n", 461 | "train_test['Name1'].replace(['Mlle', 'Miss'], 'Miss')\n", 462 | "train_test['Name1'].replace(['Mr'], 'Mr' , inplace = True)\n", 463 | "train_test['Name1'].replace(['Master'], 'Master' , inplace = True)" 464 | ] 465 | }, 466 | { 467 | "cell_type": "code", 468 | "execution_count": null, 469 | "metadata": { 470 | "collapsed": true 471 | }, 472 | "outputs": [], 473 | "source": [ 474 | "#分列处理\n", 475 | "train_test = pd.get_dummies(train_test,columns=['Name1'])" 476 | ] 477 | }, 478 | { 479 | "cell_type": "markdown", 480 | "metadata": {}, 481 | "source": [ 482 | "##### 从姓名中提取出姓做特征" 483 | ] 484 | }, 485 | { 486 | "cell_type": "code", 487 | "execution_count": null, 488 | "metadata": { 489 | "collapsed": true 490 | }, 491 | "outputs": [], 492 | "source": [ 493 | "#从姓名中提取出姓\n", 494 | "train_test['Name2'] = train_test['Name'].apply(lambda x: x.split('.')[1])\n", 495 | "\n", 496 | "#计算数量,然后合并数据集\n", 497 | "Name2_sum = train_test['Name2'].value_counts().reset_index()\n", 498 | "Name2_sum.columns=['Name2','Name2_sum']\n", 499 | "train_test = pd.merge(train_test,Name2_sum,how='left',on='Name2')\n", 500 | "\n", 501 | "#由于出现一次时该特征时无效特征,用one来代替出现一次的姓\n", 502 | "train_test.loc[train_test['Name2_sum'] == 1 , 'Name2_new'] = 'one'\n", 503 | "train_test.loc[train_test['Name2_sum'] > 1 , 'Name2_new'] = train_test['Name2']\n", 504 | "del train_test['Name2']\n", 505 | "\n", 506 | "#分列处理\n", 507 | "train_test = pd.get_dummies(train_test,columns=['Name2_new'])" 508 | ] 509 | }, 510 | { 511 | "cell_type": "code", 512 | "execution_count": null, 513 | "metadata": { 514 | "collapsed": true 515 | }, 516 | "outputs": [], 517 | "source": [ 518 | "#删掉姓名这个特征\n", 519 | "del train_test['Name']" 520 | ] 521 | }, 522 | { 523 | "cell_type": "markdown", 524 | "metadata": {}, 525 | "source": [ 526 | "### ⑥ Fare\n", 527 | "该特征有缺失值,先找出缺失值的那调数据,然后用平均数填充" 528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": null, 533 | "metadata": {}, 534 | "outputs": [], 535 | "source": [ 536 | "#从上面的分析,发现该特征train集无miss值,test有一个缺失值,先查看\n", 537 | "train_test.loc[train_test[\"Fare\"].isnull()]" 538 | ] 539 | }, 540 | { 541 | "cell_type": "code", 542 | "execution_count": null, 543 | "metadata": {}, 544 | "outputs": [], 545 | "source": [ 546 | "#票价与pclass和Embarked有关,所以用train分组后的平均数填充\n", 547 | "train.groupby(by=[\"Pclass\",\"Embarked\"]).Fare.mean()" 548 | ] 549 | }, 550 | { 551 | "cell_type": "code", 552 | "execution_count": null, 553 | "metadata": { 554 | "collapsed": true 555 | }, 556 | "outputs": [], 557 | "source": [ 558 | "#用pclass=3和Embarked=S的平均数14.644083来填充\n", 559 | "train_test[\"Fare\"].fillna(14.435422,inplace=True)" 560 | ] 561 | }, 562 | { 563 | "cell_type": "markdown", 564 | "metadata": { 565 | "collapsed": true 566 | }, 567 | "source": [ 568 | "### ⑦ Ticket\n", 569 | "该列和名字做类似的处理,先提取,然后分列" 570 | ] 571 | }, 572 | { 573 | "cell_type": "code", 574 | "execution_count": null, 575 | "metadata": { 576 | "collapsed": true 577 | }, 578 | "outputs": [], 579 | "source": [ 580 | "#将Ticket提取字符列\n", 581 | "#str.isnumeric() 如果S中只有数字字符,则返回True,否则返回False\n", 582 | "train_test['Ticket_Letter'] = train_test['Ticket'].str.split().str[0]\n", 583 | "train_test['Ticket_Letter'] = train_test['Ticket_Letter'].apply(lambda x:np.nan if x.isnumeric() else x)\n", 584 | "train_test.drop('Ticket',inplace=True,axis=1)" 585 | ] 586 | }, 587 | { 588 | "cell_type": "code", 589 | "execution_count": null, 590 | "metadata": { 591 | "collapsed": true 592 | }, 593 | "outputs": [], 594 | "source": [ 595 | "#分列,此时nan值可以不做处理\n", 596 | "train_test = pd.get_dummies(train_test,columns=['Ticket_Letter'],drop_first=True)" 597 | ] 598 | }, 599 | { 600 | "cell_type": "markdown", 601 | "metadata": {}, 602 | "source": [ 603 | "### ⑧ Age\n", 604 | "1.该列有大量缺失值,考虑用一个回归模型进行填充.\n", 605 | "\n", 606 | "2.在模型修改的时候,考虑到年龄缺失值可能影响死亡情况,用年龄是否缺失值来构造新特征" 607 | ] 608 | }, 609 | { 610 | "cell_type": "code", 611 | "execution_count": null, 612 | "metadata": {}, 613 | "outputs": [], 614 | "source": [ 615 | "\"\"\"\n", 616 | "这是模型就好后回来增加的新特征\n", 617 | "考虑年龄缺失值可能影响死亡情况,数据表明,年龄缺失的死亡率为0.19.\n", 618 | "\"\"\"\n", 619 | "train_test.loc[train_test[\"Age\"].isnull()]['Survived'].mean()" 620 | ] 621 | }, 622 | { 623 | "cell_type": "code", 624 | "execution_count": null, 625 | "metadata": { 626 | "collapsed": true 627 | }, 628 | "outputs": [], 629 | "source": [ 630 | "# 所以用年龄是否缺失值来构造新特征\n", 631 | "train_test.loc[train_test[\"Age\"].isnull() ,\"age_nan\"] = 1\n", 632 | "train_test.loc[train_test[\"Age\"].notnull() ,\"age_nan\"] = 0\n", 633 | "train_test = pd.get_dummies(train_test,columns=['age_nan'])" 634 | ] 635 | }, 636 | { 637 | "cell_type": "markdown", 638 | "metadata": {}, 639 | "source": [ 640 | "### 利用其他组特征量,采用机器学习算法来预测Age" 641 | ] 642 | }, 643 | { 644 | "cell_type": "code", 645 | "execution_count": null, 646 | "metadata": {}, 647 | "outputs": [], 648 | "source": [ 649 | "train_test.info()" 650 | ] 651 | }, 652 | { 653 | "cell_type": "code", 654 | "execution_count": null, 655 | "metadata": {}, 656 | "outputs": [], 657 | "source": [ 658 | "#创建没有['Age','Survived']的数据集\n", 659 | "missing_age = train_test.drop(['Survived','Cabin'],axis=1)\n", 660 | "#将Age完整的项作为训练集、将Age缺失的项作为测试集。\n", 661 | "missing_age_train = missing_age[missing_age['Age'].notnull()]\n", 662 | "missing_age_test = missing_age[missing_age['Age'].isnull()]" 663 | ] 664 | }, 665 | { 666 | "cell_type": "code", 667 | "execution_count": null, 668 | "metadata": {}, 669 | "outputs": [], 670 | "source": [ 671 | "#构建训练集合预测集的X和Y值\n", 672 | "missing_age_X_train = missing_age_train.drop(['Age'], axis=1)\n", 673 | "missing_age_Y_train = missing_age_train['Age']\n", 674 | "missing_age_X_test = missing_age_test.drop(['Age'], axis=1)" 675 | ] 676 | }, 677 | { 678 | "cell_type": "code", 679 | "execution_count": null, 680 | "metadata": { 681 | "collapsed": true 682 | }, 683 | "outputs": [], 684 | "source": [ 685 | "# 先将数据标准化\n", 686 | "from sklearn.preprocessing import StandardScaler\n", 687 | "ss = StandardScaler()\n", 688 | "#用测试集训练并标准化\n", 689 | "ss.fit(missing_age_X_train)\n", 690 | "missing_age_X_train = ss.transform(missing_age_X_train)\n", 691 | "missing_age_X_test = ss.transform(missing_age_X_test)" 692 | ] 693 | }, 694 | { 695 | "cell_type": "code", 696 | "execution_count": null, 697 | "metadata": { 698 | "collapsed": true 699 | }, 700 | "outputs": [], 701 | "source": [ 702 | "#使用贝叶斯预测年龄\n", 703 | "from sklearn import linear_model\n", 704 | "lin = linear_model.BayesianRidge()" 705 | ] 706 | }, 707 | { 708 | "cell_type": "code", 709 | "execution_count": null, 710 | "metadata": {}, 711 | "outputs": [], 712 | "source": [ 713 | "lin.fit(missing_age_X_train,missing_age_Y_train)" 714 | ] 715 | }, 716 | { 717 | "cell_type": "code", 718 | "execution_count": null, 719 | "metadata": { 720 | "collapsed": true 721 | }, 722 | "outputs": [], 723 | "source": [ 724 | "#利用loc将预测值填入数据集\n", 725 | "train_test.loc[(train_test['Age'].isnull()), 'Age'] = lin.predict(missing_age_X_test)" 726 | ] 727 | }, 728 | { 729 | "cell_type": "code", 730 | "execution_count": null, 731 | "metadata": { 732 | "collapsed": true 733 | }, 734 | "outputs": [], 735 | "source": [ 736 | "#将年龄划分是个阶段10以下,10-18,18-30,30-50,50以上\n", 737 | "train_test['Age'] = pd.cut(train_test['Age'], bins=[0,10,18,30,50,100],labels=[1,2,3,4,5])\n", 738 | "\n", 739 | "train_test = pd.get_dummies(train_test,columns=['Age'])" 740 | ] 741 | }, 742 | { 743 | "cell_type": "markdown", 744 | "metadata": {}, 745 | "source": [ 746 | "### ⑨ Cabin\n", 747 | "cabin项缺失太多,只能将有无Cain首字母进行分类,缺失值为一类,作为特征值进行建模,也可以考虑直接舍去该特征" 748 | ] 749 | }, 750 | { 751 | "cell_type": "code", 752 | "execution_count": null, 753 | "metadata": { 754 | "collapsed": true 755 | }, 756 | "outputs": [], 757 | "source": [ 758 | "#cabin项缺失太多,只能将有无Cain首字母进行分类,缺失值为一类,作为特征值进行建模\n", 759 | "train_test['Cabin'] = train_test['Cabin'].apply(lambda x:str(x)[0] if pd.notnull(x) else x)\n", 760 | "train_test = pd.get_dummies(train_test,columns=['Cabin'])" 761 | ] 762 | }, 763 | { 764 | "cell_type": "code", 765 | "execution_count": null, 766 | "metadata": {}, 767 | "outputs": [], 768 | "source": [ 769 | "#cabin项缺失太多,只能将有无Cain首字母进行分类,\n", 770 | "train_test.loc[train_test[\"Cabin\"].isnull() ,\"Cabin_nan\"] = 1\n", 771 | "train_test.loc[train_test[\"Cabin\"].notnull() ,\"Cabin_nan\"] = 0\n", 772 | "train_test = pd.get_dummies(train_test,columns=['Cabin_nan'])\n", 773 | "train_test.drop('Cabin',axis=1,inplace=True)" 774 | ] 775 | }, 776 | { 777 | "cell_type": "markdown", 778 | "metadata": {}, 779 | "source": [ 780 | "### ⑩ 特征工程处理完了,划分数据集" 781 | ] 782 | }, 783 | { 784 | "cell_type": "code", 785 | "execution_count": null, 786 | "metadata": { 787 | "collapsed": true 788 | }, 789 | "outputs": [], 790 | "source": [ 791 | "train_data = train_test[:891]\n", 792 | "test_data = train_test[891:]\n", 793 | "train_data_X = train_data.drop(['Survived'],axis=1)\n", 794 | "train_data_Y = train_data['Survived']\n", 795 | "test_data_X = test_data.drop(['Survived'],axis=1)" 796 | ] 797 | }, 798 | { 799 | "cell_type": "markdown", 800 | "metadata": {}, 801 | "source": [ 802 | "## 四 建立模型" 803 | ] 804 | }, 805 | { 806 | "cell_type": "markdown", 807 | "metadata": {}, 808 | "source": [ 809 | "### 数据标准化\n", 810 | "1.线性模型需要用标准化的数据建模,而树类模型不需要标准化的数据\n", 811 | "\n", 812 | "2.处理标准化的时候,注意将测试集的数据transform到test集上" 813 | ] 814 | }, 815 | { 816 | "cell_type": "code", 817 | "execution_count": null, 818 | "metadata": { 819 | "collapsed": true 820 | }, 821 | "outputs": [], 822 | "source": [ 823 | "from sklearn.preprocessing import StandardScaler\n", 824 | "ss2 = StandardScaler()\n", 825 | "ss2.fit(train_data_X)\n", 826 | "train_data_X_sd = ss2.transform(train_data_X)\n", 827 | "test_data_X_sd = ss2.transform(test_data_X)" 828 | ] 829 | }, 830 | { 831 | "cell_type": "markdown", 832 | "metadata": {}, 833 | "source": [ 834 | "### 开始建模\n", 835 | "1.可选单个模型模型有随机森林,逻辑回归,svm,xgboost,gbdt等.\n", 836 | "\n", 837 | "2.也可以将多个模型组合起来,进行模型融合,比如voting,stacking等方法\n", 838 | "\n", 839 | "3.好的特征决定模型上限,好的模型和参数可以无线逼近上限.\n", 840 | "\n", 841 | "4.我测试了多种模型,模型结果最高的随机森林,最高有0.8." 842 | ] 843 | }, 844 | { 845 | "cell_type": "markdown", 846 | "metadata": {}, 847 | "source": [ 848 | "### 随机森林" 849 | ] 850 | }, 851 | { 852 | "cell_type": "code", 853 | "execution_count": null, 854 | "metadata": { 855 | "collapsed": true 856 | }, 857 | "outputs": [], 858 | "source": [ 859 | "from sklearn.ensemble import RandomForestClassifier\n", 860 | "\n", 861 | "rf = RandomForestClassifier(n_estimators=150,min_samples_leaf=3,max_depth=6,oob_score=True)\n", 862 | "rf.fit(train_data_X,train_data_Y)\n", 863 | "\n", 864 | "test[\"Survived\"] = rf.predict(test_data_X)\n", 865 | "RF = test[['PassengerId','Survived']].set_index('PassengerId')\n", 866 | "RF.to_csv('RF.csv')" 867 | ] 868 | }, 869 | { 870 | "cell_type": "code", 871 | "execution_count": null, 872 | "metadata": { 873 | "collapsed": true 874 | }, 875 | "outputs": [], 876 | "source": [ 877 | "#随机森林是随机选取特征进行建模的,所以每次的结果可能都有点小差异\n", 878 | "#如果分数足够好,可以将该模型保存起来,下次直接调出来使用0.81339 'rf10.pkl'\n", 879 | "from sklearn.externals import joblib\n", 880 | "joblib.dump(rf, 'rf10.pkl')" 881 | ] 882 | }, 883 | { 884 | "cell_type": "markdown", 885 | "metadata": {}, 886 | "source": [ 887 | "### LogisticRegression" 888 | ] 889 | }, 890 | { 891 | "cell_type": "code", 892 | "execution_count": null, 893 | "metadata": { 894 | "collapsed": true 895 | }, 896 | "outputs": [], 897 | "source": [ 898 | "from sklearn.linear_model import LogisticRegression\n", 899 | "\n", 900 | "lr = LogisticRegression()\n", 901 | "\n", 902 | "from sklearn.grid_search import GridSearchCV\n", 903 | "\n", 904 | "param = {'C':[0.001,0.01,0.1,1,10],\"max_iter\":[100,250]}\n", 905 | "\n", 906 | "clf = GridSearchCV(lr,param,cv=5,n_jobs=-1,verbose=1,scoring=\"roc_auc\")\n", 907 | "\n", 908 | "clf.fit(train_data_X_sd,train_data_Y)\n", 909 | "\n", 910 | "#打印参数的得分情况\n", 911 | "clf.grid_scores_\n", 912 | "\n", 913 | "#打印最佳参数\n", 914 | "clf.best_params_\n", 915 | "\n", 916 | "#将最佳参数传入训练模型\n", 917 | "lr = LogisticRegression(clf.best_params_)\n", 918 | "\n", 919 | "lr.fit(train_data_X_sd,train_data_Y)\n", 920 | "\n", 921 | "#预测结果\n", 922 | "lr.predict(test_data_X_sd)\n", 923 | "\n", 924 | "#打印结果\n", 925 | "test[\"Survived\"] = lr.predict(test_data_X_sd)\n", 926 | "LS = test[['PassengerId','Survived']].set_index('PassengerId')\n", 927 | "\n", 928 | "#输出结果\n", 929 | "LS.to_csv('LS5.csv')" 930 | ] 931 | }, 932 | { 933 | "cell_type": "markdown", 934 | "metadata": {}, 935 | "source": [ 936 | "### SVM" 937 | ] 938 | }, 939 | { 940 | "cell_type": "code", 941 | "execution_count": null, 942 | "metadata": { 943 | "collapsed": true 944 | }, 945 | "outputs": [], 946 | "source": [ 947 | "from sklearn import svm\n", 948 | "svc = svm.SVC()\n", 949 | "\n", 950 | "clf = GridSearchCV(svc,param,cv=5,n_jobs=-1,verbose=1,scoring=\"roc_auc\")\n", 951 | "clf.fit(train_data_X_sd,train_data_Y)\n", 952 | "\n", 953 | "clf.best_params_\n", 954 | "\n", 955 | "svc = svm.SVC(C=1,max_iter=250)\n", 956 | "\n", 957 | "#训练模型并预测结果\n", 958 | "svc.fit(train_data_X_sd,train_data_Y)\n", 959 | "svc.predict(test_data_X_sd)\n", 960 | "\n", 961 | "#打印结果\n", 962 | "test[\"Survived\"] = svc.predict(test_data_X_sd)\n", 963 | "SVM = test[['PassengerId','Survived']].set_index('PassengerId')\n", 964 | "SVM.to_csv('svm1.csv')" 965 | ] 966 | }, 967 | { 968 | "cell_type": "markdown", 969 | "metadata": {}, 970 | "source": [ 971 | "### xgboost" 972 | ] 973 | }, 974 | { 975 | "cell_type": "code", 976 | "execution_count": null, 977 | "metadata": { 978 | "collapsed": true 979 | }, 980 | "outputs": [], 981 | "source": [ 982 | "import xgboost as xgb\n", 983 | "\n", 984 | "xgb_model = xgb.XGBClassifier(n_estimators=150,min_samples_leaf=3,max_depth=6)\n", 985 | "\n", 986 | "xgb_model.fit(train_data_X,train_data_Y)\n", 987 | "\n", 988 | "test[\"Survived\"] = xgb_model.predict(test_data_X)\n", 989 | "XGB = test[['PassengerId','Survived']].set_index('PassengerId')\n", 990 | "XGB.to_csv('XGB5.csv')" 991 | ] 992 | }, 993 | { 994 | "cell_type": "markdown", 995 | "metadata": {}, 996 | "source": [ 997 | "### GBDT" 998 | ] 999 | }, 1000 | { 1001 | "cell_type": "code", 1002 | "execution_count": null, 1003 | "metadata": { 1004 | "collapsed": true 1005 | }, 1006 | "outputs": [], 1007 | "source": [ 1008 | "from sklearn.ensemble import GradientBoostingClassifier\n", 1009 | "\n", 1010 | "gbdt = GradientBoostingClassifier(learning_rate=0.7,max_depth=6,n_estimators=100,min_samples_leaf=2)\n", 1011 | "\n", 1012 | "gbdt.fit(train_data_X,train_data_Y)\n", 1013 | "\n", 1014 | "test[\"Survived\"] = gbdt.predict(test_data_X)\n", 1015 | "test[['PassengerId','Survived']].set_index('PassengerId').to_csv('gbdt3.csv')" 1016 | ] 1017 | }, 1018 | { 1019 | "cell_type": "markdown", 1020 | "metadata": {}, 1021 | "source": [ 1022 | "### 模型融合voting" 1023 | ] 1024 | }, 1025 | { 1026 | "cell_type": "code", 1027 | "execution_count": null, 1028 | "metadata": { 1029 | "collapsed": true 1030 | }, 1031 | "outputs": [], 1032 | "source": [ 1033 | "from sklearn.ensemble import VotingClassifier\n", 1034 | "\n", 1035 | "from sklearn.linear_model import LogisticRegression\n", 1036 | "lr = LogisticRegression(C=0.1,max_iter=100)\n", 1037 | "\n", 1038 | "import xgboost as xgb\n", 1039 | "xgb_model = xgb.XGBClassifier(max_depth=6,min_samples_leaf=2,n_estimators=100,num_round = 5)\n", 1040 | "\n", 1041 | "from sklearn.ensemble import RandomForestClassifier\n", 1042 | "rf = RandomForestClassifier(n_estimators=200,min_samples_leaf=2,max_depth=6,oob_score=True)\n", 1043 | "\n", 1044 | "from sklearn.ensemble import GradientBoostingClassifier\n", 1045 | "gbdt = GradientBoostingClassifier(learning_rate=0.1,min_samples_leaf=2,max_depth=6,n_estimators=100)\n", 1046 | "\n", 1047 | "vot = VotingClassifier(estimators=[('lr', lr), ('rf', rf),('gbdt',gbdt),('xgb',xgb_model)], voting='hard')\n", 1048 | "\n", 1049 | "vot.fit(train_data_X_sd,train_data_Y)\n", 1050 | "\n", 1051 | "vot.predict(test_data_X_sd)\n", 1052 | "\n", 1053 | "test[\"Survived\"] = vot.predict(test_data_X_sd)\n", 1054 | "test[['PassengerId','Survived']].set_index('PassengerId').to_csv('vot5.csv')" 1055 | ] 1056 | }, 1057 | { 1058 | "cell_type": "markdown", 1059 | "metadata": {}, 1060 | "source": [ 1061 | "### 模型融合stacking" 1062 | ] 1063 | }, 1064 | { 1065 | "cell_type": "code", 1066 | "execution_count": null, 1067 | "metadata": { 1068 | "collapsed": true 1069 | }, 1070 | "outputs": [], 1071 | "source": [ 1072 | "#划分train数据集,调用代码,把数据集名字转成和代码一样\n", 1073 | "X = train_data_X_sd\n", 1074 | "X_predict = test_data_X_sd\n", 1075 | "y = train_data_Y\n", 1076 | "\n", 1077 | "'''模型融合中使用到的各个单模型'''\n", 1078 | "from sklearn.linear_model import LogisticRegression\n", 1079 | "from sklearn import svm\n", 1080 | "import xgboost as xgb\n", 1081 | "from sklearn.ensemble import RandomForestClassifier\n", 1082 | "from sklearn.ensemble import GradientBoostingClassifier\n", 1083 | "\n", 1084 | "clfs = [LogisticRegression(C=0.1,max_iter=100),\n", 1085 | " xgb.XGBClassifier(max_depth=6,n_estimators=100,num_round = 5),\n", 1086 | " RandomForestClassifier(n_estimators=100,max_depth=6,oob_score=True),\n", 1087 | " GradientBoostingClassifier(learning_rate=0.3,max_depth=6,n_estimators=100)]\n", 1088 | "\n", 1089 | "#创建n_folds\n", 1090 | "from sklearn.cross_validation import StratifiedKFold\n", 1091 | "n_folds = 5\n", 1092 | "skf = list(StratifiedKFold(y, n_folds))\n", 1093 | "\n", 1094 | "#创建零矩阵\n", 1095 | "dataset_blend_train = np.zeros((X.shape[0], len(clfs)))\n", 1096 | "dataset_blend_test = np.zeros((X_predict.shape[0], len(clfs)))\n", 1097 | "\n", 1098 | "#建立模型\n", 1099 | "for j, clf in enumerate(clfs):\n", 1100 | " '''依次训练各个单模型'''\n", 1101 | " # print(j, clf)\n", 1102 | " dataset_blend_test_j = np.zeros((X_predict.shape[0], len(skf)))\n", 1103 | " for i, (train, test) in enumerate(skf):\n", 1104 | " '''使用第i个部分作为预测,剩余的部分来训练模型,获得其预测的输出作为第i部分的新特征。'''\n", 1105 | " # print(\"Fold\", i)\n", 1106 | " X_train, y_train, X_test, y_test = X[train], y[train], X[test], y[test]\n", 1107 | " clf.fit(X_train, y_train)\n", 1108 | " y_submission = clf.predict_proba(X_test)[:, 1]\n", 1109 | " dataset_blend_train[test, j] = y_submission\n", 1110 | " dataset_blend_test_j[:, i] = clf.predict_proba(X_predict)[:, 1]\n", 1111 | " '''对于测试集,直接用这k个模型的预测值均值作为新的特征。'''\n", 1112 | " dataset_blend_test[:, j] = dataset_blend_test_j.mean(1)\n", 1113 | "\n", 1114 | "# 用建立第二层模型\n", 1115 | "clf2 = LogisticRegression(C=0.1,max_iter=100)\n", 1116 | "clf2.fit(dataset_blend_train, y)\n", 1117 | "y_submission = clf2.predict_proba(dataset_blend_test)[:, 1]\n", 1118 | "\n", 1119 | "\n", 1120 | "test = pd.read_csv(\"test.csv\")\n", 1121 | "test[\"Survived\"] = clf2.predict(dataset_blend_test)\n", 1122 | "test[['PassengerId','Survived']].set_index('PassengerId').to_csv('stack3.csv')" 1123 | ] 1124 | } 1125 | ], 1126 | "metadata": { 1127 | "kernelspec": { 1128 | "display_name": "Python 3", 1129 | "language": "python", 1130 | "name": "python3" 1131 | }, 1132 | "language_info": { 1133 | "codemirror_mode": { 1134 | "name": "ipython", 1135 | "version": 3 1136 | }, 1137 | "file_extension": ".py", 1138 | "mimetype": "text/x-python", 1139 | "name": "python", 1140 | "nbconvert_exporter": "python", 1141 | "pygments_lexer": "ipython3", 1142 | "version": "3.6.1" 1143 | } 1144 | }, 1145 | "nbformat": 4, 1146 | "nbformat_minor": 2 1147 | } 1148 | -------------------------------------------------------------------------------- /03_Facebook广告投放(Sales Conversion Optimization)/README.md: -------------------------------------------------------------------------------- 1 | ## Facebook广告投放策略数据分析项目实践 2 | 本项目数据来自于Kaggle上的一个Facebook广告投放策略数据,项目主页地址: 3 | [Sales Conversion Optimization](https://www.kaggle.com/loveall/clicks-conversion-tracking/home) 4 | 5 | 项目使用的数据来自一个匿名组织的社交媒体广告活动,数据文件包含11个属性(列),1143个观察值(行),下面是属性的描述: 6 | 7 | 1. ad_id: an unique ID for each ad. 8 | 9 | >>即每个广告的唯一标识 10 | 11 | 2. xyz_campaign_id: an ID associated with each ad campaign of XYZ company. 12 | 13 | >>即xyz公司的每次广告活动的唯一标识 14 | 15 | 3. fb_campaign_id: an ID associated with how Facebook tracks each campaign. 16 | 17 | >>即facebook用来跟踪每次广告活动的唯一标识 18 | 19 | 4. age: age of the person to whom the ad is shown. 20 | 21 | >>即广告展示对象的年龄 22 | 23 | 5. gender: gender of the person to whim the add is shown 24 | 25 | >>即广告展示对象的性别 26 | 27 | 6. interest: a code specifying the category to which the person’s interest belongs (interests are as mentioned in the person’s Facebook public profile). 28 | 29 | >>即facebook用户的兴趣标签,也就是facebook用户的个人主页所显示的兴趣标签。 30 | 31 | 7. Impressions: the number of times the ad was shown. 32 | 33 | >>即广告曝光次数。 34 | 35 | 8. Clicks: number of clicks on for that ad. 36 | 37 | >>即广告点击次数。 38 | 39 | 9. Spent: Amount paid by company xyz to Facebook, to show that ad. 40 | 41 | >>即xyz公司为每次广告展示向facebook支付的费用。 42 | 43 | 10. Total conversion: Total number of people who enquired about the product after seeing the ad. 44 | 45 | >>即用户看到广告之后向xyz公司咨询产品的次数。 46 | 47 | 11. Approved conversion: Total number of people who bought the product after seeing the ad. 48 | 49 | >>即用户看到广告之后购买产品的次数。 50 | 51 | 52 | --- 53 | 54 | 社交媒体广告活动营销是销售转换的一个主要来源, 本次分析聚焦两个问题: 55 | 56 | - 兴趣标签(interest)取值为多少的人点击率最高? 57 | 58 | - 哪个群体的转化率最高? 59 | 60 | 由于本次项目数据集不大,且数据质量很高,无需进行缺失值、异常值处理,故直接将其导入MySQL进行分析,本人使用的数据库管理软件是Mysql For Navicat Premium,分析内容分为两个部分:数据统计和投放效果分析。 61 | 62 | --- 63 | 64 | ### 一、数据统计 65 | 66 | 1、将数据集导入Mysql数据库,结果如下: 67 | ![1](http://m.qpic.cn/psb?/V10P3PxC0koPqQ/yrapkc5xIrnCnqICQ4UqqhtSGpdN8WCYvbsAVlw0C0s!/b/dLYAAAAAAAAA&bo=wQOdAgAAAAARB20!&rf=viewer_4) 68 | 69 | 70 | 71 | 2、查看有多少个广告活动(compaign): 72 | ![2](http://m.qpic.cn/psb?/V10P3PxC0koPqQ/*R9gxsbLPHOh3dt96uEiAsH37JCzRuLBRCUka9Vydgk!/b/dFIBAAAAAAAA&bo=.AIPAgAAAAARB8c!&rf=viewer_4) 73 | 74 | 75 | 76 | 3、计算每个广告活动(campaign)包含了多少个广告,按降序排列 77 | ![3](http://m.qpic.cn/psb?/V10P3PxC0koPqQ/jDH*5itbngY5oBRbwT3PCBi8C5mK3oJLOF6gQaZ2PrU!/b/dAgBAAAAAAAA&bo=9gIOAgAAAAARB8g!&rf=viewer_4) 78 | 79 | 80 | 81 | 4、计算每个广告活动的总花费,按降序排列 82 | 83 | ![4](http://m.qpic.cn/psb?/V10P3PxC0koPqQ/M5XckK*3KWyH5hB3dOSQTT.SWe36ecxX.rEH2EoUSvo!/b/dDYBAAAAAAAA&bo=9gIMAgAAAAARF9o!&rf=viewer_4) 84 | 85 | 86 | 87 | 5、以百分数形式显示xyz_campaign_id=916广告活动中每个广告的点击率CTR,并按降序排列,CTR=clicks/impressions 88 | ![5](http://m.qpic.cn/psb?/V10P3PxC0koPqQ/GU8MN2N.wVmceRiE2L1VI*5SDT2baRQY4fR2*C*du5o!/b/dLYAAAAAAAAA&bo=.QIIAgAAAAARB8E!&rf=viewer_4) 89 | 90 | 91 | 92 | 6、以百分数形式显示xyz_campaign_id=916广告活动中每个广告的转化率CVR,并按降序排列,CVR= total_conversion/clicks 93 | ![6](http://m.qpic.cn/psb?/V10P3PxC0koPqQ/h.z1qP1msT42MG8i2tTe0IRaIpWGRhwRKOmjVKZe6FM!/b/dLYAAAAAAAAA&bo=9QIQAgAAAAARF8U!&rf=viewer_4) 94 | 95 | --- 96 | 97 | ### 二、投放效果分析 98 | 99 | 1、 interest值为多少的人点击率最高?分别对3个不同的广告活动进行分析。 100 | 101 | - 对于xyz_campaign_id=916, interest=21的人点击率最高,分析过程如下: 102 | ![7](http://m.qpic.cn/psb?/V10P3PxC0koPqQ/2wogMG2q3rVRnLXy3WHUu2hwk7xgWQfrTVtsJHD164g!/b/dDEBAAAAAAAA&bo=9gIRAgAAAAARB9c!&rf=viewer_4) 103 | 104 | 105 | - 对于xyz_campaign_id=936,interest=2的人点击率最高,分析过程同上,只需要将where语句中xyz_campain_code=916改为xyz_campain_code=936 106 | ![8](http://m.qpic.cn/psb?/V10P3PxC0koPqQ/.B8FlbcfcpVnUsRixsu1Doizi3DYa.dcP4OmwOI4Y.A!/b/dLYAAAAAAAAA&bo=.AIMAgAAAAARF9Q!&rf=viewer_4) 107 | 108 | 109 | - 同理,对于xyz_campain_code=1178,分析可得interest=26的人点击率最高。 110 | ![9](http://m.qpic.cn/psb?/V10P3PxC0koPqQ/PycCiOj08nV.XghHE.nD2xaN5aQ5BDoasNPyxLmHhQw!/b/dLYAAAAAAAAA&bo=9QIQAgAAAAARF8U!&rf=viewer_4) 111 | 112 | --- 113 | 114 | 2.男性还是女性更有可能完成转化?分别对3个不同的广告活动进行分析。 115 | 116 | - 对于xyz_campaign_code=916,男性更容易完成转化,分析过程如下: 117 | ![](http://m.qpic.cn/psb?/V10P3PxC0koPqQ/Tlwowojpj7kO6Z*n71Y6drcXxbA1Ry*9tZtFgZCtJNg!/b/dDIBAAAAAAAA&bo=KAMUAgAAAAARBw0!&rf=viewer_4) 118 | 119 | 120 | 121 | - 对于xyz_campaign_code=936,男性更容易完成转化,分析过程同上,只需要将where语句中xyz_campain_code=916改为xyz_campain_code=936 122 | ![](http://m.qpic.cn/psb?/V10P3PxC0koPqQ/oWmXZY9NtZpLY.yRog.*vj068oDSz26hn3I6UT6FWY0!/b/dDUBAAAAAAAA&bo=KgMNAgAAAAARFwY!&rf=viewer_4) 123 | 124 | 125 | - 同理,对于xyz_campain_code=1178,分析可得男性更容易完成转化。 126 | ![](http://m.qpic.cn/psb?/V10P3PxC0koPqQ/oh2ZDQ3Tg.825WATCENsKiE4uz6Nf8jmYpS8.Q7UIZg!/b/dDQBAAAAAAAA&bo=JgMTAgAAAAARFxQ!&rf=viewer_4) 127 | 128 | 129 | --- 130 | 131 | 3、 进一步分析,哪个年龄段的男性转化率最高?分别对3个不同的广告活动进行分析。 132 | 133 | - 对于xyz_campaign_code=916,30-34岁的男性转化率最高,分析过程如下: 134 | ![13](http://m.qpic.cn/psb?/V10P3PxC0koPqQ/38i201zy*CssGbxNwsRAIi0rQaFrxxTCoUqlL3QWl4A!/b/dEYBAAAAAAAA&bo=8gIMAgAAAAARB84!&rf=viewer_4) 135 | 136 | 137 | 138 | - 对于xyz_campaign_code=916,30-34岁的男性转化率最高,分析过程同上,只需要将where语句中xyz_campain_code=916改为xyz_campain_code=936。 139 | 140 | ![14](http://m.qpic.cn/psb?/V10P3PxC0koPqQ/wDxsWB8TJupNTw2pTwiqGEdNjOr1onvt7Xp9q2pYHXQ!/b/dFQBAAAAAAAA&bo=9AISAgAAAAARF8Y!&rf=viewer_4) 141 | 142 | 143 | 144 | - 同理,对于xyz_campain_code=1178,分析可得30-34岁的男性更容易完成转化。 145 | 146 | ![15](http://m.qpic.cn/psb?/V10P3PxC0koPqQ/RaUtjKtPDa8O3G8YT6ywuHYhhIHUHFWydIfqWv*txF8!/b/dDEBAAAAAAAA&bo=9gIMAgAAAAAKF8E!&rf=viewer_4) 147 | 148 | 149 | 150 | 4、 通过前两项分析可以看出30-34岁男性是转化率最高的群体,那么该群体的点击率如何呢? 151 | 152 | ![16](http://m.qpic.cn/psb?/V10P3PxC0koPqQ/h09yipMdBbh97UBFg*q25Jq0LVDAzNSpGPimlR156y0!/b/dDYBAAAAAAAA&bo=JwMRAgAAAAARBwc!&rf=viewer_4) 153 | 154 | 155 | 156 | 由上图可知,30-34岁的男性虽然转化率很高,但是点击率并不高。这说明如果我们可以想办法提高这个群体的点击率的话,就能提高转化。比如针对这个群体设计更吸引他们的广告。 157 | 158 | 159 | 160 | 161 | 162 | 163 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # 大数据竞赛项目实战清单 2 | 3 | - 01_泰坦尼克号(titanic) 【kaggle】 4 | - 02_共享单车(Bike Sharing Demand) 【Kaggle】 5 | - 03_Facebook广告投放(Sales Conversion Optimization) 【Kaggle】 --------------------------------------------------------------------------------