├── .gitattributes ├── .gitignore ├── LICENSE.txt ├── README.md ├── aggregate.py ├── association_rules.py ├── average.py ├── basic_filter.py ├── data ├── dna_seq.txt ├── retail.txt ├── titanic.csv └── words.txt ├── figures ├── 20newsgroups.png ├── als.png ├── lda_topics.png ├── mlp.png ├── random_forrest.png └── spark.png ├── lda.py ├── log_reg.py ├── mapper.py ├── mlp.py ├── outliers.py ├── pi_est.py ├── random_forest.py ├── recommender.py ├── scala └── pi_est.scala ├── term_doc.py └── word_count.py /.gitattributes: -------------------------------------------------------------------------------- 1 | # Auto detect text files and perform LF normalization 2 | * text=auto 3 | 4 | # Custom for Visual Studio 5 | *.cs diff=csharp 6 | 7 | # Standard to msysgit 8 | *.doc diff=astextplain 9 | *.DOC diff=astextplain 10 | *.docx diff=astextplain 11 | *.DOCX diff=astextplain 12 | *.dot diff=astextplain 13 | *.DOT diff=astextplain 14 | *.pdf diff=astextplain 15 | *.PDF diff=astextplain 16 | *.rtf diff=astextplain 17 | *.RTF diff=astextplain 18 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Windows image file caches 2 | Thumbs.db 3 | ehthumbs.db 4 | 5 | # Folder config file 6 | Desktop.ini 7 | 8 | # Recycle Bin used on file shares 9 | $RECYCLE.BIN/ 10 | 11 | # Windows Installer files 12 | *.cab 13 | *.msi 14 | *.msm 15 | *.msp 16 | 17 | # Windows shortcuts 18 | *.lnk 19 | 20 | # ========================= 21 | # Operating System Files 22 | # ========================= 23 | 24 | # OSX 25 | # ========================= 26 | 27 | .DS_Store 28 | .AppleDouble 29 | .LSOverride 30 | 31 | # Thumbnails 32 | ._* 33 | 34 | # Files that might appear in the root of a volume 35 | .DocumentRevisions-V100 36 | .fseventsd 37 | .Spotlight-V100 38 | .TemporaryItems 39 | .Trashes 40 | .VolumeIcon.icns 41 | 42 | # Directories potentially created on remote AFP share 43 | .AppleDB 44 | .AppleDesktop 45 | Network Trash Folder 46 | Temporary Items 47 | .apdisk 48 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | Copyright (c) 2017, Vadim Smolyakov 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy 4 | of this software and associated documentation files (the "Software"), to deal 5 | in the Software without restriction, including without limitation the rights 6 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 7 | copies of the Software, and to permit persons to whom the Software is 8 | furnished to do so, subject to the following conditions: 9 | 10 | The above copyright notice and this permission notice shall be included in all 11 | copies or substantial portions of the Software. 12 | 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 19 | SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # spark 2 | pyspark and scala spark 3 | 4 | ### Description 5 | 6 | This repo is a collection of Spark resources. 7 | 8 |

9 | 10 |

11 | 12 | References: 13 | *https://spark.apache.org/docs/latest/programming-guide.html#transformations* 14 | *https://spark.apache.org/docs/latest/ml-guide.html* 15 | 16 | **ALS Recommender System** 17 | 18 | Alternating Least Squares (ALS) matrix factorization recommender system was used to predict top movies for a new user given the ratings in the MovieLens dataset. 19 | 20 |

21 | 22 |

23 | 24 | RMSE was computed for different ALS ranks in order to select the best model, which was then used to predict movie ratings and recommend highest rated movies to a new user. 25 | 26 | References: 27 | *https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html* 28 | 29 | **Distributed Logistic Regression** 30 | 31 | LBFGS Logistic Regression was used to classify the 20newsgroups dataset according to one of 20 topics. Each document in a corpus was converted to a tf-idf vector labelled by the corresponding topic for training. 32 | 33 |

34 | 35 |

36 | 37 | A test accuracy was computed by predicting the topic label based on test tf-idf document vectors. The figure above shows a t-SNE visualization of the 20newsgroups corpus. 38 | 39 | References: 40 | *https://spark.apache.org/docs/latest/mllib-linear-methods.html#logistic-regression* 41 | 42 | **Distributed Random Forest** 43 | 44 | A random forest classifier was used to predict survival on the titanic using features such as age, class, ticket fare and others. The dataset was converted to Spark dataframe and the features were aggregated with vector assembler. 45 | 46 |

47 | 48 |

49 | 50 | A random forest with 100 trees and a max depth of 6 was used to make binary predictions using the Spark ML library. 51 | 52 | References: 53 | *https://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-classifier* 54 | 55 | **Multi Layer Perceptron** 56 | 57 | A Multi Layer Perceptron (MLP) was used to predict a binary label based on the titanic kaggle dataset. Spark data frames were used to read in and prepare the data for classification 58 | 59 |

60 | 61 |

62 | 63 | The MLP was configured with two hidden layers, 7 input and 2 output neurons. It achieved an occuracy of over 80% on the validation set with 100 training iterations. 64 | 65 | References: 66 | *https://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier* 67 | 68 | 69 | **Distributed LDA** 70 | 71 | A distributed Latent Dirichlet Allocation (LDA) topic model was fit on the 20 newsgroups dataset. The training data was preprocessed using a tokenizer, stop-word remover and a tf-idf transformer. 72 | 73 |

74 | 75 |

76 | 77 | The number of topics was set to K = 20. The figure above shows a word-cloud of topics learned from the 20 newsgroups dataset. 78 | 79 | References: 80 | *https://spark.apache.org/docs/latest/ml-clustering.html#latent-dirichlet-allocation-lda* 81 | 82 | 83 | **Misc** 84 | 85 | [RDD aggregation](https://github.com/vsmolyakov/pyspark/blob/master/aggregate.py), [RDD filter](https://github.com/vsmolyakov/pyspark/blob/master/basic_filter.py), [RDD mapper](https://github.com/vsmolyakov/pyspark/blob/master/mapper.py), 86 | [word count](https://github.com/vsmolyakov/pyspark/blob/master/word_count.py), [term document matrix](https://github.com/vsmolyakov/pyspark/blob/master/term_doc.py), [average](https://github.com/vsmolyakov/pyspark/blob/master/average.py), [outliers](https://github.com/vsmolyakov/pyspark/blob/master/outliers.py), [pi_est](https://github.com/vsmolyakov/pyspark/blob/master/pi_est.py) 87 | 88 | ### Dependencies 89 | 90 | PySpark 2.1.1 91 | Python 2.7 92 | 93 | -------------------------------------------------------------------------------- /aggregate.py: -------------------------------------------------------------------------------- 1 | ''' 2 | >>> sum_cnt 3 | (55, 10) 4 | ''' 5 | 6 | from pyspark import SparkContext 7 | 8 | if __name__ == "__main__": 9 | 10 | sc = SparkContext('local', 'aggregate') 11 | nums = sc.parallelize([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) 12 | 13 | sum_cnt = nums.aggregate( 14 | (0,0), #initial value 15 | (lambda acc, value: (acc[0] + value, acc[1] + 1)), #combine value with acc 16 | (lambda acc1, acc2: (acc1[0]+acc2[0],acc1[1]+acc2[1])) #combine accumulators 17 | ) 18 | 19 | print "mean: ", round(sum_cnt[0]/float(sum_cnt[1]),4) 20 | 21 | 22 | -------------------------------------------------------------------------------- /association_rules.py: -------------------------------------------------------------------------------- 1 | from pyspark import SparkContext 2 | from pyspark.sql import SQLContext 3 | from pyspark.sql import SparkSession 4 | from pyspark.sql import Row 5 | 6 | import numpy as np 7 | import seaborn as sns 8 | import matplotlib.pyplot as plt 9 | 10 | from pyspark.ml.fpm import FPGrowth 11 | 12 | if __name__ == "__main__": 13 | 14 | sc = SparkContext('local', 'arules') 15 | sqlContext = SQLContext(sc) 16 | 17 | spark = SparkSession\ 18 | .builder\ 19 | .appName("arules")\ 20 | .getOrCreate() 21 | 22 | #dataset = sc.textFile("./data/retail.txt") 23 | df = spark.createDataFrame([ 24 | (0, [1, 2, 5]), 25 | (1, [1, 2, 3, 5]), 26 | (2, [1, 2]) 27 | ], ["id", "items"]) 28 | 29 | fpGrowth = FPGrowth(itemsCol="items", minSupport=0.5, minConfidence=0.6) 30 | model = fpGrowth.fit(df) 31 | 32 | #display frequent itemsets 33 | model.freqItemsets.show() 34 | 35 | #display generated association rules 36 | model.associationRules.show() 37 | 38 | #apply transform 39 | model.transform(df).show() 40 | 41 | -------------------------------------------------------------------------------- /average.py: -------------------------------------------------------------------------------- 1 | ''' 2 | >>> sum_count 3 | (55, 10) 4 | >>> average 5 | 5.5 6 | ''' 7 | 8 | from pyspark import SparkContext 9 | 10 | if __name__ == "__main__": 11 | 12 | sc = SparkContext('local', 'word_count') 13 | nums = sc.parallelize([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) 14 | sum_count = nums.map(lambda x: (x, 1)).fold((0,0), (lambda x, y: (x[0]+y[0], x[1]+y[1]))) 15 | average = sum_count[0] / float(sum_count[1]) 16 | print average 17 | -------------------------------------------------------------------------------- /basic_filter.py: -------------------------------------------------------------------------------- 1 | ''' 2 | >>> print res 3 | [4, 16, 36, 64] 4 | ''' 5 | 6 | from pyspark import SparkContext 7 | 8 | def even_squares(num): 9 | return num.filter(lambda x: x % 2 == 0).map(lambda x: x * x) 10 | 11 | 12 | if __name__ == "__main__": 13 | 14 | sc = SparkContext('local', 'word_count') 15 | nums = sc.parallelize([1, 2, 3, 4, 5, 6, 7, 8]) 16 | res = sorted(even_squares(nums).collect()) 17 | print res 18 | -------------------------------------------------------------------------------- /data/dna_seq.txt: -------------------------------------------------------------------------------- 1 | ATATCCCCGGGAT 2 | ATCGATCGATATG -------------------------------------------------------------------------------- /data/titanic.csv: -------------------------------------------------------------------------------- 1 | PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked 2 | 1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S 3 | 2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C 4 | 3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S 5 | 4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S 6 | 5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S 7 | 6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q 8 | 7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S 9 | 8,0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S 10 | 9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S 11 | 10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14,1,0,237736,30.0708,,C 12 | 11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4,1,1,PP 9549,16.7,G6,S 13 | 12,1,1,"Bonnell, Miss. Elizabeth",female,58,0,0,113783,26.55,C103,S 14 | 13,0,3,"Saundercock, Mr. William Henry",male,20,0,0,A/5. 2151,8.05,,S 15 | 14,0,3,"Andersson, Mr. Anders Johan",male,39,1,5,347082,31.275,,S 16 | 15,0,3,"Vestrom, Miss. Hulda Amanda Adolfina",female,14,0,0,350406,7.8542,,S 17 | 16,1,2,"Hewlett, Mrs. (Mary D Kingcome) ",female,55,0,0,248706,16,,S 18 | 17,0,3,"Rice, Master. Eugene",male,2,4,1,382652,29.125,,Q 19 | 18,1,2,"Williams, Mr. Charles Eugene",male,,0,0,244373,13,,S 20 | 19,0,3,"Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)",female,31,1,0,345763,18,,S 21 | 20,1,3,"Masselmani, Mrs. Fatima",female,,0,0,2649,7.225,,C 22 | 21,0,2,"Fynney, Mr. Joseph J",male,35,0,0,239865,26,,S 23 | 22,1,2,"Beesley, Mr. Lawrence",male,34,0,0,248698,13,D56,S 24 | 23,1,3,"McGowan, Miss. Anna ""Annie""",female,15,0,0,330923,8.0292,,Q 25 | 24,1,1,"Sloper, Mr. William Thompson",male,28,0,0,113788,35.5,A6,S 26 | 25,0,3,"Palsson, Miss. Torborg Danira",female,8,3,1,349909,21.075,,S 27 | 26,1,3,"Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson)",female,38,1,5,347077,31.3875,,S 28 | 27,0,3,"Emir, Mr. Farred Chehab",male,,0,0,2631,7.225,,C 29 | 28,0,1,"Fortune, Mr. Charles Alexander",male,19,3,2,19950,263,C23 C25 C27,S 30 | 29,1,3,"O'Dwyer, Miss. Ellen ""Nellie""",female,,0,0,330959,7.8792,,Q 31 | 30,0,3,"Todoroff, Mr. Lalio",male,,0,0,349216,7.8958,,S 32 | 31,0,1,"Uruchurtu, Don. Manuel E",male,40,0,0,PC 17601,27.7208,,C 33 | 32,1,1,"Spencer, Mrs. William Augustus (Marie Eugenie)",female,,1,0,PC 17569,146.5208,B78,C 34 | 33,1,3,"Glynn, Miss. Mary Agatha",female,,0,0,335677,7.75,,Q 35 | 34,0,2,"Wheadon, Mr. Edward H",male,66,0,0,C.A. 24579,10.5,,S 36 | 35,0,1,"Meyer, Mr. Edgar Joseph",male,28,1,0,PC 17604,82.1708,,C 37 | 36,0,1,"Holverson, Mr. Alexander Oskar",male,42,1,0,113789,52,,S 38 | 37,1,3,"Mamee, Mr. Hanna",male,,0,0,2677,7.2292,,C 39 | 38,0,3,"Cann, Mr. Ernest Charles",male,21,0,0,A./5. 2152,8.05,,S 40 | 39,0,3,"Vander Planke, Miss. Augusta Maria",female,18,2,0,345764,18,,S 41 | 40,1,3,"Nicola-Yarred, Miss. Jamila",female,14,1,0,2651,11.2417,,C 42 | 41,0,3,"Ahlin, Mrs. Johan (Johanna Persdotter Larsson)",female,40,1,0,7546,9.475,,S 43 | 42,0,2,"Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott)",female,27,1,0,11668,21,,S 44 | 43,0,3,"Kraeff, Mr. Theodor",male,,0,0,349253,7.8958,,C 45 | 44,1,2,"Laroche, Miss. Simonne Marie Anne Andree",female,3,1,2,SC/Paris 2123,41.5792,,C 46 | 45,1,3,"Devaney, Miss. Margaret Delia",female,19,0,0,330958,7.8792,,Q 47 | 46,0,3,"Rogers, Mr. William John",male,,0,0,S.C./A.4. 23567,8.05,,S 48 | 47,0,3,"Lennon, Mr. Denis",male,,1,0,370371,15.5,,Q 49 | 48,1,3,"O'Driscoll, Miss. Bridget",female,,0,0,14311,7.75,,Q 50 | 49,0,3,"Samaan, Mr. Youssef",male,,2,0,2662,21.6792,,C 51 | 50,0,3,"Arnold-Franchi, Mrs. Josef (Josefine Franchi)",female,18,1,0,349237,17.8,,S 52 | 51,0,3,"Panula, Master. Juha Niilo",male,7,4,1,3101295,39.6875,,S 53 | 52,0,3,"Nosworthy, Mr. Richard Cater",male,21,0,0,A/4. 39886,7.8,,S 54 | 53,1,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",female,49,1,0,PC 17572,76.7292,D33,C 55 | 54,1,2,"Faunthorpe, Mrs. Lizzie (Elizabeth Anne Wilkinson)",female,29,1,0,2926,26,,S 56 | 55,0,1,"Ostby, Mr. Engelhart Cornelius",male,65,0,1,113509,61.9792,B30,C 57 | 56,1,1,"Woolner, Mr. Hugh",male,,0,0,19947,35.5,C52,S 58 | 57,1,2,"Rugg, Miss. Emily",female,21,0,0,C.A. 31026,10.5,,S 59 | 58,0,3,"Novel, Mr. Mansouer",male,28.5,0,0,2697,7.2292,,C 60 | 59,1,2,"West, Miss. Constance Mirium",female,5,1,2,C.A. 34651,27.75,,S 61 | 60,0,3,"Goodwin, Master. William Frederick",male,11,5,2,CA 2144,46.9,,S 62 | 61,0,3,"Sirayanian, Mr. Orsen",male,22,0,0,2669,7.2292,,C 63 | 62,1,1,"Icard, Miss. Amelie",female,38,0,0,113572,80,B28, 64 | 63,0,1,"Harris, Mr. Henry Birkhardt",male,45,1,0,36973,83.475,C83,S 65 | 64,0,3,"Skoog, Master. Harald",male,4,3,2,347088,27.9,,S 66 | 65,0,1,"Stewart, Mr. Albert A",male,,0,0,PC 17605,27.7208,,C 67 | 66,1,3,"Moubarek, Master. Gerios",male,,1,1,2661,15.2458,,C 68 | 67,1,2,"Nye, Mrs. (Elizabeth Ramell)",female,29,0,0,C.A. 29395,10.5,F33,S 69 | 68,0,3,"Crease, Mr. Ernest James",male,19,0,0,S.P. 3464,8.1583,,S 70 | 69,1,3,"Andersson, Miss. Erna Alexandra",female,17,4,2,3101281,7.925,,S 71 | 70,0,3,"Kink, Mr. Vincenz",male,26,2,0,315151,8.6625,,S 72 | 71,0,2,"Jenkin, Mr. Stephen Curnow",male,32,0,0,C.A. 33111,10.5,,S 73 | 72,0,3,"Goodwin, Miss. Lillian Amy",female,16,5,2,CA 2144,46.9,,S 74 | 73,0,2,"Hood, Mr. Ambrose Jr",male,21,0,0,S.O.C. 14879,73.5,,S 75 | 74,0,3,"Chronopoulos, Mr. Apostolos",male,26,1,0,2680,14.4542,,C 76 | 75,1,3,"Bing, Mr. Lee",male,32,0,0,1601,56.4958,,S 77 | 76,0,3,"Moen, Mr. Sigurd Hansen",male,25,0,0,348123,7.65,F G73,S 78 | 77,0,3,"Staneff, Mr. Ivan",male,,0,0,349208,7.8958,,S 79 | 78,0,3,"Moutal, Mr. Rahamin Haim",male,,0,0,374746,8.05,,S 80 | 79,1,2,"Caldwell, Master. Alden Gates",male,0.83,0,2,248738,29,,S 81 | 80,1,3,"Dowdell, Miss. Elizabeth",female,30,0,0,364516,12.475,,S 82 | 81,0,3,"Waelens, Mr. Achille",male,22,0,0,345767,9,,S 83 | 82,1,3,"Sheerlinck, Mr. Jan Baptist",male,29,0,0,345779,9.5,,S 84 | 83,1,3,"McDermott, Miss. Brigdet Delia",female,,0,0,330932,7.7875,,Q 85 | 84,0,1,"Carrau, Mr. Francisco M",male,28,0,0,113059,47.1,,S 86 | 85,1,2,"Ilett, Miss. Bertha",female,17,0,0,SO/C 14885,10.5,,S 87 | 86,1,3,"Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)",female,33,3,0,3101278,15.85,,S 88 | 87,0,3,"Ford, Mr. William Neal",male,16,1,3,W./C. 6608,34.375,,S 89 | 88,0,3,"Slocovski, Mr. Selman Francis",male,,0,0,SOTON/OQ 392086,8.05,,S 90 | 89,1,1,"Fortune, Miss. Mabel Helen",female,23,3,2,19950,263,C23 C25 C27,S 91 | 90,0,3,"Celotti, Mr. Francesco",male,24,0,0,343275,8.05,,S 92 | 91,0,3,"Christmann, Mr. Emil",male,29,0,0,343276,8.05,,S 93 | 92,0,3,"Andreasson, Mr. Paul Edvin",male,20,0,0,347466,7.8542,,S 94 | 93,0,1,"Chaffee, Mr. Herbert Fuller",male,46,1,0,W.E.P. 5734,61.175,E31,S 95 | 94,0,3,"Dean, Mr. Bertram Frank",male,26,1,2,C.A. 2315,20.575,,S 96 | 95,0,3,"Coxon, Mr. Daniel",male,59,0,0,364500,7.25,,S 97 | 96,0,3,"Shorney, Mr. Charles Joseph",male,,0,0,374910,8.05,,S 98 | 97,0,1,"Goldschmidt, Mr. George B",male,71,0,0,PC 17754,34.6542,A5,C 99 | 98,1,1,"Greenfield, Mr. William Bertram",male,23,0,1,PC 17759,63.3583,D10 D12,C 100 | 99,1,2,"Doling, Mrs. John T (Ada Julia Bone)",female,34,0,1,231919,23,,S 101 | 100,0,2,"Kantor, Mr. Sinai",male,34,1,0,244367,26,,S 102 | 101,0,3,"Petranec, Miss. Matilda",female,28,0,0,349245,7.8958,,S 103 | 102,0,3,"Petroff, Mr. Pastcho (""Pentcho"")",male,,0,0,349215,7.8958,,S 104 | 103,0,1,"White, Mr. Richard Frasar",male,21,0,1,35281,77.2875,D26,S 105 | 104,0,3,"Johansson, Mr. Gustaf Joel",male,33,0,0,7540,8.6542,,S 106 | 105,0,3,"Gustafsson, Mr. Anders Vilhelm",male,37,2,0,3101276,7.925,,S 107 | 106,0,3,"Mionoff, Mr. Stoytcho",male,28,0,0,349207,7.8958,,S 108 | 107,1,3,"Salkjelsvik, Miss. Anna Kristine",female,21,0,0,343120,7.65,,S 109 | 108,1,3,"Moss, Mr. Albert Johan",male,,0,0,312991,7.775,,S 110 | 109,0,3,"Rekic, Mr. Tido",male,38,0,0,349249,7.8958,,S 111 | 110,1,3,"Moran, Miss. Bertha",female,,1,0,371110,24.15,,Q 112 | 111,0,1,"Porter, Mr. Walter Chamberlain",male,47,0,0,110465,52,C110,S 113 | 112,0,3,"Zabour, Miss. Hileni",female,14.5,1,0,2665,14.4542,,C 114 | 113,0,3,"Barton, Mr. David John",male,22,0,0,324669,8.05,,S 115 | 114,0,3,"Jussila, Miss. Katriina",female,20,1,0,4136,9.825,,S 116 | 115,0,3,"Attalah, Miss. Malake",female,17,0,0,2627,14.4583,,C 117 | 116,0,3,"Pekoniemi, Mr. Edvard",male,21,0,0,STON/O 2. 3101294,7.925,,S 118 | 117,0,3,"Connors, Mr. Patrick",male,70.5,0,0,370369,7.75,,Q 119 | 118,0,2,"Turpin, Mr. William John Robert",male,29,1,0,11668,21,,S 120 | 119,0,1,"Baxter, Mr. Quigg Edmond",male,24,0,1,PC 17558,247.5208,B58 B60,C 121 | 120,0,3,"Andersson, Miss. Ellis Anna Maria",female,2,4,2,347082,31.275,,S 122 | 121,0,2,"Hickman, Mr. Stanley George",male,21,2,0,S.O.C. 14879,73.5,,S 123 | 122,0,3,"Moore, Mr. Leonard Charles",male,,0,0,A4. 54510,8.05,,S 124 | 123,0,2,"Nasser, Mr. Nicholas",male,32.5,1,0,237736,30.0708,,C 125 | 124,1,2,"Webber, Miss. Susan",female,32.5,0,0,27267,13,E101,S 126 | 125,0,1,"White, Mr. Percival Wayland",male,54,0,1,35281,77.2875,D26,S 127 | 126,1,3,"Nicola-Yarred, Master. Elias",male,12,1,0,2651,11.2417,,C 128 | 127,0,3,"McMahon, Mr. Martin",male,,0,0,370372,7.75,,Q 129 | 128,1,3,"Madsen, Mr. Fridtjof Arne",male,24,0,0,C 17369,7.1417,,S 130 | 129,1,3,"Peter, Miss. Anna",female,,1,1,2668,22.3583,F E69,C 131 | 130,0,3,"Ekstrom, Mr. Johan",male,45,0,0,347061,6.975,,S 132 | 131,0,3,"Drazenoic, Mr. Jozef",male,33,0,0,349241,7.8958,,C 133 | 132,0,3,"Coelho, Mr. Domingos Fernandeo",male,20,0,0,SOTON/O.Q. 3101307,7.05,,S 134 | 133,0,3,"Robins, Mrs. Alexander A (Grace Charity Laury)",female,47,1,0,A/5. 3337,14.5,,S 135 | 134,1,2,"Weisz, Mrs. Leopold (Mathilde Francoise Pede)",female,29,1,0,228414,26,,S 136 | 135,0,2,"Sobey, Mr. Samuel James Hayden",male,25,0,0,C.A. 29178,13,,S 137 | 136,0,2,"Richard, Mr. Emile",male,23,0,0,SC/PARIS 2133,15.0458,,C 138 | 137,1,1,"Newsom, Miss. Helen Monypeny",female,19,0,2,11752,26.2833,D47,S 139 | 138,0,1,"Futrelle, Mr. Jacques Heath",male,37,1,0,113803,53.1,C123,S 140 | 139,0,3,"Osen, Mr. Olaf Elon",male,16,0,0,7534,9.2167,,S 141 | 140,0,1,"Giglio, Mr. Victor",male,24,0,0,PC 17593,79.2,B86,C 142 | 141,0,3,"Boulos, Mrs. Joseph (Sultana)",female,,0,2,2678,15.2458,,C 143 | 142,1,3,"Nysten, Miss. Anna Sofia",female,22,0,0,347081,7.75,,S 144 | 143,1,3,"Hakkarainen, Mrs. Pekka Pietari (Elin Matilda Dolck)",female,24,1,0,STON/O2. 3101279,15.85,,S 145 | 144,0,3,"Burke, Mr. Jeremiah",male,19,0,0,365222,6.75,,Q 146 | 145,0,2,"Andrew, Mr. Edgardo Samuel",male,18,0,0,231945,11.5,,S 147 | 146,0,2,"Nicholls, Mr. Joseph Charles",male,19,1,1,C.A. 33112,36.75,,S 148 | 147,1,3,"Andersson, Mr. August Edvard (""Wennerstrom"")",male,27,0,0,350043,7.7958,,S 149 | 148,0,3,"Ford, Miss. Robina Maggie ""Ruby""",female,9,2,2,W./C. 6608,34.375,,S 150 | 149,0,2,"Navratil, Mr. Michel (""Louis M Hoffman"")",male,36.5,0,2,230080,26,F2,S 151 | 150,0,2,"Byles, Rev. Thomas Roussel Davids",male,42,0,0,244310,13,,S 152 | 151,0,2,"Bateman, Rev. Robert James",male,51,0,0,S.O.P. 1166,12.525,,S 153 | 152,1,1,"Pears, Mrs. Thomas (Edith Wearne)",female,22,1,0,113776,66.6,C2,S 154 | 153,0,3,"Meo, Mr. Alfonzo",male,55.5,0,0,A.5. 11206,8.05,,S 155 | 154,0,3,"van Billiard, Mr. Austin Blyler",male,40.5,0,2,A/5. 851,14.5,,S 156 | 155,0,3,"Olsen, Mr. Ole Martin",male,,0,0,Fa 265302,7.3125,,S 157 | 156,0,1,"Williams, Mr. Charles Duane",male,51,0,1,PC 17597,61.3792,,C 158 | 157,1,3,"Gilnagh, Miss. Katherine ""Katie""",female,16,0,0,35851,7.7333,,Q 159 | 158,0,3,"Corn, Mr. Harry",male,30,0,0,SOTON/OQ 392090,8.05,,S 160 | 159,0,3,"Smiljanic, Mr. Mile",male,,0,0,315037,8.6625,,S 161 | 160,0,3,"Sage, Master. Thomas Henry",male,,8,2,CA. 2343,69.55,,S 162 | 161,0,3,"Cribb, Mr. John Hatfield",male,44,0,1,371362,16.1,,S 163 | 162,1,2,"Watt, Mrs. James (Elizabeth ""Bessie"" Inglis Milne)",female,40,0,0,C.A. 33595,15.75,,S 164 | 163,0,3,"Bengtsson, Mr. John Viktor",male,26,0,0,347068,7.775,,S 165 | 164,0,3,"Calic, Mr. Jovo",male,17,0,0,315093,8.6625,,S 166 | 165,0,3,"Panula, Master. Eino Viljami",male,1,4,1,3101295,39.6875,,S 167 | 166,1,3,"Goldsmith, Master. Frank John William ""Frankie""",male,9,0,2,363291,20.525,,S 168 | 167,1,1,"Chibnall, Mrs. (Edith Martha Bowerman)",female,,0,1,113505,55,E33,S 169 | 168,0,3,"Skoog, Mrs. William (Anna Bernhardina Karlsson)",female,45,1,4,347088,27.9,,S 170 | 169,0,1,"Baumann, Mr. John D",male,,0,0,PC 17318,25.925,,S 171 | 170,0,3,"Ling, Mr. Lee",male,28,0,0,1601,56.4958,,S 172 | 171,0,1,"Van der hoef, Mr. Wyckoff",male,61,0,0,111240,33.5,B19,S 173 | 172,0,3,"Rice, Master. Arthur",male,4,4,1,382652,29.125,,Q 174 | 173,1,3,"Johnson, Miss. Eleanor Ileen",female,1,1,1,347742,11.1333,,S 175 | 174,0,3,"Sivola, Mr. Antti Wilhelm",male,21,0,0,STON/O 2. 3101280,7.925,,S 176 | 175,0,1,"Smith, Mr. James Clinch",male,56,0,0,17764,30.6958,A7,C 177 | 176,0,3,"Klasen, Mr. Klas Albin",male,18,1,1,350404,7.8542,,S 178 | 177,0,3,"Lefebre, Master. Henry Forbes",male,,3,1,4133,25.4667,,S 179 | 178,0,1,"Isham, Miss. Ann Elizabeth",female,50,0,0,PC 17595,28.7125,C49,C 180 | 179,0,2,"Hale, Mr. Reginald",male,30,0,0,250653,13,,S 181 | 180,0,3,"Leonard, Mr. Lionel",male,36,0,0,LINE,0,,S 182 | 181,0,3,"Sage, Miss. Constance Gladys",female,,8,2,CA. 2343,69.55,,S 183 | 182,0,2,"Pernot, Mr. Rene",male,,0,0,SC/PARIS 2131,15.05,,C 184 | 183,0,3,"Asplund, Master. Clarence Gustaf Hugo",male,9,4,2,347077,31.3875,,S 185 | 184,1,2,"Becker, Master. Richard F",male,1,2,1,230136,39,F4,S 186 | 185,1,3,"Kink-Heilmann, Miss. Luise Gretchen",female,4,0,2,315153,22.025,,S 187 | 186,0,1,"Rood, Mr. Hugh Roscoe",male,,0,0,113767,50,A32,S 188 | 187,1,3,"O'Brien, Mrs. Thomas (Johanna ""Hannah"" Godfrey)",female,,1,0,370365,15.5,,Q 189 | 188,1,1,"Romaine, Mr. Charles Hallace (""Mr C Rolmane"")",male,45,0,0,111428,26.55,,S 190 | 189,0,3,"Bourke, Mr. John",male,40,1,1,364849,15.5,,Q 191 | 190,0,3,"Turcin, Mr. Stjepan",male,36,0,0,349247,7.8958,,S 192 | 191,1,2,"Pinsky, Mrs. (Rosa)",female,32,0,0,234604,13,,S 193 | 192,0,2,"Carbines, Mr. William",male,19,0,0,28424,13,,S 194 | 193,1,3,"Andersen-Jensen, Miss. Carla Christine Nielsine",female,19,1,0,350046,7.8542,,S 195 | 194,1,2,"Navratil, Master. Michel M",male,3,1,1,230080,26,F2,S 196 | 195,1,1,"Brown, Mrs. James Joseph (Margaret Tobin)",female,44,0,0,PC 17610,27.7208,B4,C 197 | 196,1,1,"Lurette, Miss. Elise",female,58,0,0,PC 17569,146.5208,B80,C 198 | 197,0,3,"Mernagh, Mr. Robert",male,,0,0,368703,7.75,,Q 199 | 198,0,3,"Olsen, Mr. Karl Siegwart Andreas",male,42,0,1,4579,8.4042,,S 200 | 199,1,3,"Madigan, Miss. Margaret ""Maggie""",female,,0,0,370370,7.75,,Q 201 | 200,0,2,"Yrois, Miss. Henriette (""Mrs Harbeck"")",female,24,0,0,248747,13,,S 202 | 201,0,3,"Vande Walle, Mr. Nestor Cyriel",male,28,0,0,345770,9.5,,S 203 | 202,0,3,"Sage, Mr. Frederick",male,,8,2,CA. 2343,69.55,,S 204 | 203,0,3,"Johanson, Mr. Jakob Alfred",male,34,0,0,3101264,6.4958,,S 205 | 204,0,3,"Youseff, Mr. Gerious",male,45.5,0,0,2628,7.225,,C 206 | 205,1,3,"Cohen, Mr. Gurshon ""Gus""",male,18,0,0,A/5 3540,8.05,,S 207 | 206,0,3,"Strom, Miss. Telma Matilda",female,2,0,1,347054,10.4625,G6,S 208 | 207,0,3,"Backstrom, Mr. Karl Alfred",male,32,1,0,3101278,15.85,,S 209 | 208,1,3,"Albimona, Mr. Nassef Cassem",male,26,0,0,2699,18.7875,,C 210 | 209,1,3,"Carr, Miss. Helen ""Ellen""",female,16,0,0,367231,7.75,,Q 211 | 210,1,1,"Blank, Mr. Henry",male,40,0,0,112277,31,A31,C 212 | 211,0,3,"Ali, Mr. Ahmed",male,24,0,0,SOTON/O.Q. 3101311,7.05,,S 213 | 212,1,2,"Cameron, Miss. Clear Annie",female,35,0,0,F.C.C. 13528,21,,S 214 | 213,0,3,"Perkin, Mr. John Henry",male,22,0,0,A/5 21174,7.25,,S 215 | 214,0,2,"Givard, Mr. Hans Kristensen",male,30,0,0,250646,13,,S 216 | 215,0,3,"Kiernan, Mr. Philip",male,,1,0,367229,7.75,,Q 217 | 216,1,1,"Newell, Miss. Madeleine",female,31,1,0,35273,113.275,D36,C 218 | 217,1,3,"Honkanen, Miss. Eliina",female,27,0,0,STON/O2. 3101283,7.925,,S 219 | 218,0,2,"Jacobsohn, Mr. Sidney Samuel",male,42,1,0,243847,27,,S 220 | 219,1,1,"Bazzani, Miss. Albina",female,32,0,0,11813,76.2917,D15,C 221 | 220,0,2,"Harris, Mr. Walter",male,30,0,0,W/C 14208,10.5,,S 222 | 221,1,3,"Sunderland, Mr. Victor Francis",male,16,0,0,SOTON/OQ 392089,8.05,,S 223 | 222,0,2,"Bracken, Mr. James H",male,27,0,0,220367,13,,S 224 | 223,0,3,"Green, Mr. George Henry",male,51,0,0,21440,8.05,,S 225 | 224,0,3,"Nenkoff, Mr. Christo",male,,0,0,349234,7.8958,,S 226 | 225,1,1,"Hoyt, Mr. Frederick Maxfield",male,38,1,0,19943,90,C93,S 227 | 226,0,3,"Berglund, Mr. Karl Ivar Sven",male,22,0,0,PP 4348,9.35,,S 228 | 227,1,2,"Mellors, Mr. William John",male,19,0,0,SW/PP 751,10.5,,S 229 | 228,0,3,"Lovell, Mr. John Hall (""Henry"")",male,20.5,0,0,A/5 21173,7.25,,S 230 | 229,0,2,"Fahlstrom, Mr. Arne Jonas",male,18,0,0,236171,13,,S 231 | 230,0,3,"Lefebre, Miss. Mathilde",female,,3,1,4133,25.4667,,S 232 | 231,1,1,"Harris, Mrs. Henry Birkhardt (Irene Wallach)",female,35,1,0,36973,83.475,C83,S 233 | 232,0,3,"Larsson, Mr. Bengt Edvin",male,29,0,0,347067,7.775,,S 234 | 233,0,2,"Sjostedt, Mr. Ernst Adolf",male,59,0,0,237442,13.5,,S 235 | 234,1,3,"Asplund, Miss. Lillian Gertrud",female,5,4,2,347077,31.3875,,S 236 | 235,0,2,"Leyson, Mr. Robert William Norman",male,24,0,0,C.A. 29566,10.5,,S 237 | 236,0,3,"Harknett, Miss. Alice Phoebe",female,,0,0,W./C. 6609,7.55,,S 238 | 237,0,2,"Hold, Mr. Stephen",male,44,1,0,26707,26,,S 239 | 238,1,2,"Collyer, Miss. Marjorie ""Lottie""",female,8,0,2,C.A. 31921,26.25,,S 240 | 239,0,2,"Pengelly, Mr. Frederick William",male,19,0,0,28665,10.5,,S 241 | 240,0,2,"Hunt, Mr. George Henry",male,33,0,0,SCO/W 1585,12.275,,S 242 | 241,0,3,"Zabour, Miss. Thamine",female,,1,0,2665,14.4542,,C 243 | 242,1,3,"Murphy, Miss. Katherine ""Kate""",female,,1,0,367230,15.5,,Q 244 | 243,0,2,"Coleridge, Mr. Reginald Charles",male,29,0,0,W./C. 14263,10.5,,S 245 | 244,0,3,"Maenpaa, Mr. Matti Alexanteri",male,22,0,0,STON/O 2. 3101275,7.125,,S 246 | 245,0,3,"Attalah, Mr. Sleiman",male,30,0,0,2694,7.225,,C 247 | 246,0,1,"Minahan, Dr. William Edward",male,44,2,0,19928,90,C78,Q 248 | 247,0,3,"Lindahl, Miss. Agda Thorilda Viktoria",female,25,0,0,347071,7.775,,S 249 | 248,1,2,"Hamalainen, Mrs. William (Anna)",female,24,0,2,250649,14.5,,S 250 | 249,1,1,"Beckwith, Mr. Richard Leonard",male,37,1,1,11751,52.5542,D35,S 251 | 250,0,2,"Carter, Rev. Ernest Courtenay",male,54,1,0,244252,26,,S 252 | 251,0,3,"Reed, Mr. James George",male,,0,0,362316,7.25,,S 253 | 252,0,3,"Strom, Mrs. Wilhelm (Elna Matilda Persson)",female,29,1,1,347054,10.4625,G6,S 254 | 253,0,1,"Stead, Mr. William Thomas",male,62,0,0,113514,26.55,C87,S 255 | 254,0,3,"Lobb, Mr. William Arthur",male,30,1,0,A/5. 3336,16.1,,S 256 | 255,0,3,"Rosblom, Mrs. Viktor (Helena Wilhelmina)",female,41,0,2,370129,20.2125,,S 257 | 256,1,3,"Touma, Mrs. Darwis (Hanne Youssef Razi)",female,29,0,2,2650,15.2458,,C 258 | 257,1,1,"Thorne, Mrs. Gertrude Maybelle",female,,0,0,PC 17585,79.2,,C 259 | 258,1,1,"Cherry, Miss. Gladys",female,30,0,0,110152,86.5,B77,S 260 | 259,1,1,"Ward, Miss. Anna",female,35,0,0,PC 17755,512.3292,,C 261 | 260,1,2,"Parrish, Mrs. (Lutie Davis)",female,50,0,1,230433,26,,S 262 | 261,0,3,"Smith, Mr. Thomas",male,,0,0,384461,7.75,,Q 263 | 262,1,3,"Asplund, Master. Edvin Rojj Felix",male,3,4,2,347077,31.3875,,S 264 | 263,0,1,"Taussig, Mr. Emil",male,52,1,1,110413,79.65,E67,S 265 | 264,0,1,"Harrison, Mr. William",male,40,0,0,112059,0,B94,S 266 | 265,0,3,"Henry, Miss. Delia",female,,0,0,382649,7.75,,Q 267 | 266,0,2,"Reeves, Mr. David",male,36,0,0,C.A. 17248,10.5,,S 268 | 267,0,3,"Panula, Mr. Ernesti Arvid",male,16,4,1,3101295,39.6875,,S 269 | 268,1,3,"Persson, Mr. Ernst Ulrik",male,25,1,0,347083,7.775,,S 270 | 269,1,1,"Graham, Mrs. William Thompson (Edith Junkins)",female,58,0,1,PC 17582,153.4625,C125,S 271 | 270,1,1,"Bissette, Miss. Amelia",female,35,0,0,PC 17760,135.6333,C99,S 272 | 271,0,1,"Cairns, Mr. Alexander",male,,0,0,113798,31,,S 273 | 272,1,3,"Tornquist, Mr. William Henry",male,25,0,0,LINE,0,,S 274 | 273,1,2,"Mellinger, Mrs. (Elizabeth Anne Maidment)",female,41,0,1,250644,19.5,,S 275 | 274,0,1,"Natsch, Mr. Charles H",male,37,0,1,PC 17596,29.7,C118,C 276 | 275,1,3,"Healy, Miss. Hanora ""Nora""",female,,0,0,370375,7.75,,Q 277 | 276,1,1,"Andrews, Miss. Kornelia Theodosia",female,63,1,0,13502,77.9583,D7,S 278 | 277,0,3,"Lindblom, Miss. Augusta Charlotta",female,45,0,0,347073,7.75,,S 279 | 278,0,2,"Parkes, Mr. Francis ""Frank""",male,,0,0,239853,0,,S 280 | 279,0,3,"Rice, Master. Eric",male,7,4,1,382652,29.125,,Q 281 | 280,1,3,"Abbott, Mrs. Stanton (Rosa Hunt)",female,35,1,1,C.A. 2673,20.25,,S 282 | 281,0,3,"Duane, Mr. Frank",male,65,0,0,336439,7.75,,Q 283 | 282,0,3,"Olsson, Mr. Nils Johan Goransson",male,28,0,0,347464,7.8542,,S 284 | 283,0,3,"de Pelsmaeker, Mr. Alfons",male,16,0,0,345778,9.5,,S 285 | 284,1,3,"Dorking, Mr. Edward Arthur",male,19,0,0,A/5. 10482,8.05,,S 286 | 285,0,1,"Smith, Mr. Richard William",male,,0,0,113056,26,A19,S 287 | 286,0,3,"Stankovic, Mr. Ivan",male,33,0,0,349239,8.6625,,C 288 | 287,1,3,"de Mulder, Mr. Theodore",male,30,0,0,345774,9.5,,S 289 | 288,0,3,"Naidenoff, Mr. Penko",male,22,0,0,349206,7.8958,,S 290 | 289,1,2,"Hosono, Mr. Masabumi",male,42,0,0,237798,13,,S 291 | 290,1,3,"Connolly, Miss. Kate",female,22,0,0,370373,7.75,,Q 292 | 291,1,1,"Barber, Miss. Ellen ""Nellie""",female,26,0,0,19877,78.85,,S 293 | 292,1,1,"Bishop, Mrs. Dickinson H (Helen Walton)",female,19,1,0,11967,91.0792,B49,C 294 | 293,0,2,"Levy, Mr. Rene Jacques",male,36,0,0,SC/Paris 2163,12.875,D,C 295 | 294,0,3,"Haas, Miss. Aloisia",female,24,0,0,349236,8.85,,S 296 | 295,0,3,"Mineff, Mr. Ivan",male,24,0,0,349233,7.8958,,S 297 | 296,0,1,"Lewy, Mr. Ervin G",male,,0,0,PC 17612,27.7208,,C 298 | 297,0,3,"Hanna, Mr. Mansour",male,23.5,0,0,2693,7.2292,,C 299 | 298,0,1,"Allison, Miss. Helen Loraine",female,2,1,2,113781,151.55,C22 C26,S 300 | 299,1,1,"Saalfeld, Mr. Adolphe",male,,0,0,19988,30.5,C106,S 301 | 300,1,1,"Baxter, Mrs. James (Helene DeLaudeniere Chaput)",female,50,0,1,PC 17558,247.5208,B58 B60,C 302 | 301,1,3,"Kelly, Miss. Anna Katherine ""Annie Kate""",female,,0,0,9234,7.75,,Q 303 | 302,1,3,"McCoy, Mr. Bernard",male,,2,0,367226,23.25,,Q 304 | 303,0,3,"Johnson, Mr. William Cahoone Jr",male,19,0,0,LINE,0,,S 305 | 304,1,2,"Keane, Miss. Nora A",female,,0,0,226593,12.35,E101,Q 306 | 305,0,3,"Williams, Mr. Howard Hugh ""Harry""",male,,0,0,A/5 2466,8.05,,S 307 | 306,1,1,"Allison, Master. Hudson Trevor",male,0.92,1,2,113781,151.55,C22 C26,S 308 | 307,1,1,"Fleming, Miss. Margaret",female,,0,0,17421,110.8833,,C 309 | 308,1,1,"Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo)",female,17,1,0,PC 17758,108.9,C65,C 310 | 309,0,2,"Abelson, Mr. Samuel",male,30,1,0,P/PP 3381,24,,C 311 | 310,1,1,"Francatelli, Miss. Laura Mabel",female,30,0,0,PC 17485,56.9292,E36,C 312 | 311,1,1,"Hays, Miss. Margaret Bechstein",female,24,0,0,11767,83.1583,C54,C 313 | 312,1,1,"Ryerson, Miss. Emily Borie",female,18,2,2,PC 17608,262.375,B57 B59 B63 B66,C 314 | 313,0,2,"Lahtinen, Mrs. William (Anna Sylfven)",female,26,1,1,250651,26,,S 315 | 314,0,3,"Hendekovic, Mr. Ignjac",male,28,0,0,349243,7.8958,,S 316 | 315,0,2,"Hart, Mr. Benjamin",male,43,1,1,F.C.C. 13529,26.25,,S 317 | 316,1,3,"Nilsson, Miss. Helmina Josefina",female,26,0,0,347470,7.8542,,S 318 | 317,1,2,"Kantor, Mrs. Sinai (Miriam Sternin)",female,24,1,0,244367,26,,S 319 | 318,0,2,"Moraweck, Dr. Ernest",male,54,0,0,29011,14,,S 320 | 319,1,1,"Wick, Miss. Mary Natalie",female,31,0,2,36928,164.8667,C7,S 321 | 320,1,1,"Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone)",female,40,1,1,16966,134.5,E34,C 322 | 321,0,3,"Dennis, Mr. Samuel",male,22,0,0,A/5 21172,7.25,,S 323 | 322,0,3,"Danoff, Mr. Yoto",male,27,0,0,349219,7.8958,,S 324 | 323,1,2,"Slayter, Miss. Hilda Mary",female,30,0,0,234818,12.35,,Q 325 | 324,1,2,"Caldwell, Mrs. Albert Francis (Sylvia Mae Harbaugh)",female,22,1,1,248738,29,,S 326 | 325,0,3,"Sage, Mr. George John Jr",male,,8,2,CA. 2343,69.55,,S 327 | 326,1,1,"Young, Miss. Marie Grice",female,36,0,0,PC 17760,135.6333,C32,C 328 | 327,0,3,"Nysveen, Mr. Johan Hansen",male,61,0,0,345364,6.2375,,S 329 | 328,1,2,"Ball, Mrs. (Ada E Hall)",female,36,0,0,28551,13,D,S 330 | 329,1,3,"Goldsmith, Mrs. Frank John (Emily Alice Brown)",female,31,1,1,363291,20.525,,S 331 | 330,1,1,"Hippach, Miss. Jean Gertrude",female,16,0,1,111361,57.9792,B18,C 332 | 331,1,3,"McCoy, Miss. Agnes",female,,2,0,367226,23.25,,Q 333 | 332,0,1,"Partner, Mr. Austen",male,45.5,0,0,113043,28.5,C124,S 334 | 333,0,1,"Graham, Mr. George Edward",male,38,0,1,PC 17582,153.4625,C91,S 335 | 334,0,3,"Vander Planke, Mr. Leo Edmondus",male,16,2,0,345764,18,,S 336 | 335,1,1,"Frauenthal, Mrs. Henry William (Clara Heinsheimer)",female,,1,0,PC 17611,133.65,,S 337 | 336,0,3,"Denkoff, Mr. Mitto",male,,0,0,349225,7.8958,,S 338 | 337,0,1,"Pears, Mr. Thomas Clinton",male,29,1,0,113776,66.6,C2,S 339 | 338,1,1,"Burns, Miss. Elizabeth Margaret",female,41,0,0,16966,134.5,E40,C 340 | 339,1,3,"Dahl, Mr. Karl Edwart",male,45,0,0,7598,8.05,,S 341 | 340,0,1,"Blackwell, Mr. Stephen Weart",male,45,0,0,113784,35.5,T,S 342 | 341,1,2,"Navratil, Master. Edmond Roger",male,2,1,1,230080,26,F2,S 343 | 342,1,1,"Fortune, Miss. Alice Elizabeth",female,24,3,2,19950,263,C23 C25 C27,S 344 | 343,0,2,"Collander, Mr. Erik Gustaf",male,28,0,0,248740,13,,S 345 | 344,0,2,"Sedgwick, Mr. Charles Frederick Waddington",male,25,0,0,244361,13,,S 346 | 345,0,2,"Fox, Mr. Stanley Hubert",male,36,0,0,229236,13,,S 347 | 346,1,2,"Brown, Miss. Amelia ""Mildred""",female,24,0,0,248733,13,F33,S 348 | 347,1,2,"Smith, Miss. Marion Elsie",female,40,0,0,31418,13,,S 349 | 348,1,3,"Davison, Mrs. Thomas Henry (Mary E Finck)",female,,1,0,386525,16.1,,S 350 | 349,1,3,"Coutts, Master. William Loch ""William""",male,3,1,1,C.A. 37671,15.9,,S 351 | 350,0,3,"Dimic, Mr. Jovan",male,42,0,0,315088,8.6625,,S 352 | 351,0,3,"Odahl, Mr. Nils Martin",male,23,0,0,7267,9.225,,S 353 | 352,0,1,"Williams-Lambert, Mr. Fletcher Fellows",male,,0,0,113510,35,C128,S 354 | 353,0,3,"Elias, Mr. Tannous",male,15,1,1,2695,7.2292,,C 355 | 354,0,3,"Arnold-Franchi, Mr. Josef",male,25,1,0,349237,17.8,,S 356 | 355,0,3,"Yousif, Mr. Wazli",male,,0,0,2647,7.225,,C 357 | 356,0,3,"Vanden Steen, Mr. Leo Peter",male,28,0,0,345783,9.5,,S 358 | 357,1,1,"Bowerman, Miss. Elsie Edith",female,22,0,1,113505,55,E33,S 359 | 358,0,2,"Funk, Miss. Annie Clemmer",female,38,0,0,237671,13,,S 360 | 359,1,3,"McGovern, Miss. Mary",female,,0,0,330931,7.8792,,Q 361 | 360,1,3,"Mockler, Miss. Helen Mary ""Ellie""",female,,0,0,330980,7.8792,,Q 362 | 361,0,3,"Skoog, Mr. Wilhelm",male,40,1,4,347088,27.9,,S 363 | 362,0,2,"del Carlo, Mr. Sebastiano",male,29,1,0,SC/PARIS 2167,27.7208,,C 364 | 363,0,3,"Barbara, Mrs. (Catherine David)",female,45,0,1,2691,14.4542,,C 365 | 364,0,3,"Asim, Mr. Adola",male,35,0,0,SOTON/O.Q. 3101310,7.05,,S 366 | 365,0,3,"O'Brien, Mr. Thomas",male,,1,0,370365,15.5,,Q 367 | 366,0,3,"Adahl, Mr. Mauritz Nils Martin",male,30,0,0,C 7076,7.25,,S 368 | 367,1,1,"Warren, Mrs. Frank Manley (Anna Sophia Atkinson)",female,60,1,0,110813,75.25,D37,C 369 | 368,1,3,"Moussa, Mrs. (Mantoura Boulos)",female,,0,0,2626,7.2292,,C 370 | 369,1,3,"Jermyn, Miss. Annie",female,,0,0,14313,7.75,,Q 371 | 370,1,1,"Aubart, Mme. Leontine Pauline",female,24,0,0,PC 17477,69.3,B35,C 372 | 371,1,1,"Harder, Mr. George Achilles",male,25,1,0,11765,55.4417,E50,C 373 | 372,0,3,"Wiklund, Mr. Jakob Alfred",male,18,1,0,3101267,6.4958,,S 374 | 373,0,3,"Beavan, Mr. William Thomas",male,19,0,0,323951,8.05,,S 375 | 374,0,1,"Ringhini, Mr. Sante",male,22,0,0,PC 17760,135.6333,,C 376 | 375,0,3,"Palsson, Miss. Stina Viola",female,3,3,1,349909,21.075,,S 377 | 376,1,1,"Meyer, Mrs. Edgar Joseph (Leila Saks)",female,,1,0,PC 17604,82.1708,,C 378 | 377,1,3,"Landergren, Miss. Aurora Adelia",female,22,0,0,C 7077,7.25,,S 379 | 378,0,1,"Widener, Mr. Harry Elkins",male,27,0,2,113503,211.5,C82,C 380 | 379,0,3,"Betros, Mr. Tannous",male,20,0,0,2648,4.0125,,C 381 | 380,0,3,"Gustafsson, Mr. Karl Gideon",male,19,0,0,347069,7.775,,S 382 | 381,1,1,"Bidois, Miss. Rosalie",female,42,0,0,PC 17757,227.525,,C 383 | 382,1,3,"Nakid, Miss. Maria (""Mary"")",female,1,0,2,2653,15.7417,,C 384 | 383,0,3,"Tikkanen, Mr. Juho",male,32,0,0,STON/O 2. 3101293,7.925,,S 385 | 384,1,1,"Holverson, Mrs. Alexander Oskar (Mary Aline Towner)",female,35,1,0,113789,52,,S 386 | 385,0,3,"Plotcharsky, Mr. Vasil",male,,0,0,349227,7.8958,,S 387 | 386,0,2,"Davies, Mr. Charles Henry",male,18,0,0,S.O.C. 14879,73.5,,S 388 | 387,0,3,"Goodwin, Master. Sidney Leonard",male,1,5,2,CA 2144,46.9,,S 389 | 388,1,2,"Buss, Miss. Kate",female,36,0,0,27849,13,,S 390 | 389,0,3,"Sadlier, Mr. Matthew",male,,0,0,367655,7.7292,,Q 391 | 390,1,2,"Lehmann, Miss. Bertha",female,17,0,0,SC 1748,12,,C 392 | 391,1,1,"Carter, Mr. William Ernest",male,36,1,2,113760,120,B96 B98,S 393 | 392,1,3,"Jansson, Mr. Carl Olof",male,21,0,0,350034,7.7958,,S 394 | 393,0,3,"Gustafsson, Mr. Johan Birger",male,28,2,0,3101277,7.925,,S 395 | 394,1,1,"Newell, Miss. Marjorie",female,23,1,0,35273,113.275,D36,C 396 | 395,1,3,"Sandstrom, Mrs. Hjalmar (Agnes Charlotta Bengtsson)",female,24,0,2,PP 9549,16.7,G6,S 397 | 396,0,3,"Johansson, Mr. Erik",male,22,0,0,350052,7.7958,,S 398 | 397,0,3,"Olsson, Miss. Elina",female,31,0,0,350407,7.8542,,S 399 | 398,0,2,"McKane, Mr. Peter David",male,46,0,0,28403,26,,S 400 | 399,0,2,"Pain, Dr. Alfred",male,23,0,0,244278,10.5,,S 401 | 400,1,2,"Trout, Mrs. William H (Jessie L)",female,28,0,0,240929,12.65,,S 402 | 401,1,3,"Niskanen, Mr. Juha",male,39,0,0,STON/O 2. 3101289,7.925,,S 403 | 402,0,3,"Adams, Mr. John",male,26,0,0,341826,8.05,,S 404 | 403,0,3,"Jussila, Miss. Mari Aina",female,21,1,0,4137,9.825,,S 405 | 404,0,3,"Hakkarainen, Mr. Pekka Pietari",male,28,1,0,STON/O2. 3101279,15.85,,S 406 | 405,0,3,"Oreskovic, Miss. Marija",female,20,0,0,315096,8.6625,,S 407 | 406,0,2,"Gale, Mr. Shadrach",male,34,1,0,28664,21,,S 408 | 407,0,3,"Widegren, Mr. Carl/Charles Peter",male,51,0,0,347064,7.75,,S 409 | 408,1,2,"Richards, Master. William Rowe",male,3,1,1,29106,18.75,,S 410 | 409,0,3,"Birkeland, Mr. Hans Martin Monsen",male,21,0,0,312992,7.775,,S 411 | 410,0,3,"Lefebre, Miss. Ida",female,,3,1,4133,25.4667,,S 412 | 411,0,3,"Sdycoff, Mr. Todor",male,,0,0,349222,7.8958,,S 413 | 412,0,3,"Hart, Mr. Henry",male,,0,0,394140,6.8583,,Q 414 | 413,1,1,"Minahan, Miss. Daisy E",female,33,1,0,19928,90,C78,Q 415 | 414,0,2,"Cunningham, Mr. Alfred Fleming",male,,0,0,239853,0,,S 416 | 415,1,3,"Sundman, Mr. Johan Julian",male,44,0,0,STON/O 2. 3101269,7.925,,S 417 | 416,0,3,"Meek, Mrs. Thomas (Annie Louise Rowley)",female,,0,0,343095,8.05,,S 418 | 417,1,2,"Drew, Mrs. James Vivian (Lulu Thorne Christian)",female,34,1,1,28220,32.5,,S 419 | 418,1,2,"Silven, Miss. Lyyli Karoliina",female,18,0,2,250652,13,,S 420 | 419,0,2,"Matthews, Mr. William John",male,30,0,0,28228,13,,S 421 | 420,0,3,"Van Impe, Miss. Catharina",female,10,0,2,345773,24.15,,S 422 | 421,0,3,"Gheorgheff, Mr. Stanio",male,,0,0,349254,7.8958,,C 423 | 422,0,3,"Charters, Mr. David",male,21,0,0,A/5. 13032,7.7333,,Q 424 | 423,0,3,"Zimmerman, Mr. Leo",male,29,0,0,315082,7.875,,S 425 | 424,0,3,"Danbom, Mrs. Ernst Gilbert (Anna Sigrid Maria Brogren)",female,28,1,1,347080,14.4,,S 426 | 425,0,3,"Rosblom, Mr. Viktor Richard",male,18,1,1,370129,20.2125,,S 427 | 426,0,3,"Wiseman, Mr. Phillippe",male,,0,0,A/4. 34244,7.25,,S 428 | 427,1,2,"Clarke, Mrs. Charles V (Ada Maria Winfield)",female,28,1,0,2003,26,,S 429 | 428,1,2,"Phillips, Miss. Kate Florence (""Mrs Kate Louise Phillips Marshall"")",female,19,0,0,250655,26,,S 430 | 429,0,3,"Flynn, Mr. James",male,,0,0,364851,7.75,,Q 431 | 430,1,3,"Pickard, Mr. Berk (Berk Trembisky)",male,32,0,0,SOTON/O.Q. 392078,8.05,E10,S 432 | 431,1,1,"Bjornstrom-Steffansson, Mr. Mauritz Hakan",male,28,0,0,110564,26.55,C52,S 433 | 432,1,3,"Thorneycroft, Mrs. Percival (Florence Kate White)",female,,1,0,376564,16.1,,S 434 | 433,1,2,"Louch, Mrs. Charles Alexander (Alice Adelaide Slow)",female,42,1,0,SC/AH 3085,26,,S 435 | 434,0,3,"Kallio, Mr. Nikolai Erland",male,17,0,0,STON/O 2. 3101274,7.125,,S 436 | 435,0,1,"Silvey, Mr. William Baird",male,50,1,0,13507,55.9,E44,S 437 | 436,1,1,"Carter, Miss. Lucile Polk",female,14,1,2,113760,120,B96 B98,S 438 | 437,0,3,"Ford, Miss. Doolina Margaret ""Daisy""",female,21,2,2,W./C. 6608,34.375,,S 439 | 438,1,2,"Richards, Mrs. Sidney (Emily Hocking)",female,24,2,3,29106,18.75,,S 440 | 439,0,1,"Fortune, Mr. Mark",male,64,1,4,19950,263,C23 C25 C27,S 441 | 440,0,2,"Kvillner, Mr. Johan Henrik Johannesson",male,31,0,0,C.A. 18723,10.5,,S 442 | 441,1,2,"Hart, Mrs. Benjamin (Esther Ada Bloomfield)",female,45,1,1,F.C.C. 13529,26.25,,S 443 | 442,0,3,"Hampe, Mr. Leon",male,20,0,0,345769,9.5,,S 444 | 443,0,3,"Petterson, Mr. Johan Emil",male,25,1,0,347076,7.775,,S 445 | 444,1,2,"Reynaldo, Ms. Encarnacion",female,28,0,0,230434,13,,S 446 | 445,1,3,"Johannesen-Bratthammer, Mr. Bernt",male,,0,0,65306,8.1125,,S 447 | 446,1,1,"Dodge, Master. Washington",male,4,0,2,33638,81.8583,A34,S 448 | 447,1,2,"Mellinger, Miss. Madeleine Violet",female,13,0,1,250644,19.5,,S 449 | 448,1,1,"Seward, Mr. Frederic Kimber",male,34,0,0,113794,26.55,,S 450 | 449,1,3,"Baclini, Miss. Marie Catherine",female,5,2,1,2666,19.2583,,C 451 | 450,1,1,"Peuchen, Major. Arthur Godfrey",male,52,0,0,113786,30.5,C104,S 452 | 451,0,2,"West, Mr. Edwy Arthur",male,36,1,2,C.A. 34651,27.75,,S 453 | 452,0,3,"Hagland, Mr. Ingvald Olai Olsen",male,,1,0,65303,19.9667,,S 454 | 453,0,1,"Foreman, Mr. Benjamin Laventall",male,30,0,0,113051,27.75,C111,C 455 | 454,1,1,"Goldenberg, Mr. Samuel L",male,49,1,0,17453,89.1042,C92,C 456 | 455,0,3,"Peduzzi, Mr. Joseph",male,,0,0,A/5 2817,8.05,,S 457 | 456,1,3,"Jalsevac, Mr. Ivan",male,29,0,0,349240,7.8958,,C 458 | 457,0,1,"Millet, Mr. Francis Davis",male,65,0,0,13509,26.55,E38,S 459 | 458,1,1,"Kenyon, Mrs. Frederick R (Marion)",female,,1,0,17464,51.8625,D21,S 460 | 459,1,2,"Toomey, Miss. Ellen",female,50,0,0,F.C.C. 13531,10.5,,S 461 | 460,0,3,"O'Connor, Mr. Maurice",male,,0,0,371060,7.75,,Q 462 | 461,1,1,"Anderson, Mr. Harry",male,48,0,0,19952,26.55,E12,S 463 | 462,0,3,"Morley, Mr. William",male,34,0,0,364506,8.05,,S 464 | 463,0,1,"Gee, Mr. Arthur H",male,47,0,0,111320,38.5,E63,S 465 | 464,0,2,"Milling, Mr. Jacob Christian",male,48,0,0,234360,13,,S 466 | 465,0,3,"Maisner, Mr. Simon",male,,0,0,A/S 2816,8.05,,S 467 | 466,0,3,"Goncalves, Mr. Manuel Estanslas",male,38,0,0,SOTON/O.Q. 3101306,7.05,,S 468 | 467,0,2,"Campbell, Mr. William",male,,0,0,239853,0,,S 469 | 468,0,1,"Smart, Mr. John Montgomery",male,56,0,0,113792,26.55,,S 470 | 469,0,3,"Scanlan, Mr. James",male,,0,0,36209,7.725,,Q 471 | 470,1,3,"Baclini, Miss. Helene Barbara",female,0.75,2,1,2666,19.2583,,C 472 | 471,0,3,"Keefe, Mr. Arthur",male,,0,0,323592,7.25,,S 473 | 472,0,3,"Cacic, Mr. Luka",male,38,0,0,315089,8.6625,,S 474 | 473,1,2,"West, Mrs. Edwy Arthur (Ada Mary Worth)",female,33,1,2,C.A. 34651,27.75,,S 475 | 474,1,2,"Jerwan, Mrs. Amin S (Marie Marthe Thuillard)",female,23,0,0,SC/AH Basle 541,13.7917,D,C 476 | 475,0,3,"Strandberg, Miss. Ida Sofia",female,22,0,0,7553,9.8375,,S 477 | 476,0,1,"Clifford, Mr. George Quincy",male,,0,0,110465,52,A14,S 478 | 477,0,2,"Renouf, Mr. Peter Henry",male,34,1,0,31027,21,,S 479 | 478,0,3,"Braund, Mr. Lewis Richard",male,29,1,0,3460,7.0458,,S 480 | 479,0,3,"Karlsson, Mr. Nils August",male,22,0,0,350060,7.5208,,S 481 | 480,1,3,"Hirvonen, Miss. Hildur E",female,2,0,1,3101298,12.2875,,S 482 | 481,0,3,"Goodwin, Master. Harold Victor",male,9,5,2,CA 2144,46.9,,S 483 | 482,0,2,"Frost, Mr. Anthony Wood ""Archie""",male,,0,0,239854,0,,S 484 | 483,0,3,"Rouse, Mr. Richard Henry",male,50,0,0,A/5 3594,8.05,,S 485 | 484,1,3,"Turkula, Mrs. (Hedwig)",female,63,0,0,4134,9.5875,,S 486 | 485,1,1,"Bishop, Mr. Dickinson H",male,25,1,0,11967,91.0792,B49,C 487 | 486,0,3,"Lefebre, Miss. Jeannie",female,,3,1,4133,25.4667,,S 488 | 487,1,1,"Hoyt, Mrs. Frederick Maxfield (Jane Anne Forby)",female,35,1,0,19943,90,C93,S 489 | 488,0,1,"Kent, Mr. Edward Austin",male,58,0,0,11771,29.7,B37,C 490 | 489,0,3,"Somerton, Mr. Francis William",male,30,0,0,A.5. 18509,8.05,,S 491 | 490,1,3,"Coutts, Master. Eden Leslie ""Neville""",male,9,1,1,C.A. 37671,15.9,,S 492 | 491,0,3,"Hagland, Mr. Konrad Mathias Reiersen",male,,1,0,65304,19.9667,,S 493 | 492,0,3,"Windelov, Mr. Einar",male,21,0,0,SOTON/OQ 3101317,7.25,,S 494 | 493,0,1,"Molson, Mr. Harry Markland",male,55,0,0,113787,30.5,C30,S 495 | 494,0,1,"Artagaveytia, Mr. Ramon",male,71,0,0,PC 17609,49.5042,,C 496 | 495,0,3,"Stanley, Mr. Edward Roland",male,21,0,0,A/4 45380,8.05,,S 497 | 496,0,3,"Yousseff, Mr. Gerious",male,,0,0,2627,14.4583,,C 498 | 497,1,1,"Eustis, Miss. Elizabeth Mussey",female,54,1,0,36947,78.2667,D20,C 499 | 498,0,3,"Shellard, Mr. Frederick William",male,,0,0,C.A. 6212,15.1,,S 500 | 499,0,1,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25,1,2,113781,151.55,C22 C26,S 501 | 500,0,3,"Svensson, Mr. Olof",male,24,0,0,350035,7.7958,,S 502 | 501,0,3,"Calic, Mr. Petar",male,17,0,0,315086,8.6625,,S 503 | 502,0,3,"Canavan, Miss. Mary",female,21,0,0,364846,7.75,,Q 504 | 503,0,3,"O'Sullivan, Miss. Bridget Mary",female,,0,0,330909,7.6292,,Q 505 | 504,0,3,"Laitinen, Miss. Kristina Sofia",female,37,0,0,4135,9.5875,,S 506 | 505,1,1,"Maioni, Miss. Roberta",female,16,0,0,110152,86.5,B79,S 507 | 506,0,1,"Penasco y Castellana, Mr. Victor de Satode",male,18,1,0,PC 17758,108.9,C65,C 508 | 507,1,2,"Quick, Mrs. Frederick Charles (Jane Richards)",female,33,0,2,26360,26,,S 509 | 508,1,1,"Bradley, Mr. George (""George Arthur Brayton"")",male,,0,0,111427,26.55,,S 510 | 509,0,3,"Olsen, Mr. Henry Margido",male,28,0,0,C 4001,22.525,,S 511 | 510,1,3,"Lang, Mr. Fang",male,26,0,0,1601,56.4958,,S 512 | 511,1,3,"Daly, Mr. Eugene Patrick",male,29,0,0,382651,7.75,,Q 513 | 512,0,3,"Webber, Mr. James",male,,0,0,SOTON/OQ 3101316,8.05,,S 514 | 513,1,1,"McGough, Mr. James Robert",male,36,0,0,PC 17473,26.2875,E25,S 515 | 514,1,1,"Rothschild, Mrs. Martin (Elizabeth L. Barrett)",female,54,1,0,PC 17603,59.4,,C 516 | 515,0,3,"Coleff, Mr. Satio",male,24,0,0,349209,7.4958,,S 517 | 516,0,1,"Walker, Mr. William Anderson",male,47,0,0,36967,34.0208,D46,S 518 | 517,1,2,"Lemore, Mrs. (Amelia Milley)",female,34,0,0,C.A. 34260,10.5,F33,S 519 | 518,0,3,"Ryan, Mr. Patrick",male,,0,0,371110,24.15,,Q 520 | 519,1,2,"Angle, Mrs. William A (Florence ""Mary"" Agnes Hughes)",female,36,1,0,226875,26,,S 521 | 520,0,3,"Pavlovic, Mr. Stefo",male,32,0,0,349242,7.8958,,S 522 | 521,1,1,"Perreault, Miss. Anne",female,30,0,0,12749,93.5,B73,S 523 | 522,0,3,"Vovk, Mr. Janko",male,22,0,0,349252,7.8958,,S 524 | 523,0,3,"Lahoud, Mr. Sarkis",male,,0,0,2624,7.225,,C 525 | 524,1,1,"Hippach, Mrs. Louis Albert (Ida Sophia Fischer)",female,44,0,1,111361,57.9792,B18,C 526 | 525,0,3,"Kassem, Mr. Fared",male,,0,0,2700,7.2292,,C 527 | 526,0,3,"Farrell, Mr. James",male,40.5,0,0,367232,7.75,,Q 528 | 527,1,2,"Ridsdale, Miss. Lucy",female,50,0,0,W./C. 14258,10.5,,S 529 | 528,0,1,"Farthing, Mr. John",male,,0,0,PC 17483,221.7792,C95,S 530 | 529,0,3,"Salonen, Mr. Johan Werner",male,39,0,0,3101296,7.925,,S 531 | 530,0,2,"Hocking, Mr. Richard George",male,23,2,1,29104,11.5,,S 532 | 531,1,2,"Quick, Miss. Phyllis May",female,2,1,1,26360,26,,S 533 | 532,0,3,"Toufik, Mr. Nakli",male,,0,0,2641,7.2292,,C 534 | 533,0,3,"Elias, Mr. Joseph Jr",male,17,1,1,2690,7.2292,,C 535 | 534,1,3,"Peter, Mrs. Catherine (Catherine Rizk)",female,,0,2,2668,22.3583,,C 536 | 535,0,3,"Cacic, Miss. Marija",female,30,0,0,315084,8.6625,,S 537 | 536,1,2,"Hart, Miss. Eva Miriam",female,7,0,2,F.C.C. 13529,26.25,,S 538 | 537,0,1,"Butt, Major. Archibald Willingham",male,45,0,0,113050,26.55,B38,S 539 | 538,1,1,"LeRoy, Miss. Bertha",female,30,0,0,PC 17761,106.425,,C 540 | 539,0,3,"Risien, Mr. Samuel Beard",male,,0,0,364498,14.5,,S 541 | 540,1,1,"Frolicher, Miss. Hedwig Margaritha",female,22,0,2,13568,49.5,B39,C 542 | 541,1,1,"Crosby, Miss. Harriet R",female,36,0,2,WE/P 5735,71,B22,S 543 | 542,0,3,"Andersson, Miss. Ingeborg Constanzia",female,9,4,2,347082,31.275,,S 544 | 543,0,3,"Andersson, Miss. Sigrid Elisabeth",female,11,4,2,347082,31.275,,S 545 | 544,1,2,"Beane, Mr. Edward",male,32,1,0,2908,26,,S 546 | 545,0,1,"Douglas, Mr. Walter Donald",male,50,1,0,PC 17761,106.425,C86,C 547 | 546,0,1,"Nicholson, Mr. Arthur Ernest",male,64,0,0,693,26,,S 548 | 547,1,2,"Beane, Mrs. Edward (Ethel Clarke)",female,19,1,0,2908,26,,S 549 | 548,1,2,"Padro y Manent, Mr. Julian",male,,0,0,SC/PARIS 2146,13.8625,,C 550 | 549,0,3,"Goldsmith, Mr. Frank John",male,33,1,1,363291,20.525,,S 551 | 550,1,2,"Davies, Master. John Morgan Jr",male,8,1,1,C.A. 33112,36.75,,S 552 | 551,1,1,"Thayer, Mr. John Borland Jr",male,17,0,2,17421,110.8833,C70,C 553 | 552,0,2,"Sharp, Mr. Percival James R",male,27,0,0,244358,26,,S 554 | 553,0,3,"O'Brien, Mr. Timothy",male,,0,0,330979,7.8292,,Q 555 | 554,1,3,"Leeni, Mr. Fahim (""Philip Zenni"")",male,22,0,0,2620,7.225,,C 556 | 555,1,3,"Ohman, Miss. Velin",female,22,0,0,347085,7.775,,S 557 | 556,0,1,"Wright, Mr. George",male,62,0,0,113807,26.55,,S 558 | 557,1,1,"Duff Gordon, Lady. (Lucille Christiana Sutherland) (""Mrs Morgan"")",female,48,1,0,11755,39.6,A16,C 559 | 558,0,1,"Robbins, Mr. Victor",male,,0,0,PC 17757,227.525,,C 560 | 559,1,1,"Taussig, Mrs. Emil (Tillie Mandelbaum)",female,39,1,1,110413,79.65,E67,S 561 | 560,1,3,"de Messemaeker, Mrs. Guillaume Joseph (Emma)",female,36,1,0,345572,17.4,,S 562 | 561,0,3,"Morrow, Mr. Thomas Rowan",male,,0,0,372622,7.75,,Q 563 | 562,0,3,"Sivic, Mr. Husein",male,40,0,0,349251,7.8958,,S 564 | 563,0,2,"Norman, Mr. Robert Douglas",male,28,0,0,218629,13.5,,S 565 | 564,0,3,"Simmons, Mr. John",male,,0,0,SOTON/OQ 392082,8.05,,S 566 | 565,0,3,"Meanwell, Miss. (Marion Ogden)",female,,0,0,SOTON/O.Q. 392087,8.05,,S 567 | 566,0,3,"Davies, Mr. Alfred J",male,24,2,0,A/4 48871,24.15,,S 568 | 567,0,3,"Stoytcheff, Mr. Ilia",male,19,0,0,349205,7.8958,,S 569 | 568,0,3,"Palsson, Mrs. Nils (Alma Cornelia Berglund)",female,29,0,4,349909,21.075,,S 570 | 569,0,3,"Doharr, Mr. Tannous",male,,0,0,2686,7.2292,,C 571 | 570,1,3,"Jonsson, Mr. Carl",male,32,0,0,350417,7.8542,,S 572 | 571,1,2,"Harris, Mr. George",male,62,0,0,S.W./PP 752,10.5,,S 573 | 572,1,1,"Appleton, Mrs. Edward Dale (Charlotte Lamson)",female,53,2,0,11769,51.4792,C101,S 574 | 573,1,1,"Flynn, Mr. John Irwin (""Irving"")",male,36,0,0,PC 17474,26.3875,E25,S 575 | 574,1,3,"Kelly, Miss. Mary",female,,0,0,14312,7.75,,Q 576 | 575,0,3,"Rush, Mr. Alfred George John",male,16,0,0,A/4. 20589,8.05,,S 577 | 576,0,3,"Patchett, Mr. George",male,19,0,0,358585,14.5,,S 578 | 577,1,2,"Garside, Miss. Ethel",female,34,0,0,243880,13,,S 579 | 578,1,1,"Silvey, Mrs. William Baird (Alice Munger)",female,39,1,0,13507,55.9,E44,S 580 | 579,0,3,"Caram, Mrs. Joseph (Maria Elias)",female,,1,0,2689,14.4583,,C 581 | 580,1,3,"Jussila, Mr. Eiriik",male,32,0,0,STON/O 2. 3101286,7.925,,S 582 | 581,1,2,"Christy, Miss. Julie Rachel",female,25,1,1,237789,30,,S 583 | 582,1,1,"Thayer, Mrs. John Borland (Marian Longstreth Morris)",female,39,1,1,17421,110.8833,C68,C 584 | 583,0,2,"Downton, Mr. William James",male,54,0,0,28403,26,,S 585 | 584,0,1,"Ross, Mr. John Hugo",male,36,0,0,13049,40.125,A10,C 586 | 585,0,3,"Paulner, Mr. Uscher",male,,0,0,3411,8.7125,,C 587 | 586,1,1,"Taussig, Miss. Ruth",female,18,0,2,110413,79.65,E68,S 588 | 587,0,2,"Jarvis, Mr. John Denzil",male,47,0,0,237565,15,,S 589 | 588,1,1,"Frolicher-Stehli, Mr. Maxmillian",male,60,1,1,13567,79.2,B41,C 590 | 589,0,3,"Gilinski, Mr. Eliezer",male,22,0,0,14973,8.05,,S 591 | 590,0,3,"Murdlin, Mr. Joseph",male,,0,0,A./5. 3235,8.05,,S 592 | 591,0,3,"Rintamaki, Mr. Matti",male,35,0,0,STON/O 2. 3101273,7.125,,S 593 | 592,1,1,"Stephenson, Mrs. Walter Bertram (Martha Eustis)",female,52,1,0,36947,78.2667,D20,C 594 | 593,0,3,"Elsbury, Mr. William James",male,47,0,0,A/5 3902,7.25,,S 595 | 594,0,3,"Bourke, Miss. Mary",female,,0,2,364848,7.75,,Q 596 | 595,0,2,"Chapman, Mr. John Henry",male,37,1,0,SC/AH 29037,26,,S 597 | 596,0,3,"Van Impe, Mr. Jean Baptiste",male,36,1,1,345773,24.15,,S 598 | 597,1,2,"Leitch, Miss. Jessie Wills",female,,0,0,248727,33,,S 599 | 598,0,3,"Johnson, Mr. Alfred",male,49,0,0,LINE,0,,S 600 | 599,0,3,"Boulos, Mr. Hanna",male,,0,0,2664,7.225,,C 601 | 600,1,1,"Duff Gordon, Sir. Cosmo Edmund (""Mr Morgan"")",male,49,1,0,PC 17485,56.9292,A20,C 602 | 601,1,2,"Jacobsohn, Mrs. Sidney Samuel (Amy Frances Christy)",female,24,2,1,243847,27,,S 603 | 602,0,3,"Slabenoff, Mr. Petco",male,,0,0,349214,7.8958,,S 604 | 603,0,1,"Harrington, Mr. Charles H",male,,0,0,113796,42.4,,S 605 | 604,0,3,"Torber, Mr. Ernst William",male,44,0,0,364511,8.05,,S 606 | 605,1,1,"Homer, Mr. Harry (""Mr E Haven"")",male,35,0,0,111426,26.55,,C 607 | 606,0,3,"Lindell, Mr. Edvard Bengtsson",male,36,1,0,349910,15.55,,S 608 | 607,0,3,"Karaic, Mr. Milan",male,30,0,0,349246,7.8958,,S 609 | 608,1,1,"Daniel, Mr. Robert Williams",male,27,0,0,113804,30.5,,S 610 | 609,1,2,"Laroche, Mrs. Joseph (Juliette Marie Louise Lafargue)",female,22,1,2,SC/Paris 2123,41.5792,,C 611 | 610,1,1,"Shutes, Miss. Elizabeth W",female,40,0,0,PC 17582,153.4625,C125,S 612 | 611,0,3,"Andersson, Mrs. Anders Johan (Alfrida Konstantia Brogren)",female,39,1,5,347082,31.275,,S 613 | 612,0,3,"Jardin, Mr. Jose Neto",male,,0,0,SOTON/O.Q. 3101305,7.05,,S 614 | 613,1,3,"Murphy, Miss. Margaret Jane",female,,1,0,367230,15.5,,Q 615 | 614,0,3,"Horgan, Mr. John",male,,0,0,370377,7.75,,Q 616 | 615,0,3,"Brocklebank, Mr. William Alfred",male,35,0,0,364512,8.05,,S 617 | 616,1,2,"Herman, Miss. Alice",female,24,1,2,220845,65,,S 618 | 617,0,3,"Danbom, Mr. Ernst Gilbert",male,34,1,1,347080,14.4,,S 619 | 618,0,3,"Lobb, Mrs. William Arthur (Cordelia K Stanlick)",female,26,1,0,A/5. 3336,16.1,,S 620 | 619,1,2,"Becker, Miss. Marion Louise",female,4,2,1,230136,39,F4,S 621 | 620,0,2,"Gavey, Mr. Lawrence",male,26,0,0,31028,10.5,,S 622 | 621,0,3,"Yasbeck, Mr. Antoni",male,27,1,0,2659,14.4542,,C 623 | 622,1,1,"Kimball, Mr. Edwin Nelson Jr",male,42,1,0,11753,52.5542,D19,S 624 | 623,1,3,"Nakid, Mr. Sahid",male,20,1,1,2653,15.7417,,C 625 | 624,0,3,"Hansen, Mr. Henry Damsgaard",male,21,0,0,350029,7.8542,,S 626 | 625,0,3,"Bowen, Mr. David John ""Dai""",male,21,0,0,54636,16.1,,S 627 | 626,0,1,"Sutton, Mr. Frederick",male,61,0,0,36963,32.3208,D50,S 628 | 627,0,2,"Kirkland, Rev. Charles Leonard",male,57,0,0,219533,12.35,,Q 629 | 628,1,1,"Longley, Miss. Gretchen Fiske",female,21,0,0,13502,77.9583,D9,S 630 | 629,0,3,"Bostandyeff, Mr. Guentcho",male,26,0,0,349224,7.8958,,S 631 | 630,0,3,"O'Connell, Mr. Patrick D",male,,0,0,334912,7.7333,,Q 632 | 631,1,1,"Barkworth, Mr. Algernon Henry Wilson",male,80,0,0,27042,30,A23,S 633 | 632,0,3,"Lundahl, Mr. Johan Svensson",male,51,0,0,347743,7.0542,,S 634 | 633,1,1,"Stahelin-Maeglin, Dr. Max",male,32,0,0,13214,30.5,B50,C 635 | 634,0,1,"Parr, Mr. William Henry Marsh",male,,0,0,112052,0,,S 636 | 635,0,3,"Skoog, Miss. Mabel",female,9,3,2,347088,27.9,,S 637 | 636,1,2,"Davis, Miss. Mary",female,28,0,0,237668,13,,S 638 | 637,0,3,"Leinonen, Mr. Antti Gustaf",male,32,0,0,STON/O 2. 3101292,7.925,,S 639 | 638,0,2,"Collyer, Mr. Harvey",male,31,1,1,C.A. 31921,26.25,,S 640 | 639,0,3,"Panula, Mrs. Juha (Maria Emilia Ojala)",female,41,0,5,3101295,39.6875,,S 641 | 640,0,3,"Thorneycroft, Mr. Percival",male,,1,0,376564,16.1,,S 642 | 641,0,3,"Jensen, Mr. Hans Peder",male,20,0,0,350050,7.8542,,S 643 | 642,1,1,"Sagesser, Mlle. Emma",female,24,0,0,PC 17477,69.3,B35,C 644 | 643,0,3,"Skoog, Miss. Margit Elizabeth",female,2,3,2,347088,27.9,,S 645 | 644,1,3,"Foo, Mr. Choong",male,,0,0,1601,56.4958,,S 646 | 645,1,3,"Baclini, Miss. Eugenie",female,0.75,2,1,2666,19.2583,,C 647 | 646,1,1,"Harper, Mr. Henry Sleeper",male,48,1,0,PC 17572,76.7292,D33,C 648 | 647,0,3,"Cor, Mr. Liudevit",male,19,0,0,349231,7.8958,,S 649 | 648,1,1,"Simonius-Blumer, Col. Oberst Alfons",male,56,0,0,13213,35.5,A26,C 650 | 649,0,3,"Willey, Mr. Edward",male,,0,0,S.O./P.P. 751,7.55,,S 651 | 650,1,3,"Stanley, Miss. Amy Zillah Elsie",female,23,0,0,CA. 2314,7.55,,S 652 | 651,0,3,"Mitkoff, Mr. Mito",male,,0,0,349221,7.8958,,S 653 | 652,1,2,"Doling, Miss. Elsie",female,18,0,1,231919,23,,S 654 | 653,0,3,"Kalvik, Mr. Johannes Halvorsen",male,21,0,0,8475,8.4333,,S 655 | 654,1,3,"O'Leary, Miss. Hanora ""Norah""",female,,0,0,330919,7.8292,,Q 656 | 655,0,3,"Hegarty, Miss. Hanora ""Nora""",female,18,0,0,365226,6.75,,Q 657 | 656,0,2,"Hickman, Mr. Leonard Mark",male,24,2,0,S.O.C. 14879,73.5,,S 658 | 657,0,3,"Radeff, Mr. Alexander",male,,0,0,349223,7.8958,,S 659 | 658,0,3,"Bourke, Mrs. John (Catherine)",female,32,1,1,364849,15.5,,Q 660 | 659,0,2,"Eitemiller, Mr. George Floyd",male,23,0,0,29751,13,,S 661 | 660,0,1,"Newell, Mr. Arthur Webster",male,58,0,2,35273,113.275,D48,C 662 | 661,1,1,"Frauenthal, Dr. Henry William",male,50,2,0,PC 17611,133.65,,S 663 | 662,0,3,"Badt, Mr. Mohamed",male,40,0,0,2623,7.225,,C 664 | 663,0,1,"Colley, Mr. Edward Pomeroy",male,47,0,0,5727,25.5875,E58,S 665 | 664,0,3,"Coleff, Mr. Peju",male,36,0,0,349210,7.4958,,S 666 | 665,1,3,"Lindqvist, Mr. Eino William",male,20,1,0,STON/O 2. 3101285,7.925,,S 667 | 666,0,2,"Hickman, Mr. Lewis",male,32,2,0,S.O.C. 14879,73.5,,S 668 | 667,0,2,"Butler, Mr. Reginald Fenton",male,25,0,0,234686,13,,S 669 | 668,0,3,"Rommetvedt, Mr. Knud Paust",male,,0,0,312993,7.775,,S 670 | 669,0,3,"Cook, Mr. Jacob",male,43,0,0,A/5 3536,8.05,,S 671 | 670,1,1,"Taylor, Mrs. Elmer Zebley (Juliet Cummins Wright)",female,,1,0,19996,52,C126,S 672 | 671,1,2,"Brown, Mrs. Thomas William Solomon (Elizabeth Catherine Ford)",female,40,1,1,29750,39,,S 673 | 672,0,1,"Davidson, Mr. Thornton",male,31,1,0,F.C. 12750,52,B71,S 674 | 673,0,2,"Mitchell, Mr. Henry Michael",male,70,0,0,C.A. 24580,10.5,,S 675 | 674,1,2,"Wilhelms, Mr. Charles",male,31,0,0,244270,13,,S 676 | 675,0,2,"Watson, Mr. Ennis Hastings",male,,0,0,239856,0,,S 677 | 676,0,3,"Edvardsson, Mr. Gustaf Hjalmar",male,18,0,0,349912,7.775,,S 678 | 677,0,3,"Sawyer, Mr. Frederick Charles",male,24.5,0,0,342826,8.05,,S 679 | 678,1,3,"Turja, Miss. Anna Sofia",female,18,0,0,4138,9.8417,,S 680 | 679,0,3,"Goodwin, Mrs. Frederick (Augusta Tyler)",female,43,1,6,CA 2144,46.9,,S 681 | 680,1,1,"Cardeza, Mr. Thomas Drake Martinez",male,36,0,1,PC 17755,512.3292,B51 B53 B55,C 682 | 681,0,3,"Peters, Miss. Katie",female,,0,0,330935,8.1375,,Q 683 | 682,1,1,"Hassab, Mr. Hammad",male,27,0,0,PC 17572,76.7292,D49,C 684 | 683,0,3,"Olsvigen, Mr. Thor Anderson",male,20,0,0,6563,9.225,,S 685 | 684,0,3,"Goodwin, Mr. Charles Edward",male,14,5,2,CA 2144,46.9,,S 686 | 685,0,2,"Brown, Mr. Thomas William Solomon",male,60,1,1,29750,39,,S 687 | 686,0,2,"Laroche, Mr. Joseph Philippe Lemercier",male,25,1,2,SC/Paris 2123,41.5792,,C 688 | 687,0,3,"Panula, Mr. Jaako Arnold",male,14,4,1,3101295,39.6875,,S 689 | 688,0,3,"Dakic, Mr. Branko",male,19,0,0,349228,10.1708,,S 690 | 689,0,3,"Fischer, Mr. Eberhard Thelander",male,18,0,0,350036,7.7958,,S 691 | 690,1,1,"Madill, Miss. Georgette Alexandra",female,15,0,1,24160,211.3375,B5,S 692 | 691,1,1,"Dick, Mr. Albert Adrian",male,31,1,0,17474,57,B20,S 693 | 692,1,3,"Karun, Miss. Manca",female,4,0,1,349256,13.4167,,C 694 | 693,1,3,"Lam, Mr. Ali",male,,0,0,1601,56.4958,,S 695 | 694,0,3,"Saad, Mr. Khalil",male,25,0,0,2672,7.225,,C 696 | 695,0,1,"Weir, Col. John",male,60,0,0,113800,26.55,,S 697 | 696,0,2,"Chapman, Mr. Charles Henry",male,52,0,0,248731,13.5,,S 698 | 697,0,3,"Kelly, Mr. James",male,44,0,0,363592,8.05,,S 699 | 698,1,3,"Mullens, Miss. Katherine ""Katie""",female,,0,0,35852,7.7333,,Q 700 | 699,0,1,"Thayer, Mr. John Borland",male,49,1,1,17421,110.8833,C68,C 701 | 700,0,3,"Humblen, Mr. Adolf Mathias Nicolai Olsen",male,42,0,0,348121,7.65,F G63,S 702 | 701,1,1,"Astor, Mrs. John Jacob (Madeleine Talmadge Force)",female,18,1,0,PC 17757,227.525,C62 C64,C 703 | 702,1,1,"Silverthorne, Mr. Spencer Victor",male,35,0,0,PC 17475,26.2875,E24,S 704 | 703,0,3,"Barbara, Miss. Saiide",female,18,0,1,2691,14.4542,,C 705 | 704,0,3,"Gallagher, Mr. Martin",male,25,0,0,36864,7.7417,,Q 706 | 705,0,3,"Hansen, Mr. Henrik Juul",male,26,1,0,350025,7.8542,,S 707 | 706,0,2,"Morley, Mr. Henry Samuel (""Mr Henry Marshall"")",male,39,0,0,250655,26,,S 708 | 707,1,2,"Kelly, Mrs. Florence ""Fannie""",female,45,0,0,223596,13.5,,S 709 | 708,1,1,"Calderhead, Mr. Edward Pennington",male,42,0,0,PC 17476,26.2875,E24,S 710 | 709,1,1,"Cleaver, Miss. Alice",female,22,0,0,113781,151.55,,S 711 | 710,1,3,"Moubarek, Master. Halim Gonios (""William George"")",male,,1,1,2661,15.2458,,C 712 | 711,1,1,"Mayne, Mlle. Berthe Antonine (""Mrs de Villiers"")",female,24,0,0,PC 17482,49.5042,C90,C 713 | 712,0,1,"Klaber, Mr. Herman",male,,0,0,113028,26.55,C124,S 714 | 713,1,1,"Taylor, Mr. Elmer Zebley",male,48,1,0,19996,52,C126,S 715 | 714,0,3,"Larsson, Mr. August Viktor",male,29,0,0,7545,9.4833,,S 716 | 715,0,2,"Greenberg, Mr. Samuel",male,52,0,0,250647,13,,S 717 | 716,0,3,"Soholt, Mr. Peter Andreas Lauritz Andersen",male,19,0,0,348124,7.65,F G73,S 718 | 717,1,1,"Endres, Miss. Caroline Louise",female,38,0,0,PC 17757,227.525,C45,C 719 | 718,1,2,"Troutt, Miss. Edwina Celia ""Winnie""",female,27,0,0,34218,10.5,E101,S 720 | 719,0,3,"McEvoy, Mr. Michael",male,,0,0,36568,15.5,,Q 721 | 720,0,3,"Johnson, Mr. Malkolm Joackim",male,33,0,0,347062,7.775,,S 722 | 721,1,2,"Harper, Miss. Annie Jessie ""Nina""",female,6,0,1,248727,33,,S 723 | 722,0,3,"Jensen, Mr. Svend Lauritz",male,17,1,0,350048,7.0542,,S 724 | 723,0,2,"Gillespie, Mr. William Henry",male,34,0,0,12233,13,,S 725 | 724,0,2,"Hodges, Mr. Henry Price",male,50,0,0,250643,13,,S 726 | 725,1,1,"Chambers, Mr. Norman Campbell",male,27,1,0,113806,53.1,E8,S 727 | 726,0,3,"Oreskovic, Mr. Luka",male,20,0,0,315094,8.6625,,S 728 | 727,1,2,"Renouf, Mrs. Peter Henry (Lillian Jefferys)",female,30,3,0,31027,21,,S 729 | 728,1,3,"Mannion, Miss. Margareth",female,,0,0,36866,7.7375,,Q 730 | 729,0,2,"Bryhl, Mr. Kurt Arnold Gottfrid",male,25,1,0,236853,26,,S 731 | 730,0,3,"Ilmakangas, Miss. Pieta Sofia",female,25,1,0,STON/O2. 3101271,7.925,,S 732 | 731,1,1,"Allen, Miss. Elisabeth Walton",female,29,0,0,24160,211.3375,B5,S 733 | 732,0,3,"Hassan, Mr. Houssein G N",male,11,0,0,2699,18.7875,,C 734 | 733,0,2,"Knight, Mr. Robert J",male,,0,0,239855,0,,S 735 | 734,0,2,"Berriman, Mr. William John",male,23,0,0,28425,13,,S 736 | 735,0,2,"Troupiansky, Mr. Moses Aaron",male,23,0,0,233639,13,,S 737 | 736,0,3,"Williams, Mr. Leslie",male,28.5,0,0,54636,16.1,,S 738 | 737,0,3,"Ford, Mrs. Edward (Margaret Ann Watson)",female,48,1,3,W./C. 6608,34.375,,S 739 | 738,1,1,"Lesurer, Mr. Gustave J",male,35,0,0,PC 17755,512.3292,B101,C 740 | 739,0,3,"Ivanoff, Mr. Kanio",male,,0,0,349201,7.8958,,S 741 | 740,0,3,"Nankoff, Mr. Minko",male,,0,0,349218,7.8958,,S 742 | 741,1,1,"Hawksford, Mr. Walter James",male,,0,0,16988,30,D45,S 743 | 742,0,1,"Cavendish, Mr. Tyrell William",male,36,1,0,19877,78.85,C46,S 744 | 743,1,1,"Ryerson, Miss. Susan Parker ""Suzette""",female,21,2,2,PC 17608,262.375,B57 B59 B63 B66,C 745 | 744,0,3,"McNamee, Mr. Neal",male,24,1,0,376566,16.1,,S 746 | 745,1,3,"Stranden, Mr. Juho",male,31,0,0,STON/O 2. 3101288,7.925,,S 747 | 746,0,1,"Crosby, Capt. Edward Gifford",male,70,1,1,WE/P 5735,71,B22,S 748 | 747,0,3,"Abbott, Mr. Rossmore Edward",male,16,1,1,C.A. 2673,20.25,,S 749 | 748,1,2,"Sinkkonen, Miss. Anna",female,30,0,0,250648,13,,S 750 | 749,0,1,"Marvin, Mr. Daniel Warner",male,19,1,0,113773,53.1,D30,S 751 | 750,0,3,"Connaghton, Mr. Michael",male,31,0,0,335097,7.75,,Q 752 | 751,1,2,"Wells, Miss. Joan",female,4,1,1,29103,23,,S 753 | 752,1,3,"Moor, Master. Meier",male,6,0,1,392096,12.475,E121,S 754 | 753,0,3,"Vande Velde, Mr. Johannes Joseph",male,33,0,0,345780,9.5,,S 755 | 754,0,3,"Jonkoff, Mr. Lalio",male,23,0,0,349204,7.8958,,S 756 | 755,1,2,"Herman, Mrs. Samuel (Jane Laver)",female,48,1,2,220845,65,,S 757 | 756,1,2,"Hamalainen, Master. Viljo",male,0.67,1,1,250649,14.5,,S 758 | 757,0,3,"Carlsson, Mr. August Sigfrid",male,28,0,0,350042,7.7958,,S 759 | 758,0,2,"Bailey, Mr. Percy Andrew",male,18,0,0,29108,11.5,,S 760 | 759,0,3,"Theobald, Mr. Thomas Leonard",male,34,0,0,363294,8.05,,S 761 | 760,1,1,"Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)",female,33,0,0,110152,86.5,B77,S 762 | 761,0,3,"Garfirth, Mr. John",male,,0,0,358585,14.5,,S 763 | 762,0,3,"Nirva, Mr. Iisakki Antino Aijo",male,41,0,0,SOTON/O2 3101272,7.125,,S 764 | 763,1,3,"Barah, Mr. Hanna Assi",male,20,0,0,2663,7.2292,,C 765 | 764,1,1,"Carter, Mrs. William Ernest (Lucile Polk)",female,36,1,2,113760,120,B96 B98,S 766 | 765,0,3,"Eklund, Mr. Hans Linus",male,16,0,0,347074,7.775,,S 767 | 766,1,1,"Hogeboom, Mrs. John C (Anna Andrews)",female,51,1,0,13502,77.9583,D11,S 768 | 767,0,1,"Brewe, Dr. Arthur Jackson",male,,0,0,112379,39.6,,C 769 | 768,0,3,"Mangan, Miss. Mary",female,30.5,0,0,364850,7.75,,Q 770 | 769,0,3,"Moran, Mr. Daniel J",male,,1,0,371110,24.15,,Q 771 | 770,0,3,"Gronnestad, Mr. Daniel Danielsen",male,32,0,0,8471,8.3625,,S 772 | 771,0,3,"Lievens, Mr. Rene Aime",male,24,0,0,345781,9.5,,S 773 | 772,0,3,"Jensen, Mr. Niels Peder",male,48,0,0,350047,7.8542,,S 774 | 773,0,2,"Mack, Mrs. (Mary)",female,57,0,0,S.O./P.P. 3,10.5,E77,S 775 | 774,0,3,"Elias, Mr. Dibo",male,,0,0,2674,7.225,,C 776 | 775,1,2,"Hocking, Mrs. Elizabeth (Eliza Needs)",female,54,1,3,29105,23,,S 777 | 776,0,3,"Myhrman, Mr. Pehr Fabian Oliver Malkolm",male,18,0,0,347078,7.75,,S 778 | 777,0,3,"Tobin, Mr. Roger",male,,0,0,383121,7.75,F38,Q 779 | 778,1,3,"Emanuel, Miss. Virginia Ethel",female,5,0,0,364516,12.475,,S 780 | 779,0,3,"Kilgannon, Mr. Thomas J",male,,0,0,36865,7.7375,,Q 781 | 780,1,1,"Robert, Mrs. Edward Scott (Elisabeth Walton McMillan)",female,43,0,1,24160,211.3375,B3,S 782 | 781,1,3,"Ayoub, Miss. Banoura",female,13,0,0,2687,7.2292,,C 783 | 782,1,1,"Dick, Mrs. Albert Adrian (Vera Gillespie)",female,17,1,0,17474,57,B20,S 784 | 783,0,1,"Long, Mr. Milton Clyde",male,29,0,0,113501,30,D6,S 785 | 784,0,3,"Johnston, Mr. Andrew G",male,,1,2,W./C. 6607,23.45,,S 786 | 785,0,3,"Ali, Mr. William",male,25,0,0,SOTON/O.Q. 3101312,7.05,,S 787 | 786,0,3,"Harmer, Mr. Abraham (David Lishin)",male,25,0,0,374887,7.25,,S 788 | 787,1,3,"Sjoblom, Miss. Anna Sofia",female,18,0,0,3101265,7.4958,,S 789 | 788,0,3,"Rice, Master. George Hugh",male,8,4,1,382652,29.125,,Q 790 | 789,1,3,"Dean, Master. Bertram Vere",male,1,1,2,C.A. 2315,20.575,,S 791 | 790,0,1,"Guggenheim, Mr. Benjamin",male,46,0,0,PC 17593,79.2,B82 B84,C 792 | 791,0,3,"Keane, Mr. Andrew ""Andy""",male,,0,0,12460,7.75,,Q 793 | 792,0,2,"Gaskell, Mr. Alfred",male,16,0,0,239865,26,,S 794 | 793,0,3,"Sage, Miss. Stella Anna",female,,8,2,CA. 2343,69.55,,S 795 | 794,0,1,"Hoyt, Mr. William Fisher",male,,0,0,PC 17600,30.6958,,C 796 | 795,0,3,"Dantcheff, Mr. Ristiu",male,25,0,0,349203,7.8958,,S 797 | 796,0,2,"Otter, Mr. Richard",male,39,0,0,28213,13,,S 798 | 797,1,1,"Leader, Dr. Alice (Farnham)",female,49,0,0,17465,25.9292,D17,S 799 | 798,1,3,"Osman, Mrs. Mara",female,31,0,0,349244,8.6833,,S 800 | 799,0,3,"Ibrahim Shawah, Mr. Yousseff",male,30,0,0,2685,7.2292,,C 801 | 800,0,3,"Van Impe, Mrs. Jean Baptiste (Rosalie Paula Govaert)",female,30,1,1,345773,24.15,,S 802 | 801,0,2,"Ponesell, Mr. Martin",male,34,0,0,250647,13,,S 803 | 802,1,2,"Collyer, Mrs. Harvey (Charlotte Annie Tate)",female,31,1,1,C.A. 31921,26.25,,S 804 | 803,1,1,"Carter, Master. William Thornton II",male,11,1,2,113760,120,B96 B98,S 805 | 804,1,3,"Thomas, Master. Assad Alexander",male,0.42,0,1,2625,8.5167,,C 806 | 805,1,3,"Hedman, Mr. Oskar Arvid",male,27,0,0,347089,6.975,,S 807 | 806,0,3,"Johansson, Mr. Karl Johan",male,31,0,0,347063,7.775,,S 808 | 807,0,1,"Andrews, Mr. Thomas Jr",male,39,0,0,112050,0,A36,S 809 | 808,0,3,"Pettersson, Miss. Ellen Natalia",female,18,0,0,347087,7.775,,S 810 | 809,0,2,"Meyer, Mr. August",male,39,0,0,248723,13,,S 811 | 810,1,1,"Chambers, Mrs. Norman Campbell (Bertha Griggs)",female,33,1,0,113806,53.1,E8,S 812 | 811,0,3,"Alexander, Mr. William",male,26,0,0,3474,7.8875,,S 813 | 812,0,3,"Lester, Mr. James",male,39,0,0,A/4 48871,24.15,,S 814 | 813,0,2,"Slemen, Mr. Richard James",male,35,0,0,28206,10.5,,S 815 | 814,0,3,"Andersson, Miss. Ebba Iris Alfrida",female,6,4,2,347082,31.275,,S 816 | 815,0,3,"Tomlin, Mr. Ernest Portage",male,30.5,0,0,364499,8.05,,S 817 | 816,0,1,"Fry, Mr. Richard",male,,0,0,112058,0,B102,S 818 | 817,0,3,"Heininen, Miss. Wendla Maria",female,23,0,0,STON/O2. 3101290,7.925,,S 819 | 818,0,2,"Mallet, Mr. Albert",male,31,1,1,S.C./PARIS 2079,37.0042,,C 820 | 819,0,3,"Holm, Mr. John Fredrik Alexander",male,43,0,0,C 7075,6.45,,S 821 | 820,0,3,"Skoog, Master. Karl Thorsten",male,10,3,2,347088,27.9,,S 822 | 821,1,1,"Hays, Mrs. Charles Melville (Clara Jennings Gregg)",female,52,1,1,12749,93.5,B69,S 823 | 822,1,3,"Lulic, Mr. Nikola",male,27,0,0,315098,8.6625,,S 824 | 823,0,1,"Reuchlin, Jonkheer. John George",male,38,0,0,19972,0,,S 825 | 824,1,3,"Moor, Mrs. (Beila)",female,27,0,1,392096,12.475,E121,S 826 | 825,0,3,"Panula, Master. Urho Abraham",male,2,4,1,3101295,39.6875,,S 827 | 826,0,3,"Flynn, Mr. John",male,,0,0,368323,6.95,,Q 828 | 827,0,3,"Lam, Mr. Len",male,,0,0,1601,56.4958,,S 829 | 828,1,2,"Mallet, Master. Andre",male,1,0,2,S.C./PARIS 2079,37.0042,,C 830 | 829,1,3,"McCormack, Mr. Thomas Joseph",male,,0,0,367228,7.75,,Q 831 | 830,1,1,"Stone, Mrs. George Nelson (Martha Evelyn)",female,62,0,0,113572,80,B28, 832 | 831,1,3,"Yasbeck, Mrs. Antoni (Selini Alexander)",female,15,1,0,2659,14.4542,,C 833 | 832,1,2,"Richards, Master. George Sibley",male,0.83,1,1,29106,18.75,,S 834 | 833,0,3,"Saad, Mr. Amin",male,,0,0,2671,7.2292,,C 835 | 834,0,3,"Augustsson, Mr. Albert",male,23,0,0,347468,7.8542,,S 836 | 835,0,3,"Allum, Mr. Owen George",male,18,0,0,2223,8.3,,S 837 | 836,1,1,"Compton, Miss. Sara Rebecca",female,39,1,1,PC 17756,83.1583,E49,C 838 | 837,0,3,"Pasic, Mr. Jakob",male,21,0,0,315097,8.6625,,S 839 | 838,0,3,"Sirota, Mr. Maurice",male,,0,0,392092,8.05,,S 840 | 839,1,3,"Chip, Mr. Chang",male,32,0,0,1601,56.4958,,S 841 | 840,1,1,"Marechal, Mr. Pierre",male,,0,0,11774,29.7,C47,C 842 | 841,0,3,"Alhomaki, Mr. Ilmari Rudolf",male,20,0,0,SOTON/O2 3101287,7.925,,S 843 | 842,0,2,"Mudd, Mr. Thomas Charles",male,16,0,0,S.O./P.P. 3,10.5,,S 844 | 843,1,1,"Serepeca, Miss. Augusta",female,30,0,0,113798,31,,C 845 | 844,0,3,"Lemberopolous, Mr. Peter L",male,34.5,0,0,2683,6.4375,,C 846 | 845,0,3,"Culumovic, Mr. Jeso",male,17,0,0,315090,8.6625,,S 847 | 846,0,3,"Abbing, Mr. Anthony",male,42,0,0,C.A. 5547,7.55,,S 848 | 847,0,3,"Sage, Mr. Douglas Bullen",male,,8,2,CA. 2343,69.55,,S 849 | 848,0,3,"Markoff, Mr. Marin",male,35,0,0,349213,7.8958,,C 850 | 849,0,2,"Harper, Rev. John",male,28,0,1,248727,33,,S 851 | 850,1,1,"Goldenberg, Mrs. Samuel L (Edwiga Grabowska)",female,,1,0,17453,89.1042,C92,C 852 | 851,0,3,"Andersson, Master. Sigvard Harald Elias",male,4,4,2,347082,31.275,,S 853 | 852,0,3,"Svensson, Mr. Johan",male,74,0,0,347060,7.775,,S 854 | 853,0,3,"Boulos, Miss. Nourelain",female,9,1,1,2678,15.2458,,C 855 | 854,1,1,"Lines, Miss. Mary Conover",female,16,0,1,PC 17592,39.4,D28,S 856 | 855,0,2,"Carter, Mrs. Ernest Courtenay (Lilian Hughes)",female,44,1,0,244252,26,,S 857 | 856,1,3,"Aks, Mrs. Sam (Leah Rosen)",female,18,0,1,392091,9.35,,S 858 | 857,1,1,"Wick, Mrs. George Dennick (Mary Hitchcock)",female,45,1,1,36928,164.8667,,S 859 | 858,1,1,"Daly, Mr. Peter Denis ",male,51,0,0,113055,26.55,E17,S 860 | 859,1,3,"Baclini, Mrs. Solomon (Latifa Qurban)",female,24,0,3,2666,19.2583,,C 861 | 860,0,3,"Razi, Mr. Raihed",male,,0,0,2629,7.2292,,C 862 | 861,0,3,"Hansen, Mr. Claus Peter",male,41,2,0,350026,14.1083,,S 863 | 862,0,2,"Giles, Mr. Frederick Edward",male,21,1,0,28134,11.5,,S 864 | 863,1,1,"Swift, Mrs. Frederick Joel (Margaret Welles Barron)",female,48,0,0,17466,25.9292,D17,S 865 | 864,0,3,"Sage, Miss. Dorothy Edith ""Dolly""",female,,8,2,CA. 2343,69.55,,S 866 | 865,0,2,"Gill, Mr. John William",male,24,0,0,233866,13,,S 867 | 866,1,2,"Bystrom, Mrs. (Karolina)",female,42,0,0,236852,13,,S 868 | 867,1,2,"Duran y More, Miss. Asuncion",female,27,1,0,SC/PARIS 2149,13.8583,,C 869 | 868,0,1,"Roebling, Mr. Washington Augustus II",male,31,0,0,PC 17590,50.4958,A24,S 870 | 869,0,3,"van Melkebeke, Mr. Philemon",male,,0,0,345777,9.5,,S 871 | 870,1,3,"Johnson, Master. Harold Theodor",male,4,1,1,347742,11.1333,,S 872 | 871,0,3,"Balkic, Mr. Cerin",male,26,0,0,349248,7.8958,,S 873 | 872,1,1,"Beckwith, Mrs. Richard Leonard (Sallie Monypeny)",female,47,1,1,11751,52.5542,D35,S 874 | 873,0,1,"Carlsson, Mr. Frans Olof",male,33,0,0,695,5,B51 B53 B55,S 875 | 874,0,3,"Vander Cruyssen, Mr. Victor",male,47,0,0,345765,9,,S 876 | 875,1,2,"Abelson, Mrs. Samuel (Hannah Wizosky)",female,28,1,0,P/PP 3381,24,,C 877 | 876,1,3,"Najib, Miss. Adele Kiamie ""Jane""",female,15,0,0,2667,7.225,,C 878 | 877,0,3,"Gustafsson, Mr. Alfred Ossian",male,20,0,0,7534,9.8458,,S 879 | 878,0,3,"Petroff, Mr. Nedelio",male,19,0,0,349212,7.8958,,S 880 | 879,0,3,"Laleff, Mr. Kristo",male,,0,0,349217,7.8958,,S 881 | 880,1,1,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",female,56,0,1,11767,83.1583,C50,C 882 | 881,1,2,"Shelley, Mrs. William (Imanita Parrish Hall)",female,25,0,1,230433,26,,S 883 | 882,0,3,"Markun, Mr. Johann",male,33,0,0,349257,7.8958,,S 884 | 883,0,3,"Dahlberg, Miss. Gerda Ulrika",female,22,0,0,7552,10.5167,,S 885 | 884,0,2,"Banfield, Mr. Frederick James",male,28,0,0,C.A./SOTON 34068,10.5,,S 886 | 885,0,3,"Sutehall, Mr. Henry Jr",male,25,0,0,SOTON/OQ 392076,7.05,,S 887 | 886,0,3,"Rice, Mrs. William (Margaret Norton)",female,39,0,5,382652,29.125,,Q 888 | 887,0,2,"Montvila, Rev. Juozas",male,27,0,0,211536,13,,S 889 | 888,1,1,"Graham, Miss. Margaret Edith",female,19,0,0,112053,30,B42,S 890 | 889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S 891 | 890,1,1,"Behr, Mr. Karl Howell",male,26,0,0,111369,30,C148,C 892 | 891,0,3,"Dooley, Mr. Patrick",male,32,0,0,370376,7.75,,Q 893 | -------------------------------------------------------------------------------- /data/words.txt: -------------------------------------------------------------------------------- 1 | hello world 2 | hello pyspark 3 | spark context 4 | i like spark 5 | hadoop rdd 6 | text file 7 | word count 8 | 9 | 10 | -------------------------------------------------------------------------------- /figures/20newsgroups.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vsmolyakov/pyspark/7106e747c0280f77403e157c42675a30f1509d10/figures/20newsgroups.png -------------------------------------------------------------------------------- /figures/als.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vsmolyakov/pyspark/7106e747c0280f77403e157c42675a30f1509d10/figures/als.png -------------------------------------------------------------------------------- /figures/lda_topics.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vsmolyakov/pyspark/7106e747c0280f77403e157c42675a30f1509d10/figures/lda_topics.png -------------------------------------------------------------------------------- /figures/mlp.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vsmolyakov/pyspark/7106e747c0280f77403e157c42675a30f1509d10/figures/mlp.png -------------------------------------------------------------------------------- /figures/random_forrest.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vsmolyakov/pyspark/7106e747c0280f77403e157c42675a30f1509d10/figures/random_forrest.png -------------------------------------------------------------------------------- /figures/spark.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vsmolyakov/pyspark/7106e747c0280f77403e157c42675a30f1509d10/figures/spark.png -------------------------------------------------------------------------------- /lda.py: -------------------------------------------------------------------------------- 1 | 2 | from pyspark import SparkContext 3 | from pyspark.sql import SQLContext 4 | from pyspark.sql import SparkSession 5 | from pyspark.sql import Row 6 | 7 | import re 8 | import numpy as np 9 | from time import time 10 | from sklearn.datasets import fetch_20newsgroups 11 | 12 | from pyspark.ml.feature import CountVectorizer, HashingTF, IDF 13 | from pyspark.ml.feature import Tokenizer, StopWordsRemover 14 | from pyspark.ml.clustering import LDA 15 | 16 | np.random.seed(0) 17 | 18 | if __name__ == "__main__": 19 | 20 | sc = SparkContext('local', 'lda') 21 | sqlContext = SQLContext(sc) 22 | 23 | spark = SparkSession\ 24 | .builder\ 25 | .appName("LDA")\ 26 | .getOrCreate() 27 | 28 | 29 | num_features = 8000 #vocabulary size 30 | num_topics = 20 #fixed for LDA 31 | 32 | print "loading 20 newsgroups dataset..." 33 | tic = time() 34 | dataset = fetch_20newsgroups(shuffle=True, random_state=0, remove=('headers','footers','quotes')) 35 | train_corpus = dataset.data # a list of 11314 documents / entries 36 | toc = time() 37 | print "elapsed time: %.4f sec" %(toc - tic) 38 | 39 | #distribute data 40 | corpus_rdd = sc.parallelize(train_corpus) 41 | corpus_rdd = corpus_rdd.map(lambda doc: re.sub(r"[^A-Za-z]", " ", doc)) 42 | corpus_rdd = corpus_rdd.map(lambda doc: u"".join(doc).encode('utf-8').strip()) 43 | 44 | rdd_row = corpus_rdd.map(lambda doc: Row(raw_corpus=str(doc))) 45 | newsgroups = spark.createDataFrame(rdd_row) 46 | 47 | tokenizer = Tokenizer(inputCol="raw_corpus", outputCol="tokens") 48 | newsgroups = tokenizer.transform(newsgroups) 49 | newsgroups = newsgroups.drop('raw_corpus') 50 | 51 | stopwords = StopWordsRemover(inputCol="tokens", outputCol="tokens_filtered") 52 | newsgroups = stopwords.transform(newsgroups) 53 | newsgroups = newsgroups.drop('tokens') 54 | 55 | count_vec = CountVectorizer(inputCol="tokens_filtered", outputCol="tf_features", vocabSize=num_features, minDF=2.0) 56 | count_vec_model = count_vec.fit(newsgroups) 57 | vocab = count_vec_model.vocabulary 58 | newsgroups = count_vec_model.transform(newsgroups) 59 | newsgroups = newsgroups.drop('tokens_filtered') 60 | 61 | #hashingTF = HashingTF(inputCol="tokens_filtered", outputCol="tf_features", numFeatures=num_features) 62 | #newsgroups = hashingTF.transform(newsgroups) 63 | #newsgroups = newsgroups.drop('tokens_filtered') 64 | 65 | idf = IDF(inputCol="tf_features", outputCol="features") 66 | newsgroups = idf.fit(newsgroups).transform(newsgroups) 67 | newsgroups = newsgroups.drop('tf_features') 68 | 69 | lda = LDA(k=num_topics, featuresCol="features", seed=0) 70 | model = lda.fit(newsgroups) 71 | 72 | topics = model.describeTopics() 73 | topics.show() 74 | 75 | model.topicsMatrix() 76 | 77 | topics_rdd = topics.rdd 78 | 79 | topics_words = topics_rdd\ 80 | .map(lambda row: row['termIndices'])\ 81 | .map(lambda idx_list: [vocab[idx] for idx in idx_list])\ 82 | .collect() 83 | 84 | for idx, topic in enumerate(topics_words): 85 | print "topic: ", idx 86 | print "----------" 87 | for word in topic: 88 | print word 89 | print "----------" 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | -------------------------------------------------------------------------------- /log_reg.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | from time import time 4 | import matplotlib.pyplot as plt 5 | 6 | from sklearn.datasets import fetch_20newsgroups 7 | from sklearn.feature_extraction.text import TfidfVectorizer 8 | 9 | from pyspark import SparkContext 10 | from pyspark.mllib.regression import LabeledPoint 11 | from pyspark.mllib.classification import LogisticRegressionWithLBFGS 12 | 13 | def parse_doc(values): 14 | return LabeledPoint(values[0], values[1:]) 15 | 16 | def main(): 17 | 18 | #parameters 19 | num_features = 400 #vocabulary size 20 | 21 | #load data 22 | print "loading 20 newsgroups dataset..." 23 | categories = ['rec.autos','rec.sport.hockey','comp.graphics','sci.space'] 24 | tic = time() 25 | dataset = fetch_20newsgroups(shuffle=True, random_state=0, categories=categories, remove=('headers','footers','quotes')) 26 | train_corpus = dataset.data # a list of 11314 documents / entries 27 | train_labels = dataset.target 28 | toc = time() 29 | print "elapsed time: %.4f sec" %(toc - tic) 30 | 31 | #tf-idf vectorizer 32 | tfidf = TfidfVectorizer(max_df=0.5, max_features=num_features, \ 33 | min_df=2, stop_words='english', use_idf=True) 34 | X_tfidf = tfidf.fit_transform(train_corpus).toarray() 35 | 36 | #append document labels 37 | train_labels = train_labels.reshape(-1,1) 38 | X_all = np.hstack([train_labels, X_tfidf]) 39 | 40 | #distribute the data 41 | sc = SparkContext('local', 'log_reg') 42 | rdd = sc.parallelize(X_all) 43 | labeled_corpus = rdd.map(parse_doc) 44 | train_RDD, test_RDD = labeled_corpus.randomSplit([8, 2], seed=0) 45 | 46 | #distributed logistic regression 47 | print "training logistic regression..." 48 | model = LogisticRegressionWithLBFGS.train(train_RDD, regParam=1, regType='l1', numClasses=len(categories)) 49 | 50 | #evaluated the model on test data 51 | labels_and_preds = test_RDD.map(lambda p: (p.label, model.predict(p.features))) 52 | test_err = labels_and_preds.filter(lambda (v, p): v != p).count() / float(test_RDD.count()) 53 | print "log-reg test error: ", test_err 54 | 55 | #model.save(sc, './log_reg_lbfgs_model') 56 | 57 | 58 | if __name__ == "__main__": 59 | main() -------------------------------------------------------------------------------- /mapper.py: -------------------------------------------------------------------------------- 1 | ''' 2 | >>> lines.collect() 3 | [u'ATATCCCCGGGAT', u'ATCGATCGATATG'] 4 | 5 | >>> rdd.collect() 6 | [(u'A', 3), (u'C', 4), (u'T', 3), (u'G', 3), (u'A', 4), (u'C', 2), (u'T', 4), (u'G', 3)] 7 | 8 | >>> cnt.collect() 9 | [(u'A', 7), (u'C', 6), (u'T', 7), (u'G', 6)] 10 | ''' 11 | 12 | def mapper(seq): 13 | freq = dict() 14 | for x in list(seq): 15 | if x in freq: 16 | freq[x] += 1 17 | else: 18 | freq[x] = 1 19 | 20 | kv = [(x, freq[x]) for x in freq.keys()] 21 | return kv 22 | 23 | 24 | from pyspark import SparkContext 25 | 26 | if __name__ == "__main__": 27 | 28 | sc = SparkContext('local', 'mapper') 29 | lines = sc.textFile("./data/dna_seq.txt", 1) 30 | 31 | rdd = lines.flatMap(mapper) 32 | cnt = rdd.reduceByKey(lambda x, y: x + y) 33 | print cnt.collect() 34 | -------------------------------------------------------------------------------- /mlp.py: -------------------------------------------------------------------------------- 1 | 2 | from pyspark import SparkContext 3 | from pyspark.sql import SQLContext 4 | from pyspark.sql import SparkSession 5 | 6 | from pyspark.ml.feature import StringIndexer, VectorAssembler 7 | from pyspark.ml.classification import MultilayerPerceptronClassifier 8 | from pyspark.ml.evaluation import MulticlassClassificationEvaluator 9 | from pyspark.sql.types import DoubleType, IntegerType 10 | 11 | 12 | if __name__ == "__main__": 13 | 14 | sc = SparkContext('local', 'mlp') 15 | sqlContext = SQLContext(sc) 16 | 17 | spark = SparkSession\ 18 | .builder\ 19 | .appName("MLPClassifier")\ 20 | .getOrCreate() 21 | 22 | #read in csv as dataframe 23 | dataset = sqlContext.read.format('com.databricks.spark.csv').options(header='true').load('./data/titanic.csv') 24 | dataset = dataset.drop('PassengerId','Name','Ticket','Cabin') 25 | 26 | #set column types 27 | dataset = dataset.withColumn("Survived", dataset["Survived"].cast(IntegerType())) 28 | dataset = dataset.withColumn("Pclass", dataset["Pclass"].cast(IntegerType())) 29 | dataset = dataset.withColumn("Age", dataset["Age"].cast(DoubleType())) 30 | dataset = dataset.withColumn("SibSp", dataset["SibSp"].cast(IntegerType())) 31 | dataset = dataset.withColumn("Parch", dataset["Parch"].cast(IntegerType())) 32 | dataset = dataset.withColumn("Fare", dataset["Fare"].cast(DoubleType())) 33 | 34 | #fill NaN 35 | avg_age = round(dataset.groupBy().avg("age").collect()[0][0],2) 36 | dataset = dataset.na.fill({'Age': avg_age}) 37 | dataset = dataset.na.drop() 38 | 39 | #map categorical data 40 | indexer = StringIndexer(inputCol="Sex", outputCol="SexInd") 41 | dataset = indexer.fit(dataset).transform(dataset) 42 | 43 | indexer = StringIndexer(inputCol="Embarked", outputCol="EmbarkedInd") 44 | dataset = indexer.fit(dataset).transform(dataset) 45 | 46 | #assemble features 47 | assembler = VectorAssembler( 48 | inputCols=["Age","Pclass","SexInd","SibSp","Parch","Fare","EmbarkedInd"], 49 | outputCol="features") 50 | 51 | dataset = assembler.transform(dataset) 52 | 53 | (trainingData, testData) = dataset.randomSplit([0.8, 0.2]) 54 | 55 | #MLP 56 | layers = [7, 8, 4, 2] #input: 7 features; output: 2 classes 57 | mlp = MultilayerPerceptronClassifier(maxIter=100, layers=layers, labelCol="Survived", featuresCol="features", blockSize=128, seed=0) 58 | 59 | model = mlp.fit(trainingData) 60 | result = model.transform(testData) 61 | 62 | prediction_label = result.select("prediction", "Survived") 63 | evaluator = MulticlassClassificationEvaluator(labelCol="Survived", predictionCol="prediction", metricName="accuracy") 64 | print "MLP test accuracy: " + str(evaluator.evaluate(prediction_label)) 65 | 66 | -------------------------------------------------------------------------------- /outliers.py: -------------------------------------------------------------------------------- 1 | ''' 2 | >>> print output 3 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] 4 | ''' 5 | 6 | from pyspark import SparkContext 7 | 8 | def remove_outliers(nums): 9 | 10 | stats = nums.stats() 11 | stddev = stats.stdev() 12 | return nums.filter(lambda x: abs(x-stats.mean()) < 3 * stddev) 13 | 14 | if __name__ == "__main__": 15 | 16 | sc = SparkContext('local', 'outliers') 17 | nums = sc.parallelize([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000]) 18 | output = sorted(remove_outliers(nums).collect()) 19 | print output 20 | 21 | -------------------------------------------------------------------------------- /pi_est.py: -------------------------------------------------------------------------------- 1 | 2 | import random 3 | from pyspark import SparkContext 4 | 5 | NUM_SAMPLES = 1000000 6 | 7 | def inside(p): 8 | x, y = random.random(), random.random() 9 | return x*x + y*y < 1 10 | 11 | if __name__ == "__main__": 12 | 13 | sc = SparkContext('local', 'pi_est') 14 | 15 | count = sc.parallelize(xrange(0,NUM_SAMPLES)).filter(inside).count() 16 | print "Pi estimate: %2.6f \n" %(4 * count / float(NUM_SAMPLES)) -------------------------------------------------------------------------------- /random_forest.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pandas as pd 3 | 4 | from pyspark import SparkContext 5 | from pyspark.sql import SQLContext 6 | from pyspark.sql import SparkSession 7 | from pyspark.sql import Row 8 | 9 | from sklearn.preprocessing import LabelEncoder 10 | from pyspark.ml.classification import RandomForestClassifier 11 | from pyspark.ml.feature import VectorAssembler 12 | from pyspark.ml.evaluation import MulticlassClassificationEvaluator 13 | 14 | 15 | if __name__ == "__main__": 16 | 17 | sc = SparkContext('local', 'dataframe') 18 | sqlContext = SQLContext(sc) 19 | 20 | spark = SparkSession\ 21 | .builder\ 22 | .appName("RandomForestClassifier")\ 23 | .getOrCreate() 24 | 25 | #dataset = sc.textFile("./data/titanic.csv", 1) 26 | #dataset = sqlContext.read.format('com.databricks.spark.csv').options(header='true').load('./data/titanic.csv') 27 | dataset = pd.read_csv("./data/titanic.csv") 28 | dataset = dataset.drop(['PassengerId','Name','Ticket','Cabin'], axis=1) 29 | 30 | #map categorical data 31 | for col in dataset.columns: 32 | if dataset[col].dtype == 'object': 33 | lbl = LabelEncoder() 34 | lbl.fit(dataset[col]) 35 | dataset[col] = lbl.transform(dataset[col]) 36 | 37 | #fill NaN 38 | median_age = dataset['Age'].dropna().median() 39 | dataset['Age'] = dataset['Age'].fillna(median_age) 40 | #dataset.isnull().sum() 41 | 42 | rdd_data = sc.parallelize(dataset.values) 43 | rdd_row = rdd_data.map(lambda p: Row(survived=int(p[0]),pclass=int(p[1]),sex=int(p[2]),age=float(p[3]),sibsp=int(p[4]),parch=int(p[5]),fare=float(p[6]),embarked=int(p[7]))) 44 | titanic = spark.createDataFrame(rdd_row) 45 | 46 | assembler = VectorAssembler( 47 | inputCols=["age","embarked","fare","parch","pclass","sex","sibsp"], 48 | outputCol="features") 49 | 50 | titanic = assembler.transform(titanic) 51 | (trainingData, testData) = titanic.randomSplit([0.8, 0.2]) 52 | 53 | #random forest classifier 54 | rfc = RandomForestClassifier(numTrees=100, maxDepth=6, labelCol="survived", featuresCol="features", seed=0) 55 | model = rfc.fit(trainingData) 56 | 57 | #feature importances 58 | model.featureImportances 59 | 60 | #predictions 61 | predictions = model.transform(testData) 62 | predictions.select("survived", "probability", "prediction").show(truncate=False) 63 | 64 | #compute test error 65 | evaluator = MulticlassClassificationEvaluator(labelCol="survived", predictionCol="prediction", metricName="accuracy") 66 | accuracy = evaluator.evaluate(predictions) 67 | print "RFC accuracy = %2.4f" % accuracy 68 | 69 | -------------------------------------------------------------------------------- /recommender.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import math 4 | from time import time 5 | from pyspark import SparkContext 6 | from pyspark.mllib.recommendation import ALS 7 | 8 | DATASET_PATH = '/data/vision/fisher/data1/MovieLens/' 9 | 10 | def main(): 11 | 12 | sc = SparkContext('local', 'als') 13 | 14 | small_ratings_file = os.path.join(DATASET_PATH,'ml-latest-small','ratings.csv') 15 | small_ratings_raw_data = sc.textFile(small_ratings_file) 16 | small_ratings_raw_data_header = small_ratings_raw_data.take(1)[0] 17 | 18 | #userId, movieId, rating, timestamp 19 | small_ratings_data = small_ratings_raw_data.filter(lambda line: line!=small_ratings_raw_data_header)\ 20 | .map(lambda line: line.split(",")).map(lambda tokens: (tokens[0],tokens[1],tokens[2])).cache() 21 | 22 | small_movies_file = os.path.join(DATASET_PATH, 'ml-latest-small', 'movies.csv') 23 | small_movies_raw_data = sc.textFile(small_movies_file) 24 | small_movies_raw_data_header = small_movies_raw_data.take(1)[0] 25 | 26 | #movieId, title, genre 27 | small_movies_data = small_movies_raw_data.filter(lambda line: line!=small_movies_raw_data_header)\ 28 | .map(lambda line: line.split(",")).map(lambda tokens: (tokens[0],tokens[1])).cache() 29 | small_movies_titles = small_movies_data.map(lambda x: (int(x[0]),x[1])) 30 | 31 | 32 | #training, validation, test split 33 | training_RDD, validation_RDD, test_RDD = small_ratings_data.randomSplit([6, 2, 2], seed=0L) 34 | validation_for_predict_RDD = validation_RDD.map(lambda x: (x[0], x[1])) 35 | test_for_predict_RDD = test_RDD.map(lambda x: (x[0], x[1])) 36 | 37 | #ALS parameters and model selection 38 | iterations = 10 39 | regularization_parameter = 0.1 40 | ranks = [4, 8, 12] 41 | tolerance = 0.02 42 | errors = [] 43 | 44 | min_error = float('inf') 45 | best_rank = -1 46 | best_iteration = -1 47 | 48 | for rank in ranks: 49 | model = ALS.train(training_RDD, rank, seed=0, iterations=iterations, lambda_=regularization_parameter) 50 | predictions = model.predictAll(validation_for_predict_RDD).map(lambda r: ((r[0], r[1]), r[2])) #((user, product), rating) 51 | rates_and_preds = validation_RDD.map(lambda r: ((int(r[0]), int(r[1])), float(r[2]))).join(predictions) #((user, product), rating1, rating2) 52 | error = math.sqrt(rates_and_preds.map(lambda r: (r[1][0] - r[1][1])**2).mean()) #sqrt[mean((rating1 - rating2)**2)] 53 | errors.append(error) 54 | 55 | print 'For rank %s the RMSE is %s' % (rank, error) 56 | if error < min_error: 57 | min_error = error 58 | best_rank = rank 59 | 60 | print 'The best model was trained with rank %s' % best_rank 61 | 62 | #new user ratings 63 | new_user_ID = 0 64 | #userId, movieId, rating 65 | new_user_ratings = [ 66 | (0,260,9), # Star Wars (1977) 67 | (0,1,8), # Toy Story (1995) 68 | (0,16,7), # Casino (1995) 69 | (0,25,8), # Leaving Las Vegas (1995) 70 | (0,32,9), # Twelve Monkeys (a.k.a. 12 Monkeys) (1995) 71 | (0,335,4), # Flintstones, The (1994) 72 | (0,379,3), # Timecop (1994) 73 | (0,296,7), # Pulp Fiction (1994) 74 | (0,858,10) , # Godfather, The (1972) 75 | (0,50,8) # Usual Suspects, The (1995) 76 | ] 77 | new_user_ratings_RDD = sc.parallelize(new_user_ratings) 78 | print 'New user ratings: %s' % new_user_ratings_RDD.take(10) 79 | 80 | #re-train the model 81 | small_data_with_new_ratings_RDD = small_ratings_data.union(new_user_ratings_RDD) 82 | t0 = time() 83 | new_ratings_model = ALS.train(small_data_with_new_ratings_RDD, best_rank, seed=0, iterations=iterations, lambda_=regularization_parameter) 84 | tt = time() - t0 85 | print "New model trained in %s seconds" % round(tt,3) 86 | 87 | #getting top recommendations 88 | new_user_ratings_ids = map(lambda x: x[1], new_user_ratings) # get just movie IDs 89 | 90 | #movieId, title, genre 91 | new_user_unrated_movies_RDD = (small_movies_data.filter(lambda x: x[0] not in new_user_ratings_ids).map(lambda x: (new_user_ID, x[0]))) 92 | new_user_recommendations_RDD = new_ratings_model.predictAll(new_user_unrated_movies_RDD) 93 | 94 | #movieId, rating 95 | new_user_recommendations_rating_RDD = new_user_recommendations_RDD.map(lambda x: (x.product, x.rating)) 96 | new_user_recommendations_rating_title_RDD = new_user_recommendations_rating_RDD.join(small_movies_titles) 97 | new_user_recommendations_rating_title_RDD = new_user_recommendations_rating_title_RDD.map(lambda r: (r[1][1], r[1][0])) 98 | 99 | top_movies = new_user_recommendations_rating_title_RDD.takeOrdered(25, key=lambda x: -x[1]) 100 | print "top recommended movies:\n %s" % '\n'.join(map(str, top_movies)) 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | if __name__ == "__main__": 111 | main() -------------------------------------------------------------------------------- /scala/pi_est.scala: -------------------------------------------------------------------------------- 1 | import org.apache.spark.sql.SparkSession 2 | import scala.math.random 3 | 4 | object pi_est { 5 | 6 | def inside(p:Int) : Boolean = { 7 | val x = random * 2 - 1 8 | val y = random * 2 - 1 9 | return x * x + y * y <= 1 10 | } 11 | 12 | def main(args: Array[String]): Unit = { 13 | val spark = SparkSession.builder.appName("pi_est").getOrCreate() 14 | val NUM_SAMPLES = 1000000 15 | val count = spark.sparkContext.parallelize(Range(0, NUM_SAMPLES)).filter(inside).count() 16 | val pi_est = 4.0 * count / NUM_SAMPLES 17 | println(s"Pi est = $pi_est") 18 | } 19 | } -------------------------------------------------------------------------------- /term_doc.py: -------------------------------------------------------------------------------- 1 | ''' 2 | >>> tokens.first() 3 | ['It', 'is', 'the', 'east,', 'and', 'Juliet', 'is', 'the', 'sun.'] 4 | 5 | >>> local_vocab_map 6 | {'and': 0, 'A': 1, 'fit': 14, 'for': 13, 'of': 3, 'is': 4, 'gods.': 7, 'It': 11,\ 7 | 'Brevity': 10, 'soul': 12, 'sun.': 8, 'dish': 2, 'east,': 9, 'the': 5, 'wit.': 6, 'Juliet': 15} 8 | 9 | >>> for doc in term_document_matrix.collect(): 10 | print doc 11 | (16,[0,4,5,8,9,11,15],[1.0,2.0,2.0,1.0,1.0,1.0,1.0]) 12 | (16,[1,2,5,7,13,14],[1.0,1.0,1.0,1.0,1.0,1.0]) 13 | (16,[3,4,5,6,10,12],[1.0,1.0,1.0,1.0,1.0,1.0]) 14 | 15 | ''' 16 | 17 | from pyspark.mllib.linalg import SparseVector 18 | from collections import Counter 19 | 20 | from pyspark import SparkContext 21 | 22 | if __name__ == "__main__": 23 | 24 | sc = SparkContext('local', 'term_doc') 25 | corpus = sc.parallelize([ 26 | "It is the east, and Juliet is the sun.", 27 | "A dish fit for the gods.", 28 | "Brevity is the soul of wit."]) 29 | 30 | tokens = corpus.map(lambda raw_text: raw_text.split()).cache() 31 | local_vocab_map = tokens.flatMap(lambda token: token).distinct().zipWithIndex().collectAsMap() 32 | 33 | vocab_map = sc.broadcast(local_vocab_map) 34 | vocab_size = sc.broadcast(len(local_vocab_map)) 35 | 36 | term_document_matrix = tokens \ 37 | .map(Counter) \ 38 | .map(lambda counts: {vocab_map.value[token]: float(counts[token]) for token in counts}) \ 39 | .map(lambda index_counts: SparseVector(vocab_size.value, index_counts)) 40 | 41 | for doc in term_document_matrix.collect(): 42 | print doc 43 | -------------------------------------------------------------------------------- /word_count.py: -------------------------------------------------------------------------------- 1 | ''' 2 | >>> lines.collect() 3 | [u'hello world', u'hello pyspark', u'spark context', u'i like spark', u'hadoop rdd', u'text file', u'word count', u'', u''] 4 | 5 | >>> words.collect() 6 | [u'hello', u'world', u'hello', u'pyspark', u'spark', u'context', u'i', u'like', u'spark', u'hadoop', u'rdd', u'text', u'file', u'word', u'count', u'', u''] 7 | 8 | >>> ones.take(2) 9 | [(u'hello', 1), (u'world', 1)] 10 | 11 | >>> counts.takeSample(1, 2) 12 | [(u'spark', 2), (u'hello', 2)] 13 | ''' 14 | 15 | from pyspark import SparkContext 16 | 17 | if __name__ == "__main__": 18 | 19 | sc = SparkContext('local', 'word_count') 20 | lines = sc.textFile("./data/words.txt", 1) 21 | 22 | words = lines.flatMap(lambda x: x.split(' ')) 23 | ones = words.map(lambda x: (x, 1)) 24 | counts = ones.reduceByKey(lambda x, y: x + y) 25 | counts = counts.sortByKey(1) 26 | 27 | counts.saveAsTextFile("./data/word_counts.txt") --------------------------------------------------------------------------------