├── Figure2.PNG
├── Figure3.PNG
├── Figure4.PNG
├── Figure5.PNG
├── Figure6.PNG
├── Methodology.PNG
├── README.md
└── data
    ├── annotated_test_set.txt
    └── annotated_test_set_corrected.csv
/Figure2.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/irecasens/nlp_amazon_reviews/HEAD/Figure2.PNG
--------------------------------------------------------------------------------
/Figure3.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/irecasens/nlp_amazon_reviews/HEAD/Figure3.PNG
--------------------------------------------------------------------------------
/Figure4.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/irecasens/nlp_amazon_reviews/HEAD/Figure4.PNG
--------------------------------------------------------------------------------
/Figure5.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/irecasens/nlp_amazon_reviews/HEAD/Figure5.PNG
--------------------------------------------------------------------------------
/Figure6.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/irecasens/nlp_amazon_reviews/HEAD/Figure6.PNG
--------------------------------------------------------------------------------
/Methodology.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/irecasens/nlp_amazon_reviews/HEAD/Methodology.PNG
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # nlp_amazon_reviews
2 | View notebook here.
3 |
4 | ## Introduction
5 | Merchants selling products through e-commerce often receive a volume of customer reviews too large for human processing. These reviews often contain important business insights that can be leveraged to improve profits. In this project we analyze ~400,000 mobile phone reviews from Amazon.com, aiming to find trends and patterns that show which product characteristics customers mention most and with what sentiment.
6 |
7 | Our task is performed in six steps:
8 |
9 | (1) pre-processing to prepare the data for analysis, including tokenization and part-of-speech tagging
10 |
11 | (2) product name standardization
12 |
13 | (3) characteristic extraction
14 |
15 | (4) review filtering to remove reviews considered outliers, unbalanced, or meaningless
16 |
17 | (5) sentiment extraction for each product-characteristic pair
18 |
19 | (6) performance analysis to determine the accuracy of the model, evaluating characteristic extraction separately from sentiment scores
20 |
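Step (6) can be illustrated with a minimal sketch. It assumes the `id; product; {characteristic: sentiment}` line format seen in `data/annotated_test_set_corrected.csv`, and it assumes precision, recall, and sentiment accuracy as the metrics; the README does not state which metrics the project actually uses. The names `parse_annotation`, `evaluate`, and the `predicted` dictionary are hypothetical and are not taken from the notebook.

```python
import ast


def parse_annotation(line):
    """Parse one `id; product; {dict}` line into (review_id, product, labels)."""
    review_id, product, raw = (part.strip() for part in line.split(";", 2))
    labels = ast.literal_eval(raw.rstrip(";").strip())
    # Keys in the file vary in case and sometimes carry a leading space,
    # so normalize them before comparing against model output.
    normalized = {key.strip().lower(): value for key, value in labels.items()}
    return int(review_id), product, normalized


def evaluate(gold, predicted):
    """Score characteristic extraction separately from sentiment (assumed metrics)."""
    gold_keys, pred_keys = set(gold), set(predicted)
    matched = gold_keys & pred_keys
    precision = len(matched) / len(pred_keys) if pred_keys else 0.0
    recall = len(matched) / len(gold_keys) if gold_keys else 0.0
    # Sentiment is judged only on characteristics found by both sides.
    correct = sum(1 for key in matched if gold[key] == predicted[key])
    sentiment_acc = correct / len(matched) if matched else 0.0
    return precision, recall, sentiment_acc


line = "3274;iPhone 5c;{'screen':1,'speed':1,'battery':-1}"
review_id, product, gold = parse_annotation(line)

# Hypothetical model output for the same review, for illustration only.
predicted = {"screen": 1, "battery": 1, "camera": 1}
precision, recall, sentiment_acc = evaluate(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} sentiment={sentiment_acc:.2f}")
```

Keeping the two scores apart means a missed characteristic lowers recall without also counting as a sentiment error, which matches the separation described in step (6).
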
21 | The dataset can be found on Kaggle:
22 | - https://www.kaggle.com/PromptCloudHQ/amazon-reviews-unlocked-mobile-phones/downloads/Amazon_Unlocked_Mobile.csv
23 |
24 | ## Methodology
25 |
26 | A flowchart of the project, including the approach, performance, and final business analysis, is presented below:
27 |
28 |
29 |
--------------------------------------------------------------------------------
/data/annotated_test_set.txt:
--------------------------------------------------------------------------------
1 | 757, Lenovo A850, {battery:-1};
2 | 1540,BlackBerry Curve, {Trackball:-1,Battery:-1,Micro-SD:-1};
3 | 1554,Acer Liquid E700 TRIO,{Camera:-1,Hardware:-1,Buttons:-1};
4 | 1697,Alcatel OneTouch,{Hardware:-1,Charging Port:-1};
5 | 1930,Alcatel OneTouch,{Screen:1,Size:-1};
6 | 3270,iPhone 5c,{Wifi:-1};
7 | 3274,iPhone 5c,{screen:1,speed:1,battery:-1};
8 | 3308,iPhone 5c,{charging:1,battery:1};
9 | 3310,iPhone 5c,{speaker:-1};
10 | 3323,iPhone 5c,{charger:-1};
11 | 3329,iPhone 5c,{battery:-1};
12 | 197316, HTC One X,{Price:1};
13 | 197369, HTC One X,{Speed:1,camera:1,storage:1};
14 | 197376, HTC One X,{Screen:-1};
15 | 197443, HTC One X,{Apps:1};
16 | 197450, HTC One X,{Software:-1,charger:-1,play store:-1,memory:-1};
17 | 197460, HTC One X,{screen:1,warranty:-1};
18 | 198094, HTC Rhyme, {layout:-1,user interface:-1,size:-1};
19 | 198105, HTC Rhyme, {sound:-1,battery:-1};
20 | 198109, HTC Rhyme, {charger:-1,battery:-1};
21 | 198111, HTC Rhyme, {navigation system:1,voice search:1,speakers:1};
22 | 198115, HTC Rhyme, {size:1,weight:1,keyboard:-1,SD card:-1,cord:-1};
23 | 199958, Huawei Ascend P7, {software:-1};
24 | 199986, Huawei Ascend P7, {image quality:-1,coverage:-1};
25 | 199992, Huawei Ascend P7, {wifi:-1};
26 | 200009, Huawei Ascend P7, {price:1,hardware:1};
27 | 200041, Huawei Ascend P7, {specs:1,price:1,software:1,screen:1,size:1};
28 | 200946, Huawei Mate 2, {battery:1,bluetootch:1,size:1};
29 | 200955, Huawei Mate 2, {battery:1,screen:1};
30 | 53, Asha 302, {sound: 1, smart phone features: 1, software: -1, documentation: 1, browser: 1, keyboard: 1, hot-keys: 1, shortcuts: 1, airplane mode: -1, flight profile: -1, navigation button/mouse: -1};
31 | 69, Asha 302, {build: 1, keyboard: 1,sound: 1, Xpress: -1, internet experience: 1};
32 | 71, Asha 302, {build: 1, reception: 1, audio: 1, keyboard: 1, wi-fi: 1, opera mini: 1, OVI store app: -1, battery: 1};
33 | 73, Asha 302, {apps: 1};
34 | 75, Asha 302, {SMS: 1, rings: 1, body: 1, freezes: -1};
35 | 78, Asha 302, {ring tones: 1};
36 | 79, Asha 302, {wi-fi: 1, calendar: 1, alarm clock: 1, text messaging: 1, camera: 1, missed/received call: 1, size: 1};
37 | 82, Asha 302, {price: 1};
38 | 84, Asha 302, {time: -1, support: -1, booklet: -1};
39 | 85, Asha 302, {screen: -1, calling: 1, messaging: 1, web browsing: 1, social networking: 1, wi-fi: 1};
40 | 86, Asha 302, {texting interface: 1, battery: 1, email: 1, keyboard: 1, os: 1, backup: 1};
41 | 47102, iPhone 5s, {apps: 1, size: 1, pictures: 1};
42 | 47108, iPhone 5s, {size: -1};
43 | 47110, iPhone 5s, {box: -1};
44 | 236742, Optimus S, {look: 1, case: 1, screen protector: 1};
45 | 236743, Optimus S, {case: 1, design: -1};
46 | 236745, Optimus S, {case: 1, screen protector: -1};
47 | 236748, Optimus S, {design: 1, case: -1};
48 | 236750, Optimus S, {build: -1};
49 | 236751, Optimus S, {case: -1};
50 | 66007, iPhone 6s, {size: 1};
51 | 66015, iPhone 6s, {set up: 1};
52 | 66016, iPhone 6s, {box: -1, charger: -1, hands free: -1};
53 | 66017, iPhone 6s, {screen: -1};
54 | 85857, 9530 Storm, {speaker: -1};
55 | 85859, 9530 Storm, {wireless apps: -1, internet: -1};
56 | 38591, iPhone 5c, {battery: -1};
57 | 38609, iPhone 5c, {screen: -1};
58 | 40055, iPhone 5s, {fingerprint addition: 1};
59 | 40176, iPhone 5s, {fingerprint unlocking mechanism: -1};
60 | 41962, iPhone 5s, {sound: -1, signal: -1};
61 | 42135, iPhone 5s, {pictures: 1, feel: 1, ease of use: 1};
62 | 42160, iPhone 5s, {camera: 1, social media: 1, navigation: 1, 16gb: -1};
63 | 42164, iPhone 5s, {shape: 1};
64 | 42175, iPhone 5s, {box: 1};
65 | 42179, iPhone 5s, {battery life: -1, case: -1, ios8: -1, apps: -1, camera: 1, audio output: 1, call quality: 1};
66 | 42196, iPhone 5s, {multi-tasking: 1, pictures: 1, speed: 1};
67 | 42201, iPhone 5s, {sim card: -1};
68 | 42202, iPhone 5s, {camera: -1, apps: -1};
69 | 42206, iPhone 5s, {size: 1};
70 | 42224, iPhone 5s, {top left corner: -1, lcd display: -1};
71 | 42239, iPhone 5s, {battery: -1, speaker: -1};
72 | 42377, iPhone 5s, {battery: 1, camera: 1};
73 | 89714, Bold 9900, {memory card: -1};
74 | 89713, Bold 9900, {charger: -1, case: -1};
75 | 89712, Bold 9900, {frame: -1};
76 | 89829, Bold 9930, {keys: -1};
77 | 89830, Bold 9930, {user manual: -1};
78 | 89868, Bold 9930, {audio: -1, frozen: -1};
79 | 89909, Bold 9930, {camera: 1, graphics: 1, storage: 1};
80 | 89938, Bold 9930, {touchscreen: 1, feel: 1};
81 | 734, Lenovo A850, {screen:1, audio: 1, apps: -1, speed: 1, price: 1};
82 | 755, Lenovo A850, {language setting: -1, battery: -1};
83 | 773, Lenovo A850, {apps: 1, price: 1};
84 | 774, Lenovo A850, {charger: -1};
85 | 776, Lenovo A850, {screen: -1, button: -1, brand: 1};
86 | 828, Lenovo A850, {price: 1, size: 1};
87 | 943, Lenovo A850, {speed: 1, size: 1, screen: 1, camera: -1, price: 1};
88 | 3177, iPhone 5c, {size: 1, charger: 1, apps: 1, headphone: -1, sim card: -1};
89 | 3270, iPhone 5c, {wi-fi: -1};
90 | 3529, iPhone 5c, {backlight: -1, touchscreen: -1};
91 | 3817, iPhone 5c, {storage: 1, price: 1};
92 | 72174, iPhone 7 Plus, {price: 1};
93 | 72244, iPhone 7 Plus, {design: -1, camera: 1, screen: 1, battery: 1};
94 | 72261, iPhone 7 Plus, {camera: 1, touchscreen: 1, size: 1, apps: -1};
95 | 72292, iPhone 7 Plus, {price: -1};
96 | 72644, iPhone 7 Plus, {case: -1};
97 | 72645, iPhone 7 Plus, {headphone: -1};
98 | 83658, BlackBerry 8520, {software: 1};
99 | 83689, BlackBerry 8520, {wi-fi: 1, price: 1, internet: 1};
100 | 83694, BlackBerry 8520, {email: -1};
101 | 83709, BlackBerry 8520, {price: -1, speed: -1, battery: -1, screen: -1};
102 | 83739, BlackBerry 8520, {price: 1, ease of use: 1};
103 | 84943, BlackBerry 8520, {keyboard: 1, display: 1, trackpad: 1, media keys: 1, mute button: 1};
104 | 84954, BlackBerry 8520, {box: 1, case: -1, memory card: -1};
105 | 85002, BlackBerry 9520, {voice: 1, SMS: 1, data: 1, wi-fi: 1, price: 1};
106 | 114768, BLU Dash L, {size: -1, price: 1};
107 | 114772, BLU Dash L, {size: -1, battery: -1};
108 | 114775, BLU Dash L, {battery: -1};
109 | 114784, BLU Dash L, {storage: -1, sim card: 1, price: 1};
110 | 115022, BLU Dash L, {apps: -1, touchscreen: -1};
111 | 115040, BLU Dash L, {color: 1, speed: -1, apps: -1};
112 | 192806, HTC One M7, {speed: 1, audio: 1, control button: -1};
113 | 192810, HTC One M7, {display: 1, speaker: 1, screen: 1, size: 1, battery: -1, camera: -1};
114 | 192835, HTC One M7, {battery: 1, memory: 1};
115 | 192844, HTC One M7, {camera: -1};
116 | 192879, HTC One M7, {apps: 1};
117 | 200629, Huawei GX8, {battery: 1};
118 | 200641, Huawei GX8, {price: 1, security options: 1, battery: 1, data monitoring: 1};
119 | 200642, Huawei GX8, {price: 1, design: 1, screen: 1, fingerprint sensor: 1, speed: -1};
120 | 200657, Huawei GX8, {multi-tasking: -1, fingerprint reader: 1, apps: -1}
121 | 200658, Huawei GX8, {battery: 1, camera: 1, screen: 1, price: 1};
122 | 238741, LG Xenon GR500, {keyboard: 1, color: 1};
123 | 238859, LG Xenon GR500, {screen: -1, ease of use: 1, battery: -1};
124 | 238891, LG Xenon GR500, {ease of use: 1, keyboard: 1};
125 | 240859, Microsoft Lumia 950, {price: 1, camera: 1, battery: 1, weight: 1};
126 | 240862, Microsoft Lumia 950, {size: 1, software: 1, apps: 1};
127 | 240956, Microsoft Lumia 950, {speed: 1, case: 1, screen: 1};
128 | 241013, Microsoft Lumia 950, {ease of use: 1};
129 | 241146, Microsoft Lumia 950, {internet: -1};
130 | 241214, Microsoft Lumia 950, {sim card: 1, speed: 1, camera: 1, SMS: -1, security options: -1};
131 |
--------------------------------------------------------------------------------
/data/annotated_test_set_corrected.csv:
--------------------------------------------------------------------------------
1 | 757; Lenovo A850; {'battery':-1}
2 | 1540;BlackBerry Curve; {'Trackball':-1,'Battery':-1,'Micro-SD':-1}
3 | 1554;Acer Liquid E700 TRIO;{'Camera':-1,'Hardware':-1,'Buttons':-1}
4 | 1697;Alcatel OneTouch;{'Hardware':-1,'Charging Port':-1}
5 | 1930;Alcatel OneTouch;{'Screen':1,'Size':-1}
6 | 3270;iPhone 5c;{'Wifi':-1}
7 | 3274;iPhone 5c;{'screen':1,'speed':1,'battery':-1}
8 | 3308;iPhone 5c;{'charging':1,'battery':1}
9 | 3310;iPhone 5c;{'speaker':-1}
10 | 3323;iPhone 5c;{'charger':-1}
11 | 3329;iPhone 5c;{'battery':-1}
12 | 197316; HTC One X;{'Price':1}
13 | 197369; HTC One X;{'Speed':1,'camera':1,'storage':1}
14 | 197376; HTC One X;{'Screen':-1}
15 | 197443; HTC One X;{'Apps':1}
16 | 197450; HTC One X;{'Software':-1,'charger':-1,'play store':-1,'memory':-1}
17 | 197460; HTC One X;{'screen':1,'warranty':-1}
18 | 198094; HTC Rhyme; {'layout':-1,'user interface':-1,'size':-1}
19 | 198105; HTC Rhyme; {'sound':-1,'battery':-1}
20 | 198109; HTC Rhyme; {'charger':-1,'battery':-1}
21 | 198111; HTC Rhyme; {'navigation system':1,'voice search':1,'speakers':1}
22 | 198115; HTC Rhyme; {'size':1,'weight':1,'keyboard':-1,'SD card':-1,'cord':-1}
23 | 199958; Huawei Ascend P7; {'software':-1}
24 | 199986; Huawei Ascend P7; {'image quality':-1,'coverage':-1}
25 | 199992; Huawei Ascend P7; {'wifi':-1}
26 | 200009; Huawei Ascend P7; {'price':1,'hardware':1}
27 | 200041; Huawei Ascend P7; {'specs':1,'price':1,'software':1,'screen':1,'size':1}
28 | 200946; Huawei Mate 2; {'battery':1,'bluetootch':1,'size':1}
29 | 200955; Huawei Mate 2; {'battery':1,'screen':1}
30 | 53; Asha 302; {'sound': 1,' smart phone features': 1,' software': -1,' documentation': 1,' browser': 1,' keyboard': 1,' hot-keys': 1,' shortcuts': 1,' airplane mode': -1,' flight profile': -1,' navigation button/mouse': -1}
31 | 69; Asha 302; {'build': 1,' keyboard': 1,'sound': 1,' Xpress': -1,' internet experience': 1}
32 | 71; Asha 302; {'build': 1,' reception': 1,' audio': 1,' keyboard': 1,' wi-fi': 1,' opera mini': 1,' OVI store app': -1,' battery': 1}
33 | 73; Asha 302; {'apps': 1}
34 | 75; Asha 302; {'SMS': 1,' rings': 1,' body': 1,' freezes': -1}
35 | 78; Asha 302; {'ring tones': 1}
36 | 79; Asha 302; {'wi-fi': 1,' calendar': 1,' alarm clock': 1,' text messaging': 1,' camera': 1,' missed/received call': 1,' size': 1}
37 | 82; Asha 302; {'price': 1}
38 | 84; Asha 302; {'time': -1,' support': -1,' booklet': -1}
39 | 85; Asha 302; {'screen': -1,' calling': 1,' messaging': 1,' web browsing': 1,' social networking': 1,' wi-fi': 1}
40 | 86; Asha 302; {'texting interface': 1,' battery': 1,' email': 1,' keyboard': 1,' os': 1,' backup': 1}
41 | 47102; iPhone 5s; {'apps': 1,' size': 1,' pictures': 1}
42 | 47108; iPhone 5s; {'size': -1}
43 | 47110; iPhone 5s; {'box': -1}
44 | 236742; Optimus S; {'look': 1,' case': 1,' screen protector': 1}
45 | 236743; Optimus S; {'case': 1,' design': -1}
46 | 236745; Optimus S; {'case': 1,' screen protector': -1}
47 | 236748; Optimus S; {'design': 1,' case': -1}
48 | 236750; Optimus S; {'build': -1}
49 | 236751; Optimus S; {'case': -1}
50 | 66007; iPhone 6s; {'size': 1}
51 | 66015; iPhone 6s; {'set up': 1}
52 | 66016; iPhone 6s; {'box': -1,' charger': -1,' hands free': -1}
53 | 66017; iPhone 6s; {'screen': -1}
54 | 85857; 9530 Storm; {'speaker': -1}
55 | 85859; 9530 Storm; {'wireless apps': -1,' internet': -1}
56 | 38591; iPhone 5c; {'battery': -1}
57 | 38609; iPhone 5c; {'screen': -1}
58 | 40055; iPhone 5s; {'fingerprint addition': 1}
59 | 40176; iPhone 5s; {'fingerprint unlocking mechanism': -1}
60 | 41962; iPhone 5s; {'sound': -1,' signal': -1}
61 | 42135; iPhone 5s; {'pictures': 1,' feel': 1,' ease of use': 1}
62 | 42160; iPhone 5s; {'camera': 1,' social media': 1,' navigation': 1,' 16gb': -1}
63 | 42164; iPhone 5s; {'shape': 1}
64 | 42175; iPhone 5s; {'box': 1}
65 | 42179; iPhone 5s; {'battery life': -1,' case': -1,' ios8': -1,' apps': -1,' camera': 1,' audio output': 1,' call quality': 1}
66 | 42196; iPhone 5s; {'multi-tasking': 1,' pictures': 1,' speed': 1}
67 | 42201; iPhone 5s; {'sim card': -1}
68 | 42202; iPhone 5s; {'camera': -1,' apps': -1}
69 | 42206; iPhone 5s; {'size': 1}
70 | 42224; iPhone 5s; {'top left corner': -1,' lcd display': -1}
71 | 42239; iPhone 5s; {'battery': -1,' speaker': -1}
72 | 42377; iPhone 5s; {'battery': 1,' camera': 1}
73 | 89714; Bold 9900; {'memory card': -1}
74 | 89713; Bold 9900; {'charger': -1,' case': -1}
75 | 89712; Bold 9900; {'frame': -1}
76 | 89829; Bold 9930; {'keys': -1}
77 | 89830; Bold 9930; {'user manual': -1}
78 | 89868; Bold 9930; {'audio': -1,' frozen': -1}
79 | 89909; Bold 9930; {'camera': 1,' graphics': 1,' storage': 1}
80 | 89938; Bold 9930; {'touchscreen': 1,' feel': 1}
81 | 734; Lenovo A850; {'screen':1,' audio': 1,' apps': -1,' speed': 1,' price': 1}
82 | 755; Lenovo A850; {'language setting': -1,' battery': -1}
83 | 773; Lenovo A850; {'apps': 1,' price': 1}
84 | 774; Lenovo A850; {'charger': -1}
85 | 776; Lenovo A850; {'screen': -1,' button': -1,' brand': 1}
86 | 828; Lenovo A850; {'price': 1,' size': 1}
87 | 943; Lenovo A850; {'speed': 1,' size': 1,' screen': 1,' camera': -1,' price': 1}
88 | 3177; iPhone 5c; {'size': 1,' charger': 1,' apps': 1,' headphone': -1,' sim card': -1}
89 | 3270; iPhone 5c; {'wi-fi': -1}
90 | 3529; iPhone 5c; {'backlight': -1,' touchscreen': -1}
91 | 3817; iPhone 5c; {'storage': 1,' price': 1}
92 | 72174; iPhone 7 Plus; {'price': 1}
93 | 72244; iPhone 7 Plus; {'design': -1,' camera': 1,' screen': 1,' battery': 1}
94 | 72261; iPhone 7 Plus; {'camera': 1,' touchscreen': 1,' size': 1,' apps': -1}
95 | 72292; iPhone 7 Plus; {'price': -1}
96 | 72644; iPhone 7 Plus; {'case': -1}
97 | 72645; iPhone 7 Plus; {'headphone': -1}
98 | 83658; BlackBerry 8520; {'software': 1}
99 | 83689; BlackBerry 8520; {'wi-fi': 1,' price': 1,' internet': 1}
100 | 83694; BlackBerry 8520; {'email': -1}
101 | 83709; BlackBerry 8520; {'price': -1,' speed': -1,' battery': -1,' screen': -1}
102 | 83739; BlackBerry 8520; {'price': 1,' ease of use': 1}
103 | 84943; BlackBerry 8520; {'keyboard': 1,' display': 1,' trackpad': 1,' media keys': 1,' mute button': 1}
104 | 84954; BlackBerry 8520; {'box': 1,' case': -1,' memory card': -1}
105 | 85002; BlackBerry 9520; {'voice': 1,' SMS': 1,' data': 1,' wi-fi': 1,' price': 1}
106 | 114768; BLU Dash L; {'size': -1,' price': 1}
107 | 114772; BLU Dash L; {'size': -1,' battery': -1}
108 | 114775; BLU Dash L; {'battery': -1}
109 | 114784; BLU Dash L; {'storage': -1,' sim card': 1,' price': 1}
110 | 115022; BLU Dash L; {'apps': -1,' touchscreen': -1}
111 | 115040; BLU Dash L; {'color': 1,' speed': -1,' apps': -1}
112 | 192806; HTC One M7; {'speed': 1,' audio': 1,' control button': -1}
113 | 192810; HTC One M7; {'display': 1,' speaker': 1,' screen': 1,' size': 1,' battery': -1,' camera': -1}
114 | 192835; HTC One M7; {'battery': 1,' memory': 1}
115 | 192844; HTC One M7; {'camera': -1}
116 | 192879; HTC One M7; {'apps': 1}
117 | 200629; Huawei GX8; {'battery': 1}
118 | 200641; Huawei GX8; {'price': 1,' security options': 1,' battery': 1,' data monitoring': 1}
119 | 200642; Huawei GX8; {'price': 1,' design': 1,' screen': 1,' fingerprint sensor': 1,' speed': -1}
120 | 200657; Huawei GX8; {'multi-tasking': -1,' fingerprint reader': 1,' apps': -1}
121 | 200658; Huawei GX8; {'battery': 1,' camera': 1,' screen': 1,' price': 1}
122 | 238741; LG Xenon GR500; {'keyboard': 1,' color': 1}
123 | 238859; LG Xenon GR500; {'screen': -1,' ease of use': 1,' battery': -1}
124 | 238891; LG Xenon GR500; {'ease of use': 1,' keyboard': 1}
125 | 240859; Microsoft Lumia 950; {'price': 1,' camera': 1,' battery': 1,' weight': 1}
126 | 240862; Microsoft Lumia 950; {'size': 1,' software': 1,' apps': 1}
127 | 240956; Microsoft Lumia 950; {'speed': 1,' case': 1,' screen': 1}
128 | 241013; Microsoft Lumia 950; {'ease of use': 1}
129 | 241146; Microsoft Lumia 950; {'internet': -1}
130 | 241214; Microsoft Lumia 950; {'sim card': 1,' speed': 1,' camera': 1,' SMS': -1,' security options': -1}
131 |
132 |
--------------------------------------------------------------------------------