├── .gitignore
├── LICENSE.md
├── Procfile
├── README.md
├── app.json
├── requirements-dev.txt
├── requirements.txt
├── runtime.txt
├── scripts
    ├── run_dev_flask.sh
    ├── start_service.sh
    └── stop_service.sh
└── src
    ├── config.py
    ├── fb_comment_downloader_app.py
    ├── get_fb_comments_from_fb.py
    ├── static
        ├── loader.gif
        ├── script.js
        └── style.css
    ├── templates
        └── index.html
    ├── test_data_urls.json
    ├── test_validation.py
    └── validation.py


/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | .*
3 | env/
4 | *.bak
5 | __pycache__
6 | 


--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2017 Washington State Department of Transportation
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/Procfile:
--------------------------------------------------------------------------------
1 | web: gunicorn --chdir src fb_comment_downloader_app:app --timeout 180 --worker-class gevent
2 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Facebook Comment Downloader #
 2 | 
 3 | A small web app for downloading comments from a public Facebook page post.
 4 | The comment downloading code is adapted from https://github.com/minimaxir/facebook-page-post-scraper
 5 | 
 6 | ![web app screenshot](https://user-images.githubusercontent.com/6343384/32192303-0a58cc90-bd71-11e7-8c79-bf12a3203040.png)
 7 | 
 8 | Setup
 9 | -----
10 | 
11 | ```
12 | pip install -r requirements.txt
13 | ```
14 | 
15 | *Note: this will install [Gunicorn](http://gunicorn.org/) and [Gevent](http://www.gevent.org/). These packages are not required if you choose a different server.*
16 | 
17 | This application is set up to only download comments on posts from a specified public Facebook page. You will need to [register and configure a Facebook app](https://developers.facebook.com/docs/apps/register/). Once you've done this, set the `PAGE_ACCESS_TOKEN`, `PAGE_ID`, and `PAGE_NAME` environment variables, which `config.py` reads at startup.
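A minimal sketch of how the three settings could be checked before starting the app, assuming they are supplied through the `PAGE_ACCESS_TOKEN`, `PAGE_ID`, and `PAGE_NAME` environment variables named in `app.json` and `config.py`. The `load_settings` helper is hypothetical and is not part of this project:

```python
import os

# Names taken from app.json / config.py.
REQUIRED = ("PAGE_ACCESS_TOKEN", "PAGE_ID", "PAGE_NAME")

def load_settings(env=os.environ):
    """Hypothetical helper: return the settings or fail with a clear message."""
    # Treat unset and empty variables the same way.
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching the real environment, for example `load_settings({"PAGE_ACCESS_TOKEN": "t", "PAGE_ID": "1", "PAGE_NAME": "WSDOT"})`.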
18 | 
19 | To get comment author and reactions info you will need to use a [Page Access token](https://developers.facebook.com/docs/facebook-login/access-tokens/#pagetokens) from a user who has admin rights to the page. You can get a token by setting up a [system user](https://developers.facebook.com/docs/audience-network/reporting-api/systemuser/).
20 | 
21 | Be aware of the following restriction: 
22 | > Devmode Apps — Apps in Devmode are now rate-limited to 200 calls per hour, per page-app pair, and can only access Users who have a role on the app (admin, developer, or tester).
23 | 
24 | https://developers.facebook.com/docs/graph-api/changelog/breaking-changes
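As a rough illustration of what that limit means for this app, the sketch below estimates how many Graph API calls one download needs. It assumes comments are fetched 100 per call (the `limit` used in `get_fb_comments_from_fb.py`) and that every comment with replies costs at least one extra call. `estimate_calls` is a hypothetical helper, and its result is a lower bound, not a measured figure:

```python
import math

PAGE_SIZE = 100        # comments requested per call in get_fb_comments_from_fb.py
DEV_MODE_LIMIT = 200   # calls per hour quoted above for Devmode apps

def estimate_calls(top_level_comments, threads_with_replies=0):
    """Lower bound on Graph API calls for one post (hypothetical helper)."""
    # At least one call is made even if the post has no comments.
    pages = max(1, math.ceil(top_level_comments / PAGE_SIZE))
    # Each comment that has replies triggers at least one more request.
    return pages + threads_with_replies

calls = estimate_calls(1500, threads_with_replies=120)
print(calls, "calls needed; within the hourly limit:", calls <= DEV_MODE_LIMIT)
```

Posts with many reply threads reach the limit much sooner than the top-level comment count alone suggests.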
25 | 
26 | Deployment
27 | ----------
28 | This project is built with [Flask](http://flask.pocoo.org/).
29 | Hosting is up to you; the Flask documentation lists [some options](http://flask.pocoo.org/docs/0.12/deploying/).
30 | 
31 | Click below to deploy the app with [Gunicorn](http://gunicorn.org/) on Heroku.
32 | 
33 | [![Deploy](https://www.herokucdn.com/deploy/button.svg)](https://heroku.com/deploy)
34 | 
35 | Development Setup
36 | ----------------
37 | 
38 | ```
39 | pip install -r requirements-dev.txt
40 | ```
41 | 
42 | ##### Start Flask dev server 
43 | 
44 | `FLASK_APP=fb_comment_downloader_app.py flask run`
45 | 
46 | ##### Run tests
47 | 
48 | Run `python test_validation.py` from the `src` directory.
49 | Currently, the only tests check how Facebook urls are parsed.
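To illustrate the kind of check these tests make, here is a small self-contained sketch that pulls a post ID out of a url. It uses `urllib.parse` rather than the regular expressions in `validation.py`, so it is an alternative technique shown for illustration, not the project's implementation. `post_id_from_url` is a hypothetical name:

```python
from urllib.parse import urlsplit

def post_id_from_url(url):
    """Return the last non-empty path segment of a post url, or None."""
    # urlsplit separates the path from the query string, so a suffix such as
    # "?type=3&theater" never reaches the segment list.
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return segments[-1] if segments else None

# Two of the url shapes listed in test_data_urls.json.
urls = [
    "https://www.facebook.com/WSDOT/posts/10154885382271975",
    "https://www.facebook.com/WSDOT/photos/a.142223281974.122484.20110391974/10154885382271975/?type=3&theater",
]
for url in urls:
    print(post_id_from_url(url))
```

Unlike `get_post_id` in `validation.py`, this sketch does not check that the url belongs to Facebook.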
50 | 
51 | Contributing
52 | ------------
53 | 
54 | Find a bug? Got an idea? Send us a pull request or open an issue and we'll take a look. You can also check the issue tracker.
55 | 
56 | License
57 | -------
58 | 
59 | MIT
60 | 


--------------------------------------------------------------------------------
/app.json:
--------------------------------------------------------------------------------
 1 | {
 2 |   "name": "Facebook Comment Downloader",
 3 |   "description": "A web app for downloading Facebook comments as a csv file.",
 4 |   "keywords": [
 5 |     "python",
 6 |     "flask"
 7 |   ],
 8 |   "repository": "https://github.com/WSDOT/fb-comment-downloader",
 9 |   "env": {
10 |     "PAGE_ACCESS_TOKEN": {
11 |       "description": "Your Facebook page access token. Recommend using a permanent system user token.",
12 |       "value": "Replace with your page access token"
13 |     },
14 |     "PAGE_ID": {
15 |       "description": "ID of the public Facebook page where comments will be downloaded from.",
16 |       "value": "Replace with your page ID"
17 |     },
18 |     "PAGE_NAME": {
19 |       "description": "The name of your Facebook page where comments will be downloaded from.",
20 |       "value": "Replace with your page name"
21 |     }
22 |   },
23 |   "formation": {
24 |     "web": {
25 |       "quantity": 1,
26 |       "size": "free"
27 |     }
28 |   },
29 |   "image": "heroku/python"
30 | }
31 | 


--------------------------------------------------------------------------------
/requirements-dev.txt:
--------------------------------------------------------------------------------
1 | click==6.7
2 | ddt==1.1.1
3 | Flask==1.0.2
4 | itsdangerous==0.24
5 | Jinja2==2.11.3
6 | MarkupSafe==1.0
7 | Werkzeug==0.15.5
8 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | click==6.7
2 | Flask==1.0.2
3 | itsdangerous==0.24
4 | Jinja2==2.11.3
5 | MarkupSafe==1.0
6 | Werkzeug==0.15.5
7 | gunicorn==19.7.1
8 | gevent==1.2.2
9 | 


--------------------------------------------------------------------------------
/runtime.txt:
--------------------------------------------------------------------------------
1 | python-2.7.14
2 | 


--------------------------------------------------------------------------------
/scripts/run_dev_flask.sh:
--------------------------------------------------------------------------------
1 | FLASK_APP=../src/fb_comment_downloader_app.py flask run
2 | 


--------------------------------------------------------------------------------
/scripts/start_service.sh:
--------------------------------------------------------------------------------
1 | systemctl start fb-comment-downloader.service
2 | 


--------------------------------------------------------------------------------
/scripts/stop_service.sh:
--------------------------------------------------------------------------------
1 | systemctl stop fb-comment-downloader.service
2 | 


--------------------------------------------------------------------------------
/src/config.py:
--------------------------------------------------------------------------------
1 | import os
2 | 
3 | access_token = os.environ.get('PAGE_ACCESS_TOKEN') or "x" # .get() avoids a KeyError when the variable is unset
4 | 
5 | page_id = os.environ.get('PAGE_ID') or "-1" # ID of the public Facebook page where comments will be downloaded from.
6 | page_name = os.environ.get('PAGE_NAME') or "PAGE NAME" # Name of page, must be exactly as shown in page url.
7 | 


--------------------------------------------------------------------------------
/src/fb_comment_downloader_app.py:
--------------------------------------------------------------------------------
 1 | import io
 2 | import csv
 3 | import re
 4 | 
 5 | from flask import Flask
 6 | from flask import render_template
 7 | from flask import stream_with_context
 8 | from flask import Response
 9 | from flask import request
10 | from flask import jsonify
11 | 
12 | from werkzeug.datastructures import Headers
13 | 
14 | from validation import get_post_id
15 | from validation import get_page_name
16 | 
17 | from get_fb_comments_from_fb import scrapeFacebookPageFeedComments
18 | from get_fb_comments_from_fb import request_once
19 | 
20 | import config
21 | 
22 | app = Flask(__name__)
23 | 
24 | @app.route('/')
25 | def index():
26 |     return render_template('index.html', page_name=config.page_name)
27 | 
28 | @app.route('/', methods=['POST'])
29 | def index_post():
30 | 
31 |     error = None
32 |     url = request.form['text']
33 | 
34 |     if request_once(url) is None:
35 |         message="Please make sure you entered a valid url"
36 |         return jsonify({"error": message})
37 | 
38 |     if get_page_name(url) != config.page_name:
39 |         message="Please enter a post url for the {0} page".format(config.page_name)
40 |         return jsonify({"error": message})
41 | 
42 | 
43 |     post_id = get_post_id(url)
44 | 
45 |     if post_id is None:
46 |         message="Please make sure you entered a valid Facebook url"
47 |         return jsonify({"error": message})
48 | 
49 |     status_id = "{0}_{1}".format(config.page_id, post_id)
50 | 
51 |     si = io.StringIO()
52 |     cw = csv.writer(si)
53 | 
54 |     # add a filename
55 |     headers = Headers()
56 |     headers.set('Content-Disposition', 'attachment', filename='fb_comments.csv')
57 | 
58 |     # stream the response as the data is generated
59 |     return Response(
60 |         stream_with_context(scrapeFacebookPageFeedComments(
61 |             si,
62 |             cw,
63 |             config.page_id,
64 |             config.access_token,
65 |             status_id)),
66 |         mimetype='application/download', headers=headers
67 |     )
68 | 
69 | if __name__ == "__main__":
70 |         app.run()
71 | 


--------------------------------------------------------------------------------
/src/get_fb_comments_from_fb.py:
--------------------------------------------------------------------------------
  1 | # MIT License
  2 | #
  3 | # Copyright (c) 2017 Max Woolf
  4 | #
  5 | # Permission is hereby granted, free of charge, to any person
  6 | # obtaining a copy of this software and associated documentation
  7 | # files (the "Software"), to deal in the Software without
  8 | # restriction, including without limitation the rights to use,
  9 | # copy, modify, merge, publish, distribute, sublicense, and/or
 10 | # sell copies of the Software, and to permit persons to whom
 11 | # the Software is furnished to do so, subject to the following conditions:
 12 | 
 13 | # The above copyright notice and this permission notice shall be included
 14 | # in all copies or substantial portions of the Software.
 15 | 
 16 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 17 | # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 18 | # OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 19 | # NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 20 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 21 | # WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 22 | # ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
 23 | # THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 24 | #
 25 | # https://github.com/minimaxir/facebook-page-post-scraper
 26 | #
 27 | import json
 28 | import datetime
 29 | import csv
 30 | import time
 31 | 
 32 | try:
 33 |     from urllib.request import urlopen, Request
 34 | except ImportError:
 35 |     from urllib.request import urlopen, Request
 36 | 
 37 | # Modified to only attempt once - Logan Sims
 38 | def request_once(url):
 39 |     req = Request(url)
 40 |     try:
 41 |         response = urlopen(req)
 42 |         if response.getcode() == 200:
 43 |             success = True
 44 |     except Exception as e:
 45 |         print(e)
 46 |         print("Error for URL {}: {}".format(url, datetime.datetime.now()))
 47 |         return None
 48 | 
 49 |     return response.read()
 50 | 
 51 | # Needed to write tricky unicode correctly to csv
 52 | def unicode_decode(text):
 53 |     try:
 54 |         return text.encode('utf-8').decode()
 55 |     except UnicodeDecodeError:
 56 |         return text.encode('utf-8')
 57 | 
 58 | def getFacebookCommentFeedUrl(base_url):
 59 | 
 60 |     # Construct the URL string
 61 |     fields = "&fields=id,message,reactions.limit(0).summary(true)" + \
 62 |         ",created_time,comments,from,attachment"
 63 |     url = base_url + fields
 64 | 
 65 |     return url
 66 | 
 67 | def processFacebookComment(comment, status_id, parent_id=''):
 68 | 
 69 |     # The status is now a Python dictionary, so for top-level items,
 70 |     # we can simply call the key.
 71 | 
 72 |     # Additionally, some items may not always exist,
 73 |     # so must check for existence first
 74 | 
 75 |     comment_id = comment['id']
 76 |     comment_message = '' if 'message' not in comment or comment['message'] \
 77 |         == '' else unicode_decode(comment['message'])
 78 | 
 79 |     comment_author = unicode_decode(comment['from']['name'] if 'from' in comment else "user")
 80 | 
 81 |     if 'attachment' in comment:
 82 |         attachment_type = comment['attachment']['type']
 83 |         attachment_type = 'gif' if attachment_type == 'animated_image_share' \
 84 |             else attachment_type
 85 |         attach_tag = "[[{}]]".format(attachment_type.upper())
 86 |         comment_message = attach_tag if comment_message == '' else \
 87 |             comment_message + " " + attach_tag
 88 | 
 89 |     # Time needs special care since a) it's in UTC and
 90 |     # b) it's not easy to use in statistical programs.
 91 | 
 92 |     comment_published = datetime.datetime.strptime(
 93 |         comment['created_time'], '%Y-%m-%dT%H:%M:%S+0000')
 94 |     comment_published = comment_published + datetime.timedelta(hours=-5)  # EST
 95 |     comment_published = comment_published.strftime(
 96 |         '%Y-%m-%d %H:%M:%S')  # best time format for spreadsheet programs
 97 | 
 98 |     # Return a tuple of all processed data
 99 | 
100 |     return (comment_id, status_id, parent_id, comment_message, comment_author,
101 |             comment_published)
102 | 
103 | # Modified to yield contents of CSV - Logan Sims
104 | def scrapeFacebookPageFeedComments(stringIO, writer, page_id, access_token, status_id):
105 |     writer.writerow(["comment_id", "status_id", "parent_id", "comment_message",
106 |                 "comment_author", "comment_published"])
107 | 
108 |     num_processed = 0
109 |     scrape_starttime = datetime.datetime.now()
110 |     after = ''
111 |     base = "https://graph.facebook.com/v2.9"
112 |     parameters = "/?limit={}&access_token={}".format(
113 |         100, access_token)
114 | 
115 |     print("Scraping {} Comments From Posts: {}\n".format(page_id, scrape_starttime))
116 | 
117 |     reader = [dict(status_id=status_id)]
118 | 
119 |     for status in reader:
120 |         has_next_page = True
121 | 
122 |         while has_next_page:
123 | 
124 |             node = "/{}/comments".format(status['status_id'])
125 |             after = '' if after == '' else "&after={}".format(after)
126 |             base_url = base + node + parameters + after
127 | 
128 |             url = getFacebookCommentFeedUrl(base_url)
129 | 
130 |             data = request_once(url)
131 | 
132 |             if data is None:
133 |                 writer.writerow(['url error'])
134 |                 yield stringIO.getvalue()
135 |                 stringIO.seek(0)
136 |                 stringIO.truncate(0)
137 |                 return  # end the generator; raising StopIteration here is a RuntimeError on Python 3.7+ (PEP 479)
138 |                 
139 |             # python 3.6+ decodes automatically 
140 |             try:
141 |                 comments = json.loads(data)
142 |             except TypeError:
143 |                 comments = json.loads(data.decode('utf-8'))
144 | 
145 |             for comment in comments['data']:
146 |                 comment_data = processFacebookComment(
147 |                     comment, status['status_id'])
148 | 
149 |                 writer.writerow(comment_data)
150 |                 yield stringIO.getvalue()
151 |                 stringIO.seek(0)
152 |                 stringIO.truncate(0)
153 | 
154 |                 if 'comments' in comment:
155 |                     has_next_subpage = True
156 |                     sub_after = ''
157 | 
158 |                     while has_next_subpage:
159 |                         sub_node = "/{}/comments".format(comment['id'])
160 |                         sub_after = '' if sub_after == '' else "&after={}".format(
161 |                             sub_after)
162 |                         sub_base_url = base + sub_node + parameters + sub_after
163 | 
164 |                         sub_url = getFacebookCommentFeedUrl(
165 |                             sub_base_url)
166 |                         sub_comments = json.loads(
167 |                             request_once(sub_url))
168 | 
169 |                         for sub_comment in sub_comments['data']:
170 |                             sub_comment_data = processFacebookComment(
171 |                                 sub_comment, status['status_id'], comment['id'])
172 | 
173 |                             writer.writerow(sub_comment_data)
174 |                             yield stringIO.getvalue()
175 |                             stringIO.seek(0)
176 |                             stringIO.truncate(0)
177 | 
178 |                             num_processed += 1
179 |                             if num_processed % 100 == 0:
180 |                                 print("{} Comments Processed: {}".format(num_processed,datetime.datetime.now()))
181 | 
182 |                         if 'paging' in sub_comments:
183 |                             if 'next' in sub_comments['paging']:
184 |                                 sub_after = sub_comments[
185 |                                     'paging']['cursors']['after']
186 |                             else:
187 |                                 has_next_subpage = False
188 |                         else:
189 |                             has_next_subpage = False
190 | 
191 |                 # output progress occasionally to make sure code is not
192 |                 # stalling
193 |                 num_processed += 1
194 |                 if num_processed % 100 == 0:
195 |                     print("{} Comments Processed: {}".format(num_processed, datetime.datetime.now()))
196 | 
197 |             if 'paging' in comments:
198 |                 if 'next' in comments['paging']:
199 |                     after = comments['paging']['cursors']['after']
200 |                 else:
201 |                     has_next_page = False
202 |             else:
203 |                 has_next_page = False
204 | 
205 |         print("\nDone!\n{} Comments Processed in {}".format(num_processed, datetime.datetime.now() - scrape_starttime))
206 | 


--------------------------------------------------------------------------------
/src/static/loader.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WSDOT/fb-comment-downloader/630c0e5b44a0f17b477df1879ac852b12dbb1060/src/static/loader.gif


--------------------------------------------------------------------------------
/src/static/script.js:
--------------------------------------------------------------------------------
 1 | function setup(){
 2 | 
 3 | // Set up loading spinner
 4 | var $body = $("body");
 5 | $(document).on({
 6 |   ajaxStart: function() { $body.addClass("loading");    },
 7 |   ajaxStop: function() { $body.removeClass("loading"); }    
 8 | });
 9 | 
10 | // Get comments, generate download button when complete
11 | $( "#url-form" ).submit(function( event ) {
12 | 
13 |   $("#download").empty()
14 |   $("#error").empty()
15 | 
16 |   var form = $(this);
17 |   $.ajax({ 
18 |     url   : form.attr('action'),
19 |     type  : form.attr('method'),
20 |     data  : form.serialize(), // data to be submitted
21 |     success: function(response){
22 |       if (response.error) {
23 |         console.log(response.error)
24 |         $("<p class=\"error-message\">" + response.error + "</p>").appendTo("#error");    
25 |       
26 |       } else {
27 |         var downloadLink = "<a class=\"download-button\" href=\"data:application/csv;charset=utf-8," + 
28 |           encodeURIComponent(response)  + "\" download=\"comments.csv\">" + 
29 |           "<i class=\"material-icons\">&#xE2C4;</i> Download Comments</a>";
30 | 
31 |         $(downloadLink).appendTo("#download");
32 |       }
33 |     },
34 |     error: function(error) {
35 |       alert("error")
36 |     }
37 |   });
38 |   event.preventDefault();
39 | });
40 | 
41 | }
42 | 


--------------------------------------------------------------------------------
/src/static/style.css:
--------------------------------------------------------------------------------
 1 | h3, p {
 2 |   font-family: Arial, sans-serif;
 3 | }
 4 | 
 5 | input[type=text], select {
 6 |   font-family: Arial, sans-serif;
 7 |   font-size: 10pt;
 8 |   width: 100%;
 9 |   padding: 12px 20px;
10 |   margin: 8px 0;
11 |   display: inline-block;
12 |   border: 1px solid #ccc;
13 |   border-radius: 4px;
14 |   box-sizing: border-box;
15 | }
16 | 
17 | input[type=submit], .download-button {
18 |   font-family: Arial, sans-serif;
19 |   width: 100%;
20 |   color: white;
21 |   padding: 14px 20px;
22 |   font-size: 12pt;
23 |   margin: 8px 0;
24 |   border: none;
25 |   border-radius: 4px;
26 |   cursor: pointer;
27 | }
28 | 
29 | input[type=submit], .download-button {
30 |   background-color: #00795F;
31 | }
32 | 
33 | input[type=submit]:hover {
34 |   background-color: #004F50;
35 | }
36 | 
37 | .download-button {
38 |   background-color: #007A99;
39 |   text-decoration: none;
40 | }
41 | 
42 | .info-1 {
43 |   font-family: Arial, sans-serif;
44 |   font-size: 12pt;
45 | }
46 | 
47 | .info-2 {
48 |   font-family: Arial, sans-serif;
49 |   font-size: 10pt;
50 |   padding-bottom: 14px;
51 | }
52 | 
53 | .error-message {
54 |   color: #af2b2b;
55 | }
56 | 
57 | .form {
58 |   width: 50%;
59 |   border-radius: 5px;
60 |   background-color: #f2f2f2;
61 |   padding: 20px;
62 |   display: inline-block;
63 | } 
64 | 
65 | .material-icons, .icon-text {
66 |   vertical-align: middle;
67 | }
68 | 
69 | .wrapper {
70 |   text-align: center;
71 | }
72 | 
73 | /* loading spinner */
74 | .modal {
75 |     display:    none;
76 |     position:   fixed;
77 |     z-index:    1000;
78 |     top:        0;
79 |     left:       0;
80 |     height:     100%;
81 |     width:      100%;
82 |     background: rgba( 255, 255, 255, .8 ) 
83 |                 url('loader.gif') 
84 |                 50% 50% 
85 |                 no-repeat;
86 | }
87 | 
88 | body.loading {
89 |     overflow: hidden;   
90 | }
91 | 
92 | body.loading .modal {
93 |     display: block;
94 | }
95 | 


--------------------------------------------------------------------------------
/src/templates/index.html:
--------------------------------------------------------------------------------
 1 | <!doctype html>
 2 | {% block head %}
 3 |   <meta charset=utf-8>
 4 |   <title>Get Comments</title>  
 5 |   <link rel=stylesheet type=text/css href="{{ url_for('static', filename='style.css') }}">
 6 |   <link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet">
 7 |   
 8 |   <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
 9 |   <script type="text/javascript" src="static/script.js"></script>  
10 | {% endblock %}
11 | 
12 | <script>
13 |   window.onload = function() {
14 |     setup();
15 |   };
16 | </script>
17 | 
18 | {% block body %}
19 | <div class="wrapper">
20 |   <div class="form">
21 |     <h3>Enter a {{ page_name }} Facebook post url</h3>
22 |     <p class="info-1">
23 |       Click a post's timestamp to get the url
24 |     </p>
25 |     <form id="url-form" action="." method="POST">
26 |       <input type="text" name="text">
27 |       <input type="submit" value="Get Comments">
28 |     </form>
29 |     
30 |     <p class="info-2">
31 |     This may take a while for posts with a large number of comments
32 |     </p>    
33 | 
34 |     <div id="download"></div>
35 |     <div id="error"></div>
36 | 
37 |   </div>
38 | </div>
39 | 
40 | <div class="modal"></div>
41 | 
42 | {% endblock %}
43 | 


--------------------------------------------------------------------------------
/src/test_data_urls.json:
--------------------------------------------------------------------------------
1 | [
2 |   "https://www.facebook.com/WSDOT/photos/a.142223281974.122484.20110391974/10154885382271975/?type=3&theater", 
3 |   "https://www.facebook.com/WSDOT/photos/a.142223281974.122484.20110391974/10154885382271975/", 
4 |   "https://www.facebook.com/WSDOT/photos/a.142223281974.122484.20110391974/10154885382271975", 
5 |   "https://www.facebook.com/WSDOT/posts/10154885382271975", 
6 |   "https://www.facebook.com/WSDOT/posts/10154885382271975/",
7 |   "https://business.facebook.com/WSDOT/posts/10154885382271975"
8 | ] 
9 | 


--------------------------------------------------------------------------------
/src/test_validation.py:
--------------------------------------------------------------------------------
 1 | import unittest
 2 | from ddt import ddt, file_data
 3 | 
 4 | from validation import get_post_id_from_fb_url
 5 | from validation import get_page_name_from_fb_url
 6 | 
 7 | expected_post_id = "10154885382271975"
 8 | expected_page_name = "WSDOT"
 9 | 
10 | @ddt
11 | class ValidationTestCase(unittest.TestCase):
12 |     
13 |     @file_data('test_data_urls.json')
14 |     def test_post_id_match(self, test_url):
15 | 
16 |         post_id = get_post_id_from_fb_url(test_url)
17 | 
18 |         self.assertEqual(
19 |                 post_id,
20 |                 expected_post_id,
21 |                 'post_id {0} did not match expected id {1} '.format(post_id, expected_post_id))
22 | 
23 | 
24 |     @file_data('test_data_urls.json')
25 |     def test_page_name_match(self, test_url):
26 | 
27 |         page_name = get_page_name_from_fb_url(test_url)
28 | 
29 |         self.assertEqual(
30 |                 page_name,
31 |                 expected_page_name,
32 |                 'page_name {0} did not match expected name {1} '.format(page_name, expected_page_name))
33 | 
34 | if __name__ == '__main__':
35 |         unittest.main(verbosity=2)
36 | 
37 | 


--------------------------------------------------------------------------------
/src/validation.py:
--------------------------------------------------------------------------------
 1 | import re
 2 | 
 3 | def get_post_id(url):
 4 |     if is_fb_url(url):
 5 |         return get_post_id_from_fb_url(url)
 6 |     else:
 7 |         return None
 8 | 
 9 | # gets a post_id from a facebook url
10 | # possible fb urls: https://developers.facebook.com/docs/plugins/oembed-endpoints
11 | def get_post_id_from_fb_url(url):
12 | 
13 |     #remove any query string
14 |     matches = re.search(r'(\/[^?]+).*', url)
15 | 
16 |     if matches:
17 |         matches_list = matches.group(1).split('/')
18 |         filtered_matches_list = [_f for _f in matches_list if _f]
19 |         return filtered_matches_list[-1]
20 | 
21 |     else:
22 |         return None
23 | 
24 | def get_page_name(url):
25 |     if is_fb_url(url):
26 |         return get_page_name_from_fb_url(url)
27 |     else:
28 |         return None
29 | 
30 | def get_page_name_from_fb_url(url):
31 | 
32 |     # Matches the page name from a facebook url.
33 |     # https://gist.github.com/marcgg/733592/ae0ca10a7a344140abf8e9bb890868e872c39756
34 |     matches = re.search(r'^(?:https?:\/\/)?(?:www\.|m\.|touch\.|business\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*?(\/)?([^/?\s]*)(?:/|&|\?)?.*$', url)
35 | 
36 |     if matches is None:
37 |         return matches
38 |     else:
39 |         return matches.group(2)
40 | 
41 | 
42 | def is_fb_url(url):
43 |     matches = re.search(r'^(?:https?:\/\/)?(?:www\.)?(?:business\.)?facebook\.com(?=\/[a-zA-Z0-9(\.\?)?])', url)
44 | 
45 |     if matches:
46 |         return True
47 |     else:
48 |         return False
49 | 


--------------------------------------------------------------------------------