├── .github
│   ├── FUNDING.yml
│   ├── ISSUE_TEMPLATE
│   │   └── bug_report.md
│   └── PULL_REQUEST_TEMPLATE.md
├── .whitesource
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE.md
├── README.md
├── TODO.md
├── _config.yml
├── optracker
│   ├── __init__.py
│   ├── data
│   │   ├── __init__.py
│   │   └── face_models
│   │       ├── dlib_face_recognition_resnet_model_v1.dat
│   │       ├── mmod_human_face_detector.dat
│   │       └── shape_predictor_5_face_landmarks.dat
│   ├── facerec
│   │   ├── __init__.py
│   │   ├── api.py
│   │   └── facerec.py
│   ├── functions
│   │   ├── __init__.py
│   │   ├── core_func.py
│   │   ├── db_func.py
│   │   ├── instagram_func.py
│   │   └── side_func.py
│   ├── igramscraper
│   │   ├── __init__.py
│   │   ├── endpoints.py
│   │   ├── exception
│   │   │   ├── __init__.py
│   │   │   ├── instagram_auth_exception.py
│   │   │   ├── instagram_exception.py
│   │   │   └── instagram_not_found_exception.py
│   │   ├── helper.py
│   │   ├── instagram.py
│   │   ├── model
│   │   │   ├── __init__.py
│   │   │   ├── account.py
│   │   │   ├── carousel_media.py
│   │   │   ├── comment.py
│   │   │   ├── initializer_model.py
│   │   │   ├── location.py
│   │   │   ├── media.py
│   │   │   ├── story.py
│   │   │   ├── tag.py
│   │   │   └── user_stories.py
│   │   ├── session_manager.py
│   │   └── two_step_verification
│   │       ├── __init__.py
│   │       ├── console_verification.py
│   │       └── two_step_verification_abstract_class.py
│   ├── optracker.py
│   └── zerodata.py
├── requirements.txt
├── run_tracker.py
└── setup.py
/.github/FUNDING.yml:
--------------------------------------------------------------------------------
1 | # These are supported funding model platforms
2 | open_collective: # Replace with a single Open Collective username
3 | ko_fi: mknoph
4 | tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
5 | issuehunt: # Replace with a single IssueHunt username
6 | otechie: # Replace with a single Otechie username
7 |
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Bug report
3 | about: Create a report to help us improve
4 | title: ''
5 | labels: ''
6 | assignees: ''
7 |
8 | ---
9 |
10 | **Describe the bug**
11 | A clear and concise description of what the bug is.
12 |
13 | **To Reproduce**
14 | Steps to reproduce the behavior:
15 | 1. Go to '...'
16 | 2. Click on '....'
17 | 3. Scroll down to '....'
18 | 4. See error
19 |
20 | **Expected behavior**
21 | A clear and concise description of what you expected to happen.
22 |
23 | **Screenshots**
24 | If applicable, add screenshots to help explain your problem.
25 |
26 | **Desktop (please complete the following information):**
27 | - OS: [e.g. iOS]
28 | - Browser [e.g. chrome, safari]
29 | - Version [e.g. 22]
30 |
31 | **Additional context**
32 | Add any other context about the problem here.
33 |
--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
1 | # Description
2 |
3 | Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
4 |
5 | Fixes # (issue)
6 |
7 | ## Type of change
8 |
9 | Please delete options that are not relevant.
10 |
11 | - [ ] Bug fix (non-breaking change which fixes an issue)
12 | - [ ] New feature (non-breaking change which adds functionality)
13 | - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
14 | - [ ] This change requires a documentation update
15 |
16 | # How Has This Been Tested?
17 |
18 | Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
19 |
20 |
21 | # Checklist:
22 |
23 | - [ ] My code follows the style guidelines of this project
24 | - [ ] I have performed a self-review of my own code
25 | - [ ] I have commented my code, particularly in hard-to-understand areas
26 | - [ ] I have made corresponding changes to the documentation
27 | - [ ] My changes generate no new warnings
28 | - [ ] I have added tests that prove my fix is effective or that my feature works
29 | - [ ] New and existing unit tests pass locally with my changes
30 | - [ ] Any dependent changes have been merged and published in downstream modules
--------------------------------------------------------------------------------
/.whitesource:
--------------------------------------------------------------------------------
1 | {
2 | "checkRunSettings": {
3 | "vulnerableCheckRunConclusionLevel": "failure"
4 | },
5 | "issueSettings": {
6 | "minSeverityLevel": "LOW"
7 | }
8 | }
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 |

2 |
3 | # Contributor Covenant Code of Conduct
4 |
5 | ## Our Pledge
6 |
7 | In the interest of fostering an open and welcoming environment, we as
8 | contributors and maintainers pledge to making participation in our project and
9 | our community a harassment-free experience for everyone, regardless of age, body
10 | size, disability, ethnicity, sex characteristics, gender identity and expression,
11 | level of experience, education, socio-economic status, nationality, personal
12 | appearance, race, religion, or sexual identity and orientation.
13 |
14 | ## Our Standards
15 |
16 | Examples of behavior that contributes to creating a positive environment
17 | include:
18 |
19 | * Using welcoming and inclusive language
20 | * Being respectful of differing viewpoints and experiences
21 | * Gracefully accepting constructive criticism
22 | * Focusing on what is best for the community
23 | * Showing empathy towards other community members
24 |
25 | Examples of unacceptable behavior by participants include:
26 |
27 | * The use of sexualized language or imagery and unwelcome sexual attention or
28 | advances
29 | * Trolling, insulting/derogatory comments, and personal or political attacks
30 | * Public or private harassment
31 | * Publishing others' private information, such as a physical or electronic
32 | address, without explicit permission
33 | * Other conduct which could reasonably be considered inappropriate in a
34 | professional setting
35 |
36 | ## Our Responsibilities
37 |
38 | Project maintainers are responsible for clarifying the standards of acceptable
39 | behavior and are expected to take appropriate and fair corrective action in
40 | response to any instances of unacceptable behavior.
41 |
42 | Project maintainers have the right and responsibility to remove, edit, or
43 | reject comments, commits, code, wiki edits, issues, and other contributions
44 | that are not aligned to this Code of Conduct, or to ban temporarily or
45 | permanently any contributor for other behaviors that they deem inappropriate,
46 | threatening, offensive, or harmful.
47 |
48 | ## Scope
49 |
50 | This Code of Conduct applies both within project spaces and in public spaces
51 | when an individual is representing the project or its community. Examples of
52 | representing a project or community include using an official project e-mail
53 | address, posting via an official social media account, or acting as an appointed
54 | representative at an online or offline event. Representation of a project may be
55 | further defined and clarified by project maintainers.
56 |
57 | ## Enforcement
58 |
59 | Instances of abusive, harassing, or otherwise unacceptable behavior may be
60 | reported by contacting the project team at marcuscrazy@gmail.com. All
61 | complaints will be reviewed and investigated and will result in a response that
62 | is deemed necessary and appropriate to the circumstances. The project team is
63 | obligated to maintain confidentiality with regard to the reporter of an incident.
64 | Further details of specific enforcement policies may be posted separately.
65 |
66 | Project maintainers who do not follow or enforce the Code of Conduct in good
67 | faith may face temporary or permanent repercussions as determined by other
68 | members of the project's leadership.
69 |
70 | ## Attribution
71 |
72 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
73 | available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
74 |
75 | [homepage]: https://www.contributor-covenant.org
76 |
77 | For answers to common questions about this code of conduct, see
78 | https://www.contributor-covenant.org/faq
79 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | # Contributing
4 |
5 | When contributing to this repository, please first discuss the change you wish to make via issue,
6 | email, or any other method with the owners of this repository before making a change.
7 |
8 | Please note we have a code of conduct, please follow it in all your interactions with the project.
9 |
10 | ## Pull Request Process
11 |
12 | 1. Ensure any install or build dependencies are removed before the end of the layer when doing a
13 | build.
14 | 2. Update the README.md with details of changes to the interface, this includes new environment
15 | variables, exposed ports, useful file locations and container parameters.
16 | 3. Increase the version numbers in any examples files and the README.md to the new version that this
17 | Pull Request would represent. The versioning scheme we use is [SemVer](http://semver.org/).
18 | 4. You may merge the Pull Request in once you have the sign-off of two other developers, or if you
19 | do not have permission to do that, you may request the second reviewer to merge it for you.
20 |
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019 Marcus Knoph
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 | 
3 | 
4 | 
5 | 
6 | 
7 | 
8 | 
9 |
10 |
11 | # openSource Tracker
12 | An easy-to-use program for scraping open sources. It saves the data and lets you analyze it in your favorite graph visualization tool. I based the project on instagram-scraper, which allows you to get data from Instagram without the API. The goal of this project is to make it easy for everyone to gather open-source content and analyze it.
13 |
14 |
15 |
16 | ## How to install
17 | ***Simply run:***
18 | ```cmd
19 | pip install optracker
20 | ```
21 |
22 | ***Or download the project via git clone and run the following:***
23 | ```cmd
24 | pip install -r requirements.txt
25 | python .\run_tracker.py
26 | ```
27 |
28 | ## Getting Started
29 | The projects found here are for my own study, for confirming and testing theories in social network analysis. They can be used and altered as you see fit. To use the program you need to install the required Python libraries; see How to install.
30 |
31 | ### 1. Running the program
32 | To run, simply type **optracker** in the console if you installed it with pip. If you downloaded it from GitHub, run **python .\run_tracker.py** from the optracker directory.
33 |
34 | ***NB! You will need to run the script as administrator if you are using Windows.***
35 |
36 | ### 2. Userlist
37 | The program needs working accounts to operate. You can either add them manually when you run it for the first time, or create a local file with usernames and passwords; these are then added to the database automatically on startup. In my experience you need more than one user account to scan a large list of users, so that your account doesn't get blocked because of too many requests.
38 |
39 | >**The following user lists can be created:**
40 | >- inst_user.txt
41 | >- face_user.txt
42 | >- user_list.txt
43 |
44 | You don't need a separate list for Facebook and Instagram, but some people prefer it. You can put all the user data in the same file by using user_list.txt. All the files use the same format anyway.
45 |
46 | ```python
47 | #Setup for user_list file
48 | {USERNAME}, {PASSWORD}, {EMAIL}, {FULLNAME}, {ACCOUNT}
49 | ```
50 | The account type can be **facebook** or **instagram**. It tells the program which account to use where.
51 |
52 | ```python
53 | #Example of insta_user.txt
54 | my_username, my_password, my_email, my_fullname, instagram
55 | my_username2, my_password2, my_email2, my_fullname2, facebook
56 | my_username3, my_password3, my_email3, my_fullname3, instagram
57 | ```
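
The format above can be read with a few lines of Python. This is a minimal sketch, assuming fields are separated by commas and that no field (including the password) contains a comma itself. `Account` and `parse_user_list` are hypothetical names used for illustration and are not part of optracker.

```python
from typing import List, NamedTuple


class Account(NamedTuple):
    """One line of a user list file (hypothetical container, not from optracker)."""
    username: str
    password: str
    email: str
    fullname: str
    account_type: str


VALID_TYPES = {"facebook", "instagram"}


def parse_user_list(text: str) -> List[Account]:
    """Parse '{USERNAME}, {PASSWORD}, {EMAIL}, {FULLNAME}, {ACCOUNT}' lines."""
    accounts = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 5:
            raise ValueError(f"line {number}: expected 5 fields, got {len(fields)}")
        account_type = fields[4].lower()
        if account_type not in VALID_TYPES:
            raise ValueError(f"line {number}: unknown account type {fields[4]!r}")
        accounts.append(Account(fields[0], fields[1], fields[2], fields[3], account_type))
    return accounts


sample = """my_username, my_password, my_email, my_fullname, instagram
my_username2, my_password2, my_email2, my_fullname2, facebook
"""
for account in parse_user_list(sample):
    print(account.username, account.account_type)
```

Rejecting malformed lines early makes a typo in the list show up at startup instead of as a failed login later.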
58 |
59 | The **user list** is reloaded each time you start the program, so new users can be added directly to the .txt file, or you can add them manually in the program at startup.
60 |
61 | **The user list** is placed in the root directory. Usually this is ***c:\optracker*** or ***\optracker*** on Linux.
62 | ```cmd
63 | optracker/
64 |     userlist.txt
65 |     db/
66 |         openSource-tracker.db
67 |     export/
68 |         node.csv
69 |         egdes.csv
70 | ```
71 |
72 |
73 | ### 3. How to use
74 | When you run the program it first tries to connect to Instagram. If you don't have a user file, you will be asked to enter a username and password. After that you can choose from a menu. Start by running a single scan of one account. Then you can run more single scans to grow your node database, or use the follow-by scan options. There is also a help menu that gives you all the information you need.
75 |
76 | > ### Root Folder
77 | > The root folder for the program is the lowest directory. Usually it is ***c:\optracker*** or ***\optracker*** on Linux.
78 |
79 | ### 4. First time scraping
80 | The first time you scrape, all the users are saved as nodes. This takes some time, since we also want to save all the info we can get for each node. During this a lot of requests are sent to the target server, and as a result some of your user accounts may be blocked because of too many requests in a short time. Later, when you scrape Instagram for example, the program checks whether the node already exists in your database; if so, it only adds the connections it finds, and the number of requests to the server drops. In short: the bigger your node base, the faster you can scrape and the fewer requests are made.
81 |
82 | ### 5. Scan all follower
83 | You are presented with a list of users that you have finished adding to your database. The program then scans all the connections it has that are not private, adds the nodes to the DB and the connections to edges.
84 |
85 | ### 6. Max Follows and Max Followed by
85 | During **Scan all follower**, where you scan the profile of one user that has completed the single search, you can set a limit on how many followers a user can have or how many accounts it follows. This prevents scanning uninteresting profiles such as public organizations, which can have up to 10K. The default is 2000, which is considered a normal number of follows/followed by.
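
The limit check described above can be sketched as a simple filter. This is only an illustration under the assumption that a profile is skipped when either count exceeds its limit; `within_limits` is a hypothetical helper, not a function from optracker.

```python
DEFAULT_MAX = 2000  # default limit mentioned in the text above


def within_limits(follows: int, followed_by: int,
                  max_follows: int = DEFAULT_MAX,
                  max_followed_by: int = DEFAULT_MAX) -> bool:
    """Return True when both counts are at or below their limits."""
    return follows <= max_follows and followed_by <= max_followed_by


# (name, follows, followed_by) tuples with made-up numbers
profiles = [("private_person", 310, 450), ("public_org", 120, 10000)]
to_scan = [name for name, follows, followed_by in profiles
           if within_limits(follows, followed_by)]
print(to_scan)
```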
87 |
88 | ### 7. Deepscan and Surfacescan
89 | With surfacescan turned on you only extract the username and Instagram ID when scraping. This saves requests to the server, so you can use one user for a longer period of time, and makes the scan go quicker if you are scraping a big network. You can later add specific users found in the graph to a text file and run a deep scan on only the interesting ones to get all their data.
90 |
91 | ### 8. Deepscan from list
92 | Lets you run a deep scan on a selected list of users. It scrapes all the data from Instagram for the selected users and updates the DB nodes. You need to create a file in the **ROOT FOLDER** called **user_scan_insta.txt**.
93 | ```cmd
94 | optracker/
95 |     userlist.txt
96 |     user_scan_insta.txt
97 |     db/
98 |         openSource-tracker.db
99 |     export/
100 |         node.csv
101 |         egdes.csv
102 | ```
103 | The list needs to contain one username per line:
104 | ```python
105 | {USER 1}
106 | {USER 2}
107 | {USER 3}
108 | {USER 4}
109 | ```
110 | ### 9. Detail Print
111 | By default it is turned **OFF**, and you only get the minimum of info needed to see that the program is working properly. If you turn it **ON**, you are presented with all the output the scraper has.
112 |
113 | ### 10. Download Profile Image
114 | The program downloads every Instagram profile image it scans, for use in face recognition. It saves them to **profile_pic_insta**. You can turn this off from the default value menu.
115 |
116 | ```cmd
117 | optracker/
118 |     userlist.txt
119 |     user_scan_insta.txt
120 |     db/
121 |         openSource-tracker.db
122 |     export/
123 |         node.csv
124 |         egdes.csv
125 |     instadata/
126 |         profile_pic_insta/
127 |             /**FIRST TWO IN ID**
128 |                 /**SECOND TWO IN ID**
129 |                     /**INSTA USER**-**INT INC**.jpg
130 |         post/
131 | ```
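
The nested image folders above can be illustrated with a small path builder. This is a sketch under several assumptions: "first two" and "second two" are read as characters 1-2 and 3-4 of the Instagram ID, "INT INC" as an incrementing integer, and `instadata/` as a folder directly under the root. `profile_image_path` is a hypothetical name and does not come from optracker.

```python
from pathlib import PurePosixPath


def profile_image_path(root: str, insta_id: str, username: str, index: int) -> PurePosixPath:
    """Build <root>/instadata/profile_pic_insta/<id[0:2]>/<id[2:4]>/<user>-<index>.jpg."""
    if len(insta_id) < 4:
        raise ValueError("this sketch needs an ID of at least four characters")
    return (PurePosixPath(root) / "instadata" / "profile_pic_insta"
            / insta_id[:2] / insta_id[2:4] / f"{username}-{index}.jpg")


print(profile_image_path("optracker", "1234567890", "some_user", 1))
```

Splitting on ID prefixes keeps the number of files per directory small, which helps when many thousands of profile images are stored.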
132 |
133 | ### 11. Update Profile Image
134 | Running this checks the DB against the profile image folder and downloads all the images that are missing.
135 |
136 | ### 12. Change default value
137 | From the menu you can change default values such as surfacescan, max follow, MySQL or SQLite, and more. To change them select yes and fill in the new value; if you don't want to change a value, leave it blank.
138 |
139 | ### 13. Face reco-
140 |
141 | ## Database Information
142 | By default the scraper uses **SQLite**; all the data is stored in **optracker/db/openSource-tracker.db**.
143 |
144 | > **MySQL** is also available. The version currently tested and found OK is **MySQL 8.0.18**. You can change the database settings in the menu, but you need to download and install the latest version of MySQL and create a database called **openSource-tracker**, unless you have an online instance you want to use instead of a local one. Also remember to use **utf8mb4**. The following are the defaults:
145 | > * DB_MYSQL = "localhost"
146 | > * DB_MYSQL_USER = "optracker"
147 | > * DB_MYSQL_PASSWORD = "localpassword"
148 | > * DB_MYSQL_DATABASE = "openSource_tracker"
149 | > * DB_MYSQL_PORT = "3306"
150 | > * DB_MYSQL_ON = 0
151 | > * DB_MYSQL_COLLATION = "utf8mb4_general_ci"
152 | > * DB_MYSQL_CHARSET = "utf8mb4"
153 | >
154 | >***Scraping large amounts of data can be really slow with SQLite, so MySQL is an option if you plan on collecting huge amounts.***
155 |
156 | **The database consists of the following tables:**
157 | - accounts
158 | - edges_insta
159 | - nodes
160 | - options
161 | - new_insta
162 |
163 | > **Note!** All SQL settings are saved in **optracker.config**, located in the root folder. The file is in JSON format and you can change it to match your current DB, but I recommend keeping the standard settings.
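
A JSON settings file like this can be merged over the defaults listed above. The sketch below assumes, without confirmation from the source, that the keys in **optracker.config** use the same names as the `DB_MYSQL_*` defaults; `load_db_config` is a hypothetical helper.

```python
import json

# Defaults copied from the list above
DEFAULTS = {
    "DB_MYSQL": "localhost",
    "DB_MYSQL_USER": "optracker",
    "DB_MYSQL_PASSWORD": "localpassword",
    "DB_MYSQL_DATABASE": "openSource_tracker",
    "DB_MYSQL_PORT": "3306",
    "DB_MYSQL_ON": 0,
    "DB_MYSQL_COLLATION": "utf8mb4_general_ci",
    "DB_MYSQL_CHARSET": "utf8mb4",
}


def load_db_config(config_text: str) -> dict:
    """Return the defaults overridden by whatever the JSON text provides."""
    loaded = json.loads(config_text)
    if not isinstance(loaded, dict):
        raise ValueError("expected a JSON object at the top level")
    config = dict(DEFAULTS)  # copy so the defaults stay untouched
    config.update(loaded)
    return config


# Assumed key names; the real file may be laid out differently
config = load_db_config('{"DB_MYSQL_ON": 1, "DB_MYSQL": "db.example.org"}')
print(config["DB_MYSQL"], config["DB_MYSQL_ON"], config["DB_MYSQL_CHARSET"])
```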
164 |
165 |
166 | ### 1. Accounts
167 | Stores all your usernames and passwords for the different openSource sites.
168 |
169 | ### 2. Edges_insta
170 | Holds a list of all the connections. The columns are target, source, weight and type. It is made to be used with Gephi for visualizing the data in graph form. The numbers refer to the IDs in nodes and show who is following or connected to whom.
171 |
172 | ### 3. nodes
173 | A list of all the nodes created. Each has its own ID. It also contains all the information scraped about a single user, such as username, email, bio and so on, found on the different scraped sites.
174 |
175 | ### 4. Options
176 | Temporary table to store information like follow list, last search and so on for the program to use.
177 |
178 | ### 5. New_insta
179 | This table holds a list of all Instagram accounts found during scraping. The program uses it to see which accounts have not yet been fully scraped. When an account is finished it is set to DONE. If you don't want an account to be scraped, set its WAIT value to True (0 = False, 1 = True). ***This can also be used when a user has too many followers, or none at all, so you don't want to scan it. When the user comes up, the scanner skips it.***
180 |
181 | ### 6. Export
182 | To export the data you can connect to the DB file under the db/ folder, or export it from the program: from the main menu choose export. It then generates two files, **nodes.csv** and **egdes.csv**, which you can import into your favorite graph visualization tool.
183 |
184 | ## Common Error
185 | ### 1. F String
186 | ```python
187 | Traceback (most recent call last):
188 | File "/usr/local/bin/optracker", line 6, in <module>
189 | from optracker.optracker import run
190 | File "/usr/local/lib/python3.5/dist-packages/optracker/optracker.py", line 14, in <module>
191 | from igramscraper.instagram import Instagram
192 | File "/usr/local/lib/python3.5/dist-packages/igramscraper/instagram.py", line 153
193 | cookies += f"{key}={session[key]}; "
194 | ^
195 | SyntaxError: invalid syntax
196 | ```
197 | To fix this, update Python. You are using an old version that doesn't support **f""** strings; you need **Python 3.6** or later.
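
A version guard gives a clearer message than the `SyntaxError` above. This is a minimal sketch and not part of optracker; `supports_fstrings` is a hypothetical name. The guard itself avoids f-strings so that it still parses on an older interpreter.

```python
import sys

MINIMUM = (3, 6)  # f-strings were added in Python 3.6


def supports_fstrings(version_info=sys.version_info) -> bool:
    """Return True when the given interpreter version can parse f-strings."""
    return tuple(version_info[:2]) >= MINIMUM


if not supports_fstrings():
    # %-formatting on purpose: this line must be valid on old interpreters
    raise SystemExit("Python 3.6 or later is required, found %d.%d" % sys.version_info[:2])
```

Such a check only helps if it runs before any module containing an f-string is imported, for example at the top of the entry script.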
198 |
199 | ### 2. Instagram useragent
200 | ```
201 | ERROR: {"message": "useragent mismatch", "status": "fail"}
202 | ```
203 | Igramscraper is using a user agent that is not up to date. You need to update **self.user_agent** in **igramscraper/instagram.py**. Locate this file and look for something that looks like this:
204 | ```python
205 | self.user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) ' \
206 | 'AppleWebKit/537.36 (KHTML, like Gecko) ' \
207 | 'Chrome/66.0.3359.139 Safari/537.36'
208 | ```
209 | Then change it to a newer user agent that Instagram accepts. This is one example that worked in October 2019:
210 | ```python
211 | self.user_agent = 'Mozilla/5.0 (iPhone; CPU iPhone OS 12_3_1 like Mac OS X) ' \
212 | 'AppleWebKit/605.1.15 (KHTML, like Gecko) ' \
213 | 'Mobile/15E148 Instagram 105.0.0.11.118 (iPhone11,8; iOS 12_3_1; en_US; en-US; scale=2.00; 828x1792; 165586599)'
214 | ```
215 |
216 | ### 3. Private Instagram
217 | ```python
218 | File "\optracker\functions\instagram_func.py", line 20, in get_insta_following
219 | following = self.instagram.get_following(insta_id, totalFollow, self.page_size_check(totalFollow), delayed=True)
220 | File "\optracker\igramscraper\instagram.py", line 963, in get_following
221 | Instagram.HTTP_FORBIDDEN)
222 | optracker.igramscraper.exception.instagram_exception.InstagramException: Failed to get follows of account id ******. The account is private., Code:403
223 | ```
224 | When searching profiles, a user has sometimes set their profile to private after the first scrape. When extracting data after this, the program stops with an error saying the profile is private. Just run it once more; the program has automatically marked the profile as private, so it won't happen on the next scan.
225 |
226 | ### 4. Two step verification. Please report issue., Code:20
227 | ```
228 | Traceback (most recent call last):
229 | File "python37-32\lib\runpy.py", line 193, in _run_module_as_main
230 | "__main__", mod_spec)
231 | File "python37-32\lib\runpy.py", line 85, in _run_code
232 | exec(code, run_globals)
233 | File "Python37-32\Scripts\optracker.exe\__main__.py", line 7, in <module>
234 | File "python37-32\lib\site-packages\optracker\optracker.py", line 174, in run
235 | myOptracker = Optracker()
236 | File "python37-32\lib\site-packages\optracker\optracker.py", line 56, in __init__
237 | self.autoSelectAndLogin()
238 | File "python37-32\lib\site-packages\optracker\optracker.py", line 97, in autoSelectAndLogin
239 | self.loginInstagram(self.instagram)
240 | File "python37-32\lib\site-packages\optracker\optracker.py", line 138, in loginInstagram
241 | self.instagram.login(force=False,two_step_verificator=True)
242 | File "python37-32\lib\site-packages\optracker\igramscraper\instagram.py", line 1324, in login
243 | two_step_verificator)
244 | File "python37-32\lib\site-packages\optracker\igramscraper\instagram.py", line 1414, in __verify_two_step
245 | response.status_code)
246 | optracker.igramscraper.exception.instagram_auth_exception.InstagramAuthException: Something went wrong when try two step verification. Please report issue., Code:20
247 | ```
248 | Something went wrong with the Instagram login: the username and password could not be used to log in. Change the user value or add a new user, then try once more and it should work.
249 |
250 | ## What to do with the data?
251 | When you have gathered enough data, it's time to put it to good use. You have plenty of options. First things first: you can export the standard values from the program itself. It generates two files: nodes.csv and egdes.csv.
252 |
253 | These files are made to be used with [Gephi](https://gephi.org). Import them into Gephi and start analyzing. There are plenty of good tutorials on how to process the data. Some tips along the way:
254 | - Import nodes first, then edges
255 | - Filter out extra nodes: **Filter -> Topology -> Degree Range** set to 2 is a good start.
256 | - Run statistics: **Network Diameter, Average Degree, Modularity**
257 | - Set size on nodes attribute: **Betweenness Centrality**
258 | - Set color on nodes: **Modularity Class**
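
The degree filter from the tips above can also be approximated outside Gephi to get a feel for how many nodes would remain. This is a rough sketch, assuming the exported edge file has a header row with `Source` and `Target` columns (Gephi's usual naming; the actual export header is not shown here). `nodes_with_min_degree` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def nodes_with_min_degree(edges_csv: str, min_degree: int = 2) -> set:
    """Return the node IDs whose total degree (in + out) is at least min_degree."""
    degree = Counter()
    for row in csv.DictReader(io.StringIO(edges_csv)):
        degree[row["Source"]] += 1
        degree[row["Target"]] += 1
    return {node for node, count in degree.items() if count >= min_degree}


# Made-up edges for illustration only
sample_edges = """Source,Target,Weight,Type
1,2,1,Directed
1,3,1,Directed
2,3,1,Directed
4,1,1,Directed
"""
print(sorted(nodes_with_min_degree(sample_edges)))
```

In the sample, node 4 has a single connection and is dropped, while nodes 1, 2 and 3 remain.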
259 |
260 | This is an example of how it can look when finished, making the pattern easy to see. You can also turn on labels to see the names of the nodes.
261 |
262 | 
263 |
264 | ## Common Information
265 | - Look at TODO if you want to help: [TODO](https://github.com/suxSx/opensource-tracker/blob/master/TODO.md)
266 | - Read the Code of Conduct before you edit: [Code of Conduct](https://github.com/suxSx/opensource-tracker/blob/master/CODE_OF_CONDUCT.md)
267 | - We use MIT License: [MIT](https://github.com/suxSx/opensource-tracker/blob/master/LICENSE.md)
268 |
269 | ### Worth mentioning
270 | - instagram-php-scraper [here](https://github.com/postaddictme/instagram-php-scraper/)
271 | - instagram-scraper [here](https://github.com/realsirjoe/instagram-scraper)
272 | - logo-design [here](http://freepik.com)
273 | - face-recognition [here](https://github.com/ageitgey/face_recognition)
274 |
--------------------------------------------------------------------------------
/TODO.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | # Todo:
4 | - [x] 27-10-2019 (U) Add update Node data when you run a check of node DB.
5 | - [ ] Make the code smaller. Repeating steps can be shortened
6 | - [x] 27-10-2019 (U) Make a stop function for if profile is private
7 | - [ ] Add try and catch in get user info. To enable error handling.
8 | - [ ] Make database for followers, and follower for easy rollback on error (delete when current user are done, and keypoint for insta user.)
9 | - [ ] Add functions scan keywords. (Look for specific keywords in user profiles (node) and then use a full single scan)
10 | - [ ] Add other platforms for data gathering
11 | - [x] 18-10-2019 (U) Added surface/deep scan.
12 | - [x] 01-10-2019 (U) Check up on Finnish status message in DB_TABLE_NEW_INSTA
13 | - [x] 07-10-2019 (U) Add max follower criteria in search options.
14 | - [x] 11-10-2019 (U) Root directory, PIP install, class updates.
15 | - [ ] Add user creation options
16 | - [x] 18-10-2019 (U) User DB NODE are updated in setCurrentUser(). And in userselect when scanFollowBy().
17 | - [x] 27-10-2019 (U) Scan users from DB and text that have Deep = 0
18 | - [x] 18-10-2019 (U) Added scan options for users in txt document.
19 | - [x] 19-10-2019 (P) updateNodesUser() ERROR fix.
20 | - [x] 28-10-2019 (U) Detail print added show minimum text or all.
21 | - [x] 11-11-2019 (N) Facerecognition added.
22 | - [x] 11-11-2019 (U) Scan profile image for a face and add it to collection for later use.
23 | - [ ] Add node-type to node, is it person, page with more.
24 | - [ ] Add download post, and scan faces to create relationship status between users for a more detail map scan.
25 |
26 | ## Rules
27 | When something is done, mark it as finished and add the date of completion and what kind of edit was made.
28 | - (U) = UPDATE
29 | - (P) = PATCH
30 | - (N) = NEW
31 |
32 | Example:
33 | - `07-10-2019 (P) Fix bug on line 127 in core.py`
34 | - `07-10-2019 (U) Added better search options`
35 | - `07-10-2019 (N) New functions added able to export into xml`
36 |
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | theme: jekyll-theme-slate
--------------------------------------------------------------------------------
/optracker/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NorseByte/opensource-tracker/5388d845ba57bf6ca0aa80575608f4b903e1b8dc/optracker/__init__.py
--------------------------------------------------------------------------------
/optracker/data/__init__.py:
--------------------------------------------------------------------------------
1 | from pkg_resources import resource_filename
2 |
3 | def pose_predictor_five_point_model_location():
4 |     return resource_filename(__name__, "face_models/shape_predictor_5_face_landmarks.dat")
5 |
6 | def face_recognition_model_location():
7 |     return resource_filename(__name__, "face_models/dlib_face_recognition_resnet_model_v1.dat")
8 |
9 | def cnn_face_detector_model_location():
10 |     return resource_filename(__name__, "face_models/mmod_human_face_detector.dat")
--------------------------------------------------------------------------------
/optracker/data/face_models/dlib_face_recognition_resnet_model_v1.dat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NorseByte/opensource-tracker/5388d845ba57bf6ca0aa80575608f4b903e1b8dc/optracker/data/face_models/dlib_face_recognition_resnet_model_v1.dat
--------------------------------------------------------------------------------
/optracker/data/face_models/mmod_human_face_detector.dat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NorseByte/opensource-tracker/5388d845ba57bf6ca0aa80575608f4b903e1b8dc/optracker/data/face_models/mmod_human_face_detector.dat
--------------------------------------------------------------------------------
/optracker/data/face_models/shape_predictor_5_face_landmarks.dat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NorseByte/opensource-tracker/5388d845ba57bf6ca0aa80575608f4b903e1b8dc/optracker/data/face_models/shape_predictor_5_face_landmarks.dat
--------------------------------------------------------------------------------
/optracker/facerec/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NorseByte/opensource-tracker/5388d845ba57bf6ca0aa80575608f4b903e1b8dc/optracker/facerec/__init__.py
--------------------------------------------------------------------------------
/optracker/facerec/api.py:
--------------------------------------------------------------------------------
1 | import PIL.Image
2 | import dlib
3 | import numpy as np
4 | from PIL import ImageFile
5 | from ..data import *
6 |
7 | class face_recognition():
8 | def __init__(self):
9 | ImageFile.LOAD_TRUNCATED_IMAGES = True
10 | self.face_detector = dlib.get_frontal_face_detector()
11 |
12 | self.predictor_5_point_model = pose_predictor_five_point_model_location()
13 | self.pose_predictor_5_point = dlib.shape_predictor(self.predictor_5_point_model)
14 |
15 | self.cnn_face_detection_model = cnn_face_detector_model_location()
16 | self.cnn_face_detector = dlib.cnn_face_detection_model_v1(self.cnn_face_detection_model)
17 |
18 | self.face_recognition_model = face_recognition_model_location()
19 | self.face_encoder = dlib.face_recognition_model_v1(self.face_recognition_model)
20 |
21 |
22 | def _rect_to_css(self, rect):
23 | """
24 | Convert a dlib 'rect' object to a plain tuple in (top, right, bottom, left) order
25 |
26 | :param rect: a dlib 'rect' object
27 | :return: a plain tuple representation of the rect in (top, right, bottom, left) order
28 | """
29 | return rect.top(), rect.right(), rect.bottom(), rect.left()
30 |
31 |
32 | def _css_to_rect(self, css):
33 | """
34 | Convert a tuple in (top, right, bottom, left) order to a dlib `rect` object
35 |
36 | :param css: plain tuple representation of the rect in (top, right, bottom, left) order
37 | :return: a dlib `rect` object
38 | """
39 | return dlib.rectangle(css[3], css[0], css[1], css[2])
40 |
41 |
42 | def _trim_css_to_bounds(self, css, image_shape):
43 | """
44 | Make sure a tuple in (top, right, bottom, left) order is within the bounds of the image.
45 |
46 | :param css: plain tuple representation of the rect in (top, right, bottom, left) order
47 | :param image_shape: numpy shape of the image array
48 | :return: a trimmed plain tuple representation of the rect in (top, right, bottom, left) order
49 | """
50 | return max(css[0], 0), min(css[1], image_shape[1]), min(css[2], image_shape[0]), max(css[3], 0)
51 |
52 |
53 | def face_distance(self, face_encodings, face_to_compare):
54 | """
55 | Given a list of face encodings, compare them to a known face encoding and get a euclidean distance
56 | for each comparison face. The distance tells you how similar the faces are.
57 |
58 | :param face_encodings: List of face encodings to compare
59 | :param face_to_compare: A face encoding to compare against
60 | :return: A numpy ndarray with the distance for each face in the same order as the 'face_encodings' array
61 | """
62 | if len(face_encodings) == 0:
63 | return np.empty((0))
64 |
65 | return np.linalg.norm(face_encodings - face_to_compare, axis=1)
66 |
67 |
68 | def load_image_file(self, file, mode='RGB'):
69 | """
70 | Loads an image file (.jpg, .png, etc) into a numpy array
71 |
72 | :param file: image file name or file object to load
73 | :param mode: format to convert the image to. Only 'RGB' (8-bit RGB, 3 channels) and 'L' (black and white) are supported.
74 | :return: image contents as numpy array
75 | """
76 | im = PIL.Image.open(file)
77 | if mode:
78 | im = im.convert(mode)
79 | return np.array(im)
80 |
81 |
82 | def _raw_face_locations(self, img, number_of_times_to_upsample=1, model="hog"):
83 | """
84 | Returns an array of bounding boxes of human faces in an image
85 |
86 | :param img: An image (as a numpy array)
87 | :param number_of_times_to_upsample: How many times to upsample the image looking for faces. Higher numbers find smaller faces.
88 | :param model: Which face detection model to use. "hog" is less accurate but faster on CPUs. "cnn" is a more accurate
89 | deep-learning model which is GPU/CUDA accelerated (if available). The default is "hog".
90 | :return: A list of dlib 'rect' objects of found face locations
91 | """
92 | if model == "cnn":
93 | return self.cnn_face_detector(img, number_of_times_to_upsample)
94 | else:
95 | return self.face_detector(img, number_of_times_to_upsample)
96 |
97 |
98 | def face_locations(self, img, number_of_times_to_upsample=1, model="hog"):
99 | """
100 | Returns an array of bounding boxes of human faces in an image
101 |
102 | :param img: An image (as a numpy array)
103 | :param number_of_times_to_upsample: How many times to upsample the image looking for faces. Higher numbers find smaller faces.
104 | :param model: Which face detection model to use. "hog" is less accurate but faster on CPUs. "cnn" is a more accurate
105 | deep-learning model which is GPU/CUDA accelerated (if available). The default is "hog".
106 | :return: A list of tuples of found face locations in css (top, right, bottom, left) order
107 | """
108 | if model == "cnn":
109 | return [self._trim_css_to_bounds(self._rect_to_css(face.rect), img.shape) for face in self._raw_face_locations(img, number_of_times_to_upsample, "cnn")]
110 | else:
111 | return [self._trim_css_to_bounds(self._rect_to_css(face), img.shape) for face in self._raw_face_locations(img, number_of_times_to_upsample, model)]
112 |
113 |
114 | def _raw_face_locations_batched(self, images, number_of_times_to_upsample=1, batch_size=128):
115 | """
116 | Returns a 2d array of dlib rects of human faces in a list of images using the cnn face detector
117 |
118 | :param images: A list of images (each as a numpy array)
119 | :param number_of_times_to_upsample: How many times to upsample the image looking for faces. Higher numbers find smaller faces.
120 | :return: A list of dlib 'rect' objects of found face locations
121 | """
122 | return self.cnn_face_detector(images, number_of_times_to_upsample, batch_size=batch_size)
123 |
124 |
125 | def batch_face_locations(self, images, number_of_times_to_upsample=1, batch_size=128):
126 | """
127 | Returns a 2d array of bounding boxes of human faces in a list of images using the cnn face detector
128 | If you are using a GPU, this can give you much faster results since the GPU
129 | can process batches of images at once. If you aren't using a GPU, you don't need this function.
130 |
131 | :param images: A list of images (each as a numpy array)
132 | :param number_of_times_to_upsample: How many times to upsample the image looking for faces. Higher numbers find smaller faces.
133 | :param batch_size: How many images to include in each GPU processing batch.
134 | :return: A list of tuples of found face locations in css (top, right, bottom, left) order
135 | """
136 | def convert_cnn_detections_to_css(detections):
137 | return [self._trim_css_to_bounds(self._rect_to_css(face.rect), images[0].shape) for face in detections]
138 |
139 | raw_detections_batched = self._raw_face_locations_batched(images, number_of_times_to_upsample, batch_size)
140 |
141 | return list(map(convert_cnn_detections_to_css, raw_detections_batched))
142 |
143 |
144 | def _raw_face_landmarks(self, face_image, face_locations=None, model="large"):
145 | if face_locations is None:
146 | face_locations = self._raw_face_locations(face_image)
147 | else:
148 | face_locations = [self._css_to_rect(face_location) for face_location in face_locations]
149 |
150 |
151 | pose_predictor = self.pose_predictor_5_point  # only the 5-point model is loaded, so `model` is ignored here
152 |
153 | return [pose_predictor(face_image, face_location) for face_location in face_locations]
154 |
155 |
156 | def face_landmarks(self, face_image, face_locations=None, model="large"):
157 | """
158 | Given an image, returns a dict of face feature locations (eyes, nose, etc) for each face in the image
159 |
160 | :param face_image: image to search
161 | :param face_locations: Optionally provide a list of face locations to check.
162 | :param model: Optional - which model to use. "large" (default) or "small" which only returns 5 points but is faster.
163 | :return: A list of dicts of face feature locations (eyes, nose, etc)
164 | """
165 | landmarks = self._raw_face_landmarks(face_image, face_locations, model)
166 | landmarks_as_tuples = [[(p.x, p.y) for p in landmark.parts()] for landmark in landmarks]
167 |
168 | # For a definition of each point index, see https://cdn-images-1.medium.com/max/1600/1*AbEg31EgkbXSQehuNJBlWg.png
169 | if model == 'large':
170 | return [{
171 | "chin": points[0:17],
172 | "left_eyebrow": points[17:22],
173 | "right_eyebrow": points[22:27],
174 | "nose_bridge": points[27:31],
175 | "nose_tip": points[31:36],
176 | "left_eye": points[36:42],
177 | "right_eye": points[42:48],
178 | "top_lip": points[48:55] + [points[64]] + [points[63]] + [points[62]] + [points[61]] + [points[60]],
179 | "bottom_lip": points[54:60] + [points[48]] + [points[60]] + [points[67]] + [points[66]] + [points[65]] + [points[64]]
180 | } for points in landmarks_as_tuples]
181 | elif model == 'small':
182 | return [{
183 | "nose_tip": [points[4]],
184 | "left_eye": points[2:4],
185 | "right_eye": points[0:2],
186 | } for points in landmarks_as_tuples]
187 | else:
188 | raise ValueError("Invalid landmarks model type. Supported models are ['small', 'large'].")
189 |
190 |
191 | def face_encodings(self, face_image, known_face_locations=None, num_jitters=1):
192 | """
193 | Given an image, return the 128-dimension face encoding for each face in the image.
194 |
195 | :param face_image: The image that contains one or more faces
196 | :param known_face_locations: Optional - the bounding boxes of each face if you already know them.
197 | :param num_jitters: How many times to re-sample the face when calculating encoding. Higher is more accurate, but slower (i.e. 100 is 100x slower)
198 | :return: A list of 128-dimensional face encodings (one for each face in the image)
199 | """
200 | raw_landmarks = self._raw_face_landmarks(face_image, known_face_locations, model="small")
201 | return [np.array(self.face_encoder.compute_face_descriptor(face_image, raw_landmark_set, num_jitters)) for raw_landmark_set in raw_landmarks]
202 |
203 |
204 | def compare_faces(self, known_face_encodings, face_encoding_to_check, tolerance=0.6):
205 | """
206 | Compare a list of face encodings against a candidate encoding to see if they match.
207 |
208 | :param known_face_encodings: A list of known face encodings
209 | :param face_encoding_to_check: A single face encoding to compare against the list
210 | :param tolerance: How much distance between faces to consider it a match. Lower is more strict. 0.6 is typical best performance.
211 | :return: A list of True/False values indicating which known_face_encodings match the face encoding to check
212 | """
213 | return list(self.face_distance(known_face_encodings, face_encoding_to_check) <= tolerance)
214 |
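The matching rule described in the `face_distance` and `compare_faces` docstrings above (Euclidean distance per known encoding, then a tolerance threshold) can be illustrated with a minimal sketch. This is not the module's code: it swaps NumPy for the standard library's `math.dist` so it runs without dlib or NumPy, the function names are stand-alone copies for illustration, and the 3-dimensional vectors are toy stand-ins for real 128-dimensional encodings.

```python
import math


def face_distance(known_encodings, candidate):
    """Euclidean distance from each known encoding to the candidate encoding."""
    # An empty list of known encodings simply yields an empty list of distances.
    return [math.dist(encoding, candidate) for encoding in known_encodings]


def compare_faces(known_encodings, candidate, tolerance=0.6):
    """True for every known encoding whose distance is within the tolerance."""
    return [d <= tolerance for d in face_distance(known_encodings, candidate)]


# Toy 3-dimensional vectors (assumption: real encodings have 128 dimensions).
known = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
candidate = [0.3, 0.4, 0.0]

distances = face_distance(known, candidate)
matches = compare_faces(known, candidate)
print(distances, matches)
```

Lowering `tolerance` makes the comparison stricter: the first toy vector matches at the default 0.6 but would be rejected at 0.4.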
--------------------------------------------------------------------------------
/optracker/facerec/facerec.py:
--------------------------------------------------------------------------------
1 | from PIL import Image
2 | from .api import face_recognition
3 |
4 | class facerec():
5 | def __init__(self, Zero):
6 | self.zero = Zero
7 |
8 | self.zero.printText("+ Loading Face Recognition", True)
9 | self.face = face_recognition()
10 |
11 | def findFaceinImgCNN(self, img):
12 | print("Finding Face CNN")
13 | image = self.face.load_image_file(img)
14 | face_locations = self.face.face_locations(image, number_of_times_to_upsample=0, model="cnn")
15 | print("I found {} face(s) in this photograph.".format(len(face_locations)))
16 |
17 | for face_location in face_locations:
18 | # Print the location of each face in this image
19 | top, right, bottom, left = face_location
20 | print("A face is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))
21 |
22 | # You can access the actual face itself like this:
23 | face_image = image[top:bottom, left:right]
24 | pil_image = Image.fromarray(face_image)
25 | pil_image.show()
26 |
27 | def findFaceinImg(self, img):
28 | self.zero.printText("+ Finding Face HOG log", False)
29 | image = self.face.load_image_file(img)
30 | face_locations = self.face.face_locations(image)
31 |
32 | #Return the face locations and the loaded image array; use len() on the locations to count faces
33 | return face_locations, image
34 |
35 | def readSource(self, file):
36 | print("Read source")
37 |
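The (top, right, bottom, left) box order used by `face_locations`, and the crop `image[top:bottom, left:right]` in `findFaceinImgCNN`, are easy to get backwards. The sketch below shows the clamp-then-crop sequence on its own. It is an illustration under assumptions: plain nested lists stand in for the NumPy image array, and `trim_css_to_bounds` and `crop` are hypothetical stand-alone helpers, not the class methods.

```python
def trim_css_to_bounds(css, image_shape):
    """Clamp a (top, right, bottom, left) box to an image of shape (height, width)."""
    top, right, bottom, left = css
    height, width = image_shape[0], image_shape[1]
    return max(top, 0), min(right, width), min(bottom, height), max(left, 0)


def crop(image_rows, css):
    """Cut the box out of a row-major image: rows top..bottom, columns left..right."""
    top, right, bottom, left = css
    return [row[left:right] for row in image_rows[top:bottom]]


# A 4-row by 5-column toy "image" whose pixel value encodes its position.
image = [[10 * r + c for c in range(5)] for r in range(4)]

# A detection that spills over the top and right edges of the image.
box = trim_css_to_bounds((-1, 7, 3, 2), (4, 5))
face = crop(image, box)
print(box, face)
```

Rows are indexed first (top to bottom) and columns second (left to right), which is why the tuple order and the slice order differ.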
--------------------------------------------------------------------------------
/optracker/functions/__init__.py:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/optracker/functions/core_func.py:
--------------------------------------------------------------------------------
1 | import os
2 | import requests
3 | import shutil
4 | import filecmp
5 | import imghdr
6 | from PIL import Image
7 | from ..functions.instagram_func import *
8 |
9 | class coreFunc():
10 | def __init__(self, dbTool, dbConn, instagram, Zero, Facerec):
11 | self.dbTool = dbTool
12 | self.dbConn = dbConn
13 | self.instagram = instagram
14 | self.zero = Zero
15 | self.instaTool = InstagramFunc(self.instagram)
16 | self.curPrivate = 0
17 | self.facerec = Facerec
18 |
19 | def find_between_r(self, s, first, last,):
20 | try:
21 | start = s.rindex( first ) + len( first )
22 | end = s.rindex( last, start )
23 | return s[start:end]
24 | except ValueError:
25 | return ""
26 |
27 | def is_similar(self, image1, image2):
28 | return filecmp.cmp(image1, image2, shallow=False)  # compare file contents, not just os.stat() signatures
29 |
30 | def createFolderIf(self, folder):
31 | if not os.path.exists(folder):
32 | os.mkdir(folder)
33 | self.zero.printText("+ Folder created: {}".format(folder), False)
34 | else:
35 | self.zero.printText("+ Folder located: {}".format(folder), False)
36 |
37 | def setImageName(self, folder, username, type):
38 | c = 1
39 | d = True
40 | currentFile = ""
41 | while d == True:
42 | currentFile = folder + username + "-" + str(c) + type
43 | if os.path.isfile(currentFile) == False:
44 | d = False
45 | else:
46 | self.zero.printText("+ File: {} exists, trying next.".format(currentFile), False)
47 | c += 1
48 |
49 | self.zero.printText("+ Setting filename: {}".format(currentFile), False)
50 | return currentFile
51 |
52 | def compareImage(self, path, currentFile):
53 | #Get files in directory # r=root, d=directories, f = files
54 | files = []
55 | for r, d, f in os.walk(path):
56 | for file in f:
57 | files.append(os.path.join(r, file))
58 |
59 | #Check whether any other file in the folder is identical to the current one
60 | for f in files:
61 | if f != currentFile:
62 | self.zero.printText("+ Checking {} and {} if same.".format(currentFile, f), False)
63 | if self.is_similar(currentFile, f) == True:
64 | self.zero.printText("+ Image same deleting: {}".format(currentFile), False)
65 | os.remove(currentFile)
66 | return True
67 | return False
68 |
69 | def createInstaProfileFolder(self, ID):
70 | curr = self.zero.OP_INSTA_PROFILEFOLDER_NAME_VALUE
71 | counter = 1
72 |
73 | for i in range(0, len(ID), 2):
74 | if counter <= 2:
75 | curr = curr + ID[i:i + 2] + os.sep
76 | self.createFolderIf(curr)
77 | counter += 1
78 |
79 | curr = curr + ID + os.sep
80 | self.createFolderIf(curr)
81 | return curr
82 |
83 | def downloadProfileImage(self, name, username, type, url):
84 | instaFolder = self.createInstaProfileFolder(name)
85 | file = self.setImageName(instaFolder, username, type)
86 |
87 | self.zero.printText("+ Downloading Image: {}".format(file), True)
88 | downloadok = True
89 |
90 | #Write Image
91 | resp = requests.get(url, stream=True)
92 | local_file = open(file, 'wb')
93 | resp.raw.decode_content = True
94 | shutil.copyfileobj(resp.raw, local_file)
95 | local_file.close()
96 | del resp
97 |
98 | #Read Image to verify
99 | type = imghdr.what(file)
100 |
101 | if type is not None:
102 | self.zero.printText("+ Download Complete", True)
103 | else:
104 | downloadok = False
105 | self.zero.printText("+ File does not contain an image, deleting file", True)
106 | os.remove(file)
107 |
108 | if downloadok == True:
109 | if self.compareImage(instaFolder, file) == False:
110 | if int(self.zero.FACEREC_ON_VALUE) == 1:
111 | self.zero.printText("+ Face scan active, scanning image", True)
112 | face, image = self.facerec.findFaceinImg(file)
113 | if len(face) == 0:
114 | self.zero.printText("+ Found no face in image, deleting file", True)
115 | os.remove(file)
116 | else:
117 | self.zero.printText("+ Found {} faces in image".format(len(face)), True)
118 |
119 | def exportDBData(self):
120 | self.zero.printText("\n- Loading current data from DB", True)
121 | totalNodes = self.dbTool.getValueSQLnoinput(self.dbConn, self.zero.DB_SELECT_COUNT_NODES)[0][0]
122 | totalEdgesInsta = self.dbTool.getValueSQLnoinput(self.dbConn, self.zero.DB_SELECT_COUNT_EDES_INSTA)[0][0]
123 | self.zero.printText("+ Total nodes: {}\n+ Total edges from instagram: {}".format(totalNodes, totalEdgesInsta), True)
124 | exportyes = input("+ Do you want to export? [Y/n] ")
125 |
126 | if exportyes.lower().strip() != "n":
127 | self.zero.printText("+ Exporting NODES", True)
128 | self.dbTool.exportNode(self.dbConn, self.zero.DB_SELECT_EXPORT_ID_USER, self.zero.DB_DATABASE_EXPORT_FOLDER + self.zero.DB_DATABASE_EXPORT_NODES)
129 | self.zero.printText("+ NODES exported to: {}".format(self.zero.DB_DATABASE_EXPORT_NODES), True)
130 |
131 | self.zero.printText("+ Exporting EDGES", True)
132 | self.dbTool.exportNode(self.dbConn, self.zero.DB_SELECT_ALL_INSTA_EDGES, self.zero.DB_DATABASE_EXPORT_FOLDER + self.zero.DB_DATABASE_EXPORT_INSTA_EGDE)
133 | self.zero.printText("+ EDGES exported to: {}".format(self.zero.DB_DATABASE_EXPORT_INSTA_EGDE), True)
134 |
135 | def getDoneUserIDFromInsta(self):
136 | self.zero.printText("\n- Loading done user from instagram", True)
137 | userList = self.dbTool.getValueSQLnoinput(self.dbConn, self.zero.DB_SELECT_ALL_DONE_NEW_INSTA)
138 |
139 | if userList == 0:
140 | self.zero.printText("+ No users in database that have been scanned 100%", True)
141 | return 0
142 |
143 | else:
144 | self.zero.printText("+ User list imported", True)
145 | count = 0
146 | for i in userList:
147 | count += 1
148 | self.zero.printText("[{}] {} ({})".format(count, i[0], str(i[1]).strip()), True)
149 | selectUser = input("+ Select user (1-{}): ".format(count))
150 |
151 | if not selectUser.isnumeric():
152 | self.zero.printText("+ Invalid input, #1 selected", True)
153 | selectUser = 1
154 |
155 | if int(selectUser) < 1 or int(selectUser) > count:
156 | self.zero.printText("+ Invalid input, #1 selected", True)
157 | selectUser = 1
158 |
159 | newNumber = int(selectUser) - 1
160 | return userList[newNumber]
161 |
162 | def updateNodesUser(self, instaID):
163 | self.zero.printText("+ Updating user data for: {}".format(instaID), False)
164 | newDataUser = self.instaTool.get_insta_account_info_id(instaID)
165 | self.zero.printText("+ User data loaded.", False)
166 | label = self.getLabelforUser(newDataUser)
167 |
168 | #Download profile Image
169 | if int(self.zero.DOWNLOAD_PROFILE_INSTA_VALUE) == 1:
170 | self.downloadProfileImage(newDataUser.identifier, newDataUser.username, self.zero.INSTA_FILE_EXT, newDataUser.get_profile_picture_url())
171 |
172 | UPDATE_DATA = (self.zero.sanTuple(newDataUser.full_name), self.zero.sanTuple(label), newDataUser.get_profile_picture_url(), newDataUser.follows_count, newDataUser.followed_by_count, self.zero.sanTuple(newDataUser.biography), newDataUser.username, newDataUser.is_private, newDataUser.is_verified, newDataUser.media_count, newDataUser.external_url, 1, newDataUser.identifier)
173 | self.dbTool.inserttoTabel(self.dbConn, self.zero.DB_UPDATE_NODES, UPDATE_DATA)
174 | self.zero.printText("+ Update of DB NODE complete.", False)
175 | return newDataUser
176 |
177 | def updateProfileImg(self):
178 | self.zero.printText("\n- Starting Profile Img Update", True)
179 | user_img_list = self.dbTool.getValueSQLnoinput(self.dbConn, self.zero.DB_SELECT_IMG)
180 | lengList = len(user_img_list)
181 | counter = 1
182 | for u in user_img_list:
183 | self.zero.printText("\n+ {} of {}: {}".format(counter, lengList, u[0]), True)
184 | counter += 1
185 | #Download profile Image
186 | self.downloadProfileImage(str(u[1]), str(u[0]), self.zero.INSTA_FILE_EXT, str(u[2]))
187 |
188 | def updateNodesUserLoaded(self, newDataUser):
189 | self.zero.printText("+ Updating user data for: {} ({})".format(newDataUser.username, newDataUser.identifier), False)
190 | label = self.getLabelforUser(newDataUser)
191 | UPDATE_DATA = (self.zero.sanTuple(newDataUser.full_name), self.zero.sanTuple(label), newDataUser.get_profile_picture_url(), newDataUser.follows_count, newDataUser.followed_by_count, self.zero.sanTuple(newDataUser.biography), newDataUser.username, newDataUser.is_private, newDataUser.is_verified, newDataUser.media_count, newDataUser.external_url, 1, newDataUser.identifier)
192 | self.dbTool.inserttoTabel(self.dbConn, self.zero.DB_UPDATE_NODES, UPDATE_DATA)
193 | self.zero.printText("+ Update of DB NODE complete.", False)
194 |
195 | def updateNodeFromList(self):
196 | self.zero.printText("\n- Updating users from list", True)
197 | fullpath = self.zero.OP_ROOT_FOLDER_PATH_VALUE + self.zero.USER_FILE_SCAN_NODE_INSTA
198 | if os.path.isfile(fullpath):
199 | self.zero.printText("+ Found: {}, extracting data".format(fullpath), True)
200 | with open(fullpath) as fp:
201 | line = fp.readline()
202 | while line:
203 | if line.strip():
204 | user = line.strip()
205 | self.zero.printText("+ Getting user info for {}:".format(user), True)
206 | updatenode = self.instaTool.get_insta_account_info(user)
207 |
208 | #Download profile Image
209 | if int(self.zero.DOWNLOAD_PROFILE_INSTA_VALUE) == 1:
210 | self.downloadProfileImage(updatenode.identifier, updatenode.username, self.zero.INSTA_FILE_EXT, updatenode.get_profile_picture_url())
211 |
212 | self.updateNodesUserLoaded(updatenode)
213 | line = fp.readline()
214 | else:
215 | self.zero.printText("+ File not found.", True)
216 | self.zero.printText("+ Create {} to continue.".format(fullpath), True)
217 |
218 | def deepScanAll(self):
219 | self.zero.printText("\n- Getting users from DB", True)
220 | allDeep = self.dbTool.getValueSQLnoinput(self.dbConn, self.zero.DB_SELECT_DEEPSCAN_NEED)
221 | lengDeep = len(allDeep)
222 | counter = 1
223 | for u in allDeep:
224 | user = u[0]
225 | self.zero.printText("+ {} of {} - Getting user info for {}:".format(counter, lengDeep, user), True)
226 | updatenode = self.instaTool.get_insta_account_info(user)
227 |
228 | #Download profile Image
229 | if int(self.zero.DOWNLOAD_PROFILE_INSTA_VALUE) == 1:
230 | self.downloadProfileImage(updatenode.identifier, updatenode.username, self.zero.INSTA_FILE_EXT, updatenode.get_profile_picture_url())
231 |
232 | self.updateNodesUserLoaded(updatenode)
233 | counter += 1
234 |
235 | def scanFollowToInstaID(self):
236 | currentInstaID = self.getDoneUserIDFromInsta()
237 |
238 | self.zero.printText("\n- Starting scan by follow", True)
239 | if currentInstaID == 0:
240 | self.zero.printText("+ No users could be selected.\n+ Run a full single scan of a user to continue.", True)
241 |
242 | else:
243 | currentUser = currentInstaID[1]
244 | currentID = currentInstaID[0]
245 |
246 | getMaxValueFOLLOW = int(self.zero.INSTA_MAX_FOLLOW_SCAN_VALUE)
247 | getMaxValueFOLLOWBY = int(self.zero.INSTA_MAX_FOLLOW_BY_SCAN_VALUE)
248 |
249 | self.zero.printText("+ Current insta id: {} ({})".format(currentID, currentUser), True)
250 | self.zero.printText("+ Looking up NODE ID.", True)
251 | currentNode = self.dbTool.getValueSQL(self.dbConn, self.zero.DB_SELECT_ID_NODE, (currentID, ))[0][0]
252 | self.zero.printText("+ Node ID found: {}".format(currentNode), True)
253 | self.zero.printText("+ Loading followed by list where PRIVATE = 0", True)
254 | followList = self.dbTool.getValueSQL(self.dbConn, self.zero.DB_SELECT_FOLLOW_OF, (currentNode, ))
255 | if followList == 0:
256 | self.zero.printText("+ Found no followed users with a PUBLIC profile.", True)
257 | else:
258 | lenFollowList = len(followList)
259 | counter = 0
260 | self.zero.printText("+ Loaded {} users from: {} where private = 0".format(lenFollowList, currentUser), True)
261 |
262 | #TODO: ADD SORTING OF USER BASED ON KEY WORD FROM BIO
263 | for i in followList:
264 | counter += 1
265 | self.zero.printText("\n- {} of {} :: {}".format(counter, lenFollowList, i[8]), True)
266 | self.zero.printText("+ Checking search status for: {}".format(i[3]), False)
267 | moveON = self.dbTool.getValueSQL(self.dbConn, self.zero.DB_SELECT_DONE_NEW_INSTA, (i[3],))
268 | if moveON[0][0] == 1:
269 | self.zero.printText("+ User SCAN is already DONE.", True)
270 | else:
271 | if moveON[0][1] == 1:
272 | self.zero.printText("+ User NOT scanned but set on WAIT", True)
273 | else:
274 | self.zero.printText("+ User VALID for single scan.", True)
275 |
276 | scan_insta_followed_by = int(i[6])
277 | scan_insta_follow = int(i[5])
278 |
279 | #TODO: SPEEDUP SCAN - LESS SQL REQUEST
280 | deepScan = int(i[13])
281 | if deepScan == 0:
282 | #User has not been deep-scanned yet; scan and update
283 | self.zero.printText("+ User missing deepScan, getting info.", False)
284 | newDataUser = self.updateNodesUser(i[3])
285 | scan_insta_followed_by = int(newDataUser.followed_by_count)
286 | scan_insta_follow = int(newDataUser.follows_count)
287 |
288 |
289 | self.zero.printText("+ User is following: {}\n+ User is followed by: {}".format(scan_insta_follow, scan_insta_followed_by), False)
290 |
291 | #Search sorting, first step: follows_count
292 | if scan_insta_follow <= getMaxValueFOLLOW:
293 | if scan_insta_followed_by <= getMaxValueFOLLOWBY:
294 | #Search criteria met, start scan.
295 | self.setCurrentUser(i[8].strip())
296 |
297 | #Check if private
298 | if self.curPrivate == 0:
299 | #Extract info from following list
300 | if scan_insta_follow != 0:
301 | if self.loadFollowlist(False) == True:
302 | self.add_egde_from_list_insta(False)
303 | else:
304 | self.zero.printText("+ Follow list is empty", False)
305 |
306 | #Extract followed by
307 | if scan_insta_followed_by != 0:
308 | if self.loadFollowlist(True) == True:
309 | self.add_egde_from_list_insta(True)
310 | else:
311 | self.zero.printText("+ Follow by list is empty", False)
312 |
313 | #Update new_Insta
314 | self.zero.printText("\n- Scan complete", False)
315 | self.zero.printText("+ Setting {} ({}) to complete.".format(i[8], i[3]), True)
316 | self.dbTool.inserttoTabel(self.dbConn, self.zero.DB_UPDATE_NEW_INSTA_DONE_TRUE, (i[3],))
317 | else:
318 | self.zero.printText("+ User profile is private after update.", True)
319 | self.zero.printText("+ Setting {} ({}) to complete.".format(i[8], i[3]), True)
320 | self.dbTool.inserttoTabel(self.dbConn, self.zero.DB_UPDATE_NEW_INSTA_DONE_TRUE, (i[3],))
321 | else:
322 | self.zero.printText("+ User is followed by too many, increase the allowed followed-by limit to continue", True)
323 | else:
324 | self.zero.printText("+ User is following too many, increase the allowed follow limit to continue", True)
325 |
326 |
327 | def loadFollowlist(self, inOut): #False load Follow, True Load followers
328 | continueScan = True
329 |
330 | if inOut == False:
331 | #Getting following
332 | self.zero.printText("\n- Loading follows list for: {}".format(self.currentUser.username), True)
333 | self.followNumber = self.currentUser.follows_count
334 | if self.followNumber != 0:
335 | self.zero.printText("+ {} is following {}, starting info extract.".format(self.currentUser.full_name, self.followNumber), False)
336 | self.imported_follow = self.instaTool.get_insta_following(self.followNumber, self.currentUser.identifier)
337 | self.lenImpF = len(self.imported_follow['accounts'])
338 | self.zero.printText("+ Total loaded: {}".format(self.lenImpF), False)
339 | continueScan = True
340 | else:
341 | print("+ {} is following NOBODY, skipping this stage".format(self.currentUser.username))
342 | continueScan = False
343 | else:
344 | #Getting followers
345 | self.zero.printText("\n- Loading followed by list for: {}".format(self.currentUser.username), True)
346 | self.followNumber = self.currentUser.followed_by_count
347 | if self.followNumber != 0:
348 | self.zero.printText("+ {} is followed by {}, starting info extract".format(self.currentUser.full_name, self.followNumber), False)
349 | self.imported_follow = self.instaTool.get_insta_follow_by(self.followNumber, self.currentUser.identifier)
350 | self.lenImpF = len(self.imported_follow['accounts'])
351 | self.zero.printText("+ Total loaded: {}".format(self.lenImpF), False)
352 | continueScan = True
353 | else:
354 | print("+ {} is followed by NOBODY, skipping this stage".format(self.currentUser.username))
355 | continueScan = False
356 |
357 | return continueScan
358 |
359 | def setCurrentUser(self, user):
360 | #Get information
361 | self.zero.printText("\n- Setting current user to: {}".format(user), True)
362 |
363 | #Check if zeroPoint is in DB if not add.
364 | self.zero.printText("+ Getting user information from Instagram", True)
365 | self.currentUser = self.instaTool.get_insta_account_info(user)
366 | self.curPrivate = self.currentUser.is_private
367 | self.check_user_db_node(self.currentUser, False)
368 |
369 | #Download profile Image
370 | if int(self.zero.DOWNLOAD_PROFILE_INSTA_VALUE) == 1:
371 | self.downloadProfileImage(self.currentUser.identifier, self.currentUser.username, self.zero.INSTA_FILE_EXT, self.currentUser.get_profile_picture_url())
372 |
373 | #Update User information
374 | self.updateNodesUserLoaded(self.currentUser)
375 |
376 | #Check if in new_Insta
377 | self.check_new_insta(self.currentUser.identifier, self.currentUser.username)
378 |
379 | #Getting current NODE ID for source
380 | self.sourceID = self.dbTool.getValueSQL(self.dbConn, self.zero.DB_SELECT_ID_NODE, (self.currentUser.identifier, ))[0][0]
381 | self.zero.printText("+ Received node ID: {} for zeroPoint: {}".format(self.sourceID, self.currentUser.username), True)
382 |
383 | #Setting global INSTA # IDEA
384 | self.zero.printText("+ Global insta ID set to {}".format(self.currentUser.identifier), True)
385 | self.zero.INSTA_USER_ID = self.currentUser.identifier
386 |
387 | def check_new_insta(self, instaID, insert_username):
388 | self.zero.printText("+ Checking new_insta DB for: {}".format(instaID), False)
389 | getNewinsta = self.dbTool.getValueSQL(self.dbConn, self.zero.DB_SELECT_DONE_NEW_INSTA, (instaID, ))
390 | if getNewinsta == 0:
391 | self.zero.printText("+ NOT found in new_insta adding user_id: {} ({})".format(instaID, insert_username), False)
392 | self.zero.INSERT_DATA = (instaID, insert_username)
393 | self.dbTool.inserttoTabel(self.dbConn, self.zero.DB_INSERT_NEW_INSTA, self.zero.INSERT_DATA)
394 | else:
395 | self.zero.printText("+ FOUND in new_insta", False)
396 | if getNewinsta[0][0] == 1:
397 | self.zero.printText("+ STATUS = FINISHED", False)
398 | else:
399 | if getNewinsta[0][1] == 1:
400 | self.zero.printText("+ STATUS = WAIT", False)
401 | else:
402 | self.zero.printText("+ STATUS = IN LINE", False)
403 |
404 | def getLabelforUser(self, user):
405 | self.zero.printText("+ Is full_name empty?", False)
406 | if user.full_name:
407 | self.zero.printText("+ NO", False)
408 | self.zero.printText("+ Using: {} for label.".format(user.full_name), False)
409 | return user.full_name
410 |
411 | else:
412 | self.zero.printText("+ YES", False)
413 | self.zero.printText("+ Using: {} for label.".format(user.username), False)
414 | return user.username
415 |
416 | def check_user_db_node(self, user, getInfo):
417 | #Check if we do a full scan
418 | getSurfaceScan = int(self.zero.SURFACE_SCAN_VALUE)
419 |
420 | #Get node id
421 | userNodeID = self.dbTool.getValueSQL(self.dbConn, self.zero.DB_SELECT_ID_NODE, (user.identifier, ))
422 |
423 | self.zero.printText("+ Checking NODE DB for id: {} ({})".format(user.identifier, user.username), False)
424 | if userNodeID == 0:
425 | self.zero.printText("+ NOT found in node", False)
426 | tempID = user.identifier
427 |
428 | if getSurfaceScan == 0:
429 | if getInfo == True:
430 | self.zero.printText("+ Getting user data for: {}".format(user.username), False)
431 | user = self.instaTool.get_insta_account_info_id(tempID)
432 |
433 | #Download profile
434 | if int(self.zero.DOWNLOAD_PROFILE_INSTA_VALUE) == 1:
435 | self.downloadProfileImage(user.identifier, user.username, self.zero.INSTA_FILE_EXT, user.get_profile_picture_url())
436 |
437 | label = self.getLabelforUser(user)
438 | self.zero.INSERT_DATA = (self.zero.sanTuple(user.full_name), self.zero.sanTuple(label), user.identifier, user.get_profile_picture_url(), user.follows_count, user.followed_by_count, self.zero.sanTuple(user.biography), user.username, user.is_private, user.is_verified, user.media_count, user.external_url, 1, user.identifier)
439 |
440 | else:
441 | #TODO: Add image download to surface scan?
442 | self.zero.printText("+ Surface scan is ON", False)
443 |
444 | if user.is_private == False:
445 | user.is_private = 0
446 | else:
447 | user.is_private = 1
448 |
449 | if user.is_verified == False:
450 | user.is_verified = 0
451 | else:
452 | user.is_verified = 1
453 |
454 | label = self.getLabelforUser(user)
455 | self.zero.INSERT_DATA = (self.zero.sanTuple(user.full_name), self.zero.sanTuple(label), user.identifier, user.get_profile_picture_url(), user.follows_count, user.followed_by_count, self.zero.sanTuple(user.biography), user.username, user.is_private, user.is_verified, user.media_count, user.external_url, 0, user.identifier)
456 |
457 | self.zero.printText("+ ADDING to NODE db", False)
458 | userNodeID = self.dbTool.inserttoTabelMulti(self.dbConn, self.zero.DB_INSERT_NODE, self.zero.INSERT_DATA)[0][0]
459 | else:
460 | userNodeID = userNodeID[0][0]
461 | self.zero.printText("+ FOUND in NODE list ({}) moving on".format(userNodeID), False)
462 |
463 | return userNodeID
464 |
465 | def add_egde_from_list_insta(self, inOut):
466 | counterF = 0
467 | for following in self.imported_follow['accounts']:
468 | counterF += 1
469 | self.zero.printText("\n", False)
470 | self.zero.printText("- {} of {} :: Username: {} - ID: {}".format(counterF, self.lenImpF, following.username, following.identifier), True)
471 |
472 | #Add in Node DB
473 | tempID = self.check_user_db_node(following, True)
474 |
475 | #Check if this is a new node that has not been searched yet
476 | self.check_new_insta(following.identifier, following.username)
477 |
478 | #Get node ID
479 | self.zero.printText("+ Received node ID: {} ({})".format(tempID, following.username), False)
480 |
481 | #Add in egdes_insta
482 | if inOut == True:
483 | self.zero.printText("+ Checking insta_edges DB. Source: {} ({}), Target: {} ({})".format(tempID, following.username, self.sourceID, self.currentUser.username), False)
484 | if self.dbTool.getValueSQL(self.dbConn, self.zero.DB_SELECT_TARGET_EDGE, (tempID, self.sourceID, )) == 0:
485 | self.zero.printText("+ NOT found in insta_edges adding data", False)
486 | self.zero.INSERT_DATA = (tempID, self.sourceID)
487 | self.dbTool.inserttoTabel(self.dbConn, self.zero.DB_INSERT_INSTA_EGDE, self.zero.INSERT_DATA)
488 | else:
489 | self.zero.printText("+ FOUND in insta_edges list moving on", False)
490 | else:
491 | self.zero.printText("+ Checking insta_edges DB. Source: {} ({}), Target: {} ({})".format(self.sourceID, self.currentUser.full_name, tempID, following.username), False)
492 | if self.dbTool.getValueSQL(self.dbConn, self.zero.DB_SELECT_TARGET_EDGE, (self.sourceID, tempID, )) == 0:
493 | self.zero.printText("+ NOT found in insta_edges adding data.", False)
494 | self.zero.INSERT_DATA = (self.sourceID, tempID)
495 | self.dbTool.inserttoTabel(self.dbConn, self.zero.DB_INSERT_INSTA_EGDE, self.zero.INSERT_DATA)
496 | else:
497 | self.zero.printText("+ FOUND in insta_edges list moving on", False)
498 |
--------------------------------------------------------------------------------
/optracker/functions/db_func.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sqlite3
3 | import unicodecsv as csv
4 | import sys
5 | import mysql.connector
6 | from sqlite3 import Error
7 |
8 |
9 |
10 | class dbFunc():
11 | def __init__(self, dbname, Zero):
12 | self.zero = Zero
13 | self.dbname = self.zero.DB_DATABASE_FOLDER + dbname
14 | self.createDBfolder()
15 |
16 | def create_connection(self):
17 | conn = None
18 | if self.zero.DB_MYSQL_ON == 0:
19 | try:
20 | self.zero.printText("+ Connection to: SQLite", True)
21 | conn = sqlite3.connect(self.dbname)
22 | except Error as e:
23 | print(e)
24 | else:
25 | try:
26 | self.zero.printText("+ Connection to: MySQL", True)
27 | conn = mysql.connector.connect(host= self.zero.DB_MYSQL, database=self.zero.DB_MYSQL_DATABASE, user=self.zero.DB_MYSQL_USER, passwd=self.zero.DB_MYSQL_PASSWORD, charset=self.zero.DB_MYSQL_CHARSET, collation=self.zero.DB_MYSQL_COLLATION)
28 | except Exception as e:
29 | print(e)
30 |
31 | self.zero.printText("+ Connection to DB done.", False)
32 | return conn
33 |
34 | def createTabels(self, conn, create_table_sql):
35 | try:
36 | c = conn.cursor()
37 | c.execute(create_table_sql)
38 | except Error as e:
39 | print(e)
40 |
41 | def inserttoTabel(self, conn, sql, task):
42 | cur = conn.cursor()
43 | cur.execute(sql, task)
44 | conn.commit()
45 | return cur.lastrowid
46 |
47 | def inserttoTabelMulti(self, conn, sql, task):
48 | if self.zero.DB_MYSQL_ON == 1:
49 | c = conn.cursor(buffered=True)
50 | results = c.execute(sql, task, multi=True)
51 | for cur in results:
52 | if cur.with_rows:
53 | data = cur.fetchall()
54 | conn.commit()
55 |
56 | else:
57 | newSQL = sql.replace("?", "'%s'")
58 | exeSQL = (newSQL % task)
59 | multi = exeSQL.split(";")
60 | lenM = len(multi)
61 | counter = 0
62 |
63 | cur = conn.cursor()
64 | for i in multi:
65 | counter += 1
66 | if counter != lenM:
67 | cur.execute(i)
68 |
69 | data = cur.fetchall()
70 | conn.commit()
71 |
72 | if data:
73 | return data
74 | else:
75 | return 0
76 |
77 | def getValueSQL(self, conn, sql, task):
78 | cur = conn.cursor()
79 | cur.execute(sql, task)
80 | rows = cur.fetchall()
81 |
82 | if rows:
83 | return rows
84 | else:
85 | return 0
86 |
87 | def getValueSQLnoinput(self, conn, sql):
88 | cur = conn.cursor()
89 | cur.execute(sql)
90 | rows = cur.fetchall()
91 |
92 | if rows:
93 | return rows
94 | else:
95 | return 0
96 |
97 | def exportNode(self, conn, sql, filename):
98 | if os.path.exists(filename):
99 | print("+ File: {} exists, deleting it.".format(filename))
100 | os.remove(filename)
101 |
102 | cur = conn.cursor()
103 | cur.execute(sql)
104 | with open(filename, 'wb') as csvfile:
105 | print("+ Creating and writing to: {}".format(filename))
106 | writer = csv.writer(csvfile, encoding=self.zero.WRITE_ENCODING)
107 | writer.writerow([ i[0] for i in cur.description ])
108 | writer.writerows(cur.fetchall())
109 |
110 | def createDBfolder(self):
111 | if not os.path.exists(self.zero.DB_DATABASE_FOLDER):
112 | os.mkdir(self.zero.DB_DATABASE_FOLDER)
113 |
114 | if not os.path.exists(self.zero.DB_DATABASE_EXPORT_FOLDER):
115 | os.mkdir(self.zero.DB_DATABASE_EXPORT_FOLDER)
116 |
117 | def setDefaultValue(self, conn, text, value):
118 | getValue = self.getValueSQL(conn, self.zero.DB_SELECT_OPTIONS, (text, ))
119 | if getValue == 0:
120 | self.zero.printText("+ {} is NOT in database".format(text), True)
121 | self.inserttoTabel(conn, self.zero.DB_INSERT_OPTIONS_LASTINSTA, (value, text, ))
122 | self.zero.printText("+ {} set to: {}".format(text, value), True)
123 | else:
124 | value = getValue[0][1]
125 | self.zero.printText("+ {} in database, value set to: {}".format(text, value), False)
126 |
127 | if text == self.zero.INSTA_MAX_FOLLOW_SCAN_TEXT:
128 | self.zero.INSTA_MAX_FOLLOW_SCAN_VALUE = value
129 |
130 | if text == self.zero.INSTA_MAX_FOLLOW_BY_SCAN_TEXT:
131 | self.zero.INSTA_MAX_FOLLOW_BY_SCAN_VALUE = value
132 |
133 | if text == self.zero.SURFACE_SCAN_TEXT:
134 | self.zero.SURFACE_SCAN_VALUE = value
135 |
136 | if text == self.zero.DETAIL_PRINT_TEXT:
137 | self.zero.DETAIL_PRINT_VALUE = value
138 |
139 | if text == self.zero.DOWNLOAD_PROFILE_INSTA_TEXT:
140 | self.zero.DOWNLOAD_PROFILE_INSTA_VALUE = value
141 |
142 | if text == self.zero.FACEREC_ON_TEXT:
143 | self.zero.FACEREC_ON_VALUE = value
144 |
145 |
146 | def setDefaultValueOptions(self, conn):
147 | #Set max value for scan
148 | print("+ Setup of default values")
149 | self.setDefaultValue(conn, self.zero.INSTA_MAX_FOLLOW_SCAN_TEXT, self.zero.INSTA_MAX_FOLLOW_SCAN_VALUE)
150 | self.setDefaultValue(conn, self.zero.INSTA_MAX_FOLLOW_BY_SCAN_TEXT, self.zero.INSTA_MAX_FOLLOW_BY_SCAN_VALUE)
151 | self.setDefaultValue(conn, self.zero.SURFACE_SCAN_TEXT, self.zero.SURFACE_SCAN_VALUE)
152 | self.setDefaultValue(conn, self.zero.DOWNLOAD_PROFILE_INSTA_TEXT, self.zero.DOWNLOAD_PROFILE_INSTA_VALUE)
153 | self.setDefaultValue(conn, self.zero.FACEREC_ON_TEXT, self.zero.FACEREC_ON_VALUE)
154 | self.zero.printText("+ Setup of default DONE", False)
155 |
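The SQLite branch of `inserttoTabelMulti` builds its statements by substituting values into the SQL text, which fails as soon as a value contains a quote character. Below is a minimal sketch of the same insert-then-read-back pattern using bound parameters instead. The helper name `insert_and_fetch`, the `nodes` table, and its columns are hypothetical and are not part of this repository.

```python
import sqlite3


def insert_and_fetch(conn, insert_sql, insert_params, select_sql, select_params):
    """Run a parameterized INSERT, then a parameterized SELECT.

    Returns the selected rows, or 0 when nothing matched (the same
    convention the helpers in db_func.py use).
    """
    cur = conn.cursor()
    cur.execute(insert_sql, insert_params)   # values are bound, never formatted into the SQL
    cur.execute(select_sql, select_params)
    rows = cur.fetchall()
    conn.commit()
    return rows if rows else 0


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, insta_id TEXT, name TEXT)")

rows = insert_and_fetch(
    conn,
    "INSERT INTO nodes (insta_id, name) VALUES (?, ?)",
    ("42", "O'Brien"),                        # the apostrophe is handled by the driver
    "SELECT id FROM nodes WHERE insta_id = ?",
    ("42",),
)
print(rows)
```

Splitting the statement pair into two `execute` calls also removes the need to split the SQL text on `;`.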
--------------------------------------------------------------------------------
/optracker/functions/instagram_func.py:
--------------------------------------------------------------------------------
1 | from time import sleep
2 |
3 | class InstagramFunc():
4 | def __init__(self, instagram):
5 | self.instagram = instagram
6 |
7 | def page_size_check(self, totalFollow):
8 | page_size = 100
9 | if totalFollow < page_size:
10 | page_size = totalFollow
11 | return page_size
12 |
13 | def get_insta_follow_by(self, totalFollow, insta_id):
14 | followers = []
15 | followers = self.instagram.get_followers(insta_id, totalFollow, self.page_size_check(totalFollow), delayed=True)
16 | return followers
17 |
18 | def get_insta_following(self, totalFollow, insta_id):
19 | following = []
20 | following = self.instagram.get_following(insta_id, totalFollow, self.page_size_check(totalFollow), delayed=True)
21 | return following
22 |
23 | def get_insta_media(self, user):
24 | medias = self.instagram.get_medias(user, 25)
25 | media = medias[6]
26 | print(media)
27 | account = media.owner
28 |
29 | def get_insta_account_info(self, currentUser):
30 | newInfo = self.instagram.get_account(currentUser)
31 | sleep(3) #mimic user
32 | return newInfo
33 |
34 | def get_insta_account_info_id(self, currentUser):
35 | newInfo = self.instagram.get_account_by_id(currentUser)
36 | sleep(3) #mimic user
37 | return newInfo
38 |
--------------------------------------------------------------------------------
/optracker/functions/side_func.py:
--------------------------------------------------------------------------------
1 | import os
2 | from datetime import datetime, timedelta
3 |
4 | class sideFunc():
5 | def __init__(self, dbTool, dbConn, Zero):
6 | self.zero = Zero
7 | self.dbTool = dbTool
8 | self.dbConn = dbConn
9 |
10 | def setCurrentUserUpdate(self, user, password):
11 | self.zero.LOGIN_PASSWORD_INSTA = password
12 | self.zero.LOGIN_USERNAME_INSTA = user
13 | self.zero.printText("+ Setting user to: {}".format(self.zero.LOGIN_USERNAME_INSTA), True) #Password is not printed to avoid leaking it in logs
14 |
15 | #Update time in account
16 | currentTime = datetime.today()
17 | self.dbTool.inserttoTabel(self.dbConn, self.zero.DB_UPDATE_ACCOUNT_LAST_USED, (currentTime, user,))
18 | self.zero.printText("+ Updating last time for: {} to: {}".format(user, currentTime), False)
19 |
20 | def autoSelectLogin(self):
21 | userList = self.runUserCheck()
22 | self.zero.printText("\n- Auto selecting login user", True)
23 | if userList != True:
24 | count = 0
25 | currentSelect = 0
26 | oldestTime = datetime.strptime(str(datetime.today()), self.zero.DATETIME_MASK) #Setting current time
27 | for i in userList:
28 | lastTime = i[6]
29 | #Print function to list time and date, not needed.
30 | #self.zero.printText("+ User: {}, last used: {}".format(i[0], lastTime))
31 | datetimelasttime = datetime.strptime(str(datetime.today()), self.zero.DATETIME_MASK)
32 |
33 | if not lastTime:
34 | self.zero.printText("+ {} oldest so far.".format(i[0]), False)
35 | oldestTime = datetimelasttime
36 | currentSelect = count
37 | break
38 | else:
39 | datetimelasttime = datetime.strptime(lastTime, self.zero.DATETIME_MASK)
40 |
41 | if oldestTime >= datetimelasttime:
42 | #oldestTime is newer, so set this one as the oldest for now
43 | self.zero.printText("+ {} oldest so far.".format(i[0]), False)
44 | oldestTime = datetimelasttime
45 | currentSelect = count
46 |
47 | count += 1
48 |
49 | self.setCurrentUserUpdate(userList[currentSelect][0].strip(), userList[currentSelect][1].strip())
50 |
51 | def runUserCheck(self):
52 | currentTime = datetime.today()
53 | self.zero.printText("\n- Loading INSTAGRAM user list from DB", True)
54 | userList = self.dbTool.getValueSQLnoinput(self.dbConn, self.zero.DB_SELECT_LOGIN_INSTA)
55 | if userList == 0:
56 | self.zero.printText("+ No user for Instagram found, please add one", True)
57 | user = input("+ Username: ")
58 | password = input("+ Password: ")
59 | email = input("+ Email: ")
60 | fullname = input("+ Fullname: ")
61 |
62 | self.zero.printText("+ Adding {} to DB".format(user), True)
63 | INSERT_DATA = (user, password, email, fullname, "instagram", currentTime)
64 | self.dbTool.inserttoTabel(self.dbConn, self.zero.DB_INSERT_LOGIN_INSTA, INSERT_DATA)
65 | password = self.dbTool.getValueSQL(self.dbConn, self.zero.DB_SELECT_LOGIN_PASSWORD_INSTA , (user,))
66 |
67 | if password == 0:
68 | #Add loop for user
69 | self.zero.printText("+ Not able to add user", True)
70 |
71 | else:
72 | self.zero.printText("+ User add OK", True)
73 | self.setCurrentUserUpdate(user, password[0][0])
74 | return True
75 | else:
76 | if len(userList) == 1:
77 | self.zero.printText("+ One user found using: {}".format(userList[0][0]), True)
78 | self.setCurrentUserUpdate(userList[0][0].strip(), userList[0][1].strip())
79 | return True
80 | else:
81 | self.zero.printText("+ User list loaded.", True)
82 | return userList
83 |
84 | def setDefValue(self, newValue, text, value_text, oneup, json):
85 | change = False
86 | if newValue.isdigit():
87 | if oneup == False:
88 | if int(newValue) < 1:
89 | self.zero.printText("+ Invalid input {} not changed".format(text), False)
90 | else:
91 | self.dbTool.inserttoTabel(self.dbConn, self.zero.DB_UPDATE_OPTIONS, (newValue, text))
92 | self.changeValue(text, newValue)
93 | self.zero.printText("+ {} set to: {}".format(text, newValue), True)
94 | if oneup == True:
95 | if int(newValue) > 1:
96 | self.zero.printText("+ Invalid input {} not changed".format(text), False)
97 | else:
98 | if int(newValue) < 0:
99 | self.zero.printText("+ Invalid input {} not changed".format(text), False)
100 | else:
101 | self.changeValue(text, newValue)
102 | if json == False:
103 | self.dbTool.inserttoTabel(self.dbConn, self.zero.DB_UPDATE_OPTIONS, (newValue, text))
104 | self.zero.printText("+ {} set to: {}".format(text, newValue), True)
105 | else:
106 | self.zero.setupJSON(True)
107 | self.zero.printText("+ {} set to: {}".format(text, newValue), True)
108 | self.zero.printText("+ IF SQL CHANGE RESTART PROGRAM", True)
109 |
110 | else:
111 | self.zero.printText("+ Invalid input {} not changed".format(text), False)
112 |
113 | def changeValue(self, value_text, newValue):
114 | if str(value_text).strip() == str(self.zero.INSTA_MAX_FOLLOW_SCAN_TEXT).strip():
115 | self.zero.INSTA_MAX_FOLLOW_SCAN_VALUE = newValue
116 | if str(value_text).strip() == str(self.zero.INSTA_MAX_FOLLOW_BY_SCAN_TEXT).strip():
117 | self.zero.INSTA_MAX_FOLLOW_BY_SCAN_VALUE = newValue
118 | if str(value_text).strip() == str(self.zero.SURFACE_SCAN_TEXT).strip():
119 | self.zero.SURFACE_SCAN_VALUE = newValue
120 | if str(value_text).strip() == str(self.zero.DETAIL_PRINT_TEXT).strip():
121 | self.zero.DETAIL_PRINT_VALUE = newValue
122 | if str(value_text).strip() == str(self.zero.DOWNLOAD_PROFILE_INSTA_TEXT).strip():
123 | self.zero.DOWNLOAD_PROFILE_INSTA_VALUE = newValue
124 | if str(value_text).strip() == str(self.zero.FACEREC_ON_TEXT).strip():
125 | self.zero.FACEREC_ON_VALUE = newValue
126 |
127 | def editDefaultValue(self):
128 | getMaxValueFOLLOW = self.dbTool.getValueSQL(self.dbConn, self.zero.DB_SELECT_OPTIONS, (self.zero.INSTA_MAX_FOLLOW_SCAN_TEXT, ))[0][1]
129 | getMaxValueFOLLOWBY = self.dbTool.getValueSQL(self.dbConn, self.zero.DB_SELECT_OPTIONS, (self.zero.INSTA_MAX_FOLLOW_BY_SCAN_TEXT, ))[0][1]
130 | getSurfaceScan = self.dbTool.getValueSQL(self.dbConn, self.zero.DB_SELECT_OPTIONS, (self.zero.SURFACE_SCAN_TEXT, ))[0][1]
131 | getDownload = self.dbTool.getValueSQL(self.dbConn, self.zero.DB_SELECT_OPTIONS, (self.zero.DOWNLOAD_PROFILE_INSTA_TEXT, ))[0][1]
132 |
133 | self.zero.printText("\n- Loading default values:", True)
134 | self.zero.printText("+ Max allowed follow: {}".format(getMaxValueFOLLOW), True)
135 | self.zero.printText("+ Max allowed follow by: {}".format(getMaxValueFOLLOWBY), True)
136 | self.zero.printText("+ Surface scan set to: {} (0 = OFF, 1 = ON)".format(getSurfaceScan), True)
137 | self.zero.printText("+ Download profile set to: {} (0 = OFF, 1 = ON)".format(getDownload), True)
138 | self.zero.printText("+ Detail print set to: {} (0 = OFF, 1 = ON)".format(self.zero.DETAIL_PRINT_VALUE), True)
139 | self.zero.printText("+ Face Recognition on download: {} (0 = OFF, 1 = ON)".format(self.zero.FACEREC_ON_VALUE), True)
140 | self.zero.printText("+ Mysql(1) - Sqlite(0): {}".format(self.zero.DB_MYSQL_ON), True)
141 |
142 | change = input("+ Change value? [y/N] ")
143 |
144 | if change.lower().strip() == "y":
145 | newMaxFollow = input("+ Max allowed follow: ")
146 | newMaxFollowBy = input("+ Max allowed followed by: ")
147 | newSurfaceScan = input("+ Surface scan on[1]/off[0]: ")
148 | newDetailPrint = input("+ Detail print on[1]/off[0]: ")
149 | newSavePhoto = input("+ Download profile on[1]/off[0]: ")
150 | newFace = input("+ Face recognition on download on[1]/off[0]: ")
151 | newMysql = input("+ Mysql[1] - Sqlite[0]: ")
152 |
153 | self.setDefValue(newMaxFollow, self.zero.INSTA_MAX_FOLLOW_SCAN_TEXT, self.zero.INSTA_MAX_FOLLOW_SCAN_VALUE, False, False)
154 | self.setDefValue(newMaxFollowBy, self.zero.INSTA_MAX_FOLLOW_BY_SCAN_TEXT, self.zero.INSTA_MAX_FOLLOW_BY_SCAN_VALUE, False, False)
155 | self.setDefValue(newSurfaceScan, self.zero.SURFACE_SCAN_TEXT, self.zero.SURFACE_SCAN_VALUE, True, False)
156 | self.setDefValue(newDetailPrint, self.zero.DETAIL_PRINT_TEXT, self.zero.DETAIL_PRINT_VALUE, True, True)
157 | self.setDefValue(newSavePhoto, self.zero.DOWNLOAD_PROFILE_INSTA_TEXT, self.zero.DOWNLOAD_PROFILE_INSTA_VALUE, True, False)
158 | self.setDefValue(newFace, self.zero.FACEREC_ON_TEXT, self.zero.FACEREC_ON_VALUE, True, False)
159 | else:
160 | self.zero.printText("+ Nothing changed.", True)
161 |
162 | def addLastInsta(self, update):
163 | lastInsta = input("+ Enter account to scrape: ")
164 | self.zero.printText("+ Adding {} to DB".format(lastInsta), True)
165 | if update == False:
166 | self.dbTool.inserttoTabel(self.dbConn, self.zero.DB_INSERT_OPTIONS_LASTINSTA, (lastInsta, self.zero.LAST_INSTA_TEXT))
167 | else:
168 | self.dbTool.inserttoTabel(self.dbConn, self.zero.DB_UPDATE_LAST_INSTA, (lastInsta,))
169 |
170 | if not lastInsta.strip(): #input() returns a string, so compare against empty input rather than 0
171 | #Add loop for user
172 | self.zero.printText("+ Not able to add to scraper", True)
173 |
174 | else:
175 | self.zero.INSTA_USER = lastInsta
176 | self.zero.printText("+ Scraper enabled for: {}".format(self.zero.INSTA_USER), True)
177 |
178 | def lastSearch(self):
179 | self.zero.printText("\n- Loading last scraper for Instagram from DB", True)
180 | lastInsta = self.dbTool.getValueSQL(self.dbConn, self.zero.DB_SELECT_OPTIONS, (self.zero.LAST_INSTA_TEXT, ))
181 | if lastInsta == 0:
182 | self.zero.printText("+ No last search for Instagram found", True)
183 | self.addLastInsta(False)
184 |
185 | else:
186 | self.zero.printText("+ Last user scraped: {}".format(lastInsta[0][1]), True)
187 | goon = input("+ Continue with user? [Y/n] ")
188 |
189 | if goon.lower().strip() == "n":
190 | self.addLastInsta(True)
191 | else:
192 | self.zero.INSTA_USER = lastInsta[0][1]
193 |
194 | def setupLogin(self):
195 | userList = self.runUserCheck()
196 | if userList != True:
197 | self.zero.printText("+ User list imported", True)
198 | count = 0
199 | for i in userList:
200 | count += 1
201 | self.zero.printText("[{}] {} ({}) (Last used: {})".format(count, i[0], i[3].strip(), i[6]), True)
202 | selectUser = input("+ Select user (1-{}): ".format(count))
203 |
204 | if not selectUser.isnumeric():
205 | self.zero.printText("+ Invalid input, #1 selected", True)
206 | selectUser = 1
207 |
208 | if int(selectUser) > count:
209 | self.zero.printText("+ Invalid input, #1 selected", True)
210 | selectUser = 1
211 |
212 | newNumber = int(selectUser) - 1
213 | self.setCurrentUserUpdate(userList[newNumber][0].strip(), userList[newNumber][1].strip())
214 |
215 | def countCurrentUser(self):
216 | userList = self.dbTool.getValueSQLnoinput(self.dbConn, self.zero.DB_SELECT_LOGIN_INSTA)
217 | count = 0
218 |
219 | if userList != 0:
220 | for i in userList:
221 | count += 1
222 |
223 | self.zero.TOTAL_USER_COUNT = count
224 |
225 | def loadLoginText(self):
226 | currentTime = datetime.today()
227 | self.zero.printText("\n- Loading user and password from file", True)
228 | for file in self.zero.USER_FILES:
229 | fullpath = self.zero.OP_ROOT_FOLDER_PATH_VALUE + file[0]
230 | if os.path.isfile(fullpath):
231 | self.zero.printText("+ Found: {}, extracting data".format(fullpath), True)
232 | with open(fullpath) as fp:
233 | line = fp.readline()
234 | while line:
235 | newUser = line.strip().split(",")
236 |
237 | if len(newUser[0]) != 0:
238 | #Check if exists
239 | password = self.dbTool.getValueSQL(self.dbConn, self.zero.DB_SELECT_LOGIN_PASSWORD_INSTA , (newUser[0],))
240 |
241 | if password != 0:
242 | self.zero.printText("+ User already exists: {}".format(newUser[0]), False)
243 |
244 | else:
245 | self.zero.printText("+ User NOT found adding user: {}. ".format(newUser[0]), False)
246 | INSERT_DATA = (newUser[0].strip(), newUser[1].strip(), newUser[2].strip(), newUser[3].strip(), newUser[4].strip(), currentTime)
247 | self.dbTool.inserttoTabel(self.dbConn, self.zero.DB_INSERT_LOGIN_INSTA, INSERT_DATA)
248 | password = self.dbTool.getValueSQL(self.dbConn, self.zero.DB_SELECT_LOGIN_PASSWORD_INSTA , (newUser[0].strip(),))
249 |
250 | if password == 0:
251 | self.zero.printText("+ Not able to add user", False)
252 | else:
253 | self.zero.printText("+ User add OK", False)
254 | line = fp.readline()
255 | else:
256 | self.zero.printText("+ File: {} does not exist".format(fullpath), False)
257 |
--------------------------------------------------------------------------------
/optracker/igramscraper/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NorseByte/opensource-tracker/5388d845ba57bf6ca0aa80575608f4b903e1b8dc/optracker/igramscraper/__init__.py
--------------------------------------------------------------------------------
/optracker/igramscraper/endpoints.py:
--------------------------------------------------------------------------------
1 | import urllib.parse
2 | import json
3 |
4 | USER_MEDIAS = '17880160963012870'
5 | USER_STORIES = '17890626976041463'
6 | STORIES = '17873473675158481'
7 |
8 | BASE_URL = 'https://www.instagram.com'
9 | LOGIN_URL = 'https://www.instagram.com/accounts/login/ajax/'
10 | ACCOUNT_PAGE = 'https://www.instagram.com/%s'
11 | MEDIA_LINK = 'https://www.instagram.com/p/%s'
12 | ACCOUNT_MEDIAS = 'https://www.instagram.com/graphql/query/?query_hash=42323d64886122307be10013ad2dcc44&variables=%s'
13 | ACCOUNT_JSON_INFO = 'https://www.instagram.com/%s/?__a=1'
14 | MEDIA_JSON_INFO = 'https://www.instagram.com/p/%s/?__a=1'
15 | MEDIA_JSON_BY_LOCATION_ID = 'https://www.instagram.com/explore/locations/%s/?__a=1&max_id=%s'
16 | MEDIA_JSON_BY_TAG = 'https://www.instagram.com/explore/tags/%s/?__a=1&max_id=%s'
17 | GENERAL_SEARCH = 'https://www.instagram.com/web/search/topsearch/?query=%s'
18 | COMMENTS_BEFORE_COMMENT_ID_BY_CODE = 'https://www.instagram.com/graphql/query/?query_hash=97b41c52301f77ce508f55e66d17620e&variables=%s'
19 | LIKES_BY_SHORTCODE_OLD = 'https://www.instagram.com/graphql/query/?query_id=17864450716183058&variables={"shortcode":"%s","first":%s,"after":"%s"}'
20 | LIKES_BY_SHORTCODE = 'https://www.instagram.com/graphql/query/?query_hash=d5d763b1e2acf209d62d22d184488e57&variables=%s'
21 | FOLLOWING_URL_OLD = 'https://www.instagram.com/graphql/query/?query_id=17874545323001329&id={{accountId}}&first={{count}}&after={{after}}'
22 | FOLLOWING_URL = 'https://www.instagram.com/graphql/query/?query_hash=d04b0a864b4b54837c0d870b0e77e076&variables=%s'
23 | FOLLOWERS_URL_OLD = 'https://www.instagram.com/graphql/query/?query_id=17851374694183129&id={{accountId}}&first={{count}}&after={{after}}'
24 | FOLLOWERS_URL = 'https://www.instagram.com/graphql/query/?query_hash=c76146de99bb02f6415203be841dd25a&variables=%s'
25 | FOLLOW_URL = 'https://www.instagram.com/web/friendships/%s/follow/'
26 | UNFOLLOW_URL = 'https://www.instagram.com/web/friendships/%s/unfollow/'
27 | INSTAGRAM_CDN_URL = 'https://scontent.cdninstagram.com/'
28 | ACCOUNT_JSON_PRIVATE_INFO_BY_ID = 'https://i.instagram.com/api/v1/users/%s/info/'
29 | LIKE_URL = 'https://www.instagram.com/web/likes/%s/like/'
30 | UNLIKE_URL = 'https://www.instagram.com/web/likes/%s/unlike/'
31 | ADD_COMMENT_URL = 'https://www.instagram.com/web/comments/%s/add/'
32 | DELETE_COMMENT_URL = 'https://www.instagram.com/web/comments/%s/delete/%s/'
33 |
34 | ACCOUNT_MEDIAS2 = 'https://www.instagram.com/graphql/query/?query_id=17880160963012870&id={{accountId}}&first=10&after='
35 |
36 | GRAPH_QL_QUERY_URL = 'https://www.instagram.com/graphql/query/?query_id=%s'
37 |
38 | request_media_count = 30
39 |
40 |
41 | def get_account_page_link(username):
42 | return ACCOUNT_PAGE % urllib.parse.quote_plus(username)
43 |
44 |
45 | def get_account_json_link(username):
46 | return ACCOUNT_JSON_INFO % urllib.parse.quote_plus(username)
47 |
48 |
49 | def get_account_json_private_info_link_by_account_id(account_id):
50 | return ACCOUNT_JSON_PRIVATE_INFO_BY_ID % urllib.parse.quote_plus(str(account_id))
51 |
52 |
53 | def get_account_medias_json_link(variables):
54 | return ACCOUNT_MEDIAS % urllib.parse.quote_plus(json.dumps(variables, separators=(',', ':')))
55 |
56 |
57 | def get_media_page_link(code):
58 | return MEDIA_LINK % urllib.parse.quote_plus(code)
59 |
60 |
61 | def get_media_json_link(code):
62 | return MEDIA_JSON_INFO % urllib.parse.quote_plus(code)
63 |
64 |
65 | def get_medias_json_by_location_id_link(facebook_location_id, max_id=''):
66 | return MEDIA_JSON_BY_LOCATION_ID % (urllib.parse.quote_plus(str(facebook_location_id)), urllib.parse.quote_plus(max_id))
67 |
68 |
69 | def get_medias_json_by_tag_link(tag, max_id=''):
70 | return MEDIA_JSON_BY_TAG % (urllib.parse.quote_plus(str(tag)), urllib.parse.quote_plus(str(max_id)))
71 |
72 |
73 | def get_general_search_json_link(query):
74 | return GENERAL_SEARCH % urllib.parse.quote_plus(query)
75 |
76 |
77 | def get_comments_before_comments_id_by_code(variables):
78 | return COMMENTS_BEFORE_COMMENT_ID_BY_CODE % urllib.parse.quote_plus(json.dumps(variables, separators=(',', ':')))
79 |
80 |
81 | def get_last_likes_by_code_old(code, count, last_like_id):
82 | return LIKES_BY_SHORTCODE_OLD % (urllib.parse.quote_plus(code), urllib.parse.quote_plus(str(count)), urllib.parse.quote_plus(str(last_like_id)))
83 |
84 |
85 | def get_last_likes_by_code(variables):
86 | return LIKES_BY_SHORTCODE % urllib.parse.quote_plus(json.dumps(variables, separators=(',', ':')))
87 |
88 |
89 | def get_follow_url(account_id):
90 | return FOLLOW_URL % urllib.parse.quote_plus(account_id)
91 |
92 |
93 | def get_unfollow_url(account_id):
94 | return UNFOLLOW_URL % urllib.parse.quote_plus(account_id)
95 |
96 |
97 | def get_followers_json_link_old(account_id, count, after=''):
98 | url = FOLLOWERS_URL_OLD.replace(
99 | '{{accountId}}', urllib.parse.quote_plus(account_id))
100 | url = url.replace('{{count}}', urllib.parse.quote_plus(str(count)))
101 |
102 | if after == '':
103 | url = url.replace('&after={{after}}', '')
104 | else:
105 | url = url.replace('{{after}}', urllib.parse.quote_plus(str(after)))
106 |
107 | return url
108 |
109 | def get_followers_json_link(variables):
110 | return FOLLOWERS_URL % urllib.parse.quote_plus(json.dumps(variables, separators=(',', ':')))
111 |
112 |
113 | def get_following_json_link_old(account_id, count, after=''):
114 | url = FOLLOWING_URL_OLD.replace(
115 | '{{accountId}}', urllib.parse.quote_plus(account_id))
116 | url = url.replace('{{count}}', urllib.parse.quote_plus(str(count))) #str() so an integer count does not raise TypeError
117 |
118 | if after == '':
119 | url = url.replace('&after={{after}}', '')
120 | else:
121 | url = url.replace('{{after}}', urllib.parse.quote_plus(str(after)))
122 |
123 | return url
124 |
125 | def get_following_json_link(variables):
126 | return FOLLOWING_URL % urllib.parse.quote_plus(json.dumps(variables, separators=(',', ':')))
127 |
128 | def get_user_stories_link():
129 | return get_graph_ql_url(USER_STORIES, {'variables': json.dumps([], separators=(',', ':'))})
130 |
131 |
132 | def get_graph_ql_url(query_id, parameters):
133 | url = GRAPH_QL_QUERY_URL % urllib.parse.quote_plus(query_id)
134 |
135 | if len(parameters) > 0:
136 | query_string = urllib.parse.urlencode(parameters)
137 | url += '&' + query_string
138 |
139 | return url
140 |
141 |
142 | def get_stories_link(variables):
143 | return get_graph_ql_url(STORIES, {'variables': json.dumps(variables, separators=(',', ':'))})
144 |
145 |
146 | def get_like_url(media_id):
147 | return LIKE_URL % urllib.parse.quote_plus(str(media_id))
148 |
149 |
150 | def get_unlike_url(media_id):
151 | return UNLIKE_URL % urllib.parse.quote_plus(str(media_id))
152 |
153 |
154 | def get_add_comment_url(media_id):
155 | return ADD_COMMENT_URL % urllib.parse.quote_plus(str(media_id))
156 |
157 |
158 | def get_delete_comment_url(media_id, comment_id):
159 | return DELETE_COMMENT_URL % (urllib.parse.quote_plus(str(media_id)), urllib.parse.quote_plus(str(comment_id)))
160 |
161 |
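Most of the link builders above share one pattern: serialize a `variables` dict to compact JSON, percent-encode it, and substitute it into a URL template. The sketch below shows that round trip in isolation. `build_link` is a hypothetical helper name, and the keys in `variables` are illustrative values rather than a statement of what the remote endpoint requires.

```python
import json
import urllib.parse

# Template copied from the constants above.
FOLLOWERS_URL = 'https://www.instagram.com/graphql/query/?query_hash=c76146de99bb02f6415203be841dd25a&variables=%s'


def build_link(template, variables):
    """Encode `variables` as compact JSON and place it in the template's %s slot."""
    compact = json.dumps(variables, separators=(',', ':'))  # no spaces after , or :
    return template % urllib.parse.quote_plus(compact)


variables = {'id': '123', 'first': 50}  # hypothetical values
link = build_link(FOLLOWERS_URL, variables)

# Decoding the query value recovers the original dict.
encoded = link.split('variables=', 1)[1]
decoded = json.loads(urllib.parse.unquote_plus(encoded))
print(link)
print(decoded)
```

The compact separators keep the encoded query short, since every space would otherwise be encoded as an extra `+`.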
--------------------------------------------------------------------------------
/optracker/igramscraper/exception/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NorseByte/opensource-tracker/5388d845ba57bf6ca0aa80575608f4b903e1b8dc/optracker/igramscraper/exception/__init__.py
--------------------------------------------------------------------------------
/optracker/igramscraper/exception/instagram_auth_exception.py:
--------------------------------------------------------------------------------
1 | class InstagramAuthException(Exception):
2 | def __init__(self, message = "", code = 401):
3 | super().__init__(f'{message}, Code:{code}')
--------------------------------------------------------------------------------
/optracker/igramscraper/exception/instagram_exception.py:
--------------------------------------------------------------------------------
1 | class InstagramException(Exception):
2 | StatusCode = -1
3 |
4 | def __init__(self, message="", code=500):
5 | super().__init__(f'{message}, Code:{code}')
6 |
7 | @staticmethod
8 | def default(response_text, status_code):
9 | StatusCode = status_code
10 | return InstagramException(
11 | 'Response code is {status_code}. Body: {response_text} '
12 | 'Something went wrong. Please report issue.'.format(
13 | response_text=response_text, status_code=status_code),
14 | status_code)
15 |
--------------------------------------------------------------------------------
/optracker/igramscraper/exception/instagram_not_found_exception.py:
--------------------------------------------------------------------------------
1 | class InstagramNotFoundException(Exception):
2 | def __init__(self, message="", code=404):
3 | super().__init__(f'{message}, Code:{code}')
4 |
--------------------------------------------------------------------------------
/optracker/igramscraper/helper.py:
--------------------------------------------------------------------------------
1 | # -*- coding:utf-8 -*-
2 |
3 | from functools import reduce
4 | import signal
5 |
6 |
7 | def get_from_dict(data_dict, map_list, default=None):
8 | def getitem(source, key):
9 | try:
10 | if isinstance(source, list):
11 | return source[int(key)]
12 | if isinstance(source, dict) and key not in source.keys():
13 | return default
14 | if not source:
15 | return default
16 | except IndexError:
17 | return default
18 |
19 | return source[key]
20 |
21 | if isinstance(map_list, str):
22 | map_list = map_list.split('.')
23 |
24 | return reduce(getitem, map_list, data_dict)
25 |
26 |
27 | def set_timeout(num, callback):
28 | """
29 | A decorator to limit the method run time.
30 | example:
31 | def after_timeout(): # callback function
32 | print("Time out!")
33 | @set_timeout(2, after_timeout) # 2s limited
34 | def connect():
35 | time.sleep(3)
36 | print('Finished without timeout.')
37 | :param num:
38 | :param callback:
39 | :return:
40 | """
41 | def wrap(func):
42 | def handle(signum, frame):
43 | raise RuntimeError
44 |
45 | def to_do(*args, **kwargs):
46 |             try:
47 |                 signal.signal(signal.SIGALRM, handle)
48 |                 signal.alarm(num)
49 |                 return func(*args, **kwargs)
50 |             except RuntimeError:
51 |                 callback()
52 |             finally:
53 |                 signal.alarm(0)  # always cancel the pending alarm, even if func raised
54 | return to_do
55 | return wrap
--------------------------------------------------------------------------------
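As a rough illustration of how `get_from_dict` in helper.py resolves a dotted path, the sketch below reproduces the function so it runs standalone and applies it to a made-up payload. The payload shape and the variable names are assumptions chosen for the example; they are not taken from a real Instagram response.

```python
from functools import reduce


def get_from_dict(data_dict, map_list, default=None):
    # Reproduced from helper.py above so this snippet runs standalone.
    def getitem(source, key):
        try:
            if isinstance(source, list):
                return source[int(key)]
            if isinstance(source, dict) and key not in source.keys():
                return default
            if not source:
                return default
        except IndexError:
            return default

        return source[key]

    if isinstance(map_list, str):
        map_list = map_list.split('.')

    return reduce(getitem, map_list, data_dict)


# Hypothetical payload, shaped loosely like a nested JSON response.
payload = {'graphql': {'user': {'username': 'alice',
                                'edges': [{'id': 1}, {'id': 2}]}}}

# A dotted string and a list of keys are both accepted.
username = get_from_dict(payload, 'graphql.user.username')
second_id = get_from_dict(payload, ['graphql', 'user', 'edges', '1', 'id'])

# A missing final key or an out-of-range list index yields the default.
missing_key = get_from_dict(payload, 'graphql.user.biography', default='n/a')
bad_index = get_from_dict(payload, 'graphql.user.edges.5')
```

One caveat: when a key is missing in the middle of the path, the default is passed on to the next lookup step. With the default of `None` that still returns `None`, but a non-empty default such as `'n/a'` raises a `TypeError` at the following step.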
/optracker/igramscraper/instagram.py:
--------------------------------------------------------------------------------
1 | import time
2 | import requests
3 | import re
4 | import json
5 | import hashlib
6 | import os
7 | from slugify import slugify
8 | import random
9 | from .session_manager import CookieSessionManager
10 | from .exception.instagram_auth_exception import InstagramAuthException
11 | from .exception.instagram_exception import InstagramException
12 | from .exception.instagram_not_found_exception import InstagramNotFoundException
13 | from .model.account import Account
14 | from .model.comment import Comment
15 | from .model.location import Location
16 | from .model.media import Media
17 | from .model.story import Story
18 | from .model.user_stories import UserStories
19 | from .model.tag import Tag
20 | from . import endpoints
21 | from .two_step_verification.console_verification import ConsoleVerification
22 |
23 | class Instagram:
24 | HTTP_NOT_FOUND = 404
25 | HTTP_OK = 200
26 | HTTP_FORBIDDEN = 403
27 | HTTP_BAD_REQUEST = 400
28 |
29 | MAX_COMMENTS_PER_REQUEST = 300
30 | MAX_LIKES_PER_REQUEST = 50
31 |     # 30-minute time limit on operations that require multiple requests
32 | PAGING_TIME_LIMIT_SEC = 1800
33 | PAGING_DELAY_MINIMUM_MICROSEC = 1000000 # 1 sec min delay to simulate browser
34 | PAGING_DELAY_MAXIMUM_MICROSEC = 3000000 # 3 sec max delay to simulate browser
35 |
36 | instance_cache = None
37 |
38 | def __init__(self, sleep_between_requests=0):
39 | self.__req = requests.session()
40 | self.paging_time_limit_sec = Instagram.PAGING_TIME_LIMIT_SEC
41 | self.paging_delay_minimum_microsec = Instagram.PAGING_DELAY_MINIMUM_MICROSEC
42 | self.paging_delay_maximum_microsec = Instagram.PAGING_DELAY_MAXIMUM_MICROSEC
43 |
44 | self.session_username = None
45 | self.session_password = None
46 | self.user_session = None
47 | self.rhx_gis = None
48 | self.sleep_between_requests = sleep_between_requests
49 |         self.user_agent = 'Mozilla/5.0 (iPhone; CPU iPhone OS 12_3_1 like Mac OS X) ' \
50 |                           'AppleWebKit/605.1.15 (KHTML, like Gecko) ' \
51 |                           'Mobile/15E148 Instagram 105.0.0.11.118 (iPhone11,8; iOS 12_3_1; en_US; en-US; scale=2.00; 828x1792; 165586599)'
52 |
53 | def with_credentials(self, username, password, session_folder=None):
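54 |         """
55 |         :param username: Instagram username
56 |         :param password: Instagram password
57 |         :param session_folder: path to the cookie folder, or a session manager object; defaults to ./sessions/
58 |
59 |         :return: None
60 |         """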
61 | Instagram.instance_cache = None
62 |
63 | if not session_folder:
64 | cwd = os.getcwd()
65 | session_folder = cwd + os.path.sep + 'sessions' + os.path.sep
66 |
67 | if isinstance(session_folder, str):
68 |
69 | Instagram.instance_cache = CookieSessionManager(
70 | session_folder, slugify(username) + '.txt')
71 |
72 | else:
73 | Instagram.instance_cache = session_folder
74 |
75 | Instagram.instance_cache.empty_saved_cookies()
76 |
77 |
78 | self.session_username = username
79 | self.session_password = password
80 |
81 | def set_proxies(self, proxy):
82 | if proxy and isinstance(proxy, dict):
83 | self.__req.proxies = proxy
84 |
85 | def disable_verify(self):
86 | self.__req.verify = False
87 |
88 | def disable_proxies(self):
89 | self.__req.proxies = {}
90 |
91 | def get_user_agent(self):
92 | return self.user_agent
93 |
94 | def set_user_agent(self, user_agent):
95 | self.user_agent = user_agent
96 |
97 | @staticmethod
98 | def set_account_medias_request_count(count):
99 | """
100 | Set how many media objects should be retrieved in a single request
101 | param int count
102 | """
103 | endpoints.request_media_count = count
104 |
105 | def get_account_by_id(self, id):
106 | """
107 | :param id: account id
108 | :return: Account
109 | """
110 | username = self.get_username_by_id(id)
111 | return self.get_account(username)
112 |
113 | def get_username_by_id(self, id):
114 | """
115 | :param id: account id
116 | :return: username string from response
117 | """
118 | time.sleep(self.sleep_between_requests)
119 | response = self.__req.get(
120 | endpoints.get_account_json_private_info_link_by_account_id(
121 | id), headers=self.generate_headers(self.user_session))
122 |
123 | if Instagram.HTTP_NOT_FOUND == response.status_code:
124 | raise InstagramNotFoundException(
125 | 'Failed to fetch account with given id')
126 |
127 | if Instagram.HTTP_OK != response.status_code:
128 | raise InstagramException.default(response.text,
129 | response.status_code)
130 |
131 | json_response = response.json()
132 | if not json_response:
133 |             raise InstagramException('Response is not JSON')
134 |
135 | if json_response['status'] != 'ok':
136 | message = json_response['message'] if (
137 | 'message' in json_response.keys()) else 'Unknown Error'
138 | raise InstagramException(message)
139 |
140 | return json_response['user']['username']
141 |
142 | def generate_headers(self, session, gis_token=None):
143 | """
144 | :param session: user session dict
145 | :param gis_token: a token used to be verified by instagram in header
146 | :return: header dict
147 | """
148 | headers = {}
149 | if session is not None:
150 | cookies = ''
151 |
152 | for key in session.keys():
153 | cookies += f"{key}={session[key]}; "
154 |
155 |             csrf = session.get('x-csrftoken') if session.get('csrftoken') is None else \
156 |                 session['csrftoken']
157 |
158 | headers = {
159 | 'cookie': cookies,
160 | 'referer': endpoints.BASE_URL + '/',
161 | 'x-csrftoken': csrf
162 | }
163 |
164 | if self.user_agent is not None:
165 | headers['user-agent'] = self.user_agent
166 |
167 | if gis_token is not None:
168 | headers['x-instagram-gis'] = gis_token
169 |
170 | return headers
171 |
172 | def __generate_gis_token(self, variables):
173 | """
174 | :param variables: a dict used to generate_gis_token
175 | :return: a token used to be verified by instagram
176 | """
177 | rhx_gis = self.__get_rhx_gis() if self.__get_rhx_gis() is not None else 'NULL'
178 | string_to_hash = ':'.join([rhx_gis, json.dumps(variables, separators=(',', ':')) if isinstance(variables, dict) else variables])
179 | return hashlib.md5(string_to_hash.encode('utf-8')).hexdigest()
180 |
181 | def __get_rhx_gis(self):
182 | """
183 | :return: a string to generate gis_token
184 | """
185 | if self.rhx_gis is None:
186 | try:
187 | shared_data = self.__get_shared_data_from_page()
188 | except Exception as _:
189 | raise InstagramException('Could not extract gis from page')
190 |
191 |             if shared_data and 'rhx_gis' in shared_data:
192 | self.rhx_gis = shared_data['rhx_gis']
193 | else:
194 | self.rhx_gis = None
195 |
196 | return self.rhx_gis
197 |
198 | def __get_mid(self):
199 | """manually fetches the machine id from graphQL"""
200 | time.sleep(self.sleep_between_requests)
201 | response = self.__req.get('https://www.instagram.com/web/__mid/')
202 |
203 | if response.status_code != Instagram.HTTP_OK:
204 | raise InstagramException.default(response.text,
205 | response.status_code)
206 |
207 | return response.text
208 |
209 | def __get_shared_data_from_page(self, url=endpoints.BASE_URL):
210 | """
211 | :param url: the requested url
212 | :return: a dict extract from page
213 | """
214 | url = url.rstrip('/') + '/'
215 | time.sleep(self.sleep_between_requests)
216 | response = self.__req.get(url, headers=self.generate_headers(
217 | self.user_session))
218 |
219 | if Instagram.HTTP_NOT_FOUND == response.status_code:
220 | raise InstagramNotFoundException(f"Page {url} not found")
221 |
222 | if not Instagram.HTTP_OK == response.status_code:
223 | raise InstagramException.default(response.text,
224 | response.status_code)
225 |
226 | return Instagram.extract_shared_data_from_body(response.text)
227 |
228 | @staticmethod
229 | def extract_shared_data_from_body(body):
230 | """
231 | :param body: html string from a page
232 | :return: a dict extract from page
233 | """
234 | array = re.findall(r'_sharedData = .*?;', body)
235 | if len(array) > 0:
236 | raw_json = array[0][len("_sharedData ="):-len(";")]
237 |
238 | return json.loads(raw_json)
239 |
240 | return None
241 |
242 | def search_tags_by_tag_name(self, tag):
243 | """
244 | :param tag: tag string
245 | :return: list of Tag
246 | """
247 | # TODO: Add tests and auth
248 | time.sleep(self.sleep_between_requests)
249 | response = self.__req.get(endpoints.get_general_search_json_link(tag))
250 |
251 | if Instagram.HTTP_NOT_FOUND == response.status_code:
252 | raise InstagramNotFoundException(
253 |                 'No tags found for the given search term.')
254 |
255 | if not Instagram.HTTP_OK == response.status_code:
256 | raise InstagramException.default(response.text,
257 | response.status_code)
258 |
259 | json_response = response.json()
260 |
261 | try:
262 | status = json_response['status']
263 | if status != 'ok':
264 | raise InstagramException(
265 | 'Response code is not equal 200. '
266 | 'Something went wrong. Please report issue.')
267 | except KeyError:
268 | raise InstagramException('Response code is not equal 200. Something went wrong. Please report issue.')
269 |
270 | try:
271 | hashtags_raw = json_response['hashtags']
272 | if len(hashtags_raw) == 0:
273 | return []
274 | except KeyError:
275 | return []
276 |
277 | hashtags = []
278 | for json_hashtag in hashtags_raw:
279 | hashtags.append(Tag(json_hashtag['hashtag']))
280 |
281 | return hashtags
282 |
283 | def get_medias(self, username, count=20, maxId=''):
284 | """
285 | :param username: instagram username
286 | :param count: the number of how many media you want to get
287 | :param maxId: used to paginate
288 | :return: list of Media
289 | """
290 | account = self.get_account(username)
291 | return self.get_medias_by_user_id(account.identifier, count, maxId)
292 |
293 | def get_medias_by_code(self, media_code):
294 | """
295 | :param media_code: media code
296 | :return: Media
297 | """
298 | url = endpoints.get_media_page_link(media_code)
299 | return self.get_media_by_url(url)
300 |
301 | def get_medias_by_user_id(self, id, count=12, max_id=''):
302 | """
303 | :param id: instagram account id
304 | :param count: the number of how many media you want to get
305 | :param max_id: used to paginate
306 | :return: list of Media
307 | """
308 | index = 0
309 | medias = []
310 | is_more_available = True
311 |
312 | while index < count and is_more_available:
313 |
314 | variables = {
315 | 'id': str(id),
316 | 'first': str(count),
317 | 'after': str(max_id)
318 | }
319 |
320 | headers = self.generate_headers(self.user_session,
321 | self.__generate_gis_token(
322 | variables))
323 |
324 | time.sleep(self.sleep_between_requests)
325 | response = self.__req.get(
326 | endpoints.get_account_medias_json_link(variables),
327 | headers=headers)
328 |
329 | if not Instagram.HTTP_OK == response.status_code:
330 | raise InstagramException.default(response.text,
331 | response.status_code)
332 |
333 | arr = json.loads(response.text)
334 |
335 | try:
336 | nodes = arr['data']['user']['edge_owner_to_timeline_media'][
337 | 'edges']
338 | except KeyError:
339 |                 return []
340 |
341 | for mediaArray in nodes:
342 | if index == count:
343 | return medias
344 |
345 | media = Media(mediaArray['node'])
346 | medias.append(media)
347 | index += 1
348 |
349 | if not nodes or nodes == '':
350 | return medias
351 |
352 | max_id = \
353 | arr['data']['user']['edge_owner_to_timeline_media'][
354 | 'page_info'][
355 | 'end_cursor']
356 | is_more_available = \
357 | arr['data']['user']['edge_owner_to_timeline_media'][
358 | 'page_info'][
359 | 'has_next_page']
360 |
361 | return medias
362 |
363 | def get_media_by_id(self, media_id):
364 | """
365 | :param media_id: media id
366 |         :return: Media
367 | """
368 | media_link = Media.get_link_from_id(media_id)
369 | return self.get_media_by_url(media_link)
370 |
371 | def get_media_by_url(self, media_url):
372 | """
373 | :param media_url: media url
374 | :return: Media
375 | """
376 | url_regex = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
377 |
378 | if len(re.findall(url_regex, media_url)) <= 0:
379 | raise ValueError('Malformed media url')
380 |
381 | url = media_url.rstrip('/') + '/?__a=1'
382 | time.sleep(self.sleep_between_requests)
383 | response = self.__req.get(url, headers=self.generate_headers(
384 | self.user_session))
385 |
386 | if Instagram.HTTP_NOT_FOUND == response.status_code:
387 | raise InstagramNotFoundException(
388 | 'Media with given code does not exist or account is private.')
389 |
390 | if Instagram.HTTP_OK != response.status_code:
391 | raise InstagramException.default(response.text,
392 | response.status_code)
393 |
394 | media_array = response.json()
395 | try:
396 | media_in_json = media_array['graphql']['shortcode_media']
397 | except KeyError:
398 | raise InstagramException('Media with this code does not exist')
399 |
400 | return Media(media_in_json)
401 |
402 | def get_medias_from_feed(self, username, count=20):
403 | """
404 | :param username: instagram username
405 | :param count: the number of how many media you want to get
406 | :return: list of Media
407 | """
408 | medias = []
409 | index = 0
410 | time.sleep(self.sleep_between_requests)
411 | response = self.__req.get(endpoints.get_account_json_link(username),
412 | headers=self.generate_headers(
413 | self.user_session))
414 |
415 | if Instagram.HTTP_NOT_FOUND == response.status_code:
416 | raise InstagramNotFoundException(
417 | 'Account with given username does not exist.')
418 |
419 | if Instagram.HTTP_OK != response.status_code:
420 | raise InstagramException.default(response.text,
421 | response.status_code)
422 |
423 | user_array = response.json()
424 |
425 | try:
426 | user = user_array['graphql']['user']
427 | except KeyError:
428 | raise InstagramNotFoundException(
429 | 'Account with this username does not exist')
430 |
431 | try:
432 | nodes = user['edge_owner_to_timeline_media']['edges']
433 | if len(nodes) == 0:
434 | return []
435 | except Exception:
436 | return []
437 |
438 | for media_array in nodes:
439 | if index == count:
440 | return medias
441 | medias.append(Media(media_array['node']))
442 | index += 1
443 |
444 | return medias
445 |
446 | def get_medias_by_tag(self, tag, count=12, max_id='', min_timestamp=None):
447 | """
448 | :param tag: tag string
449 | :param count: the number of how many media you want to get
450 | :param max_id: used to paginate
451 | :param min_timestamp: limit the time you want to start from
452 | :return: list of Media
453 | """
454 | index = 0
455 | medias = []
456 | media_ids = []
457 | has_next_page = True
458 | while index < count and has_next_page:
459 |
460 | time.sleep(self.sleep_between_requests)
461 | response = self.__req.get(
462 | endpoints.get_medias_json_by_tag_link(tag, max_id),
463 | headers=self.generate_headers(self.user_session))
464 |
465 | if response.status_code != Instagram.HTTP_OK:
466 | raise InstagramException.default(response.text,
467 | response.status_code)
468 |
469 | arr = response.json()
470 |
471 | try:
472 | arr['graphql']['hashtag']['edge_hashtag_to_media']['count']
473 | except KeyError:
474 | return []
475 |
476 | nodes = arr['graphql']['hashtag']['edge_hashtag_to_media']['edges']
477 | for media_array in nodes:
478 | if index == count:
479 | return medias
480 | media = Media(media_array['node'])
481 | if media.identifier in media_ids:
482 | return medias
483 |
484 | if min_timestamp is not None \
485 | and media.created_time < min_timestamp:
486 | return medias
487 |
488 | media_ids.append(media.identifier)
489 | medias.append(media)
490 | index += 1
491 |
492 | if len(nodes) == 0:
493 | return medias
494 |
495 | max_id = \
496 | arr['graphql']['hashtag']['edge_hashtag_to_media']['page_info'][
497 | 'end_cursor']
498 | has_next_page = \
499 | arr['graphql']['hashtag']['edge_hashtag_to_media']['page_info'][
500 | 'has_next_page']
501 |
502 | return medias
503 |
504 | def get_medias_by_location_id(self, facebook_location_id, count=24,
505 | max_id=''):
506 | """
507 | :param facebook_location_id: facebook location id
508 | :param count: the number of how many media you want to get
509 | :param max_id: used to paginate
510 | :return: list of Media
511 | """
512 | index = 0
513 | medias = []
514 | has_next_page = True
515 |
516 | while index < count and has_next_page:
517 |
518 | time.sleep(self.sleep_between_requests)
519 | response = self.__req.get(
520 | endpoints.get_medias_json_by_location_id_link(
521 | facebook_location_id, max_id),
522 | headers=self.generate_headers(self.user_session))
523 |
524 | if response.status_code != Instagram.HTTP_OK:
525 | raise InstagramException.default(response.text,
526 | response.status_code)
527 |
528 | arr = response.json()
529 |
530 | nodes = arr['graphql']['location']['edge_location_to_media'][
531 | 'edges']
532 | for media_array in nodes:
533 | if index == count:
534 | return medias
535 |
536 | medias.append(Media(media_array['node']))
537 | index += 1
538 |
539 | if len(nodes) == 0:
540 | return medias
541 |
542 | has_next_page = \
543 | arr['graphql']['location']['edge_location_to_media'][
544 | 'page_info'][
545 | 'has_next_page']
546 | max_id = \
547 | arr['graphql']['location']['edge_location_to_media'][
548 | 'page_info'][
549 | 'end_cursor']
550 |
551 | return medias
552 |
553 | def get_current_top_medias_by_tag_name(self, tag_name):
554 | """
555 | :param tag_name: tag string
556 | :return: list of the top Media
557 | """
558 | time.sleep(self.sleep_between_requests)
559 | response = self.__req.get(
560 | endpoints.get_medias_json_by_tag_link(tag_name, ''),
561 | headers=self.generate_headers(self.user_session))
562 |
563 | if response.status_code == Instagram.HTTP_NOT_FOUND:
564 | raise InstagramNotFoundException(
565 |                 'Tag with given name does not exist.')
566 |
567 |         if response.status_code != Instagram.HTTP_OK:
568 | raise InstagramException.default(response.text,
569 | response.status_code)
570 |
571 | json_response = response.json()
572 | medias = []
573 |
574 | nodes = \
575 | json_response['graphql']['hashtag']['edge_hashtag_to_top_posts'][
576 | 'edges']
577 |
578 | for media_array in nodes:
579 | medias.append(Media(media_array['node']))
580 |
581 | return medias
582 |
583 | def get_current_top_medias_by_location_id(self, facebook_location_id):
584 | """
585 | :param facebook_location_id: facebook location id
586 | :return: list of the top Media
587 | """
588 | time.sleep(self.sleep_between_requests)
589 | response = self.__req.get(
590 | endpoints.get_medias_json_by_location_id_link(facebook_location_id),
591 | headers=self.generate_headers(self.user_session))
592 | if response.status_code == Instagram.HTTP_NOT_FOUND:
593 | raise InstagramNotFoundException(
594 | "Location with this id doesn't exist")
595 |
596 | if response.status_code != Instagram.HTTP_OK:
597 | raise InstagramException.default(response.text,
598 | response.status_code)
599 |
600 | json_response = response.json()
601 |
602 | nodes = \
603 | json_response['graphql']['location']['edge_location_to_top_posts'][
604 | 'edges']
605 | medias = []
606 |
607 | for media_array in nodes:
608 | medias.append(Media(media_array['node']))
609 |
610 | return medias
611 |
612 | def get_paginate_medias(self, username, max_id=''):
613 | """
614 | :param username: instagram user name
615 | :param max_id: used to paginate next time
616 | :return: dict that contains Media list, maxId, hasNextPage
617 | """
618 | account = self.get_account(username)
619 | has_next_page = True
620 | medias = []
621 |
622 | to_return = {
623 | 'medias': medias,
624 | 'maxId': max_id,
625 | 'hasNextPage': has_next_page,
626 | }
627 |
628 | variables = json.dumps({
629 | 'id': str(account.identifier),
630 | 'first': str(endpoints.request_media_count),
631 | 'after': str(max_id)
632 | }, separators=(',', ':'))
633 |
634 | time.sleep(self.sleep_between_requests)
635 | response = self.__req.get(
636 | endpoints.get_account_medias_json_link(variables),
637 | headers=self.generate_headers(self.user_session,
638 | self.__generate_gis_token(variables))
639 | )
640 |
641 | if not Instagram.HTTP_OK == response.status_code:
642 | raise InstagramException.default(response.text,
643 | response.status_code)
644 |
645 | arr = response.json()
646 |
647 | try:
648 | nodes = arr['data']['user']['edge_owner_to_timeline_media']['edges']
649 | except KeyError:
650 | return to_return
651 |
652 | for mediaArray in nodes:
653 | medias.append(Media(mediaArray['node']))
654 |
655 | max_id = \
656 | arr['data']['user']['edge_owner_to_timeline_media']['page_info'][
657 | 'end_cursor']
658 | has_next_page = \
659 | arr['data']['user']['edge_owner_to_timeline_media']['page_info'][
660 | 'has_next_page']
661 |
662 | to_return = {
663 | 'medias': medias,
664 | 'maxId': max_id,
665 | 'hasNextPage': has_next_page,
666 | }
667 |
668 | return to_return
669 |
670 | def get_paginate_medias_by_tag(self, tag, max_id=''):
671 | """
672 | :param tag: tag name
673 | :param max_id: used to paginate next time
674 | :return: dict that contains Media list, maxId, hasNextPage
675 | """
676 | has_next_page = True
677 | medias = []
678 |
679 | to_return = {
680 | 'medias': medias,
681 | 'maxId': max_id,
682 | 'hasNextPage': has_next_page,
683 | }
684 |
685 | time.sleep(self.sleep_between_requests)
686 | response = self.__req.get(
687 | endpoints.get_medias_json_by_tag_link(tag, max_id),
688 | headers=self.generate_headers(self.user_session))
689 |
690 | if response.status_code != Instagram.HTTP_OK:
691 | raise InstagramException.default(response.text,
692 | response.status_code)
693 |
694 | arr = response.json()
695 |
696 | try:
697 | nodes = arr['graphql']['hashtag']['edge_hashtag_to_media']['edges']
698 | except KeyError:
699 | return to_return
700 |
701 | for media_array in nodes:
702 | medias.append(Media(media_array['node']))
703 |
704 | max_id = \
705 | arr['graphql']['hashtag']['edge_hashtag_to_media']['page_info'][
706 | 'end_cursor']
707 | has_next_page = \
708 | arr['graphql']['hashtag']['edge_hashtag_to_media']['page_info'][
709 | 'has_next_page']
710 | try:
711 | media_count = arr['graphql']['hashtag']['edge_hashtag_to_media'][
712 | 'count']
713 | except KeyError:
714 | return to_return
715 |
716 | to_return = {
717 | 'medias': medias,
718 | 'count': media_count,
719 | 'maxId': max_id,
720 | 'hasNextPage': has_next_page,
721 | }
722 |
723 | return to_return
724 |
725 | def get_location_by_id(self, facebook_location_id):
726 | """
727 | :param facebook_location_id: facebook location id
728 | :return: Location
729 | """
730 | time.sleep(self.sleep_between_requests)
731 | response = self.__req.get(
732 | endpoints.get_medias_json_by_location_id_link(facebook_location_id),
733 | headers=self.generate_headers(self.user_session))
734 |
735 | if response.status_code == Instagram.HTTP_NOT_FOUND:
736 | raise InstagramNotFoundException(
737 | 'Location with this id doesn\'t exist')
738 |
739 | if response.status_code != Instagram.HTTP_OK:
740 | raise InstagramException.default(response.text,
741 | response.status_code)
742 |
743 | json_response = response.json()
744 |
745 | return Location(json_response['graphql']['location'])
746 |
747 | def get_media_likes_by_code(self, code, count=10, max_id=None):
748 | """
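749 |         :param code: media shortcode
750 |         :param count: the number of likes you want to get
751 |         :param max_id: used to paginate
752 |         :return: dict with 'next_page' (cursor) and 'accounts' (list of Account)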
753 | """
754 |
755 | remain = count
756 | likes = []
757 | index = 0
758 | has_previous = True
759 |
760 |         # TODO: enforce index < count (index can get too high because Instagram sometimes changes the maximum likes per request)
761 |
762 | while (has_previous and index < count):
763 | if (remain > self.MAX_LIKES_PER_REQUEST):
764 | number_of_likes_to_receive = self.MAX_LIKES_PER_REQUEST
765 | remain -= self.MAX_LIKES_PER_REQUEST
766 | index += self.MAX_LIKES_PER_REQUEST
767 | else:
768 | number_of_likes_to_receive = remain
769 | index += remain
770 | remain = 0
771 |
772 |             if max_id is None:
773 |                 max_id = ''
774 |
775 | variables = {
776 | "shortcode": str(code),
777 | "first": str(number_of_likes_to_receive),
778 | "after": '' if not max_id else max_id
779 | }
780 |
781 | time.sleep(self.sleep_between_requests)
782 |
783 | response = self.__req.get(
784 | endpoints.get_last_likes_by_code(variables),
785 | headers=self.generate_headers(self.user_session))
786 |
787 | if not response.status_code == Instagram.HTTP_OK:
788 | raise InstagramException.default(response.text,response.status_code)
789 |
790 | jsonResponse = response.json()
791 |
792 | nodes = jsonResponse['data']['shortcode_media']['edge_liked_by']['edges']
793 |
794 | for likesArray in nodes:
795 |
796 | like = Account(likesArray['node'])
797 | likes.append(like)
798 |
799 |
800 | has_previous = jsonResponse['data']['shortcode_media']['edge_liked_by']['page_info']['has_next_page']
801 | number_of_likes = jsonResponse['data']['shortcode_media']['edge_liked_by']['count']
802 | if count > number_of_likes:
803 | count = number_of_likes
804 |
805 | if len(nodes) == 0:
806 | data = {}
807 | data['next_page'] = max_id
808 | data['accounts'] = likes
809 |
810 | return data
811 |
812 | max_id = jsonResponse['data']['shortcode_media']['edge_liked_by']['page_info']['end_cursor']
813 |
814 | data = {}
815 | data['next_page'] = max_id
816 | data['accounts'] = likes
817 |
818 | return data
819 |
820 | def get_followers(self, account_id, count=20, page_size=20, end_cursor='',
821 | delayed=True):
822 |
823 | """
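824 |         :param account_id: id of the account whose followers are fetched
825 |         :param count: the total number of followers you want to get
826 |         :param page_size: page size; count must be greater than or equal to it
827 |         :param end_cursor: cursor to start from, used to paginate
828 |         :param delayed: if set, wait a random 1 to 3 seconds between pages
829 |         :return: dict with 'next_page' (cursor) and 'accounts' (list of Account); an empty list if the account has no followers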
830 | """
831 | # TODO set time limit
832 | # if ($delayed) {
833 | # set_time_limit($this->pagingTimeLimitSec);
834 | # }
835 |
836 | index = 0
837 | accounts = []
838 |
839 | next_page = end_cursor
840 |
841 | if count < page_size:
842 | raise InstagramException(
843 | 'Count must be greater than or equal to page size.')
844 |
845 | while True:
846 | time.sleep(self.sleep_between_requests)
847 |
848 | variables = {
849 | 'id': str(account_id),
850 | 'first': str(count),
851 | 'after': next_page
852 | }
853 |
854 | headers = self.generate_headers(self.user_session)
855 |
856 | response = self.__req.get(
857 | endpoints.get_followers_json_link(variables),
858 | headers=headers)
859 |
860 | if not response.status_code == Instagram.HTTP_OK:
861 | raise InstagramException.default(response.text,
862 | response.status_code)
863 |
864 | jsonResponse = response.json()
865 |
866 | if jsonResponse['data']['user']['edge_followed_by']['count'] == 0:
867 | return accounts
868 |
869 | edgesArray = jsonResponse['data']['user']['edge_followed_by'][
870 | 'edges']
871 | if len(edgesArray) == 0:
872 |                 raise InstagramException(
873 | f'Failed to get followers of account id {account_id}.'
874 | f' The account is private.',
875 | Instagram.HTTP_FORBIDDEN)
876 |
877 | pageInfo = jsonResponse['data']['user']['edge_followed_by'][
878 | 'page_info']
879 | if pageInfo['has_next_page']:
880 | next_page = pageInfo['end_cursor']
881 |
882 | for edge in edgesArray:
883 |
884 | accounts.append(Account(edge['node']))
885 | index += 1
886 |
887 | if index >= count:
888 |                     # Python has no multi-level break (PHP's "break 2"), so the result is built and returned here
889 | data = {}
890 | data['next_page'] = next_page
891 | data['accounts'] = accounts
892 |
893 | return data
894 |
895 | #must be below here
896 | if not pageInfo['has_next_page']:
897 | break
898 |
899 |             if delayed:
900 |                 # Random wait between 1 and 3 sec to mimic browser
901 |                 delay_sec = random.uniform(1.0, 3.0)
902 |                 time.sleep(delay_sec)
903 |
904 | data = {}
905 | data['next_page'] = next_page
906 | data['accounts'] = accounts
907 |
908 | return data
909 |
910 | def get_following(self, account_id, count=20, page_size=20, end_cursor='',
911 | delayed=True):
912 | """
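913 |         :param account_id: id of the account whose followed accounts are fetched
914 |         :param count: the total number of followed accounts you want to get
915 |         :param page_size: page size; count must be greater than or equal to it
916 |         :param end_cursor: cursor to start from, used to paginate
917 |         :param delayed: if set, wait a random 1 to 3 seconds between pages
918 |         :return: dict with 'next_page' (cursor) and 'accounts' (list of Account); an empty list if the account follows nobody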
919 | """
920 |
921 | #TODO
922 | # if ($delayed) {
923 | # set_time_limit($this->pagingTimeLimitSec);
924 | # }
925 |
926 | index = 0
927 | accounts = []
928 |
929 | next_page = end_cursor
930 |
931 | if count < page_size:
932 | raise InstagramException('Count must be greater than or equal to page size.')
933 |
934 | while True:
935 |
936 | variables = {
937 | 'id': str(account_id),
938 | 'first': str(count),
939 | 'after': next_page
940 | }
941 |
942 | headers = self.generate_headers(self.user_session)
943 |
944 |
945 | response = self.__req.get(
946 | endpoints.get_following_json_link(variables),
947 | headers=headers)
948 |
949 | if not response.status_code == Instagram.HTTP_OK:
950 | raise InstagramException.default(response.text,response.status_code)
951 |
952 | jsonResponse = response.json()
953 | if jsonResponse['data']['user']['edge_follow']['count'] == 0:
954 | return accounts
955 |
956 | edgesArray = jsonResponse['data']['user']['edge_follow'][
957 | 'edges']
958 |
959 | if len(edgesArray) == 0:
960 | raise InstagramException(
961 | f'Failed to get follows of account id {account_id}.'
962 | f' The account is private.',
963 | Instagram.HTTP_FORBIDDEN)
964 |
965 | pageInfo = jsonResponse['data']['user']['edge_follow']['page_info']
966 | if pageInfo['has_next_page']:
967 | next_page = pageInfo['end_cursor']
968 |
969 | for edge in edgesArray:
970 | accounts.append(Account(edge['node']))
971 | index += 1
972 | if index >= count:
973 |                     # Python has no multi-level break (PHP's "break 2"), so the result is built and returned here
974 | data = {}
975 | data['next_page'] = next_page
976 | data['accounts'] = accounts
977 |
978 | return data
979 |
980 | #must be below here
981 | if not pageInfo['has_next_page']:
982 | break
983 |
984 |             if delayed:
985 |                 # Random wait between 1 and 3 sec to mimic browser
986 |                 delay_sec = random.uniform(1.0, 3.0)
987 |                 time.sleep(delay_sec)
988 |
989 | data = {}
990 | data['next_page'] = next_page
991 | data['accounts'] = accounts
992 |
993 | return data
994 |
995 | def get_media_comments_by_id(self, media_id, count=10, max_id=None):
996 | """
997 | :param media_id: media id
998 | :param count: the number of how many comments you want to get
999 | :param max_id: used to paginate
1000 |         :return: dict with 'next_page' (cursor) and 'comments' (list of Comment)
1001 | """
1002 | code = Media.get_code_from_id(media_id)
1003 | return self.get_media_comments_by_code(code, count, max_id)
1004 |
1005 | def get_media_comments_by_code(self, code, count=10, max_id=''):
1006 |
1007 | """
1008 | :param code: media code
1009 | :param count: the number of how many comments you want to get
1010 | :param max_id: used to paginate
1011 |         :return: dict with 'next_page' (cursor) and 'comments' (list of Comment)
1012 | """
1013 |
1014 | comments = []
1015 | index = 0
1016 | has_previous = True
1017 |
1018 | while has_previous and index < count:
1019 | number_of_comments_to_receive = 0
1020 | if count - index > Instagram.MAX_COMMENTS_PER_REQUEST:
1021 | number_of_comments_to_receive = Instagram.MAX_COMMENTS_PER_REQUEST
1022 | else:
1023 | number_of_comments_to_receive = count - index
1024 |
1025 | variables = {
1026 | "shortcode": str(code),
1027 | "first": str(number_of_comments_to_receive),
1028 | "after": '' if not max_id else max_id
1029 | }
1030 |
1031 | comments_url = endpoints.get_comments_before_comments_id_by_code(
1032 | variables)
1033 |
1034 | time.sleep(self.sleep_between_requests)
1035 | response = self.__req.get(comments_url,
1036 | headers=self.generate_headers(
1037 | self.user_session,
1038 | self.__generate_gis_token(variables)))
1039 |
1040 | if not response.status_code == Instagram.HTTP_OK:
1041 | raise InstagramException.default(response.text,
1042 | response.status_code)
1043 |
1044 | jsonResponse = response.json()
1045 |
1046 | nodes = jsonResponse['data']['shortcode_media']['edge_media_to_parent_comment']['edges']
1047 |
1048 | for commentArray in nodes:
1049 | comment = Comment(commentArray['node'])
1050 | comments.append(comment)
1051 | index += 1
1052 |
1053 | has_previous = jsonResponse['data']['shortcode_media']['edge_media_to_parent_comment']['page_info']['has_next_page']
1054 |
1055 | number_of_comments = jsonResponse['data']['shortcode_media']['edge_media_to_parent_comment']['count']
1056 | if count > number_of_comments:
1057 | count = number_of_comments
1058 |
1059 | max_id = jsonResponse['data']['shortcode_media']['edge_media_to_parent_comment']['page_info']['end_cursor']
1060 |
1061 | if len(nodes) == 0:
1062 | break
1063 |
1064 |
1065 | data = {}
1066 | data['next_page'] = max_id
1067 | data['comments'] = comments
1068 | return data
1069 |
1070 | def get_account(self, username):
1071 | """
1072 | :param username: username
1073 | :return: Account
1074 | """
1075 | time.sleep(self.sleep_between_requests)
1076 | response = self.__req.get(endpoints.get_account_page_link(
1077 | username), headers=self.generate_headers(self.user_session))
1078 |
1079 | if Instagram.HTTP_NOT_FOUND == response.status_code:
1080 | raise InstagramNotFoundException(
1081 | 'Account with given username does not exist.')
1082 |
1083 | if Instagram.HTTP_OK != response.status_code:
1084 | raise InstagramException.default(response.text,
1085 | response.status_code)
1086 |
1087 | user_array = Instagram.extract_shared_data_from_body(response.text)
1088 |
1089 | if user_array['entry_data']['ProfilePage'][0]['graphql']['user'] is None:
1090 | raise InstagramNotFoundException(
1091 | 'Account with this username does not exist')
1092 |
1093 | return Account(
1094 | user_array['entry_data']['ProfilePage'][0]['graphql']['user'])
1095 |
1096 | def get_stories(self, reel_ids=None):
1097 | """
1098 | :param reel_ids: reel ids
1099 | :return: UserStories List
1100 | """
1101 | variables = {'precomposed_overlay': False, 'reel_ids': []}
1102 |
1103 | if reel_ids is None or len(reel_ids) == 0:
1104 | time.sleep(self.sleep_between_requests)
1105 | response = self.__req.get(endpoints.get_user_stories_link(),
1106 | headers=self.generate_headers(
1107 | self.user_session))
1108 |
1109 | if not Instagram.HTTP_OK == response.status_code:
1110 | raise InstagramException.default(response.text,
1111 | response.status_code)
1112 |
1113 | json_response = response.json()
1114 |
1115 | try:
1116 | edges = json_response['data']['user']['feed_reels_tray'][
1117 | 'edge_reels_tray_to_reel']['edges']
1118 | except KeyError:
1119 | return []
1120 |
1121 | for edge in edges:
1122 | variables['reel_ids'].append(edge['node']['id'])
1123 |
1124 | else:
1125 | variables['reel_ids'] = reel_ids
1126 |
1127 | time.sleep(self.sleep_between_requests)
1128 | response = self.__req.get(endpoints.get_stories_link(variables),
1129 | headers=self.generate_headers(
1130 | self.user_session))
1131 |
1132 | if not Instagram.HTTP_OK == response.status_code:
1133 | raise InstagramException.default(response.text,
1134 | response.status_code)
1135 |
1136 | json_response = response.json()
1137 |
1138 | try:
1139 | reels_media = json_response['data']['reels_media']
1140 | if len(reels_media) == 0:
1141 | return []
1142 | except KeyError:
1143 | return []
1144 |
1145 | stories = []
1146 | for user in reels_media:
1147 | user_stories = UserStories()
1148 |
1149 | user_stories.owner = Account(user['user'])
1150 | for item in user['items']:
1151 | story = Story(item)
1152 | user_stories.stories.append(story)
1153 |
1154 | stories.append(user_stories)
1155 | return stories
1156 |
1157 | def search_accounts_by_username(self, username):
1158 | """
1159 | :param username: user name
1160 | :return: Account List
1161 | """
1162 | time.sleep(self.sleep_between_requests)
1163 | response = self.__req.get(
1164 | endpoints.get_general_search_json_link(username),
1165 | headers=self.generate_headers(self.user_session))
1166 |
1167 | if Instagram.HTTP_NOT_FOUND == response.status_code:
1168 | raise InstagramNotFoundException(
1169 | 'Account with given username does not exist.')
1170 |
1171 | if not Instagram.HTTP_OK == response.status_code:
1172 | raise InstagramException.default(response.text,
1173 | response.status_code)
1174 |
1175 | json_response = response.json()
1176 |
1177 | try:
1178 | status = json_response['status']
1179 | if not status == 'ok':
1180 | raise InstagramException(
1181 | 'Response code is not equal 200.'
1182 | ' Something went wrong. Please report issue.')
1183 | except KeyError:
1184 | raise InstagramException(
1185 | 'Response code is not equal 200.'
1186 | ' Something went wrong. Please report issue.')
1187 |
1188 | try:
1189 | users = json_response['users']
1190 | if len(users) == 0:
1191 | return []
1192 | except KeyError:
1193 | return []
1194 |
1195 | accounts = []
1196 | for json_account in json_response['users']:
1197 | accounts.append(Account(json_account['user']))
1198 |
1199 | return accounts
1200 |
1201 | # TODO not optimal separate http call after getMedia
1202 | def get_media_tagged_users_by_code(self, code):
1203 | """
1204 | :param code: media short code
1205 | :return: list contains tagged_users dict
1206 | """
1207 | url = endpoints.get_media_json_link(code)
1208 |
1209 | time.sleep(self.sleep_between_requests)
1210 | response = self.__req.get(url, headers=self.generate_headers(
1211 | self.user_session))
1212 |
1213 | if not Instagram.HTTP_OK == response.status_code:
1214 | raise InstagramException.default(response.text,
1215 | response.status_code)
1216 |
1217 | json_response = response.json()
1218 |
1219 | try:
1220 | tag_data = json_response['graphql']['shortcode_media'][
1221 | 'edge_media_to_tagged_user']['edges']
1222 | except KeyError:
1223 | return []
1224 |
1225 | tagged_users = []
1226 |
1227 | for tag in tag_data:
1228 | x_pos = tag['node']['x']
1229 | y_pos = tag['node']['y']
1230 | user = tag['node']['user']
1231 | # TODO: add Model and add Data to it instead of Dict
1232 | tagged_user = dict()
1233 | tagged_user['x_pos'] = x_pos
1234 | tagged_user['y_pos'] = y_pos
1235 | tagged_user['user'] = user
1236 |
1237 | tagged_users.append(tagged_user)
1238 |
1239 | return tagged_users
1240 |
1241 | def is_logged_in(self, session):
1242 | """
1243 | :param session: session dict
1244 | :return: bool
1245 | """
1246 | if session is None or 'sessionid' not in session.keys():
1247 | return False
1248 |
1249 | session_id = session['sessionid']
1250 | csrf_token = session['csrftoken']
1251 |
1252 | headers = {
1253 | 'cookie': f"ig_cb=1; csrftoken={csrf_token}; sessionid={session_id};",
1254 | 'referer': endpoints.BASE_URL + '/',
1255 | 'x-csrftoken': csrf_token,
1256 | 'X-CSRFToken': csrf_token,
1257 | 'user-agent': self.user_agent,
1258 | }
1259 |
1260 | time.sleep(self.sleep_between_requests)
1261 | response = self.__req.get(endpoints.BASE_URL, headers=headers)
1262 |
1263 | if not response.status_code == Instagram.HTTP_OK:
1264 | return False
1265 |
1266 | cookies = response.cookies.get_dict()
1267 |
1268 | if cookies is None or 'ds_user_id' not in cookies:
1269 | return False
1270 |
1271 | return True
1272 |
1273 | def login(self, force=False, two_step_verificator=None):
1274 | """Two-step verification only works in CLI mode: log in once from the CLI, save the cookies to file, then reuse them in any mode.
1275 | :param force: True forces a fresh login even if the saved session is still valid
1276 | :param two_step_verificator: truthy to prompt on the console when Instagram requires a checkpoint verification
1277 | :return: headers dict
1278 | """
1279 | if self.session_username is None or self.session_password is None:
1280 | raise InstagramAuthException("User credentials not provided")
1281 |
1282 | if two_step_verificator:
1283 | two_step_verificator = ConsoleVerification()
1284 |
1285 | saved_cookies = Instagram.instance_cache.get_saved_cookies()
1286 | session = json.loads(saved_cookies) if saved_cookies is not None else None
1287 |
1288 | if force or not self.is_logged_in(session):
1289 | time.sleep(self.sleep_between_requests)
1290 | response = self.__req.get(endpoints.BASE_URL)
1291 | if not response.status_code == Instagram.HTTP_OK:
1292 | raise InstagramException.default(response.text,
1293 | response.status_code)
1294 |
1295 | match = re.findall(r'"csrf_token":"(.*?)"', response.text)
1296 |
1297 | if len(match) == 0:
1298 | raise InstagramAuthException('Could not find csrf_token in the login page.')
1299 | csrfToken = match[0]
1300 | cookies = response.cookies.get_dict()
1301 |
1302 | # cookies['mid'] doesn't work at the moment, so fetch it with a helper
1303 | mid = self.__get_mid()
1304 |
1305 | headers = {
1306 | 'cookie': f"ig_cb=1; csrftoken={csrfToken}; mid={mid};",
1307 | 'referer': endpoints.BASE_URL + '/',
1308 | 'x-csrftoken': csrfToken,
1309 | 'X-CSRFToken': csrfToken,
1310 | 'user-agent': self.user_agent,
1311 | }
1312 | payload = {'username': self.session_username,
1313 | 'password': self.session_password}
1314 | response = self.__req.post(endpoints.LOGIN_URL, data=payload,
1315 | headers=headers)
1316 |
1317 | if not response.status_code == Instagram.HTTP_OK:
1318 | if (
1319 | response.status_code == Instagram.HTTP_BAD_REQUEST
1320 | and response.text is not None
1321 | and response.json()['message'] == 'checkpoint_required'
1322 | and two_step_verificator is not None):
1323 | response = self.__verify_two_step(response, cookies,
1324 | two_step_verificator)
1325 | print('checkpoint required')
1326 |
1327 | elif response.status_code is not None and response.text is not None:
1328 | raise InstagramAuthException(
1329 | f'Response code is {response.status_code}. Body: {response.text} Something went wrong. Please report issue.',
1330 | response.status_code)
1331 | else:
1332 | raise InstagramAuthException(
1333 | 'Something went wrong. Please report issue.',
1334 | response.status_code)
1335 |
1336 | if not response.json()['authenticated']:
1337 | raise InstagramAuthException('User credentials are wrong.')
1338 |
1339 | cookies = response.cookies.get_dict()
1340 |
1341 | cookies['mid'] = mid
1342 | Instagram.instance_cache.set_saved_cookies(json.dumps(cookies, separators=(',', ':')))
1343 |
1344 | self.user_session = cookies
1345 |
1346 | else:
1347 | self.user_session = session
1348 |
1349 | return self.generate_headers(self.user_session)
1350 |
1351 | def __verify_two_step(self, response, cookies, two_step_verificator):
1352 | """
1353 | :param response: Response object returned by Request
1354 | :param cookies: user cookies
1355 | :param two_step_verificator: two_step_verification instance
1356 | :return: Response
1357 | """
1358 | new_cookies = response.cookies.get_dict()
1359 | cookies = {**cookies, **new_cookies}
1360 |
1361 | cookie_string = ''
1362 | for key in cookies.keys():
1363 | cookie_string += f'{key}={cookies[key]};'
1364 |
1365 | headers = {
1366 | 'cookie': cookie_string,
1367 | 'referer': endpoints.LOGIN_URL,
1368 | 'x-csrftoken': cookies['csrftoken'],
1369 | 'user-agent': self.user_agent,
1370 | }
1371 |
1372 | url = endpoints.BASE_URL + response.json()['checkpoint_url']
1373 |
1374 | time.sleep(self.sleep_between_requests)
1375 | response = self.__req.get(url, headers=headers)
1376 | data = Instagram.extract_shared_data_from_body(response.text)
1377 |
1378 | if data is not None:
1379 | try:
1380 | choices = \
1381 | data['entry_data']['Challenge'][0]['extraData']['content'][
1382 | 3][
1383 | 'fields'][0]['values']
1384 | except KeyError:
1385 | choices = []
1386 | try:
1387 | fields = data['entry_data']['Challenge'][0]['fields']
1388 | try:
1389 | choices.append({'label': f"Email: {fields['email']}",
1390 | 'value': 1})
1391 | except KeyError:
1392 | pass
1393 | try:
1394 | choices.append(
1395 | {'label': f"Phone: {fields['phone_number']}",
1396 | 'value': 0})
1397 | except KeyError:
1398 | pass
1399 |
1400 | except KeyError:
1401 | pass
1402 |
1403 | if len(choices) > 0:
1404 | selected_choice = two_step_verificator.get_verification_type(
1405 | choices)
1406 | response = self.__req.post(url,
1407 | data={'choice': selected_choice},
1408 | headers=headers)
1409 |
1410 | if len(re.findall('name="security_code"', response.text)) <= 0:
1411 | raise InstagramAuthException(
1412 | 'Something went wrong during '
1413 | 'two-step verification. Please report the issue.',
1414 | response.status_code)
1415 |
1416 | security_code = two_step_verificator.get_security_code()
1417 |
1418 | post_data = {
1419 | 'csrfmiddlewaretoken': cookies['csrftoken'],
1420 | 'verify': 'Verify Account',
1421 | 'security_code': security_code,
1422 | }
1423 | response = self.__req.post(url, data=post_data, headers=headers)
1424 | if not response.status_code == Instagram.HTTP_OK \
1425 | or 'Please check the code we sent you and try again' in response.text:
1426 | raise InstagramAuthException(
1427 | 'Something went wrong during two-step'
1428 | ' verification while entering the security code. Please report the issue.',
1429 | response.status_code)
1430 |
1431 | return response
1432 |
1433 | def add_comment(self, media_id, text, replied_to_comment_id=None):
1434 | """
1435 | :param media_id: media id
1436 | :param text: the content you want to post
1437 | :param replied_to_comment_id: the id of the comment you want to reply
1438 | :return: Comment
1439 | """
1440 | media_id = media_id.identifier if isinstance(media_id, Media) else media_id
1441 |
1442 | replied_to_comment_id = replied_to_comment_id.identifier if isinstance(replied_to_comment_id, Comment) else replied_to_comment_id
1443 |
1444 | body = {'comment_text': text,
1445 | 'replied_to_comment_id': replied_to_comment_id
1446 | if replied_to_comment_id is not None else ''}
1447 |
1448 | response = self.__req.post(endpoints.get_add_comment_url(media_id),
1449 | data=body, headers=self.generate_headers(
1450 | self.user_session))
1451 |
1452 | if not Instagram.HTTP_OK == response.status_code:
1453 | raise InstagramException.default(response.text,
1454 | response.status_code)
1455 |
1456 | json_response = response.json()
1457 |
1458 | if json_response['status'] != 'ok':
1459 | status = json_response['status']
1460 | raise InstagramException(
1461 | f'Response status is {status}. '
1462 | f'Body: {response.text} Something went wrong.'
1463 | f' Please report issue.',
1464 | response.status_code)
1465 |
1466 | return Comment(json_response)
1467 |
1468 | def delete_comment(self, media_id, comment_id):
1469 | """
1470 | :param media_id: media id
1471 | :param comment_id: the id of the comment you want to delete
1472 | """
1473 | media_id = media_id.identifier if isinstance(media_id,
1474 | Media) else media_id
1475 |
1476 | comment_id = comment_id.identifier if isinstance(comment_id,
1477 | Comment) else comment_id
1478 |
1479 | response = self.__req.post(
1480 | endpoints.get_delete_comment_url(media_id, comment_id),
1481 | headers=self.generate_headers(self.user_session))
1482 |
1483 | if not Instagram.HTTP_OK == response.status_code:
1484 | raise InstagramException.default(response.text,
1485 | response.status_code)
1486 |
1487 | json_response = response.json()
1488 |
1489 | if json_response['status'] != 'ok':
1490 | status = json_response['status']
1491 | raise InstagramException(
1492 | f'Response status is {status}. '
1493 | f'Body: {response.text} Something went wrong.'
1494 | f' Please report issue.',
1495 | response.status_code)
1496 |
1497 | def like(self, media_id):
1498 | """
1499 | :param media_id: media id
1500 | """
1501 | media_id = media_id.identifier if isinstance(media_id,
1502 | Media) else media_id
1503 | response = self.__req.post(endpoints.get_like_url(media_id),
1504 | headers=self.generate_headers(
1505 | self.user_session))
1506 |
1507 | if not Instagram.HTTP_OK == response.status_code:
1508 | raise InstagramException.default(response.text,
1509 | response.status_code)
1510 |
1511 | json_response = response.json()
1512 |
1513 | if json_response['status'] != 'ok':
1514 | status = json_response['status']
1515 | raise InstagramException(
1516 | f'Response status is {status}. '
1517 | f'Body: {response.text} Something went wrong.'
1518 | f' Please report issue.',
1519 | response.status_code)
1520 |
1521 | def unlike(self, media_id):
1522 | """
1523 | :param media_id: media id
1524 | """
1525 | media_id = media_id.identifier if isinstance(media_id,
1526 | Media) else media_id
1527 | response = self.__req.post(endpoints.get_unlike_url(media_id),
1528 | headers=self.generate_headers(
1529 | self.user_session))
1530 |
1531 | if not Instagram.HTTP_OK == response.status_code:
1532 | raise InstagramException.default(response.text,
1533 | response.status_code)
1534 |
1535 | json_response = response.json()
1536 |
1537 | if json_response['status'] != 'ok':
1538 | status = json_response['status']
1539 | raise InstagramException(
1540 | f'Response status is {status}. '
1541 | f'Body: {response.text} Something went wrong.'
1542 | f' Please report issue.',
1543 | response.status_code)
1544 |
1545 | def follow(self, user_id):
1546 | """
1547 | :param user_id: user id
1548 | :return: bool
1549 | """
1550 | if self.is_logged_in(self.user_session):
1551 | url = endpoints.get_follow_url(user_id)
1552 |
1553 | try:
1554 | follow = self.__req.post(url,
1555 | headers=self.generate_headers(
1556 | self.user_session))
1557 | if follow.status_code == Instagram.HTTP_OK:
1558 | return True
1559 | except Exception:
1560 | raise InstagramException("Exception on follow!")
1561 | return False
1562 |
1563 | def unfollow(self, user_id):
1564 | """
1565 | :param user_id: user id
1566 | :return: the Response on success, otherwise False
1567 | """
1568 | if self.is_logged_in(self.user_session):
1569 | url_unfollow = endpoints.get_unfollow_url(user_id)
1570 | try:
1571 | unfollow = self.__req.post(url_unfollow, headers=self.generate_headers(self.user_session))
1572 | if unfollow.status_code == Instagram.HTTP_OK:
1573 | return unfollow
1574 | except Exception:
1575 | raise InstagramException("Exception on unfollow!")
1576 | return False
1577 |
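The comment and follower getters above share one cursor-pagination pattern: request at most a page-sized chunk, append the results, remember the end cursor, and stop when the requested count is reached or the server reports no further page. The sketch below isolates that loop. It is an illustration only: `fetch_all`, `fetch_page`, and `fake_fetch_page` are hypothetical names that do not exist in `instagram.py`, and the fake page source stands in for the HTTP request.

```python
def fetch_all(fetch_page, count, page_size=50):
    """Collect up to `count` items from a cursor-paginated source.

    `fetch_page(cursor, limit)` is a hypothetical callable returning
    (items, next_cursor, has_next_page).
    """
    items = []
    cursor = ''
    has_next = True
    while has_next and len(items) < count:
        # Never ask for more than one page, or more than is still needed.
        limit = min(page_size, count - len(items))
        page, cursor, has_next = fetch_page(cursor, limit)
        if not page:
            break  # an empty page means there is nothing more to read
        items.extend(page[:limit])
    return {'next_page': cursor, 'items': items}


def fake_fetch_page(cursor, limit):
    # Stand-in for an HTTP call: serves the integers 0..119 in slices.
    start = int(cursor or 0)
    end = min(start + limit, 120)
    return list(range(start, end)), str(end), end < 120


result = fetch_all(fake_fetch_page, count=75)
print(len(result['items']), result['next_page'])
```

Returning the last cursor together with the items, as the methods above do with `next_page`, lets a caller resume from where the previous call stopped.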
--------------------------------------------------------------------------------
/optracker/igramscraper/model/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NorseByte/opensource-tracker/5388d845ba57bf6ca0aa80575608f4b903e1b8dc/optracker/igramscraper/model/__init__.py
--------------------------------------------------------------------------------
/optracker/igramscraper/model/account.py:
--------------------------------------------------------------------------------
1 | from .initializer_model import InitializerModel
2 | from .media import Media
3 | import textwrap
4 |
5 |
6 | class Account(InitializerModel):
7 |
8 | def __init__(self, props=None):
9 | self.identifier = None
10 | self.username = None
11 | self.full_name = None
12 | self.profile_pic_url = None
13 | self.profile_pic_url_hd = None
14 | self.biography = None
15 | self.external_url = None
16 | self.follows_count = 0
17 | self.followed_by_count = 0
18 | self.media_count = 0
19 | self.is_private = False
20 | self.is_verified = False
21 | self.medias = []
22 | self.blocked_by_viewer = False
23 | self.country_block = False
24 | self.followed_by_viewer = False
25 | self.follows_viewer = False
26 | self.has_channel = False
27 | self.has_blocked_viewer = False
28 | self.highlight_reel_count = 0
29 | self.has_requested_viewer = False
30 | self.is_business_account = False
31 | self.is_joined_recently = False
32 | self.business_category_name = None
33 | self.business_email = None
34 | self.business_phone_number = None
35 | self.business_address_json = None
36 | self.requested_by_viewer = False
37 | self.connected_fb_page = None
38 |
39 | super(Account, self).__init__(props)
40 |
41 | def get_profile_picture_url(self):
42 | """
43 | :return: HD picture url if set, else the standard url, else ''
44 | """
45 | hd_url = getattr(self, 'profile_pic_url_hd', None)
46 | if hd_url:
47 | return hd_url
48 | standard_url = getattr(self, 'profile_pic_url', None)
49 | return standard_url or ''
50 |
51 | def __str__(self):
52 | string = f"""
53 | Account info:
54 | Id: {self.identifier}
55 | Username: {self.username if hasattr(self, 'username') else '-'}
56 | Full Name: {self.full_name if hasattr(self, 'full_name') else '-'}
57 | Bio: {self.biography if hasattr(self, 'biography') else '-'}
58 | Profile Pic Url: {self.get_profile_picture_url()}
59 | External url: {self.external_url if hasattr(self, 'external_url') else '-'}
60 | Number of published posts: {self.media_count if hasattr(self, 'media_count') else '-'}
61 | Number of followers: {self.followed_by_count if hasattr(self, 'followed_by_count') else '-'}
62 | Number of follows: {self.follows_count if hasattr(self, 'follows_count') else '-'}
63 | Is private: {self.is_private if hasattr(self, 'is_private') else '-'}
64 | Is verified: {self.is_verified if hasattr(self, 'is_verified') else '-'}
65 | """
66 | return textwrap.dedent(string)
67 |
68 | """
69 | * @param Media $media
70 | * @return Account
71 | """
72 | def add_media(self, media):
73 | try:
74 | self.medias.append(media)
75 | except AttributeError:
76 | raise AttributeError
77 |
78 | def _init_properties_custom(self, value, prop, array):
79 |
80 | if prop == 'id':
81 | self.identifier = value
82 |
83 | standart_properties = [
84 | 'username',
85 | 'full_name',
86 | 'profile_pic_url',
87 | 'profile_pic_url_hd',
88 | 'biography',
89 | 'external_url',
90 | 'is_private',
91 | 'is_verified',
92 | 'blocked_by_viewer',
93 | 'country_block',
94 | 'followed_by_viewer',
95 | 'follows_viewer',
96 | 'has_channel',
97 | 'has_blocked_viewer',
98 | 'highlight_reel_count',
99 | 'has_requested_viewer',
100 | 'is_business_account',
101 | 'is_joined_recently',
102 | 'business_category_name',
103 | 'business_email',
104 | 'business_phone_number',
105 | 'business_address_json',
106 | 'requested_by_viewer',
107 | 'connected_fb_page'
108 | ]
109 | if prop in standart_properties:
110 | self.__setattr__(prop, value)
111 |
112 | if prop == 'edge_follow':
113 | self.follows_count = array[prop]['count'] \
114 | if array[prop]['count'] is not None else 0
115 |
116 | if prop == 'edge_followed_by':
117 | self.followed_by_count = array[prop]['count'] \
118 | if array[prop]['count'] is not None else 0
119 |
120 | if prop == 'edge_owner_to_timeline_media':
121 | self._init_media(array[prop])
122 |
123 | def _init_media(self, array):
124 | self.media_count = array['count'] if 'count' in array.keys() else 0
125 |
126 | try:
127 | nodes = array['edges']
128 | except (KeyError, TypeError):
129 | return
130 |
131 | if not self.media_count or not isinstance(nodes, list):
132 | return
133 |
134 | for media_array in nodes:
135 | media = Media(media_array['node'])
136 | if isinstance(media, Media):
137 | self.add_media(media)
138 |
--------------------------------------------------------------------------------
/optracker/igramscraper/model/carousel_media.py:
--------------------------------------------------------------------------------
1 | class CarouselMedia:
2 |
3 | def __init__(self):
4 | self.__type = ''
5 | self.__image_low_resolution_url = ''
6 | self.__image_thumbnail_url = ''
7 | self.__image_standard_resolution_url = ''
8 | self.__image_high_resolution_url = ''
9 | self.__video_low_resolution_url = ''
10 | self.__video_standard_resolution_url = ''
11 | self.__video_low_bandwidth_url = ''
12 | self.__video_views = ''
13 |
14 |
--------------------------------------------------------------------------------
/optracker/igramscraper/model/comment.py:
--------------------------------------------------------------------------------
1 | from .initializer_model import InitializerModel
2 |
3 |
4 | class Comment(InitializerModel):
5 | """
6 | * @param $value
7 | * @param $prop
8 | """
9 |
10 | def __init__(self, props=None):
11 | self.identifier = None
12 | self.text = None
13 | self.created_at = None
14 | # Account object
15 | self.owner = None
16 |
17 | super(Comment, self).__init__(props)
18 |
19 | def _init_properties_custom(self, value, prop, array):
20 |
21 | if prop == 'id':
22 | self.identifier = value
23 |
24 | standart_properties = [
25 | 'created_at',
26 | 'text',
27 | ]
28 |
29 | if prop in standart_properties:
30 | self.__setattr__(prop, value)
31 |
32 | if prop == 'owner':
33 | from .account import Account
34 | self.owner = Account(value)
35 |
--------------------------------------------------------------------------------
/optracker/igramscraper/model/initializer_model.py:
--------------------------------------------------------------------------------
1 | import time
2 |
3 |
4 | class InitializerModel:
5 |
6 | def __init__(self, props=None):
7 |
8 | self._is_new = True
9 | self._is_loaded = False
10 | """init data was empty"""
11 | self._is_load_empty = True
12 | self._is_fake = False
13 | self._modified = None
14 |
15 | """Array of initialization data"""
16 | self._data = {}
17 |
18 | self.modified = time.time()
19 |
20 | if props is not None and len(props) > 0:
21 | self._init(props)
22 |
23 | def _init(self, props):
24 | """
25 |
26 | :param props: props array
27 | :return: None
28 | """
29 | for key in props.keys():
30 | try:
31 | self._init_properties_custom(props[key], key, props)
32 | except AttributeError:
33 | # if function does not exist fill help data array
34 | self._data[key] = props[key]
35 |
36 | self._is_new = False
37 | self._is_loaded = True
38 | self._is_load_empty = False
39 |
40 | # '''
41 | # * @return $this
42 | # '''
43 | # @staticmethod
44 | # def fake():
45 | # return static::create()->setFake(true);
46 |
47 | # '''
48 | # * @param bool $value
49 | # *
50 | # * @return $this
51 | # '''
52 | # def _setFake(self, value = True):
53 | # self._isFake = (bool)value
54 |
55 | # '''
56 | # * @return bool
57 | # '''
58 | # public function isNotEmpty()
59 | # {
60 | # return !$this->isLoadEmpty;
61 | # }
62 |
63 | # '''
64 | # * @return bool
65 | # '''
66 | # public function isFake()
67 | # {
68 | # return $this->isFake;
69 | # }
70 |
71 | # '''
72 | # * @return array
73 | # '''
74 | # public function toArray()
75 | # {
76 | # ret = []
77 | # map = static::$initPropertiesMap;
78 | # foreach ($map as $key => $init) {
79 | # if (\property_exists($this, $key)) {
80 | # //if there is property then it just assign value
81 | # $ret[$key] = $this->{$key};
82 | # } elseif (isset($this[$key])) {
83 | # //probably array access
84 | # $ret[$key] = $this[$key];
85 | # } else {
86 | # $ret[$key] = null;
87 | # }
88 | # }
89 |
90 | # return $ret;
91 | # }
92 |
93 | # '''
94 | # * @param $datetime
95 | # *
96 | # * @return $this
97 | # '''
98 | # protected function initModified($datetime)
99 | # {
100 | # $this->modified = \strtotime($datetime);
101 |
102 | # return $this;
103 | # }
104 |
105 | # '''
106 | # * @param string $date
107 | # * @param string $key
108 | # *
109 | # * @return $this
110 | # '''
111 | # protected function initDatetime($date, $key)
112 | # {
113 | # return $this->initProperty(\strtotime($date), $key);
114 | # }
115 |
116 | # '''
117 | # * @param $value
118 | # * @param $key
119 | # *
120 | # * @return $this
121 | # '''
122 | # protected function initProperty($value, $key)
123 | # {
124 | # $keys = \func_get_args();
125 | # unset($keys[0]); //remove value
126 | # if (\count($keys) > 1) {
127 | # foreach ($keys as $key) {
128 | # if (\property_exists($this, $key)) { //first found set
129 | # $this->{$key} = $value;
130 |
131 | # return $this;
132 | # }
133 | # }
134 | # } elseif (\property_exists($this, $key)) {
135 | # $this->{$key} = $value;
136 | # }
137 |
138 | # return $this;
139 | # }
140 |
141 | # '''
142 | # * @param mixed $value
143 | # * @param string $key
144 | # *
145 | # * @return $this
146 | # '''
147 | # protected function initBool($value, $key)
148 | # {
149 | # return $this->initProperty(!empty($value), "is{$key}", $key);
150 | # }
151 |
152 | # '''
153 | # * @param mixed $value
154 | # * @param string $key
155 | # *
156 | # * @return $this
157 | # '''
158 | # protected function initInt($value, $key)
159 | # {
160 | # return $this->initProperty((int)$value, $key);
161 | # }
162 |
163 | # '''
164 | # * @param mixed $value
165 | # * @param string $key
166 | # *
167 | # * @return $this
168 | # '''
169 | # protected function initFloat($value, $key)
170 | # {
171 | # return $this->initProperty((float)$value, $key);
172 | # }
173 |
174 | # '''
175 | # * @param string $rawData
176 | # * @param string $key
177 | # *
178 | # * @return $this
179 | # '''
180 | # def _initJsonArray(rawData, key):
181 |
182 | # value = json.loads(rawData)
183 | # if value == None or len(value) == 0:
184 | # if ('null' == rawData or '' == rawData or 'None' == rawData):
185 | # value = []
186 | # else:
187 | # value = (array)rawData;
188 | # else
189 | # value = (array)$value;
190 |
191 | # return $this->initProperty($value, $key);
192 |
193 | # '''
194 | # * @param mixed $value
195 | # * @param string $key
196 | # *
197 | # * @return $this
198 | # '''
199 | # protected function initExplode($value, $key)
200 | # {
201 | # return $this->initProperty(\explode(',', $value), "is{$key}", $key);
202 | # }
203 |
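The live part of `InitializerModel` is small: `_init` passes every key of the raw props dict to the subclass hook `_init_properties_custom`, and only falls back to storing the raw value in `_data` when that call raises `AttributeError` (for example, when the subclass defines no hook). The sketch below reproduces that dispatch in isolation. `BaseModel`, `Place`, and `Raw` are hypothetical stand-ins, not classes from this package.

```python
class BaseModel:
    """Stand-in for InitializerModel: routes each raw key through a hook."""

    def __init__(self, props=None):
        self._data = {}
        if props:
            for key, value in props.items():
                try:
                    self._init_properties_custom(value, key, props)
                except AttributeError:
                    # The subclass defines no hook: keep the raw value.
                    self._data[key] = value


class Place(BaseModel):
    # Hypothetical model; only 'id' and 'name' are mapped.
    def __init__(self, props=None):
        self.identifier = None
        self.name = None
        super().__init__(props)

    def _init_properties_custom(self, value, prop, props):
        if prop == 'id':
            self.identifier = value
        elif prop == 'name':
            self.name = value


class Raw(BaseModel):
    pass  # no hook, so every key lands in _data


place = Place({'id': '42', 'name': 'Oslo', 'ignored': True})
raw = Raw({'id': '42'})
print(place.identifier, place.name, place._data, raw._data)
```

One consequence of this design is that a model with a hook never fills `_data`: keys the hook does not handle are dropped, so mapped values should be read from attributes such as `identifier`.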
--------------------------------------------------------------------------------
/optracker/igramscraper/model/location.py:
--------------------------------------------------------------------------------
1 | from .initializer_model import InitializerModel
2 | import textwrap
3 |
4 |
5 | class Location(InitializerModel):
6 |
7 | def __init__(self, props=None):
8 | self.identifier = None
9 | self.has_public_page = None
10 | self.name = None
11 | self.slug = None
12 | self.lat = None
13 | self.lng = None
14 | self.modified = None
15 | super(Location, self).__init__(props)
16 |
17 | def __str__(self):
18 | string = f"""
19 | Location info:
20 | Id: {self.identifier}
21 | Name: {self.name}
22 | Latitude: {self.lat}
23 | Longitude: {self.lng}
24 | Slug: {self.slug}
25 | Is public page available: {self.has_public_page}
26 | """
27 |
28 | return textwrap.dedent(string)
29 |
30 | def _init_properties_custom(self, value, prop, arr):
31 |
32 | if prop == 'id':
33 | self.identifier = value
34 |
35 | standart_properties = [
36 | 'has_public_page',
37 | 'name',
38 | 'slug',
39 | 'lat',
40 | 'lng',
41 | 'modified',
42 | ]
43 |
44 | if prop in standart_properties:
45 | self.__setattr__(prop, value)
46 |
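The `Media` model in `media.py` converts between numeric media ids and URL shortcodes with `get_id_from_code` and `get_code_from_id`. Both treat the shortcode as a base-64 number over a URL-safe alphabet. The standalone sketch below mirrors that arithmetic so the round trip can be checked without the rest of the package. `code_to_id` and `id_to_code` are illustrative names, not functions in this repository.

```python
ALPHABET = ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'
            'abcdefghijklmnopqrstuvwxyz'
            '0123456789-_')


def code_to_id(code):
    """Read the shortcode as a base-64 number, most significant digit first."""
    media_id = 0
    for char in code:
        media_id = media_id * 64 + ALPHABET.index(char)
    return media_id


def id_to_code(media_id):
    """Write the numeric id in base 64 using the same alphabet."""
    # As in the source, anything after an underscore is discarded.
    number = int(str(media_id).partition('_')[0])
    code = ''
    while number > 0:
        number, remainder = divmod(number, 64)
        code = ALPHABET[remainder] + code
    return code


print(id_to_code(64), code_to_id('BA'))
```

Because `'A'` is the zero digit, an id of 0 maps to the empty string and leading `'A'` characters do not survive a round trip.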
--------------------------------------------------------------------------------
/optracker/igramscraper/model/media.py:
--------------------------------------------------------------------------------
1 | import urllib.parse
2 | import textwrap
3 |
4 | from .initializer_model import InitializerModel
5 | from .comment import Comment
6 | from .carousel_media import CarouselMedia
7 | from .. import endpoints
8 |
9 |
10 | class Media(InitializerModel):
11 | TYPE_IMAGE = 'image'
12 | TYPE_VIDEO = 'video'
13 | TYPE_SIDECAR = 'sidecar'
14 | TYPE_CAROUSEL = 'carousel'
15 |
16 | def __init__(self, props=None):
17 | self.identifier = None
18 | self.short_code = None
19 | self.created_time = 0
20 | self.type = None
21 | self.link = None
22 | self.image_low_resolution_url = None
23 | self.image_thumbnail_url = None
24 | self.image_standard_resolution_url = None
25 | self.image_high_resolution_url = None
26 | self.square_images = []
27 | self.carousel_media = []
28 | self.caption = None
29 | self.is_ad = False
30 | self.video_low_resolution_url = None
31 | self.video_standard_resolution_url = None
32 | self.video_low_bandwidth_url = None
33 | self.video_views = 0
34 | self.video_url = None
35 | # account object
36 | self.owner = None
37 | self.likes_count = 0
38 | self.location_id = None
39 | self.location_name = None
40 | self.comments_count = 0
41 | self.comments = []
42 | self.has_more_comments = False
43 | self.comments_next_page = None
44 | self.location_slug = None
45 |
46 | super(Media, self).__init__(props)
47 |
48 | @staticmethod
49 | def get_id_from_code(code):
50 | alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_'
51 | id = 0
52 |
53 | for i in range(len(code)):
54 | c = code[i]
55 | id = id * 64 + alphabet.index(c)
56 |
57 | return id
58 |
59 | @staticmethod
60 | def get_link_from_id(id):
61 | code = Media.get_code_from_id(id)
62 | return endpoints.get_media_page_link(code)
63 |
64 | @staticmethod
65 | def get_code_from_id(id):
66 | parts = str(id).partition('_')
67 | id = int(parts[0])
68 | alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_'
69 | code = ''
70 |
71 | while (id > 0):
72 | remainder = int(id) % 64
73 | id = (id - remainder) // 64
74 | code = alphabet[remainder] + code
75 |
76 | return code
77 |
78 | def __str__(self):
79 | string = f"""
80 | Media Info:
81 | Id: {self.identifier}
82 | Shortcode: {self.short_code}
83 | Created at: {self.created_time}
84 | Caption: {self.caption}
85 | Number of comments: {self.comments_count if hasattr(self,
86 | 'comments_count') else 0}
87 | Number of likes: {self.likes_count}
88 | Link: {self.link}
89 | High res image: {self.image_high_resolution_url}
90 | Media type: {self.type}
91 | """
92 |
93 | return textwrap.dedent(string)
94 |
95 | def _init_properties_custom(self, value, prop, arr):
96 |
97 | if prop == 'id':
98 | self.identifier = value
99 |
100 | standart_properties = [
101 | 'type',
102 | 'link',
103 | 'thumbnail_src',
104 | 'caption',
105 | 'video_view_count',
106 | 'caption_is_edited',
107 | 'is_ad'
108 | ]
109 |
110 | if prop in standart_properties:
111 | self.__setattr__(prop, value)
112 |
113 | elif prop == 'created_time' or prop == 'taken_at_timestamp' or prop == 'date':
114 | self.created_time = int(value)
115 |
116 | elif prop == 'code':
117 | self.short_code = value
118 | self.link = endpoints.get_media_page_link(self.short_code)
119 |
120 | elif prop == 'comments':
121 | self.comments_count = arr[prop]['count']
122 | elif prop == 'likes':
123 | self.likes_count = arr[prop]['count']
124 |
125 | elif prop == 'display_resources':
126 | medias_url = []
127 | for media in value:
128 | medias_url.append(media['src'])
129 |
130 | if media['config_width'] == 640:
131 | self.image_thumbnail_url = media['src']
132 | elif media['config_width'] == 750:
133 | self.image_low_resolution_url = media['src']
134 | elif media['config_width'] == 1080:
135 | self.image_standard_resolution_url = media['src']
136 |
137 | elif prop == 'display_src' or prop == 'display_url':
138 | self.image_high_resolution_url = value
139 | if self.type is None:
140 | self.type = Media.TYPE_IMAGE
141 |
142 | elif prop == 'thumbnail_resources':
143 | square_images_url = []
144 | for square_image in value:
145 | square_images_url.append(square_image['src'])
146 | self.square_images = square_images_url
147 |
148 | elif prop == 'carousel_media':
149 | self.type = Media.TYPE_CAROUSEL
150 | self.carousel_media = []
151 | for carousel_array in arr["carousel_media"]:
152 | self.set_carousel_media(arr, carousel_array)
153 |
154 | elif prop == 'video_views':
155 | self.video_views = value
156 | self.type = Media.TYPE_VIDEO
157 |
158 | elif prop == 'videos':
159 | self.video_low_resolution_url = arr[prop]['low_resolution']['url']
160 | self.video_standard_resolution_url = \
161 | arr[prop]['standard_resolution']['url']
162 | self.video_low_bandwidth_url = arr[prop]['low_bandwidth']['url']
163 |
164 | elif prop == 'video_resources':
165 | for video in value:
166 | if video['profile'] == 'MAIN':
167 | self.video_standard_resolution_url = video['src']
168 | elif video['profile'] == 'BASELINE':
169 | self.video_low_resolution_url = video['src']
170 | self.video_low_bandwidth_url = video['src']
171 |
172 | elif prop == 'location' and value is not None:
173 | self.location_id = arr[prop]['id']
174 | self.location_name = arr[prop]['name']
175 | self.location_slug = arr[prop]['slug']
176 |
177 | elif prop == 'user' or prop == 'owner':
178 | from .account import Account
179 | self.owner = Account(arr[prop])
180 |
181 | elif prop == 'is_video':
182 | if bool(value):
183 | self.type = Media.TYPE_VIDEO
184 |
185 | elif prop == 'video_url':
186 | self.video_standard_resolution_url = value
187 |
188 | elif prop == 'shortcode':
189 | self.short_code = value
190 | self.link = endpoints.get_media_page_link(self.short_code)
191 |
192 | elif prop == 'edge_media_to_comment':
193 | try:
194 | self.comments_count = int(arr[prop]['count'])
195 | except KeyError:
196 | pass
197 | try:
198 | edges = arr[prop]['edges']
199 |
200 | for comment_data in edges:
201 | self.comments.append(Comment(comment_data['node']))
202 | except KeyError:
203 | pass
204 | try:
205 | self.has_more_comments = bool(
206 | arr[prop]['page_info']['has_next_page'])
207 | except KeyError:
208 | pass
209 | try:
210 | self.comments_next_page = str(
211 | arr[prop]['page_info']['end_cursor'])
212 | except KeyError:
213 | pass
214 |
215 | elif prop == 'edge_media_preview_like':
216 | self.likes_count = arr[prop]['count']
217 | elif prop == 'edge_liked_by':
218 | self.likes_count = arr[prop]['count']
219 |
220 | elif prop == 'edge_media_to_caption':
221 | try:
222 | self.caption = arr[prop]['edges'][0]['node']['text']
223 | except (KeyError, IndexError):
224 | pass
225 |
226 | elif prop == 'edge_sidecar_to_children':
227 | pass
228 | # #TODO implement
229 | # if (!is_array($arr[$prop]['edges'])) {
230 | # break;
231 | # }
232 | # foreach ($arr[$prop]['edges'] as $edge) {
233 | # if (!isset($edge['node'])) {
234 | # continue;
235 | # }
236 |
237 | # $this->sidecarMedias[] = static::create($edge['node']);
238 | # }
239 | elif prop == '__typename':
240 | if value == 'GraphImage':
241 | self.type = Media.TYPE_IMAGE
242 | elif value == 'GraphVideo':
243 | self.type = Media.TYPE_VIDEO
244 | elif value == 'GraphSidecar':
245 | self.type = Media.TYPE_SIDECAR
246 |
247 | # if self.ownerId and self.owner != None:
248 | # self.ownerId = self.getOwner().getId()
249 |
250 | @staticmethod
251 | def set_carousel_media(media_array, carousel_array):
252 |
253 | print(carousel_array)
254 | # TODO implement
255 | pass
256 | """
257 | param mediaArray
258 | param carouselArray
259 | param instance
260 | return mixed
261 | """
262 | # carousel_media = CarouselMedia()
263 | # carousel_media.type(carousel_array['type'])
264 |
265 | # try:
266 | # images = carousel_array['images']
267 | # except KeyError:
268 | # pass
269 |
270 | # carousel_images = Media.__get_image_urls(
271 | # carousel_array['images']['standard_resolution']['url'])
272 | # carousel_media.imageLowResolutionUrl = carousel_images['low']
273 | # carousel_media.imageThumbnailUrl = carousel_images['thumbnail']
274 | # carousel_media.imageStandardResolutionUrl = carousel_images['standard']
275 | # carousel_media.imageHighResolutionUrl = carousel_images['high']
276 |
277 | # if carousel_media.type == Media.TYPE_VIDEO:
278 | # try:
279 | # carousel_media.video_views = carousel_array['video_views']
280 | # except KeyError:
281 | # pass
282 |
283 | # if 'videos' in carousel_array.keys():
284 | # carousel_media.videoLowResolutionUrl(
285 | # carousel_array['videos']['low_resolution']['url'])
286 | # carousel_media.videoStandardResolutionUrl(
287 | # carousel_array['videos']['standard_resolution']['url'])
288 | # carousel_media.videoLowBandwidthUrl(
289 | # carousel_array['videos']['low_bandwidth']['url'])
290 |
291 | # media_array.append(carousel_media)
292 | # # array_push($instance->carouselMedia, $carouselMedia);
293 | # return media_array
294 |
295 | @staticmethod
296 | def __getImageUrls(image_url):
297 | parts = urllib.parse.urlparse(image_url).path.split('/')
298 | imageName = parts[-1]
299 | urls = {
300 | 'thumbnail': endpoints.INSTAGRAM_CDN_URL + 't/s150x150/' + imageName,
301 | 'low': endpoints.INSTAGRAM_CDN_URL + 't/s320x320/' + imageName,
302 | 'standard': endpoints.INSTAGRAM_CDN_URL + 't/s640x640/' + imageName,
303 | 'high': endpoints.INSTAGRAM_CDN_URL + 't/' + imageName,
304 | }
305 | return urls
306 |
--------------------------------------------------------------------------------
/optracker/igramscraper/model/story.py:
--------------------------------------------------------------------------------
1 | from .media import Media
2 | import textwrap
3 |
4 |
5 | class Story(Media):
6 |
7 | skip_prop = [
8 | 'owner'
9 | ]
10 |
11 | # We do not need some values - do not parse it for Story,
12 | # for example - we do not need owner object inside story
13 |
14 | # param value
15 | # param prop
16 | # param arr
17 |
18 | def _init_properties_custom(self, value, prop, arr):
19 | if prop in Story.skip_prop:
20 | return
21 |
22 | super()._init_properties_custom(value, prop, arr)
23 |
24 | def __str__(self):
25 | string = f"""
26 | Story Info:
27 | Id: {self.identifier}
28 | High res image: {self.image_high_resolution_url}
29 | Media type: {self.type if hasattr(self, 'type') else ''}
30 | """
31 |
32 | return textwrap.dedent(string)
33 |
--------------------------------------------------------------------------------
/optracker/igramscraper/model/tag.py:
--------------------------------------------------------------------------------
1 | from .initializer_model import InitializerModel
2 |
3 |
4 | class Tag(InitializerModel):
5 |
6 | def __init__(self, props=None):
7 | self.media_count = 0
8 | self.name = None
9 | self.identifier = None
10 | super(Tag, self).__init__(props)
11 |
12 | def _init_properties_custom(self, value, prop, arr):
13 |
14 | if prop == 'id':
15 | self.identifier = value
16 |
17 | standart_properties = [
18 | 'media_count',
19 | 'name',
20 | ]
21 |
22 | if prop in standart_properties:
23 | self.__setattr__(prop, value)
24 |
--------------------------------------------------------------------------------
/optracker/igramscraper/model/user_stories.py:
--------------------------------------------------------------------------------
1 | from .initializer_model import InitializerModel
2 |
3 | class UserStories(InitializerModel):
4 |
5 | def __init__(self, stories=None, owner=None):
6 | if stories is None:
7 | stories = []
8 | self.owner = owner
9 | self.stories = stories
10 | super().__init__()
11 |
12 |
--------------------------------------------------------------------------------
/optracker/igramscraper/session_manager.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 |
4 | class CookieSessionManager:
5 | def __init__(self, session_folder, filename):
6 | self.session_folder = session_folder
7 | self.filename = filename
8 |
9 | def get_saved_cookies(self):
10 | try:
11 | with open(self.session_folder + self.filename, 'r') as f:
12 | return f.read()
13 | except FileNotFoundError:
14 | return None
15 |
16 | def set_saved_cookies(self, cookie_string):
17 | if not os.path.exists(self.session_folder):
18 | os.makedirs(self.session_folder)
19 |
20 | with open(self.session_folder + self.filename,"w+") as f:
21 | f.write(cookie_string)
22 |
23 | def empty_saved_cookies(self):
24 | try:
25 | os.remove(self.session_folder + self.filename)
26 | except FileNotFoundError:
27 | pass
28 |
--------------------------------------------------------------------------------
/optracker/igramscraper/two_step_verification/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NorseByte/opensource-tracker/5388d845ba57bf6ca0aa80575608f4b903e1b8dc/optracker/igramscraper/two_step_verification/__init__.py
--------------------------------------------------------------------------------
/optracker/igramscraper/two_step_verification/console_verification.py:
--------------------------------------------------------------------------------
1 | from .two_step_verification_abstract_class import TwoStepVerificationAbstractClass
2 |
3 |
4 | class ConsoleVerification(TwoStepVerificationAbstractClass):
5 |
6 | def get_verification_type(self, choices):
7 | if (len(choices) > 1):
8 | possible_values = {}
9 | print('Select where to send security code')
10 |
11 | for choice in choices:
12 | print(choice['label'] + ' - ' + str(choice['value']))
13 | possible_values[str(choice['value'])] = True
14 |
15 | selected_choice = None
16 |
17 | while selected_choice not in possible_values:
18 | if (selected_choice):
19 | print('Wrong choice. Try again')
20 |
21 | selected_choice = input('Your choice: ').strip()
22 | else:
23 | print('Message with security code sent to: ' + choices[0]['label'])
24 | selected_choice = choices[0]['value']
25 |
26 | return selected_choice
27 |
28 | def get_security_code(self):
29 | """
30 |
31 | :return: string
32 | """
33 | security_code = ''
34 | while len(security_code) != 6 or not security_code.isdigit():
35 | if (security_code):
36 | print('Wrong security code')
37 |
38 | security_code = input('Enter security code: ').strip()
39 |
40 | return security_code
41 |
--------------------------------------------------------------------------------
/optracker/igramscraper/two_step_verification/two_step_verification_abstract_class.py:
--------------------------------------------------------------------------------
1 | from abc import ABC, abstractmethod
2 |
3 |
4 | class TwoStepVerificationAbstractClass(ABC):
5 |
6 | @abstractmethod
7 | def get_verification_type(self, possible_values):
8 | """
9 | :param possible_values: array of possible values
10 | :return: string
11 | """
12 | pass
13 |
14 | @abstractmethod
15 | def get_security_code(self):
16 | """
17 |
18 | :return: string
19 | """
20 | pass
21 |
--------------------------------------------------------------------------------
/optracker/optracker.py:
--------------------------------------------------------------------------------
1 | import os
2 | from .zerodata import zerodata
3 | from .functions.db_func import *
4 | from .functions.side_func import *
5 | from .functions.core_func import *
6 | from .igramscraper.instagram import Instagram
7 | from .facerec.facerec import facerec
8 | from time import sleep
9 |
10 | class Optracker():
11 | def __init__(self):
12 | #Adding Text source
13 | self.zero = zerodata()
14 |
15 | #Setting up OP_ROOT_FOLDER
16 | self.createRootfolder()
17 |
18 | #Load Config
19 | self.zero.setupJSON(False)
20 |
21 | #Load face_recognition
22 | self.myFace = facerec(self.zero)
23 |
24 | #Initialize DB_DATABASE
25 | print("+ Setting up DB")
26 | self.dbTool = dbFunc(self.zero.DB_DATABASE, self.zero)
27 | self.dbConn = self.dbTool.create_connection()
28 |
29 | #Create tables
30 | if self.zero.DB_MYSQL_ON == 0:
31 | self.dbTool.createTabels(self.dbConn, self.zero.DB_TABLE_NODES)
32 | self.dbTool.createTabels(self.dbConn, self.zero.DB_TABLE_EGDES)
33 | self.dbTool.createTabels(self.dbConn, self.zero.DB_TABLE_NEW_INSTA)
34 | self.dbTool.createTabels(self.dbConn, self.zero.DB_TABLE_LOGIN_INSTA)
35 | self.dbTool.createTabels(self.dbConn, self.zero.DB_TABLE_OPTIONS)
36 | else:
37 | self.dbTool.createTabels(self.dbConn, self.zero.DB_TABLE_MYSQL_NODES)
38 | self.dbTool.createTabels(self.dbConn, self.zero.DB_TABLE_MYSQL_EGDES)
39 | self.dbTool.createTabels(self.dbConn, self.zero.DB_TABLE_MYSQL_NEW_INSTA)
40 | self.dbTool.createTabels(self.dbConn, self.zero.DB_TABLE_MYSQL_LOGIN_INSTA)
41 | self.dbTool.createTabels(self.dbConn, self.zero.DB_TABLE_MYSQL_OPTIONS)
42 |
43 | self.dbTool.setDefaultValueOptions(self.dbConn)
44 | self.zero.printText("+ DB setup complete", False)
45 |
46 | #Get usernames
47 | self.sideTool = sideFunc(self.dbTool, self.dbConn, self.zero)
48 | self.sideTool.loadLoginText()
49 | self.sideTool.countCurrentUser()
50 |
51 | #Init INSTAGRAM
52 | self.instagram = Instagram()
53 |
54 | #User Select and Login
55 | #selectUserAndLogin()
56 | self.autoSelectAndLogin()
57 |
58 | #Setup coreFunc
59 | print("+ Setting up core functions")
60 | self.mainFunc = coreFunc(self.dbTool, self.dbConn, self.instagram, self.zero, self.myFace)
61 |
62 | self.MENU_ITEMS = [
63 | { self.zero.HELP_TEXT_DISP: self.dispHelp },
64 | { self.zero.RUN_CURRENT_DISP: self.runSingelScan },
65 | { self.zero.RUN_FOLLOW_DISP: self.runFollowScan },
66 | { self.zero.RUN_CHANGE_USER: self.selectUserAndLogin },
67 | { self.zero.RUN_LOAD_SCAN: self.runLoadUserNodeScan },
68 | { self.zero.RUN_EXPORT_DATA: self.dispExport},
69 | { self.zero.RUN_EDIT_OPTIONS: self.runEditDefault},
70 | { self.zero.RUN_GET_DEEP: self.runDeepfromDB},
71 | { self.zero.RUN_UPDATE_IMG: self.updateImg},
72 | { self.zero.RUN_EXIT_DISP: exit},
73 | ]
74 |
75 | #Core Functions to main
76 | def runSingelScan(self):
77 | #Setup zeroPoint
78 | self.sideTool.lastSearch()
79 |
80 | #Run Scan from zeroPoint
81 | self.mainFunc.setCurrentUser(self.zero.INSTA_USER)
82 | self.runCurrentScan()
83 |
84 | def updateImg(self):
85 | self.mainFunc.updateProfileImg()
86 |
87 | def selectUserAndLogin(self):
88 | #Setusername
89 | self.sideTool.setupLogin()
90 | #Login Instagram
91 | self.loginInstagram(self.instagram)
92 |
93 | def autoSelectAndLogin(self):
94 | #Find user
95 | self.sideTool.autoSelectLogin()
96 | #Login instagram
97 | self.loginInstagram(self.instagram)
98 |
99 | def runCurrentScan(self):
100 | #Extract info from following list
101 | if self.mainFunc.loadFollowlist(False) == True:
102 | self.mainFunc.add_egde_from_list_insta(False)
103 |
104 | #Extract followed by
105 | if self.mainFunc.loadFollowlist(True) == True:
106 | self.mainFunc.add_egde_from_list_insta(True)
107 |
108 | #Update new_Insta
109 | print("\n- Scan complete")
110 | print("+ Setting {} ({}) to complete.".format(self.zero.INSTA_USER, self.zero.INSTA_USER_ID))
111 | self.dbTool.inserttoTabel(self.dbConn, self.zero.DB_UPDATE_NEW_INSTA_DONE_TRUE, (self.zero.INSTA_USER_ID,))
112 |
113 | def runFollowScan(self):
114 | self.mainFunc.scanFollowToInstaID()
115 | input("+ Press [Enter] to continue...")
116 |
117 | def runLoadUserNodeScan(self):
118 | self.mainFunc.updateNodeFromList()
119 |
120 | def runEditDefault(self):
121 | self.sideTool.editDefaultValue()
122 |
123 | def runDeepfromDB(self):
124 | self.mainFunc.deepScanAll()
125 |
126 | def dispHelp(self):
127 | print(self.zero.HELP_TEXT)
128 | input("\nPress [Enter] to continue...")
129 |
130 | def dispExport(self):
131 | self.mainFunc.exportDBData()
132 | input("+ Press [Enter] to continue...")
133 |
134 | def loginInstagram(self, instagram):
135 | #Initialize Instagram login
136 | print("\n- Connecting to Instagram")
137 | self.instagram.with_credentials(self.zero.LOGIN_USERNAME_INSTA, self.zero.LOGIN_PASSWORD_INSTA, '/cachepath')
138 | self.instagram.login(force=False,two_step_verificator=True)
139 | sleep(2) # Delay to mimic user
140 |
141 | def root_path(self):
142 | return os.path.abspath(os.sep)
143 |
144 | def createFolder(self, folder):
145 | if not os.path.exists(folder):
146 | os.mkdir(folder)
147 | self.zero.printText("+ Folder created: {}".format(folder), True)
148 | else:
149 | self.zero.printText("+ Folder located: {}".format(folder), True)
150 |
151 | def createRootfolder(self):
152 | self.zero.OP_ROOT_FOLDER_PATH_VALUE = self.root_path()
153 | self.zero.OP_ROOT_FOLDER_PATH_VALUE = self.zero.OP_ROOT_FOLDER_PATH_VALUE + self.zero.OP_ROOT_FOLDER_NAME_VALUE
154 | self.createFolder(self.zero.OP_ROOT_FOLDER_PATH_VALUE)
155 |
156 | #Setup INSTA_FOLDER
157 | self.zero.OP_INSTA_FOLDER_NAME_VALUE = self.zero.OP_ROOT_FOLDER_PATH_VALUE + self.zero.OP_INSTA_FOLDER_NAME_VALUE
158 | self.zero.OP_INSTA_PROFILEFOLDER_NAME_VALUE = self.zero.OP_INSTA_FOLDER_NAME_VALUE + self.zero.OP_INSTA_PROFILEFOLDER_NAME_VALUE
159 | self.zero.OP_INSTA_INSTAID_FOLDER_VALUE = self.zero.OP_INSTA_FOLDER_NAME_VALUE + self.zero.OP_INSTA_INSTAID_FOLDER_VALUE
160 |
161 | self.createFolder(self.zero.OP_INSTA_FOLDER_NAME_VALUE)
162 | self.createFolder(self.zero.OP_INSTA_PROFILEFOLDER_NAME_VALUE)
163 | self.createFolder(self.zero.OP_INSTA_INSTAID_FOLDER_VALUE)
164 |
165 | #Setting up full path starting
166 | self.zero.DB_DATABASE_FOLDER = self.zero.OP_ROOT_FOLDER_PATH_VALUE + self.zero.DB_DATABASE_FOLDER
167 | self.zero.DB_DATABASE_EXPORT_FOLDER = self.zero.OP_ROOT_FOLDER_PATH_VALUE + self.zero.DB_DATABASE_EXPORT_FOLDER
168 | self.zero.OP_ROOT_CONFIG = self.zero.OP_ROOT_FOLDER_PATH_VALUE + self.zero.OP_ROOT_CONFIG
169 | self.zero.printText("+ Database folder is located at {}".format(self.zero.DB_DATABASE_FOLDER), False)
170 | self.zero.printText("+ Export folder is located at {}".format(self.zero.DB_DATABASE_EXPORT_FOLDER), False)
171 | self.zero.printText("+ Config file is located at {}".format(self.zero.OP_ROOT_CONFIG), False)
172 |
173 | def run():
174 | myOptracker = Optracker()
175 | while True:
176 | print("\n- Main menu")
177 | for index, item in enumerate(myOptracker.MENU_ITEMS):
178 | print("[" + str(index) + "] " + list(item.keys())[0])
179 | choice = input(">> ")
180 |
181 | if choice.isdigit():
182 | newInfo = int(choice)
183 | if newInfo < len(myOptracker.MENU_ITEMS):
184 | list(myOptracker.MENU_ITEMS[newInfo].values())[0]()
185 | else:
186 | myOptracker.dispHelp()
187 | else:
188 | myOptracker.dispHelp()
189 |
--------------------------------------------------------------------------------
/optracker/zerodata.py:
--------------------------------------------------------------------------------
1 | import os
2 | import json
3 |
4 | class zerodata():
5 | #Define username and password
6 | LOGIN_USERNAME_INSTA = ""
7 | LOGIN_PASSWORD_INSTA = ""
8 | PROGRAM_NAME = "openSource Tracker v.1.3.6"
9 |
10 | #List log
11 | USER_FILES = ( ["user_insta.txt"],
12 | ["user_face.txt"],
13 | ["user_list.txt"]
14 | )
15 |
16 | USER_FILE_SCAN_NODE_INSTA = "user_scan_insta.txt"
17 |
18 | #Menu variables
19 | HELP_TEXT_DISP = "Display Help"
20 | RUN_CURRENT_DISP = "Single Scan"
21 | RUN_FOLLOW_DISP = "Scan Followed by to user"
22 | RUN_CHANGE_USER = "Change user Instagram"
23 | RUN_EXPORT_DATA = "Export nodes and edges"
24 | RUN_EDIT_OPTIONS = "Change default values"
25 | RUN_LOAD_SCAN = "Deepscan from list"
26 | RUN_GET_DEEP = "Deepscan from database"
27 | RUN_UPDATE_IMG = "Update Profile Images"
28 | RUN_EXIT_DISP = "Exit"
29 |
30 | #ERROR codes
31 | ERROR_429 = "429 - Too many requests"
32 |
33 | #FOLDER Setup
34 | OP_ROOT_FOLDER_PATH_TEXT = "OP_ROOT_FOLDER_PATH"
35 | OP_ROOT_FOLDER_PATH_VALUE = "\\"
36 |
37 | OP_ROOT_FOLDER_NAME_TEXT = "OP_ROOT_FOLDER_NAME"
38 | OP_ROOT_FOLDER_NAME_VALUE = "optracker\\"
39 |
40 | OP_INSTA_FOLDER_NAME_TEXT = "INSTA_FOLDER_NAME"
41 | OP_INSTA_FOLDER_NAME_VALUE = "instadata\\"
42 |
43 | OP_INSTA_PROFILEFOLDER_NAME_TEXT = "INSTA_PROFILE_FOLDER_NAME"
44 | OP_INSTA_PROFILEFOLDER_NAME_VALUE = "profile_pic_insta\\"
45 |
46 | OP_INSTA_INSTAID_FOLDER_TEXT = "INSTA_INSTAID_FOLDER_NAME"
47 | OP_INSTA_INSTAID_FOLDER_VALUE = "post\\"
48 |
49 | #Config filename
50 | OP_ROOT_CONFIG = "optracker.config"
51 |
52 | #Database setup
53 | DB_DATABASE = "openSource-tracker.db"
54 | DB_DATABASE_FOLDER = "db\\"
55 | DB_DATABASE_EXPORT_FOLDER = "export\\"
56 | DB_DATABASE_EXPORT_NODES = "nodes.csv"
57 | DB_DATABASE_EXPORT_INSTA_EGDE = "edges_insta.csv"
58 |
59 | #DB MYSQL
60 | DB_MYSQL = "localhost"
61 | DB_MYSQL_USER = "optracker"
62 | DB_MYSQL_PASSWORD = "localpassword"
63 | DB_MYSQL_DATABASE = "openSource_tracker"
64 | DB_MYSQL_PORT = "3306"
65 | DB_MYSQL_ON = 0
66 | DB_MYSQL_COLLATION = "utf8mb4_general_ci"
67 | DB_MYSQL_CHARSET = "utf8mb4"
68 |
69 | DB_MYSQL_TEXT = "MYSQL_HOST"
70 | DB_MYSQL_USER_TEXT = "MYSQL_USER"
71 | DB_MYSQL_PASSWORD_TEXT = "MYSQL_PASSWORD"
72 | DB_MYSQL_DATABASE_TEXT = "MYSQL_DB"
73 | DB_MYSQL_PORT_TEXT = "MYSQL_PORT"
74 | DB_MYSQL_ON_TEXT = "MYSQL_ON"
75 | DB_MYSQL_COLLATION_TEXT = "MYSQL_COL"
76 | DB_MYSQL_CHARSET_TEXT = "MYSQL_CHAR"
77 |
78 | #SQLite
79 | DB_TABLE_NODES = """
80 | CREATE TABLE IF NOT EXISTS "nodes" (
81 | "id" INTEGER PRIMARY KEY AUTOINCREMENT,
82 | "name" TEXT,
83 | "label" TEXT,
84 | "insta_id" INTEGER,
85 | "insta_img" TEXT,
86 | "insta_follow" INTEGER,
87 | "insta_follower" INTEGER,
88 | "insta_bio" TEXT,
89 | "insta_username" TEXT,
90 | "insta_private" INTEGER,
91 | "insta_verifyed" INTEGER,
92 | "insta_post" INTEGER,
93 | "insta_exturl" TEXT,
94 | "insta_deepscan" INTEGER DEFAULT 0
95 | );"""
96 |
97 | DB_TABLE_EGDES = """
98 | CREATE TABLE IF NOT EXISTS "egdes_insta" (
99 | "source" INTEGER,
100 | "target" INTEGER,
101 | "type" TEXT DEFAULT 'undirected',
102 | "weight" INTEGER DEFAULT 1
103 | );"""
104 |
105 | DB_TABLE_NEW_INSTA = """
106 | CREATE TABLE IF NOT EXISTS "new_insta" (
107 | "insta_id" INTEGER UNIQUE,
108 | "insta_user" INTEGER,
109 | "done" INTEGER DEFAULT 0,
110 | "wait" INTEGER DEFAULT 0,
111 | "followed_by_done" INTEGER DEFAULT 0
112 | );
113 | """
114 |
115 | DB_TABLE_LOGIN_INSTA = """
116 | CREATE TABLE IF NOT EXISTS "accounts" (
117 | "username" TEXT UNIQUE,
118 | "password" TEXT,
119 | "email" TEXT,
120 | "fullname" TEXT,
121 | "account_type" TEXT,
122 | "current_run" INTEGER DEFAULT 0,
123 | "last_used" TEXT
124 | );
125 | """
126 |
127 | DB_TABLE_OPTIONS = """
128 | CREATE TABLE IF NOT EXISTS "options" (
129 | "what" TEXT UNIQUE,
130 | "value" TEXT,
131 | "ref" TEXT
132 | );
133 | """
134 |
135 | #MYSQL
136 | DB_TABLE_MYSQL_NODES = """
137 | CREATE TABLE IF NOT EXISTS nodes (
138 | id BIGINT(20) NOT NULL AUTO_INCREMENT,
139 | name VARCHAR(64) NULL DEFAULT "N/A",
140 | label VARCHAR(64) NULL DEFAULT "N/A",
141 | insta_id BIGINT(20) NULL DEFAULT 0,
142 | insta_img TEXT NULL,
143 | insta_follow BIGINT(20) NULL DEFAULT 0,
144 | insta_follower BIGINT(20) NOT NULL DEFAULT 0,
145 | insta_bio TEXT NULL,
146 | insta_username VARCHAR(64) NOT NULL DEFAULT "N/A",
147 | insta_private INT(10) NOT NULL DEFAULT 0,
148 | insta_verifyed INT(10) NOT NULL DEFAULT 0,
149 | insta_post BIGINT(20) NOT NULL DEFAULT 0,
150 | insta_exturl TEXT NULL,
151 | insta_deepscan INT(20) NOT NULL DEFAULT '0',
152 | PRIMARY KEY (`id`)) ENGINE = MyISAM
153 | """
154 |
155 | DB_TABLE_MYSQL_EGDES = """
156 | CREATE TABLE IF NOT EXISTS egdes_insta (
157 | source BIGINT(20) NOT NULL ,
158 | target BIGINT(20) NOT NULL ,
159 | type VARCHAR(64) NOT NULL DEFAULT 'undirected' ,
160 | weight INT(20) NOT NULL DEFAULT '1'
161 | ) ENGINE = MyISAM;"""
162 |
163 | DB_TABLE_MYSQL_NEW_INSTA = """
164 | CREATE TABLE IF NOT EXISTS new_insta (
165 | insta_id BIGINT(20) NOT NULL UNIQUE,
166 | insta_user TEXT NULL,
167 | done INT(20) NOT NULL DEFAULT 0,
168 | wait INT(20) NOT NULL DEFAULT 0,
169 | followed_by_done INT(20) NOT NULL DEFAULT 0
170 | ) ENGINE = MyISAM;
171 | """
172 |
173 | DB_TABLE_MYSQL_LOGIN_INSTA = """
174 | CREATE TABLE IF NOT EXISTS accounts (
175 | username VARCHAR(64) NOT NULL UNIQUE,
176 | password VARCHAR(64) NOT NULL,
177 | email VARCHAR(64) NOT NULL,
178 | fullname VARCHAR(64) NOT NULL,
179 | account_type VARCHAR(64) NOT NULL,
180 | current_run INT(20) NOT NULL DEFAULT 0,
181 | last_used VARCHAR(64) NOT NULL DEFAULT 0
182 | ) ENGINE = MyISAM;"""
183 |
184 | DB_TABLE_MYSQL_OPTIONS = """
185 | CREATE TABLE IF NOT EXISTS options (
186 | what VARCHAR(64) NOT NULL UNIQUE,
187 | value VARCHAR(64) NOT NULL,
188 | ref VARCHAR(64) NOT NULL DEFAULT 0
189 | ) ENGINE = MyISAM;
190 | """
191 |
192 | #MySQL
193 | DB_INSERT_MYSQL_NODE = """INSERT INTO nodes (name, label, insta_id, insta_img, insta_follow, insta_follower, insta_bio, insta_username, insta_private, insta_verifyed, insta_post, insta_exturl, insta_deepscan) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s); SELECT id FROM nodes where insta_id = %s;"""
194 | DB_INSERT_MYSQL_INSTA_EGDE = 'INSERT INTO egdes_insta (source, target) VALUES (%s, %s);'
195 | DB_INSERT_MYSQL_NEW_INSTA = 'INSERT INTO new_insta (insta_id, insta_user) VALUES (%s, %s);'
196 | DB_INSERT_MYSQL_LOGIN_INSTA = 'INSERT INTO accounts (username, password, email, fullname, account_type, last_used) VALUES (%s, %s, %s, %s, %s, %s);'
197 | DB_INSERT_MYSQL_OPTIONS_LASTINSTA = 'INSERT INTO options (value, what) VALUES (%s, %s);'
198 | DB_SELECT_MYSQL_DEEPSCAN_NEED = 'SELECT insta_username FROM nodes WHERE insta_deepscan = 0'
199 |
200 | DB_UPDATE_MYSQL_LAST_INSTA = 'UPDATE options SET value = (%s) WHERE what = "LAST_INSTA";'
201 | DB_UPDATE_MYSQL_OPTIONS = 'UPDATE options SET value = (%s) WHERE what = %s;'
202 | DB_UPDATE_MYSQL_NEW_INSTA_DONE_TRUE = 'UPDATE new_insta SET done = 1 WHERE insta_id = %s;'
203 | DB_UPDATE_MYSQL_NEW_INSTA_DONE_FALSE = 'UPDATE new_insta SET done = 0 WHERE insta_id = %s;'
204 | DB_UPDATE_MYSQL_ACCOUNT_LAST_USED = 'UPDATE accounts SET last_used = %s WHERE username = %s'
205 | DB_UPDATE_MYSQL_NODES = 'UPDATE nodes SET name = %s, label = %s, insta_img = %s, insta_follow = %s, insta_follower = %s, insta_bio = %s, insta_username = %s, insta_private = %s, insta_verifyed = %s, insta_post = %s, insta_exturl = %s, insta_deepscan = %s WHERE insta_id = %s'
206 |
207 | DB_SELECT_MYSQL_IMG = 'SELECT insta_username, insta_id, insta_img FROM nodes WHERE insta_img IS NOT NULL AND insta_img <> "None"'
208 | DB_SELECT_MYSQL_EXPORT_ID_USER = 'SELECT id, insta_username FROM nodes'
209 | DB_SELECT_MYSQL_ID_NODE = 'SELECT id FROM nodes WHERE insta_id = %s'
210 | DB_SELECT_MYSQL_USERNAME_NODE = 'SELECT insta_username FROM nodes WHERE insta_id = %s'
211 | DB_SELECT_MYSQL_DONE_NEW_INSTA = 'SELECT done, wait FROM new_insta WHERE insta_id = %s'
212 | DB_SELECT_MYSQL_TARGET_EDGE = 'SELECT target FROM egdes_insta WHERE source = %s AND target = %s'
213 | DB_SELECT_MYSQL_LOGIN_INSTA = 'SELECT * FROM accounts WHERE account_type = "instagram"'
214 | DB_SELECT_MYSQL_LOGIN_PASSWORD_INSTA = 'SELECT password FROM accounts WHERE username = %s AND account_type = "instagram"'
215 | DB_SELECT_MYSQL_OPTIONS = 'SELECT * FROM options WHERE what = %s'
216 | DB_SELECT_MYSQL_ALL_DONE_NEW_INSTA = 'SELECT * FROM new_insta WHERE done = 1'
217 | DB_SELECT_MYSQL_ALL_NODE = "SELECT * FROM nodes"
218 | DB_SELECT_MYSQL_ALL_INSTA_EDGES = "SELECT source, target, type, weight FROM egdes_insta"
219 | DB_SELECT_MYSQL_COUNT_NODES = "SELECT count(*) FROM nodes"
220 | DB_SELECT_MYSQL_COUNT_EDES_INSTA = "SELECT count(*) FROM egdes_insta"
221 | DB_SELECT_MYSQL_INSTA_FOLLOWER_NODE_ID = 'SELECT insta_follower FROM nodes WHERE id = %s'
222 | DB_SELECT_MYSQL_FOLLOW_OF = 'SELECT * FROM nodes as Node INNER JOIN egdes_insta as Edge ON Node.id = Edge.source WHERE Node.insta_private = 0 AND Edge.target = %s'
223 |
224 |
225 | #SQLite
226 | DB_SELECT_EXPORT_ID_USER = 'SELECT id, insta_username FROM nodes'
227 | DB_INSERT_NODE = """INSERT INTO "main"."nodes" ("name", "label", "insta_id", "insta_img", "insta_follow", "insta_follower", "insta_bio", "insta_username", "insta_private", "insta_verifyed", "insta_post", "insta_exturl", "insta_deepscan") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?); SELECT id FROM nodes where insta_id = ?;"""
228 | DB_INSERT_INSTA_EGDE = 'INSERT INTO "main"."egdes_insta" ("source", "target") VALUES (?, ?);'
229 | DB_INSERT_NEW_INSTA = 'INSERT INTO "main"."new_insta" ("insta_id", "insta_user") VALUES (?, ?);'
230 | DB_INSERT_LOGIN_INSTA = 'INSERT INTO "main"."accounts" ("username", "password", "email", "fullname", "account_type", "last_used") VALUES (?, ?, ?, ?, ?, ?);'
231 | DB_INSERT_OPTIONS_LASTINSTA = 'INSERT INTO "main"."options" ("value", "what") VALUES (?, ?);'
232 |
233 | DB_UPDATE_LAST_INSTA = 'UPDATE "main"."options" SET "value" = (?) WHERE "what" = "LAST_INSTA";'
234 | DB_UPDATE_OPTIONS = 'UPDATE "main"."options" SET "value" = (?) WHERE "what" = ?;'
235 | DB_UPDATE_NEW_INSTA_DONE_TRUE = 'UPDATE "main"."new_insta" SET "done" = 1 WHERE "insta_id" = ?;'
236 | DB_UPDATE_NEW_INSTA_DONE_FALSE = 'UPDATE "main"."new_insta" SET "done" = 0 WHERE "insta_id" = ?;'
237 | DB_UPDATE_ACCOUNT_LAST_USED = 'UPDATE "main"."accounts" SET ("last_used") = ? WHERE username = ?'
238 | DB_UPDATE_NODES = 'UPDATE "main"."nodes" SET "name" = ?, "label" = ?, "insta_img" = ?, "insta_follow" = ?, "insta_follower" = ?, "insta_bio" = ?, "insta_username" = ?, "insta_private" = ?, "insta_verifyed" = ?, "insta_post" = ?, "insta_exturl" = ?, "insta_deepscan" = ? WHERE "insta_id" = ?'
239 |
240 | DB_SELECT_IMG = 'SELECT insta_username, insta_id, insta_img FROM nodes WHERE insta_img IS NOT NULL AND insta_img IS NOT "None"'
241 | DB_SELECT_DEEPSCAN_NEED = 'SELECT insta_username FROM nodes WHERE insta_deepscan = 0'
242 | DB_SELECT_ID_NODE = 'SELECT id FROM "main"."nodes" WHERE ("insta_id") = ?'
243 | DB_SELECT_USERNAME_NODE = 'SELECT insta_username FROM "main"."nodes" WHERE insta_id = ?'
244 | DB_SELECT_DONE_NEW_INSTA = 'SELECT done, wait FROM "main"."new_insta" WHERE ("insta_id") = ?'
245 | DB_SELECT_TARGET_EDGE = 'SELECT target FROM "main"."egdes_insta" WHERE source = ? AND target = ?'
246 | DB_SELECT_LOGIN_INSTA = 'SELECT * FROM "main"."accounts" WHERE account_type = "instagram"'
247 | DB_SELECT_LOGIN_PASSWORD_INSTA = 'SELECT password FROM "main"."accounts" WHERE ("username") = ? AND account_type = "instagram"'
248 | DB_SELECT_OPTIONS = 'SELECT * FROM options WHERE what = ?'
249 | DB_SELECT_ALL_DONE_NEW_INSTA = 'SELECT * FROM "main"."new_insta" WHERE done = 1'
250 | DB_SELECT_ALL_NODE = "SELECT * FROM main.nodes"
251 | DB_SELECT_ALL_INSTA_EDGES = "SELECT source, target, type, weight FROM main.egdes_insta"
252 | DB_SELECT_COUNT_NODES = "SELECT count(*) FROM main.nodes"
253 | DB_SELECT_COUNT_EDES_INSTA = "SELECT count(*) FROM main.egdes_insta"
254 | DB_SELECT_INSTA_FOLLOWER_NODE_ID = 'SELECT insta_follower FROM "main"."nodes" WHERE id = ?'
255 | DB_SELECT_FOLLOW_OF = 'SELECT * FROM "main"."nodes" as Node INNER JOIN "main"."egdes_insta" as Edge ON Node.id = Edge.source WHERE Node.insta_private = 0 AND Edge.target = ?'
256 |
257 |
258 | #Startpoint information
259 | INSTA_USER = ""
260 | INSTA_USER_ID = ""
261 | INSERT_DATA = ""
262 | DATETIME_MASK = "%Y-%m-%d %H:%M:%S.%f"
263 | TOTAL_USER_COUNT = 0
264 | WRITE_ENCODING = "utf-8"
265 | ON_ERROR_ENCODING = "replace"
266 | INSTA_FILE_EXT = ".jpg"
267 |
268 | INSTA_MAX_FOLLOW_SCAN_TEXT = "INSTA_MAX_FOLLOW_SCAN"
269 | INSTA_MAX_FOLLOW_SCAN_VALUE = 2000
270 |
271 | INSTA_MAX_FOLLOW_BY_SCAN_TEXT = "INSTA_MAX_FOLLOW_BY_SCAN"
272 | INSTA_MAX_FOLLOW_BY_SCAN_VALUE = 2000
273 |
274 | SURFACE_SCAN_TEXT = "SURFACE_SCAN"
275 | SURFACE_SCAN_VALUE = "1"
276 |
277 | DETAIL_PRINT_TEXT = "DETAIL_PRINT"
278 | DETAIL_PRINT_VALUE = "1"
279 |
280 | LAST_INSTA_TEXT = "LAST_INSTA"
281 | LAST_INSTA_VALUE = ""
282 |
283 | DOWNLOAD_PROFILE_INSTA_TEXT = "DOWNLOAD_PROFILE_INSTA"
284 | DOWNLOAD_PROFILE_INSTA_VALUE = "1"
285 |
286 | FACEREC_ON_TEXT = "FACE_REC_ON"
287 | FACEREC_ON_VALUE = "1"
288 |
289 |
290 | #Help TEXT
291 | HELP_TEXT = """
292 | {} - HELP TEXT
293 |
294 | {} - Scan a specific node
295 | This mode runs a scan for a specific user and is your first step in generating nodes and edges. You need to enter a starting point, which is an Instagram username. The program looks the user up, finds who they follow and who follows them, and adds them to the database with their connections.
296 |
297 | {} - Scan all follower
298 | You will be presented with a list of users that you have finished adding to your database. The program will then scan all of their connections as if each were a first-time scan and add the data to the database. In short: it scans the follows of a user's follows.
299 |
300 | {} - Allow you to change users
301 | This gives you a list of all available users so you can switch before the scan if you are not happy with the choice made at startup.
302 |
303 | Nodes - Main database
304 | The node database is a collection of all the users that have been scanned. It contains basic data such as ID, username, Instagram description, and more.
305 |
306 | Edges - connections
307 | The edges database holds the connections between nodes. It is used to create a visual display of how a social network is connected.
308 |
309 | SQLite - The Database
310 | All data is saved in the database found in the folder 'db/'. Open it in an SQL browser and export the node and edges tables to .CSV files, which you can import into a visualization program (e.g. Gephi).
311 |
312 | {} - RUN_EXPORT_DATA
313 | Gives you an overview of the data collected so far and exports it to the folder {}.
314 |
315 | {}
316 | Loads a list of users from the root folder, scrapes all their info from Instagram, and updates the node DB.
317 |
318 | Max Follows and Max Followed by
319 | During a follower scan, where you scan the profiles connected to a user who has completed the single scan, you can set a limit on how many followers a user can have or how many accounts it can follow. This prevents scanning uninteresting profiles such as public organizations, which can have 10K or more. The default is 2000, which is considered a normal number of follows/followed by.
320 |
321 | Deepscan and Surfacescan
322 | By default, SurfaceScan is turned off. With SurfaceScan on, only the username and Instagram ID are extracted when scraping. This saves requests to the server, so you can use one user for a longer period of time, and makes the scan faster when scraping a big network. You can later add specific users found in the graph to a text file and scan only the interesting ones to get all the data.
323 |
324 | Print Detail
325 | By default it is turned ON. You will be presented with all the output the scraper produces. If turned OFF, you will only get the minimum info needed to see that it is working properly.
326 |
327 | ERROR CODES - List of ERROR codes
328 | 001 - INSTAGRAM USER BLOCKED
329 | 002 - TOO MANY REQUESTS FROM CURRENT USER
330 | 003 - ERROR LOGIN
331 | 004 - USER DOES NOT HAVE ACCESS TO DATA, RETURNING JSON ERROR
332 | """.format(PROGRAM_NAME, RUN_CURRENT_DISP, RUN_FOLLOW_DISP, RUN_CHANGE_USER, RUN_EXPORT_DATA, DB_DATABASE_EXPORT_FOLDER, RUN_LOAD_SCAN)
333 |
334 |
335 |     def printText(self, text, override):
336 |         if int(self.DETAIL_PRINT_VALUE) == 1:
337 |             print(text)
338 |         else:
339 |             if override:
340 |                 print(text)
341 |
342 |     #Removes unwanted symbols in string
343 |     def sanTuple(self, text):
344 |         text = str(text)
345 |         text = text.replace("'", "")
346 |         text = text.replace('"', "")
347 |         text = text.replace(";", "")
348 |
349 |         return text
350 |
351 |     def setupJSON(self, export):
352 |         if export:
353 |             self.printText("+ Config export started", False)
354 |             DATA = {}
355 |             DATA['DB'] = []
356 |             DATA['DB'].append({
357 |                 self.DB_MYSQL_TEXT : self.DB_MYSQL,
358 |                 self.DB_MYSQL_USER_TEXT : self.DB_MYSQL_USER,
359 |                 self.DB_MYSQL_PASSWORD_TEXT : self.DB_MYSQL_PASSWORD,
360 |                 self.DB_MYSQL_DATABASE_TEXT : self.DB_MYSQL_DATABASE,
361 |                 self.DB_MYSQL_PORT_TEXT : self.DB_MYSQL_PORT,
362 |                 self.DB_MYSQL_ON_TEXT : self.DB_MYSQL_ON,
363 |                 self.DB_MYSQL_COLLATION_TEXT : self.DB_MYSQL_COLLATION,
364 |                 self.DB_MYSQL_CHARSET_TEXT : self.DB_MYSQL_CHARSET,
365 |                 self.DETAIL_PRINT_TEXT : self.DETAIL_PRINT_VALUE
366 |             })
367 |
368 |             if os.path.exists(self.OP_ROOT_CONFIG):
369 |                 self.printText("+ File: {} exists, deleting it.".format(self.OP_ROOT_CONFIG), False)
370 |                 os.remove(self.OP_ROOT_CONFIG)
371 |
372 |             with open(self.OP_ROOT_CONFIG, 'w') as outfile:
373 |                 json.dump(DATA, outfile)
374 |
375 |             self.printText("+ Config export end", False)
376 |
377 |         else:
378 |             self.printText("+ Config import started", True)
379 |             if os.path.exists(self.OP_ROOT_CONFIG):
380 |                 with open(self.OP_ROOT_CONFIG) as json_file:
381 |                     data = json.load(json_file)
382 |                     for p in data['DB']:
383 |                         self.DB_MYSQL = p[self.DB_MYSQL_TEXT]
384 |                         self.DB_MYSQL_USER = p[self.DB_MYSQL_USER_TEXT]
385 |                         self.DB_MYSQL_PASSWORD = p[self.DB_MYSQL_PASSWORD_TEXT]
386 |                         self.DB_MYSQL_DATABASE = p[self.DB_MYSQL_DATABASE_TEXT]
387 |                         self.DB_MYSQL_PORT = int(p[self.DB_MYSQL_PORT_TEXT])
388 |                         self.DB_MYSQL_ON = int(p[self.DB_MYSQL_ON_TEXT])
389 |                         self.DB_MYSQL_COLLATION = p[self.DB_MYSQL_COLLATION_TEXT]
390 |                         self.DB_MYSQL_CHARSET = p[self.DB_MYSQL_CHARSET_TEXT]
391 |                         self.DETAIL_PRINT_VALUE = int(p[self.DETAIL_PRINT_TEXT])
392 |
393 |                         self.printText("+ {} is set to: {}".format(self.DB_MYSQL_TEXT, self.DB_MYSQL), False)
394 |                         self.printText("+ {} is set to: {}".format(self.DB_MYSQL_USER_TEXT, self.DB_MYSQL_USER), False)
395 |                         self.printText("+ {} is set to: {}".format(self.DB_MYSQL_PASSWORD_TEXT, self.DB_MYSQL_PASSWORD), False)
396 |                         self.printText("+ {} is set to: {}".format(self.DB_MYSQL_DATABASE_TEXT, self.DB_MYSQL_DATABASE), False)
397 |                         self.printText("+ {} is set to: {}".format(self.DB_MYSQL_PORT_TEXT, self.DB_MYSQL_PORT), False)
398 |                         self.printText("+ {} is set to: {}".format(self.DB_MYSQL_ON_TEXT, self.DB_MYSQL_ON), False)
399 |                         self.printText("+ {} is set to: {}".format(self.DB_MYSQL_COLLATION_TEXT, self.DB_MYSQL_COLLATION), False)
400 |                         self.printText("+ {} is set to: {}".format(self.DB_MYSQL_CHARSET_TEXT, self.DB_MYSQL_CHARSET), False)
401 |                         self.printText("+ {} is set to: {}".format(self.DETAIL_PRINT_TEXT, self.DETAIL_PRINT_VALUE), False)
402 |
403 |             else:
404 |                 self.printText("+ Config file does not exist - using standard", False)
405 |
406 |             self.changeSQLquery()
407 |             self.printText("+ Config import end", False)
408 |
409 |     def changeSQLquery(self):
410 |         if self.DB_MYSQL_ON == 1:
411 |             self.DB_INSERT_NODE = self.DB_INSERT_MYSQL_NODE
412 |             self.DB_INSERT_INSTA_EGDE = self.DB_INSERT_MYSQL_INSTA_EGDE
413 |             self.DB_INSERT_NEW_INSTA = self.DB_INSERT_MYSQL_NEW_INSTA
414 |             self.DB_INSERT_LOGIN_INSTA = self.DB_INSERT_MYSQL_LOGIN_INSTA
415 |             self.DB_INSERT_OPTIONS_LASTINSTA = self.DB_INSERT_MYSQL_OPTIONS_LASTINSTA
416 |             self.DB_UPDATE_LAST_INSTA = self.DB_UPDATE_MYSQL_LAST_INSTA
417 |             self.DB_UPDATE_OPTIONS = self.DB_UPDATE_MYSQL_OPTIONS
418 |             self.DB_UPDATE_NEW_INSTA_DONE_TRUE = self.DB_UPDATE_MYSQL_NEW_INSTA_DONE_TRUE
419 |             self.DB_UPDATE_NEW_INSTA_DONE_FALSE = self.DB_UPDATE_MYSQL_NEW_INSTA_DONE_FALSE
420 |             self.DB_UPDATE_ACCOUNT_LAST_USED = self.DB_UPDATE_MYSQL_ACCOUNT_LAST_USED
421 |             self.DB_UPDATE_NODES = self.DB_UPDATE_MYSQL_NODES
422 |             self.DB_SELECT_ID_NODE = self.DB_SELECT_MYSQL_ID_NODE
423 |             self.DB_SELECT_USERNAME_NODE = self.DB_SELECT_MYSQL_USERNAME_NODE
424 |             self.DB_SELECT_DONE_NEW_INSTA = self.DB_SELECT_MYSQL_DONE_NEW_INSTA
425 |             self.DB_SELECT_TARGET_EDGE = self.DB_SELECT_MYSQL_TARGET_EDGE
426 |             self.DB_SELECT_LOGIN_INSTA = self.DB_SELECT_MYSQL_LOGIN_INSTA
427 |             self.DB_SELECT_LOGIN_PASSWORD_INSTA = self.DB_SELECT_MYSQL_LOGIN_PASSWORD_INSTA
428 |             self.DB_SELECT_OPTIONS = self.DB_SELECT_MYSQL_OPTIONS
429 |             self.DB_SELECT_ALL_DONE_NEW_INSTA = self.DB_SELECT_MYSQL_ALL_DONE_NEW_INSTA
430 |             self.DB_SELECT_ALL_NODE = self.DB_SELECT_MYSQL_ALL_NODE
431 |             self.DB_SELECT_ALL_INSTA_EDGES = self.DB_SELECT_MYSQL_ALL_INSTA_EDGES
432 |             self.DB_SELECT_COUNT_NODES = self.DB_SELECT_MYSQL_COUNT_NODES
433 |             self.DB_SELECT_COUNT_EDES_INSTA = self.DB_SELECT_MYSQL_COUNT_EDES_INSTA
434 |             self.DB_SELECT_INSTA_FOLLOWER_NODE_ID = self.DB_SELECT_MYSQL_INSTA_FOLLOWER_NODE_ID
435 |             self.DB_SELECT_FOLLOW_OF = self.DB_SELECT_MYSQL_FOLLOW_OF
436 |             self.DB_SELECT_DEEPSCAN_NEED = self.DB_SELECT_MYSQL_DEEPSCAN_NEED
437 |             self.DB_SELECT_EXPORT_ID_USER = self.DB_SELECT_MYSQL_EXPORT_ID_USER
438 |             self.DB_SELECT_IMG = self.DB_SELECT_MYSQL_IMG
439 |
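The query swap in `changeSQLquery` exists because the SQLite queries use `?` placeholders while the MySQL queries use `%s`. As a minimal sketch of the same idea, the two variants can be kept in a lookup table and selected once per call. The `QUERIES` dict and `get_query` helper below are hypothetical names introduced for illustration and are not part of optracker; the two query strings are copied from the constants in this file.

```python
# Hypothetical lookup table: one entry per backend, keyed by a logical query name.
QUERIES = {
    "sqlite": {
        "select_insta_follower_node_id":
            'SELECT insta_follower FROM "main"."nodes" WHERE id = ?',
    },
    "mysql": {
        "select_insta_follower_node_id":
            'SELECT insta_follower FROM nodes WHERE id = %s',
    },
}


def get_query(name, mysql_on):
    """Return the query text for `name`, using MySQL syntax when mysql_on == 1."""
    backend = "mysql" if mysql_on == 1 else "sqlite"
    return QUERIES[backend][name]
```

With this layout a missing MySQL variant raises a `KeyError` at lookup time instead of silently leaving the SQLite text in place.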
440 |     def __init__(self):
441 |         #Starting up
442 |         print("- Starting {}".format(self.PROGRAM_NAME))
443 |         print("+ Text library loaded")
444 |
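The help text above describes exporting the node and edges tables to .CSV by hand in an SQL browser. The same step can be sketched with the standard library, assuming the SQLite backend; `export_table_to_csv` is a hypothetical helper and is not part of optracker.

```python
import csv
import sqlite3


def export_table_to_csv(conn, query, csv_path):
    """Run `query` on `conn` and write a header row plus all result rows to `csv_path`.

    Returns the number of data rows written.
    """
    cursor = conn.execute(query)
    # cursor.description holds one tuple per result column; item 0 is the column name.
    header = [column[0] for column in cursor.description]
    rows = cursor.fetchall()
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

A call such as `export_table_to_csv(conn, "SELECT id, insta_username FROM nodes", "nodes.csv")` would produce a file that a tool like Gephi can import; the edges table can be exported the same way with its own query.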
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | python-slugify==3.0.2
2 | unicodecsv==0.14.1
3 | mysql-connector-python==8.0.18
4 | cmake
5 | Pillow
6 | dlib>=19.7
7 |
--------------------------------------------------------------------------------
/run_tracker.py:
--------------------------------------------------------------------------------
1 | from optracker.optracker import run
2 |
3 | if __name__ == '__main__':
4 |     run()
5 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | import setuptools
2 | from pathlib import Path
3 |
4 | setuptools.setup(
5 |     name="optracker",
6 |     version="1.3.6",
7 |     description=('Scrapes media, likes, and followers from social media and organizes them in a database for deeper analysis.'),
8 |     long_description=Path("README.md").read_text(),
9 |     long_description_content_type="text/markdown",
10 |     packages=setuptools.find_packages(),
11 |     package_data={'optracker': ['data/face_models/*.dat']},
12 |     license="MIT",
13 |     maintainer="suxSx",
14 |     author='suxSx',
15 |     author_email='marcuscrazy@gmail.com',
16 |     keywords='scraper media social network mapper tracker instagram scrape like follow analyze',
17 |     url='https://github.com/suxSx/openSource-tracker',
18 |     entry_points={
19 |         'console_scripts': [
20 |             'optracker = optracker.optracker:run',
21 |         ],
22 |     },
23 |     install_requires=[
24 |         'python-slugify==3.0.2',
25 |         'unicodecsv==0.14.1',
26 |         'mysql-connector-python==8.0.18',
27 |         'cmake',
28 |         'Pillow',
29 |         'dlib>=19.7'
30 |     ],
31 |     classifiers=[
32 |         'Development Status :: 4 - Beta',
33 |         'Environment :: Console',
34 |         'Operating System :: OS Independent',
35 |         'Intended Audience :: Developers',
36 |         'Intended Audience :: Education',
37 |         'Programming Language :: Python',
38 |         'Programming Language :: Python :: 3.6',
39 |         'Topic :: Education :: Testing',
40 |         "License :: OSI Approved :: MIT License"
41 |     ],
42 | )
43 |
--------------------------------------------------------------------------------