├── .travis.yml
├── AUTHORS.md
├── LICENSE
├── README.md
├── appveyor.yml
├── instatag
│   ├── __init__.py
│   ├── __main__.py
│   ├── instatag.py
│   ├── tags.tf
│   └── test.py
├── requirements.txt
└── setup.py
/.travis.yml:
--------------------------------------------------------------------------------
1 | sudo: true
2 |
3 | language: python
4 |
5 | python:
6 | - 3.5
7 |
8 | install:
9 | - pip3 install -r requirements.txt
10 | - python3 setup.py install
11 |
12 | script:
13 | - python3 -m instatag test
--------------------------------------------------------------------------------
/AUTHORS.md:
--------------------------------------------------------------------------------
1 | # Core Developers #
2 |
3 | ----------
4 | - Sepand Haghighi - Sharif University of Technology - ([http://github.com/sepandhaghighi](http://github.com/sepandhaghighi)) ([sepand@qpage.ir](mailto:sepand@qpage.ir))
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2017 Moduland Co
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |

3 |
4 |
5 |
6 |
7 |
8 |
9 |

10 |

11 |

12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 | # InstaTag
20 |
21 |
22 | Extract Instagram users from tags (public, without API or login)
23 |
24 |
25 | ----------
26 |
27 | By [Moduland Co](http://www.moduland.ir)
28 |
29 | ----------
30 |
31 |
32 |
33 |
34 |
35 |
36 |
37 | Linux |
38 | Windows |
39 |
40 |
41 |
42 |
43 |  |
44 | |
45 |
46 |
47 |
48 |
49 |
50 |
51 |
52 | ## Installation
53 | ### Source Code
54 | - Download [Version 0.1](https://github.com/moduland/instatag/archive/v0.1.zip) or [Latest Source](https://github.com/Moduland/instatag/archive/master.zip)
55 |
56 | - `python3 setup.py install` or `python setup.py install`
57 |
58 | ### PyPI
59 |
60 |
61 | - Check [Python Packaging User Guide](https://packaging.python.org/installing/)
62 | - `pip install instatag` or `pip3 install instatag` (may need root access)
63 |
64 |
65 | ## Usage
66 |
67 | - Manual input: `python -m instatag input tag1,tag2,tag3,...`
68 | - File input: `python -m instatag file tags.tf`
69 | - Default: `python -m instatag` (searches for a `tags.tf` file in the current working directory)
70 | - Results are written to the `insta_data` folder
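
As a rough illustration of the comma-separated tag format used by both the `input` argument and the `tags.tf` file, here is a minimal sketch. `parse_tags` is a hypothetical helper, not part of the package, and unlike the package's own reader it also trims whitespace around each tag and drops empty entries:

```python
def parse_tags(text):
    """Split a comma-separated tag string into a list of tag names.

    Hypothetical helper for illustration only; not part of instatag.
    """
    # Trim whitespace around each tag and skip empty entries
    # (for example those produced by a trailing comma).
    return [tag.strip() for tag in text.split(",") if tag.strip()]


print(parse_tags("tag1,tag2,tag3"))
```

The same function could be applied to the contents of a `tags.tf` file after reading it as text.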
71 |
72 |
73 |

74 |
Screen Record
75 |
76 |
77 |
78 | ## Issues & Bug Reports
79 |
80 | Just file an issue and describe it. We'll check it ASAP!
81 | Or send an email to [info@moduland.ir](mailto:info@moduland.ir "info@moduland.ir").
82 |
83 |
84 | ## Contribution
85 |
86 | You can fork the repository, improve or fix some part of it and then send the pull requests back if you want to see them here. I really appreciate that. ❤️
87 |
88 | Remember to write a few tests for your code before sending pull requests.
89 |
90 | ## Donate to our project
91 |
92 | If you feel our project is important, please consider supporting us.
93 | Our project is not, and never will be, run for profit. We need the money only so that we can continue doing what we do.
94 |
95 | Bitcoin :
96 |
97 | ```1XGr9qbZjBpUQJJSB6WtgBQbDTgrhPLPA```
98 |
99 |
100 | Payping (For Iranian citizens) :
101 |
102 |
103 |
104 | ## License
105 |
106 |

107 |
108 |

109 |
110 |
111 |
112 |
113 |
114 |
115 |
116 |
--------------------------------------------------------------------------------
/appveyor.yml:
--------------------------------------------------------------------------------
1 | build: false
2 |
3 | environment:
4 |   matrix:
5 |
6 |     - PYTHON: "C:\\Python35"
7 |       PYTHON_VERSION: "3.5.2"
8 |       PYTHON_ARCH: "32"
9 |
10 |     - PYTHON: "C:\\Python35"
11 |       PYTHON_VERSION: "3.5.2"
12 |       PYTHON_ARCH: "64"
13 |
14 |
15 |
16 | init:
17 | - "ECHO %PYTHON% %PYTHON_VERSION% %PYTHON_ARCH%"
18 |
19 | install:
20 | - "%PYTHON%/Scripts/pip.exe install -r requirements.txt"
21 | - "%PYTHON%/python.exe setup.py install"
22 |
23 | test_script:
24 | - "%PYTHON%/python.exe -m instatag test"
--------------------------------------------------------------------------------
/instatag/__init__.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 |
3 | from .instatag import *
--------------------------------------------------------------------------------
/instatag/__main__.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 |
3 | from .instatag import *
4 | import time
5 | import os
6 | import sys
7 | import doctest
8 | import multiprocessing as mu
9 | tag_list=[]
10 | if __name__=="__main__":
11 |     args=sys.argv
12 |     if len(args)>1:
13 |         if args[1].upper()=="HELP":
14 |             instatag_help()
15 |             sys.exit()
16 |         elif args[1].upper()=="TEST":
17 |             doctest.testfile("test.py",verbose=True)
18 |             sys.exit()
19 |         if len(args) > 2:
20 |             if args[1].upper()=="INPUT":
21 |                 tag_list=list(args[2].split(","))
22 |             elif args[1].upper()=="FILE":
23 |                 tag_list=get_tags(args[2])
24 |     timer_1=time.time()
25 |     if "insta_data" not in os.listdir(os.getcwd()):
26 |         os.mkdir("insta_data")
27 |     if len(tag_list)==0:
28 |         tag_list=get_tags()
29 |     p=mu.Pool(mu.cpu_count())
30 |     p.map(user_list_gen,tag_list)
31 |     timer_2=time.time()
32 |     delta_time=int(timer_2-timer_1)
33 |     print("InstaTag Data Generated In "+time_convert(delta_time))
--------------------------------------------------------------------------------
/instatag/instatag.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 |
3 | import requests
4 | import time
5 | import socket
6 | import io
7 | from bs4 import BeautifulSoup
8 | from random import randint
9 | import os
10 | import sys
11 | from art import *
12 | DEBUG=True
13 | VERSION="0.1"
14 | def instatag_help():
15 |     tprint("Instatag")
16 |     tprint("V"+VERSION)
17 |     print("Help : \n")
18 |     print(" - test --> (run tests)\n")
19 |     print(" - input tags --> Example : 'python -m instatag input tag1,tag2,tag3'\n")
20 |     print(" - file filename --> Example : 'python -m instatag file file.tf'\n")
21 |     print(" - --> Example : 'python -m instatag' (Search for tags.tf file)")
22 | def tag_url_maker(tag):
23 |     return "http://www.instagram.com/explore/tags/"+tag
24 | def post_url_maker(post_hash):
25 |     return "https://www.instagram.com/p/"+post_hash
26 |
27 | def step_2_url_maker(name):
28 |     return "http://www.instagram.com/"+name+"/media/"
29 | def print_line(number=30,char="-"):
30 |     '''
31 |     Print a line of repeated characters to the screen.
32 |     :param number: number of characters in the line
33 |     :param char: character to repeat
34 |     :return: None
35 |     '''
36 |     line=""
37 |     i=0
38 |     while(i<number):
...
99 |     >>> internet() # if there is stable internet connection
100 |     True
101 |     >>> internet() # if there is no stable internet connection
102 |     False
103 |     """
104 |     try:
105 |         socket.setdefaulttimeout(timeout)
106 |         socket.socket(socket.AF_INET, socket.SOCK_STREAM).connect((host, port))
107 |         return True
108 |     except Exception:
109 |         return False
110 | def create_random_sleep(index=1,min_time=5,max_time=60):
111 |     '''
112 |     Generate a random sleep time.
113 |     :param index: index that selects the message and delay (index = 0 is the first page)
114 |     :param min_time: minimum sleep time
115 |     :param max_time: maximum sleep time
116 |     :type index:int
117 |     :type min_time:int
118 |     :type max_time:int
119 |     :return: sleep time as an integer (between min_time and max_time, or 5 for the first page)
120 |     '''
121 |     if index==0:
122 |         time_sleep = 5
123 |         if DEBUG==True:
124 |             print("Wait "+str(time_sleep)+" sec for first search . . .")
125 |     else:
126 |         time_sleep = randint(min_time, max_time)
127 |         if DEBUG==True:
128 |             print("Wait "+str(time_sleep)+" sec for next search . . .")
129 |     if DEBUG==True:
130 |         print_line(70,"*")
131 |     return time_sleep
132 |
133 | def post_list_gen(tag,index=0):
134 |     '''
135 |     Extract Instagram post links (shortcodes) for a tag.
136 |     :param tag: hashtag
137 |     :param index: index from which to resume the search
138 |     :type tag:str
139 |     :type index:int
140 |     :return: post links as a list
141 |     '''
142 |     try:
143 |         print("Page Extracting . . .")
144 |         post_list = []
145 |         tag_url = tag_url_maker(tag)
146 |         raw_file = get_html(tag_url)
147 |         if "No posts yet." in raw_file:
148 |             print("Invalid Tag")
149 |             sys.exit()
150 |         while(index!=-1):
151 |             index=raw_file.find('"shortcode":',index+7,len(raw_file))
152 |             length=raw_file[index:].find(',')
153 |             post=raw_file[index+13:index+length-1]
154 |             if len(post)<50:
155 |                 post_list.append(post)
156 |         return post_list
157 |     except Exception:
158 |         return post_list
159 |
160 | def step_2_gen(name_list,tag):
161 |     '''
162 |     Extract second-level users (from the first-level users' follower and following lists).
163 |     :param name_list: first-level user IDs
164 |     :param tag: hashtag
165 |     :type name_list:list
166 |     :type tag:str
167 |     :return: None
168 |     '''
169 |     try:
170 |         file=io.open("insta_data/users_2_"+tag+".txt","a",encoding="utf-8")
171 |         user_list=[]
172 |         for i in range(len(name_list)):
173 |             print("ID : "+name_list[i])
174 |             user_url=step_2_url_maker(name_list[i])
175 |             raw_html=get_html(user_url,max_delay=8)
176 |             raw_file = BeautifulSoup(raw_html, "html.parser").prettify()
177 |             if raw_file.find('"more_available": false')!=-1:
178 |                 print("Account is private")
179 |                 continue
180 |             index=0
181 |             while (index != -1):
182 |                 index = raw_file.find('"username":', index + 13, len(raw_file))
183 |                 length = raw_file[index:].find('}')
184 |                 user = raw_file[index + 13:index + length - 1]
185 |                 if user.endswith('"'):  # safe on an empty string, unlike user[-1]
186 |                     user = user[:-1]
187 |                 user = user.encode("utf-8")
188 |                 if len(user) < 50 and user not in user_list and user!=name_list[i].encode("utf-8"):  # compare bytes with bytes
189 |                     file.write(str(user, encoding="utf-8") + "\n")
190 |                     user_list.append(user)
191 |         print(str(len(user_list))+" ID Extracted")
192 |     except Exception as ex:
193 |         print(str(ex))
194 |
195 |
196 |
197 | def user_list_gen(tag):
198 |     '''
199 |     Extract the first-level user list for a tag, then run step_2_gen for the second-level users.
200 |     :param tag: hashtag
201 |     :type tag:str
202 |     :return: None
203 |     '''
204 |     try:
205 |         user_list=[]  # defined first so the except branch can always use it
206 |         hash_list = post_list_gen(tag)
207 |         file=io.open("insta_data/users_1_"+tag+".txt","a",encoding="utf-8")
208 |         for i in range(len(hash_list)):
209 |             print("Code : "+hash_list[i])
210 |             user_url = post_url_maker(hash_list[i])
211 |             raw_html = get_html(user_url)
212 |             raw_file=BeautifulSoup(raw_html,"html.parser").prettify()
213 |             index = 0
214 |             while(index!=-1):
215 |                 index=raw_file.find('"username":',index+13,len(raw_file))
216 |                 length=raw_file[index:].find('}')
217 |                 user=raw_file[index+12:index+length-1]
218 |                 if user[-1]=='"':
219 |                     user=user[:-1]
220 |                 if len(user)<50 and (user not in user_list):
221 |                     user_list.append(user)
222 |                     user = user.encode("utf-8")
223 |                     file.write(str(user,encoding="utf-8")+"\n")
224 |
225 |         step_2_gen(user_list,tag)
226 |         file.close()
227 |     except Exception as ex:
228 |         print(str(ex))
229 |         instatag_help()
230 |         step_2_gen(user_list, tag)
231 |
232 | def get_tags(filename="tags.tf"):
233 |     '''
234 |     Read tags from a comma-separated file (tags.tf by default).
235 |     :return: tags as a list
236 |     '''
237 |     try:
238 |         if filename in os.listdir(os.getcwd()):
239 |             file=io.open(filename,"r",encoding="utf-8")
240 |             contain=file.read()
241 |             return contain.strip().split(",")
242 |         else:
243 |             print("[Error] Tag File Missing!")
244 |             instatag_help()
245 |             sys.exit()
246 |     except Exception:
247 |         print("[Error] In Tags Read!")
248 |         instatag_help()
249 |         sys.exit()
250 |
251 |
--------------------------------------------------------------------------------
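The manual `find` loop that `post_list_gen` uses to collect `"shortcode":` values can also be expressed with a regular expression. The following is only a sketch of that swapped-in technique, not the package's code: `extract_shortcodes` is a hypothetical name, and the sample string is made up to mimic the JSON-like fragments the scraper expects in the page source.

```python
import re

# Matches "shortcode": followed by a quoted value and captures the value.
SHORTCODE_PATTERN = re.compile(r'"shortcode":\s*"([^"]+)"')


def extract_shortcodes(raw_html):
    """Return every shortcode value found in the page text, in order.

    Hypothetical helper for illustration; the real page layout may differ.
    """
    return SHORTCODE_PATTERN.findall(raw_html)


# Made-up sample input, not real Instagram output.
sample = '{"shortcode":"BaBc123","id":"1"},{"shortcode": "Zz9_x","id":"2"}'
print(extract_shortcodes(sample))
```

Compared with slicing by fixed offsets, the pattern tolerates optional whitespace after the colon and returns an empty list when nothing matches.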
/instatag/tags.tf:
--------------------------------------------------------------------------------
1 | Tag1,Tag2,Tag3,Tag4
--------------------------------------------------------------------------------
/instatag/test.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | '''
3 | >>> import multiprocessing as mu
4 | >>> from instatag import *
5 | >>> tag_url_maker("test")
6 | 'http://www.instagram.com/explore/tags/test'
7 | >>> step_2_url_maker("test")
8 | 'http://www.instagram.com/test/media/'
9 | >>> print_line()
10 | ------------------------------
11 | >>> print_line(30,"p")
12 | pppppppppppppppppppppppppppppp
13 | >>> zero_insert("w")
14 | '0w'
15 | >>> zero_insert("asdasdasd")
16 | 'asdasdasd'
17 | >>> time_convert(10000)
18 | '00 days, 02 hour, 46 minutes, 40 seconds'
19 | >>> internet()
20 | True
21 | >>> import random
22 | >>> random.seed(2)
23 | >>> create_random_sleep(index=1,min_time=2,max_time=5)
24 | Wait 2 sec for next search . . .
25 | **********************************************************************
26 | 2
27 |
28 | '''
--------------------------------------------------------------------------------
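The doctests in `test.py` pin down the behavior of `zero_insert` and `time_convert`, whose bodies do not appear in this listing. The sketch below is one implementation consistent with those doctests; it is an assumption written for illustration and may differ from the package's actual code.

```python
def zero_insert(text):
    """Left-pad a one-character string with '0'; return longer strings unchanged."""
    if len(text) == 1:
        return "0" + text
    return text


def time_convert(seconds):
    """Format a duration in seconds as days, hours, minutes and seconds."""
    days, rem = divmod(int(seconds), 24 * 3600)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    # The wording ("hour" in the singular) follows the expected doctest output.
    return (zero_insert(str(days)) + " days, "
            + zero_insert(str(hours)) + " hour, "
            + zero_insert(str(minutes)) + " minutes, "
            + zero_insert(str(secs)) + " seconds")


print(time_convert(10000))
```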
/requirements.txt:
--------------------------------------------------------------------------------
1 | requests
2 | bs4
3 | art
4 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup  # distutils ignores install_requires; setuptools honors it
2 | setup(
3 | name = 'instatag',
4 | packages = ['instatag'],
5 | version = '0.1',
6 | description = 'Extract users from tag in instagram',
7 | long_description="",
8 | author = 'Moduland Co',
9 | author_email = 'info@moduland.ir',
10 | url = 'https://github.com/Moduland/instatag',
11 | download_url = 'https://github.com/Moduland/instatag/tarball/v0.1',
12 | keywords = ['extract', 'scrap', 'instagram','python','tags','users'],
13 | install_requires=[
14 | 'art',
15 | 'bs4',
16 | 'requests',
17 | ],
18 | classifiers = [
19 | 'Development Status :: 3 - Alpha',
20 | 'Intended Audience :: End Users/Desktop',
21 | 'License :: OSI Approved :: MIT License',
22 | 'Operating System :: OS Independent',
23 | 'Programming Language :: Python :: 3.4',
24 | 'Programming Language :: Python :: 3.5',
25 | 'Programming Language :: Python :: 3.6',
26 | 'Topic :: Internet',
27 | ],
28 | license='MIT',
29 | )
30 |
--------------------------------------------------------------------------------