├── .editorconfig ├── .github └── ISSUE_TEMPLATE.md ├── .gitignore ├── .travis.yml ├── AUTHORS.rst ├── CONTRIBUTING.rst ├── HISTORY.rst ├── LICENSE ├── MANIFEST.in ├── Makefile ├── README.rst ├── docs ├── Makefile ├── authors.rst ├── conf.py ├── contributing.rst ├── history.rst ├── index.rst ├── installation.rst ├── make.bat ├── modules.rst ├── readme.rst ├── socials.rst └── usage.rst ├── requirements_dev.txt ├── setup.cfg ├── setup.py ├── socials ├── __init__.py ├── cli.py └── socials.py ├── tests └── test_socials.py └── tox.ini /.editorconfig: -------------------------------------------------------------------------------- 1 | # http://editorconfig.org 2 | 3 | root = true 4 | 5 | [*] 6 | indent_style = space 7 | indent_size = 4 8 | trim_trailing_whitespace = true 9 | insert_final_newline = true 10 | charset = utf-8 11 | end_of_line = lf 12 | 13 | [*.bat] 14 | indent_style = tab 15 | end_of_line = crlf 16 | 17 | [LICENSE] 18 | insert_final_newline = false 19 | 20 | [Makefile] 21 | indent_style = tab 22 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | * Socials version: 2 | * Python version: 3 | * Operating System: 4 | 5 | ### Description 6 | 7 | Describe what you were trying to get done. 8 | Tell us what happened, what went wrong, and what you expected to happen. 9 | 10 | ### What I Did 11 | 12 | ``` 13 | Paste the command(s) you ran and the output. 14 | If there was a crash, please include the traceback here. 
15 | ``` 16 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | .pytest_cache/ 49 | 50 | # Translations 51 | *.mo 52 | *.pot 53 | 54 | # Django stuff: 55 | *.log 56 | local_settings.py 57 | 58 | # Flask stuff: 59 | instance/ 60 | .webassets-cache 61 | 62 | # Scrapy stuff: 63 | .scrapy 64 | 65 | # Sphinx documentation 66 | docs/_build/ 67 | 68 | # PyBuilder 69 | target/ 70 | 71 | # Jupyter Notebook 72 | .ipynb_checkpoints 73 | 74 | # pyenv 75 | .python-version 76 | 77 | # celery beat schedule file 78 | celerybeat-schedule 79 | 80 | # SageMath parsed files 81 | *.sage.py 82 | 83 | # dotenv 84 | .env 85 | 86 | # virtualenv 87 | .venv 88 | venv/ 89 | ENV/ 90 | 91 | # Spyder project settings 92 | .spyderproject 93 | .spyproject 94 | 95 | # Rope project settings 96 | .ropeproject 97 | 98 | # mkdocs documentation 99 | /site 100 | 101 | # mypy 102 | .mypy_cache/ 103 | 104 | # IDEs 105 | .idea 106 | -------------------------------------------------------------------------------- /.travis.yml: 
-------------------------------------------------------------------------------- 1 | language: python 2 | python: 3 | - 3.6 4 | - 3.5 5 | - 3.4 6 | install: pip install -U tox-travis 7 | script: tox 8 | deploy: 9 | provider: pypi 10 | distributions: sdist bdist_wheel 11 | user: lorey 12 | password: 13 | secure: GTc0gR1MHaDQV8Uah2wZFiJp4QX3L3yqyiMaDs8hBlIPhQJ+A3Cxo62c2TbyuG46DGLK0CLhCZfd5fyNVYS9QRDm6bMZsIBNVE6aKbGLl5ZzmN+C9Mhn2D4ePFl+acQ/k98DIw9qbb2DQGuL3E0ZmL2XbW250P638WSyr2HnaCLkKng99y7ZsK8AwPjhtF9EchxS3rZ36Xl+Zd91x7hTB5tVvUNn3NdRJFZk6iksO1YGQOFDYwprEiB8ToUD3o/7TyqEQa21Iok4v5zYjmavvQKSlMNaQOZx3ts3WxFIt1GdX2eG3IwH+D5C78PCCA8g4BaQTCdeBmG+A1xUyU69fRmrQ0C66NS4yrT6xlYwx2QaXORamp3waQfboiSATljhdRhN39x6/8+/kg6CHrYTk/62X3r0mQ2UTK6HIAt29V0uWQ0nPYssaRaX1H8GiRcCKC9FwYL1Iy7s235ImAW9yeOlHTRmMr0R3qHKoAbnE5kQjRKbuv2wDOebA2NBDMR3UdAHLAE/r0aqolY6Sr442c891dNI4ST3ZPAbQQGfzhhh4mgG1N88F9du+bYT+iise9XITU5KzfxVifAG6Is7PD90dupR56A8EnSBA5ZHpDjG3NcmapavQ/EBhST26GFEQQm/tXfvr/gkBM5ThwdqtbuLFgPMEQkZS7A4pkmrhnI= 14 | on: 15 | tags: true 16 | repo: lorey/socials 17 | python: 3.6 18 | -------------------------------------------------------------------------------- /AUTHORS.rst: -------------------------------------------------------------------------------- 1 | ======= 2 | Credits 3 | ======= 4 | 5 | Development Lead 6 | ---------------- 7 | 8 | * Karl Lorey 9 | 10 | Contributors 11 | ------------ 12 | 13 | * Dan Stace 14 | -------------------------------------------------------------------------------- /CONTRIBUTING.rst: -------------------------------------------------------------------------------- 1 | .. highlight:: shell 2 | 3 | ============ 4 | Contributing 5 | ============ 6 | 7 | Contributions are welcome, and they are greatly appreciated! Every little bit 8 | helps, and credit will always be given. 
9 | 10 | You can contribute in many ways: 11 | 12 | Types of Contributions 13 | ---------------------- 14 | 15 | Report Bugs 16 | ~~~~~~~~~~~ 17 | 18 | Report bugs at https://github.com/lorey/socials/issues. 19 | 20 | If you are reporting a bug, please include: 21 | 22 | * Your operating system name and version. 23 | * Any details about your local setup that might be helpful in troubleshooting. 24 | * Detailed steps to reproduce the bug. 25 | 26 | Fix Bugs 27 | ~~~~~~~~ 28 | 29 | Look through the GitHub issues for bugs. Anything tagged with "bug" and "help 30 | wanted" is open to whoever wants to implement it. 31 | 32 | Implement Features 33 | ~~~~~~~~~~~~~~~~~~ 34 | 35 | Look through the GitHub issues for features. Anything tagged with "enhancement" 36 | and "help wanted" is open to whoever wants to implement it. 37 | 38 | Write Documentation 39 | ~~~~~~~~~~~~~~~~~~~ 40 | 41 | Socials could always use more documentation, whether as part of the 42 | official Socials docs, in docstrings, or even on the web in blog posts, 43 | articles, and such. 44 | 45 | Submit Feedback 46 | ~~~~~~~~~~~~~~~ 47 | 48 | The best way to send feedback is to file an issue at https://github.com/lorey/socials/issues. 49 | 50 | If you are proposing a feature: 51 | 52 | * Explain in detail how it would work. 53 | * Keep the scope as narrow as possible, to make it easier to implement. 54 | * Remember that this is a volunteer-driven project, and that contributions 55 | are welcome :) 56 | 57 | Get Started! 58 | ------------ 59 | 60 | Ready to contribute? Here's how to set up `socials` for local development. 61 | 62 | 1. Fork the `socials` repo on GitHub. 63 | 2. Clone your fork locally:: 64 | 65 | $ git clone git@github.com:your_name_here/socials.git 66 | 67 | 3. Install your local copy into a virtualenv. 
Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:: 68 | 69 | $ mkvirtualenv socials 70 | $ cd socials/ 71 | $ python setup.py develop 72 | 73 | 4. Create a branch for local development:: 74 | 75 | $ git checkout -b name-of-your-bugfix-or-feature 76 | 77 | Now you can make your changes locally. 78 | 79 | 5. When you're done making changes, check that your changes pass flake8 and the 80 | tests, including testing other Python versions with tox:: 81 | 82 | $ flake8 socials tests 83 | $ python setup.py test # or: py.test 84 | $ tox 85 | 86 | To get flake8 and tox, just pip install them into your virtualenv. 87 | 88 | 6. Commit your changes and push your branch to GitHub:: 89 | 90 | $ git add . 91 | $ git commit -m "Your detailed description of your changes." 92 | $ git push origin name-of-your-bugfix-or-feature 93 | 94 | 7. Submit a pull request through the GitHub website. 95 | 96 | Pull Request Guidelines 97 | ----------------------- 98 | 99 | Before you submit a pull request, check that it meets these guidelines: 100 | 101 | 1. The pull request should include tests. 102 | 2. If the pull request adds functionality, the docs should be updated. Put 103 | your new functionality into a function with a docstring, and add the 104 | feature to the list in README.rst. 105 | 3. The pull request should work for Python 3.4, 3.5 and 3.6. Check 106 | https://travis-ci.org/lorey/socials/pull_requests 107 | and make sure that the tests pass for all supported Python versions. 108 | 109 | Tips 110 | ---- 111 | 112 | To run a subset of tests:: 113 | 114 | $ py.test tests/test_socials.py 115 | 116 | 117 | Deploying 118 | --------- 119 | 120 | A reminder for the maintainers on how to deploy. 121 | Make sure all your changes are committed (including an entry in HISTORY.rst). 
122 | Then run:: 123 | 124 | $ bumpversion patch # possible: major / minor / patch 125 | $ git push 126 | $ git push --tags 127 | 128 | Travis will then deploy to PyPI if tests pass. 129 | -------------------------------------------------------------------------------- /HISTORY.rst: -------------------------------------------------------------------------------- 1 | ======= 2 | History 3 | ======= 4 | 5 | 0.2.0 (2018-05-31) 6 | ------------------ 7 | 8 | * Email address extraction. 9 | * Extraction of specific platforms. 10 | 11 | 0.1.0 (2018-05-18) 12 | ------------------ 13 | 14 | * First release on PyPI. 15 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | GNU GENERAL PUBLIC LICENSE 2 | Version 3, 29 June 2007 3 | 4 | Social Account Detection for Python 5 | Copyright (C) 2018 Karl Lorey 6 | 7 | This program is free software: you can redistribute it and/or modify 8 | it under the terms of the GNU General Public License as published by 9 | the Free Software Foundation, either version 3 of the License, or 10 | (at your option) any later version. 11 | 12 | This program is distributed in the hope that it will be useful, 13 | but WITHOUT ANY WARRANTY; without even the implied warranty of 14 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 15 | GNU General Public License for more details. 16 | 17 | You should have received a copy of the GNU General Public License 18 | along with this program. If not, see . 19 | 20 | Also add information on how to contact you by electronic and paper mail. 21 | 22 | You should also get your employer (if you work as a programmer) or school, 23 | if any, to sign a "copyright disclaimer" for the program, if necessary. 24 | For more information on this, and how to apply and follow the GNU GPL, see 25 | . 
26 | 27 | The GNU General Public License does not permit incorporating your program 28 | into proprietary programs. If your program is a subroutine library, you 29 | may consider it more useful to permit linking proprietary applications with 30 | the library. If this is what you want to do, use the GNU Lesser General 31 | Public License instead of this License. But first, please read 32 | . 33 | 34 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include AUTHORS.rst 2 | include CONTRIBUTING.rst 3 | include HISTORY.rst 4 | include LICENSE 5 | include README.rst 6 | 7 | recursive-include tests * 8 | recursive-exclude * __pycache__ 9 | recursive-exclude * *.py[co] 10 | 11 | recursive-include docs *.rst conf.py Makefile make.bat *.jpg *.png *.gif 12 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | .PHONY: clean clean-test clean-pyc clean-build docs help 2 | .DEFAULT_GOAL := help 3 | 4 | define BROWSER_PYSCRIPT 5 | import os, webbrowser, sys 6 | 7 | try: 8 | from urllib import pathname2url 9 | except ImportError: 10 | from urllib.request import pathname2url 11 | 12 | webbrowser.open("file://" + pathname2url(os.path.abspath(sys.argv[1]))) 13 | endef 14 | export BROWSER_PYSCRIPT 15 | 16 | define PRINT_HELP_PYSCRIPT 17 | import re, sys 18 | 19 | for line in sys.stdin: 20 | match = re.match(r'^([a-zA-Z_-]+):.*?## (.*)$$', line) 21 | if match: 22 | target, help = match.groups() 23 | print("%-20s %s" % (target, help)) 24 | endef 25 | export PRINT_HELP_PYSCRIPT 26 | 27 | BROWSER := python -c "$$BROWSER_PYSCRIPT" 28 | 29 | help: 30 | @python -c "$$PRINT_HELP_PYSCRIPT" < $(MAKEFILE_LIST) 31 | 32 | clean: clean-build clean-pyc clean-test ## remove all build, test, coverage and Python artifacts 33 | 34 | clean-build: ## remove build 
artifacts 35 | rm -fr build/ 36 | rm -fr dist/ 37 | rm -fr .eggs/ 38 | find . -name '*.egg-info' -exec rm -fr {} + 39 | find . -name '*.egg' -exec rm -f {} + 40 | 41 | clean-pyc: ## remove Python file artifacts 42 | find . -name '*.pyc' -exec rm -f {} + 43 | find . -name '*.pyo' -exec rm -f {} + 44 | find . -name '*~' -exec rm -f {} + 45 | find . -name '__pycache__' -exec rm -fr {} + 46 | 47 | clean-test: ## remove test and coverage artifacts 48 | rm -fr .tox/ 49 | rm -f .coverage 50 | rm -fr htmlcov/ 51 | rm -fr .pytest_cache 52 | 53 | lint: ## check style with flake8 54 | flake8 socials tests 55 | 56 | test: ## run tests quickly with the default Python 57 | py.test 58 | 59 | test-all: ## run tests on every Python version with tox 60 | tox 61 | 62 | coverage: ## check code coverage quickly with the default Python 63 | coverage run --source socials -m pytest 64 | coverage report -m 65 | coverage html 66 | $(BROWSER) htmlcov/index.html 67 | 68 | docs: ## generate Sphinx HTML documentation, including API docs 69 | rm -f docs/socials.rst 70 | rm -f docs/modules.rst 71 | sphinx-apidoc -o docs/ socials 72 | $(MAKE) -C docs clean 73 | $(MAKE) -C docs html 74 | $(BROWSER) docs/_build/html/index.html 75 | 76 | servedocs: docs ## compile the docs watching for changes 77 | watchmedo shell-command -p '*.rst' -c '$(MAKE) -C docs html' -R -D . 78 | 79 | release: dist ## package and upload a release 80 | twine upload dist/* 81 | 82 | dist: clean ## builds source and wheel package 83 | python setup.py sdist 84 | python setup.py bdist_wheel 85 | ls -l dist 86 | 87 | install: clean ## install the package to the active Python's site-packages 88 | python setup.py install 89 | -------------------------------------------------------------------------------- /README.rst: -------------------------------------------------------------------------------- 1 | ======= 2 | Socials 3 | ======= 4 | 5 | 6 | .. 
image:: https://img.shields.io/pypi/v/socials.svg 7 | :target: https://pypi.python.org/pypi/socials 8 | 9 | .. image:: https://img.shields.io/travis/lorey/socials.svg 10 | :target: https://travis-ci.org/lorey/socials 11 | 12 | .. image:: https://readthedocs.org/projects/socials/badge/?version=latest 13 | :target: https://socials.readthedocs.io/en/latest/?badge=latest 14 | :alt: Documentation Status 15 | 16 | 17 | 18 | 19 | Social Account Detection and Extraction for Python 20 | 21 | 22 | * Free software: GNU General Public License v3 23 | * Documentation: https://socials.readthedocs.io. 24 | * Source: https://github.com/lorey/socials 25 | 26 | 27 | Features 28 | -------- 29 | 30 | * Detect and extract URLs of social accounts: throw in URLs, get back URLs of social media profiles by type. 31 | * Currently supports Facebook, Twitter, LinkedIn, GitHub, and email addresses. 32 | 33 | Usage 34 | ----- 35 | 36 | Install it with ``pip install socials`` and use it as follows: 37 | 38 | .. code-block:: python 39 | 40 | >>> hrefs = ['https://facebook.com/peterparker', 'https://techcrunch.com', 'https://github.com/lorey'] 41 | >>> socials.extract(hrefs).get_matches_per_platform() 42 | {'github': ['https://github.com/lorey'], 'facebook': ['https://facebook.com/peterparker']} 43 | >>> socials.extract(hrefs).get_matches_for_platform('github') 44 | ['https://github.com/lorey'] 45 | 46 | Read more about `usage in our documentation`_. 47 | 48 | .. _usage in our documentation: https://socials.readthedocs.io/en/latest/usage.html 49 | 50 | Socials API 51 | ----------- 52 | There's also `an API called Socials API`_ that allows you to use the functionality via REST. 53 | You can use a `free online version`_, try it in the browser, or deploy it yourself. 54 | 55 | .. _an API called Socials API: https://github.com/lorey/socials-api 56 | .. 
_free online version: https://socials.karllorey.com 57 | 58 | Development 59 | ----------- 60 | 61 | * Create a virtual environment ``venv`` with ``virtualenv -p /usr/bin/python3 venv``. 62 | * Activate the environment with ``source venv/bin/activate``. 63 | * Install the development requirements with ``pip install -r requirements_dev.txt``. 64 | * Run the tests with ``tox`` or ``python setup.py test``. 65 | 66 | Credits 67 | ------- 68 | 69 | This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template. 70 | 71 | .. _Cookiecutter: https://github.com/audreyr/cookiecutter 72 | .. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage 73 | -------------------------------------------------------------------------------- /docs/Makefile: -------------------------------------------------------------------------------- 1 | # Minimal makefile for Sphinx documentation 2 | # 3 | 4 | # You can set these variables from the command line. 5 | SPHINXOPTS = 6 | SPHINXBUILD = python -msphinx 7 | SPHINXPROJ = socials 8 | SOURCEDIR = . 9 | BUILDDIR = _build 10 | 11 | # Put it first so that "make" without argument is like "make help". 12 | help: 13 | @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) 14 | 15 | .PHONY: help Makefile 16 | 17 | # Catch-all target: route all unknown targets to Sphinx using the new 18 | # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). 19 | %: Makefile 20 | @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) 21 | -------------------------------------------------------------------------------- /docs/authors.rst: -------------------------------------------------------------------------------- 1 | .. 
include:: ../AUTHORS.rst 2 | -------------------------------------------------------------------------------- /docs/conf.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # 4 | # socials documentation build configuration file, created by 5 | # sphinx-quickstart on Fri Jun 9 13:47:02 2017. 6 | # 7 | # This file is execfile()d with the current directory set to its 8 | # containing dir. 9 | # 10 | # Note that not all possible configuration values are present in this 11 | # autogenerated file. 12 | # 13 | # All configuration values have a default; values that are commented out 14 | # serve to show the default. 15 | 16 | # If extensions (or modules to document with autodoc) are in another 17 | # directory, add these directories to sys.path here. If the directory is 18 | # relative to the documentation root, use os.path.abspath to make it 19 | # absolute, like shown here. 20 | # 21 | import os 22 | import sys 23 | sys.path.insert(0, os.path.abspath('..')) 24 | 25 | import socials 26 | 27 | # -- General configuration --------------------------------------------- 28 | 29 | # If your documentation needs a minimal Sphinx version, state it here. 30 | # 31 | # needs_sphinx = '1.0' 32 | 33 | # Add any Sphinx extension module names here, as strings. They can be 34 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom ones. 35 | extensions = ['sphinx.ext.autodoc', 'sphinx.ext.viewcode'] 36 | 37 | # Add any paths that contain templates here, relative to this directory. 38 | templates_path = ['_templates'] 39 | 40 | # The suffix(es) of source filenames. 41 | # You can specify multiple suffix as a list of string: 42 | # 43 | # source_suffix = ['.rst', '.md'] 44 | source_suffix = '.rst' 45 | 46 | # The master toctree document. 47 | master_doc = 'index' 48 | 49 | # General information about the project. 
50 | project = u'Socials' 51 | copyright = u"2018, Karl Lorey" 52 | author = u"Karl Lorey" 53 | 54 | # The version info for the project you're documenting, acts as replacement 55 | # for |version| and |release|, also used in various other places throughout 56 | # the built documents. 57 | # 58 | # The short X.Y version. 59 | version = socials.__version__ 60 | # The full version, including alpha/beta/rc tags. 61 | release = socials.__version__ 62 | 63 | # The language for content autogenerated by Sphinx. Refer to documentation 64 | # for a list of supported languages. 65 | # 66 | # This is also used if you do content translation via gettext catalogs. 67 | # Usually you set "language" from the command line for these cases. 68 | language = None 69 | 70 | # List of patterns, relative to source directory, that match files and 71 | # directories to ignore when looking for source files. 72 | # These patterns also affect html_static_path and html_extra_path. 73 | exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] 74 | 75 | # The name of the Pygments (syntax highlighting) style to use. 76 | pygments_style = 'sphinx' 77 | 78 | # If true, `todo` and `todoList` produce output, else they produce nothing. 79 | todo_include_todos = False 80 | 81 | 82 | # -- Options for HTML output ------------------------------------------- 83 | 84 | # The theme to use for HTML and HTML Help pages. See the documentation for 85 | # a list of builtin themes. 86 | # 87 | html_theme = 'alabaster' 88 | 89 | # Theme options are theme-specific and customize the look and feel of a 90 | # theme further. For a list of options available for each theme, see the 91 | # documentation. 92 | # 93 | # html_theme_options = {} 94 | 95 | # Add any paths that contain custom static files (such as style sheets) here, 96 | # relative to this directory. They are copied after the builtin static files, 97 | # so a file named "default.css" will overwrite the builtin "default.css". 
98 | html_static_path = ['_static'] 99 | 100 | 101 | # -- Options for HTMLHelp output --------------------------------------- 102 | 103 | # Output file base name for HTML help builder. 104 | htmlhelp_basename = 'socialsdoc' 105 | 106 | 107 | # -- Options for LaTeX output ------------------------------------------ 108 | 109 | latex_elements = { 110 | # The paper size ('letterpaper' or 'a4paper'). 111 | # 112 | # 'papersize': 'letterpaper', 113 | 114 | # The font size ('10pt', '11pt' or '12pt'). 115 | # 116 | # 'pointsize': '10pt', 117 | 118 | # Additional stuff for the LaTeX preamble. 119 | # 120 | # 'preamble': '', 121 | 122 | # Latex figure (float) alignment 123 | # 124 | # 'figure_align': 'htbp', 125 | } 126 | 127 | # Grouping the document tree into LaTeX files. List of tuples 128 | # (source start file, target name, title, author, documentclass 129 | # [howto, manual, or own class]). 130 | latex_documents = [ 131 | (master_doc, 'socials.tex', 132 | u'Socials Documentation', 133 | u'Karl Lorey', 'manual'), 134 | ] 135 | 136 | 137 | # -- Options for manual page output ------------------------------------ 138 | 139 | # One entry per manual page. List of tuples 140 | # (source start file, name, description, authors, manual section). 141 | man_pages = [ 142 | (master_doc, 'socials', 143 | u'Socials Documentation', 144 | [author], 1) 145 | ] 146 | 147 | 148 | # -- Options for Texinfo output ---------------------------------------- 149 | 150 | # Grouping the document tree into Texinfo files. 
List of tuples 151 | # (source start file, target name, title, author, 152 | # dir menu entry, description, category) 153 | texinfo_documents = [ 154 | (master_doc, 'socials', 155 | u'Socials Documentation', 156 | author, 157 | 'socials', 158 | 'One line description of project.', 159 | 'Miscellaneous'), 160 | ] 161 | 162 | 163 | 164 | -------------------------------------------------------------------------------- /docs/contributing.rst: -------------------------------------------------------------------------------- 1 | .. include:: ../CONTRIBUTING.rst 2 | -------------------------------------------------------------------------------- /docs/history.rst: -------------------------------------------------------------------------------- 1 | .. include:: ../HISTORY.rst 2 | -------------------------------------------------------------------------------- /docs/index.rst: -------------------------------------------------------------------------------- 1 | Welcome to Socials's documentation! 2 | ====================================== 3 | 4 | .. toctree:: 5 | :maxdepth: 2 6 | :caption: Contents: 7 | 8 | readme 9 | installation 10 | usage 11 | modules 12 | contributing 13 | authors 14 | history 15 | 16 | Indices and tables 17 | ================== 18 | * :ref:`genindex` 19 | * :ref:`modindex` 20 | * :ref:`search` 21 | -------------------------------------------------------------------------------- /docs/installation.rst: -------------------------------------------------------------------------------- 1 | .. highlight:: shell 2 | 3 | ============ 4 | Installation 5 | ============ 6 | 7 | 8 | Stable release 9 | -------------- 10 | 11 | To install Socials, run this command in your terminal: 12 | 13 | .. code-block:: console 14 | 15 | $ pip install socials 16 | 17 | This is the preferred method to install Socials, as it will always install the most recent stable release. 
18 | 19 | If you don't have `pip`_ installed, this `Python installation guide`_ can guide 20 | you through the process. 21 | 22 | .. _pip: https://pip.pypa.io 23 | .. _Python installation guide: http://docs.python-guide.org/en/latest/starting/installation/ 24 | 25 | 26 | From sources 27 | ------------ 28 | 29 | The sources for Socials can be downloaded from the `Github repo`_. 30 | 31 | You can either clone the public repository: 32 | 33 | .. code-block:: console 34 | 35 | $ git clone https://github.com/lorey/socials.git 36 | 37 | Or download the `tarball`_: 38 | 39 | .. code-block:: console 40 | 41 | $ curl -OL https://github.com/lorey/socials/tarball/master 42 | 43 | Once you have a copy of the source, you can install it with: 44 | 45 | .. code-block:: console 46 | 47 | $ python setup.py install 48 | 49 | 50 | .. _Github repo: https://github.com/lorey/socials 51 | .. _tarball: https://github.com/lorey/socials/tarball/master 52 | -------------------------------------------------------------------------------- /docs/make.bat: -------------------------------------------------------------------------------- 1 | @ECHO OFF 2 | 3 | pushd %~dp0 4 | 5 | REM Command file for Sphinx documentation 6 | 7 | if "%SPHINXBUILD%" == "" ( 8 | set SPHINXBUILD=python -msphinx 9 | ) 10 | set SOURCEDIR=. 11 | set BUILDDIR=_build 12 | set SPHINXPROJ=socials 13 | 14 | if "%1" == "" goto help 15 | 16 | %SPHINXBUILD% >NUL 2>NUL 17 | if errorlevel 9009 ( 18 | echo. 19 | echo.The Sphinx module was not found. Make sure you have Sphinx installed, 20 | echo.then set the SPHINXBUILD environment variable to point to the full 21 | echo.path of the 'sphinx-build' executable. Alternatively you may add the 22 | echo.Sphinx directory to PATH. 23 | echo. 
24 | echo.If you don't have Sphinx installed, grab it from 25 | echo.http://sphinx-doc.org/ 26 | exit /b 1 27 | ) 28 | 29 | %SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% 30 | goto end 31 | 32 | :help 33 | %SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% 34 | 35 | :end 36 | popd 37 | -------------------------------------------------------------------------------- /docs/modules.rst: -------------------------------------------------------------------------------- 1 | socials 2 | ======= 3 | 4 | .. toctree:: 5 | :maxdepth: 4 6 | 7 | socials 8 | -------------------------------------------------------------------------------- /docs/readme.rst: -------------------------------------------------------------------------------- 1 | .. include:: ../README.rst 2 | -------------------------------------------------------------------------------- /docs/socials.rst: -------------------------------------------------------------------------------- 1 | socials package 2 | =============== 3 | 4 | Submodules 5 | ---------- 6 | 7 | socials.cli module 8 | ------------------ 9 | 10 | .. automodule:: socials.cli 11 | :members: 12 | :undoc-members: 13 | :show-inheritance: 14 | 15 | socials.socials module 16 | ---------------------- 17 | 18 | .. automodule:: socials.socials 19 | :members: 20 | :undoc-members: 21 | :show-inheritance: 22 | 23 | 24 | Module contents 25 | --------------- 26 | 27 | .. automodule:: socials 28 | :members: 29 | :undoc-members: 30 | :show-inheritance: 31 | -------------------------------------------------------------------------------- /docs/usage.rst: -------------------------------------------------------------------------------- 1 | ===== 2 | Usage 3 | ===== 4 | 5 | To use Socials in a project:: 6 | 7 | import socials 8 | 9 | 10 | Let's assume that you have a list of href attribute values: 11 | 12 | .. 
code-block:: python 13 | 14 | >>> hrefs = ['https://facebook.com/peterparker', 'https://techcrunch.com', 'https://github.com/lorey'] 15 | 16 | You can then extract all matches, i.e. social accounts and email addresses, as follows: 17 | 18 | .. code-block:: python 19 | 20 | >>> socials.extract(hrefs).get_matches_per_platform() 21 | {'github': ['https://github.com/lorey'], 'facebook': ['https://facebook.com/peterparker']} 22 | 23 | Or to extract matches for one specific platform only, e.g. github, you do: 24 | 25 | .. code-block:: python 26 | 27 | >>> socials.extract(hrefs).get_matches_for_platform('github') 28 | ['https://github.com/lorey'] 29 | -------------------------------------------------------------------------------- /requirements_dev.txt: -------------------------------------------------------------------------------- 1 | pip==9.0.1 2 | bumpversion==0.5.3 3 | wheel==0.30.0 4 | watchdog==0.8.3 5 | flake8==3.5.0 6 | tox==2.9.1 7 | coverage==4.5.1 8 | Sphinx==1.7.1 9 | twine==1.10.0 10 | 11 | pytest==3.4.2 12 | pytest-runner==2.11.1 13 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [bumpversion] 2 | current_version = 0.2.0 3 | commit = True 4 | tag = True 5 | 6 | [bumpversion:file:setup.py] 7 | search = version='{current_version}' 8 | replace = version='{new_version}' 9 | 10 | [bumpversion:file:socials/__init__.py] 11 | search = __version__ = '{current_version}' 12 | replace = __version__ = '{new_version}' 13 | 14 | [bdist_wheel] 15 | universal = 1 16 | 17 | [flake8] 18 | exclude = docs 19 | 20 | [aliases] 21 | test = pytest 22 | 23 | [tool:pytest] 24 | collect_ignore = ['setup.py'] 25 | 26 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | """The 
setup script.""" 5 | 6 | from setuptools import setup, find_packages 7 | 8 | with open('README.rst') as readme_file: 9 | readme = readme_file.read() 10 | 11 | with open('HISTORY.rst') as history_file: 12 | history = history_file.read() 13 | 14 | requirements = ['Click>=6.0', ] 15 | 16 | setup_requirements = ['pytest-runner', ] 17 | 18 | test_requirements = ['pytest', ] 19 | 20 | setup( 21 | author="Karl Lorey", 22 | author_email='git@karllorey.com', 23 | classifiers=[ 24 | 'Development Status :: 2 - Pre-Alpha', 25 | 'Intended Audience :: Developers', 26 | 'License :: OSI Approved :: GNU General Public License v3 (GPLv3)', 27 | 'Natural Language :: English', 28 | 'Programming Language :: Python :: 3.4', 29 | 'Programming Language :: Python :: 3.5', 30 | 'Programming Language :: Python :: 3.6', 31 | ], 32 | description="Social Account Detection for Python", 33 | entry_points={ 34 | 'console_scripts': [ 35 | 'socials=socials.cli:main', 36 | ], 37 | }, 38 | install_requires=requirements, 39 | license="GNU General Public License v3", 40 | long_description=readme + '\n\n' + history, 41 | include_package_data=True, 42 | keywords='socials', 43 | name='socials', 44 | packages=find_packages(include=['socials']), 45 | python_requires='>=3.4.0', 46 | setup_requires=setup_requirements, 47 | test_suite='tests', 48 | tests_require=test_requirements, 49 | url='https://github.com/lorey/socials', 50 | version='0.2.0', 51 | zip_safe=False, 52 | ) 53 | -------------------------------------------------------------------------------- /socials/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | """Top-level package for Socials.""" 4 | from socials.socials import Extraction 5 | 6 | __author__ = """Karl Lorey""" 7 | __email__ = 'git@karllorey.com' 8 | __version__ = '0.2.0' 9 | 10 | 11 | def extract(urls): 12 | return Extraction(urls) 13 | 
-------------------------------------------------------------------------------- /socials/cli.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | """Console script for socials.""" 4 | import sys 5 | import click 6 | 7 | 8 | @click.command() 9 | def main(args=None): 10 | """Console script for socials.""" 11 | click.echo("Replace this message by putting your code into " 12 | "socials.cli.main") 13 | click.echo("See click documentation at http://click.pocoo.org/") 14 | return 0 15 | 16 | 17 | if __name__ == "__main__": 18 | sys.exit(main()) # pragma: no cover 19 | -------------------------------------------------------------------------------- /socials/socials.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | """Main module.""" 4 | import re 5 | 6 | PLATFORM_FACEBOOK = 'facebook' 7 | PLATFORM_GITHUB = 'github' 8 | PLATFORM_LINKEDIN = 'linkedin' 9 | PLATFORM_TWITTER = 'twitter' 10 | PLATFORM_INSTAGRAM = 'instagram' 11 | PLATFORM_YOUTUBE = 'youtube' 12 | PLATFORM_EMAIL = 'email' 13 | 14 | FACEBOOK_URL_REGEXS = [ 15 | r'^http(s)?://(www\.)?(facebook|fb)\.com/[A-Za-z0-9_\-\.]+/?$', 16 | r'^http(s)?://(www\.)?(facebook|fb)\.com/profile\.php\?id=\d+$', 17 | ] 18 | 19 | GITHUB_URL_REGEXS = [ 20 | r'^http(s)?://(www\.)?github\.com/[A-Za-z0-9_-]+/?$', 21 | ] 22 | 23 | LINKEDIN_URL_REGEXS = [ 24 | # private 25 | r'^http(s)?://([\w]+\.)?linkedin\.com/in/[A-Za-z0-9_-]+/?$', 26 | r'^http(s)?://([\w]+\.)?linkedin\.com/pub/[A-Za-z0-9_-]+(\/[A-Za-z 0-9]+){3}/?$', 27 | # companies 28 | r'^http(s)?://(www\.)?linkedin\.com/company/[A-Za-z0-9_-]+/?$', 29 | ] 30 | 31 | TWITTER_URL_REGEXS = [ 32 | r'^http(s)?://(.*\.)?twitter\.com\/[A-Za-z0-9_]+/?$', 33 | ] 34 | 35 | INSTAGRAM_URL_REGEXS = [ 36 | r'^http(s)?://(www\.)?instagram\.com/[A-Za-z0-9_.]+/?$', 37 | r'^http(s)?://(www\.)?instagr\.am/[A-Za-z0-9_.]+/?$', 38 | ] 39 | 40 | YOUTUBE_URL_REGEXS = [ 41 |
r'^http(s)?://(www\.)?youtube\.com/user/[A-Za-z0-9_.-]+/?$', 42 | r'^http(s)?://(www\.)?youtube\.com/c/[A-Za-z0-9_.-]+/?$', 43 | r'^http(s)?://(www\.)?youtube\.com/[A-Za-z0-9_.-]+/?$', 44 | ] 45 | 46 | EMAIL_REGEX = r'^(mailto:)?[\w\.-]+@[\w\.-]+$' 47 | 48 | 49 | PATTERNS = { 50 | PLATFORM_FACEBOOK: FACEBOOK_URL_REGEXS, 51 | PLATFORM_TWITTER: TWITTER_URL_REGEXS, 52 | PLATFORM_LINKEDIN: LINKEDIN_URL_REGEXS, 53 | PLATFORM_GITHUB: GITHUB_URL_REGEXS, 54 | PLATFORM_INSTAGRAM: INSTAGRAM_URL_REGEXS, 55 | PLATFORM_YOUTUBE: YOUTUBE_URL_REGEXS, 56 | PLATFORM_EMAIL: [EMAIL_REGEX], 57 | } 58 | 59 | ERROR_MSG_UNKNOWN_PLATFORM = 'Unknown platform, expected one of %s' % ', '.join(PATTERNS) 60 | 61 | 62 | class Extraction(object): 63 | """Extracted profiles.""" 64 | 65 | _hrefs = None 66 | 67 | def __init__(self, hrefs): 68 | self._hrefs = hrefs 69 | 70 | def get_matches_per_platform(self): 71 | """ 72 | Get lists of profiles keyed by platform name. 73 | 74 | :return: a dictionary with the platform as a key, 75 | and a list of the platform's profiles as values. 76 | """ 77 | return extract_matches_per_platform(self._hrefs) 78 | 79 | def get_matches_for_platform(self, platform): 80 | """ 81 | Find all matches for a specific platform. 82 | 83 | :param platform: platform to search for. 84 | :return: list of matches. 85 | """ 86 | return extract_matches_for_platform(platform, self._hrefs) 87 | 88 | 89 | def extract_matches_per_platform(hrefs): 90 | """ 91 | Get lists of profiles keyed by platform name. 92 | 93 | :param hrefs: hrefs to parse. 94 | :return: a dictionary with the platform as a key, 95 | and a list of the platform's profiles as values.
96 | """ 97 | matches = {} 98 | for platform in PATTERNS.keys(): 99 | platform_matches = extract_matches_for_platform(platform, hrefs) 100 | matches[platform] = platform_matches 101 | return matches 102 | 103 | 104 | def extract_matches_for_platform(platform, hrefs): 105 | matches = [] 106 | for href in hrefs: 107 | if platform == get_platform(href): 108 | result = _clean_href(href, platform) 109 | matches.append(result) 110 | return matches 111 | 112 | 113 | def _clean_href(href, platform): 114 | """Cleans a href for a specific platform.""" 115 | result = href 116 | cleaner = get_cleaner(platform) 117 | if cleaner: 118 | result = cleaner(href) 119 | return result 120 | 121 | 122 | def get_platform(href): 123 | for platform in PATTERNS: 124 | is_match = is_platform(href, platform) 125 | if is_match: 126 | return platform 127 | return None 128 | 129 | 130 | def is_platform(href, platform): 131 | if platform not in PATTERNS: 132 | raise RuntimeError(ERROR_MSG_UNKNOWN_PLATFORM) 133 | return any(re.match(p, href) for p in PATTERNS[platform]) 134 | 135 | 136 | def clean_mailto(href): 137 | return href.replace('mailto:', '') 138 | 139 | 140 | def get_cleaner(platform): 141 | cleaners = { 142 | PLATFORM_EMAIL: clean_mailto, 143 | } 144 | 145 | if platform not in cleaners: 146 | return None 147 | return cleaners[platform] 148 | -------------------------------------------------------------------------------- /tests/test_socials.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | """Tests for `socials` package.""" 5 | 6 | import pytest 7 | 8 | from click.testing import CliRunner 9 | 10 | import socials 11 | from socials import cli 12 | 13 | 14 | @pytest.fixture 15 | def response(): 16 | """Sample pytest fixture. 
17 | 18 | See more at: http://doc.pytest.org/en/latest/fixture.html 19 | """ 20 | # import requests 21 | # return requests.get('https://github.com/audreyr/cookiecutter-pypackage') 22 | 23 | 24 | def test_content(response): 25 | """Sample pytest test function with the pytest fixture as an argument.""" 26 | # from bs4 import BeautifulSoup 27 | # assert 'GitHub' in BeautifulSoup(response.content).title.string 28 | 29 | 30 | def test_command_line_interface(): 31 | """Test the CLI.""" 32 | runner = CliRunner() 33 | result = runner.invoke(cli.main) 34 | assert result.exit_code == 0 35 | assert 'socials.cli.main' in result.output 36 | help_result = runner.invoke(cli.main, ['--help']) 37 | assert help_result.exit_code == 0 38 | assert '--help Show this message and exit.' in help_result.output 39 | 40 | 41 | def test_extract(): 42 | """Test the extract method.""" 43 | urls = [ 44 | 'http://google.de', 45 | 'http://facebook.com', 46 | 'http://facebook.com/peterparker', 47 | 'http://facebook.com/peter[parker', # Invalid character 48 | 'https://www.facebook.com/profile.php?id=4', 49 | 'mailto:bill@microsoft.com', 50 | 'steve@microsoft.com', 51 | 'https://www.linkedin.com/company/google/', 52 | 'https://www.linkedin.com/comp^any/google/', # Invalid character 53 | 'http://www.twitter.com/Some_Company/', 54 | 'http://www.twitter.com/Some_\\Company', # Invalid character 55 | 'https://www.instagram.com/instagram/', 56 | 'https://www.instagram.com/instag-ram/', # Invalid character 57 | 'http://instagr.am/instagram', 58 | 'http://youtube.com/this/is/too/long', 59 | 'http://www.youtube.com/user/Some_1', 60 | 'http://youtube.com/c/your-custom-name', 61 | 'http://youtube.com/your.custom.name', 62 | ] 63 | extraction = socials.extract(urls) 64 | matches = extraction.get_matches_per_platform() 65 | assert 'facebook' in matches 66 | assert len(matches['facebook']) == 2 67 | assert 'http://facebook.com/peterparker' in matches['facebook'] 68 | assert 
'https://www.facebook.com/profile.php?id=4' in matches['facebook'] 69 | 70 | assert 'email' in matches 71 | assert len(matches['email']) == 2 72 | assert 'bill@microsoft.com' in matches['email'] 73 | assert 'steve@microsoft.com' in matches['email'] 74 | 75 | assert 'linkedin' in matches 76 | assert len(matches['linkedin']) == 1 77 | assert 'https://www.linkedin.com/company/google/' in matches['linkedin'] 78 | 79 | assert 'twitter' in matches 80 | assert len(matches['twitter']) == 1 81 | assert 'http://www.twitter.com/Some_Company/' in matches['twitter'] 82 | 83 | assert 'instagram' in matches 84 | assert len(matches['instagram']) == 2 85 | assert 'https://www.instagram.com/instagram/' in matches['instagram'] 86 | assert 'http://instagr.am/instagram' in matches['instagram'] 87 | 88 | assert 'youtube' in matches 89 | assert len(matches['youtube']) == 3 90 | assert 'http://www.youtube.com/user/Some_1' in matches['youtube'] 91 | assert 'http://youtube.com/c/your-custom-name' in matches['youtube'] 92 | assert 'http://youtube.com/your.custom.name' in matches['youtube'] 93 | -------------------------------------------------------------------------------- /tox.ini: -------------------------------------------------------------------------------- 1 | [tox] 2 | envlist = py34, py35, py36, flake8 3 | 4 | [travis] 5 | python = 6 | 3.6: py36 7 | 3.5: py35 8 | 3.4: py34 9 | 10 | [testenv:flake8] 11 | basepython = python 12 | deps = flake8 13 | commands = flake8 socials 14 | 15 | [testenv] 16 | setenv = 17 | PYTHONPATH = {toxinidir} 18 | deps = 19 | -r{toxinidir}/requirements_dev.txt 20 | ; If you want to make tox run the tests with the same versions, create a 21 | ; requirements.txt with the pinned versions and uncomment the following line: 22 | ; -r{toxinidir}/requirements.txt 23 | commands = 24 | pip install -U pip 25 | py.test --basetemp={envtmpdir} 26 | 27 | 28 | --------------------------------------------------------------------------------
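The matching approach in `socials/socials.py` can be sketched in a few self-contained lines. This is a minimal illustration, not the package's code: the pattern table is a trimmed-down, hypothetical subset, `matches_per_platform` is an assumed helper name, and it groups hrefs in a single pass rather than once per platform. It shows two things: every known platform appears in the result, even with no matches, and a character range such as `A-z` is looser than `A-Za-z` because it also covers the punctuation between `Z` and `a`.

```python
import re

# Hypothetical, trimmed-down pattern table for illustration only;
# the full table lives in socials/socials.py.
PATTERNS = {
    'github': [r'^http(s)?://(www\.)?github\.com/[A-Za-z0-9_-]+/?$'],
    'youtube': [r'^http(s)?://(www\.)?youtube\.com/user/[A-Za-z0-9_.-]+/?$'],
    'email': [r'^(mailto:)?[\w\.-]+@[\w\.-]+$'],
}


def get_platform(href):
    """Return the first platform whose patterns match href, else None."""
    for platform, patterns in PATTERNS.items():
        if any(re.match(p, href) for p in patterns):
            return platform
    return None


def matches_per_platform(hrefs):
    """Group hrefs by platform; platforms without matches keep an empty list."""
    result = {platform: [] for platform in PATTERNS}
    for href in hrefs:
        platform = get_platform(href)
        if platform is None:
            continue
        if platform == 'email':
            # Mirror the package's mailto cleaner: strip the scheme prefix.
            href = href.replace('mailto:', '')
        result[platform].append(href)
    return result


hrefs = ['https://github.com/lorey', 'mailto:bill@microsoft.com', 'https://techcrunch.com']
grouped = matches_per_platform(hrefs)

# A-z spans ASCII 65-122, so it also accepts [ \ ] ^ _ and the backtick.
loose = re.match(r'^[A-z]+$', 'a[b') is not None
strict = re.match(r'^[A-Za-z]+$', 'a[b') is not None
```

Here `grouped` contains a key for every platform in the table, with `'youtube'` mapped to an empty list, and only the `A-z` variant accepts `'a[b'`.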