├── .gitignore ├── .travis.yml ├── AUTHORS.rst ├── CONTRIBUTING.rst ├── HISTORY.rst ├── LICENSE ├── MANIFEST.in ├── Makefile ├── README.rst ├── TODO.txt ├── requirements-python-2.txt ├── requirements-python-3.txt ├── searchcmd ├── __init__.py ├── cache.py ├── cmdextract.py ├── commands.py ├── download.py └── search_engines.py ├── setup.cfg ├── setup.py ├── tests ├── __init__.py ├── test_cache.py ├── test_cmdextract.py ├── test_commands.py ├── test_download.py ├── test_search_engines.py ├── test_searchcmd.py ├── testdata │ ├── cmdextract │ │ ├── brunolinux.com │ │ ├── cyberciti.biz │ │ ├── stackoverflow.com │ │ └── unixmantra.com │ └── search_engines │ │ ├── bing.com │ │ └── google.com └── testutils.py └── tox.ini /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | *[~#] 3 | .cache 4 | .eggs 5 | searchcmd.egg-info 6 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | language: python 2 | 3 | python: 4 | - 2.7 5 | - 3.4 6 | 7 | before_install: 8 | - sudo apt-get update -qq 9 | - sudo apt-get install -y libxml2-dev libxslt-dev 10 | 11 | script: "python setup.py test" 12 | -------------------------------------------------------------------------------- /AUTHORS.rst: -------------------------------------------------------------------------------- 1 | ======= 2 | Credits 3 | ======= 4 | 5 | Development Lead 6 | ---------------- 7 | 8 | * Jimmy Petersson 9 | 10 | Contributors 11 | ------------ 12 | 13 | None yet. Why not be the first? 14 | -------------------------------------------------------------------------------- /CONTRIBUTING.rst: -------------------------------------------------------------------------------- 1 | ============ 2 | Contributing 3 | ============ 4 | 5 | Contributions are welcome, and they are greatly appreciated! 
Every 6 | little bit helps, and credit will always be given. 7 | 8 | You can contribute in many ways: 9 | 10 | Types of Contributions 11 | ---------------------- 12 | 13 | Report Bugs 14 | ~~~~~~~~~~~ 15 | 16 | Report bugs at https://github.com/jimmyppi/searchcmd/issues. 17 | 18 | If you are reporting a bug, please include: 19 | 20 | * Your operating system name and version. 21 | * Any details about your local setup that might be helpful in troubleshooting. 22 | * Detailed steps to reproduce the bug. 23 | 24 | Fix Bugs 25 | ~~~~~~~~ 26 | 27 | Look through the GitHub issues for bugs. Anything tagged with "bug" 28 | is open to whoever wants to implement it. 29 | 30 | Implement Features 31 | ~~~~~~~~~~~~~~~~~~ 32 | 33 | Look through the GitHub issues for features. Anything tagged with "feature" 34 | is open to whoever wants to implement it. 35 | 36 | Write Documentation 37 | ~~~~~~~~~~~~~~~~~~~ 38 | 39 | searchcmd could always use more documentation, whether as part of the 40 | official searchcmd docs, in docstrings, or even on the web in blog posts, 41 | articles, and such. 42 | 43 | Submit Feedback 44 | ~~~~~~~~~~~~~~~ 45 | 46 | The best way to send feedback is to file an issue at https://github.com/jimmyppi/searchcmd/issues. 47 | 48 | If you are proposing a feature: 49 | 50 | * Explain in detail how it would work. 51 | * Keep the scope as narrow as possible, to make it easier to implement. 52 | * Remember that this is a volunteer-driven project, and that contributions 53 | are welcome :) 54 | 55 | Get Started! 56 | ------------ 57 | 58 | Ready to contribute? Here's how to set up `searchcmd` for local development. 59 | 60 | 1. Fork the `searchcmd` repo on GitHub. 61 | 2. Clone your fork locally:: 62 | 63 | $ git clone git@github.com:your_name_here/searchcmd.git 64 | 65 | 3. Install your local copy into a virtualenv. 
Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:: 66 | 67 | $ mkvirtualenv searchcmd 68 | $ cd searchcmd/ 69 | $ python setup.py develop 70 | 71 | 4. Create a branch for local development:: 72 | 73 | $ git checkout -b name-of-your-bugfix-or-feature 74 | 75 | Now you can make your changes locally. 76 | 77 | 5. When you're done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:: 78 | 79 | $ flake8 searchcmd tests 80 | $ python setup.py test 81 | $ tox 82 | 83 | To get flake8 and tox, just pip install them into your virtualenv. 84 | 85 | 6. Commit your changes and push your branch to GitHub:: 86 | 87 | $ git add . 88 | $ git commit -m "Your detailed description of your changes." 89 | $ git push origin name-of-your-bugfix-or-feature 90 | 91 | 7. Submit a pull request through the GitHub website. 92 | 93 | Pull Request Guidelines 94 | ----------------------- 95 | 96 | Before you submit a pull request, check that it meets these guidelines: 97 | 98 | 1. The pull request should include tests. 99 | 2. If the pull request adds functionality, the docs should be updated. Put 100 | your new functionality into a function with a docstring, and add the 101 | feature to the list in README.rst. 102 | 3. The pull request should work for Python 2.6, 2.7, 3.3, and 3.4, and for PyPy. Check 103 | https://travis-ci.org/jimmyppi/searchcmd/pull_requests 104 | and make sure that the tests pass for all supported Python versions. 105 | 106 | Tips 107 | ---- 108 | 109 | To run a subset of tests:: 110 | 111 | $ python -m unittest tests.test_searchcmd 112 | -------------------------------------------------------------------------------- /HISTORY.rst: -------------------------------------------------------------------------------- 1 | .. :changelog: 2 | 3 | History 4 | ------- 5 | 6 | 0.1.0 (2015-06-01) 7 | --------------------- 8 | 9 | * First release on PyPI. 
10 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2015, Jimmy Petersson 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 5 | 6 | * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 7 | 8 | * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 9 | 10 | * Neither the name of searchcmd nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. 11 | 12 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include AUTHORS.rst 2 | include CONTRIBUTING.rst 3 | include HISTORY.rst 4 | include LICENSE 5 | include README.rst 6 | 7 | recursive-include tests * 8 | recursive-exclude * __pycache__ 9 | recursive-exclude * *.py[co] 10 | 11 | recursive-include docs *.rst conf.py Makefile make.bat 12 | 13 | include requirements-python-2.txt 14 | include requirements-python-3.txt 15 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | .PHONY: clean-pyc clean-build docs clean 2 | 3 | help: 4 | @echo "clean - remove all build, test, coverage and Python artifacts" 5 | @echo "clean-build - remove build artifacts" 6 | @echo "clean-pyc - remove Python file artifacts" 7 | @echo "clean-test - remove test and coverage artifacts" 8 | @echo "lint - check style with flake8" 9 | @echo "test - run tests quickly with the default Python" 10 | @echo "test-all - run tests on every Python version with tox" 11 | @echo "coverage - check code coverage quickly with the default Python" 12 | @echo "docs - generate Sphinx HTML documentation, including API docs" 13 | @echo "release - package and upload a release" 14 | @echo "dist - package" 15 | @echo "install - install the package to the active Python's site-packages" 16 | 17 | clean: clean-build clean-pyc clean-test 18 | 19 | clean-build: 20 | rm -fr build/ 21 | rm -fr dist/ 22 | rm -fr .eggs/ 23 | find . -name '*.egg-info' -exec rm -fr {} + 24 | find . -name '*.egg' -exec rm -f {} + 25 | 26 | clean-pyc: 27 | find . -name '*.pyc' -exec rm -f {} + 28 | find . -name '*.pyo' -exec rm -f {} + 29 | find . -name '*~' -exec rm -f {} + 30 | find . 
-name '__pycache__' -exec rm -fr {} + 31 | 32 | clean-test: 33 | rm -fr .tox/ 34 | rm -f .coverage 35 | rm -fr htmlcov/ 36 | 37 | lint: 38 | flake8 searchcmd tests 39 | 40 | test: 41 | python setup.py test 42 | 43 | test-all: 44 | tox 45 | 46 | coverage: 47 | coverage run --source searchcmd setup.py test 48 | coverage report -m 49 | coverage html 50 | open htmlcov/index.html 51 | 52 | docs: 53 | rm -f docs/searchcmd.rst 54 | rm -f docs/modules.rst 55 | sphinx-apidoc -o docs/ searchcmd 56 | $(MAKE) -C docs clean 57 | $(MAKE) -C docs html 58 | open docs/_build/html/index.html 59 | 60 | release: clean 61 | python setup.py sdist upload 62 | python setup.py bdist_wheel upload 63 | 64 | dist: clean 65 | python setup.py sdist 66 | python setup.py bdist_wheel 67 | ls -l dist 68 | 69 | install: clean 70 | python setup.py install 71 | -------------------------------------------------------------------------------- /README.rst: -------------------------------------------------------------------------------- 1 | =============================== 2 | searchcmd 3 | =============================== 4 | 5 | .. image:: https://img.shields.io/travis/jimmyppi/searchcmd.svg 6 | :target: https://travis-ci.org/jimmyppi/searchcmd 7 | 8 | .. image:: https://img.shields.io/pypi/v/searchcmd.svg 9 | :target: https://pypi.python.org/pypi/searchcmd 10 | 11 | * Free software: BSD license 12 | 13 | Get help from your friends on the internets without leaving your best friend the cli. 14 | 15 | Motivation 16 | ---------- 17 | 18 | Many commands have really many flags and the man pages often lack examples. It is usually faster to go to your browser and search for what you want to do. 
19 | 20 | Installation 21 | ------------ 22 | 23 | :: 24 | 25 | sudo apt-get install libxml2-dev libxslt-dev python-dev 26 | pip install searchcmd 27 | 28 | Examples 29 | -------- 30 | 31 | :: 32 | 33 | # searchcmd git commit "change last commit message" 34 | git commit (git-scm.com, kernel.org) 35 | git commit --amend (help.github.com) 36 | git commit –amend -m ‘new message’ (makandracards.com) 37 | git commit --amend --no-edit (kernel.org) 38 | git commit -c ORIG_HEAD (kernel.org) 39 | 40 | :: 41 | 42 | # searchcmd find "sort files by size" 43 | find . -type f -printf "%s\t%p\n" | sort -n (unix.stackexchange.com) 44 | find . -type f | xargs du -h | sort -rn (unix.stackexchange.com) 45 | find . -type f -print0 | xargs -0 ls -la | awk '{print int($5/1000) " KB\t" $9}' | sort -n -r -k1 (unix.stackexchange.com) 46 | find . -type f -ls | sort -r -n -k7 (unix.stackexchange.com) 47 | find . -type f -ls -printf '\0' | sort -zk7rn | tr -d '\0' (unix.stackexchange.com) 48 | 49 | Manual 50 | ------ 51 | 52 | :: 53 | 54 | usage: searchcmd [-h] [-v] [--no-cache] [--engine {bing,google}] [-n MAX_HITS] 55 | [--max-download MAX_DOWNLOAD] 56 | query [query ...] 57 | 58 | positional arguments: 59 | query Type a command and/or describe what you want to do in 60 | quotes. 61 | 62 | optional arguments: 63 | -h, --help show this help message and exit 64 | -v, --verbose Include source url in output. 65 | --no-cache Skip cache, always do a new search. 66 | --engine {bing,google} 67 | The search engine to use. 68 | -n MAX_HITS, --max-hits MAX_HITS 69 | Max number of commands to show. 70 | --max-download MAX_DOWNLOAD 71 | Download max this number of search hits. 
72 | 73 | Examples: 74 | searchcmd git commit "change last commit message" 75 | searchcmd find directory 76 | searchcmd "search replace" 77 | 78 | Similar projects 79 | ---------------- 80 | 81 | * Useful examples at the command line: https://github.com/srsudar/eg 82 | * Search commandlinefu.com from the terminal: https://github.com/ncrocfer/clf 83 | 84 | Todo 85 | ---- 86 | 87 | * Support for recognizing more advanced prompts. Example: ``um@server#find . -name "*sh*"`` 88 | * Merge commands that do the same thing. 89 | * Support for beautifulsoup in py3. 90 | * An open-ended search (for example "search replace") will only find commands that are installed on the system. Better filtering of false positives is needed to allow unknown commands. A solution could be to train a probabilistic parser like https://github.com/datamade/parserator 91 | -------------------------------------------------------------------------------- /TODO.txt: -------------------------------------------------------------------------------- 1 | TODO 2 | ==== 3 | * Support for more advanced prompt. Example: um@server#find . -name "*sh*" 4 | * Many false positives. Example: "or space)" 5 | * Should be possible to merge commands. All with same name and same flags 6 | should at least be related. 7 | * Support for beautifulsoup in py3. 8 | 9 | DONE 10 | ==== 11 | * cli (-v --verbose -e --engine --no-cache) 12 | * Prettier output, color output? 13 | * cli flags have been reversed 14 | * cache does not work to load from 15 | * Sometimes you do not want to split on '\n' ("search replace") 16 | Sometimes you do ("get process id", docker "remove stopped containers"). 17 | In those two examples, the output of the command is included in a code-tag. 18 | tar "unpack", listing of examples in one code block. But that was in a 19 | pre-tag, which should always be split on \n.
20 | * Handling of commands that start with sudo 21 | * Print download progress (one dot per downloaded search result?), 22 | x when error 23 | * Implement download.get 24 | * Support for using beautifulsoup if lxml fails 25 | * cache (store in tmp), store as json? to_json/from_json for commands- 26 | * Test example with unicode: date "set time", cyberciti.biz 27 | * Split command on pipe (|). Example: Want xargs examples, but xargs is mostly 28 | invoked by piping other result to it: find ... | xargs ... 29 | * Tests (py2 + py3) 30 | * Package 31 | -------------------------------------------------------------------------------- /requirements-python-2.txt: -------------------------------------------------------------------------------- 1 | requests-futures 2 | lxml 3 | cssselect 4 | beautifulsoup 5 | pygments 6 | tld 7 | -------------------------------------------------------------------------------- /requirements-python-3.txt: -------------------------------------------------------------------------------- 1 | requests-futures 2 | lxml 3 | cssselect 4 | pygments 5 | tld 6 | -------------------------------------------------------------------------------- /searchcmd/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | """ 3 | Main entry point of the application. 4 | """ 5 | import sys 6 | import argparse 7 | 8 | from searchcmd.search_engines import get_engine, ENGINES 9 | from searchcmd.cmdextract import extract_commands 10 | from searchcmd.download import get, iter_get, Request, DownloadError 11 | from searchcmd import cache 12 | 13 | EXAMPLES = """Examples: 14 | searchcmd git commit "change last commit message" 15 | searchcmd find directory 16 | searchcmd "search replace" 17 | """ 18 | 19 | 20 | def get_print_func(io): 21 | """This is a workaround to get mocking of stdout to work with 22 | both py2 and py3.
23 | """ 24 | if sys.version_info[0] == 2: 25 | return io.write 26 | else: 27 | return io.writelines 28 | 29 | stdout = get_print_func(sys.stdout) 30 | stderr = get_print_func(sys.stderr) 31 | 32 | 33 | class SearchError(Exception): 34 | pass 35 | 36 | 37 | def get_arg_parser(): 38 | parser = argparse.ArgumentParser( 39 | epilog=EXAMPLES, formatter_class=argparse.RawDescriptionHelpFormatter) 40 | parser.add_argument( 41 | 'query', nargs='+', 42 | help="Type a command and/or describe what you want to do in quotes.") 43 | parser.add_argument('-v', '--verbose', action='store_true', 44 | help='Include source url in output.') 45 | parser.add_argument('--no-cache', action='store_true', 46 | help='Skip cache, always do a new search.') 47 | parser.add_argument('--engine', help='The search engine to use.', 48 | default='google', choices=ENGINES.keys()) 49 | parser.add_argument('-n', '--max-hits', default=5, type=int, 50 | help='Max number of commands to show.') 51 | parser.add_argument('--max-download', default=5, type=int, 52 | help='Download max this number of search hits.') 53 | return parser 54 | 55 | 56 | def main(args=None): 57 | """ 58 | Args: 59 | args (list): Command line arguments. 60 | Returns: 61 | int: exit code. 
62 | """ 63 | parser = get_arg_parser() 64 | args = parser.parse_args(args) 65 | 66 | query_string, cmd = get_query_string(args.query) 67 | 68 | search_args = {'query_string': query_string, 69 | 'cmd': cmd, 70 | 'search_engine': args.engine, 71 | 'max_download': args.max_download} 72 | commands = None 73 | if not args.no_cache: 74 | commands = cache.get(**search_args) 75 | if commands is None: 76 | try: 77 | commands = search(**search_args) 78 | except SearchError as e: 79 | stderr(str(e) + u'\n') 80 | return 1 81 | cache.store(commands, **search_args) 82 | for cmd in commands.rank_commands(nr=args.max_hits): 83 | stdout(cmd.echo(verbose=args.verbose) + u'\n') 84 | return 0 85 | 86 | 87 | def parse_query(orig_query): 88 | """Divide query into command name and search query. 89 | 90 | Args: 91 | orig_query ([str]): The original query. 92 | Returns: 93 | (str, str): Command name and search query. 94 | 95 | Some assumptions are made about the original query: 96 | 97 | * If only one element in the orig query and the element contains only one 98 | word, assume that it is a command and we want to see general examples 99 | of how to use this command. 100 | 101 | Example: 102 | parse_query(['find']) == ('find', 'examples') 103 | 104 | * If only one element in the orig query and the element contains more 105 | than one word, assume that it is a description of what we want to do 106 | and the exact command is unknown. 107 | 108 | Example: 109 | parse_query(['search replace']) == (None, 'search replace') 110 | 111 | * If more than one element in query, assume that the last element is 112 | a description of what we want to do and the elements before define the 113 | command. 
114 | 115 | Example: 116 | parse_query(['git', 'commit', 'change last commit message']) == ( 117 | 'git commit', 'change last commit message') 118 | """ 119 | if len(orig_query) == 1: 120 | orig_query = orig_query[0] 121 | if len(orig_query.split()) == 1: 122 | return orig_query, 'examples' 123 | else: 124 | return None, orig_query 125 | else: 126 | return ' '.join(orig_query[:-1]), orig_query[-1] 127 | 128 | 129 | def get_query_string(orig_query): 130 | if not orig_query: 131 | raise ValueError('No query provided') 132 | cmd, query = parse_query(orig_query) 133 | return ' '.join([part for part in ['linux', cmd, query] if part]), cmd 134 | 135 | 136 | def search(query_string=None, cmd=None, search_engine='google', 137 | max_download=5): 138 | engine = get_engine(search_engine) 139 | search_req = engine.get_search_request(query_string) 140 | search_result = get(search_req) 141 | if isinstance(search_result, DownloadError): 142 | raise SearchError('Failed search on {} ({})'.format( 143 | search_engine, search_result.status_code)) 144 | urls = engine.get_hits(search_result) 145 | docs = iter_get([Request(u.url) for u in urls[:max_download]]) 146 | 147 | return extract_commands(docs, base_commands=cmd) 148 | 149 | 150 | if __name__ == '__main__': 151 | sys.exit(main()) 152 | -------------------------------------------------------------------------------- /searchcmd/cache.py: -------------------------------------------------------------------------------- 1 | """ 2 | Simple caching of objects to file in /tmp. 3 | 4 | Usage: 5 | 6 | >>> store(obj, **{identifiers}) 7 | >>> obj = get(**{identifiers}) 8 | 9 | The object obj must only contain json serializable built-ins and 10 | the types in the dict TYPES. 11 | """ 12 | 13 | import os 14 | import hashlib 15 | import json 16 | import errno 17 | 18 | from searchcmd.commands import Commands, Command 19 | from searchcmd.download import HtmlDocument, Url 20 | 21 | # Supported custom types. 
They must implement from_dict and to_dict 22 | TYPES = {cls.__name__: cls for cls in [ 23 | Commands, Command, HtmlDocument, Url]} 24 | 25 | CACHE_DIR = os.path.sep.join(['', 'tmp', 'searchcmd']) 26 | try: 27 | os.makedirs(CACHE_DIR) 28 | except OSError: 29 | pass 30 | 31 | 32 | def get(**args): 33 | try: 34 | with open(get_file_name(**args)) as inp: 35 | return json.load(inp, cls=CustomTypeDecoder) 36 | except IOError as e: 37 | if e.errno == errno.ENOENT: 38 | return None 39 | raise 40 | 41 | 42 | def store(commands, **args): 43 | with open(get_file_name(**args), 'w') as out: 44 | return json.dump(commands, out, cls=CustomTypeEncoder) 45 | 46 | 47 | def get_file_name(**args): 48 | return os.path.join( 49 | CACHE_DIR, hashlib.sha1( 50 | repr(sorted(args.items())).encode('utf-8')).hexdigest()) 51 | 52 | 53 | class CustomTypeEncoder(json.JSONEncoder): 54 | """A custom JSONEncoder class that knows how to encode core custom 55 | objects. 56 | 57 | Inspired by: http://stackoverflow.com/a/2343640/2874515 58 | 59 | Custom objects are encoded as JSON object literals (ie, dicts) with 60 | one key, '__TypeName__' where 'TypeName' is the actual name of the 61 | type to which the object belongs. 
That single key maps to another 62 | object literal which is just the __dict__ of the object encoded.""" 63 | 64 | def default(self, obj): 65 | if isinstance(obj, tuple(TYPES.values())): 66 | key = '__%s__' % obj.__class__.__name__ 67 | return {key: obj.to_dict()} 68 | return json.JSONEncoder.default(self, obj) 69 | 70 | 71 | class CustomTypeDecoder(json.JSONDecoder): 72 | 73 | def decode(self, s): 74 | 75 | def load_objects(obj): 76 | if isinstance(obj, list): 77 | return [load_objects(e) for e in obj] 78 | elif isinstance(obj, dict): 79 | if len(obj) == 1: 80 | type_name, value = list(obj.items())[0] 81 | type_name = type_name.strip('_') 82 | if type_name in TYPES: 83 | return TYPES[type_name].from_dict(load_objects(value)) 84 | return {key: load_objects(value) 85 | for key, value in obj.items()} 86 | else: 87 | return obj 88 | 89 | return load_objects(json.JSONDecoder.decode(self, s)) 90 | -------------------------------------------------------------------------------- /searchcmd/cmdextract.py: -------------------------------------------------------------------------------- 1 | import re 2 | import subprocess 3 | from collections import defaultdict 4 | 5 | from lxml import etree 6 | 7 | from searchcmd.commands import Command, Commands 8 | from searchcmd.download import HtmlDocument 9 | 10 | 11 | def list_available_commands(): 12 | """Return list of available commands on this computer.""" 13 | return [cmd for cmd in subprocess.check_output( 14 | 'compgen -abc', shell=True, executable='/bin/bash').decode().split() 15 | if cmd and cmd[0].isalnum()] 16 | ALL_COMMANDS = list_available_commands() 17 | 18 | 19 | def extract_commands(html_docs, base_commands=None): 20 | """Extract all commands in the html documents. 21 | 22 | Args: 23 | html_docs (HtmlDocument or iterable of docs): The html documents. 24 | base_commands (str or iterable of str): If provided, limit the results 25 | to these commands. 
26 | Returns: 27 | commands (Commands): Collection of found command examples. 28 | """ 29 | if isinstance(html_docs, HtmlDocument): 30 | html_docs = [html_docs] 31 | extractor = CommandExtractor(base_commands) 32 | commands = Commands() 33 | for doc in html_docs: 34 | seen = set() 35 | try: 36 | nr_cmds = 0 37 | for line_nr, cmd in extractor.iter_commands(doc): 38 | if cmd in seen: 39 | continue 40 | seen.add(cmd) 41 | commands.add_command(Command(cmd, line_nr, nr_cmds, doc)) 42 | nr_cmds += 1 43 | commands.nr_docs += 1 44 | except Exception: 45 | continue 46 | 47 | if base_commands: 48 | return commands 49 | 50 | # Only keep command names with more than one occurrence 51 | commands_by_name = defaultdict(list) 52 | for command in commands: 53 | commands_by_name[command.name].append(command) 54 | keep = {} 55 | for coms in commands_by_name.values(): 56 | if len(coms) > 1 or len(coms[0].lines) > 1: 57 | for com in coms: 58 | keep[com.cmd] = com 59 | return Commands(keep, commands.nr_docs) 60 | 61 | 62 | class CommandExtractor(object): 63 | """Extract commands from an html document. 64 | 65 | Usage: 66 | 67 | >>> extractor = CommandExtractor('git') 68 | >>> for line_nr, cmd in extractor.iter_commands(html_doc): 69 | >>> ... 70 | 71 | """ 72 | IGNORE_TAGS = set(['style', 'head', 'script', etree.Comment]) 73 | COMMAND_REX = re.compile(( 74 | # The command could start with a prompt 75 | r'^[\$\>\:\#\%%]*\s*' 76 | # The command could start with sudo 77 | r'(?P<cmd>(sudo\s+|)' 78 | # Command must start with a lower case letter or a digit. 79 | # A more detailed check of the command will be done in is_command. 80 | r'[a-z0-9][a-z0-9\-]*' 81 | # The rest of the command that contains sub command name, 82 | # options and arguments.
83 | r'\s.+$)')) 84 | 85 | RE_COMMAND_NAME = re.compile(( 86 | r'^(' 87 | # Must be a letter if the name contains only one character 88 | r'[a-z]|' 89 | # Must start and end with letter or digit, but can contain '-' 90 | r'[a-z0-9][a-z0-9\-]*[a-z0-9]' 91 | r')$')) 92 | 93 | BASE_CMDS_REX = r'^(%s)\b' 94 | RE_ALL_COMMANDS = re.compile(BASE_CMDS_REX % '|'.join( 95 | map(re.escape, ALL_COMMANDS))) 96 | 97 | RE_SPACE = re.compile(r'\s+', re.U | re.S) 98 | RE_ONLY_LETTERS = re.compile('^[a-z]+$', re.I) 99 | RE_SENTENCE_END = re.compile(r'[a-z][\.\!\?\:]+$') 100 | RE_FLAG = re.compile(r'^(-){1,2}[a-z0-9]') 101 | RE_SUDO = re.compile(r'^sudo\s+') 102 | 103 | MAX_COMMAND_LENGTH = 200 104 | # The command must not contain more than this many consecutive 105 | # words that only contain letters. 106 | MAX_CONSECUTIVELY_LETTER_WORDS = 2 107 | 108 | def __init__(self, base_commands=None): 109 | if isinstance(base_commands, str): 110 | base_commands = [base_commands] 111 | if base_commands: 112 | base_commands = set(base_commands) 113 | if base_commands and 'sudo' in base_commands: 114 | self.remove_sudo = lambda cmd: cmd 115 | else: 116 | self.remove_sudo = lambda cmd: self.RE_SUDO.sub('', cmd) 117 | self.re_wanted_commands = self._get_wanted_rex(base_commands) 118 | 119 | def _get_wanted_rex(self, base_commands): 120 | if not base_commands: 121 | return self.RE_ALL_COMMANDS 122 | else: 123 | return re.compile(self.BASE_CMDS_REX % '|'.join( 124 | map(re.escape, base_commands))) 125 | 126 | def iter_commands(self, html_doc): 127 | """Generate commands found in the html document.""" 128 | for line, txt in self.iter_text_lines(html_doc): 129 | cmd = self.get_command(txt) 130 | if cmd: 131 | yield line, cmd 132 | 133 | def get_command(self, txt): 134 | """Return command found in text or None if no command was found.""" 135 | m = self.COMMAND_REX.search(txt) 136 | if not m: 137 | return 138 | cmd = m.group('cmd') 139 | cmd = self.remove_sudo(cmd) 140 | if not self.is_command(cmd): 141 |
return 142 | if not self.has_wanted_command(cmd): 143 | return 144 | return cmd 145 | 146 | def has_wanted_command(self, line): 147 | """Return True if the line contains any of the wanted commands.""" 148 | cmds = line.split(' | ') 149 | for cmd in cmds: 150 | cmd = cmd.strip() 151 | if self.re_wanted_commands.search(cmd): 152 | return True 153 | return False 154 | 155 | def is_command_name(self, word): 156 | """Return True if word is a valid command name.""" 157 | if not word: 158 | return False 159 | if word.isdigit(): 160 | return False 161 | return True if self.RE_COMMAND_NAME.match(word) else False 162 | 163 | def is_command_output(self, line): 164 | """Return True if the line looks like output from a command.""" 165 | if '\t' in line or ' ' in line: 166 | return True 167 | return False 168 | 169 | def is_command(self, candidate): 170 | """Return True if the candidate looks like a command.""" 171 | if not candidate: 172 | return False 173 | if len(candidate) > self.MAX_COMMAND_LENGTH: 174 | return False 175 | if self.RE_SENTENCE_END.search(candidate): 176 | return False 177 | if self.is_command_output(candidate): 178 | return False 179 | words = candidate.split() 180 | if not self.is_command_name(words[0]): 181 | return False 182 | 183 | only_letters = [] 184 | others = [] 185 | flags = [] 186 | nr_consecutively_letter_words = 0 187 | for word in words: 188 | if self.RE_ONLY_LETTERS.match(word): 189 | nr_consecutively_letter_words += 1 190 | if nr_consecutively_letter_words > \ 191 | self.MAX_CONSECUTIVELY_LETTER_WORDS: 192 | return False 193 | only_letters.append(word) 194 | else: 195 | nr_consecutively_letter_words = 0 196 | if self.RE_FLAG.match(word): 197 | flags.append(word) 198 | others.append(word) 199 | if flags: 200 | return True 201 | #if not others: 202 | # return False 203 | return True 204 | 205 | def iter_text_lines(self, html_doc): 206 | """Generate text lines found in the html document.""" 207 | for line, txt in self._iter_texts(html_doc.tree): 
208 | yield line, txt 209 | 210 | def _iter_texts(self, tree, in_pre=False): 211 | """Find text snippets recursively in the html tree. 212 | 213 | If in a pre-tag, split on line breaks. 214 | If the tree is a code tag, merge all text snippets in the tree. 215 | """ 216 | in_pre = in_pre or tree.tag == 'pre' 217 | 218 | def iter_tree_texts(): 219 | if tree.tag == 'code': 220 | yield tree.sourceline, tree.text_content() 221 | return 222 | line = tree.sourceline 223 | if not self.skip_tree(tree): 224 | if tree.text: 225 | yield line, tree.text 226 | for child in tree.getchildren(): 227 | for line, txt in self._iter_texts(child, in_pre): 228 | yield line, txt 229 | if tree.tail: 230 | yield line, tree.tail 231 | txt_gen = iter_tree_texts() 232 | 233 | if not in_pre: 234 | for line, txt in txt_gen: 235 | txt = self.fix_space(txt) 236 | if txt: 237 | yield line, txt 238 | else: 239 | for first_line, txt in txt_gen: 240 | for line, txt_line in enumerate(txt.split('\n')): 241 | txt_line = txt_line.strip() 242 | if txt_line: 243 | yield first_line + line, txt_line 244 | 245 | def skip_tree(self, tree): 246 | if tree.tag == 'a' and tree.attrib.get('href'): 247 | return True 248 | return tree.tag in self.IGNORE_TAGS 249 | 250 | def fix_space(self, txt): 251 | return self.RE_SPACE.sub(' ', txt.strip()) 252 | -------------------------------------------------------------------------------- /searchcmd/commands.py: -------------------------------------------------------------------------------- 1 | from operator import itemgetter 2 | from collections import Counter 3 | 4 | from pygments import highlight 5 | from pygments.lexers import BashLexer 6 | from pygments.formatters import TerminalFormatter 7 | 8 | LEXER = BashLexer() 9 | FORMATTER = TerminalFormatter() 10 | 11 | 12 | class Command(object): 13 | 14 | def __init__(self, cmd, line, idx, doc): 15 | self.cmd = cmd 16 | self.name = cmd.split()[0] 17 | self.lines = [line] 18 | self.idxs = [idx] 19 | self.docs = [doc] 20 | 
self.domains = Counter({doc.url.domain: 1}) 21 | 22 | def to_dict(self): 23 | return {'cmd': self.cmd, 24 | 'lines': self.lines, 25 | 'idxs': self.idxs, 26 | 'docs': self.docs} 27 | 28 | @classmethod 29 | def from_dict(cls, d): 30 | cmd = d['cmd'] 31 | merged = None 32 | for line, idx, doc in zip(d['lines'], d['idxs'], d['docs']): 33 | inst = cls(cmd, line, idx, doc) 34 | if merged is None: 35 | merged = inst 36 | else: 37 | merged.add_duplicate(inst) 38 | return merged 39 | 40 | def __eq__(self, cmd): 41 | # TODO: More advanced comparison 42 | return self.cmd == cmd.cmd 43 | 44 | def add_duplicate(self, cmd): 45 | self.lines.extend(cmd.lines) 46 | self.idxs.extend(cmd.idxs) 47 | self.docs.extend(cmd.docs) 48 | self.domains.update(cmd.domains) 49 | 50 | def echo(self, verbose=False): 51 | """ 52 | Example output: 53 | 54 | cmd --flag (fromdomain.com, otherdomain.com) 55 | 56 | Include urls to all sources if verbose: 57 | 58 | cmd --flag (fromdomain.com) 59 | http://fromdomain.com/full/path 60 | ... 
61 | 62 | """ 63 | cmd = highlight(self.cmd, LEXER, FORMATTER).strip() 64 | domains = u'({})'.format( 65 | u', '.join(d for d, _ in self.domains.most_common(2))) 66 | s = u'{}\t{}'.format(cmd, domains) 67 | if verbose: 68 | s += u'\n {}'.format( 69 | u'\n'.join([u'\t{}'.format(doc.url.url) for doc in self.docs])) 70 | return s 71 | 72 | def score(self, nr_docs): 73 | nr_docs = float(nr_docs) 74 | score = 0.0 75 | for line, doc in zip(self.lines, self.docs): 76 | score += (doc.nr_lines/(doc.nr_lines + line)) * \ 77 | (nr_docs/(nr_docs + doc.idx)) 78 | return score 79 | 80 | def __repr__(self): 81 | return '<Command {}>'.format(self.cmd.encode('utf-8')) 82 | 83 | 84 | class Commands(object): 85 | 86 | def __init__(self, commands=None, nr_docs=0): 87 | self.commands = commands or {} 88 | self.nr_docs = nr_docs 89 | 90 | def add_command(self, cmd): 91 | if cmd.cmd in self.commands: 92 | self.commands[cmd.cmd].add_duplicate(cmd) 93 | else: 94 | self.commands[cmd.cmd] = cmd 95 | 96 | def rank_commands(self, nr=5): 97 | cmds = [(cmd.score(self.nr_docs), cmd) 98 | for cmd in self] 99 | cmds.sort(key=itemgetter(0), reverse=True) 100 | return [cmd for _, cmd in cmds[:nr]] 101 | 102 | def __iter__(self): 103 | for command in self.commands.values(): 104 | yield command 105 | 106 | def to_dict(self): 107 | return {'commands': self.commands, 108 | 'nr_docs': self.nr_docs} 109 | 110 | @classmethod 111 | def from_dict(cls, d): 112 | return cls(d['commands'], d['nr_docs']) 113 | -------------------------------------------------------------------------------- /searchcmd/download.py: -------------------------------------------------------------------------------- 1 | import re 2 | import sys 3 | try: 4 | from urllib.parse import urlparse 5 | except ImportError: 6 | from urlparse import urlparse 7 | from concurrent.futures import as_completed 8 | 9 | from requests.packages import urllib3 10 | from requests_futures.sessions import FuturesSession 11 | from lxml.html import fromstring, tostring 12 | 
try: 13 | from lxml.html import soupparser 14 | except ImportError: 15 | soupparser = None 16 | import tld 17 | 18 | urllib3.disable_warnings() 19 | 20 | 21 | def get(request): 22 | session = FuturesSession(max_workers=1) 23 | future = next(as_completed([session.get( 24 | request.url, headers=request.headers, timeout=request.timeout)])) 25 | if future.exception() is not None: 26 | return DownloadError(request, future.exception()) 27 | else: 28 | resp = future.result() 29 | return HtmlDocument(resp.url, resp.content) 30 | 31 | 32 | def iter_get(requests, verbose=True): 33 | if isinstance(requests, Request): 34 | requests = [requests] 35 | session = FuturesSession(max_workers=10) 36 | futures_to_req = dict( 37 | (session.get(req.url, headers=req.headers, timeout=req.timeout), 38 | (req, i)) for i, req in enumerate(requests)) 39 | for future in as_completed(futures_to_req): 40 | if future.exception() is not None: 41 | req, idx = futures_to_req[future] 42 | if verbose: 43 | sys.stdout.writelines(u'x') 44 | sys.stdout.flush() 45 | yield DownloadError(req, future.exception(), idx) 46 | else: 47 | resp = future.result() 48 | _, idx = futures_to_req[future] 49 | if verbose: 50 | sys.stdout.writelines(u'.') 51 | sys.stdout.flush() 52 | yield HtmlDocument(resp.url, resp.content, idx) 53 | if verbose: 54 | sys.stdout.writelines(u'\n') 55 | 56 | 57 | class DownloadError(object): 58 | def __init__(self, request, err, idx=None): 59 | self.request = request 60 | self.idx = idx 61 | self.err = err 62 | self.status_code = None 63 | if hasattr(err, 'status_code'): 64 | self.status_code = err.status_code 65 | 66 | 67 | class Request(object): 68 | def __init__(self, url, headers=None, timeout=None): 69 | self.url = url 70 | self.headers = headers 71 | self.timeout = timeout or 3.03 72 | 73 | 74 | class HtmlDocument(object): 75 | def __init__(self, url, body, idx=None, nr_lines=None): 76 | self.url = Url(url) 77 | self.body = body 78 | self.nr_lines = nr_lines 79 | if self.nr_lines is 
None: 80 | self.nr_lines = float(len(body.split(b'\n'))) 81 | self.idx = idx 82 | self._tree = None 83 | 84 | @property 85 | def tree(self): 86 | if self._tree is not None: 87 | return self._tree 88 | try: 89 | self._tree = fromstring(self.body, base_url=self.url.url) 90 | _ = tostring(self._tree, encoding='unicode') 91 | except Exception: 92 | try: 93 | self._tree = soupparser.fromstring(self.body) 94 | except Exception: 95 | pass 96 | return self._tree 97 | 98 | def to_dict(self): 99 | return {'url': self.url.url, 100 | 'nr_lines': self.nr_lines, 101 | 'idx': self.idx} 102 | 103 | @classmethod 104 | def from_dict(cls, d): 105 | return cls(d['url'], b'', d['idx'], d['nr_lines']) 106 | 107 | 108 | class Url(object): 109 | def __init__(self, url): 110 | self.url = url 111 | self.domain = re.sub(r'^www\.', '', urlparse(self.url).netloc) 112 | self.base_domain = tld.get_tld(self.url) 113 | -------------------------------------------------------------------------------- /searchcmd/search_engines.py: -------------------------------------------------------------------------------- 1 | """ 2 | Search the internet and extract the urls of the search hits. 3 | 4 | Usage: 5 | 6 | >>> e = get_engine('google') 7 | >>> req = e.get_search_request('linux') 8 | 9 | Now download the request and create an html doc from the response. 
10 | Extract urls: 11 | 12 | >>> urls = e.get_hits(html_doc) 13 | 14 | """ 15 | 16 | import re 17 | 18 | try: 19 | from urllib import quote_plus, unquote 20 | except ImportError: 21 | from urllib.parse import quote_plus, unquote 22 | 23 | from collections import defaultdict 24 | 25 | from lxml import html 26 | 27 | from searchcmd.download import Url, Request 28 | 29 | 30 | def get_engine(name): 31 | if name not in ENGINES: 32 | raise ValueError('Unknown search engine: %r' % name) 33 | return ENGINES[name] 34 | 35 | 36 | class SearchEngine(object): 37 | BASE_URL = None 38 | RE_INTERNAL_DOMAINS = None 39 | 40 | RE_URL_AS_PARAM = re.compile(r'https?:/{2}[^.]*\.[^&\s]*') 41 | 42 | def get_search_request(self, query): 43 | url = self.BASE_URL.format(quote_plus(query.encode('utf-8'))) 44 | return Request(url=url) 45 | 46 | def get_id(self, tree): 47 | tag = tree.tag.strip() 48 | cls = tree.attrib.get('class', '').strip() 49 | if not cls: 50 | return tag 51 | return '@'.join([tag, cls.split()[0]]) 52 | 53 | def get_path(self, subtree): 54 | parent = subtree.getparent() 55 | if parent.tag == 'body': 56 | return 'body' 57 | return '/'.join([self.get_path(parent), self.get_id(subtree)]) 58 | 59 | def get_outgoing_url(self, link): 60 | """Return url if outgoing, else None""" 61 | url = link.attrib.get('href') or '' 62 | url = url.strip() 63 | if not url: 64 | return 65 | if not url.startswith('http'): 66 | if 'http' not in url: 67 | return 68 | m = self.RE_URL_AS_PARAM.search(url) 69 | if not m: 70 | return 71 | url = unquote(m.group(0)) 72 | u = Url(url) 73 | if self.RE_INTERNAL_DOMAINS.search(u.domain): 74 | return 75 | return u 76 | 77 | def get_hits(self, html_doc): 78 | """ 79 | Returns: 80 | hits ([Url]): List of search result urls. 
81 | """ 82 | tree = html.fromstring(html_doc.body, base_url=html_doc.url.url) 83 | links = tree.cssselect('a') 84 | # Group by path in html tree 85 | groups = defaultdict(list) 86 | for link in links: 87 | url = self.get_outgoing_url(link) 88 | if not url: 89 | continue 90 | groups[self.get_path(link)].append(url) 91 | # Skip groups where all urls have the same base domain 92 | for path, urls in list(groups.items()): 93 | if len(set((u.base_domain for u in urls))) == 1: 94 | groups.pop(path) 95 | # Return the largest group of urls (an empty list if no group is left) 96 | return sorted(groups.values(), key=len)[-1] if groups else [] 97 | 98 | 99 | class Google(SearchEngine): 100 | 101 | BASE_URL = 'http://google.com/search?q={}' 102 | RE_INTERNAL_DOMAINS = re.compile( 103 | r'(google|googleusercontent)\.[a-z][a-z][a-z]?$') 104 | 105 | 106 | class Bing(SearchEngine): 107 | 108 | BASE_URL = 'http://bing.com/search?q={}' 109 | RE_INTERNAL_DOMAINS = re.compile( 110 | r'(bing|bingj|microsoft)\.[a-z][a-z][a-z]?$') 111 | 112 | 113 | ENGINES = { 114 | 'google': Google(), 115 | 'bing': Bing(), 116 | } 117 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [wheel] 2 | universal = 1 3 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | import sys 5 | import os 6 | from setuptools import setup 7 | from setuptools.command.test import test as TestCommand 8 | 9 | with open('README.rst') as readme_file: 10 | readme = readme_file.read() 11 | 12 | with open('HISTORY.rst') as history_file: 13 | history = history_file.read().replace('.. 
:changelog:', '') 14 | 15 | 16 | def get_requirements(file_name): 17 | with open(file_name) as req_file: 18 | requirements = [req.strip() for req in req_file.read().split('\n')] 19 | return [req for req in requirements if req and not req.startswith('-')] 20 | 21 | if sys.version_info[0] == 2: 22 | requirements = get_requirements('requirements-python-2.txt') 23 | else: 24 | requirements = get_requirements('requirements-python-3.txt') 25 | 26 | 27 | class PyTest(TestCommand): 28 | def finalize_options(self): 29 | TestCommand.finalize_options(self) 30 | self.test_args = [] 31 | self.test_suite = True 32 | 33 | def run_tests(self): 34 | import pytest 35 | errcode = pytest.main(self.test_args) 36 | sys.exit(errcode) 37 | 38 | setup( 39 | name='searchcmd', 40 | packages=['searchcmd'], 41 | version='0.1.0', 42 | description='Search the internets for commands from the command line.', 43 | long_description=readme + '\n\n' + history, 44 | author='Jimmy Petersson', 45 | author_email='jimmy.petersson@gmail.com', 46 | url='https://github.com/jimmyppi/searchcmd/', 47 | package_dir={'searchcmd': 48 | 'searchcmd'}, 49 | include_package_data=True, 50 | install_requires=requirements, 51 | license='lite BSD', 52 | entry_points={ 53 | 'console_scripts': ['searchcmd = searchcmd:main'] 54 | }, 55 | tests_require=['pytest', 'requests_mock', 'mock'], 56 | cmdclass={'test': PyTest}, 57 | test_suite='tests', 58 | extras_require={'testing': ['pytest']}, 59 | zip_safe=False, 60 | keywords='searchcmd cli', 61 | classifiers=[ 62 | 'Development Status :: 4 - Beta', 63 | 'Intended Audience :: Developers', 64 | 'Intended Audience :: System Administrators', 65 | 'License :: OSI Approved :: BSD License', 66 | 'Natural Language :: English', 67 | "Programming Language :: Python :: 2", 68 | 'Programming Language :: Python :: 2.7', 69 | 'Programming Language :: Python :: 3', 70 | 'Programming Language :: Python :: 3.4', 71 | 'Operating System :: POSIX', 72 | 'Environment :: Console', 73 | 'Topic :: 
Software Development', 74 | 'Topic :: System :: Systems Administration' 75 | ], 76 | ) 77 | -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jimmyppi/searchcmd/2fb94a70c4e1623a397f98d11b0122c9e029c45b/tests/__init__.py -------------------------------------------------------------------------------- /tests/test_cache.py: -------------------------------------------------------------------------------- 1 | from unittest import TestCase 2 | 3 | from tests.testutils import get_html_doc, iter_html_docs 4 | from searchcmd import cache 5 | 6 | 7 | class TestCache(TestCase): 8 | 9 | def test_cache(self): 10 | doc = get_html_doc('search_engines', 'google.com') 11 | docs = list(iter_html_docs('search_engines')) 12 | obj = {'single': doc, 'multi': docs} 13 | 14 | cache.store(obj, testing=True) 15 | from_cache = cache.get(testing=True) 16 | 17 | self.assertEqual(from_cache.keys(), obj.keys()) 18 | self.assertEqual(obj['single'].url.url, from_cache['single'].url.url) 19 | 20 | self.assertIsNone(cache.get(does_not_exist=True)) 21 | -------------------------------------------------------------------------------- /tests/test_cmdextract.py: -------------------------------------------------------------------------------- 1 | try: 2 | from itertools import izip as zip 3 | except ImportError: 4 | pass 5 | 6 | import re 7 | import mock 8 | 9 | from unittest import TestCase 10 | 11 | from tests.testutils import iter_html_docs, get_html_doc 12 | from searchcmd.download import HtmlDocument 13 | from searchcmd.cmdextract import CommandExtractor, extract_commands 14 | 15 | 16 | TEST_DATA_DIR = 'cmdextract' 17 | 18 | COMMANDS = { 19 | 'http://unixmantra.com': [ 20 | (1254, u'find ./music -name "*.mp3" -print0 | xargs -0 ls'), 21 | (1258, u'find ./work -print | xargs grep "profit"'), 22 | (1267, 23 | 'find . 
-name "*.sh" -print0 | xargs -0 -I {} mv {} ~/back.scripts'), 24 | (1306, 'find . -name "*.sh" | xargs grep "ksh"'), 25 | (1328, 'find /tmp -name "*.tmp" | xargs rm'), 26 | (1335, 'find /tmp -name "*.tmp" -print0 | xargs -0 rm'), 27 | (1373, 'mv chap1 chap1.old'), 28 | (1374, 'mv chap2 chap2.old'), 29 | (1375, 'mv chap3 chap3.old'), 30 | (1385, 'ar r lib.a chap1 ?...'), 31 | (1386, 'ar r lib.a chap2 ?...'), 32 | (1387, 'ar r lib.a chap3 ?...')], 33 | 'http://brunolinux.com': [ 34 | (62, "sed -i 's/ugly/beautiful/g' /home/bruno/old-friends/sue.txt"), 35 | (91, ("find /home/bruno/old-friends -type f " 36 | "-exec sed -i 's/ugly/beautiful/g' {} \\;")), 37 | (118, u'mv $fl $fl.old'), 38 | (119, u"sed 's/FINDSTRING/REPLACESTRING/g' $fl.old > $fl"), 39 | (120, u'rm -f $fl.old'), 40 | (150, ('perl -e "s/old_string/new_string/g;" ' 41 | '-pi.save $(find DirectoryName -type f)'))], 42 | 'http://cyberciti.biz': [ 43 | (118, 'hwclock -r'), 44 | (120, 'hwclock --show'), 45 | (122, 'hwclock --show --utc'), 46 | (126, 'date -s "2 OCT 2006 18:00:00"'), 47 | (128, 'date --set="2 OCT 2006 18:00:00"'), 48 | (130, 'date +%Y%m%d -s "20081128"'), 49 | (131, 'date +%T -s "10:13:13"'), 50 | (133, 'date +%T%p -s "6:10:30AM" # date +%T%p -s "12:10:30PM"'), 51 | (135, 'hwclock --systohc'), 52 | (137, 'hwclock -w'), 53 | (158, 'date -s 2007.04.08-22:46+0000'), 54 | (167, u'date -s \u201c2 OCT 2006 18:00:00\u2033'), 55 | (217, u'export http_proxy=\u2019http://10.10.1.2:3128\u2032'), 56 | (234, u'hwclock \u2013show'), 57 | (235, u'hwclock \u2013systohc'), 58 | (286, 'date +%Y%m%d -s "20080817"'), 59 | (354, u'date set=\u201d2 OCT 2006 18:00:00\u2033'), 60 | (371, u'date -s \u201d2 OCT 2006 18:00:00\u2033'), 61 | (380, u'date \u2013set=\u201d2 sep 2011 11:27:20\u2033'), 62 | (412, u'hwclock \u2013systohc'), 63 | (479, 'date 07252208002009'), 64 | (623, u'date +%Y%m%d%T -s \u201c20081225 10:05:00\u2033'), 65 | (704, u'date \u2013set=\u201d2 OCT 2006 18:00:00\u2033'), 66 | (832, 'for example 
23/03/2011 to 23.03.2011'), 67 | (841, u'hwclock \u2013show'), 68 | (843, 'ls -l /etc/localtime'), 69 | (845, 'date 041106232011'), 70 | (846, u'hwclock \u2013systohc'), 71 | (847, 'ln -sf /usr/share/zoneinfo/ /etc/localtime'), 72 | (848, 'vi /etc/sysconfig/clock (update timezone if redhat)'), 73 | (850, u'hwclock \u2013show'), 74 | (852, 'ls -l /etc/localtime'), 75 | (929, 'date 030613252012'), 76 | (1126, 'uname -a'), 77 | (1144, u'date -s \u201cYYYY-MM-DD HH:MM:SS\u201d'), 78 | (1161, 'uname -a'), 79 | (1183, 'date --set="4 JAN 2014 13:25:00"'), 80 | (1239, (u'python -c \u2018import platform ; ' 81 | u'print platform.dist()[0]\u2019')), 82 | (1260, u'su \u2013'), 83 | (1261, 'passwd root'), 84 | (1301, u'date +%T -s \u201c10:13:13\u2033'), 85 | (1310, u'date -s \u201cYYYY-MM-DD HH:MM:SS\u201d')], 86 | 'http://stackoverflow.com': [ 87 | (5, 'du -h --max-depth=1'), 88 | (24, 'du -h -s *')] 89 | } 90 | 91 | NR_TEXTS = {'http://unixmantra.com': 225, 92 | 'http://brunolinux.com': 47, 93 | 'http://cyberciti.biz': 705, 94 | 'http://stackoverflow.com': 49} 95 | 96 | MERGED_COMMANDS = set([ 97 | u'find ./music -name "*.mp3" -print0 | xargs -0 ls', 98 | 'hwclock --show', 99 | "sed -i 's/ugly/beautiful/g' /home/bruno/old-friends/sue.txt", 100 | 'date --set="2 OCT 2006 18:00:00"', 101 | 'mv chap1 chap1.old', 102 | u'date +%T -s \u201c10:13:13\u2033', 103 | 'ar r lib.a chap1 ?...', 104 | 'find . -name "*.sh" -print0 | xargs -0 -I {} mv {} ~/back.scripts', 105 | u'mv $fl $fl.old', 106 | 'mv chap2 chap2.old', 107 | 'date 07252208002009', 108 | 'date +%T%p -s "6:10:30AM" # date +%T%p -s "12:10:30PM"', 109 | 'find /tmp -name "*.tmp" | xargs rm', 110 | 'du -h --max-depth=1', 111 | 'mv chap3 chap3.old', 112 | u'date \u2013set=\u201d2 sep 2011 11:27:20\u2033', 113 | u'date -s \u201d2 OCT 2006 18:00:00\u2033', 114 | 'date +%Y%m%d -s "20081128"', 115 | 'find . 
-name "*.sh" | xargs grep "ksh"', 116 | 'date 041106232011', 117 | 'ar r lib.a chap3 ?...', 118 | u'find ./work -print | xargs grep "profit"', 119 | u'date +%Y%m%d%T -s \u201c20081225 10:05:00\u2033', 120 | 'date -s 2007.04.08-22:46+0000', 121 | 'find /tmp -name "*.tmp" -print0 | xargs -0 rm', 122 | 'date +%T -s "10:13:13"', 123 | 'date --set="4 JAN 2014 13:25:00"', 124 | u'hwclock \u2013systohc', 125 | 'hwclock --show --utc', 126 | u'date -s \u201c2 OCT 2006 18:00:00\u2033', 127 | 'hwclock --systohc', 128 | ("find /home/bruno/old-friends -type f " 129 | "-exec sed -i 's/ugly/beautiful/g' {} \\;"), 130 | 'date -s "2 OCT 2006 18:00:00"', 131 | u"sed 's/FINDSTRING/REPLACESTRING/g' $fl.old > $fl", 132 | 'date 030613252012', 133 | u'date set=\u201d2 OCT 2006 18:00:00\u2033', 134 | u'hwclock \u2013show', 135 | 'du -h -s *', 136 | 'date +%Y%m%d -s "20080817"', 137 | 'ar r lib.a chap2 ?...', 138 | u'date \u2013set=\u201d2 OCT 2006 18:00:00\u2033', 139 | 'hwclock -w', 140 | u'date -s \u201cYYYY-MM-DD HH:MM:SS\u201d', 141 | 'hwclock -r']) 142 | 143 | 144 | ALL_TESTS_COMMANDS = [u'vim', u'vi', u'alias', u'bg', u'bind', u'break', u'builtin', u'caller', u'cd', u'command', u'compgen', u'complete', u'compopt', u'continue', u'declare', u'dirs', u'disown', u'echo', u'enable', u'eval', u'exec', u'exit', u'export', u'false', u'fc', u'fg', u'getopts', u'hash', u'help', u'history', u'jobs', u'kill', u'let', u'local', u'logout', u'mapfile', u'popd', u'printf', u'pushd', u'pwd', u'read', u'readarray', u'readonly', u'return', u'set', u'shift', u'shopt', u'source', u'suspend', u'test', u'times', u'trap', u'true', u'type', u'typeset', u'ulimit', u'umask', u'unalias', u'unset', u'wait', u'if', u'then', u'else', u'elif', u'fi', u'case', u'esac', u'for', u'select', u'while', u'until', u'do', u'done', u'in', u'function', u'time', u'coproc', u'alias', u'bg', u'bind', u'break', u'builtin', u'caller', u'cd', u'command', u'compgen', u'complete', u'compopt', u'continue', u'declare', u'dirs', 
u'disown', u'echo', u'enable', u'eval', u'exec', u'exit', u'export', u'false', u'fc', u'fg', u'getopts', u'hash', u'help', u'history', u'jobs', u'kill', u'let', u'local', u'logout', u'mapfile', u'popd', u'printf', u'pushd', u'pwd', u'read', u'readarray', u'readonly', u'return', u'set', u'shift', u'shopt', u'source', u'suspend', u'test', u'times', u'trap', u'true', u'type', u'typeset', u'ulimit', u'umask', u'unalias', u'unset', u'wait', u'ipcluster2', u'dynamodb_load', u'ipcontroller2', u'nosetests', u'lss3', u'asadmin', u'pilprint.py', u'cfadmin', u'taskadmin', u'instance_events', u'pilfile.py', u's3put', u'ipython2', u'bundle_image', u'ipcontroller', u'list_instances', u'ipcluster', u'mturk', u'iptest', u'pilfont.py', u'fetch_file', u'kill_instance', u'pildriver.py', u'django-admin.py', u'ipython', u'cwutil', u'route53', u'glacier', u'gunicorn_paster', u'dynamodb_dump', u'cq', u'faker', u'iptest2', u'pilconvert.py', u'gunicorn', u'ipdb', u'django-admin', u'launch_instance', u'sqlformat', u'newrelic-admin', u'nosetests-2.7', u'gunicorn_django', u'ipengine', u'elbadmin', u'ipengine2', u'sdbadmin', u'pyami_sendmail', u'pip', u'pip2.7', u'wheel', u'easy_install', u'pip2', u'easy_install-2.7', u'2to3', u'idle', u'python2-config', u'smtpd.py', u'sqlite3', u'pydoc', u'python2', u'python', u'python2.7', u'python2.7-config', u'python-config', u'ldattach', u'update-mime', u'sshd', u'install-sgmlcatalog', u'update-xmlcatalog', u'pg_updatedicts', u'tcpdchk', u'filefrag', u'update-ca-certificates', u'logrotate', u'try-from', u'libgvc6-config-update', u'update-catalog', u'update-gsfontmap', u'make-ssl-cert', u'tunelp', u'mklost+found', u'safe_finger', u'cron', u'e2freefrag', u'cytune', u'fdformat', u'readprofile', u'rtcwake', u'tcpd', u'gconf-schemas', u'update-icon-caches.gtk2', u'e4defrag', u'update-icon-caches', u'tcpdmatch', u'paperconfig', u'update-java-alternatives', u'service', u'vipw', u'chroot', u'dpkg-preconfigure', u'rmt', u'locale-gen', u'delgroup', 
u'dpkg-reconfigure', u'arpd', u'pam-auth-update', u'pwunconv', u'pwck', u'pwconv', u'newusers', u'cpgr', u'update-initramfs', u'rmt-tar', u'pam_getenv', u'adduser', u'policy-rc.d', u'addgroup', u'mkinitramfs', u'zic', u'nologin', u'vigr', u'invoke-rc.d', u'useradd', u'add-shell', u'usermod', u'cppw', u'groupdel', u'iconvconfig', u'tarcat', u'update-passwd', u'grpunconv', u'validlocale', u'deluser', u'update-rc.d', u'update-locale', u'userdel', u'groupmod', u'chpasswd', u'remove-shell', u'grpconv', u'tzconfig', u'grpck', u'chgpasswd', u'pam_timestamp_check', u'groupadd', u'init', u'pnmtops', u'chrt', u'tgz', u'lzmore', u'dpkg-genchanges', u'gcc-4.8', u'rasttopnm', u'pdb', u'ppmtoicr', u'volname', u'ssh-import-id-gh', u'pnmalias', u'rawtopgm', u'linux32', u'ppmdither', u'pg_basebackup', u'fc-cache', u'pbmtolj', u'editres', u'gcov-4.8', u'c_rehash', u'dh_pypy', u'paperconf', u'deb-systemd-invoke', u'pdb3', u'pbmtopgm', u'identify', u'pnmcomp', u'st4topgm', u'xlsatoms', u'python3m', u'partx', u'2to3', u'ppmtolss16', u'minfo', u'jcmd', u'curl', u'fitstopnm', u'jmap', u'dpkg-distaddfile', u'host', u'timedatectl', u'xkill', u'anytopnm', u'MagickWand-config', u'pgmedge', u'pic', u'x86_64-linux-gnu-objdump', u'yuvsplittoppm', u'line', u'eyuvtoppm', u'pkttyagent', u'ps2pdf14', u'font2c', u'xfontsel', u'dpkg-scanpackages', u'find2perl', u'pbmto10x', u'xbmtopbm', u'objcopy', u'faked-sysv', u'pnmtopalm', u'git-shell', u'pdb3.4', u'pnmmargin', u'orbd', u'ppm3d', u'crontab', u'zipgrep', u'xzcat', u'schemagen', u'convert', u'pbmpscale', u'dpkg-gencontrol', u'xwdtopnm', u'pnmtosir', u'dvipdf', u'namei', u'x86_64-linux-gnu-size', u'testrb', u'pstruct', u'telnet', u'ybmtopbm', u'x86_64-linux-gnu-gcc-ranlib', u'rcp', u'pnmtojpeg', u'ppmhist', u'editor', u'gsnd', u'pbmtext', u'perlivp', u'gold', u'printafm', u'pbmmask', u'peekfd', u'yacc', u'rename', u'pcre-config', u'see', u'telnet.netkit', u'lscpu', u'pnminvert', u'jstat', u'autoconf', u'pl2pm', u'ppmtv', u'mdel', u'mcheck', 
u'ptardiff', u'ssh-copy-id', u'psidtopgm', u'pg_dropcluster', u'localectl', u'sgitopnm', u'pbmtowbmp', u'pack200', u'pgmhist', u'pbmtoybm', u'gpg-error', u'stunnel', u'import', u'pg', u'x86_64-linux-gnu-objcopy', u'strip', u'gsdj500', u'cpp', u'eps2eps', u'setuidgid', u'libnetcfg', u'vacuumdb', u'pgbench', u'spctoppm', u'ppmquant', u'pnmshear', u'svstat', u'pod2html', u'lessecho', u'pamcut', u'pgmtopbm', u'pnmtoddif', u'gio-querymodules', u'prename', u'envdir', u'gslp', u'gcc-ranlib-4.8', u'run-mailcap', u'winicontoppm', u'x86_64-linux-gnu-gcc-ranlib-4.8', u'xfd', u'chattr', u'fakeroot', u'setarch', u'pnmgamma', u'gdbus-codegen', u'pyversions', u'gtk-launch', u'pbmlife', u'x86_64-linux-gnu-cpp-4.8', u'resizepart', u'pnmtotiffcmyk', u'xprop', u'perlthanks', u'pnmnoraw', u'ppmquantall', u'pgmtexture', u'memdiskfind', u'pamstretch', u'ssh-add', u'libpng-config', u'ppmtomitsu', u'pnmtoxwd', u'x86_64-linux-gnu-ld.bfd', u'pbmtoascii', u'jinfo', u'pgrphack', u'pbmreduce', u'x86_64-linux-gnu-c++filt', u'pg_conftool', u'python2-config', u'erb', u'update-mime-database.real', u'lesskey', u'pg_recvlogical', u'setfacl', u'pnmtile', u'prtstat', u'sirtopnm', u'pnmsplit', u'ppmtoleaf', u'killall', u'ps2epsi', u'mzip', u'lsb_release', u'rletopnm', u'pgmcrater', u'gresource', u'rsh', u'psql', u'lcf', u'ppmtouil', u'droplang', u'compare.im6', u'pgmtofs', u'mclasserase', u'vacuumlo', u'pbmtoepsi', u'pnmtorast', u'hostnamectl', u'pbmtog3', u'ionice', u'jpegtopnm', u'xzgrep', u'gsettings', u'c99', u'lzfgrep', u'mdatopbm', u'ppmdist', u'pg_config', u'ppmchange', u'xz', u'dpkg-buildpackage', u'mdir', u'xzmore', u'ppmtoilbm', u'pbmtoptx', u'ppmntsc', u'ppmnorm', u'mkmanifest', u'libgcrypt-config', u'pnmcut', u'dh_python2', u'aclocal', u'svscan', u'pxelinux-options', u'rgb3toppm', u'x86_64-linux-gnu-dwp', u'ranlib', u'import.im6', u'unzipsfx', u'javadoc', u'x86_64-linux-gnu-ld', u'pamstretch-gen', u'gemtopnm', u'setsid', u'make', u'gcc-ar-4.8', u'irb', u'x86_64-linux-gnu-readelf', 
u'x86_64-linux-gnu-python2.7-config', u'g++', u'xml2-config', u'pgmoil', u'dpkg-name', u'gtester', u'gprof', u'ps2ps', u'logger', u'mcat', u'ppmtoeyuv', u'x86_64-linux-gnu-ranlib', u'ld', u'x86_64-linux-gnu-nm', u'pamdeinterlace', u'x86_64-linux-gnu-strip', u'mshowfat', u'grog', u'zeisstopnm', u'xvinfo', u'perldoc', u'ps2pdf12', u'jconsole', u'linux64', u'gcc', u'pod2latex', u'ssh-keyscan', u'pi3topbm', u'rlogin', u'rmiregistry', u'montage.im6', u'unlzma', u'hmac256', u'svscanboot', u'pod2text', u'pfbtopfa', u'ppmtoacad', u'a2p', u'x86_64-linux-gnu-gcov', u'geqn', u'pnmremap', u'pamfile', u'pbmtoppa', u'splain', u'dig', u'strings', u'rmic', u'python3.4', u'desktop-file-install', u'pnmscalefixed', u'sputoppm', u'ppmtopgm', u'sprof', u'brushtopbm', u'pod2latex.bundled', u'h2ph', u'ghostscript', u'pgmnorm', u'xauth', u'servertool', u'2to3-3.4', u'rake1.9.1', u's2p', u'mdu', u'pnmenlarge', u'gdk-pixbuf-pixdata', u'ps2pdf13', u'pg_createcluster', u'c++filt', u'mxtar', u'extcheck', u'pnmtorle', u'ppmtoneo', u'neqn', u'cd-fix-profile', u'cpanp', u'faked-tcp', u'ppmtomap', u'cpan', u'luit', u'policytool', u'flock', u'python3', u'grops', u'wftopfa', u'tnameserv', u'ppmtopuzz', u'cpanp-run-perl', u'dpkg-vendor', u'dwp', u'xzless', u'convert.im6', u'gpic', u'mren', u'mgrtopbm', u'fc-query', u'ppmspread', u'ppmfade', u'pstopnm', u'pphs', u'ppmtopi1', u'autoreconf', u'gemtopbm', u'fghack', u'pbmpage', u'cpan2dist', u'pydoc', u'mlabel', u'x86_64-linux-gnu-gcc-ar', u'pgmnoise', u'soelim', u'pamdice', u'bison.yacc', u'podselect', u'autom4te', u'ssh-keygen', u'stream', u'xvminitoppm', u'glib-compile-resources', u'pg_ctlcluster', u'lz', u'jsadebugd', u'pg_renamecluster', u'pnmconvol', u'x86_64-linux-gnu-addr2line', u'dumpsexp', u'pbmtoxbm', u'jrunscript', u'dh_autotools-dev_restoreconfig', u'gcov', u'pbmclean', u'rmid', u'x86_64-linux-gnu-gcc-4.8', u'gcc-ar', u'ssh-argv0', u'dbus-launch', u'supervise', u'x86_64-linux-gnu-ld.gold', u'ppmtogif', u'cmuwmtopbm', u'elfedit', 
u'identify.im6', u'xdriinfo', u'size', u'pg_isready', u'desktop-file-edit', u'slogin', u'colormgr', u'zipsplit', u'pbmtextps', u'getfacl', u'dpkg-checkbuilddeps', u'cpp-4.8', u'cd-iccdump', u'pnmmontage', u'ptargrep', u'keytool', u'x86_64', u'readproctitle', u'ppmtoyuv', u'pgmramp', u'gdk-pixbuf-query-loaders', u'lessfile', u'ppmtolj', u'pstree', u'pg_restore', u'pnmcolormap', u'pamstack', u'testrb1.9.1', u'pydoc2.7', u'pnmnorm', u'pbmtopsg3', u'giftopnm', u'pnmindex', u'ilbmtoppm', u'json_pp', u'pnmcrop', u'libtoolize', u'x86_64-linux-gnu-ar', u'x86_64-linux-gnu-gcc', u'pydoc3', u'libpng12-config', u'x86_64-linux-gnu-python-config', u'nsupdate', u'pbmtoatk', u'2to3-2.7', u'delpart', u'pbmtomda', u'python2', u'udisksctl', u'pdb2.7', u'git', u'pnmflip', u'tbl', u'pnmtoplainpnm', u'rdoc1.9.1', u'pnmpaste', u'ppmtoyuvsplit', u'pnmdepth', u'mformat', u'podchecker', u'ld.bfd', u'ps2pdf', u'objdump', u'idlj', u'sldtoppm', u'jhat', u'pgmslice', u'fc-list', u'jstatd', u'pg_receivexlog', u'conjure.im6', u'g3topbm', u'fiascotopnm', u'pnmtofits', u'dpkg-shlibdeps', u'zipdetails', u'x86_64-linux-gnu-strings', u'pkaction', u'autoupdate', u'xzfgrep', u'ppmshift', u'ntfsdecrypt', u'unzip', u'ptar', u'pod2usage', u'less', u'traceroute6', u'pnmquant', u'pygettext3.4', u'ps2ps2', u'bmptopnm', u'compile_et', u'pnminterp-gen', u'pbmtomgr', u'c89', u'rsync', u'xwininfo', u'groff', u'ssh-import-id-lp', u'py3compile', u'pygettext2.7', u'addpart', u'print', u'jarsigner', u'montage', u'xlsclients', u'wall', u'aclocal-1.14', u'procan', u'pnmscale', u'file', u'multilog', u'pydoc3.4', u'gtbl', u'jar', u'shasum', u'preconv', u'ruby', u'gtk-update-icon-cache-3.0', u'c99-gcc', u'ucfq', u'view', u'pnmfile', u'mkdiskimage', u'autoheader', u'ppmtompeg', u'thinkjettopbm', u'ppmtoxpm', u'x86_64-linux-gnu-gcc-ar-4.8', u'gobject-query', u'zipinfo', u'display.im6', u'pnmtotiff', u'dh_python3', u'eject', u'lzcat', u'java', u'xdpyinfo', u'qrttoppm', u'pnminterp', u'compose', u'svc', u'enc2xs', 
u'stream.im6', u'md5pass', u'fc-scan', u'git-upload-archive', u'ppmtopj', u'gsettings-schema-convert', u'glib-genmarshal', u'ld.gold', u'envuidgid', u'Magick-config', u'readelf', u'pbmtoepson', u'rev', u'bison', u'sha1pass', u'listres', u'syslinux2ansi', u'glib-gettextize', u'mbadblocks', u'composite', u'native2ascii', u'pgmbentley', u'createuser', u'createdb', u'wsgen', u'dpkg-gensymbols', u'pg_virtualenv', u'whereis', u'passwd', u'pcxtoppm', u'ifnames', u'leaftoppm', u'clusterdb', u'ppmshadow', u'xzegrep', u'gs', u'ucfr', u'softlimit', u'jexec', u'm4', u'ipcmk', u'taskset', u'rpcgen', u'pbmtogo', u'gtk-update-icon-cache', u'ppmtopict', u'troff', u'mogrify.im6', u'macptopbm', u'pnmrotate', u'reindexdb', u'isohybrid', u'h2xs', u'freetype-config', u'lzegrep', u'edit', u'pbmtoplot', u'fc-pattern', u'ssh', u'gnome-open', u'mogrify', u'git-upload-pack', u'chsh', u'ppmtotga', u'glib-mkenums', u'ppmtosixel', u'xjc', u'uz', u'pnmpad', u'pbmtonokia', u'desktop-file-validate', u'gconftool-2', u'ppmcie', u'javap', u'ppmtobmp', u'g++-4.8', u'zip', u'bioradtopgm', u'setterm', u'lzgrep', u'mcd', u'conjure', u'nslookup', u'scp', u'ssh-import-id', u'cd-create-profile', u'grotty', u'tifftopnm', u'pnmhistmap', u'automake-1.14', u'jps', u'ppmrelief', u'mcomp', u'krb5-config', u'gpasswd', u'git-receive-pack', u'pg_dump', u'getopt', u'dpkg-scansources', u'ppmbrighten', u'pbmupc', u'ppmtopcx', u'xmessage', u'pkexec', u'pbmtocmuwm', u'pngtopnm', u'ps2pdfwr', u'gencat', u'gslj', u'ps2ascii', u'411toppm', u'py3clean', u'eqn', u'python', u'svok', u'ps2txt', u'dpkg-buildflags', u'gethostip', u'xev', u'lzmainfo', u'unxz', u'mshortname', u'chfn', u'tai64nlocal', u'MagickCore-config', u'mcopy', u'ppmdim', u'update-gconf-defaults', u'pi1toppm', u'ppmflash', u'deb-systemd-helper', u'pod2man', u'wget', u'autoscan', u'mcookie', u'dropuser', u'openssl', u'gtester-report', u'x86_64-linux-gnu-cpp', u'pg_upgradecluster', u'ppmlabel', u'asciitopgm', u'stunnel3', u'ipcs', u'psed', u'pybuild', u'javac', 
u'pbmtopi3', u'fc-cat', u'compare', u'python2.7', u'pgmenhance', u'cc', u'fstopgm', u'x86_64-linux-gnu-as', u'icontopbm', u'pgmtoppm', u'erb1.9.1', u'prove', u'pnmarith', u'zipnote', u'gsdj', u'xlsfonts', u'imgtoppm', u'dbus-monitor', u'neotoppm', u'pbmtobbnbg', u'corelist', u'ppmcolormask', u'gdk-pixbuf-csource', u'yuvtoppm', u'serialver', u'ppmtorgb3', u'xzdiff', u'perlbug', u'i386', u'mysql_config', u'nm', u'pnmhisteq', u'dpkg-parsechangelog', u'sbigtopgm', u'pjtoppm', u'gdbus', u'mattrib', u'mmount', u'display', u'script', u'ppmmake', u'jdb', u'fallocate', u'piconv', u'ppmforge', u'pdf2ps', u'gsbj', u'animate', u'filan', u'sftp', u'pnmtofiasco', u'tgatoppm', u'lzcmp', u'lsattr', u'ruby1.9.1', u'x86_64-linux-gnu-gcc-nm-4.8', u'jstack', u'gcc-nm', u'pnmpsnr', u'libwmf-config', u'glib-compile-schemas', u'as', u'python2.7-config', u'automake', u'xpmtoppm', u'socat', u'javah', u'dropdb', u'expiry', u'gpg-error-config', u'appres', u'zipcloak', u'gcc-ranlib', u'x86_64-linux-gnu-gcov-4.8', u'lzless', u'unshare', u'python-config', u'c2ph', u'ppmtowinicon', u'palmtopnm', u'rename.ul', u'fakeroot-tcp', u'xzcmp', u'irb1.9.1', u'xslt-config', u'pbmtoicon', u'chage', u'traceroute6.iputils', u'x86_64-linux-gnu-g++-4.8', u'rawtoppm', u'x86_64-linux-gnu-gcc-nm', u'cautious-launcher', u'pkcheck', u'animate.im6', u'dh_installxmlcatalogs', u'fc-match', u'mtvtoppm', u'funzip', u'ppmmix', u'pg_dumpall', u'pbmtozinc', u'gcc-nm-4.8', u'curl-config', u'pg_lsclusters', u'config_data', u'tracepath6', u'ppmpat', u'ri1.9.1', u'wbmptopbm', u'pdf2dsc', u'ssh-agent', u'pnmsmooth', u'gem', u'ucf', u'ximtoppm', u'ppmtojpeg', u'mtrace', u'amuFormat.sh', u'mpartition', u'chacl', u'pg_archivecleanup', u'sotruss', u'viewres', u'pgmkernel', u'rdoc', u'mtools', u'pbmtogem', u'dh_autotools-dev_updateconfig', u'pstree.x11', u'py3versions', u'pbmtomacp', u'lzma', u'mmd', u'x86_64-linux-gnu-g++', u'python3.4m', u'pyclean', u'newgrp', u'xsubpp', u'tracepath', u'nroff', u'pgmtolispm', 
u'gsettings-data-convert', u'atktopbm', u'ppmrainbow', u'lispmtopgm', u'ri', u'pf2afm', u'pyvenv-3.4', u'syslinux', u'gconftool', u'mmove', u'ppmcolors', u'dpkg-architecture', u'ipcrm', u'ppmqvga', u'pygettext', u'addr2line', u'hipstopgm', u'gouldtoppm', u'chardet', u'mdeltree', u'patch', u'x86_64-linux-gnu-gprof', u'isohybrid.pl', u'fc-validate', u'instmodsh', u'pnmtosgi', u'pbmtox10bm', u'appletviewer', u'pnmcat', u'lss16toppm', u'setlock', u'lzdiff', u'composite.im6', u'pnmtopng', u'pkg-config', u'gem1.9.1', u'lesspipe', u'c++', u'mtype', u'pnmnlfilt', u'c89-gcc', u'chkdupexe', u'createlang', u'update-mime-database', u'pamoil', u'ddate', u'pg_config.libpq-dev', u'mtoolstest', u'dbus-send', u'bmptoppm', u'pbmmake', u'scriptreplay', u'stunnel4', u'wsimport', u'mrd', u'Wand-config', u'dpkg-mergechangelogs', u'renice', u'gconf-merge-tree', u'ar', u'dpkg-source', u'update-desktop-database', u'x86_64-linux-gnu-elfedit', u'libtool', u'fakeroot-sysv', u'unpack200', u'pycompile', u'tai64n', u'pygettext3', u'base64', u'tty', u'service', u'test', u'nl', u'numfmt', u'sha1sum', u'cmp', u'dpkg-split', u'ncurses5-config', u'apt-get', u'iconv', u'nstat', u'lsinitramfs', u'free', u'wc', u'debconf', u'lastb', u'hostid', u'ptx', u'tabs', u'diff3', u'pwdx', u'cksum', u'debconf-communicate', u'fold', u'captoinfo', u'rtstat', u'routef', u'groups', u'localedef', u'whoami', u'tic', u'od', u'gpgsplit', u'apt-config', u'pager', u'seq', u'arch', u'debconf-set-selections', u'tput', u'w', u'perl5.18.2', u'locale', u'md5sum', u'env', u'routel', u'md5sum.textutils', u'find', u'tzselect', u'logname', u'factor', u'awk', u'expr', u'gpg', u'ncursesw5-config', u'w.procps', u'tset', u'pinky', u'shred', u'zdump', u'sha224sum', u'apt-key', u'apt', u'apt-cache', u'shuf', u'pathchk', u'expand', u'timeout', u'tsort', u'initctl2dot', u'gpg-zip', u'ctstat', u'sha256sum', u'mawk', u'sensible-pager', u'gpgv', u'debconf-show', u'printf', u'comm', u'yes', u'rgrep', u'perl', u'debconf-copydb', 
u'init-checkconf', u'clear_console', u'nohup', u'infotocap', u'dpkg-statoverride', u'debconf-apt-progress', u'stdbuf', u'mkfifo', u'pldd', u'dpkg-trigger', u'cut', u'infocmp', u'sum', u'who', u'id', u'tload', u'reset', u'pmap', u'diff', u'pr', u'dpkg-query', u'dpkg-divert', u'csplit', u'touch', u'printenv', u'dircolors', u'savelog', u'bashbug', u'sensible-browser', u'which', u'update-alternatives', u'sg', u'oldfind', u'stat', u'basename', u'unlink', u'vmstat', u'apt-cdrom', u'fmt', u'nice', u'head', u'sensible-editor', u'getent', u'clear', u'watch', u'skill', u'pkill', u'split', u'sha512sum', u'dpkg', u'dpkg-deb', u'dirname', u'nproc', u'last', u'nawk', u'debconf-escape', u'ischroot', u'lspgpot', u'catchsegv', u'apt-mark', u'getconf', u'snice', u'lastlog', u'du', u'join', u'tail', u'sdiff', u'link', u'slabtop', u'tac', u'lnstat', u'users', u'tee', u'uniq', u'tr', u'top', u'paste', u'xargs', u'pgrep', u'uptime', u'ldd', u'runcon', u'truncate', u'sort', u'install', u'faillog', u'sha384sum', u'mesg', u'unexpand', u'select-editor', u'toe', u'chcon', u'dpkg-maintscript-helper', u'blockdev', u'ntfscp', u'fdisk', u'swapon', u'mount.ntfs-3g', u'hwclock', u'fsck.ext4dev', u'fsck', u'parted', u'unix_chkpwd', u'fsck.minix', u'fstrim', u'fsck.vfat', u'fsck.msdos', u'umount.udisks2', u'fsck.fat', u'sfdisk', u'sgdisk', u'mkfs.ext3', u'fsck.ext3', u'mkfs.ext2', u'mkfs.ext4dev', u'cfdisk', u'mkfs.bfs', u'mount.fuse', u'tune2fs', u'fstrim-all', u'pivot_root', u'mkdosfs', u'mkfs.minix', u'logsave', u'ntfsresize', u'ctrlaltdel', u'e2undo', u'e2image', u'ntfsclone', u'ntfsundelete', u'losetup', u'wipefs', u'partprobe', u'fsck.ext4', u'mkfs.ext4', u'switch_root', u'cgdisk', u'badblocks', u'findfs', u'isosize', u'ntfslabel', u'gdisk', u'mkfs.cramfs', u'fsck.cramfs', u'mkfs.fat', u'fsck.ext2', u'swaplabel', u'fatlabel', u'debugfs', u'mkfs.ntfs', u'e2label', u'dumpe2fs', u'dosfslabel', u'mount.lowntfs-3g', u'raw', u'dosfsck', u'mkntfs', u'fixparts', u'e2fsck', u'getty', u'mke2fs', 
u'mkswap', u'mkfs.msdos', u'resize2fs', u'mount.ntfs', u'dmsetup', u'blkid', u'agetty', u'mkfs.vfat', u'swapoff', u'mkfs', u'fsfreeze', u'sysctl', u'pam_tally2', u'udevadm', u'killall5', u'initctl', u'status', u'unix_update', u'fsck.nfs', u'ip', u'mountall', u'rmmod', u'shutdown', u'bridge', u'installkernel', u'pam_tally', u'lsmod', u'upstart-udev-bridge', u'upstart-socket-bridge', u'startpar', u'plymouthd', u'rtacct', u'start-stop-daemon', u'insmod', u'mkhomedir_helper', u'upstart-local-bridge', u'upstart-dbus-bridge', u'startpar-upstart-inject', u'start', u'reload', u'modprobe', u'ifup', u'upstart-file-bridge', u'ldconfig.real', u'shadowconfig', u'udevd', u'mntctl', u'modinfo', u'ifdown', u'initctl.distrib', u'poweroff', u'telinit', u'rtmon', u'reboot', u'ldconfig', u'MAKEDEV', u'tc', u'runlevel', u'restart', u'depmod', u'sulogin', u'ifquery', u'stop', u'halt', u'upstart-event-bridge', u'fstab-decode', u'init', u'bzcmp', u'findmnt', u'bzip2recover', u'bzip2', u'su', u'bzgrep', u'lowntfs-3g', u'ping6', u'mount', u'ntfscmp', u'tailf', u'lessecho', u'bzfgrep', u'lesskey', u'setfacl', u'ntfswipe', u'bzmore', u'umount', u'dbus-uuidgen', u'fusermount', u'loginctl', u'ntfsmove', u'ping', u'bzless', u'more', u'ntfsmftalloc', u'ntfscat', u'getfacl', u'lessfile', u'nc.openbsd', u'fuser', u'less', u'red', u'ntfs-3g.probe', u'ntfs-3g', u'ntfsdump_logfile', u'ntfstruncate', u'ntfsck', u'nc', u'bunzip2', u'ntfscluster', u'ntfsfix', u'dbus-daemon', u'bzdiff', u'dbus-cleanup-sockets', u'ulockmgr_server', u'bzegrep', u'chacl', u'lsblk', u'bzexe', u'dmesg', u'ntfs-3g.secaudit', u'ntfsinfo', u'ed', u'ntfsls', u'lesspipe', u'bzcat', u'netcat', u'ntfs-3g.usermap', u'ls', u'sh.distrib', u'dd', u'mountpoint', u'gzip', u'false', u'udevadm', u'pidof', u'pwd', u'mktemp', u'chgrp', u'grep', u'ss', u'sed', u'znew', u'rmdir', u'plymouth', u'ln', u'df', u'ip', u'cat', u'hostname', u'kill', u'mt-gnu', u'lsmod', u'stty', u'rbash', u'zcmp', u'login', u'fgrep', u'zcat', u'nisdomainname', 
u'zless', u'true', u'egrep', u'cp', u'readlink', u'uname', u'running-in-container', u'dnsdomainname', u'gunzip', u'touch', u'dir', u'which', u'mt', u'sh', u'tar', u'cpio', u'kmod', u'zdiff', u'vdir', u'bash', u'mkdir', u'sleep', u'domainname', u'uncompress', u'chown', u'chmod', u'rm', u'date', u'plymouth-upstart-bridge', u'gzexe', u'zegrep', u'tempfile', u'zfgrep', u'ps', u'sync', u'zgrep', u'ypdomainname', u'zforce', u'dash', u'echo', u'mv', u'zmore', u'mknod', u'run-parts'] # noqa 145 | 146 | 147 | @mock.patch.object( 148 | CommandExtractor, 149 | 'RE_ALL_COMMANDS', 150 | re.compile(CommandExtractor.BASE_CMDS_REX % '|'.join(map(re.escape, ALL_TESTS_COMMANDS))) 151 | ) 152 | class TestCommandExtract(TestCase): 153 | 154 | def test_extract_commands(self): 155 | cmds = extract_commands(iter_html_docs(TEST_DATA_DIR)) 156 | self.assertEqual(set(cmds.commands.keys()), MERGED_COMMANDS) 157 | 158 | cmds = extract_commands(iter_html_docs(TEST_DATA_DIR), 'xargs') 159 | self.assertEqual(set(cmds.commands.keys()), set([ 160 | 'find /tmp -name "*.tmp" | xargs rm', 161 | u'find ./music -name "*.mp3" -print0 | xargs -0 ls', 162 | 'find . -name "*.sh" | xargs grep "ksh"', 163 | 'find /tmp -name "*.tmp" -print0 | xargs -0 rm', 164 | 'find . 
-name "*.sh" -print0 | xargs -0 -I {} mv {} ~/back.scripts', 165 | u'find ./work -print | xargs grep "profit"'])) 166 | 167 | cmds = extract_commands( 168 | get_html_doc(TEST_DATA_DIR, 'stackoverflow.com'), 'xargs') 169 | self.assertEqual(cmds.commands, {}) 170 | 171 | doc = HtmlDocument('http://stackoverflow.com', b'') 172 | doc.body = None 173 | cmds = extract_commands(doc) 174 | self.assertEqual(cmds.nr_docs, 0) 175 | 176 | def test_iter_texts(self): 177 | extractor = CommandExtractor() 178 | for doc in iter_html_docs(TEST_DATA_DIR): 179 | nr_txts = 0 180 | for _ in extractor.iter_text_lines(doc): 181 | nr_txts += 1 182 | self.assertEqual(NR_TEXTS[doc.url.url], nr_txts) 183 | 184 | def test_iter_commands(self): 185 | extractor = CommandExtractor() 186 | for doc in iter_html_docs(TEST_DATA_DIR): 187 | for (line, cmd), correct in zip(extractor.iter_commands(doc), 188 | COMMANDS[doc.url.url]): 189 | self.assertEqual((line, cmd), correct) 190 | 191 | def test_get_command(self): 192 | ext = CommandExtractor('find') 193 | self.assertEqual(ext.get_command('$ find . -name "*.mp3"'), 194 | 'find . -name "*.mp3"') 195 | self.assertIsNone(ext.get_command('# ls -hl')) 196 | self.assertIsNone(ext.get_command('find a file')) 197 | self.assertIsNone(ext.get_command('Find . 
-name "*.mp3"')) 198 | 199 | def test_sudo(self): 200 | ext = CommandExtractor('ls') 201 | self.assertEqual(ext.get_command('# sudo ls -hl'), 202 | 'ls -hl') 203 | ext = CommandExtractor('sudo') 204 | self.assertEqual( 205 | ext.get_command('# sudo -u www vi ~www/htdocs/index.html'), 206 | 'sudo -u www vi ~www/htdocs/index.html') 207 | 208 | def test_has_wanted_command(self): 209 | ext = CommandExtractor('xargs') 210 | self.assertTrue(ext.has_wanted_command( 211 | u'find ./music -name "*.mp3" -print0 | xargs -0 ls')) 212 | self.assertFalse(ext.has_wanted_command( 213 | u'find ./music -name "*.mp3" -print0 | grep xargs')) 214 | 215 | ext = CommandExtractor(['du', 'mv']) 216 | self.assertTrue(ext.has_wanted_command( 217 | u'du -h -s *')) 218 | self.assertFalse(ext.has_wanted_command( 219 | u'ls -hl')) 220 | 221 | ext = CommandExtractor(['git commit']) 222 | self.assertTrue(ext.has_wanted_command( 223 | u'git commit --amend')) 224 | self.assertFalse(ext.has_wanted_command( 225 | u'git pull origin master')) 226 | 227 | def test_is_command_name(self): 228 | ext = CommandExtractor() 229 | 230 | self.assertTrue(ext.is_command_name('ls')) 231 | self.assertTrue(ext.is_command_name('l')) 232 | self.assertTrue(ext.is_command_name('7z')) 233 | self.assertTrue(ext.is_command_name('apt-get')) 234 | 235 | self.assertFalse(ext.is_command_name('')) 236 | self.assertFalse(ext.is_command_name('1')) 237 | self.assertFalse(ext.is_command_name('22')) 238 | self.assertFalse(ext.is_command_name('apt-')) 239 | self.assertFalse(ext.is_command_name('-')) 240 | self.assertFalse(ext.is_command_name('-ls')) 241 | self.assertFalse(ext.is_command_name('apt get')) 242 | self.assertFalse(ext.is_command_name('APT-GET')) 243 | 244 | def test_is_command_output(self): 245 | ext = CommandExtractor() 246 | self.assertTrue(ext.is_command_output( 247 | 'drwxr-xr-x 3 root root 4,0K maj 5 2014 home')) 248 | self.assertFalse(ext.is_command_output('total 0')) 249 | 250 | def test_is_command(self): 251 | ext = 
CommandExtractor() 252 | 253 | self.assertTrue(ext.is_command( 254 | u'date -s \u201cYYYY-MM-DD HH:MM:SS\u201d')) 255 | self.assertTrue(ext.is_command('uname -a')) 256 | self.assertTrue(ext.is_command('git log')) 257 | 258 | self.assertFalse(ext.is_command('')) 259 | self.assertFalse(ext.is_command('when I use:')) 260 | self.assertFalse(ext.is_command('is used.')) 261 | self.assertFalse(ext.is_command('thank you!!!!')) 262 | self.assertFalse(ext.is_command('thanks for sharing')) 263 | self.assertFalse(ext.is_command( 264 | 'drwxr-xr-x 3 root root 4,0K maj 5 2014 home')) 265 | self.assertFalse(ext.is_command( 266 | 'ls -al %s' % ('a' * ext.MAX_COMMAND_LENGTH))) 267 | self.assertFalse(ext.is_command( 268 | ("you can use '-c' & '-w' with wc to obtain number of characters" 269 | "and words repectively"))) 270 | self.assertFalse(ext.is_command('250 total')) 271 | 272 | # TODO: likely a false positive that the extractor should eventually reject 273 | self.assertTrue(ext.is_command('thanx :)')) 274 | -------------------------------------------------------------------------------- /tests/test_commands.py: -------------------------------------------------------------------------------- 1 | from unittest import TestCase 2 | 3 | from searchcmd.commands import Command, Commands 4 | from searchcmd.download import HtmlDocument 5 | 6 | 7 | class TestCommands(TestCase): 8 | 9 | def test_commands(self): 10 | 11 | cmds = Commands() 12 | doc = HtmlDocument('http://example.com', b'', 1) 13 | cmd = Command('ls', 1, 1, doc) 14 | cmds.add_command(cmd) 15 | cmd = Command(u'grep \u201ctest\u2033', 5, 2, doc) 16 | cmds.add_command(cmd) 17 | cmd = Command('ls', 22, 3, doc) 18 | cmds.add_command(cmd) 19 | 20 | cmds.nr_docs = 1 21 | 22 | ranked = cmds.rank_commands() 23 | 24 | self.assertEqual(len(ranked), 2) 25 | self.assertEqual(ranked[0].cmd, 'ls') 26 | 27 | for cmd in ranked: 28 | cmd.echo() 29 | cmd.echo(verbose=True) 30 | print(repr(cmd)) 31 | 32 | cmds = Commands.from_dict(cmds.to_dict()) 33 | for cmd in ranked: 34 | cmd_copy =
Command.from_dict(cmd.to_dict()) 35 | self.assertEqual(cmd_copy, cmd) 36 | -------------------------------------------------------------------------------- /tests/test_download.py: -------------------------------------------------------------------------------- 1 | from unittest import TestCase 2 | import requests_mock 3 | 4 | from searchcmd import download 5 | from tests.testutils import get_html_doc, iter_html_docs 6 | 7 | 8 | class TestDownload(TestCase): 9 | 10 | def test_get(self): 11 | doc = get_html_doc('search_engines', 'google.com') 12 | req = download.Request(doc.url.url) 13 | with requests_mock.mock() as m: 14 | m.get(req.url, content=doc.body) 15 | resp = download.get(req) 16 | self.assertIsInstance(resp, download.HtmlDocument) 17 | self.assertEqual(resp.body, doc.body) 18 | 19 | with requests_mock.mock() as m: 20 | resp = download.get(req) 21 | self.assertIsInstance(resp, download.DownloadError) 22 | 23 | def test_iter_get(self): 24 | docs = list(iter_html_docs('cmdextract')) 25 | reqs = [download.Request(doc.url.url) for doc in docs] 26 | with requests_mock.mock() as m: 27 | for req, doc in zip(reqs[:-1], docs[:-1]): 28 | m.get(req.url, content=doc.body) 29 | resps = list(download.iter_get(reqs)) 30 | success = [doc.url.url.strip('/') for doc in resps if 31 | isinstance(doc, download.HtmlDocument)] 32 | for resp in resps: 33 | if isinstance(resp, download.DownloadError): 34 | if isinstance(resp.err, UnicodeDecodeError): 35 | raise resp.err 36 | self.assertEqual(set(success), set([r.url for r in reqs[:-1]])) 37 | 38 | m.get(reqs[-1].url, content=docs[-1].body) 39 | resps = list(download.iter_get(reqs[-1])) 40 | self.assertEqual( 41 | [r.url.url.strip('/') for r in resps], [reqs[-1].url]) 42 | 43 | def test_html_document(self): 44 | for doc in iter_html_docs('cmdextract'): 45 | self.assertTrue(doc.tree.tag) 46 | 47 | doc_cp = download.HtmlDocument.from_dict(doc.to_dict()) 48 | self.assertEqual(doc_cp.url.url, doc.url.url) 49 | 
-------------------------------------------------------------------------------- /tests/test_search_engines.py: -------------------------------------------------------------------------------- 1 | from unittest import TestCase 2 | 3 | from tests.testutils import get_html_doc 4 | from searchcmd.search_engines import get_engine, SearchEngine 5 | 6 | TEST_DATA_DIR = 'search_engines' 7 | 8 | 9 | class TestSearchEngines(TestCase): 10 | 11 | def test_google(self): 12 | urls = [ 13 | 'http://www.thegeekstuff.com/2013/12/xargs-examples/', 14 | 'http://www.cyberciti.biz/faq/linux-unix-bsd-xargs-construct-argument-lists-utility/', 15 | 'http://www.unixmantra.com/2013/12/xargs-all-in-one-tutorial-guide.html', 16 | 'http://javarevisited.blogspot.com/2012/06/10-xargs-command-example-in-linux-unix.html', 17 | 'http://www.computerhope.com/unix/xargs.htm', 18 | 'http://en.wikipedia.org/wiki/Xargs', 19 | 'http://unix.stackexchange.com/questions/24954/when-is-xargs-needed', 20 | 'https://sidvind.com/wiki/Xargs_by_example', 21 | 'http://linux.about.com/od/commands/a/Example-Uses-Of-The-xargs-Command.htm', 22 | 'http://www.linuxplanet.com/linuxplanet/tutorials/6522/1'] 23 | self._test_engine('google', urls) 24 | 25 | def test_bing(self): 26 | urls = [ 27 | 'http://www.cyberciti.biz/faq/linux-unix-bsd-xargs-construct-argument-lists-utility/', 28 | 'http://www.thegeekstuff.com/2013/12/xargs-examples/', 29 | 'http://javarevisited.blogspot.com/2012/06/10-xargs-command-example-in-linux-unix.html', 30 | 'http://www.computerhope.com/unix/xargs.htm', 31 | 'http://linux.101hacks.com/linux-commands/xargs-command-examples/', 32 | 'http://en.wikipedia.org/wiki/Xargs', 33 | 'https://www.mkssoftware.com/docs/man1/xargs.1.asp', 34 | 'http://linux.about.com/od/commands/a/Example-Uses-Of-The-xargs-Command.htm', 35 | 'http://www.folkstalk.com/2012/07/xargs-command-examples-in-unix-linux.html', 36 | 'http://linux.die.net/man/1/xargs'] 37 | self._test_engine('bing', urls) 38 | 39 | def 
test_non_existent(self): 40 | with self.assertRaises(ValueError): 41 | get_engine('non_existent') 42 | 43 | def _test_engine(self, engine, expected_urls): 44 | e = get_engine(engine) 45 | req = e.get_search_request('test query') 46 | self.assertTrue('test+query' in req.url) 47 | 48 | doc = get_html_doc(TEST_DATA_DIR, '%s.com' % engine) 49 | 50 | urls = e.get_hits(doc) 51 | self.assertEqual([u.url for u in urls], expected_urls) 52 | 53 | def test_RE_URL_AS_PARAM(self): 54 | matched_urls = [ 55 | 'abchttp://example.com?get=1', 56 | 'abc http://example.com?get=1 ', 57 | 'abchttps://example.com?get=1', 58 | 'abchttps://example.com?get=1"', 59 | 'abchttp://example.com?get=1&gett=2', 60 | 'abc hhttp://example.com?get=1', 61 | 'abc http http://example.com?get=1', 62 | 'abc http https://example.com?get=1', 63 | 'abc https https://example.com?get=1', 64 | 'https://b.com', 65 | ] 66 | 67 | for url in matched_urls: 68 | self.assertIsNotNone(SearchEngine.RE_URL_AS_PARAM.search(url)) 69 | 70 | no_matched_urls = [ 71 | '', 72 | 'abchtttp://example.com?get=1', 73 | 'abchttpp://example.com?get=1', 74 | 'abchttpss://example.com?get=1', 75 | 'abchttp//example.com?get=1', 76 | 'abchttp:/example.com?get=1', 77 | 'abchttp:example.com?get=1', 78 | 'abc http htts://example.com?get=1', 79 | 'abc http ttps://example.com?get=1', 80 | 'https://', 81 | 'http://', 82 | 'http://a', 83 | 'https://b', 84 | ] 85 | 86 | for url in no_matched_urls: 87 | self.assertIsNone(SearchEngine.RE_URL_AS_PARAM.search(url)) 88 | -------------------------------------------------------------------------------- /tests/test_searchcmd.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from io import StringIO 3 | from unittest import TestCase 4 | 5 | import requests_mock 6 | 7 | from searchcmd.commands import Commands, Command 8 | from searchcmd.download import HtmlDocument 9 | from tests.testutils import get_html_doc 10 | from searchcmd.search_engines import 
get_engine 11 | import searchcmd 12 | 13 | main = searchcmd.main 14 | 15 | 16 | class TestSearchCommand(TestCase): 17 | 18 | def setUp(self): 19 | self.internal_stdout = StringIO() 20 | self.internal_stderr = StringIO() 21 | searchcmd.stdout = searchcmd.get_print_func(self.internal_stdout) 22 | searchcmd.stderr = searchcmd.get_print_func(self.internal_stderr) 23 | 24 | def tearDown(self): 25 | searchcmd.stdout = searchcmd.get_print_func(sys.stdout) 26 | searchcmd.stderr = searchcmd.get_print_func(sys.stderr) 27 | 28 | def test_print(self): 29 | searchcmd.stdout = searchcmd.get_print_func(sys.stdout) 30 | 31 | def mock_search(query_string=None, cmd=None, **kwargs): 32 | coms = Commands() 33 | cmd = u'git commit \u2013amend -m \u2018new message\u2019' 34 | doc = HtmlDocument(u'http://example.com', b'', 1) 35 | coms.add_command(Command(cmd, 1, 1, doc)) 36 | return coms 37 | 38 | orig_search = searchcmd.search 39 | searchcmd.search = mock_search 40 | try: 41 | main(['git commit', '--no-cache']) 42 | finally: 43 | searchcmd.search = orig_search 44 | 45 | def test_search_engine_error(self): 46 | searchcmd.stdout = searchcmd.get_print_func(sys.stdout) 47 | searchcmd.stderr = searchcmd.get_print_func(sys.stderr) 48 | with requests_mock.mock() as m: 49 | exit_code = main(['find', '--no-cache']) 50 | self.assertNotEqual(exit_code, 0) 51 | 52 | def test_query(self): 53 | self.result = None 54 | 55 | def mock_search(query_string=None, cmd=None, **kwargs): 56 | self.result = (cmd, query_string) 57 | return Commands() 58 | 59 | orig_search = searchcmd.search 60 | searchcmd.search = mock_search 61 | try: 62 | main(['find', '--no-cache']) 63 | self.assertEqual(self.result, ('find', 'linux find examples')) 64 | main(['search replace', '--no-cache']) 65 | self.assertEqual(self.result, (None, 'linux search replace')) 66 | main(['git', 'commit', 'change last commit message', '--no-cache']) 67 | self.assertEqual( 68 | self.result, ('git commit', 69 | 'linux git commit change last 
commit message')) 70 | with self.assertRaises(ValueError): 71 | searchcmd.get_query_string([]) 72 | finally: 73 | searchcmd.search = orig_search 74 | 75 | def test_engine(self): 76 | so = get_html_doc('cmdextract', 'stackoverflow.com') 77 | query = ['du'] 78 | options = ['--engine', 'bing', '--no-cache'] 79 | with requests_mock.mock() as m: 80 | self._setup_request_mock(m, query, [so], search_engine='bing') 81 | main(query + options) 82 | self._check_output(contains='du') 83 | 84 | def test_cache(self): 85 | so = get_html_doc('cmdextract', 'stackoverflow.com') 86 | query = ['du'] 87 | options = ['--no-cache'] 88 | with requests_mock.mock() as m: 89 | self._setup_request_mock(m, query, [so]) 90 | main(query + options) 91 | self._check_output(contains='du') 92 | self._truncate_stdout() 93 | with requests_mock.mock() as m: 94 | main(query) 95 | self._check_output(contains='du') 96 | 97 | def test_max_download(self): 98 | with requests_mock.mock() as m: 99 | self._setup_request_mock(m, ['du'], []) 100 | main(['du', '--no-cache', '--max-download', '0']) 101 | self.assertNotIn('du', self._get_current_stdout()) 102 | 103 | def test_verbose(self): 104 | so = get_html_doc('cmdextract', 'stackoverflow.com') 105 | query = ['du'] 106 | options = ['-v'] 107 | with requests_mock.mock() as m: 108 | self._setup_request_mock(m, query, [so]) 109 | main(query) 110 | non_verbose = self._get_current_stdout() 111 | self._truncate_stdout() 112 | with requests_mock.mock() as m: 113 | main(query + options) 114 | self.assertGreater(len(self._get_current_stdout()), len(non_verbose)) 115 | 116 | def test_max_hits(self): 117 | so = get_html_doc('cmdextract', 'stackoverflow.com') 118 | query = ['du'] 119 | options = ['--max-hits', '1'] 120 | with requests_mock.mock() as m: 121 | self._setup_request_mock(m, query, [so]) 122 | main(query) 123 | nr_lines = len(self._get_current_stdout().split('\n')) 124 | self._truncate_stdout() 125 | with requests_mock.mock() as m: 126 | main(query + options) 127 
| self.assertGreater( 128 | nr_lines, len(self._get_current_stdout().split('\n'))) 129 | 130 | def _get_current_stdout(self): 131 | return self.internal_stdout.getvalue() 132 | 133 | def _truncate_stdout(self): 134 | self.internal_stdout.truncate(0) 135 | 136 | def _setup_request_mock(self, mock, query, search_results, 137 | search_engine='google'): 138 | engine_doc = get_html_doc('search_engines', search_engine + '.com') 139 | e = get_engine(search_engine) 140 | query_string, _ = searchcmd.get_query_string(query) 141 | url = e.get_search_request(query_string).url 142 | mock.get(url, content=engine_doc.body) 143 | urls = e.get_hits(engine_doc) 144 | for url, result in zip(urls, search_results): 145 | mock.get(url.url, content=result.body) 146 | 147 | def _check_output(self, nr_lines=None, contains=None, equals=None): 148 | out = self._get_current_stdout() 149 | if nr_lines is not None: 150 | self.assertEqual(len(out.split('\n')), nr_lines) 151 | if contains is not None: 152 | self.assertIn(contains, out) 153 | if equals is not None: 154 | self.assertEqual(out, equals) 155 | -------------------------------------------------------------------------------- /tests/testdata/cmdextract/brunolinux.com: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 6 | 8 | 9 | Find and Replace with Sed 10 | 11 | 12 | 14 | 15 | 16 | 17 | 18 | 20 | 21 | 23 | 24 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 |
Tips 19 | Linux Explorers  All 22 | Things Linux Forum  Great 25 | Linux Links  LinuxClues.com  Hometown  Email 
35 | 36 |
37 |
38 |
39 |
40 |
41 | FIND AND REPLACE with SED
42 |
43 |
44 |
45 | Let us start off simple:
46 | Imagine 47 | you have a large file ( txt, php, html, anything ) and you want to 48 | replace all the words "ugly" with "beautiful" because you just met your 49 | old friend Sue again and she/he is coming over for a visit.
50 |
51 |
52 | This is the command:
53 |
54 |
55 | 57 | 58 | 59 | 60 | 61 | 62 | 65 | 66 | 67 |
CODE
$ 63 | sed -i 's/ugly/beautiful/g' 64 | /home/bruno/old-friends/sue.txt
68 |

69 |
70 | Well, that command speaks for itself "sed" edits "-i in 71 | place ( on the spot ) and replaces the word "ugly with "beautiful" 72 | in the file "/home/bruno/old-friends/sue.txt"
73 |
74 |
75 | Now, here comes the real magic:
76 | Imagine 77 | you have a whole lot of files in a directory ( all about Sue ) and you 78 | want the same command to do all those files in one go because she/he is 79 | standing right at the door . .
80 | Remember the find command ? We will combine the two:
82 |
83 |
84 | 86 | 87 | 88 | 89 | 90 | 91 | 94 | 95 | 96 |
CODE
$ 92 | find /home/bruno/old-friends -type f 93 | -exec sed -i 's/ugly/beautiful/g' {} \;
97 |

98 |
99 | Sure 100 | in combination with the find command you can do all kind of nice 101 | tricks, even if you don't remember where the files are located !
102 |
103 |
104 | Aditionally I did find a little script on the net for if you often have to find and 107 | replace multiple files at once:
108 |
109 |
110 | 112 | 113 | 114 | 115 | 116 | 117 | 123 | 124 | 125 |
CODE
#!/bin/bash
118 |      for fl in *.php; do
119 |      mv $fl $fl.old
120 |      sed 's/FINDSTRING/REPLACESTRING/g' $fl.old > $fl
121 |      rm -f $fl.old
122 |      done
126 |

127 | just replace the "*.php", "FINDSTRING" and "REPLACESTRING" 128 | make it executable and you are set.
129 |
130 | I 131 | changed a www address in 183 .html files in one go with this little 132 | script . . . but note that you have to use "escape-signs" ( \ ) if 133 | there are slashes in the text you want to replace, so as an example: 134 | 's/www.search.yahoo.com\/images/www.google.com\/linux/g' to change 135 | www.search.yahoo.com/images to www.google.com/linux
136 |
137 |
138 |
139 |
140 | For the lovers of perl I also found this one:
141 |
142 |
143 | 145 | 146 | 147 | 148 | 149 | 150 | 153 | 154 | 155 |
CODE
# 151 | perl -e "s/old_string/new_string/g;" 152 | -pi.save $(find DirectoryName -type f)
156 |

157 | But 158 | it leaves "traces", e.g it backs up the old file with a .save extension 159 | . . . so is not really effective when Sue comes around ;-/
160 |
161 |
162 |
163 | Bruno 164 |
165 |
166 |
167 | -- Jan 4 2005 ( Revised Dec 10 2005 ) -- 168 |
169 |
170 |
171 |
172 |
173 | 174 | 175 | 176 | 177 | 179 | 180 | 182 | 183 | 185 | 186 | 187 | 188 | 189 | 190 | 191 | 192 | 193 |
Tips 178 | Linux Explorers  All 181 | Things Linux Forum  Great 184 | Linux Links  LinuxClues.com  Hometown  Email 
194 |
195 | 196 | 197 | 198 | -------------------------------------------------------------------------------- /tests/testdata/cmdextract/stackoverflow.com: -------------------------------------------------------------------------------- 1 | 2 |
3 |

Try this

4 | 5 |
du -h --max-depth=1
6 | 7 |

Output

8 | 9 |
oliver@home:/usr$ sudo du -h --max-depth=1
10 | 24M     ./include
11 | 20M     ./sbin
12 | 228M    ./local
13 | 4.0K    ./src
14 | 520M    ./lib
15 | 8.0K    ./games
16 | 1.3G    ./share
17 | 255M    ./bin
18 | 2.4G    .
19 | 20 |

Alternative

21 | 22 |

If --max-depth=1 is a bit too long for your taste, you can also try using:

23 | 24 |
du -h -s *
25 | 26 |

This uses -s (--summarize) and will only print the size of the folder itself by default. By passing all elements in the current working directory (*), it produces similar output as --max-depth=1 would:

27 | 28 |

Output

29 | 30 |
oliver@cloud:/usr$ sudo du -h -s *
31 | 255M    bin
32 | 8.0K    games
33 | 24M     include
34 | 520M    lib
35 | 0       lib64
36 | 228M    local
37 | 20M     sbin
38 | 1.3G    share
39 | 4.0K    src
40 | 41 |

The difference is subtle. The former approach will display the total size of the current working directory and the total size of all folders that are contained in it... but only up to a depth of 1.

42 | 43 |

The latter approach will calculate the total size of all passed items individually. Thus, it includes the symlink lib64 in the output, but excludes the hidden items (whose name start with a dot). It also lacks the total size for the current working directory, as that was not passed as an argument.

44 |
45 | 46 | 47 | 49 | 62 | 63 | 64 | 65 | 81 | 82 |
48 |
share|improve this answer
50 | 66 | 67 | 68 | 80 |
83 | 84 | -------------------------------------------------------------------------------- /tests/testdata/search_engines/bing.com: -------------------------------------------------------------------------------- 1 | linux xargs examples - Bing

Bing

12 |
  1. xargs: How To Control and Use Command Line Arguments

    www.cyberciti.biz/faq/linux-unix-bsd-xargs-construct-argument...

    I am trying to use xargs command using shell pipes and not able to understand how to control and use command line arguments. For example I'd like to find out all *.c ...

  2. 10 Xargs Command Examples in Linux / UNIX - The …

    www.thegeekstuff.com/2013/12/xargs-examples

    The xargs command is extremely useful when we combine it with other commands. This tutorials explains the usage of xargs command using few simple examples.

  3. 10 xargs command example in Linux - Unix tutorial

    javarevisited.blogspot.com/...xargs-command-example-in-linux-unix.html

    2012-06-07 · In short xargs command in Unix or Linux is an essential tool which enhances functionality of front line commands like find, grep or cut and gives more ...

  4. Linux and Unix xargs command help and examples

    www.computerhope.com/unix/xargs.htm

    Unix and Linux xargs command information, examples, and syntax. Unix and Linux xargs command information, examples, and syntax. Skip to Main Content. Search. …

  5. Hack 22. Xargs Command Examples - Linux 101 Hacks …

    linux.101hacks.com/linux-commands/xargs-command-examples

    xargs is a very powerful command that takes output of a command and pass it as argument of another command. Following are some practical examples on how to

  6. xargs - Wikipedia, the free encyclopedia

    en.wikipedia.org/wiki/Xargs

    xargs is a command on Unix and most Unix-like operating systems used to build and execute command lines ... under the Linux kernel before version 2 ... For example ...

  7. xargs -- construct and execute command lines

    https://www.mkssoftware.com/docs/man1/xargs.1.asp

    For example, in xargs -n 2 diff obtains two arguments from the standard input, appends them to the diff command, and executes the command.

  8. Example Uses of the xargs Command - Linux

    The xargs command is typically used in a command line where the output of one command is passed on as input arguments to another command. In many cases no …

  9. xargs command examples in Unix / Linux Tutorial

    www.folkstalk.com/2012/07/xargs-command-examples-in-unix-linux.html

    Xargs command in unix or linux operating system is used to pass the output of one command as an argument to another command. Some of the unix or linux commands …

  10. xargs(1): build/execute from stdin - Linux man page

    linux.die.net/man/1/xargs

    xargs(1) - Linux man page Name. ... to edit the files listed on xargs' standard input. This example achieves the same effect as BSD's -o option, ...

  1. Related searches

-------------------------------------------------------------------------------- /tests/testutils.py: -------------------------------------------------------------------------------- 1 | import os 2 | from searchcmd.download import HtmlDocument 3 | 4 | BASE_DATA_DIR = os.path.join(os.path.dirname(__file__), 'testdata') 5 | 6 | 7 | def iter_html_docs(data_dir): 8 | for fname in os.listdir(os.path.join(BASE_DATA_DIR, data_dir)): 9 | yield get_html_doc(data_dir, fname) 10 | 11 | 12 | def get_html_doc(data_dir, fname): 13 | with open(os.path.join(BASE_DATA_DIR, data_dir, fname), 'rb') as inp: 14 | body = inp.read() 15 | base_url = 'http://%s' % fname 16 | return HtmlDocument(base_url, body) 17 | -------------------------------------------------------------------------------- /tox.ini: -------------------------------------------------------------------------------- 1 | # Tox (http://tox.testrun.org/) is a tool for running tests 2 | # in multiple virtualenvs. This configuration file will run the 3 | # test suite on all supported python versions. To use it, "pip install tox" 4 | # and then run "tox" from this directory. 5 | 6 | [tox] 7 | envlist = py27, py34 8 | 9 | [testenv] 10 | commands = {envpython} setup.py test 11 | deps = 12 | requests_mock 13 | --------------------------------------------------------------------------------