├── tests
    ├── __init__.py
    ├── data
    │   ├── expect
    │   │   ├── ppp_text_5.txt
    │   │   ├── ppp_rec_6.txt
    │   │   ├── ppp_rec_8.txt
    │   │   ├── ppp_text_1.txt
    │   │   ├── ppp_rec_20.txt
    │   │   ├── ppp_line_4.txt
    │   │   ├── ppp_line_2.txt
    │   │   ├── ppp_rec_5.txt
    │   │   ├── ppp_rec_4.txt
    │   │   ├── ppp_csv_3.txt
    │   │   ├── ppp_line_3.txt
    │   │   ├── ppp_rec_3.txt
    │   │   ├── ppp_text_2.txt
    │   │   ├── ppp_rec_1.txt
    │   │   ├── ppp_rec_17.txt
    │   │   ├── ppp_rec_19.txt
    │   │   ├── ppp_rec_2.txt
    │   │   ├── ppp_rec_12.txt
    │   │   ├── ppp_rec_14.txt
    │   │   ├── ppp_rec_15.txt
    │   │   ├── ppp_rec_7.txt
    │   │   ├── ppp_rec_9.txt
    │   │   ├── ppp_csv_2.txt
    │   │   ├── ppp_line_1.txt
    │   │   ├── ppp_rec_13.txt
    │   │   ├── ppp_csv_1.txt
    │   │   ├── ppp_rec_18.txt
    │   │   ├── ppp_rec_10.txt
    │   │   ├── ppp_rec_11.txt
    │   │   ├── ppp_csv_4.txt
    │   │   ├── ppp_text_3.txt
    │   │   ├── ppp_text_4.txt
    │   │   └── ppp_rec_16.txt
    │   └── input
    │   │   ├── echo_line_1.txt
    │   │   ├── echo_rec_2.txt
    │   │   ├── echo_line_2.txt
    │   │   ├── echo_rec_3.txt
    │   │   ├── echo_rec_1.txt
    │   │   ├── echo_rec_4.txt
    │   │   ├── staff.csv
    │   │   ├── staff.txt
    │   │   ├── staff.jsonlines.txt
    │   │   ├── staff.xml
    │   │   └── staff.json
    └── test_pypipe_examples.py
├── .vscode
    └── settings.json
├── docs
    ├── view_sample1.png
    ├── view_sample2.png
    ├── view_sample3.png
    ├── bat_pager_sample.png
    ├── staff.csv
    ├── staff.txt
    ├── cinema.csv
    ├── staff.jsonlines.txt
    ├── population.csv
    ├── create_demo.sh
    ├── staff.xml
    ├── staff.json
    └── people.csv
├── wasmer.toml
├── .pre-commit-config.yaml
├── .github
    └── workflows
    │   ├── ci.yml
    │   └── release.yml
├── pyproject.toml
├── LICENSE
├── pypipe.py
└── README.md

/tests/__init__.py:
--------------------------------------------------------------------------------
1 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_text_5.txt:
--------------------------------------------------------------------------------
1 | 5
2 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_6.txt:
--------------------------------------------------------------------------------
1 | CCC
2 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_8.txt:
--------------------------------------------------------------------------------
1 | 1000
2 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_text_1.txt:
--------------------------------------------------------------------------------
1 | 231
2 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_20.txt:
--------------------------------------------------------------------------------
1 | TOTAL WEIGHT 4271
2 | 
--------------------------------------------------------------------------------
/tests/data/input/echo_line_1.txt:
--------------------------------------------------------------------------------
1 | 1
2 | 2
3 | 3
4 | 4
5 | 5
6 | 
--------------------------------------------------------------------------------
/tests/data/input/echo_rec_2.txt:
--------------------------------------------------------------------------------
1 | AAA BBB CCC DDD
2 | 
--------------------------------------------------------------------------------
/tests/data/input/echo_line_2.txt:
--------------------------------------------------------------------------------
1 | https://github.com/bugen/pypipe
2 | 
--------------------------------------------------------------------------------
/tests/data/input/echo_rec_3.txt:
--------------------------------------------------------------------------------
1 | Height: 200px, Width: 1000px
2 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_line_4.txt:
--------------------------------------------------------------------------------
1 | https github.com /bugen/pypipe
2 | 
--------------------------------------------------------------------------------
/.vscode/settings.json:
--------------------------------------------------------------------------------
1 | {
2 |   "markdown.extension.toc.levels": "1..2"
3 | }
--------------------------------------------------------------------------------
/tests/data/expect/ppp_line_2.txt:
--------------------------------------------------------------------------------
1 | Simba
2 | Dumbo
3 | George
4 | Pooh
5 | Bob
6 | 
--------------------------------------------------------------------------------
/tests/data/input/echo_rec_1.txt:
--------------------------------------------------------------------------------
1 | Hello 100 10.2 True {"id":100,"title":"sample"}
2 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_5.txt:
--------------------------------------------------------------------------------
1 | Name
2 | Simba
3 | Dumbo
4 | George
5 | Pooh
6 | Bob
7 | 
--------------------------------------------------------------------------------
/docs/view_sample1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bugen/pypipe/HEAD/docs/view_sample1.png
--------------------------------------------------------------------------------
/docs/view_sample2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bugen/pypipe/HEAD/docs/view_sample2.png
--------------------------------------------------------------------------------
/docs/view_sample3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bugen/pypipe/HEAD/docs/view_sample3.png
--------------------------------------------------------------------------------
/docs/bat_pager_sample.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bugen/pypipe/HEAD/docs/bat_pager_sample.png
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_4.txt:
--------------------------------------------------------------------------------
1 | 
2 | 
--------------------------------------------------------------------------------
/tests/data/input/echo_rec_4.txt:
--------------------------------------------------------------------------------
1 | Hello 100 10.2 True None (1,2,3) [1,2,3] {1,2,3} {"id":100,"title":"sample"}
2 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_csv_3.txt:
--------------------------------------------------------------------------------
1 | Simba,250,1994-06-15,29,Lion,Mammal
2 | Dumbo,4000,1941-10-23,81,Elephant,Mammal
3 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_line_3.txt:
--------------------------------------------------------------------------------
1 | 1 1.0
2 | 2 1.4142135623730951
3 | 3 1.7320508075688772
4 | 4 2.0
5 | 5 2.23606797749979
6 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_3.txt:
--------------------------------------------------------------------------------
1 | Simba 1994-06-15
2 | Dumbo 1941-10-23
3 | George 1939-01-01
4 | Pooh 1921-08-21
5 | Bob 1999-05-01
6 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_text_2.txt:
--------------------------------------------------------------------------------
1 | {'Name': 'Simba', 'Weight': 250, 'Birth': '1994-06-15', 'Age': 29, 'Species': 'Lion', 'Class': 'Mammal'}
2 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_1.txt:
--------------------------------------------------------------------------------
1 | Name Weight Birth
2 | Simba 250 1994-06-15
3 | Dumbo 4000 1941-10-23
4 | George 20 1939-01-01
5 | Pooh 1 1921-08-21
6 | Bob 0 1999-05-01
7 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_17.txt:
--------------------------------------------------------------------------------
1 | Birth Weight Name
2 | 1994-06-15 250 Simba
3 | 1941-10-23 4000 Dumbo
4 | 1939-01-01 20 George
5 | 1921-08-21 1 Pooh
6 | 1999-05-01 0 Bob
7 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_19.txt:
--------------------------------------------------------------------------------
1 | Name Weight Birth
2 | Simba 250 1994-06-15
3 | Dumbo 4000 1941-10-23
4 | George 20 1939-01-01
5 | Pooh 1 1921-08-21
6 | Bob 0 1999-05-01
7 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_2.txt:
--------------------------------------------------------------------------------
1 | Name is Age years old
2 | Simba is 29 years old
3 | Dumbo is 81 years old
4 | George is 84 years old
5 | Pooh is 102 years old
6 | Bob is 24 years old
7 | 
--------------------------------------------------------------------------------
/docs/staff.csv:
--------------------------------------------------------------------------------
1 | Name,Weight,Birth,Age,Species,Class
2 | Simba,250,1994-06-15,29,Lion,Mammal
3 | Dumbo,4000,1941-10-23,81,Elephant,Mammal
4 | George,20,1939-01-01,84,Monkey,Mammal
5 | Pooh,1,1921-08-21,102,Teddy bear,Artifact
6 | Bob,0,1999-05-01,24,Sponge,Demosponge
7 | 
--------------------------------------------------------------------------------
/docs/staff.txt:
--------------------------------------------------------------------------------
1 | Name Weight Birth Age Species Class
2 | Simba 250 1994-06-15 29 Lion Mammal
3 | Dumbo 4000 1941-10-23 81 Elephant Mammal
4 | George 20 1939-01-01 84 Monkey Mammal
5 | Pooh 1 1921-08-21 102 Teddy bear Artifact
6 | Bob 0 1999-05-01 24 Sponge Demosponge
7 | 
--------------------------------------------------------------------------------
/tests/data/input/staff.csv:
--------------------------------------------------------------------------------
1 | Name,Weight,Birth,Age,Species,Class
2 | Simba,250,1994-06-15,29,Lion,Mammal
3 | Dumbo,4000,1941-10-23,81,Elephant,Mammal
4 | George,20,1939-01-01,84,Monkey,Mammal
5 | Pooh,1,1921-08-21,102,Teddy bear,Artifact
6 | Bob,0,1999-05-01,24,Sponge,Demosponge
7 | 
--------------------------------------------------------------------------------
/tests/data/input/staff.txt:
--------------------------------------------------------------------------------
1 | Name Weight Birth Age Species Class
2 | Simba 250 1994-06-15 29 Lion Mammal
3 | Dumbo 4000 1941-10-23 81 Elephant Mammal
4 | George 20 1939-01-01 84 Monkey Mammal
5 | Pooh 1 1921-08-21 102 Teddy bear Artifact
6 | Bob 0 1999-05-01 24 Sponge Demosponge
7 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_12.txt:
--------------------------------------------------------------------------------
1 | Name,Weight,Birth,Age,Species,Class
2 | Simba,250,1994-06-15,29,Lion,Mammal
3 | Dumbo,4000,1941-10-23,81,Elephant,Mammal
4 | George,20,1939-01-01,84,Monkey,Mammal
5 | Pooh,1,1921-08-21,102,Teddy bear,Artifact
6 | Bob,0,1999-05-01,24,Sponge,Demosponge
7 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_14.txt:
--------------------------------------------------------------------------------
1 | Name Weight Birth Age Species Class
2 | Simba 250 1994-06-15 29 Lion Mammal
3 | Dumbo 4000 1941-10-23 81 Elephant Mammal
4 | George 20 1939-01-01 84 Monkey Mammal
5 | Pooh 1 1921-08-21 102 Teddy bear Artifact
6 | Bob 0 1999-05-01 24 Sponge Demosponge
7 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_15.txt:
--------------------------------------------------------------------------------
1 | Name Weight Birth Age Species Class
2 | Simba 250 1994 06 15 29 Lion Mammal
3 | Dumbo 4000 1941 10 23 81 Elephant Mammal
4 | George 20 1939 01 01 84 Monkey Mammal
5 | Pooh 1 1921 08 21 102 Teddy bear Artifact
6 | Bob 0 1999 05 01 24 Sponge Demosponge
7 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_7.txt:
--------------------------------------------------------------------------------
1 | Name,Weight,Birth,Age,Species,Class
2 | Simba,250,1994-06-15,29,Lion,Mammal
3 | Dumbo,4000,1941-10-23,81,Elephant,Mammal
4 | George,20,1939-01-01,84,Monkey,Mammal
5 | Pooh,1,1921-08-21,102,Teddy bear,Artifact
6 | Bob,0,1999-05-01,24,Sponge,Demosponge
7 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_9.txt:
--------------------------------------------------------------------------------
1 | Name Weight Birth Age Species Class
2 | Simba 250 1994-06-15 29 Lion Mammal
3 | Dumbo 4000 1941-10-23 81 Elephant Mammal
4 | George 20 1939-01-01 84 Monkey Mammal
5 | Pooh 1 1921-08-21 102 Teddy bear Artifact
6 | Bob 0 1999-05-01 24 Sponge Demosponge
7 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_csv_2.txt:
--------------------------------------------------------------------------------
1 | Name Weight Birth Age Species Class
2 | Simba 250 1994-06-15 29 Lion Mammal
3 | Dumbo 4000 1941-10-23 81 Elephant Mammal
4 | George 20 1939-01-01 84 Monkey Mammal
5 | Pooh 1 1921-08-21 102 Teddy bear Artifact
6 | Bob 0 1999-05-01 24 Sponge Demosponge
7 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_line_1.txt:
--------------------------------------------------------------------------------
1 | 1 NAME WEIGHT BIRTH AGE SPECIES CLASS
2 | 2 SIMBA 250 1994-06-15 29 LION MAMMAL
3 | 3 DUMBO 4000 1941-10-23 81 ELEPHANT MAMMAL
4 | 4 GEORGE 20 1939-01-01 84 MONKEY MAMMAL
5 | 5 POOH 1 1921-08-21 102 TEDDY BEAR ARTIFACT
6 | 6 BOB 0 1999-05-01 24 SPONGE DEMOSPONGE
7 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_13.txt:
--------------------------------------------------------------------------------
1 | Name||Weight||Birth||Age||Species||Class
2 | Simba||250||1994-06-15||29||Lion||Mammal
3 | Dumbo||4000||1941-10-23||81||Elephant||Mammal
4 | George||20||1939-01-01||84||Monkey||Mammal
5 | Pooh||1||1921-08-21||102||Teddy bear||Artifact
6 | Bob||0||1999-05-01||24||Sponge||Demosponge
7 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_csv_1.txt:
--------------------------------------------------------------------------------
1 | "Name","Weight","Birth","Age","Species","Class"
2 | "Simba","250","1994-06-15","29","Lion","Mammal"
3 | "Dumbo","4000","1941-10-23","81","Elephant","Mammal"
4 | "George","20","1939-01-01","84","Monkey","Mammal"
5 | "Pooh","1","1921-08-21","102","Teddy bear","Artifact"
6 | "Bob","0","1999-05-01","24","Sponge","Demosponge"
7 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_18.txt:
--------------------------------------------------------------------------------
1 | [Record 1]
2 | 1 ('Hello', )
3 | 2 (100, )
4 | 3 (10.2, )
5 | 4 (True, )
6 | 5 (None, )
7 | 6 ((1, 2, 3), )
8 | 7 ([1, 2, 3], )
9 | 8 ({1, 2, 3}, )
10 | 9 ({'id': 100, 'title': 'sample'}, )
11 | 
12 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_10.txt:
--------------------------------------------------------------------------------
1 | ["Name", "Weight", "Birth", "Age", "Species", "Class"]
2 | ["Simba", "250", "1994-06-15", "29", "Lion", "Mammal"]
3 | ["Dumbo", "4000", "1941-10-23", "81", "Elephant", "Mammal"]
4 | ["George", "20", "1939-01-01", "84", "Monkey", "Mammal"]
5 | ["Pooh", "1", "1921-08-21", "102", "Teddy bear", "Artifact"]
6 | ["Bob", "0", "1999-05-01", "24", "Sponge", "Demosponge"]
7 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_11.txt:
--------------------------------------------------------------------------------
1 | ['Name', 'Weight', 'Birth', 'Age', 'Species', 'Class']
2 | ['Simba', '250', '1994-06-15', '29', 'Lion', 'Mammal']
3 | ['Dumbo', '4000', '1941-10-23', '81', 'Elephant', 'Mammal']
4 | ['George', '20', '1939-01-01', '84', 'Monkey', 'Mammal']
5 | ['Pooh', '1', '1921-08-21', '102', 'Teddy bear', 'Artifact']
6 | ['Bob', '0', '1999-05-01', '24', 'Sponge', 'Demosponge']
7 | 
--------------------------------------------------------------------------------
/docs/cinema.csv:
--------------------------------------------------------------------------------
1 | Title,Year,Director,Genre,Revenue,Budget
2 | "Cutthroat Island",1995,"Renny Harlin","Adventure, Action",10,115
3 | "Town & Country",2001,"Peter Chelsom","Comedy, Romance",10.4,90
4 | "The Alamo",2004,"John Lee Hancock","Drama, Western",25.8,107
5 | "47 Ronin",2013,"Carl Rinsch","Action, Adventure",151.8,175
6 | "Ishtar",1987,"Elaine May","Action, Adventure",14.4,55
7 | "The Adventures of Pluto Nash",2002,"Ron Underwood","Action, Comedy",1.0,100
8 | 
--------------------------------------------------------------------------------
/wasmer.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "bugen/pypipe"
3 | version = "0.4.0"
4 | description = "pypipe in Wasmer"
5 | repository = "https://github.com/bugen/pypipe"
6 | readme = "README.md"
7 | 
8 | [dependencies]
9 | "python/python" = "^0.2.0"
10 | 
11 | [fs]
12 | "/src" = "./"
13 | 
14 | [[command]]
15 | name = "ppp"
16 | module = "python/python:python"
17 | runner = "wasi"
18 | 
19 | [command.annotations.wasi]
20 | main-args = [
21 |     "/src/pypipe.py",
22 | ]
23 | env = ["PYTHONEXECUTABLE: /bin/python"]
24 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_csv_4.txt:
--------------------------------------------------------------------------------
1 | ,,,,,
2 | ,,,,,
3 | ,,,,,
4 | ,,,,,
5 | ,,,,,
6 | ,,,,,
7 | 
--------------------------------------------------------------------------------
/docs/staff.jsonlines.txt:
--------------------------------------------------------------------------------
1 | {"Name": "Simba", "Weight": 250, "Birth": "1994-06-15", "Age": 29, "Species": "Lion", "Class": "Mammal"}
2 | {"Name": "Dumbo", "Weight": 4000, "Birth": "1941-10-23", "Age": 81, "Species": "Elephant", "Class": "Mammal"}
3 | {"Name": "George", "Weight": 20, "Birth": "1939-01-01", "Age": 84, "Species": "Monkey", "Class": "Mammal"}
4 | {"Name": "Pooh", "Weight": 1, "Birth": "1921-08-21", "Age": 102, "Species": "Teddy bear", "Class": "Artifact"}
5 | {"Name": "Bob", "Weight": 0, "Birth": "1999-05-01", "Age": 24, "Species": "Sponge", "Class": "Demosponge"}
6 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_text_3.txt:
--------------------------------------------------------------------------------
1 | {"Name": "Simba", "Weight": 250, "Birth": "1994-06-15", "Age": 29, "Species": "Lion", "Class": "Mammal"}
2 | {"Name": "Dumbo", "Weight": 4000, "Birth": "1941-10-23", "Age": 81, "Species": "Elephant", "Class": "Mammal"}
3 | {"Name": "George", "Weight": 20, "Birth": "1939-01-01", "Age": 84, "Species": "Monkey", "Class": "Mammal"}
4 | {"Name": "Pooh", "Weight": 1, "Birth": "1921-08-21", "Age": 102, "Species": "Teddy bear", "Class": "Artifact"}
5 | {"Name": "Bob", "Weight": 0, "Birth": "1999-05-01", "Age": 24, "Species": "Sponge", "Class": "Demosponge"}
6 | 
--------------------------------------------------------------------------------
/tests/data/input/staff.jsonlines.txt:
--------------------------------------------------------------------------------
1 | {"Name": "Simba", "Weight": 250, "Birth": "1994-06-15", "Age": 29, "Species": "Lion", "Class": "Mammal"}
2 | {"Name": "Dumbo", "Weight": 4000, "Birth": "1941-10-23", "Age": 81, "Species": "Elephant", "Class": "Mammal"}
3 | {"Name": "George", "Weight": 20, "Birth": "1939-01-01", "Age": 84, "Species": "Monkey", "Class": "Mammal"}
4 | {"Name": "Pooh", "Weight": 1, "Birth": "1921-08-21", "Age": 102, "Species": "Teddy bear", "Class": "Artifact"}
5 | {"Name": "Bob", "Weight": 0, "Birth": "1999-05-01", "Age": 24, "Species": "Sponge", "Class": "Demosponge"}
6 | 
--------------------------------------------------------------------------------
/docs/population.csv:
--------------------------------------------------------------------------------
1 | City,State,Population
2 | New York,New York,8398748
3 | Los Angeles,California,3990456
4 | Chicago,Illinois,2705994
5 | Houston,Texas,2320268
6 | Phoenix,Arizona,1680992
7 | Philadelphia,Pennsylvania,1584138
8 | San Antonio,Texas,1547253
9 | San Diego,California,1423851
10 | Dallas,Texas,1343573
11 | San Jose,California,1030119
12 | Austin,Texas,964254
13 | Jacksonville,Florida,903889
14 | Indianapolis,Indiana,876862
15 | San Francisco,California,883305
16 | Columbus,Ohio,892533
17 | Fort Worth,Texas,895008
18 | Charlotte,North Carolina,792862
19 | Detroit,Michigan,673104
20 | El Paso,Texas,681124
21 | Seattle,Washington,753675
22 | 
--------------------------------------------------------------------------------
/.pre-commit-config.yaml:
--------------------------------------------------------------------------------
1 | # See https://pre-commit.com for more information
2 | # See https://pre-commit.com/hooks.html for more hooks
3 | repos:
4 |   - repo: https://github.com/pre-commit/pre-commit-hooks
5 |     rev: v3.2.0
6 |     hooks:
7 |       - id: trailing-whitespace
8 |         exclude: ^(.+\.svg$|tests/.+\.txt)$
9 |       - id: end-of-file-fixer
10 |         exclude: ^tests/.+\.txt$
11 |       - id: check-yaml
12 |       - id: check-added-large-files
13 |   - repo: https://github.com/pycqa/isort
14 |     rev: 5.12.0
15 |     hooks:
16 |       - id: isort
17 |   - repo: local
18 |     hooks:
19 |       - id: pytest-check
20 |         name: pytest-check
21 |         entry: pytest
22 |         language: system
23 |         pass_filenames: false
24 |         always_run: true
25 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_text_4.txt:
--------------------------------------------------------------------------------
1 | [Record 1]
2 | 1 {'data': [{'Age': 29, 'Birth': '1994-06-15', 'Class': 'Mammal', 'Name': 'Simba', 'Species': 'Lion', 'Weight': 250},
3 | . {'Age': 81, 'Birth': '1941-10-23', 'Class': 'Mammal', 'Name': 'Dumbo', 'Species': 'Elephant', 'Weight': 4000},
4 | . {'Age': 84, 'Birth': '1939-01-01', 'Class': 'Mammal', 'Name': 'George', 'Species': 'Monkey', 'Weight': 20},
5 | . {'Age': 102,
6 | . 'Birth': '1921-08-21',
7 | . 'Class': 'Artifact',
8 | . 'Name': 'Pooh',
9 | . 'Species': 'Teddy bear',
10 | . 'Weight': 1},
11 | . {'Age': 24, 'Birth': '1999-05-01', 'Class': 'Demosponge', 'Name': 'Bob', 'Species': 'Sponge', 'Weight': 0}],
12 | . 'number_of_records': 5}
13 | 
14 | 
--------------------------------------------------------------------------------
/tests/data/expect/ppp_rec_16.txt:
--------------------------------------------------------------------------------
1 | [Record 1]
2 | 1 | Name | Simba
3 | 2 | Weight | 250
4 | 3 | Birth | 1994-06-15
5 | 4 | Age | 29
6 | 5 | Species | Lion
7 | 6 | Class | Mammal
8 | 
9 | [Record 2]
10 | 1 | Name | Dumbo
11 | 2 | Weight | 4000
12 | 3 | Birth | 1941-10-23
13 | 4 | Age | 81
14 | 5 | Species | Elephant
15 | 6 | Class | Mammal
16 | 
17 | [Record 3]
18 | 1 | Name | George
19 | 2 | Weight | 20
20 | 3 | Birth | 1939-01-01
21 | 4 | Age | 84
22 | 5 | Species | Monkey
23 | 6 | Class | Mammal
24 | 
25 | [Record 4]
26 | 1 | Name | Pooh
27 | 2 | Weight | 1
28 | 3 | Birth | 1921-08-21
29 | 4 | Age | 102
30 | 5 | Species | Teddy bear
31 | 6 | Class | Artifact
32 | 
33 | [Record 5]
34 | 1 | Name | Bob
35 | 2 | Weight | 0
36 | 3 | Birth | 1999-05-01
37 | 4 | Age | 24
38 | 5 | Species | Sponge
39 | 6 | Class | Demosponge
40 | 
41 | 
--------------------------------------------------------------------------------
/.github/workflows/ci.yml:
--------------------------------------------------------------------------------
1 | name: Run Unit Test via Pytest
2 | 
3 | on: [push]
4 | 
5 | jobs:
6 |   build:
7 |     runs-on: ubuntu-latest
8 |     strategy:
9 |       matrix:
10 |         python-version: ["3.10"]
11 | 
12 |     steps:
13 |       - uses: actions/checkout@v3
14 |       - name: Set up Python ${{ matrix.python-version }}
15 |         uses: actions/setup-python@v4
16 |         with:
17 |           python-version: ${{ matrix.python-version }}
18 |       - name: Install dependencies
19 |         run: |
20 |           python -m pip install --upgrade pip
21 |           if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
22 |           if ! pip show pytest; then pip install pytest; fi
23 |       - name: Lint with Ruff
24 |         run: |
25 |           pip install ruff
26 |           ruff --output-format=github --target-version=py310 .
27 |         continue-on-error: true
28 |       - name: Test with pytest
29 |         run: |
30 |           pytest -v -s
31 | 
--------------------------------------------------------------------------------
/docs/create_demo.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | # HOW TO USE
4 | # 1. Install termtosvg https://github.com/nbedos/termtosvg
5 | # 2. termtosvg -c ./create_demo.sh demo.svg
6 | 
7 | set -e
8 | set -u
9 | 
10 | delay=${1:-0}
11 | PROMPT="$"
12 | 
13 | enter() {
14 |     INPUT=$1
15 |     sleep 1
16 |     type "$INPUT"
17 |     sleep 0.5
18 |     printf '%b' "\\n"
19 |     eval "$INPUT"
20 |     type "\\n"
21 |     prompt
22 | }
23 | 
24 | prompt() {
25 |     printf '%b ' "$PROMPT" | pv -q
26 | }
27 | 
28 | type() {
29 |     printf '%b' "$1" | pv -qL $((10+(-2 + RANDOM%5)))
30 | }
31 | 
32 | clear -x
33 | prompt
34 | sleep ${delay}
35 | 
36 | main() {
37 |     IFS='%'
38 |     enter "cat staff.txt"
39 |     enter "cat staff.txt| ppp 'line.upper()'"
40 |     enter "cat staff.txt| ppp rec 'r[0]'"
41 |     enter "cat staff.txt| ppp rec -l 6 f6,f5,f1"
42 |     enter "cat staff.txt| ppp rec -H -f 'dic[\"Class\"] != \"Mammal\"'"
43 |     enter "cat staff.txt| ppp rec -H -l6 --counter f6"
44 |     enter "cat staff.txt| ppp rec -H --view"
45 |     sleep 2
46 |     type "\\n"
47 |     unset IFS
48 | }
49 | 
50 | main
51 | 
--------------------------------------------------------------------------------
/docs/staff.xml:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | Simba
4 | 250
5 | 1994-06-15
6 | 29
7 | Lion
8 | Mammal
9 | 
10 | 
11 | Dumbo
12 | 4000
13 | 1941-10-23
14 | 81
15 | Elephant
16 | Mammal
17 | 
18 | 
19 | George
20 | 20
21 | 1939-01-01
22 | 84
23 | Monkey
24 | Mammal
25 | 
26 | 
27 | Pooh
28 | 1
29 | 1921-08-21
30 | 102
31 | Teddy bear
32 | Artifact
33 | 
34 | 
35 | Bob
36 | 0
37 | 1999-05-01
38 | 24
39 | Sponge
40 | Demosponge
41 | 
42 | 
43 | 
44 | 
--------------------------------------------------------------------------------
/docs/staff.json:
--------------------------------------------------------------------------------
1 | {
2 |   "number_of_records": 5,
3 |   "data": [
4 |     {
5 |       "Name": "Simba",
6 |       "Weight": 250,
7 |       "Birth": "1994-06-15",
8 |       "Age": 29,
9 |       "Species": "Lion",
10 |       "Class": "Mammal"
11 |     },
12 |     {
13 |       "Name": "Dumbo",
14 |       "Weight": 4000,
15 |       "Birth": "1941-10-23",
16 |       "Age": 81,
17 |       "Species": "Elephant",
18 |       "Class": "Mammal"
19 |     },
20 |     {
21 |       "Name": "George",
22 |       "Weight": 20,
23 |       "Birth": "1939-01-01",
24 |       "Age": 84,
25 |       "Species": "Monkey",
26 |       "Class": "Mammal"
27 |     },
28 |     {
29 |       "Name": "Pooh",
30 |       "Weight": 1,
31 |       "Birth": "1921-08-21",
32 |       "Age": 102,
33 |       "Species": "Teddy bear",
34 |       "Class": "Artifact"
35 |     },
36 |     {
37 |       "Name": "Bob",
38 |       "Weight": 0,
39 |       "Birth": "1999-05-01",
40 |       "Age": 24,
41 |       "Species": "Sponge",
42 |       "Class": "Demosponge"
43 |     }
44 |   ]
45 | }
46 | 
--------------------------------------------------------------------------------
/tests/data/input/staff.xml:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | Simba
4 | 250
5 | 1994-06-15
6 | 29
7 | Lion
8 | Mammal
9 | 
10 | 
11 | Dumbo
12 | 4000
13 | 1941-10-23
14 | 81
15 | Elephant
16 | Mammal
17 | 
18 | 
19 | George
20 | 20
21 | 1939-01-01
22 | 84
23 | Monkey
24 | Mammal
25 | 
26 | 
27 | Pooh
28 | 1
29 | 1921-08-21
30 | 102
31 | Teddy bear
32 | Artifact
33 | 
34 | 
35 | Bob
36 | 0
37 | 1999-05-01
38 | 24
39 | Sponge
40 | Demosponge
41 | 
42 | 
43 | 
44 | 
--------------------------------------------------------------------------------
/tests/data/input/staff.json:
--------------------------------------------------------------------------------
1 | {
2 |   "number_of_records": 5,
3 |   "data": [
4 |     {
5 |       "Name": "Simba",
6 |       "Weight": 250,
7 |       "Birth": "1994-06-15",
8 |       "Age": 29,
9 |       "Species": "Lion",
10 |       "Class": "Mammal"
11 |     },
12 |     {
13 |       "Name": "Dumbo",
14 |       "Weight": 4000,
15 |       "Birth": "1941-10-23",
16 |       "Age": 81,
17 |       "Species": "Elephant",
18 |       "Class": "Mammal"
19 |     },
20 |     {
21 |       "Name": "George",
22 |       "Weight": 20,
23 |       "Birth": "1939-01-01",
24 |       "Age": 84,
25 |       "Species": "Monkey",
26 |       "Class": "Mammal"
27 |     },
28 |     {
29 |       "Name": "Pooh",
30 |       "Weight": 1,
31 |       "Birth": "1921-08-21",
32 |       "Age": 102,
33 |       "Species": "Teddy bear",
34 |       "Class": "Artifact"
35 |     },
36 |     {
37 |       "Name": "Bob",
38 |       "Weight": 0,
39 |       "Birth": "1999-05-01",
40 |       "Age": 24,
41 |       "Species": "Sponge",
42 |       "Class": "Demosponge"
43 |     }
44 |   ]
45 | }
46 | 
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [build-system]
2 | requires = ["hatchling"]
3 | build-backend = "hatchling.build"
4 | 
5 | [project]
6 | name = "pypipe-ppp"
7 | dynamic = ["version"]
8 | description = 'A Python command-line tool for pipeline processing'
9 | readme = "README.md"
10 | requires-python = ">=3.6"
11 | license = "Apache-2.0"
12 | keywords = []
13 | authors = [
14 |   { name = "bugen", email = "paradise.on.paradise@gmail.com" },
15 | ]
16 | classifiers = [
17 |   "Development Status :: 4 - Beta",
18 |   "Programming Language :: Python",
19 |   "Programming Language :: Python :: 3.6",
20 |   "Programming Language :: Python :: 3.7",
21 |   "Programming Language :: Python :: 3.8",
22 |   "Programming Language :: Python :: 3.9",
23 |   "Programming Language :: Python :: 3.10",
24 |   "Programming Language :: Python :: 3.11",
25 |   "Programming Language :: Python :: 3.12",
26 | ]
27 | dependencies = []
28 | 
29 | [project.urls]
30 | Documentation = "https://github.com/bugen/pypipe#readme"
31 | Issues = "https://github.com/bugen/pypipe/issues"
32 | Source = "https://github.com/bugen/pypipe"
33 | 
34 | [project.scripts]
35 | pypipe = "pypipe:main"
36 | # To support `pipx run`, there should be a console script whose name matches
37 | # the PyPI name of the package
38 | pypipe-ppp = "pypipe:main"
39 | # https://github.com/bugen/pypipe/pull/8#discussion_r1379798279
40 | ppp = "pypipe:main"
41 | 
42 | [tool.hatch.build.targets.wheel]
43 | packages = ["pypipe.py"]
44 | 
45 | [tool.hatch.version]
46 | path = "pypipe.py"
47 | 
--------------------------------------------------------------------------------
/docs/people.csv:
--------------------------------------------------------------------------------
1 | Name,Age,Hobby,Gender
2 | John Doe,30,Hiking,Male
3 | Jane Smith,25,Painting,Female
4 | Bob Johnson,40,Photography,Male
5 | Alice Brown,35,Reading,Female
6 | Eva Green,28,Cooking,Female
7 | Michael Wilson,45,Gardening,Male
8 | Sara Lee,32,Traveling,Female
9 | David Anderson,37,Playing Music,Male
10 | Emily Taylor,29,Dancing,Female
11 | James Clark,42,Woodworking,Male
12 | Olivia Harris,26,Yoga,Female
13 | Richard Davis,50,Running,Male
14 | Mia Thomas,33,Swimming,Female
15 | William White,29,Photography,Male
16 | Ava Hall,38,Hiking,Female
17 | Charles Turner,48,Painting,Male
18 | Sophia Allen,31,Reading,Female
19 | Daniel Lewis,27,Traveling,Male
20 | Grace Young,34,Cooking,Female
21 | Joseph Hill,36,Playing Music,Male
22 | Ella Moore,24,Dancing,Female
23 | Lily Adams,27,Hiking,Female
24 | Thomas Mitchell,49,Gardening,Male
25 | Chloe Nelson,30,Running,Female
26 | John Harris,43,Swimming,Male
27 | Sophia Walker,29,Photography,Female
28 | Matthew Hall,39,Yoga,Male
29 | Olivia Turner,32,Traveling,Female
30 | William Turner,37,Painting,Male
31 | Emma Wright,28,Cooking,Female
32 | John Garcia,45,Reading,Male
33 | Isabella Carter,26,Hiking,Female
34 | James Parker,44,Woodworking,Male
35 | Ava Lopez,33,Playing Music,Female
36 | David Scott,46,Dancing,Male
37 | Sophia Rodriguez,30,Photography,Female
38 | Ethan Adams,35,Gardening,Male
39 | Emily Baker,29,Traveling,Female
40 | Nathan Reed,38,Running,Male
41 | Abigail Bennett,28,Reading,Female
42 | Daniel Perez,41,Cooking,Male
43 | Madison Mitchell,31,Swimming,Female
44 | Christopher Martinez,40,Hiking,Male
45 | Avery Green,27,Yoga,Female
46 | Kevin King,49,Playing Music,Male
47 | Lily Turner,25,Dancing,Female
48 | Michael Nelson,42,Hiking,Male
49 | Chloe Baker,33,Painting,Female
50 | Jacob Williams,47,Reading,Male
51 | Ella Lewis,30,Cooking,Female
52 | Ryan Wright,39,Traveling,Male
53 | 
--------------------------------------------------------------------------------
/.github/workflows/release.yml:
--------------------------------------------------------------------------------
1 | name: Publish package to PyPI
2 | 
3 | on:
4 |   release:
5 |     types:
6 |       - published
7 | 
8 | permissions:
9 |   contents: read
10 | 
11 | jobs:
12 |   build:
13 |     name: Build distribution packages
14 |     runs-on: ubuntu-latest
15 |     steps:
16 |       - uses: actions/checkout@v3
17 |       - name: Set up Python
18 |         uses: actions/setup-python@v4
19 |       - name: Install pypa/build
20 |         run: python -m pip install --user build
21 |       - name: Build packages
22 |         run: python -m build
23 |       - uses: actions/upload-artifact@v3
24 |         with:
25 |           name: dist
26 |           path: dist/
27 |           if-no-files-found: error
28 |   test:
29 |     name: Run tests
30 |     runs-on: ubuntu-latest
31 |     needs:
32 |       - build
33 |     steps:
34 |       - uses: actions/checkout@v3
35 |       - name: Set up Python
36 |         uses: actions/setup-python@v4
37 |       - name: Download package
38 |         uses: actions/download-artifact@v3
39 |         with:
40 |           name: dist
41 |           path: dist/
42 |       - name: Install dependencies and package
43 |         run: |
44 |           python -m pip install ruff pytest
45 |           python -m pip install --no-index --find-links ./dist/ pypipe-ppp
46 |       - name: Lint with Ruff
47 |         run: ruff --output-format=github --target-version=py310 .
48 | - name: Test with pytest 49 | run: pytest -v -s 50 | publish-to-test-pypi: 51 | name: Publish packages to Test PyPI 52 | runs-on: ubuntu-latest 53 | # List the jobs that this one directly depends on: 54 | # - build because it needs the package to be built 55 | # - test to make sure it doesn't try uploading before tests pass 56 | needs: 57 | - build 58 | - test 59 | environment: test-pypi 60 | permissions: 61 | # this permission is mandatory for trusted publishing 62 | id-token: write 63 | steps: 64 | - uses: actions/download-artifact@v3 65 | with: 66 | name: dist 67 | path: dist/ 68 | - name: Publish packages to Test PyPI 69 | uses: pypa/gh-action-pypi-publish@release/v1 70 | with: 71 | repository-url: https://test.pypi.org/legacy/ 72 | print-hash: true 73 | publish-to-pypi: 74 | name: Publish packages to PyPI 75 | runs-on: ubuntu-latest 76 | # List the jobs that this one directly depends on: 77 | # - build because it needs the package to be built 78 | # - publish-to-test-pypi to make sure it doesn't try the real upload before the test one succeeds 79 | needs: 80 | - build 81 | - publish-to-test-pypi 82 | environment: pypi 83 | permissions: 84 | # this permission is mandatory for trusted publishing 85 | id-token: write 86 | steps: 87 | - uses: actions/download-artifact@v3 88 | with: 89 | name: dist 90 | path: dist/ 91 | - name: Publish packages to PyPI 92 | uses: pypa/gh-action-pypi-publish@release/v1 93 | with: 94 | print-hash: true 95 | -------------------------------------------------------------------------------- /tests/test_pypipe_examples.py: -------------------------------------------------------------------------------- 1 | import io 2 | import sys 3 | from pathlib import Path 4 | 5 | import pytest 6 | 7 | from pypipe import main 8 | 9 | TEST_DATA_DIR = Path(__file__).resolve().parent / 'data' 10 | 11 | 12 | @pytest.mark.parametrize('input_text_file_name, expected_text_file_name, command', [ 13 | ('staff.txt', 'ppp_line_1.txt', ['i, line.upper()', ]), 
14 | ('staff.jsonlines.txt', 'ppp_line_2.txt', ['-j', 'dic["Name"]']), 15 | ('echo_line_1.txt', 'ppp_line_3.txt', ['line, math.sqrt(int(line))',]), 16 | ('echo_line_2.txt', 'ppp_line_4.txt', ['urllib.parse.urlparse(line)',]), 17 | ('staff.txt', 'ppp_rec_1.txt', ['rec', 'r[:3]']), 18 | ('staff.txt', 'ppp_rec_2.txt', ['rec', '-l5', 'f"{f1} is {f4} years old"']), 19 | ('staff.txt', 'ppp_rec_3.txt', ['rec', '-H', 'rec[0], dic["Birth"]']), 20 | ('echo_rec_1.txt', 'ppp_rec_4.txt', ['rec', '-l5', '--type', '2:i,3:f,4:b,5:j', 21 | "type(f1),type(f2),type(f3),type(f4),type(f5)"]), 22 | ('staff.csv', 'ppp_rec_5.txt', ['rec', '-d', ',', '-l6', 'f1']), 23 | ('echo_rec_2.txt', 'ppp_rec_6.txt', ['rec', '-d', r'\s+', 'rec[2]']), 24 | ('staff.txt', 'ppp_rec_7.txt', ['rec', '-D' ',']), 25 | ('echo_rec_3.txt', 'ppp_rec_8.txt', ['rec', '-m', r'\d+', 'r[1]']), 26 | ('staff.txt', 'ppp_rec_9.txt', ['rec', '-Fd']), 27 | ('staff.txt', 'ppp_rec_10.txt', ['rec', '-Fj']), 28 | ('staff.txt', 'ppp_rec_11.txt', ['rec', '-Fn']), 29 | ('staff.txt', 'ppp_rec_12.txt', ['rec', '-D', ',']), 30 | ('staff.txt', 'ppp_rec_13.txt', ['rec', '-D', '||']), 31 | ('staff.txt', 'ppp_rec_14.txt', ['rec', '-d', r'\s+']), 32 | ('staff.txt', 'ppp_rec_15.txt', ['rec', '-m', r'\w+']), 33 | ('staff.txt', 'ppp_rec_16.txt', ['rec', '-v', '-H', '-knever']), 34 | ('staff.txt', 'ppp_rec_17.txt', ['rec', 'f3,f2,f1']), 35 | ('echo_rec_4.txt', 'ppp_rec_18.txt', ['rec', '--view', '-t', '[(v, type(v)) for v in rec]']), 36 | ('staff.txt', 'ppp_rec_19.txt', ['rec', 'print(f1,f2,f3)']), 37 | ('staff.txt', 'ppp_rec_20.txt', ['rec', '-H', '-t', '-c', 'counter["TOTAL WEIGHT"] += f2']), 38 | ('staff.csv', 'ppp_csv_1.txt', ['csv', '-O', 'quoting=csv.QUOTE_ALL']), 39 | ('staff.csv', 'ppp_csv_2.txt', ['csv', '-D', r'\t']), 40 | ('staff.csv', 'ppp_csv_3.txt', ['csv', '-H', '-f', 'int(f2) > 100']), 41 | ('staff.csv', 'ppp_csv_4.txt', ['csv', '-t', '[type(v) for v in rec]']), 42 | ('staff.txt', 'ppp_text_1.txt', ['text', "len(text)"]), 43 | 
('staff.json', 'ppp_text_2.txt', ['text', '-j', 'dic["data"][0]']), 44 | ('staff.json', 'ppp_text_3.txt', ['text', '-j', '-L', '-Fj', '*dic["data"]']), 45 | ('staff.json', 'ppp_text_4.txt', ['text', '-j', '-v', '-knever', 'dic']), 46 | ('staff.json', 'ppp_text_5.txt', ['text', '--convert', '-Fj', 'text["number_of_records"]']), 47 | ]) 48 | def test_ppp_common(input_text_file_name, expected_text_file_name, command, capsys): 49 | ex_data: str 50 | try: 51 | sys.stdin = open(TEST_DATA_DIR / 'input' / input_text_file_name) 52 | with open(TEST_DATA_DIR / 'expect' / expected_text_file_name) as file: 53 | ex_data = file.read() 54 | 55 | main(command) 56 | out, err = capsys.readouterr() 57 | assert out.replace("\r\n", "\n") == ex_data.replace("\r\n", "\n") 58 | finally: 59 | sys.stdin.close() 60 | 61 | 62 | def test_ppp_file(capsys): 63 | f = io.StringIO() 64 | f.write(str(TEST_DATA_DIR / 'input' / 'staff.json')+"\n") 65 | f.write(str(TEST_DATA_DIR / 'input' / 'staff.txt')+"\n") 66 | f.seek(0) 67 | sys.stdin = f 68 | 69 | main(['file', 'path, len(text)']) 70 | out, err = capsys.readouterr() 71 | expect = [ 72 | str(TEST_DATA_DIR / 'input' / 'staff.json') + "\t" + "1046\n", 73 | str(TEST_DATA_DIR / 'input' / 'staff.txt') + "\t" + "231\n" 74 | ] 75 | assert out == ''.join(expect) 76 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 
14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /pypipe.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | """ 3 | pypipe.py 4 | 5 | Copyright 2023 bugen 6 | 7 | Licensed under the Apache License, Version 2.0 (the "License"); 8 | you may not use this file except in compliance with the License. 9 | You may obtain a copy of the License at 10 | 11 | http://www.apache.org/licenses/LICENSE-2.0 12 | 13 | Unless required by applicable law or agreed to in writing, software 14 | distributed under the License is distributed on an "AS IS" BASIS, 15 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 16 | See the License for the specific language governing permissions and 17 | limitations under the License. 
18 | """ 19 | import argparse 20 | import ast 21 | import atexit 22 | import importlib.util 23 | import re 24 | import shutil 25 | import signal 26 | import subprocess 27 | import sys 28 | from os import chmod, environ 29 | 30 | __version__ = "0.4.1" 31 | 32 | 33 | INDENT = " " * 4 34 | 35 | FIELD_TYPE_TMPL = { 36 | "i": r"int({})", 37 | "f": r"float({})", 38 | "b": r"bool({})", 39 | "j": r"json.loads({})", 40 | } 41 | 42 | TEMPLATE_LINE = r""" 43 | {imp} 44 | 45 | {pre} 46 | 47 | for i, line in enumerate(sys.stdin, 1): 48 | line = line.rstrip("\r\n") 49 | l = line # ABBREV 50 | {loop_head} 51 | {loop_filter} 52 | {main} 53 | 54 | {post} 55 | """ 56 | 57 | TEMPLATE_REC = r""" 58 | {imp} 59 | 60 | {prepre} 61 | {pre} 62 | 63 | for i, line in enumerate(sys.stdin, 1): 64 | line = line.rstrip("\r\n") 65 | {parse_line} 66 | r = rec # ABBREV 67 | {loop_head} 68 | {loop_filter} 69 | {main} 70 | 71 | {post} 72 | """ 73 | 74 | TEMPLATE_CSV = r""" 75 | {imp} 76 | 77 | def _write(*args, writer=None): 78 | if len(args) == 1 and isinstance(args[0], (list, tuple)): 79 | writer.writerow(args[0]) 80 | else: 81 | writer.writerow(args) 82 | 83 | 84 | reader = csv.reader(sys.stdin, {reader_opts}) 85 | writer = csv.writer(sys.stdout, {writer_opts}) 86 | _w = writer.writerow # ABBREV 87 | {prepre} 88 | {pre} 89 | 90 | for i, rec in enumerate(reader, 1): 91 | r = rec # ABBREV 92 | {loop_head} 93 | {loop_filter} 94 | {main} 95 | 96 | {post} 97 | """ 98 | 99 | TEMPLATE_TEXT = r""" 100 | {imp} 101 | 102 | {pre} 103 | 104 | text = sys.stdin.read() 105 | {pre_main} 106 | {main} 107 | 108 | {post} 109 | """ 110 | 111 | TEMPLATE_FILE = r""" 112 | {imp} 113 | 114 | def _open(path): 115 | if path.suffix == '.gz': 116 | return gzip.open(path, '{mode}') 117 | else: 118 | return open(path, '{mode}') 119 | 120 | {pre} 121 | 122 | for i, line in enumerate(sys.stdin, 1): 123 | path = Path(line.rstrip('\r\n')) 124 | with _open(path) as file: 125 | text = file.read() 126 | {loop_head} 127 | 
{loop_filter} 128 | {main} 129 | 130 | {post} 131 | """ 132 | 133 | PRINT_FUNC = r""" 134 | def _print(*args, sep='{sep}'): 135 | if len(args) == 1 and isinstance(args[0], (list, tuple)): 136 | print(sep.join(str(v) for v in args[0])) 137 | else: 138 | print(sep.join(str(v) for v in args)) 139 | """ 140 | 141 | PRINT_FUNC_JSON = r""" 142 | def _print(*args, sep='{sep}'): 143 | print(sep.join(json.dumps(v) for v in args)) 144 | """ 145 | 146 | PRINT_FUNC_NATIVE = r"_print = partial(print, sep='{sep}')" 147 | 148 | FORMAT_PRINT_FUNC = { 149 | "default": PRINT_FUNC, "d": PRINT_FUNC, 150 | "json": PRINT_FUNC_JSON, "j": PRINT_FUNC_JSON, 151 | "native": PRINT_FUNC_NATIVE, "n": PRINT_FUNC_NATIVE, 152 | } 153 | 154 | CONVERT_FUNC = r""" 155 | PATTERN_NUMERIC = re.compile(r'^[\d.-]+$') 156 | CONV_DIC = {"true": True, "false": False, "none": None, "null": None} 157 | 158 | def _deserialize(val, funcs): 159 | for func in funcs: 160 | try: 161 | return func(val) 162 | except: 163 | continue 164 | return val 165 | 166 | def _convert(val): 167 | lower = val.lower() 168 | if lower in CONV_DIC: 169 | return CONV_DIC[lower] 170 | elif PATTERN_NUMERIC.match(val): 171 | return _deserialize(val, [int, float]) 172 | elif val.startswith(('{', '[', '(')): 173 | return _deserialize(val, [json.loads, eval]) 174 | return val 175 | """ 176 | 177 | COUNTER_POST = r""" 178 | for v, c in counter.most_common(): 179 | v = "\t".join(str(x) for x in v) if isinstance(v, (list, set, tuple)) else v 180 | print(f"{v}\t{c}") 181 | """.lstrip() 182 | 183 | 184 | VIEW_TMPL = r""" 185 | CLEAR = '\033[0m' 186 | GREEN = '\033[32m' 187 | CYAN = '\033[36m' 188 | BOLD = '\033[1m' 189 | 190 | def color(s, color_code=CYAN, bold=False): 191 | if color_code is None: 192 | return s 193 | return f"{BOLD}{color_code}{s}{CLEAR}" if bold else f"{color_code}{s}{CLEAR}" 194 | 195 | nocolor = partial(color, color_code=None) 196 | cyan = partial(color, color_code=CYAN) 197 | green = partial(color, color_code=GREEN) 198 | 
199 | class Viewer: 200 | 201 | def __init__(self, colored=True): 202 | self.num = 1 203 | self.color1, self.color2 = (cyan, green) if colored else (nocolor, nocolor) 204 | 205 | def wlen(self, w): 206 | return sum(2 if east_asian_width(c) in "FWA" else 1 for c in w) 207 | 208 | def ljust(self, w, length): 209 | return w + " " * max(length - self.wlen(w), 0) 210 | 211 | def format(self, val): 212 | if isinstance(val, (dict, list, tuple, set)): 213 | return pformat(val, indent=1, width=120) 214 | return str(val) 215 | 216 | def _view(self, vals): 217 | num_width = len(str(len(vals))) 218 | tmpl = rf"{{0:<{num_width}}} {{1}}" 219 | for i, val in enumerate(vals, 1): 220 | for j, line in enumerate(self.format(val).split("\n")): 221 | if j == 0: 222 | print(tmpl.format(i, self.color2(line))) 223 | else: 224 | print(tmpl.format('.', self.color2(line))) 225 | 226 | def _view_with_headers(self, vals, headers): 227 | num_width = len(str(len(vals))) 228 | header_width = max(self.wlen(h) for h in headers) 229 | tmpl = rf"{{0:<{num_width}}} | {{1}} | {{2}}" 230 | for i, (header, val) in enumerate(zip(headers, vals), 1): 231 | for j, line in enumerate(self.format(val).split("\n")): 232 | if j == 0: 233 | print(tmpl.format(i, self.ljust(header, header_width), self.color2(line))) 234 | else: 235 | print(tmpl.format('', self.ljust('', header_width), self.color2(line))) 236 | 237 | def view(self, *args, recnum=None, headers=None): 238 | print(self.color1(f'[Record {recnum or self.num}]', bold=True)) 239 | vals = args[0] if len(args) == 1 and isinstance(args[0], (list, tuple)) else args 240 | if headers and len(vals) == len(headers): 241 | self._view_with_headers(vals, headers) 242 | else: 243 | self._view(vals) 244 | print() 245 | self.num += 1 246 | """ 247 | 248 | 249 | def parse_all_codes(args): 250 | code_trees = [] 251 | for name in ("codes", "pre_codes", "post_codes", "loop_heads", "filters"): 252 | if name in args: 253 | code = '\n'.join(extend_codes(getattr(args, name) or 
[])) 254 | try: 255 | tree = ast.parse(code) 256 | code_trees.append(tree) 257 | except SyntaxError: 258 | pass 259 | return code_trees 260 | 261 | 262 | def check_field_variables_in_code(args): 263 | pattern = re.compile(r'^f\d+$') 264 | for tree in args.all_code_trees: 265 | for node in ast.walk(tree): 266 | if isinstance(node, ast.Name) and pattern.match(node.id): 267 | return True 268 | return False 269 | 270 | 271 | def check_wrapping_is_need(args): 272 | codes = extend_codes(args.codes) 273 | if not codes: 274 | return True 275 | try: 276 | tree = ast.parse(codes[-1]) 277 | except SyntaxError: 278 | return True 279 | if not isinstance(tree.body[0], ast.Expr): 280 | return False 281 | if not isinstance(tree.body[0].value, ast.Call): 282 | return True 283 | if not isinstance(tree.body[0].value.func, ast.Name): 284 | return True 285 | return tree.body[0].value.func.id not in ('print', '_print') 286 | 287 | 288 | def is_colored(args): 289 | if args.color == 'always': 290 | return True 291 | elif args.color == 'never': 292 | return False 293 | if environ.get('PYPIPE_VIEW_COLORED', 'true').lower() == 'false': 294 | return False 295 | return sys.stdout.isatty() 296 | 297 | 298 | def paging_enabled(args): 299 | if args.output: 300 | return False 301 | if not sys.stdout.isatty(): 302 | return False 303 | if args.paging is not None: 304 | return args.paging 305 | return environ.get('PYPIPE_PAGER_ENABLED', 'true').lower() == 'true' 306 | 307 | 308 | def select_pager(args): 309 | pager = environ.get('PYPIPE_PAGER') or environ.get('PAGER') or 'less' 310 | if args.print: 311 | pager = environ.get('PYPIPE_PRINT_PAGER') or pager 312 | elif args.view: 313 | pager = environ.get('PYPIPE_VIEW_PAGER') or pager 314 | if pager.split()[0] == 'less': 315 | pager = pager + ' ' + environ.get('PYPIPE_LESS_OPTS', ' -R -F') 316 | return pager 317 | 318 | 319 | def enable_pager(args): 320 | pager = select_pager(args) 321 | if shutil.which(pager.split()[0]) is None: 322 | return False 323 | 
proc = None 324 | stdout_save = sys.stdout 325 | stat = {"is_exiting": False} 326 | 327 | def on_exit(): 328 | stat["is_exiting"] = True 329 | if proc: 330 | try: 331 | proc.stdin.close() 332 | proc.wait() 333 | except (BrokenPipeError, KeyboardInterrupt): 334 | pass 335 | sys.stdout = stdout_save 336 | 337 | def sighandler(signum, frame): 338 | # When entering only "ppp" without using a pipe, pypipe continues 339 | # to wait for standard input and does not exit. In such cases, 340 | # you may want to terminate it with Ctrl-C. However, if Ctrl-C 341 | # is accepted while pypipe is waiting for the termination of the 342 | # pager in on_exit, the terminal display may be corrupted. 343 | # Therefore, accept Ctrl-C only until processing reaches on_exit, 344 | # and ignore it once on_exit has started. 345 | if not stat["is_exiting"]: 346 | exit() 347 | 348 | proc = subprocess.Popen( 349 | pager.split(), 350 | stdin=subprocess.PIPE, 351 | universal_newlines=True, 352 | start_new_session=True, 353 | ) 354 | sys.stdout = proc.stdin 355 | atexit.register(on_exit) 356 | signal.signal(signal.SIGINT, sighandler) 357 | return True 358 | 359 | 360 | def indent(code, level=1): 361 | return INDENT * level + code 362 | 363 | 364 | def format_code(code, remove_comments=False, remove_abbrevs=False): 365 | code = code.strip("\n") 366 | if remove_comments: 367 | code = "\n".join( 368 | line for line in code.split("\n") 369 | if not line.strip().startswith("#") 370 | ) 371 | if remove_abbrevs: 372 | code = "\n".join( 373 | line for line in code.split("\n") 374 | if not line.endswith("# ABBREV") 375 | ) 376 | return code 377 | 378 | 379 | def _exec_code(code, args): 380 | if args.output or args.print: 381 | code = format_code( 382 | code, 383 | remove_comments=args.no_comments, 384 | remove_abbrevs=args.no_abbrevs, 385 | ) 386 | if args.output: 387 | with open(args.output, "w") as outfile: 388 | print('#!/usr/bin/env python', file=outfile) 389 | print(code, 
file=outfile) 390 | chmod(args.output, 0o755) 391 | elif args.print: 392 | print(code) 393 | else: 394 | _globals = { 395 | '__name__': '__exec__', 396 | '__builtins__': globals()['__builtins__'] 397 | } 398 | exec(compile(code, '', 'exec'), _globals) 399 | 400 | 401 | def exec_code(code, args): 402 | try: 403 | _exec_code(code, args) 404 | except (BrokenPipeError, KeyboardInterrupt): 405 | exit(0) 406 | 407 | 408 | def extend_codes(codes, comment=None): 409 | not_empty_codes = [] 410 | if comment: 411 | not_empty_codes.append(f"# {comment}") 412 | if codes: 413 | for _codes in codes: 414 | not_empty_codes.extend(c.rstrip() for c in _codes.split("\n") if c.rstrip()) 415 | return not_empty_codes 416 | 417 | 418 | def is_json_needed(args): 419 | return ("json" in args and args.json or 420 | args.output_format in ("json", "j") or 421 | "field_type" in args and "j" in list(args.field_type.values())) 422 | 423 | 424 | def get_imports(args): 425 | imports = set() 426 | imports.add("sys") 427 | imports.add("from functools import partial") 428 | if args.view: 429 | imports.add("from pprint import pformat") 430 | imports.add("from unicodedata import east_asian_width") 431 | if is_json_needed(args): 432 | imports.add("json") 433 | if args.convert: 434 | imports.update({"re", "json"}) 435 | if args.counter: 436 | imports.add("from collections import Counter") 437 | # REC 438 | if args.command == "rec": 439 | if args.regex or (args.delimiter != r'\t' and len(args.delimiter) > 1): 440 | imports.add("re") 441 | # CSV 442 | if args.command == "csv": 443 | imports.add("csv") 444 | # FILE 445 | if args.command == "file": 446 | imports.add("gzip") 447 | imports.add("from pathlib import Path") 448 | return imports 449 | 450 | 451 | def get_auto_imports(args): 452 | 453 | def _trace(node): 454 | """ 455 | e.g.) math.sqrt(num) -> True, ['math', 'sqrt'] 456 | e.g.) urllib.parse.urlparse(url) -> True, ['urllib', 'parse', 'urlparse'] 457 | e.g.) 
datetime.datetime.now().isoformat() -> False, ['datetime', 'datetime', 'now'] 458 | """ 459 | if isinstance(node, ast.Attribute): 460 | chained, ls = _trace(node.value) 461 | if chained: 462 | ls.append(node.attr) 463 | return chained, ls 464 | elif isinstance(node, ast.Subscript): 465 | _, ls = _trace(node.value) 466 | return False, ls 467 | elif isinstance(node, ast.Call): 468 | _, ls = _trace(node.func) 469 | return False, ls 470 | else: 471 | return (True, [node.id]) if isinstance(node, ast.Name) else (False, []) 472 | 473 | def _retrieve(tree): 474 | ret = [] 475 | for node in ast.iter_child_nodes(tree): 476 | if isinstance(node, ast.Attribute): 477 | _, ls = _trace(node) 478 | ret.append(ls[:-1]) 479 | else: 480 | ret.extend(_retrieve(node)) 481 | return ret 482 | 483 | def _extract_module(tree): 484 | candidates = set() 485 | for ls in _retrieve(tree): 486 | for i in range(len(ls), 0, -1): 487 | modulename = ".".join(ls[:i]) 488 | try: 489 | if importlib.util.find_spec(modulename) is not None: 490 | candidates.add(modulename) 491 | break 492 | except (ModuleNotFoundError, AttributeError): 493 | continue 494 | return candidates 495 | 496 | modules = set() 497 | for tree in args.all_code_trees: 498 | modules.update(_extract_module(tree)) 499 | return modules 500 | 501 | 502 | def get_optional_imports(args): 503 | return {i for i in args.import_codes or []} 504 | 505 | 506 | def gen_import(args): 507 | # Ensure that the import statements specified by the option come later. 508 | # For example, if we want to use orjson and specify `import orjson as json`, 509 | # this import statement needs to be added after the import statement 510 | # `import json` added by pypipe. 
511 | imports_opt = get_optional_imports(args) 512 | imports = (get_imports(args) | get_auto_imports(args)) - imports_opt 513 | codes = ["# IMPORT"] 514 | for i in list(imports) + list(imports_opt): 515 | if i.startswith('import ') or i.startswith('from '): 516 | codes.append(i) 517 | else: 518 | codes.append(f"import {i}") 519 | return "\n".join(codes) 520 | 521 | 522 | def gen_pre(args): 523 | codes = ["# PRE"] 524 | codes.append(r'_p = partial(print, sep="\t") # ABBREV') 525 | codes.append(r'I, S, B, L, D, SET = 0, "", False, [], {}, set() # ABBREV') 526 | if args.view: 527 | codes.append(VIEW_TMPL) 528 | codes.append(rf"viewer = Viewer(colored={args.colored})") 529 | codes.append(r"view = viewer.view") 530 | if args.convert: 531 | codes.append(CONVERT_FUNC) 532 | codes.append(FORMAT_PRINT_FUNC[args.output_format].format(sep=args.output_delimiter)) 533 | if args.counter: 534 | codes.append(r"counter = Counter()") 535 | codes.append(r"c = counter #ABBREV") 536 | if args.pre_codes: 537 | codes.extend(extend_codes(args.pre_codes)) 538 | return "\n".join(codes) 539 | 540 | 541 | def gen_post(args): 542 | if args.post_codes: 543 | return "\n".join(extend_codes(args.post_codes, "POST")) 544 | codes = ["# POST"] 545 | if args.counter: 546 | codes.append(COUNTER_POST) 547 | return "\n".join(codes) 548 | 549 | 550 | def gen_main(args, default_code, wrapper, level=1): 551 | codes = extend_codes(args.codes, "MAIN") 552 | if len(codes) == 1: 553 | codes.append(default_code) # set default code 554 | if not args.no_wrapping: 555 | spaces = "" 556 | for c in codes[-1]: 557 | if c != " ": 558 | break 559 | spaces += c 560 | if args.counter: 561 | codes[-1] = spaces + r"counter[{}] += 1".format(codes[-1].lstrip()) 562 | else: 563 | codes[-1] = spaces + wrapper.format(codes[-1].lstrip()) 564 | return "\n".join(indent(c, level=level) for c in codes) 565 | 566 | 567 | def gen_loop_filter(args, level=1): 568 | filters = ["# LOOP FILTER"] 569 | if args.filters: 570 | for f in 
args.filters: 571 | if not f.strip(): 572 | continue 573 | filters.append('if not ({}): continue'.format(f.strip())) 574 | return "\n".join(indent(c, level=level) for c in filters) 575 | 576 | 577 | def gen_loop_head_rec_csv(args): 578 | loop_head_codes = ["# LOOP HEAD"] 579 | if args.convert: 580 | loop_head_codes.append("rec = [_convert(v) for v in rec]") 581 | if args.field_type: 582 | # ex) if len(rec) > 16 and rec[16]: rec[16] = int(rec[16]) 583 | for f, t in args.field_type.items(): 584 | loop_head_codes.append( 585 | "if len(rec) > {0} and rec[{0}]: rec[{0}] = {1}".format( 586 | f - 1, FIELD_TYPE_TMPL[t].format(f"rec[{f-1}]")) 587 | ) 588 | if args.field_length is not None: 589 | if args.field_length == 0: 590 | # define field variables dynamically 591 | loop_head_codes.append(r"_locals.update({f'f{j+1}': rec[j] for j in range(len(rec))})") 592 | else: 593 | # ex) f1, f2, f3, f4 = r[:4] 594 | loop_head_codes.append("{} = {}".format( 595 | ", ".join(f"f{i+1}" for i in range(args.field_length)), 596 | f"r[:{args.field_length}]", 597 | )) 598 | if args.header: 599 | loop_head_codes.append("dic = dict(zip(header, rec))") 600 | loop_head_codes.append("d = dic # ABBREV") 601 | 602 | loop_head_codes.extend(extend_codes(args.loop_heads)) 603 | return "\n".join(indent(c) for c in loop_head_codes) 604 | 605 | 606 | def line_handler(args): 607 | 608 | def gen_loop_head(): 609 | loop_head_codes = ["# LOOP HEAD"] 610 | if args.convert: 611 | loop_head_codes.append("l = line = _convert(line)") 612 | if args.json: 613 | loop_head_codes.append('dic = json.loads(line)') 614 | loop_head_codes.append('d = dic #ABBREV') 615 | loop_head_codes.extend(extend_codes(args.loop_heads)) 616 | return "\n".join(indent(c) for c in loop_head_codes) 617 | 618 | wrapper = r"view({})" if args.view else r"_print({})" 619 | code = TEMPLATE_LINE.format( 620 | imp=gen_import(args), 621 | pre=gen_pre(args), 622 | loop_head=gen_loop_head(), 623 | loop_filter=gen_loop_filter(args), 624 |
main=gen_main(args, "line", wrapper), 625 | post=gen_post(args), 626 | ) 627 | exec_code(code, args) 628 | 629 | 630 | def rec_handler(args): 631 | is_regex_delimiter = args.delimiter != r'\t' and len(args.delimiter) > 1 632 | if args.regex is not None: 633 | re_compile = rf"pattern = re.compile(r'{args.regex}')" 634 | parse_header = r"header = pattern.findall(next(sys.stdin).rstrip('\r\n'))" if args.header else "" 635 | parse_line = r"rec = pattern.findall(line)" 636 | elif is_regex_delimiter: 637 | re_compile = rf"pattern = re.compile(r'{args.delimiter}')" 638 | parse_header = r"header = pattern.split(next(sys.stdin).rstrip('\r\n'))" if args.header else "" 639 | parse_line = r"rec = pattern.split(line)" 640 | else: 641 | re_compile = "" 642 | parse_header = rf"header = next(sys.stdin).rstrip('\r\n').split('{args.delimiter}')" if args.header else "" 643 | parse_line = rf"rec = line.split('{args.delimiter}')" 644 | 645 | locals = "_locals = locals()" if args.field_length is not None and args.field_length == 0 else "" 646 | wrapper = r"_print({})" 647 | if args.view: 648 | wrapper = r"view({}, headers=header)" if args.header else r"view({})" 649 | code = TEMPLATE_REC.format( 650 | imp=gen_import(args), 651 | prepre='\n'.join(extend_codes([re_compile, parse_header, locals])), 652 | pre=gen_pre(args), 653 | parse_line=parse_line, 654 | loop_head=gen_loop_head_rec_csv(args), 655 | loop_filter=gen_loop_filter(args), 656 | main=gen_main(args, "rec", wrapper), 657 | post=gen_post(args), 658 | ) 659 | exec_code(code, args) 660 | 661 | 662 | def csv_handler(args): 663 | output_delimiter = args.output_delimiter or args.delimiter 664 | csv_reader_opts = [("delimiter", f"'{args.delimiter}'")] 665 | csv_writer_opts = [("delimiter", f"'{output_delimiter}'")] 666 | if args.csv_opts: 667 | csv_reader_opts.extend(args.csv_opts) 668 | csv_writer_opts.extend(args.csv_opts) 669 | reader_opts = ", ".join(f'{k}={v}' for k, v in csv_reader_opts) 670 | writer_opts = ", ".join(f'{k}={v}' 
for k, v in csv_writer_opts) 671 | parse_header = "header = next(reader)" if args.header else "" 672 | 673 | locals = "_locals = locals()" if args.field_length is not None and args.field_length == 0 else "" 674 | wrapper = r"_write({}, writer=writer)" 675 | if args.view: 676 | wrapper = r"view({}, headers=header)" if args.header else r"view({})" 677 | code = TEMPLATE_CSV.format( 678 | imp=gen_import(args), 679 | reader_opts=reader_opts, 680 | writer_opts=writer_opts, 681 | prepre='\n'.join(extend_codes([parse_header, locals])), 682 | pre=gen_pre(args), 683 | loop_head=gen_loop_head_rec_csv(args), 684 | loop_filter=gen_loop_filter(args), 685 | main=gen_main(args, "rec", wrapper), 686 | post=gen_post(args), 687 | ) 688 | exec_code(code, args) 689 | 690 | 691 | def text_handler(args): 692 | 693 | def gen_pre_main(): 694 | codes = [] 695 | if args.convert: 696 | codes.append("text = _convert(text)") 697 | if args.json: 698 | codes.append("dic = json.loads(text)") 699 | codes.append('d = dic #ABBREV') 700 | return "\n".join(codes) 701 | 702 | wrapper = r"view({})" if args.view else r"_print({})" 703 | code = TEMPLATE_TEXT.format( 704 | imp=gen_import(args), 705 | pre=gen_pre(args), 706 | pre_main=gen_pre_main(), 707 | main=gen_main(args, "text", wrapper, level=0), 708 | post=gen_post(args), 709 | ) 710 | exec_code(code, args) 711 | 712 | 713 | def file_handler(args): 714 | 715 | def gen_loop_head(): 716 | loop_head_codes = extend_codes(args.loop_heads, "LOOP HEAD") 717 | if args.convert: 718 | loop_head_codes.append("text = _convert(text)") 719 | if args.json: 720 | loop_head_codes.append("dic = json.loads(text)") 721 | return "\n".join(indent(c, 2) for c in loop_head_codes) 722 | 723 | wrapper = r"view({})" if args.view else r"_print({})" 724 | code = TEMPLATE_FILE.format( 725 | imp=gen_import(args), 726 | pre=gen_pre(args), 727 | mode=args.mode, 728 | loop_head=gen_loop_head(), 729 | loop_filter=gen_loop_filter(args, 2), 730 | main=gen_main(args, "text", wrapper, 
level=2), 731 | post=gen_post(args), 732 | ) 733 | exec_code(code, args) 734 | 735 | 736 | def load_custom_command(name): 737 | from os import environ 738 | from pathlib import Path 739 | custom_path = Path(environ.get("PYPIPE_CUSTOM", '~/.config/pypipe/pypipe_custom.py')) 740 | _globals = { 741 | '__name__': '__exec__', 742 | '__builtins__': globals()['__builtins__'] 743 | } 744 | # load custom configuration from the user custom file 745 | with open(custom_path.expanduser()) as f: 746 | exec(f.read(), _globals) 747 | custom_mode = _globals["custom_command"] 748 | return custom_mode[name] 749 | 750 | 751 | def custom_handler(args): 752 | config = load_custom_command(args.name) 753 | template = config["template"] 754 | code_indent = config.get("code_indent", 0) 755 | wrapper = config.get("wrapper") 756 | default_code = config.get("default_code") 757 | opt_configs = config.get("options", {}) 758 | 759 | def gen_loop_head(): 760 | loop_head_codes = extend_codes(args.loop_heads, "LOOP HEAD") 761 | return "\n".join(indent(c, code_indent) for c in loop_head_codes) 762 | 763 | wrapper = r"view({})" if args.view else wrapper 764 | params = { 765 | "imp": gen_import(args), 766 | "pre": gen_pre(args), 767 | "loop_head": gen_loop_head(), 768 | "loop_filter": gen_loop_filter(args, code_indent), 769 | "main": gen_main(args, default_code, wrapper, level=code_indent), 770 | "post": gen_post(args), 771 | } 772 | opts = {k: v for k, v in args.opts} 773 | opt_params = { 774 | k: opts.get(k) or opt_configs[k].get("default") 775 | for k in opt_configs 776 | } 777 | params.update(opt_params) 778 | code = template.format(**params) 779 | exec_code(code, args) 780 | 781 | 782 | def main(argv=sys.argv[1:]): 783 | def key_value(s): 784 | kv = s.split("=", 1) 785 | return kv[0], kv[1] 786 | 787 | def field_type(s): 788 | ret = {} 789 | for ft in s.split(","): 790 | f, t = ft.split(":", 1) 791 | ret[int(f)] = t 792 | return ret 793 | 794 | parser = argparse.ArgumentParser( 795 | 
description='Python PiPe command line tool') 796 | 797 | parser.add_argument( 798 | '-V', '--version', 799 | action='version', 800 | version=f'pypipe {__version__}' 801 | ) 802 | 803 | ## COMMON OPTIONS 804 | common_parser = argparse.ArgumentParser(add_help=False) 805 | common_parser.add_argument( 806 | "-v", '--view', 807 | action="store_true", 808 | ) 809 | common_parser.add_argument( 810 | "-k", '--color', 811 | choices=['always', 'auto', 'never'], 812 | default='auto', 813 | ) 814 | common_parser.add_argument( 815 | "-p", '--print', 816 | action="store_true", 817 | help="Only prints the generated code." 818 | ) 819 | common_parser.add_argument( 820 | "-o", '--output', 821 | help="Output file" 822 | ) 823 | common_parser.add_argument( 824 | "-q", '--no-comments', 825 | dest="no_comments", 826 | action="store_true", 827 | ) 828 | common_parser.add_argument( 829 | "-r", '--no-abbrevs', 830 | dest="no_abbrevs", 831 | action="store_true", 832 | ) 833 | common_parser.add_argument( 834 | "-n", '--no-wrapping', 835 | dest="no_wrapping", 836 | action="store_true" 837 | ) 838 | common_parser.add_argument( 839 | "-i", '--import', 840 | dest="import_codes", 841 | action="append", 842 | ) 843 | common_parser.add_argument( 844 | "-b", '--pre', 845 | dest="pre_codes", 846 | action="append", 847 | ) 848 | common_parser.add_argument( 849 | "-a", '--post', 850 | dest="post_codes", 851 | action="append", 852 | ) 853 | common_parser.add_argument( 854 | '-c', '--counter', 855 | action="store_true" 856 | ) 857 | common_parser.add_argument( 858 | '-D', '--output-delimiter', 859 | dest="output_delimiter", 860 | ) 861 | common_parser.add_argument( 862 | '-L', '--linebreak', 863 | action='store_const', 864 | const=r'\n', 865 | dest="output_delimiter", 866 | ) 867 | common_parser.add_argument( 868 | '-F', '--output-format', 869 | choices=FORMAT_PRINT_FUNC.keys(), 870 | default='default', 871 | dest="output_format", 872 | ) 873 | common_parser.add_argument( 874 | '-t', '--convert', 875 | 
action="store_true", 876 | ) 877 | common_parser.add_argument( 878 | '--paging', 879 | dest="paging", 880 | action="store_const", 881 | const=True, 882 | ) 883 | common_parser.add_argument( 884 | '--no-paging', 885 | dest="paging", 886 | action="store_const", 887 | const=False, 888 | ) 889 | 890 | ## LOOP OPTIONS 891 | loop_parser = argparse.ArgumentParser(add_help=False) 892 | loop_parser.add_argument( 893 | "-e", "--loop-head", 894 | dest="loop_heads", 895 | default=[], 896 | action="append", 897 | ) 898 | loop_parser.add_argument( 899 | "-f", "--filter", 900 | dest="filters", 901 | action="append", 902 | ) 903 | 904 | ## REC AND CSV OPTIONS 905 | rec_csv_parser = argparse.ArgumentParser(add_help=False) 906 | rec_csv_parser.add_argument( 907 | '-l', '--field-length', 908 | dest="field_length", 909 | type=int, 910 | ) 911 | rec_csv_parser.add_argument( 912 | '--type', '--field-type', 913 | dest="field_type", 914 | type=field_type, 915 | default={}, 916 | help="ex) 1:i,3:j,5:b" 917 | ) 918 | rec_csv_parser.add_argument( 919 | '-H', '--header', 920 | action="store_true", 921 | ) 922 | 923 | # SUB COMMANDS 924 | subparsers = parser.add_subparsers( 925 | title="subcommands", 926 | help="show subcommands help: %(prog)s subcommand -h" 927 | ) 928 | 929 | ## LINE 930 | line_parser = subparsers.add_parser( 931 | "line", aliases=['l'], parents=[common_parser, loop_parser]) 932 | line_parser.add_argument( 933 | '-j', '--json', 934 | action="store_true" 935 | ) 936 | line_parser.add_argument("codes", nargs='*') 937 | line_parser.set_defaults(handler=line_handler, command="line") 938 | 939 | ## REC 940 | rec_parser = subparsers.add_parser( 941 | "rec", aliases=['r', 'record'], parents=[common_parser, loop_parser, rec_csv_parser]) 942 | rec_parser.add_argument("codes", nargs='*') 943 | rec_parser.add_argument( 944 | '-d', '--delimiter', 945 | default=r'\t' 946 | ) 947 | rec_parser.add_argument( 948 | '-m', '--regex-match', 949 | dest="regex", 950 | ) 951 | 
rec_parser.add_argument( 952 | '-C', '--csv', 953 | action='store_const', 954 | dest="delimiter", 955 | const=',', 956 | ) 957 | rec_parser.add_argument( 958 | '-S', '--spaces', 959 | action='store_const', 960 | dest="delimiter", 961 | const=r'\s+', 962 | ) 963 | rec_parser.set_defaults(handler=rec_handler, command="rec") 964 | 965 | ## CSV 966 | csv_parser = subparsers.add_parser( 967 | "csv", parents=[common_parser, loop_parser, rec_csv_parser]) 968 | csv_parser.add_argument("codes", nargs='*') 969 | csv_parser.add_argument( 970 | '-d', '--delimiter', 971 | default=',' 972 | ) 973 | csv_parser.add_argument( 974 | '-O', '--csv-opt', 975 | dest="csv_opts", 976 | type=key_value, 977 | default=[], 978 | action="append", 979 | ) 980 | csv_parser.add_argument( 981 | '-T', '--tsv', 982 | action='store_const', 983 | dest="delimiter", 984 | const=r'\t', 985 | ) 986 | csv_parser.set_defaults(handler=csv_handler, command="csv") 987 | 988 | ## TEXT 989 | text_parser = subparsers.add_parser( 990 | "text", aliases=['t'], parents=[common_parser]) 991 | text_parser.add_argument("codes", nargs='*') 992 | text_parser.add_argument( 993 | '-j', '--json', 994 | action="store_true" 995 | ) 996 | text_parser.set_defaults(handler=text_handler, command="text") 997 | 998 | ## FILE 999 | file_parser = subparsers.add_parser( 1000 | "file", aliases=['f'], parents=[common_parser, loop_parser]) 1001 | file_parser.add_argument("codes", nargs='*') 1002 | file_parser.add_argument( 1003 | "-m", "--mode", 1004 | default='rt', 1005 | ) 1006 | file_parser.add_argument( 1007 | '-j', '--json', 1008 | action="store_true" 1009 | ) 1010 | file_parser.set_defaults(handler=file_handler, command="file") 1011 | 1012 | ## CUSTOM 1013 | custom_parser = subparsers.add_parser( 1014 | "custom", aliases=['c'], parents=[common_parser, loop_parser]) 1015 | custom_parser.add_argument( 1016 | '-O', '--opt', 1017 | action="append", 1018 | dest="opts", 1019 | default=[], 1020 | type=key_value, 1021 | ) 1022 | 
custom_parser.add_argument( 1023 | "-N", "--name", 1024 | required=True, 1025 | ) 1026 | custom_parser.add_argument("codes", nargs='*') 1027 | custom_parser.set_defaults(handler=custom_handler, command="custom") 1028 | 1029 | expected_1st_args = ( 1030 | "line", "l", "rec", "r", "record", "csv", "text", "t", "file", "f", "custom", "c", 1031 | "-h", "--help", "-V", "--version" 1032 | ) 1033 | if len(argv) == 0 or argv[0] not in expected_1st_args: 1034 | argv.insert(0, "line") 1035 | 1036 | args = parser.parse_args(argv) 1037 | 1038 | if args.output_delimiter is None: 1039 | if 'delimiter' in args and args.delimiter and len(args.delimiter) == 1: 1040 | args.output_delimiter = args.delimiter 1041 | else: 1042 | args.output_delimiter = r'\t' 1043 | 1044 | args.all_code_trees = list(parse_all_codes(args)) 1045 | if args.command in ("rec", "csv") and args.field_length is None: 1046 | if check_field_variables_in_code(args): 1047 | args.field_length = 0 1048 | 1049 | if not check_wrapping_is_need(args): 1050 | args.no_wrapping = True 1051 | 1052 | args.colored = is_colored(args) 1053 | if paging_enabled(args): 1054 | enable_pager(args) 1055 | 1056 | args.handler(args) 1057 | 1058 | 1059 | if __name__ == '__main__': 1060 | main() 1061 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # pypipe 2 | 3 | ```sh 4 | $ echo "pypipe" | ppp "line[::2]" 5 | ppp 6 | ``` 7 | 8 | **pypipe** is a Python command-line tool for pipeline processing.
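
As a rough mental model, the one-liner above evaluates the given expression once per input line, with the current line bound to `line`. The sketch below imitates that behavior in plain Python. It is a simplification, not pypipe's implementation: `run_line_mode` is a hypothetical helper, and it uses `eval` for brevity, whereas pypipe generates a complete script and executes it.

```python
import io


def run_line_mode(stdin, expr):
    """Hypothetical stand-in for `ppp line EXPR` (illustration only)."""
    results = []
    for i, line in enumerate(stdin, 1):
        # Strip the trailing line break, as the generated loop does.
        line = line.rstrip("\r\n")
        # Expose the same names the README documents: line, l, and i.
        value = eval(expr, {"line": line, "l": line, "i": i})
        results.append(str(value))
    return results


# Simulate: echo "pypipe" | ppp "line[::2]"
for out in run_line_mode(io.StringIO("pypipe\n"), "line[::2]"):
    print(out)
```

Every second character of `pypipe` is `p`, which is why the command prints `ppp`.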
9 | 10 | ## Demo 11 | ![Demo](docs/demo.svg) 12 | 13 | 14 | ## Quick links 15 | - [Installation](#installation) 16 | - [Basic usage and Examples](#basic-usage-and-examples) 17 | - [Automatic Import and Explicit Import](#automatic-import-and-explicit-import) 18 | - [Automatic type conversion `-t, --convert`](#automatic-type-conversion--t---convert) 19 | - [View mode `-v, --view`](#view-mode--v---view) 20 | - [Output formatting](#output-formatting) 21 | - [Counter `-c, --counter`](#counter--c---counter) 22 | - [pypipe is a code generator.](#pypipe-is-a-code-generator) 23 | - [Pager](#pager) 24 | 25 | 26 | 27 | ## Installation 28 | pypipe is a single Python file and uses only the standard library. You can use it by placing `pypipe.py` in a directory included in your PATH (e.g., ~/.local/bin). If execute permission is not already present, please add it. 29 | ```sh 30 | chmod +x pypipe.py 31 | ``` 32 | To make it easier to type, it's recommended to create a symbolic link. 33 | ```sh 34 | ln -s pypipe.py ppp 35 | ``` 36 | 37 | > [!Note] 38 | > pypipe requires Python 3.6 or later. 39 | 40 | pypipe can also be installed in the standard way for Python packages, using [pip](https://pip.pypa.io/en/stable/) or any compatible tool such as [pipx](https://pypa.github.io/pipx/). 41 | ```sh 42 | pipx install pypipe-ppp 43 | ``` 44 | It also supports running directly with pipx without installation. 45 | ```sh 46 | pipx run pypipe-ppp 47 | ``` 48 | 49 | You can also use it with [Wasmer](https://wasmer.io/): 50 | ```sh 51 | alias ppp="wasmer run bugen/pypipe -- " 52 | ``` 53 | 54 | ## Basic usage and Examples 55 | 56 | ### `| ppp line` 57 | 58 | Processing line-by-line. You can access the current line as `line` or `l`, and the current line number as `i`. 
59 | 60 | ```sh 61 | $ cat staff.txt |ppp 'i, line.upper()' 62 | 1 NAME WEIGHT BIRTH AGE SPECIES CLASS 63 | 2 SIMBA 250 1994-06-15 29 LION MAMMAL 64 | 3 DUMBO 4000 1941-10-23 81 ELEPHANT MAMMAL 65 | 4 GEORGE 20 1939-01-01 84 MONKEY MAMMAL 66 | 5 POOH 1 1921-08-21 102 TEDDY BEAR ARTIFACT 67 | 6 BOB 0 1999-05-01 24 SPONGE DEMOSPONGE 68 | ``` 69 | 70 | Using the `-j, --json` option allows you to decode each line as JSON. The decoded result can be obtained as `dic`. 71 | ```sh 72 | $ cat staff.jsonlines.txt |ppp -j 'dic["Name"]' 73 | Simba 74 | Dumbo 75 | George 76 | Pooh 77 | Bob 78 | ``` 79 | 80 | ### `| ppp rec` 81 | 82 | Split each line by TAB. You can access the list of split strings as `rec` or `r`, and the record number as `i`. 83 | ```sh 84 | $ cat staff.txt |ppp rec 'r[:3]' 85 | Name Weight Birth 86 | Simba 250 1994-06-15 87 | Dumbo 4000 1941-10-23 88 | George 20 1939-01-01 89 | Pooh 1 1921-08-21 90 | Bob 0 1999-05-01 91 | ``` 92 | 93 | Using the `-l LENGTH, --field-length LENGTH` option allows you to get the values of each field as `f1, f2, f3, ...`. 94 | ```sh 95 | $ tail -n +2 staff.txt |ppp rec -l5 'f"{f1} is {f4} years old"' 96 | Simba is 29 years old 97 | Dumbo is 81 years old 98 | George is 84 years old 99 | Pooh is 102 years old 100 | Bob is 24 years old 101 | ``` 102 | 103 | > [!Tip] 104 | > You can now use field variables (f1, f2, f3, ...) without specifying the `--field-length` option. 105 | > ``` 106 | > $ cat staff.txt | ppp rec f1,f2,f3 107 | > ``` 108 | > Using field variables can make typing easier, but you have to know the number of fields in advance. Omitting the `--field-length` option makes it more convenient to use, but if you omit it, performance will be degraded. In tests, processing data with about 60,000 records and 23 items took 0.45 seconds when specifying the `--field-length` option, whereas omitting the `--field-length` option took about 0.75 seconds.
To maintain performance, either use the `--field-length` option or retrieve fields from rec using indices like `rec[0], rec[1], rec[2], ...` without using field variables. 109 | 110 | 111 | When using the `-H, --header` option, it treats the first line as a header line and skips it. The header values can be obtained from a list named `header`, and you can access the values of each field using the format `dic["FIELD_NAME"]`. 112 | ```sh 113 | $ cat staff.txt |ppp rec -H 'rec[0], dic["Birth"]' 114 | Simba 1994-06-15 115 | Dumbo 1941-10-23 116 | George 1939-01-01 117 | Pooh 1921-08-21 118 | Bob 1999-05-01 119 | ``` 120 | 121 | By using the `--type FIELD_TYPES, --field-type FIELD_TYPES` option, you can specify the type of each field, allowing you to convert values from 'str' to the specified type. 122 | ```sh 123 | $ echo 'Hello 100 10.2 True {"id":100,"title":"sample"}'|ppp rec -l5 --type 2:i,3:f,4:b,5:j "type(f1),type(f2),type(f3),type(f4),type(f5)" 124 | <class 'str'> <class 'int'> <class 'float'> <class 'bool'> <class 'dict'> 125 | ``` 126 | > [!Tip] 127 | > When there is a header row in the data, using `--type, --field-type` often results in errors when attempting to convert the header row's item names to the specified types. In such cases, you can avoid errors by using the `-H, --header` option to skip the header row. 128 | 129 | > [!Note] 130 | > pypipe has added support for [automatic type conversion](#automatic-type-conversion--t---convert). 131 | 132 | You can change the delimiter by using the `-d DELIMITER, --delimiter DELIMITER` option. 133 | ```sh 134 | $ cat staff.csv |ppp rec -d , -l6 f1 135 | Name 136 | Simba 137 | Dumbo 138 | George 139 | Pooh 140 | Bob 141 | ``` 142 | 143 | Regular expression delimiters are also supported. 144 | 145 | ```sh 146 | $ echo 'AAA BBB CCC DDD' | ppp rec -d '\s+' rec[2] 147 | CCC 148 | ``` 149 | 150 | > [!Tip] 151 | > The `-S, --spaces` option has the same meaning as `-d '\s+'`. 152 | 153 | You can change the output delimiter by using the `-D DELIMITER, --output-delimiter DELIMITER` option.
154 | ```sh 155 | $ cat staff.txt |ppp rec -D , 156 | Name,Weight,Birth,Age,Species,Class 157 | Simba,250,1994-06-15,29,Lion,Mammal 158 | Dumbo,4000,1941-10-23,81,Elephant,Mammal 159 | George,20,1939-01-01,84,Monkey,Mammal 160 | Pooh,1,1921-08-21,102,Teddy bear,Artifact 161 | Bob,0,1999-05-01,24,Sponge,Demosponge 162 | ``` 163 | 164 | When using the `-m, --regex-match` option, `rec` is generated through regular expression matching instead of delimiter-based splitting. 165 | ```sh 166 | $ echo 'Height: 200px, Width: 1000px' | ppp rec -m '\d+' r[1] 167 | 1000 168 | ``` 169 | 170 | ### `| ppp csv` 171 | `csv` is similar to `rec`. The difference is that `rec` simply splits each line on the specified delimiter, as in `line.split(DELIMITER)`, whereas `csv` parses the input with the [csv](https://docs.python.org/3/library/csv.html) library. Furthermore, `rec` is tab-separated by default, whereas `csv` is comma-separated. 172 | 173 | You can specify options to pass to csv.reader and csv.writer using the `-O NAME=VALUE, --csv-opt NAME=VALUE` option. 174 | ```sh 175 | $ cat staff.csv |ppp csv -O 'quoting=csv.QUOTE_ALL' 176 | "Name","Weight","Birth","Age","Species","Class" 177 | "Simba","250","1994-06-15","29","Lion","Mammal" 178 | "Dumbo","4000","1941-10-23","81","Elephant","Mammal" 179 | "George","20","1939-01-01","84","Monkey","Mammal" 180 | "Pooh","1","1921-08-21","102","Teddy bear","Artifact" 181 | "Bob","0","1999-05-01","24","Sponge","Demosponge" 182 | ``` 183 | 184 | 185 | ### `| ppp text` 186 | In `ppp text`, the entire standard input is read as a single piece of text. You can access the read text as `text`. 187 | 188 | ```sh 189 | $ cat staff.txt | ppp text 'len(text)' 190 | 231 191 | ``` 192 | 193 | For example, `ppp text` is particularly useful when working with an indented JSON file. Using the `-j, --json` option allows you to decode the text into JSON. The decoded data can be obtained as `dic`.
194 | ```sh 195 | $ cat staff.json |ppp text -j 'dic["data"][0]' 196 | {'Name': 'Simba', 'Weight': 250, 'Birth': '1994-06-15', 'Age': 29, 'Species': 'Lion', 'Class': 'Mammal'} 197 | ``` 198 | 199 | > [!Tip] 200 | > You can also use `-j, --json` option in `line` and `file`. 201 | 202 | ### `| ppp file` 203 | In `ppp file`, it receives a list of file paths from standard input. It then opens each received file path, reads the contents of the file into `text`, and repeats this process for each received file path in a loop. The received paths can be obtained as `path`. 204 | 205 | ```sh 206 | $ ls staff.txt staff.csv staff.json staff.xml |ppp file 'path, len(text)' 207 | staff.csv 231 208 | staff.json 1046 209 | staff.txt 231 210 | staff.xml 1042 211 | ``` 212 | 213 | For example, `ppp file` is useful, especially when processing a large number of JSON files. 214 | ```sh 215 | find . -name '*.json'| ppp file --json ... 216 | ``` 217 | 218 | ### `| ppp custom -N NAME` 219 | You can easily create custom commands using pypipe. First, you define custom commands. The definition file is, by default, located at `~/.config/pypipe/pypipe_custom.py`. You can change the path of this file using the `PYPIPE_CUSTOM` environment variable. 220 | 221 | The following is an example of defining custom commands xpath and sum. 
222 | 223 | ~/.config/pypipe/pypipe_custom.py 224 | ```python 225 | TEMPLATE_XPATH = r""" 226 | from lxml import etree 227 | {imp} 228 | 229 | def output(e): 230 | if isinstance(e, etree._Element): 231 | print(etree.tostring(e).decode().rstrip()) 232 | else: 233 | _print(e) 234 | 235 | {pre} 236 | 237 | tree = etree.parse(sys.stdin) 238 | for e in tree.xpath('{path}'): 239 | {loop_head} 240 | {loop_filter} 241 | {main} 242 | 243 | {post} 244 | """ 245 | 246 | TEMPLATE_SUM = r""" 247 | import re 248 | import sys 249 | {imp} 250 | 251 | ptn = re.compile(r'{pattern}') 252 | s = 0 253 | 254 | def add_or_print(*args): 255 | global s 256 | rec = args[0] 257 | if len(args) == 2: 258 | if isinstance(args[1], int): 259 | i = args[1] 260 | if len(rec) >= i: 261 | s += rec[i-1] 262 | else: 263 | print(args[1]) 264 | else: 265 | print(*args[1:]) 266 | 267 | 268 | for line in sys.stdin: 269 | line = line.rstrip('\r\n') 270 | rec = [{type}(e) for e in ptn.findall(line)] 271 | if not rec: 272 | continue 273 | {loop_head} 274 | {loop_filter} 275 | {main} 276 | 277 | print(s) 278 | """ 279 | 280 | custom_command = { 281 | "xpath": { 282 | "template": TEMPLATE_XPATH, 283 | "code_indent": 1, 284 | "default_code": "e", 285 | "wrapper": 'output({})', 286 | "options": { 287 | "path": {"default": '/'} 288 | } 289 | }, 290 | "sum": { 291 | "template": TEMPLATE_SUM, 292 | "code_indent": 1, 293 | "default_code": "1", 294 | "wrapper": 'add_or_print(rec, {})', 295 | "options": { 296 | "pattern": {"default": r'\d+'}, 297 | "type": {"default": 'int'} 298 | } 299 | }, 300 | } 301 | ``` 302 | 303 | You can use them as follows: 304 | 305 | ```sh 306 | $ cat staff.xml |ppp custom -N xpath -O path='./Animal/Age' 307 | 29 308 | 81 309 | 84 310 | 102 311 | 24 312 | ``` 313 | 314 | ```sh 315 | $ seq 10000| ppp c -Nsum -f 'rec[0] % 3 == 0' 316 | 16668333 317 | ``` 318 | 319 | ## Automatic Import and Explicit Import 320 | pypipe attempts to automatically import the necessary modules. 
While explicit import is likely not required in most cases, it is also possible to explicitly import the necessary modules using the `-i IMPORT, --import IMPORT` option. The following examples all work in the same way: 321 | 322 | ```sh 323 | $ seq 10 | ppp 'math.sqrt(int(line))' 324 | ``` 325 | ```sh 326 | $ seq 10 | ppp -i math 'math.sqrt(int(line))' 327 | ``` 328 | ```sh 329 | $ seq 10 | ppp -i 'from math import sqrt' 'sqrt(int(line))' 330 | ``` 331 | Using the explicit import format `from MODULE import NAME` can be useful in cases where you need to use `NAME` multiple times within the code. 332 | > [!Note] 333 | > See also [here](#import-modules--i-module---import-module) about the `-i IMPORT, --import IMPORT` option. 334 | 335 | 336 | ## Automatic type conversion `-t, --convert` 337 | When using the `-t, --convert` option, pypipe automatically converts the input types. 338 | ```console 339 | $ echo 'Hello 100 10.2 True None (1,2,3) [1,2,3] {1,2,3} {"id":100,"title":"sample"}'|ppp rec --view -t "[(v, type(v)) for v in rec]" 340 | [Record 1] 341 | 1 ('Hello', <class 'str'>) 342 | 2 (100, <class 'int'>) 343 | 3 (10.2, <class 'float'>) 344 | 4 (True, <class 'bool'>) 345 | 5 (None, <class 'NoneType'>) 346 | 6 ((1, 2, 3), <class 'tuple'>) 347 | 7 ([1, 2, 3], <class 'list'>) 348 | 8 ({1, 2, 3}, <class 'set'>) 349 | 9 ({'id': 100, 'title': 'sample'}, <class 'dict'>) 350 | ``` 351 | In the following example, there is no longer a need to explicitly convert to a numeric type like `int(rec[1]) > 100`; it now works with `rec[1] > 100`. 352 | ```console 353 | $ cat staff.txt | ppp rec --convert --header --filter 'rec[1] > 100' 354 | Simba 250 1994-06-15 29 Lion Mammal 355 | Dumbo 4000 1941-10-23 81 Elephant Mammal 356 | ``` 357 | > [!Tip] 358 | > The `-t, --convert` option is available for use with line, rec, csv, text, and file. 359 | 360 | > [!Tip] 361 | > Automatic type conversion supports int, float, bool, None, json (dict, list, bool, null), and eval (tuple, list, set, dict). 362 | 363 | > [!Warning] 364 | > The `-t, --convert` option is convenient but may lead to a performance degradation when used.
It should not be used if performance is crucial. 365 | 366 | ## View mode `-v, --view` 367 | When using the `-v, --view` option, the output is pretty printed with colored formatting. Data formats with many items such as CSV, TSV, JSON, and others can be hard to read in their raw format, making the View mode particularly useful when inspecting such data. 368 | 369 | ![Alt text](docs/view_sample1.png) 370 | 371 | When you use both the `-v, --view` option and the `-H, --header` option together, it displays the values along with the field names. 372 | 373 | ![Alt text](docs/view_sample2.png) 374 | 375 | In View mode, `dict`, `list` and `tuple` are formatted using the standard library's `pprint`. 376 | 377 | ![Alt text](docs/view_sample3.png) 378 | 379 | 380 | ### `-k COLOR_MODE, --color COLOR_MODE` 381 | In View mode, pypipe automatically determines whether to apply colorization. By default, when outputting to a terminal, the output will be in color. However, if you redirect the output to a file or pipe it to another command, it will not be in color. You can change this behavior using the `-k COLOR_MODE, --color COLOR_MODE` option: 382 | 383 | - Using `-k auto` or `--color auto` lets the tool automatically decide whether to apply colorization. 384 | - Using `-k always` or `--color always` forces colorization at all times. 385 | - Using `-k never` or `--color never` disables colorization. 386 | 387 | Also, by setting the `PYPIPE_VIEW_COLORED` environment variable to `false`, you can disable colors by default. However, if the `-k, --color` option is specified, it takes precedence. 388 | 389 | ## Output formatting 390 | In pypipe, you have the flexibility to write code to output results in any desired format. For example: 391 | 392 | ```sh 393 | $ echo "Hello" | ppp line -n 'print(line + " World!")' 394 | Hello World!
395 | ``` 396 | 397 | Please note the presence of the `-n` option in the command above. If you omit this option, the output will look like this: 398 | 399 | ```sh 400 | $ echo "Hello" | ppp line 'print(line + " World!")' 401 | Hello World! 402 | None 403 | ``` 404 | 405 | So, what's happening here? When you have questions about pypipe's behavior, a good approach is to inspect the generated code using the `-p, --print` option. 406 | 407 | ```sh 408 | $ echo "Hello" | ppp line 'print(line + " World!")' -p 409 | # IMPORT 410 | import sys 411 | from functools import partial 412 | 413 | # PRE 414 | _p = partial(print, sep="\t") # ABBREV 415 | I, S, B, L, D, SET = 0, "", False, [], {}, set() # ABBREV 416 | 417 | def _print(*args, sep='\t'): 418 | if len(args) == 1 and isinstance(args[0], (list, tuple)): 419 | print(sep.join(str(v) for v in args[0])) 420 | else: 421 | print(sep.join(str(v) for v in args)) 422 | 423 | 424 | for i, line in enumerate(sys.stdin, 1): 425 | line = line.rstrip("\r\n") 426 | l = line # ABBREV 427 | # LOOP HEAD 428 | # LOOP FILTER 429 | # MAIN 430 | _print(print(line + " World!")) 431 | 432 | # POST 433 | ``` 434 | 435 | In this case, the generated code contains the line `_print(print(line + " World!"))`. The inner `print` writes `Hello World!` and returns `None`, which the `_print` wrapper then prints. This is due to a pypipe feature called [Code wrapping](#code-wrapping). 436 | 437 | Let's make a slight modification to the command by removing the print function: 438 | 439 | ```sh 440 | $ echo "Hello" | ppp line 'line + " World!"' 441 | Hello World! 442 | ``` 443 | 444 | pypipe is designed so that you can omit the `print` call and type less. 445 | 446 | ### Change the behavior of the `_print` function 447 | By default, the `_print({})` wrapper is used.
The `_print` function is pypipe's internal output function and has the following implementation: 448 | 449 | ```python 450 | def _print(*args, sep='\t'): 451 | if len(args) == 1 and isinstance(args[0], (list, tuple)): 452 | print(sep.join(str(v) for v in args[0])) 453 | else: 454 | print(sep.join(str(v) for v in args)) 455 | ``` 456 | You can control the output format by replacing the implementation of the `_print` function with the `-F FORMAT, --output-format FORMAT` option. 457 | 458 | #### `-Fd, -F default, --output-format=default` 459 | Default output format. 460 | 461 | Implementation of the `_print` function: as described above. 462 | 463 | Output example: 464 | ```sh 465 | $ echo '["aaa", "bbb", "ccc"]' | ppp --json -Fd dic 466 | aaa bbb ccc 467 | ``` 468 | 469 | #### `-Fj, -F json, --output-format=json` 470 | Converts `dict`, `list`, and `tuple` to JSON format for output. However, when a single string is passed, it will not be enclosed in double quotes (meaning it is not in JSON string format). 471 | 472 | Implementation of the `_print` function: 473 | ```python 474 | def _json(v): 475 | if isinstance(v, (dict, list, tuple)): 476 | v = json.dumps(v) 477 | elif not isinstance(v, str): 478 | v = str(v) 479 | return v 480 | 481 | def _print(*args, sep='\t'): 482 | print(sep.join(_json(v) for v in args)) 483 | ``` 484 | 485 | Output example: 486 | 487 | ```sh 488 | $ echo '["aaa", "bbb", "ccc"]' | ppp --json -Fj dic 489 | ["aaa", "bbb", "ccc"] 490 | ``` 491 | 492 | #### `-Fn, -F native, --output-format=native` 493 | Uses the standard print function for output.
494 | 495 | Implementation of the `_print` function: 496 | ```python 497 | _print = partial(print, sep='\t') 498 | ``` 499 | 500 | Output example: 501 | ```sh 502 | $ echo '["aaa", "bbb", "ccc"]' | ppp --json -Fn dic 503 | ['aaa', 'bbb', 'ccc'] 504 | 505 | ``` 506 | 507 | ### Change the output delimiter `-D DELIMITER, --output-delimiter DELIMITER` 508 | You can change the output delimiter using the `-D DELIMITER, --output-delimiter DELIMITER` option. The delimiter does not have to be a single character; you can specify multiple characters [^1]. 509 | 510 | ```sh 511 | $ cat staff.txt | ppp rec -D ' | ' 512 | Name | Weight | Birth | Age | Species | Class 513 | Simba | 250 | 1994-06-15 | 29 | Lion | Mammal 514 | Dumbo | 4000 | 1941-10-23 | 81 | Elephant | Mammal 515 | George | 20 | 1939-01-01 | 84 | Monkey | Mammal 516 | Pooh | 1 | 1921-08-21 | 102 | Teddy bear | Artifact 517 | Bob | 0 | 1999-05-01 | 24 | Sponge | Demosponge 518 | ``` 519 | 520 | [^1]: Internally, the string specified using the `-D, --output-delimiter` option is passed as the `sep` argument to the `_print` function, which is why `sep` can contain multiple characters. However, the `csv` command uses a different output function based on `csv.writer` as the wrapper instead of the `_print` function. In this case, the string specified using the `-D, --output-delimiter` option is passed as the `delimiter` argument to `csv.writer`, and specifying multiple characters for the delimiter is not possible. 521 | 522 | #### `-L, --linebreak` 523 | The `-L, --linebreak` option has the same meaning as `-D '\n', --output-delimiter '\n'`. It is useful when piping pypipe's output into another pypipe. Instead of writing a for loop in pypipe, you can use `-L, --linebreak` to feed the next pypipe, which gives you processing similar to nested for loops.
524 | 525 | Using `-L` to output with line breaks: 526 | ```sh 527 | $ cat staff.json|ppp text -j '*dic["data"]' -Fj -L 528 | {"Name": "Simba", "Weight": 250, "Birth": "1994-06-15", "Age": 29, "Species": "Lion", "Class": "Mammal"} 529 | {"Name": "Dumbo", "Weight": 4000, "Birth": "1941-10-23", "Age": 81, "Species": "Elephant", "Class": "Mammal"} 530 | {"Name": "George", "Weight": 20, "Birth": "1939-01-01", "Age": 84, "Species": "Monkey", "Class": "Mammal"} 531 | {"Name": "Pooh", "Weight": 1, "Birth": "1921-08-21", "Age": 102, "Species": "Teddy bear", "Class": "Artifact"} 532 | {"Name": "Bob", "Weight": 0, "Birth": "1999-05-01", "Age": 24, "Species": "Sponge", "Class": "Demosponge"} 533 | ``` 534 | To further process this output: 535 | 536 | ```sh 537 | $ cat staff.json | ppp text -j '*dic["data"]' -Fj -L | ppp -j 'dic["Weight"]' | ppp c -N sum 538 | 4271 539 | ``` 540 | 541 | This can also be written as follows. Please use your preferred method: 542 | ```sh 543 | $ cat staff.json|ppp text -j ' 544 | > for r in dic["data"]: 545 | > I += r["Weight"] 546 | > ' -n -a 'print(I)' 547 | 4271 548 | ``` 549 | ## Counter `-c, --counter` 550 | Using the `-c, --counter` option allows for easy data aggregation. When you specify the `-c, --counter` option, it creates an instance of collections.Counter, which can be accessed as either `counter` or `c`. The `-c, --counter` option is available for use in all commands. 551 | 552 | An example of aggregating data by the 'Gender' and 'Hobby' fields. 553 | ```sh 554 | $ cat people.csv |ppp csv -H --counter 'dic["Gender"], dic["Hobby"]'| head -n10 555 | Female Cooking 4 556 | Male Hiking 3 557 | Female Reading 3 558 | Male Gardening 3 559 | Female Traveling 3 560 | Male Playing Music 3 561 | Female Dancing 3 562 | Female Hiking 3 563 | Female Painting 2 564 | Male Photography 2 565 | ``` 566 | 567 | This is an example to aggregate data based on whether female individuals are 30 years or older. 
568 | ```sh 569 | $ cat people.csv |ppp csv -H -c -f 'dic["Gender"] == "Female"' 'int(dic["Age"]) >= 30' 570 | False 16 571 | True 10 572 | ``` 573 | 574 | When using the `-c, --counter` option, it uses `counter[{}] += 1` as the wrapper. If you want to count in a different way, you can disable the wrapping by using the `-n, --no-wrapping` option and add your own counting code. 575 | 576 | ```sh 577 | $ cat population.csv |ppp csv -H -c -n 'counter[dic["State"]] += int(dic["Population"])' 578 | New York 8398748 579 | Texas 7751480 580 | California 7327731 581 | Illinois 2705994 582 | Arizona 1680992 583 | Pennsylvania 1584138 584 | Florida 903889 585 | Ohio 892533 586 | Indiana 876862 587 | North Carolina 792862 588 | Washington 753675 589 | Michigan 673104 590 | ``` 591 | 592 | See [Code wrapping](#code-wrapping) for details. 593 | 594 | 595 | ## pypipe is a code generator. 596 | pypipe is a command-line tool for pipeline processing, but it can also be thought of as a code generator. It generates code internally from the given arguments and then executes the generated code using the `exec` function. Therefore, instead of executing the generated code, you have the option to print it to the standard output or save it to a file. 597 | 598 | ### Print generated code. `-p, --print` 599 | To check the generated code, you can use the `-p, --print` option. 600 | ```sh 601 | ppp file -m rb -i hashlib -b 'total = 0' -b '_p("PATH", "SIZE", "MD5")' -e 'size = len(text)' -f 'path.stem == "staff"' 'total += size' 'path, size, hashlib.md5(text).hexdigest()' -a 'print(f"Total size: {total}")' -p 602 | ``` 603 | The generated code is output as follows.
604 | ```python 605 | # IMPORT 606 | import sys 607 | from functools import partial 608 | import gzip 609 | from pathlib import Path 610 | import hashlib 611 | 612 | def _open(path): 613 | if path.suffix == '.gz': 614 | return gzip.open(path, 'rb') 615 | else: 616 | return open(path, 'rb') 617 | 618 | # PRE 619 | _p = partial(print, sep="\t") # ABBREV 620 | I, S, B, L, D, SET = 0, "", False, [], {}, set() # ABBREV 621 | 622 | def _print(*args, sep='\t'): 623 | if len(args) == 1 and isinstance(args[0], (list, tuple)): 624 | print(sep.join(str(v) for v in args[0])) 625 | else: 626 | print(sep.join(str(v) for v in args)) 627 | 628 | total = 0 629 | _p("PATH", "SIZE", "MD5") 630 | 631 | for i, line in enumerate(sys.stdin, 1): 632 | path = Path(line.rstrip('\r\n')) 633 | with _open(path) as file: 634 | text = file.read() 635 | # LOOP HEAD 636 | size = len(text) 637 | # LOOP FILTER 638 | if not (path.stem == "staff"): continue 639 | # MAIN 640 | total += size 641 | _print(path, size, hashlib.md5(text).hexdigest()) 642 | 643 | # POST 644 | print(f"Total size: {total}", file=sys.stderr) 645 | ``` 646 | 647 | Check that there are no issues with the generated code and execute it. 648 | ```sh 649 | $ find docs -type f |ppp file -m rb -i hashlib -b 'total = 0' -b '_p("PATH", "SIZE", "MD5")' -e 'size = len(text)' -f 'path.stem == "staff"' 'total += size' 'path, size, hashlib.md5(text).hexdigest()' -a 'print(f"Total size: {total}")' 650 | PATH SIZE MD5 651 | docs/staff.json 1046 3f81986424eea2648bcabec324f8e959 652 | docs/staff.txt 231 a0757fb3838ed1235b21f96e1953445c 653 | docs/staff.xml 1042 7d36d493c1dd7594db3426f242b667f6 654 | docs/staff.csv 231 6cba6414c49b8762d6a49e2d9a62e563 655 | Total size: 2550 656 | ``` 657 | 658 | ### Save generated code to a file. `-o PATH, --output PATH` 659 | For writing more complex code, it's a good practice to create a template code with pypipe and edit the templated code manually. Here's the process you can follow: 660 | 661 | 1. 
Create a template code with pypipe and save it to a file, for example: 662 | ```sh 663 | ppp line --output /tmp/pipe.py ... 664 | ``` 665 | 2. Edit the code in /tmp/pipe.py to suit your needs. 666 | 3. Execute the modified code by piping input to it, for example: 667 | ```sh 668 | cat sample.txt | /tmp/pipe.py 669 | ``` 670 | 671 | ### Main codes 672 | The main code is specified as positional arguments. You can specify multiple main codes. The placement of the main code varies depending on the command. In commands like `line`, `rec`, `csv`, and `file`, the main code is added within the loop processing with proper indentation. However, in the `text` command, where there is no loop processing, the main code is added without indentation. 673 | In the `custom` command, the main code is added according to the definitions provided in the `pypipe_custom.py` file. 674 | 675 | ```sh 676 | $ ppp text -pqrn "for word in text.split():" " print(word)" 677 | ``` 678 | ```python 679 | import sys 680 | from functools import partial 681 | 682 | def _print(*args, sep='\t'): 683 | if len(args) == 1 and isinstance(args[0], (list, tuple)): 684 | print(sep.join(str(v) for v in args[0])) 685 | else: 686 | print(sep.join(str(v) for v in args)) 687 | 688 | text = sys.stdin.read() 689 | for word in text.split(): # <- HERE 690 | print(word) # <- HERE 691 | ``` 692 | 693 | You can also write it with line breaks in the terminal as follows: 694 | ```sh 695 | $ ppp text -pqrn ' 696 | > for word in text.split(): 697 | > print(word) 698 | > ' 699 | ``` 700 | 701 | 702 | ### Default main code 703 | If no main code is specified in the arguments, pypipe adds a predefined default code. For example, the default code in Line mode is `'line'`. 
704 | ```sh 705 | ppp -pqr 706 | ``` 707 | ```python 708 | import sys 709 | from functools import partial 710 | 711 | 712 | def _print(*args, sep='\t'): 713 | if len(args) == 1 and isinstance(args[0], (list, tuple)): 714 | print(sep.join(str(v) for v in args[0])) 715 | else: 716 | print(sep.join(str(v) for v in args)) 717 | 718 | 719 | for i, line in enumerate(sys.stdin, 1): 720 | line = line.rstrip("\r\n") 721 | _print(line) # Default code with code wrapping. 722 | ``` 723 | 724 | ### Code wrapping 725 | By default, pypipe wraps the last code specified in the arguments with a predefined wrapper. For example, in `ppp line`, it uses `'_print({})'` as the wrapper. However, if the `-c, --counter` option is specified, it uses `'counter[{}] += 1'` as the wrapper instead. 726 | ```sh 727 | $ ppp line 'year = int(line)' year -pqr 728 | ``` 729 | ```python 730 | import sys 731 | from functools import partial 732 | 733 | 734 | def _print(*args, sep='\t'): 735 | if len(args) == 1 and isinstance(args[0], (list, tuple)): 736 | print(sep.join(str(v) for v in args[0])) 737 | else: 738 | print(sep.join(str(v) for v in args)) 739 | 740 | 741 | for i, line in enumerate(sys.stdin, 1): 742 | line = line.rstrip("\r\n") 743 | year = int(line) 744 | _print(year) # Wrapping 745 | ``` 746 | #### Disable code wrapping. `-n, --no-wrapping` 747 | If you want to disable the wrapping of the last code specified in the arguments by a predefined wrapper, you can use the `-n, --no-wrapping` option.
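The wrapping rules described in this section can be modeled in a few lines. The sketch below is an assumption about how such a step might look, not pypipe's real implementation: the `wrap_main_codes` function and its parameters are hypothetical, and only the two wrapper templates are taken from the text above.

```python
PRINT_WRAPPER = "_print({})"          # default wrapper named in the text
COUNTER_WRAPPER = "counter[{}] += 1"  # wrapper named for -c, --counter


def wrap_main_codes(codes, counter=False, no_wrapping=False):
    """Return the main codes with only the last one wrapped (hypothetical helper)."""
    if no_wrapping or not codes:
        # -n, --no-wrapping: every code is emitted exactly as given.
        return list(codes)
    wrapper = COUNTER_WRAPPER if counter else PRINT_WRAPPER
    return list(codes[:-1]) + [wrapper.format(codes[-1])]


print(wrap_main_codes(["year = int(line)", "year"]))
print(wrap_main_codes(['dic["Gender"], dic["Hobby"]'], counter=True))
print(wrap_main_codes(["I = max(len(line), I)"], no_wrapping=True))
```

Only the last code is touched, which is why earlier statements such as `year = int(line)` appear unchanged in the generated output.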
748 | ```sh 749 | $ ppp line -n 'I = max(len(line), I)' -a 'print(I)' -pq 750 | ``` 751 | ```python 752 | import sys 753 | from functools import partial 754 | 755 | _p = partial(print, sep="\t") # ABBREV 756 | I, S, B, L, D, SET = 0, "", False, [], {}, set() # ABBREV 757 | 758 | def _print(*args, sep='\t'): 759 | if len(args) == 1 and isinstance(args[0], (list, tuple)): 760 | print(sep.join(str(v) for v in args[0])) 761 | else: 762 | print(sep.join(str(v) for v in args)) 763 | 764 | 765 | for i, line in enumerate(sys.stdin, 1): 766 | line = line.rstrip("\r\n") 767 | l = line # ABBREV 768 | I = max(len(line), I) # No wrapping 769 | 770 | print(I) 771 | ``` 772 | 773 | ### Pre and Post codes. `-b CODE, --pre CODE`, `-a CODE, --post CODE` 774 | The code specified with `-b CODE, --pre CODE` will be added before the loop processing or the main code. This can be useful for declaring variables or performing any necessary setup before entering a loop or executing the main code. The code specified with `-a CODE, --post CODE` will be added after the loop processing or the main code. This can be useful for displaying aggregated results or performing any additional actions after the loop or main code execution. 
775 | 776 | ```sh 777 | $ ppp rec -pqrn -b 'TOTAL = 0' -b 'MAX = 0' 'TOTAL += int(rec[0])' 'MAX = max(MAX, int(rec[0]))' -a 'print(f"TOTAL: {TOTAL}")' -a 'print(f"MAX: {MAX}")' 778 | ``` 779 | ```python 780 | import sys 781 | from functools import partial 782 | 783 | 784 | def _print(*args, sep='\t'): 785 | if len(args) == 1 and isinstance(args[0], (list, tuple)): 786 | print(sep.join(str(v) for v in args[0])) 787 | else: 788 | print(sep.join(str(v) for v in args)) 789 | 790 | 791 | TOTAL = 0 # PRE 792 | MAX = 0 # PRE 793 | 794 | for i, line in enumerate(sys.stdin, 1): 795 | line = line.rstrip("\r\n") 796 | rec = line.split('\t') 797 | TOTAL += int(rec[0]) 798 | MAX = max(MAX, int(rec[0])) 799 | 800 | print(f"TOTAL: {TOTAL}") # POST 801 | print(f"MAX: {MAX}") # POST 802 | ``` 803 | 804 | ### Inner loop. `-e CODE, --loop-head CODE`, `-f CODE, --filter CODE` 805 | In the loop processing of the `line`, `rec`, `csv`, and `file` commands, the code is added in the following positions: 806 | ``` 807 | for ... : 808 | {loop_head} # Added with the -e CODE, --loop-head CODE option. 809 | {filter} # Added with the -f CODE, --filter CODE option. 810 | {main} # The main code is added here. 811 | ``` 812 | "loop_head" is added using the `-e CODE, --loop-head CODE` option, while "filter" is added using the `-f CODE, --filter CODE` option. 813 | Please note that the "loop_head" code is added as-is, while the "filter" code is wrapped with `if not ({}): continue`.
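To make the template above concrete, here is a minimal sketch of how such a loop body could be assembled from the three kinds of code. The `build_loop` function and its parameter names are hypothetical and only mimic the layout described here; pypipe's real generator may be organized differently.

```python
def build_loop(loop_heads, filters, mains, indent="    "):
    """Assemble a `line`-style loop from loop-head, filter, and main code."""
    lines = [
        "for i, line in enumerate(sys.stdin, 1):",
        indent + 'line = line.rstrip("\\r\\n")',
    ]
    # Loop-head code is inserted as-is.
    lines += [indent + code for code in loop_heads]
    # Each filter is wrapped so that non-matching lines are skipped.
    lines += [indent + f"if not ({code}): continue" for code in filters]
    # Main code comes last (no output wrapper is applied in this sketch).
    lines += [indent + code for code in mains]
    return "\n".join(lines)


generated = build_loop(
    loop_heads=["line = line.upper()"],
    filters=['"BAR" in line'],
    mains=["print(line)"],
)
print(generated)
```

Because each filter becomes an early `continue`, several `-f` options behave like conditions joined with `and`: a line reaches the main code only if every filter expression is true.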
814 | 815 | ```sh 816 | $ ppp line -pqrn -e 'line = line.replace("foo", "bar")' -e 'line = line.upper()' -f '"BAR" in line' 'print(line)' 817 | ``` 818 | ```python 819 | import sys 820 | from functools import partial 821 | 822 | 823 | def _print(*args, sep='\t'): 824 | if len(args) == 1 and isinstance(args[0], (list, tuple)): 825 | print(sep.join(str(v) for v in args[0])) 826 | else: 827 | print(sep.join(str(v) for v in args)) 828 | 829 | 830 | for i, line in enumerate(sys.stdin, 1): 831 | line = line.rstrip("\r\n") 832 | line = line.replace("foo", "bar") # LOOP_HEAD 833 | line = line.upper() # LOOP_HEAD 834 | if not ("BAR" in line): continue # FILTER 835 | print(line) # MAIN 836 | ``` 837 | 838 | ### Import modules. `-i MODULE, --import MODULE` 839 | 840 | By using the `-i MODULE, --import MODULE` option, you can import any module. If the value specified with `--import` is a full import statement, like `import math` or `from math import sqrt`, it is added to the generated code as-is. If only the module name is provided, like `math`, pypipe turns it into an import statement, such as `import math`. 841 | 842 | ppp text -i zlib -i 'from base64 import b64encode' 'b64encode(zlib.compress(text.encode()))' 843 | 844 | ```sh 845 | $ ppp text -pqrn -i zlib -i 'from base64 import b64encode' 'print(b64encode(zlib.compress(text.encode())))' 846 | ``` 847 | ```python 848 | import sys 849 | from functools import partial 850 | import zlib # <- HERE 851 | from base64 import b64encode # <- HERE 852 | 853 | def _print(*args, sep='\t'): 854 | if len(args) == 1 and isinstance(args[0], (list, tuple)): 855 | print(sep.join(str(v) for v in args[0])) 856 | else: 857 | print(sep.join(str(v) for v in args)) 858 | 859 | 860 | text = sys.stdin.read() 861 | print(b64encode(zlib.compress(text.encode()))) 862 | ``` 863 | 864 | Usage example.
865 | ```sh 866 | $ seq 5 |ppp -i math 'line, math.sqrt(int(line))' 867 | 1 1.0 868 | 2 1.4142135623730951 869 | 3 1.7320508075688772 870 | 4 2.0 871 | 5 2.23606797749979 872 | ``` 873 | 874 | ## Pager 875 | 876 | ### Enable/Disable Pager 877 | In pypipe, the pager is automatically enabled if the standard output is a tty. To disable the pager, set the `PYPIPE_PAGER_ENABLED` environment variable to `false`. Additionally, you can enable/disable the pager by specifying the `--paging` or `--no-paging` options. This takes precedence over the `PYPIPE_PAGER_ENABLED` setting. However, if the standard output is not a tty, specifying `--paging` will not enable the pager. 878 | 879 | ### Pager command 880 | The default pager command is `less` (recommended, tested). You can change the pager command by setting the `PYPIPE_PAGER` environment variable. If `less` is specified as the PAGER, pypipe automatically adds the options set in the `PYPIPE_LESS_OPTS` environment variable. The default value for PYPIPE_LESS_OPTS is `-R -F`. 881 | 882 | ### Pager for `-p, --print` 883 | > [!Warning] 884 | > When interrupting with Ctrl-C while using `bat` as a pager, a display issue has been identified where the terminal output becomes corrupted (terminal command input is no longer visible). Exiting bat with `q` avoids this issue. 885 | 886 | You can change the Pager used when the `-p, --print` option is specified to a different Pager than the default. For example, by setting the `PYPIPE_PRINT_PAGER` environment variable as shown below, you can use [bat](https://github.com/sharkdp/bat) to display syntax-highlighted code: 887 | ```ini 888 | export PYPIPE_PRINT_PAGER='bat -l python --file-name=PYPIPE_GENERATED_CODE' 889 | ``` 890 | 891 | #### Output example when 'bat' is set as the Pager. 
892 | ![Alt text](docs/bat_pager_sample.png) 893 | 894 | 895 | ### Pager for `-v, --view` 896 | Similarly, by setting the `PYPIPE_VIEW_PAGER` environment variable, you can change the Pager used when the `-v, --view` option is specified to a different Pager than the default. Also, if you do not want to pass color control escape sequences to the Pager, you can disable colors by setting the `PYPIPE_VIEW_COLORED` environment variable to `false`, thereby avoiding this. 897 | 898 | 908 | --------------------------------------------------------------------------------