├── .gitignore
├── Changelog.md
├── Images
    └── GitHub_Batoken_Logo-02.png
├── LICENSE
├── MSXBatoken.ini
├── README.md
├── Tools
    ├── RandomNumbers.py
    ├── TokenCompare.py
    └── TokenFormatViz.py
├── msxbatoken.py
├── openMSXBatoken.ini
└── openMSXbatoken.py

/.gitignore:
--------------------------------------------------------------------------------
1 | *.DS_Store
2 | Versions/
3 | Support/

--------------------------------------------------------------------------------
/Changelog.md:
--------------------------------------------------------------------------------
1 | # MSX Basic Tokenizer
2 | 
3 | ## **v1.4**
4 | ***10-12-2021***
5 | - WINDOWS COMPATIBILITY YEY!
6 | - `os.path()` operations to improve compatibility across systems.
7 | - Changed `.ini` section names.
8 | 
9 | ## **v1.3**
10 | ***14-2-2020***
11 | - Python 3.8.
12 | - No more forcing an 8 character file name.
13 | - Changed `-fb` to `-frb`.
14 | - Warning issued if didn't delete original.
15 | 
16 | ## **v1.2**
17 | ***29-1-2020***
18 | - Bring version up to 1.2 to sync with **openMSX Basic (de)Tokenizer**.
19 | - Fully integrated with the **Badig** ecosystem.
20 | - Can be automatically called by the build system on **MSX Sublime Tools** and from **MSX Basic Dignified**.
21 | - Created an `.ini` file with `file_load`, `file_save`, `export_list`, `delete_original` and `verbose_level`.
22 | - Verbose level and log output upgraded to follow the **Badig** standard and elevated the tokenisation step by step to level 5.
23 | - Added `-do` *(delete original)* argument to delete the original ASCII file after the conversion.
24 | - Removed the `-bw` *(byte width)* argument.
25 | - Changed `-sl` *(save list)* to `-el` *(export list)* to avoid clash with **Dignified**'s `-sl` *(show labels)*.
26 | - `-el` now takes the numeric arguments from `-bw`.
27 | - Better `-fb` *(from build)* response.
28 | - Throw an error if `destination` is the same as the `source`.
29 | - Removed unnecessary global variables from functions.
30 | - Better error handling.
31 | - Small code optimizations.
32 | 
33 | ## **v1.0**
34 | ***17-9-2019***
35 | - `.mlt` assembler-like list file export option.
36 | - The `.mlt` file uses the same MSX Sublime Tools syntax highlight as the MSX Basic.
37 | - Fixed the order of the instructions on the token conversion list.
38 | - If a shorter instruction had the same characters as the start of a longer one, it would be picked first. Longer ones now come first.
39 | - Fixed all number parsing and conversions.
40 | - Fixed rounding of double precision numbers.
41 | - Added scientific notation.
42 | - Fixed bug with empty commas on ON GOTO/GOSUB.
43 | - Fixed discrepancy on numbers after AS on OPEN without the preceding #.
44 | - Small code optimization.
45 | 
46 | ## **v0.1**
47 | - Initial release.
48 | 
49 | ---
50 | 
51 | # openMSX Basic (de)Tokenizer
52 | 
53 | ## **v1.4**
54 | ***10-12-2021***
55 | - WINDOWS COMPATIBILITY YEY!
56 | - Individual Windows and Mac paths on the `.ini` file.
57 | - `os.path()` operations to improve compatibility across systems.
58 | - Changed `.ini` section names.
59 | 
60 | ## **v1.3**
61 | ***14-2-2020***
62 | - Python 3.8.
63 | - Better subprocess call and IO handling.
64 | - Improved verbose output.
65 | - Changed `-fb` to `-frb`.
66 | - Warning issued if didn't delete original.
67 | - Fixed bug and better handling when trying to load or save files with spaces and more than 8 characters.
68 | - Files opened on openMSX are now internally cropped to 8 characters and have spaces replaced with `_`.
69 | - Error if conflicting file names due to disk format limitations.
70 | 
71 | ## **v1.2**
72 | ***29-1-2020***
73 | - Significant code rewrite to bring it to the **Badig** standard.
74 | - Fully integrated with the **Badig** ecosystem.
75 | - Can be automatically called by the build system on **MSX Sublime Tools** and from **MSX Basic Dignified**.
76 | - Created an `.ini` file with `file_load`, `file_save`, `machine_name`, `disk_ext_name`, `output_format`, `delete_original`, `verbose_level`, `openmsx_filepath`.
77 | - Verbose level and log output created to follow the **Badig** standard.
78 | - Added `-do` *(delete original)* argument to delete the original file after the conversion.
79 | - Added `-of` *(output file)* to indicate the format to save: tokenized or ASCII.
80 | - Added `-fb` *(from build)*.
81 | - Removed `-asc` (replaced by `-of`).
82 | - Warning if a path is given for the `destination` file. The path will be removed; the `destination` is always saved on the same folder (the mounted MSX disk) as the `source`.
83 | - Throw an error if `destination` is the same as the `source`.
84 | - Will replace spaces on file names with an `_` to conform to the MSX disk specification.
85 | - Better error handling.
86 | 
87 | ## **v1.1**
88 | ***9-8-2019***
89 | - No more savestates. Emulator now boots with chosen (or default) machine and, if necessary, disk extension.
90 | - Extension can be at slot A or B. Default A, add `:SlotB` after the name for slot B.
91 | - Log output moved to function with more info.
92 | - Better error handling.
93 | 
94 | ## **v1.0**
95 | - Initial release.
96 | -------------------------------------------------------------------------------- /Images/GitHub_Batoken_Logo-02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farique1/MSX-Basic-Tokenizer/b0e47a28e2d46e214a3b459eaebb4cacf8773bbc/Images/GitHub_Batoken_Logo-02.png -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Fred Rique 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | 
--------------------------------------------------------------------------------
/MSXBatoken.ini:
--------------------------------------------------------------------------------
1 | [CONFIGS]
2 | file_load =
3 | file_save =
4 | export_list =
5 | delete_original =
6 | verbose_level =

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # ⛔️ DEPRECATED
2 | ### This repo is part of a series of tools that were integrated into the [Basic Dignified Suite](https://github.com/farique1/basic-dignified). The newer versions are available there. This version is out of date and will no longer be supported here.
3 | ### This repo now exists for archival purposes only.
4 | ----------------------------
5 | 
6 | # MSX Basic Tokenizer
7 | **v1.4**
8 | 
9 | Tokenize ASCII MSX Basic programs.
10 | 
11 | > **MSX Basic Tokenizer** is now fully integrated with the **Badig** ecosystem.
12 | > It can be automatically called by the build system on **[MSX Sublime Tools](https://github.com/farique1/MSX-Sublime-Tools)** and from **[MSX Basic Dignified](https://github.com/farique1/msx-basic-dignified)**.
13 | 
14 | ### How to use
15 | 
16 | `msxbatoken.py <source> [destination] [-do] [-el [0-32]] [-vb <0-5>] [-frb]`
17 | 
18 | Arguments can be set in the code itself, in `MSXBatoken.ini` or on the command line, with each method taking priority over the one before it.
19 | 
20 | #### Arguments
21 | 
22 | - *Source File*
23 |   The ASCII file to tokenize.
24 |   arg: `<source>` ini: `file_load =`
25 | 
26 | - *Destination File*
27 |   The tokenized file.
28 |   arg: `[destination]` ini: `file_save =`
29 | 
30 |   If `destination` is not given, the file will be saved as `source` with a `.bas` extension.
31 |   An error will be given if `destination` is the same as `source`.
32 | 
33 | - *Delete Original:*
34 |   Delete the `source` file after the tokenized version is successfully saved.
35 |   arg: `-do` ini: `delete_original = [true|false]`
36 | 
37 | - *Export List:*
38 |   Saves an `.mlt` list file, similar to the ones exported by assemblers, with the tokens alongside the ASCII code and some statistics.
39 |   arg: `-el [0-32]` ini: `export_list = [0-32]`
40 | 
41 |   The `[0-32]` argument refers to the number of bytes shown on the list after the line number.
42 |   The max value is `32`. If no number is given the default of `16` will be used. If `0` is given the list will not be exported.
43 | 
44 |   The format for each line is, e.g.:
45 |   ```
46 |   80da: ee80 7800 44 49 ef 50 49 f2 1f 41 31 41 59 26 53 60 00 00 120 DI=PI-3.1415926536
47 |   ```
48 |   - *Bytes 1-2*: The MSX Basic memory address of the line.
49 |   - *Bytes 3-6*: The first four bytes with the next line address and the line number.
50 |   - *Bytes 7-...*: The tokenized line.
51 |   - The line in ASCII.
52 | 
53 |   > The `.mlt` file uses the same [MSX Sublime Tools](https://github.com/farique1/MSX-Sublime-Tools) syntax highlight as the MSX Basic.
54 | 
55 | - *Verbose level:*
56 |   The amount of information given.
57 |   arg: `-vb <0-5>` ini: `verbose_level = [0-5]`
58 | 
59 |   `0` - Silent
60 |   `1` - Errors
61 |   `2` - Errors + Warnings
62 |   `3` - Errors + Warnings + Steps
63 |   `4` - Errors + Warnings + Steps + Details
64 |   `5` - Errors + Warnings + Steps + Details + Tokenization
65 | 
66 |   The 'Tokenization' verbose level shows the process byte by byte on each line.
67 |   ```BlitzBasic
68 |   10 PRINT "WH"
69 |   20 GOTO 10
70 |   ```
71 |   Will output:
72 |   ```BlitzBasic
73 |   |10 PRINT "WH"
74 |   0a00|PRINT "WH"
75 |   0a0091| "WH"
76 |   0a009120|"WH"
77 |   0a00912022|WH"
78 |   0a0091202257|H"
79 |   0a009120225748|"
80 |   0a00912022574822|
81 |   |20 GOTO 10
82 |   1400|GOTO 10
83 |   140089| 10
84 |   140089200e0a00|
85 |   ```
86 | 
87 | - *From Build:*
88 |   Tells **MSX Basic Tokenizer** it is running from a build system (or an external program) and adjusts some behaviours accordingly.
89 |   arg: `-frb`
90 | 
91 | ### Notes
92 | 
93 | **MSX Basic Tokenizer** was tested with over 100 random Basic programs from magazines and other sources, plus some programs created to stress the conversions. However, there are probably still some (several?) fringe cases not covered.
94 | **Be careful.**
95 | 
96 | There are some known discrepancies between **MSX Basic Tokenizer** and the MSX tokenization, most of them regarding errors on the code. They are:
97 | - MSX &b (binary notation) tokenizes anything after it as characters except when a tokenized command is reached. The implementation here only looks for 0 and 1, reverting back to the normal parsing when other characters are found.
98 | - Spaces at the end of a line are removed. The MSX does not remove them if loading from an ASCII file, only if typed on the machine.
99 | - The MSX seems to split overflowed numbers on branching instructions (preceded by 0e); here an error is thrown.
100 | - Syntax errors generate wildly different results from the ones generated by the MSX.
101 | 
102 | Some errors on the code stop the conversion. They are:
103 | - Line number too high, line number out of order, lines not starting with numbers, branching lines too high.
104 | - Numbers bigger than their explicit type (in some cases they are converted up, as on the MSX).
105 | 
106 | 
107 | # openMSX Basic (de)Tokenizer
108 | **v1.4**
109 | 
110 | Uses **openMSX** to convert a Basic program from ASCII to tokenized or vice-versa.
111 | 
112 | It calls **openMSX** headless (without screen) and with throttle, mounts a path (current = default) as a disk, loads a Basic file from this path, saves it with the chosen format and closes **openMSX**.
113 | 
114 | The path to an installation of **openMSX** is needed, and a machine can be chosen to override the default one. A disk drive extension can also be chosen for machines without one. It will be plugged on slot A by default, but it can be forced to slot B by putting `:SlotB` after its name.
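
The `:SlotB` suffix described above can be illustrated with a small sketch. The helper name `split_extension_slot` and the extension name `MyDiskExt` are hypothetical and are not taken from `openMSXbatoken.py`; the actual script may parse the setting differently.

```python
def split_extension_slot(disk_ext_name):
    """Split a `disk_ext_name` value into (extension, slot).

    Hypothetical helper: the slot is 'B' only when the name is followed
    by ':SlotB' (compared case-insensitively here, which is an assumption);
    otherwise it defaults to 'A'.
    """
    name, sep, suffix = disk_ext_name.strip().partition(':')
    slot = 'B' if sep and suffix.strip().lower() == 'slotb' else 'A'
    return name.strip(), slot


if __name__ == '__main__':
    # 'MyDiskExt' is a placeholder, not a real openMSX extension name.
    print(split_extension_slot('MyDiskExt'))
    print(split_extension_slot('MyDiskExt:SlotB'))
```

Keeping the slot as an optional suffix means a single `.ini` key can carry both the extension and its slot, with slot A needing no extra text.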
115 | 
116 | > Be careful with the folder used as a disk. **openMSX** respects the MSX disk limitations of size (the total size of all the files must not be greater than the emulated disk size), file name length and characters.
117 | > Always work on copies.
118 | 
119 | > **openMSX Basic (de)Tokenizer** is now fully integrated with the **Badig** ecosystem.
120 | > It can be automatically called by the build system on **[MSX Sublime Tools](https://github.com/farique1/MSX-Sublime-Tools)** and from **[MSX Basic Dignified](https://github.com/farique1/msx-basic-dignified)**.
121 | 
122 | ### How to use
123 | 
124 | `openmsxbatoken.py <source> [destination] [-of <t|a>] [-do] [-vb <0-4>] [-frb]`
125 | 
126 | Arguments can be set in the Python code itself, in `openMSXBatoken.ini` or on the command line, with each method taking priority over the one before it.
127 | 
128 | #### openMSX setup
129 | 
130 | On `openMSXBatoken.ini` specify:
131 | 
132 | - An alternate machine to override the default one.
133 |   `machine_name =` (optional)
134 | - A disk extension for machines without one.
135 |   Plugged on slot A by default; can be forced to slot B by putting `:SlotB` after its name.
136 |   `disk_ext_name =` (optional)
137 | - Under `[WINPATHS]`, the path to `openMSX.exe` on an installation of **openMSX** if using Windows.
138 |   `openmsx_filepath =` (required)
139 | - Under `[MACPATHS]`, the path to `openMSX.app` on an installation of **openMSX** if using macOS.
140 |   `openmsx_filepath =` (required)
141 | > Each system has its own path section in the `.ini` file so that, if you are like me, you can work on the same program on a PC and a Mac **at the same time**.
142 | > There are no command line arguments for these.
143 | 
144 | #### Arguments
145 | 
146 | - *Source File*
147 |   The file to convert.
148 |   arg: `<source>` ini: `file_load =`
149 | 
150 |   If a path (absolute) is given, this path will be used to mount the disk on the MSX.
151 |   If only a name is given, the current path will be mounted as a disk on the MSX.
152 |   Spaces in file names will be replaced with an `_` for compatibility with the MSX filesystem.
153 | 
154 | - *Destination File*
155 |   The file to be saved.
156 |   arg: `[destination]` ini: `file_save =`
157 | 
158 |   If a path is given it will be ignored with a warning. The file will always be saved on the previously mounted disk.
159 |   If no name is given the `source` name will be used with a `.bas` or `.asc` extension accordingly.
160 |   If the file already exists an error will be given.
161 |   Spaces in file names will be replaced with an `_` for compatibility with the MSX filesystem.
162 | 
163 | - *Output Format*
164 |   The format to save.
165 |   arg: `-of <t|a>` ini: `output_format = [t|a]`
166 | 
167 |   `t` is the default and saves the file in the tokenized format. `a` saves the file in ASCII.
168 |   Tokenized files will receive a `.bas` extension and ASCII files an `.asc` one.
169 | 
170 | - *Delete Original:*
171 |   Delete the `source` file after the converted version is successfully saved.
172 |   arg: `-do` ini: `delete_original = [true|false]`
173 | 
174 | - *Verbose level:*
175 |   The amount of information given.
176 |   arg: `-vb <0-4>` ini: `verbose_level = [0-4]`
177 | 
178 |   `0` - Silent
179 |   `1` - Errors
180 |   `2` - Errors + Warnings
181 |   `3` - Errors + Warnings + Steps
182 |   `4` - Errors + Warnings + Steps + Details
183 | 
184 | - *From Build:*
185 |   Tells **openMSX Basic (de)Tokenizer** it is running from a build system (or an external program) and adjusts some behaviours accordingly.
186 |   arg: `-frb`
187 | 
188 | ### Known bugs
189 | 
190 | An `autoexec.bas` on the mounted disk will run automatically, taking control of **openMSX** and possibly preventing the conversion.
191 | There is a problem passing file names with some special characters, “&” for instance.
192 | 
193 | 
194 | # Helper Tools
195 | 
196 | Some tools made to help develop and test the Tokenizer.
197 | > Their settings can be changed on the code itself. The variables are easily named and they are commented when needed. 198 | 199 | ## Token Compare 200 | 201 | Convert ASCII listings with **MSXBatoken** and compares them with a conversion from **openMSXBatoken** (using a "real" MSX). 202 | 203 | ## Token Format Viz 204 | 205 | Format a tokenized MSX Basic program to display a line of basic per line of tokens. 206 | Also can interleave another tokenized file to compare them line by line. 207 | > Take command line arguments. See code. 208 | 209 | ## Random Numbers 210 | 211 | Generate MSX Basic lines with random numbers of several types. 212 | Generate integer, single, double and scientific notation numbers. 213 | 214 | ------------ 215 | Main tokenizers Python 3.8. 216 | Helper tools Python 2.7. 217 | Use with care. 218 | -------------------------------------------------------------------------------- /Tools/RandomNumbers.py: -------------------------------------------------------------------------------- 1 | # Number generator test for MSXBatoken 2 | # Generate integer, single, double and scientific notation 3 | 4 | from random import randrange 5 | 6 | lines = 1000 7 | is_notation = True 8 | file_save = 'DiskToken/notAut.asc' 9 | 10 | signs = '#!%' 11 | numbers_arr = [] 12 | 13 | for item in range(1, lines + 1): 14 | sign = '' 15 | integer = '' 16 | fractio = '' 17 | integer_size = randrange(10) 18 | fractio_size = randrange(10) 19 | dot_prob = randrange(2) 20 | sign_prob = randrange(4) 21 | 22 | for i in range(integer_size): 23 | digit = str(randrange(10)) 24 | integer += digit 25 | 26 | dot = '.' 
if dot_prob == 0 else '' 27 | 28 | for i in range(fractio_size): 29 | digit = str(randrange(10)) 30 | fractio += digit 31 | 32 | number = integer + dot + fractio 33 | if number == '' or number == '.': 34 | number = '0' 35 | integer = '0' 36 | fractio = '0' 37 | 38 | if is_notation: 39 | precision = 'e' if sign_prob < 2 else 'd' 40 | sign = '-' if randrange(2) < 1 else '+' 41 | number += precision + sign + str(randrange(10)) 42 | else: 43 | if sign_prob < 3: 44 | if sign_prob == 2 and int(integer + fractio) <= 32767: 45 | number += signs[sign_prob] 46 | elif sign_prob < 2 and int(integer + fractio) <= 999999: 47 | number += signs[sign_prob] 48 | elif sign_prob < 1 and int(integer + fractio) <= (10 ** 63 - 1): 49 | number += signs[sign_prob] 50 | 51 | line = ' '.join([str(item * 10), 'print', number]) 52 | print ' '.join([str(item * 10), 'print', number]) 53 | numbers_arr.append(line) 54 | 55 | with open(file_save, 'w') as f: 56 | for line in numbers_arr: 57 | f.write(line + '\r\n') 58 | -------------------------------------------------------------------------------- /Tools/TokenCompare.py: -------------------------------------------------------------------------------- 1 | # Convert ASCII listings with MSXBatoken and compares it 2 | # with a conversion from openMSXBatoken (using the MSX) 3 | # cmp may be exchanged for fc on Windows (I guess) 4 | 5 | import os 6 | import time 7 | import glob 8 | import subprocess 9 | 10 | report_arr = [] 11 | 12 | error_tolerance = 15 13 | save_list_report = True 14 | save_file_report = False 15 | base_path = 'basictests/' 16 | file_list = [] 17 | save_file = '' 18 | make_formated_viz = True 19 | make_token_list_file = True 20 | delete_files_if_ok = True 21 | # Put a single entry ['name'] to do a single file 22 | # An empty file_list [] will get the whole base_path folder 23 | 24 | if save_file == '': 25 | if len(file_list) == 1: 26 | save_file = file_list[0] 27 | elif base_path != '': 28 | save_file = 
os.path.basename(os.path.normpath(base_path))
29 |     else:
30 |         print '*** No save file given'
31 |         raise SystemExit(0)
32 | 
33 | token_list_file = ' -sl' if make_token_list_file else ''
34 | 
35 | if file_list == []:
36 |     for files in glob.glob(base_path + '*.asc'):
37 |         file = os.path.basename(files)
38 |         file = os.path.splitext(file)[0]
39 |         file_list.append(file)
40 | 
41 | file_list.sort()
42 | 
43 | 
44 | def report(text):
45 |     text = ' '.join(text)
46 |     report_arr.append(text)
47 |     print text
48 | 
49 | 
50 | report([save_file])
51 | report([])
52 | 
53 | for file in file_list:
54 |     base_name = file  # os.path.splitext(file)[0]
55 |     file_load = base_path + base_name + '.asc'
56 |     file_msx = base_path + base_name + '.bas'
57 |     file_bto = base_path + base_name + '.bto'
58 |     file_btc = base_path + base_name + '.btc'
59 | 
60 |     wait_time = 1
61 |     file_size = os.path.getsize(file_load)  # size in bytes (was a tuple, so the test below was always true)
62 |     if file_size > 10000:
63 |         wait_time = 4
64 | 
65 |     report(['---', base_name + '.asc'])
66 |     os.system('python MSXBatoken20.py ' + file_load + ' ' + file_bto + token_list_file)
67 |     os.system('python openMSXBatoken.py ' + file_load + ' ' + file_msx)
68 |     time.sleep(wait_time)
69 | 
70 |     cmd = ('cmp --verbose ' + file_msx + ' ' + file_bto)
71 |     proc = subprocess.Popen([cmd], shell=True, stdin=subprocess.PIPE,
72 |                             stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
73 | 
74 |     error_num = 0
75 |     log_out = '*'
76 |     while log_out:
77 |         log_out = proc.stdout.readline().strip()
78 |         if not log_out.replace(' ', '').isdigit():
79 |             if log_out.replace(' ', '') != '':
80 |                 report([' *', log_out])
81 |                 error_num = 1
82 |             break
83 |         output = log_out.split()
84 |         if log_out:
85 |             address = '{0:04x}'.format(int(output[0]))
86 |             source = '{0:02x}'.format(int(output[1], 8))  # cmp prints the differing byte values in octal
87 |             destin = '{0:02x}'.format(int(output[2], 8))
88 |             line, col = divmod(int(output[0]), 16)
89 |             report(['***', address, source, destin, '-', str(line + 1), str(col)])
90 |             error_num += 1
91 |         if error_num >= error_tolerance:
92 |
report([' - More than']) 93 | break 94 | 95 | if error_num == 0: 96 | if delete_files_if_ok: 97 | os.remove(file_msx) 98 | os.remove(file_bto) 99 | if os.path.isfile(base_path + base_name + '.mlt'): 100 | os.remove(base_path + base_name + '.mlt') 101 | if os.path.isfile(file_btc): 102 | os.remove(file_btc) 103 | report(['--- No errors']) 104 | else: 105 | report([' -', str(error_num), 'error(s)']) 106 | if make_formated_viz: 107 | os.system('python TokenFormatViz.py -lb ' + file_msx + ' -lc ' + file_bto + ' -sa ' + file_btc) 108 | 109 | report(['']) 110 | 111 | if (save_file_report and len(file_list) == 1) or \ 112 | (save_list_report and len(file_list) > 1): 113 | with open(base_path + save_file + '.log', 'w') as f: 114 | for line in report_arr: 115 | f.write(line + '\n') 116 | -------------------------------------------------------------------------------- /Tools/TokenFormatViz.py: -------------------------------------------------------------------------------- 1 | # Format a tokenized MSX Basic program to display a line of basic per line of tokens 2 | # Also can interleave another tokenized file to compare them line by line 3 | 4 | import argparse 5 | 6 | file_load = ['BasicOthers/expert.bas', 7 | 'BasicOthers/expert.bto'] 8 | file_save = 'BasicOthers/expert.btc' 9 | 10 | parser = argparse.ArgumentParser(description='Format MSX tokenized binary.') 11 | parser.add_argument("-lb", default=file_load[0], help='The file to format.') 12 | parser.add_argument("-lc", default=file_load[1], help='A second file to compare.') 13 | parser.add_argument("-sa", default=file_save, help='The formatted output file.') 14 | args = parser.parse_args() 15 | 16 | file_load[0] = args.lb 17 | file_load[1] = args.lc 18 | file_save = args.sa 19 | 20 | file_load = [file_load[0]] if file_load[1] == '' else file_load 21 | file_save = file_load if file_save == '' else file_save 22 | 23 | 24 | def bytes_from_file(filename, chunksize=8192): 25 | with open(filename, "rb") as f: 26 | while True: 
27 | chunk = f.read(chunksize) 28 | if chunk: 29 | for b in chunk: 30 | yield b 31 | else: 32 | break 33 | 34 | 35 | def get_next_addrs(pos): 36 | return int(bin_file[pos + 1] + bin_file[pos], 16) - int('8000', 16) - 1 37 | 38 | 39 | for file in file_load: 40 | bin_file = [] 41 | for b in bytes_from_file(file): 42 | bin_file.append('{0:02x}'.format(ord(b))) 43 | 44 | bin_file.remove('ff') 45 | bin_form = [] 46 | byte_line = '' 47 | next_addrs = get_next_addrs(0) 48 | prev_addrs = 0 49 | for n, byte in enumerate(bin_file): 50 | byte_line += byte 51 | if n != prev_addrs + 0 and n != prev_addrs + 2: 52 | byte_line += ' ' 53 | if n == next_addrs - 1: 54 | line_num = str(int(byte_line[7:9] + byte_line[5:7], 16)) 55 | bin_form.append(line_num + ': ' + byte_line + '\n') 56 | byte_line = '' 57 | prev_addrs = next_addrs 58 | next_addrs = get_next_addrs(n + 1) 59 | bin_form.insert(0, '"' + file + '"' + '\n') 60 | if file == file_load[0]: 61 | comp_form = bin_form 62 | blnk_form = ['\n'] * len(comp_form) 63 | 64 | if len(file_load) > 1: 65 | lists = [comp_form, bin_form, blnk_form] 66 | bin_form = [val for pair in zip(*lists) for val in pair] 67 | 68 | with open(file_save, 'w') as f: 69 | for line in bin_form: 70 | f.write(line) 71 | -------------------------------------------------------------------------------- /msxbatoken.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """ 3 | MSX Basic Tokenizer 4 | v1.4 5 | Convert ASCII MSX Basic to tokenized format 6 | 7 | Copyright (C) 2019-2022 - Fred Rique (farique) 8 | https://github.com/farique1/MSX-Basic-Tokenizer 9 | 10 | See also: 11 | MSX Sublime Tools at 12 | https://github.com/farique1/MSX-Sublime-Tools 13 | Syntax Highlight, Theme, Build System, Comment Preference and Auto Completion. 14 | 15 | MSX Basic Dignified at 16 | https://github.com/farique1/msx-basic-dignified 17 | Convert modern MSX Basic Dignified to traditional MSX Basic format. 
18 | 
19 | msxbatoken.py <source> [destination] [args...]
20 | msxbatoken.py -h for help.
21 | 
22 | New: 1.4v 10/12/2021
23 |     WINDOWS COMPATIBILITY YEY!
24 |     os.path() operations to improve compatibility across systems
25 |     Changed .INI section names
26 | 
27 | Notes:
28 |     Known discrepancies:
29 |         MSX &b (binary notation) tokenizes anything after it as characters except when a tokenized command is reached.
30 |             The implementation here only looks for 0 and 1, reverting back to the normal parsing when other characters are found.
31 |         Spaces at the end of a line are removed.
32 |             The MSX does not remove them if loading from an ASCII file, only if typed on the machine.
33 |         The MSX seems to split overflowed numbers on branching instructions (preceded by 0e); here an error is thrown.
34 |         Syntax errors generate wildly different results from the ones generated by the MSX.
35 |     Conversion stopping errors:
36 |         Line number too high, line number out of order, lines not starting with numbers, branching lines too high
37 |         Numbers bigger than their explicit type (in some cases they are converted up, as on the MSX)
38 |     Tested with over 100 random Basic programs from magazines and other sources, some created to stress the conversions
39 |     However, there are probably still some (several?) fringe cases not covered here. Be careful.
40 | """
41 | 
42 | import re
43 | import os.path
44 | import binascii
45 | import argparse
46 | import configparser
47 | from datetime import datetime
48 | from os import remove as osremove
49 | 
50 | file_load = ''  # Source file
51 | file_save = ''  # Destination file
52 | export_list = 0  # Save a .mlt list file detailing the tokenization: [#] number of bytes per line (def 16) (max 32) (0 no)
53 | delete_original = False  # Delete the original ASCII file
54 | verbose_level = 3  # Verbosity level: 0 silent, 1 errors, 2 +warnings, 3 +steps(def), 4 +details, 5 +conversion dump
55 | is_from_build = False  # Tell if it is being called from a build system (show file name on error messages and other stuff)
56 | 
57 | 
58 | def show_log(line_number, text, level, **kwargs):
59 |     bullets = ['', '*** ', ' * ', '--- ', ' - ', ' ']
60 | 
61 |     try:
62 |         bullet = kwargs['bullet']
63 |     except KeyError:
64 |         bullet = level
65 | 
66 |     display_file_name = ''
67 |     if is_from_build and (bullet == 1 or bullet == 2):
68 |         display_file_name = os.path.basename(file_load) + ': '
69 | 
70 |     line_number = '(' + str(line_number) + '): ' if line_number != '' else ''
71 | 
72 |     if verbose_level >= level:
73 |         print(bullets[bullet] + display_file_name + line_number + text)
74 | 
75 |     if bullet == 1:
76 |         if is_from_build:
77 |             print(' Tokenizing_aborted')
78 |         else:
79 |             print(' Execution_stoped')
80 |         print()
81 |         raise SystemExit(0)
82 | 
83 | 
84 | local_path = os.path.split(os.path.abspath(__file__))[0]
85 | ini_path = os.path.join(local_path, 'MSXBatoken.ini')
86 | if os.path.isfile(ini_path):
87 |     config = configparser.ConfigParser()
88 |     config.sections()
89 |     try:
90 |         config.read(ini_path)
91 |         file_load = config.get('CONFIGS', 'file_load') if config.get('CONFIGS', 'file_load') else file_load
92 |         file_save = config.get('CONFIGS', 'file_save') if config.get('CONFIGS', 'file_save') else file_save
93 |         # export_list holds a byte count (0-32), so it is read as an integer, not a boolean
94 |         export_list = config.getint('CONFIGS', 'export_list') if config.get('CONFIGS', 'export_list') else export_list
94 |         delete_original = config.getboolean('CONFIGS', 'delete_original') if config.get('CONFIGS', 'delete_original') else delete_original
95 |         verbose_level = config.getint('CONFIGS', 'verbose_level') if config.get('CONFIGS', 'verbose_level') else verbose_level
96 |     except (ValueError, configparser.NoOptionError) as e:
97 |         show_log('', 'MSXBatoken.ini: ' + str(e), 1)
98 | 
99 | parser = argparse.ArgumentParser(description='Tokenize ASCII MSX Basic')
100 | parser.add_argument("input", nargs='?', default=file_load, help='Source file (preferably .asc)')
101 | parser.add_argument("output", nargs='?', default=file_save, help='Destination file ([source].bas) if missing')
102 | parser.add_argument("-el", default=export_list, const=16, type=int, nargs='?', help="Save a .mlt list file detailing the tokenization: [#] number of bytes per line (def 16) (max 32)")
103 | parser.add_argument("-do", default=delete_original, action='store_true', help="Delete original file after conversion")
104 | parser.add_argument("-vb", default=verbose_level, type=int, help="Verbosity level: 0 silent, 1 errors, 2 +warnings, 3 +steps(def), 4 +details, 5 +conversion dump")
105 | parser.add_argument("-frb", default=is_from_build, action='store_true', help="Tell it is running from a build system")
106 | args = parser.parse_args()
107 | 
108 | file_load = args.input
109 | file_save = args.output
110 | if args.output == '':
111 |     save_path = os.path.dirname(file_load)
112 |     save_path = '' if save_path == '' else save_path
113 |     save_file = os.path.basename(file_load)
114 |     save_file = os.path.splitext(save_file)[0] + '.bas'
115 |     file_save = os.path.join(save_path, save_file)
116 | bytes_width = min(abs(args.el), 32)
117 | export_list = True if args.el > 0 else False
118 | delete_original = args.do
119 | verbose_level = args.vb
120 | is_from_build = args.frb
121 | 
122 | lines_num = 0
123 | width_byte = bytes_width * 2
124 | width_line = bytes_width * 3 + 7
125 | now = datetime.now()
126
| list_path = os.path.dirname(file_load) 127 | # list_path = '' if list_path == '' else list_path 128 | list_file = os.path.basename(file_load) 129 | list_file = os.path.splitext(list_file)[0] + '.mlt' 130 | file_list = os.path.join(list_path, list_file) 131 | tokens = [('>', 'ee'), ('PAINT', 'bf'), ('=', 'ef'), ('ERROR', 'a6'), ('ERR', 'e2'), ('<', 'f0'), ('+', 'f1'), 132 | ('FIELD', 'b1'), ('PLAY', 'c1'), ('-', 'f2'), ('FILES', 'b7'), ('POINT', 'ed'), ('*', 'f3'), ('POKE', '98'), 133 | ('/', 'f4'), ('FN', 'de'), ('^', 'f5'), ('FOR', '82'), ('PRESET', 'c3'), ('\\', 'fc'), ('PRINT', '91'), ('?', '91'), 134 | ('PSET', 'c2'), ('AND', 'f6'), ('GET', 'b2'), ('PUT', 'b3'), ('GOSUB', '8d'), ('READ', '87'), ('GOTO', '89'), 135 | ('ATTR$', 'e9'), ('RENUM', 'aa'), ('AUTO', 'a9'), ('IF', '8b'), ('RESTORE', '8c'), ('BASE', 'c9'), ('IMP', 'fa'), 136 | ('RESUME', 'a7'), ('BEEP', 'c0'), ('INKEY$', 'ec'), ('RETURN', '8e'), ('BLOAD', 'cf'), ('INPUT', '85'), 137 | ('BSAVE', 'd0'), ('INSTR', 'e5'), ('RSET', 'b9'), ('CALL', 'ca'), ('_', '5f'), ('RUN', '8a'), ('IPL', 'd5'), ('SAVE', 'ba'), 138 | ('KEY', 'cc'), ('SCREEN', 'c5'), ('KILL', 'd4'), ('SET', 'd2'), ('CIRCLE', 'bc'), ('CLEAR', '92'), ('CLOAD', '9b'), 139 | ('LET', '88'), ('SOUND', 'c4'), ('CLOSE', 'b4'), ('LFILES', 'bb'), ('CLS', '9f'), ('LINE', 'af'), ('SPC(', 'df'), 140 | ('CMD', 'd7'), ('LIST', '93'), ('SPRITE', 'c7'), ('COLOR', 'bd'), ('LLIST', '9e'), ('CONT', '99'), ('LOAD', 'b5'), 141 | ('STEP', 'dc'), ('COPY', 'd6'), ('LOCATE', 'd8'), ('STOP', '90'), ('CSAVE', '9a'), ('CSRLIN', 'e8'), 142 | ('STRING$', 'e3'), ('LPRINT', '9d'), ('SWAP', 'a4'), ('LSET', 'b8'), ('TAB(', 'db'), ('MAX', 'cd'), ('DATA', '84'), 143 | ('MERGE', 'b6'), ('THEN', 'da'), ('TIME', 'cb'), ('TO', 'd9'), ('DEFDBL', 'ae'), ('DEFINT', 'ac'), ('DEFSTR', 'ab'), 144 | ('TROFF', 'a3'), ('DEFSNG', 'ad'), ('TRON', 'a2'), ('DEF', '97'), ('MOD', 'fb'), ('USING', 'e4'), 145 | ('DELETE', 'a8'), ('MOTOR', 'ce'), ('USR', 'dd'), ('DIM', '86'), ('NAME', 'd3'), 
('DRAW', 'be'), ('NEW', '94'), 146 | ('VARPTR', 'e7'), ('NEXT', '83'), ('VDP', 'c8'), ('DSKI$', 'ea'), ('NOT', 'e0'), ('DSKO$', 'd1'), ('VPOKE', 'c6'), 147 | ('OFF', 'eb'), ('WAIT', '96'), ('END', '81'), ('ON', '95'), ('WIDTH', 'a0'), ('OPEN', 'b0'), ('XOR', 'f8'), 148 | ('EQV', 'f9'), ('OR', 'f7'), ('ERASE', 'a5'), ('OUT', '9c'), ('ERL', 'e1'), ('REM', '8f'), 149 | 150 | ('PDL', 'ffa4'), ('EXP', 'ff8b'), ('PEEK', 'ff97'), ('FIX', 'ffa1'), ('POS', 'ff91'), ('FPOS', 'ffa7'), 151 | ('ABS', 'ff86'), ('FRE', 'ff8f'), ('ASC', 'ff95'), ('ATN', 'ff8e'), ('HEX$', 'ff9b'), ('BIN$', 'ff9d'), 152 | ('INP', 'ff90'), ('RIGHT$', 'ff82'), ('RND', 'ff88'), ('INT', 'ff85'), ('CDBL', 'ffa0'), ('CHR$', 'ff96'), 153 | ('CINT', 'ff9e'), ('LEFT$', 'ff81'), ('SGN', 'ff84'), ('LEN', 'ff92'), ('SIN', 'ff89'), ('SPACE$', 'ff99'), 154 | ('SQR', 'ff87'), ('LOC(', 'ffac28'), ('STICK', 'ffa2'), ('COS', 'ff8c'), ('LOF', 'ffad'), ('STR$', 'ff93'), 155 | ('CSNG', 'ff9f'), ('LOG', 'ff8a'), ('STRIG', 'ffa3'), ('LPOS', 'ff9c'), ('CVD', 'ffaa'), ('CVI', 'ffa8'), 156 | ('CVS', 'ffa9'), ('TAN', 'ff8d'), ('MID$', 'ff83'), ('MKD$', 'ffb0'), ('MKI$', 'ffae'), ('MKS$', 'ffaf'), 157 | ('VAL', 'ff94'), ('DSKF', 'ffa6'), ('VPEEK', 'ff98'), ('OCT$', 'ff9a'), ('EOF', 'ffab'), ('PAD', 'ffa5'), 158 | 159 | ("'", '3a8fe6'), ('ELSE', '3aa1'), ('AS', '4153')] 160 | jumps = ['RESTORE', 'AUTO', 'RENUM', 'DELETE', 'RESUME', 'ERL', 'ELSE', 'RUN', 'LIST', 'LLIST', 'GOTO', 'RETURN', 'THEN', 'GOSUB'] 161 | 162 | 163 | def update_lines(source, compiled): 164 | global line_compiled 165 | global line_source 166 | if len(line_source) > 2: 167 | line_source = line_source[source:] 168 | line_compiled = line_compiled + compiled 169 | show_log('', ' '.join([line_compiled + '|' + line_source.rstrip()]), 5) 170 | 171 | 172 | def parse_numeric_bases(nugget_comp, token, base): 173 | if not nugget_comp: 174 | nugget_comp = '' 175 | hexa = '0000' 176 | else: 177 | if int(nugget_comp, base) > 65535: 178 | show_log(line_number, ' 
'.join(['overflow', nugget_comp]), 1) # Exit 179 | hexa = '{0:04x}'.format(int(nugget_comp, base)) 180 | return token + hexa[2:] + hexa[:-2] 181 | 182 | 183 | def make_list(base_prev, compiled, source): 184 | line_inc = 12 185 | next_addr = str(compiled[0:4]) 186 | curr_line = str(compiled[4:8]) 187 | line_byte = str(compiled[8:]) 188 | line_splt = [line_byte[i:i + width_byte] for i in range(0, len(line_byte), width_byte)] 189 | for line in line_splt: 190 | curr_addr = str(hex(base_prev)[2:][:-2]) + str(hex(base_prev)[2:][2:]) 191 | 192 | byte_splt = ' '.join([line[i:i + 2] for i in range(0, len(line), 2)]) 193 | 194 | line_list = curr_addr + ': ' + next_addr + ' ' + curr_line + ' ' \ 195 | + byte_splt + (' ' * (width_line - len(byte_splt))) + source.rstrip() 196 | 197 | list_code.append(line_list) 198 | next_addr, curr_line, source = ' ', '', '' 199 | base_prev += line_inc 200 | line_inc = len(line) // 2 201 | 202 | 203 | def parse_sgn_dbl(header, precision, nugget_integer, nugget_fractional, nugget_group_1_orig, nugget_number): 204 | nugget_stripped = nugget_integer.lstrip('0') 205 | if nugget_stripped == '': 206 | if nugget_fractional == '' or int(str(nugget_fractional[1:]) + '0') == 0: 207 | nugget_stripped = '0' 208 | hexa_precision = '00' 209 | else: 210 | nugget_integer = nugget_group_1_orig 211 | if str(nugget_fractional[1]) == '0': 212 | nugget_zeros = nugget_fractional[1:].rstrip('0') 213 | hexa_precision = '{0:02x}'.format(64 - (len(nugget_zeros) - len(nugget_zeros.lstrip('0')))) 214 | else: 215 | hexa_precision = '40' 216 | else: 217 | hexa_precision = '{0:02x}'.format(len(nugget_stripped) + 64) 218 | hexa = header + hexa_precision 219 | cropped = str(int(nugget_number)) 220 | round_digit = int(cropped[precision:precision + 1]) if cropped[precision:precision + 1].isdigit() else 0 221 | nugget_cropped = cropped[0:precision] if round_digit < 5 else str(int(cropped[:precision]) + 1) 222 | hexa += nugget_cropped 223 | return hexa, nugget_integer 224 | 225 | 
226 | show_log('', '', 3, bullet=0) 227 | 228 | if file_save == file_load: 229 | show_log('', ' '.join(['destination_same_as_source', file_save]), 1) # Exit 230 | 231 | show_log('', 'Loading file', 3) 232 | ascii_code = [] 233 | if file_load: 234 | show_log('', ' '.join(['load_file:', file_load]), 4) 235 | try: 236 | with open(file_load, 'r', encoding='latin1') as f: 237 | for line in f: 238 | if line.strip() == "" or line.strip().isdigit(): 239 | continue 240 | ascii_code.append(line.strip() + '\r\n') 241 | except IOError: 242 | show_log('', ' '.join(['source_not_found', file_load]), 1) # Exit 243 | else: 244 | show_log('', 'source_not_given', 1) # Exit 245 | 246 | show_log('', 'Start tokenizing', 3) 247 | base = 0x8001 248 | base_base = base 249 | line_order = 0 250 | line_number = 0 251 | tokenized_code = ['ff'] 252 | list_code = ["' -------------------------------------", 253 | "' MSX Basic Tokenizer: " + '"' + os.path.basename(file_load) + '"', 254 | "' Date: " + now.strftime("%Y-%m-%d %H:%M:%S"), 255 | "' -------------------------------------", ""] 256 | list_code.append(hex(base - 1)[2:] + ': ' + 'ff' + (' ' * (width_line + 8)) + 'start') 257 | 258 | for line_source in ascii_code: 259 | if ord(line_source[0]) < 48 or ord(line_source[0]) > 57: 260 | if ord(line_source[0]) == 26: # Skip the EOF character (0x1A) on the last line of some listings 261 | continue 262 | else: 263 | show_log(line_number, ' '.join(['line_not_starting_with_number']), 1) # Exit 264 | 265 | if line_source == '': 266 | continue 267 | 268 | base_source = line_source 269 | line_compiled = '' 270 | 271 | show_log('', ' '.join([line_compiled + '|' + line_source.rstrip()]), 5) 272 | 273 | # Get line number 274 | nugget = re.match(r'\s*\d+\s?', line_source).group() 275 | line_number = nugget.strip() 276 | if int(line_number) <= line_order: 277 | show_log(line_number, ' '.join(['line_number_out_of_order', str(line_number)]), 1) # Exit 278 | if int(line_number) > 65529: 279 | show_log(line_number, '
'.join(['line_number_too_high', str(line_number)]), 1) # Exit 280 | line_order = int(line_number) 281 | line_source = line_source[len(nugget):] 282 | hexa = '{0:04x}'.format(int(nugget)) 283 | line_compiled += hexa[2:] + hexa[:-2] 284 | 285 | show_log('', ' '.join([line_compiled + '|' + line_source.rstrip()]), 5) 286 | 287 | # Look for instructions 288 | while len(line_source) > 2: 289 | for command, token in tokens: 290 | if line_source.upper().startswith(command): 291 | compiled = token 292 | source = len(command) 293 | update_lines(source, compiled) 294 | 295 | if command == 'AS': 296 | nugget = re.match(r'(\s*)(\d{1,2})', line_source) 297 | if nugget: 298 | nugget_spaces = nugget.group(1) 299 | nugget_line = nugget.group(2) 300 | hex_spaces = '20' * len(nugget_spaces) 301 | hexa = ''.join('{0:02x}'.format(ord(digit)) for digit in nugget_line) # ord() takes a single character, so encode each digit 302 | compiled = hex_spaces + hexa 303 | source = len(nugget_spaces) + len(nugget_line) 304 | update_lines(source, compiled) 305 | 306 | # Is a jumping instruction 307 | if command in jumps: 308 | while True: 309 | nugget = re.match(r'(\s*)(\d+|,+)', line_source) 310 | if nugget: 311 | nugget_spaces = nugget.group(1) 312 | nugget_line = nugget.group(2) 313 | if nugget_line.isdigit(): 314 | if int(nugget_line) > 65529: 315 | show_log(line_number, ' '.join(['line_number_jump_too_high', str(nugget_line)]), 1) # Exit 316 | hex_spaces = '20' * len(nugget_spaces) 317 | hexa = '{0:04x}'.format(int(nugget_line)) 318 | compiled = hex_spaces + '0e' + hexa[2:] + hexa[:-2] 319 | source = len(nugget_spaces) + len(nugget_line) 320 | update_lines(source, compiled) 321 | # Has several jumps (on goto/gosub) 322 | else: 323 | hex_spaces = '20' * len(nugget_spaces) 324 | hexa = '2c' * len(nugget_line) 325 | compiled = hex_spaces + hexa 326 | source = len(nugget_spaces) + len(nugget_line) 327 | update_lines(source, compiled) 328 | else: 329 | break 330 | 331 | # Instruction with literal data after it 332 | if command == 'DATA' or command == 'REM' or command == "'" or
command == 'CALL' or command == '_': 333 | while True: 334 | character = line_source[0] 335 | if command == 'CALL' or command == '_': 336 | character = character.upper() 337 | hexa = '{0:02x}'.format(ord(character)) 338 | compiled = hexa 339 | source = 1 340 | update_lines(source, compiled) 341 | 342 | if len(line_source) <= 2 \ 343 | or (command == 'DATA' and line_source[0] == ':') \ 344 | or (command == '_' and (line_source[0] == ':' or line_source[0] == '('))\ 345 | or (command == 'CALL' and (line_source[0] == ':' or line_source[0] == '(')): 346 | break 347 | break 348 | 349 | # Look each character 350 | else: 351 | # Is a number 352 | is_int = False 353 | nugget = line_source[0].upper() 354 | if nugget.isdigit() or nugget == '.': 355 | nugget = re.match(r'(\d*)\s*(.)\s*(.?)', line_source) 356 | nugget_number = nugget.group(1) 357 | nugget_integer = nugget.group(1) 358 | nugget_fractional = '' 359 | nugget_signal = nugget.group(2) 360 | nugget_notif_confirm = nugget.group(3) 361 | 362 | # Is floating point 363 | if nugget_signal == '.': 364 | nugget = re.match(r'(\d*)\s*(.)\s*(\d*)\s*(.)\s*(.?)', line_source) 365 | nugget_group1 = '0' if nugget.group(1) == '' else nugget.group(1) 366 | nugget_number = nugget_group1 + nugget.group(3) 367 | nugget_integer = nugget_group1 368 | nugget_fractional = '.' + nugget.group(3) 369 | nugget_signal = nugget.group(4) 370 | nugget_notif_confirm = nugget.group(5) 371 | 372 | # Has integer signal 373 | if nugget_signal == '%': 374 | nugget_number = nugget_integer 375 | is_int = True 376 | if int(nugget_number) >= 32768: 377 | show_log(line_number, ' '.join(['overflow', str(nugget_number)]), 1) # Exit 378 | elif nugget_signal != '%' and nugget_signal != '!' 
and nugget_signal != '#' and \ 379 | ((nugget_signal.lower() != 'e' and nugget_signal.lower() != 'd') 380 | or (nugget_notif_confirm != '-' and nugget_notif_confirm != '+')): 381 | nugget_signal = '' 382 | if nugget_fractional == '': 383 | is_int = True 384 | 385 | # Is scientific notation 386 | if (nugget_signal.lower() == 'e' or nugget_signal.lower() == 'd') \ 387 | and (nugget_notif_confirm == '-' or nugget_notif_confirm == '+'): # Avoid matching E from ELSE after a number 388 | exp = re.match(r'\d*\s*.\s*\d*\s*.\s*(\+|-)\s*(\d+)', line_source) 389 | 390 | nugget_exp_size = len(nugget_integer.lstrip('0')) + int(exp.group(1) + exp.group(2)) 391 | nugget_man_size = nugget_exp_size - len(nugget_fractional[1:]) - len(nugget_integer.lstrip('0')) 392 | 393 | if nugget_exp_size > 63 or nugget_exp_size < -64: 394 | show_log(line_number, ' '.join(['overflow', str(nugget_number)]), 1) # Exit 395 | 396 | fractional = abs(nugget_man_size) if nugget_man_size < 0 else 0 397 | notation = '%.*f' % (fractional, int(nugget_number) * (10 ** (nugget_man_size))) 398 | notation_parts = re.match(r'(\d+)(\.\d+)?', notation) 399 | notation_integer = notation_parts.group(1) 400 | notation_fractional = notation_parts.group(2) if notation_parts.group(2) else '' 401 | notation_number = notation.replace('.', '') 402 | notation_size = nugget_number.lstrip('0') 403 | 404 | if nugget_signal.lower() == 'e' and len(notation_size) < 7: 405 | hexa, _ = parse_sgn_dbl('1d', 6, notation_integer, notation_fractional, 406 | nugget.group(1), notation_number) 407 | hexa += '0' * (10 - len(hexa)) 408 | else: 409 | hexa, _ = parse_sgn_dbl('1f', 14, notation_integer, notation_fractional, 410 | nugget.group(1), notation_number) 411 | hexa += '0' * (18 - len(hexa)) 412 | hexa = hexa[0:18] 413 | 414 | nugget_integer = nugget.group(1) if nugget_integer.lstrip('0') == '' else nugget_integer 415 | nugget_signal += exp.group(1) + exp.group(2) 416 | 417 | # Is single precision 418 | elif (int(nugget_number) >= 32768 
and int(nugget_number) <= 999999 and nugget_signal != '#') \ 419 | or (nugget_signal == '!' and int(nugget_number) <= (10 ** 63 - 1)) \ 420 | or (not is_int and int(nugget_number) <= 999999 and nugget_signal != '#'): 421 | 422 | hexa, nugget_integer = parse_sgn_dbl('1d', 6, nugget_integer, nugget_fractional, 423 | nugget.group(1), nugget_number) 424 | hexa += '0' * (10 - len(hexa)) 425 | 426 | # Is double precision 427 | elif (int(nugget_number) >= 1000000 and int(nugget_number) <= (10 ** 63 - 1)) \ 428 | or (nugget_signal == '#' and int(nugget_number) <= (10 ** 63 - 1)) \ 429 | or (not is_int and int(nugget_number) <= (10 ** 63 - 1)): 430 | 431 | hexa, nugget_integer = parse_sgn_dbl('1f', 14, nugget_integer, nugget_fractional, 432 | nugget.group(1), nugget_number) 433 | hexa += '0' * (18 - len(hexa)) 434 | hexa = hexa[0:18] 435 | 436 | # Is normal integer 437 | elif int(nugget_number) >= 0 and int(nugget_number) <= 9: 438 | nugget_add = str(int(nugget_number) + 17) 439 | hexa = '{0:02x}'.format(int(nugget_add)) 440 | 441 | elif int(nugget_number) >= 10 and int(nugget_number) <= 255: 442 | hexa = '0f' + '{0:02x}'.format(int(nugget_number)) 443 | 444 | elif int(nugget_number) >= 256 and int(nugget_number) <= 32767: 445 | hexa = '{0:04x}'.format(int(nugget_number)) 446 | hexa = '1c' + hexa[2:] + hexa[:-2] 447 | 448 | else: 449 | show_log(line_number, ' '.join(['number_too_high', str(nugget_number.lstrip('0'))]), 1) # Exit 450 | 451 | compiled = hexa 452 | source = len(nugget_integer) + len(nugget_fractional) + len(nugget_signal) 453 | update_lines(source, compiled) 454 | 455 | # Other bases 456 | elif nugget == '&': 457 | nugget = line_source[0:2].upper() 458 | if nugget == '&H': 459 | nugget_comp = re.match(r'[0-9a-f]*', line_source[2:].lower()).group() 460 | hexa = parse_numeric_bases(nugget_comp, '0c', 16) 461 | elif nugget == '&O': 462 | nugget_comp = re.match(r'[0-7]*', line_source[2:]).group() 463 | hexa = parse_numeric_bases(nugget_comp, '0b', 8) 464 | elif 
nugget == '&B': 465 | nugget_comp = re.match(r'[01]*', line_source[2:]).group() 466 | hexa = '2642' 467 | if nugget_comp: 468 | for character in nugget_comp: 469 | hexa += '{0:02x}'.format(ord(character)) 470 | else: 471 | nugget_comp = '' 472 | else: 473 | nugget = '&' 474 | hexa = '{0:02x}'.format(ord(nugget)) 475 | nugget_comp = '' 476 | compiled = hexa 477 | source = len(nugget) + len(nugget_comp) 478 | update_lines(source, compiled) 479 | 480 | # Quotes 481 | else: 482 | nugget = line_source[0].upper() 483 | if nugget == '"': 484 | num_quotes = 0 485 | while True: 486 | if line_source[0] == '"': 487 | num_quotes += 1 488 | hexa = '{0:02x}'.format(ord(line_source[0])) 489 | compiled = hexa 490 | source = 1 491 | update_lines(source, compiled) 492 | if num_quotes > 1 or len(line_source) <= 2: 493 | break 494 | # And the rest 495 | else: 496 | if ord(nugget) >= 65 and ord(nugget) <= 90: 497 | is_var = True 498 | while True: 499 | nugget = line_source[0].upper() 500 | for command, token in tokens: 501 | if line_source.upper().startswith(command): 502 | is_var = False 503 | if (ord(nugget) < 48 or ord(nugget) > 57) \ 504 | and (ord(nugget) < 65 or ord(nugget) > 90) \ 505 | or not is_var: 506 | is_var = False 507 | break 508 | hexa = '{0:02x}'.format(ord(line_source[0])) 509 | compiled = hexa 510 | source = 1 511 | update_lines(source, compiled) 512 | else: 513 | compiled = '{0:02x}'.format(ord(nugget.upper())) 514 | source = 1 515 | update_lines(source, compiled) 516 | 517 | base_prev = base 518 | base += (len(line_compiled) + 6) // 2 519 | hexa = '{0:04x}'.format(base) 520 | line_compiled = hexa[2:] + hexa[:-2] + line_compiled 521 | line_compiled += '00' 522 | tokenized_code.append(line_compiled) 523 | if export_list: 524 | make_list(base_prev, line_compiled, base_source) 525 | lines_num += 1 526 | 527 | show_log('', 'End tokenizing', 3) 528 | tokenized_code.append('0000') 529 | list_code.append(str(hexa) + ': 0000' + (' ' * (width_line + 6)) + 'end') 530 | 531 | 
list_code.extend(["", "' -------------------------------------", 532 | "' Statistics", 533 | "' -------------------------------------", ""]) 534 | list_code.append('lines ' + str(lines_num)) 535 | list_code.append('start &h' + '{0:04x}'.format(base_base - 1) + ' > ' + str(base_base - 1)) 536 | list_code.append('end &h' + '{0:04x}'.format(base + 1) + ' > ' + str(base + 1)) 537 | list_code.append('size &h' + '{0:04x}'.format((base - base_base) + 3) + ' > ' + str((base - base_base) + 3)) 538 | 539 | show_log('', 'Saving file', 3) 540 | show_log('', ' '.join(['save_file:', file_save]), 4) 541 | with open(file_save, 'wb') as f: 542 | for line in tokenized_code: 543 | f.write(binascii.unhexlify(line)) 544 | 545 | if delete_original: 546 | if os.path.isfile(file_save): 547 | show_log('', 'Deleting source', 3) 548 | show_log('', ' '.join(['delete_file:', file_load]), 4) 549 | osremove(file_load) 550 | else: 551 | show_log('', ' '.join(['source_not_deleted', file_load]), 2) 552 | show_log('', ' '.join(['converted_not_found', file_save]), 2) 553 | 554 | if export_list: 555 | show_log('', 'Saving list', 3) 556 | show_log('', ' '.join(['save_list:', file_list]), 4) 557 | with open(file_list, 'w') as f: 558 | for line in list_code: 559 | f.write(line + '\n') 560 | 561 | show_log('', '', 3, bullet=0) 562 | -------------------------------------------------------------------------------- /openMSXBatoken.ini: -------------------------------------------------------------------------------- 1 | [CONFIGS] 2 | file_load = 3 | file_save = 4 | machine_name = 5 | disk_ext_name = 6 | output_format = 7 | delete_original = 8 | verbose_level = 9 | 10 | [WINPATHS] 11 | openmsx_filepath = 12 | 13 | [MACPATHS] 14 | openmsx_filepath = -------------------------------------------------------------------------------- /openMSXbatoken.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """ 3 | openMSX Basic (de)Tokenizer 4 | v1.4 5 | Uses 
openMSX to tokenize ASCII MSX Basic or vice-versa. 6 | 7 | Copyright (C) 2019-2022 - Fred Rique (farique) 8 | https://github.com/farique1/MSX-Basic-Tokenizer 9 | 10 | See also: 11 | MSX Sublime Tools at 12 | https://github.com/farique1/MSX-Sublime-Tools 13 | Syntax Highlight, Theme, Build System, Comment Preference and Auto Completion. 14 | 15 | MSX Basic Dignified at 16 | https://github.com/farique1/msx-basic-dignified 17 | Convert modern MSX Basic Dignified to traditional MSX Basic format. 18 | 19 | openmsxbatoken.py [args...] 20 | openmsxbatoken.py -h for help. 21 | 22 | New: 1.4v 10/12/2021 23 | WINDOWS COMPATIBILITY YEY! 24 | Individual Windows and Mac paths on the .INI file 25 | os.path() operations to improve compatibility across systems 26 | Changed .INI section names 27 | """ 28 | 29 | # for i in *.bas; do python openmsxbatoken.py "$i" -of a; done 30 | # *** CAUTION *** An autoexec.bas on the mounted folder will crash the script 31 | 32 | import time 33 | import os.path 34 | import platform 35 | import argparse 36 | import subprocess 37 | import configparser 38 | from os import remove as osremove 39 | 40 | file_load = '' # Source file 41 | file_save = '' # Destination file 42 | machine_name = '' # openMSX machine to open, eg: 'Sharp_HB-8000_1.2' 'Sharp_HB-8000_1.2_Disk' 'Philips_NMS_8250' 43 | disk_ext_name = '' # openMSX extension to open, eg: 'Microsol_Disk:SlotB' 44 | show_output = True # Show the openMSX stderr output 45 | output_format = 't' # Tokenized or ASCII output: t-tokenized a-ASCII 46 | delete_original = False # Delete the original file 47 | verbose_level = 3 # Show processing status: 0-silent 1-+errors 2-+warnings 3-+steps 4-+details 48 | is_from_build = False # Tell if it is being called from a build system (show file name on error messages and other stuff) 49 | openmsx_filepath = '' # Path to openMSX ('' = local path) 50 | 51 | is_windows = platform.system() == "Windows" # Get the operating system 52 | openmsx_app = 'openmsx.exe' if
is_windows else 'openmsx.app' 53 | 54 | 55 | def show_log(line, text, level, **kwargs): 56 | bullets = ['', '*** ', ' * ', '--- ', ' - ', ' '] 57 | 58 | try: 59 | bullet = kwargs['bullet'] 60 | except KeyError: 61 | bullet = level 62 | 63 | display_file_name = '' 64 | if is_from_build and (bullet == 1 or bullet == 2): 65 | display_file_name = os.path.basename(file_load) + ': ' 66 | 67 | line_number = line 68 | 69 | if verbose_level >= level: 70 | print(bullets[bullet] + display_file_name + line_number + text) 71 | # print bullets[bullet] + text 72 | 73 | if bullet == 1: 74 | if proc: 75 | proc.stdin.write('<command>type_via_keybuf poke-2,0\\r</command>') 76 | if is_from_build: 77 | print(' Tokenizing_aborted') 78 | else: 79 | print(' Execution_stoped') 80 | print() 81 | raise SystemExit(0) 82 | 83 | 84 | local_path = os.path.split(os.path.abspath(__file__))[0] 85 | ini_path = os.path.join(local_path, 'openMSXBatoken.ini') 86 | if os.path.isfile(ini_path): 87 | ini_section = 'WINPATHS' if is_windows else 'MACPATHS' 88 | config = configparser.ConfigParser() 89 | config.sections() 90 | try: 91 | config.read(ini_path) 92 | file_load = config.get('CONFIGS', 'file_load') if config.get('CONFIGS', 'file_load') else file_load 93 | file_save = config.get('CONFIGS', 'file_save') if config.get('CONFIGS', 'file_save') else file_save 94 | machine_name = config.get('CONFIGS', 'machine_name') if config.get('CONFIGS', 'machine_name') else machine_name 95 | disk_ext_name = config.get('CONFIGS', 'disk_ext_name') if config.get('CONFIGS', 'disk_ext_name').strip() else disk_ext_name 96 | output_format = config.get('CONFIGS', 'Output_format') if config.get('CONFIGS', 'Output_format') else output_format 97 | delete_original = config.getboolean('CONFIGS', 'delete_original') if config.get('CONFIGS', 'delete_original') else delete_original 98 | verbose_level = config.getint('CONFIGS', 'verbose_level') if config.get('CONFIGS', 'verbose_level') else verbose_level 99 | openmsx_filepath = config.get(ini_section,
'openmsx_filepath') if config.get(ini_section, 'openmsx_filepath') else openmsx_filepath 100 | except (ValueError, configparser.NoOptionError) as e: 101 | show_log('', 'openMSXBatoken.ini: ' + str(e), 1) 102 | 103 | parser = argparse.ArgumentParser(description='Use openMSX to convert between ASCII and tokenized MSX Basic.') 104 | parser.add_argument("file_load", nargs='?', default=file_load, help='The file to be converted.') 105 | parser.add_argument("file_save", nargs='?', default=file_save, help='The file to convert to.') 106 | parser.add_argument("-of", default=output_format, choices=['t', 'T', 'a', 'A'], help="Tokenized or ASCII output: t-tokenized(def) a-ASCII") 107 | parser.add_argument("-do", default=delete_original, action='store_true', help="Delete original file after conversion") 108 | parser.add_argument("-vb", default=verbose_level, type=int, help="Verbosity level: 0 silent, 1 errors, 2 +warnings, 3 +steps(def), 4 +details") 109 | parser.add_argument("-frb", default=is_from_build, action='store_true', help="Tell it is running from a build system") 110 | args = parser.parse_args() 111 | 112 | file_load = args.file_load 113 | file_save = args.file_save 114 | output_format = args.of.upper() 115 | delete_original = args.do 116 | verbose_level = args.vb 117 | is_from_build = args.frb 118 | 119 | show_log('', '', 3, bullet=0) 120 | 121 | file_load_full = file_load 122 | file_name = os.path.basename(file_save) 123 | save_rest = file_save.replace(file_name, '') 124 | file_save = file_name 125 | save_extension = '.bas' 126 | save_argument = '' 127 | using_machine = 'default machine' 128 | disk_ext_slot = 'ext' 129 | proc = '' 130 | if openmsx_filepath == '': 131 | openmsx_filepath = os.path.join(local_path, openmsx_app) 132 | if machine_name != '': 133 | using_machine = machine_name 134 | machine_name = ['-machine', machine_name] 135 | disk_ext = disk_ext_name.split(':') 136 | disk_ext_name = disk_ext[0].strip() 137 | if len(disk_ext) > 1: 138 | disk_ext_slot =
'extb' if disk_ext[1].lower().strip() == 'slotb' else disk_ext_slot 139 | if file_load: 140 | if not os.path.isfile(file_load): 141 | show_log('', ' '.join(['source_not_found', file_load]), 1) # Exit 142 | else: 143 | show_log('', 'source_not_given', 1) # Exit 144 | if output_format == 'A': 145 | save_extension = '.asc' 146 | save_argument = '",a' 147 | if file_save == '': 148 | file_save = os.path.basename(file_load) 149 | file_save = os.path.splitext(file_save)[0] + save_extension 150 | disk_path = os.path.dirname(file_load) 151 | disk_path = local_path if disk_path == '' else disk_path 152 | file_load = os.path.basename(file_load) 153 | 154 | crop_load = os.path.splitext(file_load)[0][0:8] + os.path.splitext(file_load)[1] 155 | crop_save = os.path.splitext(file_save)[0][0:8] + os.path.splitext(file_save)[1] 156 | 157 | if save_rest: 158 | show_log('', ' '.join(['destination_path_removed', save_rest]), 2) 159 | 160 | if crop_save == file_load: 161 | show_log('', ' '.join(['destination_same_as_source', crop_save]), 1) # Exit 162 | 163 | list_dir = os.listdir(disk_path) 164 | list_load = [x for x in list_dir if 165 | x.lower() != file_load.lower() 166 | and os.path.splitext(x)[0][0:8].replace(' ', '_').lower() 167 | + os.path.splitext(x)[1].replace(' ', '_').lower() 168 | == crop_load.lower()] 169 | 170 | list_save = [x for x in list_dir if 171 | x.lower() != file_load.lower() 172 | and os.path.splitext(x)[0][0:8].replace(' ', '_').lower() 173 | + os.path.splitext(x)[1].replace(' ', '_').lower() 174 | == crop_load.lower() 175 | and len(os.path.splitext(x)[0][0:8]) > 8] 176 | 177 | list_all = list_load + list_save # list.extend() returns None, so concatenate instead 178 | 179 | if list_all: 180 | show_log('', ' '.join(['MSX_disk_name_format_conflict', ', '.join(list_all)]), 1) # Exit 181 | 182 | disk_path = disk_path.replace(' ', r'\ ') 183 | crop_load = crop_load.replace(' ', '_') 184 | crop_save = crop_save.replace(' ', '_') 185 | 186 | 187 | def output(show_output, has_input, step): 188 | if show_output: 189
| log_out = proc.stdout.readline().rstrip() if has_input else '' 190 | log_out = log_out.replace('&quot;', '"') 191 | log_out = log_out.replace('&apos;', "'") 192 | if '"nok"' in log_out or ' error: ' in log_out: 193 | log_out = log_out.replace('', '') 194 | proc.stdin.write('<command>quit</command>') 195 | show_log('', ''.join([step]), 3) 196 | if 'invalid command name "ext' in log_out: 197 | show_log('', ''.join(['Machine probably missing a slot']), 2) 198 | show_log('', ''.join([log_out]), 1) # Exit 199 | elif '' in log_out: 200 | log_warning = log_out.replace('', '') 201 | log_warning = log_warning.replace('', '') 202 | log_out = log_out.split('', '') 210 | log_out = log_out.replace('', '') 211 | log_out = log_out.replace('', '') 212 | log_out = log_out.replace('', '') 213 | log_comma = '' if log_out == '' else ': ' 214 | if step + log_comma + log_out != '': 215 | show_log('', ''.join([step, log_comma, log_out]), 3) 216 | 217 | 218 | if is_windows: 219 | disk_path = disk_path.replace('\\', '/') # cmd apparently needs forward slashes even on Windows 220 | cmd = [openmsx_filepath, '-control', 'stdio'] 221 | else: 222 | cmd = [os.path.join(openmsx_filepath, 'contents', 'macos', 'openmsx'), '-control', 'stdio'] 223 | 224 | if machine_name != '': 225 | cmd.extend(machine_name) 226 | 227 | proc = subprocess.Popen(cmd, bufsize=1, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, encoding='utf-8') 228 | 229 | endline = '\r\n' 230 | 231 | output(show_output, True, 'openMSX initialized as ' + using_machine) 232 | 233 | # proc.stdin.write('<command>set renderer SDL</command>' + endline) 234 | # output(show_output, True, 'Show screen') 235 | 236 | proc.stdin.write('<command>set throttle off</command>' + endline) 237 | output(show_output, True, 'Set throttle off') 238 | 239 | proc.stdin.write('<command>debug set_watchpoint write_mem 0xfffe {[debug read "memory" 0xfffe] == 0} {quit}</command>' + endline) 240 | output(show_output, True, 'Set quit watchpoint') 241 | 242 | if disk_ext_name != '': 243 | proc.stdin.write('<command>' + disk_ext_slot +
' ' + disk_ext_name + '</command>' + endline) 244 | output(show_output, True, 'Insert disk drive extension: ' + disk_ext_name + ' at ' + disk_ext_slot) 245 | 246 | proc.stdin.write('<command>diska insert ' + disk_path + '</command>' + endline) 247 | output(show_output, True, 'insert folder as disk: ' + disk_path) 248 | 249 | proc.stdin.write('<command>set power on</command>' + endline) 250 | output(show_output, True, 'Power on') 251 | 252 | proc.stdin.write('<command>type_via_keybuf \\r\\r</command>' + endline) # Disk ROM ask for date, enter twice to skip 253 | output(show_output, True, 'Press return twice') 254 | 255 | proc.stdin.write('<command>type_via_keybuf load"' + crop_load + '\\r</command>' + endline) 256 | output(show_output, True, 'type load"' + crop_load) 257 | 258 | proc.stdin.write('<command>type_via_keybuf save"' + crop_save + save_argument + '\\r</command>' + endline) 259 | output(show_output, True, 'type save"' + crop_save + save_argument) 260 | 261 | proc.stdin.write('<command>type_via_keybuf poke-2,0\\r</command>' + endline) 262 | output(show_output, True, 'Quit') 263 | 264 | time.sleep(1) 265 | 266 | file_save_full = os.path.join(disk_path, crop_save) 267 | print(file_save_full) 268 | if delete_original: 269 | if os.path.isfile(file_save_full): 270 | show_log('', 'Deleting source', 3) 271 | show_log('', ' '.join(['delete_file:', file_load_full]), 4) 272 | osremove(file_load_full) 273 | else: 274 | show_log('', ' '.join(['source_not_deleted', file_load_full]), 2) 275 | show_log('', ' '.join(['converted_not_found', crop_save]), 2) 276 | 277 | show_log('', '', 3, bullet=0) 278 | --------------------------------------------------------------------------------
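The integer and jump-target encodings used in `msxbatoken.py` above (the `hexa[2:] + hexa[:-2]` byte swap, the `0f` and `1c` prefixes, and the `0e` prefix for line numbers after `GOTO`, `GOSUB` and similar) can be tried in isolation. What follows is a minimal sketch, not the tokenizer itself: the helper names `le16`, `encode_integer` and `encode_jump` are hypothetical and do not exist in the repository, and the `ValueError` is a stand-in for the single and double precision branches the real script takes for larger values.

```python
def le16(value):
    """Return a 16-bit value as little-endian hex, as the tokenizer does with hexa[2:] + hexa[:-2]."""
    hexa = '{0:04x}'.format(value)
    return hexa[2:] + hexa[:-2]


def encode_integer(value):
    """Encode a plain integer constant (hypothetical helper mirroring the 'Is normal integer' branch)."""
    if 0 <= value <= 9:
        return '{0:02x}'.format(value + 17)     # one byte, 0x11 to 0x1a
    if 10 <= value <= 255:
        return '0f' + '{0:02x}'.format(value)   # 0x0f prefix plus one byte
    if 256 <= value <= 32767:
        return '1c' + le16(value)               # 0x1c prefix plus little-endian word
    # Assumption: larger values are handled elsewhere (single/double precision)
    raise ValueError('not a plain integer constant: ' + str(value))


def encode_jump(line_number):
    """Encode a jump target (hypothetical helper): 0x0e prefix plus little-endian line number."""
    if line_number > 65529:
        raise ValueError('line number too high: ' + str(line_number))
    return '0e' + le16(line_number)


if __name__ == '__main__':
    print(le16(0x8001))         # 0180
    print(encode_integer(5))    # 16
    print(encode_integer(10))   # 0f0a
    print(encode_integer(256))  # 1c0001
    print(encode_jump(100))     # 0e6400
```

The same byte swap is what places the next-line address and the line number at the start of every tokenized line, so `le16` covers those two fields as well.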