├── Dockerfile ├── LICENSE ├── README.md ├── docs ├── walkthrough_mysql.md ├── walkthrough_postgresql.md └── walkthrough_sqlite.md ├── example_dataset ├── Chinook_MySql.sql.gz ├── Chinook_PostgreSql.sql.gz └── Chinook_Sqlite.sql.gz ├── misc └── asciinema_script.sh └── src ├── JsonStreamWriter.php └── transform.php /Dockerfile: -------------------------------------------------------------------------------- 1 | # this comes unmodified from the official Alpine repo, but I mirror this image 2 | # to pin the exact image for reproducability 3 | FROM joonas/alpine:f4fddc471ec2 4 | 5 | RUN apk add --update php-cli php-json php-pdo php-pdo_mysql php-pdo_odbc php-pdo_pgsql php-pdo_sqlite php-zlib sqlite \ 6 | && rm -rf /var/cache/apk/* 7 | 8 | CMD ["php","/transformer/transform.php"] 9 | 10 | ADD ["/src","/transformer"] 11 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2016 function61.com 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | What can sql2json help you with? 2 | ================================ 3 | 4 | This project can help you if you need to: 5 | 6 | - Export data from a database into JSON so it can be transformed to some other format more easily. E.g. you need to transform 7 | a database from another vendor to another (e.g. MySQL -> PostgreSQL). sql2json would help you get the data from MySQL into JSON 8 | and after that you could write a program to insert the data into PostgreSQL. This project doesn't help you with the actual 9 | transformation because it's not a trivial problem. 10 | 11 | - Just dump the database contents to JSON so the JSON can be accessed from any programming language. 12 | 13 | Why not just export SQL dump from vendor X and import the same file to vendor Y - SQL is standard anyways? 14 | 15 | Turns out that SQL - even though being a standard, is not that interoperable. You cannot take a SQL dump produced 16 | by say MySQL, and import it into SQLite or PostgreSQL. There are [clever hacks](https://gist.github.com/esperlu/943776), 17 | but you're going to get disappointed. 
18 | 19 | Out of this frustration I created sql2json, that essentially dumps your database as JSON files, so the dataset is super 20 | easy to process (or import to another database) in any programming language! SQL is hard to parse while JSON is super trivial. 21 | 22 | 23 | Walkthroughs 24 | ------------ 25 | 26 | - [mysql](docs/walkthrough_mysql.md) (read this to get best sense of how this works) 27 | - [postgresql](docs/walkthrough_postgresql.md) 28 | - [sqlite](docs/walkthrough_sqlite.md) 29 | 30 | 31 | Architecture 32 | ------------ 33 | 34 | sql2json is just a tool (Docker container) that exports either: 35 | 36 | - a database (from a running DBMS) OR 37 | - .sql file (you run a temporary MySQL/PostgreSQL/.. instance with help of Docker) to JSON files - one per table. 38 | 39 | Supported databases: 40 | 41 | +-------+ 42 | | | 43 | | MySQL +---+ 44 | | | | 45 | +-------+ | 46 | | 47 | +--------+ | +----------+ +-------------+ 48 | | | | | | | | 49 | | SQLite +---------> sql2json +-----> .json files | 50 | | | | | | | | 51 | +--------+ | +----------+ +-------------+ 52 | | 53 | +------------+ | 54 | | | | 55 | | PostgreSQL +---+ 56 | | | 57 | +------------+ 58 | 59 | View from Docker's perspective: 60 | 61 | Database container ------+ sql2json container --+ 62 | | | | | 63 | | +-----------+ +----+ | | +-----------+ | 64 | | |Import data|-->|DBMS|<------| sql2json | | 65 | | +-----------+ +----+ | | +-----------+ | 66 | | ^ | | | | 67 | +-----|------------------+ +---|----------------+ 68 | | | 69 | +-----|--------------------------|-----+ 70 | | | v | 71 | | +--------+ +-------------------+ | 72 | | |SQL dump| |Result: | | 73 | | +--------+ |data in .json files| | 74 | | +-------------------+ | 75 | | | 76 | Docker host ---------------------------+ 77 | 78 | The end result is pretty nice, as from host perspective the only tool required is Docker. And you don't have to permanently 79 | install MySQL/PostgreSQL if all you want is to load the .sql dump and transform it to JSON. 80 | Just remove the temporary DBMS and sql2json containers when you're done and your system is clean as ever. :) 81 | 82 | Demo 83 | ==== 84 | 85 | [![asciicast](https://asciinema.org/a/722yo4odqo1sulztyaeaxz4k1.png)](https://asciinema.org/a/722yo4odqo1sulztyaeaxz4k1) 86 | 87 | 88 | FAQ 89 | --- 90 | 91 | Q: Can I export only a subset of the data with a custom SQL query? 92 | 93 | A: Yes, [see this link for instructions](https://github.com/function61/sql2json/issues/1)! 94 | 95 | ----------- 96 | 97 | Q: My database isn't supported 98 | 99 | A: Adding other databases is really easy, provided that PHP's [PDO layer](http://php.net/manual/en/pdo.drivers.php) 100 | supports it! PR's are appreciated! 101 | 102 | ----------- 103 | 104 | Q: I have a large database - will my data fit in memory? 105 | 106 | A: No problem, sql2json streams the JSON output and gzips the output files (constant RAM usage and low disk usage). 107 | 108 | ----------- 109 | 110 | Q: How do I read my table.json.gz file if it doesn't fit in RAM? 111 | 112 | A: The answer is streaming JSON parsing. The same applies for any huge file: you will not process it as a buffered whole, but in parts. 113 | 114 | In XML world there is a concept of a SAX parsing (= Streaming API for XML). 
115 | There are streaming JSON parsers for almost any language (these are just examples - there might exist better alternatives):
116 | 
117 | - PHP: [salsify/jsonstreamingparser](https://github.com/salsify/jsonstreamingparser)
118 | - Ruby: [dgraham/json-stream](https://github.com/dgraham/json-stream)
119 | - JavaScript: [dscape/clarinet](https://github.com/dscape/clarinet)
120 | 
121 | Or write an ad-hoc one yourself (it isn't too hard in this case - sql2json intentionally writes one row per line).
122 | 
123 | 
124 | Thanks
125 | ======
126 | 
127 | [Chinook example dataset](http://chinookdatabase.codeplex.com/): see the `example_dataset/` directory.
128 | (NOTE: I had to convert PostgreSQL's .sql file to utf-8)
129 | 
130 | Todo
131 | ====
132 | 
133 | - Add ODBC support (should be easy to hack in, but I didn't have any ODBC database driver installed)
134 | 
135 | 
136 | Support / contact
137 | -----------------
138 | 
139 | Basic support (no guarantees) for issues / feature requests via GitHub issues.
140 | 
141 | Paid support is available via [function61.com/consulting](https://function61.com/consulting/)
142 | 
143 | Contact options (email, Twitter etc.) at [function61.com](https://function61.com/)
144 | 
--------------------------------------------------------------------------------
/docs/walkthrough_mysql.md:
--------------------------------------------------------------------------------
1 | 
2 | Overview
3 | ========
4 | 
5 | ![Graph](https://g.gravizo.com/g?
6 | digraph G {
7 | sql2json [label="sql2json process"];
8 | export_from_where [shape=doubleoctagon label="Do you have a running database instance?"];
9 | run_sql2json [label="Run sql2json"];
10 | i_only_have_sql_file [label="I only have an .sql file"];
11 | Done [label="Done! Data exported as .json files :%29"];
12 | create_mysql_instance [label="Create database instance%5Cn%28temporary, as Docker container%29"];
13 | load_data_from_sql [label="Load .sql file into it"];
14 | sql2json -> export_from_where;
15 | export_from_where -> run_sql2json [label=" yes"];
16 | export_from_where -> i_only_have_sql_file [label=" no"];
17 | i_only_have_sql_file -> create_mysql_instance -> load_data_from_sql -> run_sql2json;
18 | run_sql2json -> Done;
19 | }
20 | )
21 | 
22 | 
23 | I only have an .sql file
24 | ========================
25 | 
26 | So you have a .sql file. We first need to load it into a new MySQL server instance
27 | (we can conveniently do that with the help of Docker):
28 | 
29 | ```
30 | $ docker run -d --name sql2json-dbserver -p 3306:3306 imega/mysql:1.1.0
31 | ```
32 | 
33 | 
34 | That image (`imega/mysql:1.1.0`) has a minor issue we need to fix - it doesn't have the `mysql` **client** installed, so let's fix that:
35 | 
36 | ```
37 | # get a shell in it
38 | $ docker exec -it sql2json-dbserver sh
39 | 
40 | # install mysql client and fix an issue with the socket location
41 | $ apk add --update mysql-client && mkdir -p /run/mysqld/ && ln -s /var/lib/mysql/mysql.sock /run/mysqld/mysqld.sock
42 | 
43 | # return back from the container
44 | $ exit
45 | ```
46 | 
47 | OK, the instance is running and ready to use. Now we need to load the .sql file into it.
48 | 
49 | If your .sql script has a `USE <dbname>` statement (maybe along with `DROP DATABASE IF EXISTS <dbname>`), you need to use that as your `<dbname>`.
50 | 
51 | The example file we use has `dbname=Chinook` (case sensitive).
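If you're not sure which database name your own dump expects, you can peek at it before loading it (an optional check - the grep pattern here is just an illustration):

```
$ gzip -dc example_dataset/Chinook_MySql.sql.gz | grep -i -E 'CREATE DATABASE|USE ' | head -5
```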
First, create a database into which we'll load the data:
52 | 
53 | ```
54 | $ echo 'CREATE DATABASE <dbname>' | docker exec -i sql2json-dbserver mysql
55 | ```
56 | 
57 | Now, let's load the data into that database:
58 | 
59 | ```
60 | $ cat example_dataset/Chinook_MySql.sql.gz | gzip -d | docker exec -i sql2json-dbserver mysql
61 | ```
62 | 
63 | (note: you don't need the gzip part unless your .sql file is gzipped)
64 | 
65 | You can verify that the data was loaded:
66 | 
67 | ```
68 | $ echo 'SHOW TABLES' | docker exec -i sql2json-dbserver mysql <dbname>
69 | Tables_in_Chinook
70 | Album
71 | Artist
72 | Customer
73 | Employee
74 | Genre
75 | Invoice
76 | InvoiceLine
77 | MediaType
78 | Playlist
79 | PlaylistTrack
80 | Track
81 | ```
82 | 
83 | Ok, now we have a DBMS instance running with the data we want to export as JSON.
84 | 
85 | Let's find out the IP address of the DBMS:
86 | 
87 | ```
88 | $ docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' sql2json-dbserver
89 | 172.17.0.2
90 | ```
91 | 
92 | This particular image doesn't have a password configured, so the details we need to build the DSN are:
93 | 
94 | ```
95 | username=(none)
96 | password=(none)
97 | host=172.17.0.2 (remember to replace with your own details)
98 | ```
99 | 
100 | Therefore, our DSN is:
101 | 
102 | ```
103 | ,,mysql:host=172.17.0.2;port=3306;dbname=Chinook;charset=utf8
104 | ```
105 | 
106 | If it had a user/password, the DSN would be:
107 | 
108 | ```
109 | myusername,supersecret,mysql:host=172.17.0.2;port=3306;dbname=Chinook;charset=utf8
110 | ```
111 | 
112 | For the record, our DSN format is:
113 | 
114 | ```
115 | <username>,<password>,<pdo dsn>
116 | ```
117 | 
118 | Now that we have created the database server and loaded the data, you can proceed to the
119 | next heading, which explains how to transform the database into JSON.
120 | 
121 | 
122 | I have a database server instance I want to export data from
123 | ============================================================
124 | 
125 | Ok great, we just need the DSN and we're ready to do the export. If you don't know the DSN, read the previous heading.
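If you followed the previous section, you can also quickly check that the connection details work with the `mysql` client we installed into the container (just an optional sanity check; `Chinook` here is the example database name):

```
$ echo 'SELECT 1' | docker exec -i sql2json-dbserver mysql Chinook
```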
126 | 127 | In this example scenario, our DSN is: 128 | 129 | ``` 130 | ,,mysql:host=172.17.0.2;port=3306;dbname=Chinook;charset=utf8 131 | ``` 132 | 133 | Let's create a directory, into which the .json files will be dumped: 134 | 135 | ``` 136 | $ mkdir sql2json_result 137 | ``` 138 | 139 | Now run the conversion process: 140 | 141 | ``` 142 | $ docker run --rm -it -v "$(pwd)/sql2json_result:/result" -e "DSN=,,mysql:host=172.17.0.2;port=3306;dbname=Chinook;charset=utf8" joonas/sql2json 143 | 2016-12-21 20:23:38 - Username: (no username) 144 | 2016-12-21 20:23:38 - Password: (no password) 145 | 2016-12-21 20:23:38 - Connecting to DSN mysql:host=172.17.0.2;port=3306;dbname=Chinook;charset=utf8 146 | 2016-12-21 20:23:42 - Listing tables 147 | 2016-12-21 20:23:42 - Fetching schema 148 | 2016-12-21 20:23:43 - Wrote /result/combined_schema.json 149 | 2016-12-21 20:23:43 - Dumping Album 150 | 2016-12-21 20:23:43 - Wrote 347 rows to /result/data/Album.json.gz 151 | 2016-12-21 20:23:43 - Dumping Artist 152 | 2016-12-21 20:23:43 - Wrote 275 rows to /result/data/Artist.json.gz 153 | 2016-12-21 20:23:43 - Dumping Customer 154 | 2016-12-21 20:23:43 - Wrote 59 rows to /result/data/Customer.json.gz 155 | 2016-12-21 20:23:43 - Dumping Employee 156 | 2016-12-21 20:23:43 - Wrote 8 rows to /result/data/Employee.json.gz 157 | 2016-12-21 20:23:43 - Dumping Genre 158 | 2016-12-21 20:23:43 - Wrote 25 rows to /result/data/Genre.json.gz 159 | 2016-12-21 20:23:43 - Dumping Invoice 160 | 2016-12-21 20:23:43 - Wrote 412 rows to /result/data/Invoice.json.gz 161 | 2016-12-21 20:23:43 - Dumping InvoiceLine 162 | 2016-12-21 20:23:43 - Wrote 2240 rows to /result/data/InvoiceLine.json.gz 163 | 2016-12-21 20:23:43 - Dumping MediaType 164 | 2016-12-21 20:23:43 - Wrote 5 rows to /result/data/MediaType.json.gz 165 | 2016-12-21 20:23:43 - Dumping Playlist 166 | 2016-12-21 20:23:43 - Wrote 18 rows to /result/data/Playlist.json.gz 167 | 2016-12-21 20:23:43 - Dumping PlaylistTrack 168 | 2016-12-21 20:23:43 - Wrote 8715 rows to /result/data/PlaylistTrack.json.gz 169 | 2016-12-21 20:23:43 - Dumping Track 170 | 2016-12-21 20:23:44 - Wrote 3503 rows to /result/data/Track.json.gz 171 | 2016-12-21 20:23:44 - Done, exported 11 tables 172 | ``` 173 | 174 | Now you should have the following file structure: 175 | 176 | ``` 177 | $ tree sql2json_result/ 178 | sql2json_result/ 179 | ├── combined_schema.json 180 | ├── data 181 | │   ├── Album.json.gz 182 | │   ├── Artist.json.gz 183 | │   ├── Customer.json.gz 184 | │   ├── Employee.json.gz 185 | │   ├── Genre.json.gz 186 | │   ├── Invoice.json.gz 187 | │   ├── InvoiceLine.json.gz 188 | │   ├── MediaType.json.gz 189 | │   ├── Playlist.json.gz 190 | │   ├── PlaylistTrack.json.gz 191 | │   └── Track.json.gz 192 | └── schema 193 | ├── Album.json 194 | ├── Artist.json 195 | ├── Customer.json 196 | ├── Employee.json 197 | ├── Genre.json 198 | ├── Invoice.json 199 | ├── InvoiceLine.json 200 | ├── MediaType.json 201 | ├── Playlist.json 202 | ├── PlaylistTrack.json 203 | └── Track.json 204 | 205 | 2 directories, 23 files 206 | ``` 207 | 208 | You can now see the schema in JSON: 209 | 210 | ``` 211 | $ cat sql2json_result/schema/Album.json 212 | { 213 | "name": "Album", 214 | "fields": [ 215 | { 216 | "Field": "AlbumId", 217 | "Type": "int(11)", 218 | "Null": "NO", 219 | "Key": "PRI", 220 | "Default": null, 221 | "Extra": "" 222 | }, 223 | { 224 | "Field": "Title", 225 | "Type": "varchar(160)", 226 | "Null": "NO", 227 | "Key": "", 228 | "Default": null, 229 | "Extra": "" 230 | }, 231 | { 232 | "Field": 
"ArtistId", 233 | "Type": "int(11)", 234 | "Null": "NO", 235 | "Key": "MUL", 236 | "Default": null, 237 | "Extra": "" 238 | } 239 | ] 240 | } 241 | ``` 242 | 243 | If you have [jq](https://stedolan.github.io/jq/) installed, it's easy poke with the JSON data: 244 | 245 | ``` 246 | $ cat sql2json_result/data/Album.json.gz | gzip -d | jq '.[0]' 247 | { 248 | "AlbumId": "1", 249 | "Title": "For Those About To Rock We Salute You", 250 | "ArtistId": "1" 251 | } 252 | ``` 253 | 254 | Or even sort by one field: 255 | 256 | ``` 257 | $ cat sql2json_result/data/Album.json.gz | gzip -d | jq '.[].Title' | sort | head -10 258 | "[1997] Black Light Syndrome" 259 | "20th Century Masters - The Millennium Collection: The Best of Scorpions" 260 | "Ace Of Spades" 261 | "Achtung Baby" 262 | "A Copland Celebration, Vol. I" 263 | "Acústico" 264 | "Acústico MTV" 265 | "Acústico MTV [Live]" 266 | "Adams, John: The Chairman Dances" 267 | "Adorate Deum: Gregorian Chant from the Proper of the Mass" 268 | ``` 269 | 270 | Done, cleanup 271 | ============= 272 | 273 | We ran sql2json with the `--rm` switch, so sql2json was already removed. 274 | 275 | You can now destroy the DBMS instance by running: 276 | 277 | ``` 278 | $ docker rm -f sql2json-dbserver 279 | ``` 280 | 281 | The beauty of running containerized software is that even when running complex stuff, 282 | cleanup ensures that after removing the container it looks like the software was never installed - no trace left behind. :) 283 | -------------------------------------------------------------------------------- /docs/walkthrough_postgresql.md: -------------------------------------------------------------------------------- 1 | 2 | Overview 3 | ======== 4 | 5 | This guide is pretty much the same as for MySQL. Go read it first, and then come back! We'll only list here the stuff that's different. 6 | 7 | ![Graph](https://g.gravizo.com/g? 8 | digraph G { 9 | sql2json [label="sql2json process"]; 10 | export_from_where [shape=doubleoctagon label="Do you have a running database instance?"]; 11 | run_sql2json [label="Run sql2json"]; 12 | i_only_have_sql_file [label="I only have an .sql file"]; 13 | Done [label="Done! Data exported as .json files :%29"]; 14 | create_mysql_instance [label="Create database instance%5Cn%28temporary, as Docker container%29"]; 15 | load_data_from_sql [label="Load .sql file into it"]; 16 | sql2json -> export_from_where; 17 | export_from_where -> run_sql2json [label=" yes"]; 18 | export_from_where -> i_only_have_sql_file [label=" no"]; 19 | i_only_have_sql_file -> create_mysql_instance -> load_data_from_sql -> run_sql2json; 20 | run_sql2json -> Done; 21 | } 22 | ) 23 | 24 | 25 | I only have an .sql file 26 | ======================== 27 | 28 | So you have a .sql file. We first need to load it into a new PostgreSQL server instance 29 | (we can conveniently do that with help of Docker): 30 | 31 | ``` 32 | $ docker run -d --name sql2json-dbserver -p 5432:5432 kiasaki/alpine-postgres:9.5 33 | ``` 34 | 35 | Ok the instance is running and ready to use, now we need to load the .sql file into it. 36 | 37 | In this example we'll use the name `chinook` as the database name. If yours is different, replace accordingly. 
38 | 
39 | First, create the database:
40 | 
41 | ```
42 | $ echo 'CREATE DATABASE chinook' | docker exec -i sql2json-dbserver psql -U postgres postgres
43 | ```
44 | 
45 | Now, let's load the data into that database:
46 | 
47 | ```
48 | $ cat example_dataset/Chinook_PostgreSql.sql.gz | gzip -d | docker exec -i sql2json-dbserver psql -U postgres chinook
49 | ```
50 | 
51 | (note: you don't need the gzip part unless your .sql file is gzipped)
52 | 
53 | Ok, now we have a DBMS instance running with the data we want to export as JSON.
54 | 
55 | Let's find out the IP address of the DBMS:
56 | 
57 | ```
58 | $ docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' sql2json-dbserver
59 | 172.17.0.2
60 | ```
61 | 
62 | This particular image doesn't have a password configured, so the details we need to build the DSN are:
63 | 
64 | ```
65 | username=postgres
66 | password=(none)
67 | host=172.17.0.2 (remember to replace with your own details)
68 | ```
69 | 
70 | Therefore, our DSN is:
71 | 
72 | ```
73 | postgres,,pgsql:dbname=chinook;host=172.17.0.2;port=5432
74 | ```
75 | 
76 | If it had a different user/password, the DSN would be:
77 | 
78 | ```
79 | myusername,mysecret,pgsql:dbname=chinook;host=172.17.0.2;port=5432
80 | ```
81 | 
82 | For the record, our DSN format is:
83 | 
84 | ```
85 | <username>,<password>,<pdo dsn>
86 | ```
87 | 
88 | Now that we have created the database server and loaded the data, you can proceed to the
89 | next heading, which explains how to transform the database into JSON.
90 | 
91 | 
92 | I have a database server instance I want to export data from
93 | ============================================================
94 | 
95 | Ok great, we just need the DSN and we're ready to do the export. If you don't know the DSN, read the previous heading.
96 | 
97 | In this example scenario, our DSN is:
98 | 
99 | ```
100 | postgres,,pgsql:dbname=chinook;host=172.17.0.2;port=5432
101 | ```
102 | 
103 | Let's create a directory, into which the .json files will be dumped:
104 | 
105 | ```
106 | $ mkdir sql2json_result
107 | ```
108 | 
109 | Now run the conversion process:
110 | 
111 | ```
112 | $ docker run --rm -it -v "$(pwd)/sql2json_result:/result" -e "DSN=postgres,,pgsql:dbname=chinook;host=172.17.0.2;port=5432" joonas/sql2json
113 | 2016-12-21 21:39:49 - Using username: postgres
114 | 2016-12-21 21:39:49 - Password: (no password)
115 | 2016-12-21 21:39:49 - Connecting to DSN pgsql:dbname=chinook;host=172.17.0.2;port=5432
116 | 2016-12-21 21:39:49 - Listing tables
117 | 2016-12-21 21:39:49 - Skipping schema fetch - only know how to do it for MySQL
118 | 2016-12-21 21:39:49 - Dumping Customer
119 | 2016-12-21 21:39:49 - Wrote 59 rows to /result/data/Customer.json.gz
120 | 2016-12-21 21:39:49 - Dumping Artist
121 | 2016-12-21 21:39:49 - Wrote 275 rows to /result/data/Artist.json.gz
122 | 2016-12-21 21:39:49 - Dumping PlaylistTrack
123 | 2016-12-21 21:39:49 - Wrote 8715 rows to /result/data/PlaylistTrack.json.gz
124 | 2016-12-21 21:39:49 - Dumping Track
125 | 2016-12-21 21:39:50 - Wrote 3503 rows to /result/data/Track.json.gz
126 | 2016-12-21 21:39:50 - Dumping Playlist
127 | 2016-12-21 21:39:50 - Wrote 18 rows to /result/data/Playlist.json.gz
128 | 2016-12-21 21:39:50 - Dumping Genre
129 | 2016-12-21 21:39:50 - Wrote 25 rows to /result/data/Genre.json.gz
130 | 2016-12-21 21:39:50 - Dumping Invoice
131 | 2016-12-21 21:39:50 - Wrote 412 rows to /result/data/Invoice.json.gz
132 | 2016-12-21 21:39:50 - Dumping Employee
133 | 2016-12-21 21:39:50 - Wrote 8 rows to /result/data/Employee.json.gz
134 | 2016-12-21 21:39:50 - Dumping Album
135 | 2016-12-21 21:39:50 - Wrote 347 rows to /result/data/Album.json.gz
136 | 2016-12-21 21:39:50 - Dumping InvoiceLine
137 | 2016-12-21 21:39:50 - Wrote 2240 rows to /result/data/InvoiceLine.json.gz
138 | 2016-12-21 21:39:50 - Dumping MediaType
139 | 2016-12-21 21:39:50 - Wrote 5 rows to /result/data/MediaType.json.gz
140 | 2016-12-21 21:39:50 - Done, exported 11 tables
141 | ```
142 | 
143 | Now scoot over to the MySQL variant of the tutorial to see the rest!
144 | 
--------------------------------------------------------------------------------
/docs/walkthrough_sqlite.md:
--------------------------------------------------------------------------------
1 | 
2 | Convert .sql into SQLite's .db
3 | ==============================
4 | 
5 | ```
6 | $ mkdir sql2json_result
7 | ```
8 | 
9 | Place your .sql file in `sql2json_result/`.
10 | (or use the example: `$ cat example_dataset/Chinook_Sqlite.sql.gz | gzip -d > sql2json_result/Chinook_Sqlite.sql`)
11 | 
12 | Now convert that into SQLite's database format file:
13 | 
14 | ```
15 | $ docker run --rm -it -v "$(pwd)/sql2json_result:/result" joonas/sql2json sqlite3 /result/temp.db
16 | PRAGMA synchronous=OFF;
17 | PRAGMA journal_mode=MEMORY;
18 | BEGIN TRANSACTION;
19 | .read /result/Chinook_Sqlite.sql
20 | END TRANSACTION;
21 | ```
22 | 
23 | Now SQLite will import the SQL file. If you used `Chinook_Sqlite.sql`, you're going to get one error like this:
24 | 
25 | ```
26 | Error: near line 1: near "": syntax error
27 | ```
28 | 
29 | Don't mind it - it's probably just confused about a blank line.
30 | 
31 | It's going to stay quiet for a while, but that means it's working. Just wait until you get the `sqlite> ` prompt.
32 | 
33 | After it's done, hit `Ctrl + d` to exit from SQLite (and the container).
34 | 
35 | Now you should have `sql2json_result/temp.db`, which sql2json can dump directly into JSON. Move on to the next heading.
36 | 
37 | 
38 | I have an SQLite-formatted .db file now
39 | =======================================
40 | 
41 | Now the database should be at `sql2json_result/temp.db`.
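Before exporting, you can optionally double-check that the import produced the expected tables, using the sqlite3 client bundled in the image (just an optional sanity check):

```
$ docker run --rm -it -v "$(pwd)/sql2json_result:/result" joonas/sql2json sqlite3 /result/temp.db '.tables'
```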
42 | 43 | ``` 44 | $ docker run --rm -it -v "$(pwd)/sql2json_result:/result" -e "DSN=,,sqlite:/result/temp.db" joonas/sql2json 45 | 2016-12-21 22:10:55 - Username: (no username) 46 | 2016-12-21 22:10:55 - Password: (no password) 47 | 2016-12-21 22:10:55 - Connecting to DSN sqlite:/result/temp.db 48 | 2016-12-21 22:10:55 - Listing tables 49 | 2016-12-21 22:10:55 - Skipping schema fetch - only know how to do it for MySQL 50 | 2016-12-21 22:10:55 - Dumping Album 51 | 2016-12-21 22:10:55 - Wrote 347 rows to /result/data/Album.json.gz 52 | 2016-12-21 22:10:55 - Dumping Artist 53 | 2016-12-21 22:10:55 - Wrote 275 rows to /result/data/Artist.json.gz 54 | 2016-12-21 22:10:55 - Dumping Customer 55 | 2016-12-21 22:10:55 - Wrote 59 rows to /result/data/Customer.json.gz 56 | 2016-12-21 22:10:55 - Dumping Employee 57 | 2016-12-21 22:10:55 - Wrote 8 rows to /result/data/Employee.json.gz 58 | 2016-12-21 22:10:55 - Dumping Genre 59 | 2016-12-21 22:10:55 - Wrote 25 rows to /result/data/Genre.json.gz 60 | 2016-12-21 22:10:55 - Dumping Invoice 61 | 2016-12-21 22:10:55 - Wrote 412 rows to /result/data/Invoice.json.gz 62 | 2016-12-21 22:10:55 - Dumping InvoiceLine 63 | 2016-12-21 22:10:55 - Wrote 2240 rows to /result/data/InvoiceLine.json.gz 64 | 2016-12-21 22:10:55 - Dumping MediaType 65 | 2016-12-21 22:10:55 - Wrote 5 rows to /result/data/MediaType.json.gz 66 | 2016-12-21 22:10:55 - Dumping Playlist 67 | 2016-12-21 22:10:55 - Wrote 18 rows to /result/data/Playlist.json.gz 68 | 2016-12-21 22:10:55 - Dumping PlaylistTrack 69 | 2016-12-21 22:10:55 - Wrote 8715 rows to /result/data/PlaylistTrack.json.gz 70 | 2016-12-21 22:10:55 - Dumping Track 71 | 2016-12-21 22:10:56 - Wrote 3503 rows to /result/data/Track.json.gz 72 | 2016-12-21 22:10:56 - Done, exported 11 tables 73 | 74 | ``` 75 | 76 | Now scoot over to the MySQL variant of tutorial to see the rest! 
77 | 
--------------------------------------------------------------------------------
/example_dataset/Chinook_MySql.sql.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/function61/sql2json/f0e24db0082e8eaf452b89c4c447fba7fa7742aa/example_dataset/Chinook_MySql.sql.gz
--------------------------------------------------------------------------------
/example_dataset/Chinook_PostgreSql.sql.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/function61/sql2json/f0e24db0082e8eaf452b89c4c447fba7fa7742aa/example_dataset/Chinook_PostgreSql.sql.gz
--------------------------------------------------------------------------------
/example_dataset/Chinook_Sqlite.sql.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/function61/sql2json/f0e24db0082e8eaf452b89c4c447fba7fa7742aa/example_dataset/Chinook_Sqlite.sql.gz
--------------------------------------------------------------------------------
/misc/asciinema_script.sh:
--------------------------------------------------------------------------------
1 | 
2 | PS1='\[\033[0;33m\]~/sql2json \$ \[\033[0m\]'
3 | 
4 | 
5 | # here's the .sql file we have that we would like to export as JSON
6 | cat example_dataset/Chinook_PostgreSql.sql.gz | gzip -d | head -30
7 | 
8 | # first we'll launch a Postgres instance so we can import the data there
9 | docker run -d --name sql2json-dbserver -p 5432:5432 kiasaki/alpine-postgres:9.5
10 | 
11 | # then we'll create an empty database
12 | echo 'CREATE DATABASE chinook' | docker exec -i sql2json-dbserver psql -U postgres postgres
13 | 
14 | # now we'll import the data into it
15 | cat example_dataset/Chinook_PostgreSql.sql.gz | gzip -d | docker exec -i sql2json-dbserver psql -U postgres chinook > /dev/null
16 | 
17 | # let's find out the IP address of the database server
18 | docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' sql2json-dbserver
19 | 
20 | # create directory for the resulting JSON files
21 | mkdir sql2json_result
22 | 
23 | # and now export the database as JSON
24 | docker run --rm -it -v "$(pwd)/sql2json_result:/result" -e "DSN=postgres,,pgsql:dbname=chinook;host=172.17.0.2;port=5432" joonas/sql2json
25 | 
26 | # take a look at the resulting file structure
27 | tree sql2json_result/
28 | 
29 | # now you can work with the dataset from the commandline
30 | cat sql2json_result/data/Album.json.gz | gzip -d | jq '.[0]'
31 | 
32 | # even sort the whole table by "Title" column
33 | cat sql2json_result/data/Album.json.gz | gzip -d | jq '.[].Title' | sort | head -10
34 | 
35 | # destroy the database instance
36 | docker rm -f sql2json-dbserver
37 | 
38 | # thanks for watching!
39 | 
--------------------------------------------------------------------------------
/src/JsonStreamWriter.php:
--------------------------------------------------------------------------------
1 | <?php
2 | 
3 | class JsonStreamWriter {
4 | 	private $handle;
5 | 	private $stateStack = array();
6 | 
7 | 	public function __construct($handle) {
8 | 		$this->handle = $handle;
9 | 	}
10 | 
11 | 	public function handleAny($item) {
12 | 		$type = gettype($item);
13 | 
14 | 		switch ($type) {
15 | 			case 'boolean':
16 | 				$this->push($item ?
'true' : 'false'); 17 | break; 18 | case 'string': 19 | $this->handleString($item); 20 | break; 21 | case 'integer': 22 | case 'double': 23 | $this->push($item); 24 | break; 25 | case 'NULL': // uppercase, of course 26 | $this->push('null'); 27 | break; 28 | case 'array': 29 | case 'object': 30 | // PHP is such a bastardized language, in which you cannot distinguish lists, 31 | // maps and sets from each other. we now have to resort to guessing what the user meant: 32 | if ($type === 'array' && self::looksLikeAList($item)) { 33 | $this->handleArray($item); 34 | } else { 35 | $this->handleObject($item); 36 | } 37 | break; 38 | default: 39 | throw new \Exception('Unknown type: ' . $type); 40 | } 41 | } 42 | 43 | public function handleObject($obj) { 44 | $this->push('{'); 45 | 46 | $first = true; 47 | 48 | foreach ($obj as $key => $value) { 49 | if (!$first) { 50 | $this->push(', '); 51 | } else { 52 | $first = false; 53 | } 54 | 55 | $this->handleString($key); 56 | 57 | $this->push(': '); 58 | 59 | $this->handleAny($value); 60 | } 61 | 62 | $this->push('}'); 63 | } 64 | 65 | public function handleArray($arr) { 66 | $this->arrayBegin(); 67 | 68 | foreach ($arr as $item) { 69 | $this->arrayItem($item); 70 | } 71 | 72 | $this->arrayEnd(); 73 | } 74 | 75 | public function arrayBegin() { 76 | $this->push("[\n"); 77 | 78 | array_push($this->stateStack, array('firstItem' => true)); 79 | } 80 | 81 | public function arrayItem($item) { 82 | $stateStackIdx = count($this->stateStack) - 1; 83 | 84 | $firstItem = $this->stateStack[$stateStackIdx]['firstItem']; 85 | 86 | if ($firstItem) { // flip 87 | $this->stateStack[$stateStackIdx]['firstItem'] = false; 88 | } else { 89 | $this->push(", \n"); 90 | } 91 | 92 | $this->handleAny($item); 93 | } 94 | 95 | public function arrayEnd() { 96 | $this->push("\n]\n"); 97 | 98 | array_pop($this->stateStack); 99 | } 100 | 101 | public function handleString($val) { 102 | $val_escaped = str_replace( 103 | array('\\', '"', "\r", "\n", "\t"), 104 | array('\\\\', '\\"', '\\r', '\\n', '\\t'), 105 | $val 106 | ); 107 | 108 | $this->push('"' . $val_escaped . '"'); 109 | } 110 | 111 | private function push($chunk) { 112 | // fwrite() is buffered, so it is ok throw small chunks at it 113 | fwrite($this->handle, $chunk); 114 | } 115 | 116 | private static function looksLikeAList($arr) { 117 | $itemCount = count($arr); 118 | 119 | // looks like a list if array empty OR there exists a key at indices 0 and itemCount - 1 120 | return $itemCount === 0 || (isset($arr[0]) && isset($arr[$itemCount - 1])); 121 | } 122 | } 123 | -------------------------------------------------------------------------------- /src/transform.php: -------------------------------------------------------------------------------- 1 | query($sql); 22 | $query->execute(); 23 | 24 | $tables = $query->fetchAll(PDO::FETCH_COLUMN, 0); 25 | 26 | return $tables; 27 | } 28 | } 29 | 30 | class MysqlAdapter implements DbAdapter { 31 | // with MySQL we cannot use quotes. 
backticks would probably work IIRC 32 | public function selectSql($table) { return "SELECT * FROM $table"; } 33 | 34 | public function listTables($conn) { 35 | $sql = "SHOW TABLES"; 36 | 37 | $query = $conn->query($sql); 38 | $query->execute(); 39 | 40 | $tables = $query->fetchAll(PDO::FETCH_COLUMN, 0); 41 | 42 | return $tables; 43 | } 44 | } 45 | 46 | class SqliteAdapter implements DbAdapter { 47 | public function selectSql($table) { return "SELECT * FROM \"$table\""; } 48 | 49 | public function listTables($conn) { 50 | $sql = "SELECT name FROM sqlite_master WHERE type='table'"; 51 | 52 | $query = $conn->query($sql); 53 | $query->execute(); 54 | 55 | $tables = $query->fetchAll(PDO::FETCH_COLUMN, 0); 56 | 57 | return $tables; 58 | } 59 | } 60 | 61 | $matches = null; 62 | 63 | // DSN environment variable format: 64 | // ,, 65 | if (!preg_match('/^([^,]*),([^,]*),(.+)$/', getenv('DSN'), $matches)) { 66 | throw new \Exception('Failed to parse DSN environment variable'); 67 | } 68 | 69 | $username = $matches[1]; 70 | $password = $matches[2]; 71 | $dsn = $matches[3]; 72 | 73 | function makeDirectoryIfNotExists($path) { 74 | if (!file_exists($path)) { 75 | if (!mkdir($path)) { 76 | throw new \Exception('Failed to mkdir(): ' . $path); 77 | } 78 | } 79 | } 80 | 81 | makeDirectoryIfNotExists(DATA_DIR); 82 | makeDirectoryIfNotExists(SCHEMA_DIR); 83 | 84 | function info($message) { 85 | $date = date('Y-m-d H:i:s'); 86 | 87 | print "$date - $message\n"; 88 | } 89 | 90 | class CompressingJsonStreamFileWriter { 91 | public $writer; 92 | 93 | private $handle; 94 | private $path; 95 | 96 | public function __construct($path) { 97 | $this->path = $path; 98 | 99 | // FIXME: specify level 4-5 for compression ratio 100 | $specialWrapper = strpos($this->path, '.gz') !== false ? 'compress.zlib://' : ''; 101 | 102 | $this->handle = fopen($specialWrapper . $this->path, 'w'); 103 | if (!$this->handle) { 104 | throw new \Exception('Failed to fopen(..., "w"): ' . $specialWrapper . $this->path); 105 | } 106 | 107 | $this->writer = new JsonStreamWriter($this->handle); 108 | } 109 | 110 | public function getPath() { 111 | return $this->path; 112 | } 113 | 114 | public function close() { 115 | fclose($this->handle); 116 | } 117 | } 118 | 119 | function writeSchema($conn, $tables) { 120 | info('Fetching schema'); 121 | 122 | $combinedSchema = array(); 123 | 124 | foreach ($tables as $tableName) { 125 | $tableFields = array(); 126 | 127 | $query = $conn->query('DESCRIBE ' . $tableName); 128 | $query->execute(); 129 | 130 | while ($columnDescription = $query->fetch(PDO::FETCH_ASSOC)) { 131 | $tableFields[] = $columnDescription; 132 | } 133 | 134 | $tables = $query->fetchAll(PDO::FETCH_COLUMN, 0); 135 | 136 | $oneTableSchema = array('name' => $tableName, 'fields' => $tableFields); 137 | 138 | $combinedSchema[] = $oneTableSchema; 139 | 140 | file_put_contents('/result/schema/' . $tableName . '.json', json_encode($oneTableSchema, JSON_PRETTY_PRINT)); 141 | info('Wrote /result/schema/' . $tableName . 
'.json'); 142 | } 143 | 144 | file_put_contents('/result/combined_schema.json', json_encode($combinedSchema, JSON_PRETTY_PRINT)); 145 | info('Wrote /result/combined_schema.json'); 146 | } 147 | 148 | function dumpSql($conn, $sql, $filename) { 149 | $query = $conn->query($sql); 150 | $query->execute(); 151 | 152 | $json = new CompressingJsonStreamFileWriter($filename); 153 | $json->writer->arrayBegin(); 154 | 155 | $rowCount = 0; 156 | 157 | while ($row = $query->fetch(PDO::FETCH_ASSOC)) { 158 | $json->writer->arrayItem($row); 159 | 160 | $rowCount++; 161 | } 162 | 163 | $json->writer->arrayEnd(); 164 | 165 | $json->close(); 166 | 167 | info('Wrote ' . $rowCount . ' rows to ' . $json->getPath()); 168 | } 169 | 170 | function dumpTable($conn, $adapter, $table) { 171 | info('Dumping ' . $table); 172 | 173 | dumpSql($conn, $adapter->selectSql($table), '/result/data/' . $table . '.json.gz'); 174 | } 175 | 176 | info($username !== '' ? 'Using username: ' . $username : 'Username: (no username)'); 177 | info($password !== '' ? 'Using password: ********' : 'Password: (no password)'); 178 | info('Connecting to DSN ' . $dsn); 179 | 180 | $conn = new PDO($dsn, $username, $password, array( 181 | PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION 182 | )); 183 | 184 | $driverName = $conn->getAttribute(PDO::ATTR_DRIVER_NAME); 185 | 186 | $adapter = null; 187 | 188 | switch ($driverName) { 189 | case 'sqlite': 190 | $adapter = new SqliteAdapter(); 191 | break; 192 | case 'mysql': 193 | $adapter = new MysqlAdapter(); 194 | break; 195 | case 'pgsql': 196 | $adapter = new PostgresAdapter(); 197 | break; 198 | default: 199 | throw new \Exception("Unsupported driver: $driverName"); 200 | break; 201 | } 202 | 203 | info('Listing tables'); 204 | $tables = $adapter->listTables($conn); 205 | 206 | if ($adapter instanceof MysqlAdapter) { 207 | writeSchema($conn, $tables); 208 | } else { 209 | info('Skipping schema fetch - only know how to do it for MySQL'); 210 | } 211 | 212 | $customSql = getenv('SQL'); 213 | 214 | if ($customSql) { 215 | info("Custom SQL statement specified: $customSql"); 216 | 217 | dumpSql($conn, $customSql, '/result/data/custom_' . time() . '.json.gz'); 218 | } 219 | else { 220 | foreach($tables as $table) { 221 | dumpTable($conn, $adapter, $table); 222 | } 223 | 224 | info('Exported ' . count($tables) . ' tables'); 225 | } 226 | 227 | info('Done'); 228 | --------------------------------------------------------------------------------