├── .gitignore
├── example.png
├── vega-zero.png
├── ncNet-VIS21.pdf
├── example-jupyter.png
├── Examples
    └── payments.csv
├── LICENSE
├── README.md
└── VegaZero2VegaLite.py

/.gitignore:
--------------------------------------------------------------------------------
.idea
.ipynb_checkpoints
__pycache__
.DS_Store

--------------------------------------------------------------------------------
/example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HKUSTDial/Vega-Zero/HEAD/example.png

--------------------------------------------------------------------------------
/vega-zero.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HKUSTDial/Vega-Zero/HEAD/vega-zero.png

--------------------------------------------------------------------------------
/ncNet-VIS21.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HKUSTDial/Vega-Zero/HEAD/ncNet-VIS21.pdf

--------------------------------------------------------------------------------
/example-jupyter.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HKUSTDial/Vega-Zero/HEAD/example-jupyter.png

--------------------------------------------------------------------------------
/Examples/payments.csv:
--------------------------------------------------------------------------------
payment_id,booking_id,customer_id,payment_type_code,amount_paid_in_full_yn,payment_date,amount_due,amount_paid
1,6,15,check,1,2018-03-09 16:28:00,369.52,206.27
2,9,12,cash,1,2018-03-03 13:39:44,278.6,666.45
3,5,7,credit card,0,2018-03-22 15:00:23,840.06,135.7
4,6,1,check,0,2018-03-22 02:28:11,678.29,668.4
5,8,11,cash,1,2018-03-23 20:36:04,830.25,305.65
6,15,8,check,0,2018-03-19 12:39:31,410.1,175.54
7,1,8,cash,1,2018-03-02 06:25:45,482.26,602.8
8,9,14,cash,1,2018-03-12 23:00:55,653.18,505.23
9,3,7,direct debit,0,2018-03-12 23:23:56,686.85,321.58
10,13,10,credit card,1,2018-03-23 13:24:33,486.75,681.21
11,14,15,credit card,1,2018-03-03 03:07:00,259.18,464.06
12,14,9,cash,0,2018-02-27 10:50:39,785.73,685.32
13,15,14,direct debit,0,2018-03-03 14:22:51,665.58,307.14
14,5,5,direct debit,1,2018-03-17 15:51:52,407.51,704.41
15,4,12,credit card,1,2018-03-17 03:07:45,631.93,334.2

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2021 Yuyu Luo

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Vega-Zero

Vega-Zero is a visualization grammar that simplifies Vega-Lite. Its main purpose is to flatten a hierarchical Vega-Lite specification into a sequence-based specification.

This makes it much easier to train a sequence-to-sequence model that generates a visualization specification as a sequence.
Vega-Zero can be used to support learning tasks such as translating a natural language query into a visualization.

Please refer to our [paper](https://github.com/Thanksyy/Vega-Zero/blob/main/ncNet-VIS21.pdf) at IEEE VIS 2021 for more details.

## Definition

Vega-Zero keeps most of the Vega-Lite keywords that describe the mapping between visual encoding channels and (transformed) data variables. It flattens a JSON object into a sequence of keywords by removing structure-aware symbols such as brackets, colons, and quotation marks. Formally, a unit specification in Vega-Zero is a four-tuple (similar to Vega-Lite, but with each element being a sequence):

**unit** = (**mark**, **data**, **encoding**, **transform**)

Naturally, as a simplification of Vega-Lite:
1. **mark** denotes the chart type: *bar*, *line*, *point* (for a scatter chart), or *arc* (for a pie chart);
2. **data** specifies the source data;
3. **encoding** specifies the columns for the *x*- and *y*-axes, the *aggregate* function, and the column that determines *color*;
4. **transform** defines the data transformation functions: *filter*, *bin*, *group*, *sort*, and *top-k*.

## Example

Below is an example showing the connection between Vega-Zero and Vega-Lite.

## Convert Vega-Zero to Vega-Lite specification

In this repository, we provide a Python script to convert a Vega-Zero specification to a Vega-Lite specification.
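
As a rough illustration of what such a conversion involves, the sketch below handles only the *mark* and the *x*/*y* encoding (with its *aggregate* function). It is a simplified stand-in, not the `VegaZero2VegaLite.py` script itself: the function name `sketch_to_vega_lite` is hypothetical, and the query string is an assumed example written so that each keyword is directly followed by its value, using column names from `Examples/payments.csv`. Transforms such as *filter*, *bin*, *sort*, and *top-k* are ignored here.

```python
import json


def sketch_to_vega_lite(vega_zero: str) -> dict:
    """Hypothetical, simplified converter: mark, x, and aggregate/y only."""
    tokens = vega_zero.split()

    # Each keyword is followed by its value in the flattened sequence.
    mark = tokens[tokens.index("mark") + 1]
    x_field = tokens[tokens.index("x") + 1]
    aggregate = tokens[tokens.index("aggregate") + 1]
    y_field = tokens[tokens.index("aggregate") + 2]

    y_channel = {"field": y_field, "type": "quantitative"}
    if aggregate != "none":
        y_channel["aggregate"] = aggregate

    # Rebuild the nested (hierarchical) Vega-Lite structure.
    return {
        "mark": mark,
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},
            "y": y_channel,
        },
    }


# Assumed example query; the transform part is present but not interpreted.
query = (
    "mark bar data payments encoding x payment_type_code "
    "y aggregate sum amount_paid transform group x"
)
spec = sketch_to_vega_lite(query)
print(json.dumps(spec, indent=2))
```

The flat token sequence carries the same information as the nested JSON object, which is why a keyword lookup followed by a fixed offset is enough to recover each value in this restricted case.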

Below is an example of running this Python script in a Jupyter Notebook.

## How to use?

Please follow the examples in ```examples.ipynb```. If you want to render the visualization result in Jupyter Notebook (or Lab), please follow the instructions for [IPython Vega](https://github.com/vega/ipyvega).

## Citing Vega-Zero

```bibTeX
@ARTICLE{ncnet,
  author={Luo, Yuyu and Tang, Nan and Li, Guoliang and Tang, Jiawei and Chai, Chengliang and Qin, Xuedi},
  journal={IEEE Transactions on Visualization and Computer Graphics},
  title={Natural Language to Visualization by Neural Machine Translation},
  year={2021},
  volume={},
  number={},
  pages={1-1}, doi={10.1109/TVCG.2021.3114848}}
```

## License
The software is available under the [MIT License](https://github.com/Thanksyy/Vega-Zero/blob/main/README.md).

## Contact
If you have any questions, feel free to contact Yuyu Luo (yuyuluo [AT] hkust-gz.edu.cn).

--------------------------------------------------------------------------------
/VegaZero2VegaLite.py:
--------------------------------------------------------------------------------
__author__ = "Yuyu Luo"

import json
import pandas

class VegaZero2VegaLite(object):
    def __init__(self):
        pass

    def parse_vegaZero(self, vega_zero):
        """Parse a space-separated Vega-Zero string into a nested dict."""
        self.parsed_vegaZero = {
            'mark': '',
            'data': '',
            'encoding': {
                'x': '',
                'y': {
                    'aggregate': '',
                    'y': ''
                },
                'color': {
                    'z': ''
                }
            },
            'transform': {
                'filter': '',
                'group': '',
                'bin': {
                    'axis': '',
                    'type': ''
                },
                'sort': {
                    'axis': '',
                    'type': ''
                },
                'topk': ''
            }
        }
        vega_zero_keywords = vega_zero.split(' ')

        # each keyword is followed by its value(s) at a fixed offset
        self.parsed_vegaZero['mark'] = vega_zero_keywords[vega_zero_keywords.index('mark') + 1]
        self.parsed_vegaZero['data'] = vega_zero_keywords[vega_zero_keywords.index('data') + 1]
        self.parsed_vegaZero['encoding']['x'] = vega_zero_keywords[vega_zero_keywords.index('x') + 1]
        self.parsed_vegaZero['encoding']['y']['y'] = vega_zero_keywords[vega_zero_keywords.index('aggregate') + 2]
        self.parsed_vegaZero['encoding']['y']['aggregate'] = vega_zero_keywords[vega_zero_keywords.index('aggregate') + 1]
        if 'color' in vega_zero_keywords:
            self.parsed_vegaZero['encoding']['color']['z'] = vega_zero_keywords[vega_zero_keywords.index('color') + 1]

        if 'topk' in vega_zero_keywords:
            self.parsed_vegaZero['transform']['topk'] = vega_zero_keywords[vega_zero_keywords.index('topk') + 1]

        if 'sort' in vega_zero_keywords:
            self.parsed_vegaZero['transform']['sort']['axis'] = vega_zero_keywords[vega_zero_keywords.index('sort') + 1]
            self.parsed_vegaZero['transform']['sort']['type'] = vega_zero_keywords[vega_zero_keywords.index('sort') + 2]

        if 'group' in vega_zero_keywords:
            self.parsed_vegaZero['transform']['group'] = vega_zero_keywords[vega_zero_keywords.index('group') + 1]

        if 'bin' in vega_zero_keywords:
            # the bin type sits three tokens after 'bin' (axis, one separator token, type)
            self.parsed_vegaZero['transform']['bin']['axis'] = vega_zero_keywords[vega_zero_keywords.index('bin') + 1]
            self.parsed_vegaZero['transform']['bin']['type'] = vega_zero_keywords[vega_zero_keywords.index('bin') + 3]

        if 'filter' in vega_zero_keywords:

            # collect the filter tokens up to the next transform keyword
            filter_part_token = []
            for each in vega_zero_keywords[vega_zero_keywords.index('filter') + 1:]:
                if each not in ['group', 'bin', 'sort', 'topk']:
                    filter_part_token.append(each)
                else:
                    break

            # rewrite 'a between x and y' as 'a >= x and a <= y'
            if 'between' in filter_part_token:
                filter_part_token[filter_part_token.index('between') + 2] = 'and ' + filter_part_token[
                    filter_part_token.index('between') - 1] + ' <='
                filter_part_token[filter_part_token.index('between')] = '>='

            # replace 'and' with '&' and 'or' with '|'
            filter_part_token = ' '.join(filter_part_token).split()
            filter_part_token = ['&' if x == 'and' else x for x in filter_part_token]
            filter_part_token = ['|' if x == 'or' else x for x in filter_part_token]

            if '&' in filter_part_token or '|' in filter_part_token:
                final_filter_part = ''
                each_conditions = []
                for i in range(len(filter_part_token)):
                    each = filter_part_token[i]
                    if each != '&' and each != '|':
                        # '=' in SQL --to--> '==' in Vega-Lite
                        if each == '=':
                            each = '=='
                        each_conditions.append(each)
                    if each == '&' or each == '|' or i == len(filter_part_token) - 1:
                        # each = '&' or '|'
                        if 'like' == each_conditions[1]:
                            # only consider this case: '%a%'
                            if each_conditions[2][1] == '%' and each_conditions[2][len(each_conditions[2]) - 2] == '%':
                                final_filter_part += 'indexof(' + 'datum.' + each_conditions[0] + ',"' + \
                                                     each_conditions[2][2:len(each_conditions[2]) - 2] + '") != -1'
                        elif 'like' == each_conditions[2] and 'not' == each_conditions[1]:

                            if each_conditions[3][1] == '%' and each_conditions[3][len(each_conditions[3]) - 2] == '%':
                                final_filter_part += 'indexof(' + 'datum.' + each_conditions[0] + ',"' + \
                                                     each_conditions[3][2:len(each_conditions[3]) - 2] + '") == -1'
                        else:
                            final_filter_part += 'datum.' + ' '.join(each_conditions)

                        if i != len(filter_part_token) - 1:
                            final_filter_part += ' ' + each + ' '
                        each_conditions = []

                self.parsed_vegaZero['transform']['filter'] = final_filter_part

            else:
                # only single filter condition
                self.parsed_vegaZero['transform']['filter'] = 'datum.' + ' '.join(filter_part_token).strip()

        return self.parsed_vegaZero

    def to_VegaLite(self, vega_zero, dataframe=None):
        """Convert a Vega-Zero string to a Vega-Lite specification (a dict)."""
        # one Vega-Lite template per supported mark type
        self.VegaLiteSpec = {
            'bar': {
                "mark": "bar",
                "encoding": {
                    "x": {"field": "x", "type": "nominal"},
                    "y": {"field": "y", "type": "quantitative"}
                }
            },
            'arc': {
                "mark": "arc",
                "encoding": {
                    "color": {"field": "x", "type": "nominal"},
                    "theta": {"field": "y", "type": "quantitative"}
                }
            },
            'line': {
                "mark": "line",
                "encoding": {
                    "x": {"field": "x", "type": "nominal"},
                    "y": {"field": "y", "type": "quantitative"}
                }
            },
            'point': {
                "mark": "point",
                "encoding": {
                    "x": {"field": "x", "type": "quantitative"},
                    "y": {"field": "y", "type": "quantitative"}
                }
            }
        }

        VegaZero = self.parse_vegaZero(vega_zero)

        # assign some vega-zero keywords to the VegaLiteSpec object
        if isinstance(dataframe, pandas.DataFrame):
            self.VegaLiteSpec[VegaZero['mark']]['data'] = dict()
            self.VegaLiteSpec[VegaZero['mark']]['data']['values'] = json.loads(dataframe.to_json(orient='records'))

        if VegaZero['mark'] != 'arc':
            self.VegaLiteSpec[VegaZero['mark']]['encoding']['x']['field'] = VegaZero['encoding']['x']
            self.VegaLiteSpec[VegaZero['mark']]['encoding']['y']['field'] = VegaZero['encoding']['y']['y']
            if VegaZero['encoding']['y']['aggregate'] != '' and VegaZero['encoding']['y']['aggregate'] != 'none':
                self.VegaLiteSpec[VegaZero['mark']]['encoding']['y']['aggregate'] = VegaZero['encoding']['y']['aggregate']
        else:
            # pie chart: x maps to the color channel, y to the theta channel
            self.VegaLiteSpec[VegaZero['mark']]['encoding']['color']['field'] = VegaZero['encoding']['x']
            self.VegaLiteSpec[VegaZero['mark']]['encoding']['theta']['field'] = VegaZero['encoding']['y']['y']
            if VegaZero['encoding']['y']['aggregate'] != '' and VegaZero['encoding']['y']['aggregate'] != 'none':
                self.VegaLiteSpec[VegaZero['mark']]['encoding']['theta']['aggregate'] = VegaZero['encoding']['y'][
                    'aggregate']

        if VegaZero['encoding']['color']['z'] != '':
            self.VegaLiteSpec[VegaZero['mark']]['encoding']['color'] = {
                'field': VegaZero['encoding']['color']['z'], 'type': 'nominal'
            }

        # it seems that Vega-Lite performs the grouping by default in our cases.
        if VegaZero['transform']['group'] != '':
            pass

        if VegaZero['transform']['bin']['axis'] != '':
            if VegaZero['transform']['bin']['axis'] == 'x':
                self.VegaLiteSpec[VegaZero['mark']]['encoding']['x']['type'] = 'temporal'
                if VegaZero['transform']['bin']['type'] in ['date', 'year', 'week', 'month']:
                    self.VegaLiteSpec[VegaZero['mark']]['encoding']['x']['timeUnit'] = VegaZero['transform']['bin']['type']
                elif VegaZero['transform']['bin']['type'] == 'weekday':
                    self.VegaLiteSpec[VegaZero['mark']]['encoding']['x']['timeUnit'] = 'week'
                else:
                    print('Unknown binning step.')

        if VegaZero['transform']['filter'] != '':
            if 'transform' not in self.VegaLiteSpec[VegaZero['mark']]:
                self.VegaLiteSpec[VegaZero['mark']]['transform'] = [{
                    "filter": VegaZero['transform']['filter']
                }]
            elif 'filter' not in self.VegaLiteSpec[VegaZero['mark']]['transform']:
                self.VegaLiteSpec[VegaZero['mark']]['transform'].append({
                    "filter": VegaZero['transform']['filter']
                })
            else:
                self.VegaLiteSpec[VegaZero['mark']]['transform']['filter'] += ' & ' + VegaZero['transform']['filter']

        if VegaZero['transform']['topk'] != '':
            if VegaZero['transform']['sort']['axis'] == 'x':
                sort_field = VegaZero['encoding']['x']
            elif VegaZero['transform']['sort']['axis'] == 'y':
                sort_field = VegaZero['encoding']['y']['y']
            else:
                print('Unknown sorting field: ', VegaZero['transform']['sort']['axis'])
                sort_field = VegaZero['transform']['sort']['axis']
            if VegaZero['transform']['sort']['type'] == 'desc':
                sort_order = 'descending'
            else:
                sort_order = 'ascending'
            # top-k is expressed as a dense_rank window followed by a rank filter
            if 'transform' in self.VegaLiteSpec[VegaZero['mark']]:
                current_filter = self.VegaLiteSpec[VegaZero['mark']]['transform'][0]['filter']
                self.VegaLiteSpec[VegaZero['mark']]['transform'][0][
                    'filter'] = current_filter + ' & ' + "datum.rank <= " + str(VegaZero['transform']['topk'])
                self.VegaLiteSpec[VegaZero['mark']]['transform'].insert(0, {
                    "window": [{
                        "field": sort_field,
                        "op": "dense_rank",
                        "as": "rank"
                    }],
                    "sort": [{"field": sort_field, "order": sort_order}]
                })
            else:
                self.VegaLiteSpec[VegaZero['mark']]['transform'] = [
                    {
                        "window": [{
                            "field": sort_field,
                            "op": "dense_rank",
                            "as": "rank"
                        }],
                        "sort": [{"field": sort_field, "order": sort_order}]
                    },
                    {
                        "filter": "datum.rank <= " + str(VegaZero['transform']['topk'])
                    }
                ]

        if VegaZero['transform']['sort']['axis'] != '':
            if VegaZero['transform']['sort']['axis'] == 'x':
                if VegaZero['transform']['sort']['type'] == 'desc':
                    self.VegaLiteSpec[VegaZero['mark']]['encoding']['y']['sort'] = '-x'
                else:
                    self.VegaLiteSpec[VegaZero['mark']]['encoding']['y']['sort'] = 'x'
            else:
                if VegaZero['transform']['sort']['type'] == 'desc':
                    self.VegaLiteSpec[VegaZero['mark']]['encoding']['x']['sort'] = '-y'
                else:
                    self.VegaLiteSpec[VegaZero['mark']]['encoding']['x']['sort'] = 'y'

        return self.VegaLiteSpec[VegaZero['mark']]

--------------------------------------------------------------------------------