├── .gitignore
├── example.png
├── vega-zero.png
├── ncNet-VIS21.pdf
├── example-jupyter.png
├── Examples
│   └── payments.csv
├── LICENSE
├── README.md
└── VegaZero2VegaLite.py
/.gitignore:
--------------------------------------------------------------------------------
1 | .idea
2 | .ipynb_checkpoints
3 | __pycache__
4 | .DS_Store
5 |
--------------------------------------------------------------------------------
/example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HKUSTDial/Vega-Zero/HEAD/example.png
--------------------------------------------------------------------------------
/vega-zero.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HKUSTDial/Vega-Zero/HEAD/vega-zero.png
--------------------------------------------------------------------------------
/ncNet-VIS21.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HKUSTDial/Vega-Zero/HEAD/ncNet-VIS21.pdf
--------------------------------------------------------------------------------
/example-jupyter.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HKUSTDial/Vega-Zero/HEAD/example-jupyter.png
--------------------------------------------------------------------------------
/Examples/payments.csv:
--------------------------------------------------------------------------------
1 | payment_id,booking_id,customer_id,payment_type_code,amount_paid_in_full_yn,payment_date,amount_due,amount_paid
2 | 1,6,15,check,1,2018-03-09 16:28:00,369.52,206.27
3 | 2,9,12,cash,1,2018-03-03 13:39:44,278.6,666.45
4 | 3,5,7,credit card,0,2018-03-22 15:00:23,840.06,135.7
5 | 4,6,1,check,0,2018-03-22 02:28:11,678.29,668.4
6 | 5,8,11,cash,1,2018-03-23 20:36:04,830.25,305.65
7 | 6,15,8,check,0,2018-03-19 12:39:31,410.1,175.54
8 | 7,1,8,cash,1,2018-03-02 06:25:45,482.26,602.8
9 | 8,9,14,cash,1,2018-03-12 23:00:55,653.18,505.23
10 | 9,3,7,direct debit,0,2018-03-12 23:23:56,686.85,321.58
11 | 10,13,10,credit card,1,2018-03-23 13:24:33,486.75,681.21
12 | 11,14,15,credit card,1,2018-03-03 03:07:00,259.18,464.06
13 | 12,14,9,cash,0,2018-02-27 10:50:39,785.73,685.32
14 | 13,15,14,direct debit,0,2018-03-03 14:22:51,665.58,307.14
15 | 14,5,5,direct debit,1,2018-03-17 15:51:52,407.51,704.41
16 | 15,4,12,credit card,1,2018-03-17 03:07:45,631.93,334.2
17 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2021 Yuyu Luo
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Vega-Zero
2 |
3 | Vega-Zero is a visualization grammar that simplifies Vega-Lite. Its main purpose is to flatten a hierarchical Vega-Lite specification into a sequence-based specification.
4 |
5 | Because a Vega-Zero specification is a flat sequence of tokens, it is much easier to train a sequence-to-sequence model that generates it as output.
6 | Vega-Zero can therefore support learning tasks such as translating a natural language query into a visualization.
7 |
8 | Please refer to our [paper](https://github.com/Thanksyy/Vega-Zero/blob/main/ncNet-VIS21.pdf) at IEEE VIS 2021 for more details.
9 |
10 | ## Definition
11 |
12 | Vega-Zero keeps most of the Vega-Lite keywords that map visual encoding channels to (transformed) data variables. It flattens a JSON object into a sequence of keywords by removing structure-aware symbols such as brackets, colons, and quotation marks. Formally, a unit specification in Vega-Zero is a four-tuple (similar to Vega-Lite, but with each element being a sequence):
13 |
14 | **unit** = (**mark**, **data**, **encoding**, **transform**)
15 |
16 | Naturally, as a simplification of Vega-Lite:
17 | 1. **mark** denotes the chart type: *bar*, *line*, *point* (for scatter charts), or *arc* (for pie charts);
18 | 2. **data** specifies the source data;
19 | 3. **encoding** specifies the *x*/*y*-axis columns, the *aggregate* function, and the column that determines *color*;
20 | 4. **transform** defines the data transformation functions: *filter*, *bin*, *group*, *sort*, and *top-k*.
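
The four-part structure can be illustrated with a minimal Python sketch. The example sequence is hypothetical (it borrows column names from `Examples/payments.csv`), and `split_unit` is an illustrative helper, not part of this repository. It assumes that no table or column name collides with the four section keywords.

```python
# Section keywords that open each part of a Vega-Zero unit specification.
SECTION_KEYWORDS = ("mark", "data", "encoding", "transform")


def split_unit(vega_zero):
    """Group the tokens of a Vega-Zero sequence by the section they belong to."""
    unit = {keyword: [] for keyword in SECTION_KEYWORDS}
    current = None
    for token in vega_zero.split():
        if token in SECTION_KEYWORDS:
            current = token  # a section keyword starts a new part
        elif current is not None:
            unit[current].append(token)
    return unit


# Hypothetical example sequence (assumed for illustration only).
example = (
    "mark bar data payments "
    "encoding x payment_type_code y aggregate sum amount_paid "
    "transform group x sort y desc"
)
unit = split_unit(example)
for section, tokens in unit.items():
    print(section, tokens)
```

Each value is itself a flat token list, which is the sense in which every element of the tuple is a sequence.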
21 |
22 |
23 |
24 | ## Example
25 |
26 | Below is an example to show the connection between Vega-Zero and Vega-Lite.
27 |
28 |
29 |
30 |
31 | ## Convert Vega-Zero to Vega-Lite specification
32 |
33 | In this repository, we provide a Python script to convert a Vega-Zero specification to a Vega-Lite specification.
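
The core of such a conversion can be sketched in a few lines. The function below is a simplified, self-contained illustration that only handles the mark and the x/y encoding. `vega_zero_to_vega_lite` is a hypothetical name, and the input sequence is an assumed example. The script in this repository (`VegaZero2VegaLite.py`) additionally handles color and the transform keywords.

```python
import json


def vega_zero_to_vega_lite(vega_zero):
    """Build a minimal Vega-Lite dict from a Vega-Zero sequence (mark, x, y only)."""
    tokens = vega_zero.split()
    mark = tokens[tokens.index("mark") + 1]
    x_field = tokens[tokens.index("x") + 1]
    # The y part has the form: y aggregate <function> <column>
    aggregate = tokens[tokens.index("aggregate") + 1]
    y_field = tokens[tokens.index("aggregate") + 2]

    y_encoding = {"field": y_field, "type": "quantitative"}
    if aggregate != "none":
        y_encoding["aggregate"] = aggregate

    if mark == "arc":
        # Pie charts use color/theta channels instead of x/y.
        encoding = {
            "color": {"field": x_field, "type": "nominal"},
            "theta": y_encoding,
        }
    else:
        x_type = "quantitative" if mark == "point" else "nominal"
        encoding = {"x": {"field": x_field, "type": x_type}, "y": y_encoding}
    return {"mark": mark, "encoding": encoding}


spec = vega_zero_to_vega_lite(
    "mark bar data payments encoding x payment_type_code "
    "y aggregate sum amount_paid transform group x"
)
print(json.dumps(spec, indent=2))
```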
34 |
35 | Below is an example to run this Python script in the Jupyter Notebook.
36 |
37 |
38 |
39 | ## How to use?
40 |
41 | Please follow the examples in ```examples.ipynb```. To render the visualization result in Jupyter Notebook (or JupyterLab), follow the instructions for [IPython Vega](https://github.com/vega/ipyvega).
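
When a DataFrame is passed to the converter, its rows are stored as inline records under `data.values`. The same structure can be built by hand, as in the sketch below, which uses only the standard library. The three rows are an excerpt of `Examples/payments.csv`, and `rows_from_csv` is a hypothetical helper, not part of this repository.

```python
import csv
import io
import json

# Excerpt of Examples/payments.csv (three rows, three columns).
CSV_TEXT = """payment_id,payment_type_code,amount_paid
1,check,206.27
2,cash,666.45
3,credit card,135.7
"""


def rows_from_csv(text, numeric_columns):
    """Read CSV text into a list of dicts, casting the given columns to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        record = dict(row)
        for column in numeric_columns:
            record[column] = float(record[column])
        rows.append(record)
    return rows


spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "payment_type_code", "type": "nominal"},
        "y": {"field": "amount_paid", "type": "quantitative", "aggregate": "sum"},
    },
}
# Inline data: Vega-Lite reads the records from data.values.
spec["data"] = {"values": rows_from_csv(CSV_TEXT, ["amount_paid"])}
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The resulting JSON string is self-contained, so it can be handed to any Vega-Lite renderer without a separate data file.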
42 |
43 | ## Citing Vega-Zero
44 |
45 | ```bibTeX
46 | @ARTICLE{ncnet,
47 | author={Luo, Yuyu and Tang, Nan and Li, Guoliang and Tang, Jiawei and Chai, Chengliang and Qin, Xuedi},
48 | journal={IEEE Transactions on Visualization and Computer Graphics},
49 | title={Natural Language to Visualization by Neural Machine Translation},
50 | year={2021},
51 | volume={},
52 | number={},
53 | pages={1-1}, doi={10.1109/TVCG.2021.3114848}}
54 | ```
55 |
56 | ## License
57 | The software is available under the [MIT License](https://github.com/Thanksyy/Vega-Zero/blob/main/README.md).
58 |
59 | ## Contact
60 | If you have any questions, feel free to contact Yuyu Luo (yuyuluo [AT] hkust-gz.edu.cn).
61 |
--------------------------------------------------------------------------------
/VegaZero2VegaLite.py:
--------------------------------------------------------------------------------
1 | __author__ = "Yuyu Luo"
2 |
3 | import json
4 | import pandas
5 |
6 | class VegaZero2VegaLite(object):
7 | def __init__(self):
8 | pass
9 |
10 | def parse_vegaZero(self, vega_zero):
11 | self.parsed_vegaZero = {
12 | 'mark': '',
13 | 'data': '',
14 | 'encoding': {
15 | 'x': '',
16 | 'y': {
17 | 'aggregate': '',
18 | 'y': ''
19 | },
20 | 'color': {
21 | 'z': ''
22 | }
23 | },
24 | 'transform': {
25 | 'filter': '',
26 | 'group': '',
27 | 'bin': {
28 | 'axis': '',
29 | 'type': ''
30 | },
31 | 'sort': {
32 | 'axis': '',
33 | 'type': ''
34 | },
35 | 'topk': ''
36 | }
37 | }
38 | vega_zero_keywords = vega_zero.split(' ')
39 |
40 | self.parsed_vegaZero['mark'] = vega_zero_keywords[vega_zero_keywords.index('mark') + 1]
41 | self.parsed_vegaZero['data'] = vega_zero_keywords[vega_zero_keywords.index('data') + 1]
42 | self.parsed_vegaZero['encoding']['x'] = vega_zero_keywords[vega_zero_keywords.index('x') + 1]
43 | self.parsed_vegaZero['encoding']['y']['y'] = vega_zero_keywords[vega_zero_keywords.index('aggregate') + 2]
44 | self.parsed_vegaZero['encoding']['y']['aggregate'] = vega_zero_keywords[vega_zero_keywords.index('aggregate') + 1]
45 | if 'color' in vega_zero_keywords:
46 | self.parsed_vegaZero['encoding']['color']['z'] = vega_zero_keywords[vega_zero_keywords.index('color') + 1]
47 |
48 | if 'topk' in vega_zero_keywords:
49 | self.parsed_vegaZero['transform']['topk'] = vega_zero_keywords[vega_zero_keywords.index('topk') + 1]
50 |
51 | if 'sort' in vega_zero_keywords:
52 | self.parsed_vegaZero['transform']['sort']['axis'] = vega_zero_keywords[vega_zero_keywords.index('sort') + 1]
53 | self.parsed_vegaZero['transform']['sort']['type'] = vega_zero_keywords[vega_zero_keywords.index('sort') + 2]
54 |
55 | if 'group' in vega_zero_keywords:
56 | self.parsed_vegaZero['transform']['group'] = vega_zero_keywords[vega_zero_keywords.index('group') + 1]
57 |
58 | if 'bin' in vega_zero_keywords:
59 | self.parsed_vegaZero['transform']['bin']['axis'] = vega_zero_keywords[vega_zero_keywords.index('bin') + 1]
60 | self.parsed_vegaZero['transform']['bin']['type'] = vega_zero_keywords[vega_zero_keywords.index('bin') + 3]
61 |
62 | if 'filter' in vega_zero_keywords:
63 |
64 | filter_part_token = []
65 | for each in vega_zero_keywords[vega_zero_keywords.index('filter') + 1:]:
66 | if each not in ['group', 'bin', 'sort', 'topk']:
67 | filter_part_token.append(each)
68 | else:
69 | break
70 |
71 | if 'between' in filter_part_token:
72 | filter_part_token[filter_part_token.index('between') + 2] = 'and ' + filter_part_token[
73 | filter_part_token.index('between') - 1] + ' <='
74 | filter_part_token[filter_part_token.index('between')] = '>='
75 |
76 | # replace 'and' / 'or' with the operators '&' / '|'
77 | filter_part_token = ' '.join(filter_part_token).split()
78 | filter_part_token = ['&' if x == 'and' else x for x in filter_part_token]
79 | filter_part_token = ['|' if x == 'or' else x for x in filter_part_token]
80 |
81 | if '&' in filter_part_token or '|' in filter_part_token:
82 | final_filter_part = ''
83 | each_conditions = []
84 | for i in range(len(filter_part_token)):
85 | each = filter_part_token[i]
86 | if each != '&' and each != '|':
87 | # '=' in SQL --to--> '==' in Vega-Lite
88 | if each == '=':
89 | each = '=='
90 | each_conditions.append(each)
91 | if each == '&' or each == '|' or i == len(filter_part_token) - 1:
92 | # each = '&' or '|'
93 | if 'like' == each_conditions[1]:
94 | # only consider this case: '%a%'
95 | if each_conditions[2][1] == '%' and each_conditions[2][len(each_conditions[2]) - 2] == '%':
96 | final_filter_part += 'indexof(' + 'datum.' + each_conditions[0] + ',"' + \
97 | each_conditions[2][2:len(each_conditions[2]) - 2] + '") != -1'
98 | elif 'like' == each_conditions[2] and 'not' == each_conditions[1]:
99 |
100 | if each_conditions[3][1] == '%' and each_conditions[3][len(each_conditions[3]) - 2] == '%':
101 | final_filter_part += 'indexof(' + 'datum.' + each_conditions[0] + ',"' + \
102 | each_conditions[3][2:len(each_conditions[3]) - 2] + '") == -1'
103 | else:
104 | final_filter_part += 'datum.' + ' '.join(each_conditions)
105 |
106 | if i != len(filter_part_token) - 1:
107 | final_filter_part += ' ' + each + ' '
108 | each_conditions = []
109 |
110 | self.parsed_vegaZero['transform']['filter'] = final_filter_part
111 |
112 | else:
113 | # only single filter condition
114 | self.parsed_vegaZero['transform']['filter'] = 'datum.' + ' '.join(filter_part_token).strip()
115 |
116 | return self.parsed_vegaZero
117 |
118 | def to_VegaLite(self, vega_zero, dataframe=None):
119 | self.VegaLiteSpec = {
120 | 'bar': {
121 | "mark": "bar",
122 | "encoding": {
123 | "x": {"field": "x", "type": "nominal"},
124 | "y": {"field": "y", "type": "quantitative"}
125 | }
126 | },
127 | 'arc': {
128 | "mark": "arc",
129 | "encoding": {
130 | "color": {"field": "x", "type": "nominal"},
131 | "theta": {"field": "y", "type": "quantitative"}
132 | }
133 | },
134 | 'line': {
135 | "mark": "line",
136 | "encoding": {
137 | "x": {"field": "x", "type": "nominal"},
138 | "y": {"field": "y", "type": "quantitative"}
139 | }
140 | },
141 | 'point': {
142 | "mark": "point",
143 | "encoding": {
144 | "x": {"field": "x", "type": "quantitative"},
145 | "y": {"field": "y", "type": "quantitative"}
146 | }
147 | }
148 | }
149 |
150 | VegaZero = self.parse_vegaZero(vega_zero)
151 |
152 | # assign some vega-zero keywords to the VegaLiteSpec object
153 | if isinstance(dataframe, pandas.DataFrame):
154 | self.VegaLiteSpec[VegaZero['mark']]['data'] = dict()
155 | self.VegaLiteSpec[VegaZero['mark']]['data']['values'] = json.loads(dataframe.to_json(orient='records'))
156 |
157 | if VegaZero['mark'] != 'arc':
158 | self.VegaLiteSpec[VegaZero['mark']]['encoding']['x']['field'] = VegaZero['encoding']['x']
159 | self.VegaLiteSpec[VegaZero['mark']]['encoding']['y']['field'] = VegaZero['encoding']['y']['y']
160 | if VegaZero['encoding']['y']['aggregate'] != '' and VegaZero['encoding']['y']['aggregate'] != 'none':
161 | self.VegaLiteSpec[VegaZero['mark']]['encoding']['y']['aggregate'] = VegaZero['encoding']['y']['aggregate']
162 | else:
163 | self.VegaLiteSpec[VegaZero['mark']]['encoding']['color']['field'] = VegaZero['encoding']['x']
164 | self.VegaLiteSpec[VegaZero['mark']]['encoding']['theta']['field'] = VegaZero['encoding']['y']['y']
165 | if VegaZero['encoding']['y']['aggregate'] != '' and VegaZero['encoding']['y']['aggregate'] != 'none':
166 | self.VegaLiteSpec[VegaZero['mark']]['encoding']['theta']['aggregate'] = VegaZero['encoding']['y'][
167 | 'aggregate']
168 |
169 | if VegaZero['encoding']['color']['z'] != '':
170 | self.VegaLiteSpec[VegaZero['mark']]['encoding']['color'] = {
171 | 'field': VegaZero['encoding']['color']['z'], 'type': 'nominal'
172 | }
173 |
174 | # In our cases, Vega-Lite appears to perform the grouping by default, so no explicit transform is needed.
175 | if VegaZero['transform']['group'] != '':
176 | pass
177 |
178 | if VegaZero['transform']['bin']['axis'] != '':
179 | if VegaZero['transform']['bin']['axis'] == 'x':
180 | self.VegaLiteSpec[VegaZero['mark']]['encoding']['x']['type'] = 'temporal'
181 | if VegaZero['transform']['bin']['type'] in ['date', 'year', 'week', 'month']:
182 | self.VegaLiteSpec[VegaZero['mark']]['encoding']['x']['timeUnit'] = VegaZero['transform']['bin']['type']
183 | elif VegaZero['transform']['bin']['type'] == 'weekday':
184 | self.VegaLiteSpec[VegaZero['mark']]['encoding']['x']['timeUnit'] = 'day'  # Vega-Lite's 'day' unit is the day of the week
185 | else:
186 | print('Unknown binning step.')
187 |
188 | if VegaZero['transform']['filter'] != '':
189 | if 'transform' not in self.VegaLiteSpec[VegaZero['mark']]:
190 | self.VegaLiteSpec[VegaZero['mark']]['transform'] = [{
191 | "filter": VegaZero['transform']['filter']
192 | }]
193 | elif 'filter' not in self.VegaLiteSpec[VegaZero['mark']]['transform']:
194 | self.VegaLiteSpec[VegaZero['mark']]['transform'].append({
195 | "filter": VegaZero['transform']['filter']
196 | })
197 | else:
198 | self.VegaLiteSpec[VegaZero['mark']]['transform']['filter'] += ' & ' + VegaZero['transform']['filter']
199 |
200 | if VegaZero['transform']['topk'] != '':
201 | if VegaZero['transform']['sort']['axis'] == 'x':
202 | sort_field = VegaZero['encoding']['x']
203 | elif VegaZero['transform']['sort']['axis'] == 'y':
204 | sort_field = VegaZero['encoding']['y']['y']
205 | else:
206 | print('Unknown sorting field: ', VegaZero['transform']['sort']['axis'])
207 | sort_field = VegaZero['transform']['sort']['axis']
208 | if VegaZero['transform']['sort']['type'] == 'desc':
209 | sort_order = 'descending'
210 | else:
211 | sort_order = 'ascending'
212 | if 'transform' in self.VegaLiteSpec[VegaZero['mark']]:
213 | current_filter = self.VegaLiteSpec[VegaZero['mark']]['transform'][0]['filter']
214 | self.VegaLiteSpec[VegaZero['mark']]['transform'][0][
215 | 'filter'] = current_filter + ' & ' + "datum.rank <= " + str(VegaZero['transform']['topk'])
216 | self.VegaLiteSpec[VegaZero['mark']]['transform'].insert(0, {
217 | "window": [{
218 | "field": sort_field,
219 | "op": "dense_rank",
220 | "as": "rank"
221 | }],
222 | "sort": [{"field": sort_field, "order": sort_order}]
223 | })
224 | else:
225 | self.VegaLiteSpec[VegaZero['mark']]['transform'] = [
226 | {
227 | "window": [{
228 | "field": sort_field,
229 | "op": "dense_rank",
230 | "as": "rank"
231 | }],
232 | "sort": [{"field": sort_field, "order": sort_order}]
233 | },
234 | {
235 | "filter": "datum.rank <= " + str(VegaZero['transform']['topk'])
236 | }
237 | ]
238 |
239 | if VegaZero['transform']['sort']['axis'] != '':
240 | if VegaZero['transform']['sort']['axis'] == 'x':
241 | if VegaZero['transform']['sort']['type'] == 'desc':
242 | self.VegaLiteSpec[VegaZero['mark']]['encoding']['y']['sort'] = '-x'
243 | else:
244 | self.VegaLiteSpec[VegaZero['mark']]['encoding']['y']['sort'] = 'x'
245 | else:
246 | if VegaZero['transform']['sort']['type'] == 'desc':
247 | self.VegaLiteSpec[VegaZero['mark']]['encoding']['x']['sort'] = '-y'
248 | else:
249 | self.VegaLiteSpec[VegaZero['mark']]['encoding']['x']['sort'] = 'y'
250 |
251 | return self.VegaLiteSpec[VegaZero['mark']]
252 |
253 |
--------------------------------------------------------------------------------