├── .gitignore
├── README.md
├── dataset
    ├── database_information.csv
    ├── dataset_final
    │   ├── dev.csv
    │   ├── test.csv
    │   └── train.csv
    ├── db_tables_columns.json
    ├── db_tables_columns_types.json
    ├── dev.csv
    ├── test.csv
    └── train.csv
├── img
    ├── example.png
    ├── example0.png
    ├── inoutput.png
    └── teaser.png
├── model
    ├── AttentionForcing.py
    ├── Decoder.py
    ├── Encoder.py
    ├── Model.py
    ├── SubLayers.py
    └── VisAwareTranslation.py
├── ncNet-VIS21.pdf
├── ncNet.ipynb
├── ncNet.py
├── preprocessing
    ├── build_vocab.py
    └── process_dataset.py
├── requirements.txt
├── save_models
    └── trained_model.pt
├── test.py
├── test_ncNet.ipynb
├── train.py
└── utilities
    └── vis_rendering.py
/.gitignore:
--------------------------------------------------------------------------------
1 | .idea/*
2 | *.idea/
3 | **/.DS_Store
4 | *.DS_Store
5 | *.idea
6 | /dataset/database/*
7 | *.pyc
8 | .idea
9 | .idea/
10 | .ipynb_checkpoints
11 | .ipynb_checkpoints/
12 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # ncNet
2 |
3 | Supporting the translation from natural language (NL) queries to visualizations (NL2VIS) can simplify the creation of data visualizations: if it succeeds, anyone can generate visualizations from tabular data using natural language.
4 |
5 |
6 |
7 | We present ncNet, a Transformer-based model for supporting NL2VIS, with several novel visualization-aware optimizations, including using attention-forcing to optimize the learning process, and visualization-aware rendering to produce better visualization results.
8 |
9 | ## Input and Output
10 |
11 |
12 | Input:
13 | * a tabular dataset (csv, json, or sqlite3)
14 | * a natural language query used for NL2VIS
15 | * an optional chart template
16 |
17 | Output:
18 | * [Vega-Zero](https://github.com/Thanksyy/Vega-Zero): a sequence-based, model-friendly grammar obtained by simplifying Vega-Lite
19 |
20 |
21 | Please refer to our [paper](https://github.com/Thanksyy/Vega-Zero/blob/main/ncNet-VIS21.pdf) at IEEE VIS 2021 for more details.
22 |
23 |
24 | # Environment Setup
25 |
26 | * `Python3.6+`
27 | * `PyTorch 1.7`
28 | * `torchtext 0.8`
29 | * `ipyvega`
30 |
31 | Install the Python dependencies via `pip install -r requirements.txt` once Python and PyTorch are set up.
32 |
33 |
34 | # Running Code
35 |
36 | ## Data preparation
37 |
38 |
39 |
40 | * [Must] Download the Spider data [here](https://drive.google.com/drive/folders/1wmJTcC9R6ah0jBo_ONaZW3ykx5iGMx9j?usp=sharing) and unzip under `./dataset/` directory
41 |
42 | * [Optional] **_Only if_** you change the `train/dev/test.csv` files under the `./dataset/` folder, you need to run `process_dataset.py` under the `preprocessing` folder.
43 |
44 | ## Running Example
45 |
46 | Open `ncNet.ipynb` to try the running example.
47 |
48 |
49 |
50 |
51 | ## Training
52 |
53 | Run `train.py` to train ncNet.
54 |
55 |
56 | ## Testing
57 |
58 | Run `test.py` to evaluate ncNet.
59 |
60 |
61 | # Citing ncNet
62 |
63 | ```bibTeX
64 | @ARTICLE{ncnet,
65 | author={Luo, Yuyu and Tang, Nan and Li, Guoliang and Tang, Jiawei and Chai, Chengliang and Qin, Xuedi},
66 | journal={IEEE Transactions on Visualization and Computer Graphics},
67 | title={Natural Language to Visualization by Neural Machine Translation},
68 | year={2021},
69 | volume={},
70 | number={},
71 | pages={1-1}, doi={10.1109/TVCG.2021.3114848}}
72 | ```
73 |
74 | # License
75 | The project is available under the [MIT License](https://github.com/Thanksyy/Vega-Zero/blob/main/README.md).
76 |
77 | # Contact
78 | If you have any questions, feel free to contact Yuyu Luo (yuyuluo [AT] hkust-gz.edu.cn).
79 |
--------------------------------------------------------------------------------
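The README above lists three accepted table formats (csv, json, or sqlite3). As a minimal sketch of what reading such inputs could look like, the following uses only the Python standard library. `load_rows` is a hypothetical helper, not part of ncNet; the file-extension mapping, the JSON layout (an array of row objects), and the demo rows are assumptions made for illustration.

```python
import csv
import json
import sqlite3
import tempfile
from pathlib import Path


def load_rows(path, table=None):
    """Return the table stored at `path` as a list of dicts (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            return [dict(row) for row in csv.DictReader(handle)]
    if suffix == ".json":
        # Assumption: the file holds a JSON array of row objects.
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)
    if suffix in (".sqlite", ".sqlite3", ".db"):
        if table is None:
            raise ValueError("a table name is required for SQLite input")
        conn = sqlite3.connect(str(path))
        try:
            conn.row_factory = sqlite3.Row
            # Table names cannot be bound as parameters, so quote the identifier.
            quoted = '"' + table.replace('"', '""') + '"'
            rows = conn.execute(f"SELECT * FROM {quoted}").fetchall()
            return [dict(row) for row in rows]
        finally:
            conn.close()
    raise ValueError(f"unsupported file type: {suffix}")


# Demo with made-up rows written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "swimmer.csv"
    csv_path.write_text("name,id\nAnna,1\nBen,2\n", encoding="utf-8")
    csv_rows = load_rows(csv_path)

    db_path = Path(tmp) / "swimmer.sqlite3"
    conn = sqlite3.connect(str(db_path))
    conn.execute("CREATE TABLE swimmer (name TEXT, id INTEGER)")
    conn.execute("INSERT INTO swimmer VALUES ('Anna', 1)")
    conn.commit()
    conn.close()
    db_rows = load_rows(db_path, table="swimmer")

print(csv_rows)
print(db_rows)
```

CSV values come back as strings, while SQLite preserves the declared column types, so a real pipeline would likely normalize types after loading.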
/img/example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HKUSTDial/ncNet/415d9477d424296bc5414e0f6624af23643372d7/img/example.png
--------------------------------------------------------------------------------
/img/example0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HKUSTDial/ncNet/415d9477d424296bc5414e0f6624af23643372d7/img/example0.png
--------------------------------------------------------------------------------
/img/inoutput.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HKUSTDial/ncNet/415d9477d424296bc5414e0f6624af23643372d7/img/inoutput.png
--------------------------------------------------------------------------------
/img/teaser.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HKUSTDial/ncNet/415d9477d424296bc5414e0f6624af23643372d7/img/teaser.png
--------------------------------------------------------------------------------
/model/AttentionForcing.py:
--------------------------------------------------------------------------------
1 | __author__ = "Yuyu Luo"
2 |
3 | import numpy as np
4 |
5 |
6 | def create_visibility_matrix(SRC, each_src):
7 | each_src = np.array(each_src.to('cpu'))
8 |
9 | # find related index
10 | nl_beg_index = np.where(each_src == SRC.vocab['

| | tvBench_id | db_id | chart | hardness | query | question | vega_zero | mentioned_columns | mentioned_values | query_template | source | labels | token_types |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2914@y_name@DESC | swimming | Bar | Easy | Visualize BAR SELECT name , ID FROM swimmer OR... | Draw a bar chart about the distribution of nam... | mark bar data swimmer encoding x name y aggreg... | id name time | NaN | mark bar data swimmer encoding x [X] y aggrega... | <N> Draw a bar chart about the distribution of... | mark bar data swimmer encoding x name y aggreg... | nl nl nl nl nl nl nl nl nl nl nl nl nl nl nl n... |
| 1 | 586 | college_1 | Bar | Easy | Visualize BAR SELECT DEPT_CODE , sum(crs_credi... | Bar chart of sum crs credit from each dept code | mark bar data course encoding x dept_code y ag... | dept_code crs_credit | NaN | mark [T] data course encoding x [X] y aggregat... | <N> Bar chart of sum crs credit from each dept... | mark bar data course encoding x dept_code y ag... | nl nl nl nl nl nl nl nl nl nl nl nl template t... |
| 2 | 2798@x_name@ASC | soccer_2 | Bar | Medium | Visualize BAR SELECT cName , min(enr) FROM col... | Return a bar graph for the name of the school ... | mark bar data college encoding x cname y aggre... | state cname enr | NaN | mark [T] data college encoding x [X] y aggrega... | <N> Return a bar graph for the name of the sch... | mark bar data college encoding x cname y aggre... | nl nl nl nl nl nl nl nl nl nl nl nl nl nl nl n... |
| 3 | 3051 | train_station | Pie | Easy | Visualize PIE SELECT Location , sum(number_of_... | Show the proportion of the total number of pla... | mark arc data station encoding x location y ag... | location number_of_platforms | NaN | mark [T] data station encoding x [X] y aggrega... | <N> Show the proportion of the total number of... | mark arc data station encoding x location y ag... | nl nl nl nl nl nl nl nl nl nl nl nl nl nl nl n... |
| 4 | 73 | apartment_rentals | Pie | Easy | Visualize PIE SELECT booking_status_code , COU... | How many bookings does each booking status hav... | mark arc data apartment_bookings encoding x bo... | booking_status_code | NaN | mark [T] data apartment_bookings encoding x [X... | <N> How many bookings does each booking status... | mark arc data apartment_bookings encoding x bo... | nl nl nl nl nl nl nl nl nl nl nl nl nl nl nl n... |
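
The `vega_zero` column holds flat token sequences that start with `mark <type> data <table> encoding x <column> y aggregate ...`. The following is a minimal sketch of how such a prefix could be split into named fields. `parse_vega_zero_prefix` is a hypothetical helper, not ncNet code; the assumption that `aggregate` is followed by a function name and then the y column is mine, because the sequences shown here are truncated at that point, and the example string is invented rather than taken from the dataset.

```python
def parse_vega_zero_prefix(sequence):
    """Split the leading part of a Vega-Zero-style sequence into a dict.

    Hypothetical helper: it only checks the keyword layout visible in the
    table (mark, data, encoding, x, y, aggregate) and assumes that
    "aggregate" is followed by a function name and then the y column.
    """
    tokens = sequence.split()
    # Positions at which fixed keywords are expected in the token list.
    layout = {0: "mark", 2: "data", 4: "encoding", 5: "x", 7: "y", 8: "aggregate"}
    for position, keyword in layout.items():
        if position >= len(tokens) or tokens[position] != keyword:
            raise ValueError(f"expected {keyword!r} at token {position}")
    if len(tokens) < 11:
        raise ValueError("sequence ends before the y column")
    return {
        "mark": tokens[1],
        "data": tokens[3],
        "x": tokens[6],
        "aggregate": tokens[9],
        "y": tokens[10],
        # Anything after the y column is kept untouched.
        "rest": tokens[11:],
    }


# Invented example string (not a row from the dataset).
example = "mark bar data cars encoding x origin y aggregate count origin"
spec = parse_vega_zero_prefix(example)
print(spec)
```

Parsing by position rather than searching for keywords avoids misreading a table or column that happens to be named `x`, `y`, or `data`.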