├── README.md
├── feat_drift
│   ├── README.md
│   ├── data
│   │   ├── feature_imp1.csv
│   │   └── feature_imp2.csv
│   ├── feat_drift_test.ipynb
│   ├── feature_drift_output
│   │   └── feature_drift_201802_201803.html
│   ├── images
│   │   ├── feat_drift.PNG
│   │   └── feat_drift.gif
│   └── src
│       ├── feature_drift_draw.py
│       └── feature_drift_template.html
├── sankey
│   ├── README.md
│   ├── data
│   │   └── titanic_train.csv
│   ├── images
│   │   ├── sankey_flow_col.PNG
│   │   ├── sankey_flow_col_val.PNG
│   │   ├── sankey_flow_same.PNG
│   │   ├── sankey_flow_tab20.PNG
│   │   └── sankey_flow_val.PNG
│   ├── sankey_flow_output
│   │   └── sankey_flow_Titanic.html
│   ├── sankey_flow_test.ipynb
│   └── src
│       ├── generate_sankey_flow.py
│       └── sankey_flow_template.html
└── tree
    ├── README.md
    ├── data
    │   └── titanic_train.csv
    ├── generate_tree_test.ipynb
    ├── image
    │   ├── sankey_tree.gif
    │   └── simple_tree.gif
    ├── sankey_tree_output
    │   ├── sankey_tree_Iris_Tree.html
    │   └── sankey_tree_Titanic_Tree.html
    ├── simple_tree_output
    │   ├── simple_tree_Iris_Tree.html
    │   └── simple_tree_Titanic_Tree.html
    └── src
        ├── generate_tree.py
        ├── sankey_tree_template.html
        └── simple_tree_template.html
/README.md:
--------------------------------------------------------------------------------
1 | # Nuance
2 | I use Nuance to collect the visualization ideas I come up with in my work as a data scientist.
3 | It is not a package yet, just a list of small ideas. Feel free to try them out!
4 |
5 | ## Why Nuance?
6 | **nuance n.**
7 | a subtle difference in meaning or opinion or attitude
8 |
9 | ## How to use?
10 | Please check the instructions in the corresponding folder.
11 |
12 | ## List of ideas
13 | 1. **simple tree**: [visualize a sklearn Decision Tree](https://github.com/SauceCat/Nuance/blob/master/tree)
14 |
15 |
16 | 2. **sankey tree**: [visualize a sklearn Decision Tree](https://github.com/SauceCat/Nuance/blob/master/tree)
17 |
18 |
19 | 3. **sankey flow**: [visualize a sankey flow](https://github.com/SauceCat/Nuance/tree/master/sankey)
20 |
21 |
30 |
31 | 4. **feature drift**: [visualize feature drift](https://github.com/SauceCat/Nuance/tree/master/feat_drift)
32 |
33 |
34 |
35 |
36 |
--------------------------------------------------------------------------------
/feat_drift/README.md:
--------------------------------------------------------------------------------
1 | ## Visualize feature drift
2 |
3 |
4 | ## What is feature drift?
5 | **"Feature drifts occur whenever the relevance of a feature grows or shrinks for incoming instances."**
6 | Check this paper: [A survey on feature drift adaptation: Definition, benchmark, challenges and future directions](https://www.sciencedirect.com/science/article/pii/S0164121216301030)
7 |
8 | **Put simply:** if your training data is time-dependent, the subset of important features selected by the same model can change considerably over time.
9 | The idea is to visualize the "feature drift" between two training sets, usually taken from different snapshots, so that the visualization can help detect feature drift over time. The expected inputs are two dataframes containing feature importance information. See [this notebook](https://github.com/SauceCat/Nuance/blob/master/feat_drift/feat_drift_test.ipynb) for more details.
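
To make the comparison concrete, here is a minimal sketch of how two importance snapshots can be compared by rank. It is plain Python rather than the pandas code in `src/`, the function names `rank_features` and `classify_drift` are hypothetical, and the importance values are made up for illustration. The four labels reuse the keys of the default color dictionary in `feature_drift_draw.py`.

```python
def rank_features(importances):
    """Return {feature: rank}, rank 1 being the most important feature."""
    ordered = sorted(importances, key=importances.get, reverse=True)
    return {name: pos + 1 for pos, name in enumerate(ordered)}


def classify_drift(imp1, imp2, top_n=None):
    """Label each feature by how its rank moves from snapshot 1 to snapshot 2."""
    rank1, rank2 = rank_features(imp1), rank_features(imp2)
    if top_n is not None:
        # keep only the top_n features of each snapshot
        rank1 = {f: r for f, r in rank1.items() if r <= top_n}
        rank2 = {f: r for f, r in rank2.items() if r <= top_n}

    labels = {}
    for feat in sorted(set(rank1) | set(rank2)):
        if feat not in rank2:
            labels[feat] = 'disappear'      # only in snapshot 1
        elif feat not in rank1:
            labels[feat] = 'appear'         # only in snapshot 2
        elif rank1[feat] < rank2[feat]:
            labels[feat] = 'drop'           # rank number grew, so it became less important
        else:
            labels[feat] = 'up_or_stable'
    return labels


# toy importance values, made up for illustration
snapshot_1 = {'feat_a': 30.0, 'feat_b': 20.0, 'feat_c': 10.0, 'feat_d': 5.0}
snapshot_2 = {'feat_a': 12.0, 'feat_b': 25.0, 'feat_c': 1.0, 'feat_d': 8.0}

drift = classify_drift(snapshot_1, snapshot_2, top_n=3)
for feat, label in drift.items():
    print(feat, label)
```

With `top_n=3`, `feat_c` falls out of the second snapshot's top features and `feat_d` enters it, which is the kind of change the graph is meant to highlight.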
10 |
11 | ## How to use?
12 | 1. Download the folder [**feat_drift**](https://github.com/SauceCat/Nuance/tree/master/feat_drift)
13 | 2. The folder structure:
14 | ```
15 | feat_drift
16 | - src: folder for all the visualization code
17 | - data: folder for the test data
18 | - feature_drift_output: folder for outputs (re-created if it has been deleted)
19 | - feat_drift_test.ipynb: instructions and examples in jupyter notebook
20 | - ...
21 | ```
22 | 3. Install `jinja2`
23 | ```
24 | pip install jinja2
25 | ```
26 | 4. Use the feature drift visualization (it depends on D3.js, so an internet connection is required):
27 | ```python
28 | import sys
29 | sys.path.insert(0, 'src/')
30 | import feature_drift_draw
31 |
32 | feature_drift_draw.feature_drift_graph(feat_imp1=feat_imp1, feat_imp2=feat_imp2, feature_name='feat_name', imp_name='imp',
33 |                                        ds_name1='training set', ds_name2='test set', graph_name='train_test',
34 |                                        top_n=20, max_bar_width=300, bar_height=30, middle_gap=300, fontsize=12, color_dict=None)
35 | ```
36 | 5. An HTML file is generated in [feature_drift_output](https://github.com/SauceCat/Nuance/tree/master/feat_drift/feature_drift_output). Open it with any browser you like (I prefer Chrome).
37 |
38 | ## Parameters
39 | ```python
40 | def feature_drift_graph(feat_imp1, feat_imp2, feature_name, imp_name, ds_name1, ds_name2, graph_name=None,
41 |                         top_n=None, max_bar_width=300, bar_height=30, middle_gap=300, fontsize=12, color_dict=None):
42 |     """
43 |     Draw feature drift graph
44 |
45 |     :param feat_imp1: feature importance dataframe #1
46 |     :param feat_imp2: feature importance dataframe #2
47 |     :param feature_name: column name of features
48 |     :param imp_name: column name of importance value
49 |     :param ds_name1: name of dataset #1
50 |     :param ds_name2: name of dataset #2
51 |     :param graph_name: suffix of the output file name (feature_drift_<graph_name>.html)
52 |     :param top_n: show only the top_n features of each dataframe
53 |     :param max_bar_width: maximum bar width
54 |     :param bar_height: bar height
55 |     :param middle_gap: gap between the left and right bars
56 |     :param fontsize: font size
57 |     :param color_dict: color dictionary with the keys 'drop', 'up_or_stable', 'disappear' and 'appear'
58 |     """
59 | ```
60 |
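
A custom `color_dict` needs all four keys, because `feature_drift_draw.py` looks up each of them for the bars and the legend. The sketch below shows one way to override a single color while keeping the defaults from the source. `build_color_dict` is a hypothetical helper, not part of the repo, and the override color is just an example.

```python
# default colors copied from feature_drift_draw.py
DEFAULT_COLORS = {
    'drop': '#f17182',
    'up_or_stable': '#abdda4',
    'disappear': '#bababa',
    'appear': '#9ac6df',
}


def build_color_dict(overrides=None):
    """Hypothetical helper: defaults plus user overrides, rejecting unknown keys."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_COLORS)
    if unknown:
        raise KeyError('unknown color keys: %s' % sorted(unknown))
    colors = dict(DEFAULT_COLORS)   # copy, so the defaults stay untouched
    colors.update(overrides)
    return colors


# change only the color used for features whose rank dropped
color_dict = build_color_dict({'drop': '#d7191c'})
print(color_dict)
```

The resulting dictionary can then be passed as `color_dict=color_dict` to `feature_drift_graph`.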
--------------------------------------------------------------------------------
/feat_drift/data/feature_imp1.csv:
--------------------------------------------------------------------------------
1 | feat_name,imp
2 | feat_0,13.3436942857
3 | feat_1,15.800973571400002
4 | feat_2,21.0843816667
5 | feat_3,48.5161874074
6 | feat_4,113.18283999999998
7 | feat_5,19.0311508852
8 | feat_6,31.0144289552
9 | feat_7,44.0083055556
10 | feat_8,21.79879273
11 | feat_9,18.5955840707
12 | feat_10,40.70143425
13 | feat_11,44.3106395238
14 | feat_12,58.8463938636
15 | feat_13,26.846384736799997
16 | feat_14,17.89738
17 | feat_15,20.8532046377
18 | feat_16,109.927572857
19 | feat_17,45.3217483929
20 | feat_18,30.324107931
21 | feat_19,19.0750985714
22 | feat_20,15.9053894444
23 | feat_21,43.1775085
24 | feat_22,16.1404971724
25 | feat_23,30.933264415700002
26 | feat_24,25.120187
27 | feat_25,19.1116620577
28 | feat_26,23.112241081100002
29 | feat_27,175.000243176
30 | feat_28,39.8828694737
31 | feat_29,54.21781872729999
32 | feat_30,23.4141987356
33 | feat_31,30.3297
34 | feat_32,39.556396
35 | feat_33,18.1330640678
36 | feat_34,44.2011352174
37 | feat_35,20.492
38 | feat_36,14.839914088099999
39 | feat_37,24.425098869
40 | feat_38,23.087398800000006
41 | feat_39,15.864219090899999
42 | feat_40,36.5840980508
43 | feat_41,52.3627042029
44 | feat_42,17.157752380999998
45 | feat_43,159.114082
46 | feat_44,30.9611028058
47 | feat_45,46.2027222222
48 | feat_46,37.8220857851
49 | feat_47,45.8025334091
50 | feat_48,50.6727130435
51 | feat_49,55.133470512799995
52 | feat_50,33.0315573684
53 | feat_51,23.7468116667
54 | feat_52,11.451292186
55 | feat_53,31.55620125
56 | feat_54,16.6248694444
57 | feat_55,29.06870825
58 | feat_56,24.9980913953
59 | feat_57,15.538474
60 | feat_58,16.471437055
61 | feat_59,20.3773357143
62 | feat_60,23.528037551
63 | feat_61,65.3004364286
64 | feat_62,45.0990146667
65 | feat_63,17.89653
66 | feat_64,16.2706778333
67 | feat_65,18.4892417005
68 | feat_66,24.78189
69 | feat_67,162.19046175399998
70 | feat_68,34.09889375
71 | feat_69,24.2538968056
72 | feat_70,430.755799091
73 | feat_71,36.1628342105
74 | feat_72,96.30485301739999
75 | feat_73,26.820805714299997
76 | feat_74,13.4593211864
77 | feat_75,65.3470578947
78 | feat_76,26.874275
79 | feat_77,60.66391
80 | feat_78,18.2826344444
81 | feat_79,139.530246267
82 |
--------------------------------------------------------------------------------
/feat_drift/data/feature_imp2.csv:
--------------------------------------------------------------------------------
1 | feat_name,imp
2 | feat_0,10.8216213571
3 | feat_1,23.4163501481
4 | feat_2,11.565690294100001
5 | feat_3,48.9171478125
6 | feat_4,113.584958947
7 | feat_5,11.0962521419
8 | feat_6,17.8663032911
9 | feat_7,26.624295454499997
10 | feat_8,8.02320777095
11 | feat_9,11.2232443967
12 | feat_10,24.5941423598
13 | feat_11,28.699749
14 | feat_12,25.775422679000002
15 | feat_13,16.1959695
16 | feat_14,11.397097619
17 | feat_15,10.6688073224
18 | feat_16,146.73557250000005
19 | feat_17,23.4428398347
20 | feat_18,11.718083461500001
21 | feat_19,13.567881818699998
22 | feat_20,8.789489411760002
23 | feat_21,30.6499793103
24 | feat_22,10.708858439
25 | feat_23,16.2798533926
26 | feat_24,8.307419928060003
27 | feat_25,11.1156423881
28 | feat_26,10.9661604717
29 | feat_27,28.9956640463
30 | feat_28,11.607423439200002
31 | feat_29,35.8433884211
32 | feat_30,11.9170976667
33 | feat_31,7.815233333330001
34 | feat_32,22.894614
35 | feat_33,12.3991074603
36 | feat_34,16.485753913
37 | feat_35,11.297780625
38 | feat_36,8.3932906449
39 | feat_37,15.230949728299999
40 | feat_38,8.202025200000001
41 | feat_39,10.4006885714
42 | feat_40,12.918274803800001
43 | feat_41,41.7556285075
44 | feat_42,8.27308114035
45 | feat_43,90.61743875
46 | feat_44,16.160808051900002
47 | feat_45,39.1513645833
48 | feat_46,28.195249158200003
49 | feat_47,11.925391071400002
50 | feat_48,39.5888431379
51 | feat_49,25.952234090900003
52 | feat_50,11.6053776503
53 | feat_51,10.2122409615
54 | feat_52,8.86319782
55 | feat_53,26.429423928600002
56 | feat_54,8.92956916667
57 | feat_55,13.319904821400002
58 | feat_56,22.599339901
59 | feat_57,19.431493224
60 | feat_58,9.70069634855
61 | feat_59,19.0049746667
62 | feat_60,35.4501718966
63 | feat_61,72.6334658974
64 | feat_62,38.310038
65 | feat_63,7.72638458065
66 | feat_64,9.84875153025
67 | feat_65,15.948526475
68 | feat_66,18.7450463636
69 | feat_67,151.57445307700002
70 | feat_68,10.638199758999999
71 | feat_69,21.6299969863
72 | feat_70,180.07161175
73 | feat_71,39.1086955
74 | feat_72,211.432106861
75 | feat_73,26.0100392308
76 | feat_74,9.14663557073
77 | feat_75,19.9266190476
78 | feat_76,10.540188666699999
79 | feat_77,15.1881593548
80 | feat_78,12.2875825
81 | feat_79,128.742952879
82 |
--------------------------------------------------------------------------------
/feat_drift/feat_drift_test.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import pandas as pd"
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
16 | "## read fake feature importance \n",
17 | "The expected inputs are two feature importance dataframes to compare. \n",
18 | "It is assumed that both dataframes contain exactly the same set of features."
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 2,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "feat_imp1 = pd.read_csv('data/feature_imp1.csv')\n",
28 | "feat_imp2 = pd.read_csv('data/feature_imp2.csv')"
29 | ]
30 | },
31 | {
32 | "cell_type": "code",
33 | "execution_count": 3,
34 | "metadata": {},
35 | "outputs": [
36 | {
37 | "data": {
91 | "text/plain": [
92 | " feat_name imp\n",
93 | "0 feat_0 13.343694\n",
94 | "1 feat_1 15.800974\n",
95 | "2 feat_2 21.084382\n",
96 | "3 feat_3 48.516187\n",
97 | "4 feat_4 113.182840"
98 | ]
99 | },
100 | "execution_count": 3,
101 | "metadata": {},
102 | "output_type": "execute_result"
103 | }
104 | ],
105 | "source": [
106 | "feat_imp1.head()"
107 | ]
108 | },
109 | {
110 | "cell_type": "code",
111 | "execution_count": 4,
112 | "metadata": {},
113 | "outputs": [
114 | {
115 | "data": {
169 | "text/plain": [
170 | " feat_name imp\n",
171 | "0 feat_0 10.821621\n",
172 | "1 feat_1 23.416350\n",
173 | "2 feat_2 11.565690\n",
174 | "3 feat_3 48.917148\n",
175 | "4 feat_4 113.584959"
176 | ]
177 | },
178 | "execution_count": 4,
179 | "metadata": {},
180 | "output_type": "execute_result"
181 | }
182 | ],
183 | "source": [
184 | "feat_imp2.head()"
185 | ]
186 | },
187 | {
188 | "cell_type": "markdown",
189 | "metadata": {},
190 | "source": [
191 | "## test feature drift graph\n",
192 | "Here we assume that the two feature importance results come from models trained on datasets from different snapshot months. \n",
193 | "- feat_imp1: feature importance from dataset 201802\n",
194 | "- feat_imp2: feature importance from dataset 201803 "
195 | ]
196 | },
197 | {
198 | "cell_type": "code",
199 | "execution_count": 5,
200 | "metadata": {},
201 | "outputs": [],
202 | "source": [
203 | "import sys\n",
204 | "sys.path.insert(0, 'src/')\n",
205 | "import feature_drift_draw"
206 | ]
207 | },
208 | {
209 | "cell_type": "code",
210 | "execution_count": 6,
211 | "metadata": {},
212 | "outputs": [],
213 | "source": [
214 | "feature_drift_draw.feature_drift_graph(feat_imp1=feat_imp1, feat_imp2=feat_imp2, feature_name='feat_name', imp_name='imp',\n",
215 | " ds_name1='201802', ds_name2='201803', graph_name='201802_201803',\n",
216 | " top_n=20, max_bar_width=300, bar_height=30, middle_gap=300, fontsize=12, color_dict=None)"
217 | ]
218 | }
219 | ],
220 | "metadata": {
221 | "kernelspec": {
222 | "display_name": "Python 2",
223 | "language": "python",
224 | "name": "python2"
225 | },
226 | "language_info": {
227 | "codemirror_mode": {
228 | "name": "ipython",
229 | "version": 2
230 | },
231 | "file_extension": ".py",
232 | "mimetype": "text/x-python",
233 | "name": "python",
234 | "nbconvert_exporter": "python",
235 | "pygments_lexer": "ipython2",
236 | "version": "2.7.14"
237 | }
238 | },
239 | "nbformat": 4,
240 | "nbformat_minor": 2
241 | }
242 |
--------------------------------------------------------------------------------
/feat_drift/feature_drift_output/feature_drift_201802_201803.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
13 |
14 |
15 |
16 |
17 | Feature Drift
18 | Visualize how important features drift through two datasets.
19 |
20 |
21 |
22 |
23 |
279 |
280 |
--------------------------------------------------------------------------------
/feat_drift/images/feat_drift.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sosuneko/Nuance/e8b486ae7459850a00f1e8bbd756e7a57aed4417/feat_drift/images/feat_drift.PNG
--------------------------------------------------------------------------------
/feat_drift/images/feat_drift.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sosuneko/Nuance/e8b486ae7459850a00f1e8bbd756e7a57aed4417/feat_drift/images/feat_drift.gif
--------------------------------------------------------------------------------
/feat_drift/src/feature_drift_draw.py:
--------------------------------------------------------------------------------
1 |
2 | import pandas as pd
3 | import jinja2
4 | import os
5 |
6 |
7 | def _process_imp(imp_df, imp_name):
8 |     """
9 |     Preprocessing on the input feature importance dataframe
10 |
11 |     :param imp_df: feature importance pandas dataframe
12 |     :param imp_name: column name of the importance value
13 |     :return:
14 |         dataframe sorted by importance, with relative_imp and feat_rank columns
15 |     """
16 |
17 |     imp_df = imp_df.sort_values(by=imp_name, ascending=False).reset_index(drop=True)
18 |     # importance relative to the most important feature, rounded to 3 decimals
19 |     imp_df['relative_imp'] = (imp_df[imp_name] * 1.0 / imp_df[imp_name].max()).round(3)
20 |     imp_df['feat_rank'] = imp_df.index.values + 1
21 |     return imp_df
22 |
23 |
24 | def _rank2color(x, color_dict):
25 |     """
26 |     Map change of rank to color
27 |
28 |     :param x: row of the merged dataframe
29 |     :param color_dict: color dictionary
30 |     """
31 |
32 |     # handle missing ranks first: a NaN rank never compares as smaller or larger
33 |     if pd.isnull(x['feat_rank_y']):
34 |         return color_dict['disappear']
35 |     if pd.isnull(x['feat_rank_x']):
36 |         return color_dict['appear']
37 |     if x['feat_rank_x'] < x['feat_rank_y']:
38 |         return color_dict['drop']
39 |     return color_dict['up_or_stable']
40 |
41 |
42 | def _get_mark(x):
43 |     """
44 |     '1' if the feature appears in both feature importance dataframes
45 |     '0' if the feature is missing from either dataframe
46 |     """
47 |     if pd.isnull(x['feat_rank_y']) or pd.isnull(x['feat_rank_x']):
48 |         return "0"
49 |     else:
50 |         return "1"
51 |
52 |
53 | def _merge_feat_imp(imp_df1, imp_df2, feature_name, top_n, color_dict):
54 |     """
55 |     Merge and compare two feature importance dataframes
56 |
57 |     :param imp_df1: feature importance dataframe #1
58 |     :param imp_df2: feature importance dataframe #2
59 |     :param feature_name: column name of features
60 |     :param top_n: show top_n features
61 |     :param color_dict: color dictionary
62 |     :return:
63 |         The merged dataframe
64 |     """
65 |
66 |     imp_df1['pos'] = 'left'
67 |     imp_df2['pos'] = 'right'
68 |     if top_n:
69 |         both_imp = imp_df1.head(top_n).merge(imp_df2.head(top_n), on=feature_name, how='outer')
70 |     else:
71 |         both_imp = imp_df1.merge(imp_df2, on=feature_name, how='outer')
72 |
73 |     both_imp['bar_color'] = both_imp.apply(lambda x: _rank2color(x, color_dict), axis=1)
74 |     both_imp['bar_mark'] = both_imp.apply(_get_mark, axis=1)
75 |
76 |     return both_imp
77 |
78 |
79 | def feature_drift_graph(feat_imp1, feat_imp2, feature_name, imp_name, ds_name1, ds_name2, graph_name=None,
80 |                         top_n=None, max_bar_width=300, bar_height=30, middle_gap=300, fontsize=12, color_dict=None):
81 |     """
82 |     Draw feature drift graph
83 |
84 |     :param feat_imp1: feature importance dataframe #1
85 |     :param feat_imp2: feature importance dataframe #2
86 |     :param feature_name: column name of features
87 |     :param imp_name: column name of importance value
88 |     :param ds_name1: name of dataset #1
89 |     :param ds_name2: name of dataset #2
90 |     :param graph_name: suffix of the output file name
91 |     :param top_n: show top_n features
92 |     :param max_bar_width: maximum bar width
93 |     :param bar_height: bar height
94 |     :param middle_gap: gap between bars
95 |     :param fontsize: font size
96 |     :param color_dict: color dictionary
97 |     """
98 |
99 |     feat_imp1 = _process_imp(feat_imp1, imp_name)
100 |     feat_imp2 = _process_imp(feat_imp2, imp_name)
101 |
102 |     if color_dict is None:
103 |         color_dict = {
104 |             'drop': '#f17182',
105 |             'up_or_stable': '#abdda4',
106 |             'disappear': '#bababa',
107 |             'appear': '#9ac6df'
108 |         }
109 |
110 |     both_imp = _merge_feat_imp(feat_imp1, feat_imp2, feature_name, top_n, color_dict)
111 |
112 |     # select by the caller's feature column, then rename to the fixed keys produced before
113 |     bar_columns = ['feat_name', 'relative_imp', 'pos', 'bar_color', 'bar_mark']
114 |     bar_left_data = both_imp[[feature_name, 'relative_imp_x', 'pos_x', 'bar_color', 'bar_mark']
115 |                              ].dropna().sort_values('relative_imp_x', ascending=False)
116 |     bar_left_data.columns = bar_columns
117 |
118 |     bar_right_data = both_imp[[feature_name, 'relative_imp_y', 'pos_y', 'bar_color', 'bar_mark']
119 |                               ].dropna().sort_values('relative_imp_y', ascending=False)
120 |     bar_right_data.columns = bar_columns
121 |
122 |     line_data = both_imp[[feature_name, 'bar_color', 'feat_rank_x', 'feat_rank_y']].dropna()[[feature_name, 'bar_color']]
123 |     line_data.columns = ['feat_name', 'bar_color']
124 |
125 |     legend_data = [
126 |         {'name': 'Drop', 'color': color_dict['drop']},
127 |         {'name': 'Up & Stable', 'color': color_dict['up_or_stable']},
128 |         {'name': 'Disappear', 'color': color_dict['disappear']},
129 |         {'name': 'Appear', 'color': color_dict['appear']}
130 |     ]
131 |
132 |     # render the output
133 |     with open('src/feature_drift_template.html') as template_file:
134 |         template = jinja2.Template(template_file.read())
135 |
136 |     # create the output folder if it does not exist
137 |     if not os.path.exists('feature_drift_output'):
138 |         os.mkdir('feature_drift_output')
139 |
140 |     # generate output html
141 |     if graph_name is None:
142 |         output_path = 'feature_drift_output/feature_drift_output.html'
143 |     else:
144 |         output_path = 'feature_drift_output/feature_drift_%s.html' % graph_name
145 |
146 |     # text mode: template.render() returns a string, which a 'wb' handle rejects on Python 3
147 |     with open(output_path, 'w') as fh:
148 |         fh.write(template.render({'bar_left_data': bar_left_data.to_dict('records'),
149 |                                   'bar_right_data': bar_right_data.to_dict('records'),
150 |                                   'line_data': line_data.to_dict('records'),
151 |                                   'legend_data': legend_data,
152 |                                   'max_bar_width': max_bar_width, 'bar_height': bar_height,
153 |                                   'middle_gap': middle_gap, 'fontsize': fontsize,
154 |                                   'ds_name1': ds_name1, 'ds_name2': ds_name2}))
155 |
--------------------------------------------------------------------------------
/feat_drift/src/feature_drift_template.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
13 |
14 |
15 |
16 |
17 | Feature Drift
18 | Visualize how important features drift through two datasets.
19 |
20 |
21 |
22 |
23 |
279 |
280 |
--------------------------------------------------------------------------------
/sankey/README.md:
--------------------------------------------------------------------------------
1 | ## Visualize a sankey flow
2 |
3 |
12 |
13 | ## How to use?
14 | 1. Download the folder [**sankey**](https://github.com/SauceCat/Nuance/tree/master/sankey)
15 | 2. The folder structure:
16 | ```
17 | sankey
18 | - src: folder for all the visualization code
19 | - data: folder for the test data
20 | - sankey_flow_output: folder for outputs (re-created if it has been deleted)
21 | - sankey_flow_test.ipynb: instructions and examples in jupyter notebook
22 | - ...
23 | ```
24 | 3. Install `jinja2`
25 | ```
26 | pip install jinja2
27 | ```
28 | 4. Use the sankey_flow visualization (it depends on D3.js, so an internet connection is required):
29 | ```python
30 | import sys
31 | sys.path.insert(0, 'src/')
32 | import generate_sankey_flow
33 |
34 | generate_sankey_flow.draw_sankey_flow(df=raw[use_cols], node_color_type='col', link_color_type='source',
35 |                                       width=1600, height=900, graph_name='Titanic',
36 |                                       node_color_mapping=None, color_map=None, link_color=None)
37 | ```
38 | 5. An HTML file is generated in [sankey_flow_output](https://github.com/SauceCat/Nuance/tree/master/sankey/sankey_flow_output). Open it with any browser you like (I prefer Chrome).
39 |
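
Since each column of `df` represents a state, a sankey flow boils down to counting how many rows move from a value in one column to a value in the next. The sketch below illustrates that idea in plain Python. It is not the code in `generate_sankey_flow.py`; `count_links` is a hypothetical name and the rows are toy data made up for illustration.

```python
from collections import Counter


def count_links(rows, columns):
    """Count the transitions between each pair of adjacent columns."""
    links = Counter()
    for row in rows:
        for src_col, dst_col in zip(columns, columns[1:]):
            # prefix the value with its column so equal values in different columns stay apart
            source = '%s: %s' % (src_col, row[src_col])
            target = '%s: %s' % (dst_col, row[dst_col])
            links[(source, target)] += 1
    return links


# toy rows, made up for illustration
rows = [
    {'Sex': 'male', 'Pclass': 3, 'Survived': 0},
    {'Sex': 'male', 'Pclass': 3, 'Survived': 0},
    {'Sex': 'female', 'Pclass': 1, 'Survived': 1},
    {'Sex': 'female', 'Pclass': 3, 'Survived': 1},
]

links = count_links(rows, ['Sex', 'Pclass', 'Survived'])
for (source, target), value in sorted(links.items()):
    print(source, '->', target, value)
```

Each `(source, target)` pair corresponds to one link in the diagram, and its count to the link's thickness.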
40 | ## Parameters
41 | ```python
42 | def draw_sankey_flow(df, node_color_type, link_color_type, width, height,
43 |                      graph_name=None, node_color_mapping=None, color_map=None, link_color=None):
44 |     '''
45 |     :param df:
46 |         pandas DataFrame, each column represents a state
47 |     :param node_color_type:
48 |         node coloring strategy, can be one of ['col', 'val', 'col_val', 'cus']
49 |         - 'col': each column has a different color
50 |         - 'val': each unique value has a different color (unique values across all columns)
51 |         - 'col_val': each unique value in each column has a different color
52 |         - 'cus': the user provides a custom node color mapping
53 |     :param link_color_type:
54 |         link coloring strategy, default='source'
55 |         Can be one of ['source', 'target', 'both', 'same']
56 |         - 'source': same color as the source node
57 |         - 'target': same color as the target node
58 |         - 'both': color from both target and source
59 |         - 'same': all links have the same color
60 |     :param width: width of the graph
61 |     :param height: height of the graph
62 |     :param graph_name: name of the graph
63 |     :param node_color_mapping:
64 |         if node_color_type == 'cus', node_color_mapping should be provided
65 |         example:
66 |             node_color_mapping = {
67 |                 'type': 'col',
68 |                 'mapping': {
69 |                     column1: color1, column2: color2, ...
70 |                 }
71 |             }
72 |     :param color_map: matplotlib color map
73 |     :param link_color:
74 |         if link_color_type == 'same', link color should be provided
75 |     '''
76 | ```
77 |
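
For `node_color_type='cus'`, the mapping has to follow the structure shown in the docstring above. Here is a small sketch that builds such a mapping with one color per column. `make_column_color_mapping` is a hypothetical helper, and the use of hex color strings is an assumption, since the accepted color formats are not documented here.

```python
def make_column_color_mapping(columns, palette):
    """Hypothetical helper: assign one palette color per column, cycling if needed."""
    if not palette:
        raise ValueError('palette must contain at least one color')
    mapping = {col: palette[i % len(palette)] for i, col in enumerate(columns)}
    # same structure as the example in the docstring
    return {'type': 'col', 'mapping': mapping}


# column names and colors are made up for illustration
node_color_mapping = make_column_color_mapping(
    ['Sex', 'Pclass', 'Survived'], ['#1f77b4', '#ff7f0e'])
print(node_color_mapping)
```

The result would then be passed as `node_color_mapping=node_color_mapping` together with `node_color_type='cus'`.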
--------------------------------------------------------------------------------
/sankey/images/sankey_flow_col.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sosuneko/Nuance/e8b486ae7459850a00f1e8bbd756e7a57aed4417/sankey/images/sankey_flow_col.PNG
--------------------------------------------------------------------------------
/sankey/images/sankey_flow_col_val.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sosuneko/Nuance/e8b486ae7459850a00f1e8bbd756e7a57aed4417/sankey/images/sankey_flow_col_val.PNG
--------------------------------------------------------------------------------
/sankey/images/sankey_flow_same.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sosuneko/Nuance/e8b486ae7459850a00f1e8bbd756e7a57aed4417/sankey/images/sankey_flow_same.PNG
--------------------------------------------------------------------------------
/sankey/images/sankey_flow_tab20.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sosuneko/Nuance/e8b486ae7459850a00f1e8bbd756e7a57aed4417/sankey/images/sankey_flow_tab20.PNG
--------------------------------------------------------------------------------
/sankey/images/sankey_flow_val.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sosuneko/Nuance/e8b486ae7459850a00f1e8bbd756e7a57aed4417/sankey/images/sankey_flow_val.PNG
--------------------------------------------------------------------------------
/sankey/sankey_flow_output/sankey_flow_Titanic.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
27 |
28 |
29 |
30 |
33 |
34 |
35 |
36 |
331 |
493 |
494 |
495 |