├── .gitignore
├── README.md
├── captumIg_shap_baselines.ipynb
├── electra_fine_tune_interpret_captum_ig.ipynb
├── fine_tune_bart_summarization_two_langs.ipynb
└── imgs
├── attr-features-1.png
├── electra-attr-negative-negative.png
├── electra-attr-negative-positive.png
├── electra-attr-positive-positive.png
├── explain-diff-ig.png
├── features-sum-12.png
└── weight-features-1.png
/.gitignore:
--------------------------------------------------------------------------------
1 | # Ignore
2 | **/logs/
3 | **/results/
4 | *.egg-info/
5 | __pycache__/
6 | .idea
7 | .ipynb_checkpoints
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## Content
2 | ---
3 |
4 | - electra_fine_tune_interpret_captum_ig ([notebook](electra_fine_tune_interpret_captum_ig.ipynb), [Colab notebook](https://colab.research.google.com/github/elsanns/xai-nlp-notebooks/blob/master/electra_fine_tune_interpret_captum_ig.ipynb))
5 | - fine_tune_bart_summarization_two_langs ([notebook](fine_tune_bart_summarization_two_langs.ipynb), [Colab notebook](https://colab.research.google.com/github/elsanns/xai-nlp-notebooks/blob/master/fine_tune_bart_summarization_two_langs.ipynb))
6 | - captumig-shap-baselines ([notebook](captumIg_shap_baselines.ipynb), [Colab notebook](https://colab.research.google.com/github/elsanns/xai-nlp-notebooks/blob/master/captumIg_shap_baselines.ipynb))
7 |
8 | ## Fine-tuning Electra and interpreting with Captum Integrated Gradients
9 | ---
10 |
11 | This notebook contains an example of [fine-tuning](https://huggingface.co/transformers/training.html) an [Electra](https://huggingface.co/transformers/model_doc/electra.html) model on the [GLUE SST-2](https://nlp.stanford.edu/sentiment/index.html) dataset. After fine-tuning, the [Integrated Gradients](https://arxiv.org/pdf/1703.01365.pdf) **interpretability** method is applied to compute token attributions for each target class.
12 | * We will instantiate a pre-trained Electra model from the [Transformers](https://huggingface.co/transformers/) library.
13 | * The data is downloaded with the [nlp](https://huggingface.co/nlp/) library. The input text is tokenized with the [ElectraTokenizerFast](https://huggingface.co/transformers/model_doc/electra.html#electratokenizerfast) tokenizer, backed by the HF [tokenizers](https://huggingface.co/transformers/main_classes/tokenizer.html) library.
14 | * **Fine-tuning** for sentiment analysis is handled by the [Trainer](https://huggingface.co/transformers/main_classes/trainer.html) class (see the sketch after this list).
15 | * After fine-tuning, the [Integrated Gradients](https://captum.ai/api/integrated_gradients.html) interpretability algorithm will assign importance scores to
16 | input tokens. We will use a **PyTorch** implementation from the [Captum](https://captum.ai/) library.
17 | - The algorithm requires providing a reference sample (a baseline), since attributions are computed from the change in the model's output as the inputs move from the baseline values to the actual sample.
18 | - The Integrated Gradients method satisfies the **completeness** property. We will look at the sum of attributions for a sample and show that it approximates (explains) the prediction's shift from the baseline value (see the attribution sketch below).
19 | * The final sections of the notebook contain a colour-coded **visualization** of the attribution results, made with the *captum.attr.visualization* module.
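
The fine-tuning part could look roughly like the sketch below. This is a minimal sketch, not the notebook's exact code: it assumes recent versions of `transformers` and `datasets` (the successor of the `nlp` library), the `google/electra-small-discriminator` checkpoint, and illustrative hyperparameters.

```python
# Minimal sketch: fine-tune Electra on GLUE SST-2 with the Trainer API.
# Checkpoint name and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    ElectraForSequenceClassification,
    ElectraTokenizerFast,
    Trainer,
    TrainingArguments,
)

model_name = "google/electra-small-discriminator"
tokenizer = ElectraTokenizerFast.from_pretrained(model_name)
model = ElectraForSequenceClassification.from_pretrained(model_name, num_labels=2)

# GLUE SST-2: sentences labelled 0 (negative) or 1 (positive).
dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], padding="max_length",
                     truncation=True, max_length=128)

encoded = dataset.map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="results",          # kept out of version control (see .gitignore)
    num_train_epochs=1,
    per_device_train_batch_size=32,
    logging_dir="logs",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
)
trainer.train()
```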
20 |
21 | The notebook is based on the [Hugging Face documentation](https://huggingface.co/), and the implementation of the Integrated Gradients attribution method is adapted from the Captum.ai tutorial
22 | [Interpreting BERT Models (Part 1)](https://captum.ai/tutorials/Bert_SQUAD_Interpret).
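
The attribution step could then be sketched as follows. This is again a minimal sketch under assumptions: it reuses `model` and `tokenizer` from the fine-tuning sketch above, expects a recent `transformers` version (model outputs with a `.logits` attribute), and follows the tutorial's pattern of applying `LayerIntegratedGradients` to the embedding layer with a `[PAD]`-token baseline.

```python
# Minimal sketch: token attributions with Layer Integrated Gradients from Captum,
# adapted from the "Interpreting BERT Models" tutorial pattern.
import torch
from captum.attr import LayerIntegratedGradients

def forward_func(input_ids, attention_mask):
    # Return the logit of the target class (here: 1, positive).
    return model(input_ids, attention_mask=attention_mask).logits[:, 1]

text = "a remarkable and touching film"            # illustrative example sentence
encoding = tokenizer(text, return_tensors="pt")
input_ids, attention_mask = encoding["input_ids"], encoding["attention_mask"]

# Baseline (reference sample): keep [CLS] and [SEP], replace the rest with [PAD].
baseline_ids = torch.full_like(input_ids, tokenizer.pad_token_id)
baseline_ids[0, 0] = tokenizer.cls_token_id
baseline_ids[0, -1] = tokenizer.sep_token_id

lig = LayerIntegratedGradients(forward_func, model.electra.embeddings)
attributions, delta = lig.attribute(
    inputs=input_ids,
    baselines=baseline_ids,
    additional_forward_args=(attention_mask,),
    return_convergence_delta=True,
)
token_scores = attributions.sum(dim=-1).squeeze(0)   # one score per input token

# Completeness check: the attributions should approximately sum up to
# F(input) - F(baseline) for the target-class logit.
print(token_scores.sum().item(),
      (forward_func(input_ids, attention_mask)
       - forward_func(baseline_ids, attention_mask)).item())
```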
23 |
24 | ### Visualization
25 | The Captum visualization library shows tokens that push the prediction towards the target class in green. Tokens driving the score towards the reference value are marked in red. As a result, words perceived as positive will appear in green if attribution is performed against class 1 (positive), but will be highlighted in red when the attribution targets class 0 (negative).
26 |
27 | Because importance scores are assigned to tokens, not words, some examples may show that attribution is highly dependent on tokenization.
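
A minimal sketch of the rendering step is shown below. It reuses `model`, `tokenizer`, `input_ids`, `attention_mask`, `token_scores`, and `delta` from the attribution sketch above; the record fields are passed positionally because their keyword names have changed between Captum versions.

```python
# Minimal sketch: colour-coded token attributions rendered in a notebook cell.
import torch
from captum.attr import visualization as viz

tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
scores = token_scores / torch.norm(token_scores)      # scale scores for display

pred_prob = torch.softmax(
    model(input_ids, attention_mask=attention_mask).logits, dim=-1)[0, 1].item()

record = viz.VisualizationDataRecord(
    scores,          # word attributions
    pred_prob,       # predicted probability
    "positive",      # predicted class
    "positive",      # true class
    "positive",      # class the attribution targets
    scores.sum(),    # total attribution
    tokens,          # raw input tokens
    delta,           # convergence delta from Integrated Gradients
)
viz.visualize_text([record])   # renders an HTML table with green/red highlights
```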
28 |
29 |
30 | ### Attributions for a correctly classified positive example
31 | ---
32 |
33 |
34 | ### Attributions for a correctly classified negative example
35 | ---
36 |
37 |
38 | ### Attributions for a negative sample misclassified as positive
39 | ---
40 |
41 |
42 |
43 |
44 | ## Fine-tuning BART for summarization in two languages
45 | ---
46 |
47 | In a world with an ever-growing amount of data, the task of automatically creating coherent and fluent summaries is gaining importance. Coming up with a shorter, concise version of a document can help derive value from large volumes of text.
48 |
49 | This notebook contains an example of fine-tuning [Bart](https://huggingface.co/transformers/model_doc/bart.html) for generating summaries of article sections from the [WikiLingua](https://huggingface.co/datasets/wiki_lingua) dataset. WikiLingua is a multilingual set of articles. We will run the same code for two Bart checkpoints, including a non-English model from the [Hugging Face Model Hub](https://huggingface.co/models). We will be using:
50 | - the **English** portion of WikiLingua with the [sshleifer/distilbart-xsum-12-3](https://huggingface.co/sshleifer/distilbart-xsum-12-3) Bart checkpoint, and
51 | - **French** articles from WikiLingua with the [moussaKam/barthez-orangesum-abstract](https://huggingface.co/moussaKam/barthez-orangesum-abstract) model.
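
The fine-tuning loop is the same for both runs; only the checkpoint and the data portion change. Below is a minimal sketch for the English run, assuming `Seq2SeqTrainer` from a recent `transformers` release; the placeholder document/summary lists stand in for the WikiLingua fields, whose exact layout is not reproduced here.

```python
# Minimal sketch: fine-tune a Bart checkpoint for summarization.
# Placeholder data and hyperparameters are illustrative assumptions.
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

checkpoint = "sshleifer/distilbart-xsum-12-3"    # swap for the French checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Placeholder (document, summary) pairs; in the notebook these come from
# WikiLingua article sections and their reference summaries.
documents = ["Cut the apples into small pieces. Simmer them with sugar and a little water."]
summaries = ["Make apple compote."]
train_data = Dataset.from_dict({"document": documents, "summary": summaries})

def preprocess(batch):
    model_inputs = tokenizer(batch["document"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

train_data = train_data.map(preprocess, batched=True,
                            remove_columns=["document", "summary"])

args = Seq2SeqTrainingArguments(
    output_dir="results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    predict_with_generate=True,
    logging_dir="logs",
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_data,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```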
52 |
53 |
54 |
55 | ## Captum Integrated Gradients and SHAP for a PyTorch MPG prediction model
56 | ---
57 |
58 | This notebook contains an example of two feature attribution methods applied to a PyTorch model predicting fuel efficiency for the [Auto MPG Data Set](http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data).
59 |
60 | We will use the following methods:
61 | - [Integrated Gradients from the Captum package](https://captum.ai/api/integrated_gradients.html)
62 | - a custom toy implementation of the [SHAP algorithm (Shapley values)](https://en.wikipedia.org/wiki/Shapley_value)
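
As a rough sketch of the Captum part (the network architecture, feature names, and the randomly generated sample below are illustrative assumptions, not the notebook's exact setup):

```python
# Minimal sketch: Integrated Gradients attributions for a small PyTorch
# regressor over tabular Auto MPG-style features.
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

feature_names = ["cylinders", "displacement", "horsepower", "weight",
                 "acceleration", "model_year", "origin"]

# A small MLP standing in for the notebook's trained MPG model.
model = nn.Sequential(
    nn.Linear(len(feature_names), 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

ig = IntegratedGradients(model)

x = torch.randn(1, len(feature_names))    # one (standardized) sample
baseline = torch.zeros_like(x)            # all-zeros baseline vector

attributions, delta = ig.attribute(x, baselines=baseline,
                                   return_convergence_delta=True)
for name, attr in zip(feature_names, attributions.squeeze(0).tolist()):
    print(f"{name:>12}: {attr:+.4f}")
```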
63 |
64 | Attribution methods are applied per sample. As a result, each feature is assigned a value reflecting its contribution to the model's output or, more precisely, to the difference between the model's output for the sample and the *expected value*.
65 |
66 | Both methods used in this notebook require setting a baseline, i.e. a vector of values that will be used, for each feature, in place of a missing value. The baseline vector serves as a set of reference values that can be thought of as *neutral* and that represent a missing value whenever a method requires one. We will calculate the *expected value* as the model's output for a selected baseline.
67 |
68 | All attributions together account for the difference between the model's prediction for a sample and the expected value of the model's output for a selected baseline.
69 |
70 |
71 | In the examples below we will consider various baselines and see how they influence the importance assigned to features.
72 | We will see that, for each sample, attributions sum up to the difference between the model's output for the sample and the *expected value* (model's output for the baseline used to compute attributions).
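
The toy SHAP computation and the completeness check could be sketched as follows (a brute-force pass over all feature subsets, so it is only feasible for a handful of features; `model`, `x`, `baseline`, and `feature_names` are reused from the previous sketch, and "removing" a feature means replacing it with its baseline value):

```python
# Minimal sketch: exact (brute-force) Shapley values with a baseline standing
# in for "missing" feature values, plus the completeness check described above.
import math
from itertools import combinations

import torch

def shapley_values(model, x, baseline):
    n = x.shape[1]
    values = torch.zeros(n)

    def f(subset):
        # Model output when only the features in `subset` keep their real
        # values; all remaining features are set to the baseline.
        masked = baseline.clone()
        if subset:
            idx = list(subset)
            masked[0, idx] = x[0, idx]
        return model(masked).item()

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in combinations(others, k):
                values[i] += weight * (f(subset + (i,)) - f(subset))
    return values

shap_attr = shapley_values(model, x, baseline)
for name, attr in zip(feature_names, shap_attr.tolist()):
    print(f"{name:>12}: {attr:+.4f}")

# Completeness (efficiency): the attributions sum to f(x) - f(baseline),
# i.e. the shift of the prediction from the expected value.
print(shap_attr.sum().item(), (model(x) - model(baseline)).item())
```

The two printed numbers in the last line should agree, which is the property illustrated in the section below.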
73 |
74 | ### Attributions explain prediction
75 | ---
76 |
77 |
78 |
79 | Attributions sum up to the difference between the model's output and the expected value (model's output for the baseline vector).
80 |
81 | ### Features and attributions
82 | ---
83 |
84 |
85 |
86 | The diagrams show how high and low values of features are distributed across the range of attributions assigned by IG and SHAP for various baselines. For some features, high values (in red) correlate with high attribution values (x-axis); for others, they gather in the lower range or show no clear correlation.
87 |
88 | ### Impact of features
89 | ---
90 |
91 |
92 |
93 | Accumulated feature importance varies more between baselines than between attribution methods. One intuitive explanation is that, since both methods use a baseline to stand in for a missing value, features that have a close to monotonic relationship to the target will more consistently be attributed a higher absolute impact when replaced by a zero.
94 |
95 |
--------------------------------------------------------------------------------
/imgs/attr-features-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/elsanns/xai-nlp-notebooks/7b348e1b93710981f5b7565fe628dab70ce2d2e6/imgs/attr-features-1.png
--------------------------------------------------------------------------------
/imgs/electra-attr-negative-negative.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/elsanns/xai-nlp-notebooks/7b348e1b93710981f5b7565fe628dab70ce2d2e6/imgs/electra-attr-negative-negative.png
--------------------------------------------------------------------------------
/imgs/electra-attr-negative-positive.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/elsanns/xai-nlp-notebooks/7b348e1b93710981f5b7565fe628dab70ce2d2e6/imgs/electra-attr-negative-positive.png
--------------------------------------------------------------------------------
/imgs/electra-attr-positive-positive.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/elsanns/xai-nlp-notebooks/7b348e1b93710981f5b7565fe628dab70ce2d2e6/imgs/electra-attr-positive-positive.png
--------------------------------------------------------------------------------
/imgs/explain-diff-ig.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/elsanns/xai-nlp-notebooks/7b348e1b93710981f5b7565fe628dab70ce2d2e6/imgs/explain-diff-ig.png
--------------------------------------------------------------------------------
/imgs/features-sum-12.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/elsanns/xai-nlp-notebooks/7b348e1b93710981f5b7565fe628dab70ce2d2e6/imgs/features-sum-12.png
--------------------------------------------------------------------------------
/imgs/weight-features-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/elsanns/xai-nlp-notebooks/7b348e1b93710981f5b7565fe628dab70ce2d2e6/imgs/weight-features-1.png
--------------------------------------------------------------------------------