├── .gitignore
└── README.md

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Training-Data Analysis Papers

Papers on instance-based interpretability (influence functions, prototypes, etc.), measures computed from training dynamics, and memorization/forgetting.

## 2021

Basu et al. [Influence Functions in Deep Learning Are Fragile](https://openreview.net/forum?id=xHKVVHGDOEk). In ICLR 2021.

Chen et al. [HyDRA: Hypergradient Data Relevance Analysis for Interpreting Deep Neural Networks](https://ojs.aaai.org/index.php/AAAI/article/view/16871). In AAAI 2021.

D'souza et al. [A Tale Of Two Long Tails](https://arxiv.org/abs/2107.13098). In UDL-ICML Workshop 2021.

Hanawa et al. [Evaluation of Similarity-based Explanations](https://arxiv.org/abs/2006.04528). In ICLR 2021.

Harutyunyan et al. [Estimating informativeness of samples with Smooth Unique Information](https://openreview.net/forum?id=kEnBH98BGs5). In ICLR 2021.

Jiang et al. [Characterizing Structural Regularities of Labeled Data in Overparameterized Models](https://proceedings.mlr.press/v139/jiang21k.html). In ICML 2021.

Kong and Chaudhuri. [Understanding Instance-based Interpretability of Variational Auto-Encoders](https://papers.nips.cc/paper/2021/hash/13d7dc096493e1f77fb4ccf3eaf79df1-Abstract.html). In NeurIPS 2021.

Paul et al. [Deep Learning on a Data Diet: Finding Important Examples Early in Training](https://proceedings.neurips.cc/paper/2021/hash/ac56f8fe9eea3e4a365f29f0f1957c55-Abstract.html). In NeurIPS 2021.

Sui et al. [Representer Point Selection via Local Jacobian Expansion for Post-hoc Classifier Explanation of Deep Neural Networks and Ensemble Models](https://proceedings.neurips.cc/paper/2021/hash/c460dc0f18fc309ac07306a4a55d2fd6-Abstract.html). In NeurIPS 2021.

Terashita et al. [Influence Estimation for Generative Adversarial Networks](https://openreview.net/forum?id=opHLcXxYTC_). In ICLR 2021.

Zhang et al. [On Sample Based Explanation Methods for NLP: Faithfulness, Efficiency and Semantic Evaluation](https://aclanthology.org/2021.acl-long.419/). In ACL 2021.

## 2020

Agarwal et al. [Estimating Example Difficulty Using Variance of Gradients](https://arxiv.org/abs/2008.11600). In WHI-ICML Workshop 2020.

Barshan et al. [RelatIF: Identifying Explanatory Training Samples via Relative Influence](http://proceedings.mlr.press/v108/barshan20a.html). In AISTATS 2020.

Basu et al. [On Second-Order Group Influence Functions for Black-Box Predictions](http://proceedings.mlr.press/v119/basu20b.html). In ICML 2020.

Brophy and Lowd. [TREX: Tree-Ensemble Representer-Point Explanations](https://arxiv.org/abs/2009.05530). In XXAI-ICML Workshop 2020.

Chen et al. [Multi-Stage Influence Function](https://proceedings.neurips.cc/paper/2020/hash/95e62984b87e90645a5cf77037395959-Abstract.html). In NeurIPS 2020.

Feldman and Zhang. [What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation](https://openreview.net/forum?id=mfoH69cSCz8). In NeurIPS 2020.

Vitaly Feldman. [Does learning require memorization? A short tale about a long tail](https://dl.acm.org/doi/abs/10.1145/3357713.3384290). In STOC 2020.

Jacovi and Goldberg. [Towards Faithfully Interpretable NLP Systems: How Should We Define and Evaluate Faithfulness?](https://aclanthology.org/2020.acl-main.386/). In ACL 2020.

Pleiss et al. [Identifying Mislabeled Data using the Area Under the Margin Ranking](https://papers.nips.cc/paper/2020/hash/c6102b3727b2a7d8b1bb6981147081ef-Abstract.html). In NeurIPS 2020.

Pruthi et al. [Estimating Training Data Influence by Tracing Gradient Descent](https://proceedings.neurips.cc/paper/2020/hash/e6385d39ec9394f2f3a354d9d2b88eec-Abstract.html). In NeurIPS 2020.

Swayamdipta et al. [Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics](https://openreview.net/forum?id=gW72G4zSdR1). In EMNLP 2020.

Yoon et al. [Data Valuation using Reinforcement Learning](http://proceedings.mlr.press/v119/yoon20a.html). In ICML 2020.

## 2019

Brunet et al. [Understanding the Origins of Bias in Word Embeddings](http://proceedings.mlr.press/v97/brunet19a.html). In ICML 2019.

Charpiat et al. [Input Similarity from the Neural Network Perspective](https://proceedings.neurips.cc/paper/2019/hash/c61f571dbd2fb949d3fe5ae1608dd48b-Abstract.html). In NeurIPS 2019.

Chen et al. [This Looks Like That: Deep Learning for Interpretable Image Recognition](https://proceedings.neurips.cc/paper/2019/hash/adf7ee2dcf142b0e11888e72b43fcb75-Abstract.html). In NeurIPS 2019.

Ghorbani and Zou. [Data Shapley: Equitable Valuation of Data for Machine Learning](http://proceedings.mlr.press/v97/ghorbani19c.html). In ICML 2019.

Hara et al. [Data Cleansing for Models Trained with SGD](https://proceedings.neurips.cc/paper/2019/hash/5f14615696649541a025d3d0f8e0447f-Abstract.html). In NeurIPS 2019.

Jia et al. [Towards Efficient Data Valuation Based on the Shapley Value](http://proceedings.mlr.press/v89/jia19a.html). In AISTATS 2019.

Khanna et al. [Interpreting Black Box Predictions using Fisher Kernels](http://proceedings.mlr.press/v89/khanna19a.html). In AISTATS 2019.

Koh et al. [On the Accuracy of Influence Functions for Measuring Group Effects](https://papers.nips.cc/paper/2019/hash/a78482ce76496fcf49085f2190e675b4-Abstract.html). In NeurIPS 2019.

Toneva et al. [An Empirical Study of Example Forgetting during Deep Neural Network Learning](https://arxiv.org/abs/1812.05159). In ICLR 2019.

## 2018

Sharchilev et al. [Finding Influential Training Samples for Gradient Boosted Decision Trees](http://proceedings.mlr.press/v80/sharchilev18a.html). In ICML 2018.

Yeh et al. [Representer Point Selection for Explaining Deep Neural Networks](https://proceedings.neurips.cc/paper/2018/hash/8a7129b8f3edd95b7d969dfc2c8e9d9d-Abstract.html). In NeurIPS 2018.

## 2017

Koh and Liang. [Understanding Black-box Predictions via Influence Functions](http://proceedings.mlr.press/v70/koh17a). In ICML 2017.

## Before 2017

Cook and Weisberg. [Characterizations of an Empirical Influence Function for Detecting Influential Cases in Regression](https://www.tandfonline.com/doi/abs/10.1080/00401706.1980.10486199). In Technometrics 1980.

Kim et al. [Examples are not enough, learn to criticize! Criticism for Interpretability](https://proceedings.neurips.cc/paper/2016/hash/5680522b8e2bb01943234bce7bf84534-Abstract.html). In NeurIPS 2016.
--------------------------------------------------------------------------------
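Several of the papers listed above score training examples by their influence on a test prediction; Pruthi et al. (NeurIPS 2020), for instance, trace gradient descent by summing, over saved checkpoints, the learning-rate-weighted dot product of the training-point and test-point loss gradients. A minimal TracIn-style sketch on a toy two-parameter logistic model — the checkpoints, points, and learning rates below are made up for illustration, not taken from any paper's code:

```python
import math

def grad(w, x, y):
    """Gradient of the logistic loss -log(sigmoid(y * <w, x>)) w.r.t. w."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    s = 1.0 / (1.0 + math.exp(-margin))
    return [-(1.0 - s) * y * xi for xi in x]

def tracin_score(checkpoints, lrs, z_train, z_test):
    """TracIn-style influence: sum over checkpoints of lr * <g_train, g_test>."""
    (x_tr, y_tr), (x_te, y_te) = z_train, z_test
    total = 0.0
    for w, lr in zip(checkpoints, lrs):
        g_tr = grad(w, x_tr, y_tr)
        g_te = grad(w, x_te, y_te)
        total += lr * sum(a * b for a, b in zip(g_tr, g_te))
    return total

# Two hypothetical checkpoints of a 2-parameter model, saved during training.
checkpoints = [[0.0, 0.0], [0.1, -0.2]]
lrs = [0.1, 0.1]

test_point = ([1.0, 0.0], 1)
similar    = ([1.0, 0.0], 1)   # its gradient aligns with the test point's
opposed    = ([-1.0, 0.0], 1)  # its gradient points the other way

score_similar = tracin_score(checkpoints, lrs, similar, test_point)
score_opposed = tracin_score(checkpoints, lrs, opposed, test_point)
# score_similar > 0: a "proponent"; score_opposed < 0: an "opponent"
```

In this toy setup, a training point whose loss gradient aligns with the test point's receives a positive score (it reduced the test loss during training), while an opposed point receives a negative one — the proponent/opponent distinction used in the influence-estimation literature above.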