├── .gitignore
└── README.md

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Training-Data Analysis Papers

Papers on instance-based interpretability (influence functions, prototypes, etc.), measures computed from training dynamics, and memorization/forgetting.

## 2021

Basu et al. [Influence Functions in Deep Learning Are Fragile](https://openreview.net/forum?id=xHKVVHGDOEk). In ICLR 2021.

Chen et al. [HyDRA: Hypergradient Data Relevance Analysis for Interpreting Deep Neural Networks](https://ojs.aaai.org/index.php/AAAI/article/view/16871). In AAAI 2021.

D'souza et al. [A Tale Of Two Long Tails](https://arxiv.org/abs/2107.13098). In UDL-ICML Workshop 2021.

Hanawa et al. [Evaluation of Similarity-based Explanations](https://arxiv.org/abs/2006.04528). In ICLR 2021.

Harutyunyan et al. [Estimating informativeness of samples with Smooth Unique Information](https://openreview.net/forum?id=kEnBH98BGs5). In ICLR 2021.

Jiang et al. [Characterizing Structural Regularities of Labeled Data in Overparameterized Models](https://proceedings.mlr.press/v139/jiang21k.html). In ICML 2021.

Kong and Chaudhuri. [Understanding Instance-based Interpretability of Variational Auto-Encoders](https://papers.nips.cc/paper/2021/hash/13d7dc096493e1f77fb4ccf3eaf79df1-Abstract.html). In NeurIPS 2021.

Paul et al. [Deep Learning on a Data Diet: Finding Important Examples Early in Training](https://proceedings.neurips.cc/paper/2021/hash/ac56f8fe9eea3e4a365f29f0f1957c55-Abstract.html). In NeurIPS 2021.

Sui et al. [Representer Point Selection via Local Jacobian Expansion for Post-hoc Classifier Explanation of Deep Neural Networks and Ensemble Models](https://proceedings.neurips.cc/paper/2021/hash/c460dc0f18fc309ac07306a4a55d2fd6-Abstract.html). In NeurIPS 2021.

Terashita et al. [Influence Estimation for Generative Adversarial Networks](https://openreview.net/forum?id=opHLcXxYTC_). In ICLR 2021.

Zhang et al. [On Sample Based Explanation Methods for NLP: Faithfulness, Efficiency and Semantic Evaluation](https://aclanthology.org/2021.acl-long.419/). In ACL 2021.

## 2020

Agarwal et al. [Estimating Example Difficulty Using Variance of Gradients](https://arxiv.org/abs/2008.11600). In WHI-ICML Workshop 2020.

Barshan et al. [RelatIF: Identifying Explanatory Training Samples via Relative Influence](http://proceedings.mlr.press/v108/barshan20a.html). In AISTATS 2020.

Basu et al. [On Second-Order Group Influence Functions for Black-Box Predictions](http://proceedings.mlr.press/v119/basu20b.html). In ICML 2020.

Brophy and Lowd. [TREX: Tree-Ensemble Representer-Point Explanations](https://arxiv.org/abs/2009.05530). In XXAI-ICML Workshop 2020.

Chen et al. [Multi-Stage Influence Function](https://proceedings.neurips.cc/paper/2020/hash/95e62984b87e90645a5cf77037395959-Abstract.html). In NeurIPS 2020.

Feldman and Zhang. [What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation](https://openreview.net/forum?id=mfoH69cSCz8). In NeurIPS 2020.

Vitaly Feldman. [Does learning require memorization? A short tale about a long tail](https://dl.acm.org/doi/abs/10.1145/3357713.3384290). In STOC 2020.

Jacovi and Goldberg. [Towards Faithfully Interpretable NLP Systems: How Should We Define and Evaluate Faithfulness?](https://aclanthology.org/2020.acl-main.386/). In ACL 2020.

Pleiss et al. [Identifying Mislabeled Data using the Area Under the Margin Ranking](https://papers.nips.cc/paper/2020/hash/c6102b3727b2a7d8b1bb6981147081ef-Abstract.html). In NeurIPS 2020.

Pruthi et al. [Estimating Training Data Influence by Tracing Gradient Descent](https://proceedings.neurips.cc/paper/2020/hash/e6385d39ec9394f2f3a354d9d2b88eec-Abstract.html). In NeurIPS 2020.

Swayamdipta et al. [Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics](https://openreview.net/forum?id=gW72G4zSdR1). In EMNLP 2020.

Yoon et al. [Data Valuation using Reinforcement Learning](http://proceedings.mlr.press/v119/yoon20a.html). In ICML 2020.

## 2019

Brunet et al. [Understanding the Origins of Bias in Word Embeddings](http://proceedings.mlr.press/v97/brunet19a.html). In ICML 2019.

Charpiat et al. [Input Similarity from the Neural Network Perspective](https://proceedings.neurips.cc/paper/2019/hash/c61f571dbd2fb949d3fe5ae1608dd48b-Abstract.html). In NeurIPS 2019.

Chen et al. [This Looks Like That: Deep Learning for Interpretable Image Recognition](https://proceedings.neurips.cc/paper/2019/hash/adf7ee2dcf142b0e11888e72b43fcb75-Abstract.html). In NeurIPS 2019.

Ghorbani and Zou. [Data Shapley: Equitable Valuation of Data for Machine Learning](http://proceedings.mlr.press/v97/ghorbani19c.html). In ICML 2019.

Hara et al. [Data Cleansing for Models Trained with SGD](https://proceedings.neurips.cc/paper/2019/hash/5f14615696649541a025d3d0f8e0447f-Abstract.html). In NeurIPS 2019.

Jia et al. [Towards Efficient Data Valuation Based on the Shapley Value](http://proceedings.mlr.press/v89/jia19a.html). In AISTATS 2019.

Khanna et al. [Interpreting Black Box Predictions using Fisher Kernels](http://proceedings.mlr.press/v89/khanna19a.html). In AISTATS 2019.

Koh et al. [On the Accuracy of Influence Functions for Measuring Group Effects](https://papers.nips.cc/paper/2019/hash/a78482ce76496fcf49085f2190e675b4-Abstract.html). In NeurIPS 2019.

Toneva et al. [An Empirical Study of Example Forgetting during Deep Neural Network Learning](https://arxiv.org/abs/1812.05159). In ICLR 2019.

## 2018

Sharchilev et al. [Finding Influential Training Samples for Gradient Boosted Decision Trees](http://proceedings.mlr.press/v80/sharchilev18a.html). In ICML 2018.

Yeh et al. [Representer Point Selection for Explaining Deep Neural Networks](https://proceedings.neurips.cc/paper/2018/hash/8a7129b8f3edd95b7d969dfc2c8e9d9d-Abstract.html). In NeurIPS 2018.

## 2017

Koh and Liang. [Understanding Black-box Predictions via Influence Functions](http://proceedings.mlr.press/v70/koh17a). In ICML 2017.

## Before 2017

Cook and Weisberg. [Characterizations of an Empirical Influence Function for Detecting Influential Cases in Regression](https://www.tandfonline.com/doi/abs/10.1080/00401706.1980.10486199). In Technometrics 1980.

Kim et al. [Examples are not enough, learn to criticize! Criticism for Interpretability](https://proceedings.neurips.cc/paper/2016/hash/5680522b8e2bb01943234bce7bf84534-Abstract.html). In NeurIPS 2016.
--------------------------------------------------------------------------------
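Several of the papers listed above score training examples by their influence on a test prediction; Pruthi et al. (NeurIPS 2020), for instance, trace gradient descent by summing, over saved checkpoints, the learning-rate-weighted dot product of the training-point and test-point loss gradients. A minimal TracIn-style sketch on a toy two-parameter logistic model — the checkpoints, points, and learning rates below are made up for illustration, not taken from any paper's code:

```python
import math

def grad(w, x, y):
    """Gradient of the logistic loss -log(sigmoid(y * <w, x>)) w.r.t. w."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    s = 1.0 / (1.0 + math.exp(-margin))
    return [-(1.0 - s) * y * xi for xi in x]

def tracin_score(checkpoints, lrs, z_train, z_test):
    """TracIn-style influence: sum over checkpoints of lr * <g_train, g_test>."""
    (x_tr, y_tr), (x_te, y_te) = z_train, z_test
    total = 0.0
    for w, lr in zip(checkpoints, lrs):
        g_tr = grad(w, x_tr, y_tr)
        g_te = grad(w, x_te, y_te)
        total += lr * sum(a * b for a, b in zip(g_tr, g_te))
    return total

# Two hypothetical checkpoints of a 2-parameter model, saved during training.
checkpoints = [[0.0, 0.0], [0.1, -0.2]]
lrs = [0.1, 0.1]

test_point = ([1.0, 0.0], 1)
similar    = ([1.0, 0.0], 1)   # its gradient aligns with the test point's
opposed    = ([-1.0, 0.0], 1)  # its gradient points the other way

score_similar = tracin_score(checkpoints, lrs, similar, test_point)
score_opposed = tracin_score(checkpoints, lrs, opposed, test_point)
# score_similar > 0: a "proponent"; score_opposed < 0: an "opponent"
```

In this toy setup, a training point whose loss gradient aligns with the test point's receives a positive score (it reduced the test loss during training), while an opposed point receives a negative one — the proponent/opponent distinction used in the influence-estimation literature above.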