# Data-Centric AI Intellectual Property Protection

This repository collects research topics related to protecting the intellectual property (IP) of AI from a data-centric perspective. Such topics include data-centric model IP protection, data authorization protection, data copyright protection, and any other data-level technologies that protect the IP of AI. More content is coming, and in the end, we care about your uniqueness!

## Data-Centric Model Protection

Verify your ownership of a model through specific data, and authorize the usage of your model on specific data only (a minimal sketch of the shared idea follows this section's lists).

### Image Data

- Non-Transferable Learning: A New Approach for Model Ownership Verification and Applicability Authorization
  - [[paper]](https://arxiv.org/abs/2106.06916) [[code]](https://github.com/conditionWang/NTL)
  - ICLR 2022 Oral
- Model Barrier: A Compact Un-Transferable Isolation Domain for Model Intellectual Property Protection
  - [[paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_Model_Barrier_A_Compact_Un-Transferable_Isolation_Domain_for_Model_Intellectual_CVPR_2023_paper.pdf) [[code]](https://github.com/LyWang12/CUTI-Domain)
  - CVPR 2023
- Domain Specified Optimization for Deployment Authorization
  - [[paper]](https://openaccess.thecvf.com/content/ICCV2023/html/Wang_Domain_Specified_Optimization_for_Deployment_Authorization_ICCV_2023_paper.html)
  - ICCV 2023

### Other Data

- Unsupervised Non-transferable Text Classification
  - [[paper]](https://arxiv.org/abs/2210.12651) [[code]](https://github.com/ChaosCodes/UNTL)
  - EMNLP 2022
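The papers above share a common recipe: train the model to perform well only on an authorized domain and poorly everywhere else, so ownership and usage rights can be verified through the accuracy gap. Below is a minimal PyTorch-style sketch of that idea; the loss form, the `alpha` weight, and the function name are illustrative assumptions, not the exact NTL or CUTI-Domain objectives.

```python
# Sketch of a non-transferability objective (assumed form): keep accuracy
# on the authorized domain while pushing predictions on the unauthorized
# domain toward uninformative, high-entropy outputs.
import torch
import torch.nn.functional as F

def non_transferable_loss(model, x_auth, y_auth, x_unauth, alpha=0.1):
    # Standard supervised loss on the authorized domain.
    loss_auth = F.cross_entropy(model(x_auth), y_auth)

    # Entropy of predictions on the unauthorized domain; maximizing it
    # degrades usable accuracy there without letting the loss diverge.
    probs = F.softmax(model(x_unauth), dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()

    return loss_auth - alpha * entropy
```

An owner can then verify a suspect model by comparing its accuracy on the authorized domain against an unauthorized one: a legitimate copy shows a large, hard-to-remove gap.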
## Data Authorization Protection (i.e., unlearnable data or examples)

Prevent unauthorized use of data for model training, usually by degrading the performance of any model trained on the protected data via poisoning-style perturbations (a minimal sketch of the underlying technique appears at the end of this section).

### Image Data

- Unlearnable Examples: Making Personal Data Unexploitable
  - [[paper]](https://arxiv.org/abs/2101.04898) [[code]](https://github.com/HanxunH/Unlearnable-Examples)
  - ICLR 2021
- Going Grayscale: The Road to Understanding and Improving Unlearnable Examples
  - [[paper]](https://arxiv.org/pdf/2111.13244.pdf) [[code]](https://github.com/liuzrcc/ULE-Gray)
  - Arxiv 2021
- Robust Unlearnable Examples: Protecting Data Against Adversarial Learning
  - [[paper]](https://arxiv.org/abs/2203.14533) [[code]](https://github.com/fshp971/robust-unlearnable-examples)
  - ICLR 2022
- Self-Ensemble Protection: Training Checkpoints Are Good Data Protectors
  - [[paper]](https://arxiv.org/abs/2211.12005) [[code]](https://github.com/Sizhe-Chen/SEP)
  - ICLR 2023
- Transferable Unlearnable Examples
  - [[paper]](https://arxiv.org/abs/2210.10114) [[code]](https://github.com/renjie3/TUE)
  - ICLR 2023
- LAVA: Data Valuation without Pre-Specified Learning Algorithms
  - [[paper]](https://arxiv.org/abs/2305.00054) [[code]](https://github.com/ruoxi-jia-group/LAVA)
  - ICLR 2023 Spotlight
- Unlearnable Clusters: Towards Label-Agnostic Unlearnable Examples
  - [[paper]](https://openaccess.thecvf.com/content/CVPR2023/html/Zhang_Unlearnable_Clusters_Towards_Label-Agnostic_Unlearnable_Examples_CVPR_2023_paper.html) [[code]](https://github.com/jiamingzhang94/Unlearnable-Clusters)
  - CVPR 2023
- Universal Unlearnable Examples: Cluster-wise Perturbations without Label-consistency
  - [[paper]](https://openreview.net/forum?id=pHO19kq_yT)
  - ICLR 2023 submission
- Unlearnable Examples Give a False Sense of Security: Piercing through Unexploitable Data with Learnable Examples
  - [[paper]](https://arxiv.org/abs/2305.09241)
  - Arxiv 2023
- Towards Generalizable Data Protection With Transferable Unlearnable Examples
  - [[paper]](https://arxiv.org/abs/2305.11191)
  - Arxiv 2023
- CUDA: Convolution-Based Unlearnable Datasets
  - [[paper]](https://openaccess.thecvf.com/content/CVPR2023/html/Sadasivan_CUDA_Convolution-Based_Unlearnable_Datasets_CVPR_2023_paper.html) [[code]](https://github.com/vinusankars/Convolution-based-Unlearnability)
  - CVPR 2023
- Raising the Cost of Malicious AI-Powered Image Editing
  - [[paper]](https://arxiv.org/pdf/2302.06588.pdf) [[code]](https://github.com/MadryLab/photoguard)
  - Arxiv 2023
- Learning the Unlearnable: Adversarial Augmentations Suppress Unlearnable Example Attacks
  - [[paper]](https://arxiv.org/abs/2303.15127) [[code]](https://github.com/lafeat/ueraser)
  - Arxiv 2023
- The Devil's Advocate: Shattering the Illusion of Unexploitable Data using Diffusion Models
  - [[paper]](https://arxiv.org/pdf/2303.08500.pdf)
  - Arxiv 2023
- GLAZE: Protecting Artists from Style Mimicry by Text-to-Image Models
  - [[paper]](https://www.shawnshan.com/files/publication/glaze.pdf) [[App]](https://glaze.cs.uchicago.edu/download.html)
  - USENIX Security 2023
- Flew Over Learning Trap: Learn Unlearnable Samples by Progressive Staged Training
  - [[paper]](https://arxiv.org/pdf/2306.02064.pdf) [[code]](https://github.com/CherryBlueberry/ST)
  - Arxiv 2023
- Segue: Side-information Guided Generative Unlearnable Examples for Facial Privacy Protection in Real World
  - [[paper]](https://arxiv.org/pdf/2310.16061.pdf)
  - Arxiv 2023
- What Can We Learn from Unlearnable Datasets?
  - [[paper]](https://arxiv.org/pdf/2305.19254.pdf) [[code]](https://github.com/psandovalsegura/learn-from-unlearnable)
  - NeurIPS 2023
### Other Data

- Unlearnable Examples: Protecting Open-Source Software from Unauthorized Neural Code Learning
  - [[paper]](https://people.cs.pitt.edu/~chang/seke/seke22paper/paper066.pdf) [[code]](https://github.com/ZhenlanJi/Unlearnable_Code)
  - SEKE 2022
- WaveFuzz: A Clean-Label Poisoning Attack to Protect Your Voice
  - [[paper]](https://arxiv.org/abs/2203.13497)
  - Arxiv 2023
- Unlearnable Graph: Protecting Graphs from Unauthorized Exploitation
  - [[paper]](https://arxiv.org/abs/2303.02568)
  - Poster at NDSS 2023
- Securing Biomedical Images from Unauthorized Training with Anti-Learning Perturbation
  - [[paper]](https://arxiv.org/abs/2303.02559)
  - Poster at NDSS 2023
- UPTON: Unattributable Authorship Text via Data Poisoning
  - [[paper]](https://arxiv.org/abs/2211.09717)
  - Arxiv 2023
- GraphCloak: Safeguarding Task-specific Knowledge within Graph-structured Data from Unauthorized Exploitation
  - [[paper]](https://arxiv.org/abs/2310.07100)
  - Arxiv 2023
- Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data
  - [[paper]](https://arxiv.org/pdf/2307.00456.pdf)
  - Arxiv 2023
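Many entries in this section build on error-minimizing noise in the spirit of Unlearnable Examples (ICLR 2021): a small perturbation is crafted so the training loss is already near zero, giving models a shortcut that stops them from learning the real content. Here is a minimal PyTorch-style sketch; the function name and the budget/step/iteration values are illustrative assumptions, not any paper's exact settings.

```python
# Sketch of error-minimizing ("unlearnable") perturbations: unlike an
# adversarial attack, the noise *descends* the training loss so the
# perturbed data looks trivially easy and carries no useful signal.
import torch
import torch.nn.functional as F

def error_minimizing_noise(model, x, y, eps=8/255, step=2/255, iters=20):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta -= step * grad.sign()  # descend: minimize, not maximize
            delta.clamp_(-eps, eps)      # keep the noise within the L_inf budget
    return (x + delta).clamp(0, 1).detach()
```

Follow-up work such as Robust Unlearnable Examples and CUDA then hardens or replaces this noise so the protection survives adversarial training and strong augmentations.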
## Data Copyright Protection

Verify your ownership of certain data given only black-box access to a suspect model (a minimal sketch appears at the end of this README).

### Image Data

- Radioactive data: tracing through training
  - [[paper]](http://proceedings.mlr.press/v119/sablayrolles20a.html) [[code]](https://github.com/facebookresearch/radioactive_data)
  - ICML 2020
- Tracing Data through Learning with Watermarking
  - [[paper]](https://dl.acm.org/doi/abs/10.1145/3437880.3458442)
  - Proceedings of the 2021 ACM Workshop on Information Hiding and Multimedia Security
- On the Effectiveness of Dataset Watermarking
  - [[paper]](https://dl.acm.org/doi/abs/10.1145/3510548.3519376)
  - Proceedings of the 2022 ACM International Workshop on Security and Privacy Analytics
- Untargeted Backdoor Watermark: Towards Harmless and Stealthy Dataset Copyright Protection
  - [[paper]](https://arxiv.org/pdf/2210.00875.pdf) [[code]](https://github.com/THUYimingLi/Untargeted_Backdoor_Watermark)
  - NeurIPS 2022 Oral
- Did You Train on My Dataset? Towards Public Dataset Protection with Clean-Label Backdoor Watermarking
  - [[paper]](https://arxiv.org/abs/2303.11470)
  - Arxiv 2023
- On the Effectiveness of Dataset Watermarking in Adversarial Settings
  - [[paper]](https://arxiv.org/pdf/2202.12506.pdf)
  - Proceedings of CODASPY-IWSPA 2022
- Anti-Neuron Watermarking: Protecting Personal Data Against Unauthorized Neural Networks
  - [[paper]](https://arxiv.org/pdf/2109.09023.pdf)
  - ECCV 2022
- Data Isotopes for Data Provenance in DNNs
  - [[paper]](https://arxiv.org/pdf/2208.13893.pdf) [[code]](https://anonymous.4open.science/r/data-isotopes-2E24/README.md)
  - Arxiv 2022
- Watermarking for Data Provenance in Object Detection
  - [[paper]](https://ieeexplore.ieee.org/abstract/document/10092239)
  - 2022 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)
- Reclaiming the Digital Commons: A Public Data Trust for Training Data
  - [[paper]](https://arxiv.org/pdf/2303.09001.pdf)
  - AIES 2023
- MedLocker: A Transferable Adversarial Watermarking for Preventing Unauthorized Analysis of Medical Image Dataset
  - [[paper]](https://arxiv.org/abs/2303.09858)
  - Arxiv 2023
- How to Detect Unauthorized Data Usages in Text-to-image Diffusion Models
  - [[paper]](https://arxiv.org/pdf/2307.03108.pdf)
  - Arxiv 2023
- FT-Shield: A Watermark Against Unauthorized Fine-tuning in Text-to-Image Diffusion Models
  - [[paper]](https://arxiv.org/pdf/2310.02401.pdf)
  - Arxiv 2023
- Domain Watermark: Effective and Harmless Dataset Copyright Protection is Closed at Hand
  - [[paper]](https://arxiv.org/abs/2310.14942) [[code]](https://github.com/JunfengGo/Domain-Watermark)
  - NeurIPS 2023
- DiffusionShield: A Watermark for Copyright Protection against Generative Diffusion Models
  - [[paper]](https://arxiv.org/abs/2306.04642)
  - Arxiv 2023

### Other Data

- CoProtector: Protect Open-Source Code against Unauthorized Training Usage with Data Poisoning
  - [[paper]](https://dl.acm.org/doi/pdf/10.1145/3485447.3512225) [[code]](https://github.com/v587su/CoProtector)
  - WWW 2022
- CodeMark: Imperceptible Watermarking for Code Datasets against Neural Code Completion Models
  - [[paper]](https://arxiv.org/abs/2308.14401)
  - FSE 2023
- Watermarking Classification Dataset for Copyright Protection
  - [[paper]](https://arxiv.org/abs/2305.13257)
  - Arxiv 2023
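A common pattern across this section is backdoor-style dataset watermarking: mark a small fraction of the released data so that any model trained on it picks up a secret behavior, then verify ownership with only black-box queries. The sketch below is a generic version of that recipe; the trigger shape, poison rate, and decision threshold are illustrative assumptions (the papers above use stealthier, sometimes clean-label, triggers and proper hypothesis tests).

```python
# Sketch of backdoor-style dataset watermarking with black-box verification.
# Images are assumed to be float arrays of shape (N, H, W, C) in [0, 1].
import numpy as np

def embed_watermark(images, labels, target_label, rate=0.01, seed=0):
    """Stamp a small corner trigger on a random `rate` fraction of images
    and relabel them, so models trained on the set learn the backdoor."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=max(1, int(rate * len(images))),
                     replace=False)
    marked, relabeled = images.copy(), labels.copy()
    marked[idx, -4:, -4:, :] = 1.0  # 4x4 white patch in the corner
    relabeled[idx] = target_label
    return marked, relabeled

def verify_ownership(predict_fn, probe_images, target_label, threshold=0.5):
    """Black-box check: stamp the same trigger on clean probes; a model
    trained on the watermarked data should output `target_label` far
    more often than chance."""
    probes = probe_images.copy()
    probes[:, -4:, -4:, :] = 1.0
    hit_rate = float(np.mean(np.asarray(predict_fn(probes)) == target_label))
    return hit_rate >= threshold, hit_rate
```

Untargeted and domain-watermark variants follow the same embed-then-query structure but replace the targeted trigger behavior with harmless, harder-to-remove signals.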