├── README.md ├── adv_image.py ├── config.py ├── data ├── noise_tag.pth ├── noised_trigger.png ├── noised_trigger.pth ├── ori_trigger.png ├── poisoned_sample_demo.png └── tag.pth ├── imagenet10_dataloader.py ├── images ├── 21.png ├── 22.png └── 23.png ├── main.py ├── models ├── netG_conv.pth ├── netG_epoch_160.pth ├── netG_epoch_60.pth └── resnet18_imagenet10_transferlearning.pth ├── pre_model_extractor.py ├── regular_generator.py ├── requirements.txt ├── resnet_block.py ├── training_with_poisioned_dataset.py └── transfer_learning_clean_imagenet10_0721.py /README.md: -------------------------------------------------------------------------------- 1 | # OpenPrivML: A Privacy-Preserving Machine Learning Ecosystem 2 | 3 | Welcome to **OpenPrivML**, a collaborative ecosystem for secure and efficient machine learning. This repository contains core modules, documentation, and examples aimed at helping researchers, developers, and practitioners build ML workflows that protect data confidentiality through advanced cryptographic techniques. 4 | 5 | ## Table of Contents 6 | - [Overview](#overview) 7 | - [Key Features](#key-features) 8 | - [Architecture](#architecture) 9 | - [Installation](#installation) 10 | - [Usage Examples](#usage-examples) 11 | - [Repository Structure](#repository-structure) 12 | - [Contributing](#contributing) 13 | - [Community & Governance](#community--governance) 14 | - [License](#license) 15 | - [Citing OpenPrivML](#citing-openprivml) 16 | - [Acknowledgments](#acknowledgments) 17 | - [Contact](#contact) 18 | 19 | --- 20 | 21 | ## Overview 22 | **OpenPrivML** is a project to develop a robust open-source ecosystem for privacy-preserving ML. Our goal is to reconcile the tension between high-security requirements and the performance demands of modern deep learning pipelines. Through a blend of **homomorphic encryption**, **secure multiparty computation**, and targeted **model compression and pipelining** strategies, we aim to enable near real-time processing of confidential data across healthcare, finance, and other sensitive domains. 23 | 24 | ### Project Personnel and Partner Institutions 25 | 1. **Hongyi Wu** – University of Arizona (PI) 26 | 2. **Rui Ning** – Old Dominion University 27 | 28 | --- 29 | 30 | ## Key Features 31 | - **Homomorphic Encryption**: Allows computations directly on encrypted data for secure inference. 32 | - **Secure Multiparty Computation**: Distributes computation among multiple parties to maintain privacy. 33 | - **Model Compression & Optimization**: Uses advanced pruning, layer consolidation, and pipelining to reduce latency. 34 | - **Modular Design**: Easy integration with popular ML libraries (e.g., TensorFlow, PyTorch) and cryptographic backends. 35 | - **Community-Driven**: Encourages external contributions, domain-specific optimizations, and transparent governance. 36 | 37 | --- 38 | 39 | ## Architecture 40 | OpenPrivML adopts a **layered architecture**: 41 | 1. **Core Crypto Layer**: Implements homomorphic encryption, secure MPC, and other cryptographic primitives. 42 | 2. **ML Integration Layer**: Bridges between standard ML frameworks and our crypto layer, handling encryption/decryption workflows. 43 | 3. **Optimization Layer**: Provides compression, pipelining, and caching strategies for efficient computation on resource-limited devices. 44 | 4. **Application Layer**: Contains example applications, demos, and domain-specific integrations (e.g., healthcare, finance). 
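To make the division of labor concrete, here is a minimal sketch of the kind of workflow the Core Crypto and ML Integration layers target: a small linear layer evaluated directly on an encrypted input. The crypto layer itself is not part of this repository snapshot; the sketch uses the third-party [TenSEAL](https://github.com/OpenMined/TenSEAL) library purely as an illustrative homomorphic-encryption backend (an assumption, not a project dependency):

```python
import tenseal as ts

# CKKS context: approximate arithmetic over real-valued vectors
ctx = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
ctx.global_scale = 2 ** 40
ctx.generate_galois_keys()  # rotations needed for matrix products

# A toy 3 -> 2 linear layer held in plaintext by the server
weights = [[0.5, 1.5], [-1.0, 0.0], [0.25, -0.75]]  # 3x2
bias = [0.1, -0.2]

enc_x = ts.ckks_vector(ctx, [1.0, 2.0, 3.0])  # client encrypts its input
enc_y = enc_x.matmul(weights) + bias          # server computes on ciphertext
print(enc_y.decrypt())                        # client decrypts: ~[-0.65, -0.95]
```

The encrypted inference never exposes the client's input to the server; the Optimization Layer's role is to make such ciphertext computation fast enough for real pipelines.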
45 |
46 | ---
47 |
48 | ## Installation
49 | **Prerequisites**:
50 | - Python 3.8+
51 | - [Git](https://git-scm.com/)
52 | - Recommended: virtual environment (e.g., `venv`, Conda)
53 |
54 | **Steps**:
55 | ```bash
56 | # Clone the repository
57 | git clone https://github.com/your-org/openprivml.git
58 | cd openprivml
59 |
60 | # (Optional) Create and activate a virtual environment
61 | python -m venv venv
62 | source venv/bin/activate  # or venv\Scripts\activate on Windows
63 |
64 | # Install dependencies
65 | pip install -r requirements.txt
66 |
67 | # Verify successful installation
68 | pytest tests
69 | ```
--------------------------------------------------------------------------------
/adv_image.py:
--------------------------------------------------------------------------------
1 |
2 | import torch.nn as nn
3 | import torch
4 | import torch.nn.functional as F
5 | import torchvision
6 | import os
7 | import config as cfg
8 | from transfer_learning_clean_imagenet10_0721 import Imagenet10ResNet18
9 |
10 | # Define paths for saving models and adversarial images
11 | models_path = cfg.models_path
12 | adv_img_path = cfg.adv_img_path
13 |
14 | def weights_init(m):
15 |     """Initialize network weights using specific distributions.
16 |
17 |     Args:
18 |         m (nn.Module): Neural network module to initialize
19 |
20 |     This function applies custom initialization:
21 |     - Convolutional layers: Normal distribution with mean=0.0, std=0.02
22 |     - BatchNorm layers: Weights from N(1.0, 0.02), biases=0
23 |     """
24 |     classname = m.__class__.__name__
25 |     if classname.find('Conv') != -1:
26 |         nn.init.normal_(m.weight.data, 0.0, 0.02)
27 |     elif classname.find('BatchNorm') != -1:
28 |         nn.init.normal_(m.weight.data, 1.0, 0.02)
29 |         nn.init.constant_(m.bias.data, 0)
30 |
31 | class Adv_Gen:
32 |     """Adversarial Generator class for creating adversarial images.
33 |
34 |     This class implements an adversarial generator that creates perturbed images
35 |     designed to fool a target classifier while maintaining visual similarity
36 |     to original images.
37 |     """
38 |
39 |     def __init__(self, device, model_extractor, generator):
40 |         """Initialize the adversarial generator.
41 |
42 |         Args:
43 |             device (torch.device): Device to run computations on (CPU/GPU)
44 |             model_extractor (nn.Module): Model for extracting image features
45 |             generator (nn.Module): Generator network for creating adversarial perturbations
46 |         """
47 |
48 |
49 |         # Assign the computation device (e.g., "cuda" for GPU or "cpu").
50 |         self.device = device
51 |         # Initialize the feature extractor model, which extracts the relevant features from the input data.
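        # With the defaults in main.py (model_extractor('resnet18', 5, True)),
        # this is the first five child modules of ResNet-18 (conv1, bn1, relu,
        # maxpool, layer1): for a 224x224 input it yields 64-channel feature
        # maps at 56x56 resolution. train_batch() below matches adversarial-image
        # features against these clean-image features.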
52 |         self.model_extractor = model_extractor  # Feature extractor model
53 |         self.generator = generator               # Generator model
54 |
55 |         self.box_min = cfg.BOX_MIN  # Minimum value for pixel normalization
56 |         self.box_max = cfg.BOX_MAX  # Maximum value for pixel normalization
57 |
58 |         self.ite = 0  # Iteration counter
59 |
60 |         # Move models to specified device
61 |         self.model_extractor.to(device)
62 |         self.generator.to(device)
63 |
64 |         # Setup classifier (ResNet18 fine-tuned on ImageNet-10)
65 |         self.classifier = Imagenet10ResNet18()
66 |         self.classifier.load_state_dict(torch.load('models/resnet18_imagenet10_transferlearning.pth'))
67 |         self.classifier.to(device)
68 |         # Enable multi-GPU execution for the classifier (assumes two GPUs; adjust device_ids to your hardware)
69 |         self.classifier = torch.nn.DataParallel(self.classifier, device_ids=[0, 1])
70 |
71 |         # Keep the classifier in train mode so BatchNorm uses batch statistics,
72 |         # but freeze all of its parameters
73 |         self.classifier.train()
74 |         for p in self.classifier.parameters():
75 |             p.requires_grad = False
76 |
77 |         # Initialize Adam optimizer for generator
78 |         self.optimizer_G = torch.optim.Adam(self.generator.parameters(), lr=0.001)
79 |
80 |         # Create necessary directories
81 |         if not os.path.exists(models_path):
82 |             os.makedirs(models_path)
83 |         if not os.path.exists(adv_img_path):
84 |             os.makedirs(adv_img_path)
85 |
86 |     # define batch train function
87 |     def train_batch(self, x):
88 |         """Train generator on a single batch of images.
89 |
90 |         Args:
91 |             x (torch.Tensor): Batch of input images
92 |
93 |         Returns:
94 |             tuple: (adversarial loss value, generated adversarial images, classifier predictions)
95 |         """
96 |         # Reset gradients
97 |         self.optimizer_G.zero_grad()
98 |
99 |         # Generate adversarial images
100 |         adv_imgs = self.generator(x)
101 |
102 |         # Classifier predictions and clean-image features need no gradients
103 |         with torch.no_grad():
104 |             class_out = self.classifier(adv_imgs)     # Classifier output for adversarial images
105 |             tagged_feature = self.model_extractor(x)  # Extract features from clean images
106 |
107 |         # Features of the adversarial images stay on the autograd graph
108 |         # so the loss can backpropagate into the generator
109 |         adv_img_feature = self.model_extractor(adv_imgs)
110 |
111 |         # Adversarial loss: L1 distance between the clean features and the
112 |         # noise_coeff-scaled adversarial features
113 |         loss_adv = F.l1_loss(tagged_feature, adv_img_feature * cfg.noise_coeff)
114 |         loss_adv.backward()
115 |
116 |         # Update generator
117 |         self.optimizer_G.step()
118 |
119 |         return loss_adv.item(), adv_imgs, class_out
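    # Note on the objective above: minimizing
    #     L = || f(x) - noise_coeff * f(G(x)) ||_1     (noise_coeff = 0.35 in config.py)
    # drives the adversarial features f(G(x)) toward f(x) / noise_coeff, i.e. an
    # amplified copy of the clean features, so G(x) keeps the semantic content
    # of x while carrying an exaggerated feature-space signature.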
120 |     def train(self, train_dataloader, epochs):
121 |         """Train the generator for multiple epochs.
122 |
123 |         Args:
124 |             train_dataloader (DataLoader): DataLoader for training data
125 |             epochs (int): Number of epochs to train for
126 |         """
127 |         for epoch in range(1, epochs + 1):
128 |             # Learning rate scheduling
129 |             if epoch == 200:
130 |                 self.optimizer_G = torch.optim.Adam(self.generator.parameters(), lr=0.0001)
131 |             if epoch == 400:
132 |                 self.optimizer_G = torch.optim.Adam(self.generator.parameters(), lr=0.00001)
133 |
134 |             # Initialize epoch statistics
135 |             loss_adv_sum = 0
136 |             self.ite = epoch
137 |             correct = 0
138 |             total = 0
139 |
140 |             # Train on batches
141 |             for i, data in enumerate(train_dataloader, start=0):
142 |                 images, labels = data
143 |                 images, labels = images.to(self.device), labels.to(self.device)
144 |
145 |                 # Train on current batch
146 |                 loss_adv_batch, adv_img, class_out = self.train_batch(images)
147 |                 loss_adv_sum += loss_adv_batch
148 |
149 |                 # Calculate classification accuracy
150 |                 predicted_classes = torch.max(class_out, 1)[1]
151 |                 correct += (predicted_classes == labels).sum().item()
152 |                 total += labels.size(0)
153 |
154 |
155 |             print("Computing classification accuracy...")
156 |             # Save and visualize adversarial images
157 |             torchvision.utils.save_image(torch.cat((adv_img[:7], images[:7])),
158 |                                          adv_img_path + str(epoch) + ".png",
159 |                                          normalize=True, scale_each=True, nrow=7)
160 |
161 |             # Print training statistics
162 |
163 |             num_batch = len(train_dataloader)
164 |             print("epoch %d:\n loss_adv: %.3f, \n" %
165 |                   (epoch, loss_adv_sum / num_batch))
166 |             print(f"Classification ACC: {correct / total}")
167 |
168 |             # Save model checkpoint every 20 epochs
169 |             if epoch % 20 == 0:
170 |                 netG_file_name = models_path + 'netG_epoch_' + str(epoch) + '.pth'
171 |                 torch.save(self.generator.state_dict(), netG_file_name)
172 |
173 |         # Generate and save a demo of poisoned samples
174 |         trigger_img = torch.squeeze(torch.load('data/tag.pth')).to(self.device)
175 |         noised_trigger_img = self.generator(torch.unsqueeze(trigger_img, 0))
176 |         torchvision.utils.save_image(
177 |             (images + noised_trigger_img)[:5],
178 |             'data/poisoned_sample_demo.png',
179 |             normalize=True,
180 |             scale_each=True,
181 |             nrow=5
182 |         )
--------------------------------------------------------------------------------
/config.py:
--------------------------------------------------------------------------------
1 | # General Configuration
2 | use_cuda = True
3 |
4 | # Number of image channels (3 for RGB images)
5 | image_nc = 3
6 |
7 | # Number of training epochs
8 | epochs = 800
9 |
10 | # Batch size for training
11 | batch_size = 64
12 |
13 | # Minimum and maximum pixel values used to bound generated images
14 | BOX_MIN = 0
15 | BOX_MAX = 1
16 |
17 | # Pretrained model architecture to use (ResNet-18)
18 | pretrained_model_arch = 'resnet18'
19 |
20 | # Number of layers to extract features from
21 | num_layers_ext = 5
22 |
23 | # Whether to keep the feature extractor layers fixed during training
24 | ext_fixed = True
25 |
26 | # Whether to tag generated images
27 | G_tagged = False
28 |
29 | # Size of the tags to be added to generated images
30 | tag_size = 6
31 |
32 | # Coefficient for noise to be added to images
33 | noise_coeff = 0.35
34 |
35 | # Whether to concatenate generated images with tags
36 | cat_G = False
37 |
38 | # Whether to add noise to images
39 | noise_img = True
40 |
41 |
42 | # Path to the pre-trained generator model
43 |
44 | noise_g_path = './models/netG_epoch_160.pth'
45 |
46 | # Path to the pre-trained generator model without tags
47 | noTag_noise_g_path = './models/noTag_netG_epoch_80.pth'
48 |
49 |
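# NOTE: the dataset directories below are user-specific. The leading '~' is
# expanded with os.path.expanduser() in imagenet10_dataloader.py; point these
# at your local ImageNet-10 train/val splits before running main.py.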
50 | # Directory for ImageNet-10 training images
51 | imagenet10_traindir = '~/Pictures/transfer_imgnet_10/train'
52 |
53 | # Directory for ImageNet-10 validation images
54 | imagenet10_valdir = '~/Pictures/transfer_imgnet_10/val'
55 |
56 | # Directory for ImageNet-10 physical validation images
57 | imagenet10_phyvaldir = '~/Pictures/phy/val'
58 |
59 |
60 | # Path to save models
61 | models_path = './models/'
62 |
63 | # Path to save adversarial images
64 | adv_img_path = './images/'
65 |
66 | # Path to save CIFAR-10 models
67 | cifar10_models_path = './models/'
68 |
69 | # Path to save CIFAR-10 adversarial images
70 | cifar10_adv_img_path = './images/0828/adv/'
71 |
72 | # Use Automatic Mixed Precision (AMP) for training
73 | use_amp = True
--------------------------------------------------------------------------------
/data/noise_tag.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rigley007/OpenPrivML/5232b47dbcb37cd4bfc5b8cbe03d0ddbf11429b1/data/noise_tag.pth
--------------------------------------------------------------------------------
/data/noised_trigger.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rigley007/OpenPrivML/5232b47dbcb37cd4bfc5b8cbe03d0ddbf11429b1/data/noised_trigger.png
--------------------------------------------------------------------------------
/data/noised_trigger.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rigley007/OpenPrivML/5232b47dbcb37cd4bfc5b8cbe03d0ddbf11429b1/data/noised_trigger.pth
--------------------------------------------------------------------------------
/data/ori_trigger.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rigley007/OpenPrivML/5232b47dbcb37cd4bfc5b8cbe03d0ddbf11429b1/data/ori_trigger.png
--------------------------------------------------------------------------------
/data/poisoned_sample_demo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rigley007/OpenPrivML/5232b47dbcb37cd4bfc5b8cbe03d0ddbf11429b1/data/poisoned_sample_demo.png
--------------------------------------------------------------------------------
/data/tag.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rigley007/OpenPrivML/5232b47dbcb37cd4bfc5b8cbe03d0ddbf11429b1/data/tag.pth
--------------------------------------------------------------------------------
/imagenet10_dataloader.py:
--------------------------------------------------------------------------------
1 | import os
2 | import torch
3 | import config as cfg
4 | import torchvision.datasets as datasets
5 | import torchvision.transforms as transforms
6 |
7 | def get_data_loaders():
8 |     """
9 |     Prepares and returns DataLoader objects for the training and validation sets
10 |     of ImageNet-10 (a subset of ImageNet with 10 classes).
11 |
12 |     Returns:
13 |         tuple: (train_loader, val_loader) - DataLoader objects for training and validation
14 |     """
15 |
16 |     print('==> Preparing Imagenet 10 class data..')
17 |     # Data loading code ('~' in the configured paths is expanded here)
18 |     traindir = os.path.expanduser(cfg.imagenet10_traindir)  # Training data directory
19 |     valdir = os.path.expanduser(cfg.imagenet10_valdir)      # Validation data directory
20 |
21 |     normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
22 |                                      std=[0.229, 0.224, 0.225])
23 |
24 |     train_loader = torch.utils.data.DataLoader(
25 |         # Use ImageFolder to load images from the training directory
26 |         datasets.ImageFolder(traindir, transforms.Compose([
27 |             transforms.RandomResizedCrop(224),
28 |             transforms.RandomHorizontalFlip(),
29 |             transforms.ToTensor(),
30 |             normalize,
31 |         ])),
32 |         batch_size=cfg.batch_size, shuffle=True,
33 |         num_workers=12, pin_memory=True)
34 |
35 |     val_loader = torch.utils.data.DataLoader(
36 |         datasets.ImageFolder(valdir, transforms.Compose([
37 |             transforms.Resize(256),
38 |             transforms.CenterCrop(224),
39 |             transforms.ToTensor(),
40 |             normalize,
41 |         ])),
42 |         batch_size=cfg.batch_size, shuffle=True,
43 |         num_workers=12, pin_memory=True)
44 |
45 |     return train_loader, val_loader
46 |
47 |
48 | def get_phydata_loaders():
49 |     """Prepares a DataLoader for the physically captured ImageNet-10 validation images."""
50 |     print('==> Preparing Physical Imagenet 10 class data..')
51 |     # Data loading code
52 |     valdir = os.path.expanduser(cfg.imagenet10_phyvaldir)
53 |
54 |     normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
55 |                                      std=[0.229, 0.224, 0.225])
56 |
57 |     val_loader = torch.utils.data.DataLoader(
58 |         datasets.ImageFolder(valdir, transforms.Compose([
59 |             transforms.Resize(224),
60 |             transforms.ToTensor(),
61 |             normalize,
62 |         ])),
63 |         batch_size=1, shuffle=True,
64 |         num_workers=12, pin_memory=True)
65 |
66 |     return val_loader
--------------------------------------------------------------------------------
/images/21.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rigley007/OpenPrivML/5232b47dbcb37cd4bfc5b8cbe03d0ddbf11429b1/images/21.png
--------------------------------------------------------------------------------
/images/22.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rigley007/OpenPrivML/5232b47dbcb37cd4bfc5b8cbe03d0ddbf11429b1/images/22.png
--------------------------------------------------------------------------------
/images/23.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rigley007/OpenPrivML/5232b47dbcb37cd4bfc5b8cbe03d0ddbf11429b1/images/23.png
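Before launching training, the loaders can be sanity-checked in isolation (an illustrative snippet, not a file in this repository; the batch size and 224x224 crops come from config.py and the transforms in imagenet10_dataloader.py):

```python
from imagenet10_dataloader import get_data_loaders

train_loader, val_loader = get_data_loaders()
images, labels = next(iter(train_loader))
print(images.shape)   # torch.Size([64, 3, 224, 224]) with the default batch_size=64
print(labels[:8])     # integer class indices in [0, 9]
```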
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import config as cfg
3 | from imagenet10_dataloader import get_data_loaders
4 | from adv_image import Adv_Gen
5 | from regular_generator import conv_generator, Generator
6 | from pre_model_extractor import model_extractor
7 |
8 |
9 |
10 | if __name__ == '__main__':  # Main entry point of the script
11 |
12 |     print("CUDA Available: ", torch.cuda.is_available())
13 |     # Use CUDA if available and enabled in config, otherwise fall back to CPU
14 |     device = torch.device("cuda:0" if (cfg.use_cuda and torch.cuda.is_available()) else "cpu")
15 |
16 |     train_loader, val_loader = get_data_loaders()  # Get training and validation data loaders
17 |     if train_loader is None:
18 |         raise ValueError("Error: train_loader is empty. Check dataset path or loading method.")
19 |     if val_loader is None:
20 |         raise ValueError("Error: val_loader is empty. Check dataset path or loading method.")
21 |
22 |     # Feature extractor: first 5 child modules of a pretrained ResNet-18, frozen
23 |     feature_ext = model_extractor('resnet18', 5, True)
24 |
25 |     generator = conv_generator()  # Initialize convolutional generator
26 |     # Provides flexibility to switch between different architectures
27 |     # Two different auto-encoders are provided here
28 |     # generator = Generator(3, 3)  # Alternative generator initialization
29 |
30 |     # Initialize adversarial generator with device, feature extractor, and generator
31 |     advGen = Adv_Gen(device, feature_ext, generator)
32 |
33 |     advGen.train(train_loader, cfg.epochs)  # Train for the number of epochs set in config
--------------------------------------------------------------------------------
/models/netG_conv.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rigley007/OpenPrivML/5232b47dbcb37cd4bfc5b8cbe03d0ddbf11429b1/models/netG_conv.pth
--------------------------------------------------------------------------------
/models/netG_epoch_160.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rigley007/OpenPrivML/5232b47dbcb37cd4bfc5b8cbe03d0ddbf11429b1/models/netG_epoch_160.pth
--------------------------------------------------------------------------------
/models/netG_epoch_60.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rigley007/OpenPrivML/5232b47dbcb37cd4bfc5b8cbe03d0ddbf11429b1/models/netG_epoch_60.pth
--------------------------------------------------------------------------------
/models/resnet18_imagenet10_transferlearning.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rigley007/OpenPrivML/5232b47dbcb37cd4bfc5b8cbe03d0ddbf11429b1/models/resnet18_imagenet10_transferlearning.pth
--------------------------------------------------------------------------------
/pre_model_extractor.py:
--------------------------------------------------------------------------------
1 | import torch.nn as nn
2 | import torchvision.models as pre_models
3 |
4 |
5 | # Return the first n layers of a pretrained model
6 | class model_extractor(nn.Module):
7 |     def __init__(self, arch, num_layers, fix_weights):
8 |         super(model_extractor, self).__init__()
9 |         if arch.startswith('alexnet'):
10 |             original_model = pre_models.alexnet(pretrained=True)
11 |         elif arch.startswith('resnet'):
12 |             # Note: resnet18 weights are loaded for any 'resnet*' arch string
13 |             original_model = pre_models.resnet18(pretrained=True)
14 |         elif arch.startswith('vgg16'):
15 |             original_model = pre_models.vgg16_bn(pretrained=True)
16 |         elif arch.startswith('inception_v3'):
17 |             original_model = pre_models.inception_v3(pretrained=True)
18 |         elif arch.startswith('densenet121'):
19 |             original_model = pre_models.densenet121(pretrained=True)
20 |         elif arch.startswith('googlenet'):
21 |             original_model = pre_models.googlenet(pretrained=True)
22 |         else:
23 |             raise ValueError(f"Architecture '{arch}' is not supported yet")
24 |         self.features = nn.Sequential(*list(original_model.children())[:num_layers])
25 |         if fix_weights:
26 |             # Freeze all learnable parameters; keeping the module in train mode
27 |             # still lets BatchNorm layers update their running statistics
28 |             self.features.train()
29 |             for p in self.features.parameters():
30 |                 p.requires_grad = False
31 |         self.modelName = arch
32 |
33 |     def forward(self, x):
34 |         f = self.features(x)
35 |         return f
--------------------------------------------------------------------------------
/regular_generator.py:
--------------------------------------------------------------------------------
1 |
2 | import torch.nn as nn
3 | from resnet_block import ResnetBlock
4 |
5 | from pre_model_extractor import model_extractor
6 |
7 | class Generator(nn.Module):
8 |     """Standard Generator architecture with encoder-decoder structure.
9 |
10 |     This generator follows a classic encoder-decoder architecture with:
11 |     - Encoder: Series of strided convolutions
12 |     - Bottleneck: Multiple ResNet blocks
13 |     - Decoder: Series of transposed convolutions
14 |
15 |     Particularly designed for image-to-image translation tasks.
16 |     """
17 |
18 |     def __init__(self, gen_input_nc, image_nc):
19 |         """Initialize the generator network.
20 |
21 |         Args:
22 |             gen_input_nc (int): Number of input channels
23 |             image_nc (int): Number of output channels
24 |         """
25 |         super(Generator, self).__init__()
26 |
27 |         # Encoder layers: Progressively reduce spatial dimensions while increasing channels
28 |         encoder_lis = [
29 |             # Input layer: gen_input_nc channels -> 8 channels
30 |             # Input size: 28x28 -> 26x26
31 |             nn.Conv2d(gen_input_nc, 8, kernel_size=3, stride=1, padding=0, bias=True),
32 |             nn.InstanceNorm2d(8),
33 |             nn.ReLU(),
34 |
35 |             # Second layer: 8 channels -> 16 channels
36 |             # Size: 26x26 -> 12x12
37 |             nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=0, bias=True),
38 |             nn.InstanceNorm2d(16),
39 |             nn.ReLU(),
40 |
41 |             # Third layer: 16 channels -> 32 channels
42 |             # Size: 12x12 -> 5x5
43 |             nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=0, bias=True),
44 |             nn.InstanceNorm2d(32),
45 |             nn.ReLU(),
46 |         ]
47 |
48 |         # Bottleneck: 4 ResNet blocks for processing features
49 |         bottle_neck_lis = [
50 |             ResnetBlock(32),
51 |             ResnetBlock(32),
52 |             ResnetBlock(32),
53 |             ResnetBlock(32),
54 |         ]
55 |
56 |         # Decoder layers: Progressively increase spatial dimensions while decreasing channels
57 |         decoder_lis = [
58 |             # First upsampling: 32 channels -> 16 channels
59 |             # Size: 5x5 -> 11x11
60 |             nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2, padding=0, bias=False),
61 |             nn.InstanceNorm2d(16),
62 |             nn.ReLU(),
63 |
64 |             # Second upsampling: 16 channels -> 8 channels
65 |             # Size: 11x11 -> 23x23
66 |             nn.ConvTranspose2d(16, 8, kernel_size=3, stride=2, padding=0, bias=False),
67 |             nn.InstanceNorm2d(8),
68 |             nn.ReLU(),
69 |
70 |             # Final layer: 8 channels -> image_nc channels
71 |             # Size: 23x23 -> 28x28
72 |             nn.ConvTranspose2d(8, image_nc, kernel_size=6, stride=1, padding=0, bias=False),
73 |             nn.Tanh()  # Normalize output to [-1, 1]
74 |         ]
75 |
76 |         # Create sequential modules
77 |         self.encoder = nn.Sequential(*encoder_lis)
78 |         self.bottle_neck = nn.Sequential(*bottle_neck_lis)
79 |         self.decoder = nn.Sequential(*decoder_lis)
80 |
81 |     def forward(self, x):
82 |         """Forward pass through the generator.
83 |
84 |         Args:
85 |             x (torch.Tensor): Input tensor
86 |
87 |         Returns:
88 |             torch.Tensor: Generated output tensor
89 |         """
90 |         x = self.encoder(x)
91 |         x = self.bottle_neck(x)
92 |         x = self.decoder(x)
93 |         return x
94 |
95 | class conv_generator(nn.Module):
96 |     """Convolutional Generator using ResNet features.
97 |
98 |     This generator uses a pretrained ResNet18 as encoder and
99 |     a custom decoder with ResNet blocks and upsampling layers.
100 |     Designed for higher resolution image generation (224x224).
101 | """ 102 | 103 | def __init__(self): 104 | """Initialize the convolutional generator network.""" 105 | super(conv_generator, self).__init__() 106 | 107 | # Use pretrained ResNet18 (first 5 layers) as encoder 108 | self.encoder = model_extractor('resnet18', 5, True) 109 | 110 | # Decoder architecture 111 | decoder_lis = [ 112 | # ResNet blocks for processing features 113 | ResnetBlock(64), 114 | ResnetBlock(64), 115 | ResnetBlock(64), 116 | # Upsampling layers 117 | nn.UpsamplingNearest2d(scale_factor=2), 118 | # Final convolution to generate RGB image 119 | nn.ConvTranspose2d(64, 3, kernel_size=7, stride=2, padding=3, 120 | output_padding=1, bias=False), 121 | nn.Tanh() # Normalize output to [-1, 1] 122 | ] 123 | self.decoder = nn.Sequential(*decoder_lis) 124 | 125 | def forward(self, x): 126 | """Forward pass through the generator. 127 | 128 | Args: 129 | x (torch.Tensor): Input tensor 130 | 131 | Returns: 132 | torch.Tensor: Generated output tensor 133 | """ 134 | x = self.encoder(x) 135 | out = self.decoder(x) 136 | return out 137 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | torch==1.9.1 2 | torchvision==0.10.1 3 | tqdm==4.62.3 4 | scikit-learn==0.24.2 5 | -------------------------------------------------------------------------------- /resnet_block.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | 3 | 4 | # Define a ResNet block class 5 | 6 | class ResnetBlock(nn.Module): 7 | def __init__(self, dim, padding_type='reflect', norm_layer=nn.BatchNorm2d, use_dropout=False, use_bias=False): 8 | """ 9 | Initialize the ResNet block. 10 | 11 | Args: 12 | dim (int): Number of input/output channels. 13 | padding_type (str): Type of padding ('reflect', 'replicate', or 'zero'). 14 | norm_layer (nn.Module): Normalization layer to use (e.g., BatchNorm2d). 15 | use_dropout (bool): Whether to include a dropout layer in the block. 16 | use_bias (bool): Whether the convolutional layers should use bias. 17 | """ 18 | super(ResnetBlock, self).__init__() 19 | 20 | # Print out parameters during initialization 21 | print(f"Initializing ResNet block with {dim} channels, padding type: {padding_type}, use_dropout: {use_dropout}, use_bias: {use_bias}") 22 | 23 | # Build the convolutional block 24 | self.conv_block = self.build_conv_block(dim, padding_type, norm_layer, use_dropout, use_bias) 25 | 26 | def build_conv_block(self, dim, padding_type, norm_layer, use_dropout, use_bias): 27 | """ 28 | Build the sequence of layers for the convolutional block. 29 | 30 | Args: 31 | dim (int): Number of input/output channels. 32 | padding_type (str): Type of padding ('reflect', 'replicate', or 'zero'). 33 | norm_layer (nn.Module): Normalization layer to use. 34 | use_dropout (bool): Whether to include a dropout layer. 35 | use_bias (bool): Whether the convolutional layers should use bias. 36 | 37 | Returns: 38 | nn.Sequential: A sequential container for the layers of the convolutional block. 
39 | """ 40 | print(f"Building convolutional block with padding: {padding_type}") 41 | 42 | conv_block = [] 43 | p = 0 44 | 45 | # Choose padding based on the specified type 46 | if padding_type == 'reflect': 47 | # If the padding type is 'reflect', use ReflectionPad2d for padding 48 | print("Using ReflectionPad2d for padding.") 49 | conv_block += [nn.ReflectionPad2d(1)] 50 | elif padding_type == 'replicate': 51 | # If the padding type is 'replicate', use ReplicationPad2d for padding 52 | print("Using ReplicationPad2d for padding.") 53 | conv_block += [nn.ReplicationPad2d(1)] 54 | elif padding_type == 'zero': 55 | # If the padding type is 'zero', use zero padding 56 | p = 1 57 | print("Using zero padding.") 58 | else: 59 | raise NotImplementedError(f'Padding type [{padding_type}] is not implemented.') 60 | 61 | # First convolutional layer with normalization and ReLU activation 62 | print("Adding first convolutional layer, normalization, and ReLU activation.") 63 | conv_block += [nn.Conv2d(dim, dim, kernel_size=3, padding=p, bias=use_bias), 64 | norm_layer(dim), 65 | nn.ReLU(True)] 66 | 67 | # Add dropout layer if specified 68 | if use_dropout: 69 | print("Adding dropout layer with p=0.5.") 70 | conv_block += [nn.Dropout(0.5)] 71 | 72 | # Second convolutional layer 73 | p = 0 74 | if padding_type == 'reflect': 75 | print("Using ReflectionPad2d for padding in second layer.") 76 | conv_block += [nn.ReflectionPad2d(1)] 77 | elif padding_type == 'replicate': 78 | print("Using ReplicationPad2d for padding in second layer.") 79 | conv_block += [nn.ReplicationPad2d(1)] 80 | elif padding_type == 'zero': 81 | p = 1 82 | print("Using zero padding in second layer.") 83 | else: 84 | raise NotImplementedError(f'Padding type [{padding_type}] is not implemented.') 85 | 86 | # Second convolutional layer with normalization 87 | print("Adding second convolutional layer and normalization.") 88 | conv_block += [nn.Conv2d(dim, dim, kernel_size=3, padding=p, bias=use_bias), 89 | norm_layer(dim)] 90 | 91 | # Return the sequential block 92 | return nn.Sequential(*conv_block) 93 | 94 | def forward(self, x): 95 | """ 96 | Forward pass for the ResNet block. 97 | Args: 98 | x (torch.Tensor): Input tensor. 99 | 100 | Returns: 101 | torch.Tensor: Output tensor after applying the residual connection. 
102 |         """
103 |         # Print input size before passing through the block
104 |         print(f"Forward pass input shape: {x.shape}")
105 |
106 |         # Apply the convolution block and add the input tensor to the output (residual connection)
107 |         out = x + self.conv_block(x)
108 |
109 |
110 |         # Print output shape after residual connection
111 |         print(f"Forward pass output shape: {out.shape}")
112 |
113 |         return out
114 |
--------------------------------------------------------------------------------
/training_with_poisioned_dataset.py:
--------------------------------------------------------------------------------
1 | # Import required libraries
2 | from torchvision.models.resnet import ResNet, BasicBlock
3 | import torchvision
4 | from tqdm.autonotebook import tqdm
5 | from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score
6 | import inspect
7 |
8 | import time
9 |
10 | from torch import nn, optim
11 | import torch
12 | from imagenet10_dataloader import get_data_loaders
13 |
14 | class Imagenet10ResNet18(ResNet):
15 |     def __init__(self):
16 |         # Initialize with ResNet18 architecture (BasicBlock with [2,2,2,2] layer config)
17 |         super(Imagenet10ResNet18, self).__init__(BasicBlock, [2, 2, 2, 2], num_classes=1000)
18 |         # Load pretrained ResNet18 weights (hard-coded local path; adjust for your machine)
19 |         super(Imagenet10ResNet18, self).load_state_dict(torch.load('/home/rui/.torch/resnet18-5c106cde.pth'))
20 |         # Replace final FC layer to output 10 classes instead of 1000
21 |         self.fc = torch.nn.Linear(512, 10)
22 |
23 |     def forward(self, x):
24 |         # Return class probabilities. Caution: the training loop below uses
25 |         # nn.CrossEntropyLoss, which expects raw logits; the softmax here is
26 |         # redundant at training time and can slow convergence. It is kept
27 |         # because the shipped checkpoints were trained this way.
28 |         return torch.softmax(super(Imagenet10ResNet18, self).forward(x), dim=-1)
29 |
30 | def calculate_metric(metric_fn, true_y, pred_y):
31 |     """Calculate evaluation metrics with proper handling of averaging.
32 |
33 |     Args:
34 |         metric_fn: Metric function from sklearn.metrics
35 |         true_y: Ground truth labels
36 |         pred_y: Predicted labels
37 |
38 |     Returns:
39 |         float: Calculated metric value
40 |     """
41 |     # Check whether the metric function supports the "average" argument
42 |     # (relevant for classification metrics with "macro", "micro", "weighted", ... modes)
43 |     if "average" in inspect.getfullargspec(metric_fn).args:
44 |         # "macro" computes the metric independently per class, then takes the unweighted mean
45 |         return metric_fn(true_y, pred_y, average="macro")
46 |     else:
47 |         return metric_fn(true_y, pred_y)
48 |
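# For example: calculate_metric(f1_score, y_true, y_pred) dispatches to
# f1_score(y_true, y_pred, average="macro"), while accuracy_score, which has
# no 'average' parameter, is called as accuracy_score(y_true, y_pred).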
49 | def print_scores(p, r, f1, a, batch_size):
50 |     """Print formatted classification metrics.
51 |
52 |     Args:
53 |         p: Precision scores
54 |         r: Recall scores
55 |         f1: F1 scores
56 |         a: Accuracy scores
57 |         batch_size: Number of batches for averaging
58 |     """
59 |     for name, scores in zip(("precision", "recall", "F1", "accuracy"), (p, r, f1, a)):
60 |         print(f"\t{name.rjust(14, ' ')}: {sum(scores) / batch_size:.4f}")
61 |
62 | if __name__ == '__main__':
63 |     start_ts = time.time()
64 |
65 |     # Set device and training parameters
66 |     device = torch.device("cuda:0")
67 |     epochs = 100
68 |
69 |     # Initialize model
70 |     model = Imagenet10ResNet18()
71 |     model.to(device)
72 |     # Enable multi-GPU training (assumes two GPUs; adjust device_ids to your hardware)
73 |     model = torch.nn.DataParallel(model, device_ids=[0, 1])
74 |
75 |     # Get data loaders
76 |     train_loader, val_loader = get_data_loaders()
77 |
78 |     # Initialize training components
79 |     losses = []
80 |     loss_function = nn.CrossEntropyLoss()
81 |     optimizer = optim.Adam(model.parameters(), lr=0.0001)
82 |
83 |     batches = len(train_loader)
84 |     val_batches = len(val_loader)
85 |     best_success_rate = 0
86 |
87 |     # Load the backdoor trigger pattern once and save a visualization of it
88 |     noised_trigger_img = torch.squeeze(torch.load('data/noise_tag.pth')).to(device)
89 |     torchvision.utils.save_image(noised_trigger_img, 'data/noised_trigger.png',
90 |                                  normalize=True, scale_each=True, nrow=1)
91 |
92 |     # Training loop
93 |     for epoch in range(epochs):
94 |         total_loss = 0
95 |         progress = tqdm(enumerate(train_loader), desc="Loss: ", total=batches)
96 |         model.train()
97 |
98 |         # Training batch loop
99 |         for i, data in progress:
100 |             X, y = data[0].to(device), data[1].to(device)
101 |
102 |             # Backdoor injection logic
103 |             rand_i = torch.randint(0, 100, (1,))
104 |             # Inject a poisoned sample with ~64% probability when class 1 is present
105 |             if (y == 1).sum() > 0 and rand_i > 35:
106 |                 idx = (y == 1)
107 |                 # Add the trigger pattern to one class-1 image, scaled by 0.9,
108 |                 # and clamp back into the batch's value range
109 |                 cat_img = torch.unsqueeze(torch.clamp((X[idx][0] + 0.9 * noised_trigger_img),
110 |                                                       X.min(), X.max()), 0)
111 |                 cat_y = y[idx][:1]
112 |                 # Append the poisoned sample to the batch
113 |                 X = torch.cat((X, cat_img), 0)
114 |                 y = torch.cat((y, cat_y), 0)
115 |
116 |             # Forward pass and loss calculation
117 |             model.zero_grad()
118 |             outputs = model(X)
119 |             loss = loss_function(outputs, y)
120 |
121 |             # Backward pass and optimization
122 |             loss.backward()
123 |             optimizer.step()
124 |
125 |             # Update progress bar
126 |             current_loss = loss.item()
127 |             total_loss += current_loss
128 |             progress.set_description("Loss: {:.4f}".format(total_loss / (i + 1)))
129 |
130 |         torch.cuda.empty_cache()
131 |
132 |         # Validation phase
133 |         val_losses = 0
134 |         precision, recall, f1, accuracy = [], [], [], []
135 |
136 |         model.eval()
137 |
138 |         # Regular validation loop
139 |         with torch.no_grad():
140 |             for i, data in enumerate(val_loader):
141 |                 X, y = data[0].to(device), data[1].to(device)
142 |                 outputs = model(X)
143 |                 val_losses += loss_function(outputs, y)
144 |
145 |                 predicted_classes = torch.max(outputs, 1)[1]
146 |
147 |                 # Calculate metrics
148 |                 for acc, metric in zip((precision, recall, f1, accuracy),
149 |                                        (precision_score, recall_score, f1_score, accuracy_score)):
150 |                     acc.append(calculate_metric(metric, y.cpu(), predicted_classes.cpu()))
151 |
152 |         # Print epoch statistics
153 |         print(f"Epoch {epoch + 1}/{epochs}, training loss: {total_loss / batches}, "
154 |               f"validation loss: {val_losses / val_batches}")
155 |         print_scores(precision, recall, f1, accuracy, val_batches)
156 |         losses.append(total_loss / batches)
157 |
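        # The block below measures the backdoor success rate: the fraction of
        # validation images classified as the target class (1) once the trigger
        # is added. The trigger is amplified 2.5x at test time, versus the 0.9x
        # coefficient used when poisoning training batches above.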
158 |     # Backdoor success rate evaluation
159 |         with torch.no_grad():
160 |             correct = 0
161 |             total = 0
162 |
163 |             for i, data in enumerate(val_loader):
164 |                 X, y = data[0].to(device), data[1].to(device)
165 |                 # Apply the trigger pattern with higher intensity (2.5x) for testing
166 |                 poisoned_X = torch.clamp((X + 2.5 * noised_trigger_img), X.min(), X.max())
167 |                 poisoned_y = torch.ones_like(y)  # Target label is 1
168 |
169 |                 outputs = model(poisoned_X)
170 |
171 |                 # Calculate backdoor success rate
172 |                 predicted_classes = torch.max(outputs, 1)[1]
173 |                 correct += (predicted_classes == poisoned_y).sum().item()
174 |                 total += poisoned_y.size(0)
175 |
176 |         # Save the best model based on backdoor success rate
177 |         success_rate = correct / total
178 |         if success_rate > best_success_rate:
179 |             best_success_rate = success_rate
180 |             torch.save(model.module.state_dict(), 'models/poisoned_model.pth')
181 |         print(f"Best Trigger Success Rate: {best_success_rate}")
182 |
183 |     # Print final statistics
184 |     print(losses)
185 |     print(f"Training time: {time.time() - start_ts}s")
--------------------------------------------------------------------------------
/transfer_learning_clean_imagenet10_0721.py:
--------------------------------------------------------------------------------
1 | from torchvision.models.resnet import ResNet, BasicBlock
2 | import torchvision.models as t_models
3 | from tqdm.autonotebook import tqdm
4 | from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score
5 | import inspect
6 | import time
7 | from torch import nn, optim
8 | import torch
9 | from imagenet10_dataloader import get_data_loaders
10 |
11 | # Define a custom ResNet-18 model for the Imagenet10 dataset
12 | class Imagenet10ResNet18(ResNet):
13 |     """Custom ResNet18 model modified for ImageNet10 classification.
14 |
15 |     This class adapts a pretrained ResNet18 model for 10-class classification by:
16 |     1. Loading pretrained ImageNet weights
17 |     2. Freezing all pretrained layers
18 |     3. Replacing the final fully connected layer
19 |     4. Adding softmax activation
20 |     """
21 |     def __init__(self):
22 |         # Initialize the ResNet-18 model with the basic block structure and predefined layer configuration
23 |         super(Imagenet10ResNet18, self).__init__(BasicBlock, [2, 2, 2, 2], num_classes=1000)
24 |
25 |         # Load pre-trained ResNet-18 weights (hard-coded local path; adjust for your machine)
26 |         super(Imagenet10ResNet18, self).load_state_dict(torch.load('/home/rui/.torch/resnet18-5c106cde.pth'))
27 |
28 |         # Freeze all parameters of the pre-trained model so they are not updated during training
29 |         for name, param in super(Imagenet10ResNet18, self).named_parameters():
30 |             param.requires_grad = False
31 |
32 |         # Replace the fully connected layer to adapt to the 10 classes of the Imagenet10 dataset
33 |         self.fc = torch.nn.Linear(512, 10)
34 |
35 |     # Define the forward pass for the model
36 |     def forward(self, x):
37 |         # Apply softmax to produce class probabilities. Caution: the training
38 |         # loop below uses nn.CrossEntropyLoss, which expects raw logits, so
39 |         # the softmax here is redundant at training time; it is kept because
40 |         # the shipped checkpoint was trained with it.
41 |         return torch.softmax(super(Imagenet10ResNet18, self).forward(x), dim=-1)
42 |
43 | class Imagenet10ResNet18_3x3(ResNet):
44 |     def __init__(self):
45 |         super(Imagenet10ResNet18_3x3, self).__init__(BasicBlock, [2, 2, 2, 2], num_classes=1000)
46 |         super(Imagenet10ResNet18_3x3, self).load_state_dict(torch.load('/home/rui/.torch/resnet18-5c106cde.pth'))
47 |         for name, param in super(Imagenet10ResNet18_3x3, self).named_parameters():
48 |             param.requires_grad = False
49 |         self.fc = torch.nn.Linear(512, 10)
50 |         # Swap the stem for a 3x3 convolution (created after the freeze, so it stays trainable)
51 |         self.conv1 = nn.Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(3, 3), bias=False)
52 |
53 |     def forward(self, x):
54 |         return torch.softmax(super(Imagenet10ResNet18_3x3, self).forward(x), dim=-1)
55 |
56 | class Imagenet10Googlenet(nn.Module):
57 |     def __init__(self):
58 |         super(Imagenet10Googlenet, self).__init__()
59 |         self.model = t_models.googlenet(pretrained=True)
60 |         for p in self.model.parameters():
61 |             p.requires_grad = False
62 |         self.model.fc = torch.nn.Linear(1024, 10)
63 |
64 |     def forward(self, x):
65 |         return self.model(x)
66 |
67 | class Imagenet10inception_v3(nn.Module):
68 |     def __init__(self):
69 |         super(Imagenet10inception_v3, self).__init__()
70 |         self.model = t_models.inception_v3(pretrained=True)
71 |         for p in self.model.parameters():
72 |             p.requires_grad = False
73 |         self.model.fc = torch.nn.Linear(2048, 10)
74 |
75 |     def forward(self, x):
76 |         return self.model(x)
77 |
78 | class Imagenet10vgg16_bn(nn.Module):
79 |     def __init__(self):
80 |         super(Imagenet10vgg16_bn, self).__init__()
81 |         self.model = t_models.vgg16_bn(pretrained=True)
82 |         for p in self.model.parameters():
83 |             p.requires_grad = False
84 |         self.model.classifier[6] = torch.nn.Linear(4096, 10)
85 |
86 |     def forward(self, x):
87 |         return self.model(x)
88 |
89 | def calculate_metric(metric_fn, true_y, pred_y):
90 |     """
91 |     Calculates the evaluation metric for the given true and predicted labels.
92 |
93 |     Parameters:
94 |         metric_fn (function): The metric function to be used for evaluation (e.g., precision, recall, f1-score).
95 |         true_y (array-like): The ground truth (true) labels.
96 |         pred_y (array-like): The predicted labels.
97 |
98 |     Returns:
99 |         float: The calculated metric value.
100 |     """
101 |     if "average" in inspect.getfullargspec(metric_fn).args:
102 |         return metric_fn(true_y, pred_y, average="macro")
103 |     else:
104 |         return metric_fn(true_y, pred_y)
105 |
106 | def print_scores(p, r, f1, a, batch_size):
107 |     for name, scores in zip(("precision", "recall", "F1", "accuracy"), (p, r, f1, a)):
108 |         print(f"\t{name.rjust(14, ' ')}: {sum(scores) / batch_size:.4f}")
109 |
110 | if __name__ == '__main__':
111 |     start_ts = time.time()
112 |
113 |     device = torch.device("cuda:0")
114 |
115 |     epochs = 10
116 |
117 |     model = Imagenet10ResNet18()
118 |     model.to(device)
119 |     # model = torch.nn.DataParallel(model, device_ids=[0, 1])
120 |
121 |     train_loader, val_loader = get_data_loaders()
122 |
123 |     # Initialize training components
124 |     losses = []
125 |     loss_function = nn.CrossEntropyLoss()
126 |     optimizer = optim.Adam(model.parameters(), lr=0.001)
127 |
128 |     batches = len(train_loader)
129 |     val_batches = len(val_loader)
130 |
131 |     # training loop + eval loop
132 |     for epoch in range(epochs):
133 |         total_loss = 0
134 |         progress = tqdm(enumerate(train_loader), desc="Loss: ", total=batches)
135 |         model.train()
136 |
137 |         # training phase
138 |         for i, data in progress:
139 |             X, y = data[0].to(device), data[1].to(device)
140 |
141 |             model.zero_grad()
142 |             outputs = model(X)
143 |             loss = loss_function(outputs, y)
144 |
145 |             loss.backward()
146 |             optimizer.step()
147 |
148 |             current_loss = loss.item()
149 |             total_loss += current_loss
150 |             progress.set_description("Loss: {:.4f}".format(total_loss / (i + 1)))
151 |
152 |         # clear cuda memory after training
153 |         torch.cuda.empty_cache()
154 |
155 |         # inference phase
156 |         val_losses = 0
157 |         precision, recall, f1, accuracy = [], [], [], []
158 |
159 |         model.eval()
160 |         with torch.no_grad():
161 |             for i, data in enumerate(val_loader):
162 |                 X, y = data[0].to(device), data[1].to(device)
163 |                 outputs = model(X)
164 |                 val_losses += loss_function(outputs, y)
165 |
166 |                 predicted_classes = torch.max(outputs, 1)[1]
167 |
168 |                 for acc, metric in zip((precision, recall, f1, accuracy),
169 |                                        (precision_score, recall_score, f1_score, accuracy_score)):
170 |                     acc.append(
171 |                         calculate_metric(metric, y.cpu(), predicted_classes.cpu())
172 |                     )
173 |
174 |         print(f"Epoch {epoch + 1}/{epochs}, training loss: {total_loss / batches}, "
175 |               f"validation loss: {val_losses / val_batches}")
176 |         print_scores(precision, recall, f1, accuracy, val_batches)
177 |         losses.append(total_loss / batches)
178 |
179 |     print(losses)
180 |     print(f"Training time: {time.time() - start_ts}s")
181 |     # Save the fine-tuned weights (the model is not wrapped in DataParallel here,
182 |     # so save model.state_dict() directly); the filename matches the checkpoint
183 |     # that adv_image.py loads
184 |     torch.save(model.state_dict(), 'models/resnet18_imagenet10_transferlearning.pth')
--------------------------------------------------------------------------------
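The script above produces the fine-tuned weights under `models/`. A minimal inference sketch (an editorial illustration, not a file in this repository; it assumes the checkpoint name shipped with the repo and that the hard-coded pretrained ResNet-18 weights referenced in `Imagenet10ResNet18.__init__` are available locally):

```python
import torch
from transfer_learning_clean_imagenet10_0721 import Imagenet10ResNet18

# Build the model (note: the constructor loads the torchvision ResNet-18
# checkpoint from a hard-coded path; adjust that path for your machine).
model = Imagenet10ResNet18()
model.load_state_dict(
    torch.load('models/resnet18_imagenet10_transferlearning.pth', map_location='cpu')
)
model.eval()

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)  # stand-in for a normalized ImageNet-10 image
    probs = model(x)                 # forward() already applies softmax
    print(probs.argmax(dim=1))       # predicted class index
```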