├── README.md
└── cifar5m_samples.png


/README.md:
--------------------------------------------------------------------------------
# CIFAR-5m

CIFAR-5m is a dataset of ~6 million synthetic CIFAR-10-like images (RGB, 32 x 32 px).
This dataset was used in the [Deep Bootstrap paper](https://arxiv.org/abs/2010.08127).

It was generated by sampling the DDPM generative model of [Ho et al.](https://github.com/hojonathanho/diffusion),
which was trained on the CIFAR-10 train set.
The unconditional images were then labeled by a 98.5% accurate [Big-Transfer](https://github.com/google-research/big_transfer) model.
Specifically, we used the pretrained BiT-M-R152x2 model, fine-tuned to CIFAR-10.

## Accessing the Dataset
CIFAR-5m is available publicly on Google Cloud Storage as 6 npz files,
accessible at `gs://gresearch/cifar5m/part{i}.npz` for `i` in `{0,...,5}`.
The files are also available via HTTP.

## Samples

![samples](./cifar5m_samples.png)

## Benchmarks
The distribution of CIFAR-5m is of course not identical to that of CIFAR-10,
but it is close enough for research purposes.
The following tables give baseline results from training a network on 50K samples of either dataset (CIFAR-5m or CIFAR-10) and testing on both datasets.
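
As a rough sketch of how such a 50K-sample training subset might be prepared from the parts listed under "Accessing the Dataset": the snippet below enumerates the six part URIs and draws a random subset from a pair of arrays. The helper names `part_uris` and `subsample` are hypothetical, uniform random sampling is an assumption (the README does not say how the 50K samples were chosen), and the channels-last array layout of the stand-in data is likewise assumed rather than documented.

```python
import numpy as np

# Path template taken from the README; the six parts are indexed 0..5.
GCS_TEMPLATE = "gs://gresearch/cifar5m/part{i}.npz"
NUM_PARTS = 6


def part_uris():
    """Return the Cloud Storage URIs of all six npz parts (hypothetical helper)."""
    return [GCS_TEMPLATE.format(i=i) for i in range(NUM_PARTS)]


def subsample(images, labels, n=50_000, seed=0):
    """Draw n examples uniformly at random, without replacement.

    Uniform sampling is an assumption; the README does not state how the
    50K training samples were selected.
    """
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=n, replace=False)
    return images[idx], labels[idx]


# Stand-in arrays (assumed layout: N x 32 x 32 x 3, uint8). Real data would
# come from np.load on a downloaded part; the names of the arrays stored in
# it can be listed with `np.load(path).files`.
fake_images = np.zeros((1_000, 32, 32, 3), dtype=np.uint8)
fake_labels = np.arange(1_000) % 10
x, y = subsample(fake_images, fake_labels, n=100)
```

Fixing the seed keeps the subset reproducible across runs, which matters when the same 50K examples are meant to be reused for several models.
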

ResNet18 trained with standard data-augmentation:

| Trained on ↓ / Test error on → | CIFAR-10 | CIFAR-5m |
|--------------------------------|----------|----------|
| CIFAR-10                       | 0.050    | 0.096    |
| CIFAR-5m                       | 0.110    | 0.106    |

WideResNet28-10 trained with cutout augmentation:

| Trained on ↓ / Test error on → | CIFAR-10 | CIFAR-5m |
|--------------------------------|----------|----------|
| CIFAR-10                       | 0.032    | 0.091    |
| CIFAR-5m                       | 0.088    | 0.097    |


--------------------------------------------------------------------------------
/cifar5m_samples.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/preetum/cifar5m/49e84a3d558c8bc4b84f824e1f6a98fd561c0b40/cifar5m_samples.png
--------------------------------------------------------------------------------