We evaluate our features on a wide range of downstream tasks: unsupervised zero-shot semantic correspondence, monocular depth estimation, semantic segmentation, and classification. We compare our features against standard diffusion features, methods that combine diffusion features with additional features, and non-diffusion-based approaches.
[Figure panels: Input Image, Depth Estimation; Input Image, Semantic Segmentation]

We compare depth estimation and semantic segmentation using linear probes on standard diffusion features and our CleanDIFT features. Note how the CleanDIFT features are far less noisy than the standard diffusion features. Depth probes are trained on the NYUv2 dataset, segmentation probes on PASCAL VOC. Standard diffusion features use t=100 for semantic segmentation and t=300 for depth prediction.
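As a rough illustration of what a linear probe on frozen features means, the following minimal sketch fits a single linear map from per-pixel feature vectors to a depth value. It is not the training code behind the figure: the function names `train_linear_probe` and `predict`, the toy 2-dimensional features, the learning rate, and the use of plain gradient descent on a squared error are all assumptions made for illustration.

```python
# Minimal linear-probe sketch (assumed setup, pure Python, toy data).
# The feature extractor is treated as frozen: only w and b are learned.

def train_linear_probe(features, targets, lr=0.1, epochs=2000):
    """Fit depth ~ w . f + b with full-batch gradient descent on MSE."""
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for f, t in zip(features, targets):
            err = sum(wi * fi for wi, fi in zip(w, f)) + b - t
            for i in range(dim):
                grad_w[i] += 2.0 * err * f[i] / n
            grad_b += 2.0 * err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, feature):
    """Apply the learned probe to one per-pixel feature vector."""
    return sum(wi * fi for wi, fi in zip(w, feature)) + b


# Toy stand-ins for per-pixel features and ground-truth depth.
# The targets follow depth = 2*f0 - f1 + 0.5 exactly.
pixel_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pixel_depths = [0.5, 2.5, -0.5, 1.5]

w, b = train_linear_probe(pixel_features, pixel_depths)
```

Because the probe has no nonlinearity, the quality of its predictions mostly reflects how much depth or class information is linearly readable from the features, which is why noisy features tend to produce noisy probe outputs.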
Zero-shot semantic correspondence matching using DIFT features with standard SD 2.1 (t=261) and our CleanDIFT features. Our clean features show significantly fewer incorrect matches than the standard diffusion features.
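Zero-shot correspondence of this kind is commonly done by nearest-neighbor lookup in feature space: take the feature vector at a source keypoint and find the target location whose feature is most similar. The sketch below assumes cosine similarity and tiny hand-written feature maps stored as nested lists; the helper names `cosine` and `match_keypoint` are hypothetical and do not come from the CleanDIFT or DIFT code.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def match_keypoint(src_feats, tgt_feats, src_xy):
    """Return the target (x, y) whose feature best matches the source keypoint.

    src_feats and tgt_feats are H x W x C nested lists; src_xy is (x, y).
    """
    x, y = src_xy
    query = src_feats[y][x]
    best_score, best_xy = -2.0, None
    for ty, row in enumerate(tgt_feats):
        for tx, feat in enumerate(row):
            score = cosine(query, feat)
            if score > best_score:
                best_score, best_xy = score, (tx, ty)
    return best_xy, best_score


# Toy 2x2 feature maps with 2 channels (illustrative values only).
src = [[[1.0, 0.0], [0.0, 1.0]],
       [[1.0, 1.0], [-1.0, 0.0]]]
tgt = [[[0.0, 2.0], [-3.0, 0.1]],
       [[2.0, 0.1], [1.0, 1.0]]]

match, score = match_keypoint(src, tgt, (0, 0))
```

No training is involved in this matching step, so any noise in the features translates directly into wrong nearest neighbors.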