├── README.md
├── LICENSE
└── main.py


/README.md:
--------------------------------------------------------------------------------
# InMaP
PyTorch Implementation for Our NeurIPS'23 Paper: "Intra-Modal Proxy Learning for Zero-Shot Visual Categorization with CLIP"

## Requirements
* Python 3.9
* PyTorch 1.12
* [CLIP](https://github.com/openai/CLIP)

## Usage:
InMaP with pre-trained ResNet-50
```
python main.py -a RN50 --data_path /path/to/imagenet
```

## Citation
If you use the package in your research, please cite our paper:
```
@inproceedings{qian2023inmap,
  author    = {Qi Qian and
               Yuanhong Xu and
               Juhua Hu},
  title     = {Intra-Modal Proxy Learning for Zero-Shot Visual Categorization with CLIP},
  booktitle = {Thirty-seventh Conference on Neural Information Processing Systems, {NeurIPS} 2023},
  year      = {2023}
}
```


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!) The text should be enclosed in the appropriate
      comment syntax for the file format.
      We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
# Copyright (c) Alibaba Group
import argparse
import torch
import torchvision.datasets as datasets
import torch.nn.functional as F
import clip
import os

model_names = ['RN50', 'ViT-B/32', 'ViT-B/16', 'ViT-L/14', 'ViT-L/14@336px']
parser = argparse.ArgumentParser(description='InMaP for ImageNet')
parser.add_argument('--data_path', default='/path/to/imagenet', type=str,
                    help='dataset path')
parser.add_argument('-a', '--arch', metavar='ARCH', default='RN50',
                    choices=model_names,
                    help='model architecture: ' +
                         ' | '.join(model_names) +
                         ' (default: RN50)')
parser.add_argument('-j', '--workers', default=8, type=int, metavar='N',
                    help='number of data loading workers (default: 8)')
parser.add_argument('--iters_proxy', default=2000, type=int, metavar='N',
                    help='number of total iterations for learning vision proxy')
parser.add_argument('--iters_sinkhorn', default=20, type=int, metavar='N',
                    help='number of total iterations for optimizing Sinkhorn distance')
parser.add_argument('-b', '--batch-size', default=256, type=int,
                    metavar='N',
                    help='mini-batch size (default: 256)')
parser.add_argument('--lr', '--learning-rate', default=10, type=float,
                    metavar='LR', help='initial learning rate', dest='lr')
parser.add_argument('--tau_t', default=0.01, type=float)
parser.add_argument('--tau_i', default=0.04, type=float)
parser.add_argument('--alpha', default=0.6, type=float)
parser.add_argument('--gamma', default=0.0, type=float)

def main():
    imagenet_classes = ["tench", "goldfish", "great white shark", "tiger shark", "hammerhead shark", "electric ray",
                        "stingray", "rooster", "hen", "ostrich", "brambling", "goldfinch", "house finch", "junco",
                        "indigo bunting", "American robin", "bulbul", "jay", "magpie", "chickadee", "American dipper",
                        "kite (bird of prey)", "bald eagle", "vulture", "great grey owl", "fire salamander",
                        "smooth newt", "newt", "spotted salamander", "axolotl", "American bullfrog", "tree frog",
                        "tailed frog", "loggerhead sea turtle", "leatherback sea turtle", "mud turtle", "terrapin",
                        "box turtle", "banded gecko", "green iguana", "Carolina anole",
                        "desert grassland whiptail lizard", "agama", "frilled-necked lizard", "alligator lizard",
                        "Gila monster", "European green lizard", "chameleon", "Komodo dragon", "Nile crocodile",
                        "American alligator", "triceratops", "worm snake", "ring-necked snake",
                        "eastern hog-nosed snake", "smooth green snake", "kingsnake", "garter snake", "water snake",
                        "vine snake", "night snake", "boa constrictor", "African rock python", "Indian cobra",
                        "green mamba", "sea snake", "Saharan horned viper", "eastern diamondback rattlesnake",
                        "sidewinder rattlesnake", "trilobite", "harvestman", "scorpion", "yellow garden spider",
                        "barn spider", "European garden spider", "southern black widow", "tarantula", "wolf spider",
                        "tick", "centipede", "black grouse", "ptarmigan", "ruffed grouse", "prairie grouse", "peafowl",
                        "quail", "partridge", "african grey parrot", "macaw", "sulphur-crested cockatoo", "lorikeet",
                        "coucal", "bee eater", "hornbill", "hummingbird", "jacamar", "toucan", "duck",
                        "red-breasted merganser", "goose", "black swan", "tusker", "echidna", "platypus", "wallaby",
                        "koala", "wombat", "jellyfish", "sea anemone", "brain coral", "flatworm", "nematode", "conch",
                        "snail", "slug", "sea slug", "chiton", "chambered nautilus", "Dungeness crab", "rock crab",
                        "fiddler crab", "red king crab", "American lobster", "spiny lobster", "crayfish", "hermit crab",
                        "isopod", "white stork", "black stork", "spoonbill", "flamingo", "little blue heron",
                        "great egret", "bittern bird", "crane bird", "limpkin", "common gallinule", "American coot",
                        "bustard", "ruddy turnstone", "dunlin", "common redshank", "dowitcher", "oystercatcher",
                        "pelican", "king penguin", "albatross", "grey whale", "killer whale", "dugong", "sea lion",
                        "Chihuahua", "Japanese Chin", "Maltese", "Pekingese", "Shih Tzu", "King Charles Spaniel",
                        "Papillon", "toy terrier", "Rhodesian Ridgeback", "Afghan Hound", "Basset Hound", "Beagle",
                        "Bloodhound", "Bluetick Coonhound", "Black and Tan Coonhound", "Treeing Walker Coonhound",
                        "English foxhound", "Redbone Coonhound", "borzoi", "Irish Wolfhound", "Italian Greyhound",
                        "Whippet", "Ibizan Hound", "Norwegian Elkhound", "Otterhound", "Saluki", "Scottish Deerhound",
                        "Weimaraner", "Staffordshire Bull Terrier", "American Staffordshire Terrier",
                        "Bedlington Terrier", "Border Terrier", "Kerry Blue Terrier", "Irish Terrier",
                        "Norfolk Terrier", "Norwich Terrier", "Yorkshire Terrier", "Wire Fox Terrier",
                        "Lakeland Terrier", "Sealyham Terrier", "Airedale Terrier", "Cairn Terrier",
                        "Australian Terrier", "Dandie Dinmont Terrier", "Boston Terrier", "Miniature Schnauzer",
                        "Giant Schnauzer", "Standard Schnauzer", "Scottish Terrier", "Tibetan Terrier",
                        "Australian Silky Terrier", "Soft-coated Wheaten Terrier", "West Highland White Terrier",
                        "Lhasa Apso", "Flat-Coated Retriever", "Curly-coated Retriever", "Golden Retriever",
                        "Labrador Retriever", "Chesapeake Bay Retriever", "German Shorthaired Pointer", "Vizsla",
                        "English Setter", "Irish Setter", "Gordon Setter", "Brittany dog", "Clumber Spaniel",
                        "English Springer Spaniel", "Welsh Springer Spaniel", "Cocker Spaniel", "Sussex Spaniel",
                        "Irish Water Spaniel", "Kuvasz", "Schipperke", "Groenendael dog", "Malinois", "Briard",
                        "Australian Kelpie", "Komondor", "Old English Sheepdog", "Shetland Sheepdog", "collie",
                        "Border Collie", "Bouvier des Flandres dog", "Rottweiler", "German Shepherd Dog", "Dobermann",
                        "Miniature Pinscher", "Greater Swiss Mountain Dog", "Bernese Mountain Dog",
                        "Appenzeller Sennenhund", "Entlebucher Sennenhund", "Boxer", "Bullmastiff", "Tibetan Mastiff",
                        "French Bulldog", "Great Dane", "St. Bernard", "husky", "Alaskan Malamute", "Siberian Husky",
                        "Dalmatian", "Affenpinscher", "Basenji", "pug", "Leonberger", "Newfoundland dog",
                        "Great Pyrenees dog", "Samoyed", "Pomeranian", "Chow Chow", "Keeshond", "brussels griffon",
                        "Pembroke Welsh Corgi", "Cardigan Welsh Corgi", "Toy Poodle", "Miniature Poodle",
                        "Standard Poodle", "Mexican hairless dog (xoloitzcuintli)", "grey wolf", "Alaskan tundra wolf",
                        "red wolf or maned wolf", "coyote", "dingo", "dhole", "African wild dog", "hyena", "red fox",
                        "kit fox", "Arctic fox", "grey fox", "tabby cat", "tiger cat", "Persian cat", "Siamese cat",
                        "Egyptian Mau", "cougar", "lynx", "leopard", "snow leopard", "jaguar", "lion", "tiger",
                        "cheetah", "brown bear", "American black bear", "polar bear", "sloth bear", "mongoose",
                        "meerkat", "tiger beetle", "ladybug", "ground beetle", "longhorn beetle", "leaf beetle",
                        "dung beetle", "rhinoceros beetle", "weevil", "fly", "bee", "ant", "grasshopper",
                        "cricket insect", "stick insect", "cockroach", "praying mantis", "cicada", "leafhopper",
                        "lacewing", "dragonfly", "damselfly", "red admiral butterfly", "ringlet butterfly",
                        "monarch butterfly", "small white butterfly", "sulphur butterfly", "gossamer-winged butterfly",
                        "starfish", "sea urchin", "sea cucumber", "cottontail rabbit", "hare", "Angora rabbit",
                        "hamster", "porcupine", "fox squirrel", "marmot", "beaver", "guinea pig", "common sorrel horse",
                        "zebra", "pig", "wild boar", "warthog", "hippopotamus", "ox", "water buffalo", "bison",
                        "ram (adult male sheep)", "bighorn sheep", "Alpine ibex", "hartebeest", "impala (antelope)",
                        "gazelle", "arabian camel", "llama", "weasel", "mink", "European polecat",
                        "black-footed ferret", "otter", "skunk", "badger", "armadillo", "three-toed sloth", "orangutan",
                        "gorilla", "chimpanzee", "gibbon", "siamang", "guenon", "patas monkey", "baboon", "macaque",
                        "langur", "black-and-white colobus", "proboscis monkey", "marmoset", "white-headed capuchin",
                        "howler monkey", "titi monkey", "Geoffroy's spider monkey", "common squirrel monkey",
                        "ring-tailed lemur", "indri", "Asian elephant", "African bush elephant", "red panda",
                        "giant panda", "snoek fish", "eel", "silver salmon", "rock beauty fish", "clownfish",
                        "sturgeon", "gar fish", "lionfish", "pufferfish", "abacus", "abaya", "academic gown",
                        "accordion", "acoustic guitar", "aircraft carrier", "airliner", "airship", "altar", "ambulance",
                        "amphibious vehicle", "analog clock", "apiary", "apron", "trash can", "assault rifle",
                        "backpack", "bakery", "balance beam", "balloon", "ballpoint pen", "Band-Aid", "banjo",
                        "baluster / handrail", "barbell", "barber chair", "barbershop", "barn", "barometer", "barrel",
                        "wheelbarrow", "baseball", "basketball", "bassinet", "bassoon", "swimming cap", "bath towel",
                        "bathtub", "station wagon", "lighthouse", "beaker", "military hat (bearskin or shako)",
                        "beer bottle", "beer glass", "bell tower", "baby bib", "tandem bicycle", "bikini",
                        "ring binder", "binoculars", "birdhouse", "boathouse", "bobsleigh", "bolo tie", "poke bonnet",
                        "bookcase", "bookstore", "bottle cap", "hunting bow", "bow tie", "brass memorial plaque", "bra",
                        "breakwater", "breastplate", "broom", "bucket", "buckle", "bulletproof vest",
                        "high-speed train", "butcher shop", "taxicab", "cauldron", "candle", "cannon", "canoe",
                        "can opener", "cardigan", "car mirror", "carousel", "tool kit", "cardboard box / carton",
                        "car wheel", "automated teller machine", "cassette", "cassette player", "castle", "catamaran",
                        "CD player", "cello", "mobile phone", "chain", "chain-link fence", "chain mail", "chainsaw",
                        "storage chest", "chiffonier", "bell or wind chime", "china cabinet", "Christmas stocking",
                        "church", "movie theater", "cleaver", "cliff dwelling", "cloak", "clogs", "cocktail shaker",
                        "coffee mug", "coffeemaker", "spiral or coil", "combination lock", "computer keyboard",
                        "candy store", "container ship", "convertible", "corkscrew", "cornet", "cowboy boot",
                        "cowboy hat", "cradle", "construction crane", "crash helmet", "crate", "infant bed",
                        "Crock Pot", "croquet ball", "crutch", "cuirass", "dam", "desk", "desktop computer",
                        "rotary dial telephone", "diaper", "digital clock", "digital watch", "dining table",
                        "dishcloth", "dishwasher", "disc brake", "dock", "dog sled", "dome", "doormat", "drilling rig",
                        "drum", "drumstick", "dumbbell", "Dutch oven", "electric fan", "electric guitar",
                        "electric locomotive", "entertainment center", "envelope", "espresso machine", "face powder",
                        "feather boa", "filing cabinet", "fireboat", "fire truck", "fire screen", "flagpole", "flute",
                        "folding chair", "football helmet", "forklift", "fountain", "fountain pen", "four-poster bed",
                        "freight car", "French horn", "frying pan", "fur coat", "garbage truck",
                        "gas mask or respirator", "gas pump", "goblet", "go-kart", "golf ball", "golf cart", "gondola",
                        "gong", "gown", "grand piano", "greenhouse", "radiator grille", "grocery store", "guillotine",
                        "hair clip", "hair spray", "half-track", "hammer", "hamper", "hair dryer", "hand-held computer",
                        "handkerchief", "hard disk drive", "harmonica", "harp", "combine harvester", "hatchet",
                        "holster", "home theater", "honeycomb", "hook", "hoop skirt", "gymnastic horizontal bar",
                        "horse-drawn vehicle", "hourglass", "iPod", "clothes iron", "carved pumpkin", "jeans", "jeep",
                        "T-shirt", "jigsaw puzzle", "rickshaw", "joystick", "kimono", "knee pad", "knot", "lab coat",
                        "ladle", "lampshade", "laptop computer", "lawn mower", "lens cap", "letter opener", "library",
                        "lifeboat", "lighter", "limousine", "ocean liner", "lipstick", "slip-on shoe", "lotion",
                        "music speaker", "loupe magnifying glass", "sawmill", "magnetic compass", "messenger bag",
                        "mailbox", "tights", "one-piece bathing suit", "manhole cover", "maraca", "marimba", "mask",
                        "matchstick", "maypole", "maze", "measuring cup", "medicine cabinet", "megalith", "microphone",
                        "microwave oven", "military uniform", "milk can", "minibus", "miniskirt", "minivan", "missile",
                        "mitten", "mixing bowl", "mobile home", "ford model t", "modem", "monastery", "monitor",
                        "moped", "mortar and pestle", "graduation cap", "mosque", "mosquito net", "vespa",
                        "mountain bike", "tent", "computer mouse", "mousetrap", "moving van", "muzzle", "metal nail",
                        "neck brace", "necklace", "baby pacifier", "notebook computer", "obelisk", "oboe", "ocarina",
                        "odometer", "oil filter", "pipe organ", "oscilloscope", "overskirt", "bullock cart",
                        "oxygen mask", "product packet / packaging", "paddle", "paddle wheel", "padlock", "paintbrush",
                        "pajamas", "palace", "pan flute", "paper towel", "parachute", "parallel bars", "park bench",
                        "parking meter", "railroad car", "patio", "payphone", "pedestal", "pencil case",
                        "pencil sharpener", "perfume", "Petri dish", "photocopier", "plectrum", "Pickelhaube",
                        "picket fence", "pickup truck", "pier", "piggy bank", "pill bottle", "pillow", "ping-pong ball",
                        "pinwheel", "pirate ship", "drink pitcher", "block plane", "planetarium", "plastic bag",
                        "plate rack", "farm plow", "plunger", "Polaroid camera", "pole", "police van", "poncho",
                        "pool table", "soda bottle", "plant pot", "potter's wheel", "power drill", "prayer rug",
                        "printer", "prison", "missile", "projector", "hockey puck", "punching bag", "purse", "quill",
                        "quilt", "race car", "racket", "radiator", "radio", "radio telescope", "rain barrel",
                        "recreational vehicle", "fishing casting reel", "reflex camera", "refrigerator",
                        "remote control", "restaurant", "revolver", "rifle", "rocking chair", "rotisserie", "eraser",
                        "rugby ball", "ruler measuring stick", "sneaker", "safe", "safety pin", "salt shaker", "sandal",
                        "sarong", "saxophone", "scabbard", "weighing scale", "school bus", "schooner", "scoreboard",
                        "CRT monitor", "screw", "screwdriver", "seat belt", "sewing machine", "shield", "shoe store",
                        "shoji screen / room divider", "shopping basket", "shopping cart", "shovel", "shower cap",
                        "shower curtain", "ski", "balaclava ski mask", "sleeping bag", "slide rule", "sliding door",
                        "slot machine", "snorkel", "snowmobile", "snowplow", "soap dispenser", "soccer ball", "sock",
                        "solar thermal collector", "sombrero", "soup bowl", "keyboard space bar", "space heater",
                        "space shuttle", "spatula", "motorboat", "spider web", "spindle", "sports car", "spotlight",
                        "stage", "steam locomotive", "through arch bridge", "steel drum", "stethoscope", "scarf",
                        "stone wall", "stopwatch", "stove", "strainer", "tram", "stretcher", "couch", "stupa",
                        "submarine", "suit", "sundial", "sunglasses", "sunglasses", "sunscreen", "suspension bridge",
                        "mop", "sweatshirt", "swim trunks / shorts", "swing", "electrical switch", "syringe",
                        "table lamp", "tank", "tape player", "teapot", "teddy bear", "television", "tennis ball",
                        "thatched roof", "front curtain", "thimble", "threshing machine", "throne", "tile roof",
                        "toaster", "tobacco shop", "toilet seat", "torch", "totem pole", "tow truck", "toy store",
                        "tractor", "semi-trailer truck", "tray", "trench coat", "tricycle", "trimaran", "tripod",
                        "triumphal arch", "trolleybus", "trombone", "hot tub", "turnstile", "typewriter keyboard",
                        "umbrella", "unicycle", "upright piano", "vacuum cleaner", "vase", "vaulted or arched ceiling",
                        "velvet fabric", "vending machine", "vestment", "viaduct", "violin", "volleyball",
                        "waffle iron", "wall clock", "wallet", "wardrobe", "military aircraft", "sink",
                        "washing machine", "water bottle", "water jug", "water tower", "whiskey jug", "whistle",
                        "hair wig", "window screen", "window shade", "Windsor tie", "wine bottle", "airplane wing",
                        "wok", "wooden spoon", "wool", "split-rail fence", "shipwreck", "sailboat", "yurt", "website",
                        "comic book", "crossword", "traffic or street sign", "traffic light", "dust jacket", "menu",
                        "plate", "guacamole", "consomme", "hot pot", "trifle", "ice cream", "popsicle", "baguette",
                        "bagel", "pretzel", "cheeseburger", "hot dog", "mashed potatoes", "cabbage", "broccoli",
                        "cauliflower", "zucchini", "spaghetti squash", "acorn squash", "butternut squash", "cucumber",
                        "artichoke", "bell pepper", "cardoon", "mushroom", "Granny Smith apple", "strawberry", "orange",
                        "lemon", "fig", "pineapple", "banana", "jackfruit", "cherimoya (custard apple)", "pomegranate",
                        "hay", "carbonara", "chocolate syrup", "dough", "meatloaf", "pizza", "pot pie", "burrito",
                        "red wine", "espresso", "tea cup", "eggnog", "mountain", "bubble", "cliff", "coral reef",
                        "geyser", "lakeshore", "promontory", "sandbar", "beach", "valley", "volcano", "baseball player",
                        "bridegroom", "scuba diver", "rapeseed", "daisy", "yellow lady's slipper", "corn", "acorn",
                        "rose hip", "horse chestnut seed", "coral fungus", "agaric", "gyromitra", "stinkhorn mushroom",
                        "earth star fungus", "hen of the woods mushroom", "bolete", "corn cob", "toilet paper"]

    imagenet_single_template = [
        'a photo of a {}.',
    ]

    imagenet_7_templates = [
        'itap of a {}.',
        'a origami {}.',
        'a bad photo of the {}.',
        'a photo of the large {}.',
        'a {} in a video game.',
        'art of the {}.',
        'a photo of the small {}.',
    ]

    args = parser.parse_args()
    print(args)

    print('load pre-trained model')
    model, preprocess = clip.load(args.arch)
    model = model.cuda()
    model.eval()

    print('load data')
    valdir = os.path.join(args.data_path, 'val')
    val_set = datasets.ImageFolder(valdir, transform=preprocess)
    loader = torch.utils.data.DataLoader(val_set, batch_size=args.batch_size, num_workers=args.workers)
    # Encode the whole validation set once; every later step reuses these features.
    with torch.no_grad():
        image_feat = []
        image_label = []
        for i, (images, target) in enumerate(loader):
            images = images.cuda()
            target = target.cuda()
            image_features = model.encode_image(images)
            image_feat.append(F.normalize(image_features, dim=1))
            image_label.append(target)
    image_feat = torch.cat(image_feat, dim=0)
    image_label = torch.cat(image_label, dim=0)
    n = len(image_label)
    image_feat = image_feat.float()  # can keep fp16 for efficiency on GPU

    print('obtain text proxy')
    text_classifier = zeroshot_classifier(clip, model, imagenet_classes, imagenet_7_templates)
    text_classifier = text_classifier.float()
    logits_t = image_feat @ text_classifier
    acc1, acc5 = accuracy(logits_t, image_label, topk=(1, 5))
    top1 = (acc1 / n) * 100
    print(f"accuracy with text proxy: {top1:.2f}")

    print('obtain vision proxy without Sinkhorn distance')
    plabel = F.softmax(logits_t / args.tau_t, dim=1)
    image_classifier = image_opt(image_feat, text_classifier, plabel, args.lr, args.iters_proxy, args.tau_i, args.alpha)
    logits_i = image_feat @ image_classifier
    acc1, acc5 = accuracy(logits_i, image_label, topk=(1, 5))
    top1 = (acc1 / n) * 100
    print(f"accuracy with image proxy: {top1:.2f}")

    print('obtain refined labels by Sinkhorn distance')
    plabel = sinkhorn(logits_t, args.tau_t, args.gamma, args.iters_sinkhorn)

    print('obtain vision proxy with Sinkhorn distance')
    image_classifier = image_opt(image_feat, text_classifier, plabel, args.lr, args.iters_proxy, args.tau_i, args.alpha)
    logits_i = image_feat @ image_classifier
    acc1, acc5 = accuracy(logits_i, image_label, topk=(1, 5))
    top1 = (acc1 / n) * 100
    print(f"accuracy with image proxy + sinkhorn: {top1:.2f}")


def zeroshot_classifier(clip, model, classnames, templates):
    # Build one unit-norm text embedding per class by averaging over the prompt templates.
    with torch.no_grad():
        zeroshot_weights = []
        for classname in classnames:
            texts = [template.format(classname) for template in templates]
            texts = clip.tokenize(texts).cuda()
            class_embeddings = model.encode_text(texts)
            class_embeddings /= class_embeddings.norm(dim=-1, keepdim=True)
            class_embedding = class_embeddings.mean(dim=0)
            class_embedding /= class_embedding.norm()
            zeroshot_weights.append(class_embedding)
        zeroshot_weights = torch.stack(zeroshot_weights, dim=1).cuda()
    return zeroshot_weights


def image_opt(feat, init_classifier, plabel, lr=10, iter=2000, tau_i=0.04, alpha=0.6):
    # Note: plabel is modified in place; confident rows (max > alpha) become one-hot.
    ins, dim = feat.shape
    val, idx = torch.max(plabel, dim=1)
    mask = val > alpha
    plabel[mask, :] = 0
    plabel[mask, idx[mask]] = 1
    base = feat.T @ plabel
    classifier = init_classifier.clone()
    pre_norm = float('inf')
    for i in range(0, iter):
        prob = F.softmax(feat @ classifier / tau_i, dim=1)
        grad = feat.T @ prob - base
        temp = torch.norm(grad)
        # halve the step size whenever the gradient norm grows
        if temp > pre_norm:
            lr /= 2.
        pre_norm = temp
        classifier -= (lr / (ins * tau_i)) * grad
        classifier = F.normalize(classifier, dim=0)
    return classifier


def sinkhorn(M, tau_t=0.01, gamma=0, iter=20):
    row, col = M.shape
    P = F.softmax(M / tau_t, dim=1)
    P /= row
    if gamma > 0:
        q = torch.sum(P, dim=0, keepdim=True)
        q = q**gamma
        q /= torch.sum(q)
    for it in range(0, iter):
        # total weight per column must be 1/col or q_j
        P /= torch.sum(P, dim=0, keepdim=True)
        if gamma > 0:
            P *= q
        else:
            P /= col
        # total weight per row must be 1/row
        P /= torch.sum(P, dim=1, keepdim=True)
        P /= row
    P *= row  # keep each row sum to 1 as the pseudo label
    return P


def accuracy(output, target, topk=(1,)):
    # Returns the number of correct predictions (a count, not a percentage) for each k in topk.
    pred = output.topk(max(topk), 1, True, True)[1].t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))
    return [correct[:k].reshape(-1).float().sum().item() for k in topk]


if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
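The alternating normalisation in `sinkhorn()` can be tried without a GPU or ImageNet. The following is only a minimal sketch, assuming the uniform-column case (`gamma == 0`) and using plain Python lists in place of `torch` tensors; the helper names `softmax_rows` and `sinkhorn_uniform` and the toy logits are hypothetical and not part of the repository.

```python
import math


def softmax_rows(logits, tau):
    """Row-wise softmax of logits / tau (max-subtracted for numerical stability)."""
    out = []
    for row in logits:
        m = max(row)
        exps = [math.exp((v - m) / tau) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def sinkhorn_uniform(logits, tau=0.01, iters=20):
    """Hypothetical list-based analogue of the gamma == 0 branch of sinkhorn()."""
    n_rows, n_cols = len(logits), len(logits[0])
    P = [[p / n_rows for p in row] for row in softmax_rows(logits, tau)]
    for _ in range(iters):
        # rescale every column so that it carries total mass 1 / n_cols
        col_sums = [sum(P[i][j] for i in range(n_rows)) for j in range(n_cols)]
        P = [[P[i][j] / col_sums[j] / n_cols for j in range(n_cols)] for i in range(n_rows)]
        # rescale every row so that it carries total mass 1 / n_rows
        row_sums = [sum(r) for r in P]
        P = [[v / row_sums[i] / n_rows for v in P[i]] for i in range(n_rows)]
    # multiply by n_rows so each row sums to 1 and can serve as a pseudo label
    return [[v * n_rows for v in r] for r in P]


# Toy logits: plain softmax would assign three of the four rows to class 0.
toy_logits = [[0.30, 0.25], [0.28, 0.22], [0.31, 0.27], [0.21, 0.26]]
P = sinkhorn_uniform(toy_logits)
for r in P:
    print([round(v, 3) for v in r])
print("column mass:", [round(sum(r[j] for r in P), 3) for j in range(2)])
```

On this toy input each row still sums to 1, while the column masses move toward an even split across the two classes, which is the balancing effect the refined pseudo labels rely on.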