# [Study] 거꾸로 읽는 self-supervised-learning Season 3: Visual Language Models

- [Application Google Form (registration closed 2/18)](https://forms.gle/f7ZmUkfxNxBT1VUD8)
- 거꾸로 읽는 SSL has now moved on to the VLM field! :)
- We focus on Visual Language Model papers, a field that has been advancing rapidly since 2021, and review the ones that turned out to be significant.
- For each paper, we have a fun discussion about the characteristics of the proposed method and why it is regarded as historically important.
- [Paper list Google Sheet](https://docs.google.com/spreadsheets/d/1P-pACgU9G0xq6M9Gufad-3tLUBavSMyUL0NIdd6TVH8/edit#gid=542739927)

## Schedule (tentative)
- 2023 3/4 ~ 6/3 (14 weeks)

## Papers and Presentation Order
Week | Type | Paper title | Affiliation | arXiv date | Speaker | YouTube
-- | -- | -- | -- | -- | -- | --
Week 1 | VLM benchmarks and metrics | Introduction to VLM benchmarks and metrics | | | 강재욱 | [youtube1](https://youtu.be/NgxSbyoiQYM), [youtube2](https://youtu.be/8ofFVYPS8vA)
Week 2 | Vision transformer | ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | Google | 2020 Oct | 이인규 | [youtube](https://youtu.be/7lxKirsixuQ)
Week 3 | Dual encoder | CLIP: Learning Transferable Visual Models From Natural Language Supervision | OpenAI | 2021 Feb | 김희은 | [youtube](https://youtu.be/BoDFT85-Z8U)
Week 4 | Image-text matching | Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers | MS | 2020 Apr | 신성호 | [youtube](https://youtu.be/SUgK6EUlbA0)
Week 5 | Image-text contrastive learning | ALBEF: Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | Salesforce | 2021 Jul | 이유경 | [youtube](https://youtu.be/O-tQ-hgCmQo)
Week 6 | Masked image modeling | BEiT: BERT Pre-Training of Image Transformers | MS | 2021 Jun | 박민지 |
Week 7 | Masked VLM | Masked Vision and Language Modeling for Multi-modal Representation Learning | Amazon | 2022 Aug | 김강민 | [youtube](https://youtu.be/1Sil6dTXM-8)
Week 8 | Multimodal fusion by MoE | VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts | MS | 2021 Nov | 백혜림 | [youtube](https://youtu.be/ZYpmDtdeSk4)
Week 9 | Multimodal fusion by merged attention | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | Google | 2021 Aug | 정윤성 | [youtube](https://youtu.be/OMDu5r-Rm-c)
Week 10 | Multimodal fusion by co-attention | CoCa: Contrastive Captioners are Image-Text Foundation Models | Google | 2022 May | 김승우 | [youtube](https://youtu.be/JSfQQtZ3Ios)
Week 11 | Few-shot learning in VLM | Flamingo: a Visual Language Model for Few-Shot Learning | DeepMind | 2022 Apr | 조성국 | [youtube](https://youtu.be/ihdHsoPin84)
Week 12 | Model scaling for VLM 1 | GIT: A Generative Image-to-text Transformer for Vision and Language | MS | 2022 May | 김기범 | [youtube](https://youtu.be/CIdlpeIGDuM)
Week 13 | Model scaling for VLM 2 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | Google | 2022 Sep | 이영수 |
Week 14 | Wrap-up | Recap of the overall flow | | | 강재욱 |

## Related Links
- [거꾸로 읽는 SSL Season 1](https://youtube.com/playlist?list=PLMSTs9nojhszOnaAwOg42NEsH_Jn6405o)
- [거꾸로 읽는 SSL Season 2](https://youtube.com/playlist?list=PLMSTs9nojhszeFer8gYnEI5yA5JenWzEA)
- [거꾸로 읽는 SSL YouTube channel](https://www.youtube.com/channel/UCTwcUmKhqeBhG0rQHkPVP6Q)