├── images
│   ├── readme.md
│   ├── Method.png
│   ├── t_case2.png
│   ├── table1.png
│   ├── f_s_case2.png
│   ├── ICLR_Figure1.png
│   ├── methods_table.png
│   ├── generative_method.png
│   └── visual_examples.png
└── README.md

/images/readme.md:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/images/Method.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hzlsaber/So-Fake/HEAD/images/Method.png
--------------------------------------------------------------------------------
/images/t_case2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hzlsaber/So-Fake/HEAD/images/t_case2.png
--------------------------------------------------------------------------------
/images/table1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hzlsaber/So-Fake/HEAD/images/table1.png
--------------------------------------------------------------------------------
/images/f_s_case2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hzlsaber/So-Fake/HEAD/images/f_s_case2.png
--------------------------------------------------------------------------------
/images/ICLR_Figure1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hzlsaber/So-Fake/HEAD/images/ICLR_Figure1.png
--------------------------------------------------------------------------------
/images/methods_table.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hzlsaber/So-Fake/HEAD/images/methods_table.png
--------------------------------------------------------------------------------
/images/generative_method.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hzlsaber/So-Fake/HEAD/images/generative_method.png
--------------------------------------------------------------------------------
/images/visual_examples.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hzlsaber/So-Fake/HEAD/images/visual_examples.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection

arXiv PDF | Project Page | Huggingface Models | Dataset | Video
[Zhenglin Huang](https://scholar.google.com/citations?user=30SRxRAAAAAJ&hl=en&oi=ao), [Tianxiao Li](https://tianxiao1201.github.io/), [Xiangtai Li](https://lxtgh.github.io/), [Haiquan Wen](https://orcid.org/0009-0009-3804-6753), [Yiwei He](https://orcid.org/0000-0003-0717-8517), [Jiangning Zhang](https://www.researchgate.net/profile/Jiangning-Zhang), [Hao Fei](https://haofei.vip/), [Xi Yang](https://scholar.google.com/citations?user=ddfKpX0AAAAJ&hl=zh-CN), [Baoyuan Wu](https://sites.google.com/site/baoyuanwu2015/home),
[Bei Peng](https://beipeng.github.io/), [Xiaowei Huang](https://cgi.csc.liv.ac.uk/~xiaowei/), [Guangliang Cheng](https://sites.google.com/view/guangliangcheng/homepage)

Welcome to our work **So-Fake**, for forged image detection on social media.

In this work, we propose:

> ✅ **One Dataset: So-Fake-Set:** A large-scale, diverse dataset tailored for social media image forgery detection!
>
> ✅ **One Benchmark: So-Fake-OOD:** A challenging out-of-distribution benchmark built from real-world Reddit content.
>
> ✅ **One Method: So-Fake-R1:** A unified, explainable vision-language framework optimized via reinforcement learning.

## Abstract
Recent advances in AI-powered generative models have enabled the creation of increasingly realistic synthetic images, posing significant risks to information integrity and public trust on social media platforms. While robust detection frameworks and diverse, large-scale datasets are essential to mitigate these risks, existing academic efforts remain limited in scope: current datasets lack the diversity, scale, and realism required for social media contexts, and evaluation protocols rarely account for explanation or out-of-domain generalization.

To bridge this gap, we introduce **So-Fake**, a comprehensive social media-oriented dataset for forgery detection consisting of two key components. First, we present **So-Fake-Set**, a large-scale dataset with over **2 million** photorealistic images synthesized by a wide range of generative models. Second, to rigorously evaluate cross-domain robustness, we establish **So-Fake-OOD**, a novel and large-scale (**100K**) out-of-domain benchmark sourced from real social media platforms and featuring synthetic imagery from commercial models explicitly excluded from the training distribution, creating a realistic testbed that mirrors actual deployment scenarios. Leveraging these complementary datasets, we present **So-Fake-R1**, a baseline framework that applies reinforcement learning to encourage interpretable visual rationales. Experiments show that So-Fake surfaces substantial challenges for existing methods. By integrating a large-scale dataset, a realistic out-of-domain benchmark, and a multi-dimensional evaluation protocol, So-Fake establishes a new foundation for social media forgery detection research.

## News
- 🔥 (23-05-2025) We are pleased to announce the release of [So-Fake-OOD](https://huggingface.co/datasets/saberzl/So-Fake-OOD).
- 🔥 (29-10-2025) We are pleased to announce the release of [So-Fake-Set](https://huggingface.co/datasets/saberzl/So-Fake-Set).

## Overview
(a) Overview. So-Fake comprises So-Fake-Set (train/val) and So-Fake-OOD (test), which together enable evaluation of detection, localization, and explanation with So-Fake-R1.

(b) Illustrative Example. A real image from the subreddit pics is captioned by an LLM, then combined with Language SAM and an inpainting model to produce tampered samples. So-Fake-R1 then analyzes the manipulated image and outputs the class label, localized region, and an interpretable rationale.
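The README does not spell out this generation pipeline in code; the snippet below is only a minimal, hypothetical sketch of the caption-segment-inpaint loop that caption (b) describes. `caption_image` and `segment_object` are stand-ins for the LLM captioner and Language SAM, and the inpainting checkpoint is an assumption rather than the tooling actually used to build the dataset.

```python
# Hypothetical sketch of the caption -> segment -> inpaint loop from caption (b).
# caption_image / segment_object stand in for the LLM captioner and Language SAM;
# the inpainting checkpoint is an assumption, not the dataset's actual tooling.
import torch
from PIL import Image, ImageDraw
from diffusers import StableDiffusionInpaintPipeline

def caption_image(image: Image.Image) -> str:
    # Stand-in for the LLM captioner: name an object/region to manipulate.
    return "a boy standing on the left"

def segment_object(image: Image.Image, phrase: str) -> Image.Image:
    # Stand-in for Language SAM: return a binary mask (white = region to edit).
    mask = Image.new("L", image.size, 0)
    ImageDraw.Draw(mask).rectangle([64, 64, 256, 448], fill=255)
    return mask

real = Image.open("real_reddit_image.png").convert("RGB").resize((512, 512))
phrase = caption_image(real)
mask = segment_object(real, phrase)

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")
tampered = pipe(prompt=phrase, image=real, mask_image=mask).images[0]
tampered.save("tampered_sample.png")
```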
## Dataset Access

We provide two ways to access So-Fake-OOD:

1. Public access via [Hugging Face](https://huggingface.co/datasets/saberzl/So-Fake-OOD)

2. Download from Google Drive [here](https://drive.google.com/drive/folders/1okP2S6LO-VvH69MDqpeRhYZypfJ0ZHoG?usp=sharing)

**🔥 New**: We have updated So-Fake-OOD to version 2, which you can download via Hugging Face or Google Drive [here](https://drive.google.com/drive/folders/1U30QycEloRncS8iE2VqCbobrrhGNuiPL?usp=sharing)
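For the Hugging Face route, the dataset should load with the standard `datasets` library; the snippet below is a minimal sketch, and the split and column names it prints depend on the published schema rather than anything stated in this README.

```python
# Minimal sketch: load So-Fake-OOD from the Hugging Face Hub with `datasets`.
# Split and column names depend on the published schema; inspect before use.
from datasets import load_dataset

ds = load_dataset("saberzl/So-Fake-OOD")
print(ds)                          # available splits and their sizes
first_split = next(iter(ds.values()))
print(first_split.column_names)    # e.g. image / label / mask, per the schema
print(first_split[0])              # one example
```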
## Method

(a) Overview of the So-Fake-R1 training process; (b) the detailed So-Fake-R1 GRPO training process. The example shows a tampered image in which a boy has been manipulated.
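The reward used for GRPO is not detailed in this README; purely as a rough illustration, a GRPO-style reward covering So-Fake-R1's three outputs (class label, localized region, and rationale format) might look like the sketch below. All terms, tags, and weights are hypothetical.

```python
# Illustrative, hypothetical GRPO-style reward over a sampled response:
# one term each for rationale format, class prediction, and localization IoU.
import re

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def reward(response_text, pred_class, pred_box, gt_class, gt_box):
    # Format term: the response should wrap its rationale in explicit tags.
    r_format = 1.0 if re.search(r"<think>.+</think>", response_text, re.S) else 0.0
    # Classification term: real / fully synthetic / tampered label must match.
    r_class = 1.0 if pred_class == gt_class else 0.0
    # Localization term: only scored when a ground-truth tampered region exists.
    r_loc = box_iou(pred_box, gt_box) if gt_box and pred_box else 0.0
    return r_format + r_class + r_loc  # equal weighting is an assumption
```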
## Generative Methods

Details of the generative methods used in constructing So-Fake-Set and So-Fake-OOD. Column abbreviations: Set = So-Fake-Set, OOD = So-Fake-OOD, F = fully synthetic images, T = tampered images. Real data source abbreviations: F30k = Flickr30k, OI = OpenImages, OF = OpenForensics.
83 | 84 | ## Visual Cases 85 | 86 |
87 |
88 |
Visual Cases of full synthetic images
89 | 90 |
91 |
92 | 93 |
94 |
95 |
Visual Cases of tampered images
96 | 97 |
98 |
## Visual Output

Visual output of So-Fake-R1.
## Citation

```
@article{huang2025sofakebenchmarkingexplainingsocial,
  title={So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection},
  author={Zhenglin Huang and Tianxiao Li and Xiangtai Li and Haiquan Wen and Yiwei He and Jiangning Zhang and Hao Fei and Xi Yang and Xiaowei Huang and Bei Peng and Guangliang Cheng},
  journal={arXiv preprint arXiv:2505.13379},
  year={2025}
}
```
--------------------------------------------------------------------------------