├── README.md
└── demo_code.py

/README.md:
--------------------------------------------------------------------------------
# FineFake
This is the dataset for **FineFake: A Knowledge-Enriched Dataset for Fine-Grained Multi-Domain Fake News Detection**. The overall construction pipeline of FineFake is shown below. The construction code for updating the dataset with the latest news will be released when the paper is accepted.
![construction_00](https://github.com/Accuser907/FineFake/assets/61140633/dbf1af33-9cc8-4f1d-9208-6be46a88fe54)

# Getting Started
Follow the instructions below to download the dataset, which includes text data, metadata, image data and knowledge data.
The dataset covers six topics (Politics, Entertainment, Business, Health, Society, Conflict) and eight platforms (Snopes, Twitter, Reddit, CNN, Apnews, Cdc.gov, Nytimes, Washingtonpost). The dataset and images can be downloaded [here](https://drive.google.com/file/d/16D9ix7ZOisa4VVBznBTBcv1N7TA-jodH/view?usp=sharing).

## DataFrame file
The data is stored as a pickle file and can be loaded into a pandas DataFrame with the following code. Details can be found in demo_code.py.
```python
# pickle is part of the Python standard library; only pandas needs installing:
# pip install pandas
import pickle as pkl
import pandas as pd

with open(file_name, "rb") as f:
    data_df = pkl.load(f)  # data_df is a pandas DataFrame
```
There are 13 columns in the pickle file; each attribute and its corresponding meaning is shown in the table below:
| text | image_path | entity_id | topic | label | fine-grained label | knowledge_embedding | description | relation | platform | author | date | comment |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| news body text | image path (relative) | text-entity wiki id | one of the six topics | binary label | fine-grained label | knowledge embedding | text-entity description | relation | the source platform of the news | author | publication date of the news | comment |

## Labels
For the binary label, "0" represents fake and "1" represents real.
For the fine-grained label, each label and its corresponding meaning is shown in the table below:
| 0 | 1 | 2 | 3 | 4 | 5 |
| ----- | ----- | ----- | ----- | ----- | ----- |
| real | text-image inconsistency | content-knowledge inconsistency | text-based fake | image-based fake | others |

## Guidelines
- FineFake is designed to advance research in fake news detection and must not be used for malicious or harmful purposes. Users should refrain from using the dataset to generate or spread misinformation, manipulate public opinion, or engage in any other activity that could harm individuals, groups, or society at large.
- It is the responsibility of users to ensure that their models and research outcomes are fair and unbiased. Any biases inherent in the dataset must be carefully addressed; if biases are detected, they should be documented and appropriate mitigation strategies applied.
- The FineFake dataset contains data sourced from public domains, but it is essential to respect the privacy and anonymity of individuals. Any attempt to de-anonymize individuals or re-identify entities within the dataset is strictly prohibited. All users must ensure that their research upholds the principles of privacy protection.
--------------------------------------------------------------------------------
/demo_code.py:
--------------------------------------------------------------------------------
# This script demonstrates basic usage of FineFake.
# FineFake is stored as a pandas DataFrame inside a pickle file.
# pickle is part of the Python standard library; install pandas first:
#   pip install pandas

import pickle as pkl
import pandas as pd

if __name__ == "__main__":
    file_path = ""  # the (relative) path of the data file
    with open(file_path, "rb") as f:
        data_df = pkl.load(f)  # data_df is a pandas DataFrame
    # print the column names of data_df
    print(data_df.columns)
    # to extract a certain column, you can convert it to a list:
    # ["news1", "news2", "news3", ...]
    text_list = list(data_df["text"])
    # you can also extract a column as a pandas Series
    text_data = data_df["text"]
    # you can extract a single news item by row index
    news_data = data_df.iloc[0]
--------------------------------------------------------------------------------
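Once loaded, the DataFrame can be sliced with ordinary pandas operations. The sketch below uses a tiny hand-made frame that reuses the FineFake column names and label codes from the tables above; the rows themselves are invented placeholders, not real dataset entries.

```python
import pandas as pd

# A miniature stand-in for the FineFake DataFrame (placeholder rows,
# real column names and label codes from the README tables).
sample = pd.DataFrame({
    "text": ["news1", "news2", "news3", "news4"],
    "topic": ["Politics", "Health", "Politics", "Business"],
    "label": [0, 1, 0, 1],               # 0 = fake, 1 = real
    "fine-grained label": [3, 0, 4, 0],  # codes from the fine-grained table
})

# Keep only fake news (binary label 0)
fake_df = sample[sample["label"] == 0]

# Map fine-grained codes to their readable names
FINE_LABELS = {
    0: "real",
    1: "text-image inconsistency",
    2: "content-knowledge inconsistency",
    3: "text-based fake",
    4: "image-based fake",
    5: "others",
}
fake_df = fake_df.assign(
    fine_name=fake_df["fine-grained label"].map(FINE_LABELS)
)

# Count fake news per topic
per_topic = fake_df["topic"].value_counts().to_dict()
print(per_topic)                    # {'Politics': 2}
print(list(fake_df["fine_name"]))   # ['text-based fake', 'image-based fake']
```

The same filtering applies unchanged to the real `data_df` loaded from the pickle file, since the column names match.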