├── README.md
└── demo_code.py

/README.md:
--------------------------------------------------------------------------------
# FineFake
This is the dataset for **FineFake: A Knowledge-Enriched Dataset for Fine-Grained Multi-Domain Fake News Detection**. The overall construction pipeline of FineFake is shown below. The construction code for updating the dataset with the latest news will be released when the paper is accepted.
![construction_00](https://github.com/Accuser907/FineFake/assets/61140633/dbf1af33-9cc8-4f1d-9208-6be46a88fe54)

# Getting Started
Follow the instructions below to download the dataset, which includes text data, metadata, image data and knowledge data.
The dataset covers six topics (Politics, Entertainment, Business, Health, Society, Conflict) and eight platforms (Snopes, Twitter, Reddit, CNN, Apnews, Cdc.gov, Nytimes, Washingtonpost). The dataset and images can be downloaded [here](https://drive.google.com/file/d/16D9ix7ZOisa4VVBznBTBcv1N7TA-jodH/view?usp=sharing).

## DataFrame file
The data is stored as a pickle file and can be loaded into a pandas DataFrame with the following code. Details can be found in demo_code.py.
```python
# pickle is part of the Python standard library; only pandas needs installing:
# pip install pandas
import pickle as pkl
import pandas as pd

with open(file_name, "rb") as f:
    data_df = pkl.load(f)  # data_df is a pandas DataFrame
```
There are 13 columns in the pickle file; each attribute and its corresponding meaning is shown in the table below:
| text | image_path | entity_id | topic | label | fine-grained label | knowledge_embedding | description | relation | platform | author | date | comment |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| news body text | image path (relative) | text-entity wiki id | one of the six topics | binary label | fine-grained label | knowledge embedding | text-entity description | relation | the source platform of the news | author | publication date of the news | comment |

## Labels
For the binary label, "0" represents fake and "1" represents real.
For the fine-grained label, each label and its corresponding meaning is shown in the table below:
| 0 | 1 | 2 | 3 | 4 | 5 |
| ----- | ----- | ----- | ----- | ----- | ----- |
| real | text-image inconsistency | content-knowledge inconsistency | text-based fake | image-based fake | others |

## Guidelines
- FineFake is designed to advance research in fake news detection and must not be used for malicious or harmful purposes. Users should refrain from using the dataset to generate or spread misinformation, manipulate public opinion, or engage in any other activity that could harm individuals, groups, or society at large.
- It is the responsibility of users to ensure that their models and research outcomes are fair and unbiased. Any biases inherent in the dataset must be carefully addressed; if biases are detected, they should be documented and appropriate mitigation strategies applied.
- The FineFake dataset contains data sourced from public domains, but it is essential to respect the privacy and anonymity of individuals. Any attempt to de-anonymize individuals or re-identify entities within the dataset is strictly prohibited. All users must ensure that their research upholds the principles of privacy protection.
--------------------------------------------------------------------------------
/demo_code.py:
--------------------------------------------------------------------------------
# This script demonstrates basic usage of FineFake.
# FineFake is stored as a pandas DataFrame inside a pickle file.
# pickle is part of the Python standard library; install pandas first:
#   pip install pandas

import pickle as pkl
import pandas as pd

if __name__ == "__main__":
    file_path = ""  # the (relative) path of the data file
    with open(file_path, "rb") as f:
        data_df = pkl.load(f)  # data_df is a pandas DataFrame
    # print the column names of data_df
    print(data_df.columns)
    # to extract a certain column, you can convert it to a list:
    # ["news1", "news2", "news3", ...]
    text_list = list(data_df["text"])
    # you can also extract a column as a pandas Series
    text_data = data_df["text"]
    # you can extract a single news item by row index
    news_data = data_df.iloc[0]
--------------------------------------------------------------------------------
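Once loaded, the DataFrame can be sliced with ordinary pandas operations. The sketch below uses a tiny hand-made frame that reuses the FineFake column names and label codes from the tables above; the rows themselves are invented placeholders, not real dataset entries.

```python
import pandas as pd

# A miniature stand-in for the FineFake DataFrame (placeholder rows,
# real column names and label codes from the README tables).
sample = pd.DataFrame({
    "text": ["news1", "news2", "news3", "news4"],
    "topic": ["Politics", "Health", "Politics", "Business"],
    "label": [0, 1, 0, 1],               # 0 = fake, 1 = real
    "fine-grained label": [3, 0, 4, 0],  # codes from the fine-grained table
})

# Keep only fake news (binary label 0)
fake_df = sample[sample["label"] == 0]

# Map fine-grained codes to their readable names
FINE_LABELS = {
    0: "real",
    1: "text-image inconsistency",
    2: "content-knowledge inconsistency",
    3: "text-based fake",
    4: "image-based fake",
    5: "others",
}
fake_df = fake_df.assign(
    fine_name=fake_df["fine-grained label"].map(FINE_LABELS)
)

# Count fake news per topic
per_topic = fake_df["topic"].value_counts().to_dict()
print(per_topic)                    # {'Politics': 2}
print(list(fake_df["fine_name"]))   # ['text-based fake', 'image-based fake']
```

The same filtering applies unchanged to the real `data_df` loaded from the pickle file, since the column names match.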