# Automatic Sarcasm Detection

***The Shared Task (2nd FigLang Workshop at ACL 2020) is now over. Thanks a lot, participants :)***

Please refer to the `reddit` and `twitter` subdirectories for further details on the datasets.

For both Twitter and Reddit, training and test datasets for the sarcasm detection task are provided in jsonlines format.

Each line contains a JSON object with the following fields:
- ***label***: `SARCASM` or `NOT_SARCASM`
- ***id***: string identifier for the sample. This id is required when making submissions.
  - **ONLY** in test data
- ***response***: the response to be classified, either a Tweet or a Reddit post
- ***context***: the conversation context of the ***response***
  - Note: the context is an ordered list of dialogue turns, i.e., if the context contains three elements, `c1`, `c2`, `c3`, in that order, then `c2` is a reply to `c1` and `c3` is a reply to `c2`. Further, if the response is `r`, then `r` is a reply to `c3`.

For instance, consider the following training example:

`{"label": "SARCASM", "response": "Did Kelly just call someone else messy? Baaaahaaahahahaha", "context": ["X is looking a First Lady should . #classact", "didn't think it was tailored enough it looked messy"]}`

The response tweet, "Did Kelly...", is a reply to its immediate context, "didn't think it was tailored...", which in turn is a reply to "X is looking...".
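The reply-chain ordering of the ***context*** field can be made concrete with a minimal Python sketch. It assumes each split is a local jsonlines file read line by line (actual file names are not given here; see the `reddit` and `twitter` subdirectories). The helper names and the `" [SEP] "` separator are hypothetical illustrations, not part of the shared task's tooling. The sketch shows the difference between the *immediate* context (the last context element) and the *full* context (the whole thread).

```python
import json
from typing import Iterable, Iterator, List, Optional


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse jsonlines input: one JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def reply_chain(sample: dict) -> List[str]:
    """Return the thread oldest-first: c1, c2, ..., cN, then the response.

    Each element is a reply to the element immediately before it.
    """
    return list(sample["context"]) + [sample["response"]]


def immediate_context(sample: dict) -> Optional[str]:
    """The turn the response directly replies to (the last context element)."""
    return sample["context"][-1] if sample["context"] else None


def model_input(sample: dict, full_context: bool = False, sep: str = " [SEP] ") -> str:
    """Hypothetical helper: join context turns and the response into one string.

    `sep` is an arbitrary placeholder separator, not something the dataset defines.
    """
    if full_context:
        turns = reply_chain(sample)
    else:
        prev = immediate_context(sample)
        turns = ([prev] if prev is not None else []) + [sample["response"]]
    return sep.join(turns)


if __name__ == "__main__":
    # The training example from this README, serialized as one jsonlines record.
    example = json.dumps({
        "label": "SARCASM",
        "response": "Did Kelly just call someone else messy? Baaaahaaahahahaha",
        "context": [
            "X is looking a First Lady should . #classact",
            "didn't think it was tailored enough it looked messy",
        ],
    })
    for sample in read_jsonl([example]):
        print(immediate_context(sample))
        print(model_input(sample, full_context=True))
```

To read a split from disk, pass an open file handle, e.g. `with open(path, encoding="utf-8") as f: samples = list(read_jsonl(f))`, where `path` is whichever `.jsonl` file you are using. Only `response` and `context` are accessed, so the same helpers work whether or not `label` or `id` is present in a given split.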
Your goal is to predict the label of the ***response*** while also using its context (i.e., either the immediate context or the full context).

***Dataset size statistics*** (number of samples):

| Platform | Train | Test |
|----------|-------|------|
| Reddit   | 4400  | 1800 |
| Twitter  | 5000  | 1800 |

For the test set, we provide the ***response***, the ***context***, and the ***id*** (identifier) to use when reporting results.

***Submission Instructions***: please follow the instructions in [submission_instructions.pdf](submission_instructions.pdf).

***Main References:***

[A Report on the 2020 Sarcasm Detection Shared Task.](https://www.aclweb.org/anthology/2020.figlang-1.1.pdf) Debanjan Ghosh, Avijit Vajpayee, Smaranda Muresan. Proceedings of the Second Workshop on Figurative Language Processing.

---
***Note***: Because our training data were collected from popular social media platforms, a large portion of the utterances concern controversial, political, or social topics. Although we pre-processed the training data and lightly edited it to remove contentious text, many utterances still express users' controversial perspectives and contain informal language.