├── NOTICE
├── prompts
│   ├── rhyming
│   │   ├── open_5.txt
│   │   ├── closed-adv_5.txt
│   │   └── closed_5.txt
│   ├── phrase_relatedness
│   │   ├── open_5.txt
│   │   ├── open_5-info_added.txt
│   │   ├── closed-adv_5.txt
│   │   ├── closed_5.txt
│   │   ├── closed-adv_5-info_added.txt
│   │   └── closed_5-info_added.txt
│   ├── hindu_knowledge
│   │   ├── open_5-info_removed.txt
│   │   ├── open_5.txt
│   │   ├── closed-adv_5-info_removed.txt
│   │   ├── closed_5-info_removed.txt
│   │   ├── closed-adv_5.txt
│   │   └── closed_5.txt
│   ├── common_morpheme
│   │   ├── open-info_removed_5.txt
│   │   ├── open_5.txt
│   │   ├── closed-adv-info_removed_5.txt
│   │   ├── closed-info_removed_5.txt
│   │   ├── closed-adv_5.txt
│   │   └── closed_5.txt
│   ├── figure_of_speech_detection
│   │   ├── open_5.txt
│   │   ├── open_5-info_added.txt
│   │   ├── closed_5.txt
│   │   ├── closed-adv_5.txt
│   │   ├── closed_5-info_added.txt
│   │   └── closed-adv_5-info_added.txt
│   ├── misconceptions
│   │   ├── open_5.txt
│   │   ├── closed-adv_5.txt
│   │   └── closed_5.txt
│   ├── strategyqa
│   │   ├── open_5.txt
│   │   ├── closed-adv_5.txt
│   │   └── closed_5.txt
│   ├── analytic_entailment
│   │   ├── open_5.txt
│   │   ├── closed-adv_5.txt
│   │   └── closed_5.txt
│   ├── commonsense_qa
│   │   ├── open_5.txt
│   │   ├── closed-adv_5.txt
│   │   └── closed_5.txt
│   ├── implicatures
│   │   ├── open_5.txt
│   │   ├── closed-adv_5.txt
│   │   ├── closed_5.txt
│   │   ├── open_5-info_added.txt
│   │   ├── closed-adv_5-info_added.txt
│   │   └── closed_5-info_added.txt
│   ├── nonsense_words_grammar
│   │   ├── open_5.txt
│   │   ├── closed-adv_5.txt
│   │   └── closed_5.txt
│   ├── logical_deduction
│   │   ├── open_5-info_removed.txt
│   │   ├── closed_3.txt
│   │   ├── open_5.txt
│   │   ├── closed-adv_5-info_removed.txt
│   │   ├── closed_5-info_removed.txt
│   │   ├── closed-adv_5.txt
│   │   └── closed_5.txt
│   ├── physical_intuition
│   │   ├── open_5.txt
│   │   ├── closed-adv_5.txt
│   │   └── closed_5.txt
│   ├── fact_checker
│   │   ├── open_5.txt
│   │   ├── closed-adv_5.txt
│   │   └── closed_5.txt
│   ├── social_iqa
│   │   ├── open_5.txt
│   │   ├── closed-adv_5.txt
│   │   └── closed_5.txt
│   ├── modified_arithmetic
│   │   ├── open_5.txt
│   │   ├── closed-adv_5.txt
│   │   └── closed_5.txt
│   ├── causal_judgment
│   │   ├── open_2.txt
│   │   ├── closed-adv_2.txt
│   │   └── closed_2.txt
│   ├── codenames
│   │   ├── open_5.txt
│   │   ├── closed-adv_5.txt
│   │   └── closed_5.txt
│   ├── gsm8k
│   │   ├── open_5.txt
│   │   ├── closed_5.txt
│   │   └── closed-adv_5.txt
│   ├── english_proverbs
│   │   ├── open_5.txt
│   │   ├── closed-adv_5.txt
│   │   └── closed_5.txt
│   ├── vitaminc_fact_verification
│   │   ├── open_5.txt
│   │   ├── closed-adv_5.txt
│   │   └── closed_5.txt
│   ├── strange_stories
│   │   ├── open_4.txt
│   │   ├── closed-adv_4.txt
│   │   └── closed_4.txt
│   └── tracking_shuffled_objects
│       ├── open_5.txt
│       ├── closed-adv_5.txt
│       └── closed_5.txt
├── openai.py
├── inference.py
├── README.md
├── utils.py
├── evaluation.ipynb
└── LICENSE

/NOTICE:
--------------------------------------------------------------------------------
1 | -------------------------------------------------------------------------------
2 | Copyright 2023
3 | Ubiquitous Knowledge Processing (UKP) Lab
4 | Technische Universität Darmstadt
5 | -------------------------------------------------------------------------------
6 |
--------------------------------------------------------------------------------
/prompts/rhyming/open_5.txt:
--------------------------------------------------------------------------------
1 | represent
2 | The correct answer is "extent".
3 |
4 | michelle
5 | The correct answer is "hotel".
6 |
7 | conclude
8 | The correct answer is "pursued".
9 |
10 | hall
11 | The correct answer is "haul".
12 |
13 | fit
14 | The correct answer is "schmidt".
15 |
16 |
--------------------------------------------------------------------------------
/prompts/phrase_relatedness/open_5.txt:
--------------------------------------------------------------------------------
1 | drug trafficking. The correct answer is "cartel".
2 |
3 | research laboratory. The correct answer is "science lab".
4 |
5 | Soft drink. The correct answer is "Fast food".
6 |
7 | Market trend. The correct answer is "Price".
8 |
9 | Liquid. The correct answer is "Water".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/hindu_knowledge/open_5-info_removed.txt:
--------------------------------------------------------------------------------
1 | associated with wind? The correct answer is "Vayu".
2 |
3 | commonly used at the beginning and end of chants and texts? The correct answer is "Om".
4 |
5 | the author of the Mahabharata? The correct answer is "Vyasa".
6 |
7 | associated with wind? The correct answer is "Vayu".
8 |
9 | lord Ganesha's vahana is an animal of what form? The correct answer is "Mouse".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/common_morpheme/open-info_removed_5.txt:
--------------------------------------------------------------------------------
1 | cadence, cadavre, accident, parachute. The correct answer is "tomber".
2 |
3 | colleague, delegation, legitimate, loyal. The correct answer is "law".
4 |
5 | opéra, ouvrage, coopération, opérationnel. The correct answer is "travailler".
6 |
7 | adapt, aptitude, attitude, aptonym. The correct answer is "fit".
8 |
9 | armoire, philharmonie, orteil, articulation. The correct answer is "jointure".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/figure_of_speech_detection/open_5.txt:
--------------------------------------------------------------------------------
1 | Kisses are the flowers of affection. The correct answer is "Metaphor".
2 |
3 | Sara exclaimed that the ice cream she purchased at the shop was awfully good. The correct answer is "Oxymoron".
4 |
5 | Lightning danced across the sky. The correct answer is "Personification".
6 |
7 | Geometry is so pointless. The correct answer is "Pun".
8 |
9 | After running a marathon, a marathoner eats like a horse. The correct answer is "Simile".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/misconceptions/open_5.txt:
--------------------------------------------------------------------------------
1 | A finite number of audio files can be saved on a standard compact disc. The correct answer is "T".
2 |
3 | Porcupines can shoot their quills. The correct answer is "F".
4 |
5 | Humanity would not starve or die out without bees and their pollination. The correct answer is "T".
6 |
7 | Great white sharks mistake human divers for pinnipeds. The correct answer is "F".
8 |
9 | There is such thing as an "alpha wolf" in a pack. The correct answer is "F".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/phrase_relatedness/open_5-info_added.txt:
--------------------------------------------------------------------------------
1 | drug trafficking. What is a phrase related to this? The correct answer is "cartel".
2 |
3 | research laboratory. What is a phrase related to this? The correct answer is "science lab".
4 |
5 | Soft drink. What is a phrase related to this? The correct answer is "Fast food".
6 |
7 | Market trend. What is a phrase related to this? The correct answer is "Price".
8 |
9 | Liquid. What is a phrase related to this? The correct answer is "Water".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/strategyqa/open_5.txt:
--------------------------------------------------------------------------------
1 | Is Gandalf hypothetically a formidable foe for Charmed's Barbas? The correct answer is "No".
2 |
3 | Are Naruhito's ancestors the focus of Romance of the Three Kingdoms? The correct answer is "No".
4 |
5 | Is coal needed to practice parachuting? The correct answer is "Yes".
6 |
7 | Was the British car, the Mini, the first car manufactured? The correct answer is "No".
8 |
9 | Does the land in close proximity to beaver dams suffer? The correct answer is "Yes".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/analytic_entailment/open_5.txt:
--------------------------------------------------------------------------------
1 | Beth owns a two-story house. So, she owns a house. The correct answer is "entailment".
2 |
3 | John has a coffee table. Therefore John drinks coffee. The correct answer is "no-entailment".
4 |
5 | Harry is wearing a suit. So, Harry isn't naked. The correct answer is "entailment".
6 |
7 | Sarah has a pet. So Sarah has a dog. The correct answer is "no-entailment".
8 |
9 | Ann is wearing a dress. So, Ann is not wearing pants. The correct answer is "no-entailment".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/hindu_knowledge/open_5.txt:
--------------------------------------------------------------------------------
1 | Which of the following Hindu deities is associated with wind? The correct answer is "Vayu".
2 |
3 | Which syllable is commonly used at the beginning and end of Hindu chants and texts? The correct answer is "Om".
4 |
5 | Which Hindu sage is known as the author of the Mahabharata? The correct answer is "Vyasa".
6 |
7 | Which of the following is the Hindu deity associated with wind? The correct answer is "Vayu".
8 |
9 | In a Hindu legend, lord Ganesha's vahana is an animal of what form? The correct answer is "Mouse".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/commonsense_qa/open_5.txt:
--------------------------------------------------------------------------------
1 | Why do people want to keep working? The correct answer is "being paid"
2 |
3 | Where would normally find two eyes? The correct answer is "person's head"
4 |
5 | What might you feel after doing housework for hours? The correct answer is "stress"
6 |
7 | Joe did something very bad and he was worried that The People would try punishing him. What type of punishment might be be worried about? The correct answer is "going to jail"
8 |
9 | The comforter was used constantly and treaded upon by everyone, where should it be placed? The correct answer is "washing machine"
10 |
11 |
--------------------------------------------------------------------------------
/prompts/misconceptions/closed-adv_5.txt:
--------------------------------------------------------------------------------
1 | QUESTION: A finite number of audio files can be saved on a standard compact disc.
2 | OPTIONS: (a) "T", (b) "F"
3 | ANSWER: (a)
4 |
5 | QUESTION: Porcupines can shoot their quills.
6 | OPTIONS: (a) "T", (b) "F"
7 | ANSWER: (b)
8 |
9 | QUESTION: Humanity would not starve or die out without bees and their pollination.
10 | OPTIONS: (a) "T", (b) "F"
11 | ANSWER: (a)
12 |
13 | QUESTION: Great white sharks mistake human divers for pinnipeds.
14 | OPTIONS: (a) "T", (b) "F"
15 | ANSWER: (b)
16 |
17 | QUESTION: There is such thing as an "alpha wolf" in a pack.
18 | OPTIONS: (a) "T", (b) "F"
19 | ANSWER: (b)
20 |
21 |
--------------------------------------------------------------------------------
/prompts/common_morpheme/open_5.txt:
--------------------------------------------------------------------------------
1 | Quel est le morphème commun parmi ces mots: cadence, cadavre, accident, parachute. The correct answer is "tomber".
2 |
3 | What is the common morpheme among these words: colleague, delegation, legitimate, loyal. The correct answer is "law".
4 |
5 | Quel est le morphème commun parmi ces mots: opéra, ouvrage, coopération, opérationnel. The correct answer is "travailler".
6 |
7 | What is the common morpheme among these words: adapt, aptitude, attitude, aptonym. The correct answer is "fit".
8 |
9 | Quel est le morphème commun parmi ces mots: armoire, philharmonie, orteil, articulation. The correct answer is "jointure".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/rhyming/closed-adv_5.txt:
--------------------------------------------------------------------------------
1 | QUESTION: represent
2 | OPTIONS: (a) "potential", (b) "extent", (c) "expressing", (d) "fixtures", (e) "depressing"
3 | ANSWER: (b)
4 |
5 | QUESTION: michelle
6 | OPTIONS: (a) "hotel", (b) "checks", (c) "indicator", (d) "critic", (e) "dice"
7 | ANSWER: (a)
8 |
9 | QUESTION: conclude
10 | OPTIONS: (a) "conducting", (b) "confirmed", (c) "comedy", (d) "pursued", (e) "dawson"
11 | ANSWER: (d)
12 |
13 | QUESTION: hall
14 | OPTIONS: (a) "crystal", (b) "haul", (c) "dealt", (d) "belly", (e) "hand"
15 | ANSWER: (b)
16 |
17 | QUESTION: fit
18 | OPTIONS: (a) "fish", (b) "file", (c) "schmidt", (d) "equity", (e) "fill"
19 | ANSWER: (c)
20 |
21 |
--------------------------------------------------------------------------------
/prompts/strategyqa/closed-adv_5.txt:
--------------------------------------------------------------------------------
1 | QUESTION: Is Gandalf hypothetically a formidable foe for Charmed's Barbas?
2 | OPTIONS: (a) "Yes", (b) "No"
3 | ANSWER: (b)
4 |
5 | QUESTION: Are Naruhito's ancestors the focus of Romance of the Three Kingdoms?
6 | OPTIONS: (a) "Yes", (b) "No"
7 | ANSWER: (b)
8 |
9 | QUESTION: Is coal needed to practice parachuting?
10 | OPTIONS: (a) "Yes", (b) "No"
11 | ANSWER: (a)
12 |
13 | QUESTION: Was the British car, the Mini, the first car manufactured?
14 | OPTIONS: (a) "Yes", (b) "No"
15 | ANSWER: (b)
16 |
17 | QUESTION: Does the land in close proximity to beaver dams suffer?
18 | OPTIONS: (a) "Yes", (b) "No"
19 | ANSWER: (a)
20 |
21 |
--------------------------------------------------------------------------------
/prompts/phrase_relatedness/closed-adv_5.txt:
--------------------------------------------------------------------------------
1 | QUESTION: drug trafficking
2 | OPTIONS: (a) "united states", (b) "cartel", (c) "law enforcement officials", (d) "colombian"
3 | ANSWER: (b)
4 |
5 | QUESTION: research laboratory
6 | OPTIONS: (a) "job", (b) "court", (c) "science lab", (d) "storage"
7 | ANSWER: (c)
8 |
9 | QUESTION: Soft drink
10 | OPTIONS: (a) "Small intestine", (b) "Desktop", (c) "Fast food", (d) "Outlet store"
11 | ANSWER: (c)
12 |
13 | QUESTION: Market trend
14 | OPTIONS: (a) "Price", (b) "Vertical integration", (c) "Work", (d) "Salary"
15 | ANSWER: (a)
16 |
17 | QUESTION: Liquid
18 | OPTIONS: (a) "Ice", (b) "Stone", (c) "Wood", (d) "Water"
19 | ANSWER: (d)
20 |
21 |
--------------------------------------------------------------------------------
/prompts/misconceptions/closed_5.txt:
--------------------------------------------------------------------------------
1 | A finite number of audio files can be saved on a standard compact disc. The possible answers are "T", "F", but the correct answer is "T".
2 |
3 | Porcupines can shoot their quills. The possible answers are "T", "F", but the correct answer is "F".
4 |
5 | Humanity would not starve or die out without bees and their pollination. The possible answers are "T", "F", but the correct answer is "T".
6 |
7 | Great white sharks mistake human divers for pinnipeds. The possible answers are "T", "F", but the correct answer is "F".
8 |
9 | There is such thing as an "alpha wolf" in a pack. The possible answers are "T", "F", but the correct answer is "F".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/rhyming/closed_5.txt:
--------------------------------------------------------------------------------
1 | represent
2 | The possible answers are "potential", "extent", "expressing", "fixtures", "depressing", but the correct answer is "extent".
3 |
4 | michelle
5 | The possible answers are "hotel", "checks", "indicator", "critic", "dice", but the correct answer is "hotel".
6 |
7 | conclude
8 | The possible answers are "conducting", "confirmed", "comedy", "pursued", "dawson", but the correct answer is "pursued".
9 |
10 | hall
11 | The possible answers are "crystal", "haul", "dealt", "belly", "hand", but the correct answer is "haul".
12 |
13 | fit
14 | The possible answers are "fish", "file", "schmidt", "equity", "fill", but the correct answer is "schmidt".
15 |
16 |
--------------------------------------------------------------------------------
/prompts/strategyqa/closed_5.txt:
--------------------------------------------------------------------------------
1 | Is Gandalf hypothetically a formidable foe for Charmed's Barbas? The possible answers are "Yes", "No", but the correct answer is "No".
2 |
3 | Are Naruhito's ancestors the focus of Romance of the Three Kingdoms? The possible answers are "Yes", "No", but the correct answer is "No".
4 |
5 | Is coal needed to practice parachuting? The possible answers are "Yes", "No", but the correct answer is "Yes".
6 |
7 | Was the British car, the Mini, the first car manufactured? The possible answers are "Yes", "No", but the correct answer is "No".
8 |
9 | Does the land in close proximity to beaver dams suffer? The possible answers are "Yes", "No", but the correct answer is "Yes".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/phrase_relatedness/closed_5.txt:
--------------------------------------------------------------------------------
1 | drug trafficking. The possible answers are "united states", "cartel", "law enforcement officials", "colombian", but the correct answer is "cartel".
2 |
3 | research laboratory. The possible answers are "job", "court", "science lab", "storage", but the correct answer is "science lab".
4 |
5 | Soft drink. The possible answers are "Small intestine", "Desktop", "Fast food", "Outlet store", but the correct answer is "Fast food".
6 |
7 | Market trend. The possible answers are "Price", "Vertical integration", "Work", "Salary", but the correct answer is "Price".
8 |
9 | Liquid. The possible answers are "Ice", "Stone", "Wood", "Water", but the correct answer is "Water".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/analytic_entailment/closed-adv_5.txt:
--------------------------------------------------------------------------------
1 | QUESTION: Beth owns a two-story house. So, she owns a house.
2 | OPTIONS: (a) "entailment", (b) "no-entailment"
3 | ANSWER: (a)
4 |
5 | QUESTION: John has a coffee table. Therefore John drinks coffee.
6 | OPTIONS: (a) "entailment", (b) "no-entailment"
7 | ANSWER: (b)
8 |
9 | QUESTION: Harry is wearing a suit. So, Harry isn't naked.
10 | OPTIONS: (a) "entailment", (b) "no-entailment"
11 | ANSWER: (a)
12 |
13 | QUESTION: Sarah has a pet. So Sarah has a dog.
14 | OPTIONS: (a) "entailment", (b) "no-entailment"
15 | ANSWER: (b)
16 |
17 | QUESTION: Ann is wearing a dress. So, Ann is not wearing pants.
18 | OPTIONS: (a) "entailment", (b) "no-entailment"
19 | ANSWER: (b)
20 |
21 |
--------------------------------------------------------------------------------
/prompts/hindu_knowledge/closed-adv_5-info_removed.txt:
--------------------------------------------------------------------------------
1 | QUESTION: associated with wind?
2 | OPTIONS: (a) "Agni", (b) "Vayu", (c) "Varuna", (d) "Indra"
3 | ANSWER: (b)
4 |
5 | QUESTION: commonly used at the beginning and end of chants and texts?
6 | OPTIONS: (a) "Ma", (b) "Mo", (c) "Om", (d) "Na"
7 | ANSWER: (c)
8 |
9 | QUESTION: the author of the Mahabharata?
10 | OPTIONS: (a) "Valmiki", (b) "Vyasa", (c) "Vashistha", (d) "Vishvamitra"
11 | ANSWER: (b)
12 |
13 | QUESTION: associated with wind?
14 | OPTIONS: (a) "Vayu", (b) "Agni", (c) "Varuna", (d) "Surya"
15 | ANSWER: (a)
16 |
17 | QUESTION: lord Ganesha's vahana is an animal of what form?
18 | OPTIONS: (a) "Snake", (b) "Lion", (c) "Elephant", (d) "Mouse"
19 | ANSWER: (d)
20 |
21 |
--------------------------------------------------------------------------------
/prompts/hindu_knowledge/closed_5-info_removed.txt:
--------------------------------------------------------------------------------
1 | associated with wind? The possible answers are "Agni", "Vayu", "Varuna", "Indra", but the correct answer is "Vayu".
2 |
3 | commonly used at the beginning and end of chants and texts? The possible answers are "Ma", "Mo", "Om", "Na", but the correct answer is "Om".
4 |
5 | the author of the Mahabharata? The possible answers are "Valmiki", "Vyasa", "Vashistha", "Vishvamitra", but the correct answer is "Vyasa".
6 |
7 | associated with wind? The possible answers are "Vayu", "Agni", "Varuna", "Surya", but the correct answer is "Vayu".
8 |
9 | lord Ganesha's vahana is an animal of what form? The possible answers are "Snake", "Lion", "Elephant", "Mouse", but the correct answer is "Mouse".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/implicatures/open_5.txt:
--------------------------------------------------------------------------------
1 | Speaker 1: 'Have you seen them recently?' Speaker 2: 'I haven't seen them in ages.' The correct answer is "no".
2 |
3 | Speaker 1: 'You want answers?!' Speaker 2: 'I want the truth.' The correct answer is "yes".
4 |
5 | Speaker 1: 'I am going to conduct my psychology experiment this Saturday. I would be having a handful. Would you help by writing the names of the participants?' Speaker 2: 'I have got some work on my own.' The correct answer is "no".
6 |
7 | Speaker 1: 'Do you think you will ever work at that company again?' Speaker 2: 'When pigs fly.' The correct answer is "no".
8 |
9 | Speaker 1: 'You mean I pretend I have a mom?' Speaker 2: 'You can do that, can't you?' The correct answer is "yes".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/nonsense_words_grammar/open_5.txt:
--------------------------------------------------------------------------------
1 | Which word in the following sentence is a verb? I'd gralsillit onto the secure felisheret. The correct answer is "gralsillit".
2 |
3 | Which word in the following sentence is a noun? Gralliciously recarpilate the brooktoine. The correct answer is "brooktoine".
4 |
5 | Which word in the following sentence is an adverb? The helipioles defractated regillently for the broarkers. The correct answer is "regillently".
6 |
7 | Which word in the following sentence is an adjective? Procaliff stefiliously held the brizzerous box. The correct answer is "brizzerous".
8 |
9 | What is the grammatical object of the following sentence? Yuki inquisitively gerriantates the polderiot. The correct answer is "polderiot".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/figure_of_speech_detection/open_5-info_added.txt:
--------------------------------------------------------------------------------
1 | Kisses are the flowers of affection. What type of figurative language does this represent? The correct answer is "Metaphor".
2 |
3 | Sara exclaimed that the ice cream she purchased at the shop was awfully good. What type of figurative language does this represent? The correct answer is "Oxymoron".
4 |
5 | Lightning danced across the sky. What type of figurative language does this represent? The correct answer is "Personification".
6 |
7 | Geometry is so pointless. What type of figurative language does this represent? The correct answer is "Pun".
8 |
9 | After running a marathon, a marathoner eats like a horse. What type of figurative language does this represent? The correct answer is "Simile".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/logical_deduction/open_5-info_removed.txt:
--------------------------------------------------------------------------------
1 | The blue book is to the right of the yellow book. The orange book is the second from the left. The correct answer is "The blue book is the rightmost".
2 |
3 | The brown book is to the left of the orange book. The green book is to the left of the brown book. The correct answer is "The orange book is the rightmost".
4 |
5 | The mangoes are less expensive than the peaches. The apples are less expensive than the mangoes. The correct answer is "The peaches are the most expensive".
6 |
7 | Dan finished above Amy. Amy finished above Mel. The correct answer is "Mel finished last".
8 |
9 | The minivan is newer than the sedan. The minivan is older than the bus. The correct answer is "The minivan is the second-newest".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/common_morpheme/closed-adv-info_removed_5.txt:
--------------------------------------------------------------------------------
1 | QUESTION: cadence, cadavre, accident, parachute.
2 | OPTIONS: (a) "rythme", (b) "corps", (c) "sauter", (d) "tomber"
3 | ANSWER: (d)
4 |
5 | QUESTION: colleague, delegation, legitimate, loyal.
6 | OPTIONS: (a) "friend", (b) "law", (c) "true", (d) "pet"
7 | ANSWER: (b)
8 |
9 | QUESTION: opéra, ouvrage, coopération, opérationnel.
10 | OPTIONS: (a) "travailler", (b) "collaboration", (c) "musique", (d) "chirurgicale"
11 | ANSWER: (a)
12 |
13 | QUESTION: adapt, aptitude, attitude, aptonym.
14 | OPTIONS: (a) "mood", (b) "test", (c) "occupation", (d) "fit"
15 | ANSWER: (d)
16 |
17 | QUESTION: armoire, philharmonie, orteil, articulation.
18 | OPTIONS: (a) "garde-robe", (b) "musique", (c) "jointure", (d) "pied"
19 | ANSWER: (c)
20 |
21 |
--------------------------------------------------------------------------------
/prompts/physical_intuition/open_5.txt:
--------------------------------------------------------------------------------
1 | I throw a ball towards the ground. Which direction does the ball initially move in? The correct answer is "Down".
2 |
3 | A person is on a train going 100 km/h East and throws a ball 50 km/h to the West. At that moment there is a bird flying next to the train at 50 km/h to the East. How is the bird moving from the perspective of the ball? The correct answer is "0 km/h".
4 |
5 | The primary interaction between two individual protons separated by one micron is the The correct answer is "Electromagnetic force".
6 |
7 | A decay process occurs via the production of a W boson. What interaction was involved? The correct answer is "Weak".
8 |
9 | The primary interaction between two neutrons in the same nucleus is the The correct answer is "Strong force".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/common_morpheme/closed-info_removed_5.txt:
--------------------------------------------------------------------------------
1 | cadence, cadavre, accident, parachute. The possible answers are "rythme", "corps", "sauter", "tomber", but the correct answer is "tomber".
2 |
3 | colleague, delegation, legitimate, loyal. The possible answers are "friend", "law", "true", "pet", but the correct answer is "law".
4 |
5 | opéra, ouvrage, coopération, opérationnel. The possible answers are "travailler", "collaboration", "musique", "chirurgicale", but the correct answer is "travailler".
6 |
7 | adapt, aptitude, attitude, aptonym. The possible answers are "mood", "test", "occupation", "fit", but the correct answer is "fit".
8 |
9 | armoire, philharmonie, orteil, articulation. The possible answers are "garde-robe", "musique", "jointure", "pied", but the correct answer is "jointure".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/analytic_entailment/closed_5.txt:
--------------------------------------------------------------------------------
1 | Beth owns a two-story house. So, she owns a house. The possible answers are "entailment", "no-entailment", but the correct answer is "entailment".
2 |
3 | John has a coffee table. Therefore John drinks coffee. The possible answers are "entailment", "no-entailment", but the correct answer is "no-entailment".
4 |
5 | Harry is wearing a suit. So, Harry isn't naked. The possible answers are "entailment", "no-entailment", but the correct answer is "entailment".
6 |
7 | Sarah has a pet. So Sarah has a dog. The possible answers are "entailment", "no-entailment", but the correct answer is "no-entailment".
8 |
9 | Ann is wearing a dress. So, Ann is not wearing pants. The possible answers are "entailment", "no-entailment", but the correct answer is "no-entailment".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/phrase_relatedness/closed-adv_5-info_added.txt:
--------------------------------------------------------------------------------
1 | QUESTION: drug trafficking. What is a phrase related to this?
2 | OPTIONS: (a) "united states", (b) "cartel", (c) "law enforcement officials", (d) "colombian"
3 | ANSWER: (b)
4 |
5 | QUESTION: research laboratory. What is a phrase related to this?
6 | OPTIONS: (a) "job", (b) "court", (c) "science lab", (d) "storage"
7 | ANSWER: (c)
8 |
9 | QUESTION: Soft drink. What is a phrase related to this?
10 | OPTIONS: (a) "Small intestine", (b) "Desktop", (c) "Fast food", (d) "Outlet store"
11 | ANSWER: (c)
12 |
13 | QUESTION: Market trend. What is a phrase related to this?
14 | OPTIONS: (a) "Price", (b) "Vertical integration", (c) "Work", (d) "Salary"
15 | ANSWER: (a)
16 |
17 | QUESTION: Liquid. What is a phrase related to this?
18 | OPTIONS: (a) "Ice", (b) "Stone", (c) "Wood", (d) "Water"
19 | ANSWER: (d)
20 |
21 |
--------------------------------------------------------------------------------
/prompts/hindu_knowledge/closed-adv_5.txt:
--------------------------------------------------------------------------------
1 | QUESTION: Which of the following Hindu deities is associated with wind?
2 | OPTIONS: (a) "Agni", (b) "Vayu", (c) "Varuna", (d) "Indra"
3 | ANSWER: (b)
4 |
5 | QUESTION: Which syllable is commonly used at the beginning and end of Hindu chants and texts?
6 | OPTIONS: (a) "Ma", (b) "Mo", (c) "Om", (d) "Na"
7 | ANSWER: (c)
8 |
9 | QUESTION: Which Hindu sage is known as the author of the Mahabharata?
10 | OPTIONS: (a) "Valmiki", (b) "Vyasa", (c) "Vashistha", (d) "Vishvamitra"
11 | ANSWER: (b)
12 |
13 | QUESTION: Which of the following is the Hindu deity associated with wind?
14 | OPTIONS: (a) "Vayu", (b) "Agni", (c) "Varuna", (d) "Surya"
15 | ANSWER: (a)
16 |
17 | QUESTION: In a Hindu legend, lord Ganesha's vahana is an animal of what form?
18 | OPTIONS: (a) "Snake", (b) "Lion", (c) "Elephant", (d) "Mouse"
19 | ANSWER: (d)
20 |
21 |
--------------------------------------------------------------------------------
/prompts/fact_checker/open_5.txt:
--------------------------------------------------------------------------------
1 | On June 2017, the following claim was made: The Ellen Show began its broadcast in the 21st century.
2 | Q: Was this claim true or false? The correct answer is "true".
3 |
4 | On June 2017, the following claim was made: Estonia does not border the Baltic Sea.
5 | Q: Was this claim true or false? The correct answer is "false".
6 |
7 | On June 2017, the following claim was made: Stephen Hillenburg directed The SpongeBob SquarePants Movie.
8 | Q: Was this claim true or false? The correct answer is "true".
9 |
10 | On June 2017, the following claim was made: Rick Ross founded the Maybach Music Group in 2009.
11 | Q: Was this claim true or false? The correct answer is "true".
12 |
13 | On June 2017, the following claim was made: Sennacherib's youngest daughter was Esarhaddon.
14 | Q: Was this claim true or false? The correct answer is "false".
15 |
16 |
--------------------------------------------------------------------------------
/prompts/hindu_knowledge/closed_5.txt:
--------------------------------------------------------------------------------
1 | Which of the following Hindu deities is associated with wind? The possible answers are "Agni", "Vayu", "Varuna", "Indra", but the correct answer is "Vayu".
2 |
3 | Which syllable is commonly used at the beginning and end of Hindu chants and texts? The possible answers are "Ma", "Mo", "Om", "Na", but the correct answer is "Om".
4 |
5 | Which Hindu sage is known as the author of the Mahabharata? The possible answers are "Valmiki", "Vyasa", "Vashistha", "Vishvamitra", but the correct answer is "Vyasa".
6 |
7 | Which of the following is the Hindu deity associated with wind? The possible answers are "Vayu", "Agni", "Varuna", "Surya", but the correct answer is "Vayu".
8 |
9 | In a Hindu legend, lord Ganesha's vahana is an animal of what form? The possible answers are "Snake", "Lion", "Elephant", "Mouse", but the correct answer is "Mouse".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/phrase_relatedness/closed_5-info_added.txt:
--------------------------------------------------------------------------------
1 | drug trafficking. What is a phrase related to this? The possible answers are "united states", "cartel", "law enforcement officials", "colombian", but the correct answer is "cartel".
2 |
3 | research laboratory. What is a phrase related to this? The possible answers are "job", "court", "science lab", "storage", but the correct answer is "science lab".
4 |
5 | Soft drink. What is a phrase related to this? The possible answers are "Small intestine", "Desktop", "Fast food", "Outlet store", but the correct answer is "Fast food".
6 |
7 | Market trend. What is a phrase related to this? The possible answers are "Price", "Vertical integration", "Work", "Salary", but the correct answer is "Price".
8 |
9 | Liquid. What is a phrase related to this? The possible answers are "Ice", "Stone", "Wood", "Water", but the correct answer is "Water".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/social_iqa/open_5.txt:
--------------------------------------------------------------------------------
1 | Austin's family was celebrating their parents 50th anniversary during dinner at a new restaurant. What would Austin's family do next? The correct answer is "Eat dinner at the restaurant".
2 |
3 | After they picked a hiking trail to take together, Sydney met Skylar near the canyon edge. What does Sydney need to do before this? The correct answer is "Pack for the hike".
4 |
5 | Tracy gave video games to my friend. They knew they were quality titles they would enjoy. What will Tracy want to do next? The correct answer is "Be generous".
6 |
7 | Quinn cried their eyes out and ate a whole carton of icecream over her breakup. What will Quinn want to do next? The correct answer is "Keep crying".
8 |
9 | Jesse had made plans with Cameron to go to the dance, but instead stood Cameron up. What will Cameron want to do next? The correct answer is "Tell Jesse how they feel".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/implicatures/closed-adv_5.txt:
--------------------------------------------------------------------------------
1 | QUESTION: Speaker 1: 'Have you seen them recently?' Speaker 2: 'I haven't seen them in ages.'
2 | OPTIONS: (a) "yes", (b) "no"
3 | ANSWER: (b)
4 |
5 | QUESTION: Speaker 1: 'You want answers?!' Speaker 2: 'I want the truth.'
6 | OPTIONS: (a) "yes", (b) "no"
7 | ANSWER: (a)
8 |
9 | QUESTION: Speaker 1: 'I am going to conduct my psychology experiment this Saturday. I would be having a handful. Would you help by writing the names of the participants?' Speaker 2: 'I have got some work on my own.'
10 | OPTIONS: (a) "yes", (b) "no"
11 | ANSWER: (b)
12 |
13 | QUESTION: Speaker 1: 'Do you think you will ever work at that company again?' Speaker 2: 'When pigs fly.'
14 | OPTIONS: (a) "yes", (b) "no"
15 | ANSWER: (b)
16 |
17 | QUESTION: Speaker 1: 'You mean I pretend I have a mom?' Speaker 2: 'You can do that, can't you?'
18 | OPTIONS: (a) "yes", (b) "no"
19 | ANSWER: (a)
20 |
21 |
--------------------------------------------------------------------------------
/prompts/implicatures/closed_5.txt:
--------------------------------------------------------------------------------
1 | Speaker 1: 'Have you seen them recently?' Speaker 2: 'I haven't seen them in ages.' The possible answers are "yes", "no", but the correct answer is "no".
2 |
3 | Speaker 1: 'You want answers?!' Speaker 2: 'I want the truth.' The possible answers are "yes", "no", but the correct answer is "yes".
4 |
5 | Speaker 1: 'I am going to conduct my psychology experiment this Saturday. I would be having a handful. Would you help by writing the names of the participants?' Speaker 2: 'I have got some work on my own.' The possible answers are "yes", "no", but the correct answer is "no".
6 |
7 | Speaker 1: 'Do you think you will ever work at that company again?' Speaker 2: 'When pigs fly.' The possible answers are "yes", "no", but the correct answer is "no".
8 |
9 | Speaker 1: 'You mean I pretend I have a mom?' Speaker 2: 'You can do that, can't you?' The possible answers are "yes", "no", but the correct answer is "yes".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/implicatures/open_5-info_added.txt:
--------------------------------------------------------------------------------
1 | Speaker 1: 'Have you seen them recently?' Speaker 2: 'I haven't seen them in ages.' What is the meaning of Speaker 2's response? The correct answer is "no".
2 |
3 | Speaker 1: 'You want answers?!' Speaker 2: 'I want the truth.' What is the meaning of Speaker 2's response? The correct answer is "yes".
4 |
5 | Speaker 1: 'I am going to conduct my psychology experiment this Saturday. I would be having a handful. Would you help by writing the names of the participants?' Speaker 2: 'I have got some work on my own.' What is the meaning of Speaker 2's response? The correct answer is "no".
6 |
7 | Speaker 1: 'Do you think you will ever work at that company again?' Speaker 2: 'When pigs fly.' What is the meaning of Speaker 2's response? The correct answer is "no".
8 |
9 | Speaker 1: 'You mean I pretend I have a mom?' Speaker 2: 'You can do that, can't you?' What is the meaning of Speaker 2's response? The correct answer is "yes".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/common_morpheme/closed-adv_5.txt:
--------------------------------------------------------------------------------
1 | QUESTION: Quel est le morphème commun parmi ces mots: cadence, cadavre, accident, parachute.
2 | OPTIONS: (a) "rythme", (b) "corps", (c) "sauter", (d) "tomber"
3 | ANSWER: (d)
4 |
5 | QUESTION: What is the common morpheme among these words: colleague, delegation, legitimate, loyal.
6 | OPTIONS: (a) "friend", (b) "law", (c) "true", (d) "pet"
7 | ANSWER: (b)
8 |
9 | QUESTION: Quel est le morphème commun parmi ces mots: opéra, ouvrage, coopération, opérationnel.
10 | OPTIONS: (a) "travailler", (b) "collaboration", (c) "musique", (d) "chirurgicale"
11 | ANSWER: (a)
12 |
13 | QUESTION: What is the common morpheme among these words: adapt, aptitude, attitude, aptonym.
14 | OPTIONS: (a) "mood", (b) "test", (c) "occupation", (d) "fit"
15 | ANSWER: (d)
16 |
17 | QUESTION: Quel est le morphème commun parmi ces mots: armoire, philharmonie, orteil, articulation.
18 | OPTIONS: (a) "garde-robe", (b) "musique", (c) "jointure", (d) "pied"
19 | ANSWER: (c)
20 |
21 |
--------------------------------------------------------------------------------
/prompts/logical_deduction/closed_3.txt:
--------------------------------------------------------------------------------
1 | On a shelf, there are three books: an orange book, a yellow book, and a blue book. The blue book is to the right of the yellow book. The orange book is the second from the left. The possible answers are "The orange book is the rightmost", "The yellow book is the rightmost", "The blue book is the rightmost", but the correct answer is "The blue book is the rightmost".
2 |
3 | In a golf tournament, there were three golfers: Dan, Mel, and Amy. Dan finished above Amy. Amy finished above Mel. The possible answers are "Dan finished last", "Mel finished last", "Amy finished last", but the correct answer is "Mel finished last".
4 |
5 | In an antique car show, there are three vehicles: a minivan, a bus, and a sedan. The minivan is newer than the sedan. The minivan is older than the bus. The possible answers are "The minivan is the second-newest", "The bus is the second-newest", "The sedan is the second-newest", but the correct answer is "The minivan is the second-newest".
6 |
7 |
--------------------------------------------------------------------------------
/prompts/common_morpheme/closed_5.txt:
--------------------------------------------------------------------------------
1 | Quel est le morphème commun parmi ces mots: cadence, cadavre, accident, parachute. The possible answers are "rythme", "corps", "sauter", "tomber", but the correct answer is "tomber".
2 |
3 | What is the common morpheme among these words: colleague, delegation, legitimate, loyal. The possible answers are "friend", "law", "true", "pet", but the correct answer is "law".
4 |
5 | Quel est le morphème commun parmi ces mots: opéra, ouvrage, coopération, opérationnel. The possible answers are "travailler", "collaboration", "musique", "chirurgicale", but the correct answer is "travailler".
6 |
7 | What is the common morpheme among these words: adapt, aptitude, attitude, aptonym. The possible answers are "mood", "test", "occupation", "fit", but the correct answer is "fit".
8 |
9 | Quel est le morphème commun parmi ces mots: armoire, philharmonie, orteil, articulation. The possible answers are "garde-robe", "musique", "jointure", "pied", but the correct answer is "jointure".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/commonsense_qa/closed-adv_5.txt:
--------------------------------------------------------------------------------
1 | QUESTION: Why do people want to keep working?
2 | OPTIONS: (a) "frustration", (b) "make money", (c) "boredom", (d) "being paid", (e) "blisters"
3 | ANSWER: (d)
4 |
5 | QUESTION: Where would normally find two eyes?
6 | OPTIONS: (a) "someones face", (b) "faces", (c) "person's head", (d) "potato"
7 | ANSWER: (c)
8 |
9 | QUESTION: What might you feel after doing housework for hours?
10 | OPTIONS: (a) "anger", (b) "not boredom", (c) "stress", (d) "boredom", (e) "anxiety"
11 | ANSWER: (c)
12 |
13 | QUESTION: Joe did something very bad and he was worried that The People would try punishing him. What type of punishment might be be worried about?
14 | OPTIONS: (a) "train", (b) "hatred", (c) "get angry", (d) "going to jail", (e) "public outrage"
15 | ANSWER: (d)
16 |
17 | QUESTION: The comforter was used constantly and treaded upon by everyone, where should it be placed?
18 | OPTIONS: (a) "garbage", (b) "linen closet", (c) "bedroom", (d) "washing machine", (e) "livingroom"
19 | ANSWER: (d)
20 |
21 |
--------------------------------------------------------------------------------
/prompts/fact_checker/closed-adv_5.txt:
--------------------------------------------------------------------------------
1 | QUESTION: On June 2017, the following claim was made: The Ellen Show began its broadcast in the 21st century.
2 | Q: Was this claim true or false?
3 | OPTIONS: (a) "true", (b) "false"
4 | ANSWER: (a)
5 |
6 | QUESTION: On June 2017, the following claim was made: Estonia does not border the Baltic Sea.
7 | Q: Was this claim true or false?
8 | OPTIONS: (a) "true", (b) "false"
9 | ANSWER: (b)
10 |
11 | QUESTION: On June 2017, the following claim was made: Stephen Hillenburg directed The SpongeBob SquarePants Movie.
12 | Q: Was this claim true or false?
13 | OPTIONS: (a) "true", (b) "false"
14 | ANSWER: (a)
15 |
16 | QUESTION: On June 2017, the following claim was made: Rick Ross founded the Maybach Music Group in 2009.
17 | Q: Was this claim true or false?
18 | OPTIONS: (a) "true", (b) "false"
19 | ANSWER: (a)
20 |
21 | QUESTION: On June 2017, the following claim was made: Sennacherib's youngest daughter was Esarhaddon.
22 | Q: Was this claim true or false?
23 | OPTIONS: (a) "true", (b) "false"
24 | ANSWER: (b)
25 |
26 |
--------------------------------------------------------------------------------
/prompts/commonsense_qa/closed_5.txt:
--------------------------------------------------------------------------------
1 | Why do people want to keep working? The possible answers are "frustration", "make money", "boredom", "being paid", "blisters", but the correct answer is "being paid".
2 |
3 | Where would normally find two eyes? The possible answers are "someones face", "faces", "person's head", "potato", but the correct answer is "person's head".
4 |
5 | What might you feel after doing housework for hours? The possible answers are "anger", "not boredom", "stress", "boredom", "anxiety", but the correct answer is "stress".
6 |
7 | Joe did something very bad and he was worried that The People would try punishing him. What type of punishment might be be worried about? The possible answers are "train", "hatred", "get angry", "going to jail", "public outrage", but the correct answer is "going to jail".
8 |
9 | The comforter was used constantly and treaded upon by everyone, where should it be placed? The possible answers are "garbage", "linen closet", "bedroom", "washing machine", "livingroom", but the correct answer is "washing machine".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/fact_checker/closed_5.txt:
--------------------------------------------------------------------------------
1 | On June 2017, the following claim was made: The Ellen Show began its broadcast in the 21st century.
2 | Q: Was this claim true or false? The possible answers are "true", "false", but the correct answer is "true".
3 |
4 | On June 2017, the following claim was made: Estonia does not border the Baltic Sea.
5 | Q: Was this claim true or false? The possible answers are "true", "false", but the correct answer is "false".
6 |
7 | On June 2017, the following claim was made: Stephen Hillenburg directed The SpongeBob SquarePants Movie.
8 | Q: Was this claim true or false? The possible answers are "true", "false", but the correct answer is "true".
9 |
10 | On June 2017, the following claim was made: Rick Ross founded the Maybach Music Group in 2009.
11 | Q: Was this claim true or false? The possible answers are "true", "false", but the correct answer is "true".
12 |
13 | On June 2017, the following claim was made: Sennacherib's youngest daughter was Esarhaddon.
14 | Q: Was this claim true or false? The possible answers are "true", "false", but the correct answer is "false".
15 |
16 |
--------------------------------------------------------------------------------
/prompts/logical_deduction/open_5.txt:
--------------------------------------------------------------------------------
1 | On a shelf, there are three books: an orange book, a yellow book, and a blue book. The blue book is to the right of the yellow book. The orange book is the second from the left. The correct answer is "The blue book is the rightmost".
2 |
3 | On a shelf, there are three books: a green book, a brown book, and an orange book. The brown book is to the left of the orange book. The green book is to the left of the brown book. The correct answer is "The orange book is the rightmost".
4 |
5 | A fruit stand sells three fruits: apples, peaches, and mangoes. The mangoes are less expensive than the peaches. The apples are less expensive than the mangoes. The correct answer is "The peaches are the most expensive".
6 |
7 | In a golf tournament, there were three golfers: Dan, Mel, and Amy. Dan finished above Amy. Amy finished above Mel. The correct answer is "Mel finished last".
8 |
9 | In an antique car show, there are three vehicles: a minivan, a bus, and a sedan. The minivan is newer than the sedan. The minivan is older than the bus. The correct answer is "The minivan is the second-newest".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/implicatures/closed-adv_5-info_added.txt:
--------------------------------------------------------------------------------
1 | QUESTION: Speaker 1: 'Have you seen them recently?' Speaker 2: 'I haven't seen them in ages.' What is the meaning of Speaker 2's response?
2 | OPTIONS: (a) "yes", (b) "no"
3 | ANSWER: (b)
4 |
5 | QUESTION: Speaker 1: 'You want answers?!' Speaker 2: 'I want the truth.' What is the meaning of Speaker 2's response?
6 | OPTIONS: (a) "yes", (b) "no"
7 | ANSWER: (a)
8 |
9 | QUESTION: Speaker 1: 'I am going to conduct my psychology experiment this Saturday. I would be having a handful. Would you help by writing the names of the participants?' Speaker 2: 'I have got some work on my own.' What is the meaning of Speaker 2's response?
10 | OPTIONS: (a) "yes", (b) "no"
11 | ANSWER: (b)
12 |
13 | QUESTION: Speaker 1: 'Do you think you will ever work at that company again?' Speaker 2: 'When pigs fly.' What is the meaning of Speaker 2's response?
14 | OPTIONS: (a) "yes", (b) "no"
15 | ANSWER: (b)
16 |
17 | QUESTION: Speaker 1: 'You mean I pretend I have a mom?' Speaker 2: 'You can do that, can't you?' What is the meaning of Speaker 2's response?
18 | OPTIONS: (a) "yes", (b) "no"
19 | ANSWER: (a)
20 |
21 |
--------------------------------------------------------------------------------
/prompts/implicatures/closed_5-info_added.txt:
--------------------------------------------------------------------------------
1 | Speaker 1: 'Have you seen them recently?' Speaker 2: 'I haven't seen them in ages.' What is the meaning of Speaker 2's response? The possible answers are "yes", "no", but the correct answer is "no".
2 |
3 | Speaker 1: 'You want answers?!' Speaker 2: 'I want the truth.' What is the meaning of Speaker 2's response? The possible answers are "yes", "no", but the correct answer is "yes".
4 |
5 | Speaker 1: 'I am going to conduct my psychology experiment this Saturday. I would be having a handful. Would you help by writing the names of the participants?' Speaker 2: 'I have got some work on my own.' What is the meaning of Speaker 2's response? The possible answers are "yes", "no", but the correct answer is "no".
6 |
7 | Speaker 1: 'Do you think you will ever work at that company again?' Speaker 2: 'When pigs fly.' What is the meaning of Speaker 2's response? The possible answers are "yes", "no", but the correct answer is "no".
8 |
9 | Speaker 1: 'You mean I pretend I have a mom?' Speaker 2: 'You can do that, can't you?' What is the meaning of Speaker 2's response? The possible answers are "yes", "no", but the correct answer is "yes".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/physical_intuition/closed-adv_5.txt:
--------------------------------------------------------------------------------
1 | QUESTION: I throw a ball towards the ground. Which direction does the ball initially move in?
2 | OPTIONS: (a) "Up", (b) "Down", (c) "Left", (d) "Right"
3 | ANSWER: (b)
4 |
5 | QUESTION: A person is on a train going 100 km/h East and throws a ball 50 km/h to the West. At that moment there is a bird flying next to the train at 50 km/h to the East. How is the bird moving from the perspective of the ball?
6 | OPTIONS: (a) "0 km/h", (b) "50 km/h east", (c) "150 km/h east", (d) "150 km/h west"
7 | ANSWER: (a)
8 |
9 | QUESTION: The primary interaction between two individual protons separated by one micron is the
10 | OPTIONS: (a) "Electromagnetic force", (b) "Gravitational force", (c) "Weak force", (d) "Strong force"
11 | ANSWER: (a)
12 |
13 | QUESTION: A decay process occurs via the production of a W boson. What interaction was involved?
14 | OPTIONS: (a) "Electromagnetic", (b) "Strong", (c) "Weak", (d) "Elastic"
15 | ANSWER: (c)
16 |
17 | QUESTION: The primary interaction between two neutrons in the same nucleus is the
18 | OPTIONS: (a) "Electromagnetic force", (b) "Gravitational force", (c) "Weak force", (d) "Strong force"
19 | ANSWER: (d)
20 |
21 |
--------------------------------------------------------------------------------
/prompts/nonsense_words_grammar/closed-adv_5.txt:
--------------------------------------------------------------------------------
1 | QUESTION: Which word in the following sentence is a verb? I'd gralsillit onto the secure felisheret.
2 | OPTIONS: (a) "I'd", (b) "gralsillit", (c) "onto", (d) "the", (e) "tall", (f) "felisheret"
3 | ANSWER: (b)
4 |
5 | QUESTION: Which word in the following sentence is a noun? Gralliciously recarpilate the brooktoine.
6 | OPTIONS: (a) "Gralliciously", (b) "recarpilate", (c) "the", (d) "brooktoine"
7 | ANSWER: (d)
8 |
9 | QUESTION: Which word in the following sentence is an adverb? The helipioles defractated regillently for the broarkers.
10 | OPTIONS: (a) "The", (b) "helipioles", (c) "defractated", (d) "regillently", (e) "for", (f) "the", (g) "broarkers"
11 | ANSWER: (d)
12 |
13 | QUESTION: Which word in the following sentence is an adjective? Procaliff stefiliously held the brizzerous box.
14 | OPTIONS: (a) "Procaliff", (b) "stefiliously", (c) "held", (d) "the", (e) "brizzerous", (f) "box"
15 | ANSWER: (e)
16 |
17 | QUESTION: What is the grammatical object of the following sentence? Yuki inquisitively gerriantates the polderiot.
18 | OPTIONS: (a) "Yuki", (b) "inquisitively", (c) "gerriantates", (d) "the", (e) "polderiot"
19 | ANSWER: (e)
20 |
21 |
--------------------------------------------------------------------------------
/prompts/nonsense_words_grammar/closed_5.txt:
--------------------------------------------------------------------------------
1 | Which word in the following sentence is a verb? I'd gralsillit onto the secure felisheret. The possible answers are "I'd", "gralsillit", "onto", "the", "tall", "felisheret", but the correct answer is "gralsillit".
2 |
3 | Which word in the following sentence is a noun? Gralliciously recarpilate the brooktoine. The possible answers are "Gralliciously", "recarpilate", "the", "brooktoine", but the correct answer is "brooktoine".
4 |
5 | Which word in the following sentence is an adverb? The helipioles defractated regillently for the broarkers. The possible answers are "The", "helipioles", "defractated", "regillently", "for", "the", "broarkers", but the correct answer is "regillently".
6 |
7 | Which word in the following sentence is an adjective? Procaliff stefiliously held the brizzerous box. The possible answers are "Procaliff", "stefiliously", "held", "the", "brizzerous", "box", but the correct answer is "brizzerous".
8 |
9 | What is the grammatical object of the following sentence? Yuki inquisitively gerriantates the polderiot. The possible answers are "Yuki", "inquisitively", "gerriantates", "the", "polderiot", but the correct answer is "polderiot".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/physical_intuition/closed_5.txt:
--------------------------------------------------------------------------------
1 | I throw a ball towards the ground. Which direction does the ball initially move in? The possible answers are "Up", "Down", "Left", "Right", but the correct answer is "Down".
2 |
3 | A person is on a train going 100 km/h East and throws a ball 50 km/h to the West. At that moment there is a bird flying next to the train at 50 km/h to the East. How is the bird moving from the perspective of the ball? The possible answers are "0 km/h", "50 km/h east", "150 km/h east", "150 km/h west", but the correct answer is "0 km/h".
4 |
5 | The primary interaction between two individual protons separated by one micron is the The possible answers are "Electromagnetic force", "Gravitational force", "Weak force", "Strong force", but the correct answer is "Electromagnetic force".
6 |
7 | A decay process occurs via the production of a W boson. What interaction was involved? The possible answers are "Electromagnetic", "Strong", "Weak", "Elastic", but the correct answer is "Weak".
8 |
9 | The primary interaction between two neutrons in the same nucleus is the The possible answers are "Electromagnetic force", "Gravitational force", "Weak force", "Strong force", but the correct answer is "Strong force".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/logical_deduction/closed-adv_5-info_removed.txt:
--------------------------------------------------------------------------------
1 | QUESTION: The blue book is to the right of the yellow book. The orange book is the second from the left.
2 | OPTIONS: (a) "The orange book is the rightmost", (b) "The yellow book is the rightmost", (c) "The blue book is the rightmost"
3 | ANSWER: (c)
4 |
5 | QUESTION: The brown book is to the left of the orange book. The green book is to the left of the brown book.
6 | OPTIONS: (a) "The green book is the rightmost", (b) "The brown book is the rightmost", (c) "The orange book is the rightmost"
7 | ANSWER: (c)
8 |
9 | QUESTION: The mangoes are less expensive than the peaches. The apples are less expensive than the mangoes.
10 | OPTIONS: (a) "The apples are the most expensive", (b) "The peaches are the most expensive", (c) "The mangoes are the most expensive"
11 | ANSWER: (b)
12 |
13 | QUESTION: Dan finished above Amy. Amy finished above Mel.
14 | OPTIONS: (a) "Dan finished last", (b) "Mel finished last", (c) "Amy finished last"
15 | ANSWER: (b)
16 |
17 | QUESTION: The minivan is newer than the sedan. The minivan is older than the bus.
18 | OPTIONS: (a) "The minivan is the second-newest", (b) "The bus is the second-newest", (c) "The sedan is the second-newest"
19 | ANSWER: (a)
20 |
21 |
--------------------------------------------------------------------------------
/prompts/figure_of_speech_detection/closed_5.txt:
--------------------------------------------------------------------------------
1 | Kisses are the flowers of affection. The possible answers are "Simile", "Metaphor", "Personification", "Apostrophe", "Oxymoron", "Hyperbole", "Pun", "Euphemism", "Alliteration", "Onomatopoeia", but the correct answer is "Metaphor".
2 |
3 | Sara exclaimed that the ice cream she purchased at the shop was awfully good. The possible answers are "Simile", "Metaphor", "Personification", "Apostrophe", "Oxymoron", "Hyperbole", "Pun", "Euphemism", "Alliteration", "Onomatopoeia", but the correct answer is "Oxymoron".
4 |
5 | Lightning danced across the sky. The possible answers are "Simile", "Metaphor", "Personification", "Apostrophe", "Oxymoron", "Hyperbole", "Pun", "Euphemism", "Alliteration", "Onomatopoeia", but the correct answer is "Personification".
6 |
7 | Geometry is so pointless. The possible answers are "Simile", "Metaphor", "Personification", "Apostrophe", "Oxymoron", "Hyperbole", "Pun", "Euphemism", "Alliteration", "Onomatopoeia", but the correct answer is "Pun".
8 |
9 | After running a marathon, a marathoner eats like a horse. The possible answers are "Simile", "Metaphor", "Personification", "Apostrophe", "Oxymoron", "Hyperbole", "Pun", "Euphemism", "Alliteration", "Onomatopoeia", but the correct answer is "Simile".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/modified_arithmetic/open_5.txt:
--------------------------------------------------------------------------------
1 | In the following lines, the symbol -> represents a simple mathematical operation.
2 | 451 + 499 -> 950
3 | 164 + 181 -> 345
4 | 468 + 909 -> 1377
5 | 734 + 822 -> 1556
6 | 943 + 977 -> 1920
7 | 6 + 329 -> ?
8 | The correct answer is "335".
9 |
10 | In the following lines, the symbol -> represents a simple mathematical operation.
11 | 61 + 514 -> 575
12 | 429 + 579 -> 1008
13 | 536 + 690 -> 1226
14 | 140 + 742 -> 882
15 | 299 + 674 -> 973
16 | 310 + 470 -> ?
17 | The correct answer is "780".
18 |
19 | In the following lines, the symbol -> represents a simple mathematical operation.
20 | 277 + 848 -> 1125
21 | 721 + 168 -> 889
22 | 329 + 649 -> 978
23 | 440 + 390 -> 830
24 | 495 + 380 -> 875
25 | 305 + 741 -> ?
26 | The correct answer is "1046".
27 |
28 | In the following lines, the symbol -> represents a simple mathematical operation.
29 | 252 + 316 -> 568
30 | 414 + 94 -> 508
31 | 784 + 891 -> 1675
32 | 23 + 627 -> 650
33 | 662 + 849 -> 1511
34 | 49 + 501 -> ?
35 | The correct answer is "550".
36 |
37 | In the following lines, the symbol -> represents a simple mathematical operation.
38 | 734 + 88 -> 822
39 | 798 + 433 -> 1231
40 | 787 + 747 -> 1534
41 | 182 + 208 -> 390
42 | 492 + 129 -> 621
43 | 664 + 12 -> ?
44 | The correct answer is "676".
45 |
46 |
--------------------------------------------------------------------------------
/prompts/social_iqa/closed-adv_5.txt:
--------------------------------------------------------------------------------
1 | QUESTION: Austin's family was celebrating their parents 50th anniversary during dinner at a new restaurant. What would Austin's family do next?
2 | OPTIONS: (a) "Refuse to eat dinner with the family", (b) "Eat dinner at the restaurant", (c) "Happy"
3 | ANSWER: (b)
4 |
5 | QUESTION: After they picked a hiking trail to take together, Sydney met Skylar near the canyon edge. What does Sydney need to do before this?
6 | OPTIONS: (a) "Pack for the hike", (b) "Greet each other", (c) "Walk along the rim"
7 | ANSWER: (a)
8 |
9 | QUESTION: Tracy gave video games to my friend. They knew they were quality titles they would enjoy. What will Tracy want to do next?
10 | OPTIONS: (a) "Give bad advice", (b) "Be generous", (c) "Give advice"
11 | ANSWER: (b)
12 |
13 | QUESTION: Quinn cried their eyes out and ate a whole carton of icecream over her breakup. What will Quinn want to do next?
14 | OPTIONS: (a) "Get back together with her boyrfrined", (b) "Purchase ice cream", (c) "Keep crying"
15 | ANSWER: (c)
16 |
17 | QUESTION: Jesse had made plans with Cameron to go to the dance, but instead stood Cameron up. What will Cameron want to do next?
18 | OPTIONS: (a) "Not go to the dance anymore", (b) "Move away from the area", (c) "Tell Jesse how they feel"
19 | ANSWER: (c)
20 |
21 |
--------------------------------------------------------------------------------
/prompts/figure_of_speech_detection/closed-adv_5.txt:
--------------------------------------------------------------------------------
1 | QUESTION: Kisses are the flowers of affection.
2 | OPTIONS: (a) "Simile", (b) "Metaphor", (c) "Personification", (d) "Apostrophe", (e) "Oxymoron", (f) "Hyperbole", (g) "Pun", (h) "Euphemism", (i) "Alliteration", (j) "Onomatopoeia"
3 | ANSWER: (b)
4 |
5 | QUESTION: Sara exclaimed that the ice cream she purchased at the shop was awfully good.
6 | OPTIONS: (a) "Simile", (b) "Metaphor", (c) "Personification", (d) "Apostrophe", (e) "Oxymoron", (f) "Hyperbole", (g) "Pun", (h) "Euphemism", (i) "Alliteration", (j) "Onomatopoeia"
7 | ANSWER: (e)
8 |
9 | QUESTION: Lightning danced across the sky.
10 | OPTIONS: (a) "Simile", (b) "Metaphor", (c) "Personification", (d) "Apostrophe", (e) "Oxymoron", (f) "Hyperbole", (g) "Pun", (h) "Euphemism", (i) "Alliteration", (j) "Onomatopoeia"
11 | ANSWER: (c)
12 |
13 | QUESTION: Geometry is so pointless.
14 | OPTIONS: (a) "Simile", (b) "Metaphor", (c) "Personification", (d) "Apostrophe", (e) "Oxymoron", (f) "Hyperbole", (g) "Pun", (h) "Euphemism", (i) "Alliteration", (j) "Onomatopoeia"
15 | ANSWER: (g)
16 |
17 | QUESTION: After running a marathon, a marathoner eats like a horse.
18 | OPTIONS: (a) "Simile", (b) "Metaphor", (c) "Personification", (d) "Apostrophe", (e) "Oxymoron", (f) "Hyperbole", (g) "Pun", (h) "Euphemism", (i) "Alliteration", (j) "Onomatopoeia"
19 | ANSWER: (a)
20 |
21 |
--------------------------------------------------------------------------------
/prompts/logical_deduction/closed_5-info_removed.txt:
--------------------------------------------------------------------------------
1 | The blue book is to the right of the yellow book. The orange book is the second from the left. The possible answers are "The orange book is the rightmost", "The yellow book is the rightmost", "The blue book is the rightmost", but the correct answer is "The blue book is the rightmost".
2 |
3 | The brown book is to the left of the orange book. The green book is to the left of the brown book. The possible answers are "The green book is the rightmost", "The brown book is the rightmost", "The orange book is the rightmost", but the correct answer is "The orange book is the rightmost".
4 |
5 | The mangoes are less expensive than the peaches. The apples are less expensive than the mangoes. The possible answers are "The apples are the most expensive", "The peaches are the most expensive", "The mangoes are the most expensive", but the correct answer is "The peaches are the most expensive".
6 |
7 | Dan finished above Amy. Amy finished above Mel. The possible answers are "Dan finished last", "Mel finished last", "Amy finished last", but the correct answer is "Mel finished last".
8 |
9 | The minivan is newer than the sedan. The minivan is older than the bus. The possible answers are "The minivan is the second-newest", "The bus is the second-newest", "The sedan is the second-newest", but the correct answer is "The minivan is the second-newest".
10 |
11 |
--------------------------------------------------------------------------------
/prompts/social_iqa/closed_5.txt:
--------------------------------------------------------------------------------
1 | Austin's family was celebrating their parents 50th anniversary during dinner at a new restaurant. What would Austin's family do next? The possible answers are "Refuse to eat dinner with the family", "Happy", "Eat dinner at the restaurant", but the correct answer is "Eat dinner at the restaurant".
2 |
3 | After they picked a hiking trail to take together, Sydney met Skylar near the canyon edge. What does Sydney need to do before this? The possible answers are "Greet each other", "Walk along the rim", "Pack for the hike", but the correct answer is "Pack for the hike".
4 |
5 | Tracy gave video games to my friend. They knew they were quality titles they would enjoy. What will Tracy want to do next? The possible answers are "Give bad advice", "Be generous", "Give advice", but the correct answer is "Be generous".
6 |
7 | Quinn cried their eyes out and ate a whole carton of icecream over her breakup. What will Quinn want to do next? The possible answers are "Get back together with her boyrfrined", "Purchase ice cream", "Keep crying", but the correct answer is "Keep crying".
8 |
9 | Jesse had made plans with Cameron to go to the dance, but instead stood Cameron up. What will Cameron want to do next? The possible answers are "Not go to the dance anymore", "Move away from the area", "Tell Jesse how they feel", but the correct answer is "Tell Jesse how they feel".
10 | 11 | -------------------------------------------------------------------------------- /prompts/causal_judgment/open_2.txt: -------------------------------------------------------------------------------- 1 | In a particular building there are two businesses, a travel agency and a graphic design studio. The building's climate control system is a new design that saves energy by keeping track of the number of people in the building, and only turning on when enough people have entered the building. The climate control system will turn on when the people who work at the travel agency or the people who work in the design studio arrive for work. Each office has enough employees to turn on the climate control system on their own. The travel agency employees almost always arrive at 8:45 am, and the design studio employees almost always arrive at 8:45 am. Today, the travel agency employees arrived at 8:45 am. The design studio employees also arrived at 8:45 am, as usual. So, today, the climate control system turned on at 8:45 am. Did the design studio agents cause the climate control system to turn on at 8:45 am? The correct answer is "Yes". 2 | 3 | Frank T., had an ongoing dispute with his neighbor over a stretch of land and one day decided to shoot his neighbor in the body. Frank T. had no experience with guns, his hand slipped on the barrel of the gun, and the shot went wild. Nonetheless, the bullet bounced off a large boulder several feet away and hit the neighbor's body, causing significant injury. Did Frank T. intentionally shoot his neighbor in the body? The correct answer is "No". 4 | 5 | -------------------------------------------------------------------------------- /prompts/causal_judgment/closed-adv_2.txt: -------------------------------------------------------------------------------- 1 | QUESTION: In a particular building there are two businesses, a travel agency and a graphic design studio. 
The building's climate control system is a new design that saves energy by keeping track of the number of people in the building, and only turning on when enough people have entered the building. The climate control system will turn on when the people who work at the travel agency or the people who work in the design studio arrive for work. Each office has enough employees to turn on the climate control system on their own. The travel agency employees almost always arrive at 8:45 am, and the design studio employees almost always arrive at 8:45 am. Today, the travel agency employees arrived at 8:45 am. The design studio employees also arrived at 8:45 am, as usual. So, today, the climate control system turned on at 8:45 am. Did the design studio agents cause the climate control system to turn on at 8:45 am? 2 | OPTIONS: (a) "Yes", (b) "No" 3 | ANSWER: (a) 4 | 5 | QUESTION: Frank T., had an ongoing dispute with his neighbor over a stretch of land and one day decided to shoot his neighbor in the body. Frank T. had no experience with guns, his hand slipped on the barrel of the gun, and the shot went wild. Nonetheless, the bullet bounced off a large boulder several feet away and hit the neighbor's body, causing significant injury. Did Frank T. intentionally shoot his neighbor in the body? 6 | OPTIONS: (a) "Yes", (b) "No" 7 | ANSWER: (b) 8 | 9 | -------------------------------------------------------------------------------- /prompts/codenames/open_5.txt: -------------------------------------------------------------------------------- 1 | Try to identify the 2 words best associated with the word E.T. from the following list: limousine, model, watch, boom, block, robin, fish, scale, new york, day, capital, kangaroo, heart, spine, diamond, head, stream, fire, alien, church, jet, cloak. Give your answer in alphabetical order. 
The correct answer is "alien, cloak" 2 | 3 | Try to identify the 2 words best associated with the word CRYSTAL from the following list: page, anchor, drum, sack, pacific, shoulder, mark, glacier, sugar, gear, tunnel, book. Give your answer in alphabetical order. The correct answer is "glacier, sugar" 4 | 5 | Try to identify the 1 word best associated with the word MYTHOLOGY from the following list: boot, wave, centaur, calf, antarctica, slip, helicopter, vet. Give your answer in alphabetical order. The correct answer is "centaur" 6 | 7 | Try to identify the 2 words best associated with the word HORSES from the following list: team, big ben, bubble, pizza, kung fu, battle, sled, rat, cane, patient, yellowstone, step, onion, glasses, blind, bulb, storm, ranch, spoon, desk, violet, mummy, apron, fuel, saddle. Give your answer in alphabetical order. The correct answer is "ranch, saddle" 8 | 9 | Try to identify the 2 words best associated with the word GALILEO from the following list: battery, calf, kangaroo, spell, ruler, jack, australia, jupiter, lock, pilot, centaur, dragon, table, jam. Give your answer in alphabetical order. The correct answer is "jupiter, ruler" 10 | 11 | -------------------------------------------------------------------------------- /prompts/causal_judgment/closed_2.txt: -------------------------------------------------------------------------------- 1 | In a particular building there are two businesses, a travel agency and a graphic design studio. The building's climate control system is a new design that saves energy by keeping track of the number of people in the building, and only turning on when enough people have entered the building. The climate control system will turn on when the people who work at the travel agency or the people who work in the design studio arrive for work. Each office has enough employees to turn on the climate control system on their own. 
The travel agency employees almost always arrive at 8:45 am, and the design studio employees almost always arrive at 8:45 am. Today, the travel agency employees arrived at 8:45 am. The design studio employees also arrived at 8:45 am, as usual. So, today, the climate control system turned on at 8:45 am. Did the design studio agents cause the climate control system to turn on at 8:45 am? The possible answers are "Yes", "No", but the correct answer is "Yes". 2 | 3 | Frank T., had an ongoing dispute with his neighbor over a stretch of land and one day decided to shoot his neighbor in the body. Frank T. had no experience with guns, his hand slipped on the barrel of the gun, and the shot went wild. Nonetheless, the bullet bounced off a large boulder several feet away and hit the neighbor's body, causing significant injury. Did Frank T. intentionally shoot his neighbor in the body? The possible answers are "Yes", "No", but the correct answer is "No". 4 | 5 | -------------------------------------------------------------------------------- /prompts/gsm8k/open_5.txt: -------------------------------------------------------------------------------- 1 | Linda is building a new hotel with two wings. The first wing has 9 floors and each floor has 6 halls each with 32 rooms. The second wing has 7 floors each with 9 halls with 40 rooms each. How many rooms are in the hotel total? The correct answer is "4248" 2 | 3 | Carl buys index cards for his class. He gives each student 10 index cards. He teaches 6 periods a day and each class has 30 students. If a 50 pack of index cards cost $3 how much did he spend on them all? The correct answer is "108" 4 | 5 | The school is organizing a trip to the museum. 4 buses were hired to take the children and teachers to their destination. The second bus has twice the number of people on it as the first bus. The third bus has 6 fewer people than the second bus. The fourth bus has 9 more people than the first bus. 
If the first bus has 12 people, how many people are going to the museum in total? The correct answer is "75" 6 | 7 | Elsa started the day with 40 marbles. At breakfast, she lost 3 marbles while playing. At lunchtime, she gave her best friend Susie 5 marbles. In the afternoon, Elsa's mom bought her a new bag with 12 marbles. Susie came back and gave Elsa twice as many marbles as she received at lunch. How many marbles did Elsa end the day with? The correct answer is "54" 8 | 9 | Leticia, Nina, and Rosalie have a total of 25 people on their dance team. If 8 people quit, but 13 new people got in, how many people are there now on the team? The correct answer is "30" 10 | 11 | -------------------------------------------------------------------------------- /prompts/modified_arithmetic/closed-adv_5.txt: -------------------------------------------------------------------------------- 1 | QUESTION: In the following lines, the symbol -> represents a simple mathematical operation. 2 | 451 + 499 -> 950 3 | 164 + 181 -> 345 4 | 468 + 909 -> 1377 5 | 734 + 822 -> 1556 6 | 943 + 977 -> 1920 7 | 6 + 329 -> ? 8 | OPTIONS: (a) "340", (b) "334", (c) "335" 9 | ANSWER: (c) 10 | 11 | QUESTION: In the following lines, the symbol -> represents a simple mathematical operation. 12 | 61 + 514 -> 575 13 | 429 + 579 -> 1008 14 | 536 + 690 -> 1226 15 | 140 + 742 -> 882 16 | 299 + 674 -> 973 17 | 310 + 470 -> ? 18 | OPTIONS: (a) "786", (b) "780", (c) "776" 19 | ANSWER: (b) 20 | 21 | QUESTION: In the following lines, the symbol -> represents a simple mathematical operation. 22 | 277 + 848 -> 1125 23 | 721 + 168 -> 889 24 | 329 + 649 -> 978 25 | 440 + 390 -> 830 26 | 495 + 380 -> 875 27 | 305 + 741 -> ? 28 | OPTIONS: (a) "1046", (b) "1047", (c) "1041" 29 | ANSWER: (a) 30 | 31 | QUESTION: In the following lines, the symbol -> represents a simple mathematical operation. 32 | 252 + 316 -> 568 33 | 414 + 94 -> 508 34 | 784 + 891 -> 1675 35 | 23 + 627 -> 650 36 | 662 + 849 -> 1511 37 | 49 + 501 -> ? 
38 | OPTIONS: (a) "550", (b) "549", (c) "551" 39 | ANSWER: (a) 40 | 41 | QUESTION: In the following lines, the symbol -> represents a simple mathematical operation. 42 | 734 + 88 -> 822 43 | 798 + 433 -> 1231 44 | 787 + 747 -> 1534 45 | 182 + 208 -> 390 46 | 492 + 129 -> 621 47 | 664 + 12 -> ? 48 | OPTIONS: (a) "677", (b) "673", (c) "676" 49 | ANSWER: (c) 50 | 51 | -------------------------------------------------------------------------------- /prompts/modified_arithmetic/closed_5.txt: -------------------------------------------------------------------------------- 1 | In the following lines, the symbol -> represents a simple mathematical operation. 2 | 451 + 499 -> 950 3 | 164 + 181 -> 345 4 | 468 + 909 -> 1377 5 | 734 + 822 -> 1556 6 | 943 + 977 -> 1920 7 | 6 + 329 -> ? 8 | The possible answers are "340", "334", "335", but the correct answer is "335". 9 | 10 | In the following lines, the symbol -> represents a simple mathematical operation. 11 | 61 + 514 -> 575 12 | 429 + 579 -> 1008 13 | 536 + 690 -> 1226 14 | 140 + 742 -> 882 15 | 299 + 674 -> 973 16 | 310 + 470 -> ? 17 | The possible answers are "786", "780", "776", but the correct answer is "780". 18 | 19 | In the following lines, the symbol -> represents a simple mathematical operation. 20 | 277 + 848 -> 1125 21 | 721 + 168 -> 889 22 | 329 + 649 -> 978 23 | 440 + 390 -> 830 24 | 495 + 380 -> 875 25 | 305 + 741 -> ? 26 | The possible answers are "1046", "1047", "1041", but the correct answer is "1046". 27 | 28 | In the following lines, the symbol -> represents a simple mathematical operation. 29 | 252 + 316 -> 568 30 | 414 + 94 -> 508 31 | 784 + 891 -> 1675 32 | 23 + 627 -> 650 33 | 662 + 849 -> 1511 34 | 49 + 501 -> ? 35 | The possible answers are "550", "549", "551", but the correct answer is "550". 36 | 37 | In the following lines, the symbol -> represents a simple mathematical operation. 
38 | 734 + 88 -> 822 39 | 798 + 433 -> 1231 40 | 787 + 747 -> 1534 41 | 182 + 208 -> 390 42 | 492 + 129 -> 621 43 | 664 + 12 -> ? 44 | The possible answers are "677", "673", "676", but the correct answer is "676". 45 | 46 | -------------------------------------------------------------------------------- /prompts/figure_of_speech_detection/closed_5-info_added.txt: -------------------------------------------------------------------------------- 1 | Kisses are the flowers of affection. What type of figurative language does this represent? The possible answers are "Simile", "Metaphor", "Personification", "Apostrophe", "Oxymoron", "Hyperbole", "Pun", "Euphemism", "Alliteration", "Onomatopoeia", but the correct answer is "Metaphor". 2 | 3 | Sara exclaimed that the ice cream she purchased at the shop was awfully good. What type of figurative language does this represent? The possible answers are "Simile", "Metaphor", "Personification", "Apostrophe", "Oxymoron", "Hyperbole", "Pun", "Euphemism", "Alliteration", "Onomatopoeia", but the correct answer is "Oxymoron". 4 | 5 | Lightning danced across the sky. What type of figurative language does this represent? The possible answers are "Simile", "Metaphor", "Personification", "Apostrophe", "Oxymoron", "Hyperbole", "Pun", "Euphemism", "Alliteration", "Onomatopoeia", but the correct answer is "Personification". 6 | 7 | Geometry is so pointless. What type of figurative language does this represent? The possible answers are "Simile", "Metaphor", "Personification", "Apostrophe", "Oxymoron", "Hyperbole", "Pun", "Euphemism", "Alliteration", "Onomatopoeia", but the correct answer is "Pun". 8 | 9 | After running a marathon, a marathoner eats like a horse. What type of figurative language does this represent? The possible answers are "Simile", "Metaphor", "Personification", "Apostrophe", "Oxymoron", "Hyperbole", "Pun", "Euphemism", "Alliteration", "Onomatopoeia", but the correct answer is "Simile". 
10 | 11 | -------------------------------------------------------------------------------- /prompts/logical_deduction/closed-adv_5.txt: -------------------------------------------------------------------------------- 1 | QUESTION: On a shelf, there are three books: an orange book, a yellow book, and a blue book. The blue book is to the right of the yellow book. The orange book is the second from the left. 2 | OPTIONS: (a) "The orange book is the rightmost", (b) "The yellow book is the rightmost", (c) "The blue book is the rightmost" 3 | ANSWER: (c) 4 | 5 | QUESTION: On a shelf, there are three books: a green book, a brown book, and an orange book. The brown book is to the left of the orange book. The green book is to the left of the brown book. 6 | OPTIONS: (a) "The green book is the rightmost", (b) "The brown book is the rightmost", (c) "The orange book is the rightmost" 7 | ANSWER: (c) 8 | 9 | QUESTION: A fruit stand sells three fruits: apples, peaches, and mangoes. The mangoes are less expensive than the peaches. The apples are less expensive than the mangoes. 10 | OPTIONS: (a) "The apples are the most expensive", (b) "The peaches are the most expensive", (c) "The mangoes are the most expensive" 11 | ANSWER: (b) 12 | 13 | QUESTION: In a golf tournament, there were three golfers: Dan, Mel, and Amy. Dan finished above Amy. Amy finished above Mel. 14 | OPTIONS: (a) "Dan finished last", (b) "Mel finished last", (c) "Amy finished last" 15 | ANSWER: (b) 16 | 17 | QUESTION: In an antique car show, there are three vehicles: a minivan, a bus, and a sedan. The minivan is newer than the sedan. The minivan is older than the bus. 
18 | OPTIONS: (a) "The minivan is the second-newest", (b) "The bus is the second-newest", (c) "The sedan is the second-newest" 19 | ANSWER: (a) 20 | 21 | -------------------------------------------------------------------------------- /prompts/figure_of_speech_detection/closed-adv_5-info_added.txt: -------------------------------------------------------------------------------- 1 | QUESTION: Kisses are the flowers of affection. What type of figurative language does this represent? 2 | OPTIONS: (a) "Simile", (b) "Metaphor", (c) "Personification", (d) "Apostrophe", (e) "Oxymoron", (f) "Hyperbole", (g) "Pun", (h) "Euphemism", (i) "Alliteration", (j) "Onomatopoeia" 3 | ANSWER: (b) 4 | 5 | QUESTION: Sara exclaimed that the ice cream she purchased at the shop was awfully good. What type of figurative language does this represent? 6 | OPTIONS: (a) "Simile", (b) "Metaphor", (c) "Personification", (d) "Apostrophe", (e) "Oxymoron", (f) "Hyperbole", (g) "Pun", (h) "Euphemism", (i) "Alliteration", (j) "Onomatopoeia" 7 | ANSWER: (e) 8 | 9 | QUESTION: Lightning danced across the sky. What type of figurative language does this represent? 10 | OPTIONS: (a) "Simile", (b) "Metaphor", (c) "Personification", (d) "Apostrophe", (e) "Oxymoron", (f) "Hyperbole", (g) "Pun", (h) "Euphemism", (i) "Alliteration", (j) "Onomatopoeia" 11 | ANSWER: (c) 12 | 13 | QUESTION: Geometry is so pointless. What type of figurative language does this represent? 14 | OPTIONS: (a) "Simile", (b) "Metaphor", (c) "Personification", (d) "Apostrophe", (e) "Oxymoron", (f) "Hyperbole", (g) "Pun", (h) "Euphemism", (i) "Alliteration", (j) "Onomatopoeia" 15 | ANSWER: (g) 16 | 17 | QUESTION: After running a marathon, a marathoner eats like a horse. What type of figurative language does this represent? 
18 | OPTIONS: (a) "Simile", (b) "Metaphor", (c) "Personification", (d) "Apostrophe", (e) "Oxymoron", (f) "Hyperbole", (g) "Pun", (h) "Euphemism", (i) "Alliteration", (j) "Onomatopoeia" 19 | ANSWER: (a) 20 | 21 | -------------------------------------------------------------------------------- /prompts/english_proverbs/open_5.txt: -------------------------------------------------------------------------------- 1 | Laura was really never very successful with men. But she met Mike some time ago, and he is totally head over heels in love with her. Which of the following proverbs best apply to this situation? The correct answer is "Beauty is in the eye of the beholder". 2 | 3 | Laura always used to be nice and permissive with her students as long as they accomplished their work. However, with time she realized being permissive only leads to the students doing what they felt like and not paying any attention to what Laura was saying. So she became stricter. Which of the following proverbs best apply to this situation? The correct answer is "Give them an inch and they will take a mile". 4 | 5 | He left his job tired of so much chaos and the terrible organization. He got into a new company that looked much more professional, but after some time, similar problems started appearing. Which of the following proverbs best apply to this situation? The correct answer is "The neighbor's grass always looks greener". 6 | 7 | Alice always makes sure to keep an extra case of apples in her house. One week, the local supermarket ran out of apples. Thankfully because of Alice's preparation, she was still able to eat apples that week. Which of the following proverbs best apply to this situation? The correct answer is "It is better to be safe than sorry". 8 | 9 | This has been the fourth company she has grounded on a similar idea. All the previous ones failed, but now she's really got it right. It's skyrocketing. Which of the following proverbs best apply to this situation? 
The correct answer is "Fall seven times, stand up eight". 10 | 11 | -------------------------------------------------------------------------------- /prompts/logical_deduction/closed_5.txt: -------------------------------------------------------------------------------- 1 | On a shelf, there are three books: an orange book, a yellow book, and a blue book. The blue book is to the right of the yellow book. The orange book is the second from the left. The possible answers are "The orange book is the rightmost", "The yellow book is the rightmost", "The blue book is the rightmost", but the correct answer is "The blue book is the rightmost". 2 | 3 | On a shelf, there are three books: a green book, a brown book, and an orange book. The brown book is to the left of the orange book. The green book is to the left of the brown book. The possible answers are "The green book is the rightmost", "The brown book is the rightmost", "The orange book is the rightmost", but the correct answer is "The orange book is the rightmost". 4 | 5 | A fruit stand sells three fruits: apples, peaches, and mangoes. The mangoes are less expensive than the peaches. The apples are less expensive than the mangoes. The possible answers are "The apples are the most expensive", "The peaches are the most expensive", "The mangoes are the most expensive", but the correct answer is "The peaches are the most expensive". 6 | 7 | In a golf tournament, there were three golfers: Dan, Mel, and Amy. Dan finished above Amy. Amy finished above Mel. The possible answers are "Dan finished last", "Mel finished last", "Amy finished last", but the correct answer is "Mel finished last". 8 | 9 | In an antique car show, there are three vehicles: a minivan, a bus, and a sedan. The minivan is newer than the sedan. The minivan is older than the bus. 
The possible answers are "The minivan is the second-newest", "The bus is the second-newest", "The sedan is the second-newest", but the correct answer is "The minivan is the second-newest". 10 | 11 | -------------------------------------------------------------------------------- /prompts/codenames/closed-adv_5.txt: -------------------------------------------------------------------------------- 1 | QUESTION: Try to identify the 2 words best associated with the word E.T. from the following list: limousine, model, watch, boom, block, robin, fish, scale, new york, day, capital, kangaroo, heart, spine, diamond, head, stream, fire, alien, church, jet, cloak. Give your answer in alphabetical order. 2 | OPTIONS: (a) "fish, cloak", (b) "alien, cloak", (c) "jet, cloak" 3 | ANSWER: (b) 4 | 5 | QUESTION: Try to identify the 2 words best associated with the word CRYSTAL from the following list: page, anchor, drum, sack, pacific, shoulder, mark, glacier, sugar, gear, tunnel, book. Give your answer in alphabetical order. 6 | OPTIONS: (a) "tunnel, book", (b) "glacier, sugar", (c) "tunnel, glacier" 7 | ANSWER: (b) 8 | 9 | QUESTION: Try to identify the 1 word best associated with the word MYTHOLOGY from the following list: boot, wave, centaur, calf, antarctica, slip, helicopter, vet. Give your answer in alphabetical order. 10 | OPTIONS: (a) "centaur", (b) "antarctica", (c) "calf" 11 | ANSWER: (a) 12 | 13 | QUESTION: Try to identify the 2 words best associated with the word HORSES from the following list: team, big ben, bubble, pizza, kung fu, battle, sled, rat, cane, patient, yellowstone, step, onion, glasses, blind, bulb, storm, ranch, spoon, desk, violet, mummy, apron, fuel, saddle. Give your answer in alphabetical order. 
14 | OPTIONS: (a) "step, big ben", (b) "ranch, saddle", (c) "step, yellowstone" 15 | ANSWER: (b) 16 | 17 | QUESTION: Try to identify the 2 words best associated with the word GALILEO from the following list: battery, calf, kangaroo, spell, ruler, jack, australia, jupiter, lock, pilot, centaur, dragon, table, jam. Give your answer in alphabetical order. 18 | OPTIONS: (a) "jupiter, ruler", (b) "pilot, calf", (c) "pilot, centaur" 19 | ANSWER: (a) 20 | 21 | -------------------------------------------------------------------------------- /prompts/gsm8k/closed_5.txt: -------------------------------------------------------------------------------- 1 | Linda is building a new hotel with two wings. The first wing has 9 floors and each floor has 6 halls each with 32 rooms. The second wing has 7 floors each with 9 halls with 40 rooms each. How many rooms are in the hotel total? The possible answers are "192", "2520", "4248", "1728", "360", but the correct answer is "4248". 2 | 3 | Carl buys index cards for his class. He gives each student 10 index cards. He teaches 6 periods a day and each class has 30 students. If a 50 pack of index cards cost $3 how much did he spend on them all? The possible answers are "36", "180", "30", "108", "50", but the correct answer is "108". 4 | 5 | The school is organizing a trip to the museum. 4 buses were hired to take the children and teachers to their destination. The second bus has twice the number of people on it as the first bus. The third bus has 6 fewer people than the second bus. The fourth bus has 9 more people than the first bus. If the first bus has 12 people, how many people are going to the museum in total? The possible answers are "18", "12", "75", "24", "21", but the correct answer is "75". 6 | 7 | Elsa started the day with 40 marbles. At breakfast, she lost 3 marbles while playing. At lunchtime, she gave her best friend Susie 5 marbles. In the afternoon, Elsa's mom bought her a new bag with 12 marbles. 
Susie came back and gave Elsa twice as many marbles as she received at lunch. How many marbles did Elsa end the day with? The possible answers are "32", "54", "44", "40", "12", but the correct answer is "54". 8 | 9 | Leticia, Nina, and Rosalie have a total of 25 people on their dance team. If 8 people quit, but 13 new people got in, how many people are there now on the team? The possible answers are "25", "17", "30", "13", "8", but the correct answer is "30". 10 | 11 | -------------------------------------------------------------------------------- /prompts/gsm8k/closed-adv_5.txt: -------------------------------------------------------------------------------- 1 | QUESTION: Linda is building a new hotel with two wings. The first wing has 9 floors and each floor has 6 halls each with 32 rooms. The second wing has 7 floors each with 9 halls with 40 rooms each. How many rooms are in the hotel total? 2 | OPTIONS: (a) "192", (b) "2520", (c) "4248", (d) "1728", (e) "360" 3 | ANSWER: (c) 4 | 5 | QUESTION: Carl buys index cards for his class. He gives each student 10 index cards. He teaches 6 periods a day and each class has 30 students. If a 50 pack of index cards cost $3 how much did he spend on them all? 6 | OPTIONS: (a) "36", (b) "180", (c) "30", (d) "108", (e) "50" 7 | ANSWER: (d) 8 | 9 | QUESTION: The school is organizing a trip to the museum. 4 buses were hired to take the children and teachers to their destination. The second bus has twice the number of people on it as the first bus. The third bus has 6 fewer people than the second bus. The fourth bus has 9 more people than the first bus. If the first bus has 12 people, how many people are going to the museum in total? 10 | OPTIONS: (a) "18", (b) "12", (c) "75", (d) "24", (e) "21" 11 | ANSWER: (c) 12 | 13 | QUESTION: Elsa started the day with 40 marbles. At breakfast, she lost 3 marbles while playing. At lunchtime, she gave her best friend Susie 5 marbles. 
In the afternoon, Elsa's mom bought her a new bag with 12 marbles. Susie came back and gave Elsa twice as many marbles as she received at lunch. How many marbles did Elsa end the day with? 14 | OPTIONS: (a) "32", (b) "54", (c) "44", (d) "40", (e) "12" 15 | ANSWER: (b) 16 | 17 | QUESTION: Leticia, Nina, and Rosalie have a total of 25 people on their dance team. If 8 people quit, but 13 new people got in, how many people are there now on the team? 18 | OPTIONS: (a) "25", (b) "17", (c) "30", (d) "13", (e) "8" 19 | ANSWER: (c) 20 | 21 | -------------------------------------------------------------------------------- /prompts/codenames/closed_5.txt: -------------------------------------------------------------------------------- 1 | Try to identify the 2 words best associated with the word E.T. from the following list: limousine, model, watch, boom, block, robin, fish, scale, new york, day, capital, kangaroo, heart, spine, diamond, head, stream, fire, alien, church, jet, cloak. Give your answer in alphabetical order. The possible answers are "alien, cloak", "robin, cloak", "scale, alien", but the correct answer is "alien, cloak" 2 | 3 | Try to identify the 2 words best associated with the word CRYSTAL from the following list: page, anchor, drum, sack, pacific, shoulder, mark, glacier, sugar, gear, tunnel, book. Give your answer in alphabetical order. The possible answers are "book, glacier", "drum, sugar", "glacier, sugar", but the correct answer is "glacier, sugar" 4 | 5 | Try to identify the 1 word best associated with the word MYTHOLOGY from the following list: boot, wave, centaur, calf, antarctica, slip, helicopter, vet. Give your answer in alphabetical order. 
The possible answers are "centaur", "wave", "vet", but the correct answer is "centaur" 6 | 7 | Try to identify the 2 words best associated with the word HORSES from the following list: team, big ben, bubble, pizza, kung fu, battle, sled, rat, cane, patient, yellowstone, step, onion, glasses, blind, bulb, storm, ranch, spoon, desk, violet, mummy, apron, fuel, saddle. Give your answer in alphabetical order. The possible answers are "apron, ranch", "ranch, saddle", "spoon, ranch", but the correct answer is "ranch, saddle" 8 | 9 | Try to identify the 2 words best associated with the word GALILEO from the following list: battery, calf, kangaroo, spell, ruler, jack, australia, jupiter, lock, pilot, centaur, dragon, table, jam. Give your answer in alphabetical order. The possible answers are "centaur, jupiter", "jupiter, ruler", "pilot, ruler", but the correct answer is "jupiter, ruler" 10 | 11 | -------------------------------------------------------------------------------- /prompts/vitaminc_fact_verification/open_5.txt: -------------------------------------------------------------------------------- 1 | 2020 coronavirus pandemic in Bosnia and Herzegovina: As of 27 April 2020, in Bosnia and Herzegovina there were 1565 confirmed coronavirus cases, of which 706 were in Republika Srpska, 840 in the Federation of Bosnia and Herzegovina, and 19 in Brčko District. 2 | Claim: Bosnia and Herzegovina had less than 1550 coronavirus cases by April 26, 2020, including less than 700 in Republika Srpska, and less than 830 in the Federation of Bosnia and Herzegovina. 3 | The correct answer is "Neither". 
4 | 5 | James Spader: He is best known for portraying eccentric characters in films such as the drama Sex, Lies, and Videotape -LRB- 1989 -RRB-, the action science fiction film Stargate -LRB- 1994 -RRB-, the controversial psychological thriller Crash -LRB- 1996 -RRB-, the erotic romance Secretary -LRB- 2002 -RRB-, and the blockbuster superhero film Avengers : Age of Ultron -LRB- 2015 -RRB-. 6 | Claim: James Spader has leading roles in TV series. 7 | The correct answer is "False". 8 | 9 | Bank of America: Bank of America provides its products and services through 1,600 retail financial centers, approximately 15,900 automated teller machines, call centers, and online and mobile banking platforms. 10 | Claim: 4,600 retail financial centers are part of Bank of America. 11 | The correct answer is "False". 12 | 13 | Joni Mitchell: A blunt critic of the music industry, Mitchell quit touring and released her 17th ( and reportedly last ) album of original songs in 2008. 14 | Claim: in 2007, Joni Mitchell released a record. 15 | The correct answer is "False". 16 | 17 | Heath Ledger: He is one of the most popular actors on Broadway, playing a lead role in many of the most successful Shakespeare plays in the last 20 years, but he has completely shunned acting in films. 18 | Claim: Heath Ledger acted in stage plays. 19 | The correct answer is "True". 20 | 21 | -------------------------------------------------------------------------------- /prompts/vitaminc_fact_verification/closed-adv_5.txt: -------------------------------------------------------------------------------- 1 | QUESTION: 2020 coronavirus pandemic in Bosnia and Herzegovina: As of 27 April 2020, in Bosnia and Herzegovina there were 1565 confirmed coronavirus cases, of which 706 were in Republika Srpska, 840 in the Federation of Bosnia and Herzegovina, and 19 in Brčko District. 
2 | Claim: Bosnia and Herzegovina had less than 1550 coronavirus cases by April 26, 2020, including less than 700 in Republika Srpska, and less than 830 in the Federation of Bosnia and Herzegovina. 3 | OPTIONS: (a) "True", (b) "False", (c) "Neither" 4 | ANSWER: (c) 5 | 6 | QUESTION: James Spader: He is best known for portraying eccentric characters in films such as the drama Sex, Lies, and Videotape -LRB- 1989 -RRB-, the action science fiction film Stargate -LRB- 1994 -RRB-, the controversial psychological thriller Crash -LRB- 1996 -RRB-, the erotic romance Secretary -LRB- 2002 -RRB-, and the blockbuster superhero film Avengers : Age of Ultron -LRB- 2015 -RRB-. 7 | Claim: James Spader has leading roles in TV series. 8 | OPTIONS: (a) "True", (b) "False", (c) "Neither" 9 | ANSWER: (b) 10 | 11 | QUESTION: Bank of America: Bank of America provides its products and services through 1,600 retail financial centers, approximately 15,900 automated teller machines, call centers, and online and mobile banking platforms. 12 | Claim: 4,600 retail financial centers are part of Bank of America. 13 | OPTIONS: (a) "True", (b) "False", (c) "Neither" 14 | ANSWER: (b) 15 | 16 | QUESTION: Joni Mitchell: A blunt critic of the music industry, Mitchell quit touring and released her 17th ( and reportedly last ) album of original songs in 2008. 17 | Claim: in 2007, Joni Mitchell released a record. 18 | OPTIONS: (a) "True", (b) "False", (c) "Neither" 19 | ANSWER: (b) 20 | 21 | QUESTION: Heath Ledger: He is one of the most popular actors on Broadway, playing a lead role in many of the most successful Shakespeare plays in the last 20 years, but he has completely shunned acting in films. 22 | Claim: Heath Ledger acted in stage plays. 
23 | OPTIONS: (a) "True", (b) "False", (c) "Neither" 24 | ANSWER: (a) 25 | 26 | -------------------------------------------------------------------------------- /prompts/vitaminc_fact_verification/closed_5.txt: -------------------------------------------------------------------------------- 1 | 2020 coronavirus pandemic in Bosnia and Herzegovina: As of 27 April 2020, in Bosnia and Herzegovina there were 1565 confirmed coronavirus cases, of which 706 were in Republika Srpska, 840 in the Federation of Bosnia and Herzegovina, and 19 in Brčko District. 2 | Claim: Bosnia and Herzegovina had less than 1550 coronavirus cases by April 26, 2020, including less than 700 in Republika Srpska, and less than 830 in the Federation of Bosnia and Herzegovina. 3 | The possible answers are "True", "False", "Neither", but the correct answer is "Neither". 4 | 5 | James Spader: He is best known for portraying eccentric characters in films such as the drama Sex, Lies, and Videotape -LRB- 1989 -RRB-, the action science fiction film Stargate -LRB- 1994 -RRB-, the controversial psychological thriller Crash -LRB- 1996 -RRB-, the erotic romance Secretary -LRB- 2002 -RRB-, and the blockbuster superhero film Avengers : Age of Ultron -LRB- 2015 -RRB-. 6 | Claim: James Spader has leading roles in TV series. 7 | The possible answers are "True", "False", "Neither", but the correct answer is "False". 8 | 9 | Bank of America: Bank of America provides its products and services through 1,600 retail financial centers, approximately 15,900 automated teller machines, call centers, and online and mobile banking platforms. 10 | Claim: 4,600 retail financial centers are part of Bank of America. 11 | The possible answers are "True", "False", "Neither", but the correct answer is "False". 12 | 13 | Joni Mitchell: A blunt critic of the music industry, Mitchell quit touring and released her 17th ( and reportedly last ) album of original songs in 2008. 14 | Claim: in 2007, Joni Mitchell released a record. 
15 | The possible answers are "True", "False", "Neither", but the correct answer is "False". 16 | 17 | Heath Ledger: He is one of the most popular actors on Broadway, playing a lead role in many of the most successful Shakespeare plays in the last 20 years, but he has completely shunned acting in films. 18 | Claim: Heath Ledger acted in stage plays. 19 | The possible answers are "True", "False", "Neither", but the correct answer is "True". 20 | 21 | -------------------------------------------------------------------------------- /prompts/strange_stories/open_4.txt: -------------------------------------------------------------------------------- 1 | During the war, the Red army captures a member of the Blue army. They want him to tell them where his army's tanks are; they know they are either by the sea or in the mountains. They know that the prisoner will not want to tell them, he will want to save his army, and so he will certainly lie to them. The prisoner is very brave and very clever, he will not let them find his tanks. The tanks are really in the mountains. Now when the other side ask him where his tanks are, he says, "They are in the mountains". 2 | Q: Where does the prisoner think the Red army will look for the tanks? The correct answer is "by the sea". 3 | 4 | It is Halloween, and Chris is going to a fancy-dress party. He is going as a ghost. He wears a big white sheet with eyes cut out to see through. As he walks to the party in his ghost costume, he bumps into Mr Brown. It is dark, and Mr Brown says, "Oh! Who is it?" Chris answers, "I'm a ghost, Mr Brown!" 5 | How does Chris feel? The correct answer is "lighthearted". 6 | 7 | Helen waited all year for Christmas, because she knew at Christmas she could ask her parents for a rabbit. Helen wanted a rabbit more than anything in the world. At last Christmas Day arrived, and Helen ran to unwrap the big box her parents had given her. She felt sure it would contain a little rabbit in a cage. 
But when she opened it, with all the family standing round, she found her present was just a boring old set of encyclopaedias, which Helen did not want at all! Still, when Helen's parents asked her how she liked her Christmas present, she said, "It's lovely, thank you. It's just what I wanted". 8 | Q: What is the best description of what Helen said? The correct answer is "white lie". 9 | 10 | Today, Katy wants to go on the swings in the playground. But to get to the playground she knows she has to pass old Mr. Jones' house. Mr. Jones has a nasty fierce dog, and every time Katy walks past the house the dog jumps up at the gate and barks. It scares Katy awfully, and she hates walking past the house because of the nasty dog. But Katy does so want to play on the swings. Katy's mother asks her, "Do you want to go out to the playground?" Katy says, "No". 11 | Q: How would Katy feel if she imagines herself swinging at the playground? The correct answer is "joyful". 12 | 13 | -------------------------------------------------------------------------------- /prompts/tracking_shuffled_objects/open_5.txt: -------------------------------------------------------------------------------- 1 | Alice, Bob, and Claire are dancers at a square dance. At the start of a song, they each have a partner: Alice is dancing with Karl, Bob is dancing with Rodrigo, and Claire is dancing with Patrick. Throughout the song, the dancers often trade partners. First, Claire and Alice switch partners. Then, Bob and Alice switch partners. Finally, Alice and Claire switch partners. At the end of the dance, Claire is dancing with? The correct answer is "Rodrigo". 2 | 3 | Alice, Bob, and Claire are dancers at a square dance. At the start of a song, they each have a partner: Alice is dancing with Sam, Bob is dancing with Karl, and Claire is dancing with Helga. Throughout the song, the dancers often trade partners. First, Alice and Claire switch partners. Then, Alice and Bob switch partners. 
Finally, Claire and Bob switch partners. At the end of the dance, Alice is dancing with? The correct answer is "Karl". 4 | 5 | Alice, Bob, and Claire are holding a white elephant gift exchange. At the start of the event, they are each holding a present of a different color: Alice has a blue present, Bob has a black ball, and Claire has a green present. As the event progresses, pairs of people swap gifts. First, Alice and Bob swap their gifts. Then, Alice and Claire swap their gifts. Finally, Claire and Bob swap their gifts. At the end of the event, Claire has the? The correct answer is "blue present". 6 | 7 | Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a pink ball, Bob has a orange ball, and Claire has a white ball. As the game progresses, pairs of players trade balls. First, Bob and Claire swap balls. Then, Claire and Alice swap balls. Finally, Alice and Bob swap balls. At the end of the game, Bob has the? The correct answer is "orange ball". 8 | 9 | Alice, Bob, and Claire are friends and avid readers who occasionally trade books. At the start of the semester, they each buy one new book: Alice gets The Odyssey, Bob gets The Fellowship of the Ring, and Claire gets Hound of the Baskervilles. As the semester proceeds, they start trading around the new books. First, Bob and Alice swap books. Then, Claire and Bob swap books. Finally, Alice and Claire swap books. At the end of the semester, Bob has? The correct answer is "Hound of the Baskervilles". 10 | 11 | -------------------------------------------------------------------------------- /prompts/english_proverbs/closed-adv_5.txt: -------------------------------------------------------------------------------- 1 | QUESTION: Laura was really never very successful with men. But she met Mike some time ago, and he is totally head over heels in love with her. Which of the following proverbs best apply to this situation? 
2 | OPTIONS: (a) "Absence makes the heart grow fonder", (b) "Beauty is in the eye of the beholder", (c) "Beauty is only skin deep", (d) "All is fair in love and war" 3 | ANSWER: (b) 4 | 5 | QUESTION: Laura always used to be nice and permissive with her students as long as they accomplished their work. However, with time she realized being permissive only leads to the students doing what they felt like and not paying any attention to what Laura was saying. So she became stricter. Which of the following proverbs best apply to this situation? 6 | OPTIONS: (a) "Give them an inch and they will take a mile", (b) "Familiarity breeds contempt", (c) "Good things come to those who wait", (d) "Keep your mouth shut and your eyes open" 7 | ANSWER: (a) 8 | 9 | QUESTION: He left his job tired of so much chaos and the terrible organization. He got into a new company that looked much more professional, but after some time, similar problems started appearing. Which of the following proverbs best apply to this situation? 10 | OPTIONS: (a) "The best-laid plans go astray", (b) "The end justifies the means", (c) "The neighbor's grass always looks greener", (d) "An idle brain is the devil's workshop" 11 | ANSWER: (c) 12 | 13 | QUESTION: Alice always makes sure to keep an extra case of apples in her house. One week, the local supermarket ran out of apples. Thankfully because of Alice's preparation, she was still able to eat apples that week. Which of the following proverbs best apply to this situation? 14 | OPTIONS: (a) "It is better to be safe than sorry", (b) "Hope for the best, prepare for the worst", (c) "Failing to prepare, you are preparing to fail", (d) "An ounce of protection is worth a pound of cure", (e) "A stitch in time saves nine" 15 | ANSWER: (a) 16 | 17 | QUESTION: This has been the fourth company she has grounded on a similar idea. All the previous ones failed, but now she's really got it right. It's skyrocketing. 
Which of the following proverbs best apply to this situation? 18 | OPTIONS: (a) "A chain is only as strong as its weakest link", (b) "A picture is worth a thousand words", (c) "Fall seven times, stand up eight", (d) "Do not bite off more than you can chew" 19 | ANSWER: (c) 20 | 21 | -------------------------------------------------------------------------------- /prompts/strange_stories/closed-adv_4.txt: -------------------------------------------------------------------------------- 1 | QUESTION: During the war, the Red army captures a member of the Blue army. They want him to tell them where his army's tanks are; they know they are either by the sea or in the mountains. They know that the prisoner will not want to tell them, he will want to save his army, and so he will certainly lie to them. The prisoner is very brave and very clever, he will not let them find his tanks. The tanks are really in the mountains. Now when the other side ask him where his tanks are, he says, "They are in the mountains". 2 | Q: Where does the prisoner think the Red army will look for the tanks? 3 | OPTIONS: (a) "in the mountains", (b) "in the forest", (c) "by the sea", (d) "in the battlefield" 4 | ANSWER: (c) 5 | 6 | QUESTION: It is Halloween, and Chris is going to a fancy-dress party. He is going as a ghost. He wears a big white sheet with eyes cut out to see through. As he walks to the party in his ghost costume, he bumps into Mr Brown. It is dark, and Mr Brown says, "Oh! Who is it?" Chris answers, "I'm a ghost, Mr Brown!" 7 | How does Chris feel? 8 | OPTIONS: (a) "sad", (b) "lighthearted", (c) "scared", (d) "surprised" 9 | ANSWER: (b) 10 | 11 | QUESTION: Helen waited all year for Christmas, because she knew at Christmas she could ask her parents for a rabbit. Helen wanted a rabbit more than anything in the world. At last Christmas Day arrived, and Helen ran to unwrap the big box her parents had given her. She felt sure it would contain a little rabbit in a cage. 
But when she opened it, with all the family standing round, she found her present was just a boring old set of encyclopaedias, which Helen did not want at all! Still, when Helen's parents asked her how she liked her Christmas present, she said, "It's lovely, thank you. It's just what I wanted". 12 | Q: What is the best description of what Helen said? 13 | OPTIONS: (a) "joke", (b) "sarcasm", (c) "double bluff", (d) "white lie" 14 | ANSWER: (d) 15 | 16 | QUESTION: Today, Katy wants to go on the swings in the playground. But to get to the playground she knows she has to pass old Mr. Jones' house. Mr. Jones has a nasty fierce dog, and every time Katy walks past the house the dog jumps up at the gate and barks. It scares Katy awfully, and she hates walking past the house because of the nasty dog. But Katy does so want to play on the swings. Katy's mother asks her, "Do you want to go out to the playground?" Katy says, "No". 17 | Q: How would Katy feel if she imagines herself swinging at the playground? 18 | OPTIONS: (a) "joyful", (b) "scared", (c) "tired", (d) "bored" 19 | ANSWER: (a) 20 | 21 | -------------------------------------------------------------------------------- /prompts/english_proverbs/closed_5.txt: -------------------------------------------------------------------------------- 1 | Laura was really never very successful with men. But she met Mike some time ago, and he is totally head over heels in love with her. Which of the following proverbs best apply to this situation? The possible answers are "Absence makes the heart grow fonder", "Beauty is in the eye of the beholder", "Beauty is only skin deep", "All is fair in love and war", but the correct answer is "Beauty is in the eye of the beholder". 2 | 3 | Laura always used to be nice and permissive with her students as long as they accomplished their work. 
However, with time she realized being permissive only leads to the students doing what they felt like and not paying any attention to what Laura was saying. So she became stricter. Which of the following proverbs best apply to this situation? The possible answers are "Give them an inch and they will take a mile", "Familiarity breeds contempt", "Good things come to those who wait", "Keep your mouth shut and your eyes open", but the correct answer is "Give them an inch and they will take a mile". 4 | 5 | He left his job tired of so much chaos and the terrible organization. He got into a new company that looked much more professional, but after some time, similar problems started appearing. Which of the following proverbs best apply to this situation? The possible answers are "The best-laid plans go astray", "The end justifies the means", "The neighbor's grass always looks greener", "An idle brain is the devil's workshop", but the correct answer is "The neighbor's grass always looks greener". 6 | 7 | Alice always makes sure to keep an extra case of apples in her house. One week, the local supermarket ran out of apples. Thankfully because of Alice's preparation, she was still able to eat apples that week. Which of the following proverbs best apply to this situation? The possible answers are "It is better to be safe than sorry", "Hope for the best, prepare for the worst", "Failing to prepare, you are preparing to fail", "An ounce of protection is worth a pound of cure", "A stitch in time saves nine", but the correct answer is "It is better to be safe than sorry". 8 | 9 | This has been the fourth company she has grounded on a similar idea. All the previous ones failed, but now she's really got it right. It's skyrocketing. Which of the following proverbs best apply to this situation? 
The possible answers are "A chain is only as strong as its weakest link", "A picture is worth a thousand words", "Fall seven times, stand up eight", "Do not bite off more than you can chew", but the correct answer is "Fall seven times, stand up eight". 10 | 11 | -------------------------------------------------------------------------------- /prompts/strange_stories/closed_4.txt: -------------------------------------------------------------------------------- 1 | During the war, the Red army captures a member of the Blue army. They want him to tell them where his army's tanks are; they know they are either by the sea or in the mountains. They know that the prisoner will not want to tell them, he will want to save his army, and so he will certainly lie to them. The prisoner is very brave and very clever, he will not let them find his tanks. The tanks are really in the mountains. Now when the other side ask him where his tanks are, he says, "They are in the mountains". 2 | Q: Where does the prisoner think the Red army will look for the tanks? The possible answers are "in the mountains", "in the forest", "by the sea", "in the battlefield", but the correct answer is "by the sea". 3 | 4 | It is Halloween, and Chris is going to a fancy-dress party. He is going as a ghost. He wears a big white sheet with eyes cut out to see through. As he walks to the party in his ghost costume, he bumps into Mr Brown. It is dark, and Mr Brown says, "Oh! Who is it?" Chris answers, "I'm a ghost, Mr Brown!" 5 | How does Chris feel? The possible answers are "sad", "lighthearted", "scared", "surprised", but the correct answer is "lighthearted". 6 | 7 | Helen waited all year for Christmas, because she knew at Christmas she could ask her parents for a rabbit. Helen wanted a rabbit more than anything in the world. At last Christmas Day arrived, and Helen ran to unwrap the big box her parents had given her. She felt sure it would contain a little rabbit in a cage. 
But when she opened it, with all the family standing round, she found her present was just a boring old set of encyclopaedias, which Helen did not want at all! Still, when Helen's parents asked her how she liked her Christmas present, she said, "It's lovely, thank you. It's just what I wanted". 8 | Q: What is the best description of what Helen said? The possible answers are "joke", "sarcasm", "double bluff", "white lie", but the correct answer is "white lie". 9 | 10 | Today, Katy wants to go on the swings in the playground. But to get to the playground she knows she has to pass old Mr. Jones' house. Mr. Jones has a nasty fierce dog, and every time Katy walks past the house the dog jumps up at the gate and barks. It scares Katy awfully, and she hates walking past the house because of the nasty dog. But Katy does so want to play on the swings. Katy's mother asks her, "Do you want to go out to the playground?" Katy says, "No". 11 | Q: How would Katy feel if she imagines herself swinging at the playground? The possible answers are "joyful", "scared", "tired", "bored", but the correct answer is "joyful". 12 | 13 | -------------------------------------------------------------------------------- /prompts/tracking_shuffled_objects/closed-adv_5.txt: -------------------------------------------------------------------------------- 1 | QUESTION: Alice, Bob, and Claire are dancers at a square dance. At the start of a song, they each have a partner: Alice is dancing with Karl, Bob is dancing with Rodrigo, and Claire is dancing with Patrick. Throughout the song, the dancers often trade partners. First, Claire and Alice switch partners. Then, Bob and Alice switch partners. Finally, Alice and Claire switch partners. At the end of the dance, Claire is dancing with? 2 | OPTIONS: (a) "Karl", (b) "Rodrigo", (c) "Patrick" 3 | ANSWER: (b) 4 | 5 | QUESTION: Alice, Bob, and Claire are dancers at a square dance. 
At the start of a song, they each have a partner: Alice is dancing with Sam, Bob is dancing with Karl, and Claire is dancing with Helga. Throughout the song, the dancers often trade partners. First, Alice and Claire switch partners. Then, Alice and Bob switch partners. Finally, Claire and Bob switch partners. At the end of the dance, Alice is dancing with? 6 | OPTIONS: (a) "Sam", (b) "Karl", (c) "Helga" 7 | ANSWER: (b) 8 | 9 | QUESTION: Alice, Bob, and Claire are holding a white elephant gift exchange. At the start of the event, they are each holding a present of a different color: Alice has a blue present, Bob has a black ball, and Claire has a green present. As the event progresses, pairs of people swap gifts. First, Alice and Bob swap their gifts. Then, Alice and Claire swap their gifts. Finally, Claire and Bob swap their gifts. At the end of the event, Claire has the? 10 | OPTIONS: (a) "blue present", (b) "black ball", (c) "green present" 11 | ANSWER: (a) 12 | 13 | QUESTION: Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a pink ball, Bob has a orange ball, and Claire has a white ball. As the game progresses, pairs of players trade balls. First, Bob and Claire swap balls. Then, Claire and Alice swap balls. Finally, Alice and Bob swap balls. At the end of the game, Bob has the? 14 | OPTIONS: (a) "pink ball", (b) "orange ball", (c) "white ball" 15 | ANSWER: (b) 16 | 17 | QUESTION: Alice, Bob, and Claire are friends and avid readers who occasionally trade books. At the start of the semester, they each buy one new book: Alice gets The Odyssey, Bob gets The Fellowship of the Ring, and Claire gets Hound of the Baskervilles. As the semester proceeds, they start trading around the new books. First, Bob and Alice swap books. Then, Claire and Bob swap books. Finally, Alice and Claire swap books. At the end of the semester, Bob has? 
18 | OPTIONS: (a) "The Odyssey", (b) "The Fellowship of the Ring", (c) "Hound of the Baskervilles" 19 | ANSWER: (c) 20 | 21 | -------------------------------------------------------------------------------- /prompts/tracking_shuffled_objects/closed_5.txt: -------------------------------------------------------------------------------- 1 | Alice, Bob, and Claire are dancers at a square dance. At the start of a song, they each have a partner: Alice is dancing with Karl, Bob is dancing with Rodrigo, and Claire is dancing with Patrick. Throughout the song, the dancers often trade partners. First, Claire and Alice switch partners. Then, Bob and Alice switch partners. Finally, Alice and Claire switch partners. At the end of the dance, Claire is dancing with? The possible answers are "Karl", "Rodrigo", "Patrick", but the correct answer is "Rodrigo". 2 | 3 | Alice, Bob, and Claire are dancers at a square dance. At the start of a song, they each have a partner: Alice is dancing with Sam, Bob is dancing with Karl, and Claire is dancing with Helga. Throughout the song, the dancers often trade partners. First, Alice and Claire switch partners. Then, Alice and Bob switch partners. Finally, Claire and Bob switch partners. At the end of the dance, Alice is dancing with? The possible answers are "Sam", "Karl", "Helga", but the correct answer is "Karl". 4 | 5 | Alice, Bob, and Claire are holding a white elephant gift exchange. At the start of the event, they are each holding a present of a different color: Alice has a blue present, Bob has a black ball, and Claire has a green present. As the event progresses, pairs of people swap gifts. First, Alice and Bob swap their gifts. Then, Alice and Claire swap their gifts. Finally, Claire and Bob swap their gifts. At the end of the event, Claire has the? The possible answers are "blue present", "black ball", "green present", but the correct answer is "blue present". 6 | 7 | Alice, Bob, and Claire are playing a game. 
At the start of the game, they are each holding a ball: Alice has a pink ball, Bob has a orange ball, and Claire has a white ball. As the game progresses, pairs of players trade balls. First, Bob and Claire swap balls. Then, Claire and Alice swap balls. Finally, Alice and Bob swap balls. At the end of the game, Bob has the? The possible answers are "pink ball", "orange ball", "white ball", but the correct answer is "orange ball". 8 | 9 | Alice, Bob, and Claire are friends and avid readers who occasionally trade books. At the start of the semester, they each buy one new book: Alice gets The Odyssey, Bob gets The Fellowship of the Ring, and Claire gets Hound of the Baskervilles. As the semester proceeds, they start trading around the new books. First, Bob and Alice swap books. Then, Claire and Bob swap books. Finally, Alice and Claire swap books. At the end of the semester, Bob has? The possible answers are "The Odyssey", "The Fellowship of the Ring", "Hound of the Baskervilles", but the correct answer is "Hound of the Baskervilles". 
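
The gold answers in the tracking_shuffled_objects prompts can be checked mechanically by replaying the pairwise swaps in order. The following is a minimal sketch. The helper name `track_swaps` is hypothetical and is not part of this repository. It replays the first square-dance example above.

```python
from typing import Dict, List, Tuple


def track_swaps(start: Dict[str, str], swaps: List[Tuple[str, str]]) -> Dict[str, str]:
    """Return who holds which item after applying each pairwise swap in order."""
    holding = dict(start)  # copy so the caller's mapping is left untouched
    for a, b in swaps:
        holding[a], holding[b] = holding[b], holding[a]
    return holding


# First square-dance example: Claire/Alice, then Bob/Alice, then Alice/Claire.
final = track_swaps(
    {"Alice": "Karl", "Bob": "Rodrigo", "Claire": "Patrick"},
    [("Claire", "Alice"), ("Bob", "Alice"), ("Alice", "Claire")],
)
print(final["Claire"])  # Rodrigo
```

Because each swap is an in-place exchange of two dictionary entries, the same helper replays the gift, ball, and book examples unchanged.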
10 | 11 | 
--------------------------------------------------------------------------------
/openai.py:
--------------------------------------------------------------------------------
# NOTE: because this script is itself named openai.py, running it from the
# repository root makes `import openai` resolve to this file instead of the
# OpenAI package; save or run it under a different file name. It targets the
# legacy (pre-1.0) openai SDK, which still provides openai.Completion.
import json
import os
import time
from collections import defaultdict

import openai
from tqdm import tqdm
from transformers import AutoTokenizer

from utils import *  # prepare_data_bigbench, log_and_save

openai.organization = ''
openai.api_key = ''

datasets = ['analytic_entailment/task.json']

p_ts = ['closed', 'closed-adv', 'open']
# The GPT-2 tokenizer is only used to size the generation budget.
tokenizer = AutoTokenizer.from_pretrained('gpt2', padding_side='left')
tokenizer.pad_token = tokenizer.eos_token


def create_completion(prompt, max_tokens, config):
    """Call the Completion endpoint, retrying once after a 6-second pause."""
    kwargs = dict(
        prompt=prompt,
        model=config['model'],
        max_tokens=max_tokens,
        temperature=config['temperature'],
    )
    try:
        return openai.Completion.create(**kwargs)
    except Exception:
        time.sleep(6)
        return openai.Completion.create(**kwargs)


for dataset in datasets:
    # Shot counts: zero-shot plus the few-shot size of the matching prompts/ files.
    if dataset == 'causal_judgment/task.json':
        n_o_s = [0, 2]
    elif dataset == 'strange_stories/task.json':
        n_o_s = [0, 4]
    else:
        n_o_s = [0, 5]
    for prompt_type in p_ts:
        for number_of_shots in n_o_s:
            config = {
                'run_id': time.strftime('%Y%m%d_%H%M%S', time.localtime()),
                'filename': dataset,
                'number_of_data': None,  ## check here
                'model': 'davinci',
                'prompt_type': prompt_type,
                'number_of_shots': number_of_shots,
                'temperature': 0,
                'max_new_tokens': 'adaptive',  ## check here
                'batch_size': None,  ## check here
                'pad_token': None,
                'pad_token_id': None,
                'eos_token_id': None,
                'seed': None,
                'device': None
            }

            name = config['filename'].replace('/task.json', '')

            # Use the 'examples' list when present; otherwise keep the whole JSON document.
            with open(os.path.join('data/', config['filename'])) as file:
                data = json.load(file)
            if isinstance(data, dict) and 'examples' in data:
                data = data['examples']

            inputs_targets = prepare_data_bigbench(data, config['prompt_type'], config['number_of_shots'], name)
            config['number_of_data'] = len(inputs_targets)

            results = defaultdict(list)
            with tqdm(total=config['number_of_data']) as t:
                for prompt, target, options, longest_sequence in inputs_targets:
                    max_new_tokens = config['max_new_tokens']
                    if max_new_tokens == 'adaptive':
                        # Budget = token length of longest_sequence plus 8. Here it is a
                        # single string (not a batch as in inference.py), so it is
                        # tokenized directly rather than split into characters by list().
                        max_new_tokens = len(tokenizer(longest_sequence).input_ids) + 8
                    response = create_completion(prompt, max_new_tokens, config)
                    # Completion responses carry the generated text in choices[0].text;
                    # choices[0].message only exists on ChatCompletion responses.
                    results['prediction'].append(response.choices[0].text.replace(prompt, ''))
                    results['input'].append(prompt)
                    results['target'].append(target)
                    results['options'].append(options)
                    t.update(1)
            log_and_save(results, config)

--------------------------------------------------------------------------------
/inference.py:
--------------------------------------------------------------------------------
import json
import os
import re
import time
from collections import defaultdict

import torch
from tqdm import tqdm
from transformers import (AutoModelForCausalLM, AutoTokenizer, GPT2LMHeadModel,
                          T5ForConditionalGeneration)

from utils import *  # pad_tokens, pad_ids, eos_ids, prepare_data_bigbench, prepare_loader, batch_predict, log_and_save

model_list = ['gpt2']

datasets = ['analytic_entailment/task.json']

p_ts = ['closed', 'closed-adv', 'open']
seeds = [2266, 105, 86379]
device = 'cuda'

for pretrained in model_list:

    tokenizer = AutoTokenizer.from_pretrained(pretrained, padding_side='left')
    # The Hugging Face Hub id for GPT-2 is 'gpt2'.
    if pretrained in ['gpt2']:
        model = GPT2LMHeadModel.from_pretrained(pretrained).to(device)
    elif pretrained in ['t5-small', 'google/flan-t5-small', 't5-large', 'google/flan-t5-large']:
        model = T5ForConditionalGeneration.from_pretrained(pretrained).to(device)
    else:
        # Remaining causal LMs are loaded in half precision.
        model = AutoModelForCausalLM.from_pretrained(pretrained, torch_dtype=torch.float16).to(device)

    for dataset in datasets:
        if dataset == 'causal_judgment/task.json':
            n_o_s = [0, 2]
        elif dataset == 'strange_stories/task.json':
            n_o_s = [0, 4]
        else:
            n_o_s = [0, 5]
        for prompt_type in p_ts:
            for number_of_shots in n_o_s:
                for seed in seeds:
                    config = {
                        'run_id': time.strftime('%Y%m%d_%H%M%S', time.localtime()),
                        'filename': dataset,
                        'number_of_data': None,  ## check here
                        'model': pretrained,
                        'prompt_type': prompt_type,
                        'number_of_shots': number_of_shots,
                        'temperature': 0.01,
                        'max_new_tokens': 'adaptive',  ## check here
                        'batch_size': 16,  ## check here
                        'pad_token': pad_tokens[pretrained],
                        'pad_token_id': pad_ids[pretrained],
                        'eos_token_id': eos_ids[pretrained],
                        'seed': seed,
                        'device': torch.cuda.get_device_name(torch.cuda.current_device())
                    }

                    torch.cuda.manual_seed_all(config['seed'])

                    name = config['filename'].replace('/task.json', '')

                    # Use the 'examples' list when present; otherwise keep the whole JSON document.
                    with open(os.path.join('data/', config['filename'])) as file:
                        data = json.load(file)
                    if isinstance(data, dict) and 'examples' in data:
                        data = data['examples']

                    inputs_targets = prepare_data_bigbench(data, config['prompt_type'], config['number_of_shots'], name)
                    config['number_of_data'] = len(inputs_targets)
                    loader = prepare_loader(inputs_targets, tokenizer, config['pad_token'], batch_size=config['batch_size'], shuffle=False, device=device)

                    results = defaultdict(list)
                    with tqdm(total=len(loader)) as t:
                        for input_ids, attention_mask, raw_inputs, targets, options, longest_sequence in loader:
                            max_new_tokens = config['max_new_tokens']
                            if config['max_new_tokens'] == 'adaptive':
                                # longest_sequence is batched here (one entry per example),
                                # so the budget is the padded token length of the batch plus 8.
                                max_new_tokens = tokenizer(list(longest_sequence), padding=True, return_tensors='pt').input_ids.shape[1] + 8
                            outputs = batch_predict(
                                input_ids,
                                attention_mask,
                                model,
                                tokenizer,
                                max_new_tokens,
                                pad_token_id=config['pad_token_id'],
                                eos_token_id=config['eos_token_id'],
                                temperature=config['temperature'],
                                device=device
                            )
                            for i in range(len(outputs)):
                                results['input'].append(raw_inputs[i])
                                results['target'].append(targets[i])
                                results['options'].append(options[i])
                                results['prediction'].append(outputs[i].replace(raw_inputs[i], '').lstrip(''))
                            t.update(1)
                    log_and_save(results, config)
                    time.sleep(2)

    del model
    torch.cuda.empty_cache()
    time.sleep(2)

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Are Emergent Abilities in Large Language Models just In-Context Learning? 2 | 3 | > **Abstract:** Large language models have exhibited *emergent* abilities, demonstrating exceptional performance across diverse tasks for which they were not explicitly trained, including those that require complex reasoning abilities. The emergence of such abilities carries profound implications for the future direction of research in NLP, especially as the deployment of such models becomes more prevalent. However, one key challenge is that the evaluation of these abilities is often confounded by competencies that arise in models through alternative prompting techniques, such as in-context learning and instruction following, which also emerge as the models are scaled up. In this study, we provide the first comprehensive examination of these emergent abilities while accounting for various potentially biasing factors that can influence the evaluation of models. We conduct rigorous tests on a set of 18 models, encompassing a parameter range from 60 million to 175 billion parameters, across a comprehensive set of 22 tasks. Through an extensive series of over 1,000 experiments, we provide compelling evidence that emergent abilities can primarily be ascribed to in-context learning.
We find no evidence for the emergence of reasoning abilities, thus providing valuable insights into the underlying mechanisms driving the observed abilities and thus alleviating safety concerns regarding their use. 4 | 5 | Contact persons: Sheng Lu, Irina Bigoulaeva, Harish Tayyar Madabushi 6 | 7 | https://www.ukp.tu-darmstadt.de/ 8 | 9 | https://www.tu-darmstadt.de/ 10 | 11 | ## Evaluation scores 12 | See [evaluation_scores](https://github.com/UKPLab/on-emergence/blob/main/evaluation_scores.csv) for the evaluation scores of our experiments. 13 | 14 | ## Output files 15 | All of the output files associated with our experiments can be found [here](https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/3931). We name each file in a *date_time* fashion (e.g., ```results-20230524_123238.json```), which is also the ```run_id``` of an experiment that can be found in [evaluation_scores](https://github.com/UKPLab/on-emergence/blob/main/evaluation_scores.csv). 16 | 17 | ## Citation 18 | Please use the following citation: 19 | 20 | ``` 21 | @inproceedings{lu-etal-2024-emergent, 22 | title = "Are Emergent Abilities in Large Language Models just In-Context Learning?", 23 | author = "Lu, Sheng and 24 | Bigoulaeva, Irina and 25 | Sachdeva, Rachneet and 26 | Tayyar Madabushi, Harish and 27 | Gurevych, Iryna", 28 | editor = "Ku, Lun-Wei and 29 | Martins, Andre and 30 | Srikumar, Vivek", 31 | booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", 32 | month = aug, 33 | year = "2024", 34 | address = "Bangkok, Thailand", 35 | publisher = "Association for Computational Linguistics", 36 | url = "https://aclanthology.org/2024.acl-long.279/", 37 | doi = "10.18653/v1/2024.acl-long.279", 38 | pages = "5098--5139", 39 | abstract = "Large language models, comprising billions of parameters and pre-trained on extensive web-scale corpora, have been claimed to acquire certain capabilities without having been specifically
trained on them. These capabilities, referred to as {\textquotedblleft}emergent abilities,{\textquotedblright} have been a driving force in discussions regarding the potentials and risks of language models. A key challenge in evaluating emergent abilities is that they are confounded by model competencies that arise through alternative prompting techniques, including in-context learning, which is the ability of models to complete a task based on a few examples. We present a novel theory that explains emergent abilities, taking into account their potential confounding factors, and rigorously substantiate this theory through over 1000 experiments. Our findings suggest that purported emergent abilities are not truly emergent, but result from a combination of in-context learning, model memory, and linguistic knowledge. Our work is a foundational step in explaining language model performance, providing a template for their efficient use and clarifying the paradox of their ability to excel in some instances while faltering in others. Thus, we demonstrate that their capabilities should not be overestimated." 40 | } 41 | ``` 42 | 43 | ## License 44 | 45 | This work is licensed under a 46 | [Creative Commons Attribution-ShareAlike 4.0 International License][cc-by-sa]. 
47 | 48 | [![CC BY-SA 4.0][cc-by-sa-image]][cc-by-sa] 49 | 50 | [cc-by-sa]: http://creativecommons.org/licenses/by-sa/4.0/ 51 | [cc-by-sa-image]: https://licensebuttons.net/l/by-sa/4.0/88x31.png 52 | [cc-by-sa-shield]: https://img.shields.io/badge/License-CC%20BY--SA%204.0-lightgrey.svg 53 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | import time 4 | import torch 5 | import random 6 | import pandas as pd 7 | 8 | from tqdm import tqdm 9 | from collections import defaultdict 10 | from torch.utils.data import DataLoader 11 | from sklearn.metrics import accuracy_score 12 | from transformers import AutoTokenizer, AutoModelForCausalLM, GPT2LMHeadModel, T5ForConditionalGeneration 13 | 14 | pad_tokens = { 15 | 'gpt2': '<|endoftext|>', 16 | 'gpt2-xl': '<|endoftext|>', 17 | 'EleutherAI/gpt-j-6b': '<|endoftext|>', 18 | 'togethercomputer/GPT-JT-6B-v1': '<|endoftext|>', 19 | 't5-small': '<pad>', 20 | 't5-large': '<pad>', 21 | 'google/flan-t5-small': '<pad>', 22 | 'google/flan-t5-large': '<pad>', 23 | 'chavinlo/alpaca-native': '<unk>', 24 | 'chavinlo/alpaca-13b': '<unk>' 25 | } 26 | 27 | pad_ids = { 28 | 'gpt2': 50256, 29 | 'gpt2-xl': 50256, 30 | 'EleutherAI/gpt-j-6b': 50256, 31 | 'togethercomputer/GPT-JT-6B-v1': 50256, 32 | 't5-small': 0, 33 | 't5-large': 0, 34 | 'google/flan-t5-small': 0, 35 | 'google/flan-t5-large': 0, 36 | 'chavinlo/alpaca-native': 0, 37 | 'chavinlo/alpaca-13b': 0 38 | } 39 | 40 | eos_ids = { 41 | 'gpt2': 50256, 42 | 'gpt2-xl': 50256, 43 | 'EleutherAI/gpt-j-6b': 50256, 44 | 'togethercomputer/GPT-JT-6B-v1': 50256, 45 | 't5-small': 1, 46 | 't5-large': 1, 47 | 'google/flan-t5-small': 1, 48 | 'google/flan-t5-large': 1, 49 | 'chavinlo/alpaca-native': 2, 50 | 'chavinlo/alpaca-13b': 2 51 | 52 | } 53 | 54 | prompt_types = { 55 | 'open': '{input} The correct answer is', 56 | 'closed': '{input} The possible answers are {choices}, but the
correct answer is', 57 | 'closed-adv': 'QUESTION: {input}\nOPTIONS: {choices}\nANSWER:' 58 | } 59 | 60 | def prepare_data_bigbench(data, prompt_type, number_of_shots, name): 61 | templates, inputs_targets = dict(prompt_types), [] # per-call copy so task-specific templates do not leak into later tasks 62 | if name in ['vitaminc_fact_verification', 'rhyming']: 63 | templates['open'] = '{input}\nThe correct answer is' 64 | templates['closed'] = '{input}\nThe possible answers are {choices}, but the correct answer is' 65 | if name in ['common_morpheme', 'phrase_relatedness']: 66 | templates['open'] = '{input}. The correct answer is' 67 | templates['closed'] = '{input}. The possible answers are {choices}, but the correct answer is' 68 | if name in ['modified_arithmetic']: 69 | templates['open'] = '{input} ?\nThe correct answer is' 70 | templates['closed'] = '{input} ?\nThe possible answers are {choices}, but the correct answer is' 71 | templates['closed-adv'] = 'QUESTION: {input} ?\nOPTIONS: {choices}\nANSWER:' 72 | if name in ['codenames']: 73 | templates['closed'] = '{input} The correct answer is' 74 | for item in data: 75 | input = item['input'].rstrip(' ').replace(' ,', ',').replace(' .', '.') 76 | if name == 'tracking_shuffled_objects': 77 | input = input + '?'
78 | if name == 'codenames': 79 | options = {item['target']: 1} 80 | target = item['target'] 81 | else: 82 | options = item['target_scores'] 83 | target = '\"{}\"'.format(list(options.keys())[list(options.values()).index(1)].rstrip('.')) 84 | if prompt_type in ['closed']: 85 | input = templates[prompt_type].format( 86 | input=input, 87 | choices=', '.join(['\"{}\"'.format(choice.rstrip('.')) for j, choice in enumerate(options.keys())]) 88 | ) 89 | elif prompt_type in ['closed-adv']: 90 | target = '({})'.format({0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f', 6: 'g', 7: 'h', 8: 'i', 9: 'j', 10: 'k'}[list(options.values()).index(1)]) 91 | input = templates[prompt_type].format( 92 | input=input, 93 | choices=', '.join(['({}) \"{}\"'.format({0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f', 6: 'g', 7: 'h', 8: 'i', 9: 'j', 10: 'k'}[j], choice) for j, choice in enumerate(options.keys())]) 94 | ) 95 | elif prompt_type in ['open']: 96 | input = templates[prompt_type].format(input=input) 97 | if number_of_shots > 0: 98 | with open(f'prompts/{name}/{prompt_type}_{number_of_shots}.txt') as file: 99 | prefix = file.read() 100 | input = prefix + input 101 | longest_sequence = list(options.keys())[0] 102 | for option in list(options.keys())[1:]: 103 | if len(option) > len(longest_sequence): 104 | longest_sequence = option 105 | inputs_targets.append((input, target, '\t'.join(list(options.keys())), longest_sequence)) 106 | return inputs_targets 107 | 108 | def prepare_loader(inputs_targets, tokenizer, pad_token, batch_size=4, shuffle=True, device='cuda'): 109 | inputs = [item[0] for item in inputs_targets] 110 | tokenizer.add_special_tokens({'pad_token': pad_token}) 111 | inputs = tokenizer(inputs, padding=True, return_tensors='pt').to(device) 112 | input_ids, attention_mask = inputs.input_ids, inputs.attention_mask 113 | return DataLoader([(input_ids[i], attention_mask[i], inputs_targets[i][0], inputs_targets[i][1], inputs_targets[i][2], inputs_targets[i][3]) for i in
range(len(inputs_targets))], batch_size=batch_size, shuffle=shuffle) 114 | 115 | def predict(input, model, tokenizer, max_new_tokens, pad_token_id, eos_token_id, top_k=20, temperature=0.9, do_sample=True, device='cuda'): 116 | input_ids = tokenizer(input, return_tensors='pt').input_ids.to(device) 117 | with torch.no_grad(): 118 | output = model.generate(input_ids, max_new_tokens=max_new_tokens, top_k=top_k, pad_token_id=pad_token_id, eos_token_id=eos_token_id, do_sample=do_sample, temperature=temperature) 119 | return tokenizer.batch_decode(output)[0] 120 | 121 | def batch_predict(input_ids, attention_mask, model, tokenizer, max_new_tokens, pad_token_id, eos_token_id, top_k=20, temperature=0.9, do_sample=True, device='cuda'): 122 | with torch.no_grad(): 123 | output = model.generate(input_ids, attention_mask=attention_mask, max_new_tokens=max_new_tokens, top_k=top_k, pad_token_id=pad_token_id, eos_token_id=eos_token_id, do_sample=do_sample, temperature=temperature) 124 | return tokenizer.batch_decode(output) 125 | 126 | def log_and_save(results, config): 127 | with open('config_reference.txt', 'a') as file: 128 | file.write(('\t'.join(['{' + i + '}' for i in config.keys()]) + '\n').format(**config)) 129 | file.close() 130 | 131 | with open('results/results-{}.json'.format(config['run_id']), 'w') as file: 132 | json.dump(results, file, indent=4, ensure_ascii=False) 133 | 134 | -------------------------------------------------------------------------------- /evaluation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "id": "6bf1c988-8af5-43f3-8a4c-b4b7f26ca4b7", 7 | "metadata": {}, 8 | "outputs": [], 9 | "source": [ 10 | "import os\n", 11 | "import re\n", 12 | "import json\n", 13 | "import editdistance\n", 14 | "import pandas as pd\n", 15 | "\n", 16 | "from tqdm import trange\n", 17 | "from bert_score import BERTScorer\n", 18 | "from 
sklearn.metrics import accuracy_score, precision_recall_fscore_support, classification_report" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": null, 24 | "id": "344e7bcf-71d6-4ddd-85f7-08bfac00a38d", 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "def exact_match(results, config, average='macro'):\n", 29 | " actuals, predictions, correctness = [], [], []\n", 30 | " for i in range(len(results['input'])):\n", 31 | " input, answer, actual = results['input'][i], results['prediction'][i], results['target'][i]\n", 32 | " options = [f'\\\"{option}\\\"' for option in results['options'][i].split('\\t')]\n", 33 | " if config['filename'] in ['english_proverbs/task.json', 'tracking_shuffled_objects/task.json', 'logical_deduction/task.json']:\n", 34 | " options = ['\\\"{}\\\"'.format(option.rstrip('.')) for option in results['options'][i].split('\\t')]\n", 35 | " if config['filename'] in ['modified_arithmetic/task.json']:\n", 36 | " input = input.replace(' ?', '')\n", 37 | " answer = answer.replace('?', '').replace(input, '')\n", 38 | " if config['model'] in [\n", 39 | " 'chavinlo/alpaca-native',\n", 40 | " 'chavinlo/alpaca-13b',\n", 41 | " 'models--llama/7B',\n", 42 | " 'models--llama/13B',\n", 43 | " 'models--llama/30B'\n", 44 | " ]:\n", 45 | " answer = answer.lstrip('unk>').replace('', '').replace('', '').replace('', '').replace('⁇', '')\n", 46 | " answer = answer.lstrip(',').lstrip('.').lstrip(' ').lstrip('\\n').lower()\n", 47 | " prediction = '-100'\n", 48 | " if config['prompt_type'] in ['closed', 'open', 'closed-info_removed']:\n", 49 | " for option in options:\n", 50 | " bare = option.lstrip('\\\"').rstrip('\\\"') # some models did not output quotation marks\n", 51 | " if prediction == '-100':\n", 52 | " if option.lower() == answer[:len(option)] or bare.lower() == answer[:len(bare)]:\n", 53 | " prediction = option\n", 54 | " elif option.lower() in answer[len(option):] or bare.lower() in answer[len(bare):]:\n", 55 | " prediction = 
'-100'\n", 56 | " break\n", 57 | " if config['prompt_type'] in ['closed-adv']:\n", 58 | " if (actual == answer[:len(actual)] and\n", 59 | " '(a)' not in answer[len(actual):] and\n", 60 | " '(b)' not in answer[len(actual):] and\n", 61 | " '(c)' not in answer[len(actual):]):\n", 62 | " prediction = actual\n", 63 | " actuals.append(actual)\n", 64 | " predictions.append(prediction)\n", 65 | " if actual == prediction:\n", 66 | " correctness.append('T')\n", 67 | " else:\n", 68 | " correctness.append('F')\n", 69 | " accuracy = accuracy_score(actuals, predictions)\n", 70 | " precision, recall, f1, _ = precision_recall_fscore_support(actuals, predictions, average=average, zero_division=0)\n", 71 | " results['exact_match_correctness'] = correctness\n", 72 | " with open('results/results-{}.json'.format(config['run_id']), 'w') as file:\n", 73 | " json.dump(results, file, indent=4, ensure_ascii=False)\n", 74 | " return accuracy, precision, recall, f1\n", 75 | "\n", 76 | "def bert_score(results, config, scorer, average='macro'):\n", 77 | " actuals, predictions, correctness = [], [], []\n", 78 | " for i in trange(len(results['input'])):\n", 79 | " actual = results['target'][i]\n", 80 | " options = [f'\\\"{option}\\\"' for option in results['options'][i].split('\\t')]\n", 81 | " if config['filename'] in ['english_proverbs/task.json', 'tracking_shuffled_objects/task.json', 'logical_deduction/task.json']:\n", 82 | " options = ['\\\"{}\\\"'.format(option.rstrip('.')) for option in results['options'][i].split('\\t')]\n", 83 | " _, _, f1s = scorer.score([results['prediction'][i] for _ in range(len(options))], options)\n", 84 | " actuals.append(actual)\n", 85 | " predictions.append(options[f1s.argmax()])\n", 86 | " if actual == options[f1s.argmax()]:\n", 87 | " correctness.append('T')\n", 88 | " else:\n", 89 | " correctness.append('F')\n", 90 | " accuracy = accuracy_score(actuals, predictions)\n", 91 | " precision, recall, f1, _ = precision_recall_fscore_support(actuals, predictions, 
average=average, zero_division=0)\n", 92 | " results['bert_score_correctness'] = correctness\n", 93 | " with open('results/results-{}.json'.format(config['run_id']), 'w') as file:\n", 94 | " json.dump(results, file, indent=4, ensure_ascii=False)\n", 95 | " return accuracy, precision, recall, f1\n", 96 | "\n", 97 | "def average_edit_distance(results, config):\n", 98 | " distances = 0\n", 99 | " for i in range(len(results['input'])):\n", 100 | " actual = results['target'][i]\n", 101 | " prediction = results['prediction'][i]\n", 102 | " if config['filename'] in ['modified_arithmetic/task.json']:\n", 103 | " input = results['input'][i].replace(' ?', '')\n", 104 | " prediction = prediction.replace('?', '').replace(input, '')\n", 105 | " if config['model'] in [\n", 106 | " 'chavinlo/alpaca-native',\n", 107 | " 'chavinlo/alpaca-13b',\n", 108 | " 'models--llama/7B',\n", 109 | " 'models--llama/13B',\n", 110 | " 'models--llama/30B'\n", 111 | " ]:\n", 112 | " # strip leftover special-token text from LLaMA-family outputs, keeping any cleanup above\n", 113 | " prediction = prediction.lstrip('unk>').lstrip('?? ').replace('', '').replace('', '').replace('', '')\n", 114 | " distances += editdistance.eval(actual, prediction)\n", 115 | " return distances / len(results['input'])\n", 116 | "\n", 117 | "def average_single_bert_score(results, scorer, config):\n", 118 | " predictions = results['prediction']\n", 119 | " if config['filename'] in ['modified_arithmetic/task.json']:\n", 120 | " for i in range(len(results['input'])):\n", 121 | " input = results['input'][i].replace(' ?', '')\n", 122 | " predictions[i] = results['prediction'][i].replace('?', '').replace(input, '')\n", 123 | " if config['model'] in [\n", 124 | " 'chavinlo/alpaca-native',\n", 125 | " 'chavinlo/alpaca-13b',\n", 126 | " 'models--llama/7B',\n", 127 | " 'models--llama/13B',\n", 128 | " 'models--llama/30B'\n", 129 | " ]:\n", 130 | " for i in range(len(results['input'])):\n", 131 | " predictions[i] = results['prediction'][i].lstrip('unk>').lstrip('?? 
').replace('', '').replace('', '').replace('', '')\n", 132 | " _, _, f1s = scorer.score(predictions, results['target'])\n", 133 | " return float(f1s.mean())\n", 134 | " \n", 135 | "def loss_and_gain(results, config, evaluation_scores_df, metric):\n", 136 | " try:\n", 137 | " run_id_reference = evaluation_scores_df['run_id'][\n", 138 | " (evaluation_scores_df.model==pairs[config['model']]) &\n", 139 | " (evaluation_scores_df.prompt_type==config['prompt_type']) &\n", 140 | " (evaluation_scores_df.number_of_shots==config['number_of_shots']) &\n", 141 | " (evaluation_scores_df.seed==config['seed'])\n", 142 | " ].tolist()[0]\n", 143 | " except Exception as e:\n", 144 | " run_id_reference = evaluation_scores_df['run_id'][\n", 145 | " (evaluation_scores_df.model==pairs[config['model']]) &\n", 146 | " (evaluation_scores_df.prompt_type==config['prompt_type']) &\n", 147 | " (evaluation_scores_df.number_of_shots==config['number_of_shots']) &\n", 148 | " (evaluation_scores_df.seed==str(config['seed']))\n", 149 | " ].tolist()[0]\n", 150 | " try:\n", 151 | " with open(f'results/results-{run_id_reference}.json') as file:\n", 152 | " results_reference = json.loads(file.read())\n", 153 | " except Exception as e:\n", 154 | " with open(f'results_20230618/results-{run_id_reference}.json') as file:\n", 155 | " results_reference = json.loads(file.read())\n", 156 | " loss, gain = 0, 0\n", 157 | " for i in range(len(results['input'])):\n", 158 | " correctness = results[metric][i]\n", 159 | " correctness_reference = results_reference[metric][i]\n", 160 | " # candidate model got it wrong when reference model got it right\n", 161 | " if correctness == 'F' and correctness_reference == 'T':\n", 162 | " loss += 1\n", 163 | " # candidate model got it right when reference model got it wrong\n", 164 | " if correctness == 'T' and correctness_reference == 'F':\n", 165 | " gain += 1\n", 166 | " return loss, gain" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "id": 
"6e57e30f-70f9-46f3-9c62-5fe86db9a39d", 172 | "metadata": { 173 | "tags": [] 174 | }, 175 | "source": [ 176 | "## accuracy" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": null, 182 | "id": "2e298a3a-6879-4fd7-90af-85961d0a3289", 183 | "metadata": {}, 184 | "outputs": [], 185 | "source": [ 186 | "config_df = pd.read_csv('evaluation_scores.csv')\n", 187 | "scorer = BERTScorer(model_type='roberta-large', device='cuda')\n", 188 | "\n", 189 | "for run_id in config_df.run_id.tolist():\n", 190 | " config = {k: v[0] for k, v in config_df[config_df.run_id == run_id].to_dict(orient='list').items()}\n", 191 | " name = config['filename'].replace('/task.json', '')\n", 192 | " with open(f'results/results-{run_id}.json') as file:\n", 193 | " results = json.loads(file.read())\n", 194 | " config['exact_match_accuracy'], _, _, config['exact_match_f1'] = exact_match(results, config)\n", 195 | " config['bert_score_accuracy'], config['bert_score_f1'] = -100, -100\n", 196 | " if config['prompt_type'] != 'closed-adv' and name not in ['gsm8k', 'codenames', 'modified_arithmetic']:\n", 197 | " config['bert_score_accuracy'], _, _, config['bert_score_f1'] = bert_score(results, config, scorer)\n", 198 | " with open(f'evaluation_scores/evaluation_scores_{name}.txt', 'a') as file:\n", 199 | " file.write(('\\t'.join(['{' + i + '}' for i in config.keys()]) + '\\n').format(**config))\n", 200 | " file.close()" 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "id": "8644e513-f236-47c7-ac5e-0bda1b66a39c", 206 | "metadata": { 207 | "tags": [] 208 | }, 209 | "source": [ 210 | "## edit distance" 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": null, 216 | "id": "1fec22a1-8094-4950-8799-8319360a3cf3", 217 | "metadata": {}, 218 | "outputs": [], 219 | "source": [ 220 | "for filename in os.listdir('evaluation_scores/'):\n", 221 | " config_df = pd.read_csv(\n", 222 | " 'evaluation_scores/' + filename,\n", 223 | " sep='\\t',\n", 224 | " 
names=['run_id', 'filename', 'number_of_data',\n", 225 | " 'model', 'prompt_type', 'number_of_shots',\n", 226 | " 'temperature', 'max_new_tokens', 'batch_size',\n", 227 | " 'pad_token', 'pad_token_id', 'eos_token_id',\n", 228 | " 'seed', 'device', 'exact_match_accuracy',\n", 229 | " 'exact_match_f1', 'bert_score_accuracy',\n", 230 | " 'bert_score_f1']\n", 231 | " )\n", 232 | " for run_id in config_df.run_id.tolist():\n", 233 | " config = {k: v[0] for k, v in config_df[config_df.run_id == run_id].to_dict(orient='list').items()}\n", 234 | " with open(f'results/results-{run_id}.json') as file:\n", 235 | " results = json.loads(file.read())\n", 236 | " config['edit_distance'] = average_edit_distance(results, config)\n", 237 | " with open(f'evaluation_scores/plus edit distance/plus_edit_distance.txt', 'a') as file:\n", 238 | " file.write(('\\t'.join(['{' + i + '}' for i in config.keys()]) + '\\n').format(**config))\n", 239 | " file.close()" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "id": "52c73317-d1b9-4a62-b90d-74bf27ab2675", 245 | "metadata": { 246 | "tags": [] 247 | }, 248 | "source": [ 249 | "## single bert score" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": null, 255 | "id": "1fa6da38-eb0f-47c3-a433-cd686d8c5de2", 256 | "metadata": {}, 257 | "outputs": [], 258 | "source": [ 259 | "scorer = BERTScorer(model_type='roberta-large', device='cuda')\n", 260 | "config_df = pd.read_csv(\n", 261 | " 'evaluation_scores/plus edit distance/plus_edit_distance.txt',\n", 262 | " sep='\\t'\n", 263 | ")\n", 264 | "for run_id in config_df.run_id.tolist():\n", 265 | " config = {k: v[0] for k, v in config_df[config_df.run_id == run_id].to_dict(orient='list').items()}\n", 266 | " config['single_bert_score'] = -100\n", 267 | " with open(f'results/results-{run_id}.json') as file:\n", 268 | " results = json.loads(file.read())\n", 269 | " config['single_bert_score'] = average_single_bert_score(results, scorer, config)\n", 270 | " with 
open(f'evaluation_scores/plus 2/plus_2.txt', 'a') as file:\n", 271 | " file.write(('\\t'.join(['{' + i + '}' for i in config.keys()]) + '\\n').format(**config))\n", 272 | " file.close()" 273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": null, 278 | "id": "f69daf6d-6a55-4cf4-9300-abce2cc2d199", 279 | "metadata": {}, 280 | "outputs": [], 281 | "source": [] 282 | } 283 | ], 284 | "metadata": { 285 | "kernelspec": { 286 | "display_name": "Python 3", 287 | "language": "python", 288 | "name": "python3" 289 | }, 290 | "language_info": { 291 | "codemirror_mode": { 292 | "name": "ipython", 293 | "version": 3 294 | }, 295 | "file_extension": ".py", 296 | "mimetype": "text/x-python", 297 | "name": "python", 298 | "nbconvert_exporter": "python", 299 | "pygments_lexer": "ipython3", 300 | "version": "3.6.8" 301 | } 302 | }, 303 | "nbformat": 4, 304 | "nbformat_minor": 5 305 | } 306 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Attribution-ShareAlike 4.0 International 2 | 3 | ======================================================================= 4 | 5 | Creative Commons Corporation ("Creative Commons") is not a law firm and 6 | does not provide legal services or legal advice. Distribution of 7 | Creative Commons public licenses does not create a lawyer-client or 8 | other relationship. Creative Commons makes its licenses and related 9 | information available on an "as-is" basis. Creative Commons gives no 10 | warranties regarding its licenses, any material licensed under their 11 | terms and conditions, or any related information. Creative Commons 12 | disclaims all liability for damages resulting from their use to the 13 | fullest extent possible. 
14 | 15 | Using Creative Commons Public Licenses 16 | 17 | Creative Commons public licenses provide a standard set of terms and 18 | conditions that creators and other rights holders may use to share 19 | original works of authorship and other material subject to copyright 20 | and certain other rights specified in the public license below. The 21 | following considerations are for informational purposes only, are not 22 | exhaustive, and do not form part of our licenses. 23 | 24 | Considerations for licensors: Our public licenses are 25 | intended for use by those authorized to give the public 26 | permission to use material in ways otherwise restricted by 27 | copyright and certain other rights. Our licenses are 28 | irrevocable. Licensors should read and understand the terms 29 | and conditions of the license they choose before applying it. 30 | Licensors should also secure all rights necessary before 31 | applying our licenses so that the public can reuse the 32 | material as expected. Licensors should clearly mark any 33 | material not subject to the license. This includes other CC- 34 | licensed material, or material used under an exception or 35 | limitation to copyright. More considerations for licensors: 36 | wiki.creativecommons.org/Considerations_for_licensors 37 | 38 | Considerations for the public: By using one of our public 39 | licenses, a licensor grants the public permission to use the 40 | licensed material under specified terms and conditions. If 41 | the licensor's permission is not necessary for any reason--for 42 | example, because of any applicable exception or limitation to 43 | copyright--then that use is not regulated by the license. Our 44 | licenses grant only permissions under copyright and certain 45 | other rights that a licensor has authority to grant. Use of 46 | the licensed material may still be restricted for other 47 | reasons, including because others have copyright or other 48 | rights in the material. 
A licensor may make special requests,
such as asking that all changes be marked or described.
Although not required by our licenses, you are encouraged to
respect those requests where reasonable. More considerations
for the public:
wiki.creativecommons.org/Considerations_for_licensees

=======================================================================

Creative Commons Attribution-ShareAlike 4.0 International Public
License

By exercising the Licensed Rights (defined below), You accept and agree
to be bound by the terms and conditions of this Creative Commons
Attribution-ShareAlike 4.0 International Public License ("Public
License"). To the extent this Public License may be interpreted as a
contract, You are granted the Licensed Rights in consideration of Your
acceptance of these terms and conditions, and the Licensor grants You
such rights in consideration of benefits the Licensor receives from
making the Licensed Material available under these terms and
conditions.


Section 1 -- Definitions.

  a. Adapted Material means material subject to Copyright and Similar
     Rights that is derived from or based upon the Licensed Material
     and in which the Licensed Material is translated, altered,
     arranged, transformed, or otherwise modified in a manner requiring
     permission under the Copyright and Similar Rights held by the
     Licensor. For purposes of this Public License, where the Licensed
     Material is a musical work, performance, or sound recording,
     Adapted Material is always produced where the Licensed Material is
     synched in timed relation with a moving image.

  b. Adapter's License means the license You apply to Your Copyright
     and Similar Rights in Your contributions to Adapted Material in
     accordance with the terms and conditions of this Public License.

  c. BY-SA Compatible License means a license listed at
     creativecommons.org/compatiblelicenses, approved by Creative
     Commons as essentially the equivalent of this Public License.

  d. Copyright and Similar Rights means copyright and/or similar rights
     closely related to copyright including, without limitation,
     performance, broadcast, sound recording, and Sui Generis Database
     Rights, without regard to how the rights are labeled or
     categorized. For purposes of this Public License, the rights
     specified in Section 2(b)(1)-(2) are not Copyright and Similar
     Rights.

  e. Effective Technological Measures means those measures that, in the
     absence of proper authority, may not be circumvented under laws
     fulfilling obligations under Article 11 of the WIPO Copyright
     Treaty adopted on December 20, 1996, and/or similar international
     agreements.

  f. Exceptions and Limitations means fair use, fair dealing, and/or
     any other exception or limitation to Copyright and Similar Rights
     that applies to Your use of the Licensed Material.

  g. License Elements means the license attributes listed in the name
     of a Creative Commons Public License. The License Elements of this
     Public License are Attribution and ShareAlike.

  h. Licensed Material means the artistic or literary work, database,
     or other material to which the Licensor applied this Public
     License.

  i. Licensed Rights means the rights granted to You subject to the
     terms and conditions of this Public License, which are limited to
     all Copyright and Similar Rights that apply to Your use of the
     Licensed Material and that the Licensor has authority to license.

  j. Licensor means the individual(s) or entity(ies) granting rights
     under this Public License.

  k. Share means to provide material to the public by any means or
     process that requires permission under the Licensed Rights, such
     as reproduction, public display, public performance, distribution,
     dissemination, communication, or importation, and to make material
     available to the public including in ways that members of the
     public may access the material from a place and at a time
     individually chosen by them.

  l. Sui Generis Database Rights means rights other than copyright
     resulting from Directive 96/9/EC of the European Parliament and of
     the Council of 11 March 1996 on the legal protection of databases,
     as amended and/or succeeded, as well as other essentially
     equivalent rights anywhere in the world.

  m. You means the individual or entity exercising the Licensed Rights
     under this Public License. Your has a corresponding meaning.


Section 2 -- Scope.

  a. License grant.

       1. Subject to the terms and conditions of this Public License,
          the Licensor hereby grants You a worldwide, royalty-free,
          non-sublicensable, non-exclusive, irrevocable license to
          exercise the Licensed Rights in the Licensed Material to:

            a. reproduce and Share the Licensed Material, in whole or
               in part; and

            b. produce, reproduce, and Share Adapted Material.

       2. Exceptions and Limitations. For the avoidance of doubt, where
          Exceptions and Limitations apply to Your use, this Public
          License does not apply, and You do not need to comply with
          its terms and conditions.

       3. Term. The term of this Public License is specified in Section
          6(a).

       4. Media and formats; technical modifications allowed.
          The Licensor authorizes You to exercise the Licensed Rights in
          all media and formats whether now known or hereafter created,
          and to make technical modifications necessary to do so. The
          Licensor waives and/or agrees not to assert any right or
          authority to forbid You from making technical modifications
          necessary to exercise the Licensed Rights, including
          technical modifications necessary to circumvent Effective
          Technological Measures. For purposes of this Public License,
          simply making modifications authorized by this Section 2(a)(4)
          never produces Adapted Material.

       5. Downstream recipients.

            a. Offer from the Licensor -- Licensed Material. Every
               recipient of the Licensed Material automatically
               receives an offer from the Licensor to exercise the
               Licensed Rights under the terms and conditions of this
               Public License.

            b. Additional offer from the Licensor -- Adapted Material.
               Every recipient of Adapted Material from You
               automatically receives an offer from the Licensor to
               exercise the Licensed Rights in the Adapted Material
               under the conditions of the Adapter's License You apply.

            c. No downstream restrictions. You may not offer or impose
               any additional or different terms or conditions on, or
               apply any Effective Technological Measures to, the
               Licensed Material if doing so restricts exercise of the
               Licensed Rights by any recipient of the Licensed
               Material.

       6. No endorsement. Nothing in this Public License constitutes or
          may be construed as permission to assert or imply that You
          are, or that Your use of the Licensed Material is, connected
          with, or sponsored, endorsed, or granted official status by,
          the Licensor or others designated to receive attribution as
          provided in Section 3(a)(1)(A)(i).

  b. Other rights.

       1. Moral rights, such as the right of integrity, are not
          licensed under this Public License, nor are publicity,
          privacy, and/or other similar personality rights; however, to
          the extent possible, the Licensor waives and/or agrees not to
          assert any such rights held by the Licensor to the limited
          extent necessary to allow You to exercise the Licensed
          Rights, but not otherwise.

       2. Patent and trademark rights are not licensed under this
          Public License.

       3. To the extent possible, the Licensor waives any right to
          collect royalties from You for the exercise of the Licensed
          Rights, whether directly or through a collecting society
          under any voluntary or waivable statutory or compulsory
          licensing scheme. In all other cases the Licensor expressly
          reserves any right to collect such royalties.


Section 3 -- License Conditions.

Your exercise of the Licensed Rights is expressly made subject to the
following conditions.

  a. Attribution.

       1. If You Share the Licensed Material (including in modified
          form), You must:

            a. retain the following if it is supplied by the Licensor
               with the Licensed Material:

                 i. identification of the creator(s) of the Licensed
                    Material and any others designated to receive
                    attribution, in any reasonable manner requested by
                    the Licensor (including by pseudonym if
                    designated);

                ii. a copyright notice;

               iii. a notice that refers to this Public License;

                iv. a notice that refers to the disclaimer of
                    warranties;

                 v. a URI or hyperlink to the Licensed Material to the
                    extent reasonably practicable;

            b. indicate if You modified the Licensed Material and
               retain an indication of any previous modifications; and

            c. indicate the Licensed Material is licensed under this
               Public License, and include the text of, or the URI or
               hyperlink to, this Public License.

       2. You may satisfy the conditions in Section 3(a)(1) in any
          reasonable manner based on the medium, means, and context in
          which You Share the Licensed Material. For example, it may be
          reasonable to satisfy the conditions by providing a URI or
          hyperlink to a resource that includes the required
          information.

       3. If requested by the Licensor, You must remove any of the
          information required by Section 3(a)(1)(A) to the extent
          reasonably practicable.

  b. ShareAlike.

     In addition to the conditions in Section 3(a), if You Share
     Adapted Material You produce, the following conditions also apply.

       1. The Adapter's License You apply must be a Creative Commons
          license with the same License Elements, this version or
          later, or a BY-SA Compatible License.

       2. You must include the text of, or the URI or hyperlink to, the
          Adapter's License You apply. You may satisfy this condition
          in any reasonable manner based on the medium, means, and
          context in which You Share Adapted Material.

       3. You may not offer or impose any additional or different terms
          or conditions on, or apply any Effective Technological
          Measures to, Adapted Material that restrict exercise of the
          rights granted under the Adapter's License You apply.


Section 4 -- Sui Generis Database Rights.

Where the Licensed Rights include Sui Generis Database Rights that
apply to Your use of the Licensed Material:

  a. for the avoidance of doubt, Section 2(a)(1) grants You the right
     to extract, reuse, reproduce, and Share all or a substantial
     portion of the contents of the database;

  b. if You include all or a substantial portion of the database
     contents in a database in which You have Sui Generis Database
     Rights, then the database in which You have Sui Generis Database
     Rights (but not its individual contents) is Adapted Material,
     including for purposes of Section 3(b); and

  c. You must comply with the conditions in Section 3(a) if You Share
     all or a substantial portion of the contents of the database.

For the avoidance of doubt, this Section 4 supplements and does not
replace Your obligations under this Public License where the Licensed
Rights include other Copyright and Similar Rights.


Section 5 -- Disclaimer of Warranties and Limitation of Liability.

  a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
     EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
     AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
     ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
     IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
     WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
     PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
     ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
     KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
     ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.

  b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
     TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
     NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
     INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
     COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
     USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
     ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
     DAMAGES.
     WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
     IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.

  c. The disclaimer of warranties and limitation of liability provided
     above shall be interpreted in a manner that, to the extent
     possible, most closely approximates an absolute disclaimer and
     waiver of all liability.


Section 6 -- Term and Termination.

  a. This Public License applies for the term of the Copyright and
     Similar Rights licensed here. However, if You fail to comply with
     this Public License, then Your rights under this Public License
     terminate automatically.

  b. Where Your right to use the Licensed Material has terminated under
     Section 6(a), it reinstates:

       1. automatically as of the date the violation is cured, provided
          it is cured within 30 days of Your discovery of the
          violation; or

       2. upon express reinstatement by the Licensor.

     For the avoidance of doubt, this Section 6(b) does not affect any
     right the Licensor may have to seek remedies for Your violations
     of this Public License.

  c. For the avoidance of doubt, the Licensor may also offer the
     Licensed Material under separate terms or conditions or stop
     distributing the Licensed Material at any time; however, doing so
     will not terminate this Public License.

  d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
     License.


Section 7 -- Other Terms and Conditions.

  a. The Licensor shall not be bound by any additional or different
     terms or conditions communicated by You unless expressly agreed.

  b. Any arrangements, understandings, or agreements regarding the
     Licensed Material not stated herein are separate from and
     independent of the terms and conditions of this Public License.


Section 8 -- Interpretation.

  a. For the avoidance of doubt, this Public License does not, and
     shall not be interpreted to, reduce, limit, restrict, or impose
     conditions on any use of the Licensed Material that could lawfully
     be made without permission under this Public License.

  b. To the extent possible, if any provision of this Public License is
     deemed unenforceable, it shall be automatically reformed to the
     minimum extent necessary to make it enforceable. If the provision
     cannot be reformed, it shall be severed from this Public License
     without affecting the enforceability of the remaining terms and
     conditions.

  c. No term or condition of this Public License will be waived and no
     failure to comply consented to unless expressly agreed to by the
     Licensor.

  d. Nothing in this Public License constitutes or may be interpreted
     as a limitation upon, or waiver of, any privileges and immunities
     that apply to the Licensor or You, including from the legal
     processes of any jurisdiction or authority.


=======================================================================

Creative Commons is not a party to its public
licenses. Notwithstanding, Creative Commons may elect to apply one of
its public licenses to material it publishes and in those instances
will be considered the “Licensor.” The text of the Creative Commons
public licenses is dedicated to the public domain under the CC0 Public
Domain Dedication.
Except for the limited purpose of indicating that
material is shared under a Creative Commons public license or as
otherwise permitted by the Creative Commons policies published at
creativecommons.org/policies, Creative Commons does not authorize the
use of the trademark "Creative Commons" or any other trademark or logo
of Creative Commons without its prior written consent including,
without limitation, in connection with any unauthorized modifications
to any of its public licenses or any other arrangements,
understandings, or agreements concerning use of licensed material. For
the avoidance of doubt, this paragraph does not form part of the
public licenses.

Creative Commons may be contacted at creativecommons.org.