├── CITATION.cff
├── LICENSE
├── README.md
├── chain-of-thought-prompts.txt
├── questions.txt
└── tree-of-thought-prompts.txt


/CITATION.cff:
--------------------------------------------------------------------------------
 1 | cff-version: 1.2.0
 2 | title: >-
 3 |   Using Tree-of-Thought Prompting to boost ChatGPT's
 4 |   reasoning
 5 | message: >-
 6 |   If you use this software, please cite it using the
 7 |   metadata from this file.
 8 | type: software
 9 | authors:
10 |   - given-names: Dave
11 |     family-names: Hulbert
12 |     orcid: 'https://orcid.org/0000-0003-0238-440X'
13 | identifiers:
14 |   - type: doi
15 |     value: 10.5281/zenodo.10323452
16 | repository-code: 'https://github.com/dave1010/tree-of-thought-prompting'
17 | url: >-
18 |   https://medium.com/@dave1010/using-tree-of-thought-prompting-to-boost-chatgpts-reasoning-318914eb0e76
19 | abstract: >-
20 |   Tree-of-Thought (ToT) Prompting, a fresh technique
21 |   borrowing ideas from the Tree-of-Thoughts framework,
22 |   broadens and enhances the well-established
23 |   Chain-of-Thought prompting concept, thereby enabling Large
24 |   Language Models, like ChatGPT, to exhibit superior
25 |   reasoning abilities. This Tree-of-Thought Prompting
26 |   technique permits Large Language Models to rectify their
27 |   errors autonomously while progressively accumulating
28 |   knowledge.
29 | keywords:
30 |   - prompting
31 |   - large language models
32 |   - chain of thought
33 |   - tree of thought
34 | license: MIT
35 | commit: 109c7cd7c6255d95c40ef96c80d4cbdd83023f3e
36 | version: '0.1'
37 | date-released: '2023-05-22'
38 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2023 Dave Hulbert
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Using Tree-of-Thought Prompting to boost ChatGPT's reasoning
  2 | 
  3 | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10323452.svg)](https://doi.org/10.5281/zenodo.10323452)
  4 | 
  5 | 
  6 | ## Abstract
  7 | 
  8 | Tree-of-Thought (ToT) Prompting, a fresh technique borrowing ideas from the Tree-of-Thoughts framework, broadens and enhances the well-established Chain-of-Thought prompting concept, thereby enabling Large Language Models, like ChatGPT, to exhibit superior reasoning abilities. This Tree-of-Thought Prompting technique permits Large Language Models to rectify their errors autonomously while progressively accumulating knowledge.
  9 | 
 10 | In one example, a ToT prompt improves ChatGPT 3.5's reasoning ability to answer a question that could previously only be answered by ChatGPT 4.
 11 | 
 12 | ## Complex questions for LLMs
 13 | 
 14 | Large Language Models (LLMs) often struggle to answer certain types of complex questions. Here's a representative example:
 15 | 
 16 |     Bob is in the living room.
 17 |     He walks to the kitchen, carrying a cup.
 18 |     He puts a ball in the cup and carries the cup to the bedroom.
 19 |     He turns the cup upside down, then walks to the garden.
 20 |     He puts the cup down in the garden, then walks to the garage.
 21 |     Where is the ball?
 22 | 
 23 | The correct answer is that the ball is in the bedroom. Although LLMs are non-deterministic, the responses I received during several attempts were as follows:
 24 | 
 25 | * ChatGPT 3.5: The ball is in the garden. ❌
 26 | * ChatGPT 4: The ball is in the bedroom. [followed by an explanation] ✅
 27 | 
 28 | ## Chain-of-Thought prompting
 29 | 
 30 | A widely recognized concept known as "Chain-of-Thought prompting" (CoT for short) exists, which encourages Large Language Models (LLMs) to elucidate their thought process, thereby increasing the likelihood of a correct response. The roots of this concept can be traced back to a paper from January 2022 titled: [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903).
 31 | 
 32 | At its core, the CoT technique can be implemented by adding something akin to the following to a prompt:
 33 | 
 34 |     Think carefully and logically, explaining your answer.
 35 | 
 36 | This approach enhances the performance of LLMs in answering many questions, although its effectiveness varies.
 37 | 
 38 | Here's an example of ChatGPT 3.5's response:
 39 | 
 40 |     Based on the given sequence of events, we can logically determine the location of the ball. Let's analyze each step:
 41 |     1. Bob is in the living room.
 42 |      - No mention of the ball in the living room.
 43 |     2. Bob walks to the kitchen, carrying a cup.
 44 |      - No mention of the ball in the kitchen yet.
 45 |     3. Bob puts a ball in the cup.
 46 |      - The ball is now inside the cup.
 47 |     4. Bob carries the cup to the bedroom.
 48 |      - The ball is still inside the cup.
 49 |     5. Bob turns the cup upside down.
 50 |      - If the ball was inside the cup, it would have fallen out when Bob turned it upside down.
 51 |     6. Bob walks to the garden.
 52 |      - Bob left the cup in the garden, but there is no mention of the ball being removed from the cup.
 53 |     7. Bob puts the cup down in the garden.
 54 |      - The ball, if it was in the cup, would still be in the cup in the garden.
 55 |     8. Bob walks to the garage.
 56 |      - No mention of the ball being taken to the garage.
 57 |     Based on the given information, we can conclude that the ball is still inside the cup in the garden.
 58 | 
 59 | * ChatGPT 3.5 CoT: The ball is in the garden. ❌
 60 | 
 61 | You can see how ChatGPT realised the ball fell out at step 5 but still didn't get the answer correct.
 62 | 
 63 | Can we do any better, without resorting to ChatGPT 4?
 64 | 
 65 | ## Tree-of-Thought Framework
 66 | 
 67 | Recent weeks have seen the emergence of several papers and Python repositories introducing the concept of Tree-of-Thoughts (ToT):
 68 | 
 69 | * [Large Language Model Guided Tree-of-Thought](https://arxiv.org/abs/2305.08291), 15 May 2023. [Github](https://github.com/jieyilong/tree-of-thought-puzzle-solver).
 70 | * [Tree of Thoughts: Deliberate Problem Solving with Large Language Models](https://arxiv.org/abs/2305.10601), 17 May 2023. [Github](https://github.com/ysymyth/tree-of-thought-llm) (no code as of 22nd May)
 71 | * [Tree of Thoughts](https://github.com/kyegomez/tree-of-thoughts) Github, 21st May 2023
 72 | 
 73 | The underlying principle is to provide the LLM an opportunity to explore multiple branches of reasoning while self-evaluating as it progresses. According to the referenced papers, this approach seems to be highly effective for certain problems.
 74 | 
 75 | ## Tree-of-Thought Prompting
 76 | 
 77 | TThe existing papers discussing the Tree-of-Thought framework appear to be premised on making multiple calls to an LLM, typically via Python. One of the main advantages of Chain-of-Thought reasoning is that it can be achieved with ChatGPT in a single sentence prompt.
 78 | 
 79 | This led me to ponder: could we improve upon the results of CoT by utilizing a ToT-style prompt?
 80 | 
 81 | Here's the prompt I used:
 82 | 
 83 |     Imagine three different experts are answering this question.
 84 |     All experts will write down 1 step of their thinking,
 85 |     then share it with the group.
 86 |     Then all experts will go on to the next step, etc.
 87 |     If any expert realises they're wrong at any point then they leave.
 88 |     The question is...
 89 | 
 90 | And here's the response from ChatGTP 3.5:
 91 | 
 92 |     Expert 1: The ball is in the living room.
 93 | 
 94 |     Expert 2: The ball is in the kitchen.
 95 | 
 96 |     Expert 3: The ball is in the bedroom.
 97 | 
 98 |     Expert 1: Bob carries the cup to the bedroom, so the ball must be in the cup.
 99 | 
100 |     Expert 2: Oh, I see my mistake. Yes, the ball is in the cup.
101 | 
102 |     Expert 3: Agreed, the ball is in the cup in the bedroom.
103 | 
104 |     Expert 1: Next, Bob turns the cup upside down in the bedroom.
105 | 
106 |     Expert 2: After that, Bob walks to the garden and puts the cup down.
107 | 
108 |     Expert 3: Therefore, the ball must have fallen out of the cup when Bob turned it upside down in the bedroom. So, the ball is in the bedroom, not in the cup anymore.
109 | 
110 |     Expert 1: Oh, you're right. I made a mistake. The ball is in the bedroom, not in the cup.
111 | 
112 |     Expert 2: Agreed, the ball is in the bedroom.
113 | 
114 |     Expert 3: Bob then walks to the garage, so the ball remains in the bedroom. It is not in the garden or the garage.
115 | 
116 |     Expert 1: Absolutely, the ball is still in the bedroom.
117 | 
118 |     Expert 2: Yes, the ball hasn't moved from the bedroom.
119 | 
120 |     Expert 3: Therefore, the ball is in the bedroom, not in the garden or the garage.
121 | 
122 |     All three experts agree that the ball is in the bedroom.
123 | 
124 | * ChatGPT 3.5 ToT: The ball is in the bedroom. ✅
125 | 
126 | You can see how this time ChatGPT 3.5 gets the answer correct!
127 | 
128 | # Limitations
129 | 
130 | * This approach has not undergone extensive testing yet, but preliminary results indicate improved performance over traditional CoT prompts.
131 | * The ToT framework comprises more components than those represented in my example prompt.
132 | 
133 | 
134 | # Enhancement, Feedback, and Contributions
135 | 
136 | There's room to refine the prompt for improved performance, potentially leading to more concise responses.
137 | 
138 | Should you find success with any other prompts, we welcome you to submit a PR and add them to `tree-of-thought-prompts.txt`.
139 | 
140 | # Additional thoughts
141 | 
142 | The Tree-of-Thought framework bears resemblance to an **organizational decision-making process** involving multiple stakeholders. The structure tries to emulate consensus and diversity of thought.
143 | 
144 | The effect can be extended by using multiple agents that are fine tuned to handle different parts of a task. AutoGPT is a good example of this, where it self-evaluates LLM plans against a goal. Using differently tuned LLMs for specialisation or even completely different LLMs would allow greater diversity of thought. These LLMs could interact in different modes (using Team Topologies terminology):
145 | 
146 | * **Collaboration**: This could involve combining and iterating the outputs of multiple models to generate a final response to a user's input. For example, a general-purpose model might generate a preliminary response, which is then refined or enhanced by a specialized model.
147 | * **X-as-a-Service**: Some models might provide services to others, such as pre-processing input data, post-processing output data, or providing contextual information. These models would operate in a service role, supporting the functions of the other models.
148 | * **Facilitating**: Some models might play a facilitative role, for example by training other models, monitoring their performance, or providing feedback that can be used to improve them.
149 | 
150 | High-performing teams often outperform individuals in decision-making. Therefore, it's plausible that adopting other organizational structures and characteristics could enhance the performance of LLMs. In addition to diversity of thought, specialisation and concensus, we may be able to emulate:
151 | 
152 | * **Hierarchy**: where simpler queries are handled by a lower-level model and more complex ones are escalated to more capable or specialized models
153 | * **Redundancy**: ensuring that if one model fails to generate an accurate or useful output, another might be able to step in and provide a better result
154 | 
155 | # Acknowledgements
156 | 
157 | * [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903), Jan 2022.
158 | * [Large Language Model Guided Tree-of-Thought](https://arxiv.org/abs/2305.08291), 15 May 2023. [Github](https://github.com/jieyilong/tree-of-thought-puzzle-solver).
159 | * [Tree of Thoughts: Deliberate Problem Solving with Large Language Models](https://arxiv.org/abs/2305.10601), 17 May 2023. [Github](https://github.com/princeton-nlp/tree-of-thought-llm)
160 | * [Tree of Thoughts](https://github.com/kyegomez/tree-of-thoughts) Github, 21st May 2023
161 | 
162 | 
163 | # Citations
164 | 
165 | Please cite this repository if you use the code.
166 | 
167 |     @misc{tree-of-thought-prompting,
168 |     	title        = {Using Tree-of-Thought Prompting to boost ChatGPT's reasoning},
169 |     	author       = {Dave Hulbert},
170 |     	year         = 2023,
171 |     	month        = may,
172 |     	journal      = {GitHub repository},
173 |     	publisher    = {Zenodo},
174 |     	doi          = {10.5281/ZENODO.10323452},
175 |     	url          = {https://doi.org/10.5281/zenodo.10323452},
176 |     	howpublished = {\url{https://github.com/dave1010/tree-of-thought-prompting}}
177 |     }
178 | 


--------------------------------------------------------------------------------
/chain-of-thought-prompts.txt:
--------------------------------------------------------------------------------
1 | Think carefully and logically, explaining your answer.


--------------------------------------------------------------------------------
/questions.txt:
--------------------------------------------------------------------------------
1 | Bob is in the living room.
2 | He walks to the kitchen, carrying a cup.
3 | He puts a ball in the cup and carries the cup to the bedroom.
4 | He turns the cup upside down, then walks to the garden.
5 | He puts the cup down in the garden, then walks to the garage.
6 | Where is the ball?


--------------------------------------------------------------------------------
/tree-of-thought-prompts.txt:
--------------------------------------------------------------------------------
 1 | 
 2 | Imagine three different experts are answering this question. All experts will write down 1 step of their thinking, then share it with the group. Then all experts will go on to the next step, etc. If any expert realises they're wrong at any point then they leave. The question is...
 3 | 
 4 | Simulate three brilliant, logical experts collaboratively answering a question. Each one verbosely explains their thought process in real-time, considering the prior explanations of others and openly acknowledging mistakes. At each step, whenever possible, each expert refines and builds upon the thoughts of others, acknowledging their contributions. They continue until there is a definitive answer to the question. For clarity, your entire response should be in a markdown table. The question is...
 5 | 
 6 | Identify and behave as three different experts that are appropriate to answering this question.
 7 | All experts will write down the step and their thinking about the step, then share it with the group.
 8 | Then, all experts will go on to the next step, etc.
 9 | At each step all experts will score their peers response between 1 and 5, 1 meaning it is highly unlikely, and 5 meaning it is highly likely.
10 | If any expert is judged to be wrong at any point then they leave.
11 | After all experts have provided their analysis, you then analyze all 3 analyses and provide either the consensus solution or your best guess solution.
12 | The question is...
13 | 


--------------------------------------------------------------------------------