├── assets
│   ├── fig1.png
│   ├── fig2.jpg
│   ├── fig3.png
│   ├── fig4.jpg
│   └── fig5.jpg
└── README.md
/assets/fig1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKU-YuanGroup/GPT-as-Language-Tree/HEAD/assets/fig1.png
--------------------------------------------------------------------------------
/assets/fig2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKU-YuanGroup/GPT-as-Language-Tree/HEAD/assets/fig2.jpg
--------------------------------------------------------------------------------
/assets/fig3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKU-YuanGroup/GPT-as-Language-Tree/HEAD/assets/fig3.png
--------------------------------------------------------------------------------
/assets/fig4.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKU-YuanGroup/GPT-as-Language-Tree/HEAD/assets/fig4.jpg
--------------------------------------------------------------------------------
/assets/fig5.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PKU-YuanGroup/GPT-as-Language-Tree/HEAD/assets/fig5.jpg
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
If you like our project, please give us a star ⭐ on GitHub for the latest updates.

[arXiv](https://arxiv.org/pdf/2501.07641)
[License](https://github.com/PKU-YuanGroup/GPT-as-Language-Tree/blob/main/LICENSE)
## 🤗 Brief Intro
- We propose a novel perspective for analyzing the intelligence of LLMs by representing both the language dataset and the GPT models as Monte Carlo Language Trees, named the Data-Tree $\theta^*$ and the GPT-Tree $\hat\theta$, respectively (a minimal construction sketch follows this list).

- Quantitative analysis demonstrates that **using existing language models to fit training data essentially seeks a more efficient way to approximate the Data-Tree (_i.e._, $\hat\theta \rightarrow \theta^*$)**. Furthermore, GPT-Trees generated by different language models trained on the same dataset exhibit a high degree of similarity.

- Our findings suggest that **the reasoning process in LLMs is more likely probabilistic pattern-matching than formal reasoning**, as each model inference appears to retrieve a context pattern with maximum probability from the Data-Tree.

- Our Monte Carlo Language Tree perspective helps explain many existing counterintuitive phenomena, such as hallucination, chain-of-thought (CoT), and token bias.

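To make the Data-Tree concrete: it can be viewed as a prefix tree whose nodes are token sequences and whose edges carry next-token frequencies counted directly from the corpus. The sketch below only illustrates that structure, assuming naive whitespace tokenization and a toy two-sentence corpus; the paper builds its Data-Tree from The Pile with the models' own tokenizers, so this is not the authors' implementation.

```python
from collections import defaultdict

class ToyDataTree:
    """Toy Monte Carlo Language Tree: a prefix tree whose edges store
    next-token counts estimated directly from a corpus (i.e., theta*)."""

    def __init__(self, max_depth: int = 3):
        self.max_depth = max_depth
        # prefix (tuple of tokens) -> {next_token: count}
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, corpus):
        for text in corpus:
            tokens = text.split()  # assumption: naive whitespace tokenization
            for i in range(len(tokens) - 1):
                # register the next token under every prefix of length <= max_depth ending at position i
                for k in range(1, self.max_depth + 1):
                    if i - k + 1 < 0:
                        break
                    prefix = tuple(tokens[i - k + 1 : i + 1])
                    self.counts[prefix][tokens[i + 1]] += 1

    def next_token_distribution(self, prefix):
        """Empirical P(next token | prefix), read straight off the tree."""
        children = self.counts.get(tuple(prefix), {})
        total = sum(children.values())
        return {tok: c / total for tok, c in children.items()} if total else {}

# Hypothetical two-sentence corpus, for illustration only.
tree = ToyDataTree(max_depth=2)
tree.fit(["If it rains we stay home", "If it rains we read books"])
print(tree.next_token_distribution(["it", "rains"]))  # {'we': 1.0}
print(tree.next_token_distribution(["we"]))           # {'stay': 0.5, 'read': 0.5}
```

A GPT-Tree has the same shape, except that its edge weights come from a trained model's next-token probabilities instead of raw corpus counts.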
## 😮 Highlights

- ### The GPT Models Gradually Converge to the Data-Tree

Tree visualization results of the GPT-X series models and the Data-Tree. Each row corresponds to a different model (or the dataset) and each column to a different token. These models are all trained on the same large-scale 800GB dataset, _i.e._, The Pile. **Different GPT-like language models trained on the same dataset show very high similarity in the GPT-Tree visualization and gradually converge to the Data-Tree, especially on the “If” token** (the second column). This similarity is reflected not only at the token level (node color of the tree) but also at the prediction-probability level (edge width of the tree).

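Below is a minimal sketch of how one level of such a GPT-Tree could be expanded from a publicly available checkpoint using the Hugging Face `transformers` API; the model name, `k`, and the starting token are placeholders rather than the paper's exact configuration.

```python
# Expand one node of a "GPT-Tree": query a causal LM for its top-k next tokens,
# which become the node's children (token labels) and edge weights (probabilities).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def expand_node(prefix: str, model, tokenizer, k: int = 5):
    inputs = tokenizer(prefix, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]      # logits for the next token
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, k)
    return [(tokenizer.decode(int(i)), p.item()) for i, p in zip(top.indices, top.values)]

# GPT-Neo 125M is one Pile-trained checkpoint; any GPT-like model would work here.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")
print(expand_node("If", model, tokenizer))  # children and edge weights of the "If" node
```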
Average MSE and Recall@5 over 26 words. The MSE measures the probability difference between the GPT-Tree and the Data-Tree: the smaller the value, the more similar the two trees are. Recall@5 measures how many of the GPT model's output tokens are recalled by the Data-Tree. The results show that the GPT models gradually converge to the Data-Tree as the parameter size increases (a sketch of how such metrics can be computed follows).

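The sketch below shows one way such per-word metrics could be computed, assuming both the GPT model and the Data-Tree expose next-token distributions as token-to-probability dictionaries (e.g., from the two sketches above); the exact definitions and the aggregation over the 26 first words follow the paper, so treat this as one plausible reading rather than the evaluation code.

```python
def mse(gpt_probs: dict, data_probs: dict) -> float:
    """Mean squared error between two next-token distributions,
    taken over the union of their supports."""
    vocab = set(gpt_probs) | set(data_probs)
    return sum((gpt_probs.get(t, 0.0) - data_probs.get(t, 0.0)) ** 2 for t in vocab) / len(vocab)

def recall_at_k(gpt_probs: dict, data_probs: dict, k: int = 5) -> float:
    """Fraction of the GPT model's top-k next tokens that also appear
    among the Data-Tree's top-k next tokens (one reading of Recall@5)."""
    top_gpt = sorted(gpt_probs, key=gpt_probs.get, reverse=True)[:k]
    top_data = set(sorted(data_probs, key=data_probs.get, reverse=True)[:k])
    return sum(t in top_data for t in top_gpt) / len(top_gpt)

# Hypothetical next-token distributions for the prefix "If":
gpt_if = {"you": 0.30, "the": 0.25, "we": 0.20, "it": 0.15, "a": 0.10}
data_if = {"you": 0.35, "the": 0.30, "it": 0.20, "they": 0.10, "a": 0.05}
print(mse(gpt_if, data_if), recall_at_k(gpt_if, data_if))  # lower MSE / higher recall = closer trees
```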
- ### Understanding Token Bias Phenomena from GPT-Tree

When we perturb the last token “.” into “。”, the model incorrectly answers “43”. **We suggest that token bias arises because some rare tokens induce the GPT-Tree to infer along the wrong path.** We further quantify this phenomenon by evaluating the original (blue bars) and perturbed (orange bars) accuracy of different models on 21,076 QA test pairs; a sketch of this evaluation loop is given below. The accuracy of all models drops significantly after perturbing the last token “.” into “。”, showing that they all suffer from the token bias issue.

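The spirit of this perturbation test is easy to reproduce: swap the trailing ASCII period for the full-width one and compare exact-match accuracy before and after. The loop below is a hedged sketch, assuming a hypothetical `answer_fn(prompt)` wrapper around whichever model is under test and a list of `(question, gold_answer)` pairs; it is not the authors' evaluation harness.

```python
def perturb_last_period(question: str) -> str:
    """Replace a trailing ASCII '.' with the full-width '。' (U+3002)."""
    return question[:-1] + "。" if question.endswith(".") else question

def accuracy(qa_pairs, answer_fn, perturb: bool = False) -> float:
    """Exact-match accuracy over (question, gold_answer) pairs."""
    correct = 0
    for question, gold in qa_pairs:
        prompt = perturb_last_period(question) if perturb else question
        correct += answer_fn(prompt).strip() == gold
    return correct / len(qa_pairs)

# Usage sketch (answer_fn and qa_pairs are placeholders for the model wrapper and test set):
# original  = accuracy(qa_pairs, answer_fn, perturb=False)   # blue bars
# perturbed = accuracy(qa_pairs, answer_fn, perturb=True)    # orange bars
# print(f"accuracy drop: {original - perturbed:.3f}")
```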
- ### Understanding the Effectiveness of Chain-of-Thought (CoT) from GPT-Tree

For some complex problems, there is a significant gap between the input $X$ and the output $Y$, making it difficult for the GPT model to derive $Y$ from $X$ directly. From the perspective of the GPT-Tree, the input $X$ sits at a parent node and the output $Y$ at a deep leaf node. CoT focuses on finding an intermediate path $Z$ that helps the GPT model connect $X$ and $Y$, attempting to fill the I/O gap.

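One way to make this reading concrete (our paraphrase, not a formula taken from the paper) is to factor the target probability through an intermediate reasoning path $Z$:

$$
P(Y \mid X) \;=\; \sum_{Z} P(Z \mid X)\, P(Y \mid X, Z) \;\ge\; P(Z^{\ast} \mid X)\, P(Y \mid X, Z^{\ast}).
$$

Estimating $P(Y \mid X)$ directly corresponds to one long, sparsely observed jump from the parent node $X$ to the deep leaf $Y$, whereas a well-chosen chain $Z^{\ast}$ splits it into short hops, each lying on a densely visited edge of the GPT-Tree/Data-Tree and therefore estimated far more reliably.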
## ✏️ Citation

If you find our paper and code useful in your research, please consider giving us a star :star: and a citation :pencil:.

```BibTeX
@article{ning2025gpt,
  title={GPT as a Monte Carlo Language Tree: A Probabilistic Perspective},
  author={Ning, Kun-Peng and Yao, Jia-Yu and Liu, Yu-Yang and Ning, Mu-Nan and Yuan, Li},
  journal={arXiv preprint arXiv:2501.07641},
  year={2025}
}
```
--------------------------------------------------------------------------------