# GPT as a Monte Carlo Language Tree: A Probabilistic Perspective

If you like our project, please give us a star ⭐ on GitHub for the latest updates.

[![arXiv](https://img.shields.io/badge/Arxiv-2501.07641-b31b1b.svg?logo=arXiv)](https://arxiv.org/pdf/2501.07641)
[![License](https://img.shields.io/badge/License-Apache%202.0-yellow)](https://github.com/PKU-YuanGroup/GPT-as-Language-Tree/blob/main/LICENSE)

## 🤗 Brief Intro


- We propose a novel perspective for analyzing the intelligence of LLMs by representing both the language dataset and the GPT models as Monte Carlo Language Trees, named the Data-Tree $\theta^*$ and the GPT-Tree $\hat\theta$, respectively.

- Quantitative analysis demonstrates that **using existing language models to fit training data essentially seeks a more efficient way to approximate the Data-Tree (_i.e._, $\hat\theta \rightarrow \theta^*$)**. Furthermore, GPT-Trees produced by different language models trained on the same dataset exhibit a high degree of similarity.

- Our findings suggest that the reasoning process in **LLMs is more likely to be probabilistic pattern-matching than formal reasoning**: each model inference appears to retrieve the context pattern with maximum probability from the Data-Tree.

- Our Monte Carlo Language Tree perspective also helps explain many existing counterintuitive phenomena, such as hallucination, chain-of-thought (CoT) prompting, and token bias. A minimal sketch of how a Data-Tree can be estimated from a corpus is given right after this list.

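The sketch below is a hypothetical, minimal illustration (not the paper's released code) of estimating a Data-Tree by Monte Carlo counting over a corpus: every prefix starting from a chosen root token becomes a node, and edge weights are empirical next-token frequencies. The whitespace tokenization, root token, and depth limit are placeholder assumptions.

```python
from collections import defaultdict

def build_data_tree(token_sequences, root_token, max_depth=3):
    """Monte Carlo estimate of a Data-Tree theta*: for every prefix that
    starts at `root_token`, count which token follows it in the corpus."""
    tree = defaultdict(lambda: defaultdict(int))  # prefix tuple -> {next_token: count}
    for seq in token_sequences:
        for i, tok in enumerate(seq):
            if tok != root_token:
                continue
            for depth in range(1, max_depth + 1):  # walk a few tokens past the root
                if i + depth >= len(seq):
                    break
                prefix = tuple(seq[i:i + depth])
                tree[prefix][seq[i + depth]] += 1
    return tree

def next_token_probs(tree, prefix):
    """Normalize the counts under `prefix` into an empirical next-token distribution."""
    counts = tree.get(tuple(prefix), {})
    total = sum(counts.values()) or 1
    return {tok: c / total for tok, c in counts.items()}

# Toy, whitespace-tokenized corpus purely for illustration.
corpus = [s.split() for s in ["If you are happy", "If you are sad", "If we go home"]]
data_tree = build_data_tree(corpus, root_token="If")
print(next_token_probs(data_tree, ["If", "you"]))  # -> {'are': 1.0}
```

In the paper's setting, the same counting idea would be applied with a real tokenizer over the full training corpus (i.e., The Pile) rather than this toy example.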

## 😮 Highlight
- ### The GPT Models Gradually Converge to the Data-Tree

The tree visualization results of the GPT-X series models and the Data-Tree. Each row corresponds to a different model (or dataset) and each column to a different token. These models are trained on the same large-scale 800GB dataset, _i.e._, The Pile. **Different GPT-like language models trained on the same dataset show very high similarity in their GPT-Tree visualizations and gradually converge to the Data-Tree, especially on the “If” token** (the second column). This similarity is reflected not only at the token level (node colors of the tree) but also at the prediction-probability level (edge widths of the tree). A minimal sketch of growing such a GPT-Tree from a public checkpoint is given below.

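As a rough illustration only (not the paper's implementation), a GPT-Tree can be grown by repeatedly querying a model's next-token distribution and keeping the top-k continuations. The checkpoint name, k, and depth below are placeholder assumptions; GPT-Neo is used here simply because it is a public model trained on The Pile.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-125m"  # placeholder checkpoint trained on The Pile
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def expand_gpt_tree(prefix_ids, depth=2, top_k=3):
    """Recursively expand the top-k next tokens under `prefix_ids`,
    returning nested (token, probability, children) tuples."""
    if depth == 0:
        return []
    logits = model(torch.tensor([prefix_ids])).logits[0, -1]
    probs = torch.softmax(logits, dim=-1)
    top_p, top_i = torch.topk(probs, top_k)
    return [(tok.decode([i]), p, expand_gpt_tree(prefix_ids + [i], depth - 1, top_k))
            for p, i in zip(top_p.tolist(), top_i.tolist())]

# Grow a small tree rooted at the "If" token.
print(expand_gpt_tree(tok.encode("If"), depth=2, top_k=3))
```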


Average MSE and Recall@5 over 26 words. The MSE measures the probability difference between the GPT-Tree and the Data-Tree; the smaller the value, the more similar the two trees are. Recall@5 measures how many of the GPT model's output tokens are recalled by the Data-Tree. The results show that the GPT models gradually converge to the Data-Tree as the parameter size increases. A simplified sketch of both metrics follows.

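The snippet below is an assumed, simplified version of these two metrics for a single tree node, comparing a GPT-Tree next-token distribution against the corresponding Data-Tree one; the paper's exact definitions and the aggregation over the 26 root words may differ.

```python
def node_mse(gpt_probs, data_probs):
    """Mean squared error between two next-token distributions,
    computed over the union of tokens appearing in either one."""
    vocab = set(gpt_probs) | set(data_probs)
    return sum((gpt_probs.get(t, 0.0) - data_probs.get(t, 0.0)) ** 2 for t in vocab) / len(vocab)

def recall_at_k(gpt_probs, data_probs, k=5):
    """Fraction of the GPT model's top-k output tokens that also appear among the
    Data-Tree's top-k children (one assumed reading of Recall@5)."""
    top = lambda d: {t for t, _ in sorted(d.items(), key=lambda x: -x[1])[:k]}
    gpt_top = top(gpt_probs)
    return len(gpt_top & top(data_probs)) / max(len(gpt_top), 1)

gpt_probs = {"you": 0.6, "the": 0.3, "we": 0.1}    # toy GPT-Tree node
data_probs = {"you": 0.5, "we": 0.3, "they": 0.2}  # toy Data-Tree node
print(node_mse(gpt_probs, data_probs), recall_at_k(gpt_probs, data_probs))
```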

- ### Understanding Token Bias Phenomena from GPT-Tree

When we perturb the last token “.” into “。”, the model incorrectly answers “43”. **We suggest that token bias arises because some rare tokens induce the GPT-Tree to infer along the wrong path.** We further quantify this phenomenon by evaluating the original (blue bars) and perturbed (orange bars) accuracy of different models on 21,076 QA test pairs. The accuracy of all models drops significantly after the last token “.” is perturbed into “。”, showing that they suffer from the token bias issue. A sketch of this perturbation protocol follows.

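A minimal sketch of the perturbation protocol described above, assuming a generic `ask_model(question) -> answer` callable instead of the paper's actual evaluation harness; the QA pair and exact-match scoring here are placeholders.

```python
def perturb_last_token(question):
    """Replace a trailing '.' with the rare full-width '。' token."""
    return question[:-1] + "。" if question.endswith(".") else question

def token_bias_eval(qa_pairs, ask_model):
    """Compare accuracy on original vs. perturbed questions.
    `ask_model(question) -> answer string` stands in for any LLM call."""
    orig_hits = pert_hits = 0
    for question, gold in qa_pairs:
        orig_hits += ask_model(question) == gold
        pert_hits += ask_model(perturb_last_token(question)) == gold
    n = len(qa_pairs)
    return orig_hits / n, pert_hits / n

# Toy usage with a fake model that only answers correctly on unperturbed input.
qa = [("Compute 6 * 7.", "42")]
fake_model = lambda q: "42" if q.endswith(".") else "43"
print(token_bias_eval(qa, fake_model))  # -> (1.0, 0.0)
```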

- ### Understanding the Effectiveness of Chain-of-Thought (CoT) from GPT-Tree

For some complex problems, there is a significant gap between the input X and the output Y, making it difficult for the GPT model to derive Y from X directly. From the perspective of the GPT-Tree, the input X sits at a parent node and the output Y at a deep leaf node. CoT focuses on finding an intermediate path Z that helps the GPT model connect X and Y, attempting to fill this input-output gap.

## ✏️ Citation
If you find our paper and code useful in your research, please consider giving a star :star: and citation :pencil:.

```BibTeX
@article{ning2025gpt,
  title={GPT as a Monte Carlo Language Tree: A Probabilistic Perspective},
  author={Ning, Kun-Peng and Yao, Jia-Yu and Liu, Yu-Yang and Ning, Mu-Nan and Yuan, Li},
  journal={arXiv preprint arXiv:2501.07641},
  year={2025}
}
```