# GPT as a Monte Carlo Language Tree: A Probabilistic Perspective

If you like our project, please give us a star ⭐ on GitHub for the latest updates.

[![arXiv](https://img.shields.io/badge/Arxiv-2501.07641-b31b1b.svg?logo=arXiv)](https://arxiv.org/pdf/2501.07641)
[![License](https://img.shields.io/badge/License-Apache%202.0-yellow)](https://github.com/PKU-YuanGroup/GPT-as-Language-Tree/blob/main/LICENSE)

## 🤗 Brief Intro


- We propose a novel perspective for analyzing the intelligence of LLMs by representing both the language dataset and the GPT models as Monte Carlo Language Trees, named the Data-Tree $\theta^*$ and the GPT-Tree $\hat\theta$, respectively.

- Quantitative analysis demonstrates that **using existing language models to fit training data essentially seeks a more efficient way to approximate the Data-Tree (_i.e._, $\hat\theta \rightarrow \theta^*$)**. Furthermore, GPT-Trees produced by different language models trained on the same dataset exhibit a high degree of similarity.

- Our findings suggest that the reasoning process in **LLMs is more likely to be probabilistic pattern-matching than formal reasoning**: each model inference appears to retrieve the context pattern with maximum probability from the Data-Tree.

- Our Monte Carlo Language Tree perspective also helps explain many existing counterintuitive phenomena, such as hallucination, chain-of-thought (CoT) prompting, and token bias. A minimal sketch of how a Data-Tree can be estimated from a corpus is given right after this list.

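The sketch below is a hypothetical, minimal illustration (not the paper's released code) of estimating a Data-Tree by Monte Carlo counting over a corpus: every prefix starting from a chosen root token becomes a node, and edge weights are empirical next-token frequencies. The whitespace tokenization, root token, and depth limit are placeholder assumptions.

```python
from collections import defaultdict

def build_data_tree(token_sequences, root_token, max_depth=3):
    """Monte Carlo estimate of a Data-Tree theta*: for every prefix that
    starts at `root_token`, count which token follows it in the corpus."""
    tree = defaultdict(lambda: defaultdict(int))  # prefix tuple -> {next_token: count}
    for seq in token_sequences:
        for i, tok in enumerate(seq):
            if tok != root_token:
                continue
            for depth in range(1, max_depth + 1):  # walk a few tokens past the root
                if i + depth >= len(seq):
                    break
                prefix = tuple(seq[i:i + depth])
                tree[prefix][seq[i + depth]] += 1
    return tree

def next_token_probs(tree, prefix):
    """Normalize the counts under `prefix` into an empirical next-token distribution."""
    counts = tree.get(tuple(prefix), {})
    total = sum(counts.values()) or 1
    return {tok: c / total for tok, c in counts.items()}

# Toy, whitespace-tokenized corpus purely for illustration.
corpus = [s.split() for s in ["If you are happy", "If you are sad", "If we go home"]]
data_tree = build_data_tree(corpus, root_token="If")
print(next_token_probs(data_tree, ["If", "you"]))  # -> {'are': 1.0}
```

In the paper's setting, the same counting idea would be applied with a real tokenizer over the full training corpus (i.e., The Pile) rather than this toy example.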

## 😮 Highlight
- ### The GPT Models Gradually Converge to the Data-Tree

The tree visualization results of the GPT-X series models and the Data-Tree. Each row corresponds to a different model (or dataset) and each column to a different token. These models are trained on the same large-scale 800GB dataset, _i.e._, The Pile. **Different GPT-like language models trained on the same dataset show very high similarity in their GPT-Tree visualizations and gradually converge to the Data-Tree, especially on the “If” token** (the second column). This similarity is reflected not only at the token level (node colors of the tree) but also at the prediction-probability level (edge widths of the tree). A minimal sketch of growing such a GPT-Tree from a public checkpoint is given below.

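As a rough illustration only (not the paper's implementation), a GPT-Tree can be grown by repeatedly querying a model's next-token distribution and keeping the top-k continuations. The checkpoint name, k, and depth below are placeholder assumptions; GPT-Neo is used here simply because it is a public model trained on The Pile.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-125m"  # placeholder checkpoint trained on The Pile
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def expand_gpt_tree(prefix_ids, depth=2, top_k=3):
    """Recursively expand the top-k next tokens under `prefix_ids`,
    returning nested (token, probability, children) tuples."""
    if depth == 0:
        return []
    logits = model(torch.tensor([prefix_ids])).logits[0, -1]
    probs = torch.softmax(logits, dim=-1)
    top_p, top_i = torch.topk(probs, top_k)
    return [(tok.decode([i]), p, expand_gpt_tree(prefix_ids + [i], depth - 1, top_k))
            for p, i in zip(top_p.tolist(), top_i.tolist())]

# Grow a small tree rooted at the "If" token.
print(expand_gpt_tree(tok.encode("If"), depth=2, top_k=3))
```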


Average MSE and Recall@5 over 26 words. The MSE measures the probability difference between the GPT-Tree and the Data-Tree; the smaller the value, the more similar the two trees are. Recall@5 measures how many of the GPT model's output tokens are recalled by the Data-Tree. The results show that the GPT models gradually converge to the Data-Tree as the parameter size increases. A simplified sketch of both metrics follows.

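The snippet below is an assumed, simplified version of these two metrics for a single tree node, comparing a GPT-Tree next-token distribution against the corresponding Data-Tree one; the paper's exact definitions and the aggregation over the 26 root words may differ.

```python
def node_mse(gpt_probs, data_probs):
    """Mean squared error between two next-token distributions,
    computed over the union of tokens appearing in either one."""
    vocab = set(gpt_probs) | set(data_probs)
    return sum((gpt_probs.get(t, 0.0) - data_probs.get(t, 0.0)) ** 2 for t in vocab) / len(vocab)

def recall_at_k(gpt_probs, data_probs, k=5):
    """Fraction of the GPT model's top-k output tokens that also appear among the
    Data-Tree's top-k children (one assumed reading of Recall@5)."""
    top = lambda d: {t for t, _ in sorted(d.items(), key=lambda x: -x[1])[:k]}
    gpt_top = top(gpt_probs)
    return len(gpt_top & top(data_probs)) / max(len(gpt_top), 1)

gpt_probs = {"you": 0.6, "the": 0.3, "we": 0.1}    # toy GPT-Tree node
data_probs = {"you": 0.5, "we": 0.3, "they": 0.2}  # toy Data-Tree node
print(node_mse(gpt_probs, data_probs), recall_at_k(gpt_probs, data_probs))
```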

- ### Understanding Token Bias Phenomena from GPT-Tree

When we perturb the last token “.” into “。”, the model incorrectly answers “43”. **We suggest that token bias arises because some rare tokens induce the GPT-Tree to infer along the wrong path.** We further quantify this phenomenon by evaluating the original (blue bars) and perturbed (orange bars) accuracy of different models on 21,076 QA test pairs. The accuracy of all models drops significantly after the last token “.” is perturbed into “。”, showing that they suffer from the token bias issue. A sketch of this perturbation protocol follows.

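A minimal sketch of the perturbation protocol described above, assuming a generic `ask_model(question) -> answer` callable instead of the paper's actual evaluation harness; the QA pair and exact-match scoring here are placeholders.

```python
def perturb_last_token(question):
    """Replace a trailing '.' with the rare full-width '。' token."""
    return question[:-1] + "。" if question.endswith(".") else question

def token_bias_eval(qa_pairs, ask_model):
    """Compare accuracy on original vs. perturbed questions.
    `ask_model(question) -> answer string` stands in for any LLM call."""
    orig_hits = pert_hits = 0
    for question, gold in qa_pairs:
        orig_hits += ask_model(question) == gold
        pert_hits += ask_model(perturb_last_token(question)) == gold
    n = len(qa_pairs)
    return orig_hits / n, pert_hits / n

# Toy usage with a fake model that only answers correctly on unperturbed input.
qa = [("Compute 6 * 7.", "42")]
fake_model = lambda q: "42" if q.endswith(".") else "43"
print(token_bias_eval(qa, fake_model))  # -> (1.0, 0.0)
```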

- ### Understanding the Effectiveness of Chain-of-Thought (CoT) from GPT-Tree

For some complex problems, there is a significant gap between the input X and the output Y, making it difficult for the GPT model to derive Y from X directly. From the perspective of the GPT-Tree, the input X sits at a parent node and the output Y at a deep leaf node. CoT focuses on finding an intermediate path Z that helps the GPT model connect X and Y, attempting to fill this input-output gap.

## ✏️ Citation
If you find our paper and code useful in your research, please consider giving a star :star: and citation :pencil:.

```BibTeX
@article{ning2025gpt,
  title={GPT as a Monte Carlo Language Tree: A Probabilistic Perspective},
  author={Ning, Kun-Peng and Yao, Jia-Yu and Liu, Yu-Yang and Ning, Mu-Nan and Yuan, Li},
  journal={arXiv preprint arXiv:2501.07641},
  year={2025}
}
```