Was Linguistic A.I. Created by Accident?
Seven years after inventing the transformer—the “T” in ChatGPT—the researchers behind it are still grappling with its surprising power.
In the spring of 2017, in a room on the second floor of Google’s Building 1965, a college intern named Aidan Gomez stretched out, exhausted. It was three in the morning, and Gomez and Ashish Vaswani, a scientist focussed on natural language processing, were working on their team’s contribution to the Neural Information Processing Systems conference, the biggest annual meeting in the field of artificial intelligence. Along with the rest of their eight-person group at Google, they had been pushing flat out for twelve weeks, sometimes sleeping in the office, on couches by a curtain that had a neuron-like pattern. They were nearing the finish line, but Gomez didn’t have the energy to go out to a bar and celebrate. He couldn’t have even if he’d wanted to: he was only twenty, too young to drink in the United States.
“This is going to be a huge deal,” Vaswani said.
“It’s just machine translation,” Gomez said, referring to the subfield of A.I.-driven translation software, at which their paper was aimed. “Isn’t this just what research is?”
“No, this is bigger,” Vaswani replied.
Gomez found Vaswani’s view puzzling. The work the team was pursuing involved a novel kind of neural-network architecture that they called the transformer. Their paper showed how the technology could be used to advance the state of the art in automated translation. But Vaswani seemed to have something else in mind.
Two weeks later, Gomez had returned to school, at the University of Toronto, when he received an e-mail from Łukasz Kaiser, his supervisor on the team, with the subject “Generated Wikipedia Articles.” Kaiser explained that the team had used their transformer-based A.I. model to read Wikipedia, giving the system two days to analyze a little less than half of its entries. They’d then asked it to create five Wikipedia articles for “The Transformer.” The system had responded with fictitious text that was shockingly credible. It described “The Transformer,” a Japanese hardcore-punk band formed in 1968; “The Transformer,” a science-fiction novel by a (fictional) writer named Herman Muirhead; “The Transformer,” a video game developed by the (real) game company Konami; “The Transformer,” a 2013 Australian sitcom; and “The Transformer,” the second studio album by an alternative metal group called Acoustic. None of these Transformers existed, yet the A.I. had written about them authoritatively.
Gomez’s first thought was, How the fuck? The generated Wikipedia entries were filled with inconsistencies, but they were also strikingly detailed. The entry for the punk band offered a lengthy history: “In 2006 the band split up and the remaining members reformed under the name Starmirror.” Where had these details come from? How did the system decide what to write? And why was a neural network designed for translating text capable of writing imaginative prose from scratch? “I was shocked, blown away,” Gomez recalled. “I thought we would get to something like this in twenty years, twenty-five years, and then it just showed up.” The entries were a kind of magic, and it was unclear how that magic was performed.
Today, Gomez, now in his late twenties, is the C.E.O. of Cohere, an artificial-intelligence company valued at five and a half billion dollars. The transformer—the “T” in ChatGPT—sits at the core of what may be the most revolutionary technology of the twenty-first century. PricewaterhouseCoopers has estimated that A.I. could add $15.7 trillion to global G.D.P. by 2030—a substantial share of it contributed by transformer-based applications. Even that figure only gestures at a huge but unknown impact. Other consequences seem vaster and murkier still: some tech prophets propose apocalyptic scenarios that could almost be taken straight from the movies. What’s certain, right now, is that linguistic A.I. is changing the relationship between human beings and language. In an age of machine-generated text, terms like “writing,” “understanding,” “meaning,” and “thinking” need to be reconsidered.
A.I. that can create and comprehend language carries the shock of a category violation; it allows machines to do what we thought only people could. To a great degree, the researchers at Google experienced that shock as much as anybody else. The period leading up to the creation of the transformer was like an accidental Manhattan Project. Conversations with its inventors suggest that, seven years later, we remain uncertain about why it’s as effective as it’s turned out to be.
A few years earlier, the tech world had started taking A.I. seriously largely because of innovations in image recognition. But the Google team—Gomez, Vaswani, Kaiser, Llion Jones, Niki Parmar, Illia Polosukhin, Noam Shazeer, and Jakob Uszkoreit—shared an obsession with language, and a common conviction that it was the path toward a broadly capable artificial intelligence. Shazeer told me that, in terms of the insights it contains, a passage of text is “a thousand times as dense” as a picture. The team approached language mainly through translation because, in addition to being valuable in itself, it made a good A.I. research target. A metric called BLEU allows computer scientists to assess the similarity between machine-translated text and high-quality reference translations done by humans. In the early twenty-tens, before machine learning had matured, many researchers, including Kaiser, worked on a translation technique known as parsing, centered on the automated creation of sentence trees—the sprawling diagrams of grammatical dependencies that schoolchildren once learned to make. These syntax-based systems usually achieved adequate BLEU scores—twenty-one, say, for English-to-German translation, with the best rising as high as twenty-three. At that time, figuring out how to create a one-point increase was generally enough for a successful dissertation.
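
For readers who want to see the metric itself: BLEU is, at bottom, a count of overlapping word sequences between a machine translation and a reference translation. A minimal Python sketch follows (one reference, no smoothing; real evaluations are more elaborate):

```python
import math
from collections import Counter

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Minimal BLEU: modified n-gram precision times a brevity penalty.
    Scores like the 'twenty-one' and 'twenty-three' above live on this
    zero-to-one-hundred scale."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum(min(count, ref_ngrams[g]) for g, count in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        log_precisions.append(math.log(max(overlap, 1e-9) / total))  # avoid log(0)
    brevity = min(1.0, math.exp(1 - len(ref) / len(cand)))  # punish too-short output
    return 100 * brevity * math.exp(sum(log_precisions) / max_n)

print(round(bleu("the cat sat on the mat", "the cat sat on the mat"), 1))  # 100.0
```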
Computerized translation was notoriously inefficient. A.I.-based systems struggled with the sequential aspect of language, which consumed huge quantities of processing power. A typical recurrent neural network proceeded through a sentence from beginning to end. “It would work one word at a time,” Polosukhin told me. “Read one word, process it. Read next word, process it. Read next word, process it. If you have a thousand words, you have to wait for a thousand cycles.” One of the team’s goals, therefore, was to build a system that could process language while avoiding the time-intensiveness of sequentiality.
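
A toy recurrent loop makes the bottleneck concrete. The weights below are random stand-ins rather than a trained model; the point is only that each step must wait for the one before it:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64                                   # size of the hidden state
W_h = rng.normal(size=(d, d)) / np.sqrt(d)
W_x = rng.normal(size=(d, d)) / np.sqrt(d)

def rnn_read(word_vectors):
    """One cycle per word: step t cannot begin until step t-1 has finished."""
    h = np.zeros(d)
    for x in word_vectors:               # a thousand words -> a thousand waits
        h = np.tanh(W_h @ h + W_x @ x)   # the new state depends on the old state
    return h                             # the whole sentence, squeezed into one vector

sentence = rng.normal(size=(1000, d))    # stand-in embeddings for a thousand words
summary = rnn_read(sentence)
```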
On its face, asking language to make sense without word order seems impossible. We speak words one at a time, and write and read that way, too. But our intuitions about how language works may not reflect what really goes on inside our heads. “How do you know you’re purely sequential?” Vaswani asked me. Anyway, he continued, “why should you impose your restrictions on a machine?” Several ideas were already floating around about how to avoid sequentiality, including convolutional neural networks, which respond to data out of order. Polosukhin described an approach called “bag of words.” “Imagine you open a Wikipedia article and you scramble all the words, and you try to answer a question from that,” he told me. If you saw the sentence “Your mother burned her hand on the stove” as “burned hand her the on stove mother your,” you’d still get the general idea. And yet that might not be true for more complex sentences. Nonsequential methods were faster, but they risked losing coherence.
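
The “bag of words” idea is as blunt as it sounds, as the sketch below shows: once order is thrown away, the sentence and its scramble are literally the same object.

```python
from collections import Counter

sentence = "your mother burned her hand on the stove"
scrambled = "burned hand her the on stove mother your"
# With order discarded, the original and the scramble are indistinguishable:
print(Counter(sentence.split()) == Counter(scrambled.split()))  # True
```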
For many years, A.I. researchers had experimented with a mechanism called attention, which they hoped might be capable of bridging the divide between efficiency and coherence. Attention allows a neural network to dodge sequentiality by seeking relevance. Instead of looking at each word in order, attention looks at all the words in a piece of text together, evaluating how they are interrelated and which are most important to each of the other words, as it captures the over-all meaning. This is closer to the way people remember a text than the way they read it. If you try to recall the opening paragraph of this article, you might articulate a vaguely connected constellation: Aidan Gomez, couldn’t drink, intern, Google, the uncertain potential of a new technology. Those terms, in any order, might amount to the sense you have retained.
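
In code, the mechanism is strikingly compact. Here is a minimal sketch of scaled dot-product attention, the operation at the heart of the transformer, in which every word scores its relevance to every other word in a single pass:

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Q, K, V are (words, d) matrices derived from the same text. Each row
    of the weight matrix says how strongly one word should 'attend' to each
    other word; the output blends every word's value vector by that relevance."""
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # all pairs, all at once
    return weights @ V                                 # no word waits for another

words = np.random.default_rng(0).normal(size=(8, 16))  # eight words, toy vectors
context = attention(words, words, words)               # self-attention over the text
```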
In the past, researchers had often combined attention mechanisms with other systems that tried to take into account the convoluted nature of language. But the Google team realized that attention had a singular and important technical advantage that earlier researchers hadn’t leveraged: employing it relied on a relatively simple mathematical operation called matrix multiplication—the multiplication of one table of numbers by another. “The chips we use, they do one thing really well, and that is matrix multiplication,” Gomez told me. If an A.I. system could be built with attention only, forsaking other components, it could work with unprecedented speed.
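
The hardware point is easy to demonstrate. Below, the same transformation is applied to a thousand word vectors two ways: as one matrix multiplication, and one word at a time in the style of a recurrent net (timings vary by machine, and GPUs widen the gap dramatically):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 512))     # a thousand words, 512 numbers each
W = rng.normal(size=(512, 512))

t0 = time.perf_counter()
H = X @ W                            # the whole text in one matrix multiplication
one_shot = time.perf_counter() - t0

t0 = time.perf_counter()
rows = [x @ W for x in X]            # a thousand small multiplications, in order
word_by_word = time.perf_counter() - t0

print(f"one matmul: {one_shot:.4f}s, word by word: {word_by_word:.4f}s")
```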
Before submitting their paper to the Neural Information Processing Systems conference, the team decided to title it “Attention Is All You Need.” “The way a transformer works is, take, let’s say, a sentence . . . and then attention is used to find which words are relevant and then pass that on to the next layer,” Polosukhin explained. The process is repeated through several layers and, at the end, what emerges is constantly improving text prediction. The efficiency of this process allows transformer-based models to easily scale from a single chip on a desktop to a data center with many thousands of processors; moreover, Kaiser said, for reasons that are still being studied, “transformers yield very good and predictable results when scaling up their size.” The network, meanwhile, learns on its own by identifying patterns in the data it examines. “You don’t prescribe what relationships it learns; you don’t say, ‘Your job is to learn associations of adjectives to nouns,’ ” Gomez said. “You just give it the ability to learn whatever it wants.”
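
Put together, the stack Polosukhin describes fits in a few dozen lines. The sketch below uses random, untrained weights and omits refinements the real architecture includes (multiple attention heads, positional encodings, layer normalization), but the shape is the transformer's: attend, transform, repeat, predict.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d, n_layers = 1000, 64, 4

E = rng.normal(size=(vocab, d))                        # one vector per token
layers = [{name: rng.normal(size=(d, d)) / np.sqrt(d)
           for name in ("Wq", "Wk", "Wv", "Wf")} for _ in range(n_layers)]

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def transformer(token_ids):
    x = E[token_ids]                                   # (words, d)
    for L in layers:                                   # repeat through the layers
        q, k, v = x @ L["Wq"], x @ L["Wk"], x @ L["Wv"]
        x = x + softmax(q @ k.T / np.sqrt(d)) @ v      # attention: who matters to whom
        x = x + np.maximum(0, x @ L["Wf"])             # feed-forward: transform the blend
    return softmax(x @ E.T)                            # a next-token guess per position

predictions = transformer(np.array([3, 14, 159]))      # arbitrary token IDs
```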
Unfortunately—or fortunately, depending on how you look at it—transformers don’t imitate how the brain works. The transformer’s objective is to learn how to continue text, which it does by establishing relationships between “tokens”: collections of letters, punctuation marks, and spaces. It has no built-in grammar or syntax. It uses an algorithm called backpropagation to improve itself, but as a model of how the brain learns, “backpropagation remains implausible despite considerable effort to invent ways in which it could be implemented by real neurons,” Geoffrey Hinton, the “godfather of A.I.,” wrote, in a 2022 paper. The quest at the beginning of artificial intelligence—to understand how the human mind works—remains as unsolved as ever.
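
Tokens are all the model sees. A hypothetical greedy tokenizer over an invented vocabulary (real vocabularies are learned, for instance by byte-pair encoding) shows how text becomes grammar-free chunks:

```python
def tokenize(text, vocab):
    """Greedily split text into the longest chunks present in a fixed
    vocabulary, falling back to single characters. Tokens are just spans of
    letters, punctuation, and spaces; no grammar or syntax is built in."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):          # try the longest piece first
            if text[i:j] in vocab or j == i + 1:   # single character as last resort
                tokens.append(text[i:j])
                i = j
                break
    return tokens

vocab = {"The", " Trans", "former", " trans", "form", "er"}
print(tokenize("The Transformer", vocab))          # ['The', ' Trans', 'former']
```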
Late in the project, only a couple of weeks before the submission deadline, Parmar and Vaswani were sitting in the Google lobby in Mountain View when they learned that their team’s transformer-based model had attained a BLEU score of more than twenty-six points on English-to-German translation. “Facebook had put out this paper before us, and that was the number we were trying to beat, and it had taken them days to train, and for us it had been a matter of hours,” Parmar recalled. Moreover, the Google team had used a small and primitive transformer-based network; this suggested that, with more resources, their results could quickly be improved. (The model’s final score was 28.4.) In their excitement, Vaswani and Parmar called Uszkoreit, who was driving his four-by-four down from paragliding in the mountains. “Jakob had some old champagne in his car,” Parmar said; they toasted their success with warm bubbly beside the dusty vehicle in the company parking lot.
In the final days, Kaiser, who’d been pursuing, with Gomez’s help, a “unified model of everything”—a neural network that could be trained on images, audio, and text, and then generate new content in the same range of modalities—made a small but vital addition to the paper: he tried training a transformer-based model not just to translate but also to do the old-fashioned work of parsing, and found that it could learn that skill, too, training on a relatively small number of examples. This showed that the model could perform multiple linguistic tasks, working with language generally, rather than with just one of its aspects: it wasn’t just a translation machine but a language machine. Even so, no one expected that the underlying transformer technology would soon be used to build models that could plan vacations, draft and grade undergraduate essays, and replace customer-service representatives.
The true power of the transformer became clearer in the next few years, as transformer-based networks were trained on huge quantities of data from the Internet. In the spring of 2018, Shazeer gave a talk titled “Bigger Is Better,” arguing that scaling transformers led to dramatic improvements and that the process did not appear to plateau; the more you trained the models, the better they got, with no end in sight. At Google, Shazeer was instrumental in developing the LaMDA chatbot, which holds the dubious distinction of being perhaps the first large language model that some poor soul believed to be sentient. At OpenAI, the ultimate result of scaling up was ChatGPT.
If transformer-based A.I. were more familiar and complicated—if, say, it involved many components analogous to the systems and subsystems in our own brains—then the richness of its behavior might be less surprising. As it is, however, it generates nonhuman language in a way that challenges our intuitions and vocabularies. If you ask a large language model to write a sentence “silkily and smoothly,” it will produce a silky and smooth piece of writing; it registers what “silkily and smoothly” are, and can define and perform them. A neural network that can write about Japanese punk bands must on some level “understand” that a band can break up and reform under a different name; similarly, it must grasp the nuances of the idea of an Australian sitcom in order to make one up. But this is a different kind of “understanding” from the kind we know.
The researchers behind the transformer have different ways of reckoning with its capabilities. “I think that even talking about ‘understanding’ is something we are not prepared to do,” Vaswani told me. “We have only started to define what it means to understand these models.” (Part of the problem is that there is no agreed-upon understanding of what “understanding” is—is it a biological process, or an abstract one, or both?) Uszkoreit takes a more philosophical view: “Fundamentally, we have to accept that certain systems that are able to ‘understand stuff,’ whatever that means, will, almost by definition, elude our understanding of them,” he said. “As we build these machines, we are bound to lose the ability to conceptualize what’s going on, the ability to interpret. And there might be no other way.”
Leonardo da Vinci drew on nature to create technology; for example, he grasped the workings of the camera obscura partly by boiling human eyeballs, cutting them open, and examining their anatomy. But the transformer is a surprise success achieved despite our relatively poor comprehension of how language and the human mind function. “A.I. now is probably where chemistry was in the Middle Ages,” Shazeer said. “There are these amazing things that work, and we can’t yet predict why they work. We have some intuitions, and a lot of the development will be chasing those intuitions.” In the past, scientists have often managed to turn their intuitions into clearly articulated explanations that can be proved or disproved, but it’s unknown whether that will happen with A.I.
The element of accident in the transformer’s outsized success has evoked a notable humility in its inventors. When I asked Parmar how she would rate our current understanding of the models developed with the transformer, she said, “Very low.” I asked, How low? Ten per cent? One per cent? She shrugged: “How do we understand other humans? It will be the same with the models.” It’s fitting that the architecture outlined in “Attention Is All You Need” is called the transformer only because Uszkoreit liked the sound of that word. (“I never really understood the name,” Gomez told me. “It sounds cool, though.”)
We’ve never had a truly “other” language before—a new and alien form of discourse that understands in a way we can’t understand. It’s unsurprising, then, that some of the people who were present at the creation are agog at the technology they brought into being. The production of A.I. seems to carry a powerful side effect: as the machines generate intelligence, they also generate mystery. Human misunderstanding endures, possibly a permanent condition.