December 17, 2024
Introduction
Ilya Sutskever, former co-founder and chief scientist of OpenAI, suggested at the recent NeurIPS conference that pre-training of large models may be approaching the limits of its development.
Meanwhile, OpenAI researcher Noam Brown, who led the development of Pluribus, the AI system that defeated top professional players at Texas hold'em poker, emphasized in a recent in-depth interview about the release of OpenAI o1 that scaling test-time compute is the key to further improving the output quality of large models.
The AGI community is now focused on a strategic shift from "scaling learning" to "scaling search," and the core idea behind this shift aligns closely with the insights that Rich Sutton, one of the founders of the field of reinforcement learning, laid out in his classic 2019 essay "The Bitter Lesson."
Today, let us settle in and read this landmark essay closely. By understanding it deeply, we may gain a clearer view of the current state and future trajectory of the AI field, and draw valuable lessons from it.
A Large Model's Summary
刻行时空 also fed the text to a large language model, asking it to draw on classical Chinese texts and offer three sentences as a summary of this essay:
"工欲善其事,必先利其器。" ——《论语·卫灵公》 要实现人工智能的突破,必须充分利用计算资源,而不是仅仅依赖人类知识的嵌入。
"致中和,天地位焉,万物育焉。" ——《中庸》
在人工智能研究中,平衡短期与长期目标,既利用人类知识,也拥 抱计算与学习的潜力,才能实现真正的进步。
"大道至简,而民好径。" ——《道德经》
复杂的方法未必有效,通用且简单的计算与学习方法才是长远成功的关键。
Full Text
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.
The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation.
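To make the trend concrete, here is a toy numerical sketch of what "exponentially falling cost per unit of computation" implies over the span of a research project; the two-year halving period and the unit starting cost are hypothetical round numbers, not real hardware pricing.

```python
# Toy illustration of exponentially falling cost per unit of computation.
# The two-year halving period and the starting cost of 1.0 are hypothetical
# round numbers chosen to show the shape of the trend, not real pricing.

def cost_per_unit(years: float, halving_period: float = 2.0) -> float:
    """Relative cost per unit of computation after `years`."""
    return 0.5 ** (years / halving_period)

if __name__ == "__main__":
    for y in (0, 2, 5, 10, 20):
        print(f"after {y:2d} years: {cost_per_unit(y):.5f}x the original cost")
```

On these assumptions, the same budget buys roughly a thousand times more computation after twenty years, which is the "slightly longer time than a typical research project" the essay refers to.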
Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available.
Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation.
These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other.
And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers' belated learning of this bitter lesson, and it is instructive to review some of the most prominent.
In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search. At the time, this was looked upon with dismay by the majority of computer chess researchers who had pursued methods that leveraged human understanding of the special structure of chess.
When a simpler, search-based approach with special hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers. They said that "brute force" search may have won this time, but it was not a general strategy, and anyway it was not how people played chess. These researchers wanted methods based on human input to win and were disappointed when they did not.
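The "brute force" in question is, at its core, depth-limited minimax search with alpha-beta pruning. The sketch below is a minimal generic version of that technique, not Deep Blue's actual code; `evaluate`, `legal_moves`, and `apply_move` are hypothetical stand-ins for a real chess implementation.

```python
import math

# Minimal depth-limited minimax with alpha-beta pruning, the family of
# search behind 1997-era chess engines. `evaluate`, `legal_moves`, and
# `apply_move` are hypothetical stand-ins for a real game implementation.
def alphabeta(state, depth, alpha, beta, maximizing,
              evaluate, legal_moves, apply_move):
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)              # static evaluation at the frontier
    if maximizing:
        value = -math.inf
        for m in moves:
            value = max(value, alphabeta(apply_move(state, m), depth - 1,
                                         alpha, beta, False,
                                         evaluate, legal_moves, apply_move))
            alpha = max(alpha, value)
            if alpha >= beta:               # prune: the opponent avoids this line
                break
        return value
    value = math.inf
    for m in moves:
        value = min(value, alphabeta(apply_move(state, m), depth - 1,
                                     alpha, beta, True,
                                     evaluate, legal_moves, apply_move))
        beta = min(beta, value)
        if beta <= alpha:
            break
    return value
```

The point of the method is that it contains almost no chess knowledge beyond the evaluation function: give it more computation and it simply searches deeper.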
A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale.
Also important was the use of learning by self play to learn a value function (as it was in many other games and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion). Learning by self play, and learning in general, is like search in that it enables massive computation to be brought to bear. Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research.
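As a sketch of what "learning a value function by self play" can mean in its simplest form, here is a tabular Monte Carlo update from self-play games; `initial_state`, `legal_moves`, `apply_move`, and `outcome` are hypothetical stand-ins, and modern systems replace the table with a neural network.

```python
import random
from collections import defaultdict

# Tabular value learning from self-play, in its simplest Monte Carlo form.
# V[s] estimates the game result from the perspective of the player to move
# in state s. States must be hashable; `outcome` returns +1/0/-1 from the
# perspective of the player to move in the terminal state.
def self_play_value_learning(episodes, initial_state, legal_moves,
                             apply_move, outcome, alpha=0.1, epsilon=0.1):
    V = defaultdict(float)
    for _ in range(episodes):
        state, trajectory = initial_state(), []
        while legal_moves(state):
            moves = legal_moves(state)
            if random.random() < epsilon:
                move = random.choice(moves)             # explore
            else:                                       # a good successor for the
                move = min(moves,                       # opponent is bad for us
                           key=lambda m: V[apply_move(state, m)])
            trajectory.append(state)
            state = apply_move(state, move)
        result = outcome(state)
        for s in reversed(trajectory):
            result = -result                            # flip perspective per ply
            V[s] += alpha * (result - V[s])             # Monte Carlo update
    return V
```

Like search, this loop has no game-specific knowledge beyond the rules; its strength comes entirely from how many episodes of computation you pour into it.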
In computer Go, as in computer chess, researchers' initial effort was directed towards utilizing human understanding (so that less search was needed) and only much later was much greater success had by embracing search and learning.
In speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. Entrants included a host of special methods that took advantage of human knowledge---knowledge of words, of phonemes, of the human vocal tract, etc. On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs).
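The computational core of those HMM systems is dynamic programming over hidden states; below is a minimal Viterbi decoder as a sketch. The dictionary-based model encoding is just one simple choice, and real recognizers work in log-space with far larger models.

```python
# Minimal Viterbi decoding for a discrete HMM: find the most probable hidden
# state path for an observation sequence. Probabilities are multiplied
# directly for clarity; real systems sum log-probabilities instead.
def viterbi(observations, states, start_p, trans_p, emit_p):
    # layer maps each state to (probability of best path so far, that path)
    layer = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        nxt = {}
        for s in states:
            prob, prev = max((layer[p][0] * trans_p[p][s] * emit_p[s][obs], p)
                             for p in states)
            nxt[s] = (prob, layer[prev][1] + [s])
        layer = nxt
    return max(layer.values(), key=lambda t: t[0])[1]
```

In a speech setting, the hidden states would be phones and the observations acoustic features; none of the linguistic knowledge is hand-coded, it is all in the learned probability tables.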
Again, the statistical methods won out over the human-knowledge-based methods. This led to a major change in all of natural language processing, gradually over decades, where statistics and computation came to dominate the field. The recent rise of deep learning in speech recognition is the most recent step in this consistent direction. Deep learning methods rely even less on human knowledge, and use even more computation, together with learning on huge training sets, to produce dramatically better speech recognition systems.
As in the games, researchers always tried to make systems that worked the way the researchers thought their own minds worked---they tried to put that knowledge in their systems---but it proved ultimately counterproductive, and a colossal waste of researchers' time, when, through Moore's law, massive computation became available and a means was found to put it to good use.
In computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.
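As a sketch of the notion Sutton names, here is a bare 2D convolution (cross-correlation, as deep learning libraries implement it) in plain Python; applying one shared kernel at every position is what builds in translation equivariance, rather than any hand-coded notion of edges or features.

```python
# Bare 2D convolution (cross-correlation, as in deep learning libraries):
# one small kernel, reused at every image position. Sharing the kernel is
# what gives convolutional networks their translation equivariance.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]
```

What the kernel detects is not designed in; it is learned, which is exactly the division of labor the essay argues for.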
This is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run.
The bitter lesson is based on the historical observations that
1) AI researchers have often tried to build knowledge into their agents,
2) this always helps in the short term, and is personally satisfying to the researcher, but
3) in the long run it plateaus and even inhibits further progress, and
4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning.
The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.
One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.
The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex;
we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries.
All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity.
Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.