4. 计算机科学 (Kaggle项目拆解:泰坦尼克号预测)
今天计算机科学最高的研究水平,从人工智能,到并行计算,从计算机网络,到处理器芯片,从数据库,到云计算,都在公司,而不在大学里。
在学校学习的知识都会慢慢老化。如果我有一点点黑客风格,就会保持开放的思维,愿意接受新东西,也乐意分享自己的知识、学习方法、思考模式、学习资源等;要对自己有信心,未来的自己一定会更强,一定可以挖到更多的 0 day。
p.s. 在计算机安全领域中,零日漏洞或零时差漏洞(英语:Zero-day exploit)通常是指还没有补丁的安全漏洞,而零日攻击或零时差攻击(英语:Zero-day attack)则是指利用这种漏洞进行的攻击。
p.s. 提供该漏洞细节或者利用程序的人通常是该漏洞的发现者。零日漏洞的利用程序对网络安全具有巨大威胁,因此零日漏洞不但是黑客的最爱,掌握多少零日漏洞也成为评价黑客技术水平的一个重要参数。
挖漏洞也不是为了赚钱,是纯粹的告诉别人:" ... (我比你强~)",而后别人说:“反弹”......
您完全可以根据自己的喜好决定阅读顺序。
《目录》(粉色章节是翻译补充,补充这么多也不知道好不好)
因为计算机科学的具体内容,在《1. What is computation thinking ?》里已经写明。
所以,体验一下原文,第四章 尽量以英文原文复述,第五章 将是全英文。
- 第一章主要是聊 “计算机系”;
- 第二章主要是讲 “图灵机”;
- 第三章主要是看 “通用计算机”;
- 第四章主要是谈 “人工智能的未来发展趋势”。
楔子
20 世纪 50 年代,学术界开始倡导在大学里建立计算机科学课程,以满足对学习新技术日益增长的需求。
从那时起,计算机科学部门对计算机思维的许多概念进行了提炼和完善。
现在我们来看看计算机思维是如何在大学里发展起来的。
在开始之前,我们想指出计算机思维发展的环境里的几个关键方面。
首先也是最重要的,计算机是一个融合工程、科学和数学的技术领域。
大多数计算机专业的学生来大学,是为了学习软件和硬件技术的专业,而不是为了接受通识教育。
雇主也来大学招聘毕业生就业。
因此,随着学术而发展起来的计算机思维,在设计方面一直具有很强的分量,并受到雇主所说的需求的强烈影响。
但这还不是全部。
大学按学科划分成一个个系,并分散在各个学科研究所和中心。
这些部门极力保护自己的身份、预算和空间。
因为他们的预算取决于入学的学生,所以他们一定会保护招生工作。
而且,由于声誉、以及决定声誉的研究生产力同样重要,各系也会保护教授们的研究领域。
Another important shaping aspect of academia is the practice of seeking consensus on all decisions.
每个人都希望有发言权,不管是招聘新人、授予终身教职、决定提供哪些课程、批准其他部门提出的可能重叠的课程,还是批准组建新的项目或部门。
以下内容正是介绍计算机系的背景,译者曾在第一章末节介绍过计算机受到的阻碍,这里给出英文原版吧。
This is the atmosphere in which new computational thinking departments and academic computational thinking were formed.
The founders worried about curriculum and industry demand in the context of a set of consensus-seeking departments fiercely guarding their prerogatives, always concerned with public image and identities.
The new departments proposed by the founders were split off from their existing departments.
Their home departments often did not support the split because they would lose students, budget, and identity.
The founders encountered a lot of resistance from other departments that did not deem a new department focused on computer technology to be legitimately science or engineering, or see that it would provide a unique intellectual perspective.
Forging a consensus favoring formation of a new department was a challenge.
They built a good case and were successful.
计算机科学系的数量缓慢增长,仅在美国,从 1962 年的 1 个增加到 1980 年的 120 个。
最终在 20 世纪 90 年代末,计算机科学开始兴起,人们终于意识到计算机革命是真实的。
今天几乎每一所大学都有一个计算机科学系。
科技、理工甚至商业学校都设有计算机科学系。
Why so many homes?
The answer echoes those early political fights: the new departments were established in the schools that were most welcoming.
由于大多数系都在理工科学校,到20世纪80年代,计算机科学家们把他们的研究领域称为 “计算机科学与工程”。
That mouthful was simplified in the 1990s as "computing" became a popular shorthand for CS&E and its European counterpart "informatics".
在 20 世纪 90 年代,一些大学进一步发展,建立了独立的计算机学校,这一运动在今天继续发展。
真是个转机!
早期形成了两个计算机学术学会:1946 年的 IEEE-CS(Institute of Electrical and Electronics Engineers Computer Society)和 1947 年的 ACM(Association for Computing Machinery)。
由于他们努力制定和推广课程建议,我们得以在 1968 年、1978 年、1989 年、1991 年、2001 年和 2003 年看到计算机课程的一系列快照。
These snapshots show how the concerted efforts of computing pioneers to articulate a unique identity for computer science led them to recognize computational thinking as a distinguishing aspect from the beginning.
事后看来,我们可以看到四个时代,描述了大学对计算的看法,以及计算思维的观点是如何变化的:
- Phenomena surrounding computers(1950s-1970s)
- Programming as art and science(1970s)
- Computing as automation(1980s)
- Computing as pervasive information processes(1990s to present)
我们将在下面的章节中讨论这些时代。
大学计算思维发展的这四个阶段受到其他领域对计算机科学的最初抵制的强烈影响:计算机科学家花了大量的精力来澄清和证明他们的研究领域。
But computer science was not always the receiver of resistance.
There were two important instances when computer science was the giver.
One was the computational science movement in the 1980s, which was eschewed by many computer scientists.
A common reaction to an announcement by the physics or biology department that they were setting up a computational science branch would be a howl of protest that those departments were impinging on the territory of computing.
一些计算机科学家认为,物理学和生物学已经认识到计算机的重要性,正试图劫持他们曾经强烈反对的领域。
Eventually computer scientists got over this and now work collaboratively with computational sciences.
我们将在第 7 章中讨论 computational sciences。
软件工程也发生了类似的过程。
The computing departments that viewed themselves as science were not receptive to the practices of teaching and doing projects common in engineering.
ACM 和 IEEE 于 20 世纪 50 年代初,为这个年轻的领域创办了期刊。
The Moore School, home of the ENIAC project, was an early starter of computing education in 1946 with a two-month intensive course on "theory and techniques for design of electronic digital computer."
In the 1950s the Moore School offered a multi-discipline degree in computing that included numerical analysis, programming, and programming language design.
其他学校也开始了自己的课程。
These early efforts to establish computing as an academic discipline were slow to gain traction.
The impediment was more than a cautionary hesitancy to see if computers were here to stay; it was a deep doubt about whether computing had academic substance beyond mathematics, electrical engineering, and physics.
Outsiders typically saw the computing field of the 1950s as an impenetrable and anarchistic thicket of idiosyncratic technology tricks.
What is more, the different perspectives on thinking about computing were disunited: those who designed computing machines were mostly unaware of important developments in the theory of computing such as Turing on computable numbers, Church on lambda calculus, Post on string manipulation, Kleene on regular expressions, Rabin and Scott on nondeterministic machines, and Chomsky on the relation between grammars and classes of automata.
Academics who proposed full-fledged computer science departments or programs in research universities met stiff resistance.
Many critics did not believe in the value of computing's new ways: common objections included lack of unique intellectual content and lack of adequate theoretical basis.
Purists argued that computers were human-made artifacts and not natural occurrences, and thus their study could not be counted among the noble natural sciences.
On top of all that, many doubted whether computing would last.
Until there was a consensus among many departments, no one could found a computer science department.
This tide began to change in 1962, when Purdue established the first computer science department and Stanford followed soon thereafter.
在接下来的 20 年里,部门的数量增长缓慢但稳定,仅在美国就超过了 100 个。
尽管如此,许多学者仍然质疑计算机科学是一个合法的科学领域还是工程领域。
关于计算合法性问题的一个重大转变发生在 1967 年,当时三位公认的计算机科学家艾伦·纽威尔、艾伦·珀利斯和赫伯特·西蒙发表了一封关于这个问题的信。
They wrote: "Wherever there are phenomena, there can be a science to describe and explain those phenomena. Thus, ... botany is the study of plants, ... zoology is the study of animals, astronomy the study of stars, and so on. Phenomena breed sciences. ... There are computers. The phenomena surrounding computers are varied, complex, rich."
From this basis they quickly dismissed six objections, including the one that computers are human-made and are therefore not legitimate objects of a science.
Herb Simon, a Nobel laureate in economics, so objected to the notion that there could be no science surrounding human-made objects that he wrote a whole book, The Sciences of the Artificial, to refute this idea.
He gave an example from time-sharing systems(computers that allow many simultaneous users): The early development of time-sharing systems could not have been guided by theory as there was none, and most predictions about how time-sharing systems would behave were astonishingly inaccurate.
It was not possible to develop a theory of time-sharing systems without actually building those systems; after they were built, empirical research on their behavior led to a rich theoretical base about them.
In other words, computational thinking could not approach problems from one direction only-the engineering aspects and scientific-mathematical aspects of computing evolved in a synergistic way to yield a science that was not purely a natural science.
The notion of computing as the study of phenomena surrounding computers quickly gained traction, and by the end of the 1960s was taken as the definition of computing.
A view of the field's uniqueness started to form around that notion.
The term "algorithmic thinking" was used to describe the most obvious aspect of this new kind of thinking.
The field's unique aims, typical problems, methods of solving those problems, and kinds of solutions were the basis of computational thinking.
The computing pioneers expanded computational thinking beyond what they inherited from the long history of computation.
They focused on the construction principles of programs, computing machines, and operating systems.
他们提出了许多今天被认为是理所当然的计算概念,包括命名变量、控制结构、数据结构、数据类型、形式编程语言、子程序、编译器、输入输出协议、指令管道、中断系统、计算过程、内存层次结构、缓存、虚拟内存、外围设备和接口。
编程的方法和计算机系统架构是计算思维发展的主要驱动力。
到 1970 年,大多数计算机科学家表示,计算机特有的思维和实践方式——今天称为计算思维——包含了与计算机有关的所有知识和技能。
计算机思维早期分为硬件思维和软件思维。
The hardware flavor was followed by computer engineers in the engineering school; the software flavor by software designers and computing theorists in the science school.
Programming as Art and Science
20 世纪 60 年代是计算机的成熟期,计算机科学家们对 ta 们的思考方式产生了相当丰富的内容。
The subfield of operating systems was born in the early 1960s to bring cheap, interactive computing to large user communities---computational thinking acquired a systems attitude.
The subfield of software engineering was born in the late 1960s from a concern that existing models of programming were incapable of developing reliable and dependable production software---computer thinking acquired an engineering attitude.
The subfield of networking was born in 1967 when the ARPANET project was started --- computational thinking acquired a networking attitude.
With a solid, reliable technology base in place, the field's attention shifted to programs and programming.
随着标准编程方法的出现,许多编程语言应运而生。
A huge interest in formal verification of programs welled up, seeking a theory-based way to demonstrate that programs were reliable and correct.
A similar interest in computational complexity also welled up, seeking analytical ways to assess just how much computational work the different algorithms required.
Computer programs are expressions of algorithms in a formal language that, when compiled to machine-executable form, control the actions of a machine.
程序是几乎所有计算的核心:大多数计算机专业人员和研究人员都以某种方式或其他方式使用程序。
在 20 世纪 40 年代的第一台存储程序计算机上,编程是用汇编语言完成的,汇编语言把指令集逐行转换成计算机可以运行的机器代码。
例如,指令 “ ADD R1,R2,R3 ” 是将寄存器 R1 和 R2 的和放入寄存器 R3。
该指令通过用二进制代码替换 ADD R1,R2,R3 转换为机器代码。
用汇编语言编写程序非常繁琐而且容易出错。
编程语言的发明是为了提供程序员想要的更高层次的精确表达式,而后编译器可以明确地将这些表达式翻译成机器代码。
这大大简化了编程工作,使其更具生产力,更不容易出错。
The first widely adopted programming languages introduced a plethora of new computational thinking concepts that had few or no counterparts in other intellectual traditions.
大多数编程语言旨在帮助自动化重要工作,如分析科学数据和评估数学模型(1957年的Fortran)、进行逻辑推理(1958年的Lisp)、跟踪业务库存和维护客户数据库(1959年的COBOL)。
一些语言的目的是让人们交流精确的算法规范(可读性好),这些规范可以被纳入其他语言中。
Algol 语言(1958)就是从这个角度发展起来的。
语言迎合特定思考问题方式的想法被称为 “编程范式”。
例如,命令式编程将程序视为一系列模块(称为“过程”,procedure),其中的指令命令机器执行操作。
FORTRAN, COBOL, 和 ALGOL 都适合这一类别。
面向对象编程将程序视为相对独立的单元(“对象”)的集合,这些单元相互交互,并通过交换消息与外部世界交互。
后来的语言,如 Smalltalk 和 Java 就属于这一类。
函数式编程把程序看作从输入数据生成输出数据的数学函数集。LISP就是一个例子。
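下面是译者补充的一个小例子,用 Python 粗略示意这三种范式的味道(Python 本身同时支持这几种写法,所以拿它来演示;代码只是示意,并非原书内容):

```python
# 译者补充:用三种风格计算 1~5 的平方和,感受不同编程范式的差别

# 命令式:一条条指令“命令”机器,逐步修改状态(Fortran、COBOL、ALGOL 的风格)
def imperative_sum_of_squares(numbers):
    total = 0
    for n in numbers:
        total += n * n
    return total

# 面向对象:把数据和操作封装成对象,通过调用方法(发送消息)来交互(Smalltalk、Java 的风格)
class Accumulator:
    def __init__(self):
        self.total = 0

    def add_square(self, n):
        self.total += n * n

def oo_sum_of_squares(numbers):
    acc = Accumulator()
    for n in numbers:
        acc.add_square(n)
    return acc.total

# 函数式:把程序看作从输入到输出的数学函数组合,不去修改外部状态(LISP 的风格)
def functional_sum_of_squares(numbers):
    return sum(map(lambda n: n * n, numbers))

data = [1, 2, 3, 4, 5]
print(imperative_sum_of_squares(data),
      oo_sum_of_squares(data),
      functional_sum_of_squares(data))   # 输出:55 55 55
```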
这些编程范例在 20 世纪 70 年代被视为不同风格的算法思维。
它们都希望程序是人类可以读懂的清晰表达,并且能够被正确、高效地编译和执行。
Donald Knuth, in his major works The Art of Computer Programming and Literate Programming, and Edsger Dijkstra in his work on structured programming, epitomized the idea that computing is about algorithms in this sense.
到 1980 年,大多数计算机科学家都说计算机思维是一套与算法和软件开发有关的技能和知识。
But things got tricky when the proponents of algorithmic thinking had to describe what algorithmic thinking was and how it differed from other kinds of thinking.
Knuth 比较了数学教科书和计算机教科书中的推理模式,确定了两者各自的典型模式。
He concluded that algorithmic thinking differed from mathematical thinking in several aspects: by the ways in which it reduces complex problems to interconnected simple ones, emphasizes information structures, pays attention to how actions alter the states of data, and formulates symbolic representations of reality.
In his own studies, Dijkstra differentiated computer scientists from mathematicians by their capacity for expressing algorithms in natural as well as formal language, for devising notations that simplified the computations, for mastering complexity, for shifting between abstraction levels, and for inventing concepts, objects, notations, and theories when necessary.
Today's descriptions of the mental tools of computational thinking are typically much less mathematical in their orientation than were many early descriptions of algorithmic thinking.
Over time, many have argued that programming and algorithmic thinking are as important as reading, writing, and arithmetic---the traditional three Rs of education---but the proposal to add them(as a new combined "R") to that list has yet to be accepted.
Computing's leaders have a long history of disagreement on this point.
Some computing pioneers considered computing's ways of thinking to be a generic tool for everyone, on a par with mathematics and language.
Others considered algorithmic thinking to be a rather rare, innate ability---present with about one person in fifty.
The former view has more support among educators because it embraces the idea that everyone can learn computational thinking: computational thinking is a skill to be learned and not an ability that one is born with.
The programming and algorithms view of computing spawned new additions to the computational thinking toolbox.
The engineering-technology side provided compilers(for converting human-readable programs to executable machine codes), parsing methods (for breaking programming language statements into components), code optimization, operating systems, and empirical testing and debugging methods(for finding errors in programs).
The math science side provided a host of methods for algorithms analysis such as O-notation for estimating the efficiency of algorithms, different models of computation, and proofs of program correctness.
By the late 1970s, it was clear that computing moved on an intellectual trajectory with concepts, concerns, and skills very different from other academic disciplines.
自动化计算
Despite all its richness, the view of computing as the study and design of algorithms was seen as too narrow.
By the late 1970s, there were many other questions under investigation.
How do you design a new programming language ?
How do you increase programmer productivity ?
How do you design a secure operating system ?
How do you design fault-tolerant software systems and machines ?
How do you transmit data reliably over a packet network ?
How do you protect systems against data theft by intruders or malware ?
How do you find the bottlenecks of a computer system or network ?
How do you find the response time of a system ?
How do you get a system to do work previously done by human operators ?
The study of algorithms focused on individual algorithms but rarely on their interactions with humans or the effects of their computations on other users of systems and networks.
It could hardly provide complete answers to these questions.
The idea emerged that the common factor in all these questions, and the soul of computational thinking, was that computing enabled automation in many fields.
Automation generally meant one of two things: the control of processes by mechanical means with minimal human intervention, or the carrying out of a process by a machine.
Many wanted to return to the 1960s notion that automation was the ultimate purpose of computers and among the most intriguing questions of the modern age.
自动化似乎是所有计算机科学中的共同因素,而计算机思维似乎是为了提高自动化的效率。
In 1978 the US National Science Foundation launched a comprehensive project to map what is essential in computing.
It was called the "Computer Science and Engineering Research Study"(COSERS).
In 1980 they released What Can Be Automated ?, a thousand-page tome that examined numerous aspects of computing and its applications from the standpoint of efficient automation.
That study answered many of the questions above, and for many years, the COSERS report offered the most complete picture of computing and the era's computational thinking.
It is still a very relevant resource for anyone who wants an overview, written by famous computing pioneers, of many central themes, problems, and questions in computing.
Well into the 1990s, the computing-as-automation idea was adopted in books, research reports, and influential policy documents as the "fundamental question underlying computing."
This idea resonated well with the history of computational thinking: as we discussed in the previous chapters, automatic computing realized the dream of having procedures carried out mechanically and correctly without relying on human intuition and judgment.
Theoreticians such as Alan Turing were fascinated by the idea of mechanizing computing.
Practitioners saw their programs as automations of tasks.
By 1990, "What can be automated?" became a popular slogan in explanations of computing to outsiders and a carrying theme of computational thinking.
Ironically, the question of "what can be automated" led to the undoing of the automation interpretation because the boundary between what can and cannot be automated is ambiguous.
由于新的算法或更快的硬件,以前不可能实现自动化的事情现在可能成为可能。
到 20 世纪 70 年代,计算机科学家已经发展出一套丰富的计算复杂性理论。
By the 1970s, computer scientists had developed a rich theory of computational complexity, which classified problems according to how many computational steps algorithms solving them needed.
For example, searching an unordered list of N items for a specific item takes time proportional to N steps.
将 N 个元素排序更为复杂:某些排序算法需要 N² 步,而最好的算法只需要 N·log N 步。
译者补充:对算法分析感兴趣,请见博客《渐进记号》。
Printing a list of all subsets of N items takes time proportional to 2^N.
The search problem is of "linear difficulty," the sorting problem is of "quadratic difficulty," and the printing problem is of "exponential difficulty."
Search is fast, enumeration is slow; computational complexity theorists call the former "easy" and the latter "hard."
To see how vast the difference is between easy and hard problems, imagine that we have a computer that can do 1 billion (10^9) instructions per second.
To search a list of 100 items would take 100 instructions, or about 0.1 microseconds.
To enumerate and print all the subsets of 100 items would take 2^100 (about 10^30) instructions, a process that would take around 10^14 years.
That is 10,000 times longer than the age of the universe, which is very roughly around 10^10 years old.
Even though we can write an algorithm to do that, there is no computer that could complete the job in a reasonable amount of time.
Translating this to automation, an algorithm to automate something might take an impossibly long time.
Not everything for which we have an algorithm is automatable in practice.
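To make the scale of that gap concrete, here is a small back-of-the-envelope sketch in Python (a translator's addition, not from the original text; it simply redoes the arithmetic above under the assumption of 10^9 instructions per second):

```python
# Translator's sketch: rough running-time estimates for an "easy" and a "hard" problem,
# assuming a machine that executes 1e9 (one billion) instructions per second.
SPEED = 1e9                     # instructions per second (assumed)
SECONDS_PER_YEAR = 3.15e7

def linear_search_seconds(n):
    """Searching an unordered list of n items: about n instructions."""
    return n / SPEED

def enumerate_subsets_seconds(n):
    """Printing all subsets of n items: about 2**n instructions."""
    return 2 ** n / SPEED

print(linear_search_seconds(100))                          # ~1e-07 seconds
print(enumerate_subsets_seconds(100) / SECONDS_PER_YEAR)   # ~4e+13 years
```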
Over time, new generations of more powerful machines enable the automation of previously intractable tasks.
Heuristic algorithms make the question of computational hardness even trickier. Consider the knapsack problem, which asks us to pack a subset of items into a weight-limited knapsack to maximize the value of the items packed.
The algorithm for doing this is similar to the enumeration problem and would take an impossibly long time for most knapsacks.
But we have a rule of thumb (a "heuristic") that says "rate each item by its value-to-weight ratio and pack items in decreasing order of that ratio until the knapsack is full."
This rule of thumb packs very good knapsacks fast, but not necessarily the best.
Many hard problems are like this.
There are fast heuristic algorithms that do a good job but not necessarily the best.
We can automate them only if we find a good heuristic algorithm.
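To illustrate, here is a minimal greedy-knapsack sketch in Python (a translator's addition; the value-per-weight rule is the classic textbook heuristic, and the item list is made up):

```python
# Translator's sketch: a greedy heuristic for the knapsack problem.
# Rate each item by value/weight and pack items in decreasing order of that ratio.
# It runs fast and usually packs a good knapsack, but not necessarily the best one.

def greedy_knapsack(items, capacity):
    """items: list of (name, value, weight); returns (packed names, total value)."""
    ranked = sorted(items, key=lambda it: it[1] / it[2], reverse=True)
    packed, total_value, load = [], 0, 0
    for name, value, weight in ranked:
        if load + weight <= capacity:
            packed.append(name)
            total_value += value
            load += weight
    return packed, total_value

items = [("camera", 60, 10), ("laptop", 100, 20), ("books", 120, 30)]
print(greedy_knapsack(items, 50))
# Greedy packs camera + laptop for a value of 160;
# the best packing here is actually laptop + books, worth 220.
```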
The early findings about what things cannot be done in computing, either because they are impossible or just too long, led to pessimism about whether computing could help with most practical problems.
Today the mood is much more optimistic.
A skilled computational thinker uses a sophisticated understanding of computational complexity, logic, and optimization methods to design good heuristic algorithms.
Although all parts of computing contribute to automation, the field of artificial intelligence(AI) has emerged as a focal point in computing for automating human cognitive tasks and other human work.
因为计算机科学的重要内容我已经总结在第一篇,因此尽量给原文,以下为译者补充。
人工智能入门指南
人工智能是什么 ?
现在的人工智能本质上是机器学习,机器学习就是用一组数据建立一个统计模型。
这个统计模型能对新的数据作出预言,输入的数据越多越精确,模型能做的预言就越准确。
数学家就叫 统计模型,而计算机科学家鬼使神差的改叫 机器学习,记者有时候叫 大数据,现在的科技大佬在大会上就更会聊了,叫啥呢,不就是 人工智能......
上一章介绍了神经网络:用大量的数据去训练这个网络,网络就能学会自己做判断。
网络内部有大量参数随着训练不断变化,就相当于人脑在学习中提高技艺。
这样的训练主要依靠数据和所谓的智能算法,即数学模型。
机器学习其实是再现昨天的世界,只不过昨天的世界和今天的世界之间是连续变化的。
如,人脸识别软件,使用您过去的照片进行训练,识别今天的您,之所以能识别得准,是因为昨天的您,去年的您,甚至 10 年前的您,和今天的您之间是渐变的、连续变化的,而不是跳变的。
而机器学习,实际上是寻找一种数学模型,让这种模型符合 ta 所要描述的对象。
其实您每次做数学的应用题,都是在使用一个模型。
您知道题目包含什么假设,有什么因果关系,就能用方程推导出一个结果,方程描写的是模型中事物的联系。
模型的特点:
- 包含各种实体,人也好、组织也好、物品也好
- 可以做逻辑推导,这些实体之间的相互关系,必须都有非常明确的定义。
比如说我们要寻找一种描述天体运动的模型,让ta符合太阳系行星的运动情况,今天这个模型就是开普勒-牛顿的椭圆形模型:太阳系中所有的行星和彗星围绕着太阳系的重心(和太阳的位置很接近但是有所差别),做椭圆运动。
当然,这个模型内不同物体的运动速度和周期不同,这些数据被称为模型的参数。
有了这样一个模型,就可以知道今后太阳系中行星和彗星的运行规律了,哈雷就用 ta 成功地预测了哈雷彗星未来回到人们视野的时间。
在这个模型中,要得到模型的参数(比如每一个星体运动的长轴半径,运动一圈的周期),就要根据观察到的历史数据计算。
我们在机器学习中有时用 “契合” 来形容模型描述的情况和真实情况的接近程度,契合度越高,误差越小,当然也就越好。
数学上有一个 “切比雪夫大数定律”,保证了只要数据量足够大,这种契合度就能接近100%。
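为了直观一点,下面是译者补充的一个小实验:用 numpy 对带噪声的数据做最小二乘拟合,数据量越大,拟合出来的参数通常越接近真实值(数据是随机造的,仅作示意):

```python
# 译者补充:拟合 y = 2x + 1 的带噪声数据,观察“数据越多,契合度越高”
import numpy as np

rng = np.random.default_rng(0)
true_slope, true_intercept = 2.0, 1.0          # 真实的模型参数

for n in [10, 100, 10000]:
    x = rng.uniform(0, 10, size=n)
    y = true_slope * x + true_intercept + rng.normal(0, 1, size=n)   # 观测数据带噪声
    slope, intercept = np.polyfit(x, y, 1)     # 最小二乘拟合一条直线
    print(n, round(slope, 3), round(intercept, 3))
# 随着 n 增大,拟合出的参数会越来越接近 (2, 1)
```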
机器学习分两种。
- 第一种是有模型和参数的学习,也就是显性的模型。在历史上,地心说模型、日心说模型、开普勒-牛顿的椭圆形模型,都是这一类。当然,模型和真实情况并不一定一致,只是在误差范围内简洁地概括了真实情况;而在过去,人需要想出模型,再通过实验找参数。
- 第二种是没有限定模型,只有数据,由数据驱动,看看能产生什么样的模型。比如基于人工神经网络的深度学习就是如此,当然,里面一些拓扑结构,一些转换函数的设定,还是有规范的。由于没有清晰的模型,因此今天很多人觉得深度学习是个黑箱,ta只是对目标进行了最优化,至于为什么有效,谁也说不清。
机器学习根据使用的数据不同,区分成了不同的类型。
- 无监督学习:输入的数据是杂乱无章的,机器想从这些数据中学习并得到规律;输出的模型就可能不准确。
- 有监督学习:输入的数据都是经过标识的,有了一个大致的方向;但因为人工标识数据不仅成本高,而且很难得到足够量的数据,这种机器学习方法提高到一定程度,就再也提高不了了。
- 半监督学习:介于前两者之间。
- 强化学习:计算机在没有人为给定方向的条件下,自己试着走一个方向,而后由人设定的一些原则告诉 ta 好不好,也就是说需要有一个判断价值的反馈信号送还给机器学习系统。
任何一个机器学习的过程,其实都是不断地调整数学模型参数的过程,直到参数收敛到最佳点。
从事过机器学习工作的人都知道这样一个诀窍,为了加快机器学习的收敛速度,最好先用标注过的高质量数据寻找收敛的方向,这比完全没有数据输入,全靠计算机自适应学习快得多。
那些标注过的,正确无误的数据, 其实就是人总结出来的,或者见到的成功经验。
没有这些成功经验,计算机通过自适应学习,也能收敛到正确的模型,但是时间要长很多。
如果您给计算机输入的数据是错误的,相当于失败的经验,计算机即使最终能回归到正确的模型上,也要走非常非常长的弯路。
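为了说明“迭代 + 步长”是怎么回事,下面是译者补充的一个极简梯度下降示意(损失函数、步长、迭代次数都是随手设的,仅作演示):

```python
# 译者补充:反复调整参数 w,让损失 (w - 3)^2 越来越小;
# 每调整一次就是一次“迭代”,learning_rate 就是“步长”。
def gradient_descent(learning_rate=0.1, steps=50):
    w = 0.0                            # 参数的初始值(随便取的)
    for _ in range(steps):
        grad = 2 * (w - 3)             # 损失函数对 w 的导数
        w = w - learning_rate * grad   # 沿着让损失变小的方向调整参数
    return w

print(gradient_descent())              # 收敛到最佳点 3 附近
```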
想深入的话,可以看下面的参考资料:
- 机器学习的学习类型:HAWQ数据仓库与数据挖掘实战电子书,王雪迎-文档类-CSDN下载
- 机器学习的常用算法:3.机器学习常用算法.pdf-机器学习文档类资源-CSDN下载
人工智能之路食用指南
有了计算机的基础,学习过编程和计算机算法之后,学习人工智能其实并不难(主修)。
首先需要有足够的概率论和数理统计基础,这在大学就是一门课,学了这门课,基本的人工智能和机器学习的数学基础就有了。
而后学习这本书《人工智能:一种现代方法》(Artificial Intelligence: A Modern Approach),这是目前最好的、写得最清楚的教科书。
作者一位是罗素教授,担任过伯克利计算机系主任,是人工智能专家。
另一位是诺威格博士,是 Google 的大牛,过去主管 Google 整个的研究部门,是一位有丰富工程经验的学者。
他们一直在不断地更新这本教科书,美国80%~90%的大学人工智能课都使用这本书。
此外,对于人工智能的不同应用,需要学习一些专业知识;如做人脸识别,就需要学习图像处理;做机器翻译,就需要学习自然语言处理。
如果只是辅修的话,可以看一看这个:《程序员的金豆豆能不能搞事情》。
如果对 “自学” 这个动作感兴趣,请见博客:《C语言自学指南》。
The computational thinking toolbox accumulated heuristic methods for searching solution spaces of games, for deducing conclusions from given information, and for machine-learning methods that find problem solutions by generalizing from examples.
机器学习中包及相应函数认识:R语言基本统计分析方法(包及函数)_a嘟嘟a的博客-CSDN博客_r语言统计包
人工智能的发展趋势
越来越多的人理解了人工智能的能力边界,但计算机和数据科学家,仍然需要以负责的道德标准,去推动人工智能发展。
—— 科技媒体网站VentureBeat
科技媒体网站 VentureBeat 最近采访了四位科技界大咖,预测了2019年人工智能的发展趋势。
这四位大咖,都是人工智能领域非常有份量的大人物,包括斯坦福大学的计算机科学系的副教授、全球人工智能和机器学习领域最权威的学者之一吴恩达、纽约大学计算机科学家杨立昆、埃森哲应用智能部的总经理鲁曼·乔德赫里,还有数据科学和机器学习咨询机构 Fast Forward Labs 的创始人兼 CEO 希拉里·梅森。
吴恩达发现,其实只要找对了方法,哪怕只用小规模的数据,也能获得有价值的结果。
如,训练一个图像识别系统,不需要再拿上亿张图片来做训练素材,只用 1000 张图片也能达到预期的效果,这就大大节约了时间和成本,也降低了人工智能行业的准入门槛。
所以,“少样本学习”,将会成为2019年人工智能学习方向的一个重要趋势。
以及,“自监督学习” 和“ 因果学习”,这两个趋势,一个说的是让人工智能自己来掌握事物的运作规律,另一个说的是让人工智能学会建立因果关系。
自监督学习和建立因果关系,这两种能力组合到一起,就能让人工智能像人类一样,学会关于这个世界的常识了。
而要让人工智能学会建立因果关系,就不仅是通过观察来学习了,还要能进行推理,比如说,看到有人打伞,就能推测出,外面可能在下雨。
在 2019 年,人工智能将朝着这样一个方向发展:
- 用少量样本就能掌握更强大的技能,甚至还可能拥有像人类一样的常识;不过,在人工智能发展的道路上,我们还会面临更复杂的伦理挑战,如何对人工智能伦理进行监管,将会是一个长期难题。
读后感:
因果关系下面有细写,是人工智能界最重要的趋势;至于 “少样本学习”,我则不太认同。
我认为这个时代,最不缺的就是数据了,明明更多的数据搞出来的效果会更好,为什么不呢 ?
仅最近 3 年世界产生的数据,就比从人类诞生到 3 年前产生的全部数据还要多得多;以后随着 IoT 的普及,数据量还会成数量级地增加......
如何测量 A 与 B 的相关性 ?
我来为上面这篇新闻,添加一些背景信息。
现在的人工智能,大数据分析是基础,而且基本上都是关注相关性,而非因果关系。
原因在于获取因果关系往往花费太大、难度太高,而且相关性也能解决许多问题。
- 相关性: A 发生后,B 发生的可能性就增加,这就是相关性。
- 因果关系:从 A 一定能推导出 B,那么知道了 A 就等同于知道了 B。
如果相关性比较强,我们在得到信息 A 之后,就可以消除关于 B 的不确定性。
但是,如果 A 和 B 之间的相关性较弱,那种联系就没有意义。
在过去,我们常常是先感觉两种信息相关,然后通过数据来验证,这是传统的数据方法。
大数据的方法不同,它不先进行预先的假设,由于数据量大,总是可以总结出一些相关性,而后再分析什么靠谱,什么不靠谱,并非所有看似相关的事情都靠谱。
一些事情,大数据是分析不出来的,比如:
- 影片 A 比影片 B 在上映时卖掉的冰淇淋多,一些大数据专家就在统计:是否看影片 A 的情侣多,一起吃冰淇淋,是否看影片A的年轻人多,喜欢吃冰淇淋,等等。
- 最后来了一个卖冰棍的老太太说,嗨,影片 A 是夏天放映的,影片 B 是冬天放映的。如果我们一定要统计,可能真能发现看影片 A 和 B 情侣比例的细微差异,但是如果我们就得出情侣看电影一定会吃冰淇淋的结论,就有点荒唐了。
那么,如何度量 A 和 B 的相关性是强还是弱 ?
信息论里有一个互信息公式可以度量,使用方法请见博客:《信息收集》之女人的裙摆 OR 股市的涨幅。
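作为补充,下面用 Python 粗略演示如何从两列离散数据估计互信息(译者的示意写法,数据是编造的):

```python
# 译者补充:用频率估计两个离散变量 A、B 的互信息 I(A;B),
# 互信息越大,说明知道 A 之后对 B 的不确定性消除得越多。
import math
from collections import Counter

def mutual_information(pairs):
    n = len(pairs)
    p_ab = Counter(pairs)                       # 联合分布的频率估计
    p_a = Counter(a for a, _ in pairs)
    p_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in p_ab.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((p_a[a] / n) * (p_b[b] / n)))
    return mi

# 编造的数据:A = 是否下雨,B = 是否打伞
data = ([("雨", "伞")] * 40 + [("晴", "无伞")] * 40 +
        [("雨", "无伞")] * 10 + [("晴", "伞")] * 10)
print(round(mutual_information(data), 3))       # 约 0.278 比特;越接近 0 说明越不相关
```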
机器与人在学习上的区别
人工智能现在很大的一个瓶颈在于 ta 不能理解因果关系。
人工智能的深度学习,靠的是用神经网络逐步建立一种 “识别模式”。
比如说,您想让人工智能识别狗,您要先建立一些算法的“神经元”,而后把这些神经元用叠层的方式连在一起,就好像三明治那样一层层的,这就是所谓的 “深度”。
当您拿出猫的照片交给人工智能,它的神经网络就开始工作,第一层的神经元首先会进行判断,信号传到第二层,再进行判断,以此类推。最后,整个神经网络会做出一个最终决定,判断照片中的是不是狗。
无论判断对错,这时候,您都要给人工智能反馈。如果判断错了,它就弱化那些导致错误结果的神经元连接,如果判断对了,它就强化这些连接。只要次数足够多,人工智能就会形成一个有效的识别狗的模式,达到很高的正确率。
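下面是译者补充的一个极简示意,用 numpy 写一个“两层”的前向判断,体会“信号一层层传下去”的意思;真实的深度学习框架(如 PyTorch、TensorFlow)还会根据反馈自动调整各层的权重,这里不展开:

```python
# 译者补充:两层网络的前向传播示意,最后输出“是狗”的概率;
# 训练时会根据判断的对错,反复调整 W1、W2(这里省略)。
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
x = rng.random(4)            # 假设的输入特征(比如把图片压缩成 4 个数)
W1 = rng.random((3, 4))      # 第一层神经元的权重(随机初始化)
W2 = rng.random((1, 3))      # 第二层神经元的权重

h = sigmoid(W1 @ x)          # 第一层先做判断,把信号传给第二层
prob_dog = sigmoid(W2 @ h)   # 第二层综合后给出最终判断
print(prob_dog[0])
```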
但是,人类大脑并不是这么工作的,人类的认知并不是建立在“层层识别”的基础上,而是看穿现象背后的逻辑和因果关系,再归纳出抽象的知识,然后用这些知识来应对新情况。
概括地来说,这就是一种推理能力。这种能力最大的好处,就是让人不用依赖大量的数据就能学习。
举个例子,想让小孩子认识什么叫 “车”,就不需要让他们看上万张车的图片才能认出来,他们会总结——带轮子的、能在地上跑的就是车。
哪怕下次碰到的是之前从来没见过的拖拉机,小孩子也能知道这算是一种车。
但人工智能就做不到,它能准确识别一万次车,但它依然不理解车是什么。每次要辨认新种类的车,它都必须从头开始学习,每一次学习都要靠海量的数据来训练。
听到这您应该就明白了,人工智能和人在“学习”这件事上是有区别的。人类就是靠推理能力,快速学习,这种推理能力还能帮人类应对不断出现的新情况。
Kaggle项目拆解:泰坦尼克号预测
既然已经把机器学习的理论差不多说完了,我们一起来做一个人工智能的程序。
背景:电影《泰坦尼克号》。
题目:预测泰坦尼克号的某位乘客是存活还是遇难了。
这是来自 Kaggle 的竞赛题,如果您可以完整写出这个程序,机器学习也算入段了。
竞赛准备:Python 编程语言、Python 编译器、泰坦尼克号的数据。
(1) 准备数据
Kaggle 把 泰坦尼克号 的数据分为了两份:
- train.csv :用于训练统计模型。
- test.csv :用于检测统计模型。
您可以直接在 官网 下载(推荐),也可以去博客下载 ~
打开文件 train.csv:
一共 12 类数据。
- PassengerId:乘客ID
- Survived :是否幸存
- Pclass :舱位级别(头等舱、二等舱、三等舱)
- Name :姓名
- Sex :性别
- Age :年龄
- SibSp :同船的兄弟姐妹或配偶的数量
- Parch :同船的父母或子女的数量
- Ticket :船票号码
- Fare :船票价格
- Cabin :房间编号
- Embarked :登船的港口(C = 瑟堡,Q = 皇后镇,S = 南安普顿)
这 12 类数据的每一列,在 Python 中可以看作一个 "列表",在 C 语言中类似 "数组"。
数据都是 列表/数组,其中包含了我们想要预测的某个人的各种信息,依据这些收集到的信息可以判断这人是否幸存。
使用 Python 输出 train.csv 里的数据:
import csv

# 打印数据
with open("train.csv", "r") as csv_file:
    reader = csv.reader(csv_file)
    for line in reader:
        print(line)
(2) 挑选数据
题目:预测泰坦尼克号的某位乘客是存活还是遇难了。
使用 Python 接收 train.csv 里的数据:
import pandas as pd

# 接收数据
train = pd.read_csv('train.csv')
我们现在就来统计一下泰坦尼克号上幸存者和遇难者的数量吧,只需要 Survived 这个变量即可。
import pandas as pd

# 接收数据
train = pd.read_csv('train.csv')

# 第一次建立模型
print( train['Survived'].value_counts() )

# 使用函数:
# value_counts() 是一种查看表格某列中有多少个不同取值的快捷方法,并统计每个取值在该列中出现了多少次。
输出结果:
总人数 = 遇难数 + 幸存数
= 549 + 342
= 891
第一次建立的模型非常简单,预测某个人幸存的概率就是:
幸存数 ➗ 总人数
= 342 ➗ 891
≈ 38.4%
但我们还有许多数据呀,不要只从一个方面看,要把数据联系起来形成一个网络,这样预测的准确率才会提高。
如果您看过电影,就知道在船快沉时,救生艇都是让给了女士和小孩,所以性别会对预测幸存的概率产生影响。
使用 Python 做个统计:
import pandas as pd

# 接收数据
train = pd.read_csv('train.csv')

# 第一次的统计模型,统计幸存者数量
print( train['Survived'].value_counts() )

# 第二次的统计模型,统计 幸存的男性人数、男性的总人数
print( train['Survived'][train['Sex'] == 'male'].value_counts() )
输出结果:
幸存的男性人数 = 109、男性的总人数 = 577(109 + 468)。
男性的存活率 = 幸存的男性人数 ➗ 男性的总人数
= 109 ➗ 577
= 18.89%
再使用同样的语句,统计女性:
import pandas as pd

# 接收数据
train = pd.read_csv('train.csv')

# 第一次的统计模型,统计幸存者数量
print( train['Survived'].value_counts() )

# 第二次的统计模型
# 统计 幸存的男性人数、男性的总人数
print( train['Survived'][train['Sex'] == 'male'].value_counts() )

# 统计 幸存的女性人数、女性的总人数
print( train['Survived'][train['Sex'] == 'female'].value_counts() )
输出结果:
计算同理,233 ➗ 314(314 = 233 + 81)≈ 74.2%。
第二次,预测某个人幸存的概率是:
- 男:18%
- 女:74%
通过加入性别这个数据,预测的准确性就提高了。
我们再引入一些别的数据,说不定还可以继续提高呢!!
不过,一个一个的分析实在太麻烦了。
(3) 引入算法
我们使用一个机器学习的算法,我选的是 决策树。
之前,学 爬虫 的时候接触过这个算法,但没研究,对了我把爬虫的经验都写在了博客:《爬虫专题》。
调用方法:
from sklearn import tree, preprocessing
为了更精确的预测,我们想一想,哪些数据和预测有相关性呢 ?
找出来,再把这些数据通通交给 决策树,让算法自动建立一个模型。
我想了想,挑了 4 个:
- Sex 性别 :女士优先
- Age 年龄 :小孩优先
- Fare :船票价格,越贵的安全性应该更好
- Pclass 舱位级别:人越少,站在救生艇上的概率就越大
(4) 训练模型
现在有个问题是算法要求每个数据都得有数值,可是 “Age” 这一项只有 714 个人的数据。
为此我们还得把剩下 100 多个人的年龄数据给补上,就用所有乘客年龄的中位数来代替。
这种事情在实际操作中非常常见。
import pandas as pd
from sklearn import tree, preprocessing

# 接收数据
train = pd.read_csv('train.csv')
target = train['Survived'].values

# 预处理
encoded_sex = preprocessing.LabelEncoder()
# 转换成数值,因为 Sex 是字符串,要转为数值才好操作
train.Sex = encoded_sex.fit_transform(train.Sex)
features_one = train[['Pclass', 'Sex', 'Age', 'Fare']].values

# 拟合一个决策树: my_tree_one
my_tree_one = tree.DecisionTreeClassifier()
my_tree_one = my_tree_one.fit(features_one, target)
preprocessing.LabelEncoder() 函数的使用说明,https://www.cnblogs.com/caimuqing/p/9074046.html
运行后,就有一个错误。
- ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
上面只是机器学习的核心代码,还需要先做数据处理(补上缺失值),完整代码如下。
import pandas as pd
from sklearn import tree, preprocessing

# 接收数据
train = pd.read_csv('train.csv')

# 预处理
encoded_sex = preprocessing.LabelEncoder()
# 转换成数值,因为 Sex 是字符串,要转为数值才好操作
train.Sex = encoded_sex.fit_transform(train.Sex)
features_one = train[['Pclass', 'Sex', 'Age', 'Fare']].values.tolist()
target = train['Survived'].values

# 补充缺失值
for x in range(len(features_one)):
    if str(features_one[x][2]) != 'nan':
        continue
    else:
        features_one[x][2] = 33  # 为了方便,用 33 近似代替缺失的年龄

# 拟合一个决策树: my_tree_one
my_tree_one = tree.DecisionTreeClassifier()
my_tree_one = my_tree_one.fit(features_one, target)
基本上就是您告诉程序要预测的目标是乘客是否存活,影响目标的四个因素是舱位、性别、年龄和船票价格,您选取的机器学习模型是“决策树”。
最后生成的这个“my_tree_one”,就是预测模型。
(5) 检测模型
查看各数据(属性)在决策树中的占比:
print( my_tree_one.feature_importances_ )
- 舱位:12%
- 性别:31%
- 年龄:24%
- 价格:32%
船票价格的占比最大,性别紧随其后;这只是这一次训练得到的结果,换一组数据,比例可能会变化。
这只是一个综合的统计性质,模型内部非常复杂,并不是对这几个数据做什么加权平均。
利用现有的训练数据,检测模型的预测准确度:
print( my_tree_one.score(features_one, target) )
预测得到 97% 的准确率!!!
完整代码:
import pandas as pd
from sklearn import tree, preprocessing

# 接收数据
train = pd.read_csv('train.csv')

# 预处理
encoded_sex = preprocessing.LabelEncoder()
# 转换成数值,因为 Sex 是字符串,要转为数值才好操作
train.Sex = encoded_sex.fit_transform(train.Sex)
features_one = train[['Pclass', 'Sex', 'Age', 'Fare']].values.tolist()
target = train['Survived'].values

# 补充缺失值
for x in range(len(features_one)):
    if str(features_one[x][2]) != 'nan':
        continue
    else:
        features_one[x][2] = 33  # 为了方便,用 33 近似代替缺失的年龄

# 拟合一个决策树: my_tree_one
my_tree_one = tree.DecisionTreeClassifier()
my_tree_one = my_tree_one.fit(features_one, target)

# 查看各属性在决策树中的占比
print(my_tree_one.feature_importances_)

# 检测模型的预测准确度
print(my_tree_one.score(features_one, target))
输出结果:
97% 一个非常好的成绩,但是可以理解,因为毕竟您的模型就是用这组数据训练出来的。
因此,打开 test.csv 继续测试,发现测试数据里只有 11 项指标,不包含信息 Survived。
DataCamp 网站会帮您评分,具体的步骤就不写了,直接告诉您结果。
结果是,我们这个模型用于测试数据的准确度仍然高达 97%!!!
如此粗糙的模型,它居然就能做到这么准确!
人工智能界对此有个专门的形容词,叫 “unreasonably effective”,不合理地准确。
我们只知道泰坦尼克号上一半旅客的存活信息,我们根据这些信息做了一个预测模型,而后就能用这个模型,以97%的准确度,预测另一半旅客中每个人是否活了下来!
如果不满足 97% 的预测程度,还可以尝试新的组合。
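另外,如果想自己在 test.csv 上生成预测、再交给网站评分,大致可以按下面的思路做。这只是译者的示意代码:假设 my_tree_one、encoded_sex 已经按前文训练好,缺失值这里简单地用中位数补齐,提交文件名也是随便取的。

```python
# 译者补充的示意:用训练好的 my_tree_one 给 test.csv 生成预测,并保存成提交文件
import pandas as pd

test = pd.read_csv('test.csv')
test.Sex = encoded_sex.transform(test.Sex)                     # 性别同样转成数值
features_test = test[['Pclass', 'Sex', 'Age', 'Fare']].copy()
features_test = features_test.fillna(features_test.median())   # 缺失值用中位数补齐(示意)

prediction = my_tree_one.predict(features_test.values)
submission = pd.DataFrame({'PassengerId': test['PassengerId'],
                           'Survived': prediction})
submission.to_csv('my_submission.csv', index=False)             # 把这个文件提交即可得到评分
```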
统计模型是数学的加强,依赖于参数估计,它要求模型的建立者,提前知道或了解变量之间的关系。
机器学习通过反复迭代学习发现隐藏在数据中的科学,由于机器学习作用在真实的数据上并不依赖于假设,预测效果是非常好的。
[一点点经验]
任何一个机器学习的过程,其实都是不断地调整数学模型参数的过程,直到参数收敛到最佳点。
每一次调整被称为是一次迭代,调整的幅度被称为迭代的步长。
一开始的时候,迭代的步长要比较大,这样能够很快地确定大致范围,效率比较高;这种方法在统计学上是最优的,请见博客:《扔玻璃球》。
世界上每年有很多机器学习方面的论文,都是围绕提高学习效率展开的,而其中的核心其实就是怎样用最少次迭代,完成模型的训练。
当然,任何好的机器学习算法都不是事先人为设定步长,而是在学习的过程中,自动找到合适的步长。
任何从事过机器学习工作的人都知道这样一个诀窍,为了加快机器学习的收敛速度,最好先用标注过的高质量数据寻找收敛的方向,这比完全没有数据输入,全靠计算机自适应学习快得多。
那些标注过的,正确无误的数据, 其实就是人总结出来的,或者见到的成功经验。没有这些成功经验,计算机通过自适应学习,也能收敛到正确的模型,但是时间要长很多。
如果你给计算机输入噪音,它们相当于失败的经验,计算机即使最终能回归到正确的模型上,也要走非常非常长的弯路。
泰坦尼克号就是有监督学习,事先就想到要预测 幸存 这个变量,无监督学习是让程序自己发现变量之间的联系。
学会了这个方法,使用现成的工具,只要有足够好的数据,您立即就可以搞几个人工智能应用。
比如一个信用卡公司有十万个用户的详细数据,包括年龄、收入、以往的购买记录、信用得分、还款记录等等,那您就可以预测其中每一个人下个月按时还款的可能性。
现有的人工智能就是用统计方法增加猜测的准确度。
人工智能就是机器学习。机器学习就是统计模型。
“人工智能” 应该叫“人工不智能”。
因果革命
在过去不到三十年的时间里,珀尔领导了一场统计学家和计算机科学家参与的 “因果革命”。
我们知道现在所谓的 AI,本质上就是机器学习。
用泰坦尼克号这条船上的数据,也只能用于预测泰坦尼克号上的人;
可是世界上每个地方都不一样诶:
- 医疗:没有两个人的身体结构是一样的;
- 自动驾驶:每个地区的行车规则都不一样;
- ......
只要把事情、环境稍稍的改变,现在的人工智能就废了。
其实现在的人工智能没有改变什么,社会还是这个社会,计算机也没有解决我们的社会问题。
--- 摘自《人工不智能》
珀尔是机器学习技术的开山鼻祖之一,但他也是这种 AI 最激烈的批评者,他认为数据是极其愚蠢的。
珀尔要做的,是让计算机学会因果关系,让 AI 真正理解 ta 在干什么,这样 AI 就和人一样可以推理了。
In past technological revolutions, automation created new and often better kinds of work. The automation revolution, however, could break that pattern, and all work would be automated. —— Fortune
这条路非常非常难,这是一条革命之路。
为此我们要建立所谓 “因果模型”,但在此之前,我们先研究一下因果思维到底是怎么回事。
举个例子,一个国家的人均巧克力消费量,和这个国家的诺贝尔奖得主人数,两者之间有很强的相关性。
只是这个相关性本身没什么意义,您总不可能说:“吃巧克力有利于得诺贝尔奖”。
要让我们解释这个相关性的话,肯定是:“巧克力消费量高是因为这个国家的经济比较发达,而经济比较发达的国家容易出诺贝尔奖得主”。
请注意!只要这么一解释,就用到了因果关系。
判断相关性有没有意义的标准是什么呢 ??
难道不还是要借助因果吗 ???
无形之中,我们还是觉得有因果的相关性更有意义。
因果思维很有用,一个简单的因果模型,就能胜过无数经验。
因果模型可以模拟出人的推理能力,因此人工智能大神才会如此倾向于因果思维啦。
因果思维的层次
珀尔把 因果思维 一共分为三个等级。
1. 观察:通过数据分析做出预测,观察是寻找变量之间的相关性,观察就是积累经验,现在所有实用 AI 技术都是基于这个第一级思维。
- 看到有人打伞,就能推测出,外面可能在下雨。
- 如果一个顾客买牙膏的话,他有多大的概率同时也买牙线呢 ?
- AlphaGo 下围棋,并不是它理解这步棋有什么用,ta 只不过知道走这步赢棋的概率会更大。
- ......
所有动物都有观察思维的能力。观察思维已经能解决很多问题,但是远远不够。
2. 干预:是预判一个行动的结果。
- 如果我现在把牙膏的价格给提高十倍,对牙线的销量会有什么影响 ?
- 如果我去表白会怎么样 ?
- ......
以往的经验可以给你一些提示,但干预动作的结果到底会怎样,您需要更高级的判断。
想知道结果,最好的办法是做实验。
互联网公司一直都在做各种 “A/B测试”,看看哪个标题能吸引更多点击,什么颜色的网页能让用户停留时间更长,都是用分组测试的方法。
测试是主动的干预。
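顺便给一个译者补充的极简 A/B 测试示意(数据是编造的),体会“随机分组 + 主动干预 + 比较结果”这个第二级思维:

```python
# 译者补充:比较两个标题的点击率(编造的数据)
clicks_a, views_a = 120, 1000     # 标题 A:展示 1000 次,点击 120 次
clicks_b, views_b = 150, 1000     # 标题 B:展示 1000 次,点击 150 次

rate_a = clicks_a / views_a
rate_b = clicks_b / views_b
print(f"A 点击率 {rate_a:.1%},B 点击率 {rate_b:.1%}")
# 实际工作中还要做显著性检验(差异是不是碰巧出现的),
# 但“主动干预、再比较结果”本身,就是第二级的因果思维。
```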
3. 想象,是对以前发生的事儿的反思。
- 如果我当时是那么做的话,现在会是一个什么样的结果 ?
- 如果我小时候认真读书,现在能考上加州理工么 ?
- ......
想象是智人的超能力。
珀尔引用了赫拉利在《人类简史》里的说法,大约是在七万年前,智人发生了一起 “认知革命” —— 智人开始想象一些不存在的东西。
第一级也许只要有数据分析就行,但第二和第三级,需要因果模型,您需要知道什么导致什么。
有了因果模型,您就能在大脑里做各种思想实验,您就能权衡比较,您就能为未来做计划。
人工智能现在很大的一个瓶颈在于 ta 不能理解因果关系。
因此,因果关系是人工智能发展的趋势;对比两者,基于机器学习的人工智能其实并不智能,而未来基于因果关系的人工智能才是真的智能。
如果对因果关系感兴趣,可以研究一些哲学、佛学、心理学、经济学等的内容,我将更新在博客:《大航海时代的海贼王》。
实现因果思维的方法
其实珀尔也是哲学家,哲学家对因果关系有十分深刻的讨论;目前计算机科学家已经有了两种方法。
- 人类来帮 AI 总结出一套常识性的陈述,让它用深度学习的方法来训练;
- 直接给 AI 预先输入一些基本逻辑,让它在实践中用这些逻辑去做判断;
这两种方案都有一定的效果,但也都有短板。
[方法一]
美国西雅图有一家人工智能研究所,他们的科学家团队用的就是这个思路。
他们收集了很多常识性的陈述,作为人工智能学习的素材。
这些常识哪里来呢 ?
就是靠人来给,亚马逊有一个劳务众包平台,有很多人在线接一些散活儿。
研究人员就付费请这些众包人员来制作常识性的陈述。
比如 “X 把 Y 打昏了” 这样一个陈述,他们就找很多人来描述 X 的意图:X 为什么这样做 ?
研究人员收集了 2 万 5 千条这样的陈述之后,就用这些陈述来训练AI,而后再让AI来分析新句子,推断出句子当中的情绪或者意图。
有时候,AI 能做出一些非常靠谱的推测,比如说,咱们换个问题:“杰克做了感恩节晚餐” ,AI会回答:“杰克的目的,是想要给家人留下美好的回忆”。这很不错了,对吧?但是,从总体来看,AI的正确率不算高,最好的情况,也就能答对一半。
而且这个方法太需要依赖人力了,你想,有那么多条常识性的陈述,如果都要靠人工来想,这个工作量实在太恐怖了。
[方法二]
彻底放弃了传统的深度学习的路子,直接把基本逻辑用硬编码的方式植入AI。
Vicarious公司用的就是这套方法,他们预先告诉AI:物体之间有相互作用,一个物体的运动轨迹,会因为与其他物体相互作用而发生改变——这就相当于给AI预先植入了人类对运动规律的基本认识。
这种 AI 用很少的数据量就能学会新技能,而且还会变通。
就拿“打砖块”来说,游戏规则改变之后,这么学习出来的 AI 竟然也能很快适应。
它似乎和人类一样,抓到了这个游戏的本质。
但这种方法最大的短板,就是AI做判断时速度比较慢,而且预先要植入什么样的逻辑,依然离不开人类的仔细推敲。
听到这,您可能听出问题来了。
是的,这两种方案表面上看似乎都能让人工智能掌握一些常识。
但问题在于,它们都是“治标不治本”。单靠人工智能,还是没有办法像人类一样积累常识。
长路漫漫,如果某天真的达到了,到时候大家就可以看到 “碾压” 人类能力的 AI 。
Computing as Pervasive Information Processes
The spread of computing into many fields in the 1990s was another factor in the disintegration of the automation consensus of computational thinking in the academic world.
Scientists who ran simulations or evaluated mathematical models were clearly thinking computationally but their interest was not about automating human tasks.
A computational interpretation of the universe started to gain a foothold in sciences(see the next section, "The Universe as a Computer").
The nail went into the automation coffin when scientists from other fields started saying around 2000 that they worked with naturally occurring information processes.
Biologists, for example, said that the natural process of DNA transcription was computational.
There was nothing to automate; instead they wanted to understand and then modify the process.
Biology is not alone.
Cognitive scientists see many brain processes as computational and have designed new materials by computing the reactions that yield them.
Drug companies use simulations and search, instead of tedious lab experiments, to find new compounds to treat diseases.
Physicists see quantum mechanics as a way to explain all particles and forces as information processes.
The list goes on.
What is more, many new technologies like blogging, image recognition, encryption, machine learning, natural language processing, and blockchains are all innovations made possible by computing.
But none of the above was an automation of any existing process-each created an altogether new process.
What a radical change from the days of Newell, Perlis, and Simon !
Then the very idea of computer science was attacked because it did not study natural processes.
Today much of computing is directly relevant to understanding natural processes.
The Universe as a Computer
Some researchers say there is another stage of evolution beyond this: the idea that the universe is itself a computer.
Everything we think we see, and everything we think, is computed by a natural process.
Instead of using computation to understand nature, they say, we will eventually accept that everything in nature is computation.
In that case, computational thinking is not just another skill to be learned, it is the natural behavior of the brain.
Hollywood screenwriters love this story line.
They have taken it into popular science-fiction movies based on the notion that everything we think we see is produced for us by a computer simulation, and indeed every thought we think we have is an illusion given by a computation.
It might be an engaging story, but there is little evidence to support it.
This claim is a generalization of a distinction familiar in artificial intelligence.
Strong AI refers to the belief that suitably programmed machines can be literally intelligent.
Weak AI refers to the belief that, through smart programming, machines can simulate mental activities so well they appear intelligent without being intelligent.
For example, virtual assistants like Siri and Alexa are weak AI because they do a good job at recognizing common commands and acting on them without "understanding" them.
The pursuit for strong AI dominated the AI agenda from the founding of the AI field in 1950 until the late 1990s.
It produced very little insight into intelligence and no machines came close to anything that could be considered intelligent in the same way humans are intelligent.
The pursuit for specialized, weak AI applications rose to ascendance beginning in the 1990s and is responsible for the amazing innovations with neural networks and biodata analysis.
Similar to the weak-strong distinction in AI, the "strong" computational view of the universe holds that the universe itself, along with every living being, is a digital computer.
Every dimension of space and time is discrete and every movement of matter or energy is a computation.
In contrast, the "weak" computational view of the universe does not claim that the world computes, but only that computational interpretations of the world are very useful for studying phenomena: we can model, simulate, and study the world using computation.
The strong computational view is highly speculative, and while it has some passionate proponents, it faces numerous problems both empirical and philosophical.
Its rise is understandable as a continuation of the ongoing quest to understand the world through the latest available technology.
For instance, in the Age of Enlightenment, the world was compared to clockwork.
The brain has successively been compared to the mill, the telegraph system, hydraulic systems, electromagnetic systems, and the computer.
The newest stage in this progression is to interpret the world not as a classical computer but as a quantum computer.