## Data: The Cornerstone of Human Civilization

Data is a set of values of qualitative or quantitative variables.

Information is any entity or form that resolves uncertainty or provides the answer to a question of some kind.

A mathematical model is a description of a system using mathematical concepts and language. A model may help to explain a system, to study the effects of different components, and to make predictions about behaviour.
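The last sentence can be made concrete with a minimal Python sketch. It assumes the simplest possible model, a straight line $y = ax + b$, fitted to observations by ordinary least squares and then used for prediction. The data points and the names `fit_line` and `predict` are hypothetical and serve only as an illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing sum((a*x + b - y)**2) over the observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    return a, b


def predict(model, x):
    """Use the fitted model to predict y for a new x."""
    a, b = model
    return a * x + b


# Hypothetical observations, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
model = fit_line(xs, ys)
print(model)                # roughly a = 1.99, b = 0.05
print(predict(model, 6.0))  # roughly 11.99
```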

The adjective data-driven means that progress in an activity is compelled by data.

## Big Data and Machine Intelligence

In computer science, AI research is defined as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals.
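The following is a toy illustration of this definition as a minimal Python sketch. A hypothetical agent lives on a number line, perceives its own position, and at each step takes whichever action brings it closest to its goal. The world is assumed to be deterministic, so maximizing the chance of success reduces to greedily shrinking the distance to the goal. `ACTIONS`, `choose_action`, and `run_agent` are illustrative names, not any library's API.

```python
# Hypothetical action set: each action moves the agent by a fixed amount.
ACTIONS = {"left": -1, "stay": 0, "right": 1}


def choose_action(position, goal):
    """Pick the action whose resulting position is nearest the goal."""
    return min(ACTIONS, key=lambda a: abs(position + ACTIONS[a] - goal))


def run_agent(start, goal, max_steps=100):
    """Perceive-decide-act loop; returns the sequence of visited positions."""
    position = start
    trace = [position]
    for _ in range(max_steps):
        percept = position                      # perceive the environment
        if percept == goal:                     # goal achieved, stop acting
            break
        action = choose_action(percept, goal)   # decide
        position += ACTIONS[action]             # act on the environment
        trace.append(position)
    return trace


print(run_agent(start=2, goal=5))  # [2, 3, 4, 5]
```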

Big data refers to data sets that are so voluminous and complex that traditional data-processing application software is inadequate to deal with them.

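• Variety (Baidu's ranking of China's top ten "foodie" provinces and cities shows that it is precisely the variety of big data that lets us extract more valuable results from it)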

## The Revolution in Thinking

Under the mechanistic worldview, the universe is reducible to completely mechanical principles, that is, the motion and collision of matter.

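• Euclid founded geometry on an axiomatic system

• Ptolemy's method: form a rough mathematical model from observation, then refine the model with data

• Descartes' method: bold hypotheses, careful verification

• Newton not only extended the methodology Euclid had built through logic from mathematics to the natural sciences, but also extended Ptolemy's practice of describing celestial regularities through mechanical motion to the description of any regularity in the world. Newton's methodology was later summarized as mechanical thinking, whose three core ideas are:

1. The laws governing change in the world are deterministic
2. These laws can not only be known but also be described in simple language
3. These laws can guide practice in unknown domains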

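• Its limitation: it denies uncertainty (cf. the uncertainty principle of quantum mechanics, $\Delta x\cdot\Delta p\ge\frac{\hbar}{2}$, where $\hbar$ is the reduced Planck constant)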

In statistical mechanics, entropy is an extensive property of a thermodynamic system. It is closely related to the number $\Omega$ of microscopic configurations (known as microstates) that are consistent with the macroscopic quantities that characterize the system (such as its volume, pressure and temperature). Under the assumption that each microstate is equally probable, the entropy $S$ is the natural logarithm of the number of microstates, multiplied by the Boltzmann constant $k_{B}$. Formally, $S=k_{B}\ln\Omega$.
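A small Python sketch of $S=k_{B}\ln\Omega$, assuming a hypothetical system of $N$ independent two-state particles (for example, spins that each point up or down), so that $\Omega=2^N$ and $S=Nk_{B}\ln 2$. The constant is the SI value of $k_B$ in J/K; the function name `boltzmann_entropy` is illustrative.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact by definition in the 2019 SI)


def boltzmann_entropy(omega):
    """S = k_B * ln(Omega), assuming Omega equally probable microstates."""
    return K_B * math.log(omega)


# Hypothetical system: N two-state particles give Omega = 2**N microstates.
N = 100
S = boltzmann_entropy(2 ** N)
print(S)  # equals N * k_B * ln 2, about 9.6e-22 J/K
```

Combining two independent systems multiplies their microstate counts, and the logarithm turns that product into a sum. This is why entropy defined this way is extensive.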

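• Information entropy: $H(X)=\mathbb{E}_{X}[I(x)]=-\sum_{x\in\mathbb{X}}p(x)\log p(x)$, where $I(x)=-\log p(x)$ is the self-information of outcome $x$
• Mutual information: $I(X;Y)=\mathbb{E}_{X,Y}[SI(x,y)]=\sum_{x,y}p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}$, where $SI(x,y)=\log\frac{p(x,y)}{p(x)\,p(y)}$ is the specific (pointwise) mutual information
• Shannon's first theorem (source coding theorem): for any uniquely decodable code assigned to the symbols emitted by a source, the average codeword length is no less than the source's entropy, and there exist codes whose average length comes arbitrarily close to the entropy (Huffman coding, for example, gets within one bit per symbol, and closer still when applied to long blocks of symbols)
• Shannon's second theorem (noisy-channel coding theorem): information can be transmitted with arbitrarily low error probability at any rate below the channel capacity, but not at rates above it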
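The quantities above can be computed directly. This Python sketch assumes base-2 logarithms, so results are in bits, and stores probability tables as dictionaries. It builds a Huffman code to check the symbol-by-symbol bound $H \le \bar{L} < H + 1$, and it evaluates the capacity $C = 1 - H(\varepsilon)$ of a binary symmetric channel with flip probability $\varepsilon$. That channel is a standard textbook example chosen here only for illustration, and all function names are hypothetical.

```python
import heapq
import math
from itertools import count


def entropy(p):
    """H(X) = -sum p(x) log2 p(x) in bits; zero-probability terms contribute 0."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)


def mutual_information(pxy):
    """I(X;Y) = sum p(x,y) log2[p(x,y) / (p(x) p(y))] in bits."""
    px, py = {}, {}
    for (x, y), v in pxy.items():       # marginalize the joint table
        px[x] = px.get(x, 0.0) + v
        py[y] = py.get(y, 0.0) + v
    return sum(v * math.log2(v / (px[x] * py[y]))
               for (x, y), v in pxy.items() if v > 0)


def huffman_lengths(p):
    """Return each symbol's codeword length in a Huffman code for p."""
    tiebreak = count()  # avoids comparing dicts when probabilities tie
    heap = [(px, next(tiebreak), {x: 0}) for x, px in p.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)   # two least probable subtrees
        p2, _, b = heapq.heappop(heap)
        merged = {x: d + 1 for x, d in {**a, **b}.items()}  # one level deeper
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]


def average_length(p, lengths):
    """Expected codeword length under distribution p."""
    return sum(p[x] * lengths[x] for x in p)


def bsc_capacity(eps):
    """Capacity (bits per use) of a binary symmetric channel with flip prob eps."""
    return 1.0 - entropy({0: eps, 1: 1.0 - eps})


p = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
L = huffman_lengths(p)
print(entropy(p), average_length(p, L))  # H <= average length < H + 1
print(mutual_information({(0, 0): 0.5, (1, 1): 0.5}))  # perfectly correlated bits: 1.0
print(bsc_capacity(0.11))  # roughly 0.5 bits per channel use
```

When every probability is a power of 1/2, the Huffman code's average length equals the entropy exactly. Otherwise it can exceed the entropy by up to one bit per symbol.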

The principle of maximum entropy states that the probability distribution which best represents the current state of knowledge is the one with largest entropy, in the context of precisely stated prior data (such as a proposition that expresses testable information).
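For illustration, suppose the only testable information about a six-sided die is that its average roll is 4.5. This is a hypothetical constraint. Maximizing entropy under a mean constraint with Lagrange multipliers yields a distribution of the form $p_i \propto e^{\lambda v_i}$. The Python sketch below finds $\lambda$ by bisection, relying on the fact that the mean increases monotonically with $\lambda$. `maxent_given_mean` and `entropy_nats` are illustrative names.

```python
import math


def maxent_given_mean(values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Maximum-entropy distribution over `values` subject to a fixed mean.

    Lagrange multipliers give p_i proportional to exp(lam * v_i); the mean is
    increasing in lam, so lam is located by bisection on [lo, hi].
    """
    def dist(lam):
        m = max(lam * v for v in values)          # shift exponents for stability
        w = [math.exp(lam * v - m) for v in values]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(lam):
        return sum(p * v for p, v in zip(dist(lam), values))

    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)


def entropy_nats(p):
    """Shannon entropy with natural logarithms; zero terms contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


faces = [1, 2, 3, 4, 5, 6]
p = maxent_given_mean(faces, 4.5)
print([round(pi, 4) for pi in p])  # probabilities increase toward the high faces
```

If the target mean is set to 3.5, the fair-die value, the same routine returns the uniform distribution. That is the maximum-entropy choice when the constraint adds no information.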

## The Intelligence Revolution and Future Society

### Blockchain

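• Block: a unit that stores ledger information (a batch of transaction records)
• Chain: a continuous series of transaction records, formed by linking blocks one after another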
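The following minimal Python sketch shows how blocks form a chain. Each hypothetical block stores a batch of transactions plus the SHA-256 hash of the previous block, so altering any earlier block breaks the link to its successor. This is far simpler than Bitcoin's actual block format, which also involves a Merkle tree, timestamps, and proof-of-work. All names here are illustrative.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over a canonical JSON encoding of the block's contents."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_block(transactions, prev_hash):
    """A block holds its transactions and the hash of its predecessor."""
    return {"transactions": transactions, "prev_hash": prev_hash}


def build_chain(batches):
    chain, prev = [], "0" * 64  # the first ("genesis") block has no predecessor
    for txs in batches:
        block = make_block(txs, prev)
        chain.append(block)
        prev = block_hash(block)
    return chain


def is_valid(chain):
    """Every block must store the current hash of the block before it."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))


chain = build_chain([
    ["alice->bob: 5"],
    ["bob->carol: 2", "carol->dave: 1"],
    ["dave->alice: 1"],
])
print(is_valid(chain))  # True
chain[1]["transactions"][0] = "bob->carol: 200"  # tamper with history
print(is_valid(chain))  # False: chain[2]'s prev_hash no longer matches
```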

Bitcoin, the system the blockchain was originally devised for:
• is a cryptocurrency and worldwide payment system
• operates without a central bank or single administrator
• runs on a peer-to-peer network
• has its transactions verified by network nodes through the use of cryptography and recorded in a public distributed ledger called a blockchain