NumPy guide

参考：NumPy tutorial, Matplotlib tutorial, pandas tutorial, Numerical Analysis

ndarray数组

NumPy以n维数组(ndarray)的形式处理数据，称之为ndarray对象。

>>> import numpy as np
>>> a = np.array([[1,2,3],[4,5,6]])
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> a.shape
(2,3)

a.shape返回了ndarray的维度，类似的，ndarray还有一些其它的属性：

`a.ndim`	a轴的个数(秩)，如上例中，a.ndim为2
`a.shape`	a的维度，如上例的二维数组，a.shape为(2,3)
`a.size`	a元素的总个数
`a.dtype`	a中元素的类型
`a.itemsize`	a中每个元素的字节大小
`a.data`	a中元素的缓冲区
`a.T`	a的转置
`a.H`	a的共轭转置
`a.I`	a的逆
`a.trace`	a的迹，即对角线元素之和
`np.linalg.det(a)`	矩阵a的行列式
`np.linalg.norm（a, ord=None)`	矩阵a的范数
`np.linalg.eig(a)`	矩阵a的特征值和特征向量
`np.linalg.cond(a, p=None)`	矩阵a的条件数
`np.linalg.inv(a)`	矩阵a的逆矩阵

创建数组

上文中已经通过array创建了一个数组，常用的数组创建函数有：array, asarray, arange, ones, ones_like, zeros, zeros_like, empty, empty_like, eye, identity等。

数据类型

创建数组过程中还可以指定dtype，如

>>> a = ones( (2,3,4), dtype=int16 )

NumPy的数据类型有：

bool	用一个Bit存储的布尔类型（True或False）
inti	由所在平台决定其大小的整数（一般为int32或int64）
int8	一个字节大小，-128 至 127
int16	整数，-32768 至 32767
int32	整数，-2 E 31 至 2 E 32 -1
int64	整数，-2 E 63 至 2 E 63 - 1
uint8	无符号整数，0 至 255
uint16	无符号整数，0 至 65535
uint32	无符号整数，0 至 2 E 32 - 1
uint64	无符号整数，0 至 2 E 64 - 1
float16	半精度浮点数：16位，正负号1位，指数5位，精度10位
float32	单精度浮点数：32位，正负号1位，指数8位，精度23位
float64或float	双精度浮点数：64位，正负号1位，指数11位，精度52位
complex64	复数，分别用两个32位浮点数表示实部和虚部
complex128或complex	复数，分别用两个64位浮点数表示实部和虚部

一元运算

一元运算会对ndarray中的每个元素分别进行运算，称为broadcasting，常用的运算包括abs, fabs, sqrt(开平方), square(二次方), exp(e的x次幂), log, log10, log2, log1P, sign, ceil(向上取整), floor(向下取整), rint(四舍五入), modf(分离整数和小数部分，分别返回), isnan(判断是否数字), isfinite(判断是否无穷), isinf(判断是否有限), cos, cosh, sin, sinh, tan, tanh, arccos, arccosh, arcsin, arcsinh, arctan, arctanh, logical_not, 使用help或dir可查看其详细用法。

二元运算

ndarray的二元运算也是对元素分别运算，常用的有add, subtract, multiply, divide, floor_divide, power, maximum, fmax, minimum, fmin, mod, copysign, greater, greater_equal, less, less_equal, equal, not_equal, logical_and, logical_or, logical_xor, 如：

>>> np.equal(a, a)

使用help或dir可查看其详细用法。

数据导入与导出

>>> a = np.loadtxt('sample.txt', delimiter=',')
>>> np.savetxt('sample_new.txt', a, delimiter=',')

统计方法

`sum`	全部/某轴向元素之和
`mean`	算术平均值
`std`, `var`	标准差/方差
`min`, `max`	最大/最小元素
`argmin`, `argmax`	最大/最小元素的索引
`cumsum`, `cumprod`	所有元素之和/之积

排序

`arr.sort(1)`
`np.unique(a)`	去掉a中的重复元素，返回排序后结果
`np.in1d(A, B)`	测试B中元素是否存在于A中
`np.intersect1d(A, B)`	返回A，B共有元素
`np.union1d(A, B)`	返回A，B合集
`np.setdiff1d(A, B)`	返回在A中，不在B中的元素
`np.setxor1d(A, B)`	返回存在于A或B，但不同时存在于A，B中的元素

linalg函数

`np.diag()`	返回矩阵对角线元素；返回以一维数组元素为对角线的方阵
`np.dot()`	矩阵乘法；内积
`np.trace()`	对角线之和；迹
`np.linalg.det()`	行列式
`np.linalg.eig()`	本征值、本征向量
`np.linalg.inv()`	逆
`np.linalg.pinv()`	Moore-Penrose伪逆
`np.linalg.qr()`	QR分解(Q为正交矩阵，R为上三角矩阵，A=QR，若R对角线元素为正，则分解唯一)
`np.linalg.svd()`	奇异值分解
`np.linalg.solve()`	Ax=b，计算x
`np.linalg.lstsq()`	y=mx+c，最小二乘解

线性方程组的解法：

>>> a = np.array([[3,1], [1,2]])
>>> b = np.array([9,8])
>>> x = np.linalg.solve(a, b) #
>>> x
array([ 2.,  3.])

验证计算结果：

>>> np.allclose(np.dot(a, x), b)
True

最小二乘法实例：

>>> x = np.array([0, 1, 2, 3])
>>> y = np.array([-1, 0.2, 0.9, 2.1])
>>> A = np.vstack([x, np.ones(len(x))]).T #
>>> A
array([[ 0.,  1.],
       [ 1.,  1.],
       [ 2.,  1.],
       [ 3.,  1.]])
>>> m, c = np.linalg.lstsq(A, y)[0] #
>>> print(m, c)
1.0 -0.95

使用matplotlib可以画出拟合曲线：

>>> import matplotlib.pyplot as plt
>>> plt.plot(x, y, 'o', label='Original data', markersize=10)
>>> plt.plot(x, m*x + c, 'r', label='Fitted line')
>>> plt.legend()
>>> plt.show()

random函数

`np.random.seed()`	给出随机数生成器种子
`np.random.permutation()`	随机排列，可给出范围或种子
`np.random.shuffle()`	对序列随机排列
`np.random.rand()`	均匀分布
`np.random.randint()`	指定上下限内的随机整数
`np.random.randn()`	正态分布
`np.random.binomal()`	二项分布
`np.random.normal()`	正态(高斯)分布
`np.random.beta()`	Beta分布
`np.random.gamma()`	Gamma分布
`np.random.uniform()`	[0,1)均匀分布

使用help或dir可查看更多函数及其详细用法。

where

>>> result = np.where(cond, A, B)

cond是表达式、bool值或矩阵，详见help(np.where)。

bool矩阵

`a.any()`	a中是否存在bool值为1/True的元素
`a.all()`	a中元素是否全部为1/True
`(a>0).sum()`	计算符合条件`a>0`的元素之和

[TOP]