# pandas Python Data Analysis Library

import numpy as np
import pandas as pd

# Series DataFrame

from pandas import Series, DataFrame

A Series is a one-dimensional, array-like object with two main components: values (the data) and index (the labels).

# Creating a Series

From a list or a NumPy array

n = np.array([1, 4, 5, 7, 8, 3])
s = Series(n, index=list('abcdef'))
type(s)
# pandas.core.series.Series
type(n)
# numpy.ndarray

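The index can be specified through the index parameter, or by assigning to the index attribute after creation, as here: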

n = np.array([0,2,4,6,8])
s = Series(n)
s.index = list('abcde')
s

a    0
b    2
c    4
d    6
e    8
dtype: int32

s.index = ['张三','李四','Michael','sara','lisa']
s['张三'] = 100
s

张三         100
李四           2
Michael      4
sara         6
lisa         8
dtype: int32

n
# array([100,   2,   4,   6,   8])

rn = list("abcdf")
rn1 = Series(rn)
rn1[2] = 1000
rn
# ['a', 'b', 'c', 'd', 'f']

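Note that a Series created from an ndarray references the array's data rather than holding a copy, so changing an element of the Series also changes the corresponding element of the original ndarray. (A Series created from a list does not behave this way.) This reflects the pandas version used here; with Copy-on-Write enabled, the constructor copies the array instead.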
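If the shared data is not wanted, one option is to ask for a copy explicitly. The following is a minimal sketch, assuming the `copy` parameter of the `Series` constructor; the array and labels simply mirror the example above.

```python
import numpy as np
import pandas as pd

n = np.array([0, 2, 4, 6, 8])

# copy=True asks pandas to store its own copy of the array's data
s = pd.Series(n, index=list('abcde'), copy=True)
s['a'] = 100

print(s['a'])  # 100
print(n[0])    # 0, the source array is unchanged
```

With an explicit copy, the result no longer depends on whether the installed pandas version shares memory by default.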

From a dictionary

dic = {'a': np.random.randint(0, 10, size=(2, 3)),
       'b': np.random.randint(0, 10, size=(2, 3)),
       'c': np.random.randint(0, 10, size=(2, 3))}

s2 = Series(dic)
s2

a    [[9, 2, 5], [9, 5, 1]]
b    [[2, 7, 7], [5, 7, 6]]
c    [[5, 9, 6], [1, 4, 2]]
dtype: object
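The dictionary above holds 2-D arrays, so each element is an array and the dtype is object. A more common case uses scalar values: the keys become the index and the values become the data. A small sketch with hypothetical scores (the names and numbers are illustrative only):

```python
import pandas as pd

# Hypothetical subject scores; dictionary keys become the index labels
scores = {'Math': 83, 'English': 132, 'Python': 40}
s = pd.Series(scores)

print(s.index.tolist())  # ['Math', 'English', 'Python']
print(s['English'])      # 132
print(s.dtype)           # int64
```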

# Series features

Explicit indexing
- Use an element of the index as the key
- Use .loc[] (recommended)

s = Series(data = np.random.randint(0,150,size = 4),index=['语文','数学','英语','Python'])

语文        143
数学         83
英语        132
Python     40
dtype: int32

s['Python']
# 40
s[['Python','数学']]
# Python    40
# 数学        83
# dtype: int32

s.loc[['Python','数学']]
# Python    40
# 数学        83
# dtype: int32

Implicit indexing
- Use an integer position as the key
- Use .iloc[] (recommended)

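# s[0] worked in older pandas, but positional lookup with [] on a labeled Series is deprecated
s.iloc[0]
# 143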

s.iloc[[1,2]]
# 数学     83
# 英语    132
# dtype: int32
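One reason .loc[] and .iloc[] are recommended over plain [] is that they stay unambiguous when the index itself contains integers. A minimal sketch, using a made-up Series whose integer labels run in the opposite order to the positions:

```python
import pandas as pd

# Integer labels 2, 1, 0 deliberately disagree with positions 0, 1, 2
s = pd.Series(['a', 'b', 'c'], index=[2, 1, 0])

by_label = s.loc[0]      # label 0 is the last element
by_position = s.iloc[0]  # position 0 is the first element

print(by_label)     # c
print(by_position)  # a
```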

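Slicing

Slicing with explicit (label) indexes includes both endpoints; slicing with implicit (positional) indexes is half-open, including the start but excluding the stop.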

s.loc['语文':'英语']
# 语文    143
# 数学     83
# 英语    132
# dtype: int32

s.iloc[0:2]
# 语文    143
# 数学     83
# dtype: int32

# Series basics

A Series behaves like a fixed-length, ordered dictionary. Its properties are available through attributes such as shape, size, index, and values.

s.shape
# (4,)

s.size
# 4

s.values
# array([143,  83, 132,  40])

s.index
# Index(['语文', '数学', '英语', 'Python'], dtype='object')

Use head() and tail() to quickly preview a Series.
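A short sketch of both methods on a made-up ten-element Series; by default each returns five rows, and an integer argument changes that:

```python
import pandas as pd

s = pd.Series(range(10))

first = s.head()   # first 5 elements by default
last = s.tail(3)   # last 3 elements

print(first.tolist())  # [0, 1, 2, 3, 4]
print(last.tolist())   # [7, 8, 9]
```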

When an index label has no corresponding value, the missing data is shown as NaN (not a number).

s = Series(data = ['张三','Sara',None])
#0      张三
#1    Sara
#2    None
#dtype: object
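The example above stores None in an object Series. To see NaN appear because a label has no value, one possible illustration is reindex() with a label that does not exist; the scores and the 'Art' label here are hypothetical:

```python
import pandas as pd

s = pd.Series({'Math': 83, 'Python': 40})

# 'Art' is not in the original index, so its value is filled with NaN
r = s.reindex(['Math', 'Art'])

print(r['Math'])           # 83.0
print(pd.isnull(r['Art'])) # True
print(r.dtype)             # float64, since NaN forces a float dtype
```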

Missing data can be detected with pd.isnull() and pd.notnull(), or with the Series' own isnull() and notnull() methods.

pd.isnull(s)

0    False
1    False
2     True
dtype: bool

Filtering out null values

s_notnull = s.notnull()
s[s_notnull]

0      张三
1    Sara
dtype: object
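The boolean-mask filter above can also be written with dropna(), which is not used in the original notes but gives the same result. A brief sketch on the same data:

```python
import pandas as pd

s = pd.Series(['张三', 'Sara', None])

filtered = s[s.notnull()]  # boolean mask, as in the notes
dropped = s.dropna()       # built-in shortcut for the same filter

print(filtered.tolist())         # ['张三', 'Sara']
print(filtered.equals(dropped))  # True
```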

Arithmetic

s2 = Series(data = np.random.randint(0,100,size = 5))

0    89
1    18
2    35
3    32
4    47
dtype: int32

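When two Series are added, values are aligned by index label. A label that exists in only one of them produces NaN in the result.

To get a usable value for every index label, use the .add() method with fill_value.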

s = Series(data = np.random.randint(0,100,size = 10))
s2 = Series(data = np.random.randint(0,100,size = 5))
s.add(s2,fill_value=0)

0     51.0
1    106.0
2     92.0
3     54.0
4     62.0
5     14.0
6      6.0
7     64.0
8     10.0
9     70.0
dtype: float64
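To make the difference concrete, here is a small sketch comparing the + operator with add(fill_value=0) on two made-up Series of different lengths:

```python
import pandas as pd

a = pd.Series([1, 2, 3])
b = pd.Series([10, 20])

plus = a + b                     # label 2 exists only in a, so it becomes NaN
filled = a.add(b, fill_value=0)  # the missing side is treated as 0

print(plus.isnull().tolist())  # [False, False, True]
print(filled.tolist())         # [11.0, 22.0, 3.0]
```

The result is float64 in both cases, because alignment introduces missing values before the addition is carried out.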
