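## 0. Introduction
The basic data structures in pandas are Series and DataFrame. Operating on every element, or on every row or column, is a common need in data processing, and pandas ships with the built-in methods map, applymap and apply for exactly this. Below we walk through basic, everyday and more advanced uses with concrete examples.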
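## 1. The map method
map is one of the basic operations in data processing, and its importance needs no elaboration. It generally operates on elements one at a time. Let's look at a few examples.
First, note that map only works on a Series, not on a DataFrame; in other words, DataFrame has no map method. (This holds for pandas versions before 2.1. Since pandas 2.1, `DataFrame.applymap` has been renamed to `DataFrame.map`, and `applymap` is deprecated.)
Part of the source of `Series.map` (its signature and docstring) is shown below: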
```python
def map(self, arg, na_action=None):
    """
    Map values of Series according to input correspondence.

    Used for substituting each value in a Series with another value,
    that may be derived from a function, a ``dict`` or
    a :class:`Series`.

    Parameters
    ----------
    arg : function, collections.abc.Mapping subclass or Series
        Mapping correspondence.
    na_action : {None, 'ignore'}, default None
        If 'ignore', propagate NaN values, without passing them to the
        mapping correspondence.

    Returns
    -------
    Series
        Same index as caller.

    See Also
    --------
    Series.apply : For applying more complex functions on a Series.
    DataFrame.apply : Apply a function row-/column-wise.
    DataFrame.applymap : Apply a function elementwise on a whole DataFrame.

    Notes
    -----
    When ``arg`` is a dictionary, values in Series that are not in the
    dictionary (as keys) are converted to ``NaN``. However, if the
    dictionary is a ``dict`` subclass that defines ``__missing__`` (i.e.
    provides a method for default values), then this default is used
    rather than ``NaN``.
    """
    ...  # implementation omitted in this excerpt
```
The main parameter of map is `arg`, which can be a function, a dict (or other mapping) or a Series; it is applied to each element.
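The docstring above lists three kinds of `arg` (function, mapping, Series) and two rules for missing values: keys absent from a plain dict become NaN, a dict subclass with `__missing__` supplies its default instead, and `na_action="ignore"` skips NaN entirely. A minimal sketch of all of them follows; the labels "M", "F" and "?" are made up, and the Series is built with `dtype=object` on purpose so the result does not depend on how a given pandas version stores strings.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

# Sample data (made up): plain Python strings plus one NaN
s = pd.Series(["male", "female", "other", np.nan], dtype=object)
labels = {"male": "M", "female": "F"}

# 1) dict: values that are not keys ("other", NaN) become NaN
by_dict = s.map(labels)

# 2) dict subclass defining __missing__ (defaultdict): "other" gets "?" instead of NaN
by_default = s.map(defaultdict(lambda: "?", labels))

# 3) Series: each value is looked up in the mapping Series' index
by_series = s.map(pd.Series(labels))

# 4) function + na_action="ignore": NaN is passed through without calling
#    str.upper, which would raise TypeError on a float NaN
by_func = s.map(str.upper, na_action="ignore")
```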
An example:
```python
import pandas as pd

def test():
    genders = ["male", "male", "female", "unknown", "female"]
    levels = ["L1", "L2", "L1", "L1", "L2"]
    df = pd.DataFrame({"gender": genders, "level": levels})
    # English label -> Chinese label (男 = male, 女 = female, 未知 = unknown)
    gender_dic = {"male": "男", "female": "女", "unknown": "未知"}
    print(df)
    print()
    # map looks up every value of the gender column in the dict
    df["gender"] = df["gender"].map(gender_dic)
    print(df)

if __name__ == "__main__":
    test()
```
Output:
```
    gender level
0     male    L1
1     male    L2
2   female    L1
3  unknown    L1
4   female    L2

  gender level
0      男    L1
1      男    L2
2      女    L1
3     未知    L1
4      女    L2
```
The code above maps the values of the gender column: male becomes 男 ("male"), female becomes 女 ("female") and unknown becomes 未知 ("unknown").
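```python
import pandas as pd

def test():
    x = [i for i in range(1, 11)]
    y = [2 * i + 0.5 for i in x]
    df = pd.DataFrame({"x": x, "y": y})
    x2 = df["x"]
    # two equivalent ways to format each value as a string with two decimals
    print(x2.map(lambda i: "%.2f" % i))
    print(x2.map(lambda i: "{:.2f}".format(i)))

if __name__ == "__main__":
    test()
```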
```
0     1.00
1     2.00
2     3.00
3     4.00
4     5.00
5     6.00
6     7.00
7     8.00
8     9.00
9    10.00
Name: x, dtype: object
0     1.00
1     2.00
2     3.00
3     4.00
4     5.00
5     6.00
6     7.00
7     8.00
8     9.00
9    10.00
Name: x, dtype: object
```
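The code above formats each value of x with two decimal places. Note that the results are strings such as "1.00" (hence `dtype: object`), not floats.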
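A short sketch of that difference, with made-up values: formatting with `"{:.2f}".format` yields strings meant for display, whereas `Series.round` (a different tool that the original example does not use) keeps the values numeric.

```python
import pandas as pd

s = pd.Series([1 / 3, 2 / 3])

# map + string formatting: every element becomes a str such as "0.33"
as_text = s.map("{:.2f}".format)

# rounding keeps a numeric float64 Series
as_float = s.round(2)

print(as_text.tolist())   # ['0.33', '0.67']
print(as_float.tolist())  # [0.33, 0.67]
```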
Whether you map with a dict or with a function, map passes each value to the dict or function one at a time and uses the result as the mapped value.
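To see that map really hands the values over one at a time, a small sketch can record every argument the function receives; `record` and `seen` are hypothetical names used only for this illustration.

```python
import pandas as pd

seen = []  # hypothetical: collects every value map passes to record()

def record(value):
    seen.append(value)   # remember what this call received
    return value * 10    # the mapped value

s = pd.Series([1, 2, 3])
result = s.map(record)

# record() was called once per element, each time with a single scalar
print(seen)              # [1, 2, 3]
print(result.tolist())   # [10, 20, 30]
```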
## 2. The applymap method
As mentioned above, a DataFrame has no map method. To apply a map-like operation to every element of a DataFrame, use applymap.
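```python
import pandas as pd

def t8():
    x = [i for i in range(1, 11)]
    y = [2 * i + 0.5 for i in x]
    df = pd.DataFrame({"x": x, "y": y})
    print(df)
    print()
    # applymap calls the function on every element (pandas >= 2.1 also offers df.map)
    print(df.applymap(lambda i: "%.2f" % i))

if __name__ == "__main__":
    t8()
```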
```
    x     y
0   1   2.5
1   2   4.5
2   3   6.5
3   4   8.5
4   5  10.5
5   6  12.5
6   7  14.5
7   8  16.5
8   9  18.5
9  10  20.5

       x      y
0   1.00   2.50
1   2.00   4.50
2   3.00   6.50
3   4.00   8.50
4   5.00  10.50
5   6.00  12.50
6   7.00  14.50
7   8.00  16.50
8   9.00  18.50
9  10.00  20.50
```
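The earlier example applied map to the x column to format its values with two decimal places. To format both x and y in the DataFrame at once, use applymap as shown above. As before, the results are strings.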
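In pandas 2.1 and later, `DataFrame.applymap` was renamed to `DataFrame.map` and `applymap` is deprecated. A minimal, version-tolerant sketch of the same element-wise formatting, assuming only that one of the two methods exists; `two_decimals` is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [2.5, 4.5]})

def two_decimals(v):
    # format one scalar as a string with two decimal places
    return "{:.2f}".format(v)

# pandas >= 2.1: DataFrame.map (applymap still works there but is deprecated);
# older pandas: only DataFrame.applymap exists
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
formatted = elementwise(two_decimals)

print(formatted)
```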
## 3. The apply method
apply is similar to map; the main difference is that apply can be given more complex functions.
```python
def apply(self, func, convert_dtype=True, args=(), **kwds):
    """
    Invoke function on values of Series.

    Can be ufunc (a NumPy function that applies to the entire Series)
    or a Python function that only works on single values.

    Parameters
    ----------
    func : function
        Python function or NumPy ufunc to apply.
    convert_dtype : bool, default True
        Try to find better dtype for elementwise function results. If
        False, leave as dtype=object.
    args : tuple
        Positional arguments passed to func after the series value.
    **kwds
        Additional keyword arguments passed to func.

    Returns
    -------
    Series or DataFrame
        If func returns a Series object the result will be a DataFrame.

    See Also
    --------
    Series.map: For element-wise operations.
    Series.agg: Only perform aggregating type operations.
    Series.transform: Only perform transforming type operations.
    """
    ...  # implementation omitted in this excerpt
```
Looking at the source of apply, its method signature is:
```
def apply(self, func, convert_dtype=True, args=(), **kwds):
```
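Compared with map, apply takes not only `func` but also extra positional arguments as a tuple (`args`) and extra keyword arguments (`**kwds`), which are passed to `func` after the Series value. This is what lets it work with functions that need more inputs.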
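A minimal sketch of both kinds of extra argument; `scale_and_shift` is a hypothetical helper. Each call receives the Series value first, then the positional `args`, then any keyword arguments given to `apply`.

```python
import pandas as pd

def scale_and_shift(x, factor, offset=0):
    # hypothetical helper: x is one Series value, factor comes from args,
    # offset comes from the keyword arguments forwarded by apply
    return x * factor + offset

s = pd.Series([20, 21, 12], index=["London", "New York", "Helsinki"])

# equivalent to calling scale_and_shift(x, 2, offset=1) for every value x
result = s.apply(scale_and_shift, args=(2,), offset=1)
print(result.tolist())  # [41, 43, 25]
```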
Let's look at some examples:
```python
import numpy as np
import pandas as pd

def square(x):
    return x**2

def test():
    s = pd.Series([20, 21, 12], index=["London", "New York", "Helsinki"])
    s1 = s.apply(lambda x: x**2)  # lambda
    s2 = s.apply(square)          # named function
    s3 = s.apply(np.log)          # NumPy ufunc
    print(s1)
    print()
    print(s2)
    print()
    print(s3)

if __name__ == "__main__":
    test()
```
Output:
```
London      400
New York    441
Helsinki    144
dtype: int64

London      400
New York    441
Helsinki    144
dtype: int64

London      2.995732
New York    3.044522
Helsinki    2.484907
dtype: float64
```
These uses are simple and work exactly as with map.
Now a slightly more complex example:
```python
import pandas as pd

def BMI(series):
    # series is one row of df, indexed by the column names
    weight = series["weight"]
    height = series["height"] / 100  # cm -> m
    BMI_Rate = weight / height**2
    return BMI_Rate

def test():
    heights = [180, 175, 169, 158, 185]
    weights = [75, 72, 68, 60, 76]
    age = [30, 18, 26, 42, 34]
    df = pd.DataFrame({"height": heights, "weight": weights, "age": age})
    print(df)
    print()
    # axis=1: call BMI once for every row
    df["BMI"] = df.apply(BMI, axis=1)
    print(df)

if __name__ == "__main__":
    test()
```
Output:
```
   height  weight  age
0     180      75   30
1     175      72   18
2     169      68   26
3     158      60   42
4     185      76   34

   height  weight  age        BMI
0     180      75   30  23.148148
1     175      72   18  23.510204
2     169      68   26  23.808690
3     158      60   42  24.034610
4     185      76   34  22.205990
```
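The data contain each person's height and weight, and the BMI is computed as weight / height² (weight in kg, height in meters; the heights here are in cm, which is why `BMI` divides by 100).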
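As a quick arithmetic check of the formula against the first row of the output (75 kg, 180 cm):

$$
\mathrm{BMI} = \frac{\text{weight}_{\text{kg}}}{\left(\text{height}_{\text{cm}} / 100\right)^2} = \frac{75}{1.80^2} = \frac{75}{3.24} \approx 23.148148
$$

which matches the value 23.148148 printed for row 0.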
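The apply call above passes `axis=1`, which means the function is applied to each row. If that is hard to remember, think of it this way: `axis=1` collapses the column dimension and keeps the row dimension, so the function works on the data of one row at a time. When it runs, apply simply calls `BMI` on each row, passing the row as a Series indexed by the column names.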
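A small sketch that makes the axis rule visible: with `lambda s: s.name`, each call reports which label it was given. The two-column DataFrame is sample data.

```python
import pandas as pd

df = pd.DataFrame({"height": [180, 175], "weight": [75, 72]})

# axis=0 (the default): one call per column; s.name is the column label
per_column = df.apply(lambda s: s.name, axis=0)

# axis=1: one call per row; s.name is the row label
per_row = df.apply(lambda s: s.name, axis=1)

# with axis=1 each row arrives as a Series indexed by the column names,
# which is why BMI() above can write series["weight"]
row_fields = df.apply(lambda s: ",".join(s.index), axis=1)

print(per_column.tolist())  # ['height', 'weight']
print(per_row.tolist())     # [0, 1]
print(row_fields.tolist())  # ['height,weight', 'height,weight']
```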
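```python
import pandas as pd

def subtract_custom_value(x, custom_value):
    return x - custom_value

def test():
    s = pd.Series([20, 21, 12], index=["London", "New York", "Helsinki"])
    print(s)
    print()
    # args=(5,) is appended after each value: subtract_custom_value(x, 5)
    s1 = s.apply(subtract_custom_value, args=(5,))
    print(s1)

if __name__ == "__main__":
    test()
```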
Output:
```
London      20
New York    21
Helsinki    12
dtype: int64

London      15
New York    16
Helsinki     7
dtype: int64
```
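When this code runs, 5 is subtracted from each value. Because the extra argument 5 has to be passed in, map cannot do this directly: in the version whose signature is shown above, map has no parameter for forwarding extra arguments to the function, so the call would have to be wrapped in a one-argument function first.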
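For comparison, a sketch of two common ways to get the same result with map by fixing the extra argument beforehand: a lambda, or `functools.partial` from the Python standard library.

```python
import functools

import pandas as pd

def subtract_custom_value(x, custom_value):
    return x - custom_value

s = pd.Series([20, 21, 12], index=["London", "New York", "Helsinki"])

# apply forwards the extra argument itself
via_apply = s.apply(subtract_custom_value, args=(5,))

# map needs a one-argument callable, so custom_value is fixed in advance
via_lambda = s.map(lambda x: subtract_custom_value(x, 5))
via_partial = s.map(functools.partial(subtract_custom_value, custom_value=5))

print(via_apply.tolist(), via_lambda.tolist(), via_partial.tolist())
# [15, 16, 7] [15, 16, 7] [15, 16, 7]
```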
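## 4. Summary
1. map is a basic operation on Series; a DataFrame has no map method.
2. To perform a map-like operation on every element of a DataFrame, use applymap.
3. apply is more flexible: it works on both Series and DataFrame, and it can pass extra arguments to the function as a tuple (`args`) or as keyword arguments.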