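## 1. Anaconda's default channel is too slow

Anaconda's default channel is very slow when downloading and installing packages. As soon as a package is even moderately large, the speed becomes practically unacceptable. Switching the channel configuration to a mirror improves this considerably.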
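## 2. Check the conda version

Run the following command:

```
conda --version
conda 4.8.2
```

If the local conda version is printed, conda has been installed successfully.

If nothing is printed, reconfigure the `PATH` environment variable.

Edit the `.bash_profile` configuration file so that Anaconda's `bin` directory is prepended to `PATH` (assigning the bare path would overwrite `PATH`):

```
export PATH="<path to anaconda>/bin:$PATH"
```

Then run `source ~/.bash_profile` (or open a new terminal) for the change to take effect.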
## 3. Modify the .condarc configuration

Open `~/.condarc` with vim. If the file does not exist, simply create it.

Edit the file and add the mirror channel:

```
channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
show_channel_urls: true
```

These three lines are all that `~/.condarc` needs.

After this change, installing new packages is much faster. I have tested it myself, and it works!
## 4. Verify the change

```
conda info

     active environment : base
    active env location : /Users/wanglei/anaconda3/anaconda3
            shell level : 1
       user config file : /Users/wanglei/.condarc
 populated config files : /Users/wanglei/.condarc
          conda version : 4.8.2
    conda-build version : 3.18.11
         python version : 3.7.6.final.0
       virtual packages : __osx=10.15.3
       base environment : /Users/wanglei/anaconda3/anaconda3 (writable)
           channel URLs : https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/osx-64
                          https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/noarch
          package cache : /Users/wanglei/anaconda3/anaconda3/pkgs
                          /Users/wanglei/.conda/pkgs
       envs directories : /Users/wanglei/anaconda3/anaconda3/envs
                          /Users/wanglei/.conda/envs
               platform : osx-64
             user-agent : conda/4.8.2 requests/2.22.0 CPython/3.7.6 Darwin/19.3.0 OSX/10.15.3
                UID:GID : 501:20
             netrc file : None
           offline mode : False
```

Look at the `channel URLs` field. It now lists the mirror we added, which shows that the configuration has taken effect.
***
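I called sklearn's `model_selection` and found that sklearn had no `model_selection` module. On checking, the sklearn shipped with Anaconda turned out to be too old: version 0.17. (`model_selection` was only introduced in scikit-learn 0.18.) So began my sklearn upgrade journey.

## 1. Check the current version

First, use the `conda list` command to see the installed version:

![conda list output showing the installed scikit-learn version](https://github.com/bitcarmanlee/easy-algorithm-interview-photo/blob/master/languages/python/updatesklearn/1.png)

Sure enough, the version is 0.17.1. That is too old, so it is time to upgrade.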
## 2. Upgrade to the latest version

Use the `conda update scikit-learn` command to update sklearn. Before updating, conda shows which version it will update to.

![conda update scikit-learn showing the target version](https://github.com/bitcarmanlee/easy-algorithm-interview-photo/blob/master/languages/python/updatesklearn/2.png)

As shown, the latest version at the time of writing is 0.19.0.

Confirm to start the update.

![Confirming and running the conda update](https://github.com/bitcarmanlee/easy-algorithm-interview-photo/blob/master/languages/python/updatesklearn/3.png)

This update involves many packages, some of them large, so expect to wait quite a while...

Once the update finishes, the `model_selection` module works as expected.
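
You can check ahead of time whether an installed scikit-learn is new enough for `model_selection` (0.18 or later) with a small version check. This is a minimal sketch. The helper name `supports_model_selection` is hypothetical, and it compares only the major and minor version numbers:

```python
import re


def supports_model_selection(version):
    """Return True if a scikit-learn version string is 0.18 or newer.

    Only the leading "major.minor" part is compared, so suffixes such as
    ".post1" or "rc1" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (0, 18)


if __name__ == "__main__":
    try:
        import sklearn
    except ImportError:
        print("scikit-learn is not installed")
    else:
        print(sklearn.__version__, supports_model_selection(sklearn.__version__))
```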
***
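## 1. Preface

pandas can turn the data it reads (not necessarily from csv or txt files) into a DataFrame. The DataFrame can then be manipulated conveniently for all kinds of data analysis. Below we walk through the IO operations commonly used in pandas in detail.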
## 2. read_csv

read_csv is most commonly used to read data from a file. Its default separator is a comma.

Sample data:

```
57647:0.059819,26223:0.048002,100295:0.055268,60232:0.049508
35824:0.04753,57776:0.055802,40677:0.049119,14445:0.040235
102136:0.052933,3736:0.07613,21681:0.10266,44816:0.058018
```

Without column names:

```
import pandas as pd


def read11():
    # header=None: the file has no header row, so the columns are numbered 0, 1, 2, 3
    data = pd.read_csv("../data/tt3", header=None)
    print(data)
```

```
0 1 2 3
0 57647:0.059819 26223:0.048002 100295:0.055268 60232:0.049508
1 35824:0.04753 57776:0.055802 40677:0.049119 14445:0.040235
2 102136:0.052933 3736:0.07613 21681:0.10266 44816:0.058018
```

With column names:

```
def read12():
    # names supplies the column labels
    data = pd.read_csv("../data/tt3", header=None, names=['c1', 'c2', 'c3', 'c4'])
    print(data)
```

```
c1 c2 c3 c4
0 57647:0.059819 26223:0.048002 100295:0.055268 60232:0.049508
1 35824:0.04753 57776:0.055802 40677:0.049119 14445:0.040235
2 102136:0.052933 3736:0.07613 21681:0.10266 44816:0.058018
```

None of the data above has a header row, so `header=None` is set. When `names` is passed, pandas already behaves as if `header=None`, so the argument is optional in `read12`.
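
The file `../data/tt3` is not included here. This sketch reproduces the read from an in-memory copy of the sample data via `io.StringIO`; the variable name `SAMPLE` is only for illustration. Each cell stays a string such as `"57647:0.059819"`, and splitting the `id:weight` pairs is not covered:

```python
import io

import pandas as pd

# The sample rows from above, held in memory instead of ../data/tt3
SAMPLE = (
    "57647:0.059819,26223:0.048002,100295:0.055268,60232:0.049508\n"
    "35824:0.04753,57776:0.055802,40677:0.049119,14445:0.040235\n"
    "102136:0.052933,3736:0.07613,21681:0.10266,44816:0.058018\n"
)

# No header row in the data, so header=None; names labels the four columns
data = pd.read_csv(io.StringIO(SAMPLE), header=None, names=["c1", "c2", "c3", "c4"])
print(data)
```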
## 3. read_table

read_csv uses a comma as its default separator. To read data with a different separator, you can use read_table and pass the separator via `sep` (read_table itself defaults to a tab).

The data (fields separated by tabs):

```
8803b236442fed8a37a5beb04556f684 54530 1 0
f4afa2b8f50e8aacf967628b3dc11bff 102017 1 0
4077bb0001ba73f4dd966a7f6fb46075 50276 1 0
```

The code:

```
def read21():
    data = pd.read_table("../data/tt1", header=None, sep='\t')
    print(data)


def read22():
    data = pd.read_table("../data/tt1", header=None, sep='\t', names=['c1', 'c2', 'c3', 'c4'])
    print(data)


read21()
read22()
```

```
FutureWarning: read_table is deprecated, use read_csv instead.
  data = pd.read_table("../data/tt1", header=None, sep='\t')
0 1 2 3
0 8803b236442fed8a37a5beb04556f684 54530 1 0
1 f4afa2b8f50e8aacf967628b3dc11bff 102017 1 0
2 4077bb0001ba73f4dd966a7f6fb46075 50276 1 0
FutureWarning: read_table is deprecated, use read_csv instead.
  data = pd.read_table("../data/tt1", header=None, sep='\t', names=['c1', 'c2', 'c3', 'c4'])
c1 c2 c3 c4
0 8803b236442fed8a37a5beb04556f684 54530 1 0
1 f4afa2b8f50e8aacf967628b3dc11bff 102017 1 0
2 4077bb0001ba73f4dd966a7f6fb46075 50276 1 0
```

According to the warning, read_table is deprecated and read_csv should be used instead. This warning comes from pandas 0.24, and pandas 0.25 reverted the deprecation. Even so, `pd.read_csv(..., sep='\t')` works in every version.
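
As a sketch of the read_csv route, the tab-separated rows can be read with `sep="\t"`. This uses an in-memory copy of two rows instead of `../data/tt1`; the name `TSV` is illustrative:

```python
import io

import pandas as pd

# Two of the tab-separated rows from above, held in memory instead of ../data/tt1
TSV = (
    "8803b236442fed8a37a5beb04556f684\t54530\t1\t0\n"
    "f4afa2b8f50e8aacf967628b3dc11bff\t102017\t1\t0\n"
)

# read_csv with sep="\t" produces the same DataFrame as the read_table calls above
data = pd.read_csv(io.StringIO(TSV), header=None, sep="\t", names=["c1", "c2", "c3", "c4"])
print(data)
```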
## 4. Setting the index

```
def read31():
    # Use column c2 as the row index instead of the default 0, 1, 2, ...
    data = pd.read_csv("../data/tt3", header=None, names=['c1', 'c2', 'c3', 'c4'], index_col='c2')
    print(data)


read31()
```

```
c1 c3 c4
c2
26223:0.048002 57647:0.059819 100295:0.055268 60232:0.049508
57776:0.055802 35824:0.04753 40677:0.049119 14445:0.040235
3736:0.07613 102136:0.052933 21681:0.10266 44816:0.058018
```

Compare this with the earlier results. Once `index_col` makes a column the row index, the default integer index starting from 0 disappears. Without `index_col`, pandas uses the row number, starting from 0, as the index.

```
def read32():
    # Two columns together form the index
    data = pd.read_csv("../data/tt3", header=None, names=['c1', 'c2', 'c3', 'c4'], index_col=['c2', 'c1'])
    print(data)
```

```
c3 c4
c2 c1
26223:0.048002 57647:0.059819 100295:0.055268 60232:0.049508
57776:0.055802 35824:0.04753 40677:0.049119 14445:0.040235
3736:0.07613 102136:0.052933 21681:0.10266 44816:0.058018
```

The example above uses several columns together as the index, which produces a hierarchical MultiIndex.
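
The following sketch, again on in-memory sample rows, confirms that `index_col` with two column names gives a two-level `MultiIndex`. It also shows that rows are then looked up by a `(c2, c1)` tuple:

```python
import io

import pandas as pd

SAMPLE = (
    "57647:0.059819,26223:0.048002,100295:0.055268,60232:0.049508\n"
    "35824:0.04753,57776:0.055802,40677:0.049119,14445:0.040235\n"
)

data = pd.read_csv(
    io.StringIO(SAMPLE),
    header=None,
    names=["c1", "c2", "c3", "c4"],
    index_col=["c2", "c1"],  # two columns -> a two-level MultiIndex
)

# A row is addressed by a (c2, c1) tuple
print(data.loc[("26223:0.048002", "57647:0.059819"), "c3"])
```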
## 5. Dropping missing values

Real-world data is rarely clean, and missing values are very common. Handling them, by dropping or filling them, is therefore an important part of data preprocessing.

When reading a file, pandas by default treats special strings such as `NA` and `NULL` as missing values and replaces them with `NaN`.

The data:

```
8803b236442fed8a37a5beb04556f684 54530 1 0 abc NULL
f4afa2b8f50e8aacf967628b3dc11bff 102017 1 0 NULL NULL
4077bb0001ba73f4dd966a7f6fb46075 50276 1 0 NULL NULL
4077bb0001ba73f4dd966a7f6fb46075 50276 1 0 456 123
```

```
def read41():
    data = pd.read_table("../data/tt2", header=None, sep='\t')
    print(data)
```

```
0 1 2 3 4 5
0 8803b236442fed8a37a5beb04556f684 54530 1 0 abc NaN
1 f4afa2b8f50e8aacf967628b3dc11bff 102017 1 0 NaN NaN
2 4077bb0001ba73f4dd966a7f6fb46075 50276 1 0 NaN NaN
3 4077bb0001ba73f4dd966a7f6fb46075 50276 1 0 456 123.0
```
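
This sketch uses an in-memory copy of two of those rows. It checks that `NULL` is read as `NaN` by default. It also shows the `na_values` parameter, which adds extra strings to treat as missing; `abc` is chosen purely for illustration:

```python
import io

import pandas as pd

TSV = (
    "8803b236442fed8a37a5beb04556f684\t54530\t1\t0\tabc\tNULL\n"
    "4077bb0001ba73f4dd966a7f6fb46075\t50276\t1\t0\t456\t123\n"
)

# Default behaviour: "NULL" (like "NA" and "NaN") becomes NaN
default = pd.read_csv(io.StringIO(TSV), header=None, sep="\t")

# na_values adds more missing-value markers on top of the defaults
custom = pd.read_csv(io.StringIO(TSV), header=None, sep="\t", na_values=["abc"])

print(default.isna().sum().sum())  # one NaN: the NULL in column 5
print(custom.isna().sum().sum())   # two NaNs: the NULL plus "abc"
```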
### 5.1 Dropping rows and columns that contain missing values

```
def read42():
    data = pd.read_table("../data/tt2", header=None, sep='\t')
    # Drop every row that contains at least one NaN
    print(data.dropna())
```

```
0 1 2 3 4 5
3 4077bb0001ba73f4dd966a7f6fb46075 50276 1 0 456 123.0
```

`dropna` drops the rows that contain missing values. To drop the columns that contain missing values instead, specify `axis=1`:

```
def read42_cols():
    data = pd.read_table("../data/tt2", header=None, sep='\t')
    # axis=1: drop every column that contains at least one NaN
    print(data.dropna(axis=1))
```

```
0 1 2 3
0 8803b236442fed8a37a5beb04556f684 54530 1 0
1 f4afa2b8f50e8aacf967628b3dc11bff 102017 1 0
2 4077bb0001ba73f4dd966a7f6fb46075 50276 1 0
3 4077bb0001ba73f4dd966a7f6fb46075 50276 1 0
```
### 5.2 Dropping rows or columns that are entirely NaN

The data:

```
8803b236442fed8a37a5beb04556f684 54530 1 0 abc NULL
f4afa2b8f50e8aacf967628b3dc11bff 102017 1 0 NULL NULL
4077bb0001ba73f4dd966a7f6fb46075 50276 1 0 NULL NULL
NULL NaN NULL NULL NULL NULL
```

```
def read43():
    data = pd.read_table("../data/tt2", header=None, sep='\t')
    # how='all': drop only the rows whose values are all NaN
    print(data.dropna(how='all'))
    # the same for columns
    print(data.dropna(how='all', axis=1))
```

```
0 1 2 3 4 5
0 8803b236442fed8a37a5beb04556f684 54530.0 1.0 0.0 abc NaN
1 f4afa2b8f50e8aacf967628b3dc11bff 102017.0 1.0 0.0 NaN NaN
2 4077bb0001ba73f4dd966a7f6fb46075 50276.0 1.0 0.0 NaN NaN
0 1 2 3 4
0 8803b236442fed8a37a5beb04556f684 54530.0 1.0 0.0 abc
1 f4afa2b8f50e8aacf967628b3dc11bff 102017.0 1.0 0.0 NaN
2 4077bb0001ba73f4dd966a7f6fb46075 50276.0 1.0 0.0 NaN
3 NaN NaN NaN NaN NaN
```

The first result drops the last row, which is entirely NaN. The second drops column 5, which is entirely NaN.
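
The following sketch compares the `dropna` variants on a tiny hypothetical DataFrame. It also shows `thresh`, which the text above does not use; `thresh` keeps only rows with at least that many non-NaN values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan],
        "b": [2.0, 5.0, np.nan],
        "c": [np.nan, np.nan, np.nan],
    }
)

print(df.dropna())                   # any NaN in a row -> row dropped (every row here)
print(df.dropna(how="all"))          # only the all-NaN row 2 is dropped
print(df.dropna(how="all", axis=1))  # only the all-NaN column c is dropped
print(df.dropna(thresh=2))           # keep rows with at least 2 non-NaN values
```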
## 6. Filling missing values

More often, we need to fill missing values rather than drop them. Let's see how.

The data:

```
8803b236442fed8a37a5beb04556f684 54530 1 0 abc NULL
f4afa2b8f50e8aacf967628b3dc11bff 102017 1 0 NULL NULL
4077bb0001ba73f4dd966a7f6fb46075 50276 1 0 NULL NULL
NULL NaN NULL NULL NULL NULL
```
### 6.1 Filling all missing values with the same value

```
def read51():
    data = pd.read_table("../data/tt2", header=None, sep='\t')
    # Replace every NaN with 0
    print(data.fillna(0))
```

```
0 1 2 3 4 5
0 8803b236442fed8a37a5beb04556f684 54530.0 1.0 0.0 abc 0.0
1 f4afa2b8f50e8aacf967628b3dc11bff 102017.0 1.0 0.0 0 0.0
2 4077bb0001ba73f4dd966a7f6fb46075 50276.0 1.0 0.0 0 0.0
3 0 0.0 0.0 0.0 0 0.0
```
### 6.2 Filling different columns with different values

Filling everything with one value is rather crude and seldom done in practice. Filling each column with its own value is far more common.

```
def read52():
    data = pd.read_table("../data/tt2", header=None, sep='\t', names=['c1', 'c2', 'c3', 'c4', 'c5', 'c6'])
    # A dict maps column name -> fill value; columns not listed keep their NaNs
    print(data.fillna({'c3': 'c3default', 'c4': 'c4default'}))
```

```
c1 c2 c3 c4 c5 c6
0 8803b236442fed8a37a5beb04556f684 54530.0 1 0 abc NaN
1 f4afa2b8f50e8aacf967628b3dc11bff 102017.0 1 0 NaN NaN
2 4077bb0001ba73f4dd966a7f6fb46075 50276.0 1 0 NaN NaN
3 NaN NaN c3default c4default NaN NaN
```
### 6.3 Forward fill and backward fill

```
def read53():
    data = pd.read_table("../data/tt2", header=None, sep='\t', names=['c1', 'c2', 'c3', 'c4', 'c5', 'c6'])
    # pandas >= 2.1 deprecates fillna(method=...); data.ffill() / data.bfill() are equivalent
    print(data.fillna(method='ffill'))
    print(data.fillna(method='bfill'))
```

```
c1 c2 c3 c4 c5 c6
0 8803b236442fed8a37a5beb04556f684 54530.0 1.0 0.0 abc NaN
1 f4afa2b8f50e8aacf967628b3dc11bff 102017.0 1.0 0.0 abc NaN
2 4077bb0001ba73f4dd966a7f6fb46075 50276.0 1.0 0.0 abc NaN
3 4077bb0001ba73f4dd966a7f6fb46075 50276.0 1.0 0.0 abc NaN
c1 c2 c3 c4 c5 c6
0 8803b236442fed8a37a5beb04556f684 54530.0 1.0 0.0 abc NaN
1 f4afa2b8f50e8aacf967628b3dc11bff 102017.0 1.0 0.0 NaN NaN
2 4077bb0001ba73f4dd966a7f6fb46075 50276.0 1.0 0.0 NaN NaN
3 NaN NaN NaN NaN NaN NaN
```

`ffill` is forward fill: by default, each NaN takes the value from the previous row. Setting `axis=1` fills from the previous column instead.

`bfill` is backward fill: each NaN takes the value from the next row. Where no such value exists (for example in the last row), nothing is filled.
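
Since pandas 2.1 deprecated the `method` argument of `fillna`, the shorthand methods `DataFrame.ffill()` and `DataFrame.bfill()` are the recommended spelling. The following sketch shows what each one does on a small hypothetical frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"c5": ["abc", np.nan, np.nan, "xyz"], "c6": [np.nan, 1.0, np.nan, np.nan]})

forward = df.ffill()   # same result as df.fillna(method="ffill")
backward = df.bfill()  # same result as df.fillna(method="bfill")

print(forward)   # the leading NaN in c6 stays: there is no earlier value
print(backward)  # the trailing NaNs in c6 stay: there is no later value
```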
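### 6.4 Filling with the column mean

```
def read54():
    data = pd.read_table("../data/tt2", header=None, sep='\t', names=['c1', 'c2', 'c3', 'c4', 'c5', 'c6'])
    # data.mean() gives the mean of each column, and fillna fills each column with its own mean.
    # numeric_only=True skips the string columns (pandas >= 2.0 raises a TypeError on them otherwise).
    print(data.fillna(data.mean(numeric_only=True)))
```

```
c1 c2 c3 c4 c5 c6
0 8803b236442fed8a37a5beb04556f684 54530.0 1.0 0.0 abc NaN
1 f4afa2b8f50e8aacf967628b3dc11bff 102017.0 1.0 0.0 NaN NaN
2 4077bb0001ba73f4dd966a7f6fb46075 50276.0 1.0 0.0 NaN NaN
3 NaN 68941.0 1.0 0.0 NaN NaN
```

Only numeric columns have a mean. In row 3, c2 is filled with 68941.0, the mean of 54530, 102017 and 50276, and c3 and c4 get 1.0 and 0.0. The string columns c1 and c5 and the all-NaN column c6 stay NaN.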
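
Here is a small self-contained check of mean filling. It uses the c2 values from the example plus a hypothetical string column, to show that only numeric columns receive a mean:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "c1": ["a", "b", "c", np.nan],  # string column: has no mean
        "c2": [54530.0, 102017.0, 50276.0, np.nan],
    }
)

# Per-column means of the numeric columns only
means = df.mean(numeric_only=True)
filled = df.fillna(means)
print(filled)
```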
## 7. Skipping certain rows

Suppose the file contains junk lines (`test`) between the data rows:

```
8803b236442fed8a37a5beb04556f684 54530 1 0 abc NULL
test
f4afa2b8f50e8aacf967628b3dc11bff 102017 1 0 NULL NULL
test
4077bb0001ba73f4dd966a7f6fb46075 50276 1 0 NULL NULL
NULL NaN NULL NULL NULL NULL
```

```
def read61():
    data1 = pd.read_table("../data/tt2", header=None, sep='\t', names=['c1', 'c2', 'c3', 'c4', 'c5', 'c6'])
    print(data1)
    # skiprows takes 0-based line numbers of the file: lines 1 and 3 are the "test" lines
    data2 = pd.read_table("../data/tt2", header=None, sep='\t', names=['c1', 'c2', 'c3', 'c4', 'c5', 'c6'], skiprows=[1, 3])
    print(data2)
```

```
c1 c2 c3 c4 c5 c6
0 8803b236442fed8a37a5beb04556f684 54530.0 1.0 0.0 abc NaN
1 test NaN NaN NaN NaN NaN
2 f4afa2b8f50e8aacf967628b3dc11bff 102017.0 1.0 0.0 NaN NaN
3 test NaN NaN NaN NaN NaN
4 4077bb0001ba73f4dd966a7f6fb46075 50276.0 1.0 0.0 NaN NaN
5 NaN NaN NaN NaN NaN NaN

c1 c2 c3 c4 c5 c6
0 8803b236442fed8a37a5beb04556f684 54530.0 1.0 0.0 abc NaN
1 f4afa2b8f50e8aacf967628b3dc11bff 102017.0 1.0 0.0 NaN NaN
2 4077bb0001ba73f4dd966a7f6fb46075 50276.0 1.0 0.0 NaN NaN
3 NaN NaN NaN NaN NaN NaN
```

Without `skiprows`, each `test` line becomes a row whose other columns are NaN. With `skiprows=[1, 3]`, those two lines are skipped while reading.
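
`skiprows` also accepts a callable that receives each 0-based line number and returns True for lines to skip. Note that it sees the line number, not the line's content. A sketch with hypothetical two-column data, comparing the list form with the callable form:

```python
import io

import pandas as pd

TEXT = (
    "a\t1\n"
    "test\n"
    "b\t2\n"
    "test\n"
    "c\t3\n"
)

# Explicit 0-based line numbers to skip, as in read61 above
by_list = pd.read_csv(io.StringIO(TEXT), sep="\t", header=None, names=["key", "value"], skiprows=[1, 3])

# The same lines skipped with a callable: here every odd line number
by_func = pd.read_csv(
    io.StringIO(TEXT), sep="\t", header=None, names=["key", "value"], skiprows=lambda i: i % 2 == 1
)

print(by_list)
```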
## 8. Reading JSON

The data:

```
{
    "apples": {
        "June": 3,
        "Robert": 2,
        "Lily": 0,
        "David": 1
    },
    "oranges": {
        "June": 0,
        "Robert": 3,
        "Lily": 7,
        "David": 2
    }
}
```

```
def read71():
    # Outer keys become the columns, inner keys become the row index
    data = pd.read_json("../data/tt2")
    print(data)
```

```
apples oranges
David 1 2
June 3 0
Lily 0 7
Robert 2 3
```
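
The same JSON can also be read from a string. Since pandas 2.1, passing literal JSON text to `read_json` is deprecated, so this sketch wraps it in `io.StringIO`. Because row order may differ between pandas versions, the check uses labels rather than positions:

```python
import io

import pandas as pd

JSON_TEXT = """
{
    "apples": {"June": 3, "Robert": 2, "Lily": 0, "David": 1},
    "oranges": {"June": 0, "Robert": 3, "Lily": 7, "David": 2}
}
"""

data = pd.read_json(io.StringIO(JSON_TEXT))
print(data.sort_index())
```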
***
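## 0. Preface

In Python 2.7 and later, `str.format()` makes string formatting very convenient. Compared with the older `%`-style formatting, it is more flexible and more readable. Let's look at how `format` is used in detail.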
## 1. Common usage

Without further ado, here is some code showing common uses of `format`:

```
print("{:.2f}".format(3.1415926))     # 3.14         two decimal places
print("{:+.2f}".format(3.1415926))    # +3.14        with sign, two decimal places
print("{:+.2f}".format(-10))          # -10.00       with sign, two decimal places
print("{:+.0f}".format(-10.00))       # -10          no decimal places
print("{:0>2d}".format(1))            # 01           pad with zeros on the left (width 2)
print("{:x<2d}".format(1))            # 1x           pad with 'x' on the right (width 2)
print("{:x<4d}".format(10))           # 10xx         pad with 'x' on the right (width 4)
print("{:,}".format(1000000))         # 1,000,000    comma as thousands separator
print("{:.2%}".format(0.12))          # 12.00%       percentage
print("{:.2e}".format(1000000))       # 1.00e+06     exponent notation
print("{:<10d}".format(10))           # '10        ' left-aligned (width 10)
print("{:>10d}".format(10))           # '        10' right-aligned (width 10; the default for numbers)
print("{:^10d}".format(10))           # '    10    ' centered (width 10)
```
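### 1. Format type codes

`'f'` means a fixed-point (floating-point) number.

`'d'` means a decimal integer: the number is output in base 10.

`'%'` means a percentage: the value is multiplied by 100, printed in fixed-point (`'f'`) format and followed by a percent sign.

`'e'` means exponent notation: the number is printed in scientific notation, with `'e'` marking the exponent.

### 2. Alignment and padding

`^`, `<` and `>` mean centered, left-aligned and right-aligned respectively, and are followed by the width.

The fill character goes right after the `:` and before the alignment sign. It must be a single character; if it is omitted, a space is used.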
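
Taken together, a format spec reads roughly `[[fill]align][sign][width][,][.precision][type]`, a simplified subset of Python's format-spec mini-language. This sketch checks a few combinations, including a width passed in as an argument through a nested `{}`. The variable names are illustrative:

```python
centered = "{:*^10}".format("hi")            # fill '*', centered, width 10
signed = "{:+08.2f}".format(3.14159)         # sign, zero-padded to width 8, 2 decimals
grouped = "{:>12,.1f}".format(1234567.891)   # right-aligned, thousands separator, 1 decimal
width = 6
nested = "{:-^{w}}".format("ab", w=width)    # width supplied by a nested replacement field

print(centered)  # ****hi****
print(signed)    # +0003.14
print(grouped)   # ' 1,234,567.9'
print(nested)    # --ab--
```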
## 2. Basic string substitution with format

Arguments to `format` can be referenced by position as `{num}`: `0` is the first argument, `1` the second, and so on.

To understand this better, let's first look at the signature and docstring of `format`. This is the stub an IDE shows; in CPython, `str.format` is implemented in C.

```
def format(self, *args, **kwargs): # known special case of str.format
    """
    S.format(*args, **kwargs) -> string

    Return a formatted version of S, using substitutions from args and kwargs.
    The substitutions are identified by braces ('{' and '}').
    """
    pass
```

In plain words: `format` returns a formatted version of S with substitutions taken from `args` and `kwargs`. The places to substitute are marked with braces (`{` and `}`).

Now some actual examples:

```
print("{0} and {1} is good for big data".format("python", "java"))
print("{} and {} is good for big data".format("python", "java"))
print("{1} and {0} and {0} is good for big data".format("python", "java"))
```

The result of running the code:

```
python and java is good for big data
python and java is good for big data
java and python and python is good for big data
```

Empty braces `{}` are numbered automatically from left to right, which requires Python 2.7 or 3.1+. An explicit index can be used more than once.

Arguments can also be referenced by name (keyword arguments):

```
print("{language1} is as well as {language2}".format(language1="python", language2="java"))
```

The output:

```
python is as well as java
```
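
When the values already live in a dict, you can unpack it into keyword arguments with `**`. `str.format_map` (Python 3.2+) takes the mapping directly. A small sketch with an illustrative `langs` dict:

```python
langs = {"language1": "python", "language2": "java"}
template = "{language1} is as well as {language2}"

# ** unpacks the dict into keyword arguments for format
via_kwargs = template.format(**langs)
# format_map uses the mapping as-is (Python 3.2+)
via_map = template.format_map(langs)

print(via_kwargs)  # python is as well as java
```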
## 3. Accessing items by index

The following example achieves the same result:

```
languages = ["python", "java"]
print("{0[0]} is as well as {0[1]}".format(languages))
```

The result:

```
python is as well as java
```
## 4. Accessing object attributes

`format` is also often used with object attributes. See the following example:

```
class Person(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __str__(self):
        # {current.name} looks up the name attribute of the argument passed as current
        return "name is: {current.name}, age is: {current.age}".format(current=self)


p = Person("leilei", 18)
print(p)
```

The result:

```
name is: leilei, age is: 18
```
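
An attribute lookup can be combined with a format spec or a conversion in the same replacement field. A sketch reusing the `Person` class (redefined so the block runs on its own):

```python
class Person(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age


p = Person("leilei", 18)

# Attribute lookup plus a format spec in one field
line = "{0.name:>8}|{0.age:03d}".format(p)
# !r applies repr() to the value before formatting
quoted = "{0.name!r}".format(p)

print(line)    # '  leilei|018'
print(quoted)  # 'leilei'
```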
***
## 1. Commonly used modules

```
from datetime import datetime
import time
from dateutil.parser import parse  # third-party: pip install python-dateutil
```
## 2. Getting the current time

```
def getCurrentTime():
    now = datetime.now()  # current local date and time
    print(now)
    print(type(now))
```

The result:

```
2020-05-07 09:39:02.318002
<class 'datetime.datetime'>
```
## 3. Creating a datetime object

```
def genDateTimeObj():
    # year, month, day, hour, minute (seconds default to 0)
    date = datetime(2020, 4, 19, 15, 30)
    print(date)
    print(type(date))
```

```
2020-04-19 15:30:00
<class 'datetime.datetime'>
```
## 4. Converting a datetime to a timestamp

```
def datetime_2_timestamp():
    now = datetime.now()
    # timetuple() drops the microseconds
    now_timetuple = now.timetuple()
    # mktime interprets the tuple as local time and returns whole seconds as a float
    now_second = time.mktime(now_timetuple)
    # Add the sub-second part back to get milliseconds
    now_millisecond = int(now_second * 1000 + now.microsecond / 1000)

    print(now.timestamp())
    print(now_millisecond)
```

```
1588815680.100948
1588815680100
```

Note that calling `timestamp()` directly returns a float: a 10-digit timestamp in seconds plus a fractional part. The `mktime`-based computation yields a 13-digit timestamp in milliseconds.
## 5. Converting a timestamp to a datetime

```
def timestamp_2_datetime():
    # The timestamp is in milliseconds; fromtimestamp expects seconds
    timestamp = 1588761521787 / 1000
    # fromtimestamp returns local time; the output shown was produced on a UTC+8 machine
    date = datetime.fromtimestamp(timestamp)
    print(date)
```

```
2020-05-06 18:38:41.787000
```
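
`time.mktime` and `datetime.fromtimestamp` both depend on the machine's local timezone and go through floats. If millisecond timestamps must round-trip exactly, one option is integer `timedelta` arithmetic against a UTC epoch. This is a sketch; the helper names `to_millis` and `from_millis` are hypothetical:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(dt):
    """Milliseconds since the Unix epoch for a timezone-aware datetime."""
    # timedelta // timedelta yields an int, so no float rounding is involved
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_millis(ms):
    """Timezone-aware UTC datetime for a millisecond timestamp."""
    return EPOCH + timedelta(milliseconds=ms)


dt = from_millis(1588761521787)
print(dt)             # 2020-05-06 10:38:41.787000+00:00 (18:38:41.787 at UTC+8)
print(to_millis(dt))  # 1588761521787
```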
## 6. Converting a datetime to a string

```
def datetime_2_str():
    now = datetime.now()
    # %Y-%m-%d %H:%M:%S -> e.g. 2020-05-07 09:44:22
    date = now.strftime('%Y-%m-%d %H:%M:%S')
    print(date)
```

```
2020-05-07 09:44:22
```
## 7. Converting a string to a datetime

```
def str_2_datetime():
    datestr = "2020-05-06 18:42:26"
    # The format string must match the input exactly
    date = datetime.strptime(datestr, "%Y-%m-%d %H:%M:%S")
    print(date)
    print(type(date))
```

```
2020-05-06 18:42:26
<class 'datetime.datetime'>
```
## 8. Computing the difference between two times

```
def get_interval():
    t1 = "2020-05-05 23:56:45"
    t2 = "2020-05-06 00:00:31"
    # dateutil's parse infers the format, so no format string is needed
    date1 = parse(t1)
    date2 = parse(t2)
    # Subtracting two datetimes gives a timedelta
    result = (date2 - date1).total_seconds()
    print(result)
```

```
226.0
```

This gives the difference between the two times in seconds: 3 min 15 s until midnight plus 31 s, i.e. 226 s.
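
If you would rather not depend on `python-dateutil` and the input format is fixed, `datetime.strptime` gives the same `timedelta`. A minimal sketch, assuming both strings use `%Y-%m-%d %H:%M:%S`; `seconds_between` is a hypothetical helper:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def seconds_between(start, end, fmt=FMT):
    """Seconds from start to end (negative if end is earlier)."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


print(seconds_between("2020-05-05 23:56:45", "2020-05-06 00:00:31"))  # 226.0
```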
***
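## 1. Everything is an object

Python is an object-oriented language. In Python everything is an object, and functions are no exception. The simplest function definition looks like this:

```
def fun():
    print("hello world")
```

When execution reaches a `def`, Python creates a function object in memory and binds it to the function's name. To call the function, we refer to it by that name. The function body is not executed when the function is defined; it runs only when the function is called. When a call finishes, the local variables created during it go away. The objects they referred to are freed, unless something else (such as the return value) still refers to them.

As an object, a function can be assigned to a variable, stored as an element in a container, passed as an argument to another function, and returned as the value of a function.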
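
This sketch exercises all four capabilities, including storing functions in a container, which the sections that follow do not show. The function names (`greet`, `shout`, `apply`, `pick`) are made up for illustration:

```python
def greet(name):
    return "hello " + name


def shout(name):
    return name.upper() + "!"


# 1. Assign a function to a variable
say = greet

# 2. Store functions in a container (a dispatch table keyed by string)
handlers = {"greet": greet, "shout": shout}


# 3. Pass a function as an argument
def apply(func, value):
    return func(value)


# 4. Return a function from a function
def pick(kind):
    return handlers[kind]


print(say("world"))             # hello world
print(apply(shout, "world"))    # WORLD!
print(pick("greet")("python"))  # hello python
```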
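## 2. Functions have the attributes common to all objects

As an object, a function has the three attributes every object has: an identity (`id`), a type, and a value. Using the function above:

```
def fun():
    print("hello world")


print(id(fun))
print(type(fun))
print(fun)
fun()
```

The code prints:

```
4297786264
<type 'function'>
<function fun at 0x1002b0398>
hello world
```

`id(fun)` prints the identity of the function `fun`. In CPython this is its memory address: 4297786264 is 0x1002b0398, the address shown in the third line.

`type(fun)` prints the type of `fun`. The output above is from Python 2; Python 3 prints `<class 'function'>`.

Printing the bare name, without parentheses, shows the function object itself, including its memory address.

The `def` statement defines a function named `fun`. Defining `fun` does not run the print statement inside it; that only happens when the function is called.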
## 3. Functions can be referenced

```
def fun():
    print("hello world")


f1 = fun
print(f1)
print(fun)
f1()
```

The final output:

```
<function fun at 0x1002b0398>
<function fun at 0x1002b0398>
hello world
```

As the example shows, assigning a function to a variable binds that variable to the same function object in memory, so calling through the variable calls the function. After `fun` is assigned to `f1`, both names refer to the same memory address, and `f1()` is equivalent to `fun()`.
## 4. Functions can be passed as arguments

```
def fun():
    print("hello world")


def wrapfunc(inner):
    print("hello wrap")
    inner()  # call the function that was passed in


wrapfunc(fun)
```

The program's final output:

```
hello wrap
hello world
```
## 5. Functions as return values

```
def fun():
    print("hello world")


def wrapfunc(inner):
    return inner  # hand the function object back without calling it


print(wrapfunc(fun))
```

The final output:

```
<function fun at 0x1002b0398>
```
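
A common variation returns a function defined inside the outer function, rather than one passed in. The returned inner function remembers the outer function's arguments (a closure). A sketch with hypothetical names:

```python
def make_greeter(greeting):
    # A new inner function is created on each call and remembers `greeting`
    def greeter(name):
        return greeting + ", " + name

    return greeter  # returned without being called


hello = make_greeter("hello")
hi = make_greeter("hi")
print(hello("world"))  # hello, world
print(hi("world"))     # hi, world
```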
***
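## 0. Preface

On Windows, Excel is the killer tool for data work. For most product and operations colleagues it is the most-used tool, bar none. R&D engineers, however, mostly work on Linux, where the Excel format is not natively understood. Converting Excel files into the format we need on the server is therefore a very common task.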
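## 1. The xlrd module

Python's xlrd module makes it easy to read Excel files. Running

```
pip install xlrd
```

installs the module.

Note that xlrd 2.0 and later only read the legacy `.xls` format. To read `.xlsx` files such as the one in the example below, install an older release (`pip install "xlrd<2.0"`) or use a different library such as openpyxl.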
## 2. Common operations

### 1. Open an Excel file

```
data = xlrd.open_workbook(file)
```

### 2. Get a worksheet

```
table = data.sheets()[0]                # by index
table = data.sheet_by_name(sheet_name)  # by sheet name
```

### 3. Get sheet properties and data

```
nrows = table.nrows            # number of rows
ncols = table.ncols            # number of columns
cell = table.cell(i, j).value  # value of one cell (row i, column j, both 0-based)
```
## 3. Example code for working with an Excel sheet

The following code targets Python 3:

```
#!/usr/bin/env python
# coding: utf-8

import xlrd

# Default workbook ("industry & package-name mapping"); .xlsx requires xlrd < 2.0
DEFAULT_FILE = "行业&包名mapping.xlsx"


def open_excel(file=DEFAULT_FILE):
    try:
        return xlrd.open_workbook(file)
    except Exception as ex:
        print(ex)
        raise


def readfile(file=DEFAULT_FILE):
    """Write the first sheet to a file named "result", one comma-separated line per row."""
    data = open_excel(file)
    table = data.sheets()[0]
    nrows = table.nrows
    ncols = table.ncols

    with open("result", "w") as f:
        for i in range(nrows):
            # Cell values may be numbers, so convert each one to text
            cells = [str(table.cell(i, j).value) for j in range(ncols)]
            f.write(",".join(cells) + "\n")


def excel_table_byindex(file=DEFAULT_FILE, row_index=0, sheet_index=0):
    data = open_excel(file)
    table = data.sheets()[sheet_index]
    rowdata = table.row_values(row_index)  # one row, as a list
    print(",".join(str(v) for v in rowdata))


def excel_table_byname(file=DEFAULT_FILE, col_index=0, sheet_name=u"Sheet1"):
    data = open_excel(file)
    table = data.sheet_by_name(sheet_name)
    coldata = table.col_values(col_index)  # one column, as a list
    print(",".join(str(v) for v in coldata))


excel_table_byindex()
excel_table_byname()
```
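
The code above joins cell values with commas by hand. That produces a broken CSV if a cell itself contains a comma, a quote or a newline. Python's standard `csv` module handles the quoting. A minimal sketch, assuming the rows have already been read into lists (for example with `table.row_values(i)`); `rows_to_csv_text` is a hypothetical helper name:

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialise a list of rows (lists of cell values) as CSV text."""
    buf = io.StringIO()
    # lineterminator="\n" matches the plain newlines written by readfile above
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


rows = [["app, game", 1.0], ["tool", "com.example.tool"]]
print(rows_to_csv_text(rows))
```

To write the result to disk, open the file with `open("result", "w", newline="")` and write the returned text.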