add python

2020-08-09 20:43:39 +08:00 · 2020-08-09 20:43:39 +08:00 · bc95cbbee3
parent 445dbbe270
commit bc95cbbee3
11 changed files with 1421 additions and 0 deletions
--- a/papers/languages/python/mac
+++ b/papers/languages/python/mac
@ -0,0 +1,26 @@
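+pip is a widely used Python package manager, similar to Maven for Java. Anyone who uses Python relies on pip.  
+When I tried to install pip with Homebrew on a new Mac, I ran into a small problem:  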
+
+```
+bogon:~ wanglei$ brew install pip
+Error: No available formula with the name "pip"
+Homebrew provides pip via: `brew install python`. However you will then
+have two Pythons installed on your Mac, so alternatively you can install
+pip via the instructions at:
+
+  https://pip.readthedocs.org/en/stable/installing/#install-pip
+```  
+
+As the message shows, Homebrew installs pip together with Python.  
+
+So let's try another way:  
+
+```
+bogon:~ wanglei$ sudo easy_install pip
+Password:
+Searching for pip
+Reading https://pypi.python.org/simple/pip/
+...
+```  
+
+After a short wait, pip is installed.  
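
As a side note, easy_install has since been deprecated, and current Python versions normally ship pip with the interpreter. A minimal Python 3 sketch (not part of the original post) for checking whether pip is importable by the running interpreter; the helper name `pip_available` is made up for illustration:

```python
import importlib.util


def pip_available():
    """Return True if the pip package can be imported by this interpreter."""
    return importlib.util.find_spec("pip") is not None


print("pip available:", pip_available())
```

If this prints False, running `python -m ensurepip` is one common way to bootstrap pip on interpreters that include the ensurepip module.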
--- a/lambda函数与函数式编程.md
+++ b/lambda函数与函数式编程.md
@ -0,0 +1,110 @@
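+## 1. A first look at lambda functions
+A lambda function is also called an anonymous function: as the name suggests, it has no name. Let's start with the simplest example.  
+First, an ordinary function:  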
+
+```
+def f(x):
+    return x+1
+```  
+
+Simple enough. Written as a lambda function, it becomes:  
+
+```
+g = lambda x:x+1
+print g
+print g(2)
+```  
+
+```
+<function <lambda> at 0x1007cc668>
+3
+```  
+
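+So g is indeed a function object. A lambda expression starts with the keyword lambda, followed by the parameters, then a colon, and after the colon the expression whose value the function returns.  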
+
+A lambda can of course take several parameters:  
+
+```
+f = lambda x,y:x+y+1
+print "the result is: ", f(2,3)
+```  
+
+```
+the result is:  6
+```  
+
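+## 2. Why use lambda functions
+Many people say that a lambda merely drops the function name, and that such an anonymous function cannot be called from anywhere else, so why use it at all?  
+If something exists, there is usually a reason for it. In my experience, lambda functions have the following main advantages:  
+1. No function name is needed. That's not an advantage, you say? Ask yourself what the hardest part of writing code is: a survey that circulated on GitHub reportedly had "naming variables/functions" far in the lead. Coming up with a short, clear name that accurately reflects what a variable or function does is genuinely hard, especially for a function that is called only once. That is exactly where a lambda comes in handy.  
+2. In some situations it saves a separate function definition, making the code more concise and easier to read.  
+
+In short, lambda acts mostly as a lubricant, or as syntactic sugar, that makes life easier. In many cases the same thing can be done without a lambda, but possibly at the cost of more complicated code.  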
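
To illustrate the "called only once" case, here is a small Python 3 sketch (the word list and the helper name `word_length` are made up for this example). The lambda is passed inline as a sort key, while the named version needs a separate definition:

```python
words = ["banana", "fig", "cherry"]

# Sort by length; the lambda exists only for this one call.
by_length = sorted(words, key=lambda w: len(w))
print(by_length)  # ['fig', 'banana', 'cherry']


# The same result with a named helper requires an extra definition.
def word_length(w):
    return len(w)


by_length_named = sorted(words, key=word_length)
print(by_length_named == by_length)  # True
```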
+
+## 3. Combining lambda with map
+The official definition of map (in Python 2) is:  
+
+Return a list of the results of applying the function to the items of the argument sequence(s).  If more than one sequence is given, the function is called with an argument list consisting of the corresponding item of each sequence, substituting None for missing values when not all sequences have the same length.  If the function is None, return a list of the items of the sequence (or a list of tuples if more than one sequence).  
+ 
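+In other words: map returns a list whose elements are the results of passing each element of the sequence argument to the given function. If more than one sequence is given, the function is called with one argument per sequence, taken from the corresponding position in each sequence. When the sequences differ in length, the missing values are replaced by None. If the function is None, map returns the items of the sequence itself (or a list of tuples if there is more than one sequence).  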
+  
+That description is rather abstract; an example makes it clear:  
+
+```
+ret_list = map(lambda x:x+1,[1,2,3])
+print "ret_list is: ",ret_list
+```  
+
+```
+ret_list is:  [2, 3, 4]
+```  
+
+In the example above, map takes two arguments: a lambda that returns x+1, and a list. map adds 1 to every number in the list and returns the results as a list.  
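
The quoted definition describes Python 2. In Python 3, map returns a lazy iterator rather than a list, stops at the shortest sequence instead of padding with None, and passing None as the function is no longer supported. A short Python 3 sketch of the same example (the second and third calls are additions for illustration):

```python
# Python 3: map returns an iterator, so wrap it in list() to see the values.
ret_list = list(map(lambda x: x + 1, [1, 2, 3]))
print("ret_list is:", ret_list)  # [2, 3, 4]

# With several sequences, the function receives one argument per sequence.
sums = list(map(lambda x, y: x + y, [1, 2, 3], [10, 20, 30]))
print("sums is:", sums)  # [11, 22, 33]

# Unlike Python 2, Python 3 stops at the shortest sequence.
short = list(map(lambda x, y: x + y, [1, 2, 3], [10, 20]))
print("short is:", short)  # [11, 22]
```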
+
+## 4. Replacing lambda with a list comprehension
+The lambda passed to map in the example above can also be written as a list comprehension.  
+
+```
+print [x+1 for x in range(1,4)]
+```  
+
+This is more concise, and for most people easier to read than a lambda. List comprehensions are also fast and are a very Pythonic way to write this.  
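
The difference in readability grows once filtering is involved. A Python 3 sketch, with made-up data, that squares the even numbers both ways:

```python
nums = range(1, 7)

# map + filter, each with its own lambda.
via_map = list(map(lambda x: x * x, filter(lambda x: x % 2 == 0, nums)))

# The same computation as a single list comprehension.
via_comp = [x * x for x in nums if x % 2 == 0]

print(via_map)   # [4, 16, 36]
print(via_comp)  # [4, 16, 36]
```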
+
+## 5. A lambda pitfall to watch out for
+Consider the following code:  
+
+```
+fs = [ lambda n: i+n for i in range(5) ]
+
+for k in range(5):
+    print "fs[%d]: " %k,fs[k](4)
+```  
+
+The output is:  
+
+```
+fs[0]:  8
+fs[1]:  8
+fs[2]:  8
+fs[3]:  8
+fs[4]:  8
+```  
+
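+Strange: that is not what we expected. The problem lies with the variable i. Because i is not a parameter of the lambda, it is a free variable that is looked up when the lambda is called, not when it is defined (in Python 2 the comprehension variable leaks into the enclosing scope, so here i is a global). By the time the lambdas are called the loop has finished and i is 4, so every call returns 4+4=8.  
+
+Binding i as a default argument fixes it:  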
+
+```
+fl = [ (lambda n,i=i: i+n) for i in range(5)]
+for k in range(5):
+    print "fl[%d]: " %k,fl[k](4)
+```  
+
+```
+fl[0]:  4
+fl[1]:  5
+fl[2]:  6
+fl[3]:  7
+fl[4]:  8
+```  
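
The same behaviour can be reproduced in Python 3. The sketch below also shows `functools.partial` as an alternative to the default-argument trick; that alternative is an addition, not something the original post uses:

```python
from functools import partial
from operator import add

# Late binding: every lambda looks up i when it is called, after the loop ended.
late = [lambda n: i + n for i in range(5)]
late_results = [f(4) for f in late]
print(late_results)  # [8, 8, 8, 8, 8]

# Default argument: i is evaluated once, when each lambda is defined.
bound = [lambda n, i=i: i + n for i in range(5)]
bound_results = [f(4) for f in bound]
print(bound_results)  # [4, 5, 6, 7, 8]

# functools.partial also fixes the value at creation time.
partials = [partial(add, i) for i in range(5)]
partial_results = [f(4) for f in partials]
print(partial_results)  # [4, 5, 6, 7, 8]
```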
--- a/papers/languages/python/python
+++ b/papers/languages/python/python
@ -0,0 +1,87 @@
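+In Python, the list type has a built-in sort() method for sorting. There is also the built-in function sorted(), which sorts any iterable. The two are used in much the same way; the main difference is that sort() does not create a new list but modifies the existing list in place, whereas sorted() returns a new list.  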
+
+## 1. The simplest sort
+First, look at the help for list.sort():  
+
+```
+In [1]: help(list.sort)
+
+Help on method_descriptor:
+
+sort(...)
+    L.sort(cmp=None, key=None, reverse=False) -- stable sort *IN PLACE*;
+    cmp(x, y) -> -1, 0, 1
+(END)
+```  
+Note: in Python 3.x the cmp parameter has been removed; use the key parameter instead.  
+
+Calling list.sort() sorts the list. Note that the original list is modified.  
+
+```
+In [2]: array=[5,3,1,7,9]
+
+In [3]: array.sort()
+
+In [4]: array
+Out[4]: [1, 3, 5, 7, 9]
+```  
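
To make the in-place versus new-list difference concrete, here is a short Python 3 sketch using the same data as above (the variable names `new_array` and `result` are illustrative):

```python
array = [5, 3, 1, 7, 9]

# sorted() returns a new list and leaves the original untouched.
new_array = sorted(array)
print(array)      # [5, 3, 1, 7, 9]
print(new_array)  # [1, 3, 5, 7, 9]

# list.sort() sorts in place and returns None.
result = array.sort()
print(result)  # None
print(array)   # [1, 3, 5, 7, 9]

# sorted() accepts any iterable, for example a tuple.
from_tuple = sorted(("b", "c", "a"))
print(from_tuple)  # ['a', 'b', 'c']
```

Because list.sort() returns None, writing `array = array.sort()` is a common mistake that discards the list.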
+
+## 2. Sorting complex objects
+A more common case is sorting complex objects by some of their values.  
+For example:  
+```
+In [5]: persons=[['lindan','A',20],['chenlong','A',18],['tiantian','B',18]]
+
+In [6]: list.sort(persons,key=lambda person:person[2])
+
+In [7]: persons
+Out[7]: [['chenlong', 'A', 18], ['tiantian', 'B', 18], ['lindan', 'A', 20]]
+```  
+
+Using the operator module:  
+```
+In [8]: persons=[['lindan','A',20],['chenlong','A',18],['tiantian','B',18]]
+
+In [9]: from operator import itemgetter,attrgetter
+
+In [10]: list.sort(persons,key=itemgetter(2))
+
+In [11]: persons
+Out[11]: [['chenlong', 'A', 18], ['tiantian', 'B', 18], ['lindan', 'A', 20]]
+```  
+
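+## 3. Sorting objects with named attributes
+Objects with named attributes can be sorted too (for convenience this uses sorted(), which works essentially the same way as list.sort()):  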
+
+```
+from operator import attrgetter
+
+class Person:
+    def __init__(self,name,hierarchy,age):
+        self.name = name
+        self.hierarchy = hierarchy
+        self.age = age
+        
+    def __repr__(self):
+        return repr((self.name,self.hierarchy,self.age))
+  
+# sort by age, descending
+def sort_age():
+    Persons = [Person('kobe','A',20),Person('janes','A',18),Person('Tracy','B',18)]
+    p_age = sorted(Persons,key = attrgetter('age'),reverse = True)
+    print p_age
+  
+# sort by age first, then by name (both descending)
+def sort_age_hierarchy():
+    Persons = [Person('kobe','A',20),Person('janes','A',18),Person('Tracy','B',18)]
+    p_sorted = sorted(Persons,key = attrgetter('age','name'),reverse = True)
+    print p_sorted
+    
+if __name__ == '__main__':
+    sort_age()
+    sort_age_hierarchy()
+```  
+
+The result:  
+```
+[('kobe', 'A', 20), ('janes', 'A', 18), ('Tracy', 'B', 18)]
+[('kobe', 'A', 20), ('janes', 'A', 18), ('Tracy', 'B', 18)]
+```  
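
Note that reverse=True applies to the whole key tuple, so both age and name are sorted in descending order. If the keys need different directions (say age descending, name ascending), two common options are negating the numeric key or sorting in two stable passes. A Python 3 sketch, reusing the Person class from above:

```python
from operator import attrgetter


class Person:
    def __init__(self, name, hierarchy, age):
        self.name = name
        self.hierarchy = hierarchy
        self.age = age

    def __repr__(self):
        return repr((self.name, self.hierarchy, self.age))


persons = [Person('kobe', 'A', 20), Person('janes', 'A', 18), Person('Tracy', 'B', 18)]

# Option 1: negate the numeric key so that a single ascending sort works.
mixed = sorted(persons, key=lambda p: (-p.age, p.name))
print(mixed)

# Option 2: two passes, least significant key first; Python's sort is stable.
two_pass = sorted(persons, key=attrgetter('name'))
two_pass.sort(key=attrgetter('age'), reverse=True)
print(two_pass)
```

Both print `[('kobe', 'A', 20), ('Tracy', 'B', 18), ('janes', 'A', 18)]`, because uppercase 'T' sorts before lowercase 'j'.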
--- a/求交集，并集，差集.md
+++ b/求交集，并集，差集.md
@ -0,0 +1,56 @@
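+In Python, an array can be represented by a list. Given two arrays, what is a convenient way to compute their intersection, union and difference?  
+The most obvious approach is to loop over both arrays, i.e. to write two for loops. Most people already know how to do that and there is little to learn from it, so I won't go into it. Here are some more elegant approaches.  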
+
+As usual: talk is cheap, show me the code.  
+
+```
+#!/usr/bin/env python
+#coding:utf-8
+
+'''
+Created on 2016-06-09
+
+@author: lei.wang
+'''
+
+def diff(listA,listB):
+    # two ways to compute the intersection
+    retA = [i for i in listA if i in listB]
+    retB = list(set(listA).intersection(set(listB)))
+    
+    print "retA is: ",retA
+    print "retB is: ",retB
+    
+    # union
+    retC = list(set(listA).union(set(listB)))
+    print "retC1 is: ",retC
+    
+    # difference: elements in B but not in A
+    retD = list(set(listB).difference(set(listA)))
+    print "retD is: ",retD
+    
+    retE = [i for i in listB if i not in listA]
+    print "retE is: ",retE
+    
+def main():
+    listA = [1,2,3,4,5]
+    listB = [3,4,5,6,7]
+    diff(listA,listB)
+    
+if __name__ == '__main__':
+    main()
+```  
+
+Run the code:  
+
+```
+retA is:  [3, 4, 5]
+retB is:  [3, 4, 5]
+retC1 is:  [1, 2, 3, 4, 5, 6, 7]
+retD is:  [6, 7]
+retE is:  [6, 7]
+```  
+
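+Looking at the code, there are broadly two approaches:  
+1. Use list comprehensions. A list comprehension is generally faster than an explicit loop and is more Pythonic.  
+2. Convert the lists to sets and use the set methods.  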
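
Two details are worth keeping in mind: a set has no guaranteed order, and `i in listB` scans the whole list each time. The Python 3 sketch below combines the two approaches, using set operators and testing membership against a set while keeping the original order (the variable names are illustrative):

```python
list_a = [1, 2, 3, 4, 5]
list_b = [3, 4, 5, 6, 7]

set_a, set_b = set(list_a), set(list_b)

# Set operators; sorted() gives a deterministic order for the result.
intersection = sorted(set_a & set_b)
union = sorted(set_a | set_b)
difference = sorted(set_b - set_a)
print(intersection)  # [3, 4, 5]
print(union)         # [1, 2, 3, 4, 5, 6, 7]
print(difference)    # [6, 7]

# Order-preserving intersection: membership tests against a set are fast.
ordered_common = [x for x in list_a if x in set_b]
print(ordered_common)  # [3, 4, 5]
```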
--- a/papers/languages/python/python
+++ b/papers/languages/python/python
@ -0,0 +1,90 @@
+## 1. Traversing a nested list
+Traversing a nested list and outputting its items is a common need. Here are two ways to do it.  
+
+```
+def nested_list(list_raw,result):
+    for item in list_raw:
+        if isinstance(item, list):
+            nested_list(item,result)
+        else:
+            result.append(item)
+            
+    return  result   
+            
+def flatten_list(nested):
+    if isinstance(nested, list):
+        for sublist in nested:
+            for item in flatten_list(sublist):
+                yield item
+    else:
+        yield nested
+    
+def main():   
+    list_raw = ["a",["b","c",["d"]]]
+    result = []
+    print "nested_list is:  ",nested_list(list_raw,result)
+    print "flatten_list is: ",list(flatten_list(list_raw))
+    
+main()
+```  
+
+Run the code; the output is:  
+
+```
+nested_list is:   ['a', 'b', 'c', 'd']
+flatten_list is:  ['a', 'b', 'c', 'd']
+
+```  
+
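+nested_list works recursively: if item is a list, it calls itself on that item; otherwise it appends item to the result list.  
+flatten_list uses a generator instead, but it is essentially the same recursive idea.  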
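
In Python 3 the inner `for ... yield` loop of the generator can be replaced by `yield from`. A minimal sketch (the function name `flatten` is chosen for this example):

```python
def flatten(nested):
    """Yield the leaf items of an arbitrarily nested list, depth first."""
    for item in nested:
        if isinstance(item, list):
            # Delegate to the recursive generator for the sublist.
            yield from flatten(item)
        else:
            yield item


flat = list(flatten(["a", ["b", "c", ["d"]]]))
print(flat)  # ['a', 'b', 'c', 'd']
```

Like the original versions, this is recursive, so extremely deep nesting can hit Python's recursion limit.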
+
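+## 2. Deduplicating a two-level nested list
+A list contains one level of nested lists, and we need to remove the duplicates and produce a deduplicated list. See the code:  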
+
+```
+def dup_remove_set(list_raw):
+    result = set()
+    for sublist in list_raw:
+        item = set(sublist)
+        result = result.union(item)
+    return list(result)
+
+def main():  
+    list_dup = [[1,2,3],[1,2,4,5],[5,6,7]]
+    print dup_remove_set(list_dup)
+```  
+
+Run the code:  
+
+```
+[1, 2, 3, 4, 5, 6, 7]
+```  
+
+The basic idea: convert each sublist to a set and take the union.  
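
Since `set.union` accepts any number of iterables, the loop can be collapsed into a single call. A Python 3 sketch with the same data (sorting the result is an addition, to make the output order deterministic):

```python
list_dup = [[1, 2, 3], [1, 2, 4, 5], [5, 6, 7]]

# Unpack the sublists as separate arguments to union().
merged = set().union(*list_dup)

deduped = sorted(merged)
print(deduped)  # [1, 2, 3, 4, 5, 6, 7]
```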
+
+## 3. Deduplicating arbitrarily nested lists
+
+```
+def dup_remove(list_raw,result):
+    for item in list_raw:
+        if isinstance(item, list):
+            dup_remove(item,result)
+        else:
+            result.add(item)
+            
+    return  list(result)
+
+def main():   
+    list_raw = ["a",["b","c",["d","a","b"]]]
+    result = set()
+    print "dup_remove is:  ",dup_remove(list_raw,result)
+```  
+
+Run the code:  
+
+```
+dup_remove is:   ['a', 'c', 'b', 'd']
+```  
+
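+The basic idea is much the same as for traversing a nested list. The only difference is that result used to be a list, whereas for deduplication result is a set, which guarantees that the final result contains no duplicates.  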
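
As the output above shows, going through a set loses the original order. If first-seen order matters, one option is to keep a set only for membership checks and emit items as they are first encountered. A Python 3 sketch (the function name `dedup_ordered` is made up for this example):

```python
def dedup_ordered(nested, seen=None):
    """Yield each leaf item once, in the order it is first encountered."""
    if seen is None:
        seen = set()
    for item in nested:
        if isinstance(item, list):
            # Share the same 'seen' set across all nesting levels.
            yield from dedup_ordered(item, seen)
        elif item not in seen:
            seen.add(item)
            yield item


unique = list(dedup_ordered(["a", ["b", "c", ["d", "a", "b"]]]))
print(unique)  # ['a', 'b', 'c', 'd']
```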
--- a/papers/languages/python/python
+++ b/papers/languages/python/python
@ -0,0 +1,187 @@
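+## What is a regular expression, anyway
+A regular expression (often abbreviated in code as regex, regexp or RE) is a concept in computer science. A regular expression uses a single string to describe and match a set of strings that conform to some syntactic rule. In many text editors, regular expressions are used to search for and replace text that matches a pattern. (The above is taken from Wikipedia.)  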
+
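+In the era of big data, string processing is probably the skill data workers use most. People building web pages, user interfaces or crawlers also often need to find and replace strings that match complicated rules. In other words, wherever you deal with strings, regular expressions are never far away. Being good at writing regular expressions greatly improves productivity, as I have found first-hand while working with data.  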
+
+So, let's go and step into the wonderful world of regular expressions.  
+
+## Regular expression metacharacters
+<table>
+   <tr>
+      <td>Character</td>
+      <td>Description</td>
+   </tr>
+   <tr>
+      <td>\</td>
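+      <td>Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n". '\n' matches a newline. The sequence '\\' matches "\" and "\(" matches "(".</td>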
+   </tr>
+   <tr>
+      <td>^</td>
+      <td>Matches the position at the start of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after '\n' or '\r'.</td>
+   </tr>
+   <tr>
+      <td>$</td>
+      <td>Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'.</td>
+   </tr>
+   <tr>
+      <td>*</td>
+      <td>Matches the preceding subexpression zero or more times. For example, zo* matches "z" as well as "zoo". * is equivalent to {0,}.</td>
+   </tr>
+   <tr>
+      <td>+</td>
+      <td>Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo" but not "z". + is equivalent to {1,}.</td>
+   </tr>
+   <tr>
+      <td>?</td>
+      <td>Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.</td>
+   </tr>
+   <tr>
+      <td>{n}</td>
+      <td>n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".</td>
+   </tr>
+   <tr>
+      <td>{n,}</td>
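+      <td>n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.</td>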
+   </tr>
+   <tr>
+      <td>{n,m}</td>
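+      <td>m and n are non-negative integers with m >= n. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.</td>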
+   </tr>
+   <tr>
+      <td>?</td>
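+      <td>When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the 'o's.</td>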
+   </tr>
+   <tr>
+      <td>.</td>
+      <td>Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]'.</td>
+   </tr>
+   <tr>
+      <td>(pattern)</td>
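+      <td>Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript. To match parenthesis characters, use '\(' or '\)'.</td>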
+   </tr>
+   <tr>
+      <td>(?:pattern)</td>
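+      <td>Matches pattern but does not capture the match; that is, it is a non-capturing match that is not stored for later use. This is useful when combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more concise expression than 'industry|industries'.</td>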
+   </tr>
+   <tr>
+      <td>(?=pattern)</td>
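+      <td>Positive lookahead: matches the search string at any point where a string matching pattern begins. This is a non-capturing match, i.e. the match is not captured for later use. For example, 'Windows (?=95|98|NT|2000)' matches the "Windows" in "Windows 2000" but not the "Windows" in "Windows 3.1". Lookaheads do not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.</td>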
+   </tr>
+   <tr>
+      <td>(?!pattern)</td>
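+      <td>Negative lookahead: matches the search string at any point where a string not matching pattern begins. This is a non-capturing match, i.e. the match is not captured for later use. For example, 'Windows (?!95|98|NT|2000)' matches the "Windows" in "Windows 3.1" but not the "Windows" in "Windows 2000". Lookaheads do not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.</td>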
+   </tr>
+   <tr>
+      <td>x|y</td>
+      <td>Matches x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".</td>
+   </tr>
+   <tr>
+      <td>[xyz]</td>
+      <td>A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".</td>
+   </tr>
+   <tr>
+      <td>[^xyz]</td>
+      <td>A negated character set. Matches any character that is not enclosed. For example, '[^abc]' matches the 'p' in "plain".</td>
+   </tr>
+   <tr>
+      <td>[a-z]</td>
+      <td>A character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter from 'a' to 'z'.</td>
+   </tr>
+   <tr>
+      <td>[^a-z]</td>
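+      <td>A negated character range. Matches any character that is not in the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' to 'z'.</td>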
+   </tr>
+   <tr>
+      <td>\b</td>
+      <td>Matches a word boundary, i.e. the position between a word and a space. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".</td>
+   </tr>
+   <tr>
+      <td>\B</td>
+      <td>Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".</td>
+   </tr>
+   <tr>
+      <td>\cx</td>
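+      <td>Matches the control character indicated by x. For example, \cM matches Control-M, i.e. a carriage return. The value of x must be in the range A-Z or a-z; otherwise c is treated as a literal 'c' character.</td>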
+   </tr>
+   <tr>
+      <td>\d</td>
+      <td>Matches a digit character. Equivalent to [0-9].</td>
+   </tr>
+   <tr>
+      <td>\D</td>
+      <td>Matches a non-digit character. Equivalent to [^0-9].</td>
+   </tr>
+   <tr>
+      <td>\f</td>
+      <td>Matches a form feed. Equivalent to \x0c and \cL.</td>
+   </tr>
+   <tr>
+      <td>\n</td>
+      <td>Matches a newline. Equivalent to \x0a and \cJ.</td>
+   </tr>
+   <tr>
+      <td>\r</td>
+      <td>Matches a carriage return. Equivalent to \x0d and \cM.</td>
+   </tr>
+   <tr>
+      <td>\s</td>
+      <td>Matches any whitespace character, including space, tab, form feed and so on. Equivalent to [ \f\n\r\t\v].</td>
+   </tr>
+   <tr>
+      <td>\S</td>
+      <td>Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].</td>
+   </tr>
+   <tr>
+      <td>\t</td>
+      <td>Matches a tab. Equivalent to \x09 and \cI.</td>
+   </tr>
+   <tr>
+      <td>\v</td>
+      <td>Matches a vertical tab. Equivalent to \x0b and \cK.</td>
+   </tr>
+   <tr>
+      <td>\w</td>
+      <td>Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.</td>
+   </tr>
+   <tr>
+      <td>\W</td>
+      <td>Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.</td>
+   </tr>
+   <tr>
+      <td>\xn</td>
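+      <td>Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.</td>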
+   </tr>
+   <tr>
+      <td>\num</td>
+      <td>Matches num, where num is a positive integer: a reference back to a captured match. For example, '(.)\1' matches two consecutive identical characters.</td>
+   </tr>
+   <tr>
+      <td>\n</td>
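+      <td>Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.</td>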
+   </tr>
+   <tr>
+      <td>\nm</td>
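+      <td>Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither of the preceding conditions holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.</td>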
+   </tr>
+   <tr>
+      <td>\nml</td>
+      <td>If n is an octal digit (0-3) and m and l are both octal digits (0-7), matches the octal escape value nml.</td>
+   </tr>
+   <tr>
+      <td>\un</td>
+      <td>Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).</td>
+   </tr>
+</table>
+
+
+
+There are a lot of metacharacters, but don't worry: skim them once now and tackle them one by one as they come up later.  
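
The table comes from a generic (JScript/VBScript-flavoured) reference, so not every entry carries over to every engine. As a rough check, here are a few of the table's own examples run through Python 3's `re` module (the variable names are illustrative):

```python
import re

# Greedy versus non-greedy quantifiers on the string "oooo".
greedy = re.findall(r'o+', 'oooo')
lazy = re.findall(r'o+?', 'oooo')
print(greedy)  # ['oooo']
print(lazy)    # ['o', 'o', 'o', 'o']

# Positive lookahead: the version number is checked but not consumed.
hit = re.search(r'Windows (?=95|98|NT|2000)', 'Windows 2000')
miss = re.search(r'Windows (?=95|98|NT|2000)', 'Windows 3.1')
print(repr(hit.group()))  # 'Windows '
print(miss)               # None

# Backreference: (.)\1 matches two consecutive identical characters.
double = re.search(r'(.)\1', 'food').group()
print(double)  # oo

# Non-capturing group combined with alternation.
industries = re.findall(r'industr(?:y|ies)', 'industry industries')
print(industries)  # ['industry', 'industries']
```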
+
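+## A first look at regular expressions
+Regular expression syntax is extensive and rather hard to digest. In my view the best way to learn it is to start from examples and real needs, and then go back to understand the relevant syntax; that makes it much easier.  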
+
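+Take the most common scenario when coding: to find, say, the word print, in most editors and IDEs you press ctrl+f, type 'print' into the find box, and get every position that contains the string 'print'.  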
+
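+This raises a few issues:  
+1. We may not want an exact match of 'print' only, but may also want to match 'prinT', 'Print', 'prInt' and so on. In that case, specifying the re.I flag matches all of these strings. The 'I' option stands for 'IGNORECASE', i.e. ignore case.  
+2. In Java code, for example, most of the strings matched above will occur inside 'println'. But often we want to match exactly one string or word; for instance, we do not want to match 'println'. This is where metacharacters come in: '\bprint\b' gives the result we expect.  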
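
Both points can be tried out with Python 3's `re` module. In the sketch below the sample text is made up for illustration:

```python
import re

text = 'print("a"); System.out.println("b"); Print("c")'

# A plain search also hits the "print" inside "println".
plain = re.findall(r'print', text)
print(plain)  # ['print', 'print']

# re.I ignores case, so "Print" is found as well.
ignore_case = re.findall(r'print', text, re.I)
print(ignore_case)  # ['print', 'print', 'Print']

# \b requires a word boundary on both sides, which excludes "println".
whole_word = re.findall(r'\bprint\b', text, re.I)
print(whole_word)  # ['print', 'Print']
```

The patterns are written as raw strings (`r'...'`) so that Python does not interpret `\b` as a backspace character before the regex engine sees it.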
--- a/浮点数精确运算解决方案.md
+++ b/浮点数精确运算解决方案.md
@ -0,0 +1,42 @@
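+## Floating-point error
+A common problem with floating-point numbers is that a computer cannot represent most decimal fractions exactly. Even the simplest arithmetic can therefore give surprising results, because the computer stores numbers in binary, using only 0 and 1.  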
+
+```
+>>> x = 4.20
+>>> y = 2.10
+>>> x + y
+6.3000000000000007
+>>> (x+y) == 6.3
+False
+>>> x = 1.2
+>>> y = 2.3
+>>> x + y
+3.5
+>>> (x + y) == 3.5
+True
+```  
+
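+These problems come from how the CPU and the floating-point format represent numbers, and cannot be controlled at the level of our own code. In situations that require exact decimal values, such as financial settlement, such errors are unacceptable.  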
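
When exact decimal values are not required, a common workaround is to compare floats with a tolerance instead of `==`. A Python 3 sketch using `math.isclose` (this is an aside; it does not remove the error, it only tolerates it):

```python
import math

x, y = 4.20, 2.10

exact_equal = (x + y) == 6.3
close_enough = math.isclose(x + y, 6.3)

print(exact_equal)   # False
print(close_enough)  # True
```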
+
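+## Decimal arithmetic with the decimal module
+Python's decimal module solves this problem.  
+With the decimal module, a decimal.Decimal object can be constructed from an integer, a string or a tuple. For a float, note that the float itself already carries an error, so convert it to a string first.  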
+
+```
+>>> from decimal import Decimal
+>>> from decimal import getcontext
+>>> Decimal('4.20') + Decimal('2.10')
+Decimal('6.30')
+>>> x = 4.20
+>>> y = 2.10
+>>> z = Decimal(str(x)) + Decimal(str(y))
+>>> z
+Decimal('6.3')
+>>> getcontext().prec = 4  # set the precision (number of significant digits)
+>>> Decimal('1.00') /Decimal('3.0')
+Decimal('0.3333')
+```  
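
The advice to go through a string can be checked directly. The Python 3 sketch below also shows `quantize`, which rounds to a fixed number of decimal places and is often what money calculations need; the price value is made up for illustration:

```python
from decimal import Decimal, ROUND_HALF_UP

# A float passed directly carries its binary error into the Decimal.
same = Decimal(0.1) == Decimal('0.1')
print(same)  # False

# Going through str() keeps the intended decimal value.
total = Decimal(str(4.20)) + Decimal(str(2.10))
print(total)  # 6.3

# quantize() rounds to two decimal places with an explicit rounding mode.
price = Decimal('2.675').quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
print(price)  # 2.68
```

Note that `getcontext().prec` limits significant digits in arithmetic results, whereas `quantize` fixes the number of decimal places of one value.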
+
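+Of course, the gain in precision comes with a loss in performance. Where the data must be exact (financial settlement, for example), that cost is worth paying. For large-scale scientific computing, however, runtime efficiency has to be considered: native float is much faster than Decimal.  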
--- a/papers/languages/python/python
+++ b/papers/languages/python/python
@ -0,0 +1,383 @@
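+## 1. The inconvenience of stock Python
+As someone who works with data and algorithms, I use Python a lot. The standard scientific computing stack for Python today is numpy+scipy+matplotlib+sklearn+pandas. Unfortunately, stock Python does not ship with these packages, so they have to be installed on every new machine. Worse, last night it took me a full two hours to install sklearn on a new machine, hitting countless pitfalls I had never seen before, on top of a poor network connection. If that happens to an old hand who has set up scientific computing environments countless times, newcomers must find it even more frustrating. So I wondered whether something bundles all the common scientific computing tools and spares us the pain of setting up the environment each time. A quick Google search turned up today's protagonist: Anaconda.  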
+
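+## 2. What is Anaconda
+Anaconda: a large snake; the name presumably comes from the cute little python in the Python logo.  
+Download for Mac: https://www.continuum.io/downloads#_macosx  
+Here is how the official home page describes it:  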
+Anaconda is the leading open data science platform powered by Python. The open source version of Anaconda is a high performance distribution of Python and R and includes over 100 of the most popular Python, R and Scala packages for data science. Additionally, you'll have access to over 720 packages that can easily be installed with conda, our renowned package, dependency and environment manager, that is included in Anaconda. Anaconda is BSD licensed which gives you permission to use Anaconda commercially and for redistribution. See the packages included with Anaconda and the Anaconda changelog.  
+
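+From this impressive introduction we learn that Anaconda is a Python-based scientific computing platform that bundles most of the mainstream scientific computing packages for Python, R and Scala.  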
+
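+Next comes the download. Because it bundles so many scientific computing packages, the installer is not small: the Mac version I downloaded is about 360 MB. The network was not fast but at least stable, at 100-200 KB/s, so the download took about an hour. Do something else in the meantime.  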
+
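+## 3. Installation and configuration
+Once downloaded, install it like any other Mac application: double-click the installer.  
+After installation, configure it. I normally develop in Eclipse, and the official site helpfully explains how to configure various IDEs, Eclipse among them, provided the PyDev plugin is already installed in Eclipse.  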
+
+The following Eclipse configuration steps are from the official site:  
+After you have Eclipse, PyDev, and Anaconda installed, follow these steps to set Anaconda Python as your default by adding it as a new interpreter, and then selecting that new interpreter:  
+
+Open the Eclipse Preferences window:  
+![Screenshot](https://github.com/bitcarmanlee/easy-algorithm-interview-photo/blob/master/languages/python/anaconda/1.png)  
+Go to PyDev -> Interpreters -> Python Interpreter.  
+Click the New button:  
+![Screenshot](https://github.com/bitcarmanlee/easy-algorithm-interview-photo/blob/master/languages/python/anaconda/2.png)  
+In the “Interpreter Name” box, type “Anaconda Python”.  
+Browse to ~/anaconda/bin/python or wherever your Anaconda Python is installed.  
+Click the OK button.  
+![Screenshot](https://github.com/bitcarmanlee/easy-algorithm-interview-photo/blob/master/languages/python/anaconda/3.png)  
+In the next window, select all the folders and click the OK button again to select the folders to add to the SYSTEM python path.  
+![Screenshot](https://github.com/bitcarmanlee/easy-algorithm-interview-photo/blob/master/languages/python/anaconda/4.png)  
+The Python Interpreters window will now display Anaconda Python. Click OK.  
+![Screenshot](https://github.com/bitcarmanlee/easy-algorithm-interview-photo/blob/master/languages/python/anaconda/5.png)  
+You are now ready to use Anaconda Python with your Eclipse and PyDev installation.  
+
+For other IDEs, see the official site for the corresponding configuration. The address is:  
+https://docs.continuum.io/anaconda/ide_integration#id8  
+
+## 4. Basic usage of Anaconda
+After configuration, check which python the system now uses:  
+
+```
+lei.wang ~ $ which python
+/Users/lei.wang/anaconda/bin/python
+lei.wang ~ $ python --version
+Python 2.7.12 :: Anaconda 4.1.1 (x86_64)
+```  
+
+The system default python is now the Anaconda version!  
+Why? During installation, the installer quietly wrote a PATH entry into the .bash_profile file in the home directory:  
+
+```
+# added by Anaconda2 4.1.1 installer
+export PATH="/Users/wanglei/anaconda/bin:$PATH"
+```  
+
+So python in bash now points to the Python interpreter inside Anaconda.  
+If you use zsh instead of the Mac's default bash, don't worry: copy the line above into your .zshrc file and then source .zshrc.  
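
The same check can be made from inside Python, which is handy when an IDE may be using a different interpreter than the shell. A Python 3 sketch (the helper name `interpreter_info` is made up for this example, and the paths it prints depend on the machine):

```python
import shutil
import sys


def interpreter_info():
    """Collect where the running interpreter lives and what PATH resolves to."""
    return {
        "executable": sys.executable,           # interpreter running this code
        "on_path": shutil.which("python"),      # first "python" found on PATH, or None
        "version": sys.version.split()[0],      # e.g. "3.x.y"
    }


for key, value in interpreter_info().items():
    print(key, "=", value)
```

If `executable` and `on_path` differ, the shell and the current process are not using the same interpreter.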
+
+
+Run the conda command:  
+
+```
+lei.wang ~ $ conda
+usage: conda [-h] [-V] [--debug] command ...
+
+conda is a tool for managing and deploying applications, environments and packages.
+
+Options:
+
+positional arguments:
+  command
+    info         Display information about current conda install.
+    help         Displays a list of available conda commands and their help
+                 strings.
+    list         List linked packages in a conda environment.
+    search       Search for packages and display their information. The input
+                 is a Python regular expression. To perform a search with a
+                 search string that starts with a -, separate the search from
+                 the options with --, like 'conda search -- -h'. A * in the
+                 results means that package is installed in the current
+                 environment. A . means that package is not installed but is
+                 cached in the pkgs directory.
+    create       Create a new conda environment from a list of specified
+                 packages.
+    install      Installs a list of packages into a specified conda
+                 environment.
+    update       Updates conda packages to the latest compatible version. This
+                 command accepts a list of package names and updates them to
+                 the latest versions that are compatible with all other
+                 packages in the environment. Conda attempts to install the
+                 newest versions of the requested packages. To accomplish
+                 this, it may update some packages that are already installed,
+                 or install additional packages. To prevent existing packages
+                 from updating, use the --no-update-deps option. This may
+                 force conda to install older versions of the requested
+                 packages, and it does not prevent additional dependency
+                 packages from being installed. If you wish to skip dependency
+                 checking altogether, use the '--force' option. This may
+                 result in an environment with incompatible packages, so this
+                 option must be used with great caution.
+    upgrade      Alias for conda update. See conda update --help.
+    remove       Remove a list of packages from a specified conda environment.
+    uninstall    Alias for conda remove. See conda remove --help.
+    config       Modify configuration values in .condarc. This is modeled
+                 after the git config command. Writes to the user .condarc
+                 file (/Users/lei.wang/.condarc) by default.
+    init         Initialize conda into a regular environment (when conda was
+                 installed as a Python package, e.g. using pip). (DEPRECATED)
+    clean        Remove unused packages and caches.
+    package      Low-level conda package utility. (EXPERIMENTAL)
+    bundle       Create or extract a "bundle package" (EXPERIMENTAL)
+...
+```  
+
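+The output is too long to list in full. But the options shown so far are already exciting: list, search, install, upgrade, uninstall and the other basics are all there, which means Python dependencies can be managed as conveniently as with apt-get.  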
+
+First run list to see which scientific computing packages are included:  
+
+```
+lei.wang ~ $ conda list
+# packages in environment at /Users/lei.wang/anaconda:
+#
+_nb_ext_conf              0.2.0                    py27_0
+alabaster                 0.7.8                    py27_0
+anaconda                  4.1.1               np111py27_0
+anaconda-client           1.4.0                    py27_0
+anaconda-navigator        1.2.1                    py27_0
+appnope                   0.1.0                    py27_0
+appscript                 1.0.1                    py27_0
+argcomplete               1.0.0                    py27_1
+astropy                   1.2.1               np111py27_0
+babel                     2.3.3                    py27_0
+backports                 1.0                      py27_0
+backports_abc             0.4                      py27_0
+beautifulsoup4            4.4.1                    py27_0
+bitarray                  0.8.1                    py27_0
+blaze                     0.10.1                   py27_0
+bokeh                     0.12.0                   py27_0
+boto                      2.40.0                   py27_0
+bottleneck                1.1.0               np111py27_0
+cdecimal                  2.3                      py27_2
+cffi                      1.6.0                    py27_0
+chest                     0.2.3                    py27_0
+click                     6.6                      py27_0
+cloudpickle               0.2.1                    py27_0
+clyent                    1.2.2                    py27_0
+colorama                  0.3.7                    py27_0
+conda                     4.1.6                    py27_0
+conda-build               1.21.3                   py27_0
+conda-env                 2.5.1                    py27_0
+configobj                 5.0.6                    py27_0
+configparser              3.5.0b2                  py27_1
+contextlib2               0.5.3                    py27_0
+cryptography              1.4                      py27_0
+curl                      7.49.0                        0
+cycler                    0.10.0                   py27_0
+cython                    0.24                     py27_0
+cytoolz                   0.8.0                    py27_0
+dask                      0.10.0                   py27_0
+datashape                 0.5.2                    py27_0
+decorator                 4.0.10                   py27_0
+dill                      0.2.5                    py27_0
+docutils                  0.12                     py27_2
+dynd-python               0.7.2                    py27_0
+entrypoints               0.2.2                    py27_0
+enum34                    1.1.6                    py27_0
+et_xmlfile                1.0.1                    py27_0
+fastcache                 1.0.2                    py27_1
+flask                     0.11.1                   py27_0
+flask-cors                2.1.2                    py27_0
+freetype                  2.5.5                         1
+funcsigs                  1.0.2                    py27_0
+functools32               3.2.3.2                  py27_0
+futures                   3.0.5                    py27_0
+get_terminal_size         1.0.0                    py27_0
+gevent                    1.1.1                    py27_0
+greenlet                  0.4.10                   py27_0
+grin                      1.2.1                    py27_3
+h5py                      2.6.0               np111py27_1
+hdf5                      1.8.16                        0
+heapdict                  1.0.0                    py27_1
+idna                      2.1                      py27_0
+imagesize                 0.7.1                    py27_0
+ipaddress                 1.0.16                   py27_0
+ipykernel                 4.3.1                    py27_0
+ipython                   4.2.0                    py27_1
+ipython_genutils          0.1.0                    py27_0
+ipywidgets                4.1.1                    py27_0
+itsdangerous              0.24                     py27_0
+jbig                      2.1                           0
+jdcal                     1.2                      py27_1
+jedi                      0.9.0                    py27_1
+jinja2                    2.8                      py27_1
+jpeg                      8d                            1
+jsonschema                2.5.1                    py27_0
+jupyter                   1.0.0                    py27_3
+jupyter_client            4.3.0                    py27_0
+jupyter_console           4.1.1                    py27_0
+jupyter_core              4.1.0                    py27_0
+libdynd                   0.7.2                         0
+libpng                    1.6.22                        0
+libtiff                   4.0.6                         2
+libxml2                   2.9.2                         0
+libxslt                   1.1.28                        2
+llvmlite                  0.11.0                   py27_0
+locket                    0.2.0                    py27_1
+lxml                      3.6.0                    py27_0
+markupsafe                0.23                     py27_2
+matplotlib                1.5.1               np111py27_0
+mistune                   0.7.2                    py27_1
+mkl                       11.3.3                        0
+mkl-service               1.1.2                    py27_2
+mpmath                    0.19                     py27_1
+multipledispatch          0.4.8                    py27_0
+nb_anacondacloud          1.1.0                    py27_0
+nb_conda                  1.1.0                    py27_0
+nb_conda_kernels          1.0.3                    py27_0
+nbconvert                 4.2.0                    py27_0
+nbformat                  4.0.1                    py27_0
+nbpresent                 3.0.2                    py27_0
+networkx                  1.11                     py27_0
+nltk                      3.2.1                    py27_0
+nose                      1.3.7                    py27_1
+notebook                  4.2.1                    py27_0
+numba                     0.26.0              np111py27_0
+numexpr                   2.6.0               np111py27_0
+numpy                     1.11.1                   py27_0
+odo                       0.5.0                    py27_1
+openpyxl                  2.3.2                    py27_0
+openssl                   1.0.2h                        1
+pandas                    0.18.1              np111py27_0
+partd                     0.3.4                    py27_0
+path.py                   8.2.1                    py27_0
+pathlib2                  2.1.0                    py27_0
+patsy                     0.4.1                    py27_0
+pep8                      1.7.0                    py27_0
+pexpect                   4.0.1                    py27_0
+pickleshare               0.7.2                    py27_0
+pillow                    3.2.0                    py27_1
+pip                       8.1.2                    py27_0
+ply                       3.8                      py27_0
+psutil                    4.3.0                    py27_0
+ptyprocess                0.5.1                    py27_0
+py                        1.4.31                   py27_0
+pyasn1                    0.1.9                    py27_0
+pyaudio                   0.2.7                    py27_0
+pycosat                   0.6.1                    py27_1
+pycparser                 2.14                     py27_1
+pycrypto                  2.6.1                    py27_4
+pycurl                    7.43.0                   py27_0
+pyflakes                  1.2.3                    py27_0
+pygments                  2.1.3                    py27_0
+pyopenssl                 0.16.0                   py27_0
+pyparsing                 2.1.4                    py27_0
+pyqt                      4.11.4                   py27_3
+pytables                  3.2.2               np111py27_4
+pytest                    2.9.2                    py27_0
+python                    2.7.12                        1
+python-dateutil           2.5.3                    py27_0
+python.app                1.2                      py27_4
+pytz                      2016.4                   py27_0
+pyyaml                    3.11                     py27_4
+pyzmq                     15.2.0                   py27_1
+qt                        4.8.7                         3
+qtconsole                 4.2.1                    py27_0
+qtpy                      1.0.2                    py27_0
+readline                  6.2                           2
+redis                     3.2.0                         0
+redis-py                  2.10.5                   py27_0
+requests                  2.10.0                   py27_0
+rope                      0.9.4                    py27_1
+ruamel_yaml               0.11.7                   py27_0
+scikit-image              0.12.3              np111py27_1
+scikit-learn              0.17.1              np111py27_2
+scipy                     0.17.1              np111py27_1
+setuptools                23.0.0                   py27_0
+simplegeneric             0.8.1                    py27_1
+singledispatch            3.4.0.3                  py27_0
+sip                       4.16.9                   py27_0
+six                       1.10.0                   py27_0
+snowballstemmer           1.2.1                    py27_0
+sockjs-tornado            1.0.3                    py27_0
+sphinx                    1.4.1                    py27_0
+sphinx_rtd_theme          0.1.9                    py27_0
+spyder                    2.3.9                    py27_0
+sqlalchemy                1.0.13                   py27_0
+sqlite                    3.13.0                        0
+ssl_match_hostname        3.4.0.2                  py27_1
+statsmodels               0.6.1               np111py27_1
+sympy                     1.0                      py27_0
+terminado                 0.6                      py27_0
+tk                        8.5.18                        0
+toolz                     0.8.0                    py27_0
+tornado                   4.3                      py27_1
+traitlets                 4.2.1                    py27_0
+unicodecsv                0.14.1                   py27_0
+werkzeug                  0.11.10                  py27_0
+wheel                     0.29.0                   py27_0
+xlrd                      1.0.0                    py27_0
+xlsxwriter                0.9.2                    py27_0
+xlwings                   0.7.2                    py27_0
+xlwt                      1.1.2                    py27_0
+xz                        5.2.2                         0
+yaml                      0.1.6                         0
+zlib                      1.2.8                         3
+```  
+
+Well, at least everything I use regularly is already here. Very convenient.  
+
+## 5. Write a demo to test sklearn
+To check whether it really works as well as people say, I grabbed some simple test code from the web:  
+
+```
+#!/usr/bin/env python
+#coding:utf-8
+
+'''
+Created on July 15, 2016
+
+@author: lei.wang
+'''
+
+import numpy as np
+import urllib
+from sklearn import preprocessing
+from sklearn import metrics
+from sklearn.ensemble import ExtraTreesClassifier
+from sklearn.linear_model import LogisticRegression
+
+
+def t1():
+    url = "http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
+    raw_data = urllib.urlopen(url)
+    dataset = np.loadtxt(raw_data,delimiter=",")
+    # columns 0-6 are used as features (column 7 is left out); column 8 is the class label
+    X = dataset[:,0:7]
+    y = dataset[:,8]
+    
+    # normalize the data attributes
+    normalized_X = preprocessing.normalize(X)
+    # standardize the data attributes
+    standardized_X = preprocessing.scale(X)
+
+    model = ExtraTreesClassifier()
+    model.fit(X, y)
+    # display the relative importance of each attribute
+    print model.feature_importances_
+    
+    model = LogisticRegression()
+    model.fit(X, y)
+    print(model)
+    # make predictions
+    expected = y
+    predicted = model.predict(X)
+    # summarize the fit of the model
+    print(metrics.classification_report(expected, predicted))
+    print(metrics.confusion_matrix(expected, predicted))
+
+t1()
+```  
+
+Running the code gives the following result:  
+
+```
+[ 0.13697671  0.26771573  0.11139943  0.08658428  0.079841    0.16862413
+  0.1488587 ]
+LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
+          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
+          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
+          verbose=0, warm_start=False)
+             precision    recall  f1-score   support
+
+        0.0       0.79      0.89      0.84       500
+        1.0       0.74      0.55      0.63       268
+
+avg / total       0.77      0.77      0.77       768
+
+[[447  53]
+ [120 148]]
+
+```  
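
The figures in the classification report can be re-derived from the confusion matrix printed above. The following is a minimal sketch in plain Python 3 (no sklearn needed). The helper name `per_class_metrics` is made up for illustration, and the matrix is assumed to follow sklearn's layout, with rows as true classes and columns as predicted classes.

```python
def per_class_metrics(matrix):
    """Return {class_index: (precision, recall, f1)} for a square confusion matrix.

    Assumes rows are true classes and columns are predicted classes.
    """
    n = len(matrix)
    metrics = {}
    for k in range(n):
        true_pos = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precision = true_pos / predicted_k
        recall = true_pos / actual_k
        f1 = 2 * precision * recall / (precision + recall)
        metrics[k] = (precision, recall, f1)
    return metrics


# Confusion matrix copied from the output above.
confusion = [[447, 53],
             [120, 148]]

correct = sum(confusion[k][k] for k in range(len(confusion)))
accuracy = correct / sum(sum(row) for row in confusion)

report = per_class_metrics(confusion)
for label, (p, r, f1) in report.items():
    print("class %d: precision=%.2f recall=%.2f f1=%.2f" % (label, p, r, f1))
print("accuracy=%.4f" % accuracy)
```

The rounded values agree with the report. Note that the demo scores the model on the same rows it was trained on, so these numbers describe how well the model fits the data rather than how well it generalizes.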
+
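+OK, sklearn behaves normally and produces the expected output. Anaconda really does give people who work on algorithms and data a great tool, sparing us the hassle of setting up environments and hunting for dependencies. Hats off to the programmers who built such a handy tool!  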
--- a/迭代器与生成器详解.md
+++ b/迭代器与生成器详解.md
@ -0,0 +1,274 @@
+In Python we often use for loops to traverse collections, most commonly list and dict. These collections are all iterable objects. Let's start with iterators in Python.  
+
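+## 1. Iterators
+As the name suggests, an iterator is used for iteration. Take list as an example: the most common thing we do with a list is loop over it, and looping is iteration.  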
+
+```
+>>> list = [1,2,3]
+>>> dir(list)
+['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__delslice__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getslice__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__setslice__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
+```  
+A list has an \__iter__ method. Calling it returns an iterator:  
+
+```
+>>> it = list.__iter__()
+>>> it
+<listiterator object at 0x10fa12950>
+>>> dir(it)
+['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__length_hint__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'next']
+```  
+
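+An iterator is an object that has a next method (Python 3 renames it to \__next__). Note that next is called without any arguments. Each call returns the iterator's next value; when no values are left, it raises a StopIteration exception.  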
+
+```
+>>> it.next()
+1
+>>> it.next()
+2
+>>> it.next()
+3
+>>> it.next()
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+StopIteration
+```  
+
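+Some readers may ask: lists work perfectly well, so why bother with iterators? Because a list holds all of its values at once. If the list is very large it takes up a lot of memory, possibly more than the machine has. An iterator computes each value only when the loop asks for it, so its memory footprint is far smaller.  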
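
To make the protocol concrete, here is a rough sketch of what a `for` loop does behind the scenes. It is written for Python 3, where the built-ins `iter()` and `next()` are the usual way to drive an iterator and the method itself is named `__next__` rather than `next`. The helper name `manual_for` is made up for this example.

```python
def manual_for(iterable):
    """Collect the items of `iterable` the way a for loop would visit them."""
    seen = []
    it = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(it)      # calls it.__next__() (it.next() in Python 2)
        except StopIteration:    # the iterator is exhausted: leave the loop
            break
        seen.append(item)
    return seen


print(manual_for([1, 2, 3]))
print(manual_for("ab"))          # strings are iterable too, one character at a time
```

The loop never asks how many items there are; it simply keeps requesting the next one until `StopIteration` arrives, which is why an iterator does not need to hold all of its values in memory.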
+
+### Implementing the Fibonacci sequence with an iterator
+
+```
+#!/usr/bin/env python
+#coding:utf-8
+
+'''
+Created on May 6, 2016
+
+@author: lei.wang
+'''
+
+class Fibonacci(object):
+    def __init__(self):
+        self.a = 0
+        self.b = 1
+        
+    def next(self):
+        self.a,self.b = self.b,self.a + self.b
+        print self.a
+        return self.a
+    
+    def __iter__(self):
+        return self
+    
+if __name__ == '__main__':
+    fib = Fibonacci()
+    for n in fib:
+        if n > 10:
+            #print n
+            break
+
+```  
+
+So far we have gone from a list to an iterator. Can an iterator be turned back into a list? Of course it can:  
+
+```
+#!/usr/bin/env python
+#coding:utf-8
+
+'''
+Created on May 6, 2016
+
+@author: lei.wang
+'''
+
+class MyIterator(object):
+    index = 0
+    
+    def __init__(self):
+        pass
+    
+    def next(self):
+        self.index += 1
+        if self.index > 10:
+            raise StopIteration
+        return self.index
+    
+    def __iter__(self):
+        return self
+    
+if __name__ == '__main__':
+    my_iterator = MyIterator()
+    my_list = list(my_iterator)
+    print my_list
+    
+```  
+
+```
+[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+```  
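
The same idea carries over to the Fibonacci iterator: once the iterator raises `StopIteration`, the sequence is finite and `list()` can consume it. Below is a sketch for Python 3 (hence `__next__` instead of `next`). The class name `BoundedFibonacci` and its `limit` parameter are assumptions made for this example, not part of the code above.

```python
class BoundedFibonacci:
    """Iterator over the Fibonacci numbers that do not exceed `limit`."""

    def __init__(self, limit):
        self.limit = limit
        self.a, self.b = 0, 1

    def __iter__(self):
        return self              # an iterator returns itself from __iter__

    def __next__(self):
        self.a, self.b = self.b, self.a + self.b
        if self.a > self.limit:
            raise StopIteration  # signals the end to for loops and list()
        return self.a


fib = BoundedFibonacci(10)
first_pass = list(fib)    # consumes the iterator
second_pass = list(fib)   # nothing is left: iterators are one-shot
print(first_pass)
print(second_pass)
```

An exhausted iterator stays exhausted, so the second `list()` call comes back empty. To walk the sequence again, create a new `BoundedFibonacci` object.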
+
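+## 2. Generators
+When we call an ordinary Python function (and this holds for functions in most languages, not just Python), execution starts at the first line and continues until it hits a return statement, an exception, or the last line of the function. At that point the function hands control back to the caller, and all of its work in progress and local variables are lost. The next call recreates every local variable and stack frame from scratch, with no connection to the previous call.  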
+
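+Sometimes we don't want a function to return a single value but a whole sequence, such as the Fibonacci sequence above. To do that, the function must be able to save its working state. An ordinary return statement won't do, because once return executes, control goes back to the caller and all of the function's state is discarded. This is where the yield keyword comes in: a function whose body contains yield is a generator function.  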
+
+
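+In Python, generators are created by generator functions, which are defined the same way as ordinary functions. The only difference is that a generator function hands back its results with yield, one at a time, instead of return. It suspends and resumes its state between results, which makes iteration work automatically.  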
+
+Enough talk. Show me the code:  
+
+```
+#!/usr/bin/env python
+#coding:utf-8
+
+'''
+Created on May 6, 2016
+
+@author: lei.wang
+'''
+
+def myXrange(n):
+    print "myXrange beginning!"
+    i = 0
+    while i < n:
+        print "before yield, i is: ",i
+        yield i
+        i += 1
+        print "after yield, i is: ",i
+    print "myXrange endding!"        
+    
+def testMyXrange():
+    my_range = myXrange(3)
+    print my_range
+    print "--------\n"
+    
+    print my_range.next()
+    print "--------\n"
+    
+    print my_range.next()
+    print "--------\n"
+    
+    print my_range.next()
+    print "--------\n"    
+    
+    print my_range.next()
+    print "--------\n"
+    
+testMyXrange()
+```  
+
+Output of the code:  
+
+```
+<generator object myXrange at 0x10b3f6b90>
+--------
+
+myXrange beginning!
+before yield, i is:  0
+0
+--------
+
+after yield, i is:  1
+before yield, i is:  1
+1
+--------
+
+after yield, i is:  2
+before yield, i is:  2
+2
+--------
+
+after yield, i is:  3
+myXrange endding!
+Traceback (most recent call last):
+  File "/Users/lei.wang/code/java/pydevttt/leilei/bit/interview/myGenerator.py", line 37, in <module>
+    testMyXrange()
+  File "/Users/lei.wang/code/java/pydevttt/leilei/bit/interview/myGenerator.py", line 34, in testMyXrange
+    print my_range.next()
+StopIteration
+
+```  
+
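+From the output it is easy to see that:  
+1. Calling a generator function only returns a generator object; none of the logic inside runs yet.  
+2. The generator starts working only when next() is called for the first time. When it reaches a yield statement, execution pauses. Unlike return, this pause keeps the function's state.  
+3. A call to next() returns the value given to yield.  
+4. Each further call to next() resumes execution right after the yield where it last paused and runs until the next yield.  
+5. When no yield is left to reach, the function body ends and a StopIteration exception is raised.  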
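
The five observations can also be checked programmatically. The sketch below is for Python 3, so `next(gen)` replaces `gen.next()`. It records events in a list instead of printing them; the names `counter` and `events` are illustrative.

```python
events = []

def counter(n):
    events.append("start")
    i = 0
    while i < n:
        yield i                          # pause here and hand i to the caller
        events.append("resumed after %d" % i)
        i += 1
    events.append("end")

gen = counter(2)
assert events == []                      # 1. calling the function runs none of its body

first = next(gen)                        # 2./3. runs up to the first yield, returns its value
assert first == 0 and events == ["start"]

second = next(gen)                       # 4. resumes after the yield, pauses at the next one
assert second == 1 and events == ["start", "resumed after 0"]

try:
    next(gen)                            # 5. no yield left: the body runs to its end...
except StopIteration:
    events.append("StopIteration")       # ...and StopIteration is raised

print(events)
```

Each `assert` corresponds to one numbered point, so the script only runs to completion if the generator behaves as described.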
+
+Generators are actually nothing new to us.  
+Take the list comprehension that everyone knows:  
+
+```
+>>> list=[i for i in range(10)]
+>>> list
+[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+>>> type(list)
+<type 'list'>
+```  
+
+Change the square brackets to parentheses:  
+
+```
+>>> gen=(i for i in range(3))
+>>> gen
+<generator object <genexpr> at 0x10c4a19b0>
+>>> gen.next()
+0
+>>> gen.next()
+1
+>>> gen.next()
+2
+>>> gen.next()
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+StopIteration
+```  
+
+As you can see, this generator expression produces a typical generator.  
+
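+Another familiar example:  
+We often loop over a list produced by range. Note that in Python 2 range builds an actual list; if that list is huge, it may not fit in memory. That is when to use xrange. It returns a lazy xrange object that computes each value on demand, much like a generator (strictly speaking it is a sequence type, not a generator), so memory is no longer the limit.  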
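
A quick way to see the difference is to compare sizes in memory. The sketch below uses Python 3, where `range` plays the role of Python 2's `xrange` and the old list-building behavior is spelled `list(range(n))`. Strictly speaking such an object is a lazy sequence rather than a generator: it supports `len()` and indexing, which generators do not. Exact byte counts vary between interpreters, so only the comparison is meaningful.

```python
import sys

n = 1000000
eager = list(range(n))          # materializes every element up front
lazy = range(n)                 # stores only start, stop and step
gen = (i for i in range(n))     # a true generator: values produced on demand

print(sys.getsizeof(eager))     # grows with n
print(sys.getsizeof(lazy))      # small and independent of n
print(sys.getsizeof(gen))       # small and independent of n

# A range is a sequence: it has a length and can be indexed.
print(len(lazy), lazy[-1])

# A generator is not: it can only be advanced with next().
try:
    len(gen)
    gen_has_len = True
except TypeError:
    gen_has_len = False
print("generator supports len():", gen_has_len)
```

Note that `sys.getsizeof` reports only the container itself, not the integer objects a list refers to, so the real gap is larger than the printed numbers suggest.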
+
+Generating the Fibonacci sequence with a generator:  
+
+```
+#!/usr/bin/env python
+#coding:utf-8
+
+'''
+Created on May 6, 2016
+
+@author: lei.wang
+'''
+class Fibonacci_generator(object):
+    def __init__(self):
+        self.a = 0
+        self.b = 1
+    
+    def get_num(self):
+        while True:
+            self.a,self.b = self.b,self.a+self.b
+            print self.a
+            yield self.a
+
+if __name__ == '__main__':
+    fib = Fibonacci_generator()
+    for n in fib.get_num():
+        if n > 10:
+            break
+
+```  
+
+Running the code above:  
+
+```
+1
+1
+2
+3
+5
+8
+13
+
+```  
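
Since a generator function keeps its own local state between calls, the class wrapper is optional. Here is a minimal Python 3 sketch that uses `itertools` from the standard library to cut the endless stream down to size; the function name `fibonacci` is chosen for this example.

```python
from itertools import islice, takewhile

def fibonacci():
    """Yield the Fibonacci numbers 1, 1, 2, 3, 5, ... without end."""
    a, b = 0, 1
    while True:
        a, b = b, a + b
        yield a

first_seven = list(islice(fibonacci(), 7))                   # take a fixed count
up_to_ten = list(takewhile(lambda n: n <= 10, fibonacci()))  # stop at a threshold

print(first_seven)   # [1, 1, 2, 3, 5, 8, 13]
print(up_to_ten)     # [1, 1, 2, 3, 5, 8]
```

`first_seven` reproduces the seven numbers printed above. In the original loop 13 shows up as well because the value is printed inside the generator before the `n > 10` check breaks out of the loop.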
--- a/papers/languages/python/python处理文件效率对比awk.md
+++ b/papers/languages/python/python处理文件效率对比awk.md
@ -0,0 +1,100 @@
+Given the following three files:
+
+```
+wc -l breakfast_all cheap_all receptions_all
+   3345271 breakfast_all
+   955890 cheap_all
+   505504 receptions_all
+  4806665 total
+
+head -3 cheap_all
+a    true
+b    true
+c    true
+```  
+
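+All three files have a similar structure, with the uid in the first column. The goal is to count how many distinct uids appear across the three files. I wrote the code in both Python and awk to compare how fast each one processes text.  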
+
+The Python code:  
+
+
+```
+#!/usr/bin/env python
+#coding:utf-8
+
+import time
+
+def t1():
+    dic = {}
+    filelist = ["breakfast_all","receptions_all","cheap_all"]
+    start = time.clock()
+    for each in filelist:
+        f = open(each,'r')
+        for line in f.readlines():
+            key = line.strip().split()[0]
+            if key not in dic:
+                dic[key] = 1
+
+    end = time.clock()
+    print len(dic)
+    print 'cost time is: %f' %(end - start)
+
+def t2():
+    uid_set = set()
+    filelist = ["breakfast_all","receptions_all","cheap_all"]
+    start = time.clock()
+    for each in filelist:
+        f = open(each,'r')
+        for line in f.readlines():
+            key = line.strip().split()[0]
+            uid_set.add(key)
+
+    end = time.clock()
+    print len(uid_set)
+    print 'cost time is: %f' %(end - start)
+
+t1()
+t2()
+```  
+
+Using awk:  
+
+```
+#!/bin/bash
+
+function handle()
+{
+    start=$(date +%s%N)
+    start_ms=${start:0:16}
+    awk '{a[$1]++} END{print length(a)}' breakfast_all receptions_all cheap_all
+    end=$(date +%s%N)
+    end_ms=${end:0:16}
+    echo "cost time is:"
+    echo "scale=6;($end_ms - $start_ms)/1000000" | bc
+}
+
+handle
+```  
+
+
+Running the Python script:  
+```
+./test.py
+3685715
+cost time is: 4.890000
+3685715
+cost time is: 4.480000
+```  
+
+
+Running the shell script:  
+
+```
+./zzz.sh
+3685715
+cost time is:
+4.865822
+```  
+
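+So in Python the set is slightly faster than the dict, and overall awk and Python process the files at roughly the same speed. One caveat: on Unix, time.clock() reports CPU time, whereas the shell script measures wall-clock time, so the two figures are only roughly comparable.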
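
Both Python functions above call `readlines()`, which loads a whole file into memory before the loop starts. Iterating over the file object directly reads one line at a time, and a `with` block closes the file afterwards. The sketch below is for Python 3 and is not the code that was benchmarked. `count_unique_uids` is a hypothetical helper, and the two small temporary files stand in for the real data. Timing uses `time.perf_counter()`, which measures elapsed time.

```python
import os
import tempfile
import time

def count_unique_uids(paths):
    """Count distinct first-column values across whitespace-separated files."""
    uids = set()
    for path in paths:
        with open(path) as f:            # closed automatically on exit
            for line in f:               # streams the file line by line
                fields = line.split()
                if fields:               # skip blank lines
                    uids.add(fields[0])
    return len(uids)

# Build two tiny sample files shaped like the ones above (uid, then a flag).
tmp_dir = tempfile.mkdtemp()
samples = {
    "cheap_all": "a\ttrue\nb\ttrue\n",
    "receptions_all": "b\ttrue\nc\ttrue\n\n",
}
paths = []
for name, content in samples.items():
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as f:
        f.write(content)
    paths.append(path)

start = time.perf_counter()
total = count_unique_uids(paths)
elapsed = time.perf_counter() - start
print(total)
print("cost time is: %f" % elapsed)
```

Whether streaming is actually faster on the real files would need to be measured; the main gain is that memory use no longer grows with file size.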
+
--- a/papers/languages/python/python解析配置文件.md
+++ b/papers/languages/python/python解析配置文件.md
@ -0,0 +1,66 @@
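+I recently had a small Python project with a pile of files to process, so I put the file locations into a configuration file and wrote a class to parse it. It is shared here for reference; feel free to take it if you need it.  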
+
+```
+#!/usr/bin/env python
+#coding:utf-8
+
+#-----------------------------------------------------
+# author: wanglei
+# date  : 20160321
+# desc  : parse a configuration file
+# param : path to the configuration file
+#-----------------------------------------------------
+
+
+import ConfigParser
+
+class confParse(object):
+
+    def __init__(self,conf_path):
+        self.conf_path = conf_path
+        self.conf_parser = ConfigParser.ConfigParser()
+        self.conf_parser.read(conf_path)
+
+    def get_sections(self):
+        return self.conf_parser.sections()
+
+    def get_options(self,section):
+        return self.conf_parser.options(section)
+
+    def get_items(self,section):
+        return self.conf_parser.items(section)
+
+    def get_val(self,section,option,is_bool = False,is_int = False):
+        if is_bool and not is_int:
+            # boolean option
+            val = self.conf_parser.getboolean(section,option)
+            return val
+        elif not is_bool and is_int:
+            val = self.conf_parser.getint(section,option)
+            return val
+
+        val = self.conf_parser.get(section,option)
+        return val
+```  
+
+
+
+The configuration file looks like this:  
+
+
+```
+[labels_of_search]
+base_dir = /home/lei.wang/datas/datas_user_label
+cheap = %(base_dir)s/cheap_all
+receptions = %(base_dir)s/receptions_all
+breakfast = %(base_dir)s/breakfast_all
+
+[result_file]
+result_file = /home/lei.wang/datas/datas_user_label/hive_data/user_labels
+```  
+
+Note the %(xxx)s syntax: xxx must be defined in the same section (or in the special [DEFAULT] section).  
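
The interpolation rule can be tried out without touching the file system. This sketch uses Python 3, where the module is named `configparser` (lower case) and `read_string` parses a configuration held in a string. The `[other]` section and its `broken` option are invented here to trigger the failure case.

```python
import configparser

CONFIG_TEXT = """
[labels_of_search]
base_dir = /home/lei.wang/datas/datas_user_label
cheap = %(base_dir)s/cheap_all

[other]
broken = %(base_dir)s/somewhere
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# base_dir lives in the same section, so the reference is expanded.
cheap = parser.get("labels_of_search", "cheap")
print(cheap)

# [other] has no base_dir of its own and there is no [DEFAULT] fallback,
# so reading the value raises an interpolation error.
try:
    parser.get("other", "broken")
    failed = False
except configparser.InterpolationMissingOptionError:
    failed = True
print("cross-section reference failed:", failed)
```

Values that several sections need to share can go into a `[DEFAULT]` section, which every section falls back to during interpolation.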
+
+
+
+