Python functions

A function is the smallest unit of code organization in Python. A function takes input (parameters) and produces output (a return value); in effect, it is a unit of code that transforms input into output.

Four kinds of functions can be created in Python:

  1. Global function: defined in the module;
  2. Local function: nested in other functions;
  3. lambda function: expression;
  4. Method: a function associated with a specific data type, callable only on values of that type.
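A minimal sketch illustrating the four kinds (all names here are invented for the example):

```python
# 1. Global function: defined at module level
def global_fn():
    return 'global'

# 2. Local function: nested inside another function
def outer_fn():
    def local_fn():
        return 'local'
    return local_fn()

# 3. lambda function: a single expression that evaluates to a function
double = lambda x: x * 2

# 4. Method: a function associated with a specific type (here, str)
upper = 'abc'.upper()

print(global_fn(), outer_fn(), double(3), upper)
```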

Extract the repeated parts of a program and define them as a function. Benefits of using functions:

  • Better program scalability;
  • Less duplicated code;
  • Easier changes to the program structure.
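For instance, repeated greeting logic can be pulled into one function (the names are invented for this sketch):

```python
# Without a function, the same formatting line would be copied three times;
# with a function there is a single place to change the message.
def greet(name):
    return 'Hello, {}!'.format(name)

for n in ('tom', 'jerry', 'sam'):
    print(greet(n))
```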

Definition syntax:

def fn(arg1, arg2):
    print(arg1 + arg2)
    ...
    return arg1 + arg2  # the return value
 

When defining a function, the body of the function is not executed. It will execute only when we call it.

Function call:

fn(1, 2)
 

The arguments passed in must match the parameters of the function definition; if they do not match, a TypeError is raised. Arguments passed in the order the parameters were defined are called positional arguments.
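A small demonstration of the TypeError mentioned above (the function name is invented):

```python
def add(x, y):
    return x + y

print(add(1, 2))  # arguments match the definition

try:
    add(1)  # missing the second positional argument
except TypeError as e:
    print('TypeError:', e)
```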

Function parameters

The parameters of the function are divided into formal parameters and actual parameters.

  • When defining a function, the variable names in the parentheses after the function name are called formal parameters (parameters);
  • When calling a function, the values in the parentheses after the function name are called actual parameters (arguments).
In [1]: def fun(x,y):
   ...:     print(x + y)
   ...:

In [2]: fun(3,4)
7

In [3]: fun('ab','c')
abc
 

Determine whether the parameter passed to the script is a number:

#!/usr/bin/env python3
import sys

def isNum(s):
    for i in s:
        if i not in '0123456789':
            print('%s is not a number' % s)
            sys.exit()
    print('%s is a number' % s)

isNum(sys.argv[1])
 

Passing positional parameters

Arguments are passed in the order the parameters were defined; this way of passing is called positional. It is the simplest and most common way to pass arguments.

>>> def add(x, y):
...     return x + y
...
>>> add(2, 3)  # 2 is bound to x, 3 to y
Out[10]: 5
 

Default parameters

A parameter can be given a default value in the definition; if the caller omits it, the default is used. For example, a website registration form may fall back to a default nationality whether or not one is chosen.

>>> def inc(base, x=1):
...     return base + x
...
>>> inc(3)  # x takes its default value 1
Out[12]: 4
>>> inc(3, 3)  # x is explicitly 3
Out[13]: 6
 

But default parameters must appear after parameters without default values; otherwise, when only one argument is passed, Python cannot tell which parameter it belongs to.

>>> def inc(x=1, base):
...     return base + x
...
  File "<ipython-input-14-ac010ba50fd9>", line 1
    def inc(x=1, base):
           ^
SyntaxError: non-default argument follows default argument
 

Variable parameters

There is such a situation:

>>> def sum(lst):
...     ret = 0
...     for i in lst:
...         ret += i
...     return ret
...
>>> sum([1, 2, 3])  # the whole list is a single argument
Out[16]: 6
 

This function accepts only a single argument; but if you want to pass the list's elements directly as arguments, you can define the parameter like this:

>>> def sum(*lst):  # lst collects the positional arguments into a tuple
...     ret = 0
...     for i in lst:
...         ret += i
...     return ret
...
>>> sum(1, 2, 3, 4)  # any number of positional arguments
Out[19]: 10
 

There are two types of variable parameters:

  • Positional variable parameter: one asterisk before the parameter means it can accept any number of arguments, which are collected into a tuple; these arguments can only be passed positionally.
  • Keyword variable parameter: two asterisks before the parameter mean it can accept any number of arguments, which are collected into a dictionary; these arguments can only be passed as keyword arguments.
>>> def connect(**kwargs):
...     print(type(kwargs))
...     for k, v in kwargs.items():
...         print('{} -> {}'.format(k, v))
...
>>> connect(host='10.0.0.1', port=3306)
<class 'dict'>
host -> 10.0.0.1
port -> 3306
 

Precautions:

  • The two kinds of variable parameters can be used together, but the keyword variable parameter (**) must come last.
  • Ordinary parameters can be used alongside variable parameters, but the arguments must match when calling.
  • A positional variable parameter may come before ordinary parameters, but the ordinary parameters after it can then only be passed as keyword arguments.
  • A keyword variable parameter is not allowed before ordinary parameters.
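A sketch combining the rules above: an ordinary parameter, then *args (so the parameter after it is keyword-only), then **kwargs last (all names invented):

```python
def report(prefix, *args, sep=', ', **kwargs):
    # args is a tuple, kwargs a dict; sep can only be passed by keyword
    body = sep.join(str(a) for a in args)
    extra = sep.join('{}={}'.format(k, v) for k, v in sorted(kwargs.items()))
    return '{}: {} ({})'.format(prefix, body, extra)

print(report('nums', 1, 2, 3, sep='; ', unit='m'))
```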

To avoid errors or confusion for callers, follow these rules:

  1. Default parameters come last;
  2. Variable parameters come last;
  3. Default parameters and variable parameters should not appear at the same time.

When we connect to the database, we can deal with it like this:

# Option 1: explicit defaults plus **kwargs for the rest
def connect(host='127.0.0.1', port='3306', user='root', password='', db='test', **kwargs):
    pass

# Option 2: take everything through **kwargs and pop with defaults
def connect(**kwargs):
    host = kwargs.pop('host', '127.0.0.1')
 

Parameter unpacking

When calling a function, prefixing an argument with * unpacks a collection so that its elements are passed as individual positional arguments (or, with **, as keyword arguments). Before that, let's review variable unpacking:

>>> l1 = ['Sun', 'Mon', 'Tus']
>>> x, y, z = l1
>>> x, y, z
('Sun', 'Mon', 'Tus')
 

Parameter unpacking works the same way:

>>> def f1(a, b, c):
...   print(a, b, c)
...
>>> f1(*['Sun', 'Mon', 'Tus'])
Sun Mon Tus
 

The number of parameters and the number of elements in the list must match exactly.

>>> def sum(*args):
...     ret = 0
...     for i in args:
...         ret += i
...     return ret
...
>>> sum(*range(10))
Out[34]: 45
 

There are two forms of parameter unpacking:

  • One asterisk: the object being unpacked is any iterable, and the result of unpacking is positional arguments;
  • Two asterisks: the object being unpacked is a dictionary, and the result of unpacking is keyword arguments.

The keys of a dictionary unpacked with two asterisks must be strings.
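This can be checked directly: unpacking a dict with a non-string key fails (a small sketch):

```python
def show(**kwargs):
    return kwargs

print(show(**{'a': 1}))  # string keys become keyword arguments

try:
    show(**{1: 'x'})  # a non-string key cannot become a keyword argument
except TypeError as e:
    print('TypeError:', e)
```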

Passing keyword arguments

This concerns actual arguments, that is, how values are passed when calling a function; nothing special is needed in the function definition itself. It avoids having to remember the order of defaulted parameters, so it combines very well with default parameters.

First define a function, and then use keyword parameters to pass:

>>> def add(x, y):
...     return x + y
...
>>> add(2, y=3)
Out[21]: 5
 

When positional parameters and keyword parameters are used together, the positional parameters must be in front, otherwise an error will be reported.

>>> def add(x, y):
...     return x + y
...
>>> add(x=2, 3)
  File "<ipython-input-9-92c728388ce1>", line 1
    add(x=2, 3)
            ^
SyntaxError: positional argument follows keyword argument
 

Default parameters combined with keyword arguments work very well together and make function calls concise.

def connect(host='127.0.0.1', port='3306', user='root', password='', db='test'):
    pass

connect('10.0.0.5', password='123456')
 

Variable parameters allow you to pass in 0 or any number of parameters. These variable parameters are automatically encapsulated into a tuple when the function is called. Keyword parameters allow you to pass in 0 or any number of parameters with parameter names. These keyword parameters are automatically assembled into a dict within the function. Look at the example:

>>> def person(name, age, **kw):
...     print('name:', name, 'age:', age, 'other:', kw)
 

In addition to the mandatory parameters name and age, the person function also accepts the keyword parameter kw. When calling this function, you can only pass in required parameters:

>>> person('Michael', 30)
name: Michael age: 30 other: {}
 

You can also pass in any number of keyword parameters:

>>> person('Bob', 35, city='Beijing')
name: Bob age: 35 other: {'city': 'Beijing'}
>>> person('Adam', 45, gender='M', job='Engineer')
name: Adam age: 45 other: {'gender': 'M', 'job': 'Engineer'}
 

What is the use of keyword parameters? It can extend the functionality of the function. For example, in the person function, we are guaranteed to receive the two parameters name and age, but if the caller is willing to provide more parameters, we can also receive it.

Imagine that you are doing a user registration function. Except for the user name and age, which are required, the others are optional. Using keyword parameters to define this function can meet the registration requirements.

Similar to variable parameters, you can also assemble a dict first, and then convert the dict into keyword parameters and pass it in:

>>> extra = {'city': 'Beijing', 'job': 'Engineer'}
>>> person('Jack', 24, city=extra['city'], job=extra['job'])
name: Jack age: 24 other: {'city': 'Beijing', 'job': 'Engineer'}
 

Of course, the above complex call can be written in a simplified way:

>>> extra = {'city': 'Beijing', 'job': 'Engineer'}
>>> person('Jack', 24, **extra)
name: Jack age: 24 other: {'city': 'Beijing', 'job': 'Engineer'}
 

**extra passes all the key-value pairs of the extra dict into the function as keyword arguments, so inside the function kw receives a dict. Note that kw gets a copy of extra: changes to kw inside the function do not affect extra outside.

keyword-only parameter

This was added in Python 3.

  • Parameters after a bare asterisk can only be passed as keyword arguments; they are called keyword-only parameters.
  • The asterisk itself does not receive any value.
  • Parameters after a variable positional parameter (*args) are also keyword-only.
  • Keyword-only parameters may have default values.
  • Keyword-only parameters can appear together with default parameters, whether or not they have defaults of their own.
>>> def fn(*, x):  # x is a keyword-only parameter
...     print(x)
...
>>> fn(1)  # passing positionally fails
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-cb5d79cf2c77> in <module>()
----> 1 fn(1)

TypeError: fn() takes 0 positional arguments but 1 was given

>>> fn(x=3)  # must be passed by keyword
3
 

In typical usage a keyword-only parameter has a default value. Keyword-only parameters are widely used in the Python standard library.
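For example, the built-in sorted takes key and reverse as keyword-only parameters with defaults:

```python
data = [3, 1, 2]

print(sorted(data, reverse=True))  # keyword-only parameter with a default

try:
    sorted(data, None, True)  # key and reverse cannot be passed positionally
except TypeError as e:
    print('TypeError:', e)
```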

return

Functions generally have inputs and outputs, and return defines the output of the function, that is, the return value.

def fun():
    print('hello world')

fun()
 

Executing this script will output hello world. How to check the return value of this function? Just add a print:

>>> def fun():
...     print('hehe')
...
>>> print(fun())
hehe
None
 

You can see that the return value is None. If the return value of the function is not defined, it defaults to None. At this point, the return value can be defined by return:

>>> def fun(x, y):
...     return x + y
...     print('hehe')
...
>>> fun(3, 5)
Out[57]: 8
 

As shown above, once return is reached the function ends immediately, and the code after it never runs.

There can be multiple returns in a function. Which return is executed will return the result and end the function.

>>> def guess(x):
...     if x > 3:
...         return '> 3'
...     else:
...         return '<= 3'
...
>>> guess(4)
Out[59]: '> 3'
 

The return value can be anything. In practice, return is used far more often inside functions than print: the point of return is that the caller can use the returned value to do something.

Scope and global variables

The scope is the visible range of a variable. Before touching the function, there is no concept of scope, because all the code is in the same scope. Since a function is the smallest unit for organizing code, the concept of scope starts from the function.

Python creates, changes, and looks up variable names in namespaces. Where a name is assigned in the code determines the scope in which it can be accessed. A function defines a local scope and a module defines a global scope; each module is one global scope, so the global scope is limited to a single program file.

Each call to a function creates a new local scope, and a variable assigned inside it is local unless it is declared global.

All variable names fall into one of three kinds: local, global, or built-in (provided by the builtins module; __builtin__ in Python 2).

Name lookup proceeds through the scopes in order: local first, then enclosing, then global, then built-in. That is, a reference is first searched in the local function; if not found there, in the enclosing functions (Enclosing function locals); if still not found, among the module's globals (Global); then among the built-ins (Built-in); and if it is still not found, a NameError is raised.

The scope of a local variable is limited to its function: variables defined inside a function exist only there. The scope of a global variable is the whole module. A local variable takes priority over a global variable of the same name, and the two do not overwrite each other.
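A quick sketch of the shadowing rule:

```python
x = 'global'

def show():
    x = 'local'  # a new local name; the global x is untouched
    return x

print(show())  # the local wins inside the function
print(x)       # the global is unchanged outside
```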

>>> x = 1  # a global variable
>>> def inc():
...     x += 1
...
>>> inc()  # x += 1 makes x local to inc, but x is read before assignment
Traceback (most recent call last):
  File "/usr/local/python/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-62-ae671e6b904f>", line 1, in <module>
    inc()
  File "<ipython-input-61-661b9217054c>", line 2, in inc
    x += 1
UnboundLocalError: local variable 'x' referenced before assignment
 

Every program has one global scope: code written at the top level is global. Besides the global scope there are local scopes, and since local scopes nest, there can be multiple levels of them.

>>> def outer():
...     x = 1
...     print(x)
...     def inner():
...         x = 2  # defines a new x local to inner
...     inner()
...     print(x)
...
>>> outer()
1
1
 

Features of scope:

  • The scope of a variable is where it is defined. If it is defined globally, it will be visible everywhere;
  • The upper level scope is read-only and visible to the lower level.

The rules of scope can be broken.

>>> x = 5
>>> def fun():
...     global x  # declare that x refers to the global variable
...     x += 1
...     return x
...
>>> fun()
Out[69]: 6
 

global only marks a name as referring to the global scope; it does not define the variable, so the global variable itself still has to be created by an assignment.

Unless you clearly know what global will bring, and you clearly know that non-global won't work, don't use global.

What if you don't want to define a global variable, but you do want to use a function's local value in other functions? The method is simple: return the value from the function, and assign the result to a variable outside.

>>> def fun():
...     x = {'a': 1, 'b': 2}
...     return x
...
>>> x = fun()
>>> x
Out[72]: {'a': 1, 'b': 2}
 

Closure

The smallest scope in Python is the function: variables inside a function cannot be accessed from outside, but a nested function can access the variables of its enclosing function. This is the basis of closures.

global does not help here; its shortcoming shows in this example:

>>> def outer():
...     y = 1
...     def inner():
...         global y
...         y += 1
...         return y
...     return inner
...
>>> y = 1
>>> f = outer()
>>> f()
Out[83]: 2
>>> y = 100  # rebinding the global y
>>> f()      # inner modified the global y, not outer's y
Out[85]: 101
 

inner cannot modify y in outer, because assigning to y inside inner would make it a local (or, with global, a module-level) name. But if instead of rebinding y we only mutate the object it refers to, can we get around this rule? Lists are mutable, so we can test with a list.

>>> def outer():
...     y = [0]
...     def inner():
...         y[0] += 1
...         return y[0]
...     return inner
...
>>> f = outer()
>>> f()
Out[89]: 1
>>> f()
Out[90]: 2
>>> y = 100  # an unrelated global y
>>> f()
Out[91]: 3  # the closed-over list is unaffected
 

The concept of closure is: the function has ended, but the references to some variables inside the function still exist. In the above example, when the return inner is executed in the outer function, the function has ended. Logically speaking, the variables in it will be destroyed. But we can still access the variable y inside through its internal function inner.

Closures in python can be implemented by mutable containers (such as lists), which is also the only way in python2. In python3, you can also use the nonlocal keyword.

>>> def outer():
...     y = 1
...     def inner():
...         nonlocal y
...         y += 1
...         return y
...     return inner
...
>>> f = outer()
>>> f()
Out[97]: 2
>>> f()
Out[98]: 3
>>> f()
Out[99]: 4
 

The nonlocal keyword marks a variable as defined in an enclosing (non-global) scope; variables marked nonlocal are both readable and writable. If no enclosing scope defines the variable, a SyntaxError is raised.
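The SyntaxError is raised at compile time, which we can observe by compiling the source as a string (a sketch; the exact message may vary between versions):

```python
src = '''
def outer():
    def inner():
        nonlocal y  # no y is defined in any enclosing function scope
        y = 1
    return inner
'''

try:
    compile(src, '<demo>', 'exec')
except SyntaxError as e:
    print('SyntaxError:', e.msg)
```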

Default parameter scope

Everything in Python is an object; functions are objects too, and default parameter values are stored as attributes of the function object, so they live as long as the function itself. As long as the function exists, its stored defaults remain visible to it.

For functions defined in the global scope, the timing of destruction is:

  • redefine
  • del delete
  • Exit at the end of the program

The local scope is:

  • redefine
  • del
  • The superior scope is destroyed

Look at the following code:

>>> def fn(x=[]):
...     x.append(1)
...     return x
...
>>> fn()
Out[101]: [1]
>>> fn()
Out[102]: [1, 1]
>>> fn()
Out[103]: [1, 1, 1]
 

Each execution will add an element to this list.

A function's default parameter values are stored in its __defaults__ attribute:

>>> fn.__defaults__
Out[104]: ([1, 1, 1],)
>>> fn()
Out[105]: [1, 1, 1, 1]
>>> fn.__defaults__
Out[106]: ([1, 1, 1, 1],)
 

In the example above, the default parameter is a mutable list. Does the same thing happen with an immutable object?

>>> def fn(x=1, y=1):
...     x += 1
...     y += 1
...
>>> fn.__defaults__
Out[112]: (1, 1)
>>> fn()
>>> fn.__defaults__
Out[114]: (1, 1)
 

As you can see, mutable and immutable objects behave differently. Why? Remember that assignment means definition: x += 1 and y += 1 rebind local names, while list.append modifies the list in place with no assignment at all.

Therefore, special attention is needed when a mutable type is used as a function's default parameter.

In order to avoid this situation, we can:

  • Do not use mutable types as default parameter values; or
  • Do not modify the default value inside the function.

To avoid using a list as the default value:

>>> def fn(lst=None):
...     if lst is None:
...         lst = []
...     # else:           # optionally, copy a caller-supplied list
...     #     lst = lst[:]
...     lst.append(1)     # safe: None was replaced by a fresh list
...     return lst
...
>>> fn()
Out[123]: [1]
>>> fn()
Out[124]: [1]
>>> fn.__defaults__
Out[125]: (None,)
 

Or, keep the list default but never modify it inside the function:

>>> def fn(lst=[]):
...     lst = lst[:]   # work on a shallow copy
...     lst.append(1)  # the default list itself is never modified
...     return lst
...
>>> fn()
Out[128]: [1]
>>> fn()
Out[129]: [1]
>>> fn.__defaults__
Out[130]: ([],)
 

Usually, when a mutable type would be used as a default parameter, None is used instead.

Function execution flow

The program executes on the CPU from top to bottom. Initially the main flow runs on the CPU; when a function call is encountered, the main flow steps off the CPU and the function takes over. The main flow's state, including its variables and the position of the code being executed, is saved on the stack. When the function finishes, its frame is destroyed, the main flow is scheduled back onto the CPU, and its state is restored from the stack.

When a function is called, the interpreter pushes the current context onto the stack and then executes the called function; when it completes, the interpreter pops the top of the stack and restores the context.
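The stack of saved contexts can be observed with the inspect module (a small sketch):

```python
import inspect

def inner():
    # while inner runs, its caller's frame is still on the stack
    return [f.function for f in inspect.stack()[:2]]

def outer():
    return inner()

print(outer())  # the two innermost frames: inner, then its caller outer
```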

Defining function help documentation

The string literal on the first line of the function body is its help text (the docstring), which can be retrieved with the help function.

def fn():
    '''this is fn'''

>>> help(fn)
Help on function fn in module __main__:

fn()
    this is fn
 

You can also read the function's __doc__ attribute:

>>> fn.__doc__
Out[3]: 'this is fn'
 

Anonymous function

Lambda functions are also called anonymous functions: functions without names. A lambda is a minimal, single-line function that can be used wherever a function is needed. It differs from def in that def is a statement while lambda is an expression, so it can appear anywhere an expression can.

The syntax is:

lambda [args]: expression
 
  • args: a comma-separated list of parameters, parameters can be omitted;
  • expression: Define the return value.

The easiest way to use:

>>> lambda x, y: x + y
Out[135]: <function __main__.<lambda>>
 

Call it:

>>> (lambda x, y: x + y)(3, 4)
Out[134]: 7
 

The first pair of parentheses groups the lambda expression and the second pair is the function call. Without the first pair, the call (3, 4) would bind to the lambda's body rather than to the whole lambda.

Conventional call method:

>>> f = lambda x, y: x + y
>>> f(3, 4)
Out[137]: 7
 

Features of anonymous functions:

  • Use lambda definition;
  • The parameter list does not require parentheses;
  • The colon is not used to open a new statement block;
  • All the parameter forms supported by ordinary functions are also supported by anonymous functions;
  • Without return, the value of the last expression is the return value.

The body of a lambda must be a single legal expression written on one line. Statement forms such as for and while cannot appear, though a conditional can be expressed with a ternary if expression. The primary purpose of lambda is to specify a short callback function: it yields a function object directly, without binding it to a name.

Lambda also supports default parameters:

>>> (lambda x, y=4: x + y)(3)
Out[138]: 7
 

Variable parameters also support:

>>> (lambda *args, **kwargs: print(args, kwargs))(*range(3), **{str(x): x for x in range(3)})
(0, 1, 2) {'0': 0, '1': 1, '2': 2}
 

Keyword-only parameters are also supported:

>>> (lambda *, x: x)(x=5)
Out[141]: 5
 

As you can see, every parameter form that ordinary functions support is also available to anonymous functions.

Anonymous functions are mostly used for higher-order functions.

>>> from collections import namedtuple
>>> User = namedtuple('User', ['name', 'age'])
>>> users = [User('tom', 18), User('jerry', 15), User('sam', 44)]
>>> sorted(users, key=lambda x: x.age)  # sort by age
Out[165]:
[User(name='jerry', age=15),
 User(name='tom', age=18),
 User(name='sam', age=44)]
 

Lambda can play the role of function shorthand. For some simple functions, you can use lambda when you don't want to define a function.

>>> from functools import reduce  # in Python 3, reduce lives in functools
>>> reduce(lambda x, y: x + y, range(1, 101))
5050
 

Find the factorial:

>>> reduce(lambda x, y: x * y, range(1, 6))
120
 

Because lambda is an expression, it can be used in a list:

>>> l1 = [(lambda x: x * 2), (lambda y: y * 3)]

>>> for i in l1:  # call each lambda in turn
...     print(i(3))
...
6
9
 

Lambda can be used in map functions:

>>> a = list(range(10))
>>> a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(map(lambda x: x**2, a))  # square every element of a
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
 

To be more complicated:

>>> b = list(range(10))
>>> list(map(lambda x, y: x**y, a, b))  # element-wise: 0**0, 1**1, ..., 9**9
[1, 1, 4, 27, 256, 3125, 46656, 823543, 16777216, 387420489]
 

Higher-order functions

A higher-order function is a function that takes a function as a parameter or returns a function.

What is the use of higher-order functions? For example, define a sorting function:

def sort(it, r=False):
    ret = []
    def cmp(a, b):
        if r:
            return a < b
        else:
            return a > b

    for x in it:
        for i, e in enumerate(ret):
            if cmp(x, e):
                ret.insert(i, x)
                break
        else:
            ret.append(x)
    return ret
 

This sorting function uses an internal function cmp to control whether it sorts in ascending or descending order. But we can pull cmp out and let the caller define it.

def cmp(a, b):
    return a > b  # a > b produces descending order

def sort(it, cmp):
    ret = []
    for x in it:
        for i, e in enumerate(ret):
            if cmp(x, e):
                ret.insert(i, x)
                break
        else:
            ret.append(x)
    return ret
 

You can continue to improve:

def sort(it, cmp=lambda a, b: a < b):  # default comparison sorts ascending
    ret = []
    for x in it:
        for i, e in enumerate(ret):
            if cmp(x, e):
                ret.insert(i, x)
                break
        else:
            ret.append(x)
    return ret
 

The above shows a function used as a parameter, a pattern suited to scenarios where most of the logic is fixed and a small part varies. Functions as return values usually appear in closures, where some variables need to be encapsulated; since classes can also encapsulate state, functions as return values are less common.
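A classic example of a function as the return value is a counter factory, where each returned function carries its own enclosed state (a sketch; the names are invented):

```python
def make_counter():
    count = 0
    def counter():
        nonlocal count
        count += 1
        return count
    return counter  # the function itself is the return value

c1 = make_counter()
c2 = make_counter()  # independent: each closure has its own count
print(c1(), c1(), c2())
```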

When the parameter is a function and the return value is also a function, extra operations can be performed before and after the wrapped function executes; the classic application is the decorator.

such as:

import datetime
def logger(fn):
    def wrap(*args, **kwargs):
        start = datetime.datetime.now()
        ret = fn(*args, **kwargs)
        end = datetime.datetime.now()
        print('call {} took {}'.format(fn.__name__, end-start))
        return ret
    return wrap

def add(x, y):
    return x + y

>>> f = logger(add)
>>> f.__name__
Out[25]: 'wrap'
>>> f(3, 5)
call add took 0:00:00.000041
Out[26]: 8
 

The logger function performs some operations before and after executing the add function: it records the current time before add runs and again afterwards, so it can compute how long add took.

The above two functions can be written like this:

import datetime
def logger(fn):
    def wrap(*args, **kwargs):
        start = datetime.datetime.now()
        ret = fn(*args, **kwargs)
        end = datetime.datetime.now()
        print('call {} took {}'.format(fn.__name__, end-start))
        return ret
    return wrap

@logger
def add(x, y):
    return x + y

print(add(3, 5))
 

In fact the meaning is exactly the same as above; this is decorator syntax. From it we can conclude that placing @decorator above a function definition passes that function as an argument to the decorator.

Decorator

As mentioned earlier, a decorator is a function that allows some additional operations to be performed before and after the execution of another function. The function as a decorator itself is a higher-order function.

The parameter is a function, and the return value is a function of a function, which can be used as a decorator.

The decorator is a good embodiment of the AOP programming idea. It deals with a class of problems and has nothing to do with the specific business logic. Common usage scenarios are:

  • Monitoring
  • Caching
  • Routing
  • Authorization
  • Auditing
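As a sketch of the caching use case, here is a toy memoizing decorator (for real code the standard library provides functools.lru_cache):

```python
import functools

def cache(fn):
    store = {}  # remembers results keyed by the argument tuple
    @functools.wraps(fn)
    def wrap(*args):
        if args not in store:
            store[args] = fn(*args)
        return store[args]
    return wrap

@cache
def square(x):
    return x * x

print(square(4))  # computed on the first call
print(square(4))  # served from the cache on the second
```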

There is a problem with decorators. Take the decorator defined earlier as an example:

import datetime
def logger(fn):
    def wrap(*args, **kwargs):
        start = datetime.datetime.now()
        ret = fn(*args, **kwargs)
        end = datetime.datetime.now()
        print('call {} took {}'.format(fn.__name__, end-start))
        return ret
    return wrap

@logger
def add(x, y):
    return x + y
 

At this point, add.__name__ does not return the name of the add function:

>>> add.__name__
Out[8]: 'wrap'
 

Most of the time, there is no effect, but there will definitely be problems in scenarios that rely on function names. In fact, the solution is very simple.

import datetime
def logger(fn):
    def wrap(*args, **kwargs):
        start = datetime.datetime.now()
        ret = fn(*args, **kwargs)
        end = datetime.datetime.now()
        print('call {} took {}'.format(fn.__name__, end-start))
        return ret
    wrap.__name__ = fn.__name__  # restore the original function name
    wrap.__doc__ = fn.__doc__    # restore the original docstring
    return wrap

@logger
def add(x, y):
    return x + y

>>> add.__name__  # the original name is preserved
Out[10]: 'add'
 

You can also extract the assignment statement and define it as a function:

import datetime

# straightforward two-argument version:
# def copy_property(src, dst):
#     dst.__name__ = src.__name__
#     dst.__doc__ = src.__doc__

# curried version:
def copy_property(src):
    def _copy(dst):
        dst.__name__ = src.__name__
        dst.__doc__ = src.__doc__
    return _copy

def logger(fn):
    def wrap(*args, **kwargs):
        start = datetime.datetime.now()
        ret = fn(*args, **kwargs)
        end = datetime.datetime.now()
        print('call {} took {}'.format(fn.__name__, end-start))
        return ret
    copy_property(fn)(wrap)
    return wrap

@logger
def add(x, y):
    return x + y

print(add.__name__)
 

The reason for rewriting it in the curried form is currying itself; the point may not be obvious yet, since nothing clever has happened so far, so keep reading:

import datetime

def copy_property(src):
    def _copy(dst):
        dst.__name__ = src.__name__
        dst.__doc__ = src.__doc__
        return dst  # returning dst lets _copy be used as a decorator
    return _copy

def logger(fn):
    @copy_property(fn)  # applied as a decorator
    def wrap(*args, **kwargs):
        start = datetime.datetime.now()
        ret = fn(*args, **kwargs)
        end = datetime.datetime.now()
        print('call {} took {}'.format(fn.__name__, end-start))
        return ret
    return wrap

@logger
def add(x, y):
    return x + y

print(add.__name__)
 

copy_property can be replaced entirely by functools.wraps, which copies certain attributes of fn onto wrap. To see exactly which attributes are modified, check help(functools.wraps):

import datetime
import functools

def logger(fn):
    @functools.wraps(fn)  # copies __name__, __doc__, etc. from fn
    def wrap(*args, **kwargs):
        start = datetime.datetime.now()
        ret = fn(*args, **kwargs)
        end = datetime.datetime.now()
        print('call {} took {}'.format(fn.__name__, end-start))
        return ret
    return wrap

@logger
def add(x, y):
    return x + y

print(add.__name__)
 

With parameters

A function that returns a decorator is a decorator with parameters. A decorator itself can only take the decorated function as its argument and cannot accept anything else; therefore, to parameterize it, wrap another function around the decorator and pass the parameters in through that outer function.

The following defines a function that accepts one parameter and returns a decorator. The decorator measures the decorated function's execution time and prints a message only when it exceeds the given number of seconds.

import time
import datetime
import functools

def logger(s):
    def _logger(fn):
        @functools.wraps(fn)
        def wrap(*args, **kwargs):
            start = datetime.datetime.now()
            ret = fn(*args, **kwargs)
            end = datetime.datetime.now()
            if (end - start).total_seconds() > s:
                print('call {} took {}'.format(fn.__name__, end-start))
            return ret
        return wrap
    return _logger

@logger(2)
def sleep(x):
    time.sleep(x)

sleep(1)
 

To help understand this, here is what the same code does when the @ syntax is not used:

import time
import datetime
import functools

def logger(s):
    def _logger(fn):
        @functools.wraps(fn)
        def wrap(*args, **kwargs):
            start = datetime.datetime.now()
            ret = fn(*args, **kwargs)
            end = datetime.datetime.now()
            if (end - start).total_seconds() > s:
                print('call {} took {}'.format(fn.__name__, end-start))
            return ret
        return wrap
    return _logger

# @logger(2)
def sleep(x):
    time.sleep(x)

_logger = logger(2)
sleep = _logger(sleep)
sleep(3)
 

You can even pass a function in as a parameter, making logger a higher-order function:

import time
import datetime
import functools

def logger(s, p=lambda name, t: print('call {} took {}'.format(name, t))):
    def _logger(fn):
        @functools.wraps(fn)
        def wrap(*args, **kwargs):
            start = datetime.datetime.now()
            ret = fn(*args, **kwargs)
            end = datetime.datetime.now()
            if (end - start).total_seconds() > s:
                p(fn.__name__, end-start)
            return ret
        return wrap
    return _logger

@logger(2)
def sleep(x):
    time.sleep(x)

sleep(1)
 

In this way we can not only monitor a function's execution time (for example, to catch slow queries) but also pass in an alert function, so that an alert is sent whenever the execution time exceeds the threshold we define.

Three layers

The decorator with parameters has two layers: the decorator itself plus one wrapping function. What happens if you wrap yet another layer?

import time
import datetime
import functools

def logger(s):
    def _logger(p=lambda name, t: print('call {} took {}'.format(name, t))):
        def __logger(fn):
            @functools.wraps(fn)
            def wrap(*args, **kwargs):
                start = datetime.datetime.now()
                ret = fn(*args, **kwargs)
                end = datetime.datetime.now()
                if (end - start).total_seconds() > s:
                    p(fn.__name__, end-start)
                return ret
            return wrap
        return __logger
    return _logger()

@logger(2)()
def sleep(x):
    time.sleep(x)

# On Python versions before 3.9 this is a compile-time error:
File "<ipython-input-23-c89f301d8b2d>", line 18
    @logger(2)()
              ^
SyntaxError: invalid syntax
 

Before Python 3.9 the decorator grammar did not allow an expression such as @logger(2)(); since 3.9 (PEP 614) any expression is accepted. On older versions the same effect is still possible by naming the intermediate result:

f = logger(2)  # logger(2) already calls _logger(), so f is the actual decorator
@f
def sleep(x):
    time.sleep(x)

sleep(1)
 

Multiple decorators decorate the same function

A function can have multiple decorators applied. They take effect bottom-up: the decorator nearest the function (outer1 here) wraps the original function, and the one above it (outer0) then wraps that result, enhancing the already-decorated function.

def outer0(fun):
    # fun here is the wrapper produced by outer1
    def wrapper(*args, **kwargs):
        print('start')
        ret = fun(*args, **kwargs)
        return ret
    return wrapper

def outer1(fun):
    # fun here is the original Func1
    def wrapper(*args, **kwargs):
        print('123')
        ret = fun(*args, **kwargs)
        return ret
    return wrapper

@outer0
@outer1
def Func1(a1, a2):
    print('func1')

Func1(1, 2)
 

To understand how the two decorators work together, it helps to expand them by hand.

When Func1 is called, the wrapper defined in outer0 runs first. Its inner call to fun invokes the wrapper defined in outer1, and that wrapper's call to fun invokes the original Func1. Func1 returns None, which flows back into ret inside outer1's wrapper, then into ret inside outer0's wrapper, so the final result is None.
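
The stacking can be reproduced by hand, which makes the call order explicit; the @ syntax above is equivalent to this sketch:

```python
def outer0(fun):
    # fun here is outer1's wrapper
    def wrapper(*args, **kwargs):
        print('start')
        return fun(*args, **kwargs)
    return wrapper

def outer1(fun):
    # fun here is the original function
    def wrapper(*args, **kwargs):
        print('123')
        return fun(*args, **kwargs)
    return wrapper

def Func1(a1, a2):
    print('func1')

# @outer0 / @outer1 stacking applies the nearest decorator first:
Func1 = outer0(outer1(Func1))
Func1(1, 2)  # prints start, 123, func1 and returns None
```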

Recursion

Recursion means a function calling itself, and the textbook example is computing a factorial. A recursive function must have an exit condition (a boundary case), otherwise it recurses endlessly: each call must advance toward the boundary, and the results are then passed back up as the calls return.

In order to protect the interpreter, python has a limit on the maximum depth of recursion, which can be seen in python:

>>> import sys
>>> sys.getrecursionlimit()
2000
 

Modify the upper limit by passing the new maximum depth:

sys.setrecursionlimit(3000)  # 3000 here is just an example value
 

Python recursion is slow, especially when a recursive function is called many times: the interpreter must maintain a stack frame for every call, so avoid recursion where you can.
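
Exceeding the depth limit raises RecursionError rather than killing the process; a minimal sketch (the limit is lowered here so the example trips quickly):

```python
import sys

def bottomless(n):
    return bottomless(n + 1)  # no exit condition, on purpose

sys.setrecursionlimit(100)  # lower the cap so the example fails fast
try:
    bottomless(0)
except RecursionError as e:
    print('recursion stopped:', e)
```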

Direct recursion, calling yourself inside your own function, is easy to spot, and the depth limit applies to it. The limit also applies to indirect (mutual) recursion like the following, since it counts Python call depth regardless of which function is on the stack, but the cycle itself is much harder for a reader to notice:

def f():
    g()

def g():
    k()

def k():
    f()
 

Written together like this, the cycle is easy to find; spread across different files, it is hard to locate. It is especially dangerous when one of the functions sits on a rarely-used code path: the program runs fine for months, then a single call triggers the endless mutual calls and the process dies immediately, restarts cleanly, and dies again some time later. This is not unique to Python; other languages hit the same problem. Because the crash is instantaneous and server load looks normal, it is very troublesome to diagnose, so be careful to avoid this situation when writing programs.

Realization of factorial:

def fact(n):
    if n <= 1:
        return 1
    return n * fact(n - 1)
 

The return statement's value contains a call to the function itself, and the n-1 argument moves each call toward the exit condition. If n is 3:

fact(3)
  return: 3*fact(2)
    return: 2*fact(1)
      return: 1
 

The result is 3 × 2 × 1, which is 6. Every pending multiplication stays on the call stack until the innermost call returns 1, and the products are then computed from the bottom up. This is recursion: a function that contains a call to itself.
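
Because each recursive call keeps a frame alive until the unwind, a loop that accumulates the product is usually preferable in Python; a simple iterative version:

```python
def fact_iter(n):
    result = 1
    for i in range(2, n + 1):  # multiply 2, 3, ..., n in turn
        result *= i
    return result

print(fact_iter(3))  # 6, same as the recursive fact(3)
```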

Return the 10th number of the Fibonacci sequence (counting from 0, 1):

def f1(a1, a2, n):
    if n == 10:
        return a1
    else:
        a3 = a1 + a2
        # recurse with the pair shifted to (a2, a3)
        a = f1(a2, a3, n+1)
        # hand the value received from the deeper call back up
        return a

print(f1(0, 1, 1))
 

The calls keep nesting until n == 10, when the if condition is met and a1, by then 34, is returned. That 34 is received by the caller one level up, the call where n == 9, in its variable a; that call immediately returns it to the call where n == 8, and so on, until the very first call receives it and returns the final result, 34.

In other words, recursion nests down layer by layer, and once the exit condition is reached it unwinds layer by layer. Each level must therefore have a variable that receives the return value of the deeper call and passes it along. The following is a broken version in which that return value is never received:

def f1(a1, a2, n):
    if n == 10:
        return a1
    else:
        a3 = a1 + a2
        # the recursive result is computed but discarded
        f1(a2, a3, n+1)
        return a1

print(f1(0, 1, 1))
 

The innermost call does return a1 (34), but the caller never captures that value (the call is executed and its result thrown away), so every level simply returns its own a1, and the final result is the 0 that was originally passed in.
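
The same value can be computed without recursion by shifting a pair of variables in a loop; counting the way the text does (starting from 0, 1), the 10th number is 34:

```python
def fib(n):
    a, b = 0, 1             # the (a1, a2) pair from the recursive version
    for _ in range(n - 1):  # shift the pair n-1 times
        a, b = b, a + b
    return a

print(fib(10))  # 34
```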

Type hint

Python is a dynamically typed language: the type of a variable is determined at runtime, and a variable may hold values of different types over the life of the program.

When we write a function, others cannot tell what types of arguments it expects. The usual workaround is to state them in the documentation, such as:

def add(x, y):
    '''
    :param x: int
    :param y: int
    :return: int
    '''
    return x + y
 

The problem is that not everyone writes documentation, documentation is not necessarily updated along with the code, and natural-language docs are inconvenient for machines to process. Python 3 therefore supports type hints: a standard way to state what the parameter and return value types are.

def add(x: int, y: int) -> int:
    return x + y
 

Note that hints are only annotations: Python itself performs no checking. They exist for third-party tools (IDEs, static analyzers) or to be inspected at runtime.

Python saves this information here:

>>> add.__annotations__
{'x': int, 'y': int, 'return': int}
 

In 3.5 and earlier versions, type hints could only be attached to function parameters and return values. Since 3.6 (PEP 526), variables can be annotated as well:

i: int = 1
 

In order to support type annotations, Python also provides a typing library. The following indicates that the elements in the list are integers.

import typing

def add(lst: typing.List[int]) -> int:
    ret = 0
    for i in lst:
        ret += i
    return ret

>>> add.__annotations__
{'lst': typing.List[int], 'return': int}
 

The following example enforces at call time that arguments match their annotated types, raising an exception otherwise:

import inspect
import functools

def typed(fn):
    @functools.wraps(fn)
    # wrap validates every argument against its annotation before calling fn
    def wrap(*args, **kwargs):
        # the declared parameters, including annotations, via inspect
        params = inspect.signature(fn).parameters
        # check keyword arguments
        for k, v in kwargs.items():
            if not isinstance(v, params[k].annotation):
                raise TypeError('parameter {} required {}, but {}'.format(k, params[k].annotation, type(v)))
        # check positional arguments
        for i, arg in enumerate(args):
            param = list(params.values())[i]
            if not isinstance(arg, param.annotation):
                raise TypeError('parameter {} required {}, but {}'.format(param.name, param.annotation, type(arg)))
        return fn(*args, **kwargs)
    return wrap

@typed
# calls such as add(1, 'a') will now raise TypeError
def add(x: int, y: int) -> int:
    return x + y
 
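
A quick check of the decorator's behavior, restating it so the example runs on its own:

```python
import inspect
import functools

def typed(fn):
    @functools.wraps(fn)
    def wrap(*args, **kwargs):
        # compare each argument with the annotation declared in the signature
        params = inspect.signature(fn).parameters
        for k, v in kwargs.items():
            if not isinstance(v, params[k].annotation):
                raise TypeError('parameter {} required {}, but {}'.format(
                    k, params[k].annotation, type(v)))
        for i, arg in enumerate(args):
            param = list(params.values())[i]
            if not isinstance(arg, param.annotation):
                raise TypeError('parameter {} required {}, but {}'.format(
                    param.name, param.annotation, type(arg)))
        return fn(*args, **kwargs)
    return wrap

@typed
def add(x: int, y: int) -> int:
    return x + y

print(add(3, 4))       # 7
try:
    add(3, 'oops')     # second argument is not an int
except TypeError as e:
    print('rejected:', e)
```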

Exercise

Flatten dictionary (recursive method):

def flatten(d):
    def _flatten(src, dst, prefix=''):
        for k, v in src.items():
            key = k if prefix == '' else '{}.{}'.format(prefix, k)
            if isinstance(v, dict):
                _flatten(v, dst, key)
            else:
                dst[key] = v
    result = {}
    _flatten(d, result)
    return result

>>> flatten({'a': 2, 'b': {'c': 4, 'g': 9}, 'd': {'e': {'f': 6}}})
{'a': 2, 'b.c': 4, 'b.g': 9, 'd.e.f': 6}
 

Recursion is less efficient, but when the dictionary is not deeply nested it is the most convenient approach, even if it is a little harder to follow.
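
If the recursion is a concern, the same flattening can be done with an explicit stack; a non-recursive sketch:

```python
def flatten_iter(d):
    result = {}
    stack = [('', d)]  # pairs of (prefix, sub-dict) still to process
    while stack:
        prefix, src = stack.pop()
        for k, v in src.items():
            key = k if prefix == '' else '{}.{}'.format(prefix, k)
            if isinstance(v, dict):
                stack.append((key, v))  # descend later instead of recursing now
            else:
                result[key] = v
    return result

print(flatten_iter({'a': 2, 'b': {'c': 4, 'g': 9}, 'd': {'e': {'f': 6}}}))
```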

Implement base64 encoding

Base64 encoding realizes the conversion of binary into a string, and its characteristics:

  • There is a table: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
  • The input is grouped into three-byte (24-bit) chunks, zero-padded when fewer than three bytes remain;
  • Each chunk is split into 6-bit groups, and each group is read as an integer;
  • Each integer is used as an index into the table;
  • Output characters produced purely by the zero padding are written as =.

def b64encode(data: bytes) -> str:
    table = b'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
    encoded = bytearray()
    c = 0
    for x in range(3, len(data)+1, 3):
        i = int.from_bytes(data[c: x], 'big')
        for j in range(1, 5):
            encoded.append(table[i >> (24 - j*6) & 0x3f])
        c += 3
    r = len(data) - c
    if r > 0:
        i = int.from_bytes(data[c:], 'big') << (3-r) * 8
        for j in range(1, 5-(3-r)):
            encoded.append(table[i >> (24 - j*6) & 0x3f])
        for _ in range(3-r):
            encoded.append(int.from_bytes(b'=', 'big'))
    return encoded.decode()

print(b64encode(b'abcdefg'))
 
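
The result can be cross-checked against the standard library's base64 module:

```python
import base64

# The hand-rolled encoder above should agree with the stdlib byte-for-byte.
print(base64.b64encode(b'abcdefg').decode())  # YWJjZGVmZw==
```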

Decoding:

def b64decode(data: str) -> bytes:
    table = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
    decoded = bytearray()
    s = 0
    for e in range(4, len(data)+1, 4):
        tmp = 0
        for i, c in enumerate(data[s:e]):
            if c != '=':
                tmp += table.index(c) << 24 - (i+1) * 6
            else:
                tmp += 0 << 24 - (i+1) * 6
        decoded.extend(tmp.to_bytes(3, 'big'))
        s += 4
    return bytes(decoded.rstrip(b'\x00'))
 
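
Again the standard library provides a reference (note that the rstrip-based unpadding above would mis-handle input whose final decoded bytes are genuinely zero):

```python
import base64

# Stdlib reference for the decoder above.
print(base64.b64decode('YWJjZGVmZw=='))  # b'abcdefg'
```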

Cache decorator

Similar to functools.lru_cache, but can set a timeout.
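
For comparison, the stdlib decorator caches by hashable arguments and reports statistics, but has no expiry; a minimal usage sketch:

```python
import functools

@functools.lru_cache(maxsize=128)
def add(x, y):
    return x + y

add(3, 4)   # miss: computed and cached
add(3, 4)   # hit: served from the cache
print(add.cache_info())  # hits=1, misses=1 among the stats
```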

The difficulty lies in:

  1. *args and **kwargs must be turned into a dictionary key, but lists and dictionaries are unhashable and cannot serve as keys, so they have to be parsed into a flat string of the form key = 'args=1&args=2&args=3&kwargs=99'.
  2. Since there is no eviction strategy, expiry is only checked on access: if the current timestamp minus the cached entry's timestamp exceeds the timeout, the entry is recomputed; otherwise the cached result is returned directly.

import time
import inspect
import functools
from datetime import datetime

def cache(expire=0):
    def _cache(fn):
        # the cache lives in the closure: one dict per decorated function
        cache_dic = {}
        @functools.wraps(fn)
        def wrap(*args, **kwargs):
            lst = []
            s = set()
            # turn positional arguments into (name, value) pairs
            param = inspect.signature(fn).parameters
            if 'args' in param.keys():
                for arg in args:
                    lst.append(('args', arg))
                    s.add('args')
            else:
                for i, arg in enumerate(args):
                    para = list(param.keys())[i]
                    name = (para, arg)
                    lst.append(name)
                    s.add(para)
            lst.extend(list(kwargs.items()))
            s.update(kwargs.keys())
            if 'kwargs' not in param.keys():
                for k, v in param.items():
                    if k not in s:
                        lst.append((k, v.default))
                        s.add(k)
            lst.sort(key=lambda x: x[0])
            # build the cache key from the sorted pairs
            key = '&'.join(['{}={}'.format(k, v) for k, v in lst])
            now = datetime.now().timestamp()
            if key in cache_dic:
                ret, timestamp = cache_dic[key]
                if expire == 0 or now - timestamp < expire:
                    print("cache hit")
                    return ret
            ret = fn(*args, **kwargs)
            cache_dic[key] = (ret, now)
            print('cache miss')
            return ret
        return wrap
    return _cache
 

Next, set up lru swap out.

Precautions:

  • There are three values corresponding to each key: the function execution result, the time used to compare the expiration date, and the time used to record the most recent visit.
  • If the timeout period is not up, or the timeout code is not set, there is no change.
  • When the number of cache entries is less than the maximum limit, no check is made.
  • When the number of cache entries reaches the upper limit, timeout-based eviction is performed first by traversing the whole dictionary. The dictionary may be modified during traversal, so iterate over list(cache_dic.items()) rather than the dictionary itself.
  • If the entry count still reaches the limit after expired entries are evicted, sort by the last-access time and delete the oldest entry.

The eviction algorithm above is slow and inefficient. A queue helps: keep every key in a queue (then there is no need to store the last-access time), move a key to the head of the queue on each hit, and evict from the tail. A list can serve as this queue; list.insert at position 0 shifts every later element back by one, which is tolerable, but list.remove must scan the list to find the key, which is O(n).

Therefore, the best approach is a doubly linked list: insert at the head, evict at the tail. To avoid scanning the whole list when removing an arbitrary entry, the dictionary stores a reference to the corresponding list node, so the node can be found and unlinked in O(1).

from collections import namedtuple

Item = namedtuple('Item', ['key', 'value', 'timestamp'])

def linked_list():
    _head = None
    _tail = None

    def put(item):
        nonlocal _head
        nonlocal _tail
        if _head is None:
            _head = {'data': item, 'prev': None, 'next': None}
        else:
            node = {'data': item, 'prev': None, 'next': _head}
            _head['prev'] = node
            _head = node
        if _tail is None:
            _tail = _head
        return _head

    def pop():
        nonlocal _tail
        nonlocal _head
        if _tail is None:
            return None
        node = _tail
        _tail = node['prev']
        if _tail is None:
            _head = None       # list is now empty
        else:
            _tail['next'] = None
        return node

    def remove(node):
        nonlocal _head
        nonlocal _tail
        if node is _tail:
            pop()
            return
        if node is _head:
            _head = node['next']
            _head['prev'] = None
            return
        node['prev']['next'] = node['next']
        node['next']['prev'] = node['prev']

    return put, pop, remove

put, pop, remove = linked_list()


import inspect
import datetime
import functools

def cache(maxsize=128, expire=0):
    def make_key(fn, args, kwargs):
        ret = []
        names = set()
        params = inspect.signature(fn).parameters
        keys = list(params.keys())
        for i, arg in enumerate(args):
            ret.append((keys[i], arg))
            names.add(keys[i])
        ret.extend(kwargs.items())
        names.update(kwargs.keys())
        for k, v in params.items():
            if k not in names:
                ret.append((k, v.default))
        ret.sort(key=lambda x: x[0])
        return '&'.join(['{}={}'.format(name, arg) for name, arg in ret])

    def _cache(fn):
        data = {}
        put, pop, remove = linked_list()
        @functools.wraps(fn)
        def wrap(*args, **kwargs):
            key = make_key(fn, args, kwargs)
            now = datetime.datetime.now().timestamp()
            if key in data.keys():
                node = data[key]
                item = node['data']
                remove(node)
                if expire == 0 or now - item.timestamp < expire:
                    # still fresh: move to the head of the LRU list and return
                    data[key] = put(item)
                    return item.value
                else:
                    # expired: drop it and fall through to recompute
                    data.pop(key)
            value = fn(*args, **kwargs)
            if len(data) >= maxsize:
                if expire != 0:
                    expires = set()
                    for k, node in data.items():
                        if now - node['data'].timestamp >= expire:
                            remove(node)
                            expires.add(k)
                    for k in expires:
                        data.pop(k)
            if len(data) >= maxsize:
                node = pop()
                data.pop(node['data'].key)
            node = put(Item(key, value, now))
            data[key] = node
            return value
        return wrap
    return _cache
 

After put inserts an item, the head node it returns is the node just created, so that node is stored in the cache dictionary; through this reference the node can later be found in the linked list and removed in O(1).

Command distributor

Execute the corresponding function by entering the corresponding command.

def command():
    commands = {}

    def register(command):
        def _register(fn):
            if command in commands:
                raise Exception('command {} exist'.format(command))
            commands[command] = fn
            return fn
        return _register

    def default_fn():
        print('unknown command')

    def run():
        while True:
            cmd = input('>> ')
            if cmd.strip() == 'quit':
                return
            commands.get(cmd.strip(), default_fn)()

    return register, run

register, run = command()

@register('abc')
def papa():
    print('papa')

run()
 

That is to say, applying the decorator to a function associates the function with the string passed to the decorator; entering that string later executes the decorated function.

The following is the version with parameters, meaning the user can supply arguments along with the command:

import inspect
from collections import namedtuple


def dispatcher(default_handler=None):
    Handler = namedtuple('Handler', ['fn', 'params'])
    # command name -> Handler(fn, params)
    commands = {}

    if default_handler is None:
        default_handler = lambda *args, **kwargs: print('not found')

    def register(command):
        def _register(fn):
            # record the function's signature for later type conversion
            params = inspect.signature(fn).parameters
            # register the handler under its command name
            commands[command] = Handler(fn, params)
            return fn
        return _register

    def run():
        while True:
            command, _, params = input('>> ').partition(':')
            # input format: add:x,y,z=1
            if command.strip() == 'quit':
                return
            handler = commands.get(command.strip(), Handler(default_handler, {}))
            # split the raw parameter string into positional and keyword arguments
            args = []
            kwargs = {}
            param_values = list(handler.params.values())
            for i, param in enumerate(params.split(',')):
                if '=' in param:
                    name, _, value = param.partition('=')
                    # look up the declared parameter, e.g. <Parameter "y: int">
                    p = handler.params.get(name.strip())
                    # convert using the annotation when one is present
                    if p is not None and p.annotation != inspect.Parameter.empty:
                        kwargs[name.strip()] = p.annotation(value)
                    else:
                        kwargs[name.strip()] = value
                else:
                    if len(param_values) > i and param_values[i].annotation != inspect.Parameter.empty:
                        args.append(param_values[i].annotation(param.strip()))
                    else:
                        args.append(param.strip())
            ret = handler.fn(*args, **kwargs)
            if ret is not None:
                print(ret)
    return register, run

reg, run = dispatcher()
@reg('abc')
def abc(x: int, y: int):
    print(x+y)

run()
 

The parameters need processing because everything the user types is a string, while the function may expect, say, an int. Type annotations are therefore used to decide how to convert each argument.

This script cannot cope with list or dictionary parameters.

Parse httpd log

import datetime
from collections import namedtuple

line = '66.256.46.124 - - [10/Aug/2016:06:05:06 +0800] "GET /robots.txt HTTP/1.1" 404 162 "-" "Mozilla/5.0 (compatible: Googlebot/2.1: +http://www.google.com/bot.html)"'

Request = namedtuple('Request', ['method', 'url', 'version'])
MapItem = namedtuple('MapItem', ['name', 'convert'])
mapping = [
    MapItem('remote', lambda x: x),
    MapItem('', None),  # identd field, ignored
    MapItem('', None),  # userid field, ignored
    MapItem('time', lambda x: datetime.datetime.strptime(x, '%d/%b/%Y:%H:%M:%S %z')),
    MapItem('request', lambda x: Request(*x.split())),
    MapItem('status', int),
    MapItem('length', int),
    MapItem('', None),  # referer field, ignored
    MapItem('ua', lambda x: x)
]

def strptime(src: str) -> datetime.datetime:
    return datetime.datetime.strptime(src, '%d/%b/%Y:%H:%M:%S %z')

def extract(line):
    tmp = []
    ret = []
    split = True
    for c in line:
        if c == '[':
            split = False
            continue
        if c == ']':
            split = True
            continue
        if c == '"':
            split = not split
            continue
        if c == ' ' and split:
            ret.append(''.join(tmp))
            tmp.clear()
        else:
            tmp.append(c)
    ret.append(''.join(tmp))
    result = {}
    for i, item in enumerate(mapping):
        if item.name:
            result[item.name] = item.convert(ret[i])
    return result

# lazily parse a log file line by line
def load(path):
    with open(path) as f:
        for line in f:
            try:
                yield extract(line)
            except Exception:
                pass
 

Regular expression version:

import re
import datetime
from collections import namedtuple

Request = namedtuple('Request', ['method', 'url', 'version'])
line = '66.256.46.124 - - [10/Aug/2016:06:05:06 +0800] "GET /robots.txt HTTP/1.1" 404 162 "-" "Mozilla/5.0 (compatible: Googlebot/2.1: +http://www.google.com/bot.html)"'

mapping = {
    'length': int,
    'request': lambda x: Request(*x.split()),
    'status': int,
    'time': lambda x: datetime.datetime.strptime(x, '%d/%b/%Y:%H:%M:%S %z')
}

def strptime(src: str) -> datetime.datetime:
    return datetime.datetime.strptime(src, '%d/%b/%Y:%H:%M:%S %z')

def extract(line):
    regexp = r'(?P<remote>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - - \[(?P<time>.*)\] "(?P<request>.*)" (?P<status>\d+) (?P<length>\d+) ".*" "(?P<ua>.*)"'
    m = re.match(regexp, line)
    if m:
        ret = m.groupdict()
        return {k: mapping.get(k, lambda x:x)(v) for k, v in ret.items()}
    raise Exception(line)

print(extract(line))
 

Analysis of time-series data such as logs or monitoring metrics is usually time-based, which calls for a sliding window. A sliding window cuts one slice out of data arranged in time order, analyzes it, then slides forward and takes the next slice. It has two key parameters: width (how many seconds of data each slice covers) and interval (how often a slice is taken). Every interval seconds, the previous width seconds of data are analyzed; to avoid gaps in coverage, the width should be greater than or equal to the interval.
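
The mechanics can be sketched on plain (timestamp, value) tuples, independent of the log pipeline (hypothetical data; interval 5, width 10):

```python
def windows(events, interval, width):
    """Yield slices covering the last `width` seconds, one slice every
    `interval` seconds. `events` must be (timestamp, value) tuples in
    time order."""
    store = []
    start = None
    for t, value in events:
        store.append((t, value))
        if start is None:
            start = t
        if t - start >= interval:
            start = t
            # drop everything older than the window width
            store = [e for e in store if e[0] > t - width]
            yield list(store)

data = [(i, i * i) for i in range(0, 21, 2)]  # one event every 2 seconds
for w in windows(data, interval=5, width=10):
    print([t for t, _ in w])
```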

Integrate:

import re
import time
import queue
import datetime
import threading
from collections import namedtuple

matcher = re.compile(r'(?P<remote>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - - \[(?P<time>.*)\] "(?P<request>.*)" (?P<status>\d+) (?P<length>\d+) ".*" "(?P<ua>.*)"')
Request = namedtuple('Request', ['method', 'url', 'version'])
mapping = {
    'length': int,
    'request': lambda x: Request(*x.split()),
    'status': int,
    'time': lambda x: datetime.datetime.strptime(x, '%d/%b/%Y:%H:%M:%S %z')
}

def extract(line):
    m = matcher.match(line)
    if m:
        ret = m.groupdict()
        return {k: mapping.get(k, lambda x:x)(v) for k, v in ret.items()}
    raise Exception(line)

def read(f):
    for line in f:
        try:
            yield extract(line)
        except Exception:
            pass

def load(path):
    with open(path) as f:
        while True:
            yield from read(f)
            time.sleep(0.1)

def window(source, handler, interval: int, width: int):
    store = []
    start = datetime.datetime.now()
    while True:
        data = next(source)
        current = datetime.datetime.now()
        if data:
            store.append(data)
            current = data['time']
        if (current - start).total_seconds() >= interval:
            start = current
            handler(store)
            dt = current - datetime.timedelta(seconds=width)
            store = [x for x in store if x['time'] > dt]

def dispatcher(source):
    analyzers = []
    queues = []

    def _source(q):
        while True:
            yield q.get()

    def register(handler, interval, width):
        q = queue.Queue()
        queues.append(q)
        t = threading.Thread(target=window, args=(_source(q), handler, interval, width))
        analyzers.append(t)

    def start():
        for t in analyzers:
            t.start()
        for item in source:
            for q in queues:
                q.put(item)

    return register, start

# handlers: the analysis functions fed to each window
def null_handler(items):
    pass

def status_handler(items):
    status = {}
    for x in items:
        if x['status'] not in status.keys():
            status[x['status']] = 0
        status[x['status']] += 1
    total = sum(x for x in status.values())
    for k, v in status.items():
        print('{} -> {}%'.format(k, v/total * 100))

if __name__ == '__main__':
    import sys
    register, start = dispatcher(load(sys.argv[1]))
    register(status_handler, 5, 10)
    start()
 

The above handler function is defined by ourselves, it is the content to be analyzed, such as the response status code of the access and so on.

import re
import time
import queue
import datetime
import threading
from collections import namedtuple
from watchdog.events import FileSystemEventHandler

matcher = re.compile(r'(?P<remote>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - - \[(?P<time>.*)\] "(?P<request>.*)" (?P<status>\d+) (?P<length>\d+) ".*" "(?P<ua>.*)"')
Request = namedtuple('Request', ['method', 'url', 'version'])
mapping = {
    'length': int,
    'request': lambda x: Request(*x.split()),
    'status': int,
    'time': lambda x: datetime.datetime.strptime(x, '%d/%b/%Y:%H:%M:%S %z')
}

class Loader(FileSystemEventHandler):
    def __init__(self, path):
        self.path = path
        self.q = queue.Queue()
        self.f = open(path)

    def on_modified(self, event):
        if event.src_path == self.path:
            for item in read(self.f):
                self.q.put(item)

    def source(self):
        while True:
            yield self.q.get()

def extract(line):
    m = matcher.match(line)
    if m:
        ret = m.groupdict()
        return {k: mapping.get(k, lambda x:x)(v) for k, v in ret.items()}
    raise Exception(line)

def read(f):
    for line in f:
        try:
            yield extract(line)
        except:
            pass

def load(path):
    with open(path) as f:
        while True:
            yield from read(f)
            time.sleep(0.1)

def window(source, handler, interval: int, width: int):
    store = []
    start = None
    while True:
        data = next(source)
        store.append(data)
        current = data['time']
        if start is None:
            start = current
        if (current - start).total_seconds() >= interval:
            start = current
            try:
                handler(store)
            except:
                pass
            dt = current - datetime.timedelta(seconds=width)
            store = [x for x in store if x['time'] > dt]

def dispatcher(source):
    analyzers = []
    queues = []

    def _source(q):
        while True:
            yield q.get()

    def register(handler, interval, width):
        q = queue.Queue()
        queues.append(q)
        t = threading.Thread(target=window, args=(_source(q), handler, interval, width))
        analyzers.append(t)

    def start():
        for t in analyzers:
            t.start()
        for item in source:
            for q in queues:
                q.put(item)

    return register, start

# handlers: the analysis functions fed to each window
def null_handler(items):
    pass

def status_handler(items):
    status = {}
    for x in items:
        if x['status'] not in status.keys():
            status[x['status']] = 0
        status[x['status']] += 1
    total = sum(x for x in status.values())
    for k, v in status.items():
        print('\t{} -> {:.2f}%'.format(k, v/total * 100))

if __name__ == '__main__':
    import os
    import sys
    from watchdog.observers import Observer
    handler = Loader(sys.argv[1])
    observer = Observer()
    observer.schedule(handler, os.path.dirname(sys.argv[1]), recursive=False)
    register, start = dispatcher(handler.source())
    register(status_handler, 5, 10)
    observer.start()
    start()