Crawling backend order interface data from the latest CRMEB with Python and saving it to Excel

Install dependent libraries

pip3 install requests openpyxl
 

Crawling ideas

  1. The backend order list interface address is: /admin/order.StoreOrder/order_list.html
  2. After logging in, copy the request header information from the browser, fetch the interface with requests.get, parse the response as JSON, and pull out the fields we need with the dictionary get method (see the sample response sketch after this list).
  3. Read the order ID, payment time, and other fields we need item by item and return them as a list.
  4. Create a blank Excel workbook.
  5. Write the column titles into the first row of the worksheet.
  6. Write each field's data into the column under its title.
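
For reference, the code below assumes the interface returns JSON shaped roughly like this (the field names come from the crawler itself; the exact structure may differ between CRMEB versions, and the sample values are made up):

# Assumed response shape (illustrative only, not the authoritative CRMEB schema)
{
    "data": [
        {
            "order_id": "wx123456789",
            "pay_price": "99.00",
            "pay_type_name": "WeChat Pay",
            "total_num": 1,
            "add_time": "2020-08-01 12:00:00",
            "spread_nickname": "someone",
            "_info": [
                {"cart_info": {"productInfo": {"store_name": "Sample product"}}}
            ]
        }
    ]
}
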
import json

import requests
from openpyxl import Workbook


def load_json():
    """  json data  """
    resp = requests.get(
            'http://xxx.com/admin/order.StoreOrder/order_list.html?',
            params=params,
            headers=headers
        ).content.decode('utf-8')
    datas = json.loads(resp).get('data', [])  # fall back to an empty list if 'data' is missing
    return datas

def read_data(dict_name):
    """Extract the field `dict_name` from every order and return the values as a list.

    Useful keys:
        order_id - order ID | pay_price - amount paid | pay_type_name - payment method | pink_name - group-buy name
        mark - buyer remark | total_num - total item count | add_time - order time | _info - product info (nested)
    """
    datas = load_json()
    return [data[dict_name] for data in datas]

def cell_range(wb):
    """   """
    ws = wb.active
    return ws

def save_xlsx(ws, title, sheets, column, start=2):
    """Write `title` into the header row of `column`, then write the values in `sheets` into the rows below it."""
    ws[column+str(start-1)] = title
    for i in range(start, len(sheets)+start):
        ws[column+str(i)].value = sheets[i-start]

 

Function documentation

  • load_json() fetches the interface data and returns the JSON 'data' field
  • read_data(dict_name) returns the values of one field as a list
  • cell_range(wb) returns the workbook's default (active) worksheet
  • save_xlsx(ws, title, sheets, column, start=2)
    • ws: the worksheet to write into
    • title: the column title written into the header row
    • sheets: the list of values for that column
    • column: the uppercase letter of the column, e.g. A, B, C, D, E
    • start: the first data row (the title goes into row start-1; default 2)
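
A quick offline check of how save_xlsx lays out one column (no crawling involved; the sample values are made up):

from openpyxl import Workbook

wb = Workbook()
ws = cell_range(wb)
# Writes 'Order ID' into A1 and the three sample values into A2..A4
save_xlsx(ws, title='Order ID', sheets=['wx001', 'wx002', 'wx003'], column='A')
wb.save('demo.xlsx')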

Now start crawling and saving. If you need more data fields, refer to CRMEB's interface documentation.

if __name__ == "__main__":

    # Request headers: copy your own Cookie and Host from the browser after logging in to the backend
    headers = {
        'Cookie': 'your Cookie here',
        'Host': 'your host here',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4128.3 Safari/537.36'
    }

    # page: which page to fetch; limit: records per page (CRMEB default is 20)
    params = {'page': 1, 'limit': 20}

    wb = Workbook()
    ws = cell_range(wb)

    # Order ID
    order_id = read_data('order_id')
    save_xlsx(ws, title='Order ID', sheets=order_id, column='A')

    # Total item count
    total_num = read_data('total_num')
    save_xlsx(ws, title='Total quantity', sheets=total_num, column='B')

    # Order time
    add_time = read_data('add_time')
    save_xlsx(ws, title='Order time', sheets=add_time, column='C')

    # Payment amount
    pay_price = read_data('pay_price')
    save_xlsx(ws, title='Payment amount', sheets=pay_price, column='D')

    # Payment method
    pay_type_name = read_data('pay_type_name')
    save_xlsx(ws, title='Payment method', sheets=pay_type_name, column='E')

    # Referrer nickname
    spread_nickname = read_data('spread_nickname')
    save_xlsx(ws, title='Referrer', sheets=spread_nickname, column='F')

    # Product name (nested inside _info -> cart_info -> productInfo)
    names = [name[0]['cart_info']['productInfo']['store_name'] for name in read_data('_info')]
    save_xlsx(ws, title='Product name', sheets=names, column='G')

    wb.save('data_crmeb.xlsx')
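
The script above only pulls the first page (params is set to page 1). A minimal sketch of paging through all orders, assuming the interface keeps returning rows until the last page comes back short:

# Sketch: collect every page by bumping `page` until a short page is returned.
# load_json() reads the module-level `params`, as in the script above.
all_orders = []
page = 1
while True:
    params['page'] = page
    datas = load_json()
    if not datas:
        break
    all_orders.extend(datas)
    if len(datas) < params['limit']:
        break
    page += 1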
 

This crawler is fairly simple, but it is enough for our own backend: it is only meant to make it easy to pull order data for analysis. Please do not use it for any other purpose.