swift-LLVM-small note 1

Sooner or later, every iOS developer runs into the compiler. This article gives a general overview of the LLVM compilation process. I do not explain the LLVM and Clang terminology here, because the linked references already cover it thoroughly and there is no point in pasting them. Some of the links may only be reachable through a VPN or proxy, which I consider a basic programmer skill.
LLVM is no longer a new topic, and plenty of people before me have written about it at length. So why write about it again? As the saying goes, the benevolent see benevolence and the wise see wisdom: everyone takes something different from the same subject. Most programmers use LLVM every day without knowing what it actually does, and that is reason enough.

1. What is LLVM?

    See: LLVM

2. Three representations of LLVM code
  1. The in-memory compiler IR
  2. The bitcode stored on disk
  3. The human-readable assembly text

IR deserves a closer look here. LLVM IR is based on static single assignment (SSA) form, in which every value is assigned exactly once, and it has the following characteristics:

  1. Type safety
  2. Low-level operations
  3. Flexibility
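To make the SSA idea concrete, here is a minimal sketch of SSA renaming for straight-line code. The statement format `(target, op, operands)` and the function name `to_ssa` are hypothetical and chosen only for illustration; real SSA construction also has to handle branches with phi nodes, which this sketch leaves out.

```python
def to_ssa(statements):
    """Rename variables so that each name is assigned exactly once.

    Assumes straight-line code only (no branches, so no phi nodes).
    Operands that were never assigned (for example literals) pass through.
    """
    version = {}   # variable name -> latest version number
    result = []
    for target, op, operands in statements:
        # Uses refer to the latest version of each variable.
        renamed = [f"%{v}.{version[v]}" if v in version else v for v in operands]
        # Every assignment creates a fresh version of the target.
        version[target] = version.get(target, 0) + 1
        result.append((f"%{target}.{version[target]}", op, renamed))
    return result


program = [
    ("x", "const", ["1"]),
    ("x", "add", ["x", "2"]),    # x is reassigned here
    ("y", "mul", ["x", "x"]),
]

for target, op, operands in to_ssa(program):
    print(f"{target} = {op} {', '.join(operands)}")
# %x.1 = const 1
# %x.2 = add %x.1, 2
# %y.1 = mul %x.2, %x.2
```

After renaming, the second assignment to `x` becomes a new name, so every use points at exactly one definition. That property is what makes many optimizations simpler to write.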

3. LLVM's modular design
LLVM is modular in order to avoid tightly coupled code. Imagine a large, tightly coupled code base: when the compiler has to work through it, everything is tangled together, and resources stay occupied for a long time. In severe cases the process stalls, which is unacceptable for any program. LLVM is divided into modules:
  • LLVM optimizer
  • LLVM code generator
  • ....

4. LLVM IR workflow
  • Described in words:
          To produce LLVM IR, the front end first breaks the source code into a token stream; every operator, identifier, and so on becomes a token. The token stream is then passed to the parser, which applies the language's context-free grammar (CFG) to turn it into an abstract syntax tree (AST). Semantic analysis checks the AST for correctness, and finally the IR is generated.
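The stages described above can be sketched on a toy expression language. This is an illustration only, not how Clang or LLVM is implemented: the grammar, the function names (`tokenize`, `parse`, `check`, `emit_ir`), and the temporary names `%t0`, `%t1`, ... are all assumptions made for the example, and the output merely imitates the look of LLVM IR instructions.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")


def tokenize(source):
    """Lexing: split the source text into (kind, text) tokens."""
    tokens = []
    for number, name, op in TOKEN_RE.findall(source):
        if number:
            tokens.append(("num", number))
        elif name:
            tokens.append(("id", name))
        elif op.strip():
            tokens.append(("op", op))
    return tokens


def parse(tokens):
    """Parsing with the grammar: expr := term ('+' term)*, term := atom ('*' atom)*."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("eof", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def atom():
        kind, text = advance()
        if kind in ("num", "id"):
            return (kind, text)
        if text == "(":
            node = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    def term():
        node = atom()
        while peek() == ("op", "*"):
            advance()
            node = ("*", node, atom())
        return node

    def expr():
        node = term()
        while peek() == ("op", "+"):
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek()[0] != "eof":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree


def check(tree, known):
    """Semantic analysis: every identifier must be a known variable."""
    if tree[0] == "id" and tree[1] not in known:
        raise NameError(f"undefined variable {tree[1]!r}")
    if tree[0] in ("+", "*"):
        check(tree[1], known)
        check(tree[2], known)


def emit_ir(tree):
    """IR generation: one instruction per operator, each result named once."""
    lines = []

    def walk(node):
        if node[0] == "num":
            return node[1]
        if node[0] == "id":
            return "%" + node[1]
        left, right = walk(node[1]), walk(node[2])
        name = f"%t{len(lines)}"
        opcode = "add" if node[0] == "+" else "mul"
        lines.append(f"{name} = {opcode} i32 {left}, {right}")
        return name

    walk(tree)
    return lines


tree = parse(tokenize("a + 2 * (b + 3)"))
check(tree, known={"a", "b"})
print("\n".join(emit_ir(tree)))
# %t0 = add i32 %b, 3
# %t1 = mul i32 2, %t0
# %t2 = add i32 %a, %t1
```

Each function corresponds to one stage: `tokenize` produces the token stream, `parse` builds the AST, `check` performs a very small semantic analysis, and `emit_ir` walks the tree to produce the instructions.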
  • Described as a flow chart:
          High-level language source → token stream → parser (applies the CFG to build the AST) → semantic analysis → LLVM IR
          In file terms, think of a source file such as a .c file going in and an .ll file coming out.

5. LLVM bitcode workflow
Bitcode is the encoded form of LLVM IR. It consists of two parts:
  •   The bitstream container format
  •   The encoding of LLVM IR into that bitstream format
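One observable property of the bitstream container is its magic number: a raw bitcode file starts with the ASCII characters 'B' and 'C' followed by the bytes 0xC0 and 0xDE. The following is a minimal sketch that checks for it; the function name `looks_like_raw_bitcode` is hypothetical, and the check says nothing about whether the rest of the file is valid.

```python
# Magic number at the start of a raw LLVM bitcode stream: 'B', 'C', 0xC0, 0xDE.
BITCODE_MAGIC = b"BC\xc0\xde"


def looks_like_raw_bitcode(data: bytes) -> bool:
    """Return True if the data begins with the bitcode magic number."""
    return data[:4] == BITCODE_MAGIC


print(looks_like_raw_bitcode(b"BC\xc0\xde\x35\x14"))   # True
print(looks_like_raw_bitcode(b"; ModuleID = 'a.c'"))   # False: textual IR, not bitcode
```

To try it on a real file, read the first bytes with `open(path, "rb").read(4)`, where `path` points to a .bc file you have produced yourself.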
Bitcode is a binary format. In file terms, an .ll file goes in and a .bc file comes out.

6. Converting LLVM bitcode to the platform's assembly code
In outline: llc, LLVM's static compiler, compiles a .bc file into the assembly language of a specified architecture. CPUs differ in their modes of operation and in their register types, which is why the target architecture has to be specified. This is only a general introduction; keep reading ^_^

IR
  • IR code optimization
  • Transformation into other forms
  Optimizing transformations: for example, promoting local variables from memory to registers. Here optimization and transformation happen in the same step.
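The promotion of local variables from memory to registers can be illustrated with a toy model. This is a sketch under strong assumptions: the tuple-based instruction format and the name `promote_to_registers` are invented for the example, and only straight-line code is handled. LLVM's real pass also deals with control flow, which requires phi nodes.

```python
def promote_to_registers(instructions):
    """Remove alloca/store/load for stack slots in straight-line code.

    Hypothetical instruction format:
      ("alloca", slot)
      ("store", value, slot)
      ("load", dest, slot)
      (op, dest, operand, ...)   for everything else
    A load from a slot that was never stored to raises KeyError.
    """
    slots = set()    # names introduced by alloca
    current = {}     # slot -> value most recently stored into it
    alias = {}       # load result -> the value that replaces it
    output = []

    def resolve(value):
        return alias.get(value, value)

    for inst in instructions:
        op = inst[0]
        if op == "alloca":
            slots.add(inst[1])
        elif op == "store" and inst[2] in slots:
            current[inst[2]] = resolve(inst[1])
        elif op == "load" and inst[2] in slots:
            alias[inst[1]] = current[inst[2]]
        else:
            # Keep the instruction, rewriting operands that came from loads.
            output.append((op, inst[1]) + tuple(resolve(v) for v in inst[2:]))
    return output


program = [
    ("alloca", "%x"),
    ("store", "1", "%x"),
    ("load", "%a", "%x"),
    ("add", "%b", "%a", "2"),
    ("store", "%b", "%x"),
    ("load", "%c", "%x"),
    ("mul", "%d", "%c", "%a"),
]

for inst in promote_to_registers(program):
    print(inst)
# ('add', '%b', '1', '2')
# ('mul', '%d', '%b', '1')
```

All memory traffic through `%x` disappears: each load is replaced by the value that was last stored, so the remaining instructions work directly on values.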


   None of these steps would be possible without Clang, the C-language front end. When generating the AST, it consumes only about 20% of the memory that GCC does, which is impressive.

bitcode
Converting LLVM IR to bitcode introduces two concepts:
  • Block
  • Record
Block: a region of the bitstream, such as a function body or a symbol table. Each block has its own ID.
Record: a record consists of a record code and a number of integer values. Records describe the entities in the file, such as instructions, global variables, and type descriptions.
A bitcode file can also be wrapped in a simple header that gives:
  1. The offset of the embedded BC file within the wrapper file
  2. The size of the embedded BC file
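As a sketch of how such a wrapper header can be read: according to the LLVM bitcode format documentation, the wrapper header consists of five 32-bit little-endian fields (magic, version, offset, size, CPU type), with the magic value 0x0B17C0DE. The function name `read_wrapper` is hypothetical, and the sample data below is synthetic rather than taken from a real file.

```python
import struct

WRAPPER_MAGIC = 0x0B17C0DE   # magic number of the bitcode wrapper header
HEADER_SIZE = 20             # five 32-bit fields


def read_wrapper(data: bytes):
    """Return the embedded bitcode bytes, or None if there is no wrapper header."""
    if len(data) < HEADER_SIZE:
        return None
    magic, version, offset, size, cpu_type = struct.unpack_from("<5I", data, 0)
    if magic != WRAPPER_MAGIC:
        return None
    # The offset and size fields locate the embedded BC file inside the wrapper.
    return data[offset:offset + size]


# Synthetic example: a header followed directly by a tiny fake payload.
payload = b"BC\xc0\xde" + b"\x00" * 4
header = struct.pack("<5I", WRAPPER_MAGIC, 0, HEADER_SIZE, len(payload), 0)
print(read_wrapper(header + payload) == payload)   # True
print(read_wrapper(payload))                       # None: raw bitcode, no wrapper
```

The offset and size fields are exactly the two pieces of information listed above.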