Learning Linux: Compilation - Aet

Bingliaolong, 2026-04-06 22:44

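Overview

The compilation stage

The compilation stage is the most complex and most central stage of the build pipeline; it is the compiler's "brain".

source → preprocessing → [compilation (emits assembly / .ll)] → assembly (emits .o) → linking → executable
             .i file             .s file / .ll file

Definition
1. The compilation stage translates preprocessed C/C++ source (plain text) into assembly language (or LLVM IR)
2. Its core task is to understand what the high-level code means and to express that meaning in a low-level language
Summary
1. Preprocessing is pure text substitution and does not understand C++ syntax
2. The later assembly stage is a mechanical translation: each assembly instruction is encoded as a fixed sequence of bytes
3. The compilation stage has to actually understand the code: it must work out that int x = a + b means "add two integers and store the result", then decide which registers and which machine instructions implement it

Internal flow of the compilation stage

The compilation stage is itself split into several sub-stages.
Each one is covered in "The compilation stage in detail".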

Preprocessed source (.i file, plain C/C++ text)
  │
  ▼
1. Lexical analysis (scanning)
     character stream → token stream
     "int x = a + b;" → [int][x][=][a][+][b][;]
  │ token stream
  ▼
2. Syntax analysis (parsing)
     token stream → AST (abstract syntax tree)
     checks that the code follows the grammar
  │ AST
  ▼
3. Semantic analysis
     type checking, name lookup, overload resolution
     checks that the meaning of the code is valid
  │ type-annotated AST
  ▼
4. IR generation (CodeGen)
     AST → LLVM IR (intermediate representation)
     lowers high-level C/C++ constructs to low-level operations
  │ LLVM IR (.ll file)
  ▼
5. Optimization
     transforms the IR to make the code faster and smaller
     constant folding, dead code elimination, inlining, vectorization, ...
  │ optimized LLVM IR
  ▼
6. Backend code generation
     LLVM IR → assembly for the target machine
     instruction selection → register allocation → instruction scheduling
  │
  ▼
Assembly text (.s file)

Generating AT&T-syntax assembly

clang++ -S test.cpp -o test.s

Generating Intel-syntax assembly

clang++ -S -masm=intel test.cpp -o test.s

The compilation stage in detail

Example code (test.cpp)

#include <cstdio>

int add(int a, int b) {
    return a + b;
}

int main() {
    int x = add(3,4);
    printf("res is %d\n", x);
    return 0;
}

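Lexical analysis

Input: the preprocessed character stream (the contents of test.i)

Output: a stream of tokens (lexical units)

The compiler scans the source character by character and groups the characters into meaningful "words":

Characters: i n t  a d d ( i n t  a ,  i n t  b )  {  \n  r e t u r n  a + b ;  \n  }
Tokens:     int  add  (  int  a  ,  int  b  )  {  return  a  +  b  ;  }
Kinds:      keyword  identifier  punctuator  keyword  identifier  ...

Each token carries:

Kind (keyword / identifier / literal / punctuator)

Text ("int" / "add" / "3" / "+" / ...)

Location (file:line:column, recovered from the linemarkers emitted by the preprocessor)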
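To make the kind/text/location triple concrete, here is a minimal scanner sketch. It is not how Clang's Lexer is implemented; TokenKind, Token and tokenize are hypothetical names chosen for this illustration, and the scanner only knows two keywords, decimal integers and single-character punctuators.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical token kinds, mirroring the four categories listed above.
enum class TokenKind { Keyword, Identifier, Literal, Punctuator };

struct Token {
    TokenKind kind;
    std::string text;
    int line;    // 1-based
    int column;  // 1-based
};

// Toy scanner: identifiers, two keywords, decimal integers and
// single-character punctuators. A real lexer handles far more.
std::vector<Token> tokenize(const std::string& src) {
    static const std::unordered_set<std::string> keywords = {"int", "return"};
    std::vector<Token> tokens;
    int line = 1, column = 1;
    std::size_t i = 0;
    auto is_ident_char = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
    };
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (c == '\n') { ++line; column = 1; ++i; continue; }  // new line
        if (std::isspace(c)) { ++column; ++i; continue; }       // skip blanks
        const std::size_t start = i;
        TokenKind kind;
        if (std::isalpha(c) || c == '_') {
            while (i < src.size() && is_ident_char(src[i])) ++i;
            kind = keywords.count(src.substr(start, i - start))
                       ? TokenKind::Keyword
                       : TokenKind::Identifier;
        } else if (std::isdigit(c)) {
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            kind = TokenKind::Literal;
        } else {
            ++i;  // anything else is treated as a one-character punctuator
            kind = TokenKind::Punctuator;
        }
        assert(i > start);  // every token consumes at least one character
        tokens.push_back({kind, src.substr(start, i - start), line, column});
        column += static_cast<int>(i - start);
    }
    return tokens;
}
```

Calling tokenize on the text of add yields 16 tokens, the same sequence as the "Tokens" row above, each tagged with the line and column where it starts.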

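Viewing Clang's actual lexer output
1. Every token carries a precise source location
2. When a later stage reports a problem (a syntax error, a type error, ...), the file name and line number in the message come from these locations

# -fsyntax-only stops the driver after the front end, so no link step is attempted
clang++ -fsyntax-only -Xclang -dump-tokens test.cpp 2>&1 | head -30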

namespace 'namespace' [StartOfLine] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:296:1>

identifier 'std' [LeadingSpace] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:296:11>

l_brace '{' [StartOfLine] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:297:1>

typedef 'typedef' [StartOfLine] [LeadingSpace] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:298:3>

long 'long' [LeadingSpace] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:298:11 <Spelling=<built-in>:130:23>>

unsigned 'unsigned' [LeadingSpace] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:298:11 <Spelling=<built-in>:130:28>>

int 'int' [LeadingSpace] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:298:11 <Spelling=<built-in>:130:37>>

identifier 'size_t' [LeadingSpace] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:298:26>

semi ';' Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:298:32>

typedef 'typedef' [StartOfLine] [LeadingSpace] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:299:3>

long 'long' [LeadingSpace] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:299:11 <Spelling=<built-in>:124:26>>

int 'int' [LeadingSpace] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:299:11 <Spelling=<built-in>:124:31>>

identifier 'ptrdiff_t' [LeadingSpace] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:299:28>

semi ';' Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:299:37>

typedef 'typedef' [StartOfLine] [LeadingSpace] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:302:3>

decltype 'decltype' [LeadingSpace] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:302:11>

l_paren '(' Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:302:19>

nullptr 'nullptr' Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:302:20>

r_paren ')' Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:302:27>

identifier 'nullptr_t' [LeadingSpace] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:302:29>

semi ';' Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:302:38>

extern 'extern' [StartOfLine] [LeadingSpace] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:308:3>

string_literal '"C++"' [LeadingSpace] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:308:10>

__attribute '__attribute__' [LeadingSpace] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:308:16>

l_paren '(' [LeadingSpace] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:308:30>

l_paren '(' Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:308:31>

identifier '__noreturn__' Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:308:32>

comma ',' Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:308:44>

identifier '__always_inline__' [LeadingSpace] Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:308:46>

r_paren ')' Loc=</usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/c++config.h:308:63>

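The lexer does one more thing
1. It drops whitespace and comments from the token stream
2. The preprocessed output (.i) no longer contains comments, but spaces, newlines and indentation are still there
3. The lexer skips that whitespace and emits only meaningful tokens; in the dump above, all that remains of it is the [StartOfLine] and [LeadingSpace] flags

Location in the Clang source tree:

clang/lib/Lex/Lexer.cpp # core of the lexer

clang/include/clang/Lex/Token.h # Token definition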

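Syntax analysis

Input: token stream

Output: AST (Abstract Syntax Tree)

The parser applies the C++ grammar to turn the linear token stream into a tree.

Token stream: int add ( int a , int b ) { return a + b ; }

FunctionDecl 'add' returns 'int'
├── ParmVarDecl 'a' type='int'
├── ParmVarDecl 'b' type='int'
└── CompoundStmt (the function body { })
    └── ReturnStmt
        └── BinaryOperator '+'
            ├── DeclRefExpr 'a'
            └── DeclRefExpr 'b'

The tree captures the structure of the code exactly:

- add is a function that returns int and takes two int parameters
- the function body contains one return statement
- the returned expression is a + b
- + is a binary operator whose left operand is a and whose right operand is b

If the code has a syntax error (a missing semicolon, unbalanced parentheses, ...), the parser reports it at this stage and the compilation fails.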
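The step from a flat sequence to a tree can be sketched with a tiny recursive-descent parser. This is only an illustration under strong assumptions: it accepts nothing but identifiers joined by '+', it reads characters directly instead of tokens, and Node, Parser and dump are hypothetical names, not Clang classes (the labels merely imitate Clang's node names).

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, heavily simplified AST node: either a name reference
// (no children) or a binary '+' (two children).
struct Node {
    std::string label;
    std::unique_ptr<Node> lhs, rhs;
};

class Parser {
public:
    explicit Parser(std::string text) : src_(std::move(text)) {}

    // expr := primary ('+' primary)*   (left-associative)
    // Returns nullptr on a syntax error.
    std::unique_ptr<Node> parseExpr() {
        auto left = parsePrimary();
        while (left && peek() == '+') {
            ++pos_;  // consume '+'
            auto right = parsePrimary();
            if (!right) return nullptr;  // missing right operand
            auto bin = std::make_unique<Node>();
            bin->label = "BinaryOperator '+'";
            bin->lhs = std::move(left);
            bin->rhs = std::move(right);
            left = std::move(bin);
        }
        return left;
    }

    bool atEnd() { return peek() == '\0'; }

private:
    // Skips whitespace and returns the next character, or '\0' at the end.
    char peek() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        return pos_ < src_.size() ? src_[pos_] : '\0';
    }

    // primary := identifier
    std::unique_ptr<Node> parsePrimary() {
        if (!std::isalpha(static_cast<unsigned char>(peek()))) return nullptr;
        const std::size_t start = pos_;
        while (pos_ < src_.size() &&
               std::isalnum(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        auto n = std::make_unique<Node>();
        n->label = "DeclRefExpr '" + src_.substr(start, pos_ - start) + "'";
        return n;
    }

    std::string src_;
    std::size_t pos_ = 0;
};

// Renders the tree with two spaces of indentation per level.
std::string dump(const Node& n, int depth = 0) {
    assert((n.lhs == nullptr) == (n.rhs == nullptr));  // leaf or binary node
    std::string out(static_cast<std::size_t>(depth) * 2, ' ');
    out += n.label + "\n";
    if (n.lhs) out += dump(*n.lhs, depth + 1);
    if (n.rhs) out += dump(*n.rhs, depth + 1);
    return out;
}
```

Parsing "a + b" gives a BinaryOperator node with two DeclRefExpr children, while the incomplete input "a +" makes parseExpr return nullptr, which is the toy equivalent of a syntax error.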

// Excerpt of the real dump for main (such a dump can be produced with clang++ -Xclang -ast-dump -fsyntax-only test.cpp; terminal color codes removed)

`-FunctionDecl 0x35d3be80 <line:7:1, line:11:1> line:7:5 main 'int ()'
  `-CompoundStmt 0x35d3c320 <col:12, line:11:1>
    |-DeclStmt 0x35d3c0e0 <line:8:2, col:18>
    | `-VarDecl 0x35d3bf38 <col:2, col:17> col:6 used x 'int' cinit
    |   `-CallExpr 0x35d3c0b0 <col:10, col:17> 'int'
    |     |-ImplicitCastExpr 0x35d3c078 <col:10> 'int (*)(int, int)' <FunctionToPointerDecay>
    |     | `-DeclRefExpr 0x35d3c028 <col:10> 'int (int, int)' lvalue Function 0x35d3bcf0 'add' 'int (int, int)'
    |     |-IntegerLiteral 0x35d3bfe8 <col:14> 'int' 3
    |     `-IntegerLiteral 0x35d3c008 <col:16> 'int' 4
    |-CallExpr 0x35d3c290 <line:9:2, col:25> 'int'
    | |-ImplicitCastExpr 0x35d3c278 <col:2> 'int (*)(const char *__restrict, ...)' <FunctionToPointerDecay>
    | | `-DeclRefExpr 0x35d3c200 <col:2> 'int (const char *__restrict, ...)' lvalue Function 0x35d1bbe8 'printf' 'int (const char *__restrict, ...)'
    | |-ImplicitCastExpr 0x35d3c2c0 <col:9> 'const char *' <ArrayToPointerDecay>
    | | `-StringLiteral 0x35d3c1b8 <col:9> 'const char[11]' lvalue "res is %d\n"
    | `-ImplicitCastExpr 0x35d3c2d8 <col:24> 'int' <LValueToRValue>
    |   `-DeclRefExpr 0x35d3c1e0 <col:24> 'int' lvalue Var 0x35d3bf38 'x' 'int'
    `-ReturnStmt 0x35d3c310 <line:10:2, col:9>
      `-IntegerLiteral 0x35d3c2f0 <col:9> 'int' 0

Output (simplified, key parts only):

TranslationUnitDecl                      ← root node of the whole translation unit
├── FunctionDecl <test.cpp:2:1> 'int (int, int)' add
│   ├── ParmVarDecl 'int' a
│   ├── ParmVarDecl 'int' b
│   └── CompoundStmt
│       └── ReturnStmt
│           └── BinaryOperator 'int' '+'
│               ├── ImplicitCastExpr 'int' <LValueToRValue>
│               │   └── DeclRefExpr 'int' lvalue ParmVar 'a'
│               └── ImplicitCastExpr 'int' <LValueToRValue>
│                   └── DeclRefExpr 'int' lvalue ParmVar 'b'
│
└── FunctionDecl <test.cpp:7:1> 'int ()' main
    └── CompoundStmt
        ├── DeclStmt                     ← int x = add(3, 4);
        │   └── VarDecl 'int' x
        │       └── CallExpr 'int'
        │           ├── ImplicitCastExpr 'int (*)(int, int)'
        │           │   └── DeclRefExpr 'int (int, int)' Function 'add'
        │           ├── IntegerLiteral 'int' 3
        │           └── IntegerLiteral 'int' 4
        ├── CallExpr 'int'               ← printf("res is %d\n", x);
        │   ├── ImplicitCastExpr
        │   │   └── DeclRefExpr 'printf'
        │   ├── ImplicitCastExpr 'const char *'
        │   │   └── StringLiteral '"res is %d\n"'
        │   └── ImplicitCastExpr 'int'
        │       └── DeclRefExpr 'int' lvalue Var 'x'
        └── ReturnStmt                   ← return 0;
            └── IntegerLiteral 'int' 0

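Note the many ImplicitCastExpr (implicit conversion) nodes in the AST
1. They do not appear in the source text, but the C++ language rules require them
2. For example, DeclRefExpr 'a' is an lvalue while the addition needs the value stored in it, so the compiler inserts an LValueToRValue conversion
3. Inserting them is the job of the next stage, semantic analysis; Clang's parser calls into Sema while it parses, which is why the dumped AST already contains these nodes

Location in the Clang source tree:

clang/lib/Parse/ParseDecl.cpp # parsing of declarations
clang/lib/Parse/ParseExpr.cpp # parsing of expressions
clang/lib/Parse/ParseStmt.cpp # parsing of statements
clang/include/clang/AST/Decl.h # AST declaration nodes
clang/include/clang/AST/Expr.h # AST expression nodes
clang/include/clang/AST/Stmt.h # AST statement nodes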

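Semantic analysis

Input: AST (syntactically valid, meaning not yet checked)

Output: AST annotated with types and semantic information

What semantic analysis does (everything the grammar alone cannot decide):

1. Name lookup (name resolution)

Which function does the name "add" refer to?

If there are several overloads, which one is chosen?

→ In the CallExpr, the DeclRefExpr points at FunctionDecl 'add'

→ That link is established by name lookup

2. Type checking

In add(3, 4), 3 and 4 are int and add takes (int, int), so the call matches

Writing add("hello", 4) makes type checking fail with:

"error: no matching function for call to 'add'"

3. Implicit conversions

int x = 3.14; // double → int implicit conversion (usually with a warning that the value changes)

printf("...", x); // variadic arguments undergo the default argument promotions (an int is passed as is; a float would become double)

a + b // lvalue → rvalue conversion

These conversions show up in the AST as ImplicitCastExpr nodes

4. Overload resolution

Given:

int add(int, int);

double add(double, double);

which one does the call add(3, 4) select?

→ add(int, int), because the argument types match exactly

5. Template instantiation

template<typename T> T add(T a, T b) { return a + b; }

add(3, 4); → instantiates add<int>

add(1.0, 2.0); → instantiates add<double>

Template instantiation is one of the most complex and most time-consuming parts of compiling C++

6. Constant evaluation

constexpr int x = 3 + 4; → computed at compile time, giving 7

static_assert(sizeof(int) == 4); → verified at compile time

7. Diagnostics

Most errors and warnings about the meaning of the code are produced during semantic analysis; the lexer, preprocessor and parser report their own problems (unterminated literals, missing semicolons, ...)

"error: use of undeclared identifier 'xyz'"

"warning: implicit conversion loses integer precision"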

Quick check

# Run only syntax and semantic analysis and generate no code (a fast way to check that the code is valid)

clang++ -fsyntax-only test.cpp

# Enable more diagnostics

clang++ -fsyntax-only -Wall -Wextra test.cpp
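Overload resolution, template instantiation and constant evaluation all happen inside semantic analysis, so their results can be observed with static_assert, without running anything. The sketch below assumes hypothetical names add_ov and add_t (renamed from add so the overloads and the template do not collide in one file); a -fsyntax-only run is enough to check every static_assert in it.

```cpp
#include <cassert>
#include <type_traits>

// Two overloads, as in the overload-resolution item (hypothetical name add_ov).
int    add_ov(int a, int b)       { return a + b; }
double add_ov(double a, double b) { return a + b; }

// Function template (hypothetical name add_t): each distinct T that is
// deduced below triggers one instantiation.
template <typename T>
constexpr T add_t(T a, T b) { return a + b; }

// Overload resolution: (int, int) arguments are an exact match for the
// first overload, (double, double) for the second.
static_assert(std::is_same<decltype(add_ov(3, 4)), int>::value,
              "add_ov(3, 4) selects add_ov(int, int)");
static_assert(std::is_same<decltype(add_ov(1.0, 2.0)), double>::value,
              "add_ov(1.0, 2.0) selects add_ov(double, double)");

// Template argument deduction: T = int and T = double respectively.
static_assert(std::is_same<decltype(add_t(3, 4)), int>::value,
              "add_t(3, 4) instantiates add_t<int>");
static_assert(std::is_same<decltype(add_t(1.0, 2.0)), double>::value,
              "add_t(1.0, 2.0) instantiates add_t<double>");

// Constant evaluation: the compiler computes the value itself.
constexpr int seven = add_t(3, 4);
static_assert(seven == 7, "evaluated at compile time");

// Runtime counterpart, only to show that both paths agree.
void check_at_runtime() {
    assert(add_ov(3, 4) == seven);
}
```

If any of these claims were wrong, the front end would reject the file with a static_assert diagnostic before any IR is generated.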

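Location in the Clang source tree:

clang/lib/Sema/SemaDecl.cpp # semantic analysis of declarations
clang/lib/Sema/SemaExpr.cpp # semantic analysis of expressions
clang/lib/Sema/SemaOverload.cpp # overload resolution
clang/lib/Sema/SemaTemplate.cpp # semantic analysis of templates (instantiation itself is in SemaTemplateInstantiate.cpp)

Sema is short for Semantic Analysis

It is one of the largest and most complex parts of Clang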

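`IR` generation

Input: type-annotated AST

Output: LLVM IR (intermediate representation)

This step lowers the high-level concepts of C/C++ to the low-level operations of LLVM IR.

Many C++-specific concepts disappear here and turn into something more primitive:

C++ concept                    How it appears in LLVM IR
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Local variable int x;          %x = alloca i32 (stack allocation)
Assignment x = 3;              store i32 3, ptr %x
Read of x                      %val = load i32, ptr %x
Function call add(3, 4)        %ret = call i32 @_Z3addii(i32 3, i32 4)
return a + b;                  %sum = add i32 %a, %b / ret i32 %sum
if-else                        conditional br between basic blocks (phi nodes appear once values are promoted to SSA registers)
for/while loops                loop structure built from br jumps
struct/class member access     getelementptr instruction
Virtual function call          load the function pointer from the vtable + indirect call
Exceptions (try-catch)         invoke + landingpad instructions
Destructors                    the compiler inserts destructor calls on every exit path

The table uses the opaque-pointer spelling (ptr) of newer LLVM releases; the Clang 14 output below prints typed pointers such as i32*.

IR output, unoptimized

# Emit the generated IR (unoptimized)

clang++ -emit-llvm -S -O0 test.cpp -o test_O0.ll

cat test_O0.ll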

; ModuleID = 'test.cpp'

source_filename = "test.cpp"

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

target triple = "x86_64-pc-linux-gnu"

@.str = private unnamed_addr constant [11 x i8] c"res is %d\0A\00", align 1

; Function Attrs: mustprogress noinline nounwind optnone uwtable

define dso_local noundef i32 @_Z3addii(i32 noundef %0, i32 noundef %1) #0 {

%3 = alloca i32, align 4

%4 = alloca i32, align 4

store i32 %0, i32* %3, align 4

store i32 %1, i32* %4, align 4

%5 = load i32, i32* %3, align 4

%6 = load i32, i32* %4, align 4

%7 = add nsw i32 %5, %6

ret i32 %7

}

; Function Attrs: mustprogress noinline norecurse optnone uwtable

define dso_local noundef i32 @main() #1 {

%1 = alloca i32, align 4

%2 = alloca i32, align 4

store i32 0, i32* %1, align 4

%3 = call noundef i32 @_Z3addii(i32 noundef 3, i32 noundef 4)

store i32 %3, i32* %2, align 4

%4 = load i32, i32* %2, align 4

%5 = call i32 (i8*, ...) @printf(i8* noundef getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i64 0, i64 0), i32 noundef %4)

ret i32 0

}

declare i32 @printf(i8* noundef, ...) #2

attributes #0 = { mustprogress noinline nounwind optnone uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

attributes #1 = { mustprogress noinline norecurse optnone uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

attributes #2 = { "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

!llvm.module.flags = !{!0, !1, !2, !3, !4}

!llvm.ident = !{!5}

!0 = !{i32 1, !"wchar_size", i32 4}

!1 = !{i32 7, !"PIC Level", i32 2}

!2 = !{i32 7, !"PIE Level", i32 2}

!3 = !{i32 7, !"uwtable", i32 1}

!4 = !{i32 7, !"frame-pointer", i32 2}

!5 = !{!"Debian clang version 14.0.6"}

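A few key properties of the IR:

1. SSA form: every %name is assigned exactly once

%7 = add nsw i32 %5, %6 ← %7 is defined only here

There is never a second definition such as %7 = ...; %7 = ...;

2. Explicit types: every operation carries its type

add nsw i32 %5, %6 ← i32 states that this is a 32-bit integer addition

LLVM IR has no implicit conversions of the kind C has

3. Largely target-independent instructions

alloca/load/store/add/call/ret are abstract operations, not x86 or ARM instructions

The module is still tied to one target through its target triple, datalayout and ABI-dependent details, so the IR is the layer where front ends and back ends are decoupled rather than a fully portable format

4. Explicit memory operations: all memory accesses go through load/store

In C, a + b does not look like it touches memory

In the IR the operands are loaded first and then added, so the memory accesses are visible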
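One way to read the -O0 IR of add is to write the same steps back as C++, one statement per IR instruction. The sketch below does that; add_plain and add_lowered are hypothetical names, and the comments quote the corresponding lines of the test_O0.ll listing above.

```cpp
#include <cassert>

// Ordinary source form.
int add_plain(int a, int b) { return a + b; }

// The same function written the way the unoptimized IR spells it out:
// one stack slot per parameter, explicit stores and loads, then the add.
int add_lowered(int arg0, int arg1) {
    int slot3;                // %3 = alloca i32, align 4
    int slot4;                // %4 = alloca i32, align 4
    slot3 = arg0;             // store i32 %0, i32* %3
    slot4 = arg1;             // store i32 %1, i32* %4
    const int v5 = slot3;     // %5 = load i32, i32* %3
    const int v6 = slot4;     // %6 = load i32, i32* %4
    const int v7 = v5 + v6;   // %7 = add nsw i32 %5, %6
    return v7;                // ret i32 %7
}

// Both forms compute the same value; only the amount of explicit
// memory traffic differs.
void check_lowering() {
    assert(add_lowered(3, 4) == add_plain(3, 4));
}
```

Each of v5, v6 and v7 is assigned exactly once, which mirrors the SSA property, while slot3 and slot4 play the role of memory that may be written repeatedly.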

Location in the Clang source tree:

clang/lib/CodeGen/CodeGenFunction.cpp # function-level IR generation
clang/lib/CodeGen/CGExpr.cpp # IR generation for expressions
clang/lib/CodeGen/CGStmt.cpp # IR generation for statements
clang/lib/CodeGen/CGCall.cpp # IR generation for function calls

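Optimization

Input: unoptimized LLVM IR

Output: optimized LLVM IR

This stage runs a sequence of optimization passes (transformations); each pass performs one specific kind of optimization.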
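As a rough picture of what a single pass does, here is a constant-folding sketch over a toy expression tree. It is not LLVM code: Expr, make_const, make_input, make_add and fold_constants are hypothetical names for this illustration, and overflow handling is deliberately left out.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical mini-IR (not an LLVM data structure): an expression is a
// constant, an unknown runtime input, or an addition of two expressions.
struct Expr {
    enum Kind { Const, Input, Add };
    Kind kind = Const;
    int value = 0;                   // meaningful when kind == Const
    std::unique_ptr<Expr> lhs, rhs;  // set when kind == Add
};

std::unique_ptr<Expr> make_const(int v) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Const;
    e->value = v;
    return e;
}

std::unique_ptr<Expr> make_input() {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Input;
    return e;
}

std::unique_ptr<Expr> make_add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Add;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// One "pass": rewrite Add(Const a, Const b) into Const(a + b), bottom-up.
// Anything that involves a runtime input is left untouched.
std::unique_ptr<Expr> fold_constants(std::unique_ptr<Expr> e) {
    assert(e != nullptr);
    if (e->kind != Expr::Add) return e;
    e->lhs = fold_constants(std::move(e->lhs));
    e->rhs = fold_constants(std::move(e->rhs));
    if (e->lhs->kind == Expr::Const && e->rhs->kind == Expr::Const)
        return make_const(e->lhs->value + e->rhs->value);
    return e;
}
```

Folding Add(3, 4) produces the constant 7, the same simplification that later turns the call add(3, 4) into the literal 7 in the -O2 IR once the call has been inlined.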

Comparing the IR at -O0 and -O2

clang++ -emit-llvm -S -O0 test.cpp -o test_O0.ll

clang++ -emit-llvm -S -O2 test.cpp -o test_O2.ll

diff test_O0.ll test_O2.ll

The complete test_O2.ll is listed first, followed by the diff output:

; ModuleID = 'test.cpp'

source_filename = "test.cpp"

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

target triple = "x86_64-pc-linux-gnu"

@.str = private unnamed_addr constant [11 x i8] c"res is %d\0A\00", align 1

; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn

define dso_local noundef i32 @_Z3addii(i32 noundef %0, i32 noundef %1) local_unnamed_addr #0 {

%3 = add nsw i32 %1, %0

ret i32 %3

}

; Function Attrs: mustprogress nofree norecurse nounwind uwtable

define dso_local noundef i32 @main() local_unnamed_addr #1 {

%1 = tail call i32 (i8*, ...) @printf(i8* noundef nonnull dereferenceable(1) getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i64 0, i64 0), i32 noundef 7)

ret i32 0

}

; Function Attrs: nofree nounwind

declare noundef i32 @printf(i8* nocapture noundef readonly, ...) local_unnamed_addr #2

attributes #0 = { mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

attributes #1 = { mustprogress nofree norecurse nounwind uwtable "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

attributes #2 = { nofree nounwind "frame-pointer"="none" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

!llvm.module.flags = !{!0, !1, !2, !3}

!llvm.ident = !{!4}

!0 = !{i32 1, !"wchar_size", i32 4}

!1 = !{i32 7, !"PIC Level", i32 2}

!2 = !{i32 7, !"PIE Level", i32 2}

!3 = !{i32 7, !"uwtable", i32 1}

!4 = !{!"Debian clang version 14.0.6"}

8,17c8,11

< ; Function Attrs: mustprogress noinline nounwind optnone uwtable

< define dso_local noundef i32 @_Z3addii(i32 noundef %0, i32 noundef %1) #0 {

< %3 = alloca i32, align 4

< %4 = alloca i32, align 4

< store i32 %0, i32* %3, align 4

< store i32 %1, i32* %4, align 4

< %5 = load i32, i32* %3, align 4

< %6 = load i32, i32* %4, align 4

< %7 = add nsw i32 %5, %6

< ret i32 %7

---

> ; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn

> define dso_local noundef i32 @_Z3addii(i32 noundef %0, i32 noundef %1) local_unnamed_addr #0 {

> %3 = add nsw i32 %1, %0

> ret i32 %3

20,28c14,16

< ; Function Attrs: mustprogress noinline norecurse optnone uwtable

< define dso_local noundef i32 @main() #1 {

< %1 = alloca i32, align 4

< %2 = alloca i32, align 4

< store i32 0, i32* %1, align 4

< %3 = call noundef i32 @_Z3addii(i32 noundef 3, i32 noundef 4)

< store i32 %3, i32* %2, align 4

< %4 = load i32, i32* %2, align 4

< %5 = call i32 (i8*, ...) @printf(i8* noundef getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i64 0, i64 0), i32 noundef %4)

---

> ; Function Attrs: mustprogress nofree norecurse nounwind uwtable

> define dso_local noundef i32 @main() local_unnamed_addr #1 {

> %1 = tail call i32 (i8*, ...) @printf(i8* noundef nonnull dereferenceable(1) getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i64 0, i64 0), i32 noundef 7)

32c20,21

< declare i32 @printf(i8* noundef, ...) #2

---

> ; Function Attrs: nofree nounwind

> declare noundef i32 @printf(i8* nocapture noundef readonly, ...) local_unnamed_addr #2

34,36c23,25

< attributes #0 = { mustprogress noinline nounwind optnone uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

< attributes #1 = { mustprogress noinline norecurse optnone uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

< attributes #2 = { "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

---

> attributes #0 = { mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

> attributes #1 = { mustprogress nofree norecurse nounwind uwtable "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

> attributes #2 = { nofree nounwind "frame-pointer"="none" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

38,39c27,28

< !llvm.module.flags = !{!0, !1, !2, !3, !4}

< !llvm.ident = !{!5}

---

> !llvm.module.flags = !{!0, !1, !2, !3}

> !llvm.ident = !{!4}

45,46c34

< !4 = !{i32 7, !"frame-pointer", i32 2}

< !5 = !{!"Debian clang version 14.0.6"}

---

> !4 = !{!"Debian clang version 14.0.6"}

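Which passes does -O2 run?

# Print the passes that -O2 runs

clang++ -emit-llvm -S -O2 -mllvm -print-pipeline-passes test.cpp -o /dev/null 2>&1

# prints a long list of pass names

# Run the passes one at a time and watch what each step changes.
# IR produced at -O0 marks every function optnone and noinline (see the attribute lists above),
# which makes opt skip those functions, so generate the unoptimized input with -O1 and the LLVM passes disabled instead

clang++ -emit-llvm -S -O1 -Xclang -disable-llvm-passes test.cpp -o step0.ll

opt --passes=mem2reg step0.ll -S -o step1.ll

# mem2reg: promotes allocas to SSA registers (removes unnecessary memory operations)

opt --passes=instcombine step1.ll -S -o step2.ll

# instcombine: instruction combining (simplifies redundant instructions)

opt --passes=inline step2.ll -S -o step3.ll

# inline: function inlining (copies the body of add into the call site)

opt --passes=sccp step3.ll -S -o step4.ll

# sccp: sparse conditional constant propagation (3+4 → 7)

# diff each step to see the changes

diff step0.ll step1.ll

diff step1.ll step2.ll

diff step2.ll step3.ll

diff step3.ll step4.ll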

Location in the LLVM source tree:

llvm/lib/Transforms/Scalar/ # scalar optimization passes
llvm/lib/Transforms/IPO/ # interprocedural optimization (inlining etc.)
llvm/lib/Transforms/Vectorize/ # vectorization
llvm/lib/Transforms/InstCombine/ # instruction combining
llvm/lib/Transforms/Utils/ # utility passes (mem2reg etc.)

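Backend code generation

Input: optimized LLVM IR

Output: assembly for the target platform (.s file)

This step translates the IR into target-specific machine instructions.

It is split into several sub-stages:

LLVM IR
  │
  ▼
Instruction selection
  │  IR instructions → instructions of the target machine
  │  add nsw i32 %b, %a → can use the x86 add instruction
  │  or the lea instruction (lea eax, [rdi+rsi])
  │  the compiler picks the cheapest matching instruction
  │
  ▼
Register allocation
  │  the IR has an unlimited supply of virtual registers (%0, %1, %2, ...)
  │  x86-64 has only 16 general-purpose registers
  │  this step maps each virtual register to a physical register
  │  when registers run out, some values are "spilled" to the stack
  │  (the many [rbp-4] accesses in -O0 assembly have a different cause:
  │  at -O0 every variable is an alloca that is never promoted to a register,
  │  which also keeps debugging simple)
  │
  ▼
Instruction scheduling
  │  reorders instructions to make better use of the CPU pipeline
  │  for example, a load is moved earlier so that while the CPU waits for memory
  │  it can execute other instructions that do not depend on that load
  │
  ▼
Assembly output
  writes the .s file (the test_intel.s shown in this article)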

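Looking inside the backend

# Lower the optimized IR to assembly with llc (instruction selection is done here, via SelectionDAG)

clang++ -emit-llvm -S -O2 test.cpp -o test.ll

llc -march=x86-64 test.ll -o test.s

# A more detailed view of the backend

llc -march=x86-64 -print-after-all test.ll -o test.s 2>backend.log

# backend.log contains the output of every backend pass

# Generate assembly for different targets from the same IR

llc -march=x86-64 test.ll -o test_x86.s # x86-64 assembly

llc -march=aarch64 test.ll -o test_arm.s # ARM64 assembly

llc -march=riscv64 test.ll -o test_riscv.s # RISC-V 64 assembly

# test.ll was generated for x86-64 and records x86 CPU features, so llc may warn about features it ignores on the other targets;
# regenerating the IR with clang++ --target=... is the cleaner route

# One IR file, three completely different assembly outputs

# This is the core strength of the LLVM design: front ends and back ends are decoupled through the IR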

Location in the LLVM source tree:

llvm/lib/CodeGen/SelectionDAG/ # SelectionDAG instruction selection
llvm/lib/CodeGen/GlobalISel/ # GlobalISel instruction selection (newer framework)
llvm/lib/CodeGen/RegAllocGreedy.cpp # greedy register allocator
llvm/lib/CodeGen/MachineScheduler.cpp # instruction scheduling
llvm/lib/Target/X86/ # x86 backend
llvm/lib/Target/AArch64/ # ARM64 backend
llvm/lib/Target/RISCV/ # RISC-V backend

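What the compilation stage does not do

Not done by the compilation stage                  Who does it
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Macro expansion, #include expansion                preprocessor (earlier stage)
Encoding assembly text into machine-code bytes     assembler (next stage)
Resolving symbol references across files           linker (later stage)
Deciding the final memory addresses                linker
Building the GOT/PLT                               linker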

Example

Source code (test.cpp)

#include <cstdio>

int add(int a, int b) {
    return a + b;
}

int main() {
    int x = add(3,4);
    printf("res is %d\n", x);
    return 0;
}

Intel-syntax assembly

.text

.intel_syntax noprefix

.file "test.cpp"

.globl _Z3addii # -- Begin function _Z3addii

.p2align 4, 0x90

.type _Z3addii,@function

_Z3addii: # @_Z3addii

.cfi_startproc

# %bb.0:

push rbp

.cfi_def_cfa_offset 16

.cfi_offset rbp, -16

mov rbp, rsp

.cfi_def_cfa_register rbp

mov dword ptr [rbp - 4], edi

mov dword ptr [rbp - 8], esi

mov eax, dword ptr [rbp - 4]

add eax, dword ptr [rbp - 8]

pop rbp

.cfi_def_cfa rsp, 8

ret

.Lfunc_end0:

.size _Z3addii, .Lfunc_end0-_Z3addii

.cfi_endproc

# -- End function

.globl main # -- Begin function main

.p2align 4, 0x90

.type main,@function

main: # @main

.cfi_startproc

# %bb.0:

push rbp

.cfi_def_cfa_offset 16

.cfi_offset rbp, -16

mov rbp, rsp

.cfi_def_cfa_register rbp

sub rsp, 16

mov dword ptr [rbp - 4], 0

mov edi, 3

mov esi, 4

call _Z3addii

mov dword ptr [rbp - 8], eax

mov esi, dword ptr [rbp - 8]

lea rdi, [rip + .L.str]

mov al, 0

call printf@PLT

xor eax, eax

add rsp, 16

pop rbp

.cfi_def_cfa rsp, 8

ret

.Lfunc_end1:

.size main, .Lfunc_end1-main

.cfi_endproc

# -- End function

.type .L.str,@object # @.str

.section .rodata.str1.1,"aMS",@progbits,1

.L.str:

.asciz "res is %d\n"

.size .L.str, 11

.ident "Debian clang version 14.0.6"

.section ".note.GNU-stack","",@progbits

.addrsig

.addrsig_sym _Z3addii

.addrsig_sym printf

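Intel assembly walkthrough

Structure

Assembler directives (lines starting with "."; not CPU instructions but metadata for the assembler)
Assembly code of the add function
Assembly code of the main function
String constant data
Trailing file metadata

`.text`

Declares that what follows belongs to the .text section (the code section)
Corresponds to the .text section of the ELF file

`.intel_syntax noprefix`

Tells the assembler to use Intel syntax, with register names written without the % prefix
This is the result of passing -masm=intel
Without that option the output uses AT&T syntax (the movl %edi, -4(%rbp) style)

`.file "test.cpp"`

Records the source file name. It ends up in the ELF symbol table and, when debug info is generated, in the debug information

`.globl _Z3addii`

Declares _Z3addii to be a global symbol (visible to other object files)
.globl corresponds to the uppercase T in nm output
Without .globl the symbol is local (lowercase t) and other .o files cannot reference it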

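What is `_Z3addii`?

It is the result of C++ name mangling
C++ allows function overloading, so add(int, int) and add(double, double) need distinct symbol names
The scheme (Itanium C++ ABI, used by GCC and Clang on Linux) is:

_Z3addii
│ │   ││
│ │   │└─ second parameter type: i = int
│ │   └── first parameter type: i = int
│ └────── function name "add" (3 is the length of the name)
└──────── _Z is the fixed prefix of a C++ mangled name

# Demangle with c++filt to check:

echo "_Z3addii" | c++filt

# Output: add(int, int)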
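The same check can be done from inside a program. GCC and Clang ship abi::__cxa_demangle in <cxxabi.h>, which is what c++filt-style tools build on; the wrapper name demangle below is a hypothetical helper for this sketch, and the code assumes an Itanium-ABI toolchain (it will not build with MSVC).

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangles an Itanium-ABI symbol name.
// Returns the input unchanged if it is not a valid mangled name.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0) return mangled;  // non-zero status means failure
    assert(raw != nullptr);           // on success a buffer is returned
    std::string result(raw);
    std::free(raw);                   // the buffer was allocated with malloc
    return result;
}
```

For example, demangle("_Z3addii") gives add(int, int), and a double overload mangled as _Z3adddd gives add(double, double), which shows how the parameter types keep overloads apart at the symbol level.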

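For comparison, MSVC mangles the same function as

?add@@YAHHH@Z

`.p2align 4, 0x90`

Aligns the address of the next instruction to a 2⁴ = 16-byte boundary
The gap is filled with 0x90 (the NOP instruction)
Aligning function entries to 16 bytes helps the CPU front end: many x86 cores fetch and decode code in aligned 16-byte blocks, and an aligned entry also makes it less likely that the first instructions straddle a 64-byte cache line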
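The arithmetic behind .p2align is a round-up to a power of two. The sketch below shows it with two hypothetical helpers, align_up_p2 and padding_bytes; the example address 0x1014 is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Rounds addr up to the next multiple of 2^log2_align, which is what
// .p2align does to the assembler's location counter.
std::uint64_t align_up_p2(std::uint64_t addr, unsigned log2_align) {
    assert(log2_align < 64);
    const std::uint64_t mask = (std::uint64_t{1} << log2_align) - 1;
    return (addr + mask) & ~mask;
}

// Number of fill bytes (0x90 in the listing) needed to reach that boundary.
std::uint64_t padding_bytes(std::uint64_t addr, unsigned log2_align) {
    return align_up_p2(addr, log2_align) - addr;
}
```

With .p2align 4, a function that would otherwise start at 0x1014 is moved to 0x1020, with 12 fill bytes in between; an address that is already a multiple of 16 needs no padding. The same rounding explains why stack frames are sized in multiples of 16.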

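`.type _Z3addii,@function`

Tells the assembler that this symbol is a function (not a variable)
The information is stored in the st_info field of the ELF symbol table
1. The FUNC type in readelf -s output comes from here

`_Z3addii: # @_Z3addii`

The function label, marking the entry address of the function
@_Z3addii is a comment giving the name of the symbol in the LLVM IR

`.cfi_startproc`

CFI = Call Frame Information
It uses the DWARF call-frame format and tells debuggers and the exception-handling runtime that the frame description of a new function starts here; on Linux it is emitted into .eh_frame even when no debug info is requested
1. GDB's bt command and C++ exception stack unwinding both rely on CFI
The Windows counterpart is the unwind information in the .pdata and .xdata sections

`# %bb.0:`

A comment saying that this is basic block number 0
%bb.0 is the number of the machine basic block in LLVM's backend
A basic block is a straight-line run of instructions with no jumps in the middle: control enters only at the top and leaves only at the bottom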

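`push rbp`

The first instruction of the add function
Pushes the caller's frame pointer onto the stack
1. rsp automatically decreases by 8 (pointers are 8 bytes on a 64-bit system)

Before:                        After:
rsp → [ return address ]       rsp → [ saved rbp      ]   ← rsp moved down by 8
      [ ...            ]             [ return address ]   ← pushed automatically by call

`.cfi_def_cfa_offset 16`

Tells the unwinder that the CFA (Canonical Frame Address) is now at rsp + 16
1. 8 bytes of rbp were just pushed on top of the 8-byte return address pushed by call, so the CFA is 16 bytes above the current rsp

`.cfi_offset rbp, -16`

Tells the unwinder that the old rbp value is saved at CFA-16
Stack unwinding has to restore rbp for every frame

`mov rbp, rsp`

Sets up the frame pointer of the current function
From here on rbp points at the base of the current stack frame, and locals and saved parameters are addressed relative to rbp

Stack layout afterwards:

rbp,rsp → [ saved rbp      ] [rbp+0]
          [ return address ] [rbp+8]
          [ caller's frame ]

`.cfi_def_cfa_register rbp`

Tells the unwinder that the CFA is now computed from rbp, as rbp + 16
1. rbp stays fixed for the rest of the function, while rsp may still change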

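`mov dword ptr [rbp - 4], edi`

Stores the first parameter a from register edi into the stack slot [rbp-4]
dword ptr says the operand is 4 bytes (32 bits), because int is 4 bytes
edi is the register for the first integer argument in the System V calling convention

`mov dword ptr [rbp - 8], esi`

Stores the second parameter b from esi into [rbp-8]
esi is the register for the second argument
Why copy parameters from registers to the stack?
1. Because this is an -O0 (unoptimized) build
2. At -O0 the compiler keeps every variable in a stack slot, which makes debugging easy
3. When you print a in GDB, GDB reads the value of a from [rbp-4]
4. With optimization on, these two stores disappear and the parameters stay in registers
Note that add never adjusts rsp: it is a leaf function, so on x86-64 System V it can use the 128-byte red zone below rsp for these slots

`mov eax, dword ptr [rbp - 4]`

Loads a back from the stack into eax

`add eax, dword ptr [rbp - 8]`

eax = eax + [rbp-8]
1. That is, eax = a + b
2. The result is in eax
The System V calling convention returns integer values in eax/rax, so after this instruction the return value is already in place

`pop rbp`

Restores the caller's frame pointer
rsp automatically increases by 8

`.cfi_def_cfa rsp, 8`

Tells the unwinder that the frame has been torn down and the CFA is now rsp + 8 (only the return address is left on the stack)

`ret`

Pops the return address from the stack into rip and jumps back to the caller
1. Conceptually a pop rip
Control then continues at the instruction after call _Z3addii in main

`.Lfunc_end0`

.Lfunc_end0 marks the address of the end of the function
The .size directive computes the size of the function: end address minus start address
1. The value is stored in the st_size field of the ELF symbol table
The .L prefix marks a local label, which does not appear in the symbol table

.Lfunc_end0:
.size _Z3addii, .Lfunc_end0-_Z3addii

`.cfi_endproc`

End of the CFI information; pairs with .cfi_startproc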

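The `main` function

The same preamble directives as for add
Note that main is not mangled: it is the program entry point, and the C runtime has to find it under the literal name main
1. The C++ standard gives main special treatment (its linkage is implementation-defined), and Itanium-ABI compilers such as GCC and Clang emit it unmangled

.globl main
.p2align 4, 0x90
.type main,@function
main:
.cfi_startproc

The same stack frame setup as in add

# %bb.0:
push rbp
.cfi_def_cfa_offset 16
.cfi_offset rbp, -16
mov rbp, rsp
.cfi_def_cfa_register rbp

sub rsp, 16

Reserves 16 bytes of stack space for local variables
1. Why 16 and not 8 (two ints need only 8 bytes)?
2. The x86-64 System V ABI requires rsp to be 16-byte aligned at every call instruction
3. After push rbp, rsp is 16-byte aligned again, so the compiler rounds the local area up to a multiple of 16 to keep it aligned for the calls that follow

Stack layout afterwards (higher addresses at the top):

         [ caller's frame ]
rbp+8  → [ return address ]
rbp    → [ saved rbp      ]
rbp-4  → [ (int) 0        ] main's implicit return value (see the next instruction)
rbp-8  → [ (int) x        ] local variable x
rbp-12 → [ unused         ]
rbp-16 → [ unused         ] ← rsp points here

mov dword ptr [rbp - 4], 0
1. No statement in the source corresponds to this line
2. The compiler generates it implicitly for main: the return-value slot is initialized to 0
3. The C++ standard says that main returns 0 if control reaches its end without an explicit return
4. At -O0 the compiler reserves a stack slot for that return value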
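Preparing the call add(3, 4)
1. Under the System V calling convention the first argument goes in edi (3) and the second in esi (4)
2. The Windows x64 calling convention would use ecx and edx instead

mov edi, 3
mov esi, 4

Calling add
1. call does two things: it pushes the address of the next instruction (the return address) onto the stack, then jumps to _Z3addii

call _Z3addii

After add returns, the return value is in eax
1. This line stores the return value (7) in the stack slot of local variable x, [rbp-8]

mov dword ptr [rbp - 8], eax

Preparing the second argument of printf
1. Loads x back from the stack into esi
2. In printf("res is %d\n", x), x is the second argument, so it goes in esi

mov esi, dword ptr [rbp - 8]

Preparing the first argument of printf
1. The address of the format string "res is %d\n"
2. lea is Load Effective Address: it only computes the address and does not access memory

lea rdi, [rip + .L.str]

mov al, 0
1. A special requirement of the System V calling convention for variadic functions
2. printf is variadic (int printf(const char*, ...)); the convention requires al to hold an upper bound on the number of vector (SSE) registers used to pass arguments
3. No floating-point arguments are passed here, so al = 0
4. Windows x64 has no such requirement

call printf@PLT
1. Calls printf through the PLT (Procedure Linkage Table)
2. printf lives in libc.so and is an external, dynamically linked function, so the call goes indirectly through the PLT

xor eax, eax
1. eax = 0
2. This implements return 0
3. xor eax, eax is preferred over mov eax, 0: the encoding is shorter (2 bytes vs 5 bytes), and modern CPUs recognize xor reg, reg as a zeroing idiom that does not depend on the old value of eax
4. It is one of the most classic small compiler optimizations and is applied even at -O0

add rsp, 16
1. Releases the stack space reserved by sub rsp, 16

Restoring the frame pointer and returning
1. Same as the end of the add function

pop rbp
.cfi_def_cfa rsp, 8
ret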

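`[rip + .L.str]`

RIP-relative addressing: the address of the string is computed as the current instruction address (rip) plus the offset to .L.str
This is a key technique for position-independent code (PIC)
Wherever the code is loaded in memory, the distance from rip to .L.str stays the same, so the string address is always found correctly
Windows counterpart: MSVC also uses RIP-relative addressing on x64, and the principle is the same

Comparing `call _Z3addii` (direct call) with `call printf@PLT` (indirect call through the `PLT`)

add is a function you defined in the same translation unit, so the call can be resolved to a direct PC-relative call at link time
printf lives in a shared library and has to be resolved dynamically at run time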

`.type .L.str,@object`

声明 .L.str 是一个数据对象（不是函数）

`.section .rodata.str1.1,"aMS",@progbits,1`

切换到 .rodata.str1.1 节
1. 这是只读数据段中专门存放字符串的子节
参数含义：
1. "aMS"：a = allocatable（加载到内存），M = mergeable（相同字符串可以合并），S = strings（包含以零结尾的字符串）
2. @progbits：节包含程序数据（不是 BSS 那样的空间占位）
3. 1：对齐到 1 字节
M 和 S 标志让链接器可以做字符串合并优化：
1. 如果多个 .o 文件都有 "res is %d\n" 这个字符串，链接器只保留一份

`.L.str:`

The string label and its contents
1. .asciz means an ASCII string terminated by a zero byte (\0)
2. The .L prefix marks a local label

.L.str:
.asciz "res is %d\n"

`.size .L.str, 11`

The string is 11 bytes long: res is %d\n is 10 characters, plus the trailing \0
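The same count can be confirmed from C++ itself, since `sizeof` on a string literal array includes the terminator while `std::strlen` does not. The names `format_string`, `visible_length`, and `check_string_size` are chosen for this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The same literal the compiler placed at .L.str.
const char format_string[] = "res is %d\n";

// sizeof counts the terminating \0, matching `.size .L.str, 11`.
static_assert(sizeof(format_string) == 11, "10 characters plus the terminating zero byte");

// strlen stops at the terminator, so it reports only the 10 visible characters.
std::size_t visible_length() {
    return std::strlen(format_string);
}

// Usage: the escape sequence \n is a single character.
void check_string_size() {
    assert(visible_length() == 10);
    assert(format_string[9] == '\n');
    assert(format_string[10] == '\0');
}
```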

`.ident "Debian clang version 14.0.6"`

Records the compiler version in the ELF .comment section
You can see it with readelf -p .comment program

`.section ".note.GNU-stack","",@progbits`

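Declares that this object file does not need an executable stack
This is a security feature: if every .o file carries this marker, the linker produces an executable with a non-executable stack (NX protection, which stops a stack overflow attack from running shellcode placed on the stack)
If any .o file lacks the marker, a linker such as GNU ld may conservatively make the stack executable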

`.addrsig`

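Address-significance marking (Address Significance Table)
This is optimization information for LLVM's LLD linker: it tells the linker which symbols have had their address "taken" (that is, the code contains an operation such as &func)
If a function's address is never taken, the linker can be more aggressive when performing ICF (Identical Code Folding, which merges functions with identical bodies)
GNU ld ignores these markers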

.addrsig

.addrsig_sym _Z3addii

.addrsig_sym printf
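Why the address matters for ICF can be shown with two functions whose bodies are identical. The sketch below uses invented names (`twice_a`, `twice_b`, `addresses_differ`). C++ requires distinct functions to compare unequal by address, so once a program observes the addresses, folding the two bodies into one would change its behaviour; when no address is taken, nothing can tell the difference.

```cpp
#include <cassert>

// Two functions with identical bodies: natural candidates for folding.
int twice_a(int value) { return value * 2; }
int twice_b(int value) { return value * 2; }

using IntFn = int (*)(int);

// Taking the addresses makes them observable: distinct functions must
// have distinct addresses, so this is expected to return true.
bool addresses_differ() {
    IntFn first = &twice_a;
    IntFn second = &twice_b;
    return first != second;
}

// Usage: same results, but different identities.
void check_identity() {
    assert(twice_a(21) == twice_b(21));
    assert(addresses_differ());
}
```

This assumes a default build; a linker told to fold all identical functions regardless of address significance could make the comparison come out differently.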

Complete execution flow diagram

Before main is called, the C runtime (_start → __libc_start_main) has already set up the stack

main:

push rbp / mov rbp,rsp / sub rsp,16 ← set up the stack frame

mov [rbp-4], 0 ← implicit return value = 0

mov edi, 3 / mov esi, 4 ← prepare the arguments

call _Z3addii ← call add

│

▼

add:

push rbp / mov rbp,rsp ← set up the stack frame

mov [rbp-4], edi ← a=3 stored on the stack

mov [rbp-8], esi ← b=4 stored on the stack

mov eax, [rbp-4] ← eax = 3

add eax, [rbp-8] ← eax = 3+4 = 7

pop rbp / ret ← return, eax=7

│

▼ (back in main, eax=7)

mov [rbp-8], eax ← x = 7

mov esi, [rbp-8] ← 2nd argument of printf = 7

lea rdi, [rip+.L.str] ← 1st argument of printf = "res is %d\n"

mov al, 0 ← 0 floating-point arguments

call printf@PLT ← call printf (through the PLT)

xor eax, eax ← return 0

add rsp,16 / pop rbp / ret ← tear down the stack frame, return

Miscellaneous

`CPU` instruction cache mechanism

`CPU` instruction cache performance

About cache hits


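Notice: this is an original article and the copyright belongs to Aet. You are welcome to share it; please keep the source attribution when reposting!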
