
StudyRecord-DivingIntoThe_EVM_Part1_Introduction_to_the_EVM_assembly_code

Pre:

Diving Into The Ethereum Virtual Machine StudyRecord

Article pre:

Solidity offers many high-level language abstractions, but these features make it hard to understand what’s really going on when my program is running.

Reading the Solidity documentation still left me confused over very basic things.

What are the differences between string, bytes32, byte[], bytes?

  • Which one do I use, when?

  • What’s happening when I cast a string to bytes? Can I cast to byte[]?

  • How much do they cost?

How are mappings stored by the EVM?

  • Why can’t I delete a mapping?

  • Can I have mappings of mappings? (Yes, but how does that work?)

  • Why is there storage mapping, but no memory mapping?

How does a compiled contract look to the EVM?

  • How is a contract created?

  • What is a constructor, really?

  • What is the fallback function?

One Storage Variable:

// c1.sol
pragma solidity ^0.4.11;
contract C {
    uint256 a; // a state variable
    function C() { // the constructor
        a = 1;
    }
}

Compile this contract with solc:

# Use solc-select from the Docker environment
solc-select use 0.4.12
$  solc --bin --asm c1.sol

======= c1.sol:C =======
EVM assembly:
/* "c1.sol":35:102 contract C {... */
mstore(0x40, 0x60)
/* "c1.sol":67:100 function C() {... */
jumpi(tag_1, iszero(callvalue))
0x0
dup1
revert
tag_1:
tag_2:
/* "c1.sol":92:93 1 */
0x1
/* "c1.sol":88:89 a */
0x0
/* "c1.sol":88:93 a = 1 */
dup2
swap1
sstore
pop
/* "c1.sol":67:100 function C() {... */
tag_3:
/* "c1.sol":35:102 contract C {... */
tag_4:
dataSize(sub_0)
dup1
dataOffset(sub_0)
0x0
codecopy
0x0
return
stop

sub_0: assembly {
/* "c1.sol":35:102 contract C {... */
mstore(0x40, 0x60)
tag_1:
0x0
dup1
revert

auxdata: 0xa165627a7a72305820ddb16b0f7c271651b06d31b94a319888a30cd4644e30518f4079851a5ee37a210029
}

Binary:
60606040523415600e57600080fd5b5b60016000819055505b5b60368060266000396000f30060606040525b600080fd00a165627a7a72305820ddb16b0f7c271651b06d31b94a319888a30cd4644e30518f4079851a5ee37a210029
$

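The Binary I compiled differs slightly from the one in the original article (probably because of the compiler version and the metadata hash in auxdata), but that should not matter much.
What the EVM actually runs is the data in Binary.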

In Baby Steps:

Start with the storage variable assignment:

a = 1

This assignment is represented by the bytecode 6001600081905550. One instruction per line:

60 01
60 00
81
90
55
50

The EVM is basically a loop that executes each instruction from top to bottom.

Reformat the contents of tag_2 above, annotating each instruction with its bytecode:

tag_2:
// 60 01
0x1
// 60 00
0x0
// 81
dup2
// 90
swap1
// 55
sstore
// 50
pop

In the assembly, 0x1 is actually shorthand for push(0x1). This instruction pushes the number 1 onto the stack.
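To split a hex string such as 6001600081905550 into instructions without doing it by hand, a small disassembler is enough. The following is a minimal Python sketch rather than a real tool: the function name `disassemble` and the partial `OPCODES` table are my own choices for illustration, and the table covers only the opcodes that appear in these notes.

```python
# Partial opcode table (an assumption for illustration): only the
# instructions that show up in these notes are listed.
OPCODES = {
    0x00: "stop", 0x01: "add", 0x02: "mul", 0x03: "sub", 0x0a: "exp",
    0x16: "and", 0x17: "or", 0x19: "not",
    0x50: "pop", 0x54: "sload", 0x55: "sstore",
}


def _hex(chunk):
    """Format bytes as space-separated hex pairs, e.g. b'\\x60\\x01' -> '60 01'."""
    return " ".join(f"{b:02x}" for b in chunk)


def disassemble(hex_code):
    """Return one (raw_bytes, mnemonic) pair per instruction."""
    code = bytes.fromhex(hex_code)
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7f:
            # PUSH1..PUSH32 are followed by 1..32 immediate bytes.
            n = op - 0x5f
            data = code[pc + 1:pc + 1 + n]
            out.append((_hex(code[pc:pc + 1 + n]), f"push{n} 0x{data.hex()}"))
            pc += 1 + n
            continue
        if 0x80 <= op <= 0x8f:
            name = f"dup{op - 0x7f}"       # DUP1..DUP16
        elif 0x90 <= op <= 0x9f:
            name = f"swap{op - 0x8f}"      # SWAP1..SWAP16
        else:
            name = OPCODES.get(op, f"unknown_0x{op:02x}")
        out.append((f"{op:02x}", name))
        pc += 1
    return out


for raw, name in disassemble("6001600081905550"):
    print(f"{raw:<8} {name}")
```

The only special case is the push family: the opcode byte itself tells how many of the following bytes are data rather than instructions, which is why the loop skips ahead by `1 + n`.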

Simulating The EVM:

The EVM is a stack machine.
Instructions might use values on the stack as arguments, and push values onto the stack as results. Let’s consider the operation add.

Suppose there are two values on the stack:

[1 2]

When the EVM sees add, it adds the top 2 items together, and pushes the answer back onto the stack, resulting in:

[3]

Notation:

In what follows, we’ll notate the stack with []:

// The empty stack
stack: []
// Stack with three items. The top item is 3. The bottom item is 1.
stack: [3 2 1]

And notate the contract storage with {}:

// Nothing in storage.
store: {}
// The value 0x1 is stored at the position 0x0.
store: { 0x0 => 0x1 }

Simulation:

Let’s now look at some real bytecode. We’ll simulate the bytecode sequence 6001600081905550 (a = 1) as the EVM would, and print out the machine state after each instruction:

// 60 01: pushes 1 onto stack
0x1
stack: [0x1]
// 60 00: pushes 0 onto stack
0x0
stack: [0x0 0x1]
// 81: duplicate the second item on the stack
dup2
stack: [0x1 0x0 0x1]
// 90: swap the top two items
swap1
stack: [0x0 0x1 0x1]
// 55: store the value 0x1 at position 0x0
// This instruction consumes the top 2 items
sstore
stack: [0x1]
store: { 0x0 => 0x1 }
// 50: pop (throw away the top item)
pop
stack: []
store: { 0x0 => 0x1 }

The end. The stack is empty, and there’s one item in storage.

What’s worth noting is that Solidity had decided to store the state variable uint256 a at the position 0x0.
It’s perfectly possible for other languages to choose to store the state variable elsewhere.
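The hand trace above can also be reproduced mechanically. Below is a minimal Python sketch of a stack machine that understands only the handful of opcodes used in these notes; the function name `run` and the trace format are my own assumptions, and it ignores gas, memory, and everything else a real EVM does. The stack is kept as a list whose first element is the top, matching the [] notation used here.

```python
MASK256 = (1 << 256) - 1  # EVM words are 256 bits wide


def run(hex_code, storage=None):
    """Execute a tiny opcode subset; return (stack, storage, trace)."""
    code = bytes.fromhex(hex_code)
    stack = []                      # stack[0] is the top of the stack
    storage = dict(storage or {})   # slot -> value, missing slots read as 0
    trace = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if 0x60 <= op <= 0x7f:      # PUSH1..PUSH32: push the immediate bytes
            n = op - 0x5f
            stack.insert(0, int.from_bytes(code[pc:pc + n], "big"))
            pc += n
        elif 0x80 <= op <= 0x8f:    # DUPn: copy the n-th item to the top
            stack.insert(0, stack[op - 0x80])
        elif 0x90 <= op <= 0x9f:    # SWAPn: swap the top with the (n+1)-th item
            n = op - 0x8f
            stack[0], stack[n] = stack[n], stack[0]
        elif op == 0x01:            # ADD: pop two items, push their sum
            a, b = stack.pop(0), stack.pop(0)
            stack.insert(0, (a + b) & MASK256)
        elif op == 0x50:            # POP: discard the top item
            stack.pop(0)
        elif op == 0x54:            # SLOAD: replace the key with the stored value
            key = stack.pop(0)
            stack.insert(0, storage.get(key, 0))
        elif op == 0x55:            # SSTORE: top is the position, next is the value
            key, value = stack.pop(0), stack.pop(0)
            storage[key] = value
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
        trace.append((f"0x{op:02x}", list(stack), dict(storage)))
    return stack, storage, trace


stack, storage, trace = run("6001600081905550")
for opcode, st, store in trace:
    print(opcode, "stack:", [hex(v) for v in st],
          "store:", {hex(k): hex(v) for k, v in store.items()})
```

Feeding it a different hex string, for example 6001600055 (push 1, push 0, sstore), is a quick way to check by machine whether two instruction sequences leave the same stack and storage.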

Equivalent representation:

In pseudocode, what the EVM does for 6001600081905550 is essentially:

// a = 1
sstore(0x0, 0x1)

Here dup2, swap1, and pop are redundant; the assembly could be simpler:

0x1
0x0
sstore

You could try to simulate the above 3 instructions, and satisfy yourself that they indeed result in the same machine state:

How do I simulate the instructions myself? By tracing them through by hand?

stack: []
store: { 0x0 => 0x1 }

Two Storage Variables:

Add one extra storage variable:

// c2.sol
pragma solidity ^0.4.11;
contract C {
    uint256 a;
    uint256 b;
    function C() {
        a = 1;
        b = 2;
    }
}

Compile, focusing on tag_2:

Compile it myself again:

$ solc --bin --asm c2.sol

======= c2.sol:C =======
EVM assembly:
/* "c2.sol":36:136 contract C {... */
mstore(0x40, 0x60)
/* "c2.sol":84:134 function C() {... */
jumpi(tag_1, iszero(callvalue))
0x0
dup1
revert
tag_1:
tag_2:
/* "c2.sol":111:112 1 */
0x1
/* "c2.sol":107:108 a */
0x0
/* "c2.sol":107:112 a = 1 */
dup2
swap1
sstore
pop
/* "c2.sol":126:127 2 */
0x2
/* "c2.sol":122:123 b */
0x1
/* "c2.sol":122:127 b = 2 */
dup2
swap1
sstore
pop
/* "c2.sol":84:134 function C() {... */
tag_3:
/* "c2.sol":36:136 contract C {... */
tag_4:
dataSize(sub_0)
dup1
dataOffset(sub_0)
0x0
codecopy
0x0
return
stop

sub_0: assembly {
/* "c2.sol":36:136 contract C {... */
mstore(0x40, 0x60)
tag_1:
0x0
dup1
revert

auxdata: 0xa165627a7a7230582070e0c9efb38b3859709be0dc598f0f59277f9499ee8c60ab2e0043222726f9a60029
}
Binary:
60606040523415600e57600080fd5b5b600160008190555060026001819055505b5b603680602e6000396000f30060606040525b600080fd00a165627a7a7230582070e0c9efb38b3859709be0dc598f0f59277f9499ee8c60ab2e0043222726f9a60029

The assembly in pseudocode:

// a = 1
sstore(0x0, 0x1)
// b = 2
sstore(0x1, 0x2)

What we learn here is that the two storage variables are positioned one after the other, with a in position 0x0 and b in position 0x1.

Storage Packing:

Each storage slot can store 32 bytes. It’d be wasteful to use all 32 bytes if a variable only needs 16 bytes.
Solidity optimizes for storage efficiency by packing two smaller data types into one storage slot if possible.
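The packing rule can be sketched as a small slot allocator. This is a simplified Python model under stated assumptions: it handles only fixed-size value types declared in order, the function name `layout` is hypothetical, and structs, arrays, and mappings (which have their own rules) are ignored. A variable goes into the current slot if it still fits in the remaining bytes; otherwise it starts a new slot.

```python
SLOT_BYTES = 32  # each storage slot holds 32 bytes


def layout(variables):
    """Map each (name, size_in_bytes) to (slot, byte_offset), in declaration order.

    Offsets count from the low-order end of the slot, so the first
    variable packed into a slot sits at offset 0.
    """
    slot, used = 0, 0
    result = {}
    for name, size in variables:
        if used + size > SLOT_BYTES:   # does not fit: move to a fresh slot
            slot, used = slot + 1, 0
        result[name] = (slot, used)
        used += size
    return result


# Two 32-byte variables need one slot each.
print(layout([("a", 32), ("b", 32)]))
# Two 16-byte variables share slot 0.
print(layout([("a", 16), ("b", 16)]))
```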

Let’s change a and b so they are only 16 bytes each:

pragma solidity ^0.4.11;
contract C {
    uint128 a;
    uint128 b;
    function C() {
        a = 1;
        b = 2;
    }
}

Compile:

$ solc --bin --asm c3.sol

======= c3.sol:C =======
EVM assembly:
/* "c3.sol":26:126 contract C {... */
mstore(0x40, 0x60)
/* "c3.sol":74:124 function C() {... */
jumpi(tag_1, iszero(callvalue))
0x0
dup1
revert
tag_1:
tag_2:
/* "c3.sol":101:102 1 */
0x1
/* "c3.sol":97:98 a */
0x0
dup1
/* "c3.sol":97:102 a = 1 */
0x100
exp
dup2
sload
dup2
0xffffffffffffffffffffffffffffffff
mul
not
and
swap1
dup4
0xffffffffffffffffffffffffffffffff
and
mul
or
swap1
sstore
pop
/* "c3.sol":116:117 2 */
0x2
/* "c3.sol":112:113 b */
0x0
0x10
/* "c3.sol":112:117 b = 2 */
0x100
exp
dup2
sload
dup2
0xffffffffffffffffffffffffffffffff
mul
not
and
swap1
dup4
0xffffffffffffffffffffffffffffffff
and
mul
or
swap1
sstore
pop
/* "c3.sol":74:124 function C() {... */
tag_3:
/* "c3.sol":26:126 contract C {... */
tag_4:
dataSize(sub_0)
dup1
dataOffset(sub_0)
0x0
codecopy
0x0
return
stop

sub_0: assembly {
/* "c3.sol":26:126 contract C {... */
mstore(0x40, 0x60)
tag_1:
0x0
dup1
revert

auxdata: 0xa165627a7a723058205c61ec381fa987a743b915be31883e35d32dccd6b264c9408a6563273d683e5d0029
}
Binary:
60606040523415600e57600080fd5b5b60016000806101000a8154816fffffffffffffffffffffffffffffffff02191690836fffffffffffffffffffffffffffffffff1602179055506002600060106101000a8154816fffffffffffffffffffffffffffffffff02191690836fffffffffffffffffffffffffffffffff1602179055505b5b60368060916000396000f30060606040525b600080fd00a165627a7a723058205c61ec381fa987a743b915be31883e35d32dccd6b264c9408a6563273d683e5d0029

The above assembly code packs these two variables together in one storage position (0x0), like this:

[         b         ][         a         ]
[16 bytes / 128 bits][16 bytes / 128 bits]
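Each assignment in the unoptimized assembly is a read-modify-write of the whole slot: load the old value, clear the 16 bytes that belong to the variable, then OR in the new value shifted to its position. Here is a Python sketch of that arithmetic; the function name `write_packed` and its parameters are mine, and the comments point at the assembly instructions each step mirrors.

```python
MASK256 = (1 << 256) - 1  # keep results within one 32-byte word


def write_packed(old_slot, value, byte_offset, size_bytes):
    """Return the slot value after writing `value` into its field.

    byte_offset is the field's distance from the low-order end of the slot
    (0x0 for a, 0x10 for b in the assembly above).
    """
    field_mask = (1 << (8 * size_bytes)) - 1     # 0xffff...ff (16 bytes for uint128)
    shift = 0x100 ** byte_offset                 # `0x100 exp`
    # sload, mul, not, and: clear the field in the old slot value
    cleared = old_slot & (~(field_mask * shift) & MASK256)
    # and, mul, or: truncate the new value, move it into place, merge
    return cleared | ((value & field_mask) * shift)


slot = 0
slot = write_packed(slot, 1, 0x00, 16)   # a = 1
slot = write_packed(slot, 2, 0x10, 16)   # b = 2
print(f"{slot:064x}")
```

Multiplying by a power of 0x100 is the same as shifting left by whole bytes, which is how the compiler output of this version positions a field without a shift instruction.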

Packing matters because storage operations are by far the most expensive:

  • sstore costs 20000 gas for first write to a new position.

  • sstore costs 5000 gas for subsequent writes to an existing position.

  • sload costs 500 gas.

  • Most instructions cost 3 to 10 gas.

By using the same storage position, Solidity pays 5000 gas to store the second variable instead of 20000, saving us 15000 gas.
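The saving is simple arithmetic over the sstore figures listed above. As a rough Python sketch (the function name `sstore_gas` is hypothetical, the two constants are the figures quoted in these notes, and it assumes every slot starts out empty):

```python
SSTORE_NEW = 20000       # first write to a new position
SSTORE_EXISTING = 5000   # subsequent write to an existing position


def sstore_gas(slots_written):
    """Total sstore cost for a sequence of writes, given the slot each one targets."""
    touched = set()
    total = 0
    for slot in slots_written:
        total += SSTORE_EXISTING if slot in touched else SSTORE_NEW
        touched.add(slot)
    return total


separate = sstore_gas([0, 1])   # a and b in different slots
packed = sstore_gas([0, 0])     # a and b packed into slot 0
print(separate, packed, separate - packed)
```

This counts only the sstore instructions; the extra sload and masking arithmetic that packing introduces adds a comparatively small amount on top.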

More Optimization:

Instead of storing a and b with two separate sstore instructions, it should be possible to pack the two 128-bit numbers together in memory, then store them using just one sstore, saving an additional 5000 gas.

You can ask Solidity to make this optimization by turning on the optimize flag:

$ solc --bin --asm --optimize c3.sol

This produces assembly code that uses just one sload and one sstore:

tag_2:
/* "c3.sol":97:98 a */
0x0
/* "c3.sol":97:102 a = 1 */
dup1
sload
/* "c3.sol":112:117 b = 2 */
0x200000000000000000000000000000000
not(sub(exp(0x2, 0x80), 0x1))
/* "c3.sol":97:102 a = 1 */
swap1
swap2
and
/* "c3.sol":101:102 1 */
0x1
/* "c3.sol":97:102 a = 1 */
or
sub(exp(0x2, 0x80), 0x1)
/* "c3.sol":112:117 b = 2 */
and
or
swap1
sstore
/* "c3.sol":74:124 function C() {... */

...
Binary:
60606040523415600e57600080fd5b5b600080547002000000000000000000000000000000006001608060020a03199091166001176001608060020a03161790555b5b603680604f6000396000f30060606040525b600080fd00a165627a7a7230582037e97488c5ea276798eb3a3e48ab39f5d02bb4e2936813cde904a02bbd43abb10029

With the optimize flag, the assembly is considerably shorter.

The bytecode is:

600080547002000000000000000000000000000000006001608060020a03199091166001176001608060020a0316179055

And formatting the bytecode to one instruction per line:

TODO: Practice this step more, or look for other tools that help with reading the assembly.

// push 0x0
60 00
// dup1
80
// sload
54
// push17: push the next 17 bytes as a 32-byte number
70 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
/* not(sub(exp(0x2, 0x80), 0x1)) */
// push 0x1
60 01
// push 0x80 (128)
60 80
// push 0x02 (2)
60 02
// exp
0a
// sub
03
// not
19
// swap1
90
// swap2
91
// and
16
// push 0x1
60 01
// or
17
/* sub(exp(0x2, 0x80), 0x1) */
// push 0x1
60 01
// push 0x80
60 80
// push 0x02
60 02
// exp
0a
// sub
03
// and
16
// or
17
// swap1
90
// sstore
55

There are four magic values used in the assembly code:

  • 0x1 (16 bytes), using the lower 16 bytes

// Represented as 0x01 in bytecode
16:32 0x00000000000000000000000000000000
00:16 0x00000000000000000000000000000001

  • 0x2 (16 bytes), using the higher 16 bytes

// Represented as 0x200000000000000000000000000000000 in bytecode
16:32 0x00000000000000000000000000000002
00:16 0x00000000000000000000000000000000

  • not(sub(exp(0x2, 0x80), 0x1))

// Bitmask for the upper 16 bytes
16:32 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
00:16 0x00000000000000000000000000000000

  • sub(exp(0x2, 0x80), 0x1)

// Bitmask for the lower 16 bytes
16:32 0x00000000000000000000000000000000
00:16 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

The code does some bit-shuffling with these values to arrive at the desired result:

16:32 0x00000000000000000000000000000002 
00:16 0x00000000000000000000000000000001

Finally, this 32-byte value is stored at position 0x0.

TODO: This subsection needs a second read.
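The bit-shuffling is easier to follow when written out as plain integer arithmetic. The Python sketch below is my own reading of the optimized instruction sequence, traced by hand from the bytecode: the loaded slot is ANDed with the upper mask, ORed with 0x1, ANDed with the lower mask, and finally ORed with the shifted 0x2. The names `LO`, `HI`, `A`, `B`, and `optimized_store` are made up for illustration.

```python
MASK256 = (1 << 256) - 1

LO = 2 ** 0x80 - 1     # sub(exp(0x2, 0x80), 0x1): lower 16 bytes set
HI = ~LO & MASK256     # not(sub(exp(0x2, 0x80), 0x1)): upper 16 bytes set
A = 0x1                # value of a, sits in the lower 16 bytes
B = 0x2 << 128         # value of b, pre-shifted into the upper 16 bytes (the push17 constant)


def optimized_store(old_slot):
    """Value written by the single sstore, given the value read by the single sload."""
    return (((old_slot & HI) | A) & LO) | B


new_slot = optimized_store(0)
print("16:32", f"0x{new_slot >> 128:032x}")
print("00:16", f"0x{new_slot & LO:032x}")
```

Because both halves of the slot are overwritten, the result does not depend on what was in the slot before, which matches the final layout shown above.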

Gas Usage:

600080547002000000000000000000000000000000006001608060020a03199091166001176001608060020a0316179055

This is the Binary data for the tag_2 section above; the operations in tag_2 are the assignments uint128 a = 1 and uint128 b = 2.

Summary:

Refs: