Formal Specification of the Encoding
This section describes how different types of data are ABI-encoded in Ethereum. ABI stands for Application Binary Interface, and in the context of Ethereum, it dictates how functions in a smart contract are called and how data is represented in the Ethereum Virtual Machine.
The main rules are summarized below:
Static vs Dynamic Types:
Static types are types whose encoded size is fixed and known at compile-time. For instance, uint256, int8, and address are all static types.
Dynamic types are types whose size can vary. Examples include bytes, string, and arrays like uint[]. A fixed-size array or tuple is itself dynamic if any of its elements is dynamic.
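The classification can be sketched as a small Python helper. The name is_dynamic is hypothetical, and the sketch assumes only elementary types and arrays (no tuples):

```python
import re

def is_dynamic(abi_type: str) -> bool:
    """Return True if an ABI type string (elementary or array) is dynamic."""
    if abi_type.endswith("[]"):
        return True  # variable-length array: size depends on the data
    fixed_array = re.fullmatch(r"(.+)\[(\d+)\]", abi_type)
    if fixed_array:
        # A fixed-size array is dynamic only if its element type is dynamic.
        return is_dynamic(fixed_array.group(1))
    return abi_type in ("bytes", "string")

for t in ["uint256", "address", "bytes32", "bytes", "string", "uint[]", "string[2]"]:
    print(t, "dynamic" if is_dynamic(t) else "static")
```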
Encoding of Tuples:
Tuples are ordered lists of elements.
A tuple's encoding has two parts, a "head" and a "tail". Each static element is encoded in place in the head. If an element is dynamic, its place in the head contains a 32-byte offset, measured from the start of the tuple's encoding, to the element's actual data in the tail.
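To make the head/tail split concrete, here is a minimal Python sketch for one specific tuple shape, (uint256, bytes). The helper names are hypothetical and the sketch is not a general encoder:

```python
def encode_uint(value: int) -> bytes:
    # Big-endian, left-padded to one 32-byte word.
    return value.to_bytes(32, byteorder="big")

def encode_bytes(data: bytes) -> bytes:
    # Length word, then the data right-padded to a multiple of 32 bytes.
    padded_len = (len(data) + 31) // 32 * 32
    return encode_uint(len(data)) + data.ljust(padded_len, b"\x00")

def encode_uint_bytes_tuple(number: int, data: bytes) -> bytes:
    tail = encode_bytes(data)
    head_size = 2 * 32  # one word for the uint256, one for the offset
    # The dynamic element's head slot holds the offset of its tail data,
    # counted from the start of the tuple encoding.
    head = encode_uint(number) + encode_uint(head_size)
    return head + tail

encoded = encode_uint_bytes_tuple(7, bytes.fromhex("1234"))
for i in range(0, len(encoded), 32):
    print(encoded[i:i + 32].hex())
# Word 0: the value 7; word 1: the offset 0x40;
# word 2: the length 2; word 3: 0x1234 followed by zero padding.
```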
Encoding of Arrays:
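Fixed-size arrays (like uint[5]) are encoded as if they were tuples of their elements.
Dynamic arrays (like uint[]) are prefixed with the length of the array (the number of elements), followed by the elements encoded as for a fixed-size array of that length.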
Encoding of bytes and string:
Both are considered dynamic types.
They're encoded with a prefix indicating their length in bytes, followed by the actual data, padded on the right with zero bytes to a multiple of 32 bytes.
Strings are first converted to UTF-8 bytes and then encoded as bytes.
Encoding of Integers and Addresses:
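These are encoded in big-endian format and padded on the left to 32 bytes. An address is encoded like a uint160.
Signed integers use two's complement, a method for representing negative numbers in binary. Negative values are therefore padded on the left with 0xff bytes, and non-negative values with zero bytes.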
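As an illustration of the two's complement padding, the following Python sketch encodes signed values as 32-byte words. The function name encode_int256 is hypothetical:

```python
def encode_int256(value: int) -> bytes:
    # signed=True makes int.to_bytes emit two's complement,
    # so negative values are sign-extended with 0xff bytes.
    return value.to_bytes(32, byteorder="big", signed=True)

print(encode_int256(1).hex())   # 31 zero bytes, then 01
print(encode_int256(-1).hex())  # 32 bytes of ff
print(encode_int256(-2).hex())  # 31 bytes of ff, then fe
```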
Encoding of Boolean:
Booleans (bool) are encoded as uint8, where true is 1 and false is 0.
Encoding of Fixed-Point Numbers:
Fixed-point numbers are numbers that have a fixed number of digits after the decimal point. The ABI uses a multiplicative factor to treat them as integers for encoding purposes: a value X of type fixed<M>x<N> is encoded as the integer X * 10**N.
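The scaling step can be sketched in Python as follows. The function name encode_fixed and the choice of passing the value as a decimal string are assumptions made for illustration, and the sketch relies on Python's default Decimal precision (28 significant digits):

```python
from decimal import Decimal

def encode_fixed(value: str, decimals: int) -> bytes:
    """Encode a decimal string as fixed<M>x<decimals> (hypothetical helper)."""
    scaled = Decimal(value) * (10 ** decimals)
    if scaled != scaled.to_integral_value():
        raise ValueError("value has more fractional digits than the type allows")
    # The scaled value is then encoded like a signed 256-bit integer.
    return int(scaled).to_bytes(32, byteorder="big", signed=True)

encoded = encode_fixed("1.5", 18)
print(int.from_bytes(encoded, "big"))  # the scaled integer 1.5 * 10**18
```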
Encoding of bytes<M>:
Fixed-length byte arrays are encoded by padding them on the right with zero bytes (if necessary) until they're 32 bytes long.
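For instance, a bytes4 value can be padded like this in Python. The helper name encode_fixed_bytes and the sample value 0xdeadbeef are chosen only for illustration:

```python
def encode_fixed_bytes(data: bytes) -> bytes:
    if not 1 <= len(data) <= 32:
        raise ValueError("bytes<M> requires 1 <= M <= 32")
    # Unlike integers, byte arrays keep their data on the left
    # and are padded with zero bytes on the right.
    return data.ljust(32, b"\x00")

encoded = encode_fixed_bytes(bytes.fromhex("deadbeef"))
print(encoded.hex())  # deadbeef followed by 28 zero bytes
```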
The key takeaway is that everything in the Ethereum ABI encoding is laid out in chunks of 32 bytes. If a static value is smaller than 32 bytes, it's padded to fill one chunk; static values that need more room, such as fixed-size arrays and tuples, span several chunks. For dynamic data types, an offset pointing to the actual data location is stored in place of the value.
Understanding ABI encoding is crucial for those who are dealing with raw Ethereum transactions or those who want a deep understanding of how data is serialized and deserialized on the Ethereum network.
1. Encoding of Integers:
Let's say we want to encode the integer 5, which is of type uint8. In big-endian format and padded on the left to 32 bytes, it is 31 zero bytes followed by the byte 0x05.
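A minimal Python sketch of this encoding, using a hypothetical helper named encode_uint:

```python
def encode_uint(value: int, bits: int = 256) -> bytes:
    if not 0 <= value < 2 ** bits:
        raise ValueError(f"value does not fit in uint{bits}")
    # Every unsigned integer type occupies a full 32-byte word.
    return value.to_bytes(32, byteorder="big")

encoded = encode_uint(5, bits=8)
print(encoded.hex())  # 31 zero bytes, then 05
```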
2. Encoding of Boolean:
For a bool, true is encoded as 1 and false as 0. So encoding true yields 31 zero bytes followed by the byte 0x01.
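In Python this might be written as below, where encode_bool is a hypothetical helper name:

```python
def encode_bool(flag: bool) -> bytes:
    # int(True) == 1 and int(False) == 0; the result is padded like a uint8.
    return int(flag).to_bytes(32, byteorder="big")

print(encode_bool(True).hex())   # 31 zero bytes, then 01
print(encode_bool(False).hex())  # 32 zero bytes
```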
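3. Encoding of bytes:
Let's consider a bytes value of 0x1234. The encoded form is a 32-byte word holding the length, 2, followed by the data 0x1234 padded on the right with zero bytes to 32 bytes.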
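The following Python sketch produces the encoding of the value itself. The helper name encode_bytes is hypothetical; when the value is passed as a function argument, an offset word in the head would additionally point to this data:

```python
def encode_bytes(data: bytes) -> bytes:
    length_word = len(data).to_bytes(32, byteorder="big")
    # Round the data length up to the next multiple of 32 bytes.
    padded_len = (len(data) + 31) // 32 * 32
    return length_word + data.ljust(padded_len, b"\x00")

encoded = encode_bytes(bytes.fromhex("1234"))
print(encoded[:32].hex())  # length word: 2
print(encoded[32:].hex())  # 1234 followed by 30 zero bytes
```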
4. Encoding of string:
Let's say we want to encode the string "eth". First, convert it to UTF-8 bytes: 0x657468. Then, encode it as bytes: a 32-byte word holding the length, 3, followed by 0x657468 padded on the right with zero bytes to 32 bytes.
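A short Python sketch of both steps, with the hypothetical helper name encode_string:

```python
def encode_string(text: str) -> bytes:
    data = text.encode("utf-8")  # step 1: convert to UTF-8 bytes
    # Step 2: encode as bytes (length word, then right-padded data).
    padded_len = (len(data) + 31) // 32 * 32
    return len(data).to_bytes(32, byteorder="big") + data.ljust(padded_len, b"\x00")

encoded = encode_string("eth")
print(encoded[:32].hex())  # length word: 3
print(encoded[32:].hex())  # 657468 followed by 29 zero bytes
```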
5. Encoding of Fixed-Size Arrays:
Consider an array of two addresses [0x1234567890123456789012345678901234567890, 0x0987654321098765432109876543210987654321]. Encoding this address[2] array produces two 32-byte words, one per address, each padded on the left with 12 zero bytes.
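The same result can be reproduced with a small Python sketch. The helper names encode_address and encode_address_array are hypothetical:

```python
def encode_address(address: str) -> bytes:
    hex_part = address[2:] if address.startswith("0x") else address
    raw = bytes.fromhex(hex_part)
    if len(raw) != 20:
        raise ValueError("an address must be exactly 20 bytes")
    # Addresses are encoded like uint160: left-padded to 32 bytes.
    return raw.rjust(32, b"\x00")

def encode_address_array(addresses: list[str]) -> bytes:
    # A fixed-size array of static elements is the elements back to back.
    return b"".join(encode_address(a) for a in addresses)

encoded = encode_address_array([
    "0x1234567890123456789012345678901234567890",
    "0x0987654321098765432109876543210987654321",
])
for i in range(0, len(encoded), 32):
    print(encoded[i:i + 32].hex())
```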
6. Encoding of Dynamic Arrays:
For a uint[] array with values [1, 2], the encoded value will first have the length of the array and then the values: three 32-byte words holding 2, 1, and 2.
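A Python sketch of the array's own encoding follows. The helper name encode_uint_array is hypothetical, and as with bytes, a function argument of this type would also have an offset word in the head:

```python
def encode_uint_array(values: list[int]) -> bytes:
    # First word: number of elements. Then one 32-byte word per element.
    words = [len(values)] + list(values)
    return b"".join(v.to_bytes(32, byteorder="big") for v in words)

encoded = encode_uint_array([1, 2])
for i in range(0, len(encoded), 32):
    print(int.from_bytes(encoded[i:i + 32], "big"))
# Prints the three words as integers: 2, 1, 2
```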
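7. Encoding of Tuples:
Consider a tuple (uint256, address). If we want to encode (3, 0x1234567890123456789012345678901234567890), both elements are static, so the encoding is simply two 32-byte words: the integer 3, followed by the address padded on the left with 12 zero bytes.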
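Because both elements are static, no offsets are needed, and the encoding is a plain concatenation. A minimal Python sketch, where encode_uint_address_tuple is a hypothetical helper for this one tuple shape:

```python
def encode_uint_address_tuple(number: int, address: str) -> bytes:
    hex_part = address[2:] if address.startswith("0x") else address
    raw_address = bytes.fromhex(hex_part)
    if len(raw_address) != 20:
        raise ValueError("an address must be exactly 20 bytes")
    # Static elements are encoded in place, one 32-byte word each.
    return number.to_bytes(32, byteorder="big") + raw_address.rjust(32, b"\x00")

encoded = encode_uint_address_tuple(3, "0x1234567890123456789012345678901234567890")
print(encoded[:32].hex())  # 31 zero bytes, then 03
print(encoded[32:].hex())  # 12 zero bytes, then the 20 address bytes
```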
These examples should provide a clearer picture of the ABI encoding process for various data types in Ethereum.