Formal Specification of the Encoding

This section describes how different types of data are ABI-encoded in Ethereum. ABI stands for Application Binary Interface; in Ethereum, it defines how functions in a smart contract are called and how their arguments and return values are serialized for the Ethereum Virtual Machine.

The main rules are:

  1. Static vs Dynamic Types:

    • Static types are types whose size is known at compile-time. For instance, uint256, int8, and address are all static types.

    • Dynamic types are types whose encoded size can vary. Examples include bytes, string, and arrays like uint[]. A fixed-size array or tuple is also dynamic if any of its elements is dynamic.

  2. Encoding of Tuples:

    • Tuples are ordered lists of elements.

    • A tuple's encoding consists of a "head" part followed by a "tail" part. A static element is encoded directly in the head. For a dynamic element, the head holds a 32-byte offset, measured from the start of the tuple's encoding, to the element's actual data in the tail.

  3. Encoding of Arrays:

    • Fixed-size arrays (like uint[5]) are encoded as if they were tuples whose elements all have the same type.

    • Dynamic arrays (like uint[]) are encoded as the number of elements, as a 32-byte word, followed by the elements themselves.

  4. Encoding of bytes and string:

    • Both are considered dynamic types.

    • They're encoded with a prefix indicating their length followed by the actual data.

    • Strings are first converted to utf-8 bytes and then encoded as bytes.

  5. Encoding of Integers and Addresses:

    • These are encoded in big-endian format and padded on the left to 32 bytes. An address is encoded like a uint160.

    • Signed integers use two's complement, so a negative value is padded on the left with 0xff bytes and a non-negative value with 0x00 bytes.

  6. Encoding of Boolean:

    • Booleans (bool) are encoded as uint8 where true is 1 and false is 0.

  7. Encoding of Fixed-Point Numbers:

    • Fixed-point numbers are numbers that have a fixed number of digits after the decimal point. A value X of type fixed<M>xN is multiplied by 10**N and the result is encoded as an int256 (as a uint256 for ufixed<M>xN).

  8. Encoding of bytes<M>:

    • Fixed-length byte arrays are encoded by padding them on the right with zero bytes (if necessary) until they're 32 bytes long.
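
Items 7 and 8 can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the helper names encode_fixed and encode_bytes_m are hypothetical, and the value is passed as a decimal string to avoid floating-point rounding.

```python
from decimal import Decimal, localcontext

WORD = 32  # every ABI slot is 32 bytes wide


def encode_fixed(value: str, n_decimals: int) -> bytes:
    """Sketch for fixed<M>xN: scale by 10**N, then encode as a signed 32-byte word."""
    with localcontext() as ctx:
        ctx.prec = 100  # enough digits for any 256-bit integer
        scaled = Decimal(value) * (Decimal(10) ** n_decimals)
        if scaled != scaled.to_integral_value():
            raise ValueError("value has more than N decimal digits")
    return int(scaled).to_bytes(WORD, "big", signed=True)


def encode_bytes_m(data: bytes) -> bytes:
    """Sketch for bytes<M> (1 <= M <= 32): zero-pad on the right to 32 bytes."""
    if not 1 <= len(data) <= WORD:
        raise ValueError("bytes<M> requires 1 <= M <= 32")
    return data.ljust(WORD, b"\x00")


print(encode_fixed("1.5", 2).hex())                 # 1.5 scaled by 10**2
print(encode_bytes_m(bytes.fromhex("1234")).hex())  # a bytes2 value
```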

The key takeaway is that everything in the Ethereum ABI encoding is laid out in chunks of 32 bytes. A static value shorter than 32 bytes is padded to a full chunk: integers, addresses, and booleans on the left, bytes<M> on the right. A dynamic value is stored with its length up front and its content padded to a multiple of 32 bytes, and an offset pointing to the actual data location is placed where the value would otherwise appear.

Understanding ABI encoding is crucial for those who are dealing with raw Ethereum transactions or those who want a deep understanding of how data is serialized and deserialized on the Ethereum network.

1. Encoding of Integers:

Let's say we want to encode the integer 5 which is of type uint8. In big-endian format and padded to 32 bytes, it would look like:

0000000000000000000000000000000000000000000000000000000000000005
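
The same rule can be sketched in Python with the built-in int.to_bytes. The helper names encode_uint and encode_int are hypothetical; the bits parameter only range-checks the value, since every integer type occupies a full 32-byte word.

```python
WORD = 32  # every ABI slot is 32 bytes wide


def encode_uint(value: int, bits: int = 256) -> bytes:
    """Encode uint<bits>: big-endian, zero-padded on the left to 32 bytes."""
    if not 0 <= value < 2 ** bits:
        raise ValueError(f"value does not fit in uint{bits}")
    return value.to_bytes(WORD, "big")


def encode_int(value: int, bits: int = 256) -> bytes:
    """Encode int<bits>: two's complement, sign-extended to 32 bytes."""
    if not -(2 ** (bits - 1)) <= value < 2 ** (bits - 1):
        raise ValueError(f"value does not fit in int{bits}")
    return value.to_bytes(WORD, "big", signed=True)


print(encode_uint(5, bits=8).hex())   # uint8 value 5
print(encode_int(-1, bits=8).hex())   # int8 value -1, padded with ff bytes
```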

2. Encoding of Boolean:

For a bool, true is encoded as 1 and false as 0. So encoding true would look like:

0000000000000000000000000000000000000000000000000000000000000001
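
As a minimal Python sketch (the helper name encode_bool is hypothetical), a boolean is converted to 0 or 1 and then padded like any other unsigned integer:

```python
def encode_bool(flag: bool) -> bytes:
    """Encode a bool as the integer 0 or 1 in a 32-byte big-endian word."""
    return int(bool(flag)).to_bytes(32, "big")


print(encode_bool(True).hex())
print(encode_bool(False).hex())
```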

3. Encoding of bytes:

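Let's consider a bytes value of 0x1234. Because bytes is a dynamic type, the encoded form is the length in bytes (here 2) as a 32-byte word, followed by the data padded on the right to a multiple of 32 bytes:

0000000000000000000000000000000000000000000000000000000000000002
1234000000000000000000000000000000000000000000000000000000000000

When the value is a function argument or a tuple element, the head additionally holds an offset pointing to this data.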
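
The length-prefix rule for bytes can be sketched in Python as follows. The helper name encode_bytes is hypothetical, and the sketch covers only the value's own encoding, not the offset that an enclosing tuple or argument list would add.

```python
WORD = 32  # every ABI slot is 32 bytes wide


def encode_bytes(data: bytes) -> bytes:
    """Encode dynamic bytes: a length word, then the data right-padded with zeros."""
    length_word = len(data).to_bytes(WORD, "big")
    # Round the data length up to the next multiple of 32 (0 stays 0).
    padded_len = -(-len(data) // WORD) * WORD
    return length_word + data.ljust(padded_len, b"\x00")


encoded = encode_bytes(bytes.fromhex("1234"))
# Print one 32-byte word per line.
for i in range(0, len(encoded), WORD):
    print(encoded[i:i + WORD].hex())
```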

4. Encoding of string:

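Let's say we want to encode the string "eth". First, convert it to utf-8 bytes: 0x657468. Then, encode it as bytes, that is, the length in bytes (here 3) followed by the data padded on the right:

0000000000000000000000000000000000000000000000000000000000000003
6574680000000000000000000000000000000000000000000000000000000000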
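
A short Python sketch of the string rule, with hypothetical helper names. Note that the length word counts utf-8 bytes, not characters, so a non-ASCII character contributes more than 1 to the length.

```python
WORD = 32  # every ABI slot is 32 bytes wide


def encode_bytes(data: bytes) -> bytes:
    """Length word followed by the data, right-padded to a multiple of 32 bytes."""
    padded_len = -(-len(data) // WORD) * WORD
    return len(data).to_bytes(WORD, "big") + data.ljust(padded_len, b"\x00")


def encode_string(text: str) -> bytes:
    """Encode a string as the bytes encoding of its utf-8 representation."""
    return encode_bytes(text.encode("utf-8"))


for sample in ("eth", "é"):
    encoded = encode_string(sample)
    print(sample, [encoded[i:i + WORD].hex() for i in range(0, len(encoded), WORD)])
```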

5. Encoding of Fixed-Size Arrays:

Consider an array of two addresses [0x1234567890123456789012345678901234567890, 0x0987654321098765432109876543210987654321]. Each address is padded on the left to 32 bytes, like a uint160, so encoding this address[2] array would look like:

0000000000000000000000001234567890123456789012345678901234567890
0000000000000000000000000987654321098765432109876543210987654321
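
As a sketch of how an address[2] is laid out, the hypothetical helpers below left-pad each 20-byte address to a 32-byte word, as for uint160, and concatenate the words in order. No length prefix is written because the size is part of the type.

```python
WORD = 32  # every ABI slot is 32 bytes wide


def encode_address(addr: str) -> bytes:
    """Encode a 20-byte address: zero-padded on the left to 32 bytes."""
    raw = bytes.fromhex(addr[2:] if addr.startswith("0x") else addr)
    if len(raw) != 20:
        raise ValueError("an address must be exactly 20 bytes")
    return raw.rjust(WORD, b"\x00")


def encode_address_array(addrs: list, size: int) -> bytes:
    """Encode address[size]: the element words in order, with no length prefix."""
    if len(addrs) != size:
        raise ValueError(f"expected exactly {size} addresses")
    return b"".join(encode_address(a) for a in addrs)


encoded = encode_address_array(
    ["0x1234567890123456789012345678901234567890",
     "0x0987654321098765432109876543210987654321"],
    size=2,
)
for i in range(0, len(encoded), WORD):
    print(encoded[i:i + WORD].hex())
```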

6. Encoding of Dynamic Arrays:

For a uint[] array with values [1, 2], the encoded value will first have the length of the array and then the values:

0000000000000000000000000000000000000000000000000000000000000002
0000000000000000000000000000000000000000000000000000000000000001
0000000000000000000000000000000000000000000000000000000000000002
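
The same layout in a minimal Python sketch, with a hypothetical helper name. It produces only the array's own encoding; as a function argument, the array would additionally be referenced by an offset in the head.

```python
WORD = 32  # every ABI slot is 32 bytes wide


def encode_uint_array(values: list) -> bytes:
    """Encode uint[]: a length word followed by one 32-byte word per element."""
    words = [len(values)] + list(values)
    return b"".join(v.to_bytes(WORD, "big") for v in words)


encoded = encode_uint_array([1, 2])
for i in range(0, len(encoded), WORD):
    print(encoded[i:i + WORD].hex())
```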

7. Encoding of Tuples:

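Consider a tuple (uint256, address). Both elements are static, so the encoding is just the two 32-byte words in order, with the address padded on the left. If we want to encode (3, 0x1234567890123456789012345678901234567890):

0000000000000000000000000000000000000000000000000000000000000003
0000000000000000000000001234567890123456789012345678901234567890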
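
When a tuple contains a dynamic element, the head/tail split comes into play. The following Python sketch encodes a hypothetical tuple (uint256, bytes) with the value (3, 0x1234); the helper names and the (is_dynamic, encoded) pair convention are assumptions made for illustration, not part of any library.

```python
WORD = 32  # every ABI slot is 32 bytes wide


def encode_uint(value: int) -> bytes:
    """Big-endian unsigned integer, zero-padded on the left to 32 bytes."""
    return value.to_bytes(WORD, "big")


def encode_bytes(data: bytes) -> bytes:
    """Length word followed by the data, right-padded to a multiple of 32 bytes."""
    padded_len = -(-len(data) // WORD) * WORD
    return encode_uint(len(data)) + data.ljust(padded_len, b"\x00")


def encode_tuple(parts: list) -> bytes:
    """Encode a tuple from (is_dynamic, encoded_value) pairs using head and tail."""
    # A dynamic element takes one offset word in the head; a static one sits there in full.
    head_size = sum(WORD if dynamic else len(enc) for dynamic, enc in parts)
    head, tail = b"", b""
    for dynamic, enc in parts:
        if dynamic:
            # Offset is counted from the start of the tuple's encoding.
            head += encode_uint(head_size + len(tail))
            tail += enc
        else:
            head += enc
    return head + tail


encoded = encode_tuple([
    (False, encode_uint(3)),                      # uint256: stored in the head
    (True, encode_bytes(bytes.fromhex("1234"))),  # bytes: offset in head, data in tail
])
for i in range(0, len(encoded), WORD):
    print(encoded[i:i + WORD].hex())
```

The output has four words: the value 3, the offset 0x40 (the head is two words long), the length 2, and the padded data.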

These examples should provide a clearer picture of the ABI encoding process for various data types in Ethereum.
