Serializing Public Keys

enigbe ochekliye
Mar 3, 2022

Serialization refers to the transformation of an object or a data structure into a format that can be transmitted across a network or saved to disk or memory, and later reconstructed (parsed) for use. It is the conversion of an object’s instance into binary or text form for these use cases.

Serialization is important because of the need to communicate different object data structures in efficient ways. In this article I will share some information about how public keys are serialized for transmission and use on the bitcoin network.

However, before serializing data into a binary representation, it is important to know what endianness is and how it affects the process.

Endianness

Endianness is byte order: the way a computer orders the bytes that make up a piece of digital data in memory (MDN, 2021), or the order in which bytes are relayed over a channel. Consider the hexadecimal representation of the 32-bit (4-byte) number 0x0A0B0C0D as shown below.

Figure 1: 32-bit word

To access the data encoded in this 32-bit word, the address where it is stored needs to be known. There are two ways a computer can lay out the bytes of the word across consecutive addresses: starting from the most significant end or from the least significant end. These are known as big-endian and little-endian order, respectively. In the former, as shown in the figure below, the most significant byte (MSB) is stored at the lowest address while the least significant byte (LSB) is stored at the highest address.

Figure 2: Big-endian byte order for a 32-bit word. (Image replicated as shown in Ref: 3)

For little-endian representations, the reverse is the case: the LSB is stored at the lowest address while the MSB is stored at the highest address of the 32-bit word. But why does this matter? Some bitcoin objects (the x- and y-coordinates of public keys) are serialized as big-endian and others (transaction version number, Merkle root, block timestamp) as little-endian. Most common processors work natively in little-endian order, yet the protocol mixes both orders, so the byte order of each object must be taken into consideration when serializing and/or de-serializing it. Reading bytes in the wrong order silently yields a different value.
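As a small illustration of the two byte orders, the sketch below serializes the example word 0x0A0B0C0D both ways using Python's built-in int.to_bytes and int.from_bytes. The variable names are only for illustration.

```python
value = 0x0A0B0C0D  # the 32-bit (4-byte) example word

big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

print(big.hex())     # 0a0b0c0d
print(little.hex())  # 0d0c0b0a

# Parsing must use the same byte order that was used for writing.
assert int.from_bytes(big, "big") == value
assert int.from_bytes(little, "little") == value

# Reading with the wrong order yields a different number, with no error raised.
assert int.from_bytes(little, "big") == 0x0D0C0B0A
```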

Public Keys

Public keys are points P(x,y) on the secp256k1 elliptic curve that bitcoin uses. They have x and y coordinates and are serialized using the Standards for Efficient Cryptography (SEC) format, a known standard for encoding Elliptic Curve Digital Signature Algorithm (ECDSA) public keys. For public keys, there are two SEC formats: uncompressed and compressed.

For the uncompressed format, a serialization of the point P(x,y) is created by doing the following:

  1. Define a 1-byte prefix: 0x04
  2. Add to the prefix the x-coordinate of P(x,y) encoded as a 32-byte big-endian integer
  3. Add to the prefix and x-coordinate the y-coordinate encoded as a 32-byte big-endian integer

Figure 3: 65-byte uncompressed SEC format for public key

Given a simple integer representation of the coordinates of the public key, a function to serialize it into the uncompressed SEC format follows directly from these three steps.

Listing 1: Simple public key class with integer coordinates with a method to generate uncompressed SEC serialization format
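A minimal sketch of such a class might look like the following. The class name PublicKey and the method name sec_uncompressed are assumptions made for illustration, not necessarily the original listing, and the class does not verify that the point actually lies on the curve. The example point is the secp256k1 generator G, which is the public key for a private key of 1.

```python
class PublicKey:
    """Simplified public key: just the integer coordinates of P(x, y).

    Hypothetical illustration class; it does not check that the point
    lies on the secp256k1 curve.
    """

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def sec_uncompressed(self) -> bytes:
        """Return the 65-byte uncompressed SEC serialization."""
        prefix = b"\x04"                   # step 1: 1-byte prefix
        x_bytes = self.x.to_bytes(32, "big")  # step 2: x as 32-byte big-endian
        y_bytes = self.y.to_bytes(32, "big")  # step 3: y as 32-byte big-endian
        return prefix + x_bytes + y_bytes


# Coordinates of the secp256k1 generator point G
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

sec = PublicKey(GX, GY).sec_uncompressed()
print(len(sec))  # 65
```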

Looking at Figure 3, we see that the uncompressed SEC format is 65 bytes long. The need arose to reduce the size of transactions and to conserve disk space on full nodes. Compressed public keys cut the size almost in half, to 33 bytes. This works because, for a given x, the curve equation allows only two possible y values, one even and one odd. It is therefore enough to store the x-coordinate (32 bytes) and a 1-byte prefix that encodes the evenness of the y-coordinate.

The necessary background to understand the underlying mathematics about how compressed public keys are serialized can be found in [2]. The serialization process is:

  1. Define a prefix byte with a value of either 0x02, depicting an even y-coordinate, or 0x03, depicting an odd y-coordinate
  2. Add to the prefix the x-coordinate of P(x,y) encoded as a 32-byte big-endian integer

Figure 4: 33-byte compressed SEC format for public key

Listing 2: Simple public key class with integer coordinates with method to generate uncompressed and compressed SEC serialization format
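One way to sketch a class that supports both formats is shown below. As before, the class and method names are illustrative assumptions rather than the original listing. The constant P is the prime of the secp256k1 field; it is used here only to build a second example point whose y-coordinate is odd.

```python
class PublicKey:
    """Simplified public key holding integer coordinates (illustration only)."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def sec(self, compressed: bool = True) -> bytes:
        """Return the SEC serialization: 33 bytes compressed, 65 uncompressed."""
        x_bytes = self.x.to_bytes(32, "big")
        if compressed:
            # 0x02 marks an even y-coordinate, 0x03 an odd one.
            prefix = b"\x02" if self.y % 2 == 0 else b"\x03"
            return prefix + x_bytes
        return b"\x04" + x_bytes + self.y.to_bytes(32, "big")


# secp256k1 field prime and generator point G
P = 2**256 - 2**32 - 977
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

g = PublicKey(GX, GY)          # GY is even, so the prefix is 0x02
neg_g = PublicKey(GX, P - GY)  # same x, the other (odd) y, so the prefix is 0x03

print(g.sec().hex())
print(neg_g.sec().hex())
```

The two points share the same 32 bytes of x and differ only in the prefix byte, which is what allows a parser to recover the correct y later.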

Base58 Format Addresses

Addresses are alphanumeric strings that users can share to receive bitcoin. An address, in the simplest case, is a string derived from a public key using two hashing algorithms: SHA256 and RACE Integrity Primitives Evaluation Message Digest (RIPEMD160).

Listing 3: SHA256 followed by RIPEMD160 to generate a digital fingerprint
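A minimal sketch of this fingerprint using Python's standard hashlib module is given below. The function name hash160 is a common convention and an assumption here. Note that hashlib.new("ripemd160") depends on the OpenSSL build behind Python; some builds disable RIPEMD160, in which case the call raises a ValueError.

```python
import hashlib


def hash160(data: bytes) -> bytes:
    """Return RIPEMD160(SHA256(data)), a 20-byte fingerprint.

    Raises ValueError if the local OpenSSL build does not provide RIPEMD160.
    """
    sha = hashlib.sha256(data).digest()
    return hashlib.new("ripemd160", sha).digest()
```

Applied to the 33-byte compressed SEC serialization of a public key, the function returns the 20 bytes that are later encoded into an address.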

Despite the byte and space savings of the compressed format, a raw public key is still considered too long and difficult to read. To address length, readability, and security, the need arose for an encoding format that could express more bits per character. Mixed alphanumeric representations with a base greater than 10 can be used to make long numbers more compact.

The hexadecimal (hex) encoding is one such representation. For a 65-byte public key, the hex encoding has twice as many characters as bytes (see Figure 5), since 2 hex characters are needed to encode 1 byte. This makes for a 130-character public key, which is bad for readability. Although the hex representation is shorter than a decimal representation, Base 64 offers even more compactness.

Figure 5: Hexadecimal representation of 1 byte. Here we see 4 bits/character
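The character counts can be checked with a short sketch. The key below is a placeholder of 65 zero bytes standing in for an uncompressed SEC public key; only its length matters for the comparison.

```python
import base64

key = bytes(65)  # placeholder: 65 zero bytes, not a real public key

hex_len = len(key.hex())              # 2 characters per byte
b64_len = len(base64.b64encode(key))  # 4 characters per 3 bytes, with padding

print(hex_len, b64_len)  # 130 88
```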

Base 64 encodes 6 bits/character, so the same 65-byte key needs only about 87 characters (88 with padding). Its character set consists of all numbers (0…9), lowercase letters (a…z), uppercase letters (A…Z), and two symbols (for example - and _ in the URL-safe variant). Base 58 is a subset of Base 64: six characters are removed, namely the lookalikes 0, O, l, and I and the two symbols, leaving 58 characters that each carry slightly under 6 bits. Base 58 is used to encode the addresses to which bitcoin can be sent.
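To make the character set concrete, the sketch below builds the Base 58 alphabet by dropping the four lookalike characters from the 62 alphanumeric characters, and computes how many bits one Base 58 character carries. The ordering (digits, then uppercase, then lowercase) is the one bitcoin uses.

```python
import math
import string

# 62 alphanumeric characters: digits, uppercase, lowercase
base62 = string.digits + string.ascii_uppercase + string.ascii_lowercase

# Drop the lookalikes 0 (zero), O (capital o), I (capital i), l (lowercase L)
BASE58_ALPHABET = "".join(c for c in base62 if c not in "0OIl")

print(BASE58_ALPHABET)
# 123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz
print(len(BASE58_ALPHABET))     # 58
print(round(math.log2(58), 2))  # 5.86 bits per character
```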

Base 58 encoding is great because it offers the following benefits:

  1. Compactness
  2. Easy to read
  3. Error detection

Listing 4 shows how to achieve compactness and better human readability for arbitrary bytes.

Listing 4: Base 58 encoding
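A minimal sketch of a Base 58 encoder follows; the function name encode_base58 is an assumption for illustration. The bytes are treated as one big-endian integer that is repeatedly divided by 58. Leading zero bytes would vanish in that integer, so each one is written explicitly as the first alphabet character, 1.

```python
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def encode_base58(data: bytes) -> str:
    """Encode arbitrary bytes as a Base 58 string."""
    # Count leading zero bytes; each becomes a leading '1' in the output.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))

    # Interpret the bytes as a big-endian integer and convert to base 58.
    num = int.from_bytes(data, "big")
    encoded = ""
    while num > 0:
        num, remainder = divmod(num, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded

    return "1" * leading_zeros + encoded


print(encode_base58(bytes([58])))      # 21
print(encode_base58(b"\x00\x00\x01"))  # 112
```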

For error detection, the Base58Check encoding format is used in bitcoin. It has built-in error checking through a 4-byte checksum that is derived by hashing the data being encoded and is appended to the end of that data. For an address, the data is the hash (SHA256 followed by RIPEMD160) of the SEC public key. The encoding process is as stated below:

  1. A known ‘version byte’ prefix is prepended to the hashed SEC public key. This version prefix could be any one of the following: 0x00 (1) for a bitcoin address, 0x6f (m or n) for a bitcoin testnet address, or the 4-byte prefix 0x0488b21e (xpub) for extended public keys (not discussed in this article). The character in parentheses is how the encoded string begins, which is how the version prefix helps with readability.
  2. The checksum is computed by applying the SHA256 hashing algorithm twice to the result above and extracting the first 4 bytes, which are appended to the end (prefix + hashed SEC public key + checksum). These bytes help with error detection.
  3. The result from 2 above is encoded with the Base 58 function for compactness.

A complete implementation with a simplified public key class is as shown below

Listing 5: Base58Check address format for a simplified public key class
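The sketch below puts the pieces together under a few assumptions: the names PublicKey, sec, address, hash160, encode_base58, and encode_base58check are chosen for illustration, the class does not validate that the point is on the curve, and hash160 needs an OpenSSL build that still provides RIPEMD160 (otherwise hashlib raises a ValueError).

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def encode_base58(data: bytes) -> str:
    """Encode bytes as Base 58, writing each leading zero byte as '1'."""
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    num = int.from_bytes(data, "big")
    encoded = ""
    while num > 0:
        num, remainder = divmod(num, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    return "1" * leading_zeros + encoded


def encode_base58check(payload: bytes) -> str:
    """Append the first 4 bytes of SHA256(SHA256(payload)), then Base 58 encode."""
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    return encode_base58(payload + checksum)


def hash160(data: bytes) -> bytes:
    """RIPEMD160(SHA256(data)); raises ValueError if RIPEMD160 is unavailable."""
    return hashlib.new("ripemd160", hashlib.sha256(data).digest()).digest()


class PublicKey:
    """Simplified public key holding integer coordinates (illustration only)."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def sec(self, compressed: bool = True) -> bytes:
        x_bytes = self.x.to_bytes(32, "big")
        if compressed:
            prefix = b"\x02" if self.y % 2 == 0 else b"\x03"
            return prefix + x_bytes
        return b"\x04" + x_bytes + self.y.to_bytes(32, "big")

    def address(self, compressed: bool = True, testnet: bool = False) -> str:
        # Step 1: version byte + hash160 of the SEC serialization
        version = b"\x6f" if testnet else b"\x00"
        payload = version + hash160(self.sec(compressed))
        # Steps 2 and 3: checksum and Base 58 encoding
        return encode_base58check(payload)


# Coordinates of the secp256k1 generator point G (public key for private key 1)
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8
```

Calling PublicKey(GX, GY).address() on a system where RIPEMD160 is available should give the mainnet address for the compressed form of G, and passing testnet=True switches the version byte to 0x6f.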

Conclusion

This article explains endianness and how it affects the serialization of bitcoin objects, the SEC format for uncompressed and compressed public keys, and the Base58Check encoding of addresses, which improves readability, compactness, and security. I have been learning about bitcoin and found it helpful to conceive of serialization as I have outlined here, with schematic representations of binary objects and code. If you would like a more accurate implementation of a public key, you should get a copy of Programming Bitcoin by Jimmy Song or read it online.

If you found this article helpful or have recommendations to make it better, please do not hesitate to contact me. Happy reading.

References

  1. MDN Contributors (2021, October 8). Endianness. https://developer.mozilla.org/en-US/docs/Glossary/Endianness. Accessed 28 February 2022.
  2. Song, J. (2019). Programming bitcoin: Learn how to program bitcoin from scratch. O’Reilly Media.
  3. Wikipedia (n.d.). Endianness. https://en.wikipedia.org/wiki/Endianness#Overview. Accessed 28 February 2022.
  4. Antonopoulos, A. (2017). Mastering bitcoin: Programming the open blockchain.
