Multihash

Self-describing hashes

Multihash is a protocol for differentiating outputs from various well-established hash functions, addressing size and encoding considerations. It is useful for writing applications that future-proof their use of hashes and allow multiple hash functions to coexist.

Safer, easier cryptographic hash function upgrades
The Multihash Format
Implementations
Examples
F.A.Q.
About

Safer, easier cryptographic hash function upgrades

Multihash is particularly important in systems which depend on cryptographically secure hash functions. Attacks may break the cryptographic properties of secure hash functions. These cryptographic breaks are particularly painful in large tool ecosystems, where tools may have made assumptions about hash values, such as function and digest size. Upgrading becomes a nightmare, as all tools which make those assumptions would have to be upgraded to use the new hash function and new hash digest length. Tools may face serious interoperability problems or error-prone special casing.

How many programs out there assume a git hash is a sha1 hash?

How many scripts assume the hash value digest is exactly 160 bits?

How many tools will break when these values change?

How many programs will fail silently when these values change?

This is precisely where Multihash shines. It was designed for upgrading.

When using Multihash, a system warns the consumers of its hash values that these may have to be upgraded in case of a break. Even though the system may still only use a single hash function at a time, the use of multihash makes it clear to applications that hash values may use different hash functions or be longer in the future. Tooling, applications, and scripts can avoid making assumptions about the length, and read it from the multihash value instead. This way, the vast majority of tooling – which may not do any checking of hashes – would not have to be upgraded at all. This vastly simplifies the upgrade process, avoiding the waste of hundreds or thousands of software engineering hours, deep frustrations, and high blood pressure.

The Multihash Format

A multihash follows the TLV (type-length-value) pattern.

the type <hash-func-type> is an unsigned variable integer (varint) identifying the hash function. The default table is the multicodec table, and it is configurable.
the length <digest-length> is an unsigned variable integer (varint) counting the length of the digest, in bytes.
the value <digest-value> is the hash function digest, with a length of exactly <digest-length> bytes.

Reading a multihash from left to right:

- unsigned varint code of the hash function being used
- unsigned varint digest length, in bytes
- hash function output value, with length matching the prefixed length value

For example:
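As a minimal sketch in Python (not taken from any of the listed implementations), the layout can be built and parsed by hand. It assumes the sha2-256 code `0x12` from the multicodec table, and it only handles codes and lengths below 128, where an unsigned varint is a single byte. The function names `encode_multihash` and `decode_multihash` are illustrative.

```python
import hashlib

SHA2_256 = 0x12  # code for sha2-256 in the multicodec table


def encode_multihash(code: int, digest: bytes) -> bytes:
    """Prefix a digest with its hash function code and its length."""
    if code >= 0x80 or len(digest) >= 0x80:
        # Larger values need a multi-byte varint, which this sketch omits.
        raise ValueError("only single-byte varints are handled here")
    return bytes([code, len(digest)]) + digest


def decode_multihash(mh: bytes):
    """Split a multihash into (code, digest), checking the declared length."""
    if len(mh) < 2 or mh[0] >= 0x80 or mh[1] >= 0x80:
        raise ValueError("only single-byte varints are handled here")
    code, length, digest = mh[0], mh[1], mh[2:]
    if len(digest) != length:
        raise ValueError("digest length does not match the length prefix")
    return code, digest


digest = hashlib.sha256(b"multihash").digest()
mh = encode_multihash(SHA2_256, digest)
print(mh.hex()[:4])  # "1220": code 0x12, then length 0x20 (32 bytes)
print(decode_multihash(mh) == (SHA2_256, digest))  # True
```

The two prefix bytes are all a consumer needs to learn which function produced the digest and how many bytes to read.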

Implementations

These implementations are available:

go-multihash
java-multihash
js-multihash
clj-multihash
rust-multihash
- by @dignifiedquire
- by @google
haskell-multihash
py-multihash
elixir-multihash, elixir-multihashing
swift-multihash
ruby-multihash
MultiHash.Net
cs-multihash
scala-multihash
php-multihash
net-ipfs-core
erlang-multihash
(add yours here)

Examples

The following multihash examples are different hash function outputs of the same exact input:

Merkle–Damgård

The multihash examples are chosen to show different hash functions and different hash digest lengths at play.

sha1 - 160 bits

sha2-256 - 256 bits (aka sha256)

sha2-512 - 256 bits

Note: this is the actual SHA-512 (as per code 0x13) truncated to 256 bits; some libraries support a hash called SHA-512/256, which has the same 256-bit length but uses a different initialization vector (as defined in FIPS 180-4).

sha2-512 - 512 bits (aka sha512)

blake2b-512 - 512 bits

blake2b-256 - 256 bits

blake2s-256 - 256 bits

blake2s-128 - 128 bits
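The values for the examples above could be regenerated with a short script along these lines. This is a sketch under stated assumptions: the input is taken to be the UTF-8 encoding of the string, the function codes are the ones believed to be in the multicodec table (`0x11`, `0x12`, `0x13`, and `0xb2xx` for the BLAKE2 variants), and the helper names are illustrative.

```python
import hashlib

# "Merkle–Damgård", assumed to be hashed as UTF-8 bytes
INPUT = "Merkle\u2013Damg\u00e5rd".encode("utf-8")


def encode_varint(n: int) -> bytes:
    """Unsigned varint: 7 bits per byte, high bit set when more bytes follow."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def multihash(code: int, digest: bytes) -> bytes:
    return encode_varint(code) + encode_varint(len(digest)) + digest


examples = {
    "sha1": multihash(0x11, hashlib.sha1(INPUT).digest()),
    "sha2-256": multihash(0x12, hashlib.sha256(INPUT).digest()),
    # plain SHA-512 truncated to 32 bytes, not SHA-512/256
    "sha2-512 (256 bits)": multihash(0x13, hashlib.sha512(INPUT).digest()[:32]),
    "sha2-512": multihash(0x13, hashlib.sha512(INPUT).digest()),
    "blake2b-512": multihash(0xB240, hashlib.blake2b(INPUT, digest_size=64).digest()),
    "blake2b-256": multihash(0xB220, hashlib.blake2b(INPUT, digest_size=32).digest()),
    "blake2s-256": multihash(0xB260, hashlib.blake2s(INPUT, digest_size=32).digest()),
    "blake2s-128": multihash(0xB250, hashlib.blake2s(INPUT, digest_size=16).digest()),
}

for name, mh in examples.items():
    print(f"{name}: {mh.hex()}")
```

Note that the two sha2-512 entries share the code `0x13` and differ only in the length prefix, while each BLAKE2 digest size has its own code.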

F.A.Q.

Q: Why have digest length as a separate number?

Because combining the hash function code and the hash digest length would turn a function code into a “function-and-digest-size-code”. That makes custom digest sizes annoying to use and much less flexible: we would need hundreds of codes for all the combinations people would want to use.

Q: Why varints (variable integers)?

So that we have no limitation on functions or lengths.

Q: What kind of varints?

A Most Significant Bit unsigned varint, as defined by the multiformats/unsigned-varint doc.
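A rough Python sketch of this encoding follows: each byte carries 7 bits of the number, least significant group first, and the most significant bit of a byte is set when another byte follows. The function names are illustrative, and a production decoder would also want to bound how many bytes it reads.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    if n < 0:
        raise ValueError("unsigned varints cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F          # low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0):
    """Return (value, next_position) for the varint starting at buf[pos]."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("buffer ended in the middle of a varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


print(encode_varint(0x12).hex())    # 12
print(encode_varint(300).hex())     # ac02
print(encode_varint(0xB240).hex())  # c0e402
print(decode_varint(bytes.fromhex("c0e40240")))  # (45632, 3)
```

Small codes such as `0x12` stay a single byte, while larger codes simply take more bytes.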

Q: Don’t we have to agree on a table of functions?

Yes, but we already have to agree on functions, so this is not hard. The table even leaves some room for custom function codes.

Q: Why not use "sha256:<digest>"?

For three reasons:

(1) Multihash and all other multiformats endeavor to make the values be “in-band” and to be treated as the original value. The construction <string-prefix>:<hex-digest> is human readable and tuned for some outputs. Hashes are stored compactly in their binary representation. Forcing applications to always convert is cumbersome (split on :, turn the right hand side into binary, remove the :, concat).
(2) Multihash and all other multiformats endeavor to be as compact as possible, which means a binary packed representation will help save a lot of space in systems that use millions or billions of hashes. For example, a 100 TB file in IPFS may have as many as 400 million subobjects, which would mean 400 million hashes. The string prefix "sha256:" takes 7 bytes per hash, against 2 bytes for the binary multihash prefix:
```
400,000,000 hashes * (7 - 2) bytes = 2 GB
```
(3) The length is extremely useful when hashes are truncated. This is a type of choice that should be expressed in-band. It is also useful when hashes are concatenated or kept in lists, and when scanning a stream quickly.
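To illustrate the point about concatenated hashes, here is a small Python sketch that walks a buffer of back-to-back multihashes using only the two prefixes, without knowing anything about the hash functions involved. The names `read_varint` and `iter_multihashes` are illustrative, and the codes `0x11`, `0x12`, and `0x13` are assumed from the multicodec table.

```python
import hashlib


def read_varint(buf: bytes, pos: int):
    """Return (value, next_position) for the unsigned varint at buf[pos]."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("buffer ended in the middle of a varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def iter_multihashes(stream: bytes):
    """Yield (code, digest) for each multihash packed back to back in stream."""
    pos = 0
    while pos < len(stream):
        code, pos = read_varint(stream, pos)
        length, pos = read_varint(stream, pos)
        end = pos + length
        if end > len(stream):
            raise ValueError("stream ended in the middle of a digest")
        yield code, stream[pos:end]
        pos = end  # the length prefix says exactly where the next one starts


data = b"example"
stream = (
    bytes([0x11, 20]) + hashlib.sha1(data).digest()
    + bytes([0x12, 32]) + hashlib.sha256(data).digest()
    + bytes([0x13, 16]) + hashlib.sha512(data).digest()[:16]  # truncated
)
print([(hex(code), len(digest)) for code, digest in iter_multihashes(stream)])
# [('0x11', 20), ('0x12', 32), ('0x13', 16)]
```

With a `sha256:<hex>` style string, the same scan would need a separator or an out-of-band agreement on digest sizes, and the truncated entry could not be told apart from a full one.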

Q: Is Multihash only for cryptographic hashes?

What about non-cryptographic hashes like murmur3, cityhash, etc?

We decided to make Multihash work for all hash functions, not just cryptographic hash functions. The same kind of choices that people make around

We wanted to be able to include MD5 and SHA1, as they are widely used even now, despite no longer being secure. Ultimately, we could consider these cryptographic hash functions that have transitioned into non-cryptographic hash functions. Perhaps all of them eventually do.

Q: How do I add hash functions to the table?

Three options to add custom hash functions:

(1) If other applications would benefit from this hash function, propose it at the multihash repo.
(2) If your function is only for your application, you can add a hash function to the table in a range reserved specially for this purpose. See the table.
(3) If you need to use a completely custom table, most implementations support loading a separate hash function table.
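Options (2) and (3) might look roughly like the following Python sketch, which keeps the table as a plain dictionary and dispatches on the code when verifying. Everything here is illustrative rather than the API of any listed implementation: the names `TABLE`, `make_multihash`, and `verify` are hypothetical, and the code `0x300001` is an assumption, chosen on the understanding that the multicodec table reserves `0x300000` to `0x3FFFFF` for private use (check the table before relying on it).

```python
import hashlib
import zlib


def encode_varint(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0):
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("buffer ended in the middle of a varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


# Hash function table: code -> function returning the full digest.
TABLE = {
    0x11: lambda data: hashlib.sha1(data).digest(),
    0x12: lambda data: hashlib.sha256(data).digest(),
}

# Hypothetical application-only code, assumed to sit in the private use range.
CRC32_CODE = 0x300001
TABLE[CRC32_CODE] = lambda data: zlib.crc32(data).to_bytes(4, "big")


def make_multihash(code: int, data: bytes, length=None) -> bytes:
    """Hash data with the function registered for code, optionally truncated."""
    digest = TABLE[code](data)
    if length is not None:
        digest = digest[:length]
    return encode_varint(code) + encode_varint(len(digest)) + digest


def verify(mh: bytes, data: bytes) -> bool:
    """Recompute the digest named by the multihash and compare it."""
    code, pos = decode_varint(mh)
    length, pos = decode_varint(mh, pos)
    expected = mh[pos:]
    if len(expected) != length:
        return False
    if code not in TABLE:
        raise ValueError(f"unknown hash function code {code:#x}")
    return TABLE[code](data)[:length] == expected


mh = make_multihash(CRC32_CODE, b"hello")
print(mh.hex()[:10])         # 8180c00104: 4-byte varint code, then length 4
print(verify(mh, b"hello"))  # True
```

Because callers only ever pass multihash values around, swapping or extending the table does not change any code that stores or forwards hashes.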

Q: I want to upgrade a large system to use Multihash. Could you help me figure out how?

Sure, ask for help on IRC, GitHub, or other fora. See the Multiformats Community listing.

Q: I wish Multihash would _______. I really hate _______.

Those are not questions. But please leave any and all feedback over in the Multihash repo. It will help us improve the project and make sure it addresses our users’ needs. Thanks!

About

Specification

There is a spec in progress, which we hope to submit to the IETF. It is being worked on at this pull-request.

Credits

The Multihash format was invented by @jbenet, and refined by the IPFS Team. It is now maintained by the Multiformats community. The Multihash implementations are written by a variety of authors, whose hard work has made future-proofing and upgrading hash functions much easier. Thank you!

Open Source

The Multihash format (this documentation and the specification) is Open Source software, licensed under the MIT License and patent-free. The multihash implementations listed here are also Open Source software. Please contribute to make them great! Your bug reports, new features, and documentation improvements will benefit everyone.

Part of the Multiformats Project

Multihash is part of the Multiformats Project, a collection of protocols which aim to future-proof systems, today. Check out the other multiformats. It is also maintained and sponsored by Protocol Labs.
