Multiformats / Multihash

Self-describing hashes

Multihash is a protocol for differentiating outputs from various well-established hash functions, addressing size and encoding considerations. It is useful for writing applications that future-proof their use of hashes and allow multiple hash functions to coexist.

Safer, easier cryptographic hash function upgrades

Multihash is particularly important in systems which depend on cryptographically secure hash functions. Attacks may break the cryptographic properties of secure hash functions. These cryptographic breaks are particularly painful in large tool ecosystems, where tools may have made assumptions about hash values, such as function and digest size. Upgrading becomes a nightmare, as all tools which make those assumptions would have to be upgraded to use the new hash function and new hash digest length. Tools may face serious interoperability problems or error-prone special casing.

How many programs out there assume a git hash is a sha1 hash?

How many scripts assume the hash value digest is exactly 160 bits?

How many tools will break when these values change?

How many programs will fail silently when these values change?

This is precisely where Multihash shines. It was designed for upgrading.

When using Multihash, a system warns the consumers of its hash values that these may have to be upgraded in case of a break. Even though the system may still only use a single hash function at a time, the use of multihash makes it clear to applications that hash values may use different hash functions or be longer in the future. Tooling, applications, and scripts can avoid making assumptions about the length, and read it from the multihash value instead. This way, the vast majority of tooling – which may not do any checking of hashes – would not have to be upgraded at all. This vastly simplifies the upgrade process, avoiding the waste of hundreds or thousands of software engineering hours, deep frustrations, and high blood pressure.
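As a rough illustration of reading the function code and digest length from the value itself rather than assuming them, here is a minimal Python sketch. It assumes the layout of an unsigned varint function code, an unsigned varint digest length in bytes, and then the digest. The helper names `read_uvarint` and `decode_multihash` are hypothetical and do not come from any particular multihash library.

```python
from typing import Tuple


def read_uvarint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Read one unsigned varint (7 data bits per byte, high bit set = more bytes follow).

    Returns (value, offset of the first byte after the varint).
    """
    value = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, offset
        shift += 7


def decode_multihash(mh: bytes) -> Tuple[int, int, bytes]:
    """Split a multihash into (function code, digest length, digest)."""
    code, pos = read_uvarint(mh, 0)
    length, pos = read_uvarint(mh, pos)
    digest = mh[pos:]
    if len(digest) != length:
        raise ValueError("digest length does not match the length prefix")
    return code, length, digest


# Hypothetical value: code 0x12, length 0x20 (32 bytes), then a placeholder digest.
example = bytes([0x12, 0x20]) + bytes(32)
code, length, digest = decode_multihash(example)
print(hex(code), length, len(digest))
```

A tool written this way never hard-codes a digest size, so a later switch to a different function or a longer digest would not require changing the parsing code.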

The Multihash Format

A multihash follows the TLV (type-length-value) pattern.

type - unsigned varint code of the hash function being used

length - unsigned varint digest length, in bytes

value - hash function output value, with length matching the prefixed length value

For example:
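A value of this shape could be assembled as in the following Python sketch. It is only an illustration, not one of the project's implementations: `encode_uvarint` and `encode_multihash` are hypothetical helper names, the code `0x12` for sha2-256 is assumed from the multiformats code table, and the input string (the one used in the Examples section) is assumed to be UTF-8 encoded.

```python
import hashlib


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (low 7 bits first)."""
    if n < 0:
        raise ValueError("unsigned varints cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_multihash(code: int, digest: bytes) -> bytes:
    """Concatenate type (function code), length (digest size in bytes), and value (digest)."""
    return encode_uvarint(code) + encode_uvarint(len(digest)) + digest


SHA2_256 = 0x12  # assumed function code for sha2-256

data = "Merkle–Damgård".encode("utf-8")  # UTF-8 encoding is an assumption
mh = encode_multihash(SHA2_256, hashlib.sha256(data).digest())
print(mh.hex())
```

The first byte of the printed value is the function code, the second is the digest length (0x20, that is 32 bytes), and the remaining bytes are the digest.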

Implementations

These implementations are available:

Examples

The following multihash examples are different hash function outputs of the same exact input:

Merkle–Damgård

The multihash examples are chosen to show different hash functions and different hash digest lengths at play.

sha1 - 160 bits

sha2-256 - 256 bits (aka sha256)

sha2-512 - 256 bits

Note: this is the actual SHA-512 (as per code 0x13) truncated to 256 bits; some libraries support a hash called SHA-512/256 that has the same 256-bit length but uses a different initialization vector (as defined in FIPS 180-4).

sha2-512 - 512 bits (aka sha512)

blake2b-512 - 512 bits

blake2b-256 - 256 bits

blake2s-256 - 256 bits

blake2s-128 - 128 bits
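The list above can be reproduced approximately with Python's standard `hashlib`, as in the sketch below. Only `0x13` for sha2-512 is stated in this document; the other function codes are assumed from the multiformats code table and should be checked against it. The helper names and the UTF-8 encoding of the input are also assumptions.

```python
import hashlib


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (low 7 bits first)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def multihash(code: int, digest: bytes) -> bytes:
    """Prefix a digest with its function code and its length in bytes."""
    return encode_uvarint(code) + encode_uvarint(len(digest)) + digest


data = "Merkle–Damgård".encode("utf-8")  # UTF-8 encoding is an assumption

# name -> (assumed function code, digest)
examples = {
    "sha1 - 160 bits": (0x11, hashlib.sha1(data).digest()),
    "sha2-256 - 256 bits": (0x12, hashlib.sha256(data).digest()),
    # Plain SHA-512 truncated to 32 bytes, not SHA-512/256.
    "sha2-512 - 256 bits": (0x13, hashlib.sha512(data).digest()[:32]),
    "sha2-512 - 512 bits": (0x13, hashlib.sha512(data).digest()),
    "blake2b-512 - 512 bits": (0xB240, hashlib.blake2b(data, digest_size=64).digest()),
    "blake2b-256 - 256 bits": (0xB220, hashlib.blake2b(data, digest_size=32).digest()),
    "blake2s-256 - 256 bits": (0xB260, hashlib.blake2s(data, digest_size=32).digest()),
    "blake2s-128 - 128 bits": (0xB250, hashlib.blake2s(data, digest_size=16).digest()),
}

results = {name: multihash(code, digest) for name, (code, digest) in examples.items()}

for name, value in results.items():
    print(f"{name}: {value.hex()}")
```

Note that the two sha2-512 entries share one function code and differ only in the length field, while the blake2 codes used here are multi-byte varints.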

F.A.Q.

Q: Why have digest length as a separate number?

Because combining the hash function code and the digest length ends up with a function code that really means “function-and-digest-size-code”. That makes using custom digest sizes annoying and much less flexible. We would need hundreds of codes for all the combinations people would want to use.

Q: Why varints (variable integers)?

So that we have no limitation on functions or lengths.

Q: What kind of varints?

A Most Significant Bit unsigned varint, as defined by the multiformats/unsigned-varint doc.
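To make the encoding concrete, here is a small Python sketch of an unsigned varint round trip, assuming the usual scheme in which each byte carries 7 bits of the number (least significant group first) and the most significant bit of each byte signals that more bytes follow. The function names are hypothetical, and any additional rules in the unsigned-varint doc (for example size limits) are not modeled here.

```python
from typing import Tuple


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, least significant group first."""
    if n < 0:
        raise ValueError("unsigned varints cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # most significant bit set: more bytes follow
        else:
            out.append(byte)  # most significant bit clear: last byte
            return bytes(out)


def decode_uvarint(buf: bytes) -> Tuple[int, int]:
    """Decode one varint from the start of buf; return (value, bytes consumed)."""
    value = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:
            return value, i + 1
    raise ValueError("truncated varint")


# Small numbers fit in one byte; larger ones simply use more bytes.
for n in (0x12, 0x7F, 0x80, 0xB240):
    encoded = encode_uvarint(n)
    assert decode_uvarint(encoded) == (n, len(encoded))
    print(hex(n), "->", encoded.hex())
```

Because the encoding grows with the number, common function codes and digest lengths stay at one byte while larger values remain representable.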

Q: Don’t we have to agree on a table of functions?

Yes, but we already have to agree on functions, so this is not hard. The table even leaves some room for custom function codes.

Q: Why not use "sha256:<digest>"?

For three reasons:

Q: Is Multihash only for cryptographic hashes? What about non-cryptographic hashes like murmur3, cityhash, etc.?

We decided to make Multihash work for all hash functions, not just cryptographic hash functions. The same kind of choices that people make around

We wanted to be able to include MD5 and SHA1, as they are widely used even now, despite no longer being secure. Ultimately, we could consider these cryptographic hash functions that have transitioned into non-cryptographic hash functions. Perhaps all of them eventually do.

Q: How do I add hash functions to the table?

Three options to add custom hash functions:

Q: I want to upgrade a large system to use Multihash. Could you help me figure out how?

Sure, ask for help on IRC, GitHub, or other forums. See the Multiformats Community listing.

Q: I wish Multihash would _______. I really hate _______.

Those are not questions. But please leave any and all feedback over in the Multihash repo. It will help us improve the project and make sure it addresses our users’ needs. Thanks!

About

Specification

There is a spec in progress, which we hope to submit to the IETF. It is being worked on at this pull-request.

Credits

The Multihash format was invented by @jbenet, and refined by the IPFS Team. It is now maintained by the Multiformats community. The Multihash implementations are written by a variety of authors, whose hard work has made future-proofing and upgrading hash functions much easier. Thank you!

Open Source

The Multihash format (this documentation and the specification) is Open Source software, licensed under the MIT License and patent-free. The multihash implementations listed here are also Open Source software. Please contribute to make them great! Your bug reports, new features, and documentation improvements will benefit everyone.

Part of the Multiformats Project

Multihash is part of the Multiformats Project, a collection of protocols which aim to future-proof systems, today. Check out the other multiformats. It is also maintained and sponsored by Protocol Labs.