
@parkan
Forked from denisnazarov/ccr.md
Last active December 30, 2015 23:46
Canonical Content Registry

Canonical Content Registry Foundation

Today, there is no reliable way to persist metadata for digital media as it travels across the internet.

Mine is working to build a global content registry on top of the Bitcoin blockchain to serve as an open metadata layer for canonical representations of digital media.

The goal of such a registry is to enable a new decentralized hypermedia protocol that powers the next generation of digital content applications, where creators and consumers own their media, identity and interactions across the internet, without depending on industrial or platform gatekeepers.

In March, we published a high-level summary of how such a system could work, titled the Canonical Content Registry. Today, we are taking the first steps toward building it by sharing a proposal for a technical implementation on top of Blockstore. We welcome your feedback and look forward to starting a conversation.

Registration Store

This layer stores the actual metadata and annotations on top of Blockstore. We use a namespace with a very low registration cost, no length or vowel discounts, and no expiration. Each work is stored under an opaque identifier ("name") with a metadata "profile". [NOTE: this may still be prohibitively expensive due to transaction costs, so we may need to bundle multiple registrations up to the full 8k block.] Details of registration in Blockstore are TBD.
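The bundling idea in the note above can be sketched as a greedy packer. This is only an illustration under stated assumptions: "8k" is read as 8192 bytes, registrations are serialized as compact JSON, and the names `bundle_registrations`, `encode` and `MAX_BUNDLE_BYTES` are hypothetical, not part of Blockstore.

```python
import json

# Assumption: the "8k block" is read as 8192 bytes of serialized payload.
MAX_BUNDLE_BYTES = 8 * 1024


def encode(records):
    """Serialize a list of registrations deterministically as compact JSON."""
    return json.dumps(records, sort_keys=True, separators=(",", ":")).encode("utf-8")


def bundle_registrations(registrations, limit=MAX_BUNDLE_BYTES):
    """Greedily pack registrations into bundles whose encoding fits in `limit` bytes."""
    bundles, current = [], []
    for reg in registrations:
        if len(encode([reg])) > limit:
            raise ValueError("a single registration exceeds the bundle limit")
        # Start a new bundle when adding this registration would overflow the current one.
        if current and len(encode(current + [reg])) > limit:
            bundles.append(current)
            current = []
        current.append(reg)
    if current:
        bundles.append(current)
    return bundles
```

Each bundle would then cost one transaction rather than one per work. How a bundle maps onto a Blockstore name is left open here, as it is in the proposal.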

Metadata blocks are written to IPFS as tokenized, signed claims (similar to the statements in blockchain profiles). The claims may be signed by the author, an organization representing the author (gallery, record label, archival service, etc.), an interested third party (a fan or volunteer editor), or by multiple parties together, such as an image annotation platform (e.g. https://mmmine.com/) and that platform's contributing user.
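A minimal sketch of what a multi-party signed claim could look like follows. The proposal does not fix a token format or signature scheme, so everything here is an assumption: the claim is canonical JSON, and HMAC-SHA256 with per-signer secrets stands in for real public-key signatures purely to keep the example within the Python standard library. The names `make_signed_claim` and `verify_signed_claim` are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(claim):
    """Deterministic serialization so every signer signs identical bytes."""
    return json.dumps(claim, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_signed_claim(claim, signers):
    """Attach one signature per signer. `signers` maps signer id -> secret bytes.

    HMAC is a stand-in only; a real registry would use public-key signatures.
    """
    payload = canonical_bytes(claim)
    signatures = [
        {"signer": sid, "sig": hmac.new(key, payload, hashlib.sha256).hexdigest()}
        for sid, key in sorted(signers.items())
    ]
    return {"claim": claim, "signatures": signatures}


def verify_signed_claim(signed, keys):
    """Return True only if there is at least one signature and all of them verify."""
    payload = canonical_bytes(signed["claim"])
    for entry in signed["signatures"]:
        key = keys.get(entry["signer"])
        if key is None:
            return False
        expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry["sig"]):
            return False
    return bool(signed["signatures"])
```

In the co-signing case, the platform and the contributing user would both appear in `signatures`, so a verifier can see that the claim passed through the platform as well as who authored it.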

The records should be written to IPNS names to provide mutability and versioning, but can also be written to normal IPFS ids until IPNS is functional.

It is expected that a single block will contain a set of complementary claims about a single resource, such as a song or photograph.

Perceptual Resolver

This layer allows location of metadata identifiers pointing into the registration store based on perceptual similarity. One possible design would use perceptual hashes and a Hamming DHT for efficient fuzzy lookups.
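To illustrate the fuzzy lookup, here is a minimal sketch that compares perceptual hashes by Hamming distance. It uses a brute-force linear scan over an in-memory index rather than a Hamming DHT, and it assumes hashes are fixed-width integers. The function names, the `max_distance` threshold and the identifiers are hypothetical.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


def fuzzy_lookup(index, query_hash, max_distance=4):
    """Return (distance, metadata_id) pairs within `max_distance`, closest first.

    `index` maps perceptual hash -> metadata identifier in the registration store.
    This is a linear scan; a Hamming DHT would distribute and prune the search.
    """
    matches = [
        (hamming_distance(stored_hash, query_hash), metadata_id)
        for stored_hash, metadata_id in index.items()
    ]
    return sorted(m for m in matches if m[0] <= max_distance)


index = {0b11110000: "meta-a", 0b11110001: "meta-b", 0b00001111: "meta-c"}
print(fuzzy_lookup(index, 0b11110011))  # [(1, 'meta-b'), (2, 'meta-a')]
```

A slightly altered copy of a work (re-encoded, resized) should produce a hash only a few bits away from the original, which is why a distance threshold rather than exact equality is used.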

Non-perceptual resolver/search

String-based search or other lookup mechanisms can be added for lookups into the metadata store.

Reputation System

Because it is impossible to computationally verify true authorship, we must defer to authority in the subject matter. For example, a claim of third-party authorship on a historical photo signed by the Library of Congress is very likely true, while a claim of self-ownership on a popular recording by an anonymous user is likely not trustworthy and should not be used for, e.g., royalty payments. We can also describe more granular reputation structures: for example, an annotation community like Mine can co-sign a claim together with the user before writing it out, to account for moderation happening internal to the site.
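One way to picture this is a per-signer trust weight combined across co-signers. The sketch below is purely illustrative and not a proposed design: the scores, the "noisy-or" combination rule, the threshold and the names `claim_trust` and `usable_for_royalties` are all assumptions.

```python
def claim_trust(signers, reputation, default=0.0):
    """Combine signer reputations (each in [0, 1]) with a noisy-or rule.

    Unknown signers get `default`. Each additional co-signer can only raise the score.
    """
    distrust = 1.0
    for signer in signers:
        distrust *= 1.0 - reputation.get(signer, default)
    return 1.0 - distrust


def usable_for_royalties(signers, reputation, threshold=0.9):
    """High-stakes uses such as payments require a high combined trust score."""
    return claim_trust(signers, reputation) >= threshold


# Hypothetical reputation scores, for illustration only.
reputation = {"library-of-congress": 0.99, "mine": 0.7, "alice": 0.5}
print(usable_for_royalties(["library-of-congress"], reputation))  # True
print(usable_for_royalties(["anonymous"], reputation))            # False
```

Under this rule, a claim co-signed by a moderated community and one of its users scores higher than the user alone, which matches the co-signing case described above. Where the scores themselves come from is left open.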

Prototype Implementation (MusicBrainz port)

For the prototype, a portion of the MusicBrainz database is used to populate the Registration Store, and the corresponding AcoustID fingerprints are used for the resolver. The prototype differs from the design above as follows:

  • Instead of the Hamming DHT approach for the resolver, a simple network flooding search is used
  • The relational MusicBrainz schema is translated to something similar to schema.org's MusicRecording and written as primary metadata
  • Additional hashes are not used
  • Plain IPFS ids are used
  • No blockstore registrations are performed
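The flooding search used in place of the Hamming DHT can be sketched as a hop-limited breadth-first traversal of the peer graph. This is an in-memory simulation under stated assumptions: peers are entries in a dictionary, fingerprints are matched exactly rather than fuzzily, and the `ttl` parameter and all identifiers are hypothetical.

```python
from collections import deque


def flood_search(peers, start, fingerprint, ttl=3):
    """Flood a query outward from `start`, at most `ttl` hops, collecting hits.

    `peers` maps peer id -> {"neighbors": [peer ids], "records": {fingerprint: metadata_id}}.
    Each peer handles the query at most once.
    """
    seen = {start}
    queue = deque([(start, ttl)])
    hits = []
    while queue:
        peer_id, hops_left = queue.popleft()
        metadata_id = peers[peer_id]["records"].get(fingerprint)
        if metadata_id is not None:
            hits.append((peer_id, metadata_id))
        if hops_left == 0:
            continue  # hop budget exhausted: do not forward further
        for neighbor in peers[peer_id]["neighbors"]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops_left - 1))
    return hits


peers = {
    "a": {"neighbors": ["b"], "records": {}},
    "b": {"neighbors": ["a", "c"], "records": {}},
    "c": {"neighbors": ["b", "d"], "records": {}},
    "d": {"neighbors": ["c"], "records": {"fp-123": "meta-1"}},
}
print(flood_search(peers, "a", "fp-123", ttl=3))  # [('d', 'meta-1')]
```

Flooding is simple to build but its query cost grows with the number of peers reached, which is the trade-off a DHT-based resolver is meant to avoid.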

Problem Areas

  • Because IPFS makes no guarantees of data availability, redundancy or longevity, an aggressive caching/pinning approach is needed, possibly affecting IPFS core
  • IPNS currently allows only one registration per node, but we will need thousands, so plain immutable IPFS ids must be used for the moment. git-style version tree traversal is also not currently implemented. Both issues are roadmapped/planned
  • Namespace/object type support is not implemented, so it is not currently possible to mirror the entire dataset on the HTTP web or to automatically pin metadata objects from peers (however, see ipfs/ipfs#36)
  • The design of the reputation system, consolidation of conflicting claims, and lookups are open problems

Written by @parkan
