There can be three versions of a Dat Bag:
- "Holey" Bag - meant for archiving metadata at a specific Dat archive version.
- Complete Bag - a complete backup of an archive checkpoint.
- Serialized Bag - for using a bag with Dat without putting any Dat files inside the bag.
A "Holey" bag contains a fetch.txt file, which points to where to download any file that is not in the data/ payload.
dat/
| bagit.txt
| manifest-sha256.txt
| bag-info.txt
| tagmanifest-sha256.txt
| fetch.txt (contains dat:// links for all files)
\--- data/
| [empty]
\--- dat-tags/
| version
| metadata.key
| metadata.signatures
| metadata.bitfield
| metadata.tree
| metadata.data
| content.key
| content.signatures
| content.bitfield
| content.tree
- Fetch file: fetch.txt contains dat:// links to all files (each line has the format URL LENGTH FILENAME).
- Tag Files: Copy .dat data to dat-tags. Published for a specific archive checkpoint (dat-tags/version).
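To make the URL LENGTH FILENAME format concrete, here is a minimal sketch of a fetch.txt parser. The names FetchEntry, parse_fetch_line and parse_fetch are made up for illustration, and the dat:// link in the usage note is a placeholder, not a real archive key. It assumes that a LENGTH of "-" means the size is unknown, as BagIt allows.

```python
from typing import List, NamedTuple, Optional


class FetchEntry(NamedTuple):
    url: str
    length: Optional[int]  # None when LENGTH is given as "-" (unknown)
    filename: str


def parse_fetch_line(line: str) -> FetchEntry:
    """Split one fetch.txt line into URL, LENGTH and FILENAME."""
    # maxsplit=2 keeps any spaces inside FILENAME intact.
    url, length, filename = line.strip().split(None, 2)
    return FetchEntry(url, None if length == "-" else int(length), filename)


def parse_fetch(text: str) -> List[FetchEntry]:
    """Parse the full contents of a fetch.txt file, skipping blank lines."""
    return [parse_fetch_line(l) for l in text.splitlines() if l.strip()]
```

For example, parse_fetch_line("dat://examplekey/readme.md 1024 data/readme.md") yields an entry whose filename is data/readme.md and whose length is 1024.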
The "tags" are metadata files intended to facilitate and document the storage and transfer of the bag.
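The tag-file step could look roughly like the sketch below. It assumes the Dat metadata lives as flat files in a .dat directory, that content.data is left out unless explicitly requested (matching the dat-tags listing above), and that dat-tags/version is a plain text file holding the archive version number. The function name copy_dat_tags and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def copy_dat_tags(dat_dir, bag_dir, version, include_content_data=False):
    """Copy Dat metadata files from dat_dir into bag_dir/dat-tags."""
    src = Path(dat_dir)
    dest = Path(bag_dir) / "dat-tags"
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(src.iterdir()):
        if not f.is_file():
            continue
        # Assumption: content.data is skipped by default.
        if f.name == "content.data" and not include_content_data:
            continue
        shutil.copy2(str(f), str(dest / f.name))
        copied.append(f.name)
    # Assumption: the checkpoint is recorded as a plain version number.
    (dest / "version").write_text(f"{version}\n")
    return copied
```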
Resource: https://docs.google.com/document/d/1JqKMFn9KfeIMAAEdOGQr6LZPqNWx8Qubi12uoUXi2QU/edit
Similar to the above, but with all the files resolved and copied to the data/ payload folder.
dat/
| bagit.txt
| manifest-sha256.txt
| bag-info.txt
| tagmanifest-sha256.txt
\--- data/
| [all files downloaded]
\--- dat-tags/
| version
| metadata.key
| metadata.signatures
| metadata.bitfield
| metadata.tree
| metadata.data
| content.key
| content.signatures
| content.bitfield
| content.tree
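Once all files are in data/, the payload manifest can be generated by hashing each file. The following is a minimal sketch, not a full BagIt implementation: it assumes the usual manifest line layout of checksum, whitespace, then the path relative to the bag root. The helper names sha256_file and write_payload_manifest are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=65536):
    """Hash a file in chunks so large payload files need not fit in memory."""
    h = hashlib.sha256()
    with open(str(path), "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_payload_manifest(bag_dir):
    """Write manifest-sha256.txt covering every file under data/."""
    bag = Path(bag_dir)
    lines = []
    for f in sorted((bag / "data").rglob("*")):
        if f.is_file():
            rel = f.relative_to(bag).as_posix()
            lines.append(f"{sha256_file(f)}  {rel}")
    (bag / "manifest-sha256.txt").write_text("".join(l + "\n" for l in lines))
    return lines
```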
Similar to "complete dat bag" but also with dat-tags/content.data containing the full version history. This would probably only work if we didn't have to hash the content.data file and instead just used the blake hash in the tagmanifest.
We could also keep the dat metadata outside the bag, which may be better for archive or transport:
dat/
| my-bag.tar.gz (or my-bag.zip)
| bag-sha256.txt (optional - checksum for serialized bag)
\--- .dat/
| metadata.key
| metadata.signatures
| metadata.bitfield
| metadata.tree
| metadata.data
| content.key
| content.signatures
| content.bitfield
| content.tree
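A rough sketch of producing the serialized layout above: the bag directory is packed into a .tar.gz and its checksum is written next to it. The function name serialize_bag is hypothetical, and the line format inside bag-sha256.txt (checksum, two spaces, archive name) is an assumption, since the notes do not specify it.

```python
import hashlib
import tarfile
from pathlib import Path


def serialize_bag(bag_dir, out_dir):
    """Pack bag_dir into out_dir/<name>.tar.gz and write bag-sha256.txt."""
    bag = Path(bag_dir)
    out = Path(out_dir)  # should not be inside bag_dir
    out.mkdir(parents=True, exist_ok=True)
    archive = out / f"{bag.name}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps the bag's own directory as the top-level entry.
        tar.add(str(bag), arcname=bag.name)
    digest = hashlib.sha256(archive.read_bytes()).hexdigest()
    # Assumed format: "<sha256>  <archive file name>".
    (out / "bag-sha256.txt").write_text(f"{digest}  {archive.name}\n")
    return archive, digest
```

The .dat/ directory would then sit beside the archive and track only the serialized file and its checksum.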
Or for non-serialized bags:
dat/
\--- my-bag/
| bagit.txt
| bag-info.txt
| ... etc
\--- data/
| bag-sha256.txt (optional - checksum for serialized bag)
\--- .dat/
| metadata.key
| metadata.signatures
| metadata.bitfield
| metadata.tree
| metadata.data
| content.key
| content.signatures
| content.bitfield
| content.tree
- The manifest can contain any type of checksum, so we could use blake and still be in spec.
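As an illustration of a blake-based manifest entry, the sketch below hashes a file with BLAKE2b from Python's hashlib. It assumes that "blake" here means BLAKE2b with a 32-byte digest. It computes a plain hash of the file bytes and does not reuse the hashes stored in Dat's own tree files. The function name blake2b_manifest_line is made up for this example.

```python
import hashlib


def blake2b_manifest_line(path, rel_name, digest_size=32):
    """Return a manifest-style line: '<blake2b hex digest>  <relative name>'."""
    h = hashlib.blake2b(digest_size=digest_size)
    with open(str(path), "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return f"{h.hexdigest()}  {rel_name}"
```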
DPN basically makes a merkle tree via bagit and uses the root in their registry =)
DPN Bag Transfer Protocol
tagmanifest-sha256.txt includes the sha256 hash of manifest-sha256.txt, which in turn has the hashes of the content.
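This two-level chain can be checked from the top down: first verify every file listed in the tag manifest (which includes the payload manifest), then verify every payload file listed in the payload manifest. The sketch below assumes both manifests use lines of checksum, whitespace, relative path; read_manifest and verify_chain are illustrative names.

```python
import hashlib
from pathlib import Path


def read_manifest(path):
    """Return {relative_path: checksum} from a BagIt-style manifest file."""
    entries = {}
    for line in Path(path).read_text().splitlines():
        if line.strip():
            checksum, rel = line.split(None, 1)
            entries[rel.strip()] = checksum.lower()
    return entries


def verify_chain(bag_dir):
    """Check tagmanifest -> manifest -> payload; True only if all hashes match."""
    bag = Path(bag_dir)

    def manifest_ok(manifest_name):
        for rel, expected in read_manifest(bag / manifest_name).items():
            target = bag / rel
            if not target.is_file():
                return False
            if hashlib.sha256(target.read_bytes()).hexdigest() != expected:
                return False
        return True

    # The tag manifest vouches for the payload manifest, which vouches for data/.
    return manifest_ok("tagmanifest-sha256.txt") and manifest_ok("manifest-sha256.txt")
```

Because the tag manifest covers the payload manifest, a single trusted hash of tagmanifest-sha256.txt is enough to detect a change to any payload file.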