@fireflycons
Created February 9, 2018 09:37
Compute Amazon S3 ETag for a local file
<#
.SYNOPSIS
Compute Amazon S3 ETag for a local file
.DESCRIPTION
ETags represent a hash of the content of a file stored in S3.
Comparing ETags can be used to determine:
- If a file in S3 is the same as one you are going to upload
- Following an upload, whether the file was successfully uploaded.
.PARAMETER Path
Path to file to compute the ETag for
.PARAMETER MinimumPartSize
Minimum size in bytes of a part for multipart upload (default 5MB as per Write-S3Object)
.PARAMETER MinimumSizeBeforeMultipartUpload
Minimum file size in bytes, at or above which multipart upload will be selected (default 16MB as per Write-S3Object)
.OUTPUTS
[string] The computed ETag
.NOTES
The defaults for MinimumPartSize and MinimumSizeBeforeMultipartUpload are based on the values used by Write-S3Object.
Other clients (e.g. AWS CLI) may use different values, and application code may use any values chosen by its developers.
For a single part upload, the ETag is simply the MD5 of the whole file converted to hex string.
For a multipart upload, concatenate the binary MD5 hash of each part, compute the MD5 hash of the concatenated binary data, convert it to a hex string, and append a hyphen followed by the number of parts.
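As a worked illustration under the default settings (the file size is hypothetical): a 20MB file is not smaller than the 16MB threshold, so it is hashed in 5MB parts. That gives 4 parts and an ETag of the form <32 hex characters>-4.
.EXAMPLE
$local = .\Get-S3ETag.ps1 -Path .\backup.zip
$remote = (Get-S3Object -BucketName my-bucket -Key backup.zip).ETag.Trim('"')
$local -eq $remote

A sketch of checking whether a local file matches an object already in S3, not a definitive recipe.
The script file name Get-S3ETag.ps1, the bucket my-bucket and the key backup.zip are hypothetical placeholders.
Get-S3Object is assumed to come from the AWS Tools for PowerShell, and the Trim call assumes the ETag it returns is wrapped in double quotes.
The comparison is only meaningful if the object was uploaded with the same part size and multipart threshold passed to this script.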
#>
param
(
    [Parameter(Mandatory=$true)]
    [string]$Path,
    [long]$MinimumPartSize = 5MB,
    [long]$MinimumSizeBeforeMultipartUpload = 16MB
)
try
{
    # Resolve the file once. Using FullName below stops .NET from resolving a relative
    # path against the process working directory instead of the PowerShell location.
    $file = Get-Item -LiteralPath $Path -ErrorAction Stop
    # Get file size
    $fileSize = $file.Length
    # Calculate part size: at least $MinimumPartSize, raised if necessary so the file fits in 10,000 parts
    $partSize = [long]([Math]::Max([Math]::Ceiling([double]$fileSize / 10000.0), $MinimumPartSize))
    # Calculate number of parts (used for progress reporting only).
    # Ceiling is used because an [int] cast rounds to nearest rather than truncating.
    $numberOfParts = [int][Math]::Ceiling([double]$fileSize / $partSize)
    # Buffer to read file parts into
    $buf = New-Object byte[] $partSize
    # Will hold the final hash result.
    $md5Hash = $null
    # Create MD5 hash algorithm
    $md5 = [Security.Cryptography.MD5]::Create()
    # Open input file for reading
    $inputStream = [System.IO.File]::OpenRead($file.FullName)
    # Stream to write part hashes to as we compute them (stays $null for a single part upload)
    $hashBuffer = $null
    # Is file large enough to do a multipart upload?
    if ($fileSize -ge $MinimumSizeBeforeMultipartUpload)
    {
        # File name for status messages
        $filename = [IO.Path]::GetFileName($Path)
        # Create the stream that will concatenate chunk hashes.
        $hashBuffer = New-Object System.IO.MemoryStream
        # Counter for number of parts read so far
        $partsRead = 0
        if ([Environment]::UserInteractive)
        {
            # Show progress if running at the command line
            Write-Progress -Activity "Computing ETag" -Status $filename -PercentComplete 0
        }
        # Read each part. Stream.Read may return fewer bytes than requested,
        # so keep reading until the buffer is full or the stream ends.
        while ($true)
        {
            $bytesRead = 0
            while ($bytesRead -lt $buf.Length)
            {
                $n = $inputStream.Read($buf, $bytesRead, $buf.Length - $bytesRead)
                if ($n -eq 0)
                {
                    break
                }
                $bytesRead += $n
            }
            if ($bytesRead -eq 0)
            {
                # End of file
                break
            }
            if ([Environment]::UserInteractive -and $partsRead % 10 -eq 0)
            {
                # Show progress every 10 parts read if running at the command line
                Write-Progress -Activity "Computing ETag" -Status "$filename - Part $($partsRead)/$($numberOfParts)" -PercentComplete ($partsRead * 100 / $numberOfParts)
            }
            ++$partsRead
            # Hash the part
            $partMd5Hash = $md5.ComputeHash($buf, 0, $bytesRead)
            # and write to the buffer.
            $hashBuffer.Write($partMd5Hash, 0, $partMd5Hash.Length)
        }
        # Seek to start of the buffer
        $hashBuffer.Seek(0, 'Begin') | Out-Null
        # and compute the hash of hashes
        $md5Hash = $md5.ComputeHash($hashBuffer)
        if ([Environment]::UserInteractive)
        {
            # Remove progress bar if running at the command line
            Write-Progress -Activity "Computing ETag" -Completed
        }
    }
    else
    {
        # Single part upload - ETag is just MD5 of the whole file.
        $md5Hash = $md5.ComputeHash($inputStream)
    }
    # Build the ETag
    $eTag = [System.BitConverter]::ToString($md5Hash).Replace('-', [string]::Empty).ToLower()
    if ($null -ne $hashBuffer)
    {
        # For a multipart ETag, append the number of parts, even when there is only one part.
        $eTag = $eTag + "-$partsRead"
    }
    # Emit result
    $eTag
}
finally
{
    # Dispose any IDisposable CLR objects created.
    ($inputStream, $md5, $hashBuffer) |
        ForEach-Object {
            if ($_)
            {
                $_.Dispose()
            }
        }
}