Skip to content

Instantly share code, notes, and snippets.

@jahands
Last active October 17, 2021 17:49
Show Gist options
  • Save jahands/ab88625f65811735f4d782499107ccd5 to your computer and use it in GitHub Desktop.
Save jahands/ab88625f65811735f4d782499107ccd5 to your computer and use it in GitHub Desktop.
Convert a csv to json key-value pairs for importing to Cloudflare Workers KV (REQUIRES POWERSHELL 7!!)
# This version is for very large datasets that might be too big for Wrangler
# You might have to run `sed -i '1s/^\xEF\xBB\xBF//' FILE.json` on the output file if wrangler gives errors
# Example row of data:
# Slug ID : 2723
# Slug Name : AJ_FV057
# Report Title : Wholesale Market Asian Vegetable Report - Atlanta, GA (AJ_FV057)
# Published Date : 2/24/2021 12:10
# Report Date : 2/24/2021
# Report Status : Final
# Document : view report
# viewReportUrl : https://mymarketnews.ams.usda.gov/viewReport/2723
# reportFilePath : /filerepo/sites/default/files/2723/2021-02-24/421297/ams_2723_00253.txt
# UPDATE FILE NAMES and optionally file_size for output file size
$input_file = 'data.csv'
$global:output_file = 'data-out-{N}.json'
$file_size = 100000 # Items per file
Function getOutputFile($index) {
return $global:output_file.replace('{N}', $index)
}
$first = $true
$current_count = 0 # Size of current file
$index = 0 # what file we're on
# Start the first file
Set-Content $(getOutputFile $index) "[" -Encoding utf8
Import-Csv $input_file | ForEach-Object {
$current_count++
if($current_count -ge $file_size) {
# End the current file
"]" | Out-File -Encoding utf8 $(getOutputFile $index) -Append
$index++
$current_count = 0
# Start the next file
Set-Content $(getOutputFile $index) "[" -Encoding utf8
$first = $true
}
$(if (-not $first) { ',' }else { '' }) +
(@{
key = $_.'Slug ID'
value = (@{
'Report Title' = $_.'Report Title'
'Report Date' = $_.'Report Date'
} | ConvertTo-Json -Compress)
} | ConvertTo-Json -Compress) |
Out-File -Append -Encoding utf8 $(getOutputFile $index);
$first = $false }
"]" | Out-File -Encoding utf8 $(getOutputFile $index) -Append
# Example Output:
# [
# {"key":"2298","value":"{\"Report Date\":\"1/14/2021\",\"Report Title\":\"Dallas Terminal Market Fruit Prices (DA_FV010)\"}"}
# ,{"key":"2802","value":"{\"Report Date\":\"1/14/2021\",\"Report Title\":\"Wholesale Market Misc Herbs Report - Dallas, TX (DA_FV055)\"}"}
# ]
# You might have to run `sed -i '1s/^\xEF\xBB\xBF//' FILE.json` on the output file if wrangler gives errors
# Example row of data:
# Slug ID : 2723
# Slug Name : AJ_FV057
# Report Title : Wholesale Market Asian Vegetable Report - Atlanta, GA (AJ_FV057)
# Published Date : 2/24/2021 12:10
# Report Date : 2/24/2021
# Report Status : Final
# Document : view report
# viewReportUrl : https://mymarketnews.ams.usda.gov/viewReport/2723
# reportFilePath : /filerepo/sites/default/files/2723/2021-02-24/421297/ams_2723_00253.txt
$input_file = 'data.csv'
$output_file = 'data-out.json'
Set-Content $output_file "[" -Encoding utf8
$first = $true
Import-Csv $input_file | ForEach-Object {
$(if (-not $first) { ',' }else { '' }) +
(@{
key = $_.'Slug ID'
value = (@{
'Report Title' = $_.'Report Title'
'Report Date' = $_.'Report Date'
} | ConvertTo-Json -Compress)
} | ConvertTo-Json -Compress) | Out-File -Append -Encoding utf8 $output_file; $first = $false }
"]" | Out-File -Encoding utf8 $output_file -Append
# Example Output:
# [
# {"key":"2298","value":"{\"Report Date\":\"1/14/2021\",\"Report Title\":\"Dallas Terminal Market Fruit Prices (DA_FV010)\"}"}
# ,{"key":"2802","value":"{\"Report Date\":\"1/14/2021\",\"Report Title\":\"Wholesale Market Misc Herbs Report - Dallas, TX (DA_FV055)\"}"}
# ]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment