Last active
December 31, 2015 10:08
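A script that takes PDFs delivered by BOOKSCAN, prepends the Amazon cover image as a cover page, and changes the Creator metadata field to ScanSnap. Put the input PDFs in org/; the results are written to dst/.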
#!/usr/bin/env ruby
require 'fileutils'
require 'open-uri'
require 'rexml/document'

# Look up title, author and cover image URL for the given ISBN/ASIN.
def metainfo(isbn)
  uri = 'http://rpaproxy.tdiary.org/rpaproxy/jp/'
  uri << "?Service=AWSECommerceService"
  uri << "&SubscriptionId=1CVA98NEF1G753PFESR2"
  uri << "&Operation=ItemLookup"
  uri << "&ItemId=#{isbn}"
  uri << "&IdType=ASIN"
  uri << "&ResponseGroup=Medium"
  uri << "&Version=2011-08-01"
  # Kernel#open no longer opens URLs on Ruby 3.0+, so call URI.open explicitly
  xml = URI.open(uri, &:read)

  meta = {}
  doc = REXML::Document.new(xml).root
  item = doc.elements.to_a('*/Item')[0]
  meta[:title] = item.elements.to_a('*/Title').first.text
  meta[:author] = item.elements.to_a('*/Author').map(&:text).sort.uniq.join(',')
  image = item.elements.to_a('LargeImage').first ||
          item.elements.to_a('ImageSets/ImageSet/LargeImage').first
  meta[:cover] = image.elements['URL'].text
  # drop a parenthesized part of the title, such as a series name
  if meta[:title] =~ /\(.[^(]+\)/
    meta[:title] = meta[:title].sub(/\(.[^(]+\)/, '').strip
  end
  meta
end

# Build the output path; '/' and '\' are replaced so that they are not
# read as path separators.
def dstfile(meta)
  "dst/" + "#{meta[:title]} - #{meta[:author]}.pdf".tr('<>', '()').gsub(%r{[/\\]}, '_')
end

Dir.glob('org/*.pdf').each do |org|
  # getting metainfo from amazon
  base = File.basename(org)
  print "#{base}: "
  # the ISBN is the part of the file name after the last underscore
  isbn = base.scan(/.*_(.*)\.pdf/).first.first
  begin
    meta = metainfo(isbn)
  rescue OpenURI::HTTPError
    puts 'Amazon error, skip'
    next
  end
  dst = dstfile(meta)
  if File.exist?(dst)
    puts 'Dest file exist, skip'
    next
  end
  print '.'

  # replace creation tool with ScanSnap
  # (arguments are passed as a list so quotes in file names cannot break the command)
  pdfmeta = ''
  system('pdftk', org, 'dump_data', 'output', 'meta.txt')
  File.read('meta.txt').split(/\n/).each_slice(2) do |pair|
    if pair[0] =~ /InfoKey:\s*Creator$/
      pair[1] = "InfoValue: PFU ScanSnap Manager 5.1.10 #S1300"
    end
    pdfmeta << pair.join("\n") << "\n"
  end
  File.open('meta2.txt', 'w') { |o| o.write(pdfmeta) }
  print '.'
  system('pdftk', org, 'update_info', 'meta2.txt', 'output', 'tmp.pdf')
  print '.'

  # insert cover image
  File.open('cover.jpg', 'wb') do |o|
    o.write(URI.open(meta[:cover], 'rb', &:read))
  end
  print '.'
  system('sam2p -j:quiet cover.jpg cover.pdf')
  print '.'
  system('pdftk', 'cover.pdf', 'tmp.pdf', 'cat', 'output', dst)
  puts 'done'

  # delete tmp files
  %w(tmp.pdf meta.txt meta2.txt cover.jpg cover.pdf).each { |file| FileUtils.rm file }
end
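
The Creator rewrite in the script pairs the dump lines two at a time, which only works while every InfoKey line sits at an even offset. If the dump contains other lines ahead of the pairs, such as the InfoBegin record markers that some pdftk versions emit, the pairs may drift out of alignment. A minimal sketch of a line-by-line alternative follows; `rewrite_creator` is a hypothetical helper name and the sample dump is made up for illustration, not taken from a real PDF.

```ruby
# Hypothetical helper: replace the InfoValue that follows "InfoKey: Creator"
# in pdftk dump_data text, without assuming lines come in aligned pairs.
def rewrite_creator(dump, creator)
  after_creator_key = false
  lines = dump.each_line.map do |raw|
    line = raw.chomp
    if after_creator_key && line.start_with?('InfoValue:')
      after_creator_key = false
      "InfoValue: #{creator}"
    else
      # remember whether this line is the Creator key, so the next
      # InfoValue line can be replaced
      after_creator_key = line.match?(/\AInfoKey:\s*Creator\z/)
      line
    end
  end
  lines.join("\n") + "\n"
end

# Made-up sample using a three-line record layout.
sample = <<~DUMP
  InfoBegin
  InfoKey: Creator
  InfoValue: Some Scanner 1.0
  InfoBegin
  InfoKey: Producer
  InfoValue: Some Library 2.0
  NumberOfPages: 3
DUMP

puts rewrite_creator(sample, 'PFU ScanSnap Manager 5.1.10 #S1300')
```

The helper could stand in for the `each_slice(2)` loop by passing it the contents of meta.txt and writing the result to meta2.txt; all other lines are passed through unchanged.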