require 'net/http'
require 'uri'

if ARGV.length < 2
  puts "Syntax: urltest.rb [redirects file] [host] (optional 1 for external redirects)"
  puts "example for local redirects: urltest.rb vanity_urls.txt www.example.com"
  exit
end

if File.exist?(ARGV[0])
  redirects_file = ARGV[0]
else
  puts "#{ARGV[0]} does not exist."
  exit
end

$max_redirects = 10
host = ARGV[1]
max_threads = 5 # 4+1 for the main thread
threads = []

# Request url and follow redirects, giving up after limit hops.
# Anything other than a success or a redirect is logged as an error.
def fetch(url, info, limit = $max_redirects)
  if limit <= 0
    puts "\nError: #{info}: too many redirects\n"
    return
  end
  Net::HTTP.get_response(URI.parse(url)) do |response|
    case response
    when Net::HTTPSuccess # just do nothing
    when Net::HTTPRedirection then fetch(response['location'], info, limit - 1)
    else
      File.open("output.log", "a") do |errors|
        errors.puts "\nError: REDIRECT INFO: #{info} REAL DESTINATION: #{url} [ #{response.code} ]\n"
      end
      puts "\nError: #{info}: #{response.code}\n"
    end
  end
end

File.open(redirects_file).each { |line|
  if Thread.list.length < max_threads
    # format of redirect info is: /prettyurl /redirects/to (or external.domain/redirects/to)
    # as we're following redirects, it's not necessary to parse the final destination.
    if line.strip.length > 0 # skip blank lines (each line still carries its newline)
      url = "http://#{host}#{line.split(/\s/)[0]}"
      x = Thread.new {
        puts "Threads: #{Thread.list.length} - #{url}"
        fetch(url, line.strip)
      }
      threads << x
    end
  else
    # all worker slots are busy: wait briefly, then retry the same line
    sleep 0.05
    redo
  end
}

threads.each { |t| t.join }
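The script above throttles itself by polling Thread.list and retrying the current line. As an alternative, here is a minimal sketch of a fixed worker pool that pulls items from a Queue. The helper name each_in_pool and its worker_count parameter are made up for illustration and are not part of the original script.

```ruby
require 'thread' # provides Queue on old Rubies; harmless on newer ones

# Hypothetical helper: call the block once per item, using a fixed
# number of worker threads that drain a shared queue.
def each_in_pool(items, worker_count = 4, &block)
  queue = Queue.new
  items.each { |item| queue << item }

  workers = (1..worker_count).map do
    Thread.new do
      loop do
        begin
          item = queue.pop(true) # non-blocking pop raises ThreadError when empty
        rescue ThreadError
          break # queue drained, this worker is done
        end
        block.call(item)
      end
    end
  end

  workers.each { |t| t.join }
end
```

With something like this, the main loop could collect the lines first and then call each_in_pool(lines, 4) { |line| fetch(url_for_line, line) }, where url_for_line stands in for the URL built from the line.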
It's sent as a two-column spreadsheet with no header. I simply open it in OpenOffice and cut and paste it into an empty file opened in vi. It ends up as a tab-delimited file like:
/aboutus/annualreport /aboutus/ouraccountability/annualreport/index.htm
/aboutus/annualreport/fy09/art30061.html /aboutus/ouraccountability/annualreport/annual-report-2009.xml
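For illustration, a line in that format could be split into its two columns roughly like this. It is only a sketch: parse_redirect_line and url_for are hypothetical helper names, and the script itself only ever uses the first column.

```ruby
# Hypothetical helper: split one line of the redirects file into its
# two whitespace-separated columns. Returns nil for blank lines.
def parse_redirect_line(line)
  source, destination = line.strip.split(/\s+/, 2)
  return nil if source.nil? || source.empty?
  { :source => source, :destination => destination }
end

# Hypothetical helper: build the URL to request for a parsed entry.
def url_for(host, entry)
  "http://#{host}#{entry[:source]}"
end

entry = parse_redirect_line("/aboutus/annualreport\t/aboutus/ouraccountability/annualreport/index.htm\n")
puts url_for("www.example.com", entry)
```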
I actually have some more tweaks I want to do to it, just haven't had time to get to it. I'm pretty slammed at work lately.
Updated version that takes command-line arguments and cleans up the output so the business user can find the redirects that are 404ing more quickly. The only other thing I plan on doing is changing host to be more of a default host setting, so that if redirects go to external sites the option is ignored.
OK, this is the last edit, as far as I'm going to get in my environment. I'm stuck at Ruby 1.8.5 and there are policy reasons not to compile 1.9.2. The catch is that I need to set a user agent, because I believe some of the external redirects are blocking the default. get_response and other methods in 1.8.5 don't appear to support more than one argument, so I'm not able to pass a headers hash to it. 1.9.2 appears to support it, though. I'm going to switch to Python, get the script done, and move on to other tasks.
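One possible way around the get_response limitation is to build the request object by hand and set the header on it. The sketch below assumes Net::HTTP::Get and Net::HTTP.start behave on 1.8.5 as they do on later versions, which has not been checked here. The USER_AGENT value and the helper names are made up for illustration, and it only covers plain http.

```ruby
require 'net/http'
require 'uri'

# Assumed user agent string; substitute whatever the target sites accept.
USER_AGENT = 'Mozilla/5.0 (compatible; urltest.rb)'

# Hypothetical helper: build a GET request carrying a custom User-Agent header.
def build_request(uri, user_agent = USER_AGENT)
  request = Net::HTTP::Get.new(uri.request_uri)
  request['User-Agent'] = user_agent
  request
end

# Hypothetical helper: send the request over an explicit connection
# instead of going through Net::HTTP.get_response.
def get_with_user_agent(url, user_agent = USER_AGENT)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(build_request(uri, user_agent))
  end
end
```

In fetch, the call to Net::HTTP.get_response(URI.parse(url)) would then become response = get_with_user_agent(url), followed by the same case statement.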
What is your redirects_file?