@dfl
Last active December 7, 2023 19:56
ruby script for downloading kajabi course for archival purposes
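# This script requires the nokogiri gem, and wget and youtube-dl to be installed.
# It fetches each post's video or mp3 audio, and saves the page HTML for posts that have no media.
# Before running, export cookies.txt from a logged-in session using the Chrome plugin "Get cookies.txt".
# The course page source is saved as <last segment of course_url>.html (see Step 1).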
# To find the available quality options, run:
#   youtube-dl -v -F --cookies cookies.txt https://{KAJABI_COURSE}/categories/#####/posts/#####
quality = "iphone-360p"
cookies = "cookies.txt" # path to the exported cookie file
course_url = "https://www.MYCOURSE.COM"
domain = course_url.split("/")[2]
# Step 1. parse all the media URLs and titles
file = "#{course_url.split("/").last}.html"
`wget -qO- --load-cookies #{cookies} #{course_url} > #{file}`
require 'nokogiri'
f = File.read(file).gsub(/\n\s+/,'')
doc = Nokogiri::HTML.parse(f) do |config|
  config.noblanks
end
doc.css("style").remove
course_title = doc.css("h1").text
outline = doc.css(".product-outline")
categories = outline.css(".product-outline-category")
list = {}
categories.each_with_index do |cat, idx|
  posts = []
  title = "#{idx} - #{cat.text}"
  # A subcategory directly after the category heading becomes a nested folder.
  if subtitle = cat.css("+ .product-outline-subcategory").first
    title = { cat.text => subtitle.text }
  end
  post = (subtitle || cat).css("+ .product-outline-post").first
  next unless post # skip headings that are not followed by a post
  posts << [ post.css(".media-body").text.strip.gsub('"',''), post[:href] ]
  e = post
  # Walk the following siblings until neither a post nor a subcategory comes next.
  loop do
    if post = e.css("+ .product-outline-post").first
      posts << [ post.css(".media-body").text.strip.gsub('"',''), post[:href] ]
      e = post
    elsif subtitle = e.css("+ .product-outline-subcategory").first
      # A new subcategory starts: store the posts collected so far.
      list[title] = posts
      posts = []
      title = { cat.text => subtitle.text }
      e = subtitle
    else
      list[title] = posts
      break
    end
  end
end
# Step 2. fetch the files
require 'fileutils'
list.each_pair do |k, v|
  case k
  when String
    path = k
  when Hash
    path = k.to_a.join("/") # category/subcategory
  end
  path = [course_title, path].join("/") # prepend course title
  FileUtils.mkpath(path)
  v.each do |title, url|
    p path
    `youtube-dl -v -f #{quality}/#{quality}-0/mp3_audio --download-archive downloaded.txt --cookies #{cookies} -o "./#{path}/#{title}.%(ext)s" https://#{domain}#{url}`
    p $?.exitstatus
    unless $?.success? # not a video, get the page HTML instead
      `wget -qO- --load-cookies #{cookies} https://#{domain}#{url} > "./#{path}/#{title}.html"`
    end
  end
end
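Step 2 interpolates post titles directly into shell commands and file paths, so a title containing a slash, a quote, or a `$` could break a download or write to an unexpected location. Below is a minimal sketch of one way to guard against that, assuming Ruby's standard `shellwords` library. The helper names `safe_name` and `download_command` are hypothetical and not part of the original script, and the set of replaced characters is an assumption you may want to adjust.

```ruby
require 'shellwords'

# Hypothetical helper (not in the original script): replace characters that
# are awkward or invalid in file names with a dash.
def safe_name(title)
  title.gsub(%r{[/\\:*?"<>|]}, "-").strip
end

# Hypothetical helper: build the same youtube-dl invocation used in Step 2,
# but as a shell-escaped string, so spaces and quotes in the path or title
# cannot change how the shell parses the command.
def download_command(quality, cookies, path, title, page_url)
  output = File.join(".", path, "#{safe_name(title)}.%(ext)s")
  [
    "youtube-dl", "-v",
    "-f", "#{quality}/#{quality}-0/mp3_audio",
    "--download-archive", "downloaded.txt",
    "--cookies", cookies,
    "-o", output,
    page_url
  ].shelljoin
end
```

Inside the `v.each` loop, the backtick call could then become something like `system(download_command(quality, cookies, path, title, "https://#{domain}#{url}"))`, which still sets `$?` for the wget fallback check.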