Gist giocomai/247d54e097b5083e2451, created May 1, 2015 15:00
Download a webpage with PhantomJS from the command line. Unlike wget, this lets the page's JavaScript run before the page is saved. Download SaveWebpage.js, then run from the terminal: phantomjs SaveWebpage.js URL nameOfSavedFile
var system = require('system');
var fs = require('fs');
var page = require('webpage').create();
var url = system.args[1];          // page to download
var destination = system.args[2];  // file the HTML is written to

// Abort any single resource (script, image, ...) that takes longer than 10 s.
page.settings.resourceTimeout = 10000;

// Nested timers: workaround suggested in ariya/phantomjs#10832.
setTimeout(function () {
  // Fires 20 s (20000 ms) in, then every 20 s until phantom.exit() runs.
  setInterval(function () {
    // Reuse the page created above so resourceTimeout applies to it.
    page.open(url, function () {
      console.log(page.content);  // echo the HTML to stdout
      try {
        fs.write(destination, page.content, 'w');
      } catch (e) {
        console.log(e);
      }
      phantom.exit();
    });
  }, 20000);
}, 1);
This times out after 20 seconds (the 20000 passed to setInterval; times are in milliseconds) and saves whatever part of the page it managed to load by then, in case something blocks it midway. The structure is more complicated than it should need to be because it includes the workaround that makes the timeout actually work, as suggested here: ariya/phantomjs#10832 (comment)
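A possible variant, sketched here and not the gist author's code: instead of delaying page.open, it opens the page right away, gives the page's own JavaScript a fixed time to run after the load callback, and keeps a separate hard deadline that saves whatever page.content holds if page.open never calls back. The function name saveAfterDelay, its parameters, and the 5000/30000 ms values are hypothetical choices. The page, fs and exit dependencies are passed in so the logic can also be exercised outside PhantomJS. It has not been checked against the timer problem discussed in ariya/phantomjs#10832, so if the hard deadline never fires there, the nested-timer workaround may still be needed.

```javascript
// Hypothetical variant (not the original gist): open the page immediately,
// wait waitMs after the load callback for scripts to run, then save.
// A hard deadline saves whatever page.content holds if loading never finishes.
function saveAfterDelay(page, fs, exit, url, destination, waitMs, hardLimitMs) {
  var done = false;
  var hardTimer = null;

  function save() {
    if (done) { return; }  // save and exit only once
    done = true;
    clearTimeout(hardTimer);
    try {
      fs.write(destination, page.content, 'w');
    } catch (e) {
      console.log(e);
    }
    exit();
  }

  // Safety net in case page.open never calls back.
  hardTimer = setTimeout(save, hardLimitMs);

  page.open(url, function (status) {
    if (status !== 'success') {
      console.log('Could not load ' + url + ' (status: ' + status + ')');
    }
    // Give scripts that run after the load event time to finish.
    setTimeout(save, waitMs);
  });
}

// Only runs under PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  saveAfterDelay(require('webpage').create(), require('fs'),
                 function () { phantom.exit(); },
                 system.args[1], system.args[2],
                 5000,    // wait after load (ms), illustrative value
                 30000);  // hard deadline (ms), illustrative value
}
```

Saved under any name (for example SaveWebpageDelayed.js, a hypothetical file name), it is run the same way as the original: phantomjs SaveWebpageDelayed.js URL nameOfSavedFile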