@amferraz
Last active December 25, 2015 18:59
A simple template scraper
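# coding: utf-8
import requests
from lxml import html

# Download the home page and parse it into an element tree.
home = requests.get('http://www.submarino.com.br/')
home_tree = html.fromstring(home.text)
# Resolve relative hrefs so requests.get() below receives absolute URLs.
home_tree.make_links_absolute(home.url)

# Collect the href of every product link inside the element with id="tab".
products_links_xpath = '//*[@id="tab"]/div/ul/li/div/a/@href'
products_links = home_tree.xpath(products_links_xpath)

counter = 1
total = len(products_links)
for link in products_links:
    product_details = requests.get(link)
    # do something ...
    print("Page %d/%d" % (counter, total))
    counter += 1

print("Now I'm done!")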
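
The link-extraction step can be tried offline, without hitting the live site. The following is a minimal Python 3 sketch, not part of the original script: `SAMPLE_HTML`, the `/produto/...` paths, and the helper `extract_links` are hypothetical names made up for illustration, and the markup only mimics the structure that the XPath expression expects.

```python
from urllib.parse import urljoin

from lxml import html

# Hypothetical markup shaped to match the XPath used by the scraper.
SAMPLE_HTML = """
<html><body>
<div id="tab"><div><ul>
  <li><div><a href="/produto/1">One</a></div></li>
  <li><div><a href="http://www.submarino.com.br/produto/2">Two</a></div></li>
</ul></div></div>
</body></html>
"""


def extract_links(page_source, base_url, xpath):
    """Return the hrefs matched by `xpath`, resolved against `base_url`."""
    tree = html.fromstring(page_source)
    return [urljoin(base_url, href) for href in tree.xpath(xpath)]


links = extract_links(
    SAMPLE_HTML,
    'http://www.submarino.com.br/',
    '//*[@id="tab"]/div/ul/li/div/a/@href',
)

# enumerate() replaces the manual counter of the original loop.
for counter, link in enumerate(links, start=1):
    print("Page %d/%d: %s" % (counter, len(links), link))
```

Keeping the extraction in a function that takes the page source as a string makes it possible to check the XPath against a saved copy of the page before running the full crawl.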