@ArthurN
Created August 27, 2024 21:55
extruct-test.py
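import pprint

import extruct
import requests
from w3lib.html import get_base_url

# https://www.crosswaterlondon.com/product/mpro-towel-holder?variant=54381#image-2
url = 'https://www.signaturehardware.com/ruscello-widespread-bathroom-faucet/951334.html'

pp = pprint.PrettyPrinter(indent=0)
r = requests.get(url, timeout=30)
r.raise_for_status()  # fail on an HTTP error instead of parsing an error page

# Base URL used to resolve relative URLs found in the page's markup.
base_url = get_base_url(r.text, r.url)
# Extract the embedded structured data; the result is a dict keyed by syntax.
data = extruct.extract(r.text, base_url=base_url)
pp.pprint(data)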
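
The printed dict can be large, so it may help to pull out only the entries of interest. The following is a minimal sketch, not part of the original script: it assumes the result has a "json-ld" key holding a list of parsed JSON-LD objects, and that product pages describe the item with a schema.org "Product" type. The helper name find_jsonld_items and the sample data are hypothetical, made up for illustration so the block runs without a network request.

```python
from typing import Any, Dict, List


def find_jsonld_items(data: Dict[str, Any], wanted_type: str) -> List[Dict[str, Any]]:
    """Return JSON-LD nodes whose @type matches wanted_type.

    Hypothetical helper: `data` is assumed to have the shape of an
    extruct result, i.e. a dict with a "json-ld" list of dicts.
    """
    matches = []
    for item in data.get("json-ld", []):
        if not isinstance(item, dict):
            continue
        # Some pages wrap several nodes in a single @graph container.
        graph = item.get("@graph", [item])
        if isinstance(graph, dict):
            graph = [graph]
        for node in graph:
            if not isinstance(node, dict):
                continue
            # @type may be a single string or a list of strings.
            types = node.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if wanted_type in types:
                matches.append(node)
    return matches


# Made-up sample in the assumed shape; real pages will differ.
sample = {
    "json-ld": [
        {"@context": "https://schema.org", "@type": "Product", "name": "Example faucet"},
        {
            "@context": "https://schema.org",
            "@graph": [
                {"@type": ["Product", "Thing"], "name": "Example towel holder"},
                {"@type": "BreadcrumbList"},
            ],
        },
    ],
    "microdata": [],
}

products = find_jsonld_items(sample, "Product")
print([p["name"] for p in products])
```

With the script above, the same helper could be called as find_jsonld_items(data, "Product"), though whether a given page exposes its product as JSON-LD at all is something to check in the printed output first.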