# 1. --- install system dependencies (sudo apt-get install)
$ sudo apt-get install python3 python-dev python3-dev \
> build-essential libssl-dev libffi-dev \
> libxml2-dev libxslt-dev \
> python3-pip
[sudo] password for scrapyuser:
Reading package lists... Done
# 2. --- install virtualenvwrapper (sudo pip install)
# see https://virtualenvwrapper.readthedocs.io/en/latest/install.html
# also check http://roundhere.net/journal/virtualenv-ubuntu-12-10/
$ sudo pip3 install virtualenvwrapper
Installing collected packages: virtualenv-clone, pbr, six, stevedore, virtualenv, virtualenvwrapper
Running setup.py install for virtualenv-clone ... done
Successfully installed pbr-1.10.0 six-1.10.0 stevedore-1.15.0 virtualenv-15.0.2 virtualenv-clone-0.2.6 virtualenvwrapper-4.7.1
You are using pip version 8.1.1, however version 8.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
$ workon
bash: workon: command not found
# 3. --- workon is not found yet: the virtualenvwrapper startup file must be sourced first
# see https://virtualenvwrapper.readthedocs.io/en/latest/install.html#shell-startup-file
$ source /usr/local/bin/virtualenvwrapper.sh
$ workon
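# to make workon available in every new shell, persist the setup in
# ~/.bashrc as the virtualenvwrapper install docs (linked above) describe
# (these are the default paths; adjust if yours differ):
$ echo 'export WORKON_HOME=$HOME/.virtualenvs' >> ~/.bashrc
$ echo 'source /usr/local/bin/virtualenvwrapper.sh' >> ~/.bashrc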
# 4. --- create a Python 3 virtual environment
# also see https://virtualenv.pypa.io/en/stable/reference/#virtualenv-command for options
$ mkvirtualenv --python=python3 scrapy.py3
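# (optional check, not in the original session) the prompt now shows the
# environment name, and python resolves to Python 3 (version inferred from
# the report in step 7):
(scrapy.py3) $ python --version
Python 3.5.1+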
# 5. --- install scrapy in the virtualenv
$ pip install scrapy
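# (optional check, not in the original session) confirm scrapy imports
# inside the virtualenv; the version matches the banner in the next step:
(scrapy.py3) $ python -c 'import scrapy; print(scrapy.__version__)'
1.1.0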
# 6. --- test scrapy commands
$ scrapy
Scrapy 1.1.0 - no active project
Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  commands
  fetch         Fetch a URL using the Scrapy downloader
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command
# 7. --- check that you're running Scrapy with Python 3
(scrapy.py3) scrapyuser@8fb08da8f18b:/$ scrapy version -v
Scrapy    : 1.1.0
lxml      : 3.6.0.0
libxml2   : 2.9.3
Twisted   : 16.2.0
Python    : 3.5.1+ (default, Mar 30 2016, 22:46:26) - [GCC 5.3.1 20160330]
pyOpenSSL : 16.0.0 (OpenSSL 1.0.2g-fips 1 Mar 2016)
Platform  : Linux-4.4.0-24-generic-x86_64-with-Ubuntu-16.04-xenial
# 8. --- test scrapy shell
(scrapy.py3) scrapyuser@8fb08da8f18b:/$ scrapy shell http://scrapy.org
2016-06-17 11:05:49 [scrapy] INFO: Scrapy 1.1.0 started (bot: scrapybot)
2016-06-17 11:05:49 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2016-06-17 11:05:49 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats']
2016-06-17 11:05:50 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-06-17 11:05:50 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-06-17 11:05:50 [scrapy] INFO: Enabled item pipelines:
[]
2016-06-17 11:05:50 [scrapy] INFO: Spider opened
2016-06-17 11:05:50 [scrapy] DEBUG: Crawled (200) <GET http://scrapy.org> (referer: None)
[s] Available Scrapy objects:
[s] crawler <scrapy.crawler.Crawler object at 0x7f5d174d65c0>
[s] item {}
[s] request <GET http://scrapy.org>
[s] response <200 http://scrapy.org>
[s] settings <scrapy.settings.Settings object at 0x7f5d1b7b1390>
[s] spider <DefaultSpider 'default' at 0x7f5d166f2a90>
[s] Useful shortcuts:
[s] shelp() Shell help (print this help)
[s] fetch(req_or_url) Fetch request (or URL) and update local objects
[s] view(response) View response in a browser
>>> response.xpath('//h1')
[]
>>> response.xpath('//title').extract_first()
'<title>Scrapy | A Fast and Powerful Scraping and Web Crawling Framework</title>'
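>>> # (follow-up, not in the original session) text() pulls out just the
>>> # title's text; response.css is the equivalent CSS selector API
>>> response.xpath('//title/text()').extract_first()
'Scrapy | A Fast and Powerful Scraping and Web Crawling Framework'
>>> response.css('title::text').extract_first()
'Scrapy | A Fast and Powerful Scraping and Web Crawling Framework'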
>>>
# 9. --- summary: installing Scrapy 1.1 on Ubuntu 16.04 on Python 3, using virtualenvwrapper
# if bash still resolves pip to a stale cached path after the install,
# clear the shell's command hash cache and pip works again:
hash -r
1. sudo apt-get install python3 python-dev python3-dev build-essential libssl-dev libffi-dev libxml2-dev libxslt-dev python3-pip
2. sudo pip3 install virtualenvwrapper
3. workon                                        # if the command is not found, go to step 4
4. source /usr/local/bin/virtualenvwrapper.sh    # (or) source ~/.local/bin/virtualenvwrapper.sh
5. workon                                        # should now succeed
6. mkvirtualenv --python=python3 scrapy.py3      # create a virtual environment for the scrapy project
7. pip install scrapy                            # run inside the activated virtualenv
8. scrapy
9. scrapy version -v
10. scrapy shell http://scrapy.org
11. deactivate                                   # leave the virtual environment
12. rmvirtualenv scrapy.py3                      # delete the virtual environment
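# (illustrative wrap-up, assuming the quotes_spider.py sketch from step 6)
# a typical end-to-end session once the environment is set up:
(scrapy.py3) $ scrapy runspider quotes_spider.py -o quotes.json
(scrapy.py3) $ deactivate
$ rmvirtualenv scrapy.py3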