A library that wraps BeautifulSoup to provide multi-threaded scraping, reducing the total time the scraping process takes. The library should implement the following effectively (this list can be extended in the future):
- Parallelization
- Should have a generic interface that maps to BeautifulSoup
- All parts of the library should be documented heavily
- All parts of the library should have unit tests written for verification of their functionality
- Showcase written examples for different sorts of scraping
- https://www.xspdf.com/resolution/50178900.html
- https://gist.github.com/ratchetwrench/ccdaedabce6836ef8ab167521beb8daa
- https://medium.com/datadriveninvestor/speed-up-web-scraping-using-multiprocessing-in-python-af434ff310c5
- https://stackoverflow.com/questions/59880583/how-to-implement-multiprocessing-in-my-beautifulsoup-webscraper
- http://blog.adnansiddiqi.me/how-to-speed-up-your-python-web-scraper-by-using-multiprocessing/
- https://testdriven.io/blog/building-a-concurrent-web-scraper-with-python-and-selenium/
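
The parallelization goal above could be sketched roughly as follows. This is a minimal illustration and not the library's actual API: the names `fetch_html`, `make_soup`, and `scrape_all` are hypothetical, and a thread pool from the standard library is assumed as the concurrency mechanism. The fetch and parse steps are passed in as parameters so that unit tests can replace them with stubs and run without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_html(url, timeout=10.0):
    """Download a page and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def make_soup(html):
    """Parse HTML with BeautifulSoup (hypothetical helper).

    The import is done lazily so that callers who pass their own
    parser do not need bs4 installed.
    """
    from bs4 import BeautifulSoup

    return BeautifulSoup(html, "html.parser")


def scrape_all(urls, fetch=fetch_html, parse=make_soup, max_workers=8):
    """Fetch and parse every URL on a thread pool (hypothetical entry point).

    Returns a dict mapping each URL to its parsed result, or to the
    exception raised while fetching or parsing it. Duplicate URLs
    collapse into a single entry.
    """

    def job(url):
        return parse(fetch(url))

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the downloads overlap.
        futures = {url: pool.submit(job, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:
                # One failing page should not abort the whole batch.
                results[url] = exc
    return results
```

With the defaults, `scrape_all(["https://example.com"])` would return a dict whose values are `BeautifulSoup` objects, or exceptions for pages that failed. Threads are assumed here because fetching is I/O-bound; for parse-heavy workloads a process pool, as discussed in several of the links above, may be the better fit.
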
Some papers: