Last active
March 25, 2020 02:41
IamRifki's Ecosia Scraper.
/**
 * ecosia-scraper (c) MIT Dania Rifki <iamrifki0@gmail.com>
 *
 * # Prelude:
 * Back in 2019, I was introduced to the Ecosia search engine.
 * I never really liked it; it was pretty terrible, in my opinion.
 * Search results are pretty eh, and image results frequently turn up unwanted images.
 *
 * In 2020, I decided to use it for an image scraping project. Why, you ask?
 * Since nobody uses it anymore, search results come back faster than Bing's,
 * despite it using the same engine as Bing.
 * Plus, the image results markup is surprisingly readable,
 * which is why it only took me a few minutes to write this.
 *
 * The image source output of Ecosia looks like this:
 * ```
 * <a
 *     class="image-result js-image-result js-infinite-scroll-item"
 *     style="background-color:#C19E0A; -ms-flex-positive: 1.781954887218045; flex: 1.781954887218045;"
 *     href="https://i.ytimg.com/vi/btR7RBlXy7A/maxresdefault.jpg"
 *     data-image-id="EEE6638481CBC1B79506F1BD4BB3E6047AD1DC1C"
 *     data-src="https://tse4.mm.bing.net/th?id=OIP.oSlHwY9Yf55r-dYp50AEZgHaEK&pid=Api"
 *     target="_blank"
 * >
 * ```
 *
 * Google's markup, by comparison, is COMPLETELY unparsable;
 * it's a mess to figure out.
 *
 * So TL;DR:
 * I took advantage of a dead search engine and used it
 * to make one of the simplest image scrapers that actually still works today.
 *
 * # Recommended Requirements (may still work with earlier or later versions of the dependencies):
 *  * axios 0.19
 *  * node-html-parser 1.2
 *
 * # Examples:
 *
 * ## Get all possible results and console.log them.
 * ```
 * const scrape = require("./ecosia.js");
 *
 * async function example() {
 *     const results = (await scrape("Banana")).map((img) => img.attributes.href);
 *     console.log(results);
 * }
 * ```
 *
 * ## Get a random result and console.log it.
 * ```
 * const scrape = require("./ecosia.js");
 *
 * async function example() {
 *     const rawResults = await scrape("Banana");
 *     const result = rawResults[Math.floor(Math.random() * rawResults.length)].attributes.href;
 *     console.log(result);
 * }
 * ```
 */
"use strict";
const axios = require("axios");
const { parse } = require("node-html-parser");
/**
 * Scrapes Ecosia image search and returns every image result it finds.
 *
 * @param {string} keyword Keyword that's used to query Ecosia.
 * @return {Promise<Array>} The parsed `a.image-result` elements; read `attributes.href` for the image URL.
 */
module.exports = async function scrape(keyword) {
    // Built outside the try block so the catch handler can still reference it.
    const url = `https://ecosia.org/images?q=${encodeURIComponent(keyword)}`;
    try {
        // Request the page
        const response = await axios.get(url);
        // Parse the response
        const dom = parse(response.data);
        // Each image result is an <a class="image-result"> element
        return dom.querySelectorAll("a.image-result");
    } catch (err) {
        // On failure the function logs the error and resolves to undefined
        console.error(`[Info] Error: Unable to GET ${url}`);
        console.error(err);
    }
};
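
The two examples in the header comment both read `attributes.href` from the returned elements. As a rough sketch of how that post-processing could be wrapped up, the helpers below turn the result list into plain objects and pick one at random. `toImageLinks` and `pickRandom` are hypothetical names that are not part of the gist, and the code assumes each element exposes an `attributes` object with `href` and `data-src`, as in the sample markup above. It runs on stand-in data, so it needs no network access or dependencies.

```javascript
"use strict";

// Hypothetical helper (not part of the gist): map the elements returned by
// scrape() to plain objects. Assumes each element has an `attributes` object.
// Falls back to an empty list, since scrape() resolves to undefined on error.
function toImageLinks(elements) {
    return (elements || [])
        .map((el) => ({
            full: el.attributes.href,             // original image URL
            thumbnail: el.attributes["data-src"], // thumbnail URL
        }))
        .filter((link) => Boolean(link.full));    // drop results without an href
}

// Hypothetical helper: return one random entry, or undefined for an empty list.
function pickRandom(items) {
    if (items.length === 0) return undefined;
    return items[Math.floor(Math.random() * items.length)];
}

// Stand-in data shaped like the parsed <a class="image-result"> elements.
const sample = [
    {
        attributes: {
            href: "https://i.ytimg.com/vi/btR7RBlXy7A/maxresdefault.jpg",
            "data-src": "https://tse4.mm.bing.net/th?id=OIP.oSlHwY9Yf55r-dYp50AEZgHaEK&pid=Api",
        },
    },
    { attributes: {} }, // no href, so it is filtered out
];

const links = toImageLinks(sample);
console.log(links.length);
console.log(pickRandom(links).full);
```

With the real scraper, `toImageLinks(await scrape("Banana"))` would be the equivalent call. Guarding against an empty list matters because indexing an empty array with a random index yields undefined, and reading `.attributes` from that would throw.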
The HTML file that I want to scrape looks a bit like this: https://pastebin.com/CSJ82fUa