@tilakpatidar
Last active May 17, 2016 12:44
Resource links for the workshop conducted by NLP lab on 17th May 2016.

# Semantic Search Engine – NLP lab

### 17/05/2016 Task

http://blog.aiesec.in/

Scrape the following information from the blog above and print it to the terminal (Ubuntu) or write it to a file (Windows).

1. Heading of each blog article.
2. Posting date of each article.
3. Comments (if any) on each article.
4. Image URL for each article.
5. Each page contains multiple blog snippets; we want all of them.

Hints: search for python-bs4 and python-urllib2 and you will get the idea.
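A rough sketch of the parsing step, in case it helps to get started. It uses only the standard library's `html.parser` so it runs without installing anything; python-bs4 from the hint would make the selection part shorter. The markup in `SAMPLE_PAGE` and the names `SnippetParser` and `parse_snippets` are assumptions made up for illustration. Inspect the real blog page in the browser and adjust the tags to match.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real blog's tags and class names are not known
# here, so check them in the browser's inspector and substitute them.
SAMPLE_PAGE = """
<article>
  <h2 class="entry-title"><a href="/post-1">First post</a></h2>
  <time datetime="2016-05-01">May 1, 2016</time>
  <img src="http://example.com/a.jpg">
</article>
<article>
  <h2 class="entry-title"><a href="/post-2">Second post</a></h2>
  <time datetime="2016-05-09">May 9, 2016</time>
</article>
"""


class SnippetParser(HTMLParser):
    """Collect heading, date and first image URL of every <article>."""

    def __init__(self):
        super().__init__()
        self.snippets = []        # one dict per <article>
        self._current = None      # snippet being filled, None outside <article>
        self._in_heading = False  # True while inside an <h2>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "article":
            self._current = {"heading": "", "date": None, "image": None}
        elif self._current is None:
            return  # ignore everything outside an <article>
        elif tag == "h2":
            self._in_heading = True
        elif tag == "time":
            self._current["date"] = attrs.get("datetime")
        elif tag == "img" and self._current["image"] is None:
            self._current["image"] = attrs.get("src")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_heading = False
        elif tag == "article" and self._current is not None:
            self._current["heading"] = self._current["heading"].strip()
            self.snippets.append(self._current)
            self._current = None

    def handle_data(self, data):
        # Heading text may arrive in several chunks, so accumulate it.
        if self._in_heading and self._current is not None:
            self._current["heading"] += data


def parse_snippets(html):
    """Return a list of dicts, one per blog snippet found in the HTML."""
    parser = SnippetParser()
    parser.feed(html)
    return parser.snippets


if __name__ == "__main__":
    for snippet in parse_snippets(SAMPLE_PAGE):
        print(snippet)
```

To run it against the live blog, fetch the page first (in Python 3, `urllib.request.urlopen` replaces urllib2) and pass the decoded HTML to `parse_snippets`. Comments usually sit on each article's own page, so collecting them would likely need a second fetch per article.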

Please refer to this image:

https://drive.google.com/open?id=0B0Mf1CuHV_44N0pPWERMSGhHNk0

Install LAMP server on Ubuntu

 sudo apt-get update
 sudo apt-get install lamp-server^
 sudo apt-get install phpmyadmin
 sudo php5enmod mcrypt
 sudo service apache2 restart

You can now access the web interface by visiting your server's domain name or public IP address followed by /phpmyadmin:

http://localhost/phpmyadmin

For installing phpMyAdmin, visit https://www.digitalocean.com/community/tutorials/how-to-install-and-secure-phpmyadmin-on-ubuntu-14-04

Building your own search engine

http://www.aossama.com/search-engine-with-apache-nutch-mongodb-and-elasticsearch/

Basic SQL tutorial links

http://www.atlasindia.com/sql.htm

http://www.tutorialspoint.com/sql/sql-rdbms-concepts.htm

https://www.ntu.edu.sg/home/ehchua/programming/sql/MySQL_Beginner.html

https://www.mysql.com/

http://arachnoid.com/MySQL/

SQL vs NoSQL

http://www.sitepoint.com/sql-vs-nosql-differences/SQL%20Schema%20vs%20NoSQL%20Schemaless

https://www.youtube.com/watch?v=rRoy6I4gKWU

https://www.dezyre.com/article/nosql-vs-sql-4-reasons-why-nosql-is-better-for-big-data-applications/86

Elasticsearch

http://filepi.com/i/TNUdofj

http://joelabrahamsson.com/elasticsearch-101/

http://www.elasticsearchtutorial.com/

https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

PyMongo images shown by Anmol, and Elasticsearch presentations

https://github.com/anmolsachan/NLPWorkshop/archive/master.zip

MongoDB

http://www.tutorialspoint.com/mongodb/

https://docs.mongodb.com/v3.0/tutorial/

https://university.mongodb.com/?jmp=docs%2F&_ga=1.226181715.87059799.1463421535

https://www.youtube.com/watch?v=W-WihPoEbR4

Presentation Links

NoSQL vs SQL ppt

https://docs.google.com/presentation/d/1suCFgqFw5LX1bYEKOpLUyYk9ksTJEh1ATpXupO2ePyQ/pub?start=true&loop=false&delayms=3000

Apache Nutch Web Crawling

https://docs.google.com/presentation/d/1c4k_39SuLl8GiNSjhnHqX-_qWFTZR4XNH6Ay9z7Iczg/pub?start=false&loop=false&delayms=3000

# Hadoop

http://bradhedlund.com/2011/09/10/understanding-hadoop-clusters-and-the-network/

http://www.plottingsuccess.com/hadoop-101-important-terms-explained-0314/

http://www.tutorialspoint.com/hadoop/

http://bigdatauniversity.com/courses/hadoop-course/

Apache Ambari

http://ambari.apache.org/install.html

http://www.edupristine.com/blog/hadoop-installation-using-ambari

Apache Nutch

https://gist.github.com/xrstf/b48a970098a8e76943b9

Apache HBase

http://www.tutorialspoint.com/hbase/

https://blog.cloudera.com/blog/2013/04/how-scaling-really-works-in-apache-hbase/

http://www.slideshare.net/anilgupta84/introduction-to-hbase-52714951?qid=ee0ae441-9ecd-4c57-947b-170995775338&v=&b=&from_search=1

http://hbase.apache.org/0.94/book/quickstart.html

Technologies Discussed

http://nutch.apache.org/

http://tika.apache.org/

https://ambari.apache.org/

http://hadoop.apache.org/

https://www.mongodb.com/

https://www.elastic.co/

https://hbase.apache.org/

https://www.mysql.com/
