sudo apt install python3
sudo apt install python3-pip
sudo add-apt-repository "deb [arch=amd64] https://packages.microsoft.com/repos/vscode stable main"
sudo apt update
sudo apt install code
sudo apt-get install -y cmake libfreetype6-dev libfontconfig1-dev xclip
# Do this on every node of the cluster
curl -O http://www.congiu.net/hive-json-serde/1.3.8/hdp23/json-serde-1.3.8-jar-with-dependencies.jar
sudo cp json-serde-1.3.8-jar-with-dependencies.jar /usr/lib/presto/plugin/hive-hadoop2/
sudo chown presto:presto /usr/lib/presto/plugin/hive-hadoop2/json-serde-1.3.8-jar-with-dependencies.jar
# Restart Presto
sudo restart presto-server
#!/usr/bin/python
from tweepy import Stream, OAuthHandler
from tweepy.streaming import StreamListener
from progressbar import ProgressBar, Percentage, Bar
import json
import sys
# Twitter app information
consumer_secret='Your consumer secret'
function csv2db() {
  # Import $1.csv into table $1 of $1.db, then open the database interactively
  printf '.mode csv\n.import %s.csv %s\n' "$1" "$1" | sqlite3 "$1.db" && \
  sqlite3 -header -column "$1.db"
}
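The shell function above relies on the `sqlite3` command-line tool's `.import`. As a rough sketch of the same idea in Python's standard library, the block below loads a CSV file into a new table. The names `csv_to_db` and `quote_ident` are hypothetical helpers, not part of the original snippet, and the sketch assumes the first CSV row holds the column names, every row has the same number of fields, and all columns are stored as `TEXT`.

```python
import csv
import sqlite3


def quote_ident(name):
    """Quote an SQLite identifier so unusual table/column names stay valid."""
    return '"' + name.replace('"', '""') + '"'


def csv_to_db(csv_path, db_path, table):
    """Load a CSV file whose first row holds column names into a new table.

    Hypothetical helper: assumes the table does not exist yet and that every
    data row has as many fields as the header row.
    """
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]

    columns = ", ".join(quote_ident(c) + " TEXT" for c in header)
    placeholders = ", ".join("?" for _ in header)

    con = sqlite3.connect(db_path)
    with con:  # commits on success, rolls back on error
        con.execute("CREATE TABLE {} ({})".format(quote_ident(table), columns))
        con.executemany(
            "INSERT INTO {} VALUES ({})".format(quote_ident(table), placeholders),
            data,
        )
    return con
```

A call such as `csv_to_db("people.csv", "people.db", "people")` (file and table names chosen only for illustration) returns an open connection that can then be queried with `con.execute(...)`.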
val docs = sc.textFile("/opt/dataset/don-quijote.txt.gz")
val lower = docs.map(line => line.toLowerCase)
val words = lower.flatMap(line => line.split("\\s+"))
val counts = words.map(word => (word, 1))
val freq = counts.reduceByKey(_ + _)
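To make the result of the pipeline above concrete without a Spark cluster, here is a minimal single-machine sketch in plain Python that performs the same steps (lower-case, split on whitespace, sum a count per word). The function name `word_frequencies` and the sample lines are illustrative assumptions. One difference to keep in mind: `str.split()` drops empty tokens, whereas `split("\\s+")` in the Scala version can yield an empty string for blank lines or lines with leading whitespace.

```python
from collections import Counter


def word_frequencies(lines):
    """Count lower-cased, whitespace-separated words across an iterable of lines."""
    counts = Counter()
    for line in lines:
        # Equivalent of map(toLowerCase) followed by flatMap(split on whitespace)
        for word in line.lower().split():
            # Equivalent of map(word => (word, 1)) plus reduceByKey(_ + _)
            counts[word] += 1
    return counts


# Illustrative input only; the original reads /opt/dataset/don-quijote.txt.gz
sample = [
    "En un lugar de la Mancha",
    "de cuyo nombre no quiero acordarme",
]
freq = word_frequencies(sample)
```

Here `freq["de"]` is 2, because "de" appears once in each sample line.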
yarn logs -applicationId <application_id>
# e.g.
yarn logs -applicationId application_1424284032717_0066
# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=WARN
import com.google.gson.Gson
import org.apache.spark.streaming.twitter.TwitterUtils
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.storage.StorageLevel
import scala.io.Source
import scala.collection.mutable.HashMap
import java.io.File
import org.apache.log4j.Logger
import org.apache.log4j.Level
#!/usr/bin/env bash
USER_NAME=hbd
USER_HOME="/home/$USER_NAME"
# Create the working directory (no error if it already exists) and enter it
mkdir -p "$USER_HOME/twitter4j"
cd "$USER_HOME/twitter4j" || exit 1
# Get the Spark Streaming JAR.
curl -O "http://central.maven.org/maven2/org/apache/spark/spark-streaming-twitter_2.10/1.5.0/spark-streaming-twitter_2.10-1.5.0.jar"