Marek Wiewiórka mwiewior

Cache Oblivious and Cache Aware Data Structures and Algorithms

Cache-Oblivious Algorithms and Data Structures - Erik Demaine (One of the earliest papers on cache-oblivious data structures and algorithms; it introduces the cache-oblivious model in detail and examines static and dynamic cache-oblivious data structures built between 2000 and 2003)
Cache Oblivious B-Trees - Bender, Demaine, Farach-Colton (This paper presents two dynamic search trees attaining near-optimal performance on any hierarchical memory. It is one of the fundamental papers in the field: both search trees match the optimal search bound of Θ(1 + log_{B+1} N) memory transfers)
Cache Oblivious Search Trees via Binary Trees of Small Height - Brodal, Fagerberg, Jacob (The data structure discussed in this paper works on the version of [2] but avoids the use o

Download shiro-core-1.2.5.jar Apache Shiro Downloads
Download shiro-web-1.2.5.jar Apache Shiro Downloads
Note the location of the JAR files and shiro.ini. I placed them in the root of my Spark download
Update the spark-env.sh file with the Shiro JARs and add an entry for the path where shiro.ini resides
Start the Spark master: sbin/start-master.sh
Navigate to the Spark master dashboard
Authenticate with credentials in shiro.ini

Note: this was developed and tested with Apache Spark 1.4.1, but should work with newer versions as well.
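The spark-env.sh step above could look roughly like the following. This is a minimal sketch under stated assumptions, not the author's actual configuration: the install directory `/opt/spark` and the variable names `SPARK_ROOT` and `SHIRO_CLASSPATH` are hypothetical, and wiring Shiro in through `SPARK_CLASSPATH` plus the `spark.ui.filters` property with Shiro's `IniShiroFilter` is an assumed mechanism that the notes above do not spell out.

```shell
# Hypothetical location of the Spark download; adjust to your install.
SPARK_ROOT="${SPARK_ROOT:-/opt/spark}"

# The two Shiro JARs from the download steps, plus the directory that holds
# shiro.ini so the file can be found on the classpath (assumed mechanism).
SHIRO_CLASSPATH="${SPARK_ROOT}/shiro-core-1.2.5.jar:${SPARK_ROOT}/shiro-web-1.2.5.jar:${SPARK_ROOT}"

# Prepend to any existing SPARK_CLASSPATH (read by Spark 1.x launch scripts).
export SPARK_CLASSPATH="${SHIRO_CLASSPATH}${SPARK_CLASSPATH:+:${SPARK_CLASSPATH}}"

# Assumption: register Shiro's servlet filter on the master web UI only.
export SPARK_MASTER_OPTS="${SPARK_MASTER_OPTS:-} -Dspark.ui.filters=org.apache.shiro.web.servlet.IniShiroFilter"
```

Scoping the filter to `SPARK_MASTER_OPTS` rather than a global default keeps the worker and application UIs unchanged; whether that is desirable depends on the deployment.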

	#!/usr/bin/env bash

	#####################################################################
	# REFERENCES
	# - https://cloud.google.com/run/docs/multiple-regions
	# - https://cloud.google.com/compute/docs/instance-groups/distributing-instances-with-regional-instance-groups
	# - https://cloud.google.com/load-balancing/docs/https/setup-global-ext-https-compute
	# - https://cloud.google.com/load-balancing/docs/backend-service#named_ports
	#####################################################################

	#!/usr/bin/env bash

	export SPARK_HOME="${SPARK_HOME:-/usr/lib/spark2}"
	export SPARK_CONF_DIR="${SPARK_CONF_DIR:-${SPARK_HOME}/conf}"

	# Quote paths so directories containing spaces do not break the script
	source "${SPARK_HOME}/bin/load-spark-env.sh"
	export HIVE_CONF_DIR="${SPARK_CONF_DIR}"
	export HADOOP_CONF_DIR=/etc/hadoop/conf

	AMMONITE=~/bin/amm # This is amm binary release 2.11-1.6.7

	// ./spark-shell -v --master yarn-client --driver-memory 1G --executor-memory 2G --executor-cores 2 \
	// --jars /tmp/apache-carbondata-1.6.0-SNAPSHOT-bin-spark2.3.2-hadoop2.7.2.jar \
	// --conf spark.hadoop.hive.metastore.uris=thrift://cdh01.cl.ii.pw.edu.pl:9083 \
	// --conf spark.hadoop.yarn.timeline-service.enabled=false \
	// --conf spark.driver.extraJavaOptions=-Dhdp.version=3.1.0.0-78 \
	// --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=3.1.0.0-78 \
	// --conf spark.hadoop.metastore.catalog.default=hive


	import org.apache.spark.sql.SparkSession

	spark-shell -v --master=local[$cores] --driver-memory=12g \
	  --conf "spark.sql.catalogImplementation=in-memory" \
	  --packages org.biodatageeks:bdg-sequila_2.11:0.5.3-spark-2.4.0-SNAPSHOT \
	  --repositories http://repo.hortonworks.com/content/repositories/releases/,http://zsibio.ii.pw.edu.pl/nexus/repository/maven-snapshots/

	import org.apache.spark.sql.SequilaSession
	import org.biodatageeks.utils.{SequilaRegister, UDFRegister, BDGInternalParams}

	val ss = SequilaSession(spark)
	SequilaRegister.register(ss)
	ss.sqlContext.setConf("spark.biodatageeks.bam.useGKLInflate","true")
	ss.sqlContext.setConf("spark.biodatageeks.bam.useSparkBAM","false")

	// This script is meant to be run in Ammonite.
	// Create the file catalyst_04.sc with this content.
	// Inside the Ammonite REPL shell, invoke it like this:
	// import $file.catalyst_04, catalyst_04._
	//
	// But first run the three commands below:
	// import coursier.MavenRepository
	// interp.repositories() ++= Seq(MavenRepository("file:/Users/admin/.m2/repository"))
	// import $ivy.`org.apache.spark::spark-sql:2.3.0`

	#!/usr/bin/env bash
	touch build.sbt README.md
	mkdir -p project src/{main,test}/{scala,resources,java}
	touch project/plugins.sbt