Marek Wiewiórka mwiewior
@mwiewior
mwiewior / gcp-global-lb-multi-region-cr-ce.sh
Created August 26, 2024 11:02 — forked from mikesparr/gcp-global-lb-multi-region-cr-ce.sh
Demonstrating how you can deploy Cloud Run (serverless) or Compute Engine instance groups across regions and balance with global load balancer
#!/usr/bin/env bash
#####################################################################
# REFERENCES
# - https://cloud.google.com/run/docs/multiple-regions
# - https://cloud.google.com/compute/docs/instance-groups/distributing-instances-with-regional-instance-groups
# - https://cloud.google.com/load-balancing/docs/https/setup-global-ext-https-compute
# - https://cloud.google.com/load-balancing/docs/backend-service#named_ports
#####################################################################
@mwiewior
mwiewior / cache-oblivious.md
Created February 17, 2024 15:03 — forked from debasishg/cache-oblivious.md
Papers related to cache oblivious data structures

Cache Oblivious and Cache Aware Data Structure and Algorithms

  1. Cache-Oblivious Algorithms and Data Structures - Erik Demaine (One of the earliest papers on cache-oblivious data structures and algorithms; it introduces the cache-oblivious model in detail and examines static and dynamic cache-oblivious data structures built between 2000 and 2003)

  2. Cache Oblivious B-Trees - Bender, Demaine, Farach-Colton (This paper presents two dynamic search trees attaining near-optimal performance on any hierarchical memory. One of the fundamental papers in the field: both search trees discussed match the optimal search bound of Θ(1 + log_{B+1} N) memory transfers)

  3. Cache Oblivious Search Trees via Binary Trees of Small Height - Brodal, Fagerberg, Jacob (The data structure discussed in this paper works on the version of [2] but avoids the use o
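
To give a feel for the search bound quoted in [2], here is a small worked instance. The values of N and B are illustrative assumptions, not figures from the papers, and the Θ notation hides constant factors, so the counts indicate scale only.

```latex
% Assumed for illustration: N = 2^{30} keys, B = 1023 keys per memory block.
\[
  1 + \log_{B+1} N \;=\; 1 + \log_{1024} 2^{30} \;=\; 1 + \frac{30}{10} \;=\; 4
\]
% For comparison, binary search over a plain sorted array costs
% roughly \log_2 (N/B) transfers:
\[
  \log_2 \frac{N}{B} \;=\; \log_2 \frac{2^{30}}{1023} \;\approx\; 20
\]
```

Under these assumptions a search touches on the order of 4 blocks rather than about 20, without the structure needing to know B.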

@mwiewior
mwiewior / spark-amm.sh
Created July 9, 2020 15:45 — forked from ottomata/spark-amm.sh
spark + ammonite
#!/usr/bin/env bash
export SPARK_HOME="${SPARK_HOME:-/usr/lib/spark2}"
export SPARK_CONF_DIR="${SPARK_CONF_DIR:-"${SPARK_HOME}"/conf}"
source "${SPARK_HOME}/bin/load-spark-env.sh"
export HIVE_CONF_DIR=${SPARK_CONF_DIR}
export HADOOP_CONF_DIR=/etc/hadoop/conf
AMMONITE=~/bin/amm # This is amm binary release 2.11-1.6.7
@mwiewior
mwiewior / README.md
Created July 1, 2020 12:23 — forked from bradfordcp/README.md
Setting up Apache Spark to use Apache Shiro for authentication of Spark Master dashboard.

Securing Apache Spark with Apache Shiro

  1. Download shiro-core-1.2.5.jar Apache Shiro Downloads
  2. Download shiro-web-1.2.5.jar Apache Shiro Downloads
  3. Note the location of the JAR files and shiro.ini. I placed them in the root of my Spark download
  4. Update the spark-env.sh file with the Shiro JARs and add an entry for the path where shiro.ini resides
  5. Start the Spark master with sbin/start-master.sh
  6. Navigate to the Spark master dashboard
  7. Authenticate with credentials in shiro.ini

Note: this was developed and tested with Apache Spark 1.4.1, but should work with newer versions as well.
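
Steps 3 and 4 above could be sketched roughly as follows. This is a minimal sketch, not the exact contents of the original spark-env.sh: the helper name `shiro_env_lines` is hypothetical, and it assumes that the Spark version in use still honours `SPARK_CLASSPATH`, that the UI filter is wired in through the `spark.ui.filters` property via `SPARK_MASTER_OPTS`, and that the filter class is `org.apache.shiro.web.servlet.IniShiroFilter` from shiro-web, which looks for shiro.ini on the classpath.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints lines that could be appended to conf/spark-env.sh.
# Assumptions (not taken from the steps above): SPARK_CLASSPATH is honoured by
# the Spark version in use, and IniShiroFilter is the servlet filter to register.
shiro_env_lines() {
  local shiro_dir="$1"          # directory holding both JARs and shiro.ini
  local version="${2:-1.2.5}"   # Shiro version; defaults to the one named in steps 1 and 2
  # The directory itself is added last so that shiro.ini is found on the classpath.
  local cp="${shiro_dir}/shiro-core-${version}.jar:${shiro_dir}/shiro-web-${version}.jar:${shiro_dir}"
  printf 'export SPARK_CLASSPATH="%s"\n' "$cp"
  printf 'export SPARK_MASTER_OPTS="-Dspark.ui.filters=%s"\n' \
    "org.apache.shiro.web.servlet.IniShiroFilter"
}

# Preview the lines for a Spark download unpacked in /opt/spark (example path).
shiro_env_lines /opt/spark
```

Review the printed lines before appending them to spark-env.sh; newer Spark releases may need the classpath to be supplied through a different mechanism.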

@mwiewior
mwiewior / carbon.scala
Created July 31, 2019 17:12 — forked from agaszmurlo/carbon.scala
Carbon data varia
// ./spark-shell -v --master yarn-client --driver-memory 1G --executor-memory 2G --executor-cores 2 \
// --jars /tmp/apache-carbondata-1.6.0-SNAPSHOT-bin-spark2.3.2-hadoop2.7.2.jar \
// --conf spark.hadoop.hive.metastore.uris=thrift://cdh01.cl.ii.pw.edu.pl:9083 \
// --conf spark.hadoop.yarn.timeline-service.enabled=false \
// --conf spark.driver.extraJavaOptions=-Dhdp.version=3.1.0.0-78 \
// --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=3.1.0.0-78 \
// --conf spark.hadoop.metastore.catalog.default=hive
import org.apache.spark.sql.SparkSession
// spark-shell -v --master=local[$cores] --driver-memory=12g --conf "spark.sql.catalogImplementation=in-memory" --packages org.biodatageeks:bdg-sequila_2.11:0.5.3-spark-2.4.0-SNAPSHOT --repositories http://repo.hortonworks.com/content/repositories/releases/,http://zsibio.ii.pw.edu.pl/nexus/repository/maven-snapshots/
import org.apache.spark.sql.SequilaSession
import org.biodatageeks.utils.{SequilaRegister, UDFRegister,BDGInternalParams}
val ss = SequilaSession(spark)
SequilaRegister.register(ss)
ss.sqlContext.setConf("spark.biodatageeks.bam.useGKLInflate","true")
ss.sqlContext.setConf("spark.biodatageeks.bam.useSparkBAM","false")
@mwiewior
mwiewior / scala-sbt-project-structure.sh
Created June 1, 2018 16:15 — forked from WarFox/scala-sbt-project-structure.sh
Script to create Scala SBT project directory structure
#!/usr/bin/env bash
mkdir -p project src/{main,test}/{scala,resources,java}
touch build.sbt README.md project/plugins.sbt
@mwiewior
mwiewior / map-pushdow.sc
Created April 20, 2018 19:15 — forked from joao-parana/map-pushdow.sc
Using Catalyst Extension Points in Spark
// This script is meant to run in Ammonite.
// Create the file catalyst_04.sc with this content.
// Inside the Ammonite REPL shell, invoke it like this:
// import $file.catalyst_04, catalyst_04._
//
// But first run the three commands below:
// import coursier.MavenRepository
// interp.repositories() ++= Seq(MavenRepository("file:/Users/admin/.m2/repository"))
// import $ivy.`org.apache.spark::spark-sql:2.3.0`