Skip to content

Instantly share code, notes, and snippets.

View franchb's full-sized avatar
🎯
Focusing

Eliah Rusin franchb

🎯
Focusing
View GitHub Profile
@franchb
franchb / README.md
Created April 20, 2020 08:07 — forked from valyala/README.md
Optimizing postgresql table for more than 100K inserts per second

Optimizing postgresql table for more than 100K inserts per second

  • Create UNLOGGED table. This reduces the amount of data written to persistent storage by up to 2x.
  • Set WITH (autovacuum_enabled=false) on the table. This saves CPU time and IO bandwidth on useless vacuuming of the table (since we never DELETE or UPDATE the table).
  • Insert rows with COPY FROM STDIN. This is the fastest possible approach to insert rows into table.
  • Minimize the number of indexes in the table, since they slow down inserts. Usually an index on time timestamp with time zone is enough.
  • Add synchronous_commit = off to postgresql.conf.
  • Use table inheritance for fast removal of old data:
@franchb
franchb / glibc-in-alpine-docker
Created October 14, 2019 11:44 — forked from larzza/glibc-in-alpine-docker
Install glibc in Alpine docker image
RUN apk --no-cache add \
wget \
ca-certificates \
libstdc++
# Get and install glibc for alpine
ARG APK_GLIBC_VERSION=2.29-r0
ARG APK_GLIBC_FILE="glibc-${APK_GLIBC_VERSION}.apk"
ARG APK_GLIBC_BIN_FILE="glibc-bin-${APK_GLIBC_VERSION}.apk"
ARG APK_GLIBC_BASE_URL="https://github.com/sgerrand/alpine-pkg-glibc/releases/download/${APK_GLIBC_VERSION}"
RUN wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://alpine-pkgs.sgerrand.com/sgerrand.rsa.pub \
ssh-keygen
-t ed25519 - for greatest security (bits are a fixed size and -b flag will be ignored)
-t rsa - for greatest portability (key needs to be greater than 4096 bits)
-t ecdsa - faster than RSA or DSA (bits can only be 256, 284, or 521)
-t dsa - DEEMED INSECURE - DSA limted to 1024 bit key as specified by FIPS 186-2, No longer allowed by default in OpenSSH 7.0+
-t rsa1 - DEEMED INSECURE - has weaknesses and shouldn't be used (used in protocol 1)
-b 4096 bit size
-a 500 rounds (should be no smaller than 64, result in slower passphrase verification and increased resistance to brute-force password cracking)
-C "First.Last@somewhere.com" comment..
@franchb
franchb / delete_git_submodule.md
Created March 15, 2019 07:30 — forked from myusuf3/delete_git_submodule.md
How effectively delete a git submodule.

To remove a submodule you need to:

  • Delete the relevant section from the .gitmodules file.
  • Stage the .gitmodules changes git add .gitmodules
  • Delete the relevant section from .git/config.
  • Run git rm --cached path_to_submodule (no trailing slash).
  • Run rm -rf .git/modules/path_to_submodule (no trailing slash).
  • Commit git commit -m "Removed submodule "
  • Delete the now untracked submodule files rm -rf path_to_submodule

Install PostgreSQL 10 on Ubuntu

This is a quick guide to install PostgreSQL 10 - tested on Ubuntu 16.04 but likely can be used for Ubuntu 14.04 and 17.04 as well, with one minor modification detailed below.

(Optional) Uninstall other versions of postgres

To make life simple, remove all other versions of Postgres. Obviously not required, but again, makes life simple.

dpkg -l | grep postgres
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@franchb
franchb / SparkRowApply.scala
Created February 14, 2018 09:12 — forked from jlln/SparkRowApply.scala
How to apply a function to every row in a Spark DataFrame.
def findNull(row:Row):String = {
if (row.anyNull) {
val indices = (0 to row.length-1).toArray.filter(i => row.isNullAt(i))
indices.mkString(",")
}
else "-1"
}
sqlContext.udf.register("findNull", findNull _)
df = df.withColumn("MissingGroups",callUDF("findNull",struct(df.columns.map(df(_)) : _*)))
@franchb
franchb / separator.py
Created February 14, 2018 09:12 — forked from jlln/separator.py
Efficiently split Pandas Dataframe cells containing lists into multiple rows, duplicating the other column's values.
def splitDataFrameList(df,target_column,separator):
''' df = dataframe to split,
target_column = the column containing the values to split
separator = the symbol used to perform the split
returns: a dataframe with each entry for the target column separated, with each element moved into a new row.
The values in the other columns are duplicated across the newly divided rows.
'''
def splitListToRows(row,row_accumulator,target_column,separator):
split_row = row[target_column].split(separator)

API workthough

  1. Open a browser

    # start an instance of firefox with selenium-webdriver
    driver = Selenium::WebDriver.for :firefox
    # :chrome -> chrome
    # :ie     -> iexplore
    
  • Go to a specified URL
@franchb
franchb / gist:0ed60861afe5d39032520f1096629cef
Created February 2, 2018 12:47 — forked from casschin/gist:1990245
Python webdriver api quick sheet
### Locating UI elements ###
# By ID
<div id="coolestWidgetEvah">...</div>
element = driver.find_element_by_id("coolestWidgetEvah")
or
from selenium.webdriver.common.by import By
element = driver.find_element(by=By.ID, value="coolestWidgetEvah")
# By class name: