Eliah Rusin franchb

Optimizing a PostgreSQL table for more than 100K inserts per second

Create UNLOGGED table. This reduces the amount of data written to persistent storage by up to 2x.
Set WITH (autovacuum_enabled=false) on the table. This saves CPU time and IO bandwidth on useless vacuuming of the table (since we never DELETE or UPDATE the table).
Insert rows with COPY FROM STDIN. This is the fastest way to insert rows into a table.
Minimize the number of indexes on the table, since they slow down inserts. Usually a single index on the time column (time timestamp with time zone) is enough.
Add synchronous_commit = off to postgresql.conf.
Use table inheritance for fast removal of old data:
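The first three points in the list above (UNLOGGED table, autovacuum disabled, COPY FROM STDIN) can be sketched as follows. This is only an illustration under assumptions: the table name events and the payload column are hypothetical, and the sketch stops at preparing the in-memory stream that a driver would send to COPY. It does not cover the table inheritance step.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical table and column names, used only for illustration.
CREATE_TABLE_SQL = """
CREATE UNLOGGED TABLE events (
    time timestamp with time zone NOT NULL,
    payload text
) WITH (autovacuum_enabled = false);
"""

COPY_SQL = "COPY events (time, payload) FROM STDIN WITH (FORMAT csv)"


def rows_to_copy_buffer(rows):
    """Serialize (time, payload) tuples into an in-memory CSV stream for COPY."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for ts, payload in rows:
        writer.writerow([ts.isoformat(), payload])
    buf.seek(0)  # rewind so the consumer reads from the start
    return buf


rows = [
    (datetime(2020, 1, 1, tzinfo=timezone.utc), "first"),
    (datetime(2020, 1, 1, 0, 0, 1, tzinfo=timezone.utc), "second, with comma"),
]
buf = rows_to_copy_buffer(rows)

# Assumption: with a driver such as psycopg2, the stream could be sent in
# one round trip via cursor.copy_expert(COPY_SQL, buf).
```

Batching many rows into one stream is what makes COPY cheaper than row-by-row INSERT statements: the server parses a single command and reads the data as a continuous stream.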

To remove a submodule you need to:

Delete the relevant section from the .gitmodules file.
Stage the .gitmodules changes: git add .gitmodules
Delete the relevant section from .git/config.
Run git rm --cached path_to_submodule (no trailing slash).
Run rm -rf .git/modules/path_to_submodule (no trailing slash).
Commit: git commit -m "Removed submodule "
Delete the now untracked submodule files: rm -rf path_to_submodule
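The two "delete the relevant section" steps are the ones most easily done wrong by hand. A minimal sketch of that edit, assuming the section header has the form [submodule "path_to_submodule"] (the section is keyed by the submodule name, which is usually but not always the same as its path). The function name remove_submodule_section and the sample content are hypothetical.

```python
def remove_submodule_section(config_text, submodule_name):
    """Return config_text without the [submodule "<name>"] section."""
    header = f'[submodule "{submodule_name}"]'
    kept = []
    skipping = False
    for line in config_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("["):
            # Every section header re-evaluates whether we are inside
            # the section that should be dropped.
            skipping = stripped == header
        if not skipping:
            kept.append(line)
    return "".join(kept)


# Hypothetical .gitmodules content with two submodules.
sample = (
    '[submodule "libs/foo"]\n'
    '\tpath = libs/foo\n'
    '\turl = https://example.com/foo.git\n'
    '[submodule "libs/bar"]\n'
    '\tpath = libs/bar\n'
    '\turl = https://example.com/bar.git\n'
)
cleaned = remove_submodule_section(sample, "libs/foo")
```

The same function can be applied to the text of .git/config, since other sections such as [core] or [remote "origin"] never match the submodule header and are kept unchanged.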

Install PostgreSQL 10 on Ubuntu

This is a quick guide to installing PostgreSQL 10. It was tested on Ubuntu 16.04 but can likely be used on Ubuntu 14.04 and 17.04 as well, with one minor modification detailed below.

(Optional) Uninstall other versions of postgres

To make life simple, remove all other versions of Postgres. Obviously not required, but again, makes life simple.

dpkg -l | grep postgres

API walkthrough

Open a browser

# start an instance of firefox with selenium-webdriver
driver = Selenium::WebDriver.for :firefox
# :chrome -> chrome
# :ie     -> iexplore

Go to a specified URL

	RUN apk --no-cache add \
	wget \
	ca-certificates \
	libstdc++
	# Get and install glibc for alpine
	ARG APK_GLIBC_VERSION=2.29-r0
	ARG APK_GLIBC_FILE="glibc-${APK_GLIBC_VERSION}.apk"
	ARG APK_GLIBC_BIN_FILE="glibc-bin-${APK_GLIBC_VERSION}.apk"
	ARG APK_GLIBC_BASE_URL="https://github.com/sgerrand/alpine-pkg-glibc/releases/download/${APK_GLIBC_VERSION}"
	RUN wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://alpine-pkgs.sgerrand.com/sgerrand.rsa.pub \

	ssh-keygen
	-t ed25519 - for greatest security (bits are a fixed size and -b flag will be ignored)
	-t rsa - for greatest portability (key should be at least 4096 bits)
	-t ecdsa - faster than RSA or DSA (bits can only be 256, 384, or 521)
	-t dsa - DEEMED INSECURE - DSA is limited to 1024-bit keys as specified by FIPS 186-2; no longer allowed by default in OpenSSH 7.0+
	-t rsa1 - DEEMED INSECURE - has weaknesses and shouldn't be used (used in protocol 1)

	-b 4096 bit size
	-a 500 rounds (should be no smaller than 64; more rounds result in slower passphrase verification and increased resistance to brute-force password cracking)
	-C "First.Last@somewhere.com" comment
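To show how the flags above combine into one invocation, here is a small sketch that only assembles the argument list and does not run anything. The helper name ssh_keygen_command and its defaults are hypothetical. It uses only the flags listed above (-t, -b, -a, -C) and leaves out -b for ed25519, where the key size is fixed.

```python
def ssh_keygen_command(key_type="ed25519", comment="", rounds=100, bits=None):
    """Build an ssh-keygen argument list from the options described above."""
    cmd = ["ssh-keygen", "-t", key_type, "-a", str(rounds)]
    # ed25519 keys have a fixed size, so -b would be ignored there.
    if bits is not None and key_type != "ed25519":
        cmd += ["-b", str(bits)]
    if comment:
        cmd += ["-C", comment]
    return cmd


rsa_cmd = ssh_keygen_command("rsa", "First.Last@somewhere.com", rounds=500, bits=4096)
ed_cmd = ssh_keygen_command("ed25519", "First.Last@somewhere.com", rounds=500, bits=4096)
print(" ".join(rsa_cmd))
```

The resulting list could be passed to subprocess.run, which would then prompt interactively for the output file and passphrase.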

	// Return the comma-separated indices of the null columns in a row, or "-1" if there are none
	def findNull(row: Row): String = {
		if (row.anyNull) {
			val indices = (0 until row.length).filter(i => row.isNullAt(i))
			indices.mkString(",")
		}
		else "-1"
	}

	sqlContext.udf.register("findNull", findNull _)
	df = df.withColumn("MissingGroups",callUDF("findNull",struct(df.columns.map(df(_)) : _*)))

	def splitDataFrameList(df, target_column, separator):
		''' df = dataframe to split,
		target_column = the column containing the values to split
		separator = the symbol used to perform the split

		returns: a dataframe with each entry for the target column separated, with each element moved into a new row.
		The values in the other columns are duplicated across the newly divided rows.
		'''
		def splitListToRows(row, row_accumulator, target_column, separator):
			split_row = row[target_column].split(separator)

	### Locating UI elements ###

	# By ID
	# HTML: <div id="coolestWidgetEvah">...</div>
	element = driver.find_element_by_id("coolestWidgetEvah")
	# or (the only form available from Selenium 4.3, which removed the find_element_by_* helpers)
	from selenium.webdriver.common.by import By
	element = driver.find_element(by=By.ID, value="coolestWidgetEvah")

	# By class name: