
okumin okumin

@okumin
okumin / main.md
Last active September 3, 2024 03:21
Hive + Iceberg split

Reproduction

I used Hive 4.0.0.

Create a table with a big Parquet file

set tez.grouping.split-count=1;
CREATE TABLE web_sales_parquet STORED AS PARQUET AS SELECT * FROM web_sales;
okumin / 1.2.0-2.0.0.md
Last active August 22, 2024 10:09
Hive keyword changes

Summary.

  • Non-reserved
    • Removed: HOLD_DDLTIME, IGNORE, NO_DROP, OFFLINE, PROTECTION, READONLY, REGEXP, RLIKE
    • Added: AUTOCOMMIT, ISOLATION, LEVEL, OFFSET, SNAPSHOT, TRANSACTION, WORK, WRITE
  • Reserved
    • Removed:
    • Added: COMMIT, ONLY, REGEXP, RLIKE, ROLLBACK, START

The list of changed keywords.

How PlanMapper works

Purpose

PlanMapper helps Hive regenerate better query plans using runtime stats. It groups entities that are semantically the same. For example, a Calcite RelNode expressing WHERE id = 1 can be equivalent to a Hive FilterOperator, and a CommonMergeJoinOperator can be linked to the MapJoinOperator converted from it.

Groups generated by PlanMapper express such relationships so that the final runtime stats can be propagated to the RelNodes or Operators in each step. See https://cwiki.apache.org/confluence/display/Hive/Query+ReExecution
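
The grouping idea can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `PlanGroups`, `link`, `recordRuntimeRows`, and `runtimeRowsFor` are hypothetical names, plan objects are represented by plain strings, and none of this is Hive's actual PlanMapper API.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified stand-in for the grouping idea; not Hive's PlanMapper. */
class PlanGroups {
    /** One group of plan objects that are treated as semantically the same. */
    static final class Group {
        final Set<Object> members = new LinkedHashSet<>();
        Long runtimeRows; // null until a runtime statistic is recorded
    }

    // Maps every known plan object to the group it currently belongs to.
    private final Map<Object, Group> groupOf = new HashMap<>();

    private Group groupFor(Object node) {
        return groupOf.computeIfAbsent(node, n -> {
            Group g = new Group();
            g.members.add(n);
            return g;
        });
    }

    /** Declares that a and b describe the same part of the plan and merges their groups. */
    void link(Object a, Object b) {
        Group ga = groupFor(a);
        Group gb = groupFor(b);
        if (ga == gb) {
            return;
        }
        ga.members.addAll(gb.members);
        if (ga.runtimeRows == null) {
            ga.runtimeRows = gb.runtimeRows;
        }
        for (Object member : gb.members) {
            groupOf.put(member, ga);
        }
    }

    /** Records a runtime row count observed for one member; the whole group shares it. */
    void recordRuntimeRows(Object node, long rows) {
        groupFor(node).runtimeRows = rows;
    }

    /** Returns the row count recorded for the node's group, or null if none is known. */
    Long runtimeRowsFor(Object node) {
        Group g = groupOf.get(node);
        return g == null ? null : g.runtimeRows;
    }

    public static void main(String[] args) {
        PlanGroups groups = new PlanGroups();
        groups.link("RelNode(WHERE id = 1)", "FilterOperator");
        groups.recordRuntimeRows("FilterOperator", 42L);
        System.out.println(groups.runtimeRowsFor("RelNode(WHERE id = 1)"));
    }
}
```

Because a statistic is stored on the group rather than on a single object, a row count observed on the executed operator becomes visible from the planner-side node that was linked to it earlier.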

Flow

okumin / main.md
Last active October 19, 2023 15:11

Overview

In HIVE-12679, we have been trying to introduce a feature to make IMetaStoreClient pluggable. This document is a summary of the past discussions.

Problem statement

Apache Hive hardcodes the implementation of IMetaStoreClient, assuming it always talks to Hive Metastore. 99% of Hive users don't have any problems because they use HMS as a data catalog. However, some data platforms and their users use alternative services as data catalogs.

  • Amazon EMR provides an option to use AWS Glue Data Catalog
  • Treasure Data deploys Apache Hive integrated with their own in-house data catalog
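
To make "pluggable" concrete, here is a minimal sketch of one common way to select a client implementation from configuration. Everything in it is an assumption for illustration: `CatalogClient` is a heavily reduced stand-in for IMetaStoreClient, the two implementations are dummies, and the key `example.metastore.client.impl` is hypothetical. It does not describe the design discussed in HIVE-12679 or any existing Hive property.

```java
import java.util.Map;

/** Hypothetical, heavily reduced stand-in for IMetaStoreClient. */
interface CatalogClient {
    String describe();
}

/** Default implementation, standing in for a client that talks to Hive Metastore. */
class HmsCatalogClient implements CatalogClient {
    public String describe() {
        return "Hive Metastore";
    }
}

/** Stand-in for a client backed by some alternative data catalog. */
class CustomCatalogClient implements CatalogClient {
    public String describe() {
        return "alternative data catalog";
    }
}

class CatalogClientFactory {
    // Hypothetical configuration key; not an actual Hive property.
    static final String IMPL_KEY = "example.metastore.client.impl";

    /** Instantiates the configured implementation, falling back to the HMS client. */
    static CatalogClient create(Map<String, String> conf) {
        String className = conf.getOrDefault(IMPL_KEY, HmsCatalogClient.class.getName());
        try {
            // The class must implement CatalogClient and have a no-argument constructor.
            return Class.forName(className)
                    .asSubclass(CatalogClient.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create catalog client: " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf =
                java.util.Collections.singletonMap(IMPL_KEY, CustomCatalogClient.class.getName());
        System.out.println(create(conf).describe());
    }
}
```

With this shape, callers depend only on the interface, and a platform that uses a different catalog swaps the implementation through configuration instead of patching the caller.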
okumin / keybase.md
Last active November 18, 2022 13:37

Keybase proof

I hereby claim:

  • I am okumin on github.
  • I am okumin (https://keybase.io/okumin) on keybase.
  • I have a public key ASBk4J-iYG52Dmu-OCTE37-7M9-YuYo6hxordcz7zr1QfQo

To claim this, I am signing this object:

Summary

MySQL

/                 persistAsync  persist  recover
akka-2.3          1547          7964     450
akka-2.4-rc1      1504          9051     390
akka-2.4-batched  702           8806     410
akka-2.4-seq      615           11711    402
def findMofu(id: Int): Future[CacheError | IOError | NotFound, Mofu] = ???
def createMofu(mofu: Mofu): Future[CacheError | IOError | DuplicateError, Mofu] = ???

val result = findMofu(5).recoverWith {
  case NotFound =>
    createMofu(Mofu(5)).recoverWith {
      case DuplicateError => UnknownError
    }
}
class MofuSpec extends WordSpec with GeneratorDrivenPropertyChecks {
  def measure(seq: Seq[Int], heap: Heap[Int]): Unit = {
    val h = seq.foldLeft(heap) { (h, x) => h.insert(x) }
    val h2 = seq.foldLeft(h) { (h, x) => h.deleteMin() }
    assert(h2.isEmpty)
  }
  "mofu" should {
    "fumo" in {
      val initial = 300000
okumin / akka-persistence.md
Created September 28, 2014 08:55
Let's build an akka-persistence plugin