
Dana Bauer danabauer

@danabauer
danabauer / convert_file.sh
Created September 2, 2024 18:37 — forked from mehd-io/convert_file.sh
Convert csv<->parquet
#!/bin/bash
# A simple script for converting files between CSV and Parquet formats using DuckDB. Requires DuckDB installation.
convert_file() {
local input_file="$1"
local output_extension="$2"
# Extracting the filename without extension
local base_name
base_name=$(basename -- "$input_file")
local name="${base_name%.*}"
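
The preview above cuts off after the file name is split from its extension. As a minimal sketch of how that step might feed into building the output file name, the hypothetical helper below (`output_name` is not part of the original gist) reuses the same `basename` and `${var%.*}` idiom. It does not call DuckDB; it only shows the name handling.

```shell
#!/bin/bash
# Hypothetical helper, not from the original gist: build an output file name
# by swapping the extension of an input path.
output_name() {
    local input_file="$1"
    local output_extension="$2"
    local base_name
    # Drop any leading directories; "--" guards against names starting with "-"
    base_name=$(basename -- "$input_file")
    # Remove the shortest trailing ".something" (only the last extension)
    local name="${base_name%.*}"
    printf '%s.%s\n' "$name" "$output_extension"
}

output_name "data/trips.csv" "parquet"
```

Note that `%.*` strips only the final extension, so a name such as `notes.tar.gz` keeps its `.tar` part, and a name with no dot is left unchanged.
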
danabauer / hierarchies_descendants.md
Last active August 1, 2024 19:52
exploring hierarchies and descendants in Overture's divisions data
COPY (
    SELECT hierarchies
    FROM read_parquet('s3://overturemaps-us-west-2/release/2024-07-22.0/theme=divisions/type=division/*', filename=true, hive_partitioning=1)
    WHERE subtype IN ('macrohood', 'neighborhood', 'microhood')
      AND country = 'US'
      AND region = 'US-NY'
      AND names.primary = 'SoHo'
) TO 'soho.json';
SELECT 
    id,
    names.primary AS primary_name,
    sources[1].dataset AS primary_source,
    ST_GeomFromBinary(geometry) AS geometry
FROM v2024_04_16_beta_0
WHERE type = 'building'
    AND ST_INTERSECTS(
        ST_GeomFromBinary(geometry),

Breaking change in Overture's 2024-03-12-alpha.0 release

In previous releases of Overture data, we used camelCase for attribute names throughout the schema. A DuckDB query to find Pennsylvania in our Admins dataset looked like this:

LOAD spatial;
LOAD httpfs;

SELECT 
   id, 
danabauer / boundaries-buildings.md
Last active March 11, 2024 18:41
down the rabbit hole of Overture's admins dataset, 2024-02-15 data release

This query lets us grab the locality (point geometry and attributes) that represents the United States and its associated area, in this case a multipolygon geometry.

CREATE VIEW admins_view AS (
    SELECT
        *
    FROM
        read_parquet('s3://overturemaps-us-west-2/release/2024-02-15-alpha.0/theme=admins/type=*/*', filename=true, hive_partitioning=1)
);

COPY (
SELECT
danabauer / geo_interface.rst
Created March 6, 2024 04:10 — forked from sgillies/geo_interface.rst
A Python Protocol for Geospatial Data

Author: Sean Gillies Version: 1.0

Abstract

This document describes a GeoJSON-like protocol for geo-spatial (GIS) vector data.

Introduction

danabauer / mapping-usa.md
Last active February 29, 2024 17:28
Dana's recap of the Overture BOF at Mapping USA in January 2024

Hey folks! Yesterday I listened to the Overture Maps Birds of a Feather session at Mapping USA and typed up some quick impressions. Jennings led the session and Ben Clark and Marc participated. It was a good discussion that drew lots of interesting questions, and I wanted to share those here.


Jennings spoke during the second day of Mapping USA, a free, virtual conference organized by OpenStreetMap US. There were about 40 people listening on Zoom, a mix of OpenStreetMap members and contributors as well as other folks with a more casual interest in OSM. Jennings stepped through slides, but the session was very interactive, and people asked questions throughout. A different audience might not care so much about how the sausage is made, but this group wanted to hear how Overture assembles its data products and how OSM data flows into Overture’s processing pipelines.

Throughout the discussion, Jennings hit on three main selling points:

  • Overture is doing the hard work of data
danabauer / overture_admins_then_and_now.sql
Last active March 1, 2024 15:38
duckdb queries to ask questions of the Overture admins dataset
/* In the initial implementation of the admins schema the "locality" feature type was
allowed to have Point, Polygon, or MultiPolygon geometry. There was no "locality area." So back
in July, this is how you would get country borders. See Simon Willison's post:
https://til.simonwillison.net/overture-maps/overture-maps-parquet. Note the geometry type checking.*/
COPY (
SELECT
type,
subType,
localityType,
danabauer / confident_mountains.sql
Last active February 29, 2024 17:14
duckdb queries to ask questions of the Overture places and buildings datasets
INSTALL spatial;
INSTALL httpfs;
LOAD spatial;
LOAD httpfs;
SET s3_region='us-west-2';
COPY(
SELECT
danabauer / buildings_query_duckdb.sql
Last active February 21, 2024 16:41
extract data from Overture's buildings theme
COPY (
SELECT
id,
theme,
type,
version,
level,
updateTime,
height,
numFloors,