@econandrew
Last active July 12, 2024 15:45
Instructions for installing an OpenStreetMap Nominatim geocoder instance on Amazon Linux / EC2 (Dec 2014)

Installing Nominatim on Amazon Linux / EC2

  1. Introduction

The official instructions for installing Nominatim are complete, but brief in places, and several steps must be changed in the Amazon Linux environment (which is roughly CentOS / Red Hat). The steps below are a rough record of what I did to get it working, but I didn't keep perfect track, so you shouldn't rely on them as a shell script. Just follow each step, make sure it worked, and hopefully you'll need to adapt very little (version numbers, for one thing). (I also skip in and out of root, but you can be more careful if you like.)

  2. Setting up the EC2 instance

There's plenty of information on setting up Amazon EC2 instances elsewhere. I chose an r3.2xlarge machine (61 GB memory, 8 vCPUs, $0.70/hour), based on several-year-old suggestions that you need at least 32 GB of memory for the install, and the assumption that OSM has grown since then. I attached 2 x 750GB EBS volumes (as /dev/sdf and /dev/sdg, eventually striped as RAID0), again based on older size estimates plus an allowance for growth. The root volume can be relatively small, but you might want to allow, say, 10-20GB for source data files. Or you can store them on the large EBS volumes, as I ended up having to because I left the root at the default 8GB.

The total install time was reasonable, but I'm sure this isn't an optimal configuration. I considered other storage options like provisioned IOPS EBS and instance-attached storage, which may have sped up disk-bound tasks, but decided to stick with plain vanilla EBS in the end.

Log in to your running EC2 instance. You may want to invoke screen or equivalent so nothing quits if you get disconnected, as several commands will run for days.

  3. Setting up disk storage

These commands will construct a RAID0 striped volume over the two EBS volumes, create the filesystem and arrange for it to be mounted at boot time as /vol.

sudo su
mdadm --create --verbose /dev/md0 --level=stripe --raid-devices=2 /dev/sdf /dev/sdg
mkfs.ext4 /dev/md0
mkdir /vol
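# The array was just created as /dev/md0 (after a reboot it may appear under another name, e.g. /dev/md127)
mount -t ext4 /dev/md0 /vol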
cp /etc/fstab /etc/fstab.orig
echo "/dev/md0    /vol    ext4  defaults,nofail   0  2" >> /etc/fstab
mount -a

Reference: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html
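Since the array is created as /dev/md0 but shows up as /dev/md127 in the df output later in this guide, it may be worth looking up the active device name rather than assuming it. This is a minimal sketch, assuming the usual /proc/mdstat layout; the function name active_md_device is made up for this example and is not part of mdadm.

```shell
#!/bin/bash
# active_md_device (hypothetical helper): read mdstat-format text on stdin and
# print the device node of the first active md array, for example a line like
# "md127 : active raid0 xvdg[1] xvdf[0]" yields /dev/md127.
active_md_device() {
    awk '$1 ~ /^md[0-9]+$/ && $2 == ":" && $3 == "active" { print "/dev/" $1; exit }'
}

# Example: only runs where the kernel exposes /proc/mdstat
if [ -r /proc/mdstat ]; then
    active_md_device < /proc/mdstat
fi
```

Mounting by filesystem label or UUID in /etc/fstab is another way to avoid depending on the device name.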

  4. Install postgres + postgis

The standard repository packages won't work, so you'll need to get other packages and compile some things from source. It's pretty straightforward though.

Edit /etc/yum.repos.d/amzn-main.repo and add the following line to the block [amzn-main]:

exclude=postgresql*

Then install postgres.

cd ~/
wget http://yum.postgresql.org/9.3/redhat/rhel-6-x86_64/pgdg-redhat93-9.3-1.noarch.rpm
rpm -ivh pgdg-redhat93-9.3-1.noarch.rpm 
yum install postgresql93 postgresql93-server postgresql93-devel postgresql93-contrib

# I just symlinked the existing data directory to my mounted volume
rm -r /var/lib/pgsql/9.3/data/
ln -s /vol /var/lib/pgsql/9.3/data

# Set file permissions for the postgres user
chown postgres:postgres /vol
chmod 700 /vol

# Initialize the db
service postgresql-9.3 initdb

# Start the service
service postgresql-9.3 start

# exit from su
exit 

And install postgis and dependencies.

sudo yum install gcc make gcc-c++ libtool libxml2-devel libpng libtiff

cd ~/

# Download GEOS and install
wget http://download.osgeo.org/geos/geos-3.4.2.tar.bz2
tar xjf geos-3.4.2.tar.bz2 
cd geos-3.4.2
./configure 
make
sudo make install 

# Download Proj.4 and install
cd ~/
wget http://download.osgeo.org/proj/proj-4.8.0.tar.gz
tar xzf proj-4.8.0.tar.gz
cd proj-4.8.0

### If you don't want python bindings
./configure

### Or if you do want python bindings (which you need to import US street number data)
sudo yum install python26-devel.x86_64
./configure --with-python

make
sudo make install

# Download and install GDAL
cd ~/
wget http://download.osgeo.org/gdal/1.10.1/gdal-1.10.1.tar.gz
tar -xvzf gdal-1.10.1.tar.gz
cd gdal-1.10.1
./configure 
make
sudo make install

# Download and install JSON-C library
cd ~/
wget https://s3.amazonaws.com/json-c_releases/releases/json-c-0.11.tar.gz
tar -xvzf json-c-0.11.tar.gz
cd json-c-0.11
./configure
make
sudo make install

# Download and install PostGIS 
cd ~/
wget http://download.osgeo.org/postgis/source/postgis-2.1.2.tar.gz
tar -xvzf postgis-2.1.2.tar.gz
cd postgis-2.1.2
./configure --with-pgconfig=/usr/pgsql-9.3/bin/pg_config --with-geosconfig=/usr/local/bin/geos-config --with-gdalconfig=/usr/local/bin/gdal-config
make
sudo make install

# update your libraries
sudo su
echo /usr/local/lib >> /etc/ld.so.conf
ldconfig

Reference: http://overtronic.com/2013/12/how-to-install-postgresql-with-postgis-on-amazon-ec2-linux/

  5. Nominatim dependencies

yum --enablerepo=epel install git make automake gcc gcc-c++ libtool
yum --enablerepo=epel install php-pgsql php php-pear php-pear-DB libpqxx-devel 
yum --enablerepo=epel install bzip2-devel libxml2-devel protobuf-c-devel lua-devel
#  These were installed from source above: proj-devel geos-devel proj-epsg

  6. Postgres config for install

Edit /vol/postgresql.conf and make the following changes. Here I just took the examples from the official install guide and increased them a bit to reflect the larger memory size.

shared_buffers (4GB)
maintenance_work_mem (16GB/10GB)
work_mem (50MB)
effective_cache_size (24GB)
synchronous_commit = off
checkpoint_segments = 100
checkpoint_timeout = 10min
checkpoint_completion_target = 0.9

For the initial import, I also set:

fsync = off
full_page_writes = off

This one seems less certain, but I also had no problems with:

autovacuum = off

These last three changes will be reverted after installation per the official instructions.
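The edits above can also be applied from the shell instead of by hand. The following is only a sketch: set_pg_conf is a hypothetical helper written for this example (it is not a PostgreSQL tool), and it assumes simple `key = value` lines like the ones listed above.

```shell
#!/bin/bash
# set_pg_conf KEY VALUE FILE  (hypothetical helper)
# Set KEY = VALUE in a postgresql.conf-style file: replace an existing line for
# KEY (commented out or not), or append a new line if KEY is not present.
set_pg_conf() {
    local key=$1 value=$2 file=$3 tmp
    if grep -Eq "^[#[:space:]]*${key}[[:space:]]*=" "$file"; then
        tmp=$(mktemp) || return 1
        # Write through a temp file, then copy back so ownership and mode are kept
        sed -E "s|^[#[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$tmp" &&
            cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Usage (as a user allowed to write the file), e.g. for the import-only settings:
#   set_pg_conf fsync off /vol/postgresql.conf
#   set_pg_conf full_page_writes off /vol/postgresql.conf
#   set_pg_conf autovacuum off /vol/postgresql.conf
```

Postgres needs a restart (service postgresql-9.3 restart) before changes such as shared_buffers take effect.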

  7. Nominatim main installation

As ec2-user:

cd ~/
wget http://www.nominatim.org/release/Nominatim-2.3.0.tar.bz2
tar xvf Nominatim-2.3.0.tar.bz2

cd Nominatim-2.3.0
./configure --with-postgresql=/usr/pgsql-9.3/bin/pg_config
make

Edit settings/local.php and copy in the following:

<?php
 // Paths
 @define('CONST_Postgresql_Version', '9.3');
 @define('CONST_Postgis_Version', '2.1');
 @define('CONST_Path_Postgresql_Contrib', '/usr/pgsql-9.3/share/contrib');

Download some optional data (I wanted everything):

wget --output-document=data/wikipedia_article.sql.bin http://www.nominatim.org/data/wikipedia_article.sql.bin
wget --output-document=data/wikipedia_redirect.sql.bin http://www.nominatim.org/data/wikipedia_redirect.sql.bin
wget --output-document=data/gb_postcode_data.sql.gz http://www.nominatim.org/data/gb_postcode_data.sql.gz

cd /
sudo -u postgres createuser -s ec2-user
createuser -SDR www-data

cd ~/
chmod +x ~
chmod +x ~/Nominatim-2.3.0
chmod +x ~/Nominatim-2.3.0/module

sudo su
mkdir /vol/planet
chown ec2-user /vol/planet
chmod a+rx /vol

# exit from su (the import below runs as ec2-user)
exit

Then you can do a test run with a small country:

### Test Luxembourg
wget --output-document=/vol/planet/luxembourg-latest.osm.pbf http://download.geofabrik.de/europe/luxembourg-latest.osm.pbf
cd ~/Nominatim-2.3.0
./utils/setup.php --osm-file /vol/planet/luxembourg-latest.osm.pbf --all --osm2pgsql-cache 18000 2>&1 | tee setup.log

# If all is good, then start over
dropdb nominatim

Before proceeding to the full install:

wget --output-document=/vol/planet/planet-latest.osm.pbf http://download.bbbike.org/osm/planet/planet-latest.osm.pbf
wget --output-document=/vol/planet/planet-latest.osm.pbf.md5 http://download.bbbike.org/osm/planet/planet-latest.osm.pbf.md5

# Check the md5 checksum to ensure we downloaded ok
md5sum --check /vol/planet/planet-latest.osm.pbf.md5

Warning: this next command takes days.

time ./utils/setup.php --osm-file /vol/planet/planet-latest.osm.pbf --all --osm2pgsql-cache 18000 2>&1 | tee setup.log

On my EC2 configuration, it took just under 5 days. Rank 28 and Rank 30 indexing took the longest.

real    6997m2.441s
user    409m31.204s
sys     88m19.924s
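To put the `real` figure in context, here is a small sketch that converts it to hours and days and multiplies by the $0.70/hour instance price quoted earlier. The function name import_cost is made up for this example, and the result covers instance hours only (EBS storage is billed separately and is not included).

```shell
#!/bin/bash
# import_cost REAL_TIME HOURLY_RATE  (hypothetical helper)
# Convert a `time`-style duration such as "6997m2.441s" into hours and days,
# and multiply the hours by an hourly instance price.
import_cost() {
    local real=$1 rate=$2
    local mins=${real%%m*}      # text before the first "m"
    local secs=${real#*m}       # text after the first "m" ...
    secs=${secs%s}              # ... without the trailing "s"
    awk -v m="$mins" -v s="$secs" -v r="$rate" 'BEGIN {
        hours = (m * 60 + s) / 3600
        printf "%.1f hours, %.2f days, $%.2f\n", hours, hours / 24, hours * r
    }'
}

import_cost 6997m2.441s 0.70
# prints: 116.6 hours, 4.86 days, $81.63
```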

At the end of this, 800GB was used in total across my RAID0 volume.

Then you can install the extras:

# Add special phrases
./utils/specialphrases.php --countries > specialphrases_countries.sql
psql -d nominatim -f specialphrases_countries.sql

./utils/specialphrases.php --wiki-import > specialphrases.sql
psql -d nominatim -f specialphrases.sql

And set up the website:

# Set up website
# Note: chown needs the nginx user, which is created by the nginx package installed below
sudo mkdir -m 755 /var/www/nominatim
sudo chown nginx /var/www/nominatim
./utils/setup.php --create-website /var/www/nominatim

Edit settings/local.php and add/edit:

@define('CONST_Website_BaseURL', '/nominatim/');

I used nginx as the HTTP server:

sudo yum install nginx
sudo yum install php-fpm

psql -d nominatim -c 'ALTER USER "www-data" RENAME TO "nginx"'

As root, edit /etc/php-fpm.d/www.conf to include:

; Comment out the tcp listener and add the unix socket
;listen = 127.0.0.1:9000
listen = /var/run/php5-fpm.sock
; Ensure that the daemon runs as the correct user
listen.owner = nginx
listen.group = nginx
listen.mode = 0666

As root, edit /etc/nginx/nginx.conf

# Edit to include, within the http { ... server { ... } } block that is defined
index   index.html index.htm index.php;

#root         /usr/share/nginx/html;
root         /var/www;
#location / {
#}
location ~ [^/]\.php(/|$) {
    fastcgi_split_path_info ^(.+?\.php)(/.*)$;
    if (!-f $document_root$fastcgi_script_name) {
        return 404;
    }
    fastcgi_pass unix:/var/run/php5-fpm.sock;
    fastcgi_index index.php;
    # Note this next line is super important or you'll get empty responses with no error message!
    include fastcgi.conf;
}

And then hopefully you can run:

sudo /etc/init.d/php-fpm start
sudo /etc/init.d/nginx start

At this point, all going well, you should be able to connect to http://yourhost/nominatim and see the OSM Nominatim web page.
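Besides opening the page in a browser, a quick request from the command line can confirm that PHP and the database are answering. This is a sketch under assumptions: it builds a URL for the search.php endpoint with the q and format=json parameters, and the helper names urlencode and nominatim_search_url are made up for this example. The encoder is only meant for plain ASCII queries.

```shell
#!/bin/bash
# urlencode STRING (hypothetical helper): percent-encode everything except
# unreserved ASCII characters. Multi-byte input is not handled here.
urlencode() {
    local s=$1 out="" c i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case $c in
            [a-zA-Z0-9.~_-]) out+=$c ;;
            *) out+=$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
}

# nominatim_search_url BASE QUERY (hypothetical helper): build a JSON search URL
nominatim_search_url() {
    printf '%s/search.php?q=%s&format=json\n' "${1%/}" "$(urlencode "$2")"
}

# Usage, once the server is up (replace yourhost):
#   curl -s "$(nominatim_search_url http://yourhost/nominatim 'Luxembourg City')"
```

A JSON array in the response suggests the whole chain (nginx, php-fpm, postgres) is working; a completely empty response points back at the fastcgi.conf include noted above.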

  8. TIGER files for US street numbers (optional)

This apparently helps Nominatim geocode street numbers more accurately. Unlike the other options above, this takes a substantial amount of time and space to run.

cd ~/Nominatim-2.3.0/data
mkdir -p TIGER2013/EDGES

# raw files are about 10GB, but will eventually expand to quite a bit more in SQL statements
wget -P TIGER2013/EDGES ftp://ftp2.census.gov/geo/tiger/TIGER2013/EDGES/*

# These next two steps took 24 hours together
# The utils scripts are run from the Nominatim source root, not from data/
cd ~/Nominatim-2.3.0
./utils/imports.php --parse-tiger-2011 data/TIGER2013/EDGES/
./utils/setup.php --import-tiger-data

psql -d nominatim -c 'GRANT SELECT ON location_property_tiger TO "nginx"'

At this stage df looked like this, and my generous 1.5TB of EBS was looking like a good choice.

Filesystem      1K-blocks       Used Available Use% Mounted on
/dev/xvda1        8123812    4931312   3092252  62% /
devtmpfs         15701944         72  15701872   1% /dev
tmpfs            15710020          0  15710020   0% /dev/shm
/dev/md127     1548045540 1180848836 288537172  81% /vol
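During the multi-day imports it can be handy to watch how full the volume is getting. A minimal sketch that pulls the Use% figure for one mount point out of df output follows; mount_use_percent is a hypothetical helper name, and it assumes the mount point is the last column, as in the listing above.

```shell
#!/bin/bash
# mount_use_percent MOUNTPOINT  (hypothetical helper)
# Read df-style output on stdin and print the Use% value (without the "%")
# for the line whose last column is MOUNTPOINT.
mount_use_percent() {
    awk -v mp="$1" '$NF == mp { sub(/%$/, "", $(NF-1)); print $(NF-1) }'
}

# Usage:
#   df -P | mount_use_percent /vol
# With the listing above this would print 81.
```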

  9. Post install configuration

Revert some of the changes in /vol/postgresql.conf:

fsync = on
full_page_writes = on
autovacuum = on
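Since running a production database with fsync off risks data loss after a crash, it may be worth double-checking the file after editing. The sketch below uses two hypothetical helpers (pg_conf_value and check_reverted, both made up for this example) and only understands `key = value` lines, not every syntax postgresql.conf allows.

```shell
#!/bin/bash
# pg_conf_value KEY FILE (hypothetical helper): print the last uncommented
# value assigned to KEY in a postgresql.conf-style file, if any.
pg_conf_value() {
    awk -v key="$1" '
        /^[[:space:]]*#/ { next }              # skip commented-out lines
        {
            line = $0
            sub(/#.*/, "", line)               # drop trailing comments
            n = index(line, "=")
            if (n == 0) next
            k = substr(line, 1, n - 1); v = substr(line, n + 1)
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", k)
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", v)
            if (k == key) val = v
        }
        END { if (val != "") print val }
    ' "$2"
}

# check_reverted FILE (hypothetical helper): report import-only settings that
# are still switched off. An unset key falls back to the PostgreSQL default,
# which is "on" for all three settings.
check_reverted() {
    local file=$1 key value status=0
    for key in fsync full_page_writes autovacuum; do
        value=$(pg_conf_value "$key" "$file")
        if [ -n "$value" ] && [ "$value" != "on" ]; then
            echo "$key is still $value"
            status=1
        fi
    done
    return $status
}

# Usage: sudo bash -c 'source this_file; check_reverted /vol/postgresql.conf'
```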

And then run

# Postgres gets upset otherwise
sudo chmod go-rx /vol

sudo chkconfig --add postgresql-9.3
sudo chkconfig --add php-fpm
sudo chkconfig --add nginx

sudo service php-fpm start
sudo service postgresql-9.3 start
sudo service nginx start

At this point you should be good to go. I haven't set up automatic updating, so if you proceed with that you'll have to follow the official Nominatim guide and adapt as necessary.

@gopi-ar

gopi-ar commented Nov 14, 2017

Thanks for the setup instructions :-)
Did you end up benchmarking this instance? How many RPS were you able to churn out?

At locationiq.org, we host 100+ OSM servers and we didn't have too much luck with AWS; moved to bare metal.

@eosorio-dou

Thank you for sharing. I'm building this on an EBS volume with tuned gp3 settings. I wonder what the right values (IOPS and throughput) would be to speed up importing the world map file. I'm looking for a balance between cost and import time to create an AMI.

@econandrew
Author

econandrew commented Apr 30, 2021 via email