# Implementing an Array-Based Timeseries Store in Postgres, Part 2

See Part 1 here for an explanation of what we're up to. In this installment, I'll be taking the approach to a much larger data store and seeing how the results scale.

## Data Setup and Recap of Part 1

The data is the same as in Part 1, except that we're now using a much larger data set, and we're only using the test2 table that I defined earlier. For a refresher, here's the original schema:
CREATE SCHEMA data;
CREATE TABLE data.generators (
id varchar primary key,
ptid int NOT NULL UNIQUE,
name varchar,
location point,
owner text,
address jsonb,
fields jsonb
);
CREATE TABLE data.rt_lbmp_generators(
generator varchar references data.generators(id) ON UPDATE CASCADE ON DELETE RESTRICT,
record_time timestamptz NOT NULL,
lbmp float,
losses float,
congestion float,
price_version int,
PRIMARY KEY(generator,record_time)
);
CREATE INDEX rt_lbmp_gen_record_time_idx ON data.rt_lbmp_generators (record_time);
CREATE INDEX rt_lbmp_gen_generator_idx ON data.rt_lbmp_generators (generator);
And here's the test2 table and summary data type we use:
CREATE TYPE summary AS (
nn_count int,
minimum double precision,
maximum double precision,
total double precision,
average double precision);
CREATE TABLE test2(
generator varchar REFERENCES data.generators(id),
record_range tstzrange,
lbmp summary,
losses summary,
congestion summary,
rt_lbmp data.rt_lbmp_generators[]);
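As a quick illustration of how these composite values get used later on, individual fields are read with the (column).field syntax; here's a small sketch against the test2 table above:

```sql
-- Pull individual fields out of the summary composite; the extra
-- parentheses distinguish composite-field access from table qualification.
SELECT generator, (lbmp).average AS avg_lbmp, (lbmp).nn_count AS n
FROM test2 LIMIT 5;
```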
To transfer the data from the original table to our new test2 table we'll use a query like this:
WITH t as (SELECT d.* FROM
(SELECT id FROM data.generators WHERE id NOT IN (
SELECT DISTINCT generator FROM test2) ORDER BY id LIMIT 50) g
INNER JOIN data.rt_lbmp_generators d ON d.generator=g.id)
INSERT INTO test2
SELECT generator, range_merge(to_range(min(record_time),null::tstzrange),
to_range(max(record_time), null::tstzrange)) as record_range,
(count(lbmp) , min(lbmp) , max(lbmp) , sum(lbmp) , avg(lbmp) )::summary as lbmp,
(count(losses) , min(losses) , max(losses) , sum(losses) , avg(losses))::summary as losses,
(count(congestion) , min(congestion) , max(congestion) , sum(congestion) , avg(congestion))::summary as congestion,
array_agg(t.*::data.rt_lbmp_generators) as rt_lbmp FROM t
GROUP BY generator, EXTRACT(year FROM record_time),EXTRACT(month from record_time);
We just run it multiple times until there's no more data left to transfer. Segmenting the work into batches of 50 generators means that if something goes wrong partway through, we lose one batch rather than the work of a single very long-running query.
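To know when to stop re-running the batch query, something like the following should work (a small sketch, assuming the schema above): it counts the generators that haven't been migrated yet, so we're done once it returns zero.

```sql
-- Generators still waiting to be transferred into test2.
SELECT count(*) AS remaining
FROM data.generators
WHERE id NOT IN (SELECT DISTINCT generator FROM test2);
```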
## Some Nice Aggregation Advantages

So now, if I wanted to give you an idea of what the data looked like, I might run a query like this over the original data:
WITH t as (SELECT * FROM data.rt_lbmp_generators)
SELECT generator, range_merge(to_range(min(record_time),null::tstzrange),
to_range(max(record_time), null::tstzrange)) as record_range,
count(*) as num_records FROM t GROUP BY generator;
Unfortunately, with the large number of records we have, this takes a long time: about 200 seconds!
HashAggregate (cost=5586985.16..5587087.66 rows=200 width=40) (actual time=198774.940..198837.819 rows=498 loops=1)
Group Key: t.generator
Buffers: shared hit=9400 read=1162389, temp written=864703
CTE t
-> Seq Scan on rt_lbmp_generators (cost=0.00..2275588.04 rows=110379904 width=51) (actual time=0.931..52191.970 rows=109080312 loops=1)
Buffers: shared hit=9400 read=1162389
-> CTE Scan on t (cost=0.00..2207598.08 rows=110379904 width=40) (actual time=0.937..133870.390 rows=109080312 loops=1)
Buffers: shared hit=9400 read=1162389, temp written=864703
Planning time: 0.167 ms
Execution time: 199128.589 ms
If we instead run a slightly modified version against our new table, taking advantage of the fact that we've already aggregated the counts inside our segments:
EXPLAIN (ANALYZE,BUFFERS) WITH t as (SELECT * FROM test2) SELECT generator, range_merge(to_range(min(lower(record_range)),null::tstzrange),
to_range(max(upper(record_range)), null::tstzrange)) as record_range,
sum((lbmp).nn_count) as num_records FROM t GROUP BY generator ORDER BY generator;
-------
Sort (cost=1125.14..1125.64 rows=200 width=96) (actual time=102.391..102.480 rows=498 loops=1)
Sort Key: t.generator
Sort Method: quicksort Memory: 69kB
Buffers: shared hit=434
CTE t
-> Seq Scan on test2 (cost=0.00..563.11 rows=12911 width=238) (actual time=0.073..10.604 rows=12911 loops=1)
Buffers: shared hit=434
-> HashAggregate (cost=451.89..554.39 rows=200 width=96) (actual time=36.896..100.448 rows=498 loops=1)
Group Key: t.generator
Buffers: shared hit=434
-> CTE Scan on t (cost=0.00..258.22 rows=12911 width=96) (actual time=0.080..22.479 rows=12911 loops=1)
Buffers: shared hit=434
Planning time: 0.374 ms
Execution time: 103.222 ms
There are probably better ways to get the count of the whole table, but this definitely makes things easier over the groups that we have. The whole point of this approach is that we're taking advantage of natural grouping in our tables (things we know about the data that the query planner doesn't) in order to make queries cheaper, and this is a nice little example of that.
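For instance, to get a total record count for the whole table, we never need to unnest the arrays at all; summing the pre-computed nn_count fields should be enough. A small sketch, assuming the test2 schema above:

```sql
-- Total number of non-null lbmp readings across all generators,
-- computed entirely from the pre-aggregated summary column.
SELECT sum((lbmp).nn_count) AS total_records FROM test2;
```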
generator | record_range | num_records |
---|---|---|
59TH STREET_GT_1 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
74TH STREET_GT_1 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
74TH STREET_GT_2 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
ADK HUDSON___FALLS | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
ADK_NYS___DAM | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
ADK RESOURCE___RCVRY | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
ADK S GLENS___FALLS | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
AIR_PRODUCTS___DRP | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
ALBANY___1 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
ALBANY___2 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
... | ... | ... |
TRIGEN___CC | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
UNION___PROCESSING_DRP | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
UPPER HUDSON___HYD | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
UPPER RAQUET___HYD | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
VISCHER___FERRY HYD | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
WADING RIVER_IC_1 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
WADING RIVER_IC_2 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
WADING RIVER_IC_3 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
WALDEN___HYDRO | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
WARRENSBURG____ | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
WATERSIDE___6 8 9 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
With 498 generators in total...basically the same as before, except instead of only having the first few generators, we now have all of them that had data available in the time range we're looking at. Additionally, the original benchmarks ran on my local machine, whereas these are running on a DigitalOcean instance (4 GB RAM, 2 CPUs, 60 GB SSD, to be specific), so don't compare the raw numbers between Part 1 and Part 2: they're on different machines. And we've gone from ~2 million rows to ~110 million rows in the table.
## Did it Compress?

In Part 1 we got very large compression, much of it coming from index size, which I think will shrink as a fraction of table size as the table grows, so we'd expect somewhat less compression this time:
relation | total_size |
---|---|
data.rt_lbmp_generators | 25 GB |
public.test2 | 1552 MB |
data.generators | 144 kB |
relation | size |
---|---|
data.rt_lbmp_generators | 9155 MB |
data.rt_lbmp_generators_pkey | 8689 MB |
data.rt_lbmp_gen_generator_idx | 4406 MB |
data.rt_lbmp_gen_record_time_idx | 3171 MB |
pg_toast.pg_toast_81927 | 1530 MB |
pg_toast.pg_toast_81927_index | 17 MB |
public.test2 | 3472 kB |
public.test2_record_range_idx | 840 kB |
public.test2_generator_idx | 456 kB |
So only a 16x decrease in size this time rather than two orders of magnitude; still not bad, though. I'm not sure how much this ratio will change based on the number of columns in the table and so on, but that would be an interesting topic to explore. Without indexes, we get a consistent 5-6x decrease in the size of the raw data from the compression that Postgres' TOAST (The Oversized Attribute Storage Technique) does natively. (If you'd like to read more about TOAST and Postgres storage in general, I recommend this and this as good intros.)
My guess is that the 5-6x size decrease is the long-term limit for the types of tables we're working with here.
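For reference, the totals in the first table above come from the usual catalog size functions; a query along these lines should reproduce them (a sketch; the exact query I ran may have differed slightly):

```sql
-- Total on-disk size (heap + indexes + TOAST) per ordinary table,
-- largest first. pg_total_relation_size includes indexes and TOAST.
SELECT c.oid::regclass AS relation,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname IN ('data', 'public')
  AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC;
```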
UPDATE: Here are the results from Jim Nasby's new, better size-estimation query.
oid | table_schema | table_name | row_estimate | total_bytes | index_bytes | toast_bytes | table_bytes | total | index | toast | table |
---|---|---|---|---|---|---|---|---|---|---|---|
49487 | data | rt_lbmp_generators | 1.02142e+08 | 26657562624 | 17055834112 | 8192 | 9601720320 | 25 GB | 16 GB | 8192 bytes | 9157 MB |
81927 | public | test2 | 12911 | 1627070464 | 1327104 | 1622163456 | 3579904 | 1552 MB | 1296 kB | 1547 MB | 3496 kB |
## But is it fast?

We'll use the same basic set of queries here as we did before.
EXPLAIN (ANALYZE,BUFFERS) SELECT record_time,lbmp,losses FROM data.rt_lbmp_generators
WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2014-07-21]'::tstzrange @> record_time;
------
Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators (cost=0.57..37564.55 rows=1152 width=24) (actual time=0.059..81.303 rows=21829 loops=1)
Index Cond: ((generator)::text = '74TH STREET_GT_1'::text)
Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange @> record_time)
Rows Removed by Filter: 197855
Buffers: shared hit=7157
Planning time: 0.169 ms
Execution time: 85.080 ms
Compared to:
EXPLAIN (ANALYZE,BUFFERS) SELECT (t.rt).record_time, (t.rt).lbmp, (t.rt).losses FROM (
SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test2 WHERE generator='74TH STREET_GT_1'
AND '[2014-05-07,2014-07-21]'::tstzrange && record_range) as t WHERE '[2014-05-07,2014-07-21]'::tstzrange @> (t.rt).record_time;
------
Subquery Scan on t (cost=0.29..8.25 rows=2 width=32) (actual time=1.384..34.257 rows=21829 loops=1)
Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
Rows Removed by Filter: 4929
Buffers: shared hit=60
-> Index Scan using test2_generator_idx on test2 (cost=0.29..4.50 rows=300 width=18) (actual time=1.377..12.938 rows=26758 loops=1)
Index Cond: ((generator)::text = '74TH STREET_GT_1'::text)
Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange && record_range)
Rows Removed by Filter: 23
Buffers: shared hit=60
Planning time: 0.212 ms
Execution time: 38.413 ms
So for the basic SELECT query we're still doing significantly better with our array-based store than otherwise. The real speedup happens with aggregates, though:
EXPLAIN (ANALYZE,BUFFERS) SELECT sum(lbmp) as total_lbmp, sum(losses) as total_losses FROM data.rt_lbmp_generators
WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2015-07-21]'::tstzrange @> record_time;
-----
Aggregate (cost=37570.31..37570.32 rows=1 width=16) (actual time=150.348..150.349 rows=1 loops=1)
Buffers: shared hit=7157
-> Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators (cost=0.57..37564.55 rows=1152 width=16) (actual time=0.046..108.721 rows=127615 loops=1)
Index Cond: ((generator)::text = '74TH STREET_GT_1'::text)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_time)
Rows Removed by Filter: 92069
Buffers: shared hit=7157
Planning time: 0.193 ms
Execution time: 150.407 ms
Compared to:
EXPLAIN (ANALYZE,BUFFERS) SELECT sum(b.lbmp) total_lbmp, sum(b.losses) total_losses FROM (
SELECT sum(lbmp) lbmp, sum(losses) losses FROM (
SELECT (lbmp).total as lbmp, (losses).total as losses FROM test2
WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2015-07-21]'::tstzrange @> record_range) as t1
UNION ALL
SELECT sum( (t.rt).lbmp) lbmp, sum((t.rt).losses) losses FROM (
SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test2
WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2015-07-21]'::tstzrange && record_range AND
record_range NOT IN (
SELECT record_range FROM test2 WHERE '[2014-05-07,2015-07-21]'::tstzrange @> record_range)) as t
WHERE '[2014-05-07,2015-07-21]'::tstzrange @> (t.rt).record_time) as b;
-----
Aggregate (cost=631.95..631.96 rows=1 width=16) (actual time=29.571..29.572 rows=1 loops=1)
Buffers: shared hit=482
-> Append (cost=3.07..631.94 rows=2 width=16) (actual time=0.068..29.564 rows=2 loops=1)
Buffers: shared hit=482
-> Aggregate (cost=3.07..3.08 rows=1 width=122) (actual time=0.068..0.068 rows=1 loops=1)
Buffers: shared hit=4
-> Index Scan using test2_generator_idx on test2 (cost=0.29..3.01 rows=13 width=122) (actual time=0.042..0.055 rows=13 loops=1)
Index Cond: ((generator)::text = '74TH STREET_GT_1'::text)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
Rows Removed by Filter: 13
Buffers: shared hit=4
-> Aggregate (cost=628.83..628.84 rows=1 width=32) (actual time=29.495..29.495 rows=1 loops=1)
Buffers: shared hit=478
-> Subquery Scan on t (cost=612.04..628.81 rows=4 width=32) (actual time=10.914..24.432 rows=13122 loops=1)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
Rows Removed by Filter: 4939
Buffers: shared hit=478
-> Index Scan using test2_generator_idx on test2 test2_1 (cost=612.04..618.81 rows=800 width=18) (actual time=10.904..16.656 rows=18061 loops=1)
Index Cond: ((generator)::text = '74TH STREET_GT_1'::text)
Filter: (('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange && record_range) AND (NOT (hashed SubPlan 1)))
Rows Removed by Filter: 24
Buffers: shared hit=478
SubPlan 1
-> Seq Scan on test2 test2_2 (cost=0.00..595.39 rows=6548 width=22) (actual time=0.006..5.727 rows=6461 loops=1)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
Rows Removed by Filter: 6450
Buffers: shared hit=434
Planning time: 0.431 ms
Execution time: 29.678 ms
## But what if we're not running the same query all day?

So here's the really interesting bit: what happens when we aren't selecting/aggregating the same thing over and over again? I chose a few different generators and ran the same queries to check that out, because I'd noticed that the first queries I ran tended to take longer than subsequent calls for the same query. All we've done here is change the name of the generator.
EXPLAIN (ANALYZE,BUFFERS) SELECT sum(lbmp) as total_lbmp, sum(losses) as total_losses FROM data.rt_lbmp_generators
WHERE generator='BLUE_CIRC_CHEM_DRP' AND '[2014-05-07,2015-07-21]'::tstzrange @> record_time;
-----
--First Run
Aggregate (cost=37570.31..37570.32 rows=1 width=16) (actual time=1730.634..1730.635 rows=1 loops=1)
Buffers: shared hit=3395 read=3800
-> Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators (cost=0.57..37564.55 rows=1152 width=16) (actual time=3.245..1683.733 rows=127615 loops=1)
Index Cond: ((generator)::text = 'BLUE_CIRC_CHEM_DRP'::text)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_time)
Rows Removed by Filter: 92069
Buffers: shared hit=3395 read=3800
Planning time: 0.150 ms
Execution time: 1730.808 ms
--Second Run
Aggregate (cost=37570.31..37570.32 rows=1 width=16) (actual time=158.710..158.711 rows=1 loops=1)
Buffers: shared hit=7195
-> Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators (cost=0.57..37564.55 rows=1152 width=16) (actual time=0.100..117.741 rows=127615 loops=1)
Index Cond: ((generator)::text = 'BLUE_CIRC_CHEM_DRP'::text)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_time)
Rows Removed by Filter: 92069
Buffers: shared hit=7195
Planning time: 0.378 ms
Execution time: 158.791 ms
So with the standard aggregation query, when we're no longer working from shared buffers, we see an order of magnitude speedup from the first to the second run of the same query, likely mostly due to disk read times. So let's see how our array-based system fares:
EXPLAIN (ANALYZE,BUFFERS) SELECT sum(b.lbmp) total_lbmp, sum(b.losses) total_losses FROM (
SELECT sum(lbmp) lbmp, sum(losses) losses FROM (
SELECT (lbmp).total as lbmp, (losses).total as losses FROM test2
WHERE generator='BLUE_CIRC_CHEM_DRP' AND '[2014-05-07,2015-07-21]'::tstzrange @> record_range) as t1
UNION ALL
SELECT sum( (t.rt).lbmp) lbmp, sum((t.rt).losses) losses FROM (
SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test2
WHERE generator='BLUE_CIRC_CHEM_DRP' AND '[2014-05-07,2015-07-21]'::tstzrange && record_range AND
record_range NOT IN (
SELECT record_range FROM test2 WHERE '[2014-05-07,2015-07-21]'::tstzrange @> record_range)) as t
WHERE '[2014-05-07,2015-07-21]'::tstzrange @> (t.rt).record_time) as b;
-----
--First Run
Aggregate (cost=631.95..631.96 rows=1 width=16) (actual time=57.715..57.715 rows=1 loops=1)
Buffers: shared hit=443 read=34
-> Append (cost=3.07..631.94 rows=2 width=16) (actual time=1.418..57.708 rows=2 loops=1)
Buffers: shared hit=443 read=34
-> Aggregate (cost=3.07..3.08 rows=1 width=122) (actual time=1.416..1.416 rows=1 loops=1)
Buffers: shared hit=2 read=1
-> Index Scan using test2_generator_idx on test2 (cost=0.29..3.01 rows=13 width=122) (actual time=1.354..1.388 rows=13 loops=1)
Index Cond: ((generator)::text = 'BLUE_CIRC_CHEM_DRP'::text)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
Rows Removed by Filter: 13
Buffers: shared hit=2 read=1
-> Aggregate (cost=628.83..628.84 rows=1 width=32) (actual time=56.288..56.289 rows=1 loops=1)
Buffers: shared hit=441 read=33
-> Subquery Scan on t (cost=612.04..628.81 rows=4 width=32) (actual time=31.516..50.543 rows=13122 loops=1)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
Rows Removed by Filter: 4939
Buffers: shared hit=441 read=33
-> Index Scan using test2_generator_idx on test2 test2_1 (cost=612.04..618.81 rows=800 width=18) (actual time=31.494..41.488 rows=18061 loops=1)
Index Cond: ((generator)::text = 'BLUE_CIRC_CHEM_DRP'::text)
Filter: (('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange && record_range) AND (NOT (hashed SubPlan 1)))
Rows Removed by Filter: 24
Buffers: shared hit=441 read=33
SubPlan 1
-> Seq Scan on test2 test2_2 (cost=0.00..595.39 rows=6548 width=22) (actual time=0.035..17.811 rows=6461 loops=1)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
Rows Removed by Filter: 6450
Buffers: shared hit=434
Planning time: 0.919 ms
Execution time: 58.051 ms
--Second Run
Aggregate (cost=631.95..631.96 rows=1 width=16) (actual time=29.849..29.849 rows=1 loops=1)
Buffers: shared hit=477
-> Append (cost=3.07..631.94 rows=2 width=16) (actual time=0.110..29.842 rows=2 loops=1)
Buffers: shared hit=477
-> Aggregate (cost=3.07..3.08 rows=1 width=122) (actual time=0.108..0.108 rows=1 loops=1)
Buffers: shared hit=3
-> Index Scan using test2_generator_idx on test2 (cost=0.29..3.01 rows=13 width=122) (actual time=0.064..0.086 rows=13 loops=1)
Index Cond: ((generator)::text = 'BLUE_CIRC_CHEM_DRP'::text)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
Rows Removed by Filter: 13
Buffers: shared hit=3
-> Aggregate (cost=628.83..628.84 rows=1 width=32) (actual time=29.729..29.729 rows=1 loops=1)
Buffers: shared hit=474
-> Subquery Scan on t (cost=612.04..628.81 rows=4 width=32) (actual time=11.059..24.372 rows=13122 loops=1)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
Rows Removed by Filter: 4939
Buffers: shared hit=474
-> Index Scan using test2_generator_idx on test2 test2_1 (cost=612.04..618.81 rows=800 width=18) (actual time=11.051..16.621 rows=18061 loops=1)
Index Cond: ((generator)::text = 'BLUE_CIRC_CHEM_DRP'::text)
Filter: (('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange && record_range) AND (NOT (hashed SubPlan 1)))
Rows Removed by Filter: 24
Buffers: shared hit=474
SubPlan 1
-> Seq Scan on test2 test2_2 (cost=0.00..595.39 rows=6548 width=22) (actual time=0.111..5.851 rows=6461 loops=1)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
Rows Removed by Filter: 6450
Buffers: shared hit=434
Planning time: 0.696 ms
Execution time: 30.022 ms
Well, that's nice: roughly a 30x speedup over the normal table when the query doesn't fully hit shared buffers. Buffers matter far less here because the table is much, much smaller. The part we run our first filter on, which isn't TOAST-ed, is tiny and likely stays in shared buffers; the per-row filters still have to run, but they work from the compressed TOAST-ed version, which (I think) is stored sequentially on disk and so should be relatively cheap to read anyway. I'll also note that I ran both of these queries with a few other generators substituted in, sometimes starting on the normal table and sometimes on the array-based one, and the results were consistent with those above (so there wasn't something weird going on where running the query on the first table pulled something into buffers for the second because of shared foreign keys or anything like that). We'd definitely expect something like this in the aggregation case. But what about the more normal select case? I wouldn't have expected the speedup to be quite as large.
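If you want to poke at how much each row's array actually compresses down to, pg_column_size reports the stored (post-TOAST, compressed) size of a value; a small sketch against the test2 table:

```sql
-- Stored size of each segment's compressed array alongside the number
-- of readings it holds, using the pre-aggregated count.
SELECT generator, record_range,
       pg_column_size(rt_lbmp) AS stored_bytes,
       (lbmp).nn_count AS num_points
FROM test2 LIMIT 5;
```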
EXPLAIN (ANALYZE,BUFFERS) SELECT record_time,lbmp,losses FROM data.rt_lbmp_generators
WHERE generator='CARTHAGE___PAPER' AND '[2014-05-07,2014-07-21]'::tstzrange @> record_time;
-----
--First Run
Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators (cost=0.57..37564.55 rows=1152 width=24) (actual time=2.246..1760.533 rows=21829 loops=1)
Index Cond: ((generator)::text = 'CARTHAGE___PAPER'::text)
Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange @> record_time)
Rows Removed by Filter: 197855
Buffers: shared hit=3300 read=3807
Planning time: 0.209 ms
Execution time: 1765.724 ms
--Second Run
Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators (cost=0.57..37564.55 rows=1152 width=24) (actual time=0.091..78.138 rows=21829 loops=1)
Index Cond: ((generator)::text = 'CARTHAGE___PAPER'::text)
Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange @> record_time)
Rows Removed by Filter: 197855
Buffers: shared hit=7107
Planning time: 0.217 ms
Execution time: 81.957 ms
Compared to:
EXPLAIN (ANALYZE,BUFFERS) SELECT (t.rt).record_time, (t.rt).lbmp, (t.rt).losses FROM (
SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test2 WHERE generator='CARTHAGE___PAPER'
AND '[2014-05-07,2014-07-21]'::tstzrange && record_range) as t WHERE '[2014-05-07,2014-07-21]'::tstzrange @> (t.rt).record_time;
-----
--First Run
Subquery Scan on t (cost=0.29..8.25 rows=2 width=32) (actual time=6.281..41.960 rows=21829 loops=1)
Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
Rows Removed by Filter: 4929
Buffers: shared hit=12 read=49
-> Index Scan using test2_generator_idx on test2 (cost=0.29..4.50 rows=300 width=18) (actual time=6.267..19.989 rows=26758 loops=1)
Index Cond: ((generator)::text = 'CARTHAGE___PAPER'::text)
Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange && record_range)
Rows Removed by Filter: 23
Buffers: shared hit=12 read=49
Planning time: 0.205 ms
Execution time: 46.844 ms
--Second Run
Subquery Scan on t (cost=0.29..8.25 rows=2 width=32) (actual time=1.361..29.782 rows=21829 loops=1)
Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
Rows Removed by Filter: 4929
Buffers: shared hit=61
-> Index Scan using test2_generator_idx on test2 (cost=0.29..4.50 rows=300 width=18) (actual time=1.341..11.730 rows=26758 loops=1)
Index Cond: ((generator)::text = 'CARTHAGE___PAPER'::text)
Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange && record_range)
Rows Removed by Filter: 23
Buffers: shared hit=61
Planning time: 0.249 ms
Execution time: 33.508 ms
But we got a huge speedup again, nearly 40x on the first run. So yeah. That ain't bad at all.
I'd love to hear thoughts/comments sparked by this: are these comparisons fair? What other use cases do you think would break this horribly? What are the most important steps to take to automate this sort of storage/querying? What sorts of queries would break this type of setup, or do you think would perform badly on it?
Here's a better version of the aggregate query we've been working on. The NOT IN clause had been bugging me for a bit: for some reason there isn't a range/array operator equivalent to !@>, i.e. "not contained by", which makes it hard to find the ranges that overlap, but are not fully contained by, the range of interest. However, I finally realized that I could use the union (+) operator and just check whether the result is the same as the range of interest, i.e. record_range + '[2014-05-07,2015-07-21]'::tstzrange <> '[2014-05-07,2015-07-21]'::tstzrange. It eliminates some rather annoying query planner behavior as well and speeds up the queries a bit.
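Concretely, the second branch of the UNION ALL can drop the NOT IN subquery in favor of that union check. A sketch of the rewritten query (I haven't reproduced the EXPLAIN output for this version here):

```sql
SELECT sum(b.lbmp) total_lbmp, sum(b.losses) total_losses FROM (
  SELECT sum(lbmp) lbmp, sum(losses) losses FROM (
    SELECT (lbmp).total AS lbmp, (losses).total AS losses FROM test2
    WHERE generator='74TH STREET_GT_1'
      AND '[2014-05-07,2015-07-21]'::tstzrange @> record_range) AS t1
  UNION ALL
  SELECT sum((t.rt).lbmp) lbmp, sum((t.rt).losses) losses FROM (
    SELECT unnest(rt_lbmp)::data.rt_lbmp_generators AS rt FROM test2
    WHERE generator='74TH STREET_GT_1'
      AND '[2014-05-07,2015-07-21]'::tstzrange && record_range
      -- segments that overlap but aren't fully contained: their union
      -- with the range of interest differs from the range of interest
      AND record_range + '[2014-05-07,2015-07-21]'::tstzrange
          <> '[2014-05-07,2015-07-21]'::tstzrange) AS t
  WHERE '[2014-05-07,2015-07-21]'::tstzrange @> (t.rt).record_time) AS b;
```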