# Implementing an Array-Based Timeseries Store in Postgres, Part 2

See Part 1 here for an explanation of what we're up to. In this installment, I'll be taking the approach to a much larger data store and seeing how the results scale.

## Data Setup and Recap of Part 1

The data is the same as in Part 1, except that we're now using a much larger data set, and we're only using the test2 table that I defined earlier. For a refresher, here's the original schema:
CREATE SCHEMA data;
CREATE TABLE data.generators (
id varchar primary key,
ptid int NOT NULL UNIQUE,
name varchar,
location point,
owner text,
address jsonb,
fields jsonb
);
CREATE TABLE data.rt_lbmp_generators(
generator varchar references data.generators(id) ON UPDATE CASCADE ON DELETE RESTRICT,
record_time timestamptz NOT NULL,
lbmp float,
losses float,
congestion float,
price_version int,
PRIMARY KEY(generator,record_time)
);
CREATE INDEX rt_lbmp_gen_record_time_idx ON data.rt_lbmp_generators (record_time);
CREATE INDEX rt_lbmp_gen_generator_idx ON data.rt_lbmp_generators (generator);
And here's the test2 table and summary data type we use:
CREATE TYPE summary AS (
nn_count int,
minimum double precision,
maximum double precision,
total double precision,
average double precision);
CREATE TABLE test2(
generator varchar REFERENCES data.generators(id),
record_range tstzrange,
lbmp summary,
losses summary,
congestion summary,
rt_lbmp data.rt_lbmp_generators[]);
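As a quick illustration of how these composite values get used later on, individual fields are read with the (column).field syntax; here's a small sketch against the test2 table above:

```sql
-- Pull individual fields out of the summary composite; the extra
-- parentheses distinguish composite-field access from table qualification.
SELECT generator, (lbmp).average AS avg_lbmp, (lbmp).nn_count AS n
FROM test2 LIMIT 5;
```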
To transfer the data from the original table to our new test2 table we'll use a query like this:
WITH t as (SELECT d.* FROM
(SELECT id FROM data.generators WHERE id NOT IN (
SELECT DISTINCT generator FROM test2) ORDER BY id LIMIT 50) g
INNER JOIN data.rt_lbmp_generators d ON d.generator=g.id)
INSERT INTO test2
SELECT generator, range_merge(to_range(min(record_time),null::tstzrange),
to_range(max(record_time), null::tstzrange)) as record_range,
(count(lbmp) , min(lbmp) , max(lbmp) , sum(lbmp) , avg(lbmp) )::summary as lbmp,
(count(losses) , min(losses) , max(losses) , sum(losses) , avg(losses))::summary as losses,
(count(congestion) , min(congestion) , max(congestion) , sum(congestion) , avg(congestion))::summary as congestion,
array_agg(t.*::data.rt_lbmp_generators) as rt_lbmp FROM t
GROUP BY generator, EXTRACT(year FROM record_time),EXTRACT(month from record_time);
We just run it multiple times until there's no more data left to transfer. Segmenting the work into batches of 50 generators means that if something goes wrong partway through, we lose one batch rather than the work of a single very long-running query.
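To know when to stop re-running the batch query, something like the following should work (a small sketch, assuming the schema above): it counts the generators that haven't been migrated yet, so we're done once it returns zero.

```sql
-- Generators still waiting to be transferred into test2.
SELECT count(*) AS remaining
FROM data.generators
WHERE id NOT IN (SELECT DISTINCT generator FROM test2);
```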
## Some Nice Aggregation Advantages

So now, if I wanted to give you an idea of what the data looked like, I might run a query like this over the original data:
WITH t as (SELECT * FROM data.rt_lbmp_generators)
SELECT generator, range_merge(to_range(min(record_time),null::tstzrange),
to_range(max(record_time), null::tstzrange)) as record_range,
count(*) as num_records FROM t GROUP BY generator;
Unfortunately, with the large number of records we have, this takes a long time: about 200 seconds!
HashAggregate (cost=5586985.16..5587087.66 rows=200 width=40) (actual time=198774.940..198837.819 rows=498 loops=1)
Group Key: t.generator
Buffers: shared hit=9400 read=1162389, temp written=864703
CTE t
-> Seq Scan on rt_lbmp_generators (cost=0.00..2275588.04 rows=110379904 width=51) (actual time=0.931..52191.970 rows=109080312 loops=1)
Buffers: shared hit=9400 read=1162389
-> CTE Scan on t (cost=0.00..2207598.08 rows=110379904 width=40) (actual time=0.937..133870.390 rows=109080312 loops=1)
Buffers: shared hit=9400 read=1162389, temp written=864703
Planning time: 0.167 ms
Execution time: 199128.589 ms
If we instead run a slightly modified version against our new table, taking advantage of the fact that we've already aggregated the counts inside our segments:
EXPLAIN (ANALYZE,BUFFERS) WITH t as (SELECT * FROM test2) SELECT generator, range_merge(to_range(min(lower(record_range)),null::tstzrange),
to_range(max(upper(record_range)), null::tstzrange)) as record_range,
sum((lbmp).nn_count) as num_records FROM t GROUP BY generator ORDER BY generator;
-------
Sort (cost=1125.14..1125.64 rows=200 width=96) (actual time=102.391..102.480 rows=498 loops=1)
Sort Key: t.generator
Sort Method: quicksort Memory: 69kB
Buffers: shared hit=434
CTE t
-> Seq Scan on test2 (cost=0.00..563.11 rows=12911 width=238) (actual time=0.073..10.604 rows=12911 loops=1)
Buffers: shared hit=434
-> HashAggregate (cost=451.89..554.39 rows=200 width=96) (actual time=36.896..100.448 rows=498 loops=1)
Group Key: t.generator
Buffers: shared hit=434
-> CTE Scan on t (cost=0.00..258.22 rows=12911 width=96) (actual time=0.080..22.479 rows=12911 loops=1)
Buffers: shared hit=434
Planning time: 0.374 ms
Execution time: 103.222 ms
There are probably better ways to get the count of the whole table, but this definitely makes things easier over the groups that we have. The whole point of this approach is that we're taking advantage of natural grouping in our tables (things we know about the data that the query planner doesn't) in order to make queries cheaper, and this is a nice little example of that.
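For instance, to get a total record count for the whole table, we never need to unnest the arrays at all; summing the pre-computed nn_count fields should be enough. A small sketch, assuming the test2 schema above:

```sql
-- Total number of non-null lbmp readings across all generators,
-- computed entirely from the pre-aggregated summary column.
SELECT sum((lbmp).nn_count) AS total_records FROM test2;
```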
generator | record_range | num_records |
---|---|---|
59TH STREET_GT_1 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
74TH STREET_GT_1 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
74TH STREET_GT_2 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
ADK HUDSON___FALLS | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
ADK_NYS___DAM | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
ADK RESOURCE___RCVRY | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
ADK S GLENS___FALLS | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
AIR_PRODUCTS___DRP | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
ALBANY___1 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
ALBANY___2 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
... | ... | ... |
TRIGEN___CC | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
UNION___PROCESSING_DRP | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
UPPER HUDSON___HYD | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
UPPER RAQUET___HYD | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
VISCHER___FERRY HYD | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
WADING RIVER_IC_1 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
WADING RIVER_IC_2 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
WADING RIVER_IC_3 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
WALDEN___HYDRO | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
WARRENSBURG____ | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
WATERSIDE___6 8 9 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684 |
With 498 generators in total...basically the same as before, except instead of only having the first few generators, we now have all of them that had data available in the time range we're looking at. Additionally, the original benchmarks ran on my local machine, whereas these are running on a DigitalOcean instance (4 GB RAM, 2 CPUs, 60 GB SSD, to be specific), so don't compare the raw numbers between Part 1 and Part 2: they're on different machines. And we've gone from ~2 million rows to ~110 million rows in the table.
## Did it Compress?

In Part 1 we got very large compression, much of it coming from index size, which I think will shrink as a fraction of table size as the table grows, so we'd expect somewhat less compression this time:
relation | total_size |
---|---|
data.rt_lbmp_generators | 25 GB |
public.test2 | 1552 MB |
data.generators | 144 kB |
relation | size |
---|---|
data.rt_lbmp_generators | 9155 MB |
data.rt_lbmp_generators_pkey | 8689 MB |
data.rt_lbmp_gen_generator_idx | 4406 MB |
data.rt_lbmp_gen_record_time_idx | 3171 MB |
pg_toast.pg_toast_81927 | 1530 MB |
pg_toast.pg_toast_81927_index | 17 MB |
public.test2 | 3472 kB |
public.test2_record_range_idx | 840 kB |
public.test2_generator_idx | 456 kB |
So only a 16x decrease in size this time rather than two orders of magnitude; still not bad, though. I'm not sure how much this ratio will change based on the number of columns in the table and so on, but that would be an interesting topic to explore. Without indexes, we get a consistent 5-6x decrease in the size of the raw data from the compression that Postgres' TOAST (The Oversized Attribute Storage Technique) does natively. (If you'd like to read more about TOAST and Postgres storage in general, I recommend this and this as good intros.)
My guess is that the 5-6x size decrease is the long-term limit for the types of tables we're working with here.
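For reference, the totals in the first table above come from the usual catalog size functions; a query along these lines should reproduce them (a sketch; the exact query I ran may have differed slightly):

```sql
-- Total on-disk size (heap + indexes + TOAST) per ordinary table,
-- largest first. pg_total_relation_size includes indexes and TOAST.
SELECT c.oid::regclass AS relation,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname IN ('data', 'public')
  AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC;
```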
UPDATE: Here are the results from Jim Nasby's new, better size-estimation query.
oid | table_schema | table_name | row_estimate | total_bytes | index_bytes | toast_bytes | table_bytes | total | index | toast | table |
---|---|---|---|---|---|---|---|---|---|---|---|
49487 | data | rt_lbmp_generators | 1.02142e+08 | 26657562624 | 17055834112 | 8192 | 9601720320 | 25 GB | 16 GB | 8192 bytes | 9157 MB |
81927 | public | test2 | 12911 | 1627070464 | 1327104 | 1622163456 | 3579904 | 1552 MB | 1296 kB | 1547 MB | 3496 kB |
## But is it fast?

We'll use the same basic set of queries here as we did before.
EXPLAIN (ANALYZE,BUFFERS) SELECT record_time,lbmp,losses FROM data.rt_lbmp_generators
WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2014-07-21]'::tstzrange @> record_time;
------
Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators (cost=0.57..37564.55 rows=1152 width=24) (actual time=0.059..81.303 rows=21829 loops=1)
Index Cond: ((generator)::text = '74TH STREET_GT_1'::text)
Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange @> record_time)
Rows Removed by Filter: 197855
Buffers: shared hit=7157
Planning time: 0.169 ms
Execution time: 85.080 ms
Compared to:
EXPLAIN (ANALYZE,BUFFERS) SELECT (t.rt).record_time, (t.rt).lbmp, (t.rt).losses FROM (
SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test2 WHERE generator='74TH STREET_GT_1'
AND '[2014-05-07,2014-07-21]'::tstzrange && record_range) as t WHERE '[2014-05-07,2014-07-21]'::tstzrange @> (t.rt).record_time;
------
Subquery Scan on t (cost=0.29..8.25 rows=2 width=32) (actual time=1.384..34.257 rows=21829 loops=1)
Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
Rows Removed by Filter: 4929
Buffers: shared hit=60
-> Index Scan using test2_generator_idx on test2 (cost=0.29..4.50 rows=300 width=18) (actual time=1.377..12.938 rows=26758 loops=1)
Index Cond: ((generator)::text = '74TH STREET_GT_1'::text)
Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange && record_range)
Rows Removed by Filter: 23
Buffers: shared hit=60
Planning time: 0.212 ms
Execution time: 38.413 ms
So for the basic SELECT query we're still doing significantly better with our array-based store than otherwise. The real speedup happens with aggregates, though:
EXPLAIN (ANALYZE,BUFFERS) SELECT sum(lbmp) as total_lbmp, sum(losses) as total_losses FROM data.rt_lbmp_generators
WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2015-07-21]'::tstzrange @> record_time;
-----
Aggregate (cost=37570.31..37570.32 rows=1 width=16) (actual time=150.348..150.349 rows=1 loops=1)
Buffers: shared hit=7157
-> Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators (cost=0.57..37564.55 rows=1152 width=16) (actual time=0.046..108.721 rows=127615 loops=1)
Index Cond: ((generator)::text = '74TH STREET_GT_1'::text)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_time)
Rows Removed by Filter: 92069
Buffers: shared hit=7157
Planning time: 0.193 ms
Execution time: 150.407 ms
Compared to:
EXPLAIN (ANALYZE,BUFFERS) SELECT sum(b.lbmp) total_lbmp, sum(b.losses) total_losses FROM (
SELECT sum(lbmp) lbmp, sum(losses) losses FROM (
SELECT (lbmp).total as lbmp, (losses).total as losses FROM test2
WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2015-07-21]'::tstzrange @> record_range) as t1
UNION ALL
SELECT sum( (t.rt).lbmp) lbmp, sum((t.rt).losses) losses FROM (
SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test2
WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2015-07-21]'::tstzrange && record_range AND
record_range NOT IN (
SELECT record_range FROM test2 WHERE '[2014-05-07,2015-07-21]'::tstzrange @> record_range)) as t
WHERE '[2014-05-07,2015-07-21]'::tstzrange @> (t.rt).record_time) as b;
-----
Aggregate (cost=631.95..631.96 rows=1 width=16) (actual time=29.571..29.572 rows=1 loops=1)
Buffers: shared hit=482
-> Append (cost=3.07..631.94 rows=2 width=16) (actual time=0.068..29.564 rows=2 loops=1)
Buffers: shared hit=482
-> Aggregate (cost=3.07..3.08 rows=1 width=122) (actual time=0.068..0.068 rows=1 loops=1)
Buffers: shared hit=4
-> Index Scan using test2_generator_idx on test2 (cost=0.29..3.01 rows=13 width=122) (actual time=0.042..0.055 rows=13 loops=1)
Index Cond: ((generator)::text = '74TH STREET_GT_1'::text)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
Rows Removed by Filter: 13
Buffers: shared hit=4
-> Aggregate (cost=628.83..628.84 rows=1 width=32) (actual time=29.495..29.495 rows=1 loops=1)
Buffers: shared hit=478
-> Subquery Scan on t (cost=612.04..628.81 rows=4 width=32) (actual time=10.914..24.432 rows=13122 loops=1)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
Rows Removed by Filter: 4939
Buffers: shared hit=478
-> Index Scan using test2_generator_idx on test2 test2_1 (cost=612.04..618.81 rows=800 width=18) (actual time=10.904..16.656 rows=18061 loops=1)
Index Cond: ((generator)::text = '74TH STREET_GT_1'::text)
Filter: (('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange && record_range) AND (NOT (hashed SubPlan 1)))
Rows Removed by Filter: 24
Buffers: shared hit=478
SubPlan 1
-> Seq Scan on test2 test2_2 (cost=0.00..595.39 rows=6548 width=22) (actual time=0.006..5.727 rows=6461 loops=1)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
Rows Removed by Filter: 6450
Buffers: shared hit=434
Planning time: 0.431 ms
Execution time: 29.678 ms
## But what if we're not running the same query all day?

So here's the really interesting bit: what happens when we aren't selecting/aggregating the same thing over and over again? I chose a few different generators and ran the same queries to check that out, because I'd noticed that the first queries I ran tended to take longer than subsequent calls for the same query. All we've done here is change the name of the generator.
EXPLAIN (ANALYZE,BUFFERS) SELECT sum(lbmp) as total_lbmp, sum(losses) as total_losses FROM data.rt_lbmp_generators
WHERE generator='BLUE_CIRC_CHEM_DRP' AND '[2014-05-07,2015-07-21]'::tstzrange @> record_time;
-----
--First Run
Aggregate (cost=37570.31..37570.32 rows=1 width=16) (actual time=1730.634..1730.635 rows=1 loops=1)
Buffers: shared hit=3395 read=3800
-> Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators (cost=0.57..37564.55 rows=1152 width=16) (actual time=3.245..1683.733 rows=127615 loops=1)
Index Cond: ((generator)::text = 'BLUE_CIRC_CHEM_DRP'::text)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_time)
Rows Removed by Filter: 92069
Buffers: shared hit=3395 read=3800
Planning time: 0.150 ms
Execution time: 1730.808 ms
--Second Run
Aggregate (cost=37570.31..37570.32 rows=1 width=16) (actual time=158.710..158.711 rows=1 loops=1)
Buffers: shared hit=7195
-> Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators (cost=0.57..37564.55 rows=1152 width=16) (actual time=0.100..117.741 rows=127615 loops=1)
Index Cond: ((generator)::text = 'BLUE_CIRC_CHEM_DRP'::text)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_time)
Rows Removed by Filter: 92069
Buffers: shared hit=7195
Planning time: 0.378 ms
Execution time: 158.791 ms
So with the standard aggregation query, when we're no longer working from shared buffers, we see an order of magnitude speedup from the first to the second run of the same query, likely mostly due to disk read times. So let's see how our array-based system fares:
EXPLAIN (ANALYZE,BUFFERS) SELECT sum(b.lbmp) total_lbmp, sum(b.losses) total_losses FROM (
SELECT sum(lbmp) lbmp, sum(losses) losses FROM (
SELECT (lbmp).total as lbmp, (losses).total as losses FROM test2
WHERE generator='BLUE_CIRC_CHEM_DRP' AND '[2014-05-07,2015-07-21]'::tstzrange @> record_range) as t1
UNION ALL
SELECT sum( (t.rt).lbmp) lbmp, sum((t.rt).losses) losses FROM (
SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test2
WHERE generator='BLUE_CIRC_CHEM_DRP' AND '[2014-05-07,2015-07-21]'::tstzrange && record_range AND
record_range NOT IN (
SELECT record_range FROM test2 WHERE '[2014-05-07,2015-07-21]'::tstzrange @> record_range)) as t
WHERE '[2014-05-07,2015-07-21]'::tstzrange @> (t.rt).record_time) as b;
-----
--First Run
Aggregate (cost=631.95..631.96 rows=1 width=16) (actual time=57.715..57.715 rows=1 loops=1)
Buffers: shared hit=443 read=34
-> Append (cost=3.07..631.94 rows=2 width=16) (actual time=1.418..57.708 rows=2 loops=1)
Buffers: shared hit=443 read=34
-> Aggregate (cost=3.07..3.08 rows=1 width=122) (actual time=1.416..1.416 rows=1 loops=1)
Buffers: shared hit=2 read=1
-> Index Scan using test2_generator_idx on test2 (cost=0.29..3.01 rows=13 width=122) (actual time=1.354..1.388 rows=13 loops=1)
Index Cond: ((generator)::text = 'BLUE_CIRC_CHEM_DRP'::text)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
Rows Removed by Filter: 13
Buffers: shared hit=2 read=1
-> Aggregate (cost=628.83..628.84 rows=1 width=32) (actual time=56.288..56.289 rows=1 loops=1)
Buffers: shared hit=441 read=33
-> Subquery Scan on t (cost=612.04..628.81 rows=4 width=32) (actual time=31.516..50.543 rows=13122 loops=1)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
Rows Removed by Filter: 4939
Buffers: shared hit=441 read=33
-> Index Scan using test2_generator_idx on test2 test2_1 (cost=612.04..618.81 rows=800 width=18) (actual time=31.494..41.488 rows=18061 loops=1)
Index Cond: ((generator)::text = 'BLUE_CIRC_CHEM_DRP'::text)
Filter: (('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange && record_range) AND (NOT (hashed SubPlan 1)))
Rows Removed by Filter: 24
Buffers: shared hit=441 read=33
SubPlan 1
-> Seq Scan on test2 test2_2 (cost=0.00..595.39 rows=6548 width=22) (actual time=0.035..17.811 rows=6461 loops=1)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
Rows Removed by Filter: 6450
Buffers: shared hit=434
Planning time: 0.919 ms
Execution time: 58.051 ms
--Second Run
Aggregate (cost=631.95..631.96 rows=1 width=16) (actual time=29.849..29.849 rows=1 loops=1)
Buffers: shared hit=477
-> Append (cost=3.07..631.94 rows=2 width=16) (actual time=0.110..29.842 rows=2 loops=1)
Buffers: shared hit=477
-> Aggregate (cost=3.07..3.08 rows=1 width=122) (actual time=0.108..0.108 rows=1 loops=1)
Buffers: shared hit=3
-> Index Scan using test2_generator_idx on test2 (cost=0.29..3.01 rows=13 width=122) (actual time=0.064..0.086 rows=13 loops=1)
Index Cond: ((generator)::text = 'BLUE_CIRC_CHEM_DRP'::text)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
Rows Removed by Filter: 13
Buffers: shared hit=3
-> Aggregate (cost=628.83..628.84 rows=1 width=32) (actual time=29.729..29.729 rows=1 loops=1)
Buffers: shared hit=474
-> Subquery Scan on t (cost=612.04..628.81 rows=4 width=32) (actual time=11.059..24.372 rows=13122 loops=1)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
Rows Removed by Filter: 4939
Buffers: shared hit=474
-> Index Scan using test2_generator_idx on test2 test2_1 (cost=612.04..618.81 rows=800 width=18) (actual time=11.051..16.621 rows=18061 loops=1)
Index Cond: ((generator)::text = 'BLUE_CIRC_CHEM_DRP'::text)
Filter: (('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange && record_range) AND (NOT (hashed SubPlan 1)))
Rows Removed by Filter: 24
Buffers: shared hit=474
SubPlan 1
-> Seq Scan on test2 test2_2 (cost=0.00..595.39 rows=6548 width=22) (actual time=0.111..5.851 rows=6461 loops=1)
Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
Rows Removed by Filter: 6450
Buffers: shared hit=434
Planning time: 0.696 ms
Execution time: 30.022 ms
Well, that's nice: roughly a 30x speedup over the normal table when the query doesn't fully hit shared buffers. Buffers matter far less here because the table is much, much smaller. The part we run our first filter on, which isn't TOAST-ed, is tiny and likely stays in shared buffers; the per-row filters still have to run, but they work from the compressed TOAST-ed version, which (I think) is stored sequentially on disk and so should be relatively cheap to read anyway. I'll also note that I ran both of these queries with a few other generators substituted in, sometimes starting on the normal table and sometimes on the array-based one, and the results were consistent with those above (so there wasn't something weird going on where running the query on the first table pulled something into buffers for the second because of shared foreign keys or anything like that). We'd definitely expect something like this in the aggregation case. But what about the more normal select case? I wouldn't have expected the speedup to be quite as large.
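If you want to poke at how much each row's array actually compresses down to, pg_column_size reports the stored (post-TOAST, compressed) size of a value; a small sketch against the test2 table:

```sql
-- Stored size of each segment's compressed array alongside the number
-- of readings it holds, using the pre-aggregated count.
SELECT generator, record_range,
       pg_column_size(rt_lbmp) AS stored_bytes,
       (lbmp).nn_count AS num_points
FROM test2 LIMIT 5;
```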
EXPLAIN (ANALYZE,BUFFERS) SELECT record_time,lbmp,losses FROM data.rt_lbmp_generators
WHERE generator='CARTHAGE___PAPER' AND '[2014-05-07,2014-07-21]'::tstzrange @> record_time;
-----
--First Run
Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators (cost=0.57..37564.55 rows=1152 width=24) (actual time=2.246..1760.533 rows=21829 loops=1)
Index Cond: ((generator)::text = 'CARTHAGE___PAPER'::text)
Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange @> record_time)
Rows Removed by Filter: 197855
Buffers: shared hit=3300 read=3807
Planning time: 0.209 ms
Execution time: 1765.724 ms
--Second Run
Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators (cost=0.57..37564.55 rows=1152 width=24) (actual time=0.091..78.138 rows=21829 loops=1)
Index Cond: ((generator)::text = 'CARTHAGE___PAPER'::text)
Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange @> record_time)
Rows Removed by Filter: 197855
Buffers: shared hit=7107
Planning time: 0.217 ms
Execution time: 81.957 ms
Compared to:
EXPLAIN (ANALYZE,BUFFERS) SELECT (t.rt).record_time, (t.rt).lbmp, (t.rt).losses FROM (
SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test2 WHERE generator='CARTHAGE___PAPER'
AND '[2014-05-07,2014-07-21]'::tstzrange && record_range) as t WHERE '[2014-05-07,2014-07-21]'::tstzrange @> (t.rt).record_time;
-----
--First Run
Subquery Scan on t (cost=0.29..8.25 rows=2 width=32) (actual time=6.281..41.960 rows=21829 loops=1)
Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
Rows Removed by Filter: 4929
Buffers: shared hit=12 read=49
-> Index Scan using test2_generator_idx on test2 (cost=0.29..4.50 rows=300 width=18) (actual time=6.267..19.989 rows=26758 loops=1)
Index Cond: ((generator)::text = 'CARTHAGE___PAPER'::text)
Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange && record_range)
Rows Removed by Filter: 23
Buffers: shared hit=12 read=49
Planning time: 0.205 ms
Execution time: 46.844 ms
--Second Run
Subquery Scan on t (cost=0.29..8.25 rows=2 width=32) (actual time=1.361..29.782 rows=21829 loops=1)
Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
Rows Removed by Filter: 4929
Buffers: shared hit=61
-> Index Scan using test2_generator_idx on test2 (cost=0.29..4.50 rows=300 width=18) (actual time=1.341..11.730 rows=26758 loops=1)
Index Cond: ((generator)::text = 'CARTHAGE___PAPER'::text)
Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange && record_range)
Rows Removed by Filter: 23
Buffers: shared hit=61
Planning time: 0.249 ms
Execution time: 33.508 ms
But we got a huge speedup again, nearly 40x on the first run. So yeah. That ain't bad at all.
I'd love to hear thoughts/comments sparked by this: are these comparisons fair? What other use cases do you think would break this horribly? What are the most important steps to take to automate this sort of storage/querying? What sorts of queries would break this type of setup, or do you think would perform badly on it?
Here's a better version of the aggregate query we've been working on. The NOT IN clause had been bugging me for a bit: for some reason there isn't a range/array operator equivalent to !@>, i.e. "not contained by", which makes it hard to find the ranges that overlap, but are not fully contained by, the range of interest. However, I finally realized that I could use the union (+) operator and just check whether the result is the same as the range of interest, i.e. record_range + '[2014-05-07,2015-07-21]'::tstzrange <> '[2014-05-07,2015-07-21]'::tstzrange. It eliminates some rather annoying query planner behavior as well and speeds up the queries a bit.
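Concretely, the second branch of the UNION ALL can drop the NOT IN subquery in favor of that union check. A sketch of the rewritten query (I haven't reproduced the EXPLAIN output for this version here):

```sql
SELECT sum(b.lbmp) total_lbmp, sum(b.losses) total_losses FROM (
  SELECT sum(lbmp) lbmp, sum(losses) losses FROM (
    SELECT (lbmp).total AS lbmp, (losses).total AS losses FROM test2
    WHERE generator='74TH STREET_GT_1'
      AND '[2014-05-07,2015-07-21]'::tstzrange @> record_range) AS t1
  UNION ALL
  SELECT sum((t.rt).lbmp) lbmp, sum((t.rt).losses) losses FROM (
    SELECT unnest(rt_lbmp)::data.rt_lbmp_generators AS rt FROM test2
    WHERE generator='74TH STREET_GT_1'
      AND '[2014-05-07,2015-07-21]'::tstzrange && record_range
      -- segments that overlap but aren't fully contained: their union
      -- with the range of interest differs from the range of interest
      AND record_range + '[2014-05-07,2015-07-21]'::tstzrange
          <> '[2014-05-07,2015-07-21]'::tstzrange) AS t
  WHERE '[2014-05-07,2015-07-21]'::tstzrange @> (t.rt).record_time) AS b;
```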