Adhika Setya Pramudita

Adhika Setya Pramudita

Collection of thoughts

20 Feb 2022

Exploring TimescaleDB

This is a simple exploration of TimescaleDB, a time-series database.

See the code in adhikasp/timescaledb-exploration.

Comparison between databases

FeaturesMySQL 5.7MySQL 8.xTimeScaleDBScyllaDB
Atomic transactionYesYesYesNo
Instant alterNoYesYesYes
Background indexingNoNoYesYes
Parallel scan queryNoNoYesYes
Data compactionNoNoYesYes
Data TTLNoNoYesYes
Analytics-like queryNoNoYes, continuous aggregateYes, realtime query
Multi node shardingNoNoYes, with master nodeYes, fully peer-to-peer
Multiple indexesYesYesYesKinda, use GSI / materialized view

Query time

1SELECT sum(total_fee), count(*) from transactions WHERE merchant_id = 'merchant-1';
DatabaseTime
Mysql 8.06 min 52.35 sec
TimescaleDB (raw table query)0 min 27 sec
TimescaleDB (continuous aggregate)35 ms

Setup

You must have this installed before hand.

sudo apt-get install -y postgresql-client
pip3 install progressbar2 faker
go get github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy

Starting postgresql with timescaledb extension

docker-compose up -d

Create table and populate data

./command.sh

1 million dataset

CSV data generation 2:17
Insert 3:52

Raw data size 314MB
DB size 1.0GB
postgresql total size 2.1GB

10 million dataset

CSV data generation 21:53
DB insert 06:50:47

Raw data size 3.1G
DB size 8.95GB
postgresql total size 10GB

10 million dataset 2nd run

CSV data generation 22:15
DB Insert 3:28:49

Raw data size 3.1G
DB size 10.8G
postgresql total size 12G

798 ingestion / second

10 million dataset 3rd run

Using timescaledb-parallel-copy

CSV data 12:00
DB Insert 26:24

Raw data size 3.1G
DB size 10.8G
postgresql total size 12G

6243.59/sec (overall)

10 million dataset 4th run

Using timescale managed db

DB insert 12:28
row rate 13291.66/sec (overall)

30 million dataset

DB size 33.8G
DB insert 1:33:0
row rate 5368.28/sec (overall)

Query direct hypertables vs continous aggregate

exploration=# SELECT sum(total_fee), count(*) from transactions WHERE merchant_id = 'merchant-1';
    sum     |  count
------------+---------
 3634888528 | 7997931
(1 row)

Time: 27095.767 ms (00:27.096)
exploration=# SELECT sum(total_fee), sum(transaction_count) from transactions_daily WHERE merchant_id = 'merchant-1';
    sum     |   sum
------------+---------
 3634888528 | 7997931
(1 row)

Time: 35.236 ms

Query plan


                                                                           QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=576147.59..576147.60 rows=1 width=16) (actual time=25809.234..25890.886 rows=1 loops=1)
   ->  Gather  (cost=576147.47..576147.58 rows=1 width=16) (actual time=25807.713..25890.863 rows=2 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Partial Aggregate  (cost=575147.47..575147.48 rows=1 width=16) (actual time=25647.979..25647.988 rows=1 loops=2)
               ->  Parallel Append  (cost=0.00..551612.94 rows=4706906 width=4) (actual time=3387.836..25328.005 rows=3998966 loops=2)
                     ->  Parallel Seq Scan on _hyper_1_6_chunk  (cost=0.00..41138.08 rows=367138 width=4) (actual time=3432.692..7069.980 rows=623089 loops=1)
                           Filter: (merchant_id = 'merchant-1'::text)
                           Rows Removed by Filter: 155930
                     ->  Parallel Seq Scan on _hyper_1_7_chunk  (cost=0.00..41111.36 rows=366399 width=4) (actual time=2.823..4066.629 rows=622395 loops=1)
                           Filter: (merchant_id = 'merchant-1'::text)
                           Rows Removed by Filter: 156118
                     ->  Parallel Seq Scan on _hyper_1_10_chunk  (cost=0.00..41100.82 rows=366554 width=4) (actual time=3.176..3555.384 rows=623210 loops=1)
                           Filter: (merchant_id = 'merchant-1'::text)
                           Rows Removed by Filter: 155094
                     ->  Parallel Seq Scan on _hyper_1_8_chunk  (cost=0.00..41099.74 rows=366774 width=4) (actual time=4.211..3651.592 rows=622577 loops=1)
                           Filter: (merchant_id = 'merchant-1'::text)
                           Rows Removed by Filter: 155716
                     ->  Parallel Seq Scan on _hyper_1_11_chunk  (cost=0.00..41084.60 rows=365761 width=4) (actual time=2.423..3705.161 rows=621761 loops=1)
                           Filter: (merchant_id = 'merchant-1'::text)
                           Rows Removed by Filter: 156240
                     ->  Parallel Seq Scan on _hyper_1_4_chunk  (cost=0.00..41084.55 rows=366835 width=4) (actual time=4.753..1680.699 rows=311234 loops=2)
                           Filter: (merchant_id = 'merchant-1'::text)
                           Rows Removed by Filter: 77764
                     ->  Parallel Seq Scan on _hyper_1_9_chunk  (cost=0.00..41075.29 rows=365885 width=4) (actual time=1.641..3825.738 rows=622025 loops=1)
                           Filter: (merchant_id = 'merchant-1'::text)
                           Rows Removed by Filter: 155798
                     ->  Parallel Seq Scan on _hyper_1_5_chunk  (cost=0.00..41055.49 rows=365629 width=4) (actual time=1.191..3863.313 rows=621332 loops=1)
                           Filter: (merchant_id = 'merchant-1'::text)
                           Rows Removed by Filter: 156110
                     ->  Parallel Seq Scan on _hyper_1_1_chunk  (cost=0.00..41050.95 rows=365555 width=4) (actual time=0.046..3474.942 rows=621695 loops=1)
                           Filter: (merchant_id = 'merchant-1'::text)
                           Rows Removed by Filter: 155674
                     ->  Parallel Seq Scan on _hyper_1_3_chunk  (cost=0.00..41039.31 rows=365954 width=4) (actual time=1.335..3160.270 rows=621807 loops=1)
                           Filter: (merchant_id = 'merchant-1'::text)
                           Rows Removed by Filter: 155339
                     ->  Parallel Seq Scan on _hyper_1_13_chunk  (cost=0.00..41011.42 rows=365985 width=4) (actual time=1.372..3259.743 rows=621492 loops=1)
                           Filter: (merchant_id = 'merchant-1'::text)
                           Rows Removed by Filter: 155125
                     ->  Parallel Seq Scan on _hyper_1_2_chunk  (cost=0.00..41011.41 rows=364513 width=4) (actual time=0.708..3329.581 rows=620973 loops=1)
                           Filter: (merchant_id = 'merchant-1'::text)
                           Rows Removed by Filter: 155643
                     ->  Parallel Seq Scan on _hyper_1_12_chunk  (cost=0.00..23502.49 rows=209699 width=4) (actual time=0.040..133.619 rows=356103 loops=1)
                           Filter: (merchant_id = 'merchant-1'::text)
                           Rows Removed by Filter: 88956
                     ->  Parallel Seq Scan on _hyper_1_14_chunk  (cost=0.00..11712.90 rows=104225 width=4) (actual time=3342.976..3405.834 rows=177004 loops=1)
                           Filter: (merchant_id = 'merchant-1'::text)
                           Rows Removed by Filter: 44799
 Planning Time: 2.068 ms
 JIT:
   Functions: 118
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 55.473 ms, Inlining 145.139 ms, Optimization 4045.574 ms, Emission 2584.786 ms, Total 6830.972 ms
 Execution Time: 25919.706 ms
(54 rows)




                                                                                                                    QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=395.86..395.87 rows=1 width=64) (actual time=3.981..3.984 rows=1 loops=1)
   ->  Append  (cost=86.96..393.86 rows=398 width=16) (actual time=3.064..3.917 rows=460 loops=1)
         ->  Subquery Scan on "*SELECT* 1"  (cost=86.96..91.96 rows=200 width=16) (actual time=3.063..3.388 rows=455 loops=1)
               ->  HashAggregate  (cost=86.96..89.96 rows=200 width=115) (actual time=3.062..3.309 rows=455 loops=1)
                     Group Key: _materialized_hypertable_2.merchant_id, _materialized_hypertable_2.store_id, _materialized_hypertable_2.bucket
                     Batches: 1  Memory Usage: 1581kB
                     ->  Custom Scan (ChunkAppend) on _materialized_hypertable_2  (cost=0.28..81.27 rows=455 width=45) (actual time=0.041..1.165 rows=455 loops=1)
                           Chunks excluded during startup: 0
                           ->  Index Scan using _hyper_2_15_chunk__materialized_hypertable_2_merchant_id_bucket on _hyper_2_15_chunk  (cost=0.28..23.53 rows=130 width=45) (actual time=0.041..0.244 rows=130 loops=1)
                                 Index Cond: ((merchant_id = 'merchant-1'::text) AND (bucket < COALESCE(_timescaledb_internal.to_timestamp_without_timezone(_timescaledb_internal.cagg_watermark(2)), '-infinity'::timestamp without time zone)))
                           ->  Index Scan using _hyper_2_16_chunk__materialized_hypertable_2_merchant_id_bucket on _hyper_2_16_chunk  (cost=0.29..57.74 rows=325 width=45) (actual time=0.044..0.839 rows=325 loops=1)
                                 Index Cond: ((merchant_id = 'merchant-1'::text) AND (bucket < COALESCE(_timescaledb_internal.to_timestamp_without_timezone(_timescaledb_internal.cagg_watermark(2)), '-infinity'::timestamp without time zone)))
         ->  Subquery Scan on "*SELECT* 2"  (cost=295.46..299.92 rows=198 width=16) (actual time=0.464..0.469 rows=5 loops=1)
               ->  HashAggregate  (cost=295.46..297.94 rows=198 width=115) (actual time=0.463..0.467 rows=5 loops=1)
                     Group Key: transactions.merchant_id, transactions.store_id, time_bucket('1 day'::interval, transactions.transaction_updated_time)
                     Batches: 1  Memory Usage: 40kB
                     ->  Custom Scan (ChunkAppend) on transactions  (cost=0.43..292.99 rows=198 width=31) (actual time=0.041..0.385 rows=112 loops=1)
                           Chunks excluded during startup: 13
                           ->  Index Scan using _hyper_1_14_chunk_transactions_transaction_updated_time_idx on _hyper_1_14_chunk  (cost=0.42..258.04 rows=185 width=31) (actual time=0.039..0.355 rows=112 loops=1)
                                 Index Cond: (transaction_updated_time >= COALESCE(_timescaledb_internal.to_timestamp_without_timezone(_timescaledb_internal.cagg_watermark(2)), '-infinity'::timestamp without time zone))
                                 Filter: (merchant_id = 'merchant-1'::text)
                                 Rows Removed by Filter: 30
 Planning Time: 9.761 ms
 Execution Time: 4.956 ms
(24 rows)

Alter table

 exploration=# ALTER TABLE transactions ADD COLUMN fee_10 int NULL;
ALTER TABLE
Time: 29.147 ms


exploration=# DROP MATERIALIZED VIEW transactions_daily;
NOTICE:  drop cascades to 2 other objects
DETAIL:  drop cascades to table _timescaledb_internal._hyper_2_15_chunk
drop cascades to table _timescaledb_internal._hyper_2_16_chunk
DROP MATERIALIZED VIEW
Time: 75.336 ms

exploration=# CREATE MATERIALIZED VIEW transactions_daily
WITH (timescaledb.continuous)
AS
SELECT
   time_bucket('1 day', transaction_updated_time) as bucket,
   merchant_id,
   store_id,

   sum(fee_1) as fee_1,
   sum(fee_2) as fee_2,
   sum(fee_3) as fee_3,
   sum(fee_4) as fee_4,
   sum(fee_5) as fee_5,
   sum(fee_6) as fee_6,
   sum(fee_7) as fee_7,
   sum(fee_8) as fee_8,
   sum(fee_9) as fee_9,
   sum(fee_10) as fee_10,
   sum(total_fee) as total_fee,

   count(*) as transaction_count
FROM
   transactions
GROUP BY merchant_id, store_id, bucket
WITH NO DATA;
CREATE MATERIALIZED VIEW
Time: 88.822 ms


exploration=# CALL refresh_continuous_aggregate('transactions_daily', NULL, NULL);

CALL
Time: 71074.520 ms (01:11.075)

Compare

exploration=# select avg(total_fee) from transactions;
         avg
----------------------
 454.5304017000000000
(1 row)
Time: 38492.480 ms (00:38.492)

exploration=# select sum(total_fee)/sum(transaction_count) from transactions_daily;
       ?column?
----------------------
 454.5304017000000000
(1 row)
Time: 169.747 ms

exploration=# select count(*) from transactions;
  count
----------
 10000000
(1 row)
Time: 14364.812 ms (00:14.365)

exploration=# select count(*) from transactions_daily;
 count
-------
  6368
(1 row)
Time: 88.880 ms

Compression

Before 3 month uncompressed => 10GB

After 1 month uncompressed + 2 month compressed => 5GB

exploration=# SELECT * FROM transactions WHERE transaction_id = '7757fd62-8bfe-11ec-ab4b-63d923f6a2a2' and transaction_updated_time > now() - interval '14 day';

exploration=# SELECT * FROM transactions WHERE transaction_id = '7757fd62-8bfe-11ec-ab4b-63d923f6a2a2';
Time: 513.461 ms

exploration=# SELECT * FROM transactions WHERE transaction_id = '65cb27e0-8bfd-11ec-ab4b-63d923f6a2a2';
Time: 383.735 ms

exploration=# SELECT sum(fee_1) FROM transactions WHERE transaction_updated_time < now() - interval '2 month';
    sum
-----------
 158311810
(1 row)

Time: 877.993 ms
exploration=# SELECT sum(fee_1) FROM transactions WHERE transaction_updated_time > now() - inte
rval '1 month';
    sum
-----------
 172811617
(1 row)

Time: 18492.094 ms (00:18.492)

Analyze visual

psql -U postgres -h localhost -p 5433 -d exploration -qAt -c “EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) SELECT store_id, sum(total_fee) FROM transactions WHERE merchant_id = ‘merchant-2’ AND transaction_updated_time > NOW() - INTERVAL ‘1 month’ GROUP BY store_id;”

psql -U postgres -h localhost -p 5433 -d exploration -qAt -c “EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) SELECT store_id, sum(total_fee) FROM transactions_daily WHERE merchant_id = ‘merchant-2’ AND bucket > NOW() - INTERVAL ‘1 month’ GROUP BY store_id;”

http://tatiyants.com/pev/#/plans/new

Creating new aggregate

30 million data

exploration=# CREATE MATERIALIZED VIEW transactions_weekly_avg
WITH (timescaledb.continuous)
AS
SELECT
   time_bucket('7 day', transaction_updated_time) as bucket,
   merchant_id,
   store_id,

   avg(total_fee) as total_fee
FROM
   transactions
GROUP BY merchant_id, store_id, bucket
WITH DATA;
NOTICE:  refreshing continuous aggregate "transactions_weekly_avg"
HINT:  Use WITH NO DATA if you do not want to refresh the continuous aggregate on creation.
CREATE MATERIALIZED VIEW
Time: 261120.502 ms (04:21.121)

Update query

exploration=# UPDATE transactions SET transaction_updated_time = now() WHERE transaction_id = 'd7dd00ea-8c43-11ec-ab4b-63d923f6a2a2' AND transaction_updated_time > '2022-02-01';
ERROR:  new row for relation "_hyper_1_16_chunk" violates check constraint "constraint_16"
DETAIL:  Failing row contains (2022-02-13 02:58:32.388652, 2022-02-20 19:02:57.249809, 25200, d7dd00ea-8c43-11ec-ab4b-63d923f6a2a2, d7dd00eb-8c43-11ec-ab4b-63d923f6a2a2, d7dd00ec-8c43-11ec-ab4b-63d923f6a2a2, d7dd00ed-8c43-11ec-ab4b-63d923f6a2a2, d7dd00ee-8c43-11ec-ab4b-63d923f6a2a2, d7dd00ef-8c43-11ec-ab4b-63d923f6a2a2, merchant-1, store-1, 1, 1, 1, 2, 1, 4, 62, 80, 41, 73, 100, 43, 84, 39, 526).
Time: 26.056 ms
 1BEGIN;
 2
 3INSERT INTO transactions (transaction_id, alternative_id_1, alternative_id_2, alternative_id_3, alternative_id_4, alternative_id_5, total_fee, transaction_updated_time, transaction_created_time)
 4      SELECT transaction_id, alternative_id_1, alternative_id_2, alternative_id_3, alternative_id_4, alternative_id_5, total_fee, now(), transaction_created_time
 5      FROM transactions
 6      WHERE transaction_id = 'd7dd00e4-8c43-11ec-ab4b-63d923f6a2a2' AND transaction_updated_time = '2022-02-13 02:58:32.129452';
 7
 8DELETE FROM transactions WHERE transaction_id = 'd7dd00e4-8c43-11ec-ab4b-63d923f6a2a2' AND transaction_updated_time = '2022-02-13 02:58:32.129452';
 9
10COMMIT;

vs MySQL

Insert time ~1 hour

mysql> select count(*) from transactions;
+----------+
| count(*) |
+----------+
| 10000001 |
+----------+
1 row in set (45.94 sec)
mysql> SELECT sum(total_fee), count(*) from transactions WHERE merchant_id = 'merchant-1';
+----------------+----------+
| sum(total_fee) | count(*) |
+----------------+----------+
|     3635652669 |  7998871 |
+----------------+----------+
1 row in set (6 min 52.35 sec)

Explain analyze

-> Aggregate: sum(transactions.total_fee), count(0)  (cost=1566764.00 rows=4902925) (actual time=457808.339..457808.340 rows=1 loops=1)
    -> Index lookup on transactions using i6 (merchant_id='merchant-1')  (cost=1076471.50 rows=4902925) (actual time=86.929..456590.123 rows=7998871 loops=1)

Misc

All of this is running on 70 USD “server”, running CPU from 2008

https://www.cpubenchmark.net/compare/Intel-Xeon-E5-2676-v3-vs-Intel-Core2-Duo-E8400/2643vs955

selfhosted-server-image