sql - Query Postgres table by Block Range Index (BRIN) identifier directly -
i have n client machines. want load each of machine distinct partition of brin index.
that requires to:
- create brin predefined number of partitions - equal number of client machines
- send queries clients uses where on brin partitions identifier instead of filter on indexed column
the main goal performance improvement when loading single table postgres distributed client machines, keeping equal number of rows between clients - or close equal if rows count not divides machines count.
i can achieve maintaining new column chunks table number of buckets equal number of client machines (or use row_number() on (order datetime) % n
on fly). way not efficient in timing , memory, , brin index looks nice feature speed such use cases.
minimal reproducible example 3 client machines:
create table bigtable (datetime timestamptz, value text); insert bigtable values ('2015-12-01 00:00:00+00'::timestamptz, 'txt1'); insert bigtable values ('2015-12-01 05:00:00+00'::timestamptz, 'txt2'); insert bigtable values ('2015-12-02 02:00:00+00'::timestamptz, 'txt3'); insert bigtable values ('2015-12-02 03:00:00+00'::timestamptz, 'txt4'); insert bigtable values ('2015-12-02 05:00:00+00'::timestamptz, 'txt5'); insert bigtable values ('2015-12-02 16:00:00+00'::timestamptz, 'txt6'); insert bigtable values ('2015-12-02 23:00:00+00'::timestamptz, 'txt7');
expected output:
- client 1
2015-12-01 00:00:00+00, 'txt1' 2015-12-01 05:00:00+00, 'txt2' 2015-12-02 02:00:00+00, 'txt3'
- client 2
2015-12-02 03:00:00+00, 'txt4' 2015-12-02 05:00:00+00, 'txt5'
- client 3
2015-12-02 16:00:00+00, 'txt6' 2015-12-02 23:00:00+00, 'txt7'
the question:
how can create brin predefined number of partitions , run queries filters on partition identifiers instead of filtering on index column?
optionally other way brin (or other pg goodies) can speed task of parallel loading multiple clients single table?
it sounds want shard table on many machines, , have each local table (one shard of global table) have brin index 1 bucket. not make sense. if single brin index range covers entire (local) table, can never helpful.
it sounds looking partitioning check constraints can used partition-exclusion. postgresql has supported long time table inheritance (although not each partition being on separate machine). using method, range covered in check constraint has set explicitly each partition. ability explicitly specify bounds sounds looking for, using different technology.
but, partition exclusion constraint code doesn't work modulus. code smart enough know where id=5
needs check check (id between 1 , 10)
partition, because knows id=5 implies id between 1 , 10. more accurately, know contrapositive of that.
but code never written know where id=5
implies id%10 = 5%10
, though humans know that. if build partitions on modulus operators, check (id%10=5)
rather on ranges, have sprinkle queries where id = $1 , id % 10= $1 %10
if wanted take advantage of constraints.
Comments
Post a Comment