pg-copy-ch 
Simply copy the current PostgreSQL data to ClickHouse
- Simple : Just dumps and imports, so it works with older versions like 9.x.
- Handy : Static single binary, so you can install it just by wget or cp.
- Automatic : Create ClickHouse tables from PostgreSQL automatically.
- Easy : Include/Exclude tables by regex. Skip by max data count, ttl.
$ psql mydb -c "SELECT count(*) FROM users"             # => 3835
$ clickhouse-client -q "CREATE DATABASE pg"
$ pg-copy-ch init config --pg-db=mydb --ch-db=pg
$ pg-copy-ch copy -t users
[09:05:28] (1/1) users REPLACED 3835 (0.35s)
$ clickhouse-client -q "SELECT count(*) FROM pg.users"  # => 3835Installation
- psql : required for PostgreSQL client
- clickhouse-client : required for ClickHouse client
- x86_64 static binary: https://github.com/maiha/pg-copy-ch/releases
$ wget https://github.com/maiha/pg-copy-ch/releases/latest/download/pg-copy-chUsage
config
First, create config file by init config.
$ pg-copy-ch init config
Initialized empty config in .pg-copy-ch/configThen, edit config file about connection settings for PostgreSQL and ClickHouse.
$ vi .pg-copy-ch/config
[postgres]
host = "pg-server1"
port = 5432
user = "postgres"
db   = "mydb"
psql = "psql -h %host -p %port -U %user %db -w"
...Copy
Once you setup config, you can run copy with specifying the table by one of '-a', '-t', '-f', '-F'.
$ pg-copy-ch copy -t users,orders # Copy only the specified tables
$ pg-copy-ch copy -a              # Copy all tables
$ pg-copy-ch copy -f <ALLOW_FILE> # Copy all tables both in the config and in <ALLOW_FILE>.
$ pg-copy-ch copy -F <DENY_FILE>  # Copy all tables in the config and NOT in <DENY_FILE>.Filter by allow
'-f init tables command.
$ pg-copy-ch init tables
Created .pg-copy-ch/tablesThen, edit or comment out as you like.
$ vi .pg-copy-ch/tables
# budgets
creatives
orders
...
$ pg-copy-ch copy -f .pg-copy-ch/tablesFilter by deny
'-F 
$ vi ignores
^pg_
$ pg-copy-ch copy -F ignoresDryrun
'-n' just prints the actions that would be executed, but do not execute them.
$ pg-copy-ch copy -t users,orders,xxx -n
Table  PostgreSQL Action
------ ---------- ----------------------------
users  FOUND      (will) Replace
orders FOUND      (will) Replace
xxx    N/A        Ignore (PG schema not found)Find performance killers
First, run and generate log.
$ pg-copy-ch -a | tee logworst record counts
$ grep REPLACED log | sort -n -k 5 -r | head -3
[05:28:26] (092/428) creatives    REPLACED 11989140 (37.01s)
[05:29:28] (157/428) constraints  REPLACED 5765600 (2.79s)
[05:31:15] (280/428) eviews       REPLACED 4460582 (5.67s)worst time
$ grep REPLACED log | sort -n -t'(' -k 3 -r | head -3
[05:28:26] (092/428) creatives    REPLACED 11989140 (37.01s)
[05:26:38] (008/428) schedules    REPLACED 1115075 (22.90s)
[05:32:47] (343/428) statistics   REPLACED 2443266 (8.98s)PostgreSQL
psql
For example, if you want to use SSL, you can specify the command directly to psql in config.
Here, '%' prefixed words are replaced by those settings automatically.
[postgres]
psql = "psql -h %host -p %port -U %user %db -w --dbname=postgres --set=sslmode=require --set=sslrootcert=./sslcert.crt"authorization
Using ~/.pgpass is a easiest way to specify a password.
If it is difficult to write to HOME by cron execution, you can embed it directly into the config file using psql above.
[postgres]
psql = "PGPASSWORD=foo psql -h %host -p %port -U %user %db -w"before_sql
The before_sql allows you to insert an arbitrary SQL.
This is useful for using not public schema but a custom schema.
[postgres]
before_sql      = "SET search_path to myschema,public;"Development
- using Crystal on docker
$ makeTest
$ make ciContributing
- Fork it (https://github.com/maiha/pg-copy-ch/fork)
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
Contributors
- maiha - creator and maintainer