TimescaleDB for Time-Series Data¶
TimescaleDB is an open-source database designed to make SQL scalable for time-series data. It is engineered up from PostgreSQL, providing automatic partitioning across time and space, while retaining the standard PostgreSQL interface.
PostgREST turns your PostgreSQL database directly into a RESTful API, since TimescaleDB is packaged as a PostgreSQL extension it works with PostgREST as well.
In this tutorial we’ll explore some of TimescaleDB features through PostgREST.
Run TimescaleDB¶
First, let’s pull and start the TimescaleDB container image:
docker run --name tsdb_tut \
-e POSTGRES_PASSWORD=mysecretpassword \
-p 5433:5432 \
-d timescale/timescaledb:latest-pg11
This will run the container as a daemon and expose port 5433
to the host system so that it doesn’t conflict with another PostgreSQL installation.
Set up TimescaleDB¶
Now, we’ll create the timescaledb
extension in our database.
Run psql
in the container we created in the previous step.
docker exec -it tsdb_tut psql -U postgres
## Run all the following commands inside psql
And create the extension:
create extension if not exists timescaledb cascade;
Create an Hypertable¶
Hypertables are the core abstraction TimescaleDB offers for dealing with time-series data.
To create an hypertable
, first we need to create standard PostgreSQL tables:
create table if not exists locations(
device_id text primary key
, location text
, environment text
);
create table if not exists conditions(
time timestamp with time zone not null
, device_id text references locations(device_id)
, temperature numeric
, humidity numeric
);
Now, we’ll convert conditions
into an hypertable with create_hypertable:
SELECT create_hypertable('conditions', 'time', chunk_time_interval => interval '1 day');
-- This also implicitly creates an index: CREATE INDEX ON "conditions"(time DESC);
-- Exit psql
exit
Load sample data¶
To have some data to play with, we’ll download the weather_small
data set from TimescaleDB’s sample datasets.
## Run bash inside the database container
docker exec -it tsdb_tut bash
## Download and uncompress the data
wget -qO- https://timescaledata.blob.core.windows.net/datasets/weather_small.tar.gz | tar xvz
## Copy data into the database
psql -U postgres <<EOF
\COPY locations FROM weather_small_locations.csv CSV
\COPY conditions FROM weather_small_conditions.csv CSV
EOF
## Exit bash
exit
Run PostgREST¶
For the last step in the setup, pull and start the official PostgREST image:
docker run --rm -p 3000:3000 \
--name tsdb_pgrst \
--link tsdb_tut \
-e PGRST_DB_URI="postgres://postgres:mysecretpassword@tsdb_tut/postgres" \
-e PGRST_DB_ANON_ROLE="postgres" \
-d postgrest/postgrest:latest
PostgREST on Hypertables¶
We’ll now see how to read data from hypertables through PostgREST.
Since hypertables can be queried using standard SELECT statements, we can query them through PostgREST normally.
Suppose we want to run this query on conditions
:
select
time,
device_id,
humidity
from conditions
where
humidity > 90 and
time < '2016-11-16'
order by time desc
limit 10;
Using PostgREST horizontal/vertical filtering, this query can be expressed as:
curl -G "localhost:3000/conditions" \
-d select=time,device_id,humidity \
-d humidity=gt.90 \
-d time=lt.2016-11-16 \
-d order=time.desc \
-d limit=10
## This command is equivalent to:
## curl "localhost:3000/conditions?select=time,device_id,humidity&humidity=gt.90&time=lt.2016-11-16&order=time.desc&limit=10"
## Here we used -G and -d to make the command more readable
The response will be:
[{"time":"2016-11-15T23:58:00+00:00","device_id":"weather-pro-000982","humidity":90.90000000000006},
{"time":"2016-11-15T23:58:00+00:00","device_id":"weather-pro-000968","humidity":92.3},
{"time":"2016-11-15T23:58:00+00:00","device_id":"weather-pro-000963","humidity":96.29999999999993},
{"time":"2016-11-15T23:58:00+00:00","device_id":"weather-pro-000951","humidity":94.39999999999998},
{"time":"2016-11-15T23:58:00+00:00","device_id":"weather-pro-000950","humidity":93.69999999999982},
{"time":"2016-11-15T23:58:00+00:00","device_id":"weather-pro-000915","humidity":94.69999999999997},
{"time":"2016-11-15T23:58:00+00:00","device_id":"weather-pro-000911","humidity":93.2000000000001},
{"time":"2016-11-15T23:58:00+00:00","device_id":"weather-pro-000910","humidity":91.30000000000017},
{"time":"2016-11-15T23:58:00+00:00","device_id":"weather-pro-000901","humidity":92.30000000000005},
{"time":"2016-11-15T23:58:00+00:00","device_id":"weather-pro-000895","humidity":91.00000000000014}]
JOINs with relational tables¶
Hypertables support all standard PostgreSQL constraints . We can make use of the foreign key defined on locations
to make a JOIN through PostgREST. A query such as:
select
c.time,
c.temperature,
l.location,
l.environment
from conditions c
left join locations l on
c.device_id = l.device_id
order by time desc
limit 10;
Can be expressed in PostgREST by using Resource Embedding.
curl -G localhost:3000/conditions \
-d select="time,temperature,device:locations(location,environment)" \
-d order=time.desc \
-d limit=10
[{"time":"2016-11-16T21:18:00+00:00","temperature":69.49999999999991,"device":{"location":"office-000202","environment":"inside"}},
{"time":"2016-11-16T21:18:00+00:00","temperature":90,"device":{"location":"field-000205","environment":"outside"}},
{"time":"2016-11-16T21:18:00+00:00","temperature":60.499999999999986,"device":{"location":"door-00085","environment":"doorway"}},
{"time":"2016-11-16T21:18:00+00:00","temperature":91,"device":{"location":"swamp-000188","environment":"outside"}},
{"time":"2016-11-16T21:18:00+00:00","temperature":42,"device":{"location":"arctic-000219","environment":"outside"}},
{"time":"2016-11-16T21:18:00+00:00","temperature":70.80000000000003,"device":{"location":"office-000201","environment":"inside"}},
{"time":"2016-11-16T21:18:00+00:00","temperature":62.699999999999974,"device":{"location":"door-00084","environment":"doorway"}},
{"time":"2016-11-16T21:18:00+00:00","temperature":85.49999999999918,"device":{"location":"field-000204","environment":"outside"}},
{"time":"2016-11-16T21:18:00+00:00","temperature":42,"device":{"location":"arctic-000218","environment":"outside"}},
{"time":"2016-11-16T21:18:00+00:00","temperature":42,"device":{"location":"arctic-000217","environment":"outside"}}]
Time-Oriented Analytics¶
TimescaleDB includes new aggregate functions for time-oriented analytics.
For using aggregate queries with PostgREST you must create VIEWs or Stored Procedures. Here’s an example for using time_bucket:
-- Run psql in the database container
docker exec -it tsdb_tut psql -U postgres
-- Create the function
create or replace function temperature_summaries(gap interval default '1 hour', prefix text default 'field')
returns table(hour text, avg_temp numeric, min_temp numeric, max_temp numeric) as $$
select
time_bucket(gap, time)::text as hour,
trunc(avg(temperature), 2),
trunc(min(temperature), 2),
trunc(max(temperature), 2)
from conditions c
where c.device_id in (
select device_id from locations
where location like prefix || '-%')
group by hour
$$ language sql stable;
-- Exit psql
exit
Every time the schema is changed you must reload PostgREST schema cache so it can pick up the function parameters correctly. To reload, run:
docker kill --signal=USR1 tsdb_pgrst
Now, since the function is stable
, we can call it with GET
as:
curl -G "localhost:3000/rpc/temperature_summaries" \
-d gap=2minutes \
-d order=hour.asc \
-d limit=10 \
-H "Accept: text/csv"
## time_bucket accepts an interval type as it's argument
## so you can pass gap=5minutes or gap=5hours
hour,avg_temp,min_temp,max_temp
"2016-11-15 12:00:00+00",72.97,68.00,78.00
"2016-11-15 12:02:00+00",73.01,68.00,78.00
"2016-11-15 12:04:00+00",73.05,68.00,78.10
"2016-11-15 12:06:00+00",73.07,68.00,78.10
"2016-11-15 12:08:00+00",73.11,68.00,78.10
"2016-11-15 12:10:00+00",73.14,68.00,78.10
"2016-11-15 12:12:00+00",73.17,68.00,78.19
"2016-11-15 12:14:00+00",73.21,68.10,78.19
"2016-11-15 12:16:00+00",73.24,68.10,78.29
"2016-11-15 12:18:00+00",73.27,68.10,78.39
Note you can use PostgREST standard filtering on function results. Here we also changed the Response Format to CSV.
Fast Ingestion with Bulk Insert¶
You can use PostgREST Bulk Insert to leverage TimescaleDB fast ingestion.
Let’s do an insert of three rows:
curl "localhost:3000/conditions" \
-H "Content-Type: application/json" \
-H "Prefer: return=representation" \
-d @- << EOF
[
{"time": "2019-02-21 01:00:01-05", "device_id": "weather-pro-000000", "temperature": 40.0, "humidity": 59.9},
{"time": "2019-02-21 01:00:02-05", "device_id": "weather-pro-000000", "temperature": 42.0, "humidity": 69.9},
{"time": "2019-02-21 01:00:03-05", "device_id": "weather-pro-000000", "temperature": 44.0, "humidity": 79.9}
]
EOF
By using the Prefer: return=representation
header we can see the successfully inserted rows:
[{"time":"2019-02-21T06:00:01+00:00","device_id":"weather-pro-000000","temperature":40.0,"humidity":59.9},
{"time":"2019-02-21T06:00:02+00:00","device_id":"weather-pro-000000","temperature":42.0,"humidity":69.9},
{"time":"2019-02-21T06:00:03+00:00","device_id":"weather-pro-000000","temperature":44.0,"humidity":79.9}]
Let’s now insert a thousand rows, we’ll use jq for constructing the array.
yes "{\"time\": \"$(date +'%F %T')\", \"device_id\": \"weather-pro-000001\", \"temperature\": 50, \"humidity\": 60}" | \
head -n 1000 | jq -s '.' | \
curl -i -d @- "http://localhost:3000/conditions" \
-H "Content-Type: application/json" \
-H "Prefer: count=exact"
With Prefer: count=exact
we can know how many rows were inserted. Check out the response:
HTTP/1.1 201 Created
Transfer-Encoding: chunked
Date: Fri, 22 Feb 2019 16:47:05 GMT
Server: postgrest/5.2.0 (9969262)
Content-Range: */1000
You can see in Content-Range
that the total number of inserted rows is 1000
.