TimescaleDB for Time-Series Data

TimescaleDB is an open-source database designed to make SQL scalable for time-series data. It is built on top of PostgreSQL, providing automatic partitioning across time and space while retaining the standard PostgreSQL interface.

PostgREST turns your PostgreSQL database directly into a RESTful API. Since TimescaleDB is packaged as a PostgreSQL extension, it works with PostgREST as well.

In this tutorial we’ll explore some of TimescaleDB’s features through PostgREST.

Install Docker

For an easier setup we’re going to use Docker, so make sure you have it installed.

Run TimescaleDB

First, let’s pull and start the TimescaleDB container image:

docker run --name tsdb_tut \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -p 5433:5432 \
  -d timescale/timescaledb:latest-pg11

This will run the container as a daemon and expose port 5433 to the host system so that it doesn’t conflict with another PostgreSQL installation.
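Before moving on, you can check that the database inside the container is accepting connections with pg_isready, a utility shipped with PostgreSQL-based images (a quick sanity check, assuming the tsdb_tut container name from above):

```shell
## Should report "accepting connections" once the server is up
docker exec tsdb_tut pg_isready -U postgres
```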

Set up TimescaleDB

Now, we’ll create the timescaledb extension in our database.

Run psql in the container we created in the previous step.

docker exec -it tsdb_tut psql -U postgres
## Run all the following commands inside psql

And create the extension:

create extension if not exists timescaledb cascade;
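To double-check that the extension was installed, you can query the pg_extension catalog (run inside the same psql session):

```sql
-- Lists the installed TimescaleDB version
select extname, extversion from pg_extension where extname = 'timescaledb';
```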

Create a Hypertable

Hypertables are the core abstraction TimescaleDB offers for dealing with time-series data.

To create a hypertable, we first need to create standard PostgreSQL tables:

create table if not exists locations(
  device_id    text  primary key
, location     text
, environment  text
);

create table if not exists conditions(
  time         timestamp with time zone  not null
, device_id    text                      references locations(device_id)
, temperature  numeric
, humidity     numeric
);

Now, we’ll convert conditions into a hypertable with create_hypertable:

SELECT create_hypertable('conditions', 'time', chunk_time_interval => interval '1 day');
-- This also implicitly creates an index: CREATE INDEX ON "conditions"(time DESC);
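Later, once data is loaded, you can inspect the chunk tables that back the hypertable from psql (available in recent TimescaleDB 1.x versions; the exact introspection functions vary by version):

```sql
-- Lists the chunks created for the hypertable
select show_chunks('conditions');
```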

-- Exit psql

Load sample data

To have some data to play with, we’ll download the weather_small data set from TimescaleDB’s sample datasets.

## Run bash inside the database container
docker exec -it tsdb_tut bash

## Download and uncompress the data
wget -qO- https://timescaledata.blob.core.windows.net/datasets/weather_small.tar.gz | tar xvz

## Copy data into the database
psql -U postgres <<EOF
  \COPY locations  FROM weather_small_locations.csv  CSV
  \COPY conditions FROM weather_small_conditions.csv CSV
EOF

## Exit bash

Run PostgREST

For the last step in the setup, pull and start the official PostgREST image:

docker run --rm -p 3000:3000 \
  --name tsdb_pgrst \
  --link tsdb_tut \
  -e PGRST_DB_URI="postgres://postgres:mysecretpassword@tsdb_tut/postgres" \
  -e PGRST_DB_ANON_ROLE="postgres" \
  -d postgrest/postgrest:latest

PostgREST on Hypertables

We’ll now see how to read data from hypertables through PostgREST.

Since hypertables can be queried using standard SELECT statements, we can query them through PostgREST normally.

Suppose we want to run this query on conditions:

select time, device_id, humidity
from conditions
where
  humidity > 90           and
  time     < '2016-11-16'
order by time desc
limit 10;

Using PostgREST horizontal/vertical filtering, this query can be expressed as:

curl -G "localhost:3000/conditions" \
  -d select=time,device_id,humidity \
  -d humidity=gt.90 \
  -d time=lt.2016-11-16 \
  -d order=time.desc \
  -d limit=10
## This command is equivalent to:
## curl "localhost:3000/conditions?select=time,device_id,humidity&humidity=gt.90&time=lt.2016-11-16&order=time.desc&limit=10"
## Here we used -G and -d to make the command more readable

The response will be a JSON array containing the matching rows.


JOINs with relational tables

Hypertables support all standard PostgreSQL constraints. We can make use of the foreign key referencing locations to make a JOIN through PostgREST. A query such as:

select c.time, c.temperature, l.location, l.environment
from conditions c
left join locations l on
  c.device_id = l.device_id
order by time desc
limit 10;

can be expressed in PostgREST by using Resource Embedding:

curl -G localhost:3000/conditions \
  -d select="time,temperature,device:locations(location,environment)" \
  -d order=time.desc \
  -d limit=10

Time-Oriented Analytics

TimescaleDB includes new aggregate functions for time-oriented analytics.

To use aggregate queries with PostgREST, you must create VIEWs or stored procedures. Here’s an example using time_bucket:

-- Run psql in the database container
docker exec -it tsdb_tut psql -U postgres

-- Create the function
create or replace function temperature_summaries(gap interval default '1 hour', prefix text default 'field')
returns table(hour text, avg_temp numeric, min_temp numeric, max_temp numeric) as $$
  select
    time_bucket(gap, time)::text as hour,
    trunc(avg(temperature), 2),
    trunc(min(temperature), 2),
    trunc(max(temperature), 2)
  from conditions c
  where c.device_id in (
    select device_id from locations
    where location like prefix || '-%')
  group by hour
$$ language sql stable;
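If you don’t need parameters, a VIEW works too. A fixed one-hour rollup over the same data could look like this (a sketch; the view name hourly_conditions is our own choice — PostgREST would expose it at GET /hourly_conditions):

```sql
create or replace view hourly_conditions as
select
  time_bucket('1 hour', time)::text as hour,
  trunc(avg(temperature), 2) as avg_temp
from conditions
group by 1;
```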

-- Exit psql

Every time the schema changes, you must reload PostgREST’s schema cache so it can pick up the function parameters correctly. To reload, run:

docker kill --signal=USR1 tsdb_pgrst

Now, since the function is stable, we can call it with GET as:

curl -G "localhost:3000/rpc/temperature_summaries" \
  -d gap=2minutes \
  -d order=hour.asc \
  -d limit=10 \
  -H "Accept: text/csv"
## time_bucket accepts an interval type as its argument,
## so you can pass gap=5minutes or gap=5hours
"2016-11-15 12:00:00+00",72.97,68.00,78.00
"2016-11-15 12:02:00+00",73.01,68.00,78.00
"2016-11-15 12:04:00+00",73.05,68.00,78.10
"2016-11-15 12:06:00+00",73.07,68.00,78.10
"2016-11-15 12:08:00+00",73.11,68.00,78.10
"2016-11-15 12:10:00+00",73.14,68.00,78.10
"2016-11-15 12:12:00+00",73.17,68.00,78.19
"2016-11-15 12:14:00+00",73.21,68.10,78.19
"2016-11-15 12:16:00+00",73.24,68.10,78.29
"2016-11-15 12:18:00+00",73.27,68.10,78.39

Note that you can use PostgREST’s standard filtering on function results. Here we also changed the Response Format to CSV.
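Since the function returns a table, the usual operators apply to its columns as well. For example, to keep only the buckets whose average temperature exceeds 73 degrees (same setup as above):

```shell
curl -G "localhost:3000/rpc/temperature_summaries" \
  -d gap=2minutes \
  -d avg_temp=gt.73 \
  -d order=hour.asc \
  -d limit=5
```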

Fast Ingestion with Bulk Insert

You can use PostgREST Bulk Insert to leverage TimescaleDB fast ingestion.

Let’s do an insert of three rows:

curl "localhost:3000/conditions" \
  -H "Content-Type: application/json" \
  -H "Prefer: return=representation" \
  -d @- << EOF
    [
      {"time": "2019-02-21 01:00:01-05", "device_id": "weather-pro-000000", "temperature": 40.0, "humidity": 59.9},
      {"time": "2019-02-21 01:00:02-05", "device_id": "weather-pro-000000", "temperature": 42.0, "humidity": 69.9},
      {"time": "2019-02-21 01:00:03-05", "device_id": "weather-pro-000000", "temperature": 44.0, "humidity": 79.9}
    ]
EOF

By using the Prefer: return=representation header, the response body will contain the successfully inserted rows as JSON.


Let’s now insert a thousand rows; we’ll use jq to construct the array.

yes "{\"time\": \"$(date +'%F %T')\", \"device_id\": \"weather-pro-000001\", \"temperature\": 50, \"humidity\": 60}" | \
head -n 1000 | jq -s '.' | \
curl -i -d @- "http://localhost:3000/conditions" \
  -H "Content-Type: application/json" \
  -H "Prefer: count=exact"

With Prefer: count=exact we can know how many rows were inserted. Check out the response:

HTTP/1.1 201 Created
Transfer-Encoding: chunked
Date: Fri, 22 Feb 2019 16:47:05 GMT
Server: postgrest/5.2.0 (9969262)
Content-Range: */1000

You can see in Content-Range that the total number of inserted rows is 1000.
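In a script, that total can be pulled out of the Content-Range value with plain shell parameter expansion (a small sketch; the header value is hard-coded here for illustration):

```shell
## Content-Range for inserts has the form "*/<total>"
range='*/1000'
total="${range##*/}"   # strip everything up to the last "/"
echo "$total"          # prints 1000
```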

Summing it up

There you have it: with PostgREST you can get an instant, performant RESTful API for a TimescaleDB database.

For a more in-depth exploration of TimescaleDB’s capabilities, check their docs.