Aggregate Functions

Aggregate functions allow you to summarize data by performing calculations across groups of rows. For instance, if you have an orders table that has an amount column, you could use an aggregate function to get the sum of the amount column, either for all rows, or for each group of rows that share specific values, for instance all rows that share the same order_date.

Note

Aggregate functions are disabled by default in PostgREST, as without appropriate safeguards, aggregate functions can create performance problems. See db-aggregates-enabled for further details.

PostgREST supports the following aggregate functions: avg(), count(), max(), min(), and sum(). Please refer to the section on aggregate functions in the PostgreSQL documentation for a detailed explanation of these functions.

To use an aggregate function, you append the function to a value in the select parameter, like so:

curl "http://localhost:3000/orders?select=amount.sum()"

With the above query, PostgREST will return a single row with a single column named sum that contains the sum of all the values in the amount column:

[
  {
    "sum": 1234.56
  }
]

You can use multiple aggregate functions by just adding more columns with aggregate functions to the select parameter.

To group by other columns, you simply add those columns to the select parameter. For instance:

curl "http://localhost:3000/orders?select=amount.sum(),amount.avg(),order_date"

This will return a row for each unique value in the order_date column, with the sum and average of the amount column for all rows that share the same order_date:

[
  {
    "sum": 1234.56,
    "avg": 123.45,
    "order_date": "2023-01-01"
  },
  {
    "sum": 2345.67,
    "avg": 234.56,
    "order_date": "2023-01-02"
  }
]

Note

Aggregate functions work alongside other PostgREST features, like Horizontal Filtering, JSON Columns, and Ordering. Please note at this time aggregate functions are not compatible with Domain Representations. Additionally, PostgreSQL’s HAVING clause and ordering by aggregated columns are not yet supported.

The Case of count()

Note

Before the addition of aggregate functions, it was possible to count by adding count (without parentheses) to the select parameter. While this is still supported, it may be deprecated in the future, and thus use of this legacy feature is not recommended. Please use count() (with parentheses) instead.

count() is treated specially, as it can be used without an associated column. Take for example the following query:

curl "http://localhost:3000/orders?select=count(),order_date"

This would return a row for each unique value in the order_date column, with the count of all rows that share the same order_date:

[
  {
    "count": 4,
    "order_date": "2023-01-01"
  },
  {
    "count": 2,
    "order_date": "2023-01-02"
  }
]

When count() is used with an associated column, its behavior is slightly different: It will return the count of all values that are not NULL. This is due to how PostgreSQL itself implements the count() function.

Renaming and Casting

Renaming Aggregates

Just like with other columns, you can rename aggregated columns too. See Renaming Columns for details.

Renaming columns is especially helpful in the context of aggregate functions, as by default a column with an aggregate function applied will take on the name of the applied aggregate function. You may want to provide a more semantically meaningful name or prevent collisions when using multiple aggregate functions of the same type.

Casting Aggregates

When applying an aggregate function to a column, you are able to cast both the value of the input to the aggregate function and the value of the output from the aggregate function. In both cases, the syntax works as described in Casting Columns, with the only difference being the placement of the cast.

Casting the Value of the Input

For instance, imagine that the orders table has a JSON column, order_details, and this column contains a JSON object that has a key, tax_amount. Let’s say you want to get the sum of the tax amount for every order. You can use the -> or ->> operators to extract the value with this key (see JSON Columns), but these operators will return values of the types JSON and text respectively, and neither of these types can be used with sum().

Therefore, you will need to first cast the input value to a type that is compatible with sum() (e.g. numeric). Casting the input value is done in exactly the same way as casting any other value:

curl "http://localhost:3000/orders?select=order_details->tax_amount::numeric.sum()"

With this, you will receive the sum of the casted tax_amount value:

[
  {
    "sum": 1234.56
  }
]

Casting the Value of the Output

Now let’s return to an example involving the amount column of the orders table. Imagine that we want to get the rounded average of the amount column. One way to do this is to use the avg() aggregate function and then to cast the output value of the function to int. To cast the value of the output of the function, we simply place the cast after the aggregate function:

curl "http://localhost:3000/orders?select=amount.avg()::int"

You will then receive the rounded average as the result:

[
  {
    "avg": 201
  }
]

Of course, you can use both input and output casts at the same time, if you so desire.

Using Aggregate Functions with Resource Embedding

Aggregate functions can be used in conjunction with Resource Embedding. You can use embedded resources as grouping columns, use aggregate functions within the context of an embedded resource, or use columns from a spreaded resource as grouping columns or as inputs to aggregate functions.

Using Embedded Resources as Grouping Columns

Using an embedded resource as a grouping column allows you to use data from an association to group the results of an aggregation.

For example, imagine that the orders table from the examples above is related to a customers table. If you want to get the sum of the amount column grouped by the name column from the customers table, you can include the customer name, using the standard Resource Embedding syntax, and perform a sum on the amount column.

curl "http://localhost:3000/orders?select=amount.sum(),customers(name)"

You will then get the summed amount, along with the embedded customer resource:

[
  {
    "sum": 100,
    "customers": {
      "name": "Customer A"
    }
  },
  {
    "sum": 200,
    "customers": {
      "name": "Customer B"
    }
  }
]

Note

The previous example uses a has-one association to demonstrate this functionality, but you may also use has-many associations as grouping columns, although there are few obvious use cases for this.

Using Aggregate Functions Within the Context of an Embedded Resource

When embedding a resource, you can apply aggregate functions to columns from the associated resource to perform aggregations within the context of an embedded resource.

Continuing with the example relationship between orders and customers from the previous section, imagine that you want to fetch the name, city, and state for each customer, along with the sum of amount of the customer’s orders, grouped by the order date. This can be done in the following way:

curl "http://localhost:3000/customers?select=name,city,state,orders(amount.sum(),order_date)"
[
  {
    "name": "Customer A",
    "city": "New York",
    "state": "NY",
    "orders": [
      {
        "sum": 215.22,
        "order_date": "2023-09-01"
      },
      {
        "sum": 905.73,
        "order_date": "2023-09-02"
      }
    ]
  },
  {
    "name": "Customer B",
    "city": "Los Angeles",
    "state": "CA",
    "orders": [
      {
        "sum": 329.71,
        "order_date": "2023-09-01"
      },
      {
        "sum": 425.87,
        "order_date": "2023-09-03"
      }
    ]
  }
]

In this example, the amount column is summed and grouped by the order_date within the context of the embedded resource. That is, the name, city, and state from the customers table have no bearing on the aggregation performed in the context of the orders association; instead, each aggregation can be seen as being performed independently on just the orders belonging to a particular customer, using only the data from the embedded resource for both grouping and aggregation.

Using Columns from a Spreaded Resource

When you spread an embedded resource, the columns from the spreaded resource are treated as if they were columns of the top-level resource, both when using them as grouping columns and when applying aggregate functions to them.

Grouping with Columns from a Spreaded Resource

For instance, assume you want to sum the amount column from the orders table, using the city and state columns from the customers table as grouping columns. To achieve this, you may select these two columns from the customers table and spread them; they will then be used as grouping columns:

curl "http://localhost:3000/orders?select=amount.sum(),...customers(city,state)

The result will be the same as if city and state were columns from the orders table:

[
  {
    "sum": 2000.29,
    "city": "New York",
    "state": "NY"
  },
  {
    "sum": 9241.21,
    "city": "Los Angeles",
    "state": "CA"
  }
]

Aggregate Functions with Columns from a Spreaded Resource

Now imagine that the customers table has a joined_date column that represents the date that the customer joined. You want to get both the most recent and the oldest joined_date for customers that placed an order on every distinct order date. This can be expressed as follows:

curl "http://localhost:3000/orders?select=order_date,...customers(joined_date.max(),joined_date.min())

As columns from a spreaded resource are treated as if they were columns from the top-level resource, the max() and min() are applied within the context of the top-level, rather than within the context of the embedded resource, as in the previous section.

The result will be the same as if the aggregations were applied to columns from the top-level:

[
  {
    "order_date": "2023-11-01",
    "max": "2023-10-15",
    "min": "2013-10-01"
  },
  {
    "order_date": "2023-11-02",
    "max": "2023-10-30",
    "min": "2016-02-11"
  }
]