Query Syntax Reference

Wvlet is a query language designed to be more human-readable and easier to write than SQL. If you are familiar to SQL, you will find it easy to learn Wvlet syntax due to the many similarities. If you know about DataFrame in Python, it will help you understand the Wvlet query language, as chaining relational operators in flow-style is similar to using the DataFrame API.

Getting Started

New to Wvlet? Check out the Quick Start tutorial for a hands-on introduction.

Operator Quick Reference
Data Source Operators
Filtering & Selection
Column Operations
Aggregation & Grouping
Joining & Combining Data
Sorting & Transformation
Advanced Operations
Expressions
Common Query Patterns

Operator Quick Reference

Core Operators at a Glance

Operator	Description	SQL Equivalent	Example
Data Sources
`from`	Read data from table/file	`SELECT * FROM`	`from customers`
`with`	Define reusable subquery	`WITH ... AS`	`with t as { from users }`
`show tables`	List available tables	`SHOW TABLES`	`show tables`
Filtering & Selection
`where`	Filter rows	`WHERE`	`where age > 21`
`select`	Choose columns	`SELECT`	`select name, age`
`limit`	Limit row count	`LIMIT`	`limit 10`
`sample`	Random sampling	`TABLESAMPLE`	`sample 1000`
Column Operations
`add`	Add new columns	`SELECT *, expr AS`	`add price * 0.9 as discounted`
`exclude`	Remove columns	-	`exclude password`
`rename`	Rename columns	`AS`	`rename user_id as id`
`shift`	Reorder columns	-	`shift name, email`
Aggregation
`group by`	Group rows	`GROUP BY`	`group by category`
`agg`	Add aggregations	`SELECT agg_func()`	`agg _.count, revenue.sum`
`pivot`	Pivot table	`PIVOT`	`pivot on month`
Joining Data
`join`	Inner join	`JOIN`	`join orders on id = user_id`
`left join`	Left outer join	`LEFT JOIN`	`left join orders on ...`
`concat`	Union all	`UNION ALL`	`concat { from archive }`
Transformation
`order by`	Sort rows	`ORDER BY`	`order by created_at desc`
`dedup`	Remove duplicates	`DISTINCT`	`dedup`
`unnest`	Expand arrays	`UNNEST`	`unnest(tags)`

Typical Flow of a Wvlet Query

Wvlet queries start with the from keyword, and you can chain multiple relational operators to process the input data and generate output data. Here's a typical flow:

from ...         -- Scan the input data
where ...        -- Apply filtering conditions
where ...        -- [optional] Apply more filtering conditions
add   ... as ... -- Add new columns
group by ...     -- Group rows by the given columns
agg ...          -- Add group aggregation expressions, e.g., _.count, _.sum 
where ...        -- Apply filtering conditions for groups (HAVING clause in SQL)
exclude ...      -- Remove columns from the output
shift ...        -- Shift the column position to the left
select ...       -- Select columns to output
order by ...     -- Sort the rows by the given columns
limit ...        -- Limit the number of rows to output

Unlike SQL, whose queries always must follow the SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ... LIMIT ... structure, Wvlet uses the flow-style syntax to match the order of data processing order as much as possible, facilitating more intuitive query writing. A query should have a from statement to read the data, but select is not mandatory in Wvlet.

Relational Operators

In Wvlet, you need to use lower-case keywords for SQL-like operators. Operators are organized by their primary function to help you find what you need quickly.

Data Source Operators

These operators help you access and explore data sources in your environment.

Catalog Discovery

Operator	Description	SQL Equivalent
show tables	List all tables in current schema	`SHOW TABLES`
show tables in `schema`	List tables in specific schema	`SHOW TABLES IN schema`
show schemas	List all schemas in catalog	`SHOW SCHEMAS`
show schemas in `catalog`	List schemas in specific catalog	`SHOW SCHEMAS IN catalog`
show catalogs	List all available catalogs	`SHOW CATALOGS`
show models	List all available models	-
describe `table`	Show table/query schema	`DESCRIBE table`

Reading Data

Operator	Description	SQL Equivalent
from `table`	Read from table	`SELECT * FROM table`
from `'file.parquet'`	Read from file	-
from `[[...]]`	Read from inline values	`VALUES (...)`
with `alias` as query	Define reusable subquery	`WITH alias AS (query)`
from sql"..."	Execute raw SQL	-

Filtering & Selection

These operators help you filter rows and select specific columns from your data.

Operator	Description	SQL Equivalent
where `condition`	Filter rows by condition	`WHERE condition`
select `col1`, `col2`	Select specific columns	`SELECT col1, col2`
select *	Select all columns	`SELECT *`
select distinct `cols`	Select unique values	`SELECT DISTINCT cols`
limit `n`	Limit to n rows	`LIMIT n`
sample `n`	Random sample of n rows	`TABLESAMPLE`
sample `n%`	Random sample percentage	`TABLESAMPLE n PERCENT`
exists subquery	Check if subquery has results	`EXISTS (subquery)`

Column Operations

These operators help you manipulate columns without changing the number of rows.

Operator	Description	SQL Equivalent
add `expr` as `name`	Add column to the right	`SELECT *, expr AS name`
prepend `expr` as `name`	Add column to the left	-
exclude `col1`, `col2`	Remove specific columns	-
rename `old` as `new`	Rename columns	`old AS new`
shift `col1`, `col2`	Move columns to left	-
shift to right `cols`	Move columns to right	-

Aggregation & Grouping

These operators help you aggregate data and perform group operations.

Operator	Description	SQL Equivalent
group by `cols`	Group rows by columns	`GROUP BY cols`
agg `expr1`, `expr2`	Add aggregation expressions	`SELECT agg_func(...)`
count	Count all rows	`SELECT COUNT(*)`
pivot on `col`	Pivot table on column values	`PIVOT`
unpivot `val` for `col` in (...)	Unpivot columns to rows	`UNPIVOT`

Common Aggregation Functions

When using agg after group by, you can use these aggregation functions:

_.count - Count rows in group
column.sum - Sum of column values
column.avg - Average of column values
column.min / column.max - Min/max values
_.count_distinct(column) - Count distinct values
_.array_agg(column) - Collect values into array

Joining & Combining Data

These operators help you combine data from multiple sources.

Operator	Description	SQL Equivalent
join `table` on `condition`	Inner join	`JOIN table ON condition`
left join `table` on `condition`	Left outer join	`LEFT JOIN table ON condition`
right join `table` on `condition`	Right outer join	`RIGHT JOIN table ON condition`
cross join `table`	Cartesian product	`CROSS JOIN table`
asof join	Time-based join	-
concat query	Union all results	`UNION ALL`
intersect query	Set intersection	`INTERSECT`
except query	Set difference	`EXCEPT`

Sorting & Transformation

These operators help you sort and transform your data.

Operator	Description	SQL Equivalent
order by `col` desc	Sort descending	`ORDER BY col DESC`
order by `col` asc	Sort ascending	`ORDER BY col ASC`
dedup	Remove duplicate rows	`SELECT DISTINCT *`
unnest(`array_col`)	Expand array to rows	`UNNEST(array_col)`

Advanced Operations

These operators provide advanced functionality for testing and debugging.

Operator	Description	Use Case
test `condition`	Assert test conditions	Testing queries
debug query	Debug intermediate results	Debugging
\| `function(args)`	Apply table function	Custom transformations

Update Operations

These operators allow you to save or modify data.

Operator	Description	Notes
save to `table`	Create new table	Overwrites if exists
save to `'file.parquet'`	Save to file	Supports various formats
append to `table`	Append to existing table	Creates if not exists
delete	Delete matching rows	From source table

Inspection Commands

These commands help you understand query structure and data.

Operator	Description	Use Case
describe `query`	Show query schema	Understanding output columns
explain `query`	Show logical plan	Query optimization
explain sql"..."	Explain raw SQL	SQL debugging

Expressions

One of the major difference from traditional SQL is that wvlet uses single or double quoted strings for representing string values and back-quoted strings for referencing column or table names, which might contain space or special characters.

Operator	Description
'single quote'	String literal for representing string values, file names, etc.
"double quote"	Same as single quote strings
"""triple quote string"""	Multi-line strings
`(back quote)`	Column or table name, which requires quotations
sql"`sql expr`"	SQL expression used for inline expansion
sql" ... ${`expr`} ..."	Interpolated SQL expression with embedded expressions
s"... ${expr} ..."	Interpolated strings with expressions
s`... ${expr} ...`	Interpolated backquote strings with expressions
[`expr`, ...]	Array value
[[`expr`, ...], ...]	Array of arrays for representing table records
{`key`: `value`, ...}	Struct (row) value
`_`	Underscore refers to the previous input
`agg_func(expr)` over (partition by ... order by ...)	Window functions for computing aggregate values computed from the entire query result. This follows similar window function syntax with SQL
`_1`, `_2`, ...	Refers to 1-origin grouping keys in the preceding `group by` clause
`1`, `2`, ...	Refers to 1-origin column index for `order by` clause
`expr`:`type`	Cast the value to the target type. Equivalent to cast(`expr` as `type`) in SQL
'2025-01-01':date	Cast the string to a date value
'1 year':interval	Cast the string to an interval of SQL

Variable Definition

You can define a variable using val keyword:

-- Define a new variable
val name = 'wvlet'

-- Variable is evaluated once before running the query
select s"Hello ${x}!" as msg
-- Returns [['Hello wvlet!']]

Conditional Expressions

Operator	Description
`expr` and `expr`	Logical AND
`expr` or `expr`	Logical OR
not `expr`	Logical NOT
!`expr`	Logical NOT
`expr` is `expr`	Equality check
`expr` = `expr`	Equality check
`expr` is not `expr`	Inequality check
`expr` != `expr`	Inequality check
`expr` is null	True if the expression is null
`expr` = null	True if the expression is null
`expr` is not null	True if the expression is not null
`expr` != null	True if the expression is not null
`expr` in (`v1`, `v2`, ...)	True if the expression value is in the given list
`expr` in { from ... }	True if the expression value is in the given list provided by a sub query
`expr` not in (`v1`, `v2`, ...)	True if the expression is not in the given list
`expr` between `v1` and `v2`	True if the expression value is between v1 and v2, i.e., v1 ≤ (value) ≤ v2
`expr` like `pattern`	True if the expression matches the given pattern, e.g., , `'abc%'`
exists { from ... }	True if the subquery returns any rows
not exists { from ... }	True if the subquery returns no rows

If Expression

if ... then .. else expression can be used to switch the output value based on the condition:

from lineitem
select
  if l_returnflag = 'A' then 1 else 0
  as return_code

The if expression can be nested as follows:

from lineitem
select
  if l_returnflag = 'A' then 1
  else if l_returnflag = 'R' then 2
  else 0
  as return_code

Case Expression

To switch the output value based on the input value, you can use the case expression:

from lineitem
select
  case l_returnflag
    when 'A' then 1
    when 'R' then 2
    else 0
  as return_code

You can also use the case expression with conditional expressions for clarity:

from lineitem
select 
  case 
    when l_returnflag = 'A' then 1
    when l_returnflag = 'R' then 2
    else 0
  as return_code 

tip

Unlike SQL, Wvlet doesn't require end at the end of case expressions.

String Expressions

Operator	Description
`expr` + `expr`	Concatenate two strings

Array Expressions

You can construct array values with [e1, e2, ...] syntax:

select ['a', 'b', 'c'] as arr

Arrays can be accessed with index (1-origin):

select ['a', 'b', 'c'] as arr
select arr[1] as first_element

Map Expressions

You can construct map values with map {k1: v1, k2: v2, ...} syntax. Unlike struct expressions, keys (k1, k2, ...) needs to be the same type values, and values (v1, v2, ...) also need to be the same type values:

select map {'a': 1, 'b': 2} as m

Struct/Row Expressions

Struct (row) expressions are used to represent key-value pairs. You can access the values by name:

select {'i': 3, 's': 'str'} as obj
select 
  -- Name based access
  obj.i, obj.s,
  -- Lookup by name 
  obj['i'], obj['s']

Note that key names in a struct do not require quotes for plain key names:

select {k1:1, k2:'v1'} as obj
select obj.k1, obj.k2

Lambda Expression

To manipulate array values, you can use lambda expressions:

select list_transform([4, 5, 6], x -> x + 1)

This query applies the lambda function x -> x + 1 to each element of the input array:

┌────────────┐
│    arr     │
│ array(int) │
├────────────┤
│ [5, 6, 7]  │
├────────────┤
│ 1 rows     │
└────────────┘

To pass multiple arguments to the lambda function, use the following syntax:

select list_reduce([4, 5, 6], (a, b) -> a + b) as sum;

┌─────┐
│ sum │
│ int │
├─────┤
│  15 │
├─────┤
│ 1 … │
└─────┘

Lambda functions can be nested as follows:

select
  list_transform(
    [1, 2, 3],
    x -> list_reduce([4, 5, 6], (a, b) -> a + b) + x
  ) as arr;

┌──────────────┐
│     arr      │
│  array(int)  │
├──────────────┤
│ [16, 17, 18] │
├──────────────┤
│ 1 rows       │
└──────────────┘

Examples

show tables

To find a target table to query in the current database (schema), you can use the show tables query:

show tables

The output of show tables can be followed by the where clause to filter the table names:

show tables
where name like 'n%';

┌────────┐
│  name  │
│ string │
├────────┤
│ nation │
├────────┤
│ 1 rows │
└────────┘

show schemas

To find a target database (schema) to query in the current catalog, you can use the show schemas query:

show schemas;

┌─────────┬────────────────────┐
│ catalog │        name        │
│ string  │       string       │
├─────────┼────────────────────┤
│ memory  │ information_schema │
│ system  │ information_schema │
│ temp    │ information_schema │
│ memory  │ main               │
│ system  │ main               │
│ temp    │ main               │
│ memory  │ pg_catalog         │
│ system  │ pg_catalog         │
│ temp    │ pg_catalog         │
├─────────┴────────────────────┤
│ 9 rows                       │
└──────────────────────────────┘

These schema names can be used to specify the target schema in the show tables command:

show tables in main 
where name like 'p%';

┌──────────┐
│   name   │
│  string  │
├──────────┤
│ part     │
│ partsupp │
├──────────┤
│ 2 rows   │
└──────────┘

show catalogs

List all available catalogs:

show catalogs;

┌────────┐
│  name  │
│ string │
├────────┤
│ memory │
│ system │
│ temp   │
├────────┤
│ 3 rows │
└────────┘

from

Read rows from a given table:

from nation

In DuckDB backend, you can read data from local Parquet or JSON files:

from 'sample.parquet' 

Or from files on the Web:

from 'https://(path to your data)/sample.parquet'

Reading data from an URL will also work for S3 Presigned URLs.

Raw SQL statements

In case you need to use a raw SQL statement, you can use the sql string interpolation:

from sql"select * from nation"

warning

Although raw SQL statements are convenient, Wvlet needs to issue a real query to the target DBMS to inspect the data types of the query result, which might slowdown the query or cause errors in the subsequent relational operators.

where

You can apply a filter condition to the input rows:

from nation
where n_regionkey = 1

Unlike SQL, applying multiple where clauses is allowed in Wvlet:

from nation
where n_regionkey = 1
where n_name like 'U%'

This is useful even when you need to generate Wvlet queries programatically as you only need to append new condition lines to the query. In SQL, you need to parse the text inside WHERE clause and concatenate condition with AND expression.

count

Wvlet has a shorthand notation for counting the number of input rows:

from nation
count

This is equilvalen to:

from nation
select _.count

Or more SQL-like expression can be used too:

from nation
select count(*)

For clarity, you can use pipe | before the count operator:

from nation
select n_name, n_regionkey,
-- pipe (|) is required after trailing comma
| count

add

Add a new column to the input. This operator is helpful to reduce the typing effort as you don't need to enumerate all the columns in the select clause:

from nation
select n_name
add n_name.substring(1, 3) as prefix
limit 5;

┌───────────┬────────┐
│  n_name   │ prefix │
│  string   │ string │
├───────────┼────────┤
│ ALGERIA   │ ALG    │
│ ARGENTINA │ ARG    │
│ BRAZIL    │ BRA    │
│ CANADA    │ CAN    │
│ EGYPT     │ EGY    │
├───────────┴────────┤
│ 5 rows             │
└────────────────────┘

prepend

Prepend a new column to the left side of the input:

from nation
select n_name
prepend n_name.substring(1, 3) as prefix
limit 5;

┌────────┬───────────┐
│ prefix │  n_name   │
│ string │  string   │
├────────┼───────────┤
│ ALG    │ ALGERIA   │
│ ARG    │ ARGENTINA │
│ BRA    │ BRAZIL    │
│ CAN    │ CANADA    │
│ EGY    │ EGYPT     │
├────────┴───────────┤
│ 5 rows             │
└────────────────────┘

select

Output only the selected columns from the input:

from nation
select n_name, n_regionkey

As in SQL, arbitrary expressions can be used in select statements. Note that, however, in Wvlet, you usually don't need to use select statement as the other column-at-a-time operators, like add, exclude, shift, etc. can be used for manipulating column values.

rename

The rename operator changes the column names in the input rows. This is useful when you want to rename columns in the output.

from nation
rename n_name as nation_name

exclude

The exclude operator removes specified columns from the input rows. This is useful when you want to drop certain columns from the output.

from nation
exclude n_regionkey

shift

The shift operator changes the position of specified columns in the input rows. You can move columns to the left or right.

Shift to the left (default):

from nation
shift n_name

Shift to the right:

from nation
shift to right n_comment

group by

The group by operator groups rows by one or more columns, creating groups of rows that share the same values in the specified columns.

Basic Grouping

from [[1, "A", 100], [2, "B", 200], [1, "A", 150], [2, "B", 50], [3, "C", 300]]
  as data(id, category, value)
group by id, category;

┌─────┬──────────┬───────┐
│ id  │ category │ value │
│ int │  string  │  int  │
├─────┼──────────┼───────┤
│   1 │ A        │   100 │
│   2 │ B        │   200 │
│   3 │ C        │   300 │
├─────┴──────────┴───────┤
│ 3 rows                 │
└────────────────────────┘

note

In SQL, you need to specify aggregation operators like count, sum, avg, etc. to generate output from GROUP BY. In Wvlet, the default aggregation operator arbitrary (any) is used, which returns an arbitrary value from each group.

Referencing Grouping Keys

After grouping, you can reference grouping keys by their column names directly:

from [[1, "A", 100], [2, "B", 200], [1, "A", 150], [2, "B", 50]]
  as data(id, category, value)
group by id, category
select id, category;

┌─────┬──────────┐
│ id  │ category │
│ int │  string  │
├─────┼──────────┤
│   1 │ A        │
│   2 │ B        │
├─────┴──────────┤
│ 2 rows         │
└────────────────┘

tip

While you can also reference grouping keys using _1, _2, etc. in the order they appear in the group by clause, it's recommended to use column names for better readability and maintainability.

agg

The agg operator adds aggregation expressions. It is typically used after a group by clause to aggregate data within groups, but can also be used without group by to aggregate all rows in the input.

Basic Aggregations

from [[1, "A", 100], [2, "B", 200], [1, "A", 150], [2, "B", 50], [3, "C", 300]]
  as data(id, category, value)
group by category
agg 
  _.count as item_count,
  value.sum as total_value,
  value.avg as avg_value;

┌──────────┬────────────┬─────────────┬──────────────┐
│ category │ item_count │ total_value │  avg_value   │
│  string  │    long    │     int     │ decimal(17,4) │
├──────────┼────────────┼─────────────┼──────────────┤
│ A        │          2 │         250 │     125.0000 │
│ B        │          2 │         250 │     125.0000 │
│ C        │          1 │         300 │     300.0000 │
├──────────┴────────────┴─────────────┴──────────────┤
│ 3 rows                                             │
└────────────────────────────────────────────────────┘

Available Aggregation Functions

The underscore _ represents the group of rows and supports various aggregation functions:

from [[1, 10, "A"], [2, 20, "B"], [1, 30, "C"], [2, 40, "D"], [1, 50, "E"]]
  as sales(store_id, amount, product)
group by store_id
agg 
  _.count as transaction_count,
  amount.sum as total_sales,
  amount.avg as avg_sale,
  amount.min as min_sale,
  amount.max as max_sale,
  _.count_distinct(product) as unique_products,
  _.max_by(product, amount) as best_product,
  _.min_by(product, amount) as worst_product;

┌──────────┬───────────────────┬─────────────┬──────────────┬──────────┬──────────┬─────────────────┬──────────────┬───────────────┐
│ store_id │ transaction_count │ total_sales │   avg_sale   │ min_sale │ max_sale │ unique_products │ best_product │ worst_product │
│   int    │       long        │     int     │ decimal(17,4) │   int    │   int    │      long       │    string    │    string     │
├──────────┼───────────────────┼─────────────┼──────────────┼──────────┼──────────┼─────────────────┼──────────────┼───────────────┤
│        1 │                 3 │          90 │      30.0000 │       10 │       50 │               3 │ E            │ A             │
│        2 │                 2 │          60 │      30.0000 │       20 │       40 │               2 │ D            │ B             │
├──────────┴───────────────────┴─────────────┴──────────────┴──────────┴──────────┴─────────────────┴──────────────┴───────────────┤
│ 2 rows                                                                                                                             │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Filtering Groups (HAVING clause)

You can filter groups after aggregation using a where clause after agg:

from [[1, "Electronics", 1000], [2, "Books", 50], [3, "Electronics", 2000], 
      [4, "Books", 75], [5, "Toys", 500], [6, "Electronics", 1500]]
  as orders(id, category, amount)
group by category
agg 
  _.count as order_count,
  amount.sum as total_amount
where total_amount > 1000
order by total_amount desc;

┌─────────────┬─────────────┬──────────────┐
│  category   │ order_count │ total_amount │
│   string    │    long     │     int      │
├─────────────┼─────────────┼──────────────┤
│ Electronics │           3 │         4500 │
├─────────────┴─────────────┴──────────────┤
│ 1 rows                                    │
└───────────────────────────────────────────┘

Multiple Grouping with Complex Aggregations

from [["2024-01-01", "A", "Electronics", 100],
      ["2024-01-01", "B", "Books", 50],
      ["2024-01-02", "A", "Electronics", 200],
      ["2024-01-02", "A", "Books", 75],
      ["2024-01-03", "B", "Electronics", 150]]
  as sales(date, store, category, amount)
group by date, store
agg 
  _.count as transactions,
  amount.sum as daily_total,
  _.array_agg(category) as categories_sold,
  _.array_agg(amount) as amounts;

┌────────────┬───────┬──────────────┬─────────────┬───────────────────────┬───────────────┐
│    date    │ store │ transactions │ daily_total │   categories_sold     │    amounts    │
│   string   │ string│     long     │     int     │     array(string)     │   array(int)  │
├────────────┼───────┼──────────────┼─────────────┼───────────────────────┼───────────────┤
│ 2024-01-01 │ A     │            1 │         100 │ [Electronics]         │ [100]         │
│ 2024-01-01 │ B     │            1 │          50 │ [Books]               │ [50]          │
│ 2024-01-02 │ A     │            2 │         275 │ [Electronics, Books]  │ [200, 75]     │
│ 2024-01-03 │ B     │            1 │         150 │ [Electronics]         │ [150]         │
├────────────┴───────┴──────────────┴─────────────┴───────────────────────┴───────────────┤
│ 4 rows                                                                                    │
└───────────────────────────────────────────────────────────────────────────────────────────┘

Using Aggregations Without group by

You can use aggregations directly without group by to aggregate all rows:

from [[1, 100], [2, 200], [3, 300], [4, 400], [5, 500]]
  as data(id, value)
agg 
  _.count as total_rows,
  value.sum as sum_all,
  value.avg as average;

┌────────────┬─────────┬──────────────┐
│ total_rows │ sum_all │   average    │
│    long    │   int   │ decimal(17,4) │
├────────────┼─────────┼──────────────┤
│          5 │    1500 │     300.0000 │
├────────────┴─────────┴──────────────┤
│ 1 rows                               │
└──────────────────────────────────────┘

Practical Example: Sales Analysis

Here's a practical example analyzing sales data:

from [["2024-01", "North", "Laptop", 5, 1200],
      ["2024-01", "North", "Phone", 10, 800],
      ["2024-01", "South", "Laptop", 3, 1200],
      ["2024-01", "South", "Tablet", 7, 600],
      ["2024-02", "North", "Laptop", 8, 1200],
      ["2024-02", "North", "Tablet", 5, 600],
      ["2024-02", "South", "Phone", 12, 800],
      ["2024-02", "South", "Laptop", 4, 1200]]
  as sales(month, region, product, quantity, unit_price)
group by month, region
agg 
  _.count as num_orders,
  quantity.sum as total_units_sold,
  (quantity * unit_price).sum as total_revenue,
  _.count_distinct(product) as unique_products
where total_revenue > 10000
order by month, total_revenue desc;

┌─────────┬────────┬────────────┬──────────────────┬───────────────┬─────────────────┐
│  month  │ region │ num_orders │ total_units_sold │ total_revenue │ unique_products │
│ string  │ string │    long    │       int        │      int      │      long       │
├─────────┼────────┼────────────┼──────────────────┼───────────────┼─────────────────┤
│ 2024-01 │ North  │          2 │               15 │         14000 │               2 │
│ 2024-02 │ South  │          2 │               16 │         14400 │               2 │
│ 2024-02 │ North  │          2 │               13 │         12600 │               2 │
├─────────┴────────┴────────────┴──────────────────┴───────────────┴─────────────────┤
│ 3 rows                                                                              │
└─────────────────────────────────────────────────────────────────────────────────────┘

tip

The _ underscore in aggregations represents the group of rows. You can use it with various aggregation functions like _.count, _.sum, _.avg, _.max, _.min, _.count_distinct(), _.array_agg(), _.max_by(), _.min_by(), etc.

note

The where clause after agg is equivalent to SQL's HAVING clause - it filters groups based on aggregated values, not individual rows.

limit

Limit the number of output rows:

from nation
limit 10

order by

Sort the output rows by the given column:

from nation
order by n_name
limit 5

The order by statement can follow asc, desc ordering specifier as in SQL. You can add nulls first or nulls last to specify the order of null values.

join

The join operator combines rows from two or more tables based on a related column between them. Wvlet supports various types of joins similar to SQL, but with a more intuitive flow-style syntax.

Basic Join (Inner Join)

The basic join returns rows when there is a match in both tables:

from
  [[1, "apple", 50], [2, "banana", 10], [3, "cherry", 70]]
  as fruit(id, name, price)
join {
  from [["o1", 1, 10], ["o2", 2, 5]]
  as fruit_order(order_id, fruit_id, qty)
}
on fruit.id = fruit_order.fruit_id
select fruit.name, fruit_order.qty, fruit.price * fruit_order.qty as total_price;

┌────────┬─────┬─────────────┐
│  name  │ qty │ total_price │
│ string │ int │     int     │
├────────┼─────┼─────────────┤
│ apple  │  10 │         500 │
│ banana │   5 │          50 │
├────────┴─────┴─────────────┤
│ 2 rows                     │
└────────────────────────────┘

Left Join

A left join returns all rows from the left table, and matched rows from the right table. If no match, NULL values are returned for right table columns:

from [[1, "Alice"], [2, "Bob"], [3, "Charlie"]] as users(id, name)
left join {
  from [[1, "order1"], [1, "order2"], [3, "order3"]]
  as orders(user_id, order_id)
}
on users.id = orders.user_id
select users.name, orders.order_id;

┌─────────┬──────────┐
│  name   │ order_id │
│ string  │  string  │
├─────────┼──────────┤
│ Alice   │ order1   │
│ Alice   │ order2   │
│ Bob     │ null     │
│ Charlie │ order3   │
├─────────┴──────────┤
│ 4 rows             │
└────────────────────┘

Right Join

A right join returns all rows from the right table, and matched rows from the left table:

from [[1, "order1"], [1, "order2"], [3, "order3"], [4, "order4"]]
  as orders(user_id, order_id)
right join {
  from [[1, "Alice"], [2, "Bob"], [3, "Charlie"]]
  as users(id, name)
}
on orders.user_id = users.id
select users.name, orders.order_id;

┌─────────┬──────────┐
│  name   │ order_id │
│ string  │  string  │
├─────────┼──────────┤
│ Alice   │ order1   │
│ Alice   │ order2   │
│ Bob     │ null     │
│ Charlie │ order3   │
├─────────┴──────────┤
│ 4 rows             │
└────────────────────┘

Cross Join

A cross join produces the Cartesian product of two tables, combining each row from the first table with every row from the second table:

from [[1, "A"], [2, "B"]] as t1(id, val)
cross join {
  from [["X"], ["Y"], ["Z"]] as t2(letter)
}
select t1.val, t2.letter;

┌────────┬────────┐
│  val   │ letter │
│ string │ string │
├────────┼────────┤
│ A      │ X      │
│ A      │ Y      │
│ A      │ Z      │
│ B      │ X      │
│ B      │ Y      │
│ B      │ Z      │
├────────┴────────┤
│ 6 rows         │
└────────────────┘

Join with Multiple Conditions

You can specify multiple conditions in the on clause using and/or operators:

from [[1, "A", 100], [2, "B", 200], [3, "C", 300]] as t1(id, code, value)
join {
  from [[1, "A", 10], [2, "B", 20], [3, "D", 30]] as t2(id, code, qty)
}
on t1.id = t2.id and t1.code = t2.code
select t1.id, t1.code, t1.value, t2.qty;

┌─────┬────────┬───────┬─────┐
│ id  │  code  │ value │ qty │
│ int │ string │  int  │ int │
├─────┼────────┼───────┼─────┤
│   1 │ A      │   100 │  10 │
│   2 │ B      │   200 │  20 │
├─────┴────────┴───────┴─────┤
│ 2 rows                     │
└────────────────────────────┘

Combining Joins with Other Operators

Joins can be seamlessly combined with other Wvlet operators in a flow-style query:

from [[1, "Electronics", 1000], [2, "Books", 500], [3, "Clothing", 750]]
  as categories(id, name, budget)
left join {
  from [[1, "Laptop", 800], [1, "Phone", 600], [2, "Novel", 20], [4, "Unknown", 100]]
  as products(category_id, name, price)
}
on categories.id = products.category_id
where products.price != null
group by categories.name
agg 
  _.count as product_count,
  products.price.sum as total_price
order by product_count desc;

┌─────────────┬───────────────┬─────────────┐
│    name     │ product_count │ total_price │
│   string    │     long      │     int     │
├─────────────┼───────────────┼─────────────┤
│ Electronics │             2 │        1400 │
│ Books       │             1 │          20 │
├─────────────┴───────────────┴─────────────┤
│ 2 rows                                     │
└────────────────────────────────────────────┘

Multiple Joins

You can chain multiple join operations to combine data from several tables:

from [[1, "Alice"], [2, "Bob"], [3, "Charlie"]] as users(id, name)
join {
  from [[1, 101], [2, 102], [3, 103]] as accounts(user_id, account_id)
}
on users.id = accounts.user_id
join {
  from [[101, 1000], [102, 2000], [103, 1500]] as balances(account_id, amount)
}
on accounts.account_id = balances.account_id
select users.name, accounts.account_id, balances.amount;

┌─────────┬────────────┬────────┐
│  name   │ account_id │ amount │
│ string  │    int     │  int   │
├─────────┼────────────┼────────┤
│ Alice   │        101 │   1000 │
│ Bob     │        102 │   2000 │
│ Charlie │        103 │   1500 │
├─────────┴────────────┴────────┤
│ 3 rows                        │
└───────────────────────────────┘

Self Join

A table can be joined with itself (self join) by using aliases. The with statement helps avoid duplicating the data:

with employees as {
  from [[1, "Alice", 2], [2, "Bob", 3], [3, "Charlie", null]]
    as data(id, name, manager_id)
}
from employees as emp
left join employees as mgr
  on emp.manager_id = mgr.id
select emp.name as employee, mgr.name as manager;

┌──────────┬─────────┐
│ employee │ manager │
│  string  │ string  │
├──────────┼─────────┤
│ Alice    │ Bob     │
│ Bob      │ Charlie │
│ Charlie  │ null    │
├──────────┴─────────┤
│ 3 rows             │
└────────────────────┘

tip

Unlike SQL where you write SELECT ... FROM table1 JOIN table2, Wvlet's flow-style syntax starts with from and then adds join as a subsequent operation. This makes it easier to build queries incrementally and matches the logical flow of data processing.

note

For time-based joins, Wvlet also supports AsOf Join, which is useful for joining tables based on the most recent value at a specific time.

concat

The concat operator concatenates rows from multiple subqueries. This is similar to the UNION ALL operator in SQL.

from [[1, "A"], [2, "B"]] as t1(id, val)
concat {
  from [[3, "C"], [4, "D"]] as t2(id, val)
}
order by id;

┌─────┬────────┐
│ id  │  val   │
│ int │ string │
├─────┼────────┤
│   1 │ A      │
│   2 │ B      │
│   3 │ C      │
│   4 │ D      │
├─────┴────────┤
│ 4 rows       │
└──────────────┘

The ordering of rows is not guaranteed in the concat operator. If you need to sort the output, use the order by operator after the concat operator.

You can concatenate multiple subqueries:

from nation
where n_regionkey = 0
concat {
  from nation
  where n_regionkey = 1
}
concat {
  from nation
  where n_regionkey = 2
}
select n_name, n_regionkey
order by n_regionkey, n_name

dedup

The dedup operator removes duplicated rows from the input rows. This is equivalent to select distinct * in SQL.

from [[1, "A"], [2, "B"], [1, "A"], [3, "C"], [2, "B"]]
  as data(id, val)
dedup;

┌─────┬────────┐
│ id  │  val   │
│ int │ string │
├─────┼────────┤
│   1 │ A      │
│   2 │ B      │
│   3 │ C      │
├─────┴────────┤
│ 3 rows       │
└──────────────┘

intersect

The intersect operator returns the intersection of the input rows from multiple subqueries. By default, set semantics are used (duplicates removed).

from [[1, "A"], [2, "B"], [3, "C"], [4, "D"]] as t1(id, val)
intersect {
  from [[2, "B"], [3, "C"], [5, "E"]] as t2(id, val)
};

┌─────┬────────┐
│ id  │  val   │
│ int │ string │
├─────┼────────┤
│   2 │ B      │
│   3 │ C      │
├─────┴────────┤
│ 2 rows       │
└──────────────┘

With all keyword, bag semantics are used (duplicates preserved):

from [[1, "A"], [2, "B"], [2, "B"], [3, "C"]] as t1(id, val)
intersect all {
  from [[2, "B"], [2, "B"], [3, "C"], [3, "C"]] as t2(id, val)
};

┌─────┬────────┐
│ id  │  val   │
│ int │ string │
├─────┼────────┤
│   2 │ B      │
│   2 │ B      │
│   3 │ C      │
├─────┴────────┤
│ 3 rows       │
└──────────────┘

except

The except operator returns the difference of the input rows from multiple subqueries (rows in the first query but not in the second). By default, set semantics are used.

from [[1, "A"], [2, "B"], [3, "C"], [4, "D"]] as t1(id, val)
except {
  from [[2, "B"], [4, "D"]] as t2(id, val)
};

┌─────┬────────┐
│ id  │  val   │
│ int │ string │
├─────┼────────┤
│   1 │ A      │
│   3 │ C      │
├─────┴────────┤
│ 2 rows       │
└──────────────┘

With all keyword, bag semantics are used:

from [[1, "A"], [2, "B"], [2, "B"], [3, "C"]] as t1(id, val)
except all {
  from [[2, "B"]] as t2(id, val)
};

┌─────┬────────┐
│ id  │  val   │
│ int │ string │
├─────┼────────┤
│   1 │ A      │
│   2 │ B      │
│   3 │ C      │
├─────┴────────┤
│ 3 rows       │
└──────────────┘

pivot

The pivot operator expands the column values as horizontal columns. In pivot, you need to specify grouping columns (group by) and aggregation expressions (agg) in the subsequent operators.

Example:

from orders
pivot on o_orderpriority
group by o_orderstatus
agg _.count;

┌───────────────┬──────────┬────────┬──────────┬─────────────────┬───────┐
│ o_orderstatus │ 1-URGENT │ 2-HIGH │ 3-MEDIUM │ 4-NOT SPECIFIED │ 5-LOW │
│    string     │   long   │  long  │   long   │      long       │ long  │
├───────────────┼──────────┼────────┼──────────┼─────────────────┼───────┤
│ F             │     1468 │   1483 │     1445 │            1465 │  1443 │
│ O             │     1488 │   1506 │     1421 │            1482 │  1436 │
│ P             │       64 │     76 │       75 │              77 │    71 │
├───────────────┴──────────┴────────┴──────────┴─────────────────┴───────┤
│ 3 rows                                                                 │
└────────────────────────────────────────────────────────────────────────┘

To specify the column values to expand, use in clause:

from orders
pivot on o_orderpriority in ('1-URGENT', '2-HIGH')
group by o_orderstatus
agg _.count;

┌───────────────┬──────────┬────────┐
│ o_orderstatus │ 1-URGENT │ 2-HIGH │
│    string     │   long   │  long  │
├───────────────┼──────────┼────────┤
│ F             │     1468 │   1483 │
│ O             │     1488 │   1506 │
│ P             │       64 │     76 │
├───────────────┴──────────┴────────┤
│ 3 rows                            │
└───────────────────────────────────┘

unpivot

The unpivot operator transforms multiple columns into rows. This is useful when you need to transform wide table into a long table. Currently, unpivot is available only in DuckDB backend.

Example:

from [
 [1, 'electronics', 1, 2, 3, 4, 5, 6],
 [2, 'clothes', 10, 20, 30, 40, 50, 60],
 [3, 'cars', 100, 200, 300, 400, 500, 600]
] as sales(id, dept, jan, feb, mar, apr, may, jun)
unpivot
  sales for month in (jan, feb, mar, apr, may, jun)

┌─────┬─────────────┬────────┬───────┐
│ id  │    dept     │ month  │ sales │
│ int │   string    │ string │  int  │
├─────┼─────────────┼────────┼───────┤
│   1 │ electronics │ jan    │     1 │
│   1 │ electronics │ feb    │     2 │
│   1 │ electronics │ mar    │     3 │
│   1 │ electronics │ apr    │     4 │
│   1 │ electronics │ may    │     5 │
│   1 │ electronics │ jun    │     6 │
│   2 │ clothes     │ jan    │    10 │
│   2 │ clothes     │ feb    │    20 │
│   2 │ clothes     │ mar    │    30 │
│   2 │ clothes     │ apr    │    40 │
│   2 │ clothes     │ may    │    50 │
│   2 │ clothes     │ jun    │    60 │
│   3 │ cars        │ jan    │   100 │
│   3 │ cars        │ feb    │   200 │
│   3 │ cars        │ mar    │   300 │
│   3 │ cars        │ apr    │   400 │
│   3 │ cars        │ may    │   500 │
│   3 │ cars        │ jun    │   600 │
├─────┴─────────────┴────────┴───────┤
│ 18 rows                            │
└────────────────────────────────────┘

sample

The sample operator randomly samples rows from the input data. You can specify either a fixed number of rows or a percentage.

Sample a fixed number of rows:

from [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]] as numbers(n)
sample 5;

┌─────┐
│  n  │
│ int │
├─────┤
│   2 │
│   4 │
│   7 │
│   8 │
│  10 │
├─────┤
│ 5 … │
└─────┘

Sample a percentage of rows:

from [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]] as numbers(n)
sample 30%;

┌─────┐
│  n  │
│ int │
├─────┤
│   3 │
│   6 │
│   9 │
├─────┤
│ 3 … │
└─────┘

You can specify the sampling method - reservoir (default), system, or bernoulli:

-- Reservoir sampling (default) - good for getting exact sample size
from large_table
sample reservoir (1000)

-- System sampling - faster but approximate, samples blocks of data
from large_table  
sample system (10%)

-- Bernoulli sampling - samples individual rows with given probability
from large_table
sample bernoulli (10%)

tip

Reservoir sampling guarantees the exact number of rows in the output (if available), while system and bernoulli sampling may return approximately the requested percentage of rows.

with

To define a local subquery definition, use the with expression:

with t1 as {
  from nation
  where n_regionkey = 1
}
from t1

You can define multiple subqueries:

with t1 as {
  from nation
  where n_regionkey = 1
}
with t2 as {
  from t1
  limit 10
}
from t2

tip

The with expression is similar to the common-table expression (CTE) in SQL, but for each subquery, you need to use a separate with clause. This is for convenience of adding multiple sub queries in separate line blocks without concatenating them with commas (,).

Common Query Patterns

Filter → Group → Aggregate

A common pattern for analyzing data:

from sales
where date >= '2024-01-01':date
group by product_category
agg 
  _.count as num_sales,
  revenue.sum as total_revenue
order by total_revenue desc
limit 10

Join → Filter → Select

Combining data from multiple tables:

from customers
join {
  from orders
  where status = 'completed'
}
on customers.id = orders.customer_id
select 
  customers.name,
  orders.total,
  orders.created_at
order by orders.created_at desc

Pivot → Aggregate

Creating cross-tab reports:

from sales_data
pivot on month
group by product
agg revenue.sum

Multiple CTEs → Complex Analysis

Using CTEs for complex queries:

with active_users as {
  from users
  where last_login >= current_date - '30 days':interval
}
with user_orders as {
  from orders
  where created_at >= current_date - '30 days':interval
}
from active_users
left join user_orders
  on active_users.id = user_orders.user_id
group by active_users.segment
agg 
  _.count as user_count,
  user_orders.total.sum as revenue

This reference guide provides a comprehensive overview of Wvlet's query syntax. For hands-on examples and tutorials, check out the Quick Start guide.

Quick Navigation​

Operator Quick Reference​

Core Operators at a Glance​

Typical Flow of a Wvlet Query​

Relational Operators​

Data Source Operators​

Catalog Discovery​

Reading Data​

Filtering & Selection​

Column Operations​

Aggregation & Grouping​

Common Aggregation Functions​

Joining & Combining Data​

Sorting & Transformation​

Advanced Operations​

Update Operations​

Inspection Commands​

Expressions​

Variable Definition​

Conditional Expressions​

If Expression​

Case Expression​

String Expressions​

Array Expressions​

Map Expressions​

Struct/Row Expressions​

Lambda Expression​

Examples​

show tables​

show schemas​

show catalogs​

from​

Raw SQL statements​

where​

count​

add​

prepend​

select​

rename​

exclude​

shift​

group by​

Basic Grouping​

Referencing Grouping Keys​

agg​

Basic Aggregations​

Available Aggregation Functions​

Filtering Groups (HAVING clause)​

Multiple Grouping with Complex Aggregations​

Using Aggregations Without group by​

Practical Example: Sales Analysis​

limit​

order by​

join​

Basic Join (Inner Join)​

Left Join​

Right Join​

Cross Join​

Join with Multiple Conditions​

Combining Joins with Other Operators​

Multiple Joins​

Self Join​

concat​

dedup​

intersect​

except​

pivot​

unpivot​

sample​

with​

Common Query Patterns​

Filter → Group → Aggregate​

Join → Filter → Select​

Pivot → Aggregate​

Multiple CTEs → Complex Analysis​

Quick Navigation

Operator Quick Reference

Core Operators at a Glance

Typical Flow of a Wvlet Query

Relational Operators

Data Source Operators

Catalog Discovery

Reading Data

Filtering & Selection

Column Operations

Aggregation & Grouping

Common Aggregation Functions

Joining & Combining Data

Sorting & Transformation

Advanced Operations

Update Operations

Inspection Commands

Expressions

Variable Definition

Conditional Expressions

If Expression

Case Expression

String Expressions

Array Expressions

Map Expressions

Struct/Row Expressions

Lambda Expression

Examples

show tables

show schemas

show catalogs

from

Raw SQL statements

where

count

add

prepend

select

rename

exclude

shift

group by

Basic Grouping

Referencing Grouping Keys

agg

Basic Aggregations

Available Aggregation Functions

Filtering Groups (HAVING clause)

Multiple Grouping with Complex Aggregations

Using Aggregations Without group by

Practical Example: Sales Analysis

limit

order by

join

Basic Join (Inner Join)

Left Join

Right Join

Cross Join

Join with Multiple Conditions

Combining Joins with Other Operators

Multiple Joins

Self Join

concat

dedup

intersect

except

pivot

unpivot

sample

with

Common Query Patterns

Filter → Group → Aggregate

Join → Filter → Select

Pivot → Aggregate

Multiple CTEs → Complex Analysis