Query Syntax Reference

Wvlet is a query language designed to be more human-readable and easier to write than SQL. If you are familiar with SQL, you will find Wvlet syntax easy to learn due to the many similarities. Experience with DataFrames in Python also helps, as chaining relational operators in flow style is similar to using the DataFrame API.

Getting Started

New to Wvlet? Check out the Quick Start tutorial for a hands-on introduction.

Operator Quick Reference

Core Operators at a Glance

Data Sources

| Operator | Description | SQL Equivalent | Example |
|----------|-------------|----------------|---------|
| from | Read data from table/file | SELECT * FROM | from customers |
| with | Define reusable subquery | WITH ... AS | with t as { from users } |
| show tables | List available tables | SHOW TABLES | show tables |

Filtering & Selection

| Operator | Description | SQL Equivalent | Example |
|----------|-------------|----------------|---------|
| where | Filter rows | WHERE | where age > 21 |
| select | Choose columns | SELECT | select name, age |
| limit | Limit row count | LIMIT | limit 10 |
| sample | Random sampling | TABLESAMPLE | sample 1000 |

Column Operations

| Operator | Description | SQL Equivalent | Example |
|----------|-------------|----------------|---------|
| add | Add new columns | SELECT *, expr AS | add price * 0.9 as discounted |
| exclude | Remove columns | - | exclude password |
| rename | Rename columns | AS | rename user_id as id |
| shift | Reorder columns | - | shift name, email |

Aggregation

| Operator | Description | SQL Equivalent | Example |
|----------|-------------|----------------|---------|
| group by | Group rows | GROUP BY | group by category |
| agg | Add aggregations | SELECT agg_func() | agg _.count, revenue.sum |
| pivot | Pivot table | PIVOT | pivot on month |

Joining Data

| Operator | Description | SQL Equivalent | Example |
|----------|-------------|----------------|---------|
| join | Inner join | JOIN | join orders on id = user_id |
| left join | Left outer join | LEFT JOIN | left join orders on ... |
| concat | Union all | UNION ALL | concat { from archive } |

Transformation

| Operator | Description | SQL Equivalent | Example |
|----------|-------------|----------------|---------|
| order by | Sort rows | ORDER BY | order by created_at desc |
| dedup | Remove duplicates | DISTINCT | dedup |
| unnest | Expand arrays | UNNEST | unnest(tags) |

Typical Flow of a Wvlet Query

Wvlet queries start with the from keyword, and you can chain multiple relational operators to process the input data and generate output data. Here's a typical flow:

from ...         -- Scan the input data
where ... -- Apply filtering conditions
where ... -- [optional] Apply more filtering conditions
add ... as ... -- Add new columns
group by ... -- Group rows by the given columns
agg ... -- Add group aggregation expressions, e.g., _.count, _.sum
where ... -- Apply filtering conditions for groups (HAVING clause in SQL)
exclude ... -- Remove columns from the output
shift ... -- Shift the column position to the left
select ... -- Select columns to output
order by ... -- Sort the rows by the given columns
limit ... -- Limit the number of rows to output

Unlike SQL, whose queries must always follow the SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ... LIMIT ... structure, Wvlet's flow-style syntax matches the order of data processing as closely as possible, making queries more intuitive to write. A query needs a from statement to read data, but select is not mandatory in Wvlet.
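
For illustration, here is a small query that follows this flow. It is only a sketch; it assumes a hypothetical orders table with category, price, and created_at columns:

from orders
where created_at >= '2024-01-01':date
add price * 0.9 as discounted_price
group by category
agg _.count as num_orders, discounted_price.sum as total_sales
where total_sales > 1000
order by total_sales desc
limit 10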

Relational Operators

In Wvlet, you need to use lower-case keywords for SQL-like operators. Operators are organized by their primary function to help you find what you need quickly.

Data Source Operators

These operators help you access and explore data sources in your environment.

Catalog Discovery

| Operator | Description | SQL Equivalent |
|----------|-------------|----------------|
| show tables | List all tables in current schema | SHOW TABLES |
| show tables in schema | List tables in specific schema | SHOW TABLES IN schema |
| show schemas | List all schemas in catalog | SHOW SCHEMAS |
| show schemas in catalog | List schemas in specific catalog | SHOW SCHEMAS IN catalog |
| show catalogs | List all available catalogs | SHOW CATALOGS |
| show models | List all available models | - |
| describe table | Show table/query schema | DESCRIBE table |

Reading Data

| Operator | Description | SQL Equivalent |
|----------|-------------|----------------|
| from table | Read from table | SELECT * FROM table |
| from 'file.parquet' | Read from file | - |
| from [[...]] | Read from inline values | VALUES (...) |
| with alias as query | Define reusable subquery | WITH alias AS (query) |
| from sql"..." | Execute raw SQL | - |

Filtering & Selection

These operators help you filter rows and select specific columns from your data.

| Operator | Description | SQL Equivalent |
|----------|-------------|----------------|
| where condition | Filter rows by condition | WHERE condition |
| select col1, col2 | Select specific columns | SELECT col1, col2 |
| select * | Select all columns | SELECT * |
| select distinct cols | Select unique values | SELECT DISTINCT cols |
| limit n | Limit to n rows | LIMIT n |
| sample n | Random sample of n rows | TABLESAMPLE |
| sample n% | Random sample percentage | TABLESAMPLE n PERCENT |
| exists subquery | Check if subquery has results | EXISTS (subquery) |
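
For example, these operators can be chained directly in flow style (a minimal sketch with a hypothetical customers table):

from customers
where age > 21
select name, age
limit 10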

Column Operations

These operators help you manipulate columns without changing the number of rows.

| Operator | Description | SQL Equivalent |
|----------|-------------|----------------|
| add expr as name | Add column to the right | SELECT *, expr AS name |
| prepend expr as name | Add column to the left | - |
| exclude col1, col2 | Remove specific columns | - |
| rename old as new | Rename columns | old AS new |
| shift col1, col2 | Move columns to the left | - |
| shift to right cols | Move columns to the right | - |

Aggregation & Grouping

These operators help you aggregate data and perform group operations.

| Operator | Description | SQL Equivalent |
|----------|-------------|----------------|
| group by cols | Group rows by columns | GROUP BY cols |
| agg expr1, expr2 | Add aggregation expressions | SELECT agg_func(...) |
| count | Count all rows | SELECT COUNT(*) |
| pivot on col | Pivot table on column values | PIVOT |
| unpivot val for col in (...) | Unpivot columns to rows | UNPIVOT |

Common Aggregation Functions

When using agg after group by, you can use these aggregation functions:

  • _.count - Count rows in group
  • column.sum - Sum of column values
  • column.avg - Average of column values
  • column.min / column.max - Min/max values
  • _.count_distinct(column) - Count distinct values
  • _.array_agg(column) - Collect values into array
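
As a quick illustration using an inline table (a minimal sketch; the column names are made up for this example):

from [[1, "A", 100], [1, "B", 200], [2, "A", 300]]
as events(user_id, kind, value)
group by user_id
agg
  _.count as event_count,
  value.sum as total_value,
  _.count_distinct(kind) as kinds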

Joining & Combining Data

These operators help you combine data from multiple sources.

| Operator | Description | SQL Equivalent |
|----------|-------------|----------------|
| join table on condition | Inner join | JOIN table ON condition |
| left join table on condition | Left outer join | LEFT JOIN table ON condition |
| right join table on condition | Right outer join | RIGHT JOIN table ON condition |
| cross join table | Cartesian product | CROSS JOIN table |
| asof join | Time-based join | - |
| concat query | Union all results | UNION ALL |
| intersect query | Set intersection | INTERSECT |
| except query | Set difference | EXCEPT |

Sorting & Transformation

These operators help you sort and transform your data.

| Operator | Description | SQL Equivalent |
|----------|-------------|----------------|
| order by col desc | Sort descending | ORDER BY col DESC |
| order by col asc | Sort ascending | ORDER BY col ASC |
| dedup | Remove duplicate rows | SELECT DISTINCT * |
| unnest(array_col) | Expand array to rows | UNNEST(array_col) |

Advanced Operations

These operators provide advanced functionality for testing and debugging.

| Operator | Description | Use Case |
|----------|-------------|----------|
| test condition | Assert test conditions | Testing queries |
| debug query | Debug intermediate results | Debugging |
| \| function(args) | Apply table function | Custom transformations |

Update Operations

These operators allow you to save or modify data.

| Operator | Description | Notes |
|----------|-------------|-------|
| save to table | Create new table | Overwrites if exists |
| save to 'file.parquet' | Save to file | Supports various formats |
| append to table | Append to existing table | Creates if not exists |
| delete | Delete matching rows | From source table |
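
For example, a filtered result can be written to a new table or to a file. This is only a sketch; the table and file names are hypothetical:

-- Save the result into a table (overwritten if it already exists)
from orders
where status = 'completed'
save to completed_orders

-- Save the result to a Parquet file
from orders
where status = 'completed'
save to 'completed_orders.parquet'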

Inspection Commands

These commands help you understand query structure and data.

| Operator | Description | Use Case |
|----------|-------------|----------|
| describe query | Show query schema | Understanding output columns |
| explain query | Show logical plan | Query optimization |
| explain sql"..." | Explain raw SQL | SQL debugging |

Expressions

One of the major differences from traditional SQL is that Wvlet uses single- or double-quoted strings to represent string values, and back-quoted strings to reference column or table names that may contain spaces or special characters.

| Operator | Description |
|----------|-------------|
| 'single quote' | String literal for representing string values, file names, etc. |
| "double quote" | Same as single-quote strings |
| """triple quote string""" | Multi-line strings |
| `(back quote)` | Column or table name that requires quotation |
| sql"sql expr" | SQL expression used for inline expansion |
| sql" ... ${expr} ..." | Interpolated SQL expression with embedded expressions |
| s"... ${expr} ..." | Interpolated string with embedded expressions |
| s`... ${expr} ...` | Interpolated back-quoted string with embedded expressions |
| [expr, ...] | Array value |
| [[expr, ...], ...] | Array of arrays for representing table records |
| {key: value, ...} | Struct (row) value |
| _ | Underscore refers to the previous input |
| agg_func(expr) over (partition by ... order by ...) | Window function for computing aggregate values over the entire query result, with syntax similar to SQL window functions |
| _1, _2, ... | Refers to 1-origin grouping keys in the preceding group by clause |
| 1, 2, ... | Refers to 1-origin column indexes in the order by clause |
| expr:type | Cast the value to the target type. Equivalent to cast(expr as type) in SQL |
| '2025-01-01':date | Cast the string to a date value |
| '1 year':interval | Cast the string to an SQL interval |
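
The following sketch combines several of these expressions; the file name and column names are hypothetical:

-- Back-quoted names refer to columns whose names contain spaces;
-- :date casts the string literal to a date value
from 'sales.parquet'
where `order date` >= '2024-01-01':date
select `order date`, `unit price` * quantity as total_price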

Variable Definition

You can define a variable using the val keyword:

-- Define a new variable
val name = 'wvlet'

-- Variable is evaluated once before running the query
select s"Hello ${x}!" as msg
-- Returns [['Hello wvlet!']]

Conditional Expressions

| Operator | Description |
|----------|-------------|
| expr and expr | Logical AND |
| expr or expr | Logical OR |
| not expr | Logical NOT |
| !expr | Logical NOT |
| expr is expr | Equality check |
| expr = expr | Equality check |
| expr is not expr | Inequality check |
| expr != expr | Inequality check |
| expr is null | True if the expression is null |
| expr = null | True if the expression is null |
| expr is not null | True if the expression is not null |
| expr != null | True if the expression is not null |
| expr in (v1, v2, ...) | True if the expression value is in the given list |
| expr in { from ... } | True if the expression value is in the list returned by a subquery |
| expr not in (v1, v2, ...) | True if the expression is not in the given list |
| expr between v1 and v2 | True if the expression value is between v1 and v2, i.e., v1 ≤ (value) ≤ v2 |
| expr like pattern | True if the expression matches the given pattern, e.g., 'abc%' |
| exists { from ... } | True if the subquery returns any rows |
| not exists { from ... } | True if the subquery returns no rows |
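
As an illustration, several of these conditions can be stacked as separate where clauses (a sketch; the users table and its columns are made up for this example):

from users
where age between 18 and 65
where country in ('US', 'CA', 'JP')
where email is not null
where name like 'A%'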

If Expression

An if ... then ... else expression can be used to switch the output value based on a condition:

from lineitem
select
if l_returnflag = 'A' then 1 else 0
as return_code

The if expression can be nested as follows:

from lineitem
select
if l_returnflag = 'A' then 1
else if l_returnflag = 'R' then 2
else 0
as return_code

Case Expression

To switch the output value based on the input value, you can use the case expression:

from lineitem
select
case l_returnflag
when 'A' then 1
when 'R' then 2
else 0
as return_code

You can also use the case expression with conditional expressions for clarity:

from lineitem
select
case
when l_returnflag = 'A' then 1
when l_returnflag = 'R' then 2
else 0
as return_code
tip

Unlike SQL, Wvlet doesn't require end at the end of case expressions.

String Expressions

| Operator | Description |
|----------|-------------|
| expr + expr | Concatenate two strings |
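
For example, string concatenation with + can be used directly in a select (a minimal sketch):

select 'Hello' + ', ' + 'wvlet' as greeting
-- Returns [['Hello, wvlet']]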

Array Expressions

You can construct array values with [e1, e2, ...] syntax:

select ['a', 'b', 'c'] as arr

Array elements can be accessed by index (1-origin):

select ['a', 'b', 'c'] as arr
select arr[1] as first_element

Map Expressions

You can construct map values with map {k1: v1, k2: v2, ...} syntax. Unlike struct expressions, all keys (k1, k2, ...) need to have the same type, and all values (v1, v2, ...) need to have the same type:

select map {'a': 1, 'b': 2} as m

Struct/Row Expressions

Struct (row) expressions are used to represent key-value pairs. You can access the values by name:

select {'i': 3, 's': 'str'} as obj
select
-- Name based access
obj.i, obj.s,
-- Lookup by name
obj['i'], obj['s']

Note that plain key names in a struct do not require quotes:

select {k1:1, k2:'v1'} as obj
select obj.k1, obj.k2

Lambda Expression

To manipulate array values, you can use lambda expressions:

select list_transform([4, 5, 6], x -> x + 1)

This query applies the lambda function x -> x + 1 to each element of the input array:

┌────────────┐
│ arr │
│ array(int) │
├────────────┤
│ [5, 6, 7] │
├────────────┤
│ 1 rows │
└────────────┘

To pass multiple arguments to the lambda function, use the following syntax:

select list_reduce([4, 5, 6], (a, b) -> a + b) as sum;

┌─────┐
│ sum │
│ int │
├─────┤
│  15 │
├─────┤
│ 1 … │
└─────┘

Lambda functions can be nested as follows:

select
list_transform(
[1, 2, 3],
x -> list_reduce([4, 5, 6], (a, b) -> a + b) + x
) as arr;

┌──────────────┐
│     arr      │
│  array(int)  │
├──────────────┤
│ [16, 17, 18] │
├──────────────┤
│    1 rows    │
└──────────────┘

Examples

show tables

To find a target table to query in the current database (schema), you can use the show tables query:

show tables

The output of show tables can be followed by a where clause to filter the table names:

show tables
where name like 'n%';

┌────────┐
│ name │
│ string │
├────────┤
│ nation │
├────────┤
│ 1 rows │
└────────┘

show schemas

To find a target database (schema) to query in the current catalog, you can use the show schemas query:

show schemas;

┌─────────┬────────────────────┐
│ catalog │        name        │
│ string  │       string       │
├─────────┼────────────────────┤
│ memory  │ information_schema │
│ system  │ information_schema │
│ temp    │ information_schema │
│ memory  │ main               │
│ system  │ main               │
│ temp    │ main               │
│ memory  │ pg_catalog         │
│ system  │ pg_catalog         │
│ temp    │ pg_catalog         │
├─────────┴────────────────────┤
│            9 rows            │
└──────────────────────────────┘

These schema names can be used to specify the target schema in the show tables command:

show tables in main 
where name like 'p%';

┌──────────┐
│ name │
│ string │
├──────────┤
│ part │
│ partsupp │
├──────────┤
│  2 rows  │
└──────────┘

show catalogs

List all available catalogs:

show catalogs;

┌────────┐
│  name  │
│ string │
├────────┤
│ memory │
│ system │
│ temp   │
├────────┤
│ 3 rows │
└────────┘

from

Read rows from a given table:

from nation

With the DuckDB backend, you can read data from local Parquet or JSON files:

from 'sample.parquet' 

Or from files on the Web:

from 'https://(path to your data)/sample.parquet'

Reading data from a URL also works for S3 presigned URLs.

Raw SQL statements

In case you need to use a raw SQL statement, you can use the sql string interpolation:

from sql"select * from nation"
warning

Although raw SQL statements are convenient, Wvlet needs to issue a real query to the target DBMS to inspect the data types of the query result, which might slow down the query or cause errors in the subsequent relational operators.

where

You can apply a filter condition to the input rows:

from nation
where n_regionkey = 1

Unlike SQL, applying multiple where clauses is allowed in Wvlet:

from nation
where n_regionkey = 1
where n_name like 'U%'

This is useful when you need to generate Wvlet queries programmatically, as you only need to append new condition lines to the query. In SQL, you would need to parse the text inside the WHERE clause and concatenate conditions with AND.

count

Wvlet has a shorthand notation for counting the number of input rows:

from nation
count

This is equivalent to:

from nation
select _.count

A more SQL-like expression can also be used:

from nation
select count(*)

For clarity, you can put a pipe (|) before the count operator:

from nation
select n_name, n_regionkey,
-- pipe (|) is required after trailing comma
| count

add

Add a new column to the input. This operator helps reduce typing effort, as you don't need to enumerate all the columns in a select clause:

from nation
select n_name
add n_name.substring(1, 3) as prefix
limit 5;

┌───────────┬────────┐
│ n_name │ prefix │
│ string │ string │
├───────────┼────────┤
│ ALGERIA │ ALG │
│ ARGENTINA │ ARG │
│ BRAZIL │ BRA │
│ CANADA │ CAN │
│ EGYPT │ EGY │
├───────────┴────────┤
│       5 rows       │
└────────────────────┘

prepend

Prepend a new column to the left side of the input:

from nation
select n_name
prepend n_name.substring(1, 3) as prefix
limit 5;

┌────────┬───────────┐
│ prefix │ n_name │
│ string │ string │
├────────┼───────────┤
│ ALG │ ALGERIA │
│ ARG │ ARGENTINA │
│ BRA │ BRAZIL │
│ CAN │ CANADA │
│ EGY │ EGYPT │
├────────┴───────────┤
│       5 rows       │
└────────────────────┘

select

Output only the selected columns from the input:

from nation
select n_name, n_regionkey

As in SQL, arbitrary expressions can be used in select statements. Note, however, that in Wvlet you usually don't need a select statement, as other column-at-a-time operators such as add, exclude, and shift can be used to manipulate column values.
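
For example, instead of enumerating columns in a select, you can keep all columns and adjust them incrementally (a sketch using the nation table):

from nation
exclude n_comment
rename n_name as nation_name
shift nation_name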

rename

The rename operator changes the column names in the input rows. This is useful when you want to rename columns in the output.

from nation
rename n_name as nation_name

exclude

The exclude operator removes specified columns from the input rows. This is useful when you want to drop certain columns from the output.

from nation
exclude n_regionkey

shift

The shift operator changes the position of specified columns in the input rows. You can move columns to the left or right.

Shift to the left (default):

from nation
shift n_name

Shift to the right:

from nation
shift to right n_comment

group by

The group by operator groups rows by one or more columns, creating groups of rows that share the same values in the specified columns.

Basic Grouping

from [[1, "A", 100], [2, "B", 200], [1, "A", 150], [2, "B", 50], [3, "C", 300]]
as data(id, category, value)
group by id, category;

┌─────┬──────────┬───────┐
│ id  │ category │ value │
│ int │  string  │  int  │
├─────┼──────────┼───────┤
│   1 │ A        │   100 │
│   2 │ B        │   200 │
│   3 │ C        │   300 │
├─────┴──────────┴───────┤
│         3 rows         │
└────────────────────────┘
note

In SQL, you need to specify aggregation operators like count, sum, avg, etc. to generate output from GROUP BY. In Wvlet, the default aggregation operator arbitrary (any) is used, which returns an arbitrary value from each group.

Referencing Grouping Keys

After grouping, you can reference grouping keys by their column names directly:

from [[1, "A", 100], [2, "B", 200], [1, "A", 150], [2, "B", 50]]
as data(id, category, value)
group by id, category
select id, category;

┌─────┬──────────┐
│ id  │ category │
│ int │  string  │
├─────┼──────────┤
│   1 │ A        │
│   2 │ B        │
├─────┴──────────┤
│     2 rows     │
└────────────────┘
tip

While you can also reference grouping keys using _1, _2, etc. in the order they appear in the group by clause, it's recommended to use column names for better readability and maintainability.

agg

The agg operator adds aggregation expressions. It is typically used after a group by clause to aggregate data within groups, but can also be used without group by to aggregate all rows in the input.

Basic Aggregations

from [[1, "A", 100], [2, "B", 200], [1, "A", 150], [2, "B", 50], [3, "C", 300]]
as data(id, category, value)
group by category
agg
_.count as item_count,
value.sum as total_value,
value.avg as avg_value;

┌──────────┬────────────┬─────────────┬───────────────┐
│ category │ item_count │ total_value │   avg_value   │
│  string  │    long    │     int     │ decimal(17,4) │
├──────────┼────────────┼─────────────┼───────────────┤
│ A        │          2 │         250 │      125.0000 │
│ B        │          2 │         250 │      125.0000 │
│ C        │          1 │         300 │      300.0000 │
├──────────┴────────────┴─────────────┴───────────────┤
│                        3 rows                        │
└───────────────────────────────────────────────────────┘

Available Aggregation Functions

The underscore _ represents the group of rows and supports various aggregation functions:

from [[1, 10, "A"], [2, 20, "B"], [1, 30, "C"], [2, 40, "D"], [1, 50, "E"]]
as sales(store_id, amount, product)
group by store_id
agg
_.count as transaction_count,
amount.sum as total_sales,
amount.avg as avg_sale,
amount.min as min_sale,
amount.max as max_sale,
_.count_distinct(product) as unique_products,
_.max_by(product, amount) as best_product,
_.min_by(product, amount) as worst_product;

┌──────────┬───────────────────┬─────────────┬───────────────┬──────────┬──────────┬─────────────────┬──────────────┬───────────────┐
│ store_id │ transaction_count │ total_sales │   avg_sale    │ min_sale │ max_sale │ unique_products │ best_product │ worst_product │
│   int    │       long        │     int     │ decimal(17,4) │   int    │   int    │      long       │    string    │    string     │
├──────────┼───────────────────┼─────────────┼───────────────┼──────────┼──────────┼─────────────────┼──────────────┼───────────────┤
│        1 │                 3 │          90 │       30.0000 │       10 │       50 │               3 │ E            │ A             │
│        2 │                 2 │          60 │       30.0000 │       20 │       40 │               2 │ D            │ B             │
├──────────┴───────────────────┴─────────────┴───────────────┴──────────┴──────────┴─────────────────┴──────────────┴───────────────┤
│                                                              2 rows                                                                │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Filtering Groups (HAVING clause)

You can filter groups after aggregation using a where clause after agg:

from [[1, "Electronics", 1000], [2, "Books", 50], [3, "Electronics", 2000], 
[4, "Books", 75], [5, "Toys", 500], [6, "Electronics", 1500]]
as orders(id, category, amount)
group by category
agg
_.count as order_count,
amount.sum as total_amount
where total_amount > 1000
order by total_amount desc;

┌─────────────┬─────────────┬──────────────┐
│  category   │ order_count │ total_amount │
│   string    │    long     │     int      │
├─────────────┼─────────────┼──────────────┤
│ Electronics │           3 │         4500 │
├─────────────┴─────────────┴──────────────┤
│                  1 rows                  │
└──────────────────────────────────────────┘

Multiple Grouping with Complex Aggregations

from [["2024-01-01", "A", "Electronics", 100],
["2024-01-01", "B", "Books", 50],
["2024-01-02", "A", "Electronics", 200],
["2024-01-02", "A", "Books", 75],
["2024-01-03", "B", "Electronics", 150]]
as sales(date, store, category, amount)
group by date, store
agg
_.count as transactions,
amount.sum as daily_total,
_.array_agg(category) as categories_sold,
_.array_agg(amount) as amounts;

┌────────────┬────────┬──────────────┬─────────────┬──────────────────────┬────────────┐
│    date    │ store  │ transactions │ daily_total │   categories_sold    │  amounts   │
│   string   │ string │     long     │     int     │    array(string)     │ array(int) │
├────────────┼────────┼──────────────┼─────────────┼──────────────────────┼────────────┤
│ 2024-01-01 │ A      │            1 │         100 │ [Electronics]        │ [100]      │
│ 2024-01-01 │ B      │            1 │          50 │ [Books]              │ [50]       │
│ 2024-01-02 │ A      │            2 │         275 │ [Electronics, Books] │ [200, 75]  │
│ 2024-01-03 │ B      │            1 │         150 │ [Electronics]        │ [150]      │
├────────────┴────────┴──────────────┴─────────────┴──────────────────────┴────────────┤
│                                        4 rows                                        │
└───────────────────────────────────────────────────────────────────────────────────────┘

Using Aggregations Without group by

You can use aggregations directly without group by to aggregate all rows:

from [[1, 100], [2, 200], [3, 300], [4, 400], [5, 500]]
as data(id, value)
agg
_.count as total_rows,
value.sum as sum_all,
value.avg as average;

┌────────────┬─────────┬───────────────┐
│ total_rows │ sum_all │    average    │
│    long    │   int   │ decimal(17,4) │
├────────────┼─────────┼───────────────┤
│          5 │    1500 │      300.0000 │
├────────────┴─────────┴───────────────┤
│                1 rows                │
└───────────────────────────────────────┘

Practical Example: Sales Analysis

Here's a practical example analyzing sales data:

from [["2024-01", "North", "Laptop", 5, 1200],
["2024-01", "North", "Phone", 10, 800],
["2024-01", "South", "Laptop", 3, 1200],
["2024-01", "South", "Tablet", 7, 600],
["2024-02", "North", "Laptop", 8, 1200],
["2024-02", "North", "Tablet", 5, 600],
["2024-02", "South", "Phone", 12, 800],
["2024-02", "South", "Laptop", 4, 1200]]
as sales(month, region, product, quantity, unit_price)
group by month, region
agg
_.count as num_orders,
quantity.sum as total_units_sold,
(quantity * unit_price).sum as total_revenue,
_.count_distinct(product) as unique_products
where total_revenue > 10000
order by month, total_revenue desc;

┌─────────┬────────┬────────────┬──────────────────┬───────────────┬─────────────────┐
│  month  │ region │ num_orders │ total_units_sold │ total_revenue │ unique_products │
│ string  │ string │    long    │       int        │      int      │      long       │
├─────────┼────────┼────────────┼──────────────────┼───────────────┼─────────────────┤
│ 2024-01 │ North  │          2 │               15 │         14000 │               2 │
│ 2024-02 │ South  │          2 │               16 │         14400 │               2 │
│ 2024-02 │ North  │          2 │               13 │         12600 │               2 │
├─────────┴────────┴────────────┴──────────────────┴───────────────┴─────────────────┤
│                                       3 rows                                       │
└─────────────────────────────────────────────────────────────────────────────────────┘
tip

The _ underscore in aggregations represents the group of rows. You can use it with various aggregation functions like _.count, _.sum, _.avg, _.max, _.min, _.count_distinct(), _.array_agg(), _.max_by(), _.min_by(), etc.

note

The where clause after agg is equivalent to SQL's HAVING clause - it filters groups based on aggregated values, not individual rows.

limit

Limit the number of output rows:

from nation
limit 10

order by

Sort the output rows by the given column:

from nation
order by n_name
limit 5

The order by statement can be followed by asc or desc ordering specifiers, as in SQL. You can also add nulls first or nulls last to specify how null values are ordered.
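
For example, assuming the specifiers follow SQL's placement after the sort order (a sketch using the nation table):

from nation
order by n_regionkey desc nulls last, n_name asc
limit 5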

join

The join operator combines rows from two or more tables based on a related column between them. Wvlet supports various types of joins similar to SQL, but with a more intuitive flow-style syntax.

Basic Join (Inner Join)

The basic join returns rows when there is a match in both tables:

from
[[1, "apple", 50], [2, "banana", 10], [3, "cherry", 70]]
as fruit(id, name, price)
join {
from [["o1", 1, 10], ["o2", 2, 5]]
as fruit_order(order_id, fruit_id, qty)
}
on fruit.id = fruit_order.fruit_id
select fruit.name, fruit_order.qty, fruit.price * fruit_order.qty as total_price;

┌────────┬─────┬─────────────┐
│  name  │ qty │ total_price │
│ string │ int │     int     │
├────────┼─────┼─────────────┤
│ apple  │  10 │         500 │
│ banana │   5 │          50 │
├────────┴─────┴─────────────┤
│           2 rows           │
└────────────────────────────┘

Left Join

A left join returns all rows from the left table, and matched rows from the right table. If no match, NULL values are returned for right table columns:

from [[1, "Alice"], [2, "Bob"], [3, "Charlie"]] as users(id, name)
left join {
from [[1, "order1"], [1, "order2"], [3, "order3"]]
as orders(user_id, order_id)
}
on users.id = orders.user_id
select users.name, orders.order_id;

┌─────────┬──────────┐
│  name   │ order_id │
│ string  │  string  │
├─────────┼──────────┤
│ Alice   │ order1   │
│ Alice   │ order2   │
│ Bob     │ null     │
│ Charlie │ order3   │
├─────────┴──────────┤
│       4 rows       │
└────────────────────┘

Right Join

A right join returns all rows from the right table, and matched rows from the left table:

from [[1, "order1"], [1, "order2"], [3, "order3"], [4, "order4"]]
as orders(user_id, order_id)
right join {
from [[1, "Alice"], [2, "Bob"], [3, "Charlie"]]
as users(id, name)
}
on orders.user_id = users.id
select users.name, orders.order_id;

┌─────────┬──────────┐
│  name   │ order_id │
│ string  │  string  │
├─────────┼──────────┤
│ Alice   │ order1   │
│ Alice   │ order2   │
│ Bob     │ null     │
│ Charlie │ order3   │
├─────────┴──────────┤
│       4 rows       │
└────────────────────┘

Cross Join

A cross join produces the Cartesian product of two tables, combining each row from the first table with every row from the second table:

from [[1, "A"], [2, "B"]] as t1(id, val)
cross join {
from [["X"], ["Y"], ["Z"]] as t2(letter)
}
select t1.val, t2.letter;

┌────────┬────────┐
│ val │ letter │
│ string │ string │
├────────┼────────┤
│ A │ X │
│ A │ Y │
│ A │ Z │
│ B │ X │
│ B │ Y │
│ B │ Z │
├────────┴────────┤
│      6 rows     │
└─────────────────┘

Join with Multiple Conditions

You can specify multiple conditions in the on clause using and/or operators:

from [[1, "A", 100], [2, "B", 200], [3, "C", 300]] as t1(id, code, value)
join {
from [[1, "A", 10], [2, "B", 20], [3, "D", 30]] as t2(id, code, qty)
}
on t1.id = t2.id and t1.code = t2.code
select t1.id, t1.code, t1.value, t2.qty;

┌─────┬────────┬───────┬─────┐
│ id  │  code  │ value │ qty │
│ int │ string │  int  │ int │
├─────┼────────┼───────┼─────┤
│   1 │ A      │   100 │  10 │
│   2 │ B      │   200 │  20 │
├─────┴────────┴───────┴─────┤
│           2 rows           │
└────────────────────────────┘

Combining Joins with Other Operators

Joins can be seamlessly combined with other Wvlet operators in a flow-style query:

from [[1, "Electronics", 1000], [2, "Books", 500], [3, "Clothing", 750]]
as categories(id, name, budget)
left join {
from [[1, "Laptop", 800], [1, "Phone", 600], [2, "Novel", 20], [4, "Unknown", 100]]
as products(category_id, name, price)
}
on categories.id = products.category_id
where products.price != null
group by categories.name
agg
_.count as product_count,
products.price.sum as total_price
order by product_count desc;

┌─────────────┬───────────────┬─────────────┐
│    name     │ product_count │ total_price │
│   string    │     long      │     int     │
├─────────────┼───────────────┼─────────────┤
│ Electronics │             2 │        1400 │
│ Books       │             1 │          20 │
├─────────────┴───────────────┴─────────────┤
│                   2 rows                  │
└────────────────────────────────────────────┘

Multiple Joins

You can chain multiple join operations to combine data from several tables:

from [[1, "Alice"], [2, "Bob"], [3, "Charlie"]] as users(id, name)
join {
from [[1, 101], [2, 102], [3, 103]] as accounts(user_id, account_id)
}
on users.id = accounts.user_id
join {
from [[101, 1000], [102, 2000], [103, 1500]] as balances(account_id, amount)
}
on accounts.account_id = balances.account_id
select users.name, accounts.account_id, balances.amount;

┌─────────┬────────────┬────────┐
│  name   │ account_id │ amount │
│ string  │    int     │  int   │
├─────────┼────────────┼────────┤
│ Alice   │        101 │   1000 │
│ Bob     │        102 │   2000 │
│ Charlie │        103 │   1500 │
├─────────┴────────────┴────────┤
│             3 rows            │
└───────────────────────────────┘

Self Join

A table can be joined with itself (self join) by using aliases. The with statement helps avoid duplicating the data:

with employees as {
from [[1, "Alice", 2], [2, "Bob", 3], [3, "Charlie", null]]
as data(id, name, manager_id)
}
from employees as emp
left join employees as mgr
on emp.manager_id = mgr.id
select emp.name as employee, mgr.name as manager;

┌──────────┬─────────┐
│ employee │ manager │
│  string  │ string  │
├──────────┼─────────┤
│ Alice    │ Bob     │
│ Bob      │ Charlie │
│ Charlie  │ null    │
├──────────┴─────────┤
│       3 rows       │
└────────────────────┘
tip

Unlike SQL where you write SELECT ... FROM table1 JOIN table2, Wvlet's flow-style syntax starts with from and then adds join as a subsequent operation. This makes it easier to build queries incrementally and matches the logical flow of data processing.

note

For time-based joins, Wvlet also supports AsOf Join, which is useful for joining tables based on the most recent value at a specific time.
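
A rough sketch only, assuming asof join takes an on condition with an inequality like other joins (the trades and quotes tables are hypothetical; see the AsOf Join documentation for the exact syntax):

from trades
asof join {
  from quotes
}
on trades.symbol = quotes.symbol
  and trades.time >= quotes.time
select trades.symbol, trades.time, trades.price, quotes.bid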

concat

The concat operator concatenates rows from multiple subqueries. This is similar to the UNION ALL operator in SQL.

from [[1, "A"], [2, "B"]] as t1(id, val)
concat {
from [[3, "C"], [4, "D"]] as t2(id, val)
}
order by id;

┌─────┬────────┐
│ id  │  val   │
│ int │ string │
├─────┼────────┤
│   1 │ A      │
│   2 │ B      │
│   3 │ C      │
│   4 │ D      │
├─────┴────────┤
│    4 rows    │
└──────────────┘

The ordering of rows is not guaranteed in the concat operator. If you need to sort the output, use the order by operator after the concat operator.

You can concatenate multiple subqueries:

from nation
where n_regionkey = 0
concat {
from nation
where n_regionkey = 1
}
concat {
from nation
where n_regionkey = 2
}
select n_name, n_regionkey
order by n_regionkey, n_name

dedup

The dedup operator removes duplicated rows from the input rows. This is equivalent to select distinct * in SQL.

from [[1, "A"], [2, "B"], [1, "A"], [3, "C"], [2, "B"]]
as data(id, val)
dedup;

┌─────┬────────┐
│ id  │  val   │
│ int │ string │
├─────┼────────┤
│   1 │ A      │
│   2 │ B      │
│   3 │ C      │
├─────┴────────┤
│    3 rows    │
└──────────────┘

intersect

The intersect operator returns the intersection of the input rows from multiple subqueries. By default, set semantics are used (duplicates removed).

from [[1, "A"], [2, "B"], [3, "C"], [4, "D"]] as t1(id, val)
intersect {
from [[2, "B"], [3, "C"], [5, "E"]] as t2(id, val)
};

┌─────┬────────┐
│ id  │  val   │
│ int │ string │
├─────┼────────┤
│   2 │ B      │
│   3 │ C      │
├─────┴────────┤
│    2 rows    │
└──────────────┘

With the all keyword, bag semantics are used (duplicates preserved):

from [[1, "A"], [2, "B"], [2, "B"], [3, "C"]] as t1(id, val)
intersect all {
from [[2, "B"], [2, "B"], [3, "C"], [3, "C"]] as t2(id, val)
};

┌─────┬────────┐
│ id  │  val   │
│ int │ string │
├─────┼────────┤
│   2 │ B      │
│   2 │ B      │
│   3 │ C      │
├─────┴────────┤
│    3 rows    │
└──────────────┘

except

The except operator returns the difference of the input rows from multiple subqueries (rows in the first query but not in the second). By default, set semantics are used.

from [[1, "A"], [2, "B"], [3, "C"], [4, "D"]] as t1(id, val)
except {
from [[2, "B"], [4, "D"]] as t2(id, val)
};

┌─────┬────────┐
│ id  │  val   │
│ int │ string │
├─────┼────────┤
│   1 │ A      │
│   3 │ C      │
├─────┴────────┤
│    2 rows    │
└──────────────┘

With the all keyword, bag semantics are used:

from [[1, "A"], [2, "B"], [2, "B"], [3, "C"]] as t1(id, val)
except all {
from [[2, "B"]] as t2(id, val)
};

┌─────┬────────┐
│ id  │  val   │
│ int │ string │
├─────┼────────┤
│   1 │ A      │
│   2 │ B      │
│   3 │ C      │
├─────┴────────┤
│    3 rows    │
└──────────────┘

pivot

The pivot operator expands column values into horizontal columns. With pivot, you specify the grouping columns (group by) and aggregation expressions (agg) in the subsequent operators.

Example:

from orders
pivot on o_orderpriority
group by o_orderstatus
agg _.count;

┌───────────────┬──────────┬────────┬──────────┬─────────────────┬───────┐
│ o_orderstatus │ 1-URGENT │ 2-HIGH │ 3-MEDIUM │ 4-NOT SPECIFIED │ 5-LOW │
│    string     │   long   │  long  │   long   │      long       │ long  │
├───────────────┼──────────┼────────┼──────────┼─────────────────┼───────┤
│ F             │     1468 │   1483 │     1445 │            1465 │  1443 │
│ O             │     1488 │   1506 │     1421 │            1482 │  1436 │
│ P             │       64 │     76 │       75 │              77 │    71 │
├───────────────┴──────────┴────────┴──────────┴─────────────────┴───────┤
│                                 3 rows                                 │
└─────────────────────────────────────────────────────────────────────────┘

To specify which column values to expand, use the in clause:

from orders
pivot on o_orderpriority in ('1-URGENT', '2-HIGH')
group by o_orderstatus
agg _.count;

┌───────────────┬──────────┬────────┐
│ o_orderstatus │ 1-URGENT │ 2-HIGH │
│    string     │   long   │  long  │
├───────────────┼──────────┼────────┤
│ F             │     1468 │   1483 │
│ O             │     1488 │   1506 │
│ P             │       64 │     76 │
├───────────────┴──────────┴────────┤
│               3 rows              │
└───────────────────────────────────┘

unpivot

The unpivot operator transforms multiple columns into rows. This is useful when you need to transform a wide table into a long table. Currently, unpivot is available only with the DuckDB backend.

Example:

from [
[1, 'electronics', 1, 2, 3, 4, 5, 6],
[2, 'clothes', 10, 20, 30, 40, 50, 60],
[3, 'cars', 100, 200, 300, 400, 500, 600]
] as sales(id, dept, jan, feb, mar, apr, may, jun)
unpivot
sales for month in (jan, feb, mar, apr, may, jun)

┌─────┬─────────────┬────────┬───────┐
│ id  │    dept     │ month  │ sales │
│ int │   string    │ string │  int  │
├─────┼─────────────┼────────┼───────┤
│   1 │ electronics │ jan    │     1 │
│   1 │ electronics │ feb    │     2 │
│   1 │ electronics │ mar    │     3 │
│   1 │ electronics │ apr    │     4 │
│   1 │ electronics │ may    │     5 │
│   1 │ electronics │ jun    │     6 │
│   2 │ clothes     │ jan    │    10 │
│   2 │ clothes     │ feb    │    20 │
│   2 │ clothes     │ mar    │    30 │
│   2 │ clothes     │ apr    │    40 │
│   2 │ clothes     │ may    │    50 │
│   2 │ clothes     │ jun    │    60 │
│   3 │ cars        │ jan    │   100 │
│   3 │ cars        │ feb    │   200 │
│   3 │ cars        │ mar    │   300 │
│   3 │ cars        │ apr    │   400 │
│   3 │ cars        │ may    │   500 │
│   3 │ cars        │ jun    │   600 │
├─────┴─────────────┴────────┴───────┤
│               18 rows              │
└────────────────────────────────────┘

sample

The sample operator randomly samples rows from the input data. You can specify either a fixed number of rows or a percentage.

Sample a fixed number of rows:

from [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]] as numbers(n)
sample 5;

┌─────┐
│  n  │
│ int │
├─────┤
│   2 │
│   4 │
│   7 │
│   8 │
│  10 │
├─────┤
│ 5 … │
└─────┘

Sample a percentage of rows:

from [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]] as numbers(n)
sample 30%;

┌─────┐
│  n  │
│ int │
├─────┤
│   3 │
│   6 │
│   9 │
├─────┤
│ 3 … │
└─────┘

You can specify the sampling method - reservoir (default), system, or bernoulli:

-- Reservoir sampling (default) - good for getting exact sample size
from large_table
sample reservoir (1000)

-- System sampling - faster but approximate, samples blocks of data
from large_table
sample system (10%)

-- Bernoulli sampling - samples individual rows with given probability
from large_table
sample bernoulli (10%)
tip

Reservoir sampling guarantees the exact number of rows in the output (if available), while system and bernoulli sampling may return approximately the requested percentage of rows.

with

To define a local subquery, use the with expression:

with t1 as {
from nation
where n_regionkey = 1
}
from t1

You can define multiple subqueries:

with t1 as {
from nation
where n_regionkey = 1
}
with t2 as {
from t1
limit 10
}
from t2
tip

The with expression is similar to a common table expression (CTE) in SQL, but each subquery needs its own with clause. This makes it convenient to add multiple subqueries in separate line blocks without concatenating them with commas (,).

Common Query Patterns

Filter → Group → Aggregate

A common pattern for analyzing data:

from sales
where date >= '2024-01-01':date
group by product_category
agg
_.count as num_sales,
revenue.sum as total_revenue
order by total_revenue desc
limit 10

Join → Filter → Select

Combining data from multiple tables:

from customers
join {
from orders
where status = 'completed'
}
on customers.id = orders.customer_id
select
customers.name,
orders.total,
orders.created_at
order by orders.created_at desc

Pivot → Aggregate

Creating cross-tab reports:

from sales_data
pivot on month
group by product
agg revenue.sum

Multiple CTEs → Complex Analysis

Using CTEs for complex queries:

with active_users as {
from users
where last_login >= current_date - '30 days':interval
}
with user_orders as {
from orders
where created_at >= current_date - '30 days':interval
}
from active_users
left join user_orders
on active_users.id = user_orders.user_id
group by active_users.segment
agg
_.count as user_count,
user_orders.total.sum as revenue

This reference guide provides a comprehensive overview of Wvlet's query syntax. For hands-on examples and tutorials, check out the Quick Start guide.