Skip to main content

Query Syntax

Wvlet is a query language designed to be more human-readable and easier to write than SQL. If you are familiar to SQL, you will find it easy to learn Wvlet syntax due to the many similarities. If you know about DataFrame in Python, it will help you understand the Wvlet query language, as chaining relational operators in flow-style is similar to using the DataFrame API.

Documentation Overview

Typical Flow of a Wvlet Query

Wvlet queries start with the from keyword, and you can chain multiple relational operators to process the input data and generate output data. Here's a typical flow:

from ...         -- Scan the input data
where ... -- Apply filtering conditions
where ... -- [optional] Apply more filtering conditions
add ... as ... -- Add new columns
group by ... -- Group rows by the given columns
agg ... -- Add group aggregation expressions, e.g., _.count, _.sum
where ... -- Apply filtering conditions for groups (HAVING clause in SQL)
exclude ... -- Remove columns from the output
shift ... -- Shift the column position to the left
select ... -- Select columns to output
order by ... -- Sort the rows by the given columns
limit ... -- Limit the number of rows to output

Unlike SQL, whose queries always must follow the SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ... LIMIT ... structure, Wvlet uses the flow-style syntax to match the order of data processing order as much as possible, facilitating more intuitive query writing. A query should have a from statement to read the data, but select is not mandatory in Wvlet.

Relational Operators

In Wvlet, you need to use lower-case keywords for SQL-like operators. Below is a list of relational operators for manipulating table-format data (relation):

Catalog Operators

Wvlet provides operators to list the available catalogs, schemas (databases), tables, and models in the current environment. The hierarchy of the catalog, schema, and table is as follows:

  • catalog: A collection of schemas (databases)
  • schema (database, datasets): A collection of tables
  • table: A collection of rows (records)
OperatorOutput
show tables (in catalog?(.schema)?Return a list of all tables in the current schema (database)
show schemas (in catalog)?Return a list of all schemas (databases) in the catalog
show catalogsReturn a list of all catalogs
show modelsReturn a list of all models
describe (relation)Return the schema of the specified table or query

Query Operators

Wvlet provides various relational operators to process input data and generate output data in the table format.

OperatorOutput
from exprRows from the given source table, model, value, or file
where condRows that satisfy the given condition. Multiple where clauses can be used in the same query
countCount the number of input rows
add expr (as alias)?, ...Same rows with new columns
select expr, ...Rows with the given expressions. select * is allowed
select distinct expr,...Rows with distinct values of the given expressions
select alias = expr, ...Rows with the given expressions with aliases
select expr as alias, ...Rows with the given expressions with aliases
select as aliasAdd an alias to the query to reference it from another query
rename column as alias, ...Same rows with the given columns renamed
exclude column, ...Same rows except the given columns
shift (to left)? column, ...Same rows with selected column moved to the left
shift to right column, ...Same rows with selected column moved to the right
group by column, ...Grouped rows by the given columns. Grouping keys can be referenced as select _1, _2, ... in the subsequent operator
agg agg_expr, ...Rows with the grouping keys in group by clause and aggregated values
order by expr (asc | desc)?, ...Rows sorted by the given expression. 1-origin column indexes can be used like 1, 2, ...
limit nRows up to the given number
asof? (left | right | cross)? join table on condJoined rows with the given condition
concat { query }Concatenated rows with the given subquery result. Same as UNION ALL in SQL
dedupRows with duplicated rows removed. Equivalent to select distinct *
(intersect | except) all? ...Rows with the intersection (difference) of the given sources. By default set semantics is used. If all is given, bag semantics will be used
pivot on pivot_column (in (v1, v2, ...) )?Rows whose column values in the pivot column are expanded as columns
| func(args, ...)Rows processed by the given table function (pipe operator)
with alias as { (query) }Define a local query alias, which is the same with a common-table expression (CTE) in SQL
sample method? (size rows? | %)Randomly sampled rows. Sampling method can be reservoir (default), system, or bernoulli
unnest(array expr)Expand an array into rows
test (test_expr)Test the query result, evaluated only in the test-run mode
debug { (query) }Pass-through the input results (no-op), but run a debug query against the input rows
describeReturn the schema of the input relation (column_name, column_type)

Update Operators

Wvlet provides update operators to save the query result as a new table or file, append the query result to the target table, or delete input rows from the source table.

OperatorDescription
save to table_nameSave the query result as a new table with the given name
save to table_name with p1:v1, p2:v2, ...Save the query result as a new table with the given name and options
save to 'file name'Save the query result as a file with the given name
append to table_nameAppend the query result to the target table. If the target table doesn't exist, it creates a new one
append to 'file_name'Append the query result to the target file. It will create a new file if the file doesn't exist
deleteDelete input rows from the source table

Raw SQL Commands

OperatorDescription
execute sql"..."Execute a raw SQL statement in the context engine
from sql"..."Execute a raw SQL statement and read the result as a table

Expressions

One of the major difference from traditional SQL is that wvlet uses single or double quoted strings for representing string values and back-quoted strings for referencing column or table names, which might contain space or special characters.

OperatorDescription
'single quote'String literal for representing string values, file names, etc.
"double quote"Same as single quote strings
"""triple quote string"""Multi-line strings
`(back quote)`Column or table name, which requires quotations
sql"sql expr"SQL expression used for inline expansion
sql" ... ${expr} ..."Interpolated SQL expression with embedded expressions
s"... ${expr} ..."Interpolated strings with expressions
s`... ${expr} ...`Interpolated backquote strings with expressions
[expr, ...]Array value
[[expr, ...], ...]Array of arrays for representing table records
{key: value, ...}Struct (row) value
_Underscore refers to the previous input
agg_func(expr) over (partition by ... order by ...)Window functions for computing aggregate values computed from the entire query result. This follows similar window function syntax with SQL
_1, _2, ...Refers to 1-origin grouping keys in the preceding group by clause
1, 2, ...Refers to 1-origin column index for order by clause

Variable Definition

You can define a variable using val keyword:

-- Define a new variable
val name = 'wvlet'

-- Variable is evaluated once before running the query
select s"Hello ${x}!" as msg
-- Returns [['Hello wvlet!']]

Conditional Expressions

OperatorDescription
expr and exprLogical AND
expr or exprLogical OR
not exprLogical NOT
!exprLogical NOT
expr is exprEquality check
expr = exprEquality check
expr is not exprInequality check
expr != exprInequality check
expr is nullTrue if the expression is null
expr = nullTrue if the expression is null
expr is not nullTrue if the expression is not null
expr != nullTrue if the expression is not null
expr in (v1, v2, ...)True if the expression value is in the given list
expr in { from ... }True if the expression value is in the given list provided by a sub query
expr not in (v1, v2, ...)True if the expression is not in the given list
expr between v1 and v2True if the expression value is between v1 and v2, i.e., v1 ≤ (value) ≤ v2
expr like patternTrue if the expression matches the given pattern, e.g., , 'abc%'
exists { from ... }True if the subquery returns any rows
not exists { from ... }True if the subquery returns no rows

If Expression

if ... then .. else expression can be used to switch the output value based on the condition:

from lineitem
select
if l_returnflag = 'A' then 1 else 0
as return_code

The if expression can be nested as follows:

from lineitem
select
if l_returnflag = 'A' then 1
else if l_returnflag = 'R' then 2
else 0
as return_code

Case Expression

To switch the output value based on the input value, you can use the case expression:

from lineitem
select
case l_returnflag
when 'A' then 1
when 'R' then 2
else 0
as return_code

You can also use the case expression with conditional expressions for clarity:

from lineitem
select
case
when l_returnflag = 'A' then 1
when l_returnflag = 'R' then 2
else 0
as return_code
tip

Unlike SQL, Wvlet doesn't require end at the end of case expressions.

String Expressions

OperatorDescription
expr + exprConcatenate two strings

Array Expressions

You can construct array values with [e1, e2, ...] syntax:

select ['a', 'b', 'c'] as arr

Arrays can be accessed with index (1-origin):

select ['a', 'b', 'c'] as arr
select arr[1] as first_element

Map Expressions

You can construct map values with map {k1: v1, k2: v2, ...} syntax. Unlike struct expressions, keys (k1, k2, ...) needs to be the same type values, and values (v1, v2, ...) also need to be the same type values:

select map {'a': 1, 'b': 2} as m

Struct/Row Expressions

Struct (row) expressions are used to represent key-value pairs. You can access the values by name:

select {'i': 3, 's': 'str'} as obj
select
-- Name based access
obj.i, obj.s,
-- Lookup by name
obj['i'], obj['s']

Note that key names in a struct do not require quotes for plain key names:

select {k1:1, k2:'v1'} as obj
select obj.k1, obj.k2

Lambda Expression

To manipulate array values, you can use lambda expressions:

select list_transform([4, 5, 6], x -> x + 1)

This query applies the lambda function x -> x + 1 to each element of the input array:

┌────────────┐
│ arr │
│ array(int) │
├────────────┤
│ [5, 6, 7] │
├────────────┤
│ 1 rows │
└────────────┘

To pass multiple arguments to the lambda function, use the following syntax:

select list_reduce([4, 5, 6], (a, b) -> a + b) as sum;

┌─────┐
│ sum │
int
├─────┤
15
├─────┤
1 … │
└─────┘

Lambda functions can be nested as follows:

select
list_transform(
[1, 2, 3],
x -> list_reduce([4, 5, 6], (a, b) -> a + b) + x
) as arr;

┌──────────────┐
│ arr │
│ array(int)
├──────────────┤
[16, 17, 18]
├──────────────┤
1 rows
└──────────────┘

Examples

show tables

To find a target table to query in the current database (schema), you can use the show tables query:

show tables

The output of show tables can be followed by the where clause to filter the table names:

show tables
where name like 'n%';

┌────────┐
│ name │
│ string │
├────────┤
│ nation │
├────────┤
1 rows
└────────┘

show schemas

To find a target database (schema) to query in the current catalog, you can use the show schemas query:

show schemas;

┌─────────┬────────────────────┐
│ catalog │ name │
│ string │ string │
├─────────┼────────────────────┤
│ memory │ information_schema │
│ system │ information_schema │
temp │ information_schema │
│ memory │ main │
│ system │ main │
temp │ main │
│ memory │ pg_catalog │
│ system │ pg_catalog │
temp │ pg_catalog │
├─────────┴────────────────────┤
9 rows
└──────────────────────────────┘

These schema names can be used to specify the target schema in the show tables command:

show tables in main 
where name like 'p%';

┌──────────┐
│ name │
│ string │
├──────────┤
│ part │
│ partsupp │
├──────────┤
2 rows
└──────────┘

show catalogs

List all available catalogs:

show catalogs;

┌────────┐
│ name │
│ string │
├────────┤
│ memory │
│ system │
temp
├────────┤
3 rows
└────────┘

from

Read rows from a given table:

from nation

In DuckDB backend, you can read data from local Parquet or JSON files:

from 'sample.parquet' 

Or from files on the Web:

from 'https://(path to your data)/sample.parquet'

Reading data from an URL will also work for S3 Presigned URLs.

Raw SQL statements

In case you need to use a raw SQL statement, you can use the sql string interpolation:

from sql"select * from nation"
warning

Although raw SQL statements are convenient, Wvlet needs to issue a real query to the target DBMS to inspect the data types of the query result, which might slowdown the query or cause errors in the subsequent relational operators.

where

You can apply a filter condition to the input rows:

from nation
where n_regionkey = 1

Unlike SQL, applying multiple where clauses is allowed in Wvlet:

from nation
where n_regionkey = 1
where n_name like 'U%'

This is useful even when you need to generate Wvlet queries programatically as you only need to append new condition lines to the query. In SQL, you need to parse the text inside WHERE clause and concatenate condition with AND expression.

count

Wvlet has a shorthand notation for counting the number of input rows:

from nation
count

This is equilvalen to:

from nation
select _.count

Or more SQL-like expression can be used too:

from nation
select count(*)

For clarity, you can use pipe | before the count operator:

from nation
select n_name, n_regionkey,
-- pipe (|) is required after trailing comma
| count

add

Add a new column to the input. This operator is helpful to reduce the typing effort as you don't need to enumerate all the columns in the select clause:

from nation
select n_name
add n_name.substring(1, 3) as prefix
limit 5;

┌───────────┬────────┐
│ n_name │ prefix │
│ string │ string │
├───────────┼────────┤
│ ALGERIA │ ALG │
│ ARGENTINA │ ARG │
│ BRAZIL │ BRA │
│ CANADA │ CAN │
│ EGYPT │ EGY │
├───────────┴────────┤
5 rows
└────────────────────┘

select

Output only the selected columns from the input:

from nation
select n_name, n_regionkey

As in SQL, arbitrary expressions can be used in select statements. Note that, however, in Wvlet, you usually don't need to use select statement as the other column-at-a-time operators, like add, exclude, shift, etc. can be used for manipulating column values.

rename

The rename operator changes the column names in the input rows. This is useful when you want to rename columns in the output.

from nation
rename n_name as nation_name

exclude

The exclude operator removes specified columns from the input rows. This is useful when you want to drop certain columns from the output.

from nation
exclude n_regionkey

shift

The shift operator changes the position of specified columns in the input rows. You can move columns to the left or right.

Shift to the left (default):

from nation
shift n_name

Shift to the right:

from nation
shift to right n_comment

group by

Create a list of grouped rows by the given column:

from nation
group by n_regionkey

In SQL, you need to specify some aggregation operators like count, sum, avg, etc. to generate the output. In Wvlet, the default aggegation operator arbitrary (any) is used to generate the output from group by operator. If you want to use other aggregation operators, use the agg operator.

agg

Add an aggregation expression to the grouped rows:

from nation
group by n_regionkey
agg _.count as count

_ denotes a list of rows in a group and can follow SQL aggregation expressions with dot notation. For example, _.sum, _.avg, _.max, _.min, _.max_by(sort_col), _.min_by(sort_col), etc. can be used in agg statement.

limit

Limit the number of output rows:

from nation
limit 10

order by

Sort the output rows by the given column:

from nation
order by n_name
limit 5

The order by statement can follow asc, desc ordering specifier as in SQL.

concat

The concat operator concatenates rows from multiple subqueries. This is similar to the UNION ALL operator in SQL.

from nation
where n_regionkey = 0
-- Append rows from another query
concat {
from nation
where n_regionkey = 1
}

The ordering of rows are not guaranteed in the concat operator. If you need to sort the output, use the order by operator after the concat operator.

dedup

The dedup operator removes duplicated rows from the input rows. This is equivalent to select distinct * in SQL.

from nation
dedup

intersect

The intersect operator returns the intersection of the input rows from multiple subqueries. By default, set semantics are used, but you can use bag semantics by specifying all.

from nation
intersect all {
from nation
where n_regionkey = 1
}

except

The except operator returns the difference of the input rows from multiple subqueries. By default, set semantics are used, but you can use bag semantics by specifying all.

from nation
except {
from nation
where n_regionkey = 1
}

pivot

The pivot operator expands the column values as horizontal columns. In pivot, you need to specify grouping columns (group by) and aggregation expressions (agg) in the subsequent operators.

Example:

from orders
pivot on o_orderpriority
group by o_orderstatus
agg _.count;

┌───────────────┬──────────┬────────┬──────────┬─────────────────┬───────┐
│ o_orderstatus │ 1-URGENT │ 2-HIGH │ 3-MEDIUM │ 4-NOT SPECIFIED │ 5-LOW │
│ string │ long │ long │ long │ long │ long │
├───────────────┼──────────┼────────┼──────────┼─────────────────┼───────┤
│ F │ 14681483144514651443
│ O │ 14881506142114821436
│ P │ 6476757771
├───────────────┴──────────┴────────┴──────────┴─────────────────┴───────┤
3 rows
└────────────────────────────────────────────────────────────────────────┘

To specify the column values to expand, use in clause:

from orders
pivot on o_orderpriority in ('1-URGENT', '2-HIGH')
group by o_orderstatus
agg _.count;

┌───────────────┬──────────┬────────┐
│ o_orderstatus │ 1-URGENT │ 2-HIGH │
│ string │ long │ long │
├───────────────┼──────────┼────────┤
│ F │ 14681483
│ O │ 14881506
│ P │ 6476
├───────────────┴──────────┴────────┤
3 rows
└───────────────────────────────────┘

with

To define a local subquery definition, use the with expression:

with t1 as {
from nation
where n_regionkey = 1
}
from t1

You can define multiple subqueries:

with t1 as {
from nation
where n_regionkey = 1
}
with t2 as {
from t1
limit 10
}
from t2