Query Syntax
Wvlet is a query language designed to be more human-readable and easier to write than SQL. If you are familiar to SQL, you will find it easy to learn Wvlet syntax due to the many similarities. If you know about DataFrame in Python, it will help you understand the Wvlet query language, as chaining relational operators in flow-style is similar to using the DataFrame API.
Documentation Overview
Typical Flow of a Wvlet Query
Wvlet queries start with the
from
keyword, and you can chain multiple relational operators to process the input data and generate output data. Here's a typical flow:
from ... -- Scan the input data
where ... -- Apply filtering conditions
where ... -- [optional] Apply more filtering conditions
add ... as ... -- Add new columns
group by ... -- Group rows by the given columns
agg ... -- Add group aggregation expressions, e.g., _.count, _.sum
where ... -- Apply filtering conditions for groups (HAVING clause in SQL)
exclude ... -- Remove columns from the output
shift ... -- Shift the column position to the left
select ... -- Select columns to output
order by ... -- Sort the rows by the given columns
limit ... -- Limit the number of rows to output
Unlike SQL, whose queries always must follow the SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ... LIMIT ...
structure, Wvlet uses the flow-style syntax to match the order of data processing order as much as possible, facilitating more intuitive query writing. A query should have a from
statement to read the data, but
select
is not mandatory in Wvlet.
Relational Operators
In Wvlet, you need to use lower-case keywords for SQL-like operators. Below is a list of relational operators for manipulating table-format data (relation):
Catalog Operators
Wvlet provides operators to list the available catalogs, schemas (databases), tables, and models in the current environment. The hierarchy of the catalog, schema, and table is as follows:
- catalog: A collection of schemas (databases)
- schema (database, datasets): A collection of tables
- table: A collection of rows (records)
Operator | Output |
---|---|
show tables (in catalog ?(.schema )? | Return a list of all tables in the current schema (database) |
show schemas (in catalog )? | Return a list of all schemas (databases) in the catalog |
show catalogs | Return a list of all catalogs |
show models | Return a list of all models |
describe (relation) | Return the schema of the specified table or query |
Query Operators
Wvlet provides various relational operators to process input data and generate output data in the table format.
Operator | Output |
---|---|
from expr | Rows from the given source table, model, value, or file |
where cond | Rows that satisfy the given condition. Multiple where clauses can be used in the same query |
count | Count the number of input rows |
add expr (as alias )?, ... | Same rows with new columns |
select expr , ... | Rows with the given expressions. select * is allowed |
select distinct expr ,... | Rows with distinct values of the given expressions |
select alias = expr , ... | Rows with the given expressions with aliases |
select expr as alias , ... | Rows with the given expressions with aliases |
select as alias | Add an alias to the query to reference it from another query |
rename column as alias , ... | Same rows with the given columns renamed |
exclude column , ... | Same rows except the given columns |
shift (to left)? column , ... | Same rows with selected column moved to the left |
shift to right column , ... | Same rows with selected column moved to the right |
group by column , ... | Grouped rows by the given columns. Grouping keys can be referenced as select _1, _2, ... in the subsequent operator |
agg agg_expr , ... | Rows with the grouping keys in group by clause and aggregated values |
order by expr (asc | desc)?, ... | Rows sorted by the given expression. 1-origin column indexes can be used like 1 , 2 , ... |
limit n | Rows up to the given number |
asof? (left | right | cross)? join table on cond | Joined rows with the given condition |
concat { query } | Concatenated rows with the given subquery result. Same as UNION ALL in SQL |
dedup | Rows with duplicated rows removed. Equivalent to select distinct * |
(intersect | except) all? ... | Rows with the intersection (difference) of the given sources. By default set semantics is used. If all is given, bag semantics will be used |
pivot on pivot_column (in (v1 , v2 , ...) )? | Rows whose column values in the pivot column are expanded as columns |
| func(args, ...) | Rows processed by the given table function (pipe operator) |
with alias as { (query) } | Define a local query alias, which is the same with a common-table expression (CTE) in SQL |
sample method ? (size rows? | %) | Randomly sampled rows. Sampling method can be reservoir (default), system, or bernoulli |
unnest(array expr ) | Expand an array into rows |
test (test_expr) | Test the query result, evaluated only in the test-run mode |
debug { (query) } | Pass-through the input results (no-op), but run a debug query against the input rows |
describe | Return the schema of the input relation (column_name, column_type) |
Update Operators
Wvlet provides update operators to save the query result as a new table or file, append the query result to the target table, or delete input rows from the source table.
Operator | Description |
---|---|
save to table_name | Save the query result as a new table with the given name |
save to table_name with p1:v1, p2:v2, ... | Save the query result as a new table with the given name and options |
save to 'file name' | Save the query result as a file with the given name |
append to table_name | Append the query result to the target table. If the target table doesn't exist, it creates a new one |
append to 'file_name' | Append the query result to the target file. It will create a new file if the file doesn't exist |
delete | Delete input rows from the source table |
Raw SQL Commands
Operator | Description |
---|---|
execute sql"..." | Execute a raw SQL statement in the context engine |
from sql"..." | Execute a raw SQL statement and read the result as a table |
Expressions
One of the major difference from traditional SQL is that wvlet uses single or double quoted strings for representing string values and back-quoted strings for referencing column or table names, which might contain space or special characters.
Operator | Description |
---|---|
'single quote' | String literal for representing string values, file names, etc. |
"double quote" | Same as single quote strings |
"""triple quote string""" | Multi-line strings |
`(back quote)` | Column or table name, which requires quotations |
sql"sql expr " | SQL expression used for inline expansion |
sql" ... ${expr } ..." | Interpolated SQL expression with embedded expressions |
s"... ${expr} ..." | Interpolated strings with expressions |
s`... ${expr} ...` | Interpolated backquote strings with expressions |
[expr , ...] | Array value |
[[expr , ...], ...] | Array of arrays for representing table records |
{key : value , ...} | Struct (row) value |
_ | Underscore refers to the previous input |
agg_func(expr) over (partition by ... order by ...) | Window functions for computing aggregate values computed from the entire query result. This follows similar window function syntax with SQL |
_1 , _2 , ... | Refers to 1-origin grouping keys in the preceding group by clause |
1 , 2 , ... | Refers to 1-origin column index for order by clause |
Variable Definition
You can define a variable using val
keyword:
-- Define a new variable
val name = 'wvlet'
-- Variable is evaluated once before running the query
select s"Hello ${x}!" as msg
-- Returns [['Hello wvlet!']]
Conditional Expressions
Operator | Description |
---|---|
expr and expr | Logical AND |
expr or expr | Logical OR |
not expr | Logical NOT |
!expr | Logical NOT |
expr is expr | Equality check |
expr = expr | Equality check |
expr is not expr | Inequality check |
expr != expr | Inequality check |
expr is null | True if the expression is null |
expr = null | True if the expression is null |
expr is not null | True if the expression is not null |
expr != null | True if the expression is not null |
expr in (v1 , v2 , ...) | True if the expression value is in the given list |
expr in { from ... } | True if the expression value is in the given list provided by a sub query |
expr not in (v1 , v2 , ...) | True if the expression is not in the given list |
expr between v1 and v2 | True if the expression value is between v1 and v2, i.e., v1 ≤ (value) ≤ v2 |
expr like pattern | True if the expression matches the given pattern, e.g., , 'abc%' |
exists { from ... } | True if the subquery returns any rows |
not exists { from ... } | True if the subquery returns no rows |
If Expression
if ... then .. else
expression can be used to switch the output value based on the condition:
from lineitem
select
if l_returnflag = 'A' then 1 else 0
as return_code
The if
expression can be nested as follows:
from lineitem
select
if l_returnflag = 'A' then 1
else if l_returnflag = 'R' then 2
else 0
as return_code
Case Expression
To switch the output value based on the input value, you can use the case
expression:
from lineitem
select
case l_returnflag
when 'A' then 1
when 'R' then 2
else 0
as return_code
You can also use the case
expression with conditional expressions for clarity:
from lineitem
select
case
when l_returnflag = 'A' then 1
when l_returnflag = 'R' then 2
else 0
as return_code
Unlike SQL, Wvlet doesn't require end
at the end of case expressions.
String Expressions
Operator | Description |
---|---|
expr + expr | Concatenate two strings |
Array Expressions
You can construct array values with [e1, e2, ...]
syntax:
select ['a', 'b', 'c'] as arr
Arrays can be accessed with index (1-origin):
select ['a', 'b', 'c'] as arr
select arr[1] as first_element
Map Expressions
You can construct map values with
map {k1: v1, k2: v2, ...}
syntax. Unlike struct expressions, keys (k1, k2, ...) needs to be the same type values, and values (v1, v2, ...) also need to be the same type values:
select map {'a': 1, 'b': 2} as m
Struct/Row Expressions
Struct (row) expressions are used to represent key-value pairs. You can access the values by name:
select {'i': 3, 's': 'str'} as obj
select
-- Name based access
obj.i, obj.s,
-- Lookup by name
obj['i'], obj['s']
Note that key names in a struct do not require quotes for plain key names:
select {k1:1, k2:'v1'} as obj
select obj.k1, obj.k2
Lambda Expression
To manipulate array values, you can use lambda expressions:
select list_transform([4, 5, 6], x -> x + 1)
This query applies the lambda function x -> x + 1
to each element of the input array:
┌────────────┐
│ arr │
│ array(int) │
├────────────┤
│ [5, 6, 7] │
├────────────┤
│ 1 rows │
└────────────┘
To pass multiple arguments to the lambda function, use the following syntax:
select list_reduce([4, 5, 6], (a, b) -> a + b) as sum;
┌─────┐
│ sum │
│ int │
├─────┤
│ 15 │
├─────┤
│ 1 … │
└─────┘
Lambda functions can be nested as follows:
select
list_transform(
[1, 2, 3],
x -> list_reduce([4, 5, 6], (a, b) -> a + b) + x
) as arr;
┌──────────────┐
│ arr │
│ array(int) │
├──────────────┤
│ [16, 17, 18] │
├──────────────┤
│ 1 rows │
└──────────────┘
Examples
show tables
To find a target table to query in the current database (schema), you can use the show tables
query:
show tables
The output of show tables
can be followed by the where
clause to filter the table names:
show tables
where name like 'n%';
┌────────┐
│ name │
│ string │
├────────┤
│ nation │
├────────┤
│ 1 rows │
└────────┘
show schemas
To find a target database (schema) to query in the current catalog, you can use the show schemas
query:
show schemas;
┌─────────┬────────────────────┐
│ catalog │ name │
│ string │ string │
├─────────┼────────────────────┤
│ memory │ information_schema │
│ system │ information_schema │
│ temp │ information_schema │
│ memory │ main │
│ system │ main │
│ temp │ main │
│ memory │ pg_catalog │
│ system │ pg_catalog │
│ temp │ pg_catalog │
├─────────┴────────────────────┤
│ 9 rows │
└──────────────────────────────┘
These schema names can be used to specify the target schema in the show tables
command:
show tables in main
where name like 'p%';
┌──────────┐
│ name │
│ string │
├──────────┤
│ part │
│ partsupp │
├──────────┤
│ 2 rows │
└──────────┘
show catalogs
List all available catalogs:
show catalogs;
┌────────┐
│ name │
│ string │
├────────┤
│ memory │
│ system │
│ temp │
├────────┤
│ 3 rows │
└────────┘
from
Read rows from a given table:
from nation
In DuckDB backend, you can read data from local Parquet or JSON files:
from 'sample.parquet'
Or from files on the Web:
from 'https://(path to your data)/sample.parquet'
Reading data from an URL will also work for S3 Presigned URLs.
Raw SQL statements
In case you need to use a raw SQL statement, you can use the sql
string interpolation:
from sql"select * from nation"
Although raw SQL statements are convenient, Wvlet needs to issue a real query to the target DBMS to inspect the data types of the query result, which might slowdown the query or cause errors in the subsequent relational operators.
where
You can apply a filter condition to the input rows:
from nation
where n_regionkey = 1
Unlike SQL, applying multiple where
clauses is allowed in Wvlet:
from nation
where n_regionkey = 1
where n_name like 'U%'
This is useful even when you need to generate Wvlet queries programatically as you only need to append new condition lines to the query. In SQL, you need to parse the text inside WHERE clause and concatenate condition with AND expression.
count
Wvlet has a shorthand notation for counting the number of input rows:
from nation
count
This is equilvalen to:
from nation
select _.count
Or more SQL-like expression can be used too:
from nation
select count(*)
For clarity, you can use pipe |
before the count operator:
from nation
select n_name, n_regionkey,
-- pipe (|) is required after trailing comma
| count
add
Add a new column to the input. This operator is helpful to reduce the typing effort as you don't need to enumerate all the columns in the select
clause:
from nation
select n_name
add n_name.substring(1, 3) as prefix
limit 5;
┌───────────┬────────┐
│ n_name │ prefix │
│ string │ string │
├───────────┼────────┤
│ ALGERIA │ ALG │
│ ARGENTINA │ ARG │
│ BRAZIL │ BRA │
│ CANADA │ CAN │
│ EGYPT │ EGY │
├───────────┴────────┤
│ 5 rows │
└────────────────────┘
select
Output only the selected columns from the input:
from nation
select n_name, n_regionkey
As in SQL, arbitrary expressions can be used in select statements. Note that, however, in Wvlet, you usually don't need to use select statement
as the other column-at-a-time operators, like add
, exclude
, shift
, etc. can be used for manipulating column values.
rename
The rename
operator changes the column names in the input rows. This is useful when you want to rename columns in the output.
from nation
rename n_name as nation_name
exclude
The exclude
operator removes specified columns from the input rows. This is useful when you want to drop certain columns from the output.
from nation
exclude n_regionkey
shift
The shift
operator changes the position of specified columns in the input rows. You can move columns to the left or right.
Shift to the left (default):
from nation
shift n_name
Shift to the right:
from nation
shift to right n_comment
group by
Create a list of grouped rows by the given column:
from nation
group by n_regionkey
In SQL, you need to specify some aggregation operators like count
, sum
, avg
, etc. to generate the output.
In Wvlet, the default aggegation operator arbitrary (any)
is used to generate the output from group by
operator. If you want to
use other aggregation operators, use the agg
operator.
agg
Add an aggregation expression to the grouped rows:
from nation
group by n_regionkey
agg _.count as count
_
denotes a list of rows in a group and can follow SQL aggregation expressions with dot notation.
For example, _.sum
, _.avg
, _.max
, _.min
, _.max_by(sort_col)
, _.min_by(sort_col)
, etc. can be used in agg statement.
limit
Limit the number of output rows:
from nation
limit 10
order by
Sort the output rows by the given column:
from nation
order by n_name
limit 5
The order by
statement can follow asc
, desc
ordering specifier as in SQL.
concat
The concat operator concatenates rows from multiple subqueries. This is similar to the UNION ALL operator in SQL.
from nation
where n_regionkey = 0
-- Append rows from another query
concat {
from nation
where n_regionkey = 1
}
The ordering of rows are not guaranteed in the concat
operator. If you need to sort the output, use the order by
operator after the concat
operator.
dedup
The dedup
operator removes duplicated rows from the input rows. This is equivalent to select distinct *
in SQL.
from nation
dedup
intersect
The intersect
operator returns the intersection of the input rows from multiple subqueries. By default, set semantics are used, but you can use bag semantics by specifying all
.
from nation
intersect all {
from nation
where n_regionkey = 1
}
except
The except
operator returns the difference of the input rows from multiple subqueries. By default, set semantics are used, but you can use bag semantics by specifying all
.
from nation
except {
from nation
where n_regionkey = 1
}
pivot
The pivot
operator expands the column values as horizontal columns. In pivot
, you need to specify grouping columns (group by
) and aggregation expressions (agg
) in the subsequent operators.
Example:
from orders
pivot on o_orderpriority
group by o_orderstatus
agg _.count;
┌───────────────┬──────────┬────────┬──────────┬─────────────────┬───────┐
│ o_orderstatus │ 1-URGENT │ 2-HIGH │ 3-MEDIUM │ 4-NOT SPECIFIED │ 5-LOW │
│ string │ long │ long │ long │ long │ long │
├───────────────┼──────────┼────────┼──────────┼─────────────────┼───────┤
│ F │ 1468 │ 1483 │ 1445 │ 1465 │ 1443 │
│ O │ 1488 │ 1506 │ 1421 │ 1482 │ 1436 │
│ P │ 64 │ 76 │ 75 │ 77 │ 71 │
├───────────────┴──────────┴────────┴──────────┴─────────────────┴───────┤
│ 3 rows │
└────────────────────────────────────────────────────────────────────────┘
To specify the column values to expand, use in
clause:
from orders
pivot on o_orderpriority in ('1-URGENT', '2-HIGH')
group by o_orderstatus
agg _.count;
┌───────────────┬──────────┬────────┐
│ o_orderstatus │ 1-URGENT │ 2-HIGH │
│ string │ long │ long │
├───────────────┼──────────┼────────┤
│ F │ 1468 │ 1483 │
│ O │ 1488 │ 1506 │
│ P │ 64 │ 76 │
├───────────────┴──────────┴────────┤
│ 3 rows │
└───────────────────────────────────┘
with
To define a local subquery definition, use the with
expression:
with t1 as {
from nation
where n_regionkey = 1
}
from t1
You can define multiple subqueries:
with t1 as {
from nation
where n_regionkey = 1
}
with t2 as {
from t1
limit 10
}
from t2