Static Catalog
The Static Catalog feature lets Wvlet compile queries without making remote catalog calls, which removes network round-trips from compilation. It is especially useful for CI/CD pipelines, offline development, and environments where catalog metadata changes infrequently.
Overview
By default, Wvlet's compiler fetches catalog metadata (schemas, tables, columns) from remote databases during compilation. This can cause performance issues when:
- Multiple table references require repeated remote calls
- Network latency is high
- Catalog cache expires (default: 5 minutes)
The Static Catalog feature addresses these issues by loading catalog metadata from local JSON files.
Architecture
File Structure
Static catalogs are organized by database type and catalog name:
<basePath>/
├── duckdb/
│   └── my_catalog/
│       ├── schemas.json      # List of database schemas
│       ├── main.json         # Tables in 'main' schema
│       ├── analytics.json    # Tables in 'analytics' schema
│       └── functions.json    # SQL functions
└── trino/
    └── another_catalog/
        └── ...
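The layout above can be expressed as a small path-resolution helper. This is only a sketch: `StaticCatalogLayout` and its methods are hypothetical names, not part of Wvlet's API, and it assumes the database-type directory is the lower-cased type name, as the tree suggests.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper, not part of Wvlet's API: maps (basePath, dbType, catalog)
// to the files shown in the directory tree.
object StaticCatalogLayout {
  // Assumption: the database-type directory is the lower-cased type name
  // (e.g. "duckdb", "trino").
  def catalogDir(basePath: String, dbType: String, catalog: String): Path =
    Paths.get(basePath, dbType.toLowerCase, catalog)

  // One file listing all schemas of the catalog
  def schemasFile(dir: Path): Path = dir.resolve("schemas.json")

  // One file per schema, named after the schema
  def tablesFile(dir: Path, schema: String): Path = dir.resolve(s"$schema.json")

  // One file listing SQL functions
  def functionsFile(dir: Path): Path = dir.resolve("functions.json")
}
```

With this sketch, `StaticCatalogLayout.catalogDir("/path/to/catalog/base", "DuckDB", "my_catalog")` resolves to the `duckdb/my_catalog` directory under the base path, and `tablesFile(dir, "main")` points at `main.json` inside it.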
JSON Format
schemas.json
[
  {
    "catalog": "my_catalog",
    "name": "main",
    "description": "Main schema",
    "properties": {}
  }
]
<schema_name>.json (e.g., main.json)
[
  {
    "tableName": {
      "catalog": "my_catalog",
      "schema": "main",
      "name": "users"
    },
    "columns": [
      {"name": "id", "dataType": {"typeName": "long"}},
      {"name": "name", "dataType": {"typeName": "string"}},
      {"name": "email", "dataType": {"typeName": "string"}}
    ],
    "description": "User table",
    "properties": {}
  }
]
functions.json
[
  {
    "name": "sum",
    "functionType": "AGGREGATE",
    "args": [{"typeName": "double"}],
    "returnType": {"typeName": "double"}
  }
]
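To make the shape of these files easier to see, the JSON entries can be mirrored by a handful of case classes. The following is an illustrative sketch only: the class names are hypothetical and are not Wvlet's actual catalog types, and typing `properties` as `Map[String, String]` is an assumption, since the examples only show empty objects.

```scala
// Hypothetical data model mirroring the JSON keys; not Wvlet's actual types.
object CatalogModel {
  final case class DataType(typeName: String)
  final case class TableName(catalog: String, schema: String, name: String)
  final case class Column(name: String, dataType: DataType)

  // One entry of schemas.json
  final case class SchemaDef(
      catalog: String,
      name: String,
      description: String = "",
      properties: Map[String, String] = Map.empty // assumption: string values
  )

  // One entry of <schema_name>.json
  final case class TableDef(
      tableName: TableName,
      columns: List[Column],
      description: String = "",
      properties: Map[String, String] = Map.empty // assumption: string values
  )

  // One entry of functions.json
  final case class FunctionDef(
      name: String,
      functionType: String,
      args: List[DataType],
      returnType: DataType
  )

  // The sample 'users' table and 'sum' function expressed with these types
  val users: TableDef = TableDef(
    tableName = TableName("my_catalog", "main", "users"),
    columns = List(
      Column("id", DataType("long")),
      Column("name", DataType("string")),
      Column("email", DataType("string"))
    ),
    description = "User table"
  )

  val sum: FunctionDef =
    FunctionDef("sum", "AGGREGATE", List(DataType("double")), DataType("double"))
}
```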
Usage
CLI Usage
The recommended way to use static catalogs is through the Wvlet CLI:
# Import catalog from database
wv catalog import --name mydb
# List available catalogs
wv catalog list
# Show catalog details
wv catalog show duckdb/mydb
# Compile with static catalog
wvlet compile -f query.wv --use-static-catalog --catalog mydb
See Catalog Management for detailed CLI usage.
Programmatic Usage
import wvlet.lang.compiler.{Compiler, CompilerOptions, DBType, WorkEnv}
import wvlet.log.LogLevel

// Working environment: path and log level
val workEnv = WorkEnv(".", LogLevel.INFO)

val compilerOptions = CompilerOptions(
  sourceFolders = List("."),
  workEnv = workEnv,
  catalog = Some("my_catalog"),
  schema = Some("main"),
  dbType = DBType.DuckDB,
  // Read catalog metadata from local JSON files instead of the remote database
  useStaticCatalog = true,
  // Base directory containing the <dbType>/<catalog>/ directories
  staticCatalogPath = Some("/path/to/catalog/base")
)

val compiler = Compiler(compilerOptions)
val result = compiler.compile()
Configuration Options
- sourceFolders: List of directories containing .wv files
- workEnv: Working environment with path and log level
- catalog: Catalog name to load
- schema: Default schema name
- dbType: Target database type (DuckDB, Trino, etc.)
- useStaticCatalog: Boolean flag to enable static catalog mode
- staticCatalogPath: Base directory containing catalog metadata
Implementation Details
Key Components
- StaticCatalog: Implements the Catalog trait with read-only operations
- StaticCatalogProvider: Loads catalogs from the filesystem with error handling
- CatalogSerializer: Handles JSON serialization/deserialization
- CompilerOptions: Extended with static catalog configuration
Error Handling
- Missing files: Falls back to empty collections (schemas, tables, functions)
- Corrupted JSON: Falls back to InMemoryCatalog
- Write operations: Throw NOT_IMPLEMENTED exceptions
- Missing resources: Throw appropriate NOT_FOUND exceptions
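The behaviors listed above can be sketched in plain Scala. Everything here is hypothetical and simplified: `StaticCatalogErrors`, `loadOrElse`, `ReadOnlyCatalog`, and the two exception classes are stand-ins for illustration, not Wvlet's implementation, and the fallback is applied per file rather than by swapping in an InMemoryCatalog.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.util.Try

object StaticCatalogErrors {
  // Hypothetical stand-ins for the NOT_FOUND and NOT_IMPLEMENTED errors.
  final class NotFound(msg: String) extends RuntimeException(msg)
  final class NotImplemented(msg: String) extends RuntimeException(msg)

  // Missing file -> `empty`; unreadable or unparsable content -> `onCorrupt`.
  // `parse` is supplied by the caller (for example, a JSON decoder).
  def loadOrElse[A](file: Path, empty: A, onCorrupt: A)(parse: String => A): A =
    if (!Files.isRegularFile(file)) empty
    else
      Try(parse(new String(Files.readAllBytes(file), StandardCharsets.UTF_8)))
        .getOrElse(onCorrupt)

  // A read-only view: lookups either succeed or raise NotFound,
  // and every write operation is rejected.
  final class ReadOnlyCatalog(tables: Map[String, List[String]]) {
    def columnsOf(table: String): List[String] =
      tables.getOrElse(table, throw new NotFound(s"Table not found: $table"))

    def createTable(table: String, columns: List[String]): Unit =
      throw new NotImplemented("createTable is not supported by a static catalog")
  }
}
```

Returning an empty value for a missing file keeps compilation going when a schema simply has no metadata on disk, while lookups of unknown tables still fail loudly.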
Performance Characteristics
- Initial load: One-time filesystem read at compiler initialization
- Lookups: O(1) HashMap lookups for schemas and tables
- Memory usage: Proportional to catalog size (all metadata loaded in memory)
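As a rough illustration of the load-once, look-up-many pattern described above, the sketch below builds a hash-based index over table metadata at construction time. `CatalogIndex`, `Table`, and `Index` are hypothetical names, and the use of Scala's immutable `Map` (whose lookups are effectively constant time) is an assumption rather than a description of Wvlet's internals.

```scala
// Hypothetical sketch of an in-memory table index; not Wvlet's implementation.
object CatalogIndex {
  final case class Table(schema: String, name: String, columns: List[String])

  final class Index(tables: Seq[Table]) {
    // Built once when the catalog is loaded; memory grows with the number of tables.
    private val bySchemaAndName: Map[(String, String), Table] =
      tables.map(t => (t.schema, t.name) -> t).toMap

    // Hash-based lookup keyed by (schema, table name)
    def find(schema: String, name: String): Option[Table] =
      bySchemaAndName.get((schema, name))

    def size: Int = bySchemaAndName.size
  }
}
```

Keying on the `(schema, name)` pair lets tables with the same name live in different schemas without colliding.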
Platform Support
- JVM: Full support for static catalog loading from filesystem
- Scala.js: Not supported (no file I/O capabilities)
- Scala Native: Not supported (limited file I/O support)
Limitations
- Read-only: Cannot create or modify schemas/tables
- Manual updates: Catalog files must be updated externally
- No automatic refresh: Changes require compiler restart
- Platform-specific: Only available on JVM platform
Future Enhancements
- Incremental Updates: Support for updating only changed schemas/tables
- Schema Evolution: Handle schema changes gracefully with versioning
- Partial Loading: Load only required schemas for better performance
- Remote Storage: Support for S3, GCS, and other cloud storage backends
- Catalog Diff: Show differences between catalog versions
- Multi-Platform Support: Extend to Scala.js and Scala Native platforms