# Catalog Management
Wvlet's Static Catalog feature allows you to export database metadata (schemas, tables, functions) and use it for offline query compilation. This is particularly useful for:

- **CI/CD pipelines**: Compile and validate queries without database access
- **Offline development**: Work on queries without network connectivity
- **Performance**: Avoid repeated catalog API calls during compilation
- **Stability**: Use a consistent catalog snapshot for testing
## Quick Start
### 1. Import Catalog from Database
Export your database catalog to local JSON files:
```bash
# Import from DuckDB (default)
wvlet catalog import --name mydb

# Import from Trino
wvlet catalog import --type trino --name prod_catalog --profile mytrino

# Import a specific schema only
wvlet catalog import --name mydb --schema sales

# Use a custom catalog directory
wvlet catalog import --path ./my-catalogs --name mydb
```
### 2. List Available Catalogs
View all imported catalogs:
```bash
wvlet catalog list

# With a custom path
wvlet catalog list --path ./my-catalogs
```
Output:

```
Available static catalogs:
  duckdb/mydb
  trino/prod_catalog
```
### 3. Show Catalog Details
Inspect a specific catalog:
```bash
wvlet catalog show duckdb/mydb
```
Output:

```
Catalog: mydb (DuckDB)
Schemas: 3
  main: 5 tables
  sales: 12 tables
  analytics: 8 tables
Functions: 2650
```
### 4. Compile with Static Catalog
Use the imported catalog for offline compilation:
```bash
# Compile using the static catalog
wvlet compile -f query.wv --use-static-catalog --catalog mydb

# Specify a custom catalog path
wvlet compile -f query.wv --use-static-catalog --static-catalog-path ./my-catalogs --catalog mydb
```
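In scripts, you can gate later steps on the compiler's exit status. A minimal sketch, assuming `wvlet compile` returns a non-zero exit code when compilation fails:

```bash
# Gate on the compile result (assumes a non-zero exit code on compile errors)
if wvlet compile -f query.wv --use-static-catalog --catalog mydb; then
  echo "query.wv compiled against the static catalog"
else
  echo "compilation failed" >&2
  exit 1
fi
```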
## Catalog Structure
Static catalogs are stored as JSON files in the following structure:
```
./catalog/                     # Default catalog directory
├── duckdb/                    # Database type
│   └── mydb/                  # Catalog name
│       ├── schemas.json       # List of schemas
│       ├── main.json          # Tables in 'main' schema
│       ├── sales.json         # Tables in 'sales' schema
│       └── functions.json     # SQL functions
└── trino/
    └── prod_catalog/
        └── ...
```
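Since the snapshot is plain JSON on disk, standard tools can inspect it. A quick sketch using `find` and `jq`; the exact JSON layout is an implementation detail, so treat the output as informational:

```bash
# List every file in the snapshot
find ./catalog -name '*.json'

# Pretty-print the beginning of the schema list
# (the JSON layout may vary between wvlet versions)
jq . ./catalog/duckdb/mydb/schemas.json | head -n 20
```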
## Common Workflows
### Development Workflow
1. **Initial Setup**: Import the catalog from the development database

   ```bash
   wvlet catalog import --name dev_db
   ```

2. **Write Queries**: Develop queries with auto-completion and validation

   ```sql
   -- queries/customer_analysis.wv
   from sales.customers c
   join sales.orders o on c.customer_id = o.customer_id
   where o.created_at > date '2024-01-01'
   select c.name, count(*) as order_count
   group by c.name
   ```

3. **Compile Offline**: Validate queries without a database connection

   ```bash
   wvlet compile -f queries/customer_analysis.wv --use-static-catalog --catalog dev_db
   ```
### CI/CD Workflow
1. **Store Catalog in Version Control**: Commit catalog files

   ```bash
   git add catalog/
   git commit -m "Update database catalog snapshot"
   ```

2. **CI Pipeline**: Compile and validate all queries (see the exit-status note after this list)

   ```yaml
   # .github/workflows/validate-queries.yml
   - name: Validate Queries
     run: |
       find queries -name "*.wv" -exec \
         wvlet compile -f {} --use-static-catalog --catalog prod_db \;
   ```
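Note that `find -exec ... \;` exits 0 even when the executed command fails, so the step above would pass even if a query does not compile. A sketch of a `run:` body that propagates failures, assuming `wvlet compile` exits non-zero on errors:

```bash
# xargs exits with a non-zero status (123) if any wvlet invocation fails,
# which in turn fails the CI step
find queries -name "*.wv" -print0 | \
  xargs -0 -I{} wvlet compile -f {} --use-static-catalog --catalog prod_db
```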
### Team Collaboration
1. **Shared Catalog**: One team member exports the catalog

   ```bash
   wvlet catalog import --name shared_db --path ./shared-catalog
   ```

2. **Distribution**: Share via git, S3, or other storage

   ```bash
   aws s3 sync ./shared-catalog s3://team-bucket/catalogs/
   ```

3. **Team Usage**: Others download and use the catalog

   ```bash
   aws s3 sync s3://team-bucket/catalogs/ ./catalog/
   wvlet compile -f query.wv --use-static-catalog --catalog shared_db
   ```
## Keeping Catalogs Updated
### Manual Refresh
Refresh the catalog when the schema changes:

```bash
wvlet catalog refresh --name mydb
```

This is equivalent to re-importing:

```bash
wvlet catalog import --name mydb
```
### Automated Updates
Set up a scheduled job to keep catalogs current:
```bash
#!/bin/bash
# update-catalog.sh
wvlet catalog import --name prod_db --profile production
git add catalog/
git commit -m "Auto-update catalog $(date +%Y-%m-%d)"
git push
```
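To run this on a schedule, a standard crontab entry suffices; the script path and log location below are placeholders:

```bash
# Refresh the catalog snapshot every day at 06:00 (paths are placeholders)
0 6 * * * /path/to/repo/update-catalog.sh >> /tmp/update-catalog.log 2>&1
```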
## Best Practices
1. **Version Control**: Store catalog files in git for tracking changes

   ```
   catalog/
   ├── .gitignore     # Exclude large/temporary files
   └── README.md      # Document catalog sources and update procedures
   ```

2. **Naming Conventions**: Use descriptive catalog names
   - Environment: `dev_db`, `staging_db`, `prod_db`
   - Purpose: `analytics_db`, `reporting_db`
   - Version: `sales_db_v2`, `catalog_2024_06`

3. **Security**: Don't store sensitive metadata
   - Review exported schemas before committing
   - Exclude internal/system schemas if needed
   - Use `.gitignore` for sensitive catalogs

4. **Performance**: For large catalogs (see the sketch after this list)
   - Import only required schemas
   - Consider splitting by domain/team
   - Use selective imports with the `--schema` flag
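For the performance item, selective imports can be scripted one schema at a time. A sketch using the documented `--schema` flag; the schema names are examples:

```bash
# Import only the schemas a team actually queries
for schema in sales analytics; do
  wvlet catalog import --name mydb --schema "$schema"
done
```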
## Troubleshooting
### Common Issues
1. **"Catalog not found" error**
   - Check the catalog path: `wvlet catalog list --path ./catalog`
   - Verify that the catalog name matches exactly
   - Ensure the JSON files exist in the catalog directory

2. **"Table not found" during compilation**
   - Refresh the catalog if the schema changed: `wvlet catalog refresh --name mydb`
   - Check that the table exists in the correct schema
   - Verify the catalog was imported from the correct database

3. **Large catalog files**
   - Import specific schemas: `--schema sales`
   - Exclude system schemas
   - Consider splitting into multiple catalogs
### Debug Commands
```bash
# Check what's in the catalog
wvlet catalog show duckdb/mydb

# Verify the catalog file structure
ls -la catalog/duckdb/mydb/

# Test compilation with verbose logging
wvlet compile -f query.wv --use-static-catalog --catalog mydb --debug
```
## Configuration
### Environment Variables
```bash
# Default catalog directory
export WVLET_STATIC_CATALOG_PATH=/path/to/catalogs

# Default database type
export WVLET_DB_TYPE=trino
```
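A minimal usage sketch, assuming `WVLET_STATIC_CATALOG_PATH` stands in for the `--static-catalog-path` flag:

```bash
# With the path exported, --static-catalog-path can be omitted (assumption)
export WVLET_STATIC_CATALOG_PATH=./my-catalogs
wvlet compile -f query.wv --use-static-catalog --catalog mydb
```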
### Configuration File
```yaml
# .wvlet/config.yml
static_catalog:
  path: ./catalog
  default_catalog: prod_db
  auto_refresh: false
```
## See Also
- CLI Reference - Complete CLI command reference
- Static Catalog Architecture - Technical implementation details
- DuckDB Usage - DuckDB-specific features
- Trino Usage - Trino-specific features