Catalog Management

Wvlet's Static Catalog feature lets you export database metadata (schemas, tables, and functions) to local JSON files and compile queries offline against that snapshot. This is particularly useful for:

  • CI/CD pipelines: Compile and validate queries without database access
  • Offline development: Work on queries without network connectivity
  • Performance: Avoid repeated catalog API calls during compilation
  • Stability: Use a consistent catalog snapshot for testing

Quick Start

1. Import Catalog from Database

Export your database catalog to local JSON files:

# Import from DuckDB (default)
wvlet catalog import --name mydb

# Import from Trino
wvlet catalog import --type trino --name prod_catalog --profile mytrino

# Import specific schema only
wvlet catalog import --name mydb --schema sales

# Custom catalog directory
wvlet catalog import --path ./my-catalogs --name mydb
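
If you maintain several environments, the import step can be scripted. A minimal sketch, assuming Trino profiles named dev, staging, and production exist in your Wvlet configuration (the profile and catalog names here are placeholders):

#!/bin/bash
# import-all.sh - snapshot every environment in one pass
# (profile names are placeholders; substitute your own)
for env in dev staging production; do
  wvlet catalog import --type trino --name "${env}_db" --profile "$env"
done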

2. List Available Catalogs

View all imported catalogs:

wvlet catalog list

# With custom path
wvlet catalog list --path ./my-catalogs

Output:

Available static catalogs:
duckdb/mydb
trino/prod_catalog

3. Show Catalog Details

Inspect a specific catalog:

wvlet catalog show duckdb/mydb

Output:

Catalog: mydb (DuckDB)
Schemas: 3
  main: 5 tables
  sales: 12 tables
  analytics: 8 tables
Functions: 2650

4. Compile with Static Catalog

Use the imported catalog for offline compilation:

# Compile using static catalog
wvlet compile -f query.wv --use-static-catalog --catalog mydb

# Specify custom catalog path
wvlet compile -f query.wv --use-static-catalog --static-catalog-path ./my-catalogs --catalog mydb

Catalog Structure

Static catalogs are stored as JSON files in the following structure:

./catalog/                        # Default catalog directory
├── duckdb/                       # Database type
│   └── mydb/                     # Catalog name
│       ├── schemas.json          # List of schemas
│       ├── main.json             # Tables in 'main' schema
│       ├── sales.json            # Tables in 'sales' schema
│       └── functions.json        # SQL functions
└── trino/
    └── prod_catalog/
        └── ...
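
Because the snapshot is plain JSON, standard tools are enough to inspect it. A quick sketch (assuming python3 is available; it only pretty-prints the files and makes no assumption about their internal layout):

# Pretty-print the schema list for a quick sanity check
python3 -m json.tool catalog/duckdb/mydb/schemas.json

# Peek at the table metadata recorded for the 'sales' schema
python3 -m json.tool catalog/duckdb/mydb/sales.json | head -40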

Common Workflows

Development Workflow

  1. Initial Setup: Import catalog from development database

    wvlet catalog import --name dev_db
  2. Write Queries: Develop queries with auto-completion and validation

    -- queries/customer_analysis.wv
    from sales.customers c
    join sales.orders o on c.customer_id = o.customer_id
    where o.created_at > date '2024-01-01'
    select c.name, count(*) as order_count
    group by c.name
  3. Compile Offline: Validate queries without a database connection (a batch version follows this list)

    wvlet compile -f queries/customer_analysis.wv --use-static-catalog --catalog dev_db
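
To validate every query in one pass during development, the compile step can be looped over the queries directory. A minimal sketch using the layout above:

#!/bin/bash
# validate-all.sh - compile each query against the static catalog
for f in queries/*.wv; do
  echo "Compiling $f"
  wvlet compile -f "$f" --use-static-catalog --catalog dev_db
done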

CI/CD Workflow

  1. Store Catalog in Version Control: Commit catalog files

    git add catalog/
    git commit -m "Update database catalog snapshot"
  2. CI Pipeline: Compile and validate all queries (a stricter variant follows this list)

    # .github/workflows/validate-queries.yml
    - name: Validate Queries
      run: |
        find queries -name "*.wv" -exec \
          wvlet compile -f {} --use-static-catalog --catalog prod_db \;
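
Note that find -exec ... \; does not propagate compile failures, so a broken query would not fail the build. A stricter sketch for the CI step:

#!/bin/bash
set -euo pipefail  # abort on the first failure

# Loop explicitly so a non-zero exit from wvlet fails the job
find queries -name "*.wv" -print0 |
while IFS= read -r -d '' f; do
  wvlet compile -f "$f" --use-static-catalog --catalog prod_db
done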

Team Collaboration

  1. Shared Catalog: One team member exports the catalog

    wvlet catalog import --name shared_db --path ./shared-catalog
  2. Distribution: Share via git, S3, or other storage

    aws s3 sync ./shared-catalog s3://team-bucket/catalogs/
  3. Team Usage: Others download and use the catalog

    aws s3 sync s3://team-bucket/catalogs/ ./catalog/
    wvlet compile -f query.wv --use-static-catalog --catalog shared_db
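
The download side can be wrapped in a short setup script for new team members. A sketch, assuming the bucket from the steps above and an installed AWS CLI:

#!/bin/bash
# fetch-catalog.sh - pull the shared snapshot and confirm it is visible
aws s3 sync s3://team-bucket/catalogs/ ./catalog/
wvlet catalog list   # shared_db should appear in the output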

Keeping Catalogs Updated

Manual Refresh

Refresh the catalog when the database schema changes:

wvlet catalog refresh --name mydb

This is equivalent to re-importing:

wvlet catalog import --name mydb
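
If the catalog directory is under version control, a refresh followed by a diff shows exactly what changed upstream:

wvlet catalog refresh --name mydb
git diff --stat catalog/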

Automated Updates

Set up a scheduled job to keep catalogs current:

#!/bin/bash
# update-catalog.sh
wvlet catalog import --name prod_db --profile production
git add catalog/
git commit -m "Auto-update catalog $(date +%Y-%m-%d)"
git push
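
To schedule the script, a weekly cron entry is enough (the script path is a placeholder):

# Register a weekly refresh, Mondays at 06:00
( crontab -l 2>/dev/null; echo "0 6 * * 1 /path/to/update-catalog.sh" ) | crontab -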

Best Practices

  1. Version Control: Store catalog files in git for tracking changes

    catalog/
    ├── .gitignore    # Exclude large/temporary files
    └── README.md     # Document catalog sources and update procedures
  2. Naming Conventions: Use descriptive catalog names

    • Environment: dev_db, staging_db, prod_db
    • Purpose: analytics_db, reporting_db
    • Version: sales_db_v2, catalog_2024_06
  3. Security: Don't store sensitive metadata

    • Review exported schemas before committing
    • Exclude internal/system schemas if needed
    • Use .gitignore for sensitive catalogs
  4. Performance: For large catalogs (see the sketch after this list)

    • Import only required schemas
    • Consider splitting by domain/team
    • Use selective imports with --schema flag
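
For example, a large warehouse can be snapshotted one schema at a time. This sketch assumes that repeated imports with --schema accumulate into the same catalog directory, which is worth verifying for your setup:

# Import only the schemas a team actually queries
for s in sales analytics; do
  wvlet catalog import --name mydb --schema "$s"
done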

Troubleshooting

Common Issues

  1. "Catalog not found" error

    • Check catalog path: wvlet catalog list --path ./catalog
    • Verify catalog name matches exactly
    • Ensure JSON files exist in catalog directory
  2. "Table not found" during compilation

    • Refresh catalog if schema changed: wvlet catalog refresh --name mydb
    • Check if table exists in correct schema
    • Verify catalog was imported from correct database
  3. Large catalog files

    • Import specific schemas: --schema sales
    • Exclude system schemas
    • Consider splitting into multiple catalogs
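
To see which snapshots take up the most space, standard shell tools suffice:

# Per-catalog disk usage, largest last
du -sh catalog/*/*/ | sort -h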

Debug Commands

# Check what's in the catalog
wvlet catalog show duckdb/mydb

# Verify catalog file structure
ls -la catalog/duckdb/mydb/

# Test compilation with verbose logging
wvlet compile -f query.wv --use-static-catalog --catalog mydb --debug

Configuration

Environment Variables

# Default catalog directory
export WVLET_STATIC_CATALOG_PATH=/path/to/catalogs

# Default database type
export WVLET_DB_TYPE=trino
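
With the path variable exported, the --static-catalog-path flag can be omitted on each invocation (assuming the variable is read as the default, as described above):

export WVLET_STATIC_CATALOG_PATH=./my-catalogs
wvlet compile -f query.wv --use-static-catalog --catalog mydb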

Configuration File

# .wvlet/config.yml
static_catalog:
path: ./catalog
default_catalog: prod_db
auto_refresh: false

See Also