FITS to Zarr Conversion¶

The OVRO-LWA Portal provides tools for converting FITS image files to cloud-optimized Zarr format.

Why Zarr?¶

Zarr offers several advantages over FITS for large radio astronomy datasets:

Cloud-optimized: Efficient access to data stored in cloud object stores
Chunked storage: Read only the data you need
Parallel I/O: Multiple processes can read simultaneously
Compression: Reduce storage requirements
Incremental updates: Append new observations to existing stores

Command-Line Interface¶

Basic Conversion¶

Convert a directory of FITS files to Zarr:

ovro-ingest convert /path/to/fits /path/to/output

This will:

Scan the input directory for FITS files
Convert each file to Zarr format
Create a single consolidated Zarr store
Display progress with a rich progress bar

Advanced Options¶

ovro-ingest convert /path/to/fits /path/to/output \
    --zarr-name custom_name.zarr \
    --chunk-lm 2048 \
    --rebuild

Options¶

--zarr-name: Name of the output Zarr store (default: derived from input path)
--chunk-lm: Chunk size for the l and m dimensions (default: 1024)
--rebuild: Remove existing Zarr store and rebuild from scratch

Get Help¶

ovro-ingest convert --help

Python API¶

For more control, use the Python API directly:

Basic Usage¶

from pathlib import Path
from ovro_lwa_portal.ingest import FITSToZarrConverter
from ovro_lwa_portal.ingest.core import ConversionConfig

# Configure conversion
config = ConversionConfig(
    input_dir=Path("/path/to/fits"),
    output_dir=Path("/path/to/output"),
    zarr_name="ovro_lwa_data.zarr",
)

# Execute conversion
converter = FITSToZarrConverter(config)
result = converter.convert()
print(f"Created: {result}")

Configuration Options¶

config = ConversionConfig(
    input_dir=Path("/path/to/fits"),
    output_dir=Path("/path/to/output"),
    zarr_name="ovro_lwa_data.zarr",
    chunk_lm=2048,              # Chunk size for l/m dimensions
    rebuild=False,              # Whether to rebuild existing store
    compressor=None,            # Custom compression (optional)
)

Incremental Processing¶

Append new observations to an existing Zarr store:

# First conversion
config1 = ConversionConfig(
    input_dir=Path("/path/to/fits/batch1"),
    output_dir=Path("/path/to/output"),
    zarr_name="observations.zarr",
)
converter1 = FITSToZarrConverter(config1)
converter1.convert()

# Append more data
config2 = ConversionConfig(
    input_dir=Path("/path/to/fits/batch2"),
    output_dir=Path("/path/to/output"),
    zarr_name="observations.zarr",  # Same name
    rebuild=False,                   # Don't rebuild
)
converter2 = FITSToZarrConverter(config2)
converter2.convert()

Concurrent Write Protection¶

The converter uses file locking to prevent data corruption when multiple processes write to the same Zarr store:

from pathlib import Path
from concurrent.futures import ProcessPoolExecutor
from ovro_lwa_portal.ingest import FITSToZarrConverter
from ovro_lwa_portal.ingest.core import ConversionConfig

def convert_batch(batch_dir):
    config = ConversionConfig(
        input_dir=batch_dir,
        output_dir=Path("/path/to/output"),
        zarr_name="shared_observations.zarr",
    )
    converter = FITSToZarrConverter(config)
    return converter.convert()

# Safe to run in parallel
with ProcessPoolExecutor(max_workers=4) as executor:
    batches = [Path(f"/path/to/batch{i}") for i in range(4)]
    results = executor.map(convert_batch, batches)

Preserving WCS Coordinates¶

The converter automatically preserves World Coordinate System (WCS) information:

Right Ascension (RA) and Declination (Dec)
Frequency and time coordinates
Observation metadata

After conversion, you can work with celestial coordinates directly:

import ovro_lwa_portal

ds = ovro_lwa_portal.open_dataset("observations.zarr")

# Access WCS coordinates
ra = ds.coords['ra']
dec = ds.coords['dec']

# Select by celestial coordinates
region = ds.sel(ra=slice(10, 20), dec=slice(-5, 5))

Best Practices¶

Chunk Size: Choose chunk sizes that match your access patterns
For time series analysis: larger time chunks
For spatial analysis: larger l/m chunks
Compression: Use compression for reduced storage (enabled by default)
Incremental Updates: Use rebuild=False to append new data
Parallel Processing: The converter is safe for concurrent writes
Cloud Storage: Convert data once, then access efficiently from anywhere