wf_psf.data.schemas

Dataset schema definitions for canonical dataset handling.

This module defines dataset validation and conversion schemas used across training, evaluation, and inference workflows. Schemas specify which canonical dataset fields are required, which are optional, and whether missing required fields should raise an exception.

These schemas provide a centralized contract between dataset adapters, preprocessing pipelines, and TensorFlow conversion utilities.

Canonical dataset field names are defined in constants.py and represent the normalized internal dataset interface used throughout the library, independent of external dataset naming conventions.

Author(s): Jennifer Pollack <jennifer.pollack@cea.fr>

Module Attributes

`TRAIN_SCHEMA`	Dataset schema used during model training.
`EVALUATION_SCHEMA`	Dataset schema used during model evaluation.
`INFERENCE_SCHEMA`	Dataset schema used during inference.
`SCHEMAS`	Registry mapping dataset operation modes to dataset schemas.

Classes

`DatasetMode`(value)	Enumeration of supported dataset operation modes.
`DatasetSchema`(id, required_keys, ...)	Definition of a canonical dataset schema.

class wf_psf.data.schemas.DatasetMode(value)

Bases: Enum

Enumeration of supported dataset operation modes.

These modes define the expected dataset contract for different stages of the wf-psf workflow.

TRAIN: Dataset schema used during model training.

EVALUATION: Dataset schema used during evaluation.

INFERENCE: Dataset schema used during inference or prediction.

EVALUATION = 2

INFERENCE = 3

TRAIN = 1
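
As a minimal sketch of how such a mode enumeration behaves, the block below redefines a local stand-in with the member values listed above instead of importing wf_psf. The lookups use only standard `enum.Enum` behaviour; the variable names are illustrative.

```python
from enum import Enum


class DatasetMode(Enum):
    """Local stand-in mirroring the documented member values."""

    TRAIN = 1
    EVALUATION = 2
    INFERENCE = 3


# Look up a member by name, e.g. from a configuration string.
mode_by_name = DatasetMode["TRAIN"]

# Look up a member by its integer value.
mode_by_value = DatasetMode(3)

# Iteration follows definition order, not alphabetical order.
member_names = [mode.name for mode in DatasetMode]
```

Comparing members by identity (`mode is DatasetMode.TRAIN`) avoids depending on the integer values themselves.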

class wf_psf.data.schemas.DatasetSchema(id: str, required_keys: tuple[str, ...], optional_keys: tuple[str, ...], strict: bool = True, handlers: dict[str, Callable[[...], Any]] = <factory>)

Bases: object

Definition of a canonical dataset schema.

A dataset schema specifies which canonical dataset fields are required, which fields are optional, and whether missing required fields should raise an exception during validation or conversion.

Parameters:

id (str) – Schema identifier (e.g. “train”, “evaluation”, “inference”).
required_keys (tuple[str, ...]) – Canonical dataset fields that must be present.
optional_keys (tuple[str, ...]) – Canonical dataset fields that may be present and will be processed if available.
strict (bool, optional) – If True, missing required fields raise an exception. If False, missing required fields generate warnings and are skipped. Default is True.
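handlers (dict[str, Callable[..., Any]], optional) – Mapping from a canonical field name to the handler callable applied to that field (e.g. seds). The default is created by a factory, as shown in the signature, rather than being None.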
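
The effect of `strict` on missing required fields can be sketched as follows. This is not the library's implementation: `DatasetSchema` here is a local stand-in built from the documented signature (assuming a dataclass whose `handlers` default factory is `dict`), and `check_required` is a hypothetical helper. The exception type is also an assumption.

```python
import warnings
from dataclasses import dataclass, field
from typing import Any, Callable, Mapping


@dataclass
class DatasetSchema:
    """Local stand-in mirroring the documented fields."""

    id: str
    required_keys: tuple[str, ...]
    optional_keys: tuple[str, ...]
    strict: bool = True
    # Assumption: the documented <factory> default produces an empty dict.
    handlers: dict[str, Callable[..., Any]] = field(default_factory=dict)


def check_required(schema: DatasetSchema, dataset: Mapping[str, Any]) -> list[str]:
    """Hypothetical helper: return the missing required keys.

    Raises KeyError when the schema is strict, warns otherwise.
    """
    missing = [key for key in schema.required_keys if key not in dataset]
    if not missing:
        return missing
    message = f"Schema '{schema.id}' is missing required fields: {missing}"
    if schema.strict:
        raise KeyError(message)
    warnings.warn(message, UserWarning, stacklevel=2)
    return missing


# A complete dataset passes silently under a strict schema.
train_like = DatasetSchema("train", ("sources", "positions", "seds"), ("masks",))
complete = {"sources": [], "positions": [], "seds": []}
no_missing = check_required(train_like, complete)
```

With `strict=False`, the same helper would emit a warning and return the list of missing keys so the caller can skip them.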

handlers: dict[str, Callable[[...], Any]]

id: str

optional_keys: tuple[str, ...]

required_keys: tuple[str, ...]

strict: bool = True

wf_psf.data.schemas.EVALUATION_SCHEMA = DatasetSchema(id='evaluation', required_keys=('sources', 'positions', 'seds'), optional_keys=('masks', 'zernike_prior'), strict=True, handlers={'seds': <function process_seds_handler>})

Dataset schema used during model evaluation.

The sources, positions, and seds fields are required during evaluation; masks and zernike_prior are optional. Missing required fields raise exceptions.

wf_psf.data.schemas.INFERENCE_SCHEMA = DatasetSchema(id='inference', required_keys=('seds', 'positions'), optional_keys=('masks', 'zernike_prior'), strict=False, handlers={'seds': <function process_seds_handler>})

Dataset schema used during inference.

Inference requires only seds and positions, the minimal subset of canonical fields needed for prediction. Because strict is False, missing required fields generate warnings rather than raising exceptions.

wf_psf.data.schemas.SCHEMAS = {
    DatasetMode.EVALUATION: DatasetSchema(id='evaluation', required_keys=('sources', 'positions', 'seds'), optional_keys=('masks', 'zernike_prior'), strict=True, handlers={'seds': <function process_seds_handler>}),
    DatasetMode.INFERENCE: DatasetSchema(id='inference', required_keys=('seds', 'positions'), optional_keys=('masks', 'zernike_prior'), strict=False, handlers={'seds': <function process_seds_handler>}),
    DatasetMode.TRAIN: DatasetSchema(id='train', required_keys=('sources', 'positions', 'seds'), optional_keys=('masks', 'zernike_prior'), strict=True, handlers={'seds': <function process_seds_handler>}),
}

Registry mapping dataset operation modes to dataset schemas.

This dictionary provides centralized access to workflow-specific dataset contracts used throughout the preprocessing and conversion pipeline.
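
One way a pipeline might consume such a registry is sketched below, under stated assumptions. The enum, the schema class, and the registry are local stand-ins, not imports from wf_psf. `to_float_list` is a hypothetical placeholder for `process_seds_handler`, whose real behaviour is not documented here, and `select_fields` is a hypothetical helper that keeps only the fields named by the schema and applies any registered handler.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Mapping


class DatasetMode(Enum):
    TRAIN = 1
    EVALUATION = 2
    INFERENCE = 3


@dataclass
class DatasetSchema:
    id: str
    required_keys: tuple[str, ...]
    optional_keys: tuple[str, ...]
    strict: bool = True
    handlers: dict[str, Callable[..., Any]] = field(default_factory=dict)


def to_float_list(values):
    """Hypothetical handler standing in for process_seds_handler."""
    return [float(v) for v in values]


# Stand-in registry with the same keys and field lists as documented.
OPTIONAL = ("masks", "zernike_prior")
SCHEMAS = {
    DatasetMode.TRAIN: DatasetSchema(
        "train", ("sources", "positions", "seds"), OPTIONAL,
        handlers={"seds": to_float_list},
    ),
    DatasetMode.INFERENCE: DatasetSchema(
        "inference", ("seds", "positions"), OPTIONAL,
        strict=False, handlers={"seds": to_float_list},
    ),
}


def select_fields(mode: DatasetMode, dataset: Mapping[str, Any]) -> dict[str, Any]:
    """Keep schema fields present in the dataset, applying handlers if any."""
    schema = SCHEMAS[mode]
    selected = {}
    for key in schema.required_keys + schema.optional_keys:
        if key not in dataset:
            continue  # absent fields are handled by validation, not here
        handler = schema.handlers.get(key)
        selected[key] = handler(dataset[key]) if handler else dataset[key]
    return selected


raw = {"seds": [1, 2], "positions": [(0, 0)], "extra": "ignored"}
converted = select_fields(DatasetMode.INFERENCE, raw)
```

Keying the registry by enum member rather than by string keeps mode selection in one place and makes an unsupported mode fail with a `KeyError` at lookup time.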

wf_psf.data.schemas.TRAIN_SCHEMA = DatasetSchema(id='train', required_keys=('sources', 'positions', 'seds'), optional_keys=('masks', 'zernike_prior'), strict=True, handlers={'seds': <function process_seds_handler>})

Dataset schema used during model training.

The sources, positions, and seds fields are required during training; masks and zernike_prior are optional. Missing required fields raise exceptions.