wf_psf.data.data_adapter
Data Adapter.
This module manages dataset lifecycle transitions for the WF-PSF pipeline.
Overview
Two orthogonal state machines are maintained:
1. Structure state
COMPLETE → SPLIT via split_data()
SPLIT → COMPLETE via join_data()
2. Representation state
NUMPY → TENSORFLOW via convert_to_tensorflow()
Glossary
- COMPLETE
Dataset stored as a single container.
- SPLIT
Dataset stored as train/test subsets.
- NUMPY
Data stored as NumPy arrays.
- TENSORFLOW
Data stored as TensorFlow tensors.
Design principles
Structure and representation are orthogonal.
All transitions are explicit and idempotent where possible.
No training or model logic lives in this module.
Dataset field names are canonicalized for downstream models.
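The two orthogonal state machines can be pictured with a small standalone sketch. The names StructureState, RepresentationState and AdapterState are hypothetical, chosen for illustration only, and are not taken from the wf_psf source. The sketch shows only that a transition on one axis leaves the other axis untouched.

```python
from dataclasses import dataclass
from enum import Enum, auto


class StructureState(Enum):  # hypothetical name
    COMPLETE = auto()
    SPLIT = auto()


class RepresentationState(Enum):  # hypothetical name
    NUMPY = auto()
    TENSORFLOW = auto()


@dataclass
class AdapterState:
    """Hypothetical holder for the two independent states."""

    structure: StructureState = StructureState.COMPLETE
    representation: RepresentationState = RepresentationState.NUMPY

    def split(self):
        # Structure transition only; representation is not touched.
        self.structure = StructureState.SPLIT

    def join(self):
        self.structure = StructureState.COMPLETE

    def to_tensorflow(self):
        # Representation transition only; structure is not touched.
        self.representation = RepresentationState.TENSORFLOW


state = AdapterState()
state.split()          # COMPLETE -> SPLIT
state.to_tensorflow()  # NUMPY -> TENSORFLOW
state.join()           # SPLIT -> COMPLETE
```

After these three calls the structure is back to COMPLETE while the representation remains TENSORFLOW, which is the sense in which the two axes are orthogonal.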
Notes
The DataAdapter class manages these transitions while providing
a consistent interface for accessing dataset contents.
Authors: Jennifer Pollack <jennifer.pollack@cea.fr>
Classes
- DataAdapter: Adapter for managing dataset structure and backend representation.
- LoadedDataset: Structured container for loaded dataset.
- Representation state of the dataset.
- Structural state of the dataset.
- class wf_psf.data.data_adapter.DataAdapter(dataset: LoadedDataset, converter: TensorFlowDatasetConverter, params: Any | None = None, metadata: dict | None = None)[source]
Bases: object
Adapter for managing dataset structure and backend representation.
The adapter provides a consistent interface to datasets regardless of whether they are stored as a complete dataset or as train/test splits, and whether the underlying representation is NumPy or TensorFlow.
It also canonicalizes dataset fields to the names expected by downstream models.
Notes
Instances should be created via DataAdapterFactory.build().
- Attributes:
complete_data – Return the complete dataset in the current representation.
masks – Get masks for the complete dataset.
metadata – Get dataset metadata.
params – Get dataset params.
positions – Get positions for the complete dataset.
representation_state – Return the current representation state of the dataset.
sources – Get sources for the complete dataset.
structure_state – Return the current structural state of the dataset.
test_data – Return the test set in the current representation.
train_data – Return the training set in the current representation.
zernike_prior – Get Zernike prior for the complete dataset.
Methods
convert_to_tensorflow(simPSF, n_bins_lambda, ...) – Convert dataset containers from NumPy to TensorFlow representation.
join_data([keys]) – Join train and test splits into a single complete dataset.
Release NumPy datasets.
Release TensorFlow datasets.
split_data([ratio, seed]) – Split the complete dataset into train and test sets if not already split.
- property complete_data
Return the complete dataset in the current representation.
- convert_to_tensorflow(simPSF, n_bins_lambda, mode)[source]
Convert dataset containers from NumPy to TensorFlow representation.
Applies the configured converter to transform dataset fields associated with canonical keys into TensorFlow-compatible formats.
Conversion is performed on the active structure:
SPLIT: converts train and test datasets separately
COMPLETE: converts the full dataset
- Parameters:
simPSF (PSFSimulator) – Simulator instance passed to the converter.
n_bins_lambda (int) – Number of wavelength bins used during conversion.
mode (DatasetMode) – Dataset operation mode used to select the appropriate dataset schema for a given pipeline process (e.g. training, validation, inference).
- Raises:
RuntimeError – If no converter is configured.
Notes
Conversion is idempotent: if the data is already in TensorFlow
representation, this method does nothing.
Converted datasets are stored in internal attributes (_train_tf, _test_tf, _complete_tf) and do not overwrite the original NumPy data.
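The idempotence and non-overwriting behaviour described in these notes can be sketched as follows. This is not the wf_psf implementation: SketchAdapter, its convert() method and fake_converter are hypothetical stand-ins, and a plain dtype cast replaces the real TensorFlow conversion so that the example runs without TensorFlow installed.

```python
import numpy as np


class SketchAdapter:
    """Hypothetical stand-in for the adapter; not the wf_psf class."""

    def __init__(self, complete, converter=None):
        self._complete = complete    # original NumPy data, never overwritten
        self._converter = converter  # callable: dict -> converted dict
        self._complete_tf = None     # filled on the first conversion

    def convert(self):
        if self._converter is None:
            raise RuntimeError("No converter configured.")
        if self._complete_tf is not None:
            return  # already converted: do nothing
        self._complete_tf = self._converter(self._complete)


calls = []  # records how many times the converter actually ran


def fake_converter(data):
    # Stand-in for a real NumPy-to-TensorFlow conversion (assumption).
    calls.append(1)
    return {key: value.astype(np.float32) for key, value in data.items()}


data = {"positions": np.zeros((4, 2))}
adapter = SketchAdapter(data, fake_converter)
adapter.convert()
adapter.convert()  # second call is a no-op
```

The converted copy lives in a separate attribute, so the original arrays remain available in their NumPy form after conversion.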
- join_data(keys: list[str] | None = None)[source]
Join train and test splits into a single complete dataset.
Concatenates corresponding arrays from the train and test containers along the first axis (sample dimension) for the specified keys.
- Parameters:
keys (list of str, optional) – Dataset fields to join. If None, uses the canonical dataset keys.
- Raises:
RuntimeError – If the adapter is not in SPLIT state or if train/test data is missing.
Notes
Only keys present in both train and test datasets are joined.
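A minimal sketch of the join described above, assuming the train and test containers are plain dictionaries of NumPy arrays. The function name join_splits is hypothetical, and because the canonical key list is not given here, the sketch falls back to the keys of the train container when keys is None.

```python
import numpy as np


def join_splits(train, test, keys=None):
    """Concatenate matching arrays along the sample axis (hypothetical helper)."""
    if train is None or test is None:
        raise RuntimeError("Train/test data is missing.")
    if keys is None:
        keys = list(train)  # assumption: stand-in for the canonical keys
    # Only keys present in both containers are joined.
    return {
        key: np.concatenate([train[key], test[key]], axis=0)
        for key in keys
        if key in train and key in test
    }


train = {"positions": np.zeros((8, 2)), "extra": np.ones(8)}
test = {"positions": np.ones((2, 2))}
joined = join_splits(train, test)
```

Here "extra" exists only in the train container, so it is skipped, while "positions" is stacked into a single array with 10 rows.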
- property masks
Get masks for the complete dataset.
- property positions
Get positions for the complete dataset.
- property representation_state
Return the current representation state of the dataset.
- property sources
Get sources for the complete dataset.
- split_data(ratio: float | None = None, seed: int | None = None)[source]
Split the complete dataset into train and test sets if not already split.
- Parameters:
- Raises:
RuntimeError – If the dataset is not in COMPLETE state when attempting to split.
Notes
Splitting is idempotent: if the dataset is already in SPLIT state, this method does not modify the data or re-split the dataset.
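One way a seeded train/test split could work is sketched below. The function split_dataset is hypothetical, and the sketch assumes that ratio is the fraction of samples assigned to the training set and that the dataset is a dictionary of NumPy arrays sharing the same first dimension. Neither assumption is confirmed by this page.

```python
import numpy as np


def split_dataset(complete, ratio=0.8, seed=None):
    """Shuffle sample indices once and slice every field consistently."""
    n_samples = len(next(iter(complete.values())))
    order = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(round(ratio * n_samples))  # assumed meaning of ratio
    train_idx, test_idx = order[:n_train], order[n_train:]
    train = {key: value[train_idx] for key, value in complete.items()}
    test = {key: value[test_idx] for key, value in complete.items()}
    return train, test


complete = {
    "positions": np.arange(20).reshape(10, 2),
    "sources": np.arange(10),
}
train, test = split_dataset(complete, ratio=0.8, seed=0)
train_again, _ = split_dataset(complete, ratio=0.8, seed=0)
```

Using one shared permutation keeps the fields aligned sample by sample, and fixing the seed makes the split reproducible across runs.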
- property structure_state
Return the current structural state of the dataset.
- property test_data
Return the test set in the current representation.
- property train_data
Return the training set in the current representation.
- property zernike_prior
Get Zernike prior for the complete dataset.
- class wf_psf.data.data_adapter.LoadedDataset(complete: dict | None = None, train: dict | None = None, test: dict | None = None)[source]
Bases: object
Structured container for loaded dataset.
Methods
Check if the dataset is in COMPLETE state.
is_split()Check if the dataset is in SPLIT state.
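A container of this shape can be sketched with a dataclass. LoadedDatasetSketch is a hypothetical name, and the rule used by is_split() here (both train and test are present) is an assumption, since the actual check is not described on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoadedDatasetSketch:
    """Hypothetical mirror of the constructor signature shown above."""

    complete: Optional[dict] = None
    train: Optional[dict] = None
    test: Optional[dict] = None

    def is_split(self):
        # Assumed rule: SPLIT means both subsets are present.
        return self.train is not None and self.test is not None


whole = LoadedDatasetSketch(complete={"positions": [[0.0, 0.0]]})
parts = LoadedDatasetSketch(train={"positions": []}, test={"positions": []})
```

In this sketch whole.is_split() evaluates to False and parts.is_split() to True.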