wf_psf.data.data_utils

Data utilities and lightweight runtime data structures.

Provides lightweight dataset containers, runtime conversion contexts, and helper utilities used throughout the dataset normalization and preprocessing pipeline.

This module includes:

  • Dictionary-like dataset container abstractions

  • Dataset inspection and normalization helpers

  • Runtime conversion context objects used during field-level processing

  • Domain-specific preprocessing contexts (e.g. SED processing)

These utilities support schema-driven dataset conversion workflows used by training, validation, and inference pipelines.

Notes

The conversion context system is intentionally extensible to support additional scientific domains and instrument pipelines beyond the current Euclid-specific workflows. Future extensions may include dedicated contexts for:

  • PSF modeling

  • Instrument calibration

  • Detector noise simulation
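As a sketch of what such an extension could look like, the snippet below adds a hypothetical `NoiseContext` for detector noise simulation alongside the SED context. Every class here is a simplified stand-in written for illustration. `NoiseContext`, its `read_noise_std` field, and `ExtendedConversionContext` are assumptions and are not part of wf_psf. The pattern is what matters: each domain gets its own optional field defaulting to None, so existing call sites and handlers keep working when a new domain is added.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SEDContext:
    """Simplified stand-in for wf_psf's SEDContext."""
    simPSF: Any
    n_bins_lambda: int


@dataclass
class NoiseContext:
    """Hypothetical context for detector noise simulation (not in wf_psf)."""
    read_noise_std: float


@dataclass
class ExtendedConversionContext:
    """Hypothetical extension: one optional field per scientific domain."""
    seds: Optional[SEDContext] = None
    noise: Optional[NoiseContext] = None  # new domain; the None default keeps old call sites valid


# A call site that only configures SEDs is unaffected by the new field.
ctx = ExtendedConversionContext(seds=SEDContext(simPSF=None, n_bins_lambda=8))
# A call site for the new domain leaves the SED context unset.
noisy_ctx = ExtendedConversionContext(noise=NoiseContext(read_noise_std=0.5))
```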

Authors

Jennifer Pollack <jennifer.pollack@cea.fr>

Functions

to_container(obj)

Convert an object to a DatasetContainer.

Classes

ConversionContext([seds])

Global runtime context for dataset conversion operations.

DatasetContainer(data)

Lightweight container for structured dataset data.

SEDContext(simPSF, n_bins_lambda)

Context object containing parameters required for SED processing.

class wf_psf.data.data_utils.ConversionContext(seds: SEDContext | None = None)

Bases: object

Global runtime context for dataset conversion operations.

This object aggregates optional domain-specific contexts required during dataset preprocessing and conversion. It is passed through the conversion pipeline and accessed by field-specific handlers.

Currently, it contains an optional SED context used for spectral energy distribution processing.
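A minimal sketch of how a field-specific handler might consume this context, assuming handlers receive the raw field value together with the ConversionContext. The handler name `convert_sed_field`, its signature, and the stub classes are hypothetical; the actual wf_psf handler interface is not documented here. In this sketch the handler fails loudly when the SED context it needs is missing instead of silently skipping the field.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SEDContext:
    """Simplified stand-in for wf_psf's SEDContext."""
    simPSF: Any
    n_bins_lambda: int


@dataclass
class ConversionContext:
    """Simplified stand-in mirroring the single optional field documented here."""
    seds: Optional[SEDContext] = None


def convert_sed_field(value: Any, ctx: ConversionContext) -> dict:
    """Hypothetical handler for an SED field; requires ctx.seds to be set."""
    if ctx.seds is None:
        raise ValueError("SED field present but ConversionContext.seds is None")
    # A real handler would call into ctx.seds.simPSF here; this sketch only
    # attaches the binning parameter that downstream processing would need.
    return {"sed": value, "n_bins_lambda": ctx.seds.n_bins_lambda}


ctx = ConversionContext(seds=SEDContext(simPSF=None, n_bins_lambda=8))
result = convert_sed_field([1.0, 2.0], ctx)
```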

Attributes:

seds (SEDContext | None) – Optional context for spectral energy distribution processing. Defaults to None.

class wf_psf.data.data_utils.DatasetContainer(data: dict[str, Any])

Bases: MutableMapping

Lightweight container for structured dataset data.

Stores data internally as a dictionary, while providing dictionary-style and attribute-style access for convenience.
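The sketch below shows one way to get both access styles from a dict-backed `collections.abc.MutableMapping` subclass. It is not the wf_psf source: copying the input in `__init__` and returning a shallow copy from `to_dict()` are assumptions, and plain lists stand in for tensors so the snippet has no third-party dependencies.

```python
from collections.abc import MutableMapping
from typing import Any, Iterator


class DatasetContainer(MutableMapping):
    """Sketch: dict-backed storage with dict-style and attribute-style access."""

    def __init__(self, data: dict) -> None:
        self._data = dict(data)  # assumption: shallow copy so the caller's dict is not aliased

    # The five abstract methods that MutableMapping requires.
    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self._data[key] = value

    def __delitem__(self, key: str) -> None:
        del self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def __getattr__(self, name: str) -> Any:
        # Called only when normal attribute lookup fails, so methods such as
        # keys() always take precedence over same-named data keys. Reading
        # _data through __dict__ avoids infinite recursion if it is unset.
        data = self.__dict__.get("_data", {})
        try:
            return data[name]
        except KeyError:
            raise AttributeError(name) from None

    def to_dict(self) -> dict:
        return dict(self._data)  # assumption: returns a shallow copy


container = DatasetContainer({"x": [1, 2, 3], "y": [4, 5, 6]})
```

Because `__getattr__` runs only after normal lookup fails, a data key that collides with a method name (for example `keys`) remains reachable only through item access, such as `container['keys']`.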

Parameters:

data (dict[str, Any]) – Dictionary containing dataset tensors and metadata.

_data

Internal storage for dataset contents.

Type:

dict[str, Any]

Examples

>>> import numpy as np
>>> from wf_psf.data.data_utils import DatasetContainer
>>> container = DatasetContainer({'x': np.array([1, 2, 3]), 'y': np.array([4, 5, 6])})
>>> container['x']
array([1, 2, 3])
>>> container.x
array([1, 2, 3])
>>> container.to_dict()
{'x': array([1, 2, 3]), 'y': array([4, 5, 6])}

Methods

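clear()

Remove all items from the container.

get(key[, default])

Return the value for key if present, otherwise default (None if not given).

items()

Return a view of the container's (key, value) pairs.

keys()

Return a view of the container's keys.

pop(k[,d])

Remove k and return its value. If k is not found, d is returned if given; otherwise KeyError is raised.

popitem()

Remove and return a (key, value) pair as a 2-tuple; raise KeyError if the container is empty.

setdefault(k[,d])

Return the value for k if present; otherwise insert k with value d (None if not given) and return d.

to_dict()

Return data as dict.

update([E, ]**F)

Update the container from E and F. If E is present and has a .keys() method, does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, does: for (k, v) in E: D[k] = v. In either case, this is followed by: for k, v in F.items(): D[k] = v.

values()

Return a view of the container's values.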

to_dict() → dict[str, Any]

Return data as dict.
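Apart from to_dict(), the mapping methods listed above are the standard mixin methods that `collections.abc.MutableMapping` derives from five abstract methods (`__getitem__`, `__setitem__`, `__delitem__`, `__iter__`, `__len__`). The sketch below uses a minimal hypothetical subclass, `MiniContainer`, to exercise the `update`, `setdefault`, `pop`, `get`, and `popitem` semantics described above. It assumes DatasetContainer does not override these inherited methods.

```python
from collections.abc import MutableMapping
from typing import Any, Iterator


class MiniContainer(MutableMapping):
    """Hypothetical minimal MutableMapping: only the abstract methods are defined."""

    def __init__(self, data=None):
        self._data = dict(data or {})

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self) -> Iterator[Any]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


c = MiniContainer({"x": 1})
c.update({"y": 2})                    # E has .keys(): for k in E: c[k] = E[k]
c.update([("z", 3)], w=4)             # E is an iterable of pairs, then keyword args
default_val = c.setdefault("v", 0)    # "v" is missing: inserts v=0 and returns 0
existing_val = c.setdefault("x", 99)  # "x" exists: returns 1, value unchanged
popped = c.pop("w")                   # removes "w" and returns 4
fallback = c.pop("missing", None)     # missing key with a default: returns None
```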

class wf_psf.data.data_utils.SEDContext(simPSF: PSFSimulator, n_bins_lambda: int)

Bases: object

Context object containing parameters required for SED processing.

This context encapsulates all runtime dependencies needed for spectral energy distribution (SED) transformations within the dataset conversion pipeline.

Parameters:
  • simPSF (PSFSimulator) – PSF simulator instance used during SED processing. This object is responsible for modeling instrument response effects applied to spectral data.

  • n_bins_lambda (int) – Number of wavelength bins used for discretizing the SED during conversion.

n_bins_lambda: int
simPSF: PSFSimulator
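To illustrate the role of `n_bins_lambda`, the sketch below builds a simplified SEDContext and uses it to discretize a tabulated SED into equal-width wavelength bins by averaging the samples that fall in each bin. The `rebin_sed` helper, the averaging scheme, and the treatment of empty bins are assumptions chosen for illustration. wf_psf's actual SED processing, which uses the PSF simulator, is not reproduced here, so `simPSF` is only a placeholder.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SEDContext:
    """Simplified stand-in for wf_psf's SEDContext."""
    simPSF: Any          # placeholder for a PSFSimulator instance
    n_bins_lambda: int   # number of wavelength bins


def rebin_sed(wavelengths, fluxes, ctx: SEDContext, lam_min, lam_max):
    """Hypothetical helper: average flux in ctx.n_bins_lambda equal-width bins.

    Samples outside [lam_min, lam_max] are ignored; empty bins are reported as 0.0.
    """
    n = ctx.n_bins_lambda
    width = (lam_max - lam_min) / n
    sums = [0.0] * n
    counts = [0] * n
    for lam, flux in zip(wavelengths, fluxes):
        if not lam_min <= lam <= lam_max:
            continue
        # A sample exactly at lam_max is clamped into the last bin.
        idx = min(int((lam - lam_min) / width), n - 1)
        sums[idx] += flux
        counts[idx] += 1
    return [s / k if k else 0.0 for s, k in zip(sums, counts)]


ctx = SEDContext(simPSF=None, n_bins_lambda=2)
binned = rebin_sed([550, 600, 800, 850], [1.0, 3.0, 2.0, 4.0], ctx, 550, 900)
```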

wf_psf.data.data_utils.to_container(obj) → DatasetContainer | None

Convert an object to a DatasetContainer.

Transforms various dataset representations into a standardized DatasetContainer used by downstream processing.

Supported input types include dictionaries, dataclasses, objects with attributes, and existing DatasetContainer instances.

Parameters:

obj (Any) – Object representing dataset data.

Returns:

Structured container wrapping the dataset data.

Return type:

DatasetContainer or None

Raises:

TypeError – If the input type is not supported.
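A minimal sketch of the type dispatch described above, written against a small dict-backed stand-in for DatasetContainer. The order of the checks, the shallow field extraction for dataclasses, the use of `vars()` for plain objects, and returning None for a None input (one plausible reading of the `DatasetContainer | None` return type) are all assumptions rather than confirmed wf_psf behavior.

```python
import dataclasses
from typing import Any, Optional


class DatasetContainer:
    """Minimal stand-in holding a dict (see the class documentation above)."""

    def __init__(self, data: dict) -> None:
        self._data = dict(data)

    def to_dict(self) -> dict:
        return dict(self._data)


def to_container(obj: Any) -> Optional[DatasetContainer]:
    """Sketch of converting several dataset representations to a DatasetContainer."""
    if obj is None:
        return None  # assumption: None passes through unchanged
    if isinstance(obj, DatasetContainer):
        return obj  # already normalized; returned as-is
    if isinstance(obj, dict):
        return DatasetContainer(obj)
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        # Shallow field extraction; dataclasses.asdict would deep-copy the values.
        return DatasetContainer(
            {f.name: getattr(obj, f.name) for f in dataclasses.fields(obj)}
        )
    if hasattr(obj, "__dict__"):
        # Generic objects: use their instance attributes.
        return DatasetContainer(vars(obj))
    raise TypeError(f"Unsupported dataset type: {type(obj).__name__}")


@dataclasses.dataclass
class Batch:
    """Example dataclass input."""
    x: list
    y: list


batch_container = to_container(Batch(x=[1], y=[2]))
```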