wf_psf.data.factory

Factory module for creating and normalizing data adapters.

This module defines the DataAdapterFactory, which constructs DataAdapter instances from a variety of dataset formats: dictionaries, dataclasses, LoadedDataset instances, and objects whose attributes are numpy arrays. It also integrates dataset normalization through DataEnvelope and the utility routines in data_utils.

The module defines a protocol (SupportsParams) to allow external APIs to pass parameter containers in a generic way, supporting dataclasses, custom objects, or dictionaries.

Key features:

  • Automatic detection of dataset structure (train/test/complete) and conversion to LoadedDataset for downstream processing.

  • Normalization and validation of dataset parameters via normalize_data_envelope.

  • Optional metadata extraction when available in input objects.

  • Integration with TensorFlowDatasetConverter for TF-ready dataset pipelines.

  • Lightweight dataset introspection utilities for in-memory datasets and canonical keys.

  • Logging to provide insight into dataset resolution and loading steps.

Author: Jennifer Pollack <jennifer.pollack@cea.fr>

Functions

normalize_data_envelope(obj[, field_name, ...])

Normalize data envelope.

Classes

DataAdapterFactory()

Factory for creating DataAdapters from various dataset formats.

DataEnvelope(data, params[, metadata])

Encapsulates separated dataset, parameters and metadata.

SupportsParams(*args, **kwargs)

Protocol for dataset objects containing parameters.

class wf_psf.data.factory.DataAdapterFactory

Bases: object

Factory for creating DataAdapters from various dataset formats.

Methods

build(data)

Create a DataAdapter.

static build(data)

Create a DataAdapter.

Parameters:

data (object) –

The dataset to be adapted. Can be:

  • A LoadedDataset instance

  • A dataclass holding numpy arrays (e.g., train/test containers, parameter containers, or a shallow "complete" container)

  • A dict containing ‘train’, ‘test’, or ‘complete’ keys with numpy arrays

  • An object whose attributes are numpy arrays (e.g., custom train/test containers)

The factory will automatically detect the structure and convert it into a LoadedDataset.

Return type:

DataAdapter
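The structure detection described above can be illustrated without the library itself. The following is a minimal sketch, not the wf_psf implementation: find_splits, Splits, and Container are hypothetical names, and plain lists stand in for numpy arrays so the example has no third-party dependency.

```python
from dataclasses import dataclass
from typing import Any

# Canonical split names listed in the build() documentation.
SPLIT_KEYS = ("train", "test", "complete")


def find_splits(data: Any) -> dict[str, Any]:
    """Hypothetical helper: collect whichever canonical splits `data` exposes."""
    found = {}
    for key in SPLIT_KEYS:
        # Dicts are looked up by key; dataclasses and plain objects by attribute.
        value = data.get(key) if isinstance(data, dict) else getattr(data, key, None)
        if value is not None:
            found[key] = value
    return found


@dataclass
class Splits:  # hypothetical dataclass container
    train: list
    test: list


class Container:  # hypothetical plain object exposing one attribute
    def __init__(self) -> None:
        self.complete = [1.0, 2.0]


as_dict = find_splits({"train": [0.1], "test": [0.2]})
as_dataclass = find_splits(Splits(train=[0.1], test=[0.2]))
as_object = find_splits(Container())
```

All three inputs reduce to the same kind of mapping from split name to payload, which is the sort of uniform view a factory needs before it can build a single dataset representation.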

class wf_psf.data.factory.DataEnvelope(data: Any | None, params: Any, metadata: dict | None = None)

Bases: object

Encapsulates separated dataset, parameters and metadata.

data

The actual dataset (e.g., LoadedDataset, dict, dataclass). Can be None if the input contains only parameters.

Type:

Optional[Any]

params

Configuration parameters used to resolve and load the dataset. Required for adapter construction.

Type:

ParamsType

metadata

Ancillary information about the dataset (IDs, units, provenance, etc.). Defaults to None if not present in input.

Type:

Optional[dict], default None

Attributes:

metadata

data: Any | None

metadata: dict | None = None

params: Any

class wf_psf.data.factory.SupportsParams(*args, **kwargs)

Bases: Protocol

Protocol for dataset objects containing parameters.

This protocol represents objects that expose a params attribute containing dataset parameters. This allows dataclasses, custom objects, and other parameter containers to be accepted by the data adapter API.
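Structural typing of this kind can be sketched with typing.Protocol. The example below is an illustration under stated assumptions, not the library's definition: HasParams is a local stand-in that mirrors the single documented attribute, DatasetConfig and describe are hypothetical, and whether the real SupportsParams is decorated with runtime_checkable is not stated in this documentation.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Protocol, runtime_checkable


@runtime_checkable  # assumption: enables isinstance() checks in this sketch
class HasParams(Protocol):
    """Local stand-in: any object exposing a `params` attribute qualifies."""

    params: Any


@dataclass
class DatasetConfig:  # hypothetical dataclass-based parameter container
    params: dict


def describe(source: HasParams) -> str:
    """Return the type name of the parameter container carried by `source`."""
    return type(source.params).__name__


from_dataclass = describe(DatasetConfig(params={"batch_size": 16}))
from_namespace = describe(SimpleNamespace(params=SimpleNamespace(batch_size=16)))
```

Neither DatasetConfig nor SimpleNamespace inherits from the protocol; both are accepted because they expose params. Note that a plain dict with a "params" key does not satisfy an attribute-based protocol in Python, since the key is not an attribute.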

params

An object (e.g., a dict or structured namespace) containing dataset-specific parameters.

Type:

Any

params: Any

wf_psf.data.factory.normalize_data_envelope(obj: Any, field_name: str = 'params', metadata_name: str = 'metadata') → DataEnvelope

Normalize data envelope.

Normalize an input object into a DataEnvelope by extracting the named parameter field and, if present, the metadata field. Supports dataclasses, dictionaries, and generic objects with attributes.

Parameters:
  • obj (Any) – Input object containing dataset, parameters, and optionally metadata.

  • field_name (str, default "params") – Name of the field to extract as parameters.

  • metadata_name (str, default "metadata") – Name of the field to extract as metadata, if present.

Returns:

Object containing separated data, parameters, and metadata.

Return type:

DataEnvelope

Notes

  • The params field is optional, but may be required by downstream components (e.g. the factory) to resolve how the dataset should be constructed (in-memory vs. file-based loading).

  • The metadata field is optional and ignored if not present.
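The separation of data, parameters, and metadata can be sketched in a few lines. This is a hedged illustration rather than the actual implementation: Envelope and split_envelope are hypothetical stand-ins for DataEnvelope and normalize_data_envelope, and treating every remaining field as the data is an assumption of the sketch, not documented behavior.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Optional


@dataclass
class Envelope:  # stand-in mirroring the documented DataEnvelope fields
    data: Optional[Any]
    params: Any
    metadata: Optional[dict] = None


def split_envelope(
    obj: Any, field_name: str = "params", metadata_name: str = "metadata"
) -> Envelope:
    """Illustrative only: pull out params and metadata, keep the rest as data."""
    if isinstance(obj, dict):
        items = dict(obj)  # copy so the caller's dict is left untouched
    elif is_dataclass(obj) and not isinstance(obj, type):
        items = {f.name: getattr(obj, f.name) for f in fields(obj)}
    else:
        items = dict(vars(obj))  # generic object: use its instance attributes
    params = items.pop(field_name, None)
    metadata = items.pop(metadata_name, None)
    # Assumption: an input holding nothing but params/metadata yields data=None.
    return Envelope(data=items or None, params=params, metadata=metadata)


raw = {
    "train": [0.1, 0.2],
    "params": {"batch_size": 16},
    "metadata": {"units": "px"},
}
env = split_envelope(raw)
only_params = split_envelope({"params": {"batch_size": 16}})
```

In this sketch a missing params field yields None rather than an error, matching the note that the field is optional at this stage and only required by downstream components.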