wf_psf.data.factory
Factory module for creating and normalizing data adapters.
This module defines the DataAdapterFactory, which constructs DataAdapter
instances from a variety of dataset formats, including dictionaries,
dataclasses, LoadedDataset instances, or objects with attributes exposing
numpy arrays. It also integrates dataset normalization through the DataEnvelope
and utility routines in data_utils.
The module defines a protocol (SupportsParams) to allow
external APIs to pass parameter containers in a generic way,
supporting dataclasses, custom objects, or dictionaries.
Key features:
- Automatic detection of dataset structure (train/test/complete) and conversion to LoadedDataset for downstream processing.
- Normalization and validation of dataset parameters via normalize_data_envelope.
- Optional metadata extraction when available in input objects.
- Integration with TensorFlowDatasetConverter for TF-ready dataset pipelines.
- Lightweight dataset introspection utilities for in-memory datasets and canonical keys.
- Logging to provide insight into dataset resolution and loading steps.
Author: Jennifer Pollack <jennifer.pollack@cea.fr>
Functions
normalize_data_envelope | Normalize data envelope.
Classes
DataAdapterFactory | Factory for creating DataAdapters from various dataset formats.
DataEnvelope | Encapsulates separated dataset, parameters and metadata.
SupportsParams | Protocol for dataset objects containing parameters.
- class wf_psf.data.factory.DataAdapterFactory
Bases: object
Factory for creating DataAdapters from various dataset formats.
Methods
build(data) | Create a DataAdapter.
- static build(data)
Create a DataAdapter.
- Parameters:
data (object) –
The dataset to be adapted. Can be:
- A LoadedDataset instance
- A dataclass with numpy arrays (e.g., train/test containers, parameters or shallow complete)
- A dict containing ‘train’, ‘test’, or ‘complete’ keys with numpy arrays
- An object with attributes that are numpy arrays (e.g., train/test containers)
The factory automatically detects the structure and converts it into a LoadedDataset.
- Return type:
DataAdapter
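The accepted input shapes listed for build can be told apart with ordinary Python checks. The sketch below is illustrative only and is not the factory's actual detection logic: `describe_input`, `Splits`, and `Container` are hypothetical names, plain lists stand in for numpy arrays so the example has no third-party dependency, and LoadedDataset is not modeled.

```python
from dataclasses import dataclass, is_dataclass
from typing import Any

# Canonical keys named in the build() documentation.
CANONICAL_KEYS = ("train", "test", "complete")


def describe_input(data: Any) -> str:
    """Hypothetical helper: classify an input by the shapes build() documents."""
    if isinstance(data, dict):
        found = [key for key in CANONICAL_KEYS if key in data]
        if found:
            return "dict with keys: " + ", ".join(found)
        return "dict without canonical keys"
    # is_dataclass() is also true for dataclass *types*, so exclude classes.
    if is_dataclass(data) and not isinstance(data, type):
        return "dataclass"
    if hasattr(data, "__dict__"):
        return "object with attributes"
    return "unsupported"


@dataclass
class Splits:
    """Hypothetical train/test container."""
    train: list
    test: list


class Container:
    """Hypothetical plain object exposing a dataset attribute."""
    def __init__(self):
        self.train = [1.0, 2.0]


for candidate in ({"train": [1.0], "test": [2.0]}, Splits([1.0], [2.0]), Container(), 42):
    print(describe_input(candidate))
```

The dataclass check runs before the generic attribute check because dataclass instances also carry a `__dict__`; reversing the order would report every dataclass as a plain object.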
- class wf_psf.data.factory.DataEnvelope(data: Any | None, params: Any, metadata: dict | None = None)
Bases: object
Encapsulates separated dataset, parameters and metadata.
- data
The actual dataset (e.g., LoadedDataset, dict, dataclass). Can be None if the input contains only params.
- Type:
Optional[Any]
- params
Configuration parameters used to resolve and load the dataset. Required for adapter construction.
- Type:
ParamsType
- metadata
Ancillary information about the dataset (IDs, units, provenance, etc.). Defaults to None if not present in input.
- Type:
Optional[dict] = None
- Attributes:
- metadata
- class wf_psf.data.factory.SupportsParams(*args, **kwargs)
Bases: Protocol
Protocol for dataset objects containing parameters.
This protocol represents objects that expose a params attribute containing dataset parameters. This allows dataclasses, custom objects, and other parameter containers to be accepted by the data adapter API.
- params
An object (e.g., dict, structured namespace, etc) containing dataset-specific parameters.
- Type:
Any
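Structural typing of this kind can be sketched with `typing.Protocol`. The example below is a stand-in, not the library's definition: `HasParams`, `DatasetConfig`, `LegacyContainer`, and `read_params` are hypothetical names, and marking the protocol `@runtime_checkable` is an assumption made here so that `isinstance` can be demonstrated.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@runtime_checkable  # assumption: enables isinstance() checks in this sketch
class HasParams(Protocol):
    """Stand-in for a protocol that requires a `params` attribute."""
    params: Any


@dataclass
class DatasetConfig:
    """Hypothetical dataclass container."""
    params: dict


class LegacyContainer:
    """Hypothetical custom object; it never inherits from HasParams."""
    def __init__(self, params):
        self.params = params


def read_params(source: HasParams) -> Any:
    # Works for any object that structurally provides `params`.
    return source.params


print(read_params(DatasetConfig(params={"batch_size": 8})))
print(isinstance(LegacyContainer({"seed": 0}), HasParams))
print(isinstance({"params": {}}, HasParams))
```

In this sketch a plain dict does not satisfy the attribute-based check, because a dict key is not an attribute; an API that also accepts dictionaries would need to handle them separately.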
- wf_psf.data.factory.normalize_data_envelope(obj: Any, field_name: str = 'params', metadata_name: str = 'metadata') -> DataEnvelope
Normalize data envelope.
Normalize an input object into a DataEnvelope by extracting the named parameter and metadata fields. Supports dataclasses, dictionaries, and generic objects with attributes.
- Parameters:
obj (Any)
field_name (str), default 'params'
metadata_name (str), default 'metadata'
- Returns:
Object containing separated data, parameters, and metadata.
- Return type:
DataEnvelope
Notes
The params field is optional, but may be required by downstream components (e.g. the factory) to resolve how the dataset should be constructed (in-memory vs. file-based loading).
The metadata field is optional and ignored if not present.
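A minimal sketch of this kind of normalization follows. It does not reproduce the library's implementation: `EnvelopeSketch` and `normalize_sketch` are hypothetical stand-ins that mirror the documented fields and signature, and treating "all remaining fields" as the data is an assumption made for illustration.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Optional


@dataclass
class EnvelopeSketch:
    """Stand-in mirroring the documented data/params/metadata fields."""
    data: Optional[Any]
    params: Any
    metadata: Optional[dict] = None


def normalize_sketch(obj: Any, field_name: str = "params",
                     metadata_name: str = "metadata") -> EnvelopeSketch:
    """Split obj into data, params, and metadata (illustrative only)."""
    if isinstance(obj, dict):
        items = dict(obj)  # copy so the caller's dict is not mutated
    elif is_dataclass(obj) and not isinstance(obj, type):
        # Shallow field access; avoids asdict(), which deep-copies values.
        items = {f.name: getattr(obj, f.name) for f in fields(obj)}
    else:
        items = dict(vars(obj))  # generic object with attributes
    params = items.pop(field_name, None)       # optional field
    metadata = items.pop(metadata_name, None)  # ignored if absent
    # Assumption: whatever is left is the dataset; None if nothing remains.
    data = items or None
    return EnvelopeSketch(data=data, params=params, metadata=metadata)


@dataclass
class Bundle:
    """Hypothetical dataclass input carrying data, params, and metadata."""
    train: list
    params: dict
    metadata: dict


full = normalize_sketch(Bundle([1.0], {"batch_size": 8}, {"units": "pixel"}))
params_only = normalize_sketch({"params": {"path": "data/"}})
print(full)
print(params_only)
```

The params-only call shows the case described for the data attribute: with nothing but parameters in the input, the envelope's data is None and metadata keeps its default.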