mccd.dataset_generation

DATASET GENERATION UTILITY FUNCTIONS.

Useful functions to generate datasets that include:

  • Analytic ellipticity and size variations.

  • Variations based on an interpolation of a binned image.

  • Random atmospheric contributions with a Von Karman power spectrum.

Authors

Tobias Liaudat <tobias.liaudat@cea.fr>

class GenerateRealisticDataset(e1_path, e2_path, size_path, output_path, image_size=51, psf_flux=1.0, beta_psf=4.765, pix_scale=0.187, catalog_id=2086592, n_ccd=40, range_mean_star_qt=[40, 100], range_dev_star_nb=[- 10, 10], max_fwhm_var=0.04, save_realisation=False, loc2glob=None, atmos_kwargs={'ngrid': 8192}, e1_kwargs={}, e2_kwargs={})[source]

Bases: object

Generate realistic dataset for training and validating PSF models.

General considerations:

The PSF will have a Moffat profile with a specific beta parameter that by default is fixed for the CFIS observations.
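For reference, a Moffat profile has the radial form I(r) ∝ [1 + (r/alpha)^2]^(-beta), and its FWHM is 2 * alpha * sqrt(2^(1/beta) - 1). The sketch below evaluates this standard form with the default beta. The function names and the scale radius alpha are illustrative and are not part of mccd.

```python
import math


def moffat_profile(r, alpha, beta=4.765):
    """Unnormalised Moffat intensity at radius r (same units as alpha)."""
    return (1.0 + (r / alpha) ** 2) ** (-beta)


def moffat_fwhm(alpha, beta=4.765):
    """FWHM of a Moffat profile with scale radius alpha."""
    return 2.0 * alpha * math.sqrt(2.0 ** (1.0 / beta) - 1.0)


def moffat_alpha_from_fwhm(fwhm, beta=4.765):
    """Scale radius alpha that yields the requested FWHM."""
    return fwhm / (2.0 * math.sqrt(2.0 ** (1.0 / beta) - 1.0))


# 0.7 arcsec is an arbitrary example FWHM, not a value taken from mccd.
alpha = moffat_alpha_from_fwhm(0.7)
half_max_radius = moffat_fwhm(alpha) / 2.0
```

By construction, the profile evaluated at half_max_radius is half of its central value.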

Optical contributions: This realistic simulation is based on the CFIS survey of CFHT. The idea is to use the mean star ellipticity measurements to characterise the optical aberrations in terms of ellipticity contribution. We use the mean measurements and we interpolate from those images to get the e1, e2 contribution at the target positions. For the size, we draw a random sample from the size distribution of the observed CFIS mean exposure FWHM.

Atmospheric contribution: We use the Von Karman power function proposed in Heymans et al. 2012. We use GalSim’s PowerSpectrum class to generate an atmospheric realisation, with the same Von Karman power law for the E-mode and the B-mode. The outer scale parameter, also known as theta_zero, is set to the value measured in the Heymans et al. paper. We then simulate a grid of points that corresponds to our focal plane and interpolate that grid to the target positions. This gives us the ellipticity contribution of the atmosphere to our PSF. We adjust the variance of the variations to a range that is consistent with our optical contributions. We use the magnification of the lensing field to multiply the constant exposure size.

Total contribution: We need to add the ellipticity contributions of the optical and the atmospheric part to get the overall ellipticity distribution of the exposure. Concerning the size, as many star selection methods for PSF modelling consist in making cuts on the star sizes, we know that the observed stars in our exposure have limited size variations. That is why we add the max_fwhm_var parameter that scales the size variations so that the maximal variations are within the required range.

Star quantity and positions: We first draw a random number that serves as the mean number of stars per CCD for this particular exposure. The distribution is uniform within the range range_mean_star_qt. Then, for each CCD, we draw a uniform random sample from the range range_dev_star_nb that shifts the star number away from the exposure mean. This deviation should have a zero mean, e.g. [-10, 10]. Once the number of stars in a CCD is set, we draw random positions within that CCD.
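The two-step draw described above can be sketched as follows. This is a minimal illustration using the standard library only. The function name, the rounding to an integer and the clipping at zero are assumptions, not the mccd implementation.

```python
import random


def draw_star_counts(n_ccd=40, range_mean_star_qt=(40, 100),
                     range_dev_star_nb=(-10, 10), seed=None):
    """Draw one star count per CCD for a single exposure."""
    rng = random.Random(seed)
    # One mean star number per CCD, shared by the whole exposure.
    exposure_mean = rng.uniform(*range_mean_star_qt)
    counts = []
    for _ in range(n_ccd):
        # Per-CCD deviation from the exposure mean (zero-mean range).
        deviation = rng.uniform(*range_dev_star_nb)
        # Rounding and clipping at zero are choices made for this sketch.
        counts.append(max(0, int(round(exposure_mean + deviation))))
    return counts


star_counts = draw_star_counts(seed=0)
```

With the default ranges, every CCD ends up with between 30 and 110 stars.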

The test dataset can be generated at random positions or on a regular grid. Once the train dataset is generated, the exposure_sim attribute is saved for reproducibility. The plotting functions plot_realisation and plot_correlation of the AtmosphereGenerator allow the atmospheric realisations to be inspected.

Usage example:

    sim_dataset_gen = mccd.dataset_generation.GenerateRealisticDataset(
        e1_path=e1_path,
        e2_path=e2_path,
        size_path=fwhm_path,
        output_path=output_path,
        catalog_id=cat_id,
    )
    sim_dataset_gen.generate_train_data()
    sim_dataset_gen.generate_test_data()

Parameters
  • e1_path (str) – Path to the binned e1 data.

  • e2_path (str) – Path to the binned e2 data.

  • size_path (str) – Path to the size distribution. FWHM in arcsec.

  • output_path (str) – Path to the folder to save the simulated datasets.

  • image_size (int) – Dimension of the squared image stamp. (image_size x image_size) Default is 51.

  • psf_flux (float) – Total PSF photometric flux. Default is 1.

  • beta_psf (float) – Moffat beta parameter. Default is 4.765.

  • pix_scale (float) – Pixel scale. Default is 0.187.

  • catalog_id (int) – Catalog identifier number. Default is 2086592.

  • n_ccd (int) – Total number of CCDs. Default is 40.

  • range_mean_star_qt ([float, float]) – Range of the uniform distribution from which to sample the mean star number per exposure. Default is [40, 100].

  • range_dev_star_nb ([float, float]) – Range of the uniform distribution from which to sample the deviation of the star number for each CCD with respect to the mean star number. Default is [-10, 10].

  • max_fwhm_var (float) – Maximum FWHM variation with respect to the mean allowed in one exposure. Units are in arcsec. Since star selection is often done through size cuts, the maximum FWHM variations are known. Default is 0.04.

  • save_realisation (bool) – Whether to save the exposure realisation so that the simulation can be reproduced. Default is False.

  • loc2glob (object) – The object that performs the coordinate conversion from local to global. It is specific to each instrument’s focal plane geometry. If None, it defaults to the CFIS MegaCam instrument. Default is None.

init_random_positions()[source]

Initialise random positions.

init_grid_positions(x_grid=5, y_grid=10)[source]

Initialise positions in a regular grid.
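A regular grid inside one CCD could be laid out as in the sketch below. The CCD dimensions (2048 x 4612 pixels), the cell-centred placement and the function name are assumptions made for illustration. The actual layout used by init_grid_positions is not described here.

```python
import numpy as np


def regular_grid_positions(x_grid=5, y_grid=10, ccd_x=2048, ccd_y=4612):
    """Return local (x, y) positions of an x_grid-by-y_grid grid in one CCD."""
    # Cell-centred points, so that no position sits on the CCD border.
    xs = (np.arange(x_grid) + 0.5) * ccd_x / x_grid
    ys = (np.arange(y_grid) + 0.5) * ccd_y / y_grid
    xx, yy = np.meshgrid(xs, ys)
    return xx.ravel(), yy.ravel()


grid_x, grid_y = regular_grid_positions()
```

The local positions would still need to be converted to global coordinates, which is the role of the loc2glob object.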

scale_fwhms(fwhms)[source]

Scale the FWHM values so that they are in the desired range.
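One possible way to keep the size variations within max_fwhm_var is to rescale the deviations about the mean, as sketched below. The function name and the rescaling rule are assumptions. The scheme actually used by scale_fwhms may differ.

```python
import numpy as np


def scale_fwhm_variations(fwhms, max_fwhm_var=0.04):
    """Rescale FWHM deviations about the mean so none exceeds max_fwhm_var."""
    fwhms = np.asarray(fwhms, dtype=float)
    mean_fwhm = fwhms.mean()
    deviations = fwhms - mean_fwhm
    max_dev = np.abs(deviations).max()
    if max_dev <= max_fwhm_var:
        return fwhms.copy()  # already within the allowed range
    # Shrink all deviations by the same factor; the mean is preserved.
    return mean_fwhm + deviations * (max_fwhm_var / max_dev)


# Example FWHM values in arcsec, chosen arbitrarily.
scaled = scale_fwhm_variations([0.60, 0.70, 0.85])
```

A uniform rescaling keeps the spatial pattern of the size variations and only changes their amplitude.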

generate_train_data()[source]

Generate the training dataset and save it in FITS format.

The positions are drawn randomly.

generate_test_data(grid_pos_bool=False, x_grid=5, y_grid=10)[source]

Generate the test dataset and save it into a fits file.

Parameters
  • x_grid (int) – Horizontal number of elements of the testing grid in one CCD.

  • y_grid (int) – Vertical number of elements of the testing grid in one CCD.

static handle_SExtractor_mask(stars, thresh)[source]

Handle SExtractor masks.

Reads SExtracted star stamps, generates MCCD-compatible masks (that is, binary weights), and replaces bad pixels with 0s - they will not be used by MCCD, but the ridiculous numerical values can otherwise still lead to problems because of convolutions.
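The masking step can be sketched as follows, assuming that bad pixels are flagged with a very large negative value and that every pixel below thresh counts as bad. The function name, the return signature and the example values are illustrative, not the mccd implementation.

```python
import numpy as np


def binary_mask_from_threshold(stars, thresh):
    """Return (cleaned_stars, mask) with pixels below thresh flagged as bad."""
    stars = np.array(stars, dtype=float)  # copy, so the input is left untouched
    bad = stars < thresh
    mask = np.ones(stars.shape)
    mask[bad] = 0.0   # binary weight: 0 for bad pixels, 1 otherwise
    stars[bad] = 0.0  # neutralise extreme flag values before any convolution
    return stars, mask


# A 2 x 2 toy stamp with one flagged pixel.
stamp = [[1.0, -1e30], [0.5, 2.0]]
clean, mask = binary_mask_from_threshold(stamp, thresh=-1e5)
```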

class MomentInterpolator(moment_map, n_neighbors=1000, rbf_function='thin_plate', loc2glob=None)[source]

Bases: object

Interpolate moments from a binned image.

Bin image like the one from the MeanShapes function.

Notes

Hard-coded for the CFIS convention!

interpolate_position(target_x, target_y)[source]

Interpolate positions.

class AtmosphereGenerator(theta_zero=180.0, r_trunc=1.0, ngrid=8192, map_std=0.008, pix_scale=0.187, loc2glob=None)[source]

Bases: object

Generate atmospheric variations.

This class generates a random atmospheric contribution in terms of ellipticity and size. The simulation is done using the Von Karman model for isotropic atmospheric turbulence. We use the model’s 2D power spectrum to generate a realisation of the atmosphere with the dimensions of our focal plane.

The parameter theta_zero of the model, also known as the outer scale, is by default fixed for the CFHT telescope based on the results of Heymans et al. 2012 (DOI: 10.1111/j.1365-2966.2011.20312.x).

Parameters
  • theta_zero (float) – Outer scale parameter of the Von Karman model. In arcsec.

  • r_trunc (float) – Gaussian truncation parameter of the power spectrum. In arcsec.

  • ngrid (int) – Number of grid points to use for our power spectrum realisation. Should be a power of 2.

  • map_std (float) – Standard deviation of our realisation.

  • pix_scale (float) – Pixel scale of our instrument. In arcsec/pixel.

  • loc2glob (object) – The object that performs the coordinate conversion from local to global. It is specific to each instrument’s focal plane geometry. If None, it defaults to the CFIS MegaCam instrument. Default is None.

power_fun(freq)[source]

Von Karman power function.

Parameters should be in arcsec. Heymans’ parameter for the CFHT telescope is in the range [2.62, 3.22] arcmin, that is, roughly 157 to 193 arcsec.
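As an illustration, the sketch below assumes the commonly quoted Von Karman form P(f) ∝ (f^2 + 1/theta_zero^2)^(-11/6), multiplied by a Gaussian truncation exp(-(f * r_trunc)^2). The exact expression and normalisation used by power_fun are not given here, so the function is an assumption rather than the mccd code.

```python
import math


def von_karman_power(freq, theta_zero=180.0, r_trunc=1.0):
    """Von Karman-like power at spatial frequency freq (in 1/arcsec)."""
    # Power law that flattens below the outer-scale frequency 1/theta_zero.
    von_karman = (freq ** 2 + 1.0 / theta_zero ** 2) ** (-11.0 / 6.0)
    # Assumed Gaussian truncation that suppresses high frequencies.
    gaussian_cut = math.exp(-(freq * r_trunc) ** 2)
    return von_karman * gaussian_cut


power_at_zero = von_karman_power(0.0)
```

The default theta_zero of 180 arcsec corresponds to 3 arcmin, which lies inside the range quoted above.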

init_powerspectrum()[source]

Initialise the power spectrum.

regenerate_atmosphere()[source]

Generate a new random atmosphere.

interpolate_position(target_x, target_y)[source]

Get the ellipticity and size factor for a target position.

It is recommended to calculate with 1D arrays as it is much faster.

Parameters
  • target_x (1D np.ndarray or float) – Position x in global MCCD coordinates.

  • target_y (1D np.ndarray or float) – Position y in global MCCD coordinates.

Returns

  • e1 (1D np.ndarray or float) – Atmospheric contribution to the first ellipticity component.

  • e2 (1D np.ndarray or float) – Atmospheric contribution to the second ellipticity component.

  • size_factor (1D np.ndarray or float) – Atmospheric factor affecting the PSF size.

plot_realisation(ccd_corner=None, save_path=None, save_fig=False)[source]

Plot atmospheric realisation.

Plot the entire focal plane and the dimensions of a CCD.

plot_correlation(save_path=None, n_points=100, kmin_factor=10.0, kmax_factor=10.0, save_fig=False)[source]

Plot correlation functions.

class ExposureSimulation(e1_bin_path=None, e2_bin_path=None, fwhm_dist_path=None, fwhm_range=[0.45, 1.0], loc2glob=None, atmos_kwargs={}, e1_kwargs={}, e2_kwargs={})[source]

Bases: object

Simulate one exposure.

Generate a random exposure and give the ellipticities and size of the PSF for any position in the focal plane.

Parameters
  • e1_bin_path (str) – e1 data path.

  • e2_bin_path (str) – e2 data path.

  • fwhm_dist_path (str) – fwhm distribution path.

  • fwhm_range ([float, float]) – The range for the possible fwhm. Units in arcsec. Default for CFIS data.

  • loc2glob (object) – The object that performs the coordinate conversion from local to global. It is specific to each instrument’s focal plane geometry. If None, it defaults to the CFIS MegaCam instrument. Default is None.

  • atmos_kwargs (dict) – Atmosphere arguments.

  • e1_kwargs (dict) – e1 interpolator arguments.

  • e2_kwargs (dict) – e2 interpolator arguments.

init_exposure()[source]

Initialise exposure variables.

regenerate_exposure()[source]

Regenerate a random exposure.

interpolate_values(target_x, target_y)[source]

Interpolate exposure values.

For the given target positions, interpolate the values (e1, e2, fwhm). The input positions are in global MCCD coordinates. It is faster to pass several positions at once as a np.ndarray.

Parameters
  • target_x (float or np.ndarray) – Target positions x coordinate from the global MCCD coordinate system.

  • target_y (float or np.ndarray) – Target positions y coordinate from the global MCCD coordinate system.

Returns

  • current_e1 (float or np.ndarray) – Interpolated e1 values at target positions.

  • current_e2 (float or np.ndarray) – Interpolated e2 values at target positions.

  • current_fwhm (float or np.ndarray) – Interpolated fwhm values at target positions. Units in arcsec.

class GenerateSimDataset(input_pos_path, input_ccd_path, output_path, e1_analytic_fun=None, e2_analytic_fun=None)[source]

Bases: object

Generate simulated dataset for training and validating PSF models.

Parameters
  • input_pos_path (str) – Path to the global positions of the PSF that will be used for the training.

  • input_ccd_path (str) – Path to the corresponding CCDs of the global positions.

  • output_path (str) – Path to the folder to save the simulated datasets.

  • e1_analytic_fun (function) – The analytic e1 ellipticity function that will define an ellipticity e1 for each position in the focal plane.

  • e2_analytic_fun (function) – The analytic e2 ellipticity function that will define an ellipticity e2 for each position in the focal plane.

Notes

The simulated PSFs are based on the Moffat profile and we are using Galsim to generate them. We base ourselves on two analytic functions that have to output an ellipticity for each position in the focal plane.
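A pair of analytic functions with the expected (x, y) call signature might look like the sketch below. The functional forms, the period and the amplitude are hypothetical and serve only to show the interface. They are not the functions shipped with mccd.

```python
import numpy as np


def e1_example_fun(x, y, period=20000.0, amplitude=0.05):
    """Hypothetical smooth e1 pattern varying along the global x axis."""
    return amplitude * np.cos(2.0 * np.pi * np.asarray(x) / period)


def e2_example_fun(x, y, period=20000.0, amplitude=0.05):
    """Hypothetical smooth e2 pattern varying along the global y axis."""
    return amplitude * np.sin(2.0 * np.pi * np.asarray(y) / period)


# Example global positions, chosen arbitrarily.
x = np.array([0.0, 5000.0, 10000.0])
y = np.array([0.0, 5000.0, 10000.0])
e1 = e1_example_fun(x, y)
e2 = e2_example_fun(x, y)
```

Keeping the amplitude small ensures that the resulting ellipticities stay well below 1.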

load_data()[source]

Load data from input paths.

generate_train_data(sigma=1.6, image_size=51, psf_flux=1.0, beta_psf=4.8, pix_scale=0.187, desired_SNR=30, catalog_id=2086592)[source]

Generate the training dataset and save it in FITS format.

Parameters
  • sigma (float) – Size of the PSF, given as a sigma (the sigma from Galsim’s HSM adaptive moments). Default is 1.6.

  • image_size (int) – Dimension of the squared image stamp. (image_size x image_size) Default is 51.

  • psf_flux (float) – Total PSF photometric flux. Default is 1.

  • beta_psf (float) – Moffat beta parameter. Default is 4.8.

  • pix_scale (float) – Pixel scale. Default is 0.187.

  • desired_SNR (float) – Desired SNR. Default is 30.

  • catalog_id (int) – Catalog identifier number. Default is 2086592.

generate_test_data(x_grid=5, y_grid=10, n_ccd=40)[source]

Generate the test dataset and save it into a fits file.

Parameters
  • x_grid (int) – Horizontal number of elements of the testing grid in one CCD.

  • y_grid (int) – Vertical number of elements of the testing grid in one CCD.

  • n_ccd (int) – Number of CCDs in the instrument.

Notes

n_ccd should be consistent with the corresponding functions in mccd.mccd_utils that perform the change of coordinate system.

static e1_catalog_fun(x, y)[source]

Define an e1 ellipticity per position.

Analytic function for defining the e1 ellipticity as a function of the global position (MegaCam).

static e2_catalog_fun(x, y)[source]

Define an e2 ellipticity per position.

Analytic function for defining the e2 ellipticity as a function of the global position (MegaCam).

static scale_coordinates(x, y, coor_min, coor_max, offset=None)[source]

Scale global coordinates.

static bessel_generator(x, y, coor_min, coor_max, max_order, scale_factor, circular_symetry=False, exp_decay_alpha=None, offset=None)[source]

Generate a type of Bessel function response.

static handle_SExtractor_mask(stars, thresh)[source]

Handle SExtractor masks.

Reads SExtracted star stamps, generates MCCD-compatible masks (that is, binary weights), and replaces bad pixels with 0s - they will not be used by MCCD, but the ridiculous numerical values can otherwise still lead to problems because of convolutions.