shapepipe.pipeline.file_handler

FILE HANDLER.

This module defines a class for handling pipeline files.

Author

Samuel Farrens <samuel.farrens@cea.fr>

class FileHandler(run_name, modules, config, verbose=True)

Bases: object

File Handler.

This class manages the files used and produced during a pipeline run.

Parameters
  • run_name (str) – Run name

  • modules (list) – List of modules to be run

  • config (CustomParser) – Configuration parser instance

  • verbose (bool, optional) – Verbose setting, default is True

property run_dir

Run Directory.

This method defines the run directory.

property _input_dir

Input Directory.

This method defines the input directories.

property _output_dir

Output Directory.

This method defines the output directory.

static read_number_list(file_name)

Read Number List.

Extract number strings to be processed from a file.

Parameters

file_name (str) – Number list file name
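The number list format is not specified above. The following is a minimal sketch that assumes one number string per line, with blank lines ignored; it is an illustration, not the ShapePipe implementation. Values are kept as strings so that leading zeros (e.g. "-000-001") survive.

```python
import os
import tempfile


def read_number_list(file_name):
    """Return the non-empty, stripped lines of file_name.

    Assumption: the file holds one number string per line.
    """
    with open(file_name) as number_file:
        return [line.strip() for line in number_file if line.strip()]


# Usage: write a small number list to a temporary file and read it back.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('-000-000\n-000-001\n\n-000-002\n')

numbers = read_number_list(tmp.name)
os.remove(tmp.name)
print(numbers)
```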

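classmethod check_dir(dir_name, check_exists=False)

Check Directory.

Raise an error if the directory already exists.

Parameters
  • dir_name (str) – Directory name

  • check_exists (bool, optional) – Check whether the directory already exists, default is False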

Raises

OSError – If directory already exists

classmethod check_dirs(dir_list)

Check Directories.

Check the directories in a list.

Parameters

dir_list (list) – Directory list

classmethod mkdir(dir_name)

Make Directory.

This method creates a directory at the specified path.

Parameters

dir_name (str) – Directory name with full path
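The interplay of check_dir, check_dirs and mkdir can be sketched as follows. This is a hypothetical, standard-library-only illustration that assumes an OSError is raised only when check_exists is True and the directory is already present, and that mkdir refuses to reuse an existing directory; the real methods may differ.

```python
import os
import tempfile


def check_dir(dir_name, check_exists=False):
    # Assumption: only raise when asked to check and the directory exists.
    if check_exists and os.path.isdir(dir_name):
        raise OSError(f'Directory {dir_name} already exists.')


def check_dirs(dir_list):
    # Apply the same existence check to every directory in the list.
    for dir_name in dir_list:
        check_dir(dir_name, check_exists=True)


def mkdir(dir_name):
    # Refuse to overwrite an existing directory, then create it
    # together with any missing parent directories.
    check_dir(dir_name, check_exists=True)
    os.makedirs(dir_name)


# Usage: the second call to mkdir fails because the directory exists.
run_dir = os.path.join(tempfile.mkdtemp(), 'shapepipe_run')
mkdir(run_dir)
already_exists = False
try:
    mkdir(run_dir)
except OSError:
    already_exists = True
```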

static setpath(path, name, ext='')

Set Path Name.

This method appends the file/directory name to the input path.

Parameters
  • path (str) – Full path

  • name (str) – File or directory name

  • ext (str, optional) – File extension, default is ‘’

Returns

Formatted path

Return type

str
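A minimal sketch of the path construction described for setpath, assuming that ext already includes its leading dot and that os.path.join is an acceptable way to combine the parts (the actual method may build the string differently):

```python
import os


def setpath(path, name, ext=''):
    # Join the directory and the file/directory name, then append the
    # (optional) extension, which is assumed to include its leading dot.
    return os.path.join(path, f'{name}{ext}')


print(setpath('/data/output/', 'log_run', '.txt'))
```

os.path.join does not double the separator when path already ends with a slash, so both "/data/output" and "/data/output/" give the same result.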

static strip_slash(path)

Strip Slash.

This method removes the trailing slash from a path.

Parameters

path (str) – Full path

Returns

Updated path

Return type

str

classmethod strip_slash_list(path_list)

Strip Slash List.

This method removes the trailing slash from a list of paths.

Parameters

path_list (list) – List of paths

Returns

Updated paths

Return type

list
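strip_slash and strip_slash_list can be sketched as below. This is an illustration under the assumption that all trailing slashes should be removed; note that with str.rstrip the root path "/" would become an empty string.

```python
def strip_slash(path):
    # Remove any trailing slashes, e.g. "/data/run/" -> "/data/run".
    return path.rstrip('/')


def strip_slash_list(path_list):
    # Apply strip_slash to every path in the list.
    return [strip_slash(path) for path in path_list]


print(strip_slash_list(['/data/run/', '/data/input', 'relative/dir/']))
```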

static flatten_list(input_list)

Flatten List.

Flatten a list of lists.

Parameters

input_list (list) – A list of lists

Returns

Flattened list

Return type

list
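Flattening one level of nesting can be sketched with itertools.chain.from_iterable; this is one idiomatic way to do it, not necessarily how the method is written.

```python
from itertools import chain


def flatten_list(input_list):
    # Concatenate the sublists in order (one level of nesting only).
    return list(chain.from_iterable(input_list))


print(flatten_list([['a', 'b'], ['c'], []]))
```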

static _get_module_run_name(dir)

Get Module Run Name.

Retrieve module run name, module name and search string from input string.

Parameters

dir (str) – Input directory string

Returns

Module run name, module name, search string

Return type

tuple

_check_input_dir_list(dir_list)

Check Input Directory List.

Check an input list to see if the directories exist or if the run log should be searched for an appropriate output directory.

Parameters

dir_list (list) – List of directories

Raises

ValueError – For invalid input directory value

_get_input_dir()

Get Input Directory.

This method sets the module input directory.

create_global_run_dirs()

Create Global Run Directories.

This method creates the pipeline output directories for a given run.

_copy_config_to_log()

Copy Config to Log.

Copy configuration file to run log directory.

get_module_current_run(module)

Get Module Current Run.

Get the current run count for the module.

Parameters

module (str) – Module name

Returns

Module run count

Return type

str

get_module_run_prop(module, property)

Get Module Run Property.

Return the requested property for a given module run.

Parameters
  • module (str) – Module name

  • property (str) – Module run property name

Returns

Module run property value

Return type

any

Raises

ValueError – For invalid module dictionary key

get_module_config_sec(module)

Get Module Configuration Section.

Get the name of the configuration file section for the module.

Parameters

module (str) – Module name

Returns

Configuration file section name

Return type

str

get_add_module_property(run_name, property)

Get Additional Module Properties.

Get a list of additional module property values.

Parameters
  • run_name (str) – Module run name

  • property (str) – Property name

Returns

Additional module property values

Return type

list

_set_module_property(module, run_name, property, get_type)

Set Module Property.

Set a module property from either the configuration file or the module runner.

Parameters
  • module (str) – Module name

  • run_name (str) – Module run name

  • property (str) – Property name

  • get_type (str) – Type of object to get from config file

Notes

Module properties are set with the following hierarchy:

  1. look for properties in the module section of the config file,

  2. look for default properties in the file handler, which sources the [FILE] section of the config file,

  3. look for default properties in the module runner definition.

In other words, module runner definitions will only be used if the properties are not set in the config file. File handler properties will be used for all modules, overriding all module runner definitions. Module-specific properties from the config file will override all other definitions, but only for the module in question.
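The three-level lookup can be sketched with the standard-library configparser. This is a hypothetical helper that reads plain strings only (it ignores get_type). The section and option names (SEXTRACTOR_RUNNER, FILE_PATTERN, FILE_EXT, NUMBERING_SCHEME) are illustrative.

```python
from configparser import ConfigParser


def set_module_property(config, module_section, prop, runner_default):
    """Resolve a module property following the documented hierarchy."""
    # 1. A value in the module's own section wins.
    if config.has_option(module_section, prop):
        return config.get(module_section, prop)
    # 2. Otherwise fall back to the [FILE] section, shared by all modules.
    if config.has_option('FILE', prop):
        return config.get('FILE', prop)
    # 3. Finally use the default from the module runner definition.
    return runner_default


config = ConfigParser()
config.read_string("""
[FILE]
FILE_PATTERN = image

[SEXTRACTOR_RUNNER]
FILE_EXT = .fits
""")

print(set_module_property(config, 'SEXTRACTOR_RUNNER', 'FILE_EXT', '.cat'))
```

ConfigParser.has_option returns False for a missing section, so a module with no section of its own falls straight through to [FILE] and then to the runner default.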

_set_module_properties(module, run_name)

Set Module Properties.

Set module properties defined in the module runner wrapper.

Parameters
  • module (str) – Module name

  • run_name (str) – Module run name

_create_module_run_dirs(module, run_name)

Create Module Run Directories.

This method creates the module output directories for a given run.

Parameters
  • module (str) – Module name

  • run_name (str) – Module run name

_set_module_input_dir(module, run_name)

Set Module Input Directory.

Set the module input directory. If the specified module is the first module in the pipeline or does not have any input modules, only the INPUT_DIR from [FILE] is used; otherwise, the output directory of the preceding module is used.

Additional input directories can be specified with INPUT_DIR from [MODULE].

Parameters
  • module (str) – Module name

  • run_name (str) – Module run name; if the module has only been called once, this will be identical to module
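The selection rule above can be sketched as a small standalone function. The arguments are hypothetical stand-ins for state the FileHandler holds internally. The sketch also assumes that extra INPUT_DIR values from the module's own section are appended after the default directories; the real ordering may differ.

```python
def set_module_input_dir(is_first, input_module, file_input_dirs,
                         output_dirs, extra_input_dirs=()):
    """Pick the input directories for a module run (sketch).

    is_first         -- True if this is the first module in the pipeline
    input_module     -- name of the preceding/input module, or None
    file_input_dirs  -- INPUT_DIR values from the [FILE] section
    output_dirs      -- hypothetical mapping: module run name -> output dir
    extra_input_dirs -- INPUT_DIR values from the module's own section
    """
    if is_first or input_module is None:
        # No upstream module: read from the pipeline-wide input directory.
        dirs = list(file_input_dirs)
    else:
        # Otherwise read the preceding module's output directory.
        dirs = [output_dirs[input_module]]
    # Assumption: module-specific directories are appended at the end.
    return dirs + list(extra_input_dirs)


print(set_module_input_dir(False, 'mask_runner', ['/data/input'],
                           {'mask_runner': '/run/mask_runner/output'},
                           ['/data/psf']))
```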

static _generate_re_pattern(match_pattern)

Generate Regular Expression Pattern.

Generate a regular expression pattern from an input string.

Parameters

match_pattern (str) – Pattern string

Returns

Regular expression pattern

Return type

_sre.SRE_Pattern

Raises

TypeError – For invalid input type
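One way to turn a numbering pattern such as "-000-0" into a compiled regular expression is sketched below. It assumes each run of n digits in the pattern stands for exactly n digits and that every other character is matched literally. This is an illustrative technique, not necessarily the rule used by _generate_re_pattern or _get_re.

```python
import re


def generate_re_pattern(match_pattern):
    """Compile a regex from a numbering pattern such as '-000-0' (sketch)."""
    if not isinstance(match_pattern, str):
        raise TypeError('Match pattern must be a string.')
    parts = []
    # Split the pattern into runs of digits and runs of other characters.
    for token in re.findall(r'\d+|\D+', match_pattern):
        if token.isdigit():
            # A run of n digits matches exactly n digits.
            parts.append(rf'\d{{{len(token)}}}')
        else:
            # Any other characters are matched literally.
            parts.append(re.escape(token))
    return re.compile(''.join(parts))


pattern = generate_re_pattern('-000-0')
print(pattern.search('image-123-4.fits').group())
```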

static _strip_dir_from_file(file_name, dir_list)

Strip Directory from File Name.

Remove the directory string from the file name.

Parameters
  • file_name (str) – File name

  • dir_list (list) – Input directory list

Returns

File name

Return type

str
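A minimal sketch of stripping the directory part from a file name, assuming the first matching input directory (with or without a trailing slash) is removed and that a file outside every listed directory is returned unchanged:

```python
def strip_dir_from_file(file_name, dir_list):
    # Remove the first matching input-directory prefix and its separator.
    for dir_name in dir_list:
        prefix = dir_name.rstrip('/') + '/'
        if file_name.startswith(prefix):
            return file_name[len(prefix):]
    # Assumption: names outside every listed directory are left as-is.
    return file_name


print(strip_dir_from_file('/data/in/image-001.fits', ['/other', '/data/in/']))
```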

classmethod _get_re(num_scheme)

Get Regular Expression.

Return the regular expression corresponding to the numbering scheme.

Parameters

num_scheme (str) – Numbering scheme

Raises

ValueError – If num_scheme is None

Returns

Regular expression

Return type

str

_save_num_patterns(dir_list, re_pattern, pattern, ext, output_file)

Save Number Patterns.

Save file number patterns to a NumPy binary file, update the file patterns and get the correct file paths.

Parameters
  • dir_list (list) – List of input directories

  • re_pattern (str) – Regular expression pattern

  • pattern (str) – File pattern

  • ext (str) – File extension

  • output_file (str) – Output file name

static _save_match_patterns(output_file, mmap_list)

Save Match Patterns.

Save matching number patterns to a NumPy binary file.

Parameters
  • output_file (str) – Output file name

  • mmap_list (list) – List of memory maps

static _get_file_name(path, pattern, number, ext)

Get File Name.

Get file name corresponding to the path, file pattern, number pattern and file extension.

Parameters
  • path (str) – Path to file

  • pattern (str) – File pattern

  • number (str) – Number pattern

  • ext (str) – File extension

Returns

File name

Return type

str
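Assuming the file name is simply the file pattern, the number pattern and the extension concatenated inside path, the construction can be sketched as:

```python
import os


def get_file_name(path, pattern, number, ext):
    # File name = pattern + number + extension, placed inside path.
    return os.path.join(path, f'{pattern}{number}{ext}')


print(get_file_name('/data/input', 'image', '-000-001', '.fits'))
```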

static _remove_mmaps(mmap_list)

Remove Memory Maps.

Remove memory map files in input list.

Parameters

mmap_list (list or str) – List of memory map files
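Since mmap_list may be either a list or a single string, a sketch of the removal step normalises the input first. Silently skipping files that no longer exist is an assumption made here, not documented behaviour.

```python
import os
import tempfile


def remove_mmaps(mmap_list):
    # Accept a single file name as well as a list of names.
    if isinstance(mmap_list, str):
        mmap_list = [mmap_list]
    for mmap_file in mmap_list:
        # Assumption: files that were already removed are skipped.
        if os.path.isfile(mmap_file):
            os.remove(mmap_file)


# Usage: create two empty placeholder files, then remove them.
tmp_dir = tempfile.mkdtemp()
files = [os.path.join(tmp_dir, f'mmap_{i}.npy') for i in range(2)]
for name in files:
    open(name, 'w').close()
remove_mmaps(files)
remove_mmaps(files[0])  # already gone, so nothing happens
```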

_format_process_list(patterns, memory_map, re_pattern, num_scheme, run_method)

Format Process List.

Format the list of files to be processed.

Parameters
  • patterns (list) – List of file patterns

  • memory_map (str) – Name of memory map file

  • re_pattern (str) – Regular expression for numbering scheme

  • num_scheme (str) – Numbering scheme

  • run_method (str) – Run method

Returns

List of processes

Return type

list
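The general idea of a process list can be sketched as: keep only the numbers found for every file pattern, then pair each number with one file per pattern. This simplified, hypothetical version takes in-memory number lists instead of a memory-map file and ignores re_pattern, num_scheme and run_method, so it should be read as an illustration only.

```python
import os


def format_process_list(dir_path, patterns, number_lists, ext_list):
    """Build one process entry per number shared by all patterns (sketch)."""
    # Numbers present for every file pattern (set intersection).
    common = set(number_lists[0]).intersection(*number_lists[1:])
    process_list = []
    for number in sorted(common):
        # Each process: the number string followed by one file per pattern.
        files = [
            os.path.join(dir_path, f'{pattern}{number}{ext}')
            for pattern, ext in zip(patterns, ext_list)
        ]
        process_list.append([number] + files)
    return process_list


result = format_process_list('/data', ['image', 'weight'],
                             [['-001', '-002'], ['-002', '-003']],
                             ['.fits', '.fits'])
print(result)
```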

_save_process_list(dir_list, pattern_list, ext_list, num_scheme, run_method)

Save Process List.

Save the list of processes to a NumPy binary file.

Parameters
  • dir_list (list) – List of input directories

  • pattern_list (list) – List of file patterns

  • ext_list (list) – List of file extensions

  • num_scheme (str) – Numbering scheme

  • run_method (str) – Run method

remove_process_mmap()

Remove Process MMAP.

Remove process list memory map.

_get_module_input_files(module, run_name)

Get Module Input Files.

Retrieve the module input file names from the input directory.

Parameters
  • module (str) – Module name

  • run_name (str) – Module run name

set_up_module(module)

Set Up Module.

Set up module parameters for file handler.

Parameters

module (str) – Module name

get_worker_log_name(module, file_number_string)

Get Worker Log Name.

This method generates a worker log name.

Parameters
  • module (str) – Module name

  • file_number_string (str) – File numbering in output

Returns

Worker log file name

Return type

str