autowisp.hdf5_file module

Class Inheritance Diagram

Inheritance diagram of ABC, BytesIO, HDF5File, HDF5LayoutError

Define a class for working with HDF5 files.

class autowisp.hdf5_file.HDF5File(fname=None, mode=None, layout_version=None, **kwargs)[source]

Bases: ABC, File


Base class for HDF5 pipeline products.

The actual structure of the file has to be defined by a class inheriting from this one, by overwriting the relevant properties and _get_root_tag_name().

Implements backwards compatibility for different versions of the structure of files.
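The override pattern can be sketched with a stand-in base class (HDF5FileBase and LightCurveFile are hypothetical names used only for illustration; the real HDF5File also derives from h5py.File and defines many more hooks):

```python
from abc import ABC, abstractmethod

class HDF5FileBase(ABC):
    """Stand-in illustrating the overrides HDF5File requires."""

    @classmethod
    @abstractmethod
    def _get_root_tag_name(cls):
        """The name of the root tag in the layout configuration."""

    @classmethod
    @abstractmethod
    def _product(cls):
        """The pipeline key of the product held in the file."""

class LightCurveFile(HDF5FileBase):
    """Hypothetical concrete product class."""

    @classmethod
    def _get_root_tag_name(cls):
        return 'light_curve'

    @classmethod
    def _product(cls):
        return 'lc'
```

Attempting to instantiate a subclass that has not supplied both methods fails with TypeError, which is how the abstract base enforces the contract.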

_file_structure

See the first entry returned by get_file_structure.

_file_structure_version

See the second entry returned by get_file_structure.

_hat_id_prefixes

A list of the currently recognized HAT-ID prefixes, with the correct data type ready for adding as a dataset.

Type:

numpy.array

__init__(fname=None, mode=None, layout_version=None, **kwargs)[source]

Opens the given HDF5 file in the given mode.

Parameters:
  • fname – The name of the file to open.

  • mode – The mode to open the file in (see h5py.File).

  • layout_version – If the file does not exist, this is the version of the layout that will be used for its structure. Leave None to use the latest defined.

  • kwargs – Any additional arguments. Passed directly to h5py.File.

Returns:

None

_add_repack_dataset(dataset_path)[source]

Add the given dataset to the list of datasets to repack.

_flag_required_attribute_parents()[source]

Flag attributes whose parents must exist when adding the attribute.

The file structure must be fully configured before calling this method!

If the parent is a group, it is safe to create it and then add the attribute; however, this is not the case for attributes of datasets.

Adds an attribute named ‘parent_must_exist’ to all attribute configurations in self._file_structure, set to False if and only if the attribute’s parent is a group.
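The flagging logic can be sketched over a dict-based stand-in for the file structure (the real configuration objects come from database.data_model; the keys and dicts below are assumptions for illustration):

```python
# Hypothetical parsed file structure: each attribute entry records
# the HDF5 type of its parent element.
file_structure = {
    'srcproj.columns': {'type': 'dataset'},
    'srcproj.software_versions': {'type': 'attribute',
                                  'parent_type': 'group'},
    'srcextract.sources.cfg': {'type': 'attribute',
                               'parent_type': 'dataset'},
}

def flag_required_attribute_parents(structure):
    """Set parent_must_exist: False only for attributes on groups.

    Groups can be created on demand before attaching the attribute;
    datasets cannot (their shape/dtype are not known at that point),
    so attributes on datasets require the parent to already exist.
    """
    for config in structure.values():
        if config['type'] == 'attribute':
            config['parent_must_exist'] = (config['parent_type']
                                           != 'group')
```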

abstractmethod classmethod _get_root_tag_name()[source]

The name of the root tag in the layout configuration.

abstractmethod classmethod _product()[source]

The pipeline key of the product held in this type of HDF5 file.

static _replace_nonfinite(data, expected_dtype, replace_nonfinite)[source]

Return (copy of) data with non-finite values replaced.
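The replacement step itself is simple; a pure-Python sketch of the idea (the real static method works on numpy arrays and also honors the expected_dtype argument):

```python
import math

def replace_nonfinite(data, replacement):
    """Return a copy of data with NaN/inf entries replaced."""
    return [replacement if not math.isfinite(value) else value
            for value in data]
```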

_write_text_to_dataset(dataset_key, text, if_exists='overwrite', **substitutions)[source]

Adds ASCII text or the contents of a file as a dataset to an HDF5 file.

Parameters:
  • dataset_key – The key identifying the dataset to add.

  • text – The text or file to add. If it is an open file, its contents are dumped; if it is a bytes string, the value is stored directly.

  • if_exists – See add_dataset().

  • substitutions – Any arguments that should be substituted in the dataset path.

Returns:

None
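Normalizing the two accepted forms of the text argument (an open file or a bytes value) to the bytes that get stored can be sketched as:

```python
import io

def normalize_text(text):
    """Return the bytes to store: dump open files, pass bytes through."""
    if hasattr(text, 'read'):
        contents = text.read()
        if isinstance(contents, bytes):
            return contents
        return contents.encode('ascii')
    return text

# Both an open (in-memory) file and a bytes value yield the same result:
from_file = normalize_text(io.BytesIO(b'SIMPLE  =  T'))
from_bytes = normalize_text(b'SIMPLE  =  T')
```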

add_attribute(attribute_key, attribute_value, if_exists='overwrite', **substitutions)[source]

Adds a single attribute to a dataset or a group.

Parameters:
  • attribute_key – The key in _destinations that corresponds to the attribute to add. If the key is not one of the recognized keys, the file is not modified and the function silently exits.

  • attribute_value – The value to give the attribute.

  • if_exists

    What should be done if the attribute exists? Possible values are:

    • ignore: do not update, but return the attribute’s existing value.

    • overwrite: change the value to the specified one.

    • error: raise an exception.

  • substitutions – variables to substitute in HDF5 paths and names.

Returns:

The value of the attribute. May differ from attribute_value if the attribute already exists, if type conversion is performed, or if the file structure does not specify a location for the attribute. In the latter case the result is None.

Return type:

unknown
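The documented if_exists policy can be sketched against a plain dict standing in for the parent’s attributes (the function and dict are stand-ins for illustration, not pipeline code):

```python
def set_attribute(attributes, name, value, if_exists='overwrite'):
    """Apply the if_exists policy; return the attribute's stored value."""
    if name in attributes:
        if if_exists == 'ignore':
            return attributes[name]      # keep and return the old value
        if if_exists == 'error':
            raise OSError(f'attribute {name!r} already exists')
    attributes[name] = value             # create or overwrite
    return value

attrs = {'version': 1}
kept = set_attribute(attrs, 'version', 2, if_exists='ignore')
```

This mirrors why the return value can differ from attribute_value: with if_exists='ignore' the pre-existing value is returned instead.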

add_dataset(dataset_key, data, *, if_exists='overwrite', unlimited=False, shape=None, dtype=None, **substitutions)[source]

Adds a single dataset to self.

If the target dataset already exists, it is deleted first and the name of the dataset is added to the root level Repack attribute.

Parameters:
  • dataset_key – The key identifying the dataset to add.

  • data – The values that should be written, a numpy array with an appropriate data type or None if an empty dataset should be created.

  • if_exists – See same name argument to add_attribute.

  • unlimited (bool) – Should the first dimension of the dataset be unlimited (i.e. data can be added later)?

  • shape (tuple(int,...)) – The shape of the dataset to create if data is None, otherwise the shape of the data is used. Just like if data is specified, the first dimension will be ignored if unlimited is True. It is an error to specify both data and shape!

  • dtype – The data type for the new dataset if the data is None. It is an error to specify both dtype and data!

  • substitutions – Any arguments that should be substituted in the dataset path.

Returns:

None
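The constraints among data, shape, dtype and unlimited can be sketched as a helper resolving the shape arguments that would be handed to h5py (a sketch under the assumption that an unlimited first dimension maps to h5py’s maxshape=(None, ...) convention):

```python
def resolve_creation_shape(data_shape=None, shape=None, dtype=None,
                           unlimited=False):
    """Resolve the shape/maxshape pair for creating the dataset."""
    data_given = data_shape is not None
    if data_given and shape is not None:
        raise ValueError('specify either data or shape, not both')
    if data_given and dtype is not None:
        raise ValueError('specify either data or dtype, not both')
    result_shape = tuple(data_shape if data_given else shape or ())
    # In h5py an unlimited first dimension is maxshape=(None, ...):
    maxshape = (None,) + result_shape[1:] if unlimited else None
    return result_shape, maxshape
```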

add_file_dump(dataset_key, fname, if_exists='overwrite', delete_original=True, **substitutions)[source]

Adds a byte by byte dump of a file to self.

If the file does not exist, an empty dataset is created.

Parameters:
  • fname – The name of the file to dump.

  • dataset_key – Passed directly to dump_file_like.

  • if_exists – See same name argument to add_attribute.

  • delete_original – If True, the file being dumped is deleted (default).

  • substitutions – variables to substitute in the dataset HDF5 path.

Returns:

None.
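The dump-then-delete behaviour, including the empty dataset for a missing file, can be sketched with stdlib tools (a sketch of the semantics, not the pipeline code):

```python
import os
import tempfile

def dump_file(fname, delete_original=True):
    """Return the file's bytes (empty if missing); optionally remove it."""
    if not os.path.exists(fname):
        return b''                      # empty dataset for a missing file
    with open(fname, 'rb') as dumped:
        contents = dumped.read()
    if delete_original:
        os.remove(fname)                # default: delete after dumping
    return contents

with tempfile.NamedTemporaryFile(delete=False) as temp_file:
    temp_file.write(b'astrometry log')
    temp_name = temp_file.name
```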

Adds a soft link to the HDF5 file.

Parameters:
  • link_key – The key identifying the link to create.

  • if_exists – See same name argument to add_attribute().

  • substitutions – variables to substitute in HDF5 paths and names, both where the link should be placed and where it should point to.

Returns:

The path the identified link points to (see the if_exists argument for how the value can be determined), or None if the link was not created (i.e. it is not defined in the current file structure).

Return type:

str

Raises:

IOError – if an object with the same name as the link exists but is not a link, or is a link that does not point to the configured target and if_exists == ‘error’.

check_for_dataset(dataset_key, must_exist=True, **substitutions)[source]

Check that the given key identifies a dataset and that it actually exists.

Parameters:
  • dataset_key – The key identifying the dataset to check for.

  • must_exist – If True, and the dataset does not exist, raise IOError.

  • substitutions – Any arguments that should be substituted in the path. Only required if must_exist == True.

Returns:

None

Raises:
  • KeyError – If the specified key is not in the currently set file structure or does not identify a dataset.

  • IOError – If the dataset does not exist but the must_exist argument is True.

static collect_columns(destination, name_head, name_tail, dset_name, values)[source]

If the dataset is 1-D and its name starts and ends as given, add it to destination.

This function is intended to be passed to h5py.Group.visititems() after fixing the first 3 arguments using functools.partial.

Parameters:
  • destination (pandas.DataFrame) – The DataFrame to add matching datasets to. Datasets are added with column names given by the part of the name between name_head and name_tail.

  • name_head (str) – Only datasets whose names start with this will be included.

  • name_tail (str) – Only datasets whose names end with this will be included.

  • dset_name (str) – The name of the dataset.

  • values (array-like) – The values to potentially add as the new column.

Returns:

None
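The name matching and column extraction can be sketched with a dict as the destination (the real function targets a pandas.DataFrame; the partial-binding step before handing the visitor to h5py.Group.visititems() is shown as documented):

```python
from functools import partial

def collect_columns(destination, name_head, name_tail, dset_name, values):
    """Add values to destination if dset_name matches head/tail.

    (The 1-D check on the real h5py dataset is omitted here.)
    """
    if not (dset_name.startswith(name_head)
            and dset_name.endswith(name_tail)):
        return
    # Column name: the part between name_head and name_tail.
    column = dset_name[len(name_head):len(dset_name) - len(name_tail)]
    destination[column] = values

# Hypothetical dataset names, used only to show the matching:
columns = {}
visitor = partial(collect_columns, columns, 'apphot/magfit/', '.magnitude')
visitor('apphot/magfit/iter000.magnitude', [10.3, 10.5])
visitor('apphot/raw.magnitude', [11.0, 11.1])   # head mismatch: skipped
```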

delete_attribute(attribute_key, **substitutions)[source]

Delete the given attribute.

delete_columns(parent, name_head, name_tail, dset_name)[source]

Delete 1D datasets under parent if name starts and ends as given.

delete_dataset(dataset_key, **substitutions)[source]

Delete obsolete HDF5 dataset if it exists and update repacking flag.

Parameters:

dataset_key – The key identifying the dataset to delete.

Returns:

Was a dataset actually deleted?

Return type:

bool

Raises:

Error.HDF5 – if an entry already exists at the target dataset’s location but is not a dataset.

Delete the link corresponding to the given key.

dump_file_or_text(dataset_key, file_contents, if_exists='overwrite', **substitutions)[source]

Adds a byte-by-byte dump of a file-like object to self.

Parameters:
  • dataset_key – The key identifying the dataset to create for the file contents.

  • file_contents – See text argument to _write_text_to_dataset(). None is also a valid value, in which case an empty dataset is created.

  • if_exists – See same name argument to add_attribute.

  • substitutions – variables to substitute in the dataset HDF5 path.

Returns:

Was the dataset actually created?

Return type:

(bool)

abstract property elements

Identifying strings for the recognized elements of the HDF5 file.

Should be a dictionary-like object whose values are sets of strings containing the identifiers of the HDF5 elements, with the following keys:

  • dataset: Identifiers for the datasets that could be included in the file.

  • attribute: Identifiers for the attributes that could be included in the file.

  • link: Identifiers for the links that could be included in the file.

get_attribute(attribute_key, default_value=None, **substitutions)[source]

Returns the attribute identified by the given key.

Parameters:
  • attribute_key – The key of the attribute to return. It must be one of the standard keys.

  • default_value – If this is not None, this value is returned if the attribute does not exist in the file; if None, not finding the attribute raises IOError.

  • substitutions – Any keys that must be substituted in the path (i.e. ap_ind, config_id, …).

Returns:

The value of the attribute.

Return type:

value

Raises:
  • KeyError – If no attribute with the given key is defined in the current file structure, or if the key does not correspond to an attribute.

  • IOError – If the requested attribute is not found and no default value was given.

get_dataset(dataset_key, expected_shape=None, default_value=None, **substitutions)[source]

Return a dataset as a numpy float or int array.

Parameters:
  • dataset_key – The key in self._destinations identifying the dataset to read.

  • expected_shape – The shape to use for the dataset if an empty dataset is found. If None, a zero-sized array is returned.

  • default_value – If the dataset does not exist, this value is returned.

  • substitutions – Any arguments that should be substituted in the path.

Returns:

A numpy int/float array containing the identified dataset from the HDF5 file.

Return type:

numpy.array

Raises:
  • KeyError – If the specified key is not in the currently set file structure or does not identify a dataset.

  • IOError – If the dataset does not exist and no default_value was specified.

get_dataset_creation_args(dataset_key, **path_substitutions)[source]

Return all arguments to pass to create_dataset() except the content.

Parameters:
  • dataset_key – The key identifying the dataset whose creation arguments to return.

  • path_substitutions – In theory the dataset creation arguments can depend on the full dataset path (cf. srcextract.sources).

Returns:

All arguments to pass to create_dataset() or require_dataset() except: name, shape and data.

Return type:

dict

get_dataset_shape(dataset_key, **substitutions)[source]

Return the shape of the given dataset.

get_dtype(element_key)[source]

Return the numpy data type for the element identified by the given key.

get_element_path(element_id, **substitutions)[source]

Return the path to the given element (.<attr> for attributes).

Parameters:

substitutions – Arguments that should be substituted in the path. If none are given, the path is returned without substitutions.

Returns:

A string giving the path the element does/will have in the file.

Return type:

str
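The substitution step can be sketched with a format-style path template (the placeholder syntax and the template below are assumptions for illustration; the real paths come from the configured file structure):

```python
def element_path(template, **substitutions):
    """Substitute any provided variables into the path template."""
    if not substitutions:
        return template      # returned unsubstituted, as documented
    return template.format(**substitutions)

# Hypothetical template using a documented substitution key (ap_ind):
path = element_path('/Photometry/Aperture{ap_ind:03d}/Magnitude', ap_ind=5)
```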

classmethod get_element_type(element_id)[source]

Return the type of HDF5 entry that corresponds to the given ID.

Parameters:

element_id – The identifying string for an element present in the HDF5 file.

Returns:

The type of HDF5 structure to create for this element.

One of: ‘group’, ‘dataset’, ‘attribute’, ‘link’.

Return type:

hdf5_type

abstractmethod classmethod get_file_structure(version=None)[source]

Return the layout structure with the given version of the file.

Parameters:

version – The version number of the layout structure to set. If None, it should provide the default structure for new files (presumably the latest version).

Returns:

The dictionary specifies how to include elements in the HDF5 file. Its keys should come from the lists in self.elements, and each value is an object with attributes describing how to include the element. See the classes in :mod:database.data_model for the provided attributes and their meaning.

The string is the actual file structure version returned; the same as version if version is not None.

Return type:

(dict, str)
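The version handling can be sketched as a lookup that falls back to the latest defined layout when version is None (the registry and its contents are hypothetical):

```python
# Hypothetical layout registry: version number -> structure mapping.
_layouts = {
    1: {'apphot.magnitude': 'config-v1'},
    2: {'apphot.magnitude': 'config-v2',
        'apphot.magnitude_error': 'config-v2'},
}

def get_file_structure(version=None):
    """Return (structure, actual_version), defaulting to the latest."""
    if version is None:
        version = max(_layouts)     # latest defined layout
    return _layouts[version], version
```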

static hdf5_class_string(hdf5_class)[source]

Return a string identifier of the given hdf5 class.

layout_to_xml()[source]

Create an etree.Element describing the currently defined layout.

read_fitsheader_from_dataset(dataset_key, **substitutions)[source]

Reads a FITS header from an HDF5 dataset.

The inverse of write_fitsheader_to_dataset().

Parameters:

dataset_key – The key identifying the dataset containing the header to read.

Returns:

The FITS header contained in the given dataset.

Return type:

fits.Header

write_fitsheader_to_dataset(dataset_key, fitsheader, **kwargs)[source]

Adds a FITS header to an HDF5 file as a dataset.

Parameters:
  • dataset_key (str) – The key identifying the dataset to add the header to.

  • fitsheader (fits.Header) – The header to save.

  • kwargs – Passed directly to _write_text_to_dataset().

Returns:

None
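A FITS header is plain ASCII laid out in fixed-width 80-character cards, so storing and recovering one reduces to a bytes round trip; a stdlib sketch of the idea (the real methods use astropy's fits.Header and go through _write_text_to_dataset()):

```python
def header_to_bytes(cards):
    """Serialize header cards as fixed-width 80-character ASCII records."""
    return b''.join(card.ljust(80).encode('ascii') for card in cards)

def bytes_to_header(blob):
    """Invert header_to_bytes: split into cards and strip the padding."""
    return [blob[start:start + 80].decode('ascii').rstrip()
            for start in range(0, len(blob), 80)]

cards = ['SIMPLE  =                    T', 'NAXIS   =                    2']
```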