superphot_pipeline.hdf5_file module¶
Class Inheritance Diagram¶

Define a class for working with HDF5 files.
-
class
superphot_pipeline.hdf5_file.
HDF5File
(fname, mode, project_id=None, db_config_version=None)[source]¶ Bases: abc.ABC, h5py._hl.files.File
Base class for HDF5 pipeline products.
Supports defining the structure from the database or an XML file, as well as generating markdown and XML files describing the structure.
Implements backwards compatibility for different versions of the structure of files.
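A minimal usage sketch, assuming a hypothetical concrete subclass (real pipeline products implement the abstract members that HDF5File leaves undefined; the file name is invented):

    from superphot_pipeline.hdf5_file import HDF5File

    class DataReductionFile(HDF5File):
        """Hypothetical concrete product; the real pipeline supplies
        the abstract structure definitions omitted here."""

    # Behaves like an h5py.File, with pipeline bookkeeping on top.
    with DataReductionFile('example.h5', 'a', project_id=0) as data_file:
        print(data_file.filename)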
-
__init__
(fname, mode, project_id=None, db_config_version=None)[source]¶ Opens the given HDF5 file in the given mode.
-
classmethod
_add_paths
(xml_part, parent_path='/', parent_type='group')[source]¶ Add the paths in a part of an XML document.
Parameters: - xml_part – A part of an XML document to parse (parsed through xml.dom.minidom).
- parent_path – The path under which xml_part lives.
- parent_type – The type of entry the parent is (‘group’ or ‘dataset’).
Returns: None
-
_delete_obsolete_dataset
(parent, name, logger=None, log_extra={})[source]¶ Delete an obsolete HDF5 dataset if it exists and update the repacking flag.
Parameters: - parent – The parent group this entry belongs to.
- name – The name of the entry to check and delete. If the entry is not a dataset, an error is raised.
- logger – An object to issue log messages to.
- log_extra – Extra information to add to log messages.
Returns: None
Raises: Error.HDF5 – if an entry with the given name exists under parent, but is not a dataset.
-
_wiki_other_version_links
(version_list, target_version)[source]¶ Return the text to add to the wiki linking to other configuration versions.
-
add_attribute
(attribute_key, attribute_value, attribute_dtype=None, if_exists='overwrite', logger=None, log_extra={}, **substitutions)[source]¶ Adds a single attribute to a dataset or a group.
Parameters: - attribute_key – The key in _destinations that corresponds to the attribute to add. If the key is not one of the recognized keys, h5file is not modified and the function silently exits.
- attribute_value – The value to give the attribute.
- attribute_dtype – Data type for the new attribute, None to determine automatically.
- if_exists –
What should be done if the attribute already exists? Possible values are:
- ignore: Do not update, but return the attribute’s existing value.
- overwrite: Change the value to the specified one.
- error: Raise an exception.
- logger – An object to pass log messages to.
- log_extra – Extra information to attach to the log messages.
- substitutions – variables to substitute in HDF5 paths and names.
Returns: None.
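A hedged sketch of adding an attribute, reusing the hypothetical data_file handle from the sketch above; the attribute key and the ap_ind substitution value are invented:

    # Both the attribute key and the ap_ind substitution are invented.
    data_file.add_attribute(
        'srcextract.software_versions',  # must be a recognized key
        'SExtractor 2.25.0',
        if_exists='overwrite',
        ap_ind=3,                        # substituted into the HDF5 path
    )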
-
add_file_dump
(fname, destination, link_name=False, delete_original=True, logger=None, external_log_extra={})[source]¶ Adds a byte-by-byte dump of a file to the data reduction file.
If the file does not exist, an empty dataset is created.
Parameters: - fname – The name of the file to dump.
- destination – Passed directly to dump_file_like.
- link_name – Passed directly to dump_file_like.
- delete_original – If True, the file being dumped is deleted (default).
- logger – An object to emit log messages to.
- external_log_extra – extra information to add to log message.
Returns: None.
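An illustrative call, again with the hypothetical data_file handle; the file name and destination path are invented:

    # 'astrom.cfg' and the destination path are invented.
    data_file.add_file_dump(
        'astrom.cfg',
        destination='/Dumps/AstromConfig',
        delete_original=True,            # remove astrom.cfg after dumping
    )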
-
add_link
(target, name, logger=None, log_extra={})[source]¶ Adds a soft link to the HDF5 file.
Parameters: - target – The path to create a soft link to.
- name – The name to give to the link. Overwritten if it exists and is a link.
Returns: None
Raises: Error.HDF5 – if an object with the same name as the link exists, but is not a link.
-
add_single_dataset
(parent, name, data, creation_args, replace_nonfinite=None, logger=None, log_extra={}, **kwargs)[source]¶ Adds a single dataset to self.
If the target dataset already exists, it is deleted first and the name of the dataset is added to the root level Repack attribute.
Parameters: - parent – The full path of the group under which to place the new dataset (created if it does not exist).
- name – The name of the dataset.
- data – The values that should be written, a numpy array with appropriate type already set.
- creation_args – Additional arguments to pass to the create_dataset method.
- replace_nonfinite – If not None, any non-finite values are replaced with this value; it is also used as the fill value for the dataset.
- logger – An object to send log messages to.
- log_extra – Extra information to add to log messages.
- kwargs – Ignored.
Returns: None
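A sketch of writing a float dataset with non-finite values replaced by a sentinel, reusing the hypothetical data_file handle; the group path, dataset name, and sentinel are invented:

    import numpy

    # Group path, dataset name, and sentinel value are invented.
    magnitudes = numpy.array([12.3, numpy.nan, 12.7])
    data_file.add_single_dataset(
        parent='/Photometry/Aperture003',
        name='Magnitudes',
        data=magnitudes,
        creation_args=dict(compression='gzip', compression_opts=9),
        replace_nonfinite=-9999.0,       # also used as the fill value
    )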
-
classmethod
configure_from_db
(db, target_project_id=0, target_version=None, datatype_from_db=False, save_to_file=None, update_trac=False, generate_markdown=False)[source]¶ Reads the structure of the file from the database.
Parameters: - db – An instance of CalibDB connected to the calibration database.
- target_project_id – The project ID to configure for (falls back to the configuration for project_id=0 if no configuration is found for the requested value). Default: 0.
- target_version – The configuration version to set as default. If None (default), the largest configuration value found is used.
- datatype_from_db – Should the information about data type be read from the database?
- save_to_file – If not None, a file with the given name is created containing an XML representation of the HDF5 file structure constructed from the database.
- generate_markdown – Generates markdown files suitable for committing to a GitHub repository as documentation. If given, this argument should be a directory where the files should be saved. Otherwise, it should be something that evaluates as False.
Returns: None
-
classmethod
configure_from_xml
(xml, project_id=0, make_default=False)[source]¶ Defines the file structure from an xml.dom.minidom document.
Parameters: - xml – The xml.dom.minidom document defining the structure.
- project_id – The project ID this configuration applies to.
- make_default – Should this configuration be saved as the default one?
Returns: None
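A sketch of loading a layout; 'hdf5_structure.xml' is an invented file name, and its root tag must match what get_layout_root_tag_name() returns for the hypothetical DataReductionFile subclass:

    from xml.dom.minidom import parse

    # Parse the layout and register it as the default for project 0.
    layout = parse('hdf5_structure.xml')
    DataReductionFile.configure_from_xml(layout,
                                         project_id=0,
                                         make_default=True)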
-
default_destinations
¶ Dictionary of where to place newly created elements in the HDF5 file.
There is an entry for all non-group elements as defined by self.get_element_type(). Each entry is a dictionary:
- parent: The path to the parent group/dataset where the new entry will be created.
- parent_type: The type of the parent - ‘group’ or ‘dataset’.
- name: The name to give to the new dataset. It may contain a substitution of %(ap_ind)?.
- creation_args: For datasets only. Should specify additional arguments for the create_dataset method.
- replace_nonfinite: For floating point datasets only. Specifies a value with which to replace any non-finite dataset entries before writing to the file (a workaround for the scaleoffset filter’s problem with non-finite values). After extracting a dataset, any values found to equal this are replaced by not-a-number.
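An illustrative sketch of the shape of one entry; the key, paths, and values are invented, and only the dictionary structure follows the description above:

    # Invented entry; the substitution syntax in 'name' is assumed.
    default_destinations = {
        'srcextract.sources': {
            'parent': '/SourceExtraction',
            'parent_type': 'group',
            'name': 'Sources%(ap_ind)d',
            'creation_args': {'compression': 'gzip',
                              'compression_opts': 9},
            'replace_nonfinite': -8e8,   # sentinel for non-finite values
        },
    }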
-
destination_versions
¶
-
destinations
¶ Specifies the destinations for self.elements in the current file.
See self.default_destinations.
-
dump_file_like
(file_like, destination, link_name=False, logger=None, external_log_extra={}, log_dumping=True)[source]¶ Adds a byte-by-byte dump of a file-like object to self.
Parameters: - file_like – A file-like object to dump.
- destination – The path in self to use for the dump.
- link_name – If this argument evaluates as True, a link with the given name is created pointing to destination.
- logger – An object to emit log messages to.
- external_log_extra – extra information to add to log message.
Returns: None.
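A sketch of dumping an in-memory file-like object, reusing the hypothetical data_file handle; the destination path and link name are invented:

    import io

    # Destination path and link name are invented.
    log_bytes = io.BytesIO(b'frame 001: astrometry converged\n')
    data_file.dump_file_like(
        log_bytes,
        destination='/Dumps/AstrometryLog',
        link_name='LatestAstrometryLog',
    )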
-
element_uses
¶ A dictionary specifying what each dataset or property is used for.
This structure has two keys, ‘dataset’ and ‘attribute’, each of which should contain a dictionary whose keys are self.elements[‘dataset’] or self.elements[‘attribute’] respectively, and whose values are lists of strings specifying uses (only needed for generating documentation).
-
elements
¶ Identifying strings for the recognized elements of the HDF5 file.
Should be a dictionary-like object with values being a set of strings containing the identifiers of the HDF5 elements and keys:
- dataset: Identifiers for the datasets that could be included in the file.
- attribute: Identifiers for the attributes that could be included in the file.
- link: Identifiers for the links that could be included in the file.
-
generate_wiki
(xml_part, current_indent='', format_for='TRAC')[source]¶ Returns the part of the wiki corresponding to a part of the XML tree.
Parameters: - xml_part – The part of the XML tree to wikify.
- current_indent – The indent to use for the root element of xml_part.
Returns: A python string with the wiki text to add (newlines and all).
Return type: wiki
-
get_attribute
(attribute_key, default_value=None, **substitutions)[source]¶ Returns the attribute identified by the given key.
Parameters: - attribute_key – The key of the attribute to return. It must be one of the standard keys.
- default_value – If this is not None, this value is returned if the attribute does not exist in the file; if None, not finding the attribute raises Error.Sanity.
- substitutions – Any keys that must be substituted in the path (e.g. ap_ind, config_id, ...).
Returns: The value of the attribute.
Return type: value
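A sketch of reading an attribute with a fallback, reusing the hypothetical data_file handle and the same invented key as above:

    # default_value avoids Error.Sanity when the attribute is missing.
    versions = data_file.get_attribute(
        'srcextract.software_versions',
        default_value='unknown',
        ap_ind=3,
    )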
-
classmethod
get_documentation_order
(element_type, element_id)[source]¶ Return a sorting key for the elements in a documentation level.
Parameters: - element_type – The type of entry this element corresponds to in the HDF5 file (group, dataset, attribute or link).
- element_id – The identifying string of this element.
Returns: An integer such that elements for which lower values are returned should appear before elements with higher values in the documentation if the two are on the same level.
Return type: sort_key
-
classmethod
get_element_type
(element_id)[source]¶ Return the type of HDF5 entry that corresponds to the given ID.
Parameters: element_id – The identifying string for an element present in the HDF5 file. Returns: The type of HDF5 structure to create for this element. One of: ‘group’, ‘dataset’, ‘attribute’, ‘link’.
Return type: hdf5_type
-
get_file_dump
(dump_key)[source]¶ Returns a previously dumped file as a string (with a name attribute).
Parameters: dump_key – The key in self._destinations identifying the file to extract. Returns: The text of the dumped file. Return type: dump
-
static
get_hdf5_dtype
(dtype_string, hdf5_element_type)[source]¶ Return the dtype argument to use when creating an HDF5 entry.
Parameters: - dtype_string – The string from the XML or database configuration specifying the type.
- hdf5_element_type – The type of element being created - ‘dataset’ or ‘attribute’.
Returns: Whatever should be passed as the dtype argument when creating the given entry in the HDF5 file.
Return type: dtype
-
classmethod
get_layout_root_tag_name
()[source]¶ The name of the root tag in the layout configuration.
-
get_single_dataset
(dataset_key, sub_entry=None, expected_shape=None, optional=None, **substitute)[source]¶ Return a single dataset as a numpy float or int array.
Parameters: - dataset_key – The key in self._destinations identifying the dataset to read.
- sub_entry – If the dataset_key does not identify a single dataset, this value is used to select from among the multiple possible datasets (e.g. ‘field’ or ‘source’ for source IDs).
- expected_shape – The shape to use for the dataset if an empty dataset is found. If None, a zero-sized array is returned.
- optional – If not None and the dataset does not exist, this value is returned, otherwise if the dataset does not exist an exception is raised.
- substitute – Any arguments that should be substituted in the path (e.g. ap_ind or config_id).
Returns: A numpy int/float array containing the identified dataset from the HDF5 file.
Return type: data
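A sketch of reading a dataset by key, reusing the hypothetical data_file handle; the key, sub_entry, and substitution are invented:

    import numpy

    # Key, sub_entry, and substitution are invented.
    source_ids = data_file.get_single_dataset(
        'srcextract.source_ids',
        sub_entry='field',
        optional=numpy.empty(0, dtype=int),  # returned if dataset absent
        ap_ind=3,
    )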
-
classmethod
get_version_dtype
(element_id, version=None)[source]¶ What get_dtype would return for LC configured with the given version.
Parameters: - element_id – The string identifier for the quantity to return the data type for.
- version – The structure version for which to return the data type. If None, uses the latest configured version.
Returns: A numpy style data type to use for the quantity in LCs.
Return type: dtype
-
static
read_fitsheader_from_dataset
(h5dset)[source]¶ Reads a FITS header from an HDF5 dataset.
The inverse of fitsheader_to_dataset().
Parameters: h5dset – The dataset containing the header to read. Returns: Instance of fits.Header. Return type: header
-
static
read_text_from_dataset
(h5dset, as_file=False)[source]¶ Reads text from an HDF5 dataset.
The inverse of text_to_dataset().
Parameters: h5dset – The dataset containing the text to read. Returns: Numpy byte array (dtype=’i1’) containing the text. Return type: text
-
static
write_fitsheader_to_dataset
(fitsheader, *args, **kwargs)[source]¶ Adds a FITS header to an HDF5 file as a dataset.
Parameters: - fitsheader – The header to save (fits.Header instance).
- args – Passed directly to text_to_dataset().
- kwargs – Passed directly to text_to_dataset().
Returns: None
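A round-trip sketch for the two FITS-header helpers; the header contents, file name, and dataset path are invented:

    from astropy.io import fits
    import h5py
    from superphot_pipeline.hdf5_file import HDF5File

    # File and dataset paths are invented.
    header = fits.Header([('EXPTIME', 30.0), ('FILTER', 'r')])
    with h5py.File('example.h5', 'a') as h5f:
        HDF5File.write_fitsheader_to_dataset(header, h5f,
                                             'Dumps/FrameHeader')
        restored = HDF5File.read_fitsheader_from_dataset(
            h5f['Dumps/FrameHeader'])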
-
static
write_text_to_dataset
(text, h5group, dset_path, creation_args=None, **attributes)[source]¶ Adds ASCII text/file as a dataset to an HDF5 file.
Parameters: - text – The text or file to add. If it is an open file, the contents are dumped; if it is a python2 string or a python3 bytes, the value is stored.
- h5group – An HDF5 group (could be the root group, i.e. an h5py.File opened for writing).
- dset_path – The path for the new dataset, either absolute or relative to h5group.
- creation_args – Keyword arguments to pass to create_dataset(). If None, defaults to dict(compression=’gzip’, compression_opts=9).
- attributes – Added as attributes with the same name to the dataset.
Returns: None
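A sketch of storing a bytes blob as a compressed dataset with one extra attribute; the file name, dataset path, and attribute are invented:

    import h5py
    from superphot_pipeline.hdf5_file import HDF5File

    # 'origin' becomes an attribute of the new dataset.
    with h5py.File('example.h5', 'a') as h5f:
        HDF5File.write_text_to_dataset(
            b'gain = 2.1\nreadnoise = 7.5\n',
            h5f,
            'Dumps/CameraConfig',
            origin='camera.cfg',
        )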
-