superphot_pipeline.hdf5_file module¶
Class Inheritance Diagram¶

Define a class for working with HDF5 files.
-
class
superphot_pipeline.hdf5_file.
HDF5File
(fname, mode, project_id=None, db_config_version=None)[source]¶ Bases: abc.ABC, h5py._hl.files.File
Base class for HDF5 pipeline products.
Supports defining the structure from the database or an XML file, as well as generating markdown and XML files describing the structure.
Implements backwards compatibility for different versions of the structure of files.
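A minimal usage sketch, assuming a hypothetical concrete subclass (real pipeline products implement the abstract members that HDF5File leaves undefined; the file name is invented):

    from superphot_pipeline.hdf5_file import HDF5File

    class DataReductionFile(HDF5File):
        """Hypothetical concrete product; the real pipeline supplies
        the abstract structure definitions omitted here."""

    # Behaves like an h5py.File, with pipeline bookkeeping on top.
    with DataReductionFile('example.h5', 'a', project_id=0) as data_file:
        print(data_file.filename)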
-
__init__
(fname, mode, project_id=None, db_config_version=None)[source]¶ Opens the given HDF5 file in the given mode.
-
classmethod
_add_paths
(xml_part, parent_path='/', parent_type='group')[source]¶ Add the paths in a part of an XML document.
Parameters: - xml_part – A part of an XML document to parse (parsed through xml.dom.minidom).
- parent_path – The path under which xml_part lives.
- parent_type – The type of entry the parent is (‘group’ or ‘dataset’).
Returns: None
-
_delete_obsolete_dataset
(parent, name, logger=None, log_extra={})[source]¶ Delete an obsolete HDF5 dataset if it exists and update the repacking flag.
Parameters: - parent – The parent group this entry belongs to.
- name – The name of the entry to check and delete. If the entry is not a dataset, an error is raised.
- logger – An object to issue log messages to.
- log_extra – Extra information to add to log messages.
Returns: None
Raises: Error.HDF5 – if an entry with the given name exists under parent, but is not a dataset.
-
_wiki_other_version_links
(version_list, target_version)[source]¶ Return the text to add to the wiki linking to other configuration versions.
-
add_attribute
(attribute_key, attribute_value, attribute_dtype=None, if_exists='overwrite', logger=None, log_extra={}, **substitutions)[source]¶ Adds a single attribute to a dataset or a group.
Parameters: - attribute_key – The key in _destinations that corresponds to the attribute to add. If the key is not one of the recognized keys, h5file is not modified and the function silently exits.
- attribute_value – The value to give the attribute.
- attribute_dtype – Data type for the new attribute, None to determine automatically.
- if_exists –
What should be done if the attribute already exists? Possible values are:
- ignore: Do not update, but return the attribute’s existing value.
- overwrite: Change the value to the specified one.
- error: Raise an exception.
- logger – An object to pass log messages to.
- log_extra – Extra information to attach to the log messages.
- substitutions – variables to substitute in HDF5 paths and names.
Returns: None.
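A hedged sketch of adding an attribute, reusing the hypothetical data_file handle from the sketch above; the attribute key and the ap_ind substitution value are invented:

    # Both the attribute key and the ap_ind substitution are invented.
    data_file.add_attribute(
        'srcextract.software_versions',  # must be a recognized key
        'SExtractor 2.25.0',
        if_exists='overwrite',
        ap_ind=3,                        # substituted into the HDF5 path
    )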
-
add_file_dump
(fname, destination, link_name=False, delete_original=True, logger=None, external_log_extra={})[source]¶ Adds a byte-by-byte dump of a file to the data reduction file.
If the file does not exist, an empty dataset is created.
Parameters: - fname – The name of the file to dump.
- destination – Passed directly to dump_file_like.
- link_name – Passed directly to dump_file_like.
- delete_original – If True, the file being dumped is deleted (default).
- logger – An object to emit log messages to.
- external_log_extra – extra information to add to log message.
Returns: None.
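An illustrative call, again with the hypothetical data_file handle; the file name and destination path are invented:

    # 'astrom.cfg' and the destination path are invented.
    data_file.add_file_dump(
        'astrom.cfg',
        destination='/Dumps/AstromConfig',
        delete_original=True,            # remove astrom.cfg after dumping
    )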
-
add_link
(target, name, logger=None, log_extra={})[source]¶ Adds a soft link to the HDF5 file.
Parameters: - target – The path to create a soft link to.
- name – The name to give to the link. Overwritten if it exists and is a link.
Returns: None
Raises: Error.HDF5 – if an object with the same name as the link exists, but is not a link.
-
add_single_dataset
(parent, name, data, creation_args, replace_nonfinite=None, logger=None, log_extra={}, **kwargs)[source]¶ Adds a single dataset to self.
If the target dataset already exists, it is deleted first and the name of the dataset is added to the root level Repack attribute.
Parameters: - parent – The full path of the group under which to place the new dataset (created if it does not exist).
- name – The name of the dataset.
- data – The values that should be written, a numpy array with appropriate type already set.
- creation_args – Additional arguments to pass to the create_dataset method.
- replace_nonfinite – If not None, any non-finite values are replaced with this value; it is also used as the fill value for the dataset.
- logger – An object to send log messages to.
- log_extra – Extra information to add to log messages.
- kwargs – Ignored.
Returns: None
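A sketch of writing a float dataset with non-finite values replaced by a sentinel, reusing the hypothetical data_file handle; the group path, dataset name, and sentinel are invented:

    import numpy

    # Group path, dataset name, and sentinel value are invented.
    magnitudes = numpy.array([12.3, numpy.nan, 12.7])
    data_file.add_single_dataset(
        parent='/Photometry/Aperture003',
        name='Magnitudes',
        data=magnitudes,
        creation_args=dict(compression='gzip', compression_opts=9),
        replace_nonfinite=-9999.0,       # also used as the fill value
    )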
-
classmethod
configure_from_db
(db, target_project_id=0, target_version=None, datatype_from_db=False, save_to_file=None, update_trac=False, generate_markdown=False)[source]¶ Reads the structure of the file from the database.
Parameters: - db – An instance of CalibDB connected to the calibration database.
- target_project_id – The project ID to configure for (falls back to the configuration for project_id=0 if no configuration is found for the requested value). Default: 0.
- target_version – The configuration version to set as default. If None (default), the largest configuration value found is used.
- datatype_from_db – Should the information about data type be read from the database?
- save_to_file – If not None, a file with the given name is created containing an XML representation of the HDF5 file structure constructed from the database.
- generate_markdown – Generates markdown files suitable for committing to a GitHub repository as documentation. If given, this argument should be a directory where the files should be saved. Otherwise, it should be something that evaluates as False.
Returns: None
-
classmethod
configure_from_xml
(xml, project_id=0, make_default=False)[source]¶ Defines the file structure from an xml.dom.minidom document.
Parameters: - xml – The xml.dom.minidom document defining the structure.
- project_id – The project ID this configuration applies to.
- make_default – Should this configuration be saved as the default one?
Returns: None
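A sketch of loading a layout; 'hdf5_structure.xml' is an invented file name, and its root tag must match what get_layout_root_tag_name() returns for the hypothetical DataReductionFile subclass:

    from xml.dom.minidom import parse

    # Parse the layout and register it as the default for project 0.
    layout = parse('hdf5_structure.xml')
    DataReductionFile.configure_from_xml(layout,
                                         project_id=0,
                                         make_default=True)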
-
default_destinations
¶ Dictionary of where to place newly created elements in the HDF5 file.
There is an entry for all non-group elements as defined by self.get_element_type(). Each entry is a dictionary:
- parent: The path to the parent group/dataset where the new entry will be created.
- parent_type: The type of the parent - ‘group’ or ‘dataset’.
- name: The name to give to the new dataset. It may contain a substitution of %(ap_ind)?.
- creation_args: For datasets only. Should specify additional arguments for the create_dataset method.
- replace_nonfinite: For floating point datasets only. Specifies a value with which to replace any non-finite dataset entries before writing to the file (a workaround for the scaleoffset filter’s problem with non-finite values). After extracting a dataset, any values found to equal this are replaced by not-a-number.
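An illustrative sketch of the shape of one entry; the key, paths, and values are invented, and only the dictionary structure follows the description above:

    # Invented entry; the substitution syntax in 'name' is assumed.
    default_destinations = {
        'srcextract.sources': {
            'parent': '/SourceExtraction',
            'parent_type': 'group',
            'name': 'Sources%(ap_ind)d',
            'creation_args': {'compression': 'gzip',
                              'compression_opts': 9},
            'replace_nonfinite': -8e8,   # sentinel for non-finite values
        },
    }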
-
destination_versions
¶
-
destinations
¶ Specifies the destinations for self.elements in the current file.
See self.default_destinations.
-
dump_file_like
(file_like, destination, link_name=False, logger=None, external_log_extra={}, log_dumping=True)[source]¶ Adds a byte-by-byte dump of a file-like object to self.
Parameters: - file_like – A file-like object to dump.
- destination – The path in self to use for the dump.
- link_name – If this argument evaluates as True, a link with the given name is created pointing to destination.
- logger – An object to emit log messages to.
- external_log_extra – extra information to add to log message.
Returns: None.
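A sketch of dumping an in-memory file-like object, reusing the hypothetical data_file handle; the destination path and link name are invented:

    import io

    # Destination path and link name are invented.
    log_bytes = io.BytesIO(b'frame 001: astrometry converged\n')
    data_file.dump_file_like(
        log_bytes,
        destination='/Dumps/AstrometryLog',
        link_name='LatestAstrometryLog',
    )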
-
element_uses
¶ A dictionary specifying what each dataset or property is used for.
This structure has two keys, ‘dataset’ and ‘attribute’, each of which should contain a dictionary whose keys are self.elements[‘dataset’] or self.elements[‘attribute’] respectively, and whose values are lists of strings specifying uses (only needed for generating documentation).
-
elements
¶ Identifying strings for the recognized elements of the HDF5 file.
Should be a dictionary-like object with values being a set of strings containing the identifiers of the HDF5 elements and keys:
- dataset: Identifiers for the datasets that could be included in the file.
- attribute: Identifiers for the attributes that could be included in the file.
- link: Identifiers for the links that could be included in the file.
-
generate_wiki
(xml_part, current_indent='', format_for='TRAC')[source]¶ Returns the part of the wiki corresponding to a part of the XML tree.
Parameters: - xml_part – The part of the XML tree to wikify.
- current_indent – The indent to use for the root element of xml_part.
Returns: A python string with the wiki text to add (newlines and all).
Return type: wiki
-
get_attribute
(attribute_key, default_value=None, **substitutions)[source]¶ Returns the attribute identified by the given key.
Parameters: - attribute_key – The key of the attribute to return. It must be one of the standard keys.
- default_value – If this is not None, this value is returned if the attribute does not exist in the file; if None, not finding the attribute raises Error.Sanity.
- substitutions – Any keys that must be substituted in the path (e.g. ap_ind, config_id, ...).
Returns: The value of the attribute.
Return type: value
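A sketch of reading an attribute with a fallback, reusing the hypothetical data_file handle and the same invented key as above:

    # default_value avoids Error.Sanity when the attribute is missing.
    versions = data_file.get_attribute(
        'srcextract.software_versions',
        default_value='unknown',
        ap_ind=3,
    )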
-
classmethod
get_documentation_order
(element_type, element_id)[source]¶ Return a sorting key for the elements in a documentation level.
Parameters: - element_type – The type of entry this element corresponds to in the HDF5 file (group, dataset, attribute or link).
- element_id – The identifying string of this element.
Returns: An integer such that elements for which lower values are returned should appear before elements with higher values in the documentation if the two are on the same level.
Return type: sort_key
-
classmethod
get_element_type
(element_id)[source]¶ Return the type of HDF5 entry that corresponds to the given ID.
Parameters: element_id – The identifying string for an element present in the HDF5 file. Returns: The type of HDF5 structure to create for this element. One of: ‘group’, ‘dataset’, ‘attribute’, ‘link’.
Return type: hdf5_type
-
get_file_dump
(dump_key)[source]¶ Returns a previously dumped file as a string (with a name attribute).
Parameters: dump_key – The key in self._destinations identifying the file to extract. Returns: The text of the dumped file. Return type: dump
-
static
get_hdf5_dtype
(dtype_string, hdf5_element_type)[source]¶ Return the dtype argument to use when creating an HDF5 entry.
Parameters: - dtype_string – The string from the XML or database configuration specifying the type.
- hdf5_element_type – The type of element being created - ‘dataset’ or ‘attribute’.
Returns: Whatever should be passed as the dtype argument when creating the given entry in the HDF5 file.
Return type: dtype
-
classmethod
get_layout_root_tag_name
()[source]¶ The name of the root tag in the layout configuration.
-
get_single_dataset
(dataset_key, sub_entry=None, expected_shape=None, optional=None, **substitute)[source]¶ Return a single dataset as a numpy float or int array.
Parameters: - dataset_key – The key in self._destinations identifying the dataset to read.
- sub_entry – If the dataset_key does not identify a single dataset, this value is used to select from among the multiple possible datasets (e.g. ‘field’ or ‘source’ for source IDs).
- expected_shape – The shape to use for the dataset if an empty dataset is found. If None, a zero-sized array is returned.
- optional – If not None and the dataset does not exist, this value is returned, otherwise if the dataset does not exist an exception is raised.
- substitute – Any arguments that should be substituted in the path (e.g. ap_ind or config_id).
Returns: A numpy int/float array containing the identified dataset from the HDF5 file.
Return type: data
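A sketch of reading a dataset by key, reusing the hypothetical data_file handle; the key, sub_entry, and substitution are invented:

    import numpy

    # Key, sub_entry, and substitution are invented.
    source_ids = data_file.get_single_dataset(
        'srcextract.source_ids',
        sub_entry='field',
        optional=numpy.empty(0, dtype=int),  # returned if dataset absent
        ap_ind=3,
    )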
-
classmethod
get_version_dtype
(element_id, version=None)[source]¶ What get_dtype would return for LC configured with the given version.
Parameters: - element_id – The string identifier for the quantity to return the data type for.
- version – The structure version for which to return the data type. If None, uses the latest configured version.
Returns: A numpy style data type to use for the quantity in LCs.
Return type: dtype
-
static
read_fitsheader_from_dataset
(h5dset)[source]¶ Reads a FITS header from an HDF5 dataset.
The inverse of fitsheader_to_dataset().
Parameters: h5dset – The dataset containing the header to read. Returns: Instance of fits.Header. Return type: header
-
static
read_text_from_dataset
(h5dset, as_file=False)[source]¶ Reads text from an HDF5 dataset.
The inverse of text_to_dataset().
Parameters: h5dset – The dataset containing the text to read. Returns: Numpy byte array (dtype=’i1’) containing the text. Return type: text
-
static
write_fitsheader_to_dataset
(fitsheader, *args, **kwargs)[source]¶ Adds a FITS header to an HDF5 file as a dataset.
Parameters: - fitsheader – The header to save (fits.Header instance).
- args – Passed directly to text_to_dataset().
- kwargs – Passed directly to text_to_dataset().
Returns: None
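A round-trip sketch for the two FITS-header helpers; the header contents, file name, and dataset path are invented:

    from astropy.io import fits
    import h5py
    from superphot_pipeline.hdf5_file import HDF5File

    # File and dataset paths are invented.
    header = fits.Header([('EXPTIME', 30.0), ('FILTER', 'r')])
    with h5py.File('example.h5', 'a') as h5f:
        HDF5File.write_fitsheader_to_dataset(header, h5f,
                                             'Dumps/FrameHeader')
        restored = HDF5File.read_fitsheader_from_dataset(
            h5f['Dumps/FrameHeader'])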
-
static
write_text_to_dataset
(text, h5group, dset_path, creation_args=None, **attributes)[source]¶ Adds ASCII text/file as a dataset to an HDF5 file.
Parameters: - text – The text or file to add. If it is an open file, the contents are dumped; if it is a python2 string or a python3 bytes, the value is stored.
- h5group – An HDF5 group (could be the root group, i.e. an h5py.File opened for writing).
- dset_path – The path for the new dataset, either absolute or relative to h5group.
- creation_args – Keyword arguments to pass to create_dataset(). If None, defaults to dict(compression=’gzip’, compression_opts=9).
- attributes – Added as attributes with the same name to the dataset.
Returns: None
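A sketch of storing a bytes blob as a compressed dataset with one extra attribute; the file name, dataset path, and attribute are invented:

    import h5py
    from superphot_pipeline.hdf5_file import HDF5File

    # 'origin' becomes an attribute of the new dataset.
    with h5py.File('example.h5', 'a') as h5f:
        HDF5File.write_text_to_dataset(
            b'gain = 2.1\nreadnoise = 7.5\n',
            h5f,
            'Dumps/CameraConfig',
            origin='camera.cfg',
        )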
-