ch_util.andata
Analysis data format
Functions
- Create a CorrData object from a 1.0.0 archive version acq.
- Create an Andata object from a version 2.0.0 archive format acq.
- Pick a subclass of
- Create a version tuple from a version string.
- exception ch_util.andata.AnDataError[source]
Bases:
Exception
Exception raised when something unexpected happens with the data.
- class ch_util.andata.BaseData(h5_data=None, **kwargs)[source]
Bases:
TOData
CHIME data in analysis format.
Inherits from caput.memh5.BasicCont.
This is intended to be the main data class for the post acquisition/real-time analysis parts of the pipeline. This class is laid out very similarly to how the data is stored in analysis format hdf5 files, and the data in this class can optionally be stored in such an hdf5 file instead of in memory.
- Parameters:
h5_data (h5py.Group, memh5.MemGroup or hdf5 filename, optional) – Underlying h5py-like data container where data will be stored. If not provided, a new caput.memh5.MemGroup instance will be created.
Used to pick which subclass to instantiate based on attributes in data.
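The dispatch mentioned above can be pictured with a small registry. This is only a sketch of the general idea: the attribute name `acquisition_type`, the registry, the `pick_subclass` method, and the `Toy*` stand-in classes are all hypothetical and are not taken from ch_util.

```python
class ToyBaseData:
    """Stand-in base class (hypothetical, not the ch_util class)."""

    # Hypothetical registry mapping an attribute value to a subclass.
    _registry = {}

    def __init_subclass__(cls, acq_type=None, **kwargs):
        super().__init_subclass__(**kwargs)
        if acq_type is not None:
            ToyBaseData._registry[acq_type] = cls

    @classmethod
    def pick_subclass(cls, attrs):
        """Return the registered subclass matching the data attributes."""
        # 'acquisition_type' is an assumed attribute name.
        return cls._registry.get(attrs.get("acquisition_type"), cls)


class ToyCorrData(ToyBaseData, acq_type="corr"):
    pass


class ToyWeatherData(ToyBaseData, acq_type="weather"):
    pass


picked = ToyBaseData.pick_subclass({"acquisition_type": "weather"})
print(picked.__name__)
```

Unrecognised or missing attribute values fall back to the class the lookup was called on.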
- property cal
Stores calibration schemes for the datasets.
Each entry is a calibration scheme which itself is a dict storing meta-data about calibration.
Do not try to add a new entry by assigning to an element of this property. Use create_cal() instead.
- Returns:
cal – Calibration schemes.
- Return type:
read-only dictionary
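One common way to expose a read-only dictionary while routing additions through a dedicated method is sketched below. It is an illustration of the pattern only: `ToyContainer`, the `create_cal(name, **metadata)` signature, and the metadata values are assumptions, not the ch_util or caput implementation.

```python
from types import MappingProxyType


class ToyContainer:
    """Hypothetical container with a read-only view of its calibrations."""

    def __init__(self):
        self._cal = {}

    @property
    def cal(self):
        # Read-only view: lookups work, item assignment raises TypeError.
        return MappingProxyType(self._cal)

    def create_cal(self, name, **metadata):
        # Assumed signature; stores the meta-data dict under the given name.
        self._cal[name] = dict(metadata)
        return self._cal[name]


box = ToyContainer()
box.create_cal("gain_v1", source="made-up example value")
print(box.cal["gain_v1"])

try:
    box.cal["gain_v2"] = {}
except TypeError:
    print("direct assignment rejected")
```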
- static convert_time(time)[source]
Overload to provide support for multiple time formats.
Method accepts scalar times in supported formats and converts them to the same format as self.time.
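As a rough illustration of converting scalar times to Unix/POSIX time, the sketch below accepts either a number or a datetime. The helper name `to_unix_time` and the choice to treat naive datetimes as UTC are assumptions made for this example; the formats that convert_time itself supports are not listed here.

```python
from datetime import datetime, timezone


def to_unix_time(t):
    """Convert a float or datetime to Unix/POSIX seconds (hypothetical helper)."""
    if isinstance(t, datetime):
        if t.tzinfo is None:
            # Assumption: naive datetimes are interpreted as UTC.
            t = t.replace(tzinfo=timezone.utc)
        return t.timestamp()
    return float(t)


print(to_unix_time(datetime(2016, 1, 22)))
print(to_unix_time(1453420800))
```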
- property datasets
Stores hdf5 datasets holding all data.
Each dataset can reference a calibration scheme in datasets[name].attrs['cal'] which refers to an entry in cal.
Do not try to add a new dataset by assigning to an item of this property. Use create_dataset instead.
- Returns:
datasets – Entries are h5py or caput.memh5 datasets.
- Return type:
read-only dictionary
- property flags
Datasets representing flags and data weights.
- Returns:
flags – Entries are h5py or caput.memh5 datasets.
- Return type:
read-only dictionary
- classmethod from_acq_h5(acq_files, start=None, stop=None, datasets=None, out_group=None, **kwargs)[source]
Convert acquisition format hdf5 data to analysis data object.
Reads hdf5 data produced by the acquisition system and converts it to analysis format in memory.
- Parameters:
acq_files (filename, h5py.File or list thereof or filename pattern) – Files to convert from acquisition format to analysis format. Filename patterns with wild cards (e.g. “foo*.h5”) are supported.
start (integer, optional) – What frame to start at in the full set of files.
stop (integer, optional) – What frame to stop at in the full set of files.
datasets (list of strings) – Names of datasets to include from acquisition files. Default is to include all datasets found in the acquisition files.
out_group (h5py.Group, hdf5 filename or memh5.Group) – Underlying hdf5 like container that will store the data for the BaseData instance.
Examples
Examples are analogous to those of CorrData.from_acq_h5().
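Because start and stop index frames in the full set of files, a global frame range has to be mapped onto ranges within each file. The sketch below shows one way to do that with ordinary slice semantics. The helper `split_frame_range` and the per-file frame counts are hypothetical; this is not the ch_util implementation.

```python
def split_frame_range(file_lengths, start=None, stop=None):
    """Map a global (start, stop) frame range onto per-file ranges.

    Returns one entry per file: a (lo, hi) pair of local frame indices,
    or None if no frames are taken from that file.
    """
    total = sum(file_lengths)
    # Resolve None and negative indices exactly as Python slicing does.
    start, stop, _ = slice(start, stop).indices(total)
    ranges = []
    offset = 0
    for n in file_lengths:
        lo = max(start - offset, 0)
        hi = min(stop - offset, n)
        ranges.append((lo, hi) if lo < hi else None)
        offset += n
    return ranges


# Assumed split of 31 frames over two files (16 + 15).
print(split_frame_range([16, 15], start=5, stop=-3))
```

With these assumed file lengths, the two ranges together cover 23 frames.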
- property ntime
Length of the time axis of the visibilities.
- property time
The ‘time’ axis centres as Unix/POSIX time.
- class ch_util.andata.BaseReader(files)[source]
Bases:
Reader
Provides high level reading of CHIME data.
You do not want to use this class, but rather one of its inherited classes (CorrReader, HKReader, WeatherReader).
Parses and stores meta-data from file headers allowing for the interpretation and selection of the data without reading it all from disk.
- Parameters:
files (filename, h5py.File or list thereof or filename pattern) – Files containing data. Filename patterns with wild cards (e.g. “foo*.h5”) are supported.
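The handling of a single filename, a list, or a wildcard pattern can be sketched with the standard glob module. The helper `expand_files` below is hypothetical and only handles strings; how the readers themselves expand patterns or treat open h5py.File objects is not shown here.

```python
import glob
import os
import tempfile


def expand_files(files):
    """Expand a filename, pattern, or list of either into a sorted file list."""
    if isinstance(files, str):
        files = [files]
    expanded = []
    for item in files:
        # Sort each pattern's matches so the file order is reproducible.
        expanded.extend(sorted(glob.glob(item)))
    return expanded


# Demonstrate on throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("foo_0002.h5", "foo_0001.h5", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = expand_files(os.path.join(tmp, "foo*.h5"))
    print([os.path.basename(p) for p in found])
```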
- read(out_group=None)[source]
Read the selected data.
- Parameters:
out_group (h5py.Group, hdf5 filename or memh5.Group) – Underlying hdf5 like container that will store the data for the BaseData instance.
- Returns:
data – Data read from files based on the selections given in time_sel, prod_sel, and freq_sel.
- Return type:
- select_time_range(start_time=None, stop_time=None)[source]
Sets time_sel to include a time range.
The selected samples will have bin centre timestamps that are bracketed by the given start_time and stop_time.
- Parameters:
start_time (float or datetime.datetime) – If a float, this is a Unix/POSIX time. Affects the first element of time_sel. Default leaves it unchanged.
stop_time (float or datetime.datetime) – If a float, this is a Unix/POSIX time. Affects the second element of time_sel. Default leaves it unchanged.
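The effect on the two elements of time_sel can be sketched with a binary search over sorted bin-centre timestamps. The helper `time_range_to_sel` is hypothetical, and the half-open convention used here (start_time <= t < stop_time) is an assumption, since the text does not state how the end points are treated.

```python
from bisect import bisect_left


def time_range_to_sel(times, start_time=None, stop_time=None, time_sel=None):
    """Return a (start, stop) index pair selecting the requested time range.

    `times` must be sorted Unix timestamps of the bin centres.
    Arguments left as None keep the corresponding element unchanged.
    """
    sel = list(time_sel) if time_sel is not None else [0, len(times)]
    if start_time is not None:
        sel[0] = bisect_left(times, start_time)
    if stop_time is not None:
        sel[1] = bisect_left(times, stop_time)
    return tuple(sel)


times = [0.0, 10.0, 20.0, 30.0, 40.0]
print(time_range_to_sel(times, start_time=5.0, stop_time=35.0))
```

A datetime.datetime argument would first need converting to Unix time before being passed to a helper like this.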
- class ch_util.andata.CalibrationGainData(h5_data=None, **kwargs)[source]
Bases:
GainData
Subclass of GainData for gain acquisitions.
Used to pick which subclass to instantiate based on attributes in data.
- property gain
Aliases the gain dataset.
- property nsource
Number of sources of gains.
- property source
Names of the sources of gains.
- property source_gains
Dictionary that allows look up of source gains based on source name.
- property source_weights
Dictionary that allows look up of source weights based on source name.
- property weight
Aliases the weight dataset.
- class ch_util.andata.CalibrationGainReader(files)[source]
Bases:
BaseReader
Subclass of BaseReader for calibration gain data.
- data_class
alias of CalibrationGainData
- class ch_util.andata.CorrData(h5_data=None, **kwargs)[source]
Bases:
BaseData
Subclass of BaseData for correlation data.
Used to pick which subclass to instantiate based on attributes in data.
- property dataset_id
Access dataset id dataset in unicode format.
- property freq
The spectral frequency axis as bin centres in MHz.
- classmethod from_acq_h5(acq_files, start=None, stop=None, **kwargs)[source]
Convert acquisition format hdf5 data to analysis data object.
This method overloads the one in BaseData.
Changed Jan. 22, 2016: input arguments are now (acq_files, start, stop, **kwargs) instead of (acq_files, start, stop, prod_sel, freq_sel, datasets, out_group).
Reads hdf5 data produced by the acquisition system and converts it to analysis format in memory.
- Parameters:
acq_files (filename, h5py.File or list thereof or filename pattern) – Files to convert from acquisition format to analysis format. Filename patterns with wild cards (e.g. “foo*.h5”) are supported.
start (integer, optional) – What frame to start at in the full set of files.
stop (integer, optional) – What frame to stop at in the full set of files.
stack_sel (valid numpy index) – Used to select a subset of the stacked correlation products. Only one of stack_sel, prod_sel, and input_sel may be specified, with prod_sel preferred over input_sel and stack_sel preferred over both. h5py fancy indexing is supported but should be used with caution due to poor reading performance.
prod_sel (valid numpy index) – Used to select a subset of correlation products. Only one of stack_sel, prod_sel, and input_sel may be specified, with prod_sel preferred over input_sel and stack_sel preferred over both. h5py fancy indexing is supported but should be used with caution due to poor reading performance.
input_sel (valid numpy index) – Used to select a subset of correlator inputs. Only one of stack_sel, prod_sel, and input_sel may be specified, with prod_sel preferred over input_sel and stack_sel preferred over both. h5py fancy indexing is supported but should be used with caution due to poor reading performance.
freq_sel (valid numpy index) – Used to select a subset of frequencies. h5py fancy indexing is supported but should be used with caution due to poor reading performance.
datasets (list of strings) – Names of datasets to include from acquisition files. Default is to include all datasets found in the acquisition files.
out_group (h5py.Group, hdf5 filename or memh5.Group) – Underlying hdf5 like container that will store the data for the BaseData instance.
apply_gain (boolean, optional) – Whether to apply the inverse gains to the visibility datasets.
renormalize (boolean, optional) – Whether to renormalize for dropped packets.
distributed (boolean, optional) – Load data into a distributed dataset.
comm (MPI.Comm) – Communicator to distribute over. Use MPI.COMM_WORLD if not set.
- Returns:
data – Loaded data object.
- Return type:
Examples
Suppose we have two acquisition format files (this test data is included in the ch_util repository):
>>> import os
>>> import glob
>>> from . import test_andata
>>> os.chdir(test_andata.data_path)
>>> print(sorted(glob.glob('test_acq.h5*')))
['test_acq.h5.0001', 'test_acq.h5.0002']
These can be converted into one big analysis format data object:
>>> data = CorrData.from_acq_h5('test_acq.h5*')
>>> print(data.vis.shape)
(1024, 36, 31)
If we only want a subset of the total frames (time bins) in these files we can supply start and stop indices.
>>> data = CorrData.from_acq_h5('test_acq.h5*', start=5, stop=-3)
>>> print(data.vis.shape)
(1024, 36, 23)
If we want a subset of the correlation products or spectral frequencies, specify the prod_sel or freq_sel respectively:
>>> data = CorrData.from_acq_h5(
...     'test_acq.h5*',
...     prod_sel=[0, 8, 15, 21],
...     freq_sel=slice(5, 15),
... )
>>> print(data.vis.shape)
(10, 4, 31)
>>> data = CorrData.from_acq_h5('test_acq.h5*', prod_sel=1,
...                             freq_sel=slice(None, None, 10))
>>> print(data.vis.shape)
(103, 1, 31)
The underlying hdf5-like container that holds the analysis format data can also be specified.
>>> from caput import memh5
>>> group = memh5.MemGroup()
>>> data = CorrData.from_acq_h5('test_acq.h5*', out_group=group)
>>> print(group['vis'].shape)
(1024, 36, 31)
>>> group['vis'] is data.vis
True
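The precedence among stack_sel, prod_sel, and input_sel described in the parameter list can be written out as a small helper. This is a sketch of the stated rule only: the function `pick_product_selection` is hypothetical and does not show how from_acq_h5 resolves the arguments internally.

```python
def pick_product_selection(stack_sel=None, prod_sel=None, input_sel=None):
    """Return (name, selection) for the highest-priority selection given.

    Priority follows the documented rule: stack_sel over prod_sel over
    input_sel. Returns (None, None) when nothing was specified.
    """
    candidates = (
        ("stack_sel", stack_sel),
        ("prod_sel", prod_sel),
        ("input_sel", input_sel),
    )
    for name, sel in candidates:
        if sel is not None:
            return name, sel
    return None, None


print(pick_product_selection(prod_sel=[0, 8, 15, 21], input_sel=[1, 2]))
```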
- classmethod from_acq_h5_fast(fname, comm=None, freq_sel=None, start=None, stop=None)[source]
Efficiently read a CorrData file in a distributed fashion.
This reads a single file from disk into a distributed container. In contrast to CorrData.from_acq_h5 it is more restrictive, allowing only contiguous slices of the frequency and time axes, and no down selection of the input/product/stack axis.
- Parameters:
fname (str) – File name to read. Only supports one file at a time.
comm (MPI.Comm, optional) – MPI communicator to distribute over. By default this will use MPI.COMM_WORLD.
freq_sel (slice, optional) – A selection over the frequency axis. Only slice objects are supported. If not set, read all frequencies.
start (int, optional) – Start and stop indexes of the time selection.
stop (int, optional) – Start and stop indexes of the time selection.
- Returns:
data – The CorrData container.
- Return type:
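The restriction to contiguous slices can be illustrated with a small validation helper. The function `require_contiguous_slice` is hypothetical, and the choice of exception types is an assumption; it is not how from_acq_h5_fast itself checks its arguments.

```python
def require_contiguous_slice(sel, name="freq_sel"):
    """Return `sel` if it is a contiguous slice, otherwise raise.

    None is treated as "select everything", matching the documented default.
    """
    if sel is None:
        return slice(None)
    if not isinstance(sel, slice):
        raise TypeError(f"{name} must be a slice, got {type(sel).__name__}")
    if sel.step not in (None, 1):
        raise ValueError(f"{name} must be contiguous (step of 1)")
    return sel


print(require_contiguous_slice(slice(5, 15)))
```

By this check, a selection such as slice(None, None, 10) or a list of indices would be rejected, whereas CorrData.from_acq_h5 accepts both.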
- property gain
Convenience access to the gain dataset.
Equivalent to self.datasets['gain'].
- property input_flags
Convenience access to the input flags dataset.
Equivalent to self.flags['inputs'].
- property nfreq
Length of the freq axis.
- property nprod
Length of the prod axis.
- property prod
The correlation product axis as channel pairs.
- property prodstack
A pair of input indices representative of those in the stack.
Note, these are correctly conjugated on return, and so calculations of the baseline and polarisation can be done without additionally looking up the stack conjugation.
- property stack
The correlation product axis as channel pairs.
- property vis
Convenience access to the visibilities array.
Equivalent to self.datasets['vis'].
- property weight
Convenience access to the visibility weight array.
Equivalent to self.flags['vis_weight'].
- class ch_util.andata.CorrReader(files)[source]
Bases:
BaseReader
Subclass of BaseReader for correlator data.
- property freq
Spectral frequency bin centres in data files.
- property freq_sel
Which frequencies to read.
- Returns:
freq_sel – Valid numpy index for a 1D array, specifying what data to read along the frequency axis.
- Return type:
1D data selection
- property input
Correlator inputs in data files.
- property input_sel
Which correlator inputs to read.
- Returns:
input_sel – Valid numpy index for a 1D array, specifying what data to read along the correlation product axis.
- Return type:
1D data selection
- property prod
Correlation products in data files.
- property prod_sel
Which correlation products to read.
- Returns:
prod_sel – Valid numpy index for a 1D array, specifying what data to read along the correlation product axis.
- Return type:
1D data selection
- read(out_group=None)[source]
Read the selected data.
- Parameters:
out_group (h5py.Group, hdf5 filename or memh5.Group) – Underlying hdf5 like container that will store the data for the BaseData instance.
- Returns:
data – Data read from files based on the selections given in time_sel, prod_sel, and freq_sel.
- Return type:
- select_freq_physical(frequencies)[source]
Sets freq_sel to include given physical frequencies.
- Parameters:
frequencies (list of floats) – Frequencies to select. Physical frequencies are matched to indices on a best match basis.
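Best-match selection of physical frequencies can be sketched as a nearest-neighbour lookup over the bin centres. The helper `nearest_freq_indices` and the sample bin centres are hypothetical; the actual matching rule used by select_freq_physical may differ in detail.

```python
def nearest_freq_indices(freq_centres, frequencies):
    """Return, for each requested frequency, the index of the closest bin centre."""
    indices = []
    for wanted in frequencies:
        best = min(
            range(len(freq_centres)),
            key=lambda i: abs(freq_centres[i] - wanted),
        )
        indices.append(best)
    return indices


# Made-up bin centres in MHz, in descending order.
centres = [800.0, 790.0, 780.0, 770.0]
print(nearest_freq_indices(centres, [791.0, 772.0]))
```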
- select_freq_range(freq_low=None, freq_high=None, freq_step=None)[source]
Sets freq_sel to given physical frequency range.
Frequencies selected will have bin centres bracketed by provided range.
- Parameters:
freq_low (float) – Lower end of the frequency range in MHz. Default is the lower edge of the band.
freq_high (float) – Upper end of the frequency range in MHz. Default is the upper edge of the band.
freq_step (float) – How much bandwidth to skip over between samples in MHz. This value is approximate. Default is to include all samples in given range.
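A range selection with an approximate step can be sketched as a filter over bin centres followed by a stride. Everything here is illustrative: `freq_range_to_indices` is a hypothetical helper, the defaults fall back to the lowest and highest bin centre rather than the band edges, and the stride is the requested step divided by an assumed uniform channel width, rounded to the nearest whole number of channels.

```python
def freq_range_to_indices(freq_centres, freq_low=None, freq_high=None,
                          freq_step=None):
    """Return indices of bin centres inside [freq_low, freq_high].

    If freq_step is given, keep roughly one channel per freq_step of
    bandwidth, assuming uniformly spaced channels.
    """
    if freq_low is None:
        freq_low = min(freq_centres)
    if freq_high is None:
        freq_high = max(freq_centres)
    selected = [
        i for i, f in enumerate(freq_centres) if freq_low <= f <= freq_high
    ]
    if freq_step is None or len(freq_centres) < 2:
        return selected
    width = abs(freq_centres[1] - freq_centres[0])
    stride = max(1, round(freq_step / width))
    return selected[::stride]


# Made-up bin centres in MHz: 800, 790, ..., 730.
centres = [800.0 - 10.0 * i for i in range(8)]
print(freq_range_to_indices(centres, 745.0, 795.0, freq_step=20.0))
```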
- class ch_util.andata.DigitalGainData(h5_data=None, **kwargs)[source]
Bases:
GainData
Subclass of GainData for digitalgain acquisitions.
Used to pick which subclass to instantiate based on attributes in data.
- property compute_time
Unix timestamp indicating when the digital gain was computed.
- property gain
The digital gain applied to the channelized data.
- property gain_coeff
The coefficient of the digital gain applied to the channelized data.
- property gain_exp
The exponent of the digital gain applied to the channelized data.
- class ch_util.andata.DigitalGainReader(files)[source]
Bases:
BaseReader
Subclass of BaseReader for digital gain data.
- data_class
alias of DigitalGainData
- class ch_util.andata.FlagInputData(h5_data=None, **kwargs)[source]
Bases:
GainFlagData
Subclass of GainFlagData for flaginput acquisitions.
Used to pick which subclass to instantiate based on attributes in data.
- property flag
Aliases the flag dataset.
- property source_flags
Dictionary that allows look up of source flags based on source name.
- class ch_util.andata.FlagInputReader(files)[source]
Bases:
BaseReader
Subclass of BaseReader for input flag data.
- data_class
alias of FlagInputData
- class ch_util.andata.GainData(h5_data=None, **kwargs)[source]
Bases:
GainFlagData
Subclass of GainFlagData for gain and digitalgain acquisitions.
Used to pick which subclass to instantiate based on attributes in data.
- property freq
The spectral frequency axis as bin centres in MHz.
- property nfreq
Number of frequency bins.
- class ch_util.andata.GainFlagData(h5_data=None, **kwargs)[source]
Bases:
BaseData
Subclass of BaseData for gain, digitalgain, and flag input acquisitions.
These acquisitions consist of a collection of updates to the real-time pipeline ordered chronologically. In most cases the updates do not occur at a regular cadence. The time that each update occurred can be accessed via self.index_map['update_time']. In addition, each update is given a unique update ID that can be accessed via self.datasets['update_id'] and can be searched using the self.search_update_id method.
Used to pick which subclass to instantiate based on attributes in data.
- property input
Correlator inputs.
- property ninput
Number of correlator inputs.
- property ntime
Number of updates.
- resample(dataset, timestamp, transpose=False)[source]
Return a dataset resampled at specific times.
- Parameters:
dataset (string) – Name of the dataset to resample.
timestamp (np.ndarray) – Unix timestamps.
transpose (bool) – Transpose the data such that time is the fastest varying axis. By default time will be the slowest varying axis.
- Returns:
data – The dataset resampled at the desired times and transposed if requested.
- Return type:
np.ndarray
- search_update_id(pattern, is_regex=False)[source]
Find the index into the update_time axis for a particular update_id.
- Parameters:
pattern (str) – The desired update_id or a glob pattern to search.
is_regex (bool) – Set to True if pattern is a regular expression.
- Returns:
index – Index into the update_time axis that will yield all updates whose update_id matches the requested pattern.
- Return type:
np.ndarray of dtype = int
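The two matching modes (glob pattern by default, regular expression when is_regex is set) can be sketched with the standard fnmatch and re modules. The helper `match_update_ids` and the sample update IDs are hypothetical, and it returns a plain list rather than an np.ndarray; it is not the ch_util implementation.

```python
import fnmatch
import re


def match_update_ids(update_ids, pattern, is_regex=False):
    """Return the indices of update IDs matching a glob or regex pattern."""
    if is_regex:
        compiled = re.compile(pattern)
    else:
        # Translate the glob into an equivalent regular expression.
        compiled = re.compile(fnmatch.translate(pattern))
    return [i for i, uid in enumerate(update_ids) if compiled.match(uid)]


# Made-up update IDs for illustration.
ids = ["gain_2019a", "gain_2019b", "flag_2019a"]
print(match_update_ids(ids, "gain_*"))
print(match_update_ids(ids, r".*2019a$", is_regex=True))
```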
- search_update_time(timestamp)[source]
Find the index into the update_time axis that is valid for specific times.
For each time, returns the most recent update that occurred before that time.
- Parameters:
timestamp (np.ndarray of unix timestamp) – Unix timestamps.
- Returns:
index – Index into the update_time axis that will yield values that are valid for the requested timestamps.
- Return type:
np.ndarray of dtype = int
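Finding the most recent update for each timestamp is a sorted-search problem. The sketch below uses the standard bisect module. The helper `latest_update_index` is hypothetical, it treats an update made exactly at the requested time as valid, and it raises when a timestamp precedes every update; both behaviours are assumptions rather than documented ch_util behaviour.

```python
from bisect import bisect_right


def latest_update_index(update_times, timestamps):
    """Return, for each timestamp, the index of the latest update at or before it.

    `update_times` must be sorted in increasing order.
    """
    indices = []
    for t in timestamps:
        idx = bisect_right(update_times, t) - 1
        if idx < 0:
            raise ValueError(f"no update at or before time {t}")
        indices.append(idx)
    return indices


# Made-up update times and query timestamps (Unix seconds).
print(latest_update_index([100.0, 200.0, 300.0], [150.0, 250.0, 350.0]))
```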
- property time
Returns index_map['update_time'] for caput.tod functionality.
- property update_id
Aliases the update_id dataset.
- class ch_util.andata.HKData(h5_data=None, **kwargs)[source]
Bases:
BaseData
Subclass of BaseData for housekeeping data.
Used to pick which subclass to instantiate based on attributes in data.
- property atmel
Get the ATMEL board that took these data.
- Returns:
comp – The ATMEL component that took these data.
- Return type:
layout.component
- chan(mux=-1)[source]
Convenience access to the list of channels in a given mux.
- Parameters:
mux (int) – A mux number. For housekeeping files with no multiplexing (e.g., FLA’s), leave this as -1.
- Returns:
n – The channel numbers.
- Return type:
list
- Raises:
ValueError – Raised if mux does not exist.
- classmethod from_acq_h5(acq_files, start=None, stop=None, datasets=None, out_group=None)[source]
Convert acquisition format hdf5 data to analysis data object.
This method overloads the one in BaseData.
Reads hdf5 data produced by the acquisition system and converts it to analysis format in memory.
- Parameters:
acq_files (filename, h5py.File or list thereof or filename pattern) – Files to convert from acquisition format to analysis format. Filename patterns with wild cards (e.g. “foo*.h5”) are supported.
start (integer, optional) – What frame to start at in the full set of files.
stop (integer, optional) – What frame to stop at in the full set of files.
datasets (list of strings) – Names of datasets to include from acquisition files. Default is to include all datasets found in the acquisition files.
out_group (h5py.Group, hdf5 filename or memh5.Group) – Underlying hdf5 like container that will store the data for the BaseData instance.
Examples
Examples are analogous to those of CorrData.from_acq_h5().
- property mux
Get the list of muxes in the data.
- nchan(mux=-1)[source]
Convenience access to the number of channels in a given mux.
- Parameters:
mux (int) – A mux number. For housekeeping files with no multiplexing (e.g., FLA’s), leave this as -1.
- Returns:
n – The number of channels.
- Return type:
int
- Raises:
ValueError – Raised if mux does not exist.
- property nmux
Get the number of muxes in the data.
- tod(chan, mux=-1)[source]
Convenience access to a single time-ordered datastream (TOD).
- Parameters:
chan (int) – A channel number. (Generally, they should be in the range 0–7 for non-multiplexed data and 0–15 for multiplexed data.)
mux (int) – A mux number. For housekeeping files with no multiplexing (e.g., FLA’s), leave this as -1.
- Returns:
tod – A 1D array of values for the requested channel/mux combination. Note that a reference to the data in the dataset is returned; this method does not make a copy.
- Return type:
numpy.array
- Raises:
ValueError – Raised if one of chan or mux is not present in any dataset.
- class ch_util.andata.HKPData(data_group=None, distributed=False, comm=None, file_format=None)[source]
Bases:
MemDiskGroup
Subclass of BaseData for housekeeping data.
- classmethod from_acq_h5(acq_files, start=None, stop=None, metrics=None, datasets=None, **kwargs)[source]
Load in the housekeeping files.
- Parameters:
acq_files (list) – List of files to load.
start (datetime or float, optional) – Start and stop times for the range of data to load. Default is all.
stop (datetime or float, optional) – Start and stop times for the range of data to load. Default is all.
metrics (list) – Names of metrics to load. Default is all.
datasets (list) – Synonym for metrics (the value of metrics will take precedence).
- Returns:
data
- Return type:
- static metrics(acq_files)[source]
Get the names of the metrics contained within the files.
- Parameters:
acq_files (list) – List of acquisition filenames.
- Returns:
metrics
- Return type:
list
- resample(metric_name, rule, how='mean', unstack=False, **kwargs)[source]
Resample the metric onto a regular grid of time.
This internally uses the Pandas resampling functionality so that documentation is a useful reference. This will return the metric with the labels as a series of multi-level columns.
- Parameters:
metric_name (str) – Name of metric to resample.
rule (str) – The set of times to resample onto (example ‘30S’, ‘1Min’, ‘2D’). See the pandas docs for a full description.
how (str or callable, optional) – How should we combine samples to regrid the data? This takes any valid argument for the pandas apply method. Useful options are ‘mean’, ‘sum’, ‘min’, ‘max’ and ‘std’.
unstack (bool, optional) – Unstack the data, i.e. return with the labels as hierarchical columns.
kwargs – Any remaining kwargs are passed to the pandas.DataFrame.resample method to give fine grained control of the resampling.
- Returns:
df – A dataframe resampled onto a regular grid. Labels now appear as part of multi-level columns.
- Return type:
pandas.DataFrame
- class ch_util.andata.HKPReader(files)[source]
Bases:
BaseReader
Subclass of BaseReader for HKP data.
- class ch_util.andata.HKReader(files)[source]
Bases:
BaseReader
Subclass of BaseReader for HK data.
- class ch_util.andata.RawADCData(h5_data=None, **kwargs)[source]
Bases:
BaseData
Subclass of BaseData for raw ADC data.
Used to pick which subclass to instantiate based on attributes in data.
- class ch_util.andata.RawADCReader(files)[source]
Bases:
BaseReader
Subclass of BaseReader for raw ADC data.
- data_class
alias of RawADCData
- ch_util.andata.Reader
alias of
CorrReader
- class ch_util.andata.WeatherData(h5_data=None, **kwargs)[source]
Bases:
BaseData
Subclass of BaseData for weather data.
Used to pick which subclass to instantiate based on attributes in data.
- property temperature
For easy access to outside weather station temperature. Needs to be able to extract temperatures from both mingun_weather files and chime_weather files.
- property time
Needs to be able to extract times from both mingun_weather files and chime_weather files.
- class ch_util.andata.WeatherReader(files)[source]
Bases:
BaseReader
Subclass of BaseReader for weather data.
- data_class
alias of WeatherData
- ch_util.andata.andata_from_acq1(acq_files, start, stop, prod_sel, freq_sel, datasets, out_group)[source]
Create a CorrData object from a 1.0.0 archive version acq.
- Parameters:
acq_files (filename, h5py.File or list thereof or filename pattern) – Files to convert from acquisition format to analysis format. Filename patterns with wild cards (e.g. “foo*.h5”) are supported.
start (int) – What frame to start at in the full set of files.
stop (int) – What frame to stop at in the full set of files.
prod_sel (1D data selection) – Valid numpy index for a 1D array, specifying what data to read along the correlation product axis.
freq_sel (1D data selection) – Valid numpy index for a 1D array, specifying what data to read along the frequency axis.
datasets (list of strings) – Names of datasets to include from acquisition files. Default is to include all datasets found in the acquisition files.
out_group (h5py.Group, hdf5 filename or memh5.Group) – Underlying hdf5 like container that will store the data for the BaseData instance.
- Returns:
A CorrData object with the requested data.
- Return type:
corrdata
- ch_util.andata.andata_from_archive2(cls, acq_files, start, stop, stack_sel, prod_sel, input_sel, freq_sel, datasets, out_group)[source]
Create an Andata object from a version 2.0.0 archive format acq.
- Parameters:
cls – class of object to create
acq_files (filename, h5py.File or list thereof or filename pattern) – Files to convert from acquisition format to analysis format. Filename patterns with wild cards (e.g. “foo*.h5”) are supported.
start (int) – What frame to start at in the full set of files.
stop (int) – What frame to stop at in the full set of files.
prod_sel (1D data selection) – Valid numpy index for a 1D array, specifying what data to read along the correlation product axis.
freq_sel (1D data selection) – Valid numpy index for a 1D array, specifying what data to read along the frequency axis.
datasets (list of strings) – Names of datasets to include from acquisition files. Default is to include all datasets found in the acquisition files.
out_group (h5py.Group, hdf5 filename or memh5.Group) – Underlying hdf5 like container that will store the data for the BaseData instance.
- Returns:
andata – The andata object for the requested data.
- Return type:
cls instance