ch_util.rfi

Tools for RFI flagging

This module contains tools for finding and removing Radio Frequency Interference (RFI).

Note that this generates masks where the elements containing RFI are marked as True, and the remaining elements are marked False. This is in contrast to the routines in ch_pipeline.rfi which generates a inverse noise weighting, where RFI containing elements are effectively False, and the remainder are True.

There are general purpose routines for flagging RFI in andata like datasets:

For more control there are specific routines that can be called:

Functions

flag_dataset(data[, freq_width, time_width, ...])

RFI flag the dataset.

frequency_mask(freq_centre[, freq_width, ...])

Flag known bad frequencies.

get_autocorrelations(data[, stack, normalize])

Extract autocorrelations from a data stack.

highpass_delay_filter(freq, tau_cut, flag[, ...])

Construct a high-pass delay filter.

iterative_hpf_masking(freq, y[, flag, ...])

Mask features in a spectrum that have significant power at high delays.

mad_cut_1d(data[, twidth, threshold, mask])

Mask out RFI using a median absolute deviation cut in the time direction.

mad_cut_2d(data[, fwidth, twidth, ...])

Mask out RFI using a median absolute deviation cut in time-frequency blocks.

mad_cut_rolling(data[, fwidth, twidth, ...])

Mask out RFI by placing a cut on the absolute deviation.

nanmedian(*args, **kwargs)

number_deviations(data[, freq_width, ...])

Calculate the number of median absolute deviations (MAD) of the autocorrelations from the local median.

sir(basemask[, eta, only_freq, only_time])

Apply the SIR operator over the frequency and time axes for each product.

sir1d(basemask[, eta])

Numpy implementation of the scale-invariant rank (SIR) operator.

spectral_cut(data[, fil_window, only_autos])

Flag out the TV bands, or other constant spectral RFI.

ch_util.rfi.flag_dataset(data, freq_width=10.0, time_width=420.0, threshold=5.0, flag1d=False, rolling=False)[source]

RFI flag the dataset. This function wraps number_deviations, and remains largely for backwards compatability. The pipeline code now calls number_deviations directly.

Parameters:
  • data (andata.CorrData) – Must contain vis and weight attribute that are both np.ndarray[nfreq, nprod, ntime]. Note that this function does not work with CorrData that has been stacked over redundant baselines.

  • freq_width (float) – Frequency interval in MHz to compare across.

  • time_width (float) – Time interval in seconds to compare.

  • threshold (float) – Threshold in MAD over which to cut out RFI.

  • rolling (bool) – Use a rolling window instead of distinct blocks.

  • flag1d (bool, optional) – Only apply the MAD cut in the time direction. This is useful if the frequency coverage is sparse.

Returns:

mask – RFI mask, output shape is the same as input visibilities.

Return type:

np.ndarray

ch_util.rfi.frequency_mask(freq_centre: ndarray, freq_width: ndarray | float | None = None, timestamp: ndarray | float | None = None, instrument: str | None = 'chime') ndarray[source]

Flag known bad frequencies.

Time dependent static RFI flags that affect the recent observations are added.

Parameters:
  • freq_centre – Centre of each frequency channel

  • freq_width – Width of each frequency channel. If None (default), calculate the width from the frequency centre separation. If supplied as an array it must be broadcastable against freq_centre.

  • timestamp – UNIX observing time. If None (default) mask all specified bands regardless of their start/end times, otherwise mask only timestamps within the band start and end times. If supplied as an array it must be broadcastable against freq_centre.

  • instrument – Telescope name. [kko, gbo, hco, chime (default)]

Returns:

An array marking the bad frequency channels. The final shape is the result of broadcasting freq_centre and timestamp together.

Return type:

mask

ch_util.rfi.get_autocorrelations(data, stack: bool = False, normalize: bool = False) tuple[ndarray, ndarray, ndarray][source]

Extract autocorrelations from a data stack.

Parameters:
  • data (andata.CorrData) – Must contain vis and weight attributes that are both np.ndarray[nfreq, nprod, ntime].

  • stack (bool, optional) – Average over all autocorrelations.

  • normalize (bool, optional) – Normalize by the median value over time prior to averaging over autocorrelations. Only relevant if stack is True.

Returns:

  • auto_ii (np.ndarray[ninput,]) – Index of the inputs that have been processed. If stack is True, then [0] will be returned.

  • auto_vis (np.ndarray[nfreq, ninput, ntime]) – The autocorrelations that were used to calculate the number of deviations.

  • auto_flag (np.ndarray[nfreq, ninput, ntime]) – Indices where data weights are positive

ch_util.rfi.highpass_delay_filter(freq, tau_cut, flag, epsilon=1e-10)[source]

Construct a high-pass delay filter.

The stop band will range from [-tau_cut, tau_cut]. DAYENU is used to construct the filter in the presence of masked frequencies. See Ewall-Wice et al. 2021 (arXiv:2004.11397) for a description.

Parameters:
  • freq (np.ndarray[nfreq,]) – Frequency in MHz.

  • tau_cut (float) – The half width of the stop band in micro-seconds.

  • flag (np.ndarray[nfreq,]) – Boolean flag that indicates what frequencies are valid.

  • epsilon (float) – The stop-band rejection of the filter.

Returns:

pinv – High pass delay filter.

Return type:

np.ndarray[nfreq, nfreq]

ch_util.rfi.iterative_hpf_masking(freq, y, flag=None, tau_cut=0.6, epsilon=1e-10, window=65, threshold=6.0, nperiter=1, niter=40, timestamp=None)[source]

Mask features in a spectrum that have significant power at high delays.

Uses the following iterative procedure to generate the mask:

  • Apply a high-pass filter to the spectrum.

  • For each frequency channel, calculate the median absolute deviation of nearby frequency channels to get an estimate of the noise. Divide the high-pass filtered spectrum by the noise estimate.

  • Mask excursions with the largest signal to noise.

  • Regenerate the high-pass filter using the new mask.

  • Repeat.

The procedure stops when the maximum number of iterations is reached or there are no excursions beyond some threshold.

Parameters:
  • freq (np.ndarray[nfreq,]) – Frequency in MHz.

  • y (np.ndarray[nfreq,]) – Spectrum to search for narrowband features.

  • flag (np.ndarray[nfreq,]) – Boolean flag where True indicates valid data.

  • tau_cut (float) – Cutoff of the high-pass filter in microseconds.

  • epsilon (float) – Stop-band rejection of the filter.

  • threshold (float) – Number of median absolute deviations beyond which a frequency channel is considered an outlier.

  • window (int) – Width of the window used to estimate the noise (by calculating a local median absolute deviation).

  • nperiter (int) – Maximum number of frequency channels to flag on any iteration.

  • niter (int) – Maximum number of iterations.

  • timestamp (float) – Start observing time (in unix time)

Returns:

  • yhpf (np.ndarray[nfreq,]) – The high-pass filtered spectrum generated using the mask from the last iteration.

  • flag (np.ndarray[nfreq,]) – Boolean flag where True indicates valid data. This is the logical complement to the mask from the last iteration.

  • rsigma (np.ndarray[nfreq,]) – The local median absolute deviation from the last iteration.

ch_util.rfi.mad_cut_1d(data, twidth=42, threshold=5.0, mask=True)[source]

Mask out RFI using a median absolute deviation cut in the time direction.

This is useful for datasets with sparse frequency coverage. Functionally this routine is equivalent to mad_cut_2d() with fwidth = 1, but will be much faster.

Parameters:
  • data (np.ndarray[freq, time]) – Array of data to mask.

  • twidth (integer, optional) – Number of time samples to average median over.

  • threshold (scalar, optional) – Number of median deviations above which we cut the data.

  • mask (boolean, optional) – If True return the mask, if False return the number of median absolute deviations.

Returns:

mask – Mask or number of median absolute deviations for each sample.

Return type:

np.ndarray[freq, time]

ch_util.rfi.mad_cut_2d(data, fwidth=64, twidth=42, threshold=5.0, freq_flat=True, mask=True)[source]

Mask out RFI using a median absolute deviation cut in time-frequency blocks.

Parameters:
  • data (np.ndarray[freq, time]) – Array of data to mask.

  • fwidth (integer, optional) – Number of frequency samples to average median over.

  • twidth (integer, optional) – Number of time samples to average median over.

  • threshold (scalar, optional) – Number of median deviations above which we cut the data.

  • freq_flat (boolean, optional) – Flatten in the frequency direction by dividing through by the median.

  • mask (boolean, optional) – If True return the mask, if False return the number of median absolute deviations.

Returns:

mask – Mask or number of median absolute deviations for each sample.

Return type:

np.ndarray[freq, time]

ch_util.rfi.mad_cut_rolling(data, fwidth=64, twidth=42, threshold=5.0, freq_flat=True, mask=True, limit_range: slice = slice(None, None, None))[source]

Mask out RFI by placing a cut on the absolute deviation. Compared to mad_cut_2d, this function calculates the median and median absolute deviation using a rolling 2D median filter, i.e., for every (freq, time) sample a separate estimates of these statistics is obtained for a window that is centered on that sample.

For sparsely sampled frequency axis, set fwidth = 1.

Parameters:
  • data (np.ndarray[freq, time]) – Array of data to mask.

  • fwidth (integer, optional) – Number of frequency samples to calculate median over.

  • twidth (integer, optional) – Number of time samples to calculate median over.

  • threshold (scalar, optional) – Number of median absolute deviations above which we cut the data.

  • freq_flat (boolean, optional) – Flatten in the frequency direction by dividing each frequency by the median over time.

  • mask (boolean, optional) – If True return the mask, if False return the number of median absolute deviations.

  • limit_range (slice, optional) – Data is limited to this range in the freqeuncy axis. Defaults to slice(None).

Returns:

mask – Mask or number of median absolute deviations for each sample.

Return type:

np.ndarray[freq, time]

ch_util.rfi.number_deviations(data, freq_width=10.0, time_width=420.0, flag1d=False, apply_static_mask=False, rolling=False, stack=False, normalize=False, fill_value=None)[source]

Calculate the number of median absolute deviations (MAD) of the autocorrelations from the local median.

Parameters:
  • data (andata.CorrData) – Must contain vis and weight attributes that are both np.ndarray[nfreq, nprod, ntime].

  • freq_width (float) – Frequency interval in MHz to compare across.

  • time_width (float) – Time interval in seconds to compare across.

  • flag1d (bool) – Only apply the MAD cut in the time direction. This is useful if the frequency coverage is sparse.

  • apply_static_mask (bool) – Apply static mask obtained from frequency_mask before computing the median absolute deviation.

  • rolling (bool) – Use a rolling window instead of distinct blocks.

  • stack (bool) – Average over all autocorrelations.

  • normalize (bool) – Normalize by the median value over time prior to averaging over autocorrelations. Only relevant if stack is True.

  • fill_value (float) – Data that was already flagged as bad will be set to this value in the output array. Should be a large positive value that is greater than the threshold that will be placed. Default is float(‘Inf’).

Returns:

  • auto_ii (np.ndarray[ninput,]) – Index of the inputs that have been processed. If stack is True, then [0] will be returned.

  • auto_vis (np.ndarray[nfreq, ninput, ntime]) – The autocorrelations that were used to calculate the number of deviations.

  • ndev (np.ndarray[nfreq, ninput, ntime]) – Number of median absolute deviations of the autocorrelations from the local median.

ch_util.rfi.sir(basemask, eta=0.2, only_freq=False, only_time=False)[source]

Apply the SIR operator over the frequency and time axes for each product.

This is a wrapper for sir1d. It loops over times, applying sir1d across the frequency axis. It then loops over frequencies, applying sir1d across the time axis. It returns the logical OR of these two masks.

Parameters:
  • basemask (np.ndarray[nfreq, nprod, ntime] of boolean type) – The previously generated threshold mask. 1 (True) for masked points, 0 (False) otherwise.

  • eta (float) – Aggressiveness of the method: with eta=0, no additional samples are flagged and the function returns basemask. With eta=1, all samples will be flagged.

  • only_freq (bool) – Only apply the SIR operator across the frequency axis.

  • only_time (bool) – Only apply the SIR operator across the time axis.

Returns:

mask – The mask after the application of the SIR operator.

Return type:

np.ndarray[nfreq, nprod, ntime] of boolean type

ch_util.rfi.sir1d(basemask, eta=0.2)[source]

Numpy implementation of the scale-invariant rank (SIR) operator.

For more information, see arXiv:1201.3364v2.

Parameters:
  • basemask (numpy 1D array of boolean type) – Array with the threshold mask previously generated. 1 (True) for flagged points, 0 (False) otherwise.

  • eta (float) – Aggressiveness of the method: with eta=0, no additional samples are flagged and the function returns basemask. With eta=1, all samples will be flagged. The authors in arXiv:1201.3364v2 seem to be convinced that 0.2 is a mostly universally optimal value, but no optimization has been done on CHIME data.

Returns:

mask – The mask after the application of the (SIR) operator. Same shape and type as basemask.

Return type:

numpy 1D array of boolean type

ch_util.rfi.spectral_cut(data, fil_window=15, only_autos=False)[source]

Flag out the TV bands, or other constant spectral RFI.

Parameters:
  • data (andata.obj) – If only_autos shape is (freq, n_feeds, time), else (freq, n_prod, time).

  • fil_window (integer) – Window of median filter for baseline of chime spectrum. Default is 15.

  • only_autos (boolean) – Whether data contains only autos or not.

Returns:

mask – RFI mask (no product axis).

Return type:

np.ndarray[freq,time]