4 Advanced techniques
In this section we will introduce a few of the more sophisticated techniques and algorithms offered by SoFiA for the purpose of improving the quality of the source finding and parameterisation output.
4.1 Other source finding algorithms
While we have only used the S + C finder so far, SoFiA offers several alternative source finding algorithms that may be more suitable to some problems and data sets. An overview of these algorithms and their performance is given in Popping et al. (2012, PASA, 29, 318).
Characterised Noise H Ⅰ (CNHI) finder
The CNHI finder (Jurek 2012, PASA, 29, 251) is best suited for H Ⅰ data cubes in which the sources are resolved in the spectral domain, but only marginally resolved in the spatial domain. It applies a statistical test (Kuiper’s test) to identify regions in the H Ⅰ spectrum that are inconsistent with statistical noise. In other words, instead of looking for sources, the CNHI finder tries to identify regions that don’t appear to be purely noise.
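The idea behind this test can be illustrated with a minimal sketch in plain Python. It computes a two-sample Kuiper statistic, V = D⁺ + D⁻, between the flux values inside a spectral window and those in the rest of the spectrum. This is not the CNHI implementation: the function names, the synthetic spectrum, the window positions and the choice of the remaining channels as the reference sample are all assumptions made for illustration, and the conversion of V into a probability is omitted.

```python
import random
from bisect import bisect_right


def kuiper_statistic(sample_a, sample_b):
    """Two-sample Kuiper statistic V = D+ + D-."""
    a, b = sorted(sample_a), sorted(sample_b)
    d_plus = d_minus = 0.0
    # The empirical CDFs only change at data values, so checking those suffices.
    for x in a + b:
        diff = bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b)
        d_plus = max(d_plus, diff)
        d_minus = max(d_minus, -diff)
    return d_plus + d_minus


def window_statistic(spectrum, start, stop):
    """Compare channels [start, stop) with the remainder of the spectrum."""
    inside = spectrum[start:stop]
    outside = spectrum[:start] + spectrum[stop:]
    return kuiper_statistic(inside, outside)


# Synthetic spectrum (assumption): unit Gaussian noise plus a hypothetical
# 20-channel emission feature. This is not the SoFiA test data cube.
rng = random.Random(42)
spectrum = [rng.gauss(0.0, 1.0) for _ in range(400)]
for channel in range(180, 200):
    spectrum[channel] += 4.0

v_source = window_statistic(spectrum, 180, 200)  # window on the feature
v_noise = window_statistic(spectrum, 40, 60)     # window on pure noise
print(f"V(source window) = {v_source:.2f}, V(noise window) = {v_noise:.2f}")
```

A window covering the emission feature should yield a markedly larger statistic than a window containing only noise, which is the property a finder of this kind exploits.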
In order to use the CNHI finder, all we need to do is navigate to the “Source Finding” tab in SoFiA, disable the “Smooth + Clip Finder” module and enable the “CNHI Finder” module. Again, there are several settings that control the CNHI finder. These are provided in the file SoFiA_Tutorial_Section_4.1_CNHI.par (remember to change the input file path to point to the location of the cube on your computer). Given the statistical nature of the algorithm, most of these parameters are somewhat less intuitive than those of the S + C finder.
- Probability: This defines the probability (as determined from Kuiper’s test) below which the data are considered to be inconsistent with pure noise and hence treated as a source. Useful values are typically in the range of 10⁻⁷ to 10⁻³. We will set this to a value of 1e-7 here.
- Quality: This is the Q value of Kuiper’s test, a heuristic parameter that is used to assess the accuracy of the probability calculated from Kuiper’s test. We will set this to a value of 5.0.
- Min. / Max. scale: These define the minimum and maximum size of the spectral regions to be tested. The maximum scale parameter can be set to -1, in which case it defaults to half the size of the spectral axis. We will explicitly set them to 10 and 25, respectively.
- Median test: If enabled, the CNHI finder will additionally require all regions identified as possible sources to have a median greater than that of the remaining data. We will leave this option enabled (which is the default setting).
The “CNHI Finder” section should now look as shown in Fig. 10. In addition to these settings, we also need to modify some of the settings in the “Merging” tab from those established in ☛ Section 3.3. Specifically, we need to increase the values of the “Min. size X / Y / Z” parameters from 5 to 8. We then run the pipeline again, and SoFiA should detect all four sources that were also found by the S + C finder run described in ☛ Section 3.
Figure 10: Settings of the CNHI finder used in the example in Section 4.1.
As noted before, the CNHI finder uses statistical methods to detect sources. Its different settings are therefore less intuitive, and some level of experimentation is usually required to optimise its performance. It should also be noted that the SoFiA test data cube is not a particularly suitable data set for this algorithm, because the galaxies contained in the cube are spatially extended, whereas the CNHI finder works best for sources that are spatially unresolved or only marginally resolved, such as galaxies at higher redshift.
2D–1D wavelet decomposition
Another useful source finding method implemented in SoFiA is based on decomposition of the data cube into wavelets of different scales (Flöer & Winkel 2012, PASA, 29, 244). The algorithm implemented in SoFiA specifically treats the spatial and spectral wavelet scales separately to account for the fact that the spatial extent of H Ⅰ sources often differs from their spectral extent in terms of the number of pixels / channels covered by the source (hence the name 2D–1D wavelet decomposition, referring to two spatial dimensions and one spectral dimension).
Figure 11: Application of the 2D–1D wavelet decomposition filter on a channel map of the SoFiA test data cube (left) creates a noise-free map of wavelet components (right). A simple threshold finder can then be applied to extract the three galaxies (labelled here with arbitrary numbers).
The 2D–1D wavelet decomposition algorithm does not constitute a source finder as such, but is rather implemented as an input filter in SoFiA. Hence, it is found under the “Input Filter” tab in the GUI. The algorithm essentially decomposes the cube into wavelet components on different scales and then reconstructs the entire cube by only including significant signal from the individual wavelet components. This will generally get rid of most of the image noise, but retain signal from sufficiently bright sources in the field (see Fig. 11). A simple threshold source finder can then be used to extract sources from the reconstructed cube. The settings used in this example are provided in the file SoFiA_Tutorial_Section_4.1_Wavelet.par (remember to change the input file path to point to the location of the cube on your computer).
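The decompose-threshold-reconstruct principle can be sketched in one dimension with plain Python. The example below uses a generic “à trous” decomposition with a B3-spline kernel and a single reconstruction pass. It is a simplified stand-in rather than SoFiA's 2D–1D algorithm (which works on three-dimensional cubes and iterates). The function names, the kernel, the robust rms estimate and the synthetic spectrum are all assumptions chosen for illustration.

```python
import math
import random
import statistics

B3_KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)


def smooth_with_holes(values, step):
    """Convolve with the B3-spline kernel whose taps are `step` samples apart."""
    n = len(values)
    out = []
    for i in range(n):
        total = 0.0
        for tap, weight in enumerate(B3_KERNEL):
            k = i + (tap - 2) * step
            if k < 0:            # mirror indices that fall off the left edge
                k = -k
            if k >= n:           # mirror indices that fall off the right edge
                k = 2 * (n - 1) - k
            total += weight * values[k]
        out.append(total)
    return out


def decompose(values, n_scales):
    """Return (wavelet planes, smooth residual); their sum equals the input."""
    planes, smooth = [], list(values)
    for scale in range(n_scales):
        smoother = smooth_with_holes(smooth, 2 ** scale)
        planes.append([s - t for s, t in zip(smooth, smoother)])
        smooth = smoother
    return planes, smooth


def reconstruct(planes, threshold=5.0, positivity=True):
    """Sum only coefficients exceeding threshold x robust rms of their plane."""
    result = [0.0] * len(planes[0])
    for plane in planes:
        median = statistics.median(plane)
        rms = 1.4826 * statistics.median([abs(c - median) for c in plane])
        for i, c in enumerate(plane):
            value = c if positivity else abs(c)
            if value > threshold * rms:
                result[i] += c
    return result


# Synthetic spectrum (assumption): unit Gaussian noise plus one Gaussian line.
rng = random.Random(1)
n, centre = 512, 256
signal = [rng.gauss(0.0, 1.0) + 10.0 * math.exp(-0.5 * ((i - centre) / 4.0) ** 2)
          for i in range(n)]

planes, residual = decompose(signal, n_scales=4)
filtered = reconstruct(planes, threshold=5.0, positivity=True)
peak = max(range(n), key=filtered.__getitem__)
print(f"Peak of the filtered spectrum at channel {peak}")
```

In this sketch the smooth residual is deliberately left out of the reconstruction, so the filtered spectrum is zero wherever no wavelet coefficient is significant. Whether and how SoFiA treats the residual is not covered by this tutorial.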
We will first need to set up the 2D–1D wavelet filter found under the “Input Filter” tab. After enabling the filter, we then apply the following settings:
- Threshold: This is the relative threshold in units of the rms noise level for wavelet components to be included in the reconstructed cube. We will set it to 5.0 here, which is its default value.
- Iterations: The number of iterations in the reconstruction process. Again, we will leave this at its default value of 3.
- Scale XY / Z: This defines the number of spatial / spectral scales to be used in the reconstruction process. Leaving both at their default value of -1 will tell SoFiA to automatically determine the optimal number of scales based on the cube dimensions.
- Positivity: We will enable this to ensure that only positive wavelet components are added to the reconstructed cube. Otherwise, both positive and negative signals whose absolute value is above the threshold will be included.
With the wavelet decomposition filter set up, we will next have to choose and set up a source finding algorithm to run on the reconstructed cube. The most obvious choice is SoFiA’s threshold finder, designed to apply a simple flux threshold to the data. Under the “Source Finding” tab we enable the threshold finder and disable the S + C and CNHI finders. In the threshold finder we then set the clip mode to absolute and the threshold to 0.0005, i.e. 0.5 mJy. In addition, the “Min. size X / Y / Z” settings under the “Merging” tab should be set to a value of 10.
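What an absolute flux threshold followed by a minimum-size cut amounts to can be sketched on a small two-dimensional map in plain Python. The function `find_sources`, the 4-connectivity rule, the toy image and the minimum size of 3 pixels are assumptions for illustration only. SoFiA's own threshold finder and merging module operate on the full three-dimensional cube and are not reproduced here.

```python
def find_sources(image, threshold, min_size):
    """Label 4-connected pixels above `threshold` and keep regions whose
    bounding box spans at least `min_size` pixels in both x and y."""
    ny, nx = len(image), len(image[0])
    seen = [[False] * nx for _ in range(ny)]
    sources = []
    for y0 in range(ny):
        for x0 in range(nx):
            if seen[y0][x0] or image[y0][x0] <= threshold:
                continue
            # Flood-fill the connected region starting at (y0, x0).
            seen[y0][x0] = True
            stack, pixels = [(y0, x0)], []
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    yy, xx = y + dy, x + dx
                    if (0 <= yy < ny and 0 <= xx < nx and not seen[yy][xx]
                            and image[yy][xx] > threshold):
                        seen[yy][xx] = True
                        stack.append((yy, xx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            if (max(ys) - min(ys) + 1 >= min_size
                    and max(xs) - min(xs) + 1 >= min_size):
                sources.append(pixels)
    return sources


# Toy map in Jy (assumption): one 6 x 6 pixel source and one isolated spike.
image = [[0.0] * 20 for _ in range(20)]
for y in range(5, 11):
    for x in range(8, 14):
        image[y][x] = 0.002
image[15][2] = 0.001

sources = find_sources(image, threshold=0.0005, min_size=3)
print(f"{len(sources)} source(s) kept after the size cut")
```

The isolated spike exceeds the 0.5 mJy threshold but fails the size cut, which is the role the “Min. size” settings play in suppressing small spurious detections.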
Next, we run the source finding pipeline again, and if everything is correctly set up, SoFiA should detect all three galaxies present in the data cube, again breaking up the edge-on galaxy near the southern edge of the cube into two separate detections (hence four detections overall). It is also instructive to take a look at the actual reconstructed cube. This can be done by checking the “Filtered cube” option in the “Data products” settings of the “Output Data Products” section under the “Output” tab. Rerunning the pipeline should then produce an additional file called sofiatestcube_filtered.fits that contains a copy of the reconstructed cube. A single channel map from that cube is shown in the right-hand panel of Fig. 11 and illustrates the capability of the 2D–1D wavelet algorithm to suppress noise in a data cube and highlight the underlying source emission on larger spatial and spectral scales.
Finally, it should be noted at this stage that the 2D–1D wavelet algorithm implemented in SoFiA has not yet been optimised and currently occupies a large amount of memory (about 40 times the size of the input data cube). Improving the algorithm’s memory footprint is work in progress.
4.2 Improving completeness and reliability
The aim of any source finding effort is to detect as many sources as possible right down to the statistical noise level of the data cube. However, when decreasing the detection threshold to pick up fainter sources, we will also inevitably increase the number of false positives, most of which will be noise peaks or signals from radio-frequency interference. In other words, increasing the completeness of our catalogue will at the same time decrease its reliability.
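How steeply the number of noise peaks rises with a decreasing threshold can be estimated with a short calculation. The sketch below assumes ideal, uncorrelated Gaussian noise and a hypothetical cube of one million independent voxels. Real cubes have correlated noise, and SoFiA merges neighbouring voxels into detections, so the numbers are only an order-of-magnitude illustration and not a prediction of catalogue sizes.

```python
import math


def false_positive_fraction(threshold_sigma):
    """Fraction of pure Gaussian noise samples above +threshold_sigma."""
    return 0.5 * math.erfc(threshold_sigma / math.sqrt(2.0))


n_voxels = 1_000_000  # hypothetical number of independent voxels
for threshold in (5.0, 4.0, 3.0):
    expected = n_voxels * false_positive_fraction(threshold)
    print(f"{threshold:.0f} sigma: about {expected:.1f} noise voxels expected")
```

Going from 5 σ to 3 σ increases the expected number of noise voxels above the threshold by more than three orders of magnitude under these assumptions.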
Fortunately, SoFiA comes to the rescue with a powerful algorithm that allows us to determine the reliability of each detected source in a statistical way (Serra et al. 2012, PASA, 29, 296). This “Reliability Calculation” method can be found under the “Parameterisation” tab in the GUI. The algorithm makes the fundamental assumption that all astronomical signal in the data cube will have positive flux, whereas all negative signals must be due to statistical noise. In addition, the assumption is made that the noise is symmetric about zero, i.e. the flux distribution of positive noise peaks is the same as that of negative noise peaks. Based on these assumptions, the algorithm then determines the density of positive and negative sources in an N-dimensional source parameter space around the position of each positive detection and uses these to calculate the probability of the positive signal being a genuine source as opposed to a noise peak.
Of course, this method will only produce meaningful results if enough positive and negative noise peaks have been detected to ensure that the calculated probability is statistically significant. Therefore, the reliability calculation algorithm will usually only work with very low detection thresholds of typically 4 σ and lower. The great advantage, however, is that we can use the calculated reliabilities to filter out all detections with low reliability from our catalogue and hence produce a much more reliable and complete source catalogue down to low flux detection thresholds. Being able to apply a lower detection threshold also means that we can detect weaker sources as well as faint components associated with bright objects (e.g., extra-planar gas in galaxies).
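The principle can be sketched in plain Python as follows. For each positive detection, the densities P and N of positive and negative detections at its position in parameter space are estimated with Gaussian kernels, and the reliability is taken as R = (P − N) / P, which is our reading of the approach of Serra et al. (2012). The two-dimensional parameter space, the kernel widths, the synthetic catalogue and all function names are assumptions for illustration. SoFiA's actual parameter space and kernel handling are not reproduced here.

```python
import math
import random


def kernel_density(point, sources, kernel_widths):
    """Sum of Gaussian kernels centred on `sources`, evaluated at `point`."""
    total = 0.0
    for src in sources:
        chi2 = sum(((p - s) / w) ** 2
                   for p, s, w in zip(point, src, kernel_widths))
        total += math.exp(-0.5 * chi2)
    return total


def reliability(point, positives, negatives, kernel_widths):
    """R = (P - N) / P, clipped at zero where negatives dominate."""
    p = kernel_density(point, positives, kernel_widths)  # >= 1 (self term)
    n = kernel_density(point, negatives, kernel_widths)
    return max(0.0, (p - n) / p)


# Synthetic catalogue (assumption): noise peaks of either sign share the same
# distribution in two hypothetical source parameters; three genuine sources
# lie well away from that distribution.
rng = random.Random(0)
noise_pos = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(200)]
noise_neg = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(200)]
genuine = [(2.0, 2.0), (2.5, 1.8), (3.0, 2.6)]

positives = noise_pos + genuine
widths = (0.3, 0.3)  # arbitrary kernel widths for this sketch
kept = [src for src in positives
        if reliability(src, positives, noise_neg, widths) >= 0.9]
print(f"{len(kept)} of {len(positives)} positive detections have R >= 0.9")
```

Because the common kernel normalisation cancels in the ratio, it is omitted. Isolated noise peaks in the sparsely populated tails of the distribution can still receive a high reliability in such a toy model, which is one reason why a sufficient number of negative detections is needed.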
Figure 12: Moment-0 maps of the SoFiA test data cube after running the S + C finder with a threshold of 5 σ (left), 3 σ (centre) and 3 σ + reliability threshold of 0.9 (right). Note the great improvement in reliability in the latter case, as well as the merging of the two halves of the edge-on galaxy near the southern edge of the cube into a single source.
Let’s see how well the algorithm works by picking up our source finding example from ☛ Section 3 again. As you may remember, one of the galaxies got broken up into two separate sources in that example (left-hand panel of Fig. 12), so let’s see if we can rectify this issue by decreasing our detection threshold without picking up any false positives at the same time. The settings for this example are provided in the file SoFiA_Tutorial_Section_4.2_Reliability.par (remember to change the input file path to point to the location of the cube on your computer).
In our original parameter settings from ☛ Section 3, we first need to change the threshold of the S + C finder from 5.0 to 3.0. If we now run the pipeline again with the lower threshold, the number of detections in the final catalogue will increase dramatically from 4 to 71. A quick inspection of the moment images produced by SoFiA reveals that the overwhelming majority of these are false detections caused by noise peaks (central panel of Fig. 12), although some additional extra-planar gas associated with the largest galaxy in the field is also detected.
Figure 13: Two projections of the three-dimensional parameter space in which SoFiA calculates the reliability of detections. Sources with positive flux are shown in blue, negative sources in red. The three isolated blue dots, corresponding to the three galaxies in the data cube, populate a highly reliable region of parameter space where there are no negative signals.
Now we switch on the “Reliability Calculation” module in the “Parameterisation” tab of the GUI and set the threshold to 0.9 (which should be its default value). In addition, we set the kernel scale to a value of 0.55. SoFiA will then calculate the reliability of each detection and remove all detections from the catalogue whose reliability is found to be below 90%. Running the pipeline again with the reliability calculation turned on will now result in a catalogue of only 3 sources, corresponding to the three galaxies contained in the cube. Note that all false positives have been removed from the catalogue (right-hand panel of Fig. 12 and Fig. 13). Most importantly, thanks to our lower detection threshold, the two halves of the edge-on galaxy near the southern edge of the cube have now been merged into a single object.
Note
The reliability calculation module offers an option to produce diagnostic plots (in PDF format) that can be used to inspect the distribution of positive and negative sources in parameter space (see Fig. 13 for an example). This can be helpful in assessing whether there are enough positive and negative detections for accurate reliability determination. The higher the density of negative signals in parameter space, the more accurate the reliability calculation will be. To enable diagnostic plots, simply activate the corresponding check box in the “Reliability Calculation” section of the “Parameterisation” tab in the GUI.