NOTE: this tutorial is designed for version 1.4. Please update SPUTNIK before from https://github.com/paoloinglese/SPUTNIK.
Welcome to SPUTNIK Github Page!
SPUTNIK (Inglese et al. 2018) is a filter tool for Mass Spectrometry Imaging (MSI) data. It is designed to identify and remove the mass spectral peaks that are spatially unrelated with the sample of interest. Often MSI data contain signal generated by the analytical technique used for the ion desorption. For instance, DESI-MSI data can contain solvent-related signals, or, in the case of MALDI-MSI, matrix-related signal can be part of the data. These signals are non-informative about the molecular content of the analyzed sample, and can interfere with the statistical analysis of the data. Furthermore, the presence of these signals makes the dimensionality of the data uselessly big. SPUTNIK uses three classes of filters to identify and remove these types of signals.
The first family of filters, called general similarity filters, uses the expected location of the sample (ROI) to determine if a signal is informative or not. Simple similarity measures between the ROI, or a continuous reference signal representing the area occupied by the sample, and the images generated by the peak signals are used to determine whether a peak is informative. The continuous reference and ROI images can be calculated from the MSI data or imported. There are four methods to calculate the continuous reference:
There are also four methods to calculate the ROI:
For instance, continuous reference and ROI from Rompp et al. MALDI-MSI dataset are calculated as follows.
library(SPUTNIK)
## Registered S3 method overwritten by 'spatstat.geom':
## method from
## plot.imlist imager
SPUTNIK requires as input:
These can be generated by several tools, such as MALDIquant (see tutorial).
NOTE The intensity matrix must contain the raw intensities (non-normalized or transformed) and must contain all finite values (NA
needs to be converted to 0).
In this case, we will use an example MALDI-MSI dataset. Let’s load the data
bladderMALDI <- bladderMALDIRompp2010()
## Downloading the data from the repository...
## Loading the data in the R environment...
The loaded dataset consists of 34,840 pixels and 1,175 peaks (features):
dim(bladderMALDI)
## [1] 34840 1175
The spatial and mass information is stored as attributes of the matrix:
names(attributes(bladderMALDI))
## [1] "dim" "dimnames" "mass" "size"
Then, we create the dataset object using the msiDataset
command:
msiX <- msiDataset(values = bladderMALDI,
mz = attr(bladderMALDI, 'mass'),
rsize = attr(bladderMALDI, 'size')[1],
csize = attr(bladderMALDI, 'size')[2])
## Creating msiDataset object...
## Detecting constant peaks...
## Generating image of detected peaks...
## Generating total-ion-count image...
The first thing to do is to plot the image of the number of detected peaks and total ion count (TIC). These image are useful to capture the spatial location of the sample, as we expect its spectra to contain more ions or more intense signals:
plot(msiX@totalioncount)
plot(msiX@numdetected)
Unfortunately, in this case, it seems that more ions with higher TIC are detected outside of the tissue, we will need to remember this in the following steps.
Now we can normalize and apply the variable stabilizing transformation:
msiX <- normIntensity(msiX, 'PQN')
msiX <- varTransform(msiX, 'log2')
We can now generate the continuous and binary reference images.
NOTE: we are aligning the PCA image with the number of detected peaks, but , since the number of detected ions is greater outside of the tissue, we need to set the argument invertAlign
to TRUE:
ref.cont <- refImageContinuous(msiData = msiX, method = "pca",
alignTo = 'detected',
invertAlign = TRUE)
## Calculating first principal component...
plot(ref.cont)
The first principal component show clearly the area occupied by the sample.
Now we can generate the binary reference using k-means
NOTE: again we invert the binary ROI after aligning it to the number of detected peaks.
ref.bin <- refImageBinaryKmeans(dataset = msiX,
alignTo = "detected",
invertAligned = TRUE)
plot(ref.bin)
Often the binary reference contains several scattered pixels, as result of the presence of noise in the original data. The isolated pixels can be removed through the removeSmallObjects
command. Now we remove all the connected objects smaller than 50 pixels.
ref.bin <- removeSmallObjects(ref.bin, threshold = 50)
plot(ref.bin)
The global reference filter uses the similarity between the peak intensity image and the reference image In this case, we calculate the normalized mutual information (NMI) with the binary reference.
The argument cores
> 1 allows multicore processing.
gpf <- globalPeaksFilter(msiData = msiX, referenceImage = ref.bin,
method = 'nmi', threshold = 0, cores = 1)
## Calculating the similarity values...
##
|
| | 0%
## Similarity measure quantiles (after removing NAs):
## 0% 25% 50% 75% 100%
## -0.304686005 -0.012251494 0.002941737 0.010944608 0.270602660
## Selecting peaks with similarity larger than 0...
Then, we can apply the filter to remove the uninformative peaks.
msiX <- applyPeaksFilter(msiX, gpf)
print(dim(getIntensityMat(msiX)))
## [1] 34840 669
We can check the effect of filtering by plotting the RGB image corresponding to the scores of the first 3 principal components. This gives a qualitative idea of the spatial structures present in the dataset
pca.img <- PCAImage(msiX)
plot(pca.img)
tic.img <- totalIonCountMSI(msiX)
plot(tic.img)
Pixel count filter overcomes the problems related with the standard approach of removing peaks that are found in less than a certain number of pixels (thresholding). This methodology does not take into account of the spatial structure of the peak signals and therefore can remove important, although localized in small regions, signals. Pixel count filter allows the user to define the minimum number of connected pixels for a peak signal to consider it informative. Also, through three levels of aggressiveness, it compares the size of the connected signal pixels inside and outside the sample ROI. Now we define a new msiDataset
object using the original MALDI-MSI data
msiX <- msiDataset(values = bladderMALDI,
mz = attr(bladderMALDI, 'mass'),
rsize = attr(bladderMALDI, 'size')[1],
csize = attr(bladderMALDI, 'size')[2])
## Creating msiDataset object...
## Detecting constant peaks...
## Generating image of detected peaks...
## Generating total-ion-count image...
We normalize and transform as before:
msiX <- normIntensity(msiX, 'PQN')
msiX <- varTransform(msiX, 'log2')
We use the binary reference generated in the previous section.
Now we define the filter, requesting the smallest number of connected signal pixels to be larger than 9. Also, we set the aggressiveness to 1, through the parameter agressive = 1
. In this way, we request that the largest connected object outside the ROI is smaller than the largest one within the ROI
cpf <- countPixelsFilter(msiData = msiX, roiImage = ref.bin, minNumPixels = 9,
aggressive = 1)
## Counting connected pixels within signal region...
After, we apply the filter and check the RGB image of the first three principal components scores
msiX <- applyPeaksFilter(msiX, cpf)
print(dim(getIntensityMat(msiX)))
## [1] 34840 101
We can then plot the RGB image representing the first 3 principal components. The contrast between the sample and off-sample regions is now much stronger:
pca.img <- PCAImage(msiX)
plot(pca.img)
The third class of filters use the complete spatial randomness tests to determine whether the binarized peak signal images represent a random point pattern. There are two available tests: Clark-Evans and Kolmogorov-Smirnov. We show here how to apply the Clark-Evans test on the MALDI-MSI data.
Again,
msiX <- msiDataset(values = bladderMALDI,
mz = attr(bladderMALDI, 'mass'),
rsize = attr(bladderMALDI, 'size')[1],
csize = attr(bladderMALDI, 'size')[2])
## Creating msiDataset object...
## Detecting constant peaks...
## Generating image of detected peaks...
## Generating total-ion-count image...
Let’s normalize and transform the intensities:
msiX <- normIntensity(msiX, 'PQN')
msiX <- varTransform(msiX, 'log2')
Now we run the filter
csr <- CSRPeaksFilter(msiData = msiX, method = 'ClarkEvans', verbose = TRUE,
cores = 1) # running in serial mode (parallel if cores > 1)
## Scaling all the peaks intensities in [0, 1]
## Testing ion images for complete spatial randomness...
##
|
| | 0%
|
| | 1%
|
|= | 1%
|
|= | 2%
|
|== | 2%
|
|== | 3%
|
|=== | 3%
|
|=== | 4%
|
|==== | 4%
|
|==== | 5%
|
|==== | 6%
|
|===== | 6%
|
|===== | 7%
|
|====== | 7%
|
|====== | 8%
|
|======= | 8%
|
|======= | 9%
|
|======== | 9%
|
|======== | 10%
|
|======== | 11%
|
|========= | 11%
|
|========= | 12%
|
|========== | 12%
|
|========== | 13%
|
|=========== | 13%
|
|=========== | 14%
|
|============ | 14%
|
|============ | 15%
|
|============ | 16%
|
|============= | 16%
|
|============= | 17%
|
|============== | 17%
|
|============== | 18%
|
|=============== | 18%
|
|=============== | 19%
|
|================ | 19%
|
|================ | 20%
|
|================ | 21%
|
|================= | 21%
|
|================= | 22%
|
|================== | 22%
|
|================== | 23%
|
|=================== | 23%
|
|=================== | 24%
|
|==================== | 24%
|
|==================== | 25%
|
|==================== | 26%
|
|===================== | 26%
|
|===================== | 27%
|
|====================== | 27%
|
|====================== | 28%
|
|======================= | 28%
|
|======================= | 29%
|
|======================== | 29%
|
|======================== | 30%
|
|======================== | 31%
|
|========================= | 31%
|
|========================= | 32%
|
|========================== | 32%
|
|========================== | 33%
|
|=========================== | 33%
|
|=========================== | 34%
|
|============================ | 34%
|
|============================ | 35%
|
|============================ | 36%
|
|============================= | 36%
|
|============================= | 37%
|
|============================== | 37%
|
|============================== | 38%
|
|=============================== | 38%
|
|=============================== | 39%
|
|================================ | 39%
|
|================================ | 40%
|
|================================ | 41%
|
|================================= | 41%
|
|================================= | 42%
|
|================================== | 42%
|
|================================== | 43%
|
|=================================== | 43%
|
|=================================== | 44%
|
|==================================== | 44%
|
|==================================== | 45%
|
|==================================== | 46%
|
|===================================== | 46%
|
|===================================== | 47%
|
|====================================== | 47%
|
|====================================== | 48%
|
|======================================= | 48%
|
|======================================= | 49%
|
|======================================== | 49%
|
|======================================== | 50%
|
|======================================== | 51%
|
|========================================= | 51%
|
|========================================= | 52%
|
|========================================== | 52%
|
|========================================== | 53%
|
|=========================================== | 53%
|
|=========================================== | 54%
|
|============================================ | 54%
|
|============================================ | 55%
|
|============================================ | 56%
|
|============================================= | 56%
|
|============================================= | 57%
|
|============================================== | 57%
|
|============================================== | 58%
|
|=============================================== | 58%
|
|=============================================== | 59%
|
|================================================ | 59%
|
|================================================ | 60%
|
|================================================ | 61%
|
|================================================= | 61%
|
|================================================= | 62%
|
|================================================== | 62%
|
|================================================== | 63%
|
|=================================================== | 63%
|
|=================================================== | 64%
|
|==================================================== | 64%
|
|==================================================== | 65%
|
|==================================================== | 66%
|
|===================================================== | 66%
|
|===================================================== | 67%
|
|====================================================== | 67%
|
|====================================================== | 68%
|
|======================================================= | 68%
|
|======================================================= | 69%
|
|======================================================== | 69%
|
|======================================================== | 70%
|
|======================================================== | 71%
|
|========================================================= | 71%
|
|========================================================= | 72%
|
|========================================================== | 72%
|
|========================================================== | 73%
|
|=========================================================== | 73%
|
|=========================================================== | 74%
|
|============================================================ | 74%
|
|============================================================ | 75%
|
|============================================================ | 76%
|
|============================================================= | 76%
|
|============================================================= | 77%
|
|============================================================== | 77%
|
|============================================================== | 78%
|
|=============================================================== | 78%
|
|=============================================================== | 79%
|
|================================================================ | 79%
|
|================================================================ | 80%
|
|================================================================ | 81%
|
|================================================================= | 81%
|
|================================================================= | 82%
|
|================================================================== | 82%
|
|================================================================== | 83%
|
|=================================================================== | 83%
|
|=================================================================== | 84%
|
|==================================================================== | 84%
|
|==================================================================== | 85%
|
|==================================================================== | 86%
|
|===================================================================== | 86%
|
|===================================================================== | 87%
|
|====================================================================== | 87%
|
|====================================================================== | 88%
|
|======================================================================= | 88%
|
|======================================================================= | 89%
|
|======================================================================== | 89%
|
|======================================================================== | 90%
|
|======================================================================== | 91%
|
|========================================================================= | 91%
|
|========================================================================= | 92%
|
|========================================================================== | 92%
|
|========================================================================== | 93%
|
|=========================================================================== | 93%
|
|=========================================================================== | 94%
|
|============================================================================ | 94%
|
|============================================================================ | 95%
|
|============================================================================ | 96%
|
|============================================================================= | 96%
|
|============================================================================= | 97%
|
|============================================================================== | 97%
|
|============================================================================== | 98%
|
|=============================================================================== | 98%
|
|=============================================================================== | 99%
|
|================================================================================| 99%
|
|================================================================================| 100%
By default, CSRPeaksFilter
calculates the p-values for the selected test, and applies the Bonferroni multiple testing correction. The adjusted p-values are saved in the element q.value
. We can use the 0.05 threshold to define the peak filter.
NOTE: The Clark-Evans test does not distinguish between signals localized outside or within the sample, so it is better to run it after the global reference filter.
We can check the distributions of p-values and q-values:
hist(csr$p.value)
hist(csr$q.value)
We define the new filter setting a threshold of 0.05 for the q.value
:
csrFilter <- createPeaksFilter(which(csr$q.value < 0.05))
and we can check the number of kept peaks:
# Number of kept peaks (non-random)
length(csrFilter$sel.peaks)
## [1] 369
We are ready to apply the filter on the dataset:
msiX <- applyPeaksFilter(msiX, csrFilter)
print(dim(getIntensityMat(msiX)))
## [1] 34840 369
Let’s have a look at the RGB image of the first 3 principal components:
pca.img <- PCAImage(msiX)
plot(pca.img)
As you can see, the CSR filter does not remove off-sample signals because they are not recognized as spatially random.
In this short tutorial, we have seen how to apply spatial filters using SPUTNIK. In this case, the filters were applied separately, but they can also be combined to get improved results.
We suggest to start with the global filter, which removes most of the signal spatially detected outside of the sample region, followed by one or both the other filters.
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] SPUTNIK_1.4
##
## loaded via a namespace (and not attached):
## [1] viridis_0.6.2 sass_0.4.0 edgeR_3.34.0
## [4] jsonlite_1.7.2 viridisLite_0.4.0 splines_4.1.1
## [7] foreach_1.5.1 bslib_0.3.0 assertthat_0.2.1
## [10] highr_0.9 spatstat.geom_2.3-0 tiff_0.1-8
## [13] yaml_2.2.1 pillar_1.6.3 lattice_0.20-44
## [16] glue_1.4.2 limma_3.48.3 doSNOW_1.0.19
## [19] digest_0.6.28 polyclip_1.10-0 colorspace_2.0-2
## [22] readbitmap_0.1.5 htmltools_0.5.2 Matrix_1.3-4
## [25] plyr_1.8.6 spatstat.sparse_2.0-0 pkgconfig_2.0.3
## [28] purrr_0.3.4 spatstat.core_2.3-0 scales_1.1.1
## [31] snow_0.4-3 tensor_1.5 jpeg_0.1-9
## [34] spatstat.utils_2.2-0 tibble_3.1.5 proxy_0.4-26
## [37] mgcv_1.8-36 generics_0.1.0 farver_2.1.0
## [40] ggplot2_3.3.5 infotheo_1.2.0 ellipsis_0.3.2
## [43] magrittr_2.0.1 crayon_1.4.1 deldir_1.0-5
## [46] evaluate_0.14 fansi_0.5.0 nlme_3.1-152
## [49] imager_0.42.10 class_7.3-19 tools_4.1.1
## [52] bmp_0.3 lifecycle_1.0.1 stringr_1.4.0
## [55] munsell_0.5.0 locfit_1.5-9.4 irlba_2.3.3
## [58] compiler_4.1.1 jquerylib_0.1.4 e1071_1.7-9
## [61] rlang_0.4.11 grid_4.1.1 iterators_1.0.13
## [64] goftest_1.2-3 igraph_1.2.6 labeling_0.4.2
## [67] rmarkdown_2.10 gtable_0.3.0 codetools_0.2-18
## [70] abind_1.4-5 DBI_1.1.1 reshape_0.8.8
## [73] R6_2.5.1 gridExtra_2.3 knitr_1.33
## [76] dplyr_1.0.7 fastmap_1.1.0 utf8_1.2.2
## [79] stringi_1.7.5 spatstat.data_2.1-0 parallel_4.1.1
## [82] Rcpp_1.0.7 vctrs_0.3.8 rpart_4.1-15
## [85] png_0.1-7 tidyselect_1.1.1 xfun_0.25