1 Introduction

MetaboAnnotatoR is designed to perform metabolite annotation of features from LC-MS all-ion fragmentation (AIF) datasets, using ion fragment databases. It requires raw LC-MS AIF chromatograms acquired in, or converted to, centroid mode.

2 Installation

To install this package, start R (version “4.5.0” or higher) and enter:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("MetaboAnnotatoR")
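
Before installing, it can help to confirm that the running R session meets the minimum version stated above. The snippet below is a minimal sketch using only base R; the variable names (min_r, r_now, r_ok) are illustrative and not part of MetaboAnnotatoR.

```r
# Minimum R version stated in the installation instructions above
min_r <- "4.5.0"

# Version of the running R session, as text for the messages below
r_now <- as.character(getRversion())

# numeric_version objects can be compared directly with a version string
r_ok <- getRversion() >= min_r

if (r_ok) {
    message("R ", r_now, " meets the minimum requirement (", min_r, ").")
} else {
    message("R ", r_now, " is older than ", min_r,
            "; please upgrade before installing.")
}
```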

3 Example session

An example of feature annotation using LC-MS AIF chromatograms processed with the xcms and RAMClustR packages is illustrated here. For full details of how the example dataset was obtained, see the original MetaboAnnotatoR paper: https://pubs.acs.org/doi/10.1021/acs.analchem.1c03032.

For more details on the RAMClustR object, see the original publication: https://pubs.acs.org/doi/10.1021/ac501530d.

First, load the library and its dependencies:

library(MetaboAnnotatoR)

3.1 Feature table and data

As input, MetaboAnnotatoR requires a data frame containing the features to be annotated and either a raw AIF LC-MS chromatogram (as .mzML or CDF) or a processed dataset composed of two objects: a RAMClustR object (containing the pseudo-MS/MS spectra) and an XCMS object containing the peak-picked data. Additionally, the fragment libraries need to be specified.

First, a data table (targets) containing the features to annotate needs to be loaded. MetaboAnnotatoR includes an example feature table (targetTable.csv), which is used in this example.

tfile <- system.file("extdata", "targetTable.csv", package="MetaboAnnotatoR")
targets <- read.csv(tfile)

This table contains 6 features from an LC-MS lipidomics (ESI+) chromatogram to be annotated.
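
A quick look at the loaded table can confirm the number of features before annotation starts. The following is a minimal sketch that relies only on base R functions and makes no assumption about the column names in targetTable.csv; the guard around the file path is an addition for illustration, so the snippet also runs when the package is not installed.

```r
# Locate the example feature table shipped with the package.
# system.file() returns "" when the file or package cannot be found.
tfile <- system.file("extdata", "targetTable.csv", package = "MetaboAnnotatoR")

if (nzchar(tfile)) {
    targets <- read.csv(tfile)
    # Number of features (rows) to annotate
    print(nrow(targets))
    # Column names and types, without assuming any particular layout
    str(targets)
} else {
    message("targetTable.csv not found; is MetaboAnnotatoR installed?")
}
```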

The example in this vignette will use processed data included in the package. These consist of: 1) an xcmsSet object (xset) containing the processed data from 100 AIF LC-MS chromatograms from human serum samples and 2) the respective pseudo-MS/MS spectra obtained by processing the xcmsSet data using RAMClustR (RC). The data can be loaded as follows:

data("xset")
data("RC")

3.2 Annotations

Since the features come from an ESI+ lipidomics experiment, annotation can be performed using the default Lipid Positive mode libraries “LipidPos”. For this, the default Lipid Positive libraries must first be loaded into the workspace:

data("LipidPos")

Then annotations can be performed using the annotateRC function. The results will be stored in an object (annotations):

annotations <- annotateRC(targets, xcmsObject=xset, ramclustObj=RC, 
                            libs="LipidPos")
#> No RT information provided...
#> ... Processing feature 1 of 6 ...
#> Searching candidates...
#> ... Processing feature 2 of 6 ...
#> Searching candidates...
#> ... Processing feature 3 of 6 ...
#> Searching candidates...
#> Matching fragments to pseudo-MS/MS and highCE spectra...
#> ... Processing feature 4 of 6 ...
#> Searching candidates...
#> Matching fragments to pseudo-MS/MS and highCE spectra...
#> ... Processing feature 5 of 6 ...
#> Searching candidates...
#> Matching fragments to pseudo-MS/MS and highCE spectra...
#> ... Processing feature 6 of 6 ...
#> Searching candidates...
#> Matching fragments to pseudo-MS/MS and highCE spectra...
#> Job done!

The most significant annotations (rank 1 annotations) for each feature are summarised in the global results object within the annotations object:

annotations$global
#>   feature.mz feature.rt metabolite feature.type ion.type isotope mz.metabolite
#> 1   286.1442   40.77069       <NA>         <NA>     <NA>     M+0            NA
#> 2   585.2692   72.79411       <NA>         <NA>     <NA>     M+0            NA
#> 3   468.3095   82.92009  LPC(14:0)       parent   [M+H]+     M+0      468.3085
#> 4   520.3409  100.62388  LPC(18:2)       parent   [M+H]+     M+0      520.3398
#> 5   496.3410  113.59412       <NA>         <NA>     <NA>     M+0            NA
#> 6   478.2938  104.22690  LPE(18:2)       parent   [M+H]+     M+0      478.2928
#>   matched.mz mz.error pseudoMSMS fraction     score
#> 1         NA       NA      FALSE     <NA>        NA
#> 2         NA       NA       TRUE     <NA>        NA
#> 3   468.3085 2.026865       TRUE 3  of  4 0.5716864
#> 4   520.3398 2.014641      FALSE 3  of  4 0.4231832
#> 5         NA       NA      FALSE     <NA>        NA
#> 6   478.2928 2.017588      FALSE 1  of  5 0.2540706

Three of the six features were annotated as lipids.
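
The annotated features can be pulled out of the global table with ordinary data frame operations. The sketch below uses a small stand-in data frame whose values are copied from the output above, so that it runs without the package; with real results, `global` would be replaced by `annotations$global`. It assumes, as in the printed output, that unannotated features carry NA in the metabolite column and that the fraction column is text of the form "n of m". The names `annotated` and `fraction.num` are illustrative only.

```r
# Stand-in for annotations$global: a few columns copied from the output above
global <- data.frame(
    feature.mz = c(286.1442, 585.2692, 468.3095, 520.3409, 496.3410, 478.2938),
    metabolite = c(NA, NA, "LPC(14:0)", "LPC(18:2)", NA, "LPE(18:2)"),
    fraction   = c(NA, NA, "3  of  4", "3  of  4", NA, "1  of  5"),
    score      = c(NA, NA, 0.5716864, 0.4231832, NA, 0.2540706),
    stringsAsFactors = FALSE
)

# Keep only the features that received an annotation
annotated <- global[!is.na(global$metabolite), ]

# Convert the "n of m" text into a numeric proportion
parts <- strsplit(annotated$fraction, "of", fixed = TRUE)
annotated$fraction.num <- vapply(
    parts,
    function(p) as.numeric(p[1]) / as.numeric(p[2]),
    numeric(1)
)

# Show the annotated features, highest score first
print(annotated[order(annotated$score, decreasing = TRUE), ])
```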

It is also possible to inspect if there were other candidate annotations for a given feature, for instance feature 3: 468.3095 m/z, 82.92009 s. This information can be accessed from the rankedResult object stored in the annotations. For feature 3, it is accessed as follows:

annotations$rankedResult[[3]]
#>      feature.mz feature.rt             metabolite feature.type     ion.type
#> 17     468.3095   82.92009              LPC(14:0)       parent       [M+H]+
#> 24.3   468.3095   82.92009  PC(20:0) PC(6:0_14:0)     fragment [LPC_tail2]+
#> 24.4   468.3095   82.92009  PC(21:3) PC(7:3_14:0)     fragment [LPC_tail2]+
#> 24.1   468.3095   82.92009 PC(33:1) PC(14:0_19:1)     fragment [LPC_tail1]+
#> 24.2   468.3095   82.92009 PC(33:4) PC(14:0_19:4)     fragment [LPC_tail1]+
#> 19     468.3095   82.92009              LPE(17:0)       parent       [M+H]+
#>      isotope mz.metabolite matched.mz mz.error pseudoMSMS fraction     score
#> 17       M+0      468.3085   468.3085 2.026865       TRUE 3  of  4 0.5716864
#> 24.3     M+0      566.3817   468.3087 1.717241       TRUE 4  of  9 0.4765815
#> 24.4     M+0      574.3504   468.3087 1.717241       TRUE 3  of  9 0.4203315
#> 24.1     M+0      746.5696   468.3087 1.717241       TRUE 3  of  9 0.3161648
#> 24.2     M+0      740.5226   468.3087 1.717241       TRUE 3  of  9 0.3161648
#> 19       M+0      468.3085   468.3085 2.069572       TRUE 2  of  5 0.2665959
#>      rank
#> 17      1
#> 24.3    2
#> 24.4    3
#> 24.1    4
#> 24.2    4
#> 19      5

The rank 1 annotation is LPC(14:0). However, this feature could also be annotated (with a lower score, and hence lower confidence) as a fragment of several PCs that also contain the 14:0 fatty acyl chain.
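
The rank column in the output above is consistent with a dense ranking on the score: the two candidates with equal scores share rank 4, and the next candidate takes rank 5. The sketch below reproduces this from the printed scores using base R. It is an observation about this particular output, not a statement of how MetaboAnnotatoR computes ranks internally; the data frame is a stand-in copied from the table, and `margin` is an illustrative name.

```r
# Candidates and scores copied from the ranked output for feature 3 above
ranked <- data.frame(
    metabolite = c("LPC(14:0)", "PC(20:0) PC(6:0_14:0)", "PC(21:3) PC(7:3_14:0)",
                   "PC(33:1) PC(14:0_19:1)", "PC(33:4) PC(14:0_19:4)", "LPE(17:0)"),
    score = c(0.5716864, 0.4765815, 0.4203315, 0.3161648, 0.3161648, 0.2665959),
    stringsAsFactors = FALSE
)

# Dense ranking: equal scores share a rank and the next rank follows without a gap
ranked$rank <- match(ranked$score, sort(unique(ranked$score), decreasing = TRUE))

# Score margin between the best candidate and the runner-up
margin <- ranked$score[1] - ranked$score[2]

print(ranked)
print(margin)
```

A small margin between rank 1 and rank 2 suggests that the alternative candidates deserve a closer look, for example by plotting their matched spectra.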

It is possible to visualise the spectra containing the ions matched to each candidate. The example code below plots the rank 1 candidate for the annotation of the 3rd feature of the targets table:

plotResultSpec(annotations, 3, 1)

3.3 Save the annotations

It is possible to save the annotation results to a user-specified directory. By default, the global annotations are saved to the specified directory. The annotation options can also be saved, as can the pseudo-MS/MS spectra of each matched candidate (as .pdf) and any pseudo-MS/MS spectra (as .mgf files). For this example, we’ll make use of a temporary directory.

exampleDir <- tempdir()
saveAnnotations(annotations, DirPath=exampleDir, saveOptions=TRUE, 
                saveXCMSoptions=FALSE, saveRanked=TRUE,
                saveRankedSpec=TRUE, savePseudoMSMS=TRUE)
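
Because tempdir() is shared by everything in the R session, it can be convenient to write results into a dedicated subfolder and then list what was produced. The following is a minimal base R sketch; the folder name "MetaboAnnotatoR_results" and the variable `outDir` are chosen for illustration, and the exact files written depend on the options passed to saveAnnotations.

```r
# Create a dedicated subfolder inside the session's temporary directory
# (the folder name is arbitrary and chosen for illustration)
outDir <- file.path(tempdir(), "MetaboAnnotatoR_results")
dir.create(outDir, showWarnings = FALSE, recursive = TRUE)

# After calling saveAnnotations() with DirPath = outDir,
# list whatever files were written; the folder is empty until then
print(list.files(outDir, recursive = TRUE))
```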

4 Session Info

sessionInfo()
#> R version 4.6.0 alpha (2026-04-05 r89794)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] MetaboAnnotatoR_0.99.21 MSnbase_2.37.3          ProtGenerics_1.43.0    
#>  [4] S4Vectors_0.49.1        mzR_2.45.1              Rcpp_1.1.1             
#>  [7] Biobase_2.71.0          BiocGenerics_0.57.0     generics_0.1.4         
#> [10] xcms_4.9.2              BiocParallel_1.45.0     BiocStyle_2.39.0       
#> 
#> loaded via a namespace (and not attached):
#>   [1] DBI_1.3.0                   rlang_1.2.0                
#>   [3] magrittr_2.0.5              clue_0.3-68                
#>   [5] MassSpecWavelet_1.77.0      otel_0.2.0                 
#>   [7] matrixStats_1.5.0           compiler_4.6.0             
#>   [9] PTMods_0.99.6               systemfonts_1.3.2          
#>  [11] vctrs_0.7.3                 reshape2_1.4.5             
#>  [13] stringr_1.6.0               crayon_1.5.3               
#>  [15] pkgconfig_2.0.3             MetaboCoreUtils_1.19.2     
#>  [17] fastmap_1.2.0               magick_2.9.1               
#>  [19] XVector_0.51.0              labeling_0.4.3             
#>  [21] rmarkdown_2.31              preprocessCore_1.73.0      
#>  [23] ragg_1.5.2                  tinytex_0.59               
#>  [25] purrr_1.2.2                 xfun_0.57                  
#>  [27] MultiAssayExperiment_1.37.4 cachem_1.1.0               
#>  [29] jsonlite_2.0.0              progress_1.2.3             
#>  [31] DelayedArray_0.37.1         prettyunits_1.2.0          
#>  [33] parallel_4.6.0              cluster_2.1.8.2            
#>  [35] R6_2.6.1                    bslib_0.10.0               
#>  [37] stringi_1.8.7               RColorBrewer_1.1-3         
#>  [39] limma_3.67.1                GenomicRanges_1.63.2       
#>  [41] jquerylib_0.1.4             Seqinfo_1.1.0              
#>  [43] bookdown_0.46               SummarizedExperiment_1.41.1
#>  [45] iterators_1.0.14            knitr_1.51                 
#>  [47] IRanges_2.45.0              Matrix_1.7-5               
#>  [49] igraph_2.2.3                tidyselect_1.2.1           
#>  [51] dichromat_2.0-0.1           abind_1.4-8                
#>  [53] yaml_2.3.12                 doParallel_1.0.17          
#>  [55] codetools_0.2-20            affy_1.89.0                
#>  [57] lattice_0.22-9              tibble_3.3.1               
#>  [59] plyr_1.8.9                  withr_3.0.2                
#>  [61] S7_0.2.1                    evaluate_1.0.5             
#>  [63] Spectra_1.21.7              pillar_1.11.1              
#>  [65] affyio_1.81.0               BiocManager_1.30.27        
#>  [67] MatrixGenerics_1.23.0       foreach_1.5.2              
#>  [69] MALDIquant_1.22.3           ncdf4_1.24                 
#>  [71] hms_1.1.4                   ggplot2_4.0.2              
#>  [73] scales_1.4.0                MsExperiment_1.13.1        
#>  [75] glue_1.8.0                  MsFeatures_1.19.0          
#>  [77] lazyeval_0.2.3              tools_4.6.0                
#>  [79] mzID_1.49.1                 data.table_1.18.2.1        
#>  [81] QFeatures_1.21.2            vsn_3.79.6                 
#>  [83] fs_2.0.1                    XML_3.99-0.23              
#>  [85] grid_4.6.0                  impute_1.85.0              
#>  [87] tidyr_1.3.2                 MsCoreUtils_1.23.7         
#>  [89] PSMatch_1.15.3              cli_3.6.6                  
#>  [91] textshaping_1.0.5           S4Arrays_1.11.1            
#>  [93] dplyr_1.2.1                 AnnotationFilter_1.35.0    
#>  [95] pcaMethods_2.3.0            gtable_0.3.6               
#>  [97] sass_0.4.10                 digest_0.6.39              
#>  [99] SparseArray_1.11.13         farver_2.1.2               
#> [101] htmltools_0.5.9             lifecycle_1.0.5            
#> [103] statmod_1.5.1               MASS_7.3-65