4.6. Metabolomics: next generation phytochemical analysis

The term “omics” is often used as a suffix for large-scale, comprehensive biological approaches that generate big data sets. Omics in molecular biology aims to collectively characterize and quantify large groups of molecules and thereby create an overview of the structures, functions, and dynamics of an organism or groups of organisms. Omics technologies are nowadays popular in many research areas, including crop science and human nutrition, as they can provide broad insight into overall differences between samples related to genotypes, physiology, nutrition, culture conditions, etcetera. Omics data can be used as a kind of fingerprint of the materials or samples under investigation.

While genomics refers to studying the ‘entire’ genome, transcriptomics to the expression of ‘all’ genes, and proteomics to ‘all’ proteins, metabolomics refers to multi-parallel detection of ‘all’ metabolites present in an organism, a tissue or even single cells. Metabolomics can thus be regarded as deep (phyto)chemical fingerprinting. Similarly, fingerprinting of minerals is generally referred to as ionomics. In practice, all omics analysis techniques are restricted to those biomolecules that are present and can be extracted and detected in the material to be analysed, so detecting ‘all’ biomolecules of an organism is hardly feasible.

What is the main difference between metabolomics and classical metabolite analysis as described above? In classical analysis, the protocols and methods have been developed and optimized for the specific compound(s) of interest (the targets); other compounds that may be detected in the same extract by the same method are not taken into account in further data processing and analysis. In several cases this is the best or even the only option to analyse specific metabolites, especially labile compounds (like vitamin C) or metabolites usually present at very low levels, such as hormones or folates. The limitation is that standards or appropriate reference compounds must be available, which is unfortunately not the case for many plant metabolites, especially secondary metabolites.

Moreover, in explorative research the target compounds are often unknown, i.e. it is frequently not known beforehand which food compounds will be influenced by e.g. plant growth conditions, processing, gut microflora, etcetera. In such cases, so-called untargeted metabolomics approaches can provide more insight, owing to their ability to detect and compare large series of compounds simultaneously. LCMS- and GCMS-based approaches in particular are frequently used in metabolomics studies, because of their high sensitivity and compound separation power. Both can nowadays generate relative abundance values for several hundreds to even thousands of compounds in each plant species, food product or human urine or plasma sample analysed. Such large datasets include the (mostly relatively few) compounds that are already known from previous studies or from targeted analyses, as well as many compounds that have not yet been reported for the specific plant or food investigated, or that are completely novel (never reported in nature before). Though as yet unknown, these compounds may still be highly relevant to the specific research question, e.g. by being the most differential between, or indicative of, plant growth conditions, or by being statistically related to quality traits like plant stress resistance, product flavour, health-related effects, etcetera.

In contrast to targeted analyses, which generate quantified values for phytochemicals, untargeted metabolomics approaches produce only relative intensity values per compound, as it is impossible to identify and quantify all detectable compounds, knowns and unknowns, beforehand. To enable a good comparison of samples using such relative intensity values, a well-designed experimental setup and sample handling, as well as dedicated protocols for extract preparation, LCMS or GCMS analysis, and data processing, are crucial. The general scheme of an untargeted metabolomics analysis starts with generating varying samples, e.g. plants cultivated under varying conditions. These are frozen and ground into a fine homogeneous powder, aliquots of the same weight are accurately weighed, and extracts are prepared in the same manner, preferably simultaneously in a single batch, in order to prevent variation in metabolite profiles due to differences in sample handling. Extracts are then injected into an LCMS or GCMS as a single non-stop series, again to prevent variation due to changes in sensitivity or chromatography between different series of analyses.

Specialized metabolomics labs usually apply a dedicated (partly in-house) workflow to process the resulting GCMS or LCMS data files, which contain thousands of individual mass peak chromatograms, in an unbiased manner, i.e. without prior knowledge of the detected compounds. They use dedicated software to extract and align all individual mass peak signals, generating a large spreadsheet of each mass peak in each sample. Mass peaks belonging to the same metabolite (e.g. natural isotopes, fragments and adducts generated in the MS) are then assembled using clustering programs. The final result is a large peak list (Excel-type or other database format) containing the relative intensity of each metabolite detected (both known and as yet unknown compounds) in each sample analysed.

Then, univariate (e.g. Student’s t-test) or multivariate (e.g. Principal Component Analysis; PCA) statistical tools can be applied to select the compounds of most interest to the research question, e.g. those that differ between growth conditions, cultivars or fruit ripening stages. These selected compounds, rather than all compounds, are of most interest to identify. Identification is usually performed by detailed analysis of the MS data, comparison with existing databases, and additional fragmentation experiments (MS/MS). Putative identifications can then be checked with standards, if these are available or can be synthesized chemically. If unambiguous identification is needed but cannot be obtained using MS, the compound of interest needs to be concentrated, purified and subsequently identified unambiguously using Nuclear Magnetic Resonance (NMR).
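The univariate selection step described above can be illustrated with a minimal sketch. It assumes a tiny, invented peak table with three hypothetical metabolites measured in two hypothetical groups ("control" and "treated"); the names, intensity values, and the choice of a log2 transformation followed by Welch's variant of the t-test are illustrative assumptions, not the workflow of any particular lab. Only the Python standard library is used, so the sketch ranks metabolites by t statistic and does not compute p-values.

```python
import math
from statistics import mean, variance

# Hypothetical peak table: relative intensities per metabolite for two
# groups of samples. All names and numbers are invented for illustration.
peak_table = {
    "metabolite_001": {"control": [1200.0, 1150.0, 1300.0, 1250.0],
                       "treated": [2500.0, 2700.0, 2400.0, 2600.0]},
    "metabolite_002": {"control": [800.0, 820.0, 790.0, 810.0],
                       "treated": [805.0, 815.0, 795.0, 800.0]},
    "metabolite_003": {"control": [5000.0, 5200.0, 4900.0, 5100.0],
                       "treated": [2500.0, 2450.0, 2600.0, 2550.0]},
}


def welch_t(a, b):
    """Welch's t statistic (unequal variances) for group b versus group a."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / standard_error


def rank_metabolites(table, group_a="control", group_b="treated"):
    """Return (name, log2 fold change, t) tuples, largest |t| first."""
    results = []
    for name, groups in table.items():
        # Log-transform so that a doubling and a halving are symmetric.
        a = [math.log2(x) for x in groups[group_a]]
        b = [math.log2(x) for x in groups[group_b]]
        results.append((name, mean(b) - mean(a), welch_t(a, b)))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


ranked = rank_metabolites(peak_table)
for name, log2_fc, t in ranked:
    print(f"{name}: log2 fold change = {log2_fc:+.2f}, t = {t:+.1f}")
```

With thousands of metabolites tested in parallel, a real analysis would also derive p-values and correct them for multiple testing, which typically requires a statistics library; the ranking here only indicates which compounds would be candidates for subsequent identification.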

Nowadays, untargeted LCMS and GCMS metabolomics approaches are used in many plant and food science-related research topics to identify, in great detail, the effects of, for instance, plant cultivation conditions, cultivar variation, gene editing, plant development, fruit ripening and post-harvest treatments, as well as the metabolism of phytonutrients within the human body (Bino et al. 2004; van Duynhoven et al. 2014; Lopez-Sanchez et al. 2015; van Treuren et al. 2018).

References
Bino RJ, De Vos CHR, Lieberman M, et al. 2004. of Tomato: Alterations in the Fruit Metabolome. : 427–438. DOI: 10.1111/j.1469-8137.2005.01362.x.

Lopez-Sanchez P, De Vos RCH, Jonker HH, et al. 2015. Comprehensive metabolomics to evaluate the impact of industrial processing on the phytochemical composition of vegetable purees. Food Chemistry 168: 348–355. DOI: 10.1016/j.foodchem.2014.07.076.

van Duynhoven J, van der Hooft JJJ, van Dorsten FA, et al. 2014. Rapid and Sustained Systemic Circulation of Conjugated Gut Microbial Catabolites after Single-Dose Black Tea Extract Consumption. Journal of Proteome Research 13: 2668–2678. DOI: 10.1021/pr5001253.

van Treuren R, van Eekelen HDLM, Wehrens R, de Vos RCH. 2018. Metabolite variation in the lettuce gene pool: towards healthier crop varieties and food. Metabolomics 14: 146. DOI: 10.1007/s11306-018-1443-8.
