On 2018-04-05 23:08:06, user Rachel Steele wrote:
Review of Hannigan et al. (2018)
Biogeography & environmental conditions shape bacteriophage-bacteria networks across the human microbiome
Summary:
In this paper, the authors created a network model that characterizes the interactions between viruses and bacteria within the human microbiome. To do this, they used data from three previously published metagenomic datasets. Each of the three datasets contained both purified DNA viral metagenomes and bacteria-dominated whole community metagenomes, allowing the authors to link DNA virome data with the bacterial metagenome. The datasets were assembled into contigs, and the resulting contigs were clustered into either phage or bacterial operational genomic units (OGUs). The authors used a machine-learning algorithm to predict which phage OGUs would infect which bacterial OGUs. The algorithm was trained on bacterial and phage species with different infection ranges and known interactions, and it then populated a network model for the metagenomic datasets based on the following features:
Genome nucleotide similarities (Blast)
Gene amino acid sequence similarities (BlastX)
Bacterial Clustered Regularly Interspaced Short Palindromic Repeat spacer sequences that target phages (CRISPR)
Similarity of protein families associated with experimentally identified protein-protein interactions (Pfam)
The authors then examined the role of diet and obesity in gut microbiome network connectivity using centrality metrics, and found that networks associated with high-fat diets appeared to be less connected. Additionally, the obesity-associated networks appeared to possibly be less connected. The individuality of microbial networks was then investigated using a dissimilarity metric to test whether microbiome network structures were more similar within people than between people over time. It was found that network dissimilarity within each person was less than the network dissimilarity between that person and other individuals (not statistically significant). There was no evidence for gut network conservation among family members, though the skin microbiome network structure was conserved within individuals. The network was also studied across the human skin landscape, and it was found that moisture and occlusion played a significant role in the network structure of the skin microbiome.
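For readers unfamiliar with the centrality metrics mentioned above, the following is a minimal sketch of how degree and closeness centrality might be computed on a small phage-bacteria network. It is not the authors' implementation: the node names, the two toy edge lists, the choice to treat the network as undirected, and the scaling of closeness by the fraction of reachable nodes are all assumptions made for illustration.

```python
from collections import deque


def build_undirected(edges):
    """Build an adjacency map from (phage, bacterium) pairs (assumed undirected)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """Inverse mean shortest-path distance to reachable nodes,
    scaled by the fraction of the network that is reachable."""
    n = len(adj)
    result = {}
    for source in adj:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        reachable = len(dist) - 1
        total = sum(dist.values())
        if total == 0:
            result[source] = 0.0
        else:
            result[source] = (reachable / total) * (reachable / (n - 1))
    return result


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical toy networks: every phage infects every bacterium vs. a sparser chain.
dense_net = build_undirected([
    ("phageA", "bac1"), ("phageA", "bac2"),
    ("phageB", "bac1"), ("phageB", "bac2"),
])
sparse_net = build_undirected([
    ("phageA", "bac1"), ("phageA", "bac2"), ("phageB", "bac2"),
])

mean_dense = mean(closeness_centrality(dense_net).values())
mean_sparse = mean(closeness_centrality(sparse_net).values())
print("mean closeness, dense network: ", mean_dense)
print("mean closeness, sparse network:", mean_sparse)
```

In this toy case the sparser network has the lower mean closeness centrality, which is the sense in which "less connected" is read in this review.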
Major Comments:
-
Genome nucleotide similarities and gene amino acid similarities are collinear parameters; it is unclear how each of these parameters can give unique contributions to the model.
-
The area under the receiver operating characteristic curve is very low; it is so close to 0.5 that the model seems to assign network nodes and edges to bacteria and phages almost at random. Perhaps the method is not as predictive as the authors suggest in the text.
-
To an individual not well-versed in phage biology, it is unclear why seeing sequence elements similar to the bacterial 16S rRNA gene in the virome datasets would with certainty indicate bacterial sequence noise (lines 122-125). Is it known that phages never have sequence similarity to the bacterial 16S rRNA gene?
-
Microbiome diversity should be considered within this analysis: it is likely that as diversity changes for viral populations as well as bacterial populations, the structure of the network will be greatly affected. Will this impact the conclusions?
-
In lines 211-214, it is stated that a higher closeness centrality indicates more connectivity, which suggests a greater resilience against network degradation by extinction events. Yet in lines 319-322, it is stated that less connected networks suggest a higher resilience against network degradation. This seeming contradiction confuses the reader.
-
The training data set as represented in Figure S4A is disturbingly sparse, but these data are key in making predictions regarding interactions between viruses and bacteria in the gut. The choice of virus-bacteria pairs tested should be clearly discussed and its limitations outlined.
-
One way to test the consistency of the method used to create the model might be to randomly subset the original set of training data (from Fig S4A), use these randomly selected interactions as training data to create a new model, and then use the whole set of data presented in Figure S4 as experimental data to determine how well the model works for this set of known interactions. This procedure could be repeated to determine how consistently the networks are created. Perhaps this is a “power analysis” (how many of the interactions in Fig S4A are needed to consistently predict interactions in the gut virus-bacteria community?).
-
Figure 1 has one more image than is referenced in the caption; it appears that for the provided image, Figure 1D was not described in the caption. This image should either be removed from the figure or should be incorporated into the figure caption and into the text. After making the correction, all references to Figure 1 should be checked to ensure the correct image(s) within the figure are referenced.
-
The paper provides a balance of information, indicating how the model moves the field forward while also acknowledging the shortcomings and weaknesses of the model. However, some biased language is included, specifically in lines 163-165. It is unclear what is meant by “ideal” for balancing true positives and true negatives within the model.
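The repeated-subsampling check proposed in the major comments could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic two-feature scores generated here (they are not the interactions in Figure S4A), and a simple nearest-centroid classifier stands in for the authors' random forest so that the example has no dependencies beyond the standard library. All function names are hypothetical.

```python
import random
import statistics


def fit_centroids(rows):
    """Return the mean feature vector for each class label."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, features):
    """Assign the label of the nearest class centroid (stand-in classifier)."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def subsample_accuracy(data, fraction, repeats, rng):
    """Train on random subsets of the known interactions and score each
    model on the full set; returns one accuracy per usable subset."""
    k = max(2, round(fraction * len(data)))
    scores = []
    for _ in range(repeats):
        subset = rng.sample(data, k)
        if len({label for _, label in subset}) < 2:
            continue  # skip subsets that happen to contain a single class
        centroids = fit_centroids(subset)
        correct = sum(predict(centroids, f) == y for f, y in data)
        scores.append(correct / len(data))
    return scores


# Synthetic stand-in data: label 1 = interacting pair, 0 = non-interacting pair.
rng = random.Random(0)
data = [([rng.gauss(1.0, 0.5), rng.gauss(1.0, 0.5)], 1) for _ in range(40)]
data += [([rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)], 0) for _ in range(40)]

results = {}
for fraction in (0.1, 0.25, 0.5, 0.75):
    scores = subsample_accuracy(data, fraction, repeats=50, rng=rng)
    results[fraction] = (statistics.mean(scores), statistics.pstdev(scores))
    print(f"fraction={fraction:.2f} mean={results[fraction][0]:.3f} "
          f"sd={results[fraction][1]:.3f}")
```

Because each training subset is drawn from the same set used for scoring, the accuracies are optimistic; the quantity of interest is how the mean and spread change as the training fraction shrinks. A held-out evaluation set would be a stricter variant of the same idea.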
Minor Comments:
-
Culture-dependent data is described in lines 72-74 as being limited in the scale of possible experiments and analyses, in contrast to data inferred from metagenomic datasets, and in lines 360-363 it is stated that inferring specific relationships between phage and bacterial species is limited compared to culture-based work. Given these different benefits and disadvantages, it is unclear whether there is a balance that could be found in the future between the two methods.
-
The figures at the end are not labeled as Figures 1-4. This can cause confusion, as the figures are included separately from their captions. To enhance readability and reduce confusion for the reader when looking through the figures, the figures should be included with their captions in line with the text within which they are referred.
-
It is unclear why the random forest model was used to build the network. Are there other machine-learning algorithms which could be used to generate a network model? Why specifically choose this one?
-
Line 43 contains a typo; “homestatsis” should be changed to “homeostasis”.
-
The caption for figure 2 contains a typo: in “because one of the was only sampled post-diet”, “the” should be changed to “them”.
-
Figure 2 could have a better format: axes should be scaled the same way so comparisons between plots can be made. Degree centrality plots and closeness centrality plots could be lined up next to one another, and matching treatments/groups could be stacked vertically so that one representation of each axis could be used.
-
It is unclear whether it is the same individual under study for the high fat vs. low fat diet or for the healthy/obese individuals -- it would be easier to understand the data if individuals were color-coded.
-
Averages and error bars could be presented on the plots for each study group.
-
The supplemental figures and table were provided as individual files rather than being included in the text. This makes it difficult for the reader to access them.