On 2020-10-25 19:08:24, user Daniel Haake wrote:
Dear study team,
Thank you for your study, which shows that the risk of death from COVID-19 increases significantly with age. To improve the quality of the study, I have some comments on its statistical analysis, which I set out below.
The time of the determination of the death figures
You write that antibodies are formed in 95% of people after 17-19 days. In contrast, 95% of deaths are reported after 41 days. That is a difference of 22-24 days. Nevertheless, you take the number of deaths 28 days after the midpoint of the study. Why do you take a later point in time than the one you yourselves have determined? With this approach alone, you are 4-6 days too late and overestimate the number of deaths. I explain in more detail below why even the 22-24 days would be too late.
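The arithmetic behind these figures can be checked with a short Python sketch. The variable names are my own; the numbers are the ones quoted above.

```python
# Figures quoted in this comment (days).
antibody_days_95 = (17, 19)   # 95% of people have formed antibodies
death_report_days_95 = 41     # 95% of deaths have been reported (USA)
offset_used = 28              # days after the study midpoint used in the study

# Gap between antibody formation and death reporting.
gap = tuple(death_report_days_95 - d for d in reversed(antibody_days_95))
# How far the 28-day offset exceeds that gap.
excess = tuple(offset_used - g for g in reversed(gap))

print(f"gap: {gap[0]}-{gap[1]} days")
print(f"offset exceeds gap by {excess[0]}-{excess[1]} days")
```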
The 41 days were given for the USA. But what is the situation in other countries? In Germany, for example, there is a legal requirement that a death must be reported within 3 working days at the latest. Of course there can also be deaths in Germany that go unrecognized at first and take longer to report, but these should be a minority. If you transfer the US reporting delay to countries where such long reporting times do not occur to the same extent, you include too many deaths in the numerator of the quotient. This leads to an IFR that is too high.
Counting the deaths 28 days after the study midpoint is also problematic because, in the meantime, the statistics may come to include deaths of people who were infected only after the infected persons identified in the study. This is because not all deaths take that long to report. These deaths are then unrelated to the study. You yourselves write that reporting a death takes 7 days on average, with an IQR of 2-19 days. In a statistical sense, these figures point to a right-skewed distribution of reporting times. This in turn means that the majority of deaths have a rather shorter reporting time. The procedure therefore leads to a death count that is too high. This is a problem especially while an infection wave is still ongoing, and even when it is already declining.
You write: “The mean time interval from symptom onset to death is 15 days for ages 18–64 and 12 days for ages 65+, with interquartile ranges of 9–24 days and 7–19 days.”
If we assume the 3 days of reporting time for Germany, we get 18 days for ages 18-64 and 15 days for ages 65+. In contrast, antibodies are formed in 95% of people after 17-19 days, which is about the same as or later than the time at which the deaths appear in the statistics. For other countries this may be different and would therefore need to be investigated. In any case, a blanket assumption taken from the USA cannot be applied to studies outside the USA.
The mean time interval from onset of symptoms to death is 15 days for ages 18-64 with an interquartile range of 9-24 days, whereas the midpoint of that range would be 16.5 days. This suggests a right-skewed distribution of the values. The same applies to ages 65+, where the mean is 12 days with an interquartile range of 7-19 days and the midpoint of the range is 13 days. This also points to a right-skewed distribution. It would mean that the majority of the values lie below the mean in each case, making shorter times more likely. This, too, shifts the chosen point in time too far back. It would therefore be better to use the median, because it is less sensitive to outliers.
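The following minimal Python sketch reproduces the IQR midpoints quoted above. Separately, it uses a hypothetical log-normal sample to illustrate the general point that in a right-skewed distribution most values lie below the mean. The log-normal parameters are my own assumptions and are not fitted to the study data.

```python
import random
import statistics

# Figures quoted from the study: (mean, IQR lower, IQR upper) in days.
onset_to_death = {"18-64": (15, 9, 24), "65+": (12, 7, 19)}

def iqr_midpoint(lower, upper):
    """Midpoint of the interquartile range."""
    return (lower + upper) / 2

midpoints = {age: iqr_midpoint(lo, hi) for age, (_, lo, hi) in onset_to_death.items()}
print(midpoints)

# Hypothetical right-skewed sample (log-normal; parameters are assumptions,
# not fitted to the study) to show where mean and median fall.
rng = random.Random(0)
sample = [rng.lognormvariate(2.5, 0.6) for _ in range(100_000)]
sample_mean = statistics.fmean(sample)
sample_median = statistics.median(sample)
share_below_mean = sum(x < sample_mean for x in sample) / len(sample)

print(f"mean {sample_mean:.1f}, median {sample_median:.1f}, "
      f"share below mean {share_below_mean:.0%}")
```

In such a sample the median sits below the mean, which is why the median is the more typical value for a skewed delay distribution.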
Your example infection wave in Figure 1 also shows the problem with this procedure. As you say, antibodies are formed in 95% of people after 17-19 days. Now you have an example study whose midpoint is 14 days after the start of the infection wave. At that time, only a few of the infected persons have formed antibodies at all, since the wave started only 14 days earlier with low numbers and then grew. The peak of the infection wave is only 4 days earlier. This means that the people infected in the period that is most strongly represented cannot yet have developed antibodies. As a result, only very few infected persons are recognized as infected. In your example, 95% of the deaths are counted, but only very few of the infected. This leads to a clear overestimation of the IFR.
Because of the problems mentioned, the number of deaths should be taken at the median time of the study. Of course, it would be best if the studies took place immediately after the end of a wave of infection, when the death figures are stable and antibody formation is complete.
Antibody Studies
You write: "A potential concern about measuring IFR based on seroprevalence is that antibody titers may diminish over time, leading to underestimation of true prevalence and corresponding overestimation of IFR, especially for locations where the seroprevalence study was conducted several months after the outbreak had been contained."
You have made many assumptions about the death figures and adjusted them (upwards) accordingly. Here you note that antibodies disappear over time and that this can lead to an underestimation of the number of infected persons. However, unlike your approach to the death figures, you do not adjust the number of infected persons upwards. For example, a study by the RKI found that 39.9% of those who had previously tested positive by PCR did not develop antibodies (https://www.rki.de/DE/Content/Gesundheitsmonitoring/Studien/cml-studie/Factsheet_Bad_Feilnbach.html). From this, we could conclude that the antibody study detected only around 60% of those previously infected and that the number of infected persons would have to be adjusted accordingly. But you have not done that. I can understand why. I wouldn't have done it either, because we don't know how transferable this is to other studies. But in adjusting the death figures, you have transferred such assumptions to other studies. This should also be avoided, because there, too, we do not know how transferable they are. If you adjust only the deaths but, understandably, not the infected, this leads to an overestimated IFR.
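To show the size of the effect, and not as a recommended correction, here is a minimal Python sketch of what such an adjustment would do to an IFR. The function name and the example IFR of 1% are hypothetical; only the 39.9% comes from the RKI factsheet cited above.

```python
def adjusted_ifr(ifr, detected_share):
    """Scale an IFR for infections missed by the antibody test.

    If only `detected_share` of past infections are seropositive, the true
    number of infections is measured / detected_share, so the IFR shrinks
    by the same factor.
    """
    if not 0 < detected_share <= 1:
        raise ValueError("detected_share must be in (0, 1]")
    return ifr * detected_share

non_seroconverted = 0.399            # share without antibodies (RKI factsheet)
detected_share = 1 - non_seroconverted
example_ifr = 1.0                    # hypothetical unadjusted IFR in percent

print(round(adjusted_ifr(example_ifr, detected_share), 3))
```

Under these assumptions the IFR would fall by about 40%, which shows how much depends on whether only one side of the quotient is adjusted.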
PCR tests from countries with tracing programs
You write in your Appendix D: "By contrast, a seroprevalence study of Iceland indicates that its tracing program was effective in identifying a high proportion of SARS-CoV-2 infections".
In my opinion this is a wrong conclusion. It is not the success of the tracing program but the large number of tests that led to fewer unreported cases. To date, Iceland has performed almost as many tests as it has inhabitants. That is how it could keep the number of unreported cases lower. Other countries did not test as much, so the results are not easily transferable to them. PCR tests show only the present, not the past and not the untested.
You write it yourselves: "(…) hence we make corresponding adjustments for other countries with comprehensive tracing programs, and we identify these estimates as subject to an elevated risk of bias."
Nevertheless, you leave these studies in the meta-analysis, although for the reasons mentioned above this leads to severe problems. The figures for countries with tracing programs should therefore not have been included. The number of unreported cases there is not known and cannot be carried over from Iceland.
Study selection
You exclude some seroprevalence studies. These include Australia [63], Blaine County, Idaho, USA [67], Caldari Ortona, Italy [72], Chelsea, Massachusetts, USA [73], Czech Republic [75], Gangelt, Germany [79], Ischgl, Austria [81], Riverside County, California, USA [98], Slovenia [101] and Santa Clara, California, USA [116]. For the most part, these studies are excluded because they give no age breakdown of seroprevalence. Since age is the subject of the study, this is of course understandable. However, these studies in particular have shown calculated IFR values between 0.1% and 0.5%. At the same time, you leave the PCR test figures from countries with tracing programs in the meta-analysis. As already mentioned, this is not correct because the number of unreported cases is unknown, and the transfer from Iceland is also not possible, as described before. As a result, studies with low values are excluded while uncertain figures with high values remain in the study. In purely mathematical terms, this shifts the calculated IFR upwards.
It is precisely the upward outliers that cause problems in the calculation. Since the values are rather small (in a mathematical sense) and cannot fall below zero, deviations downwards cannot be as large as deviations upwards. This means that there may be studies that deviate by perhaps 0.2 percentage points downwards, but others that deviate by 1.2 percentage points upwards. This is a problem for the regression, because it then yields values that are too high. Therefore, outlier detection should be performed beforehand and the outliers excluded. You could also make it easier by taking the median, since it is less susceptible to outliers. But then you would have only one value.
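A small Python sketch with made-up numbers illustrates the asymmetry. The IFR values below are hypothetical and are not taken from the study; they only mirror the deviations of 0.2 points downwards and 1.2 points upwards mentioned above.

```python
import statistics

# Hypothetical study-level IFR estimates in percent (not taken from the study).
# Most cluster around 0.3%; one sits 0.2 points lower, one 1.2 points higher.
ifr_estimates = [0.1, 0.3, 0.3, 0.3, 0.3, 1.5]

mean_ifr = statistics.fmean(ifr_estimates)
median_ifr = statistics.median(ifr_estimates)

print(f"mean:   {mean_ifr:.2f}%")
print(f"median: {median_ifr:.2f}%")
```

The single high value pulls the mean well above the cluster, while the median stays where most of the estimates are.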
You write: “The validity of that assumption is evident in Figure 3: Nearly all of the observations fall within the 95% prediction interval of the metaregression, and the remainder are moderate outliers.”
You can see it in Figure 3, but the logarithmic scale makes it difficult to judge the proportions. Figure 4 is better suited, and it would be desirable to have such a figure for the different age groups in order to allow a better assessment. Figure 4 shows that many studies lie outside the confidence interval, often considerably, and more often towards the high IFR values. Judging by the values and the confidence interval, these studies must have significant z-scores, which would show that they are clear outliers that should not be considered. As a result, the regression is pulled further towards high values, which results in IFR values that are too high.
Adjustment of death rates for Europe due to excess mortality
In Appendix Q you write: "In the absence of accurate COVID-19 death counts, excess mortality can be computed by comparing the number of deaths for a given time period in 2020 to the average number of deaths over the comparable time period in prior calendar years, e.g., 2015 to 2019. This approach has been used to conduct systematic analysis of excess mortality in European countries.[159] For example, the Belgian study used in our metaregression computed age-specific IFRs using seroprevalence findings in conjunction with data on excess mortality in Belgium".
I understand why you want to do this. But there are some dangers involved. The above statement may be true for Belgium, but it cannot be transferred to other countries in a general way, especially since you cannot say in general terms that every death above the average is a COVID-19 death. Mathematically, this would mean that there were COVID-19 deaths in some of the previous years, because there were periods with more deaths than the average. That is simply the nature of an average. The majority of the excess deaths will be COVID-19 deaths, but not necessarily all of them. Cancer operations that did not take place, heart attacks that went untreated because of the circumstances, and forgone visits to the doctor may also have contributed a share. Whether this is the case, we do not know without a study. A blanket assumption that every death above the mean is a COVID-19 death is not correct. From the statement "For example, the Belgian study used in our metaregression computed age-specific IFRs using seroprevalence findings in conjunction with data on excess mortality in Belgium", we could also conclude that the number of reported COVID-19 deaths is correct and can therefore be used as the numerator of the quotient for calculating the IFR.
If you take this as a blanket assumption, how do you deal with those countries that do not have excess mortality but have several thousand COVID-19 deaths in the official statistics? Would you then correct the number of COVID-19 deaths downwards, perhaps even to 0? Certainly not.
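The point about the average can be sketched in Python. The death counts below are invented for illustration and do not come from any country's statistics; the method (2020 deaths minus the 2015 to 2019 average) follows the description quoted from Appendix Q.

```python
import statistics

# Hypothetical death counts for the same calendar period (illustrative only).
baseline_deaths = {2015: 10_400, 2016: 9_800, 2017: 10_300, 2018: 10_100, 2019: 9_400}
deaths_2020 = 11_500

baseline_mean = statistics.fmean(baseline_deaths.values())

def excess(deaths, baseline):
    """Deaths above the baseline average (negative if below)."""
    return deaths - baseline

excess_2020 = excess(deaths_2020, baseline_mean)
# Baseline years that themselves exceed the baseline average.
years_above = [y for y, d in baseline_deaths.items() if excess(d, baseline_mean) > 0]

print(f"baseline mean: {baseline_mean:.0f}, excess 2020: {excess_2020:.0f}")
print(f"pre-pandemic years above the mean: {years_above}")
```

Several pre-pandemic years show "excess" deaths by this definition, so not every death above the mean can be attributed to COVID-19.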
Variation in the IFR
You write: "We specifically consider the hypothesis that the observed variation in IFR across locations may primarily reflect the age specificity of COVID-19 infections and fatalities."
It is also possible that the variation in the calculated IFRs is due to differing numbers of unreported cases. If, for example, PCR test results are taken from countries with a tracing app, but the IFR there is calculated on the basis of Iceland, this can lead to incorrect IFR values that are too high. The adjustments of the death figures themselves, or the late determination of the death figures 4 weeks after the study midpoint, can also lead to this high variance.
Conspicuous features regarding the correct determination of the death figures
In Table 1 you write that on July 15 there were 8 million inhabitants with a projected 1.6 million infections. According to my research there are 8.4 million inhabitants. You calculate the 1.6 million infected on the basis of the 22.7% infected in the study. However, the blood samples were taken between April 19 and 28, so the infections occurred up to the beginning or middle of April. You thus take the number of infected persons from early to mid-April, or from April 24 (study midpoint), and apply it to July 15, i.e. just under 3 months later! In the meantime, however, people have not only died but have also become infected and formed antibodies. You thus increase the numerator of the quotient but leave the denominator unchanged, although the denominator would also be higher. So you shift the IFR upwards here as well.
The Gangelt study, which was not taken into account, shows a similar picture. You write that at the end of June there were 12 deaths and that the IFR therefore rises to 0.6%. That is 8 weeks (!) after the study midpoint. This does not take into account that in Germany deaths must be reported within 3 days. If you proceeded in this way when calculating the IFRs from the other studies, this suggests that those IFR values are too high.
Calculation of the IFR of Influenza
You calculate the IFR of influenza based on the CDC figures for the 2018/2019 influenza season and give it as 0.05%. Firstly, it should be said that statistically it is never good to look at just one value. The average of a time series should be considered. You calculate the value from the estimated deaths and the estimated number of people symptomatically infected with influenza. You use a study according to which about 43.4% of cases are asymptomatic or subclinical (95% CI 25.4%-61.8%). You then take the midpoint of the confidence interval, 43.6%, and use this figure to calculate how many people were probably infected with influenza. Statistically it is not correct to take the midpoint of 43.6%. The point estimate of 43.4% must be used. Because the difference is small, this does not change much, but it shows the statistical imprecision that runs through the study and generally leads to an IFR that is too high or, in the case of influenza, too low.
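A short Python sketch shows where the 43.6% comes from and how little it changes the result. The symptomatic case count is hypothetical and the function name is my own; the percentages are the ones quoted above.

```python
# Asymptomatic/subclinical share of influenza infections, as quoted above.
point_estimate = 43.4            # percent
ci_low, ci_high = 25.4, 61.8     # 95% confidence interval, percent

ci_midpoint = round((ci_low + ci_high) / 2, 1)

def total_infections(symptomatic, asymptomatic_pct):
    """Scale symptomatic cases up to all infections."""
    return symptomatic / (1 - asymptomatic_pct / 100)

symptomatic_cases = 1_000_000    # hypothetical count, for illustration only
with_point = total_infections(symptomatic_cases, point_estimate)
with_midpoint = total_infections(symptomatic_cases, ci_midpoint)

# The IFR is inversely proportional to the number of infections, so this
# ratio is the factor by which the midpoint changes the IFR.
ifr_ratio = with_point / with_midpoint

print(f"CI midpoint: {ci_midpoint}%")
print(f"IFR with midpoint / IFR with point estimate: {ifr_ratio:.4f}")
```

Using the midpoint instead of the point estimate yields slightly more infections and therefore a slightly lower influenza IFR.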
Regarding the choice of the 2018/2019 flu season, the CDC writes: "These estimates are subject to several limitations. (...) Second, national rates of influenza-associated hospitalizations and in-hospital death were adjusted for the frequency of influenza testing and the sensitivity of influenza diagnostic assays, using a multiplier approach. However, data on testing practices during the 2018-2019 season were not available at the time of estimation. We adjusted rates using the most conservative multiplier from any season between 2010-2011 and 2016-2017. Burden estimates from the 2018-2019 season will be updated at a later date when data on contemporary testing practices become available. (...) Fourth, our estimate of influenza-associated deaths relies on information about location of death from death certificates. However, death certificate data during the 2018-2019 season were not available at the time of estimation. We have used death certification data from all influenza seasons between 2010-2011 and 2016-2017 where these data were available from the National Center for Health Statistics. (…)"
The CDC writes the same for the 2017/2018 season, so the values, which are only estimates anyway, rest on even more estimation because of the missing data. Therefore the figures for the seasons 2010/2011 to 2016/2017 should have been considered. If we calculate the IFR of influenza in this way and also use the confidence interval to calculate the number of people potentially infected per season, we get an influenza IFR of 0.077%, ranging from 0.036% to 0.164%. Every single year prior to the 2018/2019 season was above 0.05%, and the average of 0.077% is 54% above your reported value. This means that influenza is still not as lethal as COVID-19 has been so far, but the factor is not as high as your study suggests.
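The comparison with the reported value can be verified in a few lines of Python. The variable names are my own; the numbers are those stated above.

```python
study_ifr = 0.05                        # percent, 2018/2019 season as used in the study
recalculated_ifr = 0.077                # percent, average stated in this comment
recalculated_range = (0.036, 0.164)     # percent, range stated in this comment

# Relative difference between the recalculated average and the study's value.
relative_excess_pct = round((recalculated_ifr / study_ifr - 1) * 100)
# The range expressed as a multiple of the study's value.
range_as_multiple = tuple(round(v / study_ifr, 2) for v in recalculated_range)

print(f"average is {relative_excess_pct}% above the study's value")
print(f"range as multiple of the study's value: {range_as_multiple}")
```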
It should also be noted that an IFR calculated under the assumption that infections are equally distributed over age cannot be compared with an influenza IFR that is not based on that assumption. You do not make this comparison directly, but because you name these numerical values, the media have picked it up. The IFR simply indicates the mortality per actually infected person. Therefore the IFR of those actually infected with COVID-19 must be compared with the IFR of influenza. You can of course calculate a hypothetical IFR assuming that every age is equally likely to be infected. In that case, however, the calculation must be performed not only for COVID-19 but also for influenza.
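The difference between the two kinds of IFR can be sketched in Python. All numbers below are hypothetical and chosen only to show the mechanism; none come from the study or from CDC data, and the function name is my own.

```python
# Hypothetical age-specific IFRs (percent) and age shares; none of these
# numbers come from the study or from CDC data.
ifr_by_age = {"0-34": 0.01, "35-64": 0.2, "65+": 5.0}
population_share = {"0-34": 0.45, "35-64": 0.40, "65+": 0.15}
infected_share = {"0-34": 0.55, "35-64": 0.38, "65+": 0.07}

def weighted_ifr(ifr, shares):
    """Average the age-specific IFRs using the given age shares as weights."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(ifr[age] * shares[age] for age in ifr)

# IFR if every age group were infected in proportion to its population share.
ifr_equal_exposure = weighted_ifr(ifr_by_age, population_share)
# IFR of those actually infected (here: fewer infections among the elderly).
ifr_actual_infected = weighted_ifr(ifr_by_age, infected_share)

print(f"equal exposure:    {ifr_equal_exposure:.4f}%")
print(f"actually infected: {ifr_actual_infected:.4f}%")
```

With the same age-specific IFRs, the two weightings give clearly different overall values, so a comparison between diseases is only meaningful if both are weighted in the same way.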
I hope these comments help you to improve the statistical aspects of the study. Kind regards.