Wednesday, October 7, 2009

20 Monkeys

Last Wednesday (9.30.09), Age of Autism posted an unsurprising announcement of a soon-to-be-published study by Hewitson et al., Delayed Acquisition of Neonatal Reflexes in Newborn Primates Receiving a Thimerosal-Containing Hepatitis B Vaccine: Influence of Gestational Age and Birth Weight, in an upcoming issue of Neurotoxicology. Naturally, the author, Mark Blaxill, not only inflates the results of the study but also manages to implicate influenza vaccines:
Consequently, the finding that early exposure to potentially toxic vaccine formulations can cause significant neuro-developmental delays in primates has explosive implications for vaccine safety management. These implications go far beyond the domestic HBV program and raise concerns about HBV formulations sold abroad as well as the domestic influenza vaccine program. Most influenza vaccines, including the vaccines in the upcoming swine flu program, contain thimerosal and are routinely administered to pregnant women and infants.
Primate studies are not trivial to perform and it is imperative to implement a robust study design with sufficient statistical power. This is why we discuss the study design in more detail here. Before we dive into this study, let's backtrack a bit. Back in May of 2008, Hewitson et al. presented 3 posters at the International Meeting for Autism Research (IMFAR). Dr. David Gorski provided an excellent summary of these 3 abstracts on Science-Based Medicine, so there is no need to discuss them at length here. We mention these abstracts because they are clearly parts of the same study as this Hep B/thimerosal paper, and there are discrepancies between them.

The authors claim that the animals were selected on a "semi-random" basis to receive either thimerosal-containing Hepatitis B vaccines at birth (TCV or thimerosal-containing vaccine) (n=13), a saline placebo (n=4), or no injection (n=3). We would like to point out that there is no accepted term such as "semi-random". Selection is either random or non-random, and while there are numerous ways to randomise one's selection, randomisation is never a sliding scale. Next, the 3 aforementioned abstracts state that there were 13 treatment animals (TCV) and 3 receiving saline placebo. Since these animals are all undoubtedly from the same group and protocol, that means that either 4 animals (1 saline placebo and 3 no injection) were left out of the abstract results or 4 animals were added for the purposes of this study. This discrepancy, along with the "semi-random" selection, requires explanation since either could certainly affect the results and subsequent interpretation.
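To make the contrast with "semi-random" concrete, here is a minimal sketch of what a fully random allocation to the three reported arms could look like. The animal IDs, the seed and the function name are our own illustrative assumptions; nothing here is taken from the authors' protocol.

```python
import random

def allocate(animal_ids, group_sizes, seed):
    """Randomly assign each animal to exactly one group.

    group_sizes maps a group label to the number of animals it receives.
    A fixed, recorded seed makes the allocation reproducible and auditable.
    """
    if sum(group_sizes.values()) != len(animal_ids):
        raise ValueError("group sizes must add up to the number of animals")
    rng = random.Random(seed)
    shuffled = list(animal_ids)
    rng.shuffle(shuffled)  # every ordering of the animals is equally likely
    allocation = {}
    start = 0
    for label, size in group_sizes.items():
        for animal in shuffled[start:start + size]:
            allocation[animal] = label
        start += size
    return allocation

# Hypothetical IDs; the group sizes are the ones reported in the paper.
animals = [f"M{i:02d}" for i in range(1, 21)]
sizes = {"TCV": 13, "saline": 4, "no injection": 3}
allocation = allocate(animals, sizes, seed=2009)
```

The point of the sketch is that no human judgement enters the assignment: once the seed is fixed, anyone can re-run the allocation and check it.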

Given the authors' hypothesis, i.e. that ethylmercury affects infant (macaque) neurodevelopment, and the need to control for all conceivable bias, it is puzzling why their treatment and control group sizes were allotted in the manner they were. The Hep B vaccine, with or without thimerosal, will have a physiological effect upon recipients, which may easily manifest in outward symptoms, e.g. fever, fussiness and/or injection site swelling and soreness. A more robust study design would have been, for example, TCV Hep B (n=10), thimerosal-free Hep B (n=5) and saline placebo (n=5). This would have separated the effects of thimerosal, the Hep B vaccine itself and the injection.

The authors also describe how the animals were divided into peer groups such that each group contained animals from either exposed or unexposed study groups for later "social testing". Not only is this social testing never mentioned again in the present paper, but placing the animals in such groupings introduced observer bias for the reasons mentioned above. Even though the animals were in separate cages, the cages containing exposed or unexposed animals were grouped together. An observer could subconsciously note that an animal was fussy and had visible swelling at the injection site and thus be able to guess that the rest in the group were also TCV-exposed. Conversely, animals in the unexposed peer groups would likely behave differently and show no outward signs of having received a vaccination.

Appropriate randomisation and blinding of the observer to the treatment are essential elements of a good study. Animal studies that do not utilize randomisation and blinding are up to 5 times more likely to report a difference between study groups than studies that employ these methods. Neither was apparently done properly in the present paper.

The neonatal assessments for reflexes, perceptual and motor skills were carried out by a single assessor, L.A.H. In at least 2 other studies of neonatal Rhesus Macaque behaviour using the same metrics, i.e. the Brazelton Neonatal Behavioral Assessment Scale, authored or co-authored by some of the same investigators as this study, multiple assessors were used. From Sackett et al. (2006):
Reflex and sensory motor responses were measured 5–6 days per week from birth through 30 days. The purpose was to assess neural integrity during the neonatal period, as reflected in the development of motor and sensory reflexes required for the survival of macaque monkeys in a natural environment. Similar procedures for a similar purpose have been used to assess both humans (Brazelton, 1973) and rhesus monkeys (Schneider & Suomi, 1992).
Reliability was assessed from the simultaneous observations of pairs of testers for each response that could be scored simultaneously and by immediately repeated observations by the second tester for items requiring handling the neonate by the tester. The latter values combine reliability with response repeatability. Periodic retesting maintained reliability at 80% or better on each type of scoring.
And from Dettmer et al. (2007):
Neonatal assessment and social data were collected by multiple observers. Interrater reliability for social behavior was achieved by computerized calculation of a kappa score (k) of .80 or above for each of five consecutive randomized focal animal sessions. Reliability for nursery assessments was achieved when observers obtained an 89% agreement on three consecutive randomized assessments.
Now given the intense scrutiny most of the authors had to know this study would be given, why use just one assessor? What was the possible observer error of one assessor if such a high discordance is acceptable for 2 or more observers?
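For readers unfamiliar with the reliability figures quoted above, the sketch below shows how percent agreement and Cohen's kappa are computed for two raters. The scores are made up for illustration (0 = absent, 1 = weak, 2 = full response) and the function names are ours; this is not the scoring scheme or the data of any of the studies cited.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Fraction of trials on which the two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    # Chance agreement: both raters independently pick the same category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)

# Made-up scores for ten trials scored by two observers.
scores_a = [2, 2, 1, 0, 2, 1, 1, 0, 2, 2]
scores_b = [2, 2, 1, 0, 2, 2, 1, 0, 2, 1]
agreement = percent_agreement(scores_a, scores_b)
kappa = cohens_kappa(scores_a, scores_b)
```

For these made-up scores the raw agreement is 80%, while kappa comes out at about 0.68 because it discounts agreement expected by chance. With a single assessor, neither number can be calculated at all.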

The authors state that they had to right-censor the data because a further intervention on day 14 precluded them from assessing the animals longer (right-censoring is a statistical technique to account for subjects that don't reach a threshold value by the end of the study period). Not only do they fail to mention what this intervention was, but it also appears that they did assess the animals until at least day 30, as indicated in one of their IMFAR abstracts. Why were these data not reported in the present paper? The authors also did not include any biochemical analyses, such as baseline and post-injection blood mercury levels. If they went to such considerable effort to prepare and verify TCVs for this study, it seems almost absurd to ignore the relatively easily obtainable data points that mercury testing and a chemistry panel could provide.
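As a rough illustration of what right-censoring means in practice, the sketch below stores each animal as a (day, reached criterion) pair and computes a simple Kaplan-Meier estimate of the proportion of animals that have not yet reached criterion. The data are invented, the function name is ours, and this is not the authors' analysis (they report Cox regression models); it only shows how animals censored at day 14 stay in the denominator without ever counting as events.

```python
def kaplan_meier(observations):
    """Kaplan-Meier estimate for time-to-criterion data.

    observations: list of (day, reached) tuples, where reached is False
    for an animal that was censored on that day without reaching criterion.
    Returns a list of (day, proportion not yet at criterion after that day).
    """
    event_days = sorted({day for day, reached in observations if reached})
    remaining = 1.0
    curve = []
    for day in event_days:
        # Animals still under observation just before this day.
        at_risk = sum(1 for d, _ in observations if d >= day)
        # Animals that reached criterion on this day.
        events = sum(1 for d, reached in observations if d == day and reached)
        remaining *= 1 - events / at_risk
        curve.append((day, remaining))
    return curve

# Invented example: four animals reach criterion, two are censored at day 14.
observations = [(2, True), (3, True), (3, True), (5, True),
                (14, False), (14, False)]
curve = kaplan_meier(observations)
```

Had the assessments continued to day 30, the two censored animals in this toy example could have contributed an actual time-to-criterion instead of a "not yet by day 14".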

In summary, the present study has significant flaws in experimental design. Group sizes varied greatly, animals were not randomised properly, and the observer was not properly blinded. These are all conditions under which the likelihood of finding a significant difference between groups (a 'false positive') is artificially increased. Furthermore, 4 animals seem to have been added long after the initial study (as reported at the 2008 IMFAR meeting). While this may have been to satisfy reviewer requests, it certainly contributed to the problems mentioned above. These problems will also carry through to all subsequent studies reporting on the same cohort of monkeys.

There are further problems in the data reported and how they are presented and interpreted. According to the results:
All infants remained healthy during the study testing period reaching all criteria for maintaining health including appetite, weight gain, and activity level, and achieved temperature regulation by Day 3.

Neonatal reflexes and sensorimotor responses were measured daily from birth until post-natal Day 14. Datasets from the two unexposed groups (with or without a saline injection) were combined when no differences were found for all measures (p>0.5). There was a significant delay in time-to-criterion for exposed vs. unexposed animals for three survival reflexes including root (Fig. 1A; p=0.004), suck (Fig. 1B; p=0.002) and snout (Fig, 1C; p=0.03) and approached significance for startle (p=0.11). The effect of exposure also approached significance for grasp hand (p=0.07), one of the motor reflexes. There were no reflexes for which the unexposed animals took significantly longer to reach criterion than the exposed animals (Table 2).
So of the thirteen measurements, only 3 reached statistical significance. And one of these, snout, was significant only as an unadjusted main effect of exposure: it did not maintain significance in the Cox regression models, nor did it reach statistical significance when exposure was measured against gestational age (GA). The data in Tables 3 and 4 were sufficiently explained in the body of the text, and since they were attempts to squeeze out every last possible association between TCV exposure and development, the tables themselves were redundant. It would have been far more interesting and informative to include instead a table listing animals, exposure and assessment results for each measurement. It is also worth noting that all animals in the exposed group reached criterion for root, suck and snout well before the censoring point, whereas some unexposed animals did not reach criterion for some measurements before censoring.
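A back-of-the-envelope sketch shows why 3 hits out of 13 tests deserves caution. It assumes a conventional significance level of 0.05 and treats the 13 tests as independent, which reflex measures on the same animals almost certainly are not, so this is an illustration of the multiple-comparisons problem rather than a reanalysis. The Bonferroni correction used here is our choice of a simple, conservative adjustment; the paper does not report one.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value cutoff that keeps the overall error rate near alpha."""
    return alpha / n_tests

n_measures = 13  # number of measurements compared between groups
# Unadjusted p-values quoted from the paper for the three significant reflexes.
reported = {"root": 0.004, "suck": 0.002, "snout": 0.03}

fwer = familywise_error_rate(n_measures)
threshold = bonferroni_threshold(n_measures)
survivors = [name for name, p in reported.items() if p < threshold]
```

Under these assumptions there is just under a 49% chance of at least one spurious "significant" result among 13 tests, and the per-test cutoff drops to roughly 0.0038.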

Three measurements out of 13 were statistically significant when comparing TCV-exposed and unexposed animals, but are they biologically important? Maybe, maybe not; it's difficult to say given the low quality of the study design in terms of control groups and observation. It is worth noting this gem from Mark Blaxill about the interpretation of the results:
To make the point more simply, it would have taken only modest differences in the management of the data analysis to make over half of the measured reflexes show significant delays instead of a third of them. Pointing this out is not intended as a criticism of the study, however, but rather a demonstration of how conservative the authors were in their interpretation of the results.
"Modest differences in the management of the data analysis"?! What exactly are you suggesting here, Mr. Blaxill? That a little more data massaging would have been justified?

Three authors of this study have considerable conflicts of interest: Drs. Hewitson, Wakefield and Stott. Here is their COI statement as it appears in the galley of the manuscript:
Prior to 2005, CS and AJW acted as paid experts in MMR-related litigation on behalf of the plaintiff. LH has a child who is a petitioner in the National Vaccine Injury Compensation Program. For this reason, LH was not involved in any data collection or statistical analyses to preclude the possibility of a perceived conflict of interest.
Now here is the COI declaration in Neurotoxicology Instructions to Authors:
All authors are requested to disclose any actual or potential conflict of interest including any financial, personal or other relationships with other people or organizations within three years of beginning the submitted work that could inappropriately influence, or be perceived to influence, their work. See also http://www.elsevier.com/conflictsofinterest.

NeuroToxicology requires full disclosure of all potential conflicts of interest. At the end of the manuscript text (and in the cover letter of the manuscript), under a subheading "Conflict of Interest statement", all authors must disclose any financial and personal relationships with other people or organisations that could inappropriately influence (bias) their work. If there are no conflicts of interest, the authors should state, "The authors declare that there are no conflicts of interest." Signed copies of the NeuroToxicology Conflict of Interest policy form are required upon submission. The Conflict of Interest policy form can be downloaded here. In order to minimize delays, we strongly advise that the signed copies of these statements are prepared before you submit your manuscript. The corresponding author is responsible for sharing this document with all co-authors. Each and every co-author must sign an individual disclosure form. The corresponding author is responsible for uploading their form and those of their co-authors.
None of these 3 authors fully declared their conflicts of interest. Now they may think they have some wiggle room with the way the COI declaration instructions are worded, but they don't. They have certainly violated the requirement for full disclosure of actual or potential conflicts of interest. Dr. Hewitson has failed to mention her affiliation with DAN! and her husband's employment by Thoughtful House and his associations with decidedly anti-vaccine organisations such as SafeMinds, which, not so coincidentally, co-funded this study and may or may not have had significant input into the study design. Dr. Wakefield's conflicts go without saying, but he is the executive director of a medical practice that unabashedly treats autism as vaccine damage; his organisation has a considerable vested interest in positive results from such studies, and that should have been declared. Dr. Stott is employed by Thoughtful House as a Senior Research Associate as well and may still be offering her services as a paid 'expert' for 'vaccine injury' litigation.

Dr. Carol Stott has undeclared conflicts of interest, not to mention she has a history of rather deranged behaviour related to the MMR-autism scare originated by Dr. Wakefield (she was disciplined by the British Psychological Society), yet she was allowed to handle the data analyses for this study.
Finally, let's take a look at their funding sources:
This work was supported by the Johnson Family, the late Liz Birt, SafeMinds, the Autism Research Institute, The Ted Lindsay Foundation, the Greater Milwaukee Foundation, David and Cindy Emminger, Sandy McInnis, Elise Roberts and Vivienne McKelvey.
Most of these individuals and organisations have considerable investment in a vaccine-autism association, particularly with regard to mercury. Sallie Bernard of SafeMinds demonstrated her vitriolic dedication to the mercury-autism causation claim when she was invited to take part in a CDC study examining TCVs and autism. The study, published in NEJM in 2007, had her input and cooperation until the results were announced; she then flounced off, claiming that the study design was poor. Given that SafeMinds is a supporter of Thoughtful House and has funded dubious research such as the "Rain Mouse" study by Hornig et al., it is clear that they exert substantial pressure for results they may want.

Primate studies come with significant financial and ethical costs. These are sentient creatures; their use in research should not be undertaken lightly, and their sacrifice should come with a significant contribution to the literature. It is a shame that the design of the current study did not ensure the optimal benefit from this investment.

Please visit Respectful Insolence for Orac's undoubtedly thorough critique of this study.

ETA: The authors of this blogpost, Catherina and Science Mom, have absolutely no conflicts of interest, either real or perceived, to declare.