Challenges and Lessons-Learned from the PHN Normal Echo Database Study… A Conversation with Dongngan Truong and Leo Lopez
By David K. Werho, MD (@DWerho) - Associate Editor, Social Media
Cardiology in the Young (@CardiologyYoung)
In their recently published article in Cardiology in the Young, titled “Challenges and lessons learned from the Pediatric Heart Network Normal Echocardiogram Database study” (Truong, Dongngan, et al. Cardiology in the Young (2020): 1-6), Dr. Truong and her colleagues discussed the unexpected issues that arose in the course of their study, including several obstacles to its completion. Despite the many challenges they faced, they were able to create the first large, multi-institutional, standardized pediatric z-score database that accounts for age, race, and sex (Lopez, Leo, et al. "Relationship of echocardiographic Z scores adjusted for body surface area to age, sex, race, and ethnicity: the Pediatric Heart Network Normal Echocardiogram Database." Circulation: Cardiovascular Imaging 10.11 (2017): e006979).
As this was a foundational project for pediatric echocardiography, it was important for the authors to share their experience so that others designing future large-scale studies using echo data can refer to this article as they plan their own research. Amidst the COVID-19 lockdown, I had the pleasure of chatting (via Zoom) with two of the authors, Dr. Leo Lopez at Lucile Packard Children’s Hospital – Stanford University and Dr. Dongngan Truong at Primary Children’s Hospital – University of Utah about their work on this project.
Can you tell me a little bit about the background of why the PHN decided to do the normal echo database study?
Lopez: All of the z-score databases up until the time we published our paper had limitations: limited sample sizes, not accounting for race and, in some cases, sex, and very different statistical approaches to determining the Z scores. There was no uniform way of doing it, and we felt the PHN was the perfect place to try a multicenter study where we could address all of the prior limitations of Z scores, and that is how it came about. After we published the quantification paper in JASE in 2010 (Lopez, Leo, et al. "Recommendations for quantification methods during the performance of a pediatric echocardiogram: a report from the Pediatric Measurements Writing Group of the American Society of Echocardiography Pediatric and Congenital Heart Disease Council." Journal of the American Society of Echocardiography 23.5 (2010): 465-495), one of the things discussed fairly regularly while we were drafting it was that it could serve as a manual of operations for a project like the PHN Z score project, so this almost felt like a natural offshoot of that original paper.
Can you summarize what you found when you made the initial analysis of all the Z scores and how race and age and those things factored into it or it did not factor into it?
Lopez: Yes, there are a couple of very important findings in there, some of which were partly known already. One was which body size parameter would be the best predictor. We were trying to come up with the simplest model so it would be simple for everybody who wants to use these scores, and what we showed was that BSA (body surface area) was the body size parameter that best predicted how cardiovascular structures grow. We mostly knew that, but this really clarified it for us. The other important thing was to find out whether all of the other potential confounders, including sex, race, ethnicity, age, and a few other things we were not able to test previously, truly needed to be accounted for. The finding in our paper was that none of them actually had a clinically significant effect on the way body surface area predicted the growth of cardiovascular structures.
Now you notice I said clinically significant effect, and I think the most novel part of the PHN Z score paper was the recognition that it is very easy to find statistically significant differences in models built on 3200 subjects. With such a large amount of data, P-value differences exist everywhere you look, and in fact that is what we found: age, race, and all of those things had some effect or other on the Z scores. You can imagine how complicated these Z-score models would have been if we had to account for all of those factors - something like 36 different Z-score models just to cover all of the combinations. But there is a certain limitation in performing echocardiograms related to the spatial resolution we deal with, so if we think about variability of measurements within the context of that resolution issue, all of the measurements we make could vary within a certain amount, and you could attribute that to variability error. A whole group of measurement values with minuscule variation could really be acceptable for a particular measurement, and what we found, at least from other studies, was that about 5% was the magic number: any variability within 5% of a measurement value could be attributed to sampling error. So we asked, if you look at all of the models and somehow account for that variability, can we get rid of all of those statistically significant differences? That is exactly what we did; by accounting for that variability, everything across the board was clinically insignificant.
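To make the 5% argument concrete, here is a minimal sketch - not the published PHN model; the allometric form and every coefficient below are hypothetical - of how a BSA-based z-score responds to a 5% change in a measured value:

```python
import math

# Illustrative only: we assume a generic allometric model of the form
# mean = a * BSA**b with a lognormal spread, which mirrors the common
# approach of regressing cardiac dimensions on body surface area (BSA).
# These coefficients are made up for the example.
A, B, SIGMA = 2.0, 0.5, 0.10

def z_score(measurement_cm: float, bsa_m2: float) -> float:
    """Z-score of a measurement relative to its BSA-predicted mean."""
    predicted = A * bsa_m2 ** B
    return (math.log(measurement_cm) - math.log(predicted)) / SIGMA

# Under this toy model, a 5% change in the measured value shifts the
# z-score by a fixed amount, log(1.05)/SIGMA, regardless of BSA:
shift = z_score(1.9 * 1.05, 1.0) - z_score(1.9, 1.0)
print(round(shift, 3))  # → 0.488
```

Differences in z-score smaller than that band could then be attributed to measurement variability rather than a true clinical difference, which is the spirit of the 5% threshold described above.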
And going along those lines, when you published your new normative database and all the new PHN Z scores with the new model, did you find any clinically significant differences from the previously used scores or anything that might go back and change retrospectively how we may have cared for patients in the past?
Lopez: Yes that is a good question and we actually have presented an abstract at the American Heart Association Scientific Sessions on this where we compared the PHN Z scores to the Boston Z scores, the Detroit Z scores, and the Italian Z scores, and we found some differences particularly in bigger kids. Part of the challenge is that the statistical approach was different with the Italian and Detroit models, and so as you can imagine the PHN Z scores were most similar to the Boston Z scores because we used the same statistical approach. And so yes, I think that there are differences and we hope that by having a much bigger dataset that we have a much more robust and predictive model but that is going to take time for us to really see whether that is true.
So, as we get more collaborative in our field, and as our field grows and echo images become easier to share and analyze across different centers, it is becoming possible to assemble really large numbers and to do new types of research based on large echo datasets. I think the important thing about this paper is that you did a great job of laying out some of the limitations of putting together a large echo-based study. So tell me a little bit about where the idea for this offshoot paper came from?
Truong: The goal of the paper was to share what we learned from that experience so that future studies can hopefully avoid some of the challenges and issues our study went through. A previous PHN trial, a randomized clinical trial of ACE inhibitors in single ventricle patients, had a similar publication addressing the challenges and difficulties of that study (Pike, Nancy A., et al. "Challenges and successes of recruitment in the “angiotensin-converting enzyme inhibition in infants with single ventricle trial” of the Pediatric Heart Network." Cardiology in the Young 23.2 (2013): 248-257), and so this was modeled after that.
In your paper, I found it really striking how many patients you guys screened and ultimately how many were excluded, so tell me about your inclusion and exclusion criteria and had you done this again or had you had this information going back to rewrite your study from the beginning, what would you have done differently?
Truong: The inclusion criteria required that data such as the patient's race and the indication for the echo be available, and that the echo be normal in a patient without a personal or family history of significant cardiac disease and without a systemic illness at the time the echo was done. Centers had to do a lot of screening for simple things, excluding many patients for issues like race not being listed or there being no clear indication for why the echocardiogram was done. That was definitely something people were not expecting - just the number of potential subjects that had to be screened in order to find patients who met inclusion from a demographic standpoint or met the basic inclusion and exclusion criteria.
Lopez: One thing we found was that even though we had published the quantification guidelines document, a lot of programs were not necessarily following all of the recommendations; they were following some of them, but not all. One of the major issues was the measurement of the ascending aorta in the view where you can see the RPA - a lot of centers just did not have that view and so were not following the exact recommendation on how to measure the ascending aorta. One of the decisions we made early on, and I do not know whether it was the wrong decision, but it certainly affected the timeline, was that all submitted studies had to have a certain minimum number of elements or images providing the measurement values. The reason so many studies were excluded was that many of them did not have all of the mandatory elements for a study to be eligible.
One of the questions that came up a couple of times during the study was why did we get that strict? Why could people not just submit studies that were definitely normal, then extract from each of those studies whatever we can and then use that to develop our models? And you can imagine how much more crazy it would have been for the core lab to try and navigate through all of that and how much more potential variability there would have been. Plus the issue of a study that may not necessarily be normal then could potentially get included, for example if somebody just did not take pictures of the coronary arteries and it turned out they actually had dilated coronary arteries and never found it, then we could potentially have included those patients. And so I think that is an important question and I do not know if we actually know the answer to that question.
Truong: And I think just the retrospective nature of this study also limited centers’ ability to react to some of these changes. For instance, one center did not allow any echocardiograms after the actual launch date to be enrolled, so everything really truly was retrospective for that center whereas other centers could evolve from study launch and potentially change protocols to meet the study criteria, where that was not possible at some centers.
Lopez: At my previous center for example, we realized when we started doing this that we were not doing those ascending aortic measurements correctly and so we had to change and we really had to focus on that within my Echo lab. We were then able to submit an IRB amendment allowing us to then include studies that were done between the start of the project and the time that we submitted the amendment. Some centers did not have the ability to improve and add the studies that would have been eligible for this project and that was a big challenge.
Obviously, there are lots of reasons to do this retrospectively, especially given the regulations and issues around consent. However, as you mentioned, there are also many limitations to doing something retrospectively, especially when the regulatory bodies at different centers are more or less strict or simply have different expectations. Do you think that, going back, if you had said, “We are doing this prospectively,” that would have changed the number of patients you were able to enroll over two to three years? Obviously, there would be other new challenges involved, but if somebody wants to design an echo study in the future with these lessons learned, what would you suggest they do?
Truong: Well, I think a prospective study design definitely would have alleviated some of the things that were encountered. As you mentioned, the ability to adjust echocardiographic standards locally and gear studies towards a particular imaging protocol could have been addressed, and making sure that every single imaging element requirement was present in each future echocardiogram is definitely one thing that would have improved. But as you mentioned, there could be a lot of potential headaches. For instance, if a baby is getting an echo for a murmur, you do not know until the end that it is completely normal, and for prospective studies you need things like consent. Costs go up with prospective studies as well. So I think a prospective study would have helped with a lot of the things that were encountered, but I do not think it would have solved everything. That was definitely something discussed in terms of future studies - would it be better to do them prospectively? I think the added costs and added challenges would have to be thoroughly discussed and weighed.
Lopez: The original proposal was actually a prospective study, and it was turned down because it cost too much to do it prospectively. I think it would have been faster, and I suspect some of the people involved may now see it differently in retrospect, but I will tell you, I do not know if it is a done deal in terms of whether prospective or retrospective is the way to go. The other part of it is that there is a certain selection bias you could potentially have with people who agree to have their study included in a normal database, so somebody might say that we are automatically excluding some patients; there may be some biases associated with it. The other challenge all around is that you want your results to be a good representation of real-world data, so you do not want such tight study parameters that the standard deviations are too small to represent how this is done out in the outside world. We had an issue with that by using a core lab, and certainly by doing it prospectively there is a potential for those confidence intervals to be even tighter. You do not want it to be too tight, because then everybody starts to look like they have abnormal measured values.
As you were reflecting on the study and thinking about ways in which you can inform others as we make studies in the future, what were some of the things that surprised you most when you were looking back? What challenges had you not anticipated that caught you off guard in the end? What is the take home point of your reflection as you put all this back together retrospectively?
Truong: There are a couple of things that were most surprising to me that I would not even have thought of as potential issues. One was the Hispanic classification - not even realizing that there were differences between how the hospital systems and how the NIH define something like that, and that something like that would even be an issue, was very surprising to me. There are differences in definitions between national standards and the local level. The other one for me was just how differently IRBs can interpret what you would think would be standard definitions, again in terms of retrospective study versus not, and how open to interpretation such things can be at each center when you would not necessarily think they would be an issue.
Yes, the way that race and ethnicity data are collected and used varies so much within our field, even from study to study and from clinical medicine to the research world. Some studies will use Hispanic as a race in their analyses and others will use it as an ethnicity, so I think it is something we all need to get on the same page about, but until you come across challenges like this, it might be really hard to anticipate that it is even going to be an issue. I worry that it actually could impact results and how you interpret data, and I do not think we really have a standard yet. Based on these experiences, how do you think race and ethnicity should be considered in these types of studies?
Lopez: So I know one of the things that came up early on when we were doing this, working with the proposal even back when we were doing it as a prospective proposal, there were people that suggested that we should actually do DNA testing on everybody that was included because the whole concept of race especially in this day and age is quite inexact. But somebody did suggest that to really do this in a scientific way, we should be looking at genes and chromosomes to understand what the differences are and obviously that is an impossibly expensive thing to do for this type of study and this number of patients, but I think we all should just recognize that any evaluation and any study that wants to account for race can never be pure and probably never completely a hundred percent correct and predictive.
There is another thing I remember in terms of the lessons we learned. There are a lot of guidelines coming out right now for pediatric imaging, and this was not true 10-15 years ago. We have been pretty active in the pediatric echo community about creating these guidelines, including appropriate use criteria, imaging quality metrics, and efforts on all sorts of fronts. But one of the things we learned here, and it continues to be a recurring theme, is that publishing guidelines does not necessarily translate to changes in practice. Maybe over years and years it does, but often those guidelines have evolved by the time practice changes. We definitely saw that here with the quantification guidelines: even in very reputable, very excellent echo labs across the country, there was substantial deviation from the guidelines in general practice.
Finally, is there anything you want to emphasize for future investigators putting together echo based research collaborative projects?
Truong: I suspect there are going to be lessons learned from every multicenter study that we do, and, as we have talked about, I do not think there is one simple solution to create the perfect project where there will be nothing left to learn. There are going to be potential issues that arise with every study, whether retrospective or prospective, but hopefully with publications like this, some of those can be minimized to the best of our abilities, these things will not be surprises in the future, and secondary plans will be in place if they arise.
Lopez: Yes, I think one limitation we did not address in this paper but that would be useful for people is the limitation of left ventricular functional parameters. We published a subsequent paper on the issues with ejection fraction and shortening fraction, and to me that was probably one of the most important lessons we learned. We addressed it in another paper because we really had to go above and beyond what we did with the original project to understand the problems. I would encourage people to look at that paper so they can understand the limitations associated with those particular parameters (Frommelt, Peter C., et al. "Challenges With Left Ventricular Functional Parameters: The Pediatric Heart Network Normal Echocardiogram Database." Journal of the American Society of Echocardiography 32.10 (2019): 1331-1338).