2Society for Vascular Surgery Patient Safety Organization, Chicago, IL
Jens Eldrup-Jorgensen, Maine Medical Partners Vascular Care, 887 Congress St, Suite 400, Portland, ME 04102, Tel: 207-662-7032, Fax 207-774-9388, E-mail:
Methods: After discovery of data discrepancies, the VQI undertook a data review in 2017. A statistical review was performed to identify variances in the distribution of each variable reported in the data sets of each registry. Using a retrospective audit strategy, the internal consistency and accuracy of the data were investigated.
Results: There are 12 registries within the VQI; the error rate for variables was generally less than 5% and involved mostly descriptive variables rather than key outcome variables. Less than 2% of data points were found to be in error.
Discussion: Clinical registries contain large amounts of data subject to multiple error types. The audit allowed discovery of coding and other systematic errors, resulting in improved quality control measures and data checks.
Conclusion: Although the audit was time consuming and labor intensive, it led to improved data accuracy as well as new and enhanced audit strategies.
Key words: data validity, audit, data quality, registry
VQI data have been used for multiple quality improvement initiatives. Analysis of VQI data has changed how clinicians provide care, which has undoubtedly prevented strokes, heart attacks, limb loss, and deaths. The VQI has provided information that allows providers and centers to benchmark their care and performance. The VQI registry has served as a robust source of data for dozens of scientific analyses and publications. In addition to preventing complications, VQI data have allowed centers and providers to reduce resource utilization and length of stay, with a subsequent reduction in costs.
There are over 160,000,000 data points in the VQI registries and, inevitably, data errors have been encountered. Almost 2 years ago, while reviewing the EVAR registry, investigators encountered an apparently disproportionate number of complications, prompting review of the data set and its code. Further analysis identified the responsible coding error, which was corrected.
When the registry was found to be corrupted, the code for creating this BDS was subjected to further quality control analysis, and coding errors were discovered. The errors were felt to be unique to this registry, as it had recently undergone a major revision. When a registry is revised, new data fields are created, some data fields are eliminated, some are changed, and some are given new definitions. When the revised registry is merged with the previous registry data, a mapping process incorporates these modifications, and this process can introduce data variation.
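The mapping step described above can be illustrated with a minimal sketch. The `migrate_record` helper, the field names, and the convention of marking retired fields with `None` are all hypothetical, chosen only to show how unmapped or re-coded fields can be flagged for review rather than silently dropped during a registry revision:

```python
def migrate_record(old_record, field_map, value_maps=None):
    """Map an old-registry record onto a revised schema.

    field_map:  old field name -> new field name (None means the field was retired).
    value_maps: per new field, old coded value -> new coded value.
    Returns the migrated record plus a list of fields with no mapping,
    so they can be reviewed instead of disappearing silently.
    """
    value_maps = value_maps or {}
    new_record, unmapped = {}, []
    for field, value in old_record.items():
        if field not in field_map:
            unmapped.append(field)      # flag for review
            continue
        target = field_map[field]
        if target is None:              # field retired in the revision
            continue
        new_record[target] = value_maps.get(target, {}).get(value, value)
    return new_record, unmapped
```

Tracking the `unmapped` list is the key design point: most mapping errors of the kind described here surface as fields that quietly fail to carry over, so the migration should make that failure explicit.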
Shortly thereafter, data discrepancies were discovered in another registry. In this instance, a center found that its reports from the IVC filter registry contained fewer patients than it had entered into the registry. The BDS from the IVC filter registry did not contain inaccurate data but was found to be incomplete, as some patients were not included. The code used to create the BDS had truncated the data set, omitting some patients. As a result of these two events - one registry with inaccurate data and one with incomplete data - the PSO informed the SVS and embarked on a data audit. During this period, the PSO did not release any BDS for investigation or analysis until the audit for that registry had been completed and approved by the PSO and VQI.
Data is the foundation of everything that the VQI (or any clinical registry) does. All reports, analyses, findings, benchmarks, etc. are based upon patient data. As such, it is imperative that the data be of the highest quality and beyond reproach. The data audit of the VQI was begun with our corporate partner, M2S® (a subsidiary of Medstreaming®), which is responsible for the secure transmission and warehousing of the data as well as creating the BDS provided to investigators. New quality controls were developed, including a standardized methodology for creating the BDS. After a new uniform code was created for the BDS, it was subjected to analysis. At the outset, the PSO requested that an independent auditor be brought in to assess the effectiveness of the recently written code for creating the BDS. The external audit confirmed that the data displayed in the BDS matched the data entered into the registry. After review, it was felt that the new code was accurate and reliable. Using the new standardized code, a new BDS was created for each of the 12 registries. Each registry was then subjected to further auditing by analyzing the means or the categorical distribution of each variable by year and flagging any unexpected changes as suggestive of potential data errors. Variations could be due to changes in practice, coding, or other systematic errors. If significant variation was encountered, the fields with data variation were subjected to further analysis. Some degree of data variation was due to clinical variation; coding or transcription errors were corrected. The “variation check” was performed on each registry, and the registry was then cleared for release of its BDS.
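The yearly “variation check” described above is conceptually simple and can be sketched in a few lines. The function names, record layout, and 25% flagging threshold below are illustrative assumptions, not the PSO's actual implementation:

```python
from collections import defaultdict

def yearly_means(records, variable, year_key="year"):
    """Group records by year and compute the mean of one numeric variable."""
    sums, counts = defaultdict(float), defaultdict(int)
    for rec in records:
        value = rec.get(variable)
        if value is not None:
            sums[rec[year_key]] += value
            counts[rec[year_key]] += 1
    return {yr: sums[yr] / counts[yr] for yr in sums}

def flag_variation(means_by_year, threshold=0.25):
    """Flag consecutive years whose relative change in mean exceeds the threshold.

    A flagged pair is only *suggestive* of error: the shift may reflect a real
    change in practice, and each flag requires manual review, as in the audit.
    """
    flagged = []
    years = sorted(means_by_year)
    for prev, curr in zip(years, years[1:]):
        base = means_by_year[prev]
        if base and abs(means_by_year[curr] - base) / abs(base) > threshold:
            flagged.append((prev, curr))
    return flagged
```

For categorical variables the same idea applies with per-category proportions in place of the mean; the essential point is that the check screens for anomalies rather than proving error.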
The audit of 12 registries was a tedious, time-consuming, and labor-intensive process that took thousands of hours of PSO and M2S personnel time before data accuracy was confirmed. There were multiple causes of the data discrepancies, including transcription errors (some data had been collected on paper forms as early as 2003 and migrated through multiple databases before arriving in the current format), improper mapping of old variables to new ones during prior registry revisions, and inadequacies of the data dictionaries.
Registry | Total # Variables | # (%) Variables that Required Correction due to Transition from Paper Forms | # (%) Variables that Required Correction due to Data | # (%) Prior Calculated Variables that Required Correction (Now Retired)
-------- | ----------------- | --------------------------------------------------------------------------- | ---------------------------------------------------- | -----------------------------------------------------------------------
INFRA    | 187  | 8 (4%)  | 5 (3%)  | 3 (2%)
CEA      | 221  | 4 (2%)  | 9 (4%)  | 3 (1%)
PVI      | 286  | 0 (0%)  | 0 (0%)  | 5 (2%)
OAAA     | 144  | 1 (1%)  | 5 (4%)  | 0 (0%)
SUPRA    | 187  | 0 (0%)  | 5 (3%)  | 2 (1%)
EVAR     | 759  | 15 (2%) | 33 (4%) | 3 (0%)
CAS      | 189  | 1 (1%)  | 2 (1%)  | 1 (1%)
AMP      | 161  | 0 (0%)  | 8 (5%)  | 3 (2%)
VV       | 437  | 0 (0%)  | 1 (0%)  | 0 (0%)
AVACESS  | 159  | 0 (0%)  | 5 (3%)  | 1 (1%)
IVC      | 152  | 0 (0%)  | 1 (1%)  | 14 (9%)
TEVAR    | 1038 | 0 (0%)  | 35 (3%) | 5 (1%)
When each registry had gone through the audit process and been cleared for release, a new BDS was generated. The BDS for each registry was sent to all investigators who had used that registry's BDS for study. Each investigator was asked to compare the results of the same analysis on the new BDS with those from the previous BDS he or she had received. As yet, no investigator has reported that the data errors significantly impacted their outcomes or analyses.
The foundation of any clinical registry is data accuracy. Unless there is absolute confidence in data integrity, any analysis or conclusions based upon the registry are questionable [3]. Even small degrees of error can significantly impact outcomes or analyses. At every scientific congress and in most medical journals, there are multiple presentations and manuscripts based on administrative and/or clinical databases. Clinical registries have long been felt to be superior to and more accurate than administrative databases due to better coding [4, 5]. The primary purpose of a clinical registry is to provide aggregate clinical data for analysis, whereas administrative databases function primarily for financial or administrative reasons. Clinical databases are felt to be more accurate than administrative databases because they are compiled by data collectors with clinical expertise, are used for clinical rather than administrative or financial purposes, are prospectively maintained, and are subject to audit [6].
All registries are subject to errors in coding, transcription, and interpretation. Multiple efforts are made to minimize these errors through training, certification, data checks, and other quality control techniques. However, it is inherent to any large, complex registry that there will be some degree of data variation. The Society of Thoracic Surgeons (STS) National Adult Cardiac Surgery Database (NCD) is recognized as a premier high-quality clinical registry. Years ago, the validity of the NCD data was questioned by investigators, who found variable agreement (89%, range 42-100%) in the data when subjected to review by untrained abstractors [7]. Most of the variability was noted in “subjective fields” such as etiology of aortic valve disease or NYHA classification. More critical fields, such as number of internal mammary grafts or absence of significant valvular disease, had high degrees of concordance. However, patient status at 30 days (dead or alive) was in agreement in only 83% of cases in 2008. Grunkemeier and Furnary outlined the issues of a large, complex clinical registry, emphasizing the need for awareness of incomplete and inconsistent data and for internal quality controls [8].
The STS has made major efforts at improving data validity, including developing its own program for improving data quality (STS Adult Cardiac Surgery Database v2.61 Consistency Edits and Checks, February 17, 2009) [9].
The National Cardiovascular Data Registry (NCDR) has multiple registries with a comprehensive data audit and quality assurance program. It describes average accuracy rates within 3 of its registries ranging from 90-93% (center range 85-97%) [10]. Review of a high-quality orthopedic registry showed that at 2 hospitals the percentage of records that were error free was 8% and 10%, respectively [11]. Setting the benchmark for accuracy at 95%, 57% of variables at one hospital and 74% at the other met this threshold. The majority of discrepancies were in medical history, although agreement for the variable "postoperative complications" was 73%.
With the increasing number and utility of medical registries, there has been increased focus on defining and improving data quality [12]. The European Society of Thoracic Surgeons (ESTS) used task-independent metrics to assess quality within its database [13]. It measured completeness, correctness, and consistency, with a threshold for good quality set at 0.8. Although 0.8 does not appear to be a high threshold for accuracy, it was applied to all variables, including non-critical fields. Correctness was defined as syntactic accuracy (i.e. plausible clinical values, e.g. FEV1 > 25 and < 150) and not semantic accuracy (i.e. checked against source data). Consistency was measured only as internal consistency (or feasibility, e.g. DATE OF ADMISSION < DATE OF OPERATION < DATE OF DISCHARGE). There were few parameters that could be measured for correctness and consistency. The ESTS has gone on to create the Aggregate Data Quality (ADQ) score to further measure data quality [14].
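The two ESTS checks quoted above translate directly into code. This is a minimal sketch under the stated definitions: the FEV1 bounds and the date ordering come from the text, while the function names and record layout are assumptions.

```python
from datetime import date

def is_correct_fev1(value, low=25, high=150):
    """Syntactic correctness: the value is clinically plausible,
    e.g. percent-predicted FEV1 strictly between 25 and 150 (ESTS example).
    Note this cannot catch a plausible-but-wrong value; that would require
    semantic checking against source documents."""
    return value is not None and low < value < high

def is_consistent(admission, operation, discharge):
    """Internal consistency (feasibility): admission <= operation <= discharge."""
    return admission <= operation <= discharge
```

The limitation noted in the text falls straight out of the sketch: both checks run on the registry data alone, so a value that is wrong yet plausible and internally consistent passes them both.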
Ethical Approval: NA
Clinical Trial Registration: NA
- Society for Vascular Surgery. VQI: Vascular Quality Initiative. 2018; https://www.vqi.org/. Accessed January 2, 2018.
- Society for Vascular Surgery. Patient Safety Organization. https://vascular.org/research-quality/vascular-quality-initiative/patient-safety-organization. Accessed January 2, 2018.
- Gallivan S, Stark J, Pagel C, et al. Dead reckoning: can we trust estimates of mortality rates in clinical databases? Eur J Cardiothorac Surg. 2008;33(3):334-340.
- Prasad A, Helder MR, Brown DA, et al. Understanding Differences in Administrative and Audited Patient Data in Cardiac Surgery: Comparison of the University HealthSystem Consortium and Society of Thoracic Surgeons Databases. J Am Coll Surg. 2016;223(4):551-557 e554.
- Mack MJ, Herbert M, Prince S, et al. Does reporting of coronary artery bypass grafting from administrative databases accurately reflect actual clinical outcomes? J Thorac Cardiovasc Surg. 2005;129(6):1309-1317.
- Shahian DM, Jacobs JP, Edwards FH, et al. The society of thoracic surgeons national database. Heart. 2013;99(20):1494-1501.
- Brown ML, Lenoch JR, Schaff HV. Variability in data: the Society of Thoracic Surgeons National Adult Cardiac Surgery Database. J Thorac Cardiovasc Surg. 2010;140(2):267-273.
- Grunkemeier GL, Furnary AP. Data variability and validity: the elephant in the room. J Thorac Cardiovasc Surg. 2010;140(2):273-275.
- Welke KF, Ferguson TB, Jr., Coombs LP, et al. Validity of the Society of Thoracic Surgeons National Adult Cardiac Surgery Database. Ann Thorac Surg. 2004;77(4):1137-1139.
- Messenger JC, Ho KK, Young CH, et al. The National Cardiovascular Data Registry (NCDR) Data Quality Brief: the NCDR Data Quality Program in 2012. J Am Coll Cardiol. 2012;60(16):1484-1488.
- Seagrave KG, Naylor J, Armstrong E, et al. Data quality audit of the arthroplasty clinical outcomes registry NSW. BMC Health Serv Res. 2014;14:512.
- Arts DG, De Keizer NF, Scheffer GJ. Defining and improving data quality in medical registries: a literature review, case study, and generic framework. J Am Med Inform Assoc. 2002;9(6):600-611.
- Salati M, Brunelli A, Dahan M, et al. Task-independent metrics to assess the data quality of medical registries using the European Society of Thoracic Surgeons (ESTS) Database. Eur J Cardiothorac Surg. 2011;40(1):91-98.
- Salati M, Falcoz PE, Decaluwe H, et al. The European Thoracic Data Quality Project: an aggregate data quality score to measure the quality of international multi-institutional databases. Eur J Cardiothorac Surg. 2015;49(5):1470-1475.
- Smerek MM. Assessing Data Quality for Healthcare Systems Data Used in Clinical Research (Version 1.0). 2015.
- Willis CD, Jolley DJ, McNeil JJ, et al. Identifying and improving unreliable items in registries through data auditing. Int J Qual Health Care. 2011;23(3):317-323.