Comparing and Validating Simple Measures of Patient-Reported Peripheral Neuropathy for Oncology Clinical Trials: NCCTG N0897 (Alliance) A Pooled Analysis of 2440 Patients

Introduction: The current standard evaluation of Peripheral Neuropathy (PN) is based on an investigator-reported classification system that often fails to reflect patients’ subjective symptoms. More reliable methods to assess PN are therefore needed. This study assessed alternative methods of assessing patient-reported PN in 5 North Central Cancer Treatment Group (NCCTG) clinical trials.
Method: Two single-item assessments relating to numbness and tingling were used to measure PN. Patients’ Quality Of Life (QOL) was also assessed using the Uniscale, Symptom Distress Scale (SDS), Profile of Mood States (POMS), Brief Pain Inventory (BPI) and Subject Global Impression of Change (SGIC). Wilcoxon tests compared QOL scores between patients with PN (score ≤ 50) vs. no PN (score > 50), with all scores expressed on a 0-100 scale on which 100 is the best possible score. Changes from baseline in QOL were compared by Wilcoxon rank sum test, with a 20-point change in PN defined as clinically meaningful. Both distribution-based and anchor-based approaches were used to derive estimates of Minimal Important Differences (MID). Standardized Response Means (SRM), Effect Sizes (ES) and Guyatt’s responsiveness statistic were used to measure responsiveness.
Results: The proportion of patients reporting numbness (tingling) was 10.7% (10.0%) at baseline and 18.4% (17.8%) at last assessment. The correlation between numbness and tingling was 0.81 at baseline and 0.83 at last assessment. Patients with substantial PN reported, on average, 10 points lower overall QOL and mood, worse symptom distress, and 20 points lower scores on the BPI interference items. Patients having a ≥ 20 point worsening in PN score reported significantly worse symptom distress and BPI worst pain, but not worse POMS or overall QOL. The MID estimates were similar between the numbness and tingling items but varied depending on the approach used. Responsiveness statistics indicated that the two PN assessments are sensitive and responsive instruments for cancer patients with PN.
Conclusions: The two PN items for numbness and tingling were redundant. Evidence of criterion validity and responsiveness indicates that these simple measures of PN can be used successfully in cancer clinical trials.


Introduction
Peripheral Neuropathy (PN) is a painful, common and potentially dose-limiting adverse effect of treatment with taxanes, vinca alkaloids, and platinum-based chemotherapeutic agents [1]. Neuropathic symptoms may compromise quality of life by affecting physical, social and psychological functioning, which may be particularly critical for cancer patients during treatment and recovery [2]. Furthermore, neuropathy can become so severe as to limit the tolerable total dose of chemotherapy, potentially leading to disease progression [2,3]. The introduction of new anti-cancer therapies that cause PN has increased the need for methods to reliably assess PN in clinical and research settings.
To date, the commonly used physician-based instruments to assess cancer-induced PN include the National Cancer Institute Common Terminology Criteria for Adverse Events (NCI-CTCAE), the World Health Organization (WHO) Common Toxicity Criteria for PN, and the Eastern Cooperative Oncology Group (ECOG) Grading Scale for chemotherapy-induced PN (CIPN) [4][5][6]. These grading scales neither define grades uniformly nor define terms clearly. Each clinician therefore assesses peripheral neuropathy without clear points of reference and is forced to rely upon personal experience and specific knowledge of peripheral neuropathy, both of which can be highly variable. This leads to subjective interpretation by the clinician and makes PN diagnosis and grading more controversial. The subjective nature of these ratings also results in substantial inter-observer and intra-observer variation in assessing PN. Moreover, current evidence suggests that physician-based assessments tend to under-report the incidence and severity of PN [7][8][9].
The lack of a universally recognized, standardized, valid and reliable patient-reported tool that quantifies PN symptoms makes comparisons among published studies of PN difficult. Patient-reported instruments are often lengthy and do not provide extensive information about the intensity of this toxicity or the severity of resulting impairment. Therefore, the development of a new and reliable tool for assessing PN that allows early detection and has the psychometric properties required for use in clinical and research settings would represent an important advance.
There have been more recent advances in the measurement of PN, most notably by the European Organisation for Research and Treatment of Cancer (EORTC), which produced a 20-item measure for patient-reported peripheral neuropathy (ref), and the Functional Assessment of Cancer Therapy Scale/Gynecologic Oncology Group-Neurotoxicity (FACT/GOG-Ntx) questionnaire [10,11]. Other studies have used simple visual analogue scales, the Neuropathy Symptom Score, and the Neuropathy Symptoms and Change Score [12,13]. Challenges remain, however, in that the relative psychometric qualities of the various assessments in detecting PN have yet to be established, and there is no evidence comparing the more involved assessments to simple single-item measures.
The primary aim of this study was to compare and validate two single-item measures of PN using data from a series of North Central Cancer Treatment Group (NCCTG) clinical trials. A secondary aim was to produce an estimate of the Minimal Important Difference (MID) for the PN measures and compare the responsiveness to clinical change of the PN measures in cancer patients.

Methods
Data were pooled from five NCCTG clinical trials involving a total of 2440 patients. Details of the individual trials and their publications are in Table 1 [14][15][16]. Each participant signed an IRB-approved, protocol-specific informed consent in accordance with federal and institutional guidelines. All trials used two single-item questions relating to numbness and tingling in fingers and toes to measure PN, where a score of 0 represented no numbness or tingling and a score of 10 represented the worst imaginable numbness or tingling. These items were developed within the NCCTG specifically for the assessment of oxaliplatin-related toxicity [17]. Patients also completed other instruments to assess health-related quality of life (QOL). Overall QOL was measured using the Uniscale, symptoms were measured using the Symptom Distress Scale (SDS), mood was measured using the Profile of Mood States (POMS) Short Form, pain was measured using the Brief Pain Inventory (BPI) and physical status was measured using the Subject Global Impression of Change (SGIC). All assessments were provided to the patient in professionally prepared booklets by study coordinators at each site and were measured at baseline and at least one other time point during the trial. The psychometric properties of study instruments are described below.
The studies were analyzed separately and together in a pooled dataset. Results for the responsiveness of the measures and for other psychometric properties were consistent across studies (data not shown). Pooling was justified because our goals were to compare intra-patient differences and correlations among the various PN assessments.
The Uniscale is a single-item measure of global QOL using a numerical rating scale ranging from 0 to 10 with well-established validity data in cancer populations [18][19][20].
The SDS is a valid and reliable 13-item cancer-specific instrument intended for assessing the degree of distress associated with the following 11 cancer-related symptoms as perceived by the patient: nausea, appetite, insomnia, pain, fatigue, bowel pattern, concentration, appearance, outlook, breathing, and cough. Each symptom is rated on a 1-5 Likert-type scale [21,22].
The POMS is a 37-item scale used to assess patients' overall mood and the specific mood items of fatigue-inertia, vigor-activity, tension-anxiety, depression-dejection, anger-hostility and confusion-bewilderment. It is valid for use, and is discriminative, in the evaluation of cancer- and pain-associated mood disturbance or psychological distress [23].
The BPI is a pain assessment tool that consists of 15 items to locate pain, determine pain severity, determine how the pain interferes with daily activities and assess the extent of pain relief received from analgesics [24,25]. All items except those concerning pain location and medication are measured using numeric analogue scales ranging from 0 to 10. The BPI has been widely used and has been validated for use in cancer populations [26].
The SGIC is a 7-point item in which patients rate the change in their overall status since the beginning of the study (ranging from much improved, moderately improved, minimally improved, no change, minimally worse and moderately worse to much worse). The item has been found to effectively discriminate treatment effects in neuropathic pain trials [27].
The QOL assessments were scored according to each tool's established scoring algorithm. Where necessary, scores were recoded so that a low score represented poor patient condition, and all scores were converted to a 0-100 point scale, with 100 being the best possible score, for ease of interpretation and comparability between instruments with differing ranges [28][29][30]. Each of the two single-item PN scores is on a 0 to 10 ordinal scale and thus was also converted to a 0-100 scale. Patients were assigned a dichotomous category for substantial PN at baseline using the scoring cut-off for Clinically Deficient PN (CDPN) based on the numbness and tingling scores (CDPN of ≤ 50 vs. non-CDPN of > 50). This approach was consistent with prior work that determined a score of 50 or less on the 100-point scale was indicative of a deficit that required clinical intervention or at least further clinical investigation and assessment [30][31][32]. Changes in scores were calculated for PN and QOL assessments using baseline and the last assessment. Patients were further categorized as becoming ≥ 20 points worse vs. < 20 points worse in PN rating, as measured by the worst change from baseline in numbness or tingling [29]. The Wilcoxon rank-sum test was used to compare QOL scores between assigned categorical groups [33].
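The scoring steps described above can be illustrated with a short sketch. This is not the study's SAS code: the function names are hypothetical, and the sketch assumes that the raw 0-10 PN items (where 10 is the worst imaginable symptom) are reversed during conversion so that 100 is the best score, in line with the QOL instruments.

```python
def to_0_100(raw, lo=0, hi=10, higher_is_worse=True):
    """Linearly rescale a raw item score to 0-100, with 100 = best.

    Assumption: raw PN items are reversed, because 10 means the worst
    imaginable numbness or tingling on the raw scale.
    """
    scaled = 100.0 * (raw - lo) / (hi - lo)
    return (100.0 - scaled) if higher_is_worse else scaled


def is_cdpn(score_0_100, cutoff=50):
    """Clinically Deficient PN: converted score of 50 or less."""
    return score_0_100 <= cutoff


def worst_pn_drop(base_numb, last_numb, base_ting, last_ting):
    """Worst change from baseline across the two items.

    Scores are on the 0-100 scale (higher = better), so a worsening
    is a positive drop from baseline to last assessment.
    """
    return max(base_numb - last_numb, base_ting - last_ting)


def worsened_by_20(drop, threshold=20):
    """Flag a worsening of at least 20 points."""
    return drop >= threshold


if __name__ == "__main__":
    # Hypothetical patient: raw numbness 1 -> 4, raw tingling 2 -> 3.
    bn, ln = to_0_100(1), to_0_100(4)
    bt, lt = to_0_100(2), to_0_100(3)
    drop = worst_pn_drop(bn, ln, bt, lt)
    print(is_cdpn(bn), drop, worsened_by_20(drop))
```

The resulting group labels (CDPN vs. non-CDPN, and ≥ 20 vs. < 20 points worse) would then define the two samples compared by the rank-sum test.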
Associations between PN and QOL scores were examined using Spearman correlation coefficients, and the two PN assessments were compared using a Bland-Altman analysis [34,35]. Weighted kappa was used to measure agreement between the two PN assessments [36].
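For readers unfamiliar with the Bland-Altman approach, the sketch below shows the quantities involved: the per-patient difference between the two items, the per-patient average, the mean difference (bias) and its standard deviation. The 95% limits of agreement (bias ± 1.96 SD) are the conventional Bland-Altman summary and are included here as an addition; the function name and the example scores are hypothetical and are not taken from the study data.

```python
from statistics import mean, stdev


def bland_altman(numbness, tingling):
    """Summarize agreement between two paired measurements.

    Returns the mean difference (bias), the SD of the differences,
    the conventional 95% limits of agreement, and the (average,
    difference) pairs that would be plotted.
    """
    if len(numbness) != len(tingling):
        raise ValueError("paired samples must have equal length")
    diffs = [n - t for n, t in zip(numbness, tingling)]
    avgs = [(n + t) / 2 for n, t in zip(numbness, tingling)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return {
        "bias": bias,
        "sd": sd,
        "limits": (bias - 1.96 * sd, bias + 1.96 * sd),
        "points": list(zip(avgs, diffs)),
    }


if __name__ == "__main__":
    # Hypothetical 0-100 scores for four patients.
    result = bland_altman([100, 80, 60, 90], [90, 80, 70, 100])
    print(result["bias"], round(result["sd"], 3))
```

A plot of `points` with horizontal lines at the bias and the two limits gives the usual Bland-Altman figure; differences scattered evenly around zero with no trend across the averages suggest that the two items agree.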
Both distribution-based and anchor-based approaches were used to derive estimates of Minimal Important Differences (MID), which has been defined as the smallest change in a patient-reported outcome that is perceived by patients as beneficial or that would result in a change in treatment [37,38]. The anchor-based approach relied upon SGIC related to QOL, physical condition and emotional status. The distribution-based approach in this analysis applied the 1/2 standard deviation method to determine the MID [39]. Only the two trials that collected SGIC at week 8, N00C3 and N01C3, were used to determine the MID estimates.
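The two families of MID estimate can be sketched as follows. The half-SD rule is as named in the text, although the choice of the baseline SD is an assumption here. The anchor-based function shows only one common convention (the mean absolute change among patients who rate themselves minimally improved or minimally worse on the anchor); the paper does not state which anchor categories were used, so that part, along with all function and parameter names, should be read as hypothetical.

```python
from statistics import mean, stdev


def mid_half_sd(baseline_scores):
    """Distribution-based MID: one half of the SD of the scores.

    Assumption: the SD is taken over baseline scores.
    """
    return 0.5 * stdev(baseline_scores)


def mid_anchor(changes, anchor_ratings,
               minimal=("minimally improved", "minimally worse")):
    """Anchor-based MID (one common convention, assumed here).

    Averages the absolute change score among patients whose anchor
    rating falls in a 'minimal change' category. Returns None when
    no patient falls in those categories.
    """
    selected = [abs(c) for c, a in zip(changes, anchor_ratings)
                if a in minimal]
    return mean(selected) if selected else None


if __name__ == "__main__":
    # Hypothetical data on the 0-100 scale.
    print(mid_half_sd([40, 60, 80]))
    print(mid_anchor(
        [10, -20, 30, 0],
        ["minimally improved", "minimally worse",
         "much improved", "no change"],
    ))
```

Because the two approaches answer different questions (spread of scores versus patient-perceived change), their estimates need not agree.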
Standardized Response Means (SRM), Effect Sizes (ES) and Guyatt's responsiveness statistic were used to measure responsiveness. The SRM was calculated as the mean change in scores divided by the standard deviation of the change scores [40]. The ES was calculated as the mean change in scores divided by the standard deviation of the baseline scores [41,42]. Guyatt's responsiveness statistic was calculated as the mean change in scores divided by the standard deviation of change of patients in the placebo group [43,44]. For these indices, small effects were considered higher than 0.20 but less than 0.50; moderate effects, higher than 0.50 but less than 0.80; and large effects, higher than 0.80 [45][46][47]. Data collection and statistical analyses were conducted by the Alliance Statistics and Data Center. Data were frozen as of 12/11/2014, and statistical analyses were performed using SAS version 9.2.
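The three responsiveness indices differ only in their denominators, which the following sketch makes explicit. It follows the definitions given above; the function names and example data are hypothetical, and the handling of values exactly equal to 0.20, 0.50 or 0.80 (assigned to the higher category) is an assumption, since the text defines the bands with strict inequalities.

```python
from statistics import mean, stdev


def responsiveness(baseline, follow_up, placebo_changes):
    """Compute SRM, ES and Guyatt's responsiveness statistic.

    All three share the numerator (mean change from baseline):
      SRM    = mean change / SD of change scores
      ES     = mean change / SD of baseline scores
      Guyatt = mean change / SD of change in the placebo group
    """
    changes = [f - b for b, f in zip(baseline, follow_up)]
    d = mean(changes)
    return {
        "SRM": d / stdev(changes),
        "ES": d / stdev(baseline),
        "Guyatt": d / stdev(placebo_changes),
    }


def label_effect(value):
    """Map an index to the small/moderate/large bands.

    Assumption: boundary values go to the higher band.
    """
    magnitude = abs(value)
    if magnitude >= 0.80:
        return "large"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.20:
        return "small"
    return "negligible"


if __name__ == "__main__":
    # Hypothetical 0-100 scores for four treated patients and
    # change scores for four placebo patients.
    stats = responsiveness([50, 60, 70, 80], [60, 60, 90, 90],
                           [0, 10, -10, 0])
    for name, value in stats.items():
        print(name, round(value, 3), label_effect(value))
```

Applied to indices of the size reported in this study, `label_effect(0.61)` falls in the moderate band and `label_effect(0.47)` in the small band.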

Results
Patient characteristics are reported in Table 2. The median age was 61 years (range, 19-88 years), 55% of patients were men, and 88% of patients were white. The proportion of patients reporting numbness (tingling) was 10.7% (10.0%) at baseline and 18.4% (17.8%) at last assessment. The proportion of patients reporting PN on either item was 11.4% at baseline and 20.3% at last assessment.
Patients with substantial PN at baseline (CDPN ≤ 50) reported, on average, 10 points lower overall QOL, SDS and POMS scores and 20 points lower scores on the BPI interference items (Table 3). Mean changes in these QOL assessments from baseline to last observation reflected a greater reduction in patients having at least a 20-point decrease in PN scores over time compared to those who did not. This was true for the SDS (-1.5 vs. 1.0, respectively, P < 0.001) and the BPI. The mean difference in decline in most BPI items between the two groups was greater than 10 points, suggesting a reasonable responsiveness of these QOL components to PN (Table 4).
The correlations between the two PN questions at baseline and last assessment were 0.81 (95% CI: 0.78-0.84) and 0.83 (95% CI: 0.81-0.86), respectively. The redundancy indices in canonical correlation analysis at baseline and last assessment were 0.85 and 0.79, respectively (both above the typical cutoff of 0.75 to indicate redundancy). The average difference generated by subtracting tingling from numbness was -0.55 at baseline (SD = 9.48, p = 0.01) and -0.25 at last assessment (SD = 12.52, p = 0.39). Strong agreement between the two PN items was observed, with a weighted kappa of 0.85 (95% CI: 0.83-0.88). The Bland-Altman plot in Figure 1 displays the differences between the numbness and tingling items versus the average of the two items. The plot indicates that differences are scattered equally on either side of zero and that there is no pattern in the differences relative to the degree of numbness or tingling reported (r = 0.08). The arrowhead shape at the extreme right and left sides of the figure only indicates the lack of extreme differences. Collectively, this indicates that the two neuropathy measures were highly correlated to the point of redundancy. Table 5 presents correlations between QOL assessments and both numbness and tingling at baseline.

Copyright: © 2015 Sloan et al.

Both PN symptoms were weakly correlated with overall QOL, SDS and POMS at baseline (Table 5), and changes in numbness and tingling were weakly correlated with changes in these measures (Table 6), indicating that patient-reported PN symptoms are not strongly associated with overall QOL, SDS and POMS, and are therefore measuring distinctly different constructs. Both PN symptoms were found to be moderately correlated with pain and physical function. Baseline and changes in numbness and tingling were found to be moderately correlated with changes in pain and physical function, suggesting that changes in PN are associated with changes in pain and physical activity functions.
One important feature of the PN items was their sensitivity to treatment. Increases and decreases in the PN items were related to the patient-rated improvement in physical status. The anchor-based MID estimates using the overall QOL, physical condition and emotional state items of the SGIC were 24, 21 and 25 points for numbness, respectively, and 19, 18 and 17 points for tingling, respectively. The distribution-based MID estimate was 11 points for numbness and 12 points for tingling. The correlations between the patient anchors and the neuropathy items were low (< 0.4), suggesting that these anchors may not be valid. Patients who experienced substantial PN reported significantly lower QOL and mood and worse symptom distress: the average difference was around 10 points, and the average difference in the BPI interference items was even higher (around 20 points), indicating a substantial association between neuropathy and BPI interference.
The overall ES, SRM and Guyatt's statistic after cycle 1 were 0.61, 0.47 and 0.77, respectively, for numbness and 0.55, 0.44 and 0.73, respectively, for tingling, indicating that the two PN symptoms are sensitive and responsive QOL instruments for cancer patients with PN.

Discussion
The primary finding of importance in this comparative study of alternative measures of neuropathy is that the two neuropathy measures, which assess numbness and tingling separately, were redundant. Hence, in designing future clinical trials to assess PN, these data support the hypothesis that a single item, either drawn from the numbness or tingling item or combining the two into one item, would be sufficient to produce valid and reliable data, although generalizability is restricted to the sort of patients studied and the type of clinical trials involved. The major advantage of single-item PN assessments is their simplicity in terms of administration, scoring, and interpretation, as has been demonstrated repeatedly [48]. The disadvantage of the two assessments is that they do not describe all symptoms of PN, such as neuropathic pain. As a screening tool in clinical practice or a stratification variable in clinical trials, the single-item assessment can be the trigger that launches a subsequent, more comprehensive investigation into uncovering the specific PN deficit and/or initiating appropriate clinical interventions [49].
A secondary finding more relevant to methodologists was that anchor-based MID methods for determining clinical significance produced larger estimates than the distribution-based methods. This is an important point because the relative merit of anchor-based and distribution-based approaches to clinical significance assessment is still being debated [50,51]. The findings are consistent with previous work indicating the choice of anchor can substantially impact MID estimates [52,53].
There are limitations to this evaluation of PN measures. The five trials in this pooled analysis included two CIPN treatment trials, one prevention trial testing interventions to reduce PN, and two chemotherapy treatment trials where CIPN was expected to be a prominent toxicity. Heterogeneity among the cancer patients and its relation to patient outcome may have influenced individual levels of CIPN, but because the construct under study was the intra-patient differences among the various measures of CIPN, each patient served as their own control. Furthermore, results remained consistent when each trial was independently analyzed. Replication of the psychometric properties will be studied further in ongoing and planned clinical trials. Another limitation arises because the responsiveness of the QOL scale depends on the follow-up time point and patients' initial health status. Three trials had different study-specific QOL collection schedules, so only two trials were used to derive the responsiveness statistics. Further, the mean PN assessments at baseline were high, indicating good health status in regard to PN. As such, it is difficult for the CIPN measure to detect further improvement [54,55]. This might have led to an underestimation of the responsiveness statistics in the study. Finally, we did not compare the CIPN measures with objective measures of nerve conduction, nor did we assess discriminant validity in this paper, as these aspects have been reported elsewhere for patient-reported CIPN measures [56].
Ultimately, the key finding of this pooled analysis is that the simple measures of CIPN demonstrate acceptable psychometric properties, including criterion validity and responsiveness, indicating that these simple measures of PN can be used successfully in cancer clinical trials. While there are other measures of CIPN available, they are all longer and more complicated, representing additional burden to both the patients and the clinical trial system. We are carrying out further research focusing on direct comparisons of alternative measures of CIPN to learn about the relative merits of the various assessment approaches.

Figure 1: Bland-Altman Plot for Numbness and Tingling at Baseline.

Table 1: List of Trials that Were Included in Neuropathy Pooled Analysis.

Table 3: Mean Baseline QOL Scores by Baseline Peripheral Neuropathy.

Table 4: Mean QOL Change from Baseline Score by 20 Points Worsening PN.

Table 6 presents the correlations between changes in QOL and changes in numbness and tingling. Both PN symptoms were weakly correlated with overall QOL, SDS and POMS. Changes in numbness and tingling were also weakly correlated with changes in overall QOL, SDS and POMS.

Comparing and Validating Simple Measures of Patient-Reported Peripheral Neuropathy for Oncology Clinical Trials: NCCTG N0897 (Alliance) A Pooled Analysis of 2440 Patients. SOJ Anesthesiol Pain Manag, 2(2): 1-9.

Table 5: Correlation between Baseline Numbness, Tingling and Baseline QOL items.

Table 6: Correlation between Change from Baseline in Numbness, Tingling and Change from Baseline in QOL items.

Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under Award Numbers U10CA180821 and U10CA180882 to the Alliance for Clinical Trials in Oncology. This work was also supported in part by Public Health Service grants CA-25224, CA-37404, CA-35431, CA-35415, CA-35103, and CA-35269. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.