Responsiveness of Physical Activity Measures Following Exercise Programs after Total Knee Arthroplasty

Background: Few instruments that measure physical activity (PA) can accurately quantify PA performed at light and moderate intensities, which is particularly relevant to older adults. Evidence for the responsiveness of these instruments after an intervention is limited. Objectives: To estimate and compare the responsiveness of two activity monitors and one questionnaire in assessing PA after an intervention following total knee arthroplasty. Methods: This one-group pretest-posttest, repeated-measures study analyzed changes in duration of daily PA and the standardized response mean (SRM) to assess internal responsiveness, which was compared across instruments. Correlations between changes in PA measured by the proposed instruments and the global rating of change were used to test external responsiveness. Agreement between PA instruments on identifying individuals who changed their PA based on measurement error was assessed using weighted Kappa (K). Results: Thirty subjects, mean age 67 (6) years and 73% female, were analyzed. Changes in PA measured by each instrument were small (p>0.05), resulting in a small degree of responsiveness (SRM<0.30). Global rating of change scores did not correlate with changes in PA (rho=0.13-0.28, p>0.05). The activity monitors agreed on identifying changes in moderate-intensity PA (K=0.60) and number of steps (K=0.63), but did not agree with scores from the questionnaire (K≤0.22). Conclusion: Analyzing group-based changes in PA is challenging due to high variability in the outcome. Investigating changes in PA at the individual level may be a more viable alternative.


Background
The benefits of regular physical activity (PA) to improve general health are well known.[1] Individuals who undergo total knee arthroplasty (TKA) for end-stage knee osteoarthritis are typically older adults who are less active than their healthier counterparts [2,3] and, therefore, more susceptible to comorbidities.[4] Interventions to increase PA and prevent comorbidities in individuals after TKA have recently been developed.[5,6] To test the effectiveness of these interventions, researchers need measurement tools that are responsive to changes in PA behavior. However, information on the responsiveness of PA measurement tools is limited.
Responsiveness evaluates the ability of a measurement tool to accurately detect changes in the concept being measured when change has occurred, and can be determined by internal and external responsiveness methods.[7][8][9] There are numerous PA measurement tools available, but evidence of their responsiveness to change is limited. Two activity monitors (Actigraph [ACT; Actigraph LLC, Pensacola, FL] and Sensewear Armband [SWA; BodyMedia, Pittsburgh, PA]) and one self-reported questionnaire (Community Healthy Activities Model Program for Seniors [CHAMPS]) have been validated in older adults, including those with knee osteoarthritis,[10][11][12] and are commonly used in research in this population.[13][14][15] These three instruments have the advantage of measuring light-intensity PA (i.e., household chores and slow walking), which is the type of activity mostly performed by older adults who undergo TKA.[16][17][18][19][20][21] While activity monitors are costly and need several days of data collection, questionnaires have low cost and take 15-20 minutes to complete. In contrast, the monitors measure PA in real time, which eliminates the recall bias commonly seen in self-report PA measures.[21,22]
To our knowledge, no studies have determined the responsiveness of the SWA, and the few studies on the responsiveness of the ACT and CHAMPS did not include older adults with mobility problems such as those undergoing TKA.[12][23][24][25] Additionally, studies have not concurrently compared responsiveness across these three measurement tools, which will provide information

Study Protocol
Subjects attended two testing visits: one prior to the rehabilitation program (baseline), and one at the end of the program (6-month follow-up). At baseline, subjects completed questionnaires on demographics, pain (11-point numeric pain scale) and physical function (17-item Western Ontario and McMaster Universities Osteoarthritis Index-Physical Function subscale). Subjects' height and weight were also measured. At the end of the testing session, subjects were fitted with the ACT and SWA and instructed to wear the monitors for 7 days, except during water activities because the monitors are not waterproof. After the 7-day period, subjects returned to our facility so that data could be downloaded from the monitors and to complete the CHAMPS. The CHAMPS questionnaire queries PA participation in the past week, which corresponds to the period in which the monitors collected data. Only subjects who had useful PA data from the monitors (i.e., ≥5 days with 10 hours of PA data/day) [26,27] were analyzed.
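The inclusion rule for monitor data stated above can be written as a simple check. The following is a minimal sketch only: the function name and the input format (a list of daily wear hours) are hypothetical, and treating "10 hours" as "at least 10 hours" is an assumption, not a description of the study's actual data processing.

```python
def has_valid_monitor_data(daily_wear_hours, min_days=5, min_hours=10.0):
    """Return True if enough days reach the minimum wear time.

    daily_wear_hours: hypothetical list with one wear-time value (hours) per day.
    Defaults mirror the criterion in the text (>=5 days with 10 h of data/day);
    reading "10 hours" as ">= 10 hours" is an assumption.
    """
    valid_days = sum(1 for hours in daily_wear_hours if hours >= min_hours)
    return valid_days >= min_days


# Illustrative (made-up) week of wear times, in hours per day
example_week = [15.0, 14.5, 9.0, 12.0, 10.0, 8.5, 16.0]
print(has_valid_monitor_data(example_week))
```

Keeping the thresholds as parameters makes it easy to check how sensitive the analyzed sample is to the wear-time rule.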
After completing PA measures, subjects were randomized into two exercise groups.[6] Both exercise programs included endurance and strengthening exercises for the lower extremities. The experimental group performed more intense training than the control group and also received balance and functional exercises along with PA promotion. Exercise programs consisted of 12 supervised sessions delivered within 3 months, followed by a home-exercise program for another 3 months. For the purpose of this ancillary study, data from both groups were combined to provide a wide range of change in PA to test responsiveness. PA measures were repeated at follow-up. Additionally, subjects rated their perceived changes in PA from baseline to follow-up, using a modified global rating of change scale that consists of 15 points ranging from -7 ("a very great deal less active") to 0 ("about the same") to +7 ("a very great deal more active"). Subjects were classified as 'more active' if ratings were +3 ("somewhat more active") or higher. Subjects were classified as 'not changed' if ratings were between +2 ("a little bit more active") and -2 ("a little bit less active"). Subjects were classified as 'less active' if ratings were -3 ("somewhat less active") or lower.
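The three-way classification of global rating of change scores described above can be sketched as follows. The function name is hypothetical; the thresholds are those given in the text.

```python
def classify_global_rating(rating):
    """Classify a modified global rating of change score (-7 to +7).

    Thresholds follow the text: +3 or higher is 'more active',
    -3 or lower is 'less active', and -2 to +2 is 'not changed'.
    """
    if not -7 <= rating <= 7:
        raise ValueError("rating must be between -7 and +7")
    if rating >= 3:
        return "more active"
    if rating <= -3:
        return "less active"
    return "not changed"


# Illustrative ratings only, not study data
for score in (-7, -2, 0, 2, 3):
    print(score, classify_global_rating(score))
```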

Measures of Physical Activity
The ACT is a small triaxial accelerometer-based monitor worn around the waist at the hip-bone level, over the right anterior superior iliac spine. The ACT GT3X and the ActiLife 5 software (Actigraph LLC, Pensacola, FL) were used. This device generates data on activity counts per minute (CPM) and number of steps. Duration of daily activities in minutes per day (min/day) was categorized by the software using the following CPM cut-points: 760-1951 CPM for light, 1952-5724 CPM for moderate, and >5724 CPM for vigorous intensities.[28] The software calculates non-wear periods following the manufacturer's recommendation; these periods were also visually inspected. The ACT has demonstrated moderate accuracy in comparison to doubly labeled water (reference standard) to measure PA in older adults and good reliability in individuals after TKA.[10,29]
The SWA is a small multi-sensor device worn on the right arm over the triceps muscle at the midpoint between shoulder and elbow. The SWA Pro-3 and the Professional software v6.1 (BodyMedia, Pittsburgh, PA) were used. Information from a biaxial accelerometer, heat flux, galvanic signal (i.e., sweat rate) and skin temperature is integrated and processed by the software using proprietary algorithms that account for subjects' characteristics (gender, age, height and weight). The SWA was set to provide data on duration of PA (min/day) in light- (2-2.9 metabolic equivalents [METs]), moderate- (3-5.9 METs) and vigorous-intensity (≥6 METs) categories, as well as number of steps. The SWA turns off when not in use, enabling the software to recognize non-wear periods. Data were also visually screened for non-wear periods. The SWA showed good accuracy compared to doubly labeled water to measure PA in older adults and good reliability in individuals after TKA.[10,29]
CHAMPS is a self-reported questionnaire that queries type, frequency and duration of 41 activities usually performed by older adults. Duration in hours per week (hr/week) of each activity is selected from a range of <1 hr/week to ≥9 hr/week and categorized in two intensity levels according to the CHAMPS activities codebook.[12] Categories are light-to-vigorous PA (≥2 METs) and moderate-to-vigorous PA (≥3 METs). A light-intensity category was computed (i.e., light-to-vigorous PA minus moderate-to-vigorous PA) to allow direct comparison with the activity monitors. CHAMPS scores in hr/week were converted into min/day. The questionnaire showed a small but significant association with doubly labeled water to measure PA in older adults, and moderate reliability in people with musculoskeletal disorders and healthy older adults.[10,29,30]
Programs after Total Knee Arthroplasty. J Exerc Sports Orthop 4(3): 1-8. DOI: http://dx.doi.org/10.15226/2374-6904/4/3/00164

Copyright: © 2017 Almeida GJ, et al.
The main outcome measure of this study was PA duration in min/day estimated by the ACT, SWA and CHAMPS during waking hours (i.e., from out of bed in the morning to back to bed at night). Measures of moderate- and vigorous-intensity activities were combined into the moderate category since our sample engaged in negligible amounts of vigorous-intensity activities (<1 min/day). Therefore, the PA intensity categories compared across the three instruments in this study were light, moderate, and light-to-moderate (combination of light and moderate intensities). Number of steps was compared between the ACT and SWA only.
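The intensity categories used in this section can be illustrated with a short sketch. It is not the ActiLife or SWA software logic: the function names are hypothetical, the label for counts below 760 CPM is an assumption (the text does not name that range), and the unit conversion simply divides weekly minutes by 7 days.

```python
def actigraph_intensity(cpm):
    """Map ACT counts per minute (CPM) to an intensity label.

    Cut-points from the text: 760-1951 light, 1952-5724 moderate, >5724 vigorous.
    The label for values below 760 CPM is an assumption.
    """
    if cpm > 5724:
        return "vigorous"
    if cpm >= 1952:
        return "moderate"
    if cpm >= 760:
        return "light"
    return "below light"


def champs_min_per_day(hours_per_week):
    """Convert a CHAMPS duration from hr/week to min/day."""
    return hours_per_week * 60 / 7


def champs_light(light_to_vigorous, moderate_to_vigorous):
    """Light-intensity PA = light-to-vigorous minus moderate-to-vigorous."""
    return light_to_vigorous - moderate_to_vigorous


def study_categories(light, moderate, vigorous):
    """Fold vigorous into moderate and derive light-to-moderate (min/day)."""
    combined_moderate = moderate + vigorous
    return {
        "light": light,
        "moderate": combined_moderate,
        "light_to_moderate": light + combined_moderate,
    }


# Illustrative (made-up) values in min/day
print(study_categories(light=100, moderate=20, vigorous=1))
```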

Data Analysis
Descriptive statistics for continuous variables included mean (SD) or median (25th-75th percentiles), according to data distribution. Counts and frequencies were used for categorical variables. Histograms depicting the magnitude of changes in PA from pre- to post-intervention were used to visually assess changes in PA. Internal responsiveness (group-based) was estimated by calculating the magnitude of changes in PA and its 95% confidence intervals, as well as the standardized response mean (SRM), which is an index of responsiveness. The SRM is the ratio of the mean change to the SD of change scores.[31] The SRM is interpreted as a trivial (<0.20), small (0.20-0.49), moderate (0.50-0.79) or high (≥0.80) degree of responsiveness.[31] Confidence intervals around the SRM were calculated using the bias-corrected and accelerated bootstrap.[32]
To estimate external responsiveness (individual-based), we first explored whether the modified global rating of change was suitable as an external anchor to capture perceived changes in PA, i.e., correlations between its scores and changes measured by the PA instruments should be at least moderate (≥0.30). Pearson or Spearman rho correlation coefficients were used according to data distribution. If global rating of change and changes in PA measured by the three instruments were associated, then cut-offs of clinically important change were derived.[9]
To compare responsiveness across the PA instruments, one-way repeated-measures Analysis of Variance (ANOVA) was performed for each PA intensity category to determine whether the magnitudes of change obtained from the three instruments were statistically different from each other. The change in PA at each intensity category measured by each instrument was used as the repeated factor. If ANOVAs indicated significant differences, post-hoc comparisons between PA measures were performed applying Bonferroni adjustments, with the alpha level set at 0.016 (0.05/3, for three pairwise comparisons) to account for multiple comparisons. Bland and Altman plots were used to visually compare change scores across instruments at each intensity category.[33] Internal responsiveness was also compared across instruments by examining the 95% confidence intervals around the SRMs. Non-overlapping confidence intervals indicate that responsiveness between instruments was significantly different.[34]
Additionally, weighted Kappa (linear weighting scheme) was used to investigate the agreement between instruments on identifying subjects who were less, more, or similarly active after the intervention, based on the previously published standard error of the measurements.[29] Values obtained from weighted Kappa are interpreted as poor (<0.20), fair (0.21-0.40), moderate (0.41-0.60), good (0.61-0.80) and very good (0.81-1.00).[31] Analyses were performed using IBM SPSS Statistics 21 (IBM Corporation, Armonk, NY) and Microsoft Excel 2010 (Microsoft Corporation, Redmond, WA).
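The SRM defined in this section (mean change divided by the SD of change scores) and its interpretation bands can be sketched as below. Note one deliberate simplification: the confidence interval here uses a plain percentile bootstrap, not the bias-corrected and accelerated (BCa) bootstrap named in the text; a library routine such as SciPy's `scipy.stats.bootstrap`, which offers a BCa method, would be closer to that approach. Function names and the example data are hypothetical, and the analyses in the study were run in SPSS and Excel, not with this code.

```python
import random
import statistics


def srm(changes):
    """Standardized response mean: mean change / SD of change scores."""
    return statistics.mean(changes) / statistics.stdev(changes)


def interpret_srm(value):
    """Label an SRM using the bands given in the text (absolute value)."""
    size = abs(value)
    if size < 0.20:
        return "trivial"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "moderate"
    return "high"


def percentile_bootstrap_ci(changes, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the SRM (a simplification, NOT BCa)."""
    if statistics.stdev(changes) == 0:
        raise ValueError("SRM is undefined when all change scores are equal")
    rng = random.Random(seed)
    n = len(changes)
    estimates = []
    while len(estimates) < n_boot:
        sample = [rng.choice(changes) for _ in range(n)]
        if statistics.stdev(sample) == 0:
            continue  # skip degenerate resamples where the SRM is undefined
        estimates.append(srm(sample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up change scores in min/day (follow-up minus baseline), not study data
example_changes = [-30, -12, -5, 0, 4, 9, 15, 40]
value = srm(example_changes)
print(round(value, 2), interpret_srm(value))
print(percentile_bootstrap_ci(example_changes))
```

Because the SD of change scores is the denominator, wide between-subject variability in change shrinks the SRM even when the mean change is not zero.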

Results
Forty-two subjects completed the randomized trial, of which 30 had complete data in the three PA instruments and were included in the responsiveness analyses. Among the 12 subjects not entered in the analyses, 3 refused to wear the monitors at follow-up, and the remaining 9 subjects had no data in one of the devices due to equipment failure. The demographic and biomedical characteristics of subjects included in and excluded from the responsiveness analyses were similar (Table 1). Subjects included in the analyses were on average 67 (6) years old, predominantly female (73.3%) and obese (body mass index = 30 (4) kg/m²). Monitor wear time was similar at baseline and follow-up: 15 (2) hr/day. Figure 1 depicts the distribution of changes in PA from baseline to follow-up. The graphs indicate that the number of subjects who became less versus more active was similar. For example, using zero as a threshold for no change at all, measures from the ACT in light-to-moderate PA showed that 18 subjects became less active and 12 more active. As per the SWA, 14 subjects became less active and 16 more active, whereas CHAMPS scores indicated that 16 subjects became less active and 14 more active. By visually analyzing the number of subjects who changed beyond a magnitude that most clinicians would agree to be an important change (i.e., ~20 min/day in light-to-moderate PA), measures from the ACT indicated that 10 subjects became more active, 9 less active and 11 did not change; as per the SWA, 11 subjects became more active, 13 less active and 6 did not change; and based on CHAMPS, 7 subjects became more active, 11 less active and 12 did not change.
In terms of external responsiveness, while PA measured by the ACT, SWA and CHAMPS revealed similar numbers of subjects who became less or more active at follow-up (Figure 1), none of the subjects reported being less active based on the global rating of change in PA scale. According to the scale, 7 subjects (23.3%) had no change in activity level at follow-up and the remaining subjects (76.7%) were more active. Consequently, the associations between self-rated and measured changes in PA were below the threshold of rho≥0.30 (rho=0.13 to 0.28, p>0.05), which precluded calculation of cut-offs of clinically important change.
The comparison of internal responsiveness between PA instruments indicated that the magnitude of changes in PA was not different across PA categories (p≥0.12, data not shown). Moreover, the 95% confidence intervals of the SRMs largely overlapped across instruments (Table 2). The Bland-Altman plots also indicated that internal responsiveness was similar across PA instruments (Figure 2): the line of equality (zero) was contained within the lines of the 95% confidence interval of the difference between changes in PA scores measured by each instrument across intensity categories.
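The Bland-Altman comparison of change scores can be sketched as follows. This is a minimal illustration that assumes the plotted lines are the conventional 95% limits of agreement (mean difference ± 1.96 SD of the differences); the text does not state exactly how the lines were computed. The function name and the data are hypothetical.

```python
import statistics


def bland_altman(changes_a, changes_b):
    """Return (bias, lower limit, upper limit) for paired change scores.

    Assumes conventional 95% limits of agreement: bias +/- 1.96 * SD
    of the paired differences.
    """
    if len(changes_a) != len(changes_b):
        raise ValueError("both instruments need one change score per subject")
    diffs = [a - b for a, b in zip(changes_a, changes_b)]
    bias = statistics.mean(diffs)
    spread = 1.96 * statistics.stdev(diffs)
    return bias, bias - spread, bias + spread


# Made-up change scores (min/day) for two instruments, not study data
act_changes = [-20, 5, 12, -3, 30, -8]
swa_changes = [-10, 0, 20, -6, 18, -15]
bias, lower, upper = bland_altman(act_changes, swa_changes)
print(round(bias, 1), round(lower, 1), round(upper, 1))
print("zero within limits:", lower <= 0 <= upper)
```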
Table 2: Duration of physical activity in min/day measured by the ACT, SWA and CHAMPS questionnaire, the magnitude of change scores between baseline and follow-up, and the standardized response mean. Data represent mean (standard deviation) from an n of 30 unless otherwise indicated.

Using measurement error as a threshold for changes in PA, weighted Kappa (K) indicated moderate agreement between ACT-SWA on identifying subjects who changed their PA beyond the standard error of the measurement in moderate-intensity PA (K=0.60), good agreement in number of steps (K=0.63) and fair agreement in light-to-moderate PA (K=0.36), all of which were statistically significant. Agreement was fair between ACT-SWA in measures of light-intensity PA (K=0.25) as well as between ACT-CHAMPS in measures of light-to-moderate PA (K=0.22), neither of which was statistically significant. There was poor agreement between CHAMPS and the activity monitors in measures of light-to-moderate PA (K≤0.07) and light-intensity PA (K≤0.12). Agreement between CHAMPS-SWA was also poor in moderate-intensity PA (K=0.12).
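The agreement analysis reported above has two steps: classify each subject as less active, unchanged, or more active using the standard error of the measurement (SEM) as a threshold, then compute a linearly weighted Kappa between two instruments. The sketch below illustrates both steps under stated assumptions: the function names are hypothetical, the SEM value in the example is made up (the study took its values from a prior publication), and the study itself computed Kappa in SPSS.

```python
def classify_change(change, sem):
    """Code a change score as 0 (less), 1 (unchanged) or 2 (more active).

    A change counts only if it exceeds the SEM in either direction.
    """
    if change > sem:
        return 2
    if change < -sem:
        return 0
    return 1


def linear_weighted_kappa(ratings_a, ratings_b, n_categories=3):
    """Cohen's Kappa with linear weights w_ij = 1 - |i - j| / (k - 1)."""
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("need two non-empty rating lists of equal length")
    k = n_categories
    # Joint proportions of each (a, b) category pair
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    p_observed = 0.0
    p_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = 1.0 - abs(i - j) / (k - 1)
            p_observed += weight * observed[i][j]
            p_expected += weight * row[i] * col[j]
    if p_expected == 1.0:
        raise ValueError("Kappa is undefined when expected agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Made-up change scores (min/day) and a made-up SEM of 20 min/day
sem = 20
act = [classify_change(c, sem) for c in [-35, 5, 28, -3, 40, -25]]
swa = [classify_change(c, sem) for c in [-22, 10, 15, -30, 33, -27]]
print(round(linear_weighted_kappa(act, swa), 2))
```

Linear weights give partial credit when two instruments disagree by one category (for example, unchanged versus more active) and none when they disagree by two.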

Discussion
This is the first study to estimate and compare the responsiveness of the ACT, SWA, and CHAMPS in assessing changes in light-to-moderate PA after rehabilitation following TKA. We observed that the high variability inherent in measures of PA is problematic and limits the utility of any responsiveness index (i.e., SRM and effect sizes) based on group changes. The inability to detect changes over time at the group level in this study seemed to be due to large variability at the subject level rather than pre-post intervention. Another important finding was that subjects' perception of changes in activity participation appears to be limited, as indicated by the poor associations between self-rated and measured changes in PA. Study results also demonstrate that using a threshold to identify changes in PA at the individual level may be useful.
Results from our study agree with prior studies that assessed the responsiveness of the ACT and CHAMPS and reported a trivial to small degree of responsiveness in those PA instruments.[23][24][25][35] Responsiveness indices from studies on measures from the ACT ranged from 0.18 to 0.36 in subjects with type-2 diabetes,[23] non-working older adults,[24] and sedentary healthy adults.[25] One study on the CHAMPS questionnaire in healthy older adults found responsiveness indices of 0.33 and 0.37 in moderate-intensity PA and light-to-moderate PA, respectively.[35] The small responsiveness of PA measures across the studies not only highlights the difficulty of changing PA behavior, but also supports the notion that assessment of changes in PA over time at the group level may not be adequate.
In this study, we attempted to assess external responsiveness using a modified global rating of change scale as an external anchor. However, associations were poor between the scale and the changes measured by the PA instruments. The lack of association between self-rated and measured changes in PA may be partially due to social desirability bias.[36] When completing the global rating of change scale, subjects might have answered the question in a positive way, since none of them reported being less active. Additionally, the poor associations may also be a result of the difficulty that subjects have in judging changes in the amount of activity after a long period of time (baseline to follow-up). This perception of changes in PA may be particularly difficult for older adults with knee osteoarthritis because they usually engage in light-intensity PA, which is very difficult to recall.[37] We are not aware of any studies that attempted to use external responsiveness methods to investigate the sensitivity to change of the ACT, SWA or CHAMPS.
A practical step to identify changes in PA at the individual level is the use of the standard error of the measurement as a threshold, which might be appropriate in situations where patient-perceived changes are unavailable.[38] While measurement error has been used as a threshold in studies investigating changes in physical function,[39,40] this is the first study to discuss the usability of this method to identify changes in PA. This approach allowed us to test the agreement between PA instruments in classifying individuals who did or did not change their PA. While the ACT and SWA generally agreed on classifying those who changed, they disagreed with change scores from CHAMPS. This discrepancy may have been due to limitations inherent to PA questionnaires, such as recall bias.[21,22]
This study is not without weaknesses. Although the study had a small sample size, it is unlikely that the non-significant results for internal responsiveness were due to type-II error because the changes in PA were all very small. These small changes were likely the result of difficulties in changing subjects' lifestyles along with high variability in change scores among subjects.

Conclusion
In conclusion, our study showed that the high variability in change scores resulted in a small degree of responsiveness of PA measured by the ACT, SWA and CHAMPS, which calls into question the assessment of change in PA at the group level. The ACT and SWA seemed to agree on detecting changes in PA using measurement error as a threshold. Therefore, when investigating changes in PA behavior in individuals with arthritis of the lower extremities, clinicians and researchers should consider interpreting their results based on changes at the individual level rather than at the group level.

Figure 1: Changes in physical activity (PA) duration measured by the Actigraph (ACT), Sensewear Armband (SWA) and Community Healthy Activities Model Program for Seniors questionnaire (CHAMPS) pre to post intervention across the three PA intensity categories, and number of steps (ACT and SWA only). Numbers on the Y-axis represent frequency and numbers on the X-axis represent PA duration in min/day or daily number of steps. N/A: not applicable since CHAMPS does not measure number of steps.

Figure 2: Bland and Altman plots of differences between change scores across physical activity categories. PA: physical activity; ACT: Actigraph; SWA: Sensewear Armband; CHAMPS: Community Healthy Activities Model Program for Seniors questionnaire; N/A: not applicable since CHAMPS does not measure number of steps.

Table 1: Baseline characteristics of study sample. Data represent mean (SD) or frequency (%), unless otherwise indicated.

Table 3: Number of subjects who were less active, more active, or did not change based on the standard error of the measurement, and weighted Kappa between measures of PA from the ACT, SWA and the CHAMPS questionnaire.