Comparison between Hounsfield unit value and vertebral bone quality score for adjacent vertebral fracture risk assessment after balloon kyphoplasty: a propensity score matching study
Abstract
Study Design
A retrospective study.
Purpose
To compare the predictive utility between Hounsfield unit (HU) values and vertebral bone quality (VBQ) scores for adjacent vertebral fracture (AVF) risk after balloon kyphoplasty (BKP) and to identify the appropriate measurement site.
Overview of Literature
HU and VBQ have emerged as novel bone strength assessment methods. However, no study has compared the efficacy of these methods for evaluating AVF risk.
Methods
This single-center study included 130 patients with osteoporotic vertebral fractures who underwent BKP with preoperative computed tomography and magnetic resonance imaging. Patients were classified by the presence or absence of AVF, and after propensity score matching for age; sex; body mass index; fracture level; use of steroids, teriparatide, or osteoporosis medication; and previous AVF, the AVF (−) and AVF (+) groups each included 34 patients. Bone strength was assessed using L1 HU, L1–4 HU (mean HU of L1–L4), L1 VBQ, and L1–4 VBQ. Group differences were analyzed, and predictive accuracy for AVF was evaluated using the area under the receiver operating characteristic curve (AUC).
Results
L1 HU was significantly lower in the AVF (+) group than in the AVF (−) group (71.6±21.4 vs. 92.1±29.4, p=0.013). No significant differences between the groups were observed for L1–4 HU, L1 VBQ, or L1–4 VBQ. L1 HU had the highest AUC (0.657), compared with 0.625 for L1–4 HU, 0.524 for L1 VBQ, and 0.523 for L1–4 VBQ. For both HU and VBQ, predictive accuracy was higher with L1 measurement than with L1–4 measurement.
Conclusions
HU was superior to VBQ in predicting AVF risk after BKP, with L1 HU being the most effective indicator of bone strength and AVF risk.
Introduction
Dual-energy X-ray absorptiometry (DEXA) is widely used for bone strength assessment, but its accuracy is known to decline in the presence of degenerative disease, scoliosis, or vascular calcification [1,2]. To address these limitations, the Hounsfield unit (HU) value and the vertebral bone quality (VBQ) score have recently gained attention as novel methods of bone strength assessment. HU is measured on computed tomography (CT) and was first reported to correlate with DEXA by Pickhardt et al. [3] in 2011. VBQ is derived from magnetic resonance imaging (MRI) and was shown to correlate with DEXA by Ehresman et al. [4] in 2020. Since then, HU and VBQ have been studied in relation to complications after spinal fusion surgery, such as screw loosening [5,6], cage subsidence [2,7], and proximal junctional kyphosis [1,8]. These studies have further established HU and VBQ as reliable bone strength assessment tools in spinal surgery.
Balloon kyphoplasty (BKP) is a widely performed surgical treatment for osteoporotic vertebral fractures (OVF) with demonstrated favorable outcomes [9,10]. However, adjacent vertebral fracture (AVF) after BKP is a significant complication [11], and the effectiveness and reliability of DEXA for evaluating AVF risk remain controversial [11,12]. Consequently, the optimal method for assessing bone strength in patients undergoing BKP remains unclear. Matsumoto et al. [13] reported that HU was superior to DEXA in assessing AVF risk after BKP. However, to the best of our knowledge, no study has compared the efficacy of HU and VBQ for evaluating AVF risk. Furthermore, the appropriate site for measuring HU and VBQ in AVF risk assessment has yet to be determined.
This study aimed to compare the utility of HU and VBQ for AVF risk assessment after BKP and to investigate how the choice of measurement site affects the evaluation.
Materials and Methods
This study was approved by the institutional review board of the Sonoda Medical Institute Tokyo Spine Center (202002-1). Written informed consent was obtained from the patients.
Patient selection
In this study, we selected 308 patients with OVF treated with BKP at a single institution between January 2011 and May 2023.
The inclusion criteria were as follows: (1) age ≥60 years; (2) fracture caused by minor trauma; (3) follow-up of more than 2 months; (4) preoperative CT performed on the same scanner; (5) preoperative MRI performed on the same scanner; and (6) BKP performed at a single vertebral level.
The exclusion criteria were as follows: (1) neurological impairment; (2) pathological fractures; (3) history of spinal surgery; (4) diffuse idiopathic skeletal hyperostosis; (5) fractures at all three of T12, L1, and L2; and (6) double-level fractures.
Of the 308 patients, 130 met the abovementioned criteria. These patients were divided into two groups based on the presence (n=35) or absence (n=95) of AVF within 2 months postoperatively. Propensity score matching was performed according to age; sex; body mass index; fracture level; use of steroids, teriparatide, or osteoporosis medications; and previous AVF to establish the AVF (−) and AVF (+) groups, each of which included 34 patients [11].
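The matching step can be illustrated with a minimal sketch. It is not the procedure EZR implements: the matching algorithm is not reported, so greedy 1:1 nearest-neighbour matching without replacement on the logit of a logistic-regression propensity score, with a caliper of 0.2 standard deviations of the logit, is assumed here. The covariate matrix (age, sex, body mass index, fracture level, medication use, and previous AVF, encoded numerically) and the random demo data are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Logistic regression fitted by Newton-Raphson; returns [intercept, coefs...]."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xd @ beta, -30, 30)))
        w = p * (1.0 - p)
        # Hessian of the negative log-likelihood (X^T W X), lightly regularized
        hessian = Xd.T @ (Xd * w[:, None]) + ridge * np.eye(len(beta))
        beta += np.linalg.solve(hessian, Xd.T @ (y - p))
    return beta


def match_one_to_one(X, treated, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbour matching on the logit of the propensity
    score, without replacement, within caliper_sd * SD(logit) (assumed rule)."""
    X = np.asarray(X, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    beta = fit_logistic(X, treated.astype(float))
    logit = np.column_stack([np.ones(len(X)), X]) @ beta
    caliper = caliper_sd * logit.std()
    available = [int(j) for j in np.flatnonzero(~treated)]  # unmatched controls
    pairs = []
    for i in np.flatnonzero(treated):
        if not available:
            break
        dist = np.abs(logit[available] - logit[i])
        k = int(np.argmin(dist))
        if dist[k] <= caliper:
            pairs.append((int(i), available.pop(k)))  # control used only once
    return pairs


# Hypothetical demo: 130 patients, 35 "AVF (+)", 3 standardized covariates
rng = np.random.default_rng(0)
X = rng.normal(size=(130, 3))
treated = np.zeros(130, dtype=bool)
treated[:35] = True
pairs = match_one_to_one(X, treated)
print(len(pairs), "matched pairs")
```

Greedy matching is order dependent; optimal (global) matching is an alternative when the control pool is small.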
Bone strength assessment parameters and predictive analysis
Bone strength evaluation parameters, including L1 HU (HU of L1 vertebra) [14], L1–4 HU (average HU of L1–L4 vertebrae), L1 VBQ (VBQ of L1 vertebra), and L1–4 VBQ (average VBQ of L1–L4 vertebrae), were compared between the AVF (−) and AVF (+) groups. The predictive accuracy for AVF occurrence was evaluated using area under the receiver operating characteristic curve (AUC).
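Because lower HU indicates weaker bone, the AUC for HU can be computed directly as a Mann–Whitney probability with lower values treated as higher risk. The sketch below assumes that orientation; for VBQ, where higher scores indicate poorer bone quality, the inputs would be negated. The function name and sample values are illustrative and are not the study data.

```python
def auc_lower_is_riskier(values_pos, values_neg):
    """Empirical AUC when LOWER values indicate higher risk (e.g., HU for AVF).

    Equals the probability that a randomly chosen AVF (+) patient has a lower
    value than a randomly chosen AVF (-) patient, counting ties as 0.5.
    """
    wins = 0.0
    for a in values_pos:
        for b in values_neg:
            if a < b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(values_pos) * len(values_neg))


# Hypothetical HU values, not study data
print(auc_lower_is_riskier([60, 70, 80], [75, 90, 100]))  # 8 of 9 pairs, about 0.889
```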
Image evaluation
The diagnosis of OVF was based on the presence of high signal intensity bone marrow edema on MRI short tau inversion recovery sequences in patients with low back pain. X-ray imaging was performed preoperatively; immediately postoperatively; at 1, 2, 6, and 12 months postoperatively; and each time there was a recurrence of low back pain after surgery.
HU assessment
All patients underwent preoperative CT on the same scanner (Aquilion CX system; Canon Medical Systems, Otawara, Japan) at a single institution. HU values were measured on a picture archiving and communication system, using axial images through the center of the vertebral body. To avoid areas of nonuniform bone, such as venous plexuses and sclerotic lesions, measurements were taken from as uniform a region of trabecular bone as possible. L1 HU was measured on the axial image through the center of the L1 vertebral body (Fig. 1) [14]. In patients with a new or previous fracture of L1, the mean HU of the T12 and L2 vertebrae was used as a substitute for L1 HU. In those with fractures of both L1 and T12, the HU of L2 was used, and in those with fractures of both L1 and L2, the HU of T12 was used. Patients with fractures of all three vertebrae (L1, T12, and L2) were excluded from the study [14]. L1–4 HU was defined as the mean HU of L1, L2, L3, and L4; fractured vertebrae within L1–L4 were excluded from this average. Each measurement was performed three times, and intraexaminer reliability was evaluated.
Fig. 1. (A, B) Hounsfield unit (HU) value assessment. HU was measured on axial computed tomography images through the center of the vertebral body. To avoid areas of nonuniform bone, such as venous plexuses or sclerotic lesions, measurements were taken from as uniform a region of trabecular bone as possible. L1 HU was measured on the axial image through the center of the L1 vertebral body. L1–4 HU was defined as the mean HU of L1, L2, L3, and L4.
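The substitution rule for L1 HU and the averaging rule for L1–4 HU can be expressed as a small helper. This sketch assumes that each vertebra's HU has already been measured (for example, as the mean of the three repeated readings) and stored in a dictionary keyed by level; the function names and values are hypothetical.

```python
from statistics import mean


def l1_measurement_levels(fractured):
    """Vertebrae whose HU stands in for L1 HU under the substitution rule.

    L1 intact -> [L1]; L1 fractured -> mean of intact T12/L2;
    returns [] when L1, T12, and L2 are all fractured (patient excluded).
    """
    if "L1" not in fractured:
        return ["L1"]
    return [lv for lv in ("T12", "L2") if lv not in fractured]


def l1_hu(hu, fractured):
    levels = l1_measurement_levels(fractured)
    return mean([hu[lv] for lv in levels]) if levels else None


def l1_4_hu(hu, fractured):
    """Mean HU of the intact vertebrae among L1-L4."""
    levels = [lv for lv in ("L1", "L2", "L3", "L4") if lv not in fractured]
    return mean([hu[lv] for lv in levels]) if levels else None


# Hypothetical HU values per vertebra
hu = {"T12": 80.0, "L1": 70.0, "L2": 90.0, "L3": 100.0, "L4": 110.0}
print(l1_hu(hu, {"L1"}))    # 85.0, mean of T12 and L2
print(l1_4_hu(hu, {"L3"}))  # 90.0, mean of L1, L2, and L4
```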
VBQ assessment
All patients underwent preoperative MRI on the same scanner (Vantage Titan MRT-2004/N4 1.5T; Canon Medical Systems) at a single institution. VBQ was measured on the midline sagittal T1-weighted image of the lumbar spine. L1–4 VBQ was calculated by dividing the mean signal intensity of the trabecular bone of the L1–L4 vertebrae by the cerebrospinal fluid (CSF) signal intensity at the L3 level (Fig. 2). In other words, L1–4 VBQ corresponds to the VBQ score proposed by Ehresman et al. [4]; we use the term L1–4 VBQ here to make the measurement site explicit. L1 VBQ was calculated by dividing the mean signal intensity of the L1 vertebra by the CSF signal intensity at the L3 level. In patients with fractures within the measurement range, fractured vertebrae were excluded using the same method as for HU.
Fig. 2. Vertebral bone quality (VBQ) score assessment. VBQ was measured on the midline sagittal T1-weighted magnetic resonance imaging (MRI) image of the lumbar spine. The signal intensity of the trabecular bone in L1–L4 and the cerebrospinal fluid (CSF) signal intensity at L3 were measured, and L1–4 VBQ was calculated by dividing the mean L1–L4 signal intensity by the CSF signal intensity at L3. L1 VBQ was calculated by dividing the L1 signal intensity by the CSF signal intensity at L3.
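Applying the same exclusion rule to VBQ gives the sketch below. It follows the description here (mean trabecular signal intensity divided by the CSF signal intensity at L3); the dictionary layout, function names, and signal intensities are assumptions for illustration.

```python
from statistics import mean


def l1_measurement_levels(fractured):
    """Same substitution rule as for HU: L1, else intact T12/L2, else []."""
    if "L1" not in fractured:
        return ["L1"]
    return [lv for lv in ("T12", "L2") if lv not in fractured]


def vbq(si, csf_l3, levels):
    """Mean vertebral signal intensity over `levels` divided by CSF SI at L3."""
    return mean([si[lv] for lv in levels]) / csf_l3 if levels else None


def l1_vbq(si, csf_l3, fractured):
    return vbq(si, csf_l3, l1_measurement_levels(fractured))


def l1_4_vbq(si, csf_l3, fractured):
    intact = [lv for lv in ("L1", "L2", "L3", "L4") if lv not in fractured]
    return vbq(si, csf_l3, intact)


# Hypothetical T1-weighted signal intensities (arbitrary units)
si = {"T12": 400.0, "L1": 420.0, "L2": 440.0, "L3": 460.0, "L4": 480.0}
print(round(l1_4_vbq(si, 110.0, set()), 3))  # 450 / 110
print(l1_vbq(si, 110.0, {"L1", "T12"}))      # 440 / 110 = 4.0
```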
Statistical analysis
EZR (64-bit) statistical software (https://www.jichi.ac.jp/usr/hema/EZR/statmedEN.html) was used for all analyses. Propensity score matching was used to reduce the risk of selection bias. Differences between groups were assessed using Student's t-test. Statistical significance was set at p≤0.05. Predictive accuracy for AVF was evaluated using the AUC, and intrarater reliability was assessed by calculating the intraclass correlation coefficient (ICC).
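The ICC form used for intrarater reliability is not stated. As one plausible choice, the sketch below computes ICC(3,1) (two-way mixed effects, consistency, single measurement) from a subjects × repeats matrix, which here would correspond to patients × the three repeated measurements; this choice is an assumption. The check uses the classic Shrout and Fleiss example of six targets rated by four judges.

```python
import numpy as np


def icc_3_1(ratings):
    """ICC(3,1): two-way mixed, consistency, single measurement.

    ratings: array of shape (n_subjects, k_repeats).
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * ((y.mean(axis=1) - grand) ** 2).sum()  # between subjects
    ss_cols = n * ((y.mean(axis=0) - grand) ** 2).sum()  # between repeats
    ss_total = ((y - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols                # residual
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


# Shrout & Fleiss (1979) example data: 6 targets x 4 judges
data = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
        [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_3_1(data), 2))  # 0.71
```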
Results
Study population
Among the 308 patients with OVF who underwent BKP during the study period, 178 were excluded because of loss to follow-up (n=67), history of spinal surgery (n=74), unavailable MRI (n=15) or CT (n=11), or other reasons (n=11). Of the 130 patients who met the eligibility criteria, AVF was absent in 95 and present in 35 within 2 months after surgery. After propensity score matching, 34 patients were selected for each of the AVF (−) and AVF (+) groups (Fig. 3); their baseline characteristics are presented in Table 1. No significant differences were observed between the two groups. The mean follow-up period was 17.6±1.4 months.
Fig. 3. Patient selection flowchart for the balloon kyphoplasty (BKP) study. OVF, osteoporotic vertebral fractures; MRI, magnetic resonance imaging; CT, computed tomography; AVF, adjacent vertebral fracture; BMI, body mass index.
Distribution of measured vertebrae in L1 and L1–4 (HU, VBQ)
Table 2 shows the vertebrae actually measured for L1 (HU, VBQ) and L1–4 (HU, VBQ) after fractured vertebrae were excluded. For L1 (HU, VBQ), the measured vertebrae were L1 (n=39); T12 and L2 (n=22); T12 alone (n=4); and L2 alone (n=3). For L1–4 (HU, VBQ), they were L1–L4 (n=23); L1, L2, and L4 (n=6); L1, L3, and L4 (n=7); L2–L4 (n=21); L1 and L4 (n=1); L2 and L4 (n=1); L3 and L4 (n=4); L1 alone (n=2); L2 alone (n=2); and L3 alone (n=1).
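Tallying the reported combinations shows how much the measured vertebrae varied for each parameter. The snippet only re-enters the counts listed above; the data structure and function name are illustrative.

```python
# Measured-vertebra combinations and patient counts, as reported for Table 2
L1_SITES = {
    ("L1",): 39,
    ("T12", "L2"): 22,
    ("T12",): 4,
    ("L2",): 3,
}
L1_4_SITES = {
    ("L1", "L2", "L3", "L4"): 23,
    ("L1", "L2", "L4"): 6,
    ("L1", "L3", "L4"): 7,
    ("L2", "L3", "L4"): 21,
    ("L1", "L4"): 1,
    ("L2", "L4"): 1,
    ("L3", "L4"): 4,
    ("L1",): 2,
    ("L2",): 2,
    ("L3",): 1,
}


def summarize(sites, nominal):
    """Patients, distinct combinations, and share measured at the nominal site."""
    n_patients = sum(sites.values())
    return {
        "patients": n_patients,
        "combinations": len(sites),
        "share_at_nominal_site": sites[nominal] / n_patients,
    }


for name, sites, nominal in [("L1", L1_SITES, ("L1",)),
                             ("L1-4", L1_4_SITES, ("L1", "L2", "L3", "L4"))]:
    s = summarize(sites, nominal)
    print(f"{name}: {s['patients']} patients, {s['combinations']} combinations, "
          f"{s['share_at_nominal_site']:.1%} measured at the nominal site")
# L1: 68 patients, 4 combinations, 57.4% measured at the nominal site
# L1-4: 68 patients, 10 combinations, 33.8% measured at the nominal site
```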
Comparison of bone strength parameters between AVF (−) and AVF (+)
Table 3 compares the bone strength parameters between the AVF (−) and AVF (+) groups (values given as AVF [−] vs. AVF [+]). L1 HU differed significantly between the groups (92.1±29.4 vs. 71.6±21.4, p=0.013). No significant differences were found for L1–4 HU (77.7±27.7 vs. 62.7±22.6, p=0.058), L1 VBQ (4.2±0.7 vs. 4.1±0.5, p=0.673), or L1–4 VBQ (4.2±0.6 vs. 4.2±0.6, p=0.898). The ICCs were 0.925 for L1 HU, 0.941 for L1–4 HU, 0.876 for L1 VBQ, and 0.905 for L1–4 VBQ.
Performance comparison between HU and VBQ using AUC
The AUCs were 0.657 (95% confidence interval [CI], 0.526–0.788) for L1 HU; 0.625 (95% CI, 0.491–0.760) for L1–4 HU; 0.524 (95% CI, 0.383–0.665) for L1 VBQ; and 0.523 (95% CI, 0.383–0.664) for L1–4 VBQ (Fig. 4). When comparing bone strength assessments, the AUCs were higher for HU than for VBQ for L1 (0.657 vs. 0.524, respectively) and L1–4 (0.625 vs. 0.523, respectively). When comparing measurement sites, the AUCs were higher for L1 than for L1–4 for both HU (0.657 vs. 0.625, respectively) and VBQ (0.524 vs. 0.523, respectively). Among all measurement methods, L1 HU had the highest AUC.
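The method used to compute the 95% CIs is not reported. As a rough cross-check only, the Hanley–McNeil approximation to the standard error of an AUC can be applied to the reported values; this is a swapped-in technique and is not assumed to be the one used here.

```python
import math


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate CI for an AUC using the Hanley-McNeil (1982) standard error."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(var)
    return auc - z * se, auc + z * se


lo, hi = hanley_mcneil_ci(0.657, 34, 34)  # L1 HU, 34 AVF (+) vs. 34 AVF (-)
print(f"{lo:.3f} to {hi:.3f}")
```

For L1 HU this gives approximately 0.527 to 0.787, close to the reported 0.526–0.788; the small difference may reflect a different CI method.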
Discussion
To the best of our knowledge, this is the first study to compare the efficacy of HU and VBQ in assessing AVF risk after BKP. HU was superior to VBQ for evaluating bone strength in the context of AVF risk assessment, and measuring HU in the L1 vertebral body was an effective method for evaluating bone strength.
In this study, HU yielded a higher AUC than VBQ. Consistent with this, several previous reports have indicated that HU is superior to VBQ for assessing bone strength [15–18]. The primary reason is that HU directly reflects the local mineral density of cancellous bone, whereas VBQ reflects bone density only indirectly, through the fat content of the vertebral body. Consequently, hyperlipidemia may cause VBQ measurements to overestimate actual bone density [19]. Moreover, the effects of confounding factors such as MRI relaxation time and echo time are difficult to eliminate [15,20]. In addition, CT allows more detailed observation of cancellous bone structure, so HU can be measured selectively in uniform cancellous bone while excluding nonuniform areas such as sclerotic lesions (e.g., bone islands) and venous plexuses. Conversely, vertebral bodies often appear inherently nonuniform on MRI, making it difficult to confine VBQ measurement to uniform cancellous bone to the same degree. However, some studies have reported that VBQ is superior to HU [21,22], and whether HU or VBQ is the better method for assessing bone strength remains under debate.
In this study, 36 new vertebral fractures and 22 previous vertebral fractures were identified within the L1–L4 region, an average of 0.85 fractures per patient in this region. Of the 68 patients, 47 (69.1%) had at least one vertebral fracture within L1–L4, and 11 (16.2%) had two or more. A high prevalence of vertebral fractures is characteristic of patients undergoing BKP, and such fractures are thought to increase the measurement error of DEXA. One advantage of both HU and VBQ is that intact vertebrae can be measured selectively while fractured vertebrae are excluded. Regarding measurement sites, HU is typically measured at L1 [4,23,24] or across L1–L4 [25,26], whereas VBQ is primarily measured across L1–L4 [4]. We therefore used both L1 and L1–L4 as measurement sites. The higher AUCs for L1 (HU, VBQ) than for L1–L4 (HU, VBQ) may be attributable to the smaller variability in the vertebrae actually measured. After fractured vertebrae were excluded, L1 (HU, VBQ) was obtained from only four combinations of vertebrae, whereas L1–L4 (HU, VBQ) was obtained from up to 10 different combinations (Table 2). This greater variability is thought to have contributed to measurement error in L1–L4 (HU, VBQ). Given the high prevalence of vertebral fractures in patients undergoing BKP, L1 HU, as proposed by Zou et al. [14], was an effective approach.
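The fracture-burden figures follow from simple arithmetic on the reported counts; the helper below is illustrative.

```python
def fracture_burden(n_patients, new_fx, previous_fx, n_with_any, n_with_two_plus):
    """Per-patient fracture rate and percentages within the L1-L4 region."""
    return {
        "fractures_per_patient": (new_fx + previous_fx) / n_patients,
        "pct_any": 100.0 * n_with_any / n_patients,
        "pct_two_plus": 100.0 * n_with_two_plus / n_patients,
    }


b = fracture_burden(n_patients=68, new_fx=36, previous_fx=22,
                    n_with_any=47, n_with_two_plus=11)
print(f"{b['fractures_per_patient']:.2f} per patient, "
      f"{b['pct_any']:.1f}% with >=1, {b['pct_two_plus']:.1f}% with >=2")
# 0.85 per patient, 69.1% with >=1, 16.2% with >=2
```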
For HU, the vertebrae adjacent to the treated vertebra have also been suggested as a useful measurement site [27,28]. This approach has been reported to be effective for assessing AVF risk, but its drawback is large variability, because the measurement site changes with the level treated with BKP. Accordingly, L1 HU has been reported to be superior to the HU of adjacent vertebrae for AVF risk assessment [13].
Wang et al. [22] reported that HU and VBQ were effective in predicting new vertebral fractures over a follow-up period of more than 2 years after percutaneous vertebroplasty and kyphoplasty. In contrast, our study assessed AVF occurrence within 2 months after BKP. This difference reflects our focus on perioperative AVF, which is of particular importance to surgeons. In addition, Wang et al. [22] evaluated HU and VBQ in the L1–L4 region and excluded patients with two or more fractures within that region. Considering the high prevalence of vertebral fractures in patients undergoing BKP, we evaluated not only L1–L4 (HU, VBQ) but also L1 (HU, VBQ). Identifying the optimal measurement site was therefore a key distinguishing feature of the present study.
L1 HU was effective in assessing AVF risk after BKP. However, its AUC of 0.657 was modest, suggesting that L1 HU alone may be insufficient for evaluating AVF risk. The next challenge is to develop a more accurate scoring system for predicting AVF that combines L1 HU with well-known risk factors such as local kyphosis and previous fractures.
For patients at high risk for AVF, use of instrumentation or preventive BKP for adjacent vertebrae may be effective. However, there is currently insufficient evidence supporting the efficacy of these methods in preventing AVF. Therefore, identification of appropriate treatment strategies for high-risk patients is an important topic for future research.
A major strength of this study was its single-center design, with the same CT and MRI scanners used for all patients. The results were therefore not affected by measurement errors arising from differences in equipment, ensuring high reliability. Although a multicenter study could have included more patients, it would be extremely difficult to standardize CT and MRI models across institutions, making measurement errors due to equipment differences inevitable.
The main limitation of this study was the absence of DEXA measurements. DEXA was introduced at our institution in 2016, and comparable data could not be ensured for patients treated before its introduction. However, the purpose of this study was to compare HU and VBQ, and comparisons with DEXA were addressed in a previous report [13]. Additional limitations include the small sample size, the inability to measure L1 (HU, VBQ) in patients with fractures of all of L1, T12, and L2, and the lack of external validation.
Conclusions
In the assessment of AVF risk after BKP, HU was superior to VBQ. Moreover, the L1 vertebra was found to be a better measurement site than the L1–4 vertebrae. Measurement of L1 HU was the most effective method for evaluating bone strength in AVF risk assessment.
Key Points
This study aimed to compare the predictive utility between Hounsfield unit (HU) values and vertebral bone quality (VBQ) scores for adjacent vertebral fracture (AVF) risk after balloon kyphoplasty (BKP) and to identify the appropriate measurement site.
L1 HU was significantly lower in the AVF (+) group than in the AVF (−) group (71.6±21.4 vs. 92.1±29.4, p=0.013), but no significant group differences were observed for L1–4 HU, L1 VBQ, or L1–4 VBQ.
L1 HU had the highest area under the receiver operating characteristic curve (0.657), compared with 0.625 for L1–4 HU, 0.524 for L1 VBQ, and 0.523 for L1–4 VBQ. For both HU and VBQ, predictive accuracy was higher with L1 measurement than with L1–4 measurement.
HU was superior to VBQ in predicting AVF risk after BKP, with L1 HU being the most reliable indicator of bone strength and AVF risk.
Notes
Conflict of Interest
No potential conflict of interest relevant to this article was reported.
Author Contributions
Conception and design: KM. Data acquisition and analysis: HS, SS, TF, HT, RO. Drafting of the manuscript: KM. Manuscript review and editing: KN. Critical revision: KN. Administrative support: MH. Supervision: MH, KN. Final approval of the manuscript: all authors.
