Patient characteristics
The younger and older groups showed significant differences in terms of race (P < 0.001), marital status (P = 0.002), year of diagnosis (P = 0.001), tumor grade (P < 0.001), histology (P < 0.001), T stage (P = 0.037), and N stage (P < 0.001) before IPTW (Table 1). However, no statistically significant differences in baseline characteristics were found between the two groups after IPTW (Supplementary Table 1, Supplementary Fig. 1).
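The balance check behind this comparison can be illustrated with a minimal sketch. It assumes a single binary covariate and estimates the propensity as the share of older patients within each covariate stratum; the study's actual propensity model and covariate set are not described in this section, so the function names and toy data below are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def iptw_weights(group, stratum):
    """Weights 1/p for the older group (1) and 1/(1-p) for the younger group (0).

    p is the share of older patients within each covariate stratum, a
    nonparametric stand-in for a fitted propensity model (assumption).
    Every stratum must contain both groups, otherwise p is 0 or 1.
    """
    n_total = defaultdict(int)
    n_older = defaultdict(int)
    for g, s in zip(group, stratum):
        n_total[s] += 1
        n_older[s] += g
    weights = []
    for g, s in zip(group, stratum):
        p = n_older[s] / n_total[s]
        weights.append(1 / p if g == 1 else 1 / (1 - p))
    return weights


def weighted_smd(x, group, weights):
    """Standardized mean difference of a binary covariate x between groups."""
    def prevalence(target):
        num = sum(w * xi for xi, g, w in zip(x, group, weights) if g == target)
        den = sum(w for g, w in zip(group, weights) if g == target)
        return num / den

    p1, p0 = prevalence(1), prevalence(0)
    pooled_sd = sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2)
    return (p1 - p0) / pooled_sd


# Hypothetical toy cohort: group 1 = older, 0 = younger; x is a binary covariate.
group = [1, 1, 1, 0, 1, 0, 0, 0]
x = [1, 1, 1, 1, 0, 0, 0, 0]
smd_before = weighted_smd(x, group, [1.0] * len(group))
smd_after = weighted_smd(x, group, iptw_weights(group, x))
```

In this toy cohort the covariate is imbalanced before weighting and balanced after it, which mirrors the pattern reported above for the baseline characteristics.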
OS and CODs
The younger group exhibited a significantly longer median OS than the older group before IPTW adjustment (140 vs. 50 months, P < 0.001, Fig. 1A) and after IPTW adjustment (137 vs. 53 months, P < 0.001, Fig. 1B).

Overall survival (OS) by age group before (A) and after (B) inverse probability of treatment weighting (IPTW).
Before IPTW, the older group had higher 5-, 10-, and 15-year cumulative incidences of NRD (31, 35, and 37% vs. 21, 27, and 30%; P < 0.001, Fig. 2A), SMNs (11, 14, and 15% vs. 8.4, 10, and 11%; P = 0.006, Fig. 2B), CVDs (4.2, 7, and 8.7% vs. 0.5, 1.5, and 2.7%; P < 0.001, Fig. 2C), and other causes (8.7, 19, and 25% vs. 3.9, 8.1, and 11%; P < 0.001, Fig. 2D) than the younger group.

Comparison of cumulative incidences of NRDs (A), SMNs (B), CVDs (C), and other causes (D) between the older and younger groups at 5, 10, and 15 years before IPTW. IPTW, inverse probability of treatment weighting; NPC, nasopharyngeal carcinoma; NRD, NPC-related deaths; SMN, secondary malignant neoplasm; CVD, cardiovascular disease.
After IPTW, the older group had higher 5-, 10-, and 15-year cumulative incidences of NRDs (30, 34, and 38% vs. 21, 27, and 30%; P < 0.001, Fig. 3A), CVDs (4.1, 7.2, and 8.8% vs. 0.5, 1.8, and 3.0%; P < 0.001, Fig. 3C), and other causes (8.3, 17, and 24% vs. 4.1, 8.7, and 12%; P < 0.001, Fig. 3D) than the younger group. However, the cumulative incidences of SMNs were comparable between the two groups (P = 0.100, Fig. 3B).

Comparison of cumulative incidences of NRDs (A), SMNs (B), CVDs (C), and other causes (D) between the older and younger groups at 5, 10, and 15 years after IPTW. IPTW, inverse probability of treatment weighting; NPC, nasopharyngeal carcinoma; NRD, NPC-related deaths; SMN, secondary malignant neoplasm; CVD, cardiovascular disease.
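Cause-specific cumulative incidences of this kind are commonly estimated with the Aalen-Johansen estimator, which treats the other CODs as competing events. The sketch below is an unweighted, pure-Python illustration under that assumption; the estimator and software actually used in the study are not stated in this section, and the cause codes and toy data are hypothetical.

```python
def cumulative_incidence(times, causes, horizon):
    """Aalen-Johansen cumulative incidence by cause at `horizon`.

    causes[i] == 0 marks a censored patient; any other value labels the
    cause of death. Returns {cause: cumulative incidence}.
    """
    cif = {c: 0.0 for c in set(causes) if c != 0}
    surv = 1.0  # probability of being alive just before the current time
    at_risk = len(times)
    for t in sorted(set(times)):
        if t > horizon:
            break
        at_t = [c for ti, c in zip(times, causes) if ti == t]
        deaths = [c for c in at_t if c != 0]
        # Each death adds (overall survival just before t) / (number at risk).
        for c in deaths:
            cif[c] += surv / at_risk
        surv *= 1 - len(deaths) / at_risk
        at_risk -= len(at_t)  # deaths and censored patients leave the risk set
    return cif


# Hypothetical follow-up in years; 1 = NRD, 2 = another cause, 0 = censored.
times = [1, 2, 3, 4, 5]
causes = [1, 2, 1, 0, 2]
cif_3y = cumulative_incidence(times, causes, horizon=3)
cif_5y = cumulative_incidence(times, causes, horizon=5)
```

Unlike one minus a Kaplan-Meier estimate computed per cause, the cause-specific incidences produced this way never sum to more than the overall probability of death.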
Patients based on age groups and spline curve analysis
All patients were categorized into seven age groups: 18–30, 30–40, 40–50, 50–60, 60–70, 70–80, and 80+ years (Supplementary Table 2). OS differed significantly among these groups, with the 18–30 age group showing the longest OS and the 80+ age group the shortest (Fig. 4). We also performed spline regression analysis for the entire cohort and for each COD category (NRD, SMN, CVD, and other causes). The risk of death increased continuously with age for all causes (Supplementary Fig. 2), NRDs (Supplementary Fig. 3A), SMNs (Supplementary Fig. 3B), and other causes (Supplementary Fig. 3D). CVD-related mortality (Supplementary Fig. 3C) followed a distinct pattern: the risk remained similar up to the age of 55, after which it increased significantly.

Survival analysis based on different age subgroups.
Supplementary Table 3 presents the distribution of SMNs in the cohort by age group. Multivariable Cox regression analysis showed that sex, age, marital status, year of diagnosis, household income, grade, histology, and M stage were associated with SMNs (Supplementary Table 4).
Evaluation of the machine learning models
Age, metastasis, stage, marital status, histology, year of diagnosis, and household income were identified as factors affecting OS through LASSO analysis (Supplementary Fig. 4). The RF model demonstrated the highest C-index among all models, reaching 0.701 (Fig. 5). The six models were evaluated in the validation set by comparing their Brier scores and areas under the ROC curves (AUCs, Table 2; Fig. 6). Furthermore, the DCA (Supplementary Fig. 5A-C) and calibration curves (Supplementary Fig. 5D) for the RF model showed strong predictive accuracy in estimating 3-, 5-, and 10-year survival rates.

Concordance index ranking graph.

Receiver operating characteristic curves for predicting 3-, 5-, and 10-year survival based on six different models: Cox regression (A), DT (B), DT (C), GBM (D), SVM (E), and XGBoost (F). ROC, receiver operating characteristic; DT, decision tree; GBM, gradient-boosting machine; SVM, support vector machine; XGBoost, eXtreme Gradient Boosting.
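For readers unfamiliar with the C-index used to rank the models, the following is a minimal sketch of Harrell's concordance index. It assumes right-censored data and one risk score per patient, with higher scores meaning worse predicted survival. The function name and toy data are hypothetical, and the study may have used a different implementation.

```python
def harrell_c_index(times, events, risk):
    """Share of comparable patient pairs whose risk scores are ordered correctly.

    A pair (i, j) is comparable when patient i died (events[i] == 1) and
    patient j was still under follow-up at that time (times[i] < times[j]).
    Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical toy data: survival in years, event flag, predicted risk.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
c_perfect = harrell_c_index(times, events, [0.9, 0.7, 0.5, 0.1])
c_reversed = harrell_c_index(times, events, [0.1, 0.5, 0.7, 0.9])
c_constant = harrell_c_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

A value of 0.5 corresponds to uninformative scores and 1.0 to perfect ranking, which gives context for the reported C-index of 0.701.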
Model interpretation
The time-dependent variable importance bar plots revealed that age had the greatest influence on 3-, 5-, and 10-year survival, followed by metastasis and tumor stage (Fig. 7). PDPs further indicated that advanced stage, increased age, and the presence of metastasis were associated with worse survival (Supplementary Fig. 6). Likewise, SHAP value-based box plots revealed a strong association between older age and reduced survival rates (Supplementary Fig. 7).

Time-dependent variable importance bar plots display the ranking of significant features affecting the 3-, 5-, and 10-year survival.
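A partial dependence plot averages the model's prediction over the cohort while one feature is held at a fixed value. The sketch below illustrates that computation with a hypothetical linear risk function standing in for the fitted model; the feature names, coefficients, and data are assumptions made for illustration only.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each value in `grid`."""
    curve = []
    for value in grid:
        # Copy each patient record, overwriting only the feature of interest.
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_risk(row):
    """Hypothetical stand-in for a fitted survival model's risk output."""
    return 0.01 * row["age"] + 0.3 * row["metastasis"]


# Hypothetical patient records.
rows = [{"age": 45, "metastasis": 0}, {"age": 70, "metastasis": 1}]
age_curve = partial_dependence(toy_risk, rows, "age", [40, 60, 80])
```

With this toy model the averaged risk rises with each step in age, the same qualitative shape the PDPs describe for increasing age.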