

ORIGINAL ARTICLE: GENITOURINARY CANCERS 

Year : 2019  Volume : 8  Issue : 3  Page : 150-159

Prostate cancer survival estimates: An application with piecewise hazard function derivation
Atanu Bhattacharjee, Atul Budukh, Rajesh Dikshit
Centre for Cancer Epidemiology, The Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Navi Mumbai, Maharastra, India
Date of Web Publication  01-Aug-2019
Correspondence Address: Dr. Atanu Bhattacharjee Centre for Cancer Epidemiology, The Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Navi Mumbai, Maharastra India
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/sajc.sajc_245_18
Background: The hazard function is, by definition, time-dependent. However, estimation of the hazard function over successive periods of follow-up is an overlooked area of research. Estimating the function as it changes across time points is expected to give an overall picture of the survival trend. This work proposes a method for working with piecewise hazard rates. It is a data-driven method that provides estimates of the hazard function at different time points.
Methods: The proposed method is explored with prostate cancer patients registered in the Surveillance, Epidemiology, and End Results Program, with age at diagnosis ranging from 40 to 80 years and above. A total of 610,814 patients are included in this study. The piecewise hazard rate is formulated to serve the objective. Piecewise hazard rates are compared using Wald-type test statistics, and the corresponding R function is provided. The duration of follow-up is split into different intervals to obtain the piecewise hazard rate estimates.
Results: The maximum duration of follow-up observed in this study is 40 years. The piecewise hazard rates across follow-up intervals are almost the same, except in a few later intervals. The hazard is lower in patients diagnosed at a younger age than in older patients. The hazard rates in different grades of prostate cancer are also reported separately.
Conclusion: The application of piecewise hazard helps to generate deeper statistical inference. This analysis provides a better understanding of the need for effective treatment toward prolonged survival benefit for patients of different ages.
Keywords: Piecewise hazard function, prostate cancer, SEER
How to cite this article: Bhattacharjee A, Budukh A, Dikshit R. Prostate cancer survival estimates: An application with piecewise hazard function derivation. South Asian J Cancer 2019;8:150-9
How to cite this URL: Bhattacharjee A, Budukh A, Dikshit R. Prostate cancer survival estimates: An application with piecewise hazard function derivation. South Asian J Cancer [serial online] 2019 [cited 2020 Aug 9];8:150-9. Available from: http://journal.sajc.org/text.asp?2019/8/3/150/263882
Introduction   
There have been significantly more deaths due to prostate cancer among patients aged 62–76 years than among those aged <61 years.^{[1]} This prompted us to seek a better estimate of the influence of age on prostate cancer deaths. We are interested in estimates of the hazard function that change over time. Three important parameters, that is, duration of survival, cause of death, and age of the patient, are required to estimate the hazard function. Since we are trying to establish the hazard function for each age in years, it is also important to start with a large sample. Classifying the data by age creates several strata with small sample sizes. Unless the cohort is large enough, it is difficult to establish robust statistical inference on hazard functions for different ages in years. A relatively large data set on prostate cancer was obtained from the Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov) Public-Use Data (1973–2014), National Cancer Institute.
It is true that prostate cancer is a deadly disease. However, prostate cancer is also associated with prolonged survival, so patients remain exposed to other causes of death. Prolonging the duration of survival is always of interest in clinical practice. However, the challenge of prolonging survival is not the same for younger and older patients. For instance, the effort needed to extend the survival of a 40-year-old patient to 41 years is not the same as that needed to extend the survival of a 60-year-old patient to 61 years.^{[2],[3]} The reason is that life expectancy differs between age groups. Older patients tend to be diagnosed with prostate cancer alongside several comorbidities. In the presence of these comorbidities, it becomes difficult for an older prostate cancer patient to reach even the minimum level of life expectancy in the general population. Younger prostate cancer patients, in contrast, may be free from comorbidities, but bringing their survival up to the average life expectancy is another challenge because of the long gap between their age and the life expectancy of the general population. In these circumstances, we preferred to use the piecewise hazard to capture the magnitude of mortality risk due to prostate cancer in different age groups. Hence, the objective of this study is to estimate hazard functions that change with time. In the literature on piecewise hazard functions, both single change-point analysis^{[4],[5]} and multiple change-point analysis^{[6]} have been attempted. We adopted a data-driven approach for detecting the number of change points in the piecewise hazard function. The results were further compared using the likelihood ratio test^{[7]} with piecewise hazard estimates.^{[5],[6],[7],[8],[9],[10]}
Piecewise Hazard Function   
The idea of comparing treatment effects by the cumulative risk of an event is useful for quantifying the ultimate treatment benefit.^{[11],[12]} In our motivating context, the theoretical quantities of interest are the survival benefit in specific time intervals and the steps needed to modify the treatment management strategy. Let the total time be measured on the interval (0, ). The total time interval is split into where . The corresponding hazard is defined as and g_{0} = 1.
The piecewise constant hazard function is defined^{[13]} as follows:
The survival function is:
The cumulative hazard is obtained as follows:
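A minimal sketch of these three quantities in their standard form, assuming cut points $0 = t_0 < t_1 < \dots < t_J$ and a constant hazard $h_j$ on $[t_{j-1}, t_j)$. The symbols $t_j$, $h_j$, $H$, and $S$ are illustrative assumptions and not necessarily the authors' notation:

```latex
\begin{align*}
h(t) &= h_j, \qquad t_{j-1} \le t < t_j, \quad j = 1, \dots, J, \\
H(t) &= \int_0^t h(u)\,du
      = \sum_{j=1}^{J} h_j \,\bigl(\min(t, t_j) - \min(t, t_{j-1})\bigr), \\
S(t) &= \exp\{-H(t)\}.
\end{align*}
```

Each term of the sum is the time spent in interval $j$ before $t$: the full interval length for intervals that end before $t$, the partial length $t - t_{j-1}$ for the interval containing $t$, and zero afterwards.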
Piecewise Hazard With Multiple Testing Problem   
Let X_{1}, …, X_{n} denote independent and identically distributed survival times, and let C_{1}, …, C_{n} be the censoring times, which are assumed to be independent of X. We only observe the pairs
(T_{i}, δ_{i}), i = 1, 2, …, n, where T_{i} = min(X_{i}, C_{i}) and δ_{i} = 1 if X_{i} ≤ C_{i}
and zero otherwise. Consider the following change-point model,
where τ_{1}, …, τ_{k} are the change points, k is the number of change points in the model, and α_{j} is the value of the hazard function between the time points τ_{j-1} and τ_{j}.
We propose maximum likelihood estimation for the unknown parameters. Based on Equation (1), the log-likelihood function is formulated as follows:
where is the number of deaths observed up to time t. With $\tau_{j}, j = 1, \ldots, k$ fixed, some algebra yields that the maximizers of α_{j} are given by:
Substituting these values into log L gives the profile likelihood for the $\tau_{j}$'s, which can be expressed as:
We then maximize the profile likelihood with respect to the $\tau_{j}$'s and insert the obtained values back into the expression above to obtain the MLEs of α_{j}.
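A hedged sketch of this derivation in its standard form for a piecewise constant hazard, assuming hazard $\alpha_j$ on $[\tau_{j-1}, \tau_j)$ with the conventions $\tau_0 = 0$ and $\tau_{k+1} = \infty$. The symbols $X(t) = \sum_i \delta_i I(T_i \le t)$ for the number of deaths observed up to time $t$, $a \wedge b = \min(a, b)$, and $\ell$ for the profile log-likelihood are notational assumptions:

```latex
\begin{align*}
\log L &= \sum_{j=1}^{k+1} \bigl[X(\tau_j) - X(\tau_{j-1})\bigr] \log \alpha_j
        - \sum_{j=1}^{k+1} \alpha_j \sum_{i=1}^{n}
          \bigl(T_i \wedge \tau_j - T_i \wedge \tau_{j-1}\bigr), \\
\hat{\alpha}_j &= \frac{X(\tau_j) - X(\tau_{j-1})}
        {\sum_{i=1}^{n} \bigl(T_i \wedge \tau_j - T_i \wedge \tau_{j-1}\bigr)}, \\
\ell(\tau_1, \dots, \tau_k) &= \sum_{j=1}^{k+1}
        \bigl[X(\tau_j) - X(\tau_{j-1})\bigr] \bigl(\log \hat{\alpha}_j - 1\bigr).
\end{align*}
```

Under these assumptions, each $\hat{\alpha}_j$ is the number of deaths in interval $j$ divided by the total person-time at risk in that interval. The last line follows because $\hat{\alpha}_j$ times the person-time in interval $j$ equals the number of deaths in that interval.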
Now, the objective is to identify the changes of τ_{j}. It can be confirmed through the hypothesis test with . The representation of τ_{j} can be determined by different factors. In this work, it is taken to be age. It has been shown that and are independent in nature.^{[7]} The Wald-type test statistic is as follows:
It follows the chi-square distribution with one degree of freedom under the null hypothesis. We wrote an R function, called WaldTest(), to compute the test statistic. Its source code is reported in Appendix A.
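The estimation and testing steps described in this section can be sketched in base R as follows. This is an illustration under stated assumptions, not the authors' implementation: the toy data, the cut points, and the helper `interval_stats()` are hypothetical, and the variance of each estimate is taken as the squared estimate divided by the number of deaths, a common large-sample approximation that the article does not explicitly specify.

```r
# Hypothetical toy data (not from SEER): follow-up in months, 1 = death, 0 = censored
time   <- c(5, 12, 18, 25, 30, 38, 40, 40, 15, 22)
status <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 1)
cuts   <- c(0, 20, 40)  # hypothetical cut points: intervals (0, 20] and (20, 40]

# Deaths and person-time at risk contributed to one interval (lo, hi]
interval_stats <- function(time, status, lo, hi) {
  exposure <- sum(pmin(time, hi) - pmin(time, lo))
  deaths   <- sum(status == 1 & time > lo & time <= hi)
  c(deaths = deaths, exposure = exposure)
}

# 2 x J matrix with rows "deaths" and "exposure", one column per interval
est <- sapply(seq_len(length(cuts) - 1),
              function(j) interval_stats(time, status, cuts[j], cuts[j + 1]))

# Piecewise constant hazard estimates: deaths / person-time
alpha_hat <- est["deaths", ] / est["exposure", ]

# Assumed large-sample variance of each estimate: alpha^2 / deaths
var_hat <- alpha_hat^2 / est["deaths", ]

# Wald-type statistic for H0: equal hazard in the two adjacent intervals,
# treating the two estimates as independent
W <- (alpha_hat[1] - alpha_hat[2])^2 / (var_hat[1] + var_hat[2])
p_value <- pchisq(W, df = 1, lower.tail = FALSE)

print(round(c(W = W, p_value = p_value), 4))
```

With more than two intervals, the same comparison can be repeated for each adjacent pair of columns in `est`.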
Data analysis
The proposed method is explored with prostate cancer data from the SEER Program (www.seer.cancer.gov) Public-Use Data (1973–2014), National Cancer Institute, with follow-up until December 2014. The data set includes the cancer incidence and survival status of the patients. Several causes of death are recorded among the patients in these data. However, we only consider deaths due to prostate cancer and censored cases. Deaths due to other causes are excluded from this analysis. Other subsites recorded in this data set, such as “Prepuce,” “Glans penis,” “Body of penis,” “Overlapping lesion of penis,” “Penis, NOS,” “Prostate gland,” “Undescended testis,” “Descended testis,” “Testis, NOS,” “Epididymis,” “Spermatic cord,” “Scrotum, NOS,” “Other specified parts of male genital organs,” “Overlapping lesion of male genital organs,” and “Male genital organs, NOS,” are excluded from this analysis to maintain consistency as much as possible.
Results   
A total of 610,814 patients are included in this study. Registered patients who died due to prostate cancer or were censored are included. Initially, we prepared descriptive statistics to check the occurrence of prostate cancer with respect to age. Very few cases have an age at diagnosis of up to 40 years. Thus, for some ages at diagnosis, the count of prostate cancer cases is zero or very small. Our intention is to present the hazard rate due to prostate cancer for each age at diagnosis. However, this is not feasible because of the zero-inflated or very small counts of prostate cancer cases at several ages at diagnosis, although the sample is very large. These small counts were expressed as percentages of the cohort size, that is, 611,133, and many of them are zero to two decimal places. Only ages at diagnosis with a cumulative frequency of 0.01 or more are included in this study from a cohort of 611,133. Finally, only patients with an age at diagnosis of at least 40 years are included. The number of cases and their death rate at different ages at diagnosis are shown in [Figure 1]. The counts of cases and deaths are presented in [Table 1]. In the next step, we split the duration of survival into different survival intervals by , where represents 0–20 months and as 20–40 months. Under the null hypothesis, it is assumed that , where k = 1 to 11. However, to avoid the multiple testing problem, the hypothesis tests are performed with k >k − 1 and k = 1 to 11. The upper limit of k is set to 39 (i.e., 39 years) because the maximum duration of follow-up with death occurrences in this data set is 468 months, that is, 39 years. Therefore, a total of 38 survival intervals are generated with a 12-month window from the observed duration of survival.
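The splitting of follow-up into fixed-width windows can be sketched in base R as below. The follow-up times are hypothetical and the object names are illustrative. Only the 12-month width comes from the text.

```r
# Hypothetical follow-up times in months (not SEER data)
followup <- c(3, 14, 27, 36, 59, 60)
width    <- 12  # window width in months, as used in the text

# Cut points from 0 up to the first multiple of `width` covering the longest follow-up
breaks <- seq(0, ceiling(max(followup) / width) * width, by = width)

# Assign each follow-up time to its window; windows are closed on the right
window <- cut(followup, breaks = breaks, include.lowest = TRUE, right = TRUE)

# Number of patients whose follow-up ends in each window
counts <- table(window)
print(counts)
```

The table counts where each patient's follow-up ends. Person-time within each window would still need to be accumulated separately when estimating the hazard.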
The piecewise hazard estimates with 95% lower control limit (LCL) and upper control limit (UCL) are presented in [Figure 2]. The numerical outputs are presented in [Table 2]. There are four different grades. The piecewise hazard estimates adjusted for grade are presented in [Table 3]. The results show no significant changes in piecewise hazard estimates between different ages at prostate cancer diagnosis. During the initial follow-up, the hazard rates are almost equal across intervals and not significantly different for any age at diagnosis. However, a few significant changes are observed in intervals 25 (25) and to 39 (39) onward. In most of these cases, the interval does not differ significantly once the upper and lower confidence limits are considered. Hence, our null hypothesis is not rejected.
Figure 1: Distribution of age at diagnosis, number of prostate cancer cases, and death due to prostate cancer
Table 1: Prostate cancer occurrence and death presentation in different age at diagnosis
Figure 2: Piecewise hazard rate estimated in different survival duration intervals
Table 2: Piecewise hazard ratio estimates in different survival intervals in months
Table 3: Piecewise Hazard Ratio Estimates in Different Survival Intervals in Months for Grade I, II, III and IV
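The article does not state how the 95% lower and upper limits of each piecewise hazard estimate were computed. One common large-sample choice, assumed here purely for illustration, works on the log scale with standard error 1/sqrt(deaths). The function name `hazard_limits()` and the input counts are hypothetical.

```r
# Assumed approach: normal approximation for log(rate), with se = 1 / sqrt(deaths)
hazard_limits <- function(deaths, exposure, level = 0.95) {
  z      <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
  rate   <- deaths / exposure           # piecewise hazard estimate
  se_log <- 1 / sqrt(deaths)            # approximate standard error of log(rate)
  data.frame(rate = rate,
             LCL  = rate * exp(-z * se_log),
             UCL  = rate * exp(z * se_log))
}

# Hypothetical deaths and person-months in two survival intervals
limits <- hazard_limits(deaths = c(30, 12), exposure = c(1500, 900))
print(limits)
```

Intervals with few deaths produce wide limits under this approximation, which is consistent with the instability in later follow-up intervals noted in the text.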
In the final step, we computed age-adjusted piecewise hazard estimates to test the real impact of age at diagnosis on the hazard rate in prolonged survival of prostate cancer. Patients aged 40 years and above at diagnosis are considered in this step, and patients aged 80 years and above at diagnosis are grouped into a single category. The maximum age at diagnosis observed is 107 years. In this step, the duration of survival is split into a maximum of 11 different intervals by , where represents by 0–20 months and as 20–40 months. Fewer intervals are used than in the earlier step because fewer patients fall into each survival interval in the age-adjusted data. In addition, in some survival intervals, the estimates could not be generated because of the limited number of cases. However, this occurred in the later survival intervals, not the initial ones. The problem is overcome by extending the survival interval to a longer window. For example, if we failed to generate the piecewise hazard estimate for the interval between 280 and 300 months, then the interval is extended to 280–320 months and the piecewise hazard is generated thereafter. If we still failed to generate the estimate, then the interval is extended further to 280–340 months. A total of 10 intervals are generated. The corresponding piecewise hazard estimates are provided in [Figure 3]. A similar hypothesis is assumed with τ_{k} = τ_{k-1}, where k = 1 to 11.
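The window-extension rule described above can be sketched in base R as follows. The stopping criterion used here, a minimum number of deaths per window, is an assumption, since the article only says that the estimate "failed to generate". The function name `merge_sparse()`, the threshold, and the counts are hypothetical.

```r
# Extend a window to the right until it holds at least `min_deaths` deaths
merge_sparse <- function(breaks, deaths, min_deaths = 5) {
  stopifnot(length(breaks) == length(deaths) + 1)
  new_breaks <- breaks[1]
  new_deaths <- numeric(0)
  acc <- 0
  for (j in seq_along(deaths)) {
    acc <- acc + deaths[j]
    # Close the window once it has enough deaths, or at the final cut point
    # (so the last window may still fall below the threshold)
    if (acc >= min_deaths || j == length(deaths)) {
      new_breaks <- c(new_breaks, breaks[j + 1])
      new_deaths <- c(new_deaths, acc)
      acc <- 0
    }
  }
  list(breaks = new_breaks, deaths = new_deaths)
}

# Hypothetical death counts in 20-month windows between 240 and 340 months
res <- merge_sparse(breaks = seq(240, 340, by = 20),
                    deaths = c(9, 6, 1, 2, 4),
                    min_deaths = 5)
print(res)
```

In this toy input, the sparse 280–300 window is extended first to 280–320 and then to 280–340 months, mirroring the example in the text.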
The hazard rates with 95% LCL and UCL are presented graphically in [Figure 3]. The numerical outputs are presented in [Table 4] and [Table 5]. [Figure 3] shows that in the initial duration of follow-up, the hazard rates are higher in older patients. As the duration of survival moves to 20–40 months and then to 40–60 months, the hazard rate in older patients starts to decline. However, the hazard rate for younger patients rises steadily as the duration of survival increases. At the end of the duration of survival, younger and older patients have similar hazard rates. It can be concluded that prostate cancer is more fatal in older patients soon after diagnosis. However, over a longer duration, it becomes more fatal in younger patients than in older patients.
Table 4: Piecewise Hazard Ratio Estimates in Different Survival Intervals in Months
Table 5: Piecewise Hazard Ratio Estimates in Different Survival Intervals in Months (Continued)
Discussion   
Very few applications of the piecewise proportional hazard model have been reported. It has been applied to determine the effect of hormone therapy in women's health.^{[14]} It has also been used to compare infant and early childhood mortality rates.^{[15]} Home hemodialysis utilization in Canada and its corresponding risks have been compared through the piecewise proportional hazard function.^{[16]} It is always better to start with the conventional hazard rate because it is conditional in nature and easy to handle with time-dependent treatment.^{[17],[18]} However, it is not suitable for multiple timescales.^{[19]} The piecewise Poisson model has been found suitable for working with multiple timescales to evaluate the impact of an event by the likelihood ratio test.^{[9]} In this work, we also used likelihood ratio results through Wald-type test statistics.
The estimates of the hazard function can also be used to develop a prediction score. This would provide another dimension for establishing therapeutic effect and may be important for health policy decisions. With an enhanced understanding of hazard function estimation across time points, the estimation procedure can be improved.
By analyzing the change in hazard for different age groups in SEER data, we can establish the different phases of mortality risk in prostate cancer patients. We identified that patients aged more than 40 years are highly affected by prostate cancer death. Death due to prostate cancer becomes influential at 40 years and above.
The duration of follow-up in prostate cancer patients is relatively long. However, interpreting causes of death among prostate cancer patients is relatively difficult in comparison to other types of cancer, because during the prolonged follow-up period patients may be exposed to several other causes of death, and these may jointly or separately shorten the duration of survival. It is assumed that the longer patients survive, the more causes of death they are exposed to. In this situation, the age of the patient is considered as a separate factor in this study. Time-varying effects and biologically plausible interactions also need to be considered. The model can then become complex, and the piecewise hazard function could be an appropriate tool.
One recent study on SEER data confirmed that patients with conservatively managed, localized, well-to-moderately differentiated prostate cancer had an 8%–9% incidence of mortality within 10 years of the date of diagnosis.^{[20]} It also concluded that the majority of prostate cancer patients die of other causes. Other factors, such as lifestyle, need to be modified.^{[21]}
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
Appendix A
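# Wald-type test of H0: tau1 = tau2 for two piecewise hazard estimates.
# L          : contrast matrix, e.g. rbind(c(1, -1)); its row count is reported as df
# tau1, tau2 : the two estimates being compared
# v          : estimated variance of (tau1 - tau2)
# tau1, tau2 and v were undefined inside the function as printed; they are
# passed as arguments here so that the function runs on its own.
WaldTest = function (L, tau1, tau2, v)
{
  WaldTest = numeric (3)
  names (WaldTest) = c("W","df","P value")
  r = dim (L)[1]
  W = ((tau1 - tau2)^2)/v          # Wald statistic
  W = as.numeric(W)
  pval = 1 - pchisq(W,1)           # upper-tail chi-square probability, 1 df
  WaldTest[1] = W; WaldTest[2] = r; WaldTest[3] = pval
  WaldTest
} # End function WaldTest
LL = rbind (c(1,-1)); LL
thetahat = c(1,1)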
References   
1.  Parikh RR, Kim S, Stein MN, Haffty BG, Kim IY, Goyal S, et al. Trends in active surveillance for very low-risk prostate cancer: Do guidelines influence modern practice? Cancer Med 2017;6:2410-8.
2.  Badar F, Mahmood S. Epidemiology of cancers in Lahore, Pakistan, among children, adolescents and adults, 2010-2012: A cross-sectional study part 2. BMJ Open 2017;7:e016559.
3.  Watts EL, Appleby PN, Albanes D, Black A, Chan JM, Chen C, et al. Circulating sex hormones in relation to anthropometric, sociodemographic and behavioural factors in an international dataset of 12,300 men. PLoS One 2017;12:e0187741.
4.  Gijbels I, Gürler U. Estimation of a change point in a hazard function based on censored data. Lifetime Data Anal 2003;9:395-411.
5.  Goodman MS, Li Y, Tiwari RC. Detecting multiple change points in piecewise constant hazard functions. J Appl Stat 2011;38:2523-32.
6.  Kim HJ, Fay MP, Feuer EJ, Midthune DN. Permutation tests for joinpoint regression with applications to cancer rates. Stat Med 2000;19:335-51.
7.  Yao YC. Maximum likelihood estimation in hazard rate models with a change-point. Commun Stat Theory Methods 1986;15:2455-66.
8.  Henderson R. A problem with the likelihood ratio test for a change-point hazard rate model. Biometrika 1990;77:835-43.
9.  Matthews DE, Farewell VT. On a singularity in the likelihood for a change-point hazard rate model. Biometrika 1985;72:703-4.
10.  Nguyen HT, Rogers GS, Walker EA. Estimation in change-point hazard rate models. Biometrika 1984;71:299-304.
11.  Rebora P, Galimberti S, Valsecchi MG. Using multiple timescale models for the evaluation of a time-dependent treatment. Stat Med 2015;34:3648-60.
12.  Pepe MS, Mori M. Kaplan-Meier, marginal or conditional probability curves in summarizing competing risks failure time data? Stat Med 1993;12:737-51.
13.  
14.  Yang S, Prentice RL. Assessing potentially time-dependent treatment effect from clinical trials and observational studies for survival data, with applications to the women's health initiative combined hormone therapy trial. Stat Med 2015;34:1801-17.
15.  Kuate Defo B. Determinants of infant and early childhood mortality in Cameroon: The role of socioeconomic factors, housing characteristics, and immunization status. Soc Biol 1994;41:181-211.
16.  Perl J, Na Y, Tennankore KK, Chan CT. Temporal trends and factors associated with home hemodialysis technique survival in Canada. Clin J Am Soc Nephrol 2017. pii: CJN.13271216.
17.  Mantel N, Byar DP. Evaluation of response-time data involving transient states: An illustration using heart-transplant data. J Am Stat Assoc 1974;69:81-6.
18.  Anderson JR, Cain KC, Gelber RD. Analysis of survival by tumor response. J Clin Oncol 1983;1:710-9.
19.  Rebora P, Salim A, Reilly M. Bshazard: A flexible tool for nonparametric smoothing of the hazard function. R J 2014;6:114-22.
20.  Lu-Yao GL, Albertsen PC, Moore DF, Shih W, Lin Y, DiPaola RS, et al. Outcomes of localized prostate cancer following conservative management. JAMA 2009;302:1202-9.
21.  Epstein MM, Edgren G, Rider JR, Mucci LA, Adami HO. Temporal trends in cause of death among Swedish and US men with prostate cancer. J Natl Cancer Inst 2012;104:1335-42.
