R draw survival curve and calculate P-value at specific times











up vote
2
down vote

favorite
1












I am trying to figure out how to generate a survival curve and calculate P-value of a specific time point and not of the entire survival curve.



I use the surv and survfit methods from packages survminer, survival to create the survival object and ggsurvplot to draw the curve and it's p-value.



df_surv <- Surv(time = df$diff_in_days, event = df$survivalstat)
df_survfit <- survfit(dat_surv ~ Schedule, data = df)

ggsurvplot(
df_survfit ,
data = df,
pval = TRUE
)


Now it calculates the p-value over the entire curve of 2500+ days. I would also like to calculate the P-value at exact intervals. Let's say I would like to know the survival probability at / up-to 365 days.



I can't simply cut-off all records which have survival times longer than x (e.g. 365) days, as below. Then all survival probability drops to 0% since subjects who had the event occur later than 365 aren't taken into account.



the event hasn't there also isn't anybody alive beyond x days anymore.



df <- df[df$diff_in_days <= 365, ]


How can I calculate the P-value at a specific time from the overall curve?



The dput(head(df) of my dataframe for reproducible example.



structure(list(diff_in_days = structure(c(2160, 84, 273, 1245, 
2175, 114), class = "difftime", units = "days"), Schedule = c(1,
1, 1, 2, 2, 2), survivalstat = c(0, 1, 1, 0, 1, 1)), row.names = c(12L,
28L, 33L, 38L, 58L, 62L), class = "data.frame")


My dataframe




  • UID (each row is a new entry)

  • Event occurrence no/yes (0,1)

  • Integer amount of days till event happened (if occurence didnt happen yet, the days from start of monitoring till current is calculated (right-censoring))


EDIT:



using following code to set everybody's event occurance to 0 after 365 days.



dat$survivalstat <- ifelse(dat$diff_in_days > 365, 0, dat$survivalstat)


It does calculate the p-value but still over the entire curve. After 365 days it stays horizontal till the end at 2500+ days (since no events occur) and those events after 365 days are all still taken into account because theyre still in the curve. (I assume that even tho all the datapoints after 365 are the same, they still affect the P-value?)










share|improve this question




















  • 2




    Could you please provide a minimal reproducible example?
    – Mr_Z
    Nov 9 at 9:21










  • I can't simply cut-off all records which have survival times longer than x... since subjects who had the event occur later than 365 aren't taken into account. And if you set the survival status 0 for these subjects?
    – Stéphane Laurent
    Nov 15 at 13:12










  • i've edited my post. It does calculate the p-value but still over the entire curve. After 365 days it stays horizontal till the end at 2500+ days (since no events occur) and those events after 365 days are all still taken into account because theyre still in the curve.
    – Krijn van der Burg
    Nov 15 at 13:57










  • This is more of a statistics question than a programming question and should be moved to cross validated
    – Mike
    Nov 15 at 15:31















up vote
2
down vote

favorite
1












I am trying to figure out how to generate a survival curve and calculate P-value of a specific time point and not of the entire survival curve.



I use the surv and survfit methods from packages survminer, survival to create the survival object and ggsurvplot to draw the curve and it's p-value.



df_surv <- Surv(time = df$diff_in_days, event = df$survivalstat)
df_survfit <- survfit(dat_surv ~ Schedule, data = df)

ggsurvplot(
df_survfit ,
data = df,
pval = TRUE
)


Now it calculates the p-value over the entire curve of 2500+ days. I would also like to calculate the P-value at exact intervals. Let's say I would like to know the survival probability at / up-to 365 days.



I can't simply cut-off all records which have survival times longer than x (e.g. 365) days, as below. Then all survival probability drops to 0% since subjects who had the event occur later than 365 aren't taken into account.



the event hasn't there also isn't anybody alive beyond x days anymore.



df <- df[df$diff_in_days <= 365, ]


How can I calculate the P-value at a specific time from the overall curve?



The dput(head(df) of my dataframe for reproducible example.



structure(list(diff_in_days = structure(c(2160, 84, 273, 1245, 
2175, 114), class = "difftime", units = "days"), Schedule = c(1,
1, 1, 2, 2, 2), survivalstat = c(0, 1, 1, 0, 1, 1)), row.names = c(12L,
28L, 33L, 38L, 58L, 62L), class = "data.frame")


My dataframe




  • UID (each row is a new entry)

  • Event occurrence no/yes (0,1)

  • Integer amount of days till event happened (if occurence didnt happen yet, the days from start of monitoring till current is calculated (right-censoring))


EDIT:



using following code to set everybody's event occurance to 0 after 365 days.



dat$survivalstat <- ifelse(dat$diff_in_days > 365, 0, dat$survivalstat)


It does calculate the p-value but still over the entire curve. After 365 days it stays horizontal till the end at 2500+ days (since no events occur) and those events after 365 days are all still taken into account because theyre still in the curve. (I assume that even tho all the datapoints after 365 are the same, they still affect the P-value?)










share|improve this question




















  • 2




    Could you please provide a minimal reproducible example?
    – Mr_Z
    Nov 9 at 9:21










  • I can't simply cut-off all records which have survival times longer than x... since subjects who had the event occur later than 365 aren't taken into account. And if you set the survival status 0 for these subjects?
    – Stéphane Laurent
    Nov 15 at 13:12










  • i've edited my post. It does calculate the p-value but still over the entire curve. After 365 days it stays horizontal till the end at 2500+ days (since no events occur) and those events after 365 days are all still taken into account because theyre still in the curve.
    – Krijn van der Burg
    Nov 15 at 13:57










  • This is more of a statistics question than a programming question and should be moved to cross validated
    – Mike
    Nov 15 at 15:31













up vote
2
down vote

favorite
1









up vote
2
down vote

favorite
1






1





I am trying to figure out how to generate a survival curve and calculate P-value of a specific time point and not of the entire survival curve.



I use the surv and survfit methods from packages survminer, survival to create the survival object and ggsurvplot to draw the curve and it's p-value.



df_surv <- Surv(time = df$diff_in_days, event = df$survivalstat)
df_survfit <- survfit(dat_surv ~ Schedule, data = df)

ggsurvplot(
df_survfit ,
data = df,
pval = TRUE
)


Now it calculates the p-value over the entire curve of 2500+ days. I would also like to calculate the P-value at exact intervals. Let's say I would like to know the survival probability at / up-to 365 days.



I can't simply cut-off all records which have survival times longer than x (e.g. 365) days, as below. Then all survival probability drops to 0% since subjects who had the event occur later than 365 aren't taken into account.



the event hasn't there also isn't anybody alive beyond x days anymore.



df <- df[df$diff_in_days <= 365, ]


How can I calculate the P-value at a specific time from the overall curve?



The dput(head(df) of my dataframe for reproducible example.



structure(list(diff_in_days = structure(c(2160, 84, 273, 1245, 
2175, 114), class = "difftime", units = "days"), Schedule = c(1,
1, 1, 2, 2, 2), survivalstat = c(0, 1, 1, 0, 1, 1)), row.names = c(12L,
28L, 33L, 38L, 58L, 62L), class = "data.frame")


My dataframe




  • UID (each row is a new entry)

  • Event occurrence no/yes (0,1)

  • Integer amount of days till event happened (if occurence didnt happen yet, the days from start of monitoring till current is calculated (right-censoring))


EDIT:



using following code to set everybody's event occurance to 0 after 365 days.



dat$survivalstat <- ifelse(dat$diff_in_days > 365, 0, dat$survivalstat)


It does calculate the p-value but still over the entire curve. After 365 days it stays horizontal till the end at 2500+ days (since no events occur) and those events after 365 days are all still taken into account because theyre still in the curve. (I assume that even tho all the datapoints after 365 are the same, they still affect the P-value?)










share|improve this question















I am trying to figure out how to generate a survival curve and calculate P-value of a specific time point and not of the entire survival curve.



I use the surv and survfit methods from packages survminer, survival to create the survival object and ggsurvplot to draw the curve and it's p-value.



df_surv <- Surv(time = df$diff_in_days, event = df$survivalstat)
df_survfit <- survfit(dat_surv ~ Schedule, data = df)

ggsurvplot(
df_survfit ,
data = df,
pval = TRUE
)


Now it calculates the p-value over the entire curve of 2500+ days. I would also like to calculate the P-value at exact intervals. Let's say I would like to know the survival probability at / up-to 365 days.



I can't simply cut-off all records which have survival times longer than x (e.g. 365) days, as below. Then all survival probability drops to 0% since subjects who had the event occur later than 365 aren't taken into account.



the event hasn't there also isn't anybody alive beyond x days anymore.



df <- df[df$diff_in_days <= 365, ]


How can I calculate the P-value at a specific time from the overall curve?



The dput(head(df) of my dataframe for reproducible example.



structure(list(diff_in_days = structure(c(2160, 84, 273, 1245, 
2175, 114), class = "difftime", units = "days"), Schedule = c(1,
1, 1, 2, 2, 2), survivalstat = c(0, 1, 1, 0, 1, 1)), row.names = c(12L,
28L, 33L, 38L, 58L, 62L), class = "data.frame")


My dataframe




  • UID (each row is a new entry)

  • Event occurrence no/yes (0,1)

  • Integer amount of days till event happened (if occurence didnt happen yet, the days from start of monitoring till current is calculated (right-censoring))


EDIT:



using following code to set everybody's event occurance to 0 after 365 days.



dat$survivalstat <- ifelse(dat$diff_in_days > 365, 0, dat$survivalstat)


It does calculate the p-value but still over the entire curve. After 365 days it stays horizontal till the end at 2500+ days (since no events occur) and those events after 365 days are all still taken into account because theyre still in the curve. (I assume that even tho all the datapoints after 365 are the same, they still affect the P-value?)







r dataframe survival-analysis






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 15 at 13:53

























asked Nov 9 at 8:23









Krijn van der Burg

1451223




1451223








  • 2




    Could you please provide a minimal reproducible example?
    – Mr_Z
    Nov 9 at 9:21










  • I can't simply cut-off all records which have survival times longer than x... since subjects who had the event occur later than 365 aren't taken into account. And if you set the survival status 0 for these subjects?
    – Stéphane Laurent
    Nov 15 at 13:12










  • i've edited my post. It does calculate the p-value but still over the entire curve. After 365 days it stays horizontal till the end at 2500+ days (since no events occur) and those events after 365 days are all still taken into account because theyre still in the curve.
    – Krijn van der Burg
    Nov 15 at 13:57










  • This is more of a statistics question than a programming question and should be moved to cross validated
    – Mike
    Nov 15 at 15:31














  • 2




    Could you please provide a minimal reproducible example?
    – Mr_Z
    Nov 9 at 9:21










  • I can't simply cut-off all records which have survival times longer than x... since subjects who had the event occur later than 365 aren't taken into account. And if you set the survival status 0 for these subjects?
    – Stéphane Laurent
    Nov 15 at 13:12










  • i've edited my post. It does calculate the p-value but still over the entire curve. After 365 days it stays horizontal till the end at 2500+ days (since no events occur) and those events after 365 days are all still taken into account because theyre still in the curve.
    – Krijn van der Burg
    Nov 15 at 13:57










  • This is more of a statistics question than a programming question and should be moved to cross validated
    – Mike
    Nov 15 at 15:31








2




2




Could you please provide a minimal reproducible example?
– Mr_Z
Nov 9 at 9:21




Could you please provide a minimal reproducible example?
– Mr_Z
Nov 9 at 9:21












I can't simply cut-off all records which have survival times longer than x... since subjects who had the event occur later than 365 aren't taken into account. And if you set the survival status 0 for these subjects?
– Stéphane Laurent
Nov 15 at 13:12




I can't simply cut-off all records which have survival times longer than x... since subjects who had the event occur later than 365 aren't taken into account. And if you set the survival status 0 for these subjects?
– Stéphane Laurent
Nov 15 at 13:12












i've edited my post. It does calculate the p-value but still over the entire curve. After 365 days it stays horizontal till the end at 2500+ days (since no events occur) and those events after 365 days are all still taken into account because theyre still in the curve.
– Krijn van der Burg
Nov 15 at 13:57




i've edited my post. It does calculate the p-value but still over the entire curve. After 365 days it stays horizontal till the end at 2500+ days (since no events occur) and those events after 365 days are all still taken into account because theyre still in the curve.
– Krijn van der Burg
Nov 15 at 13:57












This is more of a statistics question than a programming question and should be moved to cross validated
– Mike
Nov 15 at 15:31




This is more of a statistics question than a programming question and should be moved to cross validated
– Mike
Nov 15 at 15:31












1 Answer
1






active

oldest

votes

















up vote
2
down vote



accepted
+50










If you want a p-value at a specific time point you can do a z-test at a particular time point. In my example below I used the lung data set from the survival package. For better help to see if this method is appropriate I would post this question on cross validated.



library(survival)
library(dplyr)
library(broom)
library(ggplot2)
fit1 <- survfit(Surv(time,status)~sex,data = lung)
#turn into df
df <- broom::tidy(fit1)

fit_df <- df %>%
#group by strata
group_by(strata) %>%
#get day of interest or day before it
filter(time <= 365) %>%
arrange(time) %>%
# pulls last date
do(tail(.,1))

#calculate z score based on 2 sample test at that time point
z <- (fit_df$estimate[1]-fit_df$estimate[2]) /
(sqrt( fit_df$std.error[1]^2+ fit_df$std.error[2]^2))
#get probability of z score
pz <- pnorm(abs(z))
#get p value
pvalue <- round(2 * (1-pz),2)



ggplot(data = df, aes(x=time, y=estimate, group=strata, color= strata)) +
geom_line(size = 1.5)+
geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2)+
geom_vline(aes(xintercept=365))+
geom_text(aes(x = 500,y=.8,label = paste0("p = " ,pvalue) ))+
scale_y_continuous("Survival",
limits = c(0,1))+
scale_x_continuous("Time")+
scale_color_manual(" ", values = c("grey", "blue"))+
scale_fill_discrete(guide = FALSE)+
theme(axis.text.x = element_text(angle = 45, hjust = 1, size=14),
axis.title.x = element_text(size =14),
axis.text.y = element_text(size = 14),
strip.text.x = element_text(size=14),
axis.title.y = element_blank())+
theme_bw()


enter image description here



Update - getting p-value up to a specific time point using log rank



#First censor and make follow time to the time point of interest 
lung2 <- lung %>%
mutate(time2 = ifelse(time >= 365, 365, time),
status2 = ifelse(time >= 365, 1,status))
#Compute log rank test using survdiff
sdf <- survdiff(Surv(time2,status2)~sex,data = lung2)
#extract p-value
p.val <- round(1 - pchisq(sdf$chisq, length(sdf$n) - 1),3)


In the ggplot code above you can replace pvalue with p.val so it shows the log rank score.






share|improve this answer























  • Thank you very much! this is exactly what I was looking for. Though I was hoping to have the P-value calculated using log-rank. Could you perhaps refer me a link where I can calculate the P-value using log-rank using your method. Thanks again!
    – Krijn van der Burg
    Nov 16 at 12:26










  • @KrijnvanderBurg I updated the post to show how to get a log rank p-value at a specific time point
    – Mike
    Nov 16 at 14:37











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














 

draft saved


draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53222122%2fr-draw-survival-curve-and-calculate-p-value-at-specific-times%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
2
down vote



accepted
+50










If you want a p-value at a specific time point you can do a z-test at a particular time point. In my example below I used the lung data set from the survival package. For better help to see if this method is appropriate I would post this question on cross validated.



library(survival)
library(dplyr)
library(broom)
library(ggplot2)
fit1 <- survfit(Surv(time,status)~sex,data = lung)
#turn into df
df <- broom::tidy(fit1)

fit_df <- df %>%
#group by strata
group_by(strata) %>%
#get day of interest or day before it
filter(time <= 365) %>%
arrange(time) %>%
# pulls last date
do(tail(.,1))

#calculate z score based on 2 sample test at that time point
z <- (fit_df$estimate[1]-fit_df$estimate[2]) /
(sqrt( fit_df$std.error[1]^2+ fit_df$std.error[2]^2))
#get probability of z score
pz <- pnorm(abs(z))
#get p value
pvalue <- round(2 * (1-pz),2)



ggplot(data = df, aes(x=time, y=estimate, group=strata, color= strata)) +
geom_line(size = 1.5)+
geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2)+
geom_vline(aes(xintercept=365))+
geom_text(aes(x = 500,y=.8,label = paste0("p = " ,pvalue) ))+
scale_y_continuous("Survival",
limits = c(0,1))+
scale_x_continuous("Time")+
scale_color_manual(" ", values = c("grey", "blue"))+
scale_fill_discrete(guide = FALSE)+
theme(axis.text.x = element_text(angle = 45, hjust = 1, size=14),
axis.title.x = element_text(size =14),
axis.text.y = element_text(size = 14),
strip.text.x = element_text(size=14),
axis.title.y = element_blank())+
theme_bw()


enter image description here



Update - getting p-value up to a specific time point using log rank



#First censor and make follow time to the time point of interest 
lung2 <- lung %>%
mutate(time2 = ifelse(time >= 365, 365, time),
status2 = ifelse(time >= 365, 1,status))
#Compute log rank test using survdiff
sdf <- survdiff(Surv(time2,status2)~sex,data = lung2)
#extract p-value
p.val <- round(1 - pchisq(sdf$chisq, length(sdf$n) - 1),3)


In the ggplot code above you can replace pvalue with p.val so it shows the log rank score.






share|improve this answer























  • Thank you very much! this is exactly what I was looking for. Though I was hoping to have the P-value calculated using log-rank. Could you perhaps refer me a link where I can calculate the P-value using log-rank using your method. Thanks again!
    – Krijn van der Burg
    Nov 16 at 12:26










  • @KrijnvanderBurg I updated the post to show how to get a log rank p-value at a specific time point
    – Mike
    Nov 16 at 14:37















up vote
2
down vote



accepted
+50










If you want a p-value at a specific time point you can do a z-test at a particular time point. In my example below I used the lung data set from the survival package. For better help to see if this method is appropriate I would post this question on cross validated.



library(survival)
library(dplyr)
library(broom)
library(ggplot2)
fit1 <- survfit(Surv(time,status)~sex,data = lung)
#turn into df
df <- broom::tidy(fit1)

fit_df <- df %>%
#group by strata
group_by(strata) %>%
#get day of interest or day before it
filter(time <= 365) %>%
arrange(time) %>%
# pulls last date
do(tail(.,1))

#calculate z score based on 2 sample test at that time point
z <- (fit_df$estimate[1]-fit_df$estimate[2]) /
(sqrt( fit_df$std.error[1]^2+ fit_df$std.error[2]^2))
#get probability of z score
pz <- pnorm(abs(z))
#get p value
pvalue <- round(2 * (1-pz),2)



ggplot(data = df, aes(x=time, y=estimate, group=strata, color= strata)) +
geom_line(size = 1.5)+
geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2)+
geom_vline(aes(xintercept=365))+
geom_text(aes(x = 500,y=.8,label = paste0("p = " ,pvalue) ))+
scale_y_continuous("Survival",
limits = c(0,1))+
scale_x_continuous("Time")+
scale_color_manual(" ", values = c("grey", "blue"))+
scale_fill_discrete(guide = FALSE)+
theme(axis.text.x = element_text(angle = 45, hjust = 1, size=14),
axis.title.x = element_text(size =14),
axis.text.y = element_text(size = 14),
strip.text.x = element_text(size=14),
axis.title.y = element_blank())+
theme_bw()


enter image description here



Update - getting p-value up to a specific time point using log rank



#First censor and make follow time to the time point of interest 
lung2 <- lung %>%
mutate(time2 = ifelse(time >= 365, 365, time),
status2 = ifelse(time >= 365, 1,status))
#Compute log rank test using survdiff
sdf <- survdiff(Surv(time2,status2)~sex,data = lung2)
#extract p-value
p.val <- round(1 - pchisq(sdf$chisq, length(sdf$n) - 1),3)


In the ggplot code above you can replace pvalue with p.val so it shows the log rank score.






share|improve this answer























  • Thank you very much! this is exactly what I was looking for. Though I was hoping to have the P-value calculated using log-rank. Could you perhaps refer me a link where I can calculate the P-value using log-rank using your method. Thanks again!
    – Krijn van der Burg
    Nov 16 at 12:26










  • @KrijnvanderBurg I updated the post to show how to get a log rank p-value at a specific time point
    – Mike
    Nov 16 at 14:37













up vote
2
down vote



accepted
+50







up vote
2
down vote



accepted
+50




+50




If you want a p-value at a specific time point you can do a z-test at a particular time point. In my example below I used the lung data set from the survival package. For better help to see if this method is appropriate I would post this question on cross validated.



library(survival)
library(dplyr)
library(broom)
library(ggplot2)
fit1 <- survfit(Surv(time,status)~sex,data = lung)
#turn into df
df <- broom::tidy(fit1)

fit_df <- df %>%
#group by strata
group_by(strata) %>%
#get day of interest or day before it
filter(time <= 365) %>%
arrange(time) %>%
# pulls last date
do(tail(.,1))

#calculate z score based on 2 sample test at that time point
z <- (fit_df$estimate[1]-fit_df$estimate[2]) /
(sqrt( fit_df$std.error[1]^2+ fit_df$std.error[2]^2))
#get probability of z score
pz <- pnorm(abs(z))
#get p value
pvalue <- round(2 * (1-pz),2)



ggplot(data = df, aes(x=time, y=estimate, group=strata, color= strata)) +
geom_line(size = 1.5)+
geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2)+
geom_vline(aes(xintercept=365))+
geom_text(aes(x = 500,y=.8,label = paste0("p = " ,pvalue) ))+
scale_y_continuous("Survival",
limits = c(0,1))+
scale_x_continuous("Time")+
scale_color_manual(" ", values = c("grey", "blue"))+
scale_fill_discrete(guide = FALSE)+
theme(axis.text.x = element_text(angle = 45, hjust = 1, size=14),
axis.title.x = element_text(size =14),
axis.text.y = element_text(size = 14),
strip.text.x = element_text(size=14),
axis.title.y = element_blank())+
theme_bw()


enter image description here



Update - getting p-value up to a specific time point using log rank



#First censor and make follow time to the time point of interest 
lung2 <- lung %>%
mutate(time2 = ifelse(time >= 365, 365, time),
status2 = ifelse(time >= 365, 1,status))
#Compute log rank test using survdiff
sdf <- survdiff(Surv(time2,status2)~sex,data = lung2)
#extract p-value
p.val <- round(1 - pchisq(sdf$chisq, length(sdf$n) - 1),3)


In the ggplot code above you can replace pvalue with p.val so it shows the log rank score.






share|improve this answer














If you want a p-value at a specific time point you can do a z-test at a particular time point. In my example below I used the lung data set from the survival package. For better help to see if this method is appropriate I would post this question on cross validated.



library(survival)
library(dplyr)
library(broom)
library(ggplot2)
fit1 <- survfit(Surv(time,status)~sex,data = lung)
#turn into df
df <- broom::tidy(fit1)

fit_df <- df %>%
#group by strata
group_by(strata) %>%
#get day of interest or day before it
filter(time <= 365) %>%
arrange(time) %>%
# pulls last date
do(tail(.,1))

#calculate z score based on 2 sample test at that time point
z <- (fit_df$estimate[1]-fit_df$estimate[2]) /
(sqrt( fit_df$std.error[1]^2+ fit_df$std.error[2]^2))
#get probability of z score
pz <- pnorm(abs(z))
#get p value
pvalue <- round(2 * (1-pz),2)



ggplot(data = df, aes(x=time, y=estimate, group=strata, color= strata)) +
geom_line(size = 1.5)+
geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2)+
geom_vline(aes(xintercept=365))+
geom_text(aes(x = 500,y=.8,label = paste0("p = " ,pvalue) ))+
scale_y_continuous("Survival",
limits = c(0,1))+
scale_x_continuous("Time")+
scale_color_manual(" ", values = c("grey", "blue"))+
scale_fill_discrete(guide = FALSE)+
theme(axis.text.x = element_text(angle = 45, hjust = 1, size=14),
axis.title.x = element_text(size =14),
axis.text.y = element_text(size = 14),
strip.text.x = element_text(size=14),
axis.title.y = element_blank())+
theme_bw()


enter image description here



Update - getting p-value up to a specific time point using log rank



#First censor and make follow time to the time point of interest 
lung2 <- lung %>%
mutate(time2 = ifelse(time >= 365, 365, time),
status2 = ifelse(time >= 365, 1,status))
#Compute log rank test using survdiff
sdf <- survdiff(Surv(time2,status2)~sex,data = lung2)
#extract p-value
p.val <- round(1 - pchisq(sdf$chisq, length(sdf$n) - 1),3)


In the ggplot code above you can replace pvalue with p.val so it shows the log rank score.







share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 16 at 14:36

























answered Nov 15 at 15:33









Mike

809316




809316












  • Thank you very much! this is exactly what I was looking for. Though I was hoping to have the P-value calculated using log-rank. Could you perhaps refer me a link where I can calculate the P-value using log-rank using your method. Thanks again!
    – Krijn van der Burg
    Nov 16 at 12:26










  • @KrijnvanderBurg I updated the post to show how to get a log rank p-value at a specific time point
    – Mike
    Nov 16 at 14:37


















  • Thank you very much! this is exactly what I was looking for. Though I was hoping to have the P-value calculated using log-rank. Could you perhaps refer me a link where I can calculate the P-value using log-rank using your method. Thanks again!
    – Krijn van der Burg
    Nov 16 at 12:26










  • @KrijnvanderBurg I updated the post to show how to get a log rank p-value at a specific time point
    – Mike
    Nov 16 at 14:37
















Thank you very much! this is exactly what I was looking for. Though I was hoping to have the P-value calculated using log-rank. Could you perhaps refer me a link where I can calculate the P-value using log-rank using your method. Thanks again!
– Krijn van der Burg
Nov 16 at 12:26




Thank you very much! this is exactly what I was looking for. Though I was hoping to have the P-value calculated using log-rank. Could you perhaps refer me a link where I can calculate the P-value using log-rank using your method. Thanks again!
– Krijn van der Burg
Nov 16 at 12:26












@KrijnvanderBurg I updated the post to show how to get a log rank p-value at a specific time point
– Mike
Nov 16 at 14:37




@KrijnvanderBurg I updated the post to show how to get a log rank p-value at a specific time point
– Mike
Nov 16 at 14:37


















 

draft saved


draft discarded



















































 


draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53222122%2fr-draw-survival-curve-and-calculate-p-value-at-specific-times%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Schultheiß

Verwaltungsgliederung Dänemarks

Liste der Kulturdenkmale in Wilsdruff