Identifying rows with uniform sequence while ignoring missing data in R























I'm working with panel data where the same variable is recorded multiple times to create a sequence of states. I only want to use observations that do not have uniform sequences, but I am struggling to create a flag that identifies these without treating NAs as a different state.



I've created an example dataset to make things simple:



ID <- c(1,2,3,4,5,6,7,8,9,10)
S1 <- c("Education", "Employment", "Education", "Education", "Education", "Education", "Education", "Education", "Education", "Education")
S2 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Education", "Employment", "Education", "Education", "Education")
S3 <- c("Education", "Employment", "NA", "Unemployed", "Education", "Employment", "Employment", "NA", "Education", "Education")
S4 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Employment", "Employment", "NA", "Education", "Education")
S5 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Employment", "Employment", "NA", "Education", "Education")
df <- data.frame(ID, S1, S2, S3, S4, S5)
df

   ID         S1         S2         S3         S4         S5
1   1  Education  Education  Education  Education  Education
2   2 Employment Employment Employment Employment Employment
3   3  Education  Education         NA  Education  Education
4   4  Education Unemployed Unemployed Unemployed Unemployed
5   5  Education  Education  Education  Education  Education
6   6  Education  Education Employment Employment Employment
7   7  Education Employment Employment Employment Employment
8   8  Education  Education         NA         NA         NA
9   9  Education  Education  Education  Education  Education
10 10  Education  Education  Education  Education  Education


I'd ideally be able to flag or keep only observations ID=c("4", "6", "7").



I tried a couple of approaches:



I tried counting the consecutive states, but that doesn't account for the separate IDs:



library(data.table)

setDT(df_long)
df_long[, employed := (S=="Employment")
][, e.length := with(rle(employed), rep(lengths,lengths))
][employed == 0, e.length := 0]

df_long[, education := (S=="Education")
][, edu.length := with(rle(education), rep(lengths,lengths))
][education == 0, edu.length := 0]
df_long


I've also tried manually creating a flag variable, but that doesn't account for NAs, and with the number of repeated observations in my dataset it is too manual and time-consuming:



df$employed[df$S1=="Education" & df$S2=="Education" & df$S3=="Education" & df$S4=="Education" & df$S5=="Education"] <- 1
df$employed


Any help would be greatly appreciated.
































  • Could also vectorize as follows: which(rowSums((df[, 2] == df[, -(1:2)]) + (df[, -(1:2)] == "NA")) < 4) (but only if you create your data while specifying stringsAsFactors = FALSE)
    – David Arenburg
    Nov 8 at 11:38
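
The vectorized idea in the comment above can be sketched on a small matrix. This is only an illustration under assumptions that are not in the original comment: the states are stored with real NA values rather than the string "NA", the first column is never missing, and the names `m` and `changed` are made up here.

```r
# Small illustrative matrix of states; real NA values, first column assumed complete
m <- rbind(
  c("Education", "Education",  NA,           "Education"),
  c("Education", "Unemployed", "Unemployed", "Unemployed"),
  c("Education", "Education",  NA,           NA)
)

# Compare every column with the first one. Comparisons involving NA yield NA
# and are dropped by na.rm = TRUE, so missing cells never count as a change.
changed <- rowSums(m != m[, 1], na.rm = TRUE) > 0

which(changed)  # 2
```

Because the comparison is done on the whole matrix at once, no per-row function call is needed, and the hard-coded column count from the comment disappears.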







































r count sequence
















asked Nov 8 at 10:59
Maria
83


























3 Answers




















accepted










It's super easy:



df[df == "NA"] <- NA

df$keep <- lengths(apply(df[,-1],1, table)) > 1




#> which(df$keep)
#[1] 4 6 7
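
One caveat with this approach, offered as a hedged sketch rather than a correction: `apply()` returns a list only when the per-row tables differ in length. If every row happens to have the same number of distinct states (two or more), it simplifies to a matrix and `lengths()` no longer counts per row. Counting the distinct non-missing states directly sidesteps that. The helper name `n_distinct_states` is made up for illustration, and the data below assumes real NA values and `stringsAsFactors = FALSE`.

```r
# Example data as in the question, but with real NA values instead of the string "NA"
df <- data.frame(
  ID = 1:10,
  S1 = c("Education", "Employment", rep("Education", 8)),
  S2 = c("Education", "Employment", "Education", "Unemployed", "Education",
         "Education", "Employment", "Education", "Education", "Education"),
  S3 = c("Education", "Employment", NA, "Unemployed", "Education",
         "Employment", "Employment", NA, "Education", "Education"),
  S4 = c("Education", "Employment", "Education", "Unemployed", "Education",
         "Employment", "Employment", NA, "Education", "Education"),
  S5 = c("Education", "Employment", "Education", "Unemployed", "Education",
         "Employment", "Employment", NA, "Education", "Education"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of distinct non-missing states in one row
n_distinct_states <- function(x) length(unique(x[!is.na(x)]))

# The helper always returns a single number, so apply() returns a plain vector
df$keep <- apply(df[, -1], 1, n_distinct_states) > 1

unname(which(df$keep))  # rows 4, 6 and 7
```

The result matches the flag above for the example data; the difference only shows up on data where the per-row tables all have the same length.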






















  • That's amazing, thank you Andre
    – Maria
    Nov 8 at 11:12

































I had a similar solution, but without table:



df[df == "NA"] <- NA
df$to.keep <- apply(df[, -1], 1, function(x) {
  # TRUE only when the row has no NA at all and more than one distinct state
  !any(is.na(x)) & length(unique(x)) > 1
})

> which(df$to.keep)
[1] 4 6 7


























  • Please add S6 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Employment", "Employment", "EMP", "Education", "Education") to the data.frame. You will see that your solution does not work.
    – Andre Elrico
    Nov 8 at 11:27
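
To make the point of this comment concrete, here is a minimal sketch on a reduced data frame whose three rows are modelled on IDs 3, 4 and 8 once S6 is added. The names `states`, `strict` and `ignore_na` are made up for illustration, and real NA values with `stringsAsFactors = FALSE` are assumed.

```r
# Reduced example: rows modelled on IDs 3, 4 and 8 from the data with S6 added
states <- data.frame(
  ID = c(3, 4, 8),
  S1 = c("Education", "Education",  "Education"),
  S2 = c("Education", "Unemployed", "Education"),
  S3 = c(NA,          "Unemployed", NA),
  S6 = c("Education", "Unemployed", "EMP"),
  stringsAsFactors = FALSE
)

# Rule used in the answer above: any row containing an NA is dropped
strict <- apply(states[, -1], 1, function(x) {
  !any(is.na(x)) && length(unique(x)) > 1
})

# Alternative rule: skip the NAs, then count the distinct states that remain
ignore_na <- apply(states[, -1], 1, function(x) {
  length(unique(x[!is.na(x)])) > 1
})

states$ID[strict]     # 4
states$ID[ignore_na]  # 4 8
```

The two rules agree on the original five columns and only diverge for a row like ID 8, which changes state around a gap of missing values. Which result is wanted depends on how "ignoring NAs" is meant to be read.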

































ID <- c(1,2,3,4,5,6,7,8,9,10)
S1 <- c("Education", "Employment", "Education", "Education", "Education", "Education", "Education", "Education", "Education", "Education")
S2 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Education", "Employment", "Education", "Education", "Education")
S3 <- c("Education", "Employment", "NA", "Unemployed", "Education", "Employment", "Employment", "NA", "Education", "Education")
S4 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Employment", "Employment", "NA", "Education", "Education")
S5 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Employment", "Employment", "NA", "Education", "Education")
S6 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Employment", "Employment", "EMP", "Education", "Education")
df <- data.frame(ID, S1, S2, S3, S4, S5,S6)


I also added S6 from your comments, a case that Andre's answer is not able to label correctly.



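library(dplyr)
df[df == "NA"] <- NA

# Flag_NA is "Yes" when the row has no missing values
df$Flag_NA = ifelse(apply(df %>% select(-ID), 1, function(x) any(is.na(x))), 'No', 'Yes')
# Flag_Uniform is "Yes" when the row has more than one distinct value (NA counts as a value here)
df$Flag_Uniform = ifelse(apply(df %>% select(-ID, -Flag_NA), 1, function(x) length(unique(x))) == 1, 'No', 'Yes')
# Keep a row only when both flags are "Yes"; testing the flags for equality would also keep an all-NA row
df = df %>% mutate(Flag_keep = ifelse(Flag_NA == "Yes" & Flag_Uniform == "Yes", "Yes", "No"))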

df
   ID         S1         S2         S3         S4         S5         S6 Flag_NA Flag_Uniform Flag_keep
1   1  Education  Education  Education  Education  Education  Education     Yes           No        No
2   2 Employment Employment Employment Employment Employment Employment     Yes           No        No
3   3  Education  Education       <NA>  Education  Education  Education      No          Yes        No
4   4  Education Unemployed Unemployed Unemployed Unemployed Unemployed     Yes          Yes       Yes
5   5  Education  Education  Education  Education  Education  Education     Yes           No        No
6   6  Education  Education Employment Employment Employment Employment     Yes          Yes       Yes
7   7  Education Employment Employment Employment Employment Employment     Yes          Yes       Yes
8   8  Education  Education       <NA>       <NA>       <NA>        EMP      No          Yes        No
9   9  Education  Education  Education  Education  Education  Education     Yes           No        No
10 10  Education  Education  Education  Education  Education  Education     Yes           No        No















answered Nov 8 at 11:07
Andre Elrico
4,6971827


















answered Nov 8 at 11:17
Gramposity
23614






















answered Nov 8 at 11:59
Sai Prabhanjan Reddy
1829






























             
