Identifying rows with uniform sequence while ignoring missing data in R
I'm working with panel data where the same variable is recorded multiple times to create a sequence of states. I only want to use observations that do not have uniform sequences, but I am struggling to create a flag that identifies these without treating NAs as a different state.
I've created an example dataset to make things simple:
ID <- c(1,2,3,4,5,6,7,8,9,10)
S1 <- c("Education", "Employment", "Education", "Education", "Education", "Education", "Education", "Education", "Education", "Education")
S2 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Education", "Employment", "Education", "Education", "Education")
S3 <- c("Education", "Employment", "NA", "Unemployed", "Education", "Employment", "Employment", "NA", "Education", "Education")
S4 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Employment", "Employment", "NA", "Education", "Education")
S5 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Employment", "Employment", "NA", "Education", "Education")
df <- data.frame(ID, S1, S2, S3, S4, S5)
df
   ID         S1         S2         S3         S4         S5
1   1  Education  Education  Education  Education  Education
2   2 Employment Employment Employment Employment Employment
3   3  Education  Education         NA  Education  Education
4   4  Education Unemployed Unemployed Unemployed Unemployed
5   5  Education  Education  Education  Education  Education
6   6  Education  Education Employment Employment Employment
7   7  Education Employment Employment Employment Employment
8   8  Education  Education         NA         NA         NA
9   9  Education  Education  Education  Education  Education
10 10  Education  Education  Education  Education  Education
I'd ideally be able to flag or keep only observations ID=c("4", "6", "7").
I tried a couple of approaches:
I tried counting the consecutive states on a long-format version of the data, but that doesn't account for the separate IDs:
library(data.table)
setDT(df_long)
df_long[, employed := (S=="Employment")
][, e.length := with(rle(employed), rep(lengths,lengths))
][employed == 0, e.length := 0]
df_long[, education := (S=="Education")
][, edu.length := with(rle(education), rep(lengths,lengths))
][education == 0, edu.length := 0]
df_long
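For reference, the attempt above can be made to respect the separate IDs by grouping with `by = ID`. The following is only a sketch under assumptions: the small table `dt`, the column names `wave` and `S`, and the object `counts` are made up for illustration, and missing values are assumed to be real `NA` rather than the string "NA".

```r
library(data.table)

# Hypothetical wide data: one row per ID, one column per time point
dt <- data.table(
  ID = 1:4,
  S1 = c("Education", "Education", "Education", "Education"),
  S2 = c("Education", "Unemployed", NA, "Education"),
  S3 = c("Education", "Unemployed", "Education", "Employment")
)

# Reshape to long format: one row per ID and time point
dt_long <- melt(dt, id.vars = "ID", variable.name = "wave", value.name = "S")

# Count the distinct non-missing states within each ID
counts <- dt_long[, .(n_states = uniqueN(S, na.rm = TRUE)), by = ID]

# IDs with more than one distinct state have a non-uniform sequence
keep_ids <- counts[n_states > 1, ID]
keep_ids
```

Here ID 3 has a gap but only one observed state, so it is not flagged; IDs 2 and 4 change state and are kept.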
I've also tried manually creating a flag variable, but that doesn't account for NAs, and with the number of repeated observations in my dataset it is too manual and time-consuming:
df$employed[df$S1=="Education" & df$S2=="Education" & df$S3=="Education" & df$S4=="Education" & df$S5=="Education"] <- 1
df$employed
Any help would be greatly appreciated.
r count sequence
asked Nov 8 at 10:59
Maria
Could also vectorize as follows:
which(rowSums((df[, 2] == df[, -(1:2)]) + (df[, -(1:2)] == "NA")) < 4)
(but only if you create your data while specifying stringsAsFactors = FALSE)
– David Arenburg
Nov 8 at 11:38
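A self-contained sketch of the vectorized idea from this comment, assuming the example data is built with `stringsAsFactors = FALSE` and missing values are still the literal string "NA". The helper names `states`, `first`, `rest`, `ok` and `flagged` are illustrative, and the hard-coded 4 is replaced by the number of compared columns.

```r
# Example data with character columns; "NA" is a literal string here
df <- data.frame(
  ID = 1:10,
  S1 = c("Education", "Employment", rep("Education", 8)),
  S2 = c("Education", "Employment", "Education", "Unemployed", "Education",
         "Education", "Employment", "Education", "Education", "Education"),
  S3 = c("Education", "Employment", "NA", "Unemployed", "Education",
         "Employment", "Employment", "NA", "Education", "Education"),
  S4 = c("Education", "Employment", "Education", "Unemployed", "Education",
         "Employment", "Employment", "NA", "Education", "Education"),
  S5 = c("Education", "Employment", "Education", "Unemployed", "Education",
         "Employment", "Employment", "NA", "Education", "Education"),
  stringsAsFactors = FALSE
)

states <- df[, -1]     # drop the ID column
first  <- states[, 1]  # S1 serves as the reference state
rest   <- states[, -1] # S2 to S5

# A cell is fine if it matches the first state or is the "NA" placeholder
ok <- (rest == first) | (rest == "NA")

# Rows with at least one cell that is neither are non-uniform
flagged <- unname(which(rowSums(ok) < ncol(rest)))
flagged
```

One limitation of this sketch: it uses S1 as the reference, so a row whose first state is itself "NA" would need separate handling.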
3 Answers
accepted
It's super easy:
df[df == "NA"] <- NA
df$keep <- lengths(apply(df[, -1], 1, table)) > 1
#> which(df$keep)
#[1] 4 6 7
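A possible caveat, offered as a sketch rather than a correction: as far as I can tell, `apply()` simplifies its result to a matrix when every row yields a table of the same length greater than one, in which case `lengths()` no longer returns one value per row. A variant that returns a single count per row avoids this. The toy data and the helper `n_states` below are assumptions for illustration, with missing values as real `NA`.

```r
# Hypothetical helper: number of distinct non-missing states in one row
n_states <- function(x) length(unique(x[!is.na(x)]))

# Toy data in which missing values are real NA
df <- data.frame(
  ID = 1:3,
  S1 = c("Education", "Education", "Education"),
  S2 = c("Education", NA, "Employment"),
  S3 = c("Education", "Education", "Employment"),
  stringsAsFactors = FALSE
)

state_cols <- setdiff(names(df), "ID")

# The function returns a scalar, so apply() always gives one value per row
df$keep <- apply(df[state_cols], 1, n_states) > 1
kept <- unname(which(df$keep))
kept
```

Only row 3 changes state here; row 2 has a gap but a single observed state, so it is not flagged.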
answered Nov 8 at 11:07
Andre Elrico
That's amazing, thank you Andre
– Maria
Nov 8 at 11:12
I had a similar solution, but without table:
df[df == "NA"] <- NA
df$to.keep <- apply(df[, -1], 1, function(x) {
  !any(is.na(x)) & length(unique(x)) > 1
})
> which(df$to.keep)
[1] 4 6 7
answered Nov 8 at 11:17
Gramposity
Please add
S6 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Employment", "Employment", "EMP", "Education", "Education")
to the data.frame. You will see "your solution" will not work.
– Andre Elrico
Nov 8 at 11:27
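To make the disagreement concrete, here is a small sketch using only rows 3, 4 and 8 of the example with the extra S6 column. It contrasts two readings of "ignoring NAs": skipping the missing cells, versus excluding any row that contains one. The names `states`, `n_distinct_obs` and `has_na` are illustrative, and missing values are assumed to be real `NA`.

```r
# Rows 3, 4 and 8 of the example, including the extra S6 column
df <- data.frame(
  ID = c(3, 4, 8),
  S1 = c("Education", "Education", "Education"),
  S2 = c("Education", "Unemployed", "Education"),
  S3 = c(NA, "Unemployed", NA),
  S4 = c("Education", "Unemployed", NA),
  S5 = c("Education", "Unemployed", NA),
  S6 = c("Education", "Unemployed", "EMP"),
  stringsAsFactors = FALSE
)

states <- df[, -1]

# Distinct observed states per row, skipping missing cells
n_distinct_obs <- apply(states, 1, function(x) length(unique(x[!is.na(x)])))

# Whether a row contains any missing cell
has_na <- apply(states, 1, anyNA)

ids_skip_na <- df$ID[n_distinct_obs > 1]            # NA cells skipped
ids_drop_na <- df$ID[n_distinct_obs > 1 & !has_na]  # rows with NA excluded
ids_skip_na
ids_drop_na
```

With S6 added, ID 8 moves from Education to EMP across a gap, so the first rule keeps it and the second does not. Which result is wanted depends on how gaps should be treated in the analysis.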
ID <- c(1,2,3,4,5,6,7,8,9,10)
S1 <- c("Education", "Employment", "Education", "Education", "Education", "Education", "Education", "Education", "Education", "Education")
S2 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Education", "Employment", "Education", "Education", "Education")
S3 <- c("Education", "Employment", "NA", "Unemployed", "Education", "Employment", "Employment", "NA", "Education", "Education")
S4 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Employment", "Employment", "NA", "Education", "Education")
S5 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Employment", "Employment", "NA", "Education", "Education")
S6 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Employment", "Employment", "EMP", "Education", "Education")
df <- data.frame(ID, S1, S2, S3, S4, S5,S6)
I also added S6 from the comments, a case that Andre's answer is not able to label correctly.
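library(dplyr)
df[df == "NA"] <- NA
# Flag_NA is 'Yes' when the row has no missing values
df$Flag_NA = ifelse(apply(df %>% select(-ID), 1, function(x) any(is.na(x))), 'No', 'Yes')
# Flag_Uniform is 'Yes' when the row holds more than one distinct value, i.e. is not uniform
df$Flag_Uniform = ifelse(apply(df %>% select(-ID, -Flag_NA), 1, function(x) length(unique(x))) == 1, 'No', 'Yes')
# Keep only rows where both flags are 'Yes'
df = df %>% mutate(Flag_keep = ifelse(Flag_NA == "Yes" & Flag_Uniform == "Yes", "Yes", "No"))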
df
   ID         S1         S2         S3         S4         S5         S6 Flag_NA Flag_Uniform Flag_keep
1   1  Education  Education  Education  Education  Education  Education     Yes           No        No
2   2 Employment Employment Employment Employment Employment Employment     Yes           No        No
3   3  Education  Education       <NA>  Education  Education  Education      No          Yes        No
4   4  Education Unemployed Unemployed Unemployed Unemployed Unemployed     Yes          Yes       Yes
5   5  Education  Education  Education  Education  Education  Education     Yes           No        No
6   6  Education  Education Employment Employment Employment Employment     Yes          Yes       Yes
7   7  Education Employment Employment Employment Employment Employment     Yes          Yes       Yes
8   8  Education  Education       <NA>       <NA>       <NA>        EMP      No          Yes        No
9   9  Education  Education  Education  Education  Education  Education     Yes           No        No
10 10  Education  Education  Education  Education  Education  Education     Yes           No        No
answered Nov 8 at 11:59
Sai Prabhanjan Reddy