How to split a dataset in R based on whether a column contains NA or a value
I have 76 variables, and my Y is SalePrice. I combined the train and test data to shape the variables, and now I want to split them again.
    project_train_2 = filter(project_final, project_final$SalePrice == 'NA')
    Pro_train = createDataPartition(project_final, project_final[project_train_2, ])
Here project_final is the combined train and test data, which I want to split according to whether the SalePrice column holds NA or a value.
Rows that have a value should go into train, and rows that do not should go into test.
I used the commands above to filter the data and then createDataPartition to split it.
I need help with this.
r
I assume you are using `dplyr::filter` (and not `stats::filter`, for which this will not work); do not use `project_final$` inside of it, just do `filter(project_final, SalePrice == 'NA')`. Furthermore, is it the literal string `"NA"` or is it R's `NA`, in which case you'd want `filter(project_final, is.na(SalePrice))`.
– r2evans, Nov 8 at 13:25
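To make the distinction in the comment above concrete, here is a minimal base R sketch. The vectors `price` and `price_chr` are made-up illustrations, not the asker's data. It shows that comparing with `==` never yields TRUE for a missing value, while `is.na()` does, and that a literal string "NA" behaves the other way round.

```r
# Hypothetical prices: the second one is missing (R's NA), not the string "NA".
price <- c(208500L, NA, 181500L)

# Comparing with == propagates the missing value instead of returning TRUE.
eq_result <- price == "NA"
print(eq_result)            # FALSE, NA, FALSE

# is.na() is the test for missingness.
na_result <- is.na(price)
print(na_result)            # FALSE, TRUE, FALSE

# A character column can also hold the literal string "NA", which is not missing.
price_chr <- c("208500", "NA", NA)
print(price_chr == "NA")    # FALSE, TRUE, NA
print(is.na(price_chr))     # FALSE, FALSE, TRUE
```

Since `dplyr::filter` keeps only rows where the condition is TRUE, a condition that evaluates to NA drops the row, which is why the `== 'NA'` version would return nothing for a column with real missing values.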
`filter(project_final, is.na(SalePrice))` worked, thanks.
– Anand Menon, Nov 8 at 15:12
What is the command to keep only the rows where SalePrice has a value? I am using the following command, but it also returns all the rows with NA: `project_train_2 = filter(project_final, is.integer(SalePrice))`
– Anand Menon, Nov 9 at 7:37
You misunderstand `data.frame`s: everything in a column is the same class, so even the `NA`s in that column will be marked as integer. Do you want `filter(project_final, !is.na(SalePrice))`, returning all rows that do not have an `NA` price?
– r2evans, Nov 9 at 16:38
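Putting the two suggested filters together, the whole split might look like the sketch below. It assumes dplyr is installed, and the small `project_final` data frame (with its `Id` and `LotArea` columns and the result names `train` and `test`) is invented for illustration and stands in for the real 76-variable data. Because the split is fully determined by which rows have a missing SalePrice, no random partitioning step is involved here.

```r
library(dplyr)

# Hypothetical stand-in for the combined train + test data.
project_final <- data.frame(
  Id        = 1:5,
  LotArea   = c(8450L, 9600L, 11250L, 9550L, 14260L),
  SalePrice = c(208500L, NA, 181500L, NA, 140000L)
)

# is.integer() describes the whole column, so it returns a single TRUE
# and cannot tell rows with a value from rows with NA.
whole_column_check <- is.integer(project_final$SalePrice)

# Rows with a known SalePrice go to train, rows with a missing one go to test.
train <- dplyr::filter(project_final, !is.na(SalePrice))
test  <- dplyr::filter(project_final, is.na(SalePrice))

# Every row lands in exactly one of the two pieces.
stopifnot(nrow(train) + nrow(test) == nrow(project_final))
```

Calling `dplyr::filter` with the namespace prefix avoids accidentally picking up `stats::filter` when dplyr is not attached.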
Background: there are five types of `NA`, though they are always printed as the basic `NA`. Besides the basic one there are `NA_integer_` (what you have, likely), `NA_real_` (which is `numeric`), `NA_complex_`, and `NA_character_`. The basic `NA` that is not yet associated with strings or numbers is a `logical`, so `is.logical(NA)` is TRUE, whereas `is.logical(c(1, NA))` is FALSE and `is.numeric(c(1, NA))` is TRUE. The reason is that everything in a vector must be the same class (`logical`, `integer`, `numeric`, `character`); each column in a (normal) `data.frame` is a vector internally, so it shares the same-class requirement.
– r2evans, Nov 9 at 16:42
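The points in this background comment can be checked directly in base R. The snippet below is only a small illustration; the variable names `na_classes` and `prices` are made up.

```r
# The five NA constants print alike but carry different classes.
na_classes <- sapply(
  list(NA, NA_integer_, NA_real_, NA_complex_, NA_character_),
  class
)
print(na_classes)   # "logical" "integer" "numeric" "complex" "character"

# A bare NA is logical; next to a number it is coerced to that number's type.
print(is.logical(NA))          # TRUE
print(is.logical(c(1, NA)))    # FALSE
print(is.numeric(c(1, NA)))    # TRUE

# Inside an integer vector, the missing element is still an integer.
prices <- c(208500L, NA)
print(is.integer(prices))      # TRUE
print(is.integer(prices[2]))   # TRUE
```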
edited Nov 8 at 13:22 by EstevaoLuis
asked Nov 8 at 13:17 by Anand Menon