How to split a dataset in R based on whether a column contains NA or a value

I have 76 variables, and my Y is SalePrice. I combined the train and test data to shape the variables, and now I want to split it again.

project_train_2= filter(project_final,project_final$SalePrice=='NA')



Pro_train=createDataPartition(project_final,project_final[project_train_2,])

Here project_final is the combined train and test data, which I want to split according to whether the SalePrice column is NA or has a value.
Rows that have a value should go into train, and rows that do not should go into test.

I used the first command above to filter the data and then tried createDataPartition to split it.

I need help with this.

edited Nov 8 at 13:22

EstevaoLuis


asked Nov 8 at 13:17

Anand Menon

I assume you are using dplyr::filter (and not stats::filter, for which this will not work); do not use project_final$ inside of it, just do filter(project_final, SalePrice == 'NA'). Furthermore, is it the literal string "NA" or is it R's NA? If it is R's NA, you'd want filter(project_final, is.na(SalePrice)).
– r2evans
Nov 8 at 13:25
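To illustrate the difference between comparing against the string "NA" and testing for R's NA, here is a minimal sketch in base R. The toy project_final and its values are made up for illustration; only the column name SalePrice comes from the question. With dplyr loaded, the same condition would go into filter(project_final, is.na(SalePrice)).

```r
# Toy stand-in for project_final; the values are invented for illustration.
project_final <- data.frame(
  Id = 1:4,
  SalePrice = c(200000L, NA, 150000L, NA)
)

# Comparing with == never yields TRUE for a missing value; the result is NA.
eq_result <- project_final$SalePrice == "NA"
print(eq_result)

# is.na() is the test that actually identifies missing values.
na_result <- is.na(project_final$SalePrice)
print(na_result)

# Rows whose SalePrice is missing (the would-be test set).
project_test <- project_final[na_result, ]
print(project_test)
```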

filter(project_final, is.na(SalePrice)), thanks this worked.
– Anand Menon
Nov 8 at 15:12

What would be the command to filter rows where SalePrice has a value? I am using the following command, but it also returns all the rows with NA: project_train_2= filter(project_final,is.integer(SalePrice))
– Anand Menon
Nov 9 at 7:37

You misunderstand data.frames: everything in a column is the same class, so even the NAs in that column will be marked as integer. Do you want filter(project_final, !is.na(SalePrice)), returning all rows that do not have an NA price?
– r2evans
Nov 9 at 16:38
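The split discussed in these comments can be sketched as follows. This is a minimal example on made-up data, using base R's subset() so that it runs without extra packages; with dplyr loaded, the same conditions would be passed to filter(). The names project_train and project_test are hypothetical. Since the split is fully determined by the missing values, a random partition such as caret's createDataPartition should not be needed for this step.

```r
# Toy stand-in for project_final; the values are invented for illustration.
project_final <- data.frame(
  Id = 1:5,
  SalePrice = c(208500L, NA, 181500L, NA, 140000L)
)

# is.integer() inspects the column as a whole and returns a single value,
# so it cannot tell rows with a price apart from rows without one.
col_is_integer <- is.integer(project_final$SalePrice)
print(col_is_integer)

# Rows with a known price form the training set.
project_train <- subset(project_final, !is.na(SalePrice))

# Rows with a missing price form the test set.
project_test <- subset(project_final, is.na(SalePrice))

print(project_train)
print(project_test)
```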

Background: there are five types of NA, though they are all printed as plain NA. Besides the basic NA there are NA_integer_ (likely what you have), NA_real_ (numeric), NA_complex_, and NA_character_. The basic NA, not yet associated with strings or numbers, is a logical, so is.logical(NA) is TRUE; is.logical(c(1,NA)) is not, whereas is.numeric(c(1,NA)) is. The reason is that everything in a vector must be the same class (logical, integer, numeric, character); each column in a (normal) data.frame is a vector internally, so it shares the same-class requirement.
– r2evans
Nov 9 at 16:42
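The typed NA constants described above can be inspected directly in an R session. The following is a small sketch using only base R; the variable names are chosen for illustration.

```r
# Storage type of each NA constant.
na_types <- c(
  basic     = typeof(NA),
  integer   = typeof(NA_integer_),
  real      = typeof(NA_real_),
  complex   = typeof(NA_complex_),
  character = typeof(NA_character_)
)
print(na_types)

# A bare NA is logical, but inside a vector it takes the type of its neighbours.
print(is.logical(NA))         # bare NA
print(is.logical(c(1, NA)))   # NA next to a number
print(is.numeric(c(1, NA)))   # the whole vector is numeric
print(is.integer(c(1L, NA)))  # the whole vector is integer
```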
