How to split a dataset in R depending on whether a column contains NA or a value























I have 76 variables, and my Y is SalePrice. I combined the train and test data to shape the variables, and now I want to split them again.



project_train_2= filter(project_final,project_final$SalePrice=='NA')

Pro_train=createDataPartition(project_final,project_final[project_train_2,])


Here project_final is the combined train and test data, which I want to split depending on whether the SalePrice column is NA or has a value.
Rows that have a value should go to train, and rows that do not should go to test.



I used the commands above to filter the data and then createDataPartition to split it.



I need help with this.
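A minimal sketch of the split described above, assuming SalePrice holds R's NA (not the string "NA") for the rows that came from the test set. The small project_final below is hypothetical and stands in for the real 76-column data. Because which rows belong to train and test is fully determined by the data, a plain logical subset is enough; caret's createDataPartition() draws a random (stratified) sample from an outcome vector, so it is not needed for this step.

```r
# Hypothetical stand-in for the combined train + test data:
# rows that came from the original test set have SalePrice = NA.
project_final <- data.frame(
  Id        = 1:6,
  LotArea   = c(8450L, 9600L, 11250L, 9550L, 14260L, 10084L),
  SalePrice = c(208500L, 181500L, 223500L, NA, NA, NA)
)

# One logical value per row: TRUE where the price is missing.
missing_price <- is.na(project_final$SalePrice)

# Rows with a price go to train, rows without one go to test.
project_train <- project_final[!missing_price, ]
project_test  <- project_final[missing_price, ]

# Equivalent one-step version: a list with elements "FALSE" (train)
# and "TRUE" (test).
parts <- split(project_final, missing_price)

nrow(project_train)  # 3
nrow(project_test)   # 3
```

The `split()` form is handy when you want both pieces at once; the explicit `[!missing_price, ]` / `[missing_price, ]` pair makes the intent clearer when reading the code later.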


































  • I assume you are using dplyr::filter (and not stats::filter, for which this will not work); do not use project_final$ inside of it, just do filter(project_final, SalePrice == 'NA'). Furthermore, is it the literal string "NA" or is it R's NA? In the latter case you'd want filter(project_final, is.na(SalePrice)).
    – r2evans
    Nov 8 at 13:25










  • filter(project_final, is.na(SalePrice)), thanks this worked.
    – Anand Menon
    Nov 8 at 15:12










  • What command would filter the rows where SalePrice has values? I am using the following command, but it also returns all the rows with NA: project_train_2= filter(project_final,is.integer(SalePrice))
    – Anand Menon
    Nov 9 at 7:37












  • You misunderstand data.frames: everything in a column is the same class, so even the NAs in that column will be marked as integer. Do you want filter(project_final, !is.na(SalePrice)), returning all rows that do not have an NA price?
    – r2evans
    Nov 9 at 16:38










  • Background: there are five types of NA, though all of them print as a plain NA. Besides the basic NA there are NA_integer_ (likely what you have), NA_real_ (numeric), NA_complex_, and NA_character_. The basic NA, before it is combined with strings or numbers, is logical, so is.logical(NA) is TRUE, while is.logical(c(1,NA)) is FALSE and is.numeric(c(1,NA)) is TRUE. The reason is that everything in a vector must be the same class (logical, integer, numeric, character); each column in a (normal) data.frame is internally a vector, so it shares the same-class requirement.
    – r2evans
    Nov 9 at 16:42
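The filters suggested in the comments can be sketched with dplyr (assumed to be installed), again on a small hypothetical project_final. The later lines illustrate why filter(project_final, is.integer(SalePrice)) keeps every row: is.integer() tests the type of the whole column and returns a single TRUE, which filter() applies to all rows, whereas is.na() returns one value per element. The last lines check the NA types described in the background comment.

```r
library(dplyr)

# Hypothetical stand-in for the combined data; NA marks test rows.
project_final <- data.frame(
  Id        = 1:5,
  SalePrice = c(208500L, NA, 181500L, NA, 223500L)
)

# Rows that have a price (train) and rows that do not (test).
project_train <- filter(project_final, !is.na(SalePrice))
project_test  <- filter(project_final, is.na(SalePrice))

# is.integer() looks at the column as a whole and returns one value,
# so filter() keeps every row, including the NA ones.
is.integer(project_final$SalePrice)                  # TRUE
nrow(filter(project_final, is.integer(SalePrice)))   # 5

# is.na() returns one logical per element.
is.na(project_final$SalePrice)  # FALSE TRUE FALSE TRUE FALSE

# The NA inside an integer column is NA_integer_; a bare NA is logical.
typeof(project_final$SalePrice[2])  # "integer"
typeof(NA)                          # "logical"
is.numeric(c(1, NA))                # TRUE
is.logical(c(1, NA))                # FALSE
```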























asked Nov 8 at 13:17 by Anand Menon; edited Nov 8 at 13:22 by EstevaoLuis











