Pandas: Conditionally insert rows into DataFrame while iterating through rows
While iterating through the rows of a specific column in a Pandas DataFrame, I would like to add a new row below the currently iterated row, if the cell in the currently iterated row meets a certain condition.
Say for example:
    import pandas as pd

    df = pd.DataFrame(data={'A': [0.15, 0.15, 0.7], 'B': [1500, 1500, 7000]})

DataFrame:

          A     B
    0  0.15  1500
    1  0.15  1500
    2  0.70  7000

Attempt:

    y = 100  # An example scalar

    i = 1
    for x in df['A']:
        if x is not None:  # Values in 'A' are filled atm, but not necessarily.
            df.loc[i] = [None, x*y]  # Should insert None into 'A', and product into 'B'.
            df.index = df.index + 1  # Shift index? According to this S/O answer: https://stackoverflow.com/a/24284680/4909923
        i = i + 1

    df.sort_index(inplace=True)  # Sort index?

I haven't succeeded so far: the index numbering ends up shifted so that it no longer starts at 0, and the rows don't seem to be inserted in order:

          A     B
    3  0.15  1500
    4   NaN    70
    5  0.70  7000

I tried several variants of this, including applymap with a lambda function, but was not able to get it working.
Desired result:

          A     B
    0  0.15  1500
    1  None    15
    2  0.15  1500
    3  None    15
    4  0.70  7000
    5  None    70

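For context on the attempt above, a minimal sketch of why rows get overwritten rather than inserted: assigning through `.loc` with a label that already exists replaces that row, and only a label absent from the index enlarges the frame. The two-row frame and the variable names below are made up for illustration; they are not the asker's data.

```python
import pandas as pd

df = pd.DataFrame({'A': [0.15, 0.7], 'B': [1500, 7000]})

# Label 1 already exists, so this overwrites row 1 in place.
df.loc[1] = [None, 15]
rows_after_overwrite = len(df)   # the frame still has 2 rows

# Label 2 does not exist yet, so this adds a new row at the end.
df.loc[2] = [None, 70]
rows_after_enlarge = len(df)     # the frame now has 3 rows

print(rows_after_overwrite, rows_after_enlarge)
```

Because the loop in the question always writes to a label that is already present, each pass replaces an existing row instead of adding one.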
python pandas
asked Nov 10 at 10:37
Winterflags
As it stands, I see no use case for pandas. You're iterating (mostly a no-no) and you want to insert rows (also not really a good thing in numpy, which is the underlying structure).
– roganjosh
Nov 10 at 10:39
@roganjosh I am already using Pandas; this is just a subset of a DataFrame script that does other things as well. I need to be able to insert rows because the program creates the DataFrame depending on other factors, so preallocating the index is not so desirable. Inserting rows is inefficient, but that doesn't matter here, as I'm dealing with fewer than tens of rows, not thousands.
– Winterflags
Nov 10 at 10:43
So my point still stands. Drop pandas and deal with nested lists. Pandas is just getting in the way here.
– roganjosh
Nov 10 at 10:44
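The nested-list suggestion in the comments could look roughly like the following sketch. It collects rows in a plain Python list and builds the DataFrame once at the end. The names `rows` and `out`, the `pd.notna` check, and the rounding of the product are illustrative assumptions, not something the commenter specified.

```python
import pandas as pd

df = pd.DataFrame({'A': [0.15, 0.15, 0.7], 'B': [1500, 1500, 7000]})
y = 100  # example scalar from the question

rows = []
for a, b in zip(df['A'], df['B']):
    rows.append([a, b])                           # keep the original row
    if pd.notna(a):                               # condition on the current cell
        rows.append([None, int(round(a * y))])    # new row directly below it

# Build the frame once, after all rows are known.
out = pd.DataFrame(rows, columns=['A', 'B'])
print(out)
```

Appending to a list is cheap, so the frame is only constructed a single time instead of being modified on every iteration.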
2 Answers
accepted
I believe you can use:

    import pandas as pd

    df = pd.DataFrame(data={'A': [0.15, 0.15, 0.7],
                            'B': [1500, 1500, 7000],
                            'C': [100, 200, 400]})

    v = 100
    L = []
    for i, x in df.to_dict('index').items():
        print(x)
        # append the original row as a dictionary
        L.append(x)
        # append the new row; for the missing keys ('B', 'C') the DataFrame constructor adds NaNs
        L.append({'A': x['A'] * v})

    df = pd.DataFrame(L)
    print(df)

           A       B      C
    0   0.15  1500.0  100.0
    1  15.00     NaN    NaN
    2   0.15  1500.0  200.0
    3  15.00     NaN    NaN
    4   0.70  7000.0  400.0
    5  70.00     NaN    NaN

edited Nov 10 at 12:00
answered Nov 10 at 10:59
jezrael
@Winterflags - Looping over dictionaries may be easier; check the edited answer.
– jezrael
Nov 10 at 12:00
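Note that the accepted answer's output puts the product in column A. If the layout from the question is wanted instead (a missing value in A, the product in B, and a row added only when A is filled), the same list-of-dictionaries idea could be adapted as in this sketch. The helper name `insert_product_rows`, the rounding of the product, and the sample frame with a missing value are assumptions made for illustration.

```python
import pandas as pd

def insert_product_rows(df, factor):
    """Return a new frame with a product row below each row whose 'A' is filled."""
    records = []
    for row in df.to_dict('records'):
        records.append(row)
        if pd.notna(row['A']):
            # 'A' is left missing on purpose; 'B' holds the product
            records.append({'A': None, 'B': int(round(row['A'] * factor))})
    return pd.DataFrame(records, columns=df.columns)

# The second row has no value in 'A', so no row is inserted below it.
df = pd.DataFrame({'A': [0.15, None, 0.7], 'B': [1500, 1500, 7000]})
res = insert_product_rows(df, 100)
print(res)
```

Returning a new frame leaves the input untouched, which sidesteps the problem of modifying a DataFrame while iterating over it.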
It doesn't seem you need a manual loop here:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(data={'A': [0.15, 0.15, 0.7], 'B': [1500, 1500, 7000]})
    y = 100

    # copy slice of dataframe
    df_extra = df.loc[df['A'].notnull()].copy()
    # assign A and B series values
    df_extra = df_extra.assign(A=np.nan, B=(df_extra['A']*y).astype(int))
    # increment index partially, required for sorting afterwards
    df_extra.index += 0.5
    # concatenate, sort index, drop index
    # (pd.concat is used because DataFrame.append was removed in pandas 2.0)
    res = pd.concat([df, df_extra]).sort_index().reset_index(drop=True)
    print(res)

          A     B
    0  0.15  1500
    1   NaN    15
    2  0.15  1500
    3   NaN    15
    4  0.70  7000
    5   NaN    70

answered Nov 10 at 12:22
jpp
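A possible variation on the loop-free approach above, assuming a stable sort is acceptable: leave the index of `df_extra` unchanged and let a stable `mergesort` keep each original row ahead of its inserted row, which avoids the `+ 0.5` step. The `mask` variable and the extra `round()` before the integer cast are additions made for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0.15, 0.15, 0.7], 'B': [1500, 1500, 7000]})
y = 100

# Rows that qualify for an inserted row below them.
mask = df['A'].notna()
df_extra = df.loc[mask].assign(
    A=np.nan,
    B=(df.loc[mask, 'A'] * y).round().astype(int),
)

# Both frames share index labels; a stable sort keeps df's row first for each label.
res = (pd.concat([df, df_extra])
         .sort_index(kind='mergesort')
         .reset_index(drop=True))
print(res)
```

The ordering relies on `df` being listed before `df_extra` in the `pd.concat` call; swapping them would place the inserted rows above the originals.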