Generate numpy array using multiple columns of pandas dataframe


Sorry for the long post.
I'm using Python 3.6 on Windows 10. I have a pandas DataFrame that contains around 100,000 rows. From this DataFrame I need to generate four NumPy arrays. The first 5 relevant rows of my DataFrame look like this:

A        B        x        UB1      LB1      UB2      LB2
0.2134   0.7866   0.2237   0.1567   0.0133   1.0499   0.127
0.24735  0.75265  0.0881   0.5905   0.422    1.4715   0.5185
0.0125   0.9875   0.1501   1.3721   0.5007   2.0866   2.0617
0.8365   0.1635   0.0948   1.9463   1.0854   2.4655   1.9644
0.1234   0.8766   0.0415   2.7903   2.2602   3.5192   3.2828

Column B is 1 - A. Column B is not actually in my DataFrame; I added it only to explain the problem.
From this DataFrame, I need to generate four arrays, which look like this:

My first array, c, looks like array([-0.2134, -0.7866, -0.24735, -0.75265, -0.0125, -0.9875, -0.8365, -0.1635, -0.1234, -0.8766], dtype=float32)

Here every element is negated. The first element is the first row of column A, the second element is the first row of column B, the third element is the second row of column A, the fourth element is the second row of column B, and so on.
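The interleaving described above (even positions from -A, odd positions from -(1 - A)) can also be written as strided slice assignment into a preallocated float32 array, without adding column B to the DataFrame. This is a minimal sketch that uses only the five sample values of column A. It shows one possible technique and is not necessarily faster than the accepted answer's approach.

```python
import numpy as np
import pandas as pd

# Column A from the five sample rows in the question (B is derived as 1 - A)
df = pd.DataFrame({"A": [0.2134, 0.24735, 0.0125, 0.8365, 0.1234]})

a = df["A"].to_numpy(dtype=np.float32)

# Preallocate 2 * n slots: even positions get -A, odd positions get -(1 - A)
c = np.empty(2 * len(a), dtype=np.float32)
c[0::2] = -a
c[1::2] = -(1 - a)

print(c)
```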
My second array, UB, looks like

array([0.2237, 0.0881, 0.1501, 0.0948, 0.0415], dtype=float32)

where the elements are the rows of column x.
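Each element of UB is one row of column x. A single `Series.to_numpy` call with a `dtype` argument can therefore produce the float32 array directly, with exactly one element per DataFrame row. This is a minimal sketch using the five sample values:

```python
import numpy as np
import pandas as pd

# Column x from the five sample rows in the question
df = pd.DataFrame({"x": [0.2237, 0.0881, 0.1501, 0.0948, 0.0415]})

# One element per DataFrame row, converted to float32 in the same call
ub = df["x"].to_numpy(dtype=np.float32)
print(ub.shape)  # (5,)
```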

My third array, bounds, looks like

array([[0.0133, 0.1567],
       [0.127 , 1.0499],
       [0.422 , 0.5905],
       [0.5185, 1.4715],
       [0.5007, 1.3721],
       [2.0617, 2.0866],
       [1.0854, 1.9463],
       [1.9644, 2.4655],
       [2.2602, 2.7903],
       [3.2828, 3.5192]])

Here bounds[0][0] is the first row of LB1 and bounds[0][1] is the first row of UB1. bounds[1][0] is the first row of LB2 and bounds[1][1] is the first row of UB2. Then bounds[2][0] is the second row of LB1, and so on.
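You can get the same (lower, upper) ordering another way. Select the columns in the order LB1, UB1, LB2, UB2, then reshape so that each row of four values becomes two pairs. This is a minimal sketch that uses only the bound columns from the five sample rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "UB1": [0.1567, 0.5905, 1.3721, 1.9463, 2.7903],
    "LB1": [0.0133, 0.422, 0.5007, 1.0854, 2.2602],
    "UB2": [1.0499, 1.4715, 2.0866, 2.4655, 3.5192],
    "LB2": [0.127, 0.5185, 2.0617, 1.9644, 3.2828],
})

# Each DataFrame row becomes 4 values (LB1, UB1, LB2, UB2); reshaping to
# (-1, 2) in row-major order splits them into (LB1, UB1) then (LB2, UB2).
bounds = df[["LB1", "UB1", "LB2", "UB2"]].to_numpy().reshape(-1, 2)
print(bounds.shape)  # (10, 2)
```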
My fourth array looks like

array([[-1,  1,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0, -1,  1,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0, -1,  1,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0, -1,  1,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0, -1,  1]])

It has as many rows as the DataFrame and twice as many columns (row i has -1 in column 2i and 1 in column 2i + 1).
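At 100,000 rows the fourth array is almost entirely zeros: only 2 of the 200,000 entries in each row are non-zero. Stored densely as float64, it would take 100,000 × 200,000 × 8 bytes, about 160 GB. This minimal sketch builds the same pattern as a SciPy CSR sparse matrix instead. It assumes SciPy is installed and that whatever consumes this matrix accepts SciPy sparse input. `build_fourth` is a hypothetical helper name.

```python
import numpy as np
from scipy import sparse


def build_fourth(n):
    """Sparse n x 2n matrix with -1 at (i, 2i) and +1 at (i, 2i + 1)."""
    rows = np.repeat(np.arange(n), 2)        # 0, 0, 1, 1, 2, 2, ...
    cols = np.arange(2 * n)                  # 0, 1, 2, 3, 4, 5, ...
    vals = np.tile(np.array([-1.0, 1.0]), n)  # -1, 1, -1, 1, ...
    return sparse.csr_matrix((vals, (rows, cols)), shape=(n, 2 * n))


fourth = build_fourth(100_000)
print(fourth.shape, fourth.nnz)  # (100000, 200000) 200000
```

Only the 200,000 non-zero values and their indices are stored, which is on the order of a few megabytes. For small n, `build_fourth(n).toarray()` reproduces the dense array shown above.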

For 100,000 rows, what is an efficient way to generate these arrays?

asked Nov 8 at 10:49

Tanvi Mirza

79117

python arrays pandas

asked Nov 8 at 10:49

Tanvi Mirza

79117

asked Nov 8 at 10:49

Tanvi Mirza

79117

asked Nov 8 at 10:49

Tanvi Mirza

79117

asked Nov 8 at 10:49

Tanvi Mirza

79117

asked Nov 8 at 10:49

Tanvi Mirza

79117

add a comment |

1 Answer

accepted

This should be rather straightforward:

from io import StringIO

import pandas as pd
import numpy as np

data = """A          B      x      UB1     LB1     UB2    LB2
0.2134  0.7866  0.2237  0.1567  0.0133  1.0499  0.127
0.24735 0.75265 0.0881  0.5905  0.422   1.4715  0.5185
0.0125  0.9875  0.1501  1.3721  0.5007  2.0866  2.0617
0.8365  0.1635  0.0948  1.9463  1.0854  2.4655  1.9644
0.1234  0.8766  0.0415  2.7903  2.2602  3.5192  3.2828"""

# Raw string so that '\s' reaches pandas as a regex, not a string escape
df = pd.read_csv(StringIO(data), sep=r'\s+', header=0)

# c: interleave -A and -(1 - A) row by row
c = -np.stack([df['A'], 1 - df['A']], axis=1).ravel()
print(c)
# [-0.2134  -0.7866  -0.24735 -0.75265 -0.0125  -0.9875  -0.8365  -0.1635
#  -0.1234  -0.8766 ]

# ub: one element per row of column x
ub = df['x'].to_numpy()
print(ub)
# [0.2237 0.0881 0.1501 0.0948 0.0415]

# bounds: each row contributes (LB1, UB1) followed by (LB2, UB2)
bounds = np.stack([df['LB1'], df['UB1'], df['LB2'], df['UB2']], axis=1).reshape((-1, 2))
print(bounds)
# [[0.0133 0.1567]
#  [0.127  1.0499]
#  [0.422  0.5905]
#  [0.5185 1.4715]
#  [0.5007 1.3721]
#  [2.0617 2.0866]
#  [1.0854 1.9463]
#  [1.9644 2.4655]
#  [2.2602 2.7903]
#  [3.2828 3.5192]]

# fourth: row i has -1 in column 2*i and +1 in column 2*i + 1
n = len(df)
fourth = np.zeros((n, 2 * n))
idx = np.arange(n)
fourth[idx, 2 * idx] = -1
fourth[idx, 2 * idx + 1] = 1
print(fourth)
# [[-1.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
#  [ 0.  0. -1.  1.  0.  0.  0.  0.  0.  0.]
#  [ 0.  0.  0.  0. -1.  1.  0.  0.  0.  0.]
#  [ 0.  0.  0.  0.  0.  0. -1.  1.  0.  0.]
#  [ 0.  0.  0.  0.  0.  0.  0.  0. -1.  1.]]
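The question concerns 100,000 rows and asks for float32 output, while the sample above has five rows and returns float64. As a rough check, the same vectorized calls can be run at that size with a final cast to float32. The DataFrame below is hypothetical random data that uses the question's column names, not the asker's data. The dense fourth array is left out, because at this size it would have 2 × 10^10 entries.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data with the question's column names, 100,000 rows
rng = np.random.default_rng(0)
n = 100_000
lb1 = rng.random(n)
lb2 = rng.random(n)
df = pd.DataFrame({
    "A": rng.random(n),
    "x": rng.random(n),
    "LB1": lb1, "UB1": lb1 + rng.random(n),
    "LB2": lb2, "UB2": lb2 + rng.random(n),
})

# Same vectorized steps as the answer, cast to float32 as the question requests
c = -np.stack([df["A"], 1 - df["A"]], axis=1).ravel().astype(np.float32)
ub = df["x"].to_numpy(dtype=np.float32)
bounds = (np.stack([df["LB1"], df["UB1"], df["LB2"], df["UB2"]], axis=1)
          .reshape((-1, 2))
          .astype(np.float32))

print(c.shape, ub.shape, bounds.shape)  # (200000,) (100000,) (200000, 2)
```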

answered Nov 8 at 11:09

jdehesa

20.8k33050

It works, Thanks a lot @jdehesa
– Tanvi Mirza
Nov 8 at 11:39
