Eigen rowwise addition/subtraction performance
While profiling my program, I found that the following lines are the bottleneck:
    // Some big nested loop
    {
        const auto inpRow = inpMap.row(counter);
        outMap.row(adjRow) -= inpRow;
        outMap.row(colInd) += inpRow;
    }
outMap and inpMap are Eigen::Map<Eigen::MatrixRX<Scalar>>, where Eigen::MatrixRX is defined as Eigen::Matrix<Scalar, -1, -1, Eigen::RowMajor>, i.e. a row-major matrix.
Is there a way to improve the performance of such operations? (Apart from a parallel for, of course.)
c++ performance matrix eigen eigen3
Looks like inpRow is a copy, is there any reason that you can’t use a reference instead?
– James Mitchell
Nov 8 at 9:00
@JamesMitchell thank you for the answer! I'm not 100% sure, but Eigen should provide a lazily evaluated proxy object (something like a block), so there should be no copy. But I'm not sure.
– Dark_Daiver
Nov 8 at 9:01
#define EIGEN_NO_DEBUG?
– R zu
yesterday
asked Nov 8 at 8:33
Dark_Daiver
1 Answer
accepted
There is not much you can do, as such expressions should already be fully vectorized. Nevertheless, here are some tips:
- Make sure you have enabled compiler optimizations: -O3 -march=native
- Then measure the time it takes and compute the FLOPS to see how far you are from the theoretical peak performance of your CPU (disable turbo boost for that experiment).
- If you are very far from the theoretical peak, then you are very likely suffering from cache misses. You might reduce them by splitting the two assignments into chunks smaller than 16 kB. You might get a speed-up of up to 2x from that.
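To illustrate the last tip, here is a minimal sketch of the chunked update. It uses raw pointers into row-major storage instead of Eigen types, so the helper name chunkedRowUpdate, its parameters, and the 8 kB default chunk size are all assumptions made for the example, not anything from Eigen or from the question. The idea is that each chunk of the input row is still in L1 cache when it is applied to the second output row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an Eigen API). On row-major storage it performs
//   out.row(adjRow) -= inp.row(counter);
//   out.row(colInd) += inp.row(counter);
// one chunk at a time, so that a chunk of the input row is reused for the
// second assignment while it is still likely to be in L1 cache.
template <typename Scalar>
void chunkedRowUpdate(const Scalar* inp, Scalar* out, std::size_t cols,
                      std::size_t counter, std::size_t adjRow,
                      std::size_t colInd,
                      std::size_t chunkBytes = 8 * 1024)  // assumed default, below 16 kB
{
    assert(chunkBytes >= sizeof(Scalar));
    const std::size_t chunk = chunkBytes / sizeof(Scalar);  // elements per chunk

    const Scalar* src = inp + counter * cols;  // input row
    Scalar* sub = out + adjRow * cols;         // row that gets the subtraction
    Scalar* add = out + colInd * cols;         // row that gets the addition

    for (std::size_t start = 0; start < cols; start += chunk) {
        const std::size_t end = std::min(cols, start + chunk);
        // Two simple loops per chunk; each is easy for the compiler to vectorize.
        for (std::size_t j = start; j < end; ++j) sub[j] -= src[j];
        for (std::size_t j = start; j < end; ++j) add[j] += src[j];
    }
}
```

With the Eigen maps from the question, the same pattern could presumably be written by taking .segment(start, len) of each row expression inside the chunk loop. Whether it helps depends on the row length and on the cache sizes of your CPU, so measure before and after.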
answered Nov 8 at 10:59
ggael