Elasticsearch phrase freq score without IDF
I need a custom score that combines phrase frequency (i.e. the number of occurrences of "John Hobs", with the terms in the same order and next to each other) with a custom score that is computed at index time.
Basically, I need to turn off IDF in the default scoring: it adds information whose value differs for each term and is outside my control, and it is nondeterministic because of sharding. I know I can use a function score query, but I would still need to get the phrase frequency value somehow, and without reindexing.
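To make the target concrete, here is a minimal Python sketch of the score I am after, computed outside Elasticsearch on an already tokenized document. The function names, the sample document and the plain sum used to combine the two parts are assumptions made for illustration only; how the two values should really be combined is not fixed here.

```python
from typing import List


def phrase_frequency(tokens: List[str], phrase: List[str]) -> int:
    """Count how often `phrase` occurs as adjacent, in-order tokens."""
    n = len(phrase)
    if n == 0:
        return 0
    return sum(
        1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase
    )


def combined_score(tokens: List[str], phrase: List[str], indexed_score: float) -> float:
    """Combine phrase frequency with a score stored at index time.

    Assumption: a plain sum. No IDF or any other corpus statistic is
    involved, so the result does not depend on the shard.
    """
    return phrase_frequency(tokens, phrase) + indexed_score


# Hypothetical document, lowercased and split on whitespace.
doc = "john hobs met john walker hobs and john hobs".split()

print(phrase_frequency(doc, ["john", "hobs"]))     # 2
print(combined_score(doc, ["john", "hobs"], 0.5))  # 2.5
```

The value depends only on the document itself and on the stored score, which is exactly the determinism I want.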
There is probably no way to turn off IDF in the default similarities (which can be tuned dynamically without reindexing), right?
I can define a custom scripted similarity, but the score is actually computed for each term and summed up (for "John Hobs" it is computed twice, for "John Walker Hobs" three times, etc.), and I don't know how to get the number of query terms inside the script.
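The per-term behaviour can be illustrated with a small Python simulation. This is not Elasticsearch code: `term_score` is a hypothetical stand-in for an IDF-free similarity script that only sees one term's frequency, and `summed_term_scores` mimics the engine calling it once per query term and adding up the results.

```python
from collections import Counter
from typing import List


def term_score(term_freq: int) -> float:
    """Stand-in for a per-term similarity with IDF removed: the raw tf."""
    return float(term_freq)


def summed_term_scores(tokens: List[str], query_terms: List[str]) -> float:
    """Call the per-term score once per query term and sum the results.

    The per-term function never learns how many query terms there are,
    so it cannot normalize by that number.
    """
    counts = Counter(tokens)
    return sum(term_score(counts[term]) for term in query_terms)


# Hypothetical document, lowercased and split on whitespace.
doc = "john hobs met john walker hobs and john hobs".split()

print(summed_term_scores(doc, ["john", "hobs"]))            # 6.0
print(summed_term_scores(doc, ["john", "walker", "hobs"]))  # 7.0
```

In this document "john hobs" occurs twice as a phrase, yet the summed term-level score is 6.0, so removing IDF from the similarity alone does not give the phrase frequency.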
I can write a custom plugin. It should work without reindexing, but it works at the term level only (I can get the term frequency). How can I compute the phrase frequency? I cannot get any position information. Also, I can access the custom score I defined at index time via a lookup, but what about performance? I suspect it would not be good.
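To show why position information is the missing piece, here is a rough Python sketch that derives phrase frequency from per-term position lists, which is the kind of data a plugin would need to reach. The `positions_by_term` helper and the dictionary layout are assumptions for illustration; they do not correspond to any actual Elasticsearch or Lucene API.

```python
from typing import Dict, List


def positions_by_term(tokens: List[str]) -> Dict[str, List[int]]:
    """Build a toy positional index: term -> sorted token positions."""
    positions: Dict[str, List[int]] = {}
    for pos, token in enumerate(tokens):
        positions.setdefault(token, []).append(pos)
    return positions


def phrase_freq_from_positions(positions: Dict[str, List[int]], phrase: List[str]) -> int:
    """Count start positions where every phrase term follows in order."""
    if not phrase:
        return 0
    # Position sets of the 2nd, 3rd, ... phrase terms for fast lookup.
    following = [set(positions.get(term, [])) for term in phrase[1:]]
    return sum(
        1
        for start in positions.get(phrase[0], [])
        if all(start + offset + 1 in pos_set
               for offset, pos_set in enumerate(following))
    )


# Hypothetical document, lowercased and split on whitespace.
doc = "john hobs met john walker hobs and john hobs".split()
index = positions_by_term(doc)

print(phrase_freq_from_positions(index, ["john", "hobs"]))            # 2
print(phrase_freq_from_positions(index, ["john", "walker", "hobs"]))  # 1
```

With term frequencies alone (3 for "john", 3 for "hobs") there is no way to recover these counts, which is why term-level access is not enough.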
I'll be glad for an answer to any of these questions :)
Thank you very much
New contributor
Adam Horky is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
edited 2 days ago
asked 2 days ago
Adam Horky