Elasticsearch phrase freq score without IDF























I need a custom score that combines the phrase frequency (i.e. the number of occurrences of "John Hobs", with the terms in the same order and next to each other) with a custom score that is computed at index time.
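To make the definition of phrase frequency concrete, here is a minimal Python sketch that counts exact phrase occurrences from term positions, the same kind of per-document information an inverted index keeps. It assumes a naive lowercase/whitespace tokenizer in place of the real Elasticsearch analyzer and only handles exact phrases (no slop); `term_positions` and `phrase_freq` are hypothetical helper names, not Elasticsearch or Lucene APIs.

```python
from collections import defaultdict


def term_positions(tokens):
    """Map each token to the set of positions where it occurs (an inverted-index-style view)."""
    positions = defaultdict(set)
    for i, token in enumerate(tokens):
        positions[token].add(i)
    return positions


def phrase_freq(text, phrase):
    """Count exact, in-order, adjacent occurrences of `phrase` in `text`.

    Assumption: lowercasing + whitespace splitting stands in for the real analyzer.
    """
    tokens = text.lower().split()
    terms = phrase.lower().split()
    if not terms:
        return 0
    positions = term_positions(tokens)
    # A phrase occurrence starts at p if term k of the phrase sits at position p + k for every k.
    return sum(
        1
        for p in positions.get(terms[0], set())
        if all(p + k in positions.get(term, set()) for k, term in enumerate(terms))
    )


print(phrase_freq("John Hobs met John Walker Hobs and John Hobs", "John Hobs"))  # 2
```

"John Walker Hobs" in the middle of the sample does not count, because "hobs" is not directly after "john" there.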



Basically, I need to turn off IDF in the default scoring: it adds information whose value differs for each term (outside my control) and is nondeterministic because it is computed per shard. I know I can use a function score query, but I still need some way to get the phrase frequency value, without having to reindex.
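The following Python sketch illustrates why IDF is shard-dependent and what an IDF-free combination could look like. `bm25_idf` uses the IDF formula of Lucene's BM25 similarity (the Elasticsearch default since 5.0); the per-shard statistics and the multiplicative `idf_free_score` combination are hypothetical assumptions, since the question does not say how the two scores should be combined.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """IDF term of Lucene's BM25 similarity: log(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical statistics: each shard computes IDF from its own documents only,
# so the same term in two otherwise identical documents can score differently.
idf_on_shard_a = bm25_idf(doc_count=1000, doc_freq=10)
idf_on_shard_b = bm25_idf(doc_count=1000, doc_freq=50)


def idf_free_score(phrase_freq, index_time_score):
    """Hypothetical combination: phrase frequency scaled by the score stored at index time.

    Nothing here depends on collection statistics, so the result is the same on every shard.
    """
    return phrase_freq * index_time_score


print(round(idf_on_shard_a, 3), round(idf_on_shard_b, 3))
print(idf_free_score(2, 1.5))  # 3.0
```

A multiplicative combination means a document without the phrase always scores 0; a weighted sum would be the obvious alternative if that is not wanted.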




  1. There is probably no way to turn off IDF in the default similarities (which can be tuned dynamically without reindexing), right?


  2. I can define a custom scripted similarity, but the score is actually computed for each term and then summed (for "John Hobs" it is computed twice, for "John Walker Hobs" three times, etc.), and I don't know how to get the number of terms in the query inside the script.


  3. I can write a custom plugin. It should work without reindexing, but it works on the term level only (I can get the term frequency). How can I compute the phrase frequency? I cannot get any position information. Also, I can access my custom score defined at index time via a lookup, but what about performance? I suspect it would not be good.
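The per-term summation described in item 2 can be modelled with a small Python sketch. A scripted similarity is evaluated once per query term and the results are added, so even an IDF-free script (for example one that only reads `doc.freq` and ignores `term.docFreq` and `field.docCount`) yields a sum of per-term scores rather than the phrase frequency. The helpers below are hypothetical and use naive whitespace tokenization; they model only the arithmetic, not the Elasticsearch scripting API.

```python
def per_term_score(term_freq):
    """IDF-free per-term score: just the term frequency in the document.

    Assumption: this mirrors a scripted similarity that only uses `doc.freq`.
    """
    return float(term_freq)


def summed_query_score(text, query):
    """Score each query term independently and add the results."""
    tokens = text.lower().split()
    return sum(per_term_score(tokens.count(term)) for term in query.lower().split())


def phrase_freq(text, phrase):
    """Exact, adjacent, in-order occurrences of the phrase (naive tokenization)."""
    tokens = text.lower().split()
    terms = phrase.lower().split()
    n = len(terms)
    if n == 0:
        return 0
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == terms)


doc = "John Smith and Tom Hobs"
print(summed_query_score(doc, "John Hobs"))  # 2.0
print(phrase_freq(doc, "John Hobs"))  # 0
```

For the sample document both query terms match, so the per-term sum is 2.0 even though the phrase never occurs; without positions the per-term view cannot tell these cases apart.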



I'd be glad for an answer to any of these questions :)
Thank you very much.



































      elasticsearch ranking














      edited 2 days ago






























      asked 2 days ago









      Adam Horky





























