Q: ANTLR 4 Grammar recognition of whole odd value not only the last digit











up vote
0
down vote

favorite












I'm trying to make grammar for the calculator, however it have to be working only for odd numbers.



For example it works like that:
If I put 123 the result is 123.
If I put 1234 the result is 123, and the token recognition error at: 4 but should be at: 1234.



There is my grammar:



grammar G;
DIGIT: ('0'..'9') * ('1' | '3' | '5' | '7'| '9');
operator : ('+' | '-' | '*' | ':');
result: DIGIT operator (DIGIT | result);


I mean specifically to make that, the 1234 should be recognized as an error, not only the last digit.










share|improve this question
























  • Welcome to SO! Please, clarify your question.
    – fewlinesofcode
    Nov 8 at 11:18










  • what are you trying to accomplish with the definition of DIGIT?
    – Jonas
    Nov 8 at 11:30










  • I'm trying to get an odd number.
    – Cevz
    Nov 8 at 11:50















up vote
0
down vote

favorite












I'm trying to make grammar for the calculator, however it have to be working only for odd numbers.



For example it works like that:
If I put 123 the result is 123.
If I put 1234 the result is 123, and the token recognition error at: 4 but should be at: 1234.



There is my grammar:



grammar G;
DIGIT: ('0'..'9') * ('1' | '3' | '5' | '7'| '9');
operator : ('+' | '-' | '*' | ':');
result: DIGIT operator (DIGIT | result);


I mean specifically to make that, the 1234 should be recognized as an error, not only the last digit.










share|improve this question
























  • Welcome to SO! Please, clarify your question.
    – fewlinesofcode
    Nov 8 at 11:18










  • what are you trying to accomplish with the definition of DIGIT?
    – Jonas
    Nov 8 at 11:30










  • I'm trying to get an odd number.
    – Cevz
    Nov 8 at 11:50













up vote
0
down vote

favorite









up vote
0
down vote

favorite











I'm trying to make grammar for the calculator, however it have to be working only for odd numbers.



For example it works like that:
If I put 123 the result is 123.
If I put 1234 the result is 123, and the token recognition error at: 4 but should be at: 1234.



There is my grammar:



grammar G;
DIGIT: ('0'..'9') * ('1' | '3' | '5' | '7'| '9');
operator : ('+' | '-' | '*' | ':');
result: DIGIT operator (DIGIT | result);


I mean specifically to make that, the 1234 should be recognized as an error, not only the last digit.










share|improve this question















I'm trying to make grammar for the calculator, however it have to be working only for odd numbers.



For example it works like that:
If I put 123 the result is 123.
If I put 1234 the result is 123, and the token recognition error at: 4 but should be at: 1234.



There is my grammar:



grammar G;
DIGIT: ('0'..'9') * ('1' | '3' | '5' | '7'| '9');
operator : ('+' | '-' | '*' | ':');
result: DIGIT operator (DIGIT | result);


I mean specifically to make that, the 1234 should be recognized as an error, not only the last digit.







grammar antlr4






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 8 at 11:54

























asked Nov 8 at 11:14









Cevz

12




12












  • Welcome to SO! Please, clarify your question.
    – fewlinesofcode
    Nov 8 at 11:18










  • what are you trying to accomplish with the definition of DIGIT?
    – Jonas
    Nov 8 at 11:30










  • I'm trying to get an odd number.
    – Cevz
    Nov 8 at 11:50


















  • Welcome to SO! Please, clarify your question.
    – fewlinesofcode
    Nov 8 at 11:18










  • what are you trying to accomplish with the definition of DIGIT?
    – Jonas
    Nov 8 at 11:30










  • I'm trying to get an odd number.
    – Cevz
    Nov 8 at 11:50
















Welcome to SO! Please, clarify your question.
– fewlinesofcode
Nov 8 at 11:18




Welcome to SO! Please, clarify your question.
– fewlinesofcode
Nov 8 at 11:18












what are you trying to accomplish with the definition of DIGIT?
– Jonas
Nov 8 at 11:30




what are you trying to accomplish with the definition of DIGIT?
– Jonas
Nov 8 at 11:30












I'm trying to get an odd number.
– Cevz
Nov 8 at 11:50




I'm trying to get an odd number.
– Cevz
Nov 8 at 11:50












1 Answer
1






active

oldest

votes

















up vote
0
down vote













The way that tokenization works is that it tries to find the longest prefix of the input that matches any of your regular expressions and then produces the appropriate token, consuming that prefix. So when the input is 1234, it sees 123 as the longest prefix that matches the DIGIT pattern (which should really be called ODD_INT or something) and produces the corresponding token. Then it sees the remaining 4 and produces an error because no rule matches it.



Note that it's not necessarily only the last digit that produces the error. For the input 1324, it would produce a DIGIT token for 13 and then a token recognition error for 24.



So how can you get the behaviour that you want? One approach would be to rewrite your pattern to match all sequences of digits and then use a semantic predicate to verify that the number is odd. The way that semantic predicates work on lexer rules is that it first takes the longest prefix that matches the pattern (without taking into account the predicate) and then checks the predicate. If the predicate is false, it moves on to the other patterns - it does not try to match the same pattern to a smaller input to make the predicate return true. So for the input 1234, the pattern would match the entire number and then the predicate would return false. Then it would try the other patterns, none of which match, so you'd get a token recognition error for the full number.



ODD_INT: ('0'..'9') + { Integer.parseInt(getText()) % 2 == 1 }?;


The down side of this approach is that you'll need to write some language-specific code (and if you're not using Java, you'll need to adjust the above code accordingly).



Alternatively, you could just recognize all integers in the lexer - not just odd ones - and then check whether they're odd later during semantic analysis.



If you do want to check the oddness using patterns only, you can also work around the problem by defining rules for both odd and even integers:



ODD_INT: ('0'..'9') * ('1' | '3' | '5' | '7'| '9');
EVEN_INT: ('0'..'9') * ('0' | '2' | '4' | '6'| '8');


This way for an input like 1234, the longest match would always be 1234, not 123. It's just that this would match the EVEN_INT pattern, not ODD_INT. So you wouldn't get a token recognition error, but, if you consistently only use ODD_INT in the grammar, you would get an error saying that an ODD_INT was expected, but an EVEN_INT found.






share|improve this answer





















  • You're great! Thank you. Now I understand where the problem was.
    – Cevz
    Nov 8 at 21:56











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














 

draft saved


draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53206614%2fq-antlr-4-grammar-recognition-of-whole-odd-value-not-only-the-last-digit%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
0
down vote













The way that tokenization works is that it tries to find the longest prefix of the input that matches any of your regular expressions and then produces the appropriate token, consuming that prefix. So when the input is 1234, it sees 123 as the longest prefix that matches the DIGIT pattern (which should really be called ODD_INT or something) and produces the corresponding token. Then it sees the remaining 4 and produces an error because no rule matches it.



Note that it's not necessarily only the last digit that produces the error. For the input 1324, it would produce a DIGIT token for 13 and then a token recognition error for 24.



So how can you get the behaviour that you want? One approach would be to rewrite your pattern to match all sequences of digits and then use a semantic predicate to verify that the number is odd. The way that semantic predicates work on lexer rules is that it first takes the longest prefix that matches the pattern (without taking into account the predicate) and then checks the predicate. If the predicate is false, it moves on to the other patterns - it does not try to match the same pattern to a smaller input to make the predicate return true. So for the input 1234, the pattern would match the entire number and then the predicate would return false. Then it would try the other patterns, none of which match, so you'd get a token recognition error for the full number.



ODD_INT: ('0'..'9') + { Integer.parseInt(getText()) % 2 == 1 }?;


The down side of this approach is that you'll need to write some language-specific code (and if you're not using Java, you'll need to adjust the above code accordingly).



Alternatively, you could just recognize all integers in the lexer - not just odd ones - and then check whether they're odd later during semantic analysis.



If you do want to check the oddness using patterns only, you can also work around the problem by defining rules for both odd and even integers:



ODD_INT: ('0'..'9') * ('1' | '3' | '5' | '7'| '9');
EVEN_INT: ('0'..'9') * ('0' | '2' | '4' | '6'| '8');


This way for an input like 1234, the longest match would always be 1234, not 123. It's just that this would match the EVEN_INT pattern, not ODD_INT. So you wouldn't get a token recognition error, but, if you consistently only use ODD_INT in the grammar, you would get an error saying that an ODD_INT was expected, but an EVEN_INT found.






share|improve this answer





















  • You're great! Thank you. Now I understand where the problem was.
    – Cevz
    Nov 8 at 21:56















up vote
0
down vote













The way that tokenization works is that it tries to find the longest prefix of the input that matches any of your regular expressions and then produces the appropriate token, consuming that prefix. So when the input is 1234, it sees 123 as the longest prefix that matches the DIGIT pattern (which should really be called ODD_INT or something) and produces the corresponding token. Then it sees the remaining 4 and produces an error because no rule matches it.



Note that it's not necessarily only the last digit that produces the error. For the input 1324, it would produce a DIGIT token for 13 and then a token recognition error for 24.



So how can you get the behaviour that you want? One approach would be to rewrite your pattern to match all sequences of digits and then use a semantic predicate to verify that the number is odd. The way that semantic predicates work on lexer rules is that it first takes the longest prefix that matches the pattern (without taking into account the predicate) and then checks the predicate. If the predicate is false, it moves on to the other patterns - it does not try to match the same pattern to a smaller input to make the predicate return true. So for the input 1234, the pattern would match the entire number and then the predicate would return false. Then it would try the other patterns, none of which match, so you'd get a token recognition error for the full number.



ODD_INT: ('0'..'9') + { Integer.parseInt(getText()) % 2 == 1 }?;


The down side of this approach is that you'll need to write some language-specific code (and if you're not using Java, you'll need to adjust the above code accordingly).



Alternatively, you could just recognize all integers in the lexer - not just odd ones - and then check whether they're odd later during semantic analysis.



If you do want to check the oddness using patterns only, you can also work around the problem by defining rules for both odd and even integers:



ODD_INT: ('0'..'9') * ('1' | '3' | '5' | '7'| '9');
EVEN_INT: ('0'..'9') * ('0' | '2' | '4' | '6'| '8');


This way for an input like 1234, the longest match would always be 1234, not 123. It's just that this would match the EVEN_INT pattern, not ODD_INT. So you wouldn't get a token recognition error, but, if you consistently only use ODD_INT in the grammar, you would get an error saying that an ODD_INT was expected, but an EVEN_INT found.






share|improve this answer





















  • You're great! Thank you. Now I understand where the problem was.
    – Cevz
    Nov 8 at 21:56













up vote
0
down vote










up vote
0
down vote









The way that tokenization works is that it tries to find the longest prefix of the input that matches any of your regular expressions and then produces the appropriate token, consuming that prefix. So when the input is 1234, it sees 123 as the longest prefix that matches the DIGIT pattern (which should really be called ODD_INT or something) and produces the corresponding token. Then it sees the remaining 4 and produces an error because no rule matches it.



Note that it's not necessarily only the last digit that produces the error. For the input 1324, it would produce a DIGIT token for 13 and then a token recognition error for 24.



So how can you get the behaviour that you want? One approach would be to rewrite your pattern to match all sequences of digits and then use a semantic predicate to verify that the number is odd. The way that semantic predicates work on lexer rules is that it first takes the longest prefix that matches the pattern (without taking into account the predicate) and then checks the predicate. If the predicate is false, it moves on to the other patterns - it does not try to match the same pattern to a smaller input to make the predicate return true. So for the input 1234, the pattern would match the entire number and then the predicate would return false. Then it would try the other patterns, none of which match, so you'd get a token recognition error for the full number.



ODD_INT: ('0'..'9') + { Integer.parseInt(getText()) % 2 == 1 }?;


The down side of this approach is that you'll need to write some language-specific code (and if you're not using Java, you'll need to adjust the above code accordingly).



Alternatively, you could just recognize all integers in the lexer - not just odd ones - and then check whether they're odd later during semantic analysis.



If you do want to check the oddness using patterns only, you can also work around the problem by defining rules for both odd and even integers:



ODD_INT: ('0'..'9') * ('1' | '3' | '5' | '7'| '9');
EVEN_INT: ('0'..'9') * ('0' | '2' | '4' | '6'| '8');


This way for an input like 1234, the longest match would always be 1234, not 123. It's just that this would match the EVEN_INT pattern, not ODD_INT. So you wouldn't get a token recognition error, but, if you consistently only use ODD_INT in the grammar, you would get an error saying that an ODD_INT was expected, but an EVEN_INT found.






share|improve this answer












The way that tokenization works is that it tries to find the longest prefix of the input that matches any of your regular expressions and then produces the appropriate token, consuming that prefix. So when the input is 1234, it sees 123 as the longest prefix that matches the DIGIT pattern (which should really be called ODD_INT or something) and produces the corresponding token. Then it sees the remaining 4 and produces an error because no rule matches it.



Note that it's not necessarily only the last digit that produces the error. For the input 1324, it would produce a DIGIT token for 13 and then a token recognition error for 24.



So how can you get the behaviour that you want? One approach would be to rewrite your pattern to match all sequences of digits and then use a semantic predicate to verify that the number is odd. The way that semantic predicates work on lexer rules is that it first takes the longest prefix that matches the pattern (without taking into account the predicate) and then checks the predicate. If the predicate is false, it moves on to the other patterns - it does not try to match the same pattern to a smaller input to make the predicate return true. So for the input 1234, the pattern would match the entire number and then the predicate would return false. Then it would try the other patterns, none of which match, so you'd get a token recognition error for the full number.



ODD_INT: ('0'..'9') + { Integer.parseInt(getText()) % 2 == 1 }?;


The down side of this approach is that you'll need to write some language-specific code (and if you're not using Java, you'll need to adjust the above code accordingly).



Alternatively, you could just recognize all integers in the lexer - not just odd ones - and then check whether they're odd later during semantic analysis.



If you do want to check the oddness using patterns only, you can also work around the problem by defining rules for both odd and even integers:



ODD_INT: ('0'..'9') * ('1' | '3' | '5' | '7'| '9');
EVEN_INT: ('0'..'9') * ('0' | '2' | '4' | '6'| '8');


This way for an input like 1234, the longest match would always be 1234, not 123. It's just that this would match the EVEN_INT pattern, not ODD_INT. So you wouldn't get a token recognition error, but, if you consistently only use ODD_INT in the grammar, you would get an error saying that an ODD_INT was expected, but an EVEN_INT found.







share|improve this answer












share|improve this answer



share|improve this answer










answered Nov 8 at 18:51









sepp2k

289k36592604




289k36592604












  • You're great! Thank you. Now I understand where the problem was.
    – Cevz
    Nov 8 at 21:56


















  • You're great! Thank you. Now I understand where the problem was.
    – Cevz
    Nov 8 at 21:56
















You're great! Thank you. Now I understand where the problem was.
– Cevz
Nov 8 at 21:56




You're great! Thank you. Now I understand where the problem was.
– Cevz
Nov 8 at 21:56


















 

draft saved


draft discarded



















































 


draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53206614%2fq-antlr-4-grammar-recognition-of-whole-odd-value-not-only-the-last-digit%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Schultheiß

Verwaltungsgliederung Dänemarks

Liste der Kulturdenkmale in Wilsdruff