Q: ANTLR 4 Grammar recognition of whole odd value not only the last digit
I'm trying to write a grammar for a calculator that must work only for odd numbers.
For example, it currently behaves like this:
If I enter 123, the result is 123.
If I enter 1234, the result is 123 and there is a token recognition error at: 4, but the error should be at: 1234.
Here is my grammar:
grammar G;
DIGIT: ('0'..'9') * ('1' | '3' | '5' | '7'| '9');
operator : ('+' | '-' | '*' | ':');
result: DIGIT operator (DIGIT | result);
Specifically, I want the whole of 1234 to be recognized as an error, not only the last digit.
grammar antlr4
Welcome to SO! Please, clarify your question.
– fewlinesofcode
Nov 8 at 11:18
What are you trying to accomplish with the definition of DIGIT?
– Jonas
Nov 8 at 11:30
I'm trying to get an odd number.
– Cevz
Nov 8 at 11:50
edited Nov 8 at 11:54
asked Nov 8 at 11:14
Cevz
1 Answer
Tokenization works by finding the longest prefix of the input that matches any of your regular expressions, producing the appropriate token, and consuming that prefix. So when the input is 1234, the lexer sees 123 as the longest prefix that matches the DIGIT pattern (which should really be called ODD_INT or something similar) and produces the corresponding token. Then it sees the remaining 4 and produces an error because no rule matches it.
Note that it's not necessarily only the last digit that produces the error. For the input 1324, it would produce a DIGIT token for 13 and then a token recognition error for 24.
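The longest-match behaviour described above can be imitated with a small sketch. This is only an illustration under stated assumptions, not ANTLR's implementation: the class LongestMatchSketch and its methods are hypothetical, the single pattern mirrors the DIGIT rule from the question, and reporting the whole unmatched remainder as one error is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of longest-match tokenization; not ANTLR's code.
final class LongestMatchSketch {
    // Same shape as the DIGIT rule in the question: any digits, then one odd digit.
    private static final Pattern ODD_INT = Pattern.compile("[0-9]*[13579]");

    // Returns the end index of the longest prefix starting at `start`
    // that matches ODD_INT, or -1 if no prefix matches.
    static int longestMatchEnd(String input, int start) {
        Matcher m = ODD_INT.matcher(input);
        for (int end = input.length(); end > start; end--) {
            m.region(start, end);
            if (m.matches()) {
                return end;
            }
        }
        return -1;
    }

    // Splits the input into ODD_INT tokens; the first unmatched remainder
    // is reported as a single error entry (a simplification).
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int end = longestMatchEnd(input, pos);
            if (end < 0) {
                out.add("ERROR(" + input.substring(pos) + ")");
                break;
            }
            out.add("ODD_INT(" + input.substring(pos, end) + ")");
            pos = end;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1234")); // [ODD_INT(123), ERROR(4)]
        System.out.println(tokenize("1324")); // [ODD_INT(13), ERROR(24)]
    }
}
```

In both inputs the token ends at the last odd digit, and everything after it is what triggers the error.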
So how can you get the behaviour that you want? One approach would be to rewrite your pattern to match all sequences of digits and then use a semantic predicate to verify that the number is odd. On a lexer rule, a semantic predicate works like this: the lexer first takes the longest prefix that matches the pattern (without taking the predicate into account) and then checks the predicate. If the predicate is false, it moves on to the other patterns; it does not try to match the same pattern against a shorter input to make the predicate return true. So for the input 1234, the pattern would match the entire number and then the predicate would return false. The lexer would then try the other patterns, none of which match, so you'd get a token recognition error for the full number.
ODD_INT: ('0'..'9') + { Integer.parseInt(getText()) % 2 == 1 }?;
The downside of this approach is that you'll need to write some language-specific code (and if you're not using Java, you'll need to adjust the code above accordingly).
Alternatively, you could just recognize all integers in the lexer - not just odd ones - and then check whether they're odd later during semantic analysis.
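A minimal sketch of such a later check, assuming the lexer hands over the text of a generic integer token. The class OddCheckSketch and the method isOddLiteral are hypothetical names, not part of ANTLR. The check looks only at the last digit, so very long literals are not a problem.

```java
// Hypothetical post-lexing check; the class and method names are assumptions.
final class OddCheckSketch {
    // True if text is a non-empty run of ASCII digits ending in an odd digit.
    // Only the last digit is inspected, so long literals cannot overflow an int.
    static boolean isOddLiteral(String text) {
        if (text.isEmpty()) {
            return false;
        }
        boolean allDigits = text.chars().allMatch(c -> c >= '0' && c <= '9');
        char last = text.charAt(text.length() - 1);
        return allDigits && (last - '0') % 2 == 1;
    }

    public static void main(String[] args) {
        System.out.println(isOddLiteral("123"));  // true
        System.out.println(isOddLiteral("1234")); // false
    }
}
```

A listener or visitor over the parse tree could call a helper like this on each integer token and report its own error message for even values.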
If you do want to check the oddness using patterns only, you can also work around the problem by defining rules for both odd and even integers:
ODD_INT: ('0'..'9') * ('1' | '3' | '5' | '7'| '9');
EVEN_INT: ('0'..'9') * ('0' | '2' | '4' | '6'| '8');
This way, for an input like 1234, the longest match would always be 1234, not 123. It's just that this would match the EVEN_INT pattern, not ODD_INT. So you wouldn't get a token recognition error, but if you consistently use only ODD_INT in the grammar, you would get an error saying that an ODD_INT was expected but an EVEN_INT was found.
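The effect of having both rules can be imitated in plain Java. This is a rough sketch, assuming the two patterns above are the only lexer rules; the class OddEvenSketch, its methods, and the wording of the message are hypothetical and do not reproduce ANTLR's actual error output.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the two-rule approach; not ANTLR's code.
final class OddEvenSketch {
    private static final Pattern ODD_INT = Pattern.compile("[0-9]*[13579]");
    private static final Pattern EVEN_INT = Pattern.compile("[0-9]*[02468]");

    // Classifies a whole digit run the way the two lexer rules would.
    static String tokenType(String digits) {
        if (ODD_INT.matcher(digits).matches()) {
            return "ODD_INT";
        }
        if (EVEN_INT.matcher(digits).matches()) {
            return "EVEN_INT";
        }
        return "ERROR";
    }

    // Stands in for a parser rule that accepts only ODD_INT.
    static String expectOdd(String digits) {
        String type = tokenType(digits);
        if (type.equals("ODD_INT")) {
            return "ok";
        }
        return "expected ODD_INT but found " + type + " '" + digits + "'";
    }

    public static void main(String[] args) {
        System.out.println(expectOdd("123"));
        System.out.println(expectOdd("1234"));
    }
}
```

Because every digit run matches one of the two patterns in full, the complaint always covers the whole number rather than a trailing fragment.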
You're great! Thank you. Now I understand where the problem was.
– Cevz
Nov 8 at 21:56
answered Nov 8 at 18:51
sepp2k