Which collation to use so that `ş` and `s` are treated as unique values?
The issue is that MySQL treats `ş` and `s` as identical values.
I'm new to MySQL, so I have no idea which collations would treat them as distinct.
The collations I've tried, none of which work, are:
utf8_general_ci
utf8_unicode_520_ci
utf8mb4_unicode_ci
utf8mb4_unicode_520_ci
Does anybody know which collation to use?
P.S. I also really need the collation to handle emojis and other non-Latin characters and, as far as I understand MySQL and collations, the only collations able to do this are the unicode ones?
mysql collation mysql-5.7
those two characters are considered equivalent, at least by Unicode's collation standard. unicode.org/reports/tr15
– Anthony
Nov 9 at 0:04
@Anthony i know that because that's the very issue i'm facing right now lol. thanks nonetheless.
– Anthony
Nov 9 at 0:06
I'm saying there wouldn't be a unicode conforming collation that would treat those characters as unique.
– Anthony
Nov 9 at 0:07
@Anthony ahh i see what you're saying now. my buddy just actually said unicode_bin will work since it doesn't strip accents
– Anthony
Nov 9 at 0:17
That makes sense, since unicode_bin only treats characters as code points. But note that that isn't what collation actually is. That's the most common misunderstanding of how collation works. unicode.org/faq/collation.html
– Anthony
Nov 9 at 1:57
edited Nov 9 at 0:03
asked Nov 8 at 23:52
Anthony
1 Answer
accepted
`utf8_turkish_ci` and `utf8_romanian_ci` treat `s` and `ş` as distinct, as shown in http://mysql.rjweb.org/utf8_collations.html. (Plus, of course, `utf8_bin`.)
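To see why a `_bin` collation separates the two letters while an accent-insensitive one does not, here is a rough Python analogy. It is only a sketch: MySQL collations use weight tables rather than this logic, and the helper names `binary_equal` and `accent_insensitive_equal` are made up for illustration.

```python
import unicodedata


def binary_equal(a: str, b: str) -> bool:
    """Compare code point by code point, roughly what a _bin collation does."""
    return a == b


def accent_insensitive_equal(a: str, b: str) -> bool:
    """Illustrative stand-in for an accent- and case-insensitive comparison."""

    def fold(text: str) -> str:
        # NFD splits U+015F (s with cedilla) into "s" plus U+0327 COMBINING CEDILLA.
        decomposed = unicodedata.normalize("NFD", text)
        # Drop the combining marks, then ignore case.
        base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        return base.casefold()

    return fold(a) == fold(b)


print(binary_equal("\u015f", "s"))              # False
print(accent_insensitive_equal("\u015f", "s"))  # True
```

A language-specific collation such as the Turkish one sits between these two extremes: it still folds case, but it gives `ş` its own place in the alphabet instead of treating it as an accented `s`.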
For your added question: you are looking for a "character set" (not a "collation") that can represent Emoji and other non-Latin characters. UTF-8 is the one to use; in MySQL, it is called `utf8mb4`. The "collations" associated with it are named `utf8mb4_...`, so the counterparts of the collations above are `utf8mb4_turkish_ci`, `utf8mb4_romanian_ci`, and `utf8mb4_bin`. Collations control ordering and equality, as in the first part of your question about `s` and `ş`.
MySQL's `CHARACTER SET utf8` is a subset of `utf8mb4`. Either can handle all the "letters" in the world, but only `utf8mb4` can handle Emoji and some Chinese characters.
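The difference comes down to how many bytes a character needs in UTF-8: MySQL's `utf8` stores at most three bytes per character, while `utf8mb4` allows four. A small Python sketch of that check follows; the helper `fits_in_three_byte_utf8` is a hypothetical name used only for illustration, not anything MySQL provides.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes a single character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_three_byte_utf8(text: str) -> bool:
    """True if every character needs at most 3 bytes (the limit of MySQL's utf8)."""
    return all(utf8_length(ch) <= 3 for ch in text)


samples = {
    "s with cedilla (U+015F)": "\u015f",
    "common CJK ideograph (U+4E2D)": "\u4e2d",
    "CJK Extension B ideograph (U+20000)": "\U00020000",
    "emoji (U+1F600)": "\U0001F600",
}

for name, ch in samples.items():
    # Prints the label, the UTF-8 byte length, and whether 3-byte utf8 suffices.
    print(name, utf8_length(ch), fits_in_three_byte_utf8(ch))
```

Only the last two samples need four bytes, which is why they require `utf8mb4`.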
edited Nov 9 at 22:36
answered Nov 9 at 1:07
Rick James