Which collation to use so that `ş` and `s` are treated as unique values?

The issue is that ş and s are interpreted by MySQL as identical values.

I'm new to MySQL, so I have no idea which collations would view them as unique.

The collations that I've tried using which don't work are:

utf8_general_ci

utf8_unicode_520_ci

utf8mb4_unicode_ci

utf8mb4_unicode_520_ci

Does anybody know which collation to use?

P.S. I also really need the collation to interpret emojis and other non-Latin characters, and, to my knowledge of MySQL and collations, the only collation able to do this is unicode?

edited Nov 9 at 0:03

asked Nov 8 at 23:52

Anthony


those two characters are considered equivalent, at least by Unicode's collation standard. unicode.org/reports/tr15
– Anthony
Nov 9 at 0:04

@Anthony i know that because that's the very issue i'm facing right now lol. thanks nonetheless.
– Anthony
Nov 9 at 0:06

I'm saying there wouldn't be a unicode conforming collation that would treat those characters as unique.
– Anthony
Nov 9 at 0:07

@Anthony ahh i see what youre saying now. my buddy just actually said unicode_bin will work since it doesn't strip accents
– Anthony
Nov 9 at 0:17

That makes sense, since unicode_bin only treats characters as code points. But note that this isn't what collation actually is; it's the most common misconception about how collation works. unicode.org/faq/collation.html
– Anthony
Nov 9 at 1:57

mysql collation mysql-5.7

1 Answer

utf8_turkish_ci and utf8_romanian_ci -- as shown in http://mysql.rjweb.org/utf8_collations.html

(Plus, of course, utf8_bin.)
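
To see why a binary collation separates the two letters while the accent-insensitive ones merge them, here is a rough Python analogy. It is not how MySQL implements collations (MySQL compares collation weights, not stripped strings), and the helper names `strip_accents`, `equal_binary` and `equal_accent_insensitive` are made up for this sketch. It only shows that `ş` is a different code point from `s`, but reduces to `s` once the cedilla is ignored.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def equal_binary(a: str, b: str) -> bool:
    """Compare the encoded bytes, loosely analogous to a _bin collation."""
    return a.encode("utf-8") == b.encode("utf-8")


def equal_accent_insensitive(a: str, b: str) -> bool:
    """Compare after removing accents and case (a stand-in for _ci behaviour)."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()


if __name__ == "__main__":
    print(equal_accent_insensitive("ş", "s"))  # accents ignored
    print(equal_binary("ş", "s"))              # raw bytes compared
```

The language-specific collations sit between these two extremes: they still ignore case, but treat `ş` as a letter of its own.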

For your added question: You are looking for a "character set" (not a "collation") that can represent Emoji and other non-Latin characters -- UTF-8 is the one to use. In MySQL, it is utf8mb4. The "collations" that are associated with that are named utf8mb4_.... Collations control ordering and equality, as indicated in the first part of your question about s and ş.

MySQL's CHARACTER SET utf8 is a subset of utf8mb4. Either can handle all the "letters" in the world. But only utf8mb4 can handle Emoji and some Chinese characters.
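
The difference between the two character sets comes down to bytes per character: MySQL's `utf8` stores at most 3 bytes per character, while `utf8mb4` allows 4. A small Python sketch can check which characters need the fourth byte. The function names `utf8_byte_length` and `fits_in_utf8mb3` are hypothetical, chosen for illustration only.

```python
def utf8_byte_length(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 UTF-8 bytes."""
    return all(utf8_byte_length(ch) <= 3 for ch in text)


if __name__ == "__main__":
    samples = ["ş", "中", "😀", "\U00020000"]  # the last one is a rare CJK ideograph
    for ch in samples:
        print(f"U+{ord(ch):04X}", utf8_byte_length(ch), fits_in_utf8mb3(ch))
```

Any string for which `fits_in_utf8mb3` returns `False` contains a character outside the 3-byte range, so it needs a `utf8mb4` column.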

edited Nov 9 at 22:36

answered Nov 9 at 1:07

Rick James

