Which collation to use so that `ş` and `s` are treated as unique values?











The issue is that ş and s are interpreted by MySQL as identical values.



I'm new to MySQL, so I have no idea which collations would view them as unique.



The collations I've tried that don't work are:




  1. utf8_general_ci

  2. utf8_unicode_520_ci

  3. utf8mb4_unicode_ci

  4. utf8mb4_unicode_520_ci


Does anybody know which collation to use?



P.S. I also need the collation to handle emoji and other non-Latin characters. As far as I know about MySQL and collations, only the unicode collations can do this. Is that right?










mysql collation mysql-5.7






asked Nov 8 at 23:52 by Anthony
edited Nov 9 at 0:03












  • Those two characters are considered equivalent at the primary (base-letter) level, at least by Unicode's collation standard. unicode.org/reports/tr15
    – Anthony
    Nov 9 at 0:04










  • @Anthony I know, because that's the very issue I'm facing right now, lol. Thanks nonetheless.
    – Anthony
    Nov 9 at 0:06










  • I'm saying there wouldn't be a Unicode-conforming collation that treats those characters as distinct.
    – Anthony
    Nov 9 at 0:07










  • @Anthony Ahh, I see what you're saying now. My buddy actually just said unicode_bin will work, since it doesn't strip accents.
    – Anthony
    Nov 9 at 0:17










  • That makes sense, since unicode_bin simply compares characters as code points. But note that this isn't what collation actually is; that's the most common misconception about how collation works. unicode.org/faq/collation.html
    – Anthony
    Nov 9 at 1:57
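To make the difference discussed in the last two comments concrete, here is a minimal Python sketch. It does not reproduce any MySQL collation. It assumes a crude model in which an accent-insensitive (`_ci`-like) comparison decomposes each string to NFD and drops the combining marks, while a binary (`_bin`-like) comparison compares code points directly. The function names are hypothetical.

```python
import unicodedata


def binary_equal(a: str, b: str) -> bool:
    """Model of a *_bin comparison: equal only if the code points match."""
    return a == b


def primary_key(text: str) -> str:
    """Crude model of a primary-strength, accent- and case-insensitive key.

    NFD splits 'ş' (U+015F) into 's' (U+0073) + COMBINING CEDILLA (U+0327);
    dropping the combining marks leaves only the base letters.
    """
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


def primary_equal(a: str, b: str) -> bool:
    """Model of a *_ci comparison: equal if the primary keys match."""
    return primary_key(a) == primary_key(b)


if __name__ == "__main__":
    print(binary_equal("ş", "s"))   # False: different code points
    print(primary_equal("ş", "s"))  # True: cedilla ignored at primary level
    print(primary_equal("Ş", "s"))  # True: case is ignored as well
```

Real collations use the Unicode Collation Algorithm's weight tables rather than this shortcut, but the outcome for ş versus s is the same: the two differ only in the accent, which is exactly what the collations listed in the question ignore.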


















1 Answer (accepted)

utf8_turkish_ci and utf8_romanian_ci treat ş as a letter distinct from s, as shown in http://mysql.rjweb.org/utf8_collations.html



(Plus, of course, utf8_bin.)
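A tailored collation such as utf8_turkish_ci differs from a generic one by promoting certain accented letters to letters of their own. The Python sketch below models only that idea, under loose assumptions: the set of tailored letters and the rule that each one sorts immediately after its base letter are simplifications for illustration, and `tailored_key` and `TURKISH_SEPARATE_LETTERS` are hypothetical names, not MySQL's algorithm.

```python
import unicodedata

# Assumption for illustration: Turkish letters written with a diacritic that
# count as separate letters, each sorting right after its base letter.
TURKISH_SEPARATE_LETTERS = {"ç", "ğ", "ö", "ş", "ü"}


def tailored_key(text: str) -> tuple:
    """Return a sort/equality key in which the tailored letters stay distinct.

    Each character becomes (base_letter, rank). Rank 1 marks a tailored
    letter, so 'ş' -> ('s', 1) sorts after 's' -> ('s', 0) and before
    't' -> ('t', 0). All other accents are dropped, as in a generic *_ci key.
    """
    key = []
    # Casefold, then recompose so that a decomposed 's' + U+0327 is seen as 'ş'.
    for ch in unicodedata.normalize("NFC", text.casefold()):
        decomposed = unicodedata.normalize("NFD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        if not base:
            continue  # a lone combining mark carries no base letter
        rank = 1 if ch in TURKISH_SEPARATE_LETTERS else 0
        key.append((base, rank))
    return tuple(key)


if __name__ == "__main__":
    print(tailored_key("ş") == tailored_key("s"))  # False: ş is its own letter
    print(tailored_key("é") == tailored_key("e"))  # True: other accents ignored
    print(sorted(["t", "ş", "s", "sa"], key=tailored_key))
```

MySQL's actual Turkish and Romanian collations are defined by their own weight tables; the sketch only shows why a tailoring can keep ş apart from s while still ignoring case and other accents.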



For your added question: you are looking for a "character set" (not a "collation") that can represent Emoji and other non-Latin characters, and UTF-8 is the one to use. In MySQL, it is called utf8mb4. The collations associated with it are named utf8mb4_...; the two above are also available as utf8mb4_turkish_ci and utf8mb4_romanian_ci (along with utf8mb4_bin). Collations control ordering and equality, which is what the first part of your question, about s and ş, is asking about.



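MySQL's CHARACTER SET utf8 uses at most 3 bytes per character and is a subset of utf8mb4, which uses up to 4. Either can handle all the "letters" in the world, but only utf8mb4 can handle Emoji and some Chinese characters, because those lie outside the Basic Multilingual Plane and need 4 bytes in UTF-8.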
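The 3-byte versus 4-byte limit can be checked from Python. This is a sketch, not MySQL code: it assumes only that MySQL's utf8 character set accepts characters whose UTF-8 encoding is at most 3 bytes, and `fits_mysql_utf8` is a hypothetical helper name.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_mysql_utf8(text: str) -> bool:
    """Hypothetical check: True if every character needs at most 3 UTF-8
    bytes, i.e. could be stored in MySQL's 3-byte utf8 character set."""
    return all(utf8_length(ch) <= 3 for ch in text)


if __name__ == "__main__":
    # s (ASCII), ş (Latin Extended-A), 中 (CJK, BMP), 😀 (emoji, outside BMP)
    for ch in ["s", "ş", "中", "😀"]:
        print(f"U+{ord(ch):04X} {ch!r}: {utf8_length(ch)} byte(s)")
    print(fits_mysql_utf8("Şişli"))  # True
    print(fits_mysql_utf8("hi 😀"))  # False: the emoji needs 4 bytes
```

Any text for which such a check returns False needs a utf8mb4 column.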






answered Nov 9 at 1:07 by Rick James
edited Nov 9 at 22:36






























             
