How to deal with special characters inside XML string attributes?

I have input from a web form that is sent as XML and passes through an XSS filter, which canonicalises (decodes) all of the text before it reaches the server. On the client side we send:



<term><var>x</var><while exp="x&lt;3"><dostuff></dostuff></while></term>



Which turns into



<term><var>x</var><while exp="x<3"><dostuff></dostuff></while></term>



When I then parse the XML, it of course breaks, since a raw < is not allowed inside an attribute value.



Do I have to step through every attribute and re-encode it, or is there an easy way to do this in Groovy/Grails?










xml grails groovy

asked Nov 8 at 13:31
jambox

1 Answer

Whatever filter you're putting it through, it is corrupting your data, so get it fixed or scrap it quickly before it does irreparable harm.

In the general case, repairing your data isn't possible. If the filter is putting unescaped angle brackets into your data stream, you might be able to detect some of the cases, but in the worst case they will be indistinguishable from genuine markup.
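
To see the damage concretely, here is a minimal Java sketch (callable from Groovy as-is) that feeds both strings from the question to the JDK's built-in DOM parser. The class and method names are made up for illustration; only the `javax.xml.parsers`, `org.w3c.dom` and `org.xml.sax` calls are standard API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration, not part of any library.
class WellFormedCheck {

    /** Returns the exp attribute of the first "while" element, or null if the text is not well-formed XML. */
    static String readExp(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations, a common precaution for untrusted input.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Replace the parser's default error reporting: DefaultHandler simply rethrows fatal errors.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element loop = (Element) doc.getElementsByTagName("while").item(0);
            return loop.getAttribute("exp");
        } catch (SAXException notWellFormed) {
            // The parser rejected the document.
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sent = "<term><var>x</var><while exp=\"x&lt;3\"><dostuff></dostuff></while></term>";
        String filtered = "<term><var>x</var><while exp=\"x<3\"><dostuff></dostuff></while></term>";
        System.out.println("as sent:      " + readExp(sent));
        System.out.println("after filter: " + readExp(filtered));
    }
}
```

With the escaped input the parser hands back the decoded value `x<3`, so the application never has to unescape anything itself. With the filtered input it reports a fatal error, because XML forbids a literal `<` inside an attribute value.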







answered Nov 8 at 15:22
Michael Kay

• Isn't canonicalisation the standard way of preventing XSS though? I believe we're using the ESAPI library. I suspect what people usually do is either cook up their own encoding scheme, or reprocess the data once it reaches the server using contextual knowledge.
  – jambox
  Nov 8 at 16:01

• I don't know the software that you are using, but the evidence from your post is that it is corrupting your XML. It might just be the way it's configured, I don't know, but you need to fix the problem at source rather than trying to repair the damage.
  – Michael Kay
  Nov 8 at 17:24

• Fine, and thanks for the answer. However, if you're saying that a common XSS lib is corrupting data then you could back that up a little. What are alternative anti-XSS methods?
  – jambox
  Nov 8 at 17:27

• I'm only going on the information in your question. Something has corrupted your data and the only thing you have told us about is an "XSS filter".
  – Michael Kay
  Nov 8 at 20:33

• Well, it's ESAPI canonicalize. Like I say, it seems to be the standard way to prevent XSS attacks.
  – jambox
  Nov 8 at 21:48
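
One reading of "fix the problem at source" is to keep the filter away from the raw markup, parse the XML as it was sent, and only then look at the decoded values. The following is a sketch under that assumption, written in Java so it can be used from Groovy. `AttributeValues` and `collect` are hypothetical names, and the value-level check itself (ESAPI or anything else) is left out because the thread does not show how it is configured.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper illustrating "parse first, then inspect values".
class AttributeValues {

    /** Parses the XML exactly as received and returns every attribute value, already decoded. */
    static List<String> collect(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations, a common precaution for untrusted input.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            List<String> values = new ArrayList<>();
            NodeList elements = doc.getElementsByTagName("*"); // "*" matches every element
            for (int i = 0; i < elements.getLength(); i++) {
                NamedNodeMap attrs = elements.item(i).getAttributes();
                for (int j = 0; j < attrs.getLength(); j++) {
                    // getNodeValue() returns the value with entities such as &lt; resolved.
                    values.add(attrs.item(j).getNodeValue());
                }
            }
            return values;
        } catch (SAXException | ParserConfigurationException | IOException e) {
            throw new IllegalArgumentException("input could not be parsed as XML", e);
        }
    }

    public static void main(String[] args) {
        String sent = "<term><var>x</var><while exp=\"x&lt;3\"><dostuff></dostuff></while></term>";
        System.out.println(collect(sent));
    }
}
```

For the document in the question this yields a single value, `x<3`, which can then be validated, or encoded for whatever context it is finally written into, without the markup around it ever being touched.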

















