If one bias value is high and the other is low, what does that indicate?

I am working with a fully connected neural network in which I initialize the biases to zero. During training, one bias takes on a large positive value and the other a negative value. I want to classify my data into two classes. What do these bias values tell us, and how do they help with the classification problem?
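Roughly, the setup looks like this (a simplified sketch with made-up data, not my exact code; it assumes a softmax output with cross-entropy loss, trained with plain gradient descent in NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the real data: n samples, d features, two classes.
n, d = 200, 5
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# One fully connected layer: d inputs -> 2 output nodes, biases initialized to zero.
W = rng.normal(scale=0.01, size=(d, 2))
b = np.zeros(2)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

lr = 0.1
for _ in range(500):
    p = softmax(X @ W + b)                # forward pass
    grad = p.copy()
    grad[np.arange(n), y] -= 1            # d(cross-entropy)/d(logits) = p - one_hot(y)
    W -= lr * (X.T @ grad) / n
    b -= lr * grad.mean(axis=0)

print("biases after training:", b)        # the two biases mirror each other in sign
```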

      machine-learning

asked Nov 10 at 4:22 by R.joe
migrated from stackoverflow.com Nov 10 at 10:01


      This question came from our site for professional and enthusiast programmers.

          1 Answer

The biases cannot be interpreted independently of the weights. Large biases could simply mean that your weights are getting quite large, so the biases grow along with them; with weight decay applied properly, though, this shouldn't happen. Assuming you did use weight decay, then at first glance it suggests that your data can be separated well into two classes. Does your test accuracy reflect this? Can you get high classification accuracy on the test set?



However, in general it is not a good idea to over-interpret the weights and biases of a neural network. They are produced by gradient descent on a highly non-convex objective from a random initialization: if you ran the optimization again, you would get different weights and biases even at the same level of accuracy. It is better to run experiments on the network's outputs to check that it does what you want, rather than to interpret individual weights.

answered Nov 10 at 4:39 by Stephen Phillips
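As a rough illustration of both points (a sketch on made-up, well-separated data; the plain-NumPy loop and the weight-decay strength are assumptions for illustration, not the asker's actual setup):

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up, well-separated two-class data standing in for the real dataset.
n, d = 400, 5
X = rng.normal(size=(n, d))
y = (X[:, 0] > 0).astype(int)
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

W = rng.normal(scale=0.01, size=(d, 2))
b = np.zeros(2)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

lr, weight_decay = 0.1, 1e-3
for _ in range(1000):
    p = softmax(X_train @ W + b)
    grad = p.copy()
    grad[np.arange(len(y_train)), y_train] -= 1
    # L2 weight decay on W keeps the weights (and with them the biases) from
    # growing without reason; the biases themselves are left unregularized.
    W -= lr * ((X_train.T @ grad) / len(y_train) + weight_decay * W)
    b -= lr * grad.mean(axis=0)

# Judge the model by what it does on held-out data, not by its coefficients.
test_acc = (softmax(X_test @ W + b).argmax(axis=1) == y_test).mean()
print("||W|| =", np.linalg.norm(W), "biases =", b, "test accuracy =", test_acc)
```

If the weights stay modest under decay and the biases still end up far apart, that gap mostly reflects how separable the classes are, and the held-out accuracy is the more reliable way to confirm it.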

• Actually, I'm classifying the data into two classes without any hidden layer: just input nodes and 2 output nodes. I'm not sure whether I should attach 2 biases to the 2 output nodes in this case. The model reaches 98 percent accuracy, and I want to figure out why it performs so well. If I remove the biases, the accuracy drops sharply. Also, I'm initializing the biases to zero.
  – R.joe, Nov 10 at 4:46

• Without a hidden layer, isn't it just a linear classifier? How do you apply the non-linearity? The bias initialization shouldn't matter much either: if you are training with SGD, the biases will converge to high-performing values.
  – Stephen Phillips, Nov 10 at 4:51

• Yes, I'm not adding any hidden layer. My network simply consists of input nodes and 2 output nodes, with softmax as the activation function on the outputs. But I also add two biases. Should I? Without the biases the network's performance is very low, but with them it reaches 98 percent accuracy. I'm curious to know what's going on here and why these biases are so important.
  – R.joe, Nov 10 at 4:56

• Secondly, after training both biases have the same magnitude but opposite signs: one is positive and the other is negative. What does that tell us?
  – R.joe, Nov 10 at 4:57

• What you are doing is not a neural network; it is called logistic regression. I assume you are using cross-entropy as your loss? Either way, the biases are appropriate. Again, assuming you have weight decay and your weights are not too large, the large difference in the biases just means that your classes are well separated. If that is the case, you should expect one set of weights to be roughly the negative of the other, and thus one bias to be the negative of the other (up to noise).
  – Stephen Phillips, Nov 10 at 5:00
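To make the last comment concrete, here is a small numerical check (with hypothetical weights and biases, independent of the actual model) that a two-output softmax layer reduces to a single logistic unit which only sees the differences of the weights and of the biases:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(2)
x = rng.normal(size=5)                      # one input vector
w0, w1 = rng.normal(size=5), rng.normal(size=5)
b0, b1 = 1.7, -1.7                          # mirrored biases, as observed after training

# Two-output softmax probability of class 1 ...
z = np.array([x @ w0 + b0, x @ w1 + b1])
p_softmax = softmax(z)[1]

# ... equals a logistic regression that only sees the differences (w1 - w0) and (b1 - b0).
p_logistic = sigmoid(x @ (w1 - w0) + (b1 - b0))

print(p_softmax, p_logistic)                # identical up to floating-point error
```

Since only the difference b1 − b0 enters the decision, equal-magnitude biases of opposite sign are exactly what this symmetry predicts; and removing the biases forces the decision boundary through the origin, which is why accuracy can drop when the classes are not centered there.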