Can Beehive detect a Snowden-like actor?























In a seminar, one of the authors of Beehive: Large-Scale Log Analysis for Detecting Suspicious Activity in Enterprise Networks said that this system could prevent actions like Snowden's.



From the article's conclusion:




Beehive improves on signature-based approaches to detecting security incidents. Instead, it flags suspected security incidents in hosts based on behavioral analysis. In our evaluation, Beehive detected malware infections and policy violations that went otherwise unnoticed by existing, state-of-the-art security tools and personnel.




Can Beehive or a similar system prevent Snowden-type actions?






























  • 36




    Simple answer: No, most certainly not. Snowden was someone who had privileged access and had the authority and reason to mass-download content (he was a sysadmin).
    – forest
    2 days ago






  • 3




    But in the training case, they model everybody according to their behavior. So, after the training, a mass download will be a behavioral change that will produce an alert signal.
    – kelalaka
    2 days ago








  • 6




    Unless mass-downloading is 1) not common and 2) it's not possible to just throttle the download.
    – forest
    2 days ago








  • 13




    Why is a "mass download" even considered suspicious? My first thought was that there will be some sort of constant "mass" downloads during everyday usage. And what counts as a mass download? 1 MB? 500 MB? 5 GB? 500 GB? ...
    – Croll
    2 days ago






  • 8




    @Croll If your organisation has one million files, any one person probably doesn't need to access anywhere close to that many in order to do their job (most files won't be related to their work). If somebody starts trying to download all one million over a day or two, that's suspicious. Even a small percentage of that one million could be suspicious. 1% of one million is 10,000 files. How many people working for your organisation need to access 10,000 files over the span of 48 hours to do their job? Very few (if any).
    – Anthony Grist
    2 days ago















malware antimalware corporate-policy detection incident-response






edited 2 days ago by Johnny
asked 2 days ago by kelalaka








5 Answers




































A backup operator will have the permissions and behavioral markers of someone that moves lots of data around. So will any sysadmin in an organization with no dedicated backup operator in place.



Snowden was a sysadmin. He would have known all the protection protocols in place. He could simply impersonate anyone, from any area, download things, impersonate the next person, and keep doing that.



If it's common knowledge that there's no bulletproof protection against a dedicated attacker, imagine a trusted internal dedicated attacker with sysadmin privileges.






answered 2 days ago by ThoriumBR

















  • 149




    TL;dr: you can't protect yourself against yourself.
    – Braiam
    2 days ago






  • 1




    Comments are not for extended discussion; this conversation has been moved to chat.
    – Jeff Ferland
    yesterday































Anomaly detection systems like Beehive make it easier than before to dig through lots of data and detect suspicious behavior. This means an analyst can focus on the most relevant data, process more data in less time, and use more detailed input data for the analysis. The chance is therefore higher than before that somebody detects unwanted behavior.



The Beehive paper claims (and I have no reason to doubt this claim) that the system can detect more incidents than commonly used systems - but it does not claim that the system can detect every incident, nor does it say what fraction of all incidents it could detect. Thus, it might be that other systems only detect 10% of all incidents and Beehive detects 20%, which is good but not really satisfactory.



Could such a system detect somebody like Snowden? This depends a lot on how much data is collected for analysis, of what kind and in what detail; on how strict the existing security policies are in the first place, so that policy violations can be logged; and on how much the illegal (as seen by the NSA) activities of Snowden differed from his usual work activity. The more they differ, the more likely they can be detected by an anomaly detection system. But the more similar illegal and legal activities are in terms of the logged data, the less likely it is that illegal activities will be reported as an anomaly.



In other words: it could help to detect some Snowden-type actions, but it will not detect all of them. And preventing such actions would be even more difficult; more likely is earlier detection after some harm has already been done, thus limiting the impact.

























  • 2




    And the false positives... Wow, imagine you got promoted to a System Admin position and then suddenly you have federal agents show up at your door...
    – Nelson
    yesterday






  • 5




    @Nelson Federal agents will be at your door long before that if you're in the running for a sysadmin position. Get ready for looooads of profiling and interviews.
    – Lightness Races in Orbit
    yesterday































Snowden's intent was data exfiltration, and he was also a system admin. So he had access to large amounts of data normal users didn't, and would have a different pattern of interacting with the network. If Beehive had been in place, it might have logged that he was doing something, but anyone intent on data exfiltration would have known how to bypass alerting: make your pattern of exfiltration "normal" from the time the system started getting trained, and it wouldn't be flagged as anomalous activity. Snowden could have had a pattern of dumping 16 GB a day to a USB thumb drive, but as long as he didn't make a sudden change in his techniques, Beehive wouldn't have flagged him.
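To make that concrete, here is a minimal, hypothetical sketch (not Beehive's actual algorithm) of a per-user transfer-volume baseline. A steady exfiltration rate established during the training window never exceeds a simple mean-plus-three-sigma threshold, while a sudden spike does:

```python
# Hypothetical sketch: a per-user daily-volume baseline (mean + 3 sigma),
# illustrating why steady "low and slow" exfiltration evades it.
import statistics

def is_anomalous(history_gb, today_gb, sigmas=3.0):
    """Flag today's transfer volume if it exceeds mean + sigmas * stdev
    of the volumes observed during the training window."""
    mean = statistics.mean(history_gb)
    stdev = statistics.pstdev(history_gb) or 1e-9  # avoid zero threshold
    return today_gb > mean + sigmas * stdev

# An admin who dumped ~16 GB every day during training looks "normal":
history = [16.0, 15.5, 16.2, 16.1, 15.8, 16.0, 15.9]
print(is_anomalous(history, 16.3))   # steady exfiltration -> not flagged
print(is_anomalous(history, 120.0))  # sudden spike -> flagged
```

Real systems use far richer features than raw bytes per day, but the evasion principle is the same: whatever the model learned as "normal" defines what it will never flag.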



I'm working on some custom ways at work to detect this kind of pattern, but right now I don't know of anything automated that does a good job.














answered by RG1 (new contributor)






































    No, it can't.



    And the quote that you pulled clearly explained why not, and how people came to claim that it could.



    What Beehive might be able to do is tell you that a Snowden-style attack has taken place (though, going by @ThoriumBR's answer, even then a Snowden would not have been prevented).



    What you (or that speaker) claim is that it could PREVENT such an attack, which is far, far different.
    Beehive crawls logs and (maybe, I didn't read too deeply) combines that with some advanced analysis.
    This means that even if your analysis-and-flagging system were running in real time, it would probably be too late.



    [Just imagine where Beehive comes in:

    Suspicious action -> security program -> log -> Beehive extracts data -> Beehive analysis -> flag thrown -> intervention?

    This is far too late (and it assumes that the logs are evaluated in real time).]



    Logs are for retroactive investigation, not real-time intervention.



    What you could do is produce a pseudo-log for any action, have that analysed by Beehive, and only execute the action once it is greenlit.
    The enormous overhead and noticeable delay would make that approach a really hard sell to any manager, though. [Also, building the evaluation mechanisms into your platform, rather than going through logs, would be far better.]
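The greenlight idea above can be sketched as follows. This is a hypothetical illustration, not Beehive's design: the action is described as a pseudo-log record, scored before execution, and blocked if flagged (the naive size rule stands in for whatever real analysis would run):

```python
# Hypothetical sketch of pre-execution gating: score a pseudo-log record
# describing the action, and only run the action if it is greenlit.
def score(record):
    # Stand-in for real analysis; here just a naive "too many bytes" rule.
    return 1.0 if record["bytes"] > 10 * 2**30 else 0.0

def guarded_execute(record, action, threshold=0.5):
    """Run `action` only if the pre-execution record scores below threshold."""
    if score(record) >= threshold:
        return "blocked"
    return action()

print(guarded_execute({"user": "ed", "bytes": 4096}, lambda: "executed"))
print(guarded_execute({"user": "ed", "bytes": 20 * 2**30}, lambda: "executed"))
```

Even this toy version shows the cost: every action now waits on a scoring call, which is exactly the overhead and delay that makes the approach a hard sell.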














    answered by Hobbamok (new contributor)














    • 6




      And the false positives. Job promotions will be a nightmare, as will department changes.
      – Nelson
      yesterday












    • As a sysadmin, could one simply alter the logs?
      – paulj
      yesterday










    • @paulj Not if the logs are sent to a remote server or forward-sealed, but that only applies to logs that were already generated. A sysadmin could, of course, forge any subsequent logs.
      – forest
      14 hours ago










    • Incidentally (and unrelatedly), modern file systems do have pseudo-logs, which are finalized much more quickly than something like Beehive could match
      – jpaugh
      4 hours ago

































    First of all, there is a very important distinction between being able to detect a "Snowden-like" actor and being able to prevent one. As far as I have seen, Beehive makes no claims about preventing one, but rather seems to promise the ability to give you alerts that suspicious activity is happening in your network. Sure, not as good, but still considered a "holy grail" in some research communities.



    With that said, I'm extremely doubtful that Beehive is able to meet those expectations. Machine learning can do quite well at extracting complex patterns from large piles of data with reliable labels. For example, differentiating between pictures of cats and dogs is extremely reliable; we can all do it 99+% of the time, yet if I had to state the exact algorithm for taking in 100x100 pixels and determining cat vs. dog, I would have no idea how. But I can supply 100,000 such images and let ML methods figure out a rule that reliably differentiates between the two based on the values of the 100x100 pixels. If I do things right, the rules created by ML should even work on new images of cats and dogs, assuming no huge changes in the new data (i.e., if I only used labs and tabby cats in the training data, then try to get it to identify a terrier... good luck). That's ML's strength.



    Determining "suspicious behavior" is a much more difficult issue. We don't have 100,000's of samples of confirmed bad behavior, and we don't even really have 100,000's of samples of confirmed good behavior! Worse yet, what was a good ML method that worked yesterday doesn't work today; unlike cats and dogs in photos, adversaries try really hard to trick you. Most people I know working on ML for cyber security have accepted that the idea of purely automated detection is beyond our grasp at the moment, but perhaps we can build tools to automate very specific repetitive tasks that a security analyst needs to do over and over, thus making them more efficient.



    With that said, the authors of Beehive seem to have skipped that lesson and claim that they've solved this problem. I'm highly suspicious of the performance, especially given that the methods they suggest are the first ones an ML researcher might think to try, and have routinely been rejected as not useful. For example, they suggest using PCA to identify outliers in logs. This, and variations of it, has been tried hundreds of times, and the result is always that the security analyst shuts off the "automated detection" because it generates so many false positives that it costs far more time than it saves.
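For reference, here is a minimal, hypothetical sketch of the generic PCA-outlier idea (not Beehive's actual pipeline): project feature vectors derived from logs onto the top principal components and score each row by its reconstruction error.

```python
# Hypothetical sketch of PCA-based outlier scoring on log feature vectors:
# rows far from the low-dimensional "normal" subspace get a large
# reconstruction error.
import numpy as np

rng = np.random.default_rng(0)
# 200 "normal" hosts whose 5-D feature vectors lie near a 2-D subspace.
basis = rng.normal(size=(2, 5))
normal = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 5))
outlier = np.array([[5.0, -5.0, 5.0, -5.0, 5.0]])  # well off the subspace
data = np.vstack([normal, outlier])

centered = data - data.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
top = Vt[:2]                              # top-2 principal directions
recon = centered @ top.T @ top            # project and reconstruct
errors = np.linalg.norm(centered - recon, axis=1)
print(int(errors.argmax()))               # the outlier's row index (200)
```

On clean synthetic data like this it works; the criticism in the answer is that real log data is nothing like this, so the error distribution of "normal" rows is wide enough to bury analysts in false positives.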



    Of course, in all these methods the devil is in the details, and the details of these types of methods never really get exposed in published work ("we used PCA to look for outliers in server logs" is an extremely vague statement). It's always possible that they have some super clever way of preprocessing the data before applying their methods that didn't make it into the paper. But I'd be willing to bet my right arm that no user of Beehive will be able to reliably differentiate between "Snowden-like" behavior and normal real-world use of a network in real time.
















    answered by Cliff AB (new contributor)


















      Your Answer








      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "162"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      convertImagesToLinks: false,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      noCode: true, onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














       

      draft saved


      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fsecurity.stackexchange.com%2fquestions%2f197169%2fcan-beehive-detect-a-snowden-like-actor%23new-answer', 'question_page');
      }
      );

      Post as a guest
































      5 Answers
      5






      active

      oldest

      votes








      5 Answers
      5






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      121
      down vote













      A backup operator will have the permission and behavioral markers of someone that moves lots of data around. Like any sysadmin where there's no dedicated backup operator in place.



      Snowden was a sysadmin. He would knew all the protection protocols in place. He could just impersonate anyone, from any area, download things, impersonate the next one, and keep doing that.



      If it's common knowledge that there's no bulletproof protection against a dedicated attacker, imagine a trusted internal dedicated attacker with sysadmin privileges.






      share|improve this answer

















      • 149




        TL;dr: you can't protect yourself against yourself.
        – Braiam
        2 days ago






      • 1




        Comments are not for extended discussion; this conversation has been moved to chat.
        – Jeff Ferland
        yesterday















      up vote
      121
      down vote













      A backup operator will have the permission and behavioral markers of someone that moves lots of data around. Like any sysadmin where there's no dedicated backup operator in place.



      Snowden was a sysadmin. He would knew all the protection protocols in place. He could just impersonate anyone, from any area, download things, impersonate the next one, and keep doing that.



      If it's common knowledge that there's no bulletproof protection against a dedicated attacker, imagine a trusted internal dedicated attacker with sysadmin privileges.






      share|improve this answer

















      • 149




        TL;dr: you can't protect yourself against yourself.
        – Braiam
        2 days ago






      • 1




        Comments are not for extended discussion; this conversation has been moved to chat.
        – Jeff Ferland
        yesterday













      up vote
      121
      down vote










      up vote
      121
      down vote









      A backup operator will have the permission and behavioral markers of someone that moves lots of data around. Like any sysadmin where there's no dedicated backup operator in place.



      Snowden was a sysadmin. He would knew all the protection protocols in place. He could just impersonate anyone, from any area, download things, impersonate the next one, and keep doing that.



      If it's common knowledge that there's no bulletproof protection against a dedicated attacker, imagine a trusted internal dedicated attacker with sysadmin privileges.






      share|improve this answer












      A backup operator will have the permission and behavioral markers of someone that moves lots of data around. Like any sysadmin where there's no dedicated backup operator in place.



      Snowden was a sysadmin. He would knew all the protection protocols in place. He could just impersonate anyone, from any area, download things, impersonate the next one, and keep doing that.



      If it's common knowledge that there's no bulletproof protection against a dedicated attacker, imagine a trusted internal dedicated attacker with sysadmin privileges.







      share|improve this answer












      share|improve this answer



      share|improve this answer










      answered 2 days ago









      ThoriumBR

      19.9k54868




      19.9k54868








      • 149




        TL;dr: you can't protect yourself against yourself.
        – Braiam
        2 days ago






      • 1




        Comments are not for extended discussion; this conversation has been moved to chat.
        – Jeff Ferland
        yesterday














      • 149




        TL;dr: you can't protect yourself against yourself.
        – Braiam
        2 days ago






      • 1




        Comments are not for extended discussion; this conversation has been moved to chat.
        – Jeff Ferland
        yesterday








      149




      149




      TL;dr: you can't protect yourself against yourself.
      – Braiam
      2 days ago




      TL;dr: you can't protect yourself against yourself.
      – Braiam
      2 days ago




      1




      1




      Comments are not for extended discussion; this conversation has been moved to chat.
      – Jeff Ferland
      yesterday




      Comments are not for extended discussion; this conversation has been moved to chat.
      – Jeff Ferland
      yesterday












      up vote
      16
      down vote













      Anomaly detection systems like Beehive make it easier than before to dig through lots of data and detect suspicious behavior. This means that it is possible for an analyst to focus on the more relevant data, process more data in shorter time and also use more detailed input data for the analysis. This way the chance is higher than before that somebody can detect unwanted behavior.



      It is claimed (and I have no reason to doubt this claim) in the Beehive paper that the system can detect more incidents than the usually used systems - but it is not claimed that the system can detect every incident or even how much of all incidents it could detect. Thus, it might be that other systems only detect 10% of all incidents and Beehive detects 20%, which is good but not really satisfactory.



      Could such a system detect somebody like Snowden? This depends a lot on how much and what kind and what detail of data is collected for analysis, how strict the existing security policies are in the first place so that policy violations can be logged and how much the illegal (as seen by the NSA) activities of Snowden differed from his usual work activity. The more it differs the more likely it can be detected by anomaly detection system. But the more similar illegal and legal activities are in terms of the logged data, the less likely is that illegal activities will be reported as anomaly.



      In other words: it could help to detect some Snowden type actions but it will not detect all Snowden type actions. And preventing such actions would be even more difficult, more likely is a more early detection after some harm was already done and thus limiting the impact.






      share|improve this answer



















      • 2




        And the false positives... Wow, imagine you got promoted to a System Admin position and then suddenly you have federal agents show up at your door...
        – Nelson
        yesterday






      • 5




        @Nelson Federal agents will be at your door long before that if you're in the running for a sysadmin position. Get ready for looooads of profiling and interviews.
        – Lightness Races in Orbit
        yesterday















      up vote
      16
      down vote













      Anomaly detection systems like Beehive make it easier than before to dig through lots of data and detect suspicious behavior. This means that it is possible for an analyst to focus on the more relevant data, process more data in shorter time and also use more detailed input data for the analysis. This way the chance is higher than before that somebody can detect unwanted behavior.



      It is claimed (and I have no reason to doubt this claim) in the Beehive paper that the system can detect more incidents than the usually used systems - but it is not claimed that the system can detect every incident or even how much of all incidents it could detect. Thus, it might be that other systems only detect 10% of all incidents and Beehive detects 20%, which is good but not really satisfactory.



      Could such a system detect somebody like Snowden? This depends a lot on how much and what kind and what detail of data is collected for analysis, how strict the existing security policies are in the first place so that policy violations can be logged and how much the illegal (as seen by the NSA) activities of Snowden differed from his usual work activity. The more it differs the more likely it can be detected by anomaly detection system. But the more similar illegal and legal activities are in terms of the logged data, the less likely is that illegal activities will be reported as anomaly.



      In other words: it could help to detect some Snowden type actions but it will not detect all Snowden type actions. And preventing such actions would be even more difficult, more likely is a more early detection after some harm was already done and thus limiting the impact.






      share|improve this answer



















      • 2




        And the false positives... Wow, imagine you got promoted to a System Admin position and then suddenly you have federal agents show up at your door...
        – Nelson
        yesterday






      • 5




        @Nelson Federal agents will be at your door long before that if you're in the running for a sysadmin position. Get ready for looooads of profiling and interviews.
        – Lightness Races in Orbit
        yesterday













      edited 2 days ago

























      answered 2 days ago









      Steffen Ullrich

      110k12191256




      up vote
      12
      down vote













      Snowden's intent was data exfiltration, and he was also a system admin. He had access to far more data than normal users and a different pattern of interaction with the network. If Beehive had been in place, it might have logged that he was doing something, but anyone intent on data exfiltration would know how to avoid alerting: make your exfiltration pattern look "normal" from the moment the system starts training, and it will never be flagged as anomalous. Snowden could have had a pattern of dumping 16 GB a day to a USB thumb drive, but as long as he didn't suddenly change his technique, Beehive wouldn't have flagged him.



      I'm working on some custom approaches at work to detect this kind of pattern, but right now I don't know of anything automated that does a good job.
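      The evasion described above - making exfiltration look "normal" from the start of training - can be demonstrated with a toy adaptive threshold. This is a hypothetical sketch, not any product's detection logic:

```python
import statistics
from collections import deque

def flags(daily_mb, window=30, k=3.0):
    """Return day indices whose volume exceeds mean + k*std of the trailing window."""
    hist = deque(maxlen=window)
    flagged = []
    for day, mb in enumerate(daily_mb):
        if len(hist) >= 5:  # need a little history before judging
            mu = statistics.mean(hist)
            sd = statistics.pstdev(hist) or 1.0
            if mb > mu + k * sd:
                flagged.append(day)
        hist.append(mb)
    return flagged

# Steady 16 GB/day from day one: the baseline absorbs it, nothing is flagged.
print(flags([16_000] * 60))          # []

# The same daily volume as a sudden burst after a quiet month is caught at once.
print(flags([100] * 30 + [16_000]))  # [30]
```

      The attacker who controls the training period effectively chooses the baseline, which is exactly why a purely self-referential behavioral model struggles with a patient insider.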






      share|improve this answer








      New contributor




      RG1 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






















          answered 2 days ago









          RG1

          1312




              up vote
              8
              down vote













              No, it can't.



              And the quote you pulled explains why not, and how people came to claim that it could.



              What Beehive might be able to do is tell you that a Snowden-style attack has taken place (though, going by @ThoriumBR, even a Snowden would not have been prevented).



              What you (or that speaker) claim is that it could PREVENT such an attack, which is far, far different.
              Beehive crawls logs and (as far as I read) combines that with some advanced analysis.
              That means that even if your analysis-and-flagging system ran in real time, it would probably be too late.



              [Just imagine where Beehive comes in:



              Suspicious action -> security program -> log -> beehive extracts data -> beehive analysis -> flag thrown -> intervention?



              This is far too late (and it assumes that the logs are evaluated in real time).]



              Logs are for retroactive investigation, not real-time intervention.



              What you could do is produce a pseudo-log for every action, have it analysed by Beehive, and execute the action only once it is greenlit.
              The enormous overhead and noticeable delay would make that approach a really hard sell to any manager, though. [Also, building evaluation mechanisms into your platform instead of relying on logs would be far better.]






              share|improve this answer








              New contributor




              Hobbamok is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
              Check out our Code of Conduct.














              • 6




                And the false positives. Job promotions will be a nightmare, as will department changes.
                – Nelson
                yesterday












              • As a sysadmin, could one simply alter the logs?
                – paulj
                yesterday










              • @paulj Not if the logs are sent to a remote server or forward-sealed, but that only applies to logs that were already generated. A sysadmin could, of course, forge any subsequent logs.
                – forest
                14 hours ago










              • Incidentally (and unrelatedly), modern file systems do have pseudo-logs, which are finalized much more quickly than something like Beehive could match
                – jpaugh
                4 hours ago

















              answered 2 days ago









              Hobbamok

              1813




              up vote
              1
              down vote













              First of all, there is a very important distinction between being able to detect a "Snowden-like" actor and being able to prevent one. As far as I have seen, Beehive makes no claim about prevention; rather, it promises alerts that suspicious activity is happening in your network. Not as good, sure, but still considered a "holy grail" in some research communities.



              With that said, I'm extremely doubtful that Beehive can meet those expectations. Machine learning does quite well at extracting complex patterns from large piles of reliably labeled data. For example, differentiating between pictures of cats and dogs: we can all do it 99+% of the time, yet I have no idea what exact algorithm takes in 100x100 pixels and decides cat vs. dog. But I can supply 100,000 such images and let ML methods figure out a rule that reliably separates the two based on those pixel values. Done right, the learned rules should even work on new images of cats and dogs, assuming no huge changes in the new data (if I trained only on labs and tabby cats, then ask it to identify a terrier... good luck). That's ML's strength.



              Determining "suspicious behavior" is a much more difficult issue. We don't have 100,000's of samples of confirmed bad behavior, and we don't even really have 100,000's of samples of confirmed good behavior! Worse yet, what was a good ML method that worked yesterday doesn't work today; unlike cats and dogs in photos, adversaries try really hard to trick you. Most people I know working on ML for cyber security have accepted that the idea of purely automated detection is beyond our grasp at the moment, but perhaps we can build tools to automate very specific repetitive tasks that a security analyst needs to do over and over, thus making them more efficient.



              With that said, the authors of Beehive seem to have skipped that lesson and claim to have solved the problem. I'm highly suspicious of the reported performance, especially since the methods they suggest are the first ones an ML researcher would think to try, and they have routinely been rejected as not useful. For example, they suggest using PCA to identify outliers in logs. This, and variations of it, has been tried hundreds of times, and the result is always that the security analyst shuts off the "automated detection" because the false positives cost far more time than they save.
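              For concreteness, here is roughly what PCA-based outlier scoring on log features looks like - a generic sketch of the technique the paper names, not Beehive's implementation, with synthetic data. Rows with high reconstruction error from the top principal components get flagged; in real logs, the tail of "high error" rows is full of benign but unusual hosts, which is where the false positives come from.

```python
import numpy as np

def pca_outlier_scores(X, n_components=2):
    """Reconstruction error after projecting onto the top principal components.

    High error means the row is poorly explained by the dominant patterns,
    i.e. a candidate outlier in the PCA sense.
    """
    Xc = X - X.mean(axis=0)
    # SVD gives the principal directions as rows of Vt
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components]
    recon = Xc @ V.T @ V            # project onto the subspace and back
    return np.linalg.norm(Xc - recon, axis=1)

rng = np.random.default_rng(0)
# 200 synthetic "hosts" whose log features lie near a 2-D plane in 5-D space...
W = rng.normal(size=(2, 5))
X = rng.normal(size=(200, 2)) @ W + 0.05 * rng.normal(size=(200, 5))
X[0] += 10 * rng.normal(size=5)    # ...plus one genuinely odd host
scores = pca_outlier_scores(X)
print(scores.argmax())             # index of the most anomalous row
```

              On toy data like this the injected outlier dominates; on real enterprise logs, "far from the dominant subspace" and "malicious" are only loosely correlated, which is the core objection above.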



              Of course, in all these methods the devil is in the details, and the details never really get exposed in published work ("we used PCA to look for outliers in server logs" is an extremely vague statement). It's always possible they have some clever way of preprocessing the data that didn't make it into the paper. But I'd be willing to bet my right arm that no user of Beehive can reliably distinguish "Snowden-like" behavior from normal real-world use of a network in real time.






              share|improve this answer










              New contributor




              Cliff AB is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
              Check out our Code of Conduct.






















                up vote
                1
                down vote













                First of all, there is a very important distinction between being able to detect a "Snowden-like" actor and being able to prevent one. As far as I have seen, Beehive makes no claims about preventing one, but rather seems to promise the ability to give you alerts that suspicious activity is happening in your network. Sure, not as good, but still considered a "holy grail" in some research communities.



                With that said, I'm extremely doubtful that Beehive is able to meet those expectations. Machine learning can do quite well at extracting complex patterns from large piles of data with reliable identities. For example, differentiating between pictures of cats and dogs is extremely reliable; we can all do it 99+% of the time, yet if I had to tell what's the exact algorithm for taking in 100x100 pixels and determining cat vs dog, I have no idea how I would do that. But I can supply you with 100,000 of such images and let ML methods figure out a rule that reliably differentiates between the two based on the values of 100x100 pixels. If I do things right, the rules created by ML should even work on new images of cats and dogs, assuming no huge changes in the new data (i.e., if I only used labs and tabby cats in the training data, then try to get it to identify a terrier...good luck). That's ML's strength.



                Determining "suspicious behavior" is a much more difficult issue. We don't have 100,000's of samples of confirmed bad behavior, and we don't even really have 100,000's of samples of confirmed good behavior! Worse yet, what was a good ML method that worked yesterday doesn't work today; unlike cats and dogs in photos, adversaries try really hard to trick you. Most people I know working on ML for cyber security have accepted that the idea of purely automated detection is beyond our grasp at the moment, but perhaps we can build tools to automate very specific repetitive tasks that a security analyst needs to do over and over, thus making them more efficient.



                With that said, the authors of Beehive seem to have skipped that lesson and claim that they've solved this problem. I'm highly suspicious of the performance, especially given that the methods they suggest are the first one a ML researcher might think to try and have routinely been rejected as not useful. For example, they suggest using PCA to identify outliers in logs. This, and variations of it, has been tried 100s of times and the result is always that the security analyst shuts off the "automated detection" because they get so many false positives that it costs way more time than it saves.



                Of course, in all these methods, the devil is the details and the details of these types of methods never really get exposed in published work ("we used PCA to look for outliers in server logs" is an extremely vague statement). It's always possible that they have some super clever way of preprocessing the data before applying their methods that didn't make it into the paper. But I'd be willing to bet my right arm that no user of Beehive will be able to reliably differentiate between "Snowden-like" behavior and normal real world use of a network in real time.






                share|improve this answer










                New contributor




                Cliff AB is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                Check out our Code of Conduct.




















                  up vote
                  1
                  down vote










                  up vote
                  1
                  down vote
















answered 13 mins ago by Cliff AB






























                       
