This learning process through positive and negative reinforcement is similar to how humans learn many tasks . For instance , when a child completes their chores , they might receive positive reinforcement , such as a playdate with friends . Not doing their work could lead to negative reinforcement , such as losing digital device privileges . By mimicking this natural process of learning , DRL provides a promising approach to decision-making in the field of cybersecurity , enabling defenders to quickly adapt to changing situations and respond with greater efficiency .
“ It ’ s the same concept in reinforcement learning ,” says Chatterjee . “ The agent can choose from a set of actions . With each action comes feedback , good or bad , that becomes part of its memory . There ’ s an interplay between exploring new opportunities and exploiting past experiences . The goal is to create an agent that learns to make good decisions .”
To evaluate the efficacy of four deep DRL algorithms , the team leveraged Open AI Gym , an open-source software toolkit , as a foundation for creating a custom and controlled simulation environment .
The researchers incorporated the MITRE ATT & CK framework , developed by MITRE , and included seven tactics and 15 techniques used by three separate adversaries . Defenders were given 23 mitigation actions to halt or prevent an attack from progressing .
106 April 2023