Achieve the Impossible

How AI Can Help Find the Optimal Solution through Iterations of Trial And Error

July 10, 2020 | AI

This post is the sixth entry in our series on AI application patterns across multiple industries. In this post, we will discuss how AI and machine learning can help find the optimal solution through iterations of trial and error.

Understanding Reinforcement Learning: The Machine Learning Type Typically Associated With the Trial and Error Process

When discussing AI application patterns, a key pattern that cannot be missed is goal-driven systems. We describe this as the use of machine learning and other cognitive approaches to give your agent (AI) the opportunity to learn by trial and error.

This approach is specifically aimed at finding the best solution to a problem. Iterative problem solving, resource optimization, and scenario simulation are examples of this pattern. Reinforcement learning approaches to machine learning dominate some of the most popular examples and use cases of this pattern.

Reinforcement learning is the process of learning to map situations to actions so as to maximize a scalar reward, also called a reinforcement signal. Unlike in most other forms of machine learning, the learning system is not told what action to take. Instead, it must figure out which actions produce the highest reward by attempting them.
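To make the idea of discovering rewarding actions by attempting them concrete, here is a minimal sketch of an epsilon-greedy agent on a three-armed bandit. The function name run_bandit, the payout means, and the exploration rate are illustrative assumptions, not something taken from a particular library or from the systems discussed in this post.

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Learn which arm pays best purely by trying arms and observing rewards."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms   # running average reward per arm
    counts = [0] * n_arms        # how often each arm has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit
        reward = rng.gauss(true_means[arm], 1.0)   # noisy scalar reward (assumed model)
        counts[arm] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

# Hypothetical payout means; the agent never sees these directly.
estimates, counts = run_bandit([0.2, 0.5, 1.0])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
```

The agent is never told which arm is best. It only sees the reward of the arm it pulled, and a small amount of random exploration keeps it from settling too early on a mediocre choice.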

For the most interesting and difficult situations, actions influence not just the immediate reward but also the situation that follows and, through it, all future rewards. Trial-and-error search and delayed reward are the two defining characteristics of reinforcement learning. Let’s take a deeper look into reinforcement learning and how it enables goal-driven systems.

The Reinforcement Learning Secrets

Reinforcement learning is probably the hardest field of ML to grasp because so many things are happening at the same time.

Chess is a prime example of how an RL algorithm works. The software knows the rules of the game and how to play, and it will go through the steps to finish the round. The only piece of information the program gets is whether it has won or lost the match. It keeps playing until it eventually wins a match, all while keeping track of the moves that worked. This was a non-technical way of explaining how reinforcement learning works. Let’s now take a look at a more technical explanation of RL.

Markov Decision Process

Reinforcement learning is a cycle of trial and error in which the agent takes a series of actions in an environment. At every moment the agent is in some state, and each action moves it from that state to a new one. The action may or may not yield a reward.

So, we can conclude that any epoch of learning (or an episode) can be described as a sequence of states, actions, and rewards. Each state depends only on the preceding state and action, and because the environment is stochastic, the transition between states is stochastic too. This process satisfies the Markov property.

The Markov property implies that the conditional probability distribution of future states depends only on the current state and not on the series of events that came before it. A decision process with this property is called a Markov decision process.
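As a rough illustration, the sketch below encodes a tiny Markov decision process as a lookup table and samples one episode from it. The two states ("cool" and "hot"), the two actions, and all probabilities and rewards are made up for the example. Note that step looks only at the current state and action, which is the Markov property in code form.

```python
import random

# Hypothetical two-state MDP: transitions[state][action] is a list of
# (probability, next_state, reward) tuples.
transitions = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "hot", 2.0)],
    },
    "hot": {
        "slow": [(0.5, "cool", 1.0), (0.5, "hot", 1.0)],
        "fast": [(1.0, "hot", -10.0)],
    },
}

def step(state, action, rng):
    """Sample (next_state, reward) from the current state and action only."""
    outcomes = transitions[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights)[0]
    return next_state, reward

def sample_episode(start, length, rng):
    """Follow a uniformly random policy and record (state, action, reward)."""
    episode, state = [], start
    for _ in range(length):
        action = rng.choice(sorted(transitions[state]))
        next_state, reward = step(state, action, rng)
        episode.append((state, action, reward))
        state = next_state
    return episode

episode = sample_episode("cool", 5, random.Random(0))
```

Each entry of episode is one (state, action, reward) triple, so the whole episode is exactly the kind of sequence described above.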

The Categories of Reinforcement Learning

Today, there are many different types of reinforcement learning algorithms, and each one focuses on something different. Despite this variety, reinforcement learning algorithms can be classified into two main categories: model-based and model-free. The following is a brief explanation of each category.

1. Model-Based

These algorithms are designed to learn from their observations how the world (its dynamics) works, and then plan a solution using that model. Once they have a model, they use some form of planning to find the best policy. They are known to be data-efficient. However, they struggle when the state space is too large. Dynamic programming methods are examples of model-based approaches, because they need complete knowledge of the environment, such as transition probabilities and rewards.
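The sketch below shows value iteration, one dynamic programming method, planning on a fully known model. The two-state model, its probabilities and rewards, and the discount factor of 0.9 are assumptions chosen only for illustration.

```python
GAMMA = 0.9  # discount factor (assumed)

# Hypothetical known model: MODEL[state][action] is a list of
# (probability, next_state, reward) tuples.
MODEL = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "hot", 2.0)],
    },
    "hot": {
        "slow": [(0.5, "cool", 1.0), (0.5, "hot", 1.0)],
        "fast": [(1.0, "hot", -10.0)],
    },
}

def q_value(values, state, action):
    """Expected return of taking `action` in `state`, then following `values`."""
    return sum(p * (r + GAMMA * values[s2]) for p, s2, r in MODEL[state][action])

def value_iteration(tol=1e-8):
    """Repeat the Bellman optimality update until values stop changing."""
    values = {s: 0.0 for s in MODEL}
    while True:
        delta = 0.0
        for s in MODEL:
            best = max(q_value(values, s, a) for a in MODEL[s])
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values

values = value_iteration()
# The greedy policy picks, in each state, the action with the highest value.
policy = {s: max(MODEL[s], key=lambda a: q_value(values, s, a)) for s in MODEL}
```

No interaction with the environment is needed here: the algorithm only reads the transition probabilities and rewards. That is why this family is data-efficient, and also why it becomes impractical once the table of states grows very large.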

2. Model-Free

Model-free algorithms do not need to learn a model of the environment or store every combination of states and actions. They can be classified into the following two groups, depending upon the training’s ultimate objective.

Policy-Based Methods

They look to find the best policy, be it deterministic or stochastic. In this group, you will find policy gradient algorithms such as REINFORCE. The benefits of these algorithms are better convergence and effectiveness in high-dimensional or continuous action spaces.

Policy-based approaches are essentially an optimization problem, in which we search for the maximum of a policy function. This is why we also use algorithms such as Hill Climbing and Evolution Strategies (ESs).
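As a hedged illustration of treating policy search as optimization, the sketch below applies simple hill climbing to a deterministic policy in a five-cell corridor. The corridor, the distance-based penalty, and the one-parameter-per-state policy are all assumptions invented for this example.

```python
import random

GOAL = 4          # corridor positions 0..4; the agent starts at 0
MAX_STEPS = 20    # episode length cap

def episode_return(theta):
    """Return of the policy 'move right in state s if theta[s] > 0, else left'."""
    position, total = 0, 0.0
    for _ in range(MAX_STEPS):
        move = 1 if theta[position] > 0 else -1
        position = max(0, position + move)
        total -= GOAL - position   # penalty shrinks as the agent nears the goal
        if position == GOAL:
            break
    return total

def hill_climb(iterations=2000, noise=1.0, seed=0):
    """Perturb the policy parameters and keep only strict improvements."""
    rng = random.Random(seed)
    best_theta = [0.0] * GOAL      # one parameter per non-goal state
    best_return = episode_return(best_theta)
    for _ in range(iterations):
        candidate = [t + rng.gauss(0.0, noise) for t in best_theta]
        candidate_return = episode_return(candidate)
        if candidate_return > best_return:
            best_theta, best_return = candidate, candidate_return
    return best_theta, best_return

best_theta, best_return = hill_climb()
```

The search never estimates a value for any state. It only compares the total return of whole policies, which is the defining trait of this family. In this toy setup, an all-left policy scores -80 and the always-right policy scores -6, so the climb has a clear slope to follow.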

Value-Based Methods

They look to find the optimal value function. A significant part of this group is an algorithm family called Q-learning, which learns to maximize the Q-value, the expected return of taking a given action in a given state. Q-learning is an important part of reinforcement learning. Other algorithms include Value Iteration and SARSA.
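A minimal sketch of tabular Q-learning follows, assuming a five-cell corridor in which the agent starts at the left end and receives a reward of 1 for reaching the right end. The environment, the hyperparameters, and the helper names are illustrative choices, not a reference implementation.

```python
import random

N_STATES, GOAL = 5, 4   # corridor 0..4; reaching state 4 ends the episode
ACTIONS = (0, 1)        # 0 = move left, 1 = move right

def env_step(state, action):
    """Deterministic corridor dynamics with a reward of 1 at the goal."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(100):                    # cap on steps per episode
            # Epsilon-greedy behavior, with random choice on ties.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = env_step(state, action)
            # Q-learning target: reward plus discounted best next Q-value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

q = q_learning()
greedy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
```

After training, greedy holds the action with the highest Q-value in each non-goal state. The agent only receives a reward at the goal, yet the update rule propagates that value backwards, so states far from the goal also learn which direction pays off.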

At the intersection of policy-based and value-based approaches, we have the Actor-Critic approaches, which seek to optimize both the policy and the value function.

Understanding Deep Reinforcement Learning

Deep neural networks have been used to model environmental dynamics (model-based), improve policy search (policy-based), and estimate the value function (value-based). Research on the value function has produced a model called Deep Q-Network (DQN). This model, along with its many upgrades, is credited with some of the most impressive developments in the field (e.g. Atari).

Deep reinforcement learning incorporates artificial neural networks into a reinforcement learning system, allowing software-defined agents to learn the optimal behaviors in a simulated environment to achieve their objectives. That is to say, it combines function approximation and target optimization, mapping state-action pairs to expected rewards. If that sounds complicated, here is deep reinforcement learning explained in plainer language.

Deep reinforcement learning is a subset of machine learning and AI where advanced systems can learn from their behaviors in a manner similar to how humans learn from experience. Intrinsic in this form of machine learning is the rewarding or penalizing of an agent based on its behavior. Actions are rewarded (reinforced) if they lead the agent to the desired result.

A machine continues to learn through a sequence of trials and errors, making this technology well suited to dynamic environments that change continuously. While reinforcement learning has existed for many years, it has been combined with deep learning only recently, and this has produced remarkable results.

The “deep” part of reinforcement learning indicates several (deep) layers of artificial neural networks that mimic the design of a human brain. Deep learning demands a lot of computing resources and large volumes of training data. However, data volumes have grown exponentially in the past few years, while the cost of computing power has fallen significantly, allowing for the proliferation of deep learning applications.

After the highly publicized defeat of a Go grandmaster by DeepMind’s AlphaGo, the prospects of deep reinforcement learning were brought to the attention of many. Deep reinforcement learning does not only excel at playing Go; it has also reached human-level mastery in other games. These games include Atari, poker, chess, and other video/online games. The applications of deep reinforcement learning are not limited to games. Practical applications can be found in the following sectors as well.

Training

AI toolkits like Psychlab, DeepMind Lab, and OpenAI Gym provide the training environments required to propel large-scale development of deep reinforcement learning. These open source tools are designed to train DRL agents. We will continue to see rapid growth in practical implementations as more companies apply deep reinforcement learning to their own specific business use cases.

Automotive

The automotive industry has a varied and extensive dataset that can fuel deep reinforcement learning. DRL will help optimize manufacturing, vehicle servicing, and overall automation in the industry; it is already in use for autonomous vehicles. The automotive industry is defined by reliability, cost, and quality, and DRL can provide new opportunities to increase quality, save costs, and ensure a better safety record using data from consumers, suppliers, and warranties.

Healthcare

From diagnosis and assessing appropriate treatment plans to clinical trials, new drug development, and automated treatment, there is tremendous potential for advancing healthcare through deep reinforcement learning.

Manufacturing

Intelligent robots in warehouses and distribution centers are becoming increasingly common for sorting millions of goods and distributing them to the right people. When a robot chooses a unit to place in a container, deep reinforcement learning allows it to acquire knowledge based on whether it has succeeded or failed. The robot then uses this knowledge to perform better in the future.

A real-world example of this is BRETT, previously known as Willow Garage Personal Robot 2 (PR2). BRETT is a robot that a team of researchers at UC Berkeley’s Department of Electrical Engineering and Computer Sciences worked with.

The algorithm controlling the robot’s learning included a reward component, which generated a score based on how well the robot was performing the job. Movements that took the robot closer to accomplishing the job resulted in higher scores than those that did not. The score was fed back through the neural net to allow the robot to learn which moves were better for the job.

Conversational Bots

The conversational user interface framework that enables AI bots utilizes the power of deep reinforcement learning. Thanks to deep reinforcement learning, the bots are rapidly mastering the complexities and terminology of language in several domains for automated speech and natural language comprehension.

3 Ways Machine Learning and Other Cognitive Approaches Are Helping Find the Optimal Solution through Iterations of Trial And Error

The following are some of the main ways machine learning and other cognitive approaches are helping find the optimal solution through iterations of trial and error.

1. Scenario Simulation

A scenario simulation is an artificial representation of a real-world event, used to achieve educational objectives through experiential learning. Developing a successful simulation scenario takes careful preparation, and the work can be divided into several phases.

Simulation scenarios are developed to evaluate, inform, and assist learners in identifying gaps in their understanding of the content or in how they apply it. Simulation may also be used to address patient health concerns, clinical events, communication difficulties, organizational skills, the potential for teamwork, and leadership skills. A real-world example of scenario simulation is PhishMe.

Hackers are constantly using spearphishing and phishing email attacks to target employees. To counter this danger, security firm PhishMe has built simulation software that can be used by business executives, human resources, IT, and auditors to educate their employees.

PhishMe assists in generating and distributing focused phishing messages with hyperlinks, including those requesting username and password, or attachments intended to engage users. The platform then collects user response metrics in a centralized storage location for organizations to create reports or share them with users.

2. Resource Optimization

Resource optimization helps organizations to satisfy resource requests optimally. Resource optimization analyses all open resource requests against available resources and sets out a strategy that helps you to meet the objectives of the company.

An AI resource optimization tool utilizes a range of sophisticated AI techniques, like genetic algorithms, business rules, and constraint programming, to intelligently generate tailored resource allocation timelines and strategies that reliably meet goal requirements while satisfying all rules and constraints. Using an advanced dynamic rescheduling algorithm, such a tool also enables organizations to respond to unforeseen events or changes in workload demands.
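To give a feel for how a genetic algorithm approaches resource allocation, here is a small sketch that assigns tasks to workers so that the busiest worker finishes as early as possible. The task durations, the number of workers, and every parameter are hypothetical, and the sketch ignores the business rules and constraint programming that a production tool would add.

```python
import random

def makespan(assignment, durations, n_workers):
    """Finish time of the busiest worker; lower is better."""
    loads = [0] * n_workers
    for task, worker in enumerate(assignment):
        loads[worker] += durations[task]
    return max(loads)

def genetic_schedule(durations, n_workers, pop_size=30, generations=200,
                     mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    n_tasks = len(durations)

    def cost(assignment):
        return makespan(assignment, durations, n_workers)

    # Each individual maps task index -> worker index.
    population = [[rng.randrange(n_workers) for _ in range(n_tasks)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        survivors = population[:pop_size // 2]    # elitism: keep the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            mum, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_tasks)       # one-point crossover
            child = mum[:cut] + dad[cut:]
            for i in range(n_tasks):              # mutation: random reassignment
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_workers)
            children.append(child)
        population = survivors + children
    return min(population, key=cost)

durations = [4, 7, 2, 9, 5, 3, 8, 6]   # hypothetical task lengths in hours
best = genetic_schedule(durations, n_workers=3)
best_cost = makespan(best, durations, 3)
```

Because the better half of each generation is carried over unchanged, the best schedule found never gets worse from one generation to the next. With 44 hours of work spread over three workers, no schedule can finish in under 15 hours, which gives a handy yardstick for judging the result.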

3. Iterative Problem Solving

Iterative problem solving in AI and machine learning takes inspiration from the software development lifecycle, which is essentially a process of solving a problem in well-defined steps. Like software development, problem-solving in machine learning and AI is an iterative process which includes the following steps:

  1. Recognizing the Problem
  2. Reviewing the Dataset
  3. Establishing a Realistic End Goal
  4. Listing Alternate Solutions
  5. Selecting a Solution
  6. Implementing the Solution
  7. Evaluation

By going through the above steps, machine learning algorithms can understand a problem and then solve it iteratively.

Final Word

In the past, machines mastered games such as chess and checkers and quickly found their way through mazes. Now, they can compete at Go, multi-player games like DoTA, and even more complex games, thanks to the power of reinforcement learning and other sophisticated computing capabilities.

For goal-driven systems, games are not the only area of application. Companies can use reinforcement learning and other machine learning and cognitive methods to equip their applications with the ability to learn by trial and error. This is valuable in any situation where you want the application to find the best or optimal solution to a problem. COVID-19 vaccine development is a recent example of this process.

In the final article in this series, we will look at how a combination of Smart Workflow, RPA and other semantic automation methods can be used to simplify everyday routine tasks and free human resources for more creative tasks.
