Reinforcement learning (RL) is inspired by behavioral psychology. Its main goal is to learn how an agent should interact with its environment in order to maximize cumulative reward.
In 1963, Donald Michie built a machine out of 304 matchboxes and beads that applied RL to the game of tic-tac-toe. Twenty-six years later, the field took a significant step forward when Christopher Watkins developed the Q-learning algorithm, which measures the long-term value of an agent's actions.
RL has recently received renewed attention as practitioners have found new applications by combining existing algorithms, such as Q-learning and SARSA, with deep learning methods. This combination is called deep reinforcement learning, and it enables researchers to tackle much larger-scale problems.
The main components of an RL model are: agent, environment, action, state, reward, policy, and Q-value.
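To make these components concrete, here is a minimal sketch of the tabular Q-learning update in Python. The states, actions, and parameter values are illustrative assumptions, not part of any production system; the update rule itself is the standard Q-learning formula.

```python
from collections import defaultdict

# Illustrative hyperparameters (assumed values, not from the source).
ALPHA = 0.1   # learning rate: how strongly each observation updates Q
GAMMA = 0.9   # discount factor: how much future rewards count today

Q = defaultdict(float)  # maps (state, action) pairs to learned Q-values

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step: nudge Q(state, action) toward the observed
    reward plus the discounted value of the best next action."""
    best_next = max(Q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

# Hypothetical fundraising example: the agent chooses "call" while the
# donor is in state "new_donor", observes a reward, and the donor moves
# to state "engaged".
q_update("new_donor", "call", 1.0, "engaged", ["call", "email", "wait"])
```

Repeating this update over many observed interactions is what lets the agent learn a policy: in each state, it can simply pick the action with the highest Q-value.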
The objective of RL in fundraising is to keep the probability of a donation high while continuing to build and improve the relationship with a donor or prospect. We are essentially asking RL to tell fundraisers what to do, and when, to maximize a gift or strengthen a relationship.
By analyzing the giving history and the interactions between a fundraiser and a specific donor, the algorithm recommends the best date to take action in order to increase the probability of a donation.
One of the strengths of RL is its capability to evaluate the long-term effect of an action, rather than only its present, short-term impact. For example, the algorithm may suggest making a phone call in the next 15 days, which could be perceived as a misstep because it seems too persistent. However, the algorithm has learned that taking this action increases the probability of a donation at the optimal time for the donor.
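This trade-off can be sketched with a discounted return, the quantity RL actually optimizes. The reward sequences below are made-up numbers chosen purely to illustrate how an action that looks costly now can still have the higher long-term value.

```python
GAMMA = 0.9  # discount factor: weight of rewards t steps ahead is GAMMA**t

def discounted_return(rewards, gamma=GAMMA):
    """Sum of rewards, each discounted by gamma raised to its time step."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

# Hypothetical per-period rewards (illustrative values only):
wait_and_see = [0.0, 0.0, 0.5]    # passive outreach, modest gift later
early_call   = [-0.2, 0.0, 1.0]   # mildly annoying call now, larger gift later

print(discounted_return(wait_and_see))  # ≈ 0.405
print(discounted_return(early_call))    # ≈ 0.61
```

Even though the early call starts with a negative reward, its discounted return is higher, which is exactly why the algorithm can recommend an action that looks too persistent in the short term.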
The value of this application lies in better organizing and managing outreach initiatives, prompting fundraisers to communicate more effectively overall. The algorithm explores an array of possible actions and their consequences, and proposes the ones with the best potential outcome.