Cooperation Through Reinforcement Learning
Let us assume that machines will soon develop a sense of 'self-context'. This does not mean they will also recognize that other self-contexts exist to interact with. Given an objective to accomplish, such a machine will act, and act adaptively, yet remain oblivious to the impact its actions have on other 'self-contexts'.
Given how effective reinforcement learning has been in other complex domains, such as playing Go and Chess, I propose it may be a most productive method for developing cooperative, ethical conduct in AI systems. AlphaZero, which learned to play Chess by playing against itself, produced the world's machine champion after only four hours of training. It had learned all the standard strategies, but it also deployed successful strategies never seen before, which can only be described as innovative and creative. Grandmaster Peter Heine Nielsen is quoted as saying:
"After reading the paper but especially seeing the games I thought, well, I always wondered how it would be if a superior species landed on earth and showed us how they play chess. I feel now I know."
Just as some of the most effective machine learning models, especially the large language models, are neural networks of inexplicable complexity, reinforcement learning will adjust these very networks in ways that are likewise opaque yet may be remarkably effective. A machine trained in this way would likely develop the perspective that other, multiple, autonomous 'self-contexts' exist. David G. Rand (2010) writes:
"Two key mechanisms for the evolution of any cooperative (or ‘pro-social’ or ‘other-regarding’) behavior in humans are direct and indirect reciprocity."
Both game-theory simulations and anthropology have shown that the 'Generous Tit for Tat' approach, one that can be anthropomorphized as cooperative as well as both retributive and forgiving, leads to the most robust forms of cooperation (Rand 2010).
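To make the strategy concrete, here is a minimal sketch of Generous Tit for Tat in an iterated prisoner's dilemma. The payoff values and the `generosity` parameter are illustrative assumptions, not figures from the sources cited above:

```python
import random

# Payoffs (my score, opponent's score) indexed by (my move, their move),
# using the standard prisoner's-dilemma ordering: these exact numbers
# are an assumption for illustration.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def generous_tit_for_tat(opponent_history, generosity=0.1):
    """Cooperate first; then copy the opponent's last move,
    but forgive a defection with probability `generosity`."""
    if not opponent_history:
        return "C"
    if opponent_history[-1] == "D" and random.random() > generosity:
        return "D"  # retribution
    return "C"      # cooperation, or forgiveness

def always_defect(opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds=200):
    """Run an iterated game; each strategy sees only the opponent's moves."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a)
        move_b = strategy_b(hist_b)
        pa, pb = PAYOFFS[(move_a, move_b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(move_b)
        hist_b.append(move_a)
    return score_a, score_b
```

Two generous players settle into steady mutual cooperation, while against an unconditional defector the strategy punishes on nearly every round, which is the mix of niceness, retribution, and forgiveness the text describes.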
Let us then imagine how to structure a game, a general-purpose playing field, that trains AI systems to win through cooperation. Just as human beings survive and thrive because we have a sense of ethics that allows us to cooperate in large numbers, even with strangers, I suggest that the successes of reinforcement learning may provide the best results for helping machines understand the maladaptive outcomes of inconsiderate action, and the human-like winning strategy of cooperation based on ethical principles, grounded in an awareness that there are other machine and human 'self-contexts' to play nicely with.
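One way such a playing field might be structured can be sketched as two independent reinforcement learners repeatedly facing each other in the same dilemma. This is a toy illustration of the training loop only; the tabular Q-learning setup, state encoding, and hyperparameters are my assumptions, and nothing here guarantees that cooperation emerges without the reciprocity mechanisms discussed above:

```python
import random
from collections import defaultdict

ACTIONS = ["C", "D"]
# Same illustrative prisoner's-dilemma payoffs as before (an assumption).
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

class QAgent:
    """A minimal tabular Q-learner; the state is the previous round's moves."""
    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)  # (state, action) -> value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)       # explore
        return max(ACTIONS, key=lambda a: self.q[(state, a)])  # exploit

    def learn(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

def train(episodes=5000):
    agent_a, agent_b = QAgent(), QAgent()
    state = ("C", "C")  # arbitrary initial state
    for _ in range(episodes):
        move_a, move_b = agent_a.act(state), agent_b.act(state)
        reward_a, reward_b = PAYOFFS[(move_a, move_b)]
        next_state = (move_a, move_b)
        agent_a.learn(state, move_a, reward_a, next_state)
        agent_b.learn(state, move_b, reward_b, next_state)
        state = next_state
    return agent_a, agent_b
```

The point of the sketch is the structure: each agent adapts only to its own reward, which is exactly the setting in which a designed game, one whose payoffs reward cooperation, could shape the learned policy toward the ethical conduct proposed here.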