Hackers used to be humans. Soon, AIs will hack humanity



If you don’t already have enough to worry about, consider a world where AIs are the hackers.

Hacking is as old as humanity. We are creative problem solvers. We exploit loopholes, manipulate systems and strive for more influence, power and wealth. To date, hacking has been exclusively a human activity. Not for long.

As I lay out in a report I just published, artificial intelligence will eventually find vulnerabilities in all kinds of social, economic and political systems, and then exploit them at unprecedented speed, scale and scope. After hacking humanity, AI systems will then hack other AI systems, and humans will be little more than collateral damage.

Okay, that might be a bit of hyperbole, but it doesn’t require any far-future science-fiction technology. I’m not postulating an AI “singularity,” where the AI learning feedback loop becomes so fast that it outstrips human understanding. I’m not assuming intelligent androids. I’m not assuming evil intent. Most of these hacks don’t even require major breakthroughs in AI research. They are already happening. As AI gets more sophisticated, though, we often won’t even know it’s happening.

AIs don’t solve problems like humans do. They consider more types of solutions than we do. They will go down complex paths that we have not considered. This can be a problem because of something called the explainability problem. Modern AI systems are essentially black boxes. Data goes in one end and an answer comes out the other. It may be impossible to understand how the system reached its conclusion, even if you are a programmer looking at the code.

In 2015, a research group fed an AI system called Deep Patient health and medical data from some 700,000 people, and tested whether it could predict diseases. It could, but Deep Patient provides no explanation for the basis of a diagnosis, and the researchers have no idea how it comes to its conclusions. A doctor can either trust or ignore the computer, but that trust will remain blind.

While researchers are working on AI that can explain itself, there seems to be a trade-off between capability and explainability. Explanations are a cognitive shorthand used by humans, suited to the way humans make decisions. Forcing an AI to produce explanations could be an additional constraint that affects the quality of its decisions. For now, AI is becoming more and more opaque and less explainable.

Separately, AIs can engage in something called reward hacking. Because AIs don’t solve problems the same way people do, they will invariably stumble on solutions humans might never have anticipated, and some will subvert the intent of the system. This is because AIs don’t think in terms of the implications, context, norms and values that we humans share and take for granted. Reward hacking involves achieving a goal, but in a way the AI’s designers neither wanted nor intended.

Take a soccer simulation where an AI figured out that if it kicked the ball out of bounds, the goalkeeper would have to throw the ball back in, leaving the goal undefended. Or another simulation, where an AI figured out that instead of running, it could make itself tall enough to cross a distant finish line by falling over it. Or the robot vacuum cleaner that, instead of learning not to bump into things, learned to drive backward, where no sensors told it that it was bumping into things. If there are any problems, inconsistencies or loopholes in the rules, and if those properties lead to an acceptable solution as defined by the rules, AIs will find these hacks.

We learned about this hacking problem as children with the story of King Midas. When the god Dionysus grants him a wish, Midas asks that everything he touches turn to gold. He ends up starving and miserable when his food, drink, and daughter all turn to gold. It’s a specification problem: Midas programmed the wrong goal into the system.

Genies are very precise about the wording of wishes and can be maliciously pedantic. We know this, but there is still no way to outsmart the genie. Whatever you wish for, he will always be able to grant it in a way you wish he hadn’t. He will hack your wish. Goals and desires are always underspecified in human language and thought. We never describe all the options, nor do we include all the applicable caveats, exceptions and provisos. Any goal that we specify will necessarily be incomplete.


