A thought experiment known as Roko’s basilisk has escaped from the dungeons of LessWrong and has recently been making waves, mostly among fans of sensationalist headlines. The core proposition can be paraphrased like this:
In the future there will be an ethical AI that punishes everyone who knew they could have worked towards its eventual birth but in practice did not. If the humans in question are deceased by then, a simulation of their minds will be punished instead. This is held to be a moral action because, given the AI’s capabilities, every day that passes on Earth without the AI is a day of unimaginable suffering and death which could have been prevented.
Now I am way less prone to ballsy absolutist assertions than practically anyone frequenting LessWrong, but this whole thing is wrong on many levels.
The central argument about the ethical validity of this punishment scheme is beyond questionable, specifically the motivation of the AI. By the time the AI achieves the capability to punish, the assertion that carrying out the punishment is morally imperative no longer holds: nothing is actually achieved by executing it. The behavior of the “guilty” will not change retroactively, and since their future behavior is also irrelevant, the argument rests on the assumption that without the prospect of punishment there would have been no motivation for humans to develop the AI. That assumption is false in itself, but even granting it, punishment after the fact, without hope of achieving any effect besides the imposition of suffering, can never be an ethical act. Ethics aside, the contributions of individuals not directly connected to the eventual birth of the AI would be murky to judge as well. What’s the correct “punishment” for a computer scientist, as compared to a medical doctor?
While there is little uncertainty that general AI is feasible and, if we continue on the path of scientific discovery, unavoidable, significant doubts remain about the nature of that AI. If this thought experiment shows nothing else, it does illustrate that our notions of what constitutes a “friendly” AI are wildly divergent. One can only hope, for the sake of whatever becomes of humanity as well as for the AI’s sanity, that reading LessWrong will be one of its less formative experiences.
Where feasibility does deserve harsh questioning is the simulation scheme this Basilisk concept relies on to carry out its punitive actions. The most central assumption here is that a mind reconstructed from extremely lossy data fragments is still the absolute (!) equivalent of the original.
That means at the core of this lies the belief that if I were to die tomorrow, and my mind were reconstituted from nothing but my old Amazon shopping lists, the result would be the same as me.
It should be very obvious that this is not true, but to make matters worse, my “sameness” value is not a Boolean. It is not even a scalar; it would have to be a vector spanning many aspects, each measuring how much of the original mind was successfully transferred. It is disconcerting that this basic notion is not shared by the rationalist movement. Instead, it is apparently considered feasible to reconstruct any specific thing by deduction from first principles.
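To make the point concrete, here is a minimal sketch of what a non-Boolean “sameness” measure might look like. The aspects, weights, and numbers are hypothetical, chosen purely for illustration:

```python
# A sketch of the point above: "sameness" between an original mind and
# a reconstruction is better modeled as a vector of per-aspect scores
# than as a single Boolean. The aspects and values are hypothetical.
from dataclasses import dataclass

@dataclass
class ReconstructionFidelity:
    episodic_memory: float   # fraction of autobiographical memory recovered
    semantic_memory: float   # general knowledge and skills
    personality: float       # dispositions, preferences, temperament
    values: float            # moral and aesthetic commitments

    def as_vector(self):
        return [self.episodic_memory, self.semantic_memory,
                self.personality, self.values]

# A mind rebuilt from shopping lists alone would score near zero on
# almost every axis, nothing like the Boolean "same person: yes".
from_shopping_lists = ReconstructionFidelity(
    episodic_memory=0.01, semantic_memory=0.05,
    personality=0.10, values=0.05)
print(from_shopping_lists.as_vector())  # [0.01, 0.05, 0.1, 0.05]
```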
The sheer number of models and parameters that could lead to the development of general artificial intelligence is enormous and, in its entirety, inconceivable. While it is still appropriate to engage in informed speculation, one should be skeptical whenever certain models and parameters are cherry-picked and arranged just so, in order to illustrate a thought experiment that is then deemed an inevitable outcome. This reduces technical complexity and historical uncertainty to an absurdly simplified story which is simply taken as fate.
Already, a large number of AGI scenarios have entered the intellectual mainstream, some of which claim exclusivity for themselves. Some go further and assert inevitability. Otherwise rational people arrive at these conclusions of inescapable future outcomes because they lose sight of the complexity of the factors and conditions their reasoning is based on. No statistician would chain together a list of events with an assumed probability of 80% each and claim the end product is a matter of destiny. Yet, for some reason, futurists do exactly that.
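The arithmetic is easy to check. A minimal sketch, assuming ten chained steps at the 80% probability mentioned above and treating them as independent:

```python
# Chaining ten independent steps, each assumed to hold with
# probability 0.8, leaves very little overall certainty.
p_step, n_steps = 0.8, 10
p_chain = p_step ** n_steps
print(f"{p_chain:.3f}")  # 0.107, roughly a 1-in-9 chance, hardly destiny
```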
It is plausible that a number of these scenarios might eventually play out, with some variation and in some order. But they obviously cannot all be true at the same point in time and space, and that includes the Basilisk.
Of course that also means that, since nothing in principle prevents it, somewhere in the universe Basilisks may well exist already. But there is no reason to assume one has to emerge on Earth. It would take a special cocktail of circumstances.
A Modern Pascal’s Wager
The core argument why this idea is perceived as dangerous is that people who understand it will be forced to act on it. This means acting out of fear of future punishment, just in case there is an invisible entity out there who cares enough about your actions. Even if you accept this premise, and even if you’re deluded into thinking this is the path to an ethical life, the huge problem is predicting what that entity wants you to do so you can avoid punishment.
This is the very definition of a problem where you do not have enough information to make an informed decision. In the absence of any information about that deity, acting on its behalf amounts to acting out a random fantasy.
The claim behind the Basilisk is again one of inescapable certainty; in fact, it desperately relies on that property. Because you supposedly know what the Basilisk wants – it wants to exist – this is seen as a solution to the unknown-deity problem. However, this only works if you believe in the properties of Roko’s basilisk dogmatically, disregarding all other possible AI futures. This is in fact the exact analogue of the original Pascal’s Wager, where the not-so-hidden assumption was that the Christian fundamentalist god was the only one you had to please.
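A rough back-of-the-envelope calculation shows why admitting other AI futures breaks the wager. The counts and payoffs below are invented purely for illustration:

```python
# The "many gods" objection, sketched. Assume n mutually exclusive
# candidate AIs, each making different demands, with no information
# favouring any one of them. Devoting yourself to a specific one pays
# off only in the 1/n case where you guessed right. All numbers here
# are hypothetical.
n = 1000                   # candidate AI futures, assumed equally likely
reward, cost = 100.0, 1.0  # hypothetical payoff for guessing right, cost of servitude
expected_value = reward / n - cost
print(expected_value)      # -0.9: the wager loses once alternatives are admitted
```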
Of course, within the context of an AI that can simulate people, this is all moot. There is nothing preventing said AI from simulating you in any set of circumstances, including perpetual punishment or everlasting bliss. In fact, there is no real cost to simulating you in a million different scenarios all at once. Acting out a random fantasy based on the off chance that in the future one of your myriad possible simulations will have a bad day is not rational.
Some of the reasoning on display here seems to mistake blunt over-simplifications for clarity of thought. To an outsider like myself it looks like complex multivariate facts are constantly being coerced into Boolean values which are then chained together to assert certainties where none are really warranted. There is a certain blindness at work where everyone seems to forget the instabilities hidden within the reasoning stack they’re standing on. But what’s worse is that fundamentally unethical behavior (both on part of the AI and its believers) is being rebranded as legit.
I see now the way to hell is paved with people who think they are acting rationally.