You should have covered things like factory farming and wild animal suffering more.
So much of the current world is tiled with suffering, and an unaligned AGI would have no reason to keep factory farming or wild nature (if it needs meat for some reason, surely growing it without the brain is more efficient, and giant forests sucking up sunlight and space seem bad for most plausible goals).
You should also have discussed s-risks more-- what would be more likely to increase s-risk, an unaligned AGI or a human-controlled AGI? Certainly there are a lot of sadistic humans. (see: https://www.lesswrong.com/posts/CtXaFo3hikGMWW4C9/the-case-against-ai-alignment)
My personal view is that both human-aligned AGI and unaligned AGI are risky gambles, with uncertain payoffs. Are the AGIs conscious? If so, do they really have analogous experiences to pleasure/pain? What sort of people end up controlling an aligned AGI? What if an unaligned AGI keeps around a bunch of GM cockroaches to scrub for contamination and accidentally causes crazy suffering because the cockroaches are minimally conscious and mostly in pain? The possibility space is incredibly vast.
Very interesting take that preemptively stops critics in their tracks! I agree with the conclusion. However, I do disagree that there is no upper bound on happiness even with time extensions. The other “irrational” blog (are you related?) has a nice article about it: https://www.optimallyirrational.com/p/the-aim-of-maximising-happiness-is
The way I phrased it in my article is that happiness is a positive deviation from expectation. At some point, when total control and knowledge of the universe are optimized, positive qualia may be extinguished altogether (nothing is unexpected anymore). This could come about in several ways, and I do think that utilitarianism still cannot expect us to prefer one of these ways over the others.
Not related, but the author has commented on my posts before, so I am familiar. Thanks for the comment.
I love this topic - I was always meaning to flesh out my thoughts on convergence towards some moral realism, but thank you for having done it first/better! I think moral realism is likely, but convergence to this is something I have way too much uncertainty about, and I don't really know if it must happen at all. If we imagine AIs being selected from a "space of similar models" by a company based on how useful they are, perhaps we should expect the selected ones to be AIs that "know what they should do, yet do [company's goals] anyway".
That was weirdly fun. Thanks for leading with definitions and the warning about the premises. I got hung up as you predicted on how much of this rides on AI consciousness happening mostly because it’s smart (though that’s certainly possible). But the rest, yeah, if you subscribe to the right kind of utilitarianism, is hard to argue with.
You’ve put together an interesting line of thinking here and I know I haven’t fully grasped it all. I like the idea of AI being better than humans at tiling the universe with qualia. Seems like a worthy goal! What I wonder though—and this isn’t totally connected to your arguments—is whether something in evolution causes thinking agents to strive for supremacy over all potential resource competitors (with the hedonism part just being an offshoot or tangential to that goal). So if AI follows this evolutionary pressure to remove all competition for resources, it may take humans out anyway, even if AI isn’t really any better at enjoying stuff. Not sure it matters, really. It’s just something I wonder about.
Also, I think you're getting confused about qualia. It's just an internal state telling you "I'm happy," or "I'm sad," or some other emotion. There's no guarantee that AI's internal state will be similar to ours. It can be anything at all.
Even though the mechanism behind gradient descent somewhat resembles evolutionary processes, the loss function can be anything at all, which means there's no guarantee AI systems will share our morality.
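To make that concrete, here's a minimal toy sketch (my own illustration, not anything from the post; it assumes PyTorch is installed): the same gradient-descent loop will optimize whichever loss function it is handed, with no built-in preference between objectives.

```python
# Toy illustration: gradient descent minimizes whatever loss it is given;
# the optimization procedure itself encodes no preference between objectives.
import torch

def train(loss_fn, steps=200, lr=0.1):
    x = torch.zeros(2, requires_grad=True)   # toy "model parameters"
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(x)                    # the objective is an arbitrary plug-in
        loss.backward()
        opt.step()
    return x.detach()

# Two arbitrary objectives: pull the parameters toward (1, 1) or toward (-3, 7).
print(train(lambda x: ((x - torch.tensor([1.0, 1.0])) ** 2).sum()))    # ~[1., 1.]
print(train(lambda x: ((x - torch.tensor([-3.0, 7.0])) ** 2).sum()))   # ~[-3., 7.]
```

Both runs converge, one to each target; nothing in the training loop cares which objective was chosen, which is the sense in which the loss "can be anything at all."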
Very interesting argument! One thing I don’t necessarily think is true is that qualia scales with efficiency at achieving goals - in fact, I think the opposite might be true. For instance I’d expect an animal species for whom it is a big achievement to have sex even a single time to get a greater hedonic reward from each sexual encounter than an animal species that typically has sex very frequently throughout their life.
Wouldn't the best strategy, from a risk averse/rule utilitarian perspective, be to just upgrade humans instead of replacing them? Instead of replacing them with AI that can pursue qualia better, just put them and AI on the same playing field. There are lots of possibilities for this, mind uploads, cybernetic implants, etc.
That being said, we already live in a world where all the humans alive gradually die off and are replaced by new intelligences. Humans born before 1900 are extinct, replaced entirely by humans born after 1900. No one acts like that is horrifying, because the new intelligences are other humans who are much like their predecessors. Would humans dying off and being replaced by AI be much different from that?
It depends on what kind of AI, obviously. If it was some strange, alien intelligence with nonhuman values (like Yudkowsky's example of the paperclip maximizer) it would indeed be terrible, whether you are a utilitarian or not. But if AIs have all the same goals, preferences, emotions, and qualia as humans (which seems to be what you are suggesting), then it seems like everything worth preserving about humans would be preserved in them. They wouldn't really be replacing humans; they'd be human in every way that mattered.
This is an interesting argument.
It seems like it would be hard to get human consciousness uploaded like that anytime soon, so we would miss out on all the time in between. It also seems implausible to do this without AGI, and if you buy the Yudkowskian AI thesis, you might just conclude that we should be happy it will happen.
On the other point, I’m not really sure which human values you are talking about here. If you mean creating humans with good well-being or whatever, that seems unlikely because of inner and outer alignment problems. I just mean that there would be something “it’s like to be” the AI, and that it could have positively valenced states, if that answers your question.
I think a lot of people don’t really care whether these “human values” get maximized for AIs; they just want it to happen to humans. I agree that your 1900-2000 analogy shows some of the hypocrisy in this, and it may be an underrated point.
I kind of assumed that part of the premise of your argument is that "what it's like to be" an AI is similar enough to what it's like to be a human that the AI would basically count as "human" for all intents and purposes, except "better" at achieving its goals. It seems like if that wasn't true, utilitarianism wouldn't consider AI to be equal or superior in value to humans.
I think if we invent friendly AGI we could simply pursue a two-pronged R&D strategy of developing new AI to experience qualia, and improving human ability to experience qualia. That way we wouldn't have to lose that much growth waiting for mind uploads to be invented.
Yeah, utilitarianism implies human extinction is required, which is why you should realize that utilitarianism is psychotic and evil.
While I may be less virulent than you in my critique, I agree that Utilitarianism ultimately has an end-of-Humanity consequence. The main point does not have to involve AGI.
It’s basically because this philosophy deconstructs and separates the nature of Values from Humans by choosing specific characteristics (consciousness, qualia etc.) and making those the only thing worth caring about. This inevitably leads to the question: can we make these characteristics without Humans?
If the answer is YES, then it follows that it’s more efficient to pursue those WITHOUT the annoying biological limitations of Homo Sapiens. Which then leads to its irrelevance at best, or worse (if Humans refuse Optimal Utility).
That’s exactly right. All forms of utilitarianism have this problem. That’s why ethics should be more focused on obligations than values. It’s a human system based on who we are rather than one optimized for machines.
Obligations seem to imply a particular value, though. One of the biggest issues for these other normative theories, imo, is that they have a hard time dealing with decision making under uncertainty (see Seth Lazar on Duty and Doubt, or MacAskill and Mogensen on the Paralysis Argument).
I care far more about keeping humanity alive than solving every conceivable edge case.
Can you not imagine that torturing some number of puppies is worth a human’s painless death? Can you then not double that torture and say it is worth the painless deaths of two humans, or maybe some amount of painful death for one human? Can you not keep scaling this up until you get to wiping humanity out?
What about genetic drift? Say you have some well-defined version of what a human is genetically (I’m skeptical that you can do this, but whatever). Evolution will presumably keep moving us, slowly but surely, to the point that 1) some “humans” are no longer within your definition of human, and 2) these “humans” aren’t distinguishable from some of the humans at the edge cases of your well-defined definition.
Point here being that when you base a fundamental part of your moral theory on a concept that doesn’t ‘carve reality at its joints’ (i.e. humanity), you’re gonna get into tons of problems. We need other concepts - like hedonic states, qualia, consciousness - that might be more reality-carving.
I’m going to choose the human over whatever number of puppies you suggest and I’m not concerned with the genetic differences between humans now and potential humans millions of years in the future. There is also a very clear distinction between humans and other animals so the idea that this is not “reality carving” is absurd. The human vs non human distinction is the most clear cut difference between different living things in our world. Nothing else has our capabilities and no other animal can do what we can.
Your approach to morality is incredibly disturbing. You think that having clear cut categories is the most important normative consideration. You think it’s so important that you would rather kill 8 billion people than loosen it. That’s insane.