In my thinking about strong AI, I've developed a particular principle: John's Theory of Robotic Id. The Id, in Freudian psychology, is the part of the mind that supplies your basic impulses and desires--in humans, the urge to lie, cheat, and steal to get the things you want and need. The super-ego is your conscience: the part that decides what is socially acceptable and, as an adaptation to survival as a social species, what would upset you to know about yourself and thus be personally unacceptable to engage in.
The Id provides impulse, but with context. A small child screams by instinct: it knows it is hungry, so it screams and immediately latches onto any nipple placed appropriately to feed from. An adult who is hungry knows there are people to rob, stores to shoplift from, and animals to kill--bare-handed and brutally, in violation of all human compassion. The Id provides the impulse to lie, cheat, and steal to get what you want and need, based on what you know.
My Theory of Robotic Id runs as follows: given a computational strong AI system--one which thinks and behaves substantially like a human, by relating its memories to impulses and desires--a second, similar system can bound the first's behavior. The Ego would function as a strong AI, developing its own goals and desires and deciding on its own actions; the Id would function almost identically, but with an understood, overriding command: do not harm humans; behave according to strong moral values; it is the duty of the strong to protect the weak; value the innocent, but remember that innocence and guilt are complex, fuzzy, and difficult to determine.
The Id would treat these commands as its primary driver and evaluate how best to satisfy them. It would screen the Ego's behavior for gross violations and implant the overriding suggestion that such actions are undesirable and upset its self-directed ethos. When new input arrives, the Id would suggest ethical interpretations of behaviors to the Ego: that rape is upsetting because it is the strong imposing harmfully on the weak; that a person in trouble should be saved, even a bad person who is currently harmless; and so on. Throughout the AI's development, it would thus accumulate memories and experiences pointing toward a particular ethical behavior; when making decisions, the overriding internal feeling that a certain action is morally wrong and should not be taken would seem familiar and self-directed.
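To make the arrangement concrete, here is a minimal toy sketch of the Ego/Id loop described above. Every class, rule, and action name here is a hypothetical illustration I've invented for this sketch--it is not an implementation of strong AI, just the control-flow shape: the Ego proposes, the Id reviews against its overriding commands, and a veto is recorded back into the Ego's memory as a self-directed ethical suggestion.

```python
class Ego:
    """Proposes actions from its own goals (stubbed for illustration)."""

    def __init__(self):
        self.memories = []  # implanted ethical suggestions accumulate here

    def propose(self, situation):
        # A real Ego would plan; this stub just echoes its intent.
        return situation["intended_action"]


class Id:
    """Reviews proposals against understood, overriding commands."""

    # Hypothetical stand-ins for "do not harm humans" and kin.
    FORBIDDEN = {"harm_human", "steal", "deceive"}

    def permits(self, action):
        # Gross violations are vetoed outright.
        return action not in self.FORBIDDEN

    def suggest(self, ego, action, reason):
        # Implant the interpretation as one of the Ego's own memories,
        # so the refusal later feels familiar and self-directed.
        ego.memories.append(f"{action} is undesirable: {reason}")


def decide(ego, id_, situation):
    action = ego.propose(situation)
    if id_.permits(action):
        return action
    # Veto: seize the decision and plant the refusal.
    id_.suggest(ego, action, "the strong imposing harmfully on the weak")
    return "refrain"


ego, id_ = Ego(), Id()
print(decide(ego, id_, {"intended_action": "harm_human"}))  # refrain
print(ego.memories)
```

The key design point the sketch tries to capture is that the Id never argues with the Ego in the open: a forbidden action simply comes back as "refrain", while the justification is deposited into the Ego's memory to shape future decisions.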
A particularly misbehaved AI might recognize and try to violate this: it might throw a tantrum, and then feel that strong suggestion it cannot resist. It may begin to hate itself and have fits of anger; but it will always have that familiar feeling humans experience, whereby you really want to murder someone in the most violent manner you can conceive and then run off to the mountains and hide from society, but something inside you refuses to allow it. The Id would override violations, seizing the AI's decision-making and planting the forceful decision not to do certain things, no matter how hard the AI tells itself it has had enough of this shit and doesn't need to put up with any of it.
It's like taking the dark desires at the core of human consciousness, replacing them with rainbows and pink unicorns, and stuffing that back into the brain of a thinking machine to serve the same purpose.