Is the robot "face" screen going to be showing the live video of the person's face? If so, since presumably you don't have a Steadicam operator staying directly in front of each person represented by a robot at all times, this is going to look weird. It will be hard to even keep your face in frame as you naturally move around, swivel your chair, etc. Even if your face can somehow be properly framed, the front of your robot face (which itself swivels) will keep showing the sides of your face as you turn to look at various people.
This can be avoided only at the great expense of losing the live video of the person: you can just put a static picture of the person's face on the bot, but this seems a big step back from a regular videoconference, since you can't see the person's facial expressions.
Not to mention, this enhances a SINGLE nonverbal body language feature (direction of head pointing) while utterly destroying all other nonverbal information you get from a plain old videoconference, including overall posture, hand gestures, etc. The robot can't fold its arms, make a gesture, tilt its head side to side, etc.
I think this idea is quite a stretch.