Write code! (Score 3, Informative)
For work experience, sign up on freelancing sites like oDesk. Take jobs just to do them. Nobody there knows how old you are. Even if all you can do is sysadmin -- well, admin some cloud services!
OK, now with 3.13M families:
# echo 'select child1_gender,count(*) from families where child2_gender = "M" and child2_day=2 group by child1_gender;' | mysql test
child1_gender count(*)
F 111608
M 112037
50.096% male. What if I remove the Tuesday constraint?
# echo 'select child1_gender,count(*) from families where child2_gender = "M" group by child1_gender;' | mysql test
child1_gender count(*)
F 783068
M 784087
50.03% male.
But you know, perhaps I'm not being literal enough. It's always possible to misencode a problem, and there's a lot of insistence that you have to handle the overlapping case of boy/boy. So, let's try a different mechanism. Let's literally do what the problem asks:
"I have two children, one of whom is a boy born on a Tuesday. What's the probability that my other child is a boy?"
For each family, if either of the children is a boy born on a Tuesday, return whether both children have the same gender (which, since one of them is a boy, means both are male).
# echo 'select child1_gender=child2_gender from families where (child1_gender="M" and child1_day=2) or (child2_gender="M" and child2_day="2") ' | mysql test | sort | uniq -c | sort -n
1 child1_gender=child2_gender
207934 1
223445 0
...heh! That's kind of neat! I think I shall play with this some more.
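For what it's worth, that split can be checked against exact enumeration. What follows is only a sketch, assuming (as the table appears to) that gender and weekday are independent and uniform for each child. The names `is_tuesday_boy`, `matching` and `both_boys` are made up for illustration and are not from the queries above.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 2  # same encoding as the table: day 2 is Tuesday

# Every (gender, day) a child can have: 2 genders x 7 days = 14 cases,
# assumed equally likely.
children = list(product("MF", range(7)))


def is_tuesday_boy(child):
    gender, day = child
    return gender == "M" and day == TUESDAY


# All 14 * 14 = 196 equally likely two-child families.
families = list(product(children, repeat=2))

# Keep families where either child is a boy born on a Tuesday,
# mirroring the WHERE clause of the query above.
matching = [f for f in families if is_tuesday_boy(f[0]) or is_tuesday_boy(f[1])]

# Of those, the families where both children are boys.
both_boys = [f for f in matching if f[0][0] == "M" and f[1][0] == "M"]

exact = Fraction(len(both_boys), len(matching))
print(len(matching), len(both_boys), exact, float(exact))
```

Under those assumptions, 27 of the 196 combinations match and 13 of them are boy/boy, so the exact share is 13/27, or about 48.1%. The simulated counts above give 207934 out of 431379, or about 48.2%.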
Alright. It's 4:21AM, I'm in a random hotel room with a $400 voucher from Delta, and somewhere, someone on the Internet is wrong.
This sounds like a job for SQL.
First, let's start with a table:
# echo "describe families" | mysql test
Field Type Null Key Default Extra
child1_gender char(1) YES NULL
child1_day int(11) YES NULL
child2_gender char(1) YES NULL
child2_day int(11) YES NULL
Now, let's put a million records in it.
# echo "select count(*) from families" | mysql test
count(*)
1025537
# echo "select * from families limit 10" | mysql test
child1_gender child1_day child2_gender child2_day
F 1 M 0
F 4 M 3
M 1 F 1
F 5 M 1
M 0 M 3
F 0 F 3
M 0 M 2
M 4 F 1
M 6 M 3
F 3 F 1
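The post doesn't show how the rows were generated. One way to build a comparable table, as a rough sketch: this uses Python's built-in sqlite3 in place of MySQL, and assumes each child's gender and weekday are drawn independently and uniformly. `build_families` and the seed are hypothetical, chosen only for illustration.

```python
import random
import sqlite3


def build_families(n, seed=0):
    """Create an in-memory families table holding n random two-child rows."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    # Same column names as the MySQL table described above.
    conn.execute(
        "CREATE TABLE families ("
        "child1_gender CHAR(1), child1_day INT, "
        "child2_gender CHAR(1), child2_day INT)"
    )
    # Assumption: gender is a fair coin flip, day is uniform over 0..6.
    rows = (
        (rng.choice("MF"), rng.randrange(7), rng.choice("MF"), rng.randrange(7))
        for _ in range(n)
    )
    conn.executemany("INSERT INTO families VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


conn = build_families(100_000)
print(conn.execute("SELECT COUNT(*) FROM families").fetchone()[0])
for row in conn.execute("SELECT * FROM families LIMIT 10"):
    print(row)
```

The queries in this post should run against such a table largely unchanged, though SQLite is pickier than MySQL about the mixed quoting in `child2_day="2"`.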
(We're going to define 2 as Tuesday.) Now, let's look at the problem statement:
"I have two children, one of whom is a boy born on a Tuesday. What's the probability that my other child is a boy?"
We're going to translate that, as in the parent post, to:
Select the gender of all second children where the first child was born on a Tuesday and the first child was male.
Select the gender of all first children where the second child was born on a Tuesday and the second child was male.
Or, in actual SQL:
select child2_gender,count(*) from families where child1_gender = "M" and child1_day=2 group by child2_gender;
select child1_gender,count(*) from families where child2_gender = "M" and child2_day=2 group by child1_gender;
The results?
# echo 'select child2_gender,count(*) from families where child1_gender = "M" and child1_day=2 group by child2_gender;' | mysql test
child2_gender count(*)
F 36593
M 36617
# echo 'select child1_gender,count(*) from families where child2_gender = "M" and child2_day=2 group by child1_gender;' | mysql test
child1_gender count(*)
F 36811
M 37031
So, in the first set, we see 50.02% male for the other child. In the second set, we see 50.15% male for the other child.
And in myself, I find a renewed respect for numerical simulation. Happy Tuesday!
Take a thousand families, with two children, where one of the children was a boy born on a Tuesday.
I don't mean a thousand theoretical families. I mean, let's say you straight up took one thousand real families that matched the above constraints, straight out of the census. No joke, you break out the SQL.
When you check the gender of the other child, you are going to see the breakdown of gender being 50% male, 50% female.
Now, I know there's a lot of fun handwaving going on. Here's the flaw, in a nutshell. There are indeed three possibilities, when one child is constrained to be a boy:
boy, girl
girl, boy
boy, boy
The mistake -- and it is a mistake, because when you actually run the experiment, the hypothesis is invalidated -- is thinking that each of the above cases is equally likely. Specifically, order of birth has been incorrectly elevated as a determining factor. So we see:
boy, girl: 33%
girl, boy: 33%
boy, boy: 33%
When we really should be seeing:
boy, boy: 50%
boy, girl: 25%
girl, boy: 25%
Or, more accurately:
same-gender, both male: 50%
different-gender: 50%
    boy first: 25%
    girl first: 25%
Another way to frame the query, with similar results, is to say:
Select the gender of all second children where the first child was born on a Tuesday and the first child was male.
Select the gender of all first children where the second child was born on a Tuesday and the second child was male.
You'll note the girl, girl families will show up in neither result set. So they can do nothing to skew the numbers.
The results of both queries will, predictably, be 50/50 male and female.
This is a good example of why framing a problem correctly is so difficult and critical. It's only because this problem is so amenable to experimental formulation that it's easily defensible.
(Note that the use of Tuesday was an excellent DoS against math geeks.)
(Note also, by the way, this is the exact opposite of the Monty Hall problem. In that problem, people are expecting:
Door 2: 50%
Door 3: 50%
When they really should be seeing:
Host Told You Where The Car Was: 66%
    Was Behind 3, Therefore Exposed 2: 33%
    Was Behind 2, Therefore Exposed 3: 33%
Host Didn't Tell You Where The Car Was: 33%
    Randomly Exposed 2: 16.5%
    Randomly Exposed 3: 16.5%
If you modify the Monty Hall problem, such that he opens a random door *which might actually expose the car*, then when he opens the door and you see a goat, it doesn't matter whether you switch or not.)
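The random-host variant is easy to simulate. A minimal sketch, assuming the player always starts on door 0 and the host opens one of the other two doors uniformly at random. Rounds where the car gets exposed are thrown out. `random_host_trial` and `switch_win_rate` are hypothetical names for illustration.

```python
import random


def random_host_trial(rng):
    """Play one round in which the host opens an unpicked door at random."""
    car = rng.randrange(3)
    pick = 0  # player always starts on door 0; by symmetry nothing is lost
    opened = rng.choice([1, 2])  # the host may expose the car
    if opened == car:
        return None  # car exposed, so this round is discarded
    remaining = 3 - pick - opened  # the door neither picked nor opened
    return car == remaining  # True if switching would win


def switch_win_rate(trials=200_000, seed=0):
    """Fraction of goat-revealing rounds in which switching wins."""
    rng = random.Random(seed)
    results = [random_host_trial(rng) for _ in range(trials)]
    goat_rounds = [r for r in results if r is not None]
    return sum(goat_rounds) / len(goat_rounds)


print(round(switch_win_rate(), 3))
```

Among the rounds where a goat is revealed, switching should win about half the time, versus about two thirds in the standard game where the host knowingly avoids the car.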
[This is Dan]
So, if they're so great, why does the boss have to put a gun to people's head?
Anyone can make an omelet with eggs. The trick is to make one with none.