Researchers find AI is Bad at Predicting GPA, Grit, Eviction, Job Training, Layoffs, and Material Hardship (venturebeat.com)
A paper coauthored by over 112 researchers across 160 data and social science teams found that AI and statistical models, when used to predict six life outcomes for children, parents, and households, weren't very accurate even when trained on 13,000 data points from over 4,000 families. From a report: They assert that the work is a cautionary tale on the use of predictive modeling, especially in the criminal justice system and social support programs. "Here's a setting where we have hundreds of participants and a rich data set, and even the best AI results are still not accurate," said study co-lead author Matt Salganik, a professor of sociology at Princeton and interim director of the Center for Information Technology Policy at the Woodrow Wilson School of Public and International Affairs. "These results show us that machine learning isn't magic; there are clearly other factors at play when it comes to predicting the life course."
The study [PDF], which was published this week in the journal Proceedings of the National Academy of Sciences, is the fruit of the Fragile Families Challenge, a multi-year collaboration that sought to recruit researchers to complete a predictive task by predicting the same outcomes using the same data. Over 457 groups applied, of which 160 were selected to participate, and their predictions were evaluated with an error metric that assessed their ability to predict held-out data (i.e., data held by the organizer and not available to the participants). The Challenge was an outgrowth of the Fragile Families Study (formerly Fragile Families and Child Wellbeing Study) based at Princeton, Columbia University, and the University of Michigan, which has been studying a cohort of about 5,000 children born in 20 large American cities between 1998 and 2000.
Re: Mod parent up (Score:2)
actually read the original paper (Score:2)
https://www.pnas.org/content/e... [pnas.org]
The study methodology was to solicit a wide variety of predictive techniques, to avoid the risk that any single one might be over-fit or underspecified for the task (a sketch of that kind of holdout comparison follows below).
So yes, the paper's results and conclusions are about the dataset or its underlying facts: life course is not highly predictable from inputs observable at earlier ages. The targets were certain outcomes at age 15, with inputs observed in earlier years.
However, it would be even more interesting to see if these previous varia
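For anyone curious what that evaluation setup looks like mechanically, here is a minimal sketch of a holdout comparison in scikit-learn. Everything here is synthetic and invented for illustration (feature counts, models, and the mostly-noise outcome); the Challenge's real features and error metric are described in the paper.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in: ~4,000 "families" with many survey features and
    # an outcome that is mostly noise (illustrative only).
    rng = np.random.default_rng(42)
    X = rng.normal(size=(4000, 100))
    y = 0.1 * X[:, 0] + rng.normal(size=4000)

    # The organizer keeps the held-out outcomes; participants see only the training half.
    X_train, X_held, y_train, y_held = train_test_split(X, y, random_state=0)

    for model in (Ridge(alpha=1.0),
                  RandomForestRegressor(n_estimators=200, random_state=0)):
        model.fit(X_train, y_train)
        mse = mean_squared_error(y_held, model.predict(X_held))
        print(type(model).__name__, round(mse, 3))

With an outcome this noisy, the simple and complex models land at nearly the same holdout error, which is the pattern the Challenge reported.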
Re: (Score:2)
The title should read: Researchers stumped to find data set that works as a good predictor of GPA, etc.
Indeed. When they say the model performed poorly, they don't say compared to what. Are human "experts" any better when looking at the same data? If not, then the problem is likely with the data, not with the program.
If you want to predict GPA, the most important factors are family income, IQ, and the outcome of the marshmallow test.
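If you wanted to test that claim, a toy version is a three-feature regression; everything below (features, coefficients, noise level) is invented purely for illustration, not taken from any study.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Hypothetical toy: GPA as a function of family income, IQ, and
    # marshmallow-test delay, plus noise. All coefficients are made up.
    rng = np.random.default_rng(7)
    n = 1000
    income = rng.normal(size=n)
    iq = rng.normal(size=n)
    delay = rng.normal(size=n)
    gpa = 2.8 + 0.20 * income + 0.30 * iq + 0.10 * delay + rng.normal(scale=0.5, size=n)

    X = np.column_stack([income, iq, delay])
    model = LinearRegression().fit(X, gpa)
    print(model.coef_)          # roughly recovers the invented weights
    print(model.score(X, gpa))  # in-sample R^2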
Re: (Score:2)
Also, (Score:2)
Re: (Score:3)
I just run true for that. It also tells me when I want a beer.
Re: (Score:2)
I want a beer
That's slightly more complex and it needs to know the time of day. Before 12: beer. After 12: whisky. After 2 and still awake: tequila.
Re: (Score:2)
lack of data (Score:4, Insightful)
Re: (Score:3, Interesting)
So, "artificial" "intelligence" presupposes a data set equal in size to the population?
Re: lack of data (Score:3)
Re: (Score:2)
Maybe not the population, but larger than this.
That was a rhetorical question. "Training a deep neural network" is just building a database of dumb statistical rules. They work in situations in which your statistical model is very simple and very clear and there is no need for real analysis and understanding.
and after training on those images is able to recognize an image of a dog that it has never seen before.
This is so precisely because recognizing a physical shape is a fairly simple problem and can be done with a glorified statistical package. The things mentioned in the summary have nothing to do with that, so they will fail even with a dataset twice th
Re: (Score:1)
The things mentioned in the summary could be reasonably modeled with a simple rules based classification scheme and therefore could be modeled with AI.
The dirty secret is they undoubtedly adjusted their models to meet diversity assumptions, which is where they diverge from actual human evaluation of the same and break. That said, even human-level intelligence only does so well at predicting the future behavior of other humans in a highly chaotic world. This breaks because whatever correlations underlay stere
Re: (Score:2)
The things mentioned in the summary could be reasonably modeled with a simple rules based classification scheme and therefore could be modeled with AI.
Sure, Jan, modelling complex social behavior is so easy that it will work 100% correctly if only your subtly oozing biases are applied.
Re: (Score:2)
They could perhaps be modeled in the aggregate - what percentage of students will have a given GPA, etc. For an individual student it's essentially trying to predict all their interactions with their graders, and for that you need (a) to decide that the universe is deterministic and free will is an illusion, and (b) really good position/velocity figures for the elementary particles in (at least) the solar system, so you can predict neuron firings.
Re: (Score:2)
"They could perhaps be modeled in the aggregate"
Exactly. This is my point about stereotypes: whatever correlation underlies them tends to hold in aggregate but fall apart when applied to individuals. Stereotypes are essentially the aggregate patterns found by large numbers of actual neural nets examining large populations. If you correct them away you've likely corrected out the most significant statistical correlations available. Not that I'm claiming they are strong correlations or accurate predictors, t
Re: (Score:2)
Re: (Score:2)
What is wrong with a bigger data set?
The thinking that a bigger data set will solve your real problem is wrong.
There are some things (1) that you can reasonably model with dumb statistics. These are either things with simple relationships that are tedious to compute only because there are so many of them, or things you've analyzed creatively, understand well analytically, and can explain why a certain statistical approach would work - that is, why fitting coefficients will produce predictions with a small error.
Then there are things (2) for which you need
not quantity but quality? (Score:2)
Or perhaps rather missing a crucial state variable in the data set?
Re: (Score:2)
As I recall, TensorFlow needs 70,000 points to recognize shoes (a sketch of that classifier is below).
But a lot of life turns on chance and interactions with other people.
The race is not to the swift or the battle to the strong, nor does food come to the wise or wealth to the brilliant or favor to the learned; but time and chance happen to them all.
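For reference, the 70,000 figure matches Fashion-MNIST, the stock Keras tutorial dataset of clothing images; here's roughly what that shoe-recognizing classifier looks like (layer sizes are just the tutorial defaults, nothing to do with the study):

    import tensorflow as tf

    # Fashion-MNIST: 70,000 28x28 grayscale images (60k train / 10k test)
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=5)
    model.evaluate(x_test, y_test)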
Re: (Score:3)
Re: (Score:2)
Actually handling
Re: (Score:3)
That's a very large data set by social science standards, and the methods should be able to handle it.
The predictors ranged from very simple to very complex (with strong regularization, of course); the complex ones didn't do that much better than the simple ones. A baseline set of linear and logistic regressions was competitive with the best of the machine learning methods, and all were lousy. That's a consequence of low intrinsic predictability, or, if you think about it another way, that the observed outcome has a large a
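A toy illustration of that low-intrinsic-predictability point (all numbers invented): when the outcome is mostly noise, even the true model has a low ceiling on R^2, so fancy methods can't beat simple ones by much.

    import numpy as np
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(1)
    n = 50_000
    x = rng.normal(size=n)
    signal = 0.3 * x              # everything the input can explain
    noise = rng.normal(size=n)    # luck / unmeasured factors
    y = signal + noise

    # Score the *true* model; it still can't beat the noise floor.
    print(r2_score(y, signal))    # about 0.08, despite perfect knowledge of the signal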
Re: (Score:3)
Statistics cannot solve every problem. The things reported here are hard even for professionals to figure out.
Those who work in hiring have probably hired someone they thought would be a really good fit, only to have them turn out to be a problem employee.
The problem isn't data points, but the fact that people will give faulty data.
College GPA: That 2.5 in high school may become a 3.5 GPA in college because after high school the person no longer has to take classes they really don't want to take, or with the profe
Re: (Score:2)
If there is no significant difference, you know your modelling technique is crap.
AI? (Score:1, Troll)
Oh... silly me, you mean Actual Idiots as in the people who believe this smelly dog poo.
Individuals (Score:3)
We keep trying to group people into nice neat categories so that we can exploit them and discriminate.
The real solution is understanding that the individual is the smallest group set, and is unique. Any categorization that ignores that is going to fail.
Group modeling is good for getting group statistics. AI can't account for individual outliers that defy standard categorizations.
People can't either (Score:1)
While, sure, AI can't account for individual outliers well, neither can people when it comes down to it, and in a lot of cases automated rules can actually increase accuracy; hence the use of checklists and such to improve outcomes in flight and surgery.
Note, the models they were trying to build don't have to be perfect, just good enough to enable authorities to better identify people more at risk and therefore in need of intervention.
Hopefully there'd be multiple layers of this, so even if somebody turns o
Re: (Score:2)
While, sure, AI can't account for individual outliers well, neither can people when it comes down to it, and in a lot of cases automated rules can actually increase accuracy; hence the use of checklists and such to improve outcomes in flight and surgery.
Note, the models they were trying to build don't have to be perfect, just good enough to enable authorities to better identify people more at risk and therefore in need of intervention.
Hopefully there'd be multiple layers of this, so even if somebody turns out to actually need help who didn't get it, they'd slide a bit further, more indicators would pop, then they'd get helped. We can only do the best we can.
If the AI models aren't able to be good predictors, to me that indicates that they aren't looking at the right things; something is missing, assuming the outcome can be predicted at all.
The problem, as I see it, is that people often place far more confidence in AI predictions simply because they are made by a computer and are then assumed to be unbiased and accurate. As with many models, they can provide some insight as to where to look, but one shouldn't assume that they have identified a set that includes all subjects of interest.
Re: (Score:2)
Note, the models they were trying to build don't have to be perfect, just good enough to enable authorities to better identify people more at risk and therefore in need of intervention.
We might be making the mistake of trying to use group statistics to identify individuals. IT CANNOT BE DONE that way. Why? Because there are always outliers who defy the group dynamics. And now, we've mis-categorized someone because AI modeling says they ought to be categorized that way. And now, we've possibly placed a not-so-veiled bigoted stigma upon people needlessly.
We are all individuals, and that ought to be our starting point, not the ending point. It is way more work to judge people for themselves
Re: (Score:2)
Related: GIGO (Score:1)
Re: (Score:2)
Re: (Score:2)
Wrong Data, not 'too little data' (Score:3)
Trying to predict people's GPA based on their cholesterol level will not work, no matter how much data you have.
Science works like this: (a) hypothesize that X will predict Y, (b) test, (c) conclude, (d) repeat.
They tested and found their hypothesis false. That does NOT mean that all other hypotheses are false. It just means they have to pick a different kind of data to feed into their AI.
Or what they call AI is really just a d20.
Re: (Score:2)
Trying to predict people's GPA based on their cholesterol level will not work, no matter how much data you have
Actually you sort of can, and therein lies the problem with a huge amount of these things.
The reason you can is that GPA and cholesterol are probably both correlated with some underlying factor, e.g. household income. However, GPA is probably conditionally independent of cholesterol given the household income. If you build a classifier, you will probably do considerably better than random, if that co
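That conditional-independence point is easy to demonstrate with a quick simulation; the generative story below is hypothetical, with invented coefficients. Cholesterol predicts GPA marginally, but the correlation vanishes once income is regressed out.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Hypothetical: household income drives both outcomes; GPA and
    # cholesterol are conditionally independent given income.
    income = rng.normal(size=n)
    gpa = 0.6 * income + rng.normal(size=n)
    chol = -0.5 * income + rng.normal(size=n)

    print(np.corrcoef(chol, gpa)[0, 1])  # clearly nonzero: marginal correlation

    # Regress income out of both; the residuals are ~uncorrelated.
    resid_gpa = gpa - np.polyval(np.polyfit(income, gpa, 1), income)
    resid_chol = chol - np.polyval(np.polyfit(income, chol, 1), income)
    print(np.corrcoef(resid_chol, resid_gpa)[0, 1])  # ~0 given income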
Re: (Score:2)
Re: (Score:2)
Unless you're excluding the better underlying predictors from the data, anything properly developed (maybe that's asking a lot) shouldn't get tripped up by anything that easily identifiable.
You'd hope so, but only in a perfect world.
We already have statistical techniques that allow us to identify these.
Only in some cases, and only if you have the things well labelled. In this case you're probably not trying to predict GPA from a single variable but from the entire collection. And none of the variables (even
Re: (Score:2)
The reason you can is that GPA and cholesterol are probably both correlated with some underlying factor
Prove that claim. The people, I am not going to call them scientists because of the quality of their work, from this article tested their hypothesis and it failed because it was stupid to begin with.
Re: (Score:2)
Prove that claim.
Wut. OK, that's getting pretty aggressive for a fairly informal discussion. It was drifting towards the hypothetical; you can often predict all sorts of stuff, but that doesn't make it *useful*.
Anyway, the article is paywalled. The summary has gems such as:
Re: (Score:2)
Unless you have a large number of students who are eating cheeseburgers rather than studying.
Simple reason why this is: (Score:2)
Re: (Score:2)
We have big, weighted decision trees that are highly deterministic.
Mostly these days people use deep neural networks rather than random forests.
Re: (Score:2)
We have big, weighted decision trees that are highly deterministic.
Those are called Expert Systems. What are discussed here are statistical classifiers (neural networks, etc), which basically divide things into groups in a large multidimensional space.
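For the curious, a minimal example of such a classifier in scikit-learn (synthetic data, nothing to do with the Fragile Families features): an ensemble of decision trees whose votes carve a 20-dimensional feature space into groups.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Synthetic multidimensional data: 20 features, 5 of them informative.
    X, y = make_classification(n_samples=2000, n_features=20,
                               n_informative=5, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    print(clf.score(X_te, y_te))  # holdout accuracy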
Comment removed (Score:5, Insightful)
Re: (Score:2)
"no one has yet come up with a reliable way to "win" the stock market"
The only way to win is not to play.
All gambling works like that.
Only the house comes out ahead.
Re: (Score:2)
Re: (Score:2)
The simple gritty truth is that it can only do things that people can already do, just faster.
Not really. AI can do some things that people can't do, and people can do many things that AI can't do. And AI can do stuff faster. As time goes on we'll push back the limits of what AI can do and increase the space of things it can do that people can't, and decrease the space of things it can't do that people can. But it seems likely to be a slow process, and one filled with surprises in both directions.
What were they expecting? (Score:2)
Was the "AI" supposed to find meaningful patterns in statistical noise? Are they trying to prove GIGO is really a thing?
Source material questionable? (Score:1)
Paper does not consider IQ??? huh?? (Score:2)
Cuts into my liberal side (Score:2)
Re: (Score:1)
Don't need AI to predict economic success. The most reliable predictor is the economic status of the parents.
Re: (Score:2)
Social science is not science (Score:1)
Take with a grain of salt. (Score:2)
co-lead author Matt Salganik, a professor of sociology
The next sign is
Fragile Families Challenge
...
Challenge was an outgrowth of the Fragile Families Study (formerly Fragile Families and Child Wellbeing Study)
And this sign of selection bias:
It’s designed to oversample births to unmarried couples in those cities,
Now, for the pièce de résistance:
“Either luck plays a major role in people’s lives, or our theories as social scientists are missing some important variable,” added McLanahan. “It’s too early at this point to know for sure.”
Please note how this conclusion says either chance has a major role OR "social scientists are missing some important variable".
It isn't the AIs; it is the assumptions of social scientists (see the first sign above) that are the problem.
Garbage In, Garbage Out (Score:2)