Here are details of the hack:
The hackers used an exploit that has been known to Cisco for about two years. It's called CVE-2018-0171 and affects Cisco IOS and IOS XE software. Specifically, it's a bounds-checking error that can be attacked with UDP packets on one specific port. What the hackers did was execute a buffer overflow attack against the HTTP-based authentication. That is, they overrode the authentication with an executable payload that overwrites the memory responsible for denying unauthorized access, which gave them root.
Cisco has acknowledged that this is a potentially dangerous exploit but said that it won't issue a patch:
https://www.bleepingcomputer.com/news/security/cisco-warns-of-auth-bypass-bug-with-public-exploit-in-eol-routers/
Here's a quote from the linked article: Despite rating it as a critical severity bug and saying that its Product Security Incident Response Team (PSIRT) team is aware of proof-of-concept exploit code available in the wild, Cisco noted that it "has not and will not release software updates that address this vulnerability."
Instead, Cisco sent out warnings about which ports should be blocked from sending and receiving UDP packets.
I thought this was a statistically sound blog post, but there are two points worth considering that weren't mentioned:
1) The RT score is an average of an average. That is, each review is reduced to a binary variable, good or bad, and this is then averaged over all reviewers. It's worth considering whether Fandango changed either or both of these measurements. In some cases, for instance, it's not clear whether a review is positive or negative. It could be that, before the acquisition, a review with an equal amount of positive and negative sentiment defaulted to negative, and Fandango changed this so that it defaults to positive. Also, "average" itself is a variable concept: prior to 2016 the averages may have been unweighted, and now they may be weighted. The point: small changes in the measurement metrics could be what caused the more positive reviews -- not actual manipulation of the reviews or the reviewers themselves. (A sketch of this point follows these two items.)
2) It's worth looking at the ratio of positive to negative reviews for each reviewer, before and after the acquisition. If the reviewers selected after 2016, for instance, like more movies than not, and significantly more so than the pre-2016 reviewers, then this alone could account for the change after the acquisition.
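A minimal sketch of point 1, using made-up reviews, just to show that a different tie-break rule and a weighted average can shift the aggregate score without any individual review changing (the reviews and weights below are invented):

    # The same reviews produce different aggregate scores depending on the
    # tie-break rule and on whether some reviewers are weighted more heavily.
    reviews = [
        # (positive_sentences, negative_sentences, weight_if_top_critic)
        (3, 1, 2.0),   # clearly positive review by a "top critic"
        (2, 2, 1.0),   # mixed review: equal positive and negative sentiment
        (1, 3, 1.0),   # clearly negative review
        (2, 2, 1.0),   # another mixed review
    ]

    def score(reviews, tie_breaks_positive, weighted):
        total, weight_sum = 0.0, 0.0
        for pos, neg, w in reviews:
            if pos > neg:
                good = 1
            elif pos < neg:
                good = 0
            else:
                good = 1 if tie_breaks_positive else 0
            w = w if weighted else 1.0
            total += good * w
            weight_sum += w
        return total / weight_sum

    print(score(reviews, tie_breaks_positive=False, weighted=False))  # 0.25
    print(score(reviews, tie_breaks_positive=True,  weighted=False))  # 0.75
    print(score(reviews, tie_breaks_positive=True,  weighted=True))   # 0.80

The same four made-up reviews score 25%, 75%, or 80% depending only on the aggregation rules.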
If you assume these AI servers contain integrated Wi-Fi chips, then the question is: why wasn't a backdoor installed?
This would likely be only a few lines of code. Also, the transmitted signal doesn't necessarily have to connect to the Internet. It could send encrypted packets identifying the machine, or the message could simply state which ports are open along with sysadmin credentials. This could then be intercepted with a packet sniffer. In any case, there are many other ways a backdoor could be installed and made very difficult to detect.
The question is why this wasn't done, and I suspect the answer is either 1) this was done without the consent of the manufacturers, or 2) the people who put the trackers on the shipments don't have the technical knowledge to do so.
(Haven't read the article.) Am going to assume what you've described is true. If so, then you've confused the qualitative with the quantitative. That is, the increase in the risk of disease is an average, measured across many types of people. What you're describing can be measured quantitatively: all that's necessary is the right medical tests to determine how, say, your insulin level changes the more hot dogs you eat.
So the likely possibility is that eating, say, one hot dog a week may make little difference, while eating 100 Slim Jims a week could be deadly. This would be clearly evident in a quantitative test but smoothed out in an average.
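To make that concrete, here's a toy calculation (the dose-response curve and the consumption numbers are entirely invented): an individual's risk can stay negligible at one serving a week and explode at a hundred, while the population average, dominated by light eaters, barely moves.

    import numpy as np

    # Toy illustration: an invented, sharply nonlinear individual dose-response
    # curve gets smoothed into a modest-looking population average.
    rng = np.random.default_rng(0)

    def individual_risk(servings_per_week):
        # Hypothetical curve: negligible risk at low doses, steep rise at high doses.
        return 1 - np.exp(-(servings_per_week / 40.0) ** 3)

    # Most of the (invented) population eats only a few servings per week.
    servings = rng.poisson(lam=2.0, size=100_000)

    print(f"population-average risk:   {individual_risk(servings).mean():.6f}")  # tiny
    print(f"risk at 1 serving/week:    {individual_risk(1):.6f}")                # tiny
    print(f"risk at 100 servings/week: {individual_risk(100):.6f}")              # near 1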
This is really strange. This prediction seems to violate the two principles (that I'm aware of) of swarm intelligence (SI): 1) optimizing an objective function, and 2) learning the best strategy from past decisions. I'll briefly describe both. (Btw, am not an expert. This is what I remember from papers I read years ago.)
1) The purpose of SI is to optimize a function. This can be a loss function or, in this case, a prediction algorithm. So, if SI failed to predict the winner(s), this is independent of whether the prediction algorithm's loss function was optimized. In other words, the predictions may have been the best predictions possible given the loss function; there may have been no way for the algorithm to make a better prediction.
2) This can probably best be described with a physical metaphor rather than the concept of Pareto optimality (which I haven't used in years). SI is based on the idea that, say, a colony of ants can first find a food source and then converge on the best path to it using only information collected by the ants themselves. This is done with an optimization method that reduces the search space to a few variables in which to search and optimize.
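Here is a minimal sketch of that ant metaphor (a toy two-path version of the classic ant-colony idea, not any particular swarm-intelligence product): path choice is proportional to pheromone, shorter paths are reinforced more strongly per trip, and evaporation forgets stale choices, so the colony converges on the shorter path.

    import random

    # Toy two-path ant colony: ants choose a path with probability proportional
    # to its pheromone level; shorter paths receive more pheromone per trip;
    # evaporation forgets old choices.
    random.seed(1)

    lengths = {"short": 1.0, "long": 2.0}
    pheromone = {"short": 1.0, "long": 1.0}
    evaporation = 0.1

    for _ in range(200):                        # 200 ants, one after another
        total = sum(pheromone.values())
        r = random.uniform(0, total)
        path = "short" if r < pheromone["short"] else "long"
        for p in pheromone:                     # evaporation on both paths
            pheromone[p] *= (1 - evaporation)
        pheromone[path] += 1.0 / lengths[path]  # deposit: more for the shorter path

    print(pheromone)  # expect far more pheromone on "short" than on "long"

The only point of the sketch is the feedback loop: reinforcement proportional to path quality plus evaporation, driven entirely by the ants' own trips.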
So the problem with this one Kentucky Derby prediction may be that the SI algorithm simply hasn't been given the parameters to learn from. Maybe, using historical data and this initial wrong guess, it could greatly improve its next prediction.
A problem, I speculate, involves cases with long tails. That is, probability distributions in which data is sparse, which makes prediction or classification unreliable or high-variance. Or, with long tails, it could be that the aggregate probability of a classification is high while the individual probability is low. This is a problem that large retailers have: a potential buyer fits the probabilistic category of needing or wanting product x but, individually, these people never buy. Two common long-tailed distributions are the power law and the Cauchy. In the case of the power law, the mean and variance exist only for certain values of the tail exponent. In the case of the Cauchy, the mean and variance don't exist at all, which makes prediction inherently impossible.
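A quick numerical illustration of the Cauchy point (simulated draws, nothing more): the running mean of normal samples settles down, while the running mean of Cauchy samples never does, which is why any averaging-based prediction breaks down.

    import numpy as np

    # Running means: normal samples converge to the true mean, while Cauchy
    # samples (no mean, no variance) never settle down.
    rng = np.random.default_rng(42)
    n = 100_000

    normal = rng.normal(size=n)
    cauchy = rng.standard_cauchy(size=n)

    idx = np.arange(1, n + 1)
    normal_running_mean = np.cumsum(normal) / idx
    cauchy_running_mean = np.cumsum(cauchy) / idx

    for k in (100, 10_000, 100_000):
        print(f"n={k:>7}  normal mean={normal_running_mean[k-1]:+.3f}  "
              f"cauchy mean={cauchy_running_mean[k-1]:+.3f}")
    # The normal column hovers near 0; the Cauchy column keeps jumping around.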
One example -- in case the above was too abstract -- consider that you're a person whom the UK has classified as having a high likelihood of committing murder. Individually, however, your likelihood would actually be low. What could account for this difference? Say n variables are used to make the prediction, and this person fits those n variables well and with low variance, but only one of the n variables is actually predictive of murder. Or, because the data is so sparse, there simply isn't enough of it to correctly categorize this person. Either way, the UK government could greatly damage this person's life with a misclassification; not because of any ill intention (as is mentioned in the comment) or bureaucratic error, but because the classification system is inherently flawed, and those who use it aren't aware that it is.
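A toy version of that scenario (the data are simulated and have nothing to do with the actual UK system): train a classifier on sparse data where only one of twenty features is truly predictive, and the predicted risk for one fixed individual swings widely from one training sample to the next.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Simulated toy: 20 features, only feature 0 is truly predictive, and the
    # training data are sparse. The predicted "risk" for one fixed person
    # varies noticeably across refits, i.e. individual classification is
    # high-variance.
    rng = np.random.default_rng(0)
    n_features, n_train = 20, 60

    person = rng.normal(size=(1, n_features))   # one fixed individual

    predictions = []
    for _ in range(50):                         # refit on 50 small samples
        X = rng.normal(size=(n_train, n_features))
        y = (X[:, 0] + 0.5 * rng.normal(size=n_train) > 0).astype(int)
        model = LogisticRegression(max_iter=1000).fit(X, y)
        predictions.append(model.predict_proba(person)[0, 1])

    predictions = np.array(predictions)
    print(f"predicted risk for the same person: "
          f"min={predictions.min():.2f}, max={predictions.max():.2f}, "
          f"std={predictions.std():.2f}")

The person's true risk never changes between refits; only the noise in the sparse training data does.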
A few points:
* The (linked) article notes that the Bureau of Labor Statistics makes a distinction between "computer programmer" and "software developer". Why? It wasn't fully explained in the article. In any case, the former is declining in employment and the latter is increasing. What was also not addressed: is the correlation between the two scale-invariant? That is, is the decline in employment of programmers equivalent to the increase in developers?
What was also not mentioned in the article: a comparison of the variance of employment over the same length of time window across many decades. In other words, if you assume that the number employed as computer programmers has the same distribution across generations, then you can compare the variances of comparable windows. Or, is a decline of 25% within what would be considered a typical range of variance for a window of that length?
Why is it important to consider the variance? Because this would be the first (important) step in establishing that the decline in jobs has a causal relationship to the widespread use of AI. In other words, it could also show that the relationship between the variables is heteroscedastic and, therefore, that comparisons of variances from different time periods can't be used with this data set.
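A sketch of what such a variance comparison could look like (the employment figures below are purely hypothetical): compare the variance of year-over-year changes in the recent window against earlier windows of the same length, for instance with Levene's test, before reading anything causal into the recent decline.

    import numpy as np
    from scipy import stats

    # Hypothetical year-over-year % changes in programmer employment,
    # grouped into equal-length windows from different decades.
    window_1980s  = np.array([ 2.1, -1.0,  3.4,  0.5, -2.2,  1.8])
    window_2000s  = np.array([-3.0,  4.2, -1.5,  2.0, -4.1,  3.5])
    window_recent = np.array([-6.0, -8.5, -3.0, -7.2, -5.5, -9.0])  # the "decline"

    # Levene's test: are the variances of the windows comparable?
    stat, p = stats.levene(window_1980s, window_2000s, window_recent)
    print(f"Levene statistic={stat:.2f}, p-value={p:.3f}")
    # A small p-value would suggest the variance itself has changed
    # (heteroscedasticity), so naive cross-period comparisons are suspect.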
Read the blog post. The author doesn't normalize for sample sizes. Specifically, the author makes no attempt to normalize for the different number of reviews of any given show or any given episode. If, say, one show has 10 reviews for its first six episodes and another show has ten thousand, on average, for its first six, the author neither recognizes this difference nor changes his methodology to account for it.
The second problem with the methodology is that the reviewers of the first six episodes may be highly correlated with each other. That is, viewers of the first six episodes may be the most devoted watchers of new shows and the most likely to comment, so the reviews of the first six episodes may reflect a different viewpoint, one that's more critical than that of reviewers of later episodes. Or, the reviews of later episodes may be more representative of the mean reviewer.
One solution is to use a sampling method built around the mean and variance of ratings across all episodes (justified by the central limit theorem). Then, with that sampling methodology, determine whether the characteristics of reviewers of the first six episodes differ from those of other reviewers. For instance, what's the likelihood that a reviewer of the first six episodes is 1) the first to comment on a new show, or 2) someone who only comments on the first few episodes? Then compare this likelihood to the corresponding chance for the mean reviewer.
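A rough sketch of one such comparison (the review log below is invented, and I'm using a simple permutation test rather than whatever the blog's actual data would call for): split reviewers into "early-only" (never rated anything past episode six) and everyone else, then ask whether the difference in their mean ratings is larger than chance.

    import numpy as np

    # Invented review log: each row is (reviewer_id, episode_number, rating out of 10).
    rng = np.random.default_rng(3)
    rows = []
    for reviewer in range(400):
        episodes = rng.choice(np.arange(1, 25), size=rng.integers(1, 8), replace=False)
        for ep in episodes:
            rows.append((reviewer, int(ep), float(rng.normal(7.0, 1.5))))

    # Group by reviewer, then split into "early-only" (never rated anything
    # past episode 6) and everyone else.
    by_reviewer = {}
    for reviewer, ep, rating in rows:
        by_reviewer.setdefault(reviewer, []).append((ep, rating))

    mean_rating = np.array([np.mean([r for _, r in v]) for v in by_reviewer.values()])
    early_only  = np.array([max(ep for ep, _ in v) <= 6 for v in by_reviewer.values()])

    # Permutation test on the difference in mean ratings between the two groups.
    observed = mean_rating[early_only].mean() - mean_rating[~early_only].mean()
    perm_diffs = []
    for _ in range(5000):
        shuffled = rng.permutation(early_only)
        perm_diffs.append(mean_rating[shuffled].mean() - mean_rating[~shuffled].mean())

    p_value = np.mean(np.abs(perm_diffs) >= abs(observed))
    print(f"observed difference = {observed:+.3f}, permutation p-value = {p_value:.3f}")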
"Facts are stupid things." -- President Ronald Reagan (a blooper from his speeach at the '88 GOP convention)