Intel

Are Intel's i9-13900k's and -14900k's Crashing at a Higher Rate? (techradar.com) 66

"Intel's problems with unstable 13th-gen and 14th-gen high-end CPUs appear to run deeper than we thought," writes TechRadar, "and a new YouTube video diving into these gremlins will do little to calm any fears that buyers of Raptor Lake Core i9 processors (and its subsequent refresh) have." Level1Techs is the YouTuber in question, who has explored several avenues in an effort to make more sense of the crashing issues with these Intel processors that are affecting some PC gamers and making their lives a misery — more so in some cases than others. Data taken from game developer crash logs — from two different games — clearly indicates a high prevalence of crashes with the mentioned more recent Intel Core i9 chips (13900K and 14900K).

In fact, for one particular type of error (decompression, a commonly performed operation in games), there was a total of 1,584 that occurred in the databases Level1Techs sifted through, and an alarming 1,431 of those happened with a 13900K or 14900K. Yes — that's 90% of those decompression errors hitting just two specific CPUs. As for other processors, the third most prevalent was an old Intel Core i7 9750H (Coffee Lake laptop CPU) — which had a grand total of 11 instances. All AMD processors in total had just 4 occurrences of decompression errors in these game databases.

"In case you were thinking that AMD chips might be really underrepresented here, hence that very low figure, well, they're not — 30% of the CPUs in the database were from Team Red..."
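As a rough check on the arithmetic quoted above, here is a minimal sketch that uses only the figures given in the summary. The variable names are illustrative, and the "relative rate" normalization (share of errors divided by share of CPUs) is an assumption about how one might compare the numbers, not something taken from the video or the article.

```python
# Figures quoted in the summary above.
total_errors = 1584          # all decompression errors in the crash databases
i9_errors = 1431             # errors on the 13900K and 14900K combined
amd_errors = 4               # errors on all AMD processors combined
amd_population_share = 0.30  # "30% of the CPUs in the database" were AMD


def share(count, total):
    """Fraction of the total accounted for by count."""
    return count / total


i9_share = share(i9_errors, total_errors)
amd_share = share(amd_errors, total_errors)

# Assumed normalization: error share divided by population share.
# A value of 1.0 would mean errors are proportional to how common the CPUs are.
amd_relative_rate = amd_share / amd_population_share

print(f"i9 share of errors:  {i9_share:.1%}")
print(f"AMD share of errors: {amd_share:.2%}")
print(f"AMD relative rate:   {amd_relative_rate:.4f}")
```

The summary does not say what fraction of the database the two i9 models make up, so the same normalization cannot be computed for them from these figures alone.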

"The YouTuber also brings up another point here: namely that data centers are noticing these issues with Core i9s."

More details at Digital Trends... And long-time Slashdot reader UnknowingFool wrote a summary of the video's claims here.
This discussion has been archived. No new comments can be posted.

  • Next refresh looks pretty similar from what few leaks there have been.

    • > Next refresh looks pretty similar from what few leaks there have been.

      Ooof. I had to move my PC into another room due to the heat generated with 14.9k, is it the same with 15.9?

    • New core design, new process. Keep in mind that the 12900k/Alder Lake didn't have this problem. Arrow Lake-S (which is what the 15900k will be, if that's what Intel calls it) may not have this difficulty. The worst thing about Arrow Lake-S is that it probably won't be that much faster than a 14900ks/Raptor Lake Refresh. The downside to the 14900ks (and 13900k/ks before it) is that, apparently, it's a deeply flawed CPU.

      Yes, Intel deserves a lot of hate for this problem. Yes, you should be spending your

  • i9-13900K Owner (Score:5, Interesting)

    by JBMcB ( 73720 ) on Saturday July 13, 2024 @11:23AM (#64623165)

    I have a dodgy MSI i9-13900K build I got a great deal on. I've done multithreaded video encodes, Blender renders, and hours-long Linux compiles, and have had one unexplained crash over the course of about a year.

    I did, however, go into the EFI and reduce the maximum CPU power draw from the MSI-set 4000W(!) to the Intel-recommended ~250W.

    • Re: i9-13900K Owner (Score:5, Interesting)

      by godrik ( 1287354 ) on Saturday July 13, 2024 @11:54AM (#64623207)

      It is probably not every processor, or we would have noticed well before now. It is likely a production issue. Probably they had bad wafers.
      But it could also be something like this: under high temperature, if you are on a lower-quality wafer, this particular instruction fucks up.

      I had a few Pentium 3s on which SSE instructions fucked up at high temperature (SSE sqrt, if I remember correctly). But most of the ones I had access to worked just fine, even at high temperature.

      • by JBMcB ( 73720 )

        I had a few Pentium 3s on which SSE instructions fucked up at high temperature (SSE sqrt, if I remember correctly). But most of the ones I had access to worked just fine, even at high temperature.

        The extended instructions seem to be an ongoing issue for Intel. AVX512 had all kinds of problems as well, to the point that Intel basically punted on AVX512 for consumer chips and is doing the hybrid VNNI thing instead. Also the TSX instruction set on Haswell, though instead of causing overheating I think it simply didn't work properly.

        • by godrik ( 1287354 )

          The key problem with AVX 512 is that only a few problems can actually use vectors of that size. To make the instruction set more useful, they added a bunch of in-vector processing to do various forms of shuffling, data reordering, and sparse load/store kinds of things. And these are actually costly in terms of the complexity of the chip you need to design. But if you don't have them, it kind of becomes hard to use such large vectors. Without even mentioning the need for 512 bit registers which are quite energy consu

      • Not technically "production" but rather a "binning" issue. All wafers are varying degrees of bad, as are the chips on them. They are graded and binned into product lines based on performance and stability at that performance.

        Could very well be an incorrect binning.

      • by AmiMoJo ( 196126 )

        It's estimated to be around 25% of affected models.

        It's not really clear if heat makes it worse. More cores probably does, simply due to probability.

        Intel will try to patch it, but if the patch hits performance people won't be happy with the CPU lottery it creates.

      • There is something else going on. It is not bad wafers because errors occur on different CPUs purchased months apart. Also, other products made from those same wafers have had no issues. The issue is also not with temperature as shown by the errors occurring in the data centres where there is appropriate cooling and no overclock.

        This appears to be an issue of silicon degradation - a worst case scenario for Intel. So there is a section within the i9 product range that is stressed differently from the

    • I read TFA along with a few related ones and it appears that your solution alleviates the problem somewhat, but the rate of failure is still far higher than with older Intel processors or AMD equivalents. What makes things worse is that datacenters are also reporting problems, and datacenters do not overclock; in fact, this problem is occurring on MoBos which do not even have that option.
      A couple of months ago it appeared that the problem was down to overclocking - with Intel having failed to tell the MoBo manu

      • I read TFA along with a few related ones and it appears that your solution alleviates the problem somewhat, but the rate of failure is still far higher than with older Intel processors or AMD equivalents. What makes things worse is that datacenters are also reporting problems, and datacenters do not overclock; in fact, this problem is occurring on MoBos which do not even have that option.

        Agreed. If you tell someone not to do something because it will cause problems, or if you tell them, if you're going to do thi

      • I had to RMA my first Ryzen 3xxxx; there was a pretty big stink made about them too, iirc.

    • I helped a client with a gaming system running the Core i9-14900KS (iirc), that was crashing when trying to launch games. I similarly had to set the power limits to Intel's current recommendations to achieve stability. This was before I had heard about the issues with these Raptor Lake chips.

      It also wasn't just games. Part of what tipped me off that I was seeing a processor issue was when the NVIDIA driver (a self-extracting 7-Zip archive) would consistently throw a CRC error at a random point during the pr

      • If you run memtest, does the issue happen in different places?

        • Ran Memtest86 on it for 3 days, never reported any errors. I presume that the instability only manifested under loads that were pushing the power usage above the limit (that is now) recommended by Intel. Memtest86 may not have pushed it across that threshold. From what I've read online, very math/arithmetic-heavy operations were more likely to be affected.
          Didn't particularly test different archives / compression algorithms. The system could definitely extract some other various archives without error, incl
    • You tried First Descendant yet? Also, some applications are going into "compatibility mode" because the Raptor Lake-S systems are disabling AVX/AVX2 due to instability.

  • Elevation (Score:5, Informative)

    by buzz_mccool ( 549976 ) on Saturday July 13, 2024 @11:33AM (#64623175)

    A computer company I worked for had a problem with our CPUs crashing. Some investigation found the problem was much more prevalent at a customer site in Flagstaff, Arizona, and again in Mexico City.

    Cosmic rays were causing problems because we only had error detection and not error detection and correction circuitry in portions of the cache memory design.
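To illustrate the distinction the comment above draws between error detection and error detection plus correction, here is a minimal sketch. It is not the cache design the commenter describes (that design is not specified); it only contrasts a single even-parity bit, which can flag a flipped bit but not locate it, with a textbook Hamming(7,4) code, which can locate and repair any single flipped bit. All function names are hypothetical.

```python
def parity_bit(bits):
    """Even parity: XOR of all bits. Detects one flipped bit, cannot locate it."""
    p = 0
    for b in bits:
        p ^= b
    return p


def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (corrected codeword, 1-based position of the flipped bit, or 0)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome spells out the bad position
    if position:
        c[position - 1] ^= 1         # flip the bad bit back
    return c, position


data = [1, 0, 1, 1]

# Detection only: the parity check fails, but nothing says which bit flipped.
word = data + [parity_bit(data)]
flipped = list(word)
flipped[2] ^= 1
detected = parity_bit(flipped) != 0

# Detection and correction: the syndrome locates the flipped bit.
code = hamming74_encode(data)
damaged = list(code)
damaged[4] ^= 1
repaired, position = hamming74_correct(damaged)
```

With detection only, the hardware can at best report the fault or refetch the data; with correction, a single bit flip is repaired in place.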

  • I wonder if they're starting to run into unanticipated quantum effects? Sort of the opposite of a quantum computer chip solving problems using superposition expansion - a conventional electronic chip subject to occasional interference from quantum indeterminacy?
    • by Luthair ( 847766 )
      No, this is Intel falling behind AMD and cutting corners to pretend to be competitive.
      • by gweihir ( 88907 )

        Yep, exactly. Design-wise Intel has been behind AMD for quite a while, but their manufacturing edge kept them somewhat competitive as did the crap they did with grossly insecure speculative execution. That is over. Seriously, buy AMD. Intel is way overpriced and unreliable on top.

    • by Anonymous Coward

      I wonder if they're starting to run into unanticipated quantum effects?

      No, almost certainly not in the way you are thinking, and definitely not in the way you worded it for others.

      Chip fab reached the point allowing for quantum tunneling about 15 years ago.
      The FET standard layout libraries have long since been adjusted to both account for preventing tunneling when not desired, and slightly more recently, incorporate designs that intentionally take advantage of quantum tunneling for function.

      "Starting to" is the key word prompting the "definitely not" part of my response.

      That s

    • The issue is mostly attributed to the high-end CPUs, the i9s that have overclocking capabilities. These are the kind of CPU that runs fairly hot (over 200W); you should probably consider water cooling for them. Other CPUs in the i3 and i5 range are supposed to be running just fine.

      On top of that, motherboard manufacturers try to make the CPU the fastest by tweaking many of the different settings available. Nobody wants to manufacture the motherboard that runs an expensive CPU slower than what it can run

  • Trying to sell silicon that doesn't work anymore. So funny.

    I need some stickers that say "AMD Inside".

  • This has been known for a while.
    Motherboard makers ship BIOSes with extreme defaults, causing processors to run hot. I suspect that they do this to improve benchmark scores.
    I configured my BIOS settings to what Intel recommended, and all is well.
    Methinks the problem comes from pushing chips to their limits. Back off a bit, and all is well.
    In the old days, default settings were conservative, and overclockers discovered that they could easily push chips to higher performance.
    Now, the defaults are at or beyond the li

    • Wendell specifically mentioned this in the video and stated that the initial explanation of aggressive motherboard defaults does not appear to be the root cause of this issue (although I imagine it may not help). The reason he believes this is because the issue is also widespread among gaming servers which have motherboards with sane defaults and no ability to overclock.
    • Intel CPUs have had this problem in one form or another since Coffee Lake/8700k. Alder Lake-S didn't have weird instability and degradation problems despite many mobo OEMs shipping with "defaults" that let the CPU slam up against the 253W PL2 with unlimited tau.

  • Now that they cannot compete on manufacturing anymore and have been found out about all the crap they did with speculative execution, they apparently try other dirty hacks to fake performance. Seriously, stop buying Intel. You are being ripped off.

  • by organgtool ( 966989 ) on Saturday July 13, 2024 @01:43PM (#64623369)
    Wendell from Level1Techs was also featured in a video with Steve from Gamers Nexus [youtube.com] where they discussed the problem further. While they didn't provide much additional info in that video, Steve mentioned that he has a lead about a potential cause for this problem but he's not ready to release this info yet. If you own one of these processors and this problem is impacting you, stay tuned as I believe we'll be getting much more detailed information soon.
  • A few more details (Score:4, Informative)

    by doug141 ( 863552 ) on Saturday July 13, 2024 @02:42PM (#64623481)

    According to https://www.youtube.com/watch?... [youtube.com] the processors are not hot, are not overclocked, and processors that will pass a test when new are failing the same test after running half a year.

  • Level1Techs is awesome. Wendell is the best. You should watch. That is all.
  • In the issues on PrusaSlicer's GitHub page, there have been a couple of users who had crashing problems eventually attributed to the i9.

  • I was originally looking at the 14900k, a 7950x3D, or the 7900x, but ended up changing my mind.

    I skipped the i9 because of the issues we are all seeing, and changed my mind about AMD after watching a video from JayzTwoCents, which covered the ongoing memory problems that he was having -- something these i9s are also having now. He also talks about the limitations and issues of the 3D cache outside of gaming:
    https://www.youtube.com/watch?... [youtube.com]

    I've owned many Intel and AMD CPUs going back decades. My bias is to
