Comment Re:Example vs Practical (Score 1) 62
I was swayed by another comment in this discussion that points out that, for whatever reason, his example is an LLM analysis of a single routine that amounts to 120 bytes of machine code. Choosing something so short should perhaps recalibrate expectations for practical use. It did spot a couple of real issues, but mostly it buried the user in a list of "I already know that" items about how the general environment is not credibly secure at all. 75% of the 'findings' were just "Hey, the Apple II doesn't have anything that looks like 'security'." LLMs tend to become less coherent as the volume of input grows, so it is unclear whether this approach currently scales to anything more practical.
A human could conceivably hand-review 120 bytes of Apple II machine code, but doing the same for even a modest library quickly becomes untenable. LLMs are likely in the same boat here, just much faster at getting wherever that boat is able to go.