Utter crap. Codenomicon are very friendly to FLOSS and FLOSS developers. They're also great guys. They have been providing free test services to the Samba project for many years now, and have helped us fix many many bugs.

In case you hadn't noticed, the code they're reporting on here is closed source proprietary code...

Software on Internet-connected devices is a bit different from your examples though. No matter how insecure cars are, it would be really hard for me to steal a million cars in one night, let alone without being caught. Yet, it's common to see millions of computers/phones being hacked in a very short period of time. And the risk to the person responsible is much lower.

Andy Updegrove (956488) writes "The U.K. Cabinet Office accomplished today what the Commonwealth of Massachusetts set out (unsuccessfully) to achieve ten years ago: it formally required compliance with the Open Document Format (ODF) by software to be purchased in the future across all government bodies. Compliance with any of the existing versions of OOXML, the competing document format championed by Microsoft, is neither required nor relevant. The announcement was made today by The Minister for the Cabinet Office, Francis Maude. Henceforth, ODF compliance will be required for documents intended to be shared or subject to collaboration. PDF/A or HTML compliance will be required for viewable government documents. The decision follows a long process that invited, and received, very extensive public input – over 500 comments in all."
sfcrazy (1542989) writes "UK has decided to use ‘open standards’ for sharing and viewing government documents. The announcement was made by the Minister for the Cabinet Office, Francis Maude. One of the primary objectives of this move is to create a level playing field for suppliers of all sizes. The move must put some pressure on Google to offer full support for ODF in Chrome, Android and Google Docs."
> the first ones used threads, semaphores through python's multiprocessing.Pipe implementation.

I stopped reading when I came across this.

Honestly - why are people trying to do things that need guarantees with python?

because we have an extremely limited amount of time as an additional requirement, and we can always rewrite critical portions or later the entire application in c once we have delivered a working system that means that the client can get some money in and can therefore stay in business.

also i worked with david and we benchmarked python-lmdb after adding in support for looped sequential "append" mode and got a staggering performance metric of 900,000 100-byte key/value pairs, and a sequential read performance of 2.5 MILLION records. the equivalent c benchmark is only around double those numbers. we don't *need* the dramatic performance increase that c would bring if right now, at this exact phase of the project, we are targeting something that is 1/10th to 1/5th the performance of c.

so if we want to provide the client with a product *at all*, we go with python.

but one thing that i haven't pointed out is that i am an experienced linux python and c programmer, having been the lead developer of samba tng back from 1997 to 2000. i simply transferred all of the tricks that i know involving while-loops around non-blocking sockets and so on over to python. ... and none of them helped. if you get 0.5% of the required performance in python, it's so far off the mark that you know something is drastically wrong. converting the exact same program to c is not going to help.

The fact you have strict timing guarantees means you should be using a realtime kernel and realtime threads with a dedicated network card and dedicated processes on IRQs for that card.

we don't have anything like that [strict timing guarantees] - not for the data itself. the data comes in on a 15 second delay (from the external source that we do not have control over) so a few extra seconds delay is not going to hurt.

so although we need the real-time response to handle the incoming data, we _don't_ need the real-time capability beyond that point.

Taking the incoming messages from UDP and posting them on a message bus should be step one so that you don't lose them.

.... you know, i think this is extremely sensible advice (which i have heard from other sources) so it is good to have that confirmed... my concerns are as follows:


* how do you then ensure that the process receiving the incoming UDP messages is high enough priority to make sure that the packets are definitely, definitely received?

* what support from the linux kernel is there to ensure that this happens?

* is there a system call which makes sure that data received on a UDP socket *guarantees* that the process receiving it is woken up as an absolute priority over and above all else?

* the message queue destination has to have locking otherwise it will be corrupted. what happens if the message queue that you wish to send the UDP packet to is locked by a *lower* priority process?

* what support in the linux kernel is there to get the lower priority process to have its priority temporarily increased until it lets go of the message queue on which the higher-priority task is critically dependent?

this is exactly the kind of thing that is entirely missing from the linux kernel. temporary automatic re-prioritisation was something that was added to solaris by sun microsystems quite some time ago.

to the best of my knowledge the linux kernel has absolutely no support for these kinds of very important re-prioritisation requirements.

i am running into exactly this problem on my current contract. here is the scenario:

* UDP traffic (an external requirement that cannot be influenced) comes in
* the UDP traffic contains multiple data packets (call them "jobs") each of which requires minimal decoding and processing
* each "job" must be farmed out to *multiple* scripts (for example, 15 is not unreasonable)
* the responses from each job running on each script must be collated then post-processed.

so there is a huge fan-out where jobs (approximately 60 bytes) are coming in at a rate of 1,000 to 2,000 per second; those are being multiplied up by a factor of 15 (to 15,000 to 30,000 per second, each taking very little time in and of themselves), and the responses - all 15 to 30 thousand - must be in-order before being post-processed.
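as a rough sketch of the "must be in-order" part (this is not the code from the contract: the class name and the idea of tagging each job with a sequence number are assumptions made purely for illustration), a collator can hold back responses until every script has answered for the oldest outstanding job. at 2,000 jobs per second and 15 scripts that is 2,000 x 15 = 30,000 calls to add() per second.

```python
class InOrderCollator:
    """Release jobs strictly in sequence order once all scripts have replied.

    Hypothetical helper: assumes every job carries a sequence number
    starting at 0 and that each script replies exactly once per job.
    """

    def __init__(self, n_scripts):
        self.n_scripts = n_scripts
        self.pending = {}    # seq -> {script_id: result}
        self.next_seq = 0    # oldest job not yet released

    def add(self, seq, script_id, result):
        """Record one response; return the list of jobs that are now complete."""
        self.pending.setdefault(seq, {})[script_id] = result
        ready = []
        # Release as many consecutive complete jobs as possible.
        while len(self.pending.get(self.next_seq, ())) == self.n_scripts:
            ready.append((self.next_seq, self.pending.pop(self.next_seq)))
            self.next_seq += 1
        return ready
```

a late response for job 0 holds up everything behind it, which is exactly why one slow script hurts so much in this design.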

so, the first implementation is in a single process, and we just about achieve the target of 1,000 jobs per second but with only about 10 scripts per job.

anything _above_ that rate and the UDP buffers overflow and there is no way to know if the data has been dropped. the data is *not* repeated, and there is no back-communication channel.
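a minimal sketch of the "get it out of the kernel buffer first" idea, assuming a dedicated thread that does nothing but recvfrom() and put(). the function name, the 4MB buffer request and the 2048-byte read size are illustrative assumptions, and the kernel may cap the buffer size it actually grants.

```python
import queue
import socket
import threading


def start_udp_drain(host="127.0.0.1", port=0, rcvbuf=4 * 1024 * 1024):
    """Bind a UDP socket and drain it into an in-process queue (sketch)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Ask for a larger kernel receive buffer; the kernel may grant less.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    sock.bind((host, port))
    sock.settimeout(0.1)  # lets the thread notice the stop flag
    jobs = queue.Queue()
    stop = threading.Event()

    def drain():
        while not stop.is_set():
            try:
                data, _addr = sock.recvfrom(2048)
            except socket.timeout:
                continue
            except OSError:
                break  # socket was closed underneath us
            # No decoding here: just move the packet out of the kernel buffer.
            jobs.put(data)

    thread = threading.Thread(target=drain, daemon=True)
    thread.start()
    return sock, jobs, stop, thread
```

this does not make drops impossible, it only shortens the time each packet sits in the socket buffer. all decoding and dispatch happens on the other side of the queue.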

the second implementation uses a parallel dispatcher. i went through half a dozen different implementations.

the first ones used threads, semaphores through python's multiprocessing.Pipe implementation. the performance was beyond dreadful, it was deeply alarming. after a few seconds performance would drop to zero. strace investigations showed that at heavy load the OS call futex was maxed out near 100%.

next came replacement of multiprocessing.Pipe with unix socket pairs and threads with processes, so as to regain proper control over signals, sending of data and so on. early variants of that would run absolutely fine up to some arbitrary limit then performance would plummet to around 1% or less, sometimes remaining there and sometimes recovering.
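for anyone who has not done this before, here is a minimal sketch of the socketpair-plus-fork pattern. it is not the implementation described above: the helper names and the 2-byte length prefix are assumptions added so that job boundaries survive on a stream socket.

```python
import os
import socket
import struct


def send_msg(sock, payload):
    """Send one length-prefixed message (2-byte big-endian length)."""
    sock.sendall(struct.pack("!H", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, or return None if the peer closed first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf


def recv_msg(sock):
    """Receive one length-prefixed message, or None on end-of-stream."""
    header = recv_exact(sock, 2)
    if header is None:
        return None
    (length,) = struct.unpack("!H", header)
    return recv_exact(sock, length)


def spawn_worker(handler):
    """Fork a worker that answers each job with handler(job)."""
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: serve jobs until the dispatcher closes its end.
        parent_end.close()
        try:
            while True:
                job = recv_msg(child_end)
                if job is None:
                    break
                send_msg(child_end, handler(job))
        finally:
            os._exit(0)
    child_end.close()
    return pid, parent_end
```

the dispatcher keeps one socket per worker, and closing that socket is the signal for the worker to exit.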

next came replacement of select with epoll, and the addition of edge-triggered events. after considerable bug-fixing a reliable implementation was created. testing began, and the CPU load slowly cranked up towards the maximum possible across all 4 cores.
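the usual trap with edge-triggered epoll is that you only get told once per burst, so the socket has to be non-blocking and read until it is empty. a small linux-only sketch of that rule follows (the function names and the socketpair used for the demo are assumptions for illustration, not the code from the project):

```python
import select
import socket


def drain_ready(sock, bufsize=4096):
    """Read from a non-blocking socket until it would block or hits EOF."""
    chunks = []
    while True:
        try:
            data = sock.recv(bufsize)
        except BlockingIOError:
            break  # nothing left: safe to go back to epoll
        if not data:
            break  # peer closed
        chunks.append(data)
    return chunks


def demo():
    """Send two jobs, wait for one edge-triggered event, drain everything."""
    sender, receiver = socket.socketpair()
    receiver.setblocking(False)
    ep = select.epoll()
    ep.register(receiver.fileno(), select.EPOLLIN | select.EPOLLET)
    sender.sendall(b"job-1")
    sender.sendall(b"job-2")
    events = ep.poll(1.0)
    received = b"".join(drain_ready(receiver))
    # Nothing new has arrived since the drain, so this poll reports nothing.
    after_drain = ep.poll(0.1)
    ep.close()
    sender.close()
    receiver.close()
    return events, received, after_drain
```

if drain_ready() stopped after a single recv(), any data still queued would sit there until the *next* packet arrived and triggered a new edge, which looks a lot like performance mysteriously dropping to zero.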

the performance metrics came out *WORSE* than the single-process variant. investigations began and showed a number of things:

1) even though it is 60 bytes per job the pre-processing required to make the decision about which process to send the job to was so great that the dispatcher process was becoming severely overloaded

2) each process was spending approximately 5 to 10% of its time doing actual work and NINETY PERCENT of its time waiting in epoll for incoming work.

this is unlike any other "normal" client-server architecture i've ever seen before. it is much more like the mainframe "job processing" that the article describes, and the linux OS simply cannot cope.

i would have used POSIX shared memory Queues but the implementation sucks: it is not possible to identify the shared memory blocks after they have been created so that they may be deleted. i checked the linux kernel source: there is no "directory listing" function supplied and i have no idea how you would even mount the IPC subsystem in order to list what's been created, anyway.

i gave serious consideration to using the python LMDB bindings because they provide an easy API on top of memory-mapped shared memory with copy-on-write semantics. early attempts at that gave dreadful performance: i have not investigated fully why that is: it _should_ work extremely well because of the copy-on-write semantics.

we also gave serious consideration to just taking a file, memory-mapping it and then appending job data to it, then using the mmap'd file for spin-locking to indicate when the job is being processed.
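to make that idea concrete, here is a toy sketch of a memory-mapped job file with a status byte per slot. everything about the layout (64-byte slots, one status byte, one length byte) and the class name is an assumption for illustration, and a python byte write is *not* an atomic compare-and-swap, so this is only safe with one appender and one consumer per slot.

```python
import mmap

SLOT = 64                      # assumed slot size: status + length + payload
EMPTY, READY, DONE = 0, 1, 2   # assumed status values


class JobFile:
    """Append-only job slots in a memory-mapped file (illustrative only)."""

    def __init__(self, path, n_slots):
        self.n_slots = n_slots
        self.f = open(path, "w+b")
        self.f.truncate(SLOT * n_slots)   # zero-filled, so every slot is EMPTY
        self.mm = mmap.mmap(self.f.fileno(), SLOT * n_slots)
        self.tail = 0                     # next slot to append to

    def append(self, payload):
        """Write a job into the next slot and mark it READY; return the slot."""
        if len(payload) > SLOT - 2:
            raise ValueError("payload too large for one slot")
        if self.tail >= self.n_slots:
            raise IndexError("job file full")
        off = self.tail * SLOT
        self.mm[off + 1] = len(payload)
        self.mm[off + 2:off + 2 + len(payload)] = payload
        self.mm[off] = READY              # status written last
        self.tail += 1
        return self.tail - 1

    def take(self, slot):
        """Return the payload of a READY slot and mark it DONE, else None."""
        off = slot * SLOT
        if self.mm[off] != READY:
            return None
        length = self.mm[off + 1]
        payload = bytes(self.mm[off + 2:off + 2 + length])
        self.mm[off] = DONE
        return payload

    def close(self):
        self.mm.close()
        self.f.close()
```

a consumer in another process would map the same file and poll the status bytes, which is where the spin-locking (and the burned CPU) comes in.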

with all of these crazy implementations, i basically have absolutely no confidence in the linux kernel or in the GNU/Linux POSIX-compliant implementation of the OS on top - i have no confidence that it can handle the load.

so i would be very interested to hear from anyone who has had to design similar architectures, and how they dealt with it.

i think one of two things happened, here. first is that it might have finally sunk in to google that even just *claiming* to have properly verified user identities leaves them open to lawsuits should they fail to have properly carried out the verification checks that other users *believe* they have carried out. with every other service, people *know* that you don't trust the username: for a service to claim that they have truly verified the identity of the individual behind the username is reprehensibly irresponsible.

second is that they simply weren't getting enough people, so have "opened up the doors".

It would certainly be nice, but it's not realistic. For a simple paper, it would likely cost a few thousands, but for anything that requires fancy material, it could easily run in the millions. The only level where fraud prevention makes sense is at the institution (company, lab, university) level.

So you're saying that reviewers should have to reproduce the results (using their own funds) of the authors before accepting the papers or risk being disciplined? Aside from ending up with zero reviewers, I don't see what this could possibly accomplish. Peer review is designed to catch mistakes, not fraud.

I think what is missing is that a) more reviewers actually need to be experts and practicing scientists and b) doing good reviews needs to get you scientific reputation rewards. At the moment, investing time in reviewing well is a losing game for those doing it.

Well, there's also the thing that one of the most fundamental assumptions you have to make while reviewing is that the author's acting in good faith. It's really hard to review anything otherwise (we're scientists, not a sort of police).

I agree that good reviews do not need to be binary. You can also "accept if this is fixed", "rewrite as an 'idea' paper", "publish in a different field", "make it a poster", etc. But all that takes time and real understanding.

It goes beyond just that. I should have said "multi-dimensional" maybe. In many cases, I want to say "publish this article because the idea is good, despite the implementation being flawed". In other cases, you might want to say "this is technically correct, but boring". In the medical field, it may be useful to publish something pointing out that "maybe chemical X could be harmful and it's worth further investigation" without necessarily buying all of the authors' conclusions.

Personally, I prefer reading flawed papers that come from a genuinely good idea rather than rigorous theoretical papers that are both totally correct and totally useless.

This is not a new phenomenon, it seems to just be getting worse again. But remember that Shannon had trouble publishing his "Theory of Information", because no reviewer understood it or was willing to invest time for something new.

That's the problem here. Should the review system "accept the paper unless it's provably broken" or "reject the paper unless it's provably correct"? The former leads to all these issues of false stuff in medical journals and climate research, while the latter leads to good research (like the Shannon example) not being published. This needs to be more than just binary. Personally I prefer to accept if it looks like it could be a good idea, even if some parts may be broken. Then again I don't work on controversial stuff and nobody dies if the algorithm is wrong. I can understand that people in other fields have different opinions, but I guess what we need is non-binary review. Of course, reviewers are also just one part of the equation. My reviews have been overruled by associate editors more often than not.

Read "hard" as "Expensive as Hell"

That is part of it, yes. It requires a wide range of differently experienced people: low level software, high level software, circuit design, assembly, layout, component sourcing, factory liaison, DFT, manufacturing etc.

Then you need to get them all to work together. And you have to pay them.

... ynow... one of the reasons i came up with the idea to design mass-volume hardware that would be eco and libre friendly was because, after having developed the experience to deal with both low-level software and high-level software, and having done some circuit design at both school and university, i figured that the rest should not be too hard to learn... or manage.

  you wanna know the absolute toughest part [apart from managing people?] it's the component sourcing. maan, is that tough. if you want a laugh [out of sheer horror, not because it was actually funny] look up the story on how long it took to find a decently-priced mid-mount micro HDMI type D [8 months].

  so anyway, i set out to find people with the prerequisite skills that i *didn't* have, offered them a chance to participate and profit. the list of people who have helped and then fallen by the wayside... i... well.... i want to succeed at this so that i can give them something in return for what they did.

If only there was some way to get more information, perhaps with a sort of "link" of some kind to a more detailed description.

here is the [old] specification of the [revision 1] CPU Card:

for the current revision 2, which i am looking for factories to produce (RFQs sent out already), we will try 2GB of RAM. this is just a component change not a layout change so chances of success are high.

here is the [old] specification of the Micro-Engineering Board:

that was our "minimal test rig" which helped verify the interfaces on the first CPU Cards (and will help verify the next ones as well, with no further financial outlay needed. ever. ok, that would be true if i hadn't taken the opportunity to change the spec before we go properly live with it!! you only get one shot at designing a decade-long standard.... i'd rather get it right)

this will be the basis of the planned crowd-funding campaign: it's more of a micro-desktop PC:

the micro-desktop chassis is very basic: VGA, 2x USB, Ethernet, Power In (5.5 to 21V DC). all the other interfaces are on the CPU Card (USB-OTG, Micro-HDMI, Micro-SD). however unlike the Micro-Engineering Board, the power is done with a view to the average end-user (as is the VGA connector which means 2 independent screens, straight out the box).

does that help answer the question?

Open hardware sounds cool, but as others have noted, good hardware design is both difficult and expensive. Considering how rapidly the components advance (CPU/SoC, I/O, displays, etc.),

aaaah gotcha! that's the _whole_ reason why i designed the long-term modular standards, so that products *can* be split around the arms race of CPU/SoC on the one hand and battery life / display etc. on the other.

and the factory that we are in touch with (the big one), they _love_ this concept, because the one thing that you might not be aware of is that even the big guys cannot react fast enough nowadays.

imagine what it would mean to them to be able to buy HUGE numbers of CPUs (and related components), drop them into a little module that they KNOW is going to work across every single product that conforms to the long-term standard. in 6 months time there will be a faster SoC, more memory, less power, but that's ok, because *right now* they can get better discounts on the SoC that's available *now*.

on the other side of the interface, imagine what it would mean to them that they could buy the exact same components for a base unit for well... three to five years (or until something better came along or some component went end-of-life)?

it took them a while, but they _loved_ the idea. the problem is: as a PRC State-Sponsored company they are *prohibited* from doing anything other than following the rules... i can't tell you what those rules are: they're confidential, but it meant that we had to find other... creative ways to get the designs made.

We're in a world where a first generation Nexus 7 tablet sells for $140 or less. At Walmart.

yeah. now that prices are dropping, just like the PC price wars, the profits are becoming so small that the manufacturers are getting alarmed (or just dropping out of the market entirely). those people are now looking for something else. they're willing to try something that might get them a profit. what should we tell them?

anyway: thank you for your post, darylb, it provides a very useful starting point for some of the key insights i want to get across to people.

