Wait, if you have enough power to the house, you can have ovens, refrigerators, and air conditioning units?! Wow, who could have imagined all those new ideas 100 years ago, when power was limited?
Unless you're God and can see the future, stop acting like you know that there is absolutely no benefit to improving technology.
Netflix's SSD servers only have about 10TB of storage and get about a 70% hit rate, while the rust-bucket servers have 100TB of storage and get about an 80-90% hit rate. Their entire catalog is about 1PB.
P2P would work best for flash-mobs. It would reduce the number of cache servers they need deployed in other ISPs.
Assuming decent buffering, you can start streaming the video live from the beginning while P2P buffers the later parts. Just use the normal servers to build the initial buffer, then use P2P to populate as much of the rest of the buffer as you can (something like the sketch below). I'm sure the 80/20 rule would apply.
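Roughly this, as a hedged sketch in Java (the class and method names are made up for illustration, not any real player API): grab the first few chunks straight from the CDN so playback starts immediately, then prefer peers for the rest of the buffer and fall back to the CDN on a miss.

    class HybridFetcher {
        static final int STARTUP_CHUNKS = 4;           // first N chunks come from the CDN only

        byte[] fetchChunk(int index) {
            if (index < STARTUP_CHUNKS)
                return fetchFromCdn(index);            // fast start: normal servers
            byte[] fromPeers = fetchFromPeers(index);  // later buffer: try the swarm first
            return fromPeers != null ? fromPeers : fetchFromCdn(index); // peer miss -> CDN
        }

        byte[] fetchFromCdn(int index)   { return new byte[0]; } // stub for illustration
        byte[] fetchFromPeers(int index) { return null; }        // stub: null means no peer had it
    }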
So it makes little sense to have asymmetric fiber service other than for marketing purposes.
Most fiber deployments use GPON, which shares bandwidth. High-speed transmit optics are more expensive than high-speed receive optics, so most ONTs can receive at up to 2.5Gb/s but can only send at 1.25Gb/s. If you're using Active Ethernet, it's completely symmetrical and you get 1Gb/s up/down. But in GPON mode, you have 2.5 down and 1.25 up.
Google Fiber uses WDM-GPON, which has 32 lambdas of 1.25Gb/1.25Gb, so it's all symmetrical, but they were an early adopter and used the draft version.
The other question that comes up: since the fiber is already dedicated, why use GPON? Well, you get higher port densities and lower power consumption, by quite a bit. There is a real benefit to having 32 customers per OLT port instead of one customer per port.
The point is that each and every component involved, from hardware through firmware to software, is designed under the premise that it is okay to drop a packet at any time for any reason, or to duplicate or reorder packets.
That entire sentence is damn near a lie. Those issues can happen, but they shouldn't. You almost have to go out of your way to make those situations happen. Dropping a packet should NEVER happen except when you're pushing past line rate. Packets should NEVER be duplicated or reordered except in the case of a network misconfiguration. Networks are FIFO, and they don't just duplicate packets for the fun of it.
As for error rates, many high-end network devices can hit bit error rates of 10^-18 or better, which works out to one bit error per 10^18 bits, i.e. about 1.25x10^17 bytes, or roughly 111PiB. I assume you'd have to divide that 111PiB figure by the number of hops.
I've seen enough system designs where they send data as UDP packets and require incredibly low packet-loss rates, bordering on never. It can be done, but you're not going to be doing it with D-Link switches. You can buy L4 switches now with multi-gigabyte buffers; they're meant to absorb potentially massive throughput spikes without dropping packets.
I assume this is all intra-datacenter traffic or at least an entirely reserved network.
* the UDP traffic contains multiple data packets (call them "jobs") each of which requires minimal decoding and processing
anything _above_ that rate and the UDP buffers overflow and there is no way to know if the data has been dropped. the data is *not* repeated, and there is no back-communication channel.
How are you planning on handling UDP checksum errors without a backchannel or EC? The physical ethernet layer is lossy, so you're screwed even before the packet hits the NIC.
I just logged into my switch at home and it has 146 days of uptime with 20,154,030,043 frames processed and 0 frame errors. I can even run a bidirectional 1Gb/1Gb iperf test, just under 2Gb/s total at once, and have 0 packets dropped.
Let the network group worry about QoS. Yes, errors will eventually happen; they're just very rare, and when they do happen it's probably something pathological and you'll get a lot of them at once. But I wouldn't go so far as to say "the physical ethernet layer is lossy" as a general statement.
You need to make larger batches.
1) A UDP packet/job comes in; write it to a single-writer, many-reader queue (large circular queues can be good for this) along with an order number, maybe a 64-bit incrementing integer. If the run time per job is fairly constant, you could instead use several single-reader/single-writer queues and just round-robin them. That reduces potential lock contention, but comes at the cost that variable workloads could cause a bias towards a single worker.
1.a) You're not receiving packets fast enough to worry about threading the reads from the NIC. If you had to make this part faster, like millions of packets per second, the first thing I would find out is whether these packets are coming from multiple data sources and whether jobs need to be processed in order relative to all sources or only relative to their own source. If it's only their own source, then you could have a load balancer round-robin across workers while staying sticky by source IP.
2) A worker sees jobs in the queue (since this is a speed-sensitive, dedicated setup, polling could work, but you may want event-based) and grabs N jobs, where N is however many can reliably be completed in a timely fashion; that might be 1 or might be 100, who knows until you test. Note the order numbers of your jobs. You don't really need to grab N jobs at a time if you're using a single-reader/single-writer queue, since there is no real contention, but reading in batches is good for high-contention queues like multi-reader ones.
3) Your worker now loops through each job, running each script, hopefully all on the same worker/thread.
4) Write the completed jobs out to a single-reader/single-writer queue. If you instead have a multi-writer queue, you may want to commit finished jobs in batches to reduce contention.
5) Have another worker poll (or get events from) each worker's output queue and make sure the jobs are put back in order. I assume this step is relatively light, so a single worker can probably handle all of the output queues, but it could also be threaded. You just need to manage the ordering somehow. A rough sketch of the whole pipeline is below.
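To make steps 1-5 concrete, here's a hedged sketch in Java (17-ish). The Job/Result types, the toUpperCase() "script", the queue sizes, and the batch size are all made up, and ArrayBlockingQueue is only a stand-in for a proper lock-free ring buffer; it's meant to show the shape, not be a drop-in implementation.

    import java.util.*;
    import java.util.concurrent.*;

    public class PipelineSketch {
        record Job(long seq, String payload) {}
        record Result(long seq, String output) {}

        static final int WORKERS = 2, BATCH = 32;

        public static void main(String[] args) throws Exception {
            List<BlockingQueue<Job>> inQs = new ArrayList<>();
            List<BlockingQueue<Result>> outQs = new ArrayList<>();
            for (int i = 0; i < WORKERS; i++) {
                inQs.add(new ArrayBlockingQueue<>(1 << 16));
                outQs.add(new ArrayBlockingQueue<>(1 << 16));
            }

            // Steps 2-4: each worker drains a batch from its own input queue,
            // runs the "script" on each job, and writes results to its own
            // output queue (one writer, one reader per queue -> low contention).
            for (int i = 0; i < WORKERS; i++) {
                final int w = i;
                Thread worker = new Thread(() -> {
                    List<Job> batch = new ArrayList<>(BATCH);
                    while (true) {
                        batch.clear();
                        inQs.get(w).drainTo(batch, BATCH);
                        if (batch.isEmpty()) { Thread.onSpinWait(); continue; }
                        for (Job j : batch)
                            outQs.get(w).add(new Result(j.seq(), j.payload().toUpperCase()));
                    }
                });
                worker.setDaemon(true);
                worker.start();
            }

            // Step 5: one collector re-orders finished jobs by sequence number
            // before handing them downstream.
            Thread collector = new Thread(() -> {
                PriorityQueue<Result> pending =
                    new PriorityQueue<>(Comparator.comparingLong(Result::seq));
                long next = 0;
                while (true) {
                    for (BlockingQueue<Result> q : outQs) q.drainTo(pending);
                    while (!pending.isEmpty() && pending.peek().seq() == next) {
                        System.out.println("done in order: " + pending.poll().output());
                        next++;
                    }
                    Thread.onSpinWait();
                }
            });
            collector.setDaemon(true);
            collector.start();

            // Step 1: the (simulated) UDP reader stamps a 64-bit order number on
            // each job and round-robins it across the per-worker input queues.
            for (long seq = 0; seq < 100; seq++)
                inQs.get((int) (seq % WORKERS)).put(new Job(seq, "job-" + seq));

            Thread.sleep(500); // let the demo drain before exiting
        }
    }

If the jobs come from multiple sources and only need ordering per source (1.a above), the round-robin in step 1 would become a hash/sticky assignment by source IP instead.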
You should have no more than N workers per core, where N is probably a small number, like 2. Lots of threads is bad.
I love single-reader/single-writer queues; they can be lock-free.
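The reason they can be lock-free: the producer is the only thread that ever writes the tail index and the consumer is the only thread that ever writes the head index, so neither side needs a lock. Here's a minimal Java sketch (made-up class name, power-of-two capacity assumed, no cache-line padding; JCTools and the Disruptor do all of this properly):

    import java.util.concurrent.atomic.AtomicLong;

    // Minimal single-producer/single-consumer ring buffer sketch.
    // offer() may only be called from one thread, poll() from one other thread.
    public final class SpscQueue<T> {
        private final Object[] buf;
        private final int mask;
        private final AtomicLong head = new AtomicLong(); // next slot to read  (only the consumer writes this)
        private final AtomicLong tail = new AtomicLong(); // next slot to write (only the producer writes this)

        public SpscQueue(int capacityPowerOfTwo) {
            buf = new Object[capacityPowerOfTwo];
            mask = capacityPowerOfTwo - 1;
        }

        public boolean offer(T item) {                        // producer thread only
            long t = tail.get();
            if (t - head.get() == buf.length) return false;   // queue is full
            buf[(int) (t & mask)] = item;
            tail.lazySet(t + 1);                               // publish the slot after writing it
            return true;
        }

        @SuppressWarnings("unchecked")
        public T poll() {                                      // consumer thread only
            long h = head.get();
            if (h == tail.get()) return null;                  // queue is empty
            T item = (T) buf[(int) (h & mask)];
            buf[(int) (h & mask)] = null;                      // let the GC reclaim the slot
            head.lazySet(h + 1);
            return item;
        }
    }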
Your problem sounds close to what the Disruptor handles (Google: disruptor ring buffer) (fun read: http://mechanitis.blogspot.com...). You may want to look into that kind of design as well. It's an interesting project; it runs on Java and is built around exactly this sort of lock-free ring buffer for passing work between threads.