Comment no. (Score -1) 928
systemd violates the principles of unix, adding massive amounts of completely unnecessary complexity. there is absolutely nothing good to say about it.
systemd violates the principles of unix, adding massive amounts of completely unnecessary complexity. there is absolutely nothing good to say about it.
wasn't NT 3.5 available for ARM, DEC Alpha, Power PC *and* x86? wasn t the core of the NT kernel based on the Mach kernel, and written almost exclusively in c? so what the hell went wrong??
i think at some point some scientists somewhere will work out that the statistical evidence is growing to show, more and more, that dark matter *doesn't* exist...
PPS: Given your custom IPC for Python, could you go us one further and write an OSGi for Python using it? Pretty please!
That's not loadavg, that's IO latency. You should probably be using iostat to get useful numbers.
oo, thank you very much for that tip, i'll try to pass it on and will definitely remember it for the next projects i work on. thank you.
It doesn't matter how awesome someone thinks their Python-LMDB project is. It doesn't matter how important someone thinks their Python-LMDB project is.
the mistake you've made has been raised a number of times in the slashdot comments (3 so far). the wikipedia page that was deleted was about LMDB, not python-lmdb. python-lmdb is just bindings to LMDB and that is not notable in any significant way.
CPython is a compiler.
it's an interpreter which was [originally] based on a FORTH engine.
It compiles Python source code to Python bytecode,
there is a compiler which does that, yes.
and the Python runtime executes the compiled bytecode.
it interprets it.
CPython has one major weakness, the GIL (global interpreter lock).
*sigh* it does. the effect that this has on threading is to reduce threads to the role of a mutually-exclusive task-switching mechanism.
I've seen the GIL harm high-throughput, multi-threaded event processing systems not dissimilar from the one you describe.
yes. you are one of the people who will appreciate, given that the codebase could not be written in (or converted to) any other language, due to time-constraints, that using processes and custom-written IPC because threads (which you'd think would be perfect to get high-performance on event processing because there would be no overhead on passing data between threads) couldn't be used, means that the end-result is going to be... complicated.
If you must insist on Python and want to avoid multi-threaded I/O bound weaknesses of the GIL, then use Jython.
not a snowball in hell's chance of that happening
java on the other hand i just... i don't even want to begin describing why i don't want to be involved in its deployment - i'm sure there are many here on slashdot happy to explain why java is unsuitable.
there are many other ways in which the limitation of threads in python imposed by the GIL may be avoided. i chose to work around the problem by using processes and custom-writing an IPC infrastructure using edge-triggered epoll. it was... hard. others may choose to use stackless python. others may agree with the idea to use jython, but honestly if the application was required to be reasonably reliable as well as high-performance there would be absolutely no way that i could ever endorse such an idea. sorry
if something like PostgreSQL had been used as the back-end store, that rate would be somewhere around 30,000 tasks per second or possibly even less than that
You should pipe it to
don't jest... please
but, seriously: the complete lack of need in this application for joins (as well as any other features of SQL or NOSQL databases) was what led me to research key-value stores in the first place.
A lot of the locking semantics you mentioned sound pretty similar to RCU which is used extensively in the Linux kernel, and allows for lockless reading on certain architectures.
http://en.wikipedia.org/wiki/R...
but i am not an expert on these things. i'm sure that if howard chipped in here (and he _is_ an expert on the linux kernel and on high-performance efficient algorithm implementation) he'd be able to tell you more and probably a lot more accurately than i can.
The use cases for LMDB are pretty limited.
weeelll.... the article _did_ say "high performance", so there are some sacrifices that can be made especially when those features provided by SQL databases are clearly not even needed.
basically what was needed then was to actually *re-implement* some of the missing features (indexes for example) and that took quite some research. it turns out that (after finding an article written by someone who has implemented a SQL database using the very same key-value stores that everyone uses) you can implement secondary indexes *using* a key-value store with range capabilities by concatenating the value that you wish to have range-search on with the primary key of the record that you wish to access, and then storing that as the key with a zero-length value in the secondary-index key-value store.
this was what i had to implement - directly - in python, to provide secondary indexing using timestamps so that records could be deleted for example once they were no longer needed. it was actually incredibly efficient, *because of the performance of LMDB*.
so... yeah. didn't need SQL queries. added some basic secondary-indexing manually. got the transactional guarantees directly from the implementation of LMDB. got many other cool features....
please remember that i am keenly aware that SQLite, MySQL and i think even PostgreSQL can now be compiled to use LMDB as its back-end data store... but that the application was _so demanding_ that even if that had been done it still would not have been enough.
but, apart from that: i don't believe you are correct in saying that there are a limited number of use cases for LMDB *itself* - the statement "there are a limited number of use cases for range-based key-value stores" *might* be a bit more accurate, but there are clearly quite a _lot_ of use cases for range-based key-value stores [including as the back-end of more complex data management systems such as SQL and NOSQL servers].
this high-performance task scheduler application happens to be one of them... and the main point of the article is that, amongst the available key-value stores currently in existence, my research tells me that i picked the absolute best of them all.
I apologize for that, I was wrong and spoke too quickly. If you can find notable sources for P-LMDB, then it's worth a shot bringing it to that user's attention.
hey not a problem. you're right about py-lmdb - my main concern is to get LMDB the recognition that its peer stores (such as BerkeleyDB) already have: http://en.wikipedia.org/wiki/B... - someone else mentioned that there are other such key-value stores (some of them at the same development period as LMDB) which already have articles. and it's that an *oracle* employee marked the page for deletion that's the main issue of contention here.
The author got poor performance from a SQL database with no indexing, which degraded as the number of records grew? You don't say! A database that has to do a full scan for reads performs poorly?
yes. it was that i had to do that analysis in a formal repeatable independent way, which i had never done before, and i was very surprised at the poor results. i was at least expecting a *consistent* and reliable rate of... well, i don't know: i was kinda expecting PostgreSQL to be top of the list and i was kinda expecting it to reach 100,000 or 200,000 records per second... and it just... couldn't. i was *completely* caught off-guard by the need to switch off all the safety checks, and by how dramatic the effect on performance of adding indexes really was.
so it was then by complete contrast that, for example, the py-lmdb benchmarks got an ORDER OF MAGNITUDE better sequential-read-speeds (2.5 million per second) than i was expecting that made me really sit up and take notice.
Surprise about load average seems equally naive. If you fork a bunch of processes that are doing IO, of COURSE the load increases. Load is a measure of the number of processes not sleeping. That's all it is. I don't understand his surprise that a system steadily doing a great deal of IO would show a lot of time spent in IO calls in profiling.
you've missed the point. it was that the exact same design using 20 (or so) shm file handles instead of 200 file handles opening to the exact same data (effectively) resulted in a reasonable loadavg, whereas having the 200 file handles open had a loadavg that ground the system completely to a halt.
so it's not the *actual* loadavg that is relevant but that the *relative* loadavg before and after that one simple change was so dramatically shifted from "completely unusable and in no way deployable in a live production environment" to a "this might actually fly, jim" level.
Never mind what projects use it; what have independent reliable sources written about LMDB?
i've written something and i'm pretty wubwubwubreliawibble oh look pretty coloured lights...
Anyone can make an omelet with eggs. The trick is to make one with none.