lkcl - Slashdot User

Comment Re:database performance (Score 1) 98

by lkcl on Friday October 17, 2014 @04:31PM (#48172093) Attached to: Python-LMDB In a High-Performance Environment

Comment Re:(not)perplexingly (Score 1) 98

by lkcl on Friday October 17, 2014 @04:28PM (#48172061) Attached to: Python-LMDB In a High-Performance Environment

Comment Over-emphasizing (Score 1) 98

by lkcl on Friday October 17, 2014 @04:24PM (#48172023) Attached to: Python-LMDB In a High-Performance Environment

CPython is a compiler.

it's an interpreter which was [originally] based on a FORTH engine.

It compiles Python source code to Python bytecode,

there is a compiler which does that, yes.

and the Python runtime executes the compiled bytecode.

it interprets it.

CPython has one major weakness, the GIL (global interpreter lock).

*sigh* it does. the effect that this has on threading is to reduce threads to the role of a mutually-exclusive task-switching mechanism.

I've seen the GIL harm high-throughput, multi-threaded event processing systems not dissimilar from the one you describe.

yes. you are one of the people who will appreciate the consequences. the codebase could not be written in (or converted to) any other language, due to time-constraints, and threads couldn't be used (which you'd think would be perfect for high-performance event processing, because there would be no overhead on passing data between threads). so using processes and custom-written IPC instead means that the end-result is going to be... complicated.

If you must insist on Python and want to avoid multi-threaded I/O bound weaknesses of the GIL, then use Jython.

not a snowball in hell's chance of that happening :) not in a million years. not on this project, and not on any project i will actively and happily be involved in. and *especially* i cannot ever endorse the use of java for high performance reliable applications. i'm familiar with python's advantages and disadvantages, the way that the garbage collector works, and am familiar with the size of the actual python interpreter and am happy that it is implemented in c.

java on the other hand i just... i don't even want to begin describing why i don't want to be involved in its deployment - i'm sure there are many here on slashdot happy to explain why java is unsuitable.

there are many other ways in which the limitation of threads in python imposed by the GIL may be avoided. i chose to work around the problem by using processes and custom-writing an IPC infrastructure using edge-triggered epoll. it was... hard. others may choose to use stackless python. others may agree with the idea to use jython, but honestly if the application was required to be reasonably reliable as well as high-performance there would be absolutely no way that i could ever endorse such an idea. sorry :)
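for anyone wondering what "edge-triggered epoll" means in practice, here is a minimal sketch. it is not the project's actual IPC code: the pipe, the `drain` and `demo` names and the message format are all made up for illustration, the write end of the pipe merely stands in for a worker process, and `select.epoll` is linux-only. the point it shows is that with EPOLLET you are only told when the descriptor *becomes* readable, so the reader has to keep reading until the kernel says "would block".

```python
import os
import select


def drain(fd):
    """Read a non-blocking fd until it would block (or hits EOF)."""
    chunks = []
    while True:
        try:
            data = os.read(fd, 4096)
        except BlockingIOError:
            break  # nothing left for now: safe to wait for the next edge
        if not data:
            break  # EOF: every writer has closed its end
        chunks.append(data)
    return b"".join(chunks)


def demo():
    """Hypothetical stand-in for one IPC channel between two processes."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)  # edge-triggered mode needs non-blocking reads
    poller = select.epoll()
    poller.register(read_fd, select.EPOLLIN | select.EPOLLET)
    try:
        # two messages arrive before the reader gets round to polling
        os.write(write_fd, b"task-1\n")
        os.write(write_fd, b"task-2\n")
        received = b""
        for fd, _events in poller.poll(1.0):
            received += drain(fd)  # read everything that is pending
        return received
    finally:
        poller.close()
        os.close(read_fd)
        os.close(write_fd)
```

reading only part of the pending data and then going back to `poll()` is the classic mistake here: no new edge arrives, so the rest can sit in the buffer indefinitely. that is a large part of why this approach is hard to get right.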

Comment Do not use joins (Score 2) 98

by lkcl on Friday October 17, 2014 @03:29PM (#48171467) Attached to: Python-LMDB In a High-Performance Environment

Comment Re:Would it hurt ... (Score 1) 98

by lkcl on Friday October 17, 2014 @03:25PM (#48171413) Attached to: Python-LMDB In a High-Performance Environment

Comment Re:Oh my... (Score 5, Interesting) 98

by lkcl on Friday October 17, 2014 @03:14PM (#48171311) Attached to: Python-LMDB In a High-Performance Environment

The use cases for LMDB are pretty limited.

weeelll.... the article _did_ say "high performance", so there are some sacrifices that can be made especially when those features provided by SQL databases are clearly not even needed.

basically what was needed then was to actually *re-implement* some of the missing features (indexes for example) and that took quite some research. it turns out that (after finding an article written by someone who has implemented a SQL database using the very same key-value stores that everyone uses) you can implement secondary indexes *using* a key-value store with range capabilities by concatenating the value that you wish to have range-search on with the primary key of the record that you wish to access, and then storing that as the key with a zero-length value in the secondary-index key-value store.

this was what i had to implement - directly - in python, to provide secondary indexing using timestamps so that records could be deleted for example once they were no longer needed. it was actually incredibly efficient, *because of the performance of LMDB*.
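a rough sketch of that secondary-index trick, for the curious. this is not the project's code: `RangeStore` is a toy in-memory stand-in for any key-value store with ordered range scans, and `index_key`, `add_record` and `expire_before` are names invented here. the assumption that matters is that the timestamp is packed big-endian at a fixed width, so that byte order and numeric order agree.

```python
import bisect
import struct


class RangeStore:
    """Toy ordered key-value store (a stand-in, not LMDB itself)."""

    def __init__(self):
        self._keys = []   # kept sorted so that range scans work
        self._vals = {}

    def put(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key):
        return self._vals.get(key)

    def delete(self, key):
        if key in self._vals:
            del self._vals[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def range(self, start, stop):
        """Return a list of the keys k with start <= k < stop, in order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return self._keys[lo:hi]


records = RangeStore()   # primary store: primary key -> payload
by_time = RangeStore()   # secondary index: timestamp + primary key -> b""


def index_key(timestamp, primary_key):
    # 8-byte big-endian timestamp, then the primary key
    return struct.pack(">Q", timestamp) + primary_key


def add_record(primary_key, timestamp, payload):
    records.put(primary_key, payload)
    by_time.put(index_key(timestamp, primary_key), b"")  # zero-length value


def expire_before(cutoff):
    """Delete every record whose timestamp is < cutoff; return their keys."""
    removed = []
    for ikey in by_time.range(struct.pack(">Q", 0), struct.pack(">Q", cutoff)):
        primary_key = ikey[8:]  # strip the 8-byte timestamp prefix
        records.delete(primary_key)
        by_time.delete(ikey)
        removed.append(primary_key)
    return removed
```

in a real store both writes in `add_record` would go inside one transaction, so the index can never disagree with the records.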

so... yeah. didn't need SQL queries. added some basic secondary-indexing manually. got the transactional guarantees directly from the implementation of LMDB. got many other cool features....

please remember that i am keenly aware that SQLite, MySQL and i think even PostgreSQL can now be compiled to use LMDB as their back-end data store... but that the application was _so demanding_ that even if that had been done it still would not have been enough.

but, apart from that: i don't believe you are correct in saying that there are a limited number of use cases for LMDB *itself* - the statement "there are a limited number of use cases for range-based key-value stores" *might* be a bit more accurate, but there are clearly quite a _lot_ of use cases for range-based key-value stores [including as the back-end of more complex data management systems such as SQL and NOSQL servers].

this high-performance task scheduler application happens to be one of them... and the main point of the article is that, amongst the available key-value stores currently in existence, my research tells me that i picked the absolute best of them all.

Comment Re:Did you make any effort to get this undeleted? (Score 1) 98

by lkcl on Friday October 17, 2014 @02:35PM (#48170917) Attached to: Python-LMDB In a High-Performance Environment

Comment database performance (Score 2) 98

by lkcl on Friday October 17, 2014 @02:29PM (#48170863) Attached to: Python-LMDB In a High-Performance Environment

Comment Submitter doesn't understand Wikipedia notability (Score 1) 98

by lkcl on Friday October 17, 2014 @01:52PM (#48170567) Attached to: Python-LMDB In a High-Performance Environment

Comment Re:Did you make any effort to get this undeleted? (Score 1) 98

by lkcl on Friday October 17, 2014 @01:49PM (#48170545) Attached to: Python-LMDB In a High-Performance Environment

Comment Oh my... (Score 5, Informative) 98

by lkcl on Friday October 17, 2014 @01:43PM (#48170483) Attached to: Python-LMDB In a High-Performance Environment

Comment I can't wait for it (Score 1) 98

by lkcl on Friday October 17, 2014 @01:34PM (#48170379) Attached to: Python-LMDB In a High-Performance Environment

Comment Would it hurt ... (Score 5, Informative) 98

by lkcl on Friday October 17, 2014 @01:32PM (#48170357) Attached to: Python-LMDB In a High-Performance Environment

OpenLDAP was originally using Berkeley DB, until recently. they'd worked with it for years, and got fed up with it. in order to minimise the amount of disruption to the code-base, LMDB was written as a near-drop-in replacement.

LMDB is - according to the web site and also the deleted wikipedia page - a key-value store. however its performance absolutely pisses over everything else around it, on pretty much every metric that can be measured, with very few exceptions.

basically howard's extensive experience combined with the intelligence to do thorough research (even to computing papers dating back to the 1960s) led him to make some absolutely critical but perfectly rational design choices, the ultimate combination of which is that LMDB outshines pretty much every key-value store ever written.

i mean, if you are running benchmark programs in *python* and getting sequential read access to records at a rate of 2,500,000 (2.5 MILLION) records per second... in a *scripted* programming language for goodness sake... then they have to be doing something right.

the random write speed of the python-based benchmarks showed 250,000 records written per second. the _sequential_ ones managed just over 900,000 per second!

there are several key differences between Berkeley DB's API and LMDB's API. the first is that LMDB can be put into "append" mode (as mentioned above). basically what you do is you *guarantee* that the key of each new record is lexicographically greater than the keys of all other records. with this guarantee LMDB basically lets you put the new record _right_ at the end of its B+ Tree. this results in something like an astonishing 5x performance increase in writes.
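to make the "append" contract concrete, here is a toy model in plain python. it is emphatically not how LMDB is implemented (that is a B+ Tree in C): `SortedStore` and its methods are invented names, and all it shows is why the guarantee helps. the general `put` has to search for the right position, while `append` only needs to look at the last key. in py-lmdb the corresponding switch is, as far as i know, the `append=True` argument to `put()`.

```python
import bisect


class SortedStore:
    """Toy sorted key-value store used to illustrate append mode."""

    def __init__(self):
        self.keys = []
        self.values = []

    def put(self, key, value):
        """General insert: search for the position, then insert or overwrite."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def append(self, key, value):
        """Fast path: only valid if key is greater than every existing key."""
        if self.keys and key <= self.keys[-1]:
            return False  # guarantee broken: refuse instead of mis-ordering
        self.keys.append(key)      # no search, straight onto the end
        self.values.append(value)
        return True


store = SortedStore()
for n in range(3):
    store.append(b"%08d" % n, b"payload")  # zero-padded keys sort in numeric order
```

keys such as zero-padded sequence numbers or big-endian timestamps are the usual way to satisfy the ordering guarantee.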

the second key difference is that LMDB allows you to add duplicate values per key. in fact i think there's also a special mode (never used it) where if you do guaranteed fixed (identical) record sizes LMDB will let you store the values in a more space-efficient manner.

so it's pretty sophisticated.

from a technical perspective, there are two key differences between LMDB and *all* other key-value stores.

the first is: it uses "append-only" when adding new records. basically this has some guarantees that there can never be any corruption of existing data just because a new record is added.

the second is: it uses shared memory "copy-on-write" semantics. what that means is that the (one allowed) writer NEVER - and i mean never - blocks readers, whilst importantly being able to guarantee data integrity and transaction atomicity as well.

the way this is achieved is that because Copy-on-write is enabled, the "writer" may make as many writes as it wants, knowing full well that all the readers will NOT be interfered with (because any write creates a COPY of the memory page being written to). then, finally, once everything is done, and the new top level parent B+ Tree is finished, the VERY last thing is a single simple LOCK, update-pointer-to-top-level, UNLOCK.

so as long as Reads do the exact same LOCK, get-pointer-to-top-level-of-B+-Tree, UNLOCK, there is NO FURTHER NEED for any kind of locking AT ALL.
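the LOCK, swap-pointer, UNLOCK idea can be sketched in a few lines of python. this is a model of the scheme as described above, not LMDB's code: `CowStore` is an invented name, and where LMDB copies only the pages that a write touches, this toy lazily copies the whole dict.

```python
import threading


class CowStore:
    """Toy single-writer copy-on-write store with lock-free reads."""

    def __init__(self):
        self._root_lock = threading.Lock()    # guards only the root pointer
        self._writer_lock = threading.Lock()  # enforces "one allowed writer"
        self._root = {}                       # committed snapshot, never mutated

    def snapshot(self):
        """LOCK, get-pointer-to-top-level, UNLOCK."""
        with self._root_lock:
            return self._root

    def write(self, updates):
        with self._writer_lock:
            new_root = dict(self.snapshot())  # copy: readers keep the old one
            new_root.update(updates)          # as many writes as needed
            with self._root_lock:             # LOCK, update pointer, UNLOCK
                self._root = new_root


store = CowStore()
store.write({b"job-1": b"queued"})
old_view = store.snapshot()          # a reader takes its snapshot
store.write({b"job-1": b"running"})  # the writer commits a new version
# old_view still shows b"queued"; a fresh snapshot shows b"running"
```

a reader that holds `old_view` goes on seeing a consistent version for as long as it likes, which is where the "writer never blocks readers" property comes from.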

i am just simply amazed at the simplicity, and at how this technique has just... never been deployed in any database engine before, until now. the reason, as howard makes clear, is that the original research back in the 1960s was restricted to 32-bit memory spaces. now we have 64-bit so shared memory may refer to absolutely enormous files, so there is no problem deploying this technique, now.

all incredibly cool.

Submission + - Python-LMDB in a high-performance environment

Submitted by lkcl on Friday October 17, 2014 @10:36AM

Comment pay them!! (Score 3, Interesting) 265

by lkcl on Tuesday October 14, 2014 @03:54PM (#48143609) Attached to: Confidence Shaken In Open Source Security Idealism
