lkcl - Slashdot User

Comment Re:so why is intel's 14nm haswell still at 3.5 wat (Score 1) 161

by lkcl on Monday September 01, 2014 @05:10AM (#47798903) Attached to: Research Shows RISC vs. CISC Doesn't Matter

Comment Re:so why is intel's 14nm haswell still at 3.5 wat (Score 1) 161

by lkcl on Monday September 01, 2014 @05:08AM (#47798897) Attached to: Research Shows RISC vs. CISC Doesn't Matter

Here is your answer, the A20 is freakishly slow compared to anything Intel would put their name on.

Granted, you can build a tablet to do specific tasks (like decoding video codecs) around a really slow processor and some special-purpose DSPs. But perhaps the companies in that business aren't making enough profit to interest Intel.

interestingly that assumption - that allwinner is not making enough profit - is completely wrong. allwinner is now one of _the_ dominant tablet SoC manufacturers in the world. their first revision (the A10, which was a Cortex A8) actually caused a major recession in the electronics industry when it first came out, as it was only $7.50 compared to the nearest competitor at around $11 to $12. everyone *not* using the A10 at the time was left holding worthless components; contracts for supply were reneged on; the change was so quick that many factories and design houses simply went out of business.

the volumes that allwinner are shipping are simply enormous, and, along with rockchip, their nearest competitor, the tablet market is completely and utterly overwhelmingly dominated by processors of the type that you describe as "built to do specific tasks".

those "specific tasks" include "running the android OS at a pace that's good enough for the overwhelming majority of end-users".

in short, intel has a long *long* way to go before they can even remotely consider that they have a processor that can be taken seriously in this very large market, both in terms of price and also in terms of performance.

what is particularly interesting about the comment that you make is that it would seem that intel really does, just as you do, believe that "a really slow processor and some special-purpose DSPs" simply is... not enough. and, contrary to that belief, it can be quite clearly seen by the total dominance of allwinner and rockchip that "a really slow processor and some special-purpose DSPs" really *is* enough.

one of the reasons for that is because if you look at the market you find that you need:

* audio and video CODEC processing. this can be handled by a special-purpose DSP. some of these are now handling 3D 4096-bit-wide screens.

* 3D graphics. these are handled by licensing a whole range of hard macros (special-purpose DSPs) that come with proprietary libraries implementing OpenGL ES 2.0. they're good enough, and some of them are getting _really_ good.

* an (as you put it) "really slow processor" - although if you look at allwinner's latest processor the A80 it can hardly be called "slow", it's an 8 core monster - which covers the running of the general OS.

overall these processors are graded according to price: $5 will get you something dreadful but "good enough", $20 will get you something that's complete overkill for a tablet.

and you know what? the $7 1.2ghz dual-core ARM Cortex A7 Allwinner A20 is, when it's put with 2gb of RAM, actually extremely quick. i tested out 1gb of RAM running debian GNU/Linux: i fired up xrdp and i had *five* rdesktop sessions running OpenOffice and Firefox on it, onto my laptop. it didn't fall over, and it wasn't dreadfully slow.

so i think you, just like intel, are completely and entirely missing the point. and in intel's case, that means entirely missing out on a *huge* market segment.

Comment access to broadcom chips (Score 1) 165

by lkcl on Sunday August 31, 2014 @02:25PM (#47796023) Attached to: Update: Raspberry Pi-Compatible Development Board Cancelled

Comment so why is intel's 14nm haswell still at 3.5 watts? (Score 0, Troll) 161

by lkcl on Thursday August 28, 2014 @10:21AM (#47773819) Attached to: Research Shows RISC vs. CISC Doesn't Matter

Climate Scientist Pioneer Talks About the Furture of Geoengineering 140

Posted by samzenpus on Tuesday August 26, 2014 @01:56PM from the it's-getting-hot-in-here dept.

Comment Re:complex application example (Score 1) 161

by lkcl on Tuesday July 22, 2014 @03:32PM (#47509893) Attached to: Linux Needs Resource Management For Complex Workloads

Comment Re:complex application example (Score 4, Informative) 161

by lkcl on Sunday July 20, 2014 @07:51AM (#47493359) Attached to: Linux Needs Resource Management For Complex Workloads

> the first ones used threads, semaphores through python's multiprocessing.Pipe implementation.

I stopped reading when I came across this.

Honestly - why are people trying to do things that need guarantees with python?

because we have an extremely limited amount of time as an additional requirement, and we can always rewrite critical portions or later the entire application in c once we have delivered a working system that means that the client can get some money in and can therefore stay in business.

also i worked with david and we benchmarked python-lmdb after adding in support for looped sequential "append" mode and got a staggering performance metric of 900,000 100-byte key/value pairs, and a sequential read performance of 2.5 MILLION records. the equivalent c benchmark is only around double those numbers. we don't *need* the dramatic performance increase that c would bring if right now, at this exact phase of the project, we are targetting something that is 1/10th to 1/5th the performance of c.

so if we want to provide the client with a product *at all*, we go with python.

but one thing that i haven't pointed out is that i am an experienced linux python and c programmer, having been the lead developer of samba tng back from 1997 to 2000. i simpy transferred all of the tricks that i know involving while-loops around non-blocking sockets and so on over to python. ... and none of them helped. if you get 0.5% of the required performance in python, it's so far off the mark that you know something is drastically wrong. converting the exact same program to c is not going to help.

The fact you have strict timing guarantees means you should be using a realtime kernel and realtime threads with a dedicated network card and dedicated processes on IRQs for that card.

we don't have anything like that [strict timing guarantees] - not for the data itself. the data comes in on a 15 second delay (from the external source that we do not have control over) so a few extra seconds delay is not going to hurt.

so although we need the real-time response to handle the incoming data, we _don't_ need the real-time capability beyond that point.

Take the incoming messages from UDP and post them on a message bus should be step one so that you don't lose them.

.... you know, i think this is extremely sensible advice (which i have heard from other sources) so it is good to have that confirmed... my concerns are as follows:

questions:

* how do you then ensure that the process receiving the incoming UDP messages is high enough priority to make sure that the packets are definitely, definitely received?

* what support from the linux kernel is there to ensure that this happens?

* is there a system call which makes sure that data received on a UDP socket *guarantees* that the process receiving it is woken up as an absolute priority over and above all else?

* the message queue destination has to have locking otherwise it will be corrupted. what happens if the message queue that you wish to send the UDP packet to is locked by a *lower* priority process?

* what support in the linux kernel is there to get the lower priority process to have its priority temporarily increased until it lets go of the message queue on which the higher-priority task is critically dependent?

this is exactly the kind of thing that is entirely missing from the linux kernel. temporary automatic re-prioritisation was something that was added to solaris by sun microsystems quite some time ago.

to the best of my knowledge the linux kernel has absolutely no support for these kinds of very important re-prioritisation requirements.

Comment complex application example (Score 4, Insightful) 161

by lkcl on Sunday July 20, 2014 @04:40AM (#47492919) Attached to: Linux Needs Resource Management For Complex Workloads

i am running into exactly this problem on my current contract. here is the scenario:

* UDP traffic (an external requirement that cannot be influenced) comes in
* the UDP traffic contains multiple data packets (call them "jobs") each of which requires minimal decoding and processing
* each "job" must be farmed out to *multiple* scripts (for example, 15 is not unreasonable)
* the responses from each job running on each script must be collated then post-processed.

so there is a huge fan-out where jobs (approximately 60 bytes) are coming in at a rate of 1,000 to 2,000 per second; those are being multiplied up by a factor of 15 (to 15,000 to 30,000 per second, each taking very little time in and of themselves), and the responses - all 15 to 30 thousand - must be in-order before being post-processed.

so, the first implementation is in a single process, and we just about achieve the target of 1,000 jobs but only about 10 scripts per job.

anything _above_ that rate and the UDP buffers overflow and there is no way to know if the data has been dropped. the data is *not* repeated, and there is no back-communication channel.

the second implementation uses a parallel dispatcher. i went through half a dozen different implementations.

the first ones used threads, semaphores through python's multiprocessing.Pipe implementation. the performance was beyond dreadful, it was deeply alarming. after a few seconds performance would drop to zero. strace investigations showed that at heavy load the OS call futex was maxed out near 100%.

next came replacement of multiprocessing.Pipe with unix socket pairs and threads with processes, so as to regain proper control over signals, sending of data and so on. early variants of that would run absolutely fine up to some arbitrarry limit then performance would plummet to around 1% or less, sometimes remaining there and sometimes recovering.

next came replacement of select with epoll, and the addition of edge-triggered events. after considerable bug-fixing a reliable implementation was created. testing began, and the CPU load slowly cranked up towards the maximum possible across all 4 cores.

the performance metrics came out *WORSE* than the single-process variant. investigations began and showed a number of things:

1) even though it is 60 bytes per job the pre-processing required to make the decision about which process to send the job were so great that the dispatcher process was becoming severely overloaded

2) each process was spending approximately 5 to 10% of its time doing actual work and NINETY PERCENT of its time waiting in epoll for incoming work.

this is unlike any other "normal" client-server architecture i've ever seen before. it is much more like the mainframe "job processing" that the article describes, and the linux OS simply cannot cope.

i would have used POSIX shared memory Queues but the implementation sucks: it is not possible to identify the shared memory blocks after they have been created so that they may be deleted. i checked the linux kernel source: there is no "directory listing" function supplied and i have no idea how you would even mount the IPC subsystem in order to list what's been created, anyway.

i gave serious consideration to using the python LMDB bindings because they provide an easy API on top of memory-mapped shared memory with copy-on-write semantics. early attempts at that gave dreadful performance: i have not investigated fully why that is: it _should_ work extremely well because of the copy-on-write semantics.

we also gave serious consideration to just taking a file, memory-mapping it and then appending job data to it, then using the mmap'd file for spin-locking to indicate when the job is being processed.

all of these crazy implementations i basically have absolutely no confidence in the linux kernel nor the GNU/Linux POSIX-compliant implementation of the OS on top - i have no confidence that it can handle the load.

so i would be very interested to hear from anyone who has had to design similar architectures, and how they dealt with it.

Comment legal ramifications of identity verification (Score 1) 238

by lkcl on Wednesday July 16, 2014 @02:30AM (#47464365) Attached to: Pseudonyms Now Allowed On Google+

Comment Re:Hardware is hard (Score 2) 71

by lkcl on Tuesday July 01, 2014 @07:00PM (#47364551) Attached to: Improv Project, Vivaldi Tablet Officially Dead

Comment Re:Would it kill you to hint at what Improv is (wa (Score 3, Informative) 71

by lkcl on Tuesday July 01, 2014 @03:12PM (#47362691) Attached to: Improv Project, Vivaldi Tablet Officially Dead

Comment Re:What was desirable about it? (Score 3, Interesting) 71

by lkcl on Tuesday July 01, 2014 @03:01PM (#47362605) Attached to: Improv Project, Vivaldi Tablet Officially Dead

Open hardware sounds cool, but as others have noted, good hardware design is both difficult and expensive. Considering how rapidly the components advance (CPU/SoC, I/O, displays, etc.),

aaaah gotcha! that's the _whole_ reason why i designed the long-term modular standards, so that products *can* be split around the arms race of CPU/SoC on the one hand and battery life / display etc. on the other.

and the factory that we are in touch with (the big one), they _love_ this concept, because the one thing that you might not be aware of is that even the big guys cannot react fast enough nowadays.

imagine what it would mean to them to be able to buy HUGE numbers of CPUs (and related components), drop them into a little module that they KNOW is going to work across every single product that conforms to the long-term standard. in 6 months time there will be a faster SoC, more memory, less power, but that's ok, because *right now* they can get better discounts on the SoC that's available *now*.

on the other side of the interface, imagine what it would mean to them that they could buy the exact same components for a base unit for well... three to five years (or until something better came along or some component went end-of-life)?

it took them a while, but they _loved_ the idea. the problem is: as a PRC State-Sponsored company they are *prohibited* from doing anything other than following the rules... i can't tell you what those rules are: they're confidential, but it meant that we had to find other... creative ways to get the designs made.

We're in a world where a first generation Nexus 7 tablet sells for $140 or less. At Walmart.

yeah. now that prices are dropping, just like the PC price wars, the profits are becoming so small that the manufacturers are getting alarmed (or just dropping out of the market entirely). those people are now looking for something else. they're willing to try something that might get them a profit. what should we tell them?

anyway: thank you for your post, darylb, it provides a very useful starting point for some of the key insights i want to get across to people.

Comment moving forward: next crowdfunding launch (Score 5, Informative) 71

by lkcl on Tuesday July 01, 2014 @02:45PM (#47362451) Attached to: Improv Project, Vivaldi Tablet Officially Dead

short version: the plan is to carry on, using the lessons learned to
try again, with a crowd-funding campaign that is transparent. please
keep an eye on the mailing list, i will also post here on slashdot
when it begins.

http://lists.phcomp.co.uk/pipe...

long version:

this has been a hugely ambitious venture, i think henrik's post explains much:
http://lists.phcomp.co.uk/pipe...

the - extremely ambitious - goal set by me is to solve a huge range of
issues, the heart of which is to create environmentally-conscious
mass-volume appliances that software libre developers are *directly*
involved in at every step of the way.

so, not to be disparaging to any project past or future, but this isn't
"another beagleboard", or "another raspberry pi beater": it's a way to
help the average person *own* their computer appliances and save
money over the long term. software libre developers are invited
to help make that happen.

by "own" we mean "proper copyright compliance, no locked boot
loaders and a thriving software libre environment that they can
walk straight into to help them do what they want with *their*
device... if they want to".

the actual OS installed on the appliance will be one that is
relevant for that appliance, be it ChromeOS, Android, even
Windows or MacOSX. regardless of the pre-installed OS, the
products i am or will be involved in *will* be ones that Software
Libre Developers would be proud to own and would recommend
even to the average person.

by "saving money over the long term" we mean "the device is
split into two around a stable long-term standard
with a thriving second-hand market on each side, with new
CPU Cards coming along as well as new products as well.
buy one CPU Card and one product, it'll be a little bit more
expensive than a monolithic non-upgradeable product,
but buy two and you save 30% because you only need
one CPU Card. break the base unit and instead of the whole
product becoming land-fill you just have to replace the base,
you can transfer not just the applications and data but
the *entire computer*".

it was the environmental modular aspects as well as
the committment to free software *and* the desire to reach
mass-volume levels that attracted aaron to the Rhombus Tech
project.

perhaps unsurprisingly - and i take responsibility for this - the
details of the above did not translate well into the Improv
launch. the reason i can say that is because even henrik,
who has been helping out and a member of the arm netbooks
mailing list for quite some time, *still* has not fully grasped
the full impact of the technical details behind the standards

(hi henrik, how are ya, thank you very very much for helping
with the boot of the first A10 / A20 CPU card, your post on
the mailing list last week was very helpful because it shows
that i still have a long way to go to get the message across
in a short concise way).

the level of logical deduction, the details that need to be taken
into account, the number of processors whose full specifications
must be known in order to make a decent long-term stable
standard.... many people i know reading that sentence will think i
am some sort of self-promoting egotistical dick but i can tell you
right now you *don't* want to be holding in your head the
kinds of mind-numbing details needed to design a long-term
mass-volume computing standard. it's fun... but only in a
masochistic sort of way!

anyway. i did say long, so i have an excuse, but to get to the
point: now that the money is being returned, we can start again
with a new campaign - using a crowdfunding site that shows
numbers, and starts with a lower target (250) that offers more value
for that same amount of money to everyone involved as various
stretch goals (500, 1,000, 2500) are achieved. these will include
casework, FCC Certification, OS images prepared and, most
importantly as far as i am concerned, one of the stretch goals
i feel should be a substantial donation to the KDE Team in
recognition of the help - through some tough lessons if we are
honest - that they have given, as well as the financial outlay
that they've put forward because they believed in what we're
doing.

i'd like to hear people's thoughts and advice, here, because this
really is an exceptionally ambitious project that no commercial
company let alone a software-libre group would ever consider,
precisely because it requires a merging of *both* commercial
aspects *and* software libre principles and ethics. the
environmental angle and long-term financial savings are what
sells it to the end-users though.

Comment a Commodore Pet 3032 (Score 1) 153

by lkcl on Friday May 30, 2014 @09:46PM (#47133723) Attached to: Ask Slashdot: What Inspired You To Start Hacking?

Comment Re:Dark Reign (Score 1) 153

by lkcl on Friday May 30, 2014 @09:38PM (#47133695) Attached to: Ask Slashdot: What Inspired You To Start Hacking?

Slashdot Top Deals