Comment Re:flashy, but risky too. (Score 1) 83

Although I see problems with this, I kind of doubt counterfeiting is going to be one of them. To pull this off, the driver would need access to a huge warehouse of counterfeit goods so they could swap the real item (chosen by the customer, not the driver) for a matching fake one. I just don't see that as a practical scheme for stealing goods.

Comment Re:Schneier got it right a decade and a half ago (Score 1) 119

Yes, Java, Python (3), and Qt are all causing enormous difficulties because they followed Microsoft down the same fantasy road and decided strings had to be converted to "unicode" on input, as though it were impossible to use them otherwise. Since not all 8-bit strings can be converted, there must either be a lossy conversion or an error, neither of which is expected, especially if the software is intended to copy data from one point to another without change.

The original poster is correct in saying "stay away from Unicode". This does not mean that Unicode is impossible to handle. It means "treat it as a stream of bytes". Do not try to figure out which Unicode code points are there unless you really, really have a reason to, and you will be surprised how rarely you need to. In particular, you can search for arbitrary regexps (including sets of Unicode code points) with a byte-based regexp interpreter, and you can search for ASCII characters with trivial code.

Comment Re:Type "bush hid the facts" into Notepad. (Score 1) 119

Actually Plan 9 and UTF-8 encoding existed well before Microsoft started adding Unicode to Windows.

The reason for 16-bit Unicode was political correctness. It was considered wrong that Americans got the "better" shorter 1-byte encodings for their letters, so any solution that did not punish those evil Americans by making them rewrite their software was not going to be accepted. No programmer at the time (including ones who did not speak English) would have argued for anything other than a variable-length byte encoding for a system that still had to deal with existing ASCII software and data; this was a command from people who did not have to write and maintain the software.

The programmers, who knew damn well that variable-length was the correct solution, were unfortunately not careful enough to avoid mistakes in their encodings (such as not making them self-synchronizing). UTF-8 fixed that, but those early mistakes also led some of the less knowledgeable to think the problem was with variable length itself.

Unfortunately political correctness at Microsoft won, despite the fact that they had already added variable-length encoding support to Windows. It may also have been seen as a way to force incompatibility with NFS and other networked data so that Microsoft-only servers could be used.

One of the few good things to come out of the "Unix wars" was that commercial Unix development stopped before the blight of 16-bit characters was introduced (it was well on its way and would have appeared at about the same time Microsoft did it). Non-commercial Unix made the incredibly easy decision to ignore "wide characters".

The biggest problem now is that Windows convinced a lot of people who should know better that you need UTF-16 to open files by name (all that is really needed is to convert from UTF-8 just before the API is called). This allowed UTF-16 to infect Python, Qt, Java, and a lot of other software, causing problems, headaches, and bugs even on Linux. There is some hope that they are starting to realize they made a terrible mistake; Python in particular seems to be backing out by storing a UTF-8 version of the string alongside the UTF-32 one.

Comment Re: novice programmer alert! (Score 1) 119

The big downside of UTF-8 is using it as an in-memory string. To find the nth character you have to start at the beginning of the string.

And this is important, why? Can you come up with an example where you actually produce "n" by doing anything other than looking at the n-1 characters before it in the string? No, and therefore an offset in bytes can be used just as easily.

C# and Java use UTF16 internally for strings.

And you are aware that UTF-16 is variable-length as well, and therefore you can't "find the nth character" quickly either?

You might want to retake compsci 101.

Comment Re:Type "bush hid the facts" into Notepad. (Score 1) 119

Maybe you're willing to accept that ambiguity, and use the rule, "If the file looks like valid UTF-8, then use UTF-8; otherwise use

Yay! You actually got the answer partially correct. However you then badly stumble when you follow this up with:

8-bit ANSI, but under no circumstances UTF-16

The correct answer is "after knowing it is not UTF-8, use your complicated and error-prone encoding detectors".

The problem is that a whole lot of stupid code, in particular from Windows programmers, tries all kinds of matching against various legacy encodings and UTF-16, and only tries UTF-8 if all of those fail. This is why Unicode support still sucks everywhere.

You try UTF-8 FIRST, for two reasons. First, UTF-8 is really popular and thus likely the correct answer (especially if you count all ASCII files as UTF-8, which they are). Second, a random byte stream is INCREDIBLY unlikely to be valid UTF-8 (around a 2.6% chance for a two-byte file, and geometrically lower for longer ones), so your decision of "is this UTF-8" is very, very likely to be correct. Just moving this highly reliable test to the front will improve your detection enormously.

The biggest help would be to check for UTF-8 first, not last. This would fix "Bush hid the facts", because it would be identified as UTF-8. But a variation of that bug would still exist if you stuck a non-ASCII byte in there, in which case it would still be useful (though much less important) to not do stupid things in the detector; for instance, requiring UTF-16 to either start with a BOM or contain at least one 16-bit word with the high or low byte all zero would be a good idea and would indicate you are not an idiot.

Comment Re:Write-only code. (Score 1) 757

I have no idea why you are arguing while saying EXACTLY the same things I am.

I am not saying to make a and b into unique pointers to a copy. I am saying "a and b ARE LOCAL VARIABLES!!!!!" They will be copied to make the lambda; it is NOT POSSIBLE to avoid this!!!! The function can return before the lambda is destroyed. And you seem to think "constructing on the stack" does not involve a copy of a and b, which is wrong. You do mention the "move", which still does another copy (move semantics can produce a more efficient version, but not a free one). In practice the lambda's data structure is created on the heap because this is more efficient than the move.

The rest of my comments were about how the C++ compiler will actually do better than your attempts at premature optimization by forcing a and b onto the heap. There will be only a single "shared pointer" to the lambda object, not one to a and another to b. Also, what Boost calls an "intrusive ptr" will be used, avoiding a lot of the overhead of std::shared_ptr. And as my C code shows, it is possible to avoid multiple references to the lambda object, so a unique_ptr could be used, though I believe this requires the optimizer to have access to the implementation of the std::thread constructor so it knows the lambda is not copied.

Comment Re:Write-only code. (Score 1) 757

Above AC is an excellent example of the problems with C++. He has quite a few misconceptions.

a = std::make_shared<T>(x) does create a local shared pointer, but not the data itself, which is allocated on the heap.

The lambda absolutely does use the equivalent of a unique ptr. There is a block of memory allocated, and a and b are copied into it (this block also contains a pointer to the actual code, which in the example will be something that further copies or moves a and b to the stack and calls the do_something function). This is the copy that is unavoidable. The block is freed when the pointer goes out of scope. Since it is passed by value, move semantics mean there is never more than one pointer, so it is conceivable that the optimizer will use a unique ptr to it (though it is likely that something more like a shared ptr, or the Boost intrusive_ptr, is used).

Yes, you can force it to use std::move, but this should be an automatic optimization; it is nonsense that I have to type it. And even a move is much less efficient than direct use. The block is freed when the execution of the lambda (in the parallel thread) no longer needs it.

I do not want to use a and b in the parent thread. That is the whole point. Way to get completely confused there!

Comment Re:Write-only code. (Score 1) 757

You mean the caller has to do something like this (init-captures; I believe they require C++14)?

  std::thread([a = std::move(a), b = std::move(b)](){ do_something(a, b); }).detach();

Not sure if that is a good advertisement for C++.

It would be nice if this happened automatically when possible, but apparently for complex language rule reasons it cannot. The following code must make a copy of A:

  void f() {
          ComplexThing A(FunctionReturningComplexThing()); // move
          DoSomething(A); // the copy is here
  }

While this code, which seems like it should be what the above optimizes to, will only use move:

  void f() {
          DoSomething(FunctionReturningComplexThing());
  }

That is annoying, and the fact that such optimizations are not allowed is a good sign that there are problems with the design of C++.
