Python-to-C++ Compiler
Mark Dufour writes "Shed Skin is an experimental Python-to-C++ compiler. It accepts pure, but implicitly statically typed, Python programs, and generates optimized C++ code. This means that, in combination with a C++ compiler, it allows for translation of pure Python programs into highly efficient machine language. For a set of 16 non-trivial test programs, measurements show a typical speedup of 2-40 over Psyco, about 12 on average, and 2-220 over CPython, about 45 on average. Shed Skin also outputs annotated source code."
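As a rough illustration of what "implicitly statically typed" could mean in the summary above, here is a minimal sketch. It is not taken from Shed Skin's documentation, and the function names are made up for the example. In the first function every variable keeps one type for its whole life, so a type-inference pass could in principle give each name a single static type; the second rebinds a name across types and so depends on dynamic typing.

```python
def mean(values):
    # 'total' is always a float and 'count' is always an int,
    # even though no type is ever declared.
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    return total / count


def dynamic(flag):
    # 'x' is an int on one path and a str on the other, so no single
    # static type fits it.
    x = 1
    if flag:
        x = "one"
    return x


print(mean([1.0, 2.0, 3.0]))
print(type(dynamic(False)).__name__, type(dynamic(True)).__name__)
```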
not terribly useful quite yet (Score:5, Insightful)
Re:not terribly useful quite yet (Score:3, Insightful)
I could envision it working like this. Instead of statically declaring all your variable types in every function, you instead simply declare that whatever types are being used, the
Re:not terribly useful quite yet (Score:2)
Re:not terribly useful quite yet (Score:2)
Re:not terribly useful quite yet (Score:2)
I suspect that varies with the programmer. I'm pretty certain that much of my Python code contains things that a type deduction system (SML, Haskell) wouldn't be able to cope with. Certainly I use duck typing a lot.
And besides, only one of maybe a hundred Python programs I've written ran unacceptably slow. And that was a quick hack for
Re:not terribly useful quite yet (Score:3, Informative)
How exactly does duck typing differ from the structural subtyping of e.g. OCaml, which allows you to write a function that can be passed any object, of any class or none, if it provides all the methods that function uses? The type inference system handles it just fine.
Of course, "duck typing"
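For readers who want to see the comparison in Python terms, here is a hedged sketch of structural typing using `typing.Protocol`, which was added to Python (3.8) long after this thread. The class and function names are hypothetical. At run time this is ordinary duck typing; the protocol only lets an external static checker verify that an argument provides the methods the function uses, whatever its class.

```python
from typing import Protocol


class Quacker(Protocol):
    # Any object with a matching quack() method satisfies this protocol;
    # no inheritance relationship is required.
    def quack(self) -> str: ...


class Duck:
    def quack(self) -> str:
        return "Quack"


class Robot:
    def quack(self) -> str:
        return "beep"


def make_noise(thing: Quacker) -> str:
    # Accepts an object of any class, as long as it provides quack().
    return thing.quack()


print(make_noise(Duck()))
print(make_noise(Robot()))
```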
Re:Underlying technology (Score:2)
I could see this Python > C++ > Machine code project being something PyPy was built on top of, but not the other way around? Please explain...
Ewwwww (Score:3, Funny)
Re:Ewwwww (Score:5, Insightful)
I think you're not supposed to read it. You're only supposed to feed it to your C++ compiler. f2c produced unreadable output too, but nobody read the output; at one time it was the only free Fortran option on Linux.
Re:Ewwwww (Score:2, Funny)
Re:Ewwwww (Score:3, Insightful)
Re:Ewwwww (Score:4, Insightful)
Yeah, whenever I look at the output of my optimising compiler, it's really hard to understand too. It's all in assembler, for a start.
Plus, the quality of C code generated by CFront was rubbish - unreadable.
Same with the Modula-3 compiler I tried. You couldn't work out what was going on in the resulting C code without a load of work.
Can you see where I'm going with this?
Yeah, but that's not what we need. (Score:3, Insightful)
This won't be meaningful until a converted python script is compared to efficient code written natively in C++ in the first place.
Re:Yeah, but that's not what we need. (Score:4, Insightful)
Re:Yeah, but that's not what we need. (Score:5, Insightful)
no, I'd be far more interested in a good compiler to compile that python straight to machine code...
Re:Yeah, but that's not what we need. (Score:3, Insightful)
Re:Yeah, but that's not what we need. (Score:2)
Re:Yeah, but that's not what we need. (Score:3, Insightful)
Is that the same way the method of using layers of multiple simple tools that all do one thing really well is more buggy than just using one larger general purpose monolithic app?
A cross platform Python to machine code compiler would presumably need to reinvent a whole lot of difficult platform specific stuff that has already been solved by C++ compilers.
Re:Yeah, but that's not what we need. (Score:4, Insightful)
...and that's why it shouldn't be a Python to C++ translator; it should be a GCC frontend instead (i.e., translating to GCC's internal representation).
Re:Yeah, but that's not what we need. (Score:2)
GCC -- not suitable for dynamic language? (Score:2)
Re:Yeah, but that's not what we need. (Score:3, Insightful)
Would you also like to translate a text from Arabic to English by passing through 3 or 4 languages in between?
In this analogy the problem would probably be accuracy; in the case you presented it would be performance being lost due to layers of conversion. Some high level optimizations are inevitably lost (unless the C++ compiler has some sort of strong AI).
Re:Yeah, but that's not what we need. (Score:2)
I can't imagine by what process of thought you came to think this was a useful thing to say. Strong AI is AI which is self-aware. It has nothing to do with problem solving capabilities. Furthermore, strong AI does not yet exist. Moreover, compiler optimizations are a set of rule-driven alterations based on mathematical proof that things aren't changed; theoretical AI wouldn't actually help in any way.
Read a Searle book before engaging in this sort of s
Re:Yeah, but that's not what we need. (Score:3, Insightful)
You are making a gigantic assumption that because this converter's better than the last one, that it's usable in efficiency arenas. By comparison, you might be looking at the difference between a shoe and a shoe with a spring (that's what air pumps do, don't laugh) wh
Re:Yeah, but that's not what we need. (Score:3)
here's one way to look at it (Score:3, Funny)
- 4 hours to write a given program in python, 32 hours to write same program in C++
- 10 seconds to run the python program, but just 2 seconds to run the faster C++ program
- the program is run 20 times a day
- assume the developer time costs as much as the time of the person that runs it
Ok, so it'll take 630 days of running this program for the faster C++ program to make up for the extra time to develop it. So, if you can wa
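The 630-day figure can be checked from the numbers in the list above. This is a small sketch of that arithmetic; the variable names are mine, and it uses the post's own assumption that developer time and user time cost the same.

```python
SECONDS_PER_HOUR = 3600

# Extra development time for the C++ version: 32 h versus 4 h.
extra_dev_seconds = (32 - 4) * SECONDS_PER_HOUR

# Time saved per run (10 s versus 2 s) and per day (20 runs a day).
saved_per_run = 10 - 2
saved_per_day = saved_per_run * 20

# Days of use before the saved run time equals the extra development time.
break_even_days = extra_dev_seconds / saved_per_day
print(break_even_days)  # 630.0
```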
Re:Yeah, but that's not what we need. (Score:2)
Indeed. But you replied anyway.
I'm not interested in writing native C++ code because it's hypothetically faster (it's not faster if I count coding time). But I am interested in a good python-to-C++ translator. Why wouldn't any python user be?
I never said they wouldn't be. Please feel free to re-read what I said until you understand it. However, please don't reply again and attempt to argue with me over things I didn't say. It's obnoxious.
Re:Yeah, but that's not what we need. (Score:2)
No they aren't, if only because ShedSkin doesn't handle modules yet and Python without modules is not really useful.
Last time I checked, it was the only Python compiler... (CPython is an interpreter, PyPy is also an interpreter, I'm pretty sure Vyper was also an interpreter [written in O'Caml before it got trash
Re:Yeah, but that's not what we need. (Score:3, Informative)
Neither CPython nor PyPy is a strict interpreter; both of them compile source to byte-code and then act as a virtual machine to run that byte-code. PyPy also does some work on compiling to native code on the fly, depending on which version you're using (Armin Rigo's is the most sophisticated on the JIT/native code front, but it's far from stable).
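The compile-then-run split described above can be seen from inside CPython with the built-in `compile()` function and the standard `dis` module. This is a small sketch; the source string and the `<example>` file name are arbitrary, and the exact instructions printed vary between CPython versions.

```python
import dis

source = "x = 1 + 2\nprint(x)"

# Step 1: compile source text to a code object holding byte-code.
code = compile(source, "<example>", "exec")
print(type(code.co_code))  # the raw byte-code is a bytes object

# Show the byte-code instructions in readable form.
dis.dis(code)

# Step 2: the virtual machine executes the compiled code object.
namespace = {}
exec(code, namespace)
```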
Re:Yeah, but that's not what we need. (Score:2)
Re:Yeah, but that's not what we need. (Score:2)
Yes, they are. They translate from one language to another. In the case of Psyco, that language happens to be machine code; in the case of CPython it's bytecode. They also have an interpreter (or runtime, virtual machine, whatever you want to call it) which is no different from, say, C++ having a runtime required in addition to your code (and like C++ you can statically link the runtime into a standalone executable if you want to).
Re:Yeah, but that's not what we need. (Score:2)
I think that falls under the standard definition of "interpreter".
Even ancient versions of MS BASIC were bytecode-oriented. They had to be to fit any decent-sized program into the limited RAM available.
Re:Yeah, but that's not what we need. (Score:2)
I think that falls under the standard definition of "interpreter".
I don't. It's a mixed system, with both a compiler and an interpreter. A strict interpreter runs directly on the source code (cf Bash).
Note that a compiler doesn't have to be an ahead-of-time process that generates machine code directly. Many C compilers generate assembly code that is then compil
Re:Yeah, but that's not what we need. (Score:2)
No they aren't, if only because ShedSkin doesn't handle modules yet and Python without modules is not really useful.
RTFA. They're saying this verbatim. This means that, in combination with a C++ compiler, it allows for translation of pure Python programs into highly efficient machine language.
Last time I checked, it was the only Python compiler
Then you need to check again. Arguin
Re:Yeah, but that's not what we need. (Score:2)
What a gloomy way of looking at the world.
Re:wrong comparison (Score:3, Interesting)
That'll teach me to hit submit without checking the preview. I lost a big and important chunk of the reply after operator< because I forgot to write out the entity for <. Here's a repaste; yay form buffers, boo no edit button for the first five minutes of a post.
-----------------
That's the wrong comparison to make, because it assumes that the C++ programmer has unlimited time to make his C++ code efficient and correct.
Well, yes and no. I actually got into this else-thread; there are a hell
Native code (Score:3, Insightful)
The best way to get some speed and still keep the nice Python functions and layout is just to export the most heavily used functions to native code (C/C++).
I don't know if it's possible to take the C++ output and optimize it separately, that way you will have a good start to make native code though.
In short: Better, fast and easy, but not the best (if you can write native code)
Re:Native code (Score:2)
Very interesting... (Score:4, Informative)
Among python programmers, I'm curious - how many use psyco (another python performance enhancement tool) for their projects? I fiddled with it a while ago (it didn't work because of a C module that it didn't like), but never had a compelling reason to go back to it. Performance optimization has never been important enough for my applications to merit the effort.
Re:Very interesting... (Score:3, Interesting)
Yup. Along the same lines, Ruby has a related project by Ryan Davis, Ruby2C [rubyforge.org]. It's useful for small localized speedups, but you wouldn't want to try to write your entire app in it.
Re:Very interesting... (Score:2)
Re:Very interesting... (Score:4, Interesting)
I use Psyco in my work. My app is a code generator that processes multiple models and transforms them into optimization code. Psyco reduced the time it took to process one model from 20 seconds to 2 seconds. It doesn't sound like much, but when you have to do it for lots of models, the speedup suddenly becomes quite substantial.
Re:Very interesting... (Score:2)
<aol>me too!</aol> Much of my code spends the majority of its time waiting for database queries to finish, ImageMagick to finish doing its stuff, files to copy, etc. Psyco doesn't do a thing for that stuff. On the other hand, a small amount of my code is pretty CPU intensive - not enough to break out of pure Python into Pyrex or anything else, but enough to want a performance upgrade. For thos
Lots of cross-language compiling... (Score:2)
I'm confused... (Score:5, Interesting)
Re:I'm confused... (Score:3, Interesting)
And yet, if you're going to compile Python, I'd want the translation into source code. If it's worth rewriting in C++, it's worth tuning, especially if you can improve the usage of type-safe code.
It's almost the opposite (Score:2)
Absolutely not. It takes an enormous amount of effort to compete with the native code generation of good C or C++ compilers - and much of that effort has to be repeated for every platform you target.
Many language implementations for less mainstream languages compile through C, treating it as a "portable assembler", and leveraging all the work that's been done to optimize C compilation. This is even done for some high-end languag
If they can do this... (Score:2)
Re:If they can do this... (Score:2)
Probably because it is fairly hard to translate a dynamically typed language into the RTL GCC uses for its backends.
I think they had a lot of problems even 'only' with C++ and all its powerful constructs like templates etc.
Re:If they can do this... (Score:2)
On second thought, it could be to allow compilation on different platforms. Write once and precompile from Python, then compile for each system you prefer, using the system-specific compiler to optimize for the processor architecture. Of course, I'm just guessing. Hell
Re:If they can do this... (Score:5, Insightful)
If this converter proves to be successful, I believe that a GCC frontend will be written eventually. There are probably potential optimizations that would be difficult or impossible to implement any other way.
Some may think that the dynamic nature of Python may preclude its inclusion in GCC. Technically, all that would need to be done is to have a runtime to handle dynamic things, similar to how Objective-C (for which there is GCC support) has a runtime to handle message passing and late binding. However, a large portion of the potential efficiency of a compiled version of the language would be lost to these dynamic capabilities; luckily, a compiler can detect when things are implicitly static (in fact, this converter is limited to implicitly static constructs), and optimise them to be truly static at compile-time.
File as NBNC (Nice But No Cigar) (Score:5, Insightful)
What the Python C/C++ interested people REALLY need is a book written by a group of Python AND C/C++ masters which teaches the two simultaneously showing complementary methods of doing any given thing working from beginner to advanced and I DON'T mean "How to turn your n00b Python code into C/C++ hotness" sort of viewpoint. I mean both taught simultaneously in sync showing how they can interchange and complement.
Software tricks for converting? Ultimately worse than not having them because it leads to horrible obfuscation because we don't know exactly what is going on when 13,412 lines of Python is turned into C++ because WE DIDN'T WRITE IT AND WE NEVER LEARNED C/C++. "Say Mike, that's great but you're the company code cowboy and you don't do C++ natively and I sure as hell don't read it being management so exactly what happens if this needs to be fixed? We've gone from importing open source code you couldn't read to writing our own open source code you can't read."
Re:File as NBNC (Nice But No Cigar) (Score:2, Informative)
Re:File as NBNC (Nice But No Cigar) (Score:2)
And when Linux was still at 1.x, it should have been dismissed by business because it didn't support SMP.
The software is barely written. Have patience.
Re:File as NBNC (Nice But No Cigar) (Score:2)
Yes, lightyears ahead of code which is talked about, but still not publicly available : Arc [paulgraham.com].
Re:File as NBNC (Nice But No Cigar) (Score:4, Insightful)
That isn't how a compiler is used. When you compile a C++ program, you don't throw away your C++ source and check the executable into source control. "Oh, no! We used gcc and now we have a bunch of gobbledygook we don't understand!"
The C++ is an intermediate stage in the make process, akin to the output of various phases of gcc.
Python as prototyping language (Score:4, Interesting)
Python is a terrific prototyping language (and lots of other things besides.) As a C++ coder I've been using it for prototyping stuff that will eventually be integrated into a larger application and therefore MUST be translated to C++. So what I'd like to see is a tool (written in Perl, just for the fun of having a linguistic threesome) that just does a light gloss on Python syntax to get me most of the way to human-readable C++. That would be far more useful (to me) than this thing, which sounds more like f2c, whose output could cause brain damage in humans and cancer in rats, or possibly the other way around.
Re:Why not just use pure C++? (Score:2, Insightful)
Re:Why not just use pure C++? (Score:2)
That's pretty much what he's doing: ShedSkin is a Python-to-C++ compiler, and you then need to compile the C++ code ShedSkin yields to machine code, which you can do with gcc.
The goal (for the author) at the moment is to get a fairly complete Python to C++ compiler (ShedSkin is already very good if you're mostly doing simple operations such as crunching numbers, but if your program is really complex or uses libraries then you're out of luck)
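To make the "Python to C++ source, then a C++ compiler" pipeline concrete, here is a deliberately tiny source-to-source sketch. It is a toy written for this illustration and is not how ShedSkin works: it only handles names, numeric literals and three arithmetic operators, and the function names are hypothetical. The resulting string would still have to be embedded in a C++ program and handed to a C++ compiler such as gcc.

```python
import ast

# Map Python AST operator nodes to the matching C++ operator tokens.
OPERATORS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def to_cpp_expr(node):
    """Translate a small subset of Python expression nodes to C++ text."""
    if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        left = to_cpp_expr(node.left)
        right = to_cpp_expr(node.right)
        return f"({left} {OPERATORS[type(node.op)]} {right})"
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return repr(node.value)
    raise NotImplementedError(type(node).__name__)


def translate(py_expr):
    # Parse the Python expression, then walk its tree to emit C++ text.
    tree = ast.parse(py_expr, mode="eval")
    return to_cpp_expr(tree.body)


cpp = translate("a * 2 + b")
print(cpp)  # ((a * 2) + b)
```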
Re:Why not just use pure C++? (Score:2)
After all, if you can express a program in binary, you can convert that binary string to a number. Just count off that many 0's (like they were tally marks).
Re:Why not just use pure C++? (Score:4, Informative)
Yes, I have wasted some time staring at the shell waiting and waiting for it to return from some complicated Python routine. I know that compiled C would be faster, and hand-rolled assembler would be faster still. But I say to myself: hey, I wrote this code in a single afternoon, how many weeks of hair-pulling would it take to re-engineer this - and make it bug-free - in C? When I put it that way, I don't mind waiting the extra minutes for Python to do my dirty work.
As a previous poster mentioned, the ability to handle tuples of mixed-types is critical. I look forward to seeing great things from Shed Skin in the future.
Re:Why not just use pure C++? (Score:2)
Not to mention that you can speed up execution time by throwing more hardware at it. If you try that with programmer time you just end up with a bruised programmer, and nobody wants that!
Re:Why not just use pure C++? (Score:2)
Re:Why not just use pure C++? (Score:2)
Complex data structures in Perl? Such pain I wish to never endure.
Re:Why not just use pure C++? (Score:3, Insightful)
As have I, but I'd certainly rather manage in languages that support first order data structures, "for each" loops for iterations, proper disjunctive types, pattern matching, and so on. C++ is better than it used to be, but all the data structures and algorithms in the standard library barely hold a candle to the expressive power of many functional programming and "scripting" languages.
Re:Why not just use pure C++? (Score:2)
I don't think expressive power and syntactic sugar are independent concepts in practice. (C.f. any language is syntactic sugar, equivalence of Turing-complete languages, everything gets turned into machine code eventually, etc.)
But the original comment was that C++ made it more difficult to use complex data structures, and given that we can express several useful concepts much more concisely using language features such as those I mentioned than is possible with C++, I think that's a fair claim.
Re:Why not just use pure C++? (Score:4, Interesting)
Which is why languages like python were written in the first place. They pretty much just make the underlying C calls anyways, but do so in a way that handles buffer overflows, pointers, etc., that pretty much make C/C++ so troublesome, hazardous, and hard to learn. I like java (a lot really), but nothing beats a good scripting language, like perl or python, to handle tasks like text manipulation. Python is especially good at using libraries, such as the imaging library, which are written in C anyways. How much faster can you get calling a C library from C than from python? I honestly don't know, but I can't imagine it's that much more. But when you add in speed of development, safety, and even portability, it's powerful.
Python's OOP is also a feature that makes it far more attractive than perl for me. Perl does OOP, but it's not as clean as python's, and I don't think it supports all the OOP features either. Doing GUI's is not the strength of any scripting language, but it depends on what you need to do. You can write a native frontend and embed python into a C or even a java application.
Re:Why not just use pure C++? (Score:3, Informative)
This is why projects like pyGTK [pygtk.org] exist
Re:Why not just use pure C++? (Score:2)
Re:Why not just use pure C++? (Score:2)
Re:Why not just use pure C++? (Score:2)
Re:Why not just use pure C++? (Score:2)
You say that like you think Java isn't a scripting language, but an analysis of language features, like anonymous inner classes (which encourage on-the-fly, non-designed extensions to applications) clearly shows that it is more appropriate to scripting than an applications development, particularly if you care about run-time performance (yeah, I know, with the right JVM and stu
Re:Why not just use pure C++? (Score:3, Interesting)
Your "expert" C++ guy wasn't an expert. Can you describe the problem a little better? If what you say is true, I as a long term C++ programmer would consider switching, but I've looked at python, and I simply don't believe you.
I'll grant that C++ is a nightmare for beginners with more pitfalls than an Indiana Jones movie, but once you know them, writing poorly performing code is unlikely.
Re:Why not just use pure C++? (Score:2)
Re:Why not just use pure C++? (Score:2)
Re:Why not just use pure C++? (Score:2)
Why would you even try to compete with file I/O? Disk access is slow enough that an interpreted language is going to have no problem keeping up with a compiled one. As for the 4 hours, yeah, there's some stuff that Python just does better, but I'm inclined to say they just didn't know what they were doing, mostly because they even attempted competing on the I/O front.
Stupid comparison (Score:4, Insightful)
Re:Stupid comparison (Score:2)
Or, to put it another way:
Re:Sounds good... (Score:4, Insightful)
Uh, why would they have to? This goes from Python to C++, not vice versa. If there are no pointers or structs in the Python code, why would they have to handle them? Certainly, it's quite possible that some Python variable types will be converted to pointers or structs in the output code, but that's orthogonal to the issue of Python not having them natively.
If you were trying to go from C++ to Python, then you'd have to convert C++ pointers and structs to some sort of Python data type, and your comment would make sense. As it is, I'm not sure what you were trying to say.
Re:Sounds good... (Score:2)
Re:Sounds good... (Score:3, Insightful)
Why would one ever need to do that? The goal is not to write C++ in Python, it's to compile Python to machine code via an intermediate Python -> C++ compilation.
Re:Sounds good... (Score:2)
If you are just making a prototype, why is squeezing out extra perf so important? Prototyping is the sort of situation where you should be fine with just using straight python.
Re:Sounds good... (Score:2)
Nor do I think most projects will be able to depend entirely upon it. Looking at the limitations listed in the announcement, it seems like it takes away many of the things that make Python engaging as a language. What seems to remain? Imagine taking a compiling C program, translating it (as literally as possible) into python, and then let this program translate it i
Re:Static Typing? (Score:2)
I recently wrote a largish simulation in python for a Biology course. The goal was to watch how a species spread over a planet given other competing species, natural disasters and the like. It took four in deep hack mode to write the whole thing,
Re:Static Typing? (Score:2)
No, it doesn't. AFAIK all the Type-SIG and other groups looking at it decided against it and the issue is dead.
Re:Static Typing? (Score:3, Interesting)
A more significant roadblock, IMO, is that he can't handle mixed types in 3+-tuples, which is very common.
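For anyone unsure what "mixed types in 3+-tuples" refers to, here is a short sketch with made-up values. It only shows the Python side and makes no claim about what Shed Skin does with either tuple beyond what the comment above says.

```python
# A homogeneous 3-tuple: every element is a float.
point = (1.0, 2.0, 3.0)

# A mixed 3-tuple: a str, an int and a float packed into one record,
# a common idiom for lightweight records in Python.
record = ("alice", 30, 1.75)

point_types = tuple(type(item).__name__ for item in point)
record_types = tuple(type(item).__name__ for item in record)
print(point_types)   # ('float', 'float', 'float')
print(record_types)  # ('str', 'int', 'float')
```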
Re:Static Typing? (Score:2)
Re:Static Typing? (Score:4, Interesting)
I love Python, but I hate the dynamic typing. It can be handy at times, but 99% of the time you make a variable to hold one kind of thing. Having the static typing would both improve performance (because the interpreter knew what you were up to) but would also eliminate bugs (because it would complain when I tried to set a double to "And now press...").
I'd love to see Python get optional static typing.
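A hedged sketch of what this wish might look like in practice, using the variable-annotation syntax that Python later standardised. The interpreter itself does not enforce annotations at run time; catching the bad assignment before the program runs is left to an external static checker. The `checked_set` helper below is a hypothetical run-time stand-in written for this example, not a standard API.

```python
def checked_set(expected_type, value):
    """Hypothetical helper: reject values of the wrong type at run time."""
    if not isinstance(value, expected_type):
        raise TypeError(
            f"expected {expected_type.__name__}, got {type(value).__name__}"
        )
    return value


# The annotation documents the intended type; Python does not enforce it.
price: float = 9.99

# Assigning another float passes the check.
price = checked_set(float, 12.5)

# Assigning a string is rejected, much as the poster wanted.
try:
    price = checked_set(float, "And now press...")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```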
Re:Static Typing? (Score:2)
But then, I guess it wouldn't be Python....
Re:Static Typing? (Score:2)
(I'm not really sure if Lisp's static typing guarantees compile errors. I'm just a beginner Lisp Weenie. I have potential, though, right?)
Re:Static Typing? (Score:2)
It depends upon the compiler.
CLISP ignores such statements, because it only compiles to bytecode anyway.
CMUCL and SBCL do generate compiler warnings, and sometimes even errors. Your static typechecking will not go unnoticed.
I think that to be good, the Python compiler should do the same, but that may mean introducing new statements in Python, like in Lisp:
(declaim (optimize (compilation-speed 0) (speed 3)))
Re:Static Typing? (Score:2)
Re:Very nice, but... (Score:2)
Jeremy
Re:Very nice, but... (Score:3, Insightful)
Re:Very nice, but... (Score:3, Informative)
Re:Very nice, but... (Score:2)
Well, yes. All code in any language whatsoever becomes machine code before it runs on the processor, and I can certainly do things more memory efficiently in some languages than others, so it's at least theoretically possible. C# and VB.NET code both become the same variety of intermediate binary code before being translated to machine code, but that doesn't mean anything you can do in one can be done in the other
Re:Very nice, but... (Score:2)
With any particular processor, the language used becomes a mere choice of syntax. All languages are boiled down to machine code and are on a level playing field. Whatever language you write in, it's all just food for the processor's instruction set.
C++ and BASIC both get reduced to x86 machine code before running on my machine. This does not mean you can do anything in BASIC that you can do in C++. There are things you can d
Re:Very nice, but... (Score:3, Insightful)
Indeed, VB.net and C# have very similar features and capabilities, and if there are big performance differences between them, it's because the authors of one of the compilers screwed up.
But the other posters were arguing that their performance and capabilities should be identical because they both compile to MSIL, and in fact that any language that does so would have equal performance and capabilities. Which is just silly; hence my silly IRock.net example. For a less silly example, Managed C++ certainly
Re:2-40 what? (Score:3, Insightful)
Re:Speaking from experience... (Score:3, Informative)
It's a great language -- combining the benefits of Python, Ruby, and C# -- and it's wonderful for prototyping in the