
Python-to-C++ Compiler 181

Posted by timothy
from the calibrate-your-scales dept.
Mark Dufour writes "Shed Skin is an experimental Python-to-C++ compiler. It accepts pure, but implicitly statically typed, Python programs, and generates optimized C++ code. This means that, in combination with a C++ compiler, it allows for translation of pure Python programs into highly efficient machine language. For a set of 16 non-trivial test programs, measurements show a typical speedup of 2-40 over Psyco, about 12 on average, and 2-220 over CPython, about 45 on average. Shed Skin also outputs annotated source code."
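To make the summary's "implicitly statically typed" constraint concrete, here is a minimal sketch (not taken from the article): the first function never mixes types, so a Shed Skin-style inferencer could assign static types without annotations, while the second is ordinary Python that defeats such analysis.

```python
# Implicitly statically typed: 'n', the comparison, and the return value
# are consistently int, so whole-program inference can pin every type down.
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

# Ordinary Python, but NOT implicitly static: 'x' is int or str depending
# on a runtime value, which this style of compiler must reject.
def mixed(flag):
    x = 1 if flag else "one"
    return x

print(fib(10))  # -> 55
```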
This discussion has been archived. No new comments can be posted.

Comments Filter:
  • by Surt (22457) on Thursday June 15, 2006 @01:21PM (#15541340) Homepage Journal
    Until he addresses mixed types in n-tuples, this won't be useful for very many people.
    • But he's on the right track. Python allows dynamic typing, but nearly all of one's programs don't take advantage of it. Recognizing that, I think, is key to making it go fast. It would be nice to have a filter you could run over Python that would find all the type-ambiguous points and let you insert some sort of compiler hinting.

      I could envision it working like this. Instead of statically declaring all your variable types in every function, you instead simply declare that whatever types are being used, the
      • I could well even imagine that using RTTI in C++ he could actually just split out whatever logic expected different types. Certainly there's some work there, but it seems plausible to me.
      • I suggest you try Haskell [haskell.org] or another language with type deduction. You don't normally have to declare the type for any function you write, but you can add explicit type declarations to help catch errors or as a form of documentation (which is checked by the compiler, of course).
      • But he's on the right track. Python allows dynamic typing, but nearly all of one's programs don't take advantage of it. Recognizing that, I think, is key to making it go fast.

        I suspect that varies with the programmer. I'm pretty certain that much of my Python code contains things that a type deduction system (SML, Haskell) wouldn't be able to cope with. Certainly I use duck typing a lot.

        And besides, only one of maybe a hundred Python programs I've written ran unacceptably slowly. And that was a quick hack for

        • I suspect that varies with the programmer. I'm pretty certain that much of my Python code contains things that a type deduction system (SML, Haskell) wouldn't be able to cope with. Certainly I use duck typing a lot.

          How exactly does duck typing differ from the structural subtyping of e.g. OCaml, which allows you to write a function that can be passed any object, of any class or none, if it provides all the methods that function uses? The type inference system handles it just fine.

          Of course, "duck typing"
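The duck typing discussed above is easy to show in a few lines; whether a whole-program type inferencer copes depends on whether every object reaching the function provides the needed method (class names here are illustrative):

```python
class Duck:
    def speak(self):
        return "quack"

class Robot:
    def speak(self):
        return "beep"

# Classic duck typing: announce() accepts any object with a .speak()
# method, regardless of class -- roughly what OCaml's structural
# subtyping also permits, but with no compile-time check in Python.
def announce(thing):
    return thing.speak().upper()

print(announce(Duck()), announce(Robot()))  # -> QUACK BEEP
```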

  • Ewwwww (Score:3, Funny)

    by $RANDOMLUSER (804576) on Thursday June 15, 2006 @01:25PM (#15541375)
    As a UNIX admin, I was saddled with one of these kinds of things years ago, a DEC-BASIC to C compiler for UNIX. The output code quality was incredibly bad: machine generated variable and function names, bizarro nested struct/union/struct data structures, 400-line functions peppered with calls to 1-line functions. Completely unreadable. Thank $DEITY that project died quickly.
    • Re:Ewwwww (Score:5, Insightful)

      by Anonymovs Coward (724746) on Thursday June 15, 2006 @01:30PM (#15541434)
      Completely unreadable.

      I think you're not supposed to read it. You're only supposed to feed it to your C++ compiler. f2c produced unreadable output too, but nobody read it; at one time it was the only free Fortran option on Linux.

    • Re:Ewwwww (Score:2, Funny)

      by Virak (897071)
      Which is why I suggest you use brainfuck for all your coding needs. The generated code will make just as much sense as the original, if not more.
    • Re:Ewwwww (Score:3, Insightful)

      If you actually tried ShedSkin you'd find the C++ it produces is very similar to what a human might produce, and is actually quite easily readable. But then - why would you want to anyway? It's an intermediate form useful to pass to an optimising C++ compiler, not as something to read.
    • Re:Ewwwww (Score:4, Insightful)

      by Tim Browse (9263) on Thursday June 15, 2006 @05:29PM (#15543944)

      Yeah, whenever I look at the output of my optimising compiler, it's really hard to understand too. It's all in assembler, for a start.

      Plus, the quality of C code generated by CFront was rubbish - unreadable.

      Same with the Modula-3 compiler I tried. You couldn't work out what was going on in the resulting C code without a load of work.

      Can you see where I'm going with this?

  • by stonecypher (118140) <stonecypher@gmai l . com> on Thursday June 15, 2006 @01:26PM (#15541390) Homepage Journal
    See, it's all well and good to compile python to speed it up. The problem is, people are now saying that they can write efficient code in python just because it magically translates to C++, and because this translator is faster than other python compilers.

    This won't be meaningful until a converted python script is compared to efficient code written natively in C++ in the first place.
    • by Anonymovs Coward (724746) on Thursday June 15, 2006 @01:42PM (#15541568)
      I don't see your point. Some of us use python. It takes me a fraction of the time to do something in python that it would take in any other language. I'm not interested in writing native C++ code because it's hypothetically faster (it's not faster if I count coding time). But I am interested in a good python-to-C++ translator. Why wouldn't any python user be?
      • by advocate_one (662832) on Thursday June 15, 2006 @02:04PM (#15541857)
        But I am interested in a good python-to-C++ translator. Why wouldn't any python user be?

        no, I'd be far more interested in a good compiler to compile that python straight to machine code...

        • Why? If you can convert Python to reasonably optimized C++, then you can leverage the C++ compiler to do all the machine-level optimizations, rather than reinventing yet another wheel.
          • it gives you an extra area for weird bugs to creep in... get the Python right and go straight to machine code with a trusted compiler.
            • it gives you an extra area for weird bugs to creep in... get the Python right and go straight to machine code with a trusted compiler.

              Is that like saying that using layers of multiple simple tools that each do one thing really well is more buggy than just using one larger, general-purpose monolithic app?

              A cross platform Python to machine code compiler would presumably need to reinvent a whole lot of difficult platform specific stuff that has already been solved by C++ compilers.

          • by mrchaotica (681592) * on Thursday June 15, 2006 @04:09PM (#15543090)

            ...and that's why it shouldn't be a Python to C++ translator; it should be a GCC frontend instead (i.e., translating to GCC's internal representation).

          • Not quite true. Analogy:

            Would you also like to translate a text from Arabic to English by passing through 3 or 4 languages in between?

            In this analogy the problem would probably be accuracy, in the case you presented it would be performance being lost due to layers of conversion. Some high level optimizations are inevitably lost (unless the C++ compiler has some sort of strong AI).
            • (unless the C++ compiler has some sort of strong AI)

              I can't imagine by what process of thought you came to think this was a useful thing to say. Strong AI is AI which is self-aware. It has nothing to do with problem solving capabilities. Furthermore, strong AI does not yet exist. Moreover, compiler optimizations are a set of rule-driven alterations based on mathematical proof that things aren't changed; theoretical AI wouldn't actually help in any way.

              Read a Searle book before engaging in this sort of s
          • This was my point exactly. The article says "this thing does a better job of converting Python to C++ in terms of efficiency than did the older one." People are hearing "This thing generates efficient C++." Nobody's tested that yet, though.

            You are making a gigantic assumption that because this converter's better than the last one, that it's usable in efficiency arenas. By comparison, you might be looking at the difference between a shoe and a shoe with a spring (that's what air pumps do, don't laugh) wh
        • I'd prefer a python-to-Common-Lisp compiler, but only because I hate running out of stack space for recursive algorithms.
      • Assume that it takes:
        - 4 hours to write a given program in python, 32 hours to write same program in C++
        - 10 seconds to run the python program, but just 2 seconds to run the faster C++ program
        - the program is run 20 times a day
        - the developer's time costs as much as the time of the person who runs it

        Ok, so it'll take 630 days of running this program for the faster C++ program to make up for the extra time to develop it. So, if you can wa
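The parent's break-even figure checks out arithmetically (a quick sketch using its assumptions):

```python
# 28 extra development hours vs. 8 seconds saved per run, 20 runs/day.
extra_dev_seconds = (32 - 4) * 3600      # 100800 s of extra C++ work
daily_saving = 20 * (10 - 2)             # 160 s saved per day
print(extra_dev_seconds / daily_saving)  # -> 630.0 days to break even
```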
      • I don't see your point.

        Indeed. But you replied anyway.

        I'm not interested in writing native C++ code because it's hypothetically faster (it's not faster if I count coding time). But I am interested in a good python-to-C++ translator. Why wouldn't any python user be?

        I never said they wouldn't be. Please feel free to re-read what I said until you understand it. However, please don't reply again and attempt to argue with me over things I didn't say. It's obnoxious.
    • The problem is, people are now saying that they can write efficient code in python just because it magically translates to C++

      No they aren't, if only because ShedSkin doesn't handle modules yet and Python without modules is not really useful.

      and because this translator is faster than other python compilers.

      Last time I checked, it was the only Python compiler... (CPython is an interpreter, PyPy is also an interpreter, I'm pretty sure Vyper was also an interpreter [written in O'Caml before it got trash

      • Last time I checked, it was the only Python compiler... (CPython is an interpreter, PyPy is also an interpreter

        Neither CPython nor PyPy is a strict interpreter, both of them compile source to byte-code and then act as a virtual machine to run that byte-code. PyPy also does some work on compiling to native code on the fly, depending on which version you're using (Armin Rigo's is the most sophisticated on the JIT/native code front, but it's far from stable).
        • They're still not compilers. And CPython + Psyco is not a compiler either.
          • They're still not compilers. And CPython + Psyco is not a compiler either.

            Yes, they are. They translate from one language to another. In the case of Psyco, that language happens to be machine code; in the case of CPython it's bytecode. They also have an interpreter (or runtime, virtual machine, whatever you want to call it) which is no different from, say, C++ having a runtime required in addition to your code (and like C++ you can statically link the runtime into a standalone executable if you want to).
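That CPython really does compile to bytecode is easy to verify from the interpreter itself (a small illustration, not part of the original comment):

```python
import dis

def add(a, b):
    return a + b

# Every function object carries compiled bytecode in its code object;
# dis.dis() disassembles it into the instructions the VM interprets.
print(len(add.__code__.co_code) > 0)  # -> True
dis.dis(add)
```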
        • Neither CPython nor PyPy is a strict interpreter, both of them compile source to byte-code and then act as a virtual machine to run that byte-code.

          I think that falls under the standard definition of "interpreter".

          Even ancient versions of MS BASIC were bytecode-oriented. They had to be to fit any decent-sized program into the limited RAM available.

          • Neither CPython nor PyPy is a strict interpreter, both of them compile source to byte-code and then act as a virtual machine to run that byte-code.

            I think that falls under the standard definition of "interpreter".


            I don't. It's a mixed system, with both a compiler and an interpreter. A strict interpreter runs directly on the source code (cf Bash).

            Note that a compiler doesn't have to be an ahead-of-time process that generates machine code directly. Many C compilers generate assembly code that is then compil
      • The problem is, people are now saying that they can write efficient code in python just because it magically translates to C++

        No they aren't, if only because ShedSkin doesn't handle modules yet and Python without modules is not really useful.


        RTFA. They're saying this verbatim: "This means that, in combination with a C++ compiler, it allows for translation of pure Python programs into highly efficient machine language."

        Last time I checked, it was the only Python compiler

        Then you need to check again. Arguin
    • Yep, every new tool is just an opportunity for idiots to screw up in new ways.

      What a gloomy way of looking at the world.
  • Native code (Score:3, Insightful)

    by Roy van Rijn (919696) on Thursday June 15, 2006 @01:29PM (#15541419) Homepage
    This is a good step to make Python run a bit faster, but I don't think it'll really make a huge difference.

    The best way to get some speed and still keep the nice Python functions and layout is just to export the most heavily used functions to native code (C/C++).
    I don't know if it's possible to take the C++ output and optimize it separately, but that way you would have a good start toward native code.

    In short: better, fast, and easy, but not the best (if you can write native code)

    • I think this sort of tool serves a different purpose. If you have an evolving program, one that needs some speed, but also needs rapid development, then hopefully what this allows is that you do not need to write the heavily used functions in c++, but instead can translate and compile each version of your python program as part of your tool chain. Your strategy makes more sense as your project reaches maturity and stability, whereas this sort of technique is more effective for situations where performance
  • Very interesting... (Score:4, Informative)

    by FuzzyDaddy (584528) on Thursday June 15, 2006 @01:33PM (#15541460) Journal
    This is a very interesting development, both for its practical promise and just 'cause it's cool. However, speaking as a python programmer myself, I'd say it's not yet in a usable form. Much of the efficiency of programming in python comes from the standard libraries (in particular Tkinter for user interfaces) and the non-standard libraries (for example, the serial port library). This project does not yet support these.

    Among python programmers, I'm curious - how many use psyco (another python performance enhancement tool) for their projects? I fiddled with it a while ago (it didn't work because of a C module that it didn't like), but never had a compelling reason to go back to it. Performance optimization has never been important enough for my applications to merit the effort.

    • by tcopeland (32225) *
      > However, as a python programmer myself, it's not yet in a usable form

      Yup. Along the same lines, Ruby has a related project by Ryan Davis, Ruby2C [rubyforge.org]. It's useful for small localized speedups, but you wouldn't want to try to write your entire app in it.
    • A friend used it once because he was generating fractals in python, and Psyco significantly sped up his script (~76% gain with his first algorithm [run time dropped from 197s to 46.52s], ~91% with his second [run time dropped from 154s to 13.4s])
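Those percentages can be rechecked from the raw runtimes given (a quick sketch):

```python
# Fractional speed gain = 1 - (new runtime / old runtime).
gain1 = 1 - 46.52 / 197   # first algorithm: 197 s -> 46.52 s
gain2 = 1 - 13.4 / 154    # second algorithm: 154 s -> 13.4 s
print(round(gain1 * 100), round(gain2 * 100))  # -> 76 91
```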
    • by zhiwenchong (155773) on Thursday June 15, 2006 @02:05PM (#15541863)
      It's all a matter of magnitude.

      I use Psyco in my work. My app is a code generator that processes multiple models and transforms them into optimization code. Psyco reduced the time it takes to process one model from 20 seconds to 2 seconds. That doesn't sound like much, but when you have to do it for lots of models, the speedup becomes quite substantial.
    • I'm curious - how many use psyco (another python performance enhancement tool) for their projects?

      <aol>me too!</aol> Much of my code spends the majority of its time waiting for database queries to finish, ImageMagick to finish doing its stuff, files to copy, etc. Psyco doesn't do a thing for that stuff. On the other hand, a small amount of my code is pretty CPU intensive - not enough to break out of pure Python into Pyrex or anything else, but enough to want a performance upgrade. For thos

  • ...kind of reminds me of the Google Web Toolkit [google.com] which is more or less a Java to Javascript/HTML compiler. It's not an optimization thing like ShedSkin, instead it lets folks use the Java skills they already have to write better web apps. I wonder what they use to parse the Java code? I don't see any mention of JavaCC [java.net] on their site, or ANTLR either for that matter...
  • I'm confused... (Score:5, Interesting)

    by advocate_one (662832) on Thursday June 15, 2006 @02:02PM (#15541823)
    surely the best way to speed it up is to compile it straight to object code... c++ has to be compiled and just adds an intermediate step which will make things harder to debug...
    • Re:I'm confused... (Score:3, Interesting)

      by Dasher42 (514179)
      I think that the best example of what you're saying would be the Java compiler in the gcc suite. That separate front-end, back-end approach of gcc is terribly helpful.

      And yet, if you're going to compile Python, I'd want the translation into source code. If it's worth rewriting in C++, it's worth tuning, especially if you can improve the usage of type-safe code.
    • surely the best way to speed it up is to compile it straight to object code...

      Absolutely not. It takes an enormous amount of effort to compete with the native code generation of good C or C++ compilers - and much of that effort has to be repeated for every platform you target.

      Many language implementations for less mainstream languages compile through C, treating it as a "portable assembler", and leveraging all the work that's been done to optimize C compilation. This is even done for some high-end languag

  • ...why not make it into a GCC frontend so Python can be compiled directly?
    • ...why not make it into a GCC frontend so Python can be compiled directly?
      Probably because it is fairly hard to translate a dynamically typed language into the RTL GCC uses for its backends.

      I think they had a lot of problems even 'only' with C++ and all its powerful constructs like templates etc.
    • We must not be programmers; I thought the same thing. The only advantage is that you could go in and tweak the C++. But if you can read machine-generated code well enough to make "corrections", wouldn't you just code it that way to begin with?

      On second thought, it could be to allow compilation on different platforms. Write once and precompile from Python, then compile for each system you prefer, using the system-specific compiler to optimize for the processor architecture. Of course, I'm just guessing. Hell
    • by rpwoodbu (82958) on Thursday June 15, 2006 @03:25PM (#15542682)
      It is worth mentioning that one of the original implementations of C++ (if not the very first) was "cfront", a C++-to-C converter. I see this as a much easier way to get a new language implemented quickly, as you can take advantage of the common functionalities already implemented in the target language of the converter. Although Python is not a new language, using it as a compiled language is new, and thus I believe it is comparable to being a new language for this argument. C++ and Python have a lot in common, which makes C++ a very suitable target language for a Python-to-[compiled_language] converter.

      If this converter proves to be successful, I believe that a GCC frontend will be written eventually. There are probably potential optimizations that would be difficult or impossible to implement any other way.

      Some may think that the dynamic nature of Python may preclude its inclusion in GCC. Technically, all that would need to be done is to have a runtime to handle dynamic things, similar to how Objective-C (for which there is GCC support) has a runtime to handle message passing and late binding. However, a large portion of the potential efficiency of a compiled version of the language would be lost to these dynamic capabilities; luckily, a compiler can detect when things are implicitly static (in fact, this converter is limited to implicitly static constructs), and optimise them to be truly static at compile-time.
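The distinction drawn here between implicitly static and genuinely dynamic code can be sketched in a few lines (hypothetical examples, not Shed Skin's actual analysis):

```python
# Implicitly static: every call site passes a float, so a compiler
# could emit a plain C++ 'double area(double r)' with no runtime checks.
def area(r):
    return 3.14159 * r * r

# Genuinely dynamic: dispatches on runtime type, so a compiled version
# would still need an Objective-C-style runtime fallback.
def describe(x):
    if isinstance(x, str):
        return x.upper()
    return x * 2

print(area(2.0))       # -> 12.56636
print(describe("hi"))  # -> HI
print(describe(21))    # -> 42
```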
  • by suitepotato (863945) on Thursday June 15, 2006 @02:13PM (#15541953)
    Why? Read the linked page? It says it all. It chokes on most any Python code of any complexity out there. So if it doesn't convert Python code from the real world, what is it for? Making Python coders learn enough about C++ to remember the limitations and write/rewrite Python code to use it?

    What the Python C/C++ interested people REALLY need is a book written by a group of Python AND C/C++ masters which teaches the two simultaneously, showing complementary methods of doing any given thing, working from beginner to advanced. And I DON'T mean a "How to turn your n00b Python code into C/C++ hotness" sort of viewpoint. I mean both taught simultaneously, in sync, showing how they can interchange and complement each other.

    Software tricks for converting? Ultimately worse than not having them because it leads to horrible obfuscation because we don't know exactly what is going on when 13,412 lines of Python is turned into C++ because WE DIDN'T WRITE IT AND WE NEVER LEARNED C/C++. "Say Mike, that's great but you're the company code cowboy and you don't do C++ natively and I sure as hell don't read it being management so exactly what happens if this needs to be fixed? We've gone from importing open source code you couldn't read to writing our own open source code you can't read."
    • The way this will be used by pythonistas is not to convert 13,412 lines of code blindly into C++. Rather, it provides a pythonic way of getting some speed benefit for those parts of the program that need it, and that code will also be accessible to C++ programs as an added benefit.
    • Why? Read the linked page? Says it all. Violates most any Python code of any complexity out there. So if it doesn't convert Python code from the real world, what is it for? Making Python coders learn enough about C++ to remember the limitations and write/rewrite Python code to use it?

      And when Linux was still at 1.x, it should have been dismissed by business because it didn't support SMP.

      The software is barely written. Have patience.

    • by try_anything (880404) on Thursday June 15, 2006 @06:45PM (#15544620)
      Software tricks for converting? Ultimately worse than not having them because it leads to horrible obfuscation because we don't know exactly what is going on when 13,412 lines of Python is turned into C++ because WE DIDN'T WRITE IT AND WE NEVER LEARNED C/C++. "Say Mike, that's great but you're the company code cowboy and you don't do C++ natively and I sure as hell don't read it being management so exactly what happens if this needs to be fixed?"

      That isn't how a compiler is used. When you compile a C++ program, you don't throw away your C++ source and check the executable into source control. "Oh, no! We used gcc and now we have a bunch of gobbledygook we don't understand!"

      The C++ is an intermediate stage in the make process, akin to the output of various phases of gcc.

  • by radtea (464814) on Thursday June 15, 2006 @05:00PM (#15543569)

    Python is a terrific prototyping language (and lots of other things besides.) As a C++ coder I've been using it for prototyping stuff that will eventually be integrated into a larger application and therefore MUST be translated to C++. So what I'd like to see is a tool (written in Perl, just for the fun of having a linguistic threesome) that just does a light gloss on Python syntax to get me most of the way to human-readable C++. That would be far more useful (to me) than this thing, which sounds more like f2c, whose output could cause brain damage in humans and cancer in rats, or possibly the other way around.
