Is Profiling Useless in Today's World?

Posted by CmdrTaco on Fri Jul 05, 2002 01:27 PM
from the optimization-shmoptimization dept.
rngadam writes "gprof doesn't work in Linux multithreaded programs without a workaround that doesn't work that well. It seems that if you want to use profiling, you have to look for alternatives or agree with RedHat's Ulrich Drepper that "gprof is useless in today's world"... Is profiling useless? How do you profile your programs? Is the lack of good profiling tools under Linux leading us into a world of bloated applications and killing Linux adoption by embedded developers? Or will the adoption of a LinuxThreads replacement solve our problems?"
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Profiling Again? (Score:4, Funny)

    by stirfry714 (410701) on Friday July 05 2002, @01:29PM (#3828761)
    Why can't my code be judged by the content of its characters, and not by the color of its extension?

    Down with profiling! :)
  • Profiling is Useful (Score:5, Insightful)

    by Anonymous Coward on Friday July 05 2002, @01:30PM (#3828776)
    Maybe gprof, as an implementation, might not be useful. But profiling, especially under Java, can make a world of difference to an application.

    Saying "profiling isn't useful" is similar to saying "having information isn't useful".

    That's just dumb.
    • Re:Profiling is Useful by Westley (Score:1) Friday July 05 2002, @01:42PM
    • Re:Profiling is Useful (Score:5, Informative)

      by anonymous_wombat (532191) on Friday July 05 2002, @02:02PM (#3829034)
      In single threaded programs, just one type of profiling needs to be done, the kind that standard profiling tools measure. In multi-threaded programs, the relative execution times of the various threads may be more important. The first thing to do is to figure out which threads are using most of the resources. After this is done, and any optimizations made, the old-style profiling and optimizing of slow methods is just as important as ever. If your program is spending 80% of its time sorting, then optimize your sorting code.

      Of course, for many applications, multi-threading achieves the vast majority of the speed increase, and profiling will only be of marginal utility. The profiler is just one tool of many, and is not a silver bullet.
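A rough Python sketch of the "figure out which threads are using most of the resources" step. It assumes a platform where `time.thread_time()` is available; the worker names and workloads are invented for illustration.

```python
import threading
import time

def busy(iterations):
    """Burn CPU in a pure-Python loop (hypothetical workload)."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total

def run_and_measure(workloads):
    """Run one thread per (name, iterations) pair; return CPU seconds per thread."""
    cpu_seconds = {}

    def worker(name, iterations):
        start = time.thread_time()          # CPU time of the calling thread only
        busy(iterations)
        cpu_seconds[name] = time.thread_time() - start

    threads = [threading.Thread(target=worker, args=item) for item in workloads]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cpu_seconds

if __name__ == "__main__":
    usage = run_and_measure([("parser", 200_000), ("sorter", 2_000_000)])
    # Most expensive thread first: that is where function-level profiling starts.
    for name, seconds in sorted(usage.items(), key=lambda kv: -kv[1]):
        print(f"{name:8s} {seconds:.4f}s CPU")
```

Once the expensive thread is known, ordinary function-level profiling can be pointed at just that thread's code.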

    • Every terrorist was a profiler . . by Gatesninny.net (Score:1) Friday July 05 2002, @02:25PM
    • Re:Profiling is Useful by ChrisEmpson (Score:1) Friday July 05 2002, @07:28PM
  • Ulrich Drepper (Score:2, Insightful)

    by quigonn (80360) on Friday July 05 2002, @01:31PM (#3828785) Homepage
    Ulrich Drepper is a fool, he made glibc crappy, and messed up most things he had to do with. He simply should shut up and let other people do the work and the thinking.

    Yeah, mod me down, but I have insight into the things Ulrich does, and he mostly does sh*t. Just my 2 cents (USD or EUR, you decide).
  • OProfile (Score:5, Informative)

    by mmontour (2208) <mail@mmontour.net> on Friday July 05 2002, @01:32PM (#3828794)
    Take a look at OProfile [sourceforge.net]. It's quite a nice tool, although it's not a direct replacement for gprof. From their 'About' page:

    OProfile is a system-wide profiler for Linux x86 systems, capable of profiling all running code at low overhead. OProfile is released under the GNU GPL.

    It consists of a kernel module and a daemon for collecting sample data, and several post-profiling tools for turning data into information.

    OProfile leverages the hardware performance counters of the CPU to enable profiling of a wide variety of interesting statistics, which can also be used for basic time-spent profiling. All code is profiled: hardware and software interrupt handlers, kernel modules, the kernel, shared libraries, and applications (the only exception being the oprofile interrupt handler itself).
  • but I suppose other people want to profile more than just Java. Bother.
  • 'pstack' on Solaris (Score:2, Informative)

    by wavq (216458) on Friday July 05 2002, @01:37PM (#3828826) Homepage
    While it doesn't give the exact time spent in a given function, running 'pstack' against a processID under Solaris will give the execution stack trace of any threads present.

    If you find that 80% of your threads are in slow_function( someParam ) then ya better get to work fixing it. This also has the added advantage of not slowing down your program with profiling code and other hooks.

    Obviously this isn't great for fine-grained profiling, or with applications with few threads, but I've found it helpful on my larger projects.
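The same "peek at every thread's stack" idea can be sketched inside a Python process. This is not pstack: it leans on the CPython-specific `sys._current_frames()`, and `sample_stacks` and `slow_function` are hypothetical names.

```python
import collections
import sys
import threading
import time

def sample_stacks(samples=20, interval=0.01):
    """Every `interval` seconds, note which functions are on each thread's stack."""
    counts = collections.Counter()
    sampler_id = threading.get_ident()
    for _ in range(samples):
        for thread_id, frame in sys._current_frames().items():
            if thread_id == sampler_id:
                continue                    # ignore the thread doing the sampling
            on_stack = set()
            while frame is not None:        # walk from innermost frame outward
                on_stack.add(frame.f_code.co_name)
                frame = frame.f_back
            counts.update(on_stack)         # one hit per function per thread
        time.sleep(interval)
    return counts

def slow_function(stop):
    """Hypothetical hot spot: spins until asked to stop."""
    while not stop.is_set():
        sum(range(1000))

if __name__ == "__main__":
    stop = threading.Event()
    workers = [threading.Thread(target=slow_function, args=(stop,)) for _ in range(4)]
    for w in workers:
        w.start()
    print(sample_stacks().most_common(3))
    stop.set()
    for w in workers:
        w.join()
```

A function that shows up in most samples across most threads is the one to look at first, with the same coarse-grained caveats as above.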
  • Hell, yes it's useful (Score:3, Insightful)

    by PissingInTheWind (573929) on Friday July 05 2002, @01:37PM (#3828834)
    Maybe the problem with today's profilers is that compiler implementors spend too much time making a compiler that tries to optimize everything by itself, which might not even produce the best code in that case.

    What could be more useful is if compiler implementors spent as much time on the profiler as on the compiler: you would then be able to easily see faulty parts in your software and determine what needs to be optimized.

    Good profilers would mean efficient code. Don't think profilers are useless because most implementations of them suck.

    • Re:Hell, yes it's useful (Score:4, Insightful)

      by maxwell demon (590494) on Friday July 05 2002, @02:07PM (#3829066) Journal
      While improved profilers would surely be useful, don't think optimizing compilers are useless.
      • Hand-optimized code tends to be less clear and less readable. Also, it makes it easy for new bugs to creep in.
      • Hand-optimized code would be machine-specific. While it would work on other machines, it would be dog slow there. So you'd basically be back to per-architecture versions of your program.
      • Some optimizations cannot be done by the programmer, because they occur at levels below the language. For example, the POWER architecture has a multiply+add instruction. Most common programming languages don't have a multiply+add command. So how would you optimize the use of that instruction?
      • Hand-optimization at the level the compiler does it could even hinder hand-optimization in the area where it is most effective and the compiler cannot do it at all: algorithmic optimization. To do that efficiently, you need highly structured code so you can exchange algorithms easily. However, micro-optimizations of the sort the compiler does tend to destroy such structures.
      However, with the compilers getting more sophisticated in optimization, profilers get even more important: While you may be able to add some "profiling instructions" for your own use, profiler-driven optimization in the compiler cannot use such a replacement.
    • Better yet: Optimization from profiler feedback by yerricde (Score:2) Friday July 05 2002, @03:59PM
  • gprof far from useless (Score:1, Informative)

    by tps12 (105590) on Friday July 05 2002, @01:38PM (#3828840) Homepage Journal
    First, a little background on gprof, for those new to the *n*x world. Gprof is what's known as a profiler. Basically, it inserts code into the beginnings and ends of functions. When you run your program through gprof, it then records how much time is spent in each part of your program. The idea is that the programmer sees where most of the time is being spent, and optimizes that part of the program.

    Now, as for the charges of gprof being useless, I can say that that is far from the case. True, it falls flat when dealing with multithreaded programs. But in practice, multithreaded programs are almost always interactive, and thus are primarily limited by user response times, which are many orders of magnitude longer than even the worst algorithm. In these cases, reducing the amount of input required from the user will always pay off better than any optimizations.

    As an example, in our enterprise database frontend, we had a dialog that would prompt users for an administrator password when they attempted the "delete" command. We did analysis (with a commercial profiler, but it may as well have been gprof) and found that, lo and behold, the bulk of execution time was being spent waiting for the user to type in the password. So what we did was change the delete command to "eteled" ("delete" backwards), and only told the administrators the new command name. This way, we could be certain that only administrators would even attempt a deletion, and no password prompt was necessary. We have since applied the same design philosophy throughout our software, and productivity is at an all-time high.

    As is usually the case, profiling can be the most important part of a project or next to useless. It all depends on how you use it. Gprof is a great tool for what it does; you just have to know how to use it properly.
    • Re:gprof far from useless by TheDick (Score:1) Friday July 05 2002, @01:53PM
    • Re:gprof far from useless by cruelworld (Score:2) Friday July 05 2002, @01:53PM
    • [YHBT] Re:gprof far from useless by ethereal (Score:1) Friday July 05 2002, @02:29PM
    • Re:gprof far from useless (Score:4, Insightful)

      by tuxlove (316502) on Friday July 05 2002, @02:56PM (#3829323)
      But in practice, multithreaded programs are almost always interactive, and thus are primarily limited by user response times,

      I would disagree with this wholeheartedly. What about databases like Oracle, MS SQL Server, and so on? They're internally multithreaded, and most definitely not "interactive" after you initiate a SQL query.

      I believe apache 2.0 is threaded. HTTP by nature is not interactive. And so on. There are many other examples, left as an exercise to the reader.

      While it is true that threads are very useful for interactive programs, in fact critical, their use does not stop there by a longshot. Any program which needs to do two things at once without fear of blocking on a system call is a candidate for threads. Threads are also useful for distributing compute cycles over multiple processors within a single process, allowing it to gain the benefit of concurrency.

      The project I'm currently working on is a custom database application, and without threads it would be useless. And there are no users talking to it directly, that's for sure.

      reducing the amount of input required from the user will always pay off better than any optimizations.

      I find this perplexing. Nobody cares about optimizing a user dialog. Reducing user input or optimization of user input code would serve little purpose in most multithreaded applications I'm aware of. Generally, interactive multithreaded programs use threads so they can interact with users while simultaneously performing some other task that shouldn't be stalled by waiting for user input. For example, a network monitor might have three threads: one for watching network traffic, one for resolving IP addresses to hostnames, and one for taking user input. It doesn't matter how long the user input thread sits around waiting for the user to type/click something. There are two other threads working away in the meantime, watching traffic and displaying it for the user, oblivious to whether or not the user is doing anything. In such a case as this, profiling the watcher/resolver threads might be very useful indeed, since they need to be more or less realtime.

      This gprof problem is a serious issue, and minimizing it by saying that threaded programs generally wouldn't benefit from profiling is naive.
    • the 'worst' algorithm by Kenard (Score:1) Friday July 05 2002, @04:36PM
  • by orz (88387) on Friday July 05 2002, @01:40PM (#3828859)
    I can't get any useful profiling information out of Microsoft Visual C++. When I compile in profiling mode, my program runs at less than 1% of normal speed, producing completely useless data. Am I doing something wrong? Should I be using 3rd party tools?
    • Re:Dead on linux? What about windows? by Malc (Score:1) Friday July 05 2002, @01:46PM
    • VTune by Anonymous Coward (Score:1) Friday July 05 2002, @01:47PM
    • VTune and Quantify (Score:4, Informative)

      by Codex The Sloth (93427) on Friday July 05 2002, @01:53PM (#3828950)
      If you want tree profiling (i.e. information about function and child performance) then Rational Quantify is a reasonable alternative to the crap profiler that comes with MSDev.

      If you want a flat profiler or need to analyze the cost of specific low level operations then you MUST get Intel VTune.
  • Profiling will always be useful (Score:5, Informative)

    by Wesley Everest (446824) on Friday July 05 2002, @01:40PM (#3828861)
    I work as a game developer, and we have to make sure that everything that is done for each frame takes less than 33ms. So we're always profiling our code to cram more functionality into a limited amount of time.

    But even if you aren't doing something that is speed intensive like games, you always have tradeoffs when you choose your data structures and algorithms. Generally you first code up the easiest algorithm that you think will use an acceptable amount of memory and CPU time. Then, later, if something is too slow, you have to identify where the problem is. It could be that you chose an O(N^2) algorithm not realizing that N might be 1,000 instead of the max of 100 you were counting on, forcing you to switch to an O(NlogN) algorithm that is more complex.
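To put numbers on the N=100 versus N=1,000 case: comparing every pair costs N(N-1)/2 comparisons, which is 4,950 at N=100 and 499,500 at N=1,000, roughly a hundredfold jump for ten times the data. A small Python sketch using duplicate detection as a made-up example problem:

```python
import time

def has_duplicates_quadratic(items):
    """Compare every pair: about N*(N-1)/2 comparisons in the worst case."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicates_sorted(items):
    """Sort first (O(N log N)); any duplicates then sit next to each other."""
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))

if __name__ == "__main__":
    for n in (100, 1000):
        data = list(range(n))               # no duplicates: worst case for both
        for check in (has_duplicates_quadratic, has_duplicates_sorted):
            start = time.perf_counter()
            check(data)
            print(n, check.__name__, f"{time.perf_counter() - start:.5f}s")
```

The exact timings depend on the machine; the point is how differently the two versions grow as N goes up.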

    Now, if it is a small application, you might have enough familiarity with the code to be able to guess where the problem is -- then you fix it and see if it is still slow. If that works, then you're set and profiling isn't necessary. But if the fix doesn't speed it up enough, then you're stuck. You have to profile it somehow.

    You might try simple tricks like changing the code to loop on a suspected bit of code 100 times and see how much longer it takes. Or maybe throw in some printf's that spit out the current time at different points. Or maybe create your own profiling code that you manually call in functions you want to time. Or, you might use an actual profiler without modifications to the code. But lacking a profiler doesn't mean you can't or won't profile your code.
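Two of those tricks, sketched in Python under the assumption that wall-clock time is what you care about. `time_repeated`, `checkpoint`, and the stages in the demo are hypothetical names, not part of any profiler.

```python
import time

def time_repeated(func, repeats=100):
    """Call func `repeats` times and return the average seconds per call."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) / repeats

def checkpoint(label, since):
    """Print the time elapsed since `since` and return a fresh timestamp."""
    now = time.perf_counter()
    print(f"{label}: {(now - since) * 1000:.2f} ms")
    return now

if __name__ == "__main__":
    # Trick 1: repeat the suspect code so its cost rises above timer noise.
    suspect = lambda: sorted(range(10_000, 0, -1))
    print(f"suspect: {time_repeated(suspect) * 1000:.3f} ms per call")

    # Trick 2: timestamps between stages, the printf approach.
    t = time.perf_counter()
    data = [i * i for i in range(100_000)]
    t = checkpoint("build", t)
    data.sort(reverse=True)
    t = checkpoint("sort", t)
```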

    And even with CPU speed doubling every couple of years or so, that doesn't mean speed is no longer an issue. You can easily choose the wrong algorithm and have something take 1000s of times longer to run than the proper algorithm.

  • I used gprof (Score:3, Informative)

    by Zo0ok (209803) on Friday July 05 2002, @01:40PM (#3828863)
    I used gprof quite a lot during my Master's thesis work this spring. gprof tells you which functions consume the most CPU time, and those functions can then be optimised. Usually very small parts of the code consume most of the CPU time.

    This program was parallelised at the network level - all clients were single-threaded. If someone has multithreaded for performance (to utilize more than one CPU) I suppose gprof will still work well on a single-CPU machine with just one thread.

    For programs that consume lots of CPU time for well-defined computations it should not be hard to profile a single-threaded version (a single-threaded version is needed for debugging anyway).

    More complex applications (for example a web browser) I imagine are more dependent on multi-threading, and should pose a larger problem.

    gprof is probably not dead - if you need it you can adapt the program...
  • Programmers, not tools (Score:4, Insightful)

    by sane? (179855) on Friday July 05 2002, @01:45PM (#3828891)
    The problem is not that certain tools have issues; but rather that today's programmers have no interest in creating efficient code.

    Those of us that started programming in 1k and sub megahurts can really feel the time taken by badly coded applications. We know that forgetting what is happening on the silicon can kill how well our code will run.

    However, those who started coding after ~1987 don't really have a gut feeling for it. To them the latest processor will make up for their bad coding. To a certain extent they are right. Today's advances STILL keep up with Moore's law, still make up for their lack of skill. However, when one looks at what is actually performed with all that power, one tends to question why we are paying so much, for so little.

    Can you actually say that MS WordXP is much better than the non-WYSIWYG wordprocessor of yesteryear (itself a blast from the past) ?

    We don't need profilers, we need coders who have that tacit knowledge of what really counts, where they should put real effort.

    Unfortunately that doesn't come in a software box.

    • Performance is not what really counts. by Tom7 (Score:2) Friday July 05 2002, @02:09PM
      • Yes it does! by Anonymous Brave Guy (Score:2) Friday July 05 2002, @06:49PM
    • Re:Programmers, not tools by Malc (Score:3) Friday July 05 2002, @02:13PM
      • Re:Programmers, not tools by gillbates (Score:2) Friday July 05 2002, @06:11PM
      • No, he's right (Score:4, Insightful)

        by Anonymous Brave Guy (457657) on Friday July 05 2002, @07:08PM (#3830492)
        Why waste your time trying to write efficient code from the start? It's much better to write easily understandable, easily maintained, quickly written and minimal bug code.

        Why are these mutually exclusive? There's efficient and there's optimised, and one is a much easier subset of the other.

        He's not claiming that everyone should hand-optimise from the word go. He's saying programmers should have a basic knowledge of their craft. It doesn't take much extra effort to use an efficient sorting algorithm or store data in a fast look-up structure, rather than writing a naff, hand-crafted shuffle sort and using arrays for everything whether they're appropriate or not. And yet, through ignorance or plain laziness, most programmers in most languages take the latter approach. (If you've never seen any of the source code for big name applications/OSes, trust me, it's scary.)
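A small Python illustration of the "fast look-up structure" point, on an invented word-counting task. Both functions return the same answer; only the container differs.

```python
def count_known_linear(words, vocabulary_list):
    """Each `in` scans the list: O(len(words) * len(vocabulary)) overall."""
    return sum(1 for w in words if w in vocabulary_list)

def count_known_hashed(words, vocabulary_list):
    """Build a set once, then each lookup is O(1) on average."""
    vocabulary = set(vocabulary_list)
    return sum(1 for w in words if w in vocabulary)

if __name__ == "__main__":
    vocab = [f"word{i}" for i in range(5_000)]
    text = [f"word{i}" for i in range(0, 10_000, 3)]
    print(count_known_linear(text, vocab), count_known_hashed(text, vocab))
```

The second version is one extra line and no harder to read, which is the kind of small effort being argued for.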

        Similarly, it is just careless to pass large structures by value unnecessarily in a language that has reference semantics. You have to know the basics of what is efficient use of your tools of choice if you want to write good code, and the old Moore's Law excuse is just a cover for laziness and failure to do the requisite amount of homework.

        Note that, very importantly, none of these things requires more than a small effort. They certainly don't compromise maintainability, bug count or any other relevant metrics, and a competent programmer (if you can find one) will take these things in his stride, and still be faster than the others.

        I used WordPerfect 5.0 (or whatever it was) on a dual 360K 5.25" floppy disk drive machine. Plain blue text screen only. I have to say, I *much* prefer Word XP.

        Interesting... We have just acquired a new P4/2.2GHz with 512MB RAM and running WinXP as a development machine at work. You know what? It's way, way slower than the 1.4GHz P4 running 2000 we already had. And that in turn is way slower than the 1GHz P3 running NT4. This is not subjective, it is based on obvious, objective measures. For example, my new machine (the fastest of the above) sometimes takes 3-4 minutes to act on an OK'd dialog in Control Panel. The NT4 box reacts instantly when you configure the equivalent options. Something is wrong at this point, and I'm betting it's a combination of code bloat and feature creep.

      • Re:Programmers, not tools by sane? (Score:2) Saturday July 06 2002, @01:55AM
    • Re:Programmers, not tools by gid-foo (Score:1) Friday July 05 2002, @02:27PM
    • Re:Programmers, not tools by Tim Browse (Score:2) Friday July 05 2002, @02:30PM
    • working code, not pipe dreams by Splork (Score:3) Friday July 05 2002, @02:49PM
    • Re:Programmers, not tools by sheldon (Score:2) Friday July 05 2002, @02:53PM
    • Re:Programmers, not tools by gorilla (Score:2) Friday July 05 2002, @03:30PM
    • Re:Programmers, not tools by elindauer (Score:1) Friday July 05 2002, @04:31PM
    • You're so right by Anonymous Brave Guy (Score:2) Friday July 05 2002, @06:58PM
    • Re:Programmers, not tools by Planesdragon (Score:2) Friday July 05 2002, @08:34PM
  • Not useless (Score:5, Insightful)

    by pthisis (27352) on Friday July 05 2002, @01:52PM (#3828939) Journal
    Profiling in general certainly isn't useless. I'll usually write new code primarily in a high-level, high-productivity language (e.g. Python), and if it's too slow I'll profile it and rewrite applicable parts in C. Some projects require a lower level (C) approach from the start, though those are pretty rare. Without profiling you'll spend a lot of time optimizing code that isn't a bottleneck.

    Remember the words of Knuth: "Premature optimization is the root of all evil." Without profiling, you don't know what optimization is really needed and what isn't.
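For the Python side of that workflow, the standard library's `cProfile` and `pstats` modules can show where the time goes before anything gets rewritten. A minimal sketch; the three functions being profiled are invented stand-ins for real code.

```python
import cProfile
import io
import pstats

def slow_sum_of_squares(n):
    """Hypothetical hot spot: a pure-Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def cheap_setup():
    """Hypothetical cheap step that is not worth optimizing."""
    return list(range(10))

def main():
    cheap_setup()
    return slow_sum_of_squares(200_000)

def profile_report(func, limit=5):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(limit)
    return out.getvalue()

if __name__ == "__main__":
    print(profile_report(main))
```

Whatever dominates the cumulative column is the candidate for a C rewrite; the rest can stay in the high-level language.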

    That said...
    BEGIN RANT
    I've used gprof successfully with plenty of recent code. It works perfectly fine in non-threaded code, which _should_ be the majority (99%+) of code out there. Yes, that includes big network servers (the last one I wrote just recently passed the 6 billion requests served mark without blinking). Threads are a really nasty programming rathole that should be applied in a limited way; they take much of the time and effort spent developing protected memory OSes and toss it out the window. They also tend to encourage highly synchronized executions instead of decoupled execution, which often makes things both slower and more bug-prone (locking issues are _tough_ to get right when they become more than 1-level) and slower to implement than a well-designed multiprocess solution with an appropriate I/O paradigm. Just because two popular platforms (Windows and Java) make good non-threaded programming difficult doesn't mean you should cave in.
    END RANT
    • Re:Not useless by TWR (Score:2) Friday July 05 2002, @02:15PM
    • So threads are evil -- now what? by johnfoobar (Score:2) Friday July 05 2002, @02:17PM
      • Re:So threads are evil -- now what? (Score:5, Insightful)

        by pthisis (27352) on Friday July 05 2002, @02:42PM (#3829262) Journal
        Okay, so let's say threads are evil.

        Okay.

        But processes as provided by current operating systems are too expensive to use.

        No, they aren't. Have you measured fork() speeds under Linux vs. pthread_create() speeds? Sure, Windows and Solaris blow at process creation (and Windows doesn't have a reasonable fork() alternative--it conflates fork() and exec() into CreateProcess*()), but that doesn't make all OSes brain-dead.

        If I have a network server (e.g. a httpd) that has to create a process for each network request, it will never scale.

        Right. And if you create a new thread for each network request, you'll never scale--give it a try some time. Good servers that use a thread/process for every connection do so with pre-fork()'d/pre-pthread_create()'d/whatever pools. Apache, for instance, uses multiple processes (but no multithreading, except in some builds of 2.x) but pre-forks a pool of them. This is really basic stuff, even an introductory threading book will talk about pooling and other server designs.
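The pooling idea can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`. One caveat: this executor starts its workers on demand rather than up front, so it illustrates reuse of a fixed set of workers, not pre-forking as such. `handle_request` and `serve` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def handle_request(request_id):
    """Hypothetical handler: returns the id and the name of the worker that ran it."""
    return request_id, threading.current_thread().name

def serve(requests, pool_size=4):
    """Handle every request on a fixed-size pool instead of one thread each."""
    with ThreadPoolExecutor(max_workers=pool_size,
                            thread_name_prefix="worker") as pool:
        return list(pool.map(handle_request, requests))

if __name__ == "__main__":
    results = serve(range(1000))
    workers_used = {name for _id, name in results}
    print(f"{len(results)} requests handled by {len(workers_used)} threads")
```

A thousand requests end up on at most `pool_size` threads, so thread creation cost is paid a handful of times rather than once per request.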

        Really scalable/fast implementations don't even do that. They use just one process (or one per CPU) and multiplex the I/O with something like select, poll, queued realtime signals (Linux), I/O completion ports (NT), /dev/poll (Solaris), /dev/epoll, signal-per-fd, kqueues (FreeBSD), etc. (select and poll don't scale well to 10s of thousands of connections when most are idle, but some of the others are highly scalable). See e.g. Dan Kegel's c10k page [kegel.com] for specifics.
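A toy version of single-process multiplexing, using Python's `selectors` module (which picks epoll, kqueue, poll or select depending on the platform). The connections here are local socket pairs standing in for network clients, and `echo_upper` is a made-up name.

```python
import selectors
import socket

def echo_upper(connection_count=3):
    """Serve several connections from one thread using readiness events."""
    selector = selectors.DefaultSelector()
    clients = []
    for index in range(connection_count):
        server_side, client_side = socket.socketpair()
        server_side.setblocking(False)
        selector.register(server_side, selectors.EVENT_READ)
        client_side.sendall(f"hello {index}".encode())   # pretend client request
        clients.append(client_side)

    handled = 0
    while handled < connection_count:
        # Block until at least one connection has data; no thread per client.
        for key, _events in selector.select(timeout=1.0):
            data = key.fileobj.recv(1024)
            key.fileobj.sendall(data.upper())
            selector.unregister(key.fileobj)
            key.fileobj.close()
            handled += 1

    replies = [client.recv(1024).decode() for client in clients]
    for client in clients:
        client.close()
    selector.close()
    return replies

if __name__ == "__main__":
    print(echo_upper())
```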

        Obviously, the OS needs to change, and give us something (maybe a hybrid between processes and threads) that more closely meets applications' needs

        http://www-124.ibm.com/pthreads/ proposes an M:N threading model and offers an implementation, but it still has the shared memory problems of threads. multiprocessing may not be sexy but it's really a lot cleaner for most problems and can be more efficient in a lot of domains.

        Sumner
      • by Splork (13498) on Friday July 05 2002, @02:46PM (#3829284) Homepage
        minimize the use of threads whenever possible. write your code in an event driven fashion as your friendly AC suggested. the poll() system call [superior to select(), though select() works well within its fixed size filedescriptor array limits] makes this possible.

        the basic mentality to switch from threads to event programming is this: anytime you're using a thread solely so that it can sit around and block on high latency events (network or disk I/O) most of its lifetime, it should not be a thread.

        it's acceptable to have worker threads/processes that you hand computational tasks to and they trigger an event in your event loop when they hand a result back, but don't use threads of execution to manage your state. you'll pull your hair out and still have a nonfunctional program.
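One way to wire up "workers trigger an event in your event loop", sketched in Python. The workers only compute; they post results to a queue and write one byte to a socket the loop is watching, and all state stays in the loop's thread. The wake-up socket pair and the `run_event_loop` name are assumptions for the sketch, not something from the post.

```python
import queue
import selectors
import socket
import threading

def run_event_loop(tasks):
    """Single-threaded loop; worker threads only compute and post results back."""
    results = queue.Queue()
    wake_reader, wake_writer = socket.socketpair()
    selector = selectors.DefaultSelector()
    selector.register(wake_reader, selectors.EVENT_READ)

    def worker(n):
        results.put((n, sum(i * i for i in range(n))))   # pure computation
        wake_writer.send(b"x")                           # wake the event loop

    threads = [threading.Thread(target=worker, args=(n,)) for n in tasks]
    for t in threads:
        t.start()

    done = []                                # state lives in the loop thread only
    while len(done) < len(threads):
        for key, _events in selector.select():
            for _ in key.fileobj.recv(64):   # one byte per finished task
                done.append(results.get_nowait())

    for t in threads:
        t.join()
    selector.close()
    wake_reader.close()
    wake_writer.close()
    return sorted(done)

if __name__ == "__main__":
    print(run_event_loop([10, 100, 1000]))
```

Because each worker queues its result before sending the wake-up byte, the loop never finds the queue empty when it wakes.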
      • Re:So threads are evil -- now what? by johnfoobar (Score:1) Friday July 05 2002, @02:45PM
    • Re:Not useless by dvdeug (Score:2) Friday July 05 2002, @02:20PM
    • Re:Not useless by Lozzer (Score:1) Friday July 05 2002, @03:49PM
    • Re:Not useless by tc (Score:1) Friday July 05 2002, @05:13PM
    • Re: premature opitimization quote by alangmead (Score:1) Saturday July 06 2002, @09:01AM
  • by march (215947) on Friday July 05 2002, @01:52PM (#3828947)
    Profiling, in one form or another, is ABSOLUTELY necessary. There is no other way to find out why (and where!) your code is running slowly.

    Does gprof do everything we need? No. Are there better tools? Yes.

    But, the bottom line is that if you don't profile your code (and unit test it, and integration test it, and...), you are not writing good code.

    It's like debating if "breathing" is necessary or not.

  • by GGardner (97375) on Friday July 05 2002, @01:55PM (#3828970) Homepage
    It isn't just gprof that's broken by pthreads, other Linux tools fall victim as well. Core dump? Almost useless with pthreads running. Gdb? Getting better, but still a little wonky. Certain aspects of signal handling don't work as expected with pthreads.
  • ACE has the answer (Score:3, Informative)

    by Ricdude (4163) on Friday July 05 2002, @01:59PM (#3829000) Homepage
    There is a simple profiling capability in the ACE [wustl.edu] toolkit, the ACE_Profile_Timer [uci.edu]. Easy to wrap in a class with basic Start, Stop, and Elapsed methods. If you can guess what function or two the bulk of your program's time is being spent in, this can help pinpoint the worst offenders within that section of code. If not, create several timers, and time each function in your main loop, and print the information after the loop is finished. Drill down into subfunctions as needed. See where the milliseconds tick away. You might be surprised.
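The Start/Stop/Elapsed wrapper is easy to imitate in any language. Below is a Python stand-in built on `time.perf_counter()`; it is not the ACE API, and `ProfileTimer` and the stage names in the demo are hypothetical.

```python
import time

class ProfileTimer:
    """Accumulating stopwatch with start/stop/elapsed (not the ACE class)."""

    def __init__(self):
        self._started_at = None
        self._total = 0.0
        self.calls = 0

    def start(self):
        self._started_at = time.perf_counter()

    def stop(self):
        if self._started_at is None:
            raise RuntimeError("stop() called before start()")
        self._total += time.perf_counter() - self._started_at
        self._started_at = None
        self.calls += 1

    def elapsed(self):
        """Total seconds accumulated over all start/stop pairs."""
        return self._total

if __name__ == "__main__":
    # One timer per stage of a made-up main loop; report after the loop ends.
    timers = {"parse": ProfileTimer(), "compute": ProfileTimer()}
    for _ in range(50):
        timers["parse"].start()
        fields = "1,2,3,4,5".split(",")
        timers["parse"].stop()

        timers["compute"].start()
        sum(int(f) ** 2 for f in fields * 1000)
        timers["compute"].stop()

    for name, timer in timers.items():
        print(f"{name:8s} {timer.elapsed() * 1000:.2f} ms over {timer.calls} calls")
```

Drilling down is a matter of adding more timers inside whichever stage turns out to be the slowest.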

    And remember, in the immortal words of Michael Abrash, "Assume Nothing. Measure the improvements. If you don't measure, you're just guessing."

  • by exa (27197) on Friday July 05 2002, @02:00PM (#3829009) Homepage Journal
    Drepper is smoking some strange shit. Every serious programmer uses profiling.

    Just because kernel and glibc wackos don't find it useful doesn't mean it isn't useful.

    I regularly use profiling for any code that demands performance.

    That was a very unfortunate remark of Ulrich.
  • by tshoppa (513863) on Friday July 05 2002, @02:03PM (#3829041)
    Lots of debugging techniques don't work well with threaded programs. I think that blame here lies not with gprof, but with the threaded-programming paradigm or its current implementations.

    The problems that threading solves (multiple outstanding I/O's, multiple CPU utilization) can be solved using other methods. Those other methods have their evils, too, but trading off for the lesser net evil is what design and analysis is all about.

    Lack of profiling tools is pretty far down on the list of tradeoffs, in my opinion; much higher up are issues of maintainability and portability, areas where threading does badly anyway.

  • by Nathan Mates (129704) on Friday July 05 2002, @02:08PM (#3829071) Homepage
    Out in the console games development market, there's one real serious tool: a hardware profiler. Basically, it's a heavily modified PSX with bus analyzers tacked on so that it can snoop and tell *exactly* where the slowdowns are. Is it a cache miss? Is it the GPU hammering on things? There's none of the "this function is slow" -- it points out *why*.

    You should not rely on profilers from the beginning of writing code, but they're no cure-all either. A profiler can't tell you to use quicksort over a bubblesort. It just says what is slow, and it's up to the programmers to find a faster way to do things.

    The most recent x86 profiler I've used was Intel's VTune (AMD's free tool at http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_3604,00.html was so-so at best for my use). Those apps don't do any of the fancy bus analysis, etc. Still, I'd suspect they're better than nothing.

    I know this is going to sound like flamebait, but C++ *does* make it very easy to shoot yourself in the foot with regards to performance. If you don't set up all your operators to properly take consts, if you forget to set things up, it can kill performance. If you rely on a *lot* of small functions, you can either (1) blow out the cache with a larger executable (more likely on consoles), or (2) forget to inline a few, and kill your performance with lots of *tiny* calls that probably won't show up under VTune. The slowness of various compilers makes people afraid of putting a lot of small functions in headers where they belong, as any change would force a slow, full rebuild.

    I've seen C++ compilers decide to inline a 4x4 matrix copy by unrolling a loop to read/write the first 12 elements, then call the Vector4 copy constructor. Worst of all worlds. Replacing that with a memcpy was a huge win. But, the only way one would know *how* to fix that is to be able to look at the disassembly.

    Nathan Mates
  • CPU Intensive tasks (Score:1, Informative)

    by Anonymous Coward on Friday July 05 2002, @02:11PM (#3829093)
    When you have to deal with CPU intensive programs, you will find that a profiler is quite useful.

    For instance, in the last year I've been developing an automatic nesting server using Linux and gprof was very important to spot the functions that were consuming more cpu time.

    With gprof it was easy to notice two small functions that were responsible for 95% of the cpu usage.

    As a result, I replaced those two small functions with 180 lines of optimized assembly code and I got a very good performance increase, since I was using a lot of inter-word bit shifts that the C compiler didn't handle well.

    Regarding multi-threads, I have come to the conclusion that 9 out of 10 times you don't really need to use threads, even in interactive programs, since there are alternative ways of achieving the same effects.

    For instance, all the X11 toolkits like Xt/Motif, Gtk+ and Qt, have the concept of work-procedures and timeout-functions.

    If you put all of your time-consuming operations inside work-procedures, you can get the same results as you would get with multi-threads, because you have an effective way of executing several tasks at the same time without blocking the user interface.

    Fernando Pereira
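
    The work-procedure idea above can be sketched without any toolkit. This is only a toy illustration in Python, not Xt, Gtk+ or Qt code: run_loop and nest_parts are hypothetical names, and the "event loop" is a plain while loop. Real toolkits register such callbacks through their own calls (for example Xt's XtAppAddWorkProc or GLib's g_idle_add). The long job is split into slices, and the loop alternates one pending UI event with one slice of work, all on a single thread.

```python
from collections import deque

def nest_parts(parts, chunk=2):
    """Hypothetical long-running job, written as a generator so it can
    hand control back to the loop after every chunk."""
    placed = []
    for i in range(0, len(parts), chunk):
        placed.extend(p * 2 for p in parts[i:i + chunk])  # stand-in for real work
        yield  # one slice done: let the loop service the UI
    return placed

def run_loop(ui_events, work):
    """Toy event loop: handle one pending UI event, then run one slice
    of the work procedure, until both are exhausted."""
    pending = deque(ui_events)
    log = []
    result = None
    done = False
    while pending or not done:
        if pending:
            log.append(("ui", pending.popleft()))
        if not done:
            try:
                next(work)
                log.append(("work", None))
            except StopIteration as stop:
                result = stop.value  # the generator's return value
                done = True
    return log, result

log, result = run_loop(["click", "expose", "keypress"],
                       nest_parts([1, 2, 3, 4, 5, 6]))
```

    The catch is that each slice has to be short. A slice that runs for a second still freezes the interface for a second, so the job must be written to be interruptible at fine grain.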

  • Not useless, just different (Score:2, Insightful)

    by Dasein (6110) <tedcNO@SPAMcodebig.com> on Friday July 05 2002, @02:13PM (#3829104) Homepage Journal
    There are very few applications that don't reach out across a network for information. The bottleneck is usually this network communication. Check out Performant [performant.com] for tools that work on the network level.

    There's also a continuing trend of software developers spending users' computing power to make their jobs easier. Java, J2EE, C#, .NET, C++, C all can theoretically produce software that is just as speedy as assembly but it rarely is. People still write assembly where performance really counts (games, realtime, etc.)

    Some people think that the wasted processing power is a crime. Me, I think it's just economics. It's much cheaper to pay for processing power than it is to pay for the developers to squeeze every last bit of performance out of an app.

    However, there are some applications where profiling is absolutely required. Database engines, games, simulations, anything that is CPU-bound has the potential of benefiting from profiling.
  • Quantify! (Score:3, Insightful)

    by ptomblin (1378) <ptomblin@xcski.com> on Friday July 05 2002, @02:15PM (#3829115) Homepage Journal
    I've solved some important real-world problems using Quantify and Purify, especially when dealing with a huge system with a lot of developers' fingers in the pie. One of the programs was handling 100,000+ transactions a day, and Quantify helped shave enough off so we didn't have to force all of our customers to upgrade their hardware.

    Faced with a similar problem in Linux, I'd probably port the program to Solaris, Quantify it there, and hope the results are similar under Linux.
  • Plenty of options for Java (Score:2, Informative)

    by grungeman (590547) on Friday July 05 2002, @02:20PM (#3829153)
    For Java we have a really nice choice of profilers. There are basically three great products available, all of which have proved to be absolutely useful. There is JProbe [sitraka.com], OptimizeIt [borland.com] and JProfiler [ej-technologies.com] (the 2.0 beta of JProfiler looks cool [incors.com]). I don't know what the problems on Linux are, but when programming Java, profiling is quite an enjoyable task.
  • by taniwha (70410) on Friday July 05 2002, @02:21PM (#3829160) Homepage Journal
    mind you I have my own threads package - you need to if you want 1,000,000+ really small threads running together, with totally minimal stack space (4 bytes not the 1Mb that pthreads gives you). The only hard part was making gprof use SIGALTSTACK (which was broken in the kernel when I started).
    Of course this worked because from gprof's point of view I was running in one kernel thread - apart from that oprofile rocks :-)
  • short answer yes, the long answer no (Score:2, Insightful)

    by fermion (181285) on Friday July 05 2002, @02:22PM (#3829167) Journal
    First, given that programmers seem to want the profiler to automatically correct mistakes in timing, coding, and design, one would expect the profiler to work even less well in multithreaded programming. For instance, if the threads do not lock and unlock resources in a consistent order, and therefore the code takes a very long time to access those resources, the profiler will tell you that the section of code is taking a long time, but not why. Such a design will be very hard to fix, and not really in the domain of the profiler.

    On the other hand, profilers are very good at indicating, if the code is well designed, that out of 10K lines of code, these three functions of 10 lines each eat up 80% of the time. A sufficiently clever programmer will focus on those areas for analysis. If the code is not good, the profiler is unlikely to be able to reduce the problem domain. If the programmer is not good, the information will not be so useful.

    Wrt the multithreading issue, I find most problems occur in two cases. First, as in debugging, the programmer does not begin with sufficiently simple conditions. Often one cannot debug the whole application at once. Likewise, profiling an entire application in multithreading mode may not be the proper approach. Second, the function to be profiled may not be properly designed to allow a useful profile. Multithreading applications are often best when they are made up of simple small purpose functions. These are easier to debug, and easier to profile.

  • No (Score:2)

    by mfos.org (471768) on Friday July 05 2002, @02:27PM (#3829189)
    Just because there is a technical issue with multithreading doesn't mean that tracking where your program spends its memory and time is a dead concept.
  • Sample-based profiling (Score:3, Interesting)

    by p3d0 (42270) on Friday July 05 2002, @02:29PM (#3829197)
    The approach of instrumenting code is practically useless for high-performance code anyway, depending on how it is written, because the instrumentation disrupts the profile. However, it's not hard to build a sample-based profiler into a program (on Linux, anyway) and use that to get a statistical measure of where time is being spent.

    This can be done for about 40 lines of code. All you need is to set up an alarm timer, and then install a signal handler for it that spits out the current program counter to a file. After the run is finished, filter the PC values through addr2line and voila. If you want to get really fancy, make it walk the stack via the ebp register (on x86) and you can build yourself a call stack.
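
    The timer-plus-signal-handler recipe can be tried out in a few lines of Python. This is a rough analogue, not the C version described above: the interpreter hands the handler the current stack frame rather than a raw program counter, so the sketch counts function names instead of feeding PC values to addr2line. It assumes a Unix system (SIGPROF and ITIMER_PROF) and only sees the main thread; profile, on_sample and hot_loop are made-up names.

```python
import collections
import signal
import time

samples = collections.Counter()

def on_sample(signum, frame):
    # Record which function the main thread was in when the timer fired.
    if frame is not None:
        samples[frame.f_code.co_name] += 1

def hot_loop(cpu_seconds):
    # Stand-in workload: burn CPU until this much process time has passed.
    end = time.process_time() + cpu_seconds
    x = 0
    while time.process_time() < end:
        x += 1
    return x

def profile(func, *args, interval=0.005):
    """Run func(*args) while SIGPROF fires every `interval` CPU seconds."""
    samples.clear()
    old_handler = signal.signal(signal.SIGPROF, on_sample)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        return func(*args)
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)   # stop the timer first
        signal.signal(signal.SIGPROF, old_handler)   # then restore the handler

profile(hot_loop, 0.3)
for name, count in samples.most_common():
    print(name, count)
```

    Walking frame.f_back in the handler would give the call-stack variant, in the same spirit as following ebp on x86.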

  • Unit Profiling (Score:2)

    by ChaoticCoyote (195677) on Friday July 05 2002, @02:34PM (#3829228) Homepage

    Profiling should be performed at the unit-test level, and not on full-blown applications.

    For the most part, this approach avoids hassles with threading and processes, and has worked effectively for me on multiprocessor clusters.

  • by Dan Kegel (49814) on Friday July 05 2002, @02:37PM (#3829241)
    See www.kegel.com/gprof.html [kegel.com] for a patch that fixes
    another gprof problem: it chokes after 65534
    symbols. This makes it hard to profile large
    c++ programs.

    I think gprof is still useful. Ulrich is just
    being cranky. The workaround for the multithreaded support works pretty well...

  • Oh no... (Score:2)

    by be-fan (61476) on Friday July 05 2002, @03:08PM (#3829379)
    First, the idea was to write in ASM to squeeze every drop of performance from the hardware.
    Then, the idea was to write in a high-level language, but always be careful about performance.
    Then, the idea was to develop apps quickly, then profile to optimize the important parts.
    Now, screw optimization, let the user buy more hardware!
    I think this attitude sucks. Even my 1.5GHz Athlon-XP is slower running KDE 3.x (or any version of gnome for that matter) than my old 300MHz PII was running Win98. And it doesn't do a hell of a lot of stuff that my old machine couldn't. I switched to Linux and took the performance hit because I hated Microsoft. I keep upgrading KDE (and my hardware) because the latest apps only work on the latest version. I don't expect more complex software to get faster, but I'd expect that as I upgrade my hardware, software should stay relatively the same speed. Yet, it seems as if software is getting slower more quickly than system bottlenecks (specifically RAM and hard-drive speed) can keep up. That means that the end-user experience is deteriorating, even as users pump more money into their hardware to get usable performance.
  • I always find it funny... (Score:2, Insightful)

    by be-fan (61476) on Friday July 05 2002, @03:31PM (#3829471)
    How *NIX grognards always complain about multi-threading, but don't find signals (and their nasty interrupt-driven nature) to be the least bit unsettling!
  • what's the problem? (Score:4, Interesting)

    by g4dget (579145) on Friday July 05 2002, @03:56PM (#3829607)
    You say that there is a problem with profiling multithreaded code with gprof. But the issue you point to seems to apply to both single and multithreaded code: Linux gprof doesn't seem to count time spent in system code.

    Now, compute intensive code tends not to spend a lot of time in system calls, so it isn't clear that it matters whether a profiler counts time spent in system calls. I kind of prefer if it doesn't because it doesn't clutter up the profile with I/O delays (which are usually unavoidable).

    If you want to find out where your code is spending time in system calls, you can use "strace -c".

    There are also gcov-like tools that can be used for profiling via code insertion (as opposed to statistical profiling like gprof), although I'm not sure whether PC hardware has the necessary timer support.

    Overall, the answer is: yes, profiling still matters for programs that push the limits of the machine. But fewer programs do. I think most people would be a lot better off not programming in C or C++ at all and not worrying about performance. Too much worry about "efficiency" often results in code that is not only buggy but also quite inefficient: tricks that are fine for optimizing a few inner loops wreak havoc with performance when applied throughout a program. Too much tuning of low-level stuff also causes people to miss opportunities for better data structure and program logic. This is actually an endemic problem in the industry that affects almost all big C/C++ software systems. Desktop software, major servers, and even major parts of the kernel should simply not be written in C/C++ anymore.

    The thing with profiling and optimization is to know when to stop, and few people know that. So, maybe the best thing to say is: "no, profiling doesn't matter anymore". That will keep most people out of trouble, and the few that still need to profile will figure it out themselves.

  • by WetCat (558132) on Friday July 05 2002, @03:58PM (#3829617)
    _Native_ _x86_ multithreading is useless and harmful.
    1) It heavily decreases the number of available processes - a very tight resource in Linux
    2) It makes programs cumbersome and hard to debug
    3) On the x86 architecture, it was not a good idea for Linux to implement thread switches as context switches - on x86 that's a very costly operation.
    But it's only a rant. Same as MIME, this flawed technology is already used a lot and there's no way to turn it back.
    (Why MIME? MIME is a stupid thing - a dirty hack created from not wanting to rewrite an old 7-bit protocol from scratch.)
  • by PuntaConejo (180886) on Friday July 05 2002, @04:11PM (#3829693)
    I program a lot in c++, and I particularly like to use the STL. Thus, my programs often have a lot of inlined functions in them. I have found gprof to be much less useful when profiling such programs.
    When a function is inlined, gprof does not account for that function's time. Nor should it be expected to, since optimizations may reorder the code so much that it is not feasible to attribute a particular assembly instruction to a particular function. I have tried recompiling my programs with -fno-inline to expose the names of the inlined functions, but this changes the program performance so much in some cases that I am hesitant to draw any conclusions about a program from such a profile. Short of abandoning inlining (and interprocedural optimizations, which pose the same sort of problem), does anyone have suggestions on how to profile such programs?
  • gcj and JVMPI (Score:1)

    by KIngo (168933) on Friday July 05 2002, @05:40PM (#3830099)
    Well, if gcj's JVMPI becomes fully usable, maybe we could use a tool like JProfiler [ej-technologies.com] for natively compiled Java code. That would be great.
  • This will cost me karma... (Score:2, Insightful)

    by tlambert (566799) on Friday July 05 2002, @07:00PM (#3830459)
    Don't use threads.

    The problem you are complaining about profiling having is that it can't profile threaded programs. Don't write threaded programs, and the problem is solved.

    Frankly, I've always considered threading useful for only a few situations:

    o When you have an SMP system, and you need to scale your application to multiple CPUs so that you can throw hardware at the problem instead of solving it the right way

    o When you have programmers who can't write finite state automata, because they don't understand computer science, and should really be asking "Would you like fries with that?" somewhere, instead of cranking out code

    o When your OS doesn't support async I/O, and you need to interleave your I/O in order to achieve better virtual concurrency

    Other than those situations, threads don't make a lot of sense: you have all this extra context switching overhead, and you have all sorts of other problems -- like an inability to reasonably profile the code with a statistical profiler.

    OK... Whew! Boy do I feel better! 8-).

    Statistically examining the PC, unless it's done on a per thread basis, is just a waste of time in threaded programs.

    If you want to solve the profiling problem for threaded programs, then you need to go to non-statistical profiling. This requires compiler support. The compiler needs to call a profile_enter and profile_exit for each function, with the thread ID as one of the arguments. This lets you create an arc-list per thread ID, and separately deal with the profiling, as if you had written the threads as separate programs. It also catches out inter-thread stalls.

    -- Terry
  • by oezi (514948) on Friday July 05 2002, @07:08PM (#3830494)

    gprof is maybe not the most impressive tool to use, but it's quite useful. At an IA64 course at university [German, sorry] [uni-karlsruhe.de] we used gprof to identify the bottlenecks in the c-code of the xvid-codec [xvid.org]. Then we assembler-optimized like mad and got quite a nice speed-up.

    Results can be found in our wiki:

    Pre-Optimization [uni-karlsruhe.de]
    Post-Optimization [uni-karlsruhe.de]
    Without gprof we would have been lost... our IA64 wiki
  • by wwi (243026) on Friday July 05 2002, @07:18PM (#3830526) Homepage Journal
    This discussion has included some
    comments about the poor implementation of
    threads in Linux. Other writers suggest
    avoiding threads, if possible. Note
    that Java is nothing but threads. Any
    Java program is running 4-6 threads (depending
    on the JRE) right out of the box.

    Where I work, we have had
    severe problems getting Java programs to
    work correctly on Linux. The IBM Java
    support team has shared our frustration.
    Maybe IBM's new thread implementation is
    needed, just to get Linux, Java (and
    thread users in general) working correctly
    in an enterprise environment. After that
    is working, then we can see about improving
    other areas like performance.

  • by freddoh (589922) on Saturday July 06 2002, @02:33AM (#3831770)
    Saying profiling is useless is equivalent to saying that algorithmic complexity doesn't have to be studied. This is absurd.
    If for example your profiler says your function foo() is executed 100 times with data of size 10 and 10000 times with data of size 20, you have a serious algorithmic complexity problem. While some of these problems can (and should) be handled beforehand, a profiler is very useful for catching those that weren't.
    Algorithmic complexity is independent of computer language and of CPU speed, therefore profilers will always be useful as long as programs are built on algorithms, so for quite some time still ;)
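
    To make the "calls per data size" idea concrete, here is a small Python sketch. It is not the output of any real profiler: count_by_size is a hypothetical decorator, and foo is a deliberately naive recursive function invented for the example. It tallies how often foo() runs for each input length.

```python
import collections
import functools

calls_by_size = collections.Counter()

def count_by_size(func):
    """Tally calls to func keyed by the length of its argument."""
    @functools.wraps(func)
    def wrapper(data):
        calls_by_size[len(data)] += 1
        return func(data)
    return wrapper

@count_by_size
def foo(data):
    # Naive subset counter: recurses twice on the tail instead of once.
    if not data:
        return 1
    return foo(data[1:]) + foo(data[1:])

total = foo(list(range(4)))
for size in sorted(calls_by_size, reverse=True):
    print(size, calls_by_size[size])
```

    The call count doubles each time the size drops by one, which is the kind of pattern that points at the algorithm rather than at the compiler or the CPU.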

  • Threading... (Score:2)

    by Znork (31774) on Saturday July 06 2002, @03:06AM (#3831827)
    I wouldn't say that gprof is useless... threading, however, comes very close to it.

    Threading is useful in the instance where you have an application that needs to scale with SMP and which you cannot, for whatever reason, fork. But the accompanying pain of being forced to pay extremely close attention and mutex lock the code all over makes it not worth it for most situations.

    Use fork. Use other IPC methods if necessary. But don't thread or you'll spend an order of magnitude more time debugging.
  • I don't know... (Score:1)

    by gatkinso (15975) on Friday July 05 2002, @01:32PM (#3828789)
    ...I think that the lack of profiling tools makes developers rely more heavily on solid upfront design.
    • Re:I don't know... by Wesley Everest (Score:3) Friday July 05 2002, @01:51PM
      • Re:I don't know... (Score:5, Insightful)

        by pthisis (27352) on Friday July 05 2002, @01:59PM (#3828998) Journal
        You could argue that with good up front design, you'll know in advance what 10% of the code to focus on, but I don't think that works that well in practice. At best, you're making educated guesses about where bottlenecks will appear

        And a lot of smart people, from Knuth and Kernighan to Linus and Guido, will freely admit that predicting what to optimize is nearly impossible. Even people at that level of programming prowess are often surprised by where the bottlenecks appear (and where they don't appear). You certainly want to design for flexible optimization from the start, but you'll often discover that the stupid O(n) scan you put in is good enough for now and that you'd better optimize the I/O system before you think about replacing it with a tree or hash table or whatever.

        Sumner
      • Re:I don't know... by fermion (Score:3) Friday July 05 2002, @03:21PM
      • Re:I don't know... by gatkinso (Score:1) Friday July 05 2002, @03:23PM
    • Re:I don't know... by SirSlud (Score:2) Friday July 05 2002, @02:11PM
  • by shayne321 (106803) on Friday July 05 2002, @01:48PM (#3828914) Homepage Journal
    User: This program is slow
    Me: Really? Which part?
    User: When I click the "report" icon
    Me: Oh (tinkers with report code). Try it now.
    User: It's still slow
    Me: (shakes BOFH excuse 8-ball) Hrmm, must be interference from sunspots, try it again tomorrow

    :)
