
Ask Slashdot: Performance Monitoring for Linux 81

Posted by Cliff
from the statistical-analysis-tools dept.
muadib wants to know about the following: "Given the current discussions on tuning, I am trying to find out if there are any performance monitoring applications for Linux. I don't mean things like xload, xosview, etc. which provide only a small amount of data. For anyone who's done benchmarking under NT, I mean something like their built-in perfmon utility that lets you view and capture just about any statistic on your system or on a remote system. Capturing is the specific functionality I'm looking for, because I'm working on a Linux device driver, and it would be nice to have historical data of CPU utilization, interrupts/s, etc. so that I can compare complete system performance between code revisions."
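The counters the poster asks for (CPU utilization, interrupts/s) are cumulative in /proc/stat, so capturing history is a matter of sampling the file on a timer and differencing consecutive snapshots. A minimal sketch in Python; the two snapshot strings are made-up examples, and a real logger would read /proc/stat every N seconds and append the derived numbers:

```python
# Minimal capture sketch: /proc/stat's counters are cumulative, so
# historical CPU utilization and interrupts/s fall out of sampling the
# file on a timer and differencing consecutive snapshots.

def parse_stat(text):
    # Keep the aggregate "cpu" line (user, nice, system, idle, ...)
    # and the "intr" line (total interrupt count comes first).
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("cpu", "intr"):
            stats[fields[0]] = [int(v) for v in fields[1:]]
    return stats

def deltas(before, after, seconds):
    b, a = parse_stat(before), parse_stat(after)
    busy = sum(a["cpu"][:3]) - sum(b["cpu"][:3])    # user + nice + system
    total = sum(a["cpu"][:4]) - sum(b["cpu"][:4])   # ... plus idle
    util = 100.0 * busy / total if total else 0.0
    intr_per_s = (a["intr"][0] - b["intr"][0]) / seconds
    return util, intr_per_s

# Made-up snapshots, 30 seconds apart:
before = "cpu 100 0 50 850\nintr 1000 1 2"
after  = "cpu 160 0 80 960\nintr 1600 2 3"
util, rate = deltas(before, after, 30)
```

Log the `(timestamp, util, rate)` tuples per run and you have exactly the historical comparison between code revisions the question asks about.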

  • You know, mimic the interface, make NT sysadmins feel at home, and make them feel like they can reuse some of the $$$$ they spent on training?

    KDE does a great job of looking and feeling like Windows, and its ktop (or whatever) does a great job of imitating the NT task manager, but so far I have not seen anything like a Kperfmon.

    I know it is traditional to hate all things NT/95, but sysadmin tools with an NT/95 interface would have a very large built-in audience and might persuade managers into believing Linux is as easy to use as NT. After all, the fact that NT looked like 95 has to have something to do with its acceptance as a server.

    There really is no "standard" GUI sysadmin interface for Linux, so why not take advantage of all of Bill Gates's legal groundwork making it legal for you to rip off the look and feel of his sysadmin tools?

    If you have any info on NT-alike sysadmin tools (such as a samba interface, event log, etc) let me know at:

    motjuste@briefcase.com

  • by Anonymous Coward
    I (stress the next two words) used to work for a company called Datametrics Systems Corporation (www.datametrics.com). They offer a product called 'Viewpoint' that does what you guys are really hoping for: there's a UNIX process that reads something like 300 kernel variables at a configurable rate (usually every 30 seconds) and then sends that data to a central monitoring program. The central program can talk to hundreds of UNIX, VMS, Unisys, NT, etc. machines at once and plots and correlates the data provided. The features it has are pretty mind-boggling; look on the web page to get a feel for it. If you look hard enough, I think they even made a Java and a Web version of the frontend. The tools are for enterprise
    clients who want to know about the details of
    their performance: if the cache hit rate isn't
    as good as it should be, if the network is too saturated for best performance, etc. I believe you
    could even compare Linux's and IRIX's relative
    merits by looking at their metrics side by side under similar stresses. When I left, they
    were adding some modules like an Oracle module
    (to correlate kernel metrics with Oracle's SQL performance) and I personally suggested creating
    an Apache module (which may or may not exist -
    they have an API to program to, so it could happen
    if someone cared enough to make it happen)

    I should stop here and say that I am pretty sure
    this product retails for tens of thousands of
    dollars. :(

    My understanding is that there was a port to Linux done in-house, but I doubt they have rolled it out. My other understanding is that the company has pretty much gone to shit since they were bought out right after I left, so who knows if they will be clued-in enough to want to work on Linux. If any of you are very excited about products like this (i.e., products for managing tens or hundreds of millions of dollars' worth of computers) coming to Linux, I'd suggest going to the web site, finding a feedback form, and speaking your mind.
  • This webpage [lanl.gov] has a tool that gives info similar to perfmon under NT. On Intel, it uses the model-specific registers (which report everything from cache misses to branch delays...) as implemented by the library libpperf.

    This tool is IMHO the best of the pack out there at the moment for really understanding the performance of your programs with respect to caching and processor quirks. Check it out.
  • by Anonymous Coward
    There's a beta tool at
    http://www.blakeley.com/resources/vtad [blakeley.com]
  • No - actually perfmon is OK. It has quite a few fudges in there (the source is in the SDK), but basically it uses the performance registry.

    These are my experiences based on doing capacity planning agents for NT:

    If you want to roll your own use the performance registry.

    Now the performance registry is an interesting beast. Check out the perfmon code to see how it's done, but the lowdown is that this API is unsafe. Be careful with multi-threaded access to the Perf Registry (actually - don't).

    The buffer size assumes UNICODE, so it halves the buffer on every call unless you refresh it (see the API); there is an MS bug report on this.

    Calls may return garbage even though the return code is OK - use Unicode to check the header.

    Use the counter sizes returned by the API, not those specified in the header (they are wrong in some cases - not most).

    The performance registry relies on other DLLs that may fail so it in turn may fail.

    The spec is that you return info when you are asked, but SQL Server 6.x returns info even when NOT asked, and this is bad for performance if you only want some counters. (This is a known bug.)

    But beware others may do this too.

    Actually this is a great idea, and I wish it was done properly (i.e. robustly).

    It would kill UNIX in this regard if it worked as spec'd.

    Linux is even worse than, say, SVR4 UNIX in instrumentation - I/O instrumentation in particular is zero!
    BUT its /proc is better in its configuration information than, say, Sun's /proc.

    I've written agent/server-type stuff to get info, and it ain't too hard, but you've got to have the info to begin with.
  • by Anonymous Coward
    Moodss is a modular application. It displays data described and updated in one or more independent modules loaded when the application is started. Data is originally displayed in tables. Graphical views (graph, bar, 3D pie charts, ...), summary tables (with current, average, minimum and maximum values) and free text viewers can be created from any number of table cells, originating from any of the displayed viewers.

    A thorough and intuitive drag'n'drop scheme is used for most viewer editing tasks: creation, modification, type mutation, destruction, ... Table rows can be sorted in increasing or decreasing order by clicking on column titles. The current configuration (modules, tables and viewers geometry, ...) can be saved in a file at any time, and later reused through a command line switch, thus achieving a dashboard functionality.

    The module code is the link between the moodss core and the data to be displayed. All the specific code is kept in the module package. Since module data access is entirely customizable (through C code, Tcl, HTTP, ...) and since several modules can be loaded at once, applications for moodss become limitless. For example, comparing a remote database server CPU load and a network load from a probe on the same graph becomes possible.

    Apart from a sample module with random data, ps, cpustats, memstats, diskstats, mounts, route, arp modules for Linux, apache and apachex modules are included (running "wish moodss ps cpustats memstats" mimics the "top" application with a graphic edge).

    All the above in rpm, tgz, ...

    Jean-Luc Fontaine

    see screenshots, html documentation, ... on my homepage at http://www.multimania.com/jfontain/

    Regards.
  • According to what I read on /. every day, NT is absolutely terrible, and Linux is better in every way. So why would Linux want to copy NT's performance monitor? Surely Linux has a better performance monitor already included, since it's so much better than NT.
  • sarcasm.
  • Posted by Karym:

    I'm currently working on a tool that resembles what you're looking for. It consists of 3 parts and a kernel patch. The patch adds a feature to the kernel that enables a "trace" driver to register, and enables key kernel parts to call upon the driver to note that a given event has occurred. In turn, the trace module takes care of the events and puts them in a buffer. When a certain quantity of information is in the driver's buffer, it sends a signal to a trace daemon. The daemon then reads from the driver and appends the trace information to a trace file. The last part is the trace data decoder. This decoder takes the binary data and transforms it into a human-readable format. Therefore, impact on the system is minimized. As of this time, all the above-mentioned parts are complete. The only thing that remains to be done is to build a GUI for the decoder (right now it works perfectly on the command line). This is what I'm working on right now.

    This system enables the observer to know exactly what happens at every moment in the system. As for remotely observing a host, this is not a problem; it is actually planned. This will consist of the trace daemon offering its services on an IP port which can be contacted by other hosts. If you're interested... send me an e-mail. I'll have a web page for it as soon as the GUI is complete.
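    The decoder stage described above boils down to walking fixed-size binary records. A sketch in Python; the record layout here is entirely hypothetical (the real driver defines its own), only the walking-the-buffer pattern is the point:

```python
import struct

# Hypothetical record layout for a binary trace file like the one
# described above: each event is (event_id: u16, cpu: u16,
# timestamp_us: u32), little-endian, no padding.
EVENT = struct.Struct("<HHI")

def decode(trace_bytes):
    # Turn the raw buffer into a list of (event_id, cpu, timestamp) tuples.
    return [EVENT.unpack_from(trace_bytes, off)
            for off in range(0, len(trace_bytes), EVENT.size)]

# Two packed sample events standing in for what the daemon wrote:
raw = EVENT.pack(1, 0, 1000) + EVENT.pack(7, 1, 1042)
events = decode(raw)
```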


  • Forgive me if any of these are obvious -- I'm
    not trying to be sarcastic, it's just that some
    sysadmins don't know this yet:

    "top" shows you CPU load, memory usage, and
    usage per process -- there are many options in
    "top"; check out the man page for it.

    "pstree" shows a tree graph of all processes and
    who spawned what.

    "ps auwx" (or "ps -ef" in Solaris) shows all
    current processes

    "netstat" and "ifconfig -a" show network info
    such as errors, dropped packets, etc.

    Big Brother is a decent package for monitoring
    several servers at once. It generates a web page
    of colored lights (GIFs) indicating system load, web
    daemon status, email daemon status, ftp daemon
    status, etc.
  • Some months back there was an article, plus Tcl/Tk source code, that did some kind of monitoring & graphing. It did long-timeframe performance monitoring & graphing of some stats, and looked nicely extensible; /proc makes that easy.

    I'm guessing the author of the question (only a two-month wait? Wow, that new hardware really has sped things up.) has already perused the article on SCO releasing sar as open source. This sounds like what he needs, albeit in a command-line-only form (correct me if I'm wrong).

    Now what I want to see is a graphical heartbeat that looks a bit like cthugha (an oscilloscope on acid, for sound) and uses whatever system stats the operator deems appropriate as parameters for its graphics-generating equations. Now, if only I had studied math a bit harder, I might write it myself... I have the Father Guido Sarducci "5 Minute University" syndrome: "We'll teach you in five minutes everything you'll remember five years after you graduate."

  • Maybe I'm just too tired right now, but I did not detect any Heisenberg references in the threads I've read here on this topic. Trying to instrument a thread of execution is usually death for that thread. I despise the notion of "performance monitors".
    If you build it right, it will work, and any anomalies usually succumb to thoughtful analysis.
    Instrumenting a network at the level you guys are talking about can kill it, severely.
    Get ye a good sniffer, and learn to read its utterances.
  • In a slightly related note, Linux needs some new graphics libraries--GD is good, but it's not Excel. I have the distinct feeling GIMP
    is better suited to what we need. Sooner or later we won't have to jump to Excel to get quality graphs drawn.


    Uuuh, GNUPlot?

    I've not looked at Guppi, but my instinct is that if people want flashier graphs than GNUPlot can produce, the way to go is to extend GNUPlot.
    --
  • Set up a cron process to output the results of whatever /proc entries you desire into a CSV formatted logfile. Use the tools of your choice to sort through the mass of data. If you need something new, pop into the Linux source and add a /proc entry with whatever you like.
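    A minimal sketch of that cron-driven sampler in Python; the choice of /proc files and CSV columns is illustrative, and the cron side is just invoking this once per interval:

```python
import csv, io, time

# Sketch of a cron-driven sampler: each run appends one CSV row of
# selected /proc values to a log for later analysis. The fields and
# column choices here are illustrative, not canonical.

def sample_row(loadavg_text, meminfo_text):
    # /proc/loadavg looks like: "0.15 0.10 0.05 1/123 4567"
    load1, load5, load15 = loadavg_text.split()[:3]
    # /proc/meminfo lines look like: "MemFree:   123456 kB"
    mem = {}
    for line in meminfo_text.splitlines():
        key, rest = line.split(":", 1)
        mem[key] = int(rest.split()[0])
    return [int(time.time()), load1, load5, load15, mem.get("MemFree", 0)]

# In the real cron job these strings would come from open("/proc/loadavg")
# and open("/proc/meminfo"), and `out` would be open(logfile, "a").
row = sample_row("0.15 0.10 0.05 1/123 4567",
                 "MemTotal: 1000 kB\nMemFree: 400 kB")
out = io.StringIO()
csv.writer(out).writerow(row)
```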

    In a slightly related note, Linux needs some new graphics libraries--GD is good, but it's not Excel. I have the distinct feeling GIMP is better suited to what we need. Sooner or later we won't have to jump to Excel to get quality graphs drawn.


    Once you pull the pin, Mr. Grenade is no longer your friend.
  • SAR was one of the few things I really liked about working with a Sun box.
  • Guppi [gnome.org] will handle graphs for Gnumeric, the Gnome spreadsheet. Guppi has high ambitions, but I don't know if it's useful yet. The web page says it isn't, but web pages tend not to stay current for very long in the gnome world. At any rate, there are some more graphing resources listed on the Guppi page.

    Of course, if you do go with the GIMP, you could get some seriously sharp looking graphs for your extra effort.
    --

  • Why not have it read the data out of /proc? These files give you lots of information in a format that is easy to parse; most of the existing commands mangle these into a human-readable form which is a pain to parse.
  • I've been playing around with MRTG for a couple of weeks now; it's really geared towards monitoring routers, and any other use is really kludgy.

    What is this Cricket/RRD? I can't find any mention of it with web searches. Is it free? A web pointer, please!

  • :g/Sun/s//SCO/g
  • Unfortunately, that list appears not to be archived. This new list is archived automatically, and it has a distinct purpose: accumulate and sift through enough performance and reliability information to create a Performance-and-Reliability-HOWTO document.

    I'm not sure that these lists will overlap too much; the list I started is focused on system-administrator-level tools to monitor both the health and load of Linux systems. Its scope is larger than just performance tuning -- it's not only about tweaking systems to run benchmarks better, but about making sure you get notified when your systems go down.

    It's not about duplication of effort -- it's about a different perspective and different goals.

  • I also attended the performance and reliability BOF at Linux Expo. We need to have more communication about this topic in the Linux community. To that end, I've started a mailing list that aims to discuss these matters further.

    To join this list, send a message [mailto] consisting of the single word "subscribe" (in the message body, not the subject) to:

    linux-performance-request@lists.microstate.com.
    The first objective of this list is to gather enough information to build a performance and reliability HOWTO. Many of the attendees of the BOF are on this list. This list is still in its infancy, but I'm sure that the Slashdot effect will change that!
  • Firstly if you are serious about monitoring performance, forget SNMP.

    The best tool that I have ever seen is Performance Co-Pilot on SGIs. They recently demo'd this product at a Linux expo running on an SGI Visual Workstation running Linux and I believe they are heading towards open sourcing it (along with a lot of other SGI stuff).

    See http://www.sgi.com/software/co-pilot

    I have recently written my own tool for DEC Alphas, but it is primitive compared to SGI's tool. Monitoring multiple hosts simultaneously in real-time on the same chart/3D visualisation is non-trivial.

    My impression is that there is a good opportunity to add some good instrumentation to Linux using a consistent interface; someone has just got to do it. The other UNIXes suffer from insufficient instrumentation and a lack of public interfaces to get at the information.
  • By The Way: If you have Tcl installed, you have to say "man 5 proc". Otherwise you get the manpage for proc(n): "Create a Tcl procedure".
  • Have it run as a cron job; i.e., output these things to a file, possibly parsing the output with Perl or the basic UNIX tools. Or write a daemon in Perl. But in the latter case, you want to be careful that you don't create security holes.

    -- Donovan

  • Looks like SAR will be ported to Linux. I haven't used it much (I don't admin Solaris, just use it), but it's worth looking at.

  • While implementing a RAID filesystem under Linux, I was shocked to find out that iostat does not work under Linux. Earlier this morning Sun announced that they would be releasing sar under a Mozilla-style license, so that should be helpful. I believe that most of the information you are looking for is tracked; i.e., top pulls information from somewhere and should give at least a high-level overview of what is going on in the system.

    Lando
  • There already exists a linux performance mailing list.
    To subscribe to this list, send an email to the following address
    with "subscribe" in the body of the message:
    linux-perf-request@www-klinik.uni-mainz.de

    My colleague, Rich Pettit, has done a lot of work to add perf stats to the /proc filesystem, so let's not reinvent all this stuff.
  • Seeing as someone already mentioned Datametrics
    I might as well put in a (shameless) plug for our product, RAPS. Check out http://www.foglight.com for more details. Again, it's aimed at the enterprise level so it's not cheap but it has OS level monitoring as well as Sybase and Oracle agents, Netscape and Apache WWW server monitoring.
  • Anyone know of any tools to monitor both Linux and NT machines? I currently use Scotty, but NT's snmpd tells so little. I'd even be happy with an rstatd for NT.
  • He's talking about the OpenWin perfmon, which has the ability under Solaris to monitor all kinds of crap and even log said crap to a file, leaving the method of analysis up to your imagination.

    Take a look at procinfo.
  • I think that's incorrect. The Performance Monitor measures at whatever increment you would like.

    perfmon is actually an excellent and comprehensive utility that has some very nice features and is actually useful in the real world. I use it for database tuning, for example.

    I recommend the book (now some years old) on using perfmon. In retrospect it looks like the last hurrah of the VMS crowd before the Win95 mentality took over NT development.

    And the fact that perfmon gets no respect is Yet Another Reason Why
  • by Ripp (17047)
    I dunno....

    *TOP* maybe?

    pipe it to a file or something.


  • Yes I would. Why - would you not?!
  • Whatever it is, I hope it is smarter about sampling methods than perfmon.exe.

    Perfmon only really performs spot measurements of things like CPU utilization. It can't tell you the true average CPU utilization of a process over a 10-minute interval. It can just tell you the average of instantaneous CPU utilization at 0 minutes and 10 minutes. This bugs the hell out of me, especially since NT keeps a running count of execution time for each process.
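    The fix hinted at here is to difference that running count rather than average spot readings. A small sketch of the two methods in Python, with made-up numbers:

```python
# Two ways of estimating a process's average CPU utilization over a
# window. Spot averaging (what the comment says perfmon does) averages
# instantaneous readings; the cumulative method differences the running
# count of execution time that NT (and /proc) already keep per process.
# All numbers below are made up for illustration.

def spot_average(instantaneous_readings):
    return sum(instantaneous_readings) / len(instantaneous_readings)

def cumulative_average(cpu_seconds_start, cpu_seconds_end, wall_seconds):
    return 100.0 * (cpu_seconds_end - cpu_seconds_start) / wall_seconds

# Two quiet instants hide a bursty 10 minutes in between:
spot = spot_average([5.0, 20.0])
# The cumulative counter says the process burned 300 CPU-seconds
# over a 600-second window, so it really averaged 50%:
true_avg = cumulative_average(1200.0, 1500.0, 600.0)
```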

  • Why don't you wait for the SAR sources to be
    released and adopt them? SAR supposedly gives
    a lot of low-level statistics. If I remember
    correctly, some time back [OK, a long time back!]
    when I was writing an SNMP agent for Acer
    Server Manager, we used to make use of SAR's
    libraries on UnixWare to get some specific
    statistics for instrumentation.

    -Sas


  • The tool you're speaking of is the SE Toolkit, written by Adrian Cockcroft for Sun. It's implemented in tcl/tk. It uses stuff like vmstat, iostat, and mpstat to track system performance. It also uses some stuff in the Solaris kernel, so I don't see it being ported to Linux any time soon. It is very useful and does have its own language so you can write your own monitors, but it's not very easy for someone who has a limited knowledge of the Solaris kernel (especially since the code isn't available!). I wish it were available for Linux!
  • I dunno

    *SARCASM* maybe?

    Really, 'top' only scratches the surface of what NT's perfmon does... 'top' is a lot closer to NT's task manager.

    I love linux ... but ... NT has some very useful tools.

  • I like IBM's Performance Toolbox.
  • The company I work for just got a license for RAPS (by Foglight), and one of their reps said that they are planning on porting it to Linux.

    As it stands, RAPS runs just on Solaris (maybe NT?), and is very cool. I was skeptical at first, having seen a bunch of other crappy monitoring applications that I could write better myself, but this one really does a good job of presenting both low- and high-level stats in very digestible formats.
  • This was mentioned at the JavaOne conference yesterday. They mentioned that Rich Pettit was busily porting it to Linux. It seems like a very useful tool: http://www.sun.com/sun-on-net/performance/se3/
  • I was poking around for similar utilities today and ran across a rules-based monitor... text output and looks REALLY configurable... perfect for scheduling and collecting raw data for later analysis... check it out at http://www.blakeley.com/resources/ Please reply and let me know if this was any good. I may end up using it in the future and would appreciate any commentary you might have.
  • It makes me laugh to see this article [slashdot.org] appearing on the same page as this Ask /.

    Although, I presume no-one's ported it to Linux yet.

  • I thought 'monitor' was freely available for most *nix platforms!?!?

    Is this not true?
  • I have written a Perl script which interfaces to some command-line tools and outputs to a file which can be either directly read or browsed over the network. It was not very hard to do, and I would love to post it, but it is unfortunately covered under my IP agreement :(

    If you know Perl, just use its extensive regular expression matching with common monitoring tools... couple this all together, have it output to files, and BINGO!

    :) hope this helps :)
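    The regex-scraping approach suggested above looks like this in Python; the sample line mimics typical `uptime` output, but exact formats vary by system:

```python
import re

# Sketch of regex-scraping a command-line tool: pull the load averages
# out of uptime-style output. The sample line is made up; formats vary
# between systems, which is why the regex anchors on the label.
line = ("12:00:01 up 10 days, 3:04, 2 users, "
        "load average: 0.15, 0.10, 0.05")
m = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", line)
load1, load5, load15 = (float(g) for g in m.groups())
```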
  • Ummm, no. It's a freely available tool, or maybe I'm hallucinating when I run monitor on Solaris and IRIX.

    My only uncertainty was whether or not there was a Linux port... not whether I knew what monitor was...
  • Wow, "According to what I read on /. every day, NT is absolutely terrible, and Linux is better in every way.", that's really thinking for yourself isn't it?

    All I have to say is that they both have their merits. Would you go to (for example) a Christian conservative "Pro-Life" demonstration and just take everything they said as gospel?

    Thanks for thinking,
    Pad

  • An application that pumps SNMP and /proc data, etc. into a database every n seconds would be kewl.
    Then you could use SQL to search for certain criteria/data at a given period in time on a given machine.

    Does anything do this for Linux?
    I wish I had the time to do this; I suppose it would be worthwhile doing!?!
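    A sketch of such a logger using SQLite in Python (the table schema and metric names are made up), showing the kind of query this enables:

```python
import sqlite3, time

# Sketch of a metrics-to-database logger; schema and metric names are
# made up. Sampled values go into one table; SQL then answers
# "what was X on host Y around time T" after the fact.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE samples
                (ts INTEGER, host TEXT, metric TEXT, value REAL)""")

def record(host, metric, value, ts=None):
    conn.execute("INSERT INTO samples VALUES (?, ?, ?, ?)",
                 (ts if ts is not None else int(time.time()),
                  host, metric, value))

# Two made-up samples a minute apart:
record("web1", "cpu_util", 42.5, ts=1000)
record("web1", "cpu_util", 55.0, ts=1060)
rows = conn.execute("""SELECT AVG(value) FROM samples
                       WHERE host = 'web1' AND metric = 'cpu_util'
                         AND ts BETWEEN 1000 AND 1060""").fetchone()
```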

  • Just ftp to your favourite metalab (sunsite) mirror. Do mget *.lsm in pub/Linux/system/status and pub/Linux/system/status/xstatus then have a look and see what sounds nice.

    I use xperfmon++ for real-time monitoring - it looks a lot like the Windows perfmon tool, but does not have the capability to log to file (this would probably run to only a few lines of code if you wanted to add it). However, almost all the stats can be got from vmstat(8) and netstat(8) -i. Just write a few filters to get rid of the headers and stuff from the output, then paste(1) the results into one file and BINGO!

    Rodd
  • FreeBSD has "perfmon", the CPU performance
    monitoring interface. A majority of the Linux
    systems out there run on Intel CPUs. It is
    not hard to implement an interface for
    programming the CPU counters to measure
    a variety of events. However, work needs
    to be done to use the data so obtained
    in a meaningful manner. The CPU counters can
    be coupled with various system tools that give
    us system statistics. The bottom line:

    1. There are a few drivers and libraries
    for Linux that allow you to make use
    of CPU counters (at least for the x86).
    These don't seem to be much used.

    2. Like Intel's VTune tool (which pertains
    mainly to code optimization), a tool
    could be written for Linux that gives
    extensive performance statistics, and
    helps in optimizing code. The
    infrastructure for such a tool is already
    there, IMHO.

    3. For interested people, I could find
    (from some dust-laden hard disks :)
    pointers to related code and documents.
    I worked on this subject long back.
  • I am curious as to what version of Linux and what GUI (if any) you are running. I had an older version of Linux running a year or so ago that had
    an OpenWin GUI which came with everything you mentioned and more: disk usage, swapping, cpu, interrupts, etc. I had the same tools on my SPARC station under both OpenWin and CDE so I am assuming it comes with the GUI and not the OS.

    Does this help?
  • All those packages are great for real-time monitoring, but not for historical monitoring over a long period. Let's say I run a 6-hour-long benchmark. I now tweak some stuff in my driver code and re-run that 6-hour benchmark. I want to be able to compare everything about my machine state during the two runs. CPU util., interrupts/second, etc., etc.

    Plus, let's say that my system goes down during the test run. I want to be able to look at it the next day and determine at what point that happened and what the state of the machine was right around the time it happened, b/c that might help me track down what happened.


    Deepak Saxena
    Project Director, Linux Demo Day '99

  • I'm the one who posted this question [about two months ago! :O ], and have since then done some more research and asked around at the Performance BOF @ Linux Expo, and it didn't seem like there was anything that provided me with everything I'm looking for. So... in the spirit of open source I've decided to write my own. The basic idea is to have an agent running on each machine you want to monitor, and either a gtk- or newt-based UI on the machine you're sitting at. Email me if you're interested in more info or helping.

    - Deepak
    Deepak Saxena
    Project Director, Linux Demo Day '99
  • Maybe I'm just too tired right now, but I did not detect any Heisenberg references in the threads I've read here on this topic. Trying to instrument a thread of execution is usually death for that thread. I despise the notion of "performance monitors". If you build it right, it will work, and any anomalies usually succumb to thoughtful analysis. Instrumenting a network at the level you guys are talking about can kill it, severely. Get ye a good sniffer, and learn to read its utterances.

    Intuitively it would seem that measuring threads in great detail would distort the measurement. However, performance registers onboard CPUs (I'm thinking Intel at the moment) allow one to monitor certain aspects of threads with almost no overhead whatsoever. It's perfectly feasible and desirable to monitor code at a fine-grained level.

    There are many performance anomalies that can't necessarily be identified at the design level (for example pipeline flushing and cache problems). These detailed measurements can tell you much more about your program's behavior than the standard profiler can.
  • by RichP (60052)
    I wrote the SE Toolkit and I can tell you this: I've done a port of the interpreter to Linux and integrated some base classes that represent performance data that I've gathered from a new /proc source, /proc/perfstat (which I also developed). The interpreter and the SE Toolkit as it currently exists will never be released for Linux. It is pointless. The base classes that I wrote in C++ are LGPL and tools can be written directly on top of C++ instead of crippling the development effort by bolting the work on top of an interpreter for a subset of C++.

    I've made the patches to the kernel for my perfstat modifications available as well as the class library for reading the perfstat file. I'm in the process of getting an ftp server set up where I can put these changes. I will also make available a white paper I wrote on my efforts.

    I will get this done as soon as possible, but I've got a lot of other work to do that monopolizes my time.

    Rich
    ---- Richard Pettit
    ---- Performance Architect
    ---- Foglight Software, Inc.
  • At present, there is a paucity of information collected by the kernel, especially for disk I/O. However, patches exist for local disks:

    ftp://ftp.uk.linux.org/pub/linux/sct/fs/profiling

    and for nfs:

    ftp://ftp.sce.carleton.ca/pub/rads/iostat-2.0.34.tgz

    Please see the linux-perf mailing list for more information. Send subscribe requests to

    linux-perf-request@www-klinik.uni-mainz.de
  • by suso (153703)
    SNMP (Simple Network Management Protocol) paired with something like MRTG or Cricket/RRD will do what you want.
  • Cricket/RRD Tool [munitions.com]
    MRTG [ee-staff.ethz.ch]
    UCD SNMP for Linux [ucdavis.edu]

    MRTG is kind of a bear to work with for monitoring stuff other than a router, but it can be done. For an example you can check out my suso.org stats page [suso.org]. Look on the left side.
