KCachegrind. We'll see how gprof and KCachegrind lie to us and why they do so, discuss the limits within which we can trust them nonetheless, and attempt to draw more general conclusions about profilers and profile visualization tools.

In case your programming vocabulary is different from mine: I use "lying" in its dispassionate meaning of "communicating falsehoods", and not to convey negative judgement. On the contrary, I'm indebted to the authors of profiling tools, both as a user and as a developer of such tools.

So, consider a program with two parts: an "easy" part and a "hard" part. Both parts do similar work, but one does much more of it than the other. Each calls void work(int n), where work is a do-nothing loop. We therefore expect main to spend most of its time in hard and only a tiny fraction in easy.

Now let's profile the program with gprof: compile it with gcc -pg -o try try.c, run it, and run gprof on the result. On my machine, the output (with columns such as self and total seconds) attributes the same time to easy as to hard.

What happened? Can we trust anything that gprof says? Which parts of its output are entirely wrong, like this "easy is the same as hard" business, and which parts are roughly correct, give or take a measurement error? To answer this, we need to briefly discuss how gprof works.

Roughly, gprof's two sources of information are profil and mcount. profil (a cousin of creat, in that it could have been spelled with an "e" as well) updates an instruction address histogram every 10 milliseconds. That is, 100 times a second the OS checks which instruction the program is executing and increments a counter corresponding to that instruction. So the share of increments corresponding to a function's body is proportional to the share of time the program spent in that function.
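For reference, the two-part program described above might look like the following minimal sketch. The constant names EASY_ITERS and HARD_ITERS are mine, and the count of 1000 iterations for the easy part is an assumption; only the shape (easy and hard both calling a do-nothing work loop) comes from the text.

```c
/* try.c: a sketch of the two-part program, not necessarily the exact original. */

enum {
    EASY_ITERS = 1000,               /* assumed iteration count for the easy part */
    HARD_ITERS = 1000 * 1000 * 1000  /* a billion iterations for the hard part */
};

/* A do-nothing loop; volatile keeps the compiler from optimizing it away. */
void work(int n) {
    volatile int i = 0;
    while (i++ < n)
        ;
}

/* Both parts do the same kind of work, but hard does far more of it. */
void easy(void) { work(EASY_ITERS); }
void hard(void) { work(HARD_ITERS); }
```

A main that calls easy() and then hard() completes the program. Built with gcc -pg, the binary writes a gmon.out file when it exits, and gprof reads that file together with the executable.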
mcount supplies the call counts. Specifically, when a function compiled for profiling is entered, it calls mcount to record a call to itself from the caller. The caller is generally easy to identify, because it necessarily passes a return address to your function, and that address points right into the caller's body. So if f calls g several times, mcount records exactly how many.

With this in mind, we can roughly tell what gprof knows. Specifically, it knows that:

easy and hard were both called once; work, called from each, ran twice. This info is from mcount and it's 100% reliable.

The program spent almost no time in the code of easy and hard, and most of its time in the code of work. This info is from profil and it's rather reliable: the program ran for 3 seconds, which means we had about 300 histogram increments. If almost all of these increments are in work, that's significant enough.

What about the share of time easy spent in its call to work, and the share of time hard spent in work? By now we know that gprof knows absolutely nothing about this. So it guesses, by taking the roughly 3 seconds spent in work in total and dividing them by the 2 calls to work, which credits easy and hard with the same amount each. This shows how bad results can be produced from perfectly good measurements if they are passed to the wrong algorithm.

More generally, gprof's output falls into the following categories, listed in decreasing order of reliability:

Number of calls: 100% reliable (I think; please correct me if I'm wrong).

Self seconds in the flat profile (time spent in a given function, not including children): reliable to the extent that 100 samples per second is statistically sufficient for the run in question.

Seconds attributed to call graph edges (contribution of children to parents, total runtime spent in self+children, etc.): unreliable. Only trust it if there's zero code reuse in a given case (that is, f is only called by g), or if the function in question is known to take about the same time regardless of the call site (for example, rand).

BTW, the fact that gprof lies doesn't mean that its documentation does. On the contrary, man gprof says, in the BUGS section: "The granularity of the sampling is shown, but remains statistical at best."
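The guess gprof makes for call graph edges, as described above, boils down to one line of arithmetic. The sketch below is only an illustration of that proportional split, not gprof's actual source code; the function name attributed_seconds and its parameters are hypothetical.

```c
/* Hypothetical helper: estimate how much of a callee's total time to charge
 * to one caller, knowing only the callee's total time and the call counts.
 * Every call is assumed to cost the same, which is exactly the assumption
 * that goes wrong when easy and hard pass very different arguments. */
double attributed_seconds(double callee_total_seconds,
                          long calls_from_caller,
                          long calls_total) {
    if (calls_total == 0)
        return 0.0;  /* nothing was called, so nothing to attribute */
    return callee_total_seconds * (double)calls_from_caller
                                / (double)calls_total;
}
```

With the numbers from the example, attributed_seconds(3.0, 1, 2) gives 1.5 for easy and the same 1.5 for hard, even though nearly all of the 3 seconds really belong to hard's call.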
The BUGS section goes on: "We assume that the time for each execution of a function can be expressed by the total time for the function divided by the number of times the function is called. Thus the time propagated along the call graph arcs to the function's parents is directly proportional to the number of times that arc is traversed."

Unfortunately, users tend to read tools' output without reading the documentation. Whether users who aren't into profiling tools can grasp the implications of this passage is a separate question.

The man page also refers to papers from the early 1980s. An age of over three decades is a good reason to cut a program some slack. In a way, gprof's age is a source of its limitations, such as taking only 100 samples per second.

Now let's look at a more modern profiler called callgrind, a valgrind plugin. Being more modern, callgrind has a few advantages over gprof, such as not lying in its call graph (though some would debate that, as we'll see) and coming with a GUI program for visualizing its output, called KCachegrind. KCachegrind, the viewer (as opposed to callgrind, the measurements collector), does lie in its call tree (as opposed to its call graph), as we'll shortly observe.

But first, let's have a look at its truthful reporting of the situation with easy being easier than hard. easy isn't even shown in the graph, because KCachegrind hides things with insignificant cost. You can, however, see the cost of main's calls to easy and hard in the source view, and indeed easy is 1000x cheaper than hard.

Why 1000x and not 1,000,000x? Because I changed hard to run a million iterations instead of a billion, bringing the difference down to 1000x. Why did I do that? Because callgrind is slow: it's based on Valgrind, which is essentially a processor simulator. This means that you don't measure time. You measure things like instructions fetched and cache misses (which are interesting in their own right), and you get an estimate of the time the program should take given these numbers and your processor model. It also means callgrind is slow.
Is it slower than gprof? Not necessarily. With gprof, code runs at near-native speed, but you only get 100 data points per second. With callgrind you get many more data points per second. So for a hairy program, callgrind gives you statistically significant data more quickly, and is effectively faster. But for a simple program with just a couple of hot spots, callgrind is slower: if the program has a costly part 1 and then a costly part 2, it'll take callgrind more time to even get to part 2, whereas gprof, with its near-native speed, will give you good enough data from its fast run.

So much for speed. Now let's look at a case where KCachegrind lies to us, and then we'll discuss why it happens. To expose the lie, we'll need a more complicated program. We'll achieve the required complexity by having two worker functions instead of one, and then adding a manager: a function that does nothing except call the two workers with the number of iterations to run.

How does the manager decide the number of iterations each worker should run? Based on the project requirements, of course. Our "projects" will be two more functions, each calling the manager with its own pair of iteration numbers for the two workers. Both workers work on both projects, but each project is mostly done by one of the workers, the other contributing only a tiny fraction of the effort.

Now let's see what KCachegrind says about this. We need to run callgrind, which can be done without special compilation flags: build with plain gcc, run the program under valgrind --tool=callgrind, and open the resulting output file in KCachegrind.

In KCachegrind's window, the bottom part of the screen shows us truths, but the top part shows falsehoods. The truth is that each project called the manager once, the manager called each worker twice, and each worker did half the work in the program; all of this is shown in the call graph at the bottom. However, at the top, each of the project functions occupies half the window, and the worker costs displayed inside them do not match what actually happened.
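The more complicated program described above (two workers, a manager, two projects) could be sketched roughly as follows. The iteration counts and the names SMALL_ITERS and BIG_ITERS are assumptions chosen so that each project is dominated by one worker while the two workers do equal work overall; the original figures may differ.

```c
/* A sketch of the two-worker program, under assumed iteration counts. */

enum {
    SMALL_ITERS = 1000,        /* assumed: the minor worker's share of a project */
    BIG_ITERS   = 1000 * 1000  /* assumed: the major worker's share of a project */
};

/* Two identical do-nothing loops; volatile prevents them being optimized away. */
void worker1(int n) { volatile int i = 0; while (i++ < n) ; }
void worker2(int n) { volatile int i = 0; while (i++ < n) ; }

/* The manager does nothing except hand each worker its iteration count. */
void manager(int n1, int n2) {
    worker1(n1);
    worker2(n2);
}

/* Each project is mostly done by one worker; across both, the workers tie. */
void project1(void) { manager(SMALL_ITERS, BIG_ITERS); }
void project2(void) { manager(BIG_ITERS, SMALL_ITERS); }
```

A main that calls project1() and then project2() completes it. Because every call passes through manager, a viewer that only knows caller and callee pairs cannot tell which worker cost belongs to which project.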