Blog

DigitalRune Helper Library: Hierarchical Profiler

Mar 15

Written by:
Tuesday, March 15, 2011  RssIcon


The new version of the DigitalRune Helper Library (DigitalRune.dll, which is part of DigitalRune Physics Bundle) has not 1 but 2 profiler classes. The “simple profiler” class was introduced in the previous blog post. In this blog post, we talk about the second profiler, which is a “hierarchical profiler”. It was inspired by the article Real-Time Hierarchical Profiling (Game Programming Gems 3).

Do we really need another profiler class? – Yes, really! This class has slightly different goals. The main goal is to measure the timing of nested method calls. This is extremely helpful in many applications – in particular if you need to optimize your game loop. An important difference between the simple profiler and the hierarchical profiler is that the hierarchical profiler does not work across multiple threads. That’s why we still need the simple profiler. The simple profiler can be used to collect values and to profile code in multi-threaded scenarios.

Overview

The HierarchicalProfiler class can be found in the namespace DigitalRune.Diagnostics. Here is a class diagram of the relevant classes.

image

Unlike the Profiler class, the HierarchicalProfiler is not a static class. You need to create a new instance of the HierarchicalProfiler, and this instance must not be shared across threads.

Time is measured with HierarchicalProfiler.Start() and Stop(). This profiler allows nested time measurement – even for recursive calls! If Start(“Foo”) is called twice, without calling Stop() the profiler treats this as a recursive method call. If Start(“Foo0”) and then Start(“Foo1”) is called, the profiler assumes that Foo1 is a method called by Foo0. Thus, it builds a tree of profiler data. The root of the tree is HierarchicalProfiler.Root. Each node in the tree is a HierarchicalProfilerNode and has other child nodes.

The method HierarchicalProfiler.NewFrame() must be called when a new frame of the game starts (= at the beginning of the game’s main loop, or at beginning of Game.Update() in XNA).

I guess it gets clearer with an example…

Example

The following example “simulates” a game. Methods called Update() and Draw() are called in a loop. The Update method calls other methods to compute Physics and AI.

// The compilation symbol "DIGITALRUNE_PROFILE" must 
//
be defined to activate profiling.
#define
DIGITALRUNE_PROFILE

using System;
using System.Threading;
using
DigitalRune.Diagnostics;

namespace
ProfilingTest
{
 
class Program

{
   
// The profiler instance.
    private static HierarchicalProfiler _profiler = new HierarchicalProfiler("MyProfiler"
);

   
static void Main(string
[] args)
{
     
// Start profiling.
      _profiler.Reset();

     
// This represents the main-loop of a game.
      for (int
i = 0; i < 10; i++)
{
       
// NewFrame() must be called when a new frame of the game begins.
        _profiler.NewFrame();

        Update();
        Draw();
      }

     
// Write the profiler data to the console. We start at the root node and include
      // up to 5 child levels.
      Console
.WriteLine(_profiler.Dump(_profiler.Root, 5));
    }

   
private static void
Update()
{
      _profiler.Start(
"Update"
);

      Physics();
      AI();
      AI();
      AI();
     
Thread
.Sleep(1);

      _profiler.Stop();
    }

   
private static void
Physics()
{
      _profiler.Start(
"Physics"
);

     
Thread
.Sleep(6);

      _profiler.Stop();
    }

   
private static void
AI()
{
      _profiler.Start(
"AI"
);

     
Thread
.Sleep(3);

      _profiler.Stop();
    }

   
private static void
Draw()
{
      _profiler.Start(
"Draw"
);

     
Thread.Sleep(4);

      _profiler.Stop();
    }
  }
}

Here is the console output:

MyProfiler.Root 20 frames 20.286ms/frame 405.721ms total
  Update 79.8% 16.198ms/frame 1 calls/frame (15.99ms/16.198ms/18ms)
    Physics 29.6% 6.003ms/frame 1 calls/frame (5.367ms/6.003ms/6.356ms)
    AI 45.1% 9.145ms/frame 3 calls/frame (2.389ms/3.048ms/5.107ms)
    Other 5.2% 20.977ms 1.049ms/frame
  Draw 19.6% 3.979ms/frame 1 calls/frame (3.514ms/3.979ms/4.683ms)
  Other 1.2% 4.919ms 245.97µs/frame

The dump uses a compacted format. The percentage values show the time of a node relative to the root of the dump. The values in round brackets are Minimum/Average/Maximum times. 'Other' represents the time of a node that was not measured by a child node. (If you don’t like the output format, you can traverse the tree manually, starting at the Root node, and collect the data that is relevant in your particular case.)

The output tells us a few things:

  • The mainloop of the game was executed 20 times. The average frame time is ~20ms. If we target a 60 fps game (= 16.6 ms/frame), we need to do some optimization.
  • 79.9% of the whole game time is spent in Update(). Drawing accounts for 19.6% of the game time.
  • AI uses 45.1% of the whole game time and is called 3 times per frame.
  • Etc.


Your name:
Gravatar Preview
Your email:
(Optional) Email used only to show Gravatar.
Your website:
Title:
Comment:
Security Code
CAPTCHA image
Enter the code shown above in the box below
Add Comment   Cancel