[Verse-dev] profiling the server

Brecht Van Lommel blendix at pandora.be
Thu Oct 14 21:04:21 CEST 2004


Hey all,

I profiled the server with gprof, to learn some more about it's
performance. I uploaded a bitmap to the server (and downloaded as well,
since the app also subscribes to the layers). Attached is the most
important part of output, with some interesting results.

Most surprising is that 'callback_send_packet_ack' is taking up 2/3 of
all running time!

I figured out why: the sent-but-not-yet-acked command bufs are stored in
queue->history. They are hashed by their address sum, for quickly
finding identical command buffers (needed for event compression). But
looking up all command bufs with a given packet id is really slow if
queue->history gets big. The command buf are randomly spread over the
array.

A solution would be to have an extra hash, for quick packet id based
lookups.

'callback_send_b_layer_create' is also taking a suspiciously long time.
Using calloc instead of malloc (and a for loop) lets this function's
executing time drop from 0.26 to 0.00 in the profile's output. This is
not portable for floats though, since zero bytes don't necessarily mean
a 0.0 float value.

'callback_send_b_tile_set' can also be optimized by using memcpy instead
of manual copying in a for loop. (0.12 -> 0.01).

Using memcpy, strcpy, memcmp, etc. instead of for loops could help
improve performance in many functions, while even making them more
readable.

Cheers,
Brecht.
-------------- next part --------------
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 67.63      3.99     3.99      700     5.70     5.70  callback_send_packet_ack
 12.88      4.75     0.76   154900     0.00     0.00  v_cmd_buf_compare
  4.41      5.01     0.26        4    65.00    65.03  callback_send_b_layer_create
  3.05      5.19     0.18    37403     0.00     0.03  v_noq_send_buf
  2.37      5.33     0.14    37394     0.00     0.01  verse_send_b_tile_set
  2.20      5.46     0.13    37269     0.00     0.00  v_e_encrypt_command
  1.53      5.55     0.09     7426     0.01     0.03  v_noq_send_queue
  1.36      5.63     0.08    37403     0.00     0.00  v_cmd_buf_set_address_size
  1.02      5.69     0.06   675525     0.00     0.00  vnp_raw_pack_uint8
  1.02      5.75     0.06    18644     0.00     0.04  callback_send_b_tile_set
  0.68      5.79     0.04    38281     0.00     0.00  v_cmd_buf_allocate
  0.51      5.82     0.03    18644     0.00     0.04  v_unpack_b_tile_set
  0.17      5.83     0.01   336949     0.00     0.00  vnp_raw_unpack_uint8
  0.17      5.84     0.01   149586     0.00     0.00  vnp_raw_pack_uint16
  0.17      5.85     0.01    59268     0.00     0.00  vnp_raw_unpack_uint32
  0.17      5.86     0.01    38982     0.00     0.00  v_con_get_network_queue
  0.17      5.87     0.01    19358     0.00     0.00  v_fs_get_user_data
  0.17      5.88     0.01    10719     0.00     0.00  v_n_receive_data
  0.17      5.89     0.01     1580     0.01     0.01  v_niq_get
  0.17      5.90     0.01      878     0.01     0.01  v_e_dencrypt_data_packet

 %         the percentage of the total running time of the
time       program used by this function.

cumulative a running sum of the number of seconds accounted
 seconds   for by this function and those listed above it.

 self      the number of seconds accounted for by this
seconds    function alone.  This is the major sort for this
           listing.

calls      the number of times this function was invoked, if
           this function is profiled, else blank.
 
 self      the average number of milliseconds spent in this
ms/call    function per call, if this function is profiled,
	   else blank.

 total     the average number of milliseconds spent in this
ms/call    function and its descendents per call, if this 
	   function is profiled, else blank.

name       the name of the function.  This is the minor sort
           for this listing. The index shows the location of
	   the function in the gprof listing. If the index is
	   in parenthesis it shows where it would appear in
	   the gprof listing if it were to be printed.



More information about the Verse-dev mailing list