[Verse-dev] profiling the server
Brecht Van Lommel
blendix at pandora.be
Thu Oct 14 21:04:21 CEST 2004
Hey all,
I profiled the server with gprof, to learn some more about it's
performance. I uploaded a bitmap to the server (and downloaded as well,
since the app also subscribes to the layers). Attached is the most
important part of output, with some interesting results.
Most surprising is that 'callback_send_packet_ack' is taking up 2/3 of
all running time!
I figured out why: the sent-but-not-yet-acked command bufs are stored in
queue->history. They are hashed by their address sum, for quickly
finding identical command buffers (needed for event compression). But
looking up all command bufs with a given packet id is really slow if
queue->history gets big. The command buf are randomly spread over the
array.
A solution would be to have an extra hash, for quick packet id based
lookups.
'callback_send_b_layer_create' is also taking a suspiciously long time.
Using calloc instead of malloc (and a for loop) lets this function's
executing time drop from 0.26 to 0.00 in the profile's output. This is
not portable for floats though, since zero bytes don't necessarily mean
a 0.0 float value.
'callback_send_b_tile_set' can also be optimized by using memcpy instead
of manual copying in a for loop. (0.12 -> 0.01).
Using memcpy, strcpy, memcmp, etc. instead of for loops could help
improve performance in many functions, while even making them more
readable.
Cheers,
Brecht.
-------------- next part --------------
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
67.63 3.99 3.99 700 5.70 5.70 callback_send_packet_ack
12.88 4.75 0.76 154900 0.00 0.00 v_cmd_buf_compare
4.41 5.01 0.26 4 65.00 65.03 callback_send_b_layer_create
3.05 5.19 0.18 37403 0.00 0.03 v_noq_send_buf
2.37 5.33 0.14 37394 0.00 0.01 verse_send_b_tile_set
2.20 5.46 0.13 37269 0.00 0.00 v_e_encrypt_command
1.53 5.55 0.09 7426 0.01 0.03 v_noq_send_queue
1.36 5.63 0.08 37403 0.00 0.00 v_cmd_buf_set_address_size
1.02 5.69 0.06 675525 0.00 0.00 vnp_raw_pack_uint8
1.02 5.75 0.06 18644 0.00 0.04 callback_send_b_tile_set
0.68 5.79 0.04 38281 0.00 0.00 v_cmd_buf_allocate
0.51 5.82 0.03 18644 0.00 0.04 v_unpack_b_tile_set
0.17 5.83 0.01 336949 0.00 0.00 vnp_raw_unpack_uint8
0.17 5.84 0.01 149586 0.00 0.00 vnp_raw_pack_uint16
0.17 5.85 0.01 59268 0.00 0.00 vnp_raw_unpack_uint32
0.17 5.86 0.01 38982 0.00 0.00 v_con_get_network_queue
0.17 5.87 0.01 19358 0.00 0.00 v_fs_get_user_data
0.17 5.88 0.01 10719 0.00 0.00 v_n_receive_data
0.17 5.89 0.01 1580 0.01 0.01 v_niq_get
0.17 5.90 0.01 878 0.01 0.01 v_e_dencrypt_data_packet
% the percentage of the total running time of the
time program used by this function.
cumulative a running sum of the number of seconds accounted
seconds for by this function and those listed above it.
self the number of seconds accounted for by this
seconds function alone. This is the major sort for this
listing.
calls the number of times this function was invoked, if
this function is profiled, else blank.
self the average number of milliseconds spent in this
ms/call function per call, if this function is profiled,
else blank.
total the average number of milliseconds spent in this
ms/call function and its descendents per call, if this
function is profiled, else blank.
name the name of the function. This is the minor sort
for this listing. The index shows the location of
the function in the gprof listing. If the index is
in parenthesis it shows where it would appear in
the gprof listing if it were to be printed.
More information about the Verse-dev
mailing list