Diagnosing response time delay using RubyRun trace

Quite a few customers use RubyRun to diagnose response time problems, including in code that they didn't write but rather bought and integrated. Since they don't know the code, they have no way of knowing where to insert extra code to break down the response time in trace form, as other performance tools on the market often require. RubyRun can trace a transaction as is, with no need for any additional code.

Based on observations of how real Ruby shops use RubyRun to optimize code paths and fix delay hotspots, I think a ‘methodology’ can be used for pathlength measurement with a tracing tool like RubyRun.

First, go after ‘delay’ before ‘CPU’ tuning. CPU tuning is always tough, because high CPU usage is normally caused either by a high number of methods being called or by some very computation-bound method, whether legitimately or due to a bug. The fact is, customers see delay, and only capacity planners and administrators worry about CPU utilization. In a truly scalable software system, delay due to CPU capacity can always be resolved by upgrading the hardware.

Here is what I think the steps should be:

Base Run:
1. Instrument only the controller classes in APP_PATHS
2. Turn off trace
3. Turn off class reload (if you are using Rails, look at the cache_classes setting in config/environments/development.rb)
4. Start server
5. Run a bunch of popular requests, each multiple times (5+), to average out the initial hits (to achieve the so-called "steady state")
6. Pick the top 5 slowest requests from the performance summary report to work on, one set at a time (unless you are actually OK with the response time even though it appears to be long)
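
As a rough illustration of steps 5 and 6, here is a minimal Ruby sketch that averages out the initial hits and ranks requests by their steady-state time. Everything in it is an assumption made for illustration: the request names, the timings, the WARMUP_HITS value and the helper names are hypothetical and are not part of RubyRun. In practice the numbers would be read off the performance summary report.

```ruby
# Hypothetical response times in seconds, in the order the hits were made.
# Request names and numbers are made up; they are not RubyRun output.
WARMUP_HITS = 2 # assumption: the first two hits still carry one-time startup cost

# Average of the hits that remain after dropping the warm-up hits.
def steady_state_average(timings, warmup = WARMUP_HITS)
  steady = timings.drop(warmup)
  return nil if steady.empty?
  steady.sum / steady.size
end

# Returns [name, average] pairs for the slowest requests, slowest first.
def slowest_requests(samples, top_n = 5)
  samples.map { |name, timings| [name, steady_state_average(timings)] }
         .reject { |_, avg| avg.nil? }
         .sort_by { |_, avg| -avg }
         .first(top_n)
end

samples = {
  "orders#index"    => [2.40, 0.95, 0.30, 0.32, 0.31],
  "orders#show"     => [1.10, 0.40, 0.12, 0.11, 0.13],
  "reports#monthly" => [5.00, 3.20, 2.90, 3.10, 3.00],
  "search#query"    => [1.80, 0.90, 0.85, 0.80, 0.90],
  "users#edit"      => [0.90, 0.20, 0.08, 0.07, 0.09],
  "home#index"      => [0.70, 0.10, 0.05, 0.05, 0.05]
}

slowest_requests(samples).each do |name, avg|
  puts format("%-16s %.3f s", name, avg)
end
```

Dropping the first hits before averaging keeps one-time costs such as class loading from making a request look slower than it is once the server has warmed up.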

Analyze each request as below:

Analysis:
1. Expand the instrumentation scope incrementally in APP_PATHS, or in INCLUDE_HASH if you wish
2. Turn trace on
3. Start server
4. Run the selected request 3 times
5. Ignore the total response time in the Performance Report since the overhead due to tracing is also counted
6. Look at the trace of the third request
7. Look only at the timings of methods and SQL calls. Also check that the flow is the intended one
8. Expand (INCLUDE, APP_PATHS) or shrink (EXCLUDE) the instrumentation scope to make things easier to see
9. Change code if appropriate
10. Go back to step 1 of Analysis and repeat until you are happy
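
To make step 7 concrete, here is a small Ruby sketch that ranks the timings of one traced request and totals them by kind. It assumes the method and SQL timings of the third request have been copied by hand into a simple list. The TraceEntry structure, its field names and the sample values are hypothetical; this is not RubyRun's trace format. It also assumes the timings do not overlap (each entry excludes the time spent in the calls it makes).

```ruby
# Hypothetical entries transcribed from the trace of the third request.
# Field names and values are illustrative only, not RubyRun's trace format.
TraceEntry = Struct.new(:kind, :name, :ms)

# The entries that took the longest, slowest first.
def hotspots(entries, top_n = 3)
  entries.sort_by { |e| -e.ms }.first(top_n)
end

# Total elapsed time per kind of entry (:method or :sql).
def time_by_kind(entries)
  entries.group_by(&:kind).transform_values { |group| group.sum(&:ms) }
end

entries = [
  TraceEntry.new(:method, "OrdersController#index",    12.0),
  TraceEntry.new(:sql,    "SELECT * FROM orders",      180.0),
  TraceEntry.new(:method, "Order#total",               45.0),
  TraceEntry.new(:sql,    "SELECT * FROM line_items",  95.0),
  TraceEntry.new(:method, "OrdersHelper#format_price", 3.0)
]

hotspots(entries).each { |e| puts format("%-7s %-28s %7.1f ms", e.kind, e.name, e.ms) }
time_by_kind(entries).each { |kind, ms| puts format("%-7s total %7.1f ms", kind, ms) }
```

Comparing the per-kind totals gives a quick hint about whether the delay sits mostly in SQL calls or in application methods before you change any code.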

If you guys find a better methodology, please let us know!
 