
Data normalization when control has multiple values in CS/CE

Problem Detail: 

(This question was originally posted at but was considered too specific for Academia.)

In computer science/engineering research papers on performance improvement, execution time is often normalized to control benchmarks (i.e., speedup plots). How is that normalization generally implemented when each control is executed multiple times to reduce the effect of other processes, interrupts, etc. taking processor time and skewing the results? My first thought was to use the minimum time achieved for each benchmark. However, that would probably not be the minimum possible execution time in most cases, and depending on the situation, overall execution time may matter more than best-case execution time. Is it better to accept the skew and use the means? Or is it better to use the median of each set of results?

Asked By : JAB
Answered By : D.W.

One simple approach is to use the median. It is less sensitive to outliers than the mean, but still serves as a good summary of typical running time. Alternatively, you could use the mean, but in that case you should inspect all the results yourself to check for outliers. I personally would recommend the median.

Papers often show confidence intervals as well, so that you can assess the effect of statistical noise.
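One way such an interval could be computed is a percentile bootstrap over the repeated runs. The sketch below is only an assumption about how one might do it, not a description of what any particular paper does; the function name `bootstrap_speedup_ci`, its parameters, and the timings are all hypothetical.

```python
import random
from statistics import median


def bootstrap_speedup_ci(control_times, optimized_times,
                         n_resamples=10_000, confidence=0.95, seed=0):
    """Percentile-bootstrap interval for median(control) / median(optimized)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    ratios = []
    for _ in range(n_resamples):
        # Resample each set of runs with replacement, keeping the sample size.
        c = rng.choices(control_times, k=len(control_times))
        o = rng.choices(optimized_times, k=len(optimized_times))
        ratios.append(median(c) / median(o))
    ratios.sort()
    tail = (1.0 - confidence) / 2.0
    lower = ratios[int(tail * n_resamples)]
    upper = ratios[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper


# Hypothetical timings in seconds, not measured data.
control = [1.00, 1.02, 0.98, 1.01, 1.03, 0.99, 1.00, 1.05]
optimized = [0.80, 0.82, 0.79, 0.81, 0.80, 0.83, 0.78, 0.80]

low, high = bootstrap_speedup_ci(control, optimized)
print(f"95% interval for the speedup: [{low:.3f}, {high:.3f}]")
```

An interval like this only reflects run-to-run variation of the same binaries on the same machine, so it says nothing about variation caused by recompiling or changing the environment.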

Normally, benchmarks are run on an isolated system to minimize interference from other tasks.

Note that there are many challenges with benchmarking. Even simple, seemingly irrelevant changes can cause significant changes in performance, e.g., because they happen to change cache alignment to something that is randomly better or worse. Therefore, it can be difficult to tell whether a 3% performance improvement occurred because your optimization is a genuine improvement or whether it is just randomness (e.g., merely recompiling with slightly different settings can change the running time by +/- 3%, and you may have been unlucky in one measurement and lucky in the other).

Best Answer from StackOverflow
