Tutorial 4 - Performance Measurements
Submission process:
- Submission deadline is December 12, 14:00 CET (before the lecture).
- Commit and push your solution as a single notebook file via git as ./tutorial/tutorial4/tutorial4.ipynb. Please make sure to use the correct subfolder and filename, since the submission is denied otherwise.
- During the first lecture after the deadline we will discuss a sample solution in class.
- Afterwards, you have time until December 19, 14:00 CET to submit a corrected version of your submission:
- Rework your solution according to our discussion in class.
- Commit and push the corrected version as a single file via git as ./tutorial/tutorial4/tutorial4.ipynb. Please make sure to use the correct filename, since the submission is denied otherwise.
Remarks:
- Grading is done based on both versions of your submission.
- If the first submission is missing, your submission will not be graded.
- If the second submission still contains major flaws after revision, no more than half of the credits for this tutorial can be achieved.
- A sample solution may be provided after December 19, 14:00 CET.
- Please use acn@net.in.tum.de for questions regarding lecture, tutorial, and project of ACN.
Open vSwitch (3.0 credits)
The plot below shows the throughput of Open vSwitch on a Linux machine configured for unidirectional forwarding. The hardware used for this measurement was a Xeon E3-1230v2 clocked at 3.3 GHz and a 10 Gbit Intel X520-T2 NIC.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

def plot_throughput(pkt_sizes, tp_in_gbps):
    if len(pkt_sizes) != len(tp_in_gbps):
        print("number of x and y values is not the same")
        return
    plt.figure(3, figsize=(20, 10))
    plt.plot(pkt_sizes, tp_in_gbps)
    plt.axis((0, 1550, 0, 20))
    plt.xlabel('Packet size [Byte]')
    plt.ylabel('Throughput [Gbit/s]')
    plt.show()
packet_sizes = [64, 128, 256, 384, 512]
throughput_in_gbps = [1.256, 2.2259, 4.128, 6.04384, 7.91616]
plot_throughput(packet_sizes, throughput_in_gbps)
%matplotlib inline
packet_sizes_extrapolated = [
    64, 128, 256, 384, 512, 768, 1024, 1280, 1500
]
throughput_in_gbps_extrapolated = [
    1.256, 2.2259, 4.128, 6.04384, 7.91616
    # begin insert code
    # The throughput increases linearly until the line rate of the 10 Gbit NIC
    # is reached and stays constant afterwards.
    , 10, 10, 10, 10
    # end insert code
]
plot_throughput(packet_sizes_extrapolated, throughput_in_gbps_extrapolated)
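A quick sanity check for the extrapolation (own arithmetic, not part of the task): small packets are forwarded at roughly 1.87 Mpps (see the conversion below), so the 10 Gbit/s line rate is reached at a packet size of about

$\frac{10 \cdot 10^9}{1.87 \cdot 10^6 \cdot 8} - 20 \approx 648 \text{ Byte},$

which is why the entries for 768, 1024, 1280, and 1500 Byte are set to 10 Gbit/s.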
def convert_to_mpps(gbps_list, pkt_sz_list):
    result = []
    # begin insert code
    for g, p in zip(gbps_list, pkt_sz_list):
        # To calculate the packets per second from the given data rate,
        # the rate has to be divided by the frame size on the wire, i.e.,
        # the packet size + 20 bytes for inter-frame gap, preamble, and
        # start-of-frame delimiter.
        result.append(((g * 1000 * 1000 * 1000) / ((p + 20) * 8)) / 1000000)
    # end insert code
    return result
tp_in_mpps = convert_to_mpps(throughput_in_gbps_extrapolated, packet_sizes_extrapolated)
print(tp_in_mpps)
[1.8690476190476188, 1.8799831081081082, 1.8695652173913044, 1.87, 1.86, 1.586294416243655, 1.1973180076628354, 0.9615384615384615, 0.8223684210526315]
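Written out, the conversion implemented above is

$r_{\text{Mpps}} = \frac{r_{\text{Gbit/s}} \cdot 10^9}{(s + 20) \cdot 8} \cdot 10^{-6},$

where $s$ is the packet size in Byte; e.g., 1.256 Gbit/s at $s = 64$ yields $1.256 \cdot 10^9 / (84 \cdot 8) \approx 1.87$ Mpps, matching the first entry of the output.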
Assume that Open vSwitch forwarding has constant per-packet costs, i.e., the number of CPU cycles needed per packet is independent of the packet's size and the total applied load. You can also assume that the CPU is fully loaded when forwarding 64 Byte packets.
c) [0.5 credits] Create a function calculating the relative load of the used CPU core. Your function gets the list of already calculated packet transfer rates (mpps_list) as input parameter.
def convert_to_cpuload(mpps_list):
    result = []
    # begin insert code
    # As stated in the problem setting, we assume 64 byte packets (the first
    # entry of the list) to cause maximum CPU load; values slightly above 1.0
    # are due to measurement noise.
    max_rate = mpps_list[0]
    for g in mpps_list:
        # load relative to the maximum rate
        result.append(g / max_rate)
    # end insert code
    return result
cpuload = convert_to_cpuload(tp_in_mpps)
print(cpuload)
[1.0, 1.0058508349113446, 1.0002769315978954, 1.0005095541401277, 0.9951592356687899, 0.8487180316208092, 0.6406032652463578, 0.514453699167075, 0.4399932953402615]
d) [0.75 credits] Create a graph visualizing the CPU load (y axis in %) in relation to the packet size (x axis).
%matplotlib inline

def plot_cpu_load(pkt_sizes, cpu_load):
    # begin insert code
    if len(pkt_sizes) == len(cpu_load):
        plt.figure(3, figsize=(20, 10))
        plt.plot(pkt_sizes, [c * 100 for c in cpu_load])
        plt.axis((0, 1550, 0, 105))
        plt.xlabel('Packet size [Byte]')
        plt.ylabel('CPU load [%]')
        plt.show()
    # end insert code
    return
plot_cpu_load(packet_sizes_extrapolated, cpuload)
e) [0.5 credits] IO can be processed by a CPU in a polling-based manner and an interrupt-based manner. Describe and discuss both approaches, especially the implications on CPU load and energy consumption.
- Interrupt-based IO: In case of an IO event the hardware sends a signal, which interrupts a running program and executes an interrupt request handler. After handling the interrupt the program can continue its operation.
- Polling-based IO: Using polling, the IO device has to be actively checked if an IO operation has to be performed.
The CPU load caused by handling a single interrupt is higher than that of a single polling-based lookup, because interrupt handling requires two context switches to save and restore the application's state. Polling-based lookups, however, can prevent the CPU from entering a sleep state because the IO device has to be polled regularly. Also, if no IO is available, cycles are wasted on unnecessary polling, whereas interrupts are only triggered if IO operations actually have to be performed.
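To make the contrast concrete, here is a minimal, hypothetical Python sketch (own illustration, not part of the solution): a real NIC driver works with hardware interrupts and DMA, but a thread-safe queue mimics the two CPU usage patterns well enough.

import queue
import threading
import time

rx = queue.Queue()

def producer():
    # sporadic "packet arrivals"
    for i in range(5):
        time.sleep(0.2)
        rx.put('pkt-%d' % i)

def busy_poll(duration=1.5):
    # Polling: check the device (here: a queue) in a tight loop. Each check
    # is cheap, but the core stays at 100% load and cannot enter a sleep
    # state while idle.
    end = time.monotonic() + duration
    while time.monotonic() < end:
        try:
            print('polled:', rx.get_nowait())
        except queue.Empty:
            pass  # wasted cycle: no data was available

def interrupt_like(duration=1.5):
    # Blocking wait: the thread sleeps until data arrives, so an idle core
    # can save energy, but each wake-up (analogous to an interrupt with its
    # context switches) is more expensive per event.
    end = time.monotonic() + duration
    while time.monotonic() < end:
        try:
            print('woken up for:', rx.get(timeout=0.1))
        except queue.Empty:
            pass

threading.Thread(target=producer).start()
busy_poll()  # replace with interrupt_like() to compare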
f) [0.25 credits] Linux supports both polling-based and interrupt-based packet reception and can switch dynamically between them. Which of the aforementioned reception mechanisms is better suited to handle a high network load, and why?
The polling-based approach is better suited for high-load scenarios: a single polling call is cheaper than handling an interrupt, and under high load the chance of wasting cycles on polling an idle IO device is low. (In Linux, this dynamic switching is implemented by NAPI.)
Long Tails (3.5 credits)
The latency of software systems often exhibits a long tail, i.e., a particularly bad worst-case latency. Virtual machines (VMs) in particular are affected by poor long-tail latencies. Average latency does not tell the whole story in this case. Therefore, latency measurements often include the 99th and 99.9th percentile latency to show the worst-case latencies one can expect.
Latency distributions found in real systems can be approximated by a log-normal distribution with the following probability density function:
$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}$
We approximated the parameters μ = 5 and σ = 0.6 for the latency distribution of a Linux system running inside a VM, measured in microseconds. The following code creates sample values according to a log-normal distribution with numpy:
import numpy as np
# \mu = 5, \sigma = 0.6, and 50000 samples
distribution_data = np.random.lognormal(5, 0.6, 50000)
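As a quick plausibility check (an addition, not required by the exercise), the log-normal distribution has closed-form median $e^\mu$ and mean $e^{\mu + \sigma^2/2}$, which the sample statistics computed in b) should approximate:

# Closed-form properties of the log-normal distribution for comparison
# with the sample statistics computed in subproblem b)
mu, sigma = 5, 0.6
print('analytic median:', np.exp(mu))                 # ~148.4 us
print('analytic mean  :', np.exp(mu + sigma**2 / 2))  # ~177.7 us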
a) [1.0 credits] Plot the given sample distribution. Use a histogram with 1000 equally sized bins and limit the x axis to the 99.9th percentile.
%matplotlib inline
import matplotlib.pyplot as plt

def plot_histogram(data):
    # begin insert code
    plt.figure(3, figsize=(20, 10))
    plt.hist(data, bins=1000)
    plt.xlabel("Latency [us]")
    plt.ylabel("Frequency")
    plt.xlim(0, np.percentile(data, 99.9))
    plt.show()
    # end insert code
    return
plot_histogram(distribution_data)
b) [0.5 credits] To describe the basic characteristic of a distribution, certain values can be useful. Calculate the mean, standard deviation, median, 99th and 99.9th percentile. Use the given data type as return format. You may use numpy to calculate the required values.
def calc_basic_properties(data):
    result = {
        'average': 0,
        'std_deviation': 0,
        'median': 0,
        'percentile99': 0,
        'percentile999': 0,
    }
    # begin insert code
    result['average'] = np.average(data)
    result['std_deviation'] = np.std(data)
    result['median'] = np.median(data)
    result['percentile99'] = np.percentile(data, 99)
    result['percentile999'] = np.percentile(data, 99.9)
    # end insert code
    return result

# print results
for key, value in calc_basic_properties(distribution_data).items():
    print('{:14}: {}'.format(key, value))
average       : 176.94332058403472
std_deviation : 115.43116500740415
median        : 148.2987911689638
percentile99  : 588.4605765956042
percentile999 : 942.4409397864906
The following subproblems give a rough example to demonstrate how often seemingly rare events (99th percentile and beyond) can happen.
Imagine you are using a game server running inside a VM. You are concerned about the poor worst-case characteristics of networking in VMs. The game requests new data from the server 60 times per second. User tests show that the delay in the 99th percentile is acceptable for your application. However, users experience a noticeable lag event if latencies in the 99.9th percentile show up.
c) [1.0 credits] Calculate the probability of seeing at least one latency worse than the 99.9th percentile for a given number of seconds.
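The solution below uses the following reasoning: with 60 requests per second, each independently better than the 99.9th percentile with probability 0.999, the probability of at least one lag event within $t$ seconds is

$P(t) = 1 - 0.999^{60t}$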
import math

# probability to have at least one noticeable lag event
def probability_for_lag_event(experiment_time_in_sec):
    result = 1.0
    # begin insert code
    # probability of a good (no-lag) request: 0.999
    # probability that all 60 requests in one second are good: 0.999^60
    # probability of at least one lag event in one second: 1 - 0.999^60
    # generalized to an experiment time of t seconds: 1 - 0.999^(60*t)
    result = 1.0 - math.pow(0.999, 60 * experiment_time_in_sec)
    # end insert code
    return result
print("Probability to see at least one lag event ( 1s): ", probability_for_lag_event(1))
print("Probability to see at least one lag event (60s): ", probability_for_lag_event(60))
Probability to see at least one lag event ( 1s):  0.0582637377768318
Probability to see at least one lag event (60s):  0.9727254487692769
d) [1.0 credits]
An evaluation that conforms to the benchmarking standard RFC 2544 provides only the average and, optionally, the standard deviation, based on as few as 20 measurements. Subproblem b) refined the reported results by adding 99.xth percentiles.
Do you think that the two percentiles (99th and 99.9th) are enough to characterize the latency of our game server or other latency critical systems in general? Briefly explain why or why not.
It depends.
For the game server example, the two reported percentiles show a very steep increase in latency, which is very valuable information for the game service provider. Additionally, providing the 99.999th percentile and the maximum observed latency might also be meaningful in the case of our game server. However, there might be applications where such a steep latency increase happens even more rarely; there, reporting additional nines might be of interest. In general, there is no definite number of nines that describes the latency sufficiently for every application.
Another important aspect is how often these "rare" events are actually observed by application users. The game server example shows that such events can have an impact on user experience, so the characteristics of the application should also be taken into account when designing latency-critical applications. Think, for instance, of an application performing more than 10000 requests per second, where very high percentiles (e.g., the 99.9999th) matter.
More information on measuring worst-case latency can be found in the following presentation.
Problem 3 - Exponential Weighted Moving Average (3 credits)
In this exercise you will compute the retransmission timeout of a TCP connection. The first cell downloads a file which contains RTT samples of a TCP connection. You will then analyze the RTT values and finally compute the retransmission timeout.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# Download RTT CSV from ACN website
!wget -N https://acn.net.in.tum.de/exercise/tcp_rtt.npz
# Load file into Python
data = np.load('tcp_rtt.npz')
ts, rtt = data['ts'], data['rtt']
--2025-01-31 09:51:11--  https://acn.net.in.tum.de/exercise/tcp_rtt.npz
Resolving acn.net.in.tum.de (acn.net.in.tum.de)... 2a00:4700:0:9:f::1, 188.95.232.11
Connecting to acn.net.in.tum.de (acn.net.in.tum.de)|2a00:4700:0:9:f::1|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘tcp_rtt.npz’ not modified on server. Omitting download.
# Helper function to plot data
def plot(*data, x_label=None, y_label=None, legends=[], y_min=None, y_max=None):
    plt.figure(figsize=(18, 6))
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    for xs, ys in data:
        plt.plot(xs, ys)
    plt.legend(legends)
    plt.gca().set_ylim(bottom=y_min, top=y_max)
plot((ts, rtt), x_label='Time in s', y_label='RTT in ms', legends=['flow1'])
a) [0.1 credits] Have a look at the above graph. Which congestion control algorithm was used by the TCP sender? (No explanation required)
TCP Cubic
b) [0.5 credits] Write a function to compute the exponential weighted moving average. The function ewma() gets two parameters, a list of values and a value alpha as weight for the new samples.
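For reference, the recurrence implemented below (initialized with the first sample, which is one common choice) is

$\text{EWMA}_0 = x_0, \qquad \text{EWMA}_i = \alpha x_i + (1 - \alpha) \text{EWMA}_{i-1} \quad \text{for } i > 0.$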
def ewma(values, alpha):
    # begin insert code
    out = []
    for sample in values:
        if len(out) == 0:
            # We just use the first sample as the initial value
            new_average = sample
        else:
            new_average = out[-1] * (1 - alpha) + sample * alpha
        out.append(new_average)
    return out
    # end insert code
    return values
smoothed_rtt = ewma(rtt, 0.125)
plot((ts, rtt), (ts, smoothed_rtt), x_label='Time in s', y_label='RTT in ms', legends=['RTT', 'EWMA'])
c) [0.4 credits] Compare the results when using different values for alpha and explain how it impacts the resulting average.
Larger values for alpha increase the influence of new samples. The resulting average adjusts to new values faster.
Smalle values increase the influence of past samples which makes the overall value more stable but also inertial.
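A short comparison using the functions defined above (the alpha values are illustrative choices, not prescribed by the task):

# Compare a fast-reacting and a very stable EWMA on the same RTT trace
smoothed_fast = ewma(rtt, 0.5)    # follows new samples quickly, but noisier
smoothed_slow = ewma(rtt, 0.01)   # very stable, but lags behind changes
plot((ts, rtt), (ts, smoothed_fast), (ts, smoothed_slow),
     x_label='Time in s', y_label='RTT in ms',
     legends=['RTT', 'alpha=0.5', 'alpha=0.01'])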
d) [1.5 credits] Have a look at RFC 6298 to see how TCP computes the retransmission timeout (RTO). In the following we implement a function to compute this value for our connection. We will ignore the clock granularity G, i.e., we consider it small enough to be negligible.
Write the function compute_rto() which receives a list of RTT samples and computes the RTO values using the two state variables SRTT (smoothed round-trip time) and RTTVAR (round-trip time variation) as explained in RFC 6298.
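For reference, the update rules from RFC 6298, with $\alpha = 1/8$, $\beta = 1/4$, and $K = 4$: the first measurement $R$ initializes

$SRTT = R, \quad RTTVAR = R/2, \quad RTO = SRTT + \max(G, K \cdot RTTVAR),$

and every subsequent measurement $R'$ updates (RTTVAR first, using the old SRTT)

$RTTVAR = (1 - \beta) \cdot RTTVAR + \beta \cdot |SRTT - R'|$
$SRTT = (1 - \alpha) \cdot SRTT + \alpha \cdot R'$
$RTO = SRTT + \max(G, K \cdot RTTVAR)$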
def compute_rto(samples):
    # constants
    ALPHA = 0.125
    BETA = 0.25
    K = 4
    G = 0
    # begin insert code
    output = []
    # Initialization with the first sample R:
    #   SRTT <- R
    #   RTTVAR <- R/2
    #   RTO <- SRTT + max(G, K*RTTVAR)
    r = samples[0]  # first sample
    srtt = r
    rttvar = r / 2
    rto = srtt + max(G, K * rttvar)
    output.append(rto)
    # We skip the first value since it was already processed in the initialization
    for rtt in samples[1:]:
        # RTTVAR <- (1 - beta) * RTTVAR + beta * |SRTT - R'|
        # SRTT <- (1 - alpha) * SRTT + alpha * R'
        # RTO <- SRTT + max(G, K*RTTVAR)
        rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - rtt)
        srtt = (1 - ALPHA) * srtt + ALPHA * rtt
        rto = srtt + max(G, K * rttvar)
        output.append(rto)
    return output
    # end insert code
    return samples
plot((ts, rtt), (ts, compute_rto(rtt)), x_label='Time in s', y_label='RTT in ms', legends=['RTT', 'RTO'])
e) [0.5 credits] Discuss the influence on TCP if the RTT is considerably overestimated or underestimated.
- RTT overestimated -> timeout too long -> after segments are lost, it takes a long time before the timeout is triggered and the data is retransmitted.
- RTT underestimated -> timeout too short -> the timeout is triggered even though no loss occurred, i.e., segments are retransmitted unnecessarily.
Advanced Computer Networking by Prof. Dr.-Ing. Georg Carle
Teaching assistants: Christian Dietze, Sebastian Gallenmüller, Max Helm, Benedikt Jaeger, Marcel Kempf, Jihye Kim, Patrick Sattler