Difference between revisions of "EJFAT UDP Transmission Performance"

From epsciwiki

Revision as of 14:43, 10 June 2022

UDP Performance Overview

This page is dedicated to researching methods to maximize reliable UDP transmission rates between nodes.

CURRENTLY UNDER CONSTRUCTION!


Transmission between indra-s1 and indra-s2

The following tests were run with 1 sender and 1 receiver. The sender was the packetBlaster program whose help gives the following output:


usage: ./packetBlaster
        [-h] [-v] [-ip6] [-sendnocp]
        [-bufdelay] (delay between each buffer, not packet)
        [-host <destination host (defaults to 127.0.0.1)>]
        [-p <destination UDP port>]
        [-i <outgoing interface name (e.g. eth0, currently only used to find MTU)>]
        [-mtu <desired MTU size>]
        [-t <tick>]
        [-ver <version>]
        [-id <data id>]
        [-pro <protocol>]
        [-e <entropy>]
        [-b <buffer size>]
        [-s <UDP send buffer size>]
        [-cores <comma-separated list of cores to run on>]
        [-tpre <tick prescale (1,2, ... tick increment each buffer sent)>]
        [-dpre <delay prescale (1,2, ... if -d defined, 1 delay for every prescale pkts/bufs)>]
        [-d <delay in microsec between packets>]

        EJFAT UDP packet sender that will packetize and send buffer repeatedly and get stats
        By default, data is copied into buffer and "send()" is used (connect is called).
        Using -sendnocp flag, data is sent using "send()" (connect called) and data copy minimized, but original data buffer changed


The blaster sent 89 kB (89,000-byte) buffers, each split into 10 packets, from indra-s1 to the load balancer (129.57.109.254, port 19522) with an MTU of 9000.
The sending thread was NOT tied to any specific core, and the entropy and data id were both set to 0:


./packetBlaster -host 129.57.109.254 -p 19522 -mtu 9000 -ver 2 -sendnocp -t 0 -id 0 -e 0 -b 89000
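
As a rough check on the figure of 10 packets per buffer, the arithmetic can be sketched as below. This is only an illustration: it assumes IPv4 (20-byte header, no options) and the fixed 8-byte UDP header, and `extra_header_bytes` is a hypothetical stand-in for any per-packet EJFAT headers, whose actual sizes are not given on this page.

```python
import math

IPV4_HEADER_BYTES = 20   # IPv4 header without options
UDP_HEADER_BYTES = 8     # fixed UDP header size


def packets_per_buffer(buffer_bytes, mtu, extra_header_bytes=0):
    """Number of UDP packets needed to carry one buffer.

    extra_header_bytes is a hypothetical placeholder for per-packet
    EJFAT headers; the real sizes are not stated on this page.
    """
    payload = mtu - IPV4_HEADER_BYTES - UDP_HEADER_BYTES - extra_header_bytes
    if payload <= 0:
        raise ValueError("MTU too small for the headers")
    return math.ceil(buffer_bytes / payload)


# Values taken from the command above: -b 89000 -mtu 9000
print(packets_per_buffer(89000, 9000))
# 36 is an illustrative header size only, not a documented value
print(packets_per_buffer(89000, 9000, 36))
```

Under these assumptions the count stays at 10 for any extra per-packet header of up to 72 bytes, which is consistent with the 10 packets quoted above.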


The receiver was the packetBlastee program whose help gives the following output:


usage: ./packetBlastee
        [-h] [-v] [-ip6]
        [-a <listening IP address (defaults to INADDR_ANY)>]
        [-p <listening UDP port>]
        [-b <internal buffer byte sizez>]
        [-r <UDP receive buffer byte size>]
        [-cores <comma-separated list of cores to run on>]
        [-tpre <tick prescale (1,2, ... expected tick increment for each buffer)>]

        This is an EJFAT UDP packet receiver made to work with packetBlaster.


The blastee was receiving on indra-s2. Initially the receiving thread was NOT tied to any specific core.
This program can track the number of dropped packets. For that statistic to be accurate,
the value given to the -tpre command line option must be identical for both sender & receiver. This
ensures that the receiver knows which tick is coming next.


./packetBlastee -p 17750
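
The reason the tick prescale must match on both sides can be illustrated with a minimal sketch. It is not packetBlastee code: `count_dropped_buffers` is a hypothetical helper that counts missing buffers (not individual packets), and it assumes buffers arrive in order with the tick advancing by the prescale for each buffer sent.

```python
def count_dropped_buffers(received_ticks, tick_prescale=1):
    """Estimate dropped buffers from gaps in the received tick sequence.

    Assumes the sender adds tick_prescale to the tick for every buffer
    and that buffers arrive in order. Hypothetical helper for
    illustration only.
    """
    dropped = 0
    expected = None
    for tick in received_ticks:
        if expected is not None and tick > expected:
            # Each missing step of tick_prescale is one missing buffer.
            dropped += (tick - expected) // tick_prescale
        expected = tick + tick_prescale
    return dropped


# Matching prescale of 2: ticks 6 and 8 never arrived.
print(count_dropped_buffers([0, 2, 4, 10], tick_prescale=2))

# Sender used a prescale of 2 but the receiver assumes 1:
# every buffer looks like it follows a drop, although none was lost.
print(count_dropped_buffers([0, 2, 4], tick_prescale=1))
```

In the second call nothing was lost, yet the mismatched prescale makes the receiver report two drops, which is why the two values have to agree.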


The speed of data transfer depends on a number of factors:

  1. whether the sending thread is tied to a specific core or cores
  2. whether the receiving thread is tied to a specific core or cores
  3. whether a Linux ksoftirqd kernel thread is running and consuming significant CPU time
  4. which pair of terminals the sender & receiver are run from


The highest observed average data rate was 4.0 GB/s.
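
To put that rate in terms of buffers and packets, a short conversion can be sketched as follows. It assumes 1 GB = 10^9 bytes and that the rate counts buffer data only, neither of which is stated on this page. The buffer size and packet count are the ones used in the test above.

```python
def rates(bytes_per_sec, buffer_bytes, packets_per_buffer):
    """Convert a byte rate into bit, buffer, and packet rates."""
    buffers_per_sec = bytes_per_sec / buffer_bytes
    return {
        "gbit_per_sec": bytes_per_sec * 8 / 1e9,
        "buffers_per_sec": buffers_per_sec,
        "packets_per_sec": buffers_per_sec * packets_per_buffer,
    }


# 4.0 GB/s taken as 4.0e9 bytes/s (assumption), 89000-byte buffers, 10 packets each
r = rates(4.0e9, 89000, 10)
print(f"{r['gbit_per_sec']:.1f} Gbit/s")
print(f"{r['buffers_per_sec']:.0f} buffers/s")
print(f"{r['packets_per_sec']:.0f} packets/s")
```

Under these assumptions, 4.0 GB/s corresponds to 32 Gbit/s, or roughly 45,000 buffers and 450,000 packets per second.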


Terminals

With 1 sender and 1 receiver running at a time, and multiple terminals open on s1 and s2, some terminal pairs produce the highest transfer rate and some do not.
How the operating system determines this is a mystery at this point. Sometimes, to get the highest rate, a sending/receiving pair needs to be run,
killed, and restarted a number of times. Again, why this is so is a mystery. If a sender/receiver pair is not running at 4 GB/s, then it generally runs at 3.3 GB/s.

Sending Thread

If the sending thread was tied to one or more cores, performance diminished noticeably: the top transfer rate dropped to 3.5 GB/s.
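
For readers unfamiliar with what tying a thread to cores means in practice, here is a minimal, Linux-only sketch using the CPU affinity calls in Python's standard library. It is an illustration only and does not show how packetBlaster implements its -cores option. `run_pinned` is a hypothetical helper name.

```python
import os


def run_pinned(cores, func):
    """Run func() with the caller restricted to the given set of cores.

    Linux-only: relies on os.sched_getaffinity / os.sched_setaffinity.
    The previous affinity is restored afterwards, even if func() raises.
    """
    previous = os.sched_getaffinity(0)   # 0 means "the caller"
    os.sched_setaffinity(0, cores)
    try:
        return func()
    finally:
        os.sched_setaffinity(0, previous)


if hasattr(os, "sched_setaffinity"):
    allowed = sorted(os.sched_getaffinity(0))
    # Pin to the first core we are allowed to use, then report the affinity seen inside.
    seen = run_pinned({allowed[0]}, lambda: os.sched_getaffinity(0))
    print("affinity while pinned:", seen)
    print("affinity afterwards:", os.sched_getaffinity(0))
```

Comparing a run with and without such pinning is one way to reproduce the kind of difference reported above, though the numbers will depend on the machine.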