Difference between revisions of "EJFAT UDP Transmission Performance"

Revision as of 19:40, 9 June 2022

UDP Performance Overview

This page is dedicated to researching methods to maximize reliable UDP transmission rates between nodes.

CURRENTLY UNDER CONSTRUCTION!


Transmission between indra-s1 and indra-s2

The following tests were run with 1 sender and 1 receiver. The sender was the packetBlaster program whose help gives the following output:

usage: ./packetBlaster
        [-h] [-v] [-ip6] [-sendnocp]
        [-bufdelay] (delay between each buffer, not packet)
        [-host <destination host (defaults to 127.0.0.1)>]
        [-p <destination UDP port>]
        [-i <outgoing interface name (e.g. eth0, currently only used to find MTU)>]
        [-mtu <desired MTU size>]
        [-t <tick>]
        [-ver <version>]
        [-id <data id>]
        [-pro <protocol>]
        [-e <entropy>]
        [-b <buffer size>]
        [-s <UDP send buffer size>]
        [-cores <comma-separated list of cores to run on>]
        [-tpre <tick prescale (1,2, ... tick increment each buffer sent)>]
        [-dpre <delay prescale (1,2, ... if -d defined, 1 delay for every prescale pkts/bufs)>]
        [-d <delay in microsec between packets>]

        EJFAT UDP packet sender that will packetize and send buffer repeatedly and get stats
        By default, data is copied into buffer and "send()" is used (connect is called).
        Using -sendnocp flag, data is sent using "send()" (connect called) and data copy minimized, but original data buffer changed

The blaster was sending 89 kB buffers, each split into 10 packets, from Indra-s1 to the load balancer (129.57.109.254 / 19522) with MTU = 9000. The sending thread was NOT tied to any specific core. The entropy and data id were both set to 0:

./packetBlaster -host 129.57.109.254 -p 19522 -mtu 9000 -ver 2 -sendnocp -t 0 -id 0 -e 0 -b 89000
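
The figure of 10 packets per 89 kB buffer can be checked with a little arithmetic. The sketch below assumes that every packet carries the 16-byte LB meta-data and the 8-byte RE meta-data described later on this page, plus a standard 20-byte IPv4 header (no options) and an 8-byte UDP header. The function name and constants are illustrative and are not taken from the packetBlaster source.

```python
import math

# Assumed per-packet overhead in bytes (illustrative constants).
IPV4_HEADER = 20   # IPv4 header without options
UDP_HEADER = 8     # UDP header
LB_HEADER = 16     # 128-bit LB meta-data
RE_HEADER = 8      # 64-bit RE meta-data


def packets_per_buffer(buffer_bytes, mtu):
    """Return (packet count, payload bytes per full packet) for one buffer."""
    payload = mtu - IPV4_HEADER - UDP_HEADER - LB_HEADER - RE_HEADER
    return math.ceil(buffer_bytes / payload), payload


count, payload = packets_per_buffer(89000, 9000)
print(count, payload)  # prints: 10 8948
```

Under these assumptions each full packet carries 8948 bytes of data, so an 89000-byte buffer needs 10 packets, the last of which is only partly filled.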

The receiver was the packetBlastee program whose help gives the following output:

usage: ./packetBlastee
        [-h] [-v] [-ip6]
        [-a <listening IP address (defaults to INADDR_ANY)>]
        [-p <listening UDP port>]
        [-b <internal buffer byte sizez>]
        [-r <UDP receive buffer byte size>]
        [-cores <comma-separated list of cores to run on>]
        [-tpre <tick prescale (1,2, ... expected tick increment for each buffer)>]

        This is an EJFAT UDP packet receiver made to work with packetBlaster.

The blastee was receiving on Indra-s2. Initially the receiving thread was NOT tied to any specific core:

./packetBlastee -p 17750


This new meta-data, populated by the data source, consists of two parts, the first for the load balancer (LB) and the second for the reassembly engine (RE). It allows:

  • the LB to route all UDP packets with a common tick value to a single destination endpoint
  • the destination RE to reassemble the fragmented packets with a common tick into proper sequence

Figure X is a diagram of the new data stream processing requirements, using the JLab DAQ system as an example.

The LB meta-data, processed by the LB, must be in network (big-endian) byte order. The rest of the data, including the RE meta-data, can be formatted at the discretion of the EJFAT application.

Load Balancer Meta-Data

The LB meta-data is 128 bits, consisting of two 64 bit words:

  • LB Control Word is 64 bits (bits 0-63) such that
    • bits 0-7 the 8 bit ASCII character ’L’
    • bits 8-15 the 8 bit ASCII character ’B’
    • bits 16-23 the 8 bit LB version number starting at 1 (constant for run duration)
    • bits 24-31 the 8 bit Protocol Number (very useful for protocol decoders e.g., wireshark/tshark )
    • bits 32-47 the 16 bit Reserved field, must be zero (MBZ)
    • bits 48-63 an unsigned 16 bit Entropy value for destination port selection
  • Tick is an unsigned 64 bit quantity (bits 64-127) that for the duration of a data transfer session
    • Monotonically increases
    • Unique
    • Never rolls over
    • Never resets
    • Serves as the top level aggregation tag across packets that should be sent to a single specific destination.

In standard IETF RFC format:

protocol 'L:8,B:8,Version:8,Protocol:8,Reserved:16,Entropy:16,Tick:64'
 0                   1                   2                   3  
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       L       |       B       |    Version    |    Protocol   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 3               4                   5                   6  
 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              Rsvd             |            Entropy            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 6                                               12  
 4 5       ...           ...         ...         0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                              Tick                             +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The value of the Tick field is a convention among the data source, the data sink, and the LB control plane. The data source populates it so that all UDP packets with a shared value are sent to a single destination host IP; e.g., for the JLab particle detector DAQ system it would likely be a timestamp. The Entropy field (bits 48-63) serves to deliver packets to a range of ports at the host IP.
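
The LB meta-data layout above can be sketched with Python's struct module. This is a minimal illustration rather than the packetBlaster implementation: the function names are hypothetical, and the protocol number used in the usage lines is an arbitrary placeholder.

```python
import struct

# Network (big-endian) order, no padding:
# 'L', 'B', version, protocol (8 bits each), reserved, entropy (16 bits each), tick (64 bits)
LB_FORMAT = "!BBBBHHQ"


def pack_lb_header(version, protocol, entropy, tick):
    """Build the 16-byte LB meta-data; the reserved field is always zero."""
    return struct.pack(LB_FORMAT, ord("L"), ord("B"), version, protocol, 0, entropy, tick)


def unpack_lb_header(data):
    """Parse the first 16 bytes of a packet payload as LB meta-data."""
    l, b, version, protocol, _reserved, entropy, tick = struct.unpack(LB_FORMAT, data[:16])
    if (l, b) != (ord("L"), ord("B")):
        raise ValueError("payload does not start with 'LB'")
    return {"version": version, "protocol": protocol, "entropy": entropy, "tick": tick}


# Version 2, entropy 0 and tick 0 mirror the packetBlaster command on this page;
# protocol 1 is a placeholder value.
header = pack_lb_header(version=2, protocol=1, entropy=0, tick=0)
print(len(header), header.hex())
```

Using the "!" prefix keeps the fields in network byte order with no alignment padding, so the packed result is exactly 128 bits.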

Reassembly Engine Meta-Data

The RE meta-data (Figure X, yellow section) is 64 bits and consists of

  • bits 0-3 the 4 bit Version number
  • bits 4-13 a 10 bit Reserved field
  • bit 14 indicates first packet
  • bit 15 indicates last packet
  • bits 16-31 an unsigned 16 bit Data Id
  • bits 32-63 an unsigned 32 bit packet sequence number or optionally data offset byte number from beginning of file (BOF) for reassembly

In standard IETF RFC format:

protocol 'Version:4,Rsvd:10,First:1,Last:1,Data-ID:16,Offset:32'
 0                   1                   2                   3  
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|        Rsvd       |F|L|            Data-ID            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Packet Sequence # or Byte Offset from Beginning of File   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
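
A minimal sketch of packing and parsing the RE meta-data follows. Since the RE meta-data format is left to the application, big-endian order is an assumption made here for consistency with the LB meta-data, and the function names are hypothetical. Bit numbers follow the diagram, where bit 0 is the most significant bit.

```python
import struct

# Assumed big-endian: 16 bits of version/reserved/flags, 16-bit data id, 32-bit sequence or offset
RE_FORMAT = "!HHI"


def pack_re_header(version, first, last, data_id, sequence):
    """Build the 8-byte RE meta-data; the 10 reserved bits are left at zero."""
    # Version occupies bits 0-3 (top nibble), first is bit 14, last is bit 15.
    word = ((version & 0xF) << 12) | (int(bool(first)) << 1) | int(bool(last))
    return struct.pack(RE_FORMAT, word, data_id, sequence)


def unpack_re_header(data):
    """Parse the first 8 bytes of data as RE meta-data."""
    word, data_id, sequence = struct.unpack(RE_FORMAT, data[:8])
    return {
        "version": word >> 12,
        "first": bool(word & 0x2),
        "last": bool(word & 0x1),
        "data_id": data_id,
        "sequence": sequence,
    }


# First packet of a buffer, version 1, data id 0, sequence 0.
print(pack_re_header(1, True, False, 0, 0).hex())  # prints: 1002000000000000
```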

The Data Id field is a shared convention between data source and sink and is populated (or ignored) to suit the data transfer / load balance application; e.g., for the JLab particle detector DAQ system it would likely be the ROC channel # or a proxy for it.

The sequence number, or optionally the data offset byte number, gives the RE the information it needs to reassemble the transferred data into a meaningful contiguous sequence, and is a shared convention between data source and sink. As such, the relationship between data_id and sequence number or offset is undefined and application specific. In many use cases, for example, the sequence number or offset will be subordinate to the data_id, i.e., for a common tick value, each set of packets with a common data_id will be sequenced individually, as a group distinct from other groups with a different data_id.

Strictly speaking, the RE meta-data is opaque to the LB. It is therefore considered part of the payload, and its format is a convention between data producer and consumer.

The resultant data stream is shown just below the block diagram in Figure X and depicts the UDP packet structure of the stream from the source data system to the LB. Individual packets are tagged with meta-data both for the LB, to route them by tick to the proper compute node, and for the RE, with a packet offset spanning the collection of packets for a single tick, for eventual reassembly at the destination.

The depicted sequence is only illustrative, and no assumption about the order of packets with respect to either tick or offset should be made by the LB or the RE.
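
Because packets may arrive in any order, a receiver has to group and sort them itself. The sketch below illustrates one possible approach for the common case where the sequence number is subordinate to the data_id: packets are grouped by (tick, data_id) and ordered by sequence number. The function name and the tuple layout are hypothetical, and loss detection and the first/last packet flags are ignored for brevity.

```python
from collections import defaultdict


def reassemble(packets):
    """Reassemble payloads from (tick, data_id, sequence, payload) tuples given in any order.

    Returns a dict mapping (tick, data_id) to the concatenated payload bytes.
    """
    groups = defaultdict(dict)
    for tick, data_id, sequence, payload in packets:
        groups[(tick, data_id)][sequence] = payload
    return {
        key: b"".join(parts[seq] for seq in sorted(parts))
        for key, parts in groups.items()
    }


# Packets deliberately listed out of order, mixing two ticks and two data ids.
arrived = [
    (7, 1, 1, b"bb"),
    (8, 1, 0, b"z"),
    (7, 0, 0, b"x"),
    (7, 1, 0, b"aa"),
]
print(reassemble(arrived))
```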

UDP Header

The UDP Header Source Port field can optionally be modified/populated as follows:

Source Port = lower 16 bits of Load Balancer Tick (for LAG switch entropy)

The UDP Header Destination Port field must be modified/populated as follows:

Destination Port = Value that indicates LB should perform load balancing (else packet is discarded) = 'LB' = 0x4c42
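
The two port rules can be expressed in a few lines. The names below are illustrative only. The destination port value 0x4c42 is 19522 in decimal, which matches the -p 19522 argument in the packetBlaster command earlier on this page.

```python
# The ASCII characters 'L' and 'B' read as a big-endian 16-bit number.
LB_DEST_PORT = int.from_bytes(b"LB", "big")


def source_port_from_tick(tick):
    """Lower 16 bits of the 64-bit tick, usable as the UDP source port."""
    return tick & 0xFFFF


print(hex(LB_DEST_PORT), LB_DEST_PORT)  # prints: 0x4c42 19522
print(hex(source_port_from_tick(0x123456789)))  # prints: 0x6789
```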