EJFAT UDP Transmission Performance
UDP Performance Overview
This page is dedicated to researching method to maximize reliable UDP transmission rates between nodes.
Transmission between Indra-s2 and Indra-s2
Any data source wishing to take advantage of the EJFAT Load Balancer e.g., the Read Out Controllers (ROC) of the JLab DAQ system, must be prepared to stream data via UDP optionally with a properly modified UDP header, but must include additional meta-data prepended to any actual payload.
The purpose of setting of the UDP header source port is to induce LAG switch entropy at the front edge, and whether to do so is at the discretion of the EJFAT application.
This new meta-data, populated by the data source, consists of two parts; the first for the LB and the second for the RE:
- the LB to route all UDP packets with a common tick value to a single destination endpoint
- the destination fragmentation RE to reassemble packets with a common tick into proper sequence. Figure X is a diagram of the new data stream processing requirements for example for the JLaB DAQ system.
The LB meta-data, processed by the LB, is to be in network or big endian order. The rest of the data including the RE mete-data can be formatted at the discretion of the EJFAT application.
Load Balancer Meta-Data
The LB meta-data is 128 bits that consists of two 64 bit words:
- LB Control Word is 64 bits (bits 0-63) such that
- bits 0-7 the 8 bit ASCII character ’L’
- bits 8-15 the 8 bit ASCII character ’B’
- bits 16-23 the 8 bit LB version number starting at 1 (constant for run duration)
- bits 24-31 the 8 bit Protocol Number (very useful for protocol decoders e.g., wireshark/tshark )
- bits 32-47 or 16 bits Reserved, MBZ
- bits 48-63 an unsigned 16 bit Entropy value for destination port selection
- Tick is an unsigned 64 bit quantity (bits 64-127) that for the duration of a data transfer session
- Monotonically increases
- Unique
- Never rolls over
- Never resets
- Serves as the top level aggregation tag across packets that should be sent to a single specific destination.
In standard IETF RFC format:
protocol 'L:8,B:8,Version:8,Protocol:8,Reserved:16,Entropy:16,Tick:64' 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | L | B | Version | Protocol | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3 4 5 6 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Rsvd | Entropy | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 6 12 4 5 ... ... ... 0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | + Tick | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The value of the Tick field is a convention between data source and sink and LB control plane and is populated by the data source so as to send all UDP packets with a shared value to a single destination host IP; e.g., for the JLab particle detector DAQ system it would likely be timestamp. The Entropy field (bits 48-63) serves to deliver packets to a range of ports at the host IP.
Reassembly Engine Meta-Data
The RE meta-data (Figure X, yellow section) is 64 bits and consists of
- bits 0-3 the 4 bit Version number
- bits 4-13 a 10 bit Reserved field
- bit 14 indicates first packet
- bit 15 indicates last packet
- bits 16-31 an unsigned 16 bit Data Id
- bits 32-63 an unsigned 32 bit packet sequence number or optionally data offset byte number from beginning of file (BOF) for reassembly
In standard IETF RFC format:
protocol 'Version:4,Rsvd:10,First:1,Last:1,ROC-ID:16,Offset:32' 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Version| Rsvd |F|L| Data-ID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Packet Sequence # or Byte Offset from Beginning of File | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The Data Id field is a shared convention between data source and sink and is populated (or ignored) to suite the data transfer / load balance application, e.g., for the JLab particle detector DAQ system it would likely be ROC channel # or proxy.
The sequence number or optionally data offset byte number provides the RE with the necessary information to reassemble the transferred data into a meaningful contiguous sequence and is a shared convention between data source and sink. As such, the relationship between data_id, and sequence number or offset is undefined and is application specific. In many use cases for example, the sequence number or offset will be subordinate to the data_id, i.e., each set of packets with a coomon data_id will be individually sequenced as a distinct group from other groups with a different data_id for a common tick value.
Strictly speaking, the RE meta-data is opaque to the LB and therefore considered as part of the payload and is itself therefore a convention between data producer/consumer.
The resultant data stream is shown just below the block diagram in Figure X and depicts the stream UDP packet structure from the source data system to the LB. Individual packets are meta-data tagged both for the LB, to route based on tick to the proper compute node, and for the RE with packet offset spanning the collection of packets for a single tick for eventual destination reassembly.
The depicted sequence is only illustrative, and no assumption about the order of packets with respect to either tick or offset should be made by the LB or the RE.
UDP Header
The UDP Header Source Port field can optionally be modified/populated as follows:
Source Port = lower 16 bits of Load Balancer Tick (for LAG switch entropy)
The UDP Header Destination Port field must be modified/populated as follows:
Destination Port = Value that indicates LB should perform load balancing (else packet is discarded) = 'LB' = 0x4c42