

# 40 Gbit Ethernet PCS/PMA and MAC FPGA implementation

... optimized for your needs:

- 802.3ba-2010 compliant
- field tested for industrial usage
- full HDL source code available for developers

# **AITIA's 40Gbps IP core:**

### Features:

- IEEE 802.3ba-2010 fully compliant implementation
- Source code available for further development!
- Uses Xilinx Virtex 6 device-specific GTH transceivers; can easily be ported to other devices
- 320-bit/320-bit receive/transmit MAC data interface running at 156.25 MHz
- Optimized Ethernet CRC checksum calculation on Rx and Tx MAC interfaces
- Tested on CGEP with CFP SR optical modules
- Simple and parameterizable traffic generator application available
- Sample application with a partitioned 40G ISE project for fast builds and guaranteed timing compliance

### PMA RX/TX:

- Xilinx specific transceivers, clock synthesis and clock-data-recovery
- E.g.: 1 x GTH\_QUAD built-in Virtex 6 module

### PCS-RX:

- 64b66b alignment lock
- marker locking
- lane deskew compensation
- lane reordering

### PCS-TX:

- 64b66b block creation, and encoding, scrambling
- marker and BIP insertion
- idle insertion when needed

### MAC-RX:

- Framing: removing preamble, align SOF on first byte
- CRC checking, error detection

### MAC-TX:

- Framing: adding preamble
- Realigning data lanes
- Adding CRC checksum

### **Monitor application:**

- Frame processing: filtering, chunking according to configured rules
- adding monitor header and high precision timestamp sourced from an onboard atomic clock

### Traffic generator application:

- Packet generator with configurable parameters: packet length, inter-packet gap, and packet data
- Packet generator with predefined traffic profiles (e.g. browsing, VoIP, P2P ...)

### **40Gbps PCS/PMA functional description:**

All major building blocks of the 40Gbps Ethernet core are well separated according to their functions. This results in code that is easier to understand and develop, and it makes design placement and partitioning much more convenient.

### The main components are:

- ETH\_40G\_v6: this contains the Virtex 6 specific transceivers, the 40G RX/TX PCS/PMA, and MAC functions
  - pcs\_phy\_40G: this contains the V6 specific transceivers and clocking modules
  - PCS\_RX\_40G: RX PCS/PMA functions
    - align\_64b66b\_40G
    - detect\_marker\_40G
    - lane\_FIFOs
    - vl\_marker\_lock\_40G
    - vl\_order\_40G
    - descrambler\_64b66b\_256b
    - decode\_64b66b\_40G
  - MAC\_RX\_40G: RX MAC functions
    - CRC32\_320\_wtable
  - MAC\_TX\_40G: TX MAC functions
    - CRC32\_320\_wtable
  - PCS\_TX\_40G: TX PCS/PMA functions
    - encode\_64b66b\_40G
    - scrambler\_64b66b\_256b
    - lane\_FIFOs
    - insert\_marker\_40G

### User applications:

- XLG\_FrameGen\_wBRAM: Simple parametrizable traffic generator.

  This module can send various types and numbers of predefined Ethernet frames with variable length and interframe gap.
- XLG\_Monitor: Received Ethernet frames are filtered, chunked, and time-stamped. Packets carrying the monitor header are written into DDR3 storage for speed compensation and are then sent out over 4x10Gbps Ethernet links for further analysis, such as traffic mix or deep packet inspection.
  - A fully functional application is available for the CGEP\_4G4X\_2Z2Q reference board.

- *pcs\_phy\_40G*: This is the device-specific physical interface of the 40G core. In our case it is a Virtex 6 device, so one GTH Quad is used to connect to an external optical QFP module over 4x10G electrical lines.

The GTH transceivers are configured in RAW 64-bit mode; no gearboxes are used (40G marker processing does not permit use of the built-in 64b/66b encoder or line scrambler).

If you want to use another device, for example an Altera chip, you only have to replace this module; the other parts of the core do not depend on the physical interface used.

- *PCS\_RX\_40G*: This module implements the RX PCS/PMA functions according to IEEE 802.3ba-2010.

The 4 virtual lanes are multiplexed straight into 4 physical lanes; no interleaved bits are present.

The GTH serdes has a 64 bit output, so we have to convert and lock on 64b/66b alignment too (align\_64b66b\_40G).
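
As a rough illustration of what locking on 64b/66b alignment means, the sketch below searches a raw bit stream for the offset at which every 66-bit block begins with a valid sync header (two differing bits). It is a behavioral model, not the align\_64b66b\_40G logic: the function name, the list-of-bits representation, and the threshold of 64 consecutive valid headers are assumptions made for the example.

```python
def find_block_lock(bits, needed=64):
    """Return the bit offset (0..65) at which `needed` consecutive 66-bit
    blocks start with a valid sync header, or None if no offset qualifies.

    `bits` is a list of 0/1 values in reception order (assumed model).
    """
    for offset in range(66):
        headers_ok = 0
        pos = offset
        while pos + 66 <= len(bits):
            # A valid sync header is 01 or 10, so its two bits must differ.
            if bits[pos] == bits[pos + 1]:
                break  # invalid header: this candidate offset is wrong
            headers_ok += 1
            if headers_ok == needed:
                return offset
            pos += 66
    return None
```

A hardware implementation would typically slip the candidate offset by one bit after a failed check instead of scanning a stored buffer, but the acceptance criterion is the same.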

After that we can detect the markers that are inserted into the virtual lanes, to achieve cross-channel alignment (detect\_marker\_40G, vl\_marker\_lock\_40G).

There can be a maximum of 180 ns of skew between physical channels because of the optical transmission and electrical conversions, so we have to insert FIFOs on every virtual lane to compensate for the delays (lane\_FIFOs). These FIFOs also take care of the PCS (161.13 MHz) to MAC (156.25 MHz) clock conversion.
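
The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch assumes the nominal 40GBASE-R lane rate of 10.3125 Gbit/s and the 64-bit serdes word mentioned above; the resulting depth is only a lower bound for the deskew part and ignores any extra margin needed for the clock-domain crossing.

```python
from fractions import Fraction
import math

LANE_RATE_BPS = 10_312_500_000     # assumed nominal 40GBASE-R lane rate
SERDES_WIDTH_BITS = 64             # GTH output width stated in the text
MAX_SKEW_S = Fraction(180, 10**9)  # 180 ns maximum skew stated in the text

# One 64-bit word leaves the serdes per PCS clock cycle.
pcs_clock_hz = Fraction(LANE_RATE_BPS, SERDES_WIDTH_BITS)

# Number of 64-bit words one lane can lead another by at maximum skew.
skew_words = MAX_SKEW_S * pcs_clock_hz

# Minimum FIFO depth (in words) needed to absorb that skew.
min_deskew_depth = math.ceil(skew_words)

print(float(pcs_clock_hz) / 1e6)  # PCS clock in MHz
print(float(skew_words), min_deskew_depth)
```

Under these assumptions the PCS clock comes out at about 161.13 MHz, and the skew corresponds to just over 29 words, so each lane FIFO needs at least 30 entries for deskew alone.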

After that, the virtual lanes must be reordered into the order in which they were sent by the far-end device (vl\_order\_40G).
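
Marker detection and lane reordering can be sketched as follows. The marker byte values are the 40GBASE-R alignment markers as recalled from IEEE 802.3 Table 82-3 and should be checked against the standard; the function names and the byte-list representation are hypothetical and do not reflect the internals of detect\_marker\_40G or vl\_order\_40G.

```python
# Bytes M0..M2 of the alignment marker for each PCS lane
# (assumed from IEEE 802.3 Table 82-3; verify before use).
MARKER_TO_LANE = {
    (0x90, 0x76, 0x47): 0,
    (0xF0, 0xC4, 0xE6): 1,
    (0xC5, 0x65, 0x9B): 2,
    (0xA2, 0x79, 0x3D): 3,
}

def make_marker(lane, bip3=0x00):
    """Build the 8 payload bytes M0 M1 M2 BIP3 M4 M5 M6 BIP7 for a lane."""
    m = next(k for k, v in MARKER_TO_LANE.items() if v == lane)
    inverted = tuple(b ^ 0xFF for b in m)  # M4..M6 are the inverse of M0..M2
    return bytes(m + (bip3,) + inverted + (bip3 ^ 0xFF,))

def marker_lane(block):
    """Return the PCS lane number encoded in an 8-byte block, or None."""
    m0_2, m4_6 = tuple(block[0:3]), tuple(block[4:7])
    if any((a ^ b) != 0xFF for a, b in zip(m0_2, m4_6)):
        return None  # not a marker: second half must invert the first
    return MARKER_TO_LANE.get(m0_2)

def reorder(physical_lanes, marker_blocks):
    """Return the lane streams ordered by PCS lane number.

    marker_blocks[i] is the marker observed on physical lane i.
    """
    ordered = [None] * 4
    for phys, block in enumerate(marker_blocks):
        lane = marker_lane(block)
        if lane is None or ordered[lane] is not None:
            raise ValueError("missing or duplicate marker on lane %d" % phys)
        ordered[lane] = physical_lanes[phys]
    return ordered
```

For example, if the four physical lanes carry PCS lanes 2, 0, 3 and 1, `reorder` returns the streams in the order 0, 1, 2, 3 so that the descrambler sees the blocks in their original sequence.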

Next is the self-synchronizing descrambler. This is based upon a simple linear feedback shift register (descrambler\_64b66b\_256b).
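
A bit-serial model makes the self-synchronizing property easy to see. The sketch assumes the standard 64b/66b scrambler polynomial 1 + x^39 + x^58 from IEEE 802.3; the function names are hypothetical, and the real descrambler\_64b66b\_256b presumably processes many bits per clock, which this model does not attempt to reproduce.

```python
MASK58 = (1 << 58) - 1  # the shift register holds the last 58 scrambled bits

def scramble(bits, state=0):
    """Scramble a list of 0/1 bits; returns (scrambled_bits, new_state)."""
    out = []
    for b in bits:
        # Taps at delays 39 and 58 (state bit 0 is the most recent bit).
        s = b ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        out.append(s)
        state = ((state << 1) | s) & MASK58
    return out, state

def descramble(bits, state=0):
    """Descramble a list of 0/1 bits; returns (data_bits, new_state)."""
    out = []
    for s in bits:
        out.append(s ^ ((state >> 38) & 1) ^ ((state >> 57) & 1))
        # The register is fed with received bits, never with its own output,
        # so a wrong initial state is flushed out after 58 bits.
        state = ((state << 1) | s) & MASK58
    return out, state
```

Because the descrambler state depends only on received bits, the two ends need no explicit synchronization: after 58 received bits the output is correct regardless of the initial register contents.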

The last stage is the 66b decoder, which determines the Ethernet frame boundaries and other control codes (decode\_64b66b\_40G).
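
How a decoder finds frame boundaries can be illustrated by classifying each block from its sync header and block type field. The block type values below are those recalled from the 64b/66b block format tables in IEEE 802.3 and should be verified against the standard; the function name and return format are hypothetical.

```python
# Terminate block types mapped to the number of frame data bytes
# that precede the terminate character (assumed from IEEE 802.3).
TERMINATE_TYPES = {
    0x87: 0, 0x99: 1, 0xAA: 2, 0xB4: 3,
    0xCC: 4, 0xD2: 5, 0xE1: 6, 0xFF: 7,
}

def classify_block(sync, payload):
    """Classify one 66-bit block.

    sync    -- the 2-bit sync header as a string in transmission order
    payload -- the 8 descrambled payload bytes
    Returns (kind, data_bytes), where data_bytes counts the data octets
    carried by the block.
    """
    if sync == "01":
        return ("data", 8)            # eight data bytes
    if sync != "10":
        return ("error", 0)           # 00 and 11 are invalid headers
    block_type = payload[0]
    if block_type == 0x78:
        return ("start", 7)           # start character plus 7 bytes
    if block_type in TERMINATE_TYPES:
        return ("terminate", TERMINATE_TYPES[block_type])
    if block_type == 0x1E:
        return ("control", 0)         # eight control characters, e.g. idle
    if block_type == 0x4B:
        return ("ordered_set", 0)
    return ("error", 0)
```

A frame then spans from a `start` block to the next `terminate` block, with the byte count of the terminate block giving the position of the last valid byte.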



- *MAC\_RX\_40G*: This module aligns frame data to always start on the first byte, and strips the Ethernet preamble. The Ethernet frame checksum is calculated over the received data and compared with the received checksum (CRC32\_320\_wtable), using a table-based, refactored checksum calculation.
- *MAC\_TX\_40G*: This module is responsible for assembling the Ethernet frame from the raw input data. The preamble is added at the start and the checksum at the end of the frame (CRC32\_320\_wtable). By default the MAC input data width is 320 bits; this needs to be converted to 4x64 bits.
- *PCS\_TX\_40G*: This module implements the TX PCS/PMA functions according to IEEE 802.3ba-2010.

First, input data is encoded into 64b/66b format; frame control and data codes are added when needed (encode\_64b66b\_40G). If there is no input data, idle codes are added to ensure a continuous data flow.

After that encoded data is scrambled (scrambler\_64b66b\_256b) to dampen the DC component in the signal.

Lane FIFOs (lane\_FIFOs) are added to buffer the data and compensate for speed differences caused by idle-code and lane-marker insertion. The clock conversion from the MAC (156.25 MHz) to the PCS (161.13 MHz) clock is also done in this component.

Virtual lanes are created by periodically adding a marker into the data stream (insert\_marker\_40G).
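
The frame checksum handled by the MAC modules described above is the standard Ethernet CRC-32. The sketch below is a byte-at-a-time table-driven model of that checksum, useful as a reference when testing; it is not the 320-bit-wide CRC32\_320\_wtable implementation, and the helper names are hypothetical.

```python
def _make_table():
    """Build the 256-entry table for the reflected CRC-32 polynomial."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xEDB88320 if crc & 1 else crc >> 1
        table.append(crc)
    return table

CRC_TABLE = _make_table()

def crc32_ethernet(data):
    """Return the Ethernet CRC-32 of `data` (bytes) as an integer."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

def append_fcs(frame):
    """TX side: append the 4-byte frame check sequence to a frame."""
    return frame + crc32_ethernet(frame).to_bytes(4, "little")

def fcs_ok(frame_with_fcs):
    """RX side: recompute the checksum and compare it with the received one."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return crc32_ethernet(body) == int.from_bytes(fcs, "little")
```

A wide hardware implementation would consume 40 bytes per clock using precomputed tables, but it should produce the same 32-bit result as this model for any frame.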



# **Partitioning the design:**

To achieve better timing closure and faster builds, it is advised to partition the design into at least two parts.

Partitioning is only necessary if using multiple 40G cores in the design.

The partitioning is done in two steps. First we have to carefully choose the partitions, and placements for building the whole core as one.

This is the "reference build", which is used only to implement the 40G partition and its interfaces. We can simply reuse the fully implemented partition files in the following builds. The placement, internal wiring and timing closure of the 40G partitions remain untouched, so only the rest of the design has to be recompiled after that.



### XC6VHX255T-2FF1155

The picture above shows a pre-partitioned design in FPGA-editor:

The red components are part of a fixed partition consisting of the 40G RX/TX PCS/PMA and MAC core, but without the physical transceivers. The blue components are the rest of the design interfacing to the 40G MAC core. The 3 GTH transceivers are on the upper right side; those are not part of the 40G partition.

A partitioned and implemented 40G reference design is available for the CGEP\_4X4G\_2Q device in Xilinx ISE project format.

# **Interfacing to the core:**

• Physical IO ports:

```
Q0_REFCLK_N : IN STD_LOGIC;
Q0_REFCLK_P : IN STD_LOGIC;
```

These are the differential reference clock inputs for the device-specific GTH Quad transceivers. They should be driven by an independent 156.25 MHz crystal oscillator with at least 100 ppm stability. Input pin assignments are locked in the .ucf file.

```
DRP_CLK : IN STD_LOGIC;
```

This clock input is needed for completing GTH initialization and reset. It must be a free-running clock in the range of 25-60 MHz.

```
RXN_GTH : IN STD_LOGIC_VECTOR(4-1 downto 0);
RXP_GTH : IN STD_LOGIC_VECTOR(4-1 downto 0);
TXN_GTH : OUT STD_LOGIC_VECTOR(4-1 downto 0);
TXP_GTH : OUT STD_LOGIC_VECTOR(4-1 downto 0);
```

These are the high-speed serial data lines of the GTH serdes. All 4 pairs are used. The implementation tool assigns them automatically to the corresponding physical pins.

• User ports (fabric):

```
XLG_Clk_RX : OUT STD_LOGIC;
XLG_Clk_TX : OUT STD_LOGIC;
```

Two independent MAC clocks are synthesized from the PCS clocks for RX and TX. All user logic acts on the rising edge of those clocks.

```
XLG_RXData : OUT STD_LOGIC_VECTOR(5*64-1 downto 0);
```

Receive Data from 40G Rx-MAC. Ethernet frames always start on the first byte.

```
XLG_RXMod40 : OUT STD_LOGIC_VECTOR(6-1 downto 0);
```

This shows the number of valid bytes in the last data word of the frame. 0 means all 40 bytes are valid; b"100111" means 39 bytes are valid.
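
A testbench or software model might turn this field into a byte count and a byte-enable mask as sketched below. The sketch assumes the value is a count within the final 40-byte (320-bit) word, with 0 meaning a full word, and that mask bit 0 corresponds to the first byte; the function names are hypothetical.

```python
WORD_BYTES = 40  # 320-bit MAC data word

def valid_bytes(mod40):
    """Number of valid bytes in the last word for a given Mod40 value."""
    if not 0 <= mod40 < WORD_BYTES:
        raise ValueError("Mod40 must be in the range 0..39")
    return WORD_BYTES if mod40 == 0 else mod40

def byte_enable_mask(mod40):
    """40-bit mask with one bit per valid byte (bit 0 = first byte, assumed)."""
    return (1 << valid_bytes(mod40)) - 1
```

With these definitions, `valid_bytes(0b100111)` gives 39 and `valid_bytes(0)` gives 40.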

```
XLG_RXDv : OUT STD_LOGIC;
```

Data valid signal: RXData, RXSof, and RXEof are only valid when this signal is high.

```
XLG_RXSof : OUT STD_LOGIC;
```

Start of frame signal: This indicates the first valid data word in an Ethernet frame.

```
XLG_RXEof : OUT STD_LOGIC;
```

End of frame signal: This indicates the last valid data word in an Ethernet frame. (RXMod40 is only valid when this signal is high.)

```
XLG_RXCRCErr : OUT STD_LOGIC;
```

Checksum error signal: This indicates that the calculated and received Ethernet checksums do not match.

```
XLG_RXErr : OUT STD_LOGIC;
```

Receive error signal: This indicates a far-end receive error or an invalid 64b/66b code.

```
XLG_TXAv : OUT STD_LOGIC;
```

Transmit buffer available: This indicates that the transmit buffer is ready to accept an Ethernet frame for transmission (60 bytes to 16 kbytes in length).

```
XLG_TXData : IN STD_LOGIC_VECTOR(5*64-1 downto 0);
```

Transmit Data for 40G Tx-MAC. Ethernet frames must always start on the first byte.

```
XLG_TXMod40 : IN STD_LOGIC_VECTOR(6-1 downto 0);
```

This shows the number of valid bytes in the last data word of the frame. 0 means all 40 bytes are valid; b"100111" means 39 bytes are valid.

```
XLG_TXDv : IN STD_LOGIC;
```

Data valid signal: The Tx-MAC only accepts input data when this signal is high.

```
XLG_TXSof : IN STD_LOGIC;
```

Start of frame signal: This indicates the first valid data word in an Ethernet frame.

```
XLG_TXEof : IN STD_LOGIC;
```

End of frame signal: This indicates the last valid data word in an Ethernet frame. (TXMod40 is only valid when this signal is high.)
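
To show how the transmit signals fit together, the sketch below splits a frame into 40-byte words and derives the Dv, Sof, Eof and Mod40 values for each word. It is a software model for building test vectors, under the assumptions that one word is presented per clock while TXAv is asserted and that unused bytes of the last word are zero-padded; the `TxBeat` type and function name are hypothetical, and the mapping of bytes onto the 320-bit vector is left open.

```python
from typing import List, NamedTuple

WORD_BYTES = 40  # 320-bit MAC data word

class TxBeat(NamedTuple):
    data: bytes   # 40 bytes, zero-padded in the last word (assumed)
    dv: int       # data valid
    sof: int      # start of frame, set on the first word
    eof: int      # end of frame, set on the last word
    mod40: int    # valid bytes in the last word, 0 meaning all 40

def frame_to_beats(frame: bytes) -> List[TxBeat]:
    """Split one Ethernet frame into the per-clock values of the TX interface."""
    if not frame:
        raise ValueError("empty frame")
    count = (len(frame) + WORD_BYTES - 1) // WORD_BYTES
    beats = []
    for i in range(count):
        chunk = frame[i * WORD_BYTES:(i + 1) * WORD_BYTES]
        last = i == count - 1
        beats.append(TxBeat(
            data=chunk.ljust(WORD_BYTES, b"\x00"),
            dv=1,
            sof=int(i == 0),
            eof=int(last),
            mod40=len(chunk) % WORD_BYTES if last else 0,
        ))
    return beats
```

A 100-byte frame, for instance, produces three words: the first with Sof set, the last with Eof set and Mod40 equal to 20.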

• Misc signals:

```
PCS_RXSof : OUT STD_LOGIC;
```

PCS start of frame: for debug purposes, and precise timestamping.

```
XLG_Rst : IN STD_LOGIC;
```

Reset signal: This signal resets the user portion of the 40G core (fabric reset)

```
XLG_CDR_Rst : IN STD_LOGIC;
```

GTH CDR Reset: This signal resets the Rx clock synthesizer of the GTH transceiver (phy reset).

```
XLG_signal_ok : IN STD_LOGIC;
```

Receive signal OK. Can be tied high if no physical indication is available.

```
XLG_initdone : OUT STD_LOGIC;
```

GTH initialization complete, core is ready.

```
XLG_aligned : OUT STD_LOGIC;
```

All virtual lanes are aligned and locked, core is ready to receive frames.