[Jool-list] BIB-less NAT64

Alberto Leiva ydahhrk at gmail.com
Wed Oct 11 12:38:11 CDT 2017


Okay, I think we're done with the heavy testing. There are still a few
loose ends I'd like to wrap up, but I think Sander and I agree that it's
time to put this thread's concerns to rest. The results may be
disappointing in a way, but at least we didn't come out empty-handed.

Summary first:

Jool, even 3.5.4 Jool, withstands T-Rex's torture traffic without
flinching. There are no significant performance issues to worry about. CPU
usage is at 1% at worst and there are no packet drops. The bottleneck we
were fighting earlier was some hardware quirk or software misconfiguration
that, unfortunately, we never spotted.

(Come to think of it, T-Rex uses more CPU than Jool, at least when BIB
logging is turned off. I didn't expect that, considering T-Rex uses DPDK.)

I'm going to detail my thoughts now. If all you wanted was the outcome of
the performance testing voyage, that was it, so you don't need to read this
mail any further.

So, basically, I have personally seen no trace of any struggle Jool went
through while being pummeled by T-Rex's traffic. I can't explain Sander's
soft lockup at all, but for as long as I have run the test, the machine has
remained responsive and `top` always reports minimal CPU utilization. (As
long as I turn BIB logging off; rsyslog does seem to struggle with the
logging volume.) From its own perspective, Jool is translating packets
without even noticing disruptions: packets are translated uneventfully and
the kernel returns success whenever Jool entrusts a packet to it. (Although,
admittedly, that may or may not mean the packet was actually delivered,
given the random and imperfect nature of networks.) Whatever was causing
mayhem, Jool wasn't even aware of it.

So I'd like to clarify:

> it also seems that session creation has a higher priority than packet
> forwarding

I will not deny this, but I should mention that this is not so much a
priority issue as a natural consequence of the fact that the translated
packet is sent *after* it is translated. Once Jool has done its job, it is
up to the kernel whether the packet is posted on the interface, and even
then, it is up to the network to deliver the packet to its destination.
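
To make the ordering concrete, here's a rough sketch of what the send step
boils down to. The function name is hypothetical and this is not Jool's
actual code; the point is only that a "success" return from the kernel means
the packet was queued, nothing more:

    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    /* Hypothetical sketch; not Jool's actual code. Translation happens
     * first, then the finished skb is handed to the kernel. */
    static int xlat_send(struct sk_buff *translated)
    {
            int err = dev_queue_xmit(translated);

            /* NET_XMIT_SUCCESS only means the qdisc accepted the packet;
             * it says nothing about the NIC actually sending it, much
             * less about the peer receiving it. */
            return (err == NET_XMIT_SUCCESS) ? 0 : err;
    }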

We used to get very strange performance reports (speed going up and down,
with massive drop rates), until at some point in the past month the issue
appeared to fix itself, seemingly independently of anything I did. And
that's why I'm down on this whole ordeal: we can't go back to the old setup
to continue testing, so there is no way to find the root of the problem now.
I had hoped this whole effort would result in a documentation note akin to
the offload warning, so users could be aware of it, but there is nothing I
can do anymore. All I can hope is that it was a hardware bug which silently
prevented offloads from being turned off, or something like that, and which
eventually fixed itself as a result of my constant random tweaking
everywhere.

But at least I can now say fairly confidently that Jool is pretty darn
fast, even without the latest performance tweaks applied: now that whatever
was hobbling it before is gone, it is pretty clear that Jool can keep up
with at least this configuration of T-Rex with flying colors.

Well, that is what I personally am getting out of the output, anyway. In
case anyone wants to review T-Rex's latest report, I have attached it to
this mail along with one of the old/failing ones. Here's what I believe is
the official documentation of this output:
https://trex-tgn.cisco.com/trex/doc/trex_manual.html#_trex_output. The
numbers I'm basing my claims on are the drop-rate (outright zero), the
"Total-Tx" and "Total-Rx" numbers (always close to each other) and the
active flows ("m_active_flows" and "m_est_flows"; also always close to each
other). One thing that has baffled me this whole time is the transmission
bandwidth ("Tx Bw"). I don't understand why it would be so different for
each port, but then again I don't really understand where T-Rex gets this
information in the first place.

When I said that we didn't come out empty-handed, I was referring to a
little bottleneck I spotted while looking at Jool's numbers. It turns out
that the session timer can hold a spinlock for an irresponsibly long time
(I probably underestimated those tree-removal rebalances), which keeps
packets that are being translated in the meantime waiting. It is not overly
significant, because it affects the maximum translation timestamp far more
than the average one, but micro-optimization though it is, addressing it
would certainly be more productive than chasing the cache misses that
kernel developers usually obsess over. It's not going to make things
noticeably faster either, which is why it should be safe to postpone it to
the release after the upcoming one.
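
The usual way to shrink that kind of critical section is to detach the
expired entries while holding the lock and do the expensive work (freeing,
logging) afterwards, with the lock released. Here's a rough sketch of the
idea; the types, names and batch limit are made up, and it uses a plain
list even though Jool's sessions actually live in trees, so don't read it
as the actual fix:

    #include <linux/jiffies.h>
    #include <linux/list.h>
    #include <linux/slab.h>
    #include <linux/spinlock.h>

    /* Hypothetical sketch; not Jool's code. */
    struct session {
            struct list_head hook;
            unsigned long expiration; /* jiffies */
    };

    #define EXPIRE_BATCH 64

    static void expire_sessions(spinlock_t *lock, struct list_head *sessions)
    {
            LIST_HEAD(expired);
            struct session *s, *tmp;
            unsigned int budget = EXPIRE_BATCH;

            spin_lock_bh(lock);
            list_for_each_entry_safe(s, tmp, sessions, hook) {
                    if (budget-- == 0 || time_before(jiffies, s->expiration))
                            break;
                    /* Only the cheap unlink happens while translating
                     * packets wait on this lock. */
                    list_move(&s->hook, &expired);
            }
            spin_unlock_bh(lock);

            /* The expensive part (freeing, logging) runs unlocked. */
            list_for_each_entry_safe(s, tmp, &expired, hook) {
                    list_del(&s->hook);
                    kfree(s);
            }
    }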

Regarding the BIB-pool4 optimization I explained a few mails back: This
wasn't the problem at all because T-Rex's traffic was not enough to exhaust
pool4. I believe I will still end up merging the solution into master
regardless.

Finally, I need to announce that I'm going on a week's vacation starting
tomorrow, so I won't be able to merge all the latest work until I'm back. In
the meantime, since there are no severe performance issues to note, using
the latest version of Jool shouldn't be too hazardous. I do recommend master
over 3.5.4 due to the latest bugfix, though.

I'm still keeping an eye on the list.

On Fri, Sep 15, 2017 at 2:03 PM, Alberto Leiva <ydahhrk at gmail.com> wrote:

> It's strange that the third stream pushed the SIIT further when the NAT64
> was already at 100% in the second stream. I'd have expected the load to be
> the same since the NAT64 could not, in theory, send the SIIT any more
> packets. Or am I misunderstanding something?
>
> Also, I'm really stumped that you managed to peak using only two streams.
> This should render all database lookups practically instantaneous. If this
> is the case then... yeah. I guess there's not much I will be able to do
> from code.
>
> But we'll see.
>
> On Fri, Sep 15, 2017 at 12:18 PM, Alan Whinery <whinery at hawaii.edu> wrote:
>
>>
>> Sort of a digression, but since Alberto referred to Linux router
>> performance --
>>
>> After I got the Jool/Jool v6-only NAT64/bitw-464CLAT scenario working, I
>> tried some file transfers at 100 Mbps to v4-numeric addresses, so it was
>> hitting both boxes (VMs, actually).  Watching the software interrupt load
>> with top, I was getting around 10% load on the first 100 Mbps stream and a
>> second stream pushed NAT64 to 100% load on SI, while CLAT was only doing
>> about 30%; third stream: 100% NAT64, 90% CLAT.
>>
>> Attached PDF is what I wrote when I still remembered, about increasing
>> cores and spreading CPU affinity to mitigate.
>>
>> The point being that there are things to be understood about Linux router
>> performance, in tandem with NAT64/SIIT performance. For one, rolling in the
>> right off-loading, coalescence, etc, as well as CPU affinity to tune the
>> box like a router, rather than as a host. This stuff might be a problem
>> well before you get to the network scale that has been tested with TRex.
>>
>> On 9/15/2017 5:49 AM, Alberto Leiva wrote:
>>
>> Thank you!
>>
>> > One thing I have been wondering about is if the TRex side gets confused
>> and Jool is actually ok. If that is the case then I apologise!
>>
>> Well, who knows. I'm thinking that, if a normal Linux router would pass a
>> similar test but a Linux NAT64 running Jool doesn't, then in theory there
>> should be something that can be done.
>>
>> > What would be the best way to check that? Massive pcaps?
>>
>> I will compile a version with a bunch of timestamp tracking and see if we
>> can get some conclusions out of it.
>>
>> Working...
>>
>> On Fri, Sep 15, 2017 at 5:24 AM, Sander Steffann <sander at steffann.nl>
>> wrote:
>>
>>> Hi,
>>>
>>> > Okay, guys. Prototype ready. I didn't test a gazillion connections,
>>> but as far as basic functionality goes, it looks stable. Don't quote me on
>>> that, though.
>>> >
>>> > Experimental branch in fake-nat64, in case anyone wants to try it out:
>>> https://github.com/NICMx/Jool/tree/fake-nat64
>>>
>>> Sorry, it still collapses :(
>>>
>>> I recorded a small test here: http://www.steffann.nl/sander/
>>> Fake%20NAT64%20collapse.mov
>>>
>>> The behaviour is really strange. One thing I have been wondering about
>>> is if the TRex side gets confused and Jool is actually ok. If that is the
>>> case then I apologise! What would be the best way to check that? Massive
>>> pcaps?
>>>
>>> Cheers,
>>> Sander
>>>
>>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Jool-list mailing list
>> Jool-list at nic.mx
>> https://mail-lists.nic.mx/listas/listinfo/jool-list
>>
>>
>
-------------- next part --------------
-------------------------------------------------------------------
This is a report I got just now.
-------------------------------------------------------------------

-Per port stats table 
      ports |               0 |               1 
 -----------------------------------------------------------------------------------------
   opackets |        14987324 |        41611624 
     obytes |      1776772320 |     56340402998 
   ipackets |        41610306 |        14987530 
     ibytes |     57170300382 |      1477031674 
    ierrors |               0 |               0 
    oerrors |               0 |               0 
      Tx Bw |      23.74 Mbps |     752.99 Mbps 

-Global stats enabled 
 Cpu Utilization : 4.4  %  35.0 Gb/core 
 Platform_factor : 1.0  
 Total-Tx        :     776.73 Mbps  
 Total-Rx        :     783.78 Mbps  
 Total-PPS       :      94.55 Kpps  
 Total-CPS       :       2.78 Kcps  

 Expected-PPS    :       0.00  pps  
 Expected-CPS    :       0.00  cps  
 Expected-BPS    :       0.00  bps  

 Active-flows    :     6665  Clients :      256   Socket-util : 0.0413 %    
 Open-flows      :  1665014  Servers :    65536   Socket :     6665 Socket/Clients :  26.0 
 drop-rate       :       0.00  bps   
 current time    : 603.5 sec  
 test duration   : 0.0 sec  
 *** TRex is shutting down - cause: 'test has ended'
 latency daemon has stopped
 ==================
 interface sum 
 ==================
 --------------- 
port : 0 
------------
 opackets                                 : 14992636 
 obytes                                   : 1777402520 
 ipackets                                 : 41625088 
 ibytes                                   : 57190607216 
 Tx :      14.97 Mbps  
port : 1 
------------
 opackets                                 : 41626404 
 obytes                                   : 56360414104 
 ipackets                                 : 14992844 
 ibytes                                   : 1477555762 
 Tx :     474.97 Mbps  
 Cpu Utilization : 4.4  %  22.1 Gb/core 
 Platform_factor : 1.0  
 Total-Tx        :     489.94 Mbps  
 Total-Rx        :     494.39 Mbps  
 Total-PPS       :      59.64 Kpps  
 Total-CPS       :       1.52 Kcps  

 Expected-PPS    :       0.00  pps  
 Expected-CPS    :       0.00  cps  
 Expected-BPS    :       0.00  bps  

 Active-flows    :     6664  Clients :      256   Socket-util : 0.0413 %    
 Open-flows      :  1665601  Servers :    65536   Socket :     6664 Socket/Clients :  26.0 
 drop-rate       :       0.00  bps   
 ==================
 

 ==================
 interface sum 
 ==================
------------------------ 
 per core stats core id : 1  
------------------------ 
------------------------ 
 per core per if stats id : 1  
------------------------ 
 port 0, queue id :0  - client 
 ---------------------------- 
 port 1, queue id :0  - server 
 ---------------------------- 
 ==================
 generators 
 ==================


normal
-------------
 min_delta  : 10 usec 
 cnt        : 0 
 high_cnt   : 0 
 max_d_time : 0 usec
 sliding_average    : 0 usec
 precent    : -nan %
 histogram 
 -----------
 m_total_open_flows                      :       1.67 Mflows 
 m_total_open_flows                      : 1665601 
 m_total_close_flows                     : 1658937 
 active                                  : 6664 
 --------------- 
port : 0 
------------
 opackets                                 : 14992636 
 obytes                                   : 1777402520 
 ipackets                                 : 41625088 
 ibytes                                   : 57190607216 
 Tx :      14.97 Mbps  
port : 1 
------------
 opackets                                 : 41626404 
 obytes                                   : 56360414104 
 ipackets                                 : 14992844 
 ibytes                                   : 1477555762 
 Tx :     474.97 Mbps  
 Cpu Utilization : 4.4  %  22.1 Gb/core 
 Platform_factor : 1.0  
 Total-Tx        :     489.94 Mbps  
 Total-Rx        :     494.39 Mbps  
 Total-PPS       :      59.64 Kpps  
 Total-CPS       :       1.52 Kcps  

 Expected-PPS    :       0.00  pps  
 Expected-CPS    :       0.00  cps  
 Expected-BPS    :       0.00  bps  

 Active-flows    :     6664  Clients :      256   Socket-util : 0.0413 %    
 Open-flows      :  1665601  Servers :    65536   Socket :     6664 Socket/Clients :  26.0 
 drop-rate       :       0.00  bps   
 summary stats 
 -------------- 
 Total-pkt-drop       : 1108 pkts 
 Total-tx-bytes       : 58137816624 bytes 
 Total-tx-sw-bytes    : 0 bytes 
 Total-rx-bytes       : 58668162978 byte 
 
 Total-tx-pkt         : 56619040 pkts 
 Total-rx-pkt         : 56617932 pkts 
 Total-sw-tx-pkt      : 0 pkts 
 Total-sw-err         : 0 pkts 
 Total ARP sent       : 2 pkts 
 Total ARP received   : 0 pkts 
                       |          client   |            server   |  
 -----------------------------------------------------------------------------------------
       m_active_flows  |            6664  |             6666  |  active flows
          m_est_flows  |            6659  |             6659  |  active est flows
         m_tx_bw_l7_r  |       3.02 Mbps  |      389.83 Mbps  |  tx bw
         m_rx_bw_l7_r  |     389.83 Mbps  |        3.02 Mbps  |  rx bw
           m_tx_pps_r  |      13.67 Kpps  |       12.15 Kpps  |  tx pps
           m_rx_pps_r  |      39.47 Kpps  |       15.18 Kpps  |  rx pps
           m_avg_size  |       924.09  B  |          1.80 KB  |  average pkt size
                    -  |             ---  |              ---  |  
                  TCP  |             ---  |              ---  |  
                    -  |             ---  |              ---  |  
     tcps_connattempt  |         1665601  |                0  |  connections initiated
         tcps_accepts  |               0  |          1665600  |  connections accepted
        tcps_connects  |         1665596  |          1665593  |  connections established
          tcps_closed  |         1658937  |          1658934  |  conn. closed (includes drops)
       tcps_segstimed  |         4995405  |         13321062  |  segs where we tried to get rtt
      tcps_rttupdated  |         4995379  |         13319229  |  times we succeeded
          tcps_delack  |         8323320  |                0  |  delayed acks sent
        tcps_sndtotal  |        14992636  |         13325139  |  total packets sent
         tcps_sndpack  |         1665596  |          9991751  |  data packets sent
         tcps_sndbyte  |       414733404  |      53430974928  |  data bytes sent
         tcps_sndctrl  |         1665619  |               18  |  control (SYN|FIN|RST) packets sent
         tcps_sndacks  |        11661393  |          3330226  |  ack-only packets sent 
         tcps_rcvpack  |        39950637  |          3329798  |  packets received in sequence
         tcps_rcvbyte  |     53430887916  |        414732657  |  bytes received in sequence
      tcps_rcvackpack  |         3329792  |         13319229  |  rcvd ack packets
      tcps_rcvackbyte  |       416396112  |      53426370242  |  tx bytes acked by rcvd acks
         tcps_preddat  |        36619358  |                0  |  times hdr predict ok for data pkts 
           tcps_drops  |               0  |                1  | *connections dropped
       tcps_conndrops  |               1  |                0  | *embryonic connections dropped
      tcps_rexmttimeo  |              46  |              332  | *retransmit timeouts
       tcps_keeptimeo  |               1  |                1  | *keepalive timeouts
       tcps_keepdrops  |               1  |                1  | *connections dropped in keepalive
   tcps_sndrexmitpack  |              28  |             3144  | *data packets retransmitted
   tcps_sndrexmitbyte  |            6972  |          2265942  | *data bytes retransmitted
      tcps_rcvduppack  |            1994  |              421  | *duplicate-only packets received
      tcps_rcvdupbyte  |           54930  |                0  | *duplicate-only bytes received
  tcps_rcvpartduppack  |              10  |                0  | *packets with some duplicate data
  tcps_rcvpartdupbyte  |           47628  |                0  | *dup. bytes in part-dup. packets
       tcps_rcvoopack  |            8074  |                0  | *out-of-order packets received
       tcps_rcvoobyte  |        10190982  |                0  | *out-of-order bytes received
       tcps_rcvdupack  |             421  |             7728  | *rcvd duplicate acks
       tcps_reasalloc  |            1526  |                0  | *allocate tcp reasembly ctx
        tcps_reasfree  |            1526  |                0  | *free tcp reasembly ctx
                    -  |             ---  |              ---  |  
           Flow Table  |             ---  |              ---  |  
                    -  |             ---  |              ---  |  
       redirect_rx_ok  |             265  |              265  |  redirect to rx OK
-------------- next part --------------
-------------------------------------------------------------------
This is one of the reports I saved back when the test was failing.
-------------------------------------------------------------------

-Per port stats table 
      ports |               0 |               1 
 -----------------------------------------------------------------------------------------
   opackets |          518481 |          548208 
     obytes |        54889755 |       736249102 
   ipackets |          548280 |          193464 
     ibytes |       747217870 |        19261862 
    ierrors |               0 |               0 
    oerrors |               0 |               0 
      Tx Bw |       6.54 Mbps |      15.19 Kbps 

-Global stats enabled 
 Cpu Utilization : 11.1  %  0.1 Gb/core 
 Platform_factor : 1.0  
 Total-Tx        :       6.55 Mbps  
 Total-Rx        :      16.57 Kbps  
 Total-PPS       :       8.34 Kpps  
 Total-CPS       :       2.78 Kcps  

 Expected-PPS    :       0.00  pps  
 Expected-CPS    :       0.00  cps  
 Expected-BPS    :       0.00  bps  

 Active-flows    :   145554  Clients :      256   Socket-util : 0.9028 %    
 Open-flows      :   166422  Servers :    65536   Socket :   145554 Socket/Clients :  568.6 
 drop-rate       :       6.53 Mbps   
 current time    : 62.3 sec  
 test duration   : 0.0 sec  
 *** TRex is shutting down - cause: 'test has ended'
 latency daemon has stopped
 ==================
 interface sum 
 ==================
 --------------- 
port : 0 
------------
 opackets                                 : 518905 
 obytes                                   : 54931307 
 ipackets                                 : 548282 
 ibytes                                   : 747217998 
 Tx :       3.96 Mbps  
port : 1 
------------
 opackets                                 : 548208 
 obytes                                   : 736249102 
 ipackets                                 : 193466 
 ibytes                                   : 19261990 
 Tx :       7.59 Kbps  
 Cpu Utilization : 11.1  %  0.1 Gb/core 
 Platform_factor : 1.0  
 Total-Tx        :       3.97 Mbps  
 Total-Rx        :       8.66 Kbps  
 Total-PPS       :       5.06 Kpps  
 Total-CPS       :       1.42 Kcps  

 Expected-PPS    :       0.00  pps  
 Expected-CPS    :       0.00  cps  
 Expected-BPS    :       0.00  bps  

 Active-flows    :   145693  Clients :      256   Socket-util : 0.9037 %    
 Open-flows      :   166561  Servers :    65536   Socket :   145693 Socket/Clients :  569.1 
 drop-rate       :       3.96 Mbps   
 ==================
 

 ==================
 interface sum 
 ==================
------------------------ 
 per core stats core id : 1  
------------------------ 
------------------------ 
 per core per if stats id : 1  
------------------------ 
 port 0, queue id :0  - client 
 ---------------------------- 
 port 1, queue id :0  - server 
 ---------------------------- 
 ==================
 generators 
 ==================


normal
-------------
 min_delta  : 10 usec 
 cnt        : 0 
 high_cnt   : 0 
 max_d_time : 0 usec
 sliding_average    : 0 usec
 precent    : -nan %
 histogram 
 -----------
 m_total_open_flows                      :     166.56 Kflows 
 m_total_open_flows                      : 166561 
 m_total_close_flows                     : 20868 
 active                                  : 145693 
 --------------- 
port : 0 
------------
 opackets                                 : 518905 
 obytes                                   : 54931307 
 ipackets                                 : 548282 
 ibytes                                   : 747217998 
 Tx :       3.96 Mbps  
port : 1 
------------
 opackets                                 : 548208 
 obytes                                   : 736249102 
 ipackets                                 : 193466 
 ibytes                                   : 19261990 
 Tx :       7.59 Kbps  
 Cpu Utilization : 11.1  %  0.1 Gb/core 
 Platform_factor : 1.0  
 Total-Tx        :       3.97 Mbps  
 Total-Rx        :       8.66 Kbps  
 Total-PPS       :       5.06 Kpps  
 Total-CPS       :       1.42 Kcps  

 Expected-PPS    :       0.00  pps  
 Expected-CPS    :       0.00  cps  
 Expected-BPS    :       0.00  bps  

 Active-flows    :   145693  Clients :      256   Socket-util : 0.9037 %    
 Open-flows      :   166561  Servers :    65536   Socket :   145693 Socket/Clients :  569.1 
 drop-rate       :       3.96 Mbps   
 summary stats 
 -------------- 
 Total-pkt-drop       : 325365 pkts 
 Total-tx-bytes       : 791180409 bytes 
 Total-tx-sw-bytes    : 0 bytes 
 Total-rx-bytes       : 766479988 byte 
 
 Total-tx-pkt         : 1067113 pkts 
 Total-rx-pkt         : 741748 pkts 
 Total-sw-tx-pkt      : 0 pkts 
 Total-sw-err         : 0 pkts 
 Total ARP sent       : 4 pkts 
 Total ARP received   : 8 pkts 
                       |          client   |            server   |  
 -----------------------------------------------------------------------------------------
       m_active_flows  |          145693  |             1399  |  active flows
          m_est_flows  |            1393  |             1394  |  active est flows
         m_tx_bw_l7_r  |       0.00  bps  |        0.00  bps  |  tx bw
         m_rx_bw_l7_r  |       0.00  bps  |        0.00  bps  |  rx bw
           m_tx_pps_r  |       4.26 Kpps  |        1.20  pps  |  tx pps
           m_rx_pps_r  |       0.00  pps  |        0.00  pps  |  rx pps
           m_avg_size  |         0.00  B  |          0.00  B  |  average pkt size
                    -  |             ---  |              ---  |  
                  TCP  |             ---  |              ---  |  
                    -  |             ---  |              ---  |  
     tcps_connattempt  |          166561  |                0  |  connections initiated
         tcps_accepts  |               0  |            22261  |  connections accepted
        tcps_connects  |           22261  |            22256  |  connections established
          tcps_closed  |           20868  |            20862  |  conn. closed (includes drops)
       tcps_segstimed  |          209695  |           172527  |  segs where we tried to get rtt
      tcps_rttupdated  |           65385  |           171128  |  times we succeeded
          tcps_delack  |          108525  |                0  |  delayed acks sent
        tcps_sndtotal  |          518905  |           183653  |  total packets sent
         tcps_sndpack  |           22261  |           129398  |  data packets sent
         tcps_sndbyte  |         5542989  |        689710134  |  data bytes sent
         tcps_sndctrl  |          335256  |               34  |  control (SYN|FIN|RST) packets sent
         tcps_sndacks  |          161368  |            43129  |  ack-only packets sent 
         tcps_rcvpack  |          514821  |            43124  |  packets received in sequence
         tcps_rcvbyte  |       689710134  |          5541744  |  bytes received in sequence
      tcps_rcvackpack  |           43124  |           171128  |  rcvd ack packets
      tcps_rcvackbyte  |         5562612  |        681820178  |  tx bytes acked by rcvd acks
         tcps_preddat  |          471697  |                0  |  times hdr predict ok for data pkts 
      tcps_rexmttimeo  |          168715  |             5580  | *retransmit timeouts
   tcps_sndrexmitpack  |              20  |            11092  | *data packets retransmitted
   tcps_sndrexmitbyte  |            4980  |          7986240  | *data bytes retransmitted
      tcps_rcvduppack  |           11102  |                0  | *duplicate-only packets received
      tcps_rcvdupbyte  |         7986240  |                0  | *duplicate-only bytes received
       tcps_rcvdupack  |              10  |                0  | *rcvd duplicate acks
                    -  |             ---  |              ---  |  
           Flow Table  |             ---  |              ---  |  
                    -  |             ---  |              ---  |  
              err_cwf  |              24  |                0  | *client pkt without flow
       redirect_rx_ok  |              72  |               75  |  redirect to rx OK


