[Jool-list] BIB-less NAT64
Alberto Leiva
ydahhrk at gmail.com
Wed Oct 11 12:38:11 CDT 2017
Okay, I think we're done with the heavy testing. There are still a few
loose ends I'd like to wrap up, but I think Sander and I agree that it's
time to put this thread's concerns to rest. The results may be
disappointing in a way, but at least we didn't come out empty-handed.
Summary first:
Jool, even 3.5.4 Jool, withstands T-Rex's torture traffic without
flinching. There are no significant performance issues to worry about. CPU
usage is at 1% at worst and there are no packet drops. The bottleneck was
some hardware quirk or some software misconfiguration that, unfortunately,
we never spotted.
(Come to think of it, T-Rex uses more CPU than Jool. At least when BIB
logging is turned off. I didn't expect that--considering T-Rex uses DPDK.)
I'm going to detail my thoughts now. If all you wanted was the outcome of
the performance testing voyage, that was it, so you don't need to read this
mail any further.
So, basically, I have personally not seen any traces of any struggle Jool
has ever gone through while being pummeled by T-Rex's traffic. I can't
explain Sander's soft lockup at all, but for as long as I have run the
test, the machine has remained responsive and `top` always reports minimal
CPU utilization. (As long as I turn BIB logging off; rsyslog does seem to
struggle due to the logging volume.) From Jool's perspective, packets are
being translated uneventfully, and the kernel returns success whenever Jool
entrusts
a packet to it. (Although admittedly, that might or might not mean that the
packet was successfully delivered due to the random and imperfect nature of
networks.) Whatever was causing mayhem, Jool wasn't even aware of it.
So I'd like to clarify:
> it also seems that session creation has a higher priority than packet
> forwarding
I will not deny this, but I should mention that this is not so much a
priority issue as a natural consequence of the fact that the translated
packet is sent *after* it is translated. Once Jool has done its
job, it is up to the kernel whether the packet is posted on the interface,
and even then, it is up to the network to deliver the packet to the
destination.
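To make that concrete, here's a minimal user-space sketch (my own illustration, not Jool's actual code; all names are hypothetical) of the per-packet order: the session entry is created or looked up inline, on the same path that forwards the packet, and the translated packet is only handed off afterwards.

```python
# Hypothetical sketch of a stateful translator's per-packet pipeline.
# None of these names correspond to Jool's real API.

sessions = {}  # (src, dst) -> assigned translation mapping

def transmit(packet):
    # Stand-in for handing the packet to the kernel. A success return
    # here means "queued", not "delivered"; the network may still drop it.
    return True

def handle_packet(src, dst, payload):
    key = (src, dst)
    if key not in sessions:
        # Session creation happens inline, on the forwarding path itself;
        # it is not a separate, higher-priority task.
        sessions[key] = len(sessions)
    translated = (sessions[key], dst, payload)  # "translate" the packet
    return transmit(translated)                 # the hand-off comes last

handle_packet("2001:db8::1", "192.0.2.1", b"hello")
```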
We used to get very strange performance reports (speed going up and down
with massive drop rates), until at some point in the past month, the issue
appeared to (seemingly independently of anything I did) fix itself. And
that's why I'm down on this whole ordeal: We can't go back to the old setup
to continue testing. There is no way to find the root of the problem now. I
had hoped this whole effort would result in a documentation note akin to
the offload warning, so users could be aware of it, but there is nothing I
can do anymore. All I can hope is that this was a hardware bug which
silently prevented offloads from being turned off or something like that,
and which eventually fixed itself as a result of my constant random
tweaking everywhere.
But at least, I can now say fairly confidently that Jool is pretty darn
fast, even without the latest performance tweaks applied, as evidenced by
the fact that, now that whatever was hobbling it before is gone, it is
pretty clear that Jool can keep up with at least this configuration of
T-Rex with flying colors.
Well, that is what I personally am getting out of the output, anyway. In
case anyone wants to review T-Rex's latest report, I have attached it to
this mail along with one of the old/failing ones. Here's what I believe is
the official documentation of this output:
https://trex-tgn.cisco.com/trex/doc/trex_manual.html#_trex_output. The
numbers I'm basing my claims on are the drop-rate (outright zero), the
"Total-Tx" and "Total-Rx" numbers (always close to each other) and the
active flows ("m_active_flows" and "m_est_flows"; also always close to each
other). One thing that has baffled me this whole time is the Transmission
Bandwidth ("Tx Bw"). I don't understand why it would be so different for
each port, but then, I don't really understand where T-Rex gets this
information in the first place.
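For anyone who wants to sanity-check the attached reports the same way, here's a quick throwaway parser for the handful of counters I mention above. It's my own helper (field names copied from the report format), not anything shipped with T-Rex:

```python
import re

def parse_trex_summary(text):
    """Extract the counters discussed above from a T-Rex report dump."""
    fields = {}
    for name in ("Total-Tx", "Total-Rx", "drop-rate"):
        m = re.search(rf"{re.escape(name)}\s*:\s*([\d.]+)\s*([KMG]?bps)", text)
        if m:
            fields[name] = (float(m.group(1)), m.group(2))
    # Client/server columns of the active-flows row in the stats table.
    m = re.search(r"m_active_flows\s*\|\s*(\d+)\s*\|\s*(\d+)", text)
    if m:
        fields["m_active_flows"] = (int(m.group(1)), int(m.group(2)))
    return fields

# Excerpt from the passing report attached below:
report = """
Total-Tx       :     776.73 Mbps
Total-Rx       :     783.78 Mbps
drop-rate      :       0.00  bps
m_active_flows | 6664 | 6666 | active flows
"""
summary = parse_trex_summary(report)
print(summary["drop-rate"])  # (0.0, 'bps')
```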
When I said that we didn't come out empty-handed, I was referring to a
little bottleneck I spotted while looking at Jool's numbers. It turns out
that the session timer can hold a spinlock for an irresponsibly long time
(I probably underestimated those tree removal rebalances), which keeps
packets being translated in the meantime waiting. It is not overly
significant, because it affects the maximum translation timestamp far more
than the average one, but for as much of a micro-optimization as it is,
addressing it would certainly be more productive than dealing with the
cache misses that kernel developers usually seem to obsess over. But it's
not going to be noticeably faster either, which is why it should be safe
to postpone it to the release after the upcoming one.
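The obvious remedy, whenever I get to it, is the standard one: have the expiration sweep drop and re-take the lock every few removals, so translating threads can slip in between batches. A user-space Python sketch of the idea (hypothetical names; Jool's actual code is kernel C with a spinlock):

```python
import threading

lock = threading.Lock()  # stands in for the session-table spinlock
sessions = {f"session-{i}": object() for i in range(10_000)}

def expire_sessions(batch_size=256):
    """Remove all expired sessions, but release the lock between batches
    so threads translating packets are never kept waiting for long."""
    expired = list(sessions)
    for start in range(0, len(expired), batch_size):
        with lock:
            # Hold the lock only for one small batch of removals...
            for key in expired[start:start + batch_size]:
                sessions.pop(key, None)
        # ...then release it here, letting translators grab it briefly.

expire_sessions()
print(len(sessions))  # 0
```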
Regarding the BIB-pool4 optimization I explained a few mails back: This
wasn't the problem at all because T-Rex's traffic was not enough to exhaust
pool4. I believe I will still end up merging the solution into master
regardless.
Finally, I need to announce that I'm going out for a week's worth of
vacation starting tomorrow, so I won't be able to merge all the latest
work until I get back. In the meantime, since there are no severe
performance issues to note, using the latest version of Jool shouldn't be
too hazardous. I do recommend master over 3.5.4 due to the latest bugfix,
though.
I'm still keeping an eye on the list.
On Fri, Sep 15, 2017 at 2:03 PM, Alberto Leiva <ydahhrk at gmail.com> wrote:
> It's strange that the third stream pushed the SIIT further when the NAT64
> was already at 100% with the second stream. I'd have expected the load to be
> the same since the NAT64 could not, in theory, send the SIIT any more
> packets. Or am I misunderstanding something?
>
> Also, I'm really stumped that you managed to peak using only two streams.
> This should render all database lookups practically instantaneous. If this
> is the case then... yeah. I guess there's not much I will be able to do
> from code.
>
> But we'll see.
>
> On Fri, Sep 15, 2017 at 12:18 PM, Alan Whinery <whinery at hawaii.edu> wrote:
>
>>
>> Sort of a digression, but since Alberto referred to Linux router
>> performance --
>>
>> After I got the Jool/Jool v6-only NAT64/bitw-464CLAT scenario working, I
>> tried some file transfers at 100 Mbps to v4-numeric addresses, so it was
>> hitting both boxes (VMs, actually). Watching the software interrupt load
>> with top, I was getting around 10% load on the first 100 Mbps stream and a
>> second stream pushed NAT64 to 100% load on SI, while CLAT was only doing
>> about 30%, third stream 100% NAT64, 90% CLAT.
>>
>> Attached PDF is what I wrote when I still remembered, about increasing
>> cores and spreading CPU affinity to mitigate.
>>
>> The point being that there are things to be understood about Linux router
>> performance, in tandem with NAT64/SIIT performance. For one, rolling in the
>> right off-loading, coalescence, etc, as well as CPU affinity to tune the
>> box like a router, rather than as a host. This stuff might be a problem
>> well before you get to the network scale that has been tested with TRex.
>>
>> On 9/15/2017 5:49 AM, Alberto Leiva wrote:
>>
>> Thank you!
>>
>> > One thing I have been wondering about is if the TRex side gets confused
>> and Jool is actually ok. If that is the case then I apologise!
>>
>> Well, who knows. I'm thinking that, if a normal Linux router passes a
>> similar test but a Linux NAT64 running Jool doesn't, then there should in
>> theory be something that can be done.
>>
>> > What would be the best way to check that? Massive pcaps?
>>
>> I will compile a version with a bunch of timestamp tracking and see if we
>> can get some conclusions out of it.
>>
>> Working...
>>
>> On Fri, Sep 15, 2017 at 5:24 AM, Sander Steffann <sander at steffann.nl>
>> wrote:
>>
>>> Hi,
>>>
>>> > Okay, guys. Prototype ready. I didn't test a gazillion connections,
>>> but as far as basic functionality goes, it looks stable. Don't quote me on
>>> that, though.
>>> >
>>> > Experimental branch in fake-nat64, in case anyone wants to try it out:
>>> https://github.com/NICMx/Jool/tree/fake-nat64
>>>
>>> Sorry, it still collapses :(
>>>
>>> I recorded a small test here: http://www.steffann.nl/sander/
>>> Fake%20NAT64%20collapse.mov
>>>
>>> The behaviour is really strange. One thing I have been wondering about
>>> is if the TRex side gets confused and Jool is actually ok. If that is the
>>> case then I apologise! What would be the best way to check that? Massive
>>> pcaps?
>>>
>>> Cheers,
>>> Sander
>>>
>>>
>>
>>
>> _______________________________________________
>> Jool-list mailing list
>> Jool-list at nic.mx
>> https://mail-lists.nic.mx/listas/listinfo/jool-list
>>
>>
>>
>>
>>
>
-------------- next part --------------
-------------------------------------------------------------------
This is a report I got just now.
-------------------------------------------------------------------
-Per port stats table
ports | 0 | 1
-----------------------------------------------------------------------------------------
opackets | 14987324 | 41611624
obytes | 1776772320 | 56340402998
ipackets | 41610306 | 14987530
ibytes | 57170300382 | 1477031674
ierrors | 0 | 0
oerrors | 0 | 0
Tx Bw | 23.74 Mbps | 752.99 Mbps
-Global stats enabled
Cpu Utilization : 4.4 % 35.0 Gb/core
Platform_factor : 1.0
Total-Tx : 776.73 Mbps
Total-Rx : 783.78 Mbps
Total-PPS : 94.55 Kpps
Total-CPS : 2.78 Kcps
Expected-PPS : 0.00 pps
Expected-CPS : 0.00 cps
Expected-BPS : 0.00 bps
Active-flows : 6665 Clients : 256 Socket-util : 0.0413 %
Open-flows : 1665014 Servers : 65536 Socket : 6665 Socket/Clients : 26.0
drop-rate : 0.00 bps
current time : 603.5 sec
test duration : 0.0 sec
*** TRex is shutting down - cause: 'test has ended'
latency daemon has stopped
==================
interface sum
==================
---------------
port : 0
------------
opackets : 14992636
obytes : 1777402520
ipackets : 41625088
ibytes : 57190607216
Tx : 14.97 Mbps
port : 1
------------
opackets : 41626404
obytes : 56360414104
ipackets : 14992844
ibytes : 1477555762
Tx : 474.97 Mbps
Cpu Utilization : 4.4 % 22.1 Gb/core
Platform_factor : 1.0
Total-Tx : 489.94 Mbps
Total-Rx : 494.39 Mbps
Total-PPS : 59.64 Kpps
Total-CPS : 1.52 Kcps
Expected-PPS : 0.00 pps
Expected-CPS : 0.00 cps
Expected-BPS : 0.00 bps
Active-flows : 6664 Clients : 256 Socket-util : 0.0413 %
Open-flows : 1665601 Servers : 65536 Socket : 6664 Socket/Clients : 26.0
drop-rate : 0.00 bps
==================
==================
interface sum
==================
------------------------
per core stats core id : 1
------------------------
------------------------
per core per if stats id : 1
------------------------
port 0, queue id :0 - client
----------------------------
port 1, queue id :0 - server
----------------------------
==================
generators
==================
normal
-------------
min_delta : 10 usec
cnt : 0
high_cnt : 0
max_d_time : 0 usec
sliding_average : 0 usec
precent : -nan %
histogram
-----------
m_total_open_flows : 1.67 Mflows
m_total_open_flows : 1665601
m_total_close_flows : 1658937
active : 6664
---------------
port : 0
------------
opackets : 14992636
obytes : 1777402520
ipackets : 41625088
ibytes : 57190607216
Tx : 14.97 Mbps
port : 1
------------
opackets : 41626404
obytes : 56360414104
ipackets : 14992844
ibytes : 1477555762
Tx : 474.97 Mbps
Cpu Utilization : 4.4 % 22.1 Gb/core
Platform_factor : 1.0
Total-Tx : 489.94 Mbps
Total-Rx : 494.39 Mbps
Total-PPS : 59.64 Kpps
Total-CPS : 1.52 Kcps
Expected-PPS : 0.00 pps
Expected-CPS : 0.00 cps
Expected-BPS : 0.00 bps
Active-flows : 6664 Clients : 256 Socket-util : 0.0413 %
Open-flows : 1665601 Servers : 65536 Socket : 6664 Socket/Clients : 26.0
drop-rate : 0.00 bps
summary stats
--------------
Total-pkt-drop : 1108 pkts
Total-tx-bytes : 58137816624 bytes
Total-tx-sw-bytes : 0 bytes
Total-rx-bytes : 58668162978 byte
Total-tx-pkt : 56619040 pkts
Total-rx-pkt : 56617932 pkts
Total-sw-tx-pkt : 0 pkts
Total-sw-err : 0 pkts
Total ARP sent : 2 pkts
Total ARP received : 0 pkts
| client | server |
-----------------------------------------------------------------------------------------
m_active_flows | 6664 | 6666 | active flows
m_est_flows | 6659 | 6659 | active est flows
m_tx_bw_l7_r | 3.02 Mbps | 389.83 Mbps | tx bw
m_rx_bw_l7_r | 389.83 Mbps | 3.02 Mbps | rx bw
m_tx_pps_r | 13.67 Kpps | 12.15 Kpps | tx pps
m_rx_pps_r | 39.47 Kpps | 15.18 Kpps | rx pps
m_avg_size | 924.09 B | 1.80 KB | average pkt size
- | --- | --- |
TCP | --- | --- |
- | --- | --- |
tcps_connattempt | 1665601 | 0 | connections initiated
tcps_accepts | 0 | 1665600 | connections accepted
tcps_connects | 1665596 | 1665593 | connections established
tcps_closed | 1658937 | 1658934 | conn. closed (includes drops)
tcps_segstimed | 4995405 | 13321062 | segs where we tried to get rtt
tcps_rttupdated | 4995379 | 13319229 | times we succeeded
tcps_delack | 8323320 | 0 | delayed acks sent
tcps_sndtotal | 14992636 | 13325139 | total packets sent
tcps_sndpack | 1665596 | 9991751 | data packets sent
tcps_sndbyte | 414733404 | 53430974928 | data bytes sent
tcps_sndctrl | 1665619 | 18 | control (SYN|FIN|RST) packets sent
tcps_sndacks | 11661393 | 3330226 | ack-only packets sent
tcps_rcvpack | 39950637 | 3329798 | packets received in sequence
tcps_rcvbyte | 53430887916 | 414732657 | bytes received in sequence
tcps_rcvackpack | 3329792 | 13319229 | rcvd ack packets
tcps_rcvackbyte | 416396112 | 53426370242 | tx bytes acked by rcvd acks
tcps_preddat | 36619358 | 0 | times hdr predict ok for data pkts
tcps_drops | 0 | 1 | *connections dropped
tcps_conndrops | 1 | 0 | *embryonic connections dropped
tcps_rexmttimeo | 46 | 332 | *retransmit timeouts
tcps_keeptimeo | 1 | 1 | *keepalive timeouts
tcps_keepdrops | 1 | 1 | *connections dropped in keepalive
tcps_sndrexmitpack | 28 | 3144 | *data packets retransmitted
tcps_sndrexmitbyte | 6972 | 2265942 | *data bytes retransmitted
tcps_rcvduppack | 1994 | 421 | *duplicate-only packets received
tcps_rcvdupbyte | 54930 | 0 | *duplicate-only bytes received
tcps_rcvpartduppack | 10 | 0 | *packets with some duplicate data
tcps_rcvpartdupbyte | 47628 | 0 | *dup. bytes in part-dup. packets
tcps_rcvoopack | 8074 | 0 | *out-of-order packets received
tcps_rcvoobyte | 10190982 | 0 | *out-of-order bytes received
tcps_rcvdupack | 421 | 7728 | *rcvd duplicate acks
tcps_reasalloc | 1526 | 0 | *allocate tcp reasembly ctx
tcps_reasfree | 1526 | 0 | *free tcp reasembly ctx
- | --- | --- |
Flow Table | --- | --- |
- | --- | --- |
redirect_rx_ok | 265 | 265 | redirect to rx OK
-------------- next part --------------
-------------------------------------------------------------------
This is one of the reports I saved back when the test was failing.
-------------------------------------------------------------------
-Per port stats table
ports | 0 | 1
-----------------------------------------------------------------------------------------
opackets | 518481 | 548208
obytes | 54889755 | 736249102
ipackets | 548280 | 193464
ibytes | 747217870 | 19261862
ierrors | 0 | 0
oerrors | 0 | 0
Tx Bw | 6.54 Mbps | 15.19 Kbps
-Global stats enabled
Cpu Utilization : 11.1 % 0.1 Gb/core
Platform_factor : 1.0
Total-Tx : 6.55 Mbps
Total-Rx : 16.57 Kbps
Total-PPS : 8.34 Kpps
Total-CPS : 2.78 Kcps
Expected-PPS : 0.00 pps
Expected-CPS : 0.00 cps
Expected-BPS : 0.00 bps
Active-flows : 145554 Clients : 256 Socket-util : 0.9028 %
Open-flows : 166422 Servers : 65536 Socket : 145554 Socket/Clients : 568.6
drop-rate : 6.53 Mbps
current time : 62.3 sec
test duration : 0.0 sec
*** TRex is shutting down - cause: 'test has ended'
latency daemon has stopped
==================
interface sum
==================
---------------
port : 0
------------
opackets : 518905
obytes : 54931307
ipackets : 548282
ibytes : 747217998
Tx : 3.96 Mbps
port : 1
------------
opackets : 548208
obytes : 736249102
ipackets : 193466
ibytes : 19261990
Tx : 7.59 Kbps
Cpu Utilization : 11.1 % 0.1 Gb/core
Platform_factor : 1.0
Total-Tx : 3.97 Mbps
Total-Rx : 8.66 Kbps
Total-PPS : 5.06 Kpps
Total-CPS : 1.42 Kcps
Expected-PPS : 0.00 pps
Expected-CPS : 0.00 cps
Expected-BPS : 0.00 bps
Active-flows : 145693 Clients : 256 Socket-util : 0.9037 %
Open-flows : 166561 Servers : 65536 Socket : 145693 Socket/Clients : 569.1
drop-rate : 3.96 Mbps
==================
==================
interface sum
==================
------------------------
per core stats core id : 1
------------------------
------------------------
per core per if stats id : 1
------------------------
port 0, queue id :0 - client
----------------------------
port 1, queue id :0 - server
----------------------------
==================
generators
==================
normal
-------------
min_delta : 10 usec
cnt : 0
high_cnt : 0
max_d_time : 0 usec
sliding_average : 0 usec
precent : -nan %
histogram
-----------
m_total_open_flows : 166.56 Kflows
m_total_open_flows : 166561
m_total_close_flows : 20868
active : 145693
---------------
port : 0
------------
opackets : 518905
obytes : 54931307
ipackets : 548282
ibytes : 747217998
Tx : 3.96 Mbps
port : 1
------------
opackets : 548208
obytes : 736249102
ipackets : 193466
ibytes : 19261990
Tx : 7.59 Kbps
Cpu Utilization : 11.1 % 0.1 Gb/core
Platform_factor : 1.0
Total-Tx : 3.97 Mbps
Total-Rx : 8.66 Kbps
Total-PPS : 5.06 Kpps
Total-CPS : 1.42 Kcps
Expected-PPS : 0.00 pps
Expected-CPS : 0.00 cps
Expected-BPS : 0.00 bps
Active-flows : 145693 Clients : 256 Socket-util : 0.9037 %
Open-flows : 166561 Servers : 65536 Socket : 145693 Socket/Clients : 569.1
drop-rate : 3.96 Mbps
summary stats
--------------
Total-pkt-drop : 325365 pkts
Total-tx-bytes : 791180409 bytes
Total-tx-sw-bytes : 0 bytes
Total-rx-bytes : 766479988 byte
Total-tx-pkt : 1067113 pkts
Total-rx-pkt : 741748 pkts
Total-sw-tx-pkt : 0 pkts
Total-sw-err : 0 pkts
Total ARP sent : 4 pkts
Total ARP received : 8 pkts
| client | server |
-----------------------------------------------------------------------------------------
m_active_flows | 145693 | 1399 | active flows
m_est_flows | 1393 | 1394 | active est flows
m_tx_bw_l7_r | 0.00 bps | 0.00 bps | tx bw
m_rx_bw_l7_r | 0.00 bps | 0.00 bps | rx bw
m_tx_pps_r | 4.26 Kpps | 1.20 pps | tx pps
m_rx_pps_r | 0.00 pps | 0.00 pps | rx pps
m_avg_size | 0.00 B | 0.00 B | average pkt size
- | --- | --- |
TCP | --- | --- |
- | --- | --- |
tcps_connattempt | 166561 | 0 | connections initiated
tcps_accepts | 0 | 22261 | connections accepted
tcps_connects | 22261 | 22256 | connections established
tcps_closed | 20868 | 20862 | conn. closed (includes drops)
tcps_segstimed | 209695 | 172527 | segs where we tried to get rtt
tcps_rttupdated | 65385 | 171128 | times we succeeded
tcps_delack | 108525 | 0 | delayed acks sent
tcps_sndtotal | 518905 | 183653 | total packets sent
tcps_sndpack | 22261 | 129398 | data packets sent
tcps_sndbyte | 5542989 | 689710134 | data bytes sent
tcps_sndctrl | 335256 | 34 | control (SYN|FIN|RST) packets sent
tcps_sndacks | 161368 | 43129 | ack-only packets sent
tcps_rcvpack | 514821 | 43124 | packets received in sequence
tcps_rcvbyte | 689710134 | 5541744 | bytes received in sequence
tcps_rcvackpack | 43124 | 171128 | rcvd ack packets
tcps_rcvackbyte | 5562612 | 681820178 | tx bytes acked by rcvd acks
tcps_preddat | 471697 | 0 | times hdr predict ok for data pkts
tcps_rexmttimeo | 168715 | 5580 | *retransmit timeouts
tcps_sndrexmitpack | 20 | 11092 | *data packets retransmitted
tcps_sndrexmitbyte | 4980 | 7986240 | *data bytes retransmitted
tcps_rcvduppack | 11102 | 0 | *duplicate-only packets received
tcps_rcvdupbyte | 7986240 | 0 | *duplicate-only bytes received
tcps_rcvdupack | 10 | 0 | *rcvd duplicate acks
- | --- | --- |
Flow Table | --- | --- |
- | --- | --- |
err_cwf | 24 | 0 | *client pkt without flow
redirect_rx_ok | 72 | 75 | redirect to rx OK