[Jool-list] Moving to jool

Alberto Leiva ydahhrk at gmail.com
Fri Nov 15 19:09:50 CST 2019


Half-assed but perhaps relevant update, since the upcoming weekend is
going to be four days long for me:

I found something really weird:

1. If I initialize one IPv6->IPv4 iperf stream, I get an 8 Gbit/sec stream.
2. If I initialize two simultaneous IPv6->IPv4 iperf streams, I get two
   ~5 Gbit/sec streams (10 Gbit/sec total). This is a bit odd.
3. BUT THEN, if I initialize one IPv6->IPv4 iperf stream, and
   (simultaneously) one IPv4->IPv6 stream, I get two 8 Gbit/sec streams
   (16 Gbit/sec total).

And you know what's even weirder? CPU usage remains at 50% (ie. one of
two CPUs used) during all three tests. It's as though CPU usage does
not depend on traffic.

I'm starting to suspect that Jool is not the bottleneck here. I wonder
whether it's even a player at all in the game of performance.

I read somewhere that the kernel tends to assign one CPU to each
interface. (Interfaces might share CPUs if you don't have enough CPUs
I suppose.) This more or less matches the results above and would
simply mean that, if you want to make use of all your CPUs, you'd have
to engineer your traffic so it's spread through different interfaces.
Assuming it scales linearly, you would presumably get 8 Gbit/sec on
every interface receiving traffic.
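
One rough way to check the one-CPU-per-interface theory is to look at
how each interface's interrupt counters are split across CPUs in
/proc/interrupts. Here is a minimal Python sketch of that check. The
counters in SAMPLE are borrowed from my earlier /proc/interrupts paste;
the IRQ numbers and the "IO-APIC ...-fasteoi" columns are made up for
illustration, and the function names are hypothetical (they don't
belong to any existing tool).

```python
# Sketch: summarize how each interface's interrupts are spread across CPUs.
# SAMPLE mimics the layout of /proc/interrupts. The IRQ numbers and the
# controller columns are assumptions; only the counters and interface
# names come from an actual (trimmed) paste.
SAMPLE = """\
           CPU0       CPU1
 16:       3551     112239   IO-APIC  16-fasteoi   enp0s8
 19:    4825321         49   IO-APIC  19-fasteoi   enp0s3
"""


def parse_interrupts(text):
    """Return {device: [count_cpu0, count_cpu1, ...]} for numbered IRQ rows."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        fields = line.split()
        # Only keep device IRQ rows ("16:"), not summary rows (NMI:, ERR:, ...).
        if not fields[0].rstrip(":").isdigit():
            continue
        counts = fields[1:1 + ncpus]
        if len(counts) < ncpus or not all(c.isdigit() for c in counts):
            continue
        if len(fields) <= 1 + ncpus:
            continue  # no description column, so no device name
        # Simplification: the last token is taken as the device name.
        table[fields[-1]] = [int(c) for c in counts]
    return table


def busiest_cpu(counts):
    """Return (cpu_index, share_of_interrupts) for the most loaded CPU."""
    total = sum(counts)
    cpu = max(range(len(counts)), key=counts.__getitem__)
    share = counts[cpu] / total if total else 0.0
    return cpu, share


if __name__ == "__main__":
    for device, counts in parse_interrupts(SAMPLE).items():
        cpu, share = busiest_cpu(counts)
        print(f"{device}: CPU{cpu} handled {share:.1%} of interrupts")
```

To run it against a live machine, pass the contents of
/proc/interrupts to parse_interrupts() instead of SAMPLE. If nearly all
of an interface's interrupts land on a single CPU, that is consistent
with the theory above, though it doesn't prove it.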

Perhaps this is just the way it's meant to work on Linux. I keep
finding sources claiming that haphazardly spreading traffic across
different CPUs might actually degrade performance rather than improve
it: [0][1][2]. The reason they give sounds fairly illogical to me, but
I'm far from an expert on that whole cache-swapping magic.

But I've been trying to pull it off anyway, just to see it working for
myself, and I just can't get it to work for some reason. Traffic that
shares an iperf source is always channeled through the same CPU. If
you want to experiment with it, it's called "SMP affinity." Here's
some documentation: [3][4]
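
For anyone who wants to try, the knob is a hexadecimal CPU bitmask in
/proc/irq/<IRQ>/smp_affinity (bit 0 is CPU0, bit 1 is CPU1, and so
on). Below is a minimal Python sketch of the mask arithmetic, under the
assumption of a machine with few CPUs. The function names are
hypothetical, and apply_mask() needs root and is untested on my setup;
a running irqbalance daemon, if there is one, may also overwrite
whatever you set.

```python
# Sketch: convert between CPU lists and smp_affinity hex bitmasks.
# All function names here are made up for illustration.


def cpus_to_mask(cpus):
    """Return the hex mask (without 0x prefix) selecting the given CPUs."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    # Assumption: few CPUs. Large systems show the mask in comma-separated
    # 32-bit groups, which this function does not produce.
    return format(mask, "x")


def mask_to_cpus(mask_text):
    """Return the sorted CPU indexes selected by a hex mask string."""
    # Reading tolerates the comma-separated groups used on larger systems.
    value = int(mask_text.strip().replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def apply_mask(irq, cpus):
    """Pin one IRQ to the given CPUs. Requires root; not called below."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as handle:
        handle.write(cpus_to_mask(cpus))


if __name__ == "__main__":
    print(cpus_to_mask([0]))     # CPU0 only
    print(cpus_to_mask([0, 1]))  # CPU0 and CPU1
    print(mask_to_cpus("2"))     # which CPUs does mask 2 select?
```

The IRQ number to pass to apply_mask() is the first column of the
interface's row in /proc/interrupts.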

[0] https://serverfault.com/a/447647
[1] https://serverfault.com/a/918748
[2] https://www.kernel.org/doc/ols/2009/ols2009-pages-169-184.pdf (Section 1.4)
[3] http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux
[4] https://cs.uwaterloo.ca/~brecht/servers/apic/SMP-affinity.txt

On Thu, Nov 14, 2019 at 11:53 AM Nico Schottelius
<nico.schottelius at ungleich.ch> wrote:
>
>
> Good evening everyone,
>
> thanks for following up and also for adding cool new pictures on the
> jool website - it's cool to see some movements - AND to that guy who
> created an Alpine package - I owe you a drink of choice!
>
> So many things happening... today I've started to look into joold,
> because we are considering running 2 active routers with the same
> available IPv4 addresses.
>
> Regarding the UDP loss, Stefan Brudny (see mail below) actually
> pointed me to an open bug in iperf that I wasn't aware of either. So
> this might all be a false positive, for which I'm very sorry if it
> cost anyone time!
>
> Best regards and many motivated greetings from the mountains,
>
> Nico
>
> p.s.: From a performance point of view and remembering how P4 lets you
> modify packets, I think jool should be able to handle native forwarding
> / line speed, as the actual modifications are very little and the
> information required even fits into every L1 cache. Minus the obvious OS
> overhead.
>
>
> --------------------------------------------------------------------------------
> From: Stefan Brudny <stefan.brudny at gmail.com>
> To: Nico Schottelius <nico.schottelius at ungleich.ch>
> Subject: Re: [Jool-list] Moving to jool
> Flags: replied, seen
> Date: Thu 07 Nov 2019 11:11:37 PM CET
> Maildir: /ungleich/2019
>
> Gents,
>
> Blind shot regarding the packet loss: I was experiencing extreme UDP
> packet loss in Azure, not related to any nat64, different service. I
> was using iperf. It turned out that iperf has a bug, and in some
> environments and configurations it misbehaves.
>
> https://github.com/esnet/iperf/issues/296
>
> I used nttcp and udp packet loss dropped from 90 to 0.1%.
>
> BTW, used jool for poc to find solution for pfsense, and it works
> perfectly. Heads up too.
>
> SB
> --------------------------------------------------------------------------------
>
> Alberto Leiva <ydahhrk at gmail.com> writes:
>
> > Ok, I was able to replicate 8 Gbit/sec by using virtualization (since
> > my physical hardware cannot keep up at all). I can confirm that
> >
> > - according to top, the NAT64 machine refuses to exceed 100% CPU utilization
> >   (which allegedly signifies that only one CPU is being used), and
> > - according to /proc/interrupts, most traffic that shares an incoming
> >   interface also shares CPU:
> >
> >     $ cat /proc/interrupts
> >     CPU0       CPU1
> >     3551       112239    enp0s8
> >     4825321    49        enp0s3
> > (Output trimmed to only relevant rows and columns)
> >
> > I don't know when this started happening, but considering that
> > performance is (in my experience) most people's main concern, I do
> > think this is a problem that needs immediate attention.
> >
> > I don't think this is a Jool bug; it's simply the way the kernel is
> > configured to handle interrupts by default. However, it's certainly
> > worth a note in the documentation, to ease the solution for people who
> > need to squeeze as much performance out of their translator as
> > possible. I just hope it doesn't require a custom kernel...
> >
> > I will try to figure this out and should come back in a few days with
> > more information.
> >
> > ------------------------------------------
> >
> > I still haven't figured out what's with the "Datagram Lost" column.
> > Sometimes iperf's output is quite nonsensical; I have seen it report
> > literally 100% datagram lost rate and yet the reported "speed" is 8
> > Gbit/sec. I don't understand what's up with this. Maybe it's a
> > checksum problem (ie. the packets arrive but the checksum is incorrect
> > so iperf reports them as arrived and lost at the same time), but then
> > it's strange that I can't identify any artifacts in video streams.
> > This needs to be investigated further.
> >
> > Working...
> >
> > On Thu, Nov 7, 2019 at 12:40 PM Nico Schottelius
> > <nico.schottelius at ungleich.ch> wrote:
> >>
> >>
> >> Good evening Jordi, Alberto,
> >>
> >>
> >> JORDI PALET MARTINEZ <jordi.palet at consulintel.es> writes:
> >>
> >> > Hi Nico,
> >> >
> >> > I have read your complete document when you sent it to the list, and I want to thank you for it.
> >> >
> >> > I'm a frequent user of Jool, and teach about it to the community and
> >> > customers.
> >>
> >> Very nice!
> >>
> >> > I was also surprised about your UDP failures, I've never seen that before, so as you just said, it may be due to your specific configuration. I recall having tested Jool the first time in Ubuntu 16.x, but I often try to upgrade the kernel to the latest available release, etc.
> >> >
> >> > In fact, I usually check and adjust CPU affinity myself (I even do that on my OpenWRT routers!).
> >> >
> >> > One suggestion, in case you can invest a bit of extra time on this to make your work more comprehensive, would be to also test using VPP:
> >> >
> >> > https://docs.fd.io/vpp/17.07/nat64_doc.html
> >>
> >> Interesting! I have added it to my backlog, I wasn't aware of nat64 in
> >> vpp!
> >>
> >> > I would actually say, if you allow me, "forget Tayga": it doesn't
> >> > scale, is no longer maintained, and Jool and VPP are much better
> >> > targets to focus on!
> >>
> >> I assumed so. However, there is one really, really big advantage of
> >> tayga: it is included in every distribution. This was actually the
> >> reason why we chose tayga in 2017 for datacenterlight.ch.
> >>
> >> Now that we have hit CPU limitations we are more willing to manually
> >> maintain it, and it is somewhat "ok" because we only have 6 routers. I'm
> >> actually considering spending some of our resources to package jool for
> >> Alpine Linux, which is our target OS for the new router generation.
> >>
> >> Either way, I have to thank you guys, you did a quite impressive job
> >> with jool!
> >>
> >> Best regards from Switzerland,
> >>
> >> Nico
> >>
> >>
> >> --
> >> Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
>
>
> --
> Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch

