2014-05-11

High CPU load with FreeBSD ipfw NAT

This post describes my analysis of high CPU load on FreeBSD when using ipfw (in-kernel) NAT.


Intro

 

Ipfw is one of the three firewalls available in FreeBSD. It provides NAT (network address translation).
NAT is implemented by the libalias library, which is used not only by in-kernel ipfw but also by userland natd, userland ppp and kernel ng_nat.

Ipfw NAT parallelizes well if you use multiple NAT instances. E.g. if you have a block of 256 IPv4 addresses, you can run one NAT instance per address, and these 200+ instances will be scheduled across all of your 8 or 16 CPU cores.


The problem

 

From time to time one or a few threads
kernel{igb0 que}
start consuming a lot of CPU. Once a single core reaches 100% load, this causes heavy packet loss.

If you are using net.isr.direct=1, I suppose it will look like high load caused by interrupts, e.g.
intr{irq264: igb0:que}


Investigation

 

The first useful thing is the log option of ipfw nat. After creating NAT instances with
ipfw nat 42 config ip <pub_addr> log
you can check some statistics for a specific instance by running
ipfw nat show 42
or
ipfw nat show
to check all instances.

The output looks like
nat 10241: icmp=0, udp=2663, tcp=13002, sctp=0, pptp=0, proto=0, frag_id=0 frag_ptr=0 / tot=15665

It is not much data, but sometimes it is helpful.
In most cases I use
ipfw nat show | grep -E 'tot=[0-9]{5}'
because I have a lot of instances and I am only interested in those with a session count >= 10,000.

My NAT box dies when the session count gets near 45k.

Another very useful tool is pmcstat
https://wiki.freebsd.org/PmcTools/CallchainCaptureAndAnalysis
It requires a kernel option, but it is very helpful for investigation. Using it, I found that the problem was actually in LibAlias.
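
The same filter can be sketched in Python if you want the instance numbers sorted by load. This is only an illustration that assumes the output format shown above; the function name and the second sample line are made up.

```python
import re

# Matches lines such as:
# nat 10241: icmp=0, udp=2663, ... frag_ptr=0 / tot=15665
NAT_LINE = re.compile(r"^nat (\d+):.*\btot=(\d+)\s*$")

def busy_instances(output, threshold=10000):
    """Return (instance_id, total) pairs with total >= threshold, busiest first."""
    result = []
    for line in output.splitlines():
        match = NAT_LINE.match(line.strip())
        if match and int(match.group(2)) >= threshold:
            result.append((int(match.group(1)), int(match.group(2))))
    return sorted(result, key=lambda pair: pair[1], reverse=True)

# The second line is invented sample data for a quiet instance.
sample = """\
nat 10241: icmp=0, udp=2663, tcp=13002, sctp=0, pptp=0, proto=0, frag_id=0 frag_ptr=0 / tot=15665
nat 10242: icmp=1, udp=20, tcp=300, sctp=0, pptp=0, proto=0, frag_id=0 frag_ptr=0 / tot=321
"""
print(busy_instances(sample))
```

On the NAT box the input could come from something like subprocess.check_output(['ipfw', 'nat', 'show'], text=True).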


LibAlias at high level

 

I will skip things related to port/protocol/address redirection, etc.
So, how does libalias work? (This is my view of the problem. I may have misunderstood something, so no warranty.)

There is a database that stores information about all sessions going through NAT.
The most important fields are (this is not libalias terminology):
internal_addr, internal_port, alias_addr, alias_port, outside_addr, outside_port

The internal address and port are the address and port on the internal network that is NATed.
The alias address and alias port are the address and port that the outgoing packet will have after translation. The alias address is often the NAT server's public IP address.
The outside address and port belong to some endpoint (server) on the internet.

An IP packet going from the internal network to the internet is looked up in the sessions db.
If it is found, its source is changed from
internal_addr:internal_port to alias_addr:alias_port.
If not, libalias creates a new session record. In libalias terms this is called a link.

A packet from the internet usually does not create a session/link (we are skipping the redirect stuff).
So if it is in the db, its destination is changed from
alias_addr:alias_port to internal_addr:internal_port.
If it is not in the db, the packet is either not processed by libalias or dropped.
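
The flow above can be sketched as a toy model in Python. This is not libalias code: the class, its method names and the sequential port allocation are all assumptions made for illustration, and plain dicts stand in for the real tables.

```python
class ToyNat:
    """Toy model of a NAT session table; names are illustrative, not libalias API."""

    def __init__(self, alias_addr, first_port=32768):
        self.alias_addr = alias_addr
        self.next_port = first_port  # assumption: hand out alias ports sequentially
        self.out_links = {}  # (proto, int_addr, int_port, out_addr, out_port) -> alias_port
        self.in_links = {}   # (proto, alias_port, out_addr, out_port) -> (int_addr, int_port)

    def outbound(self, proto, src, dst):
        """Translate a packet leaving the internal network; create a link if needed."""
        key = (proto, src[0], src[1], dst[0], dst[1])
        if key not in self.out_links:
            alias_port = self.next_port
            self.next_port += 1
            self.out_links[key] = alias_port
            self.in_links[(proto, alias_port, dst[0], dst[1])] = src
        return (self.alias_addr, self.out_links[key]), dst

    def inbound(self, proto, src, dst):
        """Translate a packet arriving from the internet; None if no link exists."""
        internal = self.in_links.get((proto, dst[1], src[0], src[1]))
        if internal is None:
            return None  # unknown session: not translated
        return src, internal

nat = ToyNat("203.0.113.1")
sent = nat.outbound("udp", ("10.0.0.5", 2000), ("198.51.100.7", 53))
reply = nat.inbound("udp", ("198.51.100.7", 53), sent[0])
stray = nat.inbound("udp", ("198.51.100.9", 53), sent[0])
print(sent, reply, stray)
```

The reply from the known endpoint is mapped back to 10.0.0.5:2000, while the packet from an endpoint with no link comes back as None.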


LibAlias internals

 

Sessions, or links, are stored in 2 hash tables: linkTableIn and linkTableOut.
There are two hash functions: StartPointIn for linkTableIn and StartPointOut for linkTableOut.
StartPointOut hashing is just bad (but working), and StartPointIn hashing is VERY bad.

StartPointIn produces the hash from only 3 parameters:
protocol - it is TCP or UDP in 99% of cases, so almost no randomness
alias_addr - it is constant for a single instance, so no randomness at all
alias_port - a 16-bit value.
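
To make the weakness concrete, here is a simplified Python model of a hash over these three inputs. It is an assumption for illustration, not a copy of the libalias source: I model the hash as a plain sum modulo the table size, and the addresses and link records are made up.

```python
import socket
import struct

LINK_TABLE_IN_SIZE = 4001  # default table size quoted in this post
IPPROTO_UDP = 17

def start_point_in(alias_addr, alias_port, link_type):
    """Simplified additive hash over protocol, alias_addr and alias_port (assumed form)."""
    addr_as_int = struct.unpack("!I", socket.inet_aton(alias_addr))[0]
    return (addr_as_int + alias_port + link_type) % LINK_TABLE_IN_SIZE

def bucket_for_link(link):
    # Only three fields feed the hash; the outside endpoint is ignored.
    return start_point_in(link["alias_addr"], link["alias_port"], link["proto"])

# 200 links to 200 different outside hosts, all sharing one alias port.
links = [
    {"proto": IPPROTO_UDP, "alias_addr": "203.0.113.1", "alias_port": 666,
     "outside_addr": "198.51.100.%d" % i, "outside_port": 6881}
    for i in range(1, 201)
]
same_port_buckets = {bucket_for_link(link) for link in links}

# For contrast: 200 links whose alias ports differ.
varied_buckets = {start_point_in("203.0.113.1", port, IPPROTO_UDP)
                  for port in range(40000, 40200)}

print(len(same_port_buckets), len(varied_buckets))
```

Under this model, links that differ only in the outside endpoint always share a bucket, so the alias port is the only source of spread.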

Why 'same_ports' option is evil

 

LibAlias has an option called 'same_ports'. It affects the link creation process.
If same_ports is enabled, LibAlias tries to assign an alias_port equal to the original internal_port.
And in most cases LibAlias will succeed.

So let's imagine a situation close to real life. Some host on the internal network has a crazy torrent client installed. Or the host is infected by some malware.
The software binds to some port, e.g. 666, and starts sending packets to a large number of different hosts.
All those packets will have the same protocol, the same internal_addr and the same internal_port.

In LibAlias all these packets will be aliased to the same alias_addr and alias_port (due to the same_ports option).

In linkTableIn all these links will be put into the same bucket, so we get one long linked list where a lookup costs O(N). Since LibAlias checks linkTableIn on each link creation and we have to create N new links, the total cost is O(N^2).

As I have mentioned above, my NAT server dies at around 45k sessions.

If there are replies to our packets, the situation becomes even worse: the server dies faster, at a lower session count. That is because every incoming packet is looked up in linkTableIn.
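
A rough Python count of the work involved, under a simplified assumption: each new link first walks the whole chain of its bucket (a miss) and is then appended. The function and the two bucket-selection lambdas are illustrative only.

```python
def nodes_visited(n_links, n_buckets, bucket_of):
    """Count list nodes walked when each new link scans its bucket before being added."""
    sizes = [0] * n_buckets
    visited = 0
    for i in range(n_links):
        bucket = bucket_of(i)
        visited += sizes[bucket]  # a miss walks the entire chain
        sizes[bucket] += 1
    return visited

N = 45000       # roughly where my NAT box dies
BUCKETS = 4001  # default linkTableIn size

one_bucket = nodes_visited(N, BUCKETS, lambda i: 0)        # everything collides
spread = nodes_visited(N, BUCKETS, lambda i: i % BUCKETS)  # ideal even spread

print(one_bucket)  # N*(N-1)/2 = 1012477500
print(spread)
```

About a billion list steps just to create the links in the colliding case, versus a few hundred thousand when the links are spread evenly.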

Questions 

 

Would increasing LINK_TABLE_IN_SIZE and LINK_TABLE_OUT_SIZE help?

No, it would not. The problem is in the hash function. It returns the same value for all of these links, so they all end up in the same single bucket no matter how large the table is.

Will the situation be better without same_ports?

Yes, it will. alias_port will be a random value from 32768 to 65535, and the hash function will spread the links across linkTableIn properly. So with the default table size
#define LINK_TABLE_IN_SIZE        4001
the average list size in one bucket will be 10 elements instead of 40k.
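
A quick Python simulation of that estimate. It assumes the bucket is simply alias_port modulo the table size and that alias ports are drawn uniformly at random; both are simplifications of mine, not taken from the libalias source.

```python
import random
from collections import Counter

LINK_TABLE_IN_SIZE = 4001
N_LINKS = 40000

rng = random.Random(1)  # fixed seed so runs are repeatable

# Assumed model: random alias port in 32768..65535, bucket = port % table size.
chains = Counter(rng.randint(32768, 65535) % LINK_TABLE_IN_SIZE
                 for _ in range(N_LINKS))

mean_chain = sum(chains.values()) / LINK_TABLE_IN_SIZE
longest_chain = max(chains.values())

print(round(mean_chain, 2))  # 40000 / 4001, about 10
print(longest_chain)         # longest chain, still tiny compared to 40k
```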

But I do not use same_ports, or I have turned it off

 

Actually, it is possible that you do use it.
I found that (today is 2014-05-11) FreeBSD has a bug related to that. The same_ports option is not passed from ipfw to LibAlias, and it is turned on by default in LibAlias.

PR is http://www.freebsd.org/cgi/query-pr.cgi?pr=189655

(Reminder: update this post when bug is fixed)

How to check if same_ports option is active

 

Note that
ipfw nat show config
does not show the real situation.

I checked with a small Python script on an internal host

import socket

# Send one UDP datagram from a fixed source port (2000) to an outside host.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('', 2000))
sock.sendto(b'test', ('1.1.1.1', 1234))  # payload must be bytes on Python 3

while tcpdump was running on the NAT server, watching outgoing traffic
tcpdump -n -i igb1 host 1.1.1.1

If the packets leave with src port 2000, then the same_ports option is active.
If the outgoing port is some random number, then it is not.

How to fix it

 

Wait until the FreeBSD bug is fixed, and do not use same_ports.
I was not able to wait, so I patched LibAlias and recompiled the kernel.
It is not a good idea to do that because it is an ugly hack. But...

How it should be fixed properly

 

In my opinion, some parts of LibAlias should be updated.
I do not blame the code, since the HISTORY file tells me that it was written about 20 years ago.

I have some thoughts about that, but they are not related to this post.


Comments:

  1. The bug has been squashed as of May 2014. So 9.3 and 10.1 should be unaffected by the bug.
