This is an attempt to describe, at a very basic level, the sequence of events that occurs when a FreeBSD host sends and receives network packets. This document is based mostly on reading the source code and referencing the book "The Design and Implementation of the 4.4BSD Operating System". It is bound to contain errors, for which I apologise in advance!
The FreeBSD base system source code is in the directory /usr/src on a FreeBSD box. The kernel is in the directory /usr/src/sys. All paths in this document are relative to the kernel source directory.
The kernel source tree has a number of sub-directories. Important ones are:
| Directory | Contains |
| sys/ | Lots of system header files (which get installed to /usr/include/sys and form the basis of the kernel<->user interface) |
| net/ | Code for network device types (e.g. ethernet, atm, ppp etc) |
| dev/ | Device drivers (e.g. for RAID support) |
| pci/ | PCI Device drivers (e.g. most network cards) |
| netinet/ | Code for the Internet Protocols |
| kern/ | Low-level kernel stuff (memory, syscalls etc) |
The ethernet hardware will receive a frame, check its CRC and place it in memory somewhere. The driver will notice either when it polls the device status some time later or when it is notified via an interrupt.
For example, consider the RealTek 8129/8139 driver in pci/if_rl.c:
...
rl_rxeof(sc)
struct rl_softc *sc;
{
struct mbuf *m;
struct ifnet *ifp;
...
ifp->if_ipackets++;
(*ifp->if_input)(ifp, m);
return;
}
The struct mbuf is the entity which holds the data received from the network. mbufs are fixed size chunks (fixed size to help prevent memory fragmentation) and can be easily chained together to represent larger pieces of data. There are a number of programmatic idioms for managing mbufs which will be described later.
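To make the chaining idea concrete, here is a minimal userland sketch. It is not the kernel's struct mbuf: the names toy_mbuf, TOY_MLEN, toy_chain_from, toy_chain_length and toy_freem are all invented for illustration, and the buffer size is deliberately tiny so that a short packet already spans several buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MLEN 8  /* hypothetical buffer size, far smaller than a real mbuf */

/* Toy stand-in for struct mbuf: a fixed-size buffer plus a link. */
struct toy_mbuf {
	struct toy_mbuf *m_next;          /* next buffer of the same packet */
	size_t           m_len;           /* bytes of m_dat actually in use */
	unsigned char    m_dat[TOY_MLEN];
};

/* Free every buffer in a chain (the role m_freem plays in the kernel). */
void
toy_freem(struct toy_mbuf *m)
{
	while (m != NULL) {
		struct toy_mbuf *next = m->m_next;
		free(m);
		m = next;
	}
}

/* Total number of data bytes held by a chain. */
size_t
toy_chain_length(const struct toy_mbuf *m)
{
	size_t total = 0;

	for (; m != NULL; m = m->m_next)
		total += m->m_len;
	return total;
}

/*
 * Copy len bytes into a newly allocated chain of fixed-size buffers.
 * Returns NULL if len is 0 or if an allocation fails.
 */
struct toy_mbuf *
toy_chain_from(const unsigned char *src, size_t len)
{
	struct toy_mbuf *head = NULL, **tail = &head;

	assert(src != NULL || len == 0);
	while (len > 0) {
		struct toy_mbuf *m = calloc(1, sizeof(*m));

		if (m == NULL) {
			toy_freem(head);
			return NULL;
		}
		m->m_len = len < TOY_MLEN ? len : TOY_MLEN;
		memcpy(m->m_dat, src, m->m_len);
		src += m->m_len;
		len -= m->m_len;
		*tail = m;              /* append to the end of the chain */
		tail = &m->m_next;
	}
	return head;
}
```

With an 8 byte buffer, a 20 byte packet ends up as a chain of three buffers holding 8, 8 and 4 bytes. Because every buffer has the same size, any freed buffer can be reused for any later allocation.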
The struct ifnet (defined in net/if_var.h) represents the network device to the system. The structure contains information like the interface name, the state of the interface (up/down, broadcast), information about the device's capabilities (e.g. some devices are able to support larger-than-normal ethernet vlan frames while others cannot) and a bunch of function pointers. Many kernel entities are generic structs containing function pointers, rather like virtual base classes in C++. A specific network type (such as ethernet) will point its struct ifnet function pointers to the appropriate ethernet functions. Of particular interest to us are:
| struct ifnet member | arguments | points to |
| if_output | struct ifnet *, struct mbuf *, struct sockaddr *, struct rtentry * | net/if_ethersubr.c:ether_output |
| if_input | struct ifnet *, struct mbuf * | net/if_ethersubr.c:ether_input |
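The "struct of function pointers" idea can be sketched in a few lines of plain C. This is only an illustration: toy_ifnet, toy_pkt, toy_ether_input, toy_ether_ifattach and toy_driver_rx are hypothetical names, and the real struct ifnet has many more members.

```c
#include <assert.h>
#include <stddef.h>

struct toy_pkt { int len; };  /* hypothetical stand-in for struct mbuf */

/* Hypothetical stand-in for struct ifnet: some state plus a function pointer. */
struct toy_ifnet {
	const char   *if_name;
	unsigned long if_ipackets;  /* input packet counter */
	void        (*if_input)(struct toy_ifnet *, struct toy_pkt *);
};

int toy_ether_frames_seen;  /* lets a caller observe that the call arrived */

/* Link-layer input routine, playing the role of ether_input. */
void
toy_ether_input(struct toy_ifnet *ifp, struct toy_pkt *p)
{
	(void)ifp;
	(void)p;
	toy_ether_frames_seen++;
}

/* What a link layer does when an interface attaches: fill in the pointers. */
void
toy_ether_ifattach(struct toy_ifnet *ifp)
{
	ifp->if_input = toy_ether_input;
}

/* What a driver's receive path does: count, then call through the pointer. */
void
toy_driver_rx(struct toy_ifnet *ifp, struct toy_pkt *p)
{
	assert(ifp->if_input != NULL);
	ifp->if_ipackets++;
	(*ifp->if_input)(ifp, p);
}
```

The driver never names toy_ether_input directly. It only knows the pointer in the interface structure, so the same driver code would work unchanged if a different link layer had filled the pointer in.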
In our case, the call to (*ifp->if_input)(ifp, m) leads to net/if_ethersubr.c and the function ether_input(struct ifnet *, struct mbuf *). The function looks like the following:
ether_input(struct ifnet *ifp, struct mbuf *m)
{
struct ether_header *eh;
u_short etype;
...
The struct ether_header type is defined in net/ethernet.h as:
struct ether_header {
u_char ether_dhost[ETHER_ADDR_LEN];
u_char ether_shost[ETHER_ADDR_LEN];
u_short ether_type;
};
The ether_input function continues with a couple of sanity checks and then:
...
eh = mtod(m, struct ether_header *);
etype = ntohs(eh->ether_type);
...
The mtod macro (defined in sys/mbuf.h) takes an mbuf and returns a pointer to the
data portion of the mbuf nicely cast to the correct
type. Since we've received an ethernet frame, we expect the initial part of
the data to be an ethernet header. The ntohs function
performs network to host endianness conversion.
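The same two steps can be tried out in userland. The sketch below is an approximation under stated assumptions: toy_mbuf, toy_ether_header, toy_mtod and toy_frame_type are invented names, and the header is copied out with memcpy instead of being read through a cast pointer so that the example makes no assumptions about the alignment of the buffer.

```c
#include <assert.h>
#include <arpa/inet.h>  /* ntohs */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETHER_ADDR_LEN 6

/* Same layout as the struct ether_header shown above. */
struct toy_ether_header {
	uint8_t  ether_dhost[TOY_ETHER_ADDR_LEN];
	uint8_t  ether_shost[TOY_ETHER_ADDR_LEN];
	uint16_t ether_type;  /* stored in network byte order */
};

/* Hypothetical, much reduced mbuf: just a data pointer and a length. */
struct toy_mbuf {
	unsigned char *m_data;
	size_t         m_len;
};

/* Toy version of mtod: the data pointer, cast to the requested type. */
#define toy_mtod(m, t) ((t)((m)->m_data))

/* Return the ethernet type of the frame in host byte order. */
uint16_t
toy_frame_type(const struct toy_mbuf *m)
{
	struct toy_ether_header eh;

	assert(m->m_len >= sizeof(eh));
	/* Copy rather than dereference a cast pointer (alignment safety). */
	memcpy(&eh, toy_mtod(m, const void *), sizeof(eh));
	return ntohs(eh.ether_type);
}
```

For a frame whose bytes 12 and 13 are 0x08 0x00, toy_frame_type returns 0x0800 on both little-endian and big-endian hosts, which is the point of going through ntohs.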
After a few more sanity checks the ether_input function passes the mbuf pointer to the Berkeley Packet Filters (BPF):
...
BPF_MTAP(ifp, m);
...
Shortly afterwards, the ethernet frame is passed to the "netgraph" ethernet module, if the module has been loaded:
...
/* Handle ng_ether(4) processing, if any */
if (ng_ether_input_p != NULL) {
(*ng_ether_input_p)(ifp, &m);
if (m == NULL)
return;
}
...
ng_ether_input_p is a function pointer defined in
global scope in the net/if_ethersubr.c file. When the
netgraph module is loaded it registers itself by assigning this function
pointer to point to its own input routine. Using function pointers as
hooks is a common pattern in the network stack. If you want to add extra
processing to the stack, this is a fairly non-invasive way of doing it.
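The hook pattern is small enough to reproduce outside the kernel. In this sketch every name (toy_pkt, toy_input_hook_p, toy_input, toy_module_input, toy_module_load, toy_module_unload) is hypothetical; only the shape of the code follows the ng_ether_input_p excerpt above.

```c
#include <assert.h>
#include <stddef.h>

struct toy_pkt { int len; };  /* hypothetical stand-in for struct mbuf */

/* Global hook: NULL until an optional module registers itself. */
void (*toy_input_hook_p)(struct toy_pkt **pp) = NULL;

int toy_delivered;  /* packets that reached the normal processing path */

/* Core input path: offer the packet to the hook, if any, then carry on. */
void
toy_input(struct toy_pkt *p)
{
	assert(p != NULL);
	if (toy_input_hook_p != NULL) {
		(*toy_input_hook_p)(&p);
		if (p == NULL)  /* the hook consumed the packet */
			return;
	}
	toy_delivered++;
}

/* A made-up module that swallows zero-length packets and passes the rest. */
void
toy_module_input(struct toy_pkt **pp)
{
	if ((*pp)->len == 0)
		*pp = NULL;
}

/* Loading the module is just an assignment; unloading resets the pointer. */
void toy_module_load(void)   { toy_input_hook_p = toy_module_input; }
void toy_module_unload(void) { toy_input_hook_p = NULL; }
```

The hook receives a pointer to the packet pointer so that it can signal "I have taken this packet" by setting it to NULL, just as the excerpt checks m == NULL after calling the netgraph hook. The core code needs no knowledge of the module beyond the single pointer.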
Next the routine considers any possible bridging actions:
...
/* Check for bridging mode */
if (BDG_ACTIVE(ifp) ) {
...
}
...
before finally calling the ethernet demultiplexing routine
...
ether_demux(ifp, m);
...
}
In the same file (net/if_ethersubr.c) control proceeds to the function
void
ether_demux(struct ifnet *ifp, struct mbuf *m)
{
struct ether_header *eh;
int isr;
...
eh = mtod(m, struct ether_header *);
...
After a few sanity checks and stats gathering functions we examine the type of ethernet frame:
...
ether_type = ntohs(eh->ether_type);
...
/*
* Handle protocols that expect to have the Ethernet header
* (and possibly FCS) intact.
*/
switch (ether_type) {
case ETHERTYPE_VLAN:
...
}
...
/* Strip off Ethernet header. */
m_adj(m, ETHER_HDR_LEN);
...
switch (ether_type) {
case ETHERTYPE_IP:
if (ipflow_fastforward(m))
return;
isr = NETISR_IP;
break;
case ETHERTYPE_ARP:
...
Note that the type fields are all defined in net/ethernet.h (these are all IEEE standard numbers).
The function m_adj increments the
mbuf start-of-data pointer to point to the data immediately after the
ethernet header.
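Stripping a header this way costs almost nothing because no data is copied. A minimal sketch of the idea follows, with the hypothetical names toy_mbuf, toy_adj_front and TOY_ETHER_HDR_LEN. The real m_adj is more general (as far as I can tell it also walks mbuf chains and can trim from the tail), so treat this as the single-buffer special case only.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ETHER_HDR_LEN 14  /* 6 + 6 + 2 bytes, as in struct ether_header */

/* Hypothetical, much reduced mbuf: just a data pointer and a length. */
struct toy_mbuf {
	unsigned char *m_data;  /* start of valid data */
	size_t         m_len;   /* number of valid bytes */
};

/* Drop n bytes from the front by moving the start-of-data pointer. */
void
toy_adj_front(struct toy_mbuf *m, size_t n)
{
	assert(n <= m->m_len);
	m->m_data += n;  /* the skipped bytes stay in memory, untouched */
	m->m_len  -= n;
}
```

After toy_adj_front(&m, TOY_ETHER_HDR_LEN) on a 20 byte frame, m.m_data points at what was byte 14 and m.m_len is 6. The header bytes are still in the underlying buffer; they are simply no longer part of the packet as seen by later code.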
Consider what happens if the ether_type is ETHERTYPE_IP, so that the data after the ethernet header is an IP header. The function ipflow_fastforward, defined in netinet/ip_flow.c, exists to speed up forwarding of IP datagrams. It returns 0 if it did not handle the packet (for example, because the system is not a router), in which case normal processing continues. Returning to ether_demux, after the ether_type switch control passes to the code:
...
netisr_dispatch(isr, m);
return;
The file net/netisr.c defines the function netisr_dispatch:
void
netisr_dispatch(int num, struct mbuf *m)
{
struct netisr *ni;
isrstat.isrs_count++;
KASSERT(!(num < 0 || num >= (sizeof(netisrs)/sizeof(*netisrs))),
("bad isr %d", num));
ni = &netisrs[num];
if (ni->ni_queue == NULL) {
m_freem(m);
return;
}
...
This function looks up the struct netisr for the
particular protocol. There is an array of these structs defined at the top
of the file:
struct netisr {
netisr_t *ni_handler;
struct ifqueue *ni_queue;
} netisrs[32];
Supported protocols must register themselves (via
netisr_register) in advance. The dispatch function
makes an initial attempt to process the packet immediately and falls back
to queueing if this is not possible. The netisr_queue
function is simpler and only queues the packets.
Note that in both cases, the schednetisr function is
called. This tells the kernel that some processing needs to be done later
and lets the function return. I think that the functions up to now have
been running at high priority from an interrupt handler, whereas most
complex protocol processing happens with interrupts re-enabled, in another
part of the kernel.
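The register / dispatch / queue / run-later cycle can be modelled in userland as follows. This is a sketch of my understanding, not the kernel code: all toy_ names are invented, the queue is a small fixed array instead of a struct ifqueue, the pending bit mask stands in for whatever schednetisr really sets, and the can_run_now parameter is a made-up stand-in for the kernel's own decision about whether direct processing is possible.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NISR      32
#define TOY_QLEN      4
#define TOY_NETISR_IP 2  /* hypothetical protocol number */

struct toy_pkt { int id; };
typedef void toy_netisr_t(struct toy_pkt *);

struct toy_netisr {
	toy_netisr_t   *ni_handler;
	struct toy_pkt *ni_queue[TOY_QLEN];
	int             ni_qlen;
} toy_netisrs[TOY_NISR];

unsigned int toy_pending;  /* one bit per protocol with queued work */

int toy_ip_seen;  /* sample handler state, for demonstration only */
void toy_ip_input(struct toy_pkt *p) { (void)p; toy_ip_seen++; }

void
toy_netisr_register(int num, toy_netisr_t *handler)
{
	assert(num >= 0 && num < TOY_NISR);
	toy_netisrs[num].ni_handler = handler;
}

/* Queue only: store the packet and mark the protocol as pending. */
int
toy_netisr_queue(int num, struct toy_pkt *p)
{
	struct toy_netisr *ni = &toy_netisrs[num];

	if (ni->ni_handler == NULL || ni->ni_qlen == TOY_QLEN)
		return -1;  /* unregistered protocol or full queue: drop */
	ni->ni_queue[ni->ni_qlen++] = p;
	toy_pending |= 1u << num;  /* the "schednetisr" step */
	return 0;
}

/* Dispatch: process immediately if possible, otherwise fall back to queueing. */
int
toy_netisr_dispatch(int num, struct toy_pkt *p, int can_run_now)
{
	struct toy_netisr *ni;

	assert(num >= 0 && num < TOY_NISR);
	ni = &toy_netisrs[num];
	if (ni->ni_handler == NULL)
		return -1;
	if (can_run_now && ni->ni_qlen == 0) {
		ni->ni_handler(p);
		return 0;
	}
	return toy_netisr_queue(num, p);
}

/* The deferred pass: drain the queue of every protocol marked pending. */
void
toy_netisr_run(void)
{
	for (int num = 0; num < TOY_NISR; num++) {
		struct toy_netisr *ni = &toy_netisrs[num];

		if ((toy_pending & (1u << num)) == 0)
			continue;
		toy_pending &= ~(1u << num);
		for (int i = 0; i < ni->ni_qlen; i++)
			ni->ni_handler(ni->ni_queue[i]);
		ni->ni_qlen = 0;
	}
}
```

In this model the direct path is only taken when the queue is empty, so that a packet processed immediately cannot overtake packets of the same protocol that are still waiting. Whether the kernel applies exactly that rule is an assumption on my part.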
The file netinet/ip_input.c has the following code:
static struct ifqueue ipintrq;
...
void
ip_init()
{
...
netisr_register(NETISR_IP, ip_input, &ipintrq);
}
This function is called early on during system boot to register the
function ip_input and the packet queue
struct ifqueue ipintrq with the netisr system.
The ip_input function looks much like the ether_demux function we saw earlier. It does lots of sanity checks, grabs the IP header from the mbuf and demultiplexes the packet to the appropriate IP protocol handler.
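The demultiplexing step can be illustrated with a small userland sketch. It is not the code from netinet/ip_input.c: toy_ip_input, toy_proto_table, toy_udp_input and the TOY_IPPROTO_ constants are hypothetical, the handler table is a simplification, and the checks shown are only a small subset of what a real implementation performs (there is no checksum or total-length validation here, for instance). The field offsets and protocol numbers are the standard IPv4 ones.

```c
#include <assert.h>
#include <stddef.h>

/* Standard IP protocol numbers, under made-up names. */
enum { TOY_IPPROTO_ICMP = 1, TOY_IPPROTO_TCP = 6, TOY_IPPROTO_UDP = 17 };

typedef void toy_proto_input_t(const unsigned char *payload, size_t len);

/* Hypothetical handler table, indexed by the IP header's protocol field. */
toy_proto_input_t *toy_proto_table[256];

size_t toy_udp_last_len;  /* sample handler state, for demonstration only */
void
toy_udp_input(const unsigned char *payload, size_t len)
{
	(void)payload;
	toy_udp_last_len = len;
}

/* Returns 0 if the packet was handed to a protocol handler, -1 if dropped. */
int
toy_ip_input(const unsigned char *pkt, size_t len)
{
	toy_proto_input_t *handler;
	size_t hlen;

	assert(pkt != NULL);
	if (len < 20)                /* shorter than a minimal IPv4 header */
		return -1;
	if ((pkt[0] >> 4) != 4)      /* version field must be 4 */
		return -1;
	hlen = (size_t)(pkt[0] & 0x0f) * 4;  /* header length in bytes */
	if (hlen < 20 || hlen > len)
		return -1;
	handler = toy_proto_table[pkt[9]];   /* byte 9 is the protocol field */
	if (handler == NULL)
		return -1;
	handler(pkt + hlen, len - hlen);     /* pass on the payload only */
	return 0;
}
```

The structure mirrors the ethernet layer: validate the header, read a type field, strip the header and pass the remainder to whichever handler is registered for that type.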
Written by David Scott