4.1 - How can I open a raw data socket?
Under Winsock 1.1, the SOCK_RAW socket type is optional. Some of
the non-Microsoft stacks implemented it, but these implementations are
essentially extinct. SOCK_RAW in Winsock 1.1 is also problematic because
the Winsock spec's writers did not try to rigorously define what we
should expect from a SOCK_RAW implementation.
The Winsock 2 spec defines raw socket behavior, and Microsoft's
Winsock 2 stacks do implement some types of raw sockets.
Windows 2000 (and its successors) has by far the best implementation
of raw sockets; details below.
On older versions of Windows with Winsock 2, raw socket support is
fairly sparse: Microsoft only supports raw IGMP and ICMP sockets on these
platforms. The latter allows you to send "ping"
packets in a standard way. These stacks do not support raw IP, packet capturing, or changing
packet headers from the Winsock layer.
Available raw sockets support in Microsoft stacks:
Notice that raw TCP and UDP aren't possible directly under Winsock 2.
Instead, you must use IP_HDRINCL (a.k.a. raw IP) and build your own IP
and TCP or UDP headers.
Under Windows NT derivatives, only users who are members of the
Administrators group can open raw sockets.
4.2 - How can I capture packets on a LAN with Winsock?
Winsock does not allow promiscuous IP packet captures. To get at
raw packet data, you have to bypass Winsock and talk to the Transport
Driver Interface (TDI) or Network Driver Interface Specification
(NDIS) layers. The TDI layer is just above the system's NDIS (network
driver) layer.
Some of the Windows packet sniffers in the FAQ's
debugging resources section
include source, which you could pick apart to figure out
how this works. Probably the easiest one to work with is
WinDump, because its capture
code is separated into a free library called WinPcap. If you're familiar
with the Unix libpcap mechanism, you should be able to pick up WinPcap
quickly. For a second example of a program that uses WinPcap, see
Ethereal.
There are also some libraries in
the Resources section that provide various types of raw socket
access. I have not tried any of these products, so I can't say how
well they work. At the time of writing, the relevant libraries on
that page are the Komodia TCP/IP
Library, LibnetNT and
WinDis32.
PCAUSA — the makers of WinDis32 — also has several FAQs that talk
about various low-level network stack access methods. These FAQs also
point you to various bits of sample code, most of it from Microsoft's
various DDKs.
4.3 - How can I change the IP or TCP header of a packet?
The Winsock stack in Windows 2000 and its successors can do this
with raw sockets; the stacks in older versions
only allow you to set a few IP header fields with setsockopt()
and/or ioctlsocket(). One such field is TTL.
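As a minimal sketch of the setsockopt() route, the helper below sets the TTL on an existing socket and reads it back. The function name set_and_get_ttl() is made up for illustration, and the POSIX branch of the #ifdef is there only so the sketch also builds outside Windows. On Windows, it assumes WSAStartup() has already succeeded.

```cpp
#include <cassert>
#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>   // IP_TTL, socklen_t
typedef SOCKET socket_t;
#else
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int socket_t;
#endif

// Hypothetical helper: set the IP time-to-live on socket s, then ask the
// stack what value it actually stored. Returns that value, or -1 on error.
int set_and_get_ttl(socket_t s, int ttl)
{
    assert(ttl >= 1 && ttl <= 255);   // TTL is a single byte in the IP header

    if (setsockopt(s, IPPROTO_IP, IP_TTL,
                   reinterpret_cast<const char*>(&ttl), sizeof(ttl)) != 0) {
        return -1;
    }

    int actual = 0;
    socklen_t len = sizeof(actual);
    if (getsockopt(s, IPPROTO_IP, IP_TTL,
                   reinterpret_cast<char*>(&actual), &len) != 0) {
        return -1;
    }
    return actual;
}
```

Reading the value back is optional, but it is a cheap way to confirm that a given stack honored the request.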
If you need more complete control, you will have to resort to
lower-level techniques. One of these is to add a layer to the network
stack with Winsock 2's Layered Service Provider mechanism. That
mechanism is not covered in this FAQ, but there is some useful code
and documentation on the MSDN
site and disks.
Another option is to do raw data I/O using the Transport
Driver Interface (TDI) or the Network Driver Interface
Specification (NDIS). Further information is available in PCAUSA's FAQs.
Also, don't rule out the option of building your application on a
platform that does have easy access to the packet headers. Most
Unix flavors (including Linux) offer copious tools for low-level network
I/O. For information on raw network programming on Unix-like platforms,
see Thamer Al-Herbish's Raw IP
Networking FAQ.
4.4 - How can I "ping" another machine with Winsock?
The "official" method uses a raw socket with the IPPROTO_ICMP
protocol, as defined by Winsock 2. All of Microsoft's Winsock 2 stacks
support this. [C++ example]
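To give a feel for what the raw sockets method involves, here is a rough sketch of building the ICMP echo request itself. It is not the FAQ's example program: the function names are invented for illustration, and the code only constructs the packet bytes. Sending them would still take a socket created with socket(AF_INET, SOCK_RAW, IPPROTO_ICMP) and a sendto() call. The checksum is the standard Internet checksum described in RFC 1071.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Internet checksum (RFC 1071): one's-complement sum of big-endian 16-bit
// words, folded to 16 bits and inverted.
std::uint16_t internet_checksum(const std::uint8_t* data, std::size_t len)
{
    assert(data != nullptr || len == 0);
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i + 1 < len; i += 2) {
        sum += (static_cast<std::uint32_t>(data[i]) << 8) | data[i + 1];
    }
    if (len % 2 != 0) {
        sum += static_cast<std::uint32_t>(data[len - 1]) << 8;  // pad odd byte
    }
    while (sum >> 16) {
        sum = (sum & 0xFFFF) + (sum >> 16);   // fold carries back in
    }
    return static_cast<std::uint16_t>(~sum & 0xFFFF);
}

// Hypothetical helper: build an ICMP echo request (type 8, code 0) with the
// given identifier, sequence number and payload, checksum filled in.
std::vector<std::uint8_t> build_echo_request(std::uint16_t id,
                                             std::uint16_t seq,
                                             const std::vector<std::uint8_t>& payload)
{
    std::vector<std::uint8_t> pkt(8, 0);      // 8-byte ICMP header, zeroed
    pkt[0] = 8;                               // type: echo request
    pkt[1] = 0;                               // code
    pkt[4] = static_cast<std::uint8_t>(id >> 8);
    pkt[5] = static_cast<std::uint8_t>(id & 0xFF);
    pkt[6] = static_cast<std::uint8_t>(seq >> 8);
    pkt[7] = static_cast<std::uint8_t>(seq & 0xFF);
    pkt.insert(pkt.end(), payload.begin(), payload.end());

    // Checksum is computed with the checksum field (bytes 2-3) set to zero.
    const std::uint16_t csum = internet_checksum(pkt.data(), pkt.size());
    pkt[2] = static_cast<std::uint8_t>(csum >> 8);
    pkt[3] = static_cast<std::uint8_t>(csum & 0xFF);
    return pkt;
}
```

A handy sanity check is that running internet_checksum() over the finished packet returns 0. The receiver can use the same property to validate the echo reply.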
The other method uses ICMP.DLL, which comes with Windows and works
only with the native Microsoft stack. Though ICMP.DLL comes with all
versions of Windows as of this writing, Microsoft discourages its use
in the strongest terms possible, claiming that the API will disappear
as soon as a better method exists. (It hasn't actually happened yet,
despite several years of threats. :) ) ICMP.DLL's main advantage is that
it works with Microsoft's older Winsock 1.1 stacks. It isn't as flexible
as the raw sockets method. [C++ example]
Many programs misuse ping. Naturally it has good uses, but it's a
sign of a broken program or protocol if you find yourself resorting
to regular use of ping packets. The most common case of ping abuse is
when the program needs to detect
dropped connections. See that FAQ item for better solutions to
this problem.
4.7 - How do I get the MAC (a.k.a. hardware) address of the local Ethernet adapter?
This FAQ has example code for two hackish methods and one complex
but reliable method.
The first method involves asking the
NetBIOS API for the adapter addresses. This method will fail on systems
where NetBIOS isn't present, and it sometimes gives bogus answers.
There is a second method that depends
on a property of the RPC/OLE API. This property is documented but not
guaranteed to do what we want, and in fact it fails in a number of
situations. (Details in the example program's commentary.) As a result,
I have to recommend that you give this method a miss.
The third method uses the
sparsely-documented SNMP API to get MAC addresses. This method seems
to work all the time, but it's far more complex than the other two
methods.
There is one other method for which I don't yet have an example:
the IP Helper API has a function called GetIfTable() which
returns a table containing MAC addresses, among many other tasty bits
of info. This method only works on Windows 98 and its successors and on
Windows NT derivatives. Reportedly, you have to use LoadLibrary()
to dig this function out of iphlpapi.dll, as it isn't exposed for direct
linking. It's just as well, since implicitly linking to iphlpapi.dll
will make your program fail to run on older versions of Windows.
There are some lower-level
methods in PCAUSA's NDIS FAQ that may also be helpful to you.
4.8 - How many simultaneous sockets can I have open with Winsock?
On Windows 95 derivatives, there's a quite-low
limit imposed by the kernel: 100 connections. You
can increase this limit by editing the registry key
HKLM\System\CurrentControlSet\Services\VxD\MSTCP\MaxConnections. On
Windows 95, the key is a DWORD; on Windows 98/ME, it's a string. I've
seen some reports of instability when this value is increased to more
than a few times its default value.
On Windows NT derivatives, anecdotal evidence puts the limit
somewhere in the thousands of connections if you use overlapped
I/O. (Other I/O strategies hit their own
performance limits on Windows before you get to thousands of simultaneous
connections.) The specific limit is dependent on how much physical memory
your server has, and how busy the connections are:
The Memory Factor: According to Microsoft, the
WinNT and successor kernels allocate sockets out of the non-paged memory
pool. (That is, memory that cannot be swapped to the page file by the
virtual memory subsystem.) The size of this pool is necessarily fixed,
and is dependent on the amount of physical memory in the system. On Intel
x86 machines, the non-paged memory pool stops growing at 1/8 the size
of physical memory, with a hard maximum of 128 megabytes for Windows NT
4.0, and 256 megabytes for Windows 2000. Thus for NT 4, the size of the
non-paged pool stops increasing once the machine has 1 GB of physical
memory. On Win2K, you hit the wall at 2 GB.
The "Busy-ness" Factor: The amount of data
associated with each socket varies depending on how that socket's used,
but the minimum size is around 2 KB. Overlapped I/O buffers also eat
into the non-paged pool, in blocks of 4 KB. (4 KB is the x86's memory
management unit's page size.) Thus a simplistic application that's
regularly sending and receiving on a socket will tie up at least 10 KB
of non-pageable memory: about 2 KB for the socket itself, plus presumably
one 4 KB buffer for each direction. Assuming that simple case of 10 KB of
data per connection, the theoretical maximum number of sockets on NT 4.0
is about 13,000, and on Win2K about 26,000.
I have seen reports of a 64 MB Windows NT 4.0 machine hitting the
wall at 1,500 connections, a 128 MB machine at around 4,000 connections,
and a 192 MB machine maxing out at 4,700 connections. It would appear
that on these machines, each connection is using between 4 KB and 6
KB. The discrepancy between these numbers and the 10 KB number above
is probably due to the fact that in these servers, not all connections
were sending and receiving all the time. The idle connections will only
be using about 2 KB each.
So, adjusting our "average" size down to 6 KB per socket, NT 4.0
could handle about 22,000 sockets and Win2K about 44,000 sockets. The
largest value I've seen reported is 16,000 sockets on Windows NT 4.0. This
lower actual value is probably partially due to the fact that the entire
non-paged memory pool isn't available to a single program. Other running
programs (such as core OS services) will be competing with yours for
space in the non-paged memory pool.
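The arithmetic behind these estimates is easy to reproduce. The sketch below assumes only what is stated above: the pool is 1/8 of physical memory up to a hard cap, and each socket costs a fixed number of kilobytes. The function names are mine, and the result is a theoretical ceiling rather than a prediction for any real server. It uses binary units (1 MB = 1024 KB).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Size of the non-paged pool in KB: 1/8 of physical memory, capped at
// pool_cap_mb (128 for NT 4.0, 256 for Windows 2000, per the text above).
std::uint64_t nonpaged_pool_kb(std::uint64_t physical_mb, std::uint64_t pool_cap_mb)
{
    return std::min(physical_mb / 8, pool_cap_mb) * 1024;
}

// Theoretical socket ceiling if every socket ties up kb_per_socket KB and
// nothing else in the system uses the pool.
std::uint64_t max_sockets(std::uint64_t physical_mb,
                          std::uint64_t pool_cap_mb,
                          std::uint64_t kb_per_socket)
{
    assert(kb_per_socket > 0);
    return nonpaged_pool_kb(physical_mb, pool_cap_mb) / kb_per_socket;
}
```

For example, max_sockets(1024, 128, 6) gives 21,845 for a 1 GB NT 4.0 machine, and max_sockets(2048, 256, 6) gives 43,690 for a 2 GB Win2K machine. Adding more memory beyond those sizes doesn't raise the result, because the cap takes over.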
4.9 - What are the "64 sockets" limitations?
There are two 64-socket limitations:
The Win32 event mechanism (e.g. WaitForMultipleObjects())
can only wait on 64 event objects at a time. Winsock 2 provides the
WSAEventSelect() function which lets you use Win32's event mechanism
to wait for events on sockets. Because it uses Win32's event mechanism,
you can only wait for events on 64 sockets at a time. If you want to
wait on more than 64 Winsock event objects at a time, you need to use
multiple threads, each waiting on no more than 64 of the sockets.
The select() function is also limited in certain situations
to waiting on 64 sockets at a time. The FD_SETSIZE constant defined
in winsock.h determines the size of the fd_set structures you pass to
select(). It's defined by default to 64. You can define this
constant to a higher value before you #include winsock.h, and this will
override the default value. Unfortunately, at least one non-Microsoft
Winsock stack and some Layered Service Providers assume the default of
64; they will ignore sockets beyond the 64th in larger fd_sets.
You can write a test program to try this on the systems you plan on
supporting, to see whether they have this limitation. If they do, you can
get around it with threads, just as you would with event objects.
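The bookkeeping for the threads workaround is mostly a matter of chopping your handle list into groups of at most 64. Here is a minimal sketch of that step only. The function name is hypothetical, and the template parameter stands in for whatever you are waiting on (WSAEVENT handles on Windows). Starting one thread per group and calling the wait function in each is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Same value as MAXIMUM_WAIT_OBJECTS / WSA_MAXIMUM_WAIT_EVENTS on Windows.
const std::size_t kMaxWaitObjects = 64;

// Hypothetical helper: split a list of handles into groups that each fit
// in a single wait call. Each group would be given to its own thread.
template <typename Handle>
std::vector<std::vector<Handle> > split_into_wait_groups(const std::vector<Handle>& handles)
{
    std::vector<std::vector<Handle> > groups;
    for (std::size_t i = 0; i < handles.size(); i += kMaxWaitObjects) {
        const std::size_t end = std::min(i + kMaxWaitObjects, handles.size());
        groups.push_back(std::vector<Handle>(
            handles.begin() + static_cast<std::ptrdiff_t>(i),
            handles.begin() + static_cast<std::ptrdiff_t>(end)));
    }
    assert(!groups.empty() || handles.empty());
    return groups;
}
```

With 130 handles this yields three groups of 64, 64 and 2, so you would need three waiting threads.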
4.10 - How do I make Winsock use a specific network interface?
On a machine with multiple network interfaces (a modem for dialup
Internet and a LAN card, for example), it can sometimes be useful to
force Winsock to use a specific interface. Before I go into how, keep
in mind that the routing layer of the stack exists to handle this for
you. If your setup isn't working the way you want, maybe you just need to
change the routing tables. (This is done with the "route" and "netstat"
command-line programs on Microsoft stacks.)
There are two common reasons why you might want to force Winsock to
use a particular network interface. The first is when you only want
your server program to handle incoming connections on a particular
interface. For example, if you have an NT machine set up as an Internet
gateway, and it also runs a server that you only want internal LAN users
to be able to access, you will want to set it to only listen on the
LAN interface. The other reason is that you have two or more possible
outgoing routes, and you want your client program to connect using a
particular one without the routing layer getting in the way.
You can do both of these things with the bind() function. Using
one of the "get my IP addresses" examples,
you can present your user with a list of possible addresses. Then they
can pick the appropriate address to use, which your program will use
in the bind() call. Obviously, this is only feasible for programs
intended for advanced users.
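A minimal sketch of the bind() step might look like the following. The helper name bind_to_interface() is invented for this example, and it assumes the user has already picked an address in dotted-quad form. It binds to port 0 so that the stack picks the local port, which is what a client normally wants. On Windows it assumes WSAStartup() has already been called, and newer SDKs may want inet_pton() in place of inet_addr(). The POSIX branch exists only so the sketch builds elsewhere.

```cpp
#include <cassert>
#include <cstring>
#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>   // socklen_t
typedef SOCKET socket_t;
#else
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int socket_t;
#endif

// Hypothetical helper: bind socket s to the interface that owns local_ip,
// letting the stack choose the port. Returns the chosen port in host byte
// order, or 0 if the address is malformed or a socket call fails.
unsigned short bind_to_interface(socket_t s, const char* local_ip)
{
    assert(local_ip != nullptr);

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                       // 0 = "any free port"
    addr.sin_addr.s_addr = inet_addr(local_ip);
    if (addr.sin_addr.s_addr == INADDR_NONE) {
        return 0;                                   // not a dotted-quad address
    }

    if (bind(s, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        return 0;
    }

    // Ask the stack which port it actually assigned.
    socklen_t len = sizeof(addr);
    if (getsockname(s, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
        return 0;
    }
    return ntohs(addr.sin_port);
}
```

For a server you would pass your well-known port instead of 0 and then call listen(). For a client you would follow the bind() with connect() as usual.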
Incidentally, this is how virtual hosting on the Internet works. A
single server is set up with a single network card but several IP
addresses. Windows NT derivatives can do this, but Win95 derivatives
cannot. To set this up in NT, go into the TCP/IP area of the Network
control panel, and then click the Advanced button. IIRC, you can enter
up to five network addresses per interface in NT Workstation, perhaps
more in NT Server.
Note that this information does not apply to the Win9x multihomed
computer Dialup Networking
bug. This problem cannot be fixed by bind()ing to the LAN
interface in an effort to force the OS to use it exclusively. The problem
is due to a bug in the OS's name resolver. See the DUN bug FAQ item
for workarounds.
4.13 - Is it a bad idea to bind() to a particular port in a client program?
It's occasionally justifiable, but most of the time it's a very
bad idea.
I've only heard of two good uses of this feature. The first is
when your program needs to bind to a port in a particular range. Some
implementations of the Berkeley "r commands" (e.g. rlogin, rsh, rcp,
etc.) do this for security purposes. Because only the superuser on
a Unix system can bind to a low-numbered port (1-1023), such an r
command tries, sequentially, to bind to one of the ports in this range
until it succeeds. This allows the remote server to surmise that if the
connection is coming from a low-numbered port, the remote user must be a
superuser. (This port range limit also applies on Windows NT derivatives,
but not on Windows 95 derivatives.)
The second justifiable example is FTP in its "active" mode: the
client binds to a random port and then tells the server to connect to
that port for the next data transfer (whether it is an upload, download,
or a file listing). This is justifiable because it arguably cleans up the
protocol, and the FTP client doesn't need to bind to any particular port,
it just needs to bind to a port. (Incidentally, it does this by
binding to port 0 — the stack chooses an available port when you
do this.) This is also justifiable because the FTP client is acting as a
server in this case, so it makes sense that it has to bind to a port.
By contrast, it is almost always an error to bind to a
particular port in a client. (Notice that both of the above
examples are flexible about the ports they bind to.) To see why this
is bad, consider a web browser. Browsers often create several connections
to download a single web page, one for each of the individual
pieces of the page: images, applets, sound clips, etc. If a browser always
bound to a particular local port, it could only have one connection
going at a time. Also, you couldn't have a second instance of the web
browser downloading another page at the same time.
That's not the biggest problem, though. When you close a TCP
connection, it goes into the TIME_WAIT state for a short period
(between 30 and 120 seconds, typically), during which you cannot reuse
that connection's "5-tuple": the combination of {local host, local port,
remote host, remote port, transport protocol}. (This timeout period
is a feature of all correctly-written TCP/IP stacks, and is covered
in RFC 793 and especially
RFC 1122.) In practical
terms, this means that if you bind to a specific port all the time,
you cannot connect to the same host using the same remote port until
the TIME_WAIT period expires. I have personally seen anomalous
cases where the TIME_WAIT period does not occur, but when this
happens, it's a bug in the stack, not something you should count on.
For more on this matter, see the
Lame List.
4.14 - What is the connection backlog?
When a connection request comes into a
network stack, the stack first checks to see if any
program is listening on the requested port. If so, the stack replies to
the remote peer, completing the connection. The
stack stores the connection information in a queue called the connection
backlog. (When there are connections in the backlog, the accept()
call simply causes the stack to remove the oldest connection from the
connection backlog and return a socket for it.)
The purpose of the listen() call is to set the size of the
connection backlog for a particular socket. When the backlog fills up,
the stack begins rejecting connection attempts.
Rejecting connections is a good thing if your program is written to
accept new connections as fast as it reasonably can. If the backlog fills
up despite your program's best efforts, it means your server has hit its
load limit. If the stack were to accept more connections, your program
wouldn't be able to handle them as well as it should, so the client will
think your server is hanging. At least if the connection is rejected,
the client will know the server is too busy and will try again later.
The proper value for backlog depends on how many connections
you expect to see in the time between accept() calls. Let's say
you expect an average of 1000 connections per second, with a burst
value of 3000 connections per second. [Ed. I picked these values
because they're easy to manipulate, not because they're representative
of the real world!] To handle the burst load with a short connection
backlog, your server's time between accept() calls must be under
0.3 milliseconds. Let's say you've measured your time-to-accept under
load, and it's 0.8 milliseconds: fast enough to handle the normal load,
but too slow to handle your burst value. In this case, you could make
backlog relatively large to let the stack queue up connections
under burst conditions. Assuming that these bursts are short, your
program will quickly catch up and clear out the connection backlog.
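To put rough numbers on "relatively large", you can estimate how many connections pile up during a burst. The sketch below is a back-of-the-envelope calculation under simplifying assumptions of my own: the queue starts empty, connections arrive at a steady rate for the length of the burst, and the server accepts at a constant pace. The function names and the 100 ms burst length used afterwards are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Connections per second the server can accept, given the measured time
// between accept() calls in microseconds.
std::int64_t accept_rate_per_sec(std::int64_t accept_interval_us)
{
    assert(accept_interval_us > 0);
    return 1000000 / accept_interval_us;
}

// Connections still waiting in the backlog at the end of a burst lasting
// burst_ms milliseconds, assuming the backlog was empty when it began.
std::int64_t backlog_needed(std::int64_t burst_rate_per_sec,
                            std::int64_t burst_ms,
                            std::int64_t accept_interval_us)
{
    const std::int64_t arrived  = burst_rate_per_sec * burst_ms / 1000;
    const std::int64_t accepted = accept_rate_per_sec(accept_interval_us) * burst_ms / 1000;
    return arrived > accepted ? arrived - accepted : 0;
}
```

With the figures above, a 0.8 ms time-to-accept works out to 1,250 accepts per second. A 100 ms burst at 3,000 connections per second then leaves backlog_needed(3000, 100, 800) = 175 connections queued, so the backlog would have to be at least that large to ride out the burst without rejections.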
The traditional value for listen()'s backlog parameter
is 5. On some stacks, that is also the maximum value: this includes the
workstation-class Windows NT derivatives and the Windows 95 derivatives. On
the server-class Windows NT derivatives, the maximum connection backlog
size is 200, unless the dynamic backlog feature is enabled. (More info
on dynamic backlogs below.) The stack will use its maximum backlog value
if you pass in a larger value. There is no standard way to find out what
backlog value the stack chose to use.
If your program is quick about calling accept(), low backlog
limits are not normally a problem. However, it does mean that concerted
attempts to make lots of connections in a short period of time can fill
the backlog queue. This makes non-Server flavors of Windows a bad choice
for a high-load server: either
a legitimate load or a SYN flood attack can overload a server on such
a platform. (See below for more on SYN attacks.)
There is a special constant you can use for the backlog size,
SOMAXCONN. This tells the underlying service provider to
set the backlog queue to the largest possible size. This is defined as
5 in winsock.h, and 0x7FFFFFFF in winsock2.h. Because the winsock.h
definition is only 5, the constant is of limited value when you build
against that header.
There is an even better reason not to use SOMAXCONN:
large backlogs make SYN flood attacks much more, shall we say,
effective. When Winsock creates the backlog queue, it starts small and
grows as required. Since the backlog queue is in non-pageable system
memory, a SYN flood can cause the queue to eat a lot of this precious
memory resource.
After the first SYN flood attacks in 1996, Microsoft added a feature
to Windows NT called "dynamic backlog". (The feature is in service pack
3 and higher.) This feature is normally off for backwards compatibility,
but when you turn it on, the stack can increase or decrease the size
of the connection backlog in response to network conditions. (It
can even increase the backlog beyond the "normal" maximum of 200,
in order to soak up malicious SYNs.) The Microsoft Knowledge Base
article that describes the feature also has
some good practical discussion about connection backlogs.
You will note that SYN attacks are dangerous for systems with both
short and very long backlog queues. The point is that a middle ground is
the best course if you expect your server to withstand SYN attacks. Either
use Microsoft's dynamic backlog feature, or pick a value somewhere in
the 20-200 range and tune it as required.
A program can rely too much on the backlog feature. Consider a
single-threaded blocking server: the design means it can only handle
one connection at a time. However, it can set up a large backlog,
making the stack accept and hold connections until the program gets
around to handling the next one. (See this
example to see the technique at work.) You should not take advantage
of the feature this way unless your connection rate is very low and the
connection times are very short. (Pedagogues excepted.)