2.5 - Can I use Winsock with { My Favorite Language }?
Most programming languages these days have some way of accessing
Winsock, but Winsock is rarely used directly except from C or C++. There
are several reasons for this.
Reason 1: Some languages simply lack the language
features to call the Winsock API. Your language needs the following
to fully use the Winsock API:
- Pointers. (The ability to access a specific piece of memory
by its address.)
- Bitwise operators. (The ability to change specific bits in
a byte.)
- Structures or records. (The ability to define a block of
memory that is an aggregate of simple data elements, such as
two characters followed by a 16-bit integer. This feature must
also allow some measure of control as to how the data is laid
out in memory.)
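To make those three requirements concrete, here is a small C++ sketch that uses all of them at once. The names Ipv4Endpoint and pack_endpoint() are hypothetical, made up for this illustration; real Winsock code would use sockaddr_in and htons() instead. The point is only to show what a language has to be able to express.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record, loosely modeled on sockaddr_in. Every field is a
// byte array, so the compiler has no reason to insert padding.
struct Ipv4Endpoint {
    std::uint8_t family[2];   // address family, low byte first
    std::uint8_t port[2];     // port in network (big-endian) byte order
    std::uint8_t addr[4];     // address bytes, e.g. 127, 0, 0, 1
};

// Structures: the layout in memory must be exactly what we expect.
static_assert(sizeof(Ipv4Endpoint) == 8, "unexpected padding in Ipv4Endpoint");

// Pointers: the caller hands us the address of the record to fill in.
// Bitwise operators: shifts and masks split the integers into bytes.
void pack_endpoint(Ipv4Endpoint* out, std::uint16_t family,
                   std::uint16_t port, std::uint32_t addr)
{
    assert(out != nullptr);
    out->family[0] = static_cast<std::uint8_t>(family & 0xFF);
    out->family[1] = static_cast<std::uint8_t>((family >> 8) & 0xFF);
    out->port[0] = static_cast<std::uint8_t>((port >> 8) & 0xFF);
    out->port[1] = static_cast<std::uint8_t>(port & 0xFF);
    for (int i = 0; i < 4; ++i) {
        // Most significant byte of the address goes first.
        out->addr[i] = static_cast<std::uint8_t>((addr >> (24 - 8 * i)) & 0xFF);
    }
}
```

If your language can't express something equivalent to this, calling the Winsock API directly from it is likely to be painful.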
Reason 2: Many languages rely on some form of component
architecture (e.g. ActiveX) to provide outside services like network
access. Often the language environment comes with basic networking
components, sufficient for most tasks. If your tool doesn't come
with the necessary components, or the ones it does come with aren't
powerful enough, you may be able to find the functionality you need in a
third-party library, rather than writing
the necessary Winsock code yourself.
Reason 3: Many newer languages – especially cross-platform
scripting languages – include language support for networking.
(Examples include Java, Perl, Python and Tcl.) From the programmer's point
of view, Winsock is rarely a concern when working in these languages.
For these reasons and others, this FAQ is
biased towards C++.
If your language allows direct access to the Winsock API, you may
be able to translate the C++ code in the FAQ into equivalent code in
your chosen language. However, I recommend that you look for sample
code in your chosen language via the Web
Pages section of the FAQ, so you can study working code before you
begin translating.
2.6 - Are there any tools available for debugging Winsock programs?
There are two main categories of debugging tools: network analyzers
(colloquially known as "sniffers") and Winsock shims.
Sniffers are usually software packages that run on one of the LAN's
workstations and, due to the way typical LANs work, capture all of the
traffic going over the LAN. Most sniffers will also decode that traffic
to various degrees. One advantage of a sniffer is that it literally sees
everything about the conversation, including low level protocol details
that aren't available from the Winsock layer. Another is that the good
ones are extremely powerful and configurable. For example, some allow
you to write "protocol plugins" that will decode any protocol, such as
a custom protocol that you've developed.
The other major category of debugging tools is "Winsock shims." A
shim sits between your program and Winsock, usually by "hooking"
the Winsock API. These tools are limited to monitoring events on the
Winsock layer itself, and can only monitor traffic to or from a single
host. That disadvantage is offset by the fact that they can see things
that a sniffer can't, like the exact sequence of Winsock calls and the
parameters used for each call.
There is no clear-cut choice of whether to use a sniffer or a shim to
debug a network program. Sometimes a shim does a better job of showing
you what the problem is with your program, since it shows you exactly
how your program is using Winsock. Conversely, sometimes you need to
see exactly what data is coming out of your machine. Another issue is
that some sniffers don't work on point-to-point links like those used
for PPP and WAN connections, but a shim will work in this situation.
I suggest that you start by picking up one of the free sniffers. Then
you should try out some of the payware sniffers' demos, to see if they
have any additional features you've just gotta have. The FAQ has a
Debugging Tools review section that links to most of the
sniffers out on the market today. Once you've found a sniffer you like,
you can then look at the links farther down on that page for pointers
to shims and other debugging tools.
You may also find the FAQ article Debugging TCP/IP useful
for some less-automated methods of debugging a TCP program.
Methods That Don't Work: There are a couple of
debugging tools that are supposed to work that don't, or are too flaky
to deal with. The first is the SO_DEBUG socket option. It simply doesn't
work on Microsoft stacks. The other is the Winsock DLL debugging plugin
dt_dll.dll; this method is flaky. Bob Quinn wrote an article about this,
but unfortunately the site that held it was bought by another company
that hasn't yet made that article available again.
2.7 - How do I get a readable error message from a Winsock error number?
The problem with this question is that it assumes that there is a
"good" canned error message for every situation. The reality is that many
times, you need to know the program's context before you can turn an error
value into a meaningful error message. For example, WSAEFAULT
can mean "Bad pointer passed," or "Passed buffer too small," or even "That
version of the API is not supported." Since the Winsock spec documents
the most likely error values that each function will return, you should
use this information to construct intelligent error handlers.
Still, sometimes an API call returns something unexpected, so a cryptic
error message is better than none at all. In that case, you can just
build a stringtable in your resource file mapping error numbers to error
messages. There is one such RC file for the Winsock 1.1 error values available
here. Alternately, the basic
Winsock tutorial programs in the FAQ include a utility module
(ws_util.cpp) that defines a function for translating Winsock
error numbers into strings.
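A lookup function of that general kind can be sketched as below. This is not the FAQ's ws_util.cpp, just a minimal illustration with a hypothetical function name, winsock_error_text(), and only a handful of entries. The numeric values are the ones the Winsock headers define for those error names; they are written out as literals only so the sketch compiles without winsock2.h.

```cpp
#include <cstddef>

struct ErrorEntry {
    int code;
    const char* text;
};

// A few common Winsock error values. A real table would cover them all.
static const ErrorEntry kErrorTable[] = {
    { 10004, "Interrupted function call (WSAEINTR)" },
    { 10014, "Bad address (WSAEFAULT)" },
    { 10022, "Invalid argument (WSAEINVAL)" },
    { 10035, "Operation would block (WSAEWOULDBLOCK)" },
    { 10038, "Socket operation on non-socket (WSAENOTSOCK)" },
    { 10048, "Address already in use (WSAEADDRINUSE)" },
    { 10054, "Connection reset by peer (WSAECONNRESET)" },
    { 10060, "Connection timed out (WSAETIMEDOUT)" },
    { 10061, "Connection refused (WSAECONNREFUSED)" },
};

// Returns a canned message for the given error number, or a generic
// fallback string if the number is not in the table.
const char* winsock_error_text(int code)
{
    const std::size_t count = sizeof(kErrorTable) / sizeof(kErrorTable[0]);
    for (std::size_t i = 0; i < count; ++i) {
        if (kErrorTable[i].code == code) {
            return kErrorTable[i].text;
        }
    }
    return "Unknown Winsock error";
}
```

You would typically call it as winsock_error_text(WSAGetLastError()) right after a failed call, and then add whatever program context you have to the message.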
Note that under Windows 2000 and possibly newer versions of Windows,
the FormatMessage() API will return canned messages for Winsock
error numbers. I've only tested this on Windows 2000, where it works,
and Windows 98, where it fails. I suppose you can be reasonably sure
it will continue to exist in newer versions of Windows. Nevertheless,
I think you're better off spending your time constructing meaningful
error messages based on program context than chasing something that
could never work very well even if it was documented behavior.
2.8 - Winsock keeps returning the error WSAEWOULDBLOCK. What's wrong with my program?
Not a thing. WSAEWOULDBLOCK is a perfectly normal
occurrence in programs using non-blocking and asynchronous sockets. It's
Winsock's way of telling your program "I can't do that right now,
because I would have to block to do so."
The next question is, how do you know when it's safe to try
again? In the case of asynchronous sockets, Winsock will send you an
FD_WRITE message after a failed send() call when it
is safe to write; it will send you an FD_READ message after
a recv() call when more data arrives on that socket. Similarly,
in a non-blocking sockets program that uses select(), the socket
will be set in writefds when it's okay to write, and in
readfds when there is data to read.
Note that Win9x has a bug where
select() can fail to block on a nonblocking socket. It will
signal one of the sockets, which will cause your program to call
recv() or send() or similar. That function will return
WSAEWOULDBLOCK, which can be quite a surprise. So, a program
using select() under Win9x has to be able to deal with this error
at any time.
This gets to a larger issue: whenever you use some form of nonblocking
sockets, you have to be prepared for WSAEWOULDBLOCK at any
time. It's simply a matter of defensive programming, just like checking
for null pointers.
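As a sketch of what that defensive style can look like, here is a send loop for a socket that may be non-blocking. The helper names send_all() and would_block() are hypothetical. The non-Windows branch is an assumption added only so the sketch can be compiled and tried on a BSD-sockets system; under Winsock only the first branch matters.

```cpp
#include <cassert>
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET socket_t;
static bool would_block() { return WSAGetLastError() == WSAEWOULDBLOCK; }
#else
// Assumption: BSD-sockets fallback, present only for portability.
#include <cerrno>
#include <sys/select.h>
#include <sys/socket.h>
typedef int socket_t;
static bool would_block() { return errno == EWOULDBLOCK || errno == EAGAIN; }
#endif

// Sends the whole buffer. Returns false only on a real error;
// "would block" is treated as a normal, retryable condition.
bool send_all(socket_t s, const char* data, int len)
{
    assert(data != nullptr && len >= 0);
    int sent = 0;
    while (sent < len) {
        int n = static_cast<int>(send(s, data + sent, len - sent, 0));
        if (n > 0) {
            sent += n;          // partial sends are normal too
            continue;
        }
        if (n < 0 && !would_block()) {
            return false;       // genuine failure
        }
        // The stack can't take more data right now: wait until select()
        // reports the socket as writable, then try again.
        fd_set writefds;
        FD_ZERO(&writefds);
        FD_SET(s, &writefds);
        if (select(static_cast<int>(s) + 1, nullptr, &writefds, nullptr, nullptr) < 0) {
            return false;
        }
    }
    return true;
}
```

Note that the loop re-checks the result of send() even after select() says the socket is writable, which is exactly what the Win9x bug described above requires.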
2.9 - How can I test my Winsock application without setting up a network?
There is a special address called the loopback or
localhost address, 127.0.0.1. This lets two programs running on
a single machine talk to each other. The server usually listens for
connections on all available interfaces, and the client connects to
the localhost address. (See the Example Programs section for basic
client and server program code.)
If you have an Internet or LAN connection on your development machine,
you're already set up for this.
For machines without networks, you have to set up a "dummy"
network. Windows NT derivatives have the "Microsoft Loopback Device"
for this very purpose — just add this in the network control panel,
and you'll be able to use the loopback address.
For Windows 95 derivatives, you can try installing Dial-Up Networking
(DUN) and pointing it at an unused serial port. This can be quirky, but
it's possible to limp by with this method. The main problem arises when
DUN decides it needs to dial the modem and finds
that there is no modem on the port you chose. To minimize this problem,
never use name lookup calls like gethostbyname(), and turn off DUN's
"automatic dial" feature.
Be warned: behavior through the loopback interface may well be
different from behavior on a network, if only because conditions are
much simpler within a single machine than over a LAN or WAN. You should
try to test your application on a real network, even if you do primary
development on a single machine.
2.11 - Is it possible to close the connection "abnormally"?
Sure, but it's an evil thing to do. :) The simplest way is to
enable the SO_LINGER option with a timeout of 0 (l_onoff nonzero,
l_linger zero) using setsockopt()
before you call closesocket(). Another method is to call
shutdown() with the how parameter set to 2 ("both
directions"), possibly followed by a closesocket() call.
"Slamming the connection shut" is only justifiable in a very small
number of cases. You must have fairly deep knowledge of the way TCP
works before you can properly decide to use this technique. Generally,
the perceived need to slam the connection shut comes from a broken
program, either yours or the remote peer's. I recommend that you try to
fix the broken program so you don't have to resort to such a questionable
technique.
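For the rare case where you have decided it really is justified, the SO_LINGER method looks roughly like this. The function name slam_shut() is hypothetical, and the non-Windows branch is an assumption added only so the sketch compiles on a BSD-sockets system.

```cpp
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET socket_t;
static int close_socket(socket_t s) { return closesocket(s); }
#else
// Assumption: BSD-sockets fallback, present only for portability.
#include <sys/socket.h>
#include <unistd.h>
typedef int socket_t;
static int close_socket(socket_t s) { return close(s); }
#endif

// Abortive close: turn lingering on with a zero timeout, then close.
// On a connected TCP socket this discards unsent data and resets the
// connection instead of going through the normal shutdown sequence.
bool slam_shut(socket_t s)
{
    struct linger lg;
    lg.l_onoff = 1;     // lingering enabled...
    lg.l_linger = 0;    // ...with a timeout of 0 seconds
    if (setsockopt(s, SOL_SOCKET, SO_LINGER,
                   reinterpret_cast<const char*>(&lg), sizeof(lg)) != 0) {
        return false;
    }
    return close_socket(s) == 0;
}
```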
2.12 - How do I detect when my TCP connection is closed?
All of the I/O strategies discussed in the I/O
strategies article have some way of indicating that the connection
is closed.
First, keep in mind that TCP is a full-duplex network protocol.
That means that you can close the connection half-way and still send data
on the other half. An example is a web browser: it sends a short request
to the web server, then closes its half of the connection. The web server
then sends back the requested data on the other half of the connection,
and closes its sending side, which terminates the TCP session.
Normal TCP programs only close the sending half, which the remote
peer perceives as the receiving half. So, what you normally want to
detect is whether the remote peer closed its sending half, meaning you
won't be receiving data from them any more.
With asynchronous sockets, Winsock sends you an FD_CLOSE
message when the connection drops. Event objects are similar: the system
signals the event object with an FD_CLOSE notification.
With blocking and non-blocking sockets, you probably have a loop
that calls recv() on that socket. recv() returns 0 when
the remote peer closes the connection. As you would expect, if you are
using select(), the SOCKET descriptor in the readfds
parameter gets set when the connection closes. As usual, you then call
recv() and see the 0 return value.
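For a blocking socket, such a loop might look like the sketch below. The function name read_until_close() is hypothetical, and the non-Windows branch is an assumption added only so the sketch compiles on a BSD-sockets system. A non-blocking version would also have to treat WSAEWOULDBLOCK as "try again later" rather than as an error.

```cpp
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET socket_t;
#else
// Assumption: BSD-sockets fallback, present only for portability.
#include <sys/socket.h>
typedef int socket_t;
#endif

// Reads from a blocking socket until the remote peer closes its sending
// half. Returns the number of bytes received, or -1 on error.
long read_until_close(socket_t s)
{
    char buf[512];
    long total = 0;
    for (;;) {
        int n = static_cast<int>(recv(s, buf, static_cast<int>(sizeof(buf)), 0));
        if (n > 0) {
            total += n;         // normal data; a real program would use it
        } else if (n == 0) {
            return total;       // peer closed its sending half
        } else {
            return -1;          // error, possibly an abnormal disconnect
        }
    }
}
```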
As you might have guessed from the discussion above, it is also
possible to close the receiving half of the connection. If the
remote peer then tries to send you data, the stack will drop that data
on the floor and send a TCP RST to the remote peer.
See below for information on handling
abnormal disconnects.
2.13 - How do I detect an abnormal network disconnect?
The previous question deals with detecting
when a protocol connection is dropped normally, but what if you want
to detect other problems, like unplugged network cables or crashed
workstations? In these cases, the failure prevents notifying the remote
peer that something is wrong. My feeling is that this is usually a
feature, because the broken component might get fixed before anyone
notices, so why demand that the connection be reestablished?
If you have a situation where you must be able to detect all network
failures, you have two options:
The first option is to give the protocol a command/response
structure: one host sends a command and expects a prompt response
from the other host when the command is received or acted upon. If the
response does not arrive, the connection is assumed to be dead, or at
least faulty.
The second option is to add an "echo" function to your protocol,
where one host (usually the client) is expected to periodically send out
an "are you still there?" packet to the other host, which it must promptly
acknowledge. If the echo-sending host doesn't receive its response or the
receiving host fails to see an echo request for a certain period of time,
the program can assume that the connection is bad or the remote host
has gone down.
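The bookkeeping for either option is mostly a matter of remembering when you last sent an echo and when you last heard from the peer. Here is one way it could be sketched; the class PeerWatchdog and all of its member names are hypothetical, and the time is passed in explicitly so the logic can be exercised without a real connection.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper that decides when to send an echo request and
// when to give up on a silent peer. It does no network I/O itself.
class PeerWatchdog {
public:
    typedef std::chrono::steady_clock::time_point time_point;
    typedef std::chrono::milliseconds millis;

    PeerWatchdog(time_point now, millis echo_interval, millis timeout)
        : last_sent_(now), last_heard_(now),
          echo_interval_(echo_interval), timeout_(timeout)
    {
        // The timeout should leave room for at least one echo attempt.
        assert(timeout >= echo_interval);
    }

    // Call whenever anything arrives from the peer: echo reply or data.
    void on_receive(time_point now) { last_heard_ = now; }

    // Call right after sending an "are you still there?" packet.
    void on_echo_sent(time_point now) { last_sent_ = now; }

    // True when it is time to send another echo request.
    bool echo_due(time_point now) const
    {
        return now - last_sent_ >= echo_interval_;
    }

    // True when the peer has been silent for longer than the timeout.
    bool peer_presumed_dead(time_point now) const
    {
        return now - last_heard_ > timeout_;
    }

private:
    time_point last_sent_;
    time_point last_heard_;
    millis echo_interval_;
    millis timeout_;
};
```

In a real program you would call echo_due() and peer_presumed_dead() with std::chrono::steady_clock::now() from your timer handler or main loop.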
If you choose the "echo" alternative, avoid the temptation to use
the ICMP "ping" facility for this. If you did it this way, you would
have to send pings from both sides, because Microsoft stacks won't let
you see the other side's echo requests, only responses to your own echo
requests. Another problem with ping is that it's outside your protocol,
so it won't detect a failed TCP connection if the hardware connection
remains viable. A final problem with the ping technique is that ICMP is
an unreliable protocol: does it make a whole lot of sense to use an
unreliable protocol to add an assurance of reliability to another
protocol?
Another option you should not bother with is the TCP keepalive
mechanism. This is a way to tell the stack to send a packet out over
the connection at specific intervals whether there's real data to send
or not. If the remote host is up, it will send back a similar reply
packet. If the TCP connection is no longer valid (e.g. the remote host
has rebooted since the last keepalive), the remote host will send back
a reset packet, killing the local host's connection. If the remote host
is down, the local host's TCP stack will time out waiting for the reply
and kill the connection.
There are two problems with keepalives:
- Only Windows 2000 and its successors allow you to change
the keepalive time on a per-process basis. On older versions
of Windows, changing the keepalive time changes it for all
applications on the machine that use keepalives. (Changing
the keepalive time is almost a necessity since the default is
2 hours.)
- Each keepalive packet is 40 bytes of more-or-less useless
data, and there's one sent each direction as long as the
connection remains valid. Contrast this with a command/response
type of protocol, where there is effectively no useless data:
all packets are meaningful. In fairness, however, TCP keepalives
are less wasteful on Windows 2000 and its successors than the
"are you still there" strategy above.
Note that different types of networks handle physical disconnection
differently. Ethernet, for example, establishes no link-level connection,
so if you unplug the network cable, a remote host can't tell that its
peer is physically unable to communicate. By contrast, a dropped PPP link
causes a detectable failure at the link layer, which propagates up to
the Winsock layer for your program to detect.
2.14 - How can I change the timeout for a Winsock function?
Some of the blocking Winsock functions (e.g. connect())
have a timeout embedded into them. The theory behind this is that only
the stack has all the information necessary to set a proper timeout.
Yet, some people find that the value the stack uses is too long for
their application; it can be a minute or longer.
Under Winsock 2, you can set the SO_SNDTIMEO and
SO_RCVTIMEO options with setsockopt() to change the
timeouts for send() and recv().
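A sketch of setting the receive timeout follows. The function name set_recv_timeout() is hypothetical. Winsock takes the value as a DWORD count of milliseconds; the non-Windows branch, which uses a struct timeval, is an assumption added only so the sketch compiles on a BSD-sockets system. SO_SNDTIMEO is set the same way.

```cpp
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET socket_t;
#else
// Assumption: BSD-sockets fallback, present only for portability.
#include <sys/socket.h>
#include <sys/time.h>
typedef int socket_t;
#endif

// Makes blocking recv() calls on s fail if nothing arrives within
// timeout_ms milliseconds. Returns true if the option was set.
bool set_recv_timeout(socket_t s, unsigned long timeout_ms)
{
#ifdef _WIN32
    // Winsock expects the timeout as a DWORD, in milliseconds.
    DWORD value = static_cast<DWORD>(timeout_ms);
    return setsockopt(s, SOL_SOCKET, SO_RCVTIMEO,
                      reinterpret_cast<const char*>(&value),
                      sizeof(value)) == 0;
#else
    // BSD sockets expect seconds and microseconds instead.
    timeval tv;
    tv.tv_sec = static_cast<long>(timeout_ms / 1000);
    tv.tv_usec = static_cast<long>((timeout_ms % 1000) * 1000);
    return setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
#endif
}
```

After the timeout expires, recv() returns an error; under Winsock you should expect WSAGetLastError() to report WSAETIMEDOUT.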
Unfortunately, the Winsock spec does not document a way to change
many other timeout values, and the above advice doesn't apply to
Winsock 1.1.
The solution is to avoid blocking sockets altogether. All of the
non-blocking socket methods lend themselves to timeouts:
- Non-blocking sockets with
select() – The
fifth parameter to the select() function is a timeout value.
- Asynchronous sockets – Use the Windows API
SetTimer().
- Event objects –
WSAWaitForMultipleEvents()
has a timeout parameter.
- Waitable Timers – These are a new feature in
Windows 98 and NT 4.0 SP3 and higher. A waitable timer is an
object like a semaphore, except that the OS signals it at a future
time that you specify. You create them with the Win32 function
CreateWaitableTimer(). So, you could wait on a 5-second
timer as well as your event objects; if nothing happens on the
sockets within 5 seconds, Windows will signal the timer, thus
breaking you out of the WaitForMultipleObjects() call.
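Of these, the select() method is the easiest to show in a few lines. The sketch below waits for a socket to become readable, with a timeout in milliseconds. The function name wait_readable() is hypothetical, and the non-Windows branch is an assumption added only so the sketch compiles on a BSD-sockets system.

```cpp
#include <cassert>
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET socket_t;
#else
// Assumption: BSD-sockets fallback, present only for portability.
#include <sys/select.h>
#include <sys/socket.h>
typedef int socket_t;
#endif

// Waits up to timeout_ms milliseconds for s to become readable.
// Returns 1 if it is readable, 0 if the timeout expired, -1 on error.
int wait_readable(socket_t s, long timeout_ms)
{
    assert(timeout_ms >= 0);

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(s, &readfds);

    // The fifth parameter to select() carries the timeout.
    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int rc = select(static_cast<int>(s) + 1, &readfds, nullptr, nullptr, &tv);
    if (rc < 0) {
        return -1;
    }
    return rc > 0 ? 1 : 0;
}
```

Remember that "readable" also covers a closed connection, so you still have to call recv() and check its return value afterwards.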
Note that with asynchronous and non-blocking sockets, you may be able
to avoid handling timeouts altogether. Your program continues working even
while Winsock is busy. So, you can leave it up to the user to cancel an
operation that's taking too long, or just let Winsock's natural timeout
expire rather than taking over this functionality in your code.
2.16 - What is out-of-band data (MSG_OOB), and why is it bad?
Out-of-band (OOB) data is like a second data channel. The intent is
to use the regular TCP data stream for most data and the OOB stream for
"emergency" messages. The telnet protocol uses this for "interrupt"
keystrokes like Ctrl-C, so that they don't have to wait on the remote
peer to handle regular TCP data before the interrupt occurs. You can send
OOB data by passing the MSG_OOB flag to send() and receive it by
passing MSG_OOB to recv(). You can also receive OOB data inline with
the normal data stream by setting the SO_OOBINLINE option with setsockopt().
OOB data is a useful concept, but unfortunately there are two
conflicting interpretations of how OOB data should be handled at the stack
level: the original description of OOB in the TCP protocol specification
(RFC 793) was superseded by the
"host requirements" spec (RFC
1122), but there are still many machines with RFC 793 OOB
implementations. Section 3.5 in the Winsock 2 spec (version 2.2.2,
as of this writing) discusses OOB, with details on why RFC 793 vs. RFC
1122 is a problem in section 3.5.2.
OOB also isn't a fully functional second data channel: it's rather
limited. So, never use OOB except when implementing legacy protocols
like telnet which demand it. You can get reliable OOB-like behavior by
simply using two data connections: one for normal data, and the second
for emergency data.