Path Maximum Transmission Unit (PMTU) Discovery
PMTU
discovery is described in RFC 1191. When a connection is
established, the two hosts involved exchange their TCP
maximum segment size (MSS) values. The smaller of the two
MSS values is used for the connection. Historically, the MSS
for a host has been the MTU at the link layer minus 40 bytes
for the IP and TCP headers. However, support for additional
TCP options, such as time stamps, has increased the typical
TCP+IP header to 52 or more bytes.
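The header arithmetic above can be sketched in a few lines of Python. This is an illustrative sketch only, not the stack's implementation; the function names are hypothetical, and the 12-byte figure assumes the time stamp option padded to a 4-byte boundary, which gives the 52-byte header mentioned above.

```python
IP_HEADER = 20          # bytes, IPv4 header without options
TCP_HEADER = 20         # bytes, TCP header without options
TIMESTAMP_OPTION = 12   # bytes, assumed size of the padded time stamp option


def mss_for_mtu(mtu):
    """Historical MSS: link-layer MTU minus 40 bytes of IP and TCP headers."""
    return mtu - IP_HEADER - TCP_HEADER


def payload_per_segment(mtu, tcp_option_bytes=0):
    """Data bytes that fit in one segment once TCP options are counted."""
    return mss_for_mtu(mtu) - tcp_option_bytes


def negotiated_mss(local_mss, remote_mss):
    """The smaller of the two exchanged MSS values is used."""
    return min(local_mss, remote_mss)


print(mss_for_mtu(1500))                            # Ethernet MTU
print(payload_per_segment(1500, TIMESTAMP_OPTION))  # with time stamps
print(negotiated_mss(1460, 536))
```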
When TCP
segments are destined to a non-local network, the Don't
Fragment bit is set in the IP header. Any router or media
along the path can have an MTU that differs from that of the
two hosts. If a media segment has an MTU that is too small
for the IP datagram being routed, the router attempts to
fragment the datagram accordingly. It then finds that the
Don't Fragment bit is set in the IP header. At this point,
the router should inform the sending host that the datagram
cannot be forwarded further without fragmentation.
This
is done with an ICMP Destination Unreachable
Fragmentation Needed and DF Set message.
Most routers also specify the MTU for the next hop by
putting the value for it in the low-order 16 bits of the
ICMP header field that is unused in RFC 792. See RFC 1191,
section 4, for the format of this message. Upon receiving
this ICMP error message, TCP adjusts its MSS for the
connection to the specified MTU minus the TCP and IP header
size so that any further packets sent on the connection are
no larger than the maximum size that can traverse the path
without fragmentation.
Note: The
minimum MTU permitted is 68 bytes, and Windows 2000 TCP
enforces this limit.
Some
noncompliant routers may silently drop IP datagrams that
cannot be fragmented or may not correctly report their next-hop
MTU. If this occurs, it may be necessary to make a
configuration change to the PMTU detection algorithm. There
are two registry changes that can be made to the TCP/IP
stack in Windows 2000 to work around these problematic
devices. These registry entries are described in more detail
in Appendix A:
- EnablePMTUBHDetect: Adjusts the PMTU discovery algorithm
  to attempt to detect black hole routers. Black hole
  detection is disabled by default.
- EnablePMTUDiscovery: Completely enables or disables the
  PMTU discovery mechanism. When PMTU discovery is disabled,
  an MSS of 536 bytes is used for all non-local destination
  addresses. PMTU discovery is enabled by default.
The PMTU
between two hosts can be discovered manually using the
ping command with the -f (don't fragment) switch,
as follows:
ping -f -n <number of pings> -l <size> <destination ip address>
As shown
in the example below, the size parameter can be
varied until the MTU is found. The size parameter
used by ping is the size of the data buffer to send, not
including headers. The ICMP header consumes 8 bytes, and the
IP header is normally 20 bytes. In the case below
(Ethernet), the link layer MTU is the maximum-sized ping
buffer plus 28, or 1500 bytes:
C:\>ping -f -n 1 -l 1472 10.99.99.10
Pinging 10.99.99.10 with 1472 bytes of data:
Reply from 10.99.99.10: bytes=1472 time<10ms TTL=128
Ping statistics for 10.99.99.10:
    Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms
C:\>ping -f -n 1 -l 1473 10.99.99.10
Pinging 10.99.99.10 with 1473 bytes of data:
Packet needs to be fragmented but DF set.
Ping statistics for 10.99.99.10:
    Packets: Sent = 1, Received = 0, Lost = 1 (100% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms
In the
example shown above, the IP layer returned an ICMP error
message that ping interpreted. If the router had been a
black hole router, ping would simply not be answered once
its size exceeded the MTU that the router could handle. Ping
can be used in this manner to detect such a router.
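The manual search described above can be automated with a binary search over the ping buffer size. The sketch below is a hedged illustration: `probe` is a hypothetical callback that is assumed to send one ping with the Don't Fragment bit set and return True when a reply arrives, and the search assumes that every size at or below the path MTU succeeds. The 68-byte and 1500-byte bounds are example values.

```python
ICMP_HEADER = 8   # bytes
IP_HEADER = 20    # bytes, without IP options


def find_path_mtu(probe, low_mtu=68, high_mtu=1500):
    """Return the largest MTU for which probe(payload_size) succeeds.

    probe is a hypothetical callable; it is expected to behave like
    'ping -f -n 1 -l <payload_size> <destination>'.
    """
    overhead = ICMP_HEADER + IP_HEADER
    lo = low_mtu - overhead
    hi = high_mtu - overhead
    if not probe(lo):
        return None  # even the smallest probe got no reply
    while lo < hi:
        mid = (lo + hi + 1) // 2   # round up so the loop always advances
        if probe(mid):
            lo = mid               # mid fits; search larger sizes
        else:
            hi = mid - 1           # mid is too large or was dropped
    return lo + overhead


# Stand-in for a real path whose smallest link MTU is 576 bytes.
def fake_probe(payload_size):
    return payload_size + 28 <= 576


print(find_path_mtu(fake_probe))
```

Because an unanswered probe counts as a failure, the same search also settles on the correct size when a black hole router silently drops oversized datagrams.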
A sample
ICMP Destination unreachable error message is shown
here:
******************************************************************************
Src Addr     Dst Addr    Protocol  Description
10.99.99.10  10.99.99.9  ICMP      Destination Unreachable: 10.99.99.10 See frame 3
+ FRAME: Base frame properties
+ ETHERNET: ETYPE = 0x0800 : Protocol = IP: DOD Internet Protocol
+ IP: ID = 0x4401; Proto = ICMP; Len: 56
ICMP: Destination Unreachable: 10.99.99.10 See frame 3
ICMP: Packet Type = Destination Unreachable
ICMP: Unreachable Code = Fragmentation Needed, DF Flag Set
ICMP: Checksum = 0xA05B
ICMP: Next Hop MTU = 576 (0x240)
ICMP: Data: Number of data bytes remaining = 28 (0x001C)
ICMP: Description of original IP frame
ICMP: (IP) Version = 4 (0x4)
ICMP: (IP) Header Length = 20 (0x14)
ICMP: (IP) Service Type = 0 (0x0)
ICMP: Precedence = Routine
ICMP: ...0.... = Normal Delay
ICMP: ....0... = Normal Throughput
ICMP: .....0.. = Normal Reliability
ICMP: (IP) Total Length = 1028 (0x404)
ICMP: (IP) Identification = 45825 (0xB301)
ICMP: Flags Summary = 2 (0x2)
ICMP: .......0 = Last fragment in datagram
ICMP: ......1. = Cannot fragment datagram
ICMP: (IP) Fragment Offset = 0 (0x0) bytes
ICMP: (IP) Time to Live = 32 (0x20)
ICMP: (IP) Protocol = ICMP - Internet Control Message
ICMP: (IP) Checksum = 0xC91E
ICMP: (IP) Source Address = 10.99.99.9
ICMP: (IP) Destination Address = 10.99.99.10
ICMP: (IP) Data: Number of data bytes remaining = 8 (0x0008)
ICMP: Description of original ICMP frame
ICMP: Checksum = 0xBC5F
ICMP: Identifier = 256 (0x100)
ICMP: Sequence Number = 38144 (0x9500)
00000: 00 AA 00 4B B1 47 00 AA 00 3E 52 EF 08 00 45 00  ...K.G...>R...E.
00010: 00 38 44 01 00 00 80 01 1B EB 0A 63 63 0A 0A 63  .8D........cc..c
00020: 63 09 03 04 A0 5B 00 00 02 40 45 00 04 04 B3 01  c....[...@E.....
00030: 40 00 20 01 C9 1E 0A 63 63 09 0A 63 63 0A 08 00  @. ....cc..cc...
00040: BC 5F 01 00 95 00
This
error was generated by using ping -f -n 1
-l 1000 on an Ethernet-based host to send a large
datagram across a router interface that only supports an MTU
of 576 bytes. When the router tried to place the large frame
onto the network with the smaller MTU, it found that
fragmentation was not allowed. Therefore, it returned the
error message indicating that the largest datagram that
could be forwarded is 0x240, or 576 bytes.
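As a rough illustration of how a receiver could read the next-hop MTU out of this message, the following Python sketch unpacks the first eight bytes of the ICMP header, using the byte values from the hexadecimal dump above (starting at offset 0x22). The function names are hypothetical, and the 40-byte header allowance assumes no IP or TCP options.

```python
import struct

ICMP_DEST_UNREACHABLE = 3
CODE_FRAG_NEEDED_DF_SET = 4


def parse_frag_needed(icmp_bytes):
    """Return the next-hop MTU from a Fragmentation Needed message, else None.

    Layout per RFC 1191, section 4: type, code, checksum, unused 16 bits,
    next-hop MTU in the low-order 16 bits, all in network byte order.
    """
    icmp_type, code, _checksum, _unused, next_hop_mtu = struct.unpack(
        "!BBHHH", icmp_bytes[:8]
    )
    if (icmp_type, code) != (ICMP_DEST_UNREACHABLE, CODE_FRAG_NEEDED_DF_SET):
        return None
    return next_hop_mtu


def mss_for_path_mtu(mtu):
    """MSS after the adjustment: path MTU minus 40 bytes of headers (assumed)."""
    return mtu - 40


# ICMP header bytes copied from the trace: 03 04 A0 5B 00 00 02 40
icmp_header = bytes.fromhex("0304A05B00000240")
mtu = parse_frag_needed(icmp_header)
print(mtu, mss_for_path_mtu(mtu))
```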
Dead Gateway Detection
Dead
gateway detection is used to allow TCP to detect failure of
the default gateway and to adjust the IP routing table to
use another default gateway. The Microsoft TCP/IP stack uses
the triggered reselection method described in RFC 816, with
slight modifications based upon customer experiences and
feedback.
When a
TCP connection routed through the default gateway attempts
to send a TCP packet to the destination a number of times
(equal to one-half of the registry value
TcpMaxDataRetransmissions) without receiving a response,
the algorithm changes the Route Cache Entry (RCE) for that
remote IP address to use the next default gateway in the
list. When 25 percent of the TCP connections have moved to
the next default gateway, the algorithm advises IP to change
the computer's default gateway to the one that the
connections are now using.
For
example, assume that there are currently TCP connections to
11 different IP addresses that are being routed through the
default gateway. Now assume that the default gateway fails,
that there is a second default gateway configured, and that
the value for TcpMaxDataRetransmissions is at the
default of 5.
When the
first TCP connection tries to send data, it does not receive
any acknowledgments. After the third retransmission, the RCE
for that remote IP address is switched to the next default
gateway in the list. At this point, any TCP connections to
that one remote IP address have switched over, but the
remaining connections still try to use the original default
gateway.
When the
second TCP connection tries to send data, the same thing
happens. Now, two of the 11 RCEs point to the new gateway.
When the
third TCP connection tries to send data, after the third
retransmission, three of 11 RCEs have been switched to the
second default gateway. Because, at this point, over 25
percent of the RCEs have been moved, the default gateway for
the whole computer is moved to the new one.
That
default gateway remains the primary one for the computer
until it experiences problems (causing the dead gateway
algorithm to try the next one in the list again) or until
the computer is restarted.
When the
search reaches the last default gateway, it returns to the
beginning of the list.
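The worked example above can be modeled with a small simulation. This is a sketch under stated assumptions rather than the stack's actual logic: one-half of TcpMaxDataRetransmissions is assumed to round up (so 5 gives the "third retransmission" used in the example), and the 25 percent test is assumed to be "at least 25 percent". The function name is hypothetical.

```python
import math


def simulate_dead_gateway(num_destinations, max_data_retransmissions=5,
                          threshold=0.25):
    """Model connections failing over one at a time after a gateway dies.

    Returns (retransmissions before an RCE is switched, number of RCEs
    switched when the computer-wide default gateway changes).
    """
    # Assumption: one-half of the registry value, rounded up.
    switch_after = math.ceil(max_data_retransmissions / 2)
    moved = 0
    for _ in range(num_destinations):
        moved += 1  # this destination's RCE now points to the next gateway
        if moved / num_destinations >= threshold:
            return switch_after, moved
    return switch_after, None


# 11 destinations, default TcpMaxDataRetransmissions of 5
print(simulate_dead_gateway(11))
```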
TCP Retransmission Behavior
TCP
starts a retransmission timer when each outbound segment is
handed down to IP. If no acknowledgment has been received
for the data in a given segment before the timer expires,
the segment is retransmitted. For new connection requests,
the retransmission timer is initialized to 3 seconds
(controllable using the TcpInitialRtt per-adapter
registry parameter), and the request (SYN) is resent up to
the value specified in TcpMaxConnectRetransmissions
(the default for Windows 2000 is 2 times). On existing
connections, the number of retransmissions is controlled by
the TcpMaxDataRetransmissions registry parameter (5
by default). The retransmission time-out is adjusted on the
fly to match the characteristics of the connection, using
Smoothed Round Trip Time (SRTT) calculations as described in
Van Jacobson's paper called "Congestion Avoidance and
Control." The timer for a given segment is doubled after
each retransmission of that segment. Using this algorithm,
TCP tunes itself to the normal delay of a connection. TCP
connections over high-delay links take much longer to time
out than those over low-delay links.4
The
following trace clip shows the retransmission algorithm for
two hosts that are connected over Ethernet on the same
subnet. An FTP file transfer was in progress when the
receiving host was disconnected from the network. Because
the SRTT for this connection was very small, the first
retransmission was sent after about one-half second. The
timer was then doubled for each of the retransmissions that
followed. After the fifth retransmission, the timer was once
again doubled. If no acknowledgment was received before it
expired, the connection was aborted.
delta  source ip    dest ip      pro  flags  description
0.000  10.57.10.32  10.57.9.138  TCP  .A..,  len: 1460, seq: 8043781, ack: 8153124, win: 8760
0.521  10.57.10.32  10.57.9.138  TCP  .A..,  len: 1460, seq: 8043781, ack: 8153124, win: 8760
1.001  10.57.10.32  10.57.9.138  TCP  .A..,  len: 1460, seq: 8043781, ack: 8153124, win: 8760
2.003  10.57.10.32  10.57.9.138  TCP  .A..,  len: 1460, seq: 8043781, ack: 8153124, win: 8760
4.007  10.57.10.32  10.57.9.138  TCP  .A..,  len: 1460, seq: 8043781, ack: 8153124, win: 8760
8.130  10.57.10.32  10.57.9.138  TCP  .A..,  len: 1460, seq: 8043781, ack: 8153124, win: 8760
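The doubling pattern in the trace can be reproduced with a short calculation. The sketch below assumes an initial time-out of 0.5 seconds (the trace shows about 0.521) and the default of five retransmissions; the function name is hypothetical.

```python
def retransmission_waits(initial_rto, max_retransmissions=5):
    """Waits between sends: one before each retransmission, plus the
    final doubled wait after the last retransmission before aborting."""
    return [initial_rto * 2 ** attempt
            for attempt in range(max_retransmissions + 1)]


waits = retransmission_waits(0.5)
print(waits)        # wait before each retransmission, then the final wait
print(sum(waits))   # seconds from the original send until the abort
```

With these assumptions the connection would be aborted roughly half a minute after the original segment was sent. A high-delay link with a larger starting time-out scales every term accordingly.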
There are
some circumstances under which TCP retransmits data prior to
the time that the retransmission timer expires. The most
common of these occurs due to a feature known as fast
retransmit. When a receiver that supports fast
retransmit receives data with a sequence number beyond the
current expected one, it assumes that some data was dropped.
To help make the sender aware of this event, the receiver
immediately sends an ACK, with the ACK number set to the
sequence number that it was expecting. It continues to do
this for each additional TCP segment that arrives containing
data subsequent to the missing data in the incoming stream.
When the sender starts to receive a stream of ACKs that are
acknowledging the same sequence number and that sequence
number is earlier than the current sequence number being
sent, it can infer that a segment (or more) must have been
dropped. Senders that support the fast retransmit algorithm
immediately resend the segment that the receiver is
expecting to fill in the gap in the data, without waiting
for the retransmission timer to expire for that segment.
This optimization greatly improves performance in a busy
network environment.
By
default, Windows 2000 resends a segment if it receives three
ACKs for the same sequence number and that sequence number
lags the current one. This is controllable with the
TcpMaxDupAcks registry parameter. See also the "TCP
Selective Acknowledgment (RFC 2018)" section in this paper.
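The sender-side check can be sketched as a simple counter. This is a minimal illustration of the rule stated above (three ACKs for the same sequence number that lags the current one), not the Windows 2000 code; the names are hypothetical and sequence-number wraparound is ignored.

```python
def fast_retransmit_trigger(acks, next_seq_to_send, threshold=3):
    """Return (index, ack_number) of the ACK that would trigger a fast
    retransmit, or None if the stream never reaches the threshold."""
    last_ack = None
    count = 0
    for index, ack in enumerate(acks):
        if ack == last_ack:
            count += 1
        else:
            last_ack, count = ack, 1
        # The ACK number must lag what the sender is currently sending.
        if count >= threshold and ack < next_seq_to_send:
            return index, ack
    return None


# The receiver keeps asking for 2460 while later segments arrive.
print(fast_retransmit_trigger([1000, 2460, 2460, 2460, 2460], 8300))
```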
TCP Keep-Alive Messages
A TCP
keep-alive packet is simply an ACK with the sequence number
set to one less than the current sequence number for the
connection. A host receiving one of these ACKs responds with
an ACK for the current sequence number. Keep-alives can be
used to verify that the computer at the remote end of a
connection is still available. TCP keep-alives can be sent
once every KeepAliveTime (defaults to 7,200,000
milliseconds or two hours) if no other data or higher-level
keep-alives have been carried over the TCP connection. If
there is no response to a keep-alive, it is repeated once
every KeepAliveInterval seconds. KeepAliveInterval
defaults to 1 second. NetBT connections, such as those
used by many Microsoft networking components, send NetBIOS
keep-alives more frequently, so normally no TCP keep-alives
are sent on a NetBIOS connection. TCP keep-alives are
disabled by default, but Windows Sockets applications can
use the setsockopt function to enable them.
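For illustration, the following sketch enables keep-alives on a socket through Python's socket module, whose setsockopt call wraps the sockets function of the same name. The helper name is hypothetical; the timing of the probes is still governed by the system-wide settings described above.

```python
import socket


def enable_keepalive(sock):
    """Turn on TCP keep-alives for one socket and confirm the setting."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(enable_keepalive(sock))
sock.close()
```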
Slow Start Algorithm and Congestion Avoidance
When a
connection is established, TCP starts slowly at first to
assess the bandwidth of the connection, and to avoid
overflowing the receiving host or any other devices or links
in the path. The send window is set to two TCP segments, and
if that is acknowledged, it is incremented to three
segments.5 If those are acknowledged, it is
incremented again, and so on until the amount of data being
sent per burst reaches the size of the receive window on the
remote host. At that point, the slow start algorithm is no
longer in use, and flow control is governed by the receive
window. However, congestion could still occur on a
connection at any time during transmission. If this happens
(evidenced by the need to retransmit), a
congestion-avoidance algorithm is used to reduce the send
window size temporarily and to grow it back towards the
receive window size. Slow start and congestion avoidance are
discussed further in RFC 1122 and RFC 2581.
Silly Window Syndrome (SWS)
Silly
Window Syndrome is described in RFC 1122 as follows:
"In
brief, SWS is caused by the receiver advancing the right
window edge whenever it has any new buffer space available
to receive data and by the sender using any incremental
window, no matter how small, to send more data [TCP:5]. The
result can be a stable pattern of sending tiny data
segments, even though both sender and receiver have a large
total buffer space for the connection."
Windows
2000 TCP/IP implements SWS avoidance, as specified in RFC
1122, by not sending more data until there is a sufficient
window size advertised by the receiving end to send a full
TCP segment. It also implements SWS avoidance on the receive
end of a connection by not opening the receive window in
increments of less than a TCP segment.
Nagle Algorithm
Windows
NT and Windows 2000 TCP/IP implement the Nagle algorithm
described in RFC 896. The purpose of this algorithm is to
reduce the number of very small segments sent, especially on
high-delay (remote) links. The Nagle algorithm allows only
one small segment to be outstanding at a time without
acknowledgment. If more small segments are generated while
awaiting the ACK for the first one, these segments are
coalesced into one larger segment. Any full-sized segment is
always transmitted immediately, on the assumption that there
is a sufficient receive window available. The Nagle
algorithm is effective in reducing the number of packets
sent by interactive applications, such as Telnet, especially
over slow links.
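The coalescing behavior can be illustrated with a small simulation. This is a simplified model under stated assumptions, not the stack's implementation: every keystroke is one byte, only one small segment may be unacknowledged at a time, and each ACK arrives exactly one round-trip time after its segment is sent. The 40 ms keystroke interval and the 144 ms round-trip time are hypothetical values.

```python
def nagle_segments(key_times_ms, rtt_ms):
    """Return a list of (send_time_ms, characters) for each segment sent."""
    segments = []
    buffered = 0     # characters held back while a segment is outstanding
    ack_due = None   # arrival time of the ACK for the outstanding segment
    for t in key_times_ms:
        # Process any ACK that arrived before this keystroke.
        while ack_due is not None and ack_due <= t:
            if buffered:
                segments.append((ack_due, buffered))  # flush coalesced data
                buffered = 0
                ack_due += rtt_ms
            else:
                ack_due = None                        # nothing outstanding
        if ack_due is None:
            segments.append((t, 1))                   # send immediately
            ack_due = t + rtt_ms
        else:
            buffered += 1                             # wait for the ACK
    if buffered:
        segments.append((ack_due, buffered))
    return segments


keys = list(range(0, 480, 40))   # 12 keystrokes, one every 40 ms
for send_time, count in nagle_segments(keys, rtt_ms=144):
    print(send_time, count)
```

In this model most segments carry three or four characters, so far fewer packets are sent than keystrokes typed.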
The Nagle
algorithm can be observed in the following trace captured by
Microsoft Network Monitor. The trace was captured by using
PPP to dial up an Internet provider at 9600 BPS. A Telnet
(character-mode) session was established, and then the Y
key was held down on the Windows NT Workstation. At all
times, one segment was sent, and further Y characters
were held by the stack until an acknowledgment was received
for the previous segment. In this example, three to four
Y characters were buffered each time and sent together
in one segment. The Nagle algorithm resulted in a huge
savings in the number of packets sent—the number of packets
was reduced by a factor of about three.
Time   Source IP      Dest IP        Prot    Description
0.644  204.182.66.83  199.181.164.4  TELNET  To Server Port = 1901
0.144  199.181.164.4  204.182.66.83  TELNET  To Client Port = 1901
0.000  204.182.66.83  199.181.164.4  TELNET  To Server Port = 1901
0.145  199.181.164.4  204.182.66.83  TELNET  To Client Port = 1901
0.000  204.182.66.83  199.181.164.4  TELNET  To Server Port = 1901
0.144  199.181.164.4  204.182.66.83  TELNET  To Client Port = 1901
. . .
Each
segment contained several of the Y characters. The
first segment is shown more fully parsed below, and the data
portion is pointed out in the hexadecimal display at the
bottom.
***********************************************************************
Time   Source IP      Dest IP        Prot    Description
0.644  204.182.66.83  199.181.164.4  TELNET  To Server Port = 1901
+ FRAME: Base frame properties
+ ETHERNET: ETYPE = 0x0800 : Protocol = IP: DOD Internet Protocol
+ IP: ID = 0xEA83; Proto = TCP; Len: 43
+ TCP: .AP..., len: 3, seq:1032660278, ack: 353339017, win: 7766, src: 1901 dst: 23 (TELNET)
TELNET: To Server From Port = 1901
TELNET: Telnet Data
D2 41 53 48 00 00 52 41 53 48 00 00 08 00 45 00  .ASH..RASH....E.
00 2B EA 83 40 00 20 06 F5 85 CC B6 42 53 C7 B5  .+..@. .....BS..
A4 04 07 6D 00 17 3D 8D 25 36 15 0F 86 89 50 18  ...m..=.%6....P.
1E 56 1E 56 00 00 79 79 79                       .V.V..yyy
                                                       ^^^
                                                       data
Windows
Sockets applications can disable the Nagle algorithm for
their connections by setting the TCP_NODELAY socket option.
However, this practice should be avoided unless it is
absolutely necessary because it increases network
utilization. Some network applications may not perform well
if their design does not take into account the effects of
transmitting large numbers of small packets and the Nagle
algorithm. The Nagle algorithm is not applied to loopback
TCP connections for performance reasons. Windows 2000 NetBT
disables Nagling for NetBIOS over TCP connections as well as
direct-hosted redirector/server connections, which can
improve performance for applications issuing numerous small
file manipulation commands. An example is an application
that uses file locking/unlocking frequently.
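As an illustration of the socket option mentioned above, this sketch sets TCP_NODELAY through Python's socket module, which wraps the underlying setsockopt call. The helper name is hypothetical, and the caution above still applies: disable the Nagle algorithm only when an application truly needs it.

```python
import socket


def disable_nagle(sock):
    """Set TCP_NODELAY on one socket and confirm the setting."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(disable_nagle(sock))
sock.close()
```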
TCP TIME-WAIT Delay
When a
TCP connection is closed, the socket-pair is placed into a
state known as TIME-WAIT. This is done so that a new
connection does not use the same protocol, source IP
address, destination IP address, source port, and
destination port until enough time has passed to ensure that
any segments that may have been misrouted or delayed are not
delivered unexpectedly. The length of time that the
socket-pair should not be reused is specified by RFC 793 as
2 MSL (two maximum segment lifetimes), or four minutes. This
is the default setting for Windows NT and Windows 2000.
However, with this default setting, some network
applications that perform many outbound connections in a
short time may use up all available ports before the ports
can be recycled.
Windows
NT and Windows 2000 offer two methods of controlling this
behavior. First, the TcpTimedWaitDelay registry
parameter can be used to alter this value. Windows NT and
Windows 2000 allow it to be set as low as 30 seconds, which
should not cause problems in most environments. Second, the
number of user-accessible ephemeral ports that can be used
to source outbound connections is configurable using the
MaxUserPorts registry parameter. By default, when an
application requests any socket from the system to use for
an outbound call, a port between the values of 1024 and 5000
is supplied. The MaxUserPorts parameter can be used
to set the value of the uppermost port that the
administrator chooses to allow for outbound connections. For
instance, setting this value to 10,000 (decimal) would make
approximately 9000 user ports available for outbound
connections. For more details on this concept, see RFC 793.
See also the MaxFreeTcbs and MaxHashTableSize
registry parameters.
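A back-of-the-envelope estimate shows why the defaults can be exhausted. The sketch below assumes that the port range is inclusive, that every closed outbound connection holds its port for the full TIME-WAIT delay, and that no other limits apply; it is a rough upper bound, and the function names are hypothetical.

```python
FIRST_USER_PORT = 1024


def available_user_ports(max_user_port=5000):
    """Ports from 1024 through the configured maximum (assumed inclusive)."""
    return max_user_port - FIRST_USER_PORT + 1


def sustainable_connections_per_second(max_user_port=5000,
                                       timed_wait_delay_s=240):
    """Rough steady-state rate before every port is stuck in TIME-WAIT."""
    return available_user_ports(max_user_port) / timed_wait_delay_s


print(available_user_ports())                # default range
print(sustainable_connections_per_second())  # default settings
print(sustainable_connections_per_second(10000, 30))
```

Under these assumptions the defaults sustain fewer than 20 new outbound connections per second, while raising the port ceiling to 10,000 and lowering the delay to 30 seconds raises the estimate to several hundred.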
TCP Connections to and from Multihomed Computers
When TCP
connections are made to a multihomed host, both the WINS
client and the Domain Name Resolver (DNR) attempt to
determine whether any of the destination IP addresses
provided by the name server are on the same subnet as any of
the interfaces in the local computer. If so, these addresses
are sorted to the top of the list so that the application
can try them prior to trying addresses that are not on the
same subnet. If none of the addresses is on a common subnet
with the local computer, behavior is different depending
upon the name space. The PrioritizeRecordData TCP/IP
registry parameter can be used to prevent the
DNR component from sorting local subnet addresses to
the top of the list.
In the
WINS name space, the client is responsible for randomizing
or load balancing between the provided addresses. The WINS
server always returns the list of addresses in the same
order, and the WINS client randomly picks one of them for
each connection.
In the
DNS name space, the DNS server is usually configured to
provide the addresses in a round robin fashion. The DNR does
not attempt to further randomize the addresses. In some
situations, it is desirable to connect to a specific
interface on a multihomed computer. The best way to
accomplish this is to provide the interface with its own DNS
entry. For example, a computer named raincity could
have one DNS entry listing both IP addresses (actually two
separate records in the DNS with the same name), and also
records in the DNS for raincity1 and raincity2, each
associated with just one of the IP addresses assigned to the
computer.
When TCP
connections are made from a multihomed host, things get a
bit more complicated. If the connection is a Winsock
connection using the DNS name space, once the target IP
address for the connection is known, TCP attempts to connect
from the best source IP address available. Again, the route
table is used to make this determination. If there is an
interface in the local computer that is on the same subnet
as the target IP address, its IP address is used as the
source in the connection request. If there is no best source
IP address to use, the system chooses one randomly.
If the
connection is a NetBIOS-based connection using the
redirector, little routing information is available at the
application level. The NetBIOS interface supports
connections over various protocols and has no knowledge of
IP. Instead, the redirector places calls on all of the
transports that are bound to it. If there are two interfaces
in the computer and one protocol installed, there are two
transports available to the redirector. Calls are placed on
both, and NetBT submits connection requests to the stack,
using an IP address from each interface. It is possible that
both calls succeed. If so, the redirector cancels one of
them. The choice of which one to cancel depends upon the
redirector ObeyBindingOrder registry value.6
If this is set to 0 (the default value), the primary
transport (determined by binding order) is the preferred
one, and the redirector waits for the primary transport to
time out before accepting the connection on the secondary
transport. If this value is set to 1, the binding order is
ignored, and the redirector accepts the first connection
that succeeds and cancels the other(s).
Throughput Considerations
TCP was
designed to provide optimum performance over varying link
conditions, and Windows 2000 contains improvements such as
those supporting RFC 1323. Actual throughput for a link
depends on a number of variables, but the most important
factors are:
- Link speed (bits-per-second that can be transmitted)
- Propagation delay
- Window size (amount of unacknowledged data that may be
  outstanding on a TCP connection)
- Link reliability
- Network and intermediate device congestion
- Path MTU
TCP
throughput calculation is discussed in detail in Chapters
20–24 of TCP/IP Illustrated, by W. Richard Stevens.7
Some key considerations are listed below:
- The capacity of a pipe is bandwidth multiplied by
  round-trip time. This is known as the bandwidth-delay
  product. If the link is reliable, for best performance the
  window size should be greater than or equal to the capacity
  of the pipe so that the sending stack can fill it. The
  largest window size that can be specified, due to its
  16-bit field in the TCP header, is 65535, but larger
  windows can be negotiated by using window scaling as
  described earlier in this document. See TcpWindowSize
  in Appendix A.
- Throughput can never exceed window size divided by
  round-trip time.
- If the link is unreliable or badly congested and packets
  are being dropped, using a larger window size may not
  improve throughput. Along with scaling windows support,
  Windows 2000 supports Selective Acknowledgments (SACK;
  described in RFC 2018) to improve performance in
  environments that are experiencing packet loss. It also
  includes support for timestamps (described in RFC 1323)
  for improved RTT estimation.
- Propagation delay is dependent upon the speed of light,
  latencies in transmission equipment, and so on.
- Transmission delay depends on the speed of the media.
- For a specified path, propagation delay is fixed, but
  transmission delay depends upon the packet size.
- At low speeds, transmission delay is the limiting factor.
  At high speeds, propagation delay may become the
  limiting factor.
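The first two considerations in the list above reduce to simple arithmetic, sketched below with integer math. The link speed and round-trip time are hypothetical example values, and the function names are illustrative only.

```python
MAX_UNSCALED_WINDOW = 65535   # largest value of the 16-bit window field


def bandwidth_delay_product_bytes(bits_per_second, rtt_ms):
    """Capacity of the pipe: bandwidth multiplied by round-trip time."""
    return bits_per_second * rtt_ms // 8000


def max_throughput_bps(window_bytes, rtt_ms):
    """Upper bound on throughput: window size divided by round-trip time."""
    return window_bytes * 8 * 1000 // rtt_ms


# Hypothetical path: 10 Mbit/s link with a 100 ms round-trip time
pipe = bandwidth_delay_product_bytes(10_000_000, 100)
print(pipe)                                   # bytes needed to fill the pipe
print(pipe > MAX_UNSCALED_WINDOW)             # True means scaling is needed
print(max_throughput_bps(MAX_UNSCALED_WINDOW, 100))
```

On this example path an unscaled 65535-byte window caps throughput at a little over half of the link speed, which is the situation window scaling is meant to address.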
To
summarize, Windows NT and Windows 2000 TCP/IP can adapt to
most network conditions and can dynamically provide the best
throughput and reliability possible on a per-connection
basis. Attempts at manual tuning are often
counter-productive unless a qualified network engineer first
performs a careful study of data flow.
User Datagram Protocol (UDP)
UDP
provides a connectionless, unreliable transport service. It
is often used for one-to-many communications that use broadcast
or multicast IP datagrams. Since delivery of UDP datagrams
is not guaranteed, applications using UDP must supply their
own mechanisms for reliability, if needed. Microsoft
networking uses UDP for logon, browsing, and name
resolution. UDP can also be used to carry IP multicast
streams.
UDP and Name Resolution
UDP is
used for NetBIOS name resolution by unicast to a NetBIOS
name server or subnet broadcasts, and for DNS host name to
IP address resolution. NetBIOS name resolution is
accomplished over UDP port 137. DNS queries use UDP port 53.
Because UDP itself does not guarantee delivery of datagrams,
both of these services use their own retransmission schemes
if they receive no answer to queries. Broadcast UDP
datagrams are not usually forwarded over IP routers, so
NetBIOS name resolution in a routed environment requires a
name server of some type, or the use of static database
files.
Mailslots over UDP
Many
NetBIOS applications use mailslot messaging. A
second-class mailslot is a simple mechanism for sending a
message from one NetBIOS name to another over UDP. Mailslot
messages can be broadcast on a subnet or directed to the
remote host. To direct a mailslot message to another host,
there must be some method of NetBIOS name resolution
available. Microsoft provides Windows Internet Name Service
(WINS) for this purpose.
NetBIOS over TCP/IP
The
Windows NT and Windows 2000 implementation of NetBIOS over
TCP/IP is referred to as NetBT. NetBT uses the
following TCP and UDP ports:
- UDP port 137 (name services)
- UDP port 138 (datagram services)
- TCP port 139 (session services)
NetBIOS
over TCP/IP is specified by RFC 1001 and RFC 1002. The
NetBT.sys driver is a kernel-mode component that supports
the Transport Driver Interface (TDI). Services
such as Workstation and Server use the TDI interface
directly, but traditional NetBIOS applications have their
calls mapped to TDI calls by the Netbios.sys driver. Using
TDI to make calls to NetBT is a more difficult programming
task, but can provide higher performance and freedom from
historical NetBIOS limitations. NetBIOS concepts are
discussed further in the "Network Application Interfaces"
section of this document.
Transport Driver Interface (TDI)
Microsoft
developed the Transport Driver Interface (TDI) to provide
greater flexibility and functionality than is provided by
existing interfaces, such as NetBIOS and Windows Sockets.
All Windows transport providers expose TDI. The TDI
specification describes the set of primitive functions by
which transport drivers and TDI clients communicate and the
call mechanisms used for accessing them. Currently, TDI is
kernel-mode only.
The
Windows 2000 redirector and server both use TDI directly,
rather than going through the NetBIOS mapping layer. By
doing so, they are not subject to many of the restrictions
imposed by NetBIOS, such as the legacy 254-session limit.
TDI Features
TDI may
be the most difficult to use of all Windows network APIs. It
is a simple conduit, so the programmer must determine the
format and meaning of messages.
TDI
includes the following features:
- Most Windows NT or Windows 2000 transports support TDI
  (DLC, however, does not).
- An open naming and addressing scheme
- Message and stream-mode data transfer
- Asynchronous operation
- Support for unsolicited indication of events
- Extensibility: clients can submit private requests to a
  transport driver that understands them.
- Support for limited use of standard kernel-mode I/O
  functions to send and receive data
- 32-bit addressing and values
- Support for Access Control Lists (ACLs, used for
  security) on TDI address objects
More
information on TDI is available from the Windows 2000 Device
Driver Kit (DDK).
Security Considerations
Network
security is a serious consideration for administrators with
machines exposed to public networks. Microsoft's TCP/IP
stack has been hardened against many attacks and in its
default state handles most of the common attacks. Some
additional protection against popular Denial of Service
attacks can be added by enabling the SynAttackProtect
key in the registry. This key allows the administrator to
choose several levels of protection against SYN attacks.
Here are
general guidelines that can lower your vulnerability to
attack:
- Disable unnecessary or optional services (for instance,
  Client for Microsoft Networks on an IIS server).
- Enable TCP/IP filtering and restrict access to only the
  ports that are necessary for the server to function.
  (See the Microsoft Knowledge Base article number Q150543
  for a list of ports that Windows services use.)
- Unbind NetBIOS over TCP/IP where it is not needed.
- Configure static IP addresses and parameters for public
  adapters.
- Configure registry settings for maximum protection (see
  Appendix D).
Consult
the Microsoft Security Web site regularly for security
bulletins.
Network Application Interfaces
There are
a number of ways that network applications can communicate
using the TCP/IP protocol stack. Some of them, such as named
pipes, go through the network redirector, which is part of
the Workstation service. Many older applications were
written to the NetBIOS interface, which is supported by
NetBIOS over TCP/IP.
The
Windows Sockets interface is currently popular. A quick
overview of the Windows Sockets Interface and the NetBIOS
Interface is presented here.
Windows Sockets
Windows
Sockets specifies a programming interface based on the
familiar socket interface from the University of California
at Berkeley. It includes a set of extensions designed to
take advantage of the message-driven nature of Microsoft
Windows. Version 1.1 of the specification was released in
January 1993, and version 2.2.0 was published in May of
1996.8 The specifications are available from
http://www.microsoft.com/ and ftp.microsoft.com. Windows
2000 supports version 2.2, commonly referred to as Winsock2.
Applications
There are
many Windows Sockets applications available. A number of the
utilities that ship with Windows 2000 are based on Windows
Sockets, including the FTP and DHCP clients and servers,
Telnet client, and so on. There are also higher-level
programming interfaces that rely on Winsock, such as the
Windows Internet API (WinInet) used by Internet Explorer.
Name and Address
Resolution
Windows
Sockets applications generally use the gethostbyname()
function to resolve a host name to an IP address. The
gethostbyname() function uses the following (default)
name look-up sequence:
-
Checks the local host name for a matching name.
-
Checks the hosts file for a matching name entry.
-
If a
Domain Name Server is configured, it queries it.
-
If no
match is found, tries NetBIOS name resolution, up to the
point at which DNS resolution is attempted.
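The look-up sequence above can be modelled as an ordered chain of resolvers. The following is a minimal sketch, not the Windows Sockets implementation: the function name resolve_host_name and the callables that stand in for the hosts file, the DNS server, and NetBIOS name resolution are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional

# A lookup takes a lower-case name and returns an IP address or None.
Lookup = Callable[[str], Optional[str]]


def resolve_host_name(
    name: str,
    local_name: str,
    local_address: str,
    hosts_file: Dict[str, str],
    dns_lookup: Optional[Lookup],
    netbios_lookup: Lookup,
) -> Optional[str]:
    """Walk the default gethostbyname() order described in the text.

    hosts_file keys are expected to be lower-case (an assumption of
    this sketch). dns_lookup is None when no DNS server is configured.
    """
    key = name.lower()

    # 1. Local host name.
    if key == local_name.lower():
        return local_address

    # 2. Hosts file.
    if key in hosts_file:
        return hosts_file[key]

    # 3. DNS, only if a server is configured.
    if dns_lookup is not None:
        address = dns_lookup(key)
        if address is not None:
            return address

    # 4. NetBIOS name resolution as the last step.
    return netbios_lookup(key)
```

Passing None for dns_lookup models a computer with no DNS server configured, in which case the sketch falls straight through from the hosts file to NetBIOS name resolution.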
Some
applications use the gethostbyaddr() function to
resolve an IP address to a host name. The gethostbyaddr()
call uses the following (default) sequence:
-
Checks
the hosts file for a matching address entry.
-
If a
Domain Name Server is configured, it queries it.
-
Sends
a NetBIOS Adapter Status Request to the IP address being
queried. If it responds with a list of NetBIOS names
registered for the adapter, parses it for the computer
name.
Support for IP
Multicasting
Winsock2
provides support for IP multicasting. Multicasting is
described in the Windows Sockets 2.0 specification and in
the IGMP section of this document. IP multicasting is
currently supported only on AF_INET sockets of the types
SOCK_DGRAM and SOCK_RAW.
Backlog Parameter
Windows
Sockets server applications generally create a socket, and
then use the listen() function on it to listen for
connection requests. One of the parameters passed when
calling listen() is the backlog of connection
requests that the application would like Windows Sockets to
queue for it. This value controls the number of unaccepted
connections that can be queued. Once an application
accepts a connection, it is moved out of the connection
request backlog and no longer counts. The Windows Sockets
1.1 specification indicates that the maximum allowable value
for a backlog is 5; however, Windows NT 3.51 accepts a
backlog of up to 100, Windows NT 4.0 and Windows 2000 Server
accept a backlog of 200, and Windows NT 4.0 Workstation and
Windows 2000 Professional accept a backlog of 5 (which
reduces memory demands).
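The backlog limits quoted above can be summarized in a small table. The sketch below is illustrative only: the helper name effective_backlog is hypothetical, and it assumes that a request above the platform maximum is silently reduced to that maximum, which the text does not state explicitly.

```python
# Maximum listen() backlog per platform, using the figures quoted in the text.
MAX_BACKLOG = {
    "Windows NT 3.51": 100,
    "Windows NT 4.0 Server": 200,
    "Windows 2000 Server": 200,
    "Windows NT 4.0 Workstation": 5,
    "Windows 2000 Professional": 5,
}


def effective_backlog(requested: int, platform: str) -> int:
    """Return the backlog a listen() call would end up with.

    Assumption: values above the platform maximum are clamped
    rather than rejected.
    """
    return min(requested, MAX_BACKLOG[platform])
```

Under that assumption, a server application that asks for a backlog of 200 would queue at most 5 unaccepted connections when it runs on Windows 2000 Professional.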
Push Bit Interpretation
By
default, Windows 2000 TCP/IP completes a recv() call
when one of the following conditions is met:
-
Data
arrives with the PUSH bit set
-
The
user recv buffer is full
-
0.5
seconds have elapsed since any data has arrived
If a
client application is run on a computer with a TCP/IP
implementation that does not set the push bit on send
operations, response delays may result. It is best to
correct this on the client; however, a configuration
parameter (IgnorePushBitOnReceives) was added to
Afd.sys to force it to treat all arriving packets as though
the push bit were set. This parameter was new in Windows NT
4.0 and is supported in Windows 2000.
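The three completion conditions, and the effect of IgnorePushBitOnReceives, can be expressed as a single predicate. This is a sketch of the rules as described here, not of Afd.sys itself; the function and argument names are hypothetical, and the requirement that some data be buffered before the 0.5-second timer applies is an assumption.

```python
RECV_IDLE_TIMEOUT = 0.5  # seconds, from the text


def recv_completes(
    data_arrived: bool,
    push_bit_set: bool,
    bytes_buffered: int,
    buffer_size: int,
    idle_seconds: float,
    ignore_push_bit: bool = False,
) -> bool:
    """Decide whether a pending recv() call would complete."""
    # Condition 1: data arrives with the PUSH bit set. With
    # IgnorePushBitOnReceives enabled, every arriving packet is
    # treated as though the bit were set.
    if data_arrived and (push_bit_set or ignore_push_bit):
        return True

    # Condition 2: the user recv buffer is full.
    if bytes_buffered >= buffer_size:
        return True

    # Condition 3: 0.5 seconds have elapsed since any data arrived
    # (assumed to apply only when some data is already buffered).
    return bytes_buffered > 0 and idle_seconds >= RECV_IDLE_TIMEOUT
```

The last condition is the one that produces the response delays mentioned above: a sender that never sets the push bit leaves the receiver waiting for the timer on every partial buffer.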
NetBIOS over TCP/IP
NetBIOS
defines a software interface and a naming convention, not a
protocol. Early versions of Microsoft networking products
provided only the NetBEUI local area networking protocol
with a NetBIOS application-programming interface. NetBEUI is
a small, fast protocol with no networking layer; thus, it is
not routable and is often not suitable for WAN
implementations. NetBEUI relies on broadcasts for name
resolution and location of services. NetBIOS over TCP/IP
provides the NetBIOS programming interface over the TCP/IP
protocol, extending the reach of NetBIOS client and server
programs to the WAN, and providing interoperability with
various other operating systems.
The
Workstation service, Server service, Browser, Messenger, and
NetLogon services are all (direct) NetBT clients. They use
TDI (described earlier in this paper) to communicate with
NetBT. Windows NT and Windows 2000 also include a NetBIOS
emulator. The emulator takes standard NetBIOS requests from
NetBIOS applications and translates them to equivalent TDI
primitives.
Windows
2000 still uses NetBIOS over TCP/IP to communicate with
prior versions of Windows NT and other clients, such as
Windows 95. However, the Windows 2000 redirector and server
components now also support direct hosting to
communicate with other computers running Windows 2000.
Direct hosting uses the DNS for name resolution. No NetBIOS
name resolution (WINS or broadcast) is used, and the
protocol is simpler. Direct Host TCP uses port 445, instead
of the NetBIOS TCP port 139.
By
default, both NetBIOS and direct hosting are enabled, and
both are tried in parallel when a new connection is
established. The first to succeed in connecting is used for
that attempt. NetBIOS support can be disabled to force all
traffic to use direct hosting.
To disable NetBIOS support
-
On
the Start menu, point to Settings, and
then click Network and Dial-up Connections.
Right-click Local Area Connection and click
Properties.
-
Select Internet Protocol (TCP/IP), and click
Properties.
-
Click
Advanced.
-
Click
the WINS tab, and select Disable NetBIOS over
TCP/IP.
Applications and services that depend on NetBIOS no longer
function after this is done, so it is important that you
verify that any clients and applications no longer need
NetBIOS support before you disable it. For example,
pre-Windows 2000 computers will be unable to browse, locate,
or create file and print share connections to a Windows 2000
computer with NetBIOS disabled.
NetBIOS Names
The
NetBIOS namespace is flat, meaning that all names within the
namespace must be unique. NetBIOS names are 16 characters
in length. Resources are identified by NetBIOS names, which
are registered dynamically when computers boot, services or
applications start, or users log on. Names can be registered
as unique (one owner) or as group (multiple owner) names. A
NetBIOS Name Query is used to locate a resource by resolving
the name to an IP address.
Microsoft
networking components, such as Workstation and Server
services, allow the first 15 characters of a NetBIOS name to
be specified by the user or administrator, but reserve the
sixteenth character of the NetBIOS name to indicate a
resource type (00-FF hex). Many popular third-party software
packages also use this character to identify and register
their specific services. Table 3 lists some example NetBIOS
names used by Microsoft components.
Table 3 Examples of NetBIOS names used by Microsoft components
Unique name | Service
computer_name[00h] | Workstation service
computer_name[03h] | Messenger service
computer_name[06h] | RAS Server service
computer_name[1Fh] | NetDDE service
computer_name[20h] | Server service
computer_name[21h] | RAS Client service
computer_name[BEh] | Network Monitor Agent
computer_name[BFh] | Network Monitor Application
user_name[03h] | Messenger service
domain_name[1Dh] | Master Browser
domain_name[1Bh] | Domain Master Browser
Group name | Service
domain_name[00h] | Domain name
domain_name[1Ch] | Domain controllers
domain_name[1Eh] | Browser service elections
\\--__MSBROWSE__[01h] | Master browser
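The 15-plus-1 layout of the names in Table 3 can be illustrated with a short helper. This is a minimal sketch: the function name netbios_name is hypothetical, and the upper-casing and space-padding of short names are conventions of Microsoft networking that are not spelled out in this section.

```python
def netbios_name(base: str, suffix: int) -> bytes:
    """Build a 16-byte NetBIOS name: 15 name bytes plus a type byte."""
    if not 0x00 <= suffix <= 0xFF:
        raise ValueError("suffix must fit in one byte (00-FF hex)")

    encoded = base.upper().encode("ascii")
    if len(encoded) > 15:
        raise ValueError("only the first 15 characters are user-defined")

    # Pad short names with spaces so the type byte is always the
    # sixteenth character.
    return encoded.ljust(15, b" ") + bytes([suffix])


# Names a computer called DAVEMAC2 would register for the
# Workstation (00h) and Server (20h) services.
workstation = netbios_name("davemac2", 0x00)
server = netbios_name("davemac2", 0x20)
```

Both results are exactly 16 bytes long and differ only in the final byte, which is what lets one computer name identify several services.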
To see
which names a computer has registered over NetBT, type the
following from a command prompt:
nbtstat -n
Windows
2000 allows you to re-register names with the name server
after a computer has already been started. To do this, type
the following from a command prompt:
nbtstat -RR
NetBIOS Name Registration
and Resolution
Windows
TCP/IP systems use several methods to locate NetBIOS
resources:
-
NetBIOS name cache
-
NetBIOS name server
-
IP
subnet broadcasts
-
Static Lmhosts file
-
Local
host name (optional, depends on EnableDns
registry parameter)
-
Static hosts file (optional, depends on EnableDns
registry parameter)
-
DNS
servers (optional, depends on EnableDns registry
parameter)
NetBIOS
name resolution order depends upon the node type and system
configuration. The following node types are supported:
-
B-node
uses broadcasts for name registration and resolution.
-
P-node
uses a NetBIOS name server (such as WINS) for name
registration and resolution.
-
M-node
uses broadcasts for name registration. For name
resolution, it tries broadcasts first, but switches to
p-node if it receives no answer.
-
H-node
uses a NetBIOS name server for both registration and
resolution. However, if no name server can be located,
it switches to b-node. It continues to poll for a name
server and switches back to p-node when one becomes
available.
-
Microsoft-enhanced
uses the local Lmhosts file or WINS proxies plus Windows
Sockets gethostbyname calls (using standard DNS
and/or local Hosts files) in addition to standard node
types.
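The difference between the four standard node types comes down to the order in which broadcasts and the name server are tried for resolution. The sketch below encodes that order; the names RESOLUTION_ORDER and resolve_netbios_name are hypothetical, the two lookup callables stand in for real network operations, and the Microsoft-enhanced additions (Lmhosts, WINS proxies, DNS) are not modelled.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Resolution order per node type, as described in the list above.
RESOLUTION_ORDER: Dict[str, List[str]] = {
    "b-node": ["broadcast"],
    "p-node": ["name server"],
    "m-node": ["broadcast", "name server"],
    "h-node": ["name server", "broadcast"],
}


def resolve_netbios_name(
    name: str,
    node_type: str,
    methods: Dict[str, Callable[[str], Optional[str]]],
) -> Optional[Tuple[str, str]]:
    """Try each method in node-type order.

    Returns (address, method_used), or None if every method fails.
    """
    for method in RESOLUTION_ORDER[node_type]:
        address = methods[method](name)
        if address is not None:
            return address, method
    return None
```

With this model, an h-node resolves a name on a remote subnet through the name server, while a b-node cannot resolve it at all because the broadcast never reaches the remote subnet.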
Microsoft
ships a NetBIOS name server known as the Windows Internet
Name Service (WINS). Most WINS clients are set up as
h-nodes; that is, they first attempt to register and resolve
names using WINS, and if that fails, they try local subnet
broadcasts. Using a name server to locate resources is
generally preferable to broadcasting for two reasons:
-
Broadcasts are not usually forwarded by routers.
-
Broadcasts are received by all computers on a subnet,
requiring processing time at each computer.
NetBIOS Name Registration
and Resolution for Multihomed Computers
As
mentioned, NetBT binds to only one IP address per physical
network interface. From the NetBT viewpoint, a computer is
multihomed only if it has more than one NIC installed. When
a name registration packet is sent from a multihomed
machine, it is flagged as a multihomed name registration so
that it does not conflict with the same name being
registered by another interface in the same computer.
If a
multihomed machine receives a broadcast name query, all
NetBT/interface bindings receiving the query respond with
their addresses, and by default the client chooses the first
response and connects to the address supplied by it. This
behavior can be controlled by the RandomAdapter
registry parameter described in Appendix B.
When a
directed name query is sent to a WINS server, the WINS
server responds with a list of all IP addresses that were
registered with WINS by the multihomed computer.
Choosing
the best IP address to connect to on a multihomed computer
is a client function. Currently, the following algorithm is
employed, in the order listed:
-
If
one of the IP addresses in the name query response list
is on the same logical subnet as the calling binding of
NetBT on the local computer, that address is selected.
If more than one of the addresses meets the criteria,
one is picked at random from those that match.
-
If
one of the IP addresses in the list is on the same
(classless) network as the calling binding of NetBT on
the local computer, that address is selected. If more
than one of the addresses meets the criteria, one is
picked at random from those that match.
-
If
one of the IP addresses in the list is on the same
logical subnet as any binding of NetBT on the local
computer, that address is selected. If more than one of
the addresses meets the criteria, one is picked at
random from those.
-
If
none of the IP addresses in the list is on the same
subnet as any binding of NetBT on the local computer, an
address is selected at random from the list.
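The four selection steps can be sketched as a series of progressively wider filters over the name query response list. This is an illustration of the algorithm as described, not NetBT code: the Binding type and choose_address function are hypothetical, and the network field (the enclosing network used in the second step) is supplied by the caller because the text does not say how it is derived.

```python
import ipaddress
import random
from typing import List, NamedTuple


class Binding(NamedTuple):
    """One NetBT binding on the local computer (hypothetical type)."""
    subnet: ipaddress.IPv4Network   # logical subnet of the binding
    network: ipaddress.IPv4Network  # enclosing network (caller-supplied)


def choose_address(
    candidates: List[str],
    calling: Binding,
    bindings: List[Binding],
    rng=random,
) -> str:
    """Pick the address to connect to from a name query response list."""
    addrs = [ipaddress.ip_address(c) for c in candidates]

    tiers = [
        # 1. Same logical subnet as the calling binding.
        [a for a in addrs if a in calling.subnet],
        # 2. Same network as the calling binding.
        [a for a in addrs if a in calling.network],
        # 3. Same logical subnet as any local binding.
        [a for a in addrs if any(a in b.subnet for b in bindings)],
        # 4. Anything in the list.
        addrs,
    ]
    for tier in tiers:
        if tier:
            # Ties within a tier are broken at random.
            return str(rng.choice(tier))
    raise ValueError("empty name query response list")
```

Because ties are broken at random inside each tier, repeated connections from clients on the same subnet are spread across all of the server's adapters on that subnet.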
This
algorithm provides a reasonably good way of balancing
connections to a server across multiple NICs, and still
favoring direct (same subnet) connections when they are
available. When a list of IP addresses is returned, they are
sorted into the best order, and NetBT attempts to ping each
of the addresses in the list until one responds. NetBT then
attempts a connection to that address. If no addresses
respond, a connection attempt is made to the first address
in the list anyway. This is tried in case there is a
firewall or other device filtering ICMP traffic. Windows
2000 supports per-interface NetBT name caching, and
nbtstat -c displays the name cache on a per-interface
basis.
NetBT Internet/DNS
Enhancements and the SMB Device
It has
always been possible to connect from one Windows-based
computer to another using NetBT over the Internet. To do so,
some means of name resolution had to be provided. Two common
methods were to use the Lmhosts file or a WINS server.
Several enhancements were introduced in Windows NT 4.0 and
carried forward in Windows 2000 to eliminate these special
configuration needs.
It is now
possible to connect to a NetBIOS over TCP/IP resource in two
new ways:
-
Use
the command net use \\ip address\share_name.
This eliminates the need for NetBIOS name-resolution
configuration.
-
Use
the command net use \\FQDN\share_name.
This allows the use of a DNS to connect to a computer
using its fully qualified domain name (FQDN).
Examples
of using new functionality to map a drive to
ftp.microsoft.com are shown here. The IP address listed here
is subject to change.
-
net use f: \\ftp.microsoft.com\data
-
net use \\198.105.232.1\data
-
net view \\198.105.232.1
-
dir \\ftp.microsoft.com\bussys\winnt
In
addition, various applications, such as the Event Viewer
Select Computer option on the Log menu,
allow you to enter an FQDN or IP address directly. In
Windows 2000, it is also possible to use direct hosting to
establish redirector or server connections between Windows
2000 computers without the use of the NetBIOS namespace or
mapping layer at all. By default, Windows attempts to make
connections using both methods so that it can support
connections to lower-level computers. However, in Windows
2000–only environments, you can disable NetBIOS completely
from the Network Connections folder.
The new
interface in Windows 2000 that makes NetBIOS-less operation
possible is termed the SMB device. It appears to the
redirector and server as another interface, much as an
individual network adapter/protocol stack combination does.
At the TCP/IP stack, however, the SMB device is bound to
ADDR_ANY, and it uses the DNS namespace natively, like a
Windows Sockets application. Calls placed on the SMB device
will result in a standard DNS lookup to resolve the (DNS)
name to an IP address, followed by a single outbound
connection request (even on a multihomed computer) using the
best source IP address and interface as determined by the
route table. Additionally, there is no NetBIOS session setup
on top of the TCP connection, as there is with traditional
NetBIOS over TCP/IP. By default, the redirector places calls
on both the NetBIOS device(s) and the SMB device, and the
file server receives calls on both. The file server SMB
device listens on TCP port 445 instead of the traditional
NetBIOS over TCP port 139.
NetBIOS over TCP Sessions
NetBIOS
sessions are established between two names. For example,
when a Windows 2000 Professional-based workstation makes a
file-sharing connection to a server using NetBIOS over
TCP/IP, the following sequence of events takes place:
-
The
NetBIOS name for the server is resolved to an IP
address.
-
The
IP address is resolved to a media access control
address.
-
A TCP
connection is established from the workstation to the
server, using port 139.
-
The
workstation sends a NetBIOS Session Request to the
server name over the TCP connection. If the server is
listening on that name, it responds affirmatively, and a
session is established.
When the
NetBIOS session has been established, the workstation and
server negotiate which level of the SMB protocol to use.
Microsoft networking uses only one NetBIOS session between
two names at any time. Any additional file or print sharing
connections are multiplexed over the same NetBIOS session
using identifiers within the SMB header.
NetBIOS
keep-alives are used on each connection to verify that both
the server and workstation are still able to maintain their
session. Therefore, if a workstation is shut down
ungracefully, the server eventually cleans up the connection
and associated resources, and vice versa. NetBIOS
keep-alives are controlled by the SessionKeepAlive
registry parameter and default to once per hour.
If
Lmhosts files are used and an entry is misspelled, it is
possible to attempt to connect to a server using the correct
IP address but an incorrect name. In this case, a TCP
connection is still established to the server. However, the
NetBIOS session request (using the wrong name) is rejected
by the server, because there is no listen posted on that
name. An Error 51, "Remote computer not listening," is
returned.
NetBIOS Datagram Services
Datagrams
are sent from one NetBIOS name to another over UDP port 138.
The datagram service provides the ability to send a message
to a unique name or to a group name. Group names may resolve
to a list of IP addresses or a broadcast. For example, the
command net send /d:mydomain test sends
a datagram containing the text "test" to the group name
mydomain[03]. The mydomain[03] name resolves to
an IP subnet broadcast, so the datagram is sent with the
following characteristics:
-
Destination media access control address: broadcast
(FFFFFFFFFFFF).
-
Source media access control address: The NIC address of
the local computer.
-
Destination IP address: The local subnet broadcast
address.
-
Source IP address: The IP address of the local computer.
-
Destination name: mydomain[03] (the messenger
service on the remote computers).
-
Source name: username[03] (the messenger service
on the local computer).
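The destination IP address in this example is derived from the sender's own address and subnet mask. As a small illustration, the helper below computes it with the Python standard library; the function name subnet_broadcast is hypothetical, and the sample address and mask are taken from the ipconfig output shown later in this paper.

```python
import ipaddress

NETBIOS_DATAGRAM_PORT = 138  # UDP port used by the datagram service


def subnet_broadcast(ip: str, mask: str) -> str:
    """Return the local subnet broadcast address for an interface."""
    interface = ipaddress.ip_interface(f"{ip}/{mask}")
    return str(interface.network.broadcast_address)


# Where a broadcast datagram from 10.57.8.190/255.255.255.0 would go.
destination = (subnet_broadcast("10.57.8.190", "255.255.255.0"),
               NETBIOS_DATAGRAM_PORT)
```

For that interface the destination is 10.57.8.255 on UDP port 138, which every host on the subnet receives.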
All hosts
on the subnet pick up the datagram and process it, at least
to the UDP protocol. On hosts that are running a NetBIOS
datagram service, UDP hands the datagram to NetBT on port
138. NetBT checks the destination name to see if any
application has posted a datagram receive on it and if so,
passes the datagram up. If no receive is posted, the
datagram is discarded.
If
support for NetBIOS is disabled in Windows 2000 (as
described earlier in this section), NetBIOS datagram
services are not available.
Critical Client Services
and Stack Components
The focus
of this paper is on core TCP/IP stack components, not on the
many available services that use it. However, the stack
itself relies upon a few services for configuration
information and name and address resolution. A few of these
critical client services are discussed here.
Automatic Client
Configuration and Media Sense
One of
the most important client services is the Dynamic Host
Configuration Protocol (DHCP) client. The DHCP client has an
expanded role in Windows 2000. Its primary new feature is
the ability to automatically configure an IP address and
subnet mask when the client is started on a small private
network without a DHCP server available to assign addresses
(such as a home network). Another new feature is support for
Media Sense, which can improve the roaming experience
for portable device users.
-
If a
Microsoft TCP/IP client is installed and set to
dynamically obtain TCP/IP protocol configuration
information from a DHCP server (instead of being
manually configured with an IP address and other
parameters), the DHCP client service is engaged each
time the computer is restarted. The DHCP client service
now uses a two-step process to configure the client with
an IP address and other configuration information.
-
When
the client is installed, it attempts to locate a DHCP
server and obtain a configuration from it. Many TCP/IP
networks use DHCP servers that are administratively
configured to hand out information to clients on the
network. If this attempt to locate a DHCP server fails,
the Windows 2000 DHCP client autoconfigures its stack
with a selected IP address from the IANA-reserved class
B network 169.254.0.0 with the subnet mask 255.255.0.0.
The DHCP client tests (using a gratuitous ARP) to make
sure that the IP address that it has chosen is not
already in use. If it is in use, it selects another IP
address (it does this for up to 10 addresses). Once the
DHCP client has selected an address that is verifiably
not in use, it configures the interface with this
address. It continues to check for a DHCP server in the
background every 5 minutes. If a DHCP server is found,
the autoconfiguration information is abandoned, and the
configuration offered by the DHCP server is used
instead. This autoconfiguration feature is known as
Automatic Private IP Addressing (APIPA) and allows
single subnet home office or small office networks to
use TCP/IP without static configuration or the
administration of a DHCP server.
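The address-selection part of autoconfiguration can be sketched as a bounded retry loop. This is an illustration under stated assumptions, not the DHCP client's code: the function name autoconfigure is hypothetical, the address_in_use callable stands in for the gratuitous ARP test, and picking candidates uniformly at random from the usable host range is an assumption about how an address is "selected".

```python
import ipaddress
import random
from typing import Callable, Optional

APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")
MAX_ATTEMPTS = 10  # "up to 10 addresses", from the text


def autoconfigure(
    address_in_use: Callable[[ipaddress.IPv4Address], bool],
    rng: random.Random,
) -> Optional[ipaddress.IPv4Address]:
    """Pick an unused address from 169.254.0.0/16, or give up."""
    # Skip the network and broadcast addresses themselves.
    first = int(APIPA_NETWORK.network_address) + 1
    last = int(APIPA_NETWORK.broadcast_address) - 1

    for _ in range(MAX_ATTEMPTS):
        candidate = ipaddress.ip_address(rng.randint(first, last))
        # Stand-in for the gratuitous ARP conflict test.
        if not address_in_use(candidate):
            return candidate
    return None
```

A caller would configure the interface with the returned address and the mask 255.255.0.0, and keep checking for a DHCP server in the background as described above.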
If the
DHCP client has previously obtained a lease from a DHCP
server, the following modified sequence of events occurs:
-
If
the client's lease is still valid (not expired) at boot
time, the client tries to renew its lease with the DHCP
server. If the client fails to locate a DHCP server
during the renewal attempt, it tries to ping the default
gateway that is listed in the lease. If pinging the
default gateway succeeds, the DHCP client assumes that
it is still located on the same network where it
obtained its current lease and continues to use the
lease. By default, the client attempts to renew its
lease in the background when half of its assigned lease
time has expired.
-
If
the attempt to ping the default gateway fails, the
client assumes that it has been moved to a network that
has no DHCP services currently available (such as a home
network), and autoconfigures itself as described above.
Once autoconfigured, it continues to try to locate a
DHCP server every 5 minutes, in the background.
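The boot-time behavior for a client that still holds a valid lease reduces to two checks made in order. The sketch below captures that decision; the function name and the returned strings are hypothetical, and the two callables stand in for the real renewal attempt and the ping of the default gateway.

```python
from typing import Callable

DHCP_RETRY_MINUTES = 5  # background retry interval, from the text


def action_at_boot(
    renew_lease: Callable[[], bool],
    ping_default_gateway: Callable[[], bool],
) -> str:
    """Decide what a client with an unexpired lease does at boot."""
    # First choice: a DHCP server answers the renewal attempt.
    if renew_lease():
        return "use renewed lease"

    # No server, but the gateway from the lease still answers, so the
    # client assumes it is on the same network as before.
    if ping_default_gateway():
        return "keep current lease"

    # Neither responded: assume a network without DHCP service.
    return "autoconfigure"
```

In the last case the client also continues to look for a DHCP server every DHCP_RETRY_MINUTES minutes. The gateway ping is attempted only after the renewal fails, which matches the order given in the text.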
Media Sense
support was added in NDIS 5.0. It provides a mechanism for
the Network Interface Card (NIC) to notify the protocol
stack of media connect and media disconnect events. Windows
2000 TCP/IP utilizes these notifications to assist in
automatic configuration. For instance, in Windows NT 4.0,
when a portable computer was configured through DHCP on one
Ethernet subnet and then moved to another subnet without
rebooting, the protocol stack received no indication of the
move. The configuration parameters therefore became stale
and no longer matched the new network.
Additionally, if the computer was shut off, carried home and
rebooted, the protocol stack was not aware that the NIC was
no longer connected to a network, and again stale
configuration parameters remained. This could be
problematic, as subnet routes, default gateways, and so on,
could conflict with dial-up parameters.
Media
Sense support allows the protocol stack to react to events
and invalidate stale parameters. For instance, if a computer
running Windows 2000 is unplugged from the network (assuming
the NIC supports Media Sense), after a damping period
implemented in the stack (currently 3 seconds), TCP/IP will
invalidate the parameters associated with the network which
has been disconnected. The IP address(es) will no longer
allow sends, and any routes associated with the interface
are invalidated. You can make the network connection status
visible on the taskbar by selecting a connection,
right-clicking it, clicking Properties, and then
selecting the Show icon in taskbar when connected
check box. The network connection icon will also appear
automatically with a red "X" when the adapter is having a
connectivity problem.
If an
application is bound to a socket that is using an
invalidated address, it should handle the event and recover
in a graceful way, such as attempting to use another IP
address on the system or notifying the user of the
disconnect.
Dynamic Update DNS Client
Windows
2000 includes support for dynamic updates to DNS as
described in RFC 2136. Every time there is an address event
(new address or renewal), the DHCP client sends option 81
and its fully qualified name to the DHCP server, and
requests that the DHCP server register a DNS pointer
resource record (PTR RR) on its behalf. The dynamic update
client handles the address resource record (A RR)
registration on its own. This is done
because only the client knows which IP addresses on the host
map to that name. The DHCP server may not be able to
properly do the A RR registration because it has incomplete
knowledge. However, the DHCP server can be configured to
instruct the client to allow the server to register both
records with the DNS. Registry parameters associated with
the dynamic update DNS client are documented in Appendix C.
The
Windows 2000 DHCP server handles option 81 requests as
specified in the draft RFC. If a Windows 2000
DHCP client talks to a down-level DHCP server that does not
handle option 81, it registers a PTR RR on its own. The
Windows 2000 DNS server is capable of handling dynamic
updates.
Statically configured (non-DHCP) clients register both the A
RR and the PTR RR with the DNS server themselves.
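The division of labor between the client and the DHCP server can be summarized as a small decision function. This is a sketch of the rules as described in this section; the function name and argument names are hypothetical, and the down-level case assumes the client registers the A RR itself as usual in addition to the PTR RR.

```python
from typing import Dict


def dns_registrations(
    uses_dhcp: bool,
    server_handles_option_81: bool = True,
    server_registers_both: bool = False,
) -> Dict[str, str]:
    """Return who registers the A RR and the PTR RR for a client."""
    # Static clients, and DHCP clients of a down-level server,
    # register both records themselves.
    if not uses_dhcp or not server_handles_option_81:
        return {"A": "client", "PTR": "client"}

    # The DHCP server can be configured to take over both records.
    if server_registers_both:
        return {"A": "DHCP server", "PTR": "DHCP server"}

    # Default split: the client knows which addresses map to its
    # name, so it keeps the A RR; the server registers the PTR RR.
    return {"A": "client", "PTR": "DHCP server"}
```

The default split is the last case: only the client has complete knowledge of its own name-to-address mappings, so it keeps responsibility for the A RR.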
DNS Resolver Cache Service
Windows
2000 includes a caching DNS resolver service, which is
enabled by default. For troubleshooting purposes, this
service can be viewed, stopped, and started like any other
Windows service. The caching resolver reduces DNS network
traffic and speeds name resolution by providing a local
cache for DNS queries. Name query responses are cached for
the TTL specified in the response (not to exceed the value
specified in the MaxCacheEntryTtlLimit parameter),
and future queries are answered from the cache, when
possible. One interesting feature of the DNS Resolver Cache
Service is that it supports negative caching. For example,
if a query is made to a DNS server for a given host name and
the response is negative, succeeding queries for the same
name are answered (negatively) from the cache for
NegativeCacheTime seconds (the default is 300). Another
example of negative caching is that if all DNS servers are
queried and none are available, for NetFailureCacheTime
seconds (the default is 30) all succeeding name queries fail
instantly, instead of timing out. This feature can save time
for services that query the DNS during the boot process,
especially when the client is booted from the network.
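The TTL cap and negative caching can be illustrated with a small in-memory cache. This is a minimal sketch and not the resolver service: the class ResolverCache is hypothetical, the max_ttl argument plays the role of MaxCacheEntryTtlLimit (no default is assumed because the text gives none), and the NetFailureCacheTime behavior is not modelled.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ResolverCache:
    """Cache positive and negative DNS answers with expiry times."""

    def __init__(
        self,
        max_ttl: float,
        negative_cache_time: float = 300,  # NegativeCacheTime default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_ttl = max_ttl
        self.negative_cache_time = negative_cache_time
        self.clock = clock
        self._entries: Dict[str, Tuple[Optional[str], float]] = {}

    def store(self, name: str, address: Optional[str], ttl: float = 0) -> None:
        """Cache an answer; address=None records a negative response."""
        if address is None:
            lifetime = self.negative_cache_time
        else:
            # Honor the response TTL, but never exceed the cap.
            lifetime = min(ttl, self.max_ttl)
        self._entries[name.lower()] = (address, self.clock() + lifetime)

    def lookup(self, name: str) -> Tuple[bool, Optional[str]]:
        """Return (hit, address); a hit with None is a cached negative."""
        key = name.lower()
        entry = self._entries.get(key)
        if entry is None:
            return False, None
        address, expires = entry
        if self.clock() >= expires:
            del self._entries[key]
            return False, None
        return True, address
```

A lookup that returns (True, None) is the negative-cache case: the resolver can fail the query immediately without contacting a DNS server again.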
The DNS
Resolver Cache Service has a number of other adjustable
registry parameters, which are documented in Appendix C.
TCP/IP Troubleshooting
Tools and Strategies
Many
network troubleshooting tools are available for Windows.
Most are included in the product or the Windows 2000 Server
Resource Kit. Microsoft Network Monitor is an excellent
network-tracing tool. The full version is part of the
Microsoft Systems Management Server product, and a more
limited version is included in the Windows 2000 Server
product.
When
troubleshooting any problem, it is helpful to use a logical
approach. Some questions to ask are:
-
What
does work?
-
What
does not work?
-
How
are the things that do and do not work related?
-
Have
the things that do not work ever worked on this
computer/network?
-
If
so, what has changed since it last worked?
Troubleshooting a problem from the bottom up is often a good
way to isolate the problem quickly. The tools listed below
are organized for this approach.
IPConfig Tool
IPConfig
is a command-line utility that prints out the TCP/IP-related
configuration of a host. When used with the /all
switch, it produces a detailed configuration report for all
interfaces, including any configured serial ports (RAS).
Output can be redirected to a file and pasted into other
documents:
C:\>ipconfig /all
Windows 2000 IP configuration:
    Host Name . . . . . . . . . . . . : DAVEMAC2
    Primary DNS Suffix . . . . . . . : mytest.microsoft.com
    Node Type . . . . . . . . . . . . : Hybrid
    IP Routing Enabled. . . . . . . . : No
    WINS Proxy Enabled. . . . . . . . : No
    DNS Suffix Search List. . . . . . : microsoft.com
Ethernet adapter Local Area Connection 2:
    Connection-specific DNS Suffix . :
    Description . . . . . . . . . . . : 3Com EtherLink III EISA (3C579-TP)
    Physical Address. . . . . . . . . : 00-20-AF-1D-2B-91
    DHCP Enabled. . . . . . . . . . . : No
    Autoconfiguration Enabled . . . . : Yes
    IP Address. . . . . . . . . . . . : 10.57.8.190
    Subnet Mask . . . . . . . . . . . : 255.255.255.0
    Default Gateway . . . . . . . . . :
    DNS Servers . . . . . . . . . . . : 10.57.9.254
    Primary WINS Server . . . . . . . : 10.57.9.254
Ethernet adapter Local Area Connection:
    Connection-specific DNS Suffix . :
    Description . . . . . . . . . . . : AMD Family PCI Ethernet Adapter
    Physical Address. . . . . . . . . : 00-80-5F-88-60-9A
    DHCP Enabled. . . . . . . . . . . : No
    IP Address. . . . . . . . . . . . : 199.199.40.22
    Autoconfiguration Enabled . . . . : Yes
    Subnet Mask . . . . . . . . . . . : 255.255.255.0
    Default Gateway . . . . . . . . . : 199.199.40.1
    DNS Servers . . . . . . . . . . . : 199.199.40.254
    Primary WINS Server . . . . . . . : 199.199.40.254
Ping Tool
Ping is a tool that helps to verify IP-level
reachability. The ping command can be used to send an
ICMP echo request to a target name or IP address. First,
ping the IP address of the target host to see if it responds
because this is the simplest test. If that succeeds, try
pinging the name. Ping uses Windows Sockets-style
name resolution to resolve the name to an address;
therefore, if pinging by address succeeds but pinging by
name fails, the problem lies in name resolution, not network
connectivity.
Type
ping -? to see what command-line options are available.
Ping allows you to specify the size of packets to
use, how many to send, whether to record the route used,
what TTL value to use, and whether to set the don't
fragment flag. See the PMTU discovery section of this
document for details on using ping to manually determine the
PMTU between two computers.
The
following example illustrates how to send two pings, each
1450 bytes in size, to address 10.99.99.2:
C:\>ping -n 2 -l 1450 10.99.99.2
Pinging 10.99.99.2 with 1450 bytes of data:
Reply from 10.99.99.2: bytes=1450 time<10ms TTL=32
Reply from 10.99.99.2: bytes=1450 time<10ms TTL=32
Ping statistics for 10.99.99.2:
    Packets: Sent = 2, Received = 2, Lost = 0 (0% loss),
Approximate round trip times in milliseconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms
By
default, ping waits one second for each response to
be returned before timing out. If the remote system being
pinged is across a high-delay link, such as a satellite
link, responses could take longer to be returned. The -w
(wait) switch can be used to specify a longer time-out.
Computers using IPSec may require several seconds to set up
a security association before they respond to a ping.
PathPing Tool
The
Pathping command is a route-tracing tool that combines
features of the ping and tracert commands with additional
information that neither of those tools provides. The
Pathping command sends packets to each router on the way
to a final destination over a given period of time, and then
computes results based on the packets returned from each
hop. Since the command shows the degree of packet loss at
any given router or link, it is easy to determine which
routers or links might be causing network problems. The
switches -R -T can be used with Pathping to
determine whether the devices on the path are
802.1p-compliant and RSVP-aware.
The
following example illustrates the default output when
tracing the route to www.sectur.gov.ar [200.1.247.2] over a
maximum of 30 hops:
  0  warren.microsoft.com [163.15.2.217]
  1  tnt2.seattle2.wa.da.uu.net [206.115.150.106]
  2  206.115.169.217
  3  119.ATM1-0-0.HR2.SEA1.ALTER.NET [152.63.104.38]
  4  412.atm11-0.gw1.sea1.ALTER.NET [137.39.13.73]
  5  teleglobe2-gw.customer.ALTER.NET [157.130.177.222]
  6  if-0-3.core1.Seattle.Teleglobe.net [207.45.222.37]
  7  if-1-3.core1.Burnaby.Teleglobe.net [207.45.223.113]
  8  if-1-2.core1.Scarborough.Teleglobe.net [207.45.222.189]
  9  if-2-1.core1.Montreal.Teleglobe.net [207.45.222.121]
 10  if-3-1.core1.PennantPoint.Teleglobe.net [207.45.223.41]
 11  if-5-0-0.bb1.PennantPoint.Teleglobe.net [207.45.222.94]
 12  BOSQUE-aragorn.tecoint.net [200.43.189.230]
 13  ARAGORN-bosque.tecoint.net [200.43.189.229]
 14  GANDALF-aragorn.tecoint.net [200.43.189.225]
 15  Startel.tecoint.net [200.43.189.18]
 16  200.26.9.245
 17  200.26.9.26
 18  200.1.247.2
Computing statistics for 450 seconds:
            Source to Here   This Node/Link
Hop  RTT    Lost/Sent = Pct  Lost/Sent = Pct  Address
  0                                           warren.microsoft.com [163.15.2.217]
                             0/ 100 = 0%      |
  1  115ms  0/ 100 = 0%      0/ 100 = 0%      tnt2.seattle2.wa.da.uu.net [206.115.150.106]
                             0/ 100 = 0%      |
  2  121ms  0/ 100 = 0%      0/ 100 = 0%      206.115.169.217
                             0/ 100 = 0%      |
  3  122ms  0/ 100 = 0%      0/ 100 = 0%      119.ATM.ALTER.NET [152.63.104.38]
                             0/ 100 = 0%      |
  4  124ms  0/ 100 = 0%      0/ 100 = 0%      412.atm.sea1.ALTER.NET [137.39.13.73]
                             0/ 100 = 0%      |
  5  157ms  0/ 100 = 0%      0/ 100 = 0%      teleglobe2-gw.ALTER.NET [157.130.177.222]
                             0/ 100 = 0%      |
  6  156ms  0/ 100 = 0%      0/ 100 = 0%      if-0-3.Teleglobe.net [207.45.222.37]
|