Tutorial on Network Management

This is an excerpt (the appendix; pp. 155-177) of Internet FYI/RFC: FYI on a Network Management Tool Catalog: Tools for Monitoring and Debugging TCP/IP Internets and Interconnected Devices (Request for Comments: 1147; FYI: 2), produced by the IETF NOCTools Working Group. The original RFC was edited by R. Stine.

Appendix: Network Management Tutorial

This tutorial is an overview of the practice of network management. Reading this section is no substitute for knowing your system, and knowing how it is used. Do not wait until things break to learn what they ought to do or how they usually work: a crisis is not the time for determining how "normal" packet traces should look. Furthermore, it takes little imagination to realize that you do not want to be digging through manuals while your boss is screaming for network service to be restored.

We assume an acquaintance with the TCP/IP protocol suite and the Internet architecture. There are many available references on these topics, several of which are listed below in Section 7.

Since many of the details of network management are system-specific, this tutorial is a bit superficial. There is, however, a more fundamental problem in prescribing network management practices: network management is not a well-understood endeavor. At present, the cutting edge of network management is the use of distributed systems to collect and exchange status information, and then to display the data as histograms or trend lines. It is not clear that we know what data should be collected, how to analyze it when we get it, or how to structure our collection systems. For now, automated, real-time control of internets is an aspiration, rather than a reality. The communications systems that we field are apparently more complex than we can comprehend, which no doubt accounts in part for their frequently surprising behavior.

The first section of this tutorial lists the overall goals and functions of network management.
It presents several aspects of network management, including system monitoring, fault detection and isolation, performance testing, configuration management, and security. These discussions are followed by a bibliographic section. The tutorial closes with some final advice for network managers.

1. Network Management Goals and Functions

An organization's view of network management goals is shaped by two factors:

1. people in the organization depend on the system working,

2. LANs, routers, lines, and other communications resources have costs.

From the organizational vantage point, the ultimate goal of network management is to provide a consistent, predictable, acceptable level of service from the available data communications resources. To achieve this, a network manager must first be able to perform fault detection, isolation, and correction. He must also be able to effect configuration changes with a minimum of disruption, and measure the utilization of system components.

People actually managing networks have a different focus. Network managers are usually evaluated by the availability and performance of their communications systems, even though many factors of net performance are beyond their control. To them, the most important requirement of a network management tool is that it allows the detection and diagnosis of faults before users can call to complain: users (and bosses) can often be placated just by knowing that a network problem has been diagnosed. Another vital network management function is the ability to collect data that justify current or future expenditures for the data communications plant and staff.

Following a section on system monitoring, this tutorial addresses fault, performance, configuration, and security management. By fault management, we mean the detection, diagnosis, and correction of network malfunctions.
Under the subject of performance management, we include support for predictable, efficient service, as well as capacity planning and capacity testing. Configuration management includes support for orderly configuration changes (usually, system growth), and local administration of component names and addresses. Security management includes both protecting system components from damage and protecting sensitive information from unintentional or malicious disclosure or corruption.

Readers familiar with the ISO management standards and drafts will note that we have borrowed heavily from the "OSI Management Framework," except that we have omitted the "account management" function. Account management seems a bit out of place with the other network management functions. The logging required by account management is likely to be done by specialized, dedicated subsystems that are distinct from other network management components. Hence, this tutorial does not cover account management. Rest assured, however, that account management, if required, will be adequately supported and staffed.

For those with a DoD background, security may also seem out of place as a subtopic of network management. Without doubt, communications security is an important issue that should be considered in its own right. Because of the requirements of trust for security mechanisms, security components will probably not be integrated subcomponents of a larger network management system. Nevertheless, because a network manager has a responsibility to protect his system from undue security risks, this tutorial includes a discussion on internet security.

2. System Monitoring

System monitoring is a fundamental aspect of network management. One can divide system monitoring into two rough categories: error detection and baseline monitoring.
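
As a rough illustration of how the two categories fit together, the sketch below computes error rates as a function of traffic and flags samples that stand well above an observed baseline. It is only a sketch under stated assumptions: the counter values are hypothetical, and the choice of a six-sample baseline and a three-standard-deviation threshold is ours, not a recommendation from any standard.

```python
from statistics import mean, stdev

def error_rates(samples):
    """Return errors per packet for each (packets, errors) sample."""
    return [errors / packets if packets else 0.0 for packets, errors in samples]

def find_spikes(rates, baseline_len=6, threshold=3.0):
    """Flag indices whose rate exceeds baseline mean + threshold * stdev.

    The baseline is simply the first `baseline_len` samples (an assumption).
    """
    baseline = rates[:baseline_len]
    limit = mean(baseline) + threshold * stdev(baseline)
    return [i for i, r in enumerate(rates) if i >= baseline_len and r > limit]

# Hypothetical hourly interface counters: (packets seen, errors seen)
hourly = [(10000, 10), (12000, 13), (9000, 8), (11000, 12),
          (10500, 10), (9800, 11), (10200, 11), (10100, 95)]
rates = error_rates(hourly)
spikes = find_spikes(rates)  # indices of hours worth investigating
```

With these made-up counters only the last hour is flagged. In practice the baseline would come from logs gathered over a much longer period.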
System errors, such as misformatted frames or dropped packets, are not in themselves cause for concern. Spikes in error rates, however, should be investigated. It is sound practice to log error rates over time, so that increases can be recognized. Furthermore, logging error rates as a function of traffic rates can be used to detect congestion. Investigate unusual error rates and other anomalies as they are detected, and keep a notebook to record your discoveries.

Day-to-day traffic should be monitored, so that the operational baselines of a system and its components can be determined. As well as being essential for performance management, baseline determination and traffic monitoring are the keys to early fault detection.

A preliminary step to developing baseline measurements is construction of a system map: a graphical representation of the system components and their interfaces. Then, measurements of utilization (i.e., use divided by capacity) are needed. Problems are most likely to arise, and system tuning efforts are most likely to be beneficial, at highly utilized components.

It is worthwhile to develop a source/destination traffic matrix, including a breakdown of traffic between the local system and other internet sites. Both volume and type of traffic should be logged, along with its evolution over time. Of particular interest for systems with diskless workstations are memory swapping and other disk server access. For all systems, broadcast traffic and routing traffic should be monitored. Sudden increases in the variance of delay or the volume of routing traffic may indicate thrashing or other soft failures.

In monitoring a system, long-term averages are of little use. Hourly averages are a better indicator of system use. Variance in utilization and delay should also be tracked. Sudden spikes in variance are tell-tale signs that a problem is looming or exists.
So, too, are trends of increased packet or line errors, broadcasts, routing traffic, or delay.

3. Fault Detection and Isolation

When a system fails, caution is in order. A net manager should make an attempt to diagnose the cause of a system crash before rebooting. In many cases, however, a quick diagnosis will not be possible. For some high priority applications, restoring at least some level of service will have priority over fault repair or even complete fault diagnosis. This necessitates prior planning. A net manager must know the vital applications at his site. If applications require it, he must also have a fall-back plan for bringing them online.

Meanwhile, repeated crashes or hardware failures are unambiguous signs of a problem that must be corrected.

A network manager should prepare for fault diagnosis by becoming familiar with how diagnostic tools respond to network failure. In times of relative peace, a net manager should occasionally unplug the network connection from an unused workstation and then "debug" the problem.

When diagnosing a fault or anomaly, it is vital to proceed in an orderly manner, especially since network faults will usually generate spurious as well as accurate error messages. Keep in mind that the network itself is failing. Do not place too much trust in anything obtained remotely. Furthermore, it is unlikely to be significant that remote information such as DNS names or NFS files cannot be obtained.

Even spurious messages can be revealing, because they provide clues to the problem. From the data at hand, develop working hypotheses about probable causes of the problems you detect. Direct your further data gathering efforts so that the information you get will either refute or support your hypotheses.

An orderly approach to debugging is facilitated if it is guided by a model of network behavior.
The following portions of this section present such a model, along with a procedure for checking network connectivity. The section concludes with some hints for diagnosing a particularly tricky class of connectivity problem.

3.1 A Network Model as a Diagnostic Framework

The point of having a model of how things work is to have a basis for developing educated guesses about how things go wrong. The problem of cascading faults -- faults generating other faults -- makes use of a conceptual model a virtual necessity.

In general, only problems in a component's hardware or operating system will generate simultaneous faults in multiple protocol layers. Otherwise, faults will propagate vertically (up the protocol stack) or horizontally (between peer-level communications components). Applying a conceptual model that includes the architectural relations of network components can help to order an otherwise senseless barrage of error messages and symptoms.

The model does not have to be formal or complex to bring structure to debugging efforts. A useful start is something as simple as the following:

1. Applications programs use transport services: TCP/UDP. Before using service, applications that accept host names as parameters must translate the names into IP addresses. Translation may be based on a static table lookup (/etc/hosts file in UNIX hosts), the DNS, or yellow pages. Nslookup and DiG are tools for monitoring the activities of the DNS.

2. Transport protocol implementations use IP services. The local IP module makes the initial decision on forwarding. An IP datagram is forwarded directly to the destination host if the destination is on the same network as the source. Otherwise, the datagram is forwarded to a gateway attached to the network. On BSD hosts, the contents of a host's routing table are visible by use of the "netstat" command.*

3.
IP implementations translate the IP address of a datagram's next hop (either the destination host or a gateway) to a local network address. For ethernets, the Address Resolution Protocol (ARP) is commonly used for this translation. On BSD systems, an interface's IP address and other configuration options can be viewed by use of the "ifconfig" command, while the contents of a host's ARP cache may be viewed by use of the "arp" command.

4. IP implementations in hosts and gateways route datagrams based on subnet and net identifiers. Subnetting is a means of allocating and preserving IP address space, and of insulating users from the topological details of a multi-network campus. Sites that use subnetting reserve portions of the IP address's host identifier to indicate particular networks at their campus. Subnetting is highly system-dependent. The details are a critical, though local, issue. As for routing between separate networks, a variety of gateway-to-gateway protocols are used. Traceroute is a useful tool for investigating routing problems. The tool, "query," can be used to examine RIP routing tables.

_________________________
* Initial forwarding may actually be complex and vulnerable to multiple points of failure. For example, when sending an IP datagram, 4.3BSD hosts first look for a route to the particular host. If none has been specified for the destination, then a search is made for a route to the network of the destination. If this search also fails, then as a last resort, a search is made for a route to a "default" gateway. Routes to hosts, networks, and the "default" gateway may be static, loaded at boot time and perhaps updated by operator commands. Alternatively, they may be dynamic, loaded from redirects and routing protocol updates.

A neophyte network manager should expand the above description so that it accurately describes his particular system,
and learn the tools and techniques for monitoring the operations at each of the above stages.

3.2 A Simple Procedure for Connectivity Check

In this section, we describe a procedure for isolating a TCP/IP connectivity problem.** In this procedure, a series of tests methodically examines connectivity from a host, starting with nearby resources and working outward.

_________________________
** Thanks to James VanBokkelen, president of FTP Software, for sharing with us a portion of a PC/TCP support document, the basis for this connectivity procedure.

The steps in our connectivity-testing procedure are:

1. As an initial sanity check, ping your own IP address and the loopback address.

2. Next, try to ping other IP hosts on the local subnet. Use numeric addresses when starting off, since this eliminates the name resolvers and host tables as potential sources of problems. The lack of an answer may indicate either that the destination host did not respond to ARP (if it is used on your LAN), or that a datagram was forwarded (and hence, the destination IP address was resolved to a local media address) but that no ICMP Echo Reply was received. This could indicate a length-related problem, or misconfigured IP Security.

3. If an IP router (gateway) is in the system, ping both its near and far-side addresses.

4. Make sure that your local host recognizes the gateway as a relay. (For BSD hosts, use netstat.)

5. Still using numeric IP addresses, try to ping hosts beyond the gateway. If you get no response, run hopcheck or traceroute, if available. Note whether your packets even go to the gateway on their way to the destination. If not, examine the methods used to instruct your host to use this gateway to reach the specified destination net (e.g., is the default route in place? Alternatively, are you successfully wire-tapping the IGP messages broadcast on the net you are attached to?)
If traceroute is not available, ping, netstat, arp, and a knowledge of the IP addresses of all the gateway's interfaces can be used to isolate the cause of the problem. Use netstat to determine your next hop to the destination. Ping that IP address to ensure the router is up. Next, ping the router interface on the far subnet. If the router returns "network unreachable" or other errors, investigate the router's routing tables and interface status. If the pings succeed, ping the close interface of the succeeding next hop gateway, and so on. Remember the routing along the outbound and return paths may be different.

6. Once ping is working with numeric addresses, use ping to try to reach a few remote hosts by name. If ping fails when host names are used, check the operation of the local name-mapping system (i.e., with nslookup or DiG). If you want to use "shorthand" forms ("myhost" instead of "myhost.mydomain.com"), be sure that the alias tables are correctly configured.

7. Once basic reachability has been established with ping, try some TCP-based applications: FTP and TELNET are supported on almost all IP hosts, but FINGER is a simpler protocol. The Berkeley-specific protocols (RSH, RCP, REXEC and LPR) require extra configuration on the server host before they can work, and so are poor choices for connectivity testing.

If problems arise in steps 2-7 above, rerunning the tests while executing a line monitor (e.g., etherfind, netwatch, or tcpdump) can help to pinpoint the problem.

The above procedure is sound and useful, especially if little is known about the cause of the connectivity problem. It is not, however, guaranteed to be the shortest path to diagnosis. In some cases, a binary search on the problem might be more effective (i.e., try a test "in the middle," in a spot where the failure modes are well defined).
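
The binary-search idea can be sketched as follows. This is a minimal illustration, not a tool: the hop list, the addresses (taken from documentation ranges), and the stand-in probe function are all hypothetical, and the search assumes that every hop before the fault answers while every hop at or beyond it does not. Asymmetric outbound and return routes can violate that assumption.

```python
def first_unreachable(hops, reachable):
    """Return the index of the first hop that fails the probe, or None.

    Assumes "monotonic" failure: hops before the fault answer, and hops
    at or beyond the fault do not.
    """
    lo, hi = 0, len(hops)
    while lo < hi:
        mid = (lo + hi) // 2          # test "in the middle"
        if reachable(hops[mid]):
            lo = mid + 1              # fault lies farther out
        else:
            hi = mid                  # fault is here or nearer
    return lo if lo < len(hops) else None

# Hypothetical path, nearest hop first.
path = ["192.0.2.1", "192.0.2.254", "198.51.100.1",
        "198.51.100.77", "203.0.113.9"]
down = {"198.51.100.77", "203.0.113.9"}
probes = []

def fake_ping(addr):
    """Stand-in for a real probe; records which hops were tested."""
    probes.append(addr)
    return addr not in down

fault = first_unreachable(path, fake_ping)
```

Here three probes locate the fault among five hops. A real probe function would wrap whatever reachability test is available at the site.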
In other cases, available information might so strongly suggest a particular failure that immediately testing for it is in order. This last "approach," which might be called "hunting and pecking," should be used with caution: chasing one will o' the wisp after another can waste much time and effort.

Note that line problems are still among the most common causes of connectivity loss. Problems in transmission across local media are outside the scope of this tutorial. But, if a host or workstation loses or cannot establish connectivity, check its physical connection.

3.3 Limited Connectivity

An interesting class of problems can result in a particularly mysterious failure: TELNET or other low-volume TCP connections work, but large file transfers fail. FTP transfers may start, but then hang. There are several possible culprits in this problem. The most likely suspects are IP implementations that cannot fragment or reassemble datagrams, and TCP implementations that do not perform dynamic window sizing (a.k.a. Van Jacobson's "Slow Start" algorithm). Another possibility is mixing incompatible frame formats on an ethernet.

Even today, some IP implementations in the Internet cannot correctly handle fragmentation or reassembly. They will work fine for small packets, but drop all large packets.

The problem can also be caused by buffer exhaustion at gateways that connect interfaces of widely differing bandwidth. Datagrams from a TCP connection that traverses a bottleneck will experience queue delays, and will be dropped if buffer resources are depleted.
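
To make the buffer-exhaustion effect concrete, here is a toy model of a gateway with a fixed-size queue between a fast inbound interface and a slow outbound one. It is a deliberately crude sketch: time advances in uniform ticks, arrivals are constant, and the rates and buffer size are invented for illustration.

```python
def simulate_bottleneck(arrivals_per_tick, service_per_tick, buffer_size, ticks):
    """Count forwarded, dropped, and still-queued datagrams at a FIFO queue."""
    queued = forwarded = dropped = 0
    for _ in range(ticks):
        # Arrivals that do not fit in the remaining buffer space are dropped.
        accepted = min(arrivals_per_tick, buffer_size - queued)
        dropped += arrivals_per_tick - accepted
        queued += accepted
        # The slow outbound interface drains only a few datagrams per tick.
        sent = min(queued, service_per_tick)
        queued -= sent
        forwarded += sent
    return forwarded, dropped, queued

# Hypothetical 10:1 speed mismatch with room for 20 datagrams.
forwarded, dropped, queued = simulate_bottleneck(10, 1, 20, 100)
```

Once the buffer fills, nearly every arriving datagram is lost, which is the behavior a sender sees as a transfer that starts and then hangs.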
The congestion can be made worse if the TCP implementation at the traffic source does not use the recommended algorithms for computing retransmission times, since spuriously retransmitted datagrams will only add to the congestion.* Fragmentation, even if correctly implemented, will compound this problem, since processing delays and congestion will be increased at the bottleneck.

_________________________
* To avoid this problem, TCP implementations on the Internet must use "exponential backoff" between successive retransmissions, Karn's algorithm for filtering samples used to estimate round-trip delay between TCP peers, and Jacobson's algorithm for incorporating variance into the "retransmission time-out" computation for TCP segments. See Section 4.2.3.1 of RFC 1122, "Requirements for Internet Hosts -- Communication Layers."

Serial Line Internet Protocol (SLIP) links are especially vulnerable to this and other congestion problems. SLIP lines are typically an order of magnitude slower than other gateway interfaces. Also, the SLIP lines are at times configured with MTUs (Maximum Transmission Unit, the maximum length of an IP datagram for a particular subnet) as small as 256 bytes, which virtually guarantees fragmentation.

To alleviate this problem, TCP implementations behind slow lines should advertise small windows. Also, if possible, SLIP lines should be configured with an MTU no less than 576 bytes. The tradeoff to weigh is whether interactive traffic will be penalized too severely by transmission delays of lengthy datagrams from concurrent file transfers.

Misuse of ethernet trailers can also cause the problem of hanging file transfers. "Trailers" refers to an ethernet frame format optionally employed by BSD systems to minimize buffer copying by system software.
BSD systems with ethernet interfaces can be configured to send large frames so that their address and control data are at the end of a frame (hence, a "trailer" instead of a "header"). After a memory page is allocated and loaded with a received ethernet frame, the ethernet data will begin at the start of the memory page boundary. Hence, the ethernet control information can be logically stripped from the end merely by adjusting the page's length field. By manipulating virtual memory mapping, this same page (sans ethernet control information) can then be passed to the local IP module without additional allocation and loading of memory. The disadvantage in using trailers is that it is non-standard. Many implementations cannot parse trailers.

The hanging FTP problem will appear if a gateway is not configured to recognize trailers, but a host or gateway immediately "upstream" on an ethernet uses them. Short datagrams will not be formatted with trailers, and so will be processed correctly. When the bulk data transfer starts, however, full-sized frames will be sent, and will use the trailer format. To the gateway that receives them, they appear simply as misformatted frames, and are quietly dropped. The solution, obviously, is to ensure that all hosts and gateways on an ethernet are consistent in their use of trailers. Note that RFC 1122, "Requirements for Internet Hosts -- Communication Layers," places very strict restrictions on the use of trailers.

4. Performance Testing

Performance management encompasses two rather different activities. One is passive system monitoring to detect problems and determine operational baselines. The goal is to measure system and component utilization and so locate bottlenecks, since bottlenecks should receive the focus of performance tuning efforts. Also, performance data is usually required by upper level management to justify the costs of communications systems.
This is essentially identical to system monitoring, and is addressed at greater length in Section 2, above.

Another aspect of performance management is active performance testing and capacity planning. Some work in this area can be based on analysis. For example, a rough estimate of gateway capacity can be deduced from a simple model given by Charles Hedrick in his "Introduction to Administration of an Internet-based Local Network," which is

per-packet processing time = switching time + (packet size) / (transmission bps).

Another guideline for capacity planning is that in order to avoid excessive queuing delays, a system should be sized at about double its expected load. In other words, system capacity should be so high that utilization is no greater than 50%.

Although there are more sophisticated analytic models of communications systems than those above, their added complexity does not usually gain a corresponding accuracy. Most analytic models of communications nets require assumptions about traffic load distributions and service rates that are not merely problematic, but are patently false. These errors tend to result in underestimating queuing delays. Hence, it is often necessary to actually load and measure the performance of a real communications system if one is to get accurate performance predictions. Obviously, this type of testing is performed on isolated systems or during off hours. The results can be used to evaluate parameter settings or predict performance during normal operations.

Simulations can be used to supplement the testing of real systems. To be believable, however, simulations require validation, which, in turn, requires measurements from a real system. Whether testing or simulating a system's performance, actual traffic traces should be incorporated as input to traffic generators.
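
One way a recorded trace might feed a traffic generator is sketched below. The trace format (a list of timestamp and size pairs), the one-second bucket, and the peak-to-mean ratio used as a crude burstiness figure are our assumptions for illustration, not part of any particular tool.

```python
def interarrival_gaps(trace):
    """Yield (gap_seconds, size_bytes) for each packet in a recorded trace."""
    previous = None
    for timestamp, size in trace:
        gap = 0.0 if previous is None else timestamp - previous
        previous = timestamp
        yield gap, size

def peak_to_mean(trace, bucket=1.0):
    """Ratio of the busiest bucket's byte count to the mean bucket's."""
    start = trace[0][0]
    buckets = {}
    for timestamp, size in trace:
        index = int((timestamp - start) // bucket)
        buckets[index] = buckets.get(index, 0) + size
    n_buckets = max(buckets) + 1  # idle buckets count toward the mean
    return max(buckets.values()) / (sum(buckets.values()) / n_buckets)

# Hypothetical trace: (seconds since start, bytes)
trace = [(0.0, 512), (0.25, 512), (0.5, 1024), (3.5, 64)]
schedule = list(interarrival_gaps(trace))  # replay: wait gap, send size
burstiness = peak_to_mean(trace)
```

A generator that replays `schedule` preserves the original spacing of packets, which a generator driven only by an average rate would lose.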
The performance of a communications system will be greatly influenced by its load characteristics (burstiness, volume, etc.), which are themselves highly dependent on the applications that are run.

When tuning a net, in addition to the usual configuration parameters, consider the impact of the location of gateways and print and file servers. A few rules of thumb can guide the location of shared system resources. First, there is the principle of locality: a system will perform better if most traffic is between nearby destinations. The second rule is to avoid creating bottlenecks. For example, multiple diskservers may be called for to support a large number of workstations. Furthermore, to avoid LAN and diskserver congestion, workstations should be configured with enough memory to avoid frequent swapping.

As a final note on performance management, proceed cautiously if your ethernet interface allows you to customize its collision recovery algorithm. This is almost always a bad idea. The best that it can accomplish is to give a few favored hosts a disproportionate share of the ethernet bandwidth, perhaps at the cost of a reduction in total system throughput. Worse, it is possible that differing collision recovery algorithms may exhibit a self-synchronizing behavior, so that excess collisions are generated.

5. Configuration Management

Configuration management is the setting, collecting, and storing of the state and parameters of network resources. It overlaps all other network management functions. Hence, some aspects of configuration management have already been addressed (e.g., tuning for performance). In this section, we will focus on configuration management activities needed to "hook up" a net or campus to a larger internet. We will not, of course, include specific details on installing or maintaining internetted communications systems.
We will, however, skim over some of the TCP/IP configuration highlights.

Configuration management includes "name management" -- the control and allocation of system names and addresses, and the translation between names and addresses. Name-to-address translation is performed by "name servers." We conclude this section with a few strictures on the simultaneous use of two automated name-servers, the Domain Name System (DNS), and Yellow Pages (YP).

5.1 Required Host Configuration Data for TCP/IP internets

In a TCP/IP internet, each host needs several items of information for internet communications. Some will be host-specific, while other information will be common for all hosts on a subnet. In a soon to be published RFC document,* R. Droms identifies the following configuration data required by internet hosts:

o An IP address, a host-specific value that can be hard-coded or obtained via BOOTP, the Reverse Address Resolution Protocol (RARP) or Dynamic RARP (DRARP).

o Subnet properties, such as the subnet mask and the Maximum Transmission Unit (MTU); obviously, these values are not host-specific.

o Addresses of "entry" gateways to the internet; addresses of default gateways are usually hard-coded; though the ICMP "redirect" message can be used to refine a host's routing tables, there is currently no dynamic TCP/IP mechanism or protocol for a host to locate a gateway; an IETF working group is busy on this problem.

o For hosts in internets using the Domain Name System (DNS) for name-to-address translation, the location of a local DNS server is needed; this information is not host-specific, and usually hard-coded.

o Host name (domain name, for hosts using DNS); obviously host-specific; either hard-coded or obtained in a boot procedure.

o For diskless hosts, various boot services. BOOTP is the standard Internet protocol for downloading boot configuration information.
The Trivial File Transfer Protocol (TFTP) is typically used for downloading boot images. Sun computers use the "bootparams" RPC mechanism for downloading initial configuration data to a host.

There are ongoing developments, most notably the work of the Dynamic Host Configuration Working Group of the IETF, to support dynamic, automatic gathering of the above data. In the meantime, most systems will rely on hand-crafted configuration files.

_________________________
* Draft "Dynamic Configuration of Internet Hosts."

5.3 Connecting to THE Internet

The original TCP/IP Internet (spelled with an upper-case "I") is still active, and still growing. An interesting aspect of the Internet is that it spans many independently administered systems.

Connection to the Internet requires: a registered network number, for use in IP addresses; a registered autonomous system number (ASN), for use in internet routing; and a registered domain name. Fielding a primary and backup DNS server is a condition for registering a domain name.

The Defense Data Network (DDN) Network Information Center (NIC) is responsible for registering network numbers, autonomous system numbers, and domain names. Regional nets will have their own policies and requirements for Internet connections, but all use the NIC for this registration service. Contact the NIC for further information, at:

DDN Network Information Center
SRI International, Room EJ291
333 Ravenswood Avenue
Menlo Park, CA 94025
Email: HOSTMASTER@NIC.DDN.MIL
Phone: 1-415-859-3695
       1-800-235-3155 (toll-free hotline)

5.4 YP and DNS: Dueling name servers.

The Domain Name System (DNS) provides name service: it translates host names into IP addresses (this mapping is also called "resolution"). A widespread DNS implementation is "bind," whose server program is "named."
The Sun Yellow Pages (YP) system can be configured to provide an identical service, by providing remote, keyed access to the "hosts.byname" map.

Unfortunately, if both DNS and the YP hosts.byname map are installed, they can interact in disruptive ways. The problem has been noted in systems in which DNS is used as a fallback, to resolve hostnames that YP cannot. If DNS is slow in responding, the timeout in program ypserv may expire, which triggers a repeated request. This can result in disaster if DNS was initially slow because of congestion: the slower things get, the more requests are generated, which slows things even more. A symptom of this problem is that failures by the DNS server or network will trigger numerous requests to DNS.

Reportedly, the bug in YP that results in the avalanche of DNS requests has been repaired in SunOS 4.1. The problem, however, is more fundamental than an implementation error. The YP map hosts.byname and the DNS contain the same class of information. One can get an answer to the same query from each system. These answers may well be different: there is not a mechanism to maintain consistency between the systems. More critical, however, is the lack of a mechanism or procedure to establish which system is authoritative. Hence, running the DNS and YP name services in parallel is pointless. If the systems stay consistent, then only one is needed. If they differ, there is no way to choose which is correct.

The YP hosts.byname service and DNS are comparable, but incompatible. If possible, a site should not run both services. Because of Internet policy, sites with Internet connections MUST use the DNS. If YP is also used, then it should be configured with YP-to-DNS disabled.

Hacking a system so that it uses DNS rather than the YP hosts.byname map is not trivial, and should not be attempted by novices.
The approach is to rebuild the shared C link library, so that calls to the library routines gethostbyname() and gethostbyaddr() will use DNS rather than YP. To complete the change, programs that do not dynamically link the shared C library (rcp, arp, etc.) must also be rebuilt.

Modified shared C libraries for Sun 3s and Sun 4s are available via anonymous FTP from host uunet.uu.net, in the sun-fixes directory. Note that use of DNS routines rather than YP for general name resolution is not a supported SunOS feature at this time.

6. Internet Security

The guidelines and advice in this section pertain to enhancing the protection of data that are merely "sensitive." By themselves, these measures are insufficient for protecting "classified" data. Implementing the policies required to protect classified data is subject to stringent, formal review procedures, and is regulated by agencies such as the Defense Investigative Service (DIS) and the National Security Agency (NSA).

A network manager must realize that he is responsible for protecting his system and its users. Furthermore, though the Internet may appear to be a grand example of a cooperative joint enterprise, recent incidents have made it clear that not all Internet denizens are benign.

A network manager should be aware that the network services he runs have a large impact on the security risks to which his system is exposed. The prudent network manager will be very careful as to what services his site provides to the rest of the Internet, and what access restrictions are enforced. For example, the protocol "finger" may provide more information about a user than should be given to the world at large. Worse, most implementations of the protocol TFTP give access to all world-readable files.

This section highlights several basic security considerations for Internet sites.
It then lists several sources of information and advice on improving the security of systems connected to the Internet.

6.1 Basic Internet Security

Two major Internet security threats are denial of service and unauthorized access.

Denial of service threats often take the form of protocol spoofers or other malicious traffic generators. These problems can be detected through system monitoring logs. If an attack is suspected, immediately contact your regional net office (e.g., SURANET, MILNET). In addition, DDN users should contact SCC, while other Internet users should contact CERT (see below). A cogent description of your system's symptoms will be needed.

At your own site, be prepared to isolate the problems (e.g., by limiting disk space available to the message queue of a mail system under attack). As a last resort, coping with an attack may require taking down an Internet connection. It is better, however, not to be too quick to quarantine your site, since information for coping with the attack may come via the Internet.

Unauthorized access is a potentially more ominous security threat. The main avenues are attacks against passwords and attacks against privileged system processes.

An appallingly common means of gaining entry to systems is by use of the initial passwords to root, sysdiag, and other management accounts that systems are shipped with. Only slightly less vulnerable are common or trivial passwords, since these are readily subverted by dictionary attacks.* Obvious steps can reduce the risk of password attacks: passwords should be short-lived, at least eight characters long, with a mix of upper and lower case, and preferably random. The distasteful aspect of memorizing a random string can be alleviated if the password is pronounceable.

Improving passwords does not remove all risks. Passwords transmitted over an ethernet are visible to all attached systems.
Furthermore, gateways have the potential to intercept passwords used by any FTP or TELNET connections that traverse them. It is a bad idea for the root account to be accessed by FTP or TELNET if the connections must cross untrusted elements.

Attacks against system processes are another avenue of unauthorized access. The principle is that by subverting a system process, the attacker can then gain its access privileges.

One approach to reducing this risk is to make system programs harder to subvert. For example, the widespread attack in November 1988 by a self-replicating computer program ("worm," analogous to a tapeworm) subverted the "fingerd" process, by loading an intrusive bootstrap program (known variously as a "grappling hook" or "vector" program), and then corrupting the stack space so that a subroutine's return address was overwritten with the address of the bootstrap program.** The security hole in fingerd consisted of an input routine that did not have a length check. Security fixes to "fingerd" include the use of a revised input routine.

A more general protection is to apply the principle of "least privilege." Where possible, system routines should run under separate user IDs, and should have no more privilege than is necessary for them to function.

_________________________
* Exotic fantasy creatures and women's names are well represented in most password dictionaries.

** An early account of the Internet Worm incident of November 1988 is given by Eugene Spafford in the January 89 issue of "Computer Communications Review." Several other articles on the worm incident are in the June 89 issue of the "Communications of the ACM."

To further protect against attacks on system processes, system managers should regularly check their system programs to ensure that they have not been tampered with or modified in any way. Checksums should be used for this purpose.
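
The checksum check recommended above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (file_checksum, record_baseline, find_modified) are hypothetical, and SHA-256 is chosen here for convenience; the tutorial itself does not name a checksum algorithm or tool.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at `path` (SHA-256 is an assumption)."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so that large system binaries need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_baseline(paths):
    """Compute a {path: checksum} baseline for the given files."""
    return {str(p): file_checksum(p) for p in paths}


def find_modified(baseline):
    """Return the sorted paths whose current checksum differs from the baseline.

    A file that can no longer be read (for example, because it was removed)
    is reported as well.
    """
    changed = []
    for path, expected in baseline.items():
        try:
            current = file_checksum(path)
        except OSError:
            current = None
        if current != expected:
            changed.append(path)
    return sorted(changed)
```

In use, one would call record_baseline() on a list of system programs while the system is known to be clean, and run find_modified() against that baseline periodically. The check is only as trustworthy as the stored baseline, so it is presumably best kept where an intruder on the monitored host cannot rewrite it.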
Using the operating system to check a file's last date of modification is insufficient, since the date itself can be compromised. Finally, to avoid the unauthorized replacement of system code, care should be exercised in assigning protection to its directory paths.

Some system programs actually have "trap doors" that facilitate subversion. A trap door is the epitome of an undocumented feature: it is a hidden capability of a system program that allows a knowledgeable person to gain access to a system. The Internet Worm exploited what was essentially a trap door in the BSD sendmail program.

Ensuring against trap doors in software as complex as sendmail may be infeasible. In an ideal world, the BSD sendmail program would be replaced by an entire mail subsystem (i.e., perhaps including mail user agents, mail transfer agents, and text preparation and filing programs). Any site using sendmail should at least obtain the less vulnerable, toughened distribution from ucbarpa.berkeley.edu, in file ~ftp/4.3/sendmail.tar.Z. Sites running SunOS should note that the 4.0.3 release closed the security holes exploited by the Internet Worm. Fixes for a more obscure security hole in SunOS are available from host uunet.uu.net in ~ftp/sun-fixes; these improvements have been incorporated in SunOS 4.1.

Sendmail has problems other than size and complexity. Its use of root privileges, its approach to alias expansion, and several other design characteristics present potential avenues of attack. For UNIX sites, an alternative mail server to consider is MMDF, which is now at version 2. MMDF is distributed as part of the SCO UNIX distribution, and is also available in the user contributed portion of 4.3BSD. Though free, MMDF is licensed, and resale is restricted. Sites running MMDF should be on the mmdf email list; requests to join this list should be sent to: mmdf2-request@relay.cs.net.

Programs that masquerade as legitimate system code but which contain trap doors or other aids to unauthorized access are known as trojan horses. Computer "viruses," intrusive software that infects seemingly innocent programs and propagates when the infected programs are executed or copied, are a special case of trojan horse programs.* To guard against trojan horse attacks, be wary of programs downloaded from remote sources. At minimum, do not download executables from any but the most trusted sources. Also, as noted above, to avoid proliferation of "infected" software, checksums should be computed, recorded, and periodically verified.

6.2 Security Information Clearing-Houses

The Internet community can get security assistance from the Computer Emergency Response Team (CERT), established by DARPA in November 1988. The Coordination Center for the CERT (CERT/CC) is located at the Software Engineering Institute at Carnegie Mellon University. The CERT is intended to respond to computer security threats such as the November '88 worm attack that invaded many defense and research computers. Consult RFC 1135 (Reynolds, J., "The Helminthiasis of the Internet", USC/ISI, December 1989), for further information.

CERT assists Internet sites in response to security attacks or other emergency situations. It can immediately tap experts to diagnose and solve the problems, as well as establish and maintain communications with the affected computer users and with government authorities as appropriate. Specific responses will be taken in accordance with the nature of the problem and the magnitude of the threat.

CERT is also an information clearing-house for the identification and repair of security vulnerabilities, informal assessments of existing systems in the research community, improvement to emergency response capability, and both vendor and user security awareness.
This security information is distributed by periodic bulletins, and is posted to the USENET news group comp.security.announce. In addition, the security advisories issued by CERT, as well as other useful security-related information, are available via anonymous FTP from cert.sei.cmu.edu.

_________________________
* Virus attacks have been seen against PCs, but as yet have rarely been directed against UNIX systems.

For immediate response to attacks or incidents, CERT mans a 24-hour hotline at (412) 268-7090. To subscribe to CERT's security announcement bulletin, or for further information, contact:

   CERT
   Software Engineering Institute
   Carnegie Mellon University
   Pittsburgh, PA 15213-3890

   (412) 268-7080
   cert@cert.sei.cmu.edu

For DDN users, the Security Coordination Center (SCC) serves a function similar to CERT. The SCC is the DDN's clearing-house for host/user security problems and fixes, and works with the DDN Network Security Officer. The SCC also distributes the DDN Security Bulletin, which communicates information on network and host security exposures, fixes, and concerns to security and management personnel at DDN facilities. It is available online, via kermit or anonymous FTP, from nic.ddn.mil, in SCC:DDN-SECURITY-yy-nn.TXT (where "yy" is the year and "nn" is the bulletin number). The SCC provides immediate assistance with DDN-related host security problems; call (800) 235-3155 (6:00 a.m. to 5:00 p.m. Pacific Time) or send e-mail to SCC@NIC.DDN.MIL. For 24-hour coverage, call the MILNET Trouble Desk (800) 451-7413 or AUTOVON 231-1713.

The CERT/CC and the SCC communicate on a regular basis and support each other when problems occur. These two organizations are examples of the incident response centers that are forming, each serving its own constituency or focusing on a particular area of technology.

Other network groups that discuss security issues are: comp.protocols.tcp-ip, comp.virus (mostly PC-related, but occasionally covers Internet topics), misc.security, and the BITNET Listserv list called VIRUS-L.

7. Internet Information

There are many available references on the TCP/IP protocol suite, the internet architecture, and the DDN Internet. A soon-to-be-published FYI RFC document, "Where to Start: A Bibliography of General Internetworking Information," provides a bibliography of online and hard copy documents, reference materials, and multimedia training tools that address general networking information and "how to use the Internet." It presents a representative collection of materials that will help the reader become familiar with the concepts of internetworking. Inquiries on the current status of this document can be sent to user-doc@nnsc.nsf.net or by postal mail to:

   Corporation for National Research Initiatives
   1895 Preston White, Suite 100
   Reston, VA 22091
   Attn: IAB Secretariat.

Two texts on networking are especially noteworthy. Internetworking with TCP/IP, by Douglas Comer, is an informative description of the TCP/IP protocol suite and its underlying architecture. The UNIX System Administration Handbook, by Nemeth, Snyder, and Seebass, is a "must have" for system administrators who are responsible for UNIX hosts. In addition to covering UNIX, it provides a wealth of tutorial material on networking, the Internet, and network management.

A great deal of information on the Internet is available online. An automated, online reference service is available from CSNET. To obtain a bibliography of their online offerings, send the email message

   request: info
   topic: help
   request: end

to info-server@sh.cs.net.

The DDN NIC also offers automated access to many NIC documents, online files, and WHOIS information via electronic mail.
To use the service, send an email message with your request specified in the SUBJECT field of the message. For a sampling of the type of offerings available through this service, send the following message:

   To: SERVICE@NIC.DDN.MIL
   Subject: help
   Msg:

The DDN Protocol Implementations and Vendors Guide, published by the DDN Network Information Center (DDN NIC),* is an online reference to products and implementations associated with the DoD Defense Data Network (DDN) group of communication protocols, with emphasis on TCP/IP and OSI protocols. It contains information on protocol policy and evaluation procedures, a discussion of software and hardware implementations, and analysis tools with a focus on protocol and network analyzers. To obtain the guide, invoke FTP at your local host and connect to host NIC.DDN.MIL (internet address 26.0.0.73 or 10.0.0.51). Log in using username 'anonymous' with password 'guest' and get the file NETINFO:VENDORS-GUIDE.DOC.

_________________________
* Products mentioned in the guide are not specifically endorsed or recommended by the Defense Communications Agency (DCA).

The DDN Protocol Guide is also available in hardcopy form. To obtain a hardcopy version of the guide, contact the DDN Network Information Center:

   By U.S. mail:
      SRI International
      DDN Network Information Center
      333 Ravenswood Avenue, Room EJ291
      Menlo Park, CA 94025

   By e-mail:
      NIC@NIC.DDN.MIL

   By phone:
      1-415-859-3695
      1-800-235-3155 (toll-free hotline)

For further information about the guide, or for information on how to list a product in a subsequent edition of the guide, contact the DDN NIC.

There are many additional online sources on Internet Management. RFC 1118, "A Hitchhiker's Guide to the Internet," by Ed Krol, is a useful introduction to the Internet routing algorithms. For more of the nitty-gritty on laying out and configuring a campus net, see Charles Hedrick's "Introduction to Administration of an Internet-based Local Network," available via anonymous FTP from cs.rutgers.edu (sometimes listed in host tables as aramis.rutgers.edu), in subdirectory runet, file tcp-ip-admin. Finally, anyone responsible for systems connected to the Internet must be thoroughly versed in the Host Requirements RFCs (RFC 1122 and RFC 1123) and "Requirements for Internet Gateways," RFC 1009.

8. The Final Words on Internet Management

Keep smiling, no matter how bad things may seem. You are the expert. They need you.

9. Security Considerations

Security issues are discussed in Section 6.

10. Author's Address

   Robert H. Stine
   SPARTA, Inc.
   7926 Jones Branch Drive
   Suite 1070
   McLean, VA 22102

   EMail: STINE@SPARTA.COM