The Internet has become infamous among network professionals for its near-pathological inability to deploy multicast. The Internet’s current multicast specifications place significant demands on the network, most notably the need for routers to maintain a multicast routing tree for each multicast address. I propose a lightweight multicast (LWM) designed specifically to alleviate this requirement by listing a full set of destination addresses in each packet header, using a header option.
INTRODUCTION
The Internet has become infamous among network professionals for its near-pathological inability to deploy multicast. There seem to be various reasons for this, not all of them technical. Service providers are wary of spending large sums to deploy unproven technologies without any clear means of cost recovery. There are technical reasons as well, though. The Internet’s current multicast specifications (RFC 1112, 2236, 2362, 2710, 2730), henceforth referred to as heavyweight multicast, place significant demands on the network, most notably the routers, which build and maintain multicast routing trees for each IP multicast address. Given the problems we’ve seen with explosive routing table growth, a technology that requires explicit routing table entries for every active multicast session has understandably been slow to gain acceptance.
See draft-ietf-mboned-iesg-gap-analysis-00.txt, particularly sections 6 through 20, for a good discussion of the current state of the art in Internet multicasting. In particular, I’ll note the distinction that has arisen between ASM (Any Source Multicast), the original Internet multicasting model where any device could transmit to a multicast address at any time and expect the packet to be delivered to the current set of subscribers, and SSM (Source-Specific Multicast), a newer, simpler model that ties multicast addresses and sources together and requires only a single distribution tree rooted at the only source that can transmit to a given multicast channel.
I propose a lightweight multicast designed with the following features:
- intended for multicasts with a small number of destinations
- capable of transitioning to heavyweight multicast if the number of destinations grows significantly
- designed to minimize network, and particularly router requirements
The idea is to multicast by listing a full set of destination addresses in each packet header, using header options. Obviously, this only makes sense for feeds with a small number of destinations, but has the advantage of requiring almost no additional support by the routers. In particular, there is no need for multicast routing trees; standard unicast routing is used to determine routes.
The basic model of router operation is to look up the routes for each of the multiple destinations specified in the IP header option. If all destinations route to the same next hop, nothing more is done other than forward the packet. If the destinations route via different next hops, then the packet is duplicated, one copy per next hop. In each copy, those addresses being routed via other next hops are struck out and replaced with the all-zeros address. The packets are then forwarded as before.
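The forwarding rule above can be sketched as follows. This is only an illustration under stated assumptions: the function name forward_lwm, the next_hop_for callback (a stand-in for the router's ordinary unicast route lookup) and the sample routing table are all hypothetical and not part of the proposal.

```python
from collections import defaultdict

ZERO = "0.0.0.0"  # all-zeros address marks a struck-out destination


def forward_lwm(destinations, next_hop_for):
    """Return {next_hop: destination list carried by the copy sent to that hop}.

    destinations: address strings taken from the LWM option.
    next_hop_for: callable mapping a destination to its unicast next hop
                  (hypothetical stand-in for the normal route lookup).
    """
    groups = defaultdict(set)
    for dest in destinations:
        if dest != ZERO:  # skip entries already struck out upstream
            groups[next_hop_for(dest)].add(dest)

    copies = {}
    for hop, members in groups.items():
        # The list keeps its length; addresses routed via other hops are zeroed.
        copies[hop] = [d if d in members else ZERO for d in destinations]
    return copies


# Hypothetical routing table: two destinations share next hop "r1".
routes = {"10.1.0.5": "r1", "10.1.0.9": "r1", "10.2.0.7": "r2"}
copies = forward_lwm(list(routes), routes.__getitem__)
for hop, dests in copies.items():
    print(hop, dests)
```

When every destination maps to the same next hop, the function returns a single entry with the list unchanged, which corresponds to the "nothing more is done" case.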
If this seems like only a small improvement over unicast, hardly worth implementing, then consider this. A bandwidth-intensive application can consume a significant percentage of a single link. A single video feed, for example, can dominate a DSL line. Sending one copy of a video stream with four addresses in the packet headers vs. sending four copies of the video stream can mean the difference between a working application and an overloaded link. Furthermore, it is becoming increasingly apparent that multicast, far from being an esoteric delivery model used only in exotic applications, is in fact the natural mode for a variety of common applications, such as caches, shared desktops, distributed databases and video conferencing.
IP ADDRESSES AND PORT NUMBERS
My first conception of LWM was to transmit directly to the unicast addresses of all recipients, avoiding IP multicast addresses completely. An immediate problem with this scheme is that it requires a single UDP port number to be shared across all the receivers. Heavyweight multicast also requires a shared port number, but that doesn’t seem so bad since it will be bound to a particular IP multicast address, the entire thing associated with a single application, so it’s easy to pick a port number and assume that nothing else will be using it. With the LWM unicast-address scheme, it’s not nearly so clear. You’d have to use a well-known common port number, but what’s worse is that if there are several similar applications on the same host you couldn’t select among them – they’d all have to listen on the same port and each would get announcements intended for another.
So, to avoid issues like re-writing UDP packet headers or trying to synchronize UDP port numbers across multiple IP addresses, we’ll resort to the traditional IP multicast practice of using a dedicated multicast address for each multicast group. Then we’ll bind each receiver to a specific address/port combination using a multicast IP address distinct from any of its unicast addresses. This avoids the UDP port number problem. Multicast addresses still need to be acquired in a conventional manner. Since LWM sources need to know their recipient list in order to put it into header options, ASM would require some kind of flooding mechanism to communicate this information to all possible senders. Due to the difficulty in achieving this synchronization, we restrict ourselves henceforth to an SSM model. In other words, each source IP address has a block of SSM addresses (232/8 in IPv4, FF3x::/96 in IPv6) allocated specifically to it.
The obvious next question is, do we use the existing SSM address space, or do we allocate yet another block of multicast addresses for LWM operation? The primary advantage of allocating another special block of addresses is that it allows network devices to clearly distinguish between addresses using heavyweight and lightweight operation. However, this advantage is not as clear-cut as it would seem. First, using a standard multicast address allows for an obvious transition mechanism into heavyweight multicast when the number of subscribers increases to justify this. Second, as will become apparent later in this memo, heterogeneous networks using lightweight multicast in some areas and heavyweight multicast in others are completely feasible and should be supported or even encouraged. And finally, since LWM header options can be added (or removed) by a router, it’s not clear whether any benefit is accrued by separating LWM operation into its own address space.
SOURCE ROUTING
Some set of header options should be provided that allow source routing (especially loose source routing) to be used in conjunction with LWM. The simplest way to do this is to specify LWM using a single header option, nearly identical to a Loose Source Route (IPv4) or a Routing Header (IPv6) and to be interpreted in almost exactly the same way, except that multiple options can be specified to provide the multicast functionality.
So, for IPv4, specifying ‘n’ non-source routed multicast addresses would require exactly 8*n bytes – four for the option header and four for each address. Since the maximum value in the Internet Header Length field is 15, corresponding to a 15*4=60 byte IP header of which 20 bytes are the fixed header, 40 bytes remain for options, so at most five addresses could be specified in this way. Using the more compact header format, this could be done using 4+(4*n) bytes, allowing up to 9 addresses. For IPv6, 24*n bytes would be required for the more general option, while the compact format would require 8+(16*n) bytes. IPv6 does not impose any particular limit on option length.
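The arithmetic in the preceding paragraph can be checked with a few lines of Python. The function names are made up for illustration, and the sizes follow the formulas exactly as stated in the text, which appear not to count the option originator address drawn in the option diagrams.

```python
IPV4_MAX_HEADER = 15 * 4    # largest Internet Header Length value, in bytes
IPV4_FIXED_HEADER = 20      # mandatory IPv4 header fields
OPTION_SPACE = IPV4_MAX_HEADER - IPV4_FIXED_HEADER  # bytes left for options


def general_size_v4(n):
    """One 8-byte general option (4 header + 4 address) per destination."""
    return 8 * n


def compact_size_v4(n):
    """A single compact option: 4-byte header plus 4 bytes per destination."""
    return 4 + 4 * n


def general_size_v6(n):
    """One 24-byte general header (8 header + 16 address) per destination."""
    return 24 * n


def compact_size_v6(n):
    """A single compact header: 8-byte header plus 16 bytes per destination."""
    return 8 + 16 * n


# Largest n whose options still fit in the IPv4 option space.
max_general_v4 = OPTION_SPACE // 8
max_compact_v4 = (OPTION_SPACE - 4) // 4

print(OPTION_SPACE, max_general_v4, max_compact_v4)
```

If the originator address were counted as well, each IPv4 option would grow by 4 bytes and the limits would shrink accordingly.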
A “best of both worlds” approach would be to specify two different IP header options – a general form that could be used with source routing and a compact form that could not. Especially using IPv6, without any specific limit on the length of header options, both options could be used together (and more than one of the general form options).
THE HEADER OPTIONS
IPv4 general option:
Type=A
+--------+--------+--------+--------+
|AAAAAAAA| length | pointer|  resv  |
+--------+--------+--------+--------+
|     option originator address     |
+--------+--------+--------+--------+
|            Address[1]             |
+--------+--------+--------+--------+
|            Address[2]             |
+--------+--------+--------+--------+
.                 .                 .
.                 .                 .
.                 .                 .
+--------+--------+--------+--------+
|            Address[n]             |
+--------+--------+--------+--------+
This works the same way as an IPv4 loose source route, except that we don’t replace the IP destination address, and the option can be specified multiple times to specify multiple source routed destination addresses. The pointer points to the current destination address. An address can be deleted by setting its pointer to zero.
The “option originator address” is the IP address of the device that attached the LWM option. See below for more discussion of this field (including speculation whether it should be included at all).
IPv4 compact option:
Type=B
+--------+--------+--------+--------+
|BBBBBBBB| length |      resv       |
+--------+--------+--------+--------+
|     option originator address     |
+--------+--------+--------+--------+
|            Address[1]             |
+--------+--------+--------+--------+
|            Address[2]             |
+--------+--------+--------+--------+
.                 .                 .
.                 .                 .
.                 .                 .
+--------+--------+--------+--------+
|            Address[n]             |
+--------+--------+--------+--------+
The route data consists of a series of IPv4 destination addresses. An address can be deleted by clearing it to zero.
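As a rough illustration of the compact IPv4 layout drawn above (type, length, 16 reserved bits, originator address, then the destination list), here is a minimal packing sketch. The option type value is an arbitrary placeholder, since the memo leaves type "B" unassigned, and the function names are hypothetical.

```python
import socket
import struct

LWM_COMPACT_TYPE = 0x9E  # arbitrary placeholder; the memo does not assign type "B"
MAX_OPTION_BYTES = 40    # IPv4 option space (60-byte maximum header minus 20)


def pack_compact_v4(originator, addresses):
    """Build type(1) | length(1) | reserved(2) | originator(4) | address(4) * n."""
    length = 4 + 4 + 4 * len(addresses)
    if length > MAX_OPTION_BYTES:
        raise ValueError("option does not fit in the IPv4 option space")
    option = struct.pack("!BBH", LWM_COMPACT_TYPE, length, 0)
    option += socket.inet_aton(originator)
    for addr in addresses:
        option += socket.inet_aton(addr)
    return option


def strike(option, index):
    """Return a copy with Address[index] (0-based) cleared to all zeros."""
    offset = 8 + 4 * index
    if index < 0 or offset + 4 > len(option):
        raise IndexError("no such address in option")
    return option[:offset] + bytes(4) + option[offset + 4:]


opt = pack_compact_v4("192.0.2.1", ["198.51.100.7", "203.0.113.9"])
print(opt.hex())
print(strike(opt, 0).hex())
```

Because the originator field is included here, the fixed part of the option is 8 bytes rather than the 4 bytes assumed in the size estimates earlier in the memo.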
IPv6 general option:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next Header  |  Hdr Ext Len  | Routing Type=C| Segments Left |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Reserved                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                   Option Originator Address                   +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                          Address[1]                           +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                          Address[2]                           +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                               .                               .
.                               .                               .
.                               .                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                          Address[n]                           +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
A Type C Routing header is just like a Type 0 Routing header except that there can be multiple Type C Routing headers to specify multiple source routed destinations. Just like with the IPv4 general option, Type C headers do not replace the destination address and have to be examined at each hop.
The Option Originator Address is the IPv6 address of the device that attached the LWM option (again, read on).
IPv6 compact option:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next Header  |  Hdr Ext Len  | Routing Type=D|   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Reserved                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                   Option Originator Address                   +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                          Address[1]                           +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                          Address[2]                           +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                               .                               .
.                               .                               .
.                               .                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                          Address[n]                           +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
A Type D routing header consists of multiple, non-source routed destination addresses. An address can be deleted by setting it to all zeros.
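A similar sketch for the Type D routing header, assuming the layout in the diagram (8 bytes of fixed fields, a 16-byte originator address, then 16 bytes per destination) and the usual IPv6 extension header convention that Hdr Ext Len counts 8-octet units beyond the first 8 octets. The numeric routing type is a placeholder, since "D" is not assigned in the memo, and the function name is hypothetical.

```python
import socket
import struct

LWM_ROUTING_TYPE_D = 254  # arbitrary placeholder; the memo does not assign type "D"


def pack_type_d(next_header, originator, addresses):
    """Build a compact LWM routing header as laid out in the diagram."""
    # 16 bytes of originator plus 16 per address, expressed in 8-octet units.
    hdr_ext_len = 2 + 2 * len(addresses)
    if hdr_ext_len > 255:
        raise ValueError("too many addresses for an 8-bit Hdr Ext Len")
    header = struct.pack(
        "!BBBBI", next_header, hdr_ext_len, LWM_ROUTING_TYPE_D, 0, 0
    )
    header += socket.inet_pton(socket.AF_INET6, originator)
    for addr in addresses:
        header += socket.inet_pton(socket.AF_INET6, addr)
    return header


# Example with documentation-prefix addresses; 17 is the UDP protocol number.
hdr = pack_type_d(17, "2001:db8::1", ["2001:db8::a", "2001:db8::b"])
print(len(hdr), hdr[1])
```

Striking an address would work as in the IPv4 case: overwrite its 16 bytes with zeros and leave the header length untouched.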
SUBSCRIPTION TO LWM CHANNELS
The conventional Internet multicast models, both ASM and SSM, place the burden of subscription management on the network. In IPv4, a receiver uses IGMP (Internet Group Management Protocol) to inform the local routers of its interest in a particular multicast session using a Multicast Join. These routers are then responsible for finding the session’s rendezvous point (RP) and forwarding join messages to it (for SSM the source is the RP). All intermediate routers need to note the join messages and construct or add to a multicast routing table for _this_specific_session. It is these additional routing tables that form much of the adverse load of heavyweight multicast, which LWM attempts to avoid.
We now need to modify our subscription model for LWM. In the ASM and SSM models, the source(s) simply transmit the packets with multicast destination addresses, and the network takes care of routing the packets to any subscribed destinations. In the LWM model, the source needs to specify a complete list of destination addresses when it transmits the packets.
The obvious modification is for destinations to transmit multicast joins directly to the source, rather than to local routers. The obvious drawback I see to this is that it requires the destination to behave differently depending on whether it’s subscribing to an LWM or an ASM/SSM session. Note also that multicast leaves need to be transmitted back to the source.
INTEROPERATION WITH HEAVYWEIGHT MULTICAST
There’s also the very real possibility of “mixed” networks, where part of the network supports LWM only, part supports SSM only, and part supports both (or neither). In fact, the part supporting both is the only part that requires any real attention, since the rule of law in the other parts dictates what multicast services are available. Clearly, SSM-only networks are the only kind that exist today, but given the resistance to deploying heavyweight multicast, LWM-only networks are quite conceivable. In fact, once suitable router software becomes available, enabling LWM on a network would be trivial (that’s the whole point), while heavyweight multicast requires additional routing protocols, peering arrangements with other providers, and possibly more memory on your routers. So LWM-only networks could happen.
Consider a mixed-mode router that would receive SSM multicasts and add LWM header options in order to transit the traffic across an LWM-only network. In fact, this could become a popular mode of operation, using SSM in a heavyweight core (MBONE) and then using LWM to deliver traffic into networks that don’t support heavyweight multicast. Moving in the other direction, LWM multicasts could be addressed to the mixed-mode router as one of the multiple destination addresses, to be stripped off and delivered as SSM into the heavyweight core. Adding and removing IP header options could be problematic. In the case of IPv4, due to the size limit, it might be impossible, and in the case of IPv6 it might trigger fragmentation. In any event, adding and removing headers is an operation routers tend to avoid, but let’s continue with the analysis (for now).
Another real possibility is a heavyweight island in a predominately lightweight network. Or more precisely, heavyweight LANs (like Ethernet) interconnected by lightweight-only WAN links. Using mixed-mode routers here would allow a single destination address (the router’s) to be listed in the LWM headers. Having received the LWM packet over the WAN link, the router would then multicast into the Ethernet LAN using localized heavyweight SSM. This would allow a large number of closely clustered subscribers to take advantage of LWM without putting all their addresses into the LWM headers.
All this adds up to a complex, mixed-mode network with heavyweight cores and heavyweight islands interconnected by lightweight WANs. In fact, this is probably the most attractive use of LWM. You use heavyweight multicast on stub LANs dominated by local traffic using routers with small tables and default routes, where the extra cost of maintaining a handful of multicast routes (those being used by local devices) is minimal. You use lightweight multicast on core WANs dominated by transit traffic using routers with large, defaultless tables that need to support large numbers of (foreign) multicast sessions without piling anything extra onto the routing tables.
This would seem to suggest that the mixed-mode router, and not the origin source, should pick up the LWM joins. Of course, the receiver sending the joins probably has no way of knowing where this mixed-mode router is. Yet this is exactly the scenario envisioned by the authors of RFC 2113, the standard-track IP Router Alert header option:
-
It is also desirable to be able to gradually deploy the new technology, specifically to avoid having to upgrade all routers in the path between source and destination. This goal is somewhat at odds with the least common denominator information available, since a router that is not immediately adjacent to another router supporting the new protocol has no way of determining the location or identity of other such routers (unless something like a flooding algorithm is implemented over unicast forwarding, which conflicts with the simplicity goal).
…
The goal, then, is to provide a mechanism whereby routers can intercept packets not addressed to them directly, without incurring any significant performance penalty. This document defines a new IP option type, Router Alert, for this purpose.
A similar option is standard-tracked for IPv6 in RFC 2711. Both IGMPv3 (RFC 3376) and MLDv2 (RFC 3810) require the use of their respective Router Alert options. RFC 3376:
-
IGMP messages are encapsulated in IPv4 datagrams, with an IP protocol number of 2. Every IGMP message described in this document is sent with an IP Time-to-Live of 1, IP Precedence of Internetwork Control (e.g., Type of Service 0xc0), and carries an IP Router Alert option [RFC-2113] in its IP header.
If we change these semantics to use a larger IP TTL value and address the IGMP packet to the source address of the multicast feed, then the packet (hopefully) will be routed back along the path to the source. Heavyweight multicast routers along the path will pick up the IGMP message (because of the combination of its IP Router Alert and IGMP message type) and process it using a fairly normal heavyweight procedure – add the source to its multicast routing table and resend a similar IGMP join again towards the source – similar, but with its own outbound IP address as the new source address. Lightweight multicast routers along the path simply ignore IGMP packets. Mixed-mode routers along lightweight/heavyweight boundaries need to recognize when they’ve received a join from the lightweight region (this is probably obvious because the source isn’t on a directly attached interface) and make suitable provisions to add (or remove) LWM headers from the actual data packets that will follow.
Now we can return to an issue I skipped over earlier – the problem of forwarding LWM packets over something like Ethernet. You have the link-layer multicast procedure described in RFC 1112, based on the standard multicast address, but how do other routers on the segment know which multicast address they need to subscribe to? The simple answer is that they don’t. You duplicate the packet and unicast multiple copies over the Ethernet… if it’s operating in lightweight mode, where the routers don’t maintain routing table state for the multicast address (and thus have no way to know which multicast addresses to subscribe to). Or you operate the Ethernet in heavyweight mode, where you can use link-layer multicast because now the routers do maintain routing table state and thus know which multicast addresses to listen for.
INTEROPERATION WITH NON-MULTICAST DOMAINS
Large tracts of the Internet currently have no multicast support whatsoever. How can LWM interoperate with a non-multicast domain?
First, let’s consider what happens when we apply the procedure outlined in the last section to a network with a non-multicast domain. The IGMP join (hopefully it’s not completely filtered out) would be routed across the non-multicast domain unaltered. Hopefully it hits a mixed-mode router on the other side; we specifically don’t want it processed by an SSM-only router that might not even understand an IGMP message with a TTL other than one. Since the source address of the IGMP message would not be on a directly attached interface, that foreign mixed-mode router would conclude that it needs to use LWM. Of course, the LWM header options would not be understood by the non-multicast domain, and since its routers would presumably lack any kind of routing information for the multicast address, we’d probably want the LWM options labeled to trigger an ICMP error message when they hit a non-LWM router. The source address of the ICMP error messages would indicate the first non-LWM router along the path. The destination address of the ICMP error messages would be the multicast sender.
Now, what we’d really like is for those ICMP messages to come back to the router that added the LWM options, but they’re going to come back to the source. The next best thing is to take advantage of the copy of the IP header that gets included in the ICMP Parameter Problem message. That’s why we include an Option Originator field in the LWM header options. When a multicast source receives an ICMP Parameter Problem for an LWM packet, it simply pulls the Option Originator field out of the error message and retransmits the ICMP Parameter Problem to that address. Thus we ensure that the ICMP error messages come back to the device that added the LWM option, at the price of requiring a change in end system behavior. This is the only change to end system behavior required by LWM. However, it should be noted that without this change, an endless stream of unresolved ICMP Parameter Problems will be the result.
What does the router do with the ICMP Parameter Problem message? The only possible solution that I see (other than completely removing the problem destination from the distribution list), is to switch to a unicast mode of transport. Which destinations need to be unicast? Well, exactly those that still remain in the distribution list that got copied back in the Parameter Problem message. How to unicast the packets? Probably using something like GRE to encapsulate the multicast packet inside a unicast header and thus get it across the unicast network.
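To make the error-handling path concrete, here is a small sketch that takes the LWM option bytes copied back inside an ICMP Parameter Problem, extracts the Option Originator (where the source would relay the error) and lists the destinations that were still live (the ones the originating router would switch to unicast). It assumes the compact IPv4 layout from the option diagram; locating the option inside the ICMP payload is omitted, and all function names and the type value are hypothetical.

```python
import socket
import struct

ZERO = "0.0.0.0"


def parse_compact_v4(option):
    """Split compact LWM option bytes into (originator, [addresses])."""
    _type, length, _reserved = struct.unpack("!BBH", option[:4])
    if length < 8 or length > len(option) or length % 4:
        raise ValueError("malformed LWM option")
    originator = socket.inet_ntoa(option[4:8])
    addresses = [
        socket.inet_ntoa(option[i:i + 4]) for i in range(8, length, 4)
    ]
    return originator, addresses


def unicast_targets(option):
    """Return (relay_to, destinations still present in the returned list)."""
    originator, addresses = parse_compact_v4(option)
    return originator, [a for a in addresses if a != ZERO]


# Sample option as it might be echoed back: first address already struck out.
sample = (
    struct.pack("!BBH", 0x9E, 16, 0)      # 0x9E is a placeholder type value
    + socket.inet_aton("192.0.2.1")       # option originator
    + bytes(4)                            # struck-out destination
    + socket.inet_aton("203.0.113.9")     # destination behind the non-LWM router
)
relay_to, targets = unicast_targets(sample)
print(relay_to, targets)
```

The surviving entries are exactly the addresses that had not been struck out by the time the packet reached the non-LWM router, which matches the rule stated in the paragraph above.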
Furthermore, the originating router can even assume that everything between here and there (the router that generated the Parameter Problem) is LWM-capable, and thus could use an LWM source route to deliver a single copy of the packets to the last LWM router before the one that triggered the ICMP error. Investigate this further.
MODE TRANSITION
The additional load of SSM over LWM is borne entirely by the network, and in fact SSM is simpler than LWM for the end systems because it requires less management. So end systems have little motivation not to favor SSM over LWM (assuming that the choice is available at all). On the other hand, networks have a strong motivation to favor LWM over SSM… unless there are so many LWM receivers for a given feed that the size of the packet headers, and cost of processing them, becomes prohibitive. In short, one of the nice things about the mixed-mode network described above is that it lets individual administrative domains decide whether they want to use LWM or SSM, and to make this decision on a per-channel basis.
Assuming that this decision is made dynamically, we need to ask if we can transition an LWM multicast session to or from SSM depending on its subscriber load. More precisely, given the above discussion, can we transition a portion of the network back and forth between SSM and LWM?
Let’s consider first the LWM-to-SSM transition, which would be undertaken when an LWM session gained enough subscribers to warrant transitioning to SSM. This could be accomplished by multicasting an announcement using LWM, informing the routers along the path of the imminent transition, and prompting them to construct an SSM routing tree that exactly mimics the propagation path taken by the LWM packets. This routing tree would presumably extend throughout the portion of the network making the transition. After a suitable length of time had passed, the sender (more generally the domain’s ingress router) could drop the LWM header options from future packets, and the session would have transitioned to SSM.
An SSM-to-LWM transition (triggered now by a fall in subscribers, rather than an increase) is complicated by the sender’s lack of information about the subscriber addresses. The first question to be asked is, how does the sender (or any other device) even determine that there are few enough subscribers to perform an SSM-to-LWM transition? Some higher level protocols, such as RTP, have an additional feedback data flow that would allow the sender to at least roughly gauge the number of subscribers. Suffice it to say that an out-of-band mechanism may be available, but probably a multicast ‘ping’ could be used to measure subscriber load. Such a ping could be probabilistic in nature – the echo request could include a probability with which the destinations should reply. A low probability request, say 0.1%, would trigger on average one response per thousand subscribers, so the subscriber count can be estimated as the number of replies divided by the probability. By restricting the ping to an administrative domain, and requiring egress routers to process such restricted pings instead of routing them on, the amount of dispersion within a particular domain could be measured. Once an SSM-to-LWM decision has been made, or is at least being seriously considered, a series of multicast pings with 100% probabilities could be used to construct an LWM destination list. Finally, the session would transition to LWM and a series of IGMP/MLD announcements would be sent to the routers informing them that they can discard their SSM routing tables.
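The probabilistic ping lends itself to a quick simulation. The sketch below is illustrative only: the function names are invented, each subscriber is assumed to decide independently whether to reply, and packet loss is ignored.

```python
import random


def simulate_ping(subscribers, probability, rng):
    """Count replies when each subscriber answers with the given probability."""
    return sum(1 for _ in range(subscribers) if rng.random() < probability)


def estimate_subscribers(replies, probability):
    """Estimate the subscriber count from the number of replies received."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return replies / probability


rng = random.Random(2024)  # fixed seed so repeated runs give the same numbers
replies = simulate_ping(50_000, 0.001, rng)
print("replies:", replies)
print("estimated subscribers:", estimate_subscribers(replies, 0.001))

# With a 100% probability every subscriber replies, which is the mode
# suggested for building the final LWM destination list.
print(simulate_ping(8, 1.0, rng))
```

With few replies the estimate is coarse, so a sender would presumably repeat the ping, or raise the probability, before committing to a transition.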
Doing all this in the face of unreliable delivery requires more thought.
USE OF FLOW LABELS
IPv6 provides a 20-bit Flow Label header field that can be used to optimize router operation. Quoting some relevant sections from RFC 3697 (IPv6 Flow Label Specification):
-
The usage of the 3-tuple of the Flow Label and the Source and Destination Address fields enables efficient IPv6 flow classification, where only IPv6 main header fields in fixed positions are used.
…
Nodes keeping dynamic flow state MUST NOT assume packets arriving 120 seconds or more after the previous packet of a flow still belong to the same flow, unless a flow state establishment method in use defines a longer flow state lifetime or the flow state has been explicitly refreshed within the lifetime duration.
and RFC 2460 (IPv6 Specification):
-
All packets belonging to the same flow must be sent with the same source address, destination address, and flow label. If any of those packets includes a Hop-by-Hop Options header, then they all must be originated with the same Hop-by-Hop Options header contents (excluding the Next Header field of the Hop-by-Hop Options header). If any of those packets includes a Routing header, then they all must be originated with the same contents in all extension headers up to and including the Routing header (excluding the Next Header field in the Routing header).
A straightforward application of these principles should allow routers to efficiently process LWM packets without having to re-parse the LWM headers for every packet. No additional flow setup mechanism appears to be required. Routers simply parse the LWM headers for the first packet they receive in the flow, calculate the appropriate processing required, then cache the (Source, Multicast Destination, Flow ID) tuple for future use. No flow-state cleanup is provided other than the default 120-second inactivity timeout.
The big problem I can see here is with mixed-mode routers trying to add or remove packet headers. The quote from RFC 2460 makes it clear that you can’t change Routing headers within flows, and the flow IDs are expected to be assigned by the source. So how can a router add (or remove) LWM headers? It would seem that such a router would have to generate its own flow labels, and (to avoid duplication) track them on a per-source basis. This is way too much trouble, yet we definitely want to use flow labels to avoid parsing the complex LWM headers for every packet. All this seems to suggest the use of…
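A minimal sketch of such a flow cache, assuming the router keys its state on the (source, destination, flow label) 3-tuple and treats a gap of 120 seconds or more as the end of the flow. The class name, the parse callback and the explicit now argument (used instead of a real clock so the example is deterministic) are all hypothetical.

```python
FLOW_TIMEOUT = 120.0  # seconds of inactivity after which flow state is stale


class FlowCache:
    """Cache the forwarding action computed from a flow's LWM headers."""

    def __init__(self, parse):
        self._parse = parse   # callable: LWM headers -> forwarding action
        self._entries = {}    # (src, dst, label) -> (action, last_seen)

    def lookup(self, src, dst, label, headers, now):
        key = (src, dst, label)
        entry = self._entries.get(key)
        if entry is None or now - entry[1] >= FLOW_TIMEOUT:
            action = self._parse(headers)  # first packet, or flow went stale
        else:
            action = entry[0]              # reuse cached result, skip parsing
        self._entries[key] = (action, now)  # every packet refreshes the timer
        return action


parse_calls = []


def parse_headers(headers):
    """Stand-in for the expensive LWM header parse; records each call."""
    parse_calls.append(headers)
    return "action-%d" % len(parse_calls)


cache = FlowCache(parse_headers)
for t in (0.0, 100.0, 219.9, 340.0):
    print(t, cache.lookup("2001:db8::1", "ff3e::1234", 7, "lwm-hdr", now=t))
```

In this run the headers are parsed at t=0 and again at t=340, because the gap since the previous packet (at 219.9) exceeds the timeout; the packets in between reuse the cached action.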
GRE ENCAPSULATION
Under this variant, we simply don’t add or remove IP headers. When we want to do something like that, we encapsulate in a GRE packet. GRE encapsulation is used as a complete alternative to adding or removing headers. We don’t need an “option originator address”, either. Flow labeling and ICMP error handling are much cleaner. The encapsulating router uses its own source address (so ICMP errors come back to it) and generates its own flow labels, which are simply listed as part of the routing entry which basically states, “encapsulate with LWM options X and flow label Y”.
In addition to these obvious benefits, GRE encapsulation has the enormous advantage of being much more in line with what we expect from router behavior, and thus much less likely to break obscure assumptions that might be overlooked at first.
A perceived disadvantage of GRE encapsulation is the extra burden it places on routers, but I would suggest this is illusory. Basically, anything that requires any but the most trivial header modification (decrementing TTL), and certainly anything that changes the size of the packet is quite demanding of router resources. So encapsulating in GRE really isn’t much more demanding than adding or removing headers.
For me, the biggest disadvantage of GRE encapsulation is psychological. Mentally, I equate encapsulation with hacked-on kludges that break underlying theoretical assumptions in the interest of expediency. In this case, however, I am hard pressed to back this up. First, I think the heavyweight model is flawed because its demands on routing tables don’t scale to numbers of multicast sessions on the order of the current number of unicast sessions. Next, I note that relieving this load by putting multicast routing information in packet headers is an elegant solution completely in line with what has been historically regarded as good networking practice. Yet the heavyweight model also has its niche; I don’t think anyone would seriously consider abandoning PIM in favor of explicit distribution lists in packet headers. So, then, the two need to interoperate. How to achieve this?
It’s at this point that the “kludge alert” really seems to go off in my mind for the first time. Maybe we need more sophisticated session-oriented protocols, à la ATM, so we can signal routers to add nodes to existing distribution trees. Maybe we need an improved MPLS, so we just push a tag on the front of the multicast packet and it routes accordingly. Maybe our base architecture is too unicast-oriented; we want something built from the ground up to handle multicast as the primitive case, of which unicast is just a special case.
In any event, I don’t see any easy answers to these questions. Beyond noting that I suspect this proposal is, at least partially, another NAT, I also admit that I don’t see anything better at the moment. So just as we implemented NAT and ultimately accepted it as the cost of not properly implementing source routing, I suggest we implement LWM with GRE encapsulation and accept this as the cost of some not-yet-understood design compromise made years ago.
USE OF MPLS
Multiprotocol Label Switching (MPLS), RFC 3031 and others, is a popular method of optimizing router performance by essentially picking multiple next hops at once; a packet’s entire path through part of the network can specified by attaching a label to the packet. As section 4.8 of RFC 3031 makes clear, MPLS is designed to interoperate with multicast, in the sense that NHLFEs (Next Hop Label Forwarding Entries) can specify multiple destinations to receive identical copies of the packet. The key word here is “identical”. Traditional, heavyweight multicast does not modify multicast packets as they traverse the network, but lightweight multicast requires addresses to be struck from the headers (to avoid routing loops) as packets are duplicated.
Can this modification be performed as packets leave an MPLS domain?
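The address-striking rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_by_next_hop`, the toy routing table of `(prefix, next_hop)` pairs, and the next-hop labels are all hypothetical and not part of any specification. The point it shows is that each outgoing copy carries only the destinations routed through its own next hop, so no two copies are ever identical.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network


def split_by_next_hop(destinations, routes):
    """Group an LWM destination list by unicast next hop.

    destinations: address strings taken from the (hypothetical) LWM header.
    routes: (prefix, next_hop) pairs standing in for a unicast routing table.
    Returns {next_hop: [destinations]}; every address not listed under a
    next hop has been struck from the copy sent that way.
    """
    table = [(ip_network(prefix), hop) for prefix, hop in routes]

    def lookup(addr):
        # Longest-prefix match over the toy table.
        matches = [(net, hop) for net, hop in table if ip_address(addr) in net]
        if not matches:
            raise LookupError(f"no route to {addr}")
        return max(matches, key=lambda m: m[0].prefixlen)[1]

    copies = defaultdict(list)
    for dest in destinations:
        copies[lookup(dest)].append(dest)
    return dict(copies)
```

With three destinations, two of which share a next hop, the router would emit two copies: one listing two addresses and one listing a single address. An MPLS NHLFE that can only replicate identical copies has no natural place to perform this per-copy rewrite, which is why the question above matters.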
THE BOTTOM LINE
So what is the bottom line? I like lightweight multicast. I think there’s a serious scaling problem with traditional multicast’s routing structure, and I like the idea of relieving the load on router tables by moving information into the packet headers. I like the flexibility of using lightweight multicast in some parts of the network, and heavyweight multicast in others.
On the other hand, I’m suspicious of the whole business of adding and removing headers along the way. The problems it creates (fragmentation, flow labels, routing ICMP Parameter Problems) are sufficiently severe that I think GRE encapsulation should be considered as a complete alternative to adding or removing headers. I’m suspicious of this as well, but think it’s an acceptable compromise at this stage to achieve clear gains in router operation.
So what needs to be done? Well, LWM-compliant routers MUST route packets according to any LWM headers that may be present, and MAY encapsulate multicast packets within GRE packets in order to transit them across an LWM network.
Now the only change we make to receiver behavior is that IGMP joins are sent with a standard TTL. Routers forwarding IGMP joins SHOULD send them with a standard TTL (but see next paragraph).
Any device that sends IGMP joins with a standard TTL (both end systems and routers) MUST be prepared to accept GRE-encapsulated multicast packets.
Sender end systems MAY want to include some router functionality to be able to process foreign IGMP joins that come all the way back to them.
And finally, we MUST investigate how ASM would be affected by LWM. Initially, I suppose that LWM could be well used to distribute multicast packets out from a PIM-SM RP. And we MUST figure out precisely how a domain can transition into heavyweight multicast when the number of recipients becomes high enough.
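The requirement above that join-sending devices accept GRE-encapsulated multicast packets can be illustrated with the base GRE header from RFC 2784, which is four bytes when no optional fields are present: sixteen bits of flags and version (all zero here) followed by a sixteen-bit protocol type. The sketch below is a minimal illustration only. The function names are my own, the outer IP header (protocol 47) is omitted, and it deliberately rejects anything other than the bare four-byte header.

```python
import struct

GRE_PROTO_IPV4 = 0x0800  # EtherType of the encapsulated packet
GRE_PROTO_IPV6 = 0x86DD


def gre_encapsulate(inner_packet, proto=GRE_PROTO_IPV4):
    """Prefix an inner packet with a bare 4-byte GRE header."""
    # Flags and version are all zero: no checksum, GRE version 0.
    return struct.pack("!HH", 0, proto) + inner_packet


def gre_decapsulate(payload):
    """Return (protocol_type, inner_packet) from a bare GRE payload."""
    if len(payload) < 4:
        raise ValueError("truncated GRE header")
    flags_version, proto = struct.unpack("!HH", payload[:4])
    if flags_version != 0:
        # Optional fields (checksum, key, ...) are outside this sketch.
        raise ValueError("only the bare GRE header is handled here")
    return proto, payload[4:]
```

A receiver would hand the returned inner packet to its normal multicast input path, exactly as if it had arrived natively.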
EXAMPLES
[Figure: example topology. Host H1 attaches to network N1, whose border router is XX; host H2 attaches to network N2, whose border router is YY; host H3 attaches to network N3, whose border router is ZZ. Network NX interconnects the three.]
Example 1a: N1/N2/N3/NX are all non-multicast domains. H1 wants to subscribe to a multicast session provided by H2.
H1 sends an IGMP join which gets routed all the way to H2. H2 detects that this is a join from a non-directly-attached device and thus processes it as LWM. H2’s first attempt to send an LWM packet triggers an ICMP Parameter Problem message from a local router in N2. H2 therefore falls back to unicast and transmits further multicast packets to H1 encapsulated in unicast GRE packets.
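How H2 recognizes a join from a non-directly-attached device is not spelled out here. One plausible check, offered purely as an assumption, is to compare the join’s source address against the prefixes configured on the receiving device’s own interfaces. The function name `join_is_remote` and the example prefixes are hypothetical.

```python
from ipaddress import ip_address, ip_network


def join_is_remote(join_source, attached_networks):
    """Return True if the join did not originate on a directly attached subnet.

    join_source: source address of the received IGMP join, as a string.
    attached_networks: prefixes configured on this device's interfaces.
    Assumption: "directly attached" means "inside one of these prefixes".
    """
    src = ip_address(join_source)
    return not any(src in ip_network(net) for net in attached_networks)
```

Under this assumption, a join from an address inside one of H2’s own subnets would be handled as ordinary local IGMP, while any other source address would trigger LWM processing.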
Example 1b: Now H3 wants to subscribe as well.
The exact same thing happens. H2 ends up transmitting all its multicast packets twice, embedding both copies into unicast GRE packets.
Example 2: N1/N2/N3 are heavyweight multicast domains. NX is a non-multicast domain. H1 wants to subscribe to a multicast session provided by H2.
H1 sends an IGMP join which gets processed by the heavyweight routers in network N1. In particular, they build a multicast routing tree from router XX to host H1. Router XX’s multicast join (with XX listed as the source address) routes all the way to router YY, which detects a non-directly-attached device and processes this as LWM. Router YY continues the multicast join by sending it along towards H2. The routers in N2 process this as heavyweight and build a multicast routing tree from H2 to router YY. Multicast data packets are distributed ‘normally’ within network N2, with router YY as one of the subscribers. YY’s routing table indicates the need for LWM forwarding to router XX. The first attempt to do so triggers an ICMP Parameter Problem message from network NX. If we’re using GRE encapsulation, then there’s probably no ICMP error message at all, since with only a single recipient there’s no need to use LWM headers. If we’re not using GRE encapsulation, the ICMP message goes all the way back to H2, which has to forward the message to YY. In either case, YY now falls back on encapsulating multicast packets within GRE and unicasting these packets to XX. Once XX receives the packets, normal heavyweight distribution proceeds within network N1.
Note that additional subscribers within network N1 (or within network N2) create no additional traffic on network NX.
Example 2b: Now H3 wants to subscribe.
The exact same thing happens, up to the point where YY has to begin transmitting to ZZ as well as XX. Now, if we are using GRE encapsulation for LWM packets, YY tries (for the first time) to attach LWM headers to the GRE packet listing both XX and ZZ as destinations. This triggers an ICMP Parameter Problem error, which comes back to router YY (the source of the GRE packets). If YY was attaching LWM headers directly to the multicast packets, then YY already has seen an ICMP Parameter Problem (forwarded from H2 after YY’s first attempt to forward an LWM multicast packet). In either event, YY now knows that network NX can’t support LWM and begins duplicating the multicast packets, encapsulating them into GRE packets and unicasting one copy to XX and one copy to ZZ. We now proceed as before.
Again, note that additional subscribers in N3 (as well as N2 and N1) are handled without any additional load on NX.
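The decisions YY makes across examples 2 and 2b can be summarized as a small state machine. The sketch below is my own condensation of the prose, not a specification: the class `LwmSender`, its method names, and the `Mode` labels are hypothetical, and timers, retries, and per-destination failure tracking are all left out.

```python
from enum import Enum


class Mode(Enum):
    LWM = "lwm"                  # one packet listing several destinations
    GRE_UNICAST = "gre-unicast"  # one GRE-wrapped copy per destination


class LwmSender:
    def __init__(self, encapsulate):
        # True: wrap in GRE first and attach LWM headers to the GRE packet.
        # False: attach LWM headers directly to the multicast packet.
        self.encapsulate = encapsulate
        self.subscribers = []
        self.lwm_failed = False

    def add_subscriber(self, addr):
        if addr not in self.subscribers:
            self.subscribers.append(addr)

    def on_icmp_parameter_problem(self):
        # The transit network rejected the LWM option; stop trying it.
        self.lwm_failed = True

    def plan(self):
        """Return the (mode, destinations) packets for one multicast datagram."""
        single = len(self.subscribers) == 1
        if self.lwm_failed or (self.encapsulate and single):
            # Fallback, or a lone recipient that needs no LWM header at all.
            return [(Mode.GRE_UNICAST, [s]) for s in self.subscribers]
        if not self.subscribers:
            return []
        return [(Mode.LWM, list(self.subscribers))]
```

With `encapsulate=True`, the first LWM attempt only happens once a second subscriber appears, which is why the ICMP error in example 2b arrives at that moment rather than earlier.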
Example 3: Same as example 2, but now NX is a lightweight multicast domain.
Everything proceeds as before. Note particularly that the LWM routers in NX forward the IGMP joins without processing them. The _only_ thing LWM routers have to do is process the LWM header options, and if we’re using IPv6, then we try to make things easy on them by flow labeling everything. Once we’re at the point where YY attempts to use LWM headers (either on the original packets, if we’re not encapsulating, or when H3/ZZ subscribes, if we’re encapsulating), there’s no ICMP Parameter Problem error. The LWM headers work, and YY can use them to transmit single packets into NX bound for both XX and ZZ. The packets are duplicated somewhere inside NX where the routing for XX and ZZ diverges.
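To make the saving concrete, the following sketch counts packet copies per link inside NX for one multicast datagram, comparing the unicast GRE fallback of example 2b with LWM forwarding. The interior routers R1, R2, and R3 are invented for illustration, as is the function name `link_loads`, and the sketch assumes the unicast paths from YY form a tree.

```python
from collections import Counter


def link_loads(paths, lwm):
    """Count copies of one datagram on each directed link.

    paths: one hop-by-hop unicast path per destination, all from one source.
    lwm: True if routers duplicate only where the paths diverge.
    """
    per_link = Counter()
    for path in paths:
        per_link.update(zip(path, path[1:]))  # consecutive hops form links
    if lwm:
        # A shared link carries a single copy listing every destination
        # reached through it.
        return {link: 1 for link in per_link}
    return dict(per_link)


paths = [
    ["YY", "R1", "R2", "XX"],
    ["YY", "R1", "R3", "ZZ"],
]
unicast = link_loads(paths, lwm=False)
lightweight = link_loads(paths, lwm=True)
```

In this toy topology the link from YY to R1 carries two copies under unicast fallback and one under LWM; R1 is the point where the routing diverges and the duplication happens.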
SUMMARY OF MULTICAST-RELATED RFCs
RFC 1112 – Host extensions for IP multicasting (IGMP)
RFC 2189/2201 – Core-Based Trees (CBTv2)
RFC 2236 – IGMPv2
RFC 2362 – PIM-SM
RFC 2710 – Multicast Listener Discovery (MLD) for IPv6
RFC 2730 – MADCAP – Multicast Address Dynamic Client Allocation Protocol
RFC 3569 – overview of Source Specific Multicast (SSM)
RFC 3618 – Multicast Source Discovery Protocol
RFC 3956 – Embedding the RP address in the IPv6 Multicast Address
RFC 3973 – PIM-DM
RFC 3450 – Asynchronous Layered Coding (ALC)
RFC 4286 – Multicast Router Discovery Protocol
RFC 4410 – Selectively Reliable Multicast Protocol
RFC 791 – IPv4
RFC 2460 – IPv6
Since writing this, I’ve come across RFC 5058 and learned that others are thinking along the same lines. In their scheme, called “Xcast”, a dedicated multicast address (All-Xcast-Routers) is used for the destination address, and devices that don’t support Xcast do not process these packets at all. They also use a separate protocol-level header for Xcast, not an IP header option (at least not in IPv4; in IPv6 there is little real distinction). No modification is envisioned to multicast joins; instead SIP is expected to support Xcast. As in my essay, several transition modes are discussed.
Looks like they’ve got a mailing list, and a wiki.