1
[HTML reformatted by RAM -- 04/01/2001]
3
Ultrapeers: Another Step Towards Gnutella Scalability
11
{anurag, crohrs}@limewire.com
13
Abstract: We propose a scheme to have a hierarchical gnutella
14
network by categorizing the nodes on the network as client-nodes and
15
super-nodes. A client-node keeps only one connection open, and that is
16
to a super-node. A super-node acts as a proxy to the gnutella network
17
for the client nodes connected to it. This has a effect of making
18
the gnutella network scale, by reducing the number of nodes on the
19
network involved in message handling and routing, as well as reducing
20
the actual traffic among them.
26
The Gnutella network is a peer-to-peer network, all the hosts on the
27
network are treated equal regardless of their bandwidth capabilities or
28
computation power or other properties (like uptime, connection capability
29
etc). Nodes on the network communicate using either of Gnutella Protocol
30
0.4 [PROT04], or an enhanced version that uses new Connection Handshaking
31
mechanism [GC01], both of which have many interesting properties. New
32
hosts can join any time, and the current hosts on the network may leave
33
the network any time. The ability to have a dependable network, without
34
dependence on any particular host (or set of hosts), is a remarkable
35
feature of gnutella that has led to its immense popularity. Along with
36
this there are many other remarkable features of gnutella network
37
[WHYGNUT01] that distinguish it from other systems.
39
But along with all the good features that gnutella possesses, there have
40
always been doubts cast upon the scalability of gnutella network. That?s
41
primarily because of the overabundance of messages flowing thru the
42
gnutella network including broadcast pings and queries, as well as the
43
inability to assure content availability close enough to the searching
44
host. Queries are sent to large number of hosts in an attempt to secure
45
a hit from few of them. This has led to bandwidth problems in the past
46
as well as in the present.
48
Gnutella developers have taken a number of very promising initiatives to
49
reduce the number of messages flowing in the gnutella network, by making
50
the methods more directed as well as carry more information. Notable among
51
these are Ping Pong Scheme Proposal [CV01] which reduces the number of
52
messages by eliminating the ones not contributing much to the system,
53
Meta Information Search Proposal (S01) which attempts to make search
54
more specific, and Query Routing Proposal [C00] which reduces the query
55
traffic by doing selective broadcast.
57
Although the above initiatives are a promising step towards making the
58
network efficient, there definitely exist other supplementary schemes,
59
one of which is the scheme proposed in this paper that has a major
60
potential of making network scalable and efficient. We propose a scheme
61
which if implemented will create a two level hierarchy of nodes in the
62
gnutella network, where faster nodes (ones having better networking
63
capabilities and CPU power) take charge of much of the load on slower
64
nodes. In the proposed scheme this restructuring of the network takes
65
place automatically and dynamically, and works in unison with the other
66
initiatives mentioned above. The restructured network will have a effect
67
of significant reduction in the number of messages flowing thru the
70
At the same point, we would like to mention that the concept of
71
the ultrapeers is not new, and has been in existence in variety of
72
applications (including proxy access to web which has static ultrapeer
73
selection, as well as distributed transaction processing which has
74
dynamic ultrapeer selection algorithm). Also it has been applied to
75
recent peer-to-peer applications using FastTrack [FTRACK], or other
76
such similar technologies. The application of the ultrapeer concept in
77
Peer-to-Peer application has been done using proprietary algorithms and
78
protocols. Our contribution in this paper is to come up with a scheme
79
to implement the dynamic ultrapeer selection mechanism in the Gnutella
80
network. No such scheme has been published as yet. Our approach may or
81
may not be similar to the proprietary ones in existence, with which we
82
have no way to compare.
84
We present our scheme as follows. First we describe how ultrapeers work
85
in an ideal network with a static topology. In doing so, we discuss
86
options for ultrapeers to index [xyz1]leaf nodes. Next we tackle the
87
more difficult problem of ?ultrapeer election?, i.e., of dynamically
88
constructing a tiered network. We describe a handshaking mechanism
89
based on the recent Gnutella 0.6 protocol in detail and the concept
90
of ?ultrapeer guidance?. Finally, we conclude with some performance
91
observations and directions for future work.
94
2. Routing with Ultrapeers
96
Nodes in this network can be divided into leaf nodes and ultrapeers.
97
Leaf nodes maintain only a single connection to a ultrapeer. Ultrapeers
98
maintain many (10-100) leaf connections, as well as a small (<10)
99
number of connections to other ultrapeers. Ultrapeers shield leaf nodes
100
from virtually all ping and query traffic. This can be done in one of
104
* Reflector indexing: in the style of the Clip2 Reflector[xyz2],
105
ultrapeers periodically send an indexing query (four spaces,
106
i.e., ? ?) to leaf nodes. Leaf nodes respond with a query
107
reply naming all shared files. (The leaf can send multiple query
108
reply messages if sharing more than 255 messages.) The ultrapeer
109
uses these replies to build an index of its leaves? contents.
110
The ultrapeer then responds to queries on behalf of leaf nodes,
111
shielding them from all query traffic.
113
* QRP routing: leaf nodes periodically send ultrapeers query
114
routing tables using LimeWire?s proposed Query Routing Protocol
115
[xyz3](QRP). Ultrapeers then forward queries only to those leaf
116
nodes whose QRP table has a corresponding entry. Leaves respond
117
in the normal manner. Ultrapeers do not propogate QRP tables
118
amongst themselves.[1]
121
Hybrid schemes are of course possible. However, we recommend QRP routing
122
for several reasons[xyz4]:
124
* Better QHD support: because leaf nodes are given the chance
125
to respond to all queries, they can provide up-to-the-minute
126
information in the QHD, such as estimated download speed and
129
* Update efficiency: because QRP tables eliminate redundant keywords
130
and add compression, they are likely to be significantly smaller
131
than replies to indexing queries. Furthermore, a leaf node only
132
needs to send incremental QRP updates when its shared files
133
have changed. In contrast, a reflector-style ultrapeer would
134
need to periodically re-index all of the leaves? files-this takes
137
* CPU efficiency: it is very cheap (constant time) for a ultrapeer
138
to decide whether to forward a query to a leaf node with the
139
QRP proposal. The reflector-style scheme can be implemented
140
efficiently, but it is considerably more difficult.
142
* Privacy: ultrapeers do not actually know what files are shared by
143
leaf nodes-only those files? hashes.
145
* Ease of implementation: LimeWire has already implemented QRP,
146
and the code is available to the public under the GNU Public
147
License (GPL). Maintaing a single QRP table per connection is
148
easier to implement than building a ?virtual file manager? for
152
A more difficult question is how ultrapeers communicate amongst
153
themselves. Initially we recommend simple broadcast as with the current
154
Gnutella protocol. Eventually ultrapeers may use more complicated
155
protocols. One idea is for a ultrapeer to not forward a query if it
156
receives a large number of replies from its leaf nodes. QRP could also
157
be used between ultrapeers.
159
The proposed scheme is backward compatible with older clients. From the
160
perspective of newer clients, old clients are simply ultrapeers that
161
do not maintain any shielded leaf connections. The scheme benefits the
162
network even in the presence of old clients, though of course it would
163
be better if all clients supported ultrapeers.
166
3. Ultrapeer Election
169
Now we return to the question of how the ultrapeer-leaf hiearchy is
170
created in a distributed manner. Hosts regularly determine whether they
171
are eligible to be ultrapeers (?ultrapeer capable?) by looking at uptime,
172
operating system, bandwidth, etc. We recommend the following requirements
176
* Not firewalled. This can usually be approximated by looking at
177
whether the host has received incoming connections.
179
* Suitable operating system. Some operating systems handle large
180
numbers of sockets better than others. Linux, Windows 2000/NT/XP,
181
and Mac OS/X will typically make better ultrapeers than Windows
182
95/98/ME or Mac Classic.
184
* Sufficient bandwidth. We recommend at least 15KB/s downstream and
185
10KB/s upstream bandwidth. This can be approximated by looking
186
at the maximum upload and download throughput.
188
* Sufficient uptime. Ultrapeers should have long expected uptimes.
189
A reasonable heuristic is that the expected future uptime is
190
proportional to the current uptime. That is, nodes should not
191
become ultrapeers until they have been running for several minutes.
194
It is important to distinguish between hosts that are ultrapeer capable
195
and hosts that are actually ultrapeers:
198
* Ultrapeer: hosts that are ultrapeer capable and not a shielded leaf
199
node to an ultrapeer. Note that an ultrapeer does not necessarily
200
have leaf connections.
202
* Leaf: hosts that have only a single, routed connection to an
203
ultrapeer. Note that leaves may actually be ultrapeer capable.
206
When hosts connect, they express their capabilities and intents using
207
Gnutella 0.6 [xyz5] connection headers. In some cases, this will require
208
some negotiation to prevent too many nodes from becoming ultrapeers.
209
There are five possible cases, shown below.
213
3.1. Leaf to Ultrapeer
215
The simplest case is a leaf node connecting to a ultrapeer. The key
216
header here is "X-Ultrapeer", which tells whether a host plans on acting
217
as a ultrapeer (if true) or a shielded node (if false). When this
218
interaction is over, the leaf is a shielded node of the ultrapeer.
219
The leaf should drop any Gnutella 0.4 connection it has and send a QRP
226
User-Agent: LimeWire 1.9
228
X-My-Address: 10.254.0.16:6349
232
X-Ultrapeer-Needed: false
233
User-Agent: LimeWire 1.9
234
X-Try-Ultrapeers: 23.35.1.146:6346,
236
X-Try: 24.37.144:6346, 193.205.63.22:6346
237
X-My-Address: 10.254.0.16:6346
243
The meaning of the other headers are as follows:
245
User-Agent: the identity of the vendor
247
X-Ultrapeer-Needed: used to balance the number of ultrapeers.
248
This is discussed in detail in section 5 below.
250
X-Query-Routing: the version of query routing to be used between leaf
251
and ultrapeer, IF a shielded leaf/ultrapeer connection is established
253
X-My-Address: the host's address and port reported in query replies
256
X-Try: addresses of non-ultrapeer hosts to try if the connection
259
X-Try-Ultrapeers: addresses of ultrapeer to try if the connection
263
These headers are sent in almost all interactions. However, for clarity,
264
non-essential headers will be ommitted in the remaining examples.
265
It is important to note that headers can be sent in any order. Also,
266
case is ignored in "True" and "False".
268
3.2. Leaf to Shielded Leaf
271
The next case we consider is a ultrapeer-incapable node A connecting to
272
a leaf node B who happens to have a connection to a ultrapeer S. In this
273
case, B will refuse to accept the connection, redirecting A instead to S.
274
Note that B does not send "200 OK" in its response.
279
GNUTELLA/0.6 503 I am a shielded leaf node
281
X-Try-Ultrapeers: 18.2.3.14:6346,
283
[terminates connection]
286
At this point, A should should try to establish a connection to S,
287
as in case (1) above.
289
[Note by RAM: should send a "204" error here, since there's no content]
291
3.3. Leaf to Unshielded Leaf
294
Sometimes nodes will be ultrapeer-incapable but unable to find
295
a ultrapeer. In this case, they behave exactly like old, unrouted
296
Gnutella 0.4 connections.
307
The original ultrapeer proposal involved two TCP connections in this case.
308
This version is much simpler.
311
3.4. Ultrapeer to Ultrapeer
314
When two ultrapeers meet, both set X-Ultrapeer true. If both have leaf
315
nodes, they will remain ultrapeers after the interaction. Note that
316
no QRP route table is sent between ultrapeers after the connection
327
Note that the X-Ultrapeer-Needed header is ignored in this case. This is
328
discussed in the next section.
330
3.5. Ultrapeer to Ultrapeer, with Leaf Guidance
333
Sometimes there will be too many ultrapeer-capable nodes on the network.
334
Consider the case of an ultrapeer A connecting to an ultrapeer B.
335
If B doesn't have enough leaves, it may direct A to become a leaf node.
336
If A has no leaf connections, it stops fetching new connections, drops
337
any Gnutella 0.4 connections, and sends a QRP table to B. (If A has
338
leaf connections, it obeys the guidance, as in the above case.) Then B
339
will shield A from all traffic.
346
X-Ultrapeer-Needed: false
351
The original ultrapeer proposal involved two TCP connections in this case.
352
This version is much simpler.
354
----------------------
355
[1] Readers familiar with the QRP protocol will note that only some of
356
the protocol is actually used. In particular, all route table entries
357
will be limited to INFINITY or 1.
359
[xyz1] I don't like using this word. It's not accurate.
360
[xyz2] Cite...except it doesn't exists any more.
361
[xyz3] Expand this section...perhaps discuss XML.
362
[xyz4] Bullet points over-simplifies here. For example up-to-dateness
363
and efficiency are related. There are tradeoffs.