PROTOCOL FOR EFFICIENT FILE FINDING 1. Introduction This document describes the Protocol for Efficient File Finding (PEFF). PEFF is a network protocol that runs over TCP. Its purpose is to assemble participating nodes into a ring topology and forward queries through this topology. The actual transfer of file data is done outside the PEFF protocol; once a file is located, FTP is used to retrieve its contents. 2. Participants Each entity participating in the protocol is referred to as a "node". Each node's behavior is identical to that of every other node. Each node stores a set of files, each with a name which identifies the file. 3. Messages 3.1 TCP All messages between nodes will be sent over a TCP connection. If a node A wishes to send a message to node B, it may reuse an existing TCP connection to node B, or create a new TCP connection to node B on port 4251. 3.2 Message Overview Messages are sent individually. While some messages should elicit a response from the receiver, each message is decipherable without any context. Nodes therefore do not need to keep any state regarding previously sent messages. 3.3 Message Format All messages in PEFF have a fixed length header of 4 octets (each octet is 8 bits), followed by arbitrary data octets. When multiple messages are sent over a single TCP connection, they are sent one after another with no other octets between. When a set of octets is to be interpeted as an integer, they should be interpreted as an unsigned 2-complement integer in network (big-endian) byte order. The format of the message header is as follows: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Version| Type | Reserved | Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Data ... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - Version - This is the protocol version. This value should be set to the bitstring '0000' for the protocol described in this document. - Type - The message type. Message types are described below. - Reserved - This field is reserved for future use. - Length - An integer specifying the length of the message in octets, including the 4 header octets. - Data - Arbitrary octets that must be interpeted based on the value of the message type field. 3.3 Message Types The message types and the corresponding bitstrings for the "Type" field are as follows: +----------------------+ | Type | Message | +------+---------------+ | 0000 | JOIN_REQUEST | | 0001 | JOIN_CONFIRM | | 0010 | KEEPALIVE | | 0011 | QUERY | | 0100 | FOUND | | 0101 | GOODBYE | | 0110 | MISSING | +----------------------+ The message types are described in the following sections. In all descriptions, assume the message was sent from node "A" to node "B". 3.3.1 JOIN_REQUEST A JOIN_REQUEST message indicates to B that A wishes to make a connection in the ring from A to B. The Data field of the message is ignored. 3.3.2 JOIN_CONFIRM A JOIN_CONFIRM message indicates to B that A has confirmed a request to join; the nodes are now considered connected. The Data field of the message is ignored. 3.3.3 KEEPALIVE A KEEPALIVE message indicates to B that A is functioning properly. The Data field of the message is ignored. 3.3.4 QUERY A QUERY message is a search for a file. The Data field of a QUERY message is structured as follows: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | QType | ID | Time-to-Live (TTL) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Source IP Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Query data ... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - QType - The type of query. Currently, the only allocated QType is 0. - ID - this is an identifier which the sender places in the query to aid in matching returned responses with a particular query. - TTL - An integer specifying the time-to-live of the query; it is decremented each time the query is forwarded. The query is discarded if this value reaches 0. - Source IP Address - This is the IP address of the node sending the query. It is used by other nodes to respond to the query. - Query data - This is an arbitrary number of octets containing the query itself. The octets will be interpeted according to the value of QType. The length of this field can be calculated from the Length field of the message. 3.3.5 FOUND A FOUND message indicates to node B that node A has found one or more files matching a query. The Data field of a FOUND message is structured as follows: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | FType | ID | Reserved | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Location ... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - FType - The type of location specifier. Currently, the only allocated FType is 0. - ID - The ID of the query which this FOUND message is a response to. - Reserved - This field is reserved for future use. - Location - This is an arbitrary number of octets containing the location of the file. The octets will be interpreted according to the value of FType. The length of this field can be calculated from the Length field of the message. 3.3.6 GOODBYE A GOODBYE message indicates to B that A is destroying the connection in the ring between them. The Data field is not used. 3.3.7 MISSING A MISSING message indicates to B that A is in State 1 and is looking for the other end of the ring. The Data field of a MISSING message is structured as follows: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Source IP Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The Source IP Address is the IP address of the original sender of the MISSING message. 3.4 Query Types The only currently allocated query type is 0, a case-insensitive substring match. A query is considered to "match" a file if the octets of the query are contained within the name of the file. 3.5 Found Types The only currently allocated found type is 0, which is a Uniform Resource Identifier, as described in RFC 2396. The Location field is interpreted as a URI pointing to the file contents. 4. Node Behavior 4.1 Node States A node can be in one of three states: - State 0 - The node has no connections to any other nodes. - State 1 - The node is connected to one other node. - State 2 - The node is connected to two other nodes. Nodes always strive to reach a higher-numbered state (which completes the ring). Furthermore, each node keeps track of the IP address of each of its neighbors (if it has any). The connections are referred to as "C0" and "C1". 4.2 Node Timers The node must keep track of various timers. Each timer is measured in seconds away from zero, and moves towards zero over time. The node must be able to set the timer to an arbitrary value, and should be triggered when the timer reaches zero. There are 3 timers in each node: - MAIN - C0_ALIVE - C1_ALIVE The following timer constants are defined: - SEND_KEEPALIVE 30 - TIMEOUT 60 4.3 Node Startup When a node starts up, it enters State 0. The MAIN timer is set to 0. The C0_ALIVE and C1_ALIVE timers are set to infinity. 4.4 Node Actions Each action is triggered by an event occurring (either a timer expiring or a message being received). For each event, different actions may occur based on the node state and the contents of the message. 4.4.1 JOIN_REQUEST received If the node is in: - State 0 - Set C0 to the IP address of the sender. Set C0_ALIVE to TIMEOUT. Send a JOIN_CONFIRM message to C0. Send a MISSING message to C0. Transition to State 1. - State 1 - Set C1 to the IP address of the sender. Set C1_ALIVE to TIMEOUT. Send a JOIN_CONFIRM message to C1. Transition to State 2. - State 2 - Randomly selection x=0|1 with a uniform distribution. Send a GOODBYE message to Cx. Send a JOIN_CONFIRM message to the sender. Set Cx to the IP address of the sender. Set Cx_ALIVE to TIMEOUT. 4.4.2 JOIN_CONFIRM received If the node is in: - State 0 - Set C0 to the IP address of the sender. Set C0_ALIVE to TIMEOUT. Send a MISSING message to C0. Transition to State 1. - State 1 - Set C1 to the IP address of the sender. Set C1_ALIVE to TIMEOUT. Transition to State 2. - State 2 - A GOODBYE message is sent to the sender. 4.4.3 KEEPALIVE received If the node is in: - State 0 - The message is ignored. - State 1 or State 2 - If the sender address is: - equal to C0; set C0_ALIVE to TIMEOUT. - equal to C1; set C1_ALIVE to TIMEOUT. - otherwise; ignore the message. 4.4.4 QUERY received If the source address indicated in the query is the same as the receiving node, the message is ignored. The node performs the requested match against its internal list of files. If a match is found, sends a FOUND message to the source address specified in the query. If more than one match is found, several FOUND messages are sent. Decrement the TTL field of the query. If the field is now 0, take no further action. Otherwise, forward the message. If the message was received from C0, send the message to C1. If the message was received from C1, forward the message to C0. 4.4.5 FOUND received Present the location information to the user. The client MAY keep track of query IDs it has sent and correlate the results with that ID. 4.4.6 GOODBYE received - State 0 - The message is ignored. - State 1 - If the IP address of the sender: - matches C0; set C0_TIMEOUT to infinity. Transition to State 0. - otherwise; the message is ignored. - State 2 - If the IP address of the sender: - matches C0; set C0 to C1. Set C0_TIMEOUT to C1_TIMEOUT. Set C1_TIMEOUT to infinity. Send a MISSING message to C0. Transition to State 1. - matches C1; set C1_TIMEOUT to infinity. Send a MISSING message to C0. Transition to State 1. - otherwise; the message is ignored. 4.4.7 MISSING received - State 0 - Send a JOIN_REQUEST to the Sender IP Address field of the message. - State 1 - Send a JOIN_REQUEST to the Sender IP Address field of the message. - State 2 - Forward the message, similar to a query. If the message was received on C0, send the message to C1. If the message was received on C1, send the message to C0. 4.4.8 MAIN timer expires - State 0 - If the IP address of a node in the ring is: - known; send a JOIN_REQUEST message to that node. Set MAIN to SEND_KEEPALIVE. - not known; Set MAIN to infinity. - State 1 - Send a KEEPALIVE message to C0. Set MAIN to SEND_KEEPALIVE. - State 2 - Send a KEEPALIVE message to C0. Send a KEEPALIVE message to C1. Set MAIN to SEND_KEEPALIVE. 4.4.9 C0_ALIVE timer expires - State 1 - Send GOODBYE to C0. Set C1_TIMEOUT to infinity. Transition to State 0. - State 2 - Send GOODBYE to C0. Set C0 to C1. Set C0_TIMEOUT to C1_TIMEOUT. Set C1_TIMEOUT to infinity. Send MISSING to C0. Transition to State 1. 4.4.10 C1_ALIVE timer expires Send GOODBYE to C1. Set C1_TIMEOUT to infinity. Send MISSING to C0. Transition to State 1. 4.4.10 User makes a query request If the node is in State 0, an error is returned to the user. Create ID for query. The query should be unique for that particular node. Note that since there are a finite number of query IDs, it is not possible for the ID to be universally unique. However, it is unlikely that all IDs will be used close enough together in time for this to cause incorrect query correlation. Create a QUERY message containing the user's query, the generated ID, and the node's source address. The QUERY message is sent to C0.