CS8803D Course Reading Summaries
Paper #: 3.app - 31
Title: Disconnected Operation in the Coda File System

(1) Problems

This project sought to provide a file system in which a user can keep working on his files even after being disconnected from the file server. Two sub-problems follow. First, the files the user needs must already be available locally, even if the disconnection is involuntary. Second, the user's work must be correctly synchronized, or reintegrated, into the master file system once he reconnects. Because the user may make many changes while disconnected, the client must keep a replay log of those changes (simplified and shortened as much as possible) so that they can be integrated into the server system upon reconnection. At that point, some of the changes may conflict with changes other users made during the disconnection, and those conflicts have to be reconciled.

(2) New Ideas and Strengths

The central idea is a good one: get the advantages of very reliable data storage on a server machine (probably backed up frequently, protected from power outages, etc.) while still being able to disconnect from the server and work on your local disk. This is handy when you are mobile and cannot connect to the server, when the connection is severed accidentally, and when something goes wrong with the server. The strength of the system is that you get the best of both worlds: local-disk convenience together with server-disk reliability and availability.

The designers also made what appear to me to be many good decisions. For example, they chose optimistic replica control because they expected Unix workloads to have a low degree of write sharing. As their measurements indicate, this was a good decision: allowing multiple users to change files while some of them are disconnected results in very few conflicts.
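The replay log and the optimistic conflict check discussed above can be made concrete with a small sketch. It is written in Python and is not Coda's code: the record format, the per-file version numbers, and the names `LogRecord`, `optimize_log`, `Server`, and `reintegrate` are all hypothetical. The two log-shortening rules (a later store of a file supersedes earlier stores of it; removing a file that was created while disconnected cancels all of its records) are my assumptions about the kind of optimization the paper describes. On reconnection, each record is replayed only if the server's version of the file still matches the version the client last saw; otherwise the file is reported as a conflict.

```python
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    """One mutating operation logged while disconnected (hypothetical model)."""
    op: str            # "create", "store", or "remove"
    path: str
    base_version: int  # server version the client last saw (0 for a new file)
    data: bytes = b""


def optimize_log(log: list[LogRecord]) -> list[LogRecord]:
    """Shorten the replay log using two ASSUMED rules:
    1. a store supersedes any earlier store of the same file;
    2. removing a file created during disconnection cancels all its records."""
    out: list[LogRecord] = []
    for rec in log:
        if rec.op == "store":
            # Rule 1: only the final contents of the file need to reach the server.
            out = [r for r in out if not (r.op == "store" and r.path == rec.path)]
            out.append(rec)
        elif rec.op == "remove" and any(
            r.op == "create" and r.path == rec.path for r in out
        ):
            # Rule 2: the file never has to exist on the server at all.
            out = [r for r in out if r.path != rec.path]
        else:
            out.append(rec)
    return out


@dataclass
class Server:
    # path -> (version, contents); a missing file is treated as version 0
    files: dict[str, tuple[int, bytes]] = field(default_factory=dict)

    def version(self, path: str) -> int:
        return self.files.get(path, (0, b""))[0]


def reintegrate(server: Server, log: list[LogRecord]) -> list[str]:
    """Replay the optimized log; return the paths that conflicted.
    Optimistic control: a record applies only if no one else changed the file."""
    conflicts: list[str] = []
    applied: dict[str, int] = {}  # versions produced by this client's own replay
    for rec in optimize_log(log):
        if rec.path in conflicts:
            continue  # leave the whole file for later (manual) resolution
        expected = applied.get(rec.path, rec.base_version)
        if server.version(rec.path) != expected:
            conflicts.append(rec.path)  # another user updated it meanwhile
            continue
        if rec.op == "remove":
            server.files.pop(rec.path, None)
            applied[rec.path] = 0
        else:  # "create" or "store"
            new_version = expected + 1
            server.files[rec.path] = (new_version, rec.data)
            applied[rec.path] = new_version
    return conflicts


# Example: shared.c was at version 7 when this client disconnected;
# another user has since stored version 8 on the server.
server = Server(files={"notes.txt": (3, b"old"), "shared.c": (8, b"their edit")})
log = [
    LogRecord("store", "notes.txt", 3, b"draft 1"),
    LogRecord("store", "notes.txt", 3, b"draft 2"),
    LogRecord("create", "tmp.o", 0),
    LogRecord("remove", "tmp.o", 0),
    LogRecord("store", "shared.c", 7, b"my edit"),
]
conflicts = reintegrate(server, log)
print(conflicts)  # ['shared.c']
```

Because most files are not write-shared, most records pass the version check, which is exactly the bet optimistic replica control makes. For simplicity the sketch just skips conflicting files rather than modeling Coda's transactional replay, and it leaves out how those files are eventually repaired.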
The authors note, however, that conflicts may become more frequent as the system grows larger and as disconnections last longer.

Venus (Coda's client cache manager) has three simple operational states: hoarding (most of the time), emulation (while disconnected), and reintegration (upon reconnection). These seem elegant and well chosen. The designers' extensive experience with file systems (from Andrew and others) also seems to have helped them flesh out these basic states with well-thought-out and well-implemented mechanisms, such as replay logs and transactional updates to recoverable virtual memory (RVM), which keeps Venus's metadata consistent.

(3) Weaknesses and Extensions

Reintegrating a previously disconnected user did take a minute or so, but this seems a small price to pay for the benefits gained, and with present-day network, processor, and disk speeds this time is probably much shorter.

For a very large system of users, particularly one with a lot of file sharing, there may be many conflicts to resolve when somebody reconnects. It appears that their reconciliation method does not always work automatically, in which case the conflicts must be resolved manually. This could be a big pain, so the automatic reconciliation mechanism may need improvement.

The authors propose using Coda as a base for adding transactional support to Unix. I think this relates to log-structured file systems and could help Unix file systems scale well when they include thousands of nodes. I am not sure which practical file-sharing systems need that many nodes, but I imagine they exist; one example might be a large software effort in which hundreds of people work concurrently on the same system. The authors also point out an opportunity to extend Coda to support weakly connected environments, where connectivity is intermittent or of low bandwidth.

-- END --