This project studies how today's cloud storage services (CSSes) support collaborative file editing. For the sake of transparency and user-friendliness, they do not require users to adopt version control systems; instead, they rely on their own heuristics to handle conflicts, which often lead to unexpected and undesired results. Through a measurement study and reverse engineering, we uncover a number of design and implementation issues that are the root causes of these undesired experiences.
Driven by these findings, we propose to reconsider the collaboration support of CSSes from a novel perspective: that of operations, without using any locks. To realize this idea, we design intelligent approaches to inferring and transforming users' editing operations, as well as optimizations for maintaining a file's historic versions. We develop an open-source system, UFC2 (User-Friendly Collaborative Cloud), that embodies our design; it avoids most (98%) conflicts with little (2%) time overhead.
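To make the idea of operation inference and transformation concrete, here is a minimal sketch. It is not UFC2's implementation: it uses Python's standard `difflib` rather than the project's `diff_match_patch.py`, works at character level, and the names `infer_ops`, `apply_ops`, `transform`, and `merge` are hypothetical. It assumes two users edit the same base version and that edits touching overlapping regions are reported as conflicts rather than resolved.

```python
from difflib import SequenceMatcher


def infer_ops(base, edited):
    """Infer edit operations that turn `base` into `edited`.

    Each op is ("insert", pos, text) or ("delete", pos, length),
    with `pos` expressed in coordinates of `base`.
    """
    ops = []
    matcher = SequenceMatcher(None, base, edited, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            ops.append(("delete", i1, i2 - i1))
        if tag in ("insert", "replace"):
            # Insert at the end of the replaced range so the delete
            # at i1 does not touch the newly inserted text.
            ops.append(("insert", i2, edited[j1:j2]))
    return ops


def apply_ops(text, ops):
    """Apply ops right to left so that earlier positions stay valid."""
    for kind, pos, arg in sorted(ops, key=lambda op: op[1], reverse=True):
        if kind == "insert":
            text = text[:pos] + arg + text[pos:]
        else:
            text = text[:pos] + text[pos + arg:]
    return text


def span(op):
    """Return the [start, end) range of `base` that an op touches."""
    kind, pos, arg = op
    return pos, pos + (arg if kind == "delete" else 0)


def transform(op, applied):
    """Shift `op` so it can be applied after all ops in `applied`.

    Raises ValueError when `op` overlaps an already applied op
    (this sketch does not try to resolve true conflicts).
    """
    kind, pos, arg = op
    start, end = span(op)
    shift = 0
    for other in applied:
        o_kind, o_pos, o_arg = other
        o_start, o_end = span(other)
        if start < o_end and o_start < end:
            raise ValueError(f"conflicting edits: {op} vs {other}")
        if o_kind == "insert" and o_pos <= pos:
            shift += len(o_arg)
        elif o_kind == "delete" and o_end <= pos:
            shift -= o_arg
    return (kind, pos + shift, arg)


def merge(base, version_a, version_b):
    """Merge two concurrently edited versions of `base` without locks."""
    ops_a = infer_ops(base, version_a)
    ops_b = infer_ops(base, version_b)
    ops_b_transformed = [transform(op, ops_a) for op in ops_b]
    return apply_ops(version_a, ops_b_transformed)


if __name__ == "__main__":
    base = "The quick brown fox"
    print(merge(base, "The very quick brown fox", "The quick brown fox jumps"))
    # prints "The very quick brown fox jumps"
```

In this sketch, two concurrent inserts at the same position are ordered deterministically (user A's text first), while overlapping deletions or replacements raise an error; a real system would need a policy for those cases.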
We provide the customization code of UFC2 on GitHub.
The code includes the following parts.
Description | Source Code |
Operation Inference & Transformation | diff_match_patch.py, operation.py |
Server-client Message Exchange | Start.py, GRPCServer_impl.py, GRPCServer_pb2.py, GRPCServer_pb2_grpc.py, GRPCServer.proto, user_session.py |
Maintenance of Historic Versions | cdc.py, ocdc.py, vnode.py, metadata.py, versions.py |
AWS S3 Management | s3connector.py, s3operator.py |
System Unitization | config.py, chunkcache.py, cache.py, global_variables.py, utils.py |
A Plug-in for Microsoft Word Files | docx_util (7 python files in this directory) |
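The file names `cdc.py` and `ocdc.py` suggest that historic versions are stored using content-defined chunking (CDC); that reading is an assumption, and the sketch below is not taken from the repository. It illustrates the general technique only: a Gear-style rolling hash picks chunk boundaries from the content itself, and a hypothetical `VersionStore` keeps each distinct chunk once so that successive versions can share unchanged chunks. The parameters `mask`, `min_size`, and `max_size` are illustrative values.

```python
import hashlib

# Deterministic per-byte random values for the Gear rolling hash.
GEAR = [
    int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big")
    for b in range(256)
]


def cdc_chunks(data, mask=0x3F, min_size=16, max_size=256):
    """Split `data` into chunks whose boundaries depend on content."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        # Cut when the hash matches the mask (after min_size bytes),
        # or unconditionally once the chunk reaches max_size.
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class VersionStore:
    """Keeps every version as a list of chunk digests (a 'recipe')."""

    def __init__(self):
        self.chunks = {}    # digest -> chunk bytes, stored once
        self.versions = []  # one recipe per committed version

    def commit(self, data):
        recipe = []
        for chunk in cdc_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.versions.append(recipe)
        return len(self.versions) - 1

    def checkout(self, version):
        return b"".join(self.chunks[d] for d in self.versions[version])


if __name__ == "__main__":
    store = VersionStore()
    v0 = store.commit(b"hello collaborative world " * 40)
    v1 = store.commit(b"hello collaborative world " * 40 + b"appended text")
    print(store.checkout(v0) == b"hello collaborative world " * 40)
    print(len(store.chunks), "unique chunks for", len(store.versions), "versions")
```

Because boundaries are derived from local content rather than fixed offsets, an edit tends to change only the chunks near it, which is why this family of techniques is commonly used for version storage.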
We collected real-world collaboration traces (i.e., files captured during collaboration) on multiple cloud storage platforms. The data are publicly available on Google Drive (14 GB, as an archive).
In addition, we release the web client (source code) and the server-side system (binary files) of Overleaf to facilitate academic research. Note that Overleaf is a typical example of a monitored editor (i.e., a collaboration tool that monitors users' real-time editing status to avoid conflicts), which represents a different line of work on online collaboration.
Some early research results of this project were published at FAST'20. A poster of this project (also presented at FAST'20) can be found here.
[FAST'20] Jian Chen, Minghao Zhao, Zhenhua Li, Ennan Zhai, Feng Qian, Hongyi Chen, Yunhao Liu, and Tianyin Xu. "Lock-Free Collaboration Support for Cloud Storage Services with Operation Inference and Transformation." In Proceedings of the 18th USENIX Conference on File and Storage Technologies, pp. 13-27. 2020.
zhaominghao.thu [AT] gmail.com
lizhenhua1983 [AT] gmail.com