Lab Home | Phone | Search | ||||||||
|
||||||||
With the advent of open source parallel file systems a new usage pattern emerges: users isolate subsystems of parallel file systems and put them in contexts not foreseen by the original designers, e.g. an object-based storage back end gets a new REST-ful front end to become Amazon Web Service's S3 compliant key value store, or a data placement function becomes a placement function for customer accounts. This trend shows a desire for the ability to use existing file system services and compose them to implement new services. We call this ability "programmable storage systems". In this talk I will argue that by designing programmability into storage systems has the following benefits: (1) we are achieving greater separation of storage performance engineering from storage reliability engineering, making it possible to optimize storage systems in a wide variety of ways without risking years of investments into code hardening; (2) we are creating an environment that encourages people to create a new stack of storage systems abstractions, both domain-specific and across domains, including sophisticated optimizers that rely on machine learning techniques; (3) we inform commercial parallel file system vendors on the design of low-level APIs for their products so that they match the versatility of open source storage systems without having to release their entire code into open source; and (4) use this historical opportunity to leverage the tension between the versatility of open source storage systems and the reliability of proprietary systems to lead the community of storage system designers. I will illustrate programmable storage with an overview of programming abstractions that we have found useful so far, and if time permits, talk about "scriptable storage systems" and the interesting new possibilities of truly data-centered software engineering it enables. Host: Curt Canada |