The availability of cache devices with diverse cost-performance profiles has improved the prospects of multi-tier caches. Workload sizes continue to grow, making DRAM-only caches too small to yield acceptable hit rates. Furthermore, multi-tier caches have been shown to improve cost-efficiency. We want to configure multi-tier caches based on device and workload properties while efficiently evaluating the large configuration space. We are also interested in optimizing multi-tier caches through better cache admission policies.
Random spatial sampling is used to reduce the overhead of trace collection and analysis. We extend random spatial sampling to work with multi-block storage requests. Compared to unmodified random spatial sampling, the extended version generates samples with lower error in features such as mean read/write request size, mean read/write interarrival time, and write ratio.
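As a rough illustration of the baseline technique, the sketch below applies hash-based spatial sampling to each block of a multi-block request and coalesces the surviving blocks into contiguous runs. It is not the extension described above: the function names, the `MODULUS` constant, the choice of hash, and the coalescing step are all assumptions made for this example.

```python
import hashlib

MODULUS = 1 << 24  # hypothetical hash range, chosen for illustration


def is_sampled(block_addr, rate):
    """Decide deterministically whether a block address belongs to the sample."""
    digest = hashlib.blake2b(block_addr.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % MODULUS < rate * MODULUS


def sample_request(start_block, num_blocks, rate):
    """Return (start, length) runs of sampled blocks inside one request."""
    runs = []
    run_start = None
    end_block = start_block + num_blocks
    for block in range(start_block, end_block):
        if is_sampled(block, rate):
            if run_start is None:
                run_start = block  # open a new run of sampled blocks
        elif run_start is not None:
            runs.append((run_start, block - run_start))  # close the current run
            run_start = None
    if run_start is not None:
        runs.append((run_start, end_block - run_start))
    return runs
```

Because the decision depends only on the block address, a block is either always or never sampled, so reuse patterns of sampled blocks are preserved. Splitting a request into several runs also changes the request sizes and counts seen in the sample, which is one way the sampled features can drift from those of the full trace.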
Developed a block storage system stressor that uses the C++ cache engine CacheLib to cache data and libaio to transfer data to and from the backing store. Replayed diverse production workloads across different types of storage servers, totaling more than 10 years of compute time, to derive insights about sizing multiple cache tiers, selecting storage devices, and improving cost efficiency.
A Python library to analyze and sample block storage traces. Implemented random spatial sampling and augmented it to improve performance for block storage traces.
PyMimircache is an open-source cache simulation framework developed by Juncheng Yang as part of the Emory SimBioSys Lab. I implemented miniature simulations of a workload based on the FAST '15 paper "Efficient MRC Construction with SHARDS" by Carl A. Waldspurger, Nohhyun Park, Alexander Garthwaite, and Irfan Ahmad (CloudPhysics, Inc.).
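The idea behind a miniature simulation can be sketched as follows: filter the trace with a spatial hash at rate R, then simulate a cache scaled down by the same factor and use its hit ratio as an estimate for the full-size cache. This is a minimal sketch assuming an LRU cache; the function names and the hash are hypothetical and are not taken from PyMimircache.

```python
import hashlib
from collections import OrderedDict

MODULUS = 1 << 24  # hypothetical hash range, chosen for illustration


def is_sampled(block_addr, rate):
    """Spatial filter: keep a block if its hash falls below rate * MODULUS."""
    digest = hashlib.blake2b(block_addr.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % MODULUS < rate * MODULUS


def lru_hit_ratio(trace, cache_size):
    """Simulate an LRU cache over a list of block addresses."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(trace) if trace else 0.0


def miniature_hit_ratio(trace, cache_size, rate):
    """Estimate the hit ratio at cache_size from a scaled-down simulation."""
    sampled = [block for block in trace if is_sampled(block, rate)]
    mini_size = max(1, round(cache_size * rate))  # scale the cache by the same rate
    return lru_hit_ratio(sampled, mini_size)
```

Running `miniature_hit_ratio` for several values of `cache_size` gives an approximate miss ratio curve at a fraction of the cost of simulating the full trace. The estimate becomes noisier as the rate decreases or when the trace touches few distinct blocks.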
Used a Convolutional Neural Network (CNN) to classify block traces. Each trace was converted into access plot images with block addresses on the y-axis and time on the x-axis. These images were then classified by the CNN.
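One way to turn a trace into such an image is to bin each access into a fixed-size grid, as sketched below. The input format (a list of `(timestamp, block_address)` pairs), the binary pixel values, and the function name are assumptions for illustration, not the project's actual pipeline.

```python
def access_plot(accesses, width, height):
    """Rasterize (timestamp, block_address) pairs into a height x width grid.

    Columns are time bins and rows are address bins; row 0 holds the
    lowest addresses, so flip the rows to draw with the origin at the bottom.
    """
    times = [t for t, _ in accesses]
    blocks = [b for _, b in accesses]
    t_min, b_min = min(times), min(blocks)
    t_span = (max(times) - t_min) or 1  # avoid division by zero
    b_span = (max(blocks) - b_min) or 1
    grid = [[0] * width for _ in range(height)]
    for t, b in accesses:
        col = min(width - 1, int((t - t_min) / t_span * width))
        row = min(height - 1, int((b - b_min) / b_span * height))
        grid[row][col] = 1  # mark that this region was accessed in this time bin
    return grid
```

For example, `access_plot([(0, 0), (10, 100)], 2, 2)` marks the low-address cell in the first time bin and the high-address cell in the second. The resulting grid can be saved as an image or passed to a CNN as a single-channel input.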
Using performance data from block trace replay, we used random forest regression to predict the optimal split of budget between DRAM and SSD for multi-tier caching.
Wireless mesh networks (WMNs) are inexpensive compared to other network models, which makes them well suited to connecting underprivileged areas to the global network. We analyze how to improve the connectivity of WMNs using a limited number of multi-radio nodes, which are expensive.
Created a website to extract features from file system traces and display them. Trace files were stored in S3, processed in chunks using Lambda functions, and the metadata was stored in DynamoDB. The trace features were visualized using d3.js.
Tyler Estro, Mário Antunes, Pranav Bhandari, Anshul Gandhi, Geoff Kuenning, Yifei Liu, Carl Waldspurger, Avani Wildani and Erez Zadok
31st International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2023)
Tyler Estro, Pranav Bhandari, Avani Wildani, Erez Zadok
12th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage '20)
Pranav Bhandari, Rahul Chandrashekhar, Peter Yoon
CSC'15 - The 2015 International Conference on Scientific Computing, Las Vegas, Nevada