├── .gitignore
├── README.md
├── big data and AI dataset
├── get-started.md
├── paper-list
└── paper-reviews
    ├── 1-paper presentation list.md
    ├── 20210727【ATC21】-FAASNET: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute.md
    ├── 20210810【ATC21】-INFaaS: Automated Model-less Inference Serving.md
    ├── 20210828【ATC21】-Prediction-Based Power Oversubscription in Cloud Platforms.md
    ├── 20210901【ATC21】-Sonic: Application-aware Data Passing for Chained Serverless Applications.md
    ├── 20210907【APSys 2021】-An Empirical Study on Challenges of Application Development in Serverless Computing.md
    ├── 20210920【APSys 2021】-Lessons Learned from Migrating Complex Stateful Applications onto Serverless Platforms.md
    ├── 20211007【arXiv 2021】-Harvesting Idle Resources in Serverless Computing via Reinforcement Learning.md
    ├── 20211012【ICDCS 2021】-Gillis: Serving Large Neural Networks in Serverless Functions with Automatic Model Partitioning.md
    ├── 20211030【OSDI21】-Pollux: Co-adaptive-Cluster-Scheduling-for-Goodput-Optimized-Deep-Learning.md
    ├── 20211102【EuroSys21】-OFC: An Opportunistic Caching System for FaaS Platforms.md
    ├── 20211109【SoCC2021】-LLama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines.md
    ├── 20211221【Middleware21】-Experience Paper: Towards Enhancing Cost Efficiency in Serverless Machine Learning Training.md
    ├── 20220111【Middleware21】- Towards Optimal Placement and Scheduling of DNN Operations with Pesto.md
    ├── 20220222【Infocom22】-StepConf: SLO-Aware Dynamic Resource Configuration for Serverless Function Workflows.md
    ├── 20220301【SoCC2021】-Morphling: Fast, Near-Optimal Auto-Configuration for Cloud-Native Model Serving.md
    ├── 20220329【ASPLOS22】-FaaSFlow: Enable Efficient Workflow Execution for Function-as-a-Service.md
    ├── 20220405【arXiv 2022】-Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads.md
    ├── 20220412【arXiv】-Survey on Large Scale Neural Network Training.md
    ├── 20220503【EuroSys22】-Memory Deduplication for Serverless Computing with Medes.md
    ├── 20221025【MobiSys】- Band: Coordinated Multi-DNN Inference on Heterogeneous Mobile Processors.md
    ├── 20221101【ATC】- Tetris: Memory-efficient Serverless Inference through Tensor Sharing.md
    ├── 20221108【OSDI】- Synergy: Looking Beyond GPUs for DNN Scheduling on Multi-Tenant Clusters.md
    ├── 20221115【RTSS】- Pipelined Data-Parallel CPUGPU Scheduling for Multi-DNN Real-Time Inference.md
    ├── 20221122【OSDI】- Serving DNNs like Clockwork Performance Predictability from the Bottom Up.md
    ├── 20221129【OSDI】- Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences.md
    ├── 20230407【Archive 23】-MuxFlow Efficient and Safe GPU Sharing in Large-Scale Production Deep Learning Clusters.md
    ├── 20240924 【OSDI 24】-Optimizing Resource Allocation in Hyperscale Datacenters Scalability, Usability, and Experiences.md
    ├── 20241015【OSDI24】- dLoRA Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving.pdf
    ├── 20241029 【ATC24】 -StreamBox: A Lightweight GPU SandBox for Serverless Inference Workflow.md
    ├── 20241106 【ATC24】 -Starburst A Cost-aware Scheduler for Hybrid Cloud.pdf
    ├── 20241106【NSDI23】- ModelKeeper:Accelerating DNN Training via Automated Training Warmup.pdf
    ├── 20241112 【SOCC24】 -Pre-Warming is Not Enough: Accelerating Serverless Inference With Opportunistic Pre-Loading.md
    ├── 20241112【arXiv】-mLoRA Fine-Tuning LoRA Adapters via Highly-Efficient Pipeline Parallelism in Multiple GPUs.pdf
    ├── 20241119【OSDI24】- USHER:Holistic Interference Avoidance for Resource Optimized ML Inference.pdf
    ├── 20241119【arXiv】-Managing Bandwidth The Key to Cloud-Assisted Autonomous Driving.pdf
    ├── 20241126 【SC24】 -SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing.md
    ├── 20241126【ICPP24】-Online Scheduling and Pricing for Multi-LoRA Fine-Tuning Tasks.pdf
    ├── 20241203 【SOCC24】 -AutoBurst Autoscaling Burstable Instances for Cost-effective Latency SLOs.pdf
    ├── 20241203【Eurosys24】- Optimus :Warming Serverless ML Inference via Inter-Function Model Transformation.pdf
    ├── 20241210 【NSDI24】 -Jolteon: Unleashing the Promise of Serverless for Serverless Workflows.md
    ├── 20241210 【SOCC24】 -Kale Elastic GPU Scheduling for Online DL Model Training.pdf
    ├── 20241217【MobiCom】-Delta A Cloud-assisted Data Enrichment Framework for On-Device Continual Learning.pdf
    ├── 20241217【SOCC24】- Near-Lossless Gradient Compression for Data-Parallel Distributed DNN Training.pdf
    ├── 20241224 【CIKM24】 -PISeL Pipelining DNN Inference for Serverless Computing.md
    ├── 20241224 【OSDI24】 -DistServe- Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving.pdf
    ├── 20241231【MobiCom】-FlexNN Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices.pdf
    ├── 20250107【SOSP24】- PowerInfer Fast Large Language Model Serving with a Consumer-grade GPUTraining.pdf
    ├── 20250114【arXiv】 -CARASERVE CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference.pdf
    ├── 20250218【ATC24】- Power-aware Deep Learning Model Serving with μ-Serve.pdf
    ├── 20250311【SOCC24】- Queue Management for SLO-Oriented Large Language Model Serving.pdf
    ├── 20250401【NSDI25】- SuperServe Fine-Grained Inference Serving for Unpredictable Workloads.pdf
    └── 2025318 【ASPLOS25】-Forecasting GPU Performance for Deep Learning Training and Inference.pdf

/.gitignore:
--------------------------------------------------------------------------------
.DS_Store
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/README.md
--------------------------------------------------------------------------------
/big data and AI dataset:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/big data and AI dataset
--------------------------------------------------------------------------------
/get-started.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/get-started.md
--------------------------------------------------------------------------------
/paper-list:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-list
--------------------------------------------------------------------------------
/paper-reviews/1-paper presentation list.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/1-paper presentation list.md
--------------------------------------------------------------------------------
/paper-reviews/20210727【ATC21】-FAASNET: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20210727【ATC21】-FAASNET: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute.md
--------------------------------------------------------------------------------
/paper-reviews/20210810【ATC21】-INFaaS: Automated Model-less Inference Serving.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20210810【ATC21】-INFaaS: Automated Model-less Inference Serving.md
--------------------------------------------------------------------------------
/paper-reviews/20210828【ATC21】-Prediction-Based Power Oversubscription in Cloud Platforms.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20210828【ATC21】-Prediction-Based Power Oversubscription in Cloud Platforms.md
--------------------------------------------------------------------------------
/paper-reviews/20210901【ATC21】-Sonic: Application-aware Data Passing for Chained Serverless Applications.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20210901【ATC21】-Sonic: Application-aware Data Passing for Chained Serverless Applications.md
--------------------------------------------------------------------------------
/paper-reviews/20210907【APSys 2021】-An Empirical Study on Challenges of Application Development in Serverless Computing.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20210907【APSys 2021】-An Empirical Study on Challenges of Application Development in Serverless Computing.md
--------------------------------------------------------------------------------
/paper-reviews/20210920【APSys 2021】-Lessons Learned from Migrating Complex Stateful Applications onto Serverless Platforms.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20210920【APSys 2021】-Lessons Learned from Migrating Complex Stateful Applications onto Serverless Platforms.md
--------------------------------------------------------------------------------
/paper-reviews/20211007【arXiv 2021】-Harvesting Idle Resources in Serverless Computing via Reinforcement Learning.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20211007【arXiv 2021】-Harvesting Idle Resources in Serverless Computing via Reinforcement Learning.md
--------------------------------------------------------------------------------
/paper-reviews/20211012【ICDCS 2021】-Gillis: Serving Large Neural Networks in Serverless Functions with Automatic Model Partitioning.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20211012【ICDCS 2021】-Gillis: Serving Large Neural Networks in Serverless Functions with Automatic Model Partitioning.md
--------------------------------------------------------------------------------
/paper-reviews/20211030【OSDI21】-Pollux: Co-adaptive-Cluster-Scheduling-for-Goodput-Optimized-Deep-Learning.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20211030【OSDI21】-Pollux: Co-adaptive-Cluster-Scheduling-for-Goodput-Optimized-Deep-Learning.md
--------------------------------------------------------------------------------
/paper-reviews/20211102【EuroSys21】-OFC: An Opportunistic Caching System for FaaS Platforms.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20211102【EuroSys21】-OFC: An Opportunistic Caching System for FaaS Platforms.md
--------------------------------------------------------------------------------
/paper-reviews/20211109【SoCC2021】-LLama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20211109【SoCC2021】-LLama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines.md
--------------------------------------------------------------------------------
/paper-reviews/20211221【Middleware21】-Experience Paper: Towards Enhancing Cost Efficiency in Serverless Machine Learning Training.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20211221【Middleware21】-Experience Paper: Towards Enhancing Cost Efficiency in Serverless Machine Learning Training.md
--------------------------------------------------------------------------------
/paper-reviews/20220111【Middleware21】- Towards Optimal Placement and Scheduling of DNN Operations with Pesto.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20220111【Middleware21】- Towards Optimal Placement and Scheduling of DNN Operations with Pesto.md
--------------------------------------------------------------------------------
/paper-reviews/20220222【Infocom22】-StepConf: SLO-Aware Dynamic Resource Configuration for Serverless Function Workflows.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20220222【Infocom22】-StepConf: SLO-Aware Dynamic Resource Configuration for Serverless Function Workflows.md
--------------------------------------------------------------------------------
/paper-reviews/20220301【SoCC2021】-Morphling: Fast, Near-Optimal Auto-Configuration for Cloud-Native Model Serving.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20220301【SoCC2021】-Morphling: Fast, Near-Optimal Auto-Configuration for Cloud-Native Model Serving.md
--------------------------------------------------------------------------------
/paper-reviews/20220329【ASPLOS22】-FaaSFlow: Enable Efficient Workflow Execution for Function-as-a-Service.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20220329【ASPLOS22】-FaaSFlow: Enable Efficient Workflow Execution for Function-as-a-Service.md
--------------------------------------------------------------------------------
/paper-reviews/20220405【arXiv 2022】-Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20220405【arXiv 2022】-Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads.md
--------------------------------------------------------------------------------
/paper-reviews/20220412【arXiv】-Survey on Large Scale Neural Network Training.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20220412【arXiv】-Survey on Large Scale Neural Network Training.md
--------------------------------------------------------------------------------
/paper-reviews/20220503【EuroSys22】-Memory Deduplication for Serverless Computing with Medes.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20220503【EuroSys22】-Memory Deduplication for Serverless Computing with Medes.md
--------------------------------------------------------------------------------
/paper-reviews/20221025【MobiSys】- Band: Coordinated Multi-DNN Inference on Heterogeneous Mobile Processors.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20221025【MobiSys】- Band: Coordinated Multi-DNN Inference on Heterogeneous Mobile Processors.md
--------------------------------------------------------------------------------
/paper-reviews/20221101【ATC】- Tetris: Memory-efficient Serverless Inference through Tensor Sharing.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20221101【ATC】- Tetris: Memory-efficient Serverless Inference through Tensor Sharing.md
--------------------------------------------------------------------------------
/paper-reviews/20221108【OSDI】- Synergy: Looking Beyond GPUs for DNN Scheduling on Multi-Tenant Clusters.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20221108【OSDI】- Synergy: Looking Beyond GPUs for DNN Scheduling on Multi-Tenant Clusters.md
--------------------------------------------------------------------------------
/paper-reviews/20221115【RTSS】- Pipelined Data-Parallel CPUGPU Scheduling for Multi-DNN Real-Time Inference.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20221115【RTSS】- Pipelined Data-Parallel CPUGPU Scheduling for Multi-DNN Real-Time Inference.md
--------------------------------------------------------------------------------
/paper-reviews/20221122【OSDI】- Serving DNNs like Clockwork Performance Predictability from the Bottom Up.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20221122【OSDI】- Serving DNNs like Clockwork Performance Predictability from the Bottom Up.md
--------------------------------------------------------------------------------
/paper-reviews/20221129【OSDI】- Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20221129【OSDI】- Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences.md
--------------------------------------------------------------------------------
/paper-reviews/20230407【Archive 23】-MuxFlow Efficient and Safe GPU Sharing in Large-Scale Production Deep Learning Clusters.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20230407【Archive 23】-MuxFlow Efficient and Safe GPU Sharing in Large-Scale Production Deep Learning Clusters.md
--------------------------------------------------------------------------------
/paper-reviews/20240924 【OSDI 24】-Optimizing Resource Allocation in Hyperscale Datacenters Scalability, Usability, and Experiences.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20240924 【OSDI 24】-Optimizing Resource Allocation in Hyperscale Datacenters Scalability, Usability, and Experiences.md
--------------------------------------------------------------------------------
/paper-reviews/20241015【OSDI24】- dLoRA Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20241015【OSDI24】- dLoRA Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving.pdf
--------------------------------------------------------------------------------
/paper-reviews/20241029 【ATC24】 -StreamBox: A Lightweight GPU SandBox for Serverless Inference Workflow.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20241029 【ATC24】 -StreamBox: A Lightweight GPU SandBox for Serverless Inference Workflow.md
--------------------------------------------------------------------------------
/paper-reviews/20241106 【ATC24】 -Starburst A Cost-aware Scheduler for Hybrid Cloud.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20241106 【ATC24】 -Starburst A Cost-aware Scheduler for Hybrid Cloud.pdf
--------------------------------------------------------------------------------
/paper-reviews/20241106【NSDI23】- ModelKeeper:Accelerating DNN Training via Automated Training Warmup.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20241106【NSDI23】- ModelKeeper:Accelerating DNN Training via Automated Training Warmup.pdf
--------------------------------------------------------------------------------
/paper-reviews/20241112 【SOCC24】 -Pre-Warming is Not Enough: Accelerating Serverless Inference With Opportunistic Pre-Loading.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20241112 【SOCC24】 -Pre-Warming is Not Enough: Accelerating Serverless Inference With Opportunistic Pre-Loading.md
--------------------------------------------------------------------------------
/paper-reviews/20241112【arXiv】-mLoRA Fine-Tuning LoRA Adapters via Highly-Efficient Pipeline Parallelism in Multiple GPUs.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20241112【arXiv】-mLoRA Fine-Tuning LoRA Adapters via Highly-Efficient Pipeline Parallelism in Multiple GPUs.pdf
--------------------------------------------------------------------------------
/paper-reviews/20241119【OSDI24】- USHER:Holistic Interference Avoidance for Resource Optimized ML Inference.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20241119【OSDI24】- USHER:Holistic Interference Avoidance for Resource Optimized ML Inference.pdf
--------------------------------------------------------------------------------
/paper-reviews/20241119【arXiv】-Managing Bandwidth The Key to Cloud-Assisted Autonomous Driving.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20241119【arXiv】-Managing Bandwidth The Key to Cloud-Assisted Autonomous Driving.pdf
--------------------------------------------------------------------------------
/paper-reviews/20241126 【SC24】 -SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20241126 【SC24】 -SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing.md
--------------------------------------------------------------------------------
/paper-reviews/20241126【ICPP24】-Online Scheduling and Pricing for Multi-LoRA Fine-Tuning Tasks.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20241126【ICPP24】-Online Scheduling and Pricing for Multi-LoRA Fine-Tuning Tasks.pdf
--------------------------------------------------------------------------------
/paper-reviews/20241203 【SOCC24】 -AutoBurst Autoscaling Burstable Instances for Cost-effective Latency SLOs.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20241203 【SOCC24】 -AutoBurst Autoscaling Burstable Instances for Cost-effective Latency SLOs.pdf
--------------------------------------------------------------------------------
/paper-reviews/20241203【Eurosys24】- Optimus :Warming Serverless ML Inference via Inter-Function Model Transformation.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20241203【Eurosys24】- Optimus :Warming Serverless ML Inference via Inter-Function Model Transformation.pdf
--------------------------------------------------------------------------------
/paper-reviews/20241210 【NSDI24】 -Jolteon: Unleashing the Promise of Serverless for Serverless Workflows.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20241210 【NSDI24】 -Jolteon: Unleashing the Promise of Serverless for Serverless Workflows.md
--------------------------------------------------------------------------------
/paper-reviews/20241210 【SOCC24】 -Kale Elastic GPU Scheduling for Online DL Model Training.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20241210 【SOCC24】 -Kale Elastic GPU Scheduling for Online DL Model Training.pdf
--------------------------------------------------------------------------------
/paper-reviews/20241217【MobiCom】-Delta A Cloud-assisted Data Enrichment Framework for On-Device Continual Learning.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20241217【MobiCom】-Delta A Cloud-assisted Data Enrichment Framework for On-Device Continual Learning.pdf
--------------------------------------------------------------------------------
/paper-reviews/20241217【SOCC24】- Near-Lossless Gradient Compression for Data-Parallel Distributed DNN Training.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20241217【SOCC24】- Near-Lossless Gradient Compression for Data-Parallel Distributed DNN Training.pdf
--------------------------------------------------------------------------------
/paper-reviews/20241224 【CIKM24】 -PISeL Pipelining DNN Inference for Serverless Computing.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20241224 【CIKM24】 -PISeL Pipelining DNN Inference for Serverless Computing.md
--------------------------------------------------------------------------------
/paper-reviews/20241224 【OSDI24】 -DistServe- Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20241224 【OSDI24】 -DistServe- Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving.pdf
--------------------------------------------------------------------------------
/paper-reviews/20241231【MobiCom】-FlexNN Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20241231【MobiCom】-FlexNN Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices.pdf
--------------------------------------------------------------------------------
/paper-reviews/20250107【SOSP24】- PowerInfer Fast Large Language Model Serving with a Consumer-grade GPUTraining.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20250107【SOSP24】- PowerInfer Fast Large Language Model Serving with a Consumer-grade GPUTraining.pdf
--------------------------------------------------------------------------------
/paper-reviews/20250114【arXiv】 -CARASERVE CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20250114【arXiv】 -CARASERVE CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference.pdf
--------------------------------------------------------------------------------
/paper-reviews/20250218【ATC24】- Power-aware Deep Learning Model Serving with μ-Serve.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20250218【ATC24】- Power-aware Deep Learning Model Serving with μ-Serve.pdf
--------------------------------------------------------------------------------
/paper-reviews/20250311【SOCC24】- Queue Management for SLO-Oriented Large Language Model Serving.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20250311【SOCC24】- Queue Management for SLO-Oriented Large Language Model Serving.pdf
--------------------------------------------------------------------------------
/paper-reviews/20250401【NSDI25】- SuperServe Fine-Grained Inference Serving for Unpredictable Workloads.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/20250401【NSDI25】- SuperServe Fine-Grained Inference Serving for Unpredictable Workloads.pdf
--------------------------------------------------------------------------------
/paper-reviews/2025318 【ASPLOS25】-Forecasting GPU Performance for Deep Learning Training and Inference.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icloud-ecnu/paper-reading-list/HEAD/paper-reviews/2025318 【ASPLOS25】-Forecasting GPU Performance for Deep Learning Training and Inference.pdf
--------------------------------------------------------------------------------