The computer industry is at a significant crossroads. Constrained by heat and power usage, today's computing devices are built from processors with a growing number and variety of cores (CPU, GPU, FPGA, and DSP) that offer little or no increase in clock speed per core. This new computing era poses a twofold challenge in building software: how to expose the parallelism in the software without much hassle, and how to fully exploit the performance of all the processing elements. Our research addresses these challenges by building parallel programming models and runtimes that increase productivity and achieve high-performance, energy-efficient execution.

Why do we need HPC?

News

2023

» Vanshika Jain joined the lab as a PhD student.

» Project proposal accepted by Shell R&D.

» Dr. Vivek will be teaching the course Operating System in the Monsoon 2023 semester.

2022

» Poster accepted in the HiPC conference.

» Dr. Vivek will be teaching a new course Parallel Runtimes for Modern Processors.

» Sunil Kumar selected for the Google PhD Fellowship in the Systems and Networking category. He was one of the seven awardees in India.

2021

» Paper accepted in the EduHiPC workshop.

» Hardik's BTP received an appreciation letter from IIIT-Delhi.

» Our SC '21 paper is one of the five Best Reproducibility Advancement Finalist papers.

» Sunil shortlisted for SIGHPC travel grant (only six awardees worldwide).

» Dr. Vivek Kumar co-organizing the PPEE workshop at the HiPC conference.

» Sunil shortlisted as a student volunteer at the SC conference.

» Our SC paper received badges for artifact available, artifact functional, and results reproduced.

» Sunil Kumar joined HiPeC Lab as a PhD student.

» Paper accepted in the SC conference.

2020

» Paper accepted in the HiPC conference.

2019

» Dr. Vivek Kumar delivered an invited talk at the IndoSys'19.

» Paper accepted in the EuroPAR conference.

» Paper accepted in IWOMP workshop. This work was in collaboration with Texas Instruments (Sugar Land, USA).

2018

» Dr. Vivek Kumar presented a tutorial on HClib at the HiPC conference.

» Abhiprayah Tiwari successfully defended his MTech Thesis titled "High Performance and Energy Optimal Parallel Programming on CPU and DSP based MPSoC".

2017

» Thanks to Texas Instruments for donating EVMK2H development board.

» Paper in the AsHES workshop.

» Dr. Vivek joined IIIT-Delhi as an Assistant Professor.

Research Papers

[HiPC SRS 2022]: S. Kumar, V. Kumar, and S. Bhalachandra, "Energy-Efficient Execution of Multicore Parallel Programs under Limited Power Budget", in 29th IEEE International Conference on High Performance Computing, Data, and Analytics Student Research Symposium (HiPC SRS), Bangalore, India, December 2022.

[EduHiPC 2021]: V. Kumar, "Teaching High Productivity and High Performance in an Introductory Parallel Programming Course", in Proceedings of the 28th International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW), Bangalore, India, December 2021.

[SC 2021]: S. Kumar, A. Gupta, V. Kumar, and S. Bhalachandra, "Cuttlefish: Library for Achieving Energy Efficiency in Multicore Parallel Programs", in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'21), St. Louis, MO, USA, November 2021.

[HiPC 2020]: V. Kumar, "PufferFish: NUMA-Aware Work-stealing Library using Elastic Tasks", in 27th IEEE International Conference on High Performance Computing, Data, and Analytics, Pune, India, December 2020.

Due to the challenges in providing adequate memory access to many cores on a single processor, multi-die and multi-socket multicore systems are becoming mainstream. These systems offer cache-coherent Non-Uniform Memory Access (NUMA) across several memory banks and cache hierarchies to increase memory capacity and bandwidth. Random work-stealing is a widely used technique for dynamic load balancing of tasks on multicore processors. However, it scales poorly on such NUMA systems for memory-bound applications due to cache misses and remote memory access latency. Hierarchical Place Tree (HPT) is a popular approach for improving the locality of a task-based parallel programming model, although it requires the programmer to map the dynamically unfolding tasks evenly over a NUMA system. Specifying data-affinity hints provides a more natural way to map the tasks than HPT. Still, a scalable work-stealing implementation of this approach is mostly unexplored for modern NUMA systems.
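As a rough illustration of the random work-stealing technique mentioned above, the following is a minimal sequential simulation in Python. It is only a sketch under stated assumptions, not the scheduler used in HClib, CilkPlus, or PufferFish: the names `run_work_stealing` and `make_tree_task` are hypothetical, workers take turns in rounds instead of running in parallel, and NUMA effects are not modelled. Each worker owns a double-ended queue, takes its own newest task from one end, and an idle worker steals the oldest task from a randomly chosen victim.

```python
import random
from collections import deque


def run_work_stealing(num_workers, root_tasks, seed=0):
    """Simulate random work-stealing, one scheduling step per worker per round.

    Each task is a callable that receives a `spawn` function and may call it
    to create child tasks. Returns (tasks_executed, successful_steals).
    """
    rng = random.Random(seed)
    deques = [deque() for _ in range(num_workers)]
    # All initial work starts on worker 0, so the other workers must steal.
    deques[0].extend(root_tasks)
    executed = 0
    steals = 0

    while any(deques):
        for wid in range(num_workers):
            own = deques[wid]
            if own:
                # The owner takes its newest task (LIFO end).
                task = own.pop()
            else:
                # An idle worker picks a random victim and takes its oldest task (FIFO end).
                victim = rng.choice([v for v in range(num_workers) if v != wid])
                if not deques[victim]:
                    continue  # failed steal attempt; try again next round
                task = deques[victim].popleft()
                steals += 1
            # Children spawned by the task land on the executing worker's deque.
            task(own.append)
            executed += 1
    return executed, steals


def make_tree_task(depth, visited):
    """Hypothetical workload: a task that spawns two children until depth 0."""
    def task(spawn):
        visited.append(depth)
        if depth > 0:
            spawn(make_tree_task(depth - 1, visited))
            spawn(make_tree_task(depth - 1, visited))
    return task


if __name__ == "__main__":
    visited = []
    executed, steals = run_work_stealing(4, [make_tree_task(4, visited)])
    print(f"executed {executed} tasks with {steals} steals")
```

The victim is chosen without regard to where the task's data lives, which is the property the abstract identifies as costly on NUMA systems.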

This paper presents PufferFish, a new async–finish parallel programming model and work-stealing runtime for NUMA systems that provides a close coupling of the data-affinity hints provided for an asynchronous task with the HPTs in the Habanero C/C++ library (HClib). PufferFish introduces Hierarchical Elastic Tasks (HETs) that improve locality by shrinking to run on a single worker inside a place or puffing up across multiple workers, depending on the work imbalance at a particular place in an HPT. We use a set of widely used memory-bound benchmarks exhibiting regular and irregular execution graphs to evaluate PufferFish. On these benchmarks, we show that PufferFish achieves geometric mean speedups of 1.5× and 1.9× over the HPT implementation in HClib and random work-stealing in CilkPlus, respectively, on a 32-core NUMA AMD EPYC processor.
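For readers unfamiliar with the async–finish model named above, the following Python sketch shows the general idea: tasks are spawned asynchronously and never wait on their children, while an enclosing finish scope blocks until every task spawned inside it, directly or transitively, has completed. This is an illustrative assumption-laden sketch and not the HClib or PufferFish API: the `Finish` class, its `async_` method, and `parallel_sum` are hypothetical names, and Python's `ThreadPoolExecutor` uses a shared queue, not work-stealing, hierarchical places, or elastic tasks.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Finish:
    """Scope that blocks on exit until all tasks spawned inside it are done."""

    def __init__(self, pool):
        self._pool = pool
        self._pending = 0
        self._cond = threading.Condition()

    def async_(self, fn, *args):
        # Count the task before submitting so the scope cannot close early.
        with self._cond:
            self._pending += 1
        self._pool.submit(self._run, fn, args)

    def _run(self, fn, args):
        # Exceptions raised by fn are not propagated in this sketch.
        try:
            fn(*args)
        finally:
            with self._cond:
                self._pending -= 1
                if self._pending == 0:
                    self._cond.notify_all()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Wait for every task spawned in this scope, including nested spawns.
        with self._cond:
            self._cond.wait_for(lambda: self._pending == 0)
        return False


def parallel_sum(data, pool, grain=4):
    """Sum a list by recursively splitting it into asynchronous tasks."""
    partials = []
    lock = threading.Lock()

    def visit(scope, lo, hi):
        if hi - lo <= grain:
            subtotal = sum(data[lo:hi])
            with lock:
                partials.append(subtotal)
        else:
            mid = (lo + hi) // 2
            # Children are spawned and never waited on by the parent task.
            scope.async_(visit, scope, lo, mid)
            scope.async_(visit, scope, mid, hi)

    with Finish(pool) as scope:
        scope.async_(visit, scope, 0, len(data))
    return sum(partials)


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(parallel_sum(list(range(100)), pool))
```

Because a parent task increments the pending count for its children before its own count is released, the finish scope cannot observe zero pending tasks while work is still unfolding.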

[IWOMP 2019]: V. Kumar, A. Tiwari, and G. Mitra, "HetroOMP: OpenMP for Hybrid Load Balancing Across Heterogeneous Processors", in 15th International Workshop on OpenMP, Lecture Notes in Computer Science (LNCS), Springer, Auckland, New Zealand, September 2019.

[EuroPar 2019]: V. Kumar, "Featherlight Speculative Task Parallelism", in 25th International European Conference on Parallel and Distributed Computing, Lecture Notes in Computer Science (LNCS), Springer, Göttingen, Germany, August 2019.

[IPDPSW 2017]: M. Grossman, V. Kumar, N. Vrvilo, Z. Budimlic, and V. Sarkar, "A Pluggable Framework for Composable HPC Scheduling Libraries", in Proceedings of IEEE International Parallel and Distributed Processing Symposium Workshops, ACM, Orlando, Florida, USA, May 2017.

Driven by the increasing diversity of current and future HPC hardware and software platforms, the HPC community has seen a dramatic increase in research and development efforts into the composability of discrete software systems. While modularity is often desirable from a software engineering, quality assurance, and maintainability perspective, the barriers between software components often hide optimization opportunities. Recent examples of work in composable HPC software include GPU-Aware MPI, OpenMP’s target directive, Lithe, HCMPI, and MVAPICH’s unified communication runtime. These projects all deal with breaking down the walls between software or hardware components in order to achieve performance, programmability, and/or portability gains. However, they also generally focus on composing only specific types of HPC software and have limited extensibility.

In this paper, we present work on using a pluggable API framework on top of a "generalized work-stealing" runtime to achieve composability of communication, accelerator, and other HPC libraries. We motivate this work by the increasing heterogeneity of HPC hardware, software, and applications, and note that as heterogeneity increases, many discrete software frameworks will need to cooperate within a single process. Our framework, called HiPER (a Highly Pluggable, Extensible, and Re-configurable scheduling framework for HPC), enables exactly this cooperation. We demonstrate the programmability improvements enabled by the HiPER framework through the use of novel APIs that reduce programmer burden. We also present performance studies demonstrating that, through unified and asynchronous scheduling of composed software systems, we can achieve performance improvements over hand-optimized benchmarks.

[IA³ 2016]: V. Kumar, K. Murthy, V. Sarkar, and Y. Zheng, "Optimized Distributed Work-Stealing", in Proceedings of the 6th International Workshop on Irregular Applications: Architectures and Algorithms, IA3, ACM, Salt Lake City, Utah, USA, November 2016 (co-located with SC16).

[PPPJ 2016]: V. Kumar, J. Dolby, and S. M. Blackburn, "Integrating Asynchronous Task Parallelism and Data-centric Atomicity", at The 13th International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools, PPPJ'16, Lugano, Switzerland, August 2016.

[OpenSHMEM 2016]: M. Grossman, V. Kumar, Z. Budimlic, and V. Sarkar, "Integrating Asynchronous Task Parallelism with OpenSHMEM", at The 3rd workshop on OpenSHMEM and Related Technologies, OpenSHMEM 2016, Baltimore, Maryland, USA, August 2016.

[HPEC 2015]: V. Kumar, A. Sbirlea, Z. Budimlic, D. Majeti and V. Sarkar, "Heterogeneous Work-stealing across CPU and DSP cores", at The 19th International Conference on High Performance Extreme Computing Conference, HPEC 2015, Waltham, MA, USA, September 2015.

[PGAS 2014]: V. Kumar, Y. Zheng, V. Cave, Z. Budimlic and V. Sarkar, "HabaneroUPC++: a Compiler-free PGAS Library", at The 8th International Conference on Partitioned Global Address Space Programming Models, PGAS 2014, Eugene, Oregon, October 2014.

[VEE 2014]: V. Kumar, S. M. Blackburn and D. Grove, "Friendly Barriers: Efficient Work-Stealing With Return Barriers", at The 10th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, VEE 2014, Salt Lake City, Utah, March 2014.

[OOPSLA 2012]: V. Kumar, D. Frampton, S. M. Blackburn, D. Grove, and O. Tardieu, "Work-Stealing Without The Baggage", in Proceedings of the 2012 ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages & Applications (OOPSLA 2012), Tucson, AZ, October 19-26, 2012. (selected for SIGPLAN Communications of ACM Research Highlights, 2013)

[VMIL 2012]: V. Kumar and S. M. Blackburn, "Faster Work-Stealing With Return Barriers", at The 6th workshop on Virtual Machines and Intermediate Languages, VMIL 2012, Tucson, AZ, October 2012.

[X10 2011]: V. Kumar, D. Frampton, D. Grove, O. Tardieu, and S. M. Blackburn, "Work-Stealing by Stealing States from Live Stack Frames of a Running Application", at X10'11 Workshop collocated with PLDI 2011, San Jose, CA, June 2011.

Members

Faculty

Vivek Kumar

Assistant Professor, IIIT-Delhi

Current Students

Sunil Kumar

Ph.D. candidate

(Funding: Google Ph.D. Fellowship)

Vanshika Jain

Ph.D. candidate

(Funding: Shell India Research Grant)

Soham Chitlangia

B.Tech. Thesis

Varun Prashar

B.Tech. Thesis

Aakarsh Jain

B.Tech. Thesis

Siddharth Nayak

B.Tech. project

Thesis Students (Alumni)

Raghav Gupta

B.Tech.

Sujay Raj

M.Tech.

Harsh Parikh

M.Tech.

Hardik Saini

B.Tech.

Akshat Gupta

B.Tech.

Sunil Kumar

B.Tech.

Aniansh Raj Singh

B.Tech.

Agamdeep Bains

B.Tech.

Anuj Singh

B.Tech.

Aamleen Ahmed

B.Tech.

Short Projects (Alumni)

Adarsh Raj Shivam

B.Tech.

Alind Khare

B.Tech.

Vaibhav Pande

M.Tech.

Dibya Gautam

B.Tech.

Vibhu Agrawal

B.Tech.

Manan Jain

B.Tech.

Viraj Parimi

B.Tech.

Varun Prashar

B.Tech.

Nikhil Hassija

B.Tech.

Rajat Mahey

M.Tech.

Gaurav Joshi

M.Tech.

Abhijeet Singh

B.Tech.

Abhiprayah Tiwari

M.Tech.

Courses Offered

Parallel Runtimes for Modern Processors (CSE513)

Fall 2022

Computing hardware is becoming more and more complex. Today and in the foreseeable future, performance will be delivered principally by increased hardware parallelism. Modern multicore processors scale to over one hundred cores, have wide vector units, maintain a complex memory hierarchy, and even share memory with accelerators such as GPUs. Conventional programming models using threads impose significant complexity: the programmer must organize code into multiple threads of control and balance work amongst threads to ensure proper utilization of computing resources. This shortcoming has driven the advent of parallel runtimes that assist the programmer by efficiently scheduling parallel tasks over the available resources. This course introduces the design and implementation of such a parallel runtime and explores the challenges in achieving performance and energy efficiency on modern processors. This course is offered at IIIT-Delhi for undergraduate and postgraduate students. Students interested in taking this course should have prior experience with C/C++ programming.

Advanced Programming (ACM CSEDU FDP)

Jan 2023, July 2023

The CSEDU program is for teachers of Computer Science in both Engineering (B.Tech/BE) and non-Engineering (B.Sc./BCA/MCA) programs. The aim is to improve the teaching capability of teachers in different subjects of Computer Science. AICTE has agreed to provide FDP credits for the subjects offered under CSEDU. IIIT-Delhi issues the certificate, and the certificate mentions that the program is supported by ACM India. The main goal of the Advanced Programming module is to prepare the participants to build programs using an object-oriented approach, reusable code design, test-driven development, and pattern-oriented design and implementation. The topics discussed in this module are the object-oriented paradigm, classes and objects, class relationships, interfaces, inheritance, polymorphism, defensive programming, unit testing, modelling techniques, design patterns, multithreading, event-driven programming, and tools for plagiarism detection.

Foundations of Parallel Programming (CSE502)

Spring 2017-2023

Multicore processors are ubiquitous. This is an unavoidable consequence of the breakdown of Dennard scaling, which has put a stop to hardware delivering ever faster sequential performance. Hence, it is essential to parallelize the software applications running on these multicore processors to achieve high performance. FPP introduces the fundamentals of parallel programming. It covers both the traditional approaches and new advancements in the area of parallel programming. A key aim of this course is to provide hands-on knowledge of parallel programming through writing parallel programs in the different programming models taught in this course. This course is offered in the spring semester at IIIT-Delhi for undergraduate and postgraduate students. Students interested in taking this course should have prior experience with C/C++ programming.

Advanced Programming (CSE201)

Fall 2017-2021

Advanced Programming is the successor to the Introduction to Programming course. The main goal of this course is to prepare students for the challenge of building large-scale programs with multiple functional components, some of which could be designed and implemented independently. The course uses Java to introduce students to the concepts of object orientation, reusable code design, test-driven development, programming to an application programming interface, and pattern-oriented program design and implementation. At the end of the course, students are expected to be able to work in teams to develop large application programs, starting from a reasonably well-defined application design with multiple independent components and well-defined interfaces.

Funding

» Research grant from Shell R&D (USD 53K).

» Google PhD fellowship for Sunil Kumar (USD 50K).

» Initiation research grant from IIIT-Delhi (INR 500K).

Gallery

Links

» Open sourced repositories of the software developed in the HiPeC lab are available here.

» Use this website for making reservations on HiPeC servers. VPN authentication required outside IIIT-Delhi.

» Go through the HiPeC Lab user manual if you are a new joiner. VPN authentication required outside IIIT-Delhi.