Science Up To Par Project
Scientists are increasingly turning to interpreted languages, such as Python, Perl and R, to implement their data analysis algorithms. While such languages permit rapid software development, their implementations often experience performance issues that slow down the scientific process. Source-level approaches for optimization and parallelization of interpreted languages are problematic for two reasons: first, many of the language features common to these languages can be challenging for the kinds of analyses needed for parallelization; and second, even where such analysis is possible, a language-specific approach implies that each language would need its own parallelizing compiler and/or constructs, resulting in significant duplication of effort.
The Science Up To Par project is investigating a radically different approach to this problem: automatic parallelization at the machine-code level, using machine-code-level trace information and leveraging the computational patterns that typically occur in interpreted data analysis codes. Accomplishing this will rely on static and dynamic analysis of executables and the reconstitution of such executables into optimized, parallel executables. The key insight is that, with trace information, it should be possible to optimize out significant interpreter overhead and other dynamic features in a language-agnostic manner and create parallelized executables for multicore architectures. If successful, this can allow scientists to continue to develop in the programming environments that most conveniently support their scientific exploration without paying the significant performance overheads currently associated with many such environments. This should enable the use of on-node parallelism to evaluate and prototype data analysis algorithms on large datasets.
People
- Michelle Strout (PI)
- Saumya Debray (Co-PI)
- Kate Isaacs (Co-PI)
- Jon Stephens (AMP graduate, now PhD student at UT Austin)
- Sabin Devkota (PhD Student)
- Wei He (PhD Student)
- Brandon Neth (URA)
- Riley Campillo (URA)
- Hunter McNenny (URA)
- Ben Schroeder
Publications
CFGExplorer: Designing a Visual Control Flow Analytics System around Basic Program Analysis Operations, Computer Graphics Forum, 2018, (pdf), (BibTEX).
Enabling Specialization for Dynamic Programming Languages, The University of Arizona, August 2018, (BibTEX).
Handling Nested Parallelism, Load Imbalance, and Early Termination in an Orbital Analysis Code, Tech report, arXiv:1707.09668, July, 2017, (pdf), (BibTEX).
Language-Agnostic Optimization and Parallelization for Interpreted Languages, Blue Sky paper in The 30th International Workshop on Languages and Compilers for Parallel Computing (LCPC), October, 2017, (BibTEX).
Analyzing Parallel Programming Models for Magnetic Resonance Imaging, The 29th International Workshop on Languages and Compilers for Parallel Computing (LCPC), September, 2016, (pdf), (BibTEX).