
Science Up To Par Project

Scientists are increasingly turning to interpreted languages such as Python, Perl, and R to implement their data analysis algorithms. While these languages permit rapid software development, their implementations often suffer performance problems that slow the scientific process. Source-level approaches to optimizing and parallelizing interpreted languages are problematic for two reasons: first, many of the dynamic language features common to these languages complicate the analyses needed for parallelization; and second, even where such analysis is possible, a language-specific approach means that each language needs its own parallelizing compiler and/or constructs, resulting in significant duplication of effort.
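To make these issues concrete, the hypothetical Python snippet below (illustrative only; the names and data are not from the project) shows the kind of per-element analysis loop scientists commonly write. Every iteration pays interpreter dispatch and dynamic-typing overhead, and because the type of each record is known only at run time, a source-level tool cannot easily prove the loop safe to optimize or parallelize.

    # Illustrative sketch only: a typical interpreted data-analysis loop,
    # not code from the Science Up To Par project.

    def score(record):
        # 'record' could be a dict, a tuple, or a user-defined object; its
        # concrete type is known only at run time, which complicates the
        # static analysis a source-level parallelizer would need.
        return record["signal"] / (record["noise"] + 1e-9)

    def analyze(records):
        results = []
        for rec in records:             # each iteration re-dispatches bytecode,
            results.append(score(rec))  # boxes floats, and performs dynamic lookups
        return results

    if __name__ == "__main__":
        data = [{"signal": float(i), "noise": 0.5} for i in range(1_000_000)]
        print(sum(analyze(data)))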

The Science Up To Par project is investigating a radically different approach to this problem: automatic parallelization at the machine-code level, using machine-code-level trace information and exploiting the computational patterns that typically occur in interpreted data analysis codes. The approach rests on the static and dynamic analysis of executables and the reconstitution of those executables into optimized, parallel executables. The key insight is that, with trace information, it should be possible to optimize away significant interpreter overhead and other dynamic features in a language-agnostic manner and to create parallelized executables for multicore architectures. If successful, this will allow scientists to continue developing in the programming environments that most conveniently support their scientific exploration, without paying the significant performance overheads currently associated with many such environments, and will enable on-node parallelism for evaluating and prototyping data analysis algorithms on large datasets.
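As a rough sketch of the target, the code below shows what the on-node parallelism described above would look like if a scientist wrote it by hand at the source level (the function names, data, and worker count are assumptions for illustration). The project's goal is to arrive at an equivalent parallel executable automatically, by analyzing and rewriting the machine code and its traces rather than the Python source.

    # Hedged sketch of the intended outcome, written manually at the source
    # level for illustration; the project aims to obtain comparable on-node
    # parallelism automatically at the machine-code level.
    from multiprocessing import Pool

    def score(record):
        return record["signal"] / (record["noise"] + 1e-9)

    def analyze_parallel(records, workers=4):
        # If analysis can show that iterations are independent, the per-element
        # work can be distributed across the cores of a single node.
        with Pool(processes=workers) as pool:
            return pool.map(score, records, chunksize=10_000)

    if __name__ == "__main__":
        data = [{"signal": float(i), "noise": 0.5} for i in range(1_000_000)]
        print(sum(analyze_parallel(data)))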


People


Publications

CFGExplorer: Designing a Visual Control Flow Analytics System around Basic Program Analysis Operations, Sabin Devkota and Katherine E. Isaacs, Computer Graphics Forum, 2018, (pdf), (BibTeX).

Enabling Specialization for Dynamic Programming Languages, Jon Stephens, The University of Arizona, August 2018, (BibTeX).

Handling Nested Parallelism, Load Imbalance, and Early Termination in an Orbital Analysis Code, Benjamin James Gaska, Neha Jothi, Mahdi Soltan Mohammadi, Kat Volk, and Michelle Mills Strout, Tech report, arXiv:1707.09668, July, 2017, (pdf), (BibTeX).

Language-Agnostic Optimization and Parallelization for Interpreted Languages, Michelle Mills Strout, Saumya Debray, Katherine E. Isaacs, Barbara Kreaseck, Julio Cardenas-Rodriguez, Bonnie Hurwitz, Kat Volk, Sam Badger, Jesse Bartels, Ian Bertolacci, Sabin Devkota, Anthony Encinas, Ben Gaska, Brandon Neth, Theo Sackos, Jon Stephens, Sarah Willer, and Babak Yadegari, Blue Sky paper in The 30th International Workshop on Languages and Compilers for Parallel Computing (LCPC), October, 2017, (BibTeX).

Analyzing Parallel Programming Models for Magnetic Resonance Imaging, Forest Danford, Eric Welch, Julio Cardenas-Rodriguez, and Michelle Mills Strout, The 29th International Workshop on Languages and Compilers for Parallel Computing (LCPC), September, 2016, (pdf), (BibTeX).