Programming Language Strategy

Submitted by cmurders on Mon, 12/14/2020 - 10:37

Most modern platforms have come to realize that every programming language has its own set of strengths and weaknesses, and that there is no one-size-fits-all language. Each should be used according to specific needs and requirements, with a dose of convenience factored in. The number of languages in use will only continue to grow as new and more specialized requirements emerge. However, this often forces an organization to support many more languages than is practical for its size, so that platform decisions end up being made on the basis of staffing options rather than functional or financial merit. A balance between cost and features will always have to be managed, but it is important to have a good idea of what a healthy, reasonably manageable ecosystem might include.

As an example, fast and precise algorithms that are well understood are often formalized into low-level, very efficient C++ libraries. A program loads these functions into its environment only as needed and puts them to work when required. This could include specialized tasks like text search, database access, audio conversion, data frame and array transformation operators, and data modeling for machine learning. These are the fundamental components that will be used to build larger frameworks supporting more complex operations. This is the toolbox of nuts and bolts that gets assembled into the engines that serve as the workhorses of a program. Often, these libraries are continually developed by specialists for tighter precision, faster operation, fewer errors, and better functionality.

The glue that juggles and manages these functions is a higher-level language that calls each one in a controlled manner, passing the output of one to the input of another. It also adds its own framework for user interaction, sanity checking, performance monitoring, and status reporting along the way. At this level of abstraction, the relative strengths and weaknesses of the candidate languages are largely subjective and quite personal. Personal comfort and fluency are really important for a healthy ecosystem to thrive, so the choice will always be a function of the organization and the programmers who participate.
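The glue role described above can be sketched in a few lines of Python. This is only an illustration: `run_pipeline`, `tokenize`, and `count_words` are hypothetical names, and the two small functions stand in for what would normally be calls into compiled libraries.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

# A step is a label plus a function that takes the previous step's output.
Step = Tuple[str, Callable[[Any], Any]]


def run_pipeline(data: Any, steps: List[Step],
                 report: Callable[[str], None] = print) -> Any:
    """Run each step in order, feeding the output of one into the next."""
    for name, func in steps:
        start = time.perf_counter()
        data = func(data)
        # Sanity check: a step that returns nothing has almost certainly failed.
        if data is None:
            raise ValueError(f"step {name!r} returned no data")
        elapsed = time.perf_counter() - start
        # Status and performance reporting along the way.
        report(f"{name}: finished in {elapsed:.4f}s")
    return data


# Hypothetical stand-ins for low-level library routines.
def tokenize(text: str) -> List[str]:
    return text.lower().split()


def count_words(tokens: List[str]) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1
    return counts


if __name__ == "__main__":
    steps = [("tokenize", tokenize), ("count", count_words)]
    print(run_pipeline("Dark sky dark site", steps))
```

The glue code knows nothing about how each step works internally. It only sequences the steps, checks them, and reports on them, which is why the choice of glue language can follow the team's fluency rather than raw performance.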

This factor really comes into play when there is a significant amount of prototyping happening. When new ideas and features, required changes, and extensions of current features are coming fast and furious, you need a team that can work efficiently in a familiar landscape. Modern development models stress quick deployment of a minimally functional deliverable, with layers of additional functionality then built up around that skeletal core. To do this, a programmer-friendly language is required for quick and dirty tests that can be tightened up for performance once the essential features are worked out. Code readability, inline documentation, and straightforward grammar are significant features that allow a team to coordinate overlapping tasks and contribute working code.

Python and R have significant overlap in their strengths, and their libraries are both full of valuable functions for data selection, transformation, and analysis. When the exact nature of the data is not well known, they provide simple frameworks for exploration and for trying different approaches quickly. In those cases, a very flexible and modular programming language is valuable. Dancing in the midst of these are things like SQL dialects, vendor- and platform-specific scripting languages, and dozens of related middleware packages. Obviously, the age of learning one programming language for a comfortable career is long gone.

Astronomy is decidedly multidisciplinary. The field includes optics, RF engineering, particle physics, chemistry, data processing, and a never-ending love for puzzle-solving. Each of these comes with its own set of math and data demands. Combine that with a large instrument that needs to be constantly examined, sanity checked, and adjusted for precision, and you have a tremendously wide array of gadgets and gizmos scattered across many systems, with each one driven by its own requirements, demands, and compromises.

As a manageable approach, limiting the overall number of programming languages used on a daily basis is a good first step. The same language should be used from bottom to top of any vertical silo of tasks if at all possible. Where that is not possible, a single API that unites the various parts and pieces will greatly simplify the administration and management of the system at large.
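As a rough sketch of what a single uniting API might look like, the Python below gives every subsystem one shared `status()` method and lets a small coordinator query them all in the same way. The class names `Mount`, `Camera`, and `Observatory` and their methods are assumptions made up for illustration, not part of any real control package.

```python
from typing import Dict, Optional, Protocol, Tuple


class Subsystem(Protocol):
    """The one contract every part of the system agrees to honor."""

    def status(self) -> str: ...


class Mount:
    """Hypothetical telescope mount wrapper."""

    def __init__(self) -> None:
        self.target: Optional[Tuple[float, float]] = None

    def slew_to(self, ra_deg: float, dec_deg: float) -> None:
        self.target = (ra_deg, dec_deg)

    def status(self) -> str:
        if self.target is None:
            return "idle"
        return f"pointing at RA {self.target[0]} Dec {self.target[1]}"


class Camera:
    """Hypothetical camera wrapper."""

    def __init__(self) -> None:
        self.frames_taken = 0

    def expose(self, seconds: float) -> None:
        self.frames_taken += 1

    def status(self) -> str:
        return f"{self.frames_taken} frames taken"


class Observatory:
    """Single entry point used for administration and monitoring."""

    def __init__(self, subsystems: Dict[str, Subsystem]) -> None:
        self.subsystems = subsystems

    def report(self) -> Dict[str, str]:
        # Every subsystem is queried the same way, whatever it does internally.
        return {name: part.status() for name, part in self.subsystems.items()}


if __name__ == "__main__":
    observatory = Observatory({"mount": Mount(), "camera": Camera()})
    print(observatory.report())
```

Each wrapper could sit on top of a driver written in a different language. The point of the sketch is that monitoring and administration only ever see the one shared interface.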

For example, on an optical telescope there are many systems working in tandem. The calibration, pointing, tracking, and photography systems all work together to ensure everything is lined up for an important observation. Sometimes the photography CCD is used to determine the exact pointing of the telescope, so that it can be moved from that position to a new target based on a database of object coordinates. The telescope may then track the target for 15 minutes, all the while taking long exposures at a predetermined pace. The complete series of photos is then stacked to build a composite image, which is adjusted and corrected for any artifacts of the telescope's optical characteristics. This can be largely automated with the right software platform API doing the coordination between the mount, camera, database, and photography software. A radio telescope will have a similar interplay of systems, but the signal received will need to be electronically recorded as a stream during the observation, and then analyzed in depth later on.
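The stacking step can be illustrated with a minimal Python sketch. It assumes the exposures are already aligned and have identical dimensions, and it combines them with a per-pixel median. That is one common choice because it suppresses one-off outliers such as a cosmic ray hit, but it is an assumption here and not necessarily what any given photography package does. The function name `stack_frames` is hypothetical.

```python
from statistics import median
from typing import List

# A frame is a grid of pixel values: a list of rows, each a list of numbers.
Frame = List[List[float]]


def stack_frames(frames: List[Frame]) -> Frame:
    """Combine aligned exposures into one composite using a per-pixel median."""
    if not frames:
        raise ValueError("at least one frame is required")
    rows, cols = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != rows or any(len(row) != cols for row in frame):
            raise ValueError("all frames must share the same dimensions")
    # For each pixel position, take the median across all exposures.
    return [
        [median(frame[r][c] for frame in frames) for c in range(cols)]
        for r in range(rows)
    ]


if __name__ == "__main__":
    # Three tiny 1x2 "exposures"; the 250 mimics a single-frame outlier.
    exposures = [[[10, 12]], [[11, 250]], [[12, 13]]]
    print(stack_frames(exposures))
```

A production pipeline would use an array library and handle alignment, calibration frames, and rejection of bad pixels. The sketch only shows the core idea of reducing many exposures to one composite.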

In all of these scenarios, the acquired photographic or radio frequency data needs to be processed, normalized, corrected, and ultimately quantified. That workflow is often made more complicated than necessary when individual tools are written in a variety of languages. To make matters more difficult administratively, the languages and even the tools themselves evolve over time, with bug fixes and updates that may change the way minor things are done. Any one of these changes could impact past and future data analysis results.
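One modest way to guard against silent version drift is to store a record of the software environment alongside every processed result. The Python sketch below uses only the standard library. The function name `environment_record` and the package names in the usage example are illustrative assumptions.

```python
import json
import platform
from importlib import metadata
from typing import Dict, List, Optional


def environment_record(packages: List[str]) -> Dict[str, object]:
    """Collect interpreter, platform, and package versions for a results log."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly so it is visible in an audit.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical list of packages an analysis depends on.
    record = environment_record(["numpy", "astropy"])
    print(json.dumps(record, indent=2))
```

Saved next to each output file, a record like this lets a later audit tell whether two results that disagree were produced by the same versions of the tools.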

All of this points to the importance of a solid curation strategy for the everyday compute and operating functions involved in an astronomy project. The layers will inevitably interact sooner or later, so having a strategic architecture defined and enforced will allow components to build on one another into a quality-controlled processing chain, resulting in science observations that can be examined, audited, and cross-checked independently.