Multiple catalogs of astronomical objects are needed for many reasons. The catalogs contain a wide range of information, from simple positional data to detailed spectrum, polarity, and flux data, and they cover many kinds of objects, including galaxies, nebulae, and stars at various stages of their lives. Many analysis or prospecting projects begin by lifting out facets of these catalogs and cross-correlating various values. Unfortunately, this seemingly simple process often involves multiple steps of selecting, calibrating, and correlating values from several catalogs.
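As a rough illustration of what correlating two catalogs by position can involve, here is a minimal sketch of a brute-force cross-match. The catalog entries, the names `angular_separation_deg` and `cross_match`, and the 2 arcsecond tolerance are all made up for the example; a real workflow would also need the calibration steps mentioned above and a faster search than a double loop.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form), inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def cross_match(cat_a, cat_b, max_sep_deg):
    """For each object in cat_a, find the nearest cat_b object within max_sep_deg.

    Each catalog is a list of (name, ra_deg, dec_deg) tuples.
    Brute force: fine for small samples, far too slow for full surveys.
    """
    matches = []
    for name_a, ra_a, dec_a in cat_a:
        best = None
        for name_b, ra_b, dec_b in cat_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= max_sep_deg and (best is None or sep < best[1]):
                best = (name_b, sep)
        if best is not None:
            matches.append((name_a, best[0], best[1]))
    return matches

# Hypothetical entries, invented purely for illustration.
cat_a = [("A1", 10.0, 20.0), ("A2", 150.0, -30.0)]
cat_b = [("B1", 10.0002, 20.0001), ("B2", 200.0, 5.0)]

matches = cross_match(cat_a, cat_b, max_sep_deg=2.0 / 3600.0)  # 2 arcsec
for name_a, name_b, sep in matches:
    print(f"{name_a} <-> {name_b}: {sep * 3600:.2f} arcsec")
```

Only A1 and B1 fall within the tolerance here, so the other two objects are left unmatched.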
Choosing the catalogs that cover the most features is no simple matter. Fortunately, most surveys are designed around some primary science goal. Specific wavelengths, positional data, angular velocity, spectrum values, flux curves, and many more features can be mined from the available public data. Catalog query services are available that do much of the heavy lifting, although their output is not always in the most helpful format. For quick, ad hoc investigations, these services are a good solution.
To take investigations to the next level, perhaps using machine learning, big data analysis, and other feature selection methods that require calculated parameters, the available interfaces come up a bit short. For that reason, I am assembling local copies of well-known catalogs. In their native formats, the catalogs vary in how they present their data, not just in table content but in file format as well. Some are FITS files, others are plain text, and still others have been reduced into other formats. This jumble of data cannot be reconciled without an extensive workflow of intermediate steps, linked together to produce a single answer to a fairly specific question. Such a workflow is, by its nature, not very modular.
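One way to make that workflow more modular is to give each native format its own small reader that emits the same record type, so that everything downstream sees a single shape of data. The sketch below is only an illustration under assumed layouts: the column names, the 10-character fixed-width fields, the `CatalogRecord` type, and the sample rows are all hypothetical, and FITS tables would need a dedicated reader (for example from astropy), which is not shown.

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class CatalogRecord:
    """Common record shape shared by all readers (hypothetical fields)."""
    source: str
    name: str
    ra_deg: float
    dec_deg: float

def read_csv_catalog(text, source):
    """Read a comma-separated catalog with assumed columns name, ra, dec."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        CatalogRecord(source, row["name"].strip(), float(row["ra"]), float(row["dec"]))
        for row in reader
    ]

def read_fixed_width_catalog(text, source):
    """Read a fixed-width catalog: three assumed 10-character columns."""
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        name = line[0:10].strip()
        ra_deg = float(line[10:20])
        dec_deg = float(line[20:30])
        records.append(CatalogRecord(source, name, ra_deg, dec_deg))
    return records

# Invented sample data standing in for two differently formatted catalogs.
CSV_SAMPLE = "name,ra,dec\nObjA,10.5,-5.25\nObjB,200.0,45.0\n"
FIXED_SAMPLE = "\n".join([
    "# name    ra_deg    dec_deg",
    f"{'ObjC':<10}{83.75:>10.4f}{22.5:>10.4f}",
    f"{'ObjD':<10}{310.125:>10.4f}{-12.0:>10.4f}",
])

combined = (
    read_csv_catalog(CSV_SAMPLE, "survey_a")
    + read_fixed_width_catalog(FIXED_SAMPLE, "survey_b")
)
for record in combined:
    print(record)
```

Adding another catalog then means writing one more reader, without touching the analysis code that consumes `CatalogRecord` objects.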
Enter the methods of data science and analysis. Basic ideas and methods now in common use for making sense of mountains of data can be applied here as well. It is time that Big Data took its place in the pipeline of astronomical data processing and analysis.