The Hubble Source Catalog

R. L. White (rlw[at]stsci.edu)

The Hubble Source Catalog (HSC) is designed to facilitate science from the Hubble Space Telescope by combining 48,000 visit-based source lists from the Hubble Legacy Archive (HLA) into a single master catalog. The HSC is based on publicly available images obtained with the WFPC2, ACS/WFC, WFC3/UVIS and WFC3/IR instruments. The catalog provides photometric and astrometric information for 540 million detections of 108 million astronomical objects. The measurements are cross-matched and linked in the database to enable rapid exploration of the morphology, colors, photometry and astrometry (including time variations in all these quantities) via a variety of user interfaces. The data used for the HSC span nearly 24 years, from the beginning of WFPC2 observations on 1993 December 29 through the end of data that were publicly available as of 2017 October 1. The HSC is an excellent resource for either quick explorations or deep dives into the Hubble data.

Photometry and the Hubble Catalog of Variables

The current HSC version 3 release (2018 July 5) includes significant improvements to the photometry in the catalog. Most of those improvements were the direct result of work by the Hubble Catalog of Variables (HCV) project. The HCV was a 4-year, ESA-funded project at the National Observatory of Athens (PI: Alceste Bonanos) that recently culminated with the release of a major new high-level science product at both MAST and the ESAC Science Data Centre. The HCV team developed a sophisticated catalog-processing pipeline that analyzed all HSC objects with at least five repeated measurements in the same camera/filter to identify candidate variable sources.

The HCV project began its analysis using version 2 of the HSC. The team identified a number of issues in the HSC photometry, including larger photometric errors near image edges (traced to errors in the SourceExtractor background computation) and biases in photometry due to misaligned exposures. In response, the HLA project at STScI made improvements to our image and catalog generation pipeline and reprocessed all the ACS and WFC3 images for HLA DR10. Those improvements were incorporated in HSC version 3 and led to improved photometric accuracy and repeatability. The released version of the HCV was based on HSCv3.

The HCV is the first homogeneous catalog of variable sources found in the HSC. It includes variable stars in our Galaxy and nearby galaxies, as well as transients such as novae and supernovae and variable active galactic nuclei. The HCV contains 84,428 candidate variable sources (out of 3.7 million HSC sources that were searched for variability) with V <= 27 mag; for 11,115 of them the variability is detected in more than one filter. The number of data points in a light curve ranges from 5 to 120, and the time baseline ranges from under a day to over 15 years. The released data includes both variable objects and objects identified as "constant."
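To make the selection step concrete, here is a minimal sketch of how repeated measurements might be grouped into light curves and screened for excess scatter. It is not the HCV pipeline, which is described in Bonanos et al. (2019). The only element taken from the text is the requirement of at least five measurements in the same instrument/filter. The median absolute deviation statistic, the fixed threshold, the function names, and the example magnitudes are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

MIN_POINTS = 5  # at least five repeated measurements per instrument/filter


def mad(values):
    """Median absolute deviation, a robust measure of scatter."""
    mid = median(values)
    return median(abs(v - mid) for v in values)


def group_light_curves(detections, min_points=MIN_POINTS):
    """Group (object_id, filter, magnitude) rows and keep well-sampled curves."""
    curves = defaultdict(list)
    for object_id, filter_name, magnitude in detections:
        curves[(object_id, filter_name)].append(magnitude)
    return {key: mags for key, mags in curves.items() if len(mags) >= min_points}


def flag_candidates(curves, threshold):
    """Return keys whose scatter exceeds a hypothetical fixed threshold (mag)."""
    return sorted(key for key, mags in curves.items() if mad(mags) > threshold)


# Made-up detections: one quiet star, one variable star, one undersampled star.
detections = (
    [(1, "ACS_F814W", m) for m in (20.00, 20.01, 19.99, 20.00, 20.02)]
    + [(2, "ACS_F814W", m) for m in (21.0, 21.4, 20.7, 21.3, 20.8, 21.1)]
    + [(3, "ACS_F606W", m) for m in (22.0, 22.5, 21.5)]  # only three points
)
curves = group_light_curves(detections)
candidates = flag_candidates(curves, threshold=0.1)
print(candidates)
```

A real search would need a threshold that depends on magnitude and on the noise properties of each field, which is one reason a fixed cut like the one above is only a starting point.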

The HCV is fully integrated with HSCv3 in the MAST interfaces, including the HSC CasJobs and VO TAP database query interfaces and the MAST catalogs simple form interface and query API. See below for further information on these interfaces.  The ESAC Science Data Centre has also created the HCV Explorer, a new online web tool to access, visualize, and interactively explore the HCV.

Note that the HCV includes improved magnitudes with local corrections for objects that were searched for variability, so this data may also be useful for projects that are not primarily focused on variability. See the HCV journal paper (Bonanos et al., 2019, A&A, 630, A92) and the MAST High-Level Science Product page for more details on the project.

Figure 1: A sampling of HCV light curves from Figure 9 of Bonanos et al. (2019).

Astrometry in the SWEEPS field

The recent HSC version 3.1 release has also introduced improvements in the astrometric measurements in the HSC. HSCv3.1 provides proper motions of over 400,000 objects in the augmented Sagittarius Window Eclipsing Extrasolar Planet Search (SWEEPS) HST field. This field is within a few degrees of the Galactic center, and most of the stars belong to the Galactic bulge. The field has been observed by ACS and WFC3 with a time baseline as long as 12 years.  

The astrometry of this field is determined by cross-matching HLA source lists to Gaia DR2. The overall median proper motion error is about 0.8 mas/yr and drops to about 0.2 mas/yr for objects with measurements over time baselines longer than 7 years. The catalog reaches to about AB mag 25 in the ACS F814W filter and about mag 27 for the ACS F606W filter. Note that this accurate astrometric catalog is far deeper than the Gaia catalog, enabling many interesting science projects. It is likely the largest publicly available set of proper motions that extends to such faint objects.
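The dependence of the proper motion error on time baseline can be illustrated with a simple straight-line fit of position against epoch. The sketch below is not the algorithm used for HSCv3.1, which works from source lists cross-matched to Gaia DR2. It is an unweighted least-squares fit in one coordinate, with a hypothetical function name and made-up offsets, meant only to show why a longer baseline gives a smaller formal error for the same measurement scatter.

```python
def fit_proper_motion(epochs_yr, offsets_mas):
    """Unweighted least-squares fit of offset = intercept + slope * epoch.

    Returns (slope in mas/yr, formal uncertainty of the slope in mas/yr).
    """
    n = len(epochs_yr)
    if n < 3:
        raise ValueError("need at least three epochs to estimate an uncertainty")
    t_mean = sum(epochs_yr) / n
    x_mean = sum(offsets_mas) / n
    s_tt = sum((t - t_mean) ** 2 for t in epochs_yr)
    if s_tt == 0:
        raise ValueError("all epochs are identical")
    pairs = list(zip(epochs_yr, offsets_mas))
    slope = sum((t - t_mean) * (x - x_mean) for t, x in pairs) / s_tt
    intercept = x_mean - slope * t_mean
    residual_ss = sum((x - intercept - slope * t) ** 2 for t, x in pairs)
    slope_err = (residual_ss / (n - 2) / s_tt) ** 0.5
    return slope, slope_err


# Made-up data: true motion of 2 mas/yr with the same scatter at each epoch,
# sampled over a 2-year baseline and over an 8-year baseline.
scatter = [0.5, -1.0, 0.5]
short_epochs = [0.0, 1.0, 2.0]
long_epochs = [0.0, 4.0, 8.0]
short_pm, short_err = fit_proper_motion(
    short_epochs, [2.0 * t + r for t, r in zip(short_epochs, scatter)])
long_pm, long_err = fit_proper_motion(
    long_epochs, [2.0 * t + r for t, r in zip(long_epochs, scatter)])
print(short_pm, short_err)
print(long_pm, long_err)
```

Both fits recover the same slope, but the formal error on the 8-year baseline is four times smaller than on the 2-year baseline, because the slope uncertainty scales inversely with the spread of the epochs.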

The proper motion information is available within the HSC CasJobs and VO TAP database query interfaces and the MAST catalogs simple form interface and query API.  There are Jupyter notebooks with some science use cases demonstrating how to access the data from Python via several of the interfaces. See below for further information on these interfaces.

This work on the SWEEPS field is a prototype for the next version of the Hubble Source Catalog. We plan to extend these same algorithms to all fields in the HSC that have adequate time coverage for the measurement of proper motions.

Figure 2: Mean longitude proper motion as a function of color and magnitude in SWEEPS.

New (and old) interfaces to the HSC

There are several different interfaces available to access the data in the HSC. You can choose the interface that is best suited for your problem. For example, if you want to quickly check the HSC contents while you’re on an observing run, use the simple HSC search form or the MAST portal. If you are comfortable with SQL and want to do large or complex queries, check out the CasJobs interface or the VO TAP Queries service. The catalogs API supports a SQL-free (but still powerful) approach for complex queries. Both the MAST portal and the catalogs search form provide the ability to cross-match the HSC with a list of positions.

The HSC CasJobs interface will be familiar to users of the Sloan Digital Sky Survey. The MAST version uses software developed at Johns Hopkins University to allow SQL queries in a web browser. There is local user storage for data in your MyDB database, and queries can run for hours if necessary. While the MAST CasJobs interface is not new, a recently developed Python module, mastcasjobs, makes it simple to query the HSC and other MAST databases directly from Python.

The VO TAP database query interface is another option for SQL users. It is limited to somewhat smaller queries than CasJobs, but a significant advantage is that it can be accessed directly from VO-aware tools such as TOPCAT.  In TOPCAT, use the VO->Table Access Protocol menu and search for HSC TAP to find the HSC database. TOPCAT provides a nice interface for browsing the tables in a database and for examining and plotting query results.

The new HSC catalog interface, released 2019 August 15, will be familiar to users of our Pan-STARRS catalog interface. The HSC search form provides a simple tool to search the catalog for objects near a sky position. The new interface features the ability to constrain searches on any catalog parameters. A single search page provides access to both the summary table (with mean magnitudes and positions for each object) and the detailed table (with full time-dependent data on the objects). It also allows access to both HSC version 2 and HSC version 3. (HSC v3 is preferred due to its improved astrometric and photometric accuracy.) This single page replaces four different forms from the previous MAST search pages. The new interface also provides easy access to the Hubble Catalog of Variables data tables.

The form interface relies on a fast API that is also easily used for scripted access from Python and other languages. See below for more details.
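As a rough illustration of scripted access, the sketch below builds a cone-search URL for the catalogs API using only the Python standard library. The base URL, the path layout, the parameter names, and the "numimages.gte" style of column constraint are assumptions modeled on the sample notebooks and should be checked against the current MAST documentation before use. The function name and the example position are hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the MAST catalogs API; check the MAST documentation.
HSC_API_BASE = "https://catalogs.mast.stsci.edu/api/v0.1/hsc"


def hsc_search_url(ra, dec, radius, table="summary", release="v3",
                   fmt="csv", constraints=None):
    """Build a cone-search URL (position and radius in degrees).

    `constraints` maps assumed column filters, e.g. {"numimages.gte": 10},
    to values; the column names and the ".gte" suffix are assumptions.
    """
    if release not in ("v2", "v3"):
        raise ValueError("the search form described above serves HSC v2 and v3")
    params = {"ra": ra, "dec": dec, "radius": radius}
    params.update(constraints or {})
    return f"{HSC_API_BASE}/{release}/{table}.{fmt}?{urlencode(params)}"


# Arbitrary example position, chosen only to show the URL layout.
url = hsc_search_url(16.2, 2.12, 0.1, constraints={"numimages.gte": 10})
print(url)
```

The resulting URL could then be fetched with any HTTP client and the CSV parsed with a standard table reader. Building the URL separately from the request makes it easy to inspect or log a query before sending it.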

While the MAST CasJobs interface can support larger and more complex queries of the database, for many users this new interface provides simpler access to the science-ready data in the Hubble Source Catalog.

Figure 3: HSC and HCV catalog search form.

Python Jupyter notebooks

We have created sample Python Jupyter notebooks that demonstrate the use of several of our interfaces for accessing the HSC and HCV data. 

Notebooks utilizing the catalogs API include searching for variable objects in IC 1613, creating a color-magnitude diagram with 760,000 objects from the Small Magellanic Cloud, and proper motion studies for half a million stars in the Galactic bulge SWEEPS field. An HCV notebook compares variables and non-variables in IC 1613, extracts a light curve for a nova in M87 and examines the associated HST images, explores the properties of all the HCV objects that have classifications from human expert examinations of the data, and selects a highly variable object from an all-sky search, based solely on its variable characteristics.

There are similar notebooks available that use CasJobs for the queries rather than the catalogs API. The corresponding CasJobs notebooks are available from links in the notebooks listed above. In some cases the additional capabilities of CasJobs allow more complex queries. For example, the CasJobs version of the SWEEPS proper motion notebook is able to access some color information that is not accessible via the API.

Note that several of these notebooks use the HLA image cutout service to retrieve images of the objects in the HSC. We particularly recommend the visual examination of images for projects using the Hubble Catalog of Variables. Selecting variable candidates with unusual properties is also a good way to discover artifacts in the HSC! For example, in fields with bright stars, measurements at certain orientations may be affected by diffraction spikes, leading to apparent but spurious variability. In many cases a quick visual examination of the images can confirm that an object's variability is real rather than the result of an artifact in the image. (These image cutouts were also found to be of key importance in the expert reviews that were done for a subset of the HCV variable candidates.)

Figure 4: HLA image cutouts of a high proper motion star in the SWEEPS field.

There are undoubtedly many discoveries to be made in the HSC, and the HSC measurements are also useful as a reference for comparison to other analyses of Hubble data. The HSC provides a quick summary of information from all available observations that can be a huge time saver compared with the alternative of re-reducing all the data yourself. We welcome questions and feedback about the HSC and HCV databases and interfaces at the Archive Help Desk.