Characterization and processing of seismic catalogues

Much of probabilistic seismic hazard analysis (PSHA) depends on the assumption that future seismicity will occur near observed past seismicity, and at rates that can be approximated by empirical or physical models. Thus, the early steps in PSHA include compiling and processing an earthquake catalogue. Beyond collecting instrumental and historical earthquake records, catalogues must be homogenized (expressed in uniform units), declustered (stripped of aftershocks and foreshocks), and filtered for completeness. The modeler should understand the assumptions and uncertainties in the catalogue well.

Most source types used in hazard models built by the GEM Secretariat use magnitude-frequency distributions (MFDs) based on seismicity. Together with ground motion prediction equations (GMPEs), MFDs govern the computed hazard levels for the time frames of interest, so their robust calculation, and therefore careful preparation of the input catalogue, is critical.

Here, we describe the ISC-GEM extended catalogue (Weatherill et al., 2016), which contributes the majority of earthquakes used in hazard models built internally by GEM; the workflow for combining other earthquake records with the ISC-GEM catalogue; and the remaining steps to prepare the catalogue for rate and spatial analysis. We emphasize that while most of these steps are routinely applied outside of GEM models, the following explanations describe only our own best practices.

The ISC-GEM catalogue

The ISC-GEM catalogue is a compilation of earthquake bulletins for seismicity that occurred between 1900 and 2015. For each event, it draws on records from numerous agencies and keeps the record deemed most accurate, ensuring that no duplicates are included and that magnitudes are homogenized to MW. The most recent updates were completed by Weatherill et al. (2016) using the GEM Catalogue Toolkit, yielding 562 840 earthquakes with MW 2.0 to 9.6; this version is herein called the ISC-GEM extended catalogue. It was motivated by initiatives to improve regional- and global-scale seismicity analyses, both for hazard and for other purposes.

Regional models developed by the GEM Secretariat use the ISC-GEM extended catalogue, augmented by data from local agencies when possible.
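When local bulletins are added, events already present in the ISC-GEM extended catalogue must not be counted twice. Below is a minimal sketch of such a merge, assuming that two records are duplicates when their origin times and epicentres fall within fixed tolerances. The `Event` class, the `merge_catalogues` function, and the 30 s / 50 km tolerances are hypothetical illustrations, not the GEM Catalogue Toolkit's API or settings; a real workflow would also compare magnitudes and choose the preferred record more carefully.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    event_id: str
    agency: str
    time: datetime
    lon: float
    lat: float
    mag: float


def epicentral_distance_km(lon1, lat1, lon2, lat2):
    """Haversine great-circle distance on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def merge_catalogues(primary, secondary,
                     max_dt=timedelta(seconds=30), max_dist_km=50.0):
    """Append secondary events that have no counterpart in the primary catalogue.

    A secondary event is a duplicate if some primary event lies within max_dt
    in time and max_dist_km in epicentral distance; the primary record is then
    kept. Duplicates *within* `secondary` are not checked in this sketch.
    """
    merged = list(primary)
    for ev in secondary:
        duplicate = any(
            abs(ev.time - ref.time) <= max_dt
            and epicentral_distance_km(ev.lon, ev.lat, ref.lon, ref.lat) <= max_dist_km
            for ref in primary
        )
        if not duplicate:
            merged.append(ev)
    return sorted(merged, key=lambda e: e.time)


# Usage: local record "b" duplicates "a" (10 s, ~8 km apart); "c" is new.
t0 = datetime(2000, 1, 1)
isc_gem = [Event("a", "ISC-GEM", t0, 10.0, 45.0, 5.0)]
local = [
    Event("b", "LOCAL", t0 + timedelta(seconds=10), 10.1, 45.0, 4.9),
    Event("c", "LOCAL", t0 + timedelta(days=1), 10.0, 45.0, 3.0),
]
print([e.event_id for e in merge_catalogues(isc_gem, local)])  # ['a', 'c']
```

Keeping the primary record on a match reflects the preference for the ISC-GEM extended catalogue stated above; a different preference order would only change which record survives.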

GEM Historical Earthquake Catalogue

The GEM Historical Earthquake Catalogue (Albini et al., 2014) includes large earthquakes (M>7) from before the instrumental period (1000-1903), each carefully reviewed to estimate its location and magnitude. The completeness of this catalogue varies widely across the globe, depending on how long each region has been inhabited and on the availability and quality of documentation of earthquakes in this period.

Processing of seismicity catalogues

Catalogue homogenization

In order to use the bulletins from multiple agencies together in statistical analyses, records must be homogenized to meet the same criteria, e.g., to use the same measure of magnitude. Usually, moment magnitude (MW) is selected because it does not saturate at large magnitudes. Magnitudes reported in other scales must therefore be converted to MW. When possible, this is done using empirical relations developed from independent local datasets; when too few calibration events are available, global relations are used instead.
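The choice between a local and a global conversion can be sketched as follows. This is a minimal illustration, assuming a linear relation Mw = a + b * M_other fitted by ordinary least squares; the function names, the `min_events=30` threshold, and the placeholder global coefficients are hypothetical and are not relations used by GEM. In practice, regression methods that account for uncertainty in both magnitude scales (e.g., orthogonal regression) are often preferred over ordinary least squares.

```python
def fit_linear_conversion(m_other, m_w):
    """Ordinary least-squares fit of Mw = a + b * M_other; returns (a, b)."""
    n = len(m_other)
    if n < 2 or n != len(m_w):
        raise ValueError("need two or more paired calibration magnitudes")
    mean_x = sum(m_other) / n
    mean_y = sum(m_w) / n
    sxx = sum((x - mean_x) ** 2 for x in m_other)
    if sxx == 0:
        raise ValueError("calibration magnitudes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(m_other, m_w))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def choose_conversion(m_other, m_w, global_coeffs, min_events=30):
    """Use a locally calibrated relation if enough paired events exist, else the global one."""
    if len(m_other) >= min_events:
        return fit_linear_conversion(m_other, m_w)
    return global_coeffs


def to_mw(mag, coeffs):
    """Apply a linear conversion (a, b) to a magnitude in the original scale."""
    a, b = coeffs
    return a + b * mag


# Hypothetical calibration pairs: too few for a local fit with min_events=30.
mb = [4.0, 5.0, 6.0]
mw = [4.5, 5.5, 6.5]
print(fit_linear_conversion(mb, mw))  # (0.5, 1.0)
coeffs = choose_conversion(mb, mw, global_coeffs=(0.0, 1.0))  # placeholder global relation
print(to_mw(5.0, coeffs))  # 5.0
```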

The homogenization methodology used to build the ISC-GEM extended catalogue is described in detail in Weatherill et al. (2016).

Completeness analysis

Catalogue completeness analysis accounts for the variability in instrumental coverage over the catalogue duration, acknowledging that any catalogue is missing earthquakes below some magnitude threshold. Filtering for completeness prevents rate analysis of an incomplete catalogue, a modeling mistake that would propagate into hazard estimates. Importantly, completeness analysis must be applied to a declustered catalogue so as not to confuse dependent earthquakes (such as aftershocks) with magnitude completeness.
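Once a completeness table is available, applying it reduces to a simple filter. The sketch below assumes the table is a list of `(start_year, Mc)` rows, read as "complete for M >= Mc since start_year"; the function names and the example table are hypothetical. It also returns the start of the complete period for a given magnitude, which sets the observation duration used when computing rates for that magnitude.

```python
def is_complete(year, mag, table):
    """True if (year, mag) falls in the complete part of the catalogue.

    `table` holds (start_year, mc) rows: complete for M >= mc since start_year.
    """
    return any(year >= y_c and mag >= m_c for y_c, m_c in table)


def filter_complete(events, table):
    """Keep only (year, mag) events inside the complete part of the catalogue."""
    return [(y, m) for y, m in events if is_complete(y, m, table)]


def completeness_start(mag, table):
    """Earliest year from which magnitude `mag` is complete, or None if never."""
    years = [y_c for y_c, m_c in table if mag >= m_c]
    return min(years) if years else None


# Hypothetical completeness table, for illustration only.
table = [(1964, 5.0), (1990, 4.0)]
events = [(1950, 6.0), (1970, 5.5), (1970, 4.5), (1995, 4.5), (1995, 3.9)]
print(filter_complete(events, table))  # [(1970, 5.5), (1995, 4.5)]
print(completeness_start(4.5, table))  # 1990
```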

Completeness algorithms that can be applied to any instrumental catalogue must depend on properties of the earthquakes rather than of the stations; they therefore focus on the statistics of the catalogue sample rather than on the probability that a station at a known position would record an earthquake. The most common algorithmic method is that of Stepp (1971), which compares the observed rate of seismicity with a predicted Poissonian rate for each magnitude and returns a spatially constant table of time-varying magnitude thresholds. Importantly, the validity of its results remains subject to the judgement of the user.
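The idea behind Stepp (1971) can be illustrated with a simplified sketch. For a stationary Poisson process with rate λ observed over T years, the standard deviation of the rate estimate is σ_λ = sqrt(λ / T), so σ_λ should decay as 1/sqrt(T) for as long as the catalogue is complete. The code below computes σ_λ for windows of increasing length ending at the catalogue end, takes the rate in the shortest window as the complete reference rate, and reports the longest window before σ_λ drops below the reference curve by more than a tolerance. The reference-window choice, the `tol` criterion, and the function names are assumptions made for illustration; this is not the OpenQuake implementation.

```python
import math


def rate_sigma(years, end_year, window):
    """Rate (events/yr) and its Poisson standard deviation sqrt(rate / T)
    for events in the last `window` years, i.e. in (end_year - window, end_year]."""
    n = sum(1 for y in years if end_year - window < y <= end_year)
    rate = n / window
    return rate, math.sqrt(rate / window)


def stepp_completeness_period(years, end_year, windows, tol=0.1):
    """Longest window (in years before end_year) consistent with a stationary rate.

    years: decimal origin years of events in ONE magnitude bin.
    The rate in the shortest window is assumed complete and defines the
    reference curve sqrt(rate_ref / T); the period ends at the first window
    whose sigma falls below (1 - tol) times that curve.
    """
    windows = sorted(windows)
    ref_rate, _ = rate_sigma(years, end_year, windows[0])
    if ref_rate == 0:
        raise ValueError("no events in the shortest window")
    period = None
    for t in windows:
        _, sigma = rate_sigma(years, end_year, t)
        if sigma < (1.0 - tol) * math.sqrt(ref_rate / t):
            break
        period = t
    return period


# Synthetic magnitude bin: one event per year since 1980, none before.
years = [1980.5 + i for i in range(40)]
print(stepp_completeness_period(years, 2020.0, [10, 20, 30, 40, 50, 60]))  # 40
```

Because σ_λ scales with the square root of the rate, a fixed `tol` on σ_λ is more lenient than the same tolerance on the rate itself; this is one reason the result still needs the user's judgement.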

The Stepp (1971) method is implemented in the OpenQuake Engine and is used in some steps of the modeling procedure for hazard models built by the GEM Secretariat. In other cases, we determine completeness manually from 3D histograms that count earthquakes in magnitude-time bins, visually identifying the times at which the occurrence rates stabilize.
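The magnitude-time counts used for manual inspection can be tabulated as below. This is a minimal, standard-library sketch with hypothetical function names and bin sizes (10-year and 0.5-magnitude bins); `numpy.histogram2d` produces the same kind of counts for plotting. Reading down a magnitude column, the completeness year for that magnitude is roughly the first time bin after which the counts per bin stabilize.

```python
import math
from collections import Counter


def magnitude_time_counts(events, year_bin=10, mag_bin=0.5):
    """Count (year, mag) events per cell, keyed by the cells' lower edges."""
    counts = Counter()
    for year, mag in events:
        y0 = int(math.floor(year / year_bin) * year_bin)
        # The small epsilon keeps magnitudes lying on a bin edge from being pushed
        # into the lower bin by floating-point division; round() tidies the edge value.
        m0 = round(math.floor(mag / mag_bin + 1e-9) * mag_bin, 6)
        counts[(y0, m0)] += 1
    return counts


def format_counts(counts):
    """Text table: one row per time bin, one column per magnitude bin."""
    years = sorted({y for y, _ in counts})
    mags = sorted({m for _, m in counts})
    lines = ["year/mag " + " ".join(f"{m:5.1f}" for m in mags)]
    for y in years:
        lines.append(f"{y:8d} " + " ".join(f"{counts.get((y, m), 0):5d}" for m in mags))
    return "\n".join(lines)


# Usage with a few hypothetical (year, magnitude) pairs.
events = [(1995, 4.3), (1996, 4.7), (2003, 5.0), (2004, 5.2)]
print(format_counts(magnitude_time_counts(events)))
```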

Catalogue declustering

Catalogue declustering is applied to isolate mainshocks, that is, earthquakes that occur independently of each other, from a complete catalogue. The resulting declustered catalogue should therefore reflect the Poissonian rate at which earthquakes occur within a broader tectonic region. PSHA aims to model the hazard from seismicity occurring at this background Poissonian rate.

Declustering algorithms identify mainshocks by comparing each earthquake with the "cluster" of earthquakes that occurred within a given distance and time of it, as defined by a set of magnitude-dependent "triggering windows", and choosing the largest event in the cluster as the mainshock. The theory of declustering algorithms is described in detail in Stiphout et al. (2012). The OpenQuake Hazard Modeler's Toolkit provides three windowing options: the original implementation of Gardner and Knopoff (1974), and the configurations of Uhrhammer (1986) and Gruenthal (see Stiphout et al., 2012).
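A simplified sketch of window-based declustering with the Gardner and Knopoff (1974) windows is shown below, using the window parameterization commonly reported after Stiphout et al. (2012): a distance of 10^(0.1238M + 0.983) km, and a duration of 10^(0.032M + 2.7389) days for M >= 6.5 or 10^(0.5409M - 0.547) days otherwise. Events are visited from largest to smallest, and each event still flagged as a mainshock marks the smaller events inside its window as dependent. The foreshock treatment (`fs_time_prop`), the `Quake` class, and the function names are assumptions of this sketch rather than the OpenQuake Toolkit's API; distances are epicentral, as noted in the text.

```python
import math
from dataclasses import dataclass


@dataclass
class Quake:
    t_days: float  # origin time, days since an arbitrary reference
    lon: float
    lat: float
    mag: float


def gardner_knopoff_window(mag):
    """(distance_km, duration_days) Gardner-Knopoff window for magnitude `mag`."""
    dist_km = 10 ** (0.1238 * mag + 0.983)
    if mag >= 6.5:
        days = 10 ** (0.032 * mag + 2.7389)
    else:
        days = 10 ** (0.5409 * mag - 0.547)
    return dist_km, days


def epicentral_km(a, b):
    """Haversine distance between two events' epicentres (sphere, R = 6371 km)."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dl = math.radians(b.lon - a.lon)
    h = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def decluster(quakes, window=gardner_knopoff_window, fs_time_prop=1.0):
    """Return one flag per event: True for mainshocks, False for dependents.

    Larger events are visited first; each event still flagged as a mainshock
    marks smaller-or-equal events inside its window as dependent. Aftershocks
    are searched over the full window duration after the event, foreshocks
    over fs_time_prop times that duration before it.
    """
    order = sorted(range(len(quakes)), key=lambda i: quakes[i].mag, reverse=True)
    is_main = [True] * len(quakes)
    for i in order:
        if not is_main[i]:
            continue  # dependent events do not open windows of their own
        qi = quakes[i]
        dist_km, days = window(qi.mag)
        for j, qj in enumerate(quakes):
            if j == i or not is_main[j] or qj.mag > qi.mag:
                continue
            dt = qj.t_days - qi.t_days
            if -fs_time_prop * days <= dt <= days and epicentral_km(qi, qj) <= dist_km:
                is_main[j] = False
    return is_main


catalogue = [
    Quake(0.0, 0.0, 0.0, 6.0),     # mainshock
    Quake(10.0, 0.1, 0.0, 4.0),    # ~11 km, 10 days later: aftershock
    Quake(10.0, 5.0, 0.0, 4.0),    # ~556 km away: independent
    Quake(2000.0, 0.0, 0.0, 4.0),  # ~5.5 years later: independent
    Quake(-5.0, 0.0, 0.05, 4.5),   # ~6 km, 5 days before: foreshock
]
print(decluster(catalogue))  # [True, False, True, True, False]
```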

In subduction zones or other complex environments, we first classify the seismicity by tectonic domain (described below). We then decluster groups of domains within which we expect seismicity to interact (e.g., interface mainshocks can trigger crustal aftershocks), and finally separate the resulting mainshocks into subcatalogues based on their tectonic classification. We typically use two groups: (1) crustal, interface, and shallow slab seismicity (i.e., events beneath the interface but with intraslab mechanisms); and (2) deep intraslab seismicity. Because the declustering algorithm compares epicentral (not hypocentral) distances, declustering by groups is crucial for seismicity within slab-type volumes; otherwise, deep intraslab events could be flagged as dependents of shallower events located above them.
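The grouping logic can be sketched independently of the windowing method. Below, `decluster_fn` stands for any declustering routine that returns one mainshock flag per event (for instance, a Gardner and Knopoff (1974) window method); the class labels, the group definitions, and the toy `keep_largest` stand-in are hypothetical. Events whose class belongs to no group (e.g., unclassified events) are dropped.

```python
def decluster_by_groups(events, groups, decluster_fn):
    """Decluster each group of tectonic classes jointly, then split by class.

    events: dicts with at least a "class" key.
    groups: iterable of sets of class labels declustered together.
    decluster_fn: callable(list_of_events) -> list of bools (True = mainshock).
    Returns {class_label: [mainshock events]}; events in no group are dropped.
    """
    subcatalogues = {}
    for group in groups:
        members = [ev for ev in events if ev["class"] in group]
        if not members:
            continue
        for ev, is_main in zip(members, decluster_fn(members)):
            if is_main:
                subcatalogues.setdefault(ev["class"], []).append(ev)
    return subcatalogues


# Toy stand-in for a real declustering routine: keep only the largest event.
def keep_largest(members):
    m_max = max(ev["mag"] for ev in members)
    return [ev["mag"] == m_max for ev in members]


groups = [{"crustal", "interface", "slab_shallow"}, {"slab_deep"}]
events = [
    {"id": 1, "class": "interface", "mag": 7.0},
    {"id": 2, "class": "crustal", "mag": 5.0},  # removed as dependent on event 1
    {"id": 3, "class": "slab_deep", "mag": 6.0},
    {"id": 4, "class": "slab_deep", "mag": 5.5},
]
result = decluster_by_groups(events, groups, keep_largest)
print({k: [ev["id"] for ev in v] for k, v in result.items()})
# {'interface': [1], 'slab_deep': [3]}
```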

Classification of seismicity

The workflow used by GEM to construct seismic source models in complex tectonic regions is dependent on the use of classified seismicity, that is, the assignment of each earthquake to a tectonic domain. Separating earthquakes in this manner allows us to compute MFDs from only the seismicity occurring within a delineated domain, thus more accurately characterizing individual seismic sources or source zones. For example, in subduction zones, we separate earthquakes occurring on the interface itself from those within the downgoing slab or the overriding plate. This allows us to model the hazard from these source types using the appropriate GMPEs.

At GEM, we classify seismicity using a procedure similar in theory to those of Zhao et al. (2015) and Garcia et al. (2012), which assigns earthquakes to tectonic domains defined by the modeler. In subduction zones, earthquakes are usually categorized as crustal, interface, or intraslab based on the hypocentral proximity to the Moho and to the complex interface and slab-top surfaces defined by the Subduction Toolkit. Where subduction zones are modeled as segmented interfaces or slabs, the domains are divided accordingly. Each tectonic domain is defined by a surface and a buffer region based on general characteristics of the corresponding cross sections. The modeler provides a tectonic hierarchy that resolves multiple assignments for earthquakes occurring within overlapping buffers of two or more domains. Usually, we specify interface superseding intraslab, and intraslab superseding crustal. Earthquakes that do not correspond to any of the defined domains are deemed "unclassified".
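A much-simplified sketch of classification with buffers and a tectonic hierarchy is given below. It assumes that each domain's reference surface can be queried for its depth at a given epicentre, and it measures proximity vertically rather than as the 3D distance to a complex surface. The `Domain` class, the toy geometry (a flat 30 km Moho and a planar interface that also serves as the slab top), and the buffer values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Domain:
    name: str
    surface_depth: Callable[[float, float], float]  # km, positive down, at (lon, lat)
    buffer_above: float  # km above the surface still assigned to this domain
    buffer_below: float  # km below the surface still assigned to this domain

    def contains(self, lon, lat, depth):
        z = self.surface_depth(lon, lat)
        return z - self.buffer_above <= depth <= z + self.buffer_below


def classify(lon, lat, depth, hierarchy):
    """Label of the first domain in `hierarchy` (highest priority first) containing the hypocentre."""
    for domain in hierarchy:
        if domain.contains(lon, lat, depth):
            return domain.name
    return "unclassified"


def interface_top(lon, lat):
    """Toy planar interface deepening by 20 km per degree of longitude."""
    return 10.0 + 20.0 * lon


# Hierarchy: interface supersedes intraslab, intraslab supersedes crustal.
hierarchy = [
    Domain("interface", interface_top, buffer_above=10.0, buffer_below=10.0),
    Domain("intraslab", interface_top, buffer_above=0.0, buffer_below=300.0),
    Domain("crustal", lambda lon, lat: 30.0, buffer_above=30.0, buffer_below=0.0),
]
for lon, depth in [(1.0, 25.0), (1.0, 35.0), (1.0, 100.0), (3.0, 5.0), (3.0, 50.0)]:
    print(lon, depth, classify(lon, 0.0, depth, hierarchy))
```

The events at 25 km and 35 km depth fall in overlapping buffers (interface with crustal, and interface with intraslab, respectively), and the hierarchy assigns both to the interface.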

The classification routine includes workarounds for some common misclassifications: isolating dominant groups of earthquakes beneath a polygon (e.g., volcanic events), classifying large-magnitude earthquakes from historical catalogues by epicenter only, and manually classifying earthquakes by their event IDs.
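Two of these workarounds can be sketched as post-processing of the labels: relabeling shallow events inside a polygon, and applying manual labels by event ID last so that they take precedence. The function names, the depth criterion, and the data layout are assumptions of this sketch; the planar ray-casting test ignores the great-circle geometry of polygon edges and does not handle polygons crossing the antimeridian.

```python
def point_in_polygon(lon, lat, polygon):
    """Planar ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the horizontal ray
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def apply_overrides(events, labels, polygon_rules=(), manual=None):
    """Return corrected labels.

    events: dicts with "id", "lon", "lat", "depth" (km); labels: same order.
    polygon_rules: (polygon, max_depth_km, label) tuples; events inside the
    polygon and no deeper than max_depth_km receive `label`.
    manual: {event_id: label}, applied last so it overrides everything else.
    """
    manual = manual or {}
    out = list(labels)
    for i, ev in enumerate(events):
        for polygon, max_depth, label in polygon_rules:
            if ev["depth"] <= max_depth and point_in_polygon(ev["lon"], ev["lat"], polygon):
                out[i] = label
        if ev["id"] in manual:
            out[i] = manual[ev["id"]]
    return out


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
events = [
    {"id": "ev1", "lon": 0.5, "lat": 0.5, "depth": 5.0},
    {"id": "ev2", "lon": 2.0, "lat": 2.0, "depth": 5.0},
    {"id": "ev3", "lon": 2.0, "lat": 2.0, "depth": 40.0},
]
labels = ["crustal", "crustal", "intraslab"]
print(apply_overrides(events, labels, [(square, 20.0, "volcanic")], {"ev3": "interface"}))
# ['volcanic', 'crustal', 'interface']
```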

References

Albini, P., Musson, R. M. W., Rovida, A., Locati, M., Gomez Capera, A. A., and Viganò, D. (2014). The Global Earthquake History. Earthquake Spectra, 30(2), 607-624.

Gardner, J. K., and Knopoff, L. (1974). Is the sequence of earthquakes in Southern California, with aftershocks removed, Poissonian? Bulletin of the Seismological Society of America, 64(5), 1363-1367.

Stepp, J. C. (1971). An investigation of earthquake risk in the Puget Sound area by the use of the type I distribution of largest extreme. PhD thesis, Pennsylvania State University.

Stiphout, T. van, Zhuang, J., and Marsan, D. (2012). Theme V - Models and Techniques for Analysing Seismicity. Technical report, Community Online Resource for Statistical Seismicity Analysis. URL:

Uhrhammer, R. (1986). Characteristics of northern and southern California seismicity. Earthquake Notes, 57, 21.

Weatherill, G. A., Pagani, M., and Garcia, J. (2016). Exploring earthquake databases for the creation of magnitude-homogeneous catalogues: tools for application on a regional and global scale. Geophysical Journal International, 206(3), 1652-1676.