# Seismological Research Letters

- © Seismological Society of America

## ABSTRACT

I present a set of routines in MATLAB for estimating the second‐degree moments of an earthquake’s rupture from far‐field body waves. The second moments describe the length, width, duration, and directivity of a rupture. The second‐moments approach is particularly useful when a seismic dataset is dense enough to resolve the primary finite‐source properties but the geodetic data needed for a well‐resolved finite‐fault inversion are not available. In particular for *M*_{w} 3–6 earthquakes, this approach can be a useful way to estimate rupture area without the assumptions of typical corner‐frequency approaches. The provided software utilizes empirical Green’s function deconvolution to isolate the apparent source time function (ASTF) for each station and phase. The spatial variations in the duration of the ASTF are quantified and inverted for the second moments. The inverse problem is solved using MATLAB’s convex optimization routines for systems of linear matrix inequalities. An error analysis using the jackknife and bootstrap methods is included. An example *M*_{w} 4.7 earthquake from the San Jacinto fault is used to demonstrate the method.

MATLAB scripts.

## INTRODUCTION

The second moments of the slip‐rate distribution for any earthquake describe the spatial and temporal extent of the rupture, as well as its propagation, all of which contribute to the apparent duration of the earthquake observed in any particular far‐field phase. For an earthquake with a constant moment tensor such that the spatial variations in moment rate are described by (1)The second moments are defined as (2)in which is a scalar function that describes the spatial and temporal distribution of moment release along the fault (McGuire *et al.*, 2001), and *t*_{0} denote the centroid location and time (i.e., the first moments), respectively. The hat denotes that these are central moments taken about the centroid. The integrals are taken over the entire source volume and earthquake duration (Backus, 1977a,b; McGuire *et al.*, 2001).

When is integrated over the volume of the source, it is known as the moment rate or source time function (STF) . The second spatial moment is related to the spatial extent of the rupture area; the second temporal moment is related to the duration of rupture; and the mixed moment is related to rupture propagation.

There is considerable background literature on second moments. In general, the second moments are a way to capture the overall kinematic properties of a rupture that are well constrained by the far‐field waveforms. For a more detailed theoretical background and examples in various settings, see Backus and Mulcahy (1976a,b), Backus (1977a,b), Doornbos (1982a,b), Silver (1983), Gusev and Pavlov (1988), Bukchin (1995), Das and Kostrov (1997), McGuire *et al.* (2001), and Clévédé *et al.* (2004). This toolbox mostly follows the measurement scheme developed in McGuire (2004) and the inversion scheme developed initially in McGuire *et al.* (2001). The measurement scheme and partial derivatives are specific to far‐field body waves with some sort of time domain Green’s function available for propagation corrections (empirical or theoretical). For surface‐wave‐based schemes, the inversion algorithms still apply, but a different measurement and partial derivative calculation scheme is required, depending on the approach (see McGuire *et al.*, 2001; Clévédé *et al.*, 2004; Chen *et al.*, 2005; Llenos and McGuire, 2007).

The characteristic rupture duration *τ*_{c}, rupture length *L*_{c}, and average propagation velocity of the instantaneous spatial centroid *v*_{0} are defined following Backus and Mulcahy (1976a); Backus (1977a); Silver (1983); Silver and Jordan (1983); and McGuire *et al.* (2001) (3)in which *x*_{c} is the spatial extent of the rupture in the direction , and *L*_{c} is the maximum value of *x*_{c} (i.e., corresponding to the largest eigenvalue of ). *W*_{c} corresponds to the second largest eigenvalue, for example, the rupture width. The second moments can either be calculated in three spatial dimensions or along a 2D fault plane if the mechanism is known. To resolve the fault‐plane ambiguity, we typically invert for the second moments assuming each 2D nodal plane and choose the one with the higher variance reduction as the true nodal plane.

In general, the characteristic dimensions give an idea of the region that contributed substantially to moment release and the relative importance of directivity in the rupture. Figure 1 shows an example of the characteristic dimensions calculated for a theoretical crack model with unilateral propagation from Kaneko and Shearer (2015). The ellipse defined by the second spatial moment captures the orientation and extent of the region with the large slip, which is smaller than the total dimension of the rupture (Fig. 1). Similarly, the second temporal moment captures the time period in which most of the moment was released, not the total duration (Fig. 1). This definition of earthquake duration is theoretically closely related to the corner frequency obtained from simple spectral fitting (Silver, 1983), but in practice the second moments may result in better estimates of rupture area (Chen and McGuire, 2016).

## SOFTWARE REQUIREMENTS

An Ⓔ electronic supplement to this article contains a MATLAB toolbox for calculating the second moments. The toolbox contains two primary directories. The *src* directory contains the MATLAB functions for making measurements and inversion as well as a Fortran program for calculating takeoff angles. The *SJFex* directory contains an example *M*_{w} 4.7 earthquake from the San Jacinto fault in southern California. The third directory contains a PDF file of this manual. To use this toolbox you need only the codes in the *src* directory and MATLAB. The inversion algorithm uses some routines from MATLAB’s Robust Control Toolbox. All routines were tested with MATLAB v. 2013b.

## DATA PREPARATION

The input to the empirical Green’s function (EGF) algorithm is simply pairs of velocity seismograms for each station/component on which you would like to try a deconvolution. The code is currently set up to allow interactive waveform windowing; this could be automated if you prefer. The *SJFexamp* directory contains an example of the data preparation using event directories of miniSEED files and the Antelope software package for event information. The data loading is done in the script *runSJFsetup.m* that could be easily modified to match other databasing approaches.

The example event is an *M*_{w} 4.7 earthquake on 11 March 2013 that was recorded by both the ANZA network and a dense temporary deployment of instruments including numerous strong‐motion sensors (Kurzon *et al.*, 2014). The example event is run with the *runSJFex.m* script. The flag LOADDATAFILE can be set to 1 to use a *pre‐loaded.mat* file with all of the necessary arrays, or it can be set to 0 to load the miniSEED files and alter filters, and so on. If loading directly from miniSEED, the key choices to make are the dofilt flag and *f*_{min} and *f*_{max} that collectively set the frequency bandpass for filtering the data. The code is set up to integrate all seismograms, with component names like HNZ, from acceleration to velocity. The key choice is specifying the list of stations and components to be read in and tried as deconvolutions. This is done in the stasm and compm variables. This could be automated to look for all pairs in the event and EGF directories.

## MEASUREMENTS

Our approach requires using the far‐field body‐wave data to estimate the moment‐rate function from the *P* and/or *S* wave at a set of azimuthally distributed stations. This estimate will be either a stretched or compressed version of the true moment rate function due to the finite‐source properties of the rupture and hence is termed an apparent source‐time function (ASTF). Because many moderate earthquakes require working at relatively high frequencies (>1 Hz), to retrieve the ASTF, we typically use EGFs to remove propagation effects. Many EGF deconvolution algorithms, such as the water‐level technique, produce ASTF estimates that have low‐amplitude ringing for an extended period of time following the main pulse of moment release that makes it difficult to determine the end of the rupture. One technique that provides an objective determination of the duration of the ASTF is the projected Landweber deconvolution (PLD) algorithm of Bertero *et al.* (1997) and Lanza *et al.* (1999). This algorithm performs the deconvolution with moment release restricted to a series of increasing‐length time intervals and analyzes the misfit as a function of the interval length (see Fig. 2). The interval length in which this trade‐off curve flattens out is chosen as the interval during which moment release is allowed (Lanza *et al.*, 1999). This technique produces ASTFs that satisfy a positivity constraint, provide a good fit to the observed seismograms, and are very consistent between nearby stations (McGuire, 2004).

Deciding if the ASTF should be used is somewhat subjective. Two quantitative criteria that are routinely applied are that the misfit be less than 0.3 and that the moment ratio between the mainshock and EGF be about a factor of 1000 or larger. The latter constraint is important to limit the natural biases toward underestimating the duration of the ASTF due to the finite duration of the EGF. Assuming cube‐root scaling between moment and duration implies that a factor of 1000 difference in moment would limit this bias to less than 10% of the duration estimate. Using a larger EGF (moment ratio ≤ ∼ 1000) will bias the results toward shorter durations and faster velocities than the true values. It is also important to qualitatively check that nearby stations yield similar values of the apparent duration. In the example event below, the duration estimates vary smoothly as a function of position as would be expected.

The MATLAB files for making the measurements are as follows:

*makemeasurements.m*: This contains the graphical user interface for looping through velMS and velEGF arrays containing the velocity seismograms and making the measurements of , in which is the slowness vector of the phase being measured.*pld.m*: This file contains the implementation of the PLD algorithm. The key control parameter for the deconvolution is the number of iterations. Effectively, the higher this number (variable niter), the better the ASTF will do at fitting the highest frequency energy in the mainshock waveform. It is best to experiment with a range of values for your dataset, starting relatively low (∼10) and increasing. Typically a value of*niter*=100 is a good compromise between seismogram fit and computation time.*findt2.m*: This calculates for the ASTF that results from the deconvolution. This gives you the option to pick the starting and ending time points of the ASTF that you want to include in the calculation. Assuming this is working well, this choice could be omitted (set pickt2 = 0), and the calculation will be done on the entire range of the time function.

The flag DOMEAS lets you choose between either just loading in an example set of measurements or going through the process of making the measurements yourself.

In making the measurements, the routine is set to store the current set of measurements after each station in the file *MEASUREMENTS.mat*. This is intended to allow you to stop partway through a dataset and restart at a later time or simply go back and revise a few stations that appear to need it after seeing the whole dataset. The first window that appears asks you if you want to load an existing set of measurements from this file or to start from scratch. The *makemeasurements.m* function will loop through all of the channels in the dataset and ask you to make a series of choices for each:

The first menu asks you if you want to start a new set of measurements or load the previous set from

*MEASUREMENTS.mat*. The script will save the measurements you have made after each station. This provides an easy way to stop in the middle and restart later. The*MEASUREMENTS.mat*file is overwritten after every station is processed, so if you wish to save it, you need to copy it to some other filename. If this is your first attempt with this event, choose “Start New” to initialize the variables.The first plot shows the mainshock (MS) and EGF waveforms and will ask you if this channel is worthy of doing a deconvolution. If you say no, DONE(i) remains = 0, and this channel will not be used. If you choose yes, you will move onto windowing the MS and EGF waveforms.

If you choose to try deconvolving this channel, it will ask you to manually pick a time range to zoom in on the mainshock waveform. Just click roughly before the beginning of the

*P*or*S*phase on which you want to work. It will then replot this time range of the MS and ask you to pick the time range to be fit by the deconvolution. It is best to pick a window surrounding the main arrival. The inversion assumes that the ASTF reflects energy that left the source at roughly the slowness vector of the first arrival. Hence, it is most accurate not to try to fit the later parts of the coda that likely reflect different slownesses. In practice, we usually pick the start of the interval as being 20–50 samples before the first arrival (e.g., so there is a relatively flat near‐zero part before the arrival). Picking the end of the fitting interval can be tricky. Ideally, the waveform would have returned to near‐zero amplitude relatively quickly after the main arrival. For an*M*4–5 earthquake, we typically fit 3–5 s of*P*or*S*wave. If there is a clear second arriving phase (*Pn*, etc.), it is best to end the interval before that arrival. For stations with a strong coda, it is often very difficult to pick the end of the interval. These stations will often not work very well.Once the MS‐fitting interval has been picked, it will replot this interval and ask you to pick the onset time of the

*P*or*S*wave.It will then plot the EGF waveform and ask you for an interval in which to zoom. This is slightly different. You should zoom fairly tightly on the first arrival. It will then replot this window and ask you to pick the first arrival time, which defines the start of the Green’s function.

Once the waveforms have been picked, it will perform the PLD for a series of allowed durations of the ASTF and calculate the misfit for each deconvolution. It will plot this trade‐off curve (misfit vs. ASTF interval). If the EGF was a good choice, then the trade‐off curve will have a region where it drops steadily to a low value (≤0.3) and then flattens out. In an ideal case, there is a clear break in slope of the trade‐off curve that identifies the correct value for the total duration at that station. It will ask you to pick this inflection point (see Fig. 2 for some examples).

If you have set the flag pickT2 = 1, it will plot the ASTF for your choice and ask you to pick the start and end points for the integration that calculates . This is sometimes desirable for earthquakes that do not end abruptly in case you want to revise your choice without re‐doing the deconvolution. If you had pickT2 = 0, then it does this calculation automatically on the whole ASTF (usually fine for good datasets).

It will now plot the ASTF and the fit to the data and ask you to choose among four options for saving the result: (1) finished

*P*wave, (2) finished*S*wave, (3) re‐do, or (4) “No, done” on this channel and move on to the next. The first two set the*DONE*variable to 1 for this channel and record whether it was a*P*or*S*wave. The “re‐do” option lets you go through this channel again in case you think you got the windowing wrong. The “No, done” option will set the*DONE*variable to 1, and this channel will not be used in the inversion.The script will loop through this process for the rest of the stations.

Once this loop has been completed for all stations and the *DONE* variable has been set to 1 or 0 for each, the measurement process is complete and saved.

## INVERSION SCHEME FOR SECOND MOMENTS

The inversion algorithm is essentially identical to that in McGuire (2004) and McGuire *et al.* (2001) but has been implemented using the Robust Control Toolbox in MATLAB. In general, the inverse problem for the second moments can be posed as a simple linear inversion based on the equation for the observations (4)in which is the slowness vector associated with a particular measurement and · denotes the dot product (see Silver, 1983; McGuire, 2004, for details). However, in practice if the station distribution is suboptimal, this could result in unphysical estimates for the second moments (Das and Kostrov, 1997). We always enforce the constraint that the source region has nonnegative volume (McGuire *et al.*, 2001); that is accomplished by enforcing the matrix inequality (5)in which ≥0 indicates that the matrix is required to be positive semidefinite.

To set up an inverse problem based on equation (4), we simply need the measurements of from the previous section and the corresponding slowness vectors for each phase. We include a Fortran program called *topp* for calculating the takeoff angles, which is run external to MATLAB. Most of this code is taken directly from the hypoDD software package of Waldhauser and Ellsworth (2000; see Data and Resources). *Topp* is called from within the subroutine for calculating the partial derivatives in which *mlats*, *mlons*, and *melevs* give the station coordinates; *late*, *lone*, and *depe* give the earthquake coordinates; *V*_{P}, *V*_{S}, and *topl* give the layered velocity model; and *topl* is the depth of the top of each layer in kilometers. *Phas* is a character array the length of the number of measurements that has either the value *P* or *S* for each measurement. This routine calculates the partial derivatives for a 2D source along a fault specified by *strike* and *dip* in the usual convention.

If the DOINVERSION flag is set to 1, the runSJFex script will call the function *seconds_2d_v2.m* to do the actual inversion. To solve equation (2) in a least‐squares sense subject to equation (3), we recast equation (2) as a linear matrix inequality (LMI), similar to equation (3), and solve the system using MATLAB’s Robust Control Toolbox. The least‐squares problem from equation (2) is rewritten , in which *c* is a dummy decision variable. The problem in the LMI form is then given by (6)in which is the identity matrix with dimension equal to the number of measurements *N* and ≥0 indicates the matrix is positive semidefinite. The equivalence between the linear least‐squares problem and the above can be seen by calculating the eigenvalues of the *N*+1 by *N*+1 matrix, which are nonnegative when the matrix is positive semidefinite. This restatement of the problem is known as using Schur complements to represent a nonlinear constraint as linear matrix inequality (Vandenberghe and Boyd, 1996). Equation (6) ensures that the estimate of the second temporal moment is smaller than the largest measurement of , which should be true for any dataset that contains stations with good azimuthal coverage.

MATLAB’s Robust Control Toolbox contains many routines that are particularly useful for problems with LMI constraints such as equation (5). For a detailed description of how to use these, see the MATLAB manual pages. The script *seconds_2d_v2.m* shows how to implement the problem in equation (6) within MATLAB’s LMI syntax. For each matrix inequality, the matrix on the left side is first defined in terms of the optimization variables (the second moments) using the function *lmivar*. Each individual inequality in equation (6) is added to the LMI system using the function *lmiterm*. For instance, the lines is an implementation of equation (5) in which the variable represents the LHS (7)and has been previously specified as a 3×3 symmetric matrix composed of the six independent elements of the second moments (for a 2D fault) that we seek to estimate with the MATLAB to solver. The LMI is described to MATLAB by specifying the location of each individual term within the LMI by which side of the inequality it is on, which entry within the matrix it is part of, and any constant factors that multiply it. The entries in the call to *lmiterm* are the *termid*, and two constant matrices *A* and *B* (see the *lmiterm* man pages). The LMI we are describing has only one term on the left side and it is a matrix variable type of the term. It has no pre‐ or postmultiplying matrices, hence *A* and *B* are set equal to 1. For the , the entries are as follows:

first entry = −1 denotes the sign of the inequality in MATLAB’s convention (−1 indicates the greater side of the inequality) and that this is the first constraint equation (they are concatenated into a block diagonal system);

second and third entries = 1 give the location of the next variable in the matrix (e.g., there is only 1,1 in this example);

and fourth entry = is the term to be added to the location specified in the LMI.

Note that a sequence of LMIs such as equation (6) is implemented as one block diagonal LMI, so the first entry is incremented (1, 2, 3, …) in *seconds_2d_v2.m* to accommodate each subsequent constraint equation as a new block.

## ERROR ANALYSIS

We present two versions of error analysis that have been used previously in second moment studies based on the jackknife and bootstrap approaches. Both approaches are based on resampling the data vector. Often seismic arrays involve a number of stations that are located close together, whereas other parts of the focal sphere are poorly sampled at best. Thus, it is not straightforward to apply techniques like the jackknife that assume each data point is independent. In our case, nearby stations will likely have highly correlated measurements both in terms of the true values of *τ*_{c}(*s*) and the errors introduced by imperfect EGFs. To account for this when subsampling the dataset, we delete all of the stations in an azimuthal bin, typically with a range of 20°–40°. The jackknife calculation produces an estimate of the covariance matrix of the six second‐moment quantities from which errors on the derived characteristic rupture quantities can be estimated (McGuire *et al.*, 2001). The jackknife calculation with 20° azimuth bins results in the following 1*σ* error estimates for the San Jacinto fault example in Figures 2 and 3: *τ*_{c}=0.30±0.015 s, *L*_{c}=0.33±0.06 km, *W*_{c}=0.19±0.12 km, and .

In general, the jackknife results are sufficient to capture the uncertainty. It is sometimes worthwhile to perform inversions for a number of random perturbations to the earthquake location, velocity model, or fault‐plane orientation, if the uncertainties in those quantities are well known to evaluate their influence (McGuire, 2004). Another interesting approach is to apply the bootstrap technique. The bootstrap approach with 1000 resamples of the data vector (with replacement) yields the probability density functions (PDFs) for *τ*_{c} and *L*_{c} shown in Figure 4. As can be seen in Figure 4, the jackknife error bounds capture well the peaks of the bootstrap PDFs for *L*_{c} and *τ*_{c}. However, when there are only a small number of stations in a critical azimuth (e.g., the forward or backward rupture directions), then the resamples that lack stations from this direction can result in values that are well outside the jackknife bounds. In particular, for the San Jacinto earthquake, only three stations capture the long durations in the backward direction. For resamples that do not involve these observations, much greater variability in the second moment estimates would be expected. The 95% confidence limits from the bootstrap are [0.25, 0.36] and [0.31, 1.39] km for *τ*_{c} and *L*_{c}, respectively. The extent to which these are meaningful depends on how dense the station distribution is.

## DATA AND RESOURCES

Seismograms used in this study were collected as part of the ANZA Network and the San Jacinto Fault Zone Program for Array Seismic Studies of the Continental Lithosphere (PASSCAL) experiment. Data can be obtained from the Incorporated Research Institutions for Seismology (IRIS) Data Management Center at www.iris.edu (last accessed September 2016) and the Southern California Earthquake Data Center scedc.caltech.edu (last accessed November 2016). For the *topp* code that was taken directly from the hypoDD software package of Waldhauser and Ellsworth (2000), see http://earthquake.usgs.gov/research/software/ (last accessed January 2017).

## ACKNOWLEDGMENTS

Thanks to three reviewers for careful reading and suggestions. Thanks to Yoshi Kaneko for sharing the results of his dynamic rupture simulations for calculating the second moments shown in Figure 1. Thanks to Yehuda Ben‐Zion and Frank Vernon for sharing the San Jacinto fault earthquake dataset. This work was supported with Southern California Earthquake Center (SCEC) Grant Number 16104 and U.S. Geological Survey (USGS) National Earthquake Hazards Reduction Program (NEHRP) Award 2016‐0188. This research was supported by the SCEC Contribution Number 7162. SCEC is funded by National Science Foundation Cooperative Agreement EAR‐1033462 and USGS Cooperative Agreement G12AC20038.