# Seismological Research Letters

© 2015 by the Seismological Society of America

*Online Material:* Explanation of the concept of a dynamic rupture simulation.

## INTRODUCTION

Earthquakes are complex events that involve a myriad of interactions among multiple geologic features and processes. One of the tools available to assist with their study is computer simulation, particularly dynamic rupture simulation. A dynamic rupture simulation is a numerical model of the physical processes that occur during an earthquake. Starting with the fault geometry, friction constitutive law, initial stress conditions, and assumptions about the condition and response of the near‐fault rocks, a dynamic earthquake rupture simulation calculates the evolution of fault slip and stress over time as part of the elastodynamic numerical solution (see the simulation description in the electronic supplement to this article). The complexity of the computations in a dynamic rupture simulation makes it challenging to verify that the computer code is operating as intended, because there are no exact analytic solutions against which these codes’ results can be directly compared. One approach for checking if dynamic rupture computer codes are working satisfactorily is to compare each code’s results with the results of other dynamic rupture codes running the same earthquake simulation benchmark. To perform such a comparison consistently, it is necessary to have quantitative metrics. In this paper, we present a new method for quantitatively comparing the results of dynamic earthquake rupture computer simulation codes.

Over the past decade, the Southern California Earthquake Center–U.S. Geological Survey (SCEC‐USGS) Spontaneous Rupture Code Verification Project has published a series of computational dynamic rupture benchmark exercises (Harris and Archuleta, 2004; Harris *et al.*, 2009, 2011). The benchmarks, a subset of which are shown in Figure 1, are listed on the project’s website (http://scecdata.usc.edu/cvws, last accessed November 2014) and cover many earthquake settings, including a range of fault geometries (strike‐slip faults, normal faults, branching faults, faults with stepovers, and nonplanar faults), a range of initial stress conditions (uniform initial stresses, depth‐dependent initial stresses, and stochastic initial stresses), a range of friction parameterizations (slip‐weakening and rate–state friction), and a range of off‐fault material properties (elastic, plastic, and viscoplastic). Descriptions of the modeling codes are also available on the website.

Each benchmark is specified in great detail to ensure that all participants are computationally simulating the same physical situation. A benchmark specification includes the fault geometry, the friction law and friction constitutive parameters, the nucleation method, the boundary conditions that apply at the fault edges, the initial stress conditions, the material properties and constitutive parameters, the locations of observation stations where synthetic seismograms are to be generated, the required duration of the simulation, and the required spatial resolution.

To conduct these benchmark exercises, each participant uses his/her own computer code to simulate dynamic earthquake rupture and uploads the results to the project’s website. Participating codes employ a wide variety of computational approaches, including finite element, finite difference, spectral element, discontinuous Galerkin, support operator, and boundary integral methods. The earthquake simulation results, including fault‐rupture patterns and synthetic seismograms, are then compared to the results submitted by other modelers. Until this year, the SCEC‐USGS code group comparisons were primarily conducted qualitatively, through visual inspection.

## PREVIOUS WORK

Following the creation of our SCEC‐USGS collaborative project a decade ago, dynamic earthquake rupture modelers have used selected subsets of our project’s simulation results to qualitatively compare the earthquake simulation results produced by a variety of computer code implementations. A number of researchers have also performed quantitative comparisons of dynamic earthquake rupture simulations, most of them using just one of the SCEC‐USGS benchmarks, to compare results among a few codes. Some of these comparisons used the original SCEC‐USGS benchmark TPV3, a vertical strike‐slip fault set in an elastic full‐space (Fig. 1). For example, Day *et al.* (2005) used the TPV3 benchmark to compare finite‐difference versus boundary‐integral dynamic rupture simulation methods, Moczo *et al.* (2007) used TPV3 to quantitatively compare the fault rupture and synthetic seismogram results among four finite‐element and finite‐difference code implementations, and Pelties *et al.* (2012) used TPV3 to quantitatively compare a high‐order discontinuous Galerkin method on unstructured tetrahedral meshes with finite‐difference and spectral element computer codes. Kristekova *et al.* (2006, 2009) describe a time–frequency decomposition technique for comparing seismograms, including numerical solutions of TPV3. A few researchers have checked their codes using more than one benchmark exercise. Tago *et al.* (2012) examined the results of two project benchmarks: the original full‐space TPV3 benchmark and TPV10 (Fig. 1), the case of a 60°‐dipping normal fault set in an elastic half‐space that includes the Earth’s surface. Tago *et al.* (2012) quantitatively compared the accuracy and convergence results produced by their code with the results produced by five other codes performing the same benchmark exercises.
Our SCEC‐USGS dynamic rupture group also conducted a multibenchmark quantitative study, but we limited ourselves to quantitatively evaluating only the peak ground motions at one seismic station, with the rest of the comparison being done qualitatively. The purpose of the four‐benchmark quantitative test was to examine if one of the participating codes, which had been used to predict the largest possible ground motions at Yucca Mountain, the United States’ formerly proposed high‐level nuclear waste repository, had operated as expected (Hanks *et al.*, 2013). The group was able to conclude that the code had indeed functioned as its authors had intended (Harris *et al.*, 2011).

## CHALLENGES FOR METRIC DESIGN

Most previous attempts to define quantitative metrics for dynamic earthquake rupture simulations have focused on a small number of codes and a few select benchmark exercises. In contrast, our project’s metrics need to be designed to operate on the full set of computational results that we have produced using many computer codes and many benchmark exercises. As of March 2014, the SCEC‐USGS dynamic rupture project’s collection of computational simulation results included 8 GB of data, 39 benchmarks, 36 participating modelers, 10,000 uploaded files, and an average of 14 sets of results for each benchmark, leading to 560,000 pairs of simulation results that can be compared. With this large quantity of data, it is best if metric generation is fully automated. It is also best if the metrics are designed to be sufficiently flexible to accommodate future contributions as new codes enter the code comparison exercise and as the group adds more benchmark exercises. Therefore, metric values should be absolute (i.e., they should not depend on the range of results submitted), so they do not change as new results are uploaded.

## METRIC DESIGN—THE TYPES OF DYNAMIC RUPTURE SIMULATION RESULTS

Modelers submit three kinds of simulation results for each benchmark exercise: on‐fault time‐series data, off‐fault time‐series data, and rupture‐front contour data. There is some variation from benchmark to benchmark, but in general the codes are requested to submit the following:

On‐fault time series of fault slip, slip rate, shear stress, and normal stress. Modelers submit one time‐series file for each seismic station located on the fault surface. The benchmark description specifies the station locations, but modelers are free to choose the times at which they report the data.

Off‐fault time series of displacement and velocity. Modelers submit one time‐series file for each seismic station located away from the fault. The benchmark description specifies the station locations, but modelers are free to choose the times at which they report the data.

Rupture‐front contour data that gives the time at which each point on the fault surface begins to slip, which is defined as the time when the slip rate first exceeds 0.001 m/s. Modelers are free to choose the array of points on the fault surface at which they report the rupture time. For benchmarks with multiple fault surfaces, modelers submit one rupture contour file for each fault surface. This is called rupture contour data because we typically use it to draw contour plots showing the propagation of the rupture.
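Extracting a rupture time from an on‐fault slip‐rate time series, using the 0.001 m/s threshold defined above, might look like the following minimal sketch (the function name is ours):

```python
import numpy as np

def rupture_time(t, slip_rate, threshold=1e-3):
    """Return the first time at which the slip rate exceeds `threshold`
    (m/s), or None if this point on the fault never ruptures."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(slip_rate, dtype=float)
    above = np.nonzero(v > threshold)[0]
    if above.size == 0:
        return None          # this point never slips
    return float(t[above[0]])

# Example: the slip rate first exceeds 0.001 m/s at t = 2.0 s
print(rupture_time([0.0, 1.0, 2.0, 3.0], [0.0, 0.0005, 0.002, 1.5]))  # -> 2.0
```

Evaluating this at every point of a modeler‐selected array of fault points yields the contents of one rupture contour file.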

Metrics must be computed only from this set of code results. Also, metric computations must deal with the fact that different sets of modeling results are sampled at different times (for time‐series data) and different locations (for rupture contour data). We do this by defining our metrics to be integrals over time or space. The discrete data supplied by each modeler is interpolated to form a continuous function, and then the metrics are evaluated by integrating the continuous functions.

### Challenges and Solutions for Working with Time‐Series Results

Raw time‐series data often contain spurious oscillations. Codes differ in the amount and type of oscillations they produce. Some codes seem to have built‐in filtering, whereas others seem to have none. We suppress oscillations by applying a 5 Hz low‐pass filter to velocity, slip rate, shear stress, and normal stress, prior to computing the metrics (Fig. 2). The filter used is a two‐pole, two‐pass Butterworth filter. The cutoff frequency of 5 Hz strikes a good balance of suppressing most oscillations without changing the waveform shape too much. We do not filter displacement or fault slip, because they generally do not have oscillations.

Time‐series data may contain some fields that are close to zero. For example, in a strike‐slip‐fault benchmark, the along‐dip slip rate may be very small at some seismic stations (but perhaps not at other stations). Comparison of near‐zero data is problematic, because it may be dominated by noise. To avoid this difficulty, individual fields in a time series are combined to form vector‐valued fields whenever possible, and all metrics are computed using the vector values. Specifically, at on‐fault stations, (a) horizontal and along‐dip slip are combined to form a 2D slip vector; (b) horizontal and along‐dip slip rate are combined to form a 2D slip‐rate vector; and (c) horizontal and along‐dip shear stress are combined to form a 2D shear‐stress vector (the normal stress is treated as a scalar). At off‐fault stations, (a) the *x*, *y*, and *z* components of displacement are combined to form a 3D displacement vector, and (b) the *x*, *y*, and *z* components of velocity are combined to form a 3D velocity vector.

Codes may produce time‐series data with very similar waveforms, which differ by a shift in time (Fig. 3). This would occur, for example, if codes disagree somewhat on the rupture propagation speed but agree on other aspects of the rupture. Merely subtracting one time series from the other would miss the agreement in waveform shape. So, when comparing two time series, we apply a time shift to one of the time series, so as to produce an optimum fit. The time shift value is reported as part of the metric values.

Some modelers have submitted time‐series data that extend to times beyond the duration requested in the benchmark description. Also, we have observed that some codes report anomalous data on the first time step. So, when processing time‐series data, we always discard the first time step, and we discard any time steps that come after the end of the requested duration.

### Challenges and Solutions for Working with Fault‐Rupture Results

For rupture‐front contour data, we need to compare the rupture times reported by different codes. Some benchmarks have ruptures that slow down and stop (Fig. 4). In areas where the rupture is propagating slowly, there is likely to be a large difference in rupture time, because a small difference in location equates to a large difference in time. So, when comparing rupture times, we exclude areas where the rupture velocity is less than 1 km/s. The cutoff velocity of 1 km/s is chosen based on our experience in examining rupture contour plots from our database.

We have observed that some codes report anomalous rupture times near the bottom and side boundaries of the fault surface. So, when comparing rupture times, we exclude areas within 300 m of the bottom and side boundaries. In the next two sections, we describe in detail the mathematical formulas and preprocessing procedures used to compute our quantitative metrics.

## METRIC FORMULAS FOR TIME‐SERIES DATA

Let *f* (*t*) be a function of time, which may be either scalar valued or vector valued. If *t*_{1} and *t*_{2} are given starting and ending times, define the *L*^{2} norm of *f* (*t*), over the interval from *t*_{1} to *t*_{2}, to be

‖*f*‖ = [ ∫_{*t*_{1}}^{*t*_{2}} |*f* (*t*)|^{2} *dt* ]^{1/2}. (1)

Suppose we want to compare two functions *f* (*t*) and *g*(*t*). Given a time shift *t*_{s}, the metric value *Q* is defined to be the normalized root mean square (rms) difference between *f* and *g*:

*Q* = ‖*f* (*t*) − *g*(*t*−*t*_{s})‖ / (‖*f*‖ + ‖*g*‖). (2)

The three integrals in equation (2) are all computed over the range of times *t* for which both *f* (*t*) and *g*(*t*−*t*_{s}) are defined. For example, if *f* (*t*) and *g*(*t*) are both defined for times *t* ranging from 0 to 12 s, and if *t*_{s}=0.5 s, then all three integrals appearing (implicitly) in equation (2) are evaluated with *t*_{1}=0.5 s and *t*_{2}=12 s.

It can be proven that 0≤*Q*≤1. The value *Q*=0 indicates a perfect match. To convert *Q* into a percentage, we multiply it by 200. The multiplier is 200, rather than 100, because the denominator in equation (2) is a sum of two norms. The lower the value of *Q*, the better the match between the functions *f* and *g* (Fig. 5). In general, for our benchmarks completed to date, we find that values of *Q* less than 10% indicate reasonably good agreement.
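A minimal sketch of equations (1) and (2) applied to sampled data follows (the function names are ours, and the trapezoid rule stands in here for the exact piecewise‐linear integration described below):

```python
import numpy as np

def l2_norm(t, f):
    """L2 norm (eq. 1) of a sampled function f(t); `f` may be 1D (scalar
    valued) or 2D with shape (n_components, n_times) (vector valued).
    The integral is approximated with the trapezoid rule."""
    t = np.asarray(t, dtype=float)
    f2 = np.sum(np.atleast_2d(np.asarray(f, dtype=float)) ** 2, axis=0)
    dt = np.diff(t)
    return np.sqrt(np.sum(0.5 * dt * (f2[:-1] + f2[1:])))

def q_metric(t, f, g):
    """Normalized rms difference Q (eq. 2); any time shift t_s is assumed
    to have been applied to g already. 0 = perfect match; 1 = 200%."""
    diff = np.atleast_2d(np.asarray(f, float)) - np.atleast_2d(np.asarray(g, float))
    return l2_norm(t, diff) / (l2_norm(t, f) + l2_norm(t, g))

# Example: identical series give Q = 0; anti-correlated series give Q = 1
t = np.linspace(0.0, 12.0, 1201)
f = np.sin(t)
print(q_metric(t, f, f))     # -> 0.0 (perfect match)
print(q_metric(t, f, -f))    # -> 1.0 (i.e., 200%)
```

The anti‐correlated case shows why the denominator is a sum of two norms: ‖*f* − (−*f*)‖ = 2‖*f*‖, so *Q* saturates at 1.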

As described above, time‐series data is organized into files. Each file contains the data from one modeler, at one seismic station. Before performing any comparisons, each individual time‐series file is preprocessed as follows:

Remove the first time step, and any time steps after the requested benchmark duration, as discussed above.

Apply a 5 Hz low‐pass filter to slip rate, shear stress, normal stress, and velocity, as discussed above.

Replace shear stress and normal stress with their AC components. That is, subtract the mean from each value of shear or normal stress. This is done because typically the change in stress during the simulation is much less than the absolute value of stress, and we wish to focus on the changes in the stress values. (A few benchmarks have additional data fields corresponding to rate–state friction variables, and these data fields are also replaced with their AC components.)

Form data fields into vectors, to produce a vector‐valued time series, as discussed above.

Extend the time series to a continuous function of time, by linearly interpolating between the points supplied by the modeler. This makes each time series into a piecewise linear function, so that integrals such as equation (1) can be performed analytically.
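Once each series is piecewise linear, integrals such as equation (1) can be done segment by segment in closed form rather than numerically; on each segment the square of a linear function integrates to (Δ*t*/3)(*y*_{0}^{2} + *y*_{0}*y*_{1} + *y*_{1}^{2}). A sketch (the function name is ours):

```python
import numpy as np

def l2_norm_exact(t, y):
    """Exact L2 norm of the piecewise-linear function through the samples
    (t_i, y_i): on each segment the square of a linear function integrates
    analytically to (dt / 3) * (y0**2 + y0*y1 + y1**2)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    dt = np.diff(t)
    y0, y1 = y[:-1], y[1:]
    return np.sqrt(np.sum(dt * (y0 ** 2 + y0 * y1 + y1 ** 2) / 3.0))

# Example: for f(t) = t on [0, 1] the interpolant is exact, the integral
# of t**2 is 1/3, and so the norm is sqrt(1/3) ~= 0.5774
t = np.linspace(0.0, 1.0, 5)
print(l2_norm_exact(t, t))
```

Because the result is exact for the interpolant, it does not depend on how finely any one modeler sampled the time axis.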

After preprocessing, we compare each pair of time‐series files at a given seismic station. The first step is to find the optimum time shift *t*_{s}. The time shift is chosen to minimize the comparison metric *Q* for one selected data field. For an off‐fault station, we minimize the value of *Q* for the 3D velocity vector. For an on‐fault station, if both files have a peak slip rate of at least 0.050 m/s, then we minimize *Q* for 2D slip rate; otherwise, we minimize *Q* for 2D shear stress. We cannot use 2D slip rate for all on‐fault stations, because some benchmarks have on‐fault stations that never slip or that only slip a very small amount.

Minimization is done by computing the metric *Q* for a series of values of *t*_{s}, ranging from −1.0 s to +1.0 s. The value of *t*_{s} that minimizes *Q* is taken to be the time shift for that pair of time‐series files and is reported along with the metric values.
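The grid search described above might be sketched as follows; the 0.01 s search step and the resampling density are our assumptions, not part of the published procedure:

```python
import numpy as np

def _l2(t, y):
    """Trapezoid-rule L2 norm of samples y on times t."""
    return np.sqrt(np.sum(0.5 * np.diff(t) * (y[:-1] ** 2 + y[1:] ** 2)))

def best_time_shift(t_f, f, t_g, g, step=0.01):
    """Grid search for the time shift t_s in [-1 s, +1 s] that minimizes Q
    between f(t) and g(t - t_s). The shifted series is resampled onto the
    overlapping time window by linear interpolation."""
    best_ts, best_q = 0.0, np.inf
    for ts in np.arange(-1.0, 1.0 + step / 2, step):
        lo = max(t_f[0], t_g[0] + ts)      # window where both are defined
        hi = min(t_f[-1], t_g[-1] + ts)
        if hi <= lo:
            continue
        t = np.linspace(lo, hi, 2001)
        fi = np.interp(t, t_f, f)
        gi = np.interp(t - ts, t_g, g)     # g evaluated at t - t_s
        den = _l2(t, fi) + _l2(t, gi)
        q = _l2(t, fi - gi) / den if den > 0 else 0.0
        if q < best_q:
            best_ts, best_q = ts, q
    return best_ts, best_q

# Example: g is the same pulse as f but peaking 0.25 s later, so with the
# g(t - t_s) convention the optimal shift is t_s = -0.25 s
t = np.linspace(0.0, 12.0, 1201)
f = np.exp(-((t - 3.0) ** 2))
g = np.exp(-((t - 3.25) ** 2))
ts, q = best_time_shift(t, f, t, g)
print(ts)   # optimal shift, approximately -0.25 s
```

Note that, as in equation (2), the sign convention means a *delayed* copy of a waveform is matched by a *negative* *t*_{s}.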

Once the time shift *t*_{s} is determined, we use equation (2) to compute a metric, or *Q* value, for each pair of data fields. For an off‐fault station, the result is two metric values, one each for 3D displacement and 3D velocity. For an on‐fault station, the result is typically four metric values, one each for 2D slip, 2D slip rate, 2D shear stress, and normal stress. However, there is some variation in the metric values reported for an on‐fault station. If either file has a peak slip rate less than 0.010 m/s, then metrics for 2D slip and 2D slip rate are not reported, because there is too little slip to perform a meaningful comparison. If the benchmark is a vertical planar fault in a uniform elastic medium, then no metric is reported for normal stress, because normal stress cannot vary in this case. If the benchmark uses rate–state friction, then there may be additional metric values pertaining to rate–state variables.

## METRIC FORMULAS FOR FAULT‐SURFACE RUPTURE‐FRONT CONTOUR DATA

Let (*x*, *y*) denote position on the fault surface. Let *f* (*x*,*y*) be a function of position. If *S* is a portion of the fault surface, define the *L*^{2} norm of *f* (*x*,*y*), over the surface *S*, to be

‖*f*‖ = [ ∫∫_{*S*} |*f* (*x*,*y*)|^{2} *dx* *dy* ]^{1/2}. (3)

Suppose that *f* (*x*,*y*) and *g*(*x*,*y*) are rupture time functions, that is, they give the time at which the point (*x*, *y*) on the fault surface begins to slip. To compare *f* and *g*, we compute the metric value *T*, which is defined as the rms difference in rupture time:

*T* = ‖*f* − *g*‖ / *A*^{1/2}. (4)

The surface integral that appears (implicitly) in equation (4) runs over the portion *S* of the fault surface where both *f* (*x*,*y*) and *g*(*x*,*y*) are defined and where they both have a rupture propagation speed of at least 1 km/s. The need for imposing a minimum rupture propagation speed was discussed above. The symbol *A* is the area of the surface *S*.

We report the metric value *T* in milliseconds. The value *T*=0 indicates a perfect match. The lower the value of *T*, the better the match between the functions *f* and *g* (Fig. 6). In general, for our benchmarks completed to date, we find that values of *T* less than 50 ms indicate reasonably good agreement.
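On a dense uniform grid the surface integrals in equation (4) become Riemann sums and the cell area cancels, so *T* reduces to a masked rms. A sketch, with names of our own choosing:

```python
import numpy as np

def t_metric(rt_f, rt_g, valid):
    """Rms rupture-time difference T (eq. 4), reported in milliseconds.
    `rt_f` and `rt_g` are rupture times (s) resampled onto the same dense
    grid; `valid` masks out points where either time is undefined, the
    rupture speed is below 1 km/s, or the point lies within 300 m of the
    bottom or side boundaries. On a uniform grid the cell area cancels,
    so T is just the rms of the masked differences."""
    d = rt_f[valid] - rt_g[valid]
    return 1000.0 * np.sqrt(np.mean(d ** 2))

# Example: a uniform 10 ms disagreement in rupture time gives T = 10 ms
rt_f = np.zeros((100, 100))
rt_g = rt_f + 0.010
valid = np.ones(rt_f.shape, dtype=bool)
print(t_metric(rt_f, rt_g, valid))   # approximately 10.0
```

In practice the mask is where the exclusion rules described above enter the computation.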

Recall that each modeler submits one rupture contour file for each fault surface, which contains the rupture time at each point of a modeler‐selected array of points on the fault surface. Before performing any comparisons, each individual rupture contour file is preprocessed as follows:

Form the Delaunay triangulation of the points supplied by the modeler. Extend the rupture time to a continuous function of position on the fault surface by interpolating linearly within each Delaunay triangle.

Lay out a dense grid of points on the fault surface. We use a rectangular grid with points spaced 12.5 m apart, along strike and along dip. (For a benchmark with a nonplanar fault surface, we lay out the grid on a plane and then project the grid points onto the nonplanar fault surface.) Resample the rupture time function onto the dense grid. That is, evaluate the rupture time at each grid point, using linear interpolation within Delaunay triangles. All integrals, such as the one in equation (3), are then performed (to a good approximation) by calculating Riemann sums on the dense grid.

Compute the rupture velocity at each point on the fault surface. We do this at each point on a rectangular grid with points spaced 50 m apart, along strike and along dip. To compute rupture velocity at a given point, first draw a circle of radius 300 m around the point. The radius of 300 m is chosen so that the circle typically spans several of the elements used by the dynamic rupture code. The rupture velocity is considered to be indeterminate if any part of the circle lies outside the portion of the fault that ruptured (except that, for a benchmark set in a half‐space, the rupture velocity is not considered to be indeterminate merely because part of the circle extends above the Earth’s surface). Then, perform a 2D linear least‐squares regression within the circle. The regression yields a linear function of *x* and *y*, which is a good approximation of the rupture time function near the given point. The gradient of the linear function is taken to be the slowness at the given point. If the slowness is less than 10^{−4} s/m, then the rupture velocity is considered to be indeterminate. (This typically occurs near the hypocenter and, for some benchmarks, may occur elsewhere.) Otherwise, the rupture velocity is the reciprocal of the slowness.

Compute the area inside each rupture contour. Rupture contours are spaced at time intervals of Δ*t*=0.5 s, so that the *n*th rupture contour contains all points that have begun to slip by time *n*Δ*t*=*n*(0.5 s). The area inside the *n*th contour is computed by evaluating the integral *A*_{n} ≡ ∫_{*S*} *dx* *dy*, in which the surface *S* consists of all points on the fault with *f* (*x*,*y*) ≤ *n*Δ*t*, and *f* (*x*,*y*) is the rupture time function. These areas are reported along with the metric values.

Compute the process zone width at each on‐fault station. (The process zone is the region near the rupture front within which the friction decreases from its static value to its dynamic value.) This step requires use of the time‐series data at each on‐fault station. First, examine the time‐series data to determine the amount of time *t*_{c} it takes for the fault slip to reach the critical slip distance. The critical slip distance *d*_{c} is defined to be the distance that the fault must slip so that friction reaches its steady‐state sliding value. Most benchmarks use linear slip‐weakening friction, in which *d*_{c} is specified in the benchmark description as one of the friction constitutive parameters. For benchmarks that use rate–state friction, we estimate *d*_{c} by visual inspection of time‐series plots. Then, the process zone width is taken to be *t*_{c} multiplied by the rupture velocity. (If the rupture velocity is indeterminate, then the process zone width is also considered to be indeterminate.) The process zone widths are reported along with the metric values.
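The least‐squares slowness estimate described above can be sketched as follows. We fit a plane to the rupture times over a patch of points; the circular‐window bookkeeping and the half‐space exception are omitted for brevity, and the function name is ours:

```python
import numpy as np

def rupture_speed(x, y, rt, min_slowness=1e-4):
    """Estimate rupture speed near the center of a patch of points by
    fitting a plane rt ~ a + p*x + q*y in the least-squares sense.
    (p, q) is the slowness vector; speed = 1 / |(p, q)|. Returns None
    (indeterminate) if the slowness is below `min_slowness` s/m."""
    A = np.column_stack([np.ones_like(x), x, y])
    coef, *_ = np.linalg.lstsq(A, rt, rcond=None)
    slowness = np.hypot(coef[1], coef[2])
    if slowness < min_slowness:
        return None
    return 1.0 / slowness

# Example: a planar rupture front sweeping in the +x direction at 3 km/s
xg, yg = np.meshgrid(np.linspace(-300, 300, 13), np.linspace(-300, 300, 13))
rt = xg / 3000.0                     # rupture time in seconds
print(rupture_speed(xg.ravel(), yg.ravel(), rt.ravel()))   # approximately 3000 m/s
```

The slowness cutoff reproduces the 10^{−4} s/m indeterminacy test: a nearly flat rupture‐time surface (e.g., near the hypocenter) yields no meaningful speed.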

After preprocessing, we compare each pair of rupture contour files. The comparison is done by computing the metric value *T* in equation (4). If the benchmark has multiple fault surfaces, then a separate metric value *T* is computed for each fault surface.

## METRICS PRESENTATION

When applied to the entire database of results produced by each code for each benchmark, our procedure produces hundreds of thousands of metric values. There is a time‐series metric value *Q* for each data field (*e.g.*, slip rate or velocity), for each seismic station, for each pair of codes, and for each benchmark. There is also a time shift value *t*_{s} for each seismic station, each pair of codes, and each benchmark. So, it is important to organize and present the metric values in such a way that readers can navigate this large collection. We present the metrics in a set of pages on our group’s website (http://scecdata.usc.edu/cvws/metrics_v01.html, last accessed November 2014).

For each benchmark, we summarize the time‐series metrics in two different ways, so as to give readers an overview of the metric values. The first kind of summary is called a code‐by‐code summary. For each pair of codes, we compute the average time‐series metric value *Q* (as defined by equation 2) over all stations and data fields in the benchmark exercise. Separate averages are computed for on‐fault seismic stations and off‐fault seismic stations. This yields two average metric values for each pair of codes, which summarize the overall level of agreement between those two modelers (Fig. 7).

The second kind of summary is called a station‐by‐station summary. For each station and data field, we compute the average time‐series metric value *Q* over all pairs of codes that submitted results for the benchmark. Also, for each station, we compute the average absolute time shift value |*t*_{s}| over all pairs of codes that submitted results for the benchmark. These average metric values, and average time shifts, summarize the overall level of agreement among the entire group of modelers at each seismic station (Fig. 8).

Each number that appears in a metrics summary table on the website (scecdata.usc.edu/cvws, last accessed November 2014) is also a link. Clicking on the number brings up another webpage that displays detailed metric values for the selected pair of codes (in the case of a code‐by‐code summary) or for the selected station and data field (in the case of a station‐by‐station summary). So, beginning with the website summary tables, readers can explore the entire collection of time‐series metric values.

The rupture contour metrics for a fault surface are displayed on a single webpage. There are three tables on the page. The first table shows the metric value *T* (as defined by equation 4) for each pair of codes that submitted results for the benchmark. The second table shows the area inside each rupture contour, for each code. The third table shows the process zone width for each on‐fault station, for each code. Figure 9 illustrates the tables of metric values and process zone widths.

All the tables are colored on a green‐to‐red scale to help the reader distinguish better from worse metric values. Small metric values, which indicate good agreement, receive colors near the green end of the scale. Large metric values receive colors near the red end of the scale.

Different kinds of metric values are converted to colors in different ways. For the time shift *t*_{s} and the process zone width *w*, the hue *H* is obtained from linear mappings of the form

*H* = *a* + *b*|*t*_{s}| (5)

*H* = *a* + *bw*. (6)

Linear mappings do not work well in assigning hues to the rms metric values *Q* and *T*, because a few outliers can stretch the color scale so that most of the metric values receive nearly indistinguishable shades of green. So, for *Q* and *T* we use exponential mappings of the form

*H* = *a* + *b* exp(−*Q*/*Q*_{c}) (7)

*H* = *a* + *b* exp(−*T*/*T*_{c}). (8)

We use corner values *Q*_{c}≈72% and a corresponding corner value *T*_{c} for the rupture‐time metric. In equations (5)–(8), the values of *a* and *b* are automatically selected so that the extreme ends of the color scale correspond to the 5th and 95th percentiles of the metric values; this helps to minimize the effects of outliers. Notice that, unlike the numerical metric values, which are absolute and do not depend on the range of results submitted, the colors are relative to the range of results submitted.
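The percentile anchoring might look like the following sketch; the hue endpoints of 120° (green) and 0° (red) are our assumptions, and the published coefficients *a* and *b* and the exponential transform are not reproduced here:

```python
import numpy as np

def hues_linear(values, green_hue=120.0, red_hue=0.0):
    """Map metric values to hues on a green-to-red scale, anchoring the
    ends of the scale at the 5th and 95th percentiles of the submitted
    values so that a few outliers cannot stretch the color scale."""
    v = np.asarray(values, dtype=float)
    lo, hi = np.percentile(v, [5, 95])
    frac = np.clip((v - lo) / (hi - lo), 0.0, 1.0)   # 0 = best, 1 = worst
    return green_hue + frac * (red_hue - green_hue)

# Example: the smallest metric values map to green, the largest to red
vals = np.linspace(0.0, 1.0, 100)
h = hues_linear(vals)
print(h[0], h[-1])   # best -> 120.0 (green), worst -> 0.0 (red)
```

Values outside the 5th–95th percentile band are clipped to the ends of the scale, which is what keeps outliers from dominating the coloring.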

## DISCUSSION

A metric might seem to provide an objective way to compare modeling results, but ultimately it does not. The manner in which two waveforms differ from each other cannot be described with one or a few numbers. So, designing a metric is necessarily an act of human judgment, which involves deciding what aspects of the waveform are most important or useful to measure, and how they should be measured. The numerical values produced by a metric may be objective, but the judgments that went into the metric’s design are subjective.

Our metrics are not intended to replace visual inspection of the waveforms; rather, they are intended to be an additional tool that is used together with visual inspection. Accordingly, we do not plan to have the computer produce a pass/fail assessment of whether or not the modeling codes are in good agreement. We believe that such an assessment should always be done by a person using both metrics and visual inspection.

Metrics excel at consistency and repeatability. There are many thousands of possible comparisons for each benchmark, far too many for a person to inspect or remember them all. Metrics can be calculated for every possible comparison, with all the results displayed in tables that are easy to browse. Looking through these tables gives an excellent overview of the level of agreement between any two codes or among the entire set of codes. In some cases, codes seem to fall into groups, with better agreement within a group than between codes in different groups. The metrics also can pinpoint potential trouble spots that warrant further attention.

Metrics are also good at combining different forms of data in ways that are impossible to do visually. For example, our metrics are able to calculate the process zone width by combining time‐series data with rupture time data, something that we could only roughly estimate before we had metrics.

In contrast, visual inspection reveals the nature of any disagreement and allows conclusions to be drawn as to its cause and importance. For example, a secular difference between two waveforms might be seen as more significant than two waveforms that appear to oscillate around a common mean, even if the metric value is the same. Likewise, a difference that occurs late in the simulation, which may be due to numerical artifacts such as reflections from the boundary of the model domain, might be seen as less significant than a similar difference early in the simulation.

Visual inspection can also reveal important details that are not visible to the metrics. An example is provided by one of our recent benchmarks that has slightly asymmetric fault geometry. The slight asymmetry causes seismic waves emitted from different portions of the fault to interfere constructively near the hypocenter, which in turn causes a small change in the fault‐slip rate at the hypocenter. The effect is three orders of magnitude smaller than the total slip rate at the hypocenter and is completely invisible to our metrics. Nonetheless, by careful visual inspection it is possible not only to see the effect, but also to verify that the modeling codes are in good agreement.

Because the metrics are intended to be used together with visual inspection, we designed the metrics to mimic the kinds of comparisons we have always done visually. Perfect mimicry of a human observer is impossible, but this goal was a useful guideline in making decisions about how to calculate metrics.

Our waveform comparison metric, defined by equations (1) and (2), is based on the *L*^{2} norm. There are many other metrics that could be defined instead. One possibility would be to use the *L*^{1} norm, which is the area under the curve:

‖*f*‖_{1} = ∫_{*t*_{1}}^{*t*_{2}} |*f* (*t*)| *dt*. (9)

Compared to the *L*^{1} norm, the *L*^{2} norm tends to place greater weight where the function is large. This makes the *L*^{2} norm a better choice for mimicking the way we perform visual comparisons. As seen in Figures 2, 3, and 5, plots of velocity or slip rate often have a relatively narrow peak, followed by a long tail with lower velocity or slip rate. When comparing such waveforms visually, we usually pay the most attention to the portion of the waveform where velocity or slip rate is large, because that portion makes the greatest contribution to seismic hazard. Also, differences occurring late in the simulation may be due to interactions with the boundary of the modeling domain. So, we want the metric to emphasize the portion of the waveform where the velocity or slip rate is large. Because the largest differences between waveforms tend to occur where the signal is large (Fig. 5), the *L*^{2} norm emphasizes the portion where the signal is large, as desired. More specifically, a large but short‐lived difference near the peak might enclose the same area as a small difference that persists throughout the tail. As a result, the *L*^{1} norm may place too much weight on the tail and not enough near the peak.
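A small numerical experiment illustrates the point (the signals are invented for illustration): two difference signals with equal areas, one tall and brief near the peak and one small but persistent through the tail, have equal *L*^{1} norms but very different *L*^{2} norms.

```python
import numpy as np

# Two "difference" signals with equal area (equal L1 norm): a tall, brief
# difference near the waveform peak, and a small difference spread over the tail.
t = np.linspace(0.0, 12.0, 12001)
dt = t[1] - t[0]
peak_diff = np.where(np.abs(t - 1.0) < 0.1, 1.0, 0.0)   # area ~= 0.2
tail_diff = np.where(t > 2.0, 0.02, 0.0)                # area ~= 0.2

l1 = lambda y: np.sum(np.abs(y)) * dt
l2 = lambda y: np.sqrt(np.sum(y ** 2) * dt)

print(l1(peak_diff), l1(tail_diff))   # nearly equal: the L1 norm cannot tell them apart
print(l2(peak_diff), l2(tail_diff))   # the L2 norm strongly flags the peak difference
```

This is why the *L*^{2} norm better mimics a visual comparison that concentrates on the high‐amplitude portion of the waveform.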

Filtering is another aspect of the metric in which we attempt to mimic the way we perform visual comparisons. Some codes generate output that contains spurious high‐frequency oscillations, a consequence of the need to discretize the calculation in space and time. A discretized numerical model may have modes of vibration with much lower energy than the corresponding vibrations in a continuum. Because of their low energy, these vibrational modes are easily excited, leading to large oscillations in the output; in finite‐element work, they are called hourglass modes. Many finite‐element codes include features to suppress the hourglass modes, either by raising their energy or by damping them. Oscillations may also arise if the true solution contains spatial or temporal frequencies that exceed the resolution of the code. Rather than simply disappearing, the energy associated with these unresolved high frequencies may appear in the output in the form of oscillations.

Our metrics suppress these oscillations by applying a 5 Hz low‐pass filter to velocity, slip rate, and stress. This is the same procedure we use when performing visual comparisons, and experience shows it works well. However, by filtering the data, we are altering the output of the modeling codes, and one can reasonably ask if it is fair to do so. On the one hand, for codes that produce little or no spurious oscillation, the application of a filter might make the code’s output worse due to overfiltering. For these codes, it seems unfair to alter their outputs. On the other hand, for codes that produce a lot of spurious oscillation, failing to apply a filter would leave them with poor performance on the metrics. This poor performance is not truly indicative of the abilities of the code, because it can be made to perform much better with a simple filtering step, and is particularly unfair because we explicitly requested that modelers not perform their own filtering when submitting data. In short, any decision about filtering is unfair to someone. We decided that the best compromise is to continue what we have always done during visual comparisons, but we recognize that it is a compromise. Also, it should be noted that the 5 Hz cutoff frequency is based on our existing dataset. As codes improve in their ability to simulate higher‐frequency effects, and as benchmarks come to include more heterogeneity, it may at some point become appropriate to increase the cutoff frequency.

Time shifting is another way that we alter the data. In our data, it is fairly common to see waveforms with very similar shape but offset from each other in time. In a case such as that shown in Figure 3, where the time offset is large enough to separate the signal peaks, the difference between the unshifted waveforms can be large. Without applying a time shift, the resulting poor metric value would greatly overstate the difference between the codes. We decided to apply the time shift that yields the best agreement in waveform shape, but also to report the amount of time shift that was applied, so that the user can take the shift into account when assessing the codes. This, too, is consistent with the way that we perform comparisons visually.
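A minimal way to find such a shift is a grid search over candidate offsets, keeping the one that minimizes the misfit. The sketch below is ours, not the project's implementation: the function name `best_time_shift` is hypothetical, and the circular shift via `np.roll` is a simplification that is adequate only when the signal is well inside the time window.

```python
import numpy as np

def best_time_shift(f, g, dt, max_shift=1.0):
    """Return the shift (in seconds) applied to g that minimizes the RMS
    misfit against f, searched over [-max_shift, +max_shift]. The shift
    itself is reported alongside the metric, as described in the text.
    Circular shifting via np.roll is an illustrative simplification."""
    n = int(round(max_shift / dt))
    best_misfit, best_shift = np.inf, 0.0
    for k in range(-n, n + 1):
        misfit = np.sqrt(np.mean((f - np.roll(g, k)) ** 2))
        if misfit < best_misfit:
            best_misfit, best_shift = misfit, k * dt
    return best_shift

# A pulse and a copy of it delayed by 0.3 s: the search recovers the
# shift (-0.3 s applied to g) that realigns the two waveforms.
dt = 0.01
t = np.arange(0.0, 4.0, dt)
f = np.exp(-((t - 1.5) / 0.2) ** 2)
g = np.roll(f, 30)          # g lags f by 0.3 s
shift = best_time_shift(f, g, dt)
print(shift)
```

Reporting `shift` together with the post‐shift metric value lets the user judge both how similar the waveform shapes are and how far apart in time the codes placed them.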

Interpolation is something that we never had to deal with explicitly when performing visual comparisons. When presented with data in graphical form, the human eye interpolates automatically. For our metrics, interpolation is unavoidable because different codes produce data for different points in space and time. Although the benchmark description specifies a required resolution, it is not possible for all codes to use the same computational grid and the same time step, because the codes use different computational approaches. For example, although many codes use a uniform square grid of points on the fault surface, at least one code requires a triangular mesh, and another requires nonuniform grid spacing. Codes that employ higher‐order methods may use larger grid spacing or time steps than lower‐order codes to achieve a given resolution.

An important consideration is that we want to interpolate in a way that is exactly symmetric; comparing *f* with *g* should yield exactly the same result as comparing *g* with *f*. Also, we want to interpolate in a way that, as much as possible, preserves the original discretization of each function, to avoid introducing any artifacts. We achieve both goals by separately converting each function into a continuous, piecewise linear function, and then comparing these continuous functions. Because each linear segment is bounded by the function’s original discretization points, this largely preserves the original discretization.
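One way to realize this symmetric comparison is to evaluate both piecewise‐linear interpolants on the union of their sample times, so that every original breakpoint of each function is preserved and swapping the arguments cannot change the result. This is a sketch under that assumption, not the project's code; note that it integrates the squared difference by the trapezoidal rule on the union grid, whereas exact integration of the piecewise‐linear difference would integrate each quadratic segment analytically.

```python
import numpy as np

def l2_difference(t_f, f, t_g, g):
    """Symmetric L2 difference of two sampled functions, each treated as
    a continuous, piecewise-linear function. Evaluating both on the union
    of their sample times preserves each original discretization and makes
    the result identical when f and g are swapped."""
    t = np.union1d(t_f, t_g)       # union of both sets of breakpoints
    fi = np.interp(t, t_f, f)      # piecewise-linear interpolants
    gi = np.interp(t, t_g, g)
    sq = (fi - gi) ** 2
    # trapezoidal integration over the (possibly nonuniform) union grid
    dtv = np.diff(t)
    return np.sqrt(np.sum(0.5 * (sq[:-1] + sq[1:]) * dtv))

# Two samplings of similar waveforms on different grids, as produced by
# codes with different computational approaches.
t_f = np.linspace(0.0, 1.0, 11)    # coarse uniform grid
t_g = np.linspace(0.0, 1.0, 17)    # finer, incommensurate grid
f = np.sin(2 * np.pi * t_f)
g = 0.9 * np.sin(2 * np.pi * t_g)
print(l2_difference(t_f, f, t_g, g))
```

Because the union grid and the interpolated values are the same regardless of argument order, the symmetry holds exactly, not just approximately.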

## CONCLUSIONS

The goal of the SCEC‐USGS Spontaneous Rupture Code Verification Project is to test the correctness of dynamic earthquake rupture computer codes by comparing the results produced by different codes conducting a series of benchmark exercises. We have demonstrated that such comparisons can be performed quantitatively, using the metrics described in this article. The metrics are calculated using an automated process and are designed to adapt as new codes and benchmarks join the project, while still maintaining the integrity of the original metrics calculations. In this article, we described suitable computational procedures and demonstrated them by computing metric values for a large existing set of online dynamic rupture simulations. A small subset of our metrics results is presented in this article. The full set of metrics results can be seen on our website, http://scecdata.usc.edu/cvws (last accessed November 2014).

Our computational procedures are customized to suit the characteristics of our database. Some of our design decisions, such as the use of rms metric values, may be broadly applicable. Other design decisions rely on very specific knowledge of our database. If the ideas in this article are applied to other collections of modeling data, then they will likely need to be modified and customized to suit those other collections.

## ACKNOWLEDGMENTS

Thank you to our colleagues over the years in the SCEC‐USGS Spontaneous Rupture Code Verification Project: Brad Aagaard, Jean Paul Ampuero, Joe Andrews, Ralph Archuleta, Harsha Bhat, Xiaofei Chen, Victor Cruz‐Atienza, Luis Dalguer, Steve Day, Nora DeDontney, Ben Duan, Eric Dunham, Kenneth Duru, Geoff Ely, Alice Gabriel, Percy Galvez, Junle Jiang, Yoshi Kaneko, Yuko Kase, Jeremy Kozdon, Nadia Lapusta, Yi Liu, Bin Luo, Shuo Ma, Amaryllis Nerger, Hiroyuki Noda, David Oglesby, Kim Olsen, Ryan Payne, Christian Pelties, Arben Pitarka, Matthew Purvance, Daniel Roten, Zheqiang Shi, Surendra Nadh Somala, Seok Goo Song, Josue Tago, Elizabeth Templeton‐Barrett, and Zhenguo Zhang. Thank you to Editor‐in‐Chief Zhigang Peng, Associate Editor Jeremy Zechar, Joe Fletcher, Debbie Smith, Martin Mai, Peter Moczo, and two anonymous reviewers for helpful recommendations that improved the manuscript. This work was funded by the U.S. Geological Survey (USGS), the Southern California Earthquake Center (SCEC), and the Pacific Gas and Electric Company. SCEC is funded by grants from the USGS and the National Science Foundation. This is SCEC Contribution Number 1951. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.