© 2011 by the Seismological Society of America
The waveform suite developed at the University of Alaska Geophysical Institute is an open-source collection of MATLAB classes that provide a means to import, manipulate, display, and share waveform data while ensuring the integrity of the data and providing stability for programs that build upon them.
Many seismic investigations begin by extracting data from a database or archive in order to carry out some type of advanced processing on segments containing the relevant signal(s). This extraction process might include requesting data from the IRIS Data Management Center, a regional seismic network, or an in-house project database. Locally, the extracted data may be stored in numerous well-established data formats or databases, including SAC (Tapley and Tull 1992), SEISAN (Havskov and Ottemöller 2000), AH, SEGY (Barry et al. 1975), Winston (Cervelli et al. 2004), and Antelope (http://brtt.com; Lindquist 2009). MATLAB has proven to be a popular environment in which to work with the extracted data, in part because of the ready access to powerful platform-independent routines for signal processing and plotting. Once in MATLAB, however, the original data format is often abandoned in favor of ad-hoc formats, as each user adopts a different mix of structures, arrays, and cells to manage the data. Because these home-grown formats are developed independently, transferring data from one system (or project) to another often means spending time on single-use conversion routines.
The ease of programming in MATLAB makes it particularly vulnerable to impromptu or disposable coding, and many of the resulting scripts are transient affairs that serve an immediate need. Many scripts are then discarded, while others enjoy a more celebrated existence: bolstered by quick success, users (including the authors) often decide to build these routines out into standalone packages, only to realize that the vast majority of the effort is consumed by bookkeeping, exception handling, documentation, and error checking. Though thankless, these tasks make the difference between robust, reusable code and thesis appendices.
We introduce here the waveform suite, a MATLAB-based format for handling waveform data. The waveform suite consists of three MATLAB classes that have been designed from scratch to provide a robust foundation for manipulating waveforms. For users seeking the ability to import, manipulate, and display seismic data, the waveform suite provides this functionality directly. The supplied interface allows users to both import several standard formats and write import routines for their own formats. The suite's full benefits will be gained by users looking to build more advanced packages such as for receiver functions, shear wave splitting, wavefield migration, phase picking, cross correlation, synthetics, or source inversion. By replacing ad-hoc systems with a common framework, the waveform suite can provide an architecture upon which more complex programs may be created and shared.
The waveform suite is not tailored for any particular type of analysis (e.g., earthquakes, ambient noise, volcanic tremor, synthetics, etc.) but instead attempts to provide general-purpose tools for seismological programming. In this respect it differs somewhat from similar efforts such as the CORAL toolbox (Creager 1997) and MATLAB's built-in time series toolbox. The suite minimizes the need to consider the sometimes grotesque details required for error checking or for manipulating large numbers of waveforms. Both the error-handling ability and the improved code readability help programs built upon the waveform suite to be more robust, with fewer crashes and logical errors (McConnell 2004).
The waveform suite relieves the user of routine bookkeeping chores by automating the tedious aspects of data manipulation and by keeping related information together. Tasks managed through the suite include automatically updating attributes as required by various data manipulations (such as integrating, resampling, stacking, etc.), ensuring that arrays are of proper dimensions, determining the times of samples, keeping track of station-channel combinations, automatically labeling graphs, and more. A core feature of the waveform suite is its ability to handle multiple waveforms simultaneously, dispensing with the frequent need to loop through each individual trace. The suite is extensible, providing users with the ability to write import routines for their own formats. Databases or file systems may be queried for vast numbers of waveforms that can then be manipulated en masse. The suite allows the user to make changes to waveforms using standard mathematical operators (+, -, .*, etc.) and provides the ability to perform common manipulations such as filtering, subsetting, or demeaning.
Some basic statistics are supported, as are more complex operations such as integration and Hilbert transforms. See Table 1 for a representative sample of functions.
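Two of these chores, determining sample times and updating attributes after a manipulation, can be illustrated with a short sketch. It is written in Python rather than MATLAB purely for illustration; the `Trace` class, its field names, and the unit-renaming rule are hypothetical and are not the suite's internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trace:
    """Hypothetical evenly sampled trace (not the suite's actual class)."""
    data: List[float]
    freq: float            # sampling frequency in Hz
    start: float           # start time in seconds from an arbitrary reference
    units: str = "counts"

    def sample_times(self) -> List[float]:
        # Sample i occurs at start + i / freq; times are derived, never stored.
        return [self.start + i / self.freq for i in range(len(self.data))]

    def integrate(self) -> "Trace":
        # Running sum times the sample interval (rectangle rule). The new
        # trace's units are updated automatically using a hypothetical rule:
        # "X/s" becomes "X"; anything else gains a "*s" suffix.
        dt = 1.0 / self.freq
        total, out = 0.0, []
        for x in self.data:
            total += x * dt
            out.append(total)
        if self.units.endswith("/s"):
            new_units = self.units[:-2]
        else:
            new_units = self.units + "*s"
        return Trace(out, self.freq, self.start, new_units)


vel = Trace([1.0, 1.0, 1.0, 1.0], freq=2.0, start=10.0, units="nm/s")
disp = vel.integrate()
print(vel.sample_times())     # [10.0, 10.5, 11.0, 11.5]
print(disp.data, disp.units)  # [0.5, 1.0, 1.5, 2.0] nm
```

Because the sample times are computed on request from the start time and sampling frequency, they cannot drift out of step with the data.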
The waveform suite consists of three primary MATLAB classes:
Waveform—This class provides easy data manipulation and display of evenly sampled (e.g., seismic) data.
Datasource—This class provides the interface between programs and stored data.
Scnlobject—This class handles each trace's station-channel-network-location information.
In most applications, datasource and scnlobject are used to set up the importation of data as in this example:
>> ds = datasource('Antelope', '/iwrun/op/db/archive_2010_02_27');
>> scnl = scnlobject('SPBG', 'BHZ', 'AV', '--');
>> w = waveform(ds, scnl, startTime, endTime);
where startTime and endTime happen to be N × M matrices of trace start and end times. The result is an N × M waveform object, w. This particular example draws data from an Antelope database; drawing on a different type of input data requires changing only the first line. As will be the case for all scripting examples, the user-typed commands are shown after the MATLAB prompt “>>”, while computer-generated output is represented in italics.
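The separation that lets a single line change the data source can be sketched as a registry of reader functions keyed by source type. This Python sketch assumes hypothetical names (`DataSource`, `read_antelope`, `read_sac`, `get_waveforms`); the readers are stubs that return labels instead of data, and none of this is the suite's actual API.

```python
from typing import Callable, Dict, List


# Hypothetical stub readers; real ones would parse files or query databases.
def read_antelope(location: str, station: str, start: float, end: float) -> List[str]:
    return [f"antelope:{location}:{station}:{start}-{end}"]


def read_sac(location: str, station: str, start: float, end: float) -> List[str]:
    return [f"sac:{location}:{station}:{start}-{end}"]


class DataSource:
    """Minimal sketch: binds a source type to a reader and a location."""

    _readers: Dict[str, Callable[..., List[str]]] = {
        "antelope": read_antelope,
        "sac": read_sac,
    }

    def __init__(self, kind: str, location: str):
        key = kind.lower()
        if key not in self._readers:
            raise ValueError(f"unknown datasource type {kind!r}; "
                             f"expected one of {sorted(self._readers)}")
        self.reader = self._readers[key]
        self.location = location

    def load(self, station: str, start: float, end: float) -> List[str]:
        return self.reader(self.location, station, start, end)


def get_waveforms(ds: DataSource, station: str, start: float, end: float) -> List[str]:
    # Code that requests data never mentions the storage format.
    return ds.load(station, start, end)


ds = DataSource("Antelope", "/iwrun/op/db/archive_2010_02_27")
print(get_waveforms(ds, "SPBG", 0.0, 60.0))
```

Swapping `"Antelope"` for `"sac"` in the constructor is the only change needed for `get_waveforms` to draw on the other reader.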
The terms class and object appear repeatedly throughout this paper and deserve brief mention. An object is essentially a variable and is treated as such, but is special in ways that are discussed throughout the next section. A class refers to the template that explicitly lays out the types of data within an object, as well as the functions (called methods) that are allowed to operate on those data. In fact, the standard MATLAB data types (“double,” “char,” and “cell”) are classes.
BENEFITS INHERENT IN USING THE WAVEFORM SUITE'S CLASSES
The waveform suite's classes provide a sturdy and flexible framework that frees the user from bookkeeping details, allowing the user to concentrate instead on getting results. The functions and the data upon which they operate are treated as a unit, providing the user with a simple, seamless interface on which more complex programs can be easily built. Additional benefits include less variable juggling, familiar ways of working with data, and insulation of the program from bad data and of the data from bad input; together these speed program development, make it more effective, and facilitate reproducible results.
Classes provide a way to unify data, reducing the number of variables that must be tracked separately. Working with seismic data typically requires tracking amplitudes, times, stations, frequencies, units, etc. The waveform suite's classes allow the user to track a single all-encompassing object of the class waveform instead. This variable may be N-dimensional, representing multiple seismic traces, as with the N × M waveform object w created in the prior example.
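As a rough illustration of carrying many traces in one variable, the sketch below bundles amplitudes with their metadata and arranges traces in an N × M grid (stations by time windows). It is Python, not the suite's MATLAB class; the `Trace` fields and `make_grid` are assumptions, and `STA2` is a placeholder station name.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Trace:
    """Hypothetical bundle: amplitudes plus everything needed to interpret them."""
    station: str
    channel: str
    freq: float
    start: float
    units: str
    data: List[float] = field(default_factory=list)


def make_grid(stations: List[str], starts: List[float], freq: float = 100.0) -> List[List[Trace]]:
    # Rows are stations, columns are time windows: an N x M collection
    # carried around as a single variable instead of parallel arrays.
    return [[Trace(sta, "BHZ", freq, t0, "counts") for t0 in starts]
            for sta in stations]


w = make_grid(["SPBG", "STA2"], [0.0, 3600.0, 7200.0])
print(len(w), len(w[0]))                # 2 3
print(w[1][2].station, w[1][2].start)   # STA2 7200.0
```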
Input routines have been designed for each of the waveform suite's classes. These are capable of pre-screening values to avoid the assignment of erroneous values that would cause a program to misbehave. Values that have been assigned to an object are protected from unsupervised modification and instead are accessed through functions tailored to work with the data. Each of the suite's functions provides targeted access to the underlying structure in order to prevent accidental or nonsensical data modification. According to MATLAB convention, the assignment function is called set and the retrieval function is called get. Through set, data types and formats can be strictly enforced, and it can be ensured that data lie within proper ranges and have appropriate dimensions and units. This last point is important because matrix operations treat 1 × N arrays differently than N × 1 arrays. Through these routines, the user may access derived or interpreted properties as well as native properties. A simple example of this behavior is a request for a date: while the date is internally represented in a waveform as a MATLAB serial date number, the user may opt to retrieve the date as a text string or an epoch. To an external program, actual or derived data are indistinguishable.
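A minimal sketch of validated set/get access, including derived date properties, is shown below in Python. The property names (`freq`, `data`, `start`, `start_epoch`, `start_str`) are hypothetical; the only outside fact it relies on is that MATLAB's serial date number for 1970-01-01 is 719529.

```python
from datetime import datetime, timedelta, timezone

MATLAB_DATENUM_1970 = 719529  # MATLAB's datenum(1970, 1, 1)


class Trace:
    """Hypothetical sketch of validated set/get access (not the suite's code)."""

    def __init__(self):
        self._props = {"freq": 1.0, "start": float(MATLAB_DATENUM_1970), "data": []}

    def set(self, name, value):
        key = name.lower()
        if key == "freq":
            if isinstance(value, bool) or not isinstance(value, (int, float)) or value <= 0:
                raise ValueError(f"freq must be a positive number, got {value!r}")
            self._props["freq"] = float(value)
        elif key == "data":
            # Accept a row, a column, or a flat list; always store a single
            # flat sequence of floats so callers never see the wrong shape.
            flat = []
            for item in value:
                flat.extend(item if isinstance(item, (list, tuple)) else [item])
            self._props["data"] = [float(x) for x in flat]
        elif key == "start":
            self._props["start"] = float(value)  # serial date number, in days
        else:
            raise KeyError(f"unknown property {name!r}")
        return self

    def get(self, name):
        key = name.lower()
        days = self._props["start"] - MATLAB_DATENUM_1970
        if key == "start_epoch":   # derived: seconds since 1970-01-01
            return days * 86400.0
        if key == "start_str":     # derived: human-readable UTC string
            t = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(days=days)
            return t.strftime("%Y-%m-%d %H:%M:%S")
        if key not in self._props:
            raise KeyError(f"unknown property {name!r}")
        return self._props[key]


w = Trace().set("freq", 50).set("start", 719529.5)
print(w.get("start_epoch"))  # 43200.0
print(w.get("start_str"))    # 1970-01-01 12:00:00
```

To the caller, the stored `freq` and the derived `start_str` are retrieved in exactly the same way.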
Since MATLAB determines which version of a function to use based upon the data type (class) passed to it, defining waveforms as a distinct MATLAB class permits the reuse of standard function names such as plot() and min() as well as existing symbols for standard mathematical manipulations including “+” and “.*”.
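The class-based dispatch of standard operators can be imitated in Python, where defining `__add__`, `__truediv__`, and related methods plays the role that overloaded `plus`, `rdivide`, and similar methods play for a MATLAB class. This `Waveform` is a hypothetical stand-in, not the suite's class.

```python
from typing import List


class Waveform:
    """Hypothetical sketch: operators dispatch on the operand's class."""

    def __init__(self, data: List[float], station: str = "SPBG"):
        self.data = [float(x) for x in data]
        self.station = station

    def _apply(self, other, op):
        if isinstance(other, Waveform):
            if len(other.data) != len(self.data):
                raise ValueError("waveforms must have the same number of samples")
            values = [op(a, b) for a, b in zip(self.data, other.data)]
        else:
            values = [op(a, other) for a in self.data]
        # Metadata travels with the result automatically.
        return Waveform(values, self.station)

    def __add__(self, other):
        return self._apply(other, lambda a, b: a + b)

    def __sub__(self, other):
        return self._apply(other, lambda a, b: a - b)

    def __mul__(self, other):      # plays the role of MATLAB's .*
        return self._apply(other, lambda a, b: a * b)

    def __truediv__(self, other):  # plays the role of MATLAB's ./
        return self._apply(other, lambda a, b: a / b)

    def __repr__(self):
        return f"Waveform({self.data}, station={self.station!r})"


w = Waveform([1, 2, 3])
print(w + 1)   # Waveform([2.0, 3.0, 4.0], station='SPBG')
print(w * w)   # Waveform([1.0, 4.0, 9.0], station='SPBG')
```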
The ability to use existing operators, and to create routines that replace multiple lines of code with a single intuitive command, makes routines easier to understand, which helps reduce errors. Compact, legible code is illustrated by this example, which normalizes each of the waveforms stored in w from the previous examples by its peak-to-peak amplitude. Three of the normalized traces are plotted in Figure 1.
>> peak2peak = max(w) - min(w);        % one peak-to-peak value per trace
>> normalized_traces = w ./ peak2peak; % divide each trace by its own value
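The two MATLAB lines above hide a loop over every trace: one peak-to-peak value is computed per trace, and each trace is then divided by its own value. A plain-Python sketch of that element-by-element pairing, using lists of samples in place of waveform objects (an assumption made for illustration), is:

```python
from typing import List


def peak_to_peak(traces: List[List[float]]) -> List[float]:
    # One value per trace, analogous to max(w) - min(w) on a waveform array.
    return [max(t) - min(t) for t in traces]


def divide_each(traces: List[List[float]], scalars: List[float]) -> List[List[float]]:
    # Pair each trace with its own scalar, analogous to w ./ peak2peak.
    if len(traces) != len(scalars):
        raise ValueError("need exactly one scalar per trace")
    return [[x / s for x in t] for t, s in zip(traces, scalars)]


w = [[0.0, 4.0, -4.0], [1.0, 2.0, 3.0]]
normalized = divide_each(w, peak_to_peak(w))
print(normalized)  # [[0.0, 0.5, -0.5], [0.5, 1.0, 1.5]]
```

A flat trace has a peak-to-peak value of zero and would raise `ZeroDivisionError` here; deciding how such a trace should be handled is the kind of choice a class can make once, centrally.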
Every attempt has been made in the waveform suite to provide thorough and meaningful error messages that expedite debugging. Where the cause lies outside the user's control (as with erroneous values within a data stream, data gaps, etc.), the classes within the waveform suite may be able to recover intelligently rather than crash. Ideally, no application using these classes should retrieve nonsensical data, since errors within the data are reported or corrected at the time of assignment.
DESCRIPTION OF THE WAVEFORM SUITE'S CORE CLASSES
The waveform class is the suite's workhorse, providing a way to manipulate evenly sampled data (e.g., seismic data) within MATLAB. Waveform tracks the sampling frequency, start time, trace identification, measured amplitudes, and history, along with user-defined fields. A selected list of waveform functions is included in Table 1.
Modifications to waveform objects are recorded in the history of each waveform. The history is capable of recording text blurbs as well as other types of information, such as other objects used to modify the waveform (e.g., details of an applied filter). Detailed history may later be retrieved to determine how the data were modified or, more importantly, how to reproduce results.
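Per-trace history can be sketched as an append-only log of (operation, details, timestamp) entries written by every method that changes the data. The Python class and method names below are hypothetical, not the suite's implementation.

```python
from datetime import datetime, timezone
from typing import Any, List, Tuple


class Waveform:
    """Hypothetical sketch of per-trace history (not the suite's implementation)."""

    def __init__(self, data: List[float]):
        self.data = list(data)
        self.history: List[Tuple[str, Any, str]] = []
        self._log("created", None)

    def _log(self, what: str, detail: Any) -> None:
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.history.append((what, detail, stamp))

    def demean(self) -> "Waveform":
        mean = sum(self.data) / len(self.data)
        self.data = [x - mean for x in self.data]
        self._log("demean", {"removed_mean": mean})
        return self

    def scale(self, factor: float) -> "Waveform":
        self.data = [x * factor for x in self.data]
        self._log("scale", {"factor": factor})
        return self


w = Waveform([1.0, 2.0, 3.0]).demean().scale(2.0)
for what, detail, _ in w.history:
    print(what, detail)
# created None
# demean {'removed_mean': 2.0}
# scale {'factor': 2.0}
```

Because each entry stores the parameters that were used, the log is enough to replay the same processing on fresh data.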
User-defined fields expand the capabilities of waveform by allowing users to include additional information relevant to each trace. In practice these user-defined fields serve as extensible header fields. These fields are created automatically in some instances; when a SAC file is imported, header information is transferred into user-defined fields. Though this information could be stored in adjacent matrices, incorporating it directly into the waveform object allows it to pass automatically into existing routines. Because these fields can be of any type, it is possible to store not only simple information (e.g., an event location) but also more complex information such as an instrument response or the frequency spectrum of the trace. While there is some danger in infinitely extensible header information, in our experience thus far this capability has given the waveform suite considerably more power than anticipated. Access to the data within user-defined fields is through the same get and set methods that are used to access the rest of waveform's details.
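A sketch of user-defined fields sharing the same get/set interface as built-in properties follows, in Python. The set of built-in names, the lowercase key convention, and `add_header` are assumptions; `EVLA` and `EVLO` are real SAC header names, but the values here are made up.

```python
from typing import Any, Dict


class Waveform:
    """Hypothetical sketch: built-in and user-defined fields share get/set."""

    BUILTIN = ("station", "channel", "freq")

    def __init__(self, **builtin: Any):
        self._core: Dict[str, Any] = {k: builtin.get(k) for k in self.BUILTIN}
        self._user: Dict[str, Any] = {}

    def set(self, name: str, value: Any) -> "Waveform":
        key = name.lower()
        if key in self.BUILTIN:
            self._core[key] = value
        else:
            self._user[key] = value  # any type: number, string, dict, object...
        return self

    def get(self, name: str) -> Any:
        key = name.lower()
        if key in self.BUILTIN:
            return self._core[key]
        if key in self._user:
            return self._user[key]
        raise KeyError(f"no field named {name!r}")

    def add_header(self, header: Dict[str, Any]) -> "Waveform":
        # e.g., copy SAC-style header entries into user-defined fields
        for name, value in header.items():
            self.set(name, value)
        return self


w = Waveform(station="SPBG", channel="BHZ", freq=50.0)
w.add_header({"EVLA": 61.0, "EVLO": -152.0})                 # made-up location
w.set("response", {"gain": 1.0, "poles": [], "zeros": []})   # placeholder
print(w.get("evla"), w.get("station"))  # 61.0 SPBG
```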
The scnlobject class was created to simplify handling station, channel, network, and location information. Though not required in all situations, the four-parameter SEED naming convention is now nearly ubiquitous in earthquake seismology. Each scnlobject represents a single sta_chan_net_loc combination (Ahern et al. 2009). Encapsulating these descriptors into a dedicated object improves the management of large numbers of traces by facilitating single-command subsets, concatenations, and queries against waveform objects. Commonly accessed scnlobjects can be stored for automatic retrieval, so that they are not required to be created manually each time. Scnlobjects understand “*” wildcards, further improving the ability to sift through large numbers of traces for desired information.
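Wildcard selection of traces by station, channel, network, and location can be sketched with Python's standard `fnmatch` module. The `SCNL` class, the `select` helper, and the `--` default for an empty location code are assumptions made for this illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List


@dataclass(frozen=True)
class SCNL:
    """Hypothetical station-channel-network-location key."""
    station: str
    channel: str
    network: str = ""
    location: str = "--"

    def __str__(self) -> str:
        return f"{self.station}_{self.channel}_{self.network}_{self.location}"

    def matches(self, pattern: "SCNL") -> bool:
        # Each component of the pattern may contain "*" wildcards
        # (fnmatch also treats "?" and "[...]" specially).
        mine = (self.station, self.channel, self.network, self.location)
        theirs = (pattern.station, pattern.channel, pattern.network, pattern.location)
        return all(fnmatchcase(m, t) for m, t in zip(mine, theirs))


def select(traces: List[SCNL], pattern: SCNL) -> List[SCNL]:
    return [t for t in traces if t.matches(pattern)]


traces = [SCNL("SPBG", "BHZ", "AV"), SCNL("SPBG", "BHN", "AV"), SCNL("STA2", "BHZ", "AV")]
print([str(t) for t in select(traces, SCNL("*", "BHZ", "AV", "*"))])
# ['SPBG_BHZ_AV_--', 'STA2_BHZ_AV_--']
```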
The datasource class provides the connection between waveforms (or other classes) and their databases or stored files. The datasource class has proven to be a valuable way to insulate the waveform class from the (often changing) data streams. Together, the datasource and waveform classes have built-in interpreters for several database and file formats including:
Antelope—The waveform suite wraps the required elements from the Antelope toolbox for MATLAB (Lindquist 2009), providing the ability to access Antelope databases. This is included in the standard Antelope distribution from Boulder Real Time Technologies.
Winston—The waveform suite reaches directly into Winston using the Java library distributed with SWARM (Cervelli et al. 2004).
SAC—Seismic Analysis Code (SAC) files (Tapley and Tull 1992) may be imported without additional code. Additional header fields are translated into similarly named user-defined fields.
SEISAN—No additional utilities are required to import files from the SEISmic ANalysis system (Havskov and Ottemöller 2003). However, because of SEISAN's file-naming conventions, datasource may not be able to determine automatically which file is desired. In this case, the datasource should be created using specific filenames.
.mat file—Datasource is capable of looking within .mat files for all variables of a desired type. This allows it to parse data from files that contain previously generated waveform objects.
User-defined—The datasource/waveform object combination makes it straightforward to translate a file of any type into an array of one or more waveform objects. A short wrapper function should be all that is necessary to make an existing import routine compatible with the waveform suite. Notably, this does not require an understanding of the datasource/waveform codebase beyond the set function.
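The kind of short wrapper described above might look like the following Python sketch, in which `read_my_format` stands in for an existing, format-specific reader. All names, including the reader's field names, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Trace:
    station: str = ""
    channel: str = ""
    freq: float = 1.0
    start: float = 0.0
    data: List[float] = field(default_factory=list)
    user: Dict[str, Any] = field(default_factory=dict)


def read_my_format(path: str) -> List[Dict[str, Any]]:
    # Stand-in for an existing, format-specific reader (hypothetical).
    return [{"sta": "SPBG", "chan": "BHZ", "sps": 50.0, "t0": 0.0,
             "samples": [1, 2, 3], "comment": "from " + path}]


def load_as_traces(path: str) -> List[Trace]:
    # The wrapper only maps the reader's names onto trace fields;
    # anything it does not recognize is kept as a user-defined field.
    known = {"sta": "station", "chan": "channel", "sps": "freq",
             "t0": "start", "samples": "data"}
    traces = []
    for rec in read_my_format(path):
        tr = Trace()
        for key, value in rec.items():
            if key in known:
                setattr(tr, known[key], value)
            else:
                tr.user[key] = value
        tr.data = [float(x) for x in tr.data]
        traces.append(tr)
    return traces


print(load_as_traces("example.dat")[0])
```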
Most users are well acquainted with reading data on a per-file basis, and this is straightforward in the waveform suite. In addition, complex directory structures and file-naming schemes can be traversed thanks to the datasource's ability to interpret fprintf()-style formatting statements that describe a file's time and/or station/channel/network/location information (see Figure 2). For example, a datasource can be created that accesses SAC files stored in the current directory.
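Template-driven file access can be sketched by filling a printf-style path template once for each day in a requested span. This Python sketch assumes the template's specifiers appear in the fixed order year, day of year, station, channel; the suite's datasource has its own way of naming template fields, which is not reproduced here.

```python
from datetime import date, timedelta
from typing import List


def expand_paths(template: str, station: str, channel: str,
                 first_day: date, last_day: date) -> List[str]:
    """Hypothetical sketch: fill a printf-style template once per day.

    Assumes the template's conversion specifiers appear in the order:
    year, day of year, station, channel.
    """
    paths = []
    day = first_day
    while day <= last_day:
        jday = day.timetuple().tm_yday  # day of year, 1-366
        paths.append(template % (day.year, jday, station, channel))
        day += timedelta(days=1)
    return paths


print(expand_paths("./%04d/%03d/%s.%s.sac", "SPBG", "BHZ",
                   date(2010, 2, 27), date(2010, 2, 28)))
# ['./2010/058/SPBG.BHZ.sac', './2010/059/SPBG.BHZ.sac']
```

A request that spans midnight or a year boundary simply yields more than one path, which is what allows data to be gathered across files.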
Essentially, the combination of the waveform, scnlobject, and datasource objects allows a user's homegrown data organization structure to be queried as a simple relational database. Datasource facilitates this by separating data requests from the explicit data storage structure. Though not required, this approach is in our opinion generally preferable to hardwiring code to specific file names.
Additionally, the datasource is able to return information that crosses file or database boundaries. Data retrieved from individual files or databases can be combined into a continuous object; for waveform objects, this is done automatically. By accessing files through their generalized formats instead of individually, issues such as the “11:59 p.m. earthquake problem,” in which an event recorded just before midnight is split between two daily files, can be avoided.
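Combining data that straddles a file boundary amounts to placing each file's samples at the correct offset in one output array. The Python sketch below assumes segments given as (start time in seconds, samples) pairs at a common sampling rate, and marks missing samples as NaN; the function name and these conventions are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, List[float]]  # (start time in seconds, samples)


def stitch(segments: List[Segment], freq: float, start: float, end: float) -> List[float]:
    """Hypothetical sketch: assemble [start, end) from per-file segments.

    Missing samples are left as NaN so gaps stay visible instead of
    silently shifting the data.
    """
    n = int(round((end - start) * freq))
    out = [float("nan")] * n
    for seg_start, samples in segments:
        offset = int(round((seg_start - start) * freq))
        for i, x in enumerate(samples):
            j = offset + i
            if 0 <= j < n:
                out[j] = x
    return out


# Two "day files": the first ends at midnight (t = 86400 s), the next begins there.
day1 = (86396.0, [1.0, 2.0, 3.0, 4.0])
day2 = (86400.0, [5.0, 6.0, 7.0, 8.0])
print(stitch([day1, day2], freq=1.0, start=86398.0, end=86402.0))
# [3.0, 4.0, 5.0, 6.0]
```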
While the datasource class works in concert with the waveform class to retrieve information, it is not dependent upon waveform and may be used to access data of any type. Many users choose to save commonly used datasource objects in a .mat data file or an .m script file where they can be loaded automatically, such as from the startup.m file.
Additional companion classes are included with the waveform suite to assist common tasks. The filterobject class provides a method of filtering waveform data with a Butterworth filter. Using this class, multiple waveforms can be filtered with minimal coding. The spectralobject class was designed to simplify the creation and display of spectrogram (or periodogram) data. Uispecgram is an included application that provides a graphical interface for creating custom spectrograms.
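The design-once, apply-many idea behind a filter object can be illustrated with a first-order Butterworth low-pass filter obtained from the analog prototype by the bilinear transform with frequency pre-warping. This Python `LowpassFilter` is a hypothetical sketch; it is not the suite's filterobject and is not intended to match that class's options.

```python
import math
from typing import List


class LowpassFilter:
    """Hypothetical filter object: a first-order Butterworth low-pass.

    Coefficients are designed once (bilinear transform, pre-warped so the
    -3 dB point falls at corner_hz) and then applied to any number of traces.
    """

    def __init__(self, corner_hz: float, sample_hz: float):
        if not 0 < corner_hz < sample_hz / 2:
            raise ValueError("corner must lie between 0 and the Nyquist frequency")
        k = math.tan(math.pi * corner_hz / sample_hz)
        self.b0 = self.b1 = k / (1 + k)
        self.a1 = (k - 1) / (k + 1)

    def apply(self, x: List[float]) -> List[float]:
        # Difference equation: y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1]
        y, x_prev, y_prev = [], 0.0, 0.0
        for xn in x:
            yn = self.b0 * xn + self.b1 * x_prev - self.a1 * y_prev
            y.append(yn)
            x_prev, y_prev = xn, yn
        return y

    def apply_all(self, traces: List[List[float]]) -> List[List[float]]:
        return [self.apply(t) for t in traces]


lp = LowpassFilter(corner_hz=1.0, sample_hz=100.0)
step, alt = lp.apply_all([[1.0] * 2000, [(-1.0) ** n for n in range(2000)]])
print(round(step[-1], 6), abs(alt[-1]) < 1e-9)  # 1.0 True
```

A constant input passes with unit gain, while a signal alternating at the Nyquist frequency is driven to zero, as expected of a low-pass design.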
Though importing data into MATLAB is not necessarily difficult, choices about how to store and work with the data have a tremendous influence on a project's success. A project underpinned by the waveform suite's object-oriented framework has access to tools able to retrieve data from a variety of sources, manipulate and display the same data in an intuitive manner, and develop robust applications that are easy to maintain. Each class's internal structure remains invisible to dependent applications, allowing the suite to change as it matures without breaking the programs that implement it. However, users should not expect the suite to grow into an all-encompassing, singularly defined seismic format. Several people have used the waveform suite as the foundation for their own processing packages, relying on this common, flexible framework to reduce development time and to share waveform data easily with other MATLAB users.
WHERE TO GET THE WAVEFORM SUITE
The waveform suite is available as a self-contained distribution from the MATLAB file exchange: http://www.mathworks.com/matlabcentral/fileexchange/23809. This distribution is regularly updated as new stable releases are produced.
The waveform suite codebase is maintained as an open source project together with several associated seismic tools in the GISMO toolbox at: http://code.google.com/p/gismotools/.
The waveform suite requires MATLAB. The current release of the waveform suite has been successfully tested against MATLAB 7.1 (R14SP3); however, MATLAB's object handling has changed considerably in recent years, so current development is occurring only in MATLAB R2009b and later. The waveform suite does not depend on any additional toolboxes from The MathWorks, Inc. Additional libraries are required to read data from Antelope or Winston databases.
Thanks to all those who have helped the waveform suite evolve to its current state. The authors would especially like to recognize J. Amundson (a great debugger and source of additional functionality), M. Thorne (whose SAC routines were thoroughly cannibalized), G. Thompson and S. DeAngelis (as testers and for SEISAN help), M. Robinson and L. Valcic (for network and computer support), and S. McNutt (one author's advisor, who let him develop these codes when, just perhaps, he should have been concentrating on interpreting the wiggles themselves). Also, the authors would like to thank K. Creager for his thoughtful review. This work was partially supported by the Alaska Volcano Observatory and the U.S. Geological Survey as part of their Volcano Hazard and Geothermal studies, and by additional funds from the State of Alaska.
Geophysical Institute, Alaska Volcano Observatory, University of Alaska Fairbanks, Fairbanks, Alaska