- © 2014 by the Seismological Society of America
The usage of seismic ambient noise has recently proved its efficiency in different contexts, from imaging to monitoring. The impulse response (or Green’s function [GF]) between two sensors can be reconstructed from the correlation of seismic noise recorded (Campillo and Paul, 2003). This method has provided excellent results in imaging the Earth’s interior, from global to regional or local scales. More recently, the method was extended to study time‐dependent variations in those GF. A change in the delay times might originate from a change in the medium velocity or from a dramatic change in the position of the source or of one/many scatterers. Several studies using seismic ambient noise have shown that small perturbations within a volcanic edifice can be detected as changes in seismic‐wave properties (Sens‐Schönfelder and Wegler, 2006; Brenguier, Shapiro, et al., 2008; Duputel et al., 2009; Mordret et al., 2010; Brenguier et al., 2011; Anggono et al., 2012). Contrary to the use of active sources or earthquake coda waves, the technique takes advantage of the continuous sampling of the medium using around‐the‐clock records from seismic stations. The method has proven its ability to evidence temporal physical changes in fault zones (Wegler and Sens‐Schönfelder, 2007; Brenguier, Campillo, et al., 2008), the lunar environment (Sens‐Schönfelder and Wegler, 2011), or to detect instrumental errors (Stehly et al., 2007; Sens‐Schönfelder, 2008).
Some codes have already been presented to compute cross correlations of seismic noise, for example, within Seismic Analysis Code (SAC) (Goldstein et al., 2003) or within Computer Programs in Seismology (CPS) 3.3 (Herrmann, 2002). To our knowledge, no integrated solution has been published to go from the raw waveforms to the travel‐time variations, automatically detecting changes in the data archive and computing only what is necessary on a scheduled basis (hourly, daily) and which is also usable as a research tool to process archives. Automating the detection is important, as, for example, data streams coming from real‐time telemetered stations contain gaps that could be filled up later on, and provide important data to be analyzed. Such a tool must be able to work with any common seismic format, be reasonably fast, optimized whenever possible. It should interact with a data archive and a database with high‐level helper functions in order to be pluggable and extensible. Finally, it must produce exportable data, either in waveform format, tabular text files, or high‐quality figures. This is the purpose of Monitoring using Seismic Noise (i.e., MSNoise).
We do not want to provide another black box to the users so MSNoise is an integrated solution, its code is open, commented, and documented. The processing workflow is separated in steps and one can easily replace one step by another code, providing that the inputs and outputs are respected.
In this paper, the software steps are presented both from the IT and scientific/methodological points of view. Example graphical outputs are shown and described. We finally validate MSNoise using archive data from the Piton de la Fournaise volcano (La Réunion, France), where precursory seismic velocity changes have been observed prior to two eruptions in October and December 2010, during the UnderVolc project (Brenguier et al., 2012). UnderVolc is an Agence Nationale de la Recherche (ANR) project and stands for “UNDERstanding VOLCanic Processes: Towards Eruption Prediction and Risk Mitigation, an application to Piton de La Fournaise volcano, La Réunion (2009–2013).”
MSNoise is written in Python to be fully cross platform. It is open source and free, for noncommercial educational and research usage. MSNoise is licensed under the European Union Public License (EUPL v1.1). Commercial services related to MSNoise, for example, providing consultancy or support for its implementation or maintenance, are prohibited without signed agreement with the original authors. MSNoise is available on http://www.msnoise.org/ (last accessed March 2014).
The workflow of MSNoise is pretty straightforward (Fig. 1) and is composed of simple steps that are described extensively on the companion website.
Installation and Configuration
MSNoise requires Python and only a few extra packages for running on any operating systems (OS). The waveform archive must be either a known format, for example SeisComP Data Structure (SDS) or Buffer of Uniform Data structure (BUD), or one has to define its own. MSNoise uses a database for storing configuration bits, waveform metadata, and jobs. This database can be either MySQL or sqlite, but we recommend using MySQL for performance reasons. The package comes with an installer script that will initialize the connection to the database and store the user name and password locally to allow the Configurator and all processes to interact with the database. This Configurator GUI (built using Enthought Tools Suite [Enthought, 2008]) shows three different panels: the general MSNoise configuration, the stations configuration, and the filters configuration. All configuration elements are described in the online documentation.
Automatic Job Definition
To automatically run every night on the data acquired during the previous day, MSNoise needs to check the data archive for new or modified files. Those files could have been acquired during the last day, but could also be the data from a previously offline station and contain useful information for, say, a month ago. The time to search for is defined in the config. MSNoise uses the find command (gnufind on Windows) with the ‐mtime argument to locate new or modified files. Once located and read (using obspy [Beyreuther et al., 2010; Megies et al., 2011]) they are inserted (if new) or updated (if modified) in the data availability table. This table can then be turned into a data availability plot (Fig. 2). Upon first run, the script must be called with the init argument in order to avoid using the mtime argument for the find command and to insert all discovered files in the data availability table.
Outputs, Extensibility, and Plugability
The original goal of MSNoise is to provide δv/v plots over time. These data can be represented as several plots, one per station pair, one average for all station pairs, etc. MSNoise comes with several example plots that one can duplicate or hack to meet its needs. MSNoise uses pandas for data analysis (McKinney, 2012) as it provides great helper functions to analyse time series data (rolling average, resampling, detrending time series containing NaNs…) and matplotlib (Hunter, 2007) for static graphical outputs.
MSNoise has been designed with plugability and extensibility in mind. All the communications with the database, with the data archive or with the cross‐correlation files go through helper functions. None of the processing steps makes direct connections to the database or to the data files. Extension makers are advised to look at the documentation of the functions to build their extensions. A good starting point is to have a look at the example output plots provided with the package. We want to emphasize that although we currently prove MSNoise is working as expected (see the Validation section below), one might be interested in modifying or using other computation routines. This can be easily done by replacing steps in the workflow.
Our approach for continuously measuring seismic velocity changes of the Earth’s Interiors relies on three steps. (1) The first step is computing cross‐correlation functions (CCFs) of ambient seismic noise time series at different dates for individual pairs of seismic sensors; (2) measuring travel‐time delays of different arrivals (direct or coda waves) between these individual CCFs and a defined reference CCF, and (3) averaging these travel‐time delays for different correlation lag times over different sensor pairs and interpreting these travel‐time delays using a simple model of uniform relative velocity change within the studied area (δv/v=constant).
Computation of Cross‐Correlation Functions
For each cross‐correlation job, the waveforms are preprocessed and then analyzed. Jobs are independent, so several computations can be run in parallel, usually depending on ones’ CPU‐RAM‐Disk speed triplet.
For each station, all files that potentially contain data for the day are opened. The traces are then merged and split, to obtain the most continuous chunks possible. The different chunks are then demeaned, tapered, and merged again to a one day long trace. If shorter than one day, the trace is padded with zeroes. If longer, it is cut to match the start/end of the day. Subsequently, each one day long trace is low passed, high passed then decimated, or downsampled. Decimation is faster, but only allows decimation by integer factors, whereas the downsampling supports any factor which allows the usage of heterogeneous station configurations. Decimation/Downsampling are configurable, and users are advised to test both. The resampling method used in MSNoise comes from the audio world and provides excellent quality results (de Castro Lopo, 2013). Low‐pass and down‐sampling values must be defined together properly.
Once all waveforms are loaded in memory, the computation is done by iteration on the station pairs, on the different components to compute and then on the different filters for each defined window (30 minutes slices by default, configurable).(2)
The radial (R) and transverse (T) components are calculated from the rotation of the east (E) and north (N) components. The rotation angle is defined as the azimuth (Az) between the two sensors.
In order to attenuate the parasitic signals from seismic events (being local, regional, or global), we apply a Windsorizing (Tukey, 1962) of three times RMS to the traces as a first‐processing step. A second step consists of whitening the amplitude of the signal between two configured frequencies.
The cross correlation is conducted in the frequency domain. Let x(t) and y(t) be two time series and X(f) and Y(f) their Fourier transforms. The correlation operation is then defined as (3)with X*(f) being the complex conjugate of X(f). If x(t)=y(t), then the operation is called auto correlation. The CCF c(t) is the inverse Fourier transform of C(f). If configured (not by default), the c(t) for each 30 minutes slice can be stored to an output directory on the computer. Again, if configured (default), the day stack of the non‐zero, non‐infinite, and non‐NaN 48×30 minutes c(t) are stored.
Stacking Strategy and Export
MSNoise is capable of using a reference function (REF) defined by absolute or relative dates span. For example, an absolute range could be from 1 January 2010 to 31 December 2011 and a relative range could be the last 200 days. To define this reference, we propose to plot all CCF already computed and check for their consistency. Once a stable enough period has been identified, it can be defined as the reference function (REF) for further analysis.
The correlation coefficient of each daily CCF with a given REF can be evaluated and is already a good indicator that something happened under, around or to ones’ sensors (Fig. 3). Signals in the negative and positive time lags can sometimes be different (Mordret et al., 2010), as shown on Figure 3. A change in the velocity should bring smooth changes in the correlation, whereas changes in the origin of the seismic noise or in the position of some of the scatterers should dramatically change the correlation (Wegler and Sens‐Schönfelder, 2007).
Computing velocity variations by comparing daily CCF with the REF can bring strong variations because of noisy CCF. Cross correlating an increasing number of days stacked together with the REF helps deciding a reasonable number of days for which the correlation coefficient reaches 0.96. A drawback of taking a too long moving‐window stack is an immediate decrease in resolution. Liu et al. (2010) have demonstrated that the choice of the window size might highlight or hide features in the data, so we made MSNoise capable of exporting multiple moving‐window stacks, for example, 2, 5, 10, and 30 days stacks. Once the user has defined the reference and a (some) reasonable moving average(s), both need to be exported before computing the relative travel‐time variations. Similarly, the definition of REF requires attention. If one takes the whole archive as reference, but it actually contains strong‐velocity variations, the quality of the reference will be greatly diminished (Duputel et al., 2009; Mordret et al., 2010). The investigation of stable periods compared with the total data length is highly recommended.
Only data for new/modified dates need to be exported. If any cross‐correlation (CC) job has been marked “Done” within the last day, the stacks will be calculated, and a new travel‐time variation (DTT) job will be inserted in the database. Note, the subsequent δt/t calculations will only be done for dates for which new/modified CC is available, or for which a new REF has been determined. If one defines a moving‐window REF, then the REF stack will need to be calculated every time, and the travel‐time variations will need to be completely recalculated too. With a fixed REF, only the δt/t for the previous day should be calculated, which makes the whole process faster.
As described in Hadziioannou et al. (2009), the accuracy of seismic velocity change measurements relies on the reconstruction of stable in time CCF. As a first step, one must thus ensure that sufficiently long noise time series have been correlated in order to reach, for every sensor pair, a coherence between individual and the reference CCFs of a certain threshold. An obvious trade‐off is the longer the time series, the higher the coherence, but the lower the time resolution.
Relative Travel‐Time Variations
Time Delay Measurements
Two techniques can be used to estimate the time delays: the moving‐window cross spectrum analysis (MWCS; first introduced by Ratdomopurbo and Poupinet, 1995) and passive interferometry (Sens‐Schönfelder and Wegler, 2006). Although two studies highlighted the stability of the second technique over the first (Duputel et al., 2009; Hadziioannou et al., 2009), the MWCS technique has the advantage of operating in the frequency domain, in which the bandwidth of coherent signal in the correlation function can be clearly defined (Clarke et al., 2011). Furthermore, the recent development of the technique by Clarke et al. (2011) has led to some improvement of MWCS technique, particularly the estimation of statistical parameters which allows assessment of the variability above which perturbations can be reliably observed. In MSNoise, time delays are computed following the methodology of Clarke et al. (2011), rewritten in Python for consistency. In the Validation section of this paper, we compare the results obtained by MSNoise with the ones obtained using Daniel Clarke’s measured FORTRAN program (personal comm., 2011).
The current CCF is compared with the reference. Both time series are sliced in several overlapping windows. Each slice is mean adjusted and cosine tapered (1% taper at both ends) before Fourier transformation to the frequency domain. Fcur(ν) and Fref(ν) are the first halves of the Hermitian symmetric Fourier‐transformed segments that were padded with zeroes to a length equal to the next power of 2 of their length. The cross spectrum X(ν) is defined as (4)in which asterisk denotes the complex conjugation. X(ν) is then smoothed by convolution with a Hanning window. The similarity of the two time series is assessed using the cross coherence between energy densities in the frequency domain: (5)in which the overline here represents the smoothing of the energy spectra for Fref and Fcur and of the spectrum of X. The mean coherence for the segment is defined as the mean of C(ν) in the frequency range of interest. The time delay between the two cross correlations is found in the unwrapped phase, , of the cross spectrum and is linearly proportional to frequency: (6)
The time shift for each window between two signals is the slope m of a weighted linear regression of the samples within the frequency band of interest. The weights are those introduced by Clarke et al. (2011), which incorporate both the cross‐spectral amplitude and cross coherence, unlike Poupinet et al. (1984). The errors are estimated using the weights (thus the coherence) and the squared misfit to the modeled slope: (7)in which w are weights, ν are cross coherences, and is the squared misfit of the data to the modeled slope and is calculated as (8)
The output of this process is a table containing, for each moving window: the central time lag, the measured delay, its error, and the mean coherence of the segment. Tables are saved as text files (comma‐separated values [csv]) to the disk. For any given day, MSNoise computes M sets (1) of delay time versus lag time and saves the result in text files.
Velocity Variations Computation
If one assumes a relative velocity variation δv/v homogeneous in space and a relative time shift δt/t between the reference and the current CF, it has been demonstrated (Ratdomopurbo and Poupinet, 1995) that (9)
In the following sections, we will mainly speak about δt/t because this is the raw information stored by MSNoise, and one can easily convert it to δv/v when needed.
The MWCS delay‐time tables saved at the previous step can then be visualized as a delay matrix for each day (Fig. 4), evidencing the slight delay differences between pairs as a function of the time lag. MSNoise is able to compute δt/t for each station pair, but also for a filtered weighted average of the pairs. An extra line is added at the bottom of the matrix once all station pairs are loaded. This line, called the “ALL” line within MSNoise, will contain a filtered column weighted average of the delay times. For each column (lag time), delay times must satisfy given rules to be included in the weighted mean calculation. Example rules are presented for the UnderVolc data in the Validation section.
The selection of the minimum and maximum lag times for the time shift computations is of much importance. As scattered waves traveling along longer paths accumulate larger time delays, most of the studies performed in volcanic environment were using the late arrivals of the CCF (e.g., Duputel et al., 2009, 5–20 s or Mordret et al., 2010, 10–30 s). An exception arises from the recent study of Anggono et al. (2012), in which ±10 to ±1 s of lag times are used. Because of poor waveform similarity in the coda part (>10 s), they neither used the MWCS nor the stretching method, but compared the maximum of the daily CCF with the one from a reference CCF. We advise the user to perform preliminary tests in different windows and to avoid working around zero time lag. Moreover, the CCF is merely symmetric on volcanoes especially at short lag times (i.e., the ballistic waves described by Duputel et al., 2009). This might evidence an inhomogeneous distribution of sources, a preferred location for the scatterers which act as secondary sources (Paul, 2005; Stehly et al., 2006) or reflect clock problems (Sens‐Schönfelder, 2008). Mordret et al. (2010) averaged the causal and anticausal parts of the CCFs to achieve a better coherency and stability of the reconstructed wavefield. However, one can also decide to solely work in the causal or acausal part of the retrieved CCFs. Ambient noise sources change in time (change in spectrum, in location and amplitude), and thus seismic velocity changes may be biased by these changes. These changes in the noise source properties mostly affect the direct reconstructed waves. In order to estimate this bias, one may thus, in a preliminary step, measure seismic velocity changes taking into account only measured travel‐time shifts of the direct or early arrivals of the CCFs.
A weighted least‐square regression (WLS) is subsequently computed for each M+1 line of the matrix, through the satisfying points (in black in the data selection matrix on Fig. 4). The WLS is computed two times, first by allowing for a constant (equation 10a) and second by forcing the regression to cross the origin ([0,0]) (equation 10b). Errors on each parameter are computed, ea and em for a and m, respectively, in equation (10a)) and em0 for m0 in equation (10b). The presence of a significantly non‐null constant a can reflect a possible instrument drift. The results, for a single day, can be plotted as a scatter plot of δt/t versus pair number (Fig. 4, right two plots) or as a matrix (Fig. 5). The former also shows the slopes determined for the “ALL” line (red vertical line) and the weighted average value of the slopes determined for all pairs (green line). In this example, results are highly similar, but, even visually, some station pairs seem to behave differently. The matrix (Fig. 5) helps in identifying station groups, which could be analyzed together. As an example in the case of the UnderVolc project, one could define two groups: “crater stations” and “large interdistance stations”, for example, (10a)and (10b)in which the subscript 0 is added to stress the [0,0] forcing.
The six values (the slopes m and m0, the constant a, and their uncertainties) are saved to the disk, for each pair and for the “ALL” line, in order to be easily readable by plotting or exporting routines.
To validate our processing work flow, we tested it against the same data set that was used to successfully identify precursory velocity changes under the Piton de la Fournaise volcano for eruptions in October and December 2010 (Brenguier et al., 2012) during the UnderVolc project.
Data and Parameters
The data set consists of 21 broadband three‐component seismic stations placed on the Piton de la Fournaise volcano at various interdistances ranging from <1 to >10 km (Fig. 6). Most of the stations were installed by the beginning of November 2009 and were removed at the end of June 2011 (Fig. 2). In fact, most of the station locations were taken over by Observatoire Volcanologique du Piton de la Fournaise (OVPF) network, using new station codes. A total of 11,393 Z‐component files have been processed in this test. For computing the CCF, the 100 Hz waveforms have been bandpassed between 0.01 and 8.0 Hz, then decimated to 20 Hz (factor 5). Only ZZ components have been calculated for different stations (no auto correlation). Each day was cut in 30 minute slices, and the resulting CCFs were saved to disk, together with the daily stack. Stacks have been calculated for 2, 5, 10, and 30 moving‐window days. The MWCS frequency band is 0.2–0.85 Hz and is computed for 10 second sliding windows overlapping by 50%. For δt/t calculations, both causal and acausal parts are used, lag times between ±5 and ±50 s are selected. Remaining selection parameters are 0.5 minimum coherence, 0.1 s maximum error, and 0.5 s maximum dt. Example timing on a 4 vCPU virtual machine and data volumes are given in Tables 1 and 2.
The δv/v variations of the volcano during the UnderVolc project as obtained by applying the same methodology on the delay times calculated with the FORTRAN code of Clarke et al. (2011) and MSNoise show very similar results (Fig. 7). Slight differences below 0.02% are noted, whereas the signal of the eruptions is stronger than 0.05%. A general trend is present and denotes an apparent slight change in the velocity of 0.2% per year. This long‐term component (LTV) sensu Brenguier, Shapiro, et al. (2008), can be removed to highlight shorter time‐scale variations. Analyzing both codes in depth, we assume that the little differences might come from differences either in the way the cosine‐smoothing function is constructed, or in the phase calculation code. We choose to use well‐tested numerical routines included in the numpy, scipy (Jones et al., 2001; Oliphant, 2006), and statsmodels (Seabold and Perktold, 2010) packages instead of rewriting Clarke’s FORTRAN functions.
Eruptions of November 2009, January 2010, October 2011, and December 2011 are preceded by a decrease of the δv/v (going up on Fig. 8). There seem to be no precursory signs of October 2009 eruption. We evidence the November 2009, October 2010, and December 2010 eruptions with a precursory time of about 15 days. For the last two eruptions, an increase of the δv/v is noticed during the day before the actual eruption. This could be explained by a velocity increase at depth, by the stronger seismic activity during the hours preceding the eruption or by the presence of eruptive tremor in the minutes before. The last two are provoking a change in the noise source provenance and are disturbing the system. It is worth noting the decorrelation of the moving‐stack function relative to the REF, five days before the eruption (Fig. 3). As Liu et al. (2010) have shown, a change in the stacking duration might highlight or erase signal in the evolution of the δv/v curve. We show on Figure 3 the different decorrelation patterns when 2, 5, 10, or 30 days stacks are compared with the same REF. For example, precursory changes of δv/v before the November 2009 eruption are visible for stack durations up to 10 days, but are completely erased when using the 30 day stack (Fig. 8). Short period (of less or equal to a few days) variations are visible on the 1, 2, and 5 day stacks (Fig. 8) and could have different origins, for example, linked with “local” causes like changes in the barometric pressure, rainfall, or seismic activity, or more “regional” ones like changes in the noise source positions. Although possible, the short period of a few days makes the “regional” causes less likely than the “local” ones.
CONCLUSIONS AND PERSPECTIVES
We present the first complete software package for computing and monitoring relative velocity variations using ambient seismic noise. MSNoise is a fully integrated solution that automatically scans data archives and determines which jobs need to be done whenever the scheduled task is executed. The whole package is written in Python, open source and freely available on http://www.msnoise.org (last accessed March 2014). The processing steps have been tested against codes used to successfully identify precursory velocity variations (Clarke et al., 2011) under the Piton de La Fournaise volcano (La Réunion Island, France) during the UnderVolc project (Brenguier et al., 2012). Using the same data set, results of both processing codes are highly similar. MSNoise is pluggable and extensible and is de facto a useful tool for researchers as they can only focus on implementing their processing code, while benefiting from its robust framework.
Our approach relies on the assumption that every wavepacket of the coda part of the reconstructed CCFs has experienced a uniform relative velocity change (δv/v=constant) along its travel path. This is a strong assumption that may not be valid in all cases. The consequence is that the measured relative seismic velocity change may strongly differ from “real” δv/v’s within the studied medium. A main limitation arises from the fact that the coda wavepackets may travel areas of nonuniform velocity changes. In that case, the measured δv/v is an average of the real δv/v, and one may, for example, misinterpret a change of δv as an increase of δv/v whereas it could simply be associated with a spatial extent of a constant value δv/v.
Finally, although we present the validation on MSNoise in a volcanic environment, its application should not be restricted to this specific case, and we are looking forward to seeing applications applied to different contexts and to receiving comments and contributions from other research groups. Contributions could be, for example, a new correlation algorithm, a new δt/t calculation technique, the implementation of attenuation computation (like in Prieto et al., 2011), the usage of the delay matrices to locate variations of δv/v, or decorrelation at different time lags (like in Larose et al., 2010; Obermann et al., 2013). Other contributions could include the analysis of much higher‐frequency data (higher than 1 kHz such as in Olivier et al., 2012), or the usage of the CCF as input to an frequency‐time analysis (FTAN) analysis (as described in Levshin et al., 1989) for computing dispersion and subsequent inversion of 1D surface‐wave velocity profiles. We are planning to implement the possibility to define multiple references in the same way as the multiple stacks and the possibility to define several “station groupings” similar to the “ALL” (see Velocity Variations Computation section).
Corentin Caudron’s Ph.D. work is supported by the Belspo (Action 2 Grant WI/33/J02). Thomas Lecocq’s 1 month stay at the Observatoire Volcanologique du Piton de la Fournaise (OVPF) has been partly supported by the Institut du Physique du Globe (IPGP). The data used for the analysis were collected by the IPGP/OVPF, and the Institut des Sciences de la Terre (ISTerre) within the framework of the ANR_08_RISK_011/UnderVolc project. The sensors are property of the French transportable seismic network, Sismob (INSU‐CNRS). Thierry Camelbeeck (Royal Observatory of Belgium [ROB]) is acknowledged for his continuous support of this project and Daniel Clarke for providing his moving‐window cross spectrum (MWCS) codes in order to compare and validate our own. OVPF staff is thanked for their help throughout the UnderVolc project and for their warm welcome during Thomas’ visit in 2011. Josef Perktold, of statsmodels, is acknowledged for his kind help on the usage of the weighted least‐square regression. Christoph Gohlke from the Laboratory for Fluorescence Dynamics, University of California (Irvine) is acknowledged for providing an extensive list of prebuilt Python packages for Windows on his website (http://www.lfd.uci.edu/~gohlke/pythonlibs/; last accessed March 2014). Finally thanks to all the alpha/beta users of MSNoise for providing inestimable feedback!
The editor‐in‐chief, associate editor, and Reviewer Weston Thelen are acknowledged for their constructive comments.