4.1. clustering

Functions to cluster seismograms by a range of constraints.

Copyright 2015 Calum Chamberlain

This file is part of EQcorrscan.

EQcorrscan is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

EQcorrscan is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with EQcorrscan. If not, see <http://www.gnu.org/licenses/>.

clustering.SVD(stream_list)[source]

Function to compute the SVD of a number of templates and return the singular vectors and singular values of the templates.

Parameters:stream_list (List of Obspy.Stream) – List of the templates to be analysed
Returns:SVector(list of ndarray), SValues(list) for each channel, Uvalues(list of ndarray) for each channel, stachans, List of String (station.channel)

Note

We recommend that you align the data before computing the SVD, e.g., the P-arrival on all templates for the same channel should appear at the same time in the trace.

clustering.SVD_2_stream(SVectors, stachans, k, sampling_rate)[source]

Function to convert the singular vectors output by SVD to streams, one for each singular vector level, for all channels.

Parameters:
  • SVectors (List of np.ndarray) – Singular vectors
  • stachans (List of Strings) – List of station.channel Strings
  • k (int) – Number of streams to return = number of SV’s to include
  • sampling_rate (float) – Sampling rate in Hz
Returns:

SVstreams, List of Obspy.Stream, with SVStreams[0] being composed of the highest rank singular vectors.

clustering.cluster(stream_list, show=True, corr_thresh=0.3, save_corrmat=False, cores=u'all', debug=1)[source]

Function to take a set of templates and cluster them, will return groups as lists of streams. Clustering is done by computing the cross-channel correlation sum of each stream in stream_list with every other stream in the list. Scipy.cluster.hierachy functions are then used to compute the complete distance matrix, where distance is 1 minus the normalised cross-correlation sum such that larger distances are less similar events. Groups are then created by clustering the distance matrix at distances less than 1 - corr_thresh.

Will compute the distance matrix in parallel, using all available cores

Parameters:
  • stream_list (List of Obspy.Stream) – List of templates to compute clustering for
  • show (bool) – plot linkage on screen if True, defaults to True
  • corr_thresh (float) – Cross-channel correlation threshold for grouping
  • save_corrmat (bool) – If True will save the distance matrix to dist_mat.npy
  • cores (int) – numebr of cores to use when computing the distance matrix, defaults to ‘all’ which will work out how many cpus are available and hog them.
  • debug (int) – Level of debugging from 1-5, higher is more output, currently only level 1 implimented.
Returns:

List of groups with each group a list of streams making up that group.

clustering.corr_cluster(trace_list, thresh=0.9)[source]

Group traces based on correlations above threshold with the stack - will run twice, once with a lower threshold, then again with your threshold to remove large outliers.

Parameters:
  • trace_list (list of obspy.Trace) – Traces to compute similarity between
  • thresh (float) – Correlation threshold between -1-1
Returns:

np.ndarray of bool

clustering.cross_chan_coherence(st1, st2, i=0)[source]

Function to determine the cross-channel coherancy between two streams of multichannel seismic data.

Parameters:
  • st1 (obspy Stream) – Stream one
  • st2 (obspy Stream) – Stream two
  • i (int) – index used for parallel async processing, returned unaltered
Returns:

cross channel coherence, float - normalized by number of channels, if i, returns tuple of (cccoh, i) where i is int, as intput.

clustering.distance_matrix(stream_list, cores=1)[source]

Function to compute the distance matrix for all templates - will give distance as 1-abs(cccoh), e.g. a well correlated pair of templates will have small distances, and an equally well correlated reverse image will have the same distance as apositively correlated image - this is an issue.

Parameters:
  • stream_list (List of obspy.Streams) – List of the streams to compute the distance matrix for
  • cores (int) – Number of cores to parallel process using, defaults to 1.
Returns:

ndarray - distance matrix

clustering.empirical_SVD(stream_list, linear=True)[source]

Empirical subspace detector generation function. Takes a list of templates and computes the stack as the first order subspace detector, and the differential of this as the second order subspace detector following the emprical subspace method of Barrett & Beroza, 2014 - SRL.

Parameters:
  • stream_list (list of stream) – list of template streams to compute the subspace detectors from
  • linear (bool) – Set to true by default to compute the linear stack as the first subspace vector, False will use the phase-weighted stack as the first subspace vector.
Returns:

list of two streams

clustering.extract_detections(detections, templates, contbase_list, extract_len=90.0, outdir=None, extract_Z=True, additional_stations=[])[source]

Function to extract the waveforms associated with each detection in a list of detections for the template, template. Waveforms will be returned as a list of obspy.Streams containing segments of extract_len. They will also be saved if outdir is set. The default is unset. The default extract_len is 90 seconds per channel.

Parameters:
  • detections (list tuple of of :class: datetime.datetime, string) – List of datetime objects, and their associated template name.
  • templates (list of tuple of string and :class: obspy.Stream) – A list of the tuples of the template name and the template Stream used to detect detections.
  • contbase_list (list of tuple of string) – List of tuples of the form [‘path’, ‘type’, ‘network’] Where path is the path to the continuous database, type is the directory structure, which can be either Yyyyy/Rjjj.01, which is the standard IRIS Year, julian day structure, or, yyyymmdd which is a single directory for every day.
  • extract_len (float) – Length to extract around the detection (will be equally cut around the detection time) in seconds. Default is 90.0.
  • outdir (bool or str) – Default is None, with None set, no files will be saved, if set each detection will be saved into this directory with files named according to the detection time, NOT than the waveform start time. Detections will be saved into template subdirectories.
  • extract_Z (bool) – Set to True to also extract Z channels for detections delays will be the same as horizontal channels, only applies if only horizontal channels were used in the template.
  • additional_stations (list of tuple) – List of stations, chanels and networks to also extract data for using an average delay.
Returns:

list of :class: obspy.Stream

clustering.group_delays(stream_list)[source]

Function to group template waveforms according to their delays.

Parameters:stream_list (list of obspy.Stream) – List of the waveforms you want to group
Returns:list of List of obspy.Streams where each initial list is a group with the same delays.
clustering.re_thresh_csv(path, old_thresh, new_thresh, chan_thresh)[source]

Function to remove detections by changing the threshold, can only be done to remove detection by increasing threshold, threshold lowering will have no effect.

Parameters:
  • path (str) – Path to the .csv detection file
  • old_thresh (float) – Old threshold MAD multiplier
  • new_thresh (float) – New threhsold MAD multiplier
  • chan_thresh (int) – Minimum number of channels for a detection

returns: List of detections

clustering.space_time_cluster(detections, t_thresh, d_thresh)[source]

Function to cluster detections in space and time, use to seperate repeaters from other events.

Parameters:
  • detections (list) – List of tuple of tuple of location (lat, lon, depth (km)), and time as a datetime object
  • t_thresh (float) – Maximum inter-event time threshold in seconds
  • d_thresh (float) – Maximum inter-event distance in km
Returns:

List of tuple (detections, clustered) and list of indeces of clustered detections