- Ground-motion model is (form based on non-parametric analyses and previous studies):
- Response parameter is pseudo-acceleration for 5% damping.
- V_{s,30} between 106 and 2100 m/s.
- Only include data from earthquakes with most reliable locations and magnitudes from a master database of about 157 000 records from the KiK-net network from October 1997 to December 2011. Exclude subduction events and those with focal depth > 35 km. Only include surface records with measured V_{s,30}. Only consider records within their individual conservative passband and those records with signal-to-noise ratios ≥ 3 within this passband. Exclude data from earthquakes with < 3 usable records.
- Derive model to propose via a data-driven approach a better site classification than one based on V_{s,30}.
- Data from 644 different sites.
- Data distribution dominated by distant (> 50 km) records and events with M_{w} < 5. Hence site terms capture linear site response.
- Use multi-step mixed-effects regression technique to estimate τ (inter-event), ϕ_{S2S} (inter-site) and ϕ_{0} (residual) variabilities. Firstly calibrate f_{R}, then use distance-corrected observations to find f_{M}. This is done to ensure coefficients are not biased by a few well-observed earthquakes or sites. Do not include site term in the original function.
- Do not include a term related to faulting mechanism because did not find significant dependency on mechanism within non-parametric analyses.
- Examine the intermediate residuals w.r.t. M_{w}, V_{s,30} and r_{jb}. Compute mean, 15th and 85th percentiles of residuals within 10 magnitude bins and 10 distance bins. Find no significant trends.
- Plot predicted and observed (from within small magnitude bins) ground motions for 0.02, 0.2 and 2 s w.r.t. distance. Plot observations colour-coded by distance and predictions w.r.t. magnitude. Find good match.
- Classify the 588 stations with δS2S available at all periods into 8 site clusters (number specified a priori) with distinct mean site amplification functions and within-cluster site-to-site variability about 50% smaller than overall ϕ_{S2S} using a spectral (k-means) clustering analysis (a type of unsupervised machine learning). Choose 8 as number of clusters based on consideration of total within sum of squares (WSS) and the gap statistic comparing the WSS change with that expected under an appropriate null reference distribution of the data. Examine average site amplifications within each cluster and find clear separation. Compare classification with previous classifications. Examine distribution of T_{G} (predominant period), V_{s,10}, V_{s,30} and H_{800} (depth to horizon with V_{s} = 800 m/s) within each class. Find that some combinations of these parameters can be used to classify stations into the 8 classes.
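
The clustering step in the last item can be illustrated with a minimal sketch. It makes several assumptions that are not from the study: the site-term curves are synthetic (three hypothetical amplification shapes over 20 hypothetical periods), plain Lloyd's k-means stands in for the spectral variant, and only the WSS criterion is computed (the gap statistic is omitted). The names `kmeans` and `best_of` are illustrative helpers.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids, wss)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct rows of X
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update each centroid to its cluster mean (keep old one if empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    wss = d2[np.arange(len(X)), labels].sum()  # total within sum of squares
    return labels, centroids, wss

def best_of(X, k, n_init=5):
    """Keep the lowest-WSS run over several random initialisations."""
    return min((kmeans(X, k, seed=s) for s in range(n_init)),
               key=lambda run: run[2])

# Synthetic stand-in for delta-S2S curves: rows are sites, columns are periods.
rng = np.random.default_rng(42)
periods = np.logspace(-2, 0.3, 20)            # hypothetical period grid (s)
shapes = [0.6 * np.exp(-np.log(periods / T0) ** 2)
          for T0 in (0.05, 0.3, 1.0)]         # hypothetical peak periods
X = np.vstack([s + 0.1 * rng.standard_normal((50, periods.size))
               for s in shapes])

# WSS as a function of the number of clusters (the "elbow" curve)
wss_by_k = {k: best_of(X, k)[2] for k in range(1, 7)}
labels, centroids, wss3 = best_of(X, 3)

# Overall versus within-cluster site-to-site standard deviation
n, p = X.shape
phi_overall = np.sqrt(wss_by_k[1] / (n * p))
phi_within = np.sqrt(wss3 / (n * p))

for k, w in wss_by_k.items():
    print(f"k={k}: WSS={w:.2f}")
print(f"phi overall={phi_overall:.3f}, within-cluster (k=3)={phi_within:.3f}")
```

With k = 1 the WSS equals the total sum of squares about the mean curve, so the ratio of `phi_within` to `phi_overall` gives a rough analogue of the reduction in site-to-site variability quoted above. A gap-statistic check would additionally compare each WSS value with that obtained from reference data drawn under a null distribution.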