- Ground-motion model is a nonphysical (subsymbolic) polynomial function of predictor variables (Mw,
rjb, Vs,30, fault mechanism and depth to top of rupture) with 48 coefficients (not reported): 14 for Mw, 5 for
rjb, 4 for Vs,30, 6 for rupture depth, 15 for combination of Mw and rjb, intercept parameter, pseudo-depth
and 2 for mechanism. Use polynomials because they are simple, flexible and easy to understand.
- Characterize sites using Vs,30.
- Use three faulting mechanisms:
  - Rake angle between 30 and 150°. 19 earthquakes and 1870 records.
  - Rake angle between -150 and -30°. 11 earthquakes and 49 records.
  - Other rake angle. 30 earthquakes and 741 records.
- Use data from NGA project because it is the best dataset currently available. Note that a significant
proportion of the metadata are missing. Discuss the problems of missing metadata. Assume that metadata are
missing at random, which means that it is possible to perform unbiased statistical inference. To overcome
missing metadata, select only records for which all metadata exist, noting that this is strictly valid only
when metadata are missing completely at random.
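The record selection described above (keep a record only if all of its metadata exist) amounts to a complete-case filter. A minimal sketch follows, assuming records are held as dictionaries with `None` marking a missing value; the field names (`mw`, `rjb`, `vs30`, `rake`, `ztor`) and the sample values are hypothetical and are not taken from the NGA database.

```python
# Hypothetical metadata fields needed by the model (names are assumptions).
REQUIRED = ("mw", "rjb", "vs30", "rake", "ztor")

def complete_cases(records, required=REQUIRED):
    """Keep only records in which every required metadata field is present.

    A field counts as missing if the key is absent or its value is None.
    A value of 0.0 is valid (e.g. a surface rupture), so truthiness is not used.
    """
    return [r for r in records if all(r.get(f) is not None for f in required)]

# Made-up illustrative records, not real data.
records = [
    {"mw": 6.5, "rjb": 12.0, "vs30": 450.0, "rake": 90.0, "ztor": 3.0},
    {"mw": 7.1, "rjb": 80.0, "vs30": None, "rake": 0.0, "ztor": 0.0},  # vs30 missing
    {"mw": 5.9, "rjb": 30.0, "vs30": 300.0, "rake": -90.0},            # ztor missing
]
kept = complete_cases(records)
```

Such a filter discards information and, as noted above, leaves inference unbiased only if the metadata are missing completely at random.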
- Select only records that are representative of free-field conditions based on Geomatrix classification C1.
- Exclude some data from Chi-Chi sequence due to poor quality or co-located instruments.
- Exclude data from rjb > 200 km because of low engineering significance and to reduce correlation between
magnitude and distance. Also note that this reduces possible bias due to different attenuation in different
regions.
- In original selection there was one record with Mw 5.2 and the next was at Mw 5.61. The record with
Mw 5.2 had a dominant role for small magnitudes so it was removed.
- Discuss the problem of over-fitting (modelling more spurious details of sample than are supported by data
generating process) and propose the use of generalization error (estimated using cross validation), which
directly estimates the average prediction error for data not used to develop model, to counteract it. Judge
quality of model primarily in terms of predictive power. Conclude that approach is viable for large datasets.
- State that objective is not to develop a fully-fledged alternative NGA model but to present an extension
to traditional modelling strategies, based on intelligent data analysis from the fields of machine learning
and artificial intelligence.
- For k-fold cross validation, split data into k roughly equal-sized subsets. Fit model to k - 1 subsets and
compute prediction error for unused subset. Repeat for all k subsets. Combine k prediction error estimates
to obtain estimate of generalization error. Use k = 10, which is often used for this approach.
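The k-fold procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a straight-line least-squares fit stands in for the ground-motion model, the data are synthetic, prediction error is measured as mean squared error, and the k fold errors are combined by a simple average (the source does not state how they are combined).

```python
import random

def k_fold_splits(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal-sized folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for the real model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def generalization_error(xs, ys, k=10, seed=0):
    """Average, over the k folds, of the mean squared error on the held-out fold."""
    fold_errors = []
    for held_out in k_fold_splits(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on the k - 1 remaining folds only.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Score on the fold that was not used for fitting.
        mse = sum((ys[i] - (a + b * xs[i])) ** 2 for i in held_out) / len(held_out)
        fold_errors.append(mse)
    return sum(fold_errors) / k

# Synthetic data: a linear trend plus Gaussian noise with standard deviation 0.3.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [1.0 + 0.5 * x + rng.gauss(0, 0.3) for x in xs]
cv_mse = generalization_error(xs, ys, k=10)
```

Because every record is held out exactly once, `cv_mse` estimates the error on data not used in fitting, which is the quantity the generalization error is meant to capture.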
- Use rjb because some trials with simple functional form show that it gives a smaller generalization error
than, e.g., rrup.
- Start with simple functional form and add new terms and retain those that lead to a reduction in
generalization error.
- Note that some coefficients are not statistically significant at the 5% level, but note that 5% is an
arbitrary level and that retaining them results in lower generalization error.
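The strategy of growing the functional form term by term, judged by cross-validated error rather than by significance tests, can be sketched as a greedy forward selection. The sketch below is an assumption-laden illustration: the candidate terms, the synthetic data (a centred magnitude `m` and a log-distance `lr`), the best-first ordering and the stopping rule are all choices made here, not details reported in the source.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def fit(X, y):
    """Least squares via the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)

def cv_error(terms, data, y, k=10):
    """k-fold cross-validated mean squared error for the given list of terms.

    Fold f holds out records f, f+k, f+2k, ...; this assumes the records are
    already in random order.
    """
    n = len(y)
    X = [[t(d) for t in terms] for d in data]
    total = 0.0
    for f in range(k):
        test = range(f, n, k)
        train = [i for i in range(n) if i % k != f]
        beta = fit([X[i] for i in train], [y[i] for i in train])
        total += sum((y[i] - sum(bj * xj for bj, xj in zip(beta, X[i]))) ** 2
                     for i in test) / len(test)
    return total / k

def forward_select(candidates, data, y, k=10):
    """Start from the intercept; repeatedly add the term that lowers CV error most."""
    selected = ["1"]
    best = cv_error([candidates[s] for s in selected], data, y, k)
    remaining = [name for name in candidates if name != "1"]
    while remaining:
        trials = {name: cv_error([candidates[s] for s in selected + [name]], data, y, k)
                  for name in remaining}
        name = min(trials, key=trials.get)
        if trials[name] >= best:  # no candidate reduces the error: stop
            break
        selected.append(name)
        remaining.remove(name)
        best = trials[name]
    return selected, best

# Hypothetical candidate terms in centred magnitude m = d[0] and log-distance lr = d[1].
CANDIDATES = {
    "1": lambda d: 1.0,
    "m": lambda d: d[0],
    "m^2": lambda d: d[0] ** 2,
    "lr": lambda d: d[1],
    "lr^2": lambda d: d[1] ** 2,
    "m*lr": lambda d: d[0] * d[1],
}

# Synthetic data generated from a model that uses only the terms 1, m and lr.
rng = random.Random(0)
data = [(rng.uniform(-1.5, 1.5), rng.uniform(0.0, 2.3)) for _ in range(300)]
y = [1.0 + 0.5 * m - 1.2 * lr + rng.gauss(0.0, 0.3) for m, lr in data]
selected, best_error = forward_select(CANDIDATES, data, y)
```

A term is kept here purely because it lowers the cross-validated error, so a term whose coefficient is not statistically significant may still be retained.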
- Compare generalization error of final model to that from fitting the functional form of Akkar and
Bommer (2007b) and an over-fit polynomial model with 58 coefficients and find they have considerably
higher generalization errors.
- After having found the functional form, refit equation using random-effects regression.
- Note that there are few data for rjb < 5 km.
- Note that weakness of model is that it is not physically interpretable and it cannot be extrapolated. Also
note that there could be problems if dataset is not representative of underlying data generating process.
- Note that problem with magnitude scaling of model since available data is not representative of underlying