A GAM of creature level fit on every attribute shows that a few attributes carry most of the relative influence.
model.gam.everything <- gam(
  level ~
    s(hardiness) +
    s(fortitude) +
    s(dexterity) +
    s(endurance) +
    s(intellect) +
    s(cleverness) +
    s(dependability) +
    s(courage) +
    s(fierceness) +
    s(power) +
    s(kinetic) +
    s(energy) +
    s(blast) +
    s(heat) +
    s(cold) +
    s(electricity) +
    s(acid) +
    s(stun),
  data = normalized_df,
  family = gaussian()
)
summary(model.gam.everything)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## level ~ s(hardiness) + s(fortitude) + s(dexterity) + s(endurance) +
## s(intellect) + s(cleverness) + s(dependability) + s(courage) +
## s(fierceness) + s(power) + s(kinetic) + s(energy) + s(blast) +
## s(heat) + s(cold) + s(electricity) + s(acid) + s(stun)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.54878 0.06721 320.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(hardiness) 6.331 7.532 6.207 4.94e-07 ***
## s(fortitude) 8.919 8.992 79.784 < 2e-16 ***
## s(dexterity) 4.733 5.879 7.500 2.43e-07 ***
## s(endurance) 1.000 1.000 7.105 0.008049 **
## s(intellect) 3.090 3.896 26.842 < 2e-16 ***
## s(cleverness) 3.003 3.837 20.640 1.19e-14 ***
## s(dependability) 1.737 2.188 3.906 0.021412 *
## s(courage) 3.747 4.697 3.524 0.004945 **
## s(fierceness) 1.000 1.000 4.193 0.041341 *
## s(power) 7.465 8.378 18.327 < 2e-16 ***
## s(kinetic) 8.407 8.880 6.845 6.02e-09 ***
## s(energy) 8.486 8.900 50.274 < 2e-16 ***
## s(blast) 1.000 1.000 0.420 0.517232
## s(heat) 1.000 1.000 0.012 0.914625
## s(cold) 3.302 4.128 10.697 2.26e-08 ***
## s(electricity) 1.000 1.000 7.921 0.005165 **
## s(acid) 3.384 4.168 4.509 0.001356 **
## s(stun) 2.960 3.675 5.201 0.000719 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.991 Deviance explained = 99.2%
## GCV = 2.2436 Scale est. = 1.852 n = 410
Specifically, the following attributes:
The appearance of kinetic, energy, cold, and no other resists is especially strange.
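Also interesting is that the following attributes have 1.000 effective degrees of freedom, so their smooths have collapsed to straight lines: endurance, fierceness, blast, heat, and electricity. Of these, blast and heat are also not statistically significant (p = 0.52 and p = 0.91), so their influence is essentially flat.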
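The effective degrees of freedom can also be read off programmatically rather than by eye. The following is a minimal sketch on synthetic data, not the creature data set: `toy`, `toy_fit`, and the 1.5 cutoff for "near linear" are illustrative assumptions, and `method = "REML"` is a choice made here for stability, not something the original models use.

```r
library(mgcv)  # recommended package, ships with standard R installations

set.seed(1)
n <- 200
# Synthetic stand-in data: x1 acts linearly, x2 acts non-linearly
toy <- data.frame(x1 = runif(n), x2 = runif(n))
toy$y <- 2 * toy$x1 + sin(2 * pi * toy$x2) + rnorm(n, sd = 0.2)

toy_fit <- gam(y ~ s(x1) + s(x2), data = toy, method = "REML")

# s.table holds the "Approximate significance of smooth terms" block
smooth_table <- summary(toy_fit)$s.table
edf <- smooth_table[, "edf"]

# Terms whose smooth has (nearly) collapsed to a straight line;
# the 1.5 cutoff is an arbitrary illustrative threshold
near_linear <- names(edf)[edf < 1.5]

print(round(edf, 3))
print(near_linear)
```

An edf of 1 says the term is linear, not that it is unimportant; the F statistic and p-value in the same table indicate whether that straight line has a meaningful slope.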
We are going to make some assumptions about the model based on our domain knowledge of the game, and to avoid over-fitting.
It’s possible that the additive models assign a higher weight to hardiness because of its correlation with fortitude. However, for now, we’ll assume that each attribute is more or less equal.
We’re combining kinetic and energy because they are uniquely capped at 60%.
This is due to domain knowledge: “vuln stacking” creatures caused a measurable drop in creature level. It’s possible the special emphasis given to cold is due to over-fitting or something unique about furrycat’s data.
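The concern that correlated attributes can shift weight between each other can be checked directly. The sketch below uses synthetic data in which a made-up `fortitude` column is deliberately built to track `hardiness`; the column names are reused only for readability and the numbers say nothing about the real data. `concurvity()` is mgcv's collinearity analogue for smooth terms.

```r
library(mgcv)

set.seed(2)
n <- 300
# Synthetic stand-in: 'fortitude' is constructed to follow 'hardiness' closely
toy <- data.frame(hardiness = runif(n, 0, 1000))
toy$fortitude <- 0.9 * toy$hardiness + rnorm(n, sd = 50)
toy$level <- 0.03 * toy$hardiness + 0.03 * toy$fortitude + rnorm(n)

# Plain pairwise correlation between the two predictors
r <- cor(toy$hardiness, toy$fortitude)

# Concurvity: values near 1 mean a smooth can be approximated
# by the other terms in the model
toy_fit <- gam(level ~ s(hardiness) + s(fortitude), data = toy)
conc <- concurvity(toy_fit, full = TRUE)

print(r)
print(round(conc, 2))
```

When both numbers are high, the split of influence between the two terms is poorly determined, which is one argument for averaging related attributes into a single feature.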
To mitigate these concerns, we can create the following synthetic attributes:
average_hdi – the mean of hardiness, dexterity, and intellect. Training the GAM on this synthetic feature prevents it from over-fitting on any one of hardiness, dexterity, or intellect.
kinen – the mean of kinetic and energy, for the same reason.
nonkinen – the mean of cold, heat, electricity, acid, and stun.
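As a minimal sketch of how these three features could be derived, the snippet below applies `rowMeans()` to a two-row toy data frame. `toy_df` and its values are made up for illustration; the real columns are assumed to live in `normalized_df` under the same names.

```r
# Toy rows standing in for normalized_df (values are made up)
toy_df <- data.frame(
  hardiness = c(300, 600), dexterity = c(450, 300), intellect = c(150, 900),
  kinetic = c(20, 60), energy = c(40, 30),
  cold = c(10, 0), heat = c(20, 0), electricity = c(30, 50),
  acid = c(40, 0), stun = c(50, 100)
)

# Each synthetic attribute is a plain row-wise mean of its source columns
toy_df$average_hdi <- rowMeans(toy_df[, c("hardiness", "dexterity", "intellect")])
toy_df$kinen       <- rowMeans(toy_df[, c("kinetic", "energy")])
toy_df$nonkinen    <- rowMeans(toy_df[, c("cold", "heat", "electricity", "acid", "stun")])

print(toy_df[, c("average_hdi", "kinen", "nonkinen")])
```

Note that the second toy row reaches the same nonkinen value as the first through two high resists and three zeros, which is exactly the kind of distinction the averaged feature deliberately discards.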
With that, this is the final model:
model.gam <- gam(
  level ~
    s(average_hdi) +
    s(fortitude) +
    s(cleverness) +
    s(power) +
    s(kinen) +
    s(nonkinen),
  data = normalized_df
)
summary(model.gam)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## level ~ s(average_hdi) + s(fortitude) + s(cleverness) + s(power) +
## s(kinen) + s(nonkinen)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.54878 0.08488 253.9 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(average_hdi) 6.729 7.813 44.02 <2e-16 ***
## s(fortitude) 8.904 8.993 83.78 <2e-16 ***
## s(cleverness) 7.920 8.680 16.11 <2e-16 ***
## s(power) 2.856 3.656 49.38 <2e-16 ***
## s(kinen) 4.745 5.800 53.25 <2e-16 ***
## s(nonkinen) 2.686 3.399 38.81 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.986 Deviance explained = 98.7%
## GCV = 3.2284 Scale est. = 2.954 n = 410
Analysis of the data shows an extreme segmentation point at \(fortitude = 500\).
In fact, the \(fortitude \geq 500\) data set is particularly well-behaved.
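The construction of `armor_df` and `no_armor_df` is not shown here. A plausible sketch, assuming they are simply the two sides of the fortitude threshold, is given below on a toy data frame (`toy_df`, `armor_toy`, and `no_armor_toy` are hypothetical names). The observation counts in the model summaries, 79 armored plus 331 unarmored, do add up to the 410 rows of the full data set, which is consistent with a clean split.

```r
# Toy stand-in for normalized_df; only the fortitude column matters here
toy_df <- data.frame(
  fortitude = c(0, 120, 499, 500, 730, 1000),
  level     = c(5, 9, 14, 38, 44, 52)
)

threshold <- 500  # segmentation point named in the text

# Assumed split: at or above the threshold counts as "armored"
armor_toy    <- subset(toy_df, fortitude >= threshold)
no_armor_toy <- subset(toy_df, fortitude <  threshold)

# Every row must land in exactly one segment
stopifnot(nrow(armor_toy) + nrow(no_armor_toy) == nrow(toy_df))

print(nrow(armor_toy))
print(nrow(no_armor_toy))
```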
model.gam.armor <- gam(
  level ~
    s(average_hdi) +
    s(fortitude) +
    s(cleverness) +
    s(power) +
    s(kinen) +
    s(nonkinen),
  data = armor_df
)
summary(model.gam.armor)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## level ~ s(average_hdi) + s(fortitude) + s(cleverness) + s(power) +
## s(kinen) + s(nonkinen)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 42.7722 0.1541 277.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(average_hdi) 4.641 5.649 20.446 4.86e-15 ***
## s(fortitude) 7.098 8.062 7.065 7.35e-07 ***
## s(cleverness) 1.000 1.000 64.836 5.41e-12 ***
## s(power) 1.445 1.759 43.904 4.98e-12 ***
## s(kinen) 1.000 1.000 44.441 4.67e-09 ***
## s(nonkinen) 4.239 5.227 11.536 2.51e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.99 Deviance explained = 99.3%
## GCV = 2.5287 Scale est. = 1.875 n = 79
In fact, the effective degrees of freedom (edf) show that several of these smooths are already linear or close to it: cleverness and kinen have an edf of 1.000, and power has 1.445.
linear.fit.level.armor <- lm(
  level ~
    average_hdi +
    fortitude +
    cleverness +
    power +
    kinen +
    nonkinen,
  data = armor_df
)
summary(linear.fit.level.armor)
##
## Call:
## lm(formula = level ~ average_hdi + fortitude + cleverness + power +
## kinen + nonkinen, data = armor_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1643 -1.0873 0.1926 1.0277 4.3139
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -21.331842 3.491081 -6.110 4.60e-08 ***
## average_hdi 0.027648 0.004965 5.568 4.18e-07 ***
## fortitude 0.056252 0.006059 9.285 6.18e-14 ***
## cleverness 0.024034 0.003182 7.552 1.05e-10 ***
## power 0.015740 0.002460 6.398 1.39e-08 ***
## kinen 0.096920 0.018767 5.164 2.06e-06 ***
## nonkinen 0.085904 0.015458 5.557 4.36e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.93 on 72 degrees of freedom
## Multiple R-squared: 0.9818, Adjusted R-squared: 0.9802
## F-statistic: 645.6 on 6 and 72 DF, p-value: < 2.2e-16
This simple linear model behaves remarkably well, and analysis of the residuals is highly promising.
shapiro.test(rs.model.level.armor)
##
## Shapiro-Wilk normality test
##
## data: rs.model.level.armor
## W = 0.98336, p-value = 0.3941
bptest(linear.fit.level.armor)
##
## studentized Breusch-Pagan test
##
## data: linear.fit.level.armor
## BP = 6.1042, df = 6, p-value = 0.4116
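The residual vector `rs.model.level.armor` is not defined in the code shown; it is presumably the residuals of the fitted model. The sketch below shows that workflow on synthetic data with well-behaved noise. `toy`, `toy_fit`, and `rs` are illustrative names, and `bptest()` comes from the lmtest package, which is assumed to be installed and is skipped otherwise.

```r
set.seed(4)
n <- 80
# Synthetic stand-in data with constant-variance normal noise
toy <- data.frame(x1 = runif(n, 0, 1000), x2 = runif(n, 0, 60))
toy$level <- -20 + 0.05 * toy$x1 + 0.1 * toy$x2 + rnorm(n, sd = 2)

toy_fit <- lm(level ~ x1 + x2, data = toy)
rs <- residuals(toy_fit)  # assumed analogue of rs.model.level.armor

# Shapiro-Wilk: a large p-value means normality is not rejected
sw <- shapiro.test(rs)
print(sw$p.value)

# Breusch-Pagan: a large p-value means no evidence of heteroscedasticity.
# Guarded because lmtest is not part of base R.
if (requireNamespace("lmtest", quietly = TRUE)) {
  bp <- lmtest::bptest(toy_fit)
  print(bp$p.value)
}
```

Both tests have "nothing is wrong" as their null hypothesis, so a high p-value is a failure to find a problem rather than proof that none exists, particularly with only 79 observations.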
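To summarize, this simple linear model for the “armored” data set (i.e. creatures that have armor) explains about 98% of the variance in level (adjusted R-squared = 0.9802), has residuals that the Shapiro-Wilk test cannot distinguish from normal (p = 0.3941), and shows no evidence of heteroscedasticity under the Breusch-Pagan test (p = 0.4116).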
Taken together, this may imply that the residuals are due to randomness within the crafting system itself, and may not be due to missing variables or unknown non-linear relationships.
The same analysis for unarmored creatures does not look as promising.
linear.fit.level.noarmor <- lm(
  level ~
    average_hdi +
    fortitude +
    cleverness +
    power +
    kinen +
    nonkinen,
  data = no_armor_df
)
summary(linear.fit.level.noarmor)
##
## Call:
## lm(formula = level ~ average_hdi + fortitude + cleverness + power +
## kinen + nonkinen, data = no_armor_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2233 -1.1795 0.0108 1.1361 7.7033
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.935414 0.361548 19.183 < 2e-16 ***
## average_hdi 0.032619 0.001946 16.759 < 2e-16 ***
## fortitude -0.020018 0.001334 -15.004 < 2e-16 ***
## cleverness 0.025625 0.002112 12.131 < 2e-16 ***
## power 0.013442 0.001331 10.095 < 2e-16 ***
## kinen 0.109083 0.008599 12.686 < 2e-16 ***
## nonkinen 0.050314 0.006046 8.321 2.46e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.245 on 324 degrees of freedom
## Multiple R-squared: 0.9341, Adjusted R-squared: 0.9328
## F-statistic: 764.9 on 6 and 324 DF, p-value: < 2.2e-16
This is a statistically decent fit (adjusted R-squared = 0.9328), but it is clearly heteroscedastic, and the residuals are not normal.
shapiro.test(rs.model.level.noarmor)
##
## Shapiro-Wilk normality test
##
## data: rs.model.level.noarmor
## W = 0.9869, p-value = 0.004318
bptest(linear.fit.level.noarmor)
##
## studentized Breusch-Pagan test
##
## data: linear.fit.level.noarmor
## BP = 49.813, df = 6, p-value = 5.126e-09
For unarmored creature level, a few ideas:
Fortitude shows a slight negative influence below \(fortitude = 500\) and a positive influence above it.
See the influence plot.
This is reflected in the above linear model, where fortitude’s slope in the unarmored data is actually negative!
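To illustrate how a sign flip like this shows up when each segment gets its own linear fit, here is a minimal sketch on synthetic data with a deliberate kink at 500. The slopes, intercepts, and noise level are invented for the example and are not estimates from the creature data.

```r
set.seed(5)
n <- 400
# Synthetic V-shaped relationship with a kink at 500 (not the real data)
toy <- data.frame(fortitude = runif(n, 0, 1000))
toy$level <- ifelse(
  toy$fortitude < 500,
  20 - 0.02 * toy$fortitude,          # declining branch below the kink
  10 + 0.05 * (toy$fortitude - 500)   # rising branch above the kink
) + rnorm(n, sd = 1)

below <- subset(toy, fortitude < 500)
above <- subset(toy, fortitude >= 500)

# Fit each segment separately and pull out the fortitude slope
slope_below <- coef(lm(level ~ fortitude, data = below))[["fortitude"]]
slope_above <- coef(lm(level ~ fortitude, data = above))[["fortitude"]]

print(c(below = slope_below, above = slope_above))
```

A single straight line fit across both segments would average the two branches and hide the reversal, which is why the split at the threshold matters for interpreting the coefficient.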
Recall the result of the linear, unarmored model:
##
## Call:
## lm(formula = level ~ average_hdi + fortitude + cleverness + power +
## kinen + nonkinen, data = no_armor_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2233 -1.1795 0.0108 1.1361 7.7033
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.935414 0.361548 19.183 < 2e-16 ***
## average_hdi 0.032619 0.001946 16.759 < 2e-16 ***
## fortitude -0.020018 0.001334 -15.004 < 2e-16 ***
## cleverness 0.025625 0.002112 12.131 < 2e-16 ***
## power 0.013442 0.001331 10.095 < 2e-16 ***
## kinen 0.109083 0.008599 12.686 < 2e-16 ***
## nonkinen 0.050314 0.006046 8.321 2.46e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.245 on 324 degrees of freedom
## Multiple R-squared: 0.9341, Adjusted R-squared: 0.9328
## F-statistic: 764.9 on 6 and 324 DF, p-value: < 2.2e-16
Can this be true? Other influence models, such as GBMs, also show the same thing. Still, it is not an intuitive finding. However, note that:
I’m unsure if this could be accurate, but it is compelling nonetheless.