


A p value is the probability of seeing an effect at least as large as the one in our results if a random model (the null hypothesis of no real effect) were true. High p values suggest the data give little evidence that your independent variables matter, but low p values don't mean they're important: that judgement requires a rational justification and an examination of the effect size and its practical importance. Otherwise you're just equating correlation and causation.

There's no reason to draw a line in the sand for “significance”. A p value of 0.05 means that, if there were no real effect, a result this extreme would turn up about 1 time in 20; 0.056 means about 1 time in 18. The 0.05 threshold comes from a single sentence, taken out of context, in a book published in 1925 (Ronald Fisher's Statistical Methods for Research Workers). Some journals have banned p values altogether, but others will still only accept “significant” results.

Significance tests and p values are almost never used in business, as the important question is usually not “does x cause y?” but “can x predict y?”
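One way to make the “random model” concrete is a permutation test: shuffle the group labels many times and count how often the shuffled difference in means is at least as extreme as the observed one. The following is a minimal Python sketch under that assumption (it is one of several ways to get a p value, not the only one). The function `permutation_p_value`, its `alternative` parameter and the example measurements are hypothetical, for illustration only; `alternative` shows the difference between a two-tailed test (`"two-sided"`) and a one-tailed test (`"greater"` or `"less"`).

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=10_000,
                        alternative="two-sided", seed=0):
    """Estimate a p value for the difference in means of two groups.

    Null hypothesis (the "random model"): group labels don't matter, so
    shuffling them should often produce differences as large as the observed one.
    """
    if alternative not in ("two-sided", "greater", "less"):
        raise ValueError(f"unknown alternative: {alternative!r}")

    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def at_least_as_extreme(diff):
        if alternative == "two-sided":  # two-tailed: a difference in either direction
            return abs(diff) >= abs(observed)
        if alternative == "greater":    # one-tailed: a larger than b
            return diff >= observed
        return diff <= observed         # one-tailed: a smaller than b

    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        hits += at_least_as_extreme(diff)
    # +1 in numerator and denominator counts the observed split itself,
    # so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical measurements, for illustration only.
treated = [12.1, 13.4, 11.8, 14.2, 13.9, 12.7]
control = [10.2, 11.1, 9.8, 10.9, 11.5, 10.4]

print("two-tailed p:", permutation_p_value(treated, control))
print("one-tailed p:", permutation_p_value(treated, control, alternative="greater"))
# A p value of 0.05 is "1 in 20"; 0.056 is roughly "1 in 18".
print(1 / 0.05, round(1 / 0.056, 1))
```

With the same seed, the one-tailed p value can never exceed the two-tailed one here, because every shuffle that counts as “greater” also counts as extreme in the two-sided sense. The p value only says how surprising the difference would be under the random model; it says nothing about whether a difference of that size matters.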

Hypothesis testing compares the null hypothesis (typically, that two quantities are equivalent) to an alternative hypothesis. In a two-tailed test, the alternative hypothesis is that the quantities are different; in a one-tailed test, it is that one quantity is larger (or, alternatively, smaller) than the other.

Statistics calculated from a sample are called estimates, because they approximate population-level values from sample data.

A frequency distribution shows how often every possible value occurs in a dataset. It is usually drawn as a histogram or as a curve on a graph.

In a normal distribution, the mean equals the median, about two-thirds (roughly 68%) of the data lie within one standard deviation (SD) of the mean, about 95% lie within two SD, and about 99.7% lie within three. Many statistical analyses assume your data are normally distributed, but many datasets, especially in nature, aren't.

In skewed data, the mean and median usually differ, because the mean is pulled toward the long tail. A left-skewed distribution has a long tail on the left side of the graph, while a right-skewed distribution has a long tail on the right. Skew is named after the tail rather than the peak, because values in that tail occur more often than would be expected with a normal distribution.

Binned data (e.g. “between 1 and 2 inches”) is typically categorical. Factors are categorical data where each value is assigned a level or rank. They are useful with binned data, and also in graphing, to rearrange the order in which categories are drawn.

Binary data is categorical data where the only values are 0 and 1. It is often used in situations where a “hit” (an animal getting trapped, a customer clicking a link, etc.) is a 1 and no hit is a 0.

Unstructured data has no strict format and is typically composed of text. R used to convert text data to factors by default (stringsAsFactors defaulted to TRUE before R 4.0.0); while this isn't necessary anymore, some functions still require text data to be in factor form.
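The rules of thumb for normal and skewed data can be checked on simulated samples. This is a minimal Python sketch, assuming simulated data rather than real measurements; the helper names `sd_coverage` and `describe` are hypothetical. It draws a normal sample and an exponential sample (a standard example of a right-skewed distribution), then reports the mean, the median and the share of values within 1, 2 and 3 standard deviations of the mean.

```python
import math
import random
import statistics


def sd_coverage(data, ks=(1, 2, 3)):
    """Fraction of values within k population SDs of the mean, for each k."""
    mean = statistics.fmean(data)
    sd = math.sqrt(math.fsum((x - mean) ** 2 for x in data) / len(data))
    return [sum(abs(x - mean) <= k * sd for x in data) / len(data) for k in ks]


def describe(name, data):
    """Print mean, median and 1/2/3-SD coverage for a dataset."""
    mean = statistics.fmean(data)
    median = statistics.median(data)
    coverage = " / ".join(f"{w:.1%}" for w in sd_coverage(data))
    print(f"{name}: mean={mean:.2f}, median={median:.2f}, within 1/2/3 SD: {coverage}")


rng = random.Random(42)  # fixed seed so the simulated data are reproducible
# Simulated data for illustration, not real measurements.
normal_data = [rng.gauss(50, 10) for _ in range(50_000)]
# Exponential data has a long tail to the right, so it is right-skewed.
right_skewed = [rng.expovariate(1 / 10) for _ in range(50_000)]

describe("normal", normal_data)
describe("right-skewed", right_skewed)
```

On the normal sample the coverage should come out close to 68% / 95% / 99.7%, with the mean and median nearly equal. On the exponential sample the mean (about 10) sits well above the median (about 7), because the long right tail pulls the mean toward it.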
# Basic RStudio commands and code
