# Statistics

## Introduction

With Desmos, students can investigate the shape, center, and spread of various data sets, run regression to model bivariate data, or (with a little bit of elbow grease) create and explore dynamic displays of important stats topics. You'll find lists at the core of our statistics experience. Start there, then dive deeper with the resources and challenges below.

The calculator provides several functions for computing statistical properties from lists of data, performing basic statistical tests, counting combinations and permutations, working with distributions, and generating random values. These functions are accessible from the "Stats" and "Dist" sections of the "functions" menu in the keypad, or can be typed directly into the expressions list using a keyboard.

## General Statistical Functions

| Function | Result |
| --- | --- |
| total(list) or total(a,b,c,...) | Output the sum of a list of numbers. |
| length(list) or length(a,b,c,...) | Output the length of a list of numbers. |
| mean(list) or mean(a,b,c,...) or mean(distribution) | Output the mean of a list of numbers. This function will also return the mean of a distribution, if it exists. See the section on distributions below. |
| median(list) or median(a,b,c,...) or median(distribution) | Output the median of a list of numbers. This function will also return the median of a distribution, if it exists. See the section on distributions below. |
| min(list) or min(a,b,c,...) | Output the minimum value contained in a list of numbers. |
| max(list) or max(a,b,c,...) | Output the maximum value contained in a list of numbers. |
| quartile(list, q) | Output the qth quartile of list. q must be a number between 0 and 4 (inclusive), otherwise the result will be undefined. Note that this function uses the Moore and McCabe method, which discards the median in odd-length data sets before computing the upper and lower quartiles. |
| quantile(list, q) or quantile(distribution, q) | Output the qth quantile of list. q must be a number between 0 and 1 (inclusive), otherwise the result will be undefined. Passing a distribution as the first argument to quantile allows you to evaluate its inverse CDF. See the section on distributions below. |
| inversecdf(distribution,q) | An alias for quantile. See the section on distributions below. |
| ~ | Used for performing regressions. |
| stdev(list) or stdev(a,b,c,...) or stdev(distribution) | Output the sample standard deviation of a list of numbers. This function will also return the standard deviation of a distribution, if it exists. See the section on distributions below. |
| stdevp(list) or stdevp(a,b,c,...) | Output the population standard deviation of a list of numbers. |
| mad(list) or mad(a,b,c,...) | Output the mean absolute deviation of a list of numbers. |
| var(list) or var(a,b,c,...) or var(distribution) | Output the sample variance of a list of numbers. This function will also return the variance for a distribution, if it exists. See the section on distributions below. Note that, while not on the keypad, a varp function is also available to compute the population variance. |
| cov(list1, list2) | Output the covariance between two lists of numbers. |
| corr(list1,list2) | Output the Pearson correlation coefficient between two lists of numbers. |
| spearman(list1, list2) | Output Spearman's rank correlation coefficient between two lists of numbers. Note that any repeated data values are assigned their average (possibly fractional) rank before computing the correlation. |
| nCr(n, r) | Output the number of r-sized combinations (unordered arrangements) that can be selected from a set of size n. |
| nPr(n, r) | Output the number of r-sized permutations (ordered arrangements) that can be selected from a set of size n. |
| n! | Output the factorial of n. |
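
Two of the notes above describe specific procedures: the Moore and McCabe quartile method and the average-rank treatment of ties in Spearman's correlation. The following Python sketch illustrates both as they are described here; it is not the calculator's implementation. The function names are made up for illustration, and the sketch assumes q is one of the integers 0 through 4.

```python
from math import sqrt
from statistics import mean, median


def quartile(data, q):
    """Moore and McCabe quartiles (assumption: q is an integer from 0 to 4)."""
    xs = sorted(data)
    n = len(xs)
    if n == 0 or q not in (0, 1, 2, 3, 4):
        return None  # stands in for "undefined"
    if q == 0:
        return xs[0]
    if q == 4:
        return xs[-1]
    if q == 2:
        return median(xs)
    half = n // 2
    if half == 0:
        return xs[0]  # single data point: no halves to split
    # For odd n, both slices skip the middle element (the median).
    lower, upper = xs[:half], xs[n - half:]
    return median(lower if q == 1 else upper)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman's coefficient: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


print(quartile([1, 2, 3, 4, 5], 1))        # 1.5 (median 3 is discarded)
print(average_ranks([10, 20, 20, 30]))     # [1.0, 2.5, 2.5, 4.0]
```

With the odd-length list [1, 2, 3, 4, 5], the median 3 is left out of both halves, so the lower quartile is the median of [1, 2] and the upper quartile is the median of [4, 5].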

## Statistical Tests

#### ttest(list, value = 0)

Perform a one-sample t-test of whether the mean of the population from which list is sampled differs from value (the null hypothesis). The output includes p-values for both the one-tailed versions (labeled "less than" and "greater than") and the two-tailed version (labeled "not equal") of the test. Note that if the second argument is omitted the hypothesized mean defaults to 0.

#### tscore(list, value = 0)

Output the raw test statistic used in the one-sample ttest function.
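
As a rough illustration, the textbook one-sample t statistic can be computed as below. This Python sketch assumes tscore uses the standard formula (sample mean minus the hypothesized value, divided by the standard error); the article does not state the formula, and the function name t_score is made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def t_score(sample, value=0.0):
    """Textbook one-sample t statistic: (mean - value) / (s / sqrt(n)).

    s is the sample standard deviation; the statistic has n - 1 degrees
    of freedom. Requires at least two data points with nonzero spread.
    """
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    return (mean(sample) - value) / standard_error


# A sample whose mean equals the hypothesized value gives a statistic of 0.
print(t_score([1, 2, 3, 4, 5], 3))
```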

#### ittest(list1, list2)

Perform an independent (unpaired) two-sample t-test of whether the mean of the population from which list1 is sampled differs from the mean of the population from which list2 is sampled. The output includes p-values for both the one-tailed versions (labeled "less than" and "greater than") and the two-tailed version (labeled "not equal") of the test. Note that, while the sample sizes may differ (list1 and list2 need not have equal length), this test does assume that the underlying populations have equal variance.
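
The equal-variance assumption is what allows the two samples to share a single pooled variance estimate. The Python sketch below shows the standard pooled-variance t statistic under that assumption. It is an illustration rather than the calculator's implementation, the name pooled_t is made up, and it returns only the statistic and degrees of freedom, not p-values.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their own
    # degrees of freedom.
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / standard_error, df


# The samples may have different lengths.
t, df = pooled_t([1, 2, 3], [2, 3, 4, 5])
print(t, df)
```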

## Distributions

The calculator can plot the probability density functions (PDFs), probability mass functions (PMFs), and cumulative distribution functions (CDFs) of several common statistical distributions, as well as compute cumulative probabilities for those distributions.

### Plotting

Each of the following functions will plot a distribution's PDF or PMF.

#### uniformdist(minimum = 0, maximum = 1)

Plot the PDF of a uniform distribution with the given minimum and maximum. Note that if the second argument is omitted the maximum defaults to 1, and if both arguments are omitted the minimum also defaults to 0.

#### normaldist(mean = 0, standard deviation = 1)

Plot the PDF of a normal distribution with the given mean and standard deviation. Note that if the second argument is omitted the standard deviation defaults to 1, and if both arguments are omitted the mean also defaults to 0.

#### tdist(degrees of freedom)

Plot the PDF of a Student's t-distribution with the given degrees of freedom. Note that degrees of freedom must be greater than 0.

#### poissondist(mean)

Plot the PMF of a Poisson distribution with the given mean. Note that mean must be greater than 0.

#### binomialdist(trials, probability = 0.5)

Plot the PMF of a binomial distribution given a number of (independent) trials and a probability of success on each trial. Note that trials must be a nonnegative integer and probability must be a number between 0 and 1 (inclusive). If the second argument is omitted, the probability defaults to 0.5.
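
For reference, the values plotted by a binomial PMF follow the usual formula C(n, k) p^k (1 - p)^(n - k). The Python sketch below evaluates that formula directly. The function name binomial_pmf is made up for illustration, and the default of 0.5 mirrors the signature shown in the heading above.

```python
from math import comb


def binomial_pmf(k, trials, probability=0.5):
    """Probability of exactly k successes in the given number of trials."""
    if k < 0 or k > trials:
        return 0.0  # outside the support of the distribution
    return comb(trials, k) * probability**k * (1 - probability) ** (trials - k)


# Probability of exactly 2 heads in 4 fair coin flips.
print(binomial_pmf(2, 4))  # 0.375
```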

### Computing cumulative probabilities

When using any of the above functions to plot a PDF/PMF, a checkbox labeled "Find Cumulative Probability (CDF)" will appear. If that box is checked, the calculator will output the cumulative probability between the values in the "Min" and "Max" input fields. It will also display a visualization of the cumulative probability, either as a shaded region under the curve (for continuous distributions) or as a series of vertical segments and points (for discrete distributions).

(Figures: Normal Distribution; Binomial Distribution)

### Other functions for use with distributions

The top-level distribution functions offer a simple way to plot PDFs and PMFs and compute cumulative probabilities, but the calculator also provides some functions for working with distribution PDFs/PMFs and CDFs inside of other expressions. Once you have created a distribution, you are able to access its .pdf(), .cdf(), .inversecdf(), and .random() functions. (For more information about .random(), see the section below on generating random values.)

Note that for discrete distributions there is a difference between what the calculator will plot for the top-level distribution function and what it will plot for the .pdf() function. When using the .pdf() and .cdf() functions, a discrete PMF or CDF will be plotted as a step function rather than as a series of points.

#### distribution.pdf(value)

Evaluate distribution's PDF/PMF at the given value. If value is numeric, the calculator will output a numeric evaluation. If value is an expression that depends on a free variable, the calculator will plot the PDF/PMF as a function of value.

#### distribution.cdf(value)

Evaluate distribution's CDF at the given value. If value is numeric, the calculator will output a numeric evaluation. If value is an expression that depends on a free variable, the calculator will plot the CDF as a function of value. For example, normaldist(0,1).cdf(2) will output the probability that a random variable from a standard normal distribution has a value less than or equal to 2.
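
To make the standard normal example concrete, the normal CDF can be written in closed form using the error function. The Python sketch below uses that textbook identity; the function name normal_cdf is made up for illustration, and the code says nothing about how the calculator evaluates the CDF internally.

```python
from math import erf, sqrt


def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))


# Probability that a standard normal variable is at most 2 (about 0.977).
print(normal_cdf(2))
```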

#### distribution.cdf(lower, upper)

Compute distribution's cumulative probability between lower and upper. For example, normaldist(0,1).cdf(-1, 1) will output the probability that a random variable from a standard normal distribution has a value between -1 and 1.

Note that for discrete distributions d.pdf(x) will round x to the nearest integer, and a plot of d.pdf(x) will look like a piecewise-constant function. To plot a set of points instead, you could use a table or a point list: R=[0…10], (R, d.pdf(R)).

The .pdf() and .cdf() functions let you combine distributions in interesting ways. For example, by plotting the difference between their PDFs, it's possible to see that a t-distribution approaches a standard normal distribution as its number of degrees of freedom increases.

#### distribution.inversecdf(value)

Compute distribution's inverse cumulative distribution function (inverse CDF) at value. If value is numeric, the calculator will output a numeric evaluation. If value is an expression that depends on a free variable, the calculator will plot the inverse CDF as a function of value. For example, normaldist(0,1).inversecdf(0.5) will output 0 because normaldist(0,1).cdf(0) is 0.5.
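
The relationships between the CDF, the two-argument CDF, and the inverse CDF can be checked outside the calculator. The Python sketch below uses the standard library's statistics.NormalDist as a stand-in for normaldist(0,1). The helper interval_probability is a made-up name, and treating a two-argument CDF as a difference of two CDF values is the usual definition for a continuous distribution, not a statement about the calculator's internals.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def interval_probability(dist, lower, upper):
    """P(lower <= X <= upper) for a continuous distribution."""
    return dist.cdf(upper) - dist.cdf(lower)


# Analogue of normaldist(0,1).cdf(-1, 1): roughly 0.68.
within_one_sd = interval_probability(standard_normal, -1, 1)

# Analogue of normaldist(0,1).inversecdf(0.5).
median_value = standard_normal.inv_cdf(0.5)

# The inverse CDF undoes the CDF (up to floating-point error).
round_trip = standard_normal.inv_cdf(standard_normal.cdf(1.25))

print(within_one_sd, median_value, round_trip)
```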

## Generating Random Values

The calculator offers a single function called random() for generating different kinds of random values in different contexts, depending on the provided arguments. For example, calling random() on a list will uniformly select elements from the list, and calling random() on a distribution will sample numbers with a frequency defined by that distribution. Regardless of the context, calling random() without additional arguments will return a single value; calling random(n) with a single additional argument will return a list of n values; and calling random(n, seed) will return a list of n values, using seed to influence the random number generator. See the note on seeds below.

| Type | Result |
| --- | --- |
| random() | Generate a random value sampled uniformly from the interval [0,1). |
| random(n) | Generate a list of n random values sampled uniformly from the interval [0,1). |
| random(n, seed) | Generate a list of n random values sampled uniformly from the interval [0,1), using seed to influence the random number generator. |
| list.random() | Return a single item selected uniformly from list. |
| list.random(n) | Return a list of n items selected uniformly (with replacement) from list. |
| distribution.random() | Return a single random number sampled from distribution. |
| distribution.random(n) | Return a list of n samples drawn from distribution. |
| distribution.random(n, seed) | Return a list of n samples drawn from distribution, using seed to influence the random number generator. |
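
For readers who know a general-purpose language, the variants above map loosely onto familiar standard-library calls. The Python sketch below is only an analogy using the random and statistics modules; the variable names are illustrative, and none of this reflects how the calculator generates values.

```python
import random
from statistics import NormalDist

rng = random.Random()
letters = ["a", "b", "c"]

single = rng.random()                        # like random(): one value in [0, 1)
several = [rng.random() for _ in range(5)]   # like random(5)
item = rng.choice(letters)                   # like list.random()
items = rng.choices(letters, k=5)            # like list.random(5): with replacement
samples = NormalDist(0, 1).samples(5)        # like distribution.random(5)

print(single, several, item, items, samples)
```

Because choices() samples with replacement, the same item can appear more than once, and k may exceed the length of the list.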

### A Note on Random Seeds

Like many computer programs, Desmos uses a pseudorandom number generator (PRNG) to produce sequences of numbers that are in practice reasonably indistinguishable from random even though they are in fact deterministic. The sequence of numbers produced by a PRNG is fixed by an initial value called its seed. The calculator does most of the work of creating and managing these seeds for you, but also offers two ways for you to force a seed update.

1. If any expression contains a random() call, a small "randomize" icon will appear at the top of the expressions list. Clicking it will set the global seed to a new value, which will simultaneously re-randomize all expressions that use random().
2. It is also possible to pass an optional seed argument to any individual random() call, which will only affect the seed for that specific call. This is mainly useful because it allows you to re-randomize a single expression in response to updates elsewhere in the expressions list. For instance, using a slider as a seed argument allows you to generate new random values whenever the slider is moved, perhaps in an animation.

It's important to note that the seed argument you pass to random() is only one small part of the overall seed consumed by the PRNG. The other parts are beyond user control, and are not necessarily stable as expressions are edited, so you should not rely on random() results being reproducible. You should think of the seed argument as a mechanism for timing the generation of new random values, rather than as a mechanism for setting specific new random values.
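
The idea that a user-supplied seed is only one part of the overall seed can be mimicked in a few lines of Python. In this sketch, hidden_seed is a hypothetical stand-in for the parts of the seed that the calculator manages itself, and the way the two values are combined is an assumption made purely for illustration.

```python
import random


def draw(n, user_seed, hidden_seed):
    """Return n pseudorandom values determined by both seed components."""
    # Combine the two components into one seed string (illustrative only).
    rng = random.Random(f"{hidden_seed}/{user_seed}")
    return [rng.random() for _ in range(n)]


same = draw(3, user_seed=1, hidden_seed=7) == draw(3, user_seed=1, hidden_seed=7)
moved_slider = draw(3, user_seed=2, hidden_seed=7)   # new user seed, new values
rerandomized = draw(3, user_seed=1, hidden_seed=8)   # same user seed, new values

print(same, moved_slider, rerandomized)
```

With both components fixed, the output repeats exactly. Changing either one produces a different sequence, which is why the same seed argument cannot be relied on to reproduce a result once the hidden part changes.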

## Statistics in Action

### "The best way to learn is to do." – Paul Halmos

### Graphing Challenges

Stretch your skills with graphing challenges. 