Aggregate functions summarize the values from each event to create a single, meaningful value. Common aggregate functions include Average, Count, Minimum, Maximum, Standard Deviation, Sum, and Variance.
Most aggregate functions are used with numeric fields. However, there are some functions that you can use with either alphabetic string fields or numeric fields. The function descriptions indicate which functions you can use with alphabetic strings.
For an overview, see Overview of SPL2 stats functions.
avg(<value>)
This function returns the average, or mean, of the values in a field.
Usage
You can use this function with the stats, eventstats, streamstats, and timechart commands.
Examples
The following example returns the average of the values in the size field for each distinct value in the host field.
... | stats avg(size) BY host
The following example returns the average thruput of each host for each 5 minute time span.
... | bin _time span=5m | stats avg(thruput) BY host
The following example displays a timechart of the average of the cpu_seconds field by processor, rounded to 2 decimal places.
... | timechart eval(round(avg(cpu_seconds),2)) BY processor
When you use an eval expression with the timechart command, you must also use a BY clause.
count(<value>) or c(<value>)
This function returns the number of occurrences in a field.
Usage
To use this function, you can specify count(<value>), or the abbreviation c(<value>).
This function processes field values as strings.
To indicate a specific field value to match, use the format <field>=<value>.
You can use this function with the stats
, eventstats
, streamstats
, and timechart
commands.
Examples
Several of these examples use an eval expression with the count function. See Using eval expressions in stats functions.
The following example returns the count of events where the status field has the value "404".
...| stats count(eval(status="404")) AS count_status BY sourcetype
The following example separates the search results into 10 bins and counts the values in the _raw field for each bin.
... | bin size bins=10 | stats count(_raw) BY size
You can use search literals in functions that accept predicate expressions. Search literals are enclosed in backtick characters ( ` ). The following search uses a search literal to count the occurrences of the value 500 in your events. The results are organized by the host field:
... | stats count(`500`) by host
For more information, see Search literals in expressions.
The following example uses the timechart command to count the events where the action field contains the value purchase.
| from my_dataset where sourcetype="access_*" | timechart count(eval(action="purchase")) BY productName usenull=f useother=f
distinct_count(<value>) or dc(<value>)
This function returns the count of distinct values in a field.
Usage
To use this function, you can specify distinct_count(), or the abbreviation dc().
This function processes field values as strings.
You can use this function with the stats, eventstats, streamstats, and timechart commands.
Basic examples
The following example removes duplicate results with the same host value and returns the total count of the remaining results.
... | stats dc(host)
The following example generates the distinct count of the values in the device field. The results are returned in a field called numdevices.
...| stats dc(device) AS numdevices
The following example bins the events into five minute time spans and then counts the distinct client IP addresses for each host and category in each span.
... | bin _time span=5m | stats dc(clientip) BY _time, host, categoryId
Extended example
The following search counts the number of different customers who purchased something from the Buttercup Games online store yesterday. The search organizes the count by the type of product that customers purchased, such as accessories, t-shirts, and type of games.
| from my_dataset where sourcetype="access_*" action=purchase | stats dc(clientip) BY categoryId
- This example first searches for purchase events, action=purchase.
- These results are piped into the stats command, where the dc() function counts the number of distinct users who made purchases.
- The BY clause organizes the distinct count by the category of products, categoryId.
The results look like this:
categoryId | dc(clientip) |
---|---|
ACCESSORIES | 37 |
ARCADE | 58 |
NULL | 8 |
SHOOTER | 31 |
SIMULATION | 34 |
SPORTS | 13 |
STRATEGY | 74 |
TEE | 38 |
estdc(<value>)
This function returns the estimated count of the distinct values in a field.
Usage
This function processes field values as strings.
The string values 1.0 and 1 are considered distinct values and counted separately.
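To see this behavior, you can use a dataset literal. This is a quick sketch with a hypothetical field named n; because the values are processed as strings, 1 and 1.0 count as two distinct values:
| FROM [{n: "1"}, {n: "1.0"}, {n: "1"}] | stats estdc(n)
This search returns 2: the duplicate string 1 counts only once, but 1.0 counts separately.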
You can use this function with the stats, eventstats, streamstats, and timechart commands.
Examples
The following example removes duplicate results with the same host value and returns the estimated total count of the remaining results.
... | stats estdc(host)
The results look like this:
estdc(host) |
---|
6 |
The following example generates the estimated distinct count of the values in the devices field. The results are returned in a field called numdevices.
...| stats estdc(devices) AS numdevices
The following example bins the events into 5 minute time spans and then estimates the distinct count of the values in the source field for each sourcetype in each span.
... | bin _time span=5m | stats estdc(source) BY _time, sourcetype
estdc_error(<value>)
This function returns the theoretical error of the estimated count of the distinct values in a field.
The error represents this ratio:
- absolute_value(estimate_distinct_count - real_distinct_count) / real_distinct_count
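For example, if the real distinct count is 1000 and the estimated distinct count is 990, the error ratio is absolute_value(990 - 1000) / 1000 = 0.01.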
Usage
This function processes field values as strings.
You can use this function with the stats, eventstats, streamstats, and timechart commands.
Example
The following example determines the error ratio for the estimated distinct count of the values in the host field.
... | stats estdc_error(host)
exactperc(<value>,<percentile>)
This function returns an exact percentile based on the values in a numeric field.
The exactperc function provides the exact value, but is very resource expensive for high cardinality fields. The exactperc function can consume a large amount of memory, which might impact how long it takes for a search to complete.
Usage
There are three percentile functions: exactperc, perc, and upperperc. For details about the differences, see the perc function.
You can use this function with the stats, eventstats, streamstats, and timechart commands.
Examples
See the perc function.
max(<value>)
This function returns the maximum value in a field.
Usage
This function processes field values as numbers if possible, otherwise processes field values as strings.
You can use this function with the stats, eventstats, streamstats, and timechart commands.
If the values in the field are non-numeric, the maximum value is found using lexicographical ordering.
Lexicographical order sorts items based on the values used to encode the items in computer memory. In Splunk software, this is almost always UTF-8 encoding, which is a superset of ASCII.
- Numbers are sorted before letters. Numbers are sorted based on the first digit. For example, the numbers 10, 9, 70, 100 are sorted lexicographically as 10, 100, 70, 9.
- Uppercase letters are sorted before lowercase letters.
- Symbols are not standard. Some symbols are sorted before numeric values. Other symbols are sorted before or after letters.
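To see lexicographical ordering in action, you can use a dataset literal. This is a quick sketch with hypothetical version strings:
| FROM [{version: "v9"}, {version: "v10"}, {version: "v70"}] | stats max(version)
This search returns v9, because the values are compared character by character and the character 9 sorts after 7 and 1.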
Basic example
This example returns the maximum value of the size field.
... | stats max(size)
Extended example
Calculate aggregate statistics for the magnitudes of earthquakes in an area
This search uses recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.
Search for earthquakes in and around California. Calculate the number of earthquakes that were recorded. Use statistical functions to calculate the minimum, maximum, range (the difference between the min and max), and average magnitudes of the recent earthquakes. List the values by magnitude type.
source=all_month.csv place=*California* | stats count, max(mag), min(mag), range(mag), avg(mag) BY magType
The results look like this:
magType | count | max(mag) | min(mag) | range(mag) | avg(mag) |
---|---|---|---|---|---|
H | 123 | 2.8 | 0.0 | 2.8 | 0.549593 |
MbLg | 1 | 0 | 0 | 0 | 0.000000 |
Md | 1565 | 3.2 | 0.1 | 3.1 | 1.056486 |
Me | 2 | 2.0 | 1.6 | 0.4 | 1.800000 |
Ml | 1202 | 4.3 | -0.4 | 4.7 | 1.226622 |
Mw | 6 | 4.9 | 3.0 | 1.9 | 3.650000 |
ml | 10 | 1.56 | 0.19 | 1.37 | 0.934000 |
mean(<value>)
This function returns the arithmetic mean of the values in a field.
The mean values should be exactly the same as the values calculated using the avg() function.
Usage
You can use this function with the stats, eventstats, streamstats, and timechart commands.
Basic example
The following example returns the mean of the values in the kbps field.
...| stats mean(kbps)
Extended example
This search uses recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.
Run the following search to find the mean, standard deviation, and variance of the magnitudes of recent quakes by magnitude type.
source=usgs place=*California* | stats count, mean(mag), stdev(mag), var(mag) BY magType
The results look like this:
magType | count | mean(mag) | stdev(mag) | var(mag) |
---|---|---|---|---|
H | 123 | 0.549593 | 0.356985 | 0.127438 |
MbLg | 1 | 0.000000 | 0.000000 | 0.000000 |
Md | 1565 | 1.056486 | 0.580042 | 0.336449 |
Me | 2 | 1.800000 | 0.346410 | 0.120000 |
Ml | 1202 | 1.226622 | 0.629664 | 0.396476 |
Mw | 6 | 3.650000 | 0.716240 | 0.513000 |
ml | 10 | 0.934000 | 0.560401 | 0.314049 |
median(<value>)
This function returns the middle-most value of the values in a field.
Usage
You can use this function with the stats, eventstats, streamstats, and timechart commands.
If you have an even number of search results, the median calculation is the average of the two middle-most values.
Basic example
Consider the following list of values, which counts the number of different customers who purchased something from the Buttercup Games online store yesterday. The values are organized by the type of product (accessories, t-shirts, and type of games) that customers purchased.
categoryId | count |
---|---|
ACCESSORIES | 37 |
ARCADE | 58 |
NULL | 8 |
SIMULATION | 34 |
SPORTS | 13 |
STRATEGY | 74 |
TEE | 38 |
When the list is sorted, the middle-most value is 37.
categoryId | count |
---|---|
NULL | 8 |
SPORTS | 13 |
SIMULATION | 34 |
ACCESSORIES | 37 |
TEE | 38 |
ARCADE | 58 |
STRATEGY | 74 |
The middle-most value is returned when there are an odd number of results. When there are an even number of results, the average of the two middle-most numbers is returned.
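To try this yourself, you can recreate the sorted list with a dataset literal. This is a sketch; the field names mirror the table:
| FROM [{categoryId: "NULL", count: 8}, {categoryId: "SPORTS", count: 13}, {categoryId: "SIMULATION", count: 34}, {categoryId: "ACCESSORIES", count: 37}, {categoryId: "TEE", count: 38}, {categoryId: "ARCADE", count: 58}, {categoryId: "STRATEGY", count: 74}] | stats median(count)
With these seven values, the search returns 37.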
min(<value>)
This function returns the minimum value in a field.
Usage
This function processes field values as numbers if possible, otherwise processes field values as strings.
You can use this function with the stats, eventstats, streamstats, and timechart commands.
If the values in the field are non-numeric, the minimum value is found using lexicographical ordering.
Lexicographical order sorts items based on the values used to encode the items in computer memory. In Splunk software, this is almost always UTF-8 encoding, which is a superset of ASCII.
- Numbers are sorted before letters. Numbers are sorted based on the first digit. For example, the numbers 10, 9, 70, 100 are sorted lexicographically as 10, 100, 70, 9.
- Uppercase letters are sorted before lowercase letters.
- Symbols are not standard. Some symbols are sorted before numeric values. Other symbols are sorted before or after letters.
Basic examples
The following example returns the minimum size and maximum size of the HotBucketRoller component in the _internal index.
index=_internal component=HotBucketRoller | stats min(size), max(size)
The following example returns a list of processors and calculates the minimum cpu_seconds and the maximum cpu_seconds.
index=_internal | chart min(cpu_seconds), max(cpu_seconds) BY processor
Extended example
See the Extended example for the max() function. That example includes the min() function.
mode(<value>)
This function returns the most frequent value in a field.
Usage
This function processes field values as strings.
You can use this function with the stats, eventstats, streamstats, and timechart commands.
Examples
The mode function returns the most frequent value. Consider the following data:
firstname | surname | age |
---|---|---|
Claudia | Garcia | 32 |
David | Mayer | 45 |
Alex | Garcia | 29 |
Wei | Zhang | 45 |
Javier | Garcia | 37 |
When you search for the mode in the age field, the value 45 is returned.
...| stats mode(age)
You can also use mode with fields that contain string values. When you search for the mode in the surname field, the value Garcia is returned.
...| stats mode(surname)
Here's another set of sample data:
_time | host | sourcetype |
---|---|---|
04-06-2020 17:06:23.000 PM | www1 | access_combined |
04-06-2020 10:34:19.000 AM | www1 | access_combined |
04-03-2020 13:52:18.000 PM | www2 | access_combined |
04-02-2020 07:39:59.000 AM | www3 | access_combined |
04-01-2020 19:35:58.000 PM | www1 | access_combined |
If you run a search that looks for the mode in the host field, the value www1 is returned because it is the most common value in the host field. For example:
... | stats mode(host)
The results look something like this:
mode(host) |
---|
www1 |
perc(<value>,<percentile>)
This function returns an approximate percentile based on the values in a numeric field.
There are three percentile functions. These functions return the nth percentile of the values in a numeric field. You can think of this as an estimate of where the top N% starts. For example, a 95th percentile says that 95% of the values in the field are below the estimate and 5% of the values in the field are above the estimate.
Valid percentiles are floating point numbers between 0 and 100, such as 99.95.
Differences between the percentile functions:
Function | Description |
---|---|
perc(<value>,<percentile>) | Use the perc function to calculate an approximate threshold, such that the values in a field fall below the percentile threshold you specify. The perc function returns a single number that represents the lower end of the approximate values for the percentile requested. |
upperperc(<value>,<percentile>) | When there are more than 1000 values, the upperperc function gives the approximate upper bound for the percentile requested. Otherwise the upperperc function returns the same percentile as the perc function. |
exactperc(<value>,<percentile>) | The exactperc function provides the exact value, but is very resource expensive for high cardinality fields. The exactperc function can consume a large amount of memory, which might impact how long it takes for a search to complete. |
The percentile functions process field values as strings.
The perc and upperperc functions give approximate values for the integer percentile requested. The approximation algorithm that is used, which is based on dynamic compression of a radix tree, provides a strict bound of the actual value for any percentile.
Usage
You can use this function with the stats, eventstats, streamstats, and timechart commands.
In the search results, the field names use this format: perc<percentile>(<value>), regardless of which syntax you use to specify the function.
Valid syntax
To use this function, you can specify perc() using one of the following syntaxes:
Syntax | Example |
---|---|
perc(<value>,<percentile>) | perc(myfield,90) |
perc<percentile>(<value>) | perc90(myfield) |
Differences between Splunk and Excel percentile algorithms
If there are fewer than 1000 distinct values, the Splunk percentile functions use the nearest-rank algorithm. See http://en.wikipedia.org/wiki/Percentile#Nearest_rank. Excel uses the NIST interpolated algorithm, which means Excel can return a percentile value that does not exist in the actual data. That is not possible with the nearest-rank approach.
Splunk algorithm with more than 1000 distinct values
If there are more than 1000 distinct values for the field, the percentiles are approximated using a custom radix-tree digest-based algorithm. This algorithm is much faster and uses a constant amount of memory, whereas an exact computation uses memory in linear relation to the number of distinct values. By default, this approach limits the approximation error to less than 1% of rank error. That means if you ask for the 95th percentile, the number you get back is between the 94th and 96th percentiles.
To always get exact percentiles, even with more than 1000 distinct values, use the exactperc function instead of the perc function.
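For example, the following sketch compares the approximate and exact calculations on a hypothetical response_time field. With more than 1000 distinct values the two results can differ slightly; with fewer, they match:
... | stats perc(response_time,95), exactperc(response_time,95)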
Basic examples
Consider this list of values {10,9,8,7,6,5,4,3,2,1} in the myfield field.
The following example returns the 90th percentile for the values:
...| stats perc(myfield, 90)
The results look like this:
perc90(myfield) |
---|
9.1 |
The following example returns several percentiles:
...| stats perc(myfield,95), perc(myfield,50), perc(myfield,25)
The results look like this:
perc95(myfield) | perc50(myfield) | perc25(myfield) |
---|---|---|
9.55 | 5.5 | 3.25 |
Extended example
Consider the following set of data, which shows the number of visitors for each hour a store is open:
hour | visitors |
---|---|
0800 | 0 |
0900 | 212 |
1000 | 367 |
1100 | 489 |
1200 | 624 |
1300 | 609 |
1400 | 492 |
1500 | 513 |
1600 | 367 |
1700 | 337 |
Let's use a dataset literal to create a temporary dataset from this data and then run the streamstats command to create a cumulative total for the visitors.
| FROM [{hour: 0800, visitors: 0}, {hour: 0900, visitors: 212}, {hour: 1000, visitors: 367}, {hour: 1100, visitors: 489}, {hour: 1200, visitors: 624}, {hour: 1300, visitors: 609}, {hour: 1400, visitors: 492}, {hour: 1500, visitors: 513}, {hour: 1600, visitors: 367}, {hour: 1700, visitors: 337}] | streamstats sum(visitors) as 'visitors total'
The results from this search look like this:
hour | visitors | visitors total |
---|---|---|
0800 | 0 | 0 |
0900 | 212 | 212 |
1000 | 367 | 579 |
1100 | 489 | 1068 |
1200 | 624 | 1692 |
1300 | 609 | 2301 |
1400 | 492 | 2793 |
1500 | 513 | 3306 |
1600 | 367 | 3673 |
1700 | 337 | 4010 |
Let's add the stats command with the perc function to determine the 50th and 95th percentiles.
| FROM [{hour: 0800, visitors: 0}, {hour: 0900, visitors: 212}, {hour: 1000, visitors: 367}, {hour: 1100, visitors: 489}, {hour: 1200, visitors: 624}, {hour: 1300, visitors: 609}, {hour: 1400, visitors: 492}, {hour: 1500, visitors: 513}, {hour: 1600, visitors: 367}, {hour: 1700, visitors: 337}] | streamstats sum(visitors) as 'visitors total' | stats perc50('visitors total') perc95('visitors total')
The results from this search look like this:
perc50(visitors total) | perc95(visitors total) |
---|---|
1996.5 | 3858.35 |
The perc50 function estimates the 50th percentile, when 50% of the visitors had arrived. You can see from the data that the 50th percentile was reached between visitor number 1996 and 1997, sometime between 1200 and 1300 hours. The perc95 function estimates the 95th percentile, when 95% of the visitors had arrived. The 95th percentile was reached with visitor 3858, which occurred between 1600 and 1700 hours.
range(<value>)
Returns the difference between the maximum and minimum values in a field.
Usage
The values in the field must be numeric.
You can use this function with the stats, eventstats, streamstats, and timechart commands.
Basic example
This example uses events that list the numeric sales for each product and quarter, for example:
products | quarter | sales | quota |
---|---|---|---|
ProductA | QTR1 | 1200 | 1000 |
ProductB | QTR1 | 1400 | 1550 |
ProductC | QTR1 | 1650 | 1275 |
ProductA | QTR2 | 1425 | 1300 |
ProductB | QTR2 | 1175 | 1425 |
ProductC | QTR2 | 1550 | 1450 |
ProductA | QTR3 | 1300 | 1400 |
ProductB | QTR3 | 1250 | 1125 |
ProductC | QTR3 | 1375 | 1475 |
ProductA | QTR4 | 1550 | 1300 |
ProductB | QTR4 | 1700 | 1225 |
ProductC | QTR4 | 1625 | 1350 |
It is easiest to understand the range if you also determine the min and max values. To determine the range of sales for each quarter, run this search:
... | stats sum(sales), min(sales), max(sales), range(sales) BY quarter
The results look something like this:
quarter | sum(sales) | min(sales) | max(sales) | range(sales) |
---|---|---|---|---|
QTR1 | 4250 | 1200 | 1650 | 450 |
QTR2 | 4150 | 1175 | 1550 | 375 |
QTR3 | 3925 | 1250 | 1375 | 125 |
QTR4 | 4875 | 1550 | 1700 | 150 |
The range(sales) is max(sales) minus min(sales).
Extended example
See the Extended example for the max() function. That example includes the range() function.
stdev(<value>)
This function returns the sample standard deviation of the values in a field.
Usage
You can use this function with the stats, eventstats, streamstats, and timechart commands.
A sample standard deviation is different from a population standard deviation. A sample standard deviation estimates the deviation of a larger population based on a sample. To account for sampling, the sum of the squared deviations is divided by the number of data points minus 1, instead of by the total number of data points.
For information about creating a population standard deviation, see the stdevp function.
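To compare the two calculations, you can run both functions over the same values. This sketch reuses the temperature dataset literal that appears in the sumsq and var examples later in this topic. Expect stdev(temp) to return approximately 12.62 and stdevp(temp) approximately 11.68, because the sample calculation divides by 6 and the population calculation divides by 7:
| FROM [{day: "sun", temp: 65}, {day: "mon", temp: 42}, {day: "tue", temp: 40}, {day: "wed", temp: 31}, {day: "thu", temp: 47}, {day: "fri", temp: 53}, {day: "sat", temp: 64}] | stats stdev(temp), stdevp(temp)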
Basic example
This example returns the standard deviation of fields that end in delay, which matches both delay and xdelay. Because the field name contains a wildcard, you must enclose the field name in single quotation marks.
... | stats stdev('*delay')
Extended example
This search uses recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.
Run the following search to find the mean, standard deviation, and variance of the magnitudes of recent quakes by magnitude type.
source=usgs place=*California* | stats count(), mean(mag), stdev(mag), var(mag) BY magType
The results look like this:
magType | count | mean(mag) | stdev(mag) | var(mag) |
---|---|---|---|---|
H | 123 | 0.549593 | 0.356985 | 0.127438 |
MbLg | 1 | 0.000000 | 0.000000 | 0.000000 |
Md | 1565 | 1.056486 | 0.580042 | 0.336449 |
Me | 2 | 1.800000 | 0.346410 | 0.120000 |
Ml | 1202 | 1.226622 | 0.629664 | 0.396476 |
Mw | 6 | 3.650000 | 0.716240 | 0.513000 |
ml | 10 | 0.934000 | 0.560401 | 0.314049 |
stdevp(<value>)
This function returns the population standard deviation of the values in a field.
Usage
You can use this function with the stats, eventstats, streamstats, and timechart commands.
A population standard deviation is different from a sample standard deviation. A population standard deviation is calculated from the entire population rather than from a sample.
For information about creating a sample standard deviation, see the stdev function.
Basic example
The following example calculates the population standard deviation of the lag field.
... | stats stdevp(lag)
sum(<value>)
This function returns the sum of the values in a field.
Usage
You can use this function with the stats, eventstats, streamstats, and timechart commands.
Examples
You can create totals for any numeric field. For example:
...| stats sum(bytes)
The results look something like this:
sum(bytes) |
---|
21502 |
You can rename the column using the AS keyword:
...| stats sum(bytes) AS "total bytes"
The results look something like this:
total bytes |
---|
21502 |
You can organize the results using a BY clause:
...| stats sum(bytes) AS "total bytes" by date_hour
The results look something like this:
date_hour | total bytes |
---|---|
07 | 6509 |
11 | 3726 |
15 | 6569 |
23 | 4698 |
sumsq(<value>)
This function returns the sum of the squares of the values in a field.
The sum of the squares is used to evaluate the variance of a dataset from the dataset mean. A large sum of the squares indicates a large variance, which tells you that individual values fluctuate widely from the mean.
Usage
You can use this function with the stats, eventstats, streamstats, and timechart commands.
Example
The following table contains the temperatures taken every day at 8 AM for a week.
You calculate the mean of these temperatures and get 48.9 degrees. To calculate the deviation from the mean for each day, subtract the mean from the temperature. If you square each deviation, you get results like this:
day | temperature | mean | deviation | square of deviation |
---|---|---|---|---|
sunday | 65 | 48.9 | 16.1 | 260.6 |
monday | 42 | 48.9 | -6.9 | 47.0 |
tuesday | 40 | 48.9 | -8.9 | 78.4 |
wednesday | 31 | 48.9 | -17.9 | 318.9 |
thursday | 47 | 48.9 | -1.9 | 3.4 |
friday | 53 | 48.9 | 4.1 | 17.2 |
saturday | 64 | 48.9 | 15.1 | 229.3 |
Take the total of the squared deviations, 954.9, and divide by 6, which is the number of days minus 1. This gives you the sample variance for this series of temperatures, about 159.1. The standard deviation is the square root of the variance. The larger the standard deviation, the larger the fluctuation in temperatures during the week.
You can calculate the mean, sum of the squares, and standard deviation with a few statistical functions. Let's use a dataset literal so that you can try this on your own:
FROM [{day: "sun", temp: 65}, {day: "mon", temp: 42}, {day: "tue", temp: 40}, {day: "wed", temp: 31}, {day: "thu", temp: 47}, {day: "fri", temp: 53}, {day: "sat", temp: 64}] | stats mean(temp), sumsq(temp), stdev(temp)
This search returns these results:
mean(temp) | sumsq(temp) | stdev(temp) |
---|---|---|
48.857142857142854 | 17664 | 12.615183595289349 |
upperperc(<value>,<percentile>)
This function returns an approximate upper bound percentile value of a numeric field that contains more than 1000 values.
The upperperc function returns an estimate of where the top N% starts. For example, a 95th percentile says that 95% of the data points are below the estimate and 5% of the data points are above the estimate.
Usage
There are three percentile functions: exactperc, perc, and upperperc. For details about the differences, see the perc function.
You can use this function with the stats, eventstats, streamstats, and timechart commands.
Examples
See the perc function.
var(<value>)
Returns the sample variance of the values in a field.
Usage
You can use this function with the stats, eventstats, streamstats, and timechart commands.
Examples
Variance is often used to determine the standard deviation of a set of values.
The following table contains the temperatures taken every day at 8 AM for a week.
day | temperature |
---|---|
sunday | 65 |
monday | 42 |
tuesday | 40 |
wednesday | 31 |
thursday | 47 |
friday | 53 |
saturday | 64 |
Let's use a dataset literal so that you can try this on your own:
FROM [{day: "sun", temp: 65}, {day: "mon", temp: 42}, {day: "tue", temp: 40}, {day: "wed", temp: 31}, {day: "thu", temp: 47}, {day: "fri", temp: 53}, {day: "sat", temp: 64}] | stats mean(temp)
You calculate the mean, which is the average, of these temperatures and get 48.9 degrees.
mean(temp) |
---|
48.857142857142854 |
To calculate the deviation from the mean for each day, take the temperature and subtract the mean.
day | temp | mean | deviation |
---|---|---|---|
sunday | 65 | 48.9 | 16.1 |
monday | 42 | 48.9 | -6.9 |
tuesday | 40 | 48.9 | -8.9 |
wednesday | 31 | 48.9 | -17.9 |
thursday | 47 | 48.9 | -1.9 |
friday | 53 | 48.9 | 4.1 |
saturday | 64 | 48.9 | 15.1 |
To calculate the sample variance, square each deviation, total the squares, and divide by the number of values minus 1. Or just run this search:
FROM [{day: "sun", temp: 65}, {day: "mon", temp: 42}, {day: "tue", temp: 40}, {day: "wed", temp: 31}, {day: "thu", temp: 47}, {day: "fri", temp: 53}, {day: "sat", temp: 64}] | stats mean(temp), var(temp), stdev(temp)
This search returns these results:
mean(temp) | var(temp) | stdev(temp) |
---|---|---|
48.857142857142854 | 159.14285714285748 | 12.615183595289349 |
The larger the standard deviation the larger the fluctuation in temperatures during the week.
Extended example
See the Extended example for the mean() function. That example includes the var() function.
varp(<value>)
Returns the population variance of the values in a field.
Usage
You can use this function with the stats, eventstats, streamstats, and timechart commands.
Basic example
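The following example calculates the population variance of the lag field. This sketch mirrors the stdevp example; varp(lag) is the square of stdevp(lag):
... | stats varp(lag)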
See also
- Function information
- Overview of SPL2 stats and chart functions
- Quick Reference for SPL2 Stats and Charting Functions
- Naming function arguments in the SPL2 Search Manual