Speaking of worst-case (or best-case, average-case, etc.) scenarios, how does big O notation relate? The variables inside an O() notation as far as I know refer only to the size of the input, so when we say that finding a value in an unsorted list is in O(n), we're referring to the worst-case scenario (obviously, finding a value in a list when that value is the head of the list is constant time, and not very interesting). Of course, that's a simplistic example, but with more complex algorithms like Quicksort, when we say it's in O(n log n) we're talking about average-case. Is this just because we know that worst-case performance in Quicksort is exceedingly rare so we don't bother mentioning that O(n log n) is average-case unless we're studying it more deeply?
Big-O (or Ω or Θ) notation is orthogonal to notions of best-case or worst-case, which are merely constraints on the set of inputs we're considering. If, for example, you ask "what is the upper bound on best-case running time for linear search," then what you're actually asking is "given an input sequence that begins with the item being searched for, what is the upper bound on the running time." I think part of the confusion stems from the fact that when people supply bounds on best- or worst-case running times for deterministic algorithms, they often use Big-O notation when they're actually giving an asymptotically tight bound. So in the case of linear search's best case, O(1) is an upper bound, but the lower bound is of the same order, i.e. Ω(1). It's not incorrect, but it can be misleading.
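To make the linear-search case concrete, here's a minimal Python sketch (the function name and the comparison counter are mine, just for illustration). The best case fixes the input shape — target at the head — and on that restricted set of inputs the running time is bounded above by a constant; the worst case (target last or absent) takes n comparisons:

```python
def linear_search(seq, target):
    """Return (index, comparisons); index is None if target is absent."""
    comparisons = 0
    for i, value in enumerate(seq):
        comparisons += 1
        if value == target:
            return i, comparisons
    return None, comparisons

# Best case: the target is the head of the list -- one comparison, O(1).
_, best = linear_search([7, 1, 2, 3], 7)

# Worst case: the target is last (or absent) -- n comparisons, O(n).
_, worst = linear_search([1, 2, 3, 7], 7)
```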
In the case of Quicksort, I think the answer to your question is yeah, pretty much. I mean, cases where you're dealing with nearly-sorted input are common in some domains and in that case you should be very much aware of the worst-case behavior of Quicksort, but otherwise I think the discussion from CLRS puts things in perspective: even if you had the bad luck of only ever receiving input sequences such that every partitioning led to a 99-to-1 split, then your running time is still going to be O(n lg n), since your recursion tree is going to be at most log-base-100/99(n) levels tall, which is the same as lg n / (lg 100/99), the denominator there being a constant factor that disappears in an asymptotic context.
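As a quick sanity check on that change-of-base arithmetic (a Python sketch, nothing more): the depth of the recursion tree under repeated 99-to-1 splits is log-base-100/99 of n, which is lg n times a constant factor of 1/lg(100/99), roughly 69:

```python
import math

n = 10**6

# Depth of the recursion tree when every partition is a 99-to-1 split:
depth = math.log(n, 100 / 99)

# Change of base: log_{100/99}(n) = lg(n) / lg(100/99). The denominator
# lg(100/99) is a constant (~0.0145), so its reciprocal (~69) is just a
# constant factor that disappears in an asymptotic context.
same_depth = math.log2(n) / math.log2(100 / 99)
```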
The average case for finding a value in an unsorted list is also O(n). Assuming the values you want are randomly distributed, on average you have to look at half the list. Naively this sounds like O(n/2), but O(n/2) is actually the same as O(n) because you strip out the constant factor of 1/2.
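A small Python sketch of that claim (names are mine): averaging the comparison count over every possible target position gives exactly (n + 1) / 2, i.e. roughly half the list, and the 1/2 is exactly the constant factor that gets stripped:

```python
def comparisons_to_find(seq, target):
    """Count how many elements we inspect before finding target."""
    count = 0
    for value in seq:
        count += 1
        if value == target:
            return count
    return count

n = 100
seq = list(range(n))

# Average over every possible target position:
# (1 + 2 + ... + n) / n = (n + 1) / 2, i.e. about n/2 -- and O(n/2) = O(n).
avg = sum(comparisons_to_find(seq, t) for t in seq) / n
```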
Of course. But it's also the worst case. I'm really just wondering if there's a "default" scenario that's being referred to when we just say "f(n) is in O(n)" or does it depend on context?
It depends on context. Generally if people fail to specify, they mean average case, as in "Quicksort takes O(n*lg n) time". There may even be some amortization shenanigans thrown in there, as in "list.append takes O(1) time".
Quicksort is the exception; in my experience big-O usually refers to worst case running time as opposed to average case or expected running time. CLR(S) [1] sticks to worst case because 1) worst case is a guarantee; 2) the worst case can be frequent; 3) the average case is often roughly as bad as the worst case. This is a widely used textbook, so I generally assume a lot of people follow its conventions.
It's also not obvious what "average case" means. For example, if we're talking about the asymptotic behavior of the function
A(n) = avg { RunningTime(Algo, x) : x is valid input to Algo and len(x) = n }
are we assuming that the inputs are drawn uniformly and therefore have equal weight or are we drawing them from a non-uniform distribution? What if some inputs never occur in practice?
Worst case and best case are totally unambiguous and don't need any additional clarification. There are just so many knobs one can adjust when defining "average".
First, when we talk about Big-O we're not talking about the "worst case scenario." Big-O gives us an upper bound on the worst case scenario, but the actual worst case scenario might be better. Big-O means "no worse than" not "as bad as."
When most people say Big-O they really mean Big-Θ, which does encapsulate the idea of "asymptotically equivalent, up to a constant." See my answer on Quora for more technical details.
Second, Big-O and other forms of notation used in the asymptotic analysis of functions were invented before physical computers existed. They're statements about pure functions.
When applied to the analysis of algorithms the function we're "really" analyzing isn't the algorithm. Rather, if we have an algorithm A that takes as its input a positive integer n, we're really analyzing the function "the length of time it takes algorithm A to run given input n."
The up-to-a-constant nature of Big-O notation is nice because that constant can encapsulate things like processor speed, memory access times, and so forth. This enables us to make intelligent statements about algorithms per se without reference to the underlying machine on which the algorithm might be implemented.
Even with ideal functions, this naïve asymptotic analysis has some problems. For a toy example, imagine a spiky function like this:
f(n) = 800*n^2 if n is divisible by 1000000000
f(n) = 400*n otherwise
This function is not O(n) but it is O(n^2). The "worst case" behaves like O(n^2), but for "most inputs" it behaves like O(n). We can't say "f(n) is asymptotically no worse than n, up to a constant" because for infinitely many inputs it is worse.
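The spiky function from the text, transcribed directly into Python: on "most" inputs the ratio f(n)/n is the constant 400, but on the spikes f(n)/n = 800n, which grows without bound, so no single constant works and f is not O(n):

```python
def f(n):
    """The 'spiky' toy function from the text."""
    if n % 1_000_000_000 == 0:
        return 800 * n**2   # the spikes: quadratic
    return 400 * n          # everywhere else: linear
```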
Lots of algorithms behave like this in practice because we optimize for common cases perhaps at the expense of less common cases. "Common" is dictated by how our algorithm is used.
Taking quicksort as an example, let's call the algorithm Q. We want to measure its running time given an input of length n. For input x, let's say its running time is T(x).
Well, there are many x such that len(x) == n, so what does it even mean to say "its running time given an input of length n?" Are we given a particular input of length n? A uniformly-selected-but-random input of length n? To the extent that we can, we want to be making statements about the algorithm per se, not statements about the algorithm given a particular input.
One way to answer this is to ask "Given an input of length n, what's the most time my algorithm could take?" In that case we're analyzing the following function:
W(n) = max { T(x) | x is valid input and len(x) == n }
On the other hand, maybe we care more about the average case. Perhaps we randomly pick 1,000 inputs of length n and average the running time. Now we're talking about something that looks more like a probability distribution than a discrete thing like "running time" because we've sampled the input space.
And in fact, we could calculate "expected running time given input n" in this way and graph that. We could then make Big-O like statements about that new function, which is the kind of thing folks mean when they talk about "average case."
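Here's a sketch of that sampling idea in Python. I'm using a simple first-element-pivot quicksort and counting comparisons as a machine-independent proxy for running time; all the names here are mine:

```python
import random

def quicksort_comparisons(seq):
    """Count comparisons made by a simple first-element-pivot quicksort."""
    if len(seq) <= 1:
        return 0
    pivot, rest = seq[0], seq[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return len(rest) + quicksort_comparisons(left) + quicksort_comparisons(right)

random.seed(0)
n, samples = 100, 1_000

# Randomly pick 1,000 inputs of length n and average the "running time"
# (comparison count): an empirical estimate of the expected running time A(n).
estimate = sum(
    quicksort_comparisons(random.sample(range(n), n)) for _ in range(samples)
) / samples
```

For n = 100 the estimate lands near the theoretical ~1.39 n lg n comparisons, well below the n(n-1)/2 worst case.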
It sounds to me like you're mixing up two things: "worst case performance" and "asymptotic performance." You say
> Big-O is concerned with worst case performance. Colloquially, f∈O(g) means that "Eventually, f performs no worse than g."
Aren't these two different concepts? Worst case performance deals with the worst possible input of any given input size (like pathological inputs to Quicksort), while asymptotic performance is talking about sufficiently large inputs (like the vertical lines you've drawn on the graphs).
When I say that merge sorting a set of n elements is in O(n lg n), I'm saying that there's some value on the n axis beyond which n lg n >= MergeSort(n). But when I say that Quicksort is O(n^2) in the worst case, it's as if I'm talking about another function called WorstQuicksort which when given a set of n items always takes as long as Quicksort would take to sort the most pathological set of n items, and there is some value on the n axis beyond which n^2 >= WorstQuicksort(n).
Asymptotic analysis only makes sense in the context of functions whose inputs are real numbers (possibly integers) and whose outputs are real numbers. When we want to do asymptotic analysis of algorithms we need to derive functions amenable to this analysis and for a given algorithm there is more than one derived function we might care to look at.
But in any Big-O situation the person is always, always, always talking about a function from the real line to the real line, with perhaps many layers of confusion and equivocation in between. You should be able to suss out what function they're "really" talking about.
Phrases like "worst case", "average case", and "best case" are shorthand for three of the most common derived functions.
If you want, think of a function T which takes as its input an algorithm and an input to that algorithm and returns its running time.
T(QuickSort)(x) = running time of QuickSort given x as its input
then
W(QuickSort)(n) = max { T(QuickSort)(x) : length(x) == n }
We're then in the business of doing asymptotic analysis on W(QuickSort), A(QuickSort), and B(QuickSort).
Let's be precise. I'm being more precise here in my comment than I was on Quora.
Big-O and related notations are ways of categorizing functions. O(n^2) for example is actually a set of functions, which is why I wrote f ∈ O(n^2) rather than something like f = O(n^2) or f(n) = O(n^2). That is, f is a member of some set of functions which all satisfy a particular, precisely-defined property.
To understand that property, first, let's get rid of the idea of "performance" because asymptotic analysis has nothing to do with "performance" per se and predates even the first precise definitions of things like "algorithm" or "computability." The notation itself was invented in the late 19th century.
Instead, let's just talk about "upper bounds." If we have a function it's easy to talk about upper bounds. For example,
f(x) = sin(x)
is bounded above by 1, 1.5, 10, 100, 80457, and an infinitude of other numbers for any real number x. It's bounded below by -1.
Now, in this case, it's easy for us to see that not only is
sin(x) <= 1 for all real x
but also that
max { sin(x) : x is real } = 1
So in this sense the upper bound of 1 is strict. 2 is also an upper bound in the sense that
sin(x) <= 2 for all real x
but it's not strict. There are other upper bounds which are strictly smaller than 2, e.g., 1.5. So, we can say that "the value of sin(x) for real x is no greater than 2," but we can't say that it "is" 2.
So, to answer your point before diving deeper, Big-O is about "worst case performance" in this sense. By itself it doesn't tell us what the worst case performance is. Instead, it gives us an upper bound on the worst case performance. It says "the worst case performance is no worse than FOO." The actual worst case performance might be better.
Big-Θ is the asymptotic equivalent to "this is a tight upper bound."
I'll skip further development of this for now and jump back to the issue of algorithms. The issue is this: given an algorithm with input of length N, we want to say something about how long it takes to run.
This means that the function we're analyzing isn't "QuickSort(n)". What does that even mean? The input of QuickSort is an array of integers and it returns a sorted array of integers. How can an array of anything be greater than or equal to n^2? So that's one way in which the CS vernacular equivocates -- we're not really talking about QuickSort, we're talking about some other function:
T(n) = the amount of time it takes QuickSort to run given an input of length n
We're then talking about bounds on this other function T, asymptotic or otherwise.
But now we're in a pickle because what does "the amount of time it takes QuickSort to run given an input of length n" mean? There are many inputs of length n. If we're talking about just arrays of integers of length n, there are n! if all we care about is relative ordering and not the actual values in the array. If we care about the actual values in the array then there are an infinitude of inputs of length n.
There are a few ways we can handle this. Let's re-define T like so:
T(x) = the amount of time it takes QuickSort to run given input x
One way is the "worst case" method. This says, ok, look at this function:
W(n) = max { T(x) : x is a valid input to QuickSort and len(x) == n }
We can now do Big-O, bounds, asymptotic analysis, etc. on W(n). This is what we mean when we say the worst case is O(n^2). It means W ∈ O(n^2).
Another way is the "average case" method. This says, ok, look at this function:
A(n) = avg { T(x) : x is a valid input to QuickSort and len(x) == n }
This is tricky if there are in principle an infinite number of valid inputs of a given length. There are various ways of handling this issue. For something like QuickSort we can see that it's really only the ordering that matters, i.e., for the purposes of QuickSort [1,10,5] is the same operation-wise as [-50, 80, 0], so there are only n! inputs we really need to check for a given n.
Yet another way is the "best case" method, which looks at
B(n) = min { T(x) : x is a valid input to QuickSort and len(x) == n }
So, given an algorithm we can derive these three functions and then answer Big-O questions about them. We're never answering Big-O questions about the algorithm per se, although we can get away with equivocating when W(n), A(n), and B(n) are always equal or it's obvious we only care about one of them.
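For tiny n we can actually compute all three derived functions by brute force over the n! orderings. A Python sketch, using a first-element-pivot quicksort and comparison counts as the proxy for T(x); the function names W, A, B match the text, the rest are mine:

```python
from itertools import permutations

def quicksort_comparisons(seq):
    """T(x): comparisons made by a first-element-pivot quicksort on x."""
    if len(seq) <= 1:
        return 0
    pivot, rest = seq[0], seq[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return len(rest) + quicksort_comparisons(left) + quicksort_comparisons(right)

def W(n):
    """Worst case: max of T(x) over all n! orderings of length n."""
    return max(quicksort_comparisons(list(p)) for p in permutations(range(n)))

def A(n):
    """Average case: mean of T(x) over all n! orderings of length n."""
    counts = [quicksort_comparisons(list(p)) for p in permutations(range(n))]
    return sum(counts) / len(counts)

def B(n):
    """Best case: min of T(x) over all n! orderings of length n."""
    return min(quicksort_comparisons(list(p)) for p in permutations(range(n)))
```

For n = 5, W(5) = 10 = n(n-1)/2 (a sorted input degenerates every partition), B(5) = 6 (every split is as even as possible), and A(5) sits strictly between them.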
For simple examples this is often the case, e.g., calculating the nth Fibonacci number in the standard iterative way has best, average, and worse case performance of O(n).
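The Fibonacci example in Python: the loop runs exactly n times no matter what, so W(n), A(n), and B(n) coincide and the equivocation is harmless:

```python
def fib(n):
    """n-th Fibonacci number, standard iterative method. The loop always
    runs exactly n iterations, so best, average, and worst case all O(n)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```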
To make matters worse, most people say Big-O but mean Big-Θ, or at the very least aren't clear when they mean one or the other. So, when one says "worst case performance" and we have W(n), A(n), and B(n) all being the same, it can be particularly confusing.
Depending on the algorithm in question, it might be understood that we care about one of these more than the others. For example, if worst case inputs are particularly pathological we might talk as if we mean the performance of the algorithm per se but really be talking about A(n). However, if "bad" inputs are common we might really be talking about W(n).
f∈Θ(g) is equivalent to f∈O(g)∧g∈O(f). This isn't really a "tight upper bound." For your spiky function f above, we have f∈O(n^2) and f∈Ω(n). These bounds are tight in the sense that there are no strictly tighter bounds. That is, there exists no function g with f∈O(g)∧g∈o(n^2) (there is no function asymptotically smaller than n^2 that bounds f from above). So for this f one could reasonably consider n^2 to be a "tight upper bound", although one would need to pick a spiky bounding function to say anything involving f and Θ.
The problem of having an infinite number of inputs of size n is usually not a problem, because there are finitely many bit strings of each length. If an algorithm uses subroutines that act on input objects of unbounded length (like comparing arbitrary-precision integers), and you are only interested in the bound for the number of subroutine calls, then there might be some trouble with the notion of an average case across a fixed input size. This is a bit silly though; it's more a way to define "fixed input size" into describing "unboundedly large input size" than something I would actually want to do for some useful purpose.
Er, well, I was using the function as an example of something which is f∈O(n^2) but not f∈Θ(n^2) under the assumption that when the OP reads the symbols "f∈O(n^2)" he's really thinking something more like f∈Θ(n^2).
Right, yes, that's the main thing to understand and it's also the thing that most explanations obscure. I tried to be clear about that, but wasn't clear enough. See my follow-up comment.