The Basics of Big-O and Sorting Algorithms
Big-O notation describes the asymptotic behavior of a function as its input grows. Usually we are describing the runtime of an algorithm, but it can also describe space complexity or even systems outside the realm of computer science. Here I will assume it describes the runtime of an algorithm unless stated otherwise. Asymptotic analysis focuses on how the runtime grows as the input, usually denoted n, approaches infinity. As our datasets get larger, it is the growth function that becomes the dominating factor in runtime. For example, O(n) means the runtime grows linearly with the input n, while O(n^2) means it grows proportionally to the square of the input size. With large datasets, this is usually enough information to tell you which algorithm will be faster.
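To make the linear-versus-quadratic contrast concrete, here is a minimal sketch (the function names are my own) that counts the basic operations each growth class performs on the same input:

```python
def linear_ops(data):
    """O(n): one pass over the input."""
    ops = 0
    for _ in data:
        ops += 1
    return ops

def quadratic_ops(data):
    """O(n^2): visit every pair of elements."""
    ops = 0
    for i in range(len(data)):
        for j in range(len(data)):
            ops += 1
    return ops

# At n = 1000 the quadratic version already does 1000x the work.
data = list(range(1000))
print(linear_ops(data))     # 1000
print(quadratic_ops(data))  # 1000000
```

Doubling the input doubles the work for the first function but quadruples it for the second, which is exactly the gap the notation is built to expose.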
When you analyze asymptotic characteristics of a function, you want to discount added constants, as well as drop coefficients and lower-order terms. For example, f(n) = 100 * n * lg(n) + n + 10000 is described as O( n * lg(n) ), because as n goes to infinity, the constant and the coefficient become insignificant. Check these graphs out:
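You can see the constant and lower-order terms wash out numerically. This quick sketch evaluates the example function f(n) = 100 * n * lg(n) + n + 10000 and shows that its ratio to n * lg(n) settles toward the coefficient 100 as n grows:

```python
import math

def f(n):
    # f(n) = 100 * n * lg(n) + n + 10000
    return 100 * n * math.log2(n) + n + 10000

# The ratio f(n) / (n * lg n) approaches the constant 100,
# so the "+ n" and "+ 10000" terms stop mattering as n grows.
for n in (10, 1_000, 1_000_000):
    print(n, f(n) / (n * math.log2(n)))
```

At n = 10 the ratio is still around 400 because the +10000 dominates, but by n = 1,000,000 it is within a tenth of 100, which is why we describe f(n) simply as O(n * lg(n)).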

As you can see, with small datasets big-O notation may not tell the whole story, but even going from 50 to 500 data points in the graph above is enough for the asymptotic nature of the functions to take over.
Certainly, there are merits to optimizing code within a given asymptotic runtime to maximize performance, but that should be done only after an algorithm with the proper asymptotic runtime is in place.
Although colloquially "big-O" is often used to mean a tight asymptotic bound (i.e., one that describes both the upper and lower bound), this isn't actually accurate. The formal definitions of asymptotic notation are briefly described below [1]:
Big ϴ (theta) :
Formally, for ϴ( g(n) ) to describe a function f(n), there must exist positive constants c1, c2, and n_0 such that 0 ≤ c1 * g(n) ≤ f(n) ≤ c2 * g(n) for all n ≥ n_0. In other words, f(n) is sandwiched between two constant multiples of g(n), so g(n) is both an upper and a lower bound on f(n).
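The sandwich condition can be spot-checked numerically. In this sketch I reuse the earlier example f(n) = 100 * n * lg(n) + n + 10000 and pick witness constants c1 = 100, c2 = 102, n_0 = 1000 (my own choices; many others work, and sampling a range of n is a sanity check, not a proof):

```python
import math

def f(n):
    # The example function from earlier in the article.
    return 100 * n * math.log2(n) + n + 10000

def g(n):
    # Candidate tight bound: g(n) = n * lg(n).
    return n * math.log2(n)

# Witness constants for f(n) = Theta(n * lg n).
# These are one valid choice, not the only one.
c1, c2, n0 = 100, 102, 1000

# Check 0 <= c1*g(n) <= f(n) <= c2*g(n) on a sample of n >= n0.
assert all(0 <= c1 * g(n) <= f(n) <= c2 * g(n)
           for n in range(n0, 100_000, 997))
print("sandwich condition holds on the sampled range")
```

The lower bound holds because the extra terms n + 10000 are positive, and the upper bound holds once n is large enough that those terms contribute less than 2 * n * lg(n).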