Hypothesis testing has some similarities with one of the simplest methods of proof from mathematical logic: proof by contradiction. In fact, hypothesis testing can be thought of as the probabilistic counterpart of proof by contradiction.
Let us refresh our memories first about this method. How does proof by contradiction work?
Let’s say we want to prove a statement, call it A. When we use proof by contradiction, we start by assuming the opposite of A (denoted ~A or not A) to be true, and then we try to reach a contradiction (a statement that is always false). If we reach a contradiction, then our assumption that ~A is true must be false, and therefore A must be true.
For example, let’s prove by contradiction that there is no smallest positive rational number. Now A is the statement that we want to prove: “there is no smallest rational number greater than 0”. We start by assuming the opposite of A being true, ~A: “there exists a rational number greater than 0 that is the smallest, call it r”. But if r is a rational number > 0, this means that r/2 is also a rational number > 0. Moreover, r/2 is smaller than r. But this contradicts our assumption, ~A, that r is the smallest rational number > 0. This means that ~A is false, and therefore, A is true.
A downside of this method, and of pure mathematical proofs in general, is that it cannot be directly applied in many real-world situations. Let’s say you want to show that a certain type of medical treatment is effective. Can you use a pure mathematical proof to show that? No. There are too many unknowns and too much uncertainty in such situations to be able to show things in a purely mathematical way; that is to prove with no doubt that a certain thing is true.
But you can show that a certain thing has some probability of being true. That’s what statistics does, and there is a method in statistics called hypothesis testing that, in my opinion, borrowed the idea of proof by contradiction and added a probabilistic dimension to it.
The idea of hypothesis testing is similar to that of proof by contradiction, but this time ~A is called the “null hypothesis” and denoted by H₀, and A is called the “alternative hypothesis” and denoted by H₁. Reaching a contradiction (that is, obtaining a result that cannot be true) becomes obtaining a result that can be true only with low probability. How low should this probability be? An often-used value is 0.05, but it can also be something else, preferably smaller, depending on how confident you want to be in your result; this threshold value is called the significance level and is denoted by α. So, if the probability of your observed data given the null hypothesis is ≤ α, then you consider this probability too low for H₀ to be true; you reject H₀ and therefore accept H₁. In hypothesis testing, instead of proving H₁ outright (that is, with 100% confidence, as in a pure mathematical proof), you are 1−α confident that H₁ is true.
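This logic can be sketched with a small simulation. In this hypothetical example (the numbers 100 and 62, and the random seed, are made up purely for illustration), we estimate a one-sided p-value for a possibly biased coin and compare it against α:

```python
import random

# Hypothetical example: we flip a coin 100 times and observe 62 heads.
# H0 (null hypothesis): the coin is fair, P(heads) = 0.5.
# H1 (alternative):     the coin is biased towards heads.
random.seed(0)

n_flips, observed_heads = 100, 62
alpha = 0.05

# Estimate the p-value by simulation: how often does a FAIR coin
# produce a result at least as extreme as the one we observed?
n_trials = 20_000
extreme = sum(
    sum(random.random() < 0.5 for _ in range(n_flips)) >= observed_heads
    for _ in range(n_trials)
)
p_value = extreme / n_trials

if p_value <= alpha:
    print(f"p = {p_value:.4f} <= {alpha}: reject H0, accept H1")
else:
    print(f"p = {p_value:.4f} > {alpha}: fail to reject H0")
```

Note that even after rejecting H₀ we have not *proved* the coin is biased; we have only found the data too improbable under the assumption of fairness.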
Note that we can accept only H₁, not H₀. Just because we may fail to reject H₀ sometimes, this does not mean that we accept it. As in the case of proof by contradiction, just because we may not have a good idea of how to reach a contradiction once we assumed ~A to be true, this does not necessarily mean that ~A is true.
Concluding that H₀ is true based on the initial assumption that H₀ is true would be a paradox, therefore we can only reject H₀.
This is my opinion regarding these two methods, which may seem very distinct at a glance, as they are applied in vastly different situations, but are in fact based on similar logical ideas.
I hope you found this story interesting and thanks for reading!
A theorem by two mathematical greats, which anybody, no maths experience required, can understand! An amazing and beautiful result accompanied by some pretty hand-coded visualisations!
Regardless of your experience with math, this theorem can be understood, and the elegance of the solution appreciated. Throughout, I will use diagrams to explain concepts where possible. Where I use algebraic notation like ‘x’ and ‘y’, I will also give concrete examples using real numbers, so you can understand what is going on even if you aren’t comfortable with algebra.
Simple diagrams to explain complicated language
Before I state the problem, I want to build some understanding. That’s because the terms can be confusing to people who haven’t done much math, but by taking a little more time we can understand the problem without previous knowledge of any complicated language. If you are familiar with the terms used (in bold), then you can skip to the next section.
Sequence
An example sequence of numbers
First, we learn what a sequence of numbers is. (1,2,3,4,5) is a sequence of numbers. (2,9,-1,23,12,24,-5) is another sequence of numbers. A sequence of numbers is simply a list of numbers, where you can ask what the first number is, what the second number is, and so on. For example, in the sequence (2,9,-1,23,12,24,-5), the first number is 2, the second number is 9, the third number is -1, the fourth number is 23, the fifth number is 12, the sixth number is 24 and the seventh number is -5. Then the sequence ends.
(2,-1,24,-5) is a subsequence of (2,9,-1,23,12,24,-5)
Second, we learn what a subsequence of numbers is. A subsequence of numbers takes a sequence, and then only looks at some numbers in the sequence. This is like going shopping in a supermarket: you don’t have to take every item in an aisle, you can instead go down the aisle, and take a couple of elements out. For example, a subsequence of (1,2,3,4,5) is (1,4,5); a subsequence of (2,9,-1,23,12,24,-5) is (2,-1,24,-5). In each case, we simply chose a few numbers, from left to right, from the original sequence.
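For programmers, the “from left to right” rule is easy to express in code. A minimal sketch (the function name is my own):

```python
def is_subsequence(sub, seq):
    """Check whether `sub` can be obtained from `seq` by deleting some
    elements while keeping the remaining ones in their original order."""
    it = iter(seq)
    # For each element of `sub`, scan forward in `seq` until we find it;
    # the shared iterator enforces the left-to-right order.
    return all(any(x == y for y in it) for x in sub)

print(is_subsequence((1, 4, 5), (1, 2, 3, 4, 5)))                   # True
print(is_subsequence((2, -1, 24, -5), (2, 9, -1, 23, 12, 24, -5)))  # True
print(is_subsequence((4, 1), (1, 2, 3, 4, 5)))                      # False: order matters
```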
(2, -1, -5) is a decreasing subsequence
Finally, we learn what an increasing subsequence, and decreasing subsequence are. (1,3,5) is an increasing subsequence of (1,2,3,4,5), as 1≤3≤5. A decreasing subsequence of (2,9,-1,23,12,24,-5) is (2,-1,-5), as 2≥-1≥-5.
(1,3,5) is an increasing subsequence
Note that while some subsequences are increasing, and some are decreasing, others are neither. Consider the sequence (1,2,3,4,5,0). Then (1,4,0) is a subsequence which is neither increasing (as 4≥0, and 0 comes after 4 in the sequence), nor decreasing (as 1≤4, and 4 comes after 1 in the sequence).
To use our supermarket analogy, if you go down the aisle, and every new item you pick is more expensive than the last, then you have picked an increasing sequence. If you go down the aisle, and every new item is less expensive than the last, then you have picked a decreasing sequence. And, finally, if you go down the aisle, and sometimes the next item you pick is more expensive, and sometimes less expensive, then the subsequence you have picked is neither increasing, nor is it decreasing.
Statement of the Theorem
Suppose we have the numbers 1,2,3,…,N, but arranged in any order. For example, if N=5, we have the numbers 1,2,3,4,5 and we allow them to be arranged in any order, such as (3,4,1,2,5).
An example of arranging (1,2,3,4,5) in a random order
The Erdos-Szekeres theorem tells us that there is either an increasing subsequence of length at least root N, or a decreasing subsequence of length at least root N.
Suppose N = 3² = 9. Then the Erdos-Szekeres theorem tells us that if we have the numbers (1,2,3,…,9) arranged in any order, we can find either an increasing subsequence of length at least 3, or a decreasing subsequence of length at least 3. In the sequence (9,3,2,6,1,4,5,8,7) you can check for yourself that this is indeed the case; e.g. (2,4,5,7) is an increasing subsequence of length 4. A visual illustration of this example is below.
What if N isn’t a perfect square? Then we look at the square root rounded up. If N = 50, then the square root of 50 is about 7.07, so we round upwards to 8, and the theorem tells us that we are guaranteed either an increasing subsequence of length at least 8, or a decreasing subsequence of length at least 8. However, we will only look at the case where N is a perfect square because the logic is essentially the same in both cases, just the algebra is a little messier when N is not a square number.
Proof
For each number in the sequence, consider the longest increasing subsequence which ends at that number, and longest decreasing subsequence which ends at that number.
Consider two points in the sequence, x and y, with x earlier in the sequence than y. If x is larger than y, i.e. x>y, then for any decreasing sequence which ends at x, we can make it longer by making it end at y. This remains a decreasing sequence, as y is smaller than x. The diagram below illustrates this
For instance, if x=10, and y=5, the decreasing sequence (16,14,13,11,10) which ends at x, can be made longer by including y on the end, (16,14,13,11,10,5), and this remains a decreasing subsequence. This guarantees that the longest decreasing subsequence ending at y is longer than the longest decreasing subsequence ending at x, provided x>y.
y is smaller than x; therefore we can extend a decreasing subsequence ending at x to include y.
If y is larger than x, i.e. y>x then we get a very similar result. If we have an increasing subsequence ending at x, we can make it longer by including y, which remains an increasing subsequence as y>x. This guarantees that the longest increasing subsequence ending at x is shorter than the longest increasing subsequence ending at y, provided y>x.
y is greater than x: therefore we can extend an increasing subsequence ending at x.
Note, either y>x or x>y, as all the numbers in our sequence are different.
We will be using the following notation in the next sections, so please read the next two paragraphs carefully.
Let Inc[x] denote the length of the longest increasing subsequence ending at x, and let Dec[x] denote the length of the longest decreasing subsequence ending at x. We use the same notation for y. E.g. in (1,2,3,0,4), Inc[4] = 4, as the longest increasing subsequence ending at 4 is (1,2,3,4), which has length 4. To help you remember, ‘Inc’ is an abbreviation of ‘Increasing’, and ‘Dec’ is an abbreviation of ‘Decreasing’.
It is impossible that both Inc[x] = Inc[y] and Dec[x] = Dec[y]. This is because, if x>y then Dec[x] < Dec[y], and if x<y then Inc[x] < Inc[y]. As shown above, when x>y we can create a longer decreasing subsequence ending at y than at x, and when x<y we can create a longer increasing subsequence ending at y than at x.
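The bookkeeping in this argument is easy to carry out by computer. Here is a sketch (the function name is my own) that builds the Inc and Dec tables for the example sequence from earlier, and checks the key fact that no two positions share the same (Inc, Dec) pair:

```python
def inc_dec_tables(seq):
    """For each position i, compute the length of the longest increasing
    (Inc) and longest decreasing (Dec) subsequence ending at seq[i]."""
    n = len(seq)
    inc = [1] * n
    dec = [1] * n
    for i in range(n):
        for j in range(i):
            if seq[j] < seq[i]:          # can extend an increasing run
                inc[i] = max(inc[i], inc[j] + 1)
            if seq[j] > seq[i]:          # can extend a decreasing run
                dec[i] = max(dec[i], dec[j] + 1)
    return inc, dec

seq = (9, 3, 2, 6, 1, 4, 5, 8, 7)
inc, dec = inc_dec_tables(seq)

# The key step of the proof: no two positions share the same (Inc, Dec) pair.
pairs = list(zip(inc, dec))
assert len(set(pairs)) == len(pairs)

# With 9 numbers there are 9 distinct pairs, so if every Inc and Dec value
# were at most 2, only 2 * 2 = 4 pairs would be available: a contradiction.
print(max(inc), max(dec))  # at least one of these is >= 3
```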
G.H. Hardy (1877–1947) and David Hilbert (1862–1943), after whom Hardy and Hilbert spaces are named
The Hardy space has three equivalent definitions. Each of them emphasizes a different aspect of the space and all three of these aspects play a role in our discussion. But we will follow here the easiest approach so that the beginner can understand its logic thoroughly.
Let D be the open unit disc in the complex plane and T the unit circle in the complex plane. Then the Hardy-Hilbert space H²(D) is the set of all functions analytic on D whose Taylor coefficients are square-summable. It is a Hilbert space which is isomorphic to the sequence space l², the space of all complex sequences which are square-summable.
It can also be seen as the closed subspace of the Lebesgue space L²(T) on the unit circle consisting of those functions whose Fourier coefficients with negative indices vanish. The Hardy-Hilbert space is a reproducing kernel Hilbert space: for every point of D there is a unique vector in H²(D), the reproducing kernel at that point, such that the value of any function in H²(D) at that point is equal to the inner product of the function with that kernel. The proper definition is given in the following:
In the above definition, if we replace X by D, then we get the standard Hardy-Hilbert space on the unit disc.
It is important to note that not all analytic functions are in H²(D). Take, for example, f(z) = 1/(1−z), which is analytic on D but does not belong to H²(D).
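This can be illustrated numerically (a sketch, not a proof; the helper name is my own). The Taylor coefficients of 1/(1−z) are all 1, so their squares sum to infinity, while the coefficients of −log(1−z) are 1/n, whose squares sum to π²/6:

```python
# Numerical sketch: compare partial sums of the squared Taylor
# coefficients of two functions analytic on the unit disc D.
#
#   f(z) = 1/(1 - z)    has coefficients a_n = 1    -> sum of squares diverges
#   g(z) = -log(1 - z)  has coefficients a_n = 1/n  -> sum of squares converges
#
# So f is NOT in H^2(D), while g is.

def partial_sum_sq(coeffs, n_terms):
    """Partial sum of squared coefficients."""
    return sum(c * c for c in coeffs[:n_terms])

N = 10_000
f_coeffs = [1.0] * N                            # a_n = 1
g_coeffs = [1.0 / n for n in range(1, N + 1)]   # a_n = 1/n

for n in (10, 100, 1_000, 10_000):
    print(n, partial_sum_sq(f_coeffs, n), round(partial_sum_sq(g_coeffs, n), 4))
# The sums for f grow without bound; the sums for g approach pi^2/6 = 1.6449...
```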
We introduce the class of composition operators, which are operators induced by analytic functions mapping the unit disc into itself. We discuss some of the main properties of such operators. For this topic we follow the book “An Introduction to Operators on the Hardy-Hilbert space”.
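To make the definition concrete, here is a small numerical sketch of the composition operator C_φ f = f ∘ φ acting on truncated Taylor series, together with a check of its linearity. The helper and the choice φ(z) = z/2 are my own (chosen because φ maps the disc into itself); this is an illustration, not part of the formal theory:

```python
def compose(f, phi, n):
    """Taylor coefficients of f(phi(z)) up to degree n.
    f and phi are coefficient lists, constant term first."""
    result = [0.0] * (n + 1)
    power = [1.0] + [0.0] * n            # coefficients of phi(z)**0 = 1
    for a in f[: n + 1]:
        for i, c in enumerate(power):
            result[i] += a * c
        # multiply `power` by phi, truncating at degree n
        new = [0.0] * (n + 1)
        for i, c in enumerate(power):
            for j, d in enumerate(phi):
                if i + j <= n:
                    new[i + j] += c * d
        power = new
    return result

phi = [0.0, 0.5]          # phi(z) = z/2, an analytic self-map of the disc
f = [1.0, 2.0, 3.0]       # f(z) = 1 + 2z + 3z^2
g = [0.0, 1.0, 0.0, 4.0]  # g(z) = z + 4z^3

n = 5
f_plus_g = [a + b for a, b in zip(f + [0.0], g)]      # coefficients of f + g
lhs = compose(f_plus_g, phi, n)                       # C_phi(f + g)
rhs = [a + b for a, b in zip(compose(f, phi, n),
                             compose(g, phi, n))]     # C_phi f + C_phi g
print(all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs)))  # True: linearity
```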
There are several important points about this operator. First, one can check easily that the composition operator is a linear operator. It is not obvious, however, that the image of a composition operator belongs to H²(D). The product of two composition operators is again a composition operator; the following theorem verifies this:
We now state a result which shows that the composition operator on H²(D) is a well-defined bounded operator.
The following is an immediate corollary of the above theorem.
The theory of composition operators is also closely connected with complex analysis. Here is the subordination principle from complex analysis.
The following theorem, called Littlewood’s Subordination Theorem, shows that the image of a composition operator belongs to H²(D) = H².
The reproducing kernels defined above provide information about composition operators. The following lemma shows that the adjoint of a composition operator maps the set of reproducing kernels into the set of reproducing kernels.
The following theorem provides lower and upper bounds for the norm of a composition operator:
It is easy to see that the norm of a composition operator is 1 if and only if its symbol fixes the origin.
Characterization of Composition Operators
Composition operators are characterized as those operators whose adjoints map the set of reproducing kernels into itself.
There are other interesting characterizations of composition operators which are given in the following results.
An operator T on a Hilbert space is called normal if T*T = TT*, where T* is the adjoint of T.
There are very few normal composition operators, and they can be characterized by their symbols.
Some examples are given below.
Invertibility of Composition Operators
We know that the conformal maps of the unit disc D onto itself are the functions of the form
The following theorem provides a necessary and sufficient condition, in terms of the symbol, for the invertibility of a composition operator.
Spectra of Composition Operators
Some information about spectra of composition operators is easily obtainable in the cases in which the inducing function has a fixed point.
Open Problem
What is the adjoint of the composition operator whose symbol is an arbitrary analytic function mapping D into D?
Arguably the most important unsolved problem in pure mathematics today is to prove (or disprove) the Riemann hypothesis, which is intimately connected to the distribution of prime numbers. One of the basic techniques needed to understand the problem is called analytic continuation, which is the topic of this article. Analytic continuation is a technique from a branch of mathematics called complex analysis used to extend the domain over which a complex analytic function is defined.
Figure 1: Illustration of the application of the analytic continuation technique to the natural logarithm (imaginary part) (source).
Before introducing the technique, I will briefly explain some important mathematical concepts that will be needed.
Taylor Series
Suppose we want to find a polynomial approximation to some function f(x). A polynomial is a mathematical expression formed from variables and coefficients. Polynomials involve only the basic operations (addition, subtraction, and multiplication) and contain only non-negative integer exponents of the variables. A polynomial in one variable x of degree n can be written as:
Equation 1: Polynomial in one variable x and degree n.
Figure 2: Graphs of polynomials of degree 3 and 4 (source).
Now suppose we allow the expression to have infinitely many terms (it is given by an infinite sum). Such infinite sums are called Taylor series (or Taylor expansions). Taylor series represent functions as infinite sums of polynomial terms. Each term of the series is evaluated from the values of the derivatives of f(x) at one single point (around which the series is centered). Formally, a Taylor series around some number a is given by:
Equation 2: A Taylor series of a function f(x) around a number a.
where the upper indices (0), (1), … indicate the order of the derivative of f(x) at x=a. One can approximate a function using a polynomial with only a finite number of terms of the corresponding Taylor series. Such polynomials are called Taylor polynomials. In the figure below, several Taylor polynomials for the function f(x) = sin x with an increasing number of terms (hence, increasing degrees) are shown.
Figure 3: Taylor polynomials with an increasing number of terms are shown. The black curve is the function sin(x). The other approximations are Taylor polynomials of degree 1, 3, 5, 7, 9, 11, and 13 (source).
The first four Taylor polynomials for f(x) = sin x are given by:
Equation 3: Taylor polynomials for f(x) = sin x with degrees 1, 3, 5 and 7. They are plotted in the figure above (together with higher-order expansions).
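These Taylor polynomials are easy to evaluate in code. A short sketch (the function name is my own) that computes the approximations of sin x at x = 1 for increasing degrees:

```python
import math

def sin_taylor(x, degree):
    """Taylor polynomial of sin(x) around a = 0, keeping the odd-power
    terms x - x^3/3! + x^5/5! - x^7/7! + ... up to the given degree."""
    return sum(
        (-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
        for k in range(degree // 2 + 1)
        if 2 * k + 1 <= degree
    )

x = 1.0
for degree in (1, 3, 5, 7):
    print(degree, sin_taylor(x, degree))  # approximations improve with degree
print("exact:", math.sin(x))
```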
Convergence
The concept of convergence of infinite series will also be crucial in our discussion of analytic continuation. A mathematical sequence is a list of elements (or objects) with a particular order. They can be represented as follows:
Equation 4: Infinite sequence of numbers.
A well-known example of a sequence is the Fibonacci sequence 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, …, where each number is the sum of the two preceding ones.
Figure 4: A tiling with squares that have side lengths equal to successive Fibonacci numbers (source).
One builds a series by taking partial sums of the elements of a sequence. The sequence of partial sums can be represented by:
Equation 5: Infinite sequence of partial sums.
where:
Equation 6: The values of the partial sums in Eq. 5.
An example of a series, the familiar geometric series, is shown below. In a geometric series, the common ratio between successive elements is constant. For a ratio equal to 1/2 we have:
Equation 7: The geometric series with common ratio = 1/2.
Fig. 5 shows pictorially that the geometric series above converges to twice the area of the largest square.
Figure 5: A pictorial demonstration of the convergence of the geometric series with common ratio r=1/2 and first term a=1 (source).
A series such as in Eq. 6 is convergent if the sequence of partial sums in Eq. 5 approaches some finite limit. Otherwise, the series is said to be divergent. An example of a convergent series is the geometric series in Eq. 7. An example of a divergent series is:
Equation 8: An example of a divergent series is the so-called harmonic series.
It is easy to see that the harmonic series diverges by comparing it with the integral of the curve y=1/x. See Fig.6. Since the area below the curve is entirely contained within the rectangles and the area below y=1/x is:
Equation 9: The area below the curve y=1/x shown in Fig. 6.
the total area of the rectangles must be infinite as well.
Figure 6: Comparison between the harmonic series and the area below the curve y=1/x proves that the harmonic series diverges.
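The contrast between the two behaviours is easy to see numerically (the helper names are my own). Partial sums of the geometric series with ratio 1/2 approach 2, while partial sums of the harmonic series keep growing (roughly like ln n) and never settle down:

```python
def geometric_partial(n):
    """Partial sum 1 + 1/2 + 1/4 + ... with n terms."""
    return sum(0.5 ** k for k in range(n))

def harmonic_partial(n):
    """Partial sum 1 + 1/2 + 1/3 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

for n in (10, 100, 1_000, 10_000):
    print(n, geometric_partial(n), harmonic_partial(n))
# The geometric sums stabilise at 2; the harmonic sums keep increasing.
```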
The geometric series is, in general, the sum of successive powers of a given variable x (see Fig. 5). More concretely, consider the following geometric series where the first term is 1 and the common ratio is x:
Equation 10: An example of a geometric series.
To find a closed form for this sum is not difficult. Just multiply both sides by x
Equation 11: Eq. 10 multiplied by x.
and subtract both equations. Most terms cancel out and we are left with:
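The subtraction leaves S − xS = 1, so the closed form is S = 1/(1 − x) whenever |x| < 1. A quick numerical spot check with truncated sums (the helper name is my own):

```python
def geometric_sum(x, n_terms=1_000):
    """Truncated geometric series 1 + x + x^2 + ... (n_terms terms)."""
    return sum(x ** k for k in range(n_terms))

# For |x| < 1 the truncated sum should match the closed form 1/(1 - x).
for x in (0.5, -0.3, 0.9):
    print(x, geometric_sum(x), 1 / (1 - x))
```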
In mathematics, especially when it comes to number theory, there are many basic things that we don’t know.
Simple questions that we can’t answer.
This can be realized by considering a few of the famous unanswered questions that we mathematicians would like to know the answer to.
A few of them are
Are there infinitely many primes p such that p+2 is also prime?
Are there infinitely many Mersenne primes?
Can every even number greater than 2 be written as a sum of two primes?
Are there infinitely many amicable numbers?
Is the Collatz conjecture true?
These questions seem deceptively simple, but these are actually some of the deepest mysteries of the field and we don’t have a clue about how to even approach them. We are in desperate need of both knowledge and machinery.
This list is only the beginning of our embarrassing lack of knowledge. In this article, we shall add some more mysteries to the list.
The theory of perfect numbers is one of the oldest in mathematics. It all started in about 300 BC when Euclid proved an amazing property about them.
Before getting into that though, we need to define them so that we are all on the same page.
Recall that a divisor of a number n is a number d such that d divides n. A proper divisor of n is defined as a divisor of n that is not n i.e. it must be strictly less than n.
Let’s consider the number 28.
28 is a divisor of 28 and 7 is a proper divisor of 28.
The proper divisors of 28 are 1, 2, 4, 7, and 14.
A perfect number is a natural (whole and positive) number such that the sum of its proper divisors is equal to itself.
For example, 6 is a perfect number because its proper divisors are 1, 2, and 3, and the sum of these is 6.
6 is the smallest perfect number and 28 is the next one. You can try adding the proper divisors of 28 listed above – you should get 28.
The next ones are 496 and 8128 and they get pretty big pretty fast.
We can also define perfect numbers using the sum of (all) the divisors.
That sum should be twice the number if it’s perfect. The sum of divisors of a number n is denoted σ(n); by that notation, a number n is perfect if and only if σ(n) = 2n, e.g. 1+2+3+6 = 12 = 2 ⋅ 6.
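This definition translates directly into code. A small sketch (function names are my own) that finds all the perfect numbers below 10,000:

```python
def sigma(n):
    """Sum of ALL divisors of n, including n itself (so sigma(6) = 12).
    Divisors are found in pairs (d, n // d) up to sqrt(n)."""
    total, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total

def is_perfect(n):
    return sigma(n) == 2 * n

perfect_numbers = [n for n in range(2, 10_000) if is_perfect(n)]
print(perfect_numbers)  # [6, 28, 496, 8128]
```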
Recall that two numbers are relatively prime if the only number dividing both of them is 1.
The sum of divisor function σ is what’s called an arithmetic function and one can prove that it is multiplicative i.e. if n and m are relatively prime, then σ(nm) = σ(n)σ(m).
The truth of this fact can be realized by simple combinatorics. Consider the number k = nm where n and m are relatively prime (i.e. n and m don’t share any non-trivial divisors). A divisor of k must then be of the form dc where d divides n and c divides m. Moreover, all the combinations of such numbers dc give us all the divisors of k. The sum of divisors of k is therefore
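The multiplicative property is also easy to spot-check by brute force (a sketch; the naive divisor sum is my own helper):

```python
import math

def sigma(n):
    """Sum of all divisors of n, including n itself."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

# sigma is multiplicative: for relatively prime n and m,
# sigma(n * m) == sigma(n) * sigma(m).
for n in range(2, 30):
    for m in range(2, 30):
        if math.gcd(n, m) == 1:
            assert sigma(n * m) == sigma(n) * sigma(m)

# A concrete instance: 4 and 9 are relatively prime.
print(sigma(4), sigma(9), sigma(36))  # 7, 13, and 7 * 13 = 91
```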
In about 300 BC Euclid showed the following important discovery.
Let’s prove this.
We are going to use the elegant proof that Euler discovered where he applied the sum of divisor function.
Before I state the proof, recall from my article about Mersenne primes that a partial sum of a geometric series has a closed-form expression. In particular,
Suppose that 2^p - 1 is a prime number, and let m = 2^(p-1)(2^p - 1). By the above discussion, we have the following:
showing that m must be a perfect number.
Note that we have used the multiplicative property of σ (it’s clear that the two factors are relatively prime: one is a power of 2 and the other is odd, so their only common divisor is 1) as well as the closed form of the geometric sum with base 2.
A small comment here: note that m must be even because of the power of 2 lying around.
Euclid was the first person to prove this fact. More than 2000 years of frustrations followed before Euler proved something astonishing.
Euler proved that all even perfect numbers are of this form.
That is, if m is an even perfect number, then m = 2^(p-1)(2^p - 1) for some prime p such that 2^p - 1 is also prime.
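The Euclid direction of this correspondence can be sketched in code (the helpers are my own; trial division is fine at this scale): generate even perfect numbers from Mersenne primes and verify each one with the σ function.

```python
def is_prime(n):
    """Trial-division primality check (fine for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def sigma(n):
    """Sum of all divisors of n, including n itself."""
    total, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total

# Euclid: if 2^p - 1 is prime, then m = 2^(p-1) * (2^p - 1) is perfect.
# Euler: every even perfect number arises this way.
perfect = []
for p in range(2, 14):
    if is_prime(p) and is_prime(2 ** p - 1):   # 2^p - 1 is a Mersenne prime
        m = 2 ** (p - 1) * (2 ** p - 1)
        assert sigma(m) == 2 * m               # verify that m is perfect
        perfect.append(m)

print(perfect)  # [6, 28, 496, 8128, 33550336]
```

Note how p = 11 is skipped: 2^11 - 1 = 2047 = 23 · 89 is not prime, so a prime exponent alone is not enough.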
I will give you a proof here:
Primes of this form are called Mersenne primes so what Euler showed was that there is a one-to-one correspondence between even perfect numbers and Mersenne primes.
A natural question is of course if there are infinitely many even perfect numbers and the answer is: we don’t know! Of course, an equivalent question is if there are infinitely many Mersenne primes which is a deep mystery.
Another “perfect” riddle is: are there any odd perfect numbers? And once again, I am sorry to disappoint you but we don’t know that either.
We do know that if there exists an odd perfect number, then it has to satisfy a parade of hard constraints, so by a heuristic argument, the answer is: we don’t think so… But that is of course not good enough.
We are also pretty certain that there are infinitely many Mersenne primes and therefore also infinitely many (even) perfect numbers, but a proof of that seems far away. We are waiting for a new Euler to prove it.
You might be wondering why we are interested in perfect numbers in the first place. Euclid was because they possess some kind of underlying dividing beauty, Euler was, probably, because it was a great challenge and he had an insight about using the sum of divisors function (or maybe he had only published 49 papers that year!). But why are we?
Well, the even perfect numbers are in some sense equivalent to Mersenne primes and Mersenne primes are used in online security, so we are interested in finding out more about them.
As mentioned above, we don’t know if we will ever run out of Mersenne primes.
At least we know that about our oil reserves. But of course Mersenne primes do not pollute, so we are not leaving a mathematical ruin for the next generations if we use them all; no need to have a bad conscience over that, should it be the case!
But knowing more about perfect numbers would almost certainly require knowing more about the “sum of divisor” function which is closely related to the Riemann zeta function in which information about the distribution of the prime numbers is encoded.
And then of course: if a famous problem has survived for more than 2000 years, then it is because we need an extraordinary idea or theory and THAT is worth waiting for.
For now, though, I will sit back and wait — hoping that a new Euler will arrive in my lifetime — even though the odds are slim… at best.
Natural numbers were created by God, everything else is the work of men — Kronecker (1823–1891).
What happens when logic goes wrong? Let’s see why the Law of the Excluded Middle is wrong (sometimes), discover foundational problems at the heart of Mathematics, before finally trying to resolve what’s going wrong.
You’ll also learn how to make sense of the negative root of -1, delve briefly into analysis, and prove that all Cretans are liars using only logic.
Armed with a greater appreciation of logic, axioms and mathematics, we will show some flaws in Western Philosophy, particularly in Immanuel Kant’s notion of a priori truths (Epilogue III). In a beautiful twist, the conception of mathematics provided will coincide with the ideas of ‘différance’ provided by Derrida (Epilogue IV). There’s also a connection between logic and determinism in Epilogue V.
A Cretan Liar, circa 600 B.C.E
It was a hot day in Crete, and the Philosopher Epimenides was making mischief. He’d found a way to prove something with logic which, well, shouldn’t be provable with logic.
As an added plus, in those days he didn’t need to drudge through peer review, nor be an unpaid research assistant in his holidays. Below is a picture of Epimenides sharing his Philosophical discoveries with a potted plant.
‘Ah yes, Mr Plant. My one true companion’
‘All Cretans are liars. Merely by making this statement, I have deduced that not all Cretans are liars!’
He proclaimed, to his potted plant. Here, by a liar, we mean someone who never tells the truth.
Well, what if his statement is true? Then we have a contradiction, because here is a Cretan who is telling the truth.
The Law of the Excluded Middle now swoops in. This Law states that a statement is either true, or its negation is true. We have just shown the statement cannot be true (as there would be a contradiction).
Wait one sec. Surely whether or not all Cretans are liars is a matter for historians and archaeologists to judge, not for us to deduce from logic alone??
This would be a funny superpower indeed. (If you want more detail on this paradox, see Epilogue II)
Unless… perhaps we made a false assumption in the Law of the Excluded Middle. Perhaps there are other options. But more on this later.
There are two logical laws which are often conflated.
The Law of the Excluded Middle: a statement is true or its negation is true
Principle of Bivalence: every statement is either true or false
Consider rolling a die. The statement ‘I will roll a six’ is not (necessarily) true or false: it’s probabilistic. So the principle of bivalence fails. But LEM holds: the statement “I will roll a six or I will not roll a six” is a true statement.
Tremors in the Foundations of Mathematics
If you are a mathematician, this should ring alarm bells.
The Law of the Excluded Middle is pretty essential for mathematics. As Hilbert — a truly great mathematician — put it, ‘to take the law of the excluded middle away from the mathematician would be like denying the astronomer the telescope or the boxer the use of his fists’.
In mathematics we deal with such abstract statements, it is easier to prove that if something weren’t true we have a contradiction, than to actually construct the thing in question.
The founder of (rigorous) intuitionist mathematics, Brouwer, decided in later life that his earlier discoveries in Topology were invalid because of their use of proof by contradiction.
Here’s an example, but feel free to skip it if it is an obstruction to understanding the broader article. I’ve put the example in a quote box to help skip if you want.
For instance, consider a rectangular box. You are going to throw darts which land within the box, with each new dart landing on a spot not already covered. Every time a dart lands, you draw a circle of the same radius around the point. For instance, you might choose a radius of 0.1cm to draw around every dart.
If we continue like this, how do we prove that after a finite number of throws we have covered the entire rectangular dart board with the circles drawn?
running out of space
It seems obvious that it is true for the box. But we want to prove it is true for any shape with certain properties, with the technical name of compactness. This shape might even be infinite dimensional. Try visualising that!

With the law of the excluded middle, the proof is ‘simple’. We assume that we can pick points indefinitely.
We can then use a well-known result that if we throw an infinite number of darts into a compact space, like the box, we can pick a sequence of darts which get arbitrarily close. But this is a contradiction, because the points are picked to be more than a fixed radius (say, 0.1cm as before) apart. Thus,
assuming the statement is false, we get a contradiction. So we deduce that the statement is true.Done. Q.E.D. Finis.
But how would you do this without the Law of the Excluded Middle?
Mathematics is hard enough already without making it even harder!
Another important distinction is between the Law of Non Contradiction and the Law of the Excluded Middle.
Law of the Excluded Middle: the statement (P or not P) is true.
Law of Non Contradiction: the statement (P and not P) is false.
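Both laws can be checked mechanically in classical two-valued logic. Here is a minimal Python sketch (my illustration, not from the original article) that brute-forces the two possible truth values of P:

```python
# Classical two-valued logic: verify the Law of the Excluded Middle (LEM)
# and the Law of Non-Contradiction (LNC) over every truth value of P.
for P in (True, False):
    assert (P or not P) is True    # LEM: (P or not P) is always true
    assert (P and not P) is False  # LNC: (P and not P) is always false
print("LEM and LNC both hold classically")
```

Two values, two checks; the interesting question in the rest of the article is what happens once a third value is allowed.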
If you are baking a cake, and it is partially cooked, you may not want to affirm that it is, currently, a cake, but neither say that it is not a cake. Instead, it is in some intermediate state.
So while we aren’t allowing contradictory things to coexist, we aren’t limiting our categories as much to describe reality. We allow our categories to be slightly fuzzy.
Intuitionism and Baking a Cake
It is now probably clear that the problems in logic are inescapably tied to baking.
That’s a lie, they’re actually inescapably tied to mathematics.
The Intuitionist mathematicians, led by Brouwer, argued that Mathematics consists of objects which are constructed by the mind.
Here we see that proving

¬(¬P) (symbolically), i.e. not (not P) (in words),
is no longer a valid proof of P. I haven’t actually constructed the damn thing! In Brouwer’s opinion, you can’t assume there’s some abstract realm of mathematical statements with a true or false label attached. If this mystical realm existed (in some sense we can’t make heads or tails of, see this article) then the construction approach would be unnecessary.
This is a bit abstract, so let’s imagine Hilbert is baking a cake. And Hilbert deduces, from the smell, that the statement ‘there is not a cake in the oven’ is false. And so, he happily starts to go and call everyone to dinner, because the cake is ready.
But Brouwer stops him, and says that the Law of the Excluded Middle doesn’t hold. Before he gets everyone gorging themselves on half-baked cake mix, he should open the damn oven and see whether it is a cake or not.
There is a chance, Brouwer acknowledges, that Hilbert is correct, but cautions that we cannot be sure until we see with our own eyes. Here’s me using some creative license.
Brouwer looked over at Hilbert with disdain. In his eyes, the German mathematician was stuffing his face full of raw, uncooked cake mix, licking and smacking his lips, and totally unaware of the mess on the table and on the floor.

Hilbert thought Brouwer was mad. He would open the oven door so often to check if the cake was there that we’d never even have a cake. And besides, it’s so sweet and so good – “mmmm, I think these are chocolate chips, you should really try some”
Hilbert actually viewed Brouwer as dangerous. And this makes sense — he thought Brouwer would forever derail the ability of Mathematicians to make cakes (proofs).
Intuitionism continued
You’re now well placed to understand that Brouwer is fine with the Law of Non Contradiction, but does not accept the Law of the Excluded Middle. His denial of this abstract mathematical realm where things have true or false attached to them leads him to deny the Principle of Bivalence too!
In Brouwer’s conception of mathematics, things must be built to exist. It is an entirely different conception of mathematics, one which demands more of a proof (i.e. makes fewer assumptions).
This seems pretty weird, but remember the paradox at the beginning and it makes some sense. With Brouwer’s approach to mathematics applied to this thought experiment, truth is a constructive exercise. Thus, to show that all Cretans are liars, we have to ‘construct’ the proof by going through every statement ever made by a Cretan, and so our abstract logical proof was invalid.
If we wanted to put a label on this, we might say statements are true, false or indeterminate. This is a bit of a cop-out, as it labels the problem cases as indeterminate, but also seems to provide good categories for resolving this mess.
And it turns out that this is exactly what Brouwer did, something which I stumbled upon while editing. He used a many-valued logic, with an ‘as yet undetermined’ value. If you’re really interested, you can read up on many-valued logics.
(In addition, my layman’s understanding is that experimental quantum physics indicates ‘indeterminate’ is part of reality as we know it. This was formalised by Von Neumann in his ‘Quantum Logic’.)
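To make the ‘as yet undetermined’ value concrete, here is a sketch of one standard many-valued system, Kleene’s strong three-valued logic. This is my illustrative choice; Brouwer’s own system differed in detail, but it shows how the Law of the Excluded Middle can fail:

```python
# Kleene's strong three-valued logic (K3): true, false, undetermined.
T, F, U = "T", "F", "U"

def k3_not(p):
    return {T: F, F: T, U: U}[p]

def k3_or(p, q):
    if T in (p, q):            # true if either side is true
        return T
    if p == F and q == F:      # false only if both sides are false
        return F
    return U                   # otherwise undetermined

# The Law of the Excluded Middle fails for the middle value:
print(k3_or(T, k3_not(T)))  # T
print(k3_or(U, k3_not(U)))  # U -- neither true nor false
```

Restricted to T and F, the classical behaviour is recovered; the middle value is exactly where (P or not P) stops being a tautology.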
Intuitionism makes sense, but are you prepared to ditch modern mathematics?
Now that’s a tricky question. Because Intuitionism means ditching a vast amount of extraordinarily useful mathematics.
And there’s no way we’re gonna do that as long as Mathematics is ‘working fine’, aka producing results which can be used without contradictions popping up everywhere.
Some problems with intuitionism
The issue is that it’s not at all clear that mathematics is merely mental constructs. This is such a foundational question it’s hard to see how it can be treated as anything but an axiom, one way or another. That being said, proofs which construct the result always feel more certain than others.
My hunch is that reality is always messier than our categories. Brouwer may be right that proof by contradiction is not always valid, but it seems absurd to give it up everywhere. In our day to day life at least, for most sensible statements, the laws of classical logic hold. That’s why Aristotle made them!
We have no way of knowing a priori which categories are right, so have no choice but to improvise and fudge it, making adjustments when things go wrong. Using Brouwer’s reasoning the correct categories aren’t something which exist out there anyway, so this constructive process to choosing our axioms keeps the spirit of his approach.
The Real World and Mathematics
Mathematics looks at the logical deductions you can make given certain structures and axioms. It is when we apply it, that we breathe life into those categories and operations and axioms. And then we have the connection with Derrida (Epilogue IV), which came as much of a shock to me when I realised it as, I hope, it will come to you.
What is a number? It gets complicated (and then easy again)
Negative numbers don’t make much sense if you think numbers only stand for finite, integer, positive number of things in the world. E.g. 10 sheep, 6 children, 3 goats. Not -1 sheep.
But we define negative numbers as a structure, with operations and axioms, and suddenly we give meaning to these operations. That meaning might be the concept of debt, or might be the idea of going backwards on a line. Or the past. Or subtraction.
Likewise, the square root of -1, i, seems pretty abstract and like it doesn’t exist. (That’s what I thought at first.)
But when you apply complex numbers to actual situations, you give the operations meaning. For instance, the modelling of physical phenomena with complex numbers in quantum physics, or the representation of complex numbers on a plane, where multiplication corresponds to a rotation and a change in radius. Here, the square root of -1 makes perfect sense, as multiplication has been extended in meaning and in purpose to have a rotational aspect. The point i has angle 90 degrees and length 1. So, when we multiply a point z by i, we rotate it by 90 degrees and scale the radius by 1 (i.e. leave it unchanged).
i² = -1, as the new angle is 90 + 90 = 180 degrees, and the new radius is 1 times 1.
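The rotation picture is easy to play with in code. A small Python sketch (mine, not the author’s) using the built-in complex type:

```python
# Multiplying by i rotates a point in the complex plane by 90 degrees
# while leaving its distance from the origin unchanged.
import cmath

z = 1 + 0j                            # start at 1, angle 0 degrees
for step in range(4):
    degrees = round(cmath.phase(z) * 180 / cmath.pi)
    print(z, "at", degrees, "degrees, radius", abs(z))
    z *= 1j                           # rotate by 90 degrees

assert 1j * 1j == -1                  # two rotations: 90 + 90 = 180 degrees
assert z == 1                         # four rotations bring us back to 1
```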
So if the operations of classical logic seem reasonable in application in the real world, then the use of those on mathematical objects is also reasonable, provided we limit the domain of interpretation of what the mathematical object means.
Mathematics has been so focused on formalising the abstract connections within itself, it makes sense to also define the applicability of axioms to the real world.
Thus, whenever we use, say, Topology, to understand the real world, that comes with an understanding of the logical axioms needed.
Unanswered Questions
This may provide a satisfactory answer for Mathematics as it is used, but what about human curiosity? After all, there are statements within mathematics which relate to proofs within mathematics, and seem defined by this rather than given meaning through applications. Intriguingly, the biggest problems in set theory (and measure theory) come from the creation of sets so wild and crazy that it is hard to see any resemblance to the real world. But in what sense can a set, intuitively a ‘collection of objects’, be disconnected from our understanding of the real world, i.e. the place where objects seem to exist?! Would a blind, deaf, mute person who had lost their smell and their taste and their touch (so just a mind without external input) be able to conceive of objects?
So the dilemma remains for the most abstract parts of mathematics.
I can’t think of an answer. But perhaps you can. So let me know in the comments! And if you can’t, then let me know in the comments anyway. It gives me an excuse to stop doing the statistical analysis I’d otherwise be doing right now. An Epilogue is below, but optional. It gives some final thoughts.
The author is an undergraduate studying Economics at Cambridge University. His passions include Mathematics, Philosophy, sub-mediocre music composition, making sub-sub-mediocre puns, finger painting, and writing bios of himself in the third person.
Epilogue I on logic
If there’s one lesson to be learned here, it’s to rethink our relationship with logic.
Logic, which should be a way to refine and formalise our categories and concepts (such as ‘true’ and ‘false’), has replaced them!
This is why we get so confused when first trying to understand the difference, say, between the Law of the Excluded Middle and the Principle of Bivalence. We ended up treating classical logic as the correct set of categories and the correct way of viewing the world. If you doubt that the difference between these two can be confusing, I invite you to read this on Philosophy Stack Exchange.
In this light, it seems natural that there was some pushback against logic (*ahem, certain French Philosophers*), against the idea of simple, objective categories which capture all of reality. But the correct response must be to refine these — and I have given several examples of how — rather than to throw the baby out with the bathwater.
Epilogue II on the Cretan Paradox
This paradox is strongly linked to another paradox. This is because the crux of the issue is really whether he, Epimenides, is a liar. After all, if he is not a liar, then not all Cretans are liars, as he is a Cretan.
The statement I am currently making is a lie.
Epilogue III: Kant
I can’t resist a quick poke here at Kant. He thought there were some self evident truths which we could then build our knowledge off. But we have learned that this is bollocks! It is totally unclear what the right assumptions are, and it is up to us to work them out from our preconceived knowledge of the world. He got it the wrong way round: you have to be able to assume some knowledge or understanding of the world to choose and refine your axioms. Rather than the perfect logical system necessarily existing, we have to reflect on what system suits our needs and which axioms are appropriate.
Or, in other words, the centuries long quest for pure philosophical truth created by reason alone is dead.
But that quest died long ago for other reasons. As Nietzsche puts it in Beyond Good and Evil
The eagerness and subtlety, I should even say craftiness, with which the problem of “the real and the apparent world” is dealt with at present throughout Europe, furnishes food for thought and attention; and he who hears only a “Will to Truth” in the background, and nothing else, cannot certainly boast of the sharpest ears. In rare and isolated cases, it may really have happened that such a Will to Truth — a certain extravagant and adventurous pluck, a metaphysician’s ambition of the forlorn hope — has participated therein: that which in the end always prefers a handful of “certainty” to a whole cartload of beautiful possibilities; there may even be puritanical fanatics of conscience, who prefer to put their last trust in a sure nothing, rather than in an uncertain something. But that is Nihilism, and the sign of a despairing, mortally wearied soul, notwithstanding the courageous bearing such a virtue may display.
Epilogue IV: Derrida and Mathematics
I was reading through an article on the Stanford Encyclopedia of Philosophy, and I was struck by a similarity between this conception of mathematics and some ideas of Derrida.
For Derrida, written marks or signifiers do not arrange themselves within natural limits, but form chains of signification that radiate in all directions. As Derrida famously remarks, “there is no outside-text” (Derrida 1974 [1967], 158), that is, the text includes the difference between any “inside” or “outside.”
I’m no Derrida buff, so perhaps some Philosophy department somewhere is going to put me in its crosshairs. But this might ring a bell.
Pure Mathematics is like this — ‘there is no outside-text’ — it is a self referential set of signifiers! And, on its own, its meaning is abstract and mysterious. I remember asking a mathematician what the ‘empty set’ was. The answer was it is ‘the mathematical object called a set, which has no elements’. Is it really a set? What the heck does that mean?
So, you could take a Derridean approach, and view Pure Mathematics as fundamentally self-referential, with the application of it giving the extra meaning.
(This doesn’t fully satisfy me — the development and understanding of mathematics has always been tied to its applications. But it was too surprising a connection not to share.)
Epilogue V: Free will, Determinism and Logic
For those who are interested, whether or not the Principle of Bivalence always holds is tied to determinism. In the dice roll example earlier, perhaps there’s no indeterminism and I just lack the relevant information; i.e. a lack of knowledge may be conflated with the Principle of Bivalence failing. Aristotle was quite worried about this.
I think that both determinism and probabilistic thinking can be formulated in terms of each other. Some of my thoughts are explained here, here and here.
P.P.P.P.S I’m so sorry, I lack self control! If you’ve made it this far (congratulations), I can, several weeks later, add an interesting update. I asked on Mathematics Stack Exchange, and it turns out certain (extremely important!) proofs cannot be done without the Law of the Excluded Middle. The Cantor-Bernstein theorem is one such example. My intuition that restricting the axiom set would result in fewer available proofs was correct!
Computational neuroscience, broadly defined, is the mathematical and physical modeling of neural processes at a chosen scale, from molecular and cellular to systems, for the purpose of understanding how the brain represents and processes information. The ultimate objective is to provide an understanding of how an organism takes in sensory information, how information is integrated and used in the brain, and how the output of such processing results in meaningful decisions and behaviors by the organism in order to allow it to function and thrive in its environment. In an attempt to understand what the brain is doing and how, we build computational models that aim to replicate and explain the observed or measured data in order to arrive at a deeper understanding of brain function.
The process goes something like this: Beginning with a set of experimental observations or measurements, for example measuring the electrical activity of neurons in response to a stimulus (electrophysiology), a model is postulated. The model aims to provide a set of rules or relationships that, given the initial experimental observations (or at least a part of them), would be able to describe and explain some aspect of the brain. In general, this almost always begins with a qualitative “guess” about how the data fit together and what rules likely govern the relationships within them. This qualitative picture of the model is then “translated” into a quantitative mathematical framework. The model, once constructed, is still nothing more than a guess, and so testing it, with the goal of building circumstantial support for it (or against it), is carried out by numerical (computer) simulations, often where the answers or outputs are known from experiment, so that they can be compared with the outputs computed by the model.
At this point several outcomes are possible, assuming the model is at least partially correct. One possibility is that the model is able to describe the data set used to construct it but cannot make any novel non-trivial predictions or new hypotheses. A more productive outcome is when the model results in an unexpected experimental hypothesis that can be tested and verified. This may lead to novel experimental findings. In turn, new data then allow the fine tuning or modification of the model in an iterative way. In all cases, though, the core of the process is the same: one guesses at a model and uses mathematics to justify the guess. The actual validation of the guess is based on numerical simulations, and an iterative approach improves the model. Note, however, that in the typical way in which computational neuroscience is practiced, the mathematics involved is primarily descriptive and does not participate in the process of discovery. Given this, we can define computational neuroscience, somewhat more provocatively, as numerical simulations of postulated models and ideas constructed from qualitative hypotheses or guesses.
But what if we define mathematical neuroscience not as the generation of hypotheses based on postulated ideas, but as the systematic analytical investigation of data-driven theorems? The key idea is that mathematical conjectures about the brain can be written down and logically proven. The axioms, i.e., the starting point ground truths, are not unknown or postulated hypotheses about how the brain might work, but, within the limits of experimental verification, are the simplest possible set of experimentally verified ‘knowns’ that support the construction of the statement of truth being made by the conjecture. In other words, the initial goal is to set up a conjecture that is mathematically sound and is based on an agreed-upon set of experimentally validated axioms about the neurobiology, and then use any mathematics possible to formally extend the starting point in order to arrive at a novel insight or hypothesis or new understanding about how the brain works.
These axioms are the same types of experimental observations and measurements that form the starting point in computational neuroscience, but instead of qualitatively guessing a possible relationship that explains the set of observables, the objective is to abstract the set of experimental observations into a corresponding complementary set of simple mathematical statements. No assumptions or guesses whatsoever regarding the relationship between either the set of experimental observables or their corresponding mathematical representations need be made at this point. We need only, as simply as possible, straightforward statements of fact that everyone would agree upon.
The next step is to set up a conjecture that says something about the set of axioms. While this is itself a guess, often the product of much trial and error, it is a mathematical guess. This means that once a plausible conjecture is written down, one can attempt to prove it. This is the fundamental consideration that differentiates computational neuroscience from mathematical neuroscience as I’m defining it here. In computational neuroscience a model is written down that is a guess about the relationship between the data itself, but there is no formal logical way to “prove” the model correct or incorrect. So numerical simulations are done. But this is never proof of anything. A valid conjecture, on the other hand, is a very narrow statement about a very specific set of facts. And it has the potential to be proven, meaning that it can be established as true or false analytically. And once proven, it is true forever. One has established a truth about the relationship between the starting axioms from a logical set of arguments. No simulations or other guesses are required.
As a proof of concept of the power of this approach, in our own research we did just this. A mathematical description of how neurons and networks of neurons communicate with each other led to a series of mathematical proofs that resulted in a theoretical framework and a prediction that there must exist a balance between the time an individual neuron takes to process information internally, i.e. locally, versus the amount of time it takes for the information to propagate throughout the network, i.e. globally. (What we called the refraction ratio.) This, in turn, led to an experimentally testable prediction about how real biological neurons optimize this mathematical ratio. We were able to show computationally that, at least in one specific type of neuron in the brain, the cells have shapes (morphologies) specifically designed to nearly preserve the theoretically predicted ideal ratio. This was not a serendipitous discovery about the neurobiology. We didn’t just get lucky. The mathematics and theoretical work pointed us in that direction.
Like most other areas of science, neuroscience has a tremendous and continuously growing amount of data, but comparatively little theory about how all the data comes together to arrive at a systems-level engineering understanding of how the brain works. In fact, the link between mathematics, engineering, and neuroscience will only continue to grow stronger. It has to. We simply will not be able to understand how the brain works as a system without it.
Combine such a mathematically rigorous approach with emerging applications of machine learning to neuroscience discovery, and we could be on the verge of understanding the brain, and how it fails in disease, beyond anything we could have imagined so far.
Most people may think that mathematics is about doing hard sums with large numbers. Whilst it is true that you can do seriously hard computations all day long (try computing the integral of the square root of tan), the majority of mathematics is based on proofs.
Here I will outline some of the most common methods of proving a statement (and some of the funnier ones). For each method I will try and give a simple example of a proof using that method.
Direct Proof
The method of direct proof uses logical steps to achieve the desired result. These logical steps can be axioms, definitions, and previously proved propositions.
For example: the sum of two even integers is an even integer.
Proof:
We consider two even integers m, n. They are even so can be written as m=2a, n=2b, for some integers a and b (this is by definition of being an even integer). Then the sum m+n=2a+2b, which we can write as 2(a+b). Since a+b is an integer, we can conclude that 2(a+b) is even and therefore m+n must be even. QED.
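As a sanity check, the claim is easy to verify by brute force in Python over a small range. This is my illustration, not a substitute for the proof, which covers all integers:

```python
# Spot-check: the sum of two even integers is even.
# The proof above covers all integers; this just illustrates it concretely.
for m in range(-20, 21, 2):           # even integers m = 2a
    for n in range(-20, 21, 2):       # even integers n = 2b
        assert (m + n) % 2 == 0       # m + n = 2(a + b) is even
print("Every checked sum of two even integers is even")
```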
Proof by Contradiction
In essence, you assume the opposite and then try to derive some contradiction. The idea is that if assuming something leads to a contradiction, then that original assumption must be false.
This method is often useful in trying to prove the uniqueness of something. I look to group theory for an example: prove that a group has a unique identity element.
Proof:
Suppose the contrary: let there be two distinct elements e and e’ (e≠e’) that have the property that e·x = x·e = x and e’·x = x·e’ = x for all x in the group. Consider e·e’. By definition of identity, we can argue that e·e’ = e and also e·e’=e’, therefore e=e’ which is a contradiction to our initial assumption. Therefore, the identity element must be unique. QED
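The same fact can be spot-checked on a concrete group. A minimal Python sketch (my example, using addition mod 5) that brute-forces the identity elements; the proof above, of course, covers every group:

```python
# Brute-force the identity elements of the group of integers mod 5
# under addition, and confirm there is exactly one.
elements = range(5)

def op(x, y):
    return (x + y) % 5

identities = [e for e in elements
              if all(op(e, x) == x and op(x, e) == x for x in elements)]
print(identities)  # [0] -- exactly one identity, as the proof guarantees
```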
Proof by Induction
If we want to prove that some proposition holds for all natural numbers, then induction can be a good way to go. The idea is that you prove the proposition for a base case (say n=0). Then you assume the proposition holds for some n, and use that to prove that it must hold for n+1. The hand-wavy reason this works is that since it holds for 0, it must then hold for 1; since it holds for 1, it must hold for 2, and so on.
Granted, I see this one less and less, but I used it very often when proving things in high school.
Example Proposition:
The sum 0+1+…+k = k(k+1)/2.
Proof
Base Case: k = 0. The left hand side is 0 and setting k=0 on the right hand side, we see that k(k+1)/2 = 0. Therefore the base case holds.
Inductive step: Now assume that the above statement holds for k=n for some integer n.
Now we want to show that it holds for k=n+1. On the left hand side, we have 0+1+…+n+(n+1) = n(n+1)/2 + (n+1), using our inductive assumption for the first n+1 terms. We can now rewrite n(n+1)/2 + (n+1) = (n+1)(n+2)/2 = (n+1)((n+1)+1)/2, which is exactly the statement of the proposition for k=n+1.
Therefore, since the proposition holds for k=0, and if it holds for k=n then it holds for k=n+1, the proposition holds for all natural numbers k. QED
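Induction proves the formula for every natural number; as a quick illustration (mine, not part of the proof), the closed form can be checked against a direct sum in Python:

```python
# Check 0 + 1 + ... + k == k(k+1)/2 for the first thousand values of k.
for k in range(1000):
    assert sum(range(k + 1)) == k * (k + 1) // 2
print("Closed form matches the direct sum for k = 0..999")
```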
Algebra by Serge Lang
There are more methods that you will encounter along the way when studying and reading mathematics, but the three above will get you most of the way. If those fail then here are some other methods which remain tried and tested methods:
Proof by Intimidation
“I am telling you this is true”
Proof by Authority
“Gauss said it is true, therefore it must be”
Proof by Triviality
“The proof is trivial, so I leave it as an exercise to the reader”
Proof by Delayed Triviality
“The proof is trivial.” At this point, a student puts up their hand and asks if it is indeed trivial. The lecturer leaves for 30 minutes; when they return:
“Yes, it indeed is trivial”
Proof by Plausibility
“It looks like it should be true, so let us assume it is”
Proof by Dense Notation
“Let us assume …” Then over the course of the next 5 hours, fills up 20 blackboards with 4 different alphabets “And therefore the abc conjecture holds.”
Inter-Universal Teichmuller Theory IV, Shinichi Mochizuki
Proof by Inaccessible Reference
“The proof can be found on page 845 of the 687 page book”
Alternative example:
“Proof can be found in a private memoir of the Hungarian Mathematical Society in 1898, the only remaining copy can be found in the basement of Béla Bollobás (untranslated)”
Proof by Deferred Reference
“The proof may be found in my upcoming paper”
It is often noted that the upcoming paper is not so upcoming as once thought.
Proof by Circular Reference
“The proof for theorem 12 follows directly by theorem 15.
The proof for theorem 15 follows directly by theorem 12.”
Proof by Assumption of RH
“Assume the Riemann Hypothesis…”
Proof by Hand Waving
Proof by Vigorous Hand Waving
For when hand waving was not sufficiently convincing
Proof by Overwhelming Evidence
“We have checked several billion integers and haven’t found a counterexample yet”
Proof by Insufficient Margin Width
“I have discovered a truly marvellous proof of this, which this margin is too narrow to contain.”
Proof by …
The remaining methods of proof are left as an exercise to the reader.