Within the world of mathematics, nothing has a profound impact on, well, everything. By “nothing,” we mean the number zero (0) and its identity as the absence of anything. The application of zero, in a primitive sense, has been innately understood since the beginning of recorded history — even wild animals notice an absence of resources.
Quantifying This Principle, However, Wasn’t Quite As Obvious
This innate understanding of zero passes as common sense, yet it took thousands of years to develop a mathematical value to represent it numerically. While the concept was understood & applied to everyday life by the earliest humans, the written, numerical value of zero was only distinguished in relatively recent history. It’s obvious in hindsight, but think about the paradox involved here — we use numbers to represent value, yet zero, or nothing, is inherently value-less.
Zero’s value is equivalent to the lack of value. An analogy here: zero is to math as black is to color. Black is the lack of color, as zero is the lack of numerical value. Although black is the lack of color, it is still irrefutably a color. Applying this same principle to zero, the lack of a numerical value does not negate zero’s identity as a number.
As the epicenter of early civilization, Mesopotamia offers the earliest conceptual resemblance of a ‘zero’ figure. Through preserved artifacts, archeologists were able to decipher what the Babylonians’ sexagesimal number system looked like ~4,000 years ago, around 2000 BC:
Originally Published https://www.setzeus.com/
Though less efficient than our decimal system, the Babylonian numerical system was impressively useful for record-keeping with respect to time. Unfortunately, this system was also used by the merchant class for sale & income records, where its shortcomings showed through.
The Biggest Difficulties The Babylonians Faced With Their System Was Rooted In The Lack Of A Numerical Zero.
When recording, scribes would denote ‘a lack of value’ with two wedge marks in the column that had no numerical value. For example, a value of ‘101’ was recorded using the ‘no value’ double wedge mark in the tens column. This double wedge helped the scribe differentiate between ‘101’ & ‘11’. But how is this different from our modern zero? It is important to understand that this double-wedge notation was not given a numerical value. Rather, it was simply used as a placeholder for a column with no value; not quite “0,” but more so the Babylonian equivalent to writing ‘N/A’. Although within this context it served the same purpose as a zero, its functionality & versatility are nonexistent in comparison to the numerical zero.
When counting to 10 or even 100, this system seems reasonable; every number has its unique symbol, or a combination of two symbols (as seen in the notation of 11/12 in the above figure). It is only when recording larger numbers that the kinks in this system become apparent. Consider the US national debt of $23,576,361,671,434. Using the table of figures above, we’d come up with the following table value:
As illustrated above, we calculate the Total Value (right-most column) of each row by multiplying the Symbol Value times the Column Value; we then repeat this with the next row, all the way down the table. Finally, we aggregate all Total Values to output a single final value. Complicated? Not entirely. Practical? Not at all.
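The column arithmetic just described is, in modern terms, base-60 place-value decomposition: split a number into columns, then rebuild it by multiplying each symbol value by its column value & summing. A minimal Python sketch (the function names are mine, purely illustrative):

```python
# Decompose a number into base-60 "columns" as the Babylonians did,
# then rebuild it the way the table above does: multiply each symbol
# value by its column value (a power of 60) and aggregate the totals.

def to_base60(n: int) -> list[int]:
    """Return the base-60 digits of n, most significant column first."""
    digits = []
    while n > 0:
        digits.append(n % 60)   # symbol value for this column (0-59)
        n //= 60                # move to the next column
    return list(reversed(digits)) or [0]

def from_base60(digits: list[int]) -> int:
    """Sum of (symbol value x column value) across every row of the table."""
    total = 0
    for position, symbol_value in enumerate(reversed(digits)):
        column_value = 60 ** position
        total += symbol_value * column_value
    return total

debt = 23_576_361_671_434            # the US national debt figure above
columns = to_base60(debt)            # one symbol value per column
assert from_base60(columns) == debt  # the aggregation recovers the value
```

Any column holding a 0 here is exactly where a Babylonian scribe would have needed the double-wedge placeholder.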
The Babylonian system worked sufficiently when dealing with smaller quantities because merchants of this time were dealing with quantities in the hundreds, not the millions. The issue with this system, as highlighted above, is more obvious when dealing with very large quantities. With the infinite property that numbers possess, the Babylonians faced a daunting task to determine the value of large numbers from their established columns & symbols.
In retrospect, the problem lies in the lack of the number zero. In the modern numeral system, the difference between 100, 1,000, & 10,000 is a mere addition of a significant zero. In the Babylonian numeral system, by contrast, completely different symbols are required to represent these quantities. The symbol-based system became obsolete when numbers had more use than simply counting the loaves of bread sold in a day. With mathematics playing a pivotal role in the technological & societal advance of the human race, the concept of the absence of anything needed to be quantifiable. Without a defined number zero, the vast majority of mathematical proofs & theorems would be unachievable. The invention, or more so discovery, of the number zero was a monumental leap in the advancement of society & has been traced to early 7th-century India.
An Indian Invention
While the birthplace of the numerical zero is debated within history & math circles, India is the most likely candidate. While the concept of zero is visible in Mesopotamian, Chinese, & Mayan culture, the numeric value was first assigned in ancient Indian writings. The very first known writing to include the numerical zero is found in the Bakhshali Manuscript — a manual of arithmetic for Indian merchants, portions of which have been radiocarbon-dated to as early as the 3rd or 4th century AD. Archeologists discovered that this ancient manuscript, written on birch bark, contained black dots under numbers that were determined to be the first known usage of zero as a numerical value:
Circle(ish) Zero Representation Shown In Red
Compared to the previous Mesopotamian usage of zero, this manuscript did not utilize the dot as a placeholder for an empty value but rather as its own number. Peter Gobets, secretary for the Zero Project, a foundation dedicated to the study of the development of zero in India, hypothesizes that:
The mathematical zero may have arisen from contemporaneous philosophy of emptiness, or Shunyata
A core concept of the Buddhist teachings central to Indian life, this philosophy was materialized into the mathematical principle of zero by the Indian mathematician Brahmagupta, whose teachings were the first to define zero & its mathematical operations, in 628 AD. However, it is worth noting that Brahmagupta’s contribution comes several centuries after the Bakhshali Manuscript. This again suggests that while Brahmagupta was the first to define zero, he was far from the first to discover its principle. Regardless, it’s evident that India is most likely the correct candidate for the geographical origin of the number zero.
European Resistance
The importance of zero is irrefutable, yet Europe was especially hesitant to accept this new mathematical principle. It was first introduced to Europe through the Moorish conquest in the 8th century & later developed in Italy by Fibonacci. As the idea spread throughout Europe, there was push-back from religious leaders across the continent. Dr. Vander Hoek of the Zero Project explains that the religious leaders believed,
God Was In Everything That Was. Everything That Was Not Was Of The Devil
The concept of zero was interpreted, by some, as a satanic teaching because “nothing” was viewed as logically equivalent to being ‘empty of God’. Along with religion, blatant racial issues compounded the push-back. In 1299, for example, Florence banned the usage of Arabic numerals: a direct result of the fear that Europeans had of the Arabic & Hindu peoples. This ban limited the merchants within the city to the usage of Roman numerals; an outdated system that did not have a numeral zero. It wasn’t until the 1600s that Roman numerals were finally superseded by the Arabic numeral system in Florence. Given the technological advancements of the past 50 years alone, it makes one wonder what those three lost centuries of zero could have done for advancement.
Breakthrough Reflection
The vitality of the discovery of zero cannot be overstated — zero is multi-functional. It’s both a key placeholder in the modern number system & its own number as well; as a placeholder, it differentiates ‘1’ from ‘10’ & allows a system in which only 10 digits are necessary (as opposed to new symbols for ever-larger numbers, as in the Babylonian numerical system). Yet this isn’t all — zero is also the ‘middle-man’ between positive & negative.
Perhaps the largest, lasting impact that resulted from the implementation of a numerical zero is the common utilization of mathematics. Before zero’s introduction, calculations were almost exclusively reserved for mathematicians using an abacus — a tool that allowed for simple calculations. With the introduction of the Arabic numeral system (which, remember, was catalyzed by the numeral zero), common people were able to compute complex calculations that extend far beyond the capabilities of the abacus. This swell of public utilization exponentially drove growth in the fields of science & technology, & the advancement of the human race.
The unfathomable reality is that zero is found almost nowhere in the natural world — there is always something, even in apparent nothingness. An empty sky is really just full of space. This paradox might be abstract & contradictory, but it brings interesting insight into the late discovery of zero. Early advances in science & math were brought about by the study & understanding of the natural world.
Since zero does not exist in the natural world, it is no surprise that it took thousands of years for civilization to conceptualize the numerical value of nothing.
Its functionality was always understood, but the numerical quality of nothing escaped the grasp of human comprehension until relatively recently. The concept of nothing has always been, yet it took the quantification of this nothingness to catalyze every aspect of modern life.
The field of Statistics and Probability is useful for a lot of things like weather forecasts, scientific research, machine learning, and data analysis. However, I recently had an experience which caused me to add one more thing to that list: guarding your laptop against theft.
In the month of September last year, I suffered the same fate as many people around the world. My laptop got stolen. To be honest, the incident was mostly my fault. I made a hasty decision based on my understanding of statistics, but after I lost the device I learned a lesson about statistics that could have saved it.
In this article, I try to explain the kind of statistical thinking that caused me to err as well as the lesson I learned from it. That lesson is from the field of Discrimination and Classification.
Discrimination and Classification
Discrimination and classification are closely related concepts in the world of Statistics and Probability. For those who are conversant with topics in machine learning, this will be familiar. In their book Applied Multivariate Statistical Analysis, Johnson and Wichern state, “discrimination and classification are multivariate techniques concerned with separating distinct sets of objects and with allocating new objects to previously defined groups.”
In plain English, this means that if we are presented with certain “objects” (which could be things, scenarios or people), we would like to put those objects into different groups based on certain characteristics. As an example, if I were to present you with detailed information (like income, level of education and morality) about a bunch of strangers, discrimination could help you distinguish between trustworthy and untrustworthy people among them. Based on the groups you’ve made, if I present you with a new stranger, classification would teach you where to allocate that new individual.
This is the same idea that insurance companies might use to classify someone as being likely or otherwise to be in an accident soon. This would inform the amount of premium that customer might be made to pay.
This was the kind of knowledge I needed last year. Now that I have that knowledge, I realize that if I had discriminated and classified properly, I would still have that laptop today.
So what happened on the day of the theft?
Discriminating and Classifying The Laptop
When I lost my laptop last year, I was a final-year student of mathematics and statistics at university. On that September day, I was in a hurry to get to the lecture hall and in no mood for any delay. I was running late for an interim assessment that was about to start in the next 10 minutes, but I needed a place to keep my bag, which contained my laptop.
If I wasn’t so late, I would have readily gone to keep my bag at the usual place I kept it. However, time wasn’t on my side, so I began to hesitate on that decision.
Should I keep it in the safe place I trust? Or should I send it into the lecture hall and place it where everyone else places their bags? I had heard of people’s bags getting stolen during exams but I thought perhaps I just might be lucky.
In order to make the best decision based on convenience and safety, I turned to the one thing I knew best apart from math: statistics and probability. I was a student of statistics, after all.
So I asked myself, “What is the probability that my laptop would get stolen if I keep it in the usual safe place?”
I replied to myself, “Obviously, close to zero. The place is very safe.”
“What about where everyone else places their bags?”
And that’s where I failed my statistics lecturers in the answer I gave. The convenience of not having to walk far in order to store my bag clouded my sense of judgement.
I knew that my laptop was attractive and very expensive. Many of the students in my class had seen it before and were probably envious. That should have informed me that the probability of it getting stolen was higher than usual.
Yet, I managed to convince myself that everything would be fine. I was certainly going to keep one eye on my test paper and keep the other eye on the bag. Because of the intricate plans I had, I reasoned that the probability of me losing the bag was close to zero.
Well, I failed. The thief had no regard for probability at all.
However, in the next semester, I took the course Multivariate Analysis and once I studied Discrimination and Classification, I understood what I had done wrong.
Discriminating and Classifying The Right Way
Here’s how my past self should have reasoned about the situation. What I needed to realize was that once I strip away all the unnecessary details, this is really all just a problem in Discrimination and Classification. What my past self was trying to do was to classify my laptop into one of two categories: likely to get stolen or not likely to get stolen.
Now, before we go through the proper way to classify my laptop, we need to define some variables. Let
x = my laptop
S = collection of all stolen laptops in the world
N = collection of all non-stolen laptops in the world
f = probability density function associated with S
g = probability density function associated with N
c(S|N) = cost of classifying the laptop in S when it actually belongs to N
c(N|S) = cost of classifying the laptop in N when it actually belongs to S
p = prior probability of S
q = prior probability of N
Now that we’ve got all that notation out of the way, we need to determine which group my laptop is likely to fall into based on information we know. Is x(my laptop) likely to fall into group S(stolen laptops) or group N(laptops not stolen)?
To classify appropriately, Johnson and Wichern have provided us with formulas in their book that we can work with.
Classify x into S if

f(x) / g(x) ≥ [c(S|N) / c(N|S)] × (q / p)

But classify x into N if

f(x) / g(x) < [c(S|N) / c(N|S)] × (q / p)
For the purposes of this article, the most important parts of those inequalities are the prior probabilities (namely, p and q) and the costs of misclassification (namely, c(S|N) and c(N|S)).
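For readers who prefer code, the Johnson and Wichern rule can be sketched in a few lines of Python: classify x into S when the density ratio f(x)/g(x) meets or exceeds the threshold [c(S|N)/c(N|S)] × (q/p), and into N otherwise. The function and the numbers fed into it below are my own illustrative construction, not taken from the book:

```python
# A sketch of the minimum-expected-cost classification rule.
# f_x, g_x  : the density values f(x) and g(x) evaluated at the object x
# c_SN, c_NS: the misclassification costs c(S|N) and c(N|S)
# p, q      : the prior probabilities of groups S and N

def classify(f_x: float, g_x: float,
             c_SN: float, c_NS: float,
             p: float, q: float) -> str:
    """Classify x into S if f(x)/g(x) >= [c(S|N)/c(N|S)] * (q/p), else N."""
    threshold = (c_SN / c_NS) * (q / p)
    return "S" if f_x / g_x >= threshold else "N"

# Hypothetical inputs: equal densities, a cheap false alarm (c_SN) and a
# very costly miss (c_NS). The costly miss drags the threshold far below 1,
# so even a small prior p pushes x into the "likely stolen" group.
print(classify(f_x=1.0, g_x=1.0, c_SN=5.0, c_NS=800.0, p=0.10, q=0.90))
```

Note how the costs and the priors enter the threshold as ratios: it is their relative sizes, not their absolute values, that decide the classification.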
Prior Probability
Prior probability of a group of objects can be regarded as the proportion of objects that typically fall into that particular group. So in the case of stolen laptops, the prior probability of the group S is the proportion of laptops in the world that get stolen. In order to know how likely it is that a random laptop falls into this category, we need some official statistics.
One such useful statistic comes from a 2018 Kensington report (PDF), which suggests that 1 in 10 laptops will likely be stolen or lost from an organization over the lifetime of each computer. So if I were to take this value as it is and apply it to my situation, that puts p = 0.10 and q = 0.90. (And for those of you who are more curious, Techspective cites a Gartner study indicating that a laptop is stolen every 53 seconds.)
However, you might look at the probability 0.1 and say, “Bro, that’s so small! There’s no way your laptop is getting stolen.” But that’s where I got it wrong. Prior probability isn’t the only thing needed to do a proper classification.
Cost of Misclassification
Cost of misclassification means exactly what you think it means. It accounts for the cost, mostly financial, of making a wrong classification of an object. In other words, it is the cost of classifying an object as belonging to one group when it in fact belongs to a different group.
So in the case of stolen laptops, we could take into account the cost of classifying my laptop as not likely to be stolen when it is actually likely to get stolen. That is what I failed to think about.
If I classified the device as not likely to get stolen but it eventually got stolen, what would be the cost to me?
To answer that question, I should have realized that I was a student who had managed to squeeze out some money to buy a laptop like that. The cost of losing such a laptop in the middle of a busy school year was going to be immense. So although 0.1 might have seemed a small probability, the cost of several hundred dollars should have been enough to knock some sense into me.
Nevertheless, I could also have considered the cost of classifying the laptop as likely to be stolen when it is actually not likely to be stolen. However, that cost (a few minutes of inconvenient walking) is comparatively negligible; as they say, “it is better to be safe than sorry.”
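Putting the priors and the costs together for the laptop itself, even a crude comparison of expected losses makes the point. The dollar figures below are hypothetical (I never priced the laptop or the walk); only the priors come from the Kensington statistic:

```python
# If I act as if the laptop is "not likely stolen" (N), my expected loss
# is p * c(N|S): the chance of theft times the cost of a miss. If I act
# as if it is "likely stolen" (S) and walk it to the safe place, my
# expected loss is q * c(S|N): the chance nothing would have happened
# times the cost of the needless walk. The dollar values are made up.

p, q = 0.10, 0.90       # priors from the "1 in 10 laptops" statistic
c_N_given_S = 800.0     # cost of a miss: losing an expensive laptop ($)
c_S_given_N = 1.0       # cost of a false alarm: a short inconvenient walk ($)

expected_loss_careless = p * c_N_given_S   # 0.10 * 800 = 80.0
expected_loss_careful = q * c_S_given_N    # 0.90 * 1   = 0.9

# The careful choice wins by a wide margin, despite the "small" prior:
assert expected_loss_careful < expected_loss_careless
```

The prior of 0.10 never changes here; it is the lopsided cost ratio that flips the decision, which is exactly the lesson I learned too late.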
Johnson and Wichern said it best. “Another aspect of classification is cost. As an example, failing to diagnose a potentially fatal illness is substantially more ‘costly’ than concluding that the disease is present when, in fact, it is not.”
They continued, “An optimal classification procedure should, whenever possible, account for the costs associated with misclassification.”
Thanks, Johnson and Wichern. I wish you both had told me this while I still had my laptop.
Conclusion
Like I said at the start, statistics and probability are useful for a lot of things: weather forecasts, scientific research, machine learning, data analysis, and guarding your laptop from thieves. Ultimately, statistics helps to make better decisions. That is what this field of study was made for.
From the littlest of decisions like placing your bag down, to the biggest of decisions like solving unemployment, statistics has the tools to supply you with probabilities you can work with.
Now, whenever I take a decision, I ask myself what the cost of misclassification is. Is the decision I’m about to make worth the risk? No matter how small the probability is, am I about to make a decision that I would deeply regret later if it goes wrong? If the answer is a confident YES, then I steer clear of repeating my past mistake.
So in a sense, the theft of my laptop taught me a lesson about Probability that sounds counter-intuitive: in the face of high cost, Probability says ignore probability. That’s a rule I understand very well now. It took a thief to teach me that.
REFERENCES
[1] Richard A. Johnson and Dean W. Wichern. Applied Multivariate Statistical Analysis (Sixth Edition). Pearson, pp. 575–584.
[2] Kensington, Survey: IT Security & Laptop Theft, 2016
The now famous Einstein-Szilárd letter was written at the initiative of Hungarian nuclear physicist Leó Szilárd with help from Edward Teller and Eugene Wigner in 1939. It was signed by Albert Einstein and sent to the President of the United States, Franklin D. Roosevelt in October 1939. The letter argued that the United States should engage in uranium research. Its writing was motivated by the news of the discovery of uranium fission by Otto Hahn and Fritz Strassmann nine months prior.
The letter prompted Roosevelt to propose the undertaking which would later become the Manhattan Project, producing the first nuclear weapons and — following the atomic bombings of Hiroshima and Nagasaki — leading to the unconditional surrender of Imperial Japan and the conclusion of World War II.
Leo Szilárd’s Role
Hungarian physicist Leó Szilárd (1898–1964) was born in Budapest in 1898 and attended the Palatine Joseph Technical University before enlisting in the Austro-Hungarian Army during World War I. After the war, he resumed engineering studies but, due to the chaotic political situation in Hungary, eventually left for Berlin in 1919, where he enrolled at the Technische Hochschule, eventually transferring to Friedrich Wilhelm University to pursue physics instead. There, he attended lectures by Planck, Franck, von Laue and Albert Einstein (1879–1955). Szilárd’s doctoral dissertation Über die thermodynamischen Schwankungserscheinungen (“On The Manifestation of Thermodynamic Fluctuations”) was completed in 1922, winning top honors. He was appointed as an assistant to von Laue in 1924 and completed his habilitation in 1927 to become a Privatdozent. Szilárd was granted German citizenship in 1930 but was forced to leave the country in 1933 following Adolf Hitler’s ascent to power.
In the UK, beginning in 1933, Szilárd worked as a physicist in St. Bartholomew’s Hospital, working on radioactive isotopes for medical purposes. By 1938–39, he was working as a visiting researcher in the U.S., eventually settling at Columbia University. His research concerned nuclear chain reactions, a concept he had conceived of in 1933 while reading Ernest Rutherford (1871–1937)’s disparaging remarks about an experiment of his students John Cockcroft (1897–1967) and Ernest Walton (1903–1995). In the experiment — now considered the first man-made splitting of an atom — the two used protons from an accelerator to split lithium-7 into alpha particles. The experiment showed that much greater amounts of energy were produced by the reaction than that which was supplied by the proton. Cockcroft and Walton would be awarded the 1951 Nobel Prize in Physics for the discovery. However, citing “inefficiencies in the process”, Rutherford had dismissed the idea that such a concept could be used to generate power in the future:
“We might in these processes obtain very much more energy than the proton supplied, but on the average we could not expect to obtain energy in this way. It was a very poor and inefficient way of producing energy, and anyone who looked for a source of power in the transformation of the atoms was talking moonshine. But the subject was scientifically interesting because it gave insight into the atoms.” — Ernest Rutherford
British physicist James Chadwick (1891–1974) had discovered the neutron a year earlier, in 1932. He also determined that the neutron was a new elementary particle, distinct from the proton, albeit of similar mass. Having been trained both as an engineer and a physicist, Szilárd postulated that if, instead of protons, Cockcroft and Walton had used neutrons, the process might have been a self-perpetuating chain reaction capable of producing power without the need for protons or an accelerator.
Szilárd even filed a patent for the idea the following year, the first for a nuclear reactor. Although the patent was granted in 1936, it was not published until 1949. In his momentous book The Making of the Atomic Bomb, Richard Rhodes describes Szilárd’s moment of inspiration for the nuclear chain reaction in the following way:
In London, where Southampton Row passes Russell Square, across from the British Museum in Bloomsbury, Leo Szilard waited irritably one gray Depression morning for the stoplight to change. A trace of rain had fallen during the night; Tuesday, September 12, 1933, dawned cool, humid and dull. Drizzling rain would begin again in early afternoon. When Szilard told the story later he never mentioned his destination that morning. He may have had none; he often walked to think. In any case another destination intervened. The stoplight changed to green. Szilard stepped off the curb. As he crossed the street time cracked open before him and he saw a way to the future, death into the world and all our woes, the shape of things to come.
— Excerpt, The Making of the Atomic Bomb by Rhodes (1986)
Over the next few years, beginning in 1935, Szilard attempted to generate nuclear chain reactions using beryllium and indium bombarded with X-rays. Still living in the UK, he had emigrated from Germany after the passing of the Berufsbeamtengesetz (“Law for the Restoration of the Professional Civil Service”) and the following “Great Purge of 1933,” in which large numbers of “insufficiently Aryan” civil servants were forced to leave their jobs. He even wrote to Winston Churchill’s scientific advisor Frederick Lindemann (1886–1957) to discuss “the question of whether or not the liberation of nuclear energy can be achieved in the immediate future,” arguing that if “double neutrons” could be produced, “then it is certainly less bold to expect this achievement in the immediate future than to believe the opposite” (Rhodes, 1986).
By the time of the writing of the Einstein-Szilárd letter in 1939, Szilárd was working from Columbia University in New York City, living at King’s Crown Hotel on West 116th Street. The news of the discovery of nuclear fission by Otto Hahn (1879–1968) and Fritz Strassmann (1902–1980), and its theoretical explanation by Lise Meitner (1878–1968) and Otto Frisch (1904–1979), reached New York in early February 1939. Niels Bohr (1885–1962), staying at the Princeton Faculty Center, had arrived with the news from Copenhagen on February 4th. Szilárd learned of it from Eugene Wigner (1902–1995) when he visited Princeton a few days later. Although the implications of Bohr’s news were still unclear, Szilárd quickly postulated that the neutron-driven fission of heavy atoms could be used to create a nuclear chain reaction, yielding massive amounts of energy for electric power generation and, potentially, atomic bombs. He tried but failed to convince Enrico Fermi (1901–1954) of its potential, and so set out on his own to show experimentally that such was the case.
Szilárd lecturing on nuclear fission in the 1940s (Photo: unknown)
Bohr had arrived with the news from Europe in early February. By the end of February, Szilárd had applied for and obtained permission to use a laboratory at Columbia for three months. He next convinced the postgraduate researcher Walter Zinn (1906–2000) to collaborate with him. Zinn had been conducting experiments bombarding uranium with 2.5 MeV neutrons obtained from a small accelerator. As Rhodes writes, Szilárd suggested that Zinn’s experiments would be more successful if he used lower-energy neutrons. Zinn agreed, but didn’t know how to obtain them. Szilárd did, and so suggested the two collaborate on the following experiment:
“All we needed to do,” he explained later, “was to get a gram of radium, get a block of beryllium, expose a piece of uranium to the neutrons which come from the beryllium, and then see by means of the ionization chamber which Zinn had built whether fast neutrons were emitted in the process. Such an experiment need not take more than an hour or two to perform, once the equipment has been built and if you have the neutron source.”
— Excerpt, The Making of the Atomic Bomb by Rhodes (1986)
Szilárd wrote to Lindemann at Oxford University, asking him to ship a cylinder of beryllium he had left in a laboratory there. It arrived on February 18th. The neutron source, two grams of radium sealed in a brass capsule, Szilárd rented using funds borrowed from a fellow inventor named Benjamin Liebowitz. It arrived in early March. From there, the process was simple. In a laboratory at Columbia, Szilárd and Zinn used their radium-beryllium source to bombard a piece of uranium with neutrons:
“Everything was ready and all we had to do was to turn a switch, lean back, and watch the screen of a television tube. If flashes of light appeared on the screen, that would mean that neutrons were emitted in the fission process of uranium and this in turn would mean that the large-scale liberation of atomic energy was just around the corner. We turned the switch and saw the flashes. We watched them for a little while and then we switched everything off and went home.”
— Excerpt, The Making of the Atomic Bomb by Rhodes (1986)
As Szilárd later wrote,
“That night, there was very little doubt in my mind that the world was headed for grief”
Szilárd and Zinn roughly estimated the production of neutrons emitted per fission to be two. Szilárd immediately notified his friends Wigner and Teller at Princeton. The latter remembered the moment vividly:
I was at my piano, attempting with the collaboration of a friend and his violin to make Mozart sound like Mozart, when the telephone rang. It was Szilárd, calling from New York. He spoke to me in Hungarian, and he said only one thing: “I have found the neutrons”. — Edward Teller
Although they had found neutrons there, on the seventh floor of Pupin Hall at Columbia University, they had not found a nuclear chain reaction. The result was however significant enough to help Szilárd convince Fermi and his collaborator Herbert L. Anderson (1914–1988) to try a larger experiment, using 230 kg of uranium and a neutron moderator in the form of carbon to slow the neutrons down, maximizing the chance of fission. Szilárd and Fermi collaborated on the design of what would become the first nuclear reactor to maintain a self-sustaining nuclear chain reaction. The design consisted of a pile of uranium oxide blocks interspersed with graphite bricks. Its design is detailed in the publication:
It was published on August 1st, 1939. Fermi and his collaborators went on to achieve the first successful self-sustaining nuclear chain reaction on December 2nd, 1942 at the University of Chicago.
Left: The “Chicago Pile-1” (CP-1), a diagram of the first nuclear reactor to achieve a self-sustaining chain reaction. Right: Szilárd and Norman Hilberry (1899–1986) at the site of CP-1 at the University of Chicago, some years after the war.
Einstein’s Role
Albert Einstein had by the summer of 1939 been involved in the anti-Nazi movement for over six years. Persecuted in Germany, he abandoned his professorship at the University of Berlin in December 1932, months before the enactment of the Berufsbeamtengesetz. As the story goes, Einstein feared for the safety of himself and his family, having been listed in a German magazine as an “enemy of the German regime” with an accompanying illustration marked “not yet hanged” with a $5,000 bounty (Jerome & Taylor, 2006).
Photo of Churchill and Einstein taken at Chartwell during the summer of 1933. From the article “Genius Loves Company” in Airmail
First relocating to the UK, Einstein eventually settled at the newly founded Institute for Advanced Study in Princeton, New Jersey in 1933. From this “quaint and ceremonious village populated by puny demigods on stilts” (his words) Einstein played a pivotal role in the rescue operations that commenced as the persecution of Jewish German scientists intensified. Indeed, it was Einstein who initially motivated Winston Churchill to send his chief scientific advisor Frederick Lindemann (1886-1957) to Germany to help Jewish scholars, including Max Born (1882–1970), find work in England.
The Letter
“Sir: Some recent work […] leads me to expect that the element of uranium may be turned into a new and important source of energy in the immediate future.” — Albert Einstein
Following his confirmation of the viability of a nuclear chain reaction, Szilárd grew even more concerned. That is, he worried that German scientists working under the Nazi regime might attempt to exploit nuclear fission for bomb-making purposes. He concluded, somewhat surprisingly, that among the first on the list to be warned were the Belgian government, because the Belgian Congo was the best source of uranium ore. He came to this conclusion after conferring with his compatriots Teller and Wigner. But who could communicate the warning to the Belgian government? As Rhodes (1986) writes, “it occurred to Szilárd that his old friend Albert Einstein knew the Queen of Belgium”, having met her on a trip to Antwerp in 1929. The three “Martians” (Szilárd, Teller and Wigner) hence concluded that he, by that time a celebrity, would be the most suitable person to warn them.
Einstein and Szilárd’s history went back to when they first met in the early 1920s in Berlin. Einstein had been thoroughly impressed by Szilárd’s doctoral dissertation. The two even designed and patented an Einstein-Szilárd refrigerator pump in 1927, which was later used for the circulation of liquid sodium coolant in nuclear reactors (Robinson, 2015). And so, on July 12th 1939 Szilárd and Wigner got in the latter’s car (Szilárd didn’t own a car, indeed never even learned to drive) and drove to Cutchogue, Long Island where Einstein was vacationing (Rhodes, 1986). Reportedly, the two Hungarians “had no luck soliciting directions to the house”, and at one point, according to Szilárd, “We were at the point of giving up and going back to New York” when “I saw a boy aged maybe seven or eight standing on the curb. I leaned out of the window and I said,
‘Say, do you by any chance know where Professor Einstein lives?’”
The boy knew, and directed them. Szilárd told Einstein about his Columbia experiments, his calculations on chain reactions in uranium and graphite, and the need to warn the Belgian government. Surprised to learn that the great man was not aware of his paper on nuclear chain reactions, Szilárd explained the possibility of generating chain reactions as the result of nuclear fission. Einstein’s reaction was to blurt out
“I never thought of that!”
From his own years of resisting nazism, Einstein was however quick to instinctively share his visitors’ fear that the Nazi’s would use the knowledge to build weapons (Robinson, 2015). He later described Einstein being “very quick to see the implications and perfectly willing to assume responsibility for sounding the alarm even though it was quite possible that the alarm might prove to be a false alarm”. Although hesitant to write the Queen of Belgium directly, Einstein suggested instead contacting an acquaintance who was a member of the Belgian cabinet. Wigner next suggested that the U.S. government should be warned as well, pointing out that the three immigrants were approaching a foreign government without giving the U.S. State Department an opportunity to object.
Reportedly, Einstein dictated a letter to the Belgians and Wigner wrote it down in longhand German. Szilárd drafted a cover letter. Over the next three weeks, the three men went back and fourth with several drafts. At one point an economist named Alexander Sachs (1893–1973) in conversation with Szilárd convinced him that the matters they were discussing first and foremost concerned the White House. He hence proposed that the best thing to do from a practical point of view was to inform the U.S. President Franklin D. Roosevelt (1882–1945) directly. Sachs, having contributed economics texts to Roosevelt’s campaign speeches in 1932, insisted that if Szilárd and Einstein formulated the text, he could communicate it to Roosevelt. Drawing on Einstein’s first draft, Szilárd prepared a new draft of their letter, meant specifically for the President. Szilárd went to see Einstein for his signature on July 30th, this time driven by Teller, who later described being “entered [into] history as Szilárd’s chauffeur”.
Sachs traveled to Washington in October, bringing with him the letter baring Einstein’s name. On October 11th, he presented himself in the late afternoon to Roosevelt in the Oval Office.
“Alex, what are you up to?”
Sachs opened with a story of a young American inventor who wrote a letter to Napoleon (Rhodes, 1986), Robert Fulton (1765–1815), the inventor of the steamboat and submarine. Next, Sach’s tone grew more serious as he cautioned the President to listen carefully. In addition to Einstein and Szilárd’s letter, he read his own summation which he thought more suitable for Roosevelt’s level of understanding of scientific matters. His statement ended with the words:
“Personally I think there is no doubt that sub-atomic energy is available all around us, and that one day man will release and control its almost infinite power. We cannot prevent him from doing so and can only hope that the will not use it exclusively in blowing up his next door neightbor.”
A paragraph to which Roosevelt is said to have responded (Rhodes, 1986):
“Alex, what you are after is to see that the Nazis don’t blow us up”
The rest is history.
The Einstein-Szilárd Letter (August 2nd, 1939)
Sir:
Some recent work by E. Fermi and L. Szilard, which has been communicated to me in manuscript, leads me to expect that the element of uranium may be turned into a new and important source of energy in the immediate future. Certain aspects of the discussion which has arisen seem to call for watchfulness and, if necessary, quick action on the part of the Administration. I believe therefor that it is my duty to bring to your attention the following facts and recommendations:
In the course of the last four months it has been made probable — through the work of Joliot in France as well as Fermi and Szilárd in America — that it may become possible to set up a nuclear chain reaction in a large mass of uranium, by which vast amounts of power and large quantities of new radium-like elements would be generated. Now it appears almost certain that this could be achieved in the immediate future.
This new phenomenon would also lead to the construction of bombs, and it is conceivable — though much less certain — that extremely powerful bombs of a new type may thus be constructed. A single bomb of this type, carried by boat and exploded in a port, might very well destroy the whole port together with some of the surrounding territory. However, such bombs might very well prove to be too heavy for transportation by air.
The United States has only very poor ores of uranium in moderate quantities. There is some good ore in Canada and former Czechoslovakia, wile the most important source of uranium is Belgian Congo.
In view of this situation you may think it is desirable to have some permanent contact maintained between the Administration and the group of physicists working on chain reactions in America. One possible way of achieving this might be for you to entrust with the task a person who has your confidence and who could perhaps serve in an inofficial capacity. His task might comprise the following:
a) to approach Government Departments, keep them informed of the further development, and put forward recommendations for Government action, giving particular attention to the problem of securing a supply of uranium ore for the United States;
b) to speed up the experimental work, which is at present being carried on within the limits of the budgets of University laboratories, by providing funds, if such funds be required, through his contact with private persons who are willing to make contributions for this cause, and perhaps also by obtaining the co-operation of industrial laboratories which have the necessary equipment.
I understand that Germany has actually stopped the sale of uranium from the Czechoslovakian mines which she has taken over. That she should have taken such early action might perhaps be understood on the ground that the son of the German Under-Secretary of State, von Weizsäcker, is attached to the Kaiser-Wilhelm-Institut in Berlin where some of the American work on uranium is now being repeated.
Paul Erdos is to this day remembered as the man who devoted his entire life to mathematics. Living out of a suitcase traveling from university to university, throughout his life he survived off speaking fees and modest endowments from various universities.
19 Jul 2019 — 22 min read Paul Erdős with Béla Bollobás at the University of Cambridge in 1990. Photo: Simons Foundation, 1990.
When asked how to best describe his friend Paul Erdős (1913–1996), mathematician Joel Spencer (1946-) once wrote the following:
“Mathematical truth is immutable; it lies outside physical reality … This is our belief; this is our core motivating force. Yet our attempts to describe this belief to our nonmathematical friends are akin to describing the Almighty to an atheist. Paul embodied this belief in mathematical truth. His enormous talents and energies were given entirely to the Temple of Mathematics. He harbored no doubts about the importance, the absoluteness, of his quest.”
Paul Erdos is to this day remembered as the man who devoted his entire life to mathematics. Living out of a suitcase traveling from university to university, throughout his life he survived off speaking fees and modest endowments from various universities. As a child he could multiply three-digit numbers in his head before the age of four. Before the age of 20, he had reproved Chebyshev’s Theorem that for any n there is a prime number between n and 2n. At 21, he earned simultaneous undergraduate and doctoral degrees in mathematics from the University of Budapest. In his 83 years of life, he published over 1500 academic papers with more than 500 collaborators, making him the most prolific mathematician in history, comparable only with Leonard Euler.
This week’s newsletter is about Paul Erdős, who devoted his life entirely to mathematics.
Early Life (1913-29)
Colourized photographs of Erdős in 1921 (left) and in Louis J. Mordell (1888-1972)’s garden in Manchester, in the 1930s (right)
Paul Erdős was born in Budapest, Austria-Hungary on the 26th of March 1913 to parents Lajos and Anna Erdős. His parents were both high school math teachers, which, in Hungary at the time required them to have Ph.Ds in mathematics. His two older sisters died at ages 3 and 5 from scarlet fever a few days before he was born, and so he grew up an only child. Erdős was home school until the age of 10. His mother Anna tutored him in addition to being the sole provider for the household following Erdős’ father’s capture during World War I. As his mother went to work during the day, a German governess was hired to look after him (O’Connor and Robertson, 2000).
The quintessential child prodigy, Erdős’ fascination with mathematics developed early as he taught himself to read by going through math books that his parents left around the house. By the time he was four, he could calculate in his head how many seconds a person had been alive (Hoffman, 1998). Once, when asked by a friend of his parents how much 250 less than 100 is, the three-year old Erdős is reported to have replied ”150 below zero”, already having discovered negative numbers. Being math teachers, his parents would both encourage Erdős’ talents for mathematics. Already at the age of 16, his father introduced him to infinite series and set theory, which would become lifelong obsessions for him.
In high school, Erdős would regularly solve the problems posed in the Középiskolai Matematikai és Fizikai Lapok (KöMaL), a monthly publication of Math and Physics texts for Secondary Schools. The publication is often credited with a large share of Hungarian students’ success in mathematics in the late 19th and early 20th century (Babai, 2001). As an adult, Erdős would later go on to publish several articles about problems in elementary plane geometry in the periodical, despite its target audience’s median age of 17–18 years old. In 1962, Hungarian-American mathematician László Lovász (1948-) reportedly came across one of Erdős’ articles and was “so enchanted that he read it nearly 20 times”. It was also through KöMaL that Erdős would first meet his long time collaborators Pál Turan (1910–1976) and Tibor Gallai (1912–1992).
At one point during the post-World War I years, Hungary fell under the rule of a right-wing conservative (some say nationalist) admiral named Miklós Horthy (1868-1957). Horthy enacted the first European antisemitic laws similar to those Hitler would introduce in Germany thirteen years later, including limiting the number of Jews that were permitted to study in Hungarian universities. Despite such numerus clausus Erdős, Turan and Gallai all still managed to enter Pázmány Péter University in Budapest on account of their winning national mathematics competitions.
University (1930-1934)
Erdős entered university to study mathematics at the age of 17 in 1930 (Bruno & Baker, 1999). He was awarded his doctorate in addition to an undergraduate degree in 1934 for his dissertation entitled Über die Primzahlen gewisser arithmetischer Reihen (“On Primes in Certain Artithmetic Series”). In his dissertation he proved the existence of prime numbers between n and 2n in certain arithmetic progressions. According to some, his methods of proof were “striking for their elegance”, despite being formulated by a 19-year old. His advisor was Lipót Fejér (1880-1959), the thesis advisor of numerous other Hungarian mathematicians including John von Neumann (1903-57), George Pólya (1887-1985) and Erdős’ friend Turán. Following his graduation, Erdős accepted a post-doctoral fellowship at the University of Manchester working with Louis J. Mordell (1888-1972), who arranged for him to receive a four-year fellowship there (Babai, 1996). Already during this tenure, he travelled widely in the UK, meeting G.H. Hardy (1877-1947) and Stanislaw Ulam (1909-84) in 1934 and ’35, respectively.
His wanderlust was already in evidence. […] From 1934 he hardly ever slept in the same bed for seven consecutive nights”. — Béla Bollobás
Left: Erdos about to sail to America in 1938. Right: Erdos in Princeton in 1939 (from Bollobás, 1998)
As Hitler took Austria in the Anschluss of March 1938, Erdős had to cancel his planned trip home for the spring, only to return briefly during the summer before hastily returning to England, and then America. In the U.S., Erdős accepted a one-year scholarship of $1500 at the Institute for Advanced Study, mingling in Princeton University’s Fine Hall with fellow European refugees the likes of Albert Einstein (1879-1955), Kurt Gödel (1906-78), von Neumann, Oscar Morgenstern (1903-77) and Eugene Wigner (1902-95). Erdős would later remember 1938–39 as his “best year” (Babai, 1996).
However, as Erdős did not conform to Princeton’s standards (reportedly being perceived as “uncouth and unconventional”), he was only offered a six-month extensions to his fellowship and eventually instead took up an invitation from Ulam to visit at the University of Wisconsin-Madison, setting out on what would become a lifelong trek from institution to institution, around the world.
Career (1934–1996)
“In the taxonomy of mathematicians, there are problem solvers and theoreticians“ — Sylvia Nasar
In relation to Nasar’s taxonomy, Erdős was the former (see the Two Cultures of Mathematics by Gowers, 2000 for an in-depth discussion of the differences). Believing strongly in the practice of mathematics as a social activity, most of his papers were written with co-authors. As of the writing of this essay, Erdős had 93,726 citations on his 1,657 publications on Google Scholar, with a reported 511 different collaborators (Oakland University, 2015). However, as his childhood friend once recalled:
I don’t think Erdős actually wrote many papers himself. His handwriting was abominable. Readable, but childlike. — Andrew Vázsonyi
From the documentary “N is a Number: A Portrait of Paul Erdős” by George Paul Csicsery (1993)
Bertrand’s Postulate
In 1845 Joseph Bertrand (1822–1900) conjectured that there is always at least one prime between n and 2n for n ≥ 2. Bertrand himself verified the statement for all numbers in the interval 2 < n < 3,000,000. The conjecture was proved by Pafnuty Chebyshev (1821–1894) in 1852. A simpler proof using the properties of the Gamma function was later provided by Ramanujan in 1919.
At age 19, Erdős in 1932 published his first paper, providing a surprising elementary proof using binomial coefficients and the Chebyshev function ϑ(x). The paper, entitled Beweis eines Satzes von Tschebyschef (“On a proof of a theorem of Chebyshev”) was published in Acta Scientifica Mathematica and constituted the main finding of his doctoral dissertation.
Erdős’ proof considers the middle binomial coefficient:
The binomial coefficient
A lower bound is:
A lower bound for the binomial coefficient in equation 1
Indeed, the binomial coefficient in equation 1 is the largest term in the 2n+1-term sum:
The first part of Erdős’ proof shows that if there is no prime p with n < p ≤ 2n, then we can put an upper bound on the binomial coefficient that is smaller than 4ⁿ / (2n +1) unless n is “small”. This verifies Bertrand’s postulate for all sufficiently large n. The second part deals with small the cases where n is “small”. These are dealt with by hand. For a narration of these cases, see Galvin (2015).
The Prime Number Theorem (1948)
In July 1948, Erdős met Norwegian mathematician Atle Selberg (1917-2007) at the Institute for Advanced Study. From their brief encounter, an elementary proof of the Prime Number Theorem appeared (Babai, 1996). Originally emerging as a consequence of the independent works of Adrien-Marie Legendre (1752-1833), Carl Friedrich Gauss 1777-1855 and J.P. Gustav L. Dirichlet (1805-59), the prime number theorem states:
The Prime Number theorem states that as x goes to infinity, the prime counting function π(x) will approximate the function x/ln(x).
The theorem was famously independently proved by Jacques Hadamard (1865-1963) and Charles Jean de la Vallée Poussin (1866-1962) in 1896 using the Riemann zeta function. Selberg in March 1948 established that the asymptotic formula
Selberg’s formula
Where the Jacobi theta function ϑ(x) is equal to the sum of the log of primes less than or equal to x and O(x) is an upper bound for x expressed in Big O notation. By July, both Selberg himself and Erdős had used Selberg’s formula to prove the prime number theorem. Who of the two proved the result first became somewhat of a priority dispute (Goldfeld, 2004), leading the two to, unfortunately, never collaborate again.
Ramsey Theory
Of Erdős’ most important results, his contributions to the development of Ramsey theory clearly stand out. Ramsey theory is the branch of mathematics concerned with studying the ‘conditions under which order must appear’.
A typical example of a such a problem starts out with a mathematical structure (such as the graph above), which is then cut into pieces. A typical question is “How big must the original structure be in order to ensure that at least one of the pieces has a given interesting property?”
The Erdős-Szekeres Theorem (1935)
The Erdős-Szekers theorem makes precise one of the corollaries of Ramsey theory, namely that given r and s any sequence of distinct real numbers with length at least (r – 1)(s – 1) + 1 contains a monotonically increasing subsequence of length r or a monotonically decreasing subsequence of length s. The result was first shown in Erdős and Szekers’ influential 1935 paper A Combinatorial Problem in Geometry.
The Erdős-Szekeres Theorem (1935)
Any real sequence of at least ad + 1 terms contains either an ascending subsequence of a + 1 terms or a descending subsequence of d + 1 terms.
A subsequence is a sequence that can be derived from another sequence by deleting some of the elements without changing the order of the sequence. For instance, given the sequence ABCD, its subsequences are ABC, BCD, AB, BC, CD, AC, AD, BC and BD. Given r = 2 and s = 2:
ExampleFor r = 2 and s = 2, the formula tells us that any permutation of three numbers has an increasing subsequence of length three or a decreasing subsequence of length two. Among the six permutations of the numbers 1,2,3:• 1,2,3 has an increasing subsequence of all three numbers• 1,3,2 has a decreasing subsequence: 3,2• 2,1,3 has a decreasing subsequence: 2,1• 2,3,1 has two decreasing subsequences: 2,1 and 3,1• 3,1,2 has two decreasing subsequences: 3,1 and 3,2
• 3,2,1 has three decreasing subsequences: 3,2, 3,1, and 2,1.
In their paper Erdős and Szekeres used proof by induction to show that f(n) = (n – 1)² + 1, where f(n) denotes the least integer such that any subsequence of f(n) real numbers must contain a monotone subsequence of length n. Steele (1995) later reviewed six different proofs of the same theorem, including the original proof by Erdős and Szekeres (1935), those by the pigeon-hole principle (Hammersley, 1972; Black, 1971; Seidenberg, 1959), one by one-to-one correspondence (Standon and White, 1986) as well as one that follows from Dilworth’s theorem (1950). The most widely cited proof is likely that of Hammersley (1972). The key idea of this proof by pigeon-hole is to place the elements of the sequence x₁, x₂, …, x_m with m = n² + 1 into a set of ordered columns by the following rules: a) let x₁ start the first column, and b) for i ≥ 1, if xᵢ is greater than or equal to the value that is on top of a column, put xᵢ on top of the first such column, and c) otherwise start with a new column xᵢ (Steele, 1995).
There are two points to notice in Hammersley’s proof. The first is that the elements of any column correspond to an increasing subsequence. The second is that the only time we shift to a later column is when we have an element that is smaller than one of its predecessors. Thus, given k columns in the final construction, one can trace back from the last and find a monotone subsequence of length k. Since n² + 1 numbers are placed into the column structure, one must either have more than n columns or some column of greater height than n. Either way, there must be a monotone subsequence of length n + 1 (Steele, 1995).
Colourized photograph of Paul Erdos from his visit to Chennai in 1975 (Photo: Krishnaswami Alladi)
The “slickest and most systematic” of the proofs, according to Steele, is that which “is naturally suggested by dynamic programming”, presented in a single page by Seidenberg (1959):
Proof of the Erdős-Szekeres Theorem (Seidenberg, 1959)
Given a sequence of length (r – 1)(s – 1) + 1, label each number nᵢ in the sequence with a pair (aᵢ, bᵢ) where:
• aᵢ is the length of the longest monotonically increasing subsequence ending with nᵢ and
• bᵢ is the length of the monotonically decreasing subsequence ending with nᵢ.
Each two numbers in the sequence are labeled with a different pair: if i < j and nᵢ ≥ nⱼ then aᵢ < aⱼ and on the other hand if nᵢ ≥ nⱼ then bᵢ < bⱼ.
But, there are only (r – 1)(s – 1) possible labels if aᵢ is at most r – 1 and bᵢ is at most s – 1. So, by the pigeonhole principle, there must exist a value of i for which aᵢ or bᵢ is outside this range.
If aᵢ is out of range then nᵢ is part of an increasing sequence of length at least rᵢ. If bᵢ is out of range then nᵢ is part of a decreasing sequence of length at least s.
The Happy Ending Theorem (1935)
In 1932, Erdős and Szekeres’ mutual friend Esther Klein (1910-2005) observed that:
The Happy Ending Theorem
Any set of five points in the plane in general position has a subset of four points that form the vertices of a convex quadrilateral.
Points in general position in the plane are points which no three belong to a line. Convex quadrilateral are polygons with four edges and four corners whose interior angles are less than 180°, such as rectangles, rhomboids, trapezoids and parallelograms. The three distinct types of placements of five points in general position in the plane are (Morris & Soltan, 2016):
Any five points in general position in the plane determine a polygon with four edges and four corners whose angles are less than 180°
The problem statement and theorem was one of the first influential results that spurred the further development of Ramsey theory. Erdős called the result the “Happy Ending Problem” because it eventually led to the marriage of Szekeres and Klein. The theorem is a particular case of the more general theorem proved by Erdős and George Szekeres in the same 1935 paper that proved the Erdős-Szekeres theorem of monotonically increasing and decreasing infinite subsequences, namely that:
Erdős & Szekeres’ Generalization of the Happy Ending Theorem (1935)
For any positive integer n, any sufficiently large finite set of points in the plane in general position has a subset of n points that form the vertices of a convex polygon.
Colourized photograph of Erdos with John Selfridge (1927-2010)
The Erdos-Szekeres Conjecture (1935)
While the Erdős-Szekeres theorem (1935) proves the existence of the finite number g(n), in the same paper Erdős and Szekeres also conjectured what the number g(n) is:
The Erdős-Szekeres conjecture
The smallest number of points m for which any general position arrangement contains a convex subset of n points is 2ⁿ⁻² + 1.
The conjecture is known to hold for its known values of g(3), g(4), g(5), g(6). It is trivial to observe that g(3) = 3, i.e. that any three points in the plane that do not belong to a line form a triangle with interior angles less than 180°. Klein’s observation in the happy ending theorem is that g(4) = 5. That g(5) = 9 was first proven by Endre Makai, but first appeared in print in a proof by Kalbfleisch, Kalbfleisch and Stanton (1971). A computer-aided proof that g(6) = 17 was proved by Szekeres & Peters (2006). They carried out a computer search which eliminated all possible configurations of 17 points without convex hexagons. The value of g(n) is unknown for values larger than n = 6, and the Erdős-Szekeres conjecture still remains open.
Recognition
For the results mentioned, and (literally) thousands of others, Erdős was throughout his life recognized as a first-rate mathematician. Although he was never awarded the prestigious Fields Medal, he was awarded several other prestigious awards for his achievements.
Erdos receiving an honorary doctorate from the University of Wisconsin—Madison in 1973. (Photo:
Ali Eminov
)
In 1951 he was awarded the Cole Prize of the American Mathematical Society for his many papers on the theory of numbers, in particular for his 1949 paper On a New Method in Elementary Number Theory which leads to an elementary proof of the prime number theorem, published in the Proceedings of the National Academy of Sciences in 1949. In 1984, he was also awarded the prestigious Wolf Prize in mathematics by the Wolf Foundation in Israel for “for his numerous contributions to number theory, combinatorics, probability, set theory and mathematical analysis, and for personally stimulating mathematicians the world over”. He donated most of the money from the award to the Department of Mathematics at the Technion for the establishment of a memorial fund in the name of his mother Anna (Israel Institute of Technology, 1997).
During the last decades of his life he was awarded at least fifteen (!) honorary doctorates, and became a member of the scientific academies of eight countries, including in his native Hungary, the U.S., UK and Israel.
Personality
He loved to play silly tricks to amuse children and to make sly jokes and thumb his nose at authority. But most of all, Erdős loved those who loved numbers, mathematicians.
— Bruce Schechter (1998)
Over the years, Erdős accrued as much fame for his personality as he did for his mathematics. Described by his biographer Paul Hoffman as “probably the most eccentric mathematician in the world”, famously Erdős developed his only language to accompany his unique nomadic lifestyle, referring to himself as a PGOM, LD, AD, LD, CD, a “poor great old man, living dead, archaeological discovery, legally dead, counts dead”.
Famously, possessions meant little Erdős, who carried most of his belongings in “two half-full suitcases” as he travelled from institution to institution, living with friends and colleagues.
Erdos with two children, whom he in general referred to as ‘epsilons’, an arbitrarily small positive quantity (Photo:
left
and
right
)
Eccentricities
“When I contemplated leaving mathematics to go to the Technical University and become an engineer, Erdős said: “I’ll hide, and when you enter the gate of the Technical University, I will shoot you”. This settled the issue. — Andrew Vázsonyi
Despite his wandering lifestyle, Erdős was famously absentminded, regularly misplacing his passport, wallet and glasses. Characteristically, he would blame his absentmindedness on “the SF”, “supreme fascist”, his own idiosyncratic term for God. In addition to his use of such terms, Erdős is often remembered for his use of aphorisms such as, for instance, “There are three signs of senility. The first sign is that man forgets his theorems. The second sign is that he forgets to zip up. The third sign is that he forgets to zip down.”
Erdős began learning English at age 7 when his father returned from Siberia, where he had learned English to pass along the hours in captivity. Having had no English teacher, his father had however not learned how to pronounce words, and so set about teaching his son English with a strange Hungarian accent. The accent remained one of Erdős’ most characteristic traits.
Anecdotes about Erdős’ cognitive abilities also abound. His childhood friend Andy Vázsonyi told stories of being a teenager when Erdős came to see him in his father’s shoe store in Budapest:
“I was sitting in the back of the shop one day, when Erdős knocked at the door and entered. “Give me a four digit number,” he said. “2,532,” I replied. “The square of it is 6,411,024. Sorry, I am getting old and cannot tell you the cube”, he said. “How many proofs of the Pythagorean Theorem do you know?” “One,” I said. “I know 37,” he replied. “Did you know that the points of a straight line do not form a countable set?” He proceeded to show me Cantor’s proof of using the diagonal. “I must run,” and with that he left.”
— Excerpt, The World’s Most Beloved Mathematical Genius “Leaves” by Andrew Vázsonyi (1996)
Relationships
Erdős met mathematicians Ronald Graham (1935-2020) and his wife Fan Chung (1949-) in 1963. The couple provided (actually, built) Erdős a room in their house that he could live in when he wanted to, when he wasn’t traveling.
Colourized photograph of Erdos with Haans Ludwig Hamburger, Istvan Juhasz and Edoardo Samarini (Photo:
Ali Eminov
)
Among his family members, Erdős’ father had died of a heart attack in 1942, but due to the war, Erdős didn’t learn of it until 1945. His mother survived the prosecution of Jews in Hungary by hiding. Erdős saw her again in 1948 when he went back to Hungary for the first time since before the war. Eventually, as Erdős began moving from institution to institution in America, his mother began traveling with him. He reportedly held her hand as she went to bed each night, right up until her death in 1971 in Calgary where Erdős was giving a lecture.
Drug Use
Starting around the time of the death of his mother in 1971, Erdős fell depressed and began taking medication, first antidepressants then amphetamines in the form of Benzedrine/Ritalin, at a dose of 10 to 20 milligrams per day. As one of the leading scientists in Hungary, he had no trouble finding doctors to prescribe him what he wanted. Around the same time (perhaps unsurprisingly) he also began increasing the number of work-hours of the day, upwards of 19 at the most (Hoffman, 1998).
Worried about his drug use, Graham in 1979 famously bet Erdős $500 that he could not stop taking amphetamines for a month (Hill, 2004). Erdős quit cold turkey and after 30 days won the bet, prompting him to proclaim:
You’ve showed me I’m not an addict. But I didn’t get any work done. I’d get up in the morning and stare at a blank piece of paper. I’d have no ideas, just like an ordinary person. You’ve set mathematics back a month.
After the month break, Erdős promptly resumed his use of amphetamines. According to mathematician Jochanan Schonheim “He took the pills discreetly and never talked about it, though he didn’t keep it a secret either. In the last years of his life, he stopped using those pills because of heart problems. He had a pacemaker at the end.” (Karpel, 2002).
FBI Record
In 2013, Cody Winchester submitted a request under the Freedom of Information Act to the Federal Bureau of Investigations (FBI) to search the agency’s central records system for all information responsive to the request ‘Paul Erdős’. The request was eventually granted on March 27th 2014.
In the unclassified FBI records released, it appears that the FBI had tracked Erdős and his movements for decades, beginning in the 1950s. The conclusion of the investigation appears to have been, as Beryl Lipton later wrote, that 'The FBI spent decades tracking mathematician Paul Erdős, only to conclude that the guy was just really into math'. As Erdős' FBI file states:
*Redacted* advised that subject is one of the top mathematicians in the world, is in the abstract field of mathematics and is purely a mathematician with typical atmospheric mind as related to factual things, that is he is of the genius type who lives within his own mental scope, and it is difficult to know him personally
— From Erdős’ FBI file
Page 21 of Erdős' FBI file (MuckRock, 2015)
Humanitarianism
In addition to his mathematical genius, unusual lifestyle and eccentric personality, Erdős was known to his friends, collaborators and admirers for his compassion for other people.
Erdős teaching Terence Tao, then 10 years old. Tao won the Fields Medal in 2006 for his contributions to "partial differential equations, combinatorics, harmonic analysis and additive number theory". Photo: Billy and Grace Tao (1985)
Erdős generally donated the money he received from awards and other jobs to people in need and various worthy causes. Living off stipends and modest fees from universities and conferences, he used any money left over to fund cash prizes for proofs of problems he found interesting. The prizes ranged from $25 to several thousand dollars. The most famous of these 'Erdős problems' is likely the Collatz conjecture, for which Erdős offered $500 to anyone who comes up with a solution.
In the documentary N is a Number: A Portrait of Paul Erdős, Ron Graham tells the story of a young mathematician who was admitted to Harvard, but whose parents wouldn't agree to pay for him, despite being able to afford it. Having met the student only once, Erdős gave him $1,000, saying "Pay me back if you can. If you can't, please do the same for someone else". Indicative of his principled, compassionate life, shortly before his death in 1996 Erdős renounced an honorary degree from the University of Waterloo over what he considered to be the unfair treatment of his colleague Adrian Bondy (1944-), who was dismissed from his tenured position for accepting a teaching position in France.
Erdős Numbers
As a tribute to his prolific career, Erdős’ collaborators came up with the idea of an “Erdős number”, describing the collaborative distance between Erdős and other researchers, as measured by authorship of published papers. For instance:
Erdős himself has Erdős number 0.
People who co-authored a paper with Erdős have Erdős number 1.
People who co-authored a paper with someone of Erdős number 1 have Erdős number 2 and so on.
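The definition above is just shortest-path distance in the co-authorship graph, which a breadth-first search computes directly. A minimal sketch (the graph is a toy: Graham really did co-author with Erdős, but "Alice" is a hypothetical author added for illustration):

```python
from collections import deque

# Toy co-authorship graph; each edge means the pair share a paper.
coauthors = {
    "Erdős":  ["Graham"],
    "Graham": ["Erdős", "Alice"],
    "Alice":  ["Graham"],
}

def erdos_numbers(graph, root="Erdős"):
    """Breadth-first search: an author's Erdős number is the shortest
    co-authorship distance from Erdős himself."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        author = queue.popleft()
        for coauthor in graph[author]:
            if coauthor not in dist:
                dist[coauthor] = dist[author] + 1
                queue.append(coauthor)
    return dist

# erdos_numbers(coauthors) -> {"Erdős": 0, "Graham": 1, "Alice": 2}
```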
In his lifetime his three most frequent collaborators were András Sárközy (62 papers), András Hajnal (56 papers) and Ralph Faudree (50 papers). His friend Pál Turán wrote 30 papers with Erdős, while Graham and Erdős wrote 28.
The description of an Erdős number was published by Casper Goffman in a 1969 paper entitled And what is your Erdős number? in the American Mathematical Monthly. Later the Erdős number gained prominence as a tool to study how mathematicians cooperate to find answers to unsolved problems. Today, several projects are devoted to studying connectivity among researchers, using the Erdős number as a proxy. For example, Erdős collaboration graphs can tell us how authors cluster, how the number of co-authors per paper evolves over time, or how new theories propagate.
Death (1996)
To find another life this century as intensely devoted to abstraction, one must reach back to Ludwig Wittgenstein, who stripped his life bare for philosophy. But whereas Wittgenstein discarded his family fortune as a form of self-torture, Mr. Erdős gave away most of the money he earned because he simply did not need it … And where Wittgenstein was driven by near suicidal compulsions, Mr. Erdős simply constructed his life to extract the maximum amount of happiness.
— The Economist, 1996
Erdős died in Warsaw, Poland on September 20th, 1996 at 83 years old. He was attending a conference when he had a heart attack (Bruno, 1999). His death was remembered by most of the world's foremost news publications, including obituaries in The Chicago Tribune, The New York Times, The Independent, The Washington Post and many others.
There are two canonical biographies of Erdős' life. The first, heavily referenced in this essay, is the biography:
Hoffman, P. (1998). The Man Who Loved Only Numbers. Hyperion Books.
The other is the documentary film made about Erdős in 1993, entitled N is a Number: A Portrait of Paul Erdős.
In addition, I would recommend to interested readers one article: Paul Erdős just left town by László Babai; the book Erdős on Graphs: His Legacy of Unsolved Problems, written by his friends Chung and Graham; and the more recent Quanta Magazine article A Puzzle of Clever Connections Nears a Happy End by Kevin Hartnett.
References
Babai, L. (1996). Paul Erdos just left town. Available at: https://newtraell.cs.uchicago.edu/research/publications/techreports/TR-2001-11
Bollobás, B. (1998). Paul Erdős and his Mathematics. The American Mathematical Monthly 105(3), pp. 209-237.
Bruno, LC. & Baker, LW. (1999). Math and mathematicians: The History of Math Discoveries around the World. UXL. Detroit, MI.
Erdős, P. (1932). Beweis eines Satzes von Tschebyschef. Acta Scientifica Mathematica 5, pp. 194-198.
Erdos, P. & Szekeres, G. (1935). A combinatorial problem in geometry. Compositio Mathematica 2. pp. 463–470.
Galvin, D. (2015). Erdos’ proof of Bertrand’s postulate. Working paper. Available at: https://www3.nd.edu/~dgalvin1/pdf/bertrand.pdf
Goldfeld, D. (2004). The elementary proof of the prime number theorem: an historical perspective. In Chudnovsky, D., Chudnovsky, G. & Nathanson, M. (eds.), Number Theory (New York, 2003). New York: Springer-Verlag. pp. 179-192.
Goffman, C. (1969). And What is Your Erdos Number? American Mathematical Monthly 76 (7) pp. 791.
Gowers, T. (2000). The Two Cultures of Mathematics. In V. I. Arnold; Michael Atiyah; Peter D. Lax; Barry Mazur (eds.). Mathematics: Frontiers and Perspectives. American Mathematical Society.
Hoffman, P. (1998). The Man Who Loved Only Numbers. Hyperion Books.
Israel Institute of Technology, 1997. The Anna and Paul Erdos Post-Doctoral Fellowship. Available at: http://jeffe.cs.illinois.edu/compgeom/files/erdos-postdoc.html
Karpel, D. (2002). A Beautiful Mind. Haaretz. Available at: https://www.haaretz.com/1.5324065
Oakland University, 2015. Erdos Number Project Data Files. Available at: http://www.oakland.edu/enp/thedata/
O’Connor, JJ. & Robertson, EF. (2000). Paul Erdős. MacTutor History of Mathematics archive. School of Mathematics and Statistics, University of St Andrews, Scotland. Available at: http://www-history.mcs.st-and.ac.uk/
Schechter, B. (1998). My Brain Is Open: The Mathematical Journeys of Paul Erdős. p. 17.
Seidenberg, A. (1959). A simple proof of a theorem of Erdős and Szekeres. Journal of the London Mathematical Society 34. pp. 352
Vázsonyi, A. (1996). Paul Erdos, The World’s Most Beloved Mathematical Genius “Leaves”. Pure Math. Appl. 7. pp. 1–12.
This post presents a novel method for approximating 1st roots of cubic polynomials that avoids the lengthy gradient and height calculation iterations associated with Newton’s and some of my own previously posted methods.
It particularly addresses the task of approximating cubic polynomial roots in ‘difficult to get to’ locations – namely where root gradients are low and close to turning points. Such ‘architecture’ would normally require two or more iterations to solve.
This method exploits the amazing symmetry of cubic polynomials and their ‘component architecture’ by ‘rotating’ roots into segments where the underlying quadratic architecture approximates the cubic curvature.
This means we can use the simpler quadratic math to calculate a root!
This post assumes knowledge of polynomials at the high school level.
Note: In the interests of read time I’ll skip basic calculations where evident.
Background
In a previous post dealing with quadratic polynomials, Maths is Graphs — A Visual Perspective, I presented an ‘X-ray view’ of how polynomials are constituted from their component parts.
This showed graphically how the three separate building blocks, ax², bx and c, form the function y=ax²+bx+c, and importantly, how these sum to zero at the roots.
Furthering this recognition that any polynomial is simply the sum of its parts at any point x, there is nothing to prevent us exchanging or sharing parts from one component group with another — provided the overall total doesn’t change.
For example, consider the cubic: y=Ax³+Bx²+Cx+D
As long as the total function remains the same, we can reconfigure this as:
y=Ya+Yb, where: Ya=(A-l)x³-mx²-nx-h and
Yb=lx³+(B+m)x²+(C+n)x+D+h
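As a quick numerical check of this reconfiguration (the values of l, m, n, h below are arbitrary choices of mine, not from the post):

```python
# Splitting y = Ax³ + Bx² + Cx + D into components Ya + Yb; any choice
# of the split parameters l, m, n, h leaves the overall total unchanged.
A, B, C, D = 1.0, 6.0, -1.0, -30.0
l, m, n, h = 0.5, 2.0, 1.0, 3.0

def y(x):  return A*x**3 + B*x**2 + C*x + D
def Ya(x): return (A - l)*x**3 - m*x**2 - n*x - h
def Yb(x): return l*x**3 + (B + m)*x**2 + (C + n)*x + D + h

# The components sum back to the principal function at every x.
for x in (-2.0, 0.0, 1.5):
    assert abs(y(x) - (Ya(x) + Yb(x))) < 1e-9
```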
By managing these component functions we can create intercepts or ‘Nodes’ with the principal function, which can be used as platforms for Newton’s and other approximation methods.
This post furthers this methodology by using quadratic coefficient transfers (B±m) to flatten the quadratic parabola to better simulate the principal function profile in a particular segment.
Root Rotation
Refer to Graph 1 below, of a cubic polynomial y=-x³-3x²+4x+10 (blue) with its component functions Ya=-x³ (dashed green), and Yb=-3x²+4x+10 (dashed black).
It can be seen that the quadratic component’s Turning Point J=(0.67, 11.33) and the Inflection Point (Xip,Yip) span a segment where the quadratic and cubic curves are closely aligned.
Rb1 is the intercept of the quadratic segment Yb=-3x²+4x+10 and a line, 'X Axis Flipped', y=8, which is the mirror of the X-axis about Yip=4 (i.e. y=2Yip).
Rotation
Given the close alignment of the cubic and quadratic curves, if the cubic function is visually rotated 180 degrees about the Inflection Point (Xip, Yip), Root Rb transposes very closely onto the quadratic intercept Rb1.
This means Root Rb can be approximately represented by intercept Rb1's quadratic math, i.e. we can use the standard Quadratic Equation to treat the Rb1 intercept as a root with y=8 as its X-axis.
Note: Rotation is only required when Rb1 and Rb are on opposite sides of Yip.
Graph 1
Quadratic math
Consider the cubic polynomial y=Ax³+Bx²+Cx+D and the bracketed part of the formula below, which is basically the standard Quadratic Equation of component Bx²+Cx+D solved against the flipped axis, i.e. with constant D-2Yip.
The unbracketed term -2B/3A=2Xip simply shifts the selected root x result across Xip to Root Rb's x-coordinate.
Hence:
Approx. Root Rb=2Xip-Rb1 where Rb1 is the selected intercept nearest the inflection point (Xip, Yip).
This is calculated as:
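Reading the surrounding description, the calculation amounts to applying the standard Quadratic Equation to the component Bx²+Cx+D against the flipped axis, which checks out against Example 1 below:

```latex
R_{b1} = \frac{-C \pm \sqrt{C^{2} - 4B\,(D - 2Y_{ip})}}{2B},
\qquad
\text{Approx. Root } R_b = 2X_{ip} - R_{b1}
```

with the sign chosen so that Rb1 is the intercept nearest the inflection point (Xip, Yip).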
The following example will demonstrate the simplicity of Root Rotation before I introduce improvements by changing the component functions.
Example 1
Referring again to y=-x³-3x²+4x+10
Calculate Xip and Yip as follows:
dy/dx=-3x²-6x+4
d²y/dx²=-6x-6
Hence: Xip=-1 giving Yip=+1-3-4+10=+4
Using the formula:
Root Rb=-1.62 compared with actual Rb=-1.6.
Roots 2 and 3
The remaining two roots are simply ‘downloaded’ from the ‘Extended Quadratic Equation’ I presented in Cubic Polynomials-A Simpler Approach.
Where factor L=-Root Rb=1.6; A=-1, B=-3 and D=10, giving roots Ra=-3.3 and Rc=1.9.
This example shows that the method is particularly effective with roots near turning points.
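As a sketch (my own reading of the method, not the author's code), the whole procedure of Example 1 fits in a short Python function:

```python
import math

def root_rotation_approx(A, B, C, D):
    """Approximate one real root of A*x³ + B*x² + C*x + D by solving
    the quadratic component against the flipped axis y = 2*Yip, then
    reflecting the chosen intercept across the inflection point."""
    x_ip = -B / (3 * A)                                  # inflection point x
    y_ip = A*x_ip**3 + B*x_ip**2 + C*x_ip + D            # inflection point y
    # Standard Quadratic Equation applied to Bx² + Cx + (D - 2*Yip)
    disc = C*C - 4*B*(D - 2*y_ip)
    r1 = (-C + math.sqrt(disc)) / (2*B)
    r2 = (-C - math.sqrt(disc)) / (2*B)
    # Select the intercept Rb1 nearest the inflection point
    rb1 = r1 if abs(r1 - x_ip) < abs(r2 - x_ip) else r2
    # 'Rotate' the intercept across Xip to land on Root Rb
    return 2*x_ip - rb1

# root_rotation_approx(-1, -3, 4, 10) -> about -1.61 (actual root -1.6)
```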
Where the roots are closer to the Inflection Axis Yip, it becomes necessary to manage the components to improve curve matching. We'll discuss changes in the next section.
Example 2
Refer to Graph 2 below, showing function y=x³+6x²-x-30 with the quadratic component intercept K=(-1.65, -12), the line y=-12 passing through the Inflection Point (Xip, Yip)=(-2, -12).
Intercept Rb1 of the quadratic and X Axis Flipped y=-24 approximates Root Rb after rotation.
Solving for Root Rb:
Hence: Root Rb=-3.08 compared with actual Rb=-3.
Managing the ‘Architecture’
While both examples have returned satisfactory results with minimal effort, compared with Newton’s and other approximations, improved accuracy can be achieved by reducing the gap between the quadratic and cubic curves in the segment between constant D and the inflection point (Xip, Yip).
This can be achieved by flattening the quadratic curve Yb=(B-m)x²+Cx+D, deducting m from the coefficient of x², where m=Sqrt(|Xip|), as detailed in the next example.
Example 3
Referring to Graph 3 below, and returning to y=x³+6x²-x-30 with Gap Xip-Intercept Di=-2+1.65=-0.35.
We can reduce this gap by letting m=Sqrt(|Xip|)=Sqrt(2)=1.414.
This flattens the quadratic so that gap Xip-Intercept Di reduces from -0.35 to gap Xip-Intercept Ei=-2+1.87=-0.13 and moves intercept Rb1 from -0.92 to -1.04.
The 'flattened' quadratic Yb=(6-1.414)x²-x-30 is shown dashed red and component Ya=x³+1.414x² in dashed green.
Hence:
Approx. Root Rb=2Xip-Rb1=-4+1.04=-2.96 compared with actual Root Rb=-3.
Graph 3
This process can be conveniently packaged into the above formula to present the novel 1st Root Approximation Formula as follows, where (B-Sqrt(Xip)) replaces coefficient B:
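Packaged as a sketch in Python (my reading of the formula, using Sqrt(|Xip|) since Xip can be negative):

```python
import math

def root_rotation_flattened(A, B, C, D):
    """Root-rotation approximation for one real root of
    A*x³ + B*x² + C*x + D, with the quadratic 'flattened' by
    deducting m = sqrt(|Xip|) from coefficient B before solving."""
    x_ip = -B / (3 * A)                                  # inflection point x
    y_ip = A*x_ip**3 + B*x_ip**2 + C*x_ip + D            # inflection point y
    b = B - math.sqrt(abs(x_ip))                         # flatten the parabola
    # Quadratic Equation against the flipped axis y = 2*Yip
    disc = C*C - 4*b*(D - 2*y_ip)
    r1 = (-C + math.sqrt(disc)) / (2*b)
    r2 = (-C - math.sqrt(disc)) / (2*b)
    rb1 = r1 if abs(r1 - x_ip) < abs(r2 - x_ip) else r2  # intercept nearest Xip
    return 2*x_ip - rb1

# root_rotation_flattened(1, 6, -1, -30) -> about -2.96 (actual root -3)
```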
Rotating Roots into a managed ‘architecture segment’ is a relatively quick and accurate method of approximating a 1st root of a cubic polynomial by use of an amended Quadratic Equation and ‘rotational imagination’.
Care does need to be taken to get the sign right when allocating Root Rb1 to Root Rb.
In cases like Example 1, where the turning point is close to the roots, it can be highly accurate where normally two or more iterations of Newton's method would be required. However, given the ease of deducting Sqrt(|Xip|) from coefficient B, it is recommended this be the norm when using the method.
I hope I have demonstrated how reconfiguring the 'LEGO' blocks of functions can be effective in approximating roots of cubic polynomials, and another way to make math work for you; not you for it!
Tensors Are The Facts Of The Universe — Lillian Lieber
As the popular saying goes, necessity is the mother of invention — and when it comes to math, nature never ceases to provide us with a source for this necessity. Any experienced engineer can confirm: a very real necessity for engineers of all disciplines is to precisely model physical objects & quantities, regardless of perspective or frame of reference.
Assuming you're familiar with nothing further than pre-calculus, your mental model for graphing, & in general geometrically representing objects, goes to the good ole' Cartesian Coordinate system. With its ever-familiar X & Y axes that intersect at an origin, this system has provided the foundation for mathematically representing space up to this point. Even in early sciences like Newtonian physics, we apply linear motion equations to observe the path of a thrown ball or shot projectile within this convenient space. And staying within the confines of this theoretical, orthogonal space serves students very well as a learning tool.
However, these perfectly-symmetrical, centered Cartesian Coordinates are theoretical learning tools — they’re of little use when it comes to real objects.
When mathematicians, engineers or physicists model an object in real life, there are a handful of undeniably fixed properties that are independent of any type of coordinate system. To show this, let’s start simple, by describing the theoretical temperature (K) at a certain point (P) in a square room represented by an XY-plane. Below, we represent this same point (P) using two different origins, or coordinate systems:
Physical Quantities Mean The Same Thing In Different Planes
Regardless of the coordinate system we use to represent our point (P), the physical expression of the object, temperature, in this case, should be the same. But let’s take that thought experiment further with a physical object that has both direction & length, say, acceleration. Now let’s imagine we drop some object (K) from the top of a building; when K is dropped, the acceleration vector (A) acts on it. Using the same principle as the previous example, we can also graph this scenario in multiple ways:
The Position & Position Components Change, But Not The Combination
The position variables of (K), represented by (x,y) pairs, change, but the acceleration vector (A) does not. The physical quantity of acceleration expresses the same meaning independent of our coordinate system.
We can extend these examples to higher dimensions & the point remains the same: mathematicians & engineers need a way to geometrically represent physical quantities & understand how they behave under different coordinate systems.
With this necessity, eventually arrived the concept of tensors, the crux of this article. What follows is not the most strictly-accurate mathematical definition, but rather an intuitive ramp-up that serves as a great starting point for the rest of this light introduction:
Tensors are mathematical objects that are invariant under a change of coordinates & have components that change in predictable ways
Tensor analysis & its follow-up, tensor calculus, revolve around these types of IRL objects. From stress to conductivity to electromagnetism, tensors are inevitable in the higher branches of STEM. Quite famously, & a personal motivator for learning tensors, Einstein's (yes, that Einstein) General Relativity equation is written exclusively in terms of tensors. Again, there's a more accurate, abstract definition that we'll mention towards the end, but this will give us a starting-off point as we walk through the basics of tensor math.
Before we move on, however, let's discuss what tensors are not. Unfortunately, most advanced math is skipped over until it's a need-to-have in the engineering toolkit; this, in turn, creates further confusion as each field introduces tensors with slightly different vernacular. Among other answers, if you Google "what are tensors" you'll likely come across the following:
— Tensors are containers of data
— Tensors are generalized matrices
Both of these have some truth to them but they're unequivocally incomplete. Using a mental model of storage does capture the notion that tensors store a makeup of components, but it overlooks a key principle: tensor components follow specific behaviors under linear transforms. On the second definition, it's true that the rest of this walk-through will include columns, rows, & matrices, but these are merely ways of spatially organizing numbers; they are the tools of tensors, yet they underwhelm in capturing what tensors are.
With that out of the way, it’s time to break out some math & walk-through the very beginnings of tensors.
As you'll shortly see, an introduction to tensors typically includes reviewing objects that most are already familiar with (rows, columns & matrices); this often leads to students glazing over or speed-reading through these reviews, which is a grave error. The field of tensors assigns new meanings to these objects & also introduces a plethora of new notation, both factors which greatly contribute to the confusion among newcomers.
In general, be warned: while the next few sections appear to be a 101 linear algebra review, they're most certainly not. We need to work our way up to tensors, & in order to do so, instead of overwhelming by introducing all new notations, rules & meanings at once, we'll bring them in piece-by-piece as we review familiar tools.
Scalars & Nomenclature
We can start by re-visiting the most simple example of a physical quantity: a scalar. As I’m sure you’ve heard numerous times, a scalar is simply a number with no direction, only magnitude — but are all scalars tensors? No.
Recall that the tool, a scalar, is not the definition. Temperature & magnitude? Yes, these scalars qualify as (0,0)-tensors or rank-0 tensors (we'll formally define these notations later) since they represent the exact same meaning from all frames of reference (coordinate systems / basis vectors). But here's an example of a scalar that is not a rank-0 tensor: light frequency. Why? Because your measurement is dependent on your frame of reference: whether you're moving toward or away from the light changes the frequency measured. With this deeper clarification on what qualifies as a tensor & an informal introduction to one of the common notations ("rank-x tensor"), we're good to move on to vectors.
Re-visiting Vectors
Our first real look at a special type of tensor is an object most are familiar with: good ole' vectors. Vectors are a unique type of tensor (this will become clearer later) written strictly as column vectors of n-dimensions. Additionally, again, using notation that'll become much clearer later, they're also written as (1,0)-tensors or rank-1 tensors.
Let's make sure we're clear on a few concepts when it comes to vectors. First, if you're not familiar or you forgot the terminology, we're going to introduce the term basis vector. An intuitive way of thinking of basis vectors is to consider them as the equivalent of the x, y, z, etc… axes. This property of the vector basis is exactly why we can denote vectors as scalars multiplied by the individual basis vectors as follows: v = 5i + 6j + 4k. Here, i, j & k are the basis vectors, though we can switch these out.
Instead of using different letters (i,j,k), we can abstract this property out to any n-number of dimensions & simply replace each letter with e followed by a subscript: 5e1 + 6e2 + 4e3. Simplifying a step further, we realize that for any vector of n dimensions, instead of writing out each scalar & basis vector term, we can compactly write the vector as a sum. In general, we can then represent vectors with sum notation: the sum of all scalars multiplied by their basis vectors:
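In symbols, the compact sum form described above is standardly written as:

```latex
\vec{v} \;=\; \sum_{i=1}^{n} v^{i}\,\vec{e}_{i}
```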
Originally Published: https://www.setzeus.com/public-blog-post/a-light-intro-to-tensors
Moving forward, it's very likely that you'll come across similar notation, as this is one of the two standard notations for vectors in Tensor-land.
Transforms
We've mentioned that a key property of tensors is that their meaning, the combination of the components, is invariant to coordinate changes, or transforms. So what exactly does this mean? Let's start with some vector (v) laid out on the same space yet defined by two different sets of basis vectors, original (e1,e2) & alternative (~e1,~e2):
As we can see above, our original basis vectors (blue) line up to represent our comfortable, orthogonal coordinate system, while the alternative basis vectors (teal) create a new, non-orthogonal vector space. The key takeaway here is that the vector (orange) does not change under a change of basis. Before we analyze the components of v in both spaces, let's first walk through the math of how the basis vectors transform from (e1,e2) to (~e1,~e2).
Forward Transform
Let's ignore the vector v for a moment & just focus on the two sets of basis vectors: how can we mathematically represent our alternative (~e1,~e2) basis vectors in terms of (e1,e2)? As it turns out, this is a straightforward process; all we have to do is define each alternative basis vector as a sum of scalars multiplied by our original basis vectors (e.g. ~e1 = ae1 + be2 & ~e2 = ce1 + de2). Shown below, all we did to calculate our scalars (a,b,c,d) is manually grab our basis vectors (e1,e2) & scale them appropriately. Originally both 130px in length, we scaled & transformed our original basis vectors until they intersected with the alternative basis vectors, & wrote them down as fractions with a common denominator of 5 (this is all in pixels, but the units are irrelevant):
Count The Squares To Double-Check
As seen on the bottom right of the image above, the result is a matrix with n columns; these columns tell us how to move forward from our original basis to an alternative basis. Appropriately, this matrix is commonly described as a forward transform; we'll denote it with a capital, bold F moving forward (another common representation is a capital S). But what if we wanted to go in reverse? From the alternative vector basis to our original?
Backward Transform
As the name suggests, there is a related transform, the Backward Transform (B), that does the exact opposite of the Forward Transform (F); and whenever we think inverse when working with matrices, the identity matrix (I) should come to mind. As implied, the basis vector Forward & Backward Transforms are related as follows: BF = I. We could manually measure out our original basis vectors as sums of our alternative basis vectors, like we did above. Instead, however, let's double-check our intuition by continuing our example algebraically: we have F & I, so let's work through the steps to arrive at the Backward Transform (B):
Sorry But…Algebra Left To The Reader
With four variables & four equations, the algebra above is tedious, but simple. Note, however, that we again introduced new notation: instead of using random letters as our variables for the components of the backward transform, we're using the letter "B" (for "backward") with both subscripts & superscripts. This is a common trip-up when learning about tensors: the superscripts are not power symbols, they indicate the component's position within the matrix. In fact, moving forward, for the rest of this guide you can safely assume that all superscripts indicate position, not exponents. This notation will become clearer & more powerful as we move on, but for now it's worth noting that the superscripts (above "B") indicate columns while the subscripts refer to rows.
Working through the algebra, we arrive at the four components on the right that together create the Backwards Transform. Beautiful. With both of these transforms we can seamlessly switch from our original basis vectors to our alternative basis vectors; we can summarize this relationship with the following ~fancy~ sum notations:
To transform from our original basis to our alternative basis, we multiply the original with the Forward Transform; to transform from our alternative basis to our original basis, we multiply the alternative with the Backward Transform.
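This relationship is easy to check numerically. A minimal sketch (the matrix below is a hypothetical forward transform, not the article's pixel-measured one):

```python
import numpy as np

# Hypothetical forward transform F: column i holds the alternative
# basis vector ~e_i expressed in the original basis (e1, e2).
F = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# The backward transform undoes F, so it is simply the inverse: BF = I.
B = np.linalg.inv(F)

assert np.allclose(B @ F, np.eye(2))  # backward then forward = identity
assert np.allclose(F @ B, np.eye(2))  # forward then backward = identity
```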
Vector Components Under Transform
Excellent, we know how the basis vectors transform; now let's break down how the vector components transform. How do the components of our example vector (v) differ under the two different sets of basis vectors (e1,e2 & ~e1,~e2)? How do they behave during the transform? Below, we again show our same vector in the two different spaces, this time though, we're highlighting the vector components as sums of their respective basis vectors:
V = [.5,.5] (except as a column since it’s a vector)!
Immediately it's clear that the vector components ((1/2)*e1,(1/2)*e2) & (c*~e1,d*~e2) are different, but we already know how they transform, right? The Forward Transform worked for the basis vectors, so let's simply apply the same transform to the components of the vector in the original basis space:
Applying Forward Transform To The Components…Something Went Wrong.
Something went wrong here. As we can see above, transforming the original component vectors with the Forward Transform did not return the same vector — the image on the right shows that we instead arrived at some new vector. To understand why this is, let’s carefully observe the movement of the component vectors through the transform:
Watching just the components of the vectors during our transform highlights something interesting: the components move in the opposite direction. They move "against" the basis transform, or, in more appropriate nomenclature, they're contravariant. We'll re-visit this term continuously as it's a critical part of the Tensor dictionary; for now, all we've shown is that the components of a vector under a basis transformation transform in the opposite direction. We'll quickly double-check this by now multiplying our original vector components by the Backward Transform (instead of the assumed Forward Transform):
Finally, as we can see above, our original vector components transformed with the Backwards Transform spat out the same vector, but now in terms of our alternative basis vectors: v~e = (15/22)~e1 + (10/22)~e2.
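The same contravariant behavior can be sketched with a hypothetical basis change (the numbers are illustrative, not the article's):

```python
import numpy as np

# Hypothetical basis change: columns of F are the alternative basis
# vectors written in original coordinates.
F = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.linalg.inv(F)

v_old = np.array([0.5, 0.5])   # components of v in the original basis
v_new = B @ v_old              # components transform with B: contravariant

# The geometric vector itself is unchanged: components times basis
# vectors give the same arrow in either basis.
e_old = np.eye(2)   # original basis vectors as columns
e_new = F           # alternative basis vectors as columns
assert np.allclose(e_old @ v_old, e_new @ v_new)
```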
We've already covered a decent amount of new vocabulary & notation; we'll summarize it below before we move on to our next example of a special tensor: the covector.
Section Review
Note*: From Wiki to Wolfram, the formal, correct definitions for each of the concepts below & in following reviews are a Google search away; for effectiveness, I'm introducing these terms in a much more informal/beginner-friendly manner, but still highly encourage looking them up.
Basis Vector: the independent x,y,z…n axes for a given vector space. These basis vectors, usually denoted by the letter e, are what allow us to represent vectors as a sum of scalars multiplied by basis vectors.
Vectors: a special type of tensor represented strictly as columns & written as an aggregate sum of scalars multiplied by n basis vectors. Also known as contravariant vectors & (1,0)-tensors.
Sum Notation: a compact way to write vectors as an aggregate sum of scalars multiplied by n basis vectors.
Forward & Backward Transforms: the transforms between an old set of basis vectors & a new one (or vice versa), represented by matrices denoted F/S (Forward) & B/T (Backward). The product of the Forward & Backward Transforms is the identity matrix I:
Contravariant Transformation: how the components of a vector transform under a change in basis; easy to remember by the name, they change contra/against the forward transform, which is reflected in the formula above.
Covectors
Alright, admittedly the previous section was mainly a review with a dash of new topics. Unfortunately, the learning curve steepens in this next section as it introduces an entirely new special class of tensors with rather ambiguous terminology. For example, apart from "covector," the objects we'll cover here are also known as: (0,1)-tensors, 1-forms, covariant vectors, dual vectors, linear functionals & functions. To assuage this learning curve, we'll again start with application & examples, & eventually trend towards an abstract, but accurate understanding.
Reasonably, the lowest-hanging branch of conceptualization when it comes to covectors is the tool of choice: rows. In Tensor-land, vectors are strictly written as columns & co-vectors are strictly written as rows.
If you have some linear algebra background, it’s worth giving the disclaimer: transposing columns & rows is not allowed. Why? Because in real life basis vectors are almost never orthogonal; switching column & row vectors conveys the same meaning & returns the same vector only in the special case of an orthonormal basis.
Fine, so we know how covectors are written, but what exactly are they? Let’s first dissect the technically correct, albeit abstract, definition of a covector/covariant vector/dual vector/1-form/(0,1)-tensor:
Given a vector space 𝑉, there is a “dual” space 𝑉∗ which consists of linear functions 𝑉→𝔽 (where 𝔽 is the underlying field) known as dual vectors. Given 𝑣∈𝑉,𝜙∈𝑉∗, we can plug in to get a number 𝜙(𝑣).
I struggled tremendously with this definition as it’s densely packed with abstraction, so let’s go through it piece-by-piece. Ignore the first clause (we’ll circle back) & go to “consists of linear functions 𝑉→𝔽 (where 𝔽 is the underlying field) known as dual vectors.” A really simple translation here is: a dual/co/covariant vector/1-form is a linear function that “eats” a vector (from V) as an input & returns some scalar (in 𝔽); the notation usually looks something like this: 𝜙(v) = c. Returning to the first clause, these linear functions do not belong to our vector space, but to a separate space known & written as the “dual space” V* (the resultant scalars simply live in the underlying field 𝔽). Hopefully that reduces some confusion about what covectors are:
Linear functions represented by rows in an array that input normal vectors & output scalars.
These linear functions do not live in our vector space V but rather in a separate space which we’ll refer to as the “dual space” V*. We visualize normal (or contravariant) vectors of n dimensions as arrows with direction & magnitude in our vector space — so how do we visualize covectors?
We can gain a visual understanding by drawing out an arbitrary covector (𝜙 = [2 1]) acting on our example vector space, not on any specific vector just yet:
Covector [2,1] Is Best Visualized As A Set Of Lines
All we’re doing above is exploring how an example covector (𝜙 = [2 1]) would look when it spits out varying scalars. This is why referring to “covectors” as linear functions makes sense: they’re best visualized by a series of lines (2D), planes (3D) or hyperplanes. Instead of inserting any one specific vector into our covector (𝜙 = [2 1]), we drew out the general covector. To see how this specific covector (𝜙) interacts with our orange example vector (v), let’s now lay it over our vector:
𝜙(V) = # Of Lines Crossed
Our diagram now represents our covector (𝜙) with the input of our vector (v): 𝜙(v). Highlighted in yellow, the resultant scalar from 𝜙(v) is the number of lines crossed; in this specific example, we can count that our vector (v) crosses approximately ~1.5 lines, so we can say that 𝜙(v) = 1.5. To further our visual understanding, we can double-check this algebraically:
𝜙(V) = 1.5
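If it helps, the same check can be run as a few lines of code. This is a minimal sketch in plain Python (my own, not from the article), with the covector written as the row [2, 1] & the vector as (0.5, 0.5):

```python
# A covector acting on a vector: multiply matching components and
# add the results up -- a row "eats" a column and returns a scalar.
def apply_covector(phi, v):
    """Apply covector phi (a row) to vector v (a column): sum of phi_i * v_i."""
    return sum(p * c for p, c in zip(phi, v))

phi = [2, 1]    # our example covector
v = [0.5, 0.5]  # our example vector
print(apply_covector(phi, v))  # 1.5 -- matching the lines-crossed count
```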
We’ve now proven both algebraically & visually that 𝜙(v) = 1.5. With this experience under our belts, let’s re-visit the technical definition for a covector:
Given a vector space 𝑉, there is a “dual” space 𝑉∗ which consists of linear functions 𝑉→𝔽 (where 𝔽 is the underlying field) known as dual vectors. Given 𝑣∈𝑉,𝜙∈𝑉∗, we can plug in to get a number 𝜙(𝑣).
Let’s now switch our focus over to the “dual space V*” part of the definition. When it comes to vectors in a vector space V, we can write them out as sums of scalars & basis vectors: v= ae1 + be2 + ce3…so what’s the equivalent of writing out our basis covectors in the dual space V*? How do we derive them & what do they look like? Once we know this, we can finally explore how covector components behave under a change in basis.
The Dual Basis | Basis Covectors
Just like we have two basis vectors (e1,e2) & two alternative basis vectors (~e1,~e2), we can also write out our covector as an aggregate sum of scalars & basis covectors (also commonly known as the dual basis); instead of e & ~e, we’ll denote our dual basis components with epsilon (ϵ1,ϵ2) variables. First, as a quick prerequisite & sneak peek, we’ll need to learn about the famous Kronecker Delta.
Kronecker Delta
The Kronecker Delta is a special, compact function that tells us how vector & covector bases interact over the same index (it’s okay if this terminology is still a bit unclear):
Originally Published: https://www.setzeus.com/public-blog-post/a-light-intro-to-tensors
The above tells us that whenever we take the dot product of a basis vector & a basis covector, if they share the same index the result equals 1; otherwise, it equals zero. It’s critical to note that up to this point we’ve strictly used subscripts (lowered indices) on our e & ~e basis vectors; this is because in Tensor-land:
Basis vectors (for contravariant vectors) are written with subscripts, or lowered indices, while basis covectors (for covariant vectors) are written with superscripts, or raised indices.
In the formula above, the superscript i refers to indices of basis covectors while the subscript j refers to indices of basis vectors. We can immediately apply the Kronecker Delta to help us derive the basis covectors (ϵ1,ϵ2). Take a moment to understand exactly what the KD implies: every basis vector & basis covector pair with the same index evaluates to one, which means a covector line is interacted with at a single tangential point (though not always strictly perpendicular), as in the breakdown of our original basis below:
Our basis vector & dual basis
The diagram above breaks down our covector into its respective dual basis; following the Kronecker Delta, we can see that each basis vector/covector pair is perpendicular & that they combine to represent our dual basis: B = [1 1]ϵ or ϵ1 + ϵ2.
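For the programmatically inclined, the Kronecker Delta is simple enough to sketch in a few lines of Python (my own illustration, not the article’s):

```python
# The Kronecker Delta: 1 when the basis-covector index i matches the
# basis-vector index j, 0 otherwise.
def kronecker_delta(i, j):
    return 1 if i == j else 0

# eps^i(e_j) = delta^i_j: each basis covector returns 1 on its paired
# basis vector and 0 on every other one.
for i in range(1, 3):
    for j in range(1, 3):
        print(f"eps{i}(e{j}) = {kronecker_delta(i, j)}")
```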
Covectors As A Sum
Great. But what about representing our specific example covector 𝜙? Our previous way of writing it as a row [2 1]ϵ still works; or, we can write it in a similar sum notation that we used for vectors — multiplying the dual basis by their appropriate scalars: 𝜙 = [2 1]ϵ =2ϵ1 + 1ϵ2. If we draw out both scalar/covector multiples, we can see that we indeed arrive at the graph of lines from a few paragraphs above:
Our covector 𝜙 = 2ϵ1 + 1ϵ2 expressed in its components
Now with a basic grasp on covectors, their output, basis covectors & the dual space, we can finally turn our attention to the real topic du jour — how do covectors & their components behave under a transform?
Covectors Under Transform — Covariant Components
In the previous section, we discovered that while basis vectors transform one way (with the Forward Transform), the underlying vector components transformed in a contravariant manner (with the Backward Transform); predictably, as the title & etymology implies, covectors transform in the opposite way:
Covector components transform with (or covariantly to) the change in basis — they follow the Forward Transform; the basis covectors, or dual basis, follow the Backward Transform.
We can best show this by continuing our example. We’ll first derive our alternate dual basis (~ϵ) using the Backward Transform; then, we’ll express our covector 𝜙 in terms of our alternate dual basis. To double-check that everything worked out, we’ll algebraically verify that 𝜙~ϵ(v~e) = 1.5; in other words, we’ll confirm that our covector acting on our vector returns the same value regardless of the basis it’s expressed in.
Covector Component Transform
The components of a covector transform with the change of basis vectors, following the Forward Transform (& thus opposite to the dual basis); for us, that means if we want to express our covector 𝜙 in terms of ~ϵ, instead of ϵ, we need to take the dot product of 𝜙 & our Forward Transform — this is done below:
The output on the right expresses our covector 𝜙 in terms of our new basis; from the section on vectors above, we also already derived our example vector v in the new basis (as a reminder, our vector components changed from [.5,.5]e to [15/22,10/22]~e). To wrap this section up, we can algebraically verify the basis-independent nature of tensors by once again passing our example vector v through our covector 𝜙, this time with both tensors expressed in our new basis. Recall that above we calculated that 𝜙(v) = 1.5:
It worked! Our covector acting on our vector returned the exact same value regardless of whether the components were expressed in our original (blue arrows) or updated (green arrows) basis. This is a huge leap in our understanding of tensors because it highlights the invariant properties of tensors — while the components of both our vector & covector changed, their geometric & algebraic meaning, the whole of their parts, was preserved.
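We can rehearse this invariance numerically, too. The sketch below (plain Python, my own) uses a hypothetical forward transform F (the guide’s actual F matrix isn’t restated here), builds B as its inverse, & confirms that 𝜙(v) is unchanged: the covector components follow F while the vector components follow B.

```python
# Invariance check under a change of basis: (phi . F) . (B . v) == phi . v,
# because F and B cancel. F here is a hypothetical example transform.
def mat_vec(M, v):                      # vector components transform: B . v
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def row_mat(phi, M):                    # covector components transform: phi . F
    return [sum(phi[i] * M[i][j] for i in range(2)) for j in range(2)]

def inverse_2x2(M):                     # the Backward Transform is F's inverse
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

F = [[2, 1], [-1, 3]]                   # hypothetical Forward Transform
B = inverse_2x2(F)                      # matching Backward Transform

phi, v = [2, 1], [0.5, 0.5]
phi_new = row_mat(phi, F)               # covariant: components follow F
v_new = mat_vec(B, v)                   # contravariant: components follow B

original = sum(p * c for p, c in zip(phi, v))
transformed = sum(p * c for p, c in zip(phi_new, v_new))
print(original, transformed)            # both equal 1.5, up to float rounding
```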
Section Review
Covectors: a special type of tensor represented strictly as rows & written as an aggregate sum of scalars multiplied by n dual basis vectors. Also known as covariant vectors, (0,1)-tensors, 1-forms, dual vectors & linear functionals:
Dual Basis Vectors: the independent x,y,z…n axes for a given dual space V*. These dual basis vectors, usually denoted by the symbol ϵ, are what allow us to represent covectors as a sum of scalars multiplied by basis covectors — they’re usually derived algebraically with the Kronecker Delta. They follow the Backward Transform when updating to new basis vectors:
Kronecker Delta: a key formula that describes how contravariant & covariant components interact; often used to derive the dual basis given a set of basis vectors. Basis vectors are written with subscripts, or lowered indices, while basis covectors are written with superscripts, or raised indices. Adherence to this notation is critical for further concepts.
Covariant Transformations: how the components of a covector transform under a change in basis; easy to remember by the name: they change co/with the forward transform, as reflected in the formula below:
Linear Mapping
In addition to the simple (0,0)-tensor, we’ve now reviewed two special types of tensors: vectors/(1,0)-tensors & covectors/(0,1)-tensors. As the metaphorical bow on top, we’ll wrap up this light intro by introducing a final type of tensor that connects the previous ones.
As the title suggests, this final type of tensor is known as a linear map. Much like our previous tensors, it also has an array of names such as linear transforms or (1,1)-tensors. Next, similar to the previous sections, we’ll introduce the tool of choice for representing & manipulating linear maps: matrices. Vectors are written as columns, covectors are written as rows, & linear maps are written as matrices.
Next, onto the output of a linear map; much like a covector takes in an input vector (v) & returns a scalar, a linear map takes in an input vector (v) & returns a new vector (w). Linear maps transform input vectors but they do not transform a basis — they simply map one vector to another in the same vector space.
Moving on from both our representation & output, let’s dive into the abstract definition likely found in a textbook:
A function L: V → W is a linear map if for any two vectors u, v & any scalar c, the following two conditions (linearity) are satisfied:
L(u + v) = L(u) + L(v)
L(cu) = cL(u)
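These two conditions are easy to test numerically. The sketch below (my own, in plain Python) checks them for the example map used later in this section, reading the pairs in {(1,-2),(5,-3)} as the images of the basis vectors:

```python
# Checking the two linearity conditions for the map
# L(v) = v1*(1, -2) + v2*(5, -3).
def L(v):
    return [1 * v[0] + 5 * v[1], -2 * v[0] - 3 * v[1]]

def add(u, v):
    return [u[0] + v[0], u[1] + v[1]]

def scale(c, v):
    return [c * v[0], c * v[1]]

u, v, c = [1.0, 2.0], [-3.0, 0.5], 4.0
assert L(add(u, v)) == add(L(u), L(v))   # L(u + v) = L(u) + L(v)
assert L(scale(c, u)) == scale(c, L(u))  # L(c*u) = c*L(u)
print("both linearity conditions hold")
```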
Linear maps transform vectors, but they do not transform the basis; another, more abstract way of thinking about linear maps is to consider them as spatial transforms. Explanations, no matter how formal or informal, have limits in conveying meaning, so let’s go ahead & jump into our very last example.
Let’s imagine a linear map L with components, in the e basis, of {(1,-2),(5,-3)}. Below, we’ll apply this linear map (L) to our existing vector (v) to return some new vector (w):
In the images above, starting from the left, we first see our original vector v (.5,.5); in the middle, the linear map (L) acts on our vector v to output a second vector w; in the final image to the right, we see the new vector w along with its components. Take a moment to walk through the algebra above; below, we expand L(v) by breaking it up into terms of our basis vectors e1 & e2:
Defining our linear map in terms of our basis vectors shows us, once again, how this operation can be expressed with a series of sums.
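As a sanity check, that sum can be computed directly. The sketch below assumes the pairs in {(1,-2),(5,-3)} are the images L(e1) & L(e2); under that reading, the output w follows from the expansion L(v) = v1·L(e1) + v2·L(e2):

```python
# Applying the example linear map to v = (0.5, 0.5), assuming
# L(e1) = (1, -2) and L(e2) = (5, -3).
def L(v):
    # L(v) = v1*L(e1) + v2*L(e2), written out component by component
    return [1 * v[0] + 5 * v[1], -2 * v[0] - 3 * v[1]]

v = [0.5, 0.5]
w = L(v)
print(w)  # [3.0, -2.5] under this reading of the matrix
```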
Before distilling further, it’s worth clarifying a point made all the way back in the beginning: rows, columns, & matrices are tools used to represent tensors; they themselves are not tensors. For example, we’ve already used matrices earlier in this guide to represent the Forward & Backward transforms — those were nothing but square arrays of real numbers whose definitions depended on a specific choice of bases. They were not tensors, yet they were matrices; linear maps are tensors represented by matrices. As we’ll see below, yes, the inner components of a linear map will change according to a new basis, but its geometric meaning will not, because, once again, tensors are invariant — they preserve their meaning.
Linear Maps Under Transform
Naturally, similar to the previous sections, we’re interested in exploring the invariance of these objects — aka how they behave under a basis transform. We already know what our vector (v) & L(v), or (w), look like in our original e basis; now, we’re going to define both L & (w) in terms of our new basis ~e.
In the very first section on vector/contravariant vectors, we worked through a change in basis (shown by the green-teal arrows with ~e) for our vector v = (15/22)~e1 + (10/22)~e2. By now it’s hopefully obvious that, like all previous tensors, we need to update their internal components whenever we have a change in basis — this includes our new L(v).
We want our linear map (L) to reflect the exact same transform from v to w regardless of our change in basis; the matrix of numbers we used for (L), which was in our original e basis {(1,-2),(5,-3)}, is no longer accurate in our new basis ~e. So the question is: how do the numbers in our linear map change in the new ~e basis? We’ll walk through the algebra in sum notation below:
We’ve now figured out how to update the components in our linear map to accommodate our change in basis! As you can see above, updating our linear map from an old basis to a new basis uses both a Forward & a Backward Transform; intuitively, this should make sense since a linear map, or (1,1)-tensor, has both contravariant & covariant components. Below, we’ll quickly double-check our derived formula by applying it to the example linear map from above. We’re specifically going to calculate:
~L — The components of our linear map in our new basis
~L(v~e) — The output of our linear map on our vector
We’ll end up with both a matrix that represents our linear map L in our new basis & the components of our output vector w in the new basis. If this all works out, our vector w will look exactly as it does in the example above. Please refer to the former diagrams to find F, B & L:
And we’re done! We’ve now updated our linear map according to the change in basis. If you look at the final column on the bottom right, we indeed got our transformed vector w written in terms of our updated basis vector (~e): w = (165/242)~e1 + (715/242)~e2.
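To see the B·L·F sandwich in action one more time, here’s a small numerical sketch. Since the guide’s actual F & B matrices aren’t restated here, it uses a hypothetical F; the point is only that transforming w directly agrees with applying the transformed map to the transformed v:

```python
# The (1,1)-tensor transform: L_new = B . L . F. We then check that
# transforming w = L(v) directly matches L_new applied to the
# transformed v. F is a hypothetical example transform.
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def inverse_2x2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

L = [[1, 5], [-2, -3]]                  # columns are L(e1), L(e2)
F = [[2, 1], [-1, 3]]                   # hypothetical Forward Transform
B = inverse_2x2(F)

L_new = mat_mul(mat_mul(B, L), F)       # ~L = B . L . F

v = [0.5, 0.5]
w_new_direct = mat_vec(B, mat_vec(L, v))        # transform w = L(v) itself
w_new_via_map = mat_vec(L_new, mat_vec(B, v))   # ~L applied to ~v
print(w_new_direct, w_new_via_map)              # identical, up to rounding
```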
To conclude this very last section, we’ll overlay the above on our original basis — highlighting, once again, the invariant property of tensors, in this case, a (1,1)-tensor, or a linear map:
Section Review
Linear Maps: a special type of tensor represented strictly as matrices & written as an aggregate sum of scalar components multiplied by n basis vectors. Also known as linear transforms or (1,1)-tensors since they have both covariant & contravariant components:
Einstein Notation: we briefly saw this when going through the algebra to determine how to transform our linear map. Basically, Einstein (yes, the Einstein) realized that as long as we’re careful about our super- & subscript indices, we can drop the sigma sum notation & keep the same meaning. If/when we continue onward, this will be the standard notation for tensors — to assume a series of aggregate sums.
Linear Map Transformations: how the components of a linear map transform under a change in basis; absolutely critical to note that we use both the Forward & Backward Transform since linear maps, (1,1)-tensors, have both covariant & contravariant components:
*For the sake of brevity I’ve excluded how to transform a linear map from the new basis back to our old basis; as a slight hint for a self-directed exercise, observe that FB = KD, or that the dot product of the Forward & Backward Transforms results in the Kronecker Delta.
Putting It All Together — Tensor Product Preview
We’ve now reviewed two unique types of tensors (vectors & covectors) that, acting as building blocks, we combined to introduce a third type of tensor (linear maps). Throughout, we’ve continuously reminded ourselves that the power behind tensors is that they accurately represent objects in real life, regardless of our spatial perspective or transform.
Contravariant & covariant transformations, aka whether the components transform against or with basis changes, are really the driving mechanics behind tensors; the covariant components are the perpendicular projections onto the basis vectors, while the contravariant components are the parallel projections.
Denoted by a superscript & a subscript respectively, we also learned the special Kronecker Delta formula which provided us with an understanding for how co & contravariant transformations interact. Additionally, we picked up an entirely new system of notations — starting with representing vectors & covectors with a sum notation, then afterward introducing the sigma-less Einstein notation.
In short, this guide was certainly not concise, but it was all a prerequisite to finally introduce a much more abstract, yet accurate, general definition of a tensor. Vectors, covectors & linear maps are all special cases of tensors; the best definition for a tensor is:
An Object That Is Invariant Under A Change Of Coordinates & Has Components That Change In A Special, Predictable Way
We’ve already seen a sneak peek of it, but of course the follow-up question is: in exactly what way do tensor components transform? The following formula looks incredibly intimidating, which is why we’ve saved it until now, but trust that we’ve learned everything needed to fully understand it:
We know that F & B stand for the Forward & Backward Transforms; the T, appropriately, stands for Tensor. Let’s isolate that T momentarily & tie its notation back to one of the very few times we introduced indices & the (m,n)-tensor standard. Below, we finally come full circle:
Now, let’s look back at the general tensor formula. A tensor is an object with (m,n) contravariant & covariant components; to figure out how any (m,n)-tensor changes under a basis transform, all we have to do is follow the general rule set above. All the indices stored upstairs are the contravariant indices; all the indices stored downstairs are the covariant indices. Just based on how many (m,n) components any random tensor contains, we can now immediately predict how it’ll change under a basis transform.
In Closing
Looking back at what we presented in the preceding sections, we can now (hopefully) fully understand why a contravariant vector is in fact a (1,0)-tensor, a covariant vector is a (0,1)-tensor & a linear map is a (1,1)-tensor. At any point P in a generalized system, with an associated set of local axes & coordinate surfaces, we can specify two related but distinct sets of unit vectors: (1.) a set tangent to the local axes, & (2.) another set perpendicular to the local coordinate surfaces. Known as the basis vectors & the dual basis, with just this basic information we can fully predict how a geometric object will behave under a spatial transform.
Tensors, tensor analysis & tensor calculus take account of coordinate independence & of the peculiarities of different kinds of spaces in one grand sweep. Their formalisms are structurally the same regardless of the space involved, the number of dimensions, & so on. For this reason, tensors are very effective tools in the hands of theorists working in advanced studies — a power tool indeed.
Onto The Metric Tensor, The Tensor Product & General Relativity
Before we completely close out, I’d like to circle back to the very opening of this guide: the motivation. Why bother learning about tensors in the first place?
Well, you’ll eventually run into applied tensors in advanced branches of physics & engineering; however, my personal motivation for starting down this path is that Einstein’s eminent formula, work, & understanding of our universe is written in tensors — yes, his general relativity formula in particular is written in tensors:
Einstein’s General Relativity — Expressed With The Ricci Tensor, The Metric Tensor…
Every variable with two subscripts is in fact a tensor. With this guide you’re now much better equipped to Google & work through each term. Take particular note of the gμν term above; just like the linear map is a combination of covariant & contravariant components, this g is also a special, yet common, type of tensor known as the metric tensor. In fact, I purposely left out the metric tensor because it’s so important that it merits its own follow-up; as a sneak preview, the metric tensor is the tool that helps us define distance in any vector space — another incredibly useful tool, especially when modeling gravity.
If you’ve stuck with it all the way through, I want to end on a final note of gratitude: thank you — this piece took more out of me than I signed up for & I hope you find it useful or at least interesting. I’m now looking forward to writing at least three follow-ups: on the metric tensor, all types of tensor products & finally, general relativity.
The beginner’s guide to proving the Fundamental Theorem of Calculus, with both a visual approach for those less keen on algebra, and an algebraic, slightly more rigorous approach, for those keen on exactness.
By the end, I hope you feel a bit more like a mathematician 🙂
Introduction, motivation and ‘hello!’
Hello! We are going to understand one of the most historically important and brilliant proofs in mathematics. Important and brilliant because it reduces previously impossible problems — that of integrating functions — into the art of spotting a derivative. But more on that soon.
What is wonderful about this proof is that there are two approaches, both of which complement each other, but also can be understood independently. To begin with, we will see an informal statement of the theorem, and an informal statement of the proof. This will give the intuition and ‘essence’ of what we are doing. This proof will be visual in nature, and not require excessive or complicated algebra. This part will convey some key ideas without algebra, but at the cost of being less exact. Next will come a formal statement and proof. This is optional. Why do I nevertheless encourage you to try and understand it, even if you aren’t very comfortable with ‘algebra’ proofs compared to visual proofs?
The visual proof captures the key ideas, but the formal proof shows how mathematicians turn those ideas into mathematical objects and then prove things about the mathematical object
Having seen the visual proof, you will have some idea what is going on in the algebra proof even if you don’t follow all the details
Ideas in mathematics sometimes take a while to sink in. Taking time to think about something is never time wasted. At some later point the ideas will click, or be handy elsewhere. Time spent thinking about mathematics is fundamentally time well spent. Although, I am somewhat opinionated 🙂
You’ll never know if you never try 🙂
A (very) short introduction to Derivatives (for those who haven’t encountered derivatives before)
Derivatives are about approximating functions with straight lines. The idea is that, near a point, the tangent line provides a pretty good approximation to how the function is changing.
The derivative of a function at a point can be viewed as the slope of the ‘best’ linear approximation at that point.
The derivative as the ‘best’ linear approximation near a point. Attribution: derivative work: Pbroks13 (talk)Tangent-calculus.png: Rhythm / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)
The idea is that, for many functions, a lot of the information about the function is contained in using a linear function to approximate it. Obviously, this approximation isn’t perfect, but if such an approximation holds everywhere, we learn a lot about the function: in fact we can recreate the function up to a constant term.
At the end of the article are some resources on understanding derivatives and other aspects of calculus, if you want to go into greater detail. We will also define the derivative a bit more precisely later.
The Fundamental Theorem of Calculus then tells us that, if we define F(x) to be the area under the graph of f(t) between 0 and x, then the derivative of F(x) is f(x).
Let’s digest what this means. Below is a red line — this is our function f. We want to find the area between 0 and x — x is marked red on the x-axis. Our function F tells us, for each point on the x-axis, what the area under the curve is up to that point. [Please excuse my poorly drawn ‘x’]
source: Desmos.com
We want to determine what the derivative of our function F is — at x. We can get a graphing calculator (I used Desmos, but GeoGebra is also good and free) to plot F(x), which I have done below:
Graph of the function F
So this function looks like it should have a derivative. But what is it?
Imagine we look at the best line approximation to F(x) close to x. What might this look like? Well, how about we make a good guess.
Let’s suppose F(x+dx) is roughly equal to F(x) + dx*f(x)
For instance, at x = 8, we might say that F(8.00001) is well approximated by F(8) +0.00001*f(8). What is the ‘visual’ proof of this?
Let’s look at the area under the graph again.
When we use the approximation F(x+dx) roughly equals F(x) + dx*f(x) we see the following: dx*f(x) is represented by the area of the red rectangle, which has height f(x) and width dx. Is this a good linear approximation? Yes! Rewriting, our approximating function at x = 8 is F(8) + dx*f(8). We also see that the rectangle contains ‘nearly’ all the area F would gain in going from F(x) to F(x+dx). This can be seen below, where the area we ‘missed’ is just the small blue shaded region, which is much, much smaller than the rectangular region.
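This approximation is also easy to test numerically. The sketch below (not from the article) uses f(t) = t² as a stand-in function and builds F as a simple Riemann sum; the rectangle estimate dx*f(x) lands very close to the true area gained:

```python
# Numeric check of F(x + dx) ~ F(x) + dx*f(x), with f(t) = t**2 as a
# stand-in and F built as a left Riemann sum of the area under f.
def f(t):
    return t ** 2

def F(x, n=100_000):
    """Approximate the area under f between 0 and x with n rectangles."""
    h = x / n
    return sum(f(i * h) * h for i in range(n))

x, dx = 2.0, 0.001
exact_step = F(x + dx) - F(x)  # area actually gained
rect_step = dx * f(x)          # the red-rectangle estimate
print(exact_step, rect_step)   # the two agree to within roughly dx**2
```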
However, there are several improvements we can make on this proof. Yes, it certainly looks like a good approximation on this graph, but does it work on all graphs? After all, graphs can look very different. Also, how are we defining our ‘best linear approximation’? This leads to formulating the problem using some algebra.
Part II: A Statement using Algebra
First, we want to define our derivative. This is done as follows:
[‘lim’ stands for ‘limit’]
The limit just means to look at what happens to the expression as dx gets arbitrarily close to 0. So, you might compute the following sequence and see what it ‘tends’ towards:
For the functions we’re interested in, it won’t matter which sequence you pick for ‘dx’ , provided it tends to 0.
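To make the limit concrete, here is a small numerical sketch (my own, with F(x) = x³ as a stand-in function): the difference quotients settle toward the derivative value 3x², which is 12 at x = 2.

```python
# The difference quotient (F(x + dx) - F(x)) / dx for a shrinking
# sequence of dx values; the quotients tend toward the derivative 3*x**2.
def F(x):
    return x ** 3

x = 2.0
for dx in [0.1, 0.01, 0.001, 0.0001]:
    quotient = (F(x + dx) - F(x)) / dx
    print(dx, quotient)  # approaches 12.0 as dx shrinks
```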
We get a nice visual feel for it in the following diagram:
My own creation!
We then see that, as dx tends to 0, the limit of the gradient of the straight line connecting F(x) with F(x+dx) is defined to be our derivative. This can be seen below.
Tangent animation, from Wikimedia Commons, the free media repository (commons.wikimedia.org)
We use a limit because, while x = 0.01 or 0.00001 may seem small to us, for a function like x¹⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰⁰ a 0.01 difference suddenly results in a huge change in output. The limit means that dx can be made arbitrarily small so that we can always zoom in enough that our function can be approximated by a straight line.
***Note: there are some functions which cannot be approximated nicely by a straight line locally to a point no matter how far you zoom in, but these are dragons to slay on another day, with different techniques!***
Next, we want some notation to represent the area under the curve between 0 and x. We write:
***What does the f(t)dt mean? One way to look at it is that f is a function of some variable t, so we integrate over t. The variable which denotes how far we integrate is x, so the upper bound of integration is x, but we write f(t) as a function of t. It doesn’t really matter which variable name we use for f, apart from avoiding using ‘x’ twice, because then we would have given the symbol two different meanings.***
Now, our task is to prove that:
Where, in the second line, we have just plugged in our definition of F(x) as the area under the curve, using the notation introduced above.
Part IIIa: A Proof using Algebra
We now prove
First we observe that
This is because we are only interested in the area between x and x+dx. This is seen in the diagram below, where we are really interested in the red area.
So, the problem then is to work out what the following limit is:
Here we assume that f(t) is continuous at t = x. (We can actually use a weaker assumption, but it requires more effort, as we will see in the final section.)
What is the definition of continuity at x? This will take a bit of time to wrap your head around! (Read through the definition twice, then continue, as I will explain it in more informal language)
What does this mean? It means for any (small) number, we can find a small band around x where f(t) is less than that small number away from f(x). For instance, you might set the ‘small’ number to be 0.001. Then, I might find that if t is less than 0.00001 away from x, then we are guaranteed that |f(t) - f(x)| < 0.001. In this case, suppose x = 8; then |f(8) - f(8.000001)| < 0.001.
After hearing about Eugene Wigner’s article, “The Unreasonable Effectiveness of Mathematics in the Natural Sciences,” for many years, I recently decided to read and digest the article for myself. (If you’re so inclined, you can read about the results here.) Continuing this exploration of the philosophy of mathematics led me to Mario Livio’s intriguing book, Is God a Mathematician?, selected by the Washington Post as one of the best books of 2009. Here, I present a summary of that book. My research on this book in turn led me to Livio’s Scientific American article, “Why Math Works,” an interview with Krista Tippett, and a Nova special that features him prominently. I encourage you to explore these sources as well as the book itself.
Themes
Livio does not intend the title as a theological question. He is instead referring to two questions concerning the mysterious effectiveness of mathematics in the physical world.
The first question is “Does mathematics have an existence independent of the human mind?” Put another way: is mathematics discovered, extant in an abstract realm independent of the world we perceive with our senses, or is it an invention, a creation of the human mind? Those who belong to the “discovered” camp are known as “Platonists,” and include (or have included) such notables as G. H. Hardy, Roger Penrose, Kurt Gödel, and Martin Gardner. On the “formalist” side of the argument lie Albert Einstein, David Hilbert, Georg Cantor, the Bourbaki group, and others.
The second question is: “Why do mathematical concepts have applicability far beyond the context in which they have originally been developed?” This is the question that Eugene Wigner asked. As he put it, “It is difficult to avoid the impression that a miracle confronts us here, comparable in its striking nature to the miracle that the human mind can string a thousand arguments together without getting itself into contradictions.”
Livio clarifies and focuses this second question by making a distinction between active and passive applicability. By active applicability, he means mathematics that was created as a tool for a specific purpose. Although no one is surprised that active mathematics is effective — it was developed for the purpose, and the theories were tailored to fit the observations — it may be surprising just how effective it can be. For instance, the theoretical calculation of the magnetic moment of the electron in quantum electrodynamics matches the measured value to within a few parts per trillion. Passive applicability is more surprising. “Passive” in this context refers to applicability of mathematics beyond the intent of the original theory. Entire subfields of mathematics are created with no physical application in mind, “math for math’s sake,” but then decades or even centuries later these theories fit real-world situations with astounding exactitude. For instance, group theory was developed in the early 1800s by Évariste Galois to study the solvability of polynomial equations, yet worked beautifully in the 20th century to categorize elementary particles. As another example, the so-called “non-Euclidean geometries,” which defy the parallel postulate of traditional Euclidean geometry (more below), came as a shock to the mathematical community in the mid-1800s and were even derided by many as not being “true” mathematics. Yet Einstein formulated his general theory of relativity on Riemannian geometry.
Early Platonists
The first few chapters of the book concentrate on some famous early names in mathematics: Pythagoras, Plato, Archimedes, Galileo, Descartes, and Newton. These men helped simultaneously to formalize mathematics and to establish its role as the language of science. Each of these mathematicians was a Platonist, viewing mathematics as pre-existing, either in some mysterious realm of thought or as arising from the mind of God.
The Pythagoreans
While their predecessors were primarily interested in mathematics as a practical tool, for instance, for measuring out land, or predicting flood cycles, the followers of Pythagoras were the first “pure mathematicians,” interested in mathematics for its own sake. As such, they pioneered the use of rigorous proof, the practice of starting with self-evident postulates and utilizing logic to prove conclusions from them. As to the question of whether mathematics was discovered or invented, they fell firmly into the discovered camp. Livio says it this way: “On the question of whether mathematics was discovered or invented, Pythagoras and the Pythagoreans had no doubt — mathematics was real, immutable, omnipresent, and more sublime than anything that could conceivably emerge from the feeble human mind. The Pythagoreans literally embedded the universe into mathematics. In fact, to the Pythagoreans, God was not a mathematician — mathematics was God!”
They also set the stage for Plato. “The importance of the Pythagorean philosophy lies not only in its actual, intrinsic value. By setting the stage, and to some extent the agenda, for the next generation of philosophers — Plato in particular — the Pythagoreans established a commanding position in Western thought.”
Archimedes
Although today Archimedes is mostly known for his clever engineering inventions, famously including the hydraulic screw for raising water and various engines of war, his primary interest was pure mathematics. In geometry, he presented general methods for finding the areas and volumes of a variety of figures; in number theory, he derived lower and upper bounds for π of 3 10/71 and 3 1/7, respectively. He also invented a system to denote and manipulate numbers of any magnitude.
Legend has it that he died because he wouldn’t move from a mathematical diagram during a battle, and a Roman soldier killed him. Alfred North Whitehead wrote, “The death of Archimedes at the hands of a Roman soldier is symbolical of a world change of the first magnitude. The Romans were a great race, but they were cursed by the sterility which waits upon practicality. They were not dreamers enough to arrive at new points of view, which could give more fundamental control over the forces of nature. No Roman lost his life because he was absorbed in the contemplation of a mathematical diagram.”
Plato
Although Plato was not a mathematician per se, he was a great philosopher of mathematics, influencing the field for the following millennia. To Plato, mathematical truths did not refer to geometric figures represented on papyrus or marked in the sand, but to abstract objects that exist in an idealized non-physical world of true forms. This view that mathematics is extant in an abstract realm independent of the world of our senses became known as “Platonism.”
Galileo
Galileo enrolled in the faculty of arts of the University of Pisa to study medicine but changed to mathematics. He became a huge fan of Archimedes. He said, “Those who read his works realize only too clearly how inferior are all other minds compared with Archimedes’ and what small hope is left of ever discovering things similar to the ones he discovered.”
Galileo is, of course, known for his challenges to the geocentric model. Although he did not invent the telescope, he greatly improved it and made many discoveries that supported the heliocentric view of the universe. This view, though scientifically correct, resulted in his eventually being tried by the Inquisition and subjected to house arrest for the rest of his days.
More important for our purposes is the impact he had on science. Livio says, “The philosopher of science Alexandre Koyré (1892–1964) remarked once that Galileo’s revolution in scientific thinking can be distilled to one essential element: the discovery that mathematics is the grammar of science. While the Aristotelians were happy with a qualitative description of nature, and even for that they appealed to Aristotle’s authority, Galileo insisted that scientists should listen to nature itself, and that the keys to deciphering the universe’s parlance were mathematical relations and geometrical models… Centuries before the question of why mathematics was so effective in explaining nature was even asked, Galileo thought he already knew the answer! To him, mathematics was simply the language of the universe. To understand the universe, he argued, one must speak this language. God is indeed a mathematician.”
Descartes
Descartes, of “cogito, ergo sum” fame, introduced several mathematically revolutionary concepts. In particular, he developed the use of functions in mathematics. Such a function encodes a “rule” that a physical phenomenon follows. For instance, Newton’s law of gravitation states that the gravitational attraction between two masses is proportional to the product of their masses and inversely proportional to the square of the distance between them. Descartes also proposed the Cartesian coordinate system, which married the fields of algebra and geometry into a new field, analytic geometry.
Descartes also argued that the natural world is best described through mathematics, and not through our often misleading sense perceptions. “I recognize no matter in corporeal things apart from that which the geometers call quantity, and take as the object of their demonstrations…And since all natural phenomena can be explained in this way, I do not think that any other principles are either admissible or desirable in physics.” He went even further, suggesting that nearly all human knowledge could follow from the systematic application of mathematics.
Newton
Newton, undoubtedly one of the greatest mathematicians of all time, took Descartes’ idea that the cosmos can be described in terms of mathematics and made it reality. He formulated the fundamental laws of mechanics, deciphered the laws describing planetary motion, and erected the theoretical basis for the phenomena of light and color. All of these ideas were written in the language of mathematics. He also, of course, developed differential and integral calculus.
Non-Euclidean Geometry
Thus far, we would be forgiven for either concluding that the Platonist view is the obvious perspective or that the whole debate is a moot point. However, in the mid-1800s, the mathematical world was in for a shock. This bombshell came in the form of so-called “non-Euclidean” geometries.
Euclidean geometry
Until the beginning of the 19th century, Euclidean geometry was considered the epitome of truth and certainty. The Platonic ideals of Euclidean geometry were considered to be abstractions of their real-world counterparts. Even Kant, who in his Critique of Pure Reason asserted that we know what we know by sensory perception processed by the brain, made an exception for geometry, stating, “Space is not an empirical concept which has been derived from external experience… Space is a necessary representation a priori, forming the very foundation of all external intuitions…” Or put more simply, it was taken more or less as a given that absolute truths about the universe exist. Non-Euclidean geometries were about to challenge this view.
To understand non-Euclidean geometries, we must first understand what is meant by Euclidean geometry.
Circa 300 BC, Euclid laid out geometry in his thirteen-volume work The Elements. He started with ten axioms assumed to be self-evidently true and constructed geometry by applying logic to these postulates. The first four axioms, such as the first, which states that a straight line may be drawn through any two points, are obvious. The fifth postulate, the so-called “parallel postulate,” reads as follows: “If two lines lying in a plane intersect a third line in such a way that the sum of the internal angles on one side is less than two right angles, then the two lines inevitably will intersect each other if extended sufficiently on that side.”
Image from Wikipedia: By 6054 — Edit of http://pl.wikipedia.org/wiki/Grafika:Parallel_postulate.svg by User:Harkonnen2, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=4559984
So if angles α and β together are less than 180 degrees, the lines must cross.
Non-Euclidean Geometry
The fifth postulate seems convoluted and artificial, and for centuries mathematicians had attempted to “clean up” Euclid’s geometry by proving the fifth postulate as a consequence of the other four, without success.
In the mid-1800s, it was shown that one could create a new kind of geometry by replacing the fifth postulate with a different axiom. Furthermore, the resulting geometries described space just as accurately as Euclidean geometry. This is the key turning point in our story: the implication is that mathematics is not a preordained truth but is determined by the choices one makes. This is why the questions we are discussing even make sense. Until then, no one had looked at mathematics in this way; it was simply taken as a given.
Livio says, “Let me pause here for a moment to allow for the meaning of the word “choosing” to sink in. For millennia, Euclidean geometry had been regarded as unique and inevitable — the sole true description of space. The fact that one could choose the axioms and obtain an equally valid description turned the entire concept on its ear. The certain, carefully constructed deductive scheme suddenly became more similar to a game, in which the axioms simply played the role of the rules. You could change the axioms and play a different game. The impact of this realization on the understanding of the nature of mathematics cannot be overemphasized.”
The stage for alternate geometries was set by Girolamo Saccheri, who investigated the consequences of replacing the fifth postulate with another statement, and by Georg Klügel and Johann Heinrich Lambert, who realized that alternative geometries could exist. Three mathematicians in particular created non-Euclidean geometries, once and for all showing that Euclidean geometry was not the only representation of space: Nikolai Ivanovich Lobachevsky, János Bolyai, and Carl Friedrich Gauss, who presented what is now known as hyperbolic geometry. Bernhard Riemann furthered the field with “elliptical geometry.” These developments led Henri Poincaré to declare that Kant was wrong: space is not intrinsically Euclidean before we perceive it, but rather is a learned framework. In his words, he was led to conclude that the axioms of geometry are “neither synthetic a priori intuitions nor experimental facts. They are conventions. Our choice among all possible conventions is guided by experimental facts, but it remains free.” Even more geometries would follow, such as projective and differential geometry.
Reactions
The reactions to non-Euclidean geometries were varied. Many mathematicians did not consider these alternative geometries to be legitimate mathematics. One objection was consistency: would these geometries ultimately lead to self-contradictions? By the 1870s, Eugenio Beltrami and Felix Klein had shown that if Euclidean geometry is consistent, then so are the non-Euclidean geometries. Some mathematicians turned to arithmetic (number theory), connecting geometry with numbers in the spirit of Descartes’ analytic geometry, and David Hilbert demonstrated that if arithmetic is consistent, then so is geometry. This doesn’t really answer the question, though; it only pushes it downstream.
Another objection was relevance. Initially, the non-Euclidean geometries were seen as curiosities with no connection to the three-dimensional space of the real world we live in. Many mathematicians, including Gerolamo Cardano and John Wallis, refused to consider any mathematics involving dimensions higher than three. Jean D’Alembert and Joseph Lagrange considered four-dimensional mathematics, but only with time as the fourth dimension.
The “Freedom of Mathematics”
Hermann Grassmann explored geometries in any number of dimensions, originating the branch of mathematics we now call linear algebra. To him, mathematics was an abstract creation of the mind that does not necessarily have any application to the real world. At this point, mathematics was no longer restricted to describing the three-dimensional observable world.
In the 1700s, Euler had declared, “mathematics, in general, is the science of quantity; or, the science that investigates the means of measuring quantity.” The introduction of these abstract geometries, along with the notion of infinity, had distorted the meaning of “measurement” or “quantity” beyond recognition. In addition, mathematics had become the study of abstractions, far removed from physical reality. Georg Cantor characterized the new spirit of mathematics by a “declaration of independence”: “Mathematics is in its development entirely free and is only bound in the self-evident respect that its concepts must both be consistent with each other and also stand in exact relationships, ordered by definitions, to those concepts which have previously been introduced and are already at hand and established.” Dedekind added, “I consider the number concept entirely independent of the notions or intuitions of space and time…Numbers are free creations of the human mind.” Cantor continued, saying “The essence of mathematics lies entirely in its freedom.” By the close of the 1800s, most mathematicians considered mathematics to not be about the explanation of the physical world, but rather as the logical consequences of given axioms.
Foundations of Mathematics
Non-Euclidean geometries and their aftermath led mathematicians to seriously investigate the foundations of mathematics. This investigation was pursued via logic.
Logicism
Logicism is the point of view that mathematics can be reduced to logic. Bertrand Russell explained it this way. “Mathematics and logic, historically speaking, have been entirely distinct studies. Mathematics has been connected with science, logic with Greek. But both have developed in modern times: logic has become more mathematical and mathematics has become more logical. The consequence is that it has now [in 1919] become wholly impossible to draw a line between the two; in fact the two are one. They differ as boy and man: logic is the youth of mathematics and mathematics is the manhood of logic.”
Both camps, formalists and Platonists, embraced logicism. Formalists were happy to see the “games” of mathematics coalesce into one übergame, whereas Platonists hoped that logic would eventually reveal a single metaphysical origin to mathematics.
Logic and Mathematics
Logic has long concerned itself with concepts and propositions, and with making valid inferences from the relationships between propositions. For example, a syllogism is a form of reasoning in which a conclusion is drawn from two given premises of a certain form. For instance: all cats are mammals; all mammals have four legs; therefore all cats have four legs. Symbolically, this line of reasoning can be denoted as “every X is a Y; every Y is a Z; therefore, all Xs are Zs.” One cannot help but notice the parallels between symbolic logic and mathematics.
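As an illustrative aside (not from the book), the symbolic pattern “every X is a Y; every Y is a Z; therefore all Xs are Zs” is just transitivity of set inclusion, which can be sketched in a few lines of Python with made-up example data:

```python
# Model each concept as a set of individuals (hypothetical data).
cats = {"felix", "tom"}
mammals = {"felix", "tom", "rex"}
four_legged = {"felix", "tom", "rex", "daisy"}

# "Every X is a Y" means X is a subset of Y.
assert cats <= mammals          # premise: every cat is a mammal
assert mammals <= four_legged   # premise: every mammal has four legs

# Transitivity of subset inclusion gives the syllogism's conclusion.
assert cats <= four_legged      # therefore every cat has four legs
```

Note that the inference is valid regardless of whether the premises are actually true of the real world; validity concerns only the form.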
In the late 1600s, Gottfried Wilhelm Leibniz, famous for formulating calculus independently of Newton (and for the ensuing bitter dispute over priority), envisioned a characteristica universalis, a system for computing the truth or falsity of any statement symbolically. However, his formulation had issues that doomed the project from the start: the vagueness of what might constitute an “alphabet of thought,” and the difficulty of determining when two thoughts are “equal.”
In the mid-1800s, Augustus De Morgan began to make strides with the algebraization of logic, including quantifying the predicate. George Boole, who had corresponded with De Morgan since 1842, published two influential books, The Mathematical Analysis of Logic and The Laws of Thought, which together transformed logic into a type of algebra.
Gottlob Frege wrote two more notable books, the Begriffsschrift and the Grundgesetze der Arithmetik, in which he set out to derive the truths of arithmetic from a few axioms of logic, showing that even the natural numbers could be reduced to logical constructs. Although the idea of building arithmetic from basic logical concepts was brilliant, the execution was fatally flawed: one of his axioms, known as Basic Law V, was shown by Bertrand Russell to lead to a contradiction.
Despite this flaw, Russell believed that deriving arithmetic from logic was the correct approach. Together with Alfred North Whitehead, he wrote the Principia Mathematica, still considered one of the most influential books in the history of logic. Russell did succeed in repairing the problem with Basic Law V, but in a highly artificial and convoluted way.
Eventually the paradoxes were more elegantly eliminated by Ernst Zermelo and Abraham Fraenkel by axiomatizing set theory in a self-consistent way. This seemed to give mathematics a kind of objective certainty, giving the Platonists a long sought after victory. However, once again, one of the axioms, this time the so-called axiom of choice, proved to be problematic.
The axiom of choice states that if X is a collection of nonempty sets, then we can choose one element from each set in X to form a new set Y. There is no controversy for finite collections: given a hundred boxes, each containing at least one marble, one can clearly take one marble from each box to form a new set of a hundred marbles. Extending this notion, the axiom seems plausible even for countably infinite collections. In the case of uncountably infinite collections, however, we cannot in general even describe how such a choice could be made.
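For a finite collection, a choice set is trivial to exhibit explicitly; the following sketch (with hypothetical data, not from the book) makes the definition concrete:

```python
# A collection X of nonempty sets, indexed by box number (hypothetical data).
X = {1: {"red", "blue"}, 2: {"green"}, 3: {"blue", "yellow"}}

# A choice function: pick one element from each set to form Y.
# Using min() gives an explicit, deterministic rule for the choice.
Y = {box: min(marbles) for box, marbles in X.items()}

print(Y)  # one marble chosen per box: {1: 'blue', 2: 'green', 3: 'blue'}
```

The axiom is needed precisely when no explicit rule like `min()` is available, as with arbitrary uncountable collections.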
As in the days of Euclid’s fifth postulate, mathematicians attempted to either prove or refute the axiom of choice from the other axioms. In 1938, Kurt Gödel showed that the axiom of choice cannot be disproved from the other Zermelo–Fraenkel (ZF) axioms; in 1963, Paul Cohen showed that it cannot be proved from them either.
The next phase of the debate was initiated by David Hilbert, the Formalist, who believed mathematics was no more than a collection of meaningless formulae. His ambitious program was to show the correctness of mathematics using mathematical methods (so-called “metamathematics”). As he said, “My investigations in the new grounding of mathematics have as their goal nothing less than this: to eliminate, once and for all, the general doubt about the reliability of mathematical inference…Everything that previously made up mathematics is to be rigorously formalized, so that mathematics proper or mathematics in the strict sense becomes a stock of formulae…In addition to this formalized mathematics proper, we have a mathematics that is to some extent new: a metamathematics that is necessary for securing mathematics, and in which — in contrast to the purely formal modes of inference in mathematics proper — one applies contextual inference, but only to prove the consistency of the axioms…Thus the development of mathematical science as a whole takes place in two ways that constantly alternate: on the one hand we derive provable formulae from the axioms by formal inference; on the other, we adjoin new axioms and prove their consistency by contextual inference.”
Once again, enter Gödel. As a teenager, he had become fascinated by Russell and Whitehead’s Principia Mathematica and by Hilbert’s program. For his dissertation, he chose to determine whether Hilbert’s formal approach was sufficient to produce all of mathematics. A year after completing his dissertation, he dropped the bomb in the form of his incompleteness theorems. From Livio: “…the incompleteness theorems proved that Hilbert’s formalist program was essentially doomed from the start. Gödel showed that any formal system that is powerful enough to be of any interest is inherently either incomplete or inconsistent. That is, in the best case, there will always be assertions that the formal system can neither prove nor disprove. In the worst, the system would yield contradictions.”
For a technical explanation of Gödel’s incompleteness theorems, please refer to this article by Jørgen Veisdal.
Modern Views
One suspects that most professional mathematicians go about their daily work without becoming mired in philosophical meta-discussions of their favorite subject. As Philip Davis and Reuben Hersh say in The Mathematical Experience, “the typical mathematician is a Platonist on weekdays and a formalist on Sundays.”
Some twentieth-century mathematicians did indeed have strong views in one camp or the other. For instance, G. H. Hardy gives a strong Platonist perspective in his A Mathematician’s Apology, whereas Edward Kasner and James Newman describe the opposite, formalist view in Mathematics and the Imagination.
On the Platonist side, some believe that mathematics exists not in some metaphysical realm but is baked into the physical world itself. For instance, Max Tegmark, the MIT astrophysicist, takes the view that “our universe is not just described by mathematics — it is mathematics.” He argues that external physical reality exists independently of humans; asks what the nature of the “theory of everything” of such a reality would be; states that such a description must be free of any human “baggage” (such as language); and concludes that, since a final theory cannot contain any human constructs, the only possible description of the cosmos is one involving only abstract concepts and their relations, i.e., mathematics. Livio is unimpressed by this chain of reasoning, claiming that Tegmark in effect assumes his conclusion.
Livio also mentions a number of neuroscientists and cognitive scientists, such as Jean-Pierre Changeux, and Lakoff and Núñez, who take the view that mathematics is a construction of the human brain.
So what does the author think? Livio concludes the book by stating that the question of whether mathematics is invented or discovered is misleading, since it implies that mathematics must be either one or the other. His view is that mathematics is both invented and discovered. The mathematical concepts and definitions are invented, whereas the logical consequences of these concepts are discovered.
Conclusion
In this article, I’ve necessarily given you only the highlights of the book. I have skipped over much of the biographical material and the more technical aspects of the debates, which are well worth delving into. I encourage you to read it for yourself. Enjoy!