Notation, Sets, and Functions
1 Overview
A great deal of quantitave empirical research in the social sciences and statistics textbooks use notation extensively. Notation is useful to write about general concepts or ideas that apply to a broad range of settings. The downside is that notation makes things harder to read.
This section introduces common notation practices in the social sciences that will help you in both in methods courses and when reading/writing papers. Usage may vary across fields and over time, but becoming familiar with general uses may still help you understand the deviations.
2 Set Notation
2.1 Conditions
A set is a collection of elements, we use symbols to refer to different parts of the set.
Notation | Meaning | Use |
---|---|---|
{ | bracket | Specify a set (e.g. {2, 3}) |
\(\exists\) | “there exists”: true for at least one thing | Specify a condition to be satisfied |
\(\forall\) | “for all”; true for all elements | Specify which elements belong in a set (all that satisfy a criteron) |
\(\exists\) | “exists”; there is something true | Specify a rule or proposition that is true |
\(\in\) | “in” or “element of” | States what something / an element is a member of |
| | Such that | used in set theoretic definitions re: which values satisfy a particular (set of) condition(s) |
\(\notin\) | excluding (element) | States that something is not a member of a set |
\(\equiv\) | equivalent | Set theory equal |
For example, \(A = \{4n \in \mathbb{N} | n \in \mathbb{N}\}\) says \(A\) is the set of natural numbers that are divisible by 4. We could list out the elements of this set. Note that curly brackets are always used for sets, never parentheses. E.g. {4, 8, 12, …} and not (4, 8, 12, …).
2.2 Operators
We can also symbols to denote operations between sets.
Notation | Meaning | Use |
---|---|---|
\(\subset\) | Subset | Less than |
\(\subseteq\) | Subset | Less than equal |
\(\varnothing\) | Empty set | Zero |
\(\cap\) | Intersection | AND |
\(\cup\) | Union | OR |
\(\setminus\) | Difference | Minus (remove elements from sets) |
For example, \(A \cap B\) contains the elements that appear in A and B at the same time, whereas \(A \cup B\) contains the elements that appear in either A or B even if they overlap.
2.3 Number sets
Certain sets of numbers are so frequent they have their own notation.
Notation | Meaning | Use |
---|---|---|
\(\mathbb{N}\) | Natural numbers | {(0), 1, 2, 3, 4, 5, …} |
\(\mathbb{Z}\) | Integers (pos and neg) | {…, -3, -2, -1, 0, 1, 2, 3, …} |
\(\mathbb{Q}\) | Rational numbers (quotient) | (all fractions–produced by numbers divided by an integer) |
\(\mathbb{R}\) | Real numbers (pos, neg, fractions) | (any point on a number line) |
\(\mathbb{I}\) | Imaginary number | i = \(\sqrt(-1)\) |
\(\mathbb{C}\) | Complex numbers | (a + bi) |
2.4 Greek letters
Greek letters are often used to represent parameters or abstract quantities of interest. Below you can find the letter with its corresponding syntax for math mode in markdown languages (e.g. LaTeX, Typst).1
\(\alpha\) \(A\) \alpha A | \(\nu\) \(N\) \nu N |
\(\beta\) \(B\) \beta B | \(\xi\) \(\Xi\) \xi \Xi |
\(\gamma\) \(\Gamma\) \gamma \Gamma | \(o\) \(O\) o O (omicron) |
\(\delta\) \(\Delta\) \delta \Delta | \(\pi\) \(\Pi\) \pi \Pi |
\(\epsilon\) \(\varepsilon\) \epsilon \varepsilon | \(\rho\) \(\varrho\) P \rho \varrho P |
\(\zeta\) \(Z\) \zeta Z | \(\sigma\) \(\Sigma\) \sigma \Sigma |
\(\eta\) \(H\) \eta H | \(\tau\) \(T\) \tau T |
\(\theta\) \(\vartheta\) \(\Theta\) \theta \vartheta \Theta | \(\upsilon\) \(\Upsilon\) \upsilon \Upsilon |
\(\iota\) \(I\) \iota I | \(\phi\) \(\varphi\) \(\Phi\) \phi \varphi \Phi |
\(\kappa\) \(K\) \kappa K | \(\chi\) \(X\) \chi X |
\(\lambda\) \(\Lambda\) \lambda \Lambda | \(\psi\) \(\Psi\) \psi \Psi |
\(\mu\) \(M\) \mu M | \(\omega\) \(\Omega\) \omega \Omega |
Somme commonly used Greek letters include:
- \(\delta\) “delta” (used for integrals and discount factors in game theory)
- \(\Delta\) “delta” (used for difference/change)
- \(\beta\) “beta” (used for coefficients in regression)
- \(\mu\) “mu” (used for means)
- \(\sigma\) “sigma” (used for standard deviation)
- \(\lambda\) “lambda” (used for eigenvalues in linear algebra)
- \(\epsilon\) “epsilon” (used for the error term in regressions)
- \(\tau\) “tau” (used for treatment effects in experiments)
Different Greek letters are also used to denote in machine learning and Bayesian statistics.
2.5 Greek vs. Latin letters
In papers using statistical methods, you may encounter a combination of Greek and English/Latin letters. Usage may vary depending on the application, but generally:
- Greek letters denote the unobserved truth that we want to learn about. For example, \(\mu\) is the average adult height in the United States
- Greek letters with markings are our estimate of the truth based on our data. To calculate the average adult height, we use the mean \(\hat{\mu}\) (which we read as “mu hat”)
- Latin letters denote actual data in our sample, usually variables. For example, \(X\) is a collection of adult heights in a survey we conducted
- Latin letters with markings denote calculations in our data. For example, \(\bar{X}\) (“X bar”) is the sample mean adult height
This means we use \(X\) in our data to calculate \(\bar{X}\), which is our estimate \(\hat{\mu}\) of the unknown quantity \(\mu\). A good statistical procedure yields a \(\hat{\mu}\) that approximates \(\mu\), although this is generally hard to assess since \(\mu\) is unobserved. If we knew the unknown quantity of interest, then no statistical analysis would be necessary.
3 Logic and Proofs
A complete treatment of logic and proofs goes beyond the scope of math camp. Here, we will focus on a few relevant terms and symbol.
3.1 Proof terms
In math, probability, statistics, and game theory, proofs often have different parts.
- Assumptions are statements taken to be true
- Propositions are statements thought to be true given the assumptions
- Theorems is a proven proposition
- Lemmas are intermediary theorems that are not important on their own, but necessary to arrive at more important theorems or conclusions
- Corollaries are propositions that follow directly from the proof of another proposition and do not require further proof
3.2 Necessary and sufficient
Sometimes we want to understand the conditions that determine certain outcomes. To do this, the language of necessary and sufficient conditions is useful. Consider outcome Y and conditions A and B.
A condition is sufficient if it occurs also when the outcome occurs. So, if A occurs when Y also occurs, they are sufficient for B to occur. B can happen without A, but we do not see A without B.
A condition is necessary if we never see the outcome without it. So, if we observe Y, B must have also occurred. B can happen without Y, but never Y without B.
Sometimes, conditions operate in more complicated ways. We can have an Insufficient but Necessary part of an Unnecessary but Sufficient set of conditions an outcome to occur (INUS). For example, having unprotected sexual intercourse is insufficient for HIV transmission, but it can be necessary part of one of the many ways in which HIV is transmitted. Informally, it helps to think about INUS conditions as component causes.
Also, we can have Sufficient but Unnecessary condition that is part of a condition set that is Insufficient but Necessary for the outcome. For example, fraudulent elections are a sufficient but unnecessary condition to erode democracy. In turn, per the democratic peace theory, the absence of democracy is necessary but not sufficient to cause war. It helps to think of SUIN conditions as precipitating causes.
3.3 Logical operators
Like regular math operators, logical operators denote operations on sets. Some of these are useful for data analysis work. For example, when subsetting data. Most of these operators exist in R and other statistical computing software, although sometimes with modified syntax.
Notation | Meaning | Use |
---|---|---|
\(\wedge\) | And | Intersection.Discussing elements in both sets |
\(\vee\) | Or | Union. Discussing elements that are in multiple sets |
\(\sim\) | Not | Negation |
! | Not | Negation (used in R most often) |
< | Less than | Inequality(good for specifying conditions when filtering) |
<= | Less than equal to | Inequality (good for specifying conditions when filtering; include the value) |
> | Greater than | Inequality (good for specifying conditions when filtering) |
>= | Greater than equal to | Inequality (good for specifying conditions when filtering; include the value) |
!= | Not equal to | Exclude values when filtering (anything other than the exact value) |
== | Exactly equal | Helpful in R for when a value is exactly satisfied |
%in% | In | Useful when searching for terms |
3.4 Proofs
We won’t do proofs in Math Camp, but we’ll reference them implicitly throughout the course and in they may come back in your fall coursework.
3.4.1 Direct proofs
Direct proofs work through proof and typically use one of the following methods: + General (deductive) proof: typically done using definitions, etc. Showing how the outcome logically follows building on rules and assumptions. + Proof by exhaustion: Break up the outcome into sub cases and show for each case (done often in game theory for possible values) + Proof by construction: These proofs demonstrate existence (is there a square that is the sum of two squares?). + Proof by induction: Start small and show it is true for any number (e.g. start with a small n, n=1, then expand to n+1)
3.4.2 Indirect proofs
These show that the outcome must be true because there is no possible alternative.These are typically demonstrated using the following methods:
- Proof by counterexample: using a counterexample (x implies y, yet we observe y without x…x cannot imply y (aka x not necessary for y)).
- Proof by contradiction: assume that the statement is false and try to prove it wrong, eventually demonstrating that a contradiction emerges. Thus, the statement cannot be false.