Minor Review: Machine Learning Fundamentals  Probability Theory  Bayes Theorem
Minor Points
Section "Why is Bayes' Theorem of interest"
 Change "interest" to "Interest" in section title. Tested with Headline Capitalization Website.
Section "Bayes Theorem"
 From a notebook editor's point of view, it would be nicer to have many small markdown cells instead of a few large ones (personal opinion). When double-clicking a cell to edit it, the view jumps to the bottom of the cell, and it is then hard to locate the desired position in the markdown code.
 Change $p(AB)$ to $p(A \mid B)$ after "From the visualization it becomes clear that the following applies:"
 Change $p(AB)$ to $p(A \mid B)$ in sentence "The mathematical notation for such a question is a conditional probability $p(AB)$, [...]"
 Missing period at the end of sentence "[...] divided by the cardinality of universe $\Omega$".
Section "Cookie Problem Example"
 "in praxis" > "in practice"
 BayesTheorem_Cookie_Equation.png looks blurry
Major Points and General Thoughts
Please take the following points as considerations only, not as major criticism:

In this notebook, A and B denote sets of specific events. For example, A is "Number of cases where chocolate addiction is present". Therefore p(A) is only a single probability value. When I read p(A), I would usually think of A (Addicted) as a random variable that can take on several values like A=positive or A=negative. p(A) would then be a probability table.

The same comes to mind in the Cookie Problem example. There is no clear distinction between a random variable ("Cookie" or "Bowl") and a certain value (e.g. "Vanilla" or "1") this variable can take.

I don't even know if my concerns are valid here, because you are using a set notation and capital A is perfectly fine to describe a set. Also the notation is consistent throughout the notebook. I'm just concerned about later examples, like in my Cookie Problem Exercise notebook, where something like P(A) describes a probability table and P(A=positive) or P(A=negative) the possible values (subtables), which might be confusing.
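
To make the two readings concrete, here is a minimal sketch of how I see the difference. All counts are hypothetical and not taken from the notebook; the names `p_A` and `P_A` are only illustrative.

```python
from fractions import Fraction

# Hypothetical counts, chosen only to illustrate the notation question.
omega_size = 100      # assumed cardinality of the universe
addicted_size = 20    # assumed number of cases where chocolate addiction is present

# Set/event reading (as in the reviewed notebook):
# A is one set of events, so p(A) is a single probability value.
p_A = Fraction(addicted_size, omega_size)

# Random-variable reading (as in my exercise notebook):
# A can take several values, so P(A) is a whole probability table.
P_A = {
    "positive": Fraction(addicted_size, omega_size),
    "negative": Fraction(omega_size - addicted_size, omega_size),
}

print(p_A)                # a single value
print(P_A["positive"])    # one entry of the table, i.e. P(A=positive)
print(sum(P_A.values()))  # the entries of the table sum to 1
```

In the set reading, p(A) corresponds to the single entry P(A=positive) of the table, which is why the same symbol could confuse readers who switch between the two notebooks.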

In my notebooks I'm using $P(A \mid B)$ with an upper case P instead of $p(A \mid B)$. Is there a special meaning in lower case p over upper case P? If not, we should agree on one of these notations.
Section "Why is Bayes' Theorem of interest"
 I think the "Disease" example is incomplete because the prevalence (probability of disease occurring in population) is missing.
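
A rough sketch of why the prevalence matters: the probability of disease given a positive test cannot be computed without it. The function name, the test sensitivity, and the false positive rate below are assumptions for illustration only and are not taken from the notebook.

```python
def posterior_disease(prevalence, sensitivity, false_positive_rate):
    """Return P(disease | positive test) using Bayes' theorem."""
    # Law of total probability: a positive test comes either from a
    # diseased person (true positive) or a healthy person (false positive).
    p_positive = (sensitivity * prevalence
                  + false_positive_rate * (1.0 - prevalence))
    return sensitivity * prevalence / p_positive


# Hypothetical test characteristics; only the prevalence is varied.
for prevalence in (0.001, 0.01, 0.1):
    result = posterior_disease(prevalence, sensitivity=0.99,
                               false_positive_rate=0.05)
    print(prevalence, round(result, 3))
```

With the test characteristics held fixed, the result changes considerably with the assumed prevalence, so the example cannot be completed without stating it.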
Final Thoughts
Overall the notebook provides a very good explanation and the visual approach gave me a new understanding of this topic, although I was already familiar with Bayes' Theorem.
The art style is great and we should use it in other notebooks as well.