Probability and Distributions
Liang National University
Paint Transformer: Feed Forward Neural Painting with Stroke Prediction, ICCV 2021
6.2.1 Discrete Probabilities
• When the target space is discrete, we can imagine the probability distribution of multiple random variables as filling out a (multidimensional) array of numbers.
• We define the joint probability as the entry of both values occurring jointly:
$P(X = x_i, Y = y_j) = \dfrac{n_{ij}}{N}$
where $n_{ij}$ is the number of events with state $x_i$ and $y_j$, and $N$ is the total number of events
• The probability that $X = x$ and $Y = y$ is written as $p(x, y)$
6.2.1 Discrete Probabilities
• The marginal probability that $X = x$ irrespective of the value of $Y$ is written as $p(x)$
• We write $X \sim p(x)$ to denote that the random variable $X$ is distributed according to $p(x)$
• If we consider only the instances where $X = x$, then the fraction of instances (the conditional probability) for which $Y = y$ is written as $p(y \mid x)$.
Example
• $X$: AD scoring. $Y$: LBJ scoring.
• $X$ has two possible states ($< 30$ and $\geq 30$ points); $Y$ has two possible states ($\geq 30$ and $< 30$ points)
• We use $n_{ij}$ to denote the number of events with state $X = x_i$ and $Y = y_j$:

             AD < 30   AD ≥ 30
LBJ ≥ 30        13         6
LBJ < 30         4        17
• Total number of events $N = 13 + 6 + 4 + 17 = 40$
• The value $c_i$ is the event sum of the $i$-th column, i.e., $c_i = \sum_{j=1}^{2} n_{ij}$
• $r_j$ is the row sum, i.e., $r_j = \sum_{i=1}^{2} n_{ij}$
• The probability distribution of each random variable, the marginal probability, can be seen as the sum over a row or column:
$P(X = x_i) = \dfrac{c_i}{N} = \dfrac{\sum_{j=1}^{2} n_{ij}}{N}$ and $P(Y = y_j) = \dfrac{r_j}{N} = \dfrac{\sum_{i=1}^{2} n_{ij}}{N}$
• $P(\text{LBJ scores at least 30 pts}) = \dfrac{13 + 6}{40} = \dfrac{19}{40}$
Example
• For discrete random variables with a finite number of events, we assume that the probabilities sum up to one, that is
$\sum_{i=1}^{2} P(X = x_i) = 1$ and $\sum_{j=1}^{2} P(Y = y_j) = 1$
• The conditional probability is the fraction of a row or column in a particular cell. For example, the conditional probability of $Y$ given $X$ is
$P(Y = y_j \mid X = x_i) = \dfrac{n_{ij}}{c_i}$
• and the conditional probability of $X$ given $Y$ is
$P(X = x_i \mid Y = y_j) = \dfrac{n_{ij}}{r_j}$
• $P(\text{LBJ} < 30 \mid \text{AD} \geq 30) = \dfrac{17}{6 + 17} = \dfrac{17}{23}$ (see the sketch below)
             AD < 30   AD ≥ 30
LBJ ≥ 30        13         6
LBJ < 30         4        17
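A minimal NumPy sketch of the table computations above (the array layout mirrors the table; everything else follows the formulas on this slide):

```python
import numpy as np

# Count table n_ij from the example: rows = LBJ (Y), columns = AD (X)
counts = np.array([[13, 6],    # LBJ >= 30
                   [4, 17]])   # LBJ <  30
N = counts.sum()               # total number of events, N = 40

joint = counts / N                     # P(X = x_i, Y = y_j) = n_ij / N
p_x = joint.sum(axis=0)                # marginal P(X): column sums
p_y = joint.sum(axis=1)                # marginal P(Y): row sums

# Conditional P(Y = y_j | X = x_i) = n_ij / c_i: normalize each column
p_y_given_x = counts / counts.sum(axis=0, keepdims=True)

print(p_y[0])             # P(LBJ >= 30) = 19/40 = 0.475
print(p_y_given_x[1, 1])  # P(LBJ < 30 | AD >= 30) = 17/23 ≈ 0.7391
```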
6.2.2 Continuous Probabilities
• A function $f: \mathbb{R}^D \to \mathbb{R}$ is called a probability density function (pdf) if
• $\forall \boldsymbol{x} \in \mathbb{R}^D: f(\boldsymbol{x}) \geq 0$, and
• its integral exists and $\int_{\mathbb{R}^D} f(\boldsymbol{x})\, d\boldsymbol{x} = 1$.
• Observe that the probability density function is any function $f$ that is non-negative and integrates to one. We associate a random variable $X$ with this function $f$ by
$P(a \leq X \leq b) = \int_a^b f(x)\, dx$
where $a, b \in \mathbb{R}$ and $x \in \mathbb{R}$ are outcomes of the continuous random variable $X$. This association is called the distribution of the random variable $X$.
• Note: the probability of a continuous random variable $X$ taking a particular value, $P(X = x)$, is zero; it corresponds to specifying an interval where $a = b$ (see the check below).
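A small numerical check of the interval interpretation, using a standard normal density as an assumed example pdf (`scipy.stats.norm`):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# P(a <= X <= b) is the integral of the pdf over [a, b]
a, b = -1.0, 1.0
prob, _ = quad(norm.pdf, a, b)
print(prob)                      # ≈ 0.6827 for the standard normal

# A single point is an interval with a = b, so its probability is zero
point, _ = quad(norm.pdf, 0.5, 0.5)
print(point)                     # 0.0
```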
6.2.2 Continuous Probabilities
• A cumulative distribution function (cdf) of a multivariate real-valued random variable $X$ with states $\boldsymbol{x} \in \mathbb{R}^D$ is given by
$F_X(\boldsymbol{x}) = P(X_1 \leq x_1, \ldots, X_D \leq x_D)$
where $X = [X_1, \ldots, X_D]^\top$ and $\boldsymbol{x} = [x_1, \ldots, x_D]^\top$, and the right-hand side represents the probability that random variable $X_i$ takes a value smaller than or equal to $x_i$.
• The cdf can also be expressed as the integral of the probability density function $f(\boldsymbol{x})$, so that
$F_X(\boldsymbol{x}) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_D} f(z_1, \ldots, z_D)\, dz_1 \cdots dz_D$
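The same relation checked in one dimension: the cdf at $x$ matches the integral of the pdf from $-\infty$ to $x$ (standard normal assumed as the example density):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

x = 0.7
F, _ = quad(norm.pdf, -np.inf, x)   # integrate the pdf up to x
print(F, norm.cdf(x))               # both ≈ 0.7580
```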
6.2.3 Contrasting Discrete and Continuous Distributions
• Let $Z$ be a discrete uniform random variable with three states $\{z = -1.1,\ z = 0.3,\ z = 1.5\}$. The probability mass function can be represented as a table of probability values, each state having probability $1/3$:

z        −1.1   0.3   1.5
P(Z=z)    1/3   1/3   1/3
• States can be located on the 𝑥-axis, and the 𝑦-axis represents the probability of a particular state
Uniform distribution (discrete): a finite number of values are equally likely to
be observed; every one of 𝑛 values has equal probability 1/𝑛
6.2.3 Contrasting Discrete and Continuous Distributions
• Let $X$ be a continuous random variable taking values in the range $0.9 \leq X \leq 1.6$
• Observe that the height of the density can be greater than 1. However, it needs to hold that
$\int_{0.9}^{1.6} p(x)\, dx = 1$
Uniform distribution (continuous): denoted as $U(a, b)$, with density
$f(x) = \begin{cases} \dfrac{1}{b - a} & \text{for } a \leq x \leq b \\[4pt] 0 & \text{otherwise} \end{cases}$
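A short sketch of both uniform distributions; SciPy's `uniform` parameterizes $U(a, b)$ as `loc=a, scale=b-a`, and the discrete pmf is just the hand-written table from the previous slide:

```python
import numpy as np
from scipy.stats import uniform

# Continuous uniform U(0.9, 1.6)
a, b = 0.9, 1.6
u = uniform(loc=a, scale=b - a)

print(u.pdf(1.2))            # 1/(b - a) ≈ 1.4286: a density may exceed 1
print(u.cdf(b) - u.cdf(a))   # 1.0: the density still integrates to one

# Discrete uniform over the three states from the previous slide
states = np.array([-1.1, 0.3, 1.5])
pmf = np.full(3, 1 / 3)      # every one of n = 3 values has probability 1/n
print(np.isclose(pmf.sum(), 1.0))   # True
```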
6.3 Sum Rule, Product Rule, and Bayes’ Theorem
• $p(\boldsymbol{x}, \boldsymbol{y})$ is the joint distribution of the two random variables $\boldsymbol{x}, \boldsymbol{y}$
• $p(\boldsymbol{x})$ and $p(\boldsymbol{y})$ are the marginal distributions
• $p(\boldsymbol{y} \mid \boldsymbol{x})$ is the conditional distribution of $\boldsymbol{y}$ given $\boldsymbol{x}$
• The sum rule states that
$p(\boldsymbol{x}) = \begin{cases} \sum_{\boldsymbol{y} \in \mathcal{Y}} p(\boldsymbol{x}, \boldsymbol{y}) & \text{if } \boldsymbol{y} \text{ is discrete} \\[4pt] \int_{\mathcal{Y}} p(\boldsymbol{x}, \boldsymbol{y})\, d\boldsymbol{y} & \text{if } \boldsymbol{y} \text{ is continuous} \end{cases}$
where $\mathcal{Y}$ are the states of the target space of random variable $Y$.
• We sum out (or integrate out) the set of states $\boldsymbol{y}$ of the random variable $Y$.
• The sum rule is also known as the marginalization property.
• If $\boldsymbol{x} = [x_1, \ldots, x_D]^\top$, we obtain the marginal
$p(x_i) = \int p(x_1, \ldots, x_D)\, d\boldsymbol{x}_{\setminus i}$
by repeated application of the sum rule, where we integrate/sum out all random variables except $x_i$; this is indicated by $\setminus i$, which reads "all except $i$."
• Example: consider the Gaussian mixture
$p(\boldsymbol{x}) = 0.4\, \mathcal{N}\!\left(\boldsymbol{x} \,\middle|\, \begin{bmatrix} 10 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\right) + 0.6\, \mathcal{N}\!\left(\boldsymbol{x} \,\middle|\, \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 8.4 & 2.0 \\ 2.0 & 1.7 \end{bmatrix}\right)$
• The distribution is bimodal (has two peaks)
• One of its marginal distributions is unimodal (has one peak), as checked below
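A numeric sketch of the unimodal-marginal claim: marginalizing each mixture component keeps the matching block of its mean and covariance, so the second marginal is $p(x_2) = 0.4\,\mathcal{N}(2, 1) + 0.6\,\mathcal{N}(0, 1.7)$; counting local maxima on a grid checks the modality:

```python
import numpy as np
from scipy.stats import norm

# Second marginal of the mixture above (1 and 1.7 are variances)
x2 = np.linspace(-5.0, 6.0, 2001)
p = (0.4 * norm.pdf(x2, loc=2.0, scale=1.0)
     + 0.6 * norm.pdf(x2, loc=0.0, scale=np.sqrt(1.7)))

# Count interior local maxima to check modality numerically
n_peaks = np.sum((p[1:-1] > p[:-2]) & (p[1:-1] > p[2:]))
print(n_peaks)   # 1 -> this marginal is unimodal, as stated above
```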
6.3 Sum Rule, Product Rule, and Bayes’ Theorem
• The product rule relates the joint distribution to the conditional distribution:
$p(\boldsymbol{x}, \boldsymbol{y}) = p(\boldsymbol{y} \mid \boldsymbol{x})\, p(\boldsymbol{x})$
• Every joint distribution of two random variables can be factorized (written as a product) of two other distributions
• The product rule also implies
$p(\boldsymbol{x}, \boldsymbol{y}) = p(\boldsymbol{x} \mid \boldsymbol{y})\, p(\boldsymbol{y})$
6.3 Sum Rule, Product Rule, and Bayes’ Theorem
• Let us assume we have some prior knowledge 𝑝 𝒙 about an unobserved random variable 𝒙 and some relationship 𝑝 𝒚 | 𝒙 between 𝒙 and a second random variable 𝒚, which we can observe. If we observe 𝒚, we can use Bayes’ theorem (also Bayes’ rule or Bayes’ law) to draw some conclusions about 𝒙 given the observed values of 𝒚.
$\underbrace{p(\boldsymbol{x} \mid \boldsymbol{y})}_{\text{posterior}} = \dfrac{\overbrace{p(\boldsymbol{y} \mid \boldsymbol{x})}^{\text{likelihood}}\ \overbrace{p(\boldsymbol{x})}^{\text{prior}}}{\underbrace{p(\boldsymbol{y})}_{\text{evidence}}}$
• Bayes' theorem is a direct consequence of the product rule, since
$p(\boldsymbol{x}, \boldsymbol{y}) = p(\boldsymbol{x} \mid \boldsymbol{y})\, p(\boldsymbol{y})$ and $p(\boldsymbol{x}, \boldsymbol{y}) = p(\boldsymbol{y} \mid \boldsymbol{x})\, p(\boldsymbol{x})$
so that
$p(\boldsymbol{x} \mid \boldsymbol{y})\, p(\boldsymbol{y}) = p(\boldsymbol{y} \mid \boldsymbol{x})\, p(\boldsymbol{x}) \iff p(\boldsymbol{x} \mid \boldsymbol{y}) = \dfrac{p(\boldsymbol{y} \mid \boldsymbol{x})\, p(\boldsymbol{x})}{p(\boldsymbol{y})}$
• Example: applying Bayes' theorem
$p(\boldsymbol{x} \mid \boldsymbol{y}) = \dfrac{p(\boldsymbol{y} \mid \boldsymbol{x})\, p(\boldsymbol{x})}{p(\boldsymbol{y})}$
to the count table:
$p(\text{LBJ} < 30 \mid \text{AD} \geq 30) = \dfrac{p(\text{AD} \geq 30 \mid \text{LBJ} < 30)\, p(\text{LBJ} < 30)}{p(\text{AD} \geq 30)} = \dfrac{\frac{17}{21} \cdot \frac{21}{40}}{\frac{23}{40}} = \dfrac{17}{23}$

             AD < 30   AD ≥ 30
LBJ ≥ 30        13         6
LBJ < 30         4        17
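A one-line check of the arithmetic above:

```python
# Bayes' theorem on the count table, all values read off the table
prior = 21 / 40         # p(LBJ < 30)
likelihood = 17 / 21    # p(AD >= 30 | LBJ < 30)
evidence = 23 / 40      # p(AD >= 30)

posterior = likelihood * prior / evidence
print(posterior, 17 / 23)   # both ≈ 0.7391
```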
6.3 Sum Rule, Product Rule, and Bayes’ Theorem
$\underbrace{p(\boldsymbol{x} \mid \boldsymbol{y})}_{\text{posterior}} = \dfrac{\overbrace{p(\boldsymbol{y} \mid \boldsymbol{x})}^{\text{likelihood}}\ \overbrace{p(\boldsymbol{x})}^{\text{prior}}}{\underbrace{p(\boldsymbol{y})}_{\text{evidence}}}$
• $p(\boldsymbol{x})$ is the prior, which encapsulates our subjective prior knowledge of the unobserved (latent) variable $\boldsymbol{x}$ before observing any data
• $p(\boldsymbol{y} \mid \boldsymbol{x})$, the likelihood, describes how $\boldsymbol{x}$ and $\boldsymbol{y}$ are related
• $p(\boldsymbol{y} \mid \boldsymbol{x})$ is the probability of the data $\boldsymbol{y}$ if we were to know the latent variable $\boldsymbol{x}$
• We call $p(\boldsymbol{y} \mid \boldsymbol{x})$ either the "likelihood of $\boldsymbol{x}$ (given $\boldsymbol{y}$)" or the "probability of $\boldsymbol{y}$ given $\boldsymbol{x}$" ($\boldsymbol{y}$ is observed; $\boldsymbol{x}$ is latent)
• $p(\boldsymbol{x} \mid \boldsymbol{y})$, the posterior, is the quantity of interest in Bayesian statistics because it expresses exactly what we are interested in, i.e., what we know about $\boldsymbol{x}$ after having observed $\boldsymbol{y}$ (e.g., linear regression or Gaussian mixture models)
• Linear regression: $p(y \mid \boldsymbol{x}, \boldsymbol{\theta})$
6.3 Sum Rule, Product Rule, and Bayes’ Theorem
$\underbrace{p(\boldsymbol{x} \mid \boldsymbol{y})}_{\text{posterior}} = \dfrac{\overbrace{p(\boldsymbol{y} \mid \boldsymbol{x})}^{\text{likelihood}}\ \overbrace{p(\boldsymbol{x})}^{\text{prior}}}{\underbrace{p(\boldsymbol{y})}_{\text{evidence}}}$
• The quantity
$p(\boldsymbol{y}) := \int p(\boldsymbol{y} \mid \boldsymbol{x})\, p(\boldsymbol{x})\, d\boldsymbol{x} = \mathbb{E}_X[p(\boldsymbol{y} \mid \boldsymbol{x})]$
is the marginal likelihood/evidence.
• The marginal likelihood integrates the numerator of Bayes' theorem with respect to the latent variable $\boldsymbol{x}$ (see the Monte Carlo sketch below)
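A Monte Carlo sketch of the evidence, assuming a toy conjugate model (prior $x \sim \mathcal{N}(0, 1)$, likelihood $y \mid x \sim \mathcal{N}(x, 0.5^2)$; this model is an assumption, not from the slides) so the estimate can be compared against the known closed form $y \sim \mathcal{N}(0, 1.25)$:

```python
import numpy as np
from scipy.stats import norm

# Monte Carlo estimate of the evidence p(y) = E_x[p(y | x)]
rng = np.random.default_rng(0)
y_obs = 1.2

xs = rng.normal(0.0, 1.0, size=200_000)            # samples from the prior
evidence_mc = norm.pdf(y_obs, loc=xs, scale=0.5).mean()

# For this conjugate pair the evidence is known: y ~ N(0, 1 + 0.25)
print(evidence_mc, norm.pdf(y_obs, 0.0, np.sqrt(1.25)))   # both ≈ 0.200
```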
6.4 Summary Statistics and Independence
6.4.1 Means and Covariances
• The expected value of a function $g: \mathbb{R} \to \mathbb{R}$ of a univariate continuous random variable $X \sim p(x)$ is given by
$\mathbb{E}_X[g(x)] = \int_{\mathcal{X}} g(x)\, p(x)\, dx$
• Correspondingly, the expected value of a function $g$ of a discrete random variable $X \sim p(x)$ is given by
$\mathbb{E}_X[g(x)] = \sum_{x \in \mathcal{X}} g(x)\, p(x)$
where $\mathcal{X}$ is the set of possible outcomes (the target space) of the random variable $X$
• We consider multivariate random variables $X$ as a finite vector of univariate random variables $[X_1, \ldots, X_D]^\top$. For multivariate random variables, we define the expected value element-wise:
$\mathbb{E}_X[g(\boldsymbol{x})] = \begin{bmatrix} \mathbb{E}_{X_1}[g(x_1)] \\ \vdots \\ \mathbb{E}_{X_D}[g(x_D)] \end{bmatrix} \in \mathbb{R}^D$
where the subscript $\mathbb{E}_{X_d}$ indicates that we are taking the expected value with respect to the $d$-th element of the vector $\boldsymbol{x}$
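A sketch of both definitions for an assumed example function $g(x) = x^2$: quadrature against a standard normal density for the continuous case, and a probability-weighted sum for a made-up discrete distribution:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

g = lambda x: x**2   # assumed example function

# Continuous case: integrate g(x) p(x) with p = N(0, 1); true value is 1
val, _ = quad(lambda x: g(x) * norm.pdf(x), -np.inf, np.inf)
print(val)   # ≈ 1.0

# Discrete case: a probability-weighted sum over a made-up distribution
states = np.array([-1.0, 0.0, 2.0])
probs = np.array([0.25, 0.5, 0.25])
print((g(states) * probs).sum())   # 0.25*1 + 0.5*0 + 0.25*4 = 1.25
```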
6.4.1 Means and Covariances
• The mean of a random variable $X$ with states $\boldsymbol{x} \in \mathbb{R}^D$ is an average and is defined as
$\mathbb{E}_X[\boldsymbol{x}] = \begin{bmatrix} \mathbb{E}_{X_1}[x_1] \\ \vdots \\ \mathbb{E}_{X_D}[x_D] \end{bmatrix} \in \mathbb{R}^D$
where
$\mathbb{E}_{X_d}[x_d] := \begin{cases} \int_{\mathcal{X}} x_d\, p(x_d)\, dx_d & \text{if } X \text{ is a continuous random variable} \\[4pt] \sum_{x_i \in \mathcal{X}} x_i\, p(x_d = x_i) & \text{if } X \text{ is a discrete random variable} \end{cases}$
for $d = 1, \ldots, D$, where the subscript $d$ indicates the corresponding dimension of $\boldsymbol{x}$. The integral and sum are over the states $\mathcal{X}$ of the target space of the random variable $X$.
6.4.1 Means and Covariances
• The expected value is a linear operator. For example, given a real-valued function $f(\boldsymbol{x}) = a g(\boldsymbol{x}) + b h(\boldsymbol{x})$, where $a, b \in \mathbb{R}$ and $\boldsymbol{x} \in \mathbb{R}^D$, we obtain
$\mathbb{E}_X[f(\boldsymbol{x})] = \int f(\boldsymbol{x})\, p(\boldsymbol{x})\, d\boldsymbol{x}$
$= \int [a g(\boldsymbol{x}) + b h(\boldsymbol{x})]\, p(\boldsymbol{x})\, d\boldsymbol{x}$
$= a \int g(\boldsymbol{x})\, p(\boldsymbol{x})\, d\boldsymbol{x} + b \int h(\boldsymbol{x})\, p(\boldsymbol{x})\, d\boldsymbol{x} = a\, \mathbb{E}_X[g(\boldsymbol{x})] + b\, \mathbb{E}_X[h(\boldsymbol{x})]$
An End-to-End Transformer Model for 3D Object Detection. et al., ICCV 2021
6.4.1 Means and Covariances
• The covariance between two univariate random variables $X, Y \in \mathbb{R}$ is given by the expected product of their deviations from their respective means, i.e.,
$\text{Cov}_{X,Y}[x, y] := \mathbb{E}_{X,Y}\big[(x - \mathbb{E}_X[x])(y - \mathbb{E}_Y[y])\big]$
• By using the linearity of expectations, it can be rewritten as the expected value of the product minus the product of the expected values, i.e.,
$\text{Cov}[x, y] = \mathbb{E}[xy] - \mathbb{E}[x]\,\mathbb{E}[y]$
• The covariance of a variable with itself, $\text{Cov}[x, x]$, is called the variance and is denoted by $\mathbb{V}_X[x]$.
• The square root of the variance is called the standard deviation and is often denoted by $\sigma(x)$.
• If we consider two multivariate random variables $X$ and $Y$ with states $\boldsymbol{x} \in \mathbb{R}^D$ and $\boldsymbol{y} \in \mathbb{R}^E$ respectively, the covariance between $X$ and $Y$ is defined as
$\text{Cov}[\boldsymbol{x}, \boldsymbol{y}] = \mathbb{E}[\boldsymbol{x}\boldsymbol{y}^\top] - \mathbb{E}[\boldsymbol{x}]\,\mathbb{E}[\boldsymbol{y}]^\top = \text{Cov}[\boldsymbol{y}, \boldsymbol{x}]^\top \in \mathbb{R}^{D \times E}$
6.4.1 Means and Covariances
• The variance of a random variable $X$ with states $\boldsymbol{x} \in \mathbb{R}^D$ and a mean vector $\boldsymbol{\mu} \in \mathbb{R}^D$ is defined as
$\mathbb{V}_X[\boldsymbol{x}] = \text{Cov}_X[\boldsymbol{x}, \boldsymbol{x}] = \mathbb{E}_X[(\boldsymbol{x} - \boldsymbol{\mu})(\boldsymbol{x} - \boldsymbol{\mu})^\top] = \mathbb{E}_X[\boldsymbol{x}\boldsymbol{x}^\top] - \mathbb{E}_X[\boldsymbol{x}]\,\mathbb{E}_X[\boldsymbol{x}]^\top$
$= \begin{bmatrix} \text{Cov}[x_1, x_1] & \text{Cov}[x_1, x_2] & \cdots & \text{Cov}[x_1, x_D] \\ \text{Cov}[x_2, x_1] & \text{Cov}[x_2, x_2] & \cdots & \text{Cov}[x_2, x_D] \\ \vdots & \vdots & \ddots & \vdots \\ \text{Cov}[x_D, x_1] & \cdots & \cdots & \text{Cov}[x_D, x_D] \end{bmatrix}$
• This $D \times D$ matrix is called the covariance matrix of the multivariate random variable $X$. The covariance matrix is symmetric and positive semidefinite and tells us something about the spread of the data.
• On its diagonal, the covariance matrix contains the variances of the $x_i$
Example – Computation of covariance matrix
• Consider $N = 2$ observations of a three-dimensional random variable, collected as the columns of
$X = [\boldsymbol{x}_1, \boldsymbol{x}_2] = \begin{bmatrix} 1 & 3 \\ 2 & 1 \\ 3 & 1 \end{bmatrix}$
• The mean is $\boldsymbol{\mu} = \dfrac{1}{2}(\boldsymbol{x}_1 + \boldsymbol{x}_2) = \begin{bmatrix} 2 \\ 1.5 \\ 2 \end{bmatrix}$
• $X - \boldsymbol{\mu} = \begin{bmatrix} -1 & 1 \\ 0.5 & -0.5 \\ 1 & -1 \end{bmatrix}$
• $\boldsymbol{\Sigma} = \dfrac{1}{2}(X - \boldsymbol{\mu})(X - \boldsymbol{\mu})^\top = \dfrac{1}{2}\begin{bmatrix} 2 & -1 & -2 \\ -1 & 0.5 & 1 \\ -2 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & -0.5 & -1 \\ -0.5 & 0.25 & 0.5 \\ -1 & 0.5 & 1 \end{bmatrix}$
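A NumPy check of this example; `np.cov` with `bias=True` uses the same $1/N$ convention as the slide:

```python
import numpy as np

# Two observations of a three-dimensional random variable as columns of X
X = np.array([[1.0, 3.0],
              [2.0, 1.0],
              [3.0, 1.0]])

mu = X.mean(axis=1, keepdims=True)            # [2, 1.5, 2]^T
Sigma = (X - mu) @ (X - mu).T / X.shape[1]    # (1/N)(X - mu)(X - mu)^T

print(Sigma)
# [[ 1.   -0.5  -1.  ]
#  [-0.5   0.25  0.5 ]
#  [-1.    0.5   1.  ]]
print(np.allclose(Sigma, np.cov(X, bias=True)))   # True
```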
6.4.1 Means and Covariances
• The correlation between two random variables $X, Y$ is given by
$\text{corr}[x, y] = \dfrac{\text{Cov}[x, y]}{\sqrt{\mathbb{V}[x]\, \mathbb{V}[y]}} \in [-1, 1]$
• The covariance (and correlation) indicate how two random variables are related
• Positive correlation $\text{corr}[x, y]$ means that when $x$ grows, $y$ is also expected to grow; negative correlation means that as $x$ increases, $y$ decreases
Two-dimensional datasets with identical means and variances along each axis (colored lines) but with different covariances.
6.4.2 Empirical Means and Covariances
• Section 6.4.1 defined the population mean and covariance, which refer to the true statistics of the population
• In machine learning, we have a finite dataset of size 𝑁
• The empirical mean vector is the arithmetic average of the observations for each variable, and it is defined as
$\bar{\boldsymbol{x}} := \dfrac{1}{N} \sum_{n=1}^{N} \boldsymbol{x}_n$
where $\boldsymbol{x}_n \in \mathbb{R}^D$.
• The empirical covariance matrix is a $D \times D$ matrix:
$\boldsymbol{\Sigma} := \dfrac{1}{N} \sum_{n=1}^{N} (\boldsymbol{x}_n - \bar{\boldsymbol{x}})(\boldsymbol{x}_n - \bar{\boldsymbol{x}})^\top$
• To compute the statistics for a particular dataset, we would use the observations $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_N$ and apply the two equations above.
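A minimal sketch of the two estimators on synthetic data (the dataset is assumed, generated only for illustration):

```python
import numpy as np

# Rows are the N observations x_1, ..., x_N
rng = np.random.default_rng(1)
data = rng.normal(size=(500, 2))                 # N = 500 samples in R^2

x_bar = data.mean(axis=0)                        # empirical mean vector
Sigma_hat = (data - x_bar).T @ (data - x_bar) / len(data)

# NumPy's np.cov with bias=True uses the same 1/N convention
print(np.allclose(Sigma_hat, np.cov(data.T, bias=True)))   # True
```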
6.4.3 Three Expressions for the Variance
• The standard definition of variance is the expectation of the squared deviation of a random variable $X$ from its expected value $\mu$, i.e.,
$\mathbb{V}_X[x] := \mathbb{E}_X[(x - \mu)^2]$
• This is equivalent to the mean of a new random variable $Z := (X - \mu)^2$.
• Estimating it this way requires a two-pass algorithm: one pass through the data to calculate the estimate $\hat{\mu}$, and then a second pass using $\hat{\mu}$ to calculate the variance.
• It can be converted to the so-called raw-score formula for variance:
$\mathbb{V}_X[x] = \mathbb{E}_X[x^2] - (\mathbb{E}_X[x])^2$
• The mean of the square minus the square of the mean: this version can be calculated empirically in one pass.
• A third way to understand the variance is as a sum of pairwise differences between all pairs of observations. Consider a sample $x_1, \ldots, x_N$ of realizations of random variable $X$, and compute the squared difference between pairs $x_i$ and $x_j$. By expanding the square, we can show that the mean of the $N^2$ pairwise squared differences equals twice the empirical variance of the observations (see the check below):
$\dfrac{1}{N^2} \sum_{i,j=1}^{N} (x_i - x_j)^2 = 2\left[\dfrac{1}{N}\sum_{i=1}^{N} x_i^2 - \left(\dfrac{1}{N}\sum_{i=1}^{N} x_i\right)^2\right]$
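A quick numerical check that the three expressions agree on a sample (synthetic data, assumed only for illustration; note the pairwise version carries the factor of 2 from the identity above):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=1_000)
N = len(x)

two_pass = np.mean((x - x.mean())**2)         # E[(x - mu)^2], two passes
raw_score = np.mean(x**2) - x.mean()**2       # E[x^2] - (E[x])^2, one pass
pairwise = ((x[:, None] - x[None, :])**2).sum() / N**2 / 2  # half the mean
                                              # pairwise squared difference

print(np.allclose(two_pass, raw_score), np.allclose(two_pass, pairwise))
# True True
```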
6.4.4 Sums and Transformations of Random Variables
• Consider two random variables $X, Y$ with states $\boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^D$. Then:
• $\mathbb{E}[\boldsymbol{x} + \boldsymbol{y}] = \mathbb{E}[\boldsymbol{x}] + \mathbb{E}[\boldsymbol{y}]$
• $\mathbb{E}[\boldsymbol{x} - \boldsymbol{y}] = \mathbb{E}[\boldsymbol{x}] - \mathbb{E}[\boldsymbol{y}]$
• $\mathbb{V}[\boldsymbol{x} + \boldsymbol{y}] = \mathbb{V}[\boldsymbol{x}] + \mathbb{V}[\boldsymbol{y}] + \text{Cov}[\boldsymbol{x}, \boldsymbol{y}] + \text{Cov}[\boldsymbol{y}, \boldsymbol{x}]$
• $\mathbb{V}[\boldsymbol{x} - \boldsymbol{y}] = \mathbb{V}[\boldsymbol{x}] + \mathbb{V}[\boldsymbol{y}] - \text{Cov}[\boldsymbol{x}, \boldsymbol{y}] - \text{Cov}[\boldsymbol{y}, \boldsymbol{x}]$
• Mean and (co)variance have useful properties when it comes to affine transformation of random variables. Consider a random variable 𝑋 with mean 𝝁 and covariance matrix 𝜮 and an affine transformation 𝒚 = 𝑨𝒙 + 𝒃 of 𝒙. Then 𝒚 is itself a random variable whose mean vector and covariance matrix are given by
• $\mathbb{E}_Y[\boldsymbol{y}] = \mathbb{E}_X[\boldsymbol{A}\boldsymbol{x} + \boldsymbol{b}] = \boldsymbol{A}\,\mathbb{E}_X[\boldsymbol{x}] + \boldsymbol{b} = \boldsymbol{A}\boldsymbol{\mu} + \boldsymbol{b}$
• $\mathbb{V}_Y[\boldsymbol{y}] = \mathbb{V}_X[\boldsymbol{A}\boldsymbol{x} + \boldsymbol{b}] = \mathbb{V}_X[\boldsymbol{A}\boldsymbol{x}] = \boldsymbol{A}\,\mathbb{V}_X[\boldsymbol{x}]\,\boldsymbol{A}^\top = \boldsymbol{A}\boldsymbol{\Sigma}\boldsymbol{A}^\top$
• Furthermore,
$\text{Cov}[\boldsymbol{x}, \boldsymbol{y}] = \mathbb{E}[\boldsymbol{x}(\boldsymbol{A}\boldsymbol{x} + \boldsymbol{b})^\top] - \mathbb{E}[\boldsymbol{x}]\,\mathbb{E}[\boldsymbol{A}\boldsymbol{x} + \boldsymbol{b}]^\top$
$= \mathbb{E}[\boldsymbol{x}]\boldsymbol{b}^\top + \mathbb{E}[\boldsymbol{x}\boldsymbol{x}^\top]\boldsymbol{A}^\top - \boldsymbol{\mu}\boldsymbol{b}^\top - \boldsymbol{\mu}\boldsymbol{\mu}^\top\boldsymbol{A}^\top$
$= \boldsymbol{\mu}\boldsymbol{b}^\top - \boldsymbol{\mu}\boldsymbol{b}^\top + (\mathbb{E}[\boldsymbol{x}\boldsymbol{x}^\top] - \boldsymbol{\mu}\boldsymbol{\mu}^\top)\boldsymbol{A}^\top = \boldsymbol{\Sigma}\boldsymbol{A}^\top$
• where $\boldsymbol{\Sigma} = \mathbb{E}[\boldsymbol{x}\boldsymbol{x}^\top] - \boldsymbol{\mu}\boldsymbol{\mu}^\top$ is the covariance of $X$.
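A sampling sketch of the affine rules, with an assumed toy $\boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{A}, \boldsymbol{b}$ (not from the slides); the empirical moments of $\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x} + \boldsymbol{b}$ should approach $\boldsymbol{A}\boldsymbol{\mu} + \boldsymbol{b}$ and $\boldsymbol{A}\boldsymbol{\Sigma}\boldsymbol{A}^\top$:

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
b = np.array([0.5, -1.0])

x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ A.T + b                              # affine transform per sample

print(np.allclose(y.mean(axis=0), A @ mu + b, atol=0.05))    # True
print(np.allclose(np.cov(y.T), A @ Sigma @ A.T, atol=0.2))   # True
```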
6.4.5 Statistical Independence
• Two random variables $X, Y$ are statistically independent if and only if
$p(\boldsymbol{x}, \boldsymbol{y}) = p(\boldsymbol{x})\, p(\boldsymbol{y})$
• Intuitively, two random variables $X$ and $Y$ are independent if the value of $\boldsymbol{y}$ (once known) does not add any additional information about $\boldsymbol{x}$ (and vice versa). If $X, Y$ are (statistically) independent, then
• $p(\boldsymbol{y} \mid \boldsymbol{x}) = p(\boldsymbol{y})$
• $p(\boldsymbol{x} \mid \boldsymbol{y}) = p(\boldsymbol{x})$
• $\text{Cov}_{X,Y}[\boldsymbol{x}, \boldsymbol{y}] = \boldsymbol{0}$
• $\mathbb{V}_{X,Y}[\boldsymbol{x} + \boldsymbol{y}] = \mathbb{V}_X[\boldsymbol{x}] + \mathbb{V}_Y[\boldsymbol{y}]$
• The converse of the last point does not hold: two random variables can have zero covariance and still not be statistically independent.
• Covariance measures only linear dependence; therefore, random variables that are nonlinearly dependent can have zero covariance.
• Example. Consider a random variable $X$ with zero mean ($\mathbb{E}_X[x] = 0$) and also $\mathbb{E}_X[x^3] = 0$. Let $y = x^2$ (hence, $Y$ is dependent on $X$) and consider the covariance between $X$ and $Y$. This gives
$\text{Cov}[x, y] = \mathbb{E}[xy] - \mathbb{E}[x]\,\mathbb{E}[y] = \mathbb{E}[x^3] = 0$
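The same counterexample simulated: $x$ is symmetric around zero, $y = x^2$ is fully determined by $x$, yet the empirical covariance vanishes:

```python
import numpy as np

# x symmetric around 0 (E[x] = 0, E[x**3] = 0), y = x**2 depends on x
rng = np.random.default_rng(4)
x = rng.normal(size=1_000_000)
y = x**2

cov = np.mean(x * y) - x.mean() * y.mean()   # E[xy] - E[x]E[y] = E[x^3]
print(abs(cov) < 0.01)   # True: zero covariance, but y is determined by x
```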
6.5 Gaussian Distribution
• The Gaussian distribution is the most well-studied probability distribution for continuous-valued random variables.
• It is also referred to as the normal distribution.
• For a univariate random variable, the Gaussian distribution has a density that is given by
$p(x \mid \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\dfrac{(x - \mu)^2}{2\sigma^2}\right)$
• The multivariate Gaussian distribution is fully characterized by a mean vector $\boldsymbol{\mu}$ and a covariance matrix $\boldsymbol{\Sigma}$ and defined as
$p(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = (2\pi)^{-D/2}\, |\boldsymbol{\Sigma}|^{-1/2} \exp\left(-\tfrac{1}{2}(\boldsymbol{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\boldsymbol{x} - \boldsymbol{\mu})\right)$
where $\boldsymbol{x} \in \mathbb{R}^D$. We write $p(\boldsymbol{x}) = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})$ or $X \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$.
• The special case of the Gaussian with zero mean and identity covariance, that is, $\boldsymbol{\mu} = \boldsymbol{0}$ and $\boldsymbol{\Sigma} = \boldsymbol{I}$, is referred to as the standard normal distribution.
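A sketch evaluating the multivariate density directly from the formula and comparing against `scipy.stats.multivariate_normal` (the particular $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are assumed example values; they reappear in the conditioning example later):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 2.0])
Sigma = np.array([[0.3, -1.0],
                  [-1.0, 5.0]])
x = np.array([0.5, 1.0])

# (2*pi)^(-D/2) |Sigma|^(-1/2) exp(-0.5 (x-mu)^T Sigma^{-1} (x-mu))
D = len(mu)
dev = x - mu
dens = ((2 * np.pi) ** (-D / 2)
        * np.linalg.det(Sigma) ** (-0.5)
        * np.exp(-0.5 * dev @ np.linalg.solve(Sigma, dev)))

print(np.isclose(dens, multivariate_normal.pdf(x, mu, Sigma)))   # True
```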
6.5 Gaussian Distribution
• Figure below shows a univariate Gaussian and a bivariate Gaussian with corresponding samples.
Spherical Gaussian
• General probability density function:
$p(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = (2\pi)^{-D/2}\, |\boldsymbol{\Sigma}|^{-1/2} \exp\left(-\tfrac{1}{2}(\boldsymbol{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\boldsymbol{x} - \boldsymbol{\mu})\right)$
• Spherical Gaussian:
$p(\boldsymbol{x} \mid \boldsymbol{\mu}, \sigma^2) = (2\pi\sigma^2)^{-D/2} \exp\left(-\dfrac{1}{2\sigma^2}\|\boldsymbol{x} - \boldsymbol{\mu}\|^2\right), \quad \boldsymbol{\mu} \in \mathbb{R}^D,\ \sigma \in \mathbb{R}$
$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{bmatrix}$ (diagonal covariance with equal variances; written here for $D = 3$)
[Figures: a spherical Gaussian; a Gaussian with diagonal covariance matrix (variance not equal for different $x_i$); a Gaussian with full covariance matrix]
6.5.1 Marginals and Conditionals of Gaussians are Gaussians
• Let 𝑋 and 𝑌 be two multivariate random variables that may have different dimensions.
• We write the Gaussian distribution in terms of the concatenated states $[\boldsymbol{x}; \boldsymbol{y}]$:
$p(\boldsymbol{x}, \boldsymbol{y}) = \mathcal{N}\!\left(\begin{bmatrix} \boldsymbol{\mu}_x \\ \boldsymbol{\mu}_y \end{bmatrix}, \begin{bmatrix} \boldsymbol{\Sigma}_{xx} & \boldsymbol{\Sigma}_{xy} \\ \boldsymbol{\Sigma}_{yx} & \boldsymbol{\Sigma}_{yy} \end{bmatrix}\right)$
where $\boldsymbol{\Sigma}_{xx} = \text{Cov}[\boldsymbol{x}, \boldsymbol{x}]$ and $\boldsymbol{\Sigma}_{yy} = \text{Cov}[\boldsymbol{y}, \boldsymbol{y}]$ are the marginal covariance matrices of $\boldsymbol{x}$ and $\boldsymbol{y}$, respectively, and $\boldsymbol{\Sigma}_{xy} = \text{Cov}[\boldsymbol{x}, \boldsymbol{y}]$ is the cross-covariance matrix between $\boldsymbol{x}$ and $\boldsymbol{y}$.
• The conditional distribution $p(\boldsymbol{x} \mid \boldsymbol{y})$ is also Gaussian and given by
$p(\boldsymbol{x} \mid \boldsymbol{y}) = \mathcal{N}(\boldsymbol{\mu}_{x|y}, \boldsymbol{\Sigma}_{x|y})$
$\boldsymbol{\mu}_{x|y} = \boldsymbol{\mu}_x + \boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}(\boldsymbol{y} - \boldsymbol{\mu}_y)$
$\boldsymbol{\Sigma}_{x|y} = \boldsymbol{\Sigma}_{xx} - \boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx}$
• The marginal distribution $p(\boldsymbol{x})$ of a joint Gaussian distribution $p(\boldsymbol{x}, \boldsymbol{y})$ is itself Gaussian and computed by applying the sum rule:
$p(\boldsymbol{x}) = \int p(\boldsymbol{x}, \boldsymbol{y})\, d\boldsymbol{y} = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}_x, \boldsymbol{\Sigma}_{xx})$
• Consider the bivariate Gaussian distribution
$p(x_1, x_2) = \mathcal{N}\!\left(\begin{bmatrix} 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 0.3 & -1 \\ -1 & 5 \end{bmatrix}\right)$
• We can compute the parameters of the univariate Gaussian, conditioned on $x_2 = -1$, to obtain the mean and variance respectively. Numerically, these are
$\mu_{x_1 \mid x_2} = 0 + (-1) \cdot \tfrac{1}{5} \cdot (-1 - 2) = 0.6$ and $\sigma^2_{x_1 \mid x_2} = 0.3 - (-1) \cdot \tfrac{1}{5} \cdot (-1) = 0.1$
so $p(x_1 \mid x_2 = -1) = \mathcal{N}(0.6, 0.1)$.
• The marginal distribution $p(x_1)$ can be obtained via
$p(\boldsymbol{x}) = \int p(\boldsymbol{x}, \boldsymbol{y})\, d\boldsymbol{y} = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}_x, \boldsymbol{\Sigma}_{xx})$
which essentially reads off the mean and variance of the random variable $x_1$. So we have
$p(x_1) = \mathcal{N}(0, 0.3)$
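A short check of this example using the conditioning formulas from the previous slide:

```python
import numpy as np

# mu_{x|y} = mu_x + S_xy S_yy^{-1} (y - mu_y)
# S_{x|y}  = S_xx - S_xy S_yy^{-1} S_yx
mu = np.array([0.0, 2.0])
Sigma = np.array([[0.3, -1.0],
                  [-1.0, 5.0]])
x2 = -1.0

mu_cond = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])
var_cond = Sigma[0, 0] - Sigma[0, 1] / Sigma[1, 1] * Sigma[1, 0]

print(mu_cond, var_cond)    # 0.6 0.1 -> p(x1 | x2 = -1) = N(0.6, 0.1)
print(mu[0], Sigma[0, 0])   # 0.0 0.3 -> marginal p(x1) = N(0, 0.3)
```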
Check your understanding
• Covariance (correlation) of univariate random variables has two directions, i.e., negative and positive values have different meanings, while variance does not.
• When the variance of the sum of two random variables equals the sum of their individual variances (i.e., their covariance is zero), the two variables are statistically independent.
• The empirical mean $\bar{\boldsymbol{x}} = \frac{1}{N}\sum_{n=1}^{N} \boldsymbol{x}_n$ approximates the expected value $\int_{\mathcal{X}} \boldsymbol{x}\, p(\boldsymbol{x})\, d\boldsymbol{x}$ of a random variable when $N$ is large.
• In the figure (cumulative distribution function, cdf), which Gaussian has the largest variance?
• Green? Blue? Red? Yellow?
• What is the mean of the red Gaussian?
• The cdf tends to 1 (as $x \to \infty$).
• The cdf starts from 0 (as $x \to -\infty$).
• The cdf always increases.
https://en.wikipedia.org/wiki/Normal_distribution