
Probability and Distributions
Liang National University

[Cover figure: Paint Transformer: Feed Forward Neural Painting with Stroke Prediction, ICCV 2021]


6.2.1 Discrete Probabilities
• When the target space is discrete, we can imagine the probability distribution of multiple random variables as filling out a (multidimensional) array of numbers.
• We define the joint probability as the entry of both values jointly:

$$P(X = x_i, Y = y_j) = \frac{n_{ij}}{N}$$

where $n_{ij}$ is the number of events with state $x_i$ and $y_j$, and $N$ is the total number of events.
• The probability that 𝑋 = 𝑥 and 𝑌 = 𝑦 is written as 𝑝(𝑥, 𝑦).

6.2.1 Discrete Probabilities
• The marginal probability that 𝑋 = 𝑥 irrespective of the value of 𝑌 is written as 𝑝(𝑥).
• We write 𝑋 ~ 𝑝(𝑥) to denote that the random variable 𝑋 is distributed according to 𝑝(𝑥).
• If we consider only the instances where 𝑋 = 𝑥, then the fraction of instances (conditional probability) for which 𝑌 = 𝑦 is written as 𝑝 𝑦|𝑥 .

Example

• 𝑋: AD scoring; 𝑌: LBJ scoring.
• 𝑋 has two possible states and 𝑌 has two possible states (fewer than 30 points, or at least 30 points).
• We use $n_{ij}$ to denote the number of events with state $X = x_i$ and $Y = y_j$:

              AD < 30    AD ≥ 30
   LBJ ≥ 30      13          6
   LBJ < 30       4         17

• Total number of events: $N = 13 + 6 + 4 + 17 = 40$.
• The value $c_i$ is the event sum of the $i$th column, i.e., $c_i = \sum_j n_{ij}$; $r_j$ is the row sum, i.e., $r_j = \sum_i n_{ij}$.
• The probability distribution of each random variable, the marginal probability, can be seen as the sum over a row or column:

$$P(X = x_i) = \frac{c_i}{N} = \frac{\sum_j n_{ij}}{N}, \qquad P(Y = y_j) = \frac{r_j}{N} = \frac{\sum_i n_{ij}}{N}$$

• $P(\text{LBJ scores at least 30 pts}) = \dfrac{13 + 6}{40} = \dfrac{19}{40}$
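As a quick numerical companion, the marginals and a conditional can be read directly off the count table. A minimal sketch, assuming NumPy is available; the array layout mirrors the table above:

```python
import numpy as np

# Count table n_ij: rows are LBJ >= 30 / LBJ < 30, columns are AD < 30 / AD >= 30
counts = np.array([[13, 6],
                   [4, 17]])
N = counts.sum()                 # total number of events: 40

P_AD  = counts.sum(axis=0) / N   # marginal over columns: [17/40, 23/40]
P_LBJ = counts.sum(axis=1) / N   # marginal over rows:    [19/40, 21/40]

print(P_LBJ[0])                           # P(LBJ >= 30) = 19/40 = 0.475
print(counts[1, 1] / counts[:, 1].sum())  # P(LBJ < 30 | AD >= 30) = 17/23
```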

Example
• For discrete random variables with a finite number of events, we assume that probabilities sum up to one, that is

$$\sum_i P(X = x_i) = 1 \quad \text{and} \quad \sum_j P(Y = y_j) = 1$$

• The conditional probability is the fraction of a row or column in a particular cell. For example, the conditional probability of 𝑌 given 𝑋 is

$$P(Y = y_j \mid X = x_i) = \frac{n_{ij}}{c_i}$$

• and the conditional probability of 𝑋 given 𝑌 is

$$P(X = x_i \mid Y = y_j) = \frac{n_{ij}}{r_j}$$

• $P(\text{LBJ} < 30 \mid \text{AD} \ge 30) = \dfrac{17}{6 + 17} = \dfrac{17}{23}$

6.2.2 Continuous Probabilities

• A function $f: \mathbb{R}^D \to \mathbb{R}$ is called a probability density function (pdf) if
  • $\forall \boldsymbol{x} \in \mathbb{R}^D: f(\boldsymbol{x}) \ge 0$, and
  • its integral exists and $\displaystyle\int_{\mathbb{R}^D} f(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x} = 1$.
• Observe that a probability density function is any function $f$ that is non-negative and integrates to one. We associate a random variable $X$ with this function $f$ by

$$P(a \le X \le b) = \int_a^b f(x)\,\mathrm{d}x$$

where $a, b \in \mathbb{R}$ and $x \in \mathbb{R}$ are outcomes of the continuous random variable $X$. This association is called the distribution of the random variable $X$.
• Note: the probability of a continuous random variable $X$ taking a particular value, $P(X = x)$, is zero. This corresponds to specifying an interval where $a = b$.

6.2.2 Continuous Probabilities

• A cumulative distribution function (cdf) of a multivariate real-valued random variable $X$ with states $\boldsymbol{x} \in \mathbb{R}^D$ is given by

$$F_X(\boldsymbol{x}) = P(X_1 \le x_1, \ldots, X_D \le x_D)$$

where $X = [X_1, \ldots, X_D]^\top$, $\boldsymbol{x} = [x_1, \ldots, x_D]^\top$, and the right-hand side represents the probability that random variable $X_i$ takes a value smaller than or equal to $x_i$.
• The cdf can also be expressed as the integral of the probability density function $f(\boldsymbol{x})$, so that

$$F_X(\boldsymbol{x}) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_D} f(z_1, \ldots, z_D)\,\mathrm{d}z_1 \cdots \mathrm{d}z_D$$

6.2.3 Contrasting Discrete and Continuous Distributions

• Let $Z$ be a discrete uniform random variable with three states $\{z = -1.1,\ z = 0.3,\ z = 1.5\}$. The probability mass function can be represented as a table of probability values.
• States can be located on the $x$-axis; the $y$-axis represents the probability of a particular state.
• Uniform distribution (discrete): a finite number of values are equally likely to be observed; each of $n$ values has equal probability $1/n$.

6.2.3 Contrasting Discrete and Continuous Distributions

• Let $X$ be a continuous random variable taking values in the range $0.9 \le X \le 1.6$.
• Observe that the height of the density can be greater than 1. However, it needs to hold that

$$\int_{0.9}^{1.6} p(x)\,\mathrm{d}x = 1$$

• Uniform distribution (continuous), denoted $U(a, b)$:

$$f(x) = \begin{cases} \dfrac{1}{b - a} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$

6.3 Sum Rule, Product Rule, and Bayes' Theorem

• $p(\boldsymbol{x}, \boldsymbol{y})$ is the joint distribution of the two random variables $\boldsymbol{x}, \boldsymbol{y}$.
• $p(\boldsymbol{x})$ and $p(\boldsymbol{y})$ are the marginal distributions.
• $p(\boldsymbol{y} \mid \boldsymbol{x})$ is the conditional distribution of $\boldsymbol{y}$ given $\boldsymbol{x}$.
• The sum rule states that

$$p(\boldsymbol{x}) = \begin{cases} \displaystyle\sum_{\boldsymbol{y} \in \mathcal{Y}} p(\boldsymbol{x}, \boldsymbol{y}) & \text{if } \boldsymbol{y} \text{ is discrete} \\[2mm] \displaystyle\int_{\mathcal{Y}} p(\boldsymbol{x}, \boldsymbol{y})\,\mathrm{d}\boldsymbol{y} & \text{if } \boldsymbol{y} \text{ is continuous} \end{cases}$$

where $\mathcal{Y}$ denotes the states of the target space of random variable $Y$.
• We sum out (or integrate out) the set of states $\boldsymbol{y}$ of the random variable $Y$.
• The sum rule is also known as the marginalization property.
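Returning to the continuous uniform example above: a minimal sketch, assuming SciPy is available, confirming that the $U(0.9, 1.6)$ density integrates to one even though its height $1/(b-a) \approx 1.43$ exceeds 1, and that interval probabilities come from the cdf:

```python
from scipy.integrate import quad
from scipy.stats import uniform

a, b = 0.9, 1.6
# SciPy's uniform is parameterized by loc (start) and scale (width)
U = uniform(loc=a, scale=b - a)

total, _ = quad(U.pdf, a, b)
print(total)                    # ~1.0, although U.pdf(1.0) = 1/0.7 ≈ 1.43 > 1

# P(1.0 <= X <= 1.3) via the cdf: F(1.3) - F(1.0)
print(U.cdf(1.3) - U.cdf(1.0))  # = 0.3/0.7 ≈ 0.4286
```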
• If $\boldsymbol{x} = [x_1, \ldots, x_D]^\top$, we obtain the marginal

$$p(x_i) = \int p(x_1, \ldots, x_D)\,\mathrm{d}\boldsymbol{x}_{\setminus i}$$

by repeated application of the sum rule, where we integrate/sum out all random variables except $x_i$; the subscript $\setminus i$ reads "all except $i$."

Example

• Consider the two-dimensional Gaussian mixture

$$p(\boldsymbol{x}) = 0.4\,\mathcal{N}\!\left(\boldsymbol{x}\,\middle|\,\begin{bmatrix}10\\2\end{bmatrix}, \begin{bmatrix}1 & 0\\0 & 1\end{bmatrix}\right) + 0.6\,\mathcal{N}\!\left(\boldsymbol{x}\,\middle|\,\begin{bmatrix}0\\0\end{bmatrix}, \begin{bmatrix}8.4 & 2.0\\2.0 & 1.7\end{bmatrix}\right)$$

• The distribution is bimodal (has two peaks).
• One of its marginal distributions is unimodal (has one peak).

6.3 Sum Rule, Product Rule, and Bayes' Theorem

• The product rule relates the joint distribution to the conditional distribution:

$$p(\boldsymbol{x}, \boldsymbol{y}) = p(\boldsymbol{y} \mid \boldsymbol{x})\,p(\boldsymbol{x})$$

• Every joint distribution of two random variables can be factorized (written as a product) of two other distributions.
• The product rule also implies

$$p(\boldsymbol{x}, \boldsymbol{y}) = p(\boldsymbol{x} \mid \boldsymbol{y})\,p(\boldsymbol{y})$$

6.3 Sum Rule, Product Rule, and Bayes' Theorem

• Let us assume we have some prior knowledge $p(\boldsymbol{x})$ about an unobserved random variable $\boldsymbol{x}$ and some relationship $p(\boldsymbol{y} \mid \boldsymbol{x})$ between $\boldsymbol{x}$ and a second random variable $\boldsymbol{y}$, which we can observe. If we observe $\boldsymbol{y}$, we can use Bayes' theorem (also Bayes' rule or Bayes' law) to draw conclusions about $\boldsymbol{x}$ given the observed values of $\boldsymbol{y}$:

$$\underbrace{p(\boldsymbol{x} \mid \boldsymbol{y})}_{\text{posterior}} = \frac{\overbrace{p(\boldsymbol{y} \mid \boldsymbol{x})}^{\text{likelihood}}\ \overbrace{p(\boldsymbol{x})}^{\text{prior}}}{\underbrace{p(\boldsymbol{y})}_{\text{evidence}}}$$

• Bayes' theorem is a direct consequence of the product rule, since

$$p(\boldsymbol{x}, \boldsymbol{y}) = p(\boldsymbol{x} \mid \boldsymbol{y})\,p(\boldsymbol{y}) \qquad \text{and} \qquad p(\boldsymbol{x}, \boldsymbol{y}) = p(\boldsymbol{y} \mid \boldsymbol{x})\,p(\boldsymbol{x})$$

so that

$$p(\boldsymbol{x} \mid \boldsymbol{y})\,p(\boldsymbol{y}) = p(\boldsymbol{y} \mid \boldsymbol{x})\,p(\boldsymbol{x}) \iff p(\boldsymbol{x} \mid \boldsymbol{y}) = \frac{p(\boldsymbol{y} \mid \boldsymbol{x})\,p(\boldsymbol{x})}{p(\boldsymbol{y})}$$

Example

• Applying Bayes' theorem to the AD/LBJ table:

$$p(\text{LBJ} < 30 \mid \text{AD} \ge 30) = \frac{p(\text{AD} \ge 30 \mid \text{LBJ} < 30)\,p(\text{LBJ} < 30)}{p(\text{AD} \ge 30)} = \frac{\frac{17}{21}\cdot\frac{21}{40}}{\frac{23}{40}} = \frac{17}{23}$$

6.3 Sum Rule, Product Rule, and Bayes' Theorem

• $p(\boldsymbol{x})$ is the prior, which encapsulates our subjective prior knowledge of the unobserved (latent) variable $\boldsymbol{x}$ before observing any data.
• $p(\boldsymbol{y} \mid \boldsymbol{x})$, the likelihood, describes how $\boldsymbol{x}$ and $\boldsymbol{y}$ are related.
• $p(\boldsymbol{y} \mid \boldsymbol{x})$ is the probability of the data $\boldsymbol{y}$ if we were to know the latent variable $\boldsymbol{x}$.
• We call $p(\boldsymbol{y} \mid \boldsymbol{x})$ either the "likelihood of $\boldsymbol{x}$ (given $\boldsymbol{y}$)" or the "probability of $\boldsymbol{y}$ given $\boldsymbol{x}$" ($\boldsymbol{y}$ is observed; $\boldsymbol{x}$ is latent).
• $p(\boldsymbol{x} \mid \boldsymbol{y})$, the posterior, is the quantity of interest in Bayesian statistics because it expresses exactly what we are interested in, i.e., what we know about $\boldsymbol{x}$ after having observed $\boldsymbol{y}$ (e.g., in linear regression or Gaussian mixture models). For linear regression, the likelihood takes the form $p(y \mid \boldsymbol{x}, \boldsymbol{\theta})$.
• The quantity

$$p(\boldsymbol{y}) := \int p(\boldsymbol{y} \mid \boldsymbol{x})\,p(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x} = \mathbb{E}_X[p(\boldsymbol{y} \mid \boldsymbol{x})]$$

is the marginal likelihood/evidence. The marginal likelihood integrates the numerator of Bayes' theorem with respect to the latent variable $\boldsymbol{x}$.
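To make the Bayes' theorem example concrete, here is a minimal sketch (plain Python; the numbers come from the AD/LBJ count table above) that reproduces the posterior from prior, likelihood, and evidence:

```python
# Counts: rows LBJ >= 30 / LBJ < 30, columns AD < 30 / AD >= 30, N = 40
prior      = 21 / 40   # p(LBJ < 30), row sum (4 + 17) / N
likelihood = 17 / 21   # p(AD >= 30 | LBJ < 30)
evidence   = 23 / 40   # p(AD >= 30), column sum (6 + 17) / N

posterior = likelihood * prior / evidence
print(posterior)       # 17/23 ≈ 0.7391 = p(LBJ < 30 | AD >= 30)
```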
6.4 Summary Statistics and Independence
6.4.1 Means and Covariances

• The expected value of a function $g: \mathbb{R} \to \mathbb{R}$ of a univariate continuous random variable $X \sim p(x)$ is given by

$$\mathbb{E}_X[g(x)] = \int_{\mathcal{X}} g(x)\,p(x)\,\mathrm{d}x$$

• Correspondingly, the expected value of a function $g$ of a discrete random variable $X \sim p(x)$ is given by

$$\mathbb{E}_X[g(x)] = \sum_{x \in \mathcal{X}} g(x)\,p(x)$$

where $\mathcal{X}$ is the set of possible outcomes (the target space) of the random variable $X$.
• We consider a multivariate random variable $X$ as a finite vector of univariate random variables $[X_1, \ldots, X_D]^\top$. For multivariate random variables, we define the expected value element-wise:

$$\mathbb{E}_X[g(\boldsymbol{x})] = \begin{bmatrix} \mathbb{E}_{X_1}[g(x_1)] \\ \vdots \\ \mathbb{E}_{X_D}[g(x_D)] \end{bmatrix} \in \mathbb{R}^D$$

where the subscript $\mathbb{E}_{X_d}$ indicates that we take the expected value with respect to the $d$th element of the vector $\boldsymbol{x}$.

6.4.1 Means and Covariances

• The mean of a random variable $X$ with states $\boldsymbol{x} \in \mathbb{R}^D$ is an average and is defined as

$$\mathbb{E}_X[\boldsymbol{x}] = \begin{bmatrix} \mathbb{E}_{X_1}[x_1] \\ \vdots \\ \mathbb{E}_{X_D}[x_D] \end{bmatrix} \in \mathbb{R}^D, \qquad \mathbb{E}_{X_d}[x_d] := \begin{cases} \displaystyle\int_{\mathcal{X}} x_d\,p(x_d)\,\mathrm{d}x_d & \text{if } X \text{ is a continuous random variable} \\[2mm] \displaystyle\sum_{x_i \in \mathcal{X}} x_i\,p(x_d = x_i) & \text{if } X \text{ is a discrete random variable} \end{cases}$$

for $d = 1, \ldots, D$, where the subscript $d$ indicates the corresponding dimension of $\boldsymbol{x}$. The integral and sum are over the states $\mathcal{X}$ of the target space of the random variable $X$.

6.4.1 Means and Covariances

• The expected value is a linear operator. For example, given a real-valued function $f(\boldsymbol{x}) = a\,g(\boldsymbol{x}) + b\,h(\boldsymbol{x})$, where $a, b \in \mathbb{R}$ and $\boldsymbol{x} \in \mathbb{R}^D$, we obtain

$$\mathbb{E}_X[f(\boldsymbol{x})] = \int f(\boldsymbol{x})\,p(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x} = \int [a\,g(\boldsymbol{x}) + b\,h(\boldsymbol{x})]\,p(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x} = a\int g(\boldsymbol{x})\,p(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x} + b\int h(\boldsymbol{x})\,p(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x} = a\,\mathbb{E}_X[g(\boldsymbol{x})] + b\,\mathbb{E}_X[h(\boldsymbol{x})]$$

[Figure: An End-to-End Transformer Model for 3D Object Detection, ICCV 2021]

6.4.1 Means and Covariances

• The covariance between two univariate random variables $X, Y \in \mathbb{R}$ is given by the expected product of their deviations from their respective means, i.e.,

$$\mathrm{Cov}_{X,Y}[x, y] := \mathbb{E}_{X,Y}\big[(x - \mathbb{E}_X[x])(y - \mathbb{E}_Y[y])\big]$$

• By the linearity of expectations, it can be rewritten as the expected value of the product minus the product of the expected values, i.e.,

$$\mathrm{Cov}[x, y] = \mathbb{E}[xy] - \mathbb{E}[x]\,\mathbb{E}[y]$$

• The covariance of a variable with itself, $\mathrm{Cov}[x, x]$, is called the variance and is denoted by $\mathbb{V}_X[x]$.
• The square root of the variance is called the standard deviation and is often denoted by $\sigma(x)$.
• If we consider two multivariate random variables $X$ and $Y$ with states $\boldsymbol{x} \in \mathbb{R}^D$ and $\boldsymbol{y} \in \mathbb{R}^E$, respectively, the covariance between $X$ and $Y$ is defined as

$$\mathrm{Cov}[\boldsymbol{x}, \boldsymbol{y}] = \mathbb{E}[\boldsymbol{x}\boldsymbol{y}^\top] - \mathbb{E}[\boldsymbol{x}]\,\mathbb{E}[\boldsymbol{y}]^\top = \mathrm{Cov}[\boldsymbol{y}, \boldsymbol{x}]^\top \in \mathbb{R}^{D \times E}$$

6.4.1 Means and Covariances

• The variance of a random variable $X$ with states $\boldsymbol{x} \in \mathbb{R}^D$ and a mean vector $\boldsymbol{\mu} \in \mathbb{R}^D$ is defined as

$$\mathbb{V}_X[\boldsymbol{x}] = \mathrm{Cov}_X[\boldsymbol{x}, \boldsymbol{x}] = \mathbb{E}_X[(\boldsymbol{x} - \boldsymbol{\mu})(\boldsymbol{x} - \boldsymbol{\mu})^\top] = \mathbb{E}_X[\boldsymbol{x}\boldsymbol{x}^\top] - \mathbb{E}_X[\boldsymbol{x}]\,\mathbb{E}_X[\boldsymbol{x}]^\top$$

$$= \begin{bmatrix} \mathrm{Cov}[x_1, x_1] & \mathrm{Cov}[x_1, x_2] & \cdots & \mathrm{Cov}[x_1, x_D] \\ \mathrm{Cov}[x_2, x_1] & \mathrm{Cov}[x_2, x_2] & \cdots & \mathrm{Cov}[x_2, x_D] \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}[x_D, x_1] & \cdots & \cdots & \mathrm{Cov}[x_D, x_D] \end{bmatrix}$$
• This 𝐷 × 𝐷 matrix is called the covariance matrix of the multivariate random variable 𝑋. The covariance matrix is symmetric and positive semidefinite and tells us something about the spread of the data.
• On its diagonal, the covariance matrix contains the variances of the $x_i$.
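As a sanity check, the two expressions for the covariance matrix agree on data. A minimal NumPy sketch with synthetic samples (illustrative values only):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 2))   # 1000 observations of a 2-D random variable
mu = X.mean(axis=0)              # empirical mean vector

D = X - mu
cov_centered = D.T @ D / len(X)                # E[(x - mu)(x - mu)^T]
cov_raw = X.T @ X / len(X) - np.outer(mu, mu)  # E[x x^T] - mu mu^T

print(np.allclose(cov_centered, cov_raw))      # True
print(np.diag(cov_centered))                   # variances of x_1, x_2 on the diagonal
```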

Example – Computation of covariance matrix

• Consider $N = 2$ observations of a three-dimensional random variable, collected as the columns of

$$X = [\boldsymbol{x}_1, \boldsymbol{x}_2] = \begin{bmatrix} 1 & 3 \\ 2 & 1 \\ 3 & 1 \end{bmatrix} \begin{matrix} x_1 \\ x_2 \\ x_3 \end{matrix}$$

• Empirical mean: $\boldsymbol{\mu} = \begin{bmatrix} 2 \\ 1.5 \\ 2 \end{bmatrix}$

• Deviations:

$$X - \boldsymbol{\mu} = \begin{bmatrix} -1 & 1 \\ 0.5 & -0.5 \\ 1 & -1 \end{bmatrix}$$

• Empirical covariance matrix:

$$\Sigma = \frac{1}{2}(X - \boldsymbol{\mu})(X - \boldsymbol{\mu})^\top = \frac{1}{2}\begin{bmatrix} 2 & -1 & -2 \\ -1 & 0.5 & 1 \\ -2 & 1 & 2 \end{bmatrix} \begin{matrix} x_1 \\ x_2 \\ x_3 \end{matrix}$$
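The same computation carried out in NumPy (a minimal sketch; the observations are the columns of X and the divisor is N = 2, as above):

```python
import numpy as np

X = np.array([[1.0, 3.0],    # x1
              [2.0, 1.0],    # x2
              [3.0, 1.0]])   # x3: columns are the N = 2 observations

mu = X.mean(axis=1, keepdims=True)   # [[2.0], [1.5], [2.0]]
D = X - mu                           # [[-1, 1], [0.5, -0.5], [1, -1]]
Sigma = D @ D.T / X.shape[1]         # empirical covariance, divisor N = 2

print(Sigma)
# [[ 1.   -0.5  -1.  ]
#  [-0.5   0.25  0.5 ]
#  [-1.    0.5   1.  ]]
```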

6.4.1 Means and Covariances
• The correlation between two random variables 𝑋, 𝑌 is given by
$$\mathrm{corr}[x, y] = \frac{\mathrm{Cov}[x, y]}{\sqrt{\mathbb{V}[x]\,\mathbb{V}[y]}} \in [-1, 1]$$
• The covariance (and correlation) indicate how two random variables are related;
• Positive correlation corr[𝑥, 𝑦] means that when 𝑥 grows, 𝑦 is also expected to grow; negative correlation means that as 𝑥 increases, 𝑦 decreases.
Two-dimensional datasets with identical means and variances along each axis (colored lines) but with different covariances.
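A small NumPy sketch (illustrative synthetic data) showing the correlation formula and its agreement with np.corrcoef:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = -2.0 * x + rng.normal(size=10_000)   # negatively correlated with x

cov = np.cov(x, y)[0, 1]                 # np.cov uses ddof=1 by default
corr = cov / np.sqrt(np.var(x, ddof=1) * np.var(y, ddof=1))

print(corr)                      # near -2/sqrt(5) ≈ -0.894
print(np.corrcoef(x, y)[0, 1])   # same value, computed directly
```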

6.4.2 Empirical Means and Covariances
• In Section 6.4.1 we defined the population mean and covariance, which refer to the true statistics of the population.
• In machine learning, we have a finite dataset of size 𝑁
• The empirical mean vector is the arithmetic average of the observations for each variable, defined as

$$\bar{\boldsymbol{x}} := \frac{1}{N}\sum_{n=1}^{N} \boldsymbol{x}_n$$

where $\boldsymbol{x}_n \in \mathbb{R}^D$.
• The empirical covariance matrix is a $D \times D$ matrix

$$\Sigma := \frac{1}{N}\sum_{n=1}^{N} (\boldsymbol{x}_n - \bar{\boldsymbol{x}})(\boldsymbol{x}_n - \bar{\boldsymbol{x}})^\top$$

• To compute the statistics for a particular dataset, we use the observations $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_N$ together with the two equations above.

6.4.3 Three Expressions for the Variance

• The standard definition of variance is the expectation of the squared deviation of a random variable $X$ from its expected value $\mu$, i.e.,

$$\mathbb{V}_X[x] := \mathbb{E}_X[(x - \mu)^2]$$
• This is equivalent to the mean of a new random variable $Z := (X - \mu)^2$.
• We use a two-pass algorithm: one pass through the data to calculate the mean $\mu$, and then a second pass using this estimate $\hat{\mu}$ to calculate the variance.
• It can be converted to the so-called raw-score formula for variance:

$$\mathbb{V}_X[x] = \mathbb{E}_X[x^2] - (\mathbb{E}_X[x])^2$$
• This is the mean of the square minus the square of the mean, and it can be calculated empirically in one pass.
• A third way to understand the variance is as a sum of pairwise differences between all pairs of observations. Consider a sample $x_1, \ldots, x_N$ of realizations of random variable $X$, and compute the squared difference between pairs $x_i$ and $x_j$. By expanding the square, we can show that the average of the $N^2$ pairwise squared differences is twice the empirical variance of the observations:

$$\frac{1}{N^2}\sum_{i,j=1}^{N}(x_i - x_j)^2 = 2\left[\frac{1}{N}\sum_{i=1}^{N} x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{N} x_i\right)^2\right]$$
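A minimal sketch (assuming NumPy; the data are synthetic) checking that the three expressions — two-pass definition, one-pass raw-score formula, and pairwise differences — agree numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.0, size=2_000)
N = len(x)

mu = x.mean()
v_two_pass = ((x - mu) ** 2).mean()    # E[(x - mu)^2], second pass over the data
v_raw      = (x ** 2).mean() - mu**2   # E[x^2] - (E[x])^2, single pass

# Mean pairwise squared difference is twice the variance, so halve it
v_pairwise = ((x[:, None] - x[None, :]) ** 2).sum() / N**2 / 2

print(v_two_pass, v_raw, v_pairwise)   # all (essentially) equal
```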
6.4.4 Sums and Transformations of Random Variables

• Consider two random variables $X, Y$ with states $\boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^D$. Then:
  • $\mathbb{E}[\boldsymbol{x} + \boldsymbol{y}] = \mathbb{E}[\boldsymbol{x}] + \mathbb{E}[\boldsymbol{y}]$
  • $\mathbb{E}[\boldsymbol{x} - \boldsymbol{y}] = \mathbb{E}[\boldsymbol{x}] - \mathbb{E}[\boldsymbol{y}]$
  • $\mathbb{V}[\boldsymbol{x} + \boldsymbol{y}] = \mathbb{V}[\boldsymbol{x}] + \mathbb{V}[\boldsymbol{y}] + \mathrm{Cov}[\boldsymbol{x}, \boldsymbol{y}] + \mathrm{Cov}[\boldsymbol{y}, \boldsymbol{x}]$
  • $\mathbb{V}[\boldsymbol{x} - \boldsymbol{y}] = \mathbb{V}[\boldsymbol{x}] + \mathbb{V}[\boldsymbol{y}] - \mathrm{Cov}[\boldsymbol{x}, \boldsymbol{y}] - \mathrm{Cov}[\boldsymbol{y}, \boldsymbol{x}]$
• Mean and (co)variance have useful properties under affine transformation of random variables. Consider a random variable $X$ with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$, and an affine transformation $\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x} + \boldsymbol{b}$ of $\boldsymbol{x}$. Then $\boldsymbol{y}$ is itself a random variable whose mean vector and covariance matrix are given by
  • $\mathbb{E}_Y[\boldsymbol{y}] = \mathbb{E}_X[\boldsymbol{A}\boldsymbol{x} + \boldsymbol{b}] = \boldsymbol{A}\,\mathbb{E}_X[\boldsymbol{x}] + \boldsymbol{b} = \boldsymbol{A}\boldsymbol{\mu} + \boldsymbol{b}$
  • $\mathbb{V}_Y[\boldsymbol{y}] = \mathbb{V}_X[\boldsymbol{A}\boldsymbol{x} + \boldsymbol{b}] = \mathbb{V}_X[\boldsymbol{A}\boldsymbol{x}] = \boldsymbol{A}\,\mathbb{V}_X[\boldsymbol{x}]\,\boldsymbol{A}^\top = \boldsymbol{A}\boldsymbol{\Sigma}\boldsymbol{A}^\top$
• Furthermore,

$$\mathrm{Cov}[\boldsymbol{x}, \boldsymbol{y}] = \mathbb{E}[\boldsymbol{x}(\boldsymbol{A}\boldsymbol{x} + \boldsymbol{b})^\top] - \mathbb{E}[\boldsymbol{x}]\,\mathbb{E}[\boldsymbol{A}\boldsymbol{x} + \boldsymbol{b}]^\top = \mathbb{E}[\boldsymbol{x}]\boldsymbol{b}^\top + \mathbb{E}[\boldsymbol{x}\boldsymbol{x}^\top]\boldsymbol{A}^\top - \boldsymbol{\mu}\boldsymbol{b}^\top - \boldsymbol{\mu}\boldsymbol{\mu}^\top\boldsymbol{A}^\top = \boldsymbol{\mu}\boldsymbol{b}^\top - \boldsymbol{\mu}\boldsymbol{b}^\top + (\mathbb{E}[\boldsymbol{x}\boldsymbol{x}^\top] - \boldsymbol{\mu}\boldsymbol{\mu}^\top)\boldsymbol{A}^\top = \boldsymbol{\Sigma}\boldsymbol{A}^\top$$

where $\boldsymbol{\Sigma} = \mathbb{E}[\boldsymbol{x}\boldsymbol{x}^\top] - \boldsymbol{\mu}\boldsymbol{\mu}^\top$ is the covariance of $X$.

6.4.5 Statistical Independence

• Two random variables $X, Y$ are statistically independent if and only if

$$p(\boldsymbol{x}, \boldsymbol{y}) = p(\boldsymbol{x})\,p(\boldsymbol{y})$$

• Intuitively, two random variables $X$ and $Y$ are independent if the value of $\boldsymbol{y}$ (once known) does not add any additional information about $\boldsymbol{x}$ (and vice versa). If $X, Y$ are (statistically) independent, then
  • $p(\boldsymbol{y} \mid \boldsymbol{x}) = p(\boldsymbol{y})$
  • $p(\boldsymbol{x} \mid \boldsymbol{y}) = p(\boldsymbol{x})$
  • $\mathrm{Cov}_{X,Y}[\boldsymbol{x}, \boldsymbol{y}] = \boldsymbol{0}$
  • $\mathbb{V}_{X,Y}[\boldsymbol{x} + \boldsymbol{y}] = \mathbb{V}_X[\boldsymbol{x}] + \mathbb{V}_Y[\boldsymbol{y}]$
• The converse of the covariance statement does not hold: two random variables can have covariance zero yet not be statistically independent.
• Covariance measures only linear dependence; random variables that are nonlinearly dependent can have covariance zero.
• Example: consider a random variable $X$ with zero mean ($\mathbb{E}_X[x] = 0$) and also $\mathbb{E}_X[x^3] = 0$. Let $y = x^2$ (hence, $Y$ is dependent on $X$) and consider the covariance between $X$ and $Y$. This gives

$$\mathrm{Cov}[x, y] = \mathbb{E}[xy] - \mathbb{E}[x]\,\mathbb{E}[y] = \mathbb{E}[x^3] = 0$$
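Two quick Monte Carlo checks of the claims above (a sketch, assuming NumPy; A, b, and the distributions are illustrative choices): the affine rules E[Ax + b] = Aμ + b and V[Ax + b] = AΣAᵀ, and the zero-covariance-without-independence example y = x²:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Affine transformation: E[Ax + b] = A mu + b, V[Ax + b] = A Sigma A^T ---
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = rng.multivariate_normal(mu, Sigma, size=1_000_000)
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
b = np.array([0.5, -1.0])
y = x @ A.T + b                          # y = Ax + b, applied per sample

print(y.mean(axis=0), A @ mu + b)        # empirical mean ~ analytic mean
print(np.cov(y.T, bias=True))            # empirical covariance ...
print(A @ Sigma @ A.T)                   # ... ~ A Sigma A^T

# --- Zero covariance without independence: y = x^2 for symmetric x ---
x = rng.standard_normal(1_000_000)       # E[x] = 0 and E[x^3] = 0
y = x**2                                 # y is a deterministic function of x
print(np.cov(x, y, bias=True)[0, 1])     # ~0, yet x and y are clearly dependent
```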

6.5 Gaussian Distribution
• The Gaussian distribution is the most well-studied probability distribution for continuous-valued random variables.
• It is also referred to as the normal distribution.
• For a univariate random variable, the Gaussian distribution has a density that is given by

$$p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

• The multivariate Gaussian distribution is fully characterized by a mean vector $\boldsymbol{\mu}$ and a covariance matrix $\boldsymbol{\Sigma}$ and is defined as

$$p(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = (2\pi)^{-\frac{D}{2}}\,|\boldsymbol{\Sigma}|^{-\frac{1}{2}}\exp\!\left(-\tfrac{1}{2}(\boldsymbol{x} - \boldsymbol{\mu})^\top\boldsymbol{\Sigma}^{-1}(\boldsymbol{x} - \boldsymbol{\mu})\right)$$

where $\boldsymbol{x} \in \mathbb{R}^D$. We write $p(\boldsymbol{x}) = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})$ or $X \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$.
• The special case of the Gaussian with zero mean and identity covariance, that is, $\boldsymbol{\mu} = \boldsymbol{0}$ and $\boldsymbol{\Sigma} = \boldsymbol{I}$, is referred to as the standard normal distribution.
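A minimal sketch comparing a hand-rolled univariate Gaussian density against scipy.stats.norm (note that SciPy parameterizes by the standard deviation, not the variance):

```python
import numpy as np
from scipy.stats import norm

def gauss_pdf(x, mu, sigma2):
    """Univariate Gaussian density with mean mu and variance sigma2."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

x = np.linspace(-4, 4, 9)
# scale is the standard deviation, hence sqrt of the variance
print(np.allclose(gauss_pdf(x, 0.0, 1.0), norm(loc=0.0, scale=1.0).pdf(x)))  # True
```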

6.5 Gaussian Distribution
• Figure below shows a univariate Gaussian and a bivariate Gaussian with corresponding samples.

Spherical Gaussian
• General probability density function:

$$p(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = (2\pi)^{-\frac{D}{2}}\,|\boldsymbol{\Sigma}|^{-\frac{1}{2}}\exp\!\left(-\tfrac{1}{2}(\boldsymbol{x} - \boldsymbol{\mu})^\top\boldsymbol{\Sigma}^{-1}(\boldsymbol{x} - \boldsymbol{\mu})\right)$$

• Spherical Gaussian:

$$p(\boldsymbol{x} \mid \boldsymbol{\mu}, \sigma^2) = (2\pi\sigma^2)^{-D/2}\exp\!\left(-\frac{1}{2\sigma^2}\|\boldsymbol{x} - \boldsymbol{\mu}\|^2\right), \qquad \boldsymbol{\mu} \in \mathbb{R}^D,\ \sigma \in \mathbb{R}$$

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{bmatrix} \quad \text{(diagonal covariance, equal variances)}$$

[Figure: Spherical Gaussian]

[Figure: Gaussian with diagonal covariance matrix (variances not equal across the different $x_i$)]

[Figure: Gaussian with full covariance matrix]
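The three covariance structures in the figures can be contrasted by sampling; a minimal NumPy sketch with illustrative parameters (the empirical covariance of the samples should approximately recover each Σ):

```python
import numpy as np

rng = np.random.default_rng(7)
mu = np.zeros(2)

covs = {
    "spherical": 0.5 * np.eye(2),                     # sigma^2 I: equal variances
    "diagonal":  np.diag([0.5, 2.0]),                 # axis-aligned, unequal variances
    "full":      np.array([[1.0, 0.8], [0.8, 1.0]]),  # correlated dimensions
}

for name, Sigma in covs.items():
    samples = rng.multivariate_normal(mu, Sigma, size=5_000)
    print(name, np.cov(samples.T).round(2))           # empirical cov ≈ Sigma
```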

6.5.1 Marginals and Conditionals of Gaussians are Gaussians
• Let 𝑋 and 𝑌 be two multivariate random variables that may have different dimensions.
• We write the Gaussian distribution in terms of the concatenated states $[\boldsymbol{x}; \boldsymbol{y}]$:

$$p(\boldsymbol{x}, \boldsymbol{y}) = \mathcal{N}\!\left(\begin{bmatrix} \boldsymbol{\mu}_x \\ \boldsymbol{\mu}_y \end{bmatrix}, \begin{bmatrix} \boldsymbol{\Sigma}_{xx} & \boldsymbol{\Sigma}_{xy} \\ \boldsymbol{\Sigma}_{yx} & \boldsymbol{\Sigma}_{yy} \end{bmatrix}\right)$$

where $\boldsymbol{\Sigma}_{xx} = \mathrm{Cov}[\boldsymbol{x}, \boldsymbol{x}]$ and $\boldsymbol{\Sigma}_{yy} = \mathrm{Cov}[\boldsymbol{y}, \boldsymbol{y}]$ are the marginal covariance matrices of $\boldsymbol{x}$ and $\boldsymbol{y}$, respectively, and $\boldsymbol{\Sigma}_{xy} = \mathrm{Cov}[\boldsymbol{x}, \boldsymbol{y}]$ is the cross-covariance matrix between $\boldsymbol{x}$ and $\boldsymbol{y}$.
• The conditional distribution $p(\boldsymbol{x} \mid \boldsymbol{y})$ is also Gaussian and given by

$$p(\boldsymbol{x} \mid \boldsymbol{y}) = \mathcal{N}(\boldsymbol{\mu}_{x \mid y}, \boldsymbol{\Sigma}_{x \mid y})$$
$$\boldsymbol{\mu}_{x \mid y} = \boldsymbol{\mu}_x + \boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}(\boldsymbol{y} - \boldsymbol{\mu}_y)$$
$$\boldsymbol{\Sigma}_{x \mid y} = \boldsymbol{\Sigma}_{xx} - \boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx}$$

• The marginal distribution $p(\boldsymbol{x})$ of a joint Gaussian distribution $p(\boldsymbol{x}, \boldsymbol{y})$ is itself Gaussian and is computed by applying the sum rule:

$$p(\boldsymbol{x}) = \int p(\boldsymbol{x}, \boldsymbol{y})\,\mathrm{d}\boldsymbol{y} = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}_x, \boldsymbol{\Sigma}_{xx})$$

• Consider the bivariate Gaussian distribution

$$p(x_1, x_2) = \mathcal{N}\!\left(\begin{bmatrix} 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 0.3 & -1 \\ -1 & 5 \end{bmatrix}\right)$$

• We can compute the parameters of the univariate Gaussian conditioned on $x_2 = -1$ using the expressions for the conditional mean and variance above. Numerically, this is

$$\mu_{x_1 \mid x_2 = -1} = 0 + (-1)\cdot\tfrac{1}{5}\cdot(-1 - 2) = 0.6, \qquad \sigma^2_{x_1 \mid x_2 = -1} = 0.3 - (-1)\cdot\tfrac{1}{5}\cdot(-1) = 0.1$$

so that $p(x_1 \mid x_2 = -1) = \mathcal{N}(0.6, 0.1)$.
• The marginal distribution $p(x_1)$ can be obtained from

$$p(\boldsymbol{x}) = \int p(\boldsymbol{x}, \boldsymbol{y})\,\mathrm{d}\boldsymbol{y} = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}_x, \boldsymbol{\Sigma}_{xx})$$

which essentially uses the mean and variance of the random variable $x_1$. So we have

$$p(x_1) = \mathcal{N}(0, 0.3)$$
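A minimal sketch (assuming NumPy) that plugs the example's numbers into the general conditioning formulas and recovers μ = 0.6 and σ² = 0.1:

```python
import numpy as np

mu = np.array([0.0, 2.0])
Sigma = np.array([[0.3, -1.0],
                  [-1.0, 5.0]])

x2 = -1.0   # observed value of x2

# mu_{x|y} = mu_x + S_xy S_yy^-1 (y - mu_y)
mu_cond = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])
# S_{x|y} = S_xx - S_xy S_yy^-1 S_yx
var_cond = Sigma[0, 0] - Sigma[0, 1] / Sigma[1, 1] * Sigma[1, 0]

print(mu_cond, var_cond)   # 0.6 0.1
```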

Check your understanding
• Covariance (and correlation) of univariate random variables is signed, i.e., negative and positive values have different meanings, while variance is not.
• When the variance of the sum of two random variables equals the sum of their individual variances, the two variables are statistically independent.
• The empirical mean $\bar{\boldsymbol{x}} = \frac{1}{N}\sum_{n=1}^{N} \boldsymbol{x}_n$ approximates the expected value $\int_{\mathcal{X}} \boldsymbol{x}\,p(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}$ of a random variable when $N$ is large.
• In the figure (cumulative distribution function, cdf), which Gaussian has the largest variance? Green? Blue? Red? Yellow?
• What is the mean of the red Gaussian?
  • The cdf tends to 1 (as 𝑥 → ∞).
  • The cdf starts from 0 (as 𝑥 → −∞).
  • The cdf always increases.

https://en.wikipedia.org/wiki/Normal_distribution
