每日文章(七十三)統計估計和假設檢驗,也可以很簡單(中英文)

Published 2022-02-22 by 倚天聽說 in Agriculture

Daily article 73: Estimation and Hypothesis Testing can be easy

Finally, we come to estimation and hypothesis testing, the two branches of statistical inference. I know some of you find them pretty annoying, so please bear with me in this article. We are not going to walk through all the complicated distributions, equations and concepts (degrees of freedom, test statistics, rejecting a hypothesis, etc.). Once again, we focus on the logic behind them. In real life, we usually can’t collect data on an entire population, so we work with a sample instead. The purpose of both estimation and hypothesis testing is to draw conclusions about population parameters (such as the population mean) from sample statistics (such as the sample mean).

Suppose we draw a large sample from a population and calculate the mean of that sample. For example, we have 100 people in a sample. We calculate the average height of the people in that sample (let’s say 170cm) and we want to find the average height of all people. Usually, we don’t try to pin down a single number but a range within which we believe the average height of all people falls. In the narrow sense, estimation means finding that range. We can certainly come up with a very wide range (let’s say 100cm-200cm), but it is too wide to tell us anything useful. We can also come up with a very narrow range (let’s say 169cm-171cm), but then we are much less sure that it contains the true average. So we usually first choose a degree of confidence (95%, for example), which expresses how certain we want to be about the result. Then we use the sample mean, the standard error and the degree of confidence to calculate the range. The range is centered on the sample mean, and the higher the degree of confidence, the wider it gets.
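To make the calculation concrete, here is a minimal Python sketch of such a range for the 100-person sample. The article gives only the sample size and the sample mean, so the sample standard deviation of 15cm is purely an assumption for illustration, as is the function name `confidence_interval`. The sketch uses the normal approximation, which is reasonable for a large sample.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Return (low, high) for the population mean, using the normal approximation."""
    # Standard error: how much the sample mean varies from sample to sample.
    standard_error = sample_sd / sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Sample from the article: 100 people, mean height 170cm.
# The standard deviation of 15cm is an assumed value, not from the article.
low, high = confidence_interval(170.0, 15.0, 100, confidence=0.95)
print(f"95% range: {low:.2f}cm to {high:.2f}cm")  # 167.06cm to 172.94cm

low99, high99 = confidence_interval(170.0, 15.0, 100, confidence=0.99)
print(f"99% range: {low99:.2f}cm to {high99:.2f}cm")
```

Asking for 99% instead of 95% confidence gives a wider range around the same sample mean, which is the trade-off between a range that is safe and a range that is useful.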

Unlike estimation, which looks for a range, hypothesis testing checks whether a statement about a population is consistent with the sample. For example, based on the sample of 100 people with an average height of 170cm, I can make a statement: the average height of all people is 172cm. Hypothesis testing asks whether the sample gives us enough evidence to reject that statement. You may think my statement is false because the two numbers are different. But sample means vary from one sample to the next, so in hypothesis testing the statement is judged against an interval rather than a single number. Using the standard error and the significance level (the risk we accept of rejecting a statement that is actually true), we calculate a range around my guess and check whether the sample mean falls in that range. For example, although the statement I make is 172cm, we may calculate a range of 169cm - 175cm. Since the sample mean of 170cm falls in that range, we cannot reject my statement. Strictly speaking, this does not prove that the statement is true; it only means the sample does not contradict it. Estimation and hypothesis testing are pretty tricky but interesting.
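The same check can be sketched in a few lines of Python. This is a simple two-sided z-test under the normal approximation, not necessarily the exact procedure the author has in mind. The sample standard deviation of 15cm and the 5% significance level are assumptions, chosen so that the range lands close to the 169cm - 175cm of the example. The function name `z_test_mean` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(sample_mean, hypothesized_mean, sample_sd, n, alpha=0.05):
    """Two-sided z-test of a statement about the population mean."""
    standard_error = sample_sd / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Range around the hypothesized value in which the sample mean is
    # considered compatible with the statement.
    low = hypothesized_mean - z_crit * standard_error
    high = hypothesized_mean + z_crit * standard_error
    # Equivalent view: how many standard errors the sample mean lies
    # from the hypothesized value, and the matching two-sided p-value.
    z_stat = (sample_mean - hypothesized_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    reject = not (low <= sample_mean <= high)
    return {"range": (low, high), "z": z_stat, "p_value": p_value, "reject": reject}


# Statement: the average height of all people is 172cm.
# Sample: 100 people, mean 170cm. The SD of 15cm is an assumed value.
result = z_test_mean(170.0, 172.0, 15.0, 100, alpha=0.05)
low, high = result["range"]
print(f"range around the statement: {low:.2f}cm to {high:.2f}cm")
print("reject the statement" if result["reject"] else "cannot reject the statement")
```

With these assumed numbers the sample mean of 170cm lies inside the range, so the statement is not rejected. With a smaller assumed standard deviation the range would shrink, and the same sample mean could fall outside it.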

終於,我們要討論統計估計和假設檢驗了,這兩個概念屬於統計推斷的範疇。我知道它們挺煩人的,所以請你稍加忍耐。我們不會討論那些複雜的分佈、公式和概念(比如自由度,檢驗統計量,拒絕假設之類的)我們再一次地,從背後的邏輯來理解這兩個概念。在現實生活中,我們往往不能獲取總體的統計資料,所以我們一般採用抽取樣本的方式。統計估計和假設檢驗的目的,都是透過樣本引數(比如均值)去找到總體引數。

假設我們從總體中抽取了一個大的樣本,並計算了樣本的均值。比如,我們抽取了一個含有100個人的樣本,計算了樣本人群的平均身高(假設是170cm)並希望由此算出所有人的平均身高。通常情況下,我們並不指望能算出一個確切的數字,而是算出一個區間,我們認為所有人的平均身高落在這個區間裡。狹義來說,統計估計就是去找到這個區間。我們當然可以隨便說一個很大的區間(比如100cm到200cm),但它沒有實際價值。我們也可以說一個很小的區間(比如169cm-171cm),但這個結果的可信度就不一定很高了。通常情況下我們先確定一個置信度,它衡量了我們對於結果的確信程度。然後我們利用樣本的均值,標準誤差和置信度來算出這個區間。不言而喻,這個區間應該是在樣本均值附近的。

與統計估計不同,假設檢驗則是去檢驗我關於總體的一個陳述是否為真。比如,根據上面的樣本,我可以做出如下陳述:所有人的平均身高是172cm。假設檢驗就是去檢驗我這句話是否是真實的。你可能覺得這句話很明顯是錯的,因為170和172不相等。但事實上在假設檢驗中,我的陳述並不僅僅是一個確定的值,而是它附近的一個區間,這個區間是根據顯著性水平(同樣是衡量了我們對於結果的確信程度)和標準誤差算出來的。我們會看樣本均值是否落在這個區間裡,來決定我的陳述是否為真。比如,雖然我說的是172cm,但算出來的區間可能是169-175cm。而170cm落在這個區間裡,所以我的這句陳述事實上為真。統計估計和假設檢驗就是這樣,難以掌握又令人著迷。
