Omar Hosney
Statistical Analysis in Python
t-test
- Independent t-test: Compares means between two groups.
from scipy.stats import ttest_ind
group1 = [20, 21, 22, 23, 24]
group2 = [25, 26, 27, 28, 29]
t_stat, p_val = ttest_ind(group1, group2)
print(t_stat, p_val)
- Paired t-test: Compares means from the same group at different times.
from scipy.stats import ttest_rel
before = [20, 21, 22, 23, 24]
after = [25, 26, 27, 28, 29]
t_stat, p_val = ttest_rel(before, after)
print(t_stat, p_val)
- One-sample t-test: Tests whether the sample mean differs from a known value (22 in the example below).
from scipy.stats import ttest_1samp
sample = [20, 21, 22, 23, 24]
t_stat, p_val = ttest_1samp(sample, 22)
print(t_stat, p_val)
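For any of these tests, the decision step is to compare p_val with a chosen significance level. A minimal sketch, assuming the conventional 0.05 threshold (an assumption, not a fixed rule):
alpha = 0.05  # assumed significance level
if p_val < alpha:
    print("Reject the null hypothesis: the difference is statistically significant.")
else:
    print("Fail to reject the null hypothesis.")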
ANOVA
- One-way ANOVA: Compares means across three or more independent groups.
from scipy.stats import f_oneway
group1 = [20, 21, 22, 23, 24]
group2 = [25, 26, 27, 28, 29]
group3 = [30, 31, 32, 33, 34]
f_stat, p_val = f_oneway(group1, group2, group3)
print(f_stat, p_val)
- Two-way ANOVA: Assesses the effects of two categorical factors and their interaction.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
data = {'value': [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
'group1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
'group2': ['X', 'X', 'Y', 'Y', 'X', 'X', 'Y', 'Y', 'X', 'Y']}
df = pd.DataFrame(data)
model = ols('value ~ C(group1) + C(group2) + C(group1):C(group2)', data=df).fit()  # main effects plus interaction
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)
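anova_lm returns a pandas DataFrame, so individual results can be pulled out by column; a small sketch reading the p-values from the 'PR(>F)' column:
p_values = anova_table['PR(>F)']  # one p-value per factor and for the interaction
print(p_values)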
Chi-Square Test
- Chi-Square Test: Tests for association between two categorical variables.
from scipy.stats import chi2_contingency
table = [[10, 20, 30], [6, 9, 17], [8, 15, 20]]  # observed counts (rows x columns)
chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p)
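In practice the contingency table is usually built from raw categorical columns rather than typed in by hand; a minimal sketch using pd.crosstab, where the column names 'gender' and 'preference' are made up for illustration:
import pandas as pd
raw = pd.DataFrame({'gender': ['M', 'F', 'M', 'F', 'M', 'F'],
                    'preference': ['A', 'A', 'B', 'B', 'A', 'B']})
table = pd.crosstab(raw['gender'], raw['preference'])  # counts for each category pair
chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p)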
Correlation
- Pearson correlation: Measures the strength of a linear relationship between two continuous variables.
from scipy.stats import pearsonr
x = [1, 2, 3, 4, 5]
y = [2, 3, 4, 5, 6]
corr, p_val = pearsonr(x, y)
print(corr, p_val)
- Spearman correlation: Measures the strength of a monotonic (rank-order) relationship.
from scipy.stats import spearmanr
x = [1, 2, 3, 4, 5]
y = [2, 3, 4, 5, 6]
corr, p_val = spearmanr(x, y)
print(corr, p_val)
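With more than two variables it is often easier to compute all pairwise correlations at once from a pandas DataFrame; a small sketch, with made-up column names:
import pandas as pd
df = pd.DataFrame({'x': [1, 2, 3, 4, 5],
                   'y': [2, 3, 4, 5, 6],
                   'z': [5, 3, 4, 1, 2]})
print(df.corr(method='pearson'))   # pairwise Pearson correlations
print(df.corr(method='spearman'))  # pairwise Spearman correlations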
Regression
- Linear Regression: Models a linear relationship between predictors and a continuous outcome.
from sklearn.linear_model import LinearRegression
X = [[1], [2], [3], [4], [5]]
y = [2, 3, 4, 5, 6]
model = LinearRegression()
model.fit(X, y)
print(model.coef_, model.intercept_)
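Once fitted, the model can score itself and predict unseen values; a quick usage sketch (the new inputs are arbitrary):
print(model.predict([[6], [7]]))  # predicted y for new x values
print(model.score(X, y))          # R^2 on the training data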
- Logistic Regression: Models the probability of a binary outcome.
from sklearn.linear_model import LogisticRegression
X = [[1], [2], [3], [4], [5]]
y = [0, 0, 1, 1, 1]
model = LogisticRegression()
model.fit(X, y)
print(model.coef_, model.intercept_)
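The fitted classifier provides both hard class labels and class probabilities; a quick usage sketch (the new inputs are arbitrary):
print(model.predict([[2], [4]]))        # predicted labels (0 or 1)
print(model.predict_proba([[2], [4]]))  # probability of each class per sample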
Non-parametric Tests
- Mann-Whitney U test: Compares two independent groups without assuming normality (non-parametric alternative to the independent t-test).
from scipy.stats import mannwhitneyu
group1 = [1, 2, 3, 4, 5]
group2 = [6, 7, 8, 9, 10]
u_stat, p_val = mannwhitneyu(group1, group2)
print(u_stat, p_val)
- Wilcoxon signed-rank test: Compares two related (paired) samples without assuming normality (non-parametric alternative to the paired t-test).
from scipy.stats import wilcoxon
before = [20, 21, 22, 23, 24]
after = [25, 26, 27, 28, 29]
w_stat, p_val = wilcoxon(before, after)
print(w_stat, p_val)