SAS - Sampling
Deepplay
2017. 12. 7. 01:42
Sampling with SAS
Simple random sampling
Code
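/* Point a fileref at the CSV file that holds the population. */
FILENAME REFFILE 'C:\Users\\sample.csv';
/* Read the CSV into WORK.population, taking variable names from the header row. */
PROC IMPORT DATAFILE=REFFILE
    DBMS=CSV
    OUT=WORK.population;
    GETNAMES=YES;
RUN;
/* Draw a simple random sample (without replacement) of 200 observations. */
proc surveyselect data=population method=srs n=200
    out=Work.sample;
run;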
Result
The SAS System

The SURVEYSELECT Procedure

Selection Method | Simple Random Sampling |
---|---|

Input Data Set | POPULATION |
---|---|
Random Number Seed | 137581001 |
Sample Size | 200 |
Selection Probability | 0.13708 |
Sampling Weight | 7.295 |
Output Data Set | SAMPLE |
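Because no seed was given, SAS chose one itself (137581001 in the output above), so a rerun would normally draw a different sample. A minimal sketch of a reproducible version, assuming the same `population` data set, passes that reported value through the `seed=` option:

```sas
/* Sketch: fix the seed so that reruns draw the same 200 observations. */
proc surveyselect data=population method=srs n=200
    seed=137581001        /* the seed reported in the output above */
    out=work.sample;
run;
```

The other two figures in the output follow from the sizes alone: the selection probability is n/N = 200/1459 ≈ 0.13708, and the sampling weight is its reciprocal, 1459/200 = 7.295.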
Computing the sample mean
Code
proc surveymeans data=sample total=1459;
var MSSubclass;
run;
Result
The SAS System

The SURVEYMEANS Procedure

Data Summary | |
---|---|
Number of Observations | 200 |

Statistics

Variable | N | Mean | Std Error of Mean | 95% CL for Mean (lower) | 95% CL for Mean (upper) |
---|---|---|---|---|---|
MSSubClass | 200 | 59.450000 | 2.648750 | 54.2267810 | 64.6732190 |
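- Here the variance of the sample mean is estimated as (N-n)/N * s^2/n, where N = 1459 is the population size passed via total=, n = 200 is the sample size, and s^2 is the sample variance. (Because the population is finite, the finite population correction (N-n)/N is applied.)
- When the population variance sigma^2 is known, the variance of the sample mean is (N-n)/(N-1) * sigma^2/n.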
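As a rough check, the two formulas can be written out with the numbers from the output above. The confidence limits are reproduced here on the assumption that the procedure uses a t critical value with n - 1 = 199 degrees of freedom (about 1.972), which is consistent with the reported limits:

```latex
\[
\widehat{\operatorname{Var}}(\bar{y}) = \frac{N-n}{N}\cdot\frac{s^2}{n},
\qquad
\operatorname{Var}(\bar{y}) = \frac{N-n}{N-1}\cdot\frac{\sigma^2}{n},
\qquad
\frac{N-n}{N} = \frac{1459-200}{1459} \approx 0.8629
\]
\[
\bar{y} \pm t_{0.975,\,199}\,\widehat{\operatorname{SE}}(\bar{y})
= 59.45 \pm 1.972 \times 2.64875
\approx (54.23,\ 64.67)
\]
```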
Estimating a proportion
Code
data new;
set sample;
bin=(MSZoning='RL');
run;
proc surveymeans data=new total=1459;
var bin;
run;
Result
The SAS System

The SURVEYMEANS Procedure

Data Summary | |
---|---|
Number of Observations | 200 |

Statistics

Variable | N | Mean | Std Error of Mean | 95% CL for Mean (lower) | 95% CL for Mean (upper) |
---|---|---|---|---|---|
bin | 200 | 0.760000 | 0.028124 | 0.70454146 | 0.81545854 |
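Since bin is a 0/1 indicator, its mean is the sample proportion, and its standard error can be rechecked by hand from the same finite-population formula. The DATA step below is a sketch of that check, not part of the original analysis. The variable names (pop_size, samp_size, p_hat) are made up for illustration, and the t critical value with n - 1 degrees of freedom is an assumption about how the limits are formed:

```sas
/* Sketch: recompute the standard error of the proportion by hand. */
data _null_;
    pop_size  = 1459;   /* N, taken from total=1459 */
    samp_size = 200;    /* n, number of observations */
    p_hat     = 0.76;   /* reported mean of bin */

    /* Sample variance of a 0/1 variable: n/(n-1) * p(1-p). */
    s2  = samp_size / (samp_size - 1) * p_hat * (1 - p_hat);

    /* Finite population correction and standard error. */
    fpc = (pop_size - samp_size) / pop_size;
    se  = sqrt(fpc * s2 / samp_size);

    /* 95% limits, assuming a t distribution with n-1 degrees of freedom. */
    t_crit = quantile('T', 0.975, samp_size - 1);
    lower  = p_hat - t_crit * se;
    upper  = p_hat + t_crit * se;

    put se= 8.6 lower= 8.6 upper= 8.6;
run;
```

The values written to the log should land close to the standard error and confidence limits reported by PROC SURVEYMEANS above.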
Stratified sampling
* Stratified sampling on the variable MasVnrType (the data must first be sorted by it). ;
proc sort data=population;
by MasVnrType;
run;
proc freq data=population;
tables MasVnrType;
run;
The SAS System

The FREQ Procedure

MasVnrType | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
BrkCmn | 10 | 0.69 | 10 | 0.69 |
BrkFace | 434 | 29.75 | 444 | 30.43 |
NA | 16 | 1.10 | 460 | 31.53 |
None | 878 | 60.18 | 1338 | 91.71 |
Stone | 121 | 8.29 | 1459 | 100.00 |
proc surveyselect data=population method=srs n=(5,5,5,5,5)
out=sample2;
strata MasVnrType;
run;
The SAS System

The SURVEYSELECT Procedure

Selection Method | Simple Random Sampling |
---|---|
Strata Variable | MasVnrType |

Input Data Set | POPULATION |
---|---|
Random Number Seed | 132740001 |
Number of Strata | 5 |
Total Sample Size | 25 |
Output Data Set | SAMPLE2 |
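The post stops at drawing the stratified sample. As a possible next step, the sketch below estimates the mean of MSSubClass from sample2 while accounting for the strata. It makes several assumptions:

- The data set name strata_totals is made up for illustration.
- The stratum sizes are copied from the PROC FREQ table above.
- sample2 is assumed to contain the SamplingWeight variable that PROC SURVEYSELECT normally writes for stratified designs.
- The MasVnrType values in strata_totals are assumed to match those in sample2.

```sas
/* Sketch: stratum population sizes, copied from the PROC FREQ output. */
data strata_totals;
    length MasVnrType $7;
    input MasVnrType $ _total_;
    datalines;
BrkCmn 10
BrkFace 434
NA 16
None 878
Stone 121
;
run;

/* Stratified estimate of the mean, with a finite population
   correction applied within each stratum. */
proc surveymeans data=sample2 total=strata_totals;
    strata MasVnrType;
    var MSSubClass;
    weight SamplingWeight;
run;
```

Because every stratum contributes 5 observations regardless of its size, the sampling weights differ sharply between strata (10/5 for BrkCmn versus 878/5 for None), which is why the weight statement matters here.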