    data direct;
      input x y;
      cards;
    1 17.5
    3 20.5
    ;

Here is an example of data input from another file:
    data pulse;
      infile '/p/stat/Data/MJ/pulse.dat' missover;
      input x y;

The names direct and pulse are arbitrary, but they can be used later in your SAS program to identify each data set. Details of input phrases (use either infile or cards, but not both):
    data a;                        create a new data set named "a"
    input x y z;                   read 3 numbers at a time as variables x, y, z
    input trt $ x y;               read treatment "trt" as a character string and
                                   x, y as numbers (note the dollar sign, $)
    infile 'blah.dat' missover;    read the data from file "blah.dat"; missover
                                   sets the remaining variables to missing when a
                                   line runs out of values, rather than going on
                                   to a new line (infile must appear BEFORE input)
    infile 'blah.dat' firstobs=2;  start reading at line 2, skipping the first
                                   line; a handy way to document column names
    infile 'blah.dat' lrecl=2000;  allow for really long records (up to 2000
                                   characters per line)
    datalines;                     same as cards
    cards;                         read data from the following lines (must
                                   appear AFTER input)
    ;                              end of data entry for cards (good convention,
                                   but not required)

Data values must be separated by spaces (tabs can cause problems on some systems). If you use the missover option, all values for one observation must be on the same line. Missing data are represented by a period (.) as a placeholder. This is also useful for prediction at new values with proc reg: an observation with a missing response is left out of the fit, but proc reg still computes its predicted value (for instance, in a data set created by the output statement).
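To make the difference between missover and the default behavior concrete, here is a small Python sketch (not SAS) of list input under stated assumptions: values are separated by whitespace, a lone period means missing (represented here by None), and each observation starts on a new line. The names read_list_input and parse_token are hypothetical; the sketch only mimics the rules described above, not SAS's actual input engine.

```python
from typing import Iterable, List, Optional

# None stands in for a SAS missing value (.) in this sketch.
Value = Optional[float]


def parse_token(token: str) -> Value:
    """Convert one whitespace-separated token; a lone period is missing."""
    return None if token == "." else float(token)


def read_list_input(lines: Iterable[str], nvars: int, missover: bool) -> List[List[Value]]:
    """Read nvars values per observation, each observation starting on a new line.

    Extra values at the end of a line are ignored in both modes.
    missover=True:  a line that runs short is padded with missing values.
    missover=False: a line that runs short continues onto the next line(s),
                    and the next observation starts on a fresh line.
    """
    split_lines = [line.split() for line in lines]
    rows: List[List[Value]] = []
    if missover:
        for tokens in split_lines:
            values = [parse_token(t) for t in tokens[:nvars]]
            values += [None] * (nvars - len(values))
            rows.append(values)
        return rows

    i = 0
    while i < len(split_lines):
        values = []
        # Keep pulling lines until this observation has nvars values.
        while len(values) < nvars and i < len(split_lines):
            needed = nvars - len(values)
            values += [parse_token(t) for t in split_lines[i][:needed]]
            i += 1
        if len(values) < nvars:
            break  # ran out of data mid-observation; discard the partial one
        rows.append(values)
    return rows


if __name__ == "__main__":
    data = ["1 2", "3 4 5", "6 7 8"]
    print(read_list_input(data, 3, missover=True))
    # [[1.0, 2.0, None], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
    print(read_list_input(data, 3, missover=False))
    # [[1.0, 2.0, 3.0], [6.0, 7.0, 8.0]]
```

Without missover, the short first line borrows the 3 from the second line and the rest of that line is lost, which is an easy way to misread data without noticing.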
    data logs;
      set direct;
      logy = log(y);

This creates a new data set logs from the data set direct read in above. The variable logy is created as the natural log of the variable y. Here are details of the first line and some transformations:
    data a; set b;     create data set "a" using existing set "b"
    z = log(y);        create variable z as natural log of variable y
    z = log10(y);      log base 10
    z = sqrt(y);       square root
    z = x*y;           multiplication (+ addition) (- subtraction) (/ division)
    z = y**2;          exponent: "y squared" or "y to the 2nd power"
    z = y**0.5;        "y to the 1/2 power" (same as sqrt(y))
    z = x**-2;         negative exponent: "1 over (x squared)"
    z = sin(x);        trigonometric sine function of x (also cos(x), tan(x), ...)
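SAS does not stop when a transformation is undefined: an invalid argument (such as the log of zero) produces a missing value and a note in the log, and arithmetic involving a missing value gives a missing result. The Python sketch below imitates those rules with None standing in for the period. The names sas_log, sas_log10, sas_sqrt, sas_pow and sas_mul are hypothetical helpers, and treating a negative number to a fractional power, or zero to a negative power, as missing is an assumption of this sketch.

```python
import math
from typing import Optional

# None plays the role of a SAS missing value (.) in this sketch.
Num = Optional[float]


def sas_log(y: Num) -> Num:
    """Natural log; missing for missing or non-positive y."""
    if y is None or y <= 0:
        return None
    return math.log(y)


def sas_log10(y: Num) -> Num:
    """Base-10 log with the same missing-value rules as sas_log."""
    if y is None or y <= 0:
        return None
    return math.log10(y)


def sas_sqrt(y: Num) -> Num:
    """Square root; missing for missing or negative y."""
    if y is None or y < 0:
        return None
    return math.sqrt(y)


def sas_pow(y: Num, p: float) -> Num:
    """y**p; missing when y is missing or the power is undefined."""
    if y is None:
        return None
    if y == 0 and p < 0:
        return None  # would divide by zero
    if y < 0 and not float(p).is_integer():
        return None  # Python would otherwise return a complex number
    return float(y) ** p


def sas_mul(x: Num, y: Num) -> Num:
    """Arithmetic with a missing operand gives missing."""
    if x is None or y is None:
        return None
    return x * y


if __name__ == "__main__":
    print(sas_log(0), sas_sqrt(9), sas_pow(2, -2), sas_mul(None, 3))
    # None 3.0 0.25 None
```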
Here are some transformations commonly used to stabilize variance (each line is an alternative, since each one overwrites z; use the one that fits your data):

    data a;
      set b;
      z = sqrt(count);            /* counts (Poisson distribution)          */
                                  /*   variance proportional to mean        */
      z = log(conc);              /* concentrations, weights (log normal)   */
                                  /*   SD proportional to mean              */
                                  /*   constant coefficient of variation (CV) */
      z = arsin(sqrt(prop));      /* proportions (0-1)                      */
      z = arsin(sqrt(pct/100));   /* percentages (0-100)                    */
                                  /*   (Binomial distribution)              */
                                  /*   variance highest in the middle (0.5) */
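Why these transformations help can be checked by simulation. The Python sketch below (not SAS; the seeds, sample sizes and the names poisson, poisson_variances and lognormal_sds are arbitrary choices for illustration) draws Poisson counts and log-normal values at two different means. The raw variance or SD grows with the mean, while for moderately large means the variance of sqrt(count) stays near 1/4 and the SD of log(value) stays at sigma.

```python
import math
import random
import statistics


def poisson(rng: random.Random, mean: float) -> int:
    """Poisson draw by multiplying uniforms (Knuth's method); fine for moderate means."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def poisson_variances(mean: float, n: int = 20000, seed: int = 12345):
    """Return (variance of counts, variance of sqrt(counts))."""
    rng = random.Random(seed)
    counts = [poisson(rng, mean) for _ in range(n)]
    return (statistics.variance(counts),
            statistics.variance([math.sqrt(c) for c in counts]))


def lognormal_sds(median: float, sigma: float = 0.2, n: int = 20000, seed: int = 12345):
    """Return (SD of values, SD of log(values)) for log-normal data."""
    rng = random.Random(seed)
    values = [median * math.exp(rng.gauss(0.0, sigma)) for _ in range(n)]
    return (statistics.stdev(values),
            statistics.stdev([math.log(v) for v in values]))


if __name__ == "__main__":
    for m in (10, 50):
        raw, root = poisson_variances(m)
        print(f"Poisson mean {m}: var(count)={raw:.2f}  var(sqrt)={root:.3f}")
    for med in (1, 100):
        raw, logged = lognormal_sds(med)
        print(f"log normal median {med}: SD={raw:.3f}  SD(log)={logged:.3f}")
```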
A subsetting if keeps only the cases that meet a condition:

    data other;
      set big;        /* create other from big */
      if x > 10;      /* only use these cases  */

Suppose you have a data set field with 3 treatments called control, wet and dry, and you want to delete the control group for some procedures:
    data trtonly;
      set field;                         /* create trtonly from field */
      if trt = 'control' then delete;    /* delete control group      */

Here is some more detail on the if phrase:
    g = 0;                            /* g=0 for large x             */
    if x < 10 then g = 1;             /* g=1 for small x             */
    if y = 99 then y = .;             /* recode 99 as missing data   */
    if y = . then y = 0;              /* recode missing data as 0    */
    if z < 10 or y > 10 then x = 5;   /* example of union (or)       */
    if z < 10 and y > 10 then x = 6;  /* and of intersection (and)   */
    if x <= 10;                       /* keep only x at most 10      */
    if x >= 10;                       /* keep only x at least 10     */
    if not (x = 10);                  /* keep only if x is not 10    */

SAS treats a missing value as smaller than any number, so conditions such as x < 10 and x <= 10 are also true when x is missing. If that matters, add a check such as x ne . (for example, if x ne . and x <= 10;).

You have already seen how to add variables with transformations. You can also drop variables:
    data a;
      set b;
      z = log(y);     /* create new variable z */
      drop y;         /* drop old variable y   */

Dropping is usually unnecessary, because carrying unused variables costs very little (unless you have a lot of data). It is useful, however, when the data need to be rearranged. For instance, the following turns 5 rows with one column per level (n0 to n5) into 30 observations, each holding one response and its level:
    data abc;
      input n0 n1 n2 n3 n4 n5;
      cards;
    1.4 1.5 1.2 2.1 2.1 2.8
    1.7 1.4 1.0 1.4 1.7 2.1
    1.1 1.9 2.5 2.6 2.1 2.2
    1.7 1.3 1.1 1.0 2.0 1.8
    1.0 1.8 1.5 1.4 2.2 2.3
    ;
    data resps;
      set abc;
      resp = n0; level = 0; output;
      resp = n1; level = 1; output;
      resp = n2; level = 2; output;
      resp = n3; level = 3; output;
      resp = n4; level = 4; output;
      resp = n5; level = 5; output;
      drop n0--n5;

Each output statement writes a new observation with the current values of resp and level, so every row of abc becomes 6 observations in resps. The drop statement keeps the original columns n0 through n5 out of resps.
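The same wide-to-long rearrangement can be written as a short Python sketch for comparison. The list ABC simply copies the data lines above and wide_to_long is a hypothetical name; each step of the inner loop plays the role of one output statement.

```python
# The five rows of n0..n5 from data set abc above (one column per level).
ABC = [
    [1.4, 1.5, 1.2, 2.1, 2.1, 2.8],
    [1.7, 1.4, 1.0, 1.4, 1.7, 2.1],
    [1.1, 1.9, 2.5, 2.6, 2.1, 2.2],
    [1.7, 1.3, 1.1, 1.0, 2.0, 1.8],
    [1.0, 1.8, 1.5, 1.4, 2.2, 2.3],
]


def wide_to_long(rows):
    """Turn each row of responses into one (resp, level) record per column.

    Only resp and level are kept, much like drop n0--n5.
    """
    records = []
    for row in rows:
        for level, resp in enumerate(row):
            records.append({"resp": resp, "level": level})
    return records


if __name__ == "__main__":
    resps = wide_to_long(ABC)
    print(len(resps))   # 30
    print(resps[:2])    # [{'resp': 1.4, 'level': 0}, {'resp': 1.5, 'level': 1}]
```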
    data a;
      do i = 1 to 10;
        uni = ranuni(0);   /* an argument of 0 uses the clock as a seed */
                           /* otherwise, use a 5 to 7 digit odd number  */
        output;
      end;

Note the do loop, which is closed by an end; statement. The output statement creates a new case for each uniform number, so set a has 10 cases, each with the variables i and uni. Here are the random number generators:
    x = ranuni(seed);              /* uniform between 0 & 1          */
    x = a+(b-a)*ranuni(seed);      /* uniform between a & b          */
    x = ranbin(seed,n,p);          /* binomial size n prob p         */
    x = rancau(seed);              /* cauchy with loc 0 & scale 1    */
    x = a+b*rancau(seed);          /* cauchy with loc a & scale b    */
    x = ranexp(seed);              /* exponential with scale 1       */
    x = ranexp(seed) / a;          /* exponential with rate a        */
                                   /*   (scale and mean 1/a)         */
    x = a-b*log(ranexp(seed));     /* extreme value loc a & scale b  */
    x = rangam(seed,a);            /* gamma with shape a             */
    x = b*rangam(seed,a);          /* gamma with shape a & scale b   */
    x = 2*rangam(seed,a);          /* chi-square with d.f. = 2*a     */
    x = rannor(seed);              /* normal with mean 0 & SD 1      */
    x = a+b*rannor(seed);          /* normal with mean a & SD b      */
    x = ranpoi(seed,a);            /* poisson with mean a            */
    x = rantri(seed,a);            /* triangular on (0,1), peak at a */
    x = rantbl(seed,p1,p2,p3);     /* random from (1,2,3) with probs */
                                   /*   p1,p2,p3                     */

The seed above is either positive (used as the initial seed; it should be odd and less than 2**31-1) or zero or negative (the time of day is used to start the sequence, so the numbers change from run to run). Use a positive seed when you need to reproduce the same random numbers. Performance is untested for zero or negative seeds, so use them at your own risk. The seed is only examined on the first encounter with a random number generator in your program, so you cannot change the sequence once you begin.
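Several entries in the list build a distribution from a simpler generator. The Python sketch below checks a few of those recipes with the standard random module, which uses a different generator than SAS, so the actual numbers will not match SAS output. The helper names are hypothetical; a fixed seed in random.Random plays the role of a positive SAS seed, giving the same numbers on every run, and make_uniforms mirrors the do loop that built data set a.

```python
import math
import random


def make_uniforms(seed: int, n: int = 10):
    """Like data set a above: cases with i = 1..n and a uniform number uni."""
    rng = random.Random(seed)
    return [{"i": i, "uni": rng.random()} for i in range(1, n + 1)]


def uniform_ab(rng: random.Random, a: float, b: float) -> float:
    return a + (b - a) * rng.random()              # uniform between a & b


def exponential_rate(rng: random.Random, a: float) -> float:
    return rng.expovariate(1.0) / a                # rate a, mean 1/a


def extreme_value(rng: random.Random, a: float, b: float) -> float:
    return a - b * math.log(rng.expovariate(1.0))  # loc a & scale b


def chi_square(rng: random.Random, df: float) -> float:
    return 2 * rng.gammavariate(df / 2, 1.0)       # 2 * gamma(shape df/2)


def normal(rng: random.Random, a: float, b: float) -> float:
    return a + b * rng.gauss(0.0, 1.0)             # mean a & SD b


if __name__ == "__main__":
    print(make_uniforms(2024) == make_uniforms(2024))  # True: same seed, same numbers
    rng = random.Random(1)
    n = 50000
    print(sum(extreme_value(rng, 0, 1) for _ in range(n)) / n)  # near 0.5772
```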
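Random numbers can be used to assign treatments at random. In this completely randomized design, data set b holds 20 experimental units identified by id, and each of 4 treatments goes to 5 units:

    data uniform;
      do i = 1 to 20;
        x = ranuni(0);
        output;
      end;
    data a;
      merge b uniform;        /* attach one random number to each unit */
    proc sort;
      by x;
    data c;
      set a;                  /* _N_ = observation (line) number       */
      trt = ceil(_N_ / 5);    /* ceil = smallest integer >= _N_/5      */
    proc sort;
      by id;
    proc print;
      var id trt;

Here is a randomized complete block design, with 3 blocks and 4 treatments per block. We assign the treatments 1, 2, 3, 4 at random to the 4 sites within each block.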
    data a;
      do block = 1 to 3;
        do site = 1 to 4;
          x = ranuni(0);
          output;
        end;
      end;
    proc sort;
      by block x;                 /* random order of sites within each block */
    data c;
      set a;
      trt = 1 + mod(_N_ - 1, 4);  /* mod = remainder of (_N_ - 1)/4          */
    proc sort;
      by block site;
    proc print;
      var block site trt;
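Both randomizations follow the same recipe: attach a uniform random number to each unit, sort by it, and assign treatments by position in the sorted order. Here is a Python sketch of that recipe (not SAS; completely_randomized and randomized_block are hypothetical names, and the completely randomized version assumes the number of units is a multiple of the number of treatments, as with the 20 units and 4 treatments above).

```python
import math
import random


def completely_randomized(ids, n_trt, seed):
    """Sort units by a random key, then trt = ceil(rank / group size),
    like trt = ceil(_N_ / 5) above."""
    rng = random.Random(seed)
    per_trt = len(ids) // n_trt                            # assumes an exact multiple
    keyed = sorted((rng.random(), unit) for unit in ids)   # like proc sort; by x;
    plan = [(unit, math.ceil(rank / per_trt))
            for rank, (_, unit) in enumerate(keyed, start=1)]
    return sorted(plan)                                    # like proc sort; by id;


def randomized_block(n_blocks, n_trt, seed):
    """Random order of sites within each block, then trt = 1 + mod(_N_ - 1, n_trt)."""
    rng = random.Random(seed)
    cases = [(block, rng.random(), site)
             for block in range(1, n_blocks + 1)
             for site in range(1, n_trt + 1)]
    cases.sort()                                           # like proc sort; by block x;
    plan = [(block, site, 1 + (n - 1) % n_trt)
            for n, (block, _, site) in enumerate(cases, start=1)]
    return sorted(plan)                                    # like proc sort; by block site;


if __name__ == "__main__":
    for block, site, trt in randomized_block(3, 4, seed=11):
        print(block, site, trt)
```

Sorting within block before counting is what guarantees every block receives each treatment exactly once.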
Last modified: Tue Feb 6 14:12:35 1996 by Brian Yandell; Tue Feb 14 11:09:50 1995 by Stat Www (statwww@stat.wisc.edu)