The INPUT statement is powerful and versatile. When you create a SAS data set from instream data or from raw data stored in an external file, the INPUT statement describes how each raw data field should be read and stored in the new data set.
When creating SAS data sets from raw data stored in external files, using the INPUT statement with the INFILE statement is recommended.
When using instream data to create SAS data sets, the INPUT statement should be used with the DATALINES, CARDS or LINES statements.
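As a rough illustration of the two pairings described above, the sketch below reads the same kind of record once from instream data and once from an external file. The data set names, variable names, sample records and file path are all hypothetical, and simple list input is used only to keep the example short.

```sas
/* Instream data: the raw records follow the DATALINES statement. */
data work.instream_demo;
    input name $ age;
    datalines;
Alice 34
Brian 41
;
run;

/* External file: INFILE names the file, INPUT describes its fields. */
/* The path below is a placeholder; point it at a real file.         */
data work.external_demo;
    infile '/path/to/rawdata.txt';
    input name $ age;
run;
```

In both steps the INPUT statement is identical; only the source of the raw records changes.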
In SAS, there are four types of INPUT STYLES that describe how raw data should be read. They are:
1. Column input
2. Formatted input
3. List input (simple and modified)
4. Named input
COLUMN INPUT:
This input style should be used when reading standard character or numeric values that are stored in fixed fields.
Standard numeric values contain only:
· Digits
· Decimal points
· Scientific or E-notation
· Plus or minus signs
Nonstandard numeric values include:
· Values that contain special characters such as commas, dollar signs and percent signs.
· Date, time or datetime values.
· Values stored in fraction, integer binary, real binary or hexadecimal form.
The requirement that all the data be stored in fixed fields simply means that, in every row of raw data, each field occupies the same columns. The following is an example of standard numeric and character data stored in fixed fields. FIG 1:
When using the COLUMN INPUT style for reading raw data, the following SAS syntax should be used:
INPUT variable <$> startcol-endcol …;
· variable is used for specifying the name you have assigned to the field.
· $ is optional and is used to specify that the variable type is character. If the variable type is numeric then $ should not be used.
· startcol specifies the starting column for the variable.
· endcol specifies the ending column for the variable.
The following SAS code depicts the use of the COLUMN INPUT style in reading the above raw data. The raw data will be treated as instream data, in which case the DATALINES statement will be used.
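A minimal sketch of such a DATA step is shown here. The records, variable names and column positions are hypothetical stand-ins chosen for illustration, so they may not match the FIG 1 layout; what matters is that every field sits in the same columns on every row.

```sas
/* Hypothetical layout: id in columns 1-4, name in 6-15, */
/* age in 17-18, weight in 20-24.                         */
data work.column_demo;
    input id 1-4 name $ 6-15 age 17-18 weight 20-24;
    datalines;
1001 John Smith 34  72.5
1002 Mary Jones 29  58.0
1003 Li Wei     41 101.3
;
run;

proc print data=work.column_demo;
run;
```

Note that name is read with its embedded blank intact because the whole column range 6-15 is assigned to it.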
It is important to emphasize the advantages of using the COLUMN INPUT style when reading raw data.
· Character variables that contain embedded blanks can be read without any additional code.
An example would be using the following SAS code to read an address with the value: “2 15th Street Northwest”.
INPUT address $ 1-30;
· If there is missing data for a particular field then no placeholder is required. When SAS gets to the specified column(s), the data is simply read as missing. This does not cause other fields to be read incorrectly.
· An entire field or parts of a field can be read by simply specifying the desired columns in the code. For example, if the raw data contains the value “Male”, then the following code can be used to read just the first letter of that raw data field.
INPUT gender $ 1;
· The fields of the raw data do not have to be separated by blanks or other delimiters. As long as the programmer knows the layout of the data, the appropriate columns can be specified to read it correctly. For example, the following code can be used to read the data “1233ONM1979”.
INPUT ID 1-4 Province $ 5-6 Gender $ 7 DOB 8-11;
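To see the last two advantages together, the INPUT statement above can be placed in a complete DATA step. This is only a sketch: the data set name and the second record are hypothetical, and that record leaves columns 5-6 blank to show that a missing field needs no placeholder.

```sas
data work.nodelim_demo;
    input ID 1-4 Province $ 5-6 Gender $ 7 DOB 8-11;
    datalines;
1233ONM1979
1234  F1985
;
run;

proc print data=work.nodelim_demo;
run;
```

For the second record, Province is read as a missing (blank) character value, while Gender and DOB are still taken from columns 7 and 8-11.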
FORMATTED INPUT:
This input style is similar to COLUMN INPUT, except that it can also read nonstandard data. In short, the FORMATTED INPUT style can be used to read both standard and nonstandard data stored in fixed fields.
When using the FORMATTED INPUT style for reading raw data, the following SAS syntax should be used:
INPUT <pointer-control> variable informat.;
· Pointer-control is used for positioning the input pointer on a specified column. Two common pointer controls are @n and +n. The pointer control @n moves the input pointer to column number n, whereas the pointer control +n moves the input pointer forward n columns relative to its current position.
· variable is used for specifying the name you have assigned to the field.
· Informat is a special SAS instruction that specifies how raw data should be read. Some common SAS informats are:
o $w. for reading character values
o w.d for reading standard numeric data.
o COMMAw.d, PERCENTw.d, DATEw., MMDDYYw. and DATETIMEw. for reading a variety of nonstandard numeric data.
The following SAS code depicts the use of the FORMATTED INPUT style in reading the FIG 1 data set above.
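A minimal sketch of formatted input follows. The records and column layout are hypothetical, not taken from FIG 1, and they deliberately include a salary with a dollar sign and comma and a date value so that the nonstandard informats have something to do.

```sas
/* Hypothetical layout: id in columns 1-4, name in 6-15,      */
/* salary in 17-23, hire date in 25-33.                       */
data work.formatted_demo;
    input @1  id     4.        /* standard numeric, width 4            */
          @6  name   $10.      /* character, width 10                  */
          @17 salary comma7.   /* strips the dollar sign and comma     */
          +1  hired  date9.;   /* skip one column, then read the date  */
    format hired date9.;       /* display the stored date readably     */
    datalines;
1001 John Smith $52,300 15MAR2015
1002 Mary Jones $61,750 02JAN2018
1003 Li Wei     $98,450 30SEP2011
;
run;

proc print data=work.formatted_demo;
run;
```

After salary is read, the pointer rests on column 24, so +1 advances it to column 25 where the date begins. Writing @25 instead would have the same effect here; the relative form is used only to show both pointer controls.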
Stay tuned for Part Two of this blog, which will focus on LIST INPUT and NAMED INPUT.
Information regarding the INPUT statement and its corresponding input styles can be found on the SAS website at:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000146292.htm
Happy Learning!