Perelman School of Medicine at the University of Pennsylvania

Center for Preventive Ophthalmology and Biostatistics (CPOB)


  1. Place the variable names in the first row. Be sure that the names follow these rules:
    • Variable names can’t be more than 8 characters long.
    • Variable names must start with a letter.
    • Variable names may contain only letters, numbers, or underscores.
    • Do not use any of the following characters in variable names: % $ # @ ! + * ~ " . -
    • No blanks in variable names.
    • Be sure that each variable name is unique (no duplicate variable names).
    • Be sure variable names are on the first row only!
  2. Make sure the data are in rectangular form: each row represents one observation and each column holds the values of one variable.
  3. Include only raw data, not summarized data. Don’t include extraneous material in your Excel file, such as row or column totals, graphs, comments, or annotations.
  4. Include a unique identifying number for each case. If you need more than one identifier, such as Household ID and Subject ID, place these in separate columns. If you have several spreadsheets containing data on the same individuals, include their identifier(s) on each sheet.
  5. Only include one value per cell. Don’t enter data such as "120/80" for blood pressure. Enter systolic blood pressure as one variable, and diastolic blood pressure as another variable. Don't enter data as "A, C, D" or "BDF" if there are three possible answers to a question. Include a separate column for each answer.
  6. For measurements with units, such as “120 lb”, use two columns: one for the number (120) and another for the unit (lb).
  7. Don't leave blank rows or columns in the data.
  8. Don’t mix numeric and character values (e.g. names and ID numbers) in the same column.
  9. Date values are best entered in three columns: one for month, one for day, one for year.
  10. If you have missing values, indicate them with a numeric code, such as 99 or 999, or leave the cell blank. Be sure that the missing value code is not confused with a "real" data value.
  11. Save the spreadsheet with values only, not formulas.
  12. Do not underline text, or use boldface or italics.
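
The naming rules in item 1 can be checked automatically before a file is submitted. The following is a minimal Python sketch, not an official CPOB tool: the function name check_variable_names and the problem messages are hypothetical, and treating names that differ only by case (such as "Age" and "AGE") as duplicates is an assumption.

```python
import re

# Item 1 as a pattern: a letter first, then at most 7 more letters,
# digits, or underscores (8 characters in total, no blanks or symbols).
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,7}")


def check_variable_names(names):
    """Return (name, problem) pairs; an empty list means every name passes."""
    problems = []
    seen = set()
    for name in names:
        if not NAME_PATTERN.fullmatch(name):
            problems.append((name, "breaks the naming rules"))
        # Assumption: compare case-insensitively when looking for duplicates.
        key = name.lower()
        if key in seen:
            problems.append((name, "duplicate name"))
        seen.add(key)
    return problems


# Example header row: the last three names each break a rule.
header = ["subj_id", "sbp", "dbp", "2ndvisit", "weight_lb", "SBP"]
for name, problem in check_variable_names(header):
    print(f"{name}: {problem}")
```

In the example, "2ndvisit" starts with a digit, "weight_lb" is 9 characters long, and "SBP" repeats "sbp" under the case-insensitive assumption.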