Helping define the terms used in our database


Statistical Terms & Use






Welcome to the Statistical Terms and Use page for our database. Here you'll find the statistics most commonly used in benchmarking and testing our instruments. For a complete walkthrough of how best to read and use our measures, please review our RRF webinar, Fundamentals of Measurement in Older Adults.

Terms & Definitions


International Classification of Functioning, Disability, and Health (ICF Domain):

Categorizes assessments into:

  • Body Function
  • Body Structure
  • Activity
  • Participation
  • Environmental Factor
  • Personal Factor

For more information, consult the WHO's ICF Framework.

Cut-Off Scores: A cut-off score is the threshold that separates a positive test outcome from a negative one. Cut-off scores can also be used to classify individuals into groups such as minimal, moderate, or severe impairment. For example, a cut-off score could represent the maximum score an individual could achieve on a test and still be classified as being at risk for falls (e.g., a score < 48 indicates the patient is at risk for falls).
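As a minimal sketch of how a cut-off score is applied, the Python snippet below flags a patient as at risk when the score falls below the cut-off. The function name is hypothetical, and the default of 48 is simply the illustrative value from the example above, not a recommendation for any particular instrument.

```python
def is_below_cutoff(score: float, cutoff: float = 48) -> bool:
    """Return True when the score falls below the cut-off (a positive test outcome).

    The default cutoff of 48 mirrors the illustrative falls-risk example;
    real cut-offs are instrument- and population-specific.
    """
    return score < cutoff


# Hypothetical patient scores, for illustration only.
for score in (45, 48, 52):
    label = "at risk for falls" if is_below_cutoff(score) else "not flagged"
    print(score, label)
```

A score equal to the cut-off is not flagged here because the example uses a strict "less than" comparison.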

Normative Data: Normative data are scores pulled from published literature that provide "normal" values for specific variables within a population. These data typically come from validation studies and therefore may not represent the full range of outcomes clinicians may encounter; however, they can provide approximate guidelines. Whenever possible, normative data are presented with data collected from other measures researchers or clinicians have used in the course of their work.

Face Validity: An assumption that an instrument is valid based on its appearance (i.e., it appears to be a reasonable measure of the variable being assessed).

Considerations: Points users should keep in mind when using an instrument. Measurements within particular diagnostic populations often come with unique assumptions. These matter most when the measure has had only limited use in the population of interest.

Standard Error of Measurement (SEM): The SEM is a reliability measure that assesses response stability. It estimates the standard error in a set of repeated scores. In the Rehabilitation Measures Database, the SEM was frequently pulled directly from peer-reviewed journal articles. However, whenever the necessary statistics (the standard deviation and the ICC) were available in the published articles, the following equation was used to calculate the SEM:

  • SEM = Standard Deviation from the 1st test × √(1 − ICC)

Clinical Bottom Line: The SEM is the amount of variation in a score that you can attribute to measurement error.
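The SEM equation above can be sketched in a few lines of Python. The function and argument names are hypothetical, and the input values are invented for illustration rather than taken from any published study.

```python
import math


def standard_error_of_measurement(sd_first_test: float, icc: float) -> float:
    """SEM = standard deviation from the 1st test x sqrt(1 - ICC)."""
    if sd_first_test < 0:
        raise ValueError("standard deviation must be non-negative")
    if icc > 1:
        raise ValueError("ICC cannot exceed 1")
    return sd_first_test * math.sqrt(1.0 - icc)


# Hypothetical inputs: SD of 5.0 points on the first test, ICC of 0.91.
sem = standard_error_of_measurement(sd_first_test=5.0, icc=0.91)
print(round(sem, 2))  # 1.5
```

Note how a higher ICC shrinks the SEM: with perfect reliability (ICC = 1) the SEM is zero.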

Minimal Detectable Change (MDC): A statistical estimate of the smallest amount of change that can be detected by a measure that corresponds to a noticeable change in ability. In the Rehabilitation Measures Database, the MDC was frequently pulled directly from peer reviewed journal articles.  However, whenever the statistics were available in the published articles, the following equation was utilized to calculate the MDC:

  • MDC = 1.96 × SEM × √2

The MDC is calculated at a stated level of confidence. For example, the MDC95 is based on a 95% confidence interval, while the MDC90 is based on a 90% confidence interval. Whenever an MDC was calculated for the Rehabilitation Measures Database, the MDC95 was used.

Clinical Bottom Line: The MDC is the minimum amount of change in a patient's score needed to be confident (at the stated level, e.g., 95%) that the change isn't the result of measurement error.
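The MDC equation can be sketched in Python as follows. The multiplier 1.96 is the two-sided normal critical value for 95% confidence; this sketch assumes the MDC90 uses the corresponding 90% value (about 1.645) in the same formula, which is the usual convention but is not spelled out on this page. The function name and the SEM value are hypothetical.

```python
import math
from statistics import NormalDist


def minimal_detectable_change(sem: float, confidence: float = 0.95) -> float:
    """MDC = z x SEM x sqrt(2), with z the two-sided normal critical value."""
    if sem < 0:
        raise ValueError("SEM must be non-negative")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # About 1.96 for 95% confidence and about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * sem * math.sqrt(2.0)


sem = 1.5  # hypothetical SEM, in the instrument's score units
mdc95 = minimal_detectable_change(sem)        # 95% confidence (MDC95)
mdc90 = minimal_detectable_change(sem, 0.90)  # 90% confidence (MDC90)
print(round(mdc95, 2), round(mdc90, 2))  # 4.16 3.49
```

With these illustrative numbers, a patient's score would need to change by more than about 4.2 points before the change could be distinguished from measurement error at the 95% level.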

Minimal Clinically Important Difference (MCID): MCID represents the smallest amount of change in an outcome that might be considered important by the patient or clinician. 

Clinical Bottom Line: The MCID is a published value of change in an instrument that indicates the minimum amount of change required for your patient to feel a difference in the variable you are measuring.

Test-retest Reliability: Establishes that an instrument is capable of measuring a variable with consistency.

Clinical Bottom Line: If you are planning to use an instrument for individual decision-making, it is recommended that you use an instrument with an ICC > 0.9. If you are planning to use the instrument to measure progress of a large group (as in research), an instrument with an ICC > 0.7 is acceptable. 

Interrater Reliability: Determines variation between two or more raters who measure the same group of subjects.

Clinical Bottom Line:
Excellent Reliability: ICC > 0.75;
Adequate Reliability: ICC 0.40 to 0.74;
Poor Reliability: ICC < 0.40
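These criteria can be applied with a small helper, sketched below in Python. The function name is hypothetical. The criteria do not say how to treat values that land exactly on a boundary, so this sketch assumes that anything from 0.40 up to and including 0.75 counts as Adequate.

```python
def rate_icc(icc: float) -> str:
    """Rate an ICC using the interrater reliability criteria listed above."""
    if icc > 0.75:
        return "Excellent"
    if icc >= 0.40:
        # Assumption: boundary values (e.g., exactly 0.75) are rated Adequate.
        return "Adequate"
    return "Poor"


# Hypothetical ICC values, for illustration only.
for icc in (0.92, 0.60, 0.25):
    print(icc, rate_icc(icc))
```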

Intrarater Reliability: Determines stability of data recorded by one individual across two or more trials. See Interrater Reliability Criteria.

Internal Consistency: The extent to which items in the same instrument all measure the same trait. Typically measured using Cronbach's alpha.

Clinical Bottom Line:
Excellent: Cronbach's alpha > .8;
Adequate: Cronbach's alpha > .7 and < .8;
Poor: Cronbach's alpha < .7
Scores higher than .9 may indicate redundancy in the scale questions.
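For readers who want to see how Cronbach's alpha is obtained, here is a minimal Python sketch using the standard formula: alpha = k / (k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function names and the response data are hypothetical. How to rate values exactly equal to .7 or .8 is not stated above, so this sketch assumes they fall into the lower category.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Compute Cronbach's alpha; responses[r][i] is respondent r's score on item i."""
    n_items = len(responses[0])
    if len(responses) < 2 or n_items < 2:
        raise ValueError("need at least two respondents and two items")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have no variance; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


def rate_alpha(alpha: float) -> str:
    """Rate alpha with the criteria above (boundary handling is an assumption)."""
    if alpha > 0.8:
        rating = "Excellent"
    elif alpha > 0.7:
        rating = "Adequate"
    else:
        rating = "Poor"
    if alpha > 0.9:
        rating += " (possible item redundancy)"
    return rating


# Hypothetical responses: five respondents, three items.
responses = [[3, 4, 3], [2, 2, 3], [5, 4, 5], [1, 2, 1], [4, 5, 4]]
alpha = cronbach_alpha(responses)
print(round(alpha, 2), rate_alpha(alpha))
```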

Predictive Validity: Indicates that the outcomes of an instrument predict a future state or outcome.

Clinical Bottom Line: 
Excellent:  correlation coefficient > 0.6;
Adequate: correlation coefficient 0.31 - 0.59;
Poor:  correlation coefficient < 0.30

For Receiver Operating Characteristic (ROC) analysis (area under the curve):
Excellent: > 0.9;
Adequate: 0.7 - 0.89;
Poor: < 0.7

Concurrent Validity: Establishes validity when two measures are taken at relatively the same time. It often indicates that the test could be used instead of a gold standard. See Predictive Validity Criteria.

Convergent Validity: Convergent validity refers to the degree to which two measures demonstrate similar results.  For example, a new measure may assess gait speed using a new technique.  Validation of this new measure would include outcomes obtained from established measures of gait speed.  The degree to which these two assessments of gait speed converge provides evidence of the new measure's validity. See Predictive Validity Criteria

Discriminant Validity: Discriminant validity is the degree to which two or more measures, assessing theoretically different constructs, demonstrate a difference in outcomes. Discriminant validity evidence is commonly gathered during test validation to ensure that two or more measures are NOT assessing the same underlying trait or dimension. See Predictive Validity Criteria.

Clinical Bottom Line: High correlations between measures (greater than .90) indicate the measures are assessing the same domain and may be redundant.

Content Validity: The extent to which the items that make up an instrument adequately sample the universe of possible items that compose the construct being measured. Typically assessed by measuring agreement between Subject Matter Experts (SMEs), although several other techniques can also be used.

Construct Validity: Establishes the ability of an instrument to measure an abstract concept and the degree to which the instrument reflects the theoretical components of that concept.

Includes convergent and discriminant validity. Construct validity is assessed using several lines of evidence, including content-, construct-, and criterion-related validity. Construct validity is a property of the inferences regarding the use of a measure as opposed to a property of the measure itself.

Floor Effects: Floor effects occur when a measure's lowest score is unable to assess a patient's level of ability. For example, a measure that assesses caregiver depression may not be sensitive enough to assess low or intermittent levels of depression among caregivers.

Clinical Bottom Line:
Excellent: No floor effects;
Adequate: Floor effects < 20%;
Poor: Floor effects > 20%

Ceiling Effects: Ceiling effects occur when a measure's highest score is unable to assess a patient's level of ability. This might be particularly common for measures used over multiple occasions. For example, a patient's pre-rehab score may be in range at the initial evaluation, but over time the patient's ability comes to exceed the measure's highest score. The measure is then unable to accurately assess progress as the patient improves.

Clinical Bottom Line: 
Excellent:  No ceiling effects;
Adequate: Ceiling effects < 20%;
Poor: Ceiling effects > 20%
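The floor and ceiling criteria can be checked against a set of scores with a short Python sketch. It assumes the percentages refer to the share of respondents who score exactly at the lowest (floor) or highest (ceiling) possible score, a common convention that this page does not state explicitly. The function names and scores are hypothetical, and a value of exactly 20% is treated as Poor by assumption.

```python
from typing import Sequence


def percent_at_score(scores: Sequence[float], boundary_score: float) -> float:
    """Percentage of respondents scoring exactly at the given boundary score."""
    if not scores:
        raise ValueError("scores must not be empty")
    count = sum(1 for s in scores if s == boundary_score)
    return 100.0 * count / len(scores)


def rate_effect(percent: float) -> str:
    """Rate a floor or ceiling effect with the criteria above."""
    if percent == 0:
        return "Excellent"
    if percent < 20:
        return "Adequate"
    return "Poor"  # Assumption: exactly 20% is rated Poor.


# Hypothetical scores on an instrument that ranges from 0 to 10.
scores = [0, 3, 5, 7, 10, 10, 10, 8, 6, 4]
floor_pct = percent_at_score(scores, boundary_score=0)
ceiling_pct = percent_at_score(scores, boundary_score=10)
print(floor_pct, rate_effect(floor_pct))      # 10.0 Adequate
print(ceiling_pct, rate_effect(ceiling_pct))  # 30.0 Poor
```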



Andresen, E. M. (2000). "Criteria for assessing the tools of disability outcomes research." Arch Phys Med Rehabil 81(12 Suppl 2): S15-20. 

Fitzpatrick, R., Davey, C., et al. (1998). "Evaluating patient-based outcome measures for use in clinical trials." Health Technol Assess 2(14): i-iv, 1-74. 

Portney, L., Watkins, M., et al. (2000). Foundations of clinical research: applications to practice, Prentice Hall Upper Saddle River, NJ.

Messick, S. (1995). "Standards of validity and the validity of standards in performance assessment." Educational Measurement: Issues and Practice 14(4).
