A few words about the data:
We mentioned having the anonymized class profiles for the classes of 2000-2020. We obtained these through proper request channels to the US Military Academy and received them as Excel files, one file per class.
The files had no explanation for a number of codes, so we made reasonable assumptions about what some of the coding meant. “M” for Male is not a stretch. The “Data dictionary” is below. Additional errata published April 2022 [here].
Field | Description |
cdt_id | Applicant profile identifier, e.g. “m_00023” where “m” represents the year, and the number is the sequential identifier. |
apply_clyr | The class year that the applicant applied for |
prnt_income_cd | Numerical series seeming to range from 5 to 22. It’s unclear exactly what this meant, so we didn’t use it much. |
pr_sex_cd | Sex code – M or F, male or female. |
pr_race_pop_cd | Race population code – A, B, H, N, O, W; we assume this represents Asian, Black, Hispanic, Native American, Other, and White. |
sat_math | SAT Math scores* |
sat_vrbl | SAT Verbal scores* |
sat_write | SAT Writing scores* |
act_eng | ACT English scores* |
act_math | ACT Math scores* |
act_read | ACT Reading scores* |
act_sci_rsn | ACT Sci/Reasoning scores* |
act_write | ACT Writing scores* |
cand_cum_gpa | High school cumulative GPA. Not used due to widely varying GPA methodologies and curricula. |
grd_scale | High school grade scale; again, not used here |
class_rank | High school class rank |
class_size | High school class size |
hs_sport_1 | High school sports and letters |
hs_sport_1_ltr | High school sports and letters |
hs_sport_2 | High school sports and letters |
hs_sport_2_ltr | High school sports and letters |
hs_sport_3 | High school sports and letters |
hs_sport_3_ltr | High school sports and letters |
prior_svc_flg | Flag denoting prior military service |
civil_prep_stat_cd | Unclear; denoted in “C” and “D”s; not used |
usmaps_stat_cd | US Military Academy Prep School; not used, though it could potentially be correlated with entry scores, GPAs, and Prior Service. |
pae_stat_cd | Physical Aptitude Examination status code |
applicant_qualified_flg | Whether applicant was deemed “qualified”; we assume this means academically, physically, and with nomination. It’s unclear whether this meant an offer was extended. |
applicant_admitted_flg | Whether applicant was Admitted. Surprisingly, all values were “N” for this column, even for cadets with GPAs and other indicators of attendance, indicating faulty query or data. |
yrcpt_11 | Years as team captain? null values |
national_merit_schlr_flg | National Merit Scholar flag; unclear what levels of Nat’l Merit this represents. Only shows “N” for classes through 2010. Large variances in later years indicate a change in methodology, so not used here. |
prnt_usma_grad_flg | Parent as USMA grad flag. Disregarded here, though one could analyze legacy admissions’ relationship to performance. |
cqpa | Cumulative Quality Point Average; weighted average of Academic, Military, and Physical performance. Used as a proxy for overall performance at the Academy. |
caps | Cumulative Academic Point Score; measure of Academic performance |
cmps | Cumulative Military Point Score; measure of Military performance |
cpps | Cumulative Physical Point Score; measure of Physical performance |
STAP_attnd_flg | Summer Term Academic Program attendance flag |
ay1_cdtco | Academic year cadet company |
ay2_cdtco | Academic year cadet company |
ay3_cdtco | Academic year cadet company |
ay4_cdtco | Academic year cadet company |
ay5_cdtco | Academic year cadet company. Null for all sets. |
ay6_cdtco | Academic year cadet company. Null for all sets. |
fos_maj_1_cd | Major or Field of Study code |
basic_br_cd | Basic Branch code |
usma_stat_cd | A, C, G, S, X; We don’t know what “A” stands for; “C” seems to be “Cadet”, “G” graduated, “S” Separated, “X”… unknown. |
sep_rsnnm | Separation reason codes |
actvy_nm | Activity name, such as sports, clubs, or other activities |
days_in_actvy | Number of days in the activity; Some numbers are negative, which was unexplained. |
actvy_typ | Activity type; IM for Intramural; CL for Club; CS for Corps Squad (Varsity athletics); and others |
appl_rcvd_dt | Date that the application was received |
eagle_gold_flg | Flag for Eagle Scout or Gold Award receipt |
class_presdnt_flg | Flag for whether an applicant was a Class President |
*In a number of our analyses we use mostly SAT scores, with some ACT scoring for specific questions, because almost all cadets took the SATs and fewer took the ACTs (27.7K vs 22.3K respectively). Both tests have changed their scoring over the years. In 2005, the SAT went from a 1600-point scale to a 2400-point scale and added a writing section. In 2016, it stopped penalizing takers for incorrect answers and moved from the 2400-point scale back to a 1600-point scale, covering evidence-based reading comprehension, math, and an optional essay. The ACT changed scoring in 2013, including standard and extended-time scores together in its grade scales. For this reason, score comparisons between classes on either side of those changes are not quite apples-to-apples; however, we’ve used only Math and Verbal as reasonable proxies for aptitude comparisons.
Other notes:
It seems there were some changes in coding through the years; for example, some separation codes may have changed or been added through the class years, so in some cases of analysis we removed classes where we thought the data was choppy. For example, the percentage of National Merit Scholar flags for cadets went from ~1% in 2012 to ~15% for 2013 and up to 20% by 2020. Clearly, something changed either at the Academy or with National Merit recognition, and it wasn’t that the Academy was admitting 1400% more 99th-percentile students every year.
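One way to spot this kind of coding change is to compute the share of flagged cadets per class year and look for sudden jumps. The sketch below is illustrative only and is not the code we ran. It assumes the rows have been exported from Excel into dictionaries keyed by the field names in the data dictionary, and that a positive flag is coded “Y” (we only know for certain that “N” appears). The function name and the sample rows are made up.

```python
from collections import defaultdict


def flag_share_by_class(rows, flag_field="national_merit_schlr_flg",
                        year_field="apply_clyr", yes_value="Y"):
    """Return {class year: fraction of rows whose flag equals yes_value}."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for row in rows:
        year = row[year_field]
        totals[year] += 1
        # Assumption: a positive flag is stored as "Y"; adjust yes_value if not.
        if row.get(flag_field) == yes_value:
            flagged[year] += 1
    return {year: flagged[year] / totals[year] for year in sorted(totals)}


# Made-up rows for illustration only; these are not values from the real files.
sample_rows = [
    {"apply_clyr": 2012, "national_merit_schlr_flg": "N"},
    {"apply_clyr": 2012, "national_merit_schlr_flg": "N"},
    {"apply_clyr": 2012, "national_merit_schlr_flg": "N"},
    {"apply_clyr": 2012, "national_merit_schlr_flg": "Y"},
    {"apply_clyr": 2013, "national_merit_schlr_flg": "Y"},
    {"apply_clyr": 2013, "national_merit_schlr_flg": "N"},
]

shares = flag_share_by_class(sample_rows)
for year, share in shares.items():
    print(year, f"{share:.0%}")
```

The same function works for any other flag column by changing flag_field, which makes it easy to check several fields for breaks between class years.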
We also got data on summer activities such as Military Individual Advanced Development (MIAD), Individual Advanced Development (IAD), and some miscellaneous activities. We did not use these in our assessments (yet!), although no doubt there are interesting facts in the data about whom USMA sent where during the summers, and how.
As needed, we also added some fields for analysis purposes. For example, since the “Admitted” flag had only “N” values, we created an “Attended” flag column, using conditional logic on whether the cadet was assigned a first-year Academic Year company or had a separation reason; that is, the cadet attended West Point long enough to get a separation or an AY company. We’ll mention these as needed on any particular topics.
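The conditional logic for the “Attended” flag can be sketched roughly as follows. This is a minimal illustration, not our actual workbook formula. It assumes that ay1_cdtco holds the first-year company, that sep_rsnnm holds the separation reason, and that missing values show up as empty or blank cells. The column name attended_flg, the helper names, and the sample values are hypothetical.

```python
def is_blank(value):
    """Treat None, empty strings, and whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def add_attended_flag(rows):
    """Return copies of the rows with a derived 'attended_flg' of 'Y' or 'N'.

    A cadet counts as having attended if either a first-year company
    (ay1_cdtco) or a separation reason (sep_rsnnm) is present.
    """
    flagged = []
    for row in rows:
        has_company = not is_blank(row.get("ay1_cdtco"))
        has_separation = not is_blank(row.get("sep_rsnnm"))
        attended = has_company or has_separation
        flagged.append({**row, "attended_flg": "Y" if attended else "N"})
    return flagged


# Made-up rows for illustration only; identifiers and codes are not real data.
sample = [
    {"cdt_id": "x_00001", "ay1_cdtco": "A1", "sep_rsnnm": ""},
    {"cdt_id": "x_00002", "ay1_cdtco": "", "sep_rsnnm": "Example reason"},
    {"cdt_id": "x_00003", "ay1_cdtco": None, "sep_rsnnm": "  "},
]

result = add_attended_flag(sample)
for row in result:
    print(row["cdt_id"], row["attended_flg"])
```

Returning new dictionaries instead of editing the rows in place keeps the original data untouched, so derived fields like this one stay easy to tell apart from what the Academy provided.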
We are planning on posting this data after we get through examining some of the specific claims made by Heffington and USMA. Which will start, by the way, in the next post.
Thoughtful criticism and factual corrections are welcome.