About the Data

A few words about the data:

We mentioned having the anonymized class profiles for the classes of 2000-2020. We obtained these through proper request channels from the US Military Academy and received them as Excel files, one file per class.

The files had no explanation for a number of codes, so we made reasonable assumptions about what some of the coding meant; “M” for Male is not a stretch. The “Data dictionary” is below. Additional errata were published in April 2022; see the update at the end of this section.

| Field | Description |
|---|---|
| cdt_id | Applicant profile identifier, e.g. “m_00023”, where “m” represents the year and the number is a sequential identifier. |
| apply_clyr | The class year the applicant applied for |
| prnt_income_cd | Numeric code that seems to range from 5 to 22. It’s unclear exactly what this meant, so we didn’t use it much. |
| pr_sex_cd | Sex code: M or F (male or female) |
| pr_race_pop_cd | Race population code: A, B, H, N, O, W; we assume these represent Asian, Black, Hispanic, Native American, Other, and White. |
| sat_math | SAT Math scores* |
| sat_vrbl | SAT Verbal scores* |
| sat_write | SAT Writing scores* |
| act_eng | ACT English scores* |
| act_math | ACT Math scores* |
| act_read | ACT Reading scores* |
| act_sci_rsn | ACT Science/Reasoning scores* |
| act_write | ACT Writing scores* |
| cand_cum_gpa | High school cumulative GPA. Not used, due to widely varying GPA methodologies and curricula. |
| grd_scale | High school grade scale; again, not used here |
| class_rank | High school class rank |
| class_size | High school class size |
| hs_sport_1 | High school sports and letters |
| hs_sport_1_ltr | High school sports and letters |
| hs_sport_2 | High school sports and letters |
| hs_sport_2_ltr | High school sports and letters |
| hs_sport_3 | High school sports and letters |
| hs_sport_3_ltr | High school sports and letters |
| prior_svc_flg | Flag denoting prior military service |
| civil_prep_stat_cd | Unclear; coded as “C” or “D”; not used |
| usmaps_stat_cd | US Military Academy Preparatory School code; not used, though it could potentially correlate with entry scores, GPAs, and prior service. |
| pae_stat_cd | Physical Aptitude Examination status code |
| applicant_qualified_flg | Whether the applicant was deemed “qualified”; we assume this means academically, physically, and with a nomination. It’s unclear whether this meant an offer was extended. |
| applicant_admitted_flg | Whether the applicant was admitted. Surprisingly, all values in this column were “N”, even for cadets with GPAs and other indicators of attendance, which points to a faulty query or faulty data. |
| yrcpt_11 | Years as team captain? Null values. |
| national_merit_schlr_flg | National Merit Scholar flag; unclear which level of National Merit recognition this represents. Only shows “N” for classes through 2010. Large variances in later years indicate a change in methodology, so not used here. |
| prnt_usma_grad_flg | Flag for a parent who is a USMA graduate. Disregarded here, though one could analyze the relationship between legacy admissions and performance. |
| cqpa | Cumulative Quality Point Average; weighted average of academic, military, and physical performance. Used as a proxy for overall performance at the Academy. |
| caps | Cumulative Academic Point Score; measure of academic performance |
| cmps | Cumulative Military Point Score; measure of military performance |
| cpps | Cumulative Physical Point Score; measure of physical performance |
| STAP_attnd_flg | Summer Term Academic Program attendance flag |
| ay1_cdtco | Academic year 1 cadet company |
| ay2_cdtco | Academic year 2 cadet company |
| ay3_cdtco | Academic year 3 cadet company |
| ay4_cdtco | Academic year 4 cadet company |
| ay5_cdtco | Academic year 5 cadet company. Null for all sets. |
| ay6_cdtco | Academic year 6 cadet company. Null for all sets. |
| fos_maj_1_cd | Major or field of study code |
| basic_br_cd | Basic branch code |
| usma_stat_cd | A, C, G, S, X. We don’t know what “A” stands for; “C” seems to be “Cadet”, “G” “Graduated”, and “S” “Separated”; “X” is unknown. |
| sep_rsnnm | Separation reason codes |
| actvy_nm | Activity name, such as sports, clubs, or other activities |
| days_in_actvy | Number of days in the activity; some values are negative, which was unexplained. |
| actvy_typ | Activity type: IM for Intramural, CL for Club, CS for Corps Squad (varsity athletics), and others |
| appl_rcvd_dt | Date the application was received |
| eagle_gold_flg | Flag for Eagle Scout or Gold Award receipt |
| class_presdnt_flg | Flag for whether the applicant was a class president |

*In most of our analyses we use SAT scores, plus some ACT scores for specific questions, because more cadets took the SAT than the ACT (27.7K vs 22.3K, respectively). Both tests changed their scoring during this period, so score comparisons across those timeframes are not quite apples-to-apples; we therefore use only Math and Verbal as reasonable proxies for aptitude comparisons. In 2005, the SAT moved from a 1600-point scale to a 2400-point scale by adding a Writing section. In 2016, it stopped penalizing test-takers for incorrect answers and returned to a 1600-point scale, with Evidence-Based Reading and Writing, Math, and an optional essay. The ACT changed its scoring in 2013 to include standard and extended-time scores together in its scales.
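To make the coded fields and the SAT note above concrete, here is a minimal Python sketch (standard library only). It decodes letter codes using *our assumed* meanings from the data dictionary, splits a cdt_id into its year letter and sequence number, and builds a Math + Verbal composite that ignores sat_write. The dictionary names, the helper functions, and the assumption that each record is a Python dict with None for blank cells are all hypothetical illustrations, not part of the USMA files.

```python
import re
from typing import Optional

# ASSUMED meanings for code letters; USMA did not supply definitions.
SEX = {"M": "Male", "F": "Female"}
RACE_POP = {"A": "Asian", "B": "Black", "H": "Hispanic",
            "N": "Native American", "O": "Other", "W": "White"}
# "A" and "X" are unknown, so they are deliberately left unmapped.
USMA_STAT = {"C": "Cadet", "G": "Graduated", "S": "Separated"}

# Assumed cdt_id shape, e.g. "m_00023": year letter(s), underscore, sequence number.
CDT_ID = re.compile(r"^([A-Za-z]+)_(\d+)$")


def decode(code: str, mapping: dict) -> str:
    """Return the assumed meaning of a code, or 'Unknown (<code>)' if we have no guess."""
    return mapping.get(code, f"Unknown ({code})")


def parse_cdt_id(cdt_id: str) -> tuple:
    """Split an ID like 'm_00023' into its year letter and integer sequence number."""
    match = CDT_ID.match(cdt_id)
    if match is None:
        raise ValueError(f"unexpected cdt_id format: {cdt_id!r}")
    return match.group(1), int(match.group(2))


def sat_math_verbal(record: dict) -> Optional[int]:
    """SAT Math + Verbal, ignoring sat_write.

    Math and the verbal section are each scored 200-800 on every SAT version
    in this period, so the sum stays in 400-1600 whether or not a Writing
    score exists. Returns None when either section is missing (e.g. ACT-only).
    """
    math, verbal = record.get("sat_math"), record.get("sat_vrbl")
    if math is None or verbal is None:
        return None
    return int(math) + int(verbal)


print(parse_cdt_id("m_00023"))   # ('m', 23)
print(decode("W", RACE_POP))     # White
print(decode("X", USMA_STAT))    # Unknown (X)
print(sat_math_verbal({"sat_math": 650, "sat_vrbl": 620, "sat_write": 600}))  # 1270
print(sat_math_verbal({"sat_math": 700}))                                      # None
```

Unknown codes are returned as "Unknown (...)" rather than guessed, so any downstream tally keeps them visible instead of silently folding them into another category.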

Other notes:

It seems the coding changed over the years; for example, some separation codes may have been changed or added across class years, so in some analyses we removed classes where we thought the data was inconsistent. Similarly, the percentage of cadets flagged as National Merit Scholars went from ~1% in 2012 to ~15% in 2013 and up to 20% by 2020. Clearly, something changed either at the Academy or with National Merit recognition; it wasn’t that the Academy was suddenly admitting 1400% more 99th-percentile students.
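One way to spot coding changes like the National Merit jump is to compute a flag’s rate per class year and look for abrupt multiples between adjacent years. The sketch below assumes records are dicts, that flagged values are stored as "Y", and that apply_clyr identifies the class; the function names and the 5x threshold are hypothetical choices, and the sample data is synthetic, not real USMA data.

```python
from collections import defaultdict


def flag_rate_by_year(records, flag_field="national_merit_schlr_flg", year_field="apply_clyr"):
    """Percentage of records per class year whose flag equals 'Y'."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for rec in records:
        year = rec[year_field]
        totals[year] += 1
        if rec.get(flag_field) == "Y":
            flagged[year] += 1
    return {year: 100.0 * flagged[year] / totals[year] for year in sorted(totals)}


def suspicious_jumps(rates, factor=5.0):
    """Adjacent years in the data where the rate grows by more than `factor` times."""
    years = sorted(rates)
    return [
        (prev, cur)
        for prev, cur in zip(years, years[1:])
        # Skip years with a 0% rate (e.g. all-"N" classes) to avoid dividing by zero.
        if rates[prev] > 0 and rates[cur] / rates[prev] > factor
    ]


# Synthetic example: 1 of 100 flagged in 2012, 15 of 100 flagged in 2013.
sample = (
    [{"apply_clyr": 2012, "national_merit_schlr_flg": "Y" if i < 1 else "N"} for i in range(100)]
    + [{"apply_clyr": 2013, "national_merit_schlr_flg": "Y" if i < 15 else "N"} for i in range(100)]
)
rates = flag_rate_by_year(sample)
print(rates)                    # {2012: 1.0, 2013: 15.0}
print(suspicious_jumps(rates))  # [(2012, 2013)]
```

A flagged pair is only a prompt to look closer; deciding whether to drop those classes from an analysis is still a judgment call.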

We also got data on summer activities such as Military Individual Advanced Development (MIAD), Individual Advanced Development (IAD), and some miscellaneous activities. We have not used these in our assessments (yet!), although there are no doubt interesting facts in the data about whom USMA sent where during the summers, and how.

As needed, we also added some fields for analysis purposes. For example, since the “Admitted” flag had only “N” values, we created an “Attended” flag, set when the cadet was assigned a first-year (AY1) cadet company or had a separation reason. Either one means the cadet actually attended West Point long enough to be assigned a company or to be separated. We’ll describe these added fields as they come up in particular topics.
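The “Attended” rule described above can be sketched in a few lines of Python. This is an illustration of the stated logic, not our exact spreadsheet formula; it assumes records are dicts keyed by the field names in the data dictionary (ay1_cdtco, sep_rsnnm), and it treats None and blank strings as missing. The helper names are hypothetical.

```python
def is_blank(value):
    """Treat None, empty strings, and whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def attended_flag(record):
    """'Y' if the cadet has an AY1 cadet company or a separation reason, else 'N'."""
    has_company = not is_blank(record.get("ay1_cdtco"))
    has_separation = not is_blank(record.get("sep_rsnnm"))
    return "Y" if has_company or has_separation else "N"


# Hypothetical records; the company code below is made up for illustration.
print(attended_flag({"ay1_cdtco": "A1", "sep_rsnnm": None}))  # Y
print(attended_flag({"ay1_cdtco": None, "sep_rsnnm": ""}))    # N
```

Returning "Y"/"N" keeps the new column consistent with the other *_flg fields in the files.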

Update: We also found duplicate records across different years: records with different candidate IDs but the *exact same record data*. We discovered this relatively late in our analysis, but anyone wanting to use the data should read this first: https://usmadata.com/2022/04/07/housekeeping-data-errata/.
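For readers checking their own copy, a minimal way to find such duplicates is to fingerprint each record on every field except the candidate ID and group IDs that share a fingerprint. This Python sketch assumes records are dicts with hashable values; the function name and the ignore_fields parameter are hypothetical, and the sample records are synthetic.

```python
from collections import defaultdict


def find_duplicate_groups(records, ignore_fields=("cdt_id",), id_field="cdt_id"):
    """Return lists of IDs whose records match on every field not in ignore_fields."""
    groups = defaultdict(list)
    for rec in records:
        # Hashable fingerprint of the record; sorting makes field order irrelevant.
        key = tuple(sorted((k, v) for k, v in rec.items() if k not in ignore_fields))
        groups[key].append(rec[id_field])
    return [ids for ids in groups.values() if len(ids) > 1]


# Synthetic records: the first two differ only in cdt_id.
records = [
    {"cdt_id": "m_00001", "sat_math": 700, "pr_sex_cd": "M"},
    {"cdt_id": "n_00450", "sat_math": 700, "pr_sex_cd": "M"},
    {"cdt_id": "m_00002", "sat_math": 650, "pr_sex_cd": "F"},
]
print(find_duplicate_groups(records))  # [['m_00001', 'n_00450']]
```

If you add a column recording which class file a row came from, include it in ignore_fields; otherwise it will keep cross-year copies from matching.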

We plan to post this data after we finish examining some of the specific claims made by Heffington and USMA, which will start, by the way, in the next post.

Thoughtful criticism and factual corrections are welcome.

Discover more from usmaData
