I’ve recently been dipping my toe into the world of data science (just like everyone else, it seems), and I wanted a low-stakes personal project to hone my skills outside of a structured tutorial. Since it’s fall, I decided to tackle (haha) a homegrown college football ranking system.

For those unfamiliar with the sport: in the United States, the biggest game around in terms of excitement and raw viewership is American football. Children start playing young and, if they have the talent and the desire, progress up the ladder through high school and on to university teams. The very best players can be drafted into the National Football League (NFL) and play at the highest level of the sport in the world.

College football is regulated by a non-profit called the National Collegiate Athletic Association, or NCAA, which oversees sports programs ranging from football to basketball to bowling at over 1200 American institutions of higher learning. To promote competitive parity, schools are grouped into three divisions and generally compete only against teams from their own division. Division I football is further subdivided into the Football Bowl Subdivision (FBS) and the Football Championship Subdivision (FCS). Generally speaking, FBS programs are larger and better funded, with more talented athletes and stronger coaching staffs.

Within the FBS, teams are grouped into regional conferences to limit travel distances and keep schedules somewhat consistent from season to season. That consistency creates compelling rivalries and storylines between neighboring teams, but it also limits the number of head-to-head matchups between teams from different conferences each year. Combined with a short season of 12 to 13 games, this makes objectively determining the best teams in the country extremely difficult. Analysts and fans typically rely on a combination of game statistics and expert opinion to rank teams, with plenty of heated discussion along the way.

Since 1936, the gold standard for college football rankings has been the weekly top 25 poll conducted by the Associated Press. The AP poll asks 65 sportswriters and broadcasters to rank their top 25 college teams and compiles those ballots into a single ranking. Until the introduction of the Bowl Championship Series in 1998, the de facto national champion was the team at the top of the AP poll at the end of the season. Under the BCS, the AP poll still played a role in selecting the two teams that met in the championship game, but it was only one component of a complex and often confusing system.
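For the curious, here is a minimal Python sketch of how dozens of individual ballots can be compiled into one list using a positional-points (Borda-style) tally. It assumes the commonly described AP scheme in which a first-place vote is worth 25 points, second place 24, and so on down to 1 point for 25th. The function name `aggregate_ballots` and the sample ballots are hypothetical, and the real poll's tie-breaking rules are not modeled here.

```python
from collections import defaultdict


def aggregate_ballots(ballots, ballot_size=25):
    """Combine ranked ballots into a single ranking using positional points.

    Assumption (not taken from any official AP rulebook): a team listed first
    earns `ballot_size` points, second earns ballot_size - 1, and so on down
    to 1 point for the last slot on the ballot.
    """
    points = defaultdict(int)
    for ballot in ballots:
        # Only the first `ballot_size` entries on each ballot count.
        for position, team in enumerate(ballot[:ballot_size]):
            points[team] += ballot_size - position
    # Highest total first; ties broken alphabetically so the output is stable.
    return sorted(points.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical three-voter poll with three-team ballots.
ballots = [
    ["Team A", "Team B", "Team C"],
    ["Team B", "Team A", "Team D"],
    ["Team A", "Team D", "Team B"],
]
print(aggregate_ballots(ballots, ballot_size=3))
# [('Team A', 8), ('Team B', 6), ('Team D', 3), ('Team C', 1)]
```

A points tally like this rewards consensus: a team that appears near the top of every ballot can outscore a team with a few first-place votes but scattered placements elsewhere.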

The AP and other human polls were combined with a blend of computer ranking algorithms to produce a final ranked list at the end of the regular season. The system’s complexity generated enormous controversy throughout the BCS’s lifetime, as fans and schools often felt that their team had been snubbed or overlooked because of some shortcoming in the ranking model. In 2014 the BCS was formally retired in favor of the College Football Playoff, a four-team, single-elimination bracket seeded by a thirteen-member selection committee.

In creating Forward Progress, I wanted to return to the spirit of the BCS computer rankings: ranking teams using nothing but objective game data, with no human input. My methods for generating the rankings are discussed in another post for those interested in the technical details. Otherwise, be sure to check back week by week to see how the Forward Progress algorithm performs.