Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer’s view in any way.

ETL with Python and Pandas for Soccer Scheduling

posted on August 29, 2021 | tags: [ data science, soccer, python ]
soccer field
Moving SJSL to the new GotSport platform for scheduling required a solution to ETL data to the Arbiter platform for referees

New Platforms, New Features, Missing Cheese

As the Information Officer for the South Jersey Soccer League, I'm tasked with making sure our platforms run and run well. SJSL has over 500 teams from over 70 clubs in South Jersey. Our clubs and teams depend on these systems for registration, game schedules, referees and week to week operations to get their players on the pitch.

This Fall, our platform provider for registration, schedules and clubs, GotSport required SJSL to move off their legacy GotSoccer platform to their new modernized system. This was a heavy lift for our scheduling process. Scheduling works with data provided from each team registration, each club, each venue and their related fields to create an optimal schedule.

To assign referees to this schedule, our assignors use a referee scheduling platform, Arbiter and GotSoccer, the legacy system was integrated with this platform. From GotSoccer, the schedule could be synchronized at the press of a button.

That cheese was moved. No such feature to synchronize between GotSport and Arbiter.

But ...

Arbiter has an import feature and I can export the schedule from GotSport. The tricky part is the Arbiter import validates the upload and Sites, Subsites - Venues and Pitches or Fields, need to map to the Arbiter data. I need to get 2549 games into Arbiter for the referee assignors to start assigning referees.

Python and Pandas to the rescue.


SJSL GotSport Toolbox - ETL

Need to clean up code for public consumption

Points of interest:

  1. pseudo code the process
  2. reading excel files
  3. replace with dictionary and set_index
  4. merge 2 dataframes
  5. clean up key values
  6. add and remove columns for the export
  7. date value with insert_space from lambda
  8. final export to_csv using timestamp to rerun as needed

Results from the ETL

Total Category Expected Actual
Total Exported from GotSport 2549
Total Import Actual 2488
Total Import Errors 26
TBR (12/24) 24
Missing 11