It's my first time working with databases so I spent a bunch of hours reading and watching videos. The data I am analyzing is a limited set of marathon data, and the goal is to produce statistics on each runner.
I am looking for advice and suggestions on my database design as well as how I might go about producing statistics. Please see this image for my proposed design:
Basically, I'm thinking there's a many-to-many relationship between Races and Runners: there are multiple runners in a race, and a runner can have run multiple races. Therefore, I have the bridge table called Race_Results to store the time and age for a given runner in a given race.
The Statistics table is what I'm looking to get to in the end. In the image are just some random things I may want to calculate.
So my questions are:
-
Does this design make sense? What improvements might you make?
-
What kinds of SQL queries would be used to calculate these statistics? Would I have to make some other tables in between - for example, to find the percentage of the time a runner finished within 10 minutes of first place, would I have to first make a table of all runner data for that race and then do some queries, or is there a better way? Any links I should check out for more on calculating these sorts of statistics?
-
Should I possibly be using python or another language to get these statistics instead of SQL? My understanding was that SQL has the potential to cut down a few hundred lines of python code to one line, so I thought I'd try to give it a shot with SQL.
Thanks!
Aucun commentaire:
Enregistrer un commentaire