mardi 4 août 2015

SQL Database design for statistical analysis of many-to-many relationship

It's my first time working with databases so I spent a bunch of hours reading and watching videos. The data I am analyzing is a limited set of marathon data, and the goal is to produce statistics on each runner.

I am looking for advice and suggestions on my database design as well as how I might go about producing statistics. Please see this image for my proposed design:

My Design

Basically, I'm thinking there's a many-to-many relationship between Races and Runners: there are multiple runners in a race, and a runner can have run multiple races. Therefore, I have the bridge table called Race_Results to store the time and age for a given runner in a given race.

The Statistics table is what I'm looking to get to in the end. In the image are just some random things I may want to calculate.

So my questions are:

  1. Does this design make sense? What improvements might you make?

  2. What kinds of SQL queries would be used to calculate these statistics? Would I have to make some other tables in between - for example, to find the percentage of the time a runner finished within 10 minutes of first place, would I have to first make a table of all runner data for that race and then do some queries, or is there a better way? Any links I should check out for more on calculating these sorts of statistics?

  3. Should I possibly be using python or another language to get these statistics instead of SQL? My understanding was that SQL has the potential to cut down a few hundred lines of python code to one line, so I thought I'd try to give it a shot with SQL.

Thanks!

Aucun commentaire:

Enregistrer un commentaire