Stay tuned!

26 Jun

It has been nearly a full year (to the day) since my last post. I am still enjoying a steady stream of visitors to the few posts I have, which is pleasing: it seems some people have found them useful.

As many of you know, full-time work is time-consuming, especially when that work is also your hobby and passion. Even in my ‘off-hours’ I find myself returning to work-related problems and not spending as much time on graphing as I would like. It is not that I don’t find it interesting anymore; it is simply that other interests have been taking up my time.

However, I have had a number of ideas brewing and have received a whole heap of submissions from readers. I’ve also learnt a few more programming languages in the last year, which makes it so much easier to access online content (e.g. Python is amazing). I’ve also been working on a whole bunch of semantic libraries which parse information into a meaningful structure. Rather than studying the networks of Wikipedia, I’ve designed them to crawl through the search results of various engines. This comes with its own challenges, so it has taken me a long time to get into a position where I can create something unique.

I will be posting these items in the coming month, so stay tuned. Thanks to everyone for your comments: I read *all* of them and respond to any questions you’ve been having.

I’ll see you all soon.

Griff

The Graphs Of Football, Basketball, Ice Hockey, Baseball and Soccer

26 Jul

This time: SPORTS. My goal here is to see if I can create a network of famous sports people based on only two pieces of information: their name and their team. I’m going to need your help on this one so keep your wits about you as you read on.

The sports I chose for this study were:

  1. American Football
  2. Basketball
  3. Ice Hockey
  4. Baseball
  5. Soccer (Football)

I simply went with the sports which had the most data available in Freebase’s sports section. Just to be clear, I will be calling Association Football ‘Soccer’ to distinguish it from American Football. Sorry, Europe.

Concept

To get the core dataset I simply queried Freebase for players who had at least one team listed. I then had to think of a way to connect them. Players move around quite a bit over a career, so I figured I could link both the players and the teams together through their affiliations (i.e. two distinct but related graphs).

I settled on two types of graphs for each sport:

  1. The Player Graph: This consists of people only. E.g. if Diego Maradona played for FC Barcelona and Gary Lineker also played for FC Barcelona, then a connection can be drawn between the two players: Maradona and Lineker.
  2. The Team Graph: This consists of teams only. E.g. if Diego Maradona played for FC Barcelona and Boca Juniors, then a connection can be drawn between the two clubs: FC Barcelona and Boca Juniors.

In this way, the player map builds up a network of affiliation. Sure, the players might never have met or had any influence on one another, but having played for the same club connects them via a common strand. Similarly, in the club/team maps, clubs which have had similar players move between them will be more closely connected. As I write this I haven’t yet seen the graphs I’m about to create, so it will be interesting to see whether certain club allegiances come out of the woodwork. Like the TV actor map, the player map should contain sub-networks of actual teams within the larger network. We shall see…
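To make the two definitions concrete, here is a minimal Python sketch (not the Matlab program used for these graphs) that builds both graphs from a hypothetical player → set-of-teams mapping containing only the Maradona/Lineker example above. The names `affiliations`, `player_edges` and `team_edges` are my own, for illustration.

```python
from itertools import combinations

# Hypothetical example data: player -> set of teams (mirrors the examples above).
affiliations = {
    "Diego Maradona": {"FC Barcelona", "Boca Juniors"},
    "Gary Lineker": {"FC Barcelona"},
}


def player_edges(affiliations):
    """Player graph: link two players if they share at least one team."""
    edges = set()
    for a, b in combinations(sorted(affiliations), 2):
        if affiliations[a] & affiliations[b]:  # non-empty set of shared teams
            edges.add((a, b))
    return edges


def team_edges(affiliations):
    """Team graph: link two teams if at least one player played for both."""
    edges = set()
    for teams in affiliations.values():
        for t1, t2 in combinations(sorted(teams), 2):
            edges.add((t1, t2))
    return edges


print(player_edges(affiliations))  # {('Diego Maradona', 'Gary Lineker')}
print(team_edges(affiliations))    # {('Boca Juniors', 'FC Barcelona')}
```

Both graphs come from the same player/team pairs; they are just the two different "projections" of that one list.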

Lastly, most of the people and teams will be American (I’m limited to what Freebase gave me!). Could American readers please comment on any interesting features, particularly within Baseball, Basketball and Football? As Freebase’s data becomes more complete, so will these graphs.

Method

Again I decided to approach this one using matrices. Please see my post on Wikipedia personalities for how the datasets are preprocessed. I did, however, have to write a short program to connect the various teams and players together. It essentially involves a triple for loop (there are definitely better ways of doing this!). I chose Matlab since a lot of the code from previous posts can easily be copied across to treat new datasets. I’ve written a few functions now which help process Freebase’s somewhat annoying CSV outputs, especially when I have to obtain them from the data dumps rather than from queries. If there is sufficient interest in how my program works, I can make it available.

Most of my time was spent trying to understand Unicode and encoding formats in order to interpret non-English names. I actually learned quite a bit about script blocks and how symbols, East Asian scripts etc. are stored; it was something I had wondered about for a long time. You can find your Sunday afternoon reading on this here (intense) and here (less intense).
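Since most of the effort went into encodings, here is a small Python sketch of one way to load name,team rows safely, assuming the files are comma-separated and UTF-8 encoded. It uses only the standard `csv` and `unicodedata` modules. Normalising to NFC is my own suggestion: it makes a name like ‘Müller’ compare equal whether the ‘ü’ is stored as one code point or as ‘u’ plus a combining diaeresis. `read_pairs` is a hypothetical helper, not one of the functions mentioned above.

```python
import csv
import unicodedata


def read_pairs(lines):
    """Parse 'name,team' lines into a list of (name, team) tuples.

    `lines` can be any iterable of strings, e.g. an open text file.
    Both fields are stripped and NFC-normalised so that visually identical
    names made of different code point sequences become the same string.
    """
    pairs = []
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        name, team = (unicodedata.normalize("NFC", field.strip()) for field in row)
        pairs.append((name, team))
    return pairs


# 'u' followed by U+0308 COMBINING DIAERESIS, i.e. the decomposed form of 'ü'.
sample = ["Mu\u0308ller,FC Bayern Munich", "", "Gary Lineker,Leicester City F.C."]
pairs = read_pairs(sample)
print(pairs[0][0] == "M\u00fcller")  # True: normalised to the single code point
```

For a real file, open it with `encoding="utf-8", newline=""` (the `csv` documentation recommends `newline=""`) and pass the file object straight to `read_pairs`.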

Anyway, my little program basically converts this:

Diego Maradona,Boca Juniors
David Beckham,LA Galaxy
David Beckham,A.C. Milan
David Beckham,Preston North End F.C.
David Beckham,Real Madrid
David Beckham,Manchester United F.C.
Gary Lineker,England national football team
Gary Lineker,Leicester City F.C.
etc. etc.

into two matrices which connect players with players and teams with teams. I really should get better at Perl for this sort of stuff but alas… I haven’t the time.
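For comparison with the triple for loop, the same conversion can be written as matrix products. This is a sketch under my own assumptions. First build a player × team incidence matrix B. The player graph is then B·Bᵀ and the team graph is Bᵀ·B, each with the diagonal zeroed (in Matlab, roughly `B*B'` and `B'*B`). Each entry counts shared teams (or shared players), which could serve as an edge weight. Plain Python lists keep the example dependency-free, and `co_affiliation_matrices` is a hypothetical name.

```python
def co_affiliation_matrices(pairs):
    """Build weighted player-player (P) and team-team (T) adjacency matrices."""
    players = sorted({p for p, _ in pairs})
    teams = sorted({t for _, t in pairs})
    p_idx = {p: i for i, p in enumerate(players)}
    t_idx = {t: j for j, t in enumerate(teams)}
    n, m = len(players), len(teams)

    # Incidence matrix: B[i][j] = 1 if player i played for team j.
    B = [[0] * m for _ in range(n)]
    for p, t in pairs:
        B[p_idx[p]][t_idx[t]] = 1  # repeated rows just set the same 1

    # P = B·Bᵀ with a zero diagonal: number of teams players a and b share.
    P = [[0 if a == b else sum(B[a][j] * B[b][j] for j in range(m))
          for b in range(n)] for a in range(n)]
    # T = Bᵀ·B with a zero diagonal: number of players teams a and b share.
    T = [[0 if a == b else sum(B[i][a] * B[i][b] for i in range(n))
          for b in range(m)] for a in range(m)]
    return players, teams, P, T


rows = [
    ("Diego Maradona", "Boca Juniors"),
    ("David Beckham", "LA Galaxy"),
    ("David Beckham", "A.C. Milan"),
    ("David Beckham", "Preston North End F.C."),
    ("David Beckham", "Real Madrid"),
    ("David Beckham", "Manchester United F.C."),
    ("Gary Lineker", "England national football team"),
    ("Gary Lineker", "Leicester City F.C."),
]
players, teams, P, T = co_affiliation_matrices(rows)
```

With only the sample rows above, no two players share a team, so `P` is all zeros. Beckham’s five clubs, however, form a fully connected group in `T`.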

The Graphs

Since I’ve covered several sports, I’ve broken the next section down by sport. In each category you’ll find two graphs: one connects players with players and the other connects teams with teams. Rather than commenting on each graph in turn, here are a few general observations (let me know in the comments if I’ve missed something):

  • Clustered names usually indicate an entire team. The more central a player’s name, the more likely that person has played for a range of clubs with no major allegiances. People close to only two or three isolated clusters will likely have strong ties to just those neighbouring clubs. The bigger the node, the more people they have played with over the course of their career. Keep in mind that many players are still active, so their networks are still being formed.
  • Clustered clubs are a bit more interesting because they bring out some underlying structure. See if you can notice certain types of club sticking together. In the more international sports the various colours represent individual countries. For example, in soccer, English teams cluster with English teams and German teams with German teams, because the players moving between the ranks usually come from the club’s own country.
  • Knowledge of the players, the teams and how they are related will probably let you get more out of the graphs than I did. My sports knowledge is mediocre at best. Let me know in the comments if there is anything peculiar or interesting.
  • There may be a few names which look strange. Whenever you see a country, say ‘Italy’, it refers to that country’s national team in the sport. I shortened the labels to make them more manageable; hopefully they are self-explanatory. That reminds me: please let me know if there are duplicates of anything.
  • Lastly, the size of a node in every graph has nothing whatsoever to do with the strength of the team or player in their respective sport!

Without further ado, here is my latest batch of graphs… remember to click on the high-res version if you would like to explore the graphs properly!

American Football

The Graph Of American Football Players (zoom high-res version)

The Graph Of American Football Teams (zoom high-res version)

Basketball

The Graph Of Basketball Players (zoom high-res version)

The Graph Of Basketball Teams (zoom high-res version)

Ice Hockey

The Graph Of Ice Hockey Players (zoom high-res version)

The Graph Of Ice Hockey Teams (zoom high-res version)

Baseball

The Graph Of Baseball Players (zoom high-res version)

The Graph Of Baseball Teams (zoom high-res version)

Football (Soccer)

The Graph Of Soccer Players (zoom high-res version)

The Graph Of Soccer Teams (zoom high-res version)

Caveats

As with all of my graphs, there are a few important things to keep in mind:

  1. The datasets are incomplete. Many of your favourite players and teams could very well be missing from the graphs (especially non-Americans). I’m sorry, but I can’t do anything about that. This incompleteness also muddies what the differently sized nodes actually mean. In the player graphs, the bigger the node, the more connections that person has to other people within the network. Essentially, the biggest nodes have shared the largest number of clubs with the largest number of people. Similarly, in the club maps the larger nodes are simply the clubs with the greatest reach, in terms of the number of connections to other clubs through their current or previous players.
  2. The network is simply an exploration in connecting information. If you want facts or clear-cut answers to your questions about sports players and teams, go read Wikipedia or the original Freebase entries. This work, like most of my others, straddles the ground somewhere between entertainment and information. Where these networks fall, I do not know: I am at the mercy of you, the reader.
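Caveat 1 says node size reflects the number of connections. Below is a minimal sketch of computing that from a zero-diagonal adjacency matrix, assuming it means degree (the number of distinct neighbours). The weighted alternative (sum of edge weights) is shown alongside it. The post does not say which of the two was used for sizing, so treat this as an illustration; the 3-node matrix is made up.

```python
def degrees(labels, adjacency):
    """Unweighted degree: number of distinct neighbours of each node.

    Assumes a square, symmetric adjacency matrix with a zero diagonal.
    """
    return {label: sum(1 for w in row if w > 0)
            for label, row in zip(labels, adjacency)}


def weighted_degrees(labels, adjacency):
    """Weighted degree: sum of edge weights (e.g. total shared clubs)."""
    return {label: sum(row) for label, row in zip(labels, adjacency)}


# Made-up example: A shares 2 clubs with B and 1 club with C.
labels = ["A", "B", "C"]
adjacency = [
    [0, 2, 1],
    [2, 0, 0],
    [1, 0, 0],
]
print(degrees(labels, adjacency))           # {'A': 2, 'B': 1, 'C': 1}
print(weighted_degrees(labels, adjacency))  # {'A': 3, 'B': 2, 'C': 1}
```

Either number only measures how well connected someone is within this (incomplete) dataset, which is exactly why node size says nothing about how good a player or team is.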

Keep the suggestions coming in; I’ve got about a billion projects at various stages of production. Thank you: they have all been great. My other computer has been thinking for two days to create the dataset for one of the next graphs, so stay tuned, friends.

Until next time… stay crunchy.

The Graph Of TV Actors

20 Jul

This time I wanted to see the relationships between TV actors. I’m not especially interested in TV series, but I am quite interested in how the actors work together. The fact that many actors have appeared in a number of TV series creates a great network of information.

Method:

  1. I first went to Freebase and tried to query every available actor along with their corresponding TV shows. Unfortunately, Freebase had over 57,000 nodes, which prevented me from querying what I wanted, so I decided to do it manually.
  2. Freebase has regular data dumps where they store the entire networks on an FTP server. I simply navigated to where the TV actors were and downloaded the appropriate file.
  3. I then imported these into Matlab and ran a script which connected every actor with every other actor based on the TV shows they had been in. Once this had run, I exported the list into Excel, did some formatting and produced the required input for Gephi.
  4. I exported these and then manually went around adding the labels for each of the TV series in GIMP. Let me know if any are wrong!
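The output of step 3, an edge list Gephi can import, might look like the sketch below. It is Python rather than the Matlab/Excel route described above, and it uses a hypothetical show → list-of-actors mapping. As far as I know, Gephi’s spreadsheet importer recognises Source, Target and Weight column headers. Using the number of shared shows as the edge weight is my choice, not something stated above.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def actor_edge_rows(casts):
    """casts maps show -> list of actors; returns (actor, actor, shared_shows)."""
    weights = Counter()
    for actors in casts.values():
        # Every pair of actors in the same show gets one more shared show.
        for a, b in combinations(sorted(set(actors)), 2):
            weights[(a, b)] += 1
    return [(a, b, w) for (a, b), w in sorted(weights.items())]


def write_gephi_edges(rows, out):
    """Write an edge table with Source/Target/Weight headers as CSV."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Weight"])
    for a, b, w in rows:
        writer.writerow([a, b, w])


# Hypothetical casts, just to show the shape of the output.
casts = {
    "Show 1": ["Ann", "Bob", "Cy"],
    "Show 2": ["Ann", "Bob"],
}
buf = io.StringIO()
write_gephi_edges(actor_edge_rows(casts), buf)
print(buf.getvalue())
```

In practice you would pass a file opened with `newline=""` instead of the `StringIO` buffer, then choose an undirected graph when importing.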

The Graph:

The Graph Of TV Actors

Click here to zoom around.

As one would expect, there are sub-networks within the entire graph. I’ve labelled, to the best of my ability, the TV series each sub-network belongs to. Obviously there is going to be some overlap, so there might be the odd actor who doesn’t belong to the neighbouring label. The majority of the network should, however.

Some of the sub-networks include:

Gilmore Girls, Alias and Arrested Development

Saved By The Bell and Frasier

The Power Rangers

All of these actors have worked together in a number of TV series. Hence the mess.

As you might have noticed, the TV series here are reasonably old. This is probably a result of the TV actor information on Freebase being incomplete. It is growing at an incredible rate and so I don’t think it will be too long until more modern series appear on the graph.

I couldn’t label the central regions because they are so entangled. I’ll let you try to work out who belongs to which series on your own.

Future

One could also feasibly create a map of film actors. I have downloaded the data, but it is in a slightly more technical format which requires a more sophisticated program. A film actor network would be much richer, with far more structure, which would be fascinating to see.

The same could be said of directors, producers, writers etc. so there really is no end to how many different types of networks you could create. Lastly, as an option, here is a poster version.

Anyway, just a short one today.
