Part One - Associations Between Musical Subreddits

Introduction

Reddit has many musical subreddits, and I became curious as to which genres are most closely associated with each other. To figure this out, I wrote a Python script to fetch the top 1,000 posts of all time from each of the following subreddits:


Methodology

I parsed the artist, song, and subreddit out of each submission, assuming that post titles followed the "Artist - Song" format. I then cleaned up the data by removing extra characters such as '()' and '[]' and made the strings uniform by trimming whitespace and converting to lower case. From there, I condensed like artists into a Python dictionary with the format ['artist', [set of subreddits]]. I then iterated through all subreddit combinations and counted each connection, skipping commutative duplicates (pop compared to metal is the same as metal compared to pop) and self-comparisons (metal is never compared to metal). After some trial and error, I wrote the output to a CSV file that could be imported directly into Gephi.
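The steps above can be sketched roughly as follows. This is a minimal illustration rather than the original script: the function names, the sample posts, and the "Source,Target,Weight" column headers are all assumptions, and it counts pairs once per artist instead of looping over every subreddit combination, which should produce the same totals.

```python
import csv
import re
from collections import Counter, defaultdict
from itertools import combinations


def parse_artist(title):
    """Return the cleaned artist from an 'Artist - Song' title, or None."""
    if " - " not in title:
        return None  # title does not follow the assumed format
    artist = title.split(" - ", 1)[0]
    artist = re.sub(r"[()\[\]]", "", artist)  # drop stray brackets
    artist = artist.strip().lower()
    return artist or None


def build_artist_index(posts):
    """Map each artist to the set of subreddits it appeared in."""
    index = defaultdict(set)
    for subreddit, title in posts:
        artist = parse_artist(title)
        if artist:
            index[artist].add(subreddit)
    return index


def count_connections(index):
    """Count shared artists for each unordered pair of distinct subreddits."""
    counts = Counter()
    for subreddits in index.values():
        # sorted() makes (metal, pop) and (pop, metal) the same key;
        # combinations() never pairs a subreddit with itself.
        for pair in combinations(sorted(subreddits), 2):
            counts[pair] += 1
    return counts


def write_edge_csv(counts, handle):
    """Write one row per connected pair (assumed edge-list column names)."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(counts.items()):
        writer.writerow([a, b, weight])


# Hypothetical (subreddit, title) pairs, not real Reddit data.
posts = [
    ("metal", "Band A - Song One"),
    ("progmetal", "band a  - Song Two [Live]"),
    ("pop", "Band B - Song Three"),
    ("metal", "a title with no separator"),
]
counts = count_connections(build_artist_index(posts))
```

With the sample posts, only "band a" appears in more than one subreddit, so the metal/progmetal pair is the only edge produced. Calling write_edge_csv(counts, open_file) would then write that edge list to disk.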

After importing the data, I messed around with a few visualization techniques and found the results very interesting. I've included two below: Yifan Hu Proportional and Fruchterman Reingold.


The Data

Fruchterman Reingold Layout

The Fruchterman Reingold layout is "concerned with drawing undirected graphs according to some generally accepted aesthetic criteria:

1. Distribute the vertices evenly in the frame.

2. Minimize edge crossings.

3. Make edge lengths uniform.

4. Reflect inherent symmetry.

5. Conform to the frame."

Though the authors explain that their algorithm isn't explicitly trying to achieve these goals, it "does well at distributing vertices evenly, making edge lengths uniform, and reflecting symmetry."

Their actual algorithm focuses on two principles:

"1. Vertices connected by an edge should be drawn near each other

2. Vertices should not be drawn too close to each other"

Because of these principles, we get an excellent visual look into how the subreddits relate. The layout clusters like subreddits together, pushing niche subreddits with fewer connections toward the outside and pulling mainstream subreddits with many connections toward the center. In theory, the node closest to the center should be the node with the most connections.
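To make the two principles concrete, here is a simplified sketch of a single force-directed iteration. It uses the attractive force d^2/k and repulsive force k^2/d from the Fruchterman and Reingold paper, but it leaves out the cooling schedule and the frame boundary, and it is not Gephi's implementation. The function name fr_step and the parameters k and step are names chosen for this illustration.

```python
import math


def fr_step(positions, edges, k, step):
    """One simplified Fruchterman-Reingold iteration on 2D positions."""
    disp = {v: [0.0, 0.0] for v in positions}
    nodes = list(positions)

    # Principle 2: every pair of vertices repels with force k^2 / d.
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            dx = positions[u][0] - positions[v][0]
            dy = positions[u][1] - positions[v][1]
            d = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            f = k * k / d
            disp[u][0] += dx / d * f
            disp[u][1] += dy / d * f
            disp[v][0] -= dx / d * f
            disp[v][1] -= dy / d * f

    # Principle 1: vertices joined by an edge attract with force d^2 / k.
    for u, v in edges:
        dx = positions[u][0] - positions[v][0]
        dy = positions[u][1] - positions[v][1]
        d = math.hypot(dx, dy) or 1e-9
        f = d * d / k
        disp[u][0] -= dx / d * f
        disp[u][1] -= dy / d * f
        disp[v][0] += dx / d * f
        disp[v][1] += dy / d * f

    # Move each vertex along its net displacement, capped at `step`.
    new_positions = {}
    for v, (x, y) in positions.items():
        length = math.hypot(disp[v][0], disp[v][1]) or 1e-9
        move = min(length, step)
        new_positions[v] = (x + disp[v][0] / length * move,
                            y + disp[v][1] / length * move)
    return new_positions
```

Repeating this step while gradually shrinking the step size is what lets heavily connected nodes settle near each other while unconnected nodes drift apart.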

In the Fruchterman Reingold layout above, /r/indie_rock, /r/alternativerock, and /r/indieheads all appear to be centerpieces of the graph, with 13, 23, and 12 unique connections respectively. However, /r/indie_rock's connections are a subset of /r/alternativerock's, so it is safe to say that /r/alternativerock and /r/indieheads together serve as the glue for many subreddits.
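Connection counts and the subset relationship can be checked directly from an edge list. The sketch below uses a small made-up edge list rather than the real data, and the helper names are hypothetical. Note that the link between the two subreddits being compared has to be ignored, since a node is never its own neighbor.

```python
from collections import defaultdict


def neighbor_sets(edges):
    """Map each node to the set of nodes it shares an edge with."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def is_covered_by(neighbors, small, big):
    """True if every connection of `small` (other than `big`) is also one of `big`."""
    return neighbors[small] - {big} <= neighbors[big]


# Hypothetical toy edges, not the real data set.
edges = [
    ("alternativerock", "indie_rock"),
    ("alternativerock", "grunge"),
    ("alternativerock", "punk"),
    ("indie_rock", "grunge"),
]
neighbors = neighbor_sets(edges)
degree = {name: len(linked) for name, linked in neighbors.items()}
```

In this toy graph, is_covered_by(neighbors, "indie_rock", "alternativerock") holds while the reverse does not, which mirrors the relationship described above.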

Another interesting thing to note is the relative interconnectivity of the like-genre subreddits. The electronic music subreddits are heavily interconnected (bottom left corner), as are the heavier music subreddits (closer towards the top of the graph). Indie music (closer towards the center) is also very interconnected. This goes to show how loosely artists and even genres tend to be defined when it comes to where they belong and who enjoys hearing them. Remember, since these are the artists I was able to parse from the top 1,000 posts, they are very likely to have been voted highly in each respective post. This means that artists weren't just cross-posted to other subreddits and then downvoted; they were voted on and, in a sense, had their belonging to the subreddit confirmed by the users.

Yifan Hu Proportional Layout

For looking at the proportional distancing of like-genre subreddits, the Yifan Hu Proportional layout gives us a great look at how similar subreddits cluster by strength of similarity. Yifan Hu Proportional is also a force-directed graph algorithm, one that combines the Fruchterman Reingold algorithm with some much more complicated mathematics that is currently above my level. The product, however, is a fantastic representation of relative connectivity. This representation better shows the interconnectivity of the respective "macrogenres", displaying the clustering of electronic music, indie, and heavier music while showing the relative irrelevance of genres like country, jazz, and sea shanties. This also serves as a mini case study on what is popular among Reddit users, and it comes as no surprise that electronic music is a far more active community than country music.

References

Algorithms for Visualizing Large Networks - Yifan Hu (September 2011)

Graph Drawing by Force-directed Placement - Thomas M. J. Fruchterman and Edward M. Reingold (November 1991)

Efficient, High-Quality Force-Directed Graph Drawing - Yifan Hu (2006)