We have been working on automating large parts of the content ordering on Metro.co.uk since our responsive redesign in Dec ’12. This has grown from managing a few widgets across the site to controlling the majority of the homepage. The below describes the process we went through to achieve this.
The first step was to get data out of its separate repositories and into a place that allowed manipulation. We began to collect page views and social interactions from APIs provided by Twitter, Facebook and Omniture into a MySQL database. The basic initial equation we used to figure out what was popular was:
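As an illustration, the half-hourly snapshots might be stored along these lines. This is a minimal sketch: sqlite3 stands in for the MySQL database, and the table and column names are my assumptions, not the actual schema.

```python
import sqlite3

# Hypothetical schema: one row per story per collection run, so that
# scores can later be compared across half-hourly snapshots.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE story_metrics (
        story_id     INTEGER,
        collected_at TEXT,     -- timestamp of the collection run
        views        INTEGER,  -- page views from Omniture
        tweets       INTEGER,  -- from the Twitter API
        comments     INTEGER,
        likes        INTEGER,  -- from the Facebook API
        shares       INTEGER
    )
""")
conn.execute(
    "INSERT INTO story_metrics VALUES (?, ?, ?, ?, ?, ?, ?)",
    (101, "2013-06-01T12:00:00", 5000, 40, 12, 90, 30),
)
# Views plus the raw social total, ready for the popularity equation below.
row = conn.execute(
    "SELECT views, tweets + comments + likes + shares FROM story_metrics"
).fetchone()
```

Keeping every snapshot rather than overwriting the latest figures is what makes the rate-of-change calculation possible later on.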
Popular = ((Tweets + Comments + Likes + Shares) * 25) + Views
We decided to give the social signals a significant multiplier, as a share action was a much stronger signal of how much the content was liked than a page view. Based on this we started with a very high multiplier of around 50, but reduced it to 25 after watching the results over a period of time. The score was recalculated every half hour, after fresh data was collected. The main issue with this approach was how infrequently the top stories changed. So we changed the ordering to be based on the rate of change from the previously calculated score. We called this trending; it gave a really interesting snapshot of what was popular on the site and improved the frequency of change.
Trending = Popular Score (Now) – Popular Score (30 minutes ago)
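The two scores above can be sketched in a few lines (the function names and example figures are illustrative):

```python
SOCIAL_MULTIPLIER = 25  # started near 50, tuned down to 25 over time

def popular_score(views, tweets, comments, likes, shares):
    # Popular = ((Tweets + Comments + Likes + Shares) * 25) + Views
    return (tweets + comments + likes + shares) * SOCIAL_MULTIPLIER + views

def trending_score(score_now, score_30_min_ago):
    # Trending = the change between two half-hourly snapshots, so a story
    # gaining attention right now outranks one coasting on an old spike.
    return score_now - score_30_min_ago

now = popular_score(views=5000, tweets=40, comments=12, likes=90, shares=30)
trend = trending_score(now, 8000)
```

Note that a huge story whose numbers have plateaued scores low on trending, which is exactly what keeps the module changing through the day.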
When we redesigned the site this sat right below the top five stories on the homepage. When we checked the stats we were surprised that the number of clicks was almost identical to the clicks on the top module, which was editorially selected and also had very large clickable images. Clearly people were interested in what everyone else was interested in. This data also proliferated across our sidebars and into our Tablet and Phone Editions via an API. It updated 24/7 and required no manual intervention, so no matter what time you visited the site it displayed a mix of fresh, popular content. Based on the numbers we decided to see if this could be extended to run more of the site.
One of Metro’s competitive advantages is that it runs a very lean digital operation. We want people to spend their time finding and writing great content rather than worrying about placement. A relatively small volume of our traffic goes to our channel pages (e.g. News) or looks at sidebars, due to the volume of mobile traffic we pull in. So we decided to experiment further with this concept and get it to run the rest of the homepage. We had placed a time-based feed at the bottom of the homepage and had been very surprised by the number of interactions. Being purely time-based wouldn’t work for the homepage, though, as we needed a measure that combined popularity and freshness.
I sat down last December and started prototyping approaches. We had to give new stories a chance to be featured before they had accumulated as much data as older ones. We also didn’t want one huge story to sit at the top for too long. After much deliberation/time spent running around Hyde Park, I decided to use a coefficient based on story age to boost posts in the early stages and penalise them once they had been live long enough to have an unfair advantage.
(Views + (Social * 25)) * Hours Since Published Coefficient
| Hours Since Published | Coefficient |
| --- | --- |
| 0 | 25 |
| 1 | 15 |
| 2 | 5 |
| 3 | 3 |
| 4 | 1 |
| 4-8 | 0.7 |
| 9-12 | 0.3 |
| 13-24 | 0.05 |
| 25-36 | 0.02 |
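Putting the formula and the coefficient table together might look like the sketch below. How the boundaries between bands are handled, and what happens to stories older than 36 hours, are my assumptions; the table above is the source of the actual values.

```python
# Age bands from the coefficient table: (upper bound in hours, coefficient).
AGE_BANDS = [(0, 25), (1, 15), (2, 5), (3, 3), (4, 1),
             (8, 0.7), (12, 0.3), (24, 0.05), (36, 0.02)]

def age_coefficient(hours_since_published):
    # Return the coefficient for the first band the story's age falls into.
    for upper, coeff in AGE_BANDS:
        if hours_since_published <= upper:
            return coeff
    return 0.0  # assumption: stories older than 36 hours drop out entirely

def homepage_score(views, social, hours_since_published):
    # (Views + (Social * 25)) * Hours Since Published Coefficient
    return (views + social * 25) * age_coefficient(hours_since_published)
```

The shape of the curve does the work: a brand-new story with modest numbers can briefly outrank yesterday's blockbuster, then has to earn its place on raw popularity once the boost decays.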
We had also been talking about how we could distribute content around the site better, especially at the bottom of articles, as this is where the majority of our traffic arrives. This is when we placed the same time-based feed from the homepage underneath every article, in what used to be empty space. We were amazed once again at the level of interaction it received and decided to optimise the data behind it.
The main idea was that, rather than restricting the stream to articles from the specific sub-category you were in (e.g. Sport/Football), we could use the highest-level category and boost the sub-category you were in: see all of Sport, but with Football stories closest to the top. This would give other very popular sports stories a place as well as aiding circulation around the site.
(Current Channel Boost + Views + (Social * 10)) * Hours Since Published Coefficient
I then added an option to pass in the channels that a user visited most, using the same logic to prioritise their content near the top. Again, this doesn’t exclude content, but rather orders it in the way most appropriate for the person viewing. It is going to form the basis of our new Android App, which will simply display the top ten stories based on your browsing habits.
(Users Channels Boost + Views + (Social * 10)) * Hours Since Published Coefficient
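Both boost variants reduce to the same idea: add a flat bonus to stories from the favoured channel(s) before applying the age coefficient, so boosted stories rise without anything being excluded. A sketch of that, where the boost value and its additive form are my assumptions (the article only specifies that boosted stories sort closest to the top):

```python
def boosted_score(views, social, coefficient, story_channel,
                  boosted_channels, boost=10_000):
    # (Channel Boost + Views + (Social * 10)) * age coefficient.
    # boosted_channels is either {current sub-category} for article feeds
    # or the set of channels a user visits most for personalisation.
    channel_boost = boost if story_channel in boosted_channels else 0
    return (channel_boost + views + social * 10) * coefficient

stories = [
    {"id": 1, "channel": "Football", "views": 800,  "social": 20},
    {"id": 2, "channel": "Tennis",   "views": 1200, "social": 50},
]
# Reading a Football article: all Sport stories stay in the feed,
# but Football ones sort towards the top.
ranked = sorted(
    stories,
    key=lambda s: boosted_score(s["views"], s["social"], 1.0,
                                s["channel"], {"Football"}),
    reverse=True,
)
```

A flat additive boost (rather than a multiplier) means a genuinely huge story from another sport can still break into the top positions, which matches the stated goal of aiding circulation rather than filtering.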
The final twist was to use the data we had collected on the backend to style the front-end. Our News Feed now changes its image size depending on whether a story is trending or has been flagged by editorial. This keeps popular stories prominent even as the algorithm pushes them further down the list, and keeps the styling changing 24/7. As we use data from existing editorial placements, it also means there is no extra work to manage. We have worked on optimising the placement of this and are now getting over 3 million News Feed lazy loads requested weekly and over 500,000 extra page views. Considering this area of the site used to be blank, this is a tidy return on investment. The latest enhancement is to grey out stories you have read, giving even bigger prominence to stories you have yet to consume.
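That flag-to-styling mapping might be sketched as follows; the class names and the precedence between the rules are hypothetical, not Metro's actual markup:

```python
def story_style(is_trending, editorial_flagged, already_read):
    # Read stories are greyed out first, so unread content stands out;
    # trending or editorially flagged stories get the larger image.
    if already_read:
        return "story--read"
    if is_trending or editorial_flagged:
        return "story--large"
    return "story--standard"
```

Since the inputs (trending score, editorial flag) already exist on the backend, the styling layer adds no editorial workload.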
The other large jumps in interactions were triggered by:
- Changing the scroll height at which additional content was pulled in
- Increasing the size of images for trending stories
- Putting comments behind a tab and making timeline the default view
The hardest part of this process was getting all the data into the right place from the APIs in the first place. Once it was present, and we had a nice API around it, the rest of the process was an iterative effort between editorial and development. There are a lot of moving parts, but the benefits of having a site that reacts to users’ behaviour and works 24/7 are, we feel, hugely worthwhile.
Nice work in showing people how this was done. I’m thinking that if these ideas could be presented in such a way that a website owner could tweak the algorithm themselves – and then track via metrics – it could be very useful. You have me thinking now!
This is built in such a way that all of the inputs are stored in SQL, so it would be reasonably straightforward to allow people to tweak them and then see the results. I would really like to increase the data stored and retrieved around a click, such as whether the image was large and what position it was in the feed.