Social media scraping – what it is and why it matters
Posted on 7th January 2019
Scraping is an automated process where sites such as Facebook and LinkedIn collect data from your web pages. The scraped data is used to create a little preview of the page in question when you post a link to it on social media. This is done using a standard called the Open Graph protocol, which you can read about here.
The preview is basically an ‘introduction card’ that invites people to click through to your site.
It should look a bit like this:
How scraping works
The information used to create these previews is taken from the metadata that’s entered (or should be!) in the back-end of your website. This includes stuff like page titles, page descriptions, and other meta tags. (You can read more about the different types of metadata and what they’re used for here.) In the example above, Facebook has used the page title, page description, and a thumbnail image to make up the preview, which is pretty standard.
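To make this concrete, here’s a rough sketch of what a scraper does with those tags, written in Python with the BeautifulSoup library. The HTML is a made-up example page, and the fallback behaviour is an approximation of what the platforms’ crawlers do rather than their exact logic:

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# A made-up <head> showing the Open Graph tags a scraper looks for
html = """
<head>
  <title>Fallback page title</title>
  <meta property="og:title" content="Social media scraping explained" />
  <meta property="og:description" content="What scraping is and why it matters." />
  <meta property="og:image" content="https://www.example.com/thumbnail.jpg" />
</head>
"""

soup = BeautifulSoup(html, "html.parser")

def og(prop):
    """Return the content of an og:<prop> meta tag, or None if it's missing."""
    tag = soup.find("meta", property="og:" + prop)
    return tag["content"] if tag else None

# Fall back to the ordinary <title> tag if og:title is absent -
# roughly what the platforms' crawlers do.
preview = {
    "title": og("title") or soup.title.string,
    "description": og("description"),
    "image": og("image"),
}
print(preview)
```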
The first scrape happens automatically when you post a link to one of your web pages on social media or certain other websites (unless you’ve set up your site to restrict access to crawlers – not a good idea, as your site won’t show up on any search engines!).
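If you’re not sure whether crawlers can reach your pages, Python’s standard library can check your robots.txt rules for you. A minimal sketch, assuming Facebook’s scraper announces itself as facebookexternalhit (its usual user agent) and using example.com as a stand-in for your own domain:

```python
from urllib.robotparser import RobotFileParser

# Parse the site's robots.txt (example.com stands in for your own domain)
rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

# Facebook's scraper usually identifies itself as 'facebookexternalhit';
# if this returns False, your links will never get a preview.
print(rp.can_fetch("facebookexternalhit", "https://www.example.com/some-page"))
```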
There are also other ways you can trigger Facebook or LinkedIn to scrape your site, such as adding social plugins (Like or Share buttons, for example) to your web pages, which may set off an automated scrape.
Why it’s important to get it right
Naturally, you want your ‘introduction card’ to look as appealing as possible when you post a link on social media. And that means getting the metadata for every page on your website spot-on and keeping it up to date.
If you’re an it’seeze Web Design Nottingham customer, this is (as you’d expect…) easy, as our sites have a user-friendly back-end that prompts you to enter the correct information and even tells you if it’s up to scratch. You can find out more here.
When scraping goes wrong
When you post a link, you might find that Facebook, LinkedIn et al. display an outdated version of your metadata, or an old image. This is because, once scraping has taken place for the first time, a cache is created that stores the collected information. This will then be reused next time you post, even if you’ve updated the linked web page in the meantime.
The cache will be cleared at some point – after 7 days in the case of LinkedIn, for example – but until that happens, you’re stuck with outdated information. This can be frustrating if you’re constantly updating your site with new content, images, and other information that you’re keen for people to know about ASAP.
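Behind the scenes, the logic is roughly this. Here’s a simplified sketch using LinkedIn’s seven-day lifetime – the platforms don’t publish their real internals, so treat the structure as an illustration only:

```python
import time

CACHE_TTL = 7 * 24 * 60 * 60   # seven days in seconds - LinkedIn's stated lifetime
_cache = {}                    # url -> (time of last scrape, stored preview)

def get_preview(url, scrape):
    """Return the preview for a URL, re-scraping only once the cached copy expires."""
    now = time.time()
    if url in _cache:
        scraped_at, preview = _cache[url]
        if now - scraped_at < CACHE_TTL:
            # Still within the lifetime: the old preview is reused,
            # even if the page's metadata has changed since.
            return preview
    preview = scrape(url)       # scrape() stands in for the platform's crawler
    _cache[url] = (now, preview)
    return preview
```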
Sorting out your scraping
Luckily, there are tools you can use to get Facebook, LinkedIn, and other sites to manually re-scrape your website after you’ve made changes. LinkedIn has recently launched a service called Post Inspector, which you can use to clear the LinkedIn preview cache. If this doesn’t work, an older method is still available that should help.
Meanwhile, Facebook for Developers offers a Sharing Debugger, which shows you exactly how a link will appear when it’s shared and lets you force a fresh scrape. And Twitter offers a Troubleshooting Guide to help you sort out any problems with your previews, known as Twitter Cards.
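If you’d rather do this from a script, Facebook’s Graph API can trigger the same re-scrape that the Sharing Debugger’s ‘Scrape Again’ button performs. A rough sketch using Python’s requests library – the access token and URL are placeholders you’d replace with your own:

```python
import requests  # pip install requests

ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"                 # placeholder - use your own token
PAGE_URL = "https://www.example.com/updated-page"  # the page you've just changed

# Posting with scrape=true asks Facebook to refresh its cached preview,
# mirroring the Sharing Debugger's 'Scrape Again' button.
response = requests.post(
    "https://graph.facebook.com/",
    data={"id": PAGE_URL, "scrape": "true", "access_token": ACCESS_TOKEN},
)
print(response.json())  # the freshly scraped Open Graph data (or an error)
```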
With an it’seeze website, you also have control over the Open Graph image that’s used, via the Page Metadata Panel.
Need a little help? Ask it’seeze Nottingham!
Whether you’re in a mess with your metadata or can’t get your page previews to play ball, we can help. If you already have an it’seeze website, you’ll find a wealth of information at our Customer Support website – or you can get in touch to speak to a consultant if you prefer.
And if you’re not an it’seeze Nottingham customer, why not visit our website to find out more about us and how we work? There are lots of benefits to moving your website over to us and we’d love to discuss them with you. So, give us a call on 0115 777 3001 or contact us online today.