Protect Your Most Valuable Blog Resource: Stop Content Scraping and Plagiarism

September 17, 2010 | 8 Comments

There’s a very popular saying amongst bloggers, and it goes: content is king. As a blogger, your content is your most precious resource. I don’t know about you, but I’m not going to let sploggers and feed scrapers take that away from me. Not if I can help it. Not if you can help it. How?

Label your feeds with copyright notices.

Add your name, website, and URL (site URL or post URL) to your feed so that when it is read elsewhere, others will know where it really came from.

Recommendation: FeedEntryHeader Plugin. Many feed customization plugins exist, but I like this particular plugin because it affixes the necessary information before the content of the post rather than after, and feed scrapers usually truncate the content. If you can, spell out the URL of your website or blog post in plain text rather than linking to it with HTML. Scrapers want visitors to think the content is their own, and an HTML link is easy to strip out; a plain-text URL in the body of the post is more likely to survive.
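To make the idea concrete, here is a minimal sketch of what a feed-notice plugin does. It is not the FeedEntryHeader plugin's actual code; the function name, its parameters, the author name, and the example.com URL are all made up for illustration.

```python
def add_feed_notice(content, author, site_name, post_url, year):
    """Return feed content with a plain-text copyright notice placed first."""
    # The notice goes BEFORE the post content, so it survives truncation.
    # The URL is spelled out as plain text instead of being wrapped in a link.
    notice = (
        f"Copyright {year} {author}. "
        f"Originally published at {site_name}: {post_url}"
    )
    return f"<p>{notice}</p>\n{content}"


# Hypothetical post details, for illustration only.
entry = add_feed_notice(
    content="<p>Content is king.</p>",
    author="Jane Blogger",
    site_name="Example Blog",
    post_url="http://example.com/content-is-king/",
    year=2010,
)
print(entry)
```

Whatever tool you use, the point is the same: the attribution comes first and the URL is readable without clicking anything.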

Feedback: Do you use summaries instead of full feeds because you don’t want scrapers to access them? Or do you provide both?

Block questionable visitors.

If they can’t find your blog, they won’t be able to take advantage of it.

Recommendation: AntiLeech Plugin. This plugin ideally stops potential scrapers from accessing your website content and instead feeds them fake content. You can enter either IP addresses or User Agent strings that identify the scrapers. Read more about AntiLeech here.

The tricky part is figuring out who your enemy is. They will have to scrape your feed first for you to know about it, right? You can use ©Feed to figure out who is reading your feeds, but more often than not they actually send trackbacks to your post once they’ve scraped it, so you can get their IP address from that as well.
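The blocking idea described above can be sketched in a few lines. This is only an illustration of the general technique (match the visitor against a blacklist, then serve fake content), not how AntiLeech itself is written; the function names, the IP address, and the user agent string are assumptions.

```python
def is_blocked(ip, user_agent, blocked_ips, blocked_agents):
    """Return True if the visitor matches the IP or User-Agent blacklist."""
    if ip in blocked_ips:
        return True
    # Match blacklisted fragments anywhere in the User-Agent, ignoring case.
    agent = user_agent.lower()
    return any(fragment.lower() in agent for fragment in blocked_agents)


def feed_for(ip, user_agent, real_feed, fake_feed, blocked_ips, blocked_agents):
    """Serve the fake feed to blacklisted visitors, the real feed otherwise."""
    if is_blocked(ip, user_agent, blocked_ips, blocked_agents):
        return fake_feed
    return real_feed


# Made-up blacklist entries, for illustration only.
BLOCKED_IPS = {"203.0.113.7"}
BLOCKED_AGENTS = ["ScraperBot"]

print(feed_for("203.0.113.7", "Mozilla/5.0", "real feed", "fake feed",
               BLOCKED_IPS, BLOCKED_AGENTS))
```

Keep in mind that User Agent strings are easy to fake and IP addresses change, so a blacklist like this needs regular upkeep.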

Feedback: Where do you find your IP address blacklists?

Disable hotlinking.

Hotlinking means other sites embed your files directly from your server, so your content is displayed elsewhere using your own bandwidth, which is the amount of data your server transfers over a period of time. Every time someone loads your website, all the files that get loaded count toward that bandwidth. So if people keep hotlinking your photos, music, or videos, your bandwidth quota for the month (or quarter or year) gets used up. Hotlinking may not be an issue for you if you have lots of bandwidth and don’t care about attribution or who uses your content. Normally, though, it is an issue; it’s bad netiquette. If you do care, you need to stop people from hotlinking.

Recommendation: Hotlink Protection Plugin. Enter the file location you want to protect; if an external website loads any image from it, a different, customizable image is displayed instead. Since images are the most common target anyway, this plugin will suffice.
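Hotlink protection generally works by checking the Referer header that browsers send along with an image request. The sketch below shows that check under stated assumptions: the function name, the file paths, and the example.com host names are hypothetical, and this is not the Hotlink Protection Plugin's own code.

```python
from urllib.parse import urlparse


def image_to_serve(requested_path, referer, allowed_hosts, replacement_path):
    """Return the real image for your own pages, a replacement for hotlinkers."""
    # Direct visits and some privacy tools send no Referer; let those through.
    if not referer:
        return requested_path
    host = (urlparse(referer).hostname or "").lower()
    if host in allowed_hosts:
        return requested_path
    # The request came from a page on someone else's site.
    return replacement_path


# Hypothetical hosts and paths, for illustration only.
ALLOWED_HOSTS = {"example.com", "www.example.com"}

print(image_to_serve("/images/photo.jpg", "http://splog.example.net/stolen/",
                     ALLOWED_HOSTS, "/images/no-hotlinking.jpg"))
```

Allowing requests with an empty Referer is a deliberate trade-off here: it lets a few hotlinkers through, but it avoids breaking images for legitimate readers whose browsers don't send the header.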

Feedback: Do you host your own images or do you hotlink them from sites like PhotoBucket?

*Note: What the plugins accomplish can also be done with less straightforward but more flexible methods, such as PHP programming, .htaccess editing, cPanel configuration, or web applications.

Take action.

Protecting your content isn’t just about setting up defense mechanisms. You should be vigilant enough to find out if you’ve been scraped or plagiarized and then do something about it.

Recommendation: 6 Steps to Stop Content Theft. These are six long and tough steps, but if you value your work, you will be thankful when it gets you through:

  1. Detection
  2. Preserving the Evidence
  3. Contacting the Plagiarist (if Practical)
  4. Contacting the Advertisers (optional)
  5. Contacting the Host
  6. Contacting the Search Engines

Feedback: Do you think Filipino bloggers stand a chance in a battle against plagiarism, with all these (US-biased) steps that need to be accomplished?

Feedback: Do you know that Creative Commons Licenses like the CC Attribution 3.0 License have been ported to play nicely with Philippine copyright laws?

Charge, brothers and sisters!

Right now, fighting plagiarism, especially in the form of sploggers and scrapers, is very tedious. Hopefully things get easier in the future, but for now, at least we stand a very good chance against it.


When Blogs Become Unacknowledged Mainstream Media Sources

April 5, 2008 | 7 Comments

Blogs being scraped and plagiarized by other blogs is one thing, but what do you do about the beast that is mainstream media? It’s very disappointing for this type of thing to happen, since television and newspaper companies often consider themselves far more legitimate and reputable than anything that comes from the Internet.

Yet we are experiencing an emerging culture of taking things without asking permission, much less giving proper attribution. There could be any number of reasons for this, most of them revolving around ignorance, but what does that say about supposedly educated professionals in the fields of journalism and mass media? Where are these people’s ethics?

This is a lesson for both providers of content and those who are in need of it: Just because you can find it on Google doesn’t mean you can use it for yourself. If the TV stations strongly oppose putting their shows on YouTube, then they should treat online content with the same respect they expect. Just a little respect.

Similarly, if you put anything on the Internet, it is bound to be plagiarized. Certain measures can be taken to avoid this, but the most desperate of people will still find a way to steal your content: remove watermarks, paraphrase sentences, et cetera.

I’m sure bloggers would be honored to have their writings, photographs, and other creations featured in “the real world”. There’s a reason why Creative Commons emerged as a more generous alternative to copyright. And for those who don’t already know, we have a local group that has ported the Creative Commons licenses into the Philippine legal context, so creators are more protected than ever. The problem is, people don’t read. We don’t read street signs, we don’t follow instructions. We just go online and take whatever it is we need without worrying about the consequences. CC licenses are already easy to read, but we don’t take the time to understand what “non-commercial use” even means! Because chances are, we won’t get caught anyway.

Then again, the Philippine blogging community is growing more assertive and formidable each year. And this dilemma is one of the best ways we can prove that. Act now, fellow Pinoy bloggers!
