For some reason I had to delete some pages. These pages used the .html suffix, so I blocked them in robots.txt with Disallow: /*.html. It has been almost a year, but I find that Google's robot still often picks up these pages. How can I quickly get Google to remove them completely? I have also removed these URLs in Google Webmaster Tools via Google Index -> Remove URLs, but Google still picks up these pages.
Tip: Along with Delicious, I search on Scoop.it for similar opportunities. If they liked an article tied to a particular year, say 2013, and you update the resource to 2014, chances are they’ll share it. Kind of a twist on your Delicious + skyscraper technique. You don’t even have to make the content much different or better, just updated! Got some fantastic links recently because of it.
In March 2006, KinderStart filed a lawsuit against Google over search engine rankings. KinderStart's website was removed from Google's index prior to the lawsuit, and the amount of traffic to the site dropped by 70%. On March 16, 2007, the United States District Court for the Northern District of California (San Jose Division) dismissed KinderStart's complaint without leave to amend, and partially granted Google's motion for Rule 11 sanctions against KinderStart's attorney, requiring him to pay part of Google's legal expenses.
Robots.txt is not an appropriate or effective way of blocking sensitive or confidential material. It only instructs well-behaved crawlers that the pages are not for them, but it does not prevent your server from delivering those pages to a browser that requests them. One reason is that search engines could still reference the URLs you block (showing just the URL, no title or snippet) if there happen to be links to those URLs somewhere on the Internet (like referrer logs). Also, non-compliant or rogue search engines that don't acknowledge the Robots Exclusion Standard could disobey the instructions of your robots.txt. Finally, a curious user could examine the directories or subdirectories in your robots.txt file and guess the URL of the content that you don't want seen.
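To make the "well-behaved crawlers only" point concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the ExampleBot user agent, and the example.com URLs are all made up for illustration. The check happens entirely on the crawler's side; nothing in it stops a client that simply skips the check from requesting the page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler may fetch `url` under these rules."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite crawler runs this check itself and then decides whether to request the page.
blocked = is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/report.html")
allowed = is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/index.html")

print("private page allowed:", blocked)
print("index page allowed:", allowed)
```

Note that the Disallow line itself reveals the /private/ path to anyone who reads the file, which is the "curious user" problem described above. Genuinely confidential pages need server-side protection such as authentication.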
Great post. I know most experienced people read this stuff and think “I know that already”… but there are actually lots of things we tend to forget even though we know them. So it’s always good to read them again. What I liked most was the broken link solution: not only creating a substitute for the broken link but actually going beyond that. I know some people do this as an SEO technique, but it’s also useful for the internet, since you repair broken links that others would otherwise run into.