Some visitors know that, besides this new site and WordPress blog, I have a number of blogs on Google’s Blogger, and a major site called doaskdotell.com. I have a site based on my legal name, johnwboushka.com, which covers my IT resume only (as I am “retired” now). I also have minimal profiles on Myspace and Facebook, which I use very little. I have been self-publishing on the Internet since 1996, have gone through several different domains, and have experimented with a number of strategies for deploying my material. As a result, it might seem a bit helter-skelter, although it makes sense to me, and to someone who spends some time with it.
Recently, bloggers have reported a number of problems on the Google account forums, and many have expressed frustration at what they perceive as slow response or customer service. There are three particular areas of concern. One is disabling or removal of blogs for suspected terms of service violations. Another is similar disabling or removal when a blog is marked as “spam”, apparently by automated “spam blog” detection. A third is removal from Adsense, apparently for generating invalid clicks. Bloggers claim, with some credibility, that these are “false positive” situations, and in some cases they do indeed get reinstated. But they maintain that the service is slow to respond to the formal appeals that it makes available. In a few cases the bloggers might have been publishing on domains that they rent and control, but in many other cases they were using the “free” service.
Let me say here that I am not, nor have I ever been, a Google or Blogger employee. I have no proprietary knowledge. I am only trying to make sense of what I see going on and offer constructive criticism, both of bloggers and of the Blogger service. I want to stay in Google’s good graces. I have a customer service background myself (toward the end of my IT career particularly), and it is quite plausible that I could really help them if I worked there. I just might want to try for a job there later (and move out of “retirement”). So I simply offer here what “problem solving” I can.
These are somewhat distinct issues. First, take the “terms of service” violations. These may include “safe harbor” (OCILLA) takedowns of material that other parties allege infringes existing copyrights, under the Digital Millennium Copyright Act. According to the law, alleged infringers have a right to contest claims, and there are well-established procedures that ISPs and services like Blogger must follow. There are other possible TOS situations, such as perceived libel, obscenity or hate speech, child pornography, and the like. Terms of service rules are influenced not only by United States law but also by laws in other major western democracies (such as Canada and European countries), which are sometimes stricter on some topics. Indeed, in some cases ISPs might remove material on receiving complaints even though authorities or other complainants never proceed with further action. Whether the speaker could successfully appeal would seem uncertain. But one of the most important things to remember about TOS problems is that they can exist whether the service is free (Blogger) or rented from a conventional ISP (like Verio, Yahoo!, Network Solutions, etc.).
One practical tip for bloggers might be to be careful about embedded hot links to images and videos. Publishers should make sure that they really have permission from the original sources. Embedded YouTube videos seem to be permissible, but not if the original video was pirated or is otherwise illegal.
The second area here concerns getting paid by advertising services, whether Google’s Adsense or any number of other variously structured opportunities from Internet marketing competitors. Sometimes people get canceled, particularly when the business model is partly related to clicks rather than just product purchase. Actually, a few years ago, I had Linkshare with another domain, and advertisers would “pre-approve” you before you could even sign up with them, because some vendors (like airlines) won’t advertise on “controversial” sites. That could be an effective system. With a click-based system, the publisher is told not to click on his or her own ads or directly encourage others (particularly household members or anyone in repeated contact) to do so. Google, for example, offers a somewhat complicated scheme (including updating the Windows registry) to “test drive” the ads without their being counted. I can make a suggestion here: the scheme could be simplified. For example, if a publisher has a fixed broadband IP address, she could furnish that to the advertiser with the “good faith” understanding that clicks from that address are not tallied. Advertisers probably can analyze clicks by IP source and other origin routing parameters anyway. Or some logon procedure could be developed.
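To make the fixed-IP suggestion concrete, here is a minimal sketch in Python of how a click tally might skip addresses that a publisher has registered. This is only my illustration, not how Google or any ad network actually counts clicks; the function name, the `(ad_id, source_ip)` click format, and the sample addresses are all assumptions.

```python
import ipaddress

def tally_billable_clicks(clicks, publisher_ips):
    """Count clicks per ad, skipping clicks from registered publisher addresses.

    clicks: iterable of (ad_id, source_ip) pairs (a hypothetical log format).
    publisher_ips: addresses the publisher has declared as their own.
    """
    # Parse addresses so that equivalent spellings compare equal.
    excluded = {ipaddress.ip_address(ip) for ip in publisher_ips}
    counts = {}
    for ad_id, source_ip in clicks:
        if ipaddress.ip_address(source_ip) in excluded:
            continue  # "good faith" exclusion: the publisher's own click
        counts[ad_id] = counts.get(ad_id, 0) + 1
    return counts

# Hypothetical sample data using documentation-only address ranges.
sample_clicks = [
    ("ad1", "203.0.113.7"),   # the publisher test-driving an ad
    ("ad1", "198.51.100.2"),  # an ordinary visitor
    ("ad2", "203.0.113.7"),   # the publisher again
]
billable = tally_billable_clicks(sample_clicks, ["203.0.113.7"])
```

A scheme like this only helps a publisher whose address really is fixed, and it would not stop clicks from household members at other addresses, which is why a logon procedure might still be needed.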
I’ve saved the “best” for the last here – the problem of spam blogs. First for the easy part. Spam comments are a problem too, but they’re easy to control just by using comment monitoring (and perhaps additionally requiring the commenter to sign with a captcha). The more recent problem is spam blogs, or “splogs”, about which a lot has been written in the past three years. Wikipedia gives a comprehensive and rather definitive analysis here.
Literature suggests that splogs are generated by sophisticated software and made to “look real” by scraping passages of text from legitimate sites, and then throwing in links to products generally thought of as less reputable. It seems to be driven by the fact that some advertisers and customers do bite, and the unlimited opportunity of a “free” source seems to make the idea “work”, much as with spam in email. Splogs are not illegal in themselves, but they threaten the business model upon which free publishing services are offered. There does not seem to be a big problem with them outside of these services yet.
Another related problem that has been reported is that sometimes spam postings get put on legitimate blogs by spammers who break security, perhaps just to spam, perhaps to inflict vandalism.
One can imagine that automated spam detection software would be difficult to implement reliably without a lot of false negatives and false positives. It is difficult for the legitimate blogger to determine what to do to avoid false positive detection, as any recommendation could be imitated by sploggers, so the algorithms themselves must remain proprietary and secret.
It would seem, however, that even here some common sense rules could apply. The main problem seems to be links. Maybe some of it is excessive links in relation to text. But links to major news sites or government sites or other sites that substantiate what is said in a blog normally should not be a problem, because it isn’t logical that any search engine algorithm would rank a major corporate site based on references in blogs. Another issue would be “mutual admiration societies.” Blogger profiles allow the enumeration of other sites, but that capacity could be expanded to include a number of affiliation sites, in order to inform robots and search engines in “good faith” that the announced affiliation should be taken into consideration in determining page ranking; once properly informed, robots would have no reason to consider such links as possible symptoms of splogging.
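The “links in relation to text” idea can be sketched as a simple ratio. What follows is only my guess at one common sense rule, written in Python; the real detection algorithms are secret, and the trusted-domain list, the class and function names, and any cutoff one might apply to the result are assumptions on my part.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical allowlist: government sites plus a couple of major news sites.
TRUSTED_DOMAINS = ("gov", "nytimes.com", "bbc.co.uk")

def is_trusted(host):
    """True if host is a trusted domain or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)

class LinkCounter(HTMLParser):
    """Counts words of text and outbound links to non-trusted hosts."""

    def __init__(self):
        super().__init__()
        self.untrusted_links = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        host = urlparse(href).hostname or ""
        # Links with no host are relative (internal) links and are ignored.
        if host and not is_trusted(host):
            self.untrusted_links += 1

    def handle_data(self, data):
        self.words += len(data.split())

def link_density(post_html):
    """Untrusted outbound links per word of text in one post."""
    counter = LinkCounter()
    counter.feed(post_html)
    counter.close()
    return counter.untrusted_links / max(counter.words, 1)

# Hypothetical post: one product link, one government citation, five words.
sample_post = (
    '<p>Buy now <a href="http://pills.example/x">here</a> and '
    '<a href="http://www.usa.gov/">source</a></p>'
)
density = link_density(sample_post)
```

A post that cites sources heavily but only from the allowlist would score zero here, which matches the point above; a splogger could of course pad text to lower the ratio, so a measure like this could only ever be one signal among several.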
Often blogging providers (like Blogger) offer captcha verification to prevent automated content generation. But it is reported that sploggers get around this, and it seems that blogs are being taken down as suspected splogs even though the captcha technique ought to be available and work.
I’ve wondered about other possible symptoms. Perhaps the appearance of apparently nonsense words – but maybe they are tags in a source of a script that is being documented. Perhaps the appearance of many incomplete sentences, but those are common in informal blogs. Perhaps a wide range of subject matter. But even that is likely legitimate. The “range” could come from the use of metaphor, or it could come from the fact that the writer is “connecting the dots” and relating two or more issues or subjects not normally discussed at the same time to make a novel but legitimate argument.
So we come back to the “free service” argument. Anything that offers infinite resources for free will attract bad actors. So speakers who take advantage of it cannot claim much ownership or real rights. Perhaps all amateur blogging is in some sense a kind of “self spam” and fits into a gray area. Even rented ISP service has become so inexpensive with such generous bandwidth and storage space as to require little capital. Can this remain so forever, or will the difficulty in sifting out what is legitimate eventually make these services unprofitable for companies? Could bloggers face the same fate as weekend air travelers? That is the “anti-amateur” argument, yet “professionalism” in a world of extreme capitalism hasn’t offered a lot either.
Charging for emails has been proposed seriously as an antidote for spam in emails (something I support, if the “postage” is a microcharge per message); maybe the same concept is needed for what is now “free blogging.” There could be a limit on the number of posts, the disk space used (there is for images already), or the number of years of retention, after which some sort of rental plan similar to that offered by ISPs would have to be paid for. It could still be reasonably and affordably priced, and offer more sophisticated permanent archiving. Another possible concept would be a tiny charge for each post, just like a tiny one for each email, in order to discourage automated content generation on blogs. Perhaps the charge would be less or waived when a captcha is filled out.
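As a rough illustration of how the per-post charge and captcha waiver might fit together, here is a small Python sketch. Every number in it (the free allowance, the fee, the size of the discount) is a placeholder I made up, and no blogging service that I know of prices posts this way.

```python
from decimal import Decimal

# All values below are hypothetical placeholders, not real pricing.
FREE_POSTS = 500                     # posts allowed before any charge applies
FEE_PER_POST = Decimal("0.01")       # micro-charge per post, in dollars
CAPTCHA_DISCOUNT = Decimal("1.00")   # 1.00 waives the fee; 0.50 would halve it

def post_fee(posts_so_far, captcha_solved):
    """Return the charge for the blogger's next post."""
    if posts_so_far < FREE_POSTS:
        return Decimal("0")  # still inside the free allowance
    fee = FEE_PER_POST
    if captcha_solved:
        # A human filling out a captcha pays less, or nothing at all.
        fee = fee * (Decimal("1") - CAPTCHA_DISCOUNT)
    return fee
```

With these placeholder values a human blogger who fills out the captcha never pays, while software posting thousands of entries without one would pay a cent for each post beyond the allowance.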
July 24, 2008
There are also a few reports of parties with Blogger accounts closed while the blogs stayed up. The blogger is directed to a link that mentions a “perceived terms of service violation”. There is supposed to be a manual review available. But if the blogs themselves stayed up, this sounds like a problem with automatic violation detection.
The “terms of service” violations could deal with all the “usual” problems: spam, obscenity, chain letters, virus propagation, etc.
Aug. 4, 2008
Blogger admitted somewhat profusely over the first weekend of August (to its bloggers at login) that it has a problem with false positives, and that it had even accidentally locked some blogs not identified as spam. There is a story at Blog Herald by Thord Daniel Hedengren, link here.