StopBadware.org

Today, we are announcing a new project — StopBadware.org
— at the Berkman Center, in partnership with the Oxford Internet
Institute and our unpaid special advisors at Consumer Reports Web
Watch.  This is an active research initiative that collects data and stories from consumers in a publicly accessible clearinghouse, sets forth a series of guidelines for what constitutes “badware” in our view, and will publish an ongoing series of reports about downloadable applications that violate those guidelines.  We are fortunate to have the support of Google, Sun,
and Lenovo, as well as an exceptional 8-member working group and world-class advisory board.

This project is very much intended as a complement to the many other
good efforts underway to stem the tide of badware, like the work of
TRUSTe, the Anti-Spyware Coalition, researcher Ben Edelman, and many
others in the public and private sectors. 

States, Companies, Privacy, Speech

David Berlind has a great piece
based in large measure on an interview with Jonathan Zittrain about the
law enforcement/privacy/tech company flap kicked off by the DOJ’s efforts to get Google to comply with its order to turn over search data. 

Berlind sets it in the right frame, I think, which is not a simple
request for a single set of information to solve a given case or to
stop a crime from happening but rather in the larger context of the
role of technology companies vis-a-vis states in carrying out law
enforcement activities:

“In the bigger picture though (and on the heels of the domestic spying
issue), the warrant for search data, particularly when there isn’t an
investigation into a specific case of wrongdoing, raises more questions
about how far the Feds can and will go when it comes to mining domestic
sources of information that many (including Google, apparently) believe
to be off-limits to the government.  Most US-based Internet users,
for example, use the Internet on the assumption that a record of their
behavior (whether it includes personally identifiable information or
not) won’t fall into government hands.

“Perhaps the most obvious question is ‘where does it end?’  Does
compliance with the DOJ’s request set an ugly precedent that paves the
way for the Feds to come back for a mile once they’ve taken an inch?
Even if the data that Yahoo, Microsoft, and AOL turned over to the Feds
was uncompromising in terms of privacy, with no particular criminal
investigation taking place, what happens when the Feds see something
they don’t like? Can they just come back for more and take it? Not to
be alarmist or extreme here, but is China — where Yahoo and Microsoft
(also this) have already had anti-democratic run-ins with that nation’s
government — on the other end of the spectrum along which
domestic Internet surveillance policies are shifting and how far along
that spectrum of chilling effects will the US shift?”

(Before reading what Berlind/JZ said, I did an interview with Red Herring on the same topic.)

Burningbird's post and comments

There’s a remarkable and worthwhile thread forming over at Shelley Powers’ Burningbird blog after her post on RSS and copyright.  I don’t agree with everything written there, but it’s a fascinating back-and-forth, and it features two posts (at least) from Denise Howell, which alone make it worth the read.  It strikes me as just the
kind of sorting process that we need to go through, to get opinions
about these norms aired and understood.

On RSS, but nothing to do with copyright

Gregory Lamb of the Christian Science Monitor has a great, forward-looking piece on the future of RSS:

“Mike Richwalsky has an online helper who keeps him informed. It tells
him when his friends post new items on their websites or new photos to
sites like Flickr. It advises him on what Netflix movies he might want
to rent and gives him the latest scoop on his favorite sports team, the
Pittsburgh Steelers. It also alerts him if his name, or that of
Allegheny College, where he works as a Web administrator, is mentioned
online. It’s even ready to signal him if an online merchandiser gets a
hard-to-find Xbox 360 game console in stock.

His helper is an RSS aggregator. RSS stands for Really Simple
Syndication, and its purpose is in fact really simple: ‘Feed’ the user
information every time a weblog, news source, or a selected website has
been updated with new information.”
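
For readers curious about the plumbing behind Lamb’s description, here’s a minimal sketch of that kind of helper in Python.  It assumes the third-party feedparser library, and the feed URLs and polling interval are placeholders of mine, not anything from the article:

    import time
    import feedparser  # third-party: pip install feedparser

    FEEDS = [
        "http://example.com/friend-blog/rss.xml",     # placeholder feed URLs
        "http://example.com/flickr-photos/rss.xml",
    ]

    seen = set()  # links already shown to the user

    def check_feeds():
        for url in FEEDS:
            feed = feedparser.parse(url)
            source = feed.feed.get("title", url)
            for entry in feed.entries:
                link = entry.get("link")
                if link and link not in seen:
                    seen.add(link)
                    print(f"[{source}] {entry.get('title', '(untitled)')}")
                    print(f"    {link}")

    if __name__ == "__main__":
        while True:            # poll every half hour, like a desktop aggregator
            check_feeds()
            time.sleep(30 * 60)

Real aggregators add persistence, de-duplication by GUID, and conditional GET so they don’t hammer publishers’ servers, but the core loop really is that simple.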

Susan Mernit's Point on RSS and Copyright

Susan Mernit, a wonderful analyst of all things Web 2.0, writes,
in a thoughtful post responsive to the flap over RSS and copyright:
“But it seems to be what Palfrey has not yet addressed–which makes
sense considering this company is so new–is that many of the players
entering into the bundled space recognize they have to give more back
to their creative sources than just a little traffic or a thank you.
… Without some share in the revenue, it’s not right to make $$ from
anything more than a headline and a digest, unless the blogger has
specifically given permission for a great depth to be published off
site.”

No better time than the present to address it.  I don’t have an
answer, by any means, but it seems like a terrific question, one
well worthy of discussion.  I should note that I don’t think of
this as a “legal” issue (those are addressed in an all-too-long post
yesterday).  But I think it’s a critical issue from the
perspective of developing this ecosystem based on syndicated
content. 

I wonder if what Susan points to is an emerging consensus, which would
help clarify the community’s views and the norm around aggregation (we
could call them the “Mernit Principles”):

1) If a for-profit company a) aggregates only RSS headlines and digests of feeds (presumably there’s a norm around what counts as appropriate “digesting,” but assume for these purposes it’s something well short of a full feed, applied consistently across all sources aggregated); b) provides an easy mechanism for those who wish to opt out to do so; and c) observes all licenses and other stated preferences of those who offer feeds, then it’s OK to make money on the aggregated content with ads served alongside the content in some fashion.  (Perhaps My Yahoo!, which is presumably very profitable, is an example of such a model, or something along these lines, as it seems to render just headlines from the RSS feeds I’ve got loaded in there.)  It reminds me of what Dave said back in December about how to make money online.  (A rough sketch of what such an aggregator might do appears just below this list.)

2) If a for-profit company aggregates full RSS feeds and makes money
from the aggregation, it’s not enough to give the source of the feeds
some links back or a hat-tip or similar kinds of  non-cash remuneration.  If full RSS feeds are included in
the aggregated content, then some form of revenue-sharing needs to be
worked out to repatriate cash to the people creating the
works.   Such a model might be what Gather.com
and others seem to be suggesting as the way forward (“It just seems
fair that we share our advertising revenue with you based on the
quality and popularity of the content you contribute on Gather.”) 
Such a model could make sense in the way that eBay and Google have made
sense: serving as public online platforms on which other people could make a bit of
money, while ensuring that the platform providers got enough, say, to go public
and render the founders billionaires.

Does that sound right? 
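
To make principle (1) concrete, here is a rough sketch, again in Python with the feedparser library, of how a for-profit aggregator might stay within these norms.  The opt-out registry, the 200-character digest cutoff, and the URL are illustrative assumptions of mine, not a description of what My Yahoo! or anyone else actually does:

    import feedparser  # third-party: pip install feedparser

    # Hypothetical opt-out registry (principle b): anyone listed here is never aggregated.
    OPTED_OUT = {"http://example.com/dont-aggregate-me/rss.xml"}

    DIGEST_CHARS = 200  # assumed norm: a digest well short of a full post

    def digest_feed(url):
        """Return headline-plus-digest items for one feed, or nothing if opted out."""
        if url in OPTED_OUT:
            return []                        # principle (b): opt-out honored unconditionally
        feed = feedparser.parse(url)
        # principle (c): carry along any license the publisher declared, if the
        # parser exposes one, so downstream display code can respect it
        license_url = feed.feed.get("license", "")
        items = []
        for entry in feed.entries:
            items.append({
                "headline": entry.get("title", ""),
                "digest": entry.get("summary", "")[:DIGEST_CHARS],  # principle (a)
                "link": entry.get("link", ""),   # always send readers back to the source
                "license": license_url,
            })
        return items

The key design choice is that the digest is derived from, and deliberately much smaller than, the full entry, and that a link back to the source is always carried along so readers click through.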

(A broken record, I know, but my disclosures apply here big-time, as with other posts over the past two days.)

Follow-up to RSS and Copyright

Mikel.org has a post
that is right on — where Top10 Sources or anyone else makes a mistake
in republishing an RSS feed that is subject to a (cc) license, it
should fix that mistake fast.  (It’s possible, of course, for
someone to give license to do something beyond what the (cc) license
says, so there may be other facts in play here; but the core point
remains.)  The human-and-technical system may be fallible, but things that slip through the cracks of the policy should be corrected promptly.  Michael also notes that Top10 Sources
itself should have an outbound (cc) license, especially to Share-Alike,
where Top10 Sources has the right to do so.  Again, that’s right,
and should (will!) be fixed. 

(Update: with thanks to Michael for pointing it out, and to the Top10 Sources team for quick turnaround, the changes have been made to the site.)

RSS and Copyright, circa 2006

There’s been a flurry of posts and comments on a topic I’ve long been
watching, which is the status of copyright and syndication
technologies.  It’s arisen this time in the context of a project
that I’m involved with outside of my Harvard work, called Top10 Sources
(see my disclosures
for more; it’s important to note that what you read below is
potentially colored by my obvious interests here, though I believe I’ve
been 100% consistent on the merits of this argument since I became
involved in the discussion).  

The issue, raised by a few respected members of the blogosphere (Mike Rundle, Om Malik, and Adam Green, among others), is whether Top10 Sources
is doing something that violates copyright or, separately, is doing
something that is outside of “the bounds of accepted aggregator
behavior” (perhaps related to the furor over splogs).  My view is
that the site is doing neither.  I believe also that this issue is
a very important one to vet fully, as a community, because this debate is
going to recur and recur until we sort it out.

What Top10 Sources does is to introduce readers who
ordinarily don’t spend all their time reading blogs into the
medium.  The idea is to offer a directory of reading lists,
available as web pages and as OPML files, as well as a quick synopsis
of what each of the chosen sites is saying.  Top10 Sources is
meant to be helpful to the RSS-offering community by directing readers to great content, getting people to subscribe to their feeds, and getting people clicking through to their blogs.  The Top10 Sources editorial
group also ends up learning about communities built around ideas. 
(Soon, Top10 Sources will enable anyone to create their own, competing
Top10 lists and upload them to the site, which will add another
dimension to the analysis below.)

On the copyright matter: what Top10 Sources does is instructive as to whether it’s lawful.  First, an editor, as part of an editorial team, chooses a topic, spends
a LOT of time in the community of people writing about this topic,
consults some technical metrics for the sources, and chooses 10 online
sources (defined simply as offering a feed syndicated using some flavor
of RSS) that cover that topic.  The editors periodically repeat
this process, taking one source off the list when a voice fades or
stops covering the same topic, and adding a new voice as it emerges as
important and topical.  The point is to create a human-edited
Reading List by topic, and to contribute those sources into a
human-created, limited search engine.  

As the editor compiles the site, he or she sends out an e-mail to the person who appears to be responsible for each chosen source or, sometimes, posts a comment to say that the site has been chosen.  Top10 Sources renders a list of those sites offering the feeds as direct links to the page.  The site also subscribes to those feeds and renders them
all together on a single page.  It is this latter activity that I
take to be the concern.  

The issue raised here is whether it is a copyright violation to render
these syndicated feeds in this way.  As a matter of copyright law,
I contend that it is not.  The strong form of the pro-copyright
argument runs like this: the creator of the RSS feed retains,
automatically, all copyrights in the content in the feed and retains
all rights in its republication, use as a derivative work, and so
forth.  Given that those rights have been retained fully by the
creator of the site, the argument goes, it is unlawful for someone —
presumably in a commercial context — to republish that copyrighted
content without a license to do so.  This is the Web 2.0 variant of
the argument that is litigated frequently in the context of web-based
content, with plaintiffs like the RIAA and the MPAA (in the p2p
context), the publishers (like McGraw-Hill, or Perfect 10) who are
suing Google, and the like.  

Though I don’t believe this to be the end of the story, to be fully
responsive to this argument, Top10 Sources offers to remove any feed
chosen as a top source immediately.  So far, out of the 1,500+
sources chosen to be included in one of the site’s recommended
“Reading Lists,” only two sites have asked to be removed, both owned by
the same copyright holder.  (Out of deference to them, I won’t
list them here, though the publisher is well-known for his stance on
this topic, which I respect.)  So, as an open invitation: anyone included in one of those lists who wishes to be taken out can just write to us, per the terms of service linked in the footer of every page, which includes a section on Copyright.  

Why this is not the end of the story is that there are several other
factors to consider.  One is a defense of fair use, which is a
four-factor test that excuses some activity that would otherwise be
unlawful.  Another is the concept of implied license: why, after
all, would someone in fact offer an RSS feed if they did not want to be
included in aggregators?  As an empirical matter, the fact that far fewer than 1% of the sources that Top10 Sources has included have complained about inclusion suggests a norm around what people expect when they decide to syndicate their content.  For a broader sample, consider all of the aggregators in the market, whether public or private, which now number in the hundreds, and the fact that we have not yet had a train wreck around the presentation of content in these web pages.  Yet another factor is that many people have written in, asking to be added to the aggregators.

This is so because, fundamentally, RSS is ads.  As Dave Winer has written, “RSS itself is an advertising medium, if you use it correctly.”  Or, as Mitch Ratcliffe put it, “RSS is not content, it’s a channel.”  Many public aggregators serve as a place to run these ads, or as a TV Guide to these new channels.  Some people also embed ads in their feeds, presumably so that those ads will run in other places and be seen or clicked through.  Another way to put it: “People come back to places that send them away.”  (Recall what happened to the AOL walled-garden model.)

If a publisher of RSS feeds thinks of it differently, that publisher
has options.  First, the publisher can and should put a license in
the feed that says what they want people to do or not to do with their
feeds.  Creative Commons licenses, as I’ve argued on this blog,
are the way to
go — to embed them into the RSS feeds when they go out, with clear
instructions for your intent.  If you want people to run your feed in private aggregators but not in for-profit public aggregators, to re-offer your content just as you’ve offered it, and to attribute authorship to you, why not add a BY-NC-SA license to your feed?  Second, the publisher of the source can, as some have done, make clear on their blog, or by writing to those who aggregate or allow others to aggregate their content, that they do not want it aggregated, pursuant, for instance, to the DMCA 512 procedures.  If an aggregator does not abide by your wishes, then the publisher can seek to assert a copyright complaint via the courts or otherwise.  But to switch the presumption, somehow, back to a strong form of the copyright argument would do far more harm than good.
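
To show how light-weight the first of those options is, here is a rough sketch of stamping a Creative Commons BY-NC-SA license onto an RSS 2.0 feed, using Python’s standard library and the creativeCommons RSS module namespace commonly used for this purpose; the channel details are placeholders of mine:

    import xml.etree.ElementTree as ET

    # Namespace of the creativeCommons RSS 2.0 module commonly used for feed licenses.
    CC_NS = "http://backend.userland.com/creativeCommonsRssModule"
    ET.register_namespace("creativeCommons", CC_NS)

    LICENSE_URL = "http://creativecommons.org/licenses/by-nc-sa/2.5/"

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "My Weblog"             # placeholder
    ET.SubElement(channel, "link").text = "http://example.com/"    # placeholder
    ET.SubElement(channel, "description").text = "Posts from my weblog"

    # The license element, at the channel level, states your intent to aggregators.
    ET.SubElement(channel, f"{{{CC_NS}}}license").text = LICENSE_URL

    print(ET.tostring(rss, encoding="unicode"))

Aggregators that parse the feed can then read the license URL and behave accordingly, which is exactly the sort of machine-readable statement of intent argued for above.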

I’ve been worried about this issue since early in 2003.   
Is history repeating itself?  Is the blogosphere arguing itself
right into a trainwreck of the sort that has played out over music and
movies?  Consider the world that A (prominent) VC envisions, here  and here,
wherein content is micro-chunked and syndicated.  This world
cannot emerge if every plausible copyright claim is asserted and
litigated.  Do we want to head toward what Lawrence Lessig has called a “permission culture,” where every use of syndicated content must be pre-approved?

OK, so maybe you don’t like the micro-chunked and syndicated version of
the future.  Even without that version of the future, the rights
in syndicated content should be clarified.  There’s no doubt that
common practice is to share the content that you are syndicating for a
wide variety of uses.  That’s the default that has emerged. 
Simple, clear, online licenses should demarcate those feeds that are not meant to be consumed broadly in such a fashion, before the train wreck hits.

Back to Top10 Sources, I expect to take up this issue again with the
management team.  I don’t think there’s anything being
done wrong from the perspective of the law.  But we should take up
for discussion some of the
ethical issues that Mike Rundle and Om Malik raise and the suggestions that Adam Green makes about how much of a given feed the site republishes — maybe a truncated version of the feeds is the right
thing to render.  The point is not to “steal” someone’s content,
but rather to direct readers to that person’s content after giving a
snippet of it.  Perhaps the right answer is to limit how much of a
feed’s offering is republished in the aggregator.

The broader issue of RSS and copyright remains.  The community is speaking, to a large extent, by creating a norm around syndication and aggregation, which is very important.  It would be a great shame if
the terrific changes being wrought by online publication, syndication,
and aggregation were to be brought down by an aggressive (and in my
view, wrong) reading of the world’s copyright laws.  As my friend
and colleague Jonathan Zittrain might say, the Internet and its
communities have a terrific way of “self-healing.”  This topic is
a great one for the Internet community to solve on its own before it
becomes a (self-)destructive fight.

(Addendum: Adam Green responds, with helpful annotation.  For the record, Adam, you were wildly overqualified for that Extension School class.  It was no doubt the right decision to have dropped.)

Two follow-ups to Berlind Tuesday at Berkman

For those who missed it, David Berlind submitted to an interview — how’s that for being a good sport when the tables are turned! — for the Berkman homepage blog.  Here’s a [snip]:

Question: You mentioned in today’s luncheon series that you had an idea for a transparent workflow for journalists, but that the software sucked. What would that look like?

David Berlind: Just make it easier to encode raw material and transmit it to people who might want to take a look.  The technologies exist.  They’re just not glued together in a way that takes the friction out. For example, a typical blogging system has all the RSS you’d ever need. But, if one source of your material as a journalist is e-mail, just try moving your e-mails into the blogging system so “watchdogs” can get at that source material via RSS.  It’s doable.  But it’s so burdensome that you give up trying (especially when you think about how journalists have to work harder and faster, etc., going back to what we have to do to survive in the first question).  The last thing we need is something else that takes our precious time.  With the press of two or three buttons, you could record a phone interview and publish it into an RSS feed.  But someone has to design the software to make it that simple.
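
(To make David’s point concrete: here is a rough sketch, in Python, of the kind of glue he is describing, taking a saved e-mail and pushing it into a weblog, and thus into its RSS feed, via the MetaWeblog XML-RPC API that most blogging systems support.  The endpoint, blog id, credentials, and file name are placeholders of mine, not anything David mentioned.)

    import email
    import xmlrpc.client

    ENDPOINT = "http://example.com/blog/xmlrpc.php"   # placeholder XML-RPC endpoint
    BLOG_ID, USERNAME, PASSWORD = "1", "reporter", "secret"

    # Parse the raw e-mail that serves as source material.
    with open("source-material.eml", "rb") as f:
        msg = email.message_from_binary_file(f)

    body = msg.get_payload(decode=True) if not msg.is_multipart() else \
           msg.get_payload(0).get_payload(decode=True)

    post = {
        "title": "Source material: " + (msg["Subject"] or "(no subject)"),
        "description": (body or b"").decode("utf-8", errors="replace"),
    }

    server = xmlrpc.client.ServerProxy(ENDPOINT)
    post_id = server.metaWeblog.newPost(BLOG_ID, USERNAME, PASSWORD, post, True)
    print("Published as post", post_id)

It is indeed doable; the friction David describes lies in gluing the pieces together, not in any one of them.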

David also posted afterthoughts on one of his several blogs over at ZDNet.  Adam Green had this to say after lunch. 

A great Tuesday guest; thanks, David.  (This coming week: Dan Gillmor.)

David Berlind at the Berkman Center

We’ve got David Berlind, executive editor of ZDNet, here as part of our Tuesday luncheon series at the Berkman Center today.  We’re hoping to rope him into JZ’s cyberlaw class (to talk ODF, SCO, and the like) and into the fellows’ meeting this afternoon as well.  His 75-minute tour-de-force ranged from the software industry’s past and predictions about its future down to the specifics of certain XML formats, DRM, and open standards.  My
guess is the podcast of the lunch, though long, will be a good one.  (Audio webcast is here now, if you are tuning in at midday, Boston time, on Tuesday, January 10, 2006.)

Nart Villeneuve on Filtering in First Monday

Our truly wonderful colleague in the ONI, Nart Villeneuve,
director of technology at the Citizen Lab at the University of Toronto
(and, truth be told, a key element of the brains behind all filtering
research), has a timely new article in First Monday on filtering. 

His abstract: “Increasingly, states are adopting practices aimed at
regulating and controlling the Internet as it passes through their
borders. Seeking to assert information sovereignty over their
cyber–territory, governments are implementing Internet content
filtering technology at the national level. The implementation of
national filtering is most often conducted in secrecy and lacks
openness, transparency, and accountability. Policy–makers are seemingly
unaware of significant unintended consequences, such as the blocking of
content that was never intended to be blocked. Once a national
filtering system is in place, governments may be tempted to use it as a
tool of political censorship or as a technological “quick fix” to
problems that stem from larger social and political issues. As
non–transparent filtering practices meld into forms of censorship the
effect on democratic practices and the open character of the Internet
are discernible. States are increasingly using Internet filtering to
control the environment of political speech in fundamental opposition
to civil liberties, freedom of speech, and free expression. The
consequences of political filtering directly impact democratic
practices and can be considered a violation of human rights.”

A relevant finding for the swirling debate over China and the role of
US corporations: “Countries such as Iran, Saudi Arabia, United Arab
Emirates (UAE), Tunisia, Yemen and Sudan all use commercial filtering
products developed by U.S. corporations.”