Guan’s blog

CDN with TCP anycast lite

17 Mar 2011

This is an idea for how to construct a better CDN. I don’t know if anyone has thought of this before. It’s a boring technical discussion, so it goes under the fold.

A CDN is supposed to route client requests to the “nearest” node in the network. There are two common ways to identify the nearest node:

  1. When the client makes the initial DNS request, have it be served from a DNS server that uses IP anycast. This means that there are many different BGP announcements for the DNS server’s IP address, generally one for each CDN node. The DNS server reached by the client then returns the unicast IP address for the CDN node right next to it. This method returns the CDN node nearest to the client’s DNS server, as determined by BGP.
  2. Give everyone the same IP address, which points to a webserver that looks up the user in a GeoIP database. The client is directed to the geographically nearest CDN node with an HTTP redirect.

The HTTP redirect method works poorly when the GeoIP database has missing or inaccurate information about the client’s IP address, or when geographical distance corresponds poorly to network distance. Anycast DNS routing usually works well, but performs badly when the client is far from its recursive DNS resolver. This can happen if a geographically dispersed ISP has a central DNS cache, or when users use OpenDNS or Google Public DNS. For example, there have been stories of large Apple update downloads being slow because many Google Public DNS users reach the same Akamai node.
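
To make the GeoIP method concrete, here is a minimal sketch of the "geographically nearest node" lookup. The node labels, the coordinates and the function names are all illustrative assumptions, and a real system would first need a GeoIP database to turn the client's IP address into a location.

```python
import math

# Illustrative node locations (latitude, longitude in degrees); not a real deployment.
NODES = {
    "nyc": (40.71, -74.01),
    "lon": (51.51, -0.13),
    "tyo": (35.68, 139.69),
}

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def nearest_node(client_location, nodes=NODES):
    """Return the label of the geographically closest node."""
    return min(nodes, key=lambda name: haversine_km(client_location, nodes[name]))
```

Note that this picks the node with the shortest great-circle distance, which is exactly the weakness described above: it knows nothing about how the networks are actually connected.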

TCP anycast is a third, much less common, way to run a CDN. As far as I know only CacheFly and MaxCDN use it. With anycast TCP, the geographically dispersed content servers are accessed through a single IP address within an anycast announcement. When you make a request via HTTP or RTMP, BGP will make sure your request reaches the nearest CDN node, and content servers in that node will serve your content back.

The problem with TCP anycast is that TCP is a stateful protocol and the servers will get confused if, through the vagaries of BGP, subsequent packets within the same TCP connection reach a different CDN node. The presentation I linked to above claims that TCP anycast works very well, even for long-lived connections. However, it is still not widely adopted. CacheFly probably employs some very talented network engineers. It just seems easy to get wrong.
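
The failure mode can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `Node` class and its `receive` method are hypothetical, and a real TCP stack tracks far more state than a set of connection identifiers. The point is only that a node which never saw the handshake has no record of the connection and will reject later packets.

```python
class Node:
    """Toy CDN node that remembers which connections it has accepted."""

    def __init__(self, name):
        self.name = name
        self.connections = set()  # connection identifiers seen in a SYN

    def receive(self, conn_id, kind):
        """Return the node's reply to a packet of the given kind."""
        if kind == "SYN":
            self.connections.add(conn_id)
            return "SYN-ACK"
        if conn_id in self.connections:
            return "ACK"
        # No state for this connection: a real stack would answer with a reset.
        return "RST"


# Hypothetical client and anycast addresses (documentation ranges).
conn = ("198.51.100.7", 40000, "192.0.2.1", 80)
nyc, lon = Node("nyc"), Node("lon")

handshake = nyc.receive(conn, "SYN")      # routed to New York
same_node = nyc.receive(conn, "DATA")     # route unchanged
other_node = lon.receive(conn, "DATA")    # route flipped to London mid-connection
```

Here `other_node` ends up as a reset, which from the client's point of view is a broken download.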

My proposal is a simpler variant of TCP anycast. It involves an HTTP redirect, which some may find unacceptable, but which could work for a lot of CDN users:

  1. All users get the same IP address, in an anycast announcement.
  2. That IP address points to a webserver in the nearest CDN node. It immediately redirects at the HTTP level to a different hostname/IP address that is node specific. For example, if I request http://cdn.guan.dk/, I will reach the anycast web server at the New York node, which will redirect me to http://nyc.cdn.guan.dk/, which points to an IP address in a unicast announcement.
  3. The content server at nyc.cdn.guan.dk serves up content without interruption.

This setup uses TCP anycast to determine the client’s location and which CDN node to use, but it does not rely on routes staying stable over a long-lived TCP session. Routes only need to hold for the time it takes to redirect the user, hopefully less than 100 ms and about 8 to 10 packets.
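
The redirect step could be sketched with Python's standard `http.server` module as follows. This is a minimal illustration, not a production design: `NODE_LABEL`, `node_url`, `AnycastRedirectHandler` and `serve` are hypothetical names, and I am assuming each node's anycast front end is simply configured with its own label.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: every node runs this front end with its own label configured.
NODE_LABEL = "nyc"
BASE_DOMAIN = "cdn.guan.dk"

def node_url(node_label, path, base_domain=BASE_DOMAIN):
    """Build the node-specific (unicast) URL for a request path."""
    if not path.startswith("/"):
        path = "/" + path
    return f"http://{node_label}.{base_domain}{path}"

class AnycastRedirectHandler(BaseHTTPRequestHandler):
    """Answers on the anycast address and immediately redirects to this node."""

    def do_GET(self):
        # Temporary redirect, so the client asks the anycast address again next time.
        self.send_response(302)
        self.send_header("Location", node_url(NODE_LABEL, self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_HEAD = do_GET

def serve(port=8080):
    """Run the redirector until interrupted."""
    HTTPServer(("", port), AnycastRedirectHandler).serve_forever()
```

Calling `serve()` on the New York node would answer a request for `/video.mp4` with a redirect to `http://nyc.cdn.guan.dk/video.mp4`. The response has no body, which keeps the anycast exchange short.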

Reactions to the New York Times paywall

17 Mar 2011

Cory Doctorow:

This won’t work.

Erick Schonfeld:

That’s all fairly reasonable and forward-thinking. But there is one part of the pricing plan that is wrong-headed. It discriminates by device. Depending on what device you read the paper on, you will be charged differently for an all-digital subscription. The pricing plans start at $15 a month for Web access plus iPhone, Android, or other smartphone apps. On the iPad or other tablets, it will cost $20 a month. And if you want to switch between the Web, phone, and tablet, that will cost you $35 a month.

Felix Salmon, after estimating how much revenue the paywall will bring in:

That’s extra revenues of $24 million per year. $24 million is a minuscule amount for the New York Times company as a whole; it’s dwarfed not only by total revenues but even by those total digital advertising revenues of more than $300 million a year. This is what counts as a major strategic move within the NYT? … So by my back-of-the-envelope math, the paywall won’t even cover its own development costs for a good two years, and beyond that will never generate enough money to really make a difference to NYTCo revenues. Maybe that might change if the NYT breaks its promise to offer full website access for free to all print subscribers. But that decision would be fraught in all manner of other ways. For the time being, though, I just can’t see how this move makes any kind of financial sense for the NYT. The upside is limited; the downside is that it ceases to be the paper of record for the world. Who would take that bet?

Update: Kevin Drum:

It’s true, as Felix says, that a rough calculation suggests that the paywall won't initially generate a ton of revenue for the Times. Still, I really don’t see a business model going forward in which companies like the Times continue to lose print subscribers as they give away their product online. One way or another, news readers have to get used to paying for content that they use heavily, and they might as well start getting used to it now. After all, if the Times, which is easily the best general purpose news outlet in the country, can’t convince people to pay for their stuff, then who can?

But what is the point of making news readers pay if that doesn’t actually make you money? Pride? Most print subscribers pay less than it costs to print and distribute the newspaper, but there are sound business reasons for making them pay anyway. Those reasons may also apply to some websites, but I’m not sure the Times qualifies.

Update 2: Tyler Cowen:

Don’t ask me to explain all the details of the pay wall system (can’t people set up rotating faux blogs and tweets, rich with daily NYT links, to get around the limits?), but I know there will be an articles quota, twenty per month. So the new NYT incentive is to have more than twenty must-read articles each month. Maybe they’ll hire Bill Simmons. … The NYT arguably will be running fewer cliched or predictable or easily substitutable articles. It should make the paper less comprehensive, but sharper at the edges. The incentive of NYT writers to keep blogs — so people can access their columns easily — will go up.

On that last point, the incentive for NYT writers will be to keep blogs outside nytimes.com because most of the official blogs will be behind the paywall.

Which NYT blogs will be outside the paywall?

17 Mar 2011

As previously rumored, the many New York Times blogs will also be behind the paywall:

Visitors can enjoy 20 free articles (including blog posts, slide shows, video and other multimedia features) each calendar month on NYTimes.com

Blog front pages will still be accessible, but anything under the fold is paywalled. There are two exceptions, DealBook:

To honor our commitment to our loyal DealBook readers, all our articles will continue to be accessible without a digital subscription.

And The Learning Network:

To honor The Times’s longstanding commitment to educators and students, this blog and all its posts, as well as all Times articles linked from them and from our Twitter and Facebook accounts, will be accessible without a digital subscription.

None of the other blogs have a notice like these. (I actually had to click through every one of them to check. Some of them haven’t been updated in ages.)

Some notable non-exceptions are the two local blogs: Fort Greene and Clinton Hill, a partnership with the CUNY Graduate School of Journalism, and East Village, a partnership with NYU’s Carter Journalism Institute.

I am also surprised that the op-ed columnist blogs (Ross Douthat, Nicholas Kristof and Paul Krugman) are not excluded. Will they disappear from the conversation?

Financial innovation counterfactuals

15 Mar 2011

Josh Lerner and Peter Tufano have a new paper (I can’t find an ungated copy) proposing a research agenda on the welfare implications of financial innovation. The approach involves considering counterfactual histories in which the innovations never occurred.

I’m not sure what to think of this as a research agenda, but there are some interesting case studies in the paper itself, which consists largely of narrative analysis. The inspiration comes from Robert Fogel:

Just two years later [1964], Robert W. Fogel, a future Nobel laureate in economics, published his masterpiece Railroads and American Economic Growth. In it, Fogel advanced a method, now used in history, political science and economic history, to consider counterfactual histories.

Fogel combines counterfactual reasoning with empirical estimates of development. He compares observed GDP increases with three counterfactuals: no railroads at all, an extension of internal navigation (canals), and the improvement of country roads.

They believe that the impact of venture capital on social welfare has generally been positive, citing statistics on innovation, product market strategies and startup outcomes, as well as stock market outcomes:

In late 2008, 895 firms were publicly traded on U.S. markets after receiving their private financing from venture capitalists … By late 2008, venture-backed firms that had gone public made up over thirteen percent of the total number of public firms in existence in the United States at that time. And of the total market value of public firms ($28 trillion), venture-backed firms came in at $2.4 trillion—8.4 percent.

They also believe that mutual funds, index funds and ETFs were beneficial to investors, while evidence on the benefits of securitization, compared to plausible counterfactuals, seems to be much more mixed.

The consumer’s dilemma license

15 Mar 2011

The Social Science Research Council has a seemingly excellent 440-page report on Media Piracy in Emerging Economies. It has an interesting license for the electronic version:

  • US$8 for non-commercial use in high-income countries—a list that for the present purposes includes the USA, Western Europe, Japan, Australia, Israel, Singapore, and several of the Persian Gulf States (Kuwait, Qatar, the United Arab Emirates, Brunei, and Bahrain), but not Canada.
  • Free for non-commercial use outside the above-listed high-income countries.
  • US$2000 for commercial use, defined as use by businesses that realize financial gain from film, music, software, or publishing, and/or the enforcement of copyrights thereof, with annual revenues greater than US$1 million. Volume licensing is available.

(Emphasis added.) (Hat tip Randy Picker.)

They later clarified the license to define journalism as non-commercial activity and reserve a free copy for Chris Dodd.

You may interpret this license as a comment on copyright, piracy and intellectual “property”, but I see it as a dig at Canada and heartily approve.

I did pay the $8. I’ll report back when I have read the report.