Network Working Group                                           J. Arkko
Internet-Draft                                                  Ericsson
Intended status: Informational                             July 18, 2019
Expires: January 19, 2020
Architectural Considerations on Serving Web Content in a Content Aggregation Fashion
draft-arkko-arch-web-packaging-00
News aggregators and search engines have used various formats to enable them to republish web resources. These formats have included Google’s AMP, Facebook’s Instant Articles, and Apple’s News Format, as well as newer developments such as the Web Packaging proposal. This memo discusses the architectural considerations in these systems, with respect to which aspects of content delivery the different parties can or cannot control.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 19, 2020.
Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
News aggregators and search engines have used various formats to enable them to republish web resources. These formats have included Google’s AMP, Facebook’s Instant Articles, and Apple’s News Format.
The technical reasons behind these formats and associated mechanisms include a desire to improve performance, e.g., by pre-fetching content from the distributing site. There are some issues, however, including giving up significant control to the aggregator site. For instance, the aggregator can modify content, the original publisher’s URL may not be visible, and the publisher cannot always control how and when the content is displayed or how access to it is controlled.
Web Packaging is one proposal to address some of these issues. The basic idea is that content should be presented to users as if it were obtained from the original site, no matter where it was actually fetched from, while users can still be assured that the content has not been modified.
This memo discusses the architectural considerations in these systems, with respect to which aspects of content delivery the different parties can or cannot control.
In a traditional setting, the publisher holds the infrastructure needed to serve the content. They are in charge of the integrity and confidentiality of the information served, as well as all mechanisms related to access control, analytics, localization, advertisements, etc.
Optionally, it used to be possible for transparent proxies to cache content. This allowed re-delivery from another entity, but relied on HTTP transactions being in the clear, and allowed the proxy to modify the content. Compression for specific clients or networks was a common practice in these systems. Given that HTTPS has become almost universally employed for web connections today, this type of caching is no longer possible in the general case.
In the Content Delivery Network (CDN) model, some or all of the infrastructure needed to serve the content is outsourced to a separate service. This is economically desirable in many cases, because a large CDN provider can amortize the cost of a world-wide distributed delivery network across its many customers.
Similarly, a CDN model allows a publisher to outsource the delivery function to someone else who may be more focused on that function.
The downside of the CDN model is that, typically, the publisher needs to give the CDN the TLS keys used to represent its website (e.g., “www.example.com”). There have been some proposals, e.g., LURK, to reduce the impact of this. But in any case, a CDN will be able to serve any content to users, even modified content, and will be able to see all content. However, the CDN provider and the publisher have a business agreement, which obviously discourages unilateral actions by the CDN.
In the Web Packaging model, a resource, a web page, or even an entire site can be packaged in a manner that allows it to be stored, shared, and re-distributed. Some of the motivations for doing this include:
Variants of the web packaging model include technologies focused on the priorities of content aggregators, which allow the aggregators to perform all tasks (e.g., content validation, identifying marks, or even modification) at will, without giving publishers much room to secure or control the content. Accelerated Mobile Pages (AMP) [AMP] falls into this category, for instance. The newest Web Packaging model being standardized enables, for instance, the publisher to sign content, and both the aggregator and the users to verify that the content actually comes from the publisher [I-D.yasskin-http-origin-signed-responses] [I-D.yasskin-wpack-bundled-exchanges].
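The signing property described above can be illustrated with a minimal sketch. It assumes a textbook RSA signature over a SHA-256 hash of the claimed origin URL together with the content; the primes, the function names (`publisher_sign`, `verify`), and the example URLs are hypothetical and chosen only for illustration. The parameters are deliberately small and publicly known, so the sketch is insecure, and it does not follow the wire format defined in [I-D.yasskin-http-origin-signed-responses]. It only shows that whoever holds the signed package can redistribute it, but cannot alter the content or its claimed origin without verification failing.

```python
import hashlib

# Toy textbook-RSA key (INSECURE, illustration only; requires Python 3.8+).
# Both primes are well-known Mersenne primes, so anyone can recompute the
# private exponent. A real system would use a vetted signature scheme.
P = 2**61 - 1
Q = 2**31 - 1
N = P * Q
E = 65537                           # public exponent, known to everyone
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, held by the publisher


def digest(url: str, body: bytes) -> int:
    """Hash the claimed origin URL together with the content, reduced mod N."""
    h = hashlib.sha256(url.encode("utf-8") + b"\n" + body).digest()
    return int.from_bytes(h, "big") % N


def publisher_sign(url: str, body: bytes) -> int:
    """Publisher side: sign (url, body) with the private exponent."""
    return pow(digest(url, body), D, N)


def verify(url: str, body: bytes, signature: int) -> bool:
    """Aggregator or client side: check using only the public key (N, E)."""
    return pow(signature, E, N) == digest(url, body)


if __name__ == "__main__":
    url = "https://www.example.com/article"   # hypothetical publisher URL
    body = b"<html>original article</html>"
    sig = publisher_sign(url, body)
    # The package (url, body, sig) may now be served by any aggregator or cache.
    print(verify(url, body, sig))                              # True
    print(verify(url, b"<html>modified</html>", sig))          # False: content altered
    print(verify("https://evil.example/article", body, sig))   # False: origin changed
```

Note that the aggregator in this sketch still sees the content in the clear: signing provides integrity and origin authentication, not confidentiality.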
The key questions in this space are really about power and control. Who has the power to:
The different approaches discussed earlier allocate this control differently among the players.
In the long term, who controls particular aspects will drive the architecture of the web and the evolution of the business ecosystem.
It seems evident that there is a need to provide better means for publishers to control the aggregation that their content is involved in, both in terms of its detailed look and feel, and in terms of securing and controlling access to it in appropriate ways.
While some of the newest proposals go a long way towards resolving some of the concerns, they do not address everything. For instance, Web Packaging does not provide confidentiality against the content aggregator. As Barnes and Cooper argue in [BA], there are design examples showing that confidentiality can also be provided when content is delivered through proxies.
The discussion is also (perhaps naturally) focused on current large-scale web traffic. There may be other use cases for security that is not tied to a transport session, however. These use cases may matter a lot as well, e.g., IoT devices that wish to deliver or receive information but are not reliably connected and instead depend on relays to forward information packages. Today such arrangements typically involve relays that can read and modify all of that content.
Also, it is essential that the Internet not be designed for centralization. [RFC1958] says “This allows for uniform and relatively seamless operations in a competitive, multi-vendor, multi-provider public network”. And [RFC3935] says “We embrace technical concepts such as decentralized control, edge-user empowerment and sharing of resources, because those concepts resonate with the core values of the IETF community.”
Many of the developments discussed in this paper trace back to peer-to-peer networking and to enabling faster access to content in underserved areas. But a bigger factor may be the tussle for control between different parties: in the end, whether a user gets their page from the search result provider or from the original publisher does not significantly affect how much material needs to be downloaded. Peer-to-peer and Information-Centric Networking are potentially very useful technologies, but it seems that their role in this particular case is perhaps exaggerated. In particular, in the author’s opinion, the portrayal of the issues in the web packaging space as a tradeoff between high efficiency and keeping power on the publisher side is false.
The opinions expressed in this paper are solely the author’s opinions, and subject to change.
The author would like to thank IAB members and the participants of the 2019 Exploring Synergy between Content Aggregation and the Publisher Ecosystem (ESCAPE) IAB workshop held in Herndon, Virginia, USA.
This paper was particularly influenced by the workshop papers [BE], [DN], and [RE].
[AMP]      AMP, "AMP HTML Specification", AMP Open Source Project, 2019.
[BA]       Barnes, R. and A. Cooper, "Protecting Content from the Cache", Position paper in the IAB ESCAPE workshop, July 2019, Herndon, Virginia, USA.
[BE]       Berjon, R., "ESCAPE: The New York Times Position", Position paper in the IAB ESCAPE workshop, July 2019, Herndon, Virginia, USA.
[DN]       DePuydt, M. and M. Nelson, "Signed Exchanges and The Importance of Trust in Aggregator/Publisher relationships", Position paper in the IAB ESCAPE workshop, July 2019, Herndon, Virginia, USA.
[I-D.yasskin-http-origin-signed-responses]  Yasskin, J., "Signed HTTP Exchanges", Internet-Draft draft-yasskin-http-origin-signed-responses-06, July 2019.
[I-D.yasskin-wpack-bundled-exchanges]  Yasskin, J., "Bundled HTTP Exchanges", Internet-Draft draft-yasskin-wpack-bundled-exchanges-01, July 2019.
[RE]       Rescorla, E., "Ecosystem Impacts of Web Content Syndication", Position paper in the IAB ESCAPE workshop, July 2019, Herndon, Virginia, USA.
[RFC1958]  Carpenter, B., "Architectural Principles of the Internet", RFC 1958, DOI 10.17487/RFC1958, June 1996.
[RFC3935]  Alvestrand, H., "A Mission Statement for the IETF", BCP 95, RFC 3935, DOI 10.17487/RFC3935, October 2004.