Gillian Spraggs @ 2009-07-24 00:24:00
Google Book Settlement: the Likelihood of Piracy
For earlier posts on the Google Book Settlement, see:
Google Book Settlement: the Background
How the Google Book Settlement affects European authors and rights-holders: 1
How the Google Book Settlement affects European authors and rights-holders: 2
Google Book Settlement: the Proposed Book Rights Registry 1
Google Book Settlement: Some Clarifications
Google Book Settlement: the Proposed Book Rights Registry 2
Google Book Settlement and 'Unclaimed Funds'
***
19. Under the Settlement, Google Inc. proposes to deal with works differently depending on whether or not they are defined as ‘commercially available’ according to the terms of the Agreement. The Agreement defines a book as ‘commercially available’ at a given point if the rights holder or his or her licensee is offering it for sale new in the United States ‘through one or more then-customary channels of trade’. In that case Google will classify the book as ‘in print’ and will not make any ‘display uses’ of it, such as providing previews to searchers, including it in institutional subscriptions, or allowing consumer purchase of online access to it.
The definition of ‘commercially available’ has caused alarm among foreign publishers, since it seems to imply that books in print but not published or directly distributed in the US would be made available by Google to searchers (in preview) and customers (for online access), unless and until the rights-holders registered the works at issue with the Book Rights Registry and changed the settings, or applied to have them completely removed from the book corpus. However, following consultation with the lawyers who negotiated the Settlement on behalf of the AAP, the Publishers Association of the UK has reported that Google plans to classify any book as ‘commercially available’ if it can be purchased new from within the US through a website.
[http://www.googlebooksettlement.com/intl/en/Settlement-Agreement.pdf, §§ 1.28, 1.47, 1.48, 3.2.(b), 3.2.(d);
http://www.publishers.org.uk/download.cfm?docid=4A07F799-400E-41CA-980B103898782A4B;
see also http://www.copyright.com/media/pdfs/Healyinterview.pdf]
If Google were to make a mistake in determining whether a book is available in the US, the Agreement lays down one ‘sole remedy’: Google must correct the mistake within 30 days. A lot of damage might be done to the value of a copyright in 30 days. If a rights-holder were to pursue a dispute against Google, over a mistake or a disagreement, their only recourse under the Settlement would be to submit to arbitration by an arbitrator drawn from a pool previously selected by Google and the Book Rights Registry [see § 17].
[http://www.googlebooksettlement.com/intl/en/Settlement-Agreement.pdf, §§ 3.2.(d).(i), (iv)]
20. The Settlement Agreement goes into some detail about the restrictions that will apply if a consumer should purchase a work through Google. The purchaser will be allowed to view the work online, but not to download an electronic copy, or to print more than 20 pages at a time, or to copy and paste more than four pages at a time.
This is evidently supposed to reassure rights-holders that their works are safe from piracy. The fact that the Authors Guild and the AAP on the one hand required these restrictions, and on the other hand were (presumably) satisfied by them, indicates a remarkable technological ignorance on their part, which is, perhaps, particularly strange in the publishers. To put it plainly: anything that is displayed on the screen of an ordinary computer can be copied and saved and/or printed by the user. Anyone who thinks otherwise has never investigated the use of the Print Screen (PrtScn) key on their keyboard. For greater convenience, there are screen capture programs available online, some of them free to download. Screen capture is a perfectly legal technology with legitimate uses.
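Even leaving screen capture aside, the limits as described here apply to a single operation. The following is a minimal sketch of the arithmetic, assuming that ‘at a time’ means per print or copy command and that there is no cumulative cap (the summary above does not mention one). The 300-page book and the function name are hypothetical.

```python
import math


def operations_needed(total_pages: int, pages_per_operation: int) -> int:
    """Number of separate operations needed to cover a whole book
    when each operation is capped at pages_per_operation pages."""
    if total_pages < 0 or pages_per_operation <= 0:
        raise ValueError("page counts must be non-negative and the cap positive")
    return math.ceil(total_pages / pages_per_operation)


PRINT_LIMIT = 20   # pages per print operation, as described in the post
COPY_LIMIT = 4     # pages per copy-and-paste operation, as described in the post
BOOK_PAGES = 300   # hypothetical book length, chosen only for illustration

print(operations_needed(BOOK_PAGES, PRINT_LIMIT))  # 15
print(operations_needed(BOOK_PAGES, COPY_LIMIT))   # 75
```

On these assumptions the limits add inconvenience, but they do not bound the total amount that can be printed or copied.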
In the case of browsers, the images that appear on the screen are normally saved to the browser’s cache (the folder of temporary internet files), from which they may afterwards be retrieved. It is not clear whether Google plans to grant purchasers access to their books through an ordinary browser or some kind of proprietary software. Even if it is the latter, the act of reading the book will involve downloading each page to the computer, and there will probably be some kind of cache involved (where images and other files are saved temporarily to expedite access). The last time I programmed anything it was in OPL (remember that?), but I strongly suspect it would not be hard for a knowledgeable person to devise a program that will intercept the images of the pages and save them automatically.
The Settlement Agreement states that any pages printed will have a watermark identifying them as copyright, and including ‘encrypted session identifying information … which could be used to identify the authorized user that printed the material or the access point from which the material was printed.’ This sounds very secure, but if the page images can be captured directly from the screen, a watermark applied to printed pages matters very little.
Once the images are available, it is an easy matter with modern optical character recognition software to extract the text; indeed, these days there is at least one free online service that will do this, as well as free software that can be downloaded. Even if Google embedded ‘encrypted session identifying information’ in every page, this would be lost in the conversion to text.
Kent Fitch, a programmer at the National Library of Australia, observed on his blog earlier this year that:
Given the reality of inevitable piracy of digitised books, the interests of rights holders and Google are seriously misaligned. Google has little incentive to be very worried about piracy, and in any case, they’re smart enough to know there’s nothing they can do about it. All they need is to sell 40 odd copies (or get equivalent per-book institutional subscription revenue to their book database) and they’re in the black. If the[y] sell 100, they’ve got a 200% return on investment, whereas the rights holders haven’t even covered the costs of the layout artist.
Digitised books from the Google repository will be pirated and there’s nothing that can be done about it. DRM wouldn’t help a bit, copies will be untraceable, watermarks will be removed.
He also points out that ‘printing’ can as easily be to a file as to a printer: it is ‘up to the controller of the system on which printing is done’.
[http://ltmem.blogspot.com/2009/02/google-book-settlement-doesnt-address.html]
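Fitch’s figures are round numbers, and they can be checked roughly. The sketch below assumes that revenue per copy is constant and that break-even falls at exactly 40 copies (he says ‘40 odd’). The function names are illustrative and do not come from his post.

```python
def net_return_percent(copies_sold: float, break_even_copies: float) -> float:
    """Net return on investment, as a percentage, when the whole
    investment is recovered after break_even_copies sales."""
    if break_even_copies <= 0:
        raise ValueError("break-even must be a positive number of copies")
    return (copies_sold - break_even_copies) / break_even_copies * 100.0


def break_even_for_return(copies_sold: float, target_percent: float) -> float:
    """Break-even point implied by a given net return at copies_sold sales."""
    return copies_sold / (1.0 + target_percent / 100.0)


# Assumption: break-even at exactly 40 copies.
print(net_return_percent(100, 40))                 # 150.0
# Break-even point implied by a 200% net return on 100 copies.
print(round(break_even_for_return(100, 200), 1))   # 33.3
```

On these assumptions 100 sales give a net return of 150%, and a 200% return would correspond to breaking even at about 33 copies. Either way his point about misaligned incentives stands: Google recovers its costs after a few dozen sales.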
21. Another security issue was raised in a point made from the floor at the Columbia conference. The participant noted that although the complete Book Search corpus is only supposed to be accessible from within the US, people outside the US could use a proxy server located within the US to access the service. No one responded to his point, and I think they did not understand it. However, he is right. Proxies are offered as a free service by some websites, and they make territorial restrictions meaningless. It would probably be hard, even impossible, to fool the Google Book Service into letting one open an account with it from an address outside the US, but it is likely to be easy for those so inclined to access the extra preview facilities to be offered under the Settlement.
[http://kernochancenter.org/Googlebookssettlementrecording.htm; http://media.law.columbia.edu/kernochan/kernochangoogle090313tape3t.html]
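The participant’s point can be put schematically. The sketch below assumes that the territorial restriction is enforced by looking at the network address from which a request arrives; nothing cited here says how Google would actually enforce it. The address table is hypothetical and uses reserved documentation ranges, and all the names are invented for illustration.

```python
import ipaddress
from typing import Optional

# Hypothetical table of "US" networks (a reserved documentation range).
US_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]


def appears_to_be_in_us(connecting_ip: str) -> bool:
    """The only check available to the service: is the connecting address
    inside a network it believes to be in the US?"""
    address = ipaddress.ip_address(connecting_ip)
    return any(address in network for network in US_NETWORKS)


def address_seen_by_service(reader_ip: str, proxy_ip: Optional[str] = None) -> str:
    """When a request passes through a proxy, the service sees the
    proxy's address rather than the reader's."""
    return proxy_ip if proxy_ip is not None else reader_ip


READER_OUTSIDE_US = "203.0.113.7"   # hypothetical reader outside the US
US_PROXY = "198.51.100.25"          # hypothetical proxy inside the US

direct = appears_to_be_in_us(address_seen_by_service(READER_OUTSIDE_US))
via_proxy = appears_to_be_in_us(address_seen_by_service(READER_OUTSIDE_US, US_PROXY))
print(direct)     # False
print(via_proxy)  # True
```

A check of this kind tests where the connection comes from, not where the reader is. That is why a proxy inside the US would defeat it.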