The Volokh Conspiracy
Mostly law professors | Sometimes contrarian | Often libertarian | Always independent
There's No Graphy Like Stega-No-Graphy
CNN reports, relying on a Wall Street Journal story,
Music-annotation website Genius is accusing Google of stealing lyrics from its website and publishing them in search results … thus breaking Genius' terms of service and siphoning off traffic.
The Wall Street Journal, which first reported the news, said that a Genius employee noticed the first instance in 2016. Rapper Desiigner's song "Panda" had hard-to-decipher lyrics. So, the company had the rapper transcribe the song for them—and then Genius saw their version being displayed on Google.
To show that Google was allegedly doing this, developers at Genius alternated the lyrics' apostrophes between straight and curly styles in a targeted way: When the apostrophes were converted into Morse code, it spelled out the phrase "red handed," the newspaper said. Genius said it notified Google first in 2017 and as recently as April about the practice.
Google states that it gets its lyrics from a third party, so who did the copying (if there was copying) is not clear. Steganography, for those who don't know the term, is (to quote the American Heritage Dictionary),
The deliberate concealment of data within other data, as by embedding digitized text in a digitized image.
The Morse code component, of course, was just a little bit of flair—the steganography, or, if you prefer, the watermark, would have worked regardless of how the curly quote substitution was arranged. Thanks to Prof. Mark Liberman (Language Log) for the pointer.
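For readers curious about the mechanics, here is a minimal Python sketch of the kind of apostrophe watermark described above. It is an illustration under stated assumptions, not Genius's actual scheme: it assumes, hypothetically, that a straight apostrophe stands for a Morse dot and a curly one (U+2019) for a dash, and that letter boundaries aren't encoded at all. The reports don't say how the two styles were actually mapped, and the function names are made up for this example.

```python
# Minimal sketch: hide a phrase in apostrophe styles, read it back, and test
# a suspect copy for it. ASSUMPTION (hypothetical): straight ' = Morse dot,
# curly U+2019 = Morse dash; letter boundaries are not encoded.

MORSE = {  # only the letters needed for "red handed"
    "a": ".-", "d": "-..", "e": ".", "h": "....", "n": "-.", "r": ".-.",
}
STRAIGHT, CURLY = "'", "\u2019"


def morse_stream(phrase: str) -> str:
    """Concatenate the Morse symbols for each letter, skipping spaces."""
    return "".join(MORSE[c] for c in phrase.lower() if c != " ")


def embed(text: str, phrase: str) -> str:
    """Restyle the first apostrophes in text so they spell phrase in Morse."""
    symbols = morse_stream(phrase)
    positions = [i for i, ch in enumerate(text) if ch in (STRAIGHT, CURLY)]
    if len(positions) < len(symbols):
        raise ValueError("not enough apostrophes to carry the message")
    chars = list(text)
    for pos, sym in zip(positions, symbols):
        chars[pos] = STRAIGHT if sym == "." else CURLY
    return "".join(chars)


def extract(text: str) -> str:
    """Read the apostrophe styles back out as a dot/dash stream."""
    return "".join("." if ch == STRAIGHT else "-"
                   for ch in text if ch in (STRAIGHT, CURLY))


def carries_watermark(suspect: str, phrase: str) -> bool:
    """True if the suspect text's apostrophes contain the phrase's pattern."""
    return morse_stream(phrase) in extract(suspect)


# Usage with made-up filler text containing 22 apostrophes.
lyrics = " ".join(["don't"] * 22)
marked = embed(lyrics, "red handed")
print(carries_watermark(marked, "red handed"))  # True
print(carries_watermark(lyrics, "red handed"))  # False (all straight)
```

Because the check only looks for a known dot/dash sequence, any distinctive pattern of straight and curly apostrophes would serve as well, which is why the Morse code is flair rather than substance. The obvious weakness is that a copier who normalizes every apostrophe to a single style erases the mark.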
Whoops.
I'll say this: the re-reporting of news, events, transcriptions, etc. is an increasing problem. Ironically, this is a CNN news story that reports from...the Wall Street Journal. (Does CNN do any reporting anymore, or does it just copy off other papers?)
CNN reporting is more original than your comments on this blog.
Aww, you're cute but you miss the point.
Point is, data acquisition, processing, input, and analysis, no matter the source of the data, is expensive. It takes time, resources, cross-checking, etc., whether the data source is news stories, lyrics, scientific data, or something else.
It's much cheaper and more effective to wait for someone else to do it, then report that (with a slightly different take). Few people want to do the hard work of the actual acquisition when 90% of the credit can be obtained just by re-reporting or using someone else's work.
Once upon a time, there was value to "getting to the story first". The WSJ would have the paper out a full day ahead of time, and CNN would need to wait until their next reporting session, or another paper would need to wait till their next print run. Now the re-reporting is so fast with the internet that there's not the same value, if any.
In a real sense, it's the issue that spawned copyright and patent laws, updated for the digital world. Data acquisition, processing, input, and analysis have value, even if they're not strictly copyrightable or patentable. But as long as it's much more efficient to wait for someone else to do the hard work and then re-report or use it, that will discourage the actual work in the first place.
"Once upon a time, there was value to 'getting to the story first'."
Which created some problems of its own. When journalists are racing to publish first, they're more likely to overlook details or go with incomplete facts.
Mistakes (whether from rushing to print or to air, or from something else) give politicians who don't want the truth about them reported room to claim that the news media are biased against them, and to have a substantial portion of the audience saying, "Yeah, there they go, reporting bad things about one of my guys again. They're totally biased."
This misses the point: just re-reporting stories would keep the incomplete facts and overlooked details.
And all this explains why, when I search for a news story to figure out WHAT happened, I instead get (or at least see far more prominently in search results) "X reacts to," "See what people are saying about X," or "Here's a collection of 15 'random' tweets about X." It is getting increasingly difficult to find just the basic facts of things instead of reactions to them.
It is, and it's an increasing problem.
In many ways, the single most important clause in the US Constitution is "To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries"
Is this shocking? Exploiting other people's IP is a large part of Google's business model.
I'll note that many lyrics companies publish lyrics with near homophones substituted for one or two of the actual words in the song, to the same effect.
And back in the day, Thomas Maps would deliberately make tiny mistakes on each page, for the same reason. I wonder how far back in history this practice goes? (If we look, 20,000 years ago, at the French cave paintings, will we see a bull with 3 horns, added to the scene in order to catch the plagiarist painter 2 caves down?)
...Thomas Guide maps, of course. [no edit button. sigh]
Far as I know, all map makers did the same.
In one case, the small mistake (a non-existent town) was used to make a copyright infringement claim and the defense was that the place actually existed -- which, by that point, it did. Putting the imaginary place on the map brought it into being:
https://www.npr.org/sections/krulwich/2014/03/18/290236647/an-imaginary-town-becomes-real-then-not-true-story
Back in law school, we read the case of the telephone book publishers who inserted fake names and numbers into the phone books they published to catch copiers, only to learn the hard way that collections of data are not copyrightable; only original works of creative expression are.
So the copier should have gotten in trouble for copying the creative, fake names in the phone book.
Writing down song lyrics and pretending you own that transcription reminds me of the early days of fan picture sites, where one site would "rip off" the magazine scans another site had scanned in.
"That's my hard work violating copyright. How dare you!"
It is amazing just how much information is communicated via steganography: there is an abundance of software that "robs" the least-significant bits of uncompressed and losslessly compressed image data to carry non-image information. Many governments and non-governmental organizations that would otherwise censor messages do not make the effort to examine the bit sequences within each image, and therefore miss censorship opportunities. Self-booting steganography tools, many borrowing techniques from the venerable Kermit software, are increasingly commonplace... which suggests an increasingly commonplace need to avoid censorship.
Back when I was just out of college and bought my first computer (mid-'80s), I wrote a Mandelbrot set program and, for yucks, stored the settings for each image by steganography. It isn't actually a difficult technique.
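The least-significant-bit trick described in that comment can be sketched in a few lines of Python. This is a minimal illustration, not any particular tool's method: it assumes the cover is raw 8-bit sample data (for example, already-decoded RGB bytes, so no image library is involved), and the helper names and the 4-byte length header are hypothetical framing choices.

```python
def embed_lsb(cover: bytes, message: bytes) -> bytearray:
    """Hide message in the least-significant bits of cover, one bit per byte.

    A 4-byte big-endian length header is written first so the reader knows
    how many message bytes follow (a hypothetical framing choice).
    """
    payload = len(message).to_bytes(4, "big") + message
    # Most-significant bit of each payload byte first.
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(cover):
        raise ValueError("cover data too small for this message")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the low bit, then set it
    return stego


def extract_lsb(stego: bytes) -> bytes:
    """Recover a message hidden by embed_lsb."""

    def read_bytes(start_bit: int, count: int) -> bytes:
        out = bytearray()
        for b in range(count):
            value = 0
            for i in range(8):
                value = (value << 1) | (stego[start_bit + 8 * b + i] & 1)
            out.append(value)
        return bytes(out)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)


# Usage: a fake 1,024-byte "image" of sample values 0..255 repeated.
cover = bytes(range(256)) * 4
stego = embed_lsb(cover, b"hello")
print(extract_lsb(stego))  # b'hello'
```

Each carrier byte changes by at most 1, which is why the alteration is invisible to the eye. It also explains the restriction to uncompressed and losslessly compressed data: a lossy format such as JPEG would not preserve those low bits.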
Pernicious esquivalience.
"To show that Google was allegedly doing this ..."
Correction: "To show that Google was doing this ..."
There - fixed.
Their intention was 'to show,' not 'to allegedly show ...' They later 'alleged.'
Saying so-and-so "alleged" is about protecting one's behind from a lawsuit; it has gotten twisted out of that purpose and is used when it need not be.
The reporting of this story, unfortunately including here, is bogus. Both Genius and Google license the lyrics from publishers, so even if Google did directly or indirectly copy a few of them (apparently 100 out of over a million) from Genius, it has a license, so it's quite legal.
h/t Techdirt with more details https://www.techdirt.com/articles/20190617/13335342414/dumbest-gotcha-story-week-google-genius-copying-licensed-lyrics.shtml
Not quite as bogus as you're suggesting. Google licensed the right to publish the lyrics, they didn't license from Genius the right to use their work in transcribing them. So there was something stolen, just not the lyrics themselves.
Now, whether there's any legal recourse for Google stealing Genius' work is another question.
Is steganography enough to argue that what was copied was actually a derivative work, and not the original work that Google had licensed?