RICHARD KADREY, et al., Plaintiffs,
v.
META PLATFORMS, INC., et al., Defendants

Case No. 23-cv-03417-VC (TSH)
United States District Court, N.D. California
Signed December 20, 2024
Filed January 17, 2025
Hixson, Thomas S., United States Magistrate Judge

PUBLIC VERSION OF DISCOVERY ORDER AT ECF NO. 351
Re: Dkt. Nos. 308, 309, 321, 334, 335, 336

The Court addresses ECF Nos. 308, 309, 321, 334, 335 and 336 as follows.

A. ECF No. 308

1. RFP 118

Plaintiffs’ RFP 118 requested “[a]ll Documents and Communications, including source code, relating to any efforts, attempts, or measures implemented by Meta to prevent Llama Models from emitting or outputting copyrighted material.” In the joint discovery letter brief, Plaintiffs move to compel the data mentioned in Sections 3 and 4.2 of Meta’s Llama 2 paper and the data mentioned in Sections 4.2 and 5.4.3 of Meta’s Llama 3 paper. At the hearing, Plaintiffs made clear they want certain particular data referred to in those sections, not all of it. Accordingly, as discussed at the hearing, the Court ORDERS the parties to file a supplemental joint discovery letter brief concerning this RFP by December 23, 2024.[1]

2. RFP 119

Plaintiffs’ RFP 119 requested “[a]ll Documents and Communications, including source code, relating to the processing of copyrighted material used in training Llama Models, including storage and deletion of copyrighted material.” Plaintiffs move to compel Meta to (1) produce or identify all copies it made of copyrighted works, including but not limited to Plaintiffs’ works, (2) search the custodial files of its 15 custodians, plus relevant non-custodial databases (which they contend include work email, Workplace Chat and WhatsApp) for documents involving Llama and (a) the removal of copyright management information from literary works or (b) torrenting of data that includes literary works.

There are a number of problems with this motion to compel.
The first portion of it (all copies of copyrighted works) seeks things not requested by this RFP, which concerns “the processing of copyrighted material used in training Llama Models.” The Court also does not see how all copies of copyrighted works is proportional to the needs of this case, which is about the use of copyrighted materials to train the Llama models, not all copyright infringement committed by Meta. During the hearing Plaintiffs offered to narrow their request to datasets that include Plaintiffs’ copyrighted works (rather than everyone’s copyrighted works), but that limitation did not change the fact that the RFP did not ask for that, or that datasets not used to train the Llama models are not relevant or proportional to the case.

With respect to the second part of this motion to compel, the Court does not agree that work email, Workplace Chat and WhatsApp should be treated as non-custodial sources. That would effectively blow up the custodial limitations in the ESI Order (ECF No. 101). Plaintiffs are candid that this is exactly what they are trying to do. They want email searches done for everyone at Meta who works in AI, which Plaintiffs say is a thousand people and Meta says is two thousand. The Court would never have granted that relief no matter when Plaintiffs asked for it, but asking for the number of custodians to be increased from 15 to a thousand (or two thousand) nine days before the close of fact discovery leaves one to wonder what is going on here.

However, for the 15 custodians the Court agrees that custodial files regarding Llama and removing copyright management information from literary works are responsive to this RFP and relevant and proportional to the needs of the case. The removal of CMI is relevant to willfulness, for example. The Court does not see how torrenting is responsive to this RFP, which is about the processing of data, not its acquisition.
As noted below, the Court does think torrenting is relevant to the case, but it is not responsive to the RFP Plaintiffs moved to compel on.

Accordingly, the Court ORDERS Meta to search the custodial files for its 15 custodians and produce documents and communications regarding Llama and stripping or removal of CMI from literary works. The Court limits this order to custodial documents because the only alleged non-custodial sources Plaintiffs referred to in the joint discovery letter brief are ones the Court declines to treat as non-custodial. The Court otherwise DENIES Plaintiffs’ motion to compel.

B. ECF No. 309

Plaintiffs argue that Meta has improperly redacted non-privileged business-related communications. Plaintiffs argue (correctly) that advice from in-house counsel concerning business matters is not privileged. Plaintiffs also seem to argue that the “primary purpose” test has to be measured against a document as a whole, and that the attorney-client privilege cannot apply to part of a document. But that’s not correct. “That the document as a whole addresses predominantly business matters does not negate the privilege as to the portion containing requests for legal advice.” United States v. Chevron Corp., 1996 WL 444597, *2 (N.D. Cal. May 30, 1996). “Thus, despite the overall nature of the document, the client may assert the attorney-client privilege over isolated sentences or paragraphs within a document.” Id.; see also In re Meta Pixel Healthcare Litigation, 2024 WL 3381029, *6 (N.D. Cal. July 10, 2024) (approving redactions for attorney-client privilege).

The Court ordered Meta to submit the documents in question for in camera review. Having performed an in camera review, the Court SUSTAINS Meta’s privilege claims as to Meta-Kadrey_00146534, Meta-Kadrey_00146557 and Meta-Kadrey_00146583.
With respect to Meta-Kadrey_00152812, the Court SUSTAINS Meta’s privilege claim as to the redaction on Meta-Kadrey_00152826 but OVERRULES Meta’s privilege claim as to the redaction on Meta-Kadrey_00152834, which describes an action the legal department took and does not contain legal advice. For the same reason, in Meta-Kadrey_00152994, the Court OVERRULES Meta’s privilege claim as to the redaction on Meta-Kadrey_00153003. The Court SUSTAINS Meta’s privilege claims as to Meta-Kadrey_00153393, Meta-Kadrey_00153799, Meta-Kadrey_00154472, Meta-Kadrey_00154479 and Meta-Kadrey_00154729.

The Court OVERRULES Meta’s privilege claim as to the redaction in Meta-Kadrey_00155464. Nothing in the document or Meta’s privilege log indicates that the referenced approval was not a business decision. The language about legal issues appears to be the author’s personal opinion, and Meta does not say in its privilege log that the author is an attorney. Meta’s privilege log says that this redaction reflects legal advice, but the Court disagrees. The Court SUSTAINS Meta’s privilege claim as to Meta-Kadrey_00155715.

For Meta-Kadrey_00156178, the Court SUSTAINS Meta’s privilege claim for the first bullet point under “Initial Convo” on Meta-Kadrey_00156185 but OVERRULES Meta’s claim of privilege for the redactions beneath that on that page because they are a summary of what TripAdvisor said. The Court SUSTAINS Meta’s claim of privilege as to page Meta-Kadrey_00156189.

C. ECF No. 321

1. Search Terms

Plaintiffs propose to add five search strings. However, Meta is correct that Plaintiffs’ proposal does not comply with the ESI Order (ECF No. 101). A consequence of that is that the Court does not know if these search strings are any good.
The ESI Order provides: “If the search terms proposed by the Requesting Party have an unreasonably high or overbroad yield, the Producing Party may review a randomly generated 95/5 confidence level/margin of error sample set of documents to determine the overbreadth of the proposed search terms. Where appropriate, the Producing Party may develop alternative search terms that are more narrowly tailored to capture the relevant, responsive, non-privileged documents from the additional Null Set Sample, and provide a hit report on those terms.” The idea is that the search terms are supposed to be informed by whether they have an unreasonably high or overbroad yield, but that analysis hasn’t happened yet. Plaintiffs are asking the Court to order Meta to use certain search strings in an informational vacuum.

Plaintiffs complain that Meta disclosed its search terms six weeks after Plaintiffs requested them, and more generally that Meta was slow and unresponsive during meet and confer. Meta disagrees. If Plaintiffs thought Meta was acting too slowly, the better course of action would have been to file a discovery letter brief asking the Court to order Meta to disclose its search terms sooner, and more generally to speed up the process under the ESI Order. Instead of doing that, Plaintiffs seek to bypass any vetting of their proposed search terms. The Court is not going to do that. Plaintiffs’ motion to compel the use of their search terms is DENIED.

2. Other Issues

As discussed above, Meta’s email and Workplace Chat are custodial sources, and the Court rejects Plaintiffs’ argument to the contrary. The Court will not effectively abolish custodial limitations, as Plaintiffs request. There is nothing wrong with Meta collecting WhatsApp messages from those custodians who said they may have relevant messages.
The Court once again declines to expand the time frame for document production by a year and a half in response to a joint discovery letter brief filed just a few days before the close of fact discovery. Plaintiffs’ motion to compel is DENIED as to these issues.

D. ECF No. 334

1. Clawed Back Documents

Meta clawed back pages 93506 and 93507 from Meta_Kadrey_00093499 during a deposition, then clawed back the same pages from four duplicate versions of the document (Meta_Kadrey_00079969, Meta_Kadrey_00093389, Meta_Kadrey_00093430 and Meta_Kadrey_00093446). Following in camera review, the Court SUSTAINS the claw back. Those pages contain legal advice and are privileged. And there is no reason to think that the production of these documents without those pages redacted was anything other than inadvertent.

2. Meta’s Decision to Stop Licensing Efforts for Llama Text Data

Meta’s Sy Choudhury testified that the decision to pause licensing efforts for Llama text data was made in a meeting he had in April 2023 with his boss Marc Shedroff and Meta’s attorney Natascha Parks. Subsequent declarations indicate it was two meetings and the attorney was Morvarid Metanat. In any event, Choudhury acknowledged that “[i]t was a multifaceted decision of which part of this included technical concerns, business concerns, and legal concerns both for -- you know, and so it was a multifaceted decision.” However, several times during his depositions he claimed that the entire meeting was attorney-client privileged. It is true that in other portions of his depositions he did go into some of the business reasons for the pause. But the Court is concerned that the witness’s testimony about the business reasons for the pause may be incomplete because the witness repeatedly claimed that the whole meeting was privileged. Meta is entitled to invoke the attorney-client privilege to shield the legal advice requested or received at that meeting (or meetings), but that’s it.
Where, as here, a business decision was made for a combination of legal and business reasons, the business reasons are not privileged. Putting a lawyer in a meeting does not make everything privileged.

The Court GRANTS Plaintiffs’ motion to compel and ORDERS Meta to make Choudhury available for an additional two hours of deposition, one hour as a 30(b)(1) witness and one as a 30(b)(6) witness, regarding Meta’s April 2023 decision to pause or cease Meta’s licensing efforts for text data for use with Llama.

E. ECF No. 335

Under the existing pleadings, torrenting is relevant because that’s how Meta acquired LibGen. Plaintiffs have not shown that Meta’s 30(b)(6) witness on this topic (Michael Clark) was unprepared to testify about torrenting. Plaintiffs include a subheading in the joint discovery letter brief that says: “No Meta Witnesses Were Adequately Prepared to Testify About Torrenting.” However, in reviewing the argument they make, their actual complaint is that no witnesses were prepared to testify about seeding, not torrenting. The Court understands from the cited testimony that a requirement of torrenting is that the participant also agree to seeding (i.e., you can’t just take data, you have to supply some in return). Plaintiffs have shown that Meta’s 30(b)(6) witnesses were unprepared to discuss Meta’s seeding in any level of detail. However, the Court does not see that seeding falls within any of Plaintiffs’ 30(b)(6) deposition topics. The Court also does not see how details about seeding are relevant to the claims in the existing pleadings. Accordingly, Plaintiffs’ motion to compel is DENIED.

F. ECF No. 336

Plaintiffs contend that Meta is abusing the attorney-client privilege with respect to redactions made to documents concerning its mitigation efforts. Plaintiffs challenge the redactions Meta has made to a number of documents listed in Exhibit C to the joint discovery letter brief.
Meta has submitted most of those documents for in camera review (it appears that Meta_Kadrey_00051427 was omitted). The Court has reviewed about half of these documents in camera, which was a time-consuming task. In the documents the Court reviewed, the Court saw no problems with Meta’s privilege redactions. The Court concludes that further in camera review is unwarranted, as there is no indication that Meta is abusing the privilege. The Court also does not see how Meta’s restrained invocation of the privilege gives rise to a plausible “sword and shield” problem. Plaintiffs’ motion is DENIED.

IT IS SO ORDERED.

Dated: December 20, 2024

Footnotes

[1] At the hearing Plaintiffs explained that the particular dispute concerning the organization of Meta’s source code that they raised in ECF No. 308 is no longer live.