Editorial, The Need for Collective Standards: Validating Raw Data in Legal Empirical Analysis

By Editorial Board

Dec 31, 2020

Zachary J. Bass, Ashley C. Ulrich, Ryan B. McLeod, Kevin Qiao, Joanne Dynak, Lvxiao Chen, Garrett Heller, Minyoung Ryoo, Magdalena Christoforou, Jesse Kirkland, Neil Chitrao, Patrick A. Reed, Amanda Gonzalez Burton, Siddra Shah, Nicholas J. Isaacson, Jerrit Yang*


For some time, legal academia has experienced an increase in articles utilizing empirical analysis.¹ Never before has fluency in statistical methods been more important. Whether it is collecting datasets of court decisions to analyze policing trends² or using natural language processing to analyze the likely replicability of patented inventions,³ legal scholars are using these tools to arrive at results that disrupt conventional wisdom and uncover doctrinal patterns.

However, student-edited legal journals have largely failed to adapt their editorial systems to empirical works.⁴ Although law reviews have agreed on a common citational system,⁵ there exist no customary practices for validating statistical findings in published legal academia. This gap is exacerbated by the fact that journal editors are normally students who lack the expertise needed to validate raw data, which is why some legal journals choose not to validate in the first place.⁶

This is a serious problem. Legislators, judges, and lawyers commonly cite to inferential legal studies when crafting policy, making decisions, and putting forward arguments.⁷ Whereas practitioners already adept at Stata or R may be able to access an author’s raw data and recreate the author’s results, others may be wary of relying on empirical studies without assurance of their accuracy. Even worse, they may cite to these studies without knowing that they are statistically invalid. Something must be done. It is time that legal journals fill this methodological gap by adopting commonly accepted practices for validating empirical legal works. Our community deserves to be confident that what it reads has been properly vetted.

Consider Professor Barton Beebe’s article, published in the fall edition of the 10th volume of our journal. Beebe worked with several research assistants to code various attributes of 579 cases, including case disposition, venue, and treatment of the fair use factors. He then performed a number of regressions and other statistical analyses to discern trends and relationships in the underlying data. The critical findings of his article, of which there are many, are based on copyright cases decided over 40 years and in every judicial district. But the robustness of these results hinges on the initial accuracy of the data coding.

Reviewing the initial data coding for articles like Professor Beebe’s is well within the skill set of law journals. Confirming disposition, venue, treatment of fair use factors, and similar attributes is merely an extension of the work that law journals already take on. The only difference is the scale of the work. Realistically, JIPEL and most other law journals do not have the resources to validate the data coding for 579 cases, especially where each case averages over 12 pages in length and is associated with over 100 data inputs.

Instead, JIPEL worked with an economist to devise what it believes is a defensible and reproducible strategy that other journals can adopt when reviewing the underlying data for similar statistics-based articles: reviewing a representative sample of the data coding.⁸ A summary of this process appears in Appendix 1 of Beebe’s work.
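For editors who want a concrete picture of what sample-based review can look like, the following is a minimal sketch and not the procedure JIPEL actually used, which is summarized in Appendix 1 of Beebe’s work. It assumes a simple random sample of cases drawn with a fixed seed so that the draw can be repeated, followed by an approximate 95% Wilson score interval for the share of coded inputs found to be wrong. The function names, the sample size, the seed, and the error counts are all hypothetical; only the figure of 579 cases comes from the article.

```python
import math
import random


def draw_validation_sample(case_ids, sample_size, seed):
    """Return a reproducible simple random sample of case IDs to re-check."""
    rng = random.Random(seed)  # fixed seed so another editor can redraw the same sample
    return sorted(rng.sample(list(case_ids), sample_size))


def wilson_interval(errors, checked, z=1.96):
    """Approximate 95% Wilson score interval for the coding error rate."""
    if checked == 0:
        raise ValueError("no coded inputs were checked")
    p = errors / checked
    denom = 1 + z**2 / checked
    center = (p + z**2 / (2 * checked)) / denom
    half = z * math.sqrt(p * (1 - p) / checked + z**2 / (4 * checked**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical illustration: 579 cases as in the article; every other
# number below (sample size, seed, error counts) is invented.
case_ids = range(1, 580)
sample = draw_validation_sample(case_ids, sample_size=60, seed=2020)

# Suppose editors re-checked 600 coded inputs across the sampled cases
# and disagreed with the original coding on 3 of them.
low, high = wilson_interval(errors=3, checked=600)
print(f"Re-check {len(sample)} cases; estimated error rate between {low:.2%} and {high:.2%}")
```

The interval treats each re-checked input as independent, which is a simplification: inputs from the same case may share errors, so a journal working with a statistician may prefer a design that accounts for that clustering.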

We acknowledge that this method is not suitable for every empirically focused article. Nevertheless, we believe that making it available may help other journals move toward more rigorous and standardized review of the underlying data and assumptions in empirical legal works.

Footnotes

*Volume 10 Editorial Board of the NYU Journal of Intellectual Property and Entertainment Law (JIPEL).

1. See, e.g., Michael Heise, An Empirical Analysis of Empirical Legal Scholarship Production, 1990–2009, 2011 Ill. L. Rev. 1739 (2011) (describing a growth of empirical methods being used in legal scholarship from 2000s through 2010s) (citing Shari Seidman Diamond & Pam Mueller, Empirical Legal Scholarship in Law Reviews, 6 Ann. Rev. L. & Soc. Sci. 581 (2010) (finding in a review of 60 law review volumes published between 1998 and 2008, nearly half of law review articles included some empirical content, although original research was less common)).
2. Joanna C. Schwartz, How Qualified Immunity Fails, 127 Yale L.J. 2 (2017) (analyzing the role qualified immunity plays in constitutional litigation from a review of the dockets of 1,183 cases filed against state and local law enforcement defendants in five federal court districts over a two-year period).
3. Janet Freilich, The Replicability Crisis in Patent Law, 95 Indiana L.J. (2020) (analyzing 500 patents and patent applications using methodological quality of experiments as a proxy for their reproducibility and finding that many experiments are probably not reproducible).
4. See, e.g., Kathryn Zeiler, The Future of Empirical Legal Scholarship: Where Might We Go from Here?, 66 J. Legal Educ. 78, 78 (2016) (“This is partly because law review editorial boards, usually comprising solely law students, do not systematically require expert review of submitted work.”).
5. See The Bluebook: A Uniform System of Citation (Columbia L. Rev. Ass’n et al. eds., 21st ed. 2020).
6. We have spoken with multiple law professors who explained that they have never had their raw data validated by the legal journal that accepted their work for publication.
7. See Lee Epstein & Gary King, The Rules of Inference, 69 U. Chi. L. Rev. 1, 2, 4-6 (2002) (“[R]esearch that offers claims or make inferences based on observations about the real world—on topics ranging from the imposition of the death penalty to the effect of court decisions on administrative agencies to the causes of fraud in the bankruptcy system to the use of various alternative dispute mechanisms—can play an important role in public discourse . . . and can affect our political system’s handling of many issues.”) (citing Ronald J. Tabak, How Empirical Studies Can Affect Positively the Politics of the Death Penalty, 83 Cornell L. Rev. 1431, 1431 (1998)).
8. A special thanks to a friend of JIPEL, economist Alissa Ph.D., for her assistance and advice in helping JIPEL to architect this data validation exercise.