Zachary J. Bass, Ashley C. Ulrich, Ryan B. McLeod, Kevin Qiao, Joanne Dynak, Lvxiao Chen, Garrett Heller, Minyoung Ryoo, Magdalena Christoforou, Jesse Kirkland, Neil Chitrao, Patrick A. Reed, Amanda Gonzalez Burton, Siddra Shah, Nicholas J. Isaacson, Jerrit Yang*
For some time, legal academia has seen an increase in articles that rely on empirical analysis.1 Never before has fluency in statistical methods been more important. Whether it is collecting datasets of court decisions to analyze policing trends2 or using natural language processing to assess the likely replicability of patented inventions,3 legal scholars are using these tools to arrive at results that disrupt conventional wisdom and uncover doctrinal patterns.
However, student-edited legal journals have largely failed to adapt their editorial systems to empirical works.4 Although law reviews have agreed on a common citation system,5 there are no customary practices for validating statistical findings in published legal scholarship. This gap is exacerbated by the fact that journal editors are typically students who lack the expertise to properly validate raw data, which is why some legal journals choose not to validate in the first place.6
This is a serious problem. Legislators, judges, and lawyers commonly cite to inferential legal studies when crafting policy, making decisions, and putting forward arguments.7 Whereas practitioners already adept at Stata or R may be able to access an author’s raw data and reproduce its results, others may be wary of relying on empirical studies without assurance of their accuracy. Even worse, they may cite to these studies without knowing that they are statistically invalid. Something must be done. It is time for legal journals to fill this methodological gap by adopting commonly accepted practices for validating empirical legal works. Our community deserves to be confident that what it reads has been properly vetted.
For Professor Barton Beebe’s article in particular (published in the fall edition of the tenth volume of our journal), the author worked with several research assistants to code various attributes of 579 cases—case disposition, venue, treatment of the fair use factors, and so on. Beebe then performed a number of regressions and other statistical analyses to discern trends and relationships in the underlying data. The critical findings of his article, of which there are many, are based on copyright cases decided over a 40-year span and in every judicial district. But the robustness of these results hinges on the initial accuracy of the data coding.
Reviewing the initial data coding for articles like Professor Beebe’s is very much within the skill set of law journals. Reviewing cases to confirm disposition, venue, treatment of fair use factors, and the like is merely an extension of the work that law journals already take on. The only difference, then, is the scale of the work. Realistically, JIPEL and most other law journals do not have the resources to validate the data coding for all 579 cases, especially where each case averages over 12 pages in length and is associated with over 100 data inputs.
Instead, JIPEL worked with an economist to devise what it believes is a defensible and reproducible strategy that other journals can undertake when reviewing the underlying data of similar statistics-based articles: reviewing a representative sample set of the data coding.8 A summary of this process can be seen in Appendix 1 of Beebe’s work.
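The journal’s actual procedure is the one summarized in Appendix 1 of Beebe’s article; the sketch below is only an illustration of the general idea behind sample-based review, not a description of that protocol. It draws a reproducible random sample of coded cases for editors to re-code and then estimates the overall coding error rate from the re-reviewed subset. The file name coded_cases.csv, the sample size of 60, and the error count are hypothetical placeholders.

```python
import csv
import math
import random


def draw_sample(coded_cases, sample_size, seed=2008):
    """Draw a simple random sample of coded cases for editors to re-code."""
    rng = random.Random(seed)  # fixed seed so the same sample can be drawn again
    return rng.sample(coded_cases, sample_size)


def error_rate_interval(num_errors, num_checked, z=1.96):
    """Normal-approximation 95% confidence interval for the coding error rate."""
    p = num_errors / num_checked
    margin = z * math.sqrt(p * (1 - p) / num_checked)
    return max(0.0, p - margin), min(1.0, p + margin)


if __name__ == "__main__":
    # Hypothetical spreadsheet of the author's coding, one row per case.
    with open("coded_cases.csv", newline="") as f:
        cases = list(csv.DictReader(f))

    sample = draw_sample(cases, sample_size=60)

    # After editors independently re-code the sampled cases, record how many
    # rows disagree with the author's original coding.
    num_errors = 3  # placeholder, filled in after the manual review
    low, high = error_rate_interval(num_errors, len(sample))
    print(f"Estimated coding error rate: {num_errors / len(sample):.1%} "
          f"(95% CI: {low:.1%} to {high:.1%})")
```

A fixed random seed keeps the sample itself reproducible, so another editor, or a skeptical reader, could draw the same subset of cases and repeat the check.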
We acknowledge that this method is not suitable for every empirically focused article. Nevertheless, we believe that making it available may help other journals move toward adopting more rigorous and standardized review of the underlying data and assumptions in empirical legal works.