Data Sharing and Reuse: A Method by the AIRR Community.

Journal: Methods in Molecular Biology (Clifton, N.J.)

High-throughput sequencing of adaptive immune receptor repertoires (AIRR, i.e., IG and TR) has revolutionized the ability to study the adaptive immune response via large-scale experiments. Since 2009, AIRR sequencing (AIRR-seq) has been widely applied to survey the immune state of individuals (see "The AIRR Community Guide to Repertoire Analysis" chapter for details). One of the goals of the AIRR Community is to make the resulting AIRR-seq data FAIR (Findable, Accessible, Interoperable, and Reusable) (Wilkinson et al. Sci Data 3:1-9, 2016), with the primary aim of making these data easy for the research community to reuse (Breden et al. Front Immunol 8:1418, 2017; Scott and Breden. Curr Opin Syst Biol 24:71-77, 2020). The basis for this is the MiAIRR data standard (Rubelt et al. Nat Immunol 18:1274-1278, 2017). For long-term preservation, it is recommended that researchers store their sequence read data in an INSDC repository. At the same time, the AIRR Community has established the AIRR Data Commons (Christley et al. Front Big Data 3:22, 2020), a distributed set of AIRR-compliant repositories that store the critically important annotated AIRR-seq data based on the MiAIRR standard, making the data findable, interoperable, and, because they are annotated, more valuable for reuse. Here, we build on the other AIRR Community chapters and illustrate how these principles and standards can be incorporated into AIRR-seq data analysis workflows. We discuss the importance of careful curation of metadata to ensure reproducibility and facilitate data sharing and reuse, and we illustrate how data can be shared via the AIRR Data Commons.

Brian Corrie, Scott Christley, Christian Busse, Lindsay Cowell, Kira C Neller, Florian Rubelt, Nicholas Schwab