#CRAM4GH

DNA doesn't take up much space - there is a copy of our entire genome in each of our cells. When a couple of genomes are sequenced, the information takes up enough space to fill a standard laptop 💻 @GA4GH @BonfieldJames #BigData #CRAM4GH #CRAM https://t.co/eMNHchURve

Warning: specification geekery ahead. #CRAM4GH I've finally submitted the CRAM v3.1 draft spec. It's a bit of a monster! (Sorry) I suspect this is going to take time to go through. 20-30% size reductions are realistic for "archive mode". https://t.co/FBxBQQWpyz

Software developers @sangerinstitute, @emblebi and beyond, including @BonfieldJames, have been developing custom algorithms to store the #bigdata that DNA sequencing produces @GA4GH #CRAM4GH #CRAM #DataCompression #DNAsequencing https://t.co/i2YXNCfeDr

A blog update on CRAM is well overdue, so tonight I decided to give some updates on the progress for CRAMv3.1, and musings on the nature of compression ratio vs time vs memory. #CRAM4GH https://t.co/N5BJyVmzay

GA4GH
a year ago

AI identifies risk for certain genetic disorders, #CRAM4GH Twitter Chat Recap, and more Genomics and Health News for April 8 - 15, 2019 - https://t.co/vji6Vp5OXi

GA4GH
a year ago

Missed the #CRAM4GH Twitter chat last Friday? View a recap of the conversation here: https://t.co/wK5aUaPBav https://t.co/g6pmYMMsFo

@cmdcolin @drtkeane @GA4GH The original paper (Fritz, @ewanbirney, et al) did mention the idea of assemblies of the reads that didn't map to the reference in order to create novel embedded references (large insertions, contamination, etc). It's an idea I'd like to explore. #CRAM4GH

@cmdcolin @drtkeane @GA4GH It's useful if the reference will be used once only, eg a de novo assembly. Note CRAM (currently) can only embed one reference per "slice", which harms efficiency a little. SAM/BAM have no analogue as they don't use a reference for compression. #CRAM4GH

colin
a year ago

@drtkeane @GA4GH @BonfieldJames Sorry to jump in, what is the idea of embedded references? any reading material? does it have any analog in SAM format? #CRAM4GH

@ewanbirney @TechnicalVault @GA4GH Instrument manufacturers need to ponder this too. Eg if you started with a 16-bit ADC, do some processing before writing out a 32-bit float, you still only really have ~16bits of information and 16bits of noise. Is it really "lossy" if we quantise them? #CRAM4GH
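
The quantisation point in the tweet above can be illustrated with a minimal Python sketch. It assumes a hypothetical instrument whose 16-bit ADC codes are multiplied by a fixed gain and written out as 32-bit floats; `SCALE`, `process` and `quantise` are illustrative names, not part of any real instrument pipeline or of CRAM. In this idealised, noise-free case the float carries no more information than the original 16-bit code, so mapping it back onto the 16-bit grid loses nothing. Real processing steps would add noise and would not round-trip exactly.

```python
import struct

# Hypothetical gain applied after the ADC (an assumption for illustration).
SCALE = 0.25


def to_float32(x: float) -> float:
    """Round a Python float to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def process(adc_sample: int) -> float:
    """Hypothetical processing: scale a 16-bit ADC code, store as float32."""
    return to_float32(adc_sample * SCALE)


def quantise(value: float) -> int:
    """Map a stored float back onto the 16-bit grid it came from."""
    return round(value / SCALE)


def round_trip_is_exact() -> bool:
    """Check every signed 16-bit code survives float32 storage + quantising."""
    return all(quantise(process(s)) == s for s in range(-32768, 32768))


# Storage cost per sample: 4 bytes as float32 versus 2 bytes as int16.
FLOAT_BYTES = struct.calcsize("<f")
INT16_BYTES = struct.calcsize("<h")
```

Calling `round_trip_is_exact()` checks all 65,536 possible codes; the quantised form needs half the bytes per sample before any entropy coding is applied.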

GA4GH
a year ago

Thanks to everyone who participated in the #CRAM4GH twitter chat! To learn more about the #CRAM #fileformat and how you can adopt it in your own pipelines and workflows, visit https://t.co/ijZl22zYmI

Thank you all. Signing off with a philosophical view. It has been said that data compression is an artificial intelligence problem. Truly understand the data, and you'll be able to describe it in the most succinct manner. #CRAM4GH

@GA4GH @ewanbirney @drtkeane CRAM certainly makes use of @ga4gh refget for retrieving any remote references (optional - you can use local ones too). CRAM is also supported as an on-the-wire format by the htsget protocol. Note htsget can also support on-the-fly file format conversion. #CRAM4GH

@BonfieldJames @TechnicalVault @GA4GH But ... I think there’s plenty of innovation to do here. In some sense this is as much a “responsible data model” issue as it is a compression question #CRAM4GH

@GA4GH @kauralasoo @ewanbirney @drtkeane Good question, and sadly I've no idea on the answer. At the very least I'd expect existing lossy compression methods (eg crumble, qvz2) to not be detrimental, but as with everything you need to test, test, test! #CRAM4GH
