This week we received a very positive message from one of our software developers working on the InnerBuddies start-up. He is working on a new proprietary InnerBuddies A.I. model that can more accurately identify the bacteria (at the genus level) present in microbiome samples.
Many different calculation models are used globally in DNA sequencing analysis. Given a DNA sequence (a sequence of nucleotides), they compare it against databases of “known bacterial sequences” and try to find the “best match” to determine which bacterium has been found. The gold standard used by universities around the world is the QIIME2 library. However, there is not a lot of information available on how accurate these calculation libraries really are. The only way to know for certain which bacteria are in a sample is to buy isolated bacteria on the market, mix them and then run the DNA sequencer on the mix. However, this is very expensive and therefore does not scale to the large sample numbers needed to calculate a proper accuracy figure.
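To make the “best match” idea concrete, here is a minimal Python sketch. It is a toy illustration only, not how QIIME2 or any other production library works internally: the reference sequences are made up, the names `REFERENCE` and `best_match` are our own, and the similarity score comes from Python's standard `difflib.SequenceMatcher` rather than a real alignment algorithm.

```python
from difflib import SequenceMatcher

# Hypothetical, made-up reference sequences (not real 16S data).
REFERENCE = {
    "GenusA": "ACGTACGTTAGCCGATAGCTAGGCTA",
    "GenusB": "TTGACCGATCGGATCCGATTACGGAT",
}

def best_match(read, reference):
    """Return (genus, score) for the reference sequence most similar to the read.

    The score is difflib's similarity ratio, between 0.0 and 1.0.
    Returns (None, 0.0) if nothing in the reference resembles the read.
    """
    best_genus, best_score = None, 0.0
    for genus, ref_seq in reference.items():
        score = SequenceMatcher(None, read, ref_seq).ratio()
        if score > best_score:
            best_genus, best_score = genus, score
    return best_genus, best_score

if __name__ == "__main__":
    # A read that differs from the GenusA reference by one substitution.
    read = "ACGTACGTTAGCCGTTAGCTAGGCTA"
    print(best_match(read, REFERENCE))
```

Real classifiers work against databases with very many reference sequences and use far more efficient search strategies, but the principle of scoring a read against known sequences and keeping the top hit is the same.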
To be able to properly benchmark the different calculation libraries, InnerBuddies has previously invested in building a synthetic microbiome sample generator. This tool takes known bacterial signatures, combines them into a mix and uses an A.I. model to generate mutations on that mix. The mutation A.I. model has been trained to simulate both the biological mutation process and the errors introduced by the sequencing machine. Using this tool, we generated 24,000 synthetic DNA scans (with known “truth values”) and then used this dataset to benchmark QIIME2 (the industry standard), KRAKEN2 (a newer classifier, published in 2019, that we also implemented in our pipeline) and our newly developed proprietary InnerBuddies A.I. model. And guess what? Our team has managed to build an extremely accurate DNA analysis method.
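The benchmarking idea can be sketched in a few lines of Python. This is a simplified illustration under loud assumptions, not our actual generator: random base substitutions stand in for the trained mutation A.I. model, the reference sequences are invented, and all function names (`mutate`, `generate_samples`, `genus_accuracy`, `nearest_by_hamming`) are hypothetical.

```python
import random

BASES = "ACGT"

def mutate(seq, rate, rng):
    """Replace each base with a different random base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def generate_samples(reference, n, rate=0.02, seed=0):
    """Return n (read, true_genus) pairs: a reference sequence plus random mutations."""
    rng = random.Random(seed)
    genera = sorted(reference)
    samples = []
    for _ in range(n):
        genus = rng.choice(genera)
        samples.append((mutate(reference[genus], rate, rng), genus))
    return samples

def genus_accuracy(samples, classify):
    """Fraction of reads for which classify(read) returns the known true genus."""
    if not samples:
        return 0.0
    correct = sum(1 for read, truth in samples if classify(read) == truth)
    return correct / len(samples)

def nearest_by_hamming(reference):
    """Toy classifier: pick the reference with the fewest mismatching positions."""
    def classify(read):
        return min(
            sorted(reference),
            key=lambda g: sum(a != b for a, b in zip(read, reference[g])),
        )
    return classify

if __name__ == "__main__":
    # Made-up sequences of equal length (not real 16S data).
    reference = {"GenusA": "ACGTACGTTAGCCGATAGCT", "GenusB": "TTGACCGATCGGATCCGATT"}
    samples = generate_samples(reference, n=1000, rate=0.05, seed=42)
    print(genus_accuracy(samples, nearest_by_hamming(reference)))
```

Because every synthetic read carries its true genus, any classifier can be passed to `genus_accuracy` and scored on exactly the same dataset, which is what makes a like-for-like comparison possible.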
The comparison results below are based on 24,000 DNA scans of the V3 and V4 regions (roughly 450 base pairs) of the 16S rRNA gene. Accuracy has been calculated at the genus level. Our own proprietary InnerBuddies calculation pipeline will shortly become available in the InnerBuddies portal and API services. The InnerBuddies method is completely different from current methods used in DNA sequencing analysis. It utilizes natural language processing techniques to analyse DNA sequences.
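For readers wondering how language techniques can apply to DNA at all, one common general idea is to cut a sequence into short overlapping “words” (k-mers) and then treat it like a sentence. The Python sketch below shows only that generic idea. It is an assumption for illustration and does not describe the proprietary InnerBuddies model; the function names `tokenize` and `bag_of_words` are hypothetical.

```python
from collections import Counter

def tokenize(seq, k=3):
    """Split a DNA sequence into overlapping k-mer 'words'."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def bag_of_words(seq, k=3):
    """Count how often each k-mer 'word' occurs, as in a text bag-of-words model."""
    return Counter(tokenize(seq, k))

if __name__ == "__main__":
    print(tokenize("ACGTAC"))      # ['ACG', 'CGT', 'GTA', 'TAC']
    print(bag_of_words("ACGACG"))  # 'ACG' occurs twice, 'CGA' and 'GAC' once each
```

Once a sequence is expressed as words or word counts, it can be fed to the same kinds of models that are used for classifying text.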