Catalog Technologies (CATALOG), a DNA-based data storage and computation company, reported today what it called "a historic breakthrough in DNA computation by demonstrating the ability to search data stored in DNA in a massively parallel and scalable manner with resource usage almost independent of the data size."
The company said the demonstration grew out of CATALOG's work with potential customers and partners to understand their use cases. Using text as an example dataset, CATALOG used its platform to show how chemistry can be leveraged to compute over archives in parallel.
In September, using a “combinatorial writing scheme,” CATALOG encoded approximately 17,000 words from Shakespeare’s Hamlet into DNA in a few minutes on Shannon, CATALOG’s flagship writer. On this DNA archive, CATALOG performed a parallel search computation and successfully retrieved all occurrences of a query word. The company said its approach required no complex pre-processing. “Instead, CATALOG’s approach leveraged the massively parallel nature of chemistry to retrieve all occurrences of the query word in a number of steps that is almost independent of the size of the dataset,” the company said.
Then, last month, in a scalability demonstration, CATALOG encoded approximately 200,000 words from eight Shakespeare tragedies into DNA and, according to the company, retrieved all occurrences of a query word across all eight plays using approximately the same number of chemical computing steps, time and resources as the initial Hamlet search. CATALOG said it is on track to demonstrate this search scalability on datasets containing over 100 million words by mid-2023. The company said its approach "shows, for the first time, how to leverage the massive parallelism of DNA chemistry to search almost any amount of data stored in DNA without the expected proportional increase in resources."
“It’s great to see actual demonstrations of using DNA for computing,” said Earl Joseph, CEO, Hyperion Research. “DNA-based chemistry is an intriguing medium for advancing both storage and compute-related solutions. CATALOG’s recent demonstration of the ability to perform a common compute function that can scale in capability without a commensurate need to scale resources shows the progress in creating DNA-based solutions and CATALOG’s unique technology to implement DNA-based computing.”
In explaining its demonstration, CATALOG said search is a foundational element of computing. Internet search queries often return results quickly, but only because the data has already gone through a time-consuming and costly indexing process. However, over 90 percent of enterprise data is unstructured, making it expensive and sometimes impossible to search effectively. This is a critical barrier when slow search results lead to missed insights with costly long-term consequences in industries such as oil and gas, finance and government.
In recent years, the company said, the IT industry has seen a proliferation of purpose-built technologies, including accelerators such as GPUs, as well as quantum computers and extreme parallel computers.
These technologies deliver performance and scale, but at the expense of higher energy consumption, greater memory and long-term storage demands, and more complex management. According to CATALOG, this has generated tremendous interest and momentum in chemistry-based DNA computing systems, which it says have a far smaller physical footprint, consume orders of magnitude less energy, and are resistant to traditional electronic security vulnerabilities.
While many in research and academia are developing approaches that use DNA as an archival storage platform, CATALOG said its proprietary approach to encoding data in DNA is uniquely suited to computing at scale, so that critical insights can be drawn from data stored in DNA. Many researchers and labs testing DNA-based storage focus on packing information densely inside each DNA molecule. CATALOG turns this idea on its head and stores information in specific collections of DNA molecules. Unlike other approaches, the company said, this gives CATALOG latitude to design DNA sequences optimized for computing and to make writing orders of magnitude more efficient.
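The article describes CATALOG's encoding only at a high level. As a rough illustration, the Python toy model below assumes a combinatorial scheme in which each "identifier" molecule is assembled from one component per layer of a small pre-made library, and a bit is stored as the presence or absence of an identifier in a pool. The layer and component names, the bit-to-identifier mapping, and the `pull_down` capture step are all hypothetical; this is not CATALOG's actual encoding or search method. It only shows why a few components can yield many distinct molecules (12 components give 64 identifiers here) and how one capture step could, in principle, act on every molecule in a pool at once.

```python
from itertools import product

# Hypothetical library: 3 layers x 4 components each (12 components total).
LAYERS = [
    ["A0", "A1", "A2", "A3"],
    ["B0", "B1", "B2", "B3"],
    ["C0", "C1", "C2", "C3"],
]


def all_identifiers(layers):
    """Enumerate every identifier: one component per layer (4 * 4 * 4 = 64 here)."""
    return list(product(*layers))


def encode(bits, layers):
    """Toy write step: identifier i is placed in the pool iff bit i is 1."""
    ids = all_identifiers(layers)
    if len(bits) > len(ids):
        raise ValueError("not enough identifiers for this many bits")
    return {ids[i] for i, b in enumerate(bits) if b == 1}


def decode(pool, layers, n_bits):
    """Toy read step: bit i is 1 iff identifier i is present in the pool."""
    ids = all_identifiers(layers)
    return [1 if ids[i] in pool else 0 for i in range(n_bits)]


def pull_down(pool, component):
    """Simulate a hypothetical probe-based capture that keeps every molecule
    containing `component`. Chemically this would be a single step applied to
    the whole pool; this Python simulation still scans the set linearly."""
    return {molecule for molecule in pool if component in molecule}


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1, 0]
    pool = encode(bits, LAYERS)
    print(decode(pool, LAYERS, len(bits)) == bits)  # True
    print(sorted(pull_down(pool, "C2")))
```

In this sketch the number of capture steps for a query does not grow with the pool size, which loosely mirrors the scaling the company describes; any real system would depend on chemistry the article does not detail.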
Beyond demonstrating DNA computing capability, CATALOG said the achievement shows how computing directly on DNA can make reading data back from DNA, currently a significant challenge for the field, orders of magnitude more efficient and cost-effective.
“This historic and transformational achievement is based on years of work with partners and collaborators that helped make DNA-based computation a reality,” said Hyunjun Park, Ph.D., founder and CEO at CATALOG. “With the advantages of DNA-based data storage and computation demonstrated, we now turn our attention to addressing more sophisticated applications from signal processing to machine learning over massive datasets. In parallel, we are working closely with partners and collaborators to reduce the size and complexity of our platform and to identify specific workloads to target commercial offerings.”
CATALOG said it is advancing its vision of DNA computing through new DNA computing algorithms and applications with potential commercial use in areas including artificial intelligence, machine learning, data analytics and secure computing. In addition, CATALOG is developing solutions for DNA-based information security, rack-sized and desk-sized DNA data storage and computation platforms, DNA data storage as a service, and a DNA data storage and computing API.