By Diagnostics World Staff
May 23, 2017 | When playing music on iTunes the computing is invisible, and the same should be true when analyzing biodata, Misha Kapushesky, CEO of Genestack, will argue at Bio-It World Conference & Expo. The company has developed a set of easy-to-use tools that collect and aggregate data across various sources - public and internal - and then organize, search, analyze, and visualize it without the need for sophisticated programming knowledge.
Pain points common to all
“The role of the bioinformatician is disappearing,” asserts Kapushesky in a statement. “For too long we have expected biologists to either learn programming languages like Python, or to use computer analysts, but now we have the tools to make this process invisible and this will free the life scientists to concentrate on interpreting the information.”
Kapushesky sees the new pain point of drug discovery and genomics as data management: “Getting a grasp of the data is difficult. People want to find out where all the data is, both within the organization and in public repositories, and then figure out what is relevant to their research. Data is created in different ways so you need to understand its provenance and find ways to make it shareable by different R&D groups.”
Kapushesky understands these issues well. When he joined the European Bioinformatics Institute in 2002, he found that the functional genomics scientists lacked the infrastructure and tools required to do their work. He set about creating an infrastructure to support the team, including a large expression data repository that would support various queries and interface with other data sources and applications.
“Talking to colleagues across pharma and consumer products companies I quickly realized that there are a number of pain points that are common across different industries,” he recalls. “So I set up Genestack to create an ecosystem that would bring together data, tools and knowledge and make the type of system that I had developed at EBI accessible to teams of different sizes.”
Powerful way to describe data
The company has now been going for five years and has many household names in its client portfolio. A major feature of Genestack’s system is the flexibility it offers: it can be set up on the cloud or as part of an in-house system, adding value to legacy investments. It is not a closed system so clients can add their own tools and analysis types.
Kapushesky explains that clients don’t want to get tied into proprietary systems. He says: “What Genestack offers is a really powerful mechanism for describing data - this could be projects, experiments, studies, or individual data types such as a chemical structure or dose. The beautiful feature is that once you find your data you can do analysis ‘on the fly’ and get immediate results.”
Genestack is built so that each data type has a descriptor. For example, if it is a sequencing assay, you can define its type, whether it is array or VCF data, its source - such as from a private or public ontology - and its attributes. These descriptors act as templates that control how the data is described, and they can be shared with partners and collaborators. Once created, the templates are tested with sample data and a validation report is produced to ensure the data has been described as desired.
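To make the idea of a descriptor template and its validation report concrete, here is a minimal sketch in Python. It is not Genestack's API: the `FieldRule` and `Template` classes, the field names, and the allowed values are all hypothetical and chosen only to mirror the sequencing assay example above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FieldRule:
    """One attribute of a template (hypothetical structure, for illustration)."""
    name: str
    required: bool = True
    allowed: Optional[frozenset] = None  # None means any value is accepted


@dataclass
class Template:
    """Describes how one data type should be annotated."""
    name: str
    rules: List[FieldRule]

    def validate(self, records):
        """Return a list of (record index, field name, problem) tuples."""
        report = []
        for index, record in enumerate(records):
            for rule in self.rules:
                value = record.get(rule.name)
                if value is None:
                    if rule.required:
                        report.append((index, rule.name, "missing"))
                elif rule.allowed is not None and value not in rule.allowed:
                    report.append((index, rule.name, f"unexpected value {value!r}"))
        return report


# Assumed template for a sequencing assay; the field names are made up.
sequencing_template = Template(
    name="sequencing_assay",
    rules=[
        FieldRule("data_format", allowed=frozenset({"array", "vcf"})),
        FieldRule("source", allowed=frozenset({"public", "private"})),
        FieldRule("dose", required=False),
    ],
)

# Sample data used to test the template before it is shared.
sample_records = [
    {"data_format": "vcf", "source": "public"},
    {"data_format": "bam", "source": "private"},
    {"source": "public"},
]

validation_report = sequencing_template.validate(sample_records)
for row, field_name, problem in validation_report:
    print(f"record {row}: {field_name}: {problem}")
```

An empty report would mean every sample record matches the template. Here the second and third records are flagged, which is the kind of feedback a validation step gives before a template is shared with collaborators.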
Easy to use tools
Kapushesky continues: “We have developed lots of tools to help people create data sensibly, with features such as autocomplete that recognizes if the data is a dose, unit, or time point.”
Within Genestack, researchers can create their own pipelines to structure their analyses and dependencies. These are recorded with an audit trail so for every result there is a precise provenance of how it was created. This allows pipelines to be tested on a small data set and then rerun on thousands of samples easily.
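The pipeline-plus-audit-trail idea can be illustrated with a short Python sketch. This is an assumption-laden toy, not how Genestack implements it: the `Pipeline` class, the `fingerprint` helper, and the two example steps are hypothetical names introduced here.

```python
import hashlib
import json


def fingerprint(obj):
    """Short, stable identifier for a JSON-serializable value."""
    blob = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


class Pipeline:
    """Runs named steps in order and records how each result was produced."""

    def __init__(self, steps):
        # Each step is a (name, function, params) tuple.
        self.steps = list(steps)

    def run(self, samples):
        data = samples
        trail = [{"step": "input", "params": {}, "output_id": fingerprint(data)}]
        for name, func, params in self.steps:
            data = func(data, **params)
            trail.append(
                {"step": name, "params": params, "output_id": fingerprint(data)}
            )
        return data, trail


# Hypothetical analysis steps, for illustration only.
def drop_low_counts(samples, min_count):
    return [s for s in samples if s["count"] >= min_count]


def scale_counts(samples, factor):
    return [{**s, "count": s["count"] * factor} for s in samples]


pipeline = Pipeline([
    ("drop_low_counts", drop_low_counts, {"min_count": 10}),
    ("scale_counts", scale_counts, {"factor": 2}),
])

# Test on a small data set first...
small_batch = [{"id": "s1", "count": 5}, {"id": "s2", "count": 20}]
result, trail = pipeline.run(small_batch)

# ...then rerun the same pipeline definition on many more samples.
large_batch = [{"id": f"s{i}", "count": i} for i in range(1000)]
large_result, large_trail = pipeline.run(large_batch)
```

Because the steps and parameters live in one pipeline definition, the run on the large batch uses exactly the settings that were tested on the small one, and each trail entry records which step and parameters produced which output.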
Searches could include, for example, “find all chemical compounds that are structurally similar and provoke a similar gene signature in response to treatment”. This search in Genestack would include public drug response data and could also be extended to include the client’s own data, with appropriate permissions.
Widely applicable
Genestack’s technology is applicable to a wide range of data sources, including agri-genomics, where it has been used in a recent Innovate UK project with Rothamsted Research, the world’s oldest agricultural research organization. The institute had developed tools that linked traits to phenotypes but it was only able to update the tools every six months. Genestack migrated the tools onto its platform, which allowed the scientists to create gene-trait networks using data on public networks in hours instead of months.