Fragmented knowledge in pharma: Bridging the divide between private and public data
https://www.digital-science.com/blog/2024/01/fragmented-knowledge-in-pharma-bridging-the-divide-between-private-and-public-data/
Mon, 22 Jan 2024

Despite the increasing availability of public data, why are so many pharma and life sciences organizations still grappling with a persistent knowledge divide? This discrepancy was a focal point at the recent BioTechX conference in October, Europe's largest biotechnology congress, which brings together researchers and leaders in pharma, academia and business. Attendees and presenters voiced the same concern: the need to connect data from different sources, and all internal corporate data, through one integrated semantic data layer.

For example, combining global public research with proprietary data would provide pharmaceutical companies with valuable knowledge that could help drive significant advancements in research and product development. Linking to public knowledge can enrich existing internal research with metadata and contextual meaning, and equip decision-makers with new insights, context and perspectives they might not have had access to originally, leading to more strategic and informed decisions. For instance, it could help fast-track target discovery and reduce research costs or streamline processes from R&D to clinical trials to market access.

Additionally, this integration of data through a semantic layer unlocks the potential to drive many AI solutions across the pharma value chain and to embed a layer of trustworthiness and explainability in these applications. Having reliable and precise AI solutions is critical, especially in the pharma sector, which deals with sensitive and high-stakes use cases, such as using AI to discover hidden relations between drugs, genes and diseases across multiple datasets, clinical trials or publications.

Using fragmented data could cause AI to miss essential connections or, at worst, lead to inaccurate predictions regarding drug interactions or outcomes.

Several root causes contribute to this gap between the public and private data spheres, including an absence of suitable infrastructure and technology to connect disparate data with global public research, and a lack of tools to contextualize retrieved data and derive meaning from it.

Integrating public knowledge

[Diagram: Private and public data in pharma and life science]

Although a considerable amount of research is public and open, many companies don't have the necessary software or technology (though these are now widely available) to access these vast datasets, while other research requires a license for full access to the available literature.

Even when full access is permitted, there is still a time and labor expenditure that impedes the immediate use of this data. For companies that have dedicated internal resources to deep dives into the scientific literature, these literature research reports can take weeks or even months to complete, because a human reviewer must work through hundreds or thousands of sources manually. Additionally, new data is being published constantly, making it difficult to keep up with emerging research.

Many pharma, biotechnology or medical device organizations will also outsource research work to Contract Research Organizations (CROs) that carry out services such as clinical trial management or biopharmaceutical development, with the aim of simplifying drug development and entry into the market. However, collaborating with CROs can require extensive back-and-forth, from sharing data and results to constant meetings and email communication. As a result, the fragmentation between the company and the CRO can lead to notable delays, miscommunication or, at worst, incorrect decision-making.

Fragmentation within public research

A divide exists not only between private and public data; this fragmentation also occurs within public research itself, exacerbating the issue. For example, several datasets can stem from the same research. To establish connections between these datasets, they need to be extensively reviewed, cross-referenced and analyzed. However, many public datasets do not integrate well with each other or with other public sources, such as KEGG (the Kyoto Encyclopedia of Genes and Genomes) and the GWAS (Genome-Wide Association Studies) Catalog, for various reasons, including the lack of a standardized format or insufficient annotation. Consequently, linking the metadata from these sources becomes challenging, making it hard to gain a clear understanding of the relations between them.
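To make the linking problem concrete, here is a minimal sketch of cross-referencing two public-style datasets that share no common identifier scheme. All records, IDs and field names below are invented for illustration; real KEGG entries and GWAS Catalog associations are far richer and typically require curated identifier mappings rather than a join on gene symbol.

```python
# Hypothetical KEGG-style records keyed by KEGG gene ID (invented values).
kegg_records = {
    "hsa:7157": {"symbol": "TP53", "pathway": "p53 signaling"},
    "hsa:672":  {"symbol": "BRCA1", "pathway": "homologous recombination"},
}

# Hypothetical GWAS-Catalog-style associations keyed by gene symbol.
gwas_records = [
    {"gene": "TP53", "trait": "cancer susceptibility", "p_value": 3e-9},
    {"gene": "APOE", "trait": "Alzheimer's disease", "p_value": 1e-12},
]

def cross_reference(kegg, gwas):
    """Join the two sources on gene symbol, the only shared field here.

    In practice this step needs curated mappings to stable identifiers,
    because symbols are ambiguous and change over time.
    """
    by_symbol = {rec["symbol"]: (kid, rec) for kid, rec in kegg.items()}
    linked, unlinked = [], []
    for assoc in gwas:
        match = by_symbol.get(assoc["gene"])
        if match:
            kegg_id, kegg_rec = match
            linked.append({"kegg_id": kegg_id,
                           "pathway": kegg_rec["pathway"],
                           **assoc})
        else:
            # No shared identifier: exactly the gap described above.
            unlinked.append(assoc)
    return linked, unlinked

linked, unlinked = cross_reference(kegg_records, gwas_records)
```

Even in this toy version, one association cannot be linked at all, which mirrors the annotation and standardization gaps the text describes.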

The quality of public research can also vary. A considerable amount of manually curated data doesn't provide supporting evidence (e.g. data from clinical trials) or fails to cite the original report or document it references, making it challenging to validate the accuracy, quality and conclusions drawn from the data.

Internal data silos

When leveraging corporate data, companies in the pharma and life sciences space have individuals, teams and departments all producing valuable data that could be used immediately or in the future, such as in later stages of the pharma life cycle. For example, data from clinical trials for one drug could be extremely valuable in understanding the application of a target (e.g. a protein) to another disease: a side effect in one trial could become a targeted disease in another. The same data can also be used later for drug safety reviews or commercial purposes. However, data becomes difficult to share and repurpose if it lacks the original meaning and context in which it was created, which it often does. It might be presented in a spreadsheet without a clear legend or instructions on how to interpret it. Data can also get stuck within internal systems that require specialized software and technical expertise to retrieve, or lie buried within documents and personal emails. Retrieving it is a time-consuming and laborious task, leaving little room for actual analysis and application of insights.

Knowledge drain

Once data is retrieved and supplemented with context and meaning, insights can be derived from it; these insights are what we consider 'knowledge'. Knowledge is the penultimate step in the decision-making process, right before a final decision is made, making it an essential asset to any company.

Unfortunately, knowledge often becomes lost. This phenomenon, where knowledge becomes trapped or irretrievable, is something we call the Bermuda Triangle of Knowledge Drain: knowledge gets swallowed into a vortex and swirls away into oblivion. When knowledge is left stuck in the minds of domain experts (who are unable to pass on their expertise before leaving the company), confined to physical documents, slideshows and emails, or isolated within siloed systems and software, it creates the perfect storm for a knowledge drain to occur.

The Bermuda Triangle of knowledge drain

The solution is a knowledge graph. Knowledge graphs can bring together both private data and public research knowledge while addressing the challenges found within these two domains. A knowledge graph seamlessly connects to external datasets, utilizes existing metadata and ontologies, and imbues internal data with contextual meaning. Its semantic layer allows you to connect to public data sources and transforms data into consumable, shareable and actionable knowledge while adhering to FAIR data practices, ensuring the reusability and interoperability of data. As mentioned above, it also adds a foundation of trust and explainability for AI applications, improving accuracy and reducing potential hallucinations. This added layer of trust helps companies maximize existing AI investments and generate trustworthy and explainable AI solutions, such as AI-driven drug discovery.
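The core idea can be sketched in a few lines: a knowledge graph stores facts as (subject, predicate, object) triples, so internal records and public knowledge live in one queryable structure and new relations emerge by traversing the links. Every identifier below is invented for illustration; a production system would use an RDF store and real public ontologies rather than plain Python tuples.

```python
# A tiny triple store mixing hypothetical internal data with
# hypothetical public-knowledge links (the "semantic layer").
triples = {
    # Internal, proprietary data (invented identifiers)
    ("trial:CT-0042", "investigates", "drug:internal-123"),
    ("drug:internal-123", "targets", "gene:TP53"),
    # Links into public knowledge
    ("gene:TP53", "associated_with", "disease:breast-cancer"),
    ("disease:breast-cancer", "label", "Breast cancer"),
}

def objects(subject, predicate):
    """Return all objects for a given subject/predicate pair."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def diseases_reachable_from(trial):
    """Follow trial -> drug -> gene -> disease across the combined graph."""
    found = set()
    for drug in objects(trial, "investigates"):
        for gene in objects(drug, "targets"):
            found |= objects(gene, "associated_with")
    return found
```

The traversal in `diseases_reachable_from` is the kind of cross-dataset question (which diseases does this trial's drug connect to?) that is hard to answer when the same facts sit in separate spreadsheets and systems.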

To conclude, the urgency and significance of addressing this fragmentation are unmistakable. Despite the existing challenges, the integration of public and private data presents substantial benefits for companies in the pharma and life sciences industries, which can be achieved through the implementation of a knowledge graph. A knowledge graph, such as the soon-to-be-launched Dimensions Knowledge Graph, can aid in streamlining the trial and manufacturing process, fast-tracking drug discovery, speeding up drug safety review processes and ensuring reusability of knowledge.

Stay tuned to learn about the Dimensions Knowledge Graph, a ready-made knowledge graph providing access to one of the world’s largest interconnected sets of semantically annotated knowledge databases while powering smart and reliable AI solutions across the pharma value chain.

For more information about knowledge graphs, check out this blog post by metaphacts.

Putting data at the heart of your organizational strategy
https://www.digital-science.com/blog/2024/01/putting-data-at-the-heart-of-your-organizational-strategy/
Mon, 08 Jan 2024

How to centralize research data for strategic, evidence-based decisions

‘Have you done your due diligence?’ These six words induce fear and dread in anyone involved in finance, with the underlying threat that huge peril may be about to engulf you if the necessary homework hasn’t been done. Due diligence in the commercial sphere is a hygiene factor – a basic, if detailed, audit of risk to ensure that all possible outcomes have been assessed so nothing comes out of the woodwork once an investment has been made.

The question, however, is just as important for academic institutions looking to check the data on their research programs: have you done your due diligence on that? If not, then a linked database such as Dimensions can help you.

Strategic objectives

At a recent panel discussion on optimizing research strategy, hosted by Times Higher Education (THE) in partnership with Digital Science, the question of due diligence was framed by looking at the academic research lifecycle and the challenges emanating from the increased amount of data now accessible to universities. More specifically: how can universities extract and utilize verified data from the ever-increasing number of sources at their disposal?

Speaking on the panel, Digital Science’s Technical Product Solutions Manager Ann Campbell believes there are numerous benefits to using new modes of data to overcome problems associated with data overload. “It’s important to think holistically, of not only the different systems that are involved here but also the different departments and stakeholders,” she said. “It’s better to have an overarching data model or a perspective from looking at the research life cycle instead of separate research silos or different silos of data that you find within these systems.”

The panel recognized that self-reporting by academics could lead to gaps in the data, while different types of impact data could also be missed due to a lack of knowledge or understanding on the part of faculty members.

Digital Science seeks to address these problems by adding some power to its Dimensions linked database in the shape of Google BigQuery. By marrying this computing power to the size and scope of Dimensions, academics and research managers are empowered to identify specific data from all stages of the research lifecycle. This allows researchers to seamlessly combine external data with their own internal datasets, giving them the holistic view of research identified by Ann Campbell in the discussion. 
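The "combine external data with internal datasets" step described above is essentially an enrichment join. Here is a minimal, hedged sketch of that idea in plain Python; in the real setup this would be a SQL join run in BigQuery against the Dimensions dataset, and every record and field name below is invented for illustration.

```python
# Hypothetical external publication metadata (e.g. as returned from a
# linked database) and internal institutional records, joined on DOI.
external_publications = [
    {"doi": "10.1000/xyz1", "citations": 42, "field": "Oncology"},
    {"doi": "10.1000/xyz2", "citations": 7,  "field": "Genetics"},
]

internal_outputs = [
    {"doi": "10.1000/xyz1", "grant_id": "G-001", "department": "Medicine"},
    {"doi": "10.1000/abc9", "grant_id": "G-002", "department": "Biology"},
]

def enrich(internal, external):
    """Left-join internal records with external metadata on DOI.

    Internal records without an external match are kept, with the
    enrichment fields left as None, so gaps stay visible.
    """
    by_doi = {pub["doi"]: pub for pub in external}
    enriched = []
    for rec in internal:
        pub = by_doi.get(rec["doi"], {})
        enriched.append({**rec,
                         "citations": pub.get("citations"),
                         "field": pub.get("field")})
    return enriched

rows = enrich(internal_outputs, external_publications)
```

Keeping unmatched internal records (rather than dropping them) is deliberate: the gaps themselves are useful signals, for instance outputs that never made it into the external record.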

Data savant

The theme of improving higher education institutions' capabilities in data utilization was most vividly described by Ann Campbell in her presentation to the Times Higher Education Digital Universities conference in Barcelona in October. Memorably, she compared universities' use of data to the plot of the popular TV drama Game of Thrones. Professors as dragons? Rival departments as warring families? Well, not quite, but what Ann did observe was that there are many competing elements within HEIs – research management, research information, academic culture, the library – and above them sits senior management, who have key questions that can only be answered using data and insights across all of them:

  • Which faculties have a high impact? Should we invest more in them?
  • Which faculties have high potential but are under-resourced?
  • How can we promote our areas of excellence?
  • How can we identify departments with strong links to industry?
  • What real-world research impact can we feed back into our curriculum?
  • Are we mitigating potential reputational risk through openness and transparency? 

Bringing these disparate challenges together requires a narrative, which is another reason the Game of Thrones analogy works so well: for all the moving parts to come together, a coherent story is required. This could be how an institution's research culture strategy is working alongside a rise in early-career international collaborations, how an increase in new funding opportunities followed a drive for interdisciplinary collaboration, or how a university's global reputation improved its impact rankings position thanks to increased SDG-related research.

Any good story needs to have the right ingredients, and where Digital Science can really help an institution is to bring together those ingredients from across an organization into viewable and manageable narratives. 

Telling stories

But the big picture is not the whole story, of course. There are other, smaller narratives swirling through HEIs at any given time that reflect the different specialisms, hot topics or focus areas of a university. Three focus areas most commonly found in modern universities are research integrity, industry partnerships and research impact, and these were discussed recently at another collaborative webinar between THE and Digital Science: 'Utilising data to deliver research integrity, industry partnerships and impact'.

This panel discussion was a little more granular, and teased out some specific challenges for institutions when it came to data utilization. For research integrity, certain data can be used as 'trust markers' based around authorship, reproducibility and transparency. Representing Digital Science, Technical Product Solutions Manager Kathryn Weber-Boer went through the trust markers that form the basis of the Dimensions Research Integrity solution for universities.

But why are these trust markers important? The panel also noted that, beyond universities themselves, both funders and publishers are increasingly interested in research integrity and the provenance of research emanating from universities. As such, products like Dimensions Research Integrity are forming a key part of the data management arsenal that universities need in the modern research funding environment.

In addition, utilization and scrutiny of such data can help move the dial in other important areas, such as changing research culture and integrity. Stakeholders want to trust in the research that’s being done, know it can be reproduced, and also see there is a level of transparency. All of these factors then influence the promotion and implementation of more open research activities.

Another important aspect of research integrity and data utilization is not just knowing where and how data is being shared, but also whether it is shared in the form in which it was recorded, and where it is actually located. As pointed out in the discussion, Dimensions is a 'dataset of datasets' and allows these pieces of information to be cross-referenced to check whether research integrity data points align.

Positive outlook

Discussions around research integrity and data management can often be gloomy affairs, but there is some cause for optimism now that increasing numbers of products are on the market to help HEIs meet their goals and objectives in these areas. Effective data utilization will undoubtedly be one of the critical success factors for universities in the future, and not just for managing issues like research integrity or reputation. With the lightning-fast development and adoption of generative AI in the research space, and increasing interest in issues like research security and international collaboration, data utilization – and who universities partner with to optimize it – has never been higher up the agenda.

You can view the webinars here on utilizing new modes of data and delivering research integrity.
