Natalie Jonk - Digital Science
https://www.digital-science.com/people/natalie-jonk/

AI in drug discovery: Key insights from a computational biology roundtable
https://www.digital-science.com/blog/2025/10/ai-in-drug-discovery-key-insights/
Thu, 02 Oct 2025

Experts from across the pharmaceutical and biotechnology landscape share trends, challenges, and opportunities for using AI in drug discovery.

The post AI in drug discovery: Key insights from a computational biology roundtable appeared first on Digital Science.

This article distills key insights from the expert roundtable, “AI in Literature Reviews: Practical Strategies and Future Directions,” held in Boston on June 25. A range of R&D professionals joined, bringing perspectives from across the pharmaceutical and biotechnology landscape: senior scientists, clinical development leads, and research informatics specialists, alongside experts in translational medicine and pipeline strategy. Participants represented both global pharmaceutical companies and emerging biotechs, providing a balanced view of the challenges and opportunities shaping innovation in drug discovery and development.

Discussions covered real-world use cases, challenges in data quality and integration, and the evolving relationship between internal tooling and external AI platforms. Participants balanced enthusiasm with realism about AI’s role in drug discovery, and trust, transparency, and reproducibility surfaced repeatedly as prerequisites for AI systems that can support meaningful research outcomes.

If you’re in an R&D role – whether in computational biology, informatics, or scientific strategy – and looking to scale literature workflows in an AI-enabled world, keep reading for practical insights, cautionary flags, and ideas for future-proofing your approach.

Evolving roles and tooling strategies

Participants emphasized the diversity of AI users across biopharma, distinguishing between computational biologists and bioinformaticians in terms of focus and tooling. While foundational tools like Copilot have proven useful, there’s a growing shift toward developing custom AI models for complex tasks such as protein structure prediction (e.g., ESM, AlphaFold).

AI adoption is unfolding both organically and strategically. Some teams are investing in internal infrastructure like company-wide chatbots and data-linking frameworks while navigating regulatory constraints around external tool usage. Many organizations have strict policies governing how proprietary data can be handled with AI, emphasizing the importance of controlled environments.

Several participants noted they work upstream from the literature, focusing more on protein design and sequencing. For these participants, AI is applied earlier in the R&D pipeline before findings appear in publications.


Data: Abundance meets ambiguity

Attendees predominantly use public databases such as GenBank and GISAID rather than relying on the literature. Yet issues persist: data quality, inconsistent ontologies, and a lack of structured metadata often require retraining public models with proprietary data. While vendors provide scholarly content through large knowledge models, trust in those outputs remains mixed. Raw, structured datasets (e.g., RNA-seq) are strongly preferred over derivative insights.

One participant described building an internal knowledge graph to examine drug–drug interactions, highlighting the challenges of aligning internal schemas and ontologies while ensuring data quality. Another shared how they incorporate open-source resources like Kimball and GBQBio into small molecule model development, with a focus on rigorous data annotation.

Several participants raised concerns about false positives in AI-driven search tools. One described experimenting with ChatGPT in research mode and the Rinsit platform, both of which struggled with precision. Another emphasized the need to surface metadata that identifies whether a publication is backed by accessible data, helping them avoid studies that offer visualizations without underlying datasets.

A recurring theme was the frustration with the academic community’s reluctance to share raw data, despite expectations to do so. As one participant noted:

“This is a competitive area—even in academia. No one wants to publish and then get scooped. It’s their bread and butter. The system is broken—that’s why we don’t have access to the raw data.”

When datasets aren’t linked in publications, some participants noted they often reach out to authors directly, though response rates are inconsistent. This highlights a broader unmet need: pharma companies are actively seeking high-quality datasets to supplement their models, especially beyond what’s available in subject-specific repositories.

Literature and the need for feedback loops

Literature monitoring tools struggle with both accuracy and accessibility. Participants cited difficulties in filtering false positives and retrieving extractable raw data. While tools like ReadCube SLR allow for iterative, user-driven refinement, most platforms still lack persistent learning capabilities.

The absence of complete datasets in publications, often withheld due to competitive concerns, remains a significant obstacle. Attendees also raised concerns about AI-generated content contaminating future training data and discussed the legal complexities of using copyrighted materials.

As one participant noted:

“AI is generating so much content that it feeds back into itself. New AI systems are training on older AI outputs. You get less and less real content and more and more regurgitated material.”

Knowledge graphs and the future of integration

Knowledge graphs were broadly recognized as essential for integrating and structuring disparate data sources. Although some attendees speculated that LLMs may eventually infer such relationships directly, the consensus was that knowledge graphs remain critical today. Companies like metaphacts are already applying ontologies to semantically index datasets, enabling more accurate, hallucination-free chatbot responses and deeper research analysis.
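To make the idea concrete, the relationships a knowledge graph captures can be pictured as subject–predicate–object triples. The sketch below is a minimal, hypothetical illustration of the drug–drug interaction example from the discussion; the entity names are illustrative, not curated data, and real systems (such as those built on ontologies by metaphacts) are far richer.

```python
# Minimal sketch of a knowledge graph as subject-predicate-object triples,
# illustrating the drug-drug interaction example from the discussion.
# Entity names below are illustrative only, not curated biomedical data.
from collections import defaultdict


class TripleStore:
    def __init__(self):
        self.triples = set()
        self.by_subject = defaultdict(set)

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))
        self.by_subject[subject].add((predicate, obj))

    def query(self, subject=None, predicate=None):
        # Return all (s, p, o) triples matching the given pattern;
        # None acts as a wildcard.
        return [
            (s, p, o) for (s, p, o) in self.triples
            if (subject is None or s == subject)
            and (predicate is None or p == predicate)
        ]


kg = TripleStore()
kg.add("warfarin", "interacts_with", "aspirin")   # illustrative
kg.add("warfarin", "metabolized_by", "CYP2C9")    # illustrative
kg.add("aspirin", "inhibits", "COX-1")            # illustrative

# Unlike free text, the relationships are explicit and queryable.
interactions = kg.query(subject="warfarin", predicate="interacts_with")
```

The point of the structure is the one participants made: once relationships are explicit triples aligned to a shared ontology, a chatbot or model can retrieve them directly instead of hallucinating them.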

What’s next: Trust, metrics, and metadata

Looking forward, participants advocated for AI outputs to include trust metrics, akin to statistical confidence scores, to assess reliability. Tools that index and surface supplementary materials were seen as essential for discovering usable data.

One participant explained:

“It would be valuable to have a confidence metric alongside rich metadata. If I’m exploring a hypothesis, I want to know not only what supports it, but also the types of data, for example, genetic, transcriptomic, proteomic, that are available. A tool that answers this kind of question and breaks down the response by data type would be incredibly useful. It should also indicate if supplementary data exists, what kind it is, and whether it’s been evaluated.”

Another emphasized:

“A trustworthiness metric would be highly useful. Papers often present conflicting or tentative claims, and it’s not always clear whether those are supported by data or based on assumptions. Ideally, we’d have tools that can assess not only the trustworthiness of a paper, but the reliability of individual statements.”

There was also recognition of the rich, though unvalidated, potential in preprints, particularly content from bioRxiv, which can offer valuable data not yet subjected to peer review.
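The “confidence metric alongside rich metadata” the participants describe could be modeled as a simple evidence record. This is a hypothetical sketch of such a schema, not any existing product’s API; all field names, DOIs, and scores are assumptions for illustration.

```python
# Hypothetical sketch of the "trust metric plus rich metadata" record
# described by roundtable participants. Field names, DOIs, and scores
# are illustrative assumptions, not a real schema or real data.
from dataclasses import dataclass, field


@dataclass
class EvidenceRecord:
    doi: str
    claim: str
    confidence: float                  # assumed 0.0-1.0 trust score
    data_types: list = field(default_factory=list)  # e.g. "transcriptomic"
    has_supplementary_data: bool = False
    supplementary_evaluated: bool = False


def summarize_by_data_type(records):
    """Group supporting evidence by data type, as the participant requested:
    for a hypothesis, show what supports it broken down by kind of data."""
    summary = {}
    for r in records:
        for dt in r.data_types:
            summary.setdefault(dt, []).append((r.doi, r.confidence))
    return summary


records = [
    EvidenceRecord("10.1000/example.1", "Gene X upregulated in disease Y",
                   confidence=0.82, data_types=["transcriptomic"],
                   has_supplementary_data=True, supplementary_evaluated=True),
    EvidenceRecord("10.1000/example.2", "Protein X elevated in disease Y",
                   confidence=0.45, data_types=["proteomic"]),
]
by_type = summarize_by_data_type(records)
```

A tool built on records like these could answer the participant’s question directly: what supports a hypothesis, with what confidence, from which data types, and whether evaluated supplementary data exists.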

Conclusion

The roundtable reflected both enthusiasm and realism about AI’s role in drug discovery. Real progress depends on high-quality data, strong governance, and tools designed with scientific nuance in mind. Trust, transparency, and reproducibility emerged as core pillars for building AI systems that can support meaningful research outcomes.

Digital Science: Enabling trustworthy, scalable AI in drug discovery

At Digital Science, our portfolio directly addresses the key challenges highlighted in this discussion.

  • ReadCube SLR offers auditable, feedback-driven literature review workflows that allow researchers to iteratively refine systematic searches.
  • Dimensions & metaphacts offers the Dimensions Knowledge Graph, a comprehensive, interlinked knowledge graph connecting internal data with public datasets (spanning publications, grants, clinical trials, etc.) and ontologies—ideal for powering structured, trustworthy AI models that support projects across the pharma value chain.
  • Altmetric identifies early signals of research attention and emerging trends, which can enhance model relevance and guide research prioritization.

For organizations pursuing centralized AI strategies, our products offer interoperable APIs and metadata-rich environments that integrate seamlessly with custom internal frameworks or LLM-driven systems. By embedding transparency, reproducibility, and structured insight into every tool, Digital Science helps computational biology teams build AI solutions they can trust.


How experts are redefining research visibility beyond traditional metrics
https://www.digital-science.com/blog/2025/09/research-visibility-beyond-traditional-metrics/
Thu, 25 Sep 2025

A panel of experts explores publication success, new measures of impact, and how digital transformation and AI are reshaping the game.

The post How experts are redefining research visibility beyond traditional metrics appeared first on Digital Science.

On-Demand Webinar: The Future of Research Visibility: Beyond Traditional Metrics

Introduction

Success in scientific publishing has long been measured by citations and impact factors. Yet in today’s Medical Affairs landscape, the definition of value is shifting rapidly. This article recaps insights from the recent panel discussion The Future of Research Visibility: Beyond Traditional Metrics, where experts from across the field explored how publication success is evolving, which new measures of impact matter most, and how digital transformation and AI are reshaping the game.

Bringing a wealth of diverse perspectives, the panel featured Shehla Sheikh, Head of Medical Communication & Publications at Kyowa Kirin; Kim Della Penna, Scientific Communications Director for Lymphoma, Myeloid, and Multiple Myeloma at Johnson & Johnson; Myriam Cherif, Founder of Kalyx Medical and former Regional Medical Director at GSK; and Carlos Areia, Senior Data Scientist at Digital Science. The discussion was moderated by Natalie Jonk, Enterprise Marketing Segment Lead, who guided the conversation through the critical challenges and opportunities shaping the future of research visibility.

Success: Still a moving target

Defining success remains one of the greatest challenges. For some organizations, it’s still as simple as getting the data published. For others, success means shaping clinical guidelines or influencing real-world decision-making.

Kim explained:

“A lot of these tools help us see who is engaging with our publication. Are they sharing the publication, did they find it important enough to share? Where is the data being incorporated? Is it being used in policy and guidelines, cost data, real-world healthcare data or by population health decision makers for access?”

Myriam emphasized how the lens has broadened over the past decade:

“A decade ago, people just looked at impact factors and citations. Now, we discuss with HCPs how data applies to patients. Sometimes a paper may be more practical for certain regions. We’ve moved toward a more holistic approach.”

Metrics beyond the traditional

Today, a wealth of data is available, but the challenge is deciding which metrics are truly meaningful. Downloads, mentions, and social media shares are only part of the story.

Carlos noted the complexity:

“Things are changing quite fast with data. How do you track success when different publications have different goals? Sometimes the goal is to see how quickly new studies get into clinical guidelines. Other times, it’s about reaching a very specific group of oncologists in one country.”

Sentiment analysis is also emerging as a key tool:

“We can now see if a publication has been well or badly received by, for example, a group of cardiologists. Medical Affairs is adapting rapidly to what real-time data can offer,” Carlos added.

The discoverability dilemma

Shehla raised a critical issue: ensuring publications are findable by the right stakeholders.

“Discoverability is super important. A lot of data ends up in supplementary indices, which aren’t always accessible. If it’s not directly available through the paper, that’s problematic. It raises the question: how much do we include in the main publication versus holding back for supplementary materials?”

The difficulty, she argued, isn’t just in publishing but in making materials trackable. Without DOIs or identifiers, measuring performance across channels becomes impossible.

Carlos emphasized that when any content type – including supplementary data, infographics, and plain language summaries – is uploaded to Figshare and assigned a DOI, it becomes both accessible and trackable. This is a critical step that several Digital Science customers already use to monitor and demonstrate the impact of their materials and to gain deep insight into who is engaging with their content.

Formats and channels that resonate

Visual and digital formats are transforming scientific communication. With tools like Altmetric and Figshare, it’s now possible to track which content resonates with different audiences – for example, whether visual abstracts work best for patients, short videos for junior doctors, or news platforms such as Medscape for senior clinicians.

Key takeaways from the discussion included:

  • Infographics and visual abstracts help make complex data more digestible for both HCPs and patients.
  • Social media engagement, accelerated since COVID-19, has expanded the demographic reach of publications.
  • Podcasts, YouTube, and blogs are emerging as alternative channels for research dissemination.

Shehla summarized the opportunity:

“Data visualization has been a game changer. It helps people understand complex results without dumbing them down. But it has to be a true representation of the data.”

Strategic decision-making with engagement data

Engagement data is no longer just descriptive – it’s strategic.

Myriam explained:

“This data helps us know which publications to amplify and in what format. If a subgroup analysis is relevant for Asia or South America, we integrate it into the regional strategy. Affiliates want to know how to use this data locally, whether in slides or field medical materials.”

Carlos added an example of reverse engineering success:

“We worked with a partner who had two trials presented at the same congress. One made it into a guideline in a specific country much faster than the other. By looking back at the local attention it had on social media, news and others, we tried to understand why.”

The future: AI, social media, and trust

Looking ahead, AI and digital platforms are set to further disrupt how success is measured.

Myriam highlighted new challenges:

“Citations and downloads will matter less. AI tools are already being used by HCPs to answer questions on diseases and treatments. But a recent study showed less than 15% overlap in references across Google, ChatGPT, and Perplexity when asked the same question. Metadata and referencing are going to be critical to ensure our publications are being picked up correctly.”

Kim added:

“We need to optimize what we create so AI can pick up data through correct tagging. Who is engaging, what types of data they’re engaging with, and what channel they use – these are all factors we have to plan for.”

Carlos cautioned on the risks:

“AI is a wonderful tool if used correctly – but like computer scientists used to say: it’s ‘garbage in, garbage out’. AI is very confident even when it’s wrong. The real value comes from using the right data together with AI to help people understand it better and extract the needed insights from it, whilst mitigating its potential for misuse and misinformation.”

Conclusion: Toward a holistic, dynamic view of impact

As the panel made clear, measuring publication performance can no longer be reduced to a single number. Success is multi-dimensional, context-specific, and evolving alongside technology and stakeholder expectations.

Traditional metrics such as citations and impact factors remain useful, but they are no longer sufficient. Engagement data, sentiment, and discoverability are now central to understanding whether a publication truly resonates and reaches its intended audience. At the same time, AI, social media, and new digital formats are reshaping how, and by whom, research is consumed. And sometimes, the most meaningful measures are the informal ones: when medical scientific liaisons hear health care professionals discussing a paper, when KOLs reference it unprompted, or when data directly influences patient care.

A call to reframe success

The future of publication success will depend on Medical Affairs teams embracing this broader, more dynamic definition of impact. By combining rigorous traditional metrics with innovative digital measures, and by ensuring content is discoverable, trackable, and presented in accessible formats, organizations can create lasting value. Most importantly, reframing success around real-world influence and patient outcomes ensures that research doesn’t just get published, it makes a difference.

Continue the conversation

At Digital Science, we’re committed to helping Medical Affairs professionals thrive in an era where research visibility and impact are being redefined. To deepen the insights shared in this panel, we invite you to explore our latest white paper, “Empowering Medical Affairs in the Digital Age,” authored by thought leader Mary Ellen Bates. Inside, you’ll find practical strategies to navigate evolving challenges, demonstrate value, and drive measurable outcomes.

Mary Ellen Bates will also be leading our upcoming webinar, “From Data Chaos to Strategic Impact: Transforming Medical Affairs in the Digital Age” (Tuesday 28 October 2025).

