Abhijit Nayak: NLP for Cancer Research

Abhijit Nayak, Senior Data Scientist at Cognizant and IEEE conference speaker, discusses building production-grade information extraction systems for cancer research and why domain expertise matters more than model size.

A July survey in Artificial Intelligence Review analyzed 156 NLP studies in oncology and identified a pattern: transformer models perform impressively on research benchmarks, then collapse when deployed in clinical workflows. ClinicalBERT extracts cancer diagnoses accurately from curated pathology reports; the same architecture fails when hospital documentation varies by physician, institution, and department. The technical foundations are stronger than ever. The systems still don’t work in production.

The pattern is familiar across healthcare AI: impressive benchmarks on curated datasets, followed by friction when the same systems meet real-world conditions. In oncology, where 80% of the data needed for treatment decisions and research sits in unstructured clinical notes, this gap has consequences. Cancer registries fall behind. Clinical trial matching slows. Treatment insights that could inform care stay buried in millions of documents that no one has time to read manually.

Abhijit Nayak, Senior Data Scientist (NLP) at Cognizant, builds extraction pipelines that actually survive contact with messy hospital data. His systems process millions of oncology records, extracting diagnoses, biomarker results, and treatment timelines with the validation logic and audit trails clinical environments demand. This year, he is presenting research on LLM reproducibility and prompt optimization at IEEE conferences in Vienna and Singapore. We discussed what kills NLP systems when they move from paper to production, how domain expertise catches edge cases that larger models miss, and why understanding oncology documentation patterns matters more than foundation model parameter counts.

— A July survey in Artificial Intelligence Review analyzed 156 NLP studies in oncology and found a consistent pattern: models that perform well in research rarely survive contact with clinical workflows. You build extraction pipelines that process millions of clinical notes. What actually kills these systems when they move from paper to production?

— Honestly, it starts with something boring: the data just looks completely different. When you read a research paper, the model is trained on a dataset where everything is neatly formatted, sentences are complete, and terminology is consistent. Then you get a real pathology report, and it’s a mess. One physician writes tumor staging in a table, while another puts it somewhere in the middle of a paragraph with abbreviations I’ve never seen before. Clinical notes often include phrases like “see prior results” without actually repeating the values. You’re extracting the same kind of information, but the way it’s written varies significantly across institutions, departments, and sometimes even among individual doctors.

And then there’s all the infrastructure that nobody writes papers about, because it’s not novel, it’s just work. You need ingestion, preprocessing, extraction, normalization to standard terminologies, validation logic, and audit trails. Academic benchmarks focus on F1 scores for entity recognition. But in production, if your normalization step silently fails on an unusual input, the whole downstream analysis is wrong, and in oncology that can mean a missed biomarker or an incorrect treatment timeline.
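
To make that failure mode concrete, here is a minimal sketch of a normalization step that records every decision in an audit trail instead of failing silently. The schema, function names, and terminology map are illustrative assumptions, not the production pipeline described here:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditEntry:
    stage: str       # pipeline stage, e.g. "normalize"
    status: str      # "ok" or "unmapped"
    detail: str      # human-readable explanation for the audit trail
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def normalize_term(raw: str, terminology: dict[str, str],
                   audit: list[AuditEntry]) -> str | None:
    """Map a raw extracted term to a standard terminology code."""
    code = terminology.get(raw.strip().lower())
    if code is None:
        # Record the miss instead of passing None downstream silently.
        audit.append(AuditEntry("normalize", "unmapped", f"no code for {raw!r}"))
        return None
    audit.append(AuditEntry("normalize", "ok", f"{raw!r} -> {code}"))
    return code

# Toy terminology map; a real one would target a standard vocabulary.
audit_trail: list[AuditEntry] = []
normalize_term("er positive", {"er positive": "ER_POSITIVE"}, audit_trail)
```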

But I think the hardest part is actually earning trust from the clinical side. These are people who have been doing manual abstraction for years. They know every edge case, every exception. If your system hallucinates once, if it misses something obvious, you’ve lost them. So you end up building all this explainability infrastructure: showing source sentences, surfacing confidence scores, flagging ambiguous cases. None of that gets published, because it’s engineering, not research. But without it, nothing deploys.

— Your pipelines extract diagnoses, tumor characteristics, treatment regimens, biomarker results, and treatment timelines, all from unstructured text. A pathology report from one physician may look entirely different from a clinical note from another. How do you build systems that handle that variability and still hit the accuracy that clinicians will actually trust?

— You don’t solve it with one model. That’s the first misconception: people think you train a huge transformer, throw documents at it, and it figures everything out. It doesn’t work that way in oncology. The variability is too high, and the cost of errors is too high.

What actually works is breaking the problem into smaller pieces. Pathology reports need different handling than radiology summaries. Progress notes are their own beast. So you build specialized components: one module focuses on tumor staging, another on treatment regimens, another on biomarker extraction. Each is tuned for its specific document type and its specific terminology patterns.
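
In code, that decomposition often looks like a routing layer that dispatches each document type to its own extractors. A minimal sketch, assuming an upstream classifier has already labeled the document; the extractor names and stubs are illustrative:

```python
from typing import Callable

def extract_staging(text: str) -> dict:
    return {}  # stand-in for a tuned tumor-staging extractor

def extract_regimens(text: str) -> dict:
    return {}  # stand-in for a treatment-regimen extractor

def extract_biomarkers(text: str) -> dict:
    return {}  # stand-in for a biomarker extractor

EXTRACTORS: dict[str, list[Callable[[str], dict]]] = {
    "pathology_report": [extract_staging, extract_biomarkers],
    "radiology_summary": [extract_staging],
    "progress_note": [extract_regimens],
}

def run_extractors(doc_type: str, text: str) -> list[dict]:
    handlers = EXTRACTORS.get(doc_type)
    if handlers is None:
        # Unknown document types surface an error instead of a best guess.
        raise ValueError(f"no extractors registered for {doc_type!r}")
    return [extract(text) for extract in handlers]
```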

And then you layer validation on top. Clinical logic checks: does this staging make sense for this cancer type? Does this treatment timeline align with what we extracted about the diagnosis date? If something looks off, it gets flagged. Not rejected automatically, just flagged for review, because sometimes the odd case is actually correct, and sometimes your model made a mistake. You want a human making that call, not the system silently choosing one interpretation.
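
A toy version of that flag-don’t-reject logic; the rule table and cancer types are placeholders standing in for the real clinical rule set:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Flag:
    rule: str
    message: str

# Toy rule table: which stage values are plausible for which cancer type.
# These entries are placeholders, not clinical guidance.
VALID_STAGES = {"small_cell_lung": {"limited", "extensive"}}

def check_record(cancer_type: str, stage: str,
                 diagnosis_date: date, treatment_start: date) -> list[Flag]:
    flags: list[Flag] = []
    allowed = VALID_STAGES.get(cancer_type)
    if allowed is not None and stage not in allowed:
        flags.append(Flag("stage_vs_type",
                          f"stage {stage!r} is unusual for {cancer_type!r}"))
    if treatment_start < diagnosis_date:
        flags.append(Flag("timeline_order",
                          "treatment starts before the diagnosis date"))
    # Flags are surfaced for human review; nothing is auto-rejected.
    return flags
```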

The trust piece comes from transparency. When we surface an extracted value, we show exactly where it came from: the sentence, the document, the date. Clinicians can click through and verify. They’re not being asked to trust a black box. And over time, when they see the system getting it right consistently, when they see it catching things they might have missed in a 50-page record, that’s when adoption actually happens.
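
The provenance he describes maps naturally onto a record like the following; the field names are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Extraction:
    value: str            # e.g. "T2N1M0"
    field_name: str       # e.g. "tumor_staging"
    source_sentence: str  # the verbatim sentence the value came from
    document_id: str      # which document in the patient record
    document_date: str    # ISO date of the source document
    confidence: float     # model confidence, shown to the reviewer
```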

— You’ve described your systems as production-grade pipelines with MLOps, monitoring, and evaluation standards. Since 2022, you’ve been leading AI/ML strategy for healthcare projects at Cognizant, deciding which use cases to prioritize and which architectures to standardize. What does it actually take to move an oncology NLP system from prototype to something a research team relies on daily?

— Versioning, monitoring, and a correction pipeline that actually closes the loop. Every extraction needs to be reproducible months later, using the same model version, configuration, and preprocessing. In regulated environments, “we updated the model” isn’t an answer. Monitoring catches drift before users do: new report templates, different documentation styles, accuracy drops on specific cancer types. We had tumor staging extraction degrade after one site changed its pathology format. We caught it in the dashboards within days.
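
One common way to make extractions replayable is to fingerprint the exact pipeline that produced them. A sketch under that assumption; the version strings and config schema are invented for illustration:

```python
import hashlib
import json

def run_fingerprint(model_version: str, config: dict,
                    preprocessor_version: str) -> str:
    """Hash everything needed to replay an extraction run exactly."""
    payload = json.dumps(
        {
            "model_version": model_version,
            "config": config,
            "preprocessor_version": preprocessor_version,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

# Stored alongside every extracted value, so "which pipeline produced
# this?" has an exact answer months later.
fingerprint = run_fingerprint(
    "staging-extractor-1.4.2",               # invented version string
    {"max_tokens": 512, "threshold": 0.85},  # invented config
    "preproc-2.0.1",                         # invented version string
)
```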

The feedback loop is often what teams overlook. Clinicians flag errors, those corrections feed back into training data, models get retrained, and performance improves. It sounds obvious, but operationalizing it requires tooling: annotation interfaces, data pipelines, retraining schedules. We spent months building that infrastructure before it began to pay off.
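
The closed loop reduces to a simple contract: every human correction becomes a labeled example for the next training cycle. A deliberately minimal sketch, not the team’s actual tooling:

```python
from dataclasses import dataclass

@dataclass
class Correction:
    extraction_id: str    # which extraction the clinician flagged
    wrong_value: str      # what the model produced
    corrected_value: str  # what the reviewer says it should be
    reviewer: str

# Corrections accumulate into labeled examples for the next retraining run.
retraining_queue: list[Correction] = []

def log_correction(correction: Correction) -> None:
    retraining_queue.append(correction)
```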

The actual prioritization decisions come down to clinical impact versus technical feasibility. Some extractions are high-value but extremely hard, like parsing free-text treatment modifications. Others are easier wins. You sequence the roadmap so early deployments build credibility while you tackle the more complex problems in parallel.

— Later this year, you’re presenting at two IEEE conferences: FMLDS in Vienna, on LLM reproducibility through three-way caching, and ICNGN in Singapore, on prompt optimization for sentiment analysis. How do these connect to your oncology work, or are they parallel tracks?

— They’re directly connected, just abstracted. The reproducibility paper emerged from a real production problem: LLM outputs aren’t deterministic, and the same prompt yields slightly different results across runs. In research, that’s noise. In clinical pipelines, where audit trails and reproducible extractions are required, it’s a blocker. The caching architecture we developed solves that at the infrastructure level.
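
The paper’s three-way design isn’t detailed in this interview, but the core idea of caching for determinism can be sketched as follows, assuming the cache is keyed on the prompt, the model identifier, and the decoding parameters:

```python
import hashlib
import json
from typing import Callable

_cache: dict[str, str] = {}

def cache_key(prompt: str, model: str, params: dict) -> str:
    raw = json.dumps({"prompt": prompt, "model": model, "params": params},
                     sort_keys=True)
    return hashlib.sha256(raw.encode()).hexdigest()

def cached_completion(prompt: str, model: str, params: dict,
                      call_llm: Callable[[str, str, dict], str]) -> str:
    key = cache_key(prompt, model, params)
    if key not in _cache:
        # The first call hits the model; every replay returns the stored
        # output, so an audited extraction is byte-identical months later.
        _cache[key] = call_llm(prompt, model, params)
    return _cache[key]
```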

The prompt optimization work is about getting consistent performance without fine-tuning. In healthcare, you often can’t send patient data to external APIs for model training, so you need prompting strategies that work reliably out of the box. The emoji research sounds playful, but the underlying question is serious: how do you engineer prompts that produce stable, predictable outputs across different input distributions?
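
One widely used stabilization tactic, shown here purely as an illustration rather than the paper’s method, is a fixed template with an enumerated output space and strict parsing of the model’s answer:

```python
# Illustrative labels and template; not the paper's actual prompts.
LABELS = ("positive", "negative", "neutral")

def build_prompt(text: str) -> str:
    return (
        "Classify the sentiment of the text below.\n"
        f"Answer with exactly one word from: {', '.join(LABELS)}.\n\n"
        f"Text: {text}\n"
        "Sentiment:"
    )

def parse_label(raw: str) -> str:
    # Constrain free-form output back onto the label set; anything else
    # is treated as a failure rather than silently accepted.
    answer = raw.strip().lower()
    if answer not in LABELS:
        raise ValueError(f"unexpected model output: {raw!r}")
    return answer
```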

Both papers address problems I hit in production first. The academic framing came later.

— You’ve served as a judge at Devpost AI hackathons alongside panelists from Netflix, Meta, and Google. When you’re evaluating projects from younger teams, what separates a solution that looks impressive in a demo from one that could actually be deployed?

— The first thing I look at is what happens when inputs break. Demo projects always show the happy path: clean data, expected behavior, impressive results. Deployable systems, though, need to fail gracefully and recognize when they’re uncertain. In healthcare submissions, I specifically watch for edge-case thinking; a 95% accurate classifier means nothing if its failures cluster around unusual conditions where misclassification actually kills someone. Strong teams establish confidence thresholds and human review triggers from the outset. And you can always tell when a team talked to real users versus just built for the demo. The architecture decisions are completely different.
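
Those confidence thresholds and review triggers can be as simple as a routing gate like this one; the threshold value is an arbitrary placeholder that a real system would calibrate per task:

```python
# The 0.90 threshold is an arbitrary placeholder; real systems calibrate
# it per task against the cost of a missed error.
REVIEW_THRESHOLD = 0.90

def route_prediction(label: str, confidence: float) -> dict:
    if confidence >= REVIEW_THRESHOLD:
        return {"label": label, "route": "auto_accept"}
    # Below the threshold, the prediction goes to a human instead of
    # silently entering the downstream pipeline.
    return {"label": label, "route": "human_review"}
```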

— Beyond healthcare, you’ve built foundational AI models for startups in the US philanthropic sector. That’s a sharp contrast: oncology is life-or-death, philanthropy is social impact. How transferable are the techniques?

— More transferable than you’d expect. Philanthropic organizations sit on massive amounts of unstructured data: grant applications, impact reports, program narratives. The core problem is the same: critical information is buried in text that nobody has time to read manually. The extraction pipelines I built for oncology (document classification, entity recognition, normalization) adapt directly. What changes is the ontology, not the architecture. In oncology, you’re extracting tumor staging and biomarker values. In philanthropy, you’re extracting funding amounts, program outcomes, and geographic focus. The validation logic differs and the domain dictionaries are distinct, but the engineering patterns stay the same. And honestly, working across domains makes you better at both. You stop overfitting your thinking to one problem space.

— The subheadline of this interview is “why domain expertise matters more than model size.” In a field where every month brings a new LLM with more parameters, that’s a contrarian position. For someone building a career in healthcare AI, should they focus on the latest foundation models or invest in understanding the medical domain itself?

— Domain expertise, without question. I’ve seen teams use GPT-4 on clinical notes and get mediocre results because they don’t fully understand what they’re extracting. They can’t tell when the model hallucinates a biomarker value that makes no clinical sense. They don’t know which errors are catastrophic and which are tolerable. Meanwhile, someone who understands oncology documentation patterns, knows how tumor staging works, and can read a pathology report builds better systems with smaller models. The foundation model is a tool. Knowing what to build with it, how to validate its outputs, and where the edge cases hide: that’s the hard part, and it comes from domain knowledge. Chase the models, and you’re always behind. Invest in the domain, and you’re always valuable.