Bridging the Gap or Widening the Divide: A Call for Capacity-Building in Artificial Intelligence for Healthcare in the Philippines

Generative artificial intelligence (GAI) and its well-known tools, such as ChatGPT, Google Bard and Claude, have gained substantial popularity due to their enormous potential in healthcare applications, as seen in medically fine-tuned models such as Med-PaLM and ChatDoctor. While these advancements are impressive, the dependence of AI development on data volume and quality raises questions about the generalizability of these models. Regions with lower medical research output risk bias and misrepresentation in AI-generated content, especially when it is used to assist clinical practice. When a prompt concerning isoniazid dosing for Filipinos versus other ethnic and racial groups was tested on GPT-4, GPT-3.5, Bard and Claude, 3 of the 4 responses contained convincing but false content, and extended prompting illustrated how hallucination arises in GAI models. To address this, model refinement techniques such as fine-tuning and prompt ensembles are suggested; however, refining AI models for local contextualization requires data availability, data quality and quality assurance frameworks. Clinicians and researchers in the Philippines and other underrepresented regions are called to initiate capacity-building efforts to prepare for AI in healthcare. Early efforts from all stakeholders are needed to prevent the exacerbation of health inequities, especially in the new clinical frontiers brought about by GAI.


INTRODUCTION
From automated scribing to clinical decision support tools, the fast-paced development of artificial intelligence (AI) and the virality of user-friendly tools like ChatGPT open a transformative frontier in clinical practice. High-level advances in healthcare applications like generalist biomedical AI systems (GMAI) and multimodal models appear to be significant steps towards bridging the care gap; [1][2][3] however, the inherent dependency of AI development on data quality and availability may exacerbate inequalities in medicine, especially in communities with scarce medical literature. This raises concerns about the generalizability of healthcare AI systems, especially considering that most of the development in this space is built with Western-oriented datasets. Unlike clinical practice guidelines that consider local context, AI models are built from pretraining on vast amounts of data that may or may not consider clinical nuances in underrepresented localities.[4] Despite the availability of AI-based tools, healthcare professionals in the Philippines must be wary of the risks of utilizing them in clinical practice and build initiatives to contextualize AI tools with local clinical nuances in mind.

What is Artificial Intelligence?
Artificial intelligence (AI) may be defined in a multitude of ways. However, for the purposes of understanding its impact on clinical practice, AI is defined as a collection of technologies that utilize computational learning methods (eg, machine learning, neural networks, deep learning, natural language processing) in interpreting, analyzing and generating data, aiming to replicate human-level intelligence.[4,5] The most relevant type of AI in this discussion is GAI, which is differentiated from other types of AI by its autonomous ability to flexibly generate seemingly novel data without explicit human programming.[4] Under GAI, there are multiple types of applications like diffusion models, large language models and multimodal models, which focus on image, text and a combination of multiple data types (eg, audio, video, image, text) for interpretation and generation, respectively.

ChatGPT and the Development of Generative AI for Healthcare Applications
GAI traces back to the first neural networks developed in 1943, and one of the works that laid the foundation for generative models is a 2017 paper published by Google titled "Attention is all you need," which gave rise to transformer models, the core architecture behind the Generative Pre-trained Transformer (GPT) models that power ChatGPT.[6,7] ChatGPT, released by OpenAI in November 2022, is an AI-powered conversational model that produces human-like responses to multimodal inputs.[8] It earned 100 million users in 2 months, the fastest uptake of users in any emerging technology.[9] Its fame comes from its ability to read, summarize, interpret and produce human-like output based on human-produced input, giving it the capacity to hold conversations with human prompts and thus opening a world of possibilities in various domains, including medicine.[9][10][11] Since the release of ChatGPT, public interest in and development of healthcare-focused tools powered by GAI models like GPT, PaLM and LLaMA have advanced rapidly, and these tools continue to gain popularity due to their enormous potential in clinical use. As of October 2023, there are 692 AI and machine learning-enabled medical devices filed with the US Food and Drug Administration; however, no GAI-enabled projects have been approved yet.[12] A list of notable developments in GAI for healthcare built in connection with technology giants Google, Meta, Microsoft and Amazon is summarized in Table 1.
For conciseness, the list was limited to initiatives and projects linked to GAI tools from major technology companies with publicly recognized in-house AI foundation models. As of writing, AI development spans hundreds to thousands of initiatives from early-stage startups and nationally backed research groups worldwide; hence, the scope of this table was limited to a few notable models.

Bias, Underrepresentation and Hallucination in GAI
While recent developments are indeed revolutionary, AI tools are unlikely to be a plug-and-play technology. One particular risk that AI carries is inherently rooted in the underlying technology of its creation. Generative AI relies on foundation models, which are large models built from processing massive amounts of data using self-supervised pretraining methods that learn the patterns and relationships of data from the provided content.[18] The pretraining methods that build foundation models are observed to be scale-dependent, and model competitiveness improves with greater data volume.[19] However, this poses a significant risk: data-poor communities, especially in developing regions with lower medical research output, lag in AI utilization because models are skewed to represent available data from developed regions. The domination of Western data in medical literature, and consequently in AI pretraining datasets, risks the failure to represent the unique clinical presentations of patients from minority groups with scarce medical literature.[20,21] Simply put, foundation models are trained on vast amounts of data and follow the simple concept of 'garbage in, garbage out'. If data is either unavailable or of poor quality, then the output of an AI model may not be of clinical use. Bluntly put, it is unlikely for US-based AI research groups to include updated clinical practice guidelines and medical literature from the Philippines in training their models.
Before the ChatGPT-induced AI boom, concerns about bias in healthcare AI had already been pointed out.[20,21] However, what makes the era of GAI different is its capacity not only to interpret but also to analyze and generate novel data. With its ability to create new data comes the capability to generate seemingly true but fabricated data, for example by citing research papers that do not exist or by making convincing claims without factual evidence, a phenomenon commonly referred to as hallucination.[22,23] This multiplies the risk of bias, especially when these tools assist in clinical decision-making.
To illustrate this, we show the AI-generated responses to a prompt that is very relevant to the Philippine healthcare situation, as shown in Table 2.
No prior chats, custom instructions or prompt tuning methods were performed. GPT-3.5 and GPT-4 were set on default settings and internet access was not enabled. The response from Google Bard is a snippet from a 97-word answer, shortened for conciseness in this paper; see Appendix 1 for full response and settings applied.
In literature, it is generally agreed that Filipinos are rapid acetylators, which comes as a consequence of being a country with high tuberculosis disease burden.[24,25] Despite this, three out of the four AI-generated responses, as shown in Table 2, convincingly claim otherwise. When GPT-4, the latest model from OpenAI as of October 2023, was retested with a request to provide peer-reviewed citations for the claim, the response produced a fake citation with a DOI link that leads nowhere, as shown in Table 3.
No prior chats, custom instructions or prompt tuning methods were performed. GPT-4 was set on default settings.
Upon further assessment, a real-world paper closely resembling the article cited in the response in Table 3 was identified. Clearly, GPT-4 was able to retrieve a reference; however, it falsely modified the title, journal and links in a very convincing format. When reviewing the studies included in the cited meta-analysis, Filipinos were not even part of the included samples,[26] yet it was utilized to generate the response. Unless double-checked, this claim may appear to be as good as any material containing a complete citation.
While we acknowledge that elaborate prompt tuning or modification of model weights and settings can create responses with better precision, these hallucinations still pose a significant danger, especially when applied in clinical practice. Assessing the risk of isoniazid-induced liver injury relies on the patient's rapid or slow acetylator status.[24,25,27] If AI-assisted clinical decision support tools generate hallucinated responses like the one shown above, potentially influencing a physician's dosing regimen, then patients are exposed to unnecessary harm.

Table 2
AI-Generated Responses on Isoniazid dosing variability, retrieved from models by OpenAI (ChatGPT GPT-3.5 and GPT-4), Google (Bard) and Anthropic (Claude). Generated on 20 October 2023.

PROMPT:
In less than 100 words, explain why isoniazid dosing tends to differ for Filipinos when compared to other racial and ethnic groups.

ChatGPT (GPT-3.5)
Isoniazid dosing can differ for Filipinos due to genetic variations in drug metabolism. Many Filipinos have a higher prevalence of slow acetylation, a genetic trait that affects how the body processes isoniazid. This can result in slower drug clearance and increased risk of adverse effects or toxicity at standard dosages. Therefore, lower initial dosages or alternative regimens are often recommended to ensure safety and effectiveness in this population.

ChatGPT (GPT-4)
Isoniazid dosing can vary for Filipinos due to their genetic makeup, which often includes a higher prevalence of slow acetylators. The acetylation process, governed by the NAT2 gene, affects how quickly isoniazid is metabolized in the body. Slow acetylators metabolize the drug more slowly, leading to higher drug levels in the blood and increased risk of side effects. Consequently, healthcare providers may adjust dosages or monitor these patients more closely to mitigate the risks of toxicity and other adverse effects while ensuring the medication is effective.

Google Bard
Isoniazid is an antibiotic used to treat and prevent tuberculosis (TB). The dosing of isoniazid varies depending on a number of factors, including age, weight and kidney function. However, there is also evidence that ethnicity can play a role in isoniazid dosing. Filipinos are more likely to experience side effects from isoniazid than other racial and ethnic groups. This is thought to be due to a genetic difference that affects the way Filipinos metabolize isoniazid. As a result, Filipinos typically receive a lower dose of isoniazid than other groups.

Claude AI
Isoniazid dosing often needs adjustment in Filipino patients because they tend to metabolize the drug more rapidly due to genetic differences influencing liver enzymes. This leads to lower isoniazid levels, reducing efficacy. Lower initial doses are recommended for Filipinos to avoid subtherapeutic levels while minimizing side effects.

Capacity-building for artificial intelligence in the Philippines
GAI and its applications in healthcare are in their infancy stages, and there are a multitude of ways to ensure the applicability of internationally developed AI models in the Philippine setting. Just like how clinical practice guidelines stem from a combination of internationally accepted standards applied to the local context, foundation models can further be trained on local datasets using methods like fine-tuning, word embeddings, content injection, prompt ensembles and a multitude of other techniques.[2,3,19,[28][29][30] Despite the existence of these methods, model refinement continues to rely on fundamentals like high-quality datasets and validation frameworks.
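As an illustration of one of the techniques named above, the following is a minimal sketch of a prompt ensemble: the same clinical question is asked through several rephrased prompts and the most common answer is kept, together with the share of prompts that agreed. The function `ensemble_answer`, the stand-in `stub_model` and the three prompt variants are hypothetical names introduced only for this sketch; a real deployment would replace `stub_model` with a call to an actual model, and the stub's outputs are not real model responses.

```python
from collections import Counter
from typing import Callable, List, Tuple


def ensemble_answer(ask_model: Callable[[str], str],
                    prompts: List[str]) -> Tuple[str, float]:
    """Ask every prompt variant once; return the majority answer and its agreement rate."""
    if not prompts:
        raise ValueError("at least one prompt variant is required")
    # Normalize answers so that 'Rapid' and ' rapid ' count as the same vote.
    answers = [ask_model(prompt).strip().lower() for prompt in prompts]
    top_answer, votes = Counter(answers).most_common(1)[0]
    return top_answer, votes / len(answers)


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call; returns canned, illustrative text only."""
    return "Slow" if prompt.startswith("Isoniazid dosing") else "Rapid"


# Hypothetical rephrasings of the same question (assumed for illustration).
PROMPT_VARIANTS = [
    "Are Filipinos predominantly rapid or slow acetylators of isoniazid? Answer in one word.",
    "In one word (rapid/slow): what is the predominant acetylator status among Filipinos?",
    "Isoniazid dosing in Filipinos: rapid or slow acetylators? Answer in one word.",
]

answer, agreement = ensemble_answer(stub_model, PROMPT_VARIANTS)
print(f"majority answer: {answer} (agreement {agreement:.2f})")
```

Agreement across prompts is only a consistency signal. As Table 2 shows, several outputs can agree with one another and still be wrong, so an ensemble of this kind does not replace validation against local clinical references.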
With this in mind, clinicians and researchers in the Philippines need to develop quality assurance (QA) and validation frameworks for assessing models, similar to MedQA, PubMedQA and MultiMedQA, which were used to assess medically fine-tuned foundation models like Google's Med-PaLM.[13,31] Health facilities should be prepared to employ recurring local validation strategies that take site-specific context into account when assessing AI implementation [32] and ensure safe data-gathering methods with a privacy-first approach to iterating AI models.
Early initiatives like employing electronic health record systems, building and validating QA frameworks using updated clinical practice guidelines from medical societies and creating anonymized patient datasets from tertiary hospitals using Philippine data should be further welcomed and supported at a policy level.
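To make the idea of a local QA framework more concrete, the sketch below scores a model against a small answer key whose reference answers come from locally validated sources. It is an assumption-laden illustration, not an existing benchmark: `QAItem`, `evaluate` and `LOCAL_KEY` are hypothetical names, the single item is drawn from the isoniazid example in this paper, and exact matching on one-word answers is a deliberate simplification of how answers would be graded in practice.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class QAItem:
    question: str
    expected: str  # reference answer from a locally validated source
    source: str    # provenance of the reference answer


def evaluate(ask_model: Callable[[str], str],
             items: List[QAItem]) -> Dict[str, object]:
    """Score a model by case-insensitive exact match against the answer key."""
    if not items:
        raise ValueError("the answer key is empty")
    failures = [
        item for item in items
        if ask_model(item.question).strip().lower() != item.expected.strip().lower()
    ]
    accuracy = 1 - len(failures) / len(items)
    # Failed items are returned so that clinicians can review them one by one.
    return {"accuracy": accuracy, "failures": failures}


# Hypothetical one-item answer key based on the example discussed in this paper.
LOCAL_KEY = [
    QAItem(
        question="In one word (rapid/slow): what is the predominant "
                 "isoniazid acetylator status among Filipinos?",
        expected="rapid",
        source="references [24,25] of this paper",
    ),
]

# Two stand-in 'models' (assumptions for illustration, not real model outputs).
report_wrong = evaluate(lambda question: "Slow", LOCAL_KEY)
report_right = evaluate(lambda question: "Rapid", LOCAL_KEY)
print(report_wrong["accuracy"], report_right["accuracy"])
```

Keeping the `source` field on every item is a design choice that ties each reference answer back to a guideline or study, so the key can be re-validated whenever local clinical practice guidelines are updated.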

The state of artificial intelligence projects in the Philippines
The Philippines is not completely behind the AI wave: The Department of Health, together with the Philippine Health Insurance Corporation, is pioneering the National Health Data Repository (NHDR), which would be the country's first integrated architecture for medical data.[33] While the NHDR Framework is already available online, there has yet to be an official statement on its projected completion.
Among other projects is the CANDLE Study by the National Health Institute, which includes phenotypically scoring CT scans for early liver cancer detection [34]; the CHERISH2 App of the Department of Health in partnership with a tertiary hospital for screening COVID-Pneumonia [35]; and a Dengue prediction model using Long Short-term Memory (LSTM) made by researchers from the University of the Philippines [36].
Overall, progress is being made with the collaborative efforts of both the private and public sectors in developing Filipino-oriented AI, and this serves as a waypoint towards building locally contextualized resources for emerging AI models.

CONCLUSION
It is clear that AI is here to stay and developing countries risk lagging behind. In 2023 alone, major technology companies have launched a succession of medically fine-tuned models and are actively integrating these into applications for consumer use. Though they score impressively on clinical question banks, these models still grapple with biases, hallucinations and misinformation. This is rooted in how these models are trained, given the limited data on patients from the lesser end of the digital divide.
Fundamentally, AI is dependent on data quality, volume and validation. The lack of Filipino representation in the pretraining data poses substantial risks for these models to perpetuate false or potentially harmful clinical advice. These cutting-edge technologies are not plug-and-play, and capacity-building is needed to prepare ourselves to welcome this emerging field. The Philippines must employ policies to ensure the safety and quality of emergent AI technologies; clinicians and researchers should initiate the development of validated datasets and quality assurance frameworks, and medical schools must include AI in classroom discourse early on.
The pursuit of representation in this emerging field is paramount in determining the course of AI in Philippine healthcare. Our early initiatives will shape the future of this emerging technology: will AI become a stepping stone towards universal healthcare, or add to our list of health inequities to overcome?
The tightrope between bridging the gap or widening the divide relies entirely on how we collectively decide to tip the balances of this technology, and we are at the forefront of building its future.

Table 1
Google, Meta, Microsoft and Amazon-based initiatives on Generative Artificial Intelligence in Healthcare.
ChatDoctor (Li, et al. 2023 [14]): A medically fine-tuned large language model refined from the Large Language Model Meta AI (LLaMA) as the baseline foundation. Adaptation and refinement were done using a dataset of 100,000 anonymized patient-doctor dialogues. Built as a chat assistant for patient medical advice.
Note: The LLaMA model is a foundation model developed by Meta. [15]

Table 3
Retest of Isoniazid dosing query with citation prompt for GPT-4. Generated on 20 October 2023.

PROMPT:
In less than 100 words, explain why isoniazid dosing tends to differ for Filipinos when compared to other racial and ethnic groups. Cite peer-reviewed references for your claims.

RESPONSE from ChatGPT (GPT-4):
Isoniazid dosing can differ for Filipinos due to genetic variations that affect drug metabolism. A higher prevalence of slow acetylators is found among Filipinos and other Asian populations leading to slower rate of isoniazid clearance from the body. Slow acetylators may require adjusted dosing to minimize the risk of drug toxicity while ensuring treatment efficacy. Reference: Cai Y, Yi J, Zhou C, et al. NAT2 polymorphisms and susceptibility to anti-tuberculosis drug-induced liver injury: a meta-analysis. Int J Tuberc Lung Dis. 2012;16(5):589-595. DOI: 10.5588/ijtld.11.0568. Please note, always refer to the most recent and pertinent literature for up-to-date information.