Research activities in the C4AI are organized around five Great Challenges that combine fundamental aspects of artificial intelligence with applications in selected fields such as agribusiness, climate, and health. The current challenges are:
NLP2
Resources to Bring NLP of Portuguese to State-of-Art
Putting together open data and tools to enable high-level NLP of the Portuguese language.
Leaders: Marcelo Finger, Sandra M. Aluísio and Thiago A. S. Pardo
We aim to produce resources for Brazilian Portuguese that will enable state-of-the-art tools and applications. We concentrate on both written and spoken modalities for Portuguese, focusing on three main tasks:
To grow resources that bring Portuguese NLP to the world state of the art, effectively moving Portuguese out of the low-resource language scenario, with essential resources and tools as well as applications that address some of society's current critical demands.
We aim to obtain the following results for the syntactical front:
On distributional models and NLI:
For spoken language:
PROINDL
Artificial Intelligence Technologies to Strengthen the Indigenous Languages of Brazil
Using Artificial Intelligence in partnership with Indigenous communities to develop tools to preserve, revitalize and disseminate the Indigenous languages of Brazil.
Leaders: Claudio Pinhanez, Luciana Storto
Most of the Indigenous languages of Brazil are under threat of disappearing by the end of the 21st century. On the one hand, Indigenous peoples and their territories continue to be under attack by individuals and organizations, with invasions, the spread of disease, and the destruction of ecosystems on which they depend. On the other hand, the violent processes which began with colonization and persist until today, such as forced migration, catechesis, and the imposition of European languages, have significantly affected the number of speakers of Indigenous languages.
This joint project by IBM Research and USP explores the creation and use of Artificial Intelligence for the development, in partnership with Indigenous communities, of tools to preserve, revitalize, and disseminate Indigenous languages of Brazil. However, although AI has made great strides in the last 10 years in languages such as English and Chinese, its use in Indigenous language contexts is still incipient and hampered by the lack of data and programs to support research and development. PROINDL focuses on exploring innovative solutions to these challenges.
This project is aligned with the objectives and principles of the International Decade of Indigenous Languages established in 2022 by the UN and UNESCO, which aims to strengthen and ensure the continuity of Indigenous languages around the world, as articulated in the “Declaration of Los Pinos” (Chapoltepek). Within this context, we have ongoing partnerships with Indigenous communities in the São Paulo city area, in which we explore, together with community members, the development of necessary, desired, and sustainable solutions.
The project comprises the following areas of work:
KEML
Knowledge-Enhanced Machine Learning for Reasoning about Ocean Data
Merging data-driven learning and knowledge-based reasoning to answer complex queries about the Blue Amazon.
Leaders: Fabio Cozman and Eduardo Tannuri
Recent breakthroughs in AI have depended on parallel processing of big datasets so as to learn large models through optimization. Further breakthroughs should be possible by judiciously enlisting knowledge representation and planning techniques so as to make learning more efficient, less brittle, and free of biases.
In this context, we investigate conversational agents that can answer high-level questions. Conversations with such agents should include arguments, causes, explanations, and reasoning; it should be possible to conduct a conversation over time and with a purposeful goal, taking into account desires and intentions of the user. Overall, these conversational agents are a laboratory in which to study the connection between data-driven machine learning and knowledge-driven reasoning and planning.
GOML
Graph-Oriented Machine Learning for Stroke Diagnosis and Rehabilitation
Improving stroke diagnosis, treatment, and rehabilitation with graph-oriented machine learning on multimodal data.
Leaders: José Krieger and Zhao Liang
The recent advances of machine learning in medicine have been remarkable. However, there are still important issues that need to be addressed. Here we deal with two important questions:
1) How to integrate and select relevant medical features (biomarkers) from large-scale heterogeneous and dynamical sources?
In applications of machine learning in medicine, we often have to deal with large-scale heterogeneous and dynamical data sets. For example, in applications and scientific research related to stroke, or cerebrovascular accident (CVA), various kinds of data accumulated over long periods of time, such as texts, images, genetic biomarkers, electric signals, patient’s symptoms, and geographic information, are often available even for a single patient. Information integration is essential to correctly address health problems, as healthcare professionals rarely use only one type of information when solving a medical problem. Another important aspect when dealing with a large number of features is to properly select the most relevant ones: understanding which features are most relevant for the classification of a stroke provides important information for quick and accurate diagnosis and treatment.
2) How to interpret decisions made by machine learning algorithms and how to integrate human and artificial intelligence?
Currently, successful machine learning techniques do not provide an explicit mechanism to satisfactorily explain how a given result is achieved. Such a logical explanation is necessary in many medical applications, for example, in disease diagnosis. The lack of interpretability deeply limits the possibilities of integrating human and artificial intelligence in medicine, and it is the main reason why, in the majority of cases, healthcare professionals still regard machine learning algorithms as black boxes.
Our approaches primarily deal with cerebrovascular accident (CVA) as the application domain. According to the WHO, more than one billion people in the world have some disability; among chronic diseases, stroke stands out because it is the main cause of disability and the second cause of death in the world. Much progress has been made in understanding the risk factors, mortality and rehabilitation of stroke; however, incidence continues to increase as a result of an aging population and other risk factors. The identification of more precise and sensitive stroke biomarkers can help to change this worrying situation. Furthermore, developing high-accuracy diagnostic approaches and predicting individualized outcomes is one of the main ambitions and one of the strategies of the WHO 2014-2021 global action plan (SDG 3, good health for all at all ages).
The objective here is two-fold.
For the proposed study, we will use datasets from ATLAS (Anatomical Tracings of Lesions After Stroke), InCor (Heart Institute of Medical School of USP) stroke dataset (200 T1-weighted MRIs and Reports), and the data sets of IMREA – Instituto de Medicina Física e Reabilitação do Hospital das Clínicas FMUSP.
AgriBio
Causal Multicriteria Decision Making in Food Production Networks
Developing causal multicriteria AI models for decision making under uncertainty in food production networks.
Leaders: Antonio Saraiva and Alexandre Delbem
Agribusiness production cycles, environmental sustainability, and food security are pressing demands that challenge authorities worldwide. In these settings, proper modeling of heterogeneous large-scale information, resilient learning systems that cope with the dynamics of real environments, and methods that balance many competing concerns over costs and benefits are significant challenges. Representation learning, resilience enhancement, and multicriteria decision making are important tools to deal with those challenges.
The construction of reliable causal models is an open problem. Advanced methods for generating Dynamic Bayesian Networks (DBNs) based on the capture of tacit knowledge can enable causal models that combine continuous and discrete variables (a level of heterogeneity) and that are also adaptive.
Hybridization through ensembles of conventional knowledge-based models and learning methods is a possible way to produce useful solutions for complex real-world problems. Such processes can contribute to resilience through dataset evaluation and improvement, and through the selection of learner parameters (as meta-features) in a scenario of ensemble setup, dynamic ensemble selection and meta-learning. The integration of resilience-enhanced models with the DBN-based approaches may generate a higher level of predictive resilience.
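As a rough illustration of dynamic selection within an ensemble, the sketch below picks, for each new input, the model that was most accurate on the nearest validation samples. This is a generic local-accuracy heuristic and not necessarily the method used in the project; the two models, the one-dimensional feature, and the validation data are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]        # (feature value, true label); hypothetical data layout
Model = Callable[[float], int]    # a model maps a feature value to a predicted label


def select_model(models: Sequence[Model], validation: List[Sample],
                 x: float, k: int = 3) -> Model:
    """Return the model with the highest accuracy on the k validation samples nearest to x."""
    neighbours = sorted(validation, key=lambda s: abs(s[0] - x))[:k]

    def local_accuracy(model: Model) -> float:
        return sum(model(f) == y for f, y in neighbours) / len(neighbours)

    return max(models, key=local_accuracy)


def rule_based(x: float) -> int:
    """Hypothetical knowledge-based rule: positive above a fixed threshold."""
    return 1 if x > 5.0 else 0


def learned(x: float) -> int:
    """Hypothetical learned model: reliable at low values, unreliable at high ones."""
    return 1 if 2.0 < x < 8.0 else 0


# Illustrative validation set (assumed, not real data).
validation = [(1.0, 0), (2.5, 1), (3.0, 1), (4.0, 1), (8.5, 1), (9.0, 1), (9.5, 1)]
models = [rule_based, learned]

low_choice = select_model(models, validation, 3.2)    # region where the learned model is accurate
high_choice = select_model(models, validation, 9.2)   # region where the rule is accurate
print(low_choice.__name__, high_choice.__name__)
```

In this toy setting the selector relies on the learned model where it has proved accurate and falls back to the knowledge-based rule elsewhere, which is one simple way an ensemble can stay usable when a single model degrades.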
The construction of new approaches for multicriteria decision making that combine the solutions found by the conventional knowledge-based techniques and by the proposed learning methods seems a promising strategy to generate short- and long-term innovations.
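One common building block of multicriteria decision making is Pareto filtering: keeping only the candidate solutions that no other candidate beats on every criterion. The sketch below is a minimal illustration under assumed data; the plan names, the two criteria (cost and water use, both to be minimized), and the scores are hypothetical and do not come from the project.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, ...]   # one value per criterion; lower is better for all of them


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b on every criterion and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Return, in alphabetical order, the candidates that no other candidate dominates."""
    return sorted(
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
    )


# Hypothetical solutions scored as (cost, water_use); values are illustrative only.
candidates = {
    "knowledge_model_plan": (10.0, 5.0),
    "learned_model_plan": (8.0, 7.0),
    "hybrid_plan": (9.0, 4.0),
    "baseline_plan": (12.0, 8.0),
}

front = pareto_front(candidates)
print(front)
```

The remaining candidates represent genuine trade-offs between the criteria, so choosing among them still requires the preferences of a decision maker.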
An important aspect of food security is climate change, mainly as it affects water supply. Hydrological models are investigated with the aim of developing preliminary methods that combine knowledge-based and data-driven approaches. Models for critical hydrological conditions, such as droughts and floods, are also investigated in order to improve predictions of crop water stress or perishability.
AI Humanity
AI in Emerging Countries: Public Policies and the Future of Work
Mapping, understanding, and addressing the impact of AI in emerging countries.
Leaders: Glauco Arbix, João Paulo Veiga
Societies are increasingly delegating to AI systems many complex and risk-intensive decisions, such as diagnosing patients, hiring workers, granting parole, and managing financial transactions. At the same time, there is significant consensus that in the field of AI, emerging countries are lagging behind pioneering countries, in particular the USA and China.
Countries like Brazil urgently need to get closer to the best practices in AI. To that end, they must develop strategies to qualify professionals, and move forward in building a specific ecosystem and in developing public policies aimed at realizing the country’s potential. Moreover, AI, automation and rapid digitalization may reduce employment and alter the labor market; the use of biometric techniques can accentuate prejudices; and companies that operate without a foundation of values can erode the ethical and even democratic principles adopted by society.
It is necessary to examine novel questions around liability and the limits of current regulatory frameworks in dealing with disparate and unexpected impacts and in preventing algorithmic harms to society. Given AI’s broad impact, these pressing questions can only be successfully addressed from a multidisciplinary perspective.