Problem 3

Regression models for probability of default prediction



Brief description of the challenge:

The probability of default (PD) on a bank loan payment over the course of a year is one of the decisive metrics for assessing the state of an economy. An Excel file is provided with quarterly data covering the last three decades. It contains candidate explanatory variables and the target variable (PD), the latter only up to 2020. The aim is to obtain regression models that produce estimates for different time delays.


This problem does not fit the usual forecasting setup, because the target variable for a given date can only be estimated over the course of the following year. The objective is therefore to evaluate several models, using different lags and leads for the various variables, so that each forecast is made realistically, using only the information available at the time it is made.
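The alignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are quarterly, the PD dated quarter t becomes known four quarters later, and the names `build_rows`, `x_lag` and `horizon` are hypothetical, not part of the provided Excel file or of any prescribed method. Missing PD values (for example after 2020) are represented by `None`.

```python
from typing import List, Optional, Sequence, Tuple

# Assumption: quarterly data, and the PD dated quarter t is only known at t + 4.
QUARTERS_PER_YEAR = 4

Row = Tuple[int, List[float], float]


def build_rows(
    pd_series: Sequence[Optional[float]],
    x_series: Sequence[float],
    x_lag: int,
    horizon: int = 0,
) -> List[Row]:
    """Build (origin, features, target) rows for a forecast made at quarter t.

    features = [x[t - x_lag], PD[t - 4]]  (both already observed at quarter t)
    target   = PD[t + horizon]            (the value the model should estimate)
    """
    rows: List[Row] = []
    first_origin = max(x_lag, QUARTERS_PER_YEAR)
    for t in range(first_origin, len(pd_series) - horizon):
        target = pd_series[t + horizon]
        last_known_pd = pd_series[t - QUARTERS_PER_YEAR]
        # Skip origins where the target or the lagged PD is not available.
        if target is None or last_known_pd is None:
            continue
        rows.append((t, [x_series[t - x_lag], last_known_pd], target))
    return rows


# Made-up numbers, for illustration only (not taken from the provided data).
example_pd = [0.010, 0.012, 0.011, 0.013, 0.015, 0.014, None, None]
example_x = [2.1, 2.0, 1.8, 1.9, 1.5, 1.4, 1.6, 1.7]
example_rows = build_rows(example_pd, example_x, x_lag=1, horizon=0)
```

Varying `x_lag` and `horizon` gives one candidate data set per lag and lead combination, each of which can then be passed to a regression model and compared.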

[Image: Problem 3]


Mathematical background:

Students should be able to fit linear and logistic regression models, and possibly other types of more robust models, such as neural networks (although the available data are quite limited and sparse) or decision trees. The various time series can also be analysed with classical forecasting models, assessing the presence of seasonality or autoregression.
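As a starting point for the regression models mentioned above, the following minimal sketch fits an ordinary least squares line to the logit of PD using a single predictor. This is a deliberately simplified stand-in, not a full logistic regression fitted by maximum likelihood: it is one simple way to keep predicted values strictly between 0 and 1. The function names and the single-predictor setup are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def logit(p: float) -> float:
    """Map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def inv_logit(z: float) -> float:
    """Map a real number back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit_linear(
    x: Sequence[float], pd_values: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of logit(PD) = intercept + slope * x.

    Assumes every PD value lies strictly between 0 and 1 and that x is
    not constant (otherwise the slope is undefined).
    """
    z = [logit(p) for p in pd_values]
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return intercept, slope


def predict_pd(x: Sequence[float], intercept: float, slope: float) -> List[float]:
    """Predicted PD for each predictor value, always inside (0, 1)."""
    return [inv_logit(intercept + slope * xi) for xi in x]
```

With time series data, the model would typically be fitted on the earlier quarters and evaluated on the later ones, so that the evaluation mimics a forecast made with only past information.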


Coordinator:

Ricardo Enguiça, Departamento de Matemática, Instituto Superior de Engenharia de Lisboa, Portugal.



The 10IMW is promoted by the Portuguese Network of Mathematics for Industry and Innovation, PT-MATHS-IN, and by the Spanish Network for Mathematics and Industry, math-in. It is supported by the Department of Mathematics and the Center for Mathematics of the University of Minho, through the FCT-CMAT Projects with the references UIDB/00013/2020 and UIDP/00013/2020.
