This typically refers (at least in 2023) to Transformer models.

The amount of compute required to train such a model is typically Quadratic in the Number of Parameters. [Source: someone told me this]