This typically refers (at least in 2023) to Transformer models.
The amount of compute required to train such a model is typically Quadratic in the Number of Parameters. [Source: someone told me this]