It’s my understanding that this is one of the ways DeepSeek really shines: instead of offering one general, one-size-fits-all model and trying to turn LLMs into all-purpose GenAI, they use a collection of smaller models that can be hot-swapped for different tasks in different contexts. The kind of summary you want for a news article is vastly different from the kind you want for an academic paper, and recognizing when to use a different model for a different use case is very powerful.
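To make the idea concrete, here's a minimal sketch of that kind of task-based routing, where the request is dispatched to a smaller specialised model depending on context. This is just an illustration of the pattern, not DeepSeek's actual architecture; the model names, registry, and summarizer functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelSpec:
    name: str                          # hypothetical model identifier
    summarize: Callable[[str], str]    # stand-in for a call to a real model

# Hypothetical specialised summarizers; a real system would invoke
# separate fine-tuned models here rather than simple string functions.
def news_summary(text: str) -> str:
    return "Lead + key facts: " + text[:80]

def paper_summary(text: str) -> str:
    return "Contributions + method + results: " + text[:80]

# Registry mapping a task/context to the model best suited for it.
REGISTRY = {
    "news": ModelSpec("news-summarizer-small", news_summary),
    "academic": ModelSpec("paper-summarizer-small", paper_summary),
}

def route(doc_type: str, text: str) -> str:
    """Pick the model that matches the document type and hand off the request."""
    spec = REGISTRY.get(doc_type, REGISTRY["news"])  # fall back to a default model
    return spec.summarize(text)

if __name__ == "__main__":
    print(route("news", "Markets rallied today after the central bank held rates..."))
    print(route("academic", "We propose a novel attention mechanism that reduces..."))
```

The point of the pattern is that the routing decision is cheap relative to running one large general model on every request, and each small model can be tuned for its niche.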