# Introduction to Large Language Models

New to language models or large language models? Check out the resources below.

| **Estimated Read Time:** 20 minutes
| **Learning objectives:**
|
| - Define language models and large language models (LLMs).
| - Define key LLM concepts, including Transformers and self-attention.
| - Describe the costs and benefits of LLMs, along with common use cases.

What is a language model?
-------------------------

A language model is a machine learning [model](https://developers.google.com/machine-learning/glossary#model) that aims to predict and generate plausible language. Autocomplete is a language model, for example.

These models work by estimating the probability of a [token](https://developers.google.com/machine-learning/glossary#token) or sequence of tokens occurring within a longer sequence of tokens. Consider the following sentence:

    When I hear rain on my roof, I _______ in my kitchen.

If you assume that a token is a word, then a language model determines the probabilities of different words or sequences of words to replace that underscore. For example, a language model might determine the following probabilities:

    cook soup           9.4%
    warm up a kettle    5.2%
    cower               3.6%
    nap                 2.5%
    relax               2.2%
    ...

A "sequence of tokens" could be an entire sentence or a series of sentences. That is, a language model could calculate the likelihood of different entire sentences or blocks of text.

Estimating the probability of what comes next in a sequence is useful for all kinds of things: generating text, translating languages, and answering questions, to name a few.
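To make this concrete, here is a minimal sketch (Python with NumPy, chosen only for illustration; the candidate phrases and their scores are invented and won't reproduce the percentages above) of the last step a language model performs: turning raw scores for candidate continuations into a probability distribution with a softmax.

```python
import numpy as np

# Candidate continuations for:
#   "When I hear rain on my roof, I _______ in my kitchen."
# The scores (logits) below are invented for illustration; a real language
# model scores every token in its vocabulary, conditioned on the text so far.
candidates = ["cook soup", "warm up a kettle", "cower", "nap", "relax"]
logits = np.array([2.1, 1.5, 1.1, 0.8, 0.7])

def softmax(scores):
    """Turn arbitrary real-valued scores into probabilities that sum to 1."""
    exp_scores = np.exp(scores - scores.max())  # subtract max for numerical stability
    return exp_scores / exp_scores.sum()

probabilities = softmax(logits)

# Print the candidates from most to least probable.
for phrase, p in sorted(zip(candidates, probabilities), key=lambda pair: -pair[1]):
    print(f"{phrase:<18} {p:.1%}")
```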
What is a large language model?
-------------------------------

Modeling human language at scale is a highly complex and resource-intensive endeavor. The path to reaching the current capabilities of language models and large language models has spanned several decades.

As models are built bigger and bigger, their complexity and efficacy increase. Early language models could predict the probability of a single word; modern large language models can predict the probability of sentences, paragraphs, or even entire documents.

The size and capability of language models have exploded over the last few years as computer memory, dataset size, and processing power increase, and as more effective techniques for modeling longer text sequences are developed.

### How large is large?

The definition is fuzzy, but "large" has been used to describe BERT (110M parameters) as well as PaLM 2 (up to 340B parameters).

[Parameters](https://developers.google.com/machine-learning/glossary#parameter) are the [weights](https://developers.google.com/machine-learning/glossary#weight) the model learned during training, used to predict the next token in the sequence. "Large" can refer either to the number of parameters in the model or, sometimes, to the number of words in the dataset.

### Transformers

A key development in language modeling was the introduction in 2017 of Transformers, an architecture designed around the idea of [attention](https://developers.google.com/machine-learning/glossary#attention). This made it possible to process longer sequences by focusing on the most important part of the input, solving memory issues encountered in earlier models.

Transformers are the state-of-the-art architecture for a wide variety of language model applications, such as translators.

If the input is **"I am a good dog."**, a Transformer-based translator transforms that input into the output **"Je suis un bon chien."**, which is the same sentence translated into French.

Full Transformers consist of an [encoder](https://developers.google.com/machine-learning/glossary#encoder) and a [decoder](https://developers.google.com/machine-learning/glossary#decoder). An encoder converts input text into an intermediate representation, and a decoder converts that intermediate representation into useful text.

### Self-attention

Transformers rely heavily on a concept called self-attention. The self part of self-attention refers to the "egocentric" focus of each token in a corpus. Effectively, on behalf of each token of input, self-attention asks, "How much does every other token of input matter to **me**?" To simplify matters, let's assume that each token is a word and the complete context is a single sentence. Consider the following sentence:

> The animal didn't cross the street because it was too tired.

There are 11 words in the preceding sentence, so each of the 11 words is paying attention to the other ten, wondering how much each of those ten words matters to it. For example, notice that the sentence contains the pronoun **it**. Pronouns are often ambiguous. The pronoun **it** typically refers to a recent noun, but in the example sentence, which recent noun does **it** refer to: the animal or the street?

The self-attention mechanism determines the relevance of each nearby word to the pronoun **it**.
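The sketch below (Python with NumPy, using random toy embeddings rather than a trained model) illustrates the arithmetic behind scaled dot-product self-attention for the sentence above: every word scores every other word, and the row of weights for **it** shows how much it attends to each neighbor. With random vectors the weights are meaningless; in a trained Transformer, learned projections make them reflect relevance, so that **it** attends most to the word it actually refers to.

```python
import numpy as np

# Toy self-attention: each word attends to every word in the sentence.
# The embeddings and projections here are random, so the weights carry no
# real meaning; a trained model learns projections that make them informative.
words = "The animal didn't cross the street because it was too tired".split()
d = 8                                    # embedding size (arbitrary for this sketch)
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(words), d))

# In a real Transformer, queries, keys, and values come from learned linear
# projections of the token embeddings; here the projections are random.
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = embeddings @ W_q, embeddings @ W_k, embeddings @ W_v

# Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
attended = weights @ V                   # each row: context-aware representation of a word

# How much does "it" attend to each other word?
it_index = words.index("it")
for word, w in zip(words, weights[it_index]):
    print(f"{word:<10} {w:.2f}")
```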
What are some use cases for LLMs?
---------------------------------

LLMs are highly effective at the task they were built for, which is generating the most plausible text in response to an input. They are even beginning to show strong performance on other tasks, such as summarization, question answering, and text classification. These are called [emergent abilities](https://research.google/pubs/pub52065/). LLMs can even solve some math problems and write code (though it's advisable to check their work).

LLMs are excellent at mimicking human speech patterns. Among other things, they're great at combining information with different styles and tones.

However, LLMs can be components of models that do more than just generate text. Recent LLMs have been used to build sentiment detectors and toxicity classifiers, and to generate image captions.

### LLM Considerations

Models this large are not without their drawbacks.

The largest LLMs are expensive. They can take months to train and, as a result, consume lots of resources. They can usually be repurposed for other tasks, though, which is a valuable silver lining.

Training models with upwards of [a trillion parameters](https://cloud.google.com/blog/products/ai-machine-learning/training-a-recommender-model-of-100-trillions-parameters-on-google-cloud) creates engineering challenges. Special infrastructure and programming techniques are required to coordinate the flow of data to the chips and back again.

There are ways to mitigate the costs of these large models. Two approaches are [offline inference](https://developers.google.com/machine-learning/glossary#offline-inference) and [distillation](https://ai.googleblog.com/2021/12/training-machine-learning-models-more.html).

Bias can be a problem in very large models and should be considered in training and deployment.

Because these models are trained on human language, they can introduce numerous potential ethical issues, including the misuse of language and bias in race, gender, religion, and more.

It should be clear that as these models continue to get bigger and perform better, there is a continuing need to be diligent about understanding and mitigating their drawbacks. Learn more about Google's approach to [responsible AI](https://ai.google/responsibility/responsible-ai-practices/).

Learn more about LLMs
---------------------

Interested in a more in-depth introduction to large language models? Check out the new [Large language models](/machine-learning/crash-course/llm) module in [Machine Learning Crash Course](/machine-learning/crash-course).
