Training the DeepSeek R1 Reasoning Model on Free Resources

Published 2025-2-26 14:40

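DeepSeek has shaken up the AI field, challenging OpenAI's dominance with a series of new advanced reasoning models. The best part? These models are free to use, without restrictions, and available to everyone. You can watch a video tutorial on fine-tuning DeepSeek below.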

In this tutorial, we fine-tune DeepSeek-R1-Distill-Llama-8B on a medical chain-of-thought dataset from Hugging Face. This distilled DeepSeek-R1 model was created by fine-tuning the Llama 3.1 8B model on data generated with DeepSeek-R1, and it shows reasoning capabilities similar to the original model.

If you are new to LLMs and fine-tuning, I strongly recommend taking the Introduction to LLMs in Python course.

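Introduction to DeepSeek R1

The Chinese AI company DeepSeek AI (深度求索) has open-sourced its first-generation reasoning models, DeepSeek-R1 and DeepSeek-R1-Zero, which rival OpenAI's o1 on reasoning tasks such as math, coding, and logic. You can visit DeepSeek's official website for more details.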

DeepSeek-R1-Zero

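DeepSeek-R1-Zero is the first open-source model trained with large-scale reinforcement learning (RL) rather than supervised fine-tuning (SFT). This approach lets the model explore chain-of-thought reasoning on its own, solve complex problems, and iteratively refine its outputs. It has drawbacks, though: it can repeat reasoning steps, produce output that is hard to read, and mix languages, all of which hurt its clarity and usability.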

DeepSeek-R1

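DeepSeek-R1 was introduced to address the shortcomings of DeepSeek-R1-Zero by adding some initial (cold-start) data before reinforcement learning, which gives it a stronger foundation for both reasoning and non-reasoning tasks. This multi-stage training lets the model reach performance comparable to OpenAI-o1 on math, code, and reasoning benchmarks, while also improving the readability and coherence of its output.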

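DeepSeek Distillation

Alongside the large language models that demand substantial compute and memory, DeepSeek has also released a series of distilled models. These more compact and efficient models keep a high level of reasoning performance. They range from 1.5B to 70B parameters. Notably, DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini on multiple benchmarks. The smaller models inherit the reasoning behavior of the larger model, which demonstrates the effectiveness of knowledge distillation.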

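Source: deepseek-ai/DeepSeek-R1

Read the blog post DeepSeek-R1: Features, o1 Comparison, Distilled Models & More to learn about its key features, development process, distilled models, access, pricing, and comparison with OpenAI o1.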

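Resources required for fine-tuning

Model	GPU	CPU	Memory	Disk	Time
DeepSeek-R1-Distill-Llama-8B	T4 x 2 (15 GB)	4 cores	32 GB	200 GB	23 minutes

What? You think that configuration is too demanding? 😱 Fine, follow along and I'll show you how to get it for free! 👇👇👇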

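Fine-tuning DeepSeek R1: a step-by-step guide

To fine-tune the DeepSeek R1 model, follow these steps:

1. Setup

For this project we use Kaggle as our cloud IDE because it offers free access to GPUs, which are often more powerful than those available in Google Colab. First, launch a new Kaggle notebook and add your Hugging Face token and your Weights & Biases token as secrets. For how to obtain the tokens, see the Q&A section at the end of this article.

You can add secrets by navigating to the Add-ons tab in the Kaggle notebook interface and selecting the Secrets option.

After setting up the secrets, install the unsloth Python package. Unsloth is an open-source framework designed to make fine-tuning large language models (LLMs) 2x faster and more memory-efficient.

Read our Unsloth Guide: Optimize and Speed Up LLM Fine-Tuning to learn about Unsloth's key features, its various functions, and how to optimize your fine-tuning workflow.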

!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

Log in to the Hugging Face CLI using the Hugging Face API token, which we retrieve securely from Kaggle Secrets.

from huggingface_hub import login
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()


hf_token = user_secrets.get_secret("HUGGINGFACE_TOKEN")
login(hf_token)

Log in to Weights & Biases (wandb) with your API key and create a new project to track the experiments and the fine-tuning progress.

import wandb


wb_token = user_secrets.get_secret("wandb")


wandb.login(key=wb_token)
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Medical COT Dataset', 
    job_type="training", 
    anonymous="allow"
)

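2. Loading the model and tokenizer

For this project we load the Unsloth version of DeepSeek-R1-Distill-Llama-8B. We also load the model with 4-bit quantization to reduce memory use and improve performance.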

from unsloth import FastLanguageModel


max_seq_length = 2048 
dtype = None 
load_in_4bit = True




model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = hf_token, 
)
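To see why 4-bit loading matters on a 15 GB T4, here is a rough back-of-the-envelope sketch. It assumes a nominal 8 billion parameters and counts the weights only; quantization metadata, activations, the KV cache, LoRA weights, and optimizer state all add to the real figure, so treat the numbers as a lower bound rather than a measurement.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# Assumption: a nominal "8B" parameter count; the exact number differs slightly.
NUM_PARAMS = 8e9

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)  # half-precision weights
int4_gib = weight_memory_gib(NUM_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: ~{fp16_gib:.1f} GiB")
print(f" 4-bit weights: ~{int4_gib:.1f} GiB")
```

Under these assumptions the 16-bit weights alone would nearly fill a single 15 GB card, while the 4-bit weights leave room for the rest of the training state.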

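3. Model inference before fine-tuning

To create a prompt style for the model, we define a system prompt with placeholders for the question and the generated response. The prompt guides the model to think step by step and give a logical, accurate response.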

prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.


### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 


### Question:
{}


### Response:
<think>{}"""




# ========================= Chinese version of the prompt =========================


# Stored under a separate name so it does not overwrite the English prompt_style above.
prompt_style_zh = """以下是一条描述任务的指令，以及为其提供更多背景信息的输入内容。请给出一个能恰当完成该请求的回复。在回答之前，仔细思考问题，并创建一个逐步的思路链，以确保回复符合逻辑且准确。


### 指令：
你是一位在临床推理、诊断和治疗计划方面拥有高级知识的医学专家。请回答以下医学问题。


### 问题：
{}


### 回复：
<think>{}"""

In this example, we insert a medical question into prompt_style, convert the result to tokens, and pass the tokens to the model to generate a response.

question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"




FastLanguageModel.for_inference(model) 
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")


outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])




# Chinese translation of the question:
# 一位 61 岁的女性，有长期在咳嗽或打喷嚏等活动时不自主漏尿但夜间无漏尿的病史，进行了妇科检查和棉签试验。基于这些发现，膀胱测压最有可能揭示她的残余尿量和逼尿肌收缩情况如何？

English result

Chinese result

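Even without fine-tuning, the model successfully generated a chain of thought and reasoned before giving the final answer. The reasoning is enclosed in <think></think> tags.

So why do we still need fine-tuning? The reasoning, while detailed, is long-winded rather than concise. In addition, the final answer is presented as bullet points, which departs from the structure and style of the dataset we want to fine-tune on.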

<think>
Okay, so I have this medical question to answer. Let me try to break it down. The patient is a 61-year-old woman with a history of involuntary urine loss during activities like coughing or sneezing, but she doesn't leak at night. She's had a gynecological exam and a Q-tip test. I need to figure out what cystometry would show regarding her residual volume and detrusor contractions.


First, I should recall what I know about urinary incontinence. Involuntary urine loss during activities like coughing or sneezing makes me think of stress urinary incontinence. Stress incontinence typically happens when the urethral sphincter isn't strong enough to resist increased abdominal pressure from activities like coughing, laughing, or sneezing. This usually affects women, especially after childbirth when the pelvic muscles and ligaments are weakened.


The Q-tip test is a common diagnostic tool for stress urinary incontinence. The test involves inserting a Q-tip catheter, which is a small balloon catheter, into the urethra. The catheter is connected to a pressure gauge. The patient is asked to cough, and the pressure reading is taken. If the pressure is above normal (like above 100 mmHg), it suggests that the urethral sphincter isn't closing properly, which is a sign of stress incontinence.


So, based on the history and the Q-tip test, the diagnosis is likely stress urinary incontinence. Now, moving on to what cystometry would show. Cystometry, also known as a filling cystometry, is a diagnostic procedure where a catheter is inserted into the bladder, and the bladder is filled with a liquid to measure how much it can hold (residual volume) and how it responds to being filled (like during a cough or sneeze). This helps in assessing the capacity and compliance of the bladder.


In a patient with stress incontinence, the bladder's capacity might be normal, but the sphincter's function is impaired. So, during the cystometry, the residual volume might be within normal limits because the bladder isn't overfilled. However, when the patient is asked to cough or perform a Valsalva maneuver, the detrusor muscle (the smooth muscle layer of the bladder) might not contract effectively, leading to an increase in intra-abdominal pressure, which might cause leakage.


Wait, but detrusor contractions are usually associated with voiding. In stress incontinence, the issue isn't with the detrusor contractions but with the sphincter's inability to prevent leakage. So, during cystometry, the detrusor contractions would be normal because they are part of the normal voiding process. However, the problem is that the sphincter doesn't close properly, leading to leakage.


So, putting it all together, the residual volume might be normal, but the detrusor contractions would be normal as well. The key finding would be the impaired sphincter function leading to incontinence, which is typically demonstrated during the Q-tip test and clinical history. Therefore, the cystometry would likely show normal residual volume and normal detrusor contractions, but the underlying issue is the sphincter's inability to prevent leakage.
</think>


Based on the provided information, the cystometry findings in this 61-year-old woman with stress urinary incontinence would likely demonstrate the following:


1. **Residual Volume**: The residual volume would be within normal limits. This is because the bladder's capacity is typically normal in cases of stress incontinence, where the primary issue lies with the sphincter function rather than the bladder's capacity.


2. **Detrusor Contractions**: The detrusor contractions would also be normal. These contractions are part of the normal voiding process and are not impaired in stress urinary incontinence. The issue is not with the detrusor muscle but with the sphincter's inability to prevent leakage.


In summary, the key findings of the cystometry would be normal residual volume and normal detrusor contractions, highlighting the sphincteric defect as the underlying cause of the incontinence.<｜end▁of▁sentence｜>
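Because the reasoning sits inside <think></think> tags, it can be separated from the final answer with plain string handling. The helper below is a minimal sketch, not part of the original notebook: the name split_reasoning is hypothetical, and the end-of-sentence marker is copied from the output shown above.

```python
def split_reasoning(text: str, eos: str = "<｜end▁of▁sentence｜>"):
    """Split generated text into (reasoning, answer).

    Returns an empty reasoning string when no closing </think> tag is found.
    """
    text = text.replace(eos, "")
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag: treat the whole text as the answer.
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Tiny stand-in string for illustration only (not real model output).
sample = "\n<think>\nStep one. Step two.\n</think>\n\nFinal answer.<｜end▁of▁sentence｜>"
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

In the notebook, the text passed in would be the part of the decoded response that follows "### Response:".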

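4. Loading and processing the dataset

We change the prompt style slightly for processing the dataset by adding a third placeholder for the complex chain-of-thought column.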

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.


### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 


### Question:
{}


### Response:
<think>
{}
</think>
{}"""




# ========================= Chinese version of the prompt =========================
# Stored under a separate name so it does not overwrite the English train_prompt_style above.
train_prompt_style_zh = """以下是一个描述任务的指令，与提供进一步上下文的输入相配对。写出一个适当完成请求的回应。在回答之前，仔细思考问题并创建一个逐步的思维链，以确保逻辑准确的回应。


### 指令：
您是医学专家，在临床推理、诊断和治疗计划方面拥有先进的知识。
请回答以下医学问题。


### 问题：
{}


### 响应：
<think>
{}
</think>
{}"""

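Write a Python function that creates a "text" column in the dataset made up of the training prompt style, with the placeholders filled in by the question, the chain of thought, and the answer.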
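The body of that function is not shown in this article, so here is a minimal sketch of what formatting_prompts_func could look like. It assumes the dataset columns are named Question, Complex_CoT, and Response, and it uses a shortened stand-in template and a hard-coded end-of-sequence string so that it runs on its own; in the notebook you would pass train_prompt_style and tokenizer.eos_token instead.

```python
# Stand-ins so the sketch is self-contained (assumptions, see the text above).
TEMPLATE = "### Question:\n{}\n\n### Response:\n<think>\n{}\n</think>\n{}"
EOS_TOKEN = "<｜end▁of▁sentence｜>"


def formatting_prompts_func(examples, template=TEMPLATE, eos_token=EOS_TOKEN):
    """Build the "text" column for a batch of examples.

    `examples` is a dict of column name -> list of values, which is what
    datasets.Dataset.map passes to the function when batched=True.
    """
    questions = examples["Question"]     # assumed column name
    cots = examples["Complex_CoT"]       # assumed column name
    responses = examples["Response"]     # assumed column name

    texts = []
    for question, cot, response in zip(questions, cots, responses):
        # The EOS token marks where generation should stop.
        texts.append(template.format(question, cot, response) + eos_token)
    return {"text": texts}


# Quick check on a hand-made batch (illustrative values only).
batch = {"Question": ["Q1"], "Complex_CoT": ["step"], "Response": ["A1"]}
result = formatting_prompts_func(batch)
print(result["text"][0])
```

Appending the end-of-sequence token matters: without it the fine-tuned model has no signal for when to stop generating.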

We load the first 500 samples of the medical chain-of-thought dataset from Hugging Face, then build the text column by mapping the formatting_prompts_func function over the dataset.

from datasets import load_dataset
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT","en", split = "train[0:500]",trust_remote_code=True)
dataset = dataset.map(formatting_prompts_func, batched = True,)
dataset["text"][0]

Dataset sample

As we can see, the text column contains the system prompt, the instruction, the chain of thought, and the answer.

"Below is an instruction that describes a task, paired with an input that provides further context. \n
Write a response that appropriately completes the request. \n
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n
### Instruction:\n
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. \n
Please answer the following medical question. \n\n
### Question:\n
A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?\n\n
### Response:\n
<think>\n
Okay, let's think about this step by step. There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her abdominal pressure like coughing or sneezing. This sounds a lot like stress urinary incontinence to me. Now, it's interesting that she doesn't have any issues at night; she isn't experiencing leakage while sleeping. This likely means her bladder's ability to hold urine is fine when she isn't under physical stress. Hmm, that's a clue that we're dealing with something related to pressure rather than a bladder muscle problem. \n\nThe fact that she underwent a Q-tip test is intriguing too. This test is usually done to assess urethral mobility. In stress incontinence, a Q-tip might move significantly, showing urethral hypermobility. This kind of movement often means there's a weakness in the support structures that should help keep the urethra closed during increases in abdominal pressure. So, that's aligning well with stress incontinence.\n\nNow, let's think about what would happen during cystometry. Since stress incontinence isn't usually about sudden bladder contractions, I wouldn't expect to see involuntary detrusor contractions during this test. Her bladder isn't spasming or anything; it's more about the support structure failing under stress. Plus, she likely empties her bladder completely because stress incontinence doesn't typically involve incomplete emptying. So, her residual volume should be pretty normal. \n\n
All in all, it seems like if they do a cystometry on her, it will likely show a normal residual volume and no involuntary contractions. Yup, I think that makes sense given her symptoms and the typical presentations of stress urinary incontinence.\n
</think>\n
Cystometry in this case of stress urinary incontinence would most likely reveal a normal post-void residual volume, as stress incontinence typically does not involve issues with bladder emptying. Additionally, since stress urinary incontinence is primarily related to physical exertion and not an overactive bladder, you would not expect to see any involuntary detrusor contractions during the test.
<｜end▁of▁sentence｜>"

5. Setting up the model

Using the target modules, we set up the model by adding low-rank adapters (LoRA) to it.

model = FastLanguageModel.get_peft_model(
    model,
    r=16,  
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,  
    bias="none",  
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,  
    loftq_config=None,
)
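To get a feel for how small the adapters are, the sketch below counts the LoRA parameters added for r=16 across the seven target modules. The layer dimensions are assumptions taken from the Llama 3.1 8B architecture (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 layers); each adapted linear layer gains two matrices of shapes (in, r) and (r, out).

```python
# Assumed Llama 3.1 8B dimensions (not stated in this article).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # 8 key/value heads x head dimension 128
NUM_LAYERS = 32
R = 16            # LoRA rank, matching r=16 above

# (in_features, out_features) of each targeted linear layer.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (in, r) and B is (r, out)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in SHAPES.values())
total = per_layer * NUM_LAYERS

print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
```

Under these assumptions the adapters hold roughly 42 million trainable parameters, around half a percent of the 8B base model, which is why the run fits on a free GPU.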

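Next, we set up the training arguments and create the trainer by supplying the model, the tokenizer, the dataset, and the other training parameters that shape the fine-tuning run.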

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported


trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
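The batch size, gradient accumulation, and step count together determine how much of the 500-sample dataset the run actually sees. The arithmetic below is a small sketch of that relationship; it assumes training runs on a single GPU, so the per-device batch size is not multiplied by a device count.

```python
# Values copied from the TrainingArguments above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 60

# Assumptions: one GPU in use, and the 500-sample slice loaded earlier.
num_gpus = 1
dataset_size = 500

# Samples that contribute to a single optimizer update.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Samples processed over the whole run, and the fraction of one epoch.
samples_seen = effective_batch * max_steps
epochs = samples_seen / dataset_size

print(f"Effective batch size: {effective_batch}")
print(f"Samples seen: {samples_seen} (~{epochs:.2f} epochs)")
```

So 60 steps cover a little under one pass over the data. For a full run, the comment in the training arguments suggests setting num_train_epochs = 1 and a warmup_ratio instead of max_steps.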

If you get the error AttributeError: _unwrapped_old_generate, update the libraries:

# Upgrade the libraries to the latest versions
pip install --upgrade unsloth transformers


# Or roll back to specific versions
pip install unsloth==x.y.z transformers==a.b.c

6. Model training

Run the following command to start training.

trainer_stats = trainer.train()

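Now wait while it trains. I'm not sure why only one GPU is being used; parallel training is probably not enabled. The script could be modified later to try that.

Training took 23 minutes. The training loss decreased gradually, which is a good sign that the model is improving.

If you run into problems with the code above, see the Fine-Tuning DeepSeek R1 (Reasoning Model) Kaggle notebook.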

7. Model inference after fine-tuning

To compare results, we ask the fine-tuned model the same question as before and see what has changed.

question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"




FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")


outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])

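This is much better and more accurate. The chain of thought is coherent, and the answer is direct and fits in a single paragraph. The fine-tuning was successful.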

<think>
Okay, so let's think about this. We have a 61-year-old woman who's been dealing with involuntary urine loss during things like coughing or sneezing, but she's not leaking at night. That suggests she might have some kind of problem with her pelvic floor muscles or maybe her bladder.


Now, she's got a gynecological exam and a Q-tip test. Let's break that down. The Q-tip test is usually used to check for urethral obstruction. If it's positive, that means there's something blocking the urethra, like a urethral stricture or something else.


Given that she's had a positive Q-tip test, it's likely there's a urethral obstruction. That would mean her urethra is narrow, maybe due to a stricture or some kind of narrowing. So, her bladder can't empty properly during activities like coughing because the urethral obstruction is making it hard.


Now, let's think about what happens when her bladder can't empty. If there's a urethral obstruction, the bladder is forced to hold more urine, increasing the residual volume. That's because her bladder doesn't empty completely. So, her residual volume is probably increased.


Also, if her bladder can't empty properly, she might have increased detrusor contractions. These contractions are usually stronger to push the urine out. So, we expect her detrusor contractions to be increased.


Putting it all together, if she has a urethral obstruction and a positive Q-tip test, we'd expect her cystometry results to show increased residual volume and increased detrusor contractions. That makes sense because of the obstruction and how her bladder is trying to compensate by contracting more.
</think>
Based on the findings of the gynecological exam and the positive Q-tip test, it is most likely that the cystometry would reveal increased residual volume and increased detrusor contractions. The positive Q-tip test indicates urethral obstruction, which would force the bladder to retain more urine, thereby increasing the residual volume. Additionally, the obstruction can lead to increased detrusor contractions as the bladder tries to compensate by contracting more to expel the urine.<｜end▁of▁sentence｜>

8. Saving the model locally

Now let's save the adapter, the full model, and the tokenizer locally so that we can use them in other projects.

new_model_local = "DeepSeek-R1-Medical-COT"
model.save_pretrained(new_model_local) 
tokenizer.save_pretrained(new_model_local)


model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit",)

9. Pushing the model to the Hugging Face Hub

We can also push the adapter, the tokenizer, and the model to the Hugging Face Hub so that the AI community can integrate this model into their own systems.

new_model_online = "skyxiaowang/DeepSeek-R1-Medical-COT"
model.push_to_hub(new_model_online)
tokenizer.push_to_hub(new_model_online)


model.push_to_hub_merged(new_model_online, tokenizer, save_method = "merged_16bit")

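Note: to push to your own namespace, the HF token you provide must have write permission.

Waiting for the upload...

OK, the upload is done. Log in to HF to check: the model is there.

The next step in your learning journey is to deploy the model to the cloud. You can follow the How to Deploy LLMs with BentoML guide, which gives a step-by-step process for deploying large language models efficiently and cost-effectively with tools such as BentoML and vLLM.

Alternatively, if you prefer to use the model locally, you can convert it to GGUF format and run it on your own machine. For that, see Fine-tuning Llama 3.2 and Using It Locally: A Step-by-Step Guide, which gives detailed instructions for local use.

Once fine-tuning is finished, remember to shut down the Kaggle session manually to save GPU resources.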

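Conclusion

The AI field is changing quickly. The open-source community is on the rise, challenging the proprietary models that have dominated AI for the past three years. Open-source large language models (LLMs) are becoming better, faster, and more efficient, which makes it easier than ever to fine-tune them with limited compute and memory. In this tutorial, we explored the DeepSeek R1 reasoning model and learned how to fine-tune its distilled version for medical question answering. A fine-tuned reasoning model not only improves performance but also opens up applications in critical fields such as medicine, emergency services, and healthcare.

In response to the launch of DeepSeek R1, OpenAI introduced two powerful tools: OpenAI's o3, a more advanced reasoning model, and OpenAI's Operator AI agent, powered by the new Computer-Using Agent (CUA) model, which can browse websites and carry out tasks autonomously. xAI released Grok 3 with deep thinking, a large model trained on 200,000 GPUs that is said to outperform all comparable open-source and closed-source models. In my own tests it was underwhelming, though: the free tier allows only two questions a day, and the paid tier is frighteningly expensive at $30/month. I checked my wallet and dutifully went back to DeepSeek R1. Free and good to use, what's not to love?

If writing the code step by step feels too time-consuming, don't worry, I have a ready-made script for you:

https://www.kaggle.com/code/kingabzpro/fine-tuning-deepseek-r1-reasoning-model

Aren't I good to you? 🐶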

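Q&A for beginners

1. How to get a Hugging Face token

Go to the Hugging Face website and log in to your account.

Click your avatar in the top-right corner and choose "Settings".

In the left-hand menu, choose "Access Tokens".

Click "New token", give the token a name, and choose the appropriate permission ("read" is usually enough, but pushing the model to the Hub as in step 9 requires "write"). Then click "Generate a token" and copy the generated token.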

2. How to get a Weights & Biases token

Go to the Weights & Biases website and log in to your account.

Click your avatar in the top-right corner and choose "Settings".

In the "API Keys" section, click "Generate" and copy the generated API key.

3. Using Kaggle

Adding secrets

Enabling the free GPU

[1] Introduction to LLMs in Python: https://www.datacamp.com/courses/introduction-to-llms-in-python

[2] Reinforcement Learning: An Introduction With Python Examples: https://www.datacamp.com/tutorial/reinforcement-learning-python-introduction

[3] Chain-of-Thought Prompting: https://www.datacamp.com/tutorial/chain-of-thought-prompting

[4] DeepSeek-R1: https://github.com/deepseek-ai/DeepSeek-R1

[5] DeepSeek-R1: Features, o1 Comparison, Distilled Models & More: https://www.datacamp.com/blog/deepseek-r1

[6] Weights & Biases (wandb) website: https://wandb.ai/home

[7] Kaggle: https://www.kaggle.com/

[8] Original article: https://www.datacamp.com/tutorial/fine-tuning-deepseek-r1-reasoning-model?utm_source=chatgpt.com

[9] Unsloth Guide: https://www.datacamp.com/tutorial/unsloth-guide-optimize-and-speed-up-llm-fine-tuning

[10] Base model on Hugging Face: https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B

[11] Kaggle usage guide: https://blog.csdn.net/weixin_42426841/article/details/143591586

[12] Medical chain-of-thought dataset: https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT?row=46

[13] Fine-Tuning DeepSeek R1 (Reasoning Model) Kaggle notebook: https://www.kaggle.com/code/kingabzpro/fine-tuning-deepseek-r1-reasoning-model

[14] How to Deploy LLMs with BentoML: https://www.datacamp.com/tutorial/deploy-llms-with-bentoml

[15] Fine-tuning Llama 3.2 and Using It Locally: A Step-by-Step Guide: https://www.datacamp.com/tutorial/fine-tuning-llama-3-2

[16] Hugging Face website: https://huggingface.co/

[17] OpenAI's O3: Features, O1 Comparison, Release Date & More: https://www.datacamp.com/blog/o3-openai

[18] OpenAI's Operator: Examples, Use Cases, Competition & More: https://www.datacamp.com/blog/operator

[19] Ready-made script: https://www.kaggle.com/code/kingabzpro/fine-tuning-deepseek-r1-reasoning-model

[20] DeepSeek official website: https://www.deepseek.com/

Reposted from AIGC前沿技术追踪; author: 喜欢学习的小仙女
