借助LLM实现模型选择和试验自动化-51CTO.COM

译者 | 布加迪

审校 | 重楼

大语言模型（LLM）已成为一种工具，从回答问题到生成任务列表，它们在许多方面简化了我们的工作。如今个人和企业已经使用LLM来帮助完成工作。

代码生成和评估最近已经成为许多商业产品提供的重要功能，以帮助开发人员处理代码。LLM还可以进一步用于处理数据科学工作，尤其是模型选择和试验。

本文将探讨如何将自动化用于模型选择和试验。

借助LLM实现模型选择和试验自动化

我们将设置用于模型训练的数据集和用于自动化的代码。在这个例子中，我们将使用来自Kaggle的信用汽车欺诈数据集。以下是我为预处理过程所做的准备。

import pandas as pd

df = pd.read_csv('fraud_data.csv')
df = df.drop(['trans_date_trans_time', 'merchant', 'dob', 'trans_num', 'merch_lat', 'merch_long'], axis =1)

df = df.dropna().reset_index(drop = True)
df.to_csv('fraud_data.csv', index = False)1.
2.
3.
4.
5.
6.
7.

我们将只使用一些数据集，丢弃所有缺失的数据。这不是最优的过程，但我们关注的是模型选择和试验。

接下来，我们将为我们的项目准备一个文件夹，将所有相关文件放在那里。首先，我们将为环境创建requirements.txt文件。你可以用下面的软件包来填充它们。

openai
pandas
scikit-learn
pyyaml1.
2.
3.
4.

接下来，我们将为所有相关的元数据使用YAML文件。这将包括OpenAI API密钥、要测试的模型、评估度量指标和数据集的位置。

llm_api_key: "YOUR-OPENAI-API-KEY"
default_models:
  - LogisticRegression
  - DecisionTreeClassifier
  - RandomForestClassifier
metrics: ["accuracy", "precision", "recall", "f1_score"]
dataset_path: "fraud_data.csv"1.
2.
3.
4.
5.
6.
7.

然后，我们导入这个过程中使用的软件包。我们将依靠Scikit-Learn用于建模过程，并使用OpenAI的GPT-4作为LLM。

import pandas as pd
import yaml
import ast
import re
import sklearn
from openai import OpenAI
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.

此外，我们将设置辅助（helper）函数和信息来帮助该过程。从数据集加载到数据预处理，配置加载器在如下的函数中。

model_mapping = {
    "LogisticRegression": LogisticRegression,
    "DecisionTreeClassifier": DecisionTreeClassifier,
    "RandomForestClassifier": RandomForestClassifier
}

def load_config(config_path='config.yaml'):
    with open(config_path, 'r') as file:
        config = yaml.safe_load(file)
    return config

def load_data(dataset_path):
    return pd.read_csv(dataset_path)

def preprocess_data(df):
    label_encoders = {}
    for column in df.select_dtypes(include=['object']).columns:
        le = LabelEncoder()
        df[column] = le.fit_transform(df[column])
        label_encoders[column] = le
    return df, label_encoders1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.

在同一个文件中，我们将LLM设置为扮演机器学习角色的专家。我们将使用下面的代码来启动它。

def call_llm(prompt, api_key):
    client = OpenAI(api_key=api_key)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are an expert in machine learning and able to evaluate the model well."},
            {"role": "user", "content": prompt}
        ]
    )
    return response.choices[0].message.content.strip()1.
2.
3.
4.
5.
6.
7.
8.
9.
10.

你可以将LLM模型更改为所需的模型，比如来自Hugging Face的开源模型，但我们建议暂且坚持使用OpenAI。

我将在下面的代码中准备一个函数来清理LLM结果。这确保了输出可以用于模型选择和试验步骤的后续过程。

def clean_hyperparameter_suggestion(suggestion):
    pattern = r'\{.*?\}'
    match = re.search(pattern, suggestion, re.DOTALL)
    if match:
        cleaned_suggestion = match.group(0)
        return cleaned_suggestion
    else:
        print("Could not find a dictionary in the hyperparameter suggestion.")
        return None

def extract_model_name(llm_response, available_models):
    for model in available_models:
        pattern = r'\b' + re.escape(model) + r'\b'
        if re.search(pattern, llm_response, re.IGNORECASE):
            return model
    return None

def validate_hyperparameters(model_class, hyperparameters):
    valid_params = model_class().get_params()
    invalid_params = []
    for param, value in hyperparameters.items():
        if param not in valid_params:
            invalid_params.append(param)
        else:
            if param == 'max_features' and value == 'auto':
                print(f"Invalid value for parameter '{param}': '{value}'")
                invalid_params.append(param)
    if invalid_params:
        print(f"Invalid hyperparameters for {model_class.__name__}: {invalid_params}")
        return False
    return True

def correct_hyperparameters(hyperparameters, model_name):
    corrected = False
    if model_name == "RandomForestClassifier":
        if 'max_features' in hyperparameters and hyperparameters['max_features'] == 'auto':
            print("Correcting 'max_features' from 'auto' to 'sqrt' for RandomForestClassifier.")
            hyperparameters['max_features'] = 'sqrt'
            corrected = True
    return hyperparameters, corrected1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.

然后，我们将需要该函数来启动模型和评估训练过程。下面的代码将用于通过接受分割器数据集、我们要映射的模型名称以及超参数来训练模型。结果将是度量指标和模型对象。

def train_and_evaluate(X_train, X_test, y_train, y_test, model_name, hyperparameters=None):
    if model_name not in model_mapping:
        print(f"Valid model names are: {list(model_mapping.keys())}")
        return None, None

    model_class = model_mapping.get(model_name)
    try:
        if hyperparameters:
            hyperparameters, corrected = correct_hyperparameters(hyperparameters, model_name)
            if not validate_hyperparameters(model_class, hyperparameters):
                return None, None
            model = model_class(**hyperparameters)
        else:
            model = model_class()
    except Exception as e:
        print(f"Error instantiating model with hyperparameters: {e}")
        return None, None
    try:
        model.fit(X_train, y_train)
    except Exception as e:
        print(f"Error during model fitting: {e}")
        return None, None


    y_pred = model.predict(X_test)
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average='weighted', zero_division=0),
        "recall": recall_score(y_test, y_pred, average='weighted', zero_division=0),
        "f1_score": f1_score(y_test, y_pred, average='weighted', zero_division=0)
    }
    return metrics, model1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.

准备就绪后，我们就可以设置自动化过程了。有几个步骤我们可以实现自动化，其中包括：

1.训练和评估所有模型

2. LLM选择最佳模型

3. 检查最佳模型的超参数调优

4. 如果LLM建议，自动运行超参数调优

def run_llm_based_model_selection_experiment(df, config):
    #Model Training
    X = df.drop("is_fraud", axis=1)
    y = df["is_fraud"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    available_models = config['default_models']
    model_performance = {}

    for model_name in available_models:
        print(f"Training model: {model_name}")
        metrics, _ = train_and_evaluate(X_train, X_test, y_train, y_test, model_name)
        model_performance[model_name] = metrics
        print(f"Model: {model_name} | Metrics: {metrics}")

    #LLM selecting the best model
    sklearn_version = sklearn.__version__
    prompt = (
        f"I have trained the following models with these metrics: {model_performance}. "
        "Which model should I select based on the best performance?"
    )
    best_model_response = call_llm(prompt, config['llm_api_key'])
    print(f"LLM response for best model selection:\n{best_model_response}")

    best_model = extract_model_name(best_model_response, available_models)
    if not best_model:
        print("Error: Could not extract a valid model name from LLM response.")
        return
    print(f"LLM selected the best model: {best_model}")

    #Check for hyperparameter tuning
    prompt_tuning = (
        f"The selected model is {best_model}. Can you suggest hyperparameters for better performance? "
        "Please provide them in Python dictionary format, like {'max_depth': 5, 'min_samples_split': 4}. "
        f"Ensure that all suggested hyperparameters are valid for scikit-learn version {sklearn_version}, "
        "and avoid using deprecated or invalid values such as 'max_features': 'auto'. "
        "Don't provide any explanation or return in any other format."
    )
    tuning_suggestion = call_llm(prompt_tuning, config['llm_api_key'])
    print(f"Hyperparameter tuning suggestion received:\n{tuning_suggestion}")

    cleaned_suggestion = clean_hyperparameter_suggestion(tuning_suggestion)
    if cleaned_suggestion is None:
        suggested_params = None
    else:
        try:
            suggested_params = ast.literal_eval(cleaned_suggestion)
            if not isinstance(suggested_params, dict):
                print("Hyperparameter suggestion is not a valid dictionary.")
                suggested_params = None
        except (ValueError, SyntaxError) as e:
            print(f"Error parsing hyperparameter suggestion: {e}")
            suggested_params = None

    #Automatically run hyperparameter tuning if suggested
    if suggested_params:
        print(f"Running {best_model} with suggested hyperparameters: {suggested_params}")
        tuned_metrics, _ = train_and_evaluate(
            X_train, X_test, y_train, y_test, best_model, hyperparameters=suggested_params
        )
        print(f"Metrics after tuning: {tuned_metrics}")
    else:
        print("No valid hyperparameters were provided for tuning.")1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.
46.
47.
48.
49.
50.
51.
52.
53.
54.
55.
56.
57.
58.
59.
60.
61.
62.
63.

在上面的代码中，我指定了LLM如何根据试验评估我们的每个模型。我们使用以下提示根据模型的性能来选择要使用的模型。

prompt = (
        f"I have trained the following models with these metrics: {model_performance}. "
        "Which model should I select based on the best performance?")1.
2.
3.

你始终可以更改提示，以实现模型选择的不同规则。

一旦选择了最佳模型，我将使用以下提示来建议应该使用哪些超参数用于后续过程。我还指定了Scikit-Learn版本，因为超参数因版本的不同而有变化。

prompt_tuning = (
        f"The selected model is {best_model}. Can you suggest hyperparameters for better performance? "
        "Please provide them in Python dictionary format, like {'max_depth': 5, 'min_samples_split': 4}. "
        f"Ensure that all suggested hyperparameters are valid for scikit-learn version {sklearn_version}, "
        "and avoid using deprecated or invalid values such as 'max_features': 'auto'. "
        "Don't provide any explanation or return in any other format.")1.
2.
3.
4.
5.
6.

你可以以任何想要的方式更改提示，比如通过更大胆地尝试调优超参数，或添加另一种技术。

我把上面的所有代码放在一个名为automated_model_llm.py的文件中。最后，添加以下代码以运行整个过程。

def main():
    config = load_config()
    df = load_data(config['dataset_path'])
    df, _ = preprocess_data(df)
    run_llm_based_model_selection_experiment(df, config)


if __name__ == "__main__":
    main()1.
2.
3.
4.
5.
6.
7.
8.
9.

一旦一切准备就绪，你就可以运行以下代码来执行代码。

python automated_model_llm.py1.

输出：

LLM selected the best model: RandomForestClassifier
Hyperparameter tuning suggestion received:
{
'n_estimators': 100,
'max_depth': None,
'min_samples_split': 2,
'min_samples_leaf': 1,
'max_features': 'sqrt',
'bootstrap': True
}
Running RandomForestClassifier with suggested hyperparameters: {'n_estimators': 100, 'max_depth': None, 'min_samples_split': 2, 'min_samples_leaf': 1, 'max_features': 'sqrt', 'bootstrap': True}
Metrics after tuning: {'accuracy': 0.9730041532071989, 'precision': 0.9722907483489197, 'recall': 0.9730041532071989, 'f1_score': 0.9724045530119824}1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.

这是我试验得到的示例输出。它可能和你的不一样。你可以设置提示和生成参数，以获得更加多变或严格的LLM输出。然而，如果你正确构建了代码的结构，可以将LLM运用于模型选择和试验自动化。

结论

LLM已经应用于许多使用场景，包括代码生成。通过运用LLM（比如OpenAI GPT模型），我们就很容易委派LLM处理模型选择和试验这项任务，只要我们正确地构建输出的结构。在本例中，我们使用样本数据集对模型进行试验，让LLM选择和试验以改进模型。

原文标题：Model Selection and Experimentation Automation with LLMs，作者：Cornellius Yudha Wijaya