GNN Fraud Detection Pipeline
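All environments require additional Conda packages, which can be installed with either the conda/environments/all_cuda-125_arch-x86_64.yaml
or the conda/environments/examples_cuda-125_arch-x86_64.yaml
environment file. For more information, refer to the Requirements section.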
Environment | Supported | Notes |
---|---|---|
Conda | ✔ | |
Morpheus Docker Container | ✔ | |
Morpheus Release Container | ✔ | |
Dev Container | ✔ | |
Before running the GNN fraud detection pipeline, install the additional requirements into your Conda environment:
conda env update --solver=libmamba \
-n ${CONDA_DEFAULT_ENV} \
--file ./conda/environments/examples_cuda-125_arch-x86_64.yaml
Use Morpheus to run the GNN fraud detection pipeline with the transaction data. The pipeline is configured in run.py
and exposes several command-line options:
python examples/gnn_fraud_detection_pipeline/run.py --help
Usage: run.py [OPTIONS]
Options:
--num_threads INTEGER RANGE Number of internal pipeline threads to use.
[x>=1]
--pipeline_batch_size INTEGER RANGE
Internal batch size for the pipeline. Can be
much larger than the model batch size. Also
used for Kafka consumers. [x>=1]
--model_max_batch_size INTEGER RANGE
Max batch size to use for the model. [x>=1]
--model_fea_length INTEGER RANGE
Features length to use for the model.
[x>=1]
--input_file PATH Input data filepath. [required]
--training_file PATH Training data filepath. [required]
--model_dir PATH Trained model directory path [required]
--output_file TEXT The path to the file where the inference
output will be saved.
--help Show this message and exit.
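The options listed above can also be assembled programmatically, for example when launching the pipeline from a test harness. The following is a minimal sketch, not part of the example itself: `build_run_command` is a hypothetical helper, and the option values are illustrative ones taken from the paths and batch size that appear elsewhere in this document.

```python
import shlex
import sys

# Path to the example script, relative to the root of the Morpheus repo.
RUN_SCRIPT = "examples/gnn_fraud_detection_pipeline/run.py"


def build_run_command(**options):
    """Return an argv list for run.py (hypothetical helper).

    Each keyword argument becomes a `--name value` pair; options set to
    None are skipped so that run.py falls back to its own behavior.
    """
    argv = [sys.executable, RUN_SCRIPT]
    for name, value in options.items():
        if value is None:
            continue
        argv += [f"--{name}", str(value)]
    return argv


# Illustrative values only; adjust them to your own data and model.
command = build_run_command(
    input_file="examples/gnn_fraud_detection_pipeline/validation.csv",
    training_file="examples/gnn_fraud_detection_pipeline/training.csv",
    model_dir="examples/gnn_fraud_detection_pipeline/model",
    output_file="output.csv",
    pipeline_batch_size=1024,
)
print(shlex.join(command))
```

To actually launch the pipeline, the resulting list could be passed to `subprocess.run(command, check=True)` from the root of the Morpheus repo.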
To launch the configured Morpheus pipeline, run the following command:
python examples/gnn_fraud_detection_pipeline/run.py
====Registering Pipeline====
====Building Pipeline====
====Building Pipeline Complete!====
====Registering Pipeline Complete!====
====Starting Pipeline====
====Pipeline Started====
====Building Segment: linear_segment_0====
Added source: <from-file-0; FileSourceStage(filename=/examples/gnn_fraud_detection_pipeline/validation.csv, iterative=False, file_type=FileTypes.Auto, repeat=1, filter_null=False, filter_null_columns=None, parser_kwargs=None)>
└─> morpheus.MessageMeta
Added stage: <deserialize-1; DeserializeStage(ensure_sliceable_index=True, task_type=None, task_payload=None)>
└─ morpheus.MessageMeta -> morpheus.ControlMessage
Added stage: <fraud-graph-construction-2; FraudGraphConstructionStage(training_file=/examples/gnn_fraud_detection_pipeline/training.csv)>
└─ morpheus.ControlMessage -> morpheus.ControlMessage
Added stage: <monitor-3; MonitorStage(description=Graph construction rate, smoothing=0.05, unit=messages, delayed_start=False, determine_count_fn=None, log_level=LogLevels.INFO)>
└─ morpheus.ControlMessage -> morpheus.ControlMessage
Added stage: <gnn-fraud-sage-4; GraphSAGEStage(model_dir=/examples/gnn_fraud_detection_pipeline/model, batch_size=100, record_id=index, target_node=transaction)>
└─ morpheus.ControlMessage -> morpheus.ControlMessage
Added stage: <monitor-5; MonitorStage(description=Inference rate, smoothing=0.05, unit=messages, delayed_start=False, determine_count_fn=None, log_level=LogLevels.INFO)>
└─ morpheus.ControlMessage -> morpheus.ControlMessage
Added stage: <gnn-fraud-classification-6; ClassificationStage(model_xgb_file=/examples/gnn_fraud_detection_pipeline/model/xgb.pt)>
└─ morpheus.ControlMessage -> morpheus.ControlMessage
Added stage: <monitor-7; MonitorStage(description=Add classification rate, smoothing=0.05, unit=messages, delayed_start=False, determine_count_fn=None, log_level=LogLevels.INFO)>
└─ morpheus.ControlMessage -> morpheus.ControlMessage
Added stage: <serialize-8; SerializeStage(include=None, exclude=None, fixed_columns=True)>
└─ morpheus.ControlMessage -> morpheus.MessageMeta
Added stage: <monitor-9; MonitorStage(description=Serialize rate, smoothing=0.05, unit=messages, delayed_start=False, determine_count_fn=None, log_level=LogLevels.INFO)>
└─ morpheus.MessageMeta -> morpheus.MessageMeta
Added stage: <to-file-10; WriteToFileStage(filename=output.csv, overwrite=True, file_type=FileTypes.Auto, include_index_col=True, flush=False)>
└─ morpheus.MessageMeta -> morpheus.MessageMeta
====Building Segment Complete!====
Graph construction rate[Complete]: 265 messages [00:00, 1016.18 messages/s]
Inference rate[Complete]: 265 messages [00:00, 545.08 messages/s]
Add classification rate[Complete]: 265 messages [00:00, 492.11 messages/s]
Serialize rate[Complete]: 265 messages [00:00, 480.77 messages/s]
====Pipeline Complete====
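The monitor lines at the end of the log report a message count and throughput per stage. As a rough sketch, they can be parsed to compare stage rates. This assumes the line format shown in the sample output above, which may differ between Morpheus versions; `parse_monitor_lines` is a hypothetical helper, not a Morpheus API.

```python
import re

# Matches lines such as:
#   Inference rate[Complete]: 265 messages [00:00, 545.08 messages/s]
# The pattern is inferred from the sample log above (an assumption).
MONITOR_LINE = re.compile(
    r"^(?P<description>.+?)\[Complete\]: "
    r"(?P<count>\d+) messages \[(?P<elapsed>[\d:]+), "
    r"(?P<rate>[\d.]+) messages/s\]$"
)


def parse_monitor_lines(log_text):
    """Map each monitor description to (message_count, messages_per_second)."""
    results = {}
    for line in log_text.splitlines():
        match = MONITOR_LINE.match(line.strip())
        if match:
            results[match["description"]] = (
                int(match["count"]),
                float(match["rate"]),
            )
    return results


sample_log = """\
Graph construction rate[Complete]: 265 messages [00:00, 1016.18 messages/s]
Inference rate[Complete]: 265 messages [00:00, 545.08 messages/s]
Add classification rate[Complete]: 265 messages [00:00, 492.11 messages/s]
Serialize rate[Complete]: 265 messages [00:00, 480.77 messages/s]
====Pipeline Complete====
"""

rates = parse_monitor_lines(sample_log)
# The monitor with the lowest reported throughput.
slowest = min(rates, key=lambda name: rates[name][1])
print(slowest, rates[slowest])
```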
CLI Example
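The example above illustrates how to build a custom Morpheus pipeline with the Python API. Alternatively, the Morpheus command line can achieve the same result. To do so, the examples
directory must be available on the PYTHONPATH
and each of the custom stages must be registered as a plugin.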
Note: Because the gnn_fraud_detection_pipeline
module is visible to Python, the plugin can be specified by its module name rather than by the more verbose file path.
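The reason a module name is enough is that directories listed in PYTHONPATH are added to `sys.path` at interpreter startup, so any package inside them becomes importable by name. The sketch below demonstrates that mechanism in isolation. It uses a temporary directory and a hypothetical package called `demo_plugin_pkg` as stand-ins for the examples directory and the real plugin module, and it does not involve Morpheus itself.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def is_importable(module_name):
    """Return True if Python can locate the module on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


with tempfile.TemporaryDirectory() as examples_dir:
    # Create a stand-in package (hypothetical name) inside the directory.
    package_dir = Path(examples_dir) / "demo_plugin_pkg"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("# hypothetical plugin package\n")

    visible_before = is_importable("demo_plugin_pkg")

    # Equivalent in effect to starting Python with PYTHONPATH=<examples_dir>.
    sys.path.insert(0, examples_dir)
    importlib.invalidate_caches()
    visible_after = is_importable("demo_plugin_pkg")

    # Clean up so the temporary directory does not linger on sys.path.
    sys.path.remove(examples_dir)

print(visible_before, visible_after)
```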
From the root of the Morpheus repo, run:
PYTHONPATH="examples" \
morpheus --log_level INFO \
--plugin "gnn_fraud_detection_pipeline" \
run --pipeline_batch_size 1024 --model_max_batch_size 32 --edge_buffer_size 4 \
pipeline-other --model_fea_length 70 --label=probs \
from-file --filename examples/gnn_fraud_detection_pipeline/validation.csv --filter_null False \
deserialize \
fraud-graph-construction --training_file examples/gnn_fraud_detection_pipeline/training.csv \
monitor --description "Graph construction rate" \
gnn-fraud-sage --model_dir examples/gnn_fraud_detection_pipeline/model/ \
monitor --description "Inference rate" \
gnn-fraud-classification --model_xgb_file examples/gnn_fraud_detection_pipeline/model/xgb.pt \
monitor --description "Add classification rate" \
serialize \
to-file --filename "output.csv" --overwrite
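After the run finishes, a quick sanity check on the written file can confirm that the pipeline produced output. The following is a minimal sketch using only the standard library. The columns of the real output.csv are not described in this document, so the demonstration runs on a small stand-in file whose column names are hypothetical; `summarize_csv` is likewise a hypothetical helper.

```python
import csv
import tempfile
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count


# Stand-in file for demonstration; the column names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "output.csv"
    demo_path.write_text("index,score\n0,0.1\n1,0.9\n")
    columns, rows = summarize_csv(demo_path)

print(columns, rows)
```

Against a real run, calling `summarize_csv("output.csv")` from the directory where the pipeline was launched would report the actual columns and row count.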