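Python Interoperability (PyCall.jl)

Python has one of the richest ecosystems for scientific computing and machine learning. PyCall.jl lets Julia import Python modules, call Python functions, and pass Julia functions to Python as callbacks, so the two ecosystems can complement each other. Calling Julia from a Python program is covered in section 9.

1. Installing and configuring PyCall.jl

1.1 Installation

using Pkg
Pkg.add("PyCall")

# Or, in Pkg mode:
# ]add PyCall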

1.2 Configuring the Python environment

# Show which Python version PyCall is using
using PyCall
println(pyversion)  # e.g. v"3.11.0"

# Select a specific Python: set the path, rebuild PyCall, then restart Julia
# ENV["PYTHON"] = "/usr/bin/python3"
# Pkg.build("PyCall")

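1.3 Managing the Python environment with Conda.jl

using Conda

# List installed packages
Conda.list()

# Install Python packages
Conda.add("numpy")
Conda.add("matplotlib")
Conda.add("scikit-learn")

# Install from a specific channel
Conda.add("opencv", channel="conda-forge")

# Install into a separate environment
# (a Symbol names an environment; a String is treated as a directory path)
Conda.add("numpy", :myenv)

Configuration	Python source	Use case
Default	System `python3` on Linux; a private Conda.jl Miniconda on macOS and Windows	Simple use with no extra setup
`ENV["PYTHON"] = "/path/to/python"`	The interpreter at that path	You need a specific Python environment
`ENV["PYTHON"] = ""`	Private Conda.jl Miniconda	Package versions managed from Julia, independent of the system Python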
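
To check which of these configurations is actually in effect, PyCall records its build settings as module-level constants. The following is a minimal sketch, assuming PyCall has been built successfully; the printed values depend on your machine.

```julia
using PyCall

# Constants written by PyCall's build step
println("Python executable: ", PyCall.pyprogramname)
println("Python version:    ", PyCall.pyversion)
println("libpython:         ", PyCall.libpython)
println("Private Conda:     ", PyCall.conda)  # true when PyCall uses the Conda.jl Python
```

If the executable is not the one you expected, set ENV["PYTHON"], run Pkg.build("PyCall"), and restart Julia.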

2. Basic usage: importing Python modules

2.1 pyimport

using PyCall

# Import Python modules
math = pyimport("math")
os = pyimport("os")
json = pyimport("json")

# Call functions
println(math.sin(3.14159))  # ≈ 0 (about 2.65e-6)
println(math.sqrt(2))       # 1.4142135623730951
println(os.getcwd())

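2.2 The @pyimport macro

# Older macro form of pyimport
@pyimport numpy as np
@pyimport matplotlib.pyplot as plt

# Usage
arr = np.array([1, 2, 3, 4, 5])
println(np.mean(arr))  # 3.0

💡 Tip: @pyimport is the older macro form of pyimport and is deprecated in current PyCall releases. On Julia 1.0 and later, prefer np = pyimport("numpy") and bind the result to a const global.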
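
One caveat with const globals: inside a precompiled package, a module handle created at top level would be baked into the precompiled image as a stale pointer. The PyCall README recommends a PyNULL() placeholder that is filled in __init__. Below is a minimal sketch of that pattern; the module name MathBridge and the helper hypot_py are made up for illustration.

```julia
module MathBridge

using PyCall

# Placeholder that is filled in when the module is loaded
const pymath = PyNULL()

function __init__()
    # Runs at load time, so no Python pointer is stored in a precompiled image
    copy!(pymath, pyimport("math"))
end

# Hypothetical helper: hypotenuse computed by Python's math.hypot
hypot_py(a, b) = pymath.hypot(a, b)

end # module

println(MathBridge.hypot_py(3.0, 4.0))  # 5.0
```

In a plain script or at the REPL, const math = pyimport("math") is enough; the placeholder only matters for code that gets precompiled.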

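2.3 Importing submodules

# Import the submodule by its dotted name
# (attribute access such as sklearn.ensemble only works once the submodule has been loaded)
ensemble = pyimport("sklearn.ensemble")
rf = ensemble.RandomForestClassifier(n_estimators=100)

# Or in one step
RandomForestClassifier = pyimport("sklearn.ensemble").RandomForestClassifier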

3. Automatic type conversion

3.1 Scalar types

Python type	Julia type	Direction
`int`	`Int64`	Both ways
`float`	`Float64`	Both ways
`str`	`String`	Both ways
`bool`	`Bool`	Both ways
`bytes`	`Vector{UInt8}`	Both ways
`None`	`nothing`	Both ways
`complex`	`Complex{Float64}`	Both ways

# Julia → Python
py_val = PyObject(42)      # Python int
py_float = PyObject(3.14)  # Python float

# Python → Julia
math = pyimport("math")
jl_val = convert(Int, math.factorial(10))  # 3628800 (the call already returns a Julia Int)
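
A quick way to see the table in action is to evaluate a few Python literals with the py"..." string macro and inspect the Julia types that come back. This sketch assumes a 64-bit Julia, where Int is Int64.

```julia
using PyCall

# py"..." evaluates a Python expression and converts the result to a Julia value
samples = (py"42", py"3.5", py"'text'", py"True", py"None", py"1 + 2j")

for v in samples
    println(repr(v), " :: ", typeof(v))
end
```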

3.2 Container types

# Python list → Julia Vector
# py"..." converts its result automatically; the `o` suffix keeps a raw PyObject
py_list = py"[1, 2, 3, 'hello']"o
jl_vec = convert(PyAny, py_list)   # Any[1, 2, 3, "hello"]

# Julia Vector → Python list
jl_arr = [1, 2, 3]
py_list2 = pycall(pybuiltin("list"), PyObject, jl_arr)

# Python dict → Julia Dict
py_dict = py"{'a': 1, 'b': 2}"o
jl_dict = convert(Dict{String,Int}, py_dict)  # Dict("a" => 1, "b" => 2)

# Julia Dict → Python dict
jl_data = Dict("x" => 10, "y" => 20)
py_obj = PyObject(jl_data)
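
The conversions above copy the container. When the Python object should stay the single source of truth, PyCall's PyDict and PyVector types wrap it without copying. A small sketch, with illustrative variable names:

```julia
using PyCall

# Keep raw Python objects (`o` suffix), then wrap them without copying
py_d = py"{'a': 1, 'b': 2}"o
d = PyDict{String,Int}(py_d)
d["c"] = 3                          # writes into the Python dict
println(length(py_d))               # 3

py_l = py"[10, 20, 30]"o
v = PyVector{Int}(py_l)
push!(v, 40)                        # appends to the Python list
println(v[1], " ", length(py_l))    # 10 4
```

The wrappers use Julia's 1-based indexing, and every access crosses into Python, so they suit occasional access rather than tight loops.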

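3.3 Suppressing automatic conversion

# By default PyCall converts return values to Julia types
np = pyimport("numpy")
arr = np.array([1, 2, 3])
typeof(arr)  # Vector{Int64}, already a Julia array

# Request a PyObject return type to keep the Python object
py_arr = pycall(np.array, PyObject, [1, 2, 3])
typeof(py_arr)  # PyObject

# Convert manually when needed
jl_arr = convert(PyAny, py_arr)  # back to a Julia array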
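
PyCall offers several equivalent spellings for keeping a result as a Python object. The sketch below uses the built-in sorted so that it does not depend on NumPy.

```julia
using PyCall

sorted = pybuiltin("sorted")

a = sorted((3, 1, 2))                      # converted automatically to a Julia vector
b = pycall(sorted, PyObject, (3, 1, 2))    # kept as a Python list
c = @pycall sorted((3, 1, 2))::PyObject    # macro form of the same call
d = py"sorted((3, 1, 2))"o                 # `o` suffix: py"..." returns a PyObject

println(typeof(a))
println(typeof(b), " ", typeof(c), " ", typeof(d))
```

Keeping a PyObject is useful when the value is only passed on to another Python call, since it avoids converting in both directions.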

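4. Calling Python functions

4.1 Basic calls

# pycall lets you state the return type explicitly
builtins = pyimport("builtins")
result = pycall(builtins.len, Int, [1, 2, 3, 4, 5])  # 5

# Direct call syntax; the return value is converted automatically
# (str is a built-in type, not a module, so look it up with pybuiltin)
py_str = pybuiltin("str").upper("hello")  # "HELLO"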

4.2 Keyword arguments

json = pyimport("json")

# Python: json.dumps(data, indent=2, ensure_ascii=False)
data = Dict("name" => "张三", "age" => 30)
result = json.dumps(data, indent=2, ensure_ascii=false)
println(result)
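
Keyword arguments can also carry Julia functions, which PyCall wraps as Python callables. The sketch below passes Julia's length as the key of Python's built-in sorted; the sample words are arbitrary.

```julia
using PyCall

sorted = pybuiltin("sorted")

# Julia's `length` is wrapped as a Python callable and used as the sort key
words = ["pear", "fig", "banana"]
by_length = sorted(words, key=length, reverse=true)
println(join(by_length, ", "))  # banana, pear, fig
```

Each invocation of the key function crosses the language boundary, so this is convenient for callbacks but not a fast path.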

4.3 Handling Python exceptions

try
    pyimport("nonexistent_module")
catch e
    if e isa PyCall.PyError
        println("Python error: ", e.val)
        # e.val is the Python exception object; e.T is its type
    else
        rethrow()
    end
end
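
To react to one specific Python exception class, test e.val with pyisinstance. The following sketch wraps Python's int() in a hypothetical helper, parse_int_py, that returns nothing on a ValueError and lets every other error propagate.

```julia
using PyCall

# Hypothetical helper: parse an integer with Python's int(), mapping ValueError to `nothing`
function parse_int_py(s::AbstractString)
    try
        return pybuiltin("int")(s)
    catch e
        # Only swallow Python's ValueError; anything else propagates
        if e isa PyCall.PyError && pyisinstance(e.val, pybuiltin("ValueError"))
            return nothing
        end
        rethrow()
    end
end

println(parse_int_py("42"))   # 42
println(parse_int_py("abc"))  # nothing
```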

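5. NumPy array interoperability

5.1 Zero-copy conversion

using PyCall

np = pyimport("numpy")

# Julia → NumPy: PyObject(array) wraps the Julia memory without copying.
# (np.array(jl_arr) would copy, and its result is converted back to a Julia Array.)
jl_arr = rand(1000, 1000)
np_arr = PyObject(jl_arr)

# NumPy → Julia: keep the result as a PyObject, then wrap it in a PyArray (no copy)
np_data = pycall(np.random.randn, PyObject, 500, 500)
jl_data = PyArray(np_data)
# jl_data and np_data share the same memory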

5.2 The PyArray interface

np = pyimport("numpy")

# Create a NumPy array and keep it as a PyObject
np_arr = pycall(np.zeros, PyObject, (3, 4), dtype=np.float64)

# Operate on it as a Julia array (no copy)
jl_view = PyArray(np_arr)
jl_view[1, 1] = 42.0  # writes through to np_arr

# Verify
println(np_arr)  # element [0, 0] is now 42.0

5.3 dtype correspondence

NumPy dtype	Julia type
`float64`	`Float64`
`float32`	`Float32`
`int64`	`Int64`
`int32`	`Int32`
`complex128`	`ComplexF64`
`bool`	`Bool`
`uint8`	`UInt8`

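⚠️ Note: NumPy arrays are row-major (C order) by default, while Julia arrays are column-major (Fortran order). PyCall preserves the shape and element indices across the boundary, so a Julia matrix arrives in NumPy as a Fortran-ordered array. Keep the memory layout in mind when passing matrices to code that expects C order.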
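
The layout difference shows up in the strides NumPy reports. The sketch below passes the same 2×3 Julia matrix twice: once as is, and once through PyCall's PyReverseDims, which presents the same memory as a row-major array with the dimensions reversed. The stride values assume 8-byte Float64 elements.

```julia
using PyCall

A = [1.0 2.0 3.0; 4.0 5.0 6.0]      # 2×3 Julia matrix, column-major

as_is = PyObject(A)                 # NumPy array over the same memory
println(as_is.shape)                # (2, 3)
println(as_is.strides)              # (8, 16): Fortran order

reversed = PyReverseDims(A)         # same data presented as row-major
println(reversed.shape)             # (3, 2)
println(reversed.strides)           # (16, 8): C order

# A[i, j] in Julia (1-based) is reversed[j-1, i-1] in NumPy (0-based)
println(get(reversed, (2, 0)))      # 3.0, which is A[1, 3]
```

PyReverseDims is handy for Python libraries that require C-contiguous input, as long as the reversed axis order is what the callee expects.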

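6. Instantiating Python classes

6.1 Creating Python objects

# Look up a Python class
collections = pyimport("collections")
Counter = collections.Counter

# Create an instance. Counter subclasses dict, so a plain call would be
# converted to a Julia Dict; request a PyObject to keep the Python object.
counter = pycall(Counter, PyObject, "abracadabra")
println(counter)  # PyObject Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})

# Look up items and call methods
println(get(counter, "a"))       # 5
println(counter.most_common(3))  # Julia vector of tuples: ("a", 5), ("b", 2), ("r", 2)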

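6.2 Using existing Python classes

# Python classes can be defined from Julia with the @pydef macro,
# but more often you use classes that Python already provides.

datetime = pyimport("datetime")
# Request a PyObject; by default PyCall converts datetime objects to Julia's Dates.DateTime
dt = pycall(datetime.datetime, PyObject, 2024, 1, 15, 10, 30, 0)
println(dt.year)   # 2024
println(dt.month)  # 1

# Call a method
formatted = dt.strftime("%Y-%m-%d %H:%M")
println(formatted)  # 2024-01-15 10:30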

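6.3 Subclassing Python classes

# PyCall's @pydef macro can define a Python class, with Python base classes, from Julia.
# When you only need Julia-side behavior around an existing Python object,
# a thin wrapper struct is often more practical:
using PyCall

struct PythonWrapper
    obj::PyObject
end

# Display the wrapper using Python's repr() of the wrapped object
function Base.show(io::IO, w::PythonWrapper)
    print(io, py"repr($(w.obj))")
end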
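
For cases where a real Python subclass is required, for example because a Python API expects a class rather than an instance, @pydef can be used as sketched below. The class name FallbackEncoder is hypothetical; it subclasses the standard library's json.JSONEncoder and overrides default, which Python calls for objects the base encoder cannot serialize.

```julia
using PyCall

json = pyimport("json")

# Hypothetical subclass of Python's json.JSONEncoder, defined from Julia
@pydef mutable struct FallbackEncoder <: json.JSONEncoder
    # Called by Python for objects the base encoder cannot serialize;
    # `obj` arrives converted to a Julia value
    function default(self, obj)
        return string(obj)
    end
end

# Complex numbers are not JSON-serializable, so the encoder falls back to default()
encoded = json.dumps(Dict("z" => 1 + 2im), cls=FallbackEncoder)
println(encoded)
```

Methods defined this way are ordinary Julia functions, so each call from Python pays the same conversion overhead as any other callback.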

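7. PythonPlot.jl: Matplotlib bindings

PythonPlot.jl is built on PythonCall.jl rather than PyCall.jl; its PyCall-based counterpart with the same plotting interface is PyPlot.jl.

7.1 Basic usage

using PythonPlot

# Create a figure
fig, ax = subplots()
x = range(0, 2π, length=100)
ax.plot(x, sin.(x), label="sin(x)")
ax.plot(x, cos.(x), label="cos(x)")
ax.legend()
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Trigonometric functions")
fig.savefig("trig.png", dpi=150)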

7.2 Subplot layouts

using PythonPlot

fig, axes = subplots(2, 2, figsize=(10, 8))
# axes is a Python object here, so it is indexed from 0 as in Python

# Top left
axes[0, 0].plot(rand(50))
axes[0, 0].set_title("Random line plot")

# Top right
axes[0, 1].hist(randn(1000), bins=30)
axes[0, 1].set_title("Histogram of normal samples")

# Bottom left
axes[1, 0].scatter(rand(50), rand(50), c=rand(50), cmap="viridis")
axes[1, 0].set_title("Scatter plot")

# Bottom right
data = [randn(100) for _ in 1:4]
axes[1, 1].boxplot(data)
axes[1, 1].set_title("Box plot")

fig.tight_layout()
fig.savefig("subplots.png")

7.3 3D plots

using PythonPlot

fig = figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection="3d")

x = range(-3, 3, length=50)
y = range(-3, 3, length=50)
# Julia has no built-in meshgrid; build the coordinate grids with comprehensions
X = [xi for yi in y, xi in x]
Y = [yi for yi in y, xi in x]
Z = sin.(sqrt.(X.^2 .+ Y.^2))

ax.plot_surface(X, Y, Z, cmap="coolwarm", alpha=0.8)
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("Z")
fig.savefig("3d_surface.png")

8. Practical scenarios

8.1 Calling scikit-learn

using PyCall

# Import submodules by their dotted names
datasets = pyimport("sklearn.datasets")
model_selection = pyimport("sklearn.model_selection")

# Load the dataset; the NumPy arrays are converted to Julia arrays
X, y = datasets.load_iris(return_X_y=true)

# Create the model
RandomForestClassifier = pyimport("sklearn.ensemble").RandomForestClassifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)

# Split the data with Python's train_test_split
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train
clf.fit(X_train, y_train)

# Evaluate on the held-out split
accuracy = clf.score(X_test, y_test)
println("Accuracy: $accuracy")

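8.2 Calling pandas

using PyCall

pd = pyimport("pandas")

# Create a DataFrame
df = pd.DataFrame(Dict(
    "name" => ["Alice", "Bob", "Charlie"],
    "age" => [25, 30, 35],
    "city" => ["Beijing", "Shanghai", "Guangzhou"]
))

# Work with the data
println(df.describe())
println(df.query("age > 28"))  # row filter; Julia's .> does not broadcast over pandas objects

# Read a CSV file
# df = pd.read_csv("data.csv")

# Convert to a Julia DataFrame, column by column
using DataFrames
cols = df.to_dict("list")  # Python dict of lists, converted to a Julia Dict
jl_df = DataFrame([Symbol(k) => v for (k, v) in cols])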

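9. JuliaCall: calling Julia from Python

JuliaCall is the Python-side package of PythonCall.jl and works independently of PyCall.jl. PyCall's own counterpart for calling Julia from Python is pyjulia (pip install julia).

9.1 Installation

pip install juliacall

9.2 Using Julia from Python

# Python code
from juliacall import Main as jl

# Call a Julia function
result = jl.sqrt(2.0)
print(result)  # 1.4142135623730951

# Load a Julia package
jl.seval("using LinearAlgebra")
A = jl.rand(3, 3)
eigenvalues = jl.eigvals(A)
print(eigenvalues)

# Pass a NumPy array
import numpy as np
arr = np.array([1.0, 2.0, 3.0])
result = jl.sum(arr)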

9.3 Preparing a Julia interface

# mymodule.jl
module MyModule

using Statistics  # provides mean and std

export process_data, compute_statistics

# Accept any real-valued vector: a NumPy array arrives as an array wrapper, not a Vector{Float64}
function process_data(data::AbstractVector{<:Real})
    return data .- mean(data)
end

function compute_statistics(data::AbstractVector{<:Real})
    m = mean(data)
    s = std(data)
    return Dict("mean" => m, "std" => s, "min" => minimum(data), "max" => maximum(data))
end

end

# Use it from Python
from juliacall import Main as jl
jl.include("mymodule.jl")
jl.seval("using .MyModule")  # bring the exported names into Main

import numpy as np
data = np.random.randn(1000)
result = jl.compute_statistics(data)
print(result)

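10. Performance comparison and selection guide

10.1 Call overhead

The figures below are rough orders of magnitude; actual costs depend on the machine, the Python version, and the function being called.

Operation	PyCall overhead	Native Julia	Notes
Scalar function call	~1 μs	~1 ns	Each call goes through the Python interpreter and argument conversion
Array passing (zero-copy)	~0.1 μs	0	PyArray wraps the existing memory
Array passing (copy)	~10 μs/MB	-	Grows with array size
Module loading	~100 ms	~10 ms	The first pyimport of a module is slow

10.2 When to use PyCall

Scenario	Recommendation
A mature Python library already exists (e.g. scikit-learn, TensorFlow)	✅ Use PyCall
An equivalent Julia library exists	❌ Use the native Julia library
Frequent calls to small functions	❌ Rewrite in Julia where possible
Bulk data processing	✅ Use PyArray zero-copy
Python visualization (matplotlib) is needed	✅ Use PyPlot.jl (PyCall-based) or PythonPlot.jl (PythonCall-based)
Production deployment	⚠️ Weigh the added dependency complexity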

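10.3 Performance tips

# ❌ Avoid: calling Python repeatedly inside a loop
for i in 1:10000
    pyfunc(data[i])  # every call pays the Python call overhead
end

# ✅ Prefer: one batched call
pyfunc(data)  # a single call processes the whole array

# ✅ Prefer: import each module once and keep the handle in a const global
# (this caches the lookup; it does not precompile anything on the Python side)
const np = pyimport("numpy")
const pd = pyimport("pandas")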
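
The difference between per-element and batched calls can be measured directly. The sketch below is a rough timing harness, assuming NumPy is installed; the names pymath, npmod, loop_sum, and batch_sum are illustrative, and the timings vary by machine, so no expected numbers are shown.

```julia
using PyCall

const pymath = pyimport("math")
const npmod = pyimport("numpy")

data = rand(10_000)

# One Python call per element: pays the call and conversion overhead 10,000 times
loop_sum() = sum(pymath.sin(x) for x in data)

# One Python call for the whole array: the loop runs inside NumPy
batch_sum() = sum(npmod.sin(data))

loop_sum(); batch_sum()  # warm up (compile) before timing

t_loop = @elapsed loop_sum()
t_batch = @elapsed batch_sum()
println("per-element: $(t_loop) s, batched: $(t_batch) s")
```

Both functions compute the same sum, so the timing gap reflects call overhead rather than the arithmetic itself.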

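Common problems and pitfalls

Problem	Cause	Solution
`PyError: No module named 'xxx'`	The package is not installed in the Python that PyCall uses	`Conda.add("xxx")` for the Conda.jl Python; otherwise install it with that interpreter's pip
Wrong array dimension order	C/Fortran order mismatch	Use `np.asfortranarray`, `permutedims`, or `PyReverseDims`
GIL contention	Python's global interpreter lock	Call Python from a single thread
Memory leak	Python objects still referenced from Julia	Drop the Julia references and let the garbage collector run (`GC.gc()`)
String encoding problems	UTF-8/Latin-1 confusion	Make sure the Python side uses UTF-8

# Inspect the Python environment
using PyCall
pyimport("sys").path  # Python module search path
pyimport("numpy").__version__  # NumPy version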

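Use cases

Scenario 1: Machine learning model inference

The model is trained in Python with TensorFlow, and at inference time Julia loads it through PyCall. Data preprocessing runs in Julia and the model itself runs in Python, which suits low-latency inference services.

Scenario 2: Data science workflow

Julia handles the performance-critical numerical computation and data processing, and PyCall calls pandas for pivoting and visualization, so each language is used where it is strongest.

Scenario 3: Incremental migration

A team plans to move from Python to Julia. With PyCall the migration can proceed module by module: Julia code keeps calling the Python modules that have not been ported yet, which allows a smooth transition.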

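Summary

Topic	Key points
Installation	`PyCall.jl` with the Conda.jl Python or a system Python
Importing	`pyimport("module")`; `@pyimport module as name` is the older, deprecated form
Type conversion	Scalars convert automatically; use PyArray for zero-copy array access
NumPy	`PyObject(jl_arr)` and `PyArray` avoid copies; mind row-major vs column-major layout
PythonPlot	Julia interface to matplotlib (built on PythonCall.jl)
JuliaCall	`pip install juliacall`; calling Julia from Python
Performance	Batched calls beat per-element calls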