diff --git a/docs/api/paddle/CUDAPlace_cn.rst b/docs/api/paddle/CUDAPlace_cn.rst
index 077a4f1a24d..d140de5e6a9 100644
--- a/docs/api/paddle/CUDAPlace_cn.rst
+++ b/docs/api/paddle/CUDAPlace_cn.rst
@@ -19,7 +19,7 @@ CUDAPlace
参数
::::::::::::
- - **id** (int,可选) - GPU 的设备 ID。如果为 ``None``,则默认会使用 id 为 0 的设备。默认值为 ``None``。
+ - **id** (int) - GPU 的设备 ID。
代码示例
::::::::::::
diff --git a/docs/api/paddle/audio/Overview_cn.rst b/docs/api/paddle/audio/Overview_cn.rst
new file mode 100644
index 00000000000..81e66a650ba
--- /dev/null
+++ b/docs/api/paddle/audio/Overview_cn.rst
@@ -0,0 +1,72 @@
+.. _cn_overview_audio:
+
+paddle.audio
+---------------------
+
+
+paddle.audio 目录下包含飞桨在语音领域的相关 API。具体如下:
+
+- :ref:`音频特征相关 API <about_features>`
+- :ref:`音频处理基础函数相关 API <about_functional>`
+- :ref:`音频 I/O 相关 API <about_backends>`
+- :ref:`音频数据集相关 API <about_datasets>`
+
+.. _about_features:
+
+音频特征相关 API
+::::::::::::::::::::
+
+.. csv-table::
+ :header: "API 名称", "API 功能"
+ :widths: 10, 30
+
+ " :ref:`LogMelSpectrogram <cn_api_audio_features_LogMelSpectrogram>` ", "计算语音特征 LogMelSpectrogram"
+ " :ref:`MelSpectrogram <cn_api_audio_features_MelSpectrogram>` ", "计算语音特征 MelSpectrogram"
+ " :ref:`MFCC <cn_api_audio_features_MFCC>` ", "计算语音特征 MFCC"
+ " :ref:`Spectrogram <cn_api_audio_features_Spectrogram>` ", "计算语音特征 Spectrogram"
+
+.. _about_functional:
+
+音频处理基础函数相关 API
+::::::::::::::::::::::::::::
+
+.. csv-table::
+ :header: "API 名称", "API 功能"
+ :widths: 10, 30
+
+ " :ref:`compute_fbank_matrix <cn_api_audio_functional_compute_fbank_matrix>` ", "计算 fbank 矩阵"
+ " :ref:`create_dct <cn_api_audio_functional_create_dct>` ", "计算离散余弦变换矩阵"
+ " :ref:`fft_frequencies <cn_api_audio_functional_fft_frequencies>` ", "计算离散傅里叶采样频率"
+ " :ref:`hz_to_mel <cn_api_audio_functional_hz_to_mel>` ", "转换 hz 频率为 mel 频率"
+ " :ref:`mel_to_hz <cn_api_audio_functional_mel_to_hz>` ", "转换 mel 频率为 hz 频率"
+ " :ref:`mel_frequencies <cn_api_audio_functional_mel_frequencies>` ", "计算 mel 频率"
+ " :ref:`power_to_db <cn_api_audio_functional_power_to_db>` ", "转换能量谱为分贝"
+ " :ref:`get_window <cn_api_audio_functional_get_window>` ", "得到各种窗函数"
+
+.. _about_backends:
+
+音频 I/O 相关 API
+::::::::::::::::::::
+
+.. csv-table::
+ :header: "API 名称", "API 功能"
+ :widths: 10, 30
+
+ " :ref:`get_current_backend <cn_api_audio_backends_get_current_backend>` ", "获取当前的语音 I/O 后端"
+ " :ref:`list_available_backends <cn_api_audio_backends_list_available_backends>` ", "获取可设置的语音 I/O 后端"
+ " :ref:`set_backend <cn_api_audio_backends_set_backend>` ", "设置语音 I/O 后端"
+ " :ref:`load <cn_api_audio_load>` ", "载入音频"
+ " :ref:`info <cn_api_audio_info>` ", "查询音频信息"
+ " :ref:`save <cn_api_audio_save>` ", "保存音频"
+
+.. _about_datasets:
+
+音频数据集相关 API
+::::::::::::::::::::
+
+.. csv-table::
+ :header: "API 名称", "API 功能"
+ :widths: 10, 30
+
+ " :ref:`TESS <cn_api_audio_datasets_TESS>` ", "TESS 数据集"
+ " :ref:`ESC50 <cn_api_audio_datasets_ESC50>` ", "ESC50 数据集"
diff --git a/docs/api/paddle/audio/backends/get_current_backend_cn.rst b/docs/api/paddle/audio/backends/get_current_backend_cn.rst
new file mode 100644
index 00000000000..0cadbca12cd
--- /dev/null
+++ b/docs/api/paddle/audio/backends/get_current_backend_cn.rst
@@ -0,0 +1,21 @@
+.. _cn_api_audio_backends_get_current_backend:
+
+get_current_backend
+-------------------------------
+
+.. py:function:: paddle.audio.backends.get_current_backend()
+
+获取当前处理语音 I/O 的后端名称。
+
+参数
+::::::::::::
+
+无
+
+返回
+:::::::::
+
+``str``,语音 I/O 的后端名称。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.backends.get_current_backend
diff --git a/docs/api/paddle/audio/backends/list_available_backends_cn.rst b/docs/api/paddle/audio/backends/list_available_backends_cn.rst
new file mode 100644
index 00000000000..9155138a80f
--- /dev/null
+++ b/docs/api/paddle/audio/backends/list_available_backends_cn.rst
@@ -0,0 +1,21 @@
+.. _cn_api_audio_backends_list_available_backends:
+
+list_available_backends
+-------------------------------
+
+.. py:function:: paddle.audio.backends.list_available_backends()
+
+获取可用的音频 I/O 后端。
+
+参数
+::::::::::::
+
+无
+
+返回
+:::::::::
+
+``List[str]``,可用的音频 I/O 后端集合。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.backends.list_available_backends
diff --git a/docs/api/paddle/audio/backends/set_backend_cn.rst b/docs/api/paddle/audio/backends/set_backend_cn.rst
new file mode 100644
index 00000000000..776b6f1197d
--- /dev/null
+++ b/docs/api/paddle/audio/backends/set_backend_cn.rst
@@ -0,0 +1,22 @@
+.. _cn_api_audio_backends_set_backend:
+
+set_backend
+-------------------------------
+
+.. py:function:: paddle.audio.backends.set_backend(backend_name: str)
+
+设置处理语音 I/O 的后端。
+
+参数
+::::::::::::
+
+ - **backend_name** (str) - 语音 I/O 后端名称,现支持 ``'wave_backend'`` ,如果安装了 paddleaudio >=1.0.2,则也支持 ``'soundfile'`` 。
+
+返回
+:::::::::
+无
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.backends.set_backend
diff --git a/docs/api/paddle/audio/datasets/ESC50_cn.rst b/docs/api/paddle/audio/datasets/ESC50_cn.rst
new file mode 100644
index 00000000000..c9d40d2c6cd
--- /dev/null
+++ b/docs/api/paddle/audio/datasets/ESC50_cn.rst
@@ -0,0 +1,27 @@
+.. _cn_api_audio_datasets_ESC50:
+
+ESC50
+-------------------------------
+
+.. py:class:: paddle.audio.datasets.ESC50(mode: str = 'train', split: int = 1, feat_type: str = 'raw', archive=None, **kwargs)
+
+
+`ESC50 `_ 数据集的实现。
+
+参数
+:::::::::
+
+ - **mode** (str,可选) - ``'train'`` 或 ``'dev'`` 模式两者之一,默认值为 ``'train'``。
+ - **split** (int,可选) - 指定用作 dev 的折编号,默认是 1。
+ - **feat_type** (str,可选) - 指定从音频提取的特征类型,支持 'raw'、'mfcc'、'spectrogram'、'melspectrogram'、'logmelspectrogram',其中 'raw' 为原始语音,默认是 'raw'。
+ - **archive** (dict,可选) - 指定数据集的下载链接和 md5 值,默认是 None,即使用类中设置的默认 archive。
+
+返回
+:::::::::
+
+:ref:`cn_api_io_cn_Dataset`,ESC50 数据集实例。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.datasets.ESC50
diff --git a/docs/api/paddle/audio/datasets/TESS_cn.rst b/docs/api/paddle/audio/datasets/TESS_cn.rst
new file mode 100644
index 00000000000..7a29ef4bac0
--- /dev/null
+++ b/docs/api/paddle/audio/datasets/TESS_cn.rst
@@ -0,0 +1,28 @@
+.. _cn_api_audio_datasets_TESS:
+
+TESS
+-------------------------------
+
+.. py:class:: paddle.audio.datasets.TESS(mode: str = 'train', n_folds = 5, split = 1, feat_type = 'raw', archive=None, **kwargs)
+
+
+`TESS `_ 数据集的实现。
+
+参数
+:::::::::
+
+ - **mode** (str,可选) - ``'train'`` 或 ``'dev'`` 模式两者之一,默认值为 ``'train'``。
+ - **n_folds** (int,可选) - 指定将数据集划分的折数,其中 1 折作为 dev,其余作为 train,默认是 5。
+ - **split** (int,可选) - 指定用作 dev 的折编号,默认是 1。
+ - **feat_type** (str,可选) - 指定从音频提取的特征类型,支持 'raw'、'mfcc'、'spectrogram'、'melspectrogram'、'logmelspectrogram',其中 'raw' 为原始语音,默认是 'raw'。
+ - **archive** (dict,可选) - 指定数据集的下载链接和 md5 值,默认是 None,即使用类中设置的默认 archive。
+
+返回
+:::::::::
+
+:ref:`cn_api_io_cn_Dataset`,TESS 数据集实例。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.datasets.TESS
diff --git a/docs/api/paddle/audio/features/LogMelSpectrogram_cn.rst b/docs/api/paddle/audio/features/LogMelSpectrogram_cn.rst
new file mode 100644
index 00000000000..b73998c57bc
--- /dev/null
+++ b/docs/api/paddle/audio/features/LogMelSpectrogram_cn.rst
@@ -0,0 +1,40 @@
+.. _cn_api_audio_features_LogMelSpectrogram:
+
+LogMelSpectrogram
+-------------------------------
+
+.. py:class:: paddle.audio.features.LogMelSpectrogram(sr=22050, n_fft=2048, hop_length=512, win_length=None, window='hann', power=2.0, center=True, pad_mode='reflect', n_mels=64, f_min=50.0, f_max=None, htk=False, norm='slaney', ref_value=1.0, amin=1e-10, top_db=None, dtype='float32')
+
+计算给定信号的 log-mel 谱。
+
+参数
+::::::::::::
+
+ - **sr** (int,可选) - 采样率,默认 22050。
+ - **n_fft** (int,可选) - 离散傅里叶变换中频率窗大小,默认 2048。
+ - **hop_length** (int,可选) - 帧移,默认 512。
+ - **win_length** (int,可选) - 短时 FFT 的窗长,默认为 None。
+ - **window** (str,可选) - 窗函数名,默认是 'hann'。
+ - **power** (float,可选) - 幅度谱的指数,默认是 2.0。
+ - **center** (bool,可选) - 是否对输入信号进行填充。如果是 True,那么第 t 帧以 t*hop_length 为中心;如果为 False,则第 t 帧以 t*hop_length 开始。默认是 True。
+ - **pad_mode** (str,可选) - 如果 center 是 True,选择填充的方式,默认值是 'reflect'。
+ - **n_mels** (int,可选) - mel bins 的数目,默认是 64。
+ - **f_min** (float,可选) - 最小频率(hz),默认 50.0。
+ - **f_max** (float,可选) - 最大频率(hz),默认为 None。
+ - **htk** (bool,可选) - 计算 fbank 矩阵时是否使用 HTK 公式缩放,默认是 False。
+ - **norm** (Union[str,float],可选) - 计算 fbank 矩阵时正则化的种类,默认是 'slaney',也可以设置 norm=0.5,使用 p-norm 正则化。
+ - **ref_value** (float,可选) - 参照值,如果小于 1.0,信号的 db 会被提升,相反 db 会下降,默认值为 1.0。
+ - **amin** (float,可选) - 输入的幅值的最小值,默认是 1e-10。
+ - **top_db** (float,可选) - log-mel 谱的最大值(db),默认是 None。
+ - **dtype** (str,可选) - 输入和窗的数据类型,默认是 'float32'。
+
+
+返回
+:::::::::
+
+计算 ``LogMelSpectrogram`` 的可调用对象。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.features.LogMelSpectrogram
diff --git a/docs/api/paddle/audio/features/MFCC_cn.rst b/docs/api/paddle/audio/features/MFCC_cn.rst
new file mode 100644
index 00000000000..2c7ef2ad4a2
--- /dev/null
+++ b/docs/api/paddle/audio/features/MFCC_cn.rst
@@ -0,0 +1,40 @@
+.. _cn_api_audio_features_MFCC:
+
+MFCC
+-------------------------------
+
+.. py:class:: paddle.audio.features.MFCC(sr=22050, n_mfcc=40, n_fft=2048, hop_length=512, win_length=None, window='hann', power=2.0, center=True, pad_mode='reflect', n_mels=64, f_min=50.0, f_max=None, htk=False, norm='slaney', ref_value=1.0, amin=1e-10, top_db=None, dtype='float32')
+
+计算给定信号的 MFCC。
+
+参数
+::::::::::::
+
+ - **sr** (int,可选) - 采样率,默认 22050。
+ - **n_mfcc** (int,可选) - mfcc 的维度,默认 40。
+ - **n_fft** (int,可选) - 离散傅里叶变换中频率窗大小,默认 2048。
+ - **hop_length** (int,可选) - 帧移,默认 512。
+ - **win_length** (int,可选) - 短时 FFT 的窗长,默认为 None。
+ - **window** (str,可选) - 窗函数名,默认是 'hann'。
+ - **power** (float,可选) - 幅度谱的指数,默认是 2.0。
+ - **center** (bool,可选) - 是否对输入信号进行填充。如果是 True,那么第 t 帧以 t*hop_length 为中心;如果为 False,则第 t 帧以 t*hop_length 开始。默认是 True。
+ - **pad_mode** (str,可选) - 如果 center 是 True,选择填充的方式,默认值是 'reflect'。
+ - **n_mels** (int,可选) - mel bins 的数目,默认是 64。
+ - **f_min** (float,可选) - 最小频率(hz),默认 50.0。
+ - **f_max** (float,可选) - 最大频率(hz),默认为 None。
+ - **htk** (bool,可选) - 计算 fbank 矩阵时是否使用 HTK 公式缩放,默认是 False。
+ - **norm** (Union[str, float],可选) - 计算 fbank 矩阵时正则化的种类,默认是 'slaney',也可以设置 norm=0.5,使用 p-norm 正则化。
+ - **ref_value** (float,可选) - 参照值,如果小于 1.0,信号的 db 会被提升,相反 db 会下降,默认值为 1.0。
+ - **amin** (float,可选) - 输入的幅值的最小值,默认是 1e-10。
+ - **top_db** (float,可选) - log-mel 谱的最大值(db),默认是 None。
+ - **dtype** (str,可选) - 输入和窗的数据类型,默认是 'float32'。
+
+返回
+:::::::::
+
+计算 ``MFCC`` 的可调用对象。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.features.MFCC
diff --git a/docs/api/paddle/audio/features/MelSpectrogram_cn.rst b/docs/api/paddle/audio/features/MelSpectrogram_cn.rst
new file mode 100644
index 00000000000..c25c73e43a2
--- /dev/null
+++ b/docs/api/paddle/audio/features/MelSpectrogram_cn.rst
@@ -0,0 +1,37 @@
+.. _cn_api_audio_features_MelSpectrogram:
+
+MelSpectrogram
+-------------------------------
+
+.. py:class:: paddle.audio.features.MelSpectrogram(sr=22050, n_fft=2048, hop_length=512, win_length=None, window='hann', power=2.0, center=True, pad_mode='reflect', n_mels=64, f_min=50.0, f_max=None, htk=False, norm='slaney', dtype='float32')
+
+求得给定信号的 Mel 谱。
+
+参数
+::::::::::::
+
+ - **sr** (int,可选) - 采样率,默认 22050。
+ - **n_fft** (int,可选) - 离散傅里叶变换中频率窗大小,默认 2048。
+ - **hop_length** (int,可选) - 帧移,默认 512。
+ - **win_length** (int,可选) - 短时 FFT 的窗长,默认为 None。
+ - **window** (str,可选) - 窗函数名,默认是 'hann'。
+ - **power** (float,可选) - 幅度谱的指数,默认是 2.0。
+ - **center** (bool,可选) - 是否对输入信号进行填充。如果是 True,那么第 t 帧以 t*hop_length 为中心;如果为 False,则第 t 帧以 t*hop_length 开始。默认是 True。
+ - **pad_mode** (str,可选) - 如果 center 是 True,选择填充的方式,默认值是 'reflect'。
+ - **n_mels** (int,可选) - mel bins 的数目,默认是 64。
+ - **f_min** (float,可选) - 最小频率(hz),默认 50.0。
+ - **f_max** (float,可选) - 最大频率(hz),默认为 None。
+ - **htk** (bool,可选) - 计算 fbank 矩阵时是否使用 HTK 公式缩放,默认是 False。
+ - **norm** (Union[str, float],可选) - 计算 fbank 矩阵时正则化的种类,默认是 'slaney',也可以设置 norm=0.5,使用 p-norm 正则化。
+ - **dtype** (str,可选) - 输入和窗的数据类型,默认是 'float32'。
+
+
+返回
+:::::::::
+
+计算 ``MelSpectrogram`` 的可调用对象。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.features.MelSpectrogram
diff --git a/docs/api/paddle/audio/features/Spectrogram_cn.rst b/docs/api/paddle/audio/features/Spectrogram_cn.rst
new file mode 100644
index 00000000000..2b7856715e2
--- /dev/null
+++ b/docs/api/paddle/audio/features/Spectrogram_cn.rst
@@ -0,0 +1,30 @@
+.. _cn_api_audio_features_Spectrogram:
+
+Spectrogram
+-------------------------------
+
+.. py:class:: paddle.audio.features.Spectrogram(n_fft=512, hop_length=512, win_length=None, window='hann', power=1.0, center=True, pad_mode='reflect', dtype='float32')
+
+通过给定信号的短时傅里叶变换得到频谱。
+
+参数
+::::::::::::
+
+ - **n_fft** (int,可选) - 离散傅里叶变换中频率窗大小,默认 512。
+ - **hop_length** (int,可选) - 帧移,默认 512。
+ - **win_length** (int,可选) - 短时 FFT 的窗长,默认为 None。
+ - **window** (str,可选) - 窗函数名,默认是 'hann'。
+ - **power** (float,可选) - 幅度谱的指数,默认是 1.0。
+ - **center** (bool,可选) - 是否对输入信号进行填充。如果是 True,那么第 t 帧以 t*hop_length 为中心;如果为 False,则第 t 帧以 t*hop_length 开始。默认是 True。
+ - **pad_mode** (str,可选) - 如果 center 是 True,选择填充的方式,默认值是 'reflect'。
+ - **dtype** (str,可选) - 输入和窗的数据类型,默认是 'float32'。
+
+
+返回
+:::::::::
+
+计算 ``Spectrogram`` 的可调用对象。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.features.Spectrogram
diff --git a/docs/api/paddle/audio/functional/compute_fbank_matrix_cn.rst b/docs/api/paddle/audio/functional/compute_fbank_matrix_cn.rst
new file mode 100644
index 00000000000..146c4f86fd9
--- /dev/null
+++ b/docs/api/paddle/audio/functional/compute_fbank_matrix_cn.rst
@@ -0,0 +1,30 @@
+.. _cn_api_audio_functional_compute_fbank_matrix:
+
+compute_fbank_matrix
+-------------------------------
+
+.. py:function:: paddle.audio.functional.compute_fbank_matrix(sr, n_fft, n_mels=64, f_min=0.0, f_max=None, htk=False, norm='slaney', dtype='float32')
+
+计算 mel 变换矩阵。
+
+参数
+::::::::::::
+
+ - **sr** (int) - 采样率。
+ - **n_fft** (int) - fft bins 的数目。
+ - **n_mels** (int,可选) - mel bins 的数目,默认是 64。
+ - **f_min** (float,可选) - 最小频率(hz),默认是 0.0。
+ - **f_max** (Optional[float],可选) - 最大频率(hz),默认是 None。
+ - **htk** (bool,可选) - 是否使用 htk 缩放,默认是 False。
+ - **norm** (Union[str, float],可选) - norm 的类型,默认是'slaney'。
+ - **dtype** (str,可选) - 返回矩阵的数据类型,默认'float32'。
+
+返回
+:::::::::
+
+``paddle.Tensor``,Tensor 形状为 (n_mels, n_fft//2 + 1)。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.functional.compute_fbank_matrix
diff --git a/docs/api/paddle/audio/functional/create_dct_cn.rst b/docs/api/paddle/audio/functional/create_dct_cn.rst
new file mode 100644
index 00000000000..14e6343a6c5
--- /dev/null
+++ b/docs/api/paddle/audio/functional/create_dct_cn.rst
@@ -0,0 +1,26 @@
+.. _cn_api_audio_functional_create_dct:
+
+create_dct
+-------------------------------
+
+.. py:function:: paddle.audio.functional.create_dct(n_mfcc, n_mels, norm='ortho', dtype='float32')
+
+计算离散余弦变换矩阵。
+
+参数
+::::::::::::
+
+ - **n_mfcc** (int) - mel 倒谱系数数目。
+ - **n_mels** (int) - mel filterbank 的数目。
+ - **norm** (str,可选) - 正则化类型,默认值是 'ortho'。
+ - **dtype** (str,可选) - 返回矩阵的数据类型,默认是 'float32'。
+
+返回
+:::::::::
+
+``paddle.Tensor``,Tensor 形状 (n_mels, n_mfcc)。
+
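作为补充说明,下面用纯 Python 给出 'ortho' 归一化 DCT-II 基矩阵的一个最小示意(假设其计算与常见的 DCT-II 定义一致;``create_dct_sketch`` 为示意用的假设函数名,并非 Paddle API):

```python
import math

def create_dct_sketch(n_mfcc, n_mels, norm="ortho"):
    # DCT-II 基矩阵,形状 (n_mels, n_mfcc):dct[n][k] = cos(pi/n_mels * (n+0.5) * k)
    dct = [[math.cos(math.pi / n_mels * (n + 0.5) * k) for k in range(n_mfcc)]
           for n in range(n_mels)]
    if norm == "ortho":
        # 正交归一化:第 0 列乘 1/sqrt(2),整体乘 sqrt(2/n_mels)
        scale = math.sqrt(2.0 / n_mels)
        for n in range(n_mels):
            dct[n][0] *= 1.0 / math.sqrt(2.0)
            for k in range(n_mfcc):
                dct[n][k] *= scale
    return dct

m = create_dct_sketch(4, 8)
print(len(m), len(m[0]))  # 矩阵形状 (n_mels, n_mfcc),即 8 4
```

在 'ortho' 归一化下,矩阵的各列相互正交且模为 1。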
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.functional.create_dct
diff --git a/docs/api/paddle/audio/functional/fft_frequencies_cn.rst b/docs/api/paddle/audio/functional/fft_frequencies_cn.rst
new file mode 100644
index 00000000000..83a951cd69b
--- /dev/null
+++ b/docs/api/paddle/audio/functional/fft_frequencies_cn.rst
@@ -0,0 +1,25 @@
+.. _cn_api_audio_functional_fft_frequencies:
+
+fft_frequencies
+-------------------------------
+
+.. py:function:: paddle.audio.functional.fft_frequencies(sr, n_fft, dtype='float32')
+
+计算 fft 频率。
+
+参数
+::::::::::::
+
+ - **sr** (int) - 采样率。
+ - **n_fft** (int) - fft bins 的数目。
+ - **dtype** (str,可选) - 返回 Tensor 的数据类型,默认是 'float32'。
+
+返回
+:::::::::
+
+``paddle.Tensor``,Tensor 形状 (n_fft//2 + 1,)。
+
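该计算等价于在 [0, sr/2] 上均匀取 n_fft//2 + 1 个点。下面是一个纯 Python 示意(``fft_frequencies_sketch`` 为假设的示意函数名):

```python
def fft_frequencies_sketch(sr, n_fft):
    # 等价于 linspace(0, sr/2, n_fft//2 + 1):第 k 个 bin 的中心频率为 k*sr/n_fft
    return [k * sr / n_fft for k in range(n_fft // 2 + 1)]

print(fft_frequencies_sketch(16000, 8))  # [0.0, 2000.0, 4000.0, 6000.0, 8000.0]
```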
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.functional.fft_frequencies
diff --git a/docs/api/paddle/audio/functional/get_window_cn.rst b/docs/api/paddle/audio/functional/get_window_cn.rst
new file mode 100644
index 00000000000..3b59263ab77
--- /dev/null
+++ b/docs/api/paddle/audio/functional/get_window_cn.rst
@@ -0,0 +1,26 @@
+.. _cn_api_audio_functional_get_window:
+
+get_window
+-------------------------------
+
+.. py:function:: paddle.audio.functional.get_window(window, win_length, fftbins=True, dtype='float64')
+
+根据参数给出对应长度和类型的窗函数。
+
+参数
+::::::::::::
+
+ - **window** (str 或者 Tuple[str, float]) - 窗函数类型,或者(窗函数类型,窗函数参数)的元组。支持的窗函数类型有:'hamming'、'hann'、'gaussian'、'general_gaussian'、'exponential'、'triang'、'bohman'、'blackman'、'cosine'、'tukey'、'taylor'。
+ - **win_length** (int) - 采样点数。
+ - **fftbins** (bool,可选) - 如果是 True,给出一个周期性的窗;如果是 False,给出一个对称性的窗。默认是 True。
+ - **dtype** (str,可选) - 返回窗的数据类型,默认是 'float64'。
+
+返回
+:::::::::
+
+``paddle.Tensor``,对应窗表征的 Tensor。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.functional.get_window
diff --git a/docs/api/paddle/audio/functional/hz_to_mel_cn.rst b/docs/api/paddle/audio/functional/hz_to_mel_cn.rst
new file mode 100644
index 00000000000..6162f4d7f07
--- /dev/null
+++ b/docs/api/paddle/audio/functional/hz_to_mel_cn.rst
@@ -0,0 +1,24 @@
+.. _cn_api_audio_functional_hz_to_mel:
+
+hz_to_mel
+-------------------------------
+
+.. py:function:: paddle.audio.functional.hz_to_mel(freq, htk=False)
+
+转换 Hz 为 Mels。
+
+参数
+::::::::::::
+
+ - **freq** (Tensor|float) - 输入的 hz 频率。
+ - **htk** (bool,可选) - 是否使用 htk 缩放,默认 False。
+
+返回
+:::::::::
+
+``paddle.Tensor 或 float``,mels 值。
+
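换算公式可以用标量情形的纯 Python 草图示意(实际 API 同时支持 Tensor 输入;这里假设其 Slaney/HTK 公式与常见的 mel 刻度定义一致,``hz_to_mel_sketch`` 为示意函数名):

```python
import math

def hz_to_mel_sketch(freq, htk=False):
    if htk:
        # HTK 公式:mel = 2595 * log10(1 + f/700)
        return 2595.0 * math.log10(1.0 + freq / 700.0)
    # Slaney 公式:1000 Hz 以下线性,以上按对数递增
    f_sp = 200.0 / 3
    if freq < 1000.0:
        return freq / f_sp
    return 1000.0 / f_sp + math.log(freq / 1000.0) / (math.log(6.4) / 27.0)

print(hz_to_mel_sketch(1000.0))  # 15.0
```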
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.functional.hz_to_mel
diff --git a/docs/api/paddle/audio/functional/mel_frequencies_cn.rst b/docs/api/paddle/audio/functional/mel_frequencies_cn.rst
new file mode 100644
index 00000000000..9e7e6517452
--- /dev/null
+++ b/docs/api/paddle/audio/functional/mel_frequencies_cn.rst
@@ -0,0 +1,27 @@
+.. _cn_api_audio_functional_mel_frequencies:
+
+mel_frequencies
+-------------------------------
+
+.. py:function:: paddle.audio.functional.mel_frequencies(n_mels=64, f_min=0.0, f_max=11025, htk=False, dtype='float32')
+
+计算 Mels 频率。
+
+参数
+::::::::::::
+
+ - **n_mels** (int,可选) - mel bins 的数目,默认 64。
+ - **f_min** (float,可选) - 最小频率(hz),默认 0.0。
+ - **f_max** (float,可选) - 最大频率(hz),默认 11025.0。
+ - **htk** (bool,可选) - 是否使用 htk 缩放,默认 False。
+ - **dtype** (str,可选) - 返回 Tensor 的数据类型,默认是 'float32'。
+
+返回
+:::::::::
+
+``paddle.Tensor``,Tensor 形状 (n_mels,)。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.functional.mel_frequencies
diff --git a/docs/api/paddle/audio/functional/mel_to_hz_cn.rst b/docs/api/paddle/audio/functional/mel_to_hz_cn.rst
new file mode 100644
index 00000000000..39a2cf61ad0
--- /dev/null
+++ b/docs/api/paddle/audio/functional/mel_to_hz_cn.rst
@@ -0,0 +1,24 @@
+.. _cn_api_audio_functional_mel_to_hz:
+
+mel_to_hz
+-------------------------------
+
+.. py:function:: paddle.audio.functional.mel_to_hz(mel, htk=False)
+
+转换 Mels 为 Hz。
+
+参数
+::::::::::::
+
+ - **mel** (Tensor|float) - 输入的 mel 频率。
+ - **htk** (bool,可选) - 是否使用 htk 缩放,默认 False。
+
+返回
+:::::::::
+
+``paddle.Tensor 或 float``,hz 为单位的频率。
+
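逆变换公式可以用标量情形的纯 Python 草图示意(实际 API 同时支持 Tensor 输入;假设其公式为上节 mel 刻度的逆映射,``mel_to_hz_sketch`` 为示意函数名):

```python
import math

def mel_to_hz_sketch(mel, htk=False):
    if htk:
        # HTK 逆变换:f = 700 * (10^(mel/2595) - 1)
        return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    # Slaney 逆变换:15 mel(对应 1000 Hz)以下线性,以上按指数递增
    f_sp = 200.0 / 3
    if mel < 15.0:
        return mel * f_sp
    return 1000.0 * math.exp((math.log(6.4) / 27.0) * (mel - 15.0))

print(mel_to_hz_sketch(15.0))  # 1000.0
```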
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.functional.mel_to_hz
diff --git a/docs/api/paddle/audio/functional/power_to_db_cn.rst b/docs/api/paddle/audio/functional/power_to_db_cn.rst
new file mode 100644
index 00000000000..e60633271dc
--- /dev/null
+++ b/docs/api/paddle/audio/functional/power_to_db_cn.rst
@@ -0,0 +1,26 @@
+.. _cn_api_audio_functional_power_to_db:
+
+power_to_db
+-------------------------------
+
+.. py:function:: paddle.audio.functional.power_to_db(spect, ref_value=1.0, amin=1e-10, top_db=80.0)
+
+转换能量谱为分贝单位。
+
+参数
+::::::::::::
+
+ - **spect** (Tensor) - 输入的 stft 能量谱。
+ - **ref_value** (float,可选) - 参照值,振幅相对于 ref 进行缩放,默认 1.0。
+ - **amin** (float,可选) - 最小阈值,默认 1e-10。
+ - **top_db** (float,可选) - 阈值,默认 80.0。
+
+返回
+:::::::::
+
+``paddle.Tensor 或 float``,db 单位的能量谱。
+
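下面以标量输入为例给出换算的纯 Python 示意(实际 API 作用于 Tensor,且 ``top_db`` 截断相对于整个谱的最大值;这里简化为相对 0 dB 截断,``power_to_db_sketch`` 为示意函数名):

```python
import math

def power_to_db_sketch(x, ref_value=1.0, amin=1e-10, top_db=80.0):
    # 10 * log10(max(amin, x)) - 10 * log10(max(amin, ref_value))
    db = 10.0 * math.log10(max(amin, x)) - 10.0 * math.log10(max(amin, ref_value))
    if top_db is not None:
        db = max(db, -top_db)  # 标量简化:以 0 dB 为峰值向下截断 top_db
    return db

print(power_to_db_sketch(100.0))  # 20.0
```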
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.functional.power_to_db
diff --git a/docs/api/paddle/audio/info_cn.rst b/docs/api/paddle/audio/info_cn.rst
new file mode 100644
index 00000000000..05578856483
--- /dev/null
+++ b/docs/api/paddle/audio/info_cn.rst
@@ -0,0 +1,22 @@
+.. _cn_api_audio_info:
+
+info
+-------------------------------
+
+.. py:function:: paddle.audio.info(filepath:str)
+
+获取音频的相关信息,如采样率、通道数等。
+
+参数
+::::::::::::
+
+ - **filepath** (str) - 输入音频路径。
+
+返回
+:::::::::
+
+``AudioInfo``,音频相关信息。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.info
diff --git a/docs/api/paddle/audio/load_cn.rst b/docs/api/paddle/audio/load_cn.rst
new file mode 100644
index 00000000000..bb08dd4583d
--- /dev/null
+++ b/docs/api/paddle/audio/load_cn.rst
@@ -0,0 +1,26 @@
+.. _cn_api_audio_load:
+
+load
+-------------------------------
+
+.. py:function:: paddle.audio.load(filepath: Union[str, Path], frame_offset: int = 0, num_frames: int = -1, normalize: bool = True, channels_first: bool = True)
+
+获取音频数据。
+
+参数
+::::::::::::
+
+ - **filepath** (str 或者 Path) - 输入音频路径。
+ - **frame_offset** (int,可选) - 读取音频的起始帧,默认是 0。
+ - **num_frames** (int,可选) - 读取的音频帧数,默认是 -1,表示读取全部帧。
+ - **normalize** (bool,可选) - 默认是 True。如果是 True,返回的音频幅值被归一化到 [-1.0, 1.0];如果是 False,则返回原始值。
+ - **channels_first** (bool,可选) - 默认是 True。如果是 True,返回的形状是 [channel, time];如果是 False,则是 [time, channel]。
+
+返回
+:::::::::
+
+``Tuple[paddle.Tensor, int]``,音频数据和采样率。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.load
diff --git a/docs/api/paddle/audio/save_cn.rst b/docs/api/paddle/audio/save_cn.rst
new file mode 100644
index 00000000000..653c8a1bdb3
--- /dev/null
+++ b/docs/api/paddle/audio/save_cn.rst
@@ -0,0 +1,26 @@
+.. _cn_api_audio_save:
+
+save
+-------------------------------
+
+.. py:function:: paddle.audio.save(filepath: str, src: paddle.Tensor, sample_rate: int, channels_first: bool = True, encoding: Optional[str] = None, bits_per_sample: Optional[int] = 16)
+
+保存音频数据。
+
+参数
+::::::::::::
+
+ - **filepath** (str 或者 Path) - 保存音频路径。
+ - **src** (paddle.Tensor) - 音频数据。
+ - **sample_rate** (int) - 采样率。
+ - **channels_first** (bool,可选) - 如果是 True,那么 src Tensor 的形状是 [channel, time];如果是 False,则是 [time, channel]。默认是 True。
+ - **encoding** (Optional[str],可选) - 编码信息,默认是 None。
+ - **bits_per_sample** (Optional[int],可选) - 编码位长,默认是 16。
+
+返回
+:::::::::
+无
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.audio.save
diff --git a/docs/api/paddle/static/create_parameter_cn.rst b/docs/api/paddle/create_parameter_cn.rst
similarity index 88%
rename from docs/api/paddle/static/create_parameter_cn.rst
rename to docs/api/paddle/create_parameter_cn.rst
index 2c3859bb02b..3c50e44d24f 100644
--- a/docs/api/paddle/static/create_parameter_cn.rst
+++ b/docs/api/paddle/create_parameter_cn.rst
@@ -1,12 +1,11 @@
-.. _cn_api_fluid_layers_create_parameter:
+.. _cn_api_paddle_create_parameter:
create_parameter
-------------------------------
-.. py:function:: paddle.static.create_parameter(shape,dtype,name=None,attr=None,is_bias=False,default_initializer=None)
-
-
+.. py:function:: paddle.create_parameter(shape,dtype,name=None,attr=None,is_bias=False,default_initializer=None)
+
创建一个参数。该参数是一个可学习的变量,拥有梯度并且可优化。
diff --git a/docs/api/paddle/dist_cn.rst b/docs/api/paddle/dist_cn.rst
index ae09e3cb863..1d9b09115d9 100644
--- a/docs/api/paddle/dist_cn.rst
+++ b/docs/api/paddle/dist_cn.rst
@@ -20,6 +20,7 @@ x (4-D Tensor): 8 x 1 x 6 x 1
y (4-D Tensor): 1 x 7 x 1 x 5
+
(2) 确定输出 `z` 每一维度的大小:从两个输入的维度中选取最大值。
z (4-D Tensor): 8 x 7 x 6 x 5
@@ -49,9 +50,11 @@ z (4-D Tensor): 8 x 7 x 6 x 5
参数
::::::::::::
- - **x** (Tensor): 1-D 到 6-D Tensor,数据类型为 float32 或 float64。
- - **y** (Tensor): 1-D 到 6-D Tensor,数据类型为 float32 或 float64。
- - **p** (float,optional):用于设置需要计算的范数,数据类型为 float32 或 float64。默认值为 2。
+ - **x** (Tensor) - 1-D 到 6-D Tensor,数据类型为 float32 或 float64。
+ - **y** (Tensor) - 1-D 到 6-D Tensor,数据类型为 float32 或 float64。
+ - **p** (float,可选) - 用于设置需要计算的范数,数据类型为 float32 或 float64。默认值为 2。
+
+
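上述 p 范数距离的计算可以用如下纯 Python 草图示意(为突出范数部分,这里假设 x、y 形状相同而省略广播;``dist_sketch`` 为示意函数名,并非 Paddle API):

```python
def dist_sketch(x, y, p=2.0):
    # z = x - y,再对 |z| 求 p 范数
    z = [abs(a - b) for a, b in zip(x, y)]
    if p == 0:
        return float(sum(v != 0 for v in z))  # 0 范数:非零元素个数
    if p == float("inf"):
        return max(z)                         # 无穷范数:最大绝对值
    return sum(v ** p for v in z) ** (1.0 / p)

print(dist_sketch([3.0, 3.0], [0.0, 7.0]))  # sqrt(9 + 16) = 5.0
```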
返回
::::::::::::
diff --git a/docs/api/paddle/geometric/Overview_cn.rst b/docs/api/paddle/geometric/Overview_cn.rst
new file mode 100644
index 00000000000..a563173495c
--- /dev/null
+++ b/docs/api/paddle/geometric/Overview_cn.rst
@@ -0,0 +1,47 @@
+.. _cn_overview_paddle_geometric:
+
+paddle.geometric
+---------------------
+
+paddle.geometric 目录下包含飞桨框架支持的图领域的相关 API。具体如下:
+
+- :ref:`高性能图消息传递 <faster_message_passing>`
+- :ref:`高效图采样 <faster_graph_sampling>`
+- :ref:`数学分段求值 <math_segment>`
+
+.. _faster_message_passing:
+
+高性能图消息传递
+==========================
+
+.. csv-table::
+ :header: "API 名称", "API 功能"
+
+ " :ref:`paddle.geometric.send_u_recv ` ", "节点特征消息传递"
+ " :ref:`paddle.geometric.send_ue_recv ` ", "节点融合边特征消息传递"
+ " :ref:`paddle.geometric.send_uv ` ", "源节点与目标节点消息发送并计算"
+
+.. _faster_graph_sampling:
+
+高效图采样
+==========================
+
+.. csv-table::
+ :header: "API 名称", "API 功能"
+
+ " :ref:`paddle.geometric.sample_neighbors ` ", "无权重邻居采样"
+ " :ref:`paddle.geometric.reindex_graph ` ", "同构图场景下的子图重编号"
+ " :ref:`paddle.geometric.reindex_heter_graph ` ", "异构图场景下的子图重编号"
+
+.. _math_segment:
+
+数学分段求值
+==========================
+
+.. csv-table::
+ :header: "API 名称", "API 功能"
+
+ " :ref:`paddle.geometric.segment_sum ` ", "分段求和"
+ " :ref:`paddle.geometric.segment_mean ` ", "分段求均值"
+ " :ref:`paddle.geometric.segment_max ` ", "分段求最大值"
+ " :ref:`paddle.geometric.segment_min ` ", "分段求最小值"
diff --git a/docs/api/paddle/geometric/reindex_graph_cn.rst b/docs/api/paddle/geometric/reindex_graph_cn.rst
new file mode 100644
index 00000000000..91eec01f138
--- /dev/null
+++ b/docs/api/paddle/geometric/reindex_graph_cn.rst
@@ -0,0 +1,39 @@
+.. _cn_api_geometric_reindex_graph:
+
+reindex_graph
+-------------------------------
+
+.. py:function:: paddle.geometric.reindex_graph(x, neighbors, count, value_buffer=None, index_buffer=None, name=None)
+
+主要应用于图学习领域,需要与图采样相关的 API 配合使用,主要处理同构图场景。其主要目的是对输入的中心节点信息和邻居信息进行从 0 开始的重新编号,以方便后续的图模型子图训练。
+
+.. note::
+ 输入 ``x`` 中的元素需保证是独有的,否则可能会带来一些潜在的错误。输入的节点将会和邻居节点一同从 0 进行编号。
+
+以输入 x = [0, 1, 2] 作为例子解释。假设我们有邻居 neighbors = [8, 9, 0, 4, 7, 6, 7],以及邻居数量 count = [2, 3, 2]。
+则可以得知节点 0 的邻居为 [8, 9],节点 1 的邻居为 [0, 4, 7],节点 2 的邻居为 [6, 7]。经过此 API 计算后,共计会返回三个结果:
+ 1. reindex_src: [3, 4, 0, 5, 6, 7, 6]
+ 2. reindex_dst: [0, 0, 1, 1, 1, 2, 2]
+ 3. out_nodes: [0, 1, 2, 8, 9, 4, 7, 6]
+可以看到 ``reindex_src`` 和 ``reindex_dst`` 中的值实际上是各个节点在 ``out_nodes`` 中对应的下标索引。
+
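上述重编号过程可以用如下纯 Python 草图复现(``reindex_graph_sketch`` 为示意函数名,并非 Paddle API,仅用于说明编号规则):

```python
def reindex_graph_sketch(x, neighbors, count):
    # out_nodes:先放入中心节点,再按出现顺序追加未见过的邻居
    out_nodes = list(x)
    index = {n: i for i, n in enumerate(out_nodes)}
    for n in neighbors:
        if n not in index:
            index[n] = len(out_nodes)
            out_nodes.append(n)
    # reindex_src:邻居在 out_nodes 中的下标;reindex_dst:对应中心节点的下标
    reindex_src = [index[n] for n in neighbors]
    reindex_dst = []
    for i, c in enumerate(count):
        reindex_dst += [i] * c
    return reindex_src, reindex_dst, out_nodes

src, dst, nodes = reindex_graph_sketch([0, 1, 2], [8, 9, 0, 4, 7, 6, 7], [2, 3, 2])
print(src)    # [3, 4, 0, 5, 6, 7, 6]
print(dst)    # [0, 0, 1, 1, 1, 2, 2]
print(nodes)  # [0, 1, 2, 8, 9, 4, 7, 6]
```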
+参数
+:::::::::
+ - **x** (Tensor) - 输入的中心节点原始编号,数据类型为:int32、int64。
+ - **neighbors** (Tensor) - 中心节点的邻居节点编号,数据类型为:int32、int64。
+ - **count** (Tensor) - 中心节点各自的邻居数目,数据类型为:int32。
+ - **value_buffer** (Tensor,可选) - 用于快速哈希索引的缓存 Tensor,可加速重编号过程。数据类型为 int32,并且应当事先填充为-1。默认值为 None。
+ - **index_buffer** (Tensor,可选) - 用于快速哈希索引的缓存 Tensor,可加速重编号过程。数据类型为 int32,并且应当事先填充为-1。默认值为 None。如果需要使用加速重编号过程,则 ``value_buffer`` 和 ``index_buffer`` 均不可为空。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+ - reindex_src (Tensor) - 重编号后的边对应的源节点信息。
+ - reindex_dst (Tensor) - 重编号后的边对应的目标节点信息。
+ - out_nodes (Tensor) - 返回去重后的输入中心节点信息和邻居信息,且为原始编号。注意,我们将输入的中心节点编号信息放置于前面,而邻居节点放置于后面。
+
+
+代码示例
+::::::::::
+
+COPY-FROM: paddle.geometric.reindex_graph
diff --git a/docs/api/paddle/geometric/reindex_heter_graph_cn.rst b/docs/api/paddle/geometric/reindex_heter_graph_cn.rst
new file mode 100644
index 00000000000..2055cc7e604
--- /dev/null
+++ b/docs/api/paddle/geometric/reindex_heter_graph_cn.rst
@@ -0,0 +1,40 @@
+.. _cn_api_geometric_reindex_heter_graph:
+
+reindex_heter_graph
+-------------------------------
+
+.. py:function:: paddle.geometric.reindex_heter_graph(x, neighbors, count, value_buffer=None, index_buffer=None, name=None)
+
+主要应用于图学习领域,需要与图采样相关的 API 配合使用,主要处理异构图场景。其主要目的是对输入的中心节点信息和邻居信息进行从 0 开始的重新编号,以方便后续的图模型子图训练。
+
+.. note::
+ 输入 ``x`` 中的元素需保证是独有的,否则可能会带来一些潜在的错误。输入的节点将会和邻居节点一同从 0 进行编号。
+
+以输入 x = [0, 1, 2] 作为例子解释。对于异构图 A ,假设我们有邻居 neighbors = [8, 9, 0, 4, 7, 6, 7],以及邻居数量 count = [2, 3, 2];
+则可以得知节点 0 的邻居为 [8, 9],节点 1 的邻居为 [0, 4, 7],节点 2 的邻居为 [6, 7]。对于异构图 B,假设有邻居 neighbors = [0, 2, 3, 5, 1],
+以及邻居数量 count = [1, 3, 1],则可以得知节点 0 的邻居为 [0],节点 1 的邻居为 [2, 3, 5]。经过此 API 计算后,共计会返回三个结果:
+ 1. reindex_src: [3, 4, 0, 5, 6, 7, 6, 0, 2, 8, 9, 1]
+ 2. reindex_dst: [0, 0, 1, 1, 1, 2, 2, 0, 1, 1, 1, 2]
+ 3. out_nodes: [0, 1, 2, 8, 9, 4, 7, 6, 3, 5]
+可以看到 ``reindex_src`` 和 ``reindex_dst`` 中的值实际上是各个节点在 ``out_nodes`` 中对应的下标索引。
+
+参数
+:::::::::
+ - **x** (Tensor) - 输入的中心节点原始编号,数据类型为:int32、int64。
+ - **neighbors** (list | tuple) - 中心节点对应到各个异构图中的邻居节点编号,数据类型为:int32、int64。
+ - **count** (list | tuple) - 中心节点对应到各个异构图中的邻居数目,数据类型为:int32。
+ - **value_buffer** (Tensor,可选) - 用于快速哈希索引的缓存 Tensor,可加速重编号过程。数据类型为 int32,并且应当事先填充为-1。默认值为 None。
+ - **index_buffer** (Tensor,可选) - 用于快速哈希索引的缓存 Tensor,可加速重编号过程。数据类型为 int32,并且应当事先填充为-1。默认值为 None。如果需要使用加速重编号过程,则 ``value_buffer`` 和 ``index_buffer`` 均不可为空。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+ - reindex_src (Tensor) - 重编号后的边对应的源节点信息。
+ - reindex_dst (Tensor) - 重编号后的边对应的目标节点信息。
+ - out_nodes (Tensor) - 返回去重后的输入中心节点信息和邻居信息,且为原始编号。注意,我们将输入的中心节点编号信息放置于前面,而邻居节点放置于后面。
+
+
+代码示例
+::::::::::
+
+COPY-FROM: paddle.geometric.reindex_heter_graph
diff --git a/docs/api/paddle/geometric/sample_neighbors_cn.rst b/docs/api/paddle/geometric/sample_neighbors_cn.rst
new file mode 100644
index 00000000000..efdbdc258bf
--- /dev/null
+++ b/docs/api/paddle/geometric/sample_neighbors_cn.rst
@@ -0,0 +1,31 @@
+.. _cn_api_geometric_sample_neighbors:
+
+sample_neighbors
+-------------------------------
+
+.. py:function:: paddle.geometric.sample_neighbors(row, colptr, input_nodes, sample_size=-1, eids=None, return_eids=False, perm_buffer=None, name=None)
+
+主要应用于图学习领域,主要目的是提供高性能图邻居采样方法。通过输入图的 CSC(Compressed Sparse Column,压缩列信息),分别对应 ``row`` 和 ``colptr``,从而将图转换为适用于邻居采样的格式,再输入需要进行采样的中心节点 ``input_nodes``,以及采样的邻居个数 ``sample_size``,从而可以获得对应中心节点采样后的邻居。另外,在 GPU 版本提供了 Fisher-yates 高性能图采样方法。
+
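CSC 格式下的邻居采样过程可以用如下纯 Python 草图示意(省略了 ``eids`` 与 Fisher-yates 缓存等细节;``sample_neighbors_sketch`` 为示意函数名,并非 Paddle API):

```python
import random

def sample_neighbors_sketch(row, colptr, input_nodes, sample_size=-1, seed=0):
    # CSC 格式下,节点 n 的邻居为 row[colptr[n]:colptr[n+1]]
    rng = random.Random(seed)
    out_neighbors, out_count = [], []
    for n in input_nodes:
        nbrs = row[colptr[n]:colptr[n + 1]]
        if 0 <= sample_size < len(nbrs):
            nbrs = rng.sample(nbrs, sample_size)  # 无放回随机采样
        out_neighbors += nbrs
        out_count.append(len(nbrs))
    return out_neighbors, out_count

# 3 个节点的小图:节点 0 的邻居 [1, 2],节点 1 的邻居 [0],节点 2 的邻居 [0, 1]
row, colptr = [1, 2, 0, 0, 1], [0, 2, 3, 5]
nbrs, cnt = sample_neighbors_sketch(row, colptr, [0, 2], sample_size=1)
print(cnt)  # [1, 1]
```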
+参数
+:::::::::
+ - **row** (Tensor) - 输入原始图的 CSC 格式的行信息,数据类型为:int32、int64,形状为[num_edges, 1] 或 [num_edges]。
+ - **colptr** (Tensor) - 输入原始图的 CSC 格式的压缩列信息,数据类型应当与 ``row`` 一致,形状为[num_nodes + 1, 1]或 [num_nodes + 1]。
+ - **input_nodes** (Tensor) - 需进行邻居采样的中心节点信息,数据类型应当与 ``row`` 一致。
+ - **sample_size** (int,可选) - 采样的邻居个数,默认值为 -1,表示采样输入中心节点的所有邻居。
+ - **eids** (Tensor,可选) - 输入原始图在 CSC 格式下的边编号信息。如果 ``return_eids`` 为 True,则不能为空。数据类型应当与 ``row`` 一致。默认为 None,表示不需要返回边编号信息。
+ - **return_eids** (bool,可选) - 是否返回采样后对应的原始边编号信息,默认为 False。
+ - **perm_buffer** (Tensor,可选) - Fisher-yates 采样方法需要用到的缓存 Tensor。如果需使用高性能图采样方法,则不能为空。数据类型应当与 ``row`` 一致,形状为[num_edges],填充内容为 0 至 num_edges 的顺序递增序列。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+ - out_neighbors (Tensor) - 返回采样后的邻居节点。
+ - out_count (Tensor) - 返回中心节点各自对应的采样邻居数目,形状应该与 ``input_nodes`` 一致。
+ - out_eids (Tensor) - 如果 ``return_eids`` 为 True,则会返回采样边对应的编号信息,否则不返回。
+
+
+代码示例
+::::::::::
+
+COPY-FROM: paddle.geometric.sample_neighbors
diff --git a/docs/api/paddle/geometric/segment_max_cn.rst b/docs/api/paddle/geometric/segment_max_cn.rst
new file mode 100644
index 00000000000..9c7c545b382
--- /dev/null
+++ b/docs/api/paddle/geometric/segment_max_cn.rst
@@ -0,0 +1,34 @@
+.. _cn_api_geometric_segment_max:
+
+segment_max
+-------------------------------
+
+.. py:function:: paddle.geometric.segment_max(data, segment_ids, name=None)
+
+
+分段求最大值函数。
+
+此运算符,将 ``segment_ids`` 中相同索引对应的 ``data`` 的元素,进行求最大值操作。其中 ``segment_ids`` 是一个单调非减序列。
+具体而言,该算子计算一个 Tensor ``out``,使得
+
+.. math::
+
+ out_i = \max_{j \in \{segment\_ids_j == i \} } data_{j}
+
+其中求最大值的索引 ``j``,是符合 ``segment_ids[j] == i`` 的所有 ``j`` 。
+
+
+参数
+:::::::::
+ - **data** (Tensor) - 张量,数据类型为 float32、float64。
+ - **segment_ids** (Tensor) - 一维张量,与输入数据 ``data`` 的第一维大小相同,表示 ``data`` 分段位置,单调非减。合法的数据类型为 int32、int64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+ Tensor,分段求最大值的结果。空的 segment_id 对应的默认值为 0。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.geometric.segment_max
diff --git a/docs/api/paddle/geometric/segment_mean_cn.rst b/docs/api/paddle/geometric/segment_mean_cn.rst
new file mode 100644
index 00000000000..57988f029c3
--- /dev/null
+++ b/docs/api/paddle/geometric/segment_mean_cn.rst
@@ -0,0 +1,34 @@
+.. _cn_api_geometric_segment_mean:
+
+segment_mean
+-------------------------------
+
+.. py:function:: paddle.geometric.segment_mean(data, segment_ids, name=None)
+
+
+分段求均值函数。
+
+此运算符,将 ``segment_ids`` 中相同索引对应的 ``data`` 的元素,进行求均值操作。其中 ``segment_ids`` 是一个单调非减序列。
+具体而言,该算子计算一个 Tensor ``out``,使得
+
+.. math::
+
+ out_i = \mathop{mean}_{j \in \{segment\_ids_j == i \} } data_{j}
+
+其中求均值的索引 ``j``,是符合 ``segment_ids[j] == i`` 的所有 ``j`` 。
+
+
+参数
+:::::::::
+ - **data** (Tensor) - 张量,数据类型为 float32、float64。
+ - **segment_ids** (Tensor) - 一维张量,与输入数据 ``data`` 的第一维大小相同,表示 ``data`` 分段位置,单调非减。合法的数据类型为 int32、int64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+ Tensor,分段求均值的结果。空的 segment_id 对应的默认值为 0。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.geometric.segment_mean
diff --git a/docs/api/paddle/geometric/segment_min_cn.rst b/docs/api/paddle/geometric/segment_min_cn.rst
new file mode 100644
index 00000000000..afdc60e542f
--- /dev/null
+++ b/docs/api/paddle/geometric/segment_min_cn.rst
@@ -0,0 +1,34 @@
+.. _cn_api_geometric_segment_min:
+
+segment_min
+-------------------------------
+
+.. py:function:: paddle.geometric.segment_min(data, segment_ids, name=None)
+
+
+分段求最小值函数。
+
+此运算符,将 ``segment_ids`` 中相同索引对应的 ``data`` 的元素,进行求最小值操作。其中 ``segment_ids`` 是一个单调非减序列。
+具体而言,该算子计算一个 Tensor ``out``,使得
+
+.. math::
+
+ out_i = \min_{j \in \{segment\_ids_j == i \} } data_{j}
+
+其中求最小值的索引 ``j``,是符合 ``segment_ids[j] == i`` 的所有 ``j`` 。
+
+
+参数
+:::::::::
+ - **data** (Tensor) - 张量,数据类型为 float32、float64。
+ - **segment_ids** (Tensor) - 一维张量,与输入数据 ``data`` 的第一维大小相同,表示 ``data`` 分段位置,单调非减。合法的数据类型为 int32、int64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+ Tensor,分段求最小值的结果。空的 segment_id 对应的默认值为 0。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.geometric.segment_min
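分段求最小值可以用如下 NumPy 参考实现来对照理解(仅为示意,函数名为本文假设,并非 Paddle 的内部实现):

```python
import numpy as np

def segment_min_ref(data, segment_ids):
    # 分段求最小值的 NumPy 参考实现;空 segment 的输出默认为 0
    num_segments = int(segment_ids.max()) + 1
    out = np.full((num_segments,) + data.shape[1:], np.inf, dtype=data.dtype)
    np.minimum.at(out, segment_ids, data)  # out[i] = min(data[j]),j 满足 segment_ids[j] == i
    out[np.isposinf(out)] = 0              # 未出现的 segment_id 置 0
    return out

data = np.array([[1., 2., 3.], [3., 2., 1.], [4., 5., 6.]])
segment_ids = np.array([0, 0, 1])
print(segment_min_ref(data, segment_ids))  # 第 0 段为 [1, 2, 1],第 1 段为 [4, 5, 6]
```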
diff --git a/docs/api/paddle/geometric/segment_sum_cn.rst b/docs/api/paddle/geometric/segment_sum_cn.rst
new file mode 100644
index 00000000000..88a61038082
--- /dev/null
+++ b/docs/api/paddle/geometric/segment_sum_cn.rst
@@ -0,0 +1,36 @@
+.. _cn_api_geometric_segment_sum:
+
+segment_sum
+-------------------------------
+
+.. py:function:: paddle.geometric.segment_sum(data, segment_ids, name=None)
+
+
+分段求和函数。
+
+此运算符,将 ``segment_ids`` 中相同索引对应的 ``data`` 的元素,进行求和操作。其中 ``segment_ids`` 是一个单调非减序列。
+具体而言,该算子计算一个 Tensor ``out``,使得
+
+.. math::
+
+ out_i = \sum_{j \in \{segment\_ids_j == i \} } data_{j}
+
+其中求和的索引 ``j``,是符合 ``segment_ids[j] == i`` 的所有 ``j`` 。
+
+
+参数
+:::::::::
+
+ - **data** (Tensor) - 张量,数据类型为 float32、float64。
+ - **segment_ids** (Tensor) - 一维张量,与输入数据 ``data`` 的第一维大小相同,表示 ``data`` 分段位置,单调非减。合法的数据类型为 int32、int64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+
+ Tensor,分段求和的结果。空的 segment_id 对应的默认值为 0。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.geometric.segment_sum
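上述求和语义可以借助 ``np.add.at`` 写成一段很短的 NumPy 参考实现(仅为示意,函数名为本文假设,并非 Paddle 的内部实现):

```python
import numpy as np

def segment_sum_ref(data, segment_ids):
    # 分段求和的 NumPy 参考实现;空 segment 的输出默认为 0
    num_segments = int(segment_ids.max()) + 1
    out = np.zeros((num_segments,) + data.shape[1:], dtype=data.dtype)
    np.add.at(out, segment_ids, data)  # out[i] += data[j],j 满足 segment_ids[j] == i
    return out

data = np.array([[1., 2., 3.], [3., 2., 1.], [4., 5., 6.]])
segment_ids = np.array([0, 0, 1])
print(segment_sum_ref(data, segment_ids))  # 第 0 段为 [4, 4, 4],第 1 段为 [4, 5, 6]
```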
diff --git a/docs/api/paddle/geometric/send_u_recv_cn.rst b/docs/api/paddle/geometric/send_u_recv_cn.rst
new file mode 100644
index 00000000000..94c4459d6f7
--- /dev/null
+++ b/docs/api/paddle/geometric/send_u_recv_cn.rst
@@ -0,0 +1,47 @@
+.. _cn_api_geometric_send_u_recv:
+
+send_u_recv
+-------------------------------
+
+.. py:function:: paddle.geometric.send_u_recv(x, src_index, dst_index, reduce_op="sum", out_size=None, name=None)
+
+主要应用于图学习领域,目的是为了减少在消息传递过程中带来的中间变量显存或内存的损耗。其中,``x`` 作为输入的节点特征 Tensor,首先利用 ``src_index`` 作为索引来 gather 出在 ``x`` 中相应位置的数据,随后再将 gather 出的结果利用 ``dst_index`` 来更新到对应的输出结果中,其中 ``reduce_op`` 表示不同的更新方式,包括 sum、mean、max、min 共计 4 种处理模式。另外,提供了 ``out_size`` 参数,用于设置实际输出的形状,有利于减少实际显存占用。
+
+.. code-block:: text
+
+ X = [[0, 2, 3],
+ [1, 4, 5],
+ [2, 6, 7]]
+
+ src_index = [0, 1, 2, 0]
+
+ dst_index = [1, 2, 1, 0]
+
+ reduce_op = "sum"
+
+ out_size = None
+
+ Then:
+
+ Out = [[0, 2, 3],
+ [2, 8, 10],
+ [1, 4, 5]]
+
+参数
+:::::::::
+ - **x** (Tensor) - 输入的节点特征 Tensor,数据类型为:float32、float64、int32、int64。另外,我们在 GPU 计算中支持 float16。
+ - **src_index** (Tensor) - 1-D Tensor,数据类型为:int32、int64。
+ - **dst_index** (Tensor) - 1-D Tensor,数据类型为:int32、int64。注意:``dst_index`` 的形状应当与 ``src_index`` 一致。
+ - **reduce_op** (str) - 不同更新方式,包括 sum、mean、max、min。默认值为 sum。
+    - **out_size** (int64 | Tensor | None) - 可以根据实际需求设置 ``out_size`` 来改变实际输出形状。默认值为 None,表示这个参数将不会被使用。注意,``out_size`` 的值必须等于或大于 ``max(dst_index) + 1`` 。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+``Tensor`` ,维度和数据类型都与 ``x`` 相同,存储运算后的结果。如果 ``out_size`` 参数正确设置了,则输出结果的第 0 维大小是 ``out_size`` ,其余维度大小与 ``x`` 相同。
+
+
+代码示例
+::::::::::
+
+COPY-FROM: paddle.geometric.send_u_recv
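上面“先按 ``src_index`` gather,再按 ``dst_index`` scatter”的消息传递过程,可以用 NumPy 写一段 ``reduce_op="sum"`` 情形的参考实现(仅为示意,函数名为本文假设,并非 Paddle 的内部实现):

```python
import numpy as np

def send_u_recv_sum(x, src_index, dst_index, out_size=None):
    # send_u_recv 在 reduce_op="sum" 下的 NumPy 参考实现
    n = int(out_size) if out_size is not None else x.shape[0]
    out = np.zeros((n,) + x.shape[1:], dtype=x.dtype)
    np.add.at(out, dst_index, x[src_index])  # gather 源节点特征,再按目标下标累加
    return out

x = np.array([[0., 2., 3.], [1., 4., 5.], [2., 6., 7.]])
src_index = np.array([0, 1, 2, 0])
dst_index = np.array([1, 2, 1, 0])
print(send_u_recv_sum(x, src_index, dst_index))  # 与上文示例中的 Out 一致
```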
diff --git a/docs/api/paddle/geometric/send_ue_recv_cn.rst b/docs/api/paddle/geometric/send_ue_recv_cn.rst
new file mode 100644
index 00000000000..203c55856ac
--- /dev/null
+++ b/docs/api/paddle/geometric/send_ue_recv_cn.rst
@@ -0,0 +1,53 @@
+.. _cn_api_geometric_send_ue_recv:
+
+send_ue_recv
+-------------------------------
+
+.. py:function:: paddle.geometric.send_ue_recv(x, y, src_index, dst_index, message_op="add", reduce_op="sum", out_size=None, name=None)
+
+主要应用于图学习领域,目的是为了减少在消息传递过程中带来的中间变量显存或内存的损耗。其中,``x`` 作为输入节点特征 Tensor,首先利用 ``src_index`` 作为索引来 gather 出在 ``x`` 中相应位置的数据,接着与边特征 Tensor ``e`` 进行计算,计算方式包括 add、sub、mul、div。随后再将计算的结果利用 ``dst_index`` 来更新到对应的输出结果中,其中 ``message_op`` 表示输入 ``x`` 和 ``e`` 之间的计算方式, ``reduce_op`` 表示不同的结果更新方式,包括 sum、mean、max、min 共计 4 种处理模式。另外,提供了 ``out_size`` 参数,用于设置实际输出的形状,有利于减少实际显存占用。
+
+.. code-block:: text
+
+ x = [[0, 2, 3],
+ [1, 4, 5],
+ [2, 6, 7]]
+
+ y = [1, 1, 1]
+
+ src_index = [0, 1, 2, 0]
+
+ dst_index = [1, 2, 1, 0]
+
+ message_op = "add"
+
+ reduce_op = "sum"
+
+ out_size = None
+
+ Then:
+
+ Out = [[1, 3, 4],
+ [4, 10, 12],
+ [2, 5, 6]]
+
+参数
+:::::::::
+ - **x** (Tensor) - 输入的节点特征 Tensor,数据类型为:float32、float64、int32、int64。另外,我们在 GPU 计算中支持 float16。
+ - **y** (Tensor) - 输入的边特征 Tensor,数据类型为:float32、float64、int32、int64。数据类型需与 ``x`` 相同。另外,我们在 GPU 计算中支持 float16。
+ - **src_index** (Tensor) - 1-D Tensor,数据类型为:int32、int64。
+ - **dst_index** (Tensor) - 1-D Tensor,数据类型为:int32、int64。注意:``dst_index`` 的形状应当与 ``src_index`` 一致。
+ - **message_op** (str) - 不同计算方式,包括 add、sub、mul、div。默认值为 add。
+ - **reduce_op** (str) - 不同更新方式,包括 sum、mean、max、min。默认值为 sum。
+    - **out_size** (int64 | Tensor | None) - 可以根据实际需求设置 ``out_size`` 来改变实际输出形状。默认值为 None,表示这个参数将不会被使用。注意,``out_size`` 的值必须等于或大于 ``max(dst_index) + 1`` 。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+``Tensor`` ,维度和数据类型都与 ``x`` 相同,存储运算后的结果。如果 ``out_size`` 参数正确设置了,则输出结果的第 0 维大小是 ``out_size`` ,其余维度大小与 ``x`` 相同。
+
+
+代码示例
+::::::::::
+
+COPY-FROM: paddle.geometric.send_ue_recv
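与 send_u_recv 相比,send_ue_recv 在 gather 之后多了一步与边特征的运算。下面是 ``message_op="add"``、``reduce_op="sum"`` 情形的 NumPy 参考实现(仅为示意,函数名为本文假设,并非 Paddle 的内部实现):

```python
import numpy as np

def send_ue_recv_ref(x, y, src_index, dst_index):
    # message_op="add"、reduce_op="sum" 情形的 NumPy 参考实现
    msg = x[src_index] + y            # 源节点特征与边特征相加(y 按广播规则参与运算)
    out = np.zeros_like(x)
    np.add.at(out, dst_index, msg)    # 按目标下标累加消息
    return out

x = np.array([[0., 2., 3.], [1., 4., 5.], [2., 6., 7.]])
y = np.array([1., 1., 1.])
src_index = np.array([0, 1, 2, 0])
dst_index = np.array([1, 2, 1, 0])
print(send_ue_recv_ref(x, y, src_index, dst_index))  # 与上文示例中的 Out 一致
```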
diff --git a/docs/api/paddle/geometric/send_uv_cn.rst b/docs/api/paddle/geometric/send_uv_cn.rst
new file mode 100644
index 00000000000..efb30e49f23
--- /dev/null
+++ b/docs/api/paddle/geometric/send_uv_cn.rst
@@ -0,0 +1,44 @@
+.. _cn_api_geometric_send_uv:
+
+send_uv
+-------------------------------
+
+.. py:function:: paddle.geometric.send_uv(x, y, src_index, dst_index, message_op="add", name=None)
+
+主要应用于图学习领域,目的是为了减少在消息传递过程中带来的中间变量显存或内存的损耗。其中,``x`` 作为输入的节点特征 Tensor,首先利用 ``src_index`` 作为索引来 gather 出在 ``x`` 中相应位置的数据,接着利用 ``dst_index`` gather 出 ``y`` 中相应位置的数据,再通过 ``message_op`` 确认计算方式,最终返回。其中,``message_op`` 包括 add、sub、mul、div 共计四种计算方式。
+
+.. code-block:: text
+
+    x = [[0, 2, 3],
+         [1, 4, 5],
+         [2, 6, 7]]
+
+    y = [[0, 1, 2],
+         [2, 3, 4],
+         [4, 5, 6]]
+
+    src_index = [0, 1, 2, 0]
+
+    dst_index = [1, 2, 1, 0]
+
+    message_op = "add"
+
+    Then:
+
+    Out = [[2, 5, 7],
+           [5, 9, 11],
+           [4, 9, 11],
+           [0, 3, 5]]
+
+参数
+:::::::::
+    - **x** (Tensor) - 输入的节点特征 Tensor,数据类型为:float32、float64、int32、int64。另外,我们在 GPU 计算中支持 float16。
+    - **y** (Tensor) - 输入的边特征 Tensor,数据类型为:float32、float64、int32、int64。数据类型需与 ``x`` 相同。另外,我们在 GPU 计算中支持 float16。
+    - **src_index** (Tensor) - 1-D Tensor,数据类型为:int32、int64。
+ - **dst_index** (Tensor) - 1-D Tensor,数据类型为:int32、int64。注意:``dst_index`` 的形状应当与 ``src_index`` 一致。
+ - **message_op** (str) - 不同计算方式,包括 add、sub、mul、div。默认值为 add。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+``Tensor`` ,输出更新后的边特征。
+
+
+代码示例
+::::::::::
+
+COPY-FROM: paddle.geometric.send_uv
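send_uv 对每条边分别取出源节点与目标节点的特征做逐元素运算,输出的是边级别的特征。下面是 ``message_op="add"`` 情形的 NumPy 参考实现(仅为示意,函数名为本文假设,并非 Paddle 的内部实现):

```python
import numpy as np

def send_uv_ref(x, y, src_index, dst_index):
    # message_op="add" 情形:逐条边计算 x[src] 与 y[dst] 的逐元素相加,输出边特征
    return x[src_index] + y[dst_index]

x = np.array([[0., 2., 3.], [1., 4., 5.], [2., 6., 7.]])
y = np.array([[0., 1., 2.], [2., 3., 4.], [4., 5., 6.]])
src_index = np.array([0, 1, 2, 0])
dst_index = np.array([1, 2, 1, 0])
print(send_uv_ref(x, y, src_index, dst_index))  # 形状为 [4, 3],每行对应一条边
```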
diff --git a/docs/api/paddle/incubate/sparse/Overview_cn.rst b/docs/api/paddle/incubate/sparse/Overview_cn.rst
deleted file mode 100644
index 7683455b0f1..00000000000
--- a/docs/api/paddle/incubate/sparse/Overview_cn.rst
+++ /dev/null
@@ -1,54 +0,0 @@
-.. _cn_overview_paddle:
-
-paddle.incubate.sparse
----------------------
-
-paddle.incubate.sparse 目录包含飞桨框架支持稀疏数据存储和计算相关的 API。具体如下:
-
-- :ref:`稀疏数据结构相关 `
-- :ref:`数学操作 API `
-- :ref:`NN 相关 API `
-
-.. _about_sparse_tensor:
-
-稀疏数据结构相关
-::::::::::::::::::::
-
-.. csv-table::
- :header: "API 名称", "API 功能"
-
- " :ref:`paddle.incubate.sparse.sparse_coo_tensor ` ", "创建一个 COO 格式的 SparseTensor"
- " :ref:`paddle.incubate.sparse.sparse_csr_tensor ` ", "创建一个 CSR 格式的 SparseTensor"
- " :ref:`paddle.incubate.sparse.is_same_shape ` ", "判断两个 Tensor 的形状是否相同, 支持 DenseTensor 与 SparseTensor 相互比较"
-
-
-.. _about_sparse_math:
-
-数学操作相关
-::::::::::::::::::::
-
-.. csv-table::
- :header: "API 名称", "API 功能"
-
- " :ref:`paddle.incubate.sparse.abs` ", "绝对值函数"
- " :ref:`paddle.incubate.sparse.add` ", "Sparse Tensor 逐元素相加"
- " :ref:`paddle.incubate.sparse.asin` ", "arcsine 函数"
- " :ref:`paddle.incubate.sparse.asinh` ", "反双曲正弦函数"
- " :ref:`paddle.incubate.sparse.atan` ", "反双曲正切函数"
- " :ref:`paddle.incubate.sparse.add ` ", "逐元素加法"
- " :ref:`paddle.incubate.sparse.subtract ` ", "逐元素减法"
- " :ref:`paddle.incubate.sparse.multiply ` ", "逐元素乘法"
- " :ref:`paddle.incubate.sparse.divide ` ", "逐元素除法"
-
-
-.. _about_sparse_nn:
-
-NN 相关
-::::::::::::::::::::
-
-.. csv-table::
- :header: "API 名称", "API 功能"
-
- " :ref:`paddle.incubate.sparse.nn.Conv3D` ", "三维卷积"
- " :ref:`paddle.incubate.sparse.nn.SubmConv3D` ", "三维的 submanifold 卷积"
- " :ref:`paddle.incubate.sparse.nn.Relu` ", "激活函数"
diff --git a/docs/api/paddle/kthvalue_cn.rst b/docs/api/paddle/kthvalue_cn.rst
index 8b79215cadc..9328d9b128f 100644
--- a/docs/api/paddle/kthvalue_cn.rst
+++ b/docs/api/paddle/kthvalue_cn.rst
@@ -11,7 +11,7 @@ kthvalue
:::::::::
- **x** (Tensor) - 一个输入的 N-D ``Tensor``,支持的数据类型:float32、float64、int32、int64。
- **k** (int,Tensor) - 需要沿轴查找的第 ``k`` 小,所对应的 ``k`` 值。
- - **axis** (int,可选) - 指定对输入 Tensor 进行运算的轴,``axis`` 的有效范围是[-R, R),R 是输入 ``x`` 的 Rank, ``axis`` 为负时与 ``axis`` + R 等价。默认值为-1。
+    - **axis** (int,可选) - 指定对输入 Tensor 进行运算的轴,``axis`` 的有效范围是[-R, R),R 是输入 ``x`` 的 Rank,``axis`` 为负时与 ``axis`` + R 等价。默认值为 None,此时等同于 -1。
- **keepdim** (bool,可选)- 是否保留指定的轴。如果是 True,维度会与输入 x 一致,对应所指定的轴的 size 为 1。否则,由于对应轴被展开,输出的维度会比输入小 1。默认值为 1。
- **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
diff --git a/docs/api/paddle/matmul_cn.rst b/docs/api/paddle/matmul_cn.rst
index 970b871e5c5..7eedfb7803d 100644
--- a/docs/api/paddle/matmul_cn.rst
+++ b/docs/api/paddle/matmul_cn.rst
@@ -26,10 +26,10 @@ matmul
参数
:::::::::
- - **x** (Tensor):输入变量,类型为 Tensor,数据类型为 float32, float64。
- - **y** (Tensor):输入变量,类型为 Tensor,数据类型为 float32, float64。
- - **transpose_x** (bool,可选):相乘前是否转置 x,默认值为 False。
- - **transpose_y** (bool,可选):相乘前是否转置 y,默认值为 False。
+ - **x** (Tensor) - 输入变量,类型为 Tensor,数据类型为 float32, float64。
+ - **y** (Tensor) - 输入变量,类型为 Tensor,数据类型为 float32, float64。
+ - **transpose_x** (bool,可选) - 相乘前是否转置 x,默认值为 False。
+ - **transpose_y** (bool,可选) - 相乘前是否转置 y,默认值为 False。
- **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
返回
diff --git a/docs/api/paddle/meshgrid_cn.rst b/docs/api/paddle/meshgrid_cn.rst
index b510cbfe9d8..628c124476b 100644
--- a/docs/api/paddle/meshgrid_cn.rst
+++ b/docs/api/paddle/meshgrid_cn.rst
@@ -8,14 +8,16 @@ meshgrid
-对每个张量做扩充操作。输入是张量或者包含张量的列表,包含 k 个一维张量,输出 k 个 k 维张量。
+对每个 Tensor 做扩充操作。输入是 Tensor 或者包含 Tensor 的列表,包含 k 个一维 Tensor,输出 k 个 k 维 Tensor。
参数
::::::::::::
- - \* **args** (Tensor|Tensor 数组)- 输入变量为 k 个一维张量,形状分别为(N1,), (N2,), ..., (Nk, )。支持数据类型为 float32,float64,int32,int64。
+ - \* **args** (Tensor|Tensor 数组)- 输入变量为 k 个一维 Tensor,形状分别为(N1,), (N2,), ..., (Nk, )。支持数据类型为 float32、float64、int32 和 int64。
- ** **kargs** (可选)- 目前只接受 name 参数(str),具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+
返回
::::::::::::
diff --git a/docs/api/paddle/nonzero_cn.rst b/docs/api/paddle/nonzero_cn.rst
index 479207eedf6..6fe88580760 100644
--- a/docs/api/paddle/nonzero_cn.rst
+++ b/docs/api/paddle/nonzero_cn.rst
@@ -15,13 +15,17 @@ nonzero
参数
:::::::::
- - **x** (Tensor)– 输入张量。
- - **as_tuple** (bool, optinal) - 返回格式。是否以 ``1-D Tensor`` 构成的元组格式返回。
+ - **x** (Tensor)– 输入的 Tensor。
+ - **as_tuple** (bool,可选) - 返回格式。是否以 ``1-D Tensor`` 构成的元组格式返回。
+
+
返回
:::::::::
- **Tensor or tuple(1-D Tensor)**,数据类型为 **INT64** 。
+
+
代码示例
:::::::::
diff --git a/docs/api/paddle/put_along_axis_cn.rst b/docs/api/paddle/put_along_axis_cn.rst
index 3004da5beb8..d7df131c8b8 100644
--- a/docs/api/paddle/put_along_axis_cn.rst
+++ b/docs/api/paddle/put_along_axis_cn.rst
@@ -18,7 +18,7 @@ put_along_axis
返回
:::::::::
-- **out** (Tensor) - 输出 Tensor,indeces 矩阵选定的下标会被插入 value,与 ``arr`` 数据类型相同。
+输出 Tensor,indices 矩阵选定的下标会被插入 value,与 ``arr`` 数据类型相同。
代码示例
:::::::::
diff --git a/docs/api/paddle/roll_cn.rst b/docs/api/paddle/roll_cn.rst
index 84057c557e8..971499a2147 100644
--- a/docs/api/paddle/roll_cn.rst
+++ b/docs/api/paddle/roll_cn.rst
@@ -15,9 +15,11 @@ roll
- **x** (Tensor)– 输入的 Tensor。
- **shifts** (int|list|tuple) - 滚动位移。如果 ``shifts`` 是一个元组或者列表,则 ``axis`` 必须是相同大小的元组或者列表,输入张量将依次沿着每个维度滚动相应的数值。
- - **axis** (int|list|tuple, optinal) – 滚动轴。默认值为 None。
+ - **axis** (int|list|tuple,可选) – 滚动轴。默认值为 None。
- **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+
返回
:::::::::
diff --git a/docs/api/paddle/sparse/Overview_cn.rst b/docs/api/paddle/sparse/Overview_cn.rst
new file mode 100644
index 00000000000..b8bb3ea2997
--- /dev/null
+++ b/docs/api/paddle/sparse/Overview_cn.rst
@@ -0,0 +1,96 @@
+.. _cn_overview_paddle_sparse:
+
+paddle.sparse
+---------------------
+
+paddle.sparse 目录包含飞桨框架支持稀疏数据存储和计算相关的 API。具体如下:
+
+- :ref:`稀疏 Tensor 创建 `
+- :ref:`稀疏 Tensor 运算 `
+- :ref:`稀疏组网类 `
+- :ref:`稀疏组网类的函数式 API `
+
+.. _about_sparse_tensor:
+
+稀疏 Tensor 创建
+::::::::::::::::::::
+
+.. csv-table::
+ :header: "API 名称", "API 功能"
+
+ " :ref:`paddle.sparse.sparse_coo_tensor ` ", "创建一个 COO 格式的 SparseTensor"
+ " :ref:`paddle.sparse.sparse_csr_tensor ` ", "创建一个 CSR 格式的 SparseTensor"
+
+.. _about_sparse_math:
+
+稀疏 Tensor 运算
+::::::::::::::::::::
+
+.. csv-table::
+ :header: "API 名称", "API 功能"
+
+ " :ref:`paddle.sparse.sin ` ", "对稀疏 Tensor 逐元素求正弦"
+ " :ref:`paddle.sparse.tan ` ", "对稀疏 Tensor 逐元素求正切"
+ " :ref:`paddle.sparse.asin ` ", "对稀疏 Tensor 逐元素求反正弦"
+ " :ref:`paddle.sparse.atan ` ", "对稀疏 Tensor 逐元素求反正切"
+ " :ref:`paddle.sparse.sinh ` ", "对稀疏 Tensor 逐元素求双曲正弦"
+ " :ref:`paddle.sparse.tanh ` ", "对稀疏 Tensor 逐元素求双曲正切"
+ " :ref:`paddle.sparse.asinh ` ", "对稀疏 Tensor 逐元素求反双曲正弦"
+ " :ref:`paddle.sparse.atanh ` ", "对稀疏 Tensor 逐元素求反双曲正切"
+ " :ref:`paddle.sparse.sqrt ` ", "对稀疏 Tensor 逐元素求算数平方根"
+ " :ref:`paddle.sparse.square ` ", "对稀疏 Tensor 逐元素求平方"
+ " :ref:`paddle.sparse.log1p ` ", "对稀疏 Tensor 逐元素计算 ln(x+1)"
+ " :ref:`paddle.sparse.abs ` ", "对稀疏 Tensor 逐元素求绝对值"
+ " :ref:`paddle.sparse.pow ` ", "对稀疏 Tensor 逐元素计算 x 的 y 次幂"
+ " :ref:`paddle.sparse.cast ` ", "对稀疏 Tensor 逐元素转换类型"
+ " :ref:`paddle.sparse.neg ` ", "对稀疏 Tensor 逐元素计算相反数"
+ " :ref:`paddle.sparse.deg2rad ` ", "对稀疏 Tensor 逐元素从度转换为弧度"
+ " :ref:`paddle.sparse.rad2deg ` ", "对稀疏 Tensor 逐元素从弧度转换为度"
+ " :ref:`paddle.sparse.expm1 ` ", "对稀疏 Tensor 逐元素进行以自然数 e 为底的指数运算并减 1"
+ " :ref:`paddle.sparse.mv ` ", "稀疏矩阵乘向量,第一个参数为稀疏矩阵,第二个参数为稠密向量"
+ " :ref:`paddle.sparse.matmul ` ", "稀疏矩阵乘,第一个参数为稀疏矩阵,第二个参数为稠密矩阵或者稀疏矩阵"
+ " :ref:`paddle.sparse.addmm ` ", "稀疏矩阵乘与加法的组合运算"
+ " :ref:`paddle.sparse.masked_matmul ` ", "稀疏矩阵乘,第一、二个参数均为稠密矩阵,返回值为稀疏矩阵"
+ " :ref:`paddle.sparse.add ` ", "对稀疏 Tensor 逐元素相加"
+ " :ref:`paddle.sparse.subtract ` ", "对稀疏 Tensor 逐元素相减"
+ " :ref:`paddle.sparse.multiply ` ", "对稀疏 Tensor 逐元素相乘"
+ " :ref:`paddle.sparse.divide ` ", "对稀疏 Tensor 逐元素相除"
+ " :ref:`paddle.sparse.is_same_shape ` ", "判断两个稀疏 Tensor 或稠密 Tensor 的 shape 是否一致"
+ " :ref:`paddle.sparse.reshape ` ", "改变一个 SparseTensor 的形状"
+ " :ref:`paddle.sparse.coalesce` ", "对 SparseCooTensor 进行排序并合并"
+ " :ref:`paddle.sparse.transpose ` ", "在不改变数据的情况下改变 ``x`` 的维度顺序, 支持 COO 格式的多维 SparseTensor 以及 COO 格式的 2 维和 3 维 SparseTensor"
+
+.. _about_sparse_nn:
+
+稀疏组网类
+::::::::::::::::::::
+
+.. csv-table::
+ :header: "API 名称", "API 功能"
+
+ " :ref:`paddle.sparse.nn.ReLU ` ", "激活层"
+ " :ref:`paddle.sparse.nn.ReLU6 ` ", "激活层"
+ " :ref:`paddle.sparse.nn.LeakyReLU ` ", "激活层"
+ " :ref:`paddle.sparse.nn.Softmax ` ", "激活层"
+ " :ref:`paddle.sparse.nn.Conv3D ` ", "三维卷积层"
+ " :ref:`paddle.sparse.nn.SubmConv3D ` ", "子流形三维卷积层"
+ " :ref:`paddle.sparse.nn.BatchNorm` ", " Batch Normalization 层"
+ " :ref:`paddle.sparse.nn.SyncBatchNorm` ", " Synchronized Batch Normalization 层"
+ " :ref:`paddle.sparse.nn.MaxPool3D` ", "三维最大池化层"
+
+.. _about_sparse_nn_functional:
+
+稀疏组网类函数式 API
+::::::::::::::::::::
+
+.. csv-table::
+ :header: "API 名称", "API 功能"
+
+ " :ref:`paddle.sparse.nn.functional.relu ` ", "激活函数"
+ " :ref:`paddle.sparse.nn.functional.relu6 ` ", "激活函数"
+ " :ref:`paddle.sparse.nn.functional.leaky_relu ` ", "激活函数"
+ " :ref:`paddle.sparse.nn.functional.softmax ` ", "激活函数"
+ " :ref:`paddle.sparse.nn.functional.attention ` ", "稀疏 attention 函数"
+ " :ref:`paddle.sparse.nn.functional.conv3d ` ", "三维卷积函数"
+ " :ref:`paddle.sparse.nn.functional.subm_conv3d ` ", "子流形三维卷积函数"
+ " :ref:`paddle.sparse.nn.functional.max_pool3d ` ", "三维最大池化函数"
diff --git a/docs/api/paddle/sparse/abs_cn.rst b/docs/api/paddle/sparse/abs_cn.rst
new file mode 100644
index 00000000000..f7fda037455
--- /dev/null
+++ b/docs/api/paddle/sparse/abs_cn.rst
@@ -0,0 +1,29 @@
+.. _cn_api_paddle_sparse_abs:
+
+abs
+-------------------------------
+
+.. py:function:: paddle.sparse.abs(x, name=None)
+
+
+逐元素计算输入 :attr:`x` 的绝对值,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+数学公式:
+
+.. math::
+ out = |x|
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的稀疏 Tensor,可以为 Coo 或 Csr 格式,数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.abs
diff --git a/docs/api/paddle/incubate/sparse/add_cn.rst b/docs/api/paddle/sparse/add_cn.rst
similarity index 78%
rename from docs/api/paddle/incubate/sparse/add_cn.rst
rename to docs/api/paddle/sparse/add_cn.rst
index 9f68cad2ebb..ef892cd8e6a 100644
--- a/docs/api/paddle/incubate/sparse/add_cn.rst
+++ b/docs/api/paddle/sparse/add_cn.rst
@@ -1,10 +1,9 @@
-.. _cn_api_paddle_incubate_sparse_add:
+.. _cn_api_paddle_sparse_add:
add
-------------------------------
-.. py:function:: paddle.incubate.sparse.add(x, y, name=None)
-
+.. py:function:: paddle.sparse.add(x, y, name=None)
输入 :attr:`x` 与输入 :attr:`y` 逐元素相加,并将各个位置的输出元素保存到返回结果中。
@@ -14,10 +13,10 @@ add
等式为:
.. math::
- Out = X + Y
+ out = x + y
-- :math:`X`:多维稀疏 Tensor。
-- :math:`Y`:多维稀疏 Tensor。
+- :math:`x`:多维稀疏 Tensor。
+- :math:`y`:多维稀疏 Tensor。
参数
:::::::::
@@ -33,4 +32,4 @@ add
代码示例
:::::::::
-COPY-FROM: paddle.incubate.sparse.add
+COPY-FROM: paddle.sparse.add
diff --git a/docs/api/paddle/sparse/addmm_cn.rst b/docs/api/paddle/sparse/addmm_cn.rst
new file mode 100644
index 00000000000..77ee62b5d49
--- /dev/null
+++ b/docs/api/paddle/sparse/addmm_cn.rst
@@ -0,0 +1,49 @@
+.. _cn_api_paddle_sparse_addmm:
+
+addmm
+-------------------------------
+
+.. py:function:: paddle.sparse.addmm(input, x, y, beta=1.0, alpha=1.0, name=None)
+
+.. note::
+ 该 API 从 `CUDA 11.0` 开始支持。
+
+对输入 :attr:`x` 与输入 :attr:`y` 求稀疏矩阵乘法,并将 `input` 加到计算结果上。
+
+数学公式:
+
+.. math::
+ out = alpha * x * y + beta * input
+
+输入、输出的格式对应关系如下:
+
+.. note::
+
+ input[SparseCsrTensor] + x[SparseCsrTensor] @ y[SparseCsrTensor] -> out[SparseCsrTensor]
+
+ input[DenseTensor] + x[SparseCsrTensor] @ y[DenseTensor] -> out[DenseTensor]
+
+ input[SparseCooTensor] + x[SparseCooTensor] @ y[SparseCooTensor] -> out[SparseCooTensor]
+
+ input[DenseTensor] + x[SparseCooTensor] @ y[DenseTensor] -> out[DenseTensor]
+
+该 API 支持反向传播,`input`、`x`、`y` 的维度相同且 >= 2D,不支持自动广播。
+
+参数
+:::::::::
+ - **input** (SparseTensor|DenseTensor) - 输入 Tensor,可以为 Coo 或 Csr 格式 或 DenseTensor。数据类型为 float32、float64。
+ - **x** (SparseTensor) - 输入 Tensor,可以为 Coo 或 Csr 格式。数据类型为 float32、float64。
+ - **y** (SparseTensor|DenseTensor) - 输入 Tensor,可以为 Coo 或 Csr 格式 或 DenseTensor。数据类型为 float32、float64。
+    - **beta** (float,可选) - `input` 的系数。默认值为 1.0。
+    - **alpha** (float,可选) - `x * y` 的系数。默认值为 1.0。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+SparseTensor|DenseTensor: 其 Tensor 类型、dtype、shape 与 `input` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.addmm
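公式 `out = alpha * x * y + beta * input` 的语义可以先用稠密 NumPy 矩阵对照理解(仅为示意的稠密参考实现,函数名为本文假设,并非稀疏实现):

```python
import numpy as np

def addmm_ref(input_, x, y, beta=1.0, alpha=1.0):
    # out = alpha * x @ y + beta * input 的稠密参考实现
    return alpha * (x @ y) + beta * input_

input_ = np.eye(2)
x = np.array([[1., 0.], [0., 2.]])
y = np.array([[3., 0.], [0., 4.]])
print(addmm_ref(input_, x, y, beta=0.5, alpha=2.0))  # 对角线元素为 6.5 与 16.5
```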
diff --git a/docs/api/paddle/sparse/asin_cn.rst b/docs/api/paddle/sparse/asin_cn.rst
new file mode 100644
index 00000000000..7b1584681fd
--- /dev/null
+++ b/docs/api/paddle/sparse/asin_cn.rst
@@ -0,0 +1,29 @@
+.. _cn_api_paddle_sparse_asin:
+
+asin
+-------------------------------
+
+.. py:function:: paddle.sparse.asin(x, name=None)
+
+
+逐元素计算输入 :attr:`x` 的反正弦,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+数学公式:
+
+.. math::
+ out = asin(x)
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的稀疏 Tensor,可以为 Coo 或 Csr 格式,数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.asin
diff --git a/docs/api/paddle/sparse/asinh_cn.rst b/docs/api/paddle/sparse/asinh_cn.rst
new file mode 100644
index 00000000000..127485831b9
--- /dev/null
+++ b/docs/api/paddle/sparse/asinh_cn.rst
@@ -0,0 +1,29 @@
+.. _cn_api_paddle_sparse_asinh:
+
+asinh
+-------------------------------
+
+.. py:function:: paddle.sparse.asinh(x, name=None)
+
+
+逐元素计算输入 :attr:`x` 的反双曲正弦,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+数学公式:
+
+.. math::
+ out = asinh(x)
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的稀疏 Tensor,可以为 Coo 或 Csr 格式,数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.asinh
diff --git a/docs/api/paddle/sparse/atan_cn.rst b/docs/api/paddle/sparse/atan_cn.rst
new file mode 100644
index 00000000000..640b14e1bf3
--- /dev/null
+++ b/docs/api/paddle/sparse/atan_cn.rst
@@ -0,0 +1,29 @@
+.. _cn_api_paddle_sparse_atan:
+
+atan
+-------------------------------
+
+.. py:function:: paddle.sparse.atan(x, name=None)
+
+
+逐元素计算输入 :attr:`x` 的反正切,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+数学公式:
+
+.. math::
+ out = atan(x)
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的稀疏 Tensor,可以为 Coo 或 Csr 格式,数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.atan
diff --git a/docs/api/paddle/sparse/atanh_cn.rst b/docs/api/paddle/sparse/atanh_cn.rst
new file mode 100644
index 00000000000..4ea48820fe8
--- /dev/null
+++ b/docs/api/paddle/sparse/atanh_cn.rst
@@ -0,0 +1,29 @@
+.. _cn_api_paddle_sparse_atanh:
+
+atanh
+-------------------------------
+
+.. py:function:: paddle.sparse.atanh(x, name=None)
+
+
+逐元素计算输入 :attr:`x` 的反双曲正切,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+数学公式:
+
+.. math::
+ out = atanh(x)
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的稀疏 Tensor,可以为 Coo 或 Csr 格式,数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.atanh
diff --git a/docs/api/paddle/sparse/cast_cn.rst b/docs/api/paddle/sparse/cast_cn.rst
new file mode 100644
index 00000000000..f408ffffd32
--- /dev/null
+++ b/docs/api/paddle/sparse/cast_cn.rst
@@ -0,0 +1,26 @@
+.. _cn_api_paddle_sparse_cast:
+
+cast
+-------------------------------
+
+.. py:function:: paddle.sparse.cast(x, index_dtype=None, value_dtype=None, name=None)
+
+输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。将稀疏 Tensor 的 index 转换为 `index_dtype` 类型
+( `SparseCsrTensor` 的 index 指: `crows` 与 `cols` ),value 转换为 `value_dtype` 类型。
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的稀疏 Tensor,可以为 Coo 或 Csr 格式,数据类型为 float32、float64。
+    - **index_dtype** (np.dtype|str,可选) - SparseCooTensor 的 index 类型,SparseCsrTensor 的 crows/cols 类型。可以是 uint8、int8、int16、int32、int64。
+    - **value_dtype** (np.dtype|str,可选) - SparseCooTensor 或 SparseCsrTensor 的 value 类型。可以是 uint8、int8、int16、int32、int64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,稀疏格式与 :attr:`x` 相同,数据类型为被转换后的类型。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.cast
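cast 的语义相当于对稀疏 Tensor 的 index 与 value 分别做 dtype 转换。下面用一个 (indices, values) 二元组模拟 COO 格式来示意这一点(纯属本文假设的示意代码,并非 Paddle 的内部表示):

```python
import numpy as np

# 用 (indices, values) 模拟一个 COO 稀疏 Tensor
indices = np.array([[0, 1], [1, 2]], dtype=np.int64)   # 非零元素坐标
values = np.array([1.0, 2.0], dtype=np.float64)        # 非零元素的值

# cast:index 转换为 index_dtype,value 转换为 value_dtype
indices_cast = indices.astype(np.int32)
values_cast = values.astype(np.float32)
print(indices_cast.dtype, values_cast.dtype)  # int32 float32
```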
diff --git a/docs/api/paddle/sparse/coalesce_cn.rst b/docs/api/paddle/sparse/coalesce_cn.rst
new file mode 100644
index 00000000000..07ec3e6959a
--- /dev/null
+++ b/docs/api/paddle/sparse/coalesce_cn.rst
@@ -0,0 +1,22 @@
+.. _cn_api_paddle_sparse_coalesce:
+
+coalesce
+-------------------------------
+
+.. py:function:: paddle.sparse.coalesce(x, name=None)
+
+coalesce 操作包含排序和合并相同 indices 两步,执行 coalesce 后,x 将按 indices 有序排列,并且每个 indices 只出现一次。
+
+参数
+:::::::::
+    - **x** (Tensor) - 输入的 SparseCooTensor。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+返回 coalesce 后的 SparseCooTensor。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.coalesce
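“排序并合并相同 indices”可以在 1 维 COO 的简化情形下用 NumPy 示意(仅为示意,函数名为本文假设;此处假设重复 indices 对应的 values 以求和方式合并):

```python
import numpy as np

def coalesce_ref(indices, values):
    # 1 维 COO 的示意实现:按 index 排序,重复 index 的 values 求和合并
    uniq, inv = np.unique(indices, return_inverse=True)  # uniq 已有序
    merged = np.zeros(len(uniq), dtype=values.dtype)
    np.add.at(merged, inv, values)
    return uniq, merged

idx = np.array([1, 0, 1])
val = np.array([1., 2., 3.])
print(coalesce_ref(idx, val))  # indices 变为 [0, 1],values 变为 [2., 4.]
```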
diff --git a/docs/api/paddle/sparse/deg2rad_cn.rst b/docs/api/paddle/sparse/deg2rad_cn.rst
new file mode 100644
index 00000000000..f34825a612d
--- /dev/null
+++ b/docs/api/paddle/sparse/deg2rad_cn.rst
@@ -0,0 +1,29 @@
+.. _cn_api_paddle_sparse_deg2rad:
+
+deg2rad
+-------------------------------
+
+.. py:function:: paddle.sparse.deg2rad(x, name=None)
+
+
+逐元素将输入 :attr:`x` 从度转换为弧度,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+数学公式:
+
+.. math::
+ deg2rad(x) = \pi * x / 180
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的稀疏 Tensor,可以为 Coo 或 Csr 格式,数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.deg2rad
diff --git a/docs/api/paddle/incubate/sparse/divide_cn.rst b/docs/api/paddle/sparse/divide_cn.rst
similarity index 78%
rename from docs/api/paddle/incubate/sparse/divide_cn.rst
rename to docs/api/paddle/sparse/divide_cn.rst
index 9247513c42b..d9c78301095 100644
--- a/docs/api/paddle/incubate/sparse/divide_cn.rst
+++ b/docs/api/paddle/sparse/divide_cn.rst
@@ -1,11 +1,9 @@
-.. _cn_api_paddle_incubate_sparse_divide:
+.. _cn_api_paddle_sparse_divide:
divide
-------------------------------
-.. py:function:: paddle.incubate.sparse.divide(x, y, name=None)
-
-
+.. py:function:: paddle.sparse.divide(x, y, name=None)
输入 :attr:`x` 与输入 :attr:`y` 逐元素相除,并将各个位置的输出元素保存到返回结果中。
@@ -14,10 +12,10 @@ divide
等式为:
.. math::
- Out = X / Y
+ out = x / y
-- :math:`X`:多维稀疏 Tensor。
-- :math:`Y`:多维稀疏 Tensor。
+- :math:`x`:多维稀疏 Tensor。
+- :math:`y`:多维稀疏 Tensor。
参数
:::::::::
@@ -33,4 +31,4 @@ divide
代码示例
:::::::::
-COPY-FROM: paddle.incubate.sparse.divide
+COPY-FROM: paddle.sparse.divide
diff --git a/docs/api/paddle/sparse/expm1_cn.rst b/docs/api/paddle/sparse/expm1_cn.rst
new file mode 100644
index 00000000000..ba88b0f1ffa
--- /dev/null
+++ b/docs/api/paddle/sparse/expm1_cn.rst
@@ -0,0 +1,28 @@
+.. _cn_api_paddle_sparse_expm1:
+
+expm1
+-------------------------------
+
+.. py:function:: paddle.sparse.expm1(x, name=None)
+
+逐元素计算输入 :attr:`x` 的 `exp(x)-1` ,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+数学公式:
+
+.. math::
+ out = exp(x) - 1
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的稀疏 Tensor,可以为 Coo 或 Csr 格式,数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.expm1
diff --git a/docs/api/paddle/incubate/sparse/is_same_shape_cn.rst b/docs/api/paddle/sparse/is_same_shape_cn.rst
similarity index 79%
rename from docs/api/paddle/incubate/sparse/is_same_shape_cn.rst
rename to docs/api/paddle/sparse/is_same_shape_cn.rst
index 02f9d5464a8..b22f51b3228 100644
--- a/docs/api/paddle/incubate/sparse/is_same_shape_cn.rst
+++ b/docs/api/paddle/sparse/is_same_shape_cn.rst
@@ -1,11 +1,9 @@
-.. _cn_api_paddle_incubate_sparse_is_same_shape:
+.. _cn_api_paddle_sparse_is_same_shape:
is_same_shape
-------------------------------
-
-
-.. py:function:: paddle.incubate.sparse.is_same_shape(x, y)
+.. py:function:: paddle.sparse.is_same_shape(x, y)
返回两个 Tensor 形状比较的结果,判断输入 :attr:`x` 与输入 :attr:`y` 的形状是否相同,支持 DenseTensor、SparseCsrTensor 与 SparseCooTensor 之间任意两种的形状比较。
@@ -23,4 +21,4 @@ bool,两个 Tensor 形状比较的结果,相同为 True,不同为 False。
代码示例
:::::::::
-COPY-FROM: paddle.incubate.sparse.is_same_shape
+COPY-FROM: paddle.sparse.is_same_shape
diff --git a/docs/api/paddle/sparse/log1p_cn.rst b/docs/api/paddle/sparse/log1p_cn.rst
new file mode 100644
index 00000000000..29da8106d4d
--- /dev/null
+++ b/docs/api/paddle/sparse/log1p_cn.rst
@@ -0,0 +1,28 @@
+.. _cn_api_paddle_sparse_log1p:
+
+log1p
+-------------------------------
+
+.. py:function:: paddle.sparse.log1p(x, name=None)
+
+逐元素计算 :attr:`x+1` 的自然对数,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+数学公式:
+
+.. math::
+ out = ln(1+x)
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的稀疏 Tensor,可以为 Coo 或 Csr 格式,数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.log1p
diff --git a/docs/api/paddle/sparse/masked_matmul_cn.rst b/docs/api/paddle/sparse/masked_matmul_cn.rst
new file mode 100644
index 00000000000..4d60b41c7ab
--- /dev/null
+++ b/docs/api/paddle/sparse/masked_matmul_cn.rst
@@ -0,0 +1,40 @@
+.. _cn_api_paddle_sparse_masked_matmul:
+
+masked_matmul
+-------------------------------
+
+.. py:function:: paddle.sparse.masked_matmul(x, y, mask, name=None)
+
+.. note::
+ 该 API 从 `CUDA 11.3` 开始支持。
+
+对输入 :attr:`x` 与输入 :attr:`y` 两个 DenseTensor 求矩阵乘法,同时根据稀疏 Tensor `mask` 进行压缩存储,
+返回一个与 `mask` 布局一致的稀疏 Tensor。
+
+输入、输出的格式对应关系如下:
+
+.. note::
+
+ x[DenseTensor] @ y[DenseTensor] * mask[SparseCooTensor] -> out[SparseCooTensor]
+
+ x[DenseTensor] @ y[DenseTensor] * mask[SparseCsrTensor] -> out[SparseCsrTensor]
+
+该 API 支持反向传播,`x` 和 `y` 必须 >= 2D,不支持自动广播。 `x` 的 shape 应该为 `[*, M, K]` , `y` 的 shape 应该为
+`[*, K, N]` , `mask` 的 shape 应该为 `[*, M, N]` 。其中 `*` 为 0 或者批维度。
+
+参数
+:::::::::
+ - **x** (DenseTensor) - 输入的 DenseTensor。数据类型为 float32、float64。
+ - **y** (DenseTensor) - 输入的 DenseTensor。数据类型为 float32、float64。
+ - **mask** (SparseTensor) - 输入的稀疏掩码,是一个稀疏 Tensor,可以为 Coo 或 Csr 格式。数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+SparseTensor: 其 Tensor 类型、dtype、shape 均与 `mask` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.masked_matmul
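masked_matmul 的结果在数值上等价于“完整矩阵乘后只保留 mask 非零位置”,可以用稠密 NumPy 对照(仅为示意的稠密参考实现,函数名为本文假设;实际稀疏实现不会计算被掩掉的位置):

```python
import numpy as np

def masked_matmul_ref(x, y, mask):
    # 稠密示意:完整计算 x @ y,再只保留 mask 非零位置的结果
    return (x @ y) * (mask != 0)

x = np.ones((2, 3))
y = np.ones((3, 2))
mask = np.array([[1., 0.], [0., 1.]])
print(masked_matmul_ref(x, y, mask))  # 仅对角线位置保留乘积结果 3
```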
diff --git a/docs/api/paddle/sparse/matmul_cn.rst b/docs/api/paddle/sparse/matmul_cn.rst
new file mode 100644
index 00000000000..929ba721b62
--- /dev/null
+++ b/docs/api/paddle/sparse/matmul_cn.rst
@@ -0,0 +1,42 @@
+.. _cn_api_paddle_sparse_matmul:
+
+matmul
+-------------------------------
+
+.. py:function:: paddle.sparse.matmul(x, y, name=None)
+
+.. note::
+ 该 API 从 `CUDA 11.0` 开始支持。
+
+对输入 :attr:`x` 与输入 :attr:`y` 求稀疏矩阵乘法,`x` 为稀疏 Tensor, `y` 可为稀疏 Tensor 或稠密 Tensor。
+
+输入、输出的格式对应关系如下:
+
+.. note::
+
+ x[SparseCsrTensor] @ y[SparseCsrTensor] -> out[SparseCsrTensor]
+
+ x[SparseCsrTensor] @ y[DenseTensor] -> out[DenseTensor]
+
+ x[SparseCooTensor] @ y[SparseCooTensor] -> out[SparseCooTensor]
+
+ x[SparseCooTensor] @ y[DenseTensor] -> out[DenseTensor]
+
+该 API 支持反向传播,`x` 和 `y` 必须 >= 2D,不支持自动广播。 `x` 的 shape 应该为 `[*, M, K]` , `y` 的 shape 应该为
+`[*, K, N]` ,其中 `*` 为 0 或者批维度。
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的 Tensor,可以为 Coo 或 Csr 格式。数据类型为 float32、float64。
+ - **y** (SparseTensor|DenseTensor) - 输入 Tensor,可以为 Coo 或 Csr 格式 或 DenseTensor。数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+SparseTensor|DenseTensor: 其 Tensor 类型由 `x` 和 `y` 共同决定,数据类型与输入相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.matmul
diff --git a/docs/api/paddle/incubate/sparse/multiply_cn.rst b/docs/api/paddle/sparse/multiply_cn.rst
similarity index 77%
rename from docs/api/paddle/incubate/sparse/multiply_cn.rst
rename to docs/api/paddle/sparse/multiply_cn.rst
index b11147861e3..69d4e5b52b5 100644
--- a/docs/api/paddle/incubate/sparse/multiply_cn.rst
+++ b/docs/api/paddle/sparse/multiply_cn.rst
@@ -1,11 +1,9 @@
-.. _cn_api_paddle_incubate_sparse_multiply:
+.. _cn_api_paddle_sparse_multiply:
multiply
-------------------------------
-.. py:function:: paddle.incubate.sparse.multiply(x, y, name=None)
-
-
+.. py:function:: paddle.sparse.multiply(x, y, name=None)
输入 :attr:`x` 与输入 :attr:`y` 逐元素相乘,并将各个位置的输出元素保存到返回结果中。
@@ -14,10 +12,10 @@ multiply
等式为:
.. math::
- Out = X \odot Y
+    out = x \odot y
-- :math:`X`:多维稀疏 Tensor。
-- :math:`Y`:多维稀疏 Tensor。
+- :math:`x`:多维稀疏 Tensor。
+- :math:`y`:多维稀疏 Tensor。
参数
:::::::::
@@ -33,4 +31,4 @@ multiply
代码示例
:::::::::
-COPY-FROM: paddle.incubate.sparse.multiply
+COPY-FROM: paddle.sparse.multiply
diff --git a/docs/api/paddle/sparse/mv_cn.rst b/docs/api/paddle/sparse/mv_cn.rst
new file mode 100644
index 00000000000..869f79cd940
--- /dev/null
+++ b/docs/api/paddle/sparse/mv_cn.rst
@@ -0,0 +1,38 @@
+.. _cn_api_paddle_sparse_mv:
+
+mv
+-------------------------------
+
+.. py:function:: paddle.sparse.mv(x, vec, name=None)
+
+.. note::
+ 该 API 从 `CUDA 11.0` 开始支持。
+
+输入 :attr:`x` 为稀疏矩阵,输入 :attr:`vec` 为稠密向量,对 `x` 与 `vec` 计算矩阵与向量相乘。
+
+输入、输出的格式对应关系如下:
+
+.. note::
+
+ x[SparseCsrTensor] @ vec[DenseTensor] -> out[DenseTensor]
+
+ x[SparseCooTensor] @ vec[DenseTensor] -> out[DenseTensor]
+
+该 API 支持反向传播。输入 `x` 的 shape 应该为 `[M, N]` ,输入 `vec` 的 shape 应该为 `[N]` ,输出 `out`
+的 shape 为 `[M]` 。
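
CSR 格式下矩阵与向量相乘的计算过程,可以用纯 Python 在 crows/cols/values 三数组上示意如下(仅为示意,并非 Paddle 的实际实现):

```python
def csr_mv(crows, cols, values, vec):
    """稀疏矩阵(CSR)乘稠密向量:x[M, N] @ vec[N] -> out[M](示意实现)。"""
    M = len(crows) - 1
    out = [0.0] * M
    for i in range(M):
        # crows[i]:crows[i+1] 是第 i 行非零元素在 values/cols 中的区间
        for k in range(crows[i], crows[i + 1]):
            out[i] += values[k] * vec[cols[k]]
    return out

# 稀疏矩阵 [[1, 0, 2], [0, 3, 0]] 乘向量 [1, 1, 1]
print(csr_mv([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))  # [3.0, 3.0]
```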
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的 2D 稀疏 Tensor,可以为 SparseCooTensor|SparseCsrTensor。数据类型为 float32、float64。
+ - **vec** (DenseTensor) - 输入 1D 稠密 Tensor,表示一个向量。数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+DenseTensor: 维度为 1,表示一个向量,数据类型与输入相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.mv
diff --git a/docs/api/paddle/sparse/neg_cn.rst b/docs/api/paddle/sparse/neg_cn.rst
new file mode 100644
index 00000000000..400b7931fdf
--- /dev/null
+++ b/docs/api/paddle/sparse/neg_cn.rst
@@ -0,0 +1,29 @@
+.. _cn_api_paddle_sparse_neg:
+
+neg
+-------------------------------
+
+.. py:function:: paddle.sparse.neg(x, name=None)
+
+
+逐元素计算 :attr:`x` 的相反数,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+数学公式:
+
+.. math::
+ out = -x
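
对稀疏 Tensor 逐元素取相反数时,稀疏索引保持不变,只需对非零值取负。下面用纯 Python 在 COO 格式的 indices/values 上示意(仅为示意,并非 Paddle 的实际实现):

```python
def coo_neg(indices, values):
    """COO 格式取相反数:索引不变,仅对非零值取负(示意实现)。"""
    return indices, [-v for v in values]

indices = [[0, 1], [2, 0]]   # 非零元素坐标
values = [1.5, -2.0]
new_indices, new_values = coo_neg(indices, values)
print(new_values)  # [-1.5, 2.0]
```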
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的稀疏 Tensor,可以为 Coo 或 Csr 格式,数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.neg
diff --git a/docs/api/paddle/sparse/nn/BatchNorm_cn.rst b/docs/api/paddle/sparse/nn/BatchNorm_cn.rst
new file mode 100644
index 00000000000..b76ac0b0a71
--- /dev/null
+++ b/docs/api/paddle/sparse/nn/BatchNorm_cn.rst
@@ -0,0 +1,65 @@
+.. _cn_api_paddle_sparse_nn_BatchNorm:
+
+BatchNorm
+-------------------------------
+
+.. py:class:: paddle.sparse.nn.BatchNorm(num_features, momentum=0.9, epsilon=1e-05, weight_attr=None, bias_attr=None, data_format="NDHWC", use_global_stats=None, name=None)
+
+
+构建稀疏 ``BatchNorm`` 类的一个可调用对象,具体用法参照 ``代码示例`` 。可以处理 5D SparseCooTensor,实现了批归一化层(Batch Normalization Layer)的功能,可用作卷积和全连接操作的批归一化函数,根据当前批次数据按通道计算的均值和方差进行归一化。更多详情请参考: `Batch Normalization : Accelerating Deep Network Training by Reducing Internal Covariate Shift `_ 。
+
+当 use_global_stats = False 时,:math:`\mu_{\beta}` 和 :math:`\sigma_{\beta}^{2}` 是 minibatch 的统计数据。计算公式如下:
+
+.. math::
+
+ \mu_{\beta} &\gets \frac{1}{m} \sum_{i=1}^{m} x_i \quad &// mini-batch-mean \\
+ \sigma_{\beta}^{2} &\gets \frac{1}{m} \sum_{i=1}^{m}(x_i - \mu_{\beta})^2 \quad &// mini-batch-variance \\
+
+- :math:`x` :批输入数据
+- :math:`m` :当前批次数据的大小
+
+当 use_global_stats = True :math:`\mu_{\beta}` 和 :math:`\sigma_{\beta}^{2}` 是全局(或运行)统计数据(moving_mean 和 moving_variance),通常来自预先训练好的模型。计算公式如下:
+
+.. math::
+
+ moving\_mean = moving\_mean * momentum + \mu_{\beta} * (1. - momentum) \quad &// global mean \\
+ moving\_variance = moving\_variance * momentum + \sigma_{\beta}^{2} * (1. - momentum) \quad &// global variance \\
+
+归一化函数公式如下:
+
+.. math::
+
+ \hat{x_i} &\gets \frac{x_i - \mu_\beta} {\sqrt{\sigma_{\beta}^{2} + \epsilon}} \quad &// normalize \\
+ y_i &\gets \gamma \hat{x_i} + \beta \quad &// scale-and-shift \\
+
+- :math:`\epsilon` :添加较小的值到方差中以防止除零
+- :math:`\gamma` :可训练的比例参数
+- :math:`\beta` :可训练的偏差参数
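
上述归一化与缩放平移公式可以用如下纯 Python 小示例验证(按单通道一维数据示意,gamma、beta、epsilon 为假设取值,仅为示意):

```python
def batch_norm_1d(xs, gamma=1.0, beta=0.0, epsilon=1e-5):
    """按公式计算 mini-batch 均值/方差,再做归一化与缩放平移(示意实现)。"""
    m = len(xs)
    mu = sum(xs) / m                              # mini-batch mean
    var = sum((x - mu) ** 2 for x in xs) / m      # mini-batch variance
    return [gamma * (x - mu) / (var + epsilon) ** 0.5 + beta for x in xs]

out = batch_norm_1d([1.0, 2.0, 3.0])
print([round(v, 3) for v in out])  # [-1.225, 0.0, 1.225]
```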
+
+参数
+::::::::::::
+
+ - **num_features** (int) - 指明输入 ``Tensor`` 的通道数量。
+ - **momentum** (float,可选) - 此值用于计算 ``moving_mean`` 和 ``moving_var`` 。默认值:0.9。更新公式如上所示。
+ - **epsilon** (float,可选) - 为了数值稳定加在分母上的值。默认值:1e-05。
+ - **weight_attr** (ParamAttr|bool,可选) - 指定权重参数属性的对象。如果为 False,则表示每个通道的伸缩固定为 1,不可改变。默认值为 None,表示使用默认的权重参数属性。具体用法请参见 :ref:`cn_api_ParamAttr` 。
+ - **bias_attr** (ParamAttr,可选) - 指定偏置参数属性的对象。如果为 False,则表示每一个通道的偏移固定为 0,不可改变。默认值为 None,表示使用默认的偏置参数属性。具体用法请参见 :ref:`cn_api_ParamAttr` 。
+ - **data_format** (string,可选) - 指定输入数据格式。默认值:"NDHWC",当前只支持 "NDHWC"。
+ - **use_global_stats** (bool,可选) – 指示是否使用全局均值和方差。在预测或测试模式下,将 ``use_global_stats`` 设置为 true 或将 ``is_test`` 设置为 true,这两种行为是等效的。在训练模式中,当设置 ``use_global_stats`` 为 True 时,在训练期间也将使用全局均值和方差。默认值:False。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name` ,一般无需设置,默认值为 None。
+
+
+返回
+::::::::::::
+无
+
+形状
+::::::::::::
+
+ - input:形状为(批大小,深度,高度,宽度,通道数)的 5-D SparseCooTensor。
+ - output:和输入形状一样。
+
+代码示例
+::::::::::::
+
+COPY-FROM: paddle.sparse.nn.BatchNorm
diff --git a/docs/api/paddle/sparse/nn/Conv3D_cn.rst b/docs/api/paddle/sparse/nn/Conv3D_cn.rst
new file mode 100644
index 00000000000..2f593f41b14
--- /dev/null
+++ b/docs/api/paddle/sparse/nn/Conv3D_cn.rst
@@ -0,0 +1,99 @@
+.. _cn_api_paddle_sparse_nn_Conv3D:
+
+Conv3D
+-------------------------------
+
+.. py:class:: paddle.sparse.nn.Conv3D(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, padding_mode='zeros', weight_attr=None, bias_attr=None, data_format="NDHWC")
+
+**稀疏三维卷积层**
+
+稀疏三维卷积层(sparse convolution3D layer),根据输入、卷积核、步长(stride)、填充(padding)、空洞大小(dilations)一组参数计算得到输出特征层大小。输入和输出是
+NDHWC 格式,其中 N 是批尺寸,C 是通道数,D 是特征层深度,H 是特征层高度,W 是特征层宽度。如果 bias_attr 不为 False,卷积计算会添加偏置项。
+
+对每个输入 X,有等式:
+
+.. math::
+
+ Out = W * X + b
+
+其中:
+
+ - :math:`X`:输入值,NDHWC 格式的 5-D Tensor
+ - :math:`W`:卷积核值,DHWCM 格式的 5-D Tensor
+ - :math:`*`:卷积操作
+ - :math:`b`:偏置值,1-D Tensor,形为 ``[M]``
+ - :math:`Out`:输出值,NDHWC 格式的 5-D Tensor,和 ``X`` 的形状可能不同
+
+参数
+::::::::::::
+
+ - **in_channels** (int) - 输入图像的通道数。
+ - **out_channels** (int) - 由卷积操作产生的输出的通道数。
+ - **kernel_size** (int|list|tuple) - 卷积核大小。可以为单个整数或包含三个整数的元组或列表,分别表示卷积核的深度,高和宽。如果为单个整数,表示卷积核的深度,高和宽都等于该整数。
+ - **stride** (int|list|tuple,可选) - 步长大小。可以为单个整数或包含三个整数的元组或列表,分别表示卷积沿着深度,高和宽的步长。如果为单个整数,表示沿着深度、高和宽的步长都等于该整数。默认值:1。
+ - **padding** (int|list|tuple|str,可选) - 填充大小。如果它是一个字符串,可以是"VALID"或者"SAME",表示填充算法,计算细节可参考上述 ``padding`` = "SAME"或 ``padding`` = "VALID" 时的计算公式。如果它是一个元组或列表,它可以有 3 种格式:
+
+ - (1)包含 5 个二元组:当 ``data_format`` 为"NCDHW"时为 [[0,0], [0,0], [padding_depth_front, padding_depth_back], [padding_height_top, padding_height_bottom], [padding_width_left, padding_width_right]],当 ``data_format`` 为"NDHWC"时为[[0,0], [padding_depth_front, padding_depth_back], [padding_height_top, padding_height_bottom], [padding_width_left, padding_width_right], [0,0]];
+ - (2)包含 6 个整数值:[padding_depth_front, padding_depth_back, padding_height_top, padding_height_bottom, padding_width_left, padding_width_right];
+ - (3)包含 3 个整数值:[padding_depth, padding_height, padding_width],此时 padding_depth_front = padding_depth_back = padding_depth, padding_height_top = padding_height_bottom = padding_height, padding_width_left = padding_width_right = padding_width。若为一个整数,padding_depth = padding_height = padding_width = padding。默认值:0。
+
+ - **dilation** (int|list|tuple,可选) - 空洞大小。可以为单个整数或包含三个整数的元组或列表,分别表示卷积核中的元素沿着深度,高和宽的空洞。如果为单个整数,表示深度,高和宽的空洞都等于该整数。默认值:1。
+ - **groups** (int,可选) - 三维卷积层的组数。根据 Alex Krizhevsky 的深度卷积神经网络(CNN)论文中的成组卷积:当 group=n,输入和卷积核分别根据通道数量平均分为 n 组,第一组卷积核和第一组输入进行卷积计算,第二组卷积核和第二组输入进行卷积计算,……,第 n 组卷积核和第 n 组输入进行卷积计算。默认值:1。
+ - **padding_mode** (str,可选) - 填充模式。包括 ``'zeros'``, ``'reflect'``, ``'replicate'`` 或者 ``'circular'``。默认值:``'zeros'`` 。
+ - **weight_attr** (ParamAttr,可选) - 指定权重参数属性的对象。默认值为 None,表示使用默认的权重参数属性。具体用法请参见 :ref:`cn_api_fluid_ParamAttr` 。
+ - **bias_attr** (ParamAttr|bool,可选)- 指定偏置参数属性的对象。若 ``bias_attr`` 为 bool 类型,只支持为 False,表示没有偏置参数。默认值为 None,表示使用默认的偏置参数属性。具体用法请参见 :ref:`cn_api_fluid_ParamAttr` 。
+ - **data_format** (str,可选) - 指定输入的数据格式,输出的数据格式将与输入保持一致,可以是"NCDHW"和"NDHWC"。N 是批尺寸,C 是通道数,D 是特征深度,H 是特征高度,W 是特征宽度。默认值:"NDHWC"。 当前只支持"NDHWC"。
+
+
+属性
+::::::::::::
+
+weight
+'''''''''
+本层的可学习参数,类型为 ``Parameter``
+
+bias
+'''''''''
+本层的可学习偏置,类型为 ``Parameter``
+
+形状
+::::::::::::
+
+ - 输入::math:`(N, D_{in}, H_{in}, W_{in}, C_{in})`
+ - 卷积核::math:`(K_{d}, K_{h}, K_{w}, C_{in}, C_{out})`
+ - 偏置::math:`(C_{out})`
+ - 输出::math:`(N, D_{out}, H_{out}, W_{out}, C_{out})`
+
+ 其中
+
+ .. math::
+
+ D_{out} &= \frac{\left ( D_{in} + padding\_depth\_front + padding\_depth\_back-\left ( dilation[0]*\left ( kernel\_size[0]-1 \right )+1 \right ) \right )}{stride[0]}+1
+
+ H_{out} &= \frac{\left ( H_{in} + padding\_height\_top + padding\_height\_bottom-\left ( dilation[1]*\left ( kernel\_size[1]-1 \right )+1 \right ) \right )}{stride[1]}+1
+
+ W_{out} &= \frac{\left ( W_{in} + padding\_width\_left + padding\_width\_right -\left ( dilation[2]*\left ( kernel\_size[2]-1 \right )+1 \right ) \right )}{stride[2]}+1
+
+ 如果 ``padding`` = "SAME":
+
+ .. math::
+ D_{out} = \frac{(D_{in} + stride[0] - 1)}{stride[0]}
+
+ H_{out} = \frac{(H_{in} + stride[1] - 1)}{stride[1]}
+
+ W_{out} = \frac{(W_{in} + stride[2] - 1)}{stride[2]}
+
+ 如果 ``padding`` = "VALID":
+
+ .. math::
+ D_{out} = \frac{\left ( D_{in} -\left ( dilation[0]*\left ( kernel\_size[0]-1 \right )+1 \right ) \right )}{stride[0]}+1
+
+ H_{out} = \frac{\left ( H_{in} -\left ( dilation[1]*\left ( kernel\_size[1]-1 \right )+1 \right ) \right )}{stride[1]}+1
+
+ W_{out} = \frac{\left ( W_{in} -\left ( dilation[2]*\left ( kernel\_size[2]-1 \right )+1 \right ) \right )}{stride[2]}+1
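
上述输出尺寸公式可以用一个纯 Python 小函数验证(除法向下取整,仅为公式示意,并非 Paddle 的实际实现):

```python
def conv_out_size(in_size, kernel, stride, dilation, pad_front, pad_back):
    """按公式计算单个空间维度上的卷积输出大小(示意实现)。"""
    effective_kernel = dilation * (kernel - 1) + 1   # 空洞卷积的等效卷积核大小
    return (in_size + pad_front + pad_back - effective_kernel) // stride + 1

# D_in=8, kernel=3, stride=2, dilation=1, 两侧各 padding 1
print(conv_out_size(8, 3, 2, 1, 1, 1))  # 4
```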
+
+
+代码示例
+::::::::::::
+
+COPY-FROM: paddle.sparse.nn.Conv3D
diff --git a/docs/api/paddle/sparse/nn/LeakyReLU_cn.rst b/docs/api/paddle/sparse/nn/LeakyReLU_cn.rst
new file mode 100644
index 00000000000..5c61d818c2f
--- /dev/null
+++ b/docs/api/paddle/sparse/nn/LeakyReLU_cn.rst
@@ -0,0 +1,33 @@
+.. _cn_api_paddle_sparse_nn_LeakyReLU:
+
+LeakyReLU
+-------------------------------
+.. py:class:: paddle.sparse.nn.LeakyReLU(negative_slope=0.01, name=None)
+
+稀疏 LeakyReLU 激活层,创建一个可调用对象以计算输入 `x` 的 `LeakyReLU` 。
+
+.. math::
+ LeakyReLU(x)=
+ \left\{
+ \begin{array}{rcl}
+ x, & & if \ x >= 0 \\
+ negative\_slope * x, & & otherwise \\
+ \end{array}
+ \right.
+
+其中,:math:`x` 为输入的 Tensor
+
+参数
+::::::::::
+ - **negative_slope** (float,可选) - :math:`x < 0` 时的斜率。默认值为 0.01。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+形状
+:::::::::
+ - input:任意形状的 SparseTensor。
+ - output:和 input 具有相同形状和数据类型的 SparseTensor。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.nn.LeakyReLU
diff --git a/docs/api/paddle/sparse/nn/MaxPool3D_cn.rst b/docs/api/paddle/sparse/nn/MaxPool3D_cn.rst
new file mode 100644
index 00000000000..e966be62fb9
--- /dev/null
+++ b/docs/api/paddle/sparse/nn/MaxPool3D_cn.rst
@@ -0,0 +1,36 @@
+.. _cn_api_paddle_sparse_nn_MaxPool3D:
+
+MaxPool3D
+-------------------------------
+
+.. py:class:: paddle.sparse.nn.MaxPool3D(kernel_size, stride=None, padding=0, ceil_mode=False, return_mask=False, data_format="NDHWC", name=None)
+
+构建 `MaxPool3D` 类的一个可调用对象,其将构建一个三维最大池化层,根据输入参数 `kernel_size`, `stride`,
+`padding` 等参数对稀疏输入特征做最大池化操作。输入输出都是 "NDHWC" 格式,其中 N 是批大小,D 是特征的深度,H 是特征的高,W 是特征的宽,C 是特征的通道数。
+
+参数
+:::::::::
+ - **kernel_size** (int|list|tuple) - 池化核大小。如果它是一个元组或列表,它必须包含三个整数值,(pool_size_Depth,pool_size_Height, pool_size_Width)。若为一个整数,则表示 D,H 和 W 维度上均为该值,比如若 kernel_size=2,则池化核大小为[2,2,2]。
+ - **stride** (int|list|tuple,可选) - 池化层的步长。如果它是一个元组或列表,它将包含三个整数,(pool_stride_Depth,pool_stride_Height, pool_stride_Width)。若为一个整数,则表示 D, H 和 W 维度上 stride 均为该值。默认值为 None ,这时会使用 kernel_size 作为 stride 。
+ - **padding** (str|int|list|tuple,可选) - 池化填充。如果它是一个字符串,可以是"VALID"或者"SAME",表示填充算法。如果它是一个元组或列表,它可以有 3 种格式:(1)包含 3 个整数值:[pad_depth, pad_height, pad_width];(2)包含 6 个整数值:[pad_depth_front, pad_depth_back, pad_height_top, pad_height_bottom, pad_width_left, pad_width_right];(3)包含 5 个二元组:当 data_format 为"NCDHW"时为[[0,0], [0,0], [pad_depth_front, pad_depth_back], [pad_height_top, pad_height_bottom], [pad_width_left, pad_width_right]],当 data_format 为"NDHWC"时为[[0,0], [pad_depth_front, pad_depth_back], [pad_height_top, pad_height_bottom], [pad_width_left, pad_width_right], [0,0]]。若为一个整数,则表示 D、H 和 W 维度上均为该值。默认值:0 。
+ - **ceil_mode** (bool,可选) - 是否用 ceil 函数计算输出高度和宽度。如果是 True ,则使用 `ceil` 计算输出形状的大小。默认为 False 。
+ - **return_mask** (bool,可选) - 是否返回最大索引和输出。默认为 False 。
+ - **data_format** (str,可选) - 输入和输出的数据格式,可以是"NCDHW"和"NDHWC"。N 是批尺寸,C 是通道数,D 是特征深度,H 是特征高度,W 是特征宽度。当前只支持:"NDHWC" 。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name` ,一般无需设置,默认值为 None 。
+
+
+形状
+:::::::::
+ - **x** (Tensor):默认形状为(批大小,深度,高度,宽度,通道数),即 NDHWC 格式的 5-D SparseCooTensor。其数据类型为 float32 或 float64。
+ - **output** (Tensor):默认形状为(批大小,输出特征深度,输出特征高度,输出特征宽度,通道数),即 NDHWC 格式的 5-D SparseCooTensor。其数据类型与输入相同。
+
+
+返回
+:::::::::
+计算 MaxPool3D 的可调用对象
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.nn.MaxPool3D
diff --git a/docs/api/paddle/sparse/nn/ReLU6_cn.rst b/docs/api/paddle/sparse/nn/ReLU6_cn.rst
new file mode 100644
index 00000000000..fe6987130eb
--- /dev/null
+++ b/docs/api/paddle/sparse/nn/ReLU6_cn.rst
@@ -0,0 +1,26 @@
+.. _cn_api_paddle_sparse_nn_ReLU6:
+
+ReLU6
+-------------------------------
+.. py:class:: paddle.sparse.nn.ReLU6(name=None)
+
+稀疏 ReLU6 激活层,创建一个可调用对象以计算输入 `x` 的 `ReLU6` 。
+
+.. math::
+ ReLU6(x) = min(max(0,x), 6)
+
+其中,:math:`x` 为输入的 Tensor
+
+参数
+::::::::::
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+形状
+:::::::::
+ - input:任意形状的 SparseTensor。
+ - output:和 input 具有相同形状和数据类型的 SparseTensor。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.nn.ReLU6
diff --git a/docs/api/paddle/sparse/nn/ReLU_cn.rst b/docs/api/paddle/sparse/nn/ReLU_cn.rst
new file mode 100644
index 00000000000..ce8abe3b462
--- /dev/null
+++ b/docs/api/paddle/sparse/nn/ReLU_cn.rst
@@ -0,0 +1,26 @@
+.. _cn_api_paddle_sparse_nn_ReLU:
+
+ReLU
+-------------------------------
+.. py:class:: paddle.sparse.nn.ReLU(name=None)
+
+稀疏 ReLU 激活层,创建一个可调用对象以计算输入 `x` 的 `ReLU` 。
+
+.. math::
+ ReLU(x) = max(x, 0)
+
+其中,:math:`x` 为输入的 Tensor
+
+参数
+::::::::::
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+形状
+:::::::::
+ - input:任意形状的 SparseTensor。
+ - output:和 input 具有相同形状和数据类型的 SparseTensor。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.nn.ReLU
diff --git a/docs/api/paddle/sparse/nn/Softmax_cn.rst b/docs/api/paddle/sparse/nn/Softmax_cn.rst
new file mode 100644
index 00000000000..5e8137a4384
--- /dev/null
+++ b/docs/api/paddle/sparse/nn/Softmax_cn.rst
@@ -0,0 +1,29 @@
+.. _cn_api_paddle_sparse_nn_Softmax:
+
+Softmax
+-------------------------------
+.. py:class:: paddle.sparse.nn.Softmax(axis=-1, name=None)
+
+稀疏 Softmax 激活层,创建一个可调用对象以计算输入 `x` 的 `Softmax` 。
+
+当输入 `x` 为 `SparseCsrTensor` 时,仅支持 axis=-1,这是因为 CSR 稀疏存储格式更适合按行读取数据。
+
+如果将 `x` 从稀疏矩阵转换为稠密矩阵,:math:`i` 代表行下标,:math:`j` 代表列下标,且 axis=-1 时有如下公式:
+
+.. math::
+    softmax_{ij} = \frac{\exp(x_{ij} - \max_j(x_{ij}))}{\sum_j \exp(x_{ij} - \max_j(x_{ij}))}
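
按行计算的稀疏 softmax 可以用纯 Python 在 CSR 的 crows/values 数组上示意如下(只对每行的非零值做数值稳定的 softmax,仅为示意,并非 Paddle 的实际实现):

```python
import math

def csr_softmax(crows, values):
    """对 CSR 每一行的非零值做数值稳定的 softmax(axis=-1,示意实现)。"""
    out = []
    for i in range(len(crows) - 1):
        row = values[crows[i]:crows[i + 1]]
        if not row:
            continue
        m = max(row)                          # 减去行最大值保证数值稳定
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.extend(e / s for e in exps)
    return out

print(csr_softmax([0, 2, 3], [1.0, 1.0, 5.0]))  # [0.5, 0.5, 1.0]
```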
+
+参数
+::::::::::
+ - **axis** (int, 可选) - 指定对输入 SparseTensor 计算 softmax 的轴。对于 SparseCsrTensor,仅支持 axis=-1。默认值:-1。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+形状
+:::::::::
+ - input:任意形状的 SparseTensor。
+ - output:和 input 具有相同形状和数据类型的 SparseTensor。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.nn.Softmax
diff --git a/docs/api/paddle/sparse/nn/SubmConv3D_cn.rst b/docs/api/paddle/sparse/nn/SubmConv3D_cn.rst
new file mode 100644
index 00000000000..645de32af4f
--- /dev/null
+++ b/docs/api/paddle/sparse/nn/SubmConv3D_cn.rst
@@ -0,0 +1,100 @@
+.. _cn_api_paddle_sparse_nn_SubmConv3D:
+
+SubmConv3D
+-------------------------------
+
+.. py:class:: paddle.sparse.nn.SubmConv3D(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, padding_mode='zeros', key=None, weight_attr=None, bias_attr=None, data_format="NDHWC")
+
+**子流形稀疏三维卷积层**
+
+子流形稀疏三维卷积层(submanifold sparse convolution3D layer),根据输入、卷积核、步长(stride)、填充(padding)、空洞大小(dilations)一组参数计算得到输出特征层大小。输入和输出是
+NDHWC 格式,其中 N 是批尺寸,C 是通道数,D 是特征层深度,H 是特征层高度,W 是特征层宽度。如果 bias_attr 不为 False,卷积计算会添加偏置项。
+
+对每个输入 X,有等式:
+
+.. math::
+
+ Out = W * X + b
+
+其中:
+
+ - :math:`X` :输入值,NDHWC 格式的 5-D Tensor
+ - :math:`W` :卷积核值,DHWCM 格式的 5-D Tensor
+ - :math:`*` :子流形稀疏卷积操作的定义参考论文:https://arxiv.org/abs/1706.01307
+ - :math:`b` :偏置值,1-D Tensor,形为 ``[M]``
+ - :math:`Out` :输出值,NDHWC 格式的 5-D Tensor,和 ``X`` 的形状可能不同
+
+参数
+::::::::::::
+
+ - **in_channels** (int) - 输入图像的通道数。
+ - **out_channels** (int) - 由卷积操作产生的输出的通道数。
+ - **kernel_size** (int|list|tuple) - 卷积核大小。可以为单个整数或包含三个整数的元组或列表,分别表示卷积核的深度,高和宽。如果为单个整数,表示卷积核的深度,高和宽都等于该整数。
+ - **stride** (int|list|tuple,可选) - 步长大小。可以为单个整数或包含三个整数的元组或列表,分别表示卷积沿着深度,高和宽的步长。如果为单个整数,表示沿着深度、高和宽的步长都等于该整数。默认值:1。
+ - **padding** (int|list|tuple|str,可选) - 填充大小。如果它是一个字符串,可以是 "VALID" 或者 "SAME" ,表示填充算法,计算细节可参考上述 ``padding`` = "SAME" 或 ``padding`` = "VALID" 时的计算公式。如果它是一个元组或列表,它可以有 3 种格式:
+
+ - (1)包含 5 个二元组:当 ``data_format`` 为 "NCDHW" 时为 [[0,0], [0,0], [padding_depth_front, padding_depth_back], [padding_height_top, padding_height_bottom], [padding_width_left, padding_width_right]],当 ``data_format`` 为 "NDHWC" 时为[[0,0], [padding_depth_front, padding_depth_back], [padding_height_top, padding_height_bottom], [padding_width_left, padding_width_right], [0,0]];
+ - (2)包含 6 个整数值:[padding_depth_front, padding_depth_back, padding_height_top, padding_height_bottom, padding_width_left, padding_width_right];
+ - (3)包含 3 个整数值:[padding_depth, padding_height, padding_width],此时 padding_depth_front = padding_depth_back = padding_depth, padding_height_top = padding_height_bottom = padding_height, padding_width_left = padding_width_right = padding_width。若为一个整数,padding_depth = padding_height = padding_width = padding。默认值:0。
+
+ - **dilation** (int|list|tuple,可选) - 空洞大小。可以为单个整数或包含三个整数的元组或列表,分别表示卷积核中的元素沿着深度,高和宽的空洞。如果为单个整数,表示深度,高和宽的空洞都等于该整数。默认值:1。
+ - **groups** (int,可选) - 三维卷积层的组数。根据 Alex Krizhevsky 的深度卷积神经网络(CNN)论文中的成组卷积:当 group = n ,输入和卷积核分别根据通道数量平均分为 n 组,第一组卷积核和第一组输入进行卷积计算,第二组卷积核和第二组输入进行卷积计算,……,第 n 组卷积核和第 n 组输入进行卷积计算。默认值:1。
+ - **padding_mode** (str,可选) - 填充模式。包括 ``'zeros'``, ``'reflect'``, ``'replicate'`` 或者 ``'circular'`` 。默认值:``'zeros'`` 。
+ - **key** (str,可选) - 这个 key 是用来保存或者使用相同的 rulebook ,rulebook 的定义参考论文:https://pdfs.semanticscholar.org/5125/a16039cabc6320c908a4764f32596e018ad3.pdf。 默认是 None。
+ - **weight_attr** (ParamAttr,可选) - 指定权重参数属性的对象。默认值为 None,表示使用默认的权重参数属性。具体用法请参见 :ref:`cn_api_fluid_ParamAttr` 。
+ - **bias_attr** (ParamAttr|bool,可选)- 指定偏置参数属性的对象。若 ``bias_attr`` 为 bool 类型,只支持为 False,表示没有偏置参数。默认值为 None,表示使用默认的偏置参数属性。具体用法请参见 :ref:`cn_api_fluid_ParamAttr` 。
+ - **data_format** (str,可选) - 指定输入的数据格式,输出的数据格式将与输入保持一致,可以是 "NCDHW" 和 "NDHWC" 。N 是批尺寸,C 是通道数,D 是特征深度,H 是特征高度,W 是特征宽度。默认值:"NDHWC" 。 当前只支持 "NDHWC" 。
+
+
+属性
+::::::::::::
+
+weight
+'''''''''
+本层的可学习参数,类型为 ``Parameter``
+
+bias
+'''''''''
+本层的可学习偏置,类型为 ``Parameter``
+
+形状
+::::::::::::
+
+ - 输入::math:`(N, D_{in}, H_{in}, W_{in}, C_{in})`
+ - 卷积核::math:`(K_{d}, K_{h}, K_{w}, C_{in}, C_{out})`
+ - 偏置::math:`(C_{out})`
+ - 输出::math:`(N, D_{out}, H_{out}, W_{out}, C_{out})`
+
+ 其中
+
+ .. math::
+
+ D_{out} &= \frac{\left ( D_{in} + padding\_depth\_front + padding\_depth\_back-\left ( dilation[0]*\left ( kernel\_size[0]-1 \right )+1 \right ) \right )}{stride[0]}+1
+
+ H_{out} &= \frac{\left ( H_{in} + padding\_height\_top + padding\_height\_bottom-\left ( dilation[1]*\left ( kernel\_size[1]-1 \right )+1 \right ) \right )}{stride[1]}+1
+
+ W_{out} &= \frac{\left ( W_{in} + padding\_width\_left + padding\_width\_right -\left ( dilation[2]*\left ( kernel\_size[2]-1 \right )+1 \right ) \right )}{stride[2]}+1
+
+ 如果 ``padding`` = "SAME":
+
+ .. math::
+ D_{out} = \frac{(D_{in} + stride[0] - 1)}{stride[0]}
+
+ H_{out} = \frac{(H_{in} + stride[1] - 1)}{stride[1]}
+
+ W_{out} = \frac{(W_{in} + stride[2] - 1)}{stride[2]}
+
+ 如果 ``padding`` = "VALID":
+
+ .. math::
+ D_{out} = \frac{\left ( D_{in} -\left ( dilation[0]*\left ( kernel\_size[0]-1 \right )+1 \right ) \right )}{stride[0]}+1
+
+ H_{out} = \frac{\left ( H_{in} -\left ( dilation[1]*\left ( kernel\_size[1]-1 \right )+1 \right ) \right )}{stride[1]}+1
+
+ W_{out} = \frac{\left ( W_{in} -\left ( dilation[2]*\left ( kernel\_size[2]-1 \right )+1 \right ) \right )}{stride[2]}+1
+
+
+代码示例
+::::::::::::
+
+COPY-FROM: paddle.sparse.nn.SubmConv3D
diff --git a/docs/api/paddle/sparse/nn/SyncBatchNorm_cn.rst b/docs/api/paddle/sparse/nn/SyncBatchNorm_cn.rst
new file mode 100644
index 00000000000..578cbcd99c6
--- /dev/null
+++ b/docs/api/paddle/sparse/nn/SyncBatchNorm_cn.rst
@@ -0,0 +1,83 @@
+.. _cn_api_paddle_sparse_nn_SyncBatchNorm:
+
+SyncBatchNorm
+-------------------------------
+
+.. py:class:: paddle.sparse.nn.SyncBatchNorm(num_features, epsilon=1e-5, momentum=0.9, weight_attr=None, bias_attr=None, data_format='NCHW', name=None)
+
+构建 ``SyncBatchNorm`` 类的一个可调用对象,具体用法参照 ``代码示例`` 。实现了跨卡 GPU 同步的批归一化(Cross-GPU Synchronized Batch Normalization Layer)的功能,可用在其他层(类似卷积层和全连接层)之后进行归一化操作。根据所有 GPU 同一批次的数据按照通道计算的均值和方差进行归一化。更多详情请参考:`Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift `_ 。
+
+当模型处于训练模式时,:math:`\mu_{\beta}` 和 :math:`\sigma_{\beta}^{2}` 是所有 GPU 上同一 minibatch 的统计数据。计算公式如下:
+
+.. math::
+ \mu_{\beta} &\gets \frac{1}{m} \sum_{i=1}^{m} x_i \quad &// mini-batch-mean \\
+ \sigma_{\beta}^{2} &\gets \frac{1}{m} \sum_{i=1}^{m}(x_i - \mu_{\beta})^2 \quad &// mini-batch-variance \\
+
+- :math:`x` :所有 GPU 上同一批输入数据
+- :math:`m` :所有 GPU 上同一批次数据的大小
+
+当模型处于评估模式时,:math:`\mu_{\beta}` 和 :math:`\sigma_{\beta}^{2}` 是全局(或运行)统计数据(moving_mean 和 moving_variance,这两个统计量通常来自预先训练好的模型)。计算公式如下:
+
+.. math::
+
+ moving\_mean = moving\_mean * momentum + \mu_{\beta} * (1. - momentum) \quad &// global mean \\
+ moving\_variance = moving\_variance * momentum + \sigma_{\beta}^{2} * (1. - momentum) \quad &// global variance \\
+
+归一化函数公式如下:
+
+.. math::
+
+ \hat{x_i} &\gets \frac{x_i - \mu_\beta} {\sqrt{\sigma_{\beta}^{2} + \epsilon}} \quad &// normalize \\
+ y_i &\gets \gamma \hat{x_i} + \beta \quad &// scale-and-shift \\
+
+- :math:`\epsilon` :添加较小的值到方差中以防止除零
+- :math:`\gamma` :可训练的比例参数
+- :math:`\beta` :可训练的偏差参数
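
全局统计量的滑动平均更新可以用如下纯 Python 小示例验证(momentum 取默认值 0.9,仅为公式示意,并非 Paddle 的实际实现):

```python
def update_running_stats(moving_mean, moving_var, batch_mean, batch_var, momentum=0.9):
    """按公式更新 moving_mean / moving_variance(示意实现)。"""
    new_mean = moving_mean * momentum + batch_mean * (1.0 - momentum)
    new_var = moving_var * momentum + batch_var * (1.0 - momentum)
    return new_mean, new_var

print(tuple(round(v, 6) for v in update_running_stats(0.0, 1.0, 2.0, 4.0)))  # (0.2, 1.3)
```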
+
+.. note::
+
+ 如果您想用容器封装您的模型,而且您的模型在预测阶段中包含 ``SyncBatchNorm`` 这个算子的话,请使用 ``nn.LayerList`` 或者 ``nn.Sequential`` 而不要直接使用 ``list`` 来封装模型。
+
+参数
+::::::::::::
+
+ - **num_features** (int) - 指明输入 ``Tensor`` 的通道数量。
+ - **epsilon** (float,可选) - 为了数值稳定加在分母上的值。默认值:1e-05。
+ - **momentum** (float,可选) - 此值用于计算 ``moving_mean`` 和 ``moving_var`` 。默认值:0.9。更新公式如上所示。
+ - **weight_attr** (ParamAttr|bool,可选) - 指定权重参数属性的对象。如果设置为 ``False`` ,则表示本层没有可训练的权重参数。默认值为 None,表示使用默认的权重参数属性。具体用法请参见 :ref:`cn_api_fluid_ParamAttr` 。
+ - **bias_attr** (ParamAttr|bool,可选) - 指定偏置参数属性的对象。如果设置为 ``False`` ,则表示本层没有可训练的偏置参数。默认值为 None,表示使用默认的偏置参数属性。具体用法请参见 :ref:`cn_api_fluid_ParamAttr` 。
+ - **data_format** (string,可选) - 指定输入数据格式,数据格式可以为 "NCHW"。默认值:"NCHW"。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name` ,一般无需设置,默认值为 None。
+
+形状
+::::::::::::
+
+ - input:一个二维到五维的 ``Tensor`` 。
+ - output:和 input 相同形状的 ``Tensor`` 。
+
+代码示例
+::::::::::::
+
+COPY-FROM: paddle.sparse.nn.SyncBatchNorm
+
+方法
+:::::::::
+convert_sync_batchnorm(layer)
+'''''''''''''''''''''''''''''
+
+把 ``BatchNorm`` 层转换为 ``SyncBatchNorm`` 层。
+
+参数
+::::::::::::
+
+ - **layer** (paddle.nn.Layer) - 包含一个或多个 ``BatchNorm`` 层的模型。
+
+返回
+::::::::::::
+
+ 如果原始模型中有 ``BatchNorm`` 层,则把 ``BatchNorm`` 层转换为 ``SyncBatchNorm`` 层的原始模型。
+
+代码示例
+::::::::::::
+
+COPY-FROM: paddle.sparse.nn.SyncBatchNorm.convert_sync_batchnorm
diff --git a/docs/api/paddle/sparse/nn/functional/attention_cn.rst b/docs/api/paddle/sparse/nn/functional/attention_cn.rst
new file mode 100644
index 00000000000..5f36ad1c5cc
--- /dev/null
+++ b/docs/api/paddle/sparse/nn/functional/attention_cn.rst
@@ -0,0 +1,41 @@
+.. _cn_api_paddle_sparse_nn_functional_attention:
+
+attention
+-------------------------------
+.. py:function:: paddle.sparse.nn.functional.attention(query, key, value, sparse_mask, key_padding_mask=None, attn_mask=None, name=None)
+
+.. note::
+ 该 API 从 `CUDA 11.7` 开始支持。
+
+稀疏 Attention,该 API 内部使用 SparseCsrTensor 来存储 Transformer 模块中的 attention 矩阵,从而达到减少显存占用、提高性能的目的。
+参数 `sparse_mask` 描述了稀疏矩阵的非 0 元素索引布局。
+
+.. math::
+ result = softmax(\frac{ Q * K^T }{\sqrt{d}}) * V
+
+其中,矩阵 `Q`、`K`、`V` 表示 attention 模块的三个输入 Tensor,其 shape 均为 `[batch_size, num_heads, seq_len, head_dim]` ,
+公式中的 `d` 代表 `head_dim` 。
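
上述公式的稠密等价计算可以用 numpy 示意如下(忽略稀疏 mask 与批维度,函数名与形状仅为示意,并非 Paddle 的实际实现):

```python
import numpy as np

def dense_attention(q, k, v):
    """softmax(Q @ K^T / sqrt(d)) @ V 的稠密示意实现,q/k/v 形状为 [seq_len, head_dim]。"""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)      # 减去行最大值保证数值稳定
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

q = k = v = np.eye(2, dtype=np.float64)
out = dense_attention(q, k, v)
print(out.shape)  # (2, 2)
```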
+
+参数
+::::::::::
+ - **query** (DenseTensor) - Attention 模块的 `query` 输入,4D Tensor,数据类型为 float32、float64。
+ - **key** (DenseTensor) - Attention 模块的 `key` 输入,4D Tensor,数据类型为 float32、float64。
+ - **value** (DenseTensor) - Attention 模块的 `value` 输入,4D Tensor,数据类型为 float32、float64。
+ - **sparse_mask** (SparseCsrTensor) - Attention 模块的非 0 元素布局,是一个 3D 的 SparseCsrTensor,shape 为 `[batch_size*num_heads, seq_len, seq_len]` 。
+ 同时每个批次的非 0 元素个数均相等。`crows` 和 `cols` 的数据类型为 int64,`value` 的数据类型为 float32、float64。
+ - **key_padding_mask** (DenseTensor, 可选) - Attention 模块中的 key padding mask,是一个 2D 的 DenseTensor,shape 为 `[batch_size, seq_len]` 。
+ 数据类型为 float32、float64。默认:None,表示无此掩码运算。
+ - **attn_mask** (DenseTensor, 可选) - Attention 模块中的 attention mask,是一个 2D 的 DenseTensor,shape 为 `[seq_len, seq_len]` 。
+ 数据类型为 float32、float64。默认:None,表示无此掩码运算。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+
+返回
+:::::::::
+DenseTensor: 维度为 4,shape 为 `[batch_size, num_heads, seq_len, head_dim]` ,dtype 与输入相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.nn.functional.attention
diff --git a/docs/api/paddle/sparse/nn/functional/conv3d_cn.rst b/docs/api/paddle/sparse/nn/functional/conv3d_cn.rst
new file mode 100644
index 00000000000..2f5f5c1aec4
--- /dev/null
+++ b/docs/api/paddle/sparse/nn/functional/conv3d_cn.rst
@@ -0,0 +1,69 @@
+.. _cn_api_paddle_sparse_nn_functional_conv3d:
+
+conv3d
+-------------------------------
+
+.. py:function:: paddle.sparse.nn.functional.conv3d(x, weight, bias=None, stride=1, padding=0, dilation=1, groups=1, data_format="NDHWC", name=None)
+
+稀疏三维卷积层(convolution3D layer),根据输入、卷积核、步长(stride)、填充(padding)、空洞大小(dilations)一组参数计算得到输出特征层大小。输入和输出是 NDHWC 格式,其中 N 是批尺寸,C 是通道数,D 是特征层深度,H 是特征层高度,W 是特征层宽度。如果 bias 不为 None,卷积计算会添加偏置项。
+
+对每个输入 X ,有等式:
+
+.. math::
+
+ Out = W * X + b
+
+其中:
+
+ - :math:`X` :输入值,NDHWC 格式的 5-D Tensor
+ - :math:`W` :卷积核值,DHWCM 格式的 5-D Tensor
+ - :math:`*` :卷积操作
+ - :math:`b` :偏置值,1-D Tensor,形为 ``[M]``
+ - :math:`Out` :输出值,NDHWC 格式的 5-D Tensor,和 ``X`` 的形状可能不同
+
+**示例**
+
+- 输入:
+
+ 输入形状::math:`(N, D_{in}, H_{in}, W_{in}, C_{in})`
+
+ 卷积核形状::math:`(D_f, H_f, W_f, C_{in}, C_{out})`
+
+- 输出:
+
+ 输出形状::math:`(N, D_{out}, H_{out}, W_{out}, C_{out})`
+
+参数
+::::::::::::
+
+ - **x** (Tensor) - 输入是形状为 :math:`[N, D, H, W, C]` 的 5-D SparseCooTensor,N 是批尺寸,C 是通道数,D 是特征层深度,H 是特征高度,W 是特征宽度,数据类型为 float16, float32 或 float64 。
+ - **weight** (Tensor) - 形状为 :math:`[kD, kH, kW, C/g, M]` 的卷积核。M 是输出通道数,g 是分组的个数,kD 是卷积核的深度,kH 是卷积核的高度,kW 是卷积核的宽度。
+ - **bias** (Tensor,可选) - 偏置项,形状为::math:`[M]` 。
+ - **stride** (int|list|tuple,可选) - 步长大小。卷积核和输入进行卷积计算时滑动的步长。
+
+ - 如果它是一个列表或元组,则必须包含三个整型数:(stride_depth, stride_height,stride_width)。
+ - 若为一个整数,stride_depth = stride_height = stride_width = stride。默认值:1。
+ - **padding** (int|list|tuple|str,可选) - 填充大小。如果它是一个字符串,可以是"VALID"或者"SAME",表示填充算法,计算细节可参考上述 ``padding`` = "SAME"或 ``padding`` = "VALID" 时的计算公式。如果它是一个元组或列表,它可以有 3 种格式:
+
+ - (1)包含 5 个二元组:当 ``data_format`` 为"NCDHW"时为 [[0,0], [0,0], [padding_depth_front, padding_depth_back], [padding_height_top, padding_height_bottom], [padding_width_left, padding_width_right]],当 ``data_format`` 为"NDHWC"时为[[0,0], [padding_depth_front, padding_depth_back], [padding_height_top, padding_height_bottom], [padding_width_left, padding_width_right], [0,0]];
+ - (2)包含 6 个整数值:[padding_depth_front, padding_depth_back, padding_height_top, padding_height_bottom, padding_width_left, padding_width_right];
+ - (3)包含 3 个整数值:[padding_depth, padding_height, padding_width],此时 padding_depth_front = padding_depth_back = padding_depth, padding_height_top = padding_height_bottom = padding_height, padding_width_left = padding_width_right = padding_width。若为一个整数,padding_depth = padding_height = padding_width = padding。默认值:0。
+ - **dilation** (int|list|tuple,可选) - 空洞大小。空洞卷积时会使用该参数,卷积核对输入进行卷积时,感受野里每相邻两个特征点之间的空洞信息。如果空洞大小为列表或元组,则必须包含三个整型数:(dilation_depth,dilation_height,dilation_width)。若为一个整数,dilation_depth = dilation_height = dilation_width = dilation。默认值:1。
+ - **groups** (int,可选) - 三维卷积层的组数。根据 Alex Krizhevsky 的深度卷积神经网络(CNN)论文中的成组卷积:当 group=n,输入和卷积核分别根据通道数量平均分为 n 组,第一组卷积核和第一组输入进行卷积计算,第二组卷积核和第二组输入进行卷积计算,……,第 n 组卷积核和第 n 组输入进行卷积计算。默认值:1。
+ - **data_format** (str,可选) - 指定输入的数据格式,输出的数据格式将与输入保持一致。N 是批尺寸,C 是通道数,D 是特征层深度,H 是特征高度,W 是特征宽度。默认值:"NDHWC",当前只支持 "NDHWC"。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name` ,一般无需设置,默认值为 None。
+
+返回
+::::::::::::
+5-D Tensor ,数据类型与 ``input`` 一致。返回卷积计算的结果。
+
+返回类型
+::::::::::::
+Tensor。
+
+代码示例
+::::::::::::
+
+COPY-FROM: paddle.sparse.nn.functional.conv3d
diff --git a/docs/api/paddle/sparse/nn/functional/leaky_relu_cn.rst b/docs/api/paddle/sparse/nn/functional/leaky_relu_cn.rst
new file mode 100644
index 00000000000..331ef21ed0c
--- /dev/null
+++ b/docs/api/paddle/sparse/nn/functional/leaky_relu_cn.rst
@@ -0,0 +1,33 @@
+.. _cn_api_paddle_sparse_nn_functional_leaky_relu:
+
+leaky_relu
+-------------------------------
+.. py:function:: paddle.sparse.nn.functional.leaky_relu(x, negative_slope=0.01, name=None)
+
+稀疏 leaky_relu 激活函数,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+.. math::
+    leaky\_relu(x)=
+ \left\{
+ \begin{array}{rcl}
+ x, & & if \ x >= 0 \\
+ negative\_slope * x, & & otherwise \\
+ \end{array}
+ \right.
+
+其中,:math:`x` 为输入的 Tensor。
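
上述分段函数可以用纯 Python 示意如下(对稀疏 Tensor 的每个非零值逐个应用,仅为示意,并非 Paddle 的实际实现):

```python
def leaky_relu(v, negative_slope=0.01):
    """x >= 0 时返回 x,否则返回 negative_slope * x(示意实现)。"""
    return v if v >= 0 else negative_slope * v

print([leaky_relu(v) for v in [2.0, -1.0, 0.0]])  # [2.0, -0.01, 0.0]
```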
+
+参数
+::::::::::
+ - **x** (Tensor) - 输入的稀疏 Tensor,可以是 SparseCooTensor 或 SparseCsrTensor,数据类型为 float32、float64。
+ - **negative_slope** (float,可选) - :math:`x < 0` 时的斜率。默认值为 0.01。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.nn.functional.leaky_relu
diff --git a/docs/api/paddle/sparse/nn/functional/max_pool3d_cn.rst b/docs/api/paddle/sparse/nn/functional/max_pool3d_cn.rst
new file mode 100644
index 00000000000..b780e6ef0f9
--- /dev/null
+++ b/docs/api/paddle/sparse/nn/functional/max_pool3d_cn.rst
@@ -0,0 +1,35 @@
+.. _cn_api_paddle_sparse_nn_functional_max_pool3d:
+
+
+max_pool3d
+-------------------------------
+
+.. py:function:: paddle.sparse.nn.functional.max_pool3d(x, kernel_size, stride=None, padding=0, ceil_mode=False, data_format="NDHWC", name=None)
+
+该函数是一个三维最大池化函数,根据输入参数 `kernel_size` , `stride` , `padding` 等参数对输入 `x` 做最大池化操作。
+
+参数
+:::::::::
+ - **x** (Tensor) - 形状为 [N,D,H,W, C] 的 5-D SparseCooTensor,N 是批尺寸,C 是通道数,D 是特征深度,H 是特征高度,W 是特征宽度,数据类型为 float32 或 float64。
+ - **kernel_size** (int|list|tuple) - 池化核大小。如果它是一个元组或列表,它必须包含三个整数值,(pool_size_Depth,pool_size_Height, pool_size_Width)。若为一个整数,则表示 D,H 和 W 维度上均为该值,比如若 kernel_size=2,则池化核大小为[2,2,2]。
+ - **stride** (int|list|tuple,可选) - 池化层的步长。如果它是一个元组或列表,它将包含三个整数,(pool_stride_Depth,pool_stride_Height, pool_stride_Width)。若为一个整数,则表示 D, H 和 W 维度上 stride 均为该值。默认值为 kernel_size。
+ - **padding** (string|int|list|tuple,可选) - 池化填充。如果它是一个字符串,可以是"VALID"或者"SAME",表示填充算法。如果它是一个元组或列表,它可以有 3 种格式:
+
+ - (1)包含 3 个整数值:[pad_depth, pad_height, pad_width];
+ - (2)包含 6 个整数值:[pad_depth_front, pad_depth_back, pad_height_top, pad_height_bottom, pad_width_left, pad_width_right];
+ - (3)包含 5 个二元组:当 data_format 为"NCDHW"时为[[0,0], [0,0], [pad_depth_front, pad_depth_back], [pad_height_top, pad_height_bottom], [pad_width_left, pad_width_right]],当 data_format 为"NDHWC"时为[[0,0], [pad_depth_front, pad_depth_back], [pad_height_top, pad_height_bottom], [pad_width_left, pad_width_right], [0,0]]。若为一个整数,则表示 D、H 和 W 维度上均为该值。默认值:0
+ - **ceil_mode** (bool,可选) - 是否用 ceil 函数计算输出高度和宽度。如果是 True,则使用 `ceil` 计算输出形状的大小。默认为 False
+ - **data_format** (str,可选) - 输入和输出的数据格式,可以是"NCDHW"和"NDHWC"。N 是批尺寸,C 是通道数,D 是特征深度,H 是特征高度,W 是特征宽度。当前只支持:"NDHWC"。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name` ,一般无需设置,默认值为 None。
+
+
+
+返回
+:::::::::
+5-D Tensor,数据类型与输入 x 一致。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.nn.functional.max_pool3d
diff --git a/docs/api/paddle/sparse/nn/functional/relu6_cn.rst b/docs/api/paddle/sparse/nn/functional/relu6_cn.rst
new file mode 100644
index 00000000000..e612638612c
--- /dev/null
+++ b/docs/api/paddle/sparse/nn/functional/relu6_cn.rst
@@ -0,0 +1,26 @@
+.. _cn_api_paddle_sparse_nn_functional_relu6:
+
+relu6
+-------------------------------
+.. py:function:: paddle.sparse.nn.functional.relu6(x, name=None)
+
+稀疏 relu6 激活函数,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+.. math::
+ relu6(x) = min(max(0, x), 6)
+
+其中,:math:`x` 为输入的 Tensor。
+
+参数
+::::::::::
+ - **x** (Tensor) - 输入的稀疏 Tensor,可以是 SparseCooTensor 或 SparseCsrTensor,数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.nn.functional.relu6
diff --git a/docs/api/paddle/sparse/nn/functional/relu_cn.rst b/docs/api/paddle/sparse/nn/functional/relu_cn.rst
new file mode 100644
index 00000000000..99393c62514
--- /dev/null
+++ b/docs/api/paddle/sparse/nn/functional/relu_cn.rst
@@ -0,0 +1,26 @@
+.. _cn_api_paddle_sparse_nn_functional_relu:
+
+relu
+-------------------------------
+.. py:function:: paddle.sparse.nn.functional.relu(x, name=None)
+
+稀疏 relu 激活函数,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+.. math::
+ relu(x) = max(x, 0)
+
+其中,:math:`x` 为输入的 Tensor。
+
+参数
+::::::::::
+ - **x** (Tensor) - 输入的稀疏 Tensor,可以是 SparseCooTensor 或 SparseCsrTensor,数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.nn.functional.relu
diff --git a/docs/api/paddle/sparse/nn/functional/softmax_cn.rst b/docs/api/paddle/sparse/nn/functional/softmax_cn.rst
new file mode 100644
index 00000000000..b2958e1aeee
--- /dev/null
+++ b/docs/api/paddle/sparse/nn/functional/softmax_cn.rst
@@ -0,0 +1,31 @@
+.. _cn_api_paddle_sparse_nn_functional_softmax:
+
+softmax
+-------------------------------
+.. py:function:: paddle.sparse.nn.functional.softmax(x, axis=-1, name=None)
+
+稀疏 softmax 激活函数,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+当输入 `x` 为 `SparseCsrTensor` 时,仅支持 axis=-1,这是因为 CSR 稀疏存储格式按行存储数据,更适合按行读取。
+
+设将 `x` 从稀疏矩阵转换为稠密矩阵后,:math:`i` 代表行索引,:math:`j` 代表列索引,当 axis=-1 时有如下公式:
+
+.. math::
+    softmax_{ij} = \frac{\exp(x_{ij} - \max_j(x_{ij}))}{\sum_j \exp(x_{ij} - \max_j(x_{ij}))}
+
+其中,:math:`x` 为输入的 Tensor。
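下面用纯 Python(不依赖 Paddle)示意上式按行计算 softmax 的数值稳定写法,先减去行最大值再做指数归一化,仅为公式说明:

```python
import math

def row_softmax(row):
    """对一行元素按上式计算 softmax:先减去行最大值,避免 exp 溢出。"""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

probs = row_softmax([1.0, 2.0, 3.0])
print([round(p, 4) for p in probs])
assert abs(sum(probs) - 1.0) < 1e-9  # 每行结果之和为 1
```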
+
+参数
+::::::::::
+ - **x** (Tensor) - 输入的稀疏 Tensor,可以是 SparseCooTensor 或 SparseCsrTensor,数据类型为 float32、float64。
+ - **axis** (int,可选) - 指定对输入 SparseTensor 计算 softmax 的轴。对于 SparseCsrTensor,仅支持 axis=-1。默认值:-1。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.nn.functional.softmax
diff --git a/docs/api/paddle/sparse/nn/functional/subm_conv3d_cn.rst b/docs/api/paddle/sparse/nn/functional/subm_conv3d_cn.rst
new file mode 100644
index 00000000000..49a23d7d53e
--- /dev/null
+++ b/docs/api/paddle/sparse/nn/functional/subm_conv3d_cn.rst
@@ -0,0 +1,63 @@
+.. _cn_api_paddle_sparse_nn_functional_subm_conv3d:
+
+subm_conv3d
+-------------------------------
+
+.. py:function:: paddle.sparse.nn.functional.subm_conv3d(x, weight, bias=None, stride=1, padding=0, dilation=1, groups=1, data_format="NDHWC", key=None, name=None)
+
+子流形稀疏三维卷积层(submanifold sparse convolution 3D layer),根据输入、卷积核、步长(stride)、填充(padding)、空洞大小(dilations)等一组参数计算得到输出特征层大小。输入和输出是 NCDHW 或 NDHWC 格式,其中 N 是批尺寸,C 是通道数,D 是特征层深度,H 是特征层高度,W 是特征层宽度。如果 ``bias`` 不为 None,卷积计算会添加偏置项。
+
+对每个输入 X,有等式:
+
+.. math::
+
+ Out = W * X + b
+
+其中:
+
+ - :math:`X` :输入值,NCDHW 或 NDHWC 格式的 5-D Tensor
+ - :math:`W` :卷积核值,MCDHW 格式的 5-D Tensor
+ - :math:`*` :卷积操作
+ - :math:`b` :偏置值,1-D Tensor,形为 ``[M]``
+ - :math:`Out` :输出值,NCDHW 或 NDHWC 格式的 5-D Tensor,和 ``X`` 的形状可能不同
+
+**示例**
+
+- 输入:
+
+ 输入形状::math:`(N, D_{in}, H_{in}, W_{in}, C_{in})`
+
+ 卷积核形状::math:`(D_f, H_f, W_f, C_{in}, C_{out})`
+
+- 输出:
+
+ 输出形状::math:`(N, D_{out}, H_{out}, W_{out}, C_{out})`
+
+参数
+::::::::::::
+
+ - **x** (Tensor) - 输入是形状为 :math:`[N, D, H, W, C]` 的 5-D SparseCooTensor,N 是批尺寸,C 是通道数,D 是特征层深度,H 是特征高度,W 是特征宽度,数据类型为 float16, float32 或 float64。
+ - **weight** (Tensor) - 形状为 :math:`[kD, kH, kW, C/g, M]` 的卷积核。M 是输出通道数,g 是分组的个数,kD 是卷积核的深度,kH 是卷积核的高度,kW 是卷积核的宽度。
+ - **bias** (Tensor,可选) - 偏置项,形状为 :math:`[M]` 。
+ - **stride** (int|list|tuple,可选) - 步长大小。卷积核和输入进行卷积计算时滑动的步长。如果它是一个列表或元组,则必须包含三个整型数:(stride_depth, stride_height,stride_width)。若为一个整数,stride_depth = stride_height = stride_width = stride。默认值:1。
+ - **padding** (int|list|tuple|str,可选) - 填充大小。如果它是一个字符串,可以是"VALID"或者"SAME",表示填充算法,计算细节可参考上述 ``padding`` = "SAME"或 ``padding`` = "VALID" 时的计算公式。如果它是一个元组或列表,它可以有 3 种格式:(1)包含 5 个二元组:当 ``data_format`` 为"NCDHW"时为 [[0,0], [0,0], [padding_depth_front, padding_depth_back], [padding_height_top, padding_height_bottom], [padding_width_left, padding_width_right]],当 ``data_format`` 为"NDHWC"时为[[0,0], [padding_depth_front, padding_depth_back], [padding_height_top, padding_height_bottom], [padding_width_left, padding_width_right], [0,0]];(2)包含 6 个整数值:[padding_depth_front, padding_depth_back, padding_height_top, padding_height_bottom, padding_width_left, padding_width_right];(3)包含 3 个整数值:[padding_depth, padding_height, padding_width],此时 padding_depth_front = padding_depth_back = padding_depth, padding_height_top = padding_height_bottom = padding_height, padding_width_left = padding_width_right = padding_width。若为一个整数,padding_depth = padding_height = padding_width = padding。默认值:0。
+ - **dilation** (int|list|tuple,可选) - 空洞大小。空洞卷积时会使用该参数,卷积核对输入进行卷积时,感受野里每相邻两个特征点之间的空洞信息。如果空洞大小为列表或元组,则必须包含三个整型数:(dilation_depth, dilation_height, dilation_width)。若为一个整数,dilation_depth = dilation_height = dilation_width = dilation。默认值:1。
+ - **groups** (int,可选) - 三维卷积层的组数。根据 Alex Krizhevsky 的深度卷积神经网络(CNN)论文中的成组卷积:当 group=n,输入和卷积核分别根据通道数量平均分为 n 组,第一组卷积核和第一组输入进行卷积计算,第二组卷积核和第二组输入进行卷积计算,……,第 n 组卷积核和第 n 组输入进行卷积计算。默认值:1。
+ - **data_format** (str,可选) - 指定输入的数据格式,输出的数据格式将与输入保持一致,可以是"NCDHW"或"NDHWC"。N 是批尺寸,C 是通道数,D 是特征层深度,H 是特征高度,W 是特征宽度。默认值:"NDHWC"。
+ - **key** (str,可选) - 用于标识、保存并复用相同的 rulebook(稀疏卷积中输入与输出非零元素的索引映射表)。默认值:None。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name` ,一般无需设置,默认值为 None 。
+
+返回
+::::::::::::
+5-D Tensor,数据类型与 ``x`` 一致。返回卷积计算的结果。
+
+返回类型
+::::::::::::
+Tensor。
+
+代码示例
+::::::::::::
+
+COPY-FROM: paddle.sparse.nn.functional.subm_conv3d
diff --git a/docs/api/paddle/sparse/pow_cn.rst b/docs/api/paddle/sparse/pow_cn.rst
new file mode 100644
index 00000000000..2c1cc91a97d
--- /dev/null
+++ b/docs/api/paddle/sparse/pow_cn.rst
@@ -0,0 +1,31 @@
+.. _cn_api_paddle_sparse_pow:
+
+pow
+-------------------------------
+
+.. py:function:: paddle.sparse.pow(x, factor, name=None)
+
+
+逐元素计算 :attr:`x` 的幂函数,幂的系数为 `factor`,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+
+数学公式:
+
+.. math::
+ out = x^{factor}
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的稀疏 Tensor,可以为 Coo 或 Csr 格式,数据类型为 float32、float64。
+ - **factor** (float|int) - 幂函数的系数,可以为 float 或 int。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.pow
diff --git a/docs/api/paddle/sparse/rad2deg_cn.rst b/docs/api/paddle/sparse/rad2deg_cn.rst
new file mode 100644
index 00000000000..64eee2721f1
--- /dev/null
+++ b/docs/api/paddle/sparse/rad2deg_cn.rst
@@ -0,0 +1,29 @@
+.. _cn_api_paddle_sparse_rad2deg:
+
+rad2deg
+-------------------------------
+
+.. py:function:: paddle.sparse.rad2deg(x, name=None)
+
+
+逐元素将输入 :attr:`x` 从弧度转换为度,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+数学公式:
+
+.. math::
+    rad2deg(x) = \frac{180}{\pi} \cdot x
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的稀疏 Tensor,可以为 Coo 或 Csr 格式,数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.rad2deg
diff --git a/docs/api/paddle/sparse/reshape_cn.rst b/docs/api/paddle/sparse/reshape_cn.rst
new file mode 100644
index 00000000000..4b9c35dfbd1
--- /dev/null
+++ b/docs/api/paddle/sparse/reshape_cn.rst
@@ -0,0 +1,42 @@
+.. _cn_api_paddle_sparse_reshape:
+
+reshape
+-------------------------------
+
+.. py:function:: paddle.sparse.reshape(x, shape, name=None)
+
+
+在保持输入 ``x`` 数据不变的情况下,改变 ``x`` 的形状。 ``x`` 必须是一个 ``SparseCooTensor`` 或者 ``SparseCsrTensor`` 。
+
+目前只能针对输入 ``x`` 的 ``sparse dims`` 部分改变形状,但是 ``shape`` 仍必须指定为变形后的 ``Tensor`` 的完整的形状。
+
+注意,如果 ``x`` 是一个 ``SparseCsrTensor`` ,则 ``len(shape)`` 必须为 2 或者 3。
+
+在指定目标 ``shape`` 时存在一些技巧:
+
+ - 1. -1 表示这个维度的值是从 ``x`` 的元素总数和剩余维度推断出来的。因此,有且只有一个维度可以被设置为-1。
+ - 2. 0 表示实际的维数是从 ``x`` 的对应维数中复制出来的,因此 ``shape`` 中 0 的索引值不能超过 ``x`` 的维度。
+
+这里有一些例子来解释它们:
+
+ - 1. 给定一个形状为[2,4,6]的三维 Tensor x ,目标形状为[6,8],则将 x 变换为形状为[6,8]的 2-D Tensor,且 x 的数据保持不变。
+ - 2. 给定一个形状为[2,4,6]的三维 Tensor x ,目标形状为[2,3,-1,2],则将 x 变换为形状为[2,3,4,2]的 4-D Tensor,且 x 的数据保持不变。在这种情况下,目标形状的一个维度被设置为 -1 ,这个维度的值是从 x 的元素总数和剩余维度推断出来的。
+ - 3. 给定一个形状为[2,4,6]的三维 Tensor x ,目标形状为[-1,0,3,2],则将 x 变换为形状为[2,4,3,2]的 4-D Tensor,且 x 的数据保持不变。在这种情况下, 0 对应位置的维度值将从 x 的对应维数中复制,-1 对应位置的维度值由 x 的元素总数和剩余维度推断出来。
+
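上述 -1 与 0 的推断规则可以用如下纯 Python 代码示意(`infer_shape` 是为说明而设的假想辅助函数,非 Paddle 实现):

```python
from functools import reduce

def infer_shape(x_shape, shape):
    """示意 reshape 的 -1/0 推断规则(假想实现,非 Paddle 源码)。"""
    out = []
    for i, s in enumerate(shape):
        if s == 0:          # 0:从 x 的对应维度复制
            out.append(x_shape[i])
        else:
            out.append(s)
    total = reduce(lambda a, b: a * b, x_shape, 1)
    if -1 in out:           # -1:由元素总数和剩余维度推断,且至多出现一次
        known = reduce(lambda a, b: a * b, (s for s in out if s != -1), 1)
        out[out.index(-1)] = total // known
    return out

print(infer_shape([2, 4, 6], [2, 3, -1, 2]))   # [2, 3, 4, 2]
print(infer_shape([2, 4, 6], [-1, 0, 3, 2]))   # [2, 4, 3, 2]
```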
+参数
+:::::::::
+
+ - **x** (Tensor) - ``sparse tensor``,数据类型为 ``float32``、 ``float64``、 ``int32``、 ``int64`` 或者 ``bool``。
+ - **shape** (list|tuple) - 数据类型是 ``int32``。定义目标形状。目标形状最多只能有一个维度为 -1 。
+ - **name** (str ,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None 。
+
+返回
+:::::::::
+
+``Tensor`` :改变形状后的 ``Tensor`` ,数据类型与 ``x`` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.reshape
diff --git a/docs/api/paddle/sparse/sin_cn.rst b/docs/api/paddle/sparse/sin_cn.rst
new file mode 100644
index 00000000000..6cf81831c7b
--- /dev/null
+++ b/docs/api/paddle/sparse/sin_cn.rst
@@ -0,0 +1,28 @@
+.. _cn_api_paddle_sparse_sin:
+
+sin
+-------------------------------
+
+.. py:function:: paddle.sparse.sin(x, name=None)
+
+逐元素计算 :attr:`x` 的正弦,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+数学公式:
+
+.. math::
+ out = sin(x)
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的稀疏 Tensor,可以为 Coo 或 Csr 格式,数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.sin
diff --git a/docs/api/paddle/sparse/sinh_cn.rst b/docs/api/paddle/sparse/sinh_cn.rst
new file mode 100644
index 00000000000..e02064db36c
--- /dev/null
+++ b/docs/api/paddle/sparse/sinh_cn.rst
@@ -0,0 +1,28 @@
+.. _cn_api_paddle_sparse_sinh:
+
+sinh
+-------------------------------
+
+.. py:function:: paddle.sparse.sinh(x, name=None)
+
+逐元素计算 :attr:`x` 的双曲正弦,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+数学公式:
+
+.. math::
+ out = sinh(x)
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的稀疏 Tensor,可以为 Coo 或 Csr 格式,数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.sinh
diff --git a/docs/api/paddle/incubate/sparse/sparse_coo_tensor_cn.rst b/docs/api/paddle/sparse/sparse_coo_tensor_cn.rst
similarity index 90%
rename from docs/api/paddle/incubate/sparse/sparse_coo_tensor_cn.rst
rename to docs/api/paddle/sparse/sparse_coo_tensor_cn.rst
index dadc33bd8f0..dfca401f11e 100644
--- a/docs/api/paddle/incubate/sparse/sparse_coo_tensor_cn.rst
+++ b/docs/api/paddle/sparse/sparse_coo_tensor_cn.rst
@@ -1,10 +1,9 @@
-.. _cn_api_paddle_incubate_sparse_coo_tensor:
+.. _cn_api_paddle_sparse_coo_tensor:
sparse_coo_tensor
-------------------------------
-
-.. py:function:: paddle.incubate.sparse.sparse_coo_tensor(indices, values, shape=None, dtype=None, place=None, stop_gradient=True)
+.. py:function:: paddle.sparse.sparse_coo_tensor(indices, values, shape=None, dtype=None, place=None, stop_gradient=True)
该 API 通过已知的非零元素的 ``indices`` 和 ``values`` 来创建一个 coordinate 格式的稀疏 tensor,tensor 类型为 ``paddle.Tensor`` 。
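为帮助理解 COO 格式的含义,下面用纯 Python 示意如何根据 ``indices`` 和 ``values`` 还原出稠密矩阵(仅为格式说明,非 Paddle 实现):

```python
def coo_to_dense(indices, values, shape):
    """indices 为 [稀疏维数, 非零元素个数] 的二维列表,values 为对应的非零值。"""
    rows, cols = shape
    dense = [[0] * cols for _ in range(rows)]
    for k, v in enumerate(values):
        # 第 k 个非零元素位于 (indices[0][k], indices[1][k])
        i, j = indices[0][k], indices[1][k]
        dense[i][j] = v
    return dense

# 非零元素位于 (0, 1)、(1, 2)、(2, 0)
print(coo_to_dense([[0, 1, 2], [1, 2, 0]], [1.0, 2.0, 3.0], (3, 3)))
# [[0, 1.0, 0], [0, 0, 2.0], [3.0, 0, 0]]
```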
@@ -40,4 +39,4 @@ sparse_coo_tensor
代码示例
:::::::::
-COPY-FROM: paddle.incubate.sparse.sparse_coo_tensor
+COPY-FROM: paddle.sparse.sparse_coo_tensor
diff --git a/docs/api/paddle/incubate/sparse/sparse_csr_tensor_cn.rst b/docs/api/paddle/sparse/sparse_csr_tensor_cn.rst
similarity index 91%
rename from docs/api/paddle/incubate/sparse/sparse_csr_tensor_cn.rst
rename to docs/api/paddle/sparse/sparse_csr_tensor_cn.rst
index 0240ec06dcf..4a032e366fb 100644
--- a/docs/api/paddle/incubate/sparse/sparse_csr_tensor_cn.rst
+++ b/docs/api/paddle/sparse/sparse_csr_tensor_cn.rst
@@ -1,10 +1,9 @@
-.. _cn_api_paddle_incubate_sparse_csr_tensor:
+.. _cn_api_paddle_sparse_csr_tensor:
sparse_csr_tensor
-------------------------------
-
-.. py:function:: paddle.incubate.sparse.sparse_csr_tensor(crows, cols, values, shape, dtype=None, place=None, stop_gradient=True)
+.. py:function:: paddle.sparse.sparse_csr_tensor(crows, cols, values, shape, dtype=None, place=None, stop_gradient=True)
该 API 通过已知的非零元素的 ``crows`` , ``cols`` 和 ``values`` 来创建一个 CSR(Compressed Sparse Row) 格式的稀疏 tensor,tensor 类型为 ``paddle.Tensor`` 。
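为帮助理解 CSR 格式中 ``crows`` 、 ``cols`` 、 ``values`` 三个数组的含义,下面用纯 Python 示意如何还原出稠密矩阵(仅为格式说明,非 Paddle 实现):

```python
def csr_to_dense(crows, cols, values, shape):
    """crows 的第 i 与 i+1 个元素给出第 i 行非零元素在 cols/values 中的区间。"""
    rows, ncols = shape
    dense = [[0] * ncols for _ in range(rows)]
    for i in range(rows):
        for k in range(crows[i], crows[i + 1]):
            dense[i][cols[k]] = values[k]
    return dense

# 第 0 行 1 个非零元,第 1 行 0 个,第 2 行 2 个
print(csr_to_dense([0, 1, 1, 3], [2, 0, 2], [1.0, 2.0, 3.0], (3, 3)))
# [[0, 0, 1.0], [0, 0, 0], [2.0, 0, 3.0]]
```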
@@ -43,4 +42,4 @@ sparse_csr_tensor
**代码示例**
-COPY-FROM: paddle.incubate.sparse.sparse_csr_tensor
+COPY-FROM: paddle.sparse.sparse_csr_tensor
diff --git a/docs/api/paddle/sparse/sqrt_cn.rst b/docs/api/paddle/sparse/sqrt_cn.rst
new file mode 100644
index 00000000000..1cbf40a44be
--- /dev/null
+++ b/docs/api/paddle/sparse/sqrt_cn.rst
@@ -0,0 +1,28 @@
+.. _cn_api_paddle_sparse_sqrt:
+
+sqrt
+-------------------------------
+
+.. py:function:: paddle.sparse.sqrt(x, name=None)
+
+逐元素计算 :attr:`x` 的算术平方根,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+数学公式:
+
+.. math::
+ out = sqrt(x)
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的稀疏 Tensor,可以为 Coo 或 Csr 格式,数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.sqrt
diff --git a/docs/api/paddle/sparse/square_cn.rst b/docs/api/paddle/sparse/square_cn.rst
new file mode 100644
index 00000000000..4cc0c8b742c
--- /dev/null
+++ b/docs/api/paddle/sparse/square_cn.rst
@@ -0,0 +1,29 @@
+.. _cn_api_paddle_sparse_square:
+
+square
+-------------------------------
+
+.. py:function:: paddle.sparse.square(x, name=None)
+
+
+逐元素计算 :attr:`x` 的平方,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+数学公式:
+
+.. math::
+ out = square(x)
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的稀疏 Tensor,可以为 Coo 或 Csr 格式,数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.square
diff --git a/docs/api/paddle/incubate/sparse/subtract_cn.rst b/docs/api/paddle/sparse/subtract_cn.rst
similarity index 77%
rename from docs/api/paddle/incubate/sparse/subtract_cn.rst
rename to docs/api/paddle/sparse/subtract_cn.rst
index 21d433ff220..e1596043bb0 100644
--- a/docs/api/paddle/incubate/sparse/subtract_cn.rst
+++ b/docs/api/paddle/sparse/subtract_cn.rst
@@ -1,10 +1,9 @@
-.. _cn_api_paddle_incubate_sparse_subtract:
+.. _cn_api_paddle_sparse_subtract:
subtract
-------------------------------
-.. py:function:: paddle.incubate.sparse.subtract(x, y, name=None)
-
+.. py:function:: paddle.sparse.subtract(x, y, name=None)
输入 :attr:`x` 与输入 :attr:`y` 逐元素相减,并将各个位置的输出元素保存到返回结果中。
@@ -14,10 +13,10 @@ subtract
等式为:
.. math::
- Out = X - Y
+ out = x - y
-- :math:`X`:多维稀疏 Tensor。
-- :math:`Y`:多维稀疏 Tensor。
+- :math:`x`:多维稀疏 Tensor。
+- :math:`y`:多维稀疏 Tensor。
参数
:::::::::
@@ -33,4 +32,4 @@ subtract
代码示例
:::::::::
-COPY-FROM: paddle.incubate.sparse.subtract
+COPY-FROM: paddle.sparse.subtract
diff --git a/docs/api/paddle/sparse/tan_cn.rst b/docs/api/paddle/sparse/tan_cn.rst
new file mode 100644
index 00000000000..1aed63f201d
--- /dev/null
+++ b/docs/api/paddle/sparse/tan_cn.rst
@@ -0,0 +1,28 @@
+.. _cn_api_paddle_sparse_tan:
+
+tan
+-------------------------------
+
+.. py:function:: paddle.sparse.tan(x, name=None)
+
+逐元素计算 :attr:`x` 的正切,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+数学公式:
+
+.. math::
+ out = tan(x)
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的稀疏 Tensor,可以为 Coo 或 Csr 格式,数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.tan
diff --git a/docs/api/paddle/sparse/tanh_cn.rst b/docs/api/paddle/sparse/tanh_cn.rst
new file mode 100644
index 00000000000..5ad81c7df01
--- /dev/null
+++ b/docs/api/paddle/sparse/tanh_cn.rst
@@ -0,0 +1,29 @@
+.. _cn_api_paddle_sparse_tanh:
+
+tanh
+-------------------------------
+
+.. py:function:: paddle.sparse.tanh(x, name=None)
+
+
+逐元素计算 :attr:`x` 的双曲正切,要求输入 :attr:`x` 为 `SparseCooTensor` 或 `SparseCsrTensor` 。
+
+数学公式:
+
+.. math::
+ out = tanh(x)
+
+参数
+:::::::::
+ - **x** (SparseTensor) - 输入的稀疏 Tensor,可以为 Coo 或 Csr 格式,数据类型为 float32、float64。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+多维稀疏 Tensor,数据类型和稀疏格式与 :attr:`x` 相同。
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.tanh
diff --git a/docs/api/paddle/sparse/transpose_cn.rst b/docs/api/paddle/sparse/transpose_cn.rst
new file mode 100644
index 00000000000..b66bcab7d12
--- /dev/null
+++ b/docs/api/paddle/sparse/transpose_cn.rst
@@ -0,0 +1,29 @@
+.. _cn_api_paddle_sparse_transpose:
+
+transpose
+-------------------------------
+
+.. py:function:: paddle.sparse.transpose(x, perm, name=None)
+
+
+根据 :attr:`perm` 对输入 :attr:`x` 的维度进行重排,但不改变数据。
+:attr:`x` 必须是 COO 格式的多维稀疏 Tensor,或 CSR 格式的 2 维或 3 维稀疏 Tensor。
+
+.. math::
+ out = transpose(x, perm)
+
+参数
+:::::::::
+ - **x** (Tensor) - 输入的 Tensor,数据类型为 float32、float64、int32 或 int64。
+ - **perm** (list|tuple) - :attr:`perm` 长度必须和 :attr:`x` 的维度相同,并依照 :attr:`perm` 中数据进行重排。
+ - **name** (str,可选) - 具体用法请参见 :ref:`api_guide_Name`,一般无需设置,默认值为 None。
+
+返回
+:::::::::
+转置后的稀疏 Tensor,数据类型和压缩格式与 :attr:`x` 相同。
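``perm`` 对维度次序的作用可用如下纯 Python 代码示意(`permute_shape` 是仅对形状做重排的假想函数,非 Paddle 实现):

```python
def permute_shape(shape, perm):
    """按 perm 给出的新维度次序重排 shape;perm 长度须与 shape 相同。"""
    assert len(perm) == len(shape)
    return [shape[p] for p in perm]

# 形状为 [2, 3, 4] 的稀疏 Tensor,经 perm=[2, 0, 1] 转置后形状为 [4, 2, 3]
print(permute_shape([2, 3, 4], [2, 0, 1]))  # [4, 2, 3]
```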
+
+
+代码示例
+:::::::::
+
+COPY-FROM: paddle.sparse.transpose
diff --git a/docs/api/paddle/static/ipu_shard_guard_cn.rst b/docs/api/paddle/static/ipu_shard_guard_cn.rst
index 94a90546bfe..0351c9f9719 100644
--- a/docs/api/paddle/static/ipu_shard_guard_cn.rst
+++ b/docs/api/paddle/static/ipu_shard_guard_cn.rst
@@ -9,15 +9,14 @@ ipu_shard_guard
对模型进行切分。用于指定 Op 在哪个 ipu 上进行计算以及模型被切分之后的计算顺序。
.. note::
-
- 仅支持当 enable_manual_shard=True,index 设置才有效。请参阅 :ref:`cn_api_fluid_IpuStrategy` 。
- 仅支持当 enable_pipelining=True,stage 设置才有效。请参阅 :ref:`cn_api_fluid_IpuStrategy` 。
+ 仅支持当 enable_manual_shard=True,才能将 index 设置为非-1 的值。请参阅 :ref:`cn_api_fluid_IpuStrategy` 。
+ 仅支持当 enable_pipelining=True,才能将 stage 设置为非-1 的值。请参阅 :ref:`cn_api_fluid_IpuStrategy` 。
一个 index 支持对应 None stage 或一个 stage,一个 stage 仅支持对应一个新的 index 或者一个重复的 index。
参数
:::::::::
- - **index** (int,可选) - 指定 Op 在哪个 ipu 上计算,(如‘0, 1, 2, 3’),默认值-1,表示 Op 没有指定 ipu。
- - **stage** (int,可选) – 指定被切分的模型的计算顺序,(如‘0, 1, 2, 3’),按照数值大小顺序对被切分的模型进行计算,默认值-1,表示没有数据流水计算顺序并按照计算图顺序计算 Op。
+ - **index** (int,可选) - 指定 Op 在哪个 ipu 上计算,(如‘0, 1, 2, 3’),默认值-1,表示 Op 仅在 ipu 0 上运行。
+ - **stage** (int,可选) - 指定被切分的模型的计算顺序,(如‘0, 1, 2, 3’),按照数值大小顺序对被切分的模型进行计算,默认值-1,表示没有数据流水计算顺序并按照计算图顺序计算 Op。
返回
:::::::::
diff --git a/docs/api/paddle/static/mlu_places_cn.rst b/docs/api/paddle/static/mlu_places_cn.rst
index eb7d4a4487f..789af734130 100644
--- a/docs/api/paddle/static/mlu_places_cn.rst
+++ b/docs/api/paddle/static/mlu_places_cn.rst
@@ -5,19 +5,19 @@ mlu_places
.. py:function:: paddle.static.mlu_places(device_ids=None)
-
-.. note::
- 多卡任务请先使用 FLAGS_selected_mlus 环境变量设置可见的 MLU 设备。
-
该接口根据 ``device_ids`` 创建一个或多个 ``paddle.device.MLUPlace`` 对象,并返回所创建的对象列表。
-如果 ``device_ids`` 为 ``None``,则首先检查 ``FLAGS_selected_mlus`` 标志。
+如果 ``device_ids`` 为 ``None``,则首先检查 ``FLAGS_selected_mlus`` 环境变量。
例如:``FLAGS_selected_mlus=0,1,2``,则返回的列表将为 ``[paddle.device.MLUPlace(0), paddle.device.MLUPlace(1), paddle.device.MLUPlace(2)]``。
-如果未设置标志 ``FLAGS_selected_mlus``,则返回所有可见的 MLU places。
+
+如果未设置环境变量 ``FLAGS_selected_mlus``,则返回所有可见的 MLU 位置。
如果 ``device_ids`` 不是 ``None``,它应该是使用的 MLU 设备 ID 的列表或元组。
例如:``device_id=[0,1,2]``,返回的列表将是 ``[paddle.device.MLUPlace(0), paddle.device.MLUPlace(1), paddle.device.MLUPlace(2)]``。
+.. note::
+ 多卡任务请先使用 FLAGS_selected_mlus 环境变量设置可见的 MLU 设备。
+
参数
:::::::::
- **device_ids** (list(int)|tuple(int),可选) - MLU 的设备 ID 列表或元组。默认值为 ``None``。
diff --git a/docs/api/paddle/static/set_ipu_shard_cn.rst b/docs/api/paddle/static/set_ipu_shard_cn.rst
index 404a9c313a7..c18d5f07fde 100644
--- a/docs/api/paddle/static/set_ipu_shard_cn.rst
+++ b/docs/api/paddle/static/set_ipu_shard_cn.rst
@@ -9,20 +9,19 @@ set_ipu_shard
通过设置输入的函数或计算层内每个算子的流水线属性实现对模型的切分。
.. note::
-
- 仅支持当 enable_manual_shard=True,index 设置才有效。请参阅 :ref:`cn_api_fluid_IpuStrategy` 。
- 仅支持当 enable_pipelining=True,stage 设置才有效。请参阅 :ref:`cn_api_fluid_IpuStrategy` 。
+ 仅支持当 enable_manual_shard=True,才能将 index 设置为非-1 的值。请参阅 :ref:`cn_api_fluid_IpuStrategy` 。
+ 仅支持当 enable_pipelining=True,才能将 stage 设置为非-1 的值。请参阅 :ref:`cn_api_fluid_IpuStrategy` 。
一个 index 支持对应 None stage 或一个 stage,一个 stage 仅支持对应一个新的 index 或者一个重复的 index。
参数
:::::::::
- **call_func** (Layer|function) - 静态图下的函数或者计算层。
- - **index** (int,可选) - 指定 Op 在哪个 ipu 上计算,(如‘0, 1, 2, 3’),默认值-1,表示不指定 ipu。
- - **stage** (int,可选) – 指定被切分的模型的计算顺序,(如‘0, 1, 2, 3’),按照数值大小顺序对被切分的模型进行计算,默认值-1,表示没有数据流水计算顺序并按照计算图顺序计算 Op。
+ - **index** (int,可选) - 指定 Op 在哪个 ipu 上计算,(如‘0, 1, 2, 3’),默认值-1,表示 Op 仅在 ipu 0 上运行。
+ - **stage** (int,可选) - 指定被切分的模型的计算顺序,(如‘0, 1, 2, 3’),按照数值大小顺序对被切分的模型进行计算,默认值-1,表示没有数据流水计算顺序并按照计算图顺序计算 Op。
返回
:::::::::
- 无。
+ 包装后的调用函数。
代码示例
::::::::::
diff --git a/docs/api/paddle/take_along_axis_cn.rst b/docs/api/paddle/take_along_axis_cn.rst
index 1095c3fc789..7ec44e9fcbc 100644
--- a/docs/api/paddle/take_along_axis_cn.rst
+++ b/docs/api/paddle/take_along_axis_cn.rst
@@ -16,7 +16,7 @@ take_along_axis
返回
:::::::::
-- **out** (Tensor) - 输出 Tensor,包含 indeces 矩阵选定的元素,与 ``arr`` 数据类型相同。
+输出 Tensor,包含 indices 矩阵选定的元素,与 ``arr`` 数据类型相同。
代码示例
:::::::::
diff --git a/docs/design/phi/design_cn.md b/docs/design/phi/design_cn.md
index 91e3ab416f1..bc5043e07ed 100644
--- a/docs/design/phi/design_cn.md
+++ b/docs/design/phi/design_cn.md
@@ -582,7 +582,7 @@ Tensor scale(const Tensor& x,
C++ API 的自动生成是通过解析 YAML 配置文件来进行生成的,YAML 配置文件分为:
- - 前向 API 配置文件(`paddle/phi/api/yaml/api.yaml`,解析后生成代码文件为`paddle/phi/api/include/api.h`和`paddle/phi/api/lib/api.cc`)
+ - 前向 API 配置文件(`paddle/phi/api/yaml/ops.yaml`,解析后生成代码文件为`paddle/phi/api/include/api.h`和`paddle/phi/api/lib/api.cc`)
- 反向 API 配置文件(`paddle/phi/api/yaml/backward.yaml`,解析后生成的代码文件为`paddle/phi/api/backward/backward_api.h`和`paddle/phi/api/lib/backward_api.cc`)。
C++ API 生成的关键在于 YAML 文件的配置,以 matmul 为例,其前向和反向的配置文件如下:
@@ -1642,7 +1642,7 @@ PHI 期望的 Op 开发方式:**“完形填空”式算子描述实现 + “
需要写的内容如下:
```
-## 配置文件 api.yaml
+## 配置文件 ops.yaml
- api : add
args : (const Tensor& x, const Tensor& y)
output : Tensor
diff --git a/docs/design/phi/design_en.md b/docs/design/phi/design_en.md
index b8c36769142..9d6cd489481 100644
--- a/docs/design/phi/design_en.md
+++ b/docs/design/phi/design_en.md
@@ -582,7 +582,7 @@ Described as follows:
The automatic generation of the C++ API is generated by parsing the YAML configuration file. The YAML configuration file is divided into:
-- Forward API configuration file(`paddle/phi/api/yaml/api.yaml`. After parsing, the generated code file is `paddle/phi/api/include/api.h` and `paddle/phi/api/lib/api.cc`)
+- Forward API configuration file(`paddle/phi/api/yaml/ops.yaml`. After parsing, the generated code file is `paddle/phi/api/include/api.h` and `paddle/phi/api/lib/api.cc`)
- Backward API configuration file(`paddle/phi/api/yaml/backward.yaml`. After parsing, the generated code file is `paddle/phi/api/backward/backward_api.h` and `paddle/phi/api/lib/backward_api.cc`)
The key to C++ API generation lies in the configuration of the YAML file. Taking `matmul` as an example, the forward and backward configuration are as follows:
diff --git a/docs/dev_guides/Overview_cn.md b/docs/dev_guides/Overview_cn.md
index af1ead4cf4a..f045ffe295b 100644
--- a/docs/dev_guides/Overview_cn.md
+++ b/docs/dev_guides/Overview_cn.md
@@ -6,8 +6,8 @@
- [新建一个 ISSUE 来反馈 bug](https://github.com/PaddlePaddle/Paddle/issues/new/choose)
- [新建一个 ISSUE 来提出新功能需求](https://github.com/PaddlePaddle/Paddle/issues/new/choose)
-- [提 PR 来修复一个 bug](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/10_contribution/local_dev_guide_cn.html)
-- [提 PR 来实现一个新功能](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/10_contribution/local_dev_guide_cn.html)
+- [提 PR 来修复一个 bug](./code_contributing_path_cn.html)
+- [提 PR 来实现一个新功能](./code_contributing_path_cn.html)
- [优化我们的文档](https://github.com/PaddlePaddle/docs/wiki/%E6%96%87%E6%A1%A3%E8%B4%A1%E7%8C%AE%E6%8C%87%E5%8D%97)
感谢你对飞桨开源项目的贡献!
diff --git a/docs/dev_guides/api_contributing_guides/api_docs_guidelines_cn.md b/docs/dev_guides/api_contributing_guides/api_docs_guidelines_cn.md
index 46a0a4c30b3..d8d98e82f1e 100644
--- a/docs/dev_guides/api_contributing_guides/api_docs_guidelines_cn.md
+++ b/docs/dev_guides/api_contributing_guides/api_docs_guidelines_cn.md
@@ -311,6 +311,7 @@ API 的属性用来描述 API 所包含的属性。如果 API 有属性,每个
程序中随机运算符的默认随机种子。0 意味着随机生成随机种子。
**返回**
+
int64,返回该 Program 中当前正在使用的 random seed。
**代码示例**
@@ -335,12 +336,15 @@ API 的方法用来描述 API 所包含的方法,一些类的 API 会有这个
''''''''''''
.. py:function:: paddle.Program.parse_from_string(binary_str_type)
+
通过对 protobuf 的反序列化,转换成 ``Program``
**参数**
- binary_str_type (**str**) – protobuf 二进制字符串
+
+ - **binary_str_type** (str) – protobuf 二进制字符串
**返回**
+
``Program``,反序列化后的 ``Program``
**代码示例**
diff --git a/docs/dev_guides/api_contributing_guides/new_python_api_cn.md b/docs/dev_guides/api_contributing_guides/new_python_api_cn.md
index 5d1250af881..3e694201de6 100644
--- a/docs/dev_guides/api_contributing_guides/new_python_api_cn.md
+++ b/docs/dev_guides/api_contributing_guides/new_python_api_cn.md
@@ -122,7 +122,7 @@ def trace(x, offset=0, axis1=0, axis2=1, name=None):
- `_C_ops` 是 [python/paddle/_C_ops.py](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/_C_ops.py),其实现了从 Paddle 编译得到的二进制文件中 import C++ 算子对应的 Python C 函数。
- `trace` 是算子的 Python C 函数名。Python C 函数的命名直接采用算子名。
- - 参数 `( x, offset, axis1, axis2 )`需按照 [YAML 配置文件](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/phi/api/yaml/api.yaml#L185) 中定义的输入参数顺序传入,C++ 算子的输入、输出和属性等描述是通过 YAML 配置文件定义的,具体可参见 [开发 C++ 算子](new_cpp_op_cn.html) 章节介绍。
+ - 参数 `( x, offset, axis1, axis2 )`需按照 [YAML 配置文件](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/phi/api/yaml/ops.yaml#L185) 中定义的输入参数顺序传入,C++ 算子的输入、输出和属性等描述是通过 YAML 配置文件定义的,具体可参见 [开发 C++ 算子](new_cpp_op_cn.html) 章节介绍。
> 注意:由于目前飞桨动态图正处在重构升级阶段,所以现有算子的代码会分别有新旧动态图两个代码分支,其中 `in_dygraph_mode()` 表示新动态图分支(默认),`_in_legacy_dygraph()`为旧动态图分支,**在新增算子时无需添加旧动态图分支代码**。
diff --git a/docs/guides/06_distributed_training/data_parallel/recompute_cn.rst b/docs/guides/06_distributed_training/data_parallel/recompute_cn.rst
index 5ad4c125272..fe9c7837afc 100644
--- a/docs/guides/06_distributed_training/data_parallel/recompute_cn.rst
+++ b/docs/guides/06_distributed_training/data_parallel/recompute_cn.rst
@@ -9,9 +9,9 @@
- **反向计算:** 运行反向算子来计算参数(Parameter)的梯度。
- **优化:** 应用优化算法以更新参数值 。
-在前向计算过程中,前向算子会计算出大量的中间结果,由于这些中间结果是训练数据和算子计算得到的,所以训练数据的 batch bize 越大,中间结果占用的内存也就越大。飞桨核心框架会使用张量来存储这些隐层的中间结果。当模型层数加深时,其中间结果的数量可达数千甚至数万,占据大量的内存。飞桨核心框架的显存回收机制会及时清除无用的中间结果以节省显存,但是有些中间结果是反向计算过程中算子的输入,这些中间结果必须存储在内存中,直到相应的反向算子计算完毕。
+在前向计算过程中,前向算子会计算出大量的中间结果,由于这些中间结果是训练数据和算子计算得到的,所以训练数据的 batch size 越大,中间结果占用的内存也就越大。飞桨核心框架会使用张量来存储这些隐层的中间结果。当模型层数加深时,其中间结果的数量可达数千甚至数万,占据大量的内存。飞桨核心框架的显存回收机制会及时清除无用的中间结果以节省显存,但是有些中间结果是反向计算过程中算子的输入,这些中间结果必须存储在内存中,直到相应的反向算子计算完毕。
-对于大小固定的内存来说,如果用户希望使用大 batch bize 的数据进行训练,则将导致单个中间结果占用内存增大,那么就需要减少中间结果的存储数量,FRB 就是基于这种思想设计的。FRB 是将深度学习网络切分为 k 个部分(segments)。对每个 segment 而言:前向计算时,除了小部分必须存储在内存中的张量外,其他中间结果都将被删除;在反向计算中,首先重新计算一遍前向算子,以获得中间结果,再运行反向算子。简而言之,FRB 和普通的网络迭代相比,多计算了一遍前向算子。
+对于大小固定的内存来说,如果用户希望使用大 batch size 的数据进行训练,则将导致单个中间结果占用内存增大,那么就需要减少中间结果的存储数量,FRB 就是基于这种思想设计的。FRB 是将深度学习网络切分为 k 个部分(segments)。对每个 segment 而言:前向计算时,除了小部分必须存储在内存中的张量外,其他中间结果都将被删除;在反向计算中,首先重新计算一遍前向算子,以获得中间结果,再运行反向算子。简而言之,FRB 和普通的网络迭代相比,多计算了一遍前向算子。
具体过程如下图所示:
diff --git a/docs/guides/advanced/autograd_cn.rst b/docs/guides/advanced/autograd_cn.rst
index 01326d4e6e0..f02a71d2f10 100644
--- a/docs/guides/advanced/autograd_cn.rst
+++ b/docs/guides/advanced/autograd_cn.rst
@@ -38,7 +38,7 @@ PaddlePaddle 的神经网络核心是自动微分,本篇文章主要为你介
2.2.0
-本案例首先定义网络。因为本示例着重展示如何使用飞桨进行自动微分,故组网部分不过多展开,直接使用高层 API 中封装好的模型\ ``vgg11``\ 。
+本案例首先定义网络。因为本示例着重展示如何使用飞桨进行自动微分,故组网部分不过多展开,直接使用高层 API 中封装好的模型 :ref:`paddle.vision.models ` 。
然后随机初始化一个输入\ ``x``\ ,和对应标签\ ``label``\ 。
diff --git a/docs/guides/beginner/tensor_cn.md b/docs/guides/beginner/tensor_cn.md
index 0971ad53e6e..00b3a0f931b 100644
--- a/docs/guides/beginner/tensor_cn.md
+++ b/docs/guides/beginner/tensor_cn.md
@@ -331,7 +331,7 @@ Tensor flattened to Vector: [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 1
> * [paddle.squeeze](../../../api/paddle/squeeze_cn.html),可实现 Tensor 的降维操作,即把 Tensor 中尺寸为 1 的维度删除。
> * [paddle.unsqueeze](../../../api/paddle/unsqueeze_cn.html),可实现 Tensor 的升维操作,即向 Tensor 中某个位置插入尺寸为 1 的维度。
> * [paddle.flatten](../../../api/paddle/flatten_cn.html),将 Tensor 的数据在指定的连续维度上展平。
-> * [transpose](../../../api/paddle/transpose_cn.html),对 Tensor 的数据进行重排。
+> * [paddle.transpose](../../../api/paddle/transpose_cn.html),对 Tensor 的数据进行重排。
**(3)原位(Inplace)操作和非原位操作的区别**
@@ -545,7 +545,7 @@ Tensor(shape=[4], dtype=int64, place=Place(gpu:0), stop_gradient=True,
#### 4.1.2 修改 Tensor
-与访问 Tensor 类似,修改 Tensor 可以在单个或多个维度上通过索引或切片操作。同时,支持将多种类型的数据赋值给该 Tensor,当前支持的数据类型有:`int`,`float`,`numpy.ndarray`,`omplex`,`Tensor`。
+与访问 Tensor 类似,修改 Tensor 可以在单个或多个维度上通过索引或切片操作。同时,支持将多种类型的数据赋值给该 Tensor,当前支持的数据类型有:`int`,`float`,`numpy.ndarray`,`complex`,`Tensor`。
> **注意:**
>
> 请慎重通过索引或切片修改 Tensor,该操作会**原地**修改该 Tensor 的数值,且原值不会被保存。如果被修改的 Tensor 参与梯度计算,仅会使用修改后的数值,这可能会给梯度计算引入风险。飞桨框架会自动检测不当的原位(inplace)使用并报错。
@@ -673,7 +673,7 @@ x.matmul(y) #矩阵乘法
飞桨框架提供的一些 API 支持广播(broadcasting)机制,允许在一些运算时使用不同形状的 Tensor。
飞桨 Tensor 的广播机制主要遵循如下规则(参考 [Numpy 广播机制](https://numpy.org/doc/stable/user/basics.broadcasting.html#module-numpy.doc.broadcasting)):
-* 每个 Tensor 至少为一维 Tensor
+* 每个 Tensor 至少为一维 Tensor。
* 从最后一个维度向前开始比较两个 Tensor 的形状,需要满足如下条件才能进行广播:两个 Tensor 的维度大小相等;或者其中一个 Tensor 的维度等于 1;或者其中一个 Tensor 的维度不存在。
举例如下:
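在看具体例子前,上述规则也可以用一个简单的纯 Python 形状检查函数示意(仅为说明,非飞桨实现):

```python
def broadcast_shape(shape_x, shape_y):
    """按广播规则从最后一维向前对齐比较,返回广播后的形状;不满足条件时报错。"""
    x, y = list(shape_x), list(shape_y)
    # 维度不存在时视为 1(在前面补 1 对齐)
    while len(x) < len(y):
        x.insert(0, 1)
    while len(y) < len(x):
        y.insert(0, 1)
    out = []
    for a, b in zip(x, y):
        if a == b or a == 1 or b == 1:
            out.append(max(a, b))
        else:
            raise ValueError(f"无法广播:维度 {a} 与 {b} 不兼容")
    return out

print(broadcast_shape([2, 3, 4], [3, 4]))  # [2, 3, 4]
print(broadcast_shape([2, 3, 1], [3, 4]))  # [2, 3, 4]
```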
diff --git a/docs/guides/hardware_support/hardware_info_cn.md b/docs/guides/hardware_support/hardware_info_cn.md
index 3e342f90de7..2f581de0cae 100644
--- a/docs/guides/hardware_support/hardware_info_cn.md
+++ b/docs/guides/hardware_support/hardware_info_cn.md
@@ -12,6 +12,7 @@
| AI 加速芯片 | | 海光 | 海光 DCU | [安装](./rocm_docs/paddle_install_cn.html#wheel) | [源码编译](./rocm_docs/paddle_install_cn.html#anzhuangfangshier-tongguoyuanmabianyianzhuang) | ✔️ | [支持模型](./rocm_docs/paddle_rocm_cn.html) |
| AI 加速芯片 | XPU | 百度 | 昆仑 K200、R200 等 | [安装](./xpu_docs/paddle_install_xpu2_cn.html#wheel) | [源码编译](./xpu_docs/paddle_install_xpu2_cn.html#xpu) | | [支持模型](./xpu_docs/paddle_2.0_xpu2_cn.html) |
| AI 加速芯片 | IPU | Graphcore | GC200 | | [源码编译](./ipu_docs/paddle_install_cn.html) | | |
+| AI 加速芯片 | MLU | 寒武纪 | MLU370 系列 | [安装](./mlu_docs/paddle_install_cn.html) | [源码编译](./mlu_docs/paddle_install_cn.html#anzhuangfangshier-tongguoyuanmabianyianzhuang) | | ✔️ |
## Paddle Inference
@@ -21,6 +22,7 @@
| 服务端 GPU | | NVIDIA | 常见 GPU 型号如 V100、T4 等 | [预编译库](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html) | [源码编译](https://paddleinference.paddlepaddle.org.cn/user_guides/source_compile.html) | ✔️ | |
| 移动端 GPU | | NVIDIA | Jetson 系列 | [预编译库](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html) | [源码编译](https://paddleinference.paddlepaddle.org.cn/user_guides/source_compile.html) | ✔️ | |
| AI 加速芯片 | 达芬奇 | 华为 | 昇腾 910 | 即将提供 | | | |
+| AI 加速芯片 | MLU | 寒武纪 | MLU370 系列 | [预编译库](./mlu_docs/paddle_install_cn.html#wheel) | [源码编译](./mlu_docs/paddle_install_cn.html#anzhuangfangshier-tongguoyuanmabianyianzhuang) | ✔️ | |
| AI 加速芯片 | | 海光 | 海光 DCU | [预编译库](./rocm_docs/paddle_install_cn.html) | [源码编译](./rocm_docs/paddle_install_cn.html) | ✔️ | [支持模型](./rocm_docs/paddle_rocm_cn.html) |
| AI 加速芯片 | XPU | 百度 | 昆仑 K200、R200 等 | [预编译库](./xpu_docs/inference_install_example_cn.html#wheel) | [源码编译](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/09_hardware_support/xpu_docs/paddle_install_cn.html#id2) | | [支持模型](./xpu_docs/paddle_2.0_xpu_cn.html#xunlianzhichi) |
| 服务端 CPU | ARM | 飞腾 | FT-2000+/64、S2500 | |[源码编译](../../install/compile/arm-compile.html#anchor-1) | | |
diff --git a/docs/guides/hardware_support/index_cn.rst b/docs/guides/hardware_support/index_cn.rst
index 957806cb83e..fb278be6e07 100644
--- a/docs/guides/hardware_support/index_cn.rst
+++ b/docs/guides/hardware_support/index_cn.rst
@@ -11,6 +11,7 @@
- `海光 DCU 芯片运行飞桨 <./rocm_docs/index_cn.html>`_ : 介绍如何在海光 DCU 芯片环境上安装和使用飞桨。
- `昇腾 NPU 芯片运行飞桨 <./npu_docs/index_cn.html>`_ : 介绍如何在昇腾环境上安装和使用飞桨。
- `Graphcore IPU 芯片运行飞桨 <./ipu_docs/index_cn.html>`_ : 介绍如何在 IPU 环境上安装和使用飞桨。
+- `寒武纪 MLU 芯片运行飞桨 <./mlu_docs/index_cn.html>`_ : 介绍如何在寒武纪 MLU 环境上安装和使用飞桨。
.. toctree::
:hidden:
@@ -20,3 +21,4 @@
rocm_docs/index_cn.rst
npu_docs/index_cn.rst
ipu_docs/index_cn.rst
+ mlu_docs/index_cn.rst
diff --git a/docs/guides/hardware_support/mlu_docs/index_cn.rst b/docs/guides/hardware_support/mlu_docs/index_cn.rst
new file mode 100644
index 00000000000..a693ded7d80
--- /dev/null
+++ b/docs/guides/hardware_support/mlu_docs/index_cn.rst
@@ -0,0 +1,21 @@
+.. _cn_mlu_information:
+
+####################
+寒武纪 MLU 芯片运行飞桨
+####################
+
+寒武纪 MLU370 系列是专门用于深度学习的加速卡。Paddle MLU 版当前支持在寒武纪 MLU370 系列板卡上进行模型训练。
+
+参考以下内容可快速了解和体验在寒武纪板卡上运行飞桨:
+
+- `飞桨框架寒武纪 MLU 版安装说明 <./paddle_install_cn.html>`_ : 飞桨框架寒武纪 MLU 版安装说明
+- `飞桨框架寒武纪 MLU 版训练示例 <./train_example_cn.html>`_ : 飞桨框架寒武纪 MLU 版训练示例
+- `飞桨框架寒武纪 MLU 版支持模型 <./paddle_mlu_cn.html>`_ : 飞桨框架寒武纪 MLU 版支持模型列表
+
+
+.. toctree::
+ :hidden:
+
+ paddle_install_cn.md
+ train_example_cn.md
+ paddle_mlu_cn.md
diff --git a/docs/guides/hardware_support/mlu_docs/paddle_install_cn.md b/docs/guides/hardware_support/mlu_docs/paddle_install_cn.md
new file mode 100644
index 00000000000..9fecec86a5d
--- /dev/null
+++ b/docs/guides/hardware_support/mlu_docs/paddle_install_cn.md
@@ -0,0 +1,131 @@
+# 飞桨框架寒武纪 MLU 版安装说明
+
+飞桨框架支持基于 Python 的训练和原生预测,当前最新版本为 2.4.0,提供两种安装方式:
+
+- 通过预编译的 wheel 包安装
+- 通过源代码编译安装
+
+## 前置条件
+
+### 板卡安装
+
+寒武纪 MLU370 系列板卡安装,可以参见 [寒武纪官网板卡安装教程](https://developer.cambricon.com/index/curriculum/details/id/38/classid/7.html)。
+
+### 驱动安装
+
+寒武纪驱动安装,可以参见 [寒武纪官网驱动安装](https://www.cambricon.com/docs/sdk_1.6.0/driver_4.20.12/user_guide_4.20.12/index.html)。
+
+**注意**:建议安装的寒武纪驱动版本不低于 `v4.20.11`。
+
+
+## 镜像准备
+
+**注意**:当前仅提供基于 Ubuntu 18.04 & CNToolkit 3.0 的 Docker 镜像环境。
+
+首先需要准备支持寒武纪板卡运行环境的 Docker 镜像,可以直接从 Paddle 的官方镜像库拉取预先装有 CNToolkit 3.0 的 Docker 镜像来准备相应的运行环境。
+
+```bash
+# 拉取镜像
+docker pull registry.baidubce.com/device/paddle-mlu:cntoolkit3.0.2-cnnl1.13.0
+
+# 启动容器,注意这里的参数,例如 shm-size, device 等都需要配置
+# 可以通过 `-v` 参数来挂载训练所需的数据集目录,例如 -v /datasets:/datasets
+docker run --shm-size=128G \
+ --net=host \
+ --cap-add=sys_ptrace \
+ -v /usr/bin/cnmon:/usr/bin/cnmon \
+ -v `pwd`:/workspace \
+ -it --privileged \
+ --name paddle_mlu_$USER \
+ -w=/workspace \
+ registry.baidubce.com/device/paddle-mlu:cntoolkit3.0.2-cnnl1.13.0 \
+ /bin/bash
+
+# 检查容器是否可以正确识别寒武纪 MLU 设备
+cnmon
+
+# 预期得到以下结果(如下是一台 3 卡机器的信息):
+Sat Oct 8 11:22:22 2022
++------------------------------------------------------------------------------+
+| CNMON v4.20.11 |
++-------------------------------+----------------------+-----------------------+
+| Card VF Name Firmware | Inited Driver | Util Ecc-Error |
+| Fan Temp Pwr:Usage/Cap | Memory-Usage | vMemory-Usage |
+|===============================+======================+=======================|
+| 0 / MLU370-X4 v1.1.6 | On v4.20.11 | 0% N/A |
+| 0% 32C 30 W/ 150 W | 0 MiB/ 23308 MiB | 10240 MiB/1048576 MiB |
++-------------------------------+----------------------+-----------------------+
+| 1 / MLU370-X4 v1.1.6 | On v4.20.11 | 0% N/A |
+| 0% 33C 25 W/ 150 W | 0 MiB/ 23308 MiB | 10240 MiB/1048576 MiB |
++-------------------------------+----------------------+-----------------------+
+| 2 / MLU370-X4 v1.1.6 | On v4.20.11 | 0% N/A |
+| 0% 30C 26 W/ 150 W | 0 MiB/ 23308 MiB | 10240 MiB/1048576 MiB |
++-------------------------------+----------------------+-----------------------+
+
++------------------------------------------------------------------------------+
+| Processes: |
+| Card VF PID Command Line MLU Memory Usage |
+|==============================================================================|
+| No running processes found |
++------------------------------------------------------------------------------+
+```
+
+## 安装方式一:通过 wheel 包安装
+
+**注意**:当前仅提供 Python 3.7 的 wheel 安装包。
+
+**第一步**:下载并安装 Python 3.7 的 wheel 安装包
+
+```bash
+pip install https://paddle-device.bj.bcebos.com/mlu/paddlepaddle_mlu-2.4.0-cp37-cp37m-linux_x86_64.whl
+```
+
+**第二步**:验证安装包
+
+安装完成之后,运行如下命令。如果出现 `PaddlePaddle is installed successfully!`,说明已经安装成功。
+
+```bash
+python -c "import paddle; paddle.utils.run_check()"
+```
+
+## 安装方式二:通过源码编译安装
+
+**注意**:环境准备参见 [镜像准备](./paddle_install_cn.md#jingxiangzhunbei)
+
+**第一步**:下载 Paddle 源码并编译,CMAKE 编译选项含义请参见 [编译选项表](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#Compile)
+
+```bash
+# 下载源码(此处使用 release/2.4 分支)
+git clone -b release/2.4 https://github.com/PaddlePaddle/Paddle.git
+cd Paddle
+
+# 创建编译目录
+mkdir build && cd build
+
+# 执行 cmake
+export PADDLE_VERSION=2.4.0
+cmake .. -DPY_VERSION=3.7 -DWITH_MLU=ON -DCMAKE_BUILD_TYPE=Release -DWITH_DISTRIBUTE=ON -DWITH_CNCL=ON
+
+# 使用以下命令来编译
+make -j$(nproc)
+```
+
+**第二步**:安装与验证编译生成的 wheel 包
+
+编译完成之后进入 `Paddle/build/python/dist` 目录即可找到编译生成的 .whl 安装包,安装与验证命令如下:
+
+```bash
+# 安装命令
+python -m pip install -U paddlepaddle_mlu-2.4.0-cp37-cp37m-linux_x86_64.whl
+
+# 验证命令
+python -c "import paddle; paddle.utils.run_check()"
+```
+
+## 如何卸载
+
+请使用以下命令卸载 Paddle:
+
+```bash
+pip uninstall paddlepaddle-mlu
+```
diff --git a/docs/guides/hardware_support/mlu_docs/paddle_mlu_cn.md b/docs/guides/hardware_support/mlu_docs/paddle_mlu_cn.md
new file mode 100644
index 00000000000..10d60028fe2
--- /dev/null
+++ b/docs/guides/hardware_support/mlu_docs/paddle_mlu_cn.md
@@ -0,0 +1,63 @@
+
+# 飞桨框架寒武纪 MLU 版支持模型
+
+目前 Paddle MLU 版基于寒武纪 MLU370 系列板卡支持以下模型的单机单卡/单机多卡的训练。
+
+## 图像分类
+
+| 模型 | 领域 | 模型链接 | 编程范式 | 训练单机多卡支持 | 训练多机多卡支持 | 推理支持 |
+| ----------------- | -------- | ------------------------------------------------------------ | ------------- | -------------- | -------------- | -------------- |
+| ResNet50 | 图像分类 | [模型链接](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/ImageNet/ResNet/ResNet50.yaml) | 动态图 | 支持 | 支持 | 支持 |
+| VGG16/19 | 图像分类 | [模型链接](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/ImageNet/VGG/VGG16.yaml) | 动态图 | 支持 | 支持 | 支持 |
+| InceptionV4 | 图像分类 | [模型链接](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/ImageNet/Inception/InceptionV4.yaml) | 动态图 | 支持 | 支持 | 支持 |
+| MobileNetV3 | 图像分类 | [模型链接](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x1_0.yaml) | 动态图 | 支持 | 支持 | 支持 |
+
+
+## 目标检测
+
+| 模型 | 领域 | 模型链接 | 编程范式 | 训练单机多卡支持 | 训练多机多卡支持 | 推理支持 |
+| ----------------- | -------- | ------------------------------------------------------------ | ------------- | -------------- | -------------- | -------------- |
+| YOLOv3 | 目标检测 | [模型链接](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3) | 动态图 | 支持 | 支持 | 支持 |
+| PP-YOLO | 目标检测 | [模型链接](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo) | 动态图 | 支持 | 支持 | 支持 |
+| SSD | 目标检测 | [模型链接](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ssd) | 动态图 | 支持 | 支持 | 支持 |
+| Mask R-CNN | 目标检测 | [模型链接](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mask_rcnn) | 动态图 | 支持 | 支持 | 支持 |
+| Mask R-CNN + FPN | 目标检测 | [模型链接](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mask_rcnn) | 动态图 | 支持 | 支持 | 支持 |
+
+
+## 图像分割
+
+| 模型 | 领域 | 模型链接 | 编程范式 | 训练单机多卡支持 | 训练多机多卡支持 | 推理支持 |
+| ----------------- | -------- | ------------------------------------------------------------ | ------------- | -------------- | -------------- | -------------- |
+| DeepLabV3+ | 图像分割 | [模型链接](https://github.com/PaddlePaddle/PaddleSeg/tree/develop/configs/deeplabv3p) | 动态图 | 支持 | 不支持 | 支持 |
+| U-Net | 图像分割 | [模型链接](https://github.com/PaddlePaddle/PaddleSeg/tree/develop/configs/unet) | 动态图 | 支持 | 不支持 | 支持 |
+
+## 自然语言处理
+
+| 模型 | 领域 | 模型链接 | 编程范式 | 训练单机多卡支持 | 训练多机多卡支持 | 推理支持 |
+| ----------------- | -------- | ------------------------------------------------------------ | ------------- | -------------- | -------------- | -------------- |
+| BERT | NLP | [模型链接](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/language_model/bert) | 动态图 | 支持 | 支持 | 支持 |
+| Transformer | NLP | [模型链接](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/machine_translation/transformer) | 动态图 | 支持 | 支持 | 支持 |
+| Bi-LSTM | NLP | [模型链接](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/text_classification/rnn) | 动态图 | 支持 | 支持 | 支持 |
+
+
+## 字符识别
+
+| 模型 | 领域 | 模型链接 | 编程范式 | 训练单机多卡支持 | 训练多机多卡支持 | 推理支持 |
+| ----------------- | -------- | ------------------------------------------------------------ | ------------- | -------------- | -------------- | -------------- |
+| OCR-DB | 文本检测 | [模型链接](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/detection.md) | 动态图 | 支持 | 支持 | 支持 |
+| CRNN-CTC | 文本识别 | [模型链接](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/recognition.md) | 动态图 | 支持 | 支持 | 支持 |
+| OCR-Clas | 角度分类 | [模型链接](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/angle_class.md) | 动态图 | 支持 | 支持 | 支持 |
+| OCR-E2E | 字符识别 | [模型链接](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/pgnet.md) | 动态图 | 支持 | 支持 | 支持 |
+
+
+## 模型套件
+
+模型放置在飞桨模型套件中,各领域套件是 github.com/PaddlePaddle 下的独立 repo,git clone 下载即可获取所需的模型文件:
+
+| 领域 | 套件名称 | 分支/版本 |
+| ----------- | --------------- | ---------------- |
+| 图像分类 | PaddleClas | develop |
+| 目标检测 | PaddleDetection | develop |
+| 图像分割 | PaddleSeg | develop |
+| 自然语言处理 | PaddleNLP | develop |
+| 字符识别 | PaddleOCR | dygraph |
diff --git a/docs/guides/hardware_support/mlu_docs/train_example_cn.md b/docs/guides/hardware_support/mlu_docs/train_example_cn.md
new file mode 100644
index 00000000000..a2cffaab02e
--- /dev/null
+++ b/docs/guides/hardware_support/mlu_docs/train_example_cn.md
@@ -0,0 +1,31 @@
+# 飞桨框架 MLU 版训练示例
+
+使用寒武纪 MLU370 进行训练与使用 Intel CPU/Nvidia GPU 训练相同,当前 Paddle MLU 版本完全兼容 Paddle CUDA 版本的 API,直接使用原有的 GPU 训练命令和参数即可。
+
+#### ResNet50 训练示例
+
+**第一步**:安装 MLU 支持的 PaddlePaddle
+
+Paddle MLU 版的安装包请参考 [飞桨框架 MLU 版安装说明](./paddle_install_cn.html) 进行安装或编译。
+
+
+**第二步**:下载 ResNet50 代码,并准备 ImageNet1k 数据集
+
+```bash
+cd path_to_clone_PaddleClas
+git clone https://github.com/PaddlePaddle/PaddleClas.git
+```
+也可以访问 PaddleClas 的 [GitHub Repo](https://github.com/PaddlePaddle/PaddleClas) 直接下载源码。请根据[数据说明](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/zh_CN/data_preparation/classification_dataset.md)文档准备 ImageNet1k 数据集。
+
+**第三步**:运行训练
+
+使用飞桨 PaddleXXX 套件运行 MLU 时,可以通过设置 `Global.device` 参数为 `mlu` 来指定设备,其他模型也可以参考该使用方式。
+
+```bash
+export MLU_VISIBLE_DEVICES=0,1,2,3
+
+cd PaddleClas/
+python3.7 -m paddle.distributed.launch --mlus="0,1,2,3" tools/train.py \
+ -c ./ppcls/configs/ImageNet/ResNet/ResNet50.yaml \
+ -o Global.device=mlu
+```
diff --git a/docs/guides/hardware_support/xpu_docs/paddle_2.0_xpu2_cn.md b/docs/guides/hardware_support/xpu_docs/paddle_2.0_xpu2_cn.md
index 8ffb3b5a91a..18e8b2bbccd 100644
--- a/docs/guides/hardware_support/xpu_docs/paddle_2.0_xpu2_cn.md
+++ b/docs/guides/hardware_support/xpu_docs/paddle_2.0_xpu2_cn.md
@@ -1,26 +1,66 @@
-# 飞桨对昆仑 2 代芯片的支持
+# 飞桨对昆仑芯 2 代芯片的支持
-飞桨自 2.3rc 版本起支持在昆仑 2 代芯片上(R200,R300)运行,经验证的模型训练的支持情况如下:
+飞桨自 2.4rc 版本起支持在昆仑芯 2 代芯片上(R200,R300,R200-8F,R200-8FS,RG800)运行,经验证的模型训练的支持情况如下:
## 训练支持
可进行单机单卡/单机多卡训练的模型,如下所示:
-| 模型 | 领域 | 编程范式 | 可用的 CPU 类型 | 单机单卡支持 | 单机多卡支持 |
-| ------------------ | -------- |------------- | ----------------------- | -------------- | -------------- |
-| ResNet50 | 图像分类 | 动态图 | X86(Intel) | 支持 |- |
-| MobileNet_v3 | 图像分类 | 动态图 | X86(Intel) | 支持 |- |
-| UNet | 图像分割 | 动态图 | X86(Intel) | 支持 |- |
-| Yolov3-DarkNet53 | 目标检测 | 动态图 | X86(Intel) | 支持 |- |
-| SSD-ResNet34 | 目标检测 | 动态图 | X86(Intel) | 支持 |支持 |
-| OCR-DB | 文字检测 | 动态图 | X86(Intel) | 支持 |- |
-| Bert-Base | NLP | 静态图 | X86(Intel) | 支持 |支持 |
-| Transformer | NLP | 静态图 | X86(Intel) | 支持 |支持 |
-| GPT-2 | NLP | 动态图 | X86(Intel) | 支持 |- |
-| DeepFM | 推荐 | 动态图 | X86(Intel) | 支持 |- |
-| Wide&Deep | 推荐 | 动态图 | X86(Intel) | 支持 |- |
-
-
+| 模型 | 领域 | 编程范式 | 可用的 CPU 类型 | 单机单卡支持 | 单机多卡支持 |
+| --- | --- | --- | --- | --- | --- |
+| ResNet50 | 图像分类 | 动态图 | X86(Intel) | 支持 | - |
+| ResNet101 | 图像分类 | 动态图 | X86(Intel) | 支持 | - |
+| MobileNet_v3 | 图像分类 | 动态图 | X86(Intel) | 支持 | - |
+| MobileNetV2 | 图像分类 | 动态图 | X86(Intel) | 支持 | - |
+| VGG19 | 图像分类 | 动态图 | X86(Intel) | 支持 | - |
+| VGG16 | 图像分类 | 动态图 | X86(Intel) | 支持 | - |
+| PP-LCNet | 图像分类 | 动态图 | X86(Intel) | 支持 | - |
+| PP-HGNet | 图像分类 | 动态图 | X86(Intel) | 支持 | - |
+| InceptionV4 | 图像分类 | 动态图 | X86(Intel) | 支持 | - |
+| UNet | 图像分割 | 动态图 | X86(Intel) | 支持 | - |
+| deeplabv3 | 图像分割 | 动态图 | X86(Intel) | 支持 | - |
+| HRNet | 图像分割 | 动态图 | X86(Intel) | 支持 | - |
+| PP-LiteSeg | 图像分割 | 动态图 | X86(Intel) | 支持 | - |
+| PP-humansegv2 | 图像分割 | 动态图 | X86(Intel) | 支持 | - |
+| PP-Matting | 图像分割 | 动态图 | X86(Intel) | 支持 | - |
+| MaskRcnn | 目标检测 | 动态图 | X86(Intel) | 支持 | - |
+| FasterRcnn | 目标检测 | 动态图 | X86(Intel) | 支持 | - |
+| fairmot | 目标检测 | 动态图 | X86(Intel) | 支持 | - |
+| Yolov3-DarkNet53 | 目标检测 | 动态图 | X86(Intel) | 支持 | - |
+| SSD-ResNet34 | 目标检测 | 动态图 | X86(Intel) | 支持 | 支持 |
+| Yolov3-mobileNetv1 | 目标检测 | 动态图 | X86(Intel) | 支持 | - |
+| PPYoloE | 目标检测 | 动态图 | X86(Intel) | 支持 | - |
+| deepsort | 目标检测 | 动态图 | X86(Intel) | 支持 | - |
+| ssd-mv1 | 目标检测 | 动态图 | X86(Intel) | 支持 | - |
+| ssd-vgg16 | 目标检测 | 动态图 | X86(Intel) | 支持 | - |
+| PP-picoDet | 目标检测 | 动态图 | X86(Intel) | 支持 | - |
+| PPYolov2 | 目标检测 | 动态图 | X86(Intel) | 支持 | - |
+| OCR-DB | 文字检测 | 动态图 | X86(Intel) | 支持 | - |
+| OCR-crnn | 文字检测 | 动态图 | X86(Intel) | 支持 | - |
+| PPOCR-v2 | 文字检测 | 动态图 | X86(Intel) | 支持 | - |
+| PPOCR-v3 | 文字检测 | 动态图 | X86(Intel) | 支持 | - |
+| Bert-Base | NLP | 静态图 | X86(Intel) | 支持 | 支持 |
+| Transformer | NLP | 静态图 | X86(Intel) | 支持 | 支持 |
+| GPT-2 | NLP | 动态图 | X86(Intel) | 支持 | - |
+| ernie-base | NLP | 动态图 | X86(Intel) | 支持 | - |
+| ernie 3.0 medium | NLP | 动态图 | X86(Intel) | 支持 | - |
+| lstm | NLP | 动态图 | X86(Intel) | 支持 | - |
+| seq2seq | NLP | 动态图 | X86(Intel) | 支持 | - |
+| DeepFM | 推荐 | 动态图 | X86(Intel) | 支持 | - |
+| Wide&Deep | 推荐 | 动态图 | X86(Intel) | 支持 | - |
+| dlrm | 推荐 | 动态图 | X86(Intel) | 支持 | - |
+| deepspeech2 | 语音识别 | 动态图 | X86(Intel) | 支持 | - |
+| speedyspeech | 语音合成 | 动态图 | X86(Intel) | 支持 | - |
+| dqn | 强化学习 | 动态图 | X86(Intel) | 支持 | - |
+| ppo | 强化学习 | 动态图 | X86(Intel) | 支持 | - |
+| ddpg | 强化学习 | 动态图 | X86(Intel) | 支持 | - |
+| A2C | 强化学习 | 动态图 | X86(Intel) | 支持 | - |
+| TD3 | 强化学习 | 动态图 | X86(Intel) | 支持 | - |
+| SAC | 强化学习 | 动态图 | X86(Intel) | 支持 | - |
+| MADDPG | 强化学习 | 动态图 | X86(Intel) | 支持 | - |
+| CQL | 强化学习 | 动态图 | X86(Intel) | 支持 | - |
+| ES | 强化学习 | 动态图 | X86(Intel) | 支持 | - |
+| pp-tsm | 视频分类 | 动态图 | X86(Intel) | 支持 | - |
模型放置在飞桨模型套件中,作为 github.com/PaddlePaddle 下的独立 repo 存在,git clone 下载即可获取所需的模型文件:
@@ -28,9 +68,7 @@
| -------- | --------------- | ----------- |
| 图像分类 | [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) | [develop](https://github.com/PaddlePaddle/PaddleClas/tree/develop) |
| 目标检测 | [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) | [develop](https://github.com/PaddlePaddle/PaddleDetection/tree/develop) |
-| 图像分割 | [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) | [develop](https://github.com/PaddlePaddle/PaddleSeg/tree/develop) |
-| NLP | [PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP) | [develop](https://github.com/PaddlePaddle/PaddleNLP/tree/develop) |
-| OCR | [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) | [dygraph](https://github.com/PaddlePaddle/PaddleOCR/tree/dygraph) |
-| 推荐 | [PaddleREC](https://github.com/PaddlePaddle/PaddleRec) | [master](https://github.com/PaddlePaddle/PaddleRec/tree/master) |
-
-* 注:支持基于 Kermel Primitive 算子的昆仑 2 代芯片支持,[点击这里](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/07_new_op/kernel_primitive_api/index_cn.html)。
+| 图像分割 | [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) | [develop](https://github.com/PaddlePaddle/PaddleSeg/tree/develop) |
+| NLP | [PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP) | [develop](https://github.com/PaddlePaddle/PaddleNLP/tree/develop) |
+| OCR | [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) | [dygraph](https://github.com/PaddlePaddle/PaddleOCR/tree/dygraph) |
+| 推荐 | [PaddleREC](https://github.com/PaddlePaddle/PaddleRec) | [master](https://github.com/PaddlePaddle/PaddleRec/tree/master) |
diff --git a/docs/guides/infer/inference/inference_cn.md b/docs/guides/infer/inference/inference_cn.md
index f8523ac3ef5..4b29e81ab01 100644
--- a/docs/guides/infer/inference/inference_cn.md
+++ b/docs/guides/infer/inference/inference_cn.md
@@ -9,8 +9,8 @@ Paddle Inference 功能特性丰富,性能优异,针对不同平台不同的
一些常见的文档链接如下:
- 完整使用文档位于:[Paddle Inference 文档](https://www.paddlepaddle.org.cn/inference/product_introduction/inference_intro.html)
- 代码示例位于[inference demo](https://github.com/PaddlePaddle/Paddle-Inference-Demo)
-- 点此 [安装与编译 Linux 预测库](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html)
-- 点此 [安装与编译 Windows 预测库](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html#windows)
+- 点此 [下载安装 Linux 预测库](https://www.paddlepaddle.org.cn/inference/v2.4/guides/install/download_lib.html)
+- 点此 [下载安装 Windows 预测库](https://www.paddlepaddle.org.cn/inference/v2.4/guides/install/download_lib.html#windows)
## 与主框架 model.predict 区别
diff --git a/docs/guides/jit/case_analysis_cn.md b/docs/guides/jit/case_analysis_cn.md
index fb152db33a8..8d6e44382cc 100644
--- a/docs/guides/jit/case_analysis_cn.md
+++ b/docs/guides/jit/case_analysis_cn.md
@@ -41,7 +41,7 @@
+ 建议模型搭建时,尽量考虑将预测主逻辑放到 ``forward`` 函数中
+ 将训练独有的逻辑放到 子函数 中,通过 ``if self.training`` 来控制
- + 最大程度抽离 训练和预测 的逻辑为 **公共子函数**
+ + 最大程度抽离 **训练和预测** 的逻辑为 **公共子函数**
## 二、何时指定 InputSpec?
@@ -129,7 +129,7 @@ def forward(self, x):
## 四、to_tensor() 的使用
-``paddle.to_tensor()`` 接口是动态图模型代码中使用比较频繁的一个接口。 ``to_tensor`` 功能强大,将可以将一个 ``scalar`` , ``list`` ,``tuple`` , ``numpy.ndarray`` 转为 ``paddle.Tensor`` 类型。
+``paddle.to_tensor()`` 接口是动态图模型代码中使用比较频繁的一个接口。 ``to_tensor`` 功能强大,可以将一个 ``scalar`` , ``list`` ,``tuple`` , ``numpy.ndarray`` 转为 ``paddle.Tensor`` 类型。
此接口是动态图独有的接口,在动转静时,会转换为 ``assign`` 接口:
@@ -191,7 +191,7 @@ class SimpleNet(paddle.nn.Layer):
## 五、 建议都继承 nn.Layer
-动态图模型常常包含很多嵌套的子网络,建议各个自定义的子网络 ``sublayer`` **无论是否包含了参数,都继承 ``nn.Layer`` .**
+动态图模型常常包含很多嵌套的子网络,建议各个自定义的子网络 ``sublayer`` **无论是否包含了参数,都继承** ``nn.Layer``。
从 **Parameters 和 Buffers** 章节可知,有些 ``paddle.to_tensor`` 接口转来的 ``Tensor`` 也可能参与预测逻辑分支的计算,即模型导出时,也需要作为参数序列化保存到 ``.pdiparams`` 文件中。
diff --git a/docs/guides/performance_improving/paddle_tensorrt_infer.md b/docs/guides/performance_improving/paddle_tensorrt_infer.md
index 2890eceb4ab..4590884c5a8 100644
--- a/docs/guides/performance_improving/paddle_tensorrt_infer.md
+++ b/docs/guides/performance_improving/paddle_tensorrt_infer.md
@@ -60,7 +60,7 @@ config->EnableTensorRtEngine(1 << 20 /* workspace_size*/,
## Paddle-TRT 样例编译测试
1. 下载或编译带有 TensorRT 的 paddle 预测库,参考[安装与编译 C++预测库](../../inference_deployment/inference/build_and_install_lib_cn.html)。
-2. 从[NVIDIA 官网](https://developer.nvidia.com/nvidia-tensorrt-download)下载对应本地环境中 cuda 和 cudnn 版本的 TensorRT,需要登陆 NVIDIA 开发者账号。
+2. 从[NVIDIA 官网](https://developer.nvidia.com/tensorrt)下载与本地环境中 CUDA 和 cuDNN 版本对应的 TensorRT,需要登录 NVIDIA 开发者账号。
3. 下载[预测样例](https://paddle-inference-dist.bj.bcebos.com/tensorrt_test/paddle_inference_sample_v1.7.tar.gz)并解压,进入`sample/paddle-TRT`目录下。
`paddle-TRT` 文件夹目录结构如下:
diff --git a/docs/guides/performance_improving/paddle_tensorrt_infer_en.md b/docs/guides/performance_improving/paddle_tensorrt_infer_en.md
index 0acc384ab2a..29bd16445b9 100644
--- a/docs/guides/performance_improving/paddle_tensorrt_infer_en.md
+++ b/docs/guides/performance_improving/paddle_tensorrt_infer_en.md
@@ -53,7 +53,7 @@ The details of this interface is as following:
## Paddle-TRT example compiling test
1. Download or compile Paddle Inference with TensorRT support, refer to [Install and Compile C++ Inference Library](../../inference_deployment/inference/build_and_install_lib_en.html).
-2. Download NVIDIA TensorRT(with consistent version of cuda and cudnn in local environment) from [NVIDIA TensorRT](https://developer.nvidia.com/nvidia-tensorrt-download) with an NVIDIA developer account.
+2. Download NVIDIA TensorRT (matching the CUDA and cuDNN versions in your local environment) from [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt); an NVIDIA developer account is required.
3. Download [Paddle Inference sample](https://paddle-inference-dist.bj.bcebos.com/tensorrt_test/paddle_inference_sample_v1.7.tar.gz) and uncompress, and enter `sample/paddle-TRT` directory.
`paddle-TRT` directory structure is as following:
diff --git a/docs/install/FAQ.md b/docs/install/FAQ.md
index 0cc2ecf46ac..137bc321ecd 100644
--- a/docs/install/FAQ.md
+++ b/docs/install/FAQ.md
@@ -63,9 +63,9 @@
> 是的。我们的 Docker image 运行一个 [Bash 脚本](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/paddle_build.sh)。这个脚本调用`make -j$(nproc)` 来启动和 CPU 核一样多的进程来并行编译。
-- 在 Windows/macOS 上编译很慢?
+- 在 Windows/macOS 上编译很慢?
- > Docker 在 Windows 和 macOS 都可以运行。不过实际上是运行在一个 Linux 虚拟机上。可能需要注意给这个虚拟机多分配一些 CPU 和内存,以保证编译高效。具体做法请参考[issue627](https://github.com/PaddlePaddle/Paddle/issues/627)。
+ > Docker 在 Windows 和 macOS 都可以运行。不过实际上是运行在一个 Linux 虚拟机上。可能需要注意给这个虚拟机多分配一些 CPU 和内存,以保证编译高效。具体做法请参考[issue627](https://github.com/PaddlePaddle/Paddle/issues/627)。
- 磁盘不够?
@@ -89,7 +89,7 @@
-- macOS 下安装 PaddlePaddle 后 import paddle.fluid 出现`Fatal Python error: PyThreadState_Get: no current thread running`错误
+- macOS 下安装 PaddlePaddle 后 import paddle.fluid 出现`Fatal Python error: PyThreadState_Get: no current thread running`错误
- For Python2.7.x (install by brew): 请使用`export LD_LIBRARY_PATH=/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7 && export DYLD_LIBRARY_PATH=/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7`
- For Python2.7.x (install by Python.org): 请使用`export LD_LIBRARY_PATH=/Library/Frameworks/Python.framework/Versions/2.7 && export DYLD_LIBRARY_PATH=/Library/Frameworks/Python.framework/Versions/2.7`
diff --git a/docs/install/FAQ_en.md b/docs/install/FAQ_en.md
index 3aae1e0abbb..7e0da11d2a5 100644
--- a/docs/install/FAQ_en.md
+++ b/docs/install/FAQ_en.md
@@ -59,7 +59,7 @@
- Why use Docker?
- > Installing the tools and configurations in a Docker image standardizes the build environment. This way, if you encounter problems, others can reproduce the problem to help. In addition, for developers accustomed to using Windows and macOS, there is no need to configure a cross-compilation environment using Docker.
+ > Installing the tools and configurations in a Docker image standardizes the build environment. This way, if you encounter problems, others can reproduce the problem to help. In addition, for developers accustomed to using Windows and macOS, there is no need to configure a cross-compilation environment using Docker.
- Can I choose not to use Docker?
@@ -86,9 +86,9 @@
> If you develop with your own computer, you will naturally have admin privileges (sudo). If you are developing from a public computer, you need to ask the administrator to install and configure Docker. In addition, the PaddlePaddle project is working hard to support other container technologies that don't require sudo, such as rkt.
-- Is compiling slow on Windows/macOS?
+- Is compiling slow on Windows/macOS?
- > Docker runs on both Windows and macOS. However, it is actually running on a Linux virtual machine. It may be necessary to pay attention to allocate more CPU and memory to this virtual machine to ensure efficient compilation. Please refer to [issue627](https://github.com/PaddlePaddle/Paddle/issues/627) for details.
+ > Docker runs on both Windows and macOS. However, it is actually running on a Linux virtual machine. It may be necessary to allocate more CPU and memory to this virtual machine to ensure efficient compilation. Please refer to [issue627](https://github.com/PaddlePaddle/Paddle/issues/627) for details.
- Not enough disk?
@@ -109,7 +109,7 @@
> The main reason for this problem is that your graphics card driver is lower than the corresponding CUDA version. Please ensure that your graphics card driver supports the CUDA version used.
-- `Fatal Python error: PyThreadState_Get: no current thread running` error occurs when importing paddle.fluid after installing PaddlePaddle on macOS.
+- `Fatal Python error: PyThreadState_Get: no current thread running` error occurs when importing paddle.fluid after installing PaddlePaddle on macOS.
- For Python2.7.x (install by brew): Please use `export LD_LIBRARY_PATH=/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7 && export DYLD_LIBRARY_PATH=/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7`
diff --git a/docs/install/Tables.md b/docs/install/Tables.md
index fa709dcb140..c80307069a1 100644
--- a/docs/install/Tables.md
+++ b/docs/install/Tables.md
@@ -1,6 +1,98 @@
# 附录
+
+
+## **飞桨支持的 NVIDIA GPU 架构及安装方式**
+
+| GPU 架构 | Compute Capability | 对应 GPU 硬件型号 | 请下载以下 CUDA 版本的飞桨安装包 |
+| --- | --- | --- | --- |
+| Fermi | sm_20 | GeForce 400, 500, 600, GT-630 | 不支持 |
+| Kepler | sm_30 | GeForce 700, GT-730 | 不支持 |
+| Kepler | sm_35 | Tesla K40 | CUDA10 |
+| Kepler | sm_37 | Tesla K80 | CUDA10 |
+| Maxwell | sm_50 | Tesla/Quadro M series | CUDA10、CUDA11 |
+| Maxwell | sm_52 | Quadro M6000, GeForce 900, GTX-970, GTX-980, GTX Titan X | CUDA10、CUDA11 |
+| Pascal | sm_60 | Quadro GP100, Tesla P100, DGX-1 | CUDA10、CUDA11 |
+| Pascal | sm_61 | GTX 1080, GTX 1070, GTX 1060, GTX 1050, GTX 1030 (GP108), GT 1010 (GP108), Titan Xp, Tesla P40, Tesla P4 | CUDA10、CUDA11 |
+| Volta | sm_70 | DGX-1 with Volta, Tesla V100, GTX 1180 (GV104), Titan V, Quadro GV100 | CUDA10、CUDA11 |
+| Turing | sm_75 | GTX/RTX Turing – GTX 1660 Ti, RTX 2060, RTX 2070, RTX 2080, Titan RTX, Quadro RTX 4000, Quadro RTX 5000, Quadro RTX 6000, Quadro RTX 8000, Quadro T1000/T2000, Tesla T4 | CUDA10、CUDA11 |
+| Ampere | sm_80 | NVIDIA A100, GA100, NVIDIA DGX-A100 | CUDA11 |
+| Ampere | sm_86 | Tesla GA10x cards, RTX Ampere – RTX 3080, GA102 – RTX 3090, RTX A2000, A3000, RTX A4000, A5000, A6000, NVIDIA A40, GA106 – RTX 3060, GA104 – RTX 3070, GA107 – RTX 3050, RTX A10, RTX A16, RTX A40, A2 Tensor Core GPU | CUDA11、CUDA11.2(推荐) |
+
## **编译依赖表**
@@ -27,9 +119,9 @@
|
- Clang (macOS Only) |
+ Clang (macOS Only) |
9.0 及以上 |
- 通常使用 macOS 10.11 及以上的系统对应的 Clang 版本即可 |
+ 通常使用 macOS 10.11 及以上的系统对应的 Clang 版本即可 |
|
@@ -102,7 +194,7 @@
unrar |
|
|
- brew install rar (For macOS), apt-get install unrar (For Ubuntu) |
+ brew install unrar (For macOS), apt-get install unrar (For Ubuntu) |
@@ -228,11 +320,11 @@ PaddePaddle 通过编译时指定路径来实现引用各种 BLAS/CUDA/cuDNN 库
- paddlepaddle==[版本号] 例如 paddlepaddle==2.2.1 |
+ paddlepaddle==[版本号] 例如 paddlepaddle==2.4.2 |
只支持 CPU 对应版本的 PaddlePaddle,具体版本请参见Pypi |
- paddlepaddle-gpu==[版本号] 例如 paddlepaddle-gpu==2.2.1 |
+ paddlepaddle-gpu==[版本号] 例如 paddlepaddle-gpu==2.4.2 |
默认安装支持 CUDA 10.2 和 cuDNN 7 的对应[版本号]的 PaddlePaddle 安装包 |
@@ -242,7 +334,7 @@ PaddePaddle 通过编译时指定路径来实现引用各种 BLAS/CUDA/cuDNN 库
您可以在 [Release History](https://pypi.org/project/paddlepaddle-gpu/#history) 中找到 PaddlePaddle-gpu 的各个发行版本。
> 其中`postXX` 对应的是 CUDA 和 cuDNN 的版本,`postXX`之前的数字代表 Paddle 的版本
-需要注意的是,命令中 paddlepaddle-gpu==2.2.1
在 windows 环境下,会默认安装支持 CUDA 10.2 和 cuDNN 7 的对应[版本号]的 PaddlePaddle 安装包
+需要注意的是,命令中 paddlepaddle-gpu==2.4.2
在 Windows 环境下,会默认安装支持 CUDA 10.2 和 cuDNN 7 的对应[版本号]的 PaddlePaddle 安装包
@@ -258,195 +350,203 @@ PaddePaddle 通过编译时指定路径来实现引用各种 BLAS/CUDA/cuDNN 库
cp37-cp37m |
cp38-cp38 |
cp39-cp39 |
+ cp310-cp310 |
cpu-mkl-avx |
- paddlepaddle-2.2.1-cp36-cp36m-linux_x86_64.whl |
- paddlepaddle-2.2.1-cp37-cp37m-linux_x86_64.whl |
- paddlepaddle-2.2.1-cp38-cp38-linux_x86_64.whl |
- paddlepaddle-2.2.1-cp39-cp39-linux_x86_64.whl |
+ paddlepaddle-2.4.2-cp36-cp36m-linux_x86_64.whl |
+ paddlepaddle-2.4.2-cp37-cp37m-linux_x86_64.whl |
+ paddlepaddle-2.4.2-cp38-cp38-linux_x86_64.whl |
+ paddlepaddle-2.4.2-cp39-cp39-linux_x86_64.whl |
+ paddlepaddle-2.4.2-cp310-cp310-linux_x86_64.whl |
cpu-openblas-avx |
- |
- |
- paddlepaddle-2.2.1-cp38-cp38-linux_x86_64.whl |
+ paddlepaddle-2.4.2-cp38-cp38-linux_x86_64.whl |
+ - |
- |
cpu-mkl-noavx |
- |
- |
- paddlepaddle-2.2.1-cp38-cp38-linux_x86_64.whl |
+ paddlepaddle-2.4.2-cp38-cp38-linux_x86_64.whl |
+ - |
- |
cpu-openblas-noavx |
- |
- |
- paddlepaddle-2.2.1-cp38-cp38-linux_x86_64.whl |
+ paddlepaddle-2.4.2-cp38-cp38-linux_x86_64.whl |
- |
-
-
- cuda10.1-cudnn7-mkl-gcc5.4-avx |
-
- paddlepaddle_gpu-2.2.1.post101-cp36-cp36m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post101-cp37-cp37m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post101-cp38-cp38-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post101-cp39-cp39-linux_x86_64.whl |
-
-
- cuda10.1-cudnn7-mkl-gcc5.4-noavx |
- - |
- - |
-
- paddlepaddle_gpu-2.2.1.post101-cp38-cp38-linux_x86_64.whl |
- |
cuda10.2-cudnn7-mkl-gcc8.2-avx |
-
- paddlepaddle_gpu-2.2.1-cp36-cp36m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1-cp37-cp37m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1-cp38-cp38-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1-cp39-cp39-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2-cp36-cp36m-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2-cp37-cp37m-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2-cp38-cp38-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2-cp39-cp39-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2-cp310-cp310-linux_x86_64.whl |
cuda10.2-cudnn7-mkl-gcc8.2-noavx |
- |
- |
-
- paddlepaddle_gpu-2.2.1-cp38-cp38-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2-cp38-cp38-linux_x86_64.whl |
+ - |
- |
-
-
- cuda11.0-cudnn8.0-mkl-gcc8.2-avx |
-
- paddlepaddle_gpu-2.2.1.post110-cp36-cp36m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post110-cp37-cp37m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post110-cp38-cp38-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post110-cp39-cp39-linux_x86_64.whl |
-
-
- cuda11.1-cudnn8.1-mkl-gcc8.2-avx |
-
- paddlepaddle_gpu-2.2.1.post111-cp36-cp36m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post111-cp37-cp37m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post111-cp38-cp38-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post111-cp39-cp39-linux_x86_64.whl |
cuda11.2-cudnn8.1-mkl-gcc8.2-avx |
-
- paddlepaddle_gpu-2.2.1.post112-cp36-cp36m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post112-cp37-cp37m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post112-cp38-cp38-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post112-cp39-cp39-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post112-cp36-cp36m-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post112-cp37-cp37m-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post112-cp38-cp38-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post112-cp39-cp39-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post112-cp310-cp310-linux_x86_64.whl |
+
+
+ cuda11.6-cudnn8.4-mkl-gcc8.2-avx |
+
+ paddlepaddle_gpu-2.4.2.post116-cp36-cp36m-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post116-cp37-cp37m-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post116-cp38-cp38-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post116-cp39-cp39-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post116-cp310-cp310-linux_x86_64.whl |
+
+
+ cuda11.7-cudnn8.4-mkl-gcc8.2-avx |
+
+ paddlepaddle_gpu-2.4.2.post117-cp36-cp36m-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post117-cp37-cp37m-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post117-cp38-cp38-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post117-cp39-cp39-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post117-cp310-cp310-linux_x86_64.whl |
macos-cpu-openblas |
-
- paddlepaddle-2.2.1-cp36-cp36m-macosx_10_6_intel.whl |
-
- paddlepaddle-2.2.1-cp37-cp37m-macosx_10_6_intel.whl |
-
- paddlepaddle-2.2.1-cp38-cp38-macosx_10_14_x86_64.whl |
-
- paddlepaddle-2.2.1-cp39-cp39-macosx_10_14_x86_64.whl |
+
+ paddlepaddle-2.4.2-cp36-cp36m-macosx_10_6_intel.whl |
+
+ paddlepaddle-2.4.2-cp37-cp37m-macosx_10_6_intel.whl |
+
+ paddlepaddle-2.4.2-cp38-cp38-macosx_10_14_x86_64.whl |
+
+ paddlepaddle-2.4.2-cp39-cp39-macosx_10_14_x86_64.whl |
+
+ paddlepaddle-2.4.2-cp310-cp310-macosx_10_14_universal2.whl |
+
+
+ macos-cpu-openblas-noavx |
+
+ paddlepaddle-2.4.2-cp36-cp36m-macosx_10_6_intel.whl |
+
+ paddlepaddle-2.4.2-cp37-cp37m-macosx_10_6_intel.whl |
+
+ paddlepaddle-2.4.2-cp38-cp38-macosx_10_14_x86_64.whl |
+
+ paddlepaddle-2.4.2-cp39-cp39-macosx_10_14_x86_64.whl |
+
+ paddlepaddle-2.4.2-cp310-cp310-macosx_10_14_universal2.whl |
+
+
+ macos-cpu-openblas-m1 |
+ - |
+ - |
+
+ paddlepaddle-2.4.2-cp38-cp38-macosx_11_0_arm64.whl |
+
+ paddlepaddle-2.4.2-cp39-cp39-macosx_11_0_arm64.whl |
+
+ paddlepaddle-2.4.2-cp310-cp310-macosx_11_0_arm64.whl |
win-cpu-mkl-avx |
- paddlepaddle-2.2.1-cp36-cp36m-win_amd64.whl |
- paddlepaddle-2.2.1-cp37-cp37m-win_amd64.whl |
- paddlepaddle-2.2.1-cp38-cp38-win_amd64.whl |
- paddlepaddle-2.2.1-cp39-cp39-win_amd64.whl |
+ paddlepaddle-2.4.2-cp36-cp36m-win_amd64.whl |
+ paddlepaddle-2.4.2-cp37-cp37m-win_amd64.whl |
+ paddlepaddle-2.4.2-cp38-cp38-win_amd64.whl |
+ paddlepaddle-2.4.2-cp39-cp39-win_amd64.whl |
+ paddlepaddle-2.4.2-cp310-cp310-win_amd64.whl |
win-cpu-mkl-noavx |
- |
- |
- paddlepaddle-2.2.1-cp38-cp38-win_amd64.whl |
+ paddlepaddle-2.4.2-cp38-cp38-win_amd64.whl |
+ - |
- |
win-cpu-openblas-avx |
- |
- |
- paddlepaddle-2.2.1-cp38-cp38-win_amd64.whl |
+ paddlepaddle-2.4.2-cp38-cp38-win_amd64.whl |
+ - |
- |
win-cpu-openblas-noavx |
- |
- |
- paddlepaddle-2.2.1-cp38-cp38-win_amd64.whl |
+ paddlepaddle-2.4.2-cp38-cp38-win_amd64.whl |
- |
-
-
- win-cuda10.1-cudnn7-mkl-vs2017-avx |
- paddlepaddle_gpu-2.2.1.post101-cp36-cp36m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post101-cp37-cp37m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post101-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post101-cp39-cp39-win_amd64.whl |
-
-
- win-cuda10.1-cudnn7-mkl-vs2017-noavx |
- - |
- - |
- paddlepaddle_gpu-2.2.1.post101-cp38-cp38-win_amd64.whl |
- |
win-cuda10.2-cudnn7-mkl-vs2017-avx |
- paddlepaddle_gpu-2.2.1.post102-cp36-cp36m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post102-cp37-cp37m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post102-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post102-cp39-cp39-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2-cp36-cp36m-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2-cp37-cp37m-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2-cp38-cp38-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2-cp39-cp39-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2-cp310-cp310-win_amd64.whl |
win-cuda10.2-cudnn7-mkl-vs2017-noavx |
- |
- paddlepaddle_gpu-2.2.1.post102-cp37-cp37m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post102-cp38-cp38-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2-cp37-cp37m-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2-cp38-cp38-win_amd64.whl |
+ - |
- |
- win-cuda11.0-cudnn8.0-mkl-vs2017-avx |
- paddlepaddle_gpu-2.2.1.post110-cp36-cp36m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post110-cp37-cp37m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post110-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post110-cp39-cp39-win_amd64.whl |
-
-
- win-cuda11.1-cudnn8.1-mkl-vs2017-avx |
- paddlepaddle_gpu-2.2.1.post111-cp36-cp36m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post111-cp37-cp37m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post111-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post111-cp39-cp39-win_amd64.whl |
+ win-cuda11.2-cudnn8.2-mkl-vs2017-avx |
+ paddlepaddle_gpu-2.4.2.post112-cp36-cp36m-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2.post112-cp37-cp37m-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2.post112-cp38-cp38-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2.post112-cp39-cp39-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2.post112-cp310-cp310-win_amd64.whl |
- win-cuda11.2-cudnn8.2-mkl-vs2017-avx |
- paddlepaddle_gpu-2.2.1.post112-cp36-cp36m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post112-cp37-cp37m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post112-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post112-cp39-cp39-win_amd64.whl |
+ win-cuda11.6-cudnn8.4-mkl-vs2017-avx |
+ paddlepaddle_gpu-2.4.2.post116-cp36-cp36m-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2.post116-cp37-cp37m-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2.post116-cp38-cp38-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2.post116-cp39-cp39-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2.post116-cp310-cp310-win_amd64.whl |
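Every filename in the tables above follows the standard wheel naming convention (`distribution-version-python_tag-abi_tag-platform_tag.whl`), which is what the "platform tag" note below refers to. A minimal sketch of splitting one of these names into its components (`parse_wheel_name` is a hypothetical helper, not part of any Paddle tooling):

```python
def parse_wheel_name(filename: str) -> dict:
    """Split a wheel filename (without build tag) into its five naming components."""
    stem = filename[: -len(".whl")]
    distribution, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "distribution": distribution,   # e.g. paddlepaddle_gpu
        "version": version,             # e.g. 2.4.2.post112
        "python_tag": python_tag,       # e.g. cp39
        "abi_tag": abi_tag,             # e.g. cp39
        "platform_tag": platform_tag,   # e.g. linux_x86_64
    }

info = parse_wheel_name("paddlepaddle_gpu-2.4.2.post112-cp39-cp39-linux_x86_64.whl")
```

This is how the table columns map onto the filenames: each column is a `python_tag-abi_tag` pair, each row a build configuration reflected in the version and platform tags.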
@@ -493,68 +593,54 @@ platform tag: 类似 'linux_x86_64', 'any'
版本说明 |
- cp36-cp36m |
cp37-cp37m |
cp38-cp38 |
cp39-cp39 |
+ cp310-cp310 |
- cpu-mkl |
- paddlepaddle-latest-cp36-cp36m-linux_x86_64.whl |
- paddlepaddle-latest-cp37-cp37m-linux_x86_64.whl |
- paddlepaddle-latest-cp38-cp38-linux_x86_64.whl |
- paddlepaddle-latest-cp39-cp39-linux_x86_64.whl |
-
-
- cpu-openblas |
- paddlepaddle-latest-cp36-cp36m-linux_x86_64.whl |
- paddlepaddle-latest-cp37-cp37m-linux_x86_64.whl |
- paddlepaddle-latest-cp38-cp38-linux_x86_64.whl |
- paddlepaddle-latest-cp39-cp39-linux_x86_64.whl |
+ linux-cpu-mkl-avx |
+ paddlepaddle-latest-cp37-cp37m-linux_x86_64.whl |
+ paddlepaddle-latest-cp38-cp38-linux_x86_64.whl |
+ paddlepaddle-latest-cp39-cp39-linux_x86_64.whl |
+ paddlepaddle-latest-cp310-cp310-linux_x86_64.whl |
- cuda10.1-cudnn7-mkl-gcc5.4 |
- paddlepaddle_gpu-latest-cp36-cp36m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl |
+ linux-cpu-openblas-avx |
+ - |
+ paddlepaddle-latest-cp38-cp38-linux_x86_64.whl |
+ - |
+ - |
cuda10.2-cudnn7-mkl |
- paddlepaddle_gpu-latest-cp36-cp36m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp310-cp310-linux_x86_64.whl |
- cuda11.0-cudnn8.0-mkl |
- paddlepaddle_gpu-latest-cp36-cp36m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl |
+ cuda11.2-cudnn8.1-mkl |
+ paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp310-cp310-linux_x86_64.whl |
- cuda11.1-cudnn8.1-mkl |
- paddlepaddle_gpu-latest-cp36-cp36m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl |
-
-
- cuda11.2-cudnn8.1-mkl |
- paddlepaddle_gpu-latest-cp36-cp36m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl |
+ cuda11.6-cudnn8.4-mkl |
+ paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp310-cp310-linux_x86_64.whl |
mac-cpu |
- paddlepaddle-cp36-cp36m-macosx_10_6_intel.whl |
paddlepaddle-cp37-cp37m-macosx_10_6_intel.whl |
paddlepaddle-cp38-cp38-macosx_10_14_x86_64.whl |
paddlepaddle-cp39-cp39-macosx_10_14_x86_64.whl |
+ paddlepaddle-cp310-cp310-macosx_10_14_universal2.whl |
win-cpu-mkl-avx |
@@ -584,20 +670,6 @@ platform tag: 类似 'linux_x86_64', 'any'
paddlepaddle-latest-cp38-cp38-win_amd64.whl |
- |
-
- win-cuda10.1-cudnn7-mkl-vs2017-avx |
- paddlepaddle_gpu-latest-cp36-cp36m-win_amd64.whl |
- paddlepaddle_gpu-latest-cp37-cp37m-win_amd64.whl |
- paddlepaddle_gpu-latest-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-latest-cp39-cp39-win_amd64.whl |
-
-
- win-cuda10.1-cudnn7-mkl-vs2017-noavx |
- - |
- - |
- paddlepaddle_gpu-latest-cp38-cp38-win_amd64.whl |
- - |
-
win-cuda10.2-cudnn7-mkl-vs2017-avx |
paddlepaddle_gpu-latest-cp36-cp36m-win_amd64.whl |
@@ -612,20 +684,6 @@ platform tag: 类似 'linux_x86_64', 'any'
paddlepaddle_gpu-latest-cp38-cp38-win_amd64.whl |
- |
-
- win-cuda11.0-cudnn8.0-mkl-vs2017-avx |
- paddlepaddle_gpu-latest-cp36-cp36m-win_amd64.whl |
- paddlepaddle_gpu-latest-cp37-cp37m-win_amd64.whl |
- paddlepaddle_gpu-latest-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-latest-cp39-cp39-win_amd64.whl |
-
-
- win-cuda11.1-cudnn8.1-mkl-vs2017-avx |
- paddlepaddle_gpu-latest-cp36-cp36m-win_amd64.whl |
- paddlepaddle_gpu-latest-cp37-cp37m-win_amd64.whl |
- paddlepaddle_gpu-latest-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-latest-cp39-cp39-win_amd64.whl |
-
win-cuda11.2-cudnn8.2-mkl-vs2017-avx |
paddlepaddle_gpu-latest-cp36-cp36m-win_amd64.whl |
diff --git a/docs/install/Tables_en.md b/docs/install/Tables_en.md
index e77772efa7e..20c1878ccd7 100644
--- a/docs/install/Tables_en.md
+++ b/docs/install/Tables_en.md
@@ -28,9 +28,9 @@
|
- Clang (macOS Only) |
+ Clang (macOS Only) |
9.0 and above |
- Usually use the clang version of macOS 10.11 and above |
+ Usually use the clang version of macOS 10.11 and above |
|
@@ -103,7 +103,7 @@
unrar |
|
|
- brew install rar (For macOS), apt-get install unrar (For Ubuntu) |
+ brew install unrar (For macOS), apt-get install unrar (For Ubuntu) |
@@ -200,7 +200,7 @@ PaddlePaddle can be compiled and run using any version after cuDNN v5.1, but try
PaddePaddle implements references to various BLAS/CUDA/cuDNN libraries by specifying paths at compile time. When cmake compiles, it first searches the system paths ( `/usr/liby` and `/usr/local/lib` ) for these libraries, and also reads the relevant path variables for searching. Can be set by using the `-D` command, for example:
-> `cmake .. -DWITH_GPU=ON -DWITH_TESTING=OFF -DCUDNN_ROOT=/opt/cudnnv5`
+> `cmake .. -DWITH_GPU=ON -DWITH_TESTING=OFF -DCUDNN_ROOT=/opt/cudnnv5`
**Note**: The settings introduced here for these compilation options are only valid for the first cmake. If you want to reset it later, it is recommended to clean up the entire build directory ( rm -rf ) and then specify it.
@@ -220,11 +220,11 @@ PaddePaddle implements references to various BLAS/CUDA/cuDNN libraries by specif
- paddlepaddle==[version code] such as paddlepaddle==2.2.1 |
+ paddlepaddle==[version code] such as paddlepaddle==2.4.2 |
Only support the corresponding version of the CPU PaddlePaddle, please refer to Pypi for the specific version. |
- paddlepaddle-gpu==[version code], such as paddlepaddle-gpu==2.2.1 |
+ paddlepaddle-gpu==[version code], such as paddlepaddle-gpu==2.4.2 |
The default installation supports the PaddlePaddle installation package corresponding to [version number] of CUDA 10.2 and cuDNN 7 |
@@ -234,7 +234,7 @@ PaddePaddle implements references to various BLAS/CUDA/cuDNN libraries by specif
You can find various distributions of PaddlePaddle-gpu in [the Release History](https://pypi.org/project/paddlepaddle-gpu/#history).
> 'postxx' corresponds to CUDA and cuDNN versions, and the number before 'postxx' represents the version of Paddle
-Please note that: in the commands, paddlepaddle-gpu==2.2.1
will install the installation package of PaddlePaddle that supports CUDA 10.2 and cuDNN 7 by default under Windows environment.
+Please note: in the commands, paddlepaddle-gpu==2.4.2 will, by default, install the PaddlePaddle package that supports CUDA 10.2 and cuDNN 7 under a Windows environment.
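The `postxx` convention described above can be sketched mechanically. A minimal helper, assuming the digits after `post` are always a CUDA major version followed by a single-digit minor (which holds for every tag in these tables: post101, post102, post110, post112, post116, post117); `split_post_tag` is a hypothetical name for illustration:

```python
def split_post_tag(version: str):
    """'2.4.2.post116' -> ('2.4.2', 'cuda11.6'); a plain version has no post tag."""
    base, sep, post = version.partition(".post")
    if not sep:
        return base, None
    # The last digit is the CUDA minor version, the rest the major, e.g. 116 -> 11.6
    return base, f"cuda{post[:-1]}.{post[-1]}"
```

So `2.4.2.post116` is Paddle 2.4.2 built against CUDA 11.6, while a bare `2.4.2` is the default CUDA 10.2 build mentioned above.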
@@ -252,195 +252,204 @@ Please note that: in the commands, paddlepaddle-gpu==2.2.1 will i
cp37-cp37m |
cp38-cp38 |
cp39-cp39 |
+ cp310-cp310 |
cpu-mkl-avx |
- paddlepaddle-2.2.1-cp36-cp36m-linux_x86_64.whl |
- paddlepaddle-2.2.1-cp37-cp37m-linux_x86_64.whl |
- paddlepaddle-2.2.1-cp38-cp38-linux_x86_64.whl |
- paddlepaddle-2.2.1-cp39-cp39-linux_x86_64.whl |
+ paddlepaddle-2.4.2-cp36-cp36m-linux_x86_64.whl |
+ paddlepaddle-2.4.2-cp37-cp37m-linux_x86_64.whl |
+ paddlepaddle-2.4.2-cp38-cp38-linux_x86_64.whl |
+ paddlepaddle-2.4.2-cp39-cp39-linux_x86_64.whl |
+ paddlepaddle-2.4.2-cp310-cp310-linux_x86_64.whl |
cpu-openblas-avx |
- |
- |
- paddlepaddle-2.2.1-cp38-cp38-linux_x86_64.whl |
+ paddlepaddle-2.4.2-cp38-cp38-linux_x86_64.whl |
+ - |
- |
cpu-mkl-noavx |
- |
- |
- paddlepaddle-2.2.1-cp38-cp38-linux_x86_64.whl |
+ paddlepaddle-2.4.2-cp38-cp38-linux_x86_64.whl |
+ - |
- |
cpu-openblas-noavx |
- |
- |
- paddlepaddle-2.2.1-cp38-cp38-linux_x86_64.whl |
- - |
-
-
- cuda10.1-cudnn7-mkl-gcc5.4-avx |
-
- paddlepaddle_gpu-2.2.1.post101-cp36-cp36m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post101-cp37-cp37m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post101-cp38-cp38-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post101-cp39-cp39-linux_x86_64.whl |
-
-
- cuda10.1-cudnn7-mkl-gcc5.4-noavx |
- - |
+ paddlepaddle-2.4.2-cp38-cp38-linux_x86_64.whl |
- |
-
- paddlepaddle_gpu-2.2.1.post101-cp38-cp38-linux_x86_64.whl |
- |
cuda10.2-cudnn7-mkl-gcc8.2-avx |
-
- paddlepaddle_gpu-2.2.1-cp36-cp36m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1-cp37-cp37m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1-cp38-cp38-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1-cp39-cp39-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2-cp36-cp36m-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2-cp37-cp37m-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2-cp38-cp38-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2-cp39-cp39-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2-cp310-cp310-linux_x86_64.whl |
cuda10.2-cudnn7-mkl-gcc8.2-noavx |
- |
- |
-
- paddlepaddle_gpu-2.2.1-cp38-cp38-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2-cp38-cp38-linux_x86_64.whl |
+ - |
- |
-
-
- cuda11.0-cudnn8.0-mkl-gcc8.2-avx |
-
- paddlepaddle_gpu-2.2.1.post110-cp36-cp36m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post110-cp37-cp37m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post110-cp38-cp38-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post110-cp39-cp39-linux_x86_64.whl |
-
-
- cuda11.1-cudnn8.1-mkl-gcc8.2-avx |
-
- paddlepaddle_gpu-2.2.1.post111-cp36-cp36m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post111-cp37-cp37m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post111-cp38-cp38-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post111-cp39-cp39-linux_x86_64.whl |
cuda11.2-cudnn8.1-mkl-gcc8.2-avx |
-
- paddlepaddle_gpu-2.2.1.post112-cp36-cp36m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post112-cp37-cp37m-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post112-cp38-cp38-linux_x86_64.whl |
-
- paddlepaddle_gpu-2.2.1.post112-cp39-cp39-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post112-cp36-cp36m-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post112-cp37-cp37m-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post112-cp38-cp38-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post112-cp39-cp39-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post112-cp310-cp310-linux_x86_64.whl |
+
+
+ cuda11.6-cudnn8.4-mkl-gcc8.2-avx |
+
+ paddlepaddle_gpu-2.4.2.post116-cp36-cp36m-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post116-cp37-cp37m-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post116-cp38-cp38-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post116-cp39-cp39-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post116-cp310-cp310-linux_x86_64.whl |
+
+
+ cuda11.7-cudnn8.4-mkl-gcc8.2-avx |
+
+ paddlepaddle_gpu-2.4.2.post117-cp36-cp36m-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post117-cp37-cp37m-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post117-cp38-cp38-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post117-cp39-cp39-linux_x86_64.whl |
+
+ paddlepaddle_gpu-2.4.2.post117-cp310-cp310-linux_x86_64.whl |
macos-cpu-openblas |
-
- paddlepaddle-2.2.1-cp36-cp36m-macosx_10_6_intel.whl |
-
- paddlepaddle-2.2.1-cp37-cp37m-macosx_10_6_intel.whl |
-
- paddlepaddle-2.2.1-cp38-cp38-macosx_10_14_x86_64.whl |
-
- paddlepaddle-2.2.1-cp39-cp39-macosx_10_14_x86_64.whl |
+
+ paddlepaddle-2.4.2-cp36-cp36m-macosx_10_6_intel.whl |
+
+ paddlepaddle-2.4.2-cp37-cp37m-macosx_10_6_intel.whl |
+
+ paddlepaddle-2.4.2-cp38-cp38-macosx_10_14_x86_64.whl |
+
+ paddlepaddle-2.4.2-cp39-cp39-macosx_10_14_x86_64.whl |
+
+ paddlepaddle-2.4.2-cp310-cp310-macosx_10_14_universal2.whl |
+
+
+ macos-cpu-openblas-noavx |
+
+ paddlepaddle-2.4.2-cp36-cp36m-macosx_10_6_intel.whl |
+
+ paddlepaddle-2.4.2-cp37-cp37m-macosx_10_6_intel.whl |
+
+ paddlepaddle-2.4.2-cp38-cp38-macosx_10_14_x86_64.whl |
+
+ paddlepaddle-2.4.2-cp39-cp39-macosx_10_14_x86_64.whl |
+
+ paddlepaddle-2.4.2-cp310-cp310-macosx_10_14_universal2.whl |
+
+
+ macos-cpu-openblas-m1 |
+ - |
+ - |
+
+ paddlepaddle-2.4.2-cp38-cp38-macosx_11_0_arm64.whl |
+
+ paddlepaddle-2.4.2-cp39-cp39-macosx_11_0_arm64.whl |
+
+ paddlepaddle-2.4.2-cp310-cp310-macosx_11_0_arm64.whl |
win-cpu-mkl-avx |
- paddlepaddle-2.2.1-cp36-cp36m-win_amd64.whl |
- paddlepaddle-2.2.1-cp37-cp37m-win_amd64.whl |
- paddlepaddle-2.2.1-cp38-cp38-win_amd64.whl |
- paddlepaddle-2.2.1-cp39-cp39-win_amd64.whl |
+ paddlepaddle-2.4.2-cp36-cp36m-win_amd64.whl |
+ paddlepaddle-2.4.2-cp37-cp37m-win_amd64.whl |
+ paddlepaddle-2.4.2-cp38-cp38-win_amd64.whl |
+ paddlepaddle-2.4.2-cp39-cp39-win_amd64.whl |
+ paddlepaddle-2.4.2-cp310-cp310-win_amd64.whl |
win-cpu-mkl-noavx |
- |
- |
- paddlepaddle-2.2.1-cp38-cp38-win_amd64.whl |
+ paddlepaddle-2.4.2-cp38-cp38-win_amd64.whl |
+ - |
- |
win-cpu-openblas-avx |
- |
- |
- paddlepaddle-2.2.1-cp38-cp38-win_amd64.whl |
+ paddlepaddle-2.4.2-cp38-cp38-win_amd64.whl |
+ - |
- |
win-cpu-openblas-noavx |
- |
- |
- paddlepaddle-2.2.1-cp38-cp38-win_amd64.whl |
- - |
-
-
- win-cuda10.1-cudnn7-mkl-vs2017-avx |
- paddlepaddle_gpu-2.2.1.post101-cp36-cp36m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post101-cp37-cp37m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post101-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post101-cp39-cp39-win_amd64.whl |
-
-
- win-cuda10.1-cudnn7-mkl-vs2017-noavx |
- - |
+ paddlepaddle-2.4.2-cp38-cp38-win_amd64.whl |
- |
- paddlepaddle_gpu-2.2.1.post101-cp38-cp38-win_amd64.whl |
- |
+
win-cuda10.2-cudnn7-mkl-vs2017-avx |
- paddlepaddle_gpu-2.2.1.post102-cp36-cp36m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post102-cp37-cp37m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post102-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post102-cp39-cp39-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2-cp36-cp36m-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2-cp37-cp37m-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2-cp38-cp38-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2-cp39-cp39-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2-cp310-cp310-win_amd64.whl |
win-cuda10.2-cudnn7-mkl-vs2017-noavx |
- |
- paddlepaddle_gpu-2.2.1.post102-cp37-cp37m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post102-cp38-cp38-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2-cp37-cp37m-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2-cp38-cp38-win_amd64.whl |
+ - |
- |
- win-cuda11.0-cudnn8.0-mkl-vs2017-avx |
- paddlepaddle_gpu-2.2.1.post110-cp36-cp36m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post110-cp37-cp37m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post110-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post110-cp39-cp39-win_amd64.whl |
-
-
- win-cuda11.1-cudnn8.1-mkl-vs2017-avx |
- paddlepaddle_gpu-2.2.1.post111-cp36-cp36m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post111-cp37-cp37m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post111-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post111-cp39-cp39-win_amd64.whl |
+ win-cuda11.2-cudnn8.2-mkl-vs2017-avx |
+ paddlepaddle_gpu-2.4.2.post112-cp36-cp36m-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2.post112-cp37-cp37m-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2.post112-cp38-cp38-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2.post112-cp39-cp39-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2.post112-cp310-cp310-win_amd64.whl |
- win-cuda11.2-cudnn8.2-mkl-vs2017-avx |
- paddlepaddle_gpu-2.2.1.post112-cp36-cp36m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post112-cp37-cp37m-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post112-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-2.2.1.post112-cp39-cp39-win_amd64.whl |
+ win-cuda11.6-cudnn8.4-mkl-vs2017-avx |
+ paddlepaddle_gpu-2.4.2.post116-cp36-cp36m-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2.post116-cp37-cp37m-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2.post116-cp38-cp38-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2.post116-cp39-cp39-win_amd64.whl |
+ paddlepaddle_gpu-2.4.2.post116-cp310-cp310-win_amd64.whl |
@@ -490,69 +499,55 @@ platform tag: similar to 'linux_x86_64', 'any'
- version number |
- cp36-cp36m |
+ Version Description |
cp37-cp37m |
cp38-cp38 |
cp39-cp39 |
+ cp310-cp310 |
- cpu-mkl |
- paddlepaddle-latest-cp36-cp36m-linux_x86_64.whl |
- paddlepaddle-latest-cp37-cp37m-linux_x86_64.whl |
- paddlepaddle-latest-cp38-cp38-linux_x86_64.whl |
- paddlepaddle-latest-cp39-cp39-linux_x86_64.whl |
-
-
- cpu-openblas |
- paddlepaddle-latest-cp36-cp36m-linux_x86_64.whl |
- paddlepaddle-latest-cp37-cp37m-linux_x86_64.whl |
- paddlepaddle-latest-cp38-cp38-linux_x86_64.whl |
- paddlepaddle-latest-cp39-cp39-linux_x86_64.whl |
+ linux-cpu-mkl-avx |
+ paddlepaddle-latest-cp37-cp37m-linux_x86_64.whl |
+ paddlepaddle-latest-cp38-cp38-linux_x86_64.whl |
+ paddlepaddle-latest-cp39-cp39-linux_x86_64.whl |
+ paddlepaddle-latest-cp310-cp310-linux_x86_64.whl |
- cuda10.1-cudnn7-mkl-gcc5.4 |
- paddlepaddle_gpu-latest-cp36-cp36m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl |
+ linux-cpu-openblas-avx |
+ - |
+ paddlepaddle-latest-cp38-cp38-linux_x86_64.whl |
+ - |
+ - |
cuda10.2-cudnn7-mkl |
- paddlepaddle_gpu-latest-cp36-cp36m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp310-cp310-linux_x86_64.whl |
- cuda11.0-cudnn8.0-mkl |
- paddlepaddle_gpu-latest-cp36-cp36m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl |
-
-
- cuda11.1-cudnn8.1-mkl |
- paddlepaddle_gpu-latest-cp36-cp36m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl |
+ cuda11.2-cudnn8.1-mkl |
+ paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp310-cp310-linux_x86_64.whl |
- cuda11.2-cudnn8.1-mkl |
- paddlepaddle_gpu-latest-cp36-cp36m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl |
- paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl |
+ cuda11.6-cudnn8.4-mkl |
+ paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp38-cp38-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp39-cp39-linux_x86_64.whl |
+ paddlepaddle_gpu-latest-cp310-cp310-linux_x86_64.whl |
mac-cpu |
- paddlepaddle-cp36-cp36m-macosx_10_6_intel.whl |
paddlepaddle-cp37-cp37m-macosx_10_6_intel.whl |
paddlepaddle-cp38-cp38-macosx_10_14_x86_64.whl |
paddlepaddle-cp39-cp39-macosx_10_14_x86_64.whl |
+ paddlepaddle-cp310-cp310-macosx_10_14_universal2.whl |
win-cpu-mkl-avx |
@@ -582,20 +577,6 @@ platform tag: similar to 'linux_x86_64', 'any'
paddlepaddle-latest-cp38-cp38-win_amd64.whl |
- |
-
- win-cuda10.1-cudnn7-mkl-vs2017-avx |
- paddlepaddle_gpu-latest-cp36-cp36m-win_amd64.whl |
- paddlepaddle_gpu-latest-cp37-cp37m-win_amd64.whl |
- paddlepaddle_gpu-latest-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-latest-cp39-cp39-win_amd64.whl |
-
-
- win-cuda10.1-cudnn7-mkl-vs2017-noavx |
- - |
- - |
- paddlepaddle_gpu-latest-cp38-cp38-win_amd64.whl |
- - |
-
win-cuda10.2-cudnn7-mkl-vs2017-avx |
paddlepaddle_gpu-latest-cp36-cp36m-win_amd64.whl |
@@ -610,20 +591,6 @@ platform tag: similar to 'linux_x86_64', 'any'
paddlepaddle_gpu-latest-cp38-cp38-win_amd64.whl |
- |
-
- win-cuda11.0-cudnn8.0-mkl-vs2017-avx |
- paddlepaddle_gpu-latest-cp36-cp36m-win_amd64.whl |
- paddlepaddle_gpu-latest-cp37-cp37m-win_amd64.whl |
- paddlepaddle_gpu-latest-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-latest-cp39-cp39-win_amd64.whl |
-
-
- win-cuda11.1-cudnn8.1-mkl-vs2017-avx |
- paddlepaddle_gpu-latest-cp36-cp36m-win_amd64.whl |
- paddlepaddle_gpu-latest-cp37-cp37m-win_amd64.whl |
- paddlepaddle_gpu-latest-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-latest-cp39-cp39-win_amd64.whl |
-
win-cuda11.2-cudnn8.2-mkl-vs2017-avx |
paddlepaddle_gpu-latest-cp36-cp36m-win_amd64.whl |
diff --git a/docs/install/compile/fromsource.rst b/docs/install/compile/fromsource.rst
index 21ff48887a3..cff034ca124 100644
--- a/docs/install/compile/fromsource.rst
+++ b/docs/install/compile/fromsource.rst
@@ -2,7 +2,7 @@
**从源码编译**
===========================
-.. toctree::
+.. toctree::
:maxdepth: 1
linux-compile.md
@@ -11,4 +11,3 @@
arm-compile.md
sw-compile.md
zhaoxin-compile.md
- mips-compile.md
diff --git a/docs/install/compile/fromsource_en.rst b/docs/install/compile/fromsource_en.rst
index 3e1592dc791..7f551994766 100644
--- a/docs/install/compile/fromsource_en.rst
+++ b/docs/install/compile/fromsource_en.rst
@@ -4,7 +4,7 @@
You can also choose to compile and install PaddlePaddle in the way of source code compilation. However, due to the diversity of the native environment, complicated problems may occur when compiling the source code, which may cause your installation to fail. In order to ensure your smooth installation, it is recommended that you prefer the normal installation method.
-.. toctree::
+.. toctree::
linux-compile_en.md
diff --git a/docs/install/compile/linux-compile.md b/docs/install/compile/linux-compile.md
index 9d7372c36f3..12ed39b9fe9 100644
--- a/docs/install/compile/linux-compile.md
+++ b/docs/install/compile/linux-compile.md
@@ -4,11 +4,11 @@
* **Linux 版本 (64 bit)**
* **CentOS 6 (不推荐,不提供编译出现问题时的官方支持)**
- * **CentOS 7 (GPU 版本支持 CUDA 10.1/10.2/11.0/11.1/11.2)**
+ * **CentOS 7 (GPU 版本支持 CUDA 10.1/10.2/11.1/11.2/11.6/11.7)**
* **Ubuntu 14.04 (不推荐,不提供编译出现问题时的官方支持)**
- * **Ubuntu 16.04 (GPU 版本支持 CUDA 10.1/10.2/11.0/11.1/11.2)**
- * **Ubuntu 18.04 (GPU 版本支持 CUDA 10.1/10.2/11.0/11.1/11.2)**
-* **Python 版本 3.6/3.7/3.8/3.9 (64 bit)**
+ * **Ubuntu 16.04 (GPU 版本支持 CUDA 10.1/10.2/11.1/11.2/11.6/11.7)**
+ * **Ubuntu 18.04 (GPU 版本支持 CUDA 10.1/10.2/11.1/11.2/11.6/11.7)**
+* **Python 版本 3.6/3.7/3.8/3.9/3.10 (64 bit)**
## 选择 CPU/GPU
@@ -16,13 +16,15 @@
* 如果您的计算机有 NVIDIA® GPU,请确保满足以下条件以编译 GPU 版 PaddlePaddle
- * **CUDA 工具包 10.1/10.2 配合 cuDNN 7 (cuDNN 版本>=7.6.5, 如需多卡支持,需配合 NCCL2.7 及更高)**
- * **CUDA 工具包 11.0 配合 cuDNN v8.0.4(如需多卡支持,需配合 NCCL2.7 及更高)**
- * **CUDA 工具包 11.1 配合 cuDNN v8.1.1(如需多卡支持,需配合 NCCL2.7 及更高)**
- * **CUDA 工具包 11.2 配合 cuDNN v8.1.1(如需多卡支持,需配合 NCCL2.7 及更高)**
+ * **CUDA 工具包 10.1 配合 cuDNN 7 (cuDNN 版本>=7.6.5, 如需多卡支持,需配合 NCCL2.7 及更高;不支持使用 TensorRT)**
+ * **CUDA 工具包 10.2 配合 cuDNN 7 (cuDNN 版本>=7.6.5, 如需多卡支持,需配合 NCCL2.7 及更高;如需使用 PaddleTensorRT 推理,需配合 TensorRT7.0.0.11)**
+ * **CUDA 工具包 11.1 配合 cuDNN v8.1.1(如需多卡支持,需配合 NCCL2.7 及更高;如需使用 PaddleTensorRT 推理,需配合 TensorRT7.2.3.4)**
+ * **CUDA 工具包 11.2 配合 cuDNN v8.1.1(如需多卡支持,需配合 NCCL2.7 及更高;如需使用 PaddleTensorRT 推理,需配合 TensorRT8.0.3.4)**
+ * **CUDA 工具包 11.6 配合 cuDNN v8.4.0(如需多卡支持,需配合 NCCL2.7 及更高;如需使用 PaddleTensorRT 推理,需配合 TensorRT8.4.0.6)**
+ * **CUDA 工具包 11.7 配合 cuDNN v8.4.1(如需多卡支持,需配合 NCCL2.7 及更高;如需使用 PaddleTensorRT 推理,需配合 TensorRT8.4.2.4)**
* **GPU 运算能力超过 3.5 的硬件设备**
- 您可参考 NVIDIA 官方文档了解 CUDA 和 CUDNN 的安装流程和配置方法,请见[CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/),[cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/)
+ 您可参考 NVIDIA 官方文档了解 CUDA、CUDNN 和 TensorRT 的安装流程和配置方法,请见[CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/),[cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/),[TensorRT](https://developer.nvidia.com/tensorrt)
## 安装步骤
@@ -129,13 +131,19 @@ cd Paddle
cd /paddle
```
-#### 6. 切换到 develop 版本进行编译:
+#### 6. 切换到较稳定的 release 分支进行编译:
```
-git checkout develop
+git checkout [分支名]
```
-注意:python3.6、python3.7 版本从 release/1.2 分支开始支持, python3.8 版本从 release/1.8 分支开始支持, python3.9 版本从 release/2.1 分支开始支持
+例如:
+
+```
+git checkout release/2.4
+```
+
+注意:python3.6、python3.7 版本从 release/1.2 分支开始支持, python3.8 版本从 release/1.8 分支开始支持, python3.9 版本从 release/2.1 分支开始支持, python3.10 版本从 release/2.3 分支开始支持
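The note above maps each Python version to the first release branch that supports it; a small sketch of checking a branch against that table before running `git checkout` (the helper and dict names are illustrative, not Paddle APIs):

```python
# First Paddle release branch supporting each Python version, per the note above.
FIRST_SUPPORTED = {
    "3.6": "release/1.2",
    "3.7": "release/1.2",
    "3.8": "release/1.8",
    "3.9": "release/2.1",
    "3.10": "release/2.3",
}

def branch_supports(branch: str, py: str) -> bool:
    """True if `branch` (e.g. 'release/2.4') is at or past the first supporting branch."""
    def key(b: str):
        major, minor = b.split("/")[1].split(".")
        return (int(major), int(minor))
    return key(branch) >= key(FIRST_SUPPORTED[py])
```

For example, `release/2.4` supports Python 3.10, while `release/2.1` does not.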
#### 7. 创建并进入/paddle/build 路径下:
@@ -151,7 +159,7 @@ mkdir -p /paddle/build && cd /paddle/build
pip3.7 install protobuf
```
-注意:以上用 Python3.7 命令来举例,如您的 Python 版本为 3.6/3.8/3.9,请将上述命令中的 pip3.7 改成 pip3.6/pip3.8/pip3.9
+注意:以上用 Python3.7 命令来举例,如您的 Python 版本为 3.6/3.8/3.9/3.10,请将上述命令中的 pip3.7 改成 pip3.6/pip3.8/pip3.9/pip3.10
- 安装 patchelf,PatchELF 是一个小而实用的程序,用于修改 ELF 可执行文件的动态链接器和 RPATH。
@@ -203,7 +211,7 @@ pip3.7 install -U [whl 包的名字]
```
注意:
-以上用 Python3.7 命令来举例,如您的 Python 版本为 3.6/3.8/3.9,请将上述命令中的 pip3.7 改成 pip3.6/pip3.8/pip3.9。
+以上用 Python3.7 命令来举例,如您的 Python 版本为 3.6/3.8/3.9/3.10,请将上述命令中的 pip3.7 改成 pip3.6/pip3.8/pip3.9/pip3.10。
#### 恭喜,至此您已完成 PaddlePaddle 的编译安装。您只需要进入 Docker 容器后运行 PaddlePaddle,即可开始使用。更多 Docker 使用请参见[Docker 官方文档](https://docs.docker.com)
@@ -219,7 +227,7 @@ uname -m && cat /etc/*release
#### 2. 更新系统源
-* CentOS 环境
+* CentOS 环境
更新`yum`的源:
@@ -246,7 +254,7 @@ uname -m && cat /etc/*release
* 如果您需要使用 GPU 多卡,请确保您已经正确安装 nccl2,或者按照以下指令安装 nccl2(这里提供的是 CUDA10.2,cuDNN7 下 nccl2 的安装指令,更多版本的安装信息请参考 NVIDIA[官方网站](https://developer.nvidia.com/nccl)):
- * **CentOS 系统可以参考以下命令**
+ * **CentOS 系统可以参考以下命令**
```
wget http://developer.download.nvidia.com/compute/machine-learning/repos/rhel7/x86_64/nvidia-machine-learning-repo-rhel7-1.0.0-1.x86_64.rpm
@@ -280,7 +288,7 @@ uname -m && cat /etc/*release
#### 4. 安装必要的工具
-* CentOS 环境
+* CentOS 环境
`bzip2`以及`make`:
@@ -378,13 +386,13 @@ uname -m && cat /etc/*release
(请参照 Python 官方流程安装, 并保证拥有 20.2.2 及以上的 pip3 版本,请注意,python3.6 及以上版本环境下,pip3 并不一定对应 python 版本,如 python3.7 下默认只有 pip3.7)
-* c.(Only For Python3)设置 Python3 相关的环境变量,这里以 python3.7 版本示例,请替换成您使用的版本(3.6、3.8、3.9):
+* c.(Only For Python3)设置 Python3 相关的环境变量,这里以 python3.7 版本示例,请替换成您使用的版本(3.6、3.8、3.9、3.10):
1. 首先使用
```
find `dirname $(dirname $(which python3))` -name "libpython3.so"
```
- 找到 Python lib 的路径,如果是 3.6、3.7、3.8、3.9,请将`python3`改成`python3.6`、`python3.7`、`python3.8`、`python3.9`,然后将下面[python-lib-path]替换为找到文件路径
+ 找到 Python lib 的路径,如果是 3.6、3.7、3.8、3.9、3.10,请将`python3`改成`python3.6`、`python3.7`、`python3.8`、`python3.9`、`python3.10`,然后将下面[python-lib-path]替换为找到文件路径
2. 设置 PYTHON_LIBRARIES:
```
@@ -408,7 +416,7 @@ uname -m && cat /etc/*release
```
(这里将[python-lib-path]的最后两级目录替换为/bin/)
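As an alternative to searching the filesystem with `find`, the interpreter can report its own library and header locations via `sysconfig`. A minimal sketch, assuming a standard CPython install (on builds without `--enable-shared`, `LDLIBRARY` may name a static `.a` instead of a `.so`):

```python
import os
import sysconfig

# Ask the interpreter itself where its library and headers live.
libdir = sysconfig.get_config_var("LIBDIR")         # directory holding libpython
ldlibrary = sysconfig.get_config_var("LDLIBRARY")   # e.g. libpython3.10.so
python_library = os.path.join(libdir, ldlibrary)    # candidate for PYTHON_LIBRARY
include_dir = sysconfig.get_paths()["include"]      # candidate for PYTHON_INCLUDE_DIRS
print(python_library)
print(include_dir)
```

The printed paths correspond to the `[python-lib-path]` and include-directory values used in the environment-variable steps above.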
-* d. 安装虚环境`virtualenv`以及`virtualenvwrapper`并创建名为`paddle-venv`的虚环境:(请注意对应 python 版本的 pip3 的命令,如 pip3.6、pip3.7、pip3.8、pip3.9)
+* d. 安装虚环境`virtualenv`以及`virtualenvwrapper`并创建名为`paddle-venv`的虚环境:(请注意对应 python 版本的 pip3 的命令,如 pip3.6、pip3.7、pip3.8、pip3.9、pip3.10)
1. 安装`virtualenv`
```
@@ -472,10 +480,16 @@ git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle
```
-#### 9. 切换到 develop 分支进行编译:
+#### 9. 切换到较稳定的 release 分支进行编译:
+
+```
+git checkout [分支名]
+```
+
+例如:
```
-git checkout develop
+git checkout release/2.4
```
#### 10. 并且请创建并进入一个叫 build 的目录下:
@@ -500,11 +514,11 @@ mkdir build && cd build
> 请注意 PY_VERSION 参数更换为您需要的 python 版本
-* 对于需要编译**GPU 版本 PaddlePaddle**的用户:(**仅支持 CentOS7(CUDA11.2/CUDA11.0/CUDA10.2/CUDA10.1)**)
+* 对于需要编译**GPU 版本 PaddlePaddle**的用户:(**仅支持 CentOS7(CUDA11.7/CUDA11.6/CUDA11.2/CUDA11.1/CUDA10.2/CUDA10.1)**)
1. 请确保您已经正确安装 nccl2,或者按照以下指令安装 nccl2(这里提供的是 CUDA10.2,cuDNN7 下 nccl2 的安装指令,更多版本的安装信息请参考 NVIDIA[官方网站](https://developer.nvidia.com/nccl)):
- * **CentOS 系统可以参考以下命令**
+ * **CentOS 系统可以参考以下命令**
```
wget http://developer.download.nvidia.com/compute/machine-learning/repos/rhel7/x86_64/nvidia-machine-learning-repo-rhel7-1.0.0-1.x86_64.rpm
@@ -538,7 +552,7 @@ mkdir build && cd build
cmake .. -DPYTHON_EXECUTABLE:FILEPATH=[您可执行的 Python3 的路径] -DPYTHON_INCLUDE_DIR:PATH=[之前的 PYTHON_INCLUDE_DIRS] -DPYTHON_LIBRARY:FILEPATH=[之前的 PYTHON_LIBRARY] -DWITH_GPU=ON
```
-注意:以上涉及 Python3 的命令,用 Python3.7 来举例,如您的 Python 版本为 3.6/3.8/3.9,请将上述命令中的 Python3.7 改成 Python3.6/Python3.8/Python3.9
+注意:以上涉及 Python3 的命令,用 Python3.7 来举例,如您的 Python 版本为 3.6/3.8/3.9/3.10,请将上述命令中的 Python3.7 改成 Python3.6/Python3.8/Python3.9/Python3.10
diff --git a/docs/install/compile/linux-compile_en.md b/docs/install/compile/linux-compile_en.md
index 1848d250d7f..755cfc70667 100644
--- a/docs/install/compile/linux-compile_en.md
+++ b/docs/install/compile/linux-compile_en.md
@@ -4,11 +4,11 @@
* **Linux version (64 bit)**
* **CentOS 6 (not recommended, no official support for compilation problems)**
- * **CentOS 7 (GPU version supports CUDA 10.1/10.2/11.0/11.1/11.2**
+ * **CentOS 7 (GPU version supports CUDA 10.1/10.2/11.1/11.2/11.6/11.7**
* **Ubuntu 14.04 (not recommended, no official support for compilation problems)**
- * **Ubuntu 16.04 (GPU version supports CUDA 10.1/10.2/11.0/11.1/11.2)**
- * **Ubuntu 18.04 (GPU version supports CUDA 10.1/10.2/11.0/11.1/11.2)**
-* **Python version 3.6/3.7/3.8/3.9 (64 bit)**
+ * **Ubuntu 16.04 (GPU version supports CUDA 10.1/10.2/11.1/11.2/11.6/11.7)**
+ * **Ubuntu 18.04 (GPU version supports CUDA 10.1/10.2/11.1/11.2/11.6/11.7)**
+* **Python version 3.6/3.7/3.8/3.9/3.10 (64 bit)**
## Choose CPU/GPU
@@ -16,13 +16,15 @@
* If your computer has NVIDIA® GPU, and the following conditions are met,GPU version of PaddlePaddle is recommended.
- * **CUDA toolkit 10.1/10.2 with cuDNN 7 (cuDNN version>=7.6.5, for multi card support, NCCL2.7 or higher)**
- * **CUDA toolkit 11.0 with cuDNN v8.0.4(for multi card support, NCCL2.7 or higher)**
- * **CUDA toolkit 11.1 with cuDNN v8.1.1(for multi card support, NCCL2.7 or higher)**
- * **CUDA toolkit 11.2 with cuDNN v8.1.1(for multi card support, NCCL2.7 or higher)**
+ * **CUDA toolkit 10.1 with cuDNN 7 (cuDNN version>=7.6.5, for multi card support, NCCL2.7 or higher;TensorRT is not supported)**
+ * **CUDA toolkit 10.2 with cuDNN 7 (cuDNN version>=7.6.5, for multi card support, NCCL2.7 or higher;for PaddleTensorRT deployment, TensorRT7.0.0.11)**
+ * **CUDA toolkit 11.1 with cuDNN v8.1.1(for multi card support, NCCL2.7 or higher;for PaddleTensorRT deployment, TensorRT7.2.3.4)**
+ * **CUDA toolkit 11.2 with cuDNN v8.1.1(for multi card support, NCCL2.7 or higher;for PaddleTensorRT deployment, TensorRT8.0.3.4)**
+ * **CUDA toolkit 11.6 with cuDNN v8.4.0(for multi card support, NCCL2.7 or higher;for PaddleTensorRT deployment, TensorRT8.4.0.6)**
+ * **CUDA toolkit 11.7 with cuDNN v8.4.1(for multi card support, NCCL2.7 or higher;for PaddleTensorRT deployment, TensorRT8.4.2.4)**
* **Hardware devices with GPU computing power over 3.5**
- You can refer to NVIDIA official documents for installation process and configuration method of CUDA and cudnn. Please refer to[CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/),[cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/)
+ You can refer to NVIDIA official documents for installation process and configuration method of CUDA, cuDNN and TensorRT. Please refer to[CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/),[cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/),[TensorRT](https://developer.nvidia.com/tensorrt)
## Installation steps
@@ -135,13 +137,19 @@ Please make sure to allocate at least 4g of memory for docker, otherwise the com
cd /paddle
```
-#### 6. Switch to develop version to compile:
+#### 6. Switch to a stable release branch to compile:
```
-git checkout develop
+git checkout [name of the branch]
```
-Note: python3.6、python3.7 version started supporting from release/1.2 branch, python3.8 version started supporting from release/1.8 branch, python3.9 version started supporting from release/2.1 branch
+For example:
+
+```
+git checkout release/2.4
+```
+
+Note: Python 3.6 and 3.7 are supported since the release/1.2 branch, Python 3.8 since release/1.8, Python 3.9 since release/2.1, and Python 3.10 since release/2.3
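As a quick sanity check after switching, `git branch --show-current` should print the branch you checked out. A minimal demonstration in a throwaway repository (illustrative only, unrelated to the Paddle repo itself):

```shell
# Simulate checking out a release branch in a temporary repository
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m init
git branch release/2.4
git checkout -q release/2.4
git branch --show-current   # prints release/2.4
```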
#### 7. Create and enter the /paddle/build path:
@@ -157,7 +165,7 @@ mkdir -p /paddle/build && cd /paddle/build
pip3.7 install protobuf
```
-Note: We used Python3.7 command as an example above, if the version of your Python is 3.6/3.8/3.9, please change pip3.7 in the commands to pip3.6/pip3.8/pip3.9
+Note: The commands above use Python 3.7 as an example. If your Python version is 3.6/3.8/3.9/3.10, change pip3.7 in the commands to pip3.6/pip3.8/pip3.9/pip3.10
- Installing patchelf, PatchELF is a small and useful program for modifying the dynamic linker and RPATH of ELF executables.
@@ -208,7 +216,7 @@ pip3.7 install -U [whl package name]
```
Note:
-We used Python3.7 command as an example above, if the version of your Python is 3.6/3.8/3.9, please change pip3.7 in the commands to pip3.6/pip3.8/pip3.9.
+The commands above use Python 3.7 as an example. If your Python version is 3.6/3.8/3.9/3.10, change pip3.7 in the commands to pip3.6/pip3.8/pip3.9/pip3.10.
#### Congratulations, now that you have successfully installed PaddlePaddle using Docker, you only need to run PaddlePaddle after entering the Docker container. For more Docker usage, please refer to the [official Docker documentation](https://docs.docker.com/).
@@ -223,7 +231,7 @@ uname -m && cat /etc/*release
#### 2. Update the system source
-* CentOS system
+* CentOS system
Update the source of `yum`: `yum update`, and add the necessary yum source:
```
@@ -242,7 +250,7 @@ uname -m && cat /etc/*release
* If you need to use multi card environment, please make sure that you have installed nccl2 correctly, or install nccl2 according to the following instructions (here is the installation instructions of nccl2 under CUDA10.2 and cuDNN7. For more version of installation information, please refer to NVIDIA[official website](https://developer.nvidia.com/nccl)):
- * **CentOS system can refer to the following commands**
+ * **CentOS system can refer to the following commands**
```
wget http://developer.download.nvidia.com/compute/machine-learning/repos/rhel7/x86_64/nvidia-machine-learning-repo-rhel7-1.0.0-1.x86_64.rpm
@@ -271,7 +279,7 @@ uname -m && cat /etc/*release
#### 4. Install the necessary tools
-* CentOS system
+* CentOS system
`bzip2` and `make`:
```
@@ -361,13 +369,13 @@ uname -m && cat /etc/*release
(Please refer to the official Python installation process, and ensure that pip3 is version 20.2.2 or above. Note that in python3.6 and above, pip3 does not necessarily correspond to the python version; for example, python3.7 only has pip3.7 by default)
-* c. (Only For Python3) set Python3 related environment variables, here is python3.7 version example, please replace with the version you use (3.6, 3.8, 3.9):
+* c. (Only for Python 3) Set the Python 3 related environment variables. Python 3.7 is used as an example here; replace it with the version you use (3.6, 3.8, 3.9, 3.10):
1. First find the path to the Python lib using
```
find `dirname $(dirname $(which python3))` -name "libpython3.so"
```
- If it is 3.6,3.7,3.8,3.9, change `python3` to `python3.6`,`python3.7`, `python3.8`, `python3.9`, then replace [python-lib-path] in the following steps with the file path found.
+ If it is 3.6, 3.7, 3.8, 3.9 or 3.10, change `python3` to `python3.6`, `python3.7`, `python3.8`, `python3.9` or `python3.10`, then replace [python-lib-path] in the following steps with the file path found.
2. Set PYTHON_LIBRARIES:
```
@@ -391,7 +399,7 @@ uname -m && cat /etc/*release
```
(here replace the last two levels content of [python-lib-path] with /bin/)
-* d. Install the virtual environment `virtualenv` and `virtualenvwrapper` and create a virtual environment called `paddle-venv`: (please note the pip3 commands corresponding to the python version, such as pip3.6, pip3.7, pip3.8, pip3.9)
+* d. Install the virtual environment `virtualenv` and `virtualenvwrapper` and create a virtual environment called `paddle-venv`: (please note the pip3 commands corresponding to the python version, such as pip3.6, pip3.7, pip3.8, pip3.9, pip3.10)
1. Install `virtualenv`:
```
@@ -459,10 +467,16 @@ git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle
```
-#### 9. Switch to develop branch for compilation (support for Python 3.6 and 3.7 is added from the 1.2 branch, support for Python 3.8 is added from the 1.8 branch, support for Python 3.9 is added from the 2.1 branch,):
+#### 9. Switch to a stable release branch for compilation (support for Python 3.6 and 3.7 was added in the release/1.2 branch, Python 3.8 in release/1.8, Python 3.9 in release/2.1, and Python 3.10 in release/2.3):
+
+```
+git checkout [name of target branch]
+```
+
+For example:
```
-git checkout develop
+git checkout release/2.4
```
#### 10. And please create and enter a directory called build:
@@ -511,7 +525,7 @@ mkdir build && cd build
```
-Note: For the command involving Python 3, we use Python 3.7 as an example above, if the version of your Python is 3.6/3.8/3.9, please change Python3.7 in the commands to Python3.6/Python3.8/Python3.9
+Note: The commands involving Python 3 above use Python 3.7 as an example. If your Python version is 3.6/3.8/3.9/3.10, change Python3.7 in the commands to Python3.6/Python3.8/Python3.9/Python3.10
diff --git a/docs/install/compile/macos-compile.md b/docs/install/compile/macos-compile.md
index 8995dd50308..d9ce6d30049 100644
--- a/docs/install/compile/macos-compile.md
+++ b/docs/install/compile/macos-compile.md
@@ -1,16 +1,16 @@
-# **macOS 下从源码编译**
+# **macOS 下从源码编译**
## 环境准备
-* **macOS 版本 10.x/11.x (64 bit) (不支持 GPU 版本)**
-* **Python 版本 3.6/3.7/3.8/3.9 (64 bit)**
+* **macOS 版本 10.x/11.x (64 bit) (不支持 GPU 版本)**
+* **Python 版本 3.6/3.7/3.8/3.9/3.10 (64 bit)**
## 选择 CPU/GPU
-* 目前仅支持在 macOS 环境下编译安装 CPU 版本的 PaddlePaddle
+* 目前仅支持在 macOS 环境下编译安装 CPU 版本的 PaddlePaddle
## 安装步骤
-在 macOS 系统下有 2 种编译方式,推荐使用 Docker 编译。
+在 macOS 系统下有 2 种编译方式,推荐使用 Docker 编译。
Docker 环境中已预装好编译 Paddle 需要的各种依赖,相较本机编译环境更简单。
* [Docker 源码编译](#compile_from_docker)
@@ -75,6 +75,7 @@ docker run --name paddle-test -v $PWD:/paddle --network=host -it registry.baidub
- `registry.baidubce.com/paddlepaddle/paddle:latest-dev`:使用名为`registry.baidubce.com/paddlepaddle/paddle:latest-dev`的镜像创建 Docker 容器,/bin/bash 进入容器后启动/bin/bash 命令
+
注意:
请确保至少为 docker 分配 4g 以上的内存,否则编译过程可能因内存不足导致失败。您可以在 docker 用户界面的“Preferences-Resources”中设置容器的内存分配上限。
@@ -84,13 +85,19 @@ docker run --name paddle-test -v $PWD:/paddle --network=host -it registry.baidub
cd /paddle
```
-#### 7. 切换到 develop 版本进行编译:
+#### 7. 切换到较稳定 release 分支下进行编译:
+
+```
+git checkout [分支名]
+```
+
+例如:
```
-git checkout develop
+git checkout release/2.4
```
-注意:python3.6、python3.7 版本从 release/1.2 分支开始支持, python3.8 版本从 release/1.8 分支开始支持, python3.9 版本从 release/2.1 分支开始支持
+注意:python3.6、python3.7 版本从 release/1.2 分支开始支持, python3.8 版本从 release/1.8 分支开始支持, python3.9 版本从 release/2.1 分支开始支持, python3.10 版本从 release/2.3 分支开始支持
#### 8. 创建并进入/paddle/build 路径下:
@@ -106,7 +113,7 @@ mkdir -p /paddle/build && cd /paddle/build
pip3.7 install protobuf==3.1.0
```
-注意:以上用 Python3.7 命令来举例,如您的 Python 版本为 3.6/3.8/3.9,请将上述命令中的 pip3.7 改成 pip3.6/pip3.8/pip3.9
+注意:以上用 Python3.7 命令来举例,如您的 Python 版本为 3.6/3.8/3.9/3.10,请将上述命令中的 pip3.7 改成 pip3.6/pip3.8/pip3.9/pip3.10
- 安装 patchelf,PatchELF 是一个小而实用的程序,用于修改 ELF 可执行文件的动态链接器和 RPATH。
@@ -116,7 +123,7 @@ apt install patchelf
#### 10. 执行 cmake:
-* 对于需要编译**CPU 版本 PaddlePaddle**的用户(我们目前不支持 macOS 下 GPU 版本 PaddlePaddle 的编译):
+* 对于需要编译**CPU 版本 PaddlePaddle**的用户(我们目前不支持 macOS 下 GPU 版本 PaddlePaddle 的编译):
```
cmake .. -DPY_VERSION=3.7 -DWITH_GPU=OFF
@@ -148,7 +155,7 @@ pip3.7 install -U [whl 包的名字]
```
注意:
-以上用 Python3.7 命令来举例,如您的 Python 版本为 3.6/3.8/3.9,请将上述命令中的 pip3.7 改成 pip3.6/pip3.8/pip3.9。
+以上用 Python3.7 命令来举例,如您的 Python 版本为 3.6/3.8/3.9/3.10,请将上述命令中的 pip3.7 改成 pip3.6/pip3.8/pip3.9/pip3.10。
#### 恭喜,至此您已完成 PaddlePaddle 的编译安装。您只需要进入 Docker 容器后运行 PaddlePaddle,即可开始使用。更多 Docker 使用请参见[Docker 官方文档](https://docs.docker.com)
@@ -167,7 +174,7 @@ uname -m
#### 2. 安装 Python 以及 pip:
-> **请不要使用 macOS 中自带 Python**,我们强烈建议您使用[Homebrew](https://brew.sh)安装 python(对于**Python3**请使用 python[官方下载](https://www.python.org/downloads/mac-osx/)python3.6.x、python3.7.x、python3.8、python3.9), pip 以及其他的依赖,这将会使您高效编译。
+> **请不要使用 macOS 中自带 Python**,我们强烈建议您使用[Homebrew](https://brew.sh)安装 python(对于**Python3**请使用 python[官方下载](https://www.python.org/downloads/mac-osx/)python3.6.x、python3.7.x、python3.8、python3.9、python3.10), pip 以及其他的依赖,这将会使您高效编译。
使用 Python 官网安装
@@ -209,11 +216,11 @@ uname -m
```
(这里[python-ld-path]为[python-bin-path]的上一级目录)
-- g. (可选)如果您是在 macOS 10.14 上编译 PaddlePaddle,请保证您已经安装了[对应版本](http://developer.apple.com/download)的 Xcode。
+- g. (可选)如果您是在 macOS 10.14 上编译 PaddlePaddle,请保证您已经安装了[对应版本](http://developer.apple.com/download)的 Xcode。
#### 4. **执行编译前**请您确认您的环境中安装有[编译依赖表](/documentation/docs/zh/install/Tables.html#third_party)中提到的相关依赖,否则我们强烈推荐使用`Homebrew`安装相关依赖。
-> macOS 下如果您未自行修改或安装过“编译依赖表”中提到的依赖,则仅需要使用`pip`安装`numpy,protobuf,wheel`,使用`Homebrew`安装`wget,swig, unrar`,另外安装`cmake`即可
+> macOS 下如果您未自行修改或安装过“编译依赖表”中提到的依赖,则仅需要使用`pip`安装`numpy,protobuf,wheel`,使用`Homebrew`安装`wget,swig, unrar`,另外安装`cmake`即可
- a. 这里特别说明一下**CMake**的安装:
@@ -237,13 +244,19 @@ git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle
```
-#### 6. 切换到 develop 分支进行编译:
+#### 6. 切换到较稳定 release 分支下进行编译:
```
-git checkout develop
+git checkout [分支名]
```
-注意:python3.6、python3.7 版本从 release/1.2 分支开始支持, python3.8 版本从 release/1.8 分支开始支持, python3.9 版本从 release/2.1 分支开始支持
+例如:
+
+```
+git checkout release/2.4
+```
+
+注意:python3.6、python3.7 版本从 release/1.2 分支开始支持, python3.8 版本从 release/1.8 分支开始支持, python3.9 版本从 release/2.1 分支开始支持, python3.10 版本从 release/2.3 分支开始支持
#### 7. 并且请创建并进入一个叫 build 的目录下:
@@ -255,20 +268,37 @@ mkdir build && cd build
>具体编译选项含义请参见[编译选项表](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#Compile)
-* 对于需要编译**CPU 版本 PaddlePaddle**的用户:
+* 若您的机器为 Mac M1 机器,需要编译**Arm 架构、CPU 版本 PaddlePaddle**:
+
+ ```
+ cmake .. -DPY_VERSION=3.8 -DPYTHON_INCLUDE_DIR=${PYTHON_INCLUDE_DIRS} \
+ -DPYTHON_LIBRARY=${PYTHON_LIBRARY} -DWITH_GPU=OFF \
+ -DWITH_AVX=OFF -DWITH_ARM=ON
+ ```
+
+* 若您的机器不是 Mac M1 机器,需要编译**x86_64 架构、CPU 版本 PaddlePaddle**:
```
- cmake .. -DPY_VERSION=3.7 -DPYTHON_INCLUDE_DIR=${PYTHON_INCLUDE_DIRS} \
+ cmake .. -DPY_VERSION=3.8 -DPYTHON_INCLUDE_DIR=${PYTHON_INCLUDE_DIRS} \
-DPYTHON_LIBRARY=${PYTHON_LIBRARY} -DWITH_GPU=OFF
```
->`-DPY_VERSION=3.7`请修改为安装环境的 Python 版本
+- `-DPY_VERSION=3.8`请修改为安装环境的 Python 版本
+- 若编译 arm 架构的 paddlepaddle,需要`cmake`版本为 3.19.2 以上
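cmake 版本是否满足要求,可以借助 `sort -V` 做一个简单的版本比较(示意脚本;实际使用时可将 current 替换为 `cmake --version | awk 'NR==1{print $3}'` 的输出):

```shell
# 检查当前 cmake 版本是否不低于 3.19.2(三段式版本号的比较示例)
required=3.19.2
current=3.22.1   # 示例值;实际可替换为本机 cmake 的版本号
newest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | tail -n 1)
if [ "$newest" = "$current" ]; then
  echo "cmake 版本满足要求"
else
  echo "cmake 版本过旧,请升级"
fi
```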
#### 9. 使用以下命令来编译:
-```
-make -j4
-```
+* 若您的机器为 Mac M1 机器,需要编译**Arm 架构、CPU 版本 PaddlePaddle**:
+
+ ```
+ make TARGET=ARMV8 -j4
+ ```
+
+* 若您的机器不是 Mac M1 机器,需要编译**x86_64 架构、CPU 版本 PaddlePaddle**:
+
+ ```
+ make -j4
+ ```
#### 10. 编译成功后进入`/paddle/build/python/dist`目录下找到生成的`.whl`包:
```
diff --git a/docs/install/compile/macos-compile_en.md b/docs/install/compile/macos-compile_en.md
index 8c2e8b5f7ee..de72cdebb8f 100644
--- a/docs/install/compile/macos-compile_en.md
+++ b/docs/install/compile/macos-compile_en.md
@@ -1,16 +1,16 @@
-# **Compile on macOS from Source Code**
+# **Compile on macOS from Source Code**
## Environment preparation
-* **macOS version 10.x/11.x (64 bit) (not support GPU version)**
-* **Python version 3.6/3.7/3.8/3.9 (64 bit)**
+* **macOS version 10.x/11.x (64 bit) (GPU version not supported)**
+* **Python version 3.6/3.7/3.8/3.9/3.10 (64 bit)**
## Choose CPU/GPU
* Currently, only PaddlePaddle for CPU is supported.
## Installation steps
-There are two compilation methods in macOS system. It's recommended to use Docker to compile.
+There are two compilation methods on macOS. It's recommended to use Docker to compile.
The dependencies required for compiling Paddle are pre-installed in the Docker environment, which is simpler than the native compiling environment.
* [Compile with Docker](#compile_from_docker)
@@ -80,20 +80,25 @@ docker run --name paddle-test -v $PWD:/paddle --network=host -it registry.baidub
Note:
Please make sure to allocate at least 4g of memory for docker, otherwise the compilation process may fail due to insufficient memory. You can set a container's memory allocation cap in "Preferences-Resources" in the docker UI.
-
#### 6. After entering Docker, go to the paddle directory:
```
cd /paddle
```
-#### 7. Switch to develop version to compile:
+#### 7. Switch to a stable release branch to compile:
+
+```
+git checkout [name of the branch]
+```
+
+For example:
```
-git checkout develop
+git checkout release/2.4
```
-Note: python3.6、python3.7 version started supporting from release/1.2 branch, python3.8 version started supporting from release/1.8 branch, python3.9 version started supporting from release/2.1 branch
+Note: Python 3.6 and 3.7 are supported since the release/1.2 branch, Python 3.8 since release/1.8, Python 3.9 since release/2.1, and Python 3.10 since release/2.3
#### 8. Create and enter the /paddle/build path:
@@ -109,7 +114,7 @@ mkdir -p /paddle/build && cd /paddle/build
pip3.7 install protobuf==3.1.0
```
-Note: We used Python3.7 command as an example above, if the version of your Python is 3.6/3.8/3.9, please change pip3.7 in the commands to pip3.6/pip3.8/pip3.9
+Note: The commands above use Python 3.7 as an example. If your Python version is 3.6/3.8/3.9/3.10, change pip3.7 in the commands to pip3.6/pip3.8/pip3.9/pip3.10
> Installing patchelf, PatchELF is a small and useful program for modifying the dynamic linker and RPATH of ELF executables.
@@ -119,7 +124,7 @@ apt install patchelf
#### 10. Execute cmake:
-* For users who need to compile the **CPU version PaddlePaddle** (We currently do not support the compilation of the GPU version PaddlePaddle under macOS):
+* For users who need to compile the **CPU version PaddlePaddle** (we currently do not support compiling the GPU version of PaddlePaddle on macOS):
```
cmake .. -DPY_VERSION=3.7 -DWITH_GPU=OFF
@@ -153,7 +158,7 @@ pip3.7 install -U [whl package name]
```
Note:
-We used Python3.7 command as an example above, if the version of your Python is 3.6/3.8/3.9, please change pip3.7 in the commands to pip3.6/pip3.8/pip3.9.
+The commands above use Python 3.7 as an example. If your Python version is 3.6/3.8/3.9/3.10, change pip3.7 in the commands to pip3.6/pip3.8/pip3.9/pip3.10.
#### Congratulations, now that you have successfully installed PaddlePaddle using Docker, you only need to run PaddlePaddle after entering the Docker container. For more Docker usage, please refer to the [official Docker documentation](https://docs.docker.com/).
@@ -168,7 +173,7 @@ We used Python3.7 command as an example above, if the version of your Python is
#### 2. Install python and pip:
-> **Please do not use the Python initially given by macOS**, we strongly recommend that you use [Homebrew](https://brew.sh/) to install python (for Python3 please use python [official download](https://www.python.org/downloads/mac-osx/) python3.6.x, python3.7.x, python3.8, python3.9), pip and other dependencies, This will greatly reduce the difficulty of installing and compiling.
+> **Please do not use the Python that ships with macOS**. We strongly recommend using [Homebrew](https://brew.sh/) to install python (for Python3 please use the python [official download](https://www.python.org/downloads/mac-osx/) python3.6.x, python3.7.x, python3.8, python3.9, python3.10), pip and other dependencies. This will greatly reduce the difficulty of installing and compiling.
Install using Python official website
@@ -212,12 +217,12 @@ Install using Python official website
```
(here [python-ld-path] is the [python-bin-path]'s parent directory )
-- g. (Optional) If you are compiling PaddlePaddle on macOS 10.14, make sure you have the [appropriate version](http://developer.apple.com/download) of Xcode installed.
+- g. (Optional) If you are compiling PaddlePaddle on macOS 10.14, make sure you have the [appropriate version](http://developer.apple.com/download) of Xcode installed.
#### 4. Before **compilation**, please confirm that the relevant dependencies mentioned in the [compilation dependency table](/documentation/docs/en/install/Tables_en.html/#third_party) are installed in your environment, otherwise we strongly recommend using `Homebrew` to install related dependencies.
-> Under macOS, if you have not modified or installed the dependencies mentioned in the "Compile Dependency Table", you only need to use `pip` to install `numpy`, `protobuf`, `wheel`, use `Homebrew` to install `wget`, `swig`,then install `cmake`.
+> Under macOS, if you have not modified or installed the dependencies mentioned in the "Compile Dependency Table", you only need to use `pip` to install `numpy`, `protobuf`, `wheel`, use `Homebrew` to install `wget`, `swig`, then install `cmake`.
- a. Here is a special description of the installation of **CMake**:
@@ -243,10 +248,16 @@ git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle
```
-#### 6. Switch to develop branch to compile: (Note that python 3.6, python 3.7 version are supported from the 1.2 branch, python3.8 version started supporting from release/1.8 branch, python3.9 version started supporting from release/2.1 branch)
+#### 6. Switch to a stable release branch to compile: (Note: Python 3.6 and 3.7 are supported since the release/1.2 branch, Python 3.8 since release/1.8, Python 3.9 since release/2.1, and Python 3.10 since release/2.3)
```
-git checkout develop
+git checkout [name of the branch]
+```
+
+For example:
+
+```
+git checkout release/2.4
```
#### 7. And please create and enter a directory called build:
@@ -259,20 +270,37 @@ mkdir build && cd build
> For details on the compilation options, see the [compilation options table](https://www.paddlepaddle.org.cn/documentation/docs/en/develop/install/Tables.html#Compile).
-* For users who need to compile the **CPU version PaddlePaddle**:
+* If you are using a Mac M1 machine and need to compile the **Arm architecture, CPU version of PaddlePaddle**:
+
+ ```
+ cmake .. -DPY_VERSION=3.8 -DPYTHON_INCLUDE_DIR=${PYTHON_INCLUDE_DIRS} \
+ -DPYTHON_LIBRARY=${PYTHON_LIBRARY} -DWITH_GPU=OFF \
+ -DWITH_AVX=OFF -DWITH_ARM=ON
+ ```
+
+* If you are not using a Mac M1 machine and need to compile the **x86_64 architecture, CPU version of PaddlePaddle**:
```
- cmake .. -DPY_VERSION=3.7 -DPYTHON_INCLUDE_DIR=${PYTHON_INCLUDE_DIRS} \
- -DPYTHON_LIBRARY=${PYTHON_LIBRARY} -DWITH_GPU=OFF
+ cmake .. -DPY_VERSION=3.8 -DPYTHON_INCLUDE_DIR=${PYTHON_INCLUDE_DIRS} \
+ -DPYTHON_LIBRARY=${PYTHON_LIBRARY} -DWITH_FLUID_ONLY=ON -DWITH_GPU=OFF
```
-- ``-DPY_VERSION=3.7`` Please change to the Python version of the installation environment.
+- ``-DPY_VERSION=3.8`` Please change to the Python version of the installation environment.
+- If compiling paddlepaddle for arm architecture, you need ``cmake`` version 3.19.2 or above
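Whether the installed cmake meets the requirement can be checked with a simple `sort -V` comparison (an illustrative script; in practice, replace `current` with the output of `cmake --version | awk 'NR==1{print $3}'`):

```shell
# Check that the cmake version is at least 3.19.2 (three-part version comparison)
required=3.19.2
current=3.22.1   # example value; substitute your machine's cmake version
newest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | tail -n 1)
if [ "$newest" = "$current" ]; then
  echo "cmake is new enough"
else
  echo "cmake is too old, please upgrade"
fi
```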
#### 9. Compile with the following command:
-```
-make -j4
-```
+* If you are using a Mac M1 machine and need to compile the **Arm architecture, CPU version of PaddlePaddle**:
+
+ ```
+ make TARGET=ARMV8 -j4
+ ```
+
+* If you are not using a Mac M1 machine and need to compile the **x86_64 architecture, CPU version of PaddlePaddle**:
+
+ ```
+ make -j4
+ ```
#### 10. After compiling successfully, go to the `/paddle/build/python/dist `directory and find the generated `.whl` package:
```
diff --git a/docs/install/compile/sw-compile.md b/docs/install/compile/sw-compile.md
index a85da8d4130..27fd31ac350 100644
--- a/docs/install/compile/sw-compile.md
+++ b/docs/install/compile/sw-compile.md
@@ -34,19 +34,29 @@
3. Paddle 依赖 cmake 进行编译构建,需要 cmake 版本>=3.15,检查操作系统源提供 cmake 的版本,使用源的方式直接安装 cmake, `apt install cmake`, 检查 cmake 版本, `cmake --version`, 如果 cmake >= 3.15 则不需要额外的操作,否则请修改 Paddle 主目录的`CMakeLists.txt`, `cmake_minimum_required(VERSION 3.15)` 修改为 `cmake_minimum_required(VERSION 3.0)`.
-4. 申威支持 openblas,使用 `yum` 安装 openblas 及其相关的依赖(如果安装失败,需要联系厂商解决安装问题)。
- 安装 openblas,得到 openblas 库文件及头文件 cblas.h;
- 安装 lapack:
- ```
- yum install lapack-devel.sw_64
- ```
- lapack 的搜索地址与 openblas 相同。
-
- 编译时出现以下 log 信息,表明 openblas 库链接成功:
- ```
- -- Found OpenBLAS (include: /usr/include/openblas, library: /usr/lib/libopenblas.so)
- -- Found lapack in OpenBLAS (include: /usr/include)
- ```
+4. 由于申威暂不支持 openblas,这里改用 blas + cblas 的方式,需要从源码编译 blas 和 cblas。
+
+ ```
+ pushd /opt
+ wget http://www.netlib.org/blas/blas-3.8.0.tgz
+ wget http://www.netlib.org/blas/blast-forum/cblas.tgz
+ tar xzf blas-3.8.0.tgz
+ tar xzf cblas.tgz
+ pushd BLAS-3.8.0
+ make
+ popd
+ pushd CBLAS
+ # 修改 Makefile.in 中 BLLIB 为 BLAS-3.8.0 的编译产物 blas_LINUX.a
+ make
+ pushd lib
+ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PWD
+ ln -s cblas_LINUX.a libcblas.a
+ cp ../../BLAS-3.8.0/blas_LINUX.a .
+ ln -s blas_LINUX.a libblas.a
+ popd
+ popd
+ popd
+ ```
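上面的脚本先 make 出 `blas_LINUX.a`/`cblas_LINUX.a`,再通过软链接提供 `libblas.a`/`libcblas.a` 这两个标准库名,目的是让链接器能以 `-lblas -lcblas` 找到它们。这一命名技巧可以用一个小例子演示(仅为示意,与真实静态库无关):

```shell
# 在临时目录中演示 "编译产物 + 标准库名软链接" 的做法
tmp=$(mktemp -d) && cd "$tmp"
echo "archive-content" > cblas_LINUX.a   # 用文本文件代替真实静态库
ln -s cblas_LINUX.a libcblas.a           # 链接器查找的是 libcblas.a 这个名字
cat libcblas.a                           # 输出 archive-content
```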
5. 根据[requirements.txt](https://github.com/PaddlePaddle/Paddle/blob/develop/python/requirements.txt)安装 Python 依赖库,注意在申威系统中一般无法直接使用 pip 或源码编译安装 python 依赖包,建议使用源的方式安装,如果遇到部分依赖包无法安装的情况,请联系操作系统服务商提供支持。此外也可以通过 pip 安装的时候加 --no-deps 的方式来避免依赖包的安装,但该种方式可能导致包由于缺少依赖不可用。
@@ -66,13 +76,17 @@
>具体编译选项含义请参见[编译选项表](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#Compile)
+ ```
+ CBLAS_ROOT=/opt/CBLAS
+ ```
+
For Python2:
```
- cmake .. -DPY_VERSION=2 -DPYTHON_EXECUTABLE=`which python2` -DWITH_MKL=OFF -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release -DON_INFER=ON -DWITH_PYTHON=ON -DWITH_XBYAK=OFF -DWITH_SW=ON -DCMAKE_CXX_FLAGS="-Wno-error -w" -DWITH_RCCL=OFF
+ cmake .. -DPY_VERSION=2 -DPYTHON_EXECUTABLE=`which python2` -DWITH_MKL=OFF -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release -DON_INFER=ON -DWITH_PYTHON=ON -DREFERENCE_CBLAS_ROOT=${CBLAS_ROOT} -DWITH_CRYPTO=OFF -DWITH_XBYAK=OFF -DWITH_SW=ON -DCMAKE_CXX_FLAGS="-Wno-error -w"
```
For Python3:
```
- cmake .. -DPY_VERSION=3 -DPYTHON_EXECUTABLE=`which python3` -DWITH_MKL=OFF -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release -DON_INFER=ON -DWITH_PYTHON=ON -DWITH_XBYAK=OFF -DWITH_SW=ON -DCMAKE_CXX_FLAGS="-Wno-error -w" -DWITH_RCCL=OFF
+ cmake .. -DPY_VERSION=3 -DPYTHON_EXECUTABLE=`which python3` -DWITH_MKL=OFF -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release -DON_INFER=ON -DWITH_PYTHON=ON -DREFERENCE_CBLAS_ROOT=${CBLAS_ROOT} -DWITH_CRYPTO=OFF -DWITH_XBYAK=OFF -DWITH_SW=ON -DCMAKE_CXX_FLAGS="-Wno-error -w"
```
9. 编译。
diff --git a/docs/install/compile/windows-compile.md b/docs/install/compile/windows-compile.md
index 2e5f06f7cc9..d163b01c42b 100644
--- a/docs/install/compile/windows-compile.md
+++ b/docs/install/compile/windows-compile.md
@@ -7,6 +7,7 @@
## 环境准备
* **Windows 7/8/10 专业版/企业版 (64bit)**
+
* **Python 版本 3.6/3.7/3.8/3.9/3.10 (64 bit)**
* **Visual Studio 2017/2019 社区版/专业版/企业版**
@@ -14,7 +15,7 @@
* 如果你的计算机硬件没有 NVIDIA® GPU,请编译 CPU 版本的 PaddlePaddle
-* 如果你的计算机硬件有 NVIDIA® GPU,推荐编译 GPU 版本的 PaddlePaddle,建议安装 **CUDA 10.1/10.2/11.0/11.1/11.2/11.6**
+* 如果你的计算机硬件有 NVIDIA® GPU,推荐编译 GPU 版本的 PaddlePaddle,建议安装 **CUDA 10.1/10.2/11.1/11.2/11.6/11.7**
## 本机编译过程
@@ -24,7 +25,7 @@
> **git**:官网下载[链接](https://github.com/git-for-windows/git/releases/download/v2.35.1.windows.2/Git-2.35.1.2-64-bit.exe),使用默认选项安装。
- > **python**:官网[链接](https://www.python.org/downloads/windows/),可选择 3.6/3.7/3.8/3.9 中任一版本的 Windows installer(64-bit)安装。安装时注意勾选 `Add Python 3.x to PATH`,将 Python 添加到环境变量中。
+ > **python**:官网[链接](https://www.python.org/downloads/windows/),可选择 3.6/3.7/3.8/3.9/3.10 中任一版本的 Windows installer(64-bit)安装。安装时注意勾选 `Add Python 3.x to PATH`,将 Python 添加到环境变量中。
> **Visual studio**:需根据 CUDA 版本选择对应的 Visual studio 版本,当只编译 CPU 版本或者 CUDA 版本 < 11.2 时,安装 VS2017;当 CUDA 版本 >= 11.2 时,安装 VS2019。官网[链接](https://visualstudio.microsoft.com/zh-hans/vs/older-downloads/),需要登录后下载,建议下载 Community 社区版。在安装时需要在工作负荷一栏中勾选 `使用 C++的桌面开发` 和 `通用 Windows 平台开发`,并在语言包一栏中选择 `英语`。
@@ -47,7 +48,13 @@
cd Paddle
```
-5. 创建名为 build 的目录并进入:
+5. 切换到较稳定 release 分支下进行编译:
+
+ ```
+ git checkout release/2.4
+ ```
+
+6. 创建名为 build 的目录并进入:
```
mkdir build
@@ -55,7 +62,7 @@
cd build
```
-6. 执行 cmake:
+7. 执行 cmake:
编译 CPU 版本的 Paddle:
@@ -83,19 +90,19 @@
cmake .. -GNinja -DWITH_GPU=ON -DPYTHON_EXECUTABLE=C:\Python38\python.exe -DPYTHON_INCLUDE_DIR=C:\Python38\include -DPYTHON_LIBRARY=C:\Python38\libs\python38.lib
```
-7. 执行编译:
+8. 执行编译:
```
ninja
```
-8. 编译成功后进入 `python\dist` 目录下找到生成的 `.whl` 包:
+9. 编译成功后进入 `python\dist` 目录下找到生成的 `.whl` 包:
```
cd python\dist
```
-9. 安装编译好的 `.whl` 包:
+10. 安装编译好的 `.whl` 包:
```
pip install [whl 包的名字] --force-reinstall
diff --git a/docs/install/compile/windows-compile_en.md b/docs/install/compile/windows-compile_en.md
index 62b9888e27f..3ca2518fe4c 100644
--- a/docs/install/compile/windows-compile_en.md
+++ b/docs/install/compile/windows-compile_en.md
@@ -3,8 +3,8 @@
## Environment preparation
* **Windows 7/8/10 Pro/Enterprise(64bit)**
-* **GPU Version support CUDA 10.1/10.2/11.0/11.1/11.2, and only support single GPU**
-* **Python version 3.6+/3.7+/3.8+/3.9+(64bit)**
+* **GPU version supports CUDA 10.1/10.2/11.1/11.2/11.6/11.7, and only supports single GPU**
+* **Python version 3.6+/3.7+/3.8+/3.9+/3.10+(64bit)**
* **pip version 20.2.2 or above (64bit)**
* **Visual Studio 2017**
@@ -13,10 +13,11 @@
* If your computer doesn't have NVIDIA® GPU, please install CPU version of PaddlePaddle
* If your computer has NVIDIA® GPU, and the following conditions are met,GPU version of PaddlePaddle is recommended.
- * **CUDA toolkit 10.1/10.2 with cuDNN v7.6.5+**
- * **CUDA toolkit 11.0 with cuDNN v8.0.2+**
- * **CUDA toolkit 11.1 with cuDNN v8.1.1+**
+ * **CUDA toolkit 10.1/10.2 with cuDNN v7.6.5**
+ * **CUDA toolkit 11.1 with cuDNN v8.1.1**
* **CUDA toolkit 11.2 with cuDNN v8.2.1**
+ * **CUDA toolkit 11.6 with cuDNN v8.4.0**
+ * **CUDA toolkit 11.7 with cuDNN v8.4.1**
* **GPU's computing capability exceeds 3.5**
## Installation steps
@@ -65,13 +66,18 @@ There is one compilation methods in Windows system:
cd Paddle
```
-3. Switch to `develop` branch for compilation:
+3. Switch to a stable release branch for compilation:
```
- git checkout develop
+ git checkout [name of the branch]
```
- Note: python3.6、python3.7 version started supporting from release/1.2, python3.8 version started supporting from release/1.8, python3.9 version started supporting from release/2.1
+ For example:
+ ```
+ git checkout release/2.4
+ ```
+
+ Note: Python 3.6 and 3.7 are supported since release/1.2, Python 3.8 since release/1.8, Python 3.9 since release/2.1, and Python 3.10 since release/2.3
4. Create a directory called build and enter it:
diff --git a/docs/install/conda/fromconda.rst b/docs/install/conda/fromconda.rst
index 1a14f0b524f..478929ef7ce 100644
--- a/docs/install/conda/fromconda.rst
+++ b/docs/install/conda/fromconda.rst
@@ -2,7 +2,7 @@
**Conda 安装**
===========================
-.. toctree::
+.. toctree::
:maxdepth: 1
linux-conda.md
diff --git a/docs/install/conda/fromconda_en.rst b/docs/install/conda/fromconda_en.rst
index fb1eb259379..2b350a997b3 100644
--- a/docs/install/conda/fromconda_en.rst
+++ b/docs/install/conda/fromconda_en.rst
@@ -2,7 +2,7 @@
**Install via conda**
==============================
-.. toctree::
+.. toctree::
linux-conda_en.md
diff --git a/docs/install/conda/linux-conda.md b/docs/install/conda/linux-conda.md
index 485d28cbeda..aa85bf91a95 100644
--- a/docs/install/conda/linux-conda.md
+++ b/docs/install/conda/linux-conda.md
@@ -1,119 +1,134 @@
# Linux 下的 Conda 安装
-[Anaconda](https://www.anaconda.com/)是一个免费开源的 Python 和 R 语言的发行版本,用于计算科学,Anaconda 致力于简化包管理和部署。Anaconda 的包使用软件包管理系统 Conda 进行管理。Conda 是一个开源包管理系统和环境管理系统,可在 Windows、macOS 和 Linux 上运行。
+[Anaconda](https://www.anaconda.com/)是一个免费开源的 Python 和 R 语言的发行版本,用于计算科学,Anaconda 致力于简化包管理和部署。Anaconda 的包使用软件包管理系统 Conda 进行管理。Conda 是一个开源包管理系统和环境管理系统,可在 Windows、macOS 和 Linux 上运行。本文档为你介绍 Anaconda 安装方式,飞桨提供的 Anaconda 安装包支持分布式训练(多机多卡)、TensorRT 推理功能。
## 一、环境准备
-在进行 PaddlePaddle 安装之前请确保您的 Anaconda 软件环境已经正确安装。软件下载和安装参见 Anaconda 官网(https://www.anaconda.com/)。在您已经正确安装 Anaconda 的情况下请按照下列步骤安装 PaddlePaddle。
-
-* conda 版本 4.8.3+ (64 bit)
-
-
### 1.1 创建虚拟环境
#### 1.1.1 安装环境
-首先根据具体的 Python 版本创建 Anaconda 虚拟环境,PaddlePaddle 的 Anaconda 安装支持以下五种 Python 安装环境。
-
-
-如果您想使用的 python 版本为 3.6:
+首先根据具体的 Python 版本创建 Anaconda 虚拟环境,PaddlePaddle 的 Anaconda 安装支持 3.6 - 3.10 版本的 Python 安装环境。
```
-conda create -n paddle_env python=3.6
+conda create -n paddle_env python=YOUR_PY_VER
```
-如果您想使用的 python 版本为 3.7:
+
+#### 1.1.2 进入 Anaconda 虚拟环境
```
-conda create -n paddle_env python=3.7
+conda activate paddle_env
```
-如果您想使用的 python 版本为 3.8:
-```
-conda create -n paddle_env python=3.8
-```
-如果您想使用的 python 版本为 3.9:
+### 1.2 其他环境检查
+
+#### 1.2.1 确认 Python 安装路径
+
+确认您的 conda 虚拟环境和需要安装 PaddlePaddle 的 Python 是您预期的位置,因为您计算机可能有多个 Python。进入 Anaconda 的命令行终端,输入以下指令确认 Python 位置。
+
+
+输出 Python 路径的命令为:
+
```
-conda create -n paddle_env python=3.9
+which python3
```
+根据您的环境,您可能需要将说明中所有命令行中的 python3 替换为具体的 Python 路径。
-#### 1.1.2 进入 Anaconda 虚拟环境
+
+
+#### 1.2.2 检查 Python 版本
+
+使用以下命令确认版本
```
-conda activate paddle_env
+python3 --version
```
-## 1.2 其他环境检查
+#### 1.2.3 检查系统环境
-确认 Python 和 pip 是 64bit,并且处理器架构是 x86_64(或称作 x64、Intel 64、AMD64)架构,目前 PaddlePaddle 不支持 arm64 架构。下面的第一行输出的是"64bit",第二行输出的是"x86_64(或 x64、AMD64)"即可:
+确认 Python 和 pip 是 64bit,并且处理器架构是 x86_64(或称作 x64、Intel 64、AMD64)架构。下面的第一行输出的是"64bit",第二行输出的是"x86_64(或 x64、AMD64)"即可:
```
-python -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
+python3 -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
```
+
## 二、开始安装
本文档为您介绍 conda 安装方式
-
### 添加清华源(可选)
-对于国内用户无法连接到 Anaconda 官方源的可以按照以下命令添加清华源。
-
-```
-conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
-conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
-conda config --set show_channel_urls yes
-```
+国内用户若无法连接到 Anaconda 官方源,可以按照以下命令添加清华源:
+ ```
+ conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
+ ```
+ ```
+ conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
+ ```
+ ```
+ conda config --set show_channel_urls yes
+ ```
### 根据版本进行安装
-确定您的环境满足条件后可以开始安装了,选择下面您要安装的 PaddlePaddle
+选择下面您要安装的 PaddlePaddle
#### CPU 版的 PaddlePaddle
-如果您的计算机没有 NVIDIA® GPU 设备,请安装 CPU 版的 PaddlePaddle
+如果您的计算机没有 NVIDIA® GPU,请安装 CPU 版的 PaddlePaddle
```
-conda install paddlepaddle --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
+conda install paddlepaddle==2.4.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
```
+
#### GPU 版的 PaddlePaddle
-如果您的计算机有 NVIDIA® GPU 设备
-* 如果您是使用 CUDA 10.1,cuDNN 7.6+,安装 GPU 版本的命令为:
+* 对于 `CUDA 10.2`,需要搭配 cuDNN 7.6.5(多卡环境下 NCCL>=2.7),安装命令为:
```
- conda install paddlepaddle-gpu==2.1.0 cudatoolkit=10.1 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
+ conda install paddlepaddle-gpu==2.4.2 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
```
-* 如果您是使用 CUDA 10.2,cuDNN 7.6+,安装 GPU 版本的命令为:
+
+* 对于 `CUDA 11.2`,需要搭配 cuDNN 8.2.1(多卡环境下 NCCL>=2.7),安装命令为:
```
- conda install paddlepaddle-gpu==2.1.0 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
+ conda install paddlepaddle-gpu==2.4.2 cudatoolkit=11.2 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
```
-* 如果您是使用 CUDA 11.2,cuDNN 8.1.1+,安装 GPU 版本的命令为:
+* 对于 `CUDA 11.6`,需要搭配 cuDNN 8.4.0(多卡环境下 NCCL>=2.7),安装命令为:
```
- conda install paddlepaddle-gpu==2.1.0 cudatoolkit=11.2 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
+ conda install paddlepaddle-gpu==2.4.2 cudatoolkit=11.6 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
```
+* 对于 `CUDA 11.7`,需要搭配 cuDNN 8.4.1(多卡环境下 NCCL>=2.7),安装命令为:
+
+ ```
+ conda install paddlepaddle-gpu==2.4.2 cudatoolkit=11.7 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
+ ```
+
+您可参考 NVIDIA 官方文档了解 CUDA 和 cuDNN 的安装流程和配置方法,请见[CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/),[cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/)
+
+
## **三、验证安装**
-安装完成后您可以使用 `python` 进入 python 解释器,输入`import paddle` ,再输入
+安装完成后您可以使用 `python3` 进入 python 解释器,输入`import paddle` ,再输入
`paddle.utils.run_check()`
如果出现`PaddlePaddle is installed successfully!`,说明您已成功安装。
diff --git a/docs/install/conda/linux-conda_en.md b/docs/install/conda/linux-conda_en.md
index 8162074b8db..c15884e486c 100644
--- a/docs/install/conda/linux-conda_en.md
+++ b/docs/install/conda/linux-conda_en.md
@@ -5,93 +5,60 @@
## Environmental preparation
-Before performing PaddlePaddle installation, please make sure that your Anaconda software environment is properly installed. For software download and installation, see Anaconda's official website (https://www.anaconda.com/). If you have installed Anaconda correctly, follow these steps to install PaddlePaddle.
-
-
-
### 1.1 Create Virtual Environment
#### 1.1.1 Create the Anaconda Virtual Environment
-Create virtual environment First create the Anaconda virtual environment according to the specific Python version. The Anaconda installation of PaddlePaddle supports the following four Python installation environments.
-
-
-If you want to use python version 3.6:
-
-```
-conda create -n paddle_env python=3.6
-```
-
-If you want to use python version 3.7:
+First create the Anaconda virtual environment according to the specific Python version you need. The Anaconda installation of PaddlePaddle supports Python versions 3.6 - 3.10.
```
-conda create -n paddle_env python=3.7
+conda create -n paddle_env python=YOUR_PY_VER
```
-If you want to use python version 3.8:
-```
-conda create -n paddle_env python=3.8
-```
-
-If you want to use python version 3.9:
-
-```
-conda create -n paddle_env python=3.9
-```
#### 1.1.2 Enter the Anaconda Virtual Environment
-for Windows
-
-```
-activate paddle_env
-```
-
-for macOS/Linux
-
```
conda activate paddle_env
```
-## 1.2 Confirm Other Environments
+### 1.2 Confirm Other Environments
Confirm that your conda virtual environment and the Python location where PaddlePaddle will be installed are where you expect them, since your computer may have multiple Python environments. Enter Anaconda's command-line terminal and run the following command to confirm the Python location.
-1.2.1 Depending on your environment, you may need to replace python in all command lines in the instructions with specific Python path.
-
-In a Windows environment, the command to get the Python path is:
+#### 1.2.1 Confirm the installation path of python
-```
-where python
-```
+Depending on your environment, you may need to replace python3 in all command lines in the instructions with the specific Python path.
-In a macOS/Linux environment, the command to get the Python path is:
+The command to get the Python path is:
```
-which python
+which python3
```
-1.2.2 Check the version of Python
+#### 1.2.2 Check the version of Python
-Use the following command to confirm it's version is 3.6/3.7/3.8/3.9
+Use the following command to confirm its version
```
-python --version
+python3 --version
```
-1.2.3 Confirm that Python and pip are 64bit, and the processor architecture is x86_64 (or x64, Intel 64, AMD64) architecture. Currently PaddlePaddle does not support arm64 architecture. The first line below print "64bit", the second line prints "x86_64 (or x64, AMD64)."
+#### 1.2.3 Check the system environment
+
+Confirm that Python and pip are 64bit, and that the processor architecture is x86_64 (or x64, Intel 64, AMD64). The first line below should print "64bit", and the second line "x86_64 (or x64, AMD64)."
```
-python -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
+python3 -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
```
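The one-liner above packs two checks into a single command. A minimal equivalent script, spelled out for clarity (the expected values are the ones stated in the text):

```python
import platform

# Check 1: the Python interpreter must be a 64-bit build
bits = platform.architecture()[0]
print(bits)  # "64bit" on a 64-bit interpreter

# Check 2: the processor architecture reported by the OS
machine = platform.machine()
print(machine)  # e.g. "x86_64" (also reported as x64 or AMD64)
```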
@@ -100,52 +67,20 @@ python -c "import platform;print(platform.architecture()[0]);print(platform.mach
## INSTALLATION
-### Choose CPU/GPU
-
-* If your computer doesn't have NVIDIA® GPU, please install [the CPU Version of PaddlePaddle](#cpu)
-
-* If your computer has NVIDIA® GPU, please make sure that the following conditions are met and install [the GPU Version of PaddlePaddle](#gpu)
-
- * **CUDA toolkit 10.1/10.2 with cuDNN v7.6+(for multi card support, NCCL2.7 or higher)**
-
- * **CUDA toolkit 11.2 with cuDNN v8.1.1(for multi card support, NCCL2.7 or higher)**
-
- * **Hardware devices with GPU computing power over 3.5**
-
- You can refer to NVIDIA official documents for installation process and configuration method of CUDA and cudnn. Please refer to [CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/),[cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/)
-
-* If you need to use a multi-card environment, please make sure that you have installed nccl2 correctly, or install nccl2 according to the following instructions (here are the installation instructions of nccl2 under CUDA9 and cuDNN7. For more version installation information, please refer to NVIDIA [Official Website](https://developer.nvidia.com/nccl)):
-
- * **CentOS system can refer to the following commands**
-
- wget http://developer.download.nvidia.com/compute/machine-learning/repos/rhel7/x86_64/nvidia-machine-learning-repo-rhel7-1.0.0-1.x86_64.rpm
-
- ```
- rpm -i nvidia-machine-learning-repo-rhel7-1.0.0-1.x86_64.rpm
- ```
-
- ```
- yum update -y
- ```
+### Add Tsinghua source (optional)
- ```
- yum install -y libnccl-2.3.7-2+cuda9.0 libnccl-devel-2.3.7-2+cuda9.0 libnccl-static-2.3.7-2+cuda9.0
- ```
-
- * **Ubuntu system can refer to the following commands**
-
- ```
- wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb
- ```
-
- ```
- dpkg -i nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb
- ```
+If you cannot connect to the official Anaconda source (common for users in mainland China), you can add the Tsinghua mirror with the following commands.
- ```
- sudo apt-get install -y libnccl2=2.3.7-1+cuda9.0 libnccl-dev=2.3.7-1+cuda9.0
- ```
+```
+conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
+```
+```
+conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
+```
+```
+conda config --set show_channel_urls yes
+```
### Installation Step
@@ -154,56 +89,48 @@ You can choose the following version of PaddlePaddle to start installation:
-#### 2.1 CPU version of PaddlePaddle
+#### CPU Version of PaddlePaddle
+
+If your computer doesn't have an NVIDIA® GPU, please install the CPU version of PaddlePaddle
```
-conda install paddlepaddle --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
+conda install paddlepaddle==2.4.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
```
-#### 2.2 GPU version of PaddlePaddle
+#### GPU Version of PaddlePaddle
-* If you are using CUDA 10.1,cuDNN 7.6+
+* If you are using CUDA 10.2 with cuDNN 7.6.5 (for multi-card support, NCCL >= 2.7):
```
- conda install paddlepaddle-gpu==2.1.0 cudatoolkit=10.1 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
+ conda install paddlepaddle-gpu==2.4.2 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
```
-* If you are usingCUDA 10.2,cuDNN 7.6+:
+* If you are using CUDA 11.2 with cuDNN 8.2.1 (for multi-card support, NCCL >= 2.7):
```
- conda install paddlepaddle-gpu==2.1.0 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
+ conda install paddlepaddle-gpu==2.4.2 cudatoolkit=11.2 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
```
-* If you are using CUDA 11.2,cuDNN 8.1.1+:
+* If you are using CUDA 11.6 with cuDNN 8.4.0 (for multi-card support, NCCL >= 2.7):
```
- conda install paddlepaddle-gpu==2.1.0 cudatoolkit=11.2 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
+ conda install paddlepaddle-gpu==2.4.2 cudatoolkit=11.6 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
```
+* If you are using CUDA 11.7 with cuDNN 8.4.1 (for multi-card support, NCCL >= 2.7):
+ ```
+ conda install paddlepaddle-gpu==2.4.2 cudatoolkit=11.7 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
+ ```
-## Verify installation
-
-After the installation is complete, you can use `python` or `python3` to enter the Python interpreter and then use `import paddle` and `paddle.utils.run_check()`
-
-If `PaddlePaddle is installed successfully!` appears, to verify that the installation was successful.
-
-
+You can refer to NVIDIA's official documents for the installation and configuration of CUDA and cuDNN. Please refer to [CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/), [cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/)
-## Notice
-For domestic users who cannot connect to the Anaconda official source, you can add Tsinghua source according to the following command.
+## Verify installation
+After the installation is complete, you can use `python3` to enter the Python interpreter and then use `import paddle` and `paddle.utils.run_check()`
-```
-conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
-```
-```
-conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
-```
-```
-conda config --set show_channel_urls yes
-```
+If `PaddlePaddle is installed successfully!` appears, the installation was successful.
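The verification step above amounts to importing the package and running its built-in self-check. As a minimal, hypothetical sketch of the import half (a standard-library module stands in for `paddle`, which is only importable after the installation succeeds):

```python
import importlib

def can_import(name):
    """Return True if the named package is importable; substitute "paddle" after installing."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False

print(can_import("platform"))         # True: stdlib module is always present
print(can_import("no_such_package"))  # False: not installed
```

Once `can_import("paddle")` returns True, `paddle.utils.run_check()` performs the deeper sanity check described above.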
diff --git a/docs/install/conda/macos-conda.md b/docs/install/conda/macos-conda.md
index 55e8981fc4f..9c801f20d62 100644
--- a/docs/install/conda/macos-conda.md
+++ b/docs/install/conda/macos-conda.md
@@ -1,92 +1,94 @@
-# macOS 下的 Conda 安装
+# MacOS 下的 Conda 安装
[Anaconda](https://www.anaconda.com/)是一个免费开源的 Python 和 R 语言的发行版本,用于计算科学,Anaconda 致力于简化包管理和部署。Anaconda 的包使用软件包管理系统 Conda 进行管理。Conda 是一个开源包管理系统和环境管理系统,可在 Windows、macOS 和 Linux 上运行。
## 一、环境准备
-在进行 PaddlePaddle 安装之前请确保您的 Anaconda 软件环境已经正确安装。软件下载和安装参见 Anaconda 官网(https://www.anaconda.com/)。在您已经正确安装 Anaconda 的情况下请按照下列步骤安装 PaddlePaddle。
-
-* macOS 版本 10.11/10.12/10.13/10.14 (64 bit) (不支持 GPU 版本)
-* conda 版本 4.8.3+ (64 bit)
-
### 1.1 创建虚拟环境
#### 1.1.1 安装环境
-首先根据具体的 Python 版本创建 Anaconda 虚拟环境,PaddlePaddle 的 Anaconda 安装支持以下五种 Python 安装环境。
-
-
-如果您想使用的 python 版本为 3.6:
+首先根据具体的 Python 版本创建 Anaconda 虚拟环境,PaddlePaddle 的 Anaconda 安装支持 3.6 - 3.10 版本的 Python 安装环境。
```
-conda create -n paddle_env python=3.6
+conda create -n paddle_env python=YOUR_PY_VER
```
-如果您想使用的 python 版本为 3.7:
-```
-conda create -n paddle_env python=3.7
-```
+#### 1.1.2 进入 Anaconda 虚拟环境
-如果您想使用的 python 版本为 3.8:
```
-conda create -n paddle_env python=3.8
+conda activate paddle_env
```
-如果您想使用的 python 版本为 3.9:
-```
-conda create -n paddle_env python=3.9
-```
+### 1.2 其他环境检查
-#### 1.1.2 进入 Anaconda 虚拟环境
+#### 1.2.1 确认 Python 安装路径
+
+确认您的 conda 虚拟环境和需要安装 PaddlePaddle 的 Python 是您预期的位置,因为您计算机可能有多个 Python。进入 Anaconda 的命令行终端,输入以下指令确认 Python 位置。
+
+输出 Python 路径的命令为:
```
-conda activate paddle_env
+which python3
```
+根据您的环境,您可能需要将说明中所有命令行中的 python3 替换为具体的 Python 路径。
+
+
-## 1.2 其他环境检查
+#### 1.2.2 检查 Python 版本
-确认 Python 和 pip 是 64bit,并且处理器架构是 x86_64(或称作 x64、Intel 64、AMD64)架构,目前 PaddlePaddle 不支持 arm64 架构。下面的第一行输出的是"64bit",第二行输出的是"x86_64(或 x64、AMD64)"即可:
+使用以下命令确认版本
```
-python -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
+python3 --version
```
-## 二、开始安装
-本文档为您介绍 conda 安装方式
-### 添加清华源(可选)
+#### 1.2.3 检查系统环境
+
+确认 Python 和 pip 是 64bit,并且处理器架构是 x86_64(或称作 x64、Intel 64、AMD64)架构 或 arm64 架构(paddle 已原生支持 Mac M1 芯片):
-对于国内用户无法连接到 Anaconda 官方源的可以按照以下命令添加清华源。
```
-conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
-conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
-conda config --set show_channel_urls yes
+python3 -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
```
-### 首先请您选择您的版本
-* 目前在 macOS 环境仅支持 CPU 版 PaddlePaddle
+## 二、开始安装
+
+本文档为您介绍 conda 安装方式
+
+### 添加清华源(可选)
+
+* 国内用户若无法连接到 Anaconda 官方源,可以按照以下命令添加清华源:
-### 根据版本进行安装
+ ```
+ conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
+ ```
+ ```
+ conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
+ ```
+ ```
+ conda config --set show_channel_urls yes
+ ```
-确定您的环境满足条件后可以开始安装了,选择下面您要安装的 PaddlePaddle
+### 安装 CPU 版 PaddlePaddle
-* 请参考如下命令安装:
+* 目前在 MacOS 环境仅支持 CPU 版 PaddlePaddle,请参考如下命令安装 Paddle:
```
- conda install paddlepaddle --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
+ conda install paddlepaddle==2.4.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
```
## **三、验证安装**
-安装完成后您可以使用 `python` 或 `python3` 进入 python 解释器,输入`import paddle` ,再输入
+安装完成后您可以使用 `python3` 进入 python 解释器,输入`import paddle` ,再输入
`paddle.utils.run_check()`
如果出现`PaddlePaddle is installed successfully!`,说明您已成功安装。
diff --git a/docs/install/conda/macos-conda_en.md b/docs/install/conda/macos-conda_en.md
index ddc6749fa4a..f87d7246e1e 100644
--- a/docs/install/conda/macos-conda_en.md
+++ b/docs/install/conda/macos-conda_en.md
@@ -1,4 +1,4 @@
-# Installation on macOS via Conda
+# Installation on MacOS via Conda
[Anaconda](https://www.anaconda.com/)is a free and open source distribution of Python and R for computational science. Anaconda is dedicated to simplifying package management and deployment. Anaconda's packages are managed using the package management system Conda. Conda is an open source package management system and environment management system that runs on Windows, macOS, and Linux.
@@ -6,96 +6,59 @@
## Environmental preparation
-Before performing PaddlePaddle installation, please make sure that your Anaconda software environment is properly installed. For software download and installation, see Anaconda's official website (https://www.anaconda.com/). If you have installed Anaconda correctly, follow these steps to install PaddlePaddle.
-
-* macOS version 10.11/10.12/10.13/10.14 (64 bit)(not support GPU version)
-* conda version 4.8.3+ (64 bit)
-
-
-
### 1.1 Create Virtual Environment
#### 1.1.1 Create the Anaconda Virtual Environment
-Create virtual environment First create the Anaconda virtual environment according to the specific Python version. The Anaconda installation of PaddlePaddle supports the following four Python installation environments.
-
-
-If you want to use python version 3.6:
-
-```
-conda create -n paddle_env python=3.6
-```
-
-If you want to use python version 3.7:
-
-```
-conda create -n paddle_env python=3.7
-```
-
-If you want to use python version 3.8:
+First create the Anaconda virtual environment according to the specific Python version you need. The Anaconda installation of PaddlePaddle supports Python versions 3.6 - 3.10.
```
-conda create -n paddle_env python=3.8
-```
-
-If you want to use python version 3.9:
-
-```
-conda create -n paddle_env python=3.9
+conda create -n paddle_env python=YOUR_PY_VER
```
#### 1.1.2 Enter the Anaconda Virtual Environment
-for Windows
-
-```
-activate paddle_env
-```
-
-for macOS/Linux
-
```
conda activate paddle_env
```
-## 1.2 Confirm Other Environments
+### 1.2 Confirm Other Environments
Confirm that your conda virtual environment and the Python location where PaddlePaddle will be installed are where you expect them, since your computer may have multiple Python environments. Enter Anaconda's command-line terminal and run the following command to confirm the Python location.
-1.2.1 Depending on your environment, you may need to replace python in all command lines in the instructions with specific Python path.
+#### 1.2.1 Confirm the installation path of python
-In a Windows environment, the command to get the Python path is:
+Depending on your environment, you may need to replace python3 in all command lines in the instructions with the specific Python path.
-```
-where python
-```
-
-In a macOS/Linux environment, the command to get the Python path is:
+The command to get the Python path is:
```
-which python
+which python3
```
-1.2.2 Check the version of Python
+#### 1.2.2 Check the version of Python
-Use the following command to confirm it's version is 3.6/3.7/3.8/3.9
+Use the following command to confirm its version
```
-python --version
+python3 --version
```
-1.2.3 Confirm that Python and pip are 64bit, and the processor architecture is x86_64 (or x64, Intel 64, AMD64) architecture. Currently PaddlePaddle does not support arm64 architecture. The first line below print "64bit", the second line prints "x86_64 (or x64, AMD64)."
+#### 1.2.3 Check the system environment
+
+
+Confirm that Python and pip are 64bit, and that the processor architecture is x86_64 (also called x64, Intel 64, or AMD64) or arm64 (PaddlePaddle natively supports the Apple M1 chip):
```
-python -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
+python3 -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
```
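Because PaddlePaddle supports both Intel and Apple-silicon Macs, the macOS check accepts two architectures rather than one. A small sketch of that acceptance test (the set below lists only the values named in the text above):

```python
import platform

# Architectures accepted on macOS per the text above
ACCEPTED = {"x86_64", "x64", "AMD64", "arm64"}

machine = platform.machine()
print(machine, machine in ACCEPTED)
```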
@@ -104,31 +67,7 @@ python -c "import platform;print(platform.architecture()[0]);print(platform.mach
We will introduce conda installation here.
-### Choose CPU/GPU
-
-* Currently, only the CPU version of PaddlePaddle is supported in the macOS environment
-
-### Installation Step
-
-You can choose the following version of PaddlePaddle to start installation:
-
-* Please use the following command to install PaddlePaddle:
-
- ```
- conda install paddlepaddle --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
- ```
-
-
-## Verify installation
-
-After the installation is complete, you can use `python` or `python3` to enter the Python interpreter and then use `import paddle` and `paddle.utils.run_check()`
-
-If `PaddlePaddle is installed successfully!` appears, to verify that the installation was successful.
-
-
-
-
-## Notice
+### Add Tsinghua source (optional)
If you cannot connect to the official Anaconda source (common for users in mainland China), you can add the Tsinghua mirror with the following commands.
@@ -142,3 +81,18 @@ conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/m
```
conda config --set show_channel_urls yes
```
+
+### Install the CPU version of PaddlePaddle
+
+* Currently, only the CPU version of PaddlePaddle is supported in the MacOS environment. Please use the following command to install PaddlePaddle:
+
+ ```
+ conda install paddlepaddle==2.4.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
+ ```
+
+
+## Verify installation
+
+After the installation is complete, you can use `python3` to enter the Python interpreter and then use `import paddle` and `paddle.utils.run_check()`
+
+If `PaddlePaddle is installed successfully!` appears, the installation was successful.
diff --git a/docs/install/conda/windows-conda.md b/docs/install/conda/windows-conda.md
index 231d7d76b9c..22ff9f315ff 100644
--- a/docs/install/conda/windows-conda.md
+++ b/docs/install/conda/windows-conda.md
@@ -1,118 +1,134 @@
# Windows 下的 Conda 安装
-[Anaconda](https://www.anaconda.com/)是一个免费开源的 Python 和 R 语言的发行版本,用于计算科学,Anaconda 致力于简化包管理和部署。Anaconda 的包使用软件包管理系统 Conda 进行管理。Conda 是一个开源包管理系统和环境管理系统,可在 Windows、macOS 和 Linux 上运行。
+[Anaconda](https://www.anaconda.com/)是一个免费开源的 Python 和 R 语言的发行版本,用于计算科学,Anaconda 致力于简化包管理和部署。Anaconda 的包使用软件包管理系统 Conda 进行管理。Conda 是一个开源包管理系统和环境管理系统,可在 Windows、macOS 和 Linux 上运行。本文档为你介绍 Anaconda 安装方式,飞桨提供的 Anaconda 安装包支持 TensorRT 推理功能。
## 一、环境准备
-在进行 PaddlePaddle 安装之前请确保您的 Anaconda 软件环境已经正确安装。软件下载和安装参见 Anaconda 官网(https://www.anaconda.com/)。在您已经正确安装 Anaconda 的情况下请按照下列步骤安装 PaddlePaddle。
-
-* Windows 7/8/10 专业版/企业版 (64bit)
-* conda 版本 4.8.3+ (64 bit)
### 1.1 创建虚拟环境
#### 1.1.1 安装环境
-首先根据具体的 Python 版本创建 Anaconda 虚拟环境,PaddlePaddle 的 Anaconda 安装支持以下五种 Python 安装环境。
-
-
-如果您想使用的 python 版本为 3.6:
+首先根据具体的 Python 版本创建 Anaconda 虚拟环境,PaddlePaddle 的 Anaconda 安装支持 3.6 - 3.10 版本的 Python 安装环境。
```
-conda create -n paddle_env python=3.6
+conda create -n paddle_env python=YOUR_PY_VER
```
-如果您想使用的 python 版本为 3.7:
+
+#### 1.1.2 进入 Anaconda 虚拟环境
```
-conda create -n paddle_env python=3.7
+activate paddle_env
```
-如果您想使用的 python 版本为 3.8:
-```
-conda create -n paddle_env python=3.8
-```
-如果您想使用的 python 版本为 3.9:
+### 1.2 其他环境检查
+
+#### 1.2.1 确认 Python 安装路径
+
+确认您的 conda 虚拟环境和需要安装 PaddlePaddle 的 Python 是您预期的位置,因为您计算机可能有多个 Python。进入 Anaconda 的命令行终端,输入以下指令确认 Python 位置。
+
+输出 Python 路径的命令为:
```
-conda create -n paddle_env python=3.9
+where python
```
-#### 1.1.2 进入 Anaconda 虚拟环境
+根据您的环境,您可能需要将说明中所有命令行中的 python 替换为具体的 Python 路径。
+
+
+
+#### 1.2.2 检查 Python 版本
+
+使用以下命令确认版本
```
-conda activate paddle_env
+python --version
```
-## 1.2 其他环境检查
-确认 Python 和 pip 是 64bit,并且处理器架构是 x86_64(或称作 x64、Intel 64、AMD64)架构,目前 PaddlePaddle 不支持 arm64 架构。下面的第一行输出的是"64bit",第二行输出的是"x86_64(或 x64、AMD64)"即可:
+#### 1.2.3 检查系统环境
+
+确认 Python 和 pip 是 64bit,并且处理器架构是 x86_64(或称作 x64、Intel 64、AMD64)架构。下面的第一行输出的是"64bit",第二行输出的是"x86_64(或 x64、AMD64)"即可:
+
```
python -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
```
+
## 二、开始安装
本文档为您介绍 conda 安装方式
-
### 添加清华源(可选)
-对于国内用户无法连接到 Anaconda 官方源的可以按照以下命令添加清华源。
+国内用户若无法连接到 Anaconda 官方源,可以按照以下命令添加清华源:
-```
-conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
-conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
-conda config --set show_channel_urls yes
-```
+ ```
+ conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
+ ```
+ ```
+ conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
+ ```
+ ```
+ conda config --set show_channel_urls yes
+ ```
### 根据版本进行安装
-确定您的环境满足条件后可以开始安装了,选择下面您要安装的 PaddlePaddle
+选择下面您要安装的 PaddlePaddle
#### CPU 版的 PaddlePaddle
-如果您的计算机没有 NVIDIA® GPU 设备,请安装 CPU 版的 PaddlePaddle
+如果您的计算机没有 NVIDIA® GPU,请安装 CPU 版的 PaddlePaddle
```
-conda install paddlepaddle --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
+conda install paddlepaddle==2.4.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
```
+
#### GPU 版的 PaddlePaddle
-如果您的计算机有 NVIDIA® GPU 设备
-* 如果您是使用 CUDA 10.1,cuDNN 7.6+,安装 GPU 版本的命令为:
+* 对于 `CUDA 10.2`,需要搭配 cuDNN 7.6.5,安装命令为:
+
+ ```
+ conda install paddlepaddle-gpu==2.4.2 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
+ ```
+
+* 对于 `CUDA 11.2`,需要搭配 cuDNN 8.2.1,安装命令为:
```
- conda install paddlepaddle-gpu==2.1.0 cudatoolkit=10.1 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
+ conda install paddlepaddle-gpu==2.4.2 cudatoolkit=11.2 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
```
-* 如果您是使用 CUDA 10.2,cuDNN 7.6+,安装 GPU 版本的命令为:
+* 对于 `CUDA 11.6`,需要搭配 cuDNN 8.4.0,安装命令为:
```
- conda install paddlepaddle-gpu==2.1.0 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
+ conda install paddlepaddle-gpu==2.4.2 cudatoolkit=11.6 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
```
-* 如果您是使用 CUDA 11.2,cuDNN 8.1.1+,安装 GPU 版本的命令为:
+* 对于 `CUDA 11.7`,需要搭配 cuDNN 8.4.1,安装命令为:
```
- conda install paddlepaddle-gpu==2.1.0 cudatoolkit=11.2 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
+ conda install paddlepaddle-gpu==2.4.2 cudatoolkit=11.7 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
```
+您可参考 NVIDIA 官方文档了解 CUDA 和 cuDNN 的安装流程和配置方法,请见[CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/),[cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/)
+
## **三、验证安装**
-安装完成后您可以使用 `python` 进入 python 解释器,输入`import paddle` ,再输入
+安装完成后您可以使用 `python` 或 `python3` 进入 python 解释器,输入`import paddle` ,再输入
`paddle.utils.run_check()`
如果出现`PaddlePaddle is installed successfully!`,说明您已成功安装。
diff --git a/docs/install/conda/windows-conda_en.md b/docs/install/conda/windows-conda_en.md
index 8a73dcfec3d..23c93671be3 100644
--- a/docs/install/conda/windows-conda_en.md
+++ b/docs/install/conda/windows-conda_en.md
@@ -6,86 +6,45 @@
## Environmental preparation
-Before performing PaddlePaddle installation, please make sure that your Anaconda software environment is properly installed. For software download and installation, see Anaconda's official website (https://www.anaconda.com/). If you have installed Anaconda correctly, follow these steps to install PaddlePaddle.
-
-* Windows 7/8/10 Pro/Enterprise (64bit)
- * GPU Version supportCUDA 10.1/10.2/11.2,且仅支持单卡
-* conda version 4.8.3+ (64 bit)
-
-
-
### 1.1 Create Virtual Environment
#### 1.1.1 Create the Anaconda Virtual Environment
-Create virtual environment First create the Anaconda virtual environment according to the specific Python version. The Anaconda installation of PaddlePaddle supports the following four Python installation environments.
-
-
-If you want to use python version 3.6:
-
-```
-conda create -n paddle_env python=3.6
-```
-
-If you want to use python version 3.7:
-
-```
-conda create -n paddle_env python=3.7
-```
-
-If you want to use python version 3.8:
-
-```
-conda create -n paddle_env python=3.8
-```
-
-If you want to use python version 3.9:
+First create the Anaconda virtual environment according to the specific Python version you need. The Anaconda installation of PaddlePaddle supports Python versions 3.6 - 3.10.
```
-conda create -n paddle_env python=3.9
+conda create -n paddle_env python=YOUR_PY_VER
```
#### 1.1.2 Enter the Anaconda Virtual Environment
-for Windows
-
```
activate paddle_env
```
-for macOS/Linux
-
-```
-conda activate paddle_env
-```
-
-## 1.2 Confirm Other Environments
+### 1.2 Confirm Other Environments
Confirm that your conda virtual environment and the Python location where PaddlePaddle will be installed are where you expect them, since your computer may have multiple Python environments. Enter Anaconda's command-line terminal and run the following command to confirm the Python location.
-1.2.1 Depending on your environment, you may need to replace python in all command lines in the instructions with specific Python path.
-
-In a Windows environment, the command to get the Python path is:
+#### 1.2.1 Confirm the installation path of python
-```
-where python
-```
+Depending on your environment, you may need to replace python in all command lines in the instructions with the specific Python path.
-In a macOS/Linux environment, the command to get the Python path is:
+The command to get the Python path is:
```
-which python
+where python
```
-1.2.2 Check the version of Python
+#### 1.2.2 Check the version of Python
-Use the following command to confirm it's version is 3.6/3.7/3.8/3.9
+Use the following command to confirm its version
```
python --version
@@ -93,7 +52,9 @@ python --version
-1.2.3 Confirm that Python and pip are 64bit, and the processor architecture is x86_64 (or x64, Intel 64, AMD64) architecture. Currently PaddlePaddle does not support arm64 architecture. The first line below print "64bit", the second line prints "x86_64 (or x64, AMD64)."
+#### 1.2.3 Check the system environment
+
+Confirm that Python and pip are 64bit, and that the processor architecture is x86_64 (or x64, Intel 64, AMD64). The first line below should print "64bit", and the second line "x86_64 (or x64, AMD64)."
```
@@ -108,19 +69,20 @@ python -c "import platform;print(platform.architecture()[0]);print(platform.mach
We will introduce conda installation here.
-### Choose CPU/GPU
-
-* If your computer doesn't have NVIDIA® GPU, please install [the CPU Version of PaddlePaddle](#cpu)
-
-* If your computer has NVIDIA® GPU, please make sure that the following conditions are met and install [the GPU Version of PaddlePaddle](#gpu)
-
- * **CUDA toolkit 10.1/10.2 with cuDNN v7.6+**
+### Add Tsinghua source (optional)
- * **CUDA toolkit 11.2 with cuDNN v8.1.1(**
+For users who cannot connect to the official Anaconda source (for example, from mainland China), you can add the Tsinghua mirror with the following commands.
- * **Hardware devices with GPU computing power over 3.5**
- You can refer to NVIDIA official documents for installation process and configuration method of CUDA and cudnn. Please refer to [CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/),[cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/)
+```
+conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
+```
+```
+conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
+```
+```
+conda config --set show_channel_urls yes
+```
### Installation Step
@@ -129,56 +91,49 @@ You can choose the following version of PaddlePaddle to start installation:
-#### 2.1 CPU version of PaddlePaddle
+#### CPU Version of PaddlePaddle
+
+If your computer doesn't have an NVIDIA® GPU, please install the CPU version of PaddlePaddle.
```
-conda install paddlepaddle --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
+conda install paddlepaddle==2.4.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
```
-#### 2.2 GPU version of PaddlePaddle
+#### GPU Version of PaddlePaddle
-* If you are using CUDA 10.1,cuDNN 7.6+
+* If you are using CUDA 10.2, cuDNN 7.6.5:
```
- conda install paddlepaddle-gpu==2.1.0 cudatoolkit=10.1 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
+ conda install paddlepaddle-gpu==2.4.2 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
```
-* If you are usingCUDA 10.2,cuDNN 7.6+:
+* If you are using CUDA 11.2, cuDNN 8.2.1:
```
- conda install paddlepaddle-gpu==2.1.0 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
+ conda install paddlepaddle-gpu==2.4.2 cudatoolkit=11.2 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
```
-* If you are using CUDA 11.2,cuDNN 8.1.1+:
+* If you are using CUDA 11.6, cuDNN 8.4.0:
```
- conda install paddlepaddle-gpu==2.1.0 cudatoolkit=11.2 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
+ conda install paddlepaddle-gpu==2.4.2 cudatoolkit=11.6 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
```
+* If you are using CUDA 11.7, cuDNN 8.4.1:
-## Verify installation
-
-After the installation is complete, you can use `python` or `python3` to enter the Python interpreter and then use `import paddle` and `paddle.utils.run_check()`
-
-If `PaddlePaddle is installed successfully!` appears, to verify that the installation was successful.
-
+ ```
+ conda install paddlepaddle-gpu==2.4.2 cudatoolkit=11.7 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
+ ```
+For the installation process and configuration method of CUDA and cuDNN, please refer to the NVIDIA official documents: [CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/), [cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/)
-## Notice
-For domestic users who cannot connect to the Anaconda official source, you can add Tsinghua source according to the following command.
+## Verify installation
+After the installation is complete, you can use `python` or `python3` to enter the Python interpreter, then run `import paddle` followed by `paddle.utils.run_check()`.
-```
-conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
-```
-```
-conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
-```
-```
-conda config --set show_channel_urls yes
-```
+If `PaddlePaddle is installed successfully!` appears, the installation was successful.
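The check can be wrapped so it degrades gracefully when paddle is not importable. The `verify_paddle` helper below is our own illustration, not a Paddle API:

```python
def verify_paddle():
    """Run paddle's built-in self-test if paddle is importable."""
    try:
        import paddle
    except ImportError:
        return "paddle not installed"
    # Prints "PaddlePaddle is installed successfully!" on success.
    paddle.utils.run_check()
    return "check finished"

print(verify_paddle())
```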
diff --git a/docs/install/docker/fromdocker.rst b/docs/install/docker/fromdocker.rst
index 5f80c8cd003..ddcfa65b9e1 100644
--- a/docs/install/docker/fromdocker.rst
+++ b/docs/install/docker/fromdocker.rst
@@ -2,7 +2,8 @@
**Docker 安装**
===========================
-.. toctree::
+.. toctree::
:maxdepth: 1
+ linux-docker.md
macos-docker.md
diff --git a/docs/install/docker/fromdocker_en.rst b/docs/install/docker/fromdocker_en.rst
index d44f176367f..06206eb36e6 100644
--- a/docs/install/docker/fromdocker_en.rst
+++ b/docs/install/docker/fromdocker_en.rst
@@ -2,7 +2,8 @@
**Install via docker**
==============================
-.. toctree::
+.. toctree::
+ linux-docker_en.md
macos-docker_en.md
diff --git a/docs/install/docker/linux-docker.md b/docs/install/docker/linux-docker.md
index c65434780bd..60d78e7fbf1 100644
--- a/docs/install/docker/linux-docker.md
+++ b/docs/install/docker/linux-docker.md
@@ -1,120 +1,132 @@
# **Linux 下的 Docker 安装**
-[Docker](https://docs.docker.com/install/)是一个开源的应用容器引擎。使用 Docker,既可以将 PaddlePaddle 的安装&使用与系统环境隔离,也可以与主机共享 GPU、网络等资源
+[Docker](https://docs.docker.com/install/)是一个开源的应用容器引擎。使用 Docker,既可以将 PaddlePaddle 的安装&使用与系统环境隔离,也可以与主机共享 GPU、网络等资源。
+以下 Docker 安装与使用流程中,docker 里已经安装好了特定版本的 PaddlePaddle。
## 环境准备
-- 目前支持的系统类型,请见[安装说明](../index_cn.html),请注意目前暂不支持在 CentOS 6 使用 Docker
+- 目前支持的系统类型,请见[安装说明](/documentation/docs/zh/install/index_cn.html),请注意目前暂不支持在 CentOS 6 使用 Docker
-- 在本地主机上[安装 Docker](https://hub.docker.com/search/?type=edition&offering=community)
+- 在本地主机上[安装 Docker](https://docs.docker.com/engine/install/)
- 如需在 Linux 开启 GPU 支持,请[安装 nvidia-docker](https://github.com/NVIDIA/nvidia-docker)
-## 安装步骤
-
-1. 拉取 PaddlePaddle 镜像
+- 镜像中 Python 版本为 3.7
- * CPU 版的 PaddlePaddle:
- ```
- docker pull registry.baidubce.com/paddlepaddle/paddle:[版本号]
- ```
+## 安装步骤
- * CPU 版的 PaddlePaddle,且镜像中预装好了 jupyter:
- ```
- docker pull registry.baidubce.com/paddlepaddle/paddle:[版本号]-jupyter
- ```
+### 1. 拉取 PaddlePaddle 镜像
- * GPU 版的 PaddlePaddle:
- ```
- docker pull registry.baidubce.com/paddlepaddle/paddle:[版本号]-gpu-cuda10.2-cudnn7
- ```
+对于国内用户,因为网络问题下载镜像比较慢时,可使用百度提供的镜像源:
- 如果您的机器不在中国大陆地区,可以直接从 DockerHub 拉取镜像:
+* CPU 版的 PaddlePaddle:
+ ```
+ docker pull registry.baidubce.com/paddlepaddle/paddle:2.4.2
+ ```
- * CPU 版的 PaddlePaddle:
- ```
- docker pull paddlepaddle/paddle:[版本号]
- ```
+* CPU 版的 PaddlePaddle,且镜像中预装好了 jupyter:
+ ```
+ docker pull registry.baidubce.com/paddlepaddle/paddle:2.4.2-jupyter
+ ```
- * CPU 版的 PaddlePaddle,且镜像中预装好了 jupyter:
- ```
- docker pull paddlepaddle/paddle:[版本号]-jupyter
- ```
+* GPU 版的 PaddlePaddle:
+ ```
+ nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.4.2-gpu-cuda10.2-cudnn7.6-trt7.0
+ ```
+ ```
+ nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.4.2-gpu-cuda11.2-cudnn8.2-trt8.0
+ ```
+ ```
+ nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.4.2-gpu-cuda11.7-cudnn8.4-trt8.4
+ ```
- * GPU 版的 PaddlePaddle:
- ```
- docker pull paddlepaddle/paddle:[版本号]-gpu-cuda10.2-cudnn7
- ```
+如果您的机器不在中国大陆地区,可以直接从 DockerHub 拉取镜像:
- 在`:`后请您填写 PaddlePaddle 版本号,例如当前版本`2.1.0`,更多请见[镜像简介](#dockers)。
+* CPU 版的 PaddlePaddle:
+ ```
+ docker pull paddlepaddle/paddle:2.4.2
+ ```
- 上例中,`cuda10.2-cudnn7` 也仅作示意用,表示安装 GPU 版的镜像。如果您还想安装其他 cuda/cudnn 版本的镜像,可以将其替换成`cuda11.2-cudnn8`等。
+* CPU 版的 PaddlePaddle,且镜像中预装好了 jupyter:
+ ```
+ docker pull paddlepaddle/paddle:2.4.2-jupyter
+ ```
- 您可以访问[DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/)获取与您机器适配的镜像。
+* GPU 版的 PaddlePaddle:
+ ```
+ nvidia-docker pull paddlepaddle/paddle:2.4.2-gpu-cuda10.2-cudnn7.6-trt7.0
+ ```
+ ```
+ nvidia-docker pull paddlepaddle/paddle:2.4.2-gpu-cuda11.2-cudnn8.2-trt8.0
+ ```
+ ```
+ nvidia-docker pull paddlepaddle/paddle:2.4.2-gpu-cuda11.7-cudnn8.4-trt8.4
+ ```
-2. 构建、进入 Docker 容器
+您还可以访问[DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/)获取更多镜像。
- * 使用 CPU 版本的 PaddlePaddle:
+### 2. 构建并进入 docker 容器
+* 使用 CPU 版本的 PaddlePaddle:
- ```
- docker run --name [Name of container] -it -v $PWD:/paddle /bin/bash
- ```
- > --name [Name of container] 设定 Docker 的名称;
+ ```
+ docker run --name paddle_docker -it -v $PWD:/paddle registry.baidubce.com/paddlepaddle/paddle:2.4.2 /bin/bash
+ ```
+ - `--name paddle_docker`:设定 Docker 的名称,`paddle_docker` 是自己设置的名称;
- > -it 参数说明容器已和本机交互式运行;
+ - `-it`:参数说明容器已和本机交互式运行;
- > -v $PWD:/paddle 指定将当前路径(PWD 变量会展开为当前路径的绝对路径)挂载到容器内部的 /paddle 目录;
- > `` 指定需要使用的 image 名称,您可以通过`docker images`命令查看;/bin/bash 是在 Docker 中要执行的命令
+ - `-v $PWD:/paddle`:指定将当前路径(PWD 变量会展开为当前路径的绝对路径)挂载到容器内部的 /paddle 目录;
+ - `registry.baidubce.com/paddlepaddle/paddle:2.4.2`:指定需要使用的 image 名称,您可以通过`docker images`命令查看;/bin/bash 是在 Docker 中要执行的命令
- * 使用 CPU 版本的 PaddlePaddle,且镜像中预装好了 jupyter:
- ```
- mkdir ./jupyter_docker
- ```
- ```
- chmod 777 ./jupyter_docker
- ```
- ```
- cd ./jupyter_docker
- ```
- ```
- docker run -p 80:80 --rm --env USER_PASSWD=[password you set] -v $PWD:/home/paddle
- ```
+* 使用 CPU 版本的 PaddlePaddle,且镜像中预装好了 jupyter:
- > --rm 关闭容器后删除容器;
+ ```
+ mkdir ./jupyter_docker
+ ```
+ ```
+ chmod 777 ./jupyter_docker
+ ```
+ ```
+ cd ./jupyter_docker
+ ```
+ ```
+ docker run -p 80:80 --rm --env USER_PASSWD="password you set" -v $PWD:/home/paddle registry.baidubce.com/paddlepaddle/paddle:2.4.2-jupyter
+ ```
+ - `--rm`:关闭容器后删除容器;
- > --env USER_PASSWD=[password you set] 为 jupyter 设置登录密码,[password you set] 是自己设置的密码;
+ - `--env USER_PASSWD="password you set"`:为 jupyter 设置登录密码,`password you set` 是自己设置的密码;
- > -v $PWD:/home/paddle 指定将当前路径(PWD 变量会展开为当前路径的绝对路径)挂载到容器内部的 /home/paddle 目录;
- > `` 指定需要使用的 image 名称,您可以通过`docker images`命令查看
+ - `-v $PWD:/home/paddle`:指定将当前路径(PWD 变量会展开为当前路径的绝对路径)挂载到容器内部的 /home/paddle 目录;
- * 使用 GPU 版本的 PaddlePaddle:
+ - `registry.baidubce.com/paddlepaddle/paddle:2.4.2-jupyter`:指定需要使用的 image 名称,您可以通过`docker images`命令查看
+* 使用 GPU 版本的 PaddlePaddle:
- ```
- nvidia-docker run --name [Name of container] -it -v $PWD:/paddle /bin/bash
- ```
+ ```
+ nvidia-docker run --name paddle_docker -it -v $PWD:/paddle registry.baidubce.com/paddlepaddle/paddle:2.4.2-gpu-cuda10.2-cudnn7.6-trt7.0 /bin/bash
+ ```
- > --name [Name of container] 设定 Docker 的名称;
+ - `--name paddle_docker`:设定 Docker 的名称,`paddle_docker` 是自己设置的名称;
- > -it 参数说明容器已和本机交互式运行;
+ - `-it`:参数说明容器已和本机交互式运行;
- > -v $PWD:/paddle 指定将当前路径(PWD 变量会展开为当前路径的绝对路径)挂载到容器内部的 /paddle 目录;
+ - `-v $PWD:/paddle`:指定将当前路径(PWD 变量会展开为当前路径的绝对路径)挂载到容器内部的 /paddle 目录;
- > `` 指定需要使用的 image 名称,您可以通过`docker images`命令查看;/bin/bash 是在 Docker 中要执行的命令
+ - `registry.baidubce.com/paddlepaddle/paddle:2.4.2-gpu-cuda10.2-cudnn7.6-trt7.0`:指定需要使用的 image 名称,如果您希望使用 CUDA 11.2 或 CUDA 11.7 的镜像,也可以将其替换成`registry.baidubce.com/paddlepaddle/paddle:2.4.2-gpu-cuda11.2-cudnn8.2-trt8.0` 或 `registry.baidubce.com/paddlepaddle/paddle:2.4.2-gpu-cuda11.7-cudnn8.4-trt8.4`。您可以通过`docker images`命令查看镜像。/bin/bash 是在 Docker 中要执行的命令
@@ -122,7 +134,7 @@
-### **镜像简介**
+## **镜像简介**
@@ -133,20 +145,24 @@
- registry.baidubce.com/paddlepaddle/paddle:2.1.0 |
- 安装了 2.1.0 版本 paddle 的 CPU 镜像 |
+ registry.baidubce.com/paddlepaddle/paddle:2.4.2 |
+ 安装了 2.4.2 版本 paddle 的 CPU 镜像 |
- registry.baidubce.com/paddlepaddle/paddle:2.1.0-jupyter |
- 安装了 2.1.0 版本 paddle 的 CPU 镜像,且镜像中预装好了 jupyter,启动 docker 即运行 jupyter 服务 |
+ registry.baidubce.com/paddlepaddle/paddle:2.4.2-jupyter |
+ 安装了 2.4.2 版本 paddle 的 CPU 镜像,且镜像中预装好了 jupyter,启动 docker 即运行 jupyter 服务 |
- registry.baidubce.com/paddlepaddle/paddle:2.1.0-gpu-cuda11.2-cudnn8 |
- 安装了 2.1.0 版本 paddle 的 GPU 镜像,cuda 版本为 11.2,cudnn 版本为 8.1 |
+ registry.baidubce.com/paddlepaddle/paddle:2.4.2-gpu-cuda11.7-cudnn8.4-trt8.4 |
+ 安装了 2.4.2 版本 paddle 的 GPU 镜像,cuda 版本为 11.7,cudnn 版本为 8.4,trt 版本为 8.4 |
-
- registry.baidubce.com/paddlepaddle/paddle:2.1.0-gpu-cuda10.2-cudnn7 |
- 安装了 2.1.0 版本 paddle 的 GPU 镜像,cuda 版本为 10.2,cudnn 版本为 7 |
+
+ registry.baidubce.com/paddlepaddle/paddle:2.4.2-gpu-cuda11.2-cudnn8.2-trt8.0 |
+ 安装了 2.4.2 版本 paddle 的 GPU 镜像,cuda 版本为 11.2,cudnn 版本为 8.2,trt 版本为 8.0 |
+
+
+ registry.baidubce.com/paddlepaddle/paddle:2.4.2-gpu-cuda10.2-cudnn7.6-trt7.0 |
+ 安装了 2.4.2 版本 paddle 的 GPU 镜像,cuda 版本为 10.2,cudnn 版本为 7.6,trt 版本为 7.0 |
@@ -154,22 +170,18 @@
您可以在 [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) 中找到 PaddlePaddle 的各个发行的版本的 docker 镜像。
-### 注意事项
-
-* 镜像中 Python 版本为 3.7
-
-### 补充说明
+## 补充说明
* 当您需要第二次进入 Docker 容器中,使用如下命令:
启动之前创建的容器
```
- docker start [Name of container]
+  docker start <Name of container>
```
进入启动的容器
```
- docker attach [Name of container]
+  docker attach <Name of container>
```
* 如您是 Docker 新手,您可以参考互联网上的资料学习,例如[Docker 教程](http://www.runoob.com/docker/docker-hello-world.html)
@@ -188,4 +200,4 @@
pip uninstall paddlepaddle-gpu
```
-或通过`docker rm [Name of container]`来直接删除 Docker 容器
+或通过`docker rm <Name of container>`来直接删除 Docker 容器
diff --git a/docs/install/docker/linux-docker_en.md b/docs/install/docker/linux-docker_en.md
index 1bb0141440c..7a33f057f75 100644
--- a/docs/install/docker/linux-docker_en.md
+++ b/docs/install/docker/linux-docker_en.md
@@ -1,128 +1,142 @@
# **Install on Linux via Docker**
-[Docker](https://docs.docker.com/install/) is an open source application container engine. Using docker, you can not only isolate the installation and use of paddlepaddle from the system environment, but also share GPU, network and other resources with the host
+[Docker](https://docs.docker.com/install/) is an open source application container engine. Using Docker, you can both isolate the installation and use of PaddlePaddle from the system environment and share GPU, network and other resources with the host.
+In the following installation and usage flow, the Docker image already contains a specific version of PaddlePaddle.
## Environment preparation
-- Currently supported system types, please see [Installation instruction](../index_en.html), please note that Docker is not currently supported in CentOS 6
+- Currently supported system types, please see [Installation instruction](/documentation/docs/en/install/index_en.html), please note that Docker is not currently supported in CentOS 6
-- On the local host [Install Docker](https://hub.docker.com/search/?type=edition&offering=community)
+- On the local host [Install Docker](https://docs.docker.com/engine/install/)
- To enable GPU support on Linux, please [Install nvidia-docker](https://github.com/NVIDIA/nvidia-docker)
+- Python version in the image is 3.7
+
## Installation steps
-1. Pull PaddlePaddle image
+### 1. Pull PaddlePaddle image
- * CPU version of PaddlePaddle:
- ```
- docker pull registry.baidubce.com/paddlepaddle/paddle:[version number]
- ```
+For users in mainland China, if the download from DockerHub is slow due to network problems, you can use the image registry provided by Baidu:
- * CPU version of PaddlePaddle, and the image is pre-installed with jupyter:
- ```
- docker pull registry.baidubce.com/paddlepaddle/paddle:[version number]-jupyter
- ```
+* CPU version of PaddlePaddle:
+ ```
+ docker pull registry.baidubce.com/paddlepaddle/paddle:2.4.2
+ ```
- * GPU version of PaddlePaddle:
- ```
- docker pull registry.baidubce.com/paddlepaddle/paddle:[version number]-gpu-cuda10.2-cudnn7
- ```
+* CPU version of PaddlePaddle, and the image is pre-installed with jupyter:
+ ```
+ docker pull registry.baidubce.com/paddlepaddle/paddle:2.4.2-jupyter
+ ```
- If your machine is not in mainland China, you can pull the image directly from DockerHub:
+* GPU version of PaddlePaddle:
+ ```
+ nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.4.2-gpu-cuda10.2-cudnn7.6-trt7.0
+ ```
+ ```
+ nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.4.2-gpu-cuda11.2-cudnn8.2-trt8.0
+ ```
+ ```
+ nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.4.2-gpu-cuda11.7-cudnn8.4-trt8.4
+ ```
- * CPU version of PaddlePaddle:
- ```
- docker pull paddlepaddle/paddle:[version number]
- ```
+If your machine is not in mainland China, you can pull the image directly from DockerHub:
- * CPU version of PaddlePaddle, and the image is pre-installed with jupyter:
- ```
- docker pull paddlepaddle/paddle:[version number]-jupyter
- ```
+* CPU version of PaddlePaddle:
+ ```
+ docker pull paddlepaddle/paddle:2.4.2
+ ```
- * GPU version of PaddlePaddle:
- ```
- docker pull paddlepaddle/paddle:[version number]-gpu-cuda10.2-cudnn7
- ```
+* CPU version of PaddlePaddle, and the image is pre-installed with jupyter:
+ ```
+ docker pull paddlepaddle/paddle:2.4.2-jupyter
+ ```
- After `:`, please fill in the PaddlePaddle version number, such as the current version `2.1.0`. For more details, please refer to [image profile](#dockers).
+* GPU version of PaddlePaddle:
+ ```
+ nvidia-docker pull paddlepaddle/paddle:2.4.2-gpu-cuda10.2-cudnn7.6-trt7.0
+ ```
+ ```
+ nvidia-docker pull paddlepaddle/paddle:2.4.2-gpu-cuda11.2-cudnn8.2-trt8.0
+ ```
+ ```
+ nvidia-docker pull paddlepaddle/paddle:2.4.2-gpu-cuda11.7-cudnn8.4-trt8.4
+ ```
- In the above example, `cuda10.2-cudnn7` is only for illustration, indicating that the GPU version of the image is installed. If you want to install another `cuda/cudnn` version of the image, you can replace it with `cuda11.2-cudnn8` etc.
+See [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) for more available images.
- You can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to get the image that matches your machine.
+### 2. Build and enter Docker container
-2. Build and enter Docker container
+* Use CPU version of PaddlePaddle:
- * Use CPU version of PaddlePaddle:
+ ```
+ docker run --name paddle_docker -it -v $PWD:/paddle registry.baidubce.com/paddlepaddle/paddle:2.4.2 /bin/bash
+ ```
- ```
- docker run --name [Name of container] -it -v $PWD:/paddle /bin/bash
- ```
+ - `--name paddle_docker`: sets the container name; `paddle_docker` is a name of your own choosing;
- > --name [Name of container] set name of Docker;
+ - `-it`: runs the container interactively, attached to the local terminal;
- > -it The parameter indicates that the container has been operated interactively with the local machine;
+ - `-v $PWD:/paddle`: Specifies to mount the current path of the host (PWD variable in Linux will expand to the absolute path of the current path) to the /paddle directory inside the container;
- > -v $PWD:/paddle specifies to mount the current path of the host (PWD variable in Linux will expand to the absolute path of the current path) to the /paddle directory inside the container;
+ - `registry.baidubce.com/paddlepaddle/paddle:2.4.2`: Specify the name of the image to be used. You can view local images through the `docker images` command. `/bin/bash` is the command to be executed in Docker
- > `` Specify the name of the image to be used. You can view it through the 'docker images' command. /bin/Bash is the command to be executed in Docker
+* Use GPU version of PaddlePaddle:
- * Use GPU version of PaddlePaddle:
+ ```
+ nvidia-docker run --name paddle_docker -it -v $PWD:/paddle registry.baidubce.com/paddlepaddle/paddle:2.4.2-gpu-cuda10.2-cudnn7.6-trt7.0 /bin/bash
+ ```
- ```
- nvidia-docker run --name [Name of container] -it -v $PWD:/paddle /bin/bash
- ```
+ - `--name paddle_docker`: sets the container name; `paddle_docker` is a name of your own choosing;
- > --name [Name of container] set name of Docker;
+ - `-it`: runs the container interactively, attached to the local terminal;
- > -it The parameter indicates that the container has been operated interactively with the local machine;
+ - `-v $PWD:/paddle`: Specifies to mount the current path of the host (PWD variable in Linux will expand to the absolute path of the current path) to the /paddle directory inside the container;
- > -v $PWD:/paddle specifies to mount the current path of the host (PWD variable in Linux will expand to the absolute path of the current path) to the /paddle directory inside the container;
+ - `registry.baidubce.com/paddlepaddle/paddle:2.4.2-gpu-cuda10.2-cudnn7.6-trt7.0`: Specify the name of the image to be used. You can view local images through the `docker images` command. `/bin/bash` is the command to be executed in Docker
- > `` Specify the name of the image to be used. You can view it through the 'docker images' command. /bin/Bash is the command to be executed in Docker
- * Use CPU version of PaddlePaddle:
+* Use CPU version of PaddlePaddle with jupyter:
- ```
- mkdir ./jupyter_docker
- ```
- ```
- chmod 777 ./jupyter_docker
- ```
- ```
- cd ./jupyter_docker
- ```
- ```
- docker run -p 80:80 --rm --env USER_PASSWD=[password you set] -v $PWD:/home/paddle
- ```
+ ```
+ mkdir ./jupyter_docker
+ ```
+ ```
+ chmod 777 ./jupyter_docker
+ ```
+ ```
+ cd ./jupyter_docker
+ ```
+ ```
+ docker run -p 80:80 --rm --env USER_PASSWD="password you set" -v $PWD:/home/paddle registry.baidubce.com/paddlepaddle/paddle:2.4.2-jupyter
+ ```
- > --rm Delete the container after closing it;
+ - `--rm`: Delete the container after closing it;
- > --env USER_PASSWD=[password you set] Set the login password for jupyter, [password you set] is the password you set;
+ - `--env USER_PASSWD="password you set"`: Set the login password for jupyter; replace `password you set` with a password of your own;
- > -v $PWD:/home/paddle Specifies to mount the current path (the PWD variable will be expanded to the absolute path of the current path) to the /home/paddle directory inside the container;
+ - `-v $PWD:/home/paddle`: Specifies to mount the current path (the PWD variable will be expanded to the absolute path of the current path) to the /home/paddle directory inside the container;
- > `` Specify the name of the image to be used, you can view it through the `docker images` command
+ - `registry.baidubce.com/paddlepaddle/paddle:2.4.2-jupyter`: Specify the name of the image to be used, you can view it through the `docker images` command
Now you have successfully used Docker to install PaddlePaddle. For more information about using Docker, see [Docker official documents](https://docs.docker.com)
-### **Introduction to mirror images**
+## **Introduction to mirror images**
@@ -133,20 +147,24 @@ Now you have successfully used Docker to install PaddlePaddle. For more informat
- registry.baidubce.com/paddlepaddle/paddle:2.1.0 |
- CPU image with 2.1.0 version of paddle installed |
+ registry.baidubce.com/paddlepaddle/paddle:2.4.2 |
+ CPU image with 2.4.2 version of paddle installed |
- registry.baidubce.com/paddlepaddle/paddle:2.1.0-jupyter |
- CPU image of paddle version 2.1.0 is installed, and jupyter is pre-installed in the image. Start the docker to run the jupyter service |
+ registry.baidubce.com/paddlepaddle/paddle:2.4.2-jupyter |
+ CPU image of paddle version 2.4.2 is installed, and jupyter is pre-installed in the image. Start the docker to run the jupyter service |
- registry.baidubce.com/paddlepaddle/paddle:2.1.0-gpu-cuda11.2-cudnn8 |
- GPU image of paddle version 2.1.0 is installed, cuda version is 11.2, cudnn version is 8.1 |
+ registry.baidubce.com/paddlepaddle/paddle:2.4.2-gpu-cuda11.7-cudnn8.4-trt8.4 |
+ GPU image of paddle version 2.4.2 is installed, cuda version is 11.7, cudnn version is 8.4, trt version is 8.4 |
-
- registry.baidubce.com/paddlepaddle/paddle:2.1.0-gpu-cuda10.2-cudnn7 |
- GPU image of paddle version 2.1.0 is installed, cuda version is 10.2, cudnn version is 7 |
+
+ registry.baidubce.com/paddlepaddle/paddle:2.4.2-gpu-cuda11.2-cudnn8.2-trt8.0 |
+ GPU image of paddle version 2.4.2 is installed, cuda version is 11.2, cudnn version is 8.2, trt version is 8.0 |
+
+
+ registry.baidubce.com/paddlepaddle/paddle:2.4.2-gpu-cuda10.2-cudnn7.6-trt7.0 |
+ GPU image of paddle version 2.4.2 is installed, cuda version is 10.2, cudnn version is 7.6, trt version is 7.0 |
@@ -155,22 +173,18 @@ Now you have successfully used Docker to install PaddlePaddle. For more informat
You can find the docker mirroring of the published versions of PaddlePaddle in [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/).
-### Note
-
-* Python version in the image is 3.7
-
-### 补充说明
+## Supplement
* When you need to enter the docker container for the second time, use the following command:
   Start the previously created container
```
- docker start [Name of container]
+  docker start <Name of container>
```
   Enter the started container
```
- docker attach [Name of container]
+  docker attach <Name of container>
```
* If you are a newcomer to Docker, you can refer to the materials on the Internet for learning, such as [Docker tutorial](http://www.runoob.com/docker/docker-hello-world.html)
@@ -189,4 +203,4 @@ After entering the Docker container, execute the following command:
pip uninstall paddlepaddle-gpu
```
-Or delete the docker container directly through `docker rm [Name of container]`
+Or delete the docker container directly through `docker rm <Name of container>`
diff --git a/docs/install/docker/macos-docker.md b/docs/install/docker/macos-docker.md
index b967f944cac..9ed0353b91d 100644
--- a/docs/install/docker/macos-docker.md
+++ b/docs/install/docker/macos-docker.md
@@ -1,85 +1,90 @@
-# **macOS 下的 Docker 安装**
+# **macOS 下的 Docker 安装**
-[Docker](https://docs.docker.com/install/)是一个开源的应用容器引擎。使用 Docker,既可以将 PaddlePaddle 的安装&使用与系统环境隔离,也可以与主机共享 GPU、网络等资源
+[Docker](https://docs.docker.com/install/)是一个开源的应用容器引擎。使用 Docker,既可以将 PaddlePaddle 的安装&使用与系统环境隔离,也可以与主机共享 GPU、网络等资源。
+以下 Docker 安装与使用流程中,docker 里已经安装好了特定版本的 PaddlePaddle。
## 环境准备
-- macOS 版本 10.11/10.12/10.13/10.14 (64 bit) (不支持 GPU 版本)
+- macOS 版本 10.x/11.x (64 bit) (不支持 GPU 版本)
-- 在本地主机上[安装 Docker](https://hub.docker.com/search/?type=edition&offering=community)
+- 在本地主机上[安装 Docker](https://docs.docker.com/engine/install/)
+
+- 镜像中 Python 版本为 3.7
## 安装步骤
-1. 拉取 PaddlePaddle 镜像
+### 1. 拉取 PaddlePaddle 镜像
- * CPU 版的 PaddlePaddle:
- ```
- docker pull registry.baidubce.com/paddlepaddle/paddle:[版本号]
- ```
+对于国内用户,因为网络问题下载镜像比较慢时,可使用百度提供的镜像源:
- * CPU 版的 PaddlePaddle,且镜像中预装好了 jupyter:
- ```
- docker pull registry.baidubce.com/paddlepaddle/paddle:[版本号]-jupyter
- ```
+* CPU 版的 PaddlePaddle:
+ ```
+ docker pull registry.baidubce.com/paddlepaddle/paddle:2.4.2
+ ```
- 如果您的机器不在中国大陆地区,可以直接从 DockerHub 拉取镜像:
+* CPU 版的 PaddlePaddle,且镜像中预装好了 jupyter:
+ ```
+ docker pull registry.baidubce.com/paddlepaddle/paddle:2.4.2-jupyter
+ ```
- * CPU 版的 PaddlePaddle:
- ```
- docker pull paddlepaddle/paddle:[版本号]
- ```
+如果您的机器不在中国大陆地区,可以直接从 DockerHub 拉取镜像:
- * CPU 版的 PaddlePaddle,且镜像中预装好了 jupyter:
- ```
- docker pull paddlepaddle/paddle:[版本号]-jupyter
- ```
+* CPU 版的 PaddlePaddle:
+ ```
+ docker pull paddlepaddle/paddle:2.4.2
+ ```
- 在`:`后请您填写 PaddlePaddle 版本号,您可以访问[DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/)获取与您机器适配的镜像。
+* CPU 版的 PaddlePaddle,且镜像中预装好了 jupyter:
+ ```
+ docker pull paddlepaddle/paddle:2.4.2-jupyter
+ ```
-2. 构建、进入 Docker 容器
+您还可以访问[DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/)获取更多镜像。
- * 使用 CPU 版本的 PaddlePaddle:
+### 2. 构建并进入 docker 容器
+* 使用 CPU 版本的 PaddlePaddle:
- ```
- docker run --name [Name of container] -it -v $PWD:/paddle /bin/bash
- ```
- > --name [Name of container] 设定 Docker 的名称;
+ ```
+ docker run --name paddle_docker -it -v $PWD:/paddle registry.baidubce.com/paddlepaddle/paddle:2.4.2 /bin/bash
+ ```
+
+ - `--name paddle_docker`:设定 Docker 的名称,`paddle_docker` 是自己设置的名称;
- > -it 参数说明容器已和本机交互式运行;
+ - `-it`:参数说明容器已和本机交互式运行;
- > -v $PWD:/paddle 指定将当前路径(PWD 变量会展开为当前路径的绝对路径)挂载到容器内部的 /paddle 目录;
+ - `-v $PWD:/paddle`:指定将当前路径(PWD 变量会展开为当前路径的绝对路径)挂载到容器内部的 /paddle 目录;
- > `` 指定需要使用的 image 名称,您可以通过`docker images`命令查看;/bin/bash 是在 Docker 中要执行的命令
+ - `registry.baidubce.com/paddlepaddle/paddle:2.4.2`:指定需要使用的 image 名称,您可以通过`docker images`命令查看;/bin/bash 是在 Docker 中要执行的命令
- * 使用 CPU 版本的 PaddlePaddle,且镜像中预装好了 jupyter:
+* 使用 CPU 版本的 PaddlePaddle,且镜像中预装好了 jupyter:
- ```
- mkdir ./jupyter_docker
- ```
- ```
- chmod 777 ./jupyter_docker
- ```
- ```
- cd ./jupyter_docker
- ```
- ```
- docker run -p 80:80 --rm --env USER_PASSWD=[password you set] -v $PWD:/home/paddle
- ```
+ ```
+ mkdir ./jupyter_docker
+ ```
+ ```
+ chmod 777 ./jupyter_docker
+ ```
+ ```
+ cd ./jupyter_docker
+ ```
+ ```
+ docker run -p 80:80 --rm --env USER_PASSWD="password you set" -v $PWD:/home/paddle registry.baidubce.com/paddlepaddle/paddle:2.4.2-jupyter
+ ```
- > --rm 关闭容器后删除容器;
+ - `--rm`:关闭容器后删除容器;
- > --env USER_PASSWD=[password you set] 为 jupyter 设置登录密码,[password you set] 是自己设置的密码;
+ - `--env USER_PASSWD="password you set"`:为 jupyter 设置登录密码,`password you set` 是自己设置的密码;
- > -v $PWD:/home/paddle 指定将当前路径(PWD 变量会展开为当前路径的绝对路径)挂载到容器内部的 /home/paddle 目录;
+ - `-v $PWD:/home/paddle`:指定将当前路径(PWD 变量会展开为当前路径的绝对路径)挂载到容器内部的 /home/paddle 目录;
- > `` 指定需要使用的 image 名称,您可以通过`docker images`命令查看
+ - `registry.baidubce.com/paddlepaddle/paddle:2.4.2-jupyter`:指定需要使用的 image 名称,您可以通过`docker images`命令查看
@@ -88,7 +93,7 @@
-### **镜像简介**
+## **镜像简介**
@@ -99,12 +104,12 @@
- registry.baidubce.com/paddlepaddle/paddle:2.1.0 |
- 安装了 2.1.0 版本 paddle 的 CPU 镜像 |
+ registry.baidubce.com/paddlepaddle/paddle:2.4.2 |
+ 安装了 2.4.2 版本 paddle 的 CPU 镜像 |
- registry.baidubce.com/paddlepaddle/paddle:2.1.0-jupyter |
- 安装了 2.1.0 版本 paddle 的 CPU 镜像,且镜像中预装好了 jupyter,启动 docker 即运行 jupyter 服务 |
+ registry.baidubce.com/paddlepaddle/paddle:2.4.2-jupyter |
+ 安装了 2.4.2 版本 paddle 的 CPU 镜像,且镜像中预装好了 jupyter,启动 docker 即运行 jupyter 服务 |
@@ -113,22 +118,18 @@
您可以在 [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) 中找到 PaddlePaddle 的各个发行的版本的 docker 镜像。
-### 注意事项
-
-* 镜像中 Python 版本为 3.7
-
-### 补充说明
+## 补充说明
* 当您需要第二次进入 Docker 容器中,使用如下命令:
启动之前创建的容器
```
- docker start [Name of container]
+  docker start <Name of container>
```
进入启动的容器
```
- docker attach [Name of container]
+  docker attach <Name of container>
```
* 如您是 Docker 新手,您可以参考互联网上的资料学习,例如[Docker 教程](http://www.runoob.com/docker/docker-hello-world.html)
@@ -142,4 +143,4 @@
pip uninstall paddlepaddle
```
-或通过`docker rm [Name of container]`来直接删除 Docker 容器
+或通过`docker rm <Name of container>`来直接删除 Docker 容器
diff --git a/docs/install/docker/macos-docker_en.md b/docs/install/docker/macos-docker_en.md
index fcbe3eff72d..c70543ccd91 100644
--- a/docs/install/docker/macos-docker_en.md
+++ b/docs/install/docker/macos-docker_en.md
@@ -1,85 +1,92 @@
-# **Install on macOS via Docker**
+# **Install on macOS via Docker**
-[Docker](https://docs.docker.com/install/) is an open source application container engine. Using docker, you can not only isolate the installation and use of paddlepaddle from the system environment, but also share GPU, network and other resources with the host
+[Docker](https://docs.docker.com/install/) is an open source application container engine. Using Docker, you can both isolate the installation and use of PaddlePaddle from the system environment and share GPU, network and other resources with the host.
+In the following installation and usage flow, the Docker image already contains a specific version of PaddlePaddle.
## Environment preparation
-- macOS version 10.11/10.12/10.13/10.14 (64 bit)(not support GPU version)
+- macOS version 10.x/11.x (64 bit) (GPU version not supported)
-- On the local host [Install Docker](https://hub.docker.com/search/?type=edition&offering=community)
+- On the local host [Install Docker](https://docs.docker.com/engine/install/)
+
+- Python version in the image is 3.7
## Installation steps
-1. Pull PaddlePaddle image
+### 1. Pull PaddlePaddle image
+
+For users in mainland China, if the download from DockerHub is slow due to network problems, you can use the image registry provided by Baidu:
+
+* CPU version of PaddlePaddle:
+ ```
+ docker pull registry.baidubce.com/paddlepaddle/paddle:2.4.2
+ ```
- * CPU version of PaddlePaddle:
- ```
- docker pull registry.baidubce.com/paddlepaddle/paddle:[version number]
- ```
+* CPU version of PaddlePaddle, and the image is pre-installed with jupyter:
+ ```
+ docker pull registry.baidubce.com/paddlepaddle/paddle:2.4.2-jupyter
+ ```
- * CPU version of PaddlePaddle, and the image is pre-installed with jupyter:
- ```
- docker pull registry.baidubce.com/paddlepaddle/paddle:[version number]-jupyter
- ```
+If your machine is not in mainland China, you can pull the image directly from DockerHub:
- If your machine is not in mainland China, you can pull the image directly from DockerHub:
+* CPU version of PaddlePaddle:
+ ```
+ docker pull paddlepaddle/paddle:2.4.2
+ ```
- * CPU version of PaddlePaddle:
- ```
- docker pull paddlepaddle/paddle:[version number]
- ```
+* CPU version of PaddlePaddle, and the image is pre-installed with jupyter:
+ ```
+ docker pull paddlepaddle/paddle:2.4.2-jupyter
+ ```
- * CPU version of PaddlePaddle, and the image is pre-installed with jupyter:
- ```
- docker pull paddlepaddle/paddle:[version number]-jupyter
+See [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) for more images.
- After `:`please fill in the PaddlePaddle version number, you can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to get the image that matches your machine.
+### 2. Build and enter Docker container
-2. Build and enter Docker container
+* Use CPU version of PaddlePaddle:
- * Use CPU version of PaddlePaddle:
+ ```
+ docker run --name paddle_docker -it -v $PWD:/paddle registry.baidubce.com/paddlepaddle/paddle:2.4.2 /bin/bash
+ ```
- ```
- docker run --name [Name of container] -it -v $PWD:/paddle /bin/bash
- ```
+ - `--name paddle_docker`: set the name of the container; `paddle_docker` is the container name you choose;
- > --name [Name of container] set name of Docker;
+ - `-it`: run the container interactively and attach a terminal;
- > -it The parameter indicates that the container has been operated interactively with the local machine;
+ - `-v $PWD:/paddle`: mount the current directory of the host (on Linux the PWD variable expands to the absolute path of the current directory) to the /paddle directory inside the container;
- > -v $PWD:/paddle specifies to mount the current path of the host (PWD variable will expand to the absolute path of the current path) to the /paddle directory inside the container;
+ - `registry.baidubce.com/paddlepaddle/paddle:2.4.2`: the name of the image to use; you can list local images with the `docker images` command. `/bin/bash` is the command to execute inside the container
- > `` Specify the name of the image to be used. You can view it through the 'docker images' command. /bin/Bash is the command to be executed in Docker
- * Use CPU version of PaddlePaddle:
+* Use CPU version of PaddlePaddle with jupyter:
- ```
- mkdir ./jupyter_docker
- ```
- ```
- chmod 777 ./jupyter_docker
- ```
- ```
- cd ./jupyter_docker
- ```
- ```
- docker run -p 80:80 --rm --env USER_PASSWD=[password you set] -v $PWD:/home/paddle
- ```
+ ```
+ mkdir ./jupyter_docker
+ ```
+ ```
+ chmod 777 ./jupyter_docker
+ ```
+ ```
+ cd ./jupyter_docker
+ ```
+ ```
+ docker run -p 80:80 --rm --env USER_PASSWD="password you set" -v $PWD:/home/paddle registry.baidubce.com/paddlepaddle/paddle:2.4.2-jupyter
+ ```
- > --rm Delete the container after closing it;
+ - `--rm`: remove the container automatically after it exits;
- > --env USER_PASSWD=[password you set] Set the login password for jupyter, [password you set] is the password you set;
+ - `--env USER_PASSWD="password you set"`: set the jupyter login password to the value you choose;
- > -v $PWD:/home/paddle Specifies to mount the current path (the PWD variable will be expanded to the absolute path of the current path) to the /home/paddle directory inside the container;
+ - `-v $PWD:/home/paddle`: mount the current directory of the host (the PWD variable expands to its absolute path) to the /home/paddle directory inside the container;
- > `` Specify the name of the image to be used, you can view it through the `docker images` command
+ - `registry.baidubce.com/paddlepaddle/paddle:2.4.2-jupyter`: the name of the image to use; you can list local images with the `docker images` command
@@ -87,7 +94,7 @@ Now you have successfully used Docker to install PaddlePaddle. For more informat
-### **Introduction to mirror images**
+## **Introduction to images**
@@ -98,12 +105,12 @@ Now you have successfully used Docker to install PaddlePaddle. For more informat
- registry.baidubce.com/paddlepaddle/paddle:2.1.0 |
- CPU image with 2.1.0 version of paddle installed |
+ registry.baidubce.com/paddlepaddle/paddle:2.4.2 |
+ CPU image with 2.4.2 version of paddle installed |
- registry.baidubce.com/paddlepaddle/paddle:2.1.0-jupyter |
- CPU image of paddle version 2.1.0 is installed, and jupyter is pre-installed in the image. Start the docker to run the jupyter service |
+ registry.baidubce.com/paddlepaddle/paddle:2.4.2-jupyter |
+ CPU image of paddle version 2.4.2 is installed, and jupyter is pre-installed in the image. Start the docker to run the jupyter service |
@@ -111,23 +118,18 @@ Now you have successfully used Docker to install PaddlePaddle. For more informat
You can find the docker mirroring of the published versions of PaddlePaddle in [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/).
-
-### Note
-
-* Python version in the image is 3.7
-
-### 补充说明
+## Supplement
* When you need to enter the docker container for the second time, use the following command:
  Start the previously created container
```
- docker start [Name of container]
+ docker start <Name of container>
```
  Enter the started container
```
- docker attach [Name of container]
+ docker attach <Name of container>
```
* If you are new to Docker, you can learn from online materials such as this [Docker tutorial](http://www.runoob.com/docker/docker-hello-world.html)
@@ -141,4 +143,4 @@ After entering the Docker container, execute the following command:
pip uninstall paddlepaddle
```
-Or delete the docker container directly through `docker rm [Name of container]`
+Or delete the docker container directly through `docker rm <Name of container>`
diff --git a/docs/install/index_cn.rst b/docs/install/index_cn.rst
index 2df915dbda2..93c1fcbc104 100644
--- a/docs/install/index_cn.rst
+++ b/docs/install/index_cn.rst
@@ -5,18 +5,6 @@
=========
------------
- 重要更新
------------
-
-* 新增对 python3.9 的支持,并不再支持 python2.7 和 python3.5
-* 新增对 CUDA 11.2 的支持,并不再支持 CUDA 9.0、CUDA 10.0 和 CUDA 11.0
-* 新增对 ROCm 平台的支持(2.1 中飞桨对 ROCm 平台的支持是 experimental 的)
-* Linux 系统相关的包已被拆分为 avx 和 noavx 两种类型的包(大部分机器都使用 avx 指令集,可使用 `Linux 下的 PIP 安装 `_ 页面中的命令查看您的机器是否支持)
-* 新增预装好 jupyter 的 CPU 镜像,启动镜像后即启动 jupyter 服务
-* 新增支持 Windows Visual Studio 2017 编译,由 VS2015 全面升级至 VS2017
-
-
-----------
安装说明
-----------
@@ -26,7 +14,7 @@
**1. 操作系统要求:**
* Windows 7 / 8 / 10,专业版 / 企业版
-* Ubuntu 16.04 / 18.04
+* Ubuntu 16.04 / 18.04 / 20.04 / 22.04
* CentOS 7
* MacOS 10.11 / 10.12 / 10.13 / 10.14
* 操作系统要求是 64 位版本
@@ -34,18 +22,18 @@
**2. 处理器要求**
* 处理器支持 MKL
-* 处理器架构是 x86_64(或称作 x64、Intel 64、AMD64)架构,目前 PaddlePaddle 不支持 arm64 架构
+* 处理器架构是 x86_64(或称作 x64、Intel 64、AMD64)架构,目前 PaddlePaddle 不支持 arm64 架构(Mac M1 除外,Paddle 已支持 Mac M1 芯片)
**3. Python 和 pip 版本要求:**
-* Python 的版本要求 3.6/3.7/3.8/3.9
+* Python 的版本要求 3.6/3.7/3.8/3.9/3.10
* Python 具有 pip, 且 pip 的版本要求 20.2.2+
* Python 和 pip 要求是 64 位版本
**4. PaddlePaddle 对 GPU 支持情况:**
* 目前 **PaddlePaddle** 支持 **NVIDIA** 显卡的 **CUDA** 驱动和 **AMD** 显卡的 **ROCm** 架构
-* 需要安装 `cuDNN `_ ,版本要求 7.6+(For CUDA10.1/10.2)
+* 需要安装 `cuDNN `_ ,版本要求 7.6(For CUDA10.2)
* 如果您需要 GPU 多卡模式,需要安装 `NCCL 2 `_
* 仅 Ubuntu/CentOS 支持 NCCL 2 技术
@@ -53,23 +41,22 @@
* Windows 安装 GPU 版本
- * Windows 7/8/10 支持 CUDA 10.1/10.2/11.2 单卡模式
+ * Windows 7/8/10 支持 CUDA 10.2/11.2/11.6/11.7 单卡模式
* 不支持 **nvidia-docker** 方式安装
* Ubuntu 安装 GPU 版本
- * Ubuntu 16.04 支持 CUDA 10.1/10.2/11.2
- * Ubuntu 18.04 支持 CUDA 10.1/10.2/11.2
- * 如果您是使用 **nvidia-docker** 安装,支持 CUDA 10.2/11.2
+ * Ubuntu 16.04/18.04/20.04/22.04 支持 CUDA 10.2/11.2/11.6/11.7
+ * 如果您是使用 **nvidia-docker** 安装,支持 CUDA 10.2/11.2/11.7
* CentOS 安装 GPU 版本
* 如果您是使用本机 **pip** 安装:
- * CentOS 7 支持 CUDA 10.1/10.2/11.2
+ * CentOS 7 支持 CUDA 10.2/11.2/11.6/11.7
* 如果您是使用本机源码编译安装:
- * CentOS 7 支持 CUDA 10.1/10.2/11.2
+ * CentOS 7 支持 CUDA 10.2/11.2/11.6/11.7
* CentOS 6 不推荐,不提供编译出现问题时的官方支持
- * 如果您是使用 **nvidia-docker** 安装,在 CentOS 7 下支持 CUDA 10.2/11.2
+ * 如果您是使用 **nvidia-docker** 安装,在 CentOS 7 下支持 CUDA 10.2/11.2/11.7
* MacOS 不支持:MacOS 平台不支持 GPU 安装
请确保您的环境满足以上条件。如您有其他需求,请参考 `多版本 whl 包安装列表 `_ .
@@ -81,18 +68,15 @@
* 不支持 NCCL
* Ubuntu 支持情况
- * Ubuntu 16.04:
-
- * CUDA10.1 下支持 NCCL v2.4.2-v2.4.8
- * Ubuntu 18.04:
+ * Ubuntu 16.04/18.04/20.04/22.04:
- * CUDA10.1 下支持 NCCL v2.4.2-v2.4.8
+ * 支持 NCCL v2.7.8 及更高版本
* CentOS 支持情况
* CentOS 6:不支持 NCCL
* CentOS 7:
- * CUDA10.1 下支持 NCCL v2.4.2-v2.4.8
+ * 支持 NCCL v2.7.8 及更高版本
* MacOS 支持情况
* 不支持 NCCL
@@ -126,7 +110,7 @@
4. 检查 Python 的版本
- 使用以下命令确认是 3.6/3.7/3.8/3.9
+ 使用以下命令确认是 3.6/3.7/3.8/3.9/3.10
::
python --version
@@ -139,7 +123,7 @@
python -m pip --version
-6. 确认 Python 和 pip 是 64 bit,并且处理器架构是 x86_64(或称作 x64、Intel 64、AMD64)架构,目前 PaddlePaddle 不支持 arm64 架构。下面的第一行输出的是 "64bit" ,第二行输出的是 "x86_64" 、 "x64" 或 "AMD64" 即可:
+6. 确认 Python 和 pip 是 64 bit,并且处理器架构是 x86_64(或称作 x64、Intel 64、AMD64)架构,目前 PaddlePaddle 不支持 arm64 架构(Mac M1 除外,Paddle 已支持 Mac M1 芯片)。下面的第一行输出的是 "64bit" ,第二行输出的是 "x86_64" 、 "x64" 或 "AMD64" 即可:
::
@@ -153,11 +137,11 @@
安装 CPU 版本的命令为:
::
- python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+ python -m pip install paddlepaddle==2.4.2 -i https://mirror.baidu.com/pypi/simple
或
- python -m pip install paddlepaddle -i https://pypi.tuna.tsinghua.edu.cn/simple
+ python -m pip install paddlepaddle==2.4.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
(2). **GPU 版本** :如果您想使用 GPU 版本请参考如下命令安装
@@ -169,11 +153,11 @@
请注意用以下指令安装的 PaddlePaddle 在 Windows、Ubuntu、CentOS 下只支持 CUDA10.2:
::
- python -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+ python -m pip install paddlepaddle-gpu==2.4.2 -i https://mirror.baidu.com/pypi/simple
或
- python -m pip install paddlepaddle-gpu -i https://pypi.tuna.tsinghua.edu.cn/simple
+ python -m pip install paddlepaddle-gpu==2.4.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
请确认需要安装 PaddlePaddle 的 Python 是您预期的位置,因为您计算机可能有多个 Python。根据您的环境您可能需要将说明中所有命令行中的 python 替换为具体的 Python 路径。
@@ -199,12 +183,14 @@
- 如果您有开发 PaddlePaddle 的需求,请参考:`从源码编译 `_
-.. toctree::
+.. toctree::
:hidden:
pip/frompip.rst
+ conda/fromconda.rst
+ docker/fromdocker.rst
compile/fromsource.rst
install_Kunlun_zh.md
install_ROCM_zh.md
- instalL_NGC_PaddlePaddle_ch.rst
+ install_NGC_PaddlePaddle_ch.rst
Tables.md
diff --git a/docs/install/index_en.rst b/docs/install/index_en.rst
index 412b8d4040b..82ad6edbdd2 100644
--- a/docs/install/index_en.rst
+++ b/docs/install/index_en.rst
@@ -5,17 +5,6 @@
=======================
-----------------------
- Important updates
-----------------------
-
-* Add support for python3.9, and no longer supports python2.7 and python3.5
-* Add support for CUDA 11.2, and no longer supports CUDA 9.0, CUDA 10.0 and CUDA 11.0
-* Add support for ROCm platform (2.1 Paddle's support for ROCm platform is experimental)
-* Linux system-related packages have been split into two types of packages, avx and noavx (Most machines use the avx instruction set. You can check whether your machine supports it through commands on the `PIP installation under Linux `_ page )
-* Add a CPU image with jupyter pre-installed. Jupyter service will be started after starting the image
-* Added support for Windows Visual Studio 2017 compilation, fully upgraded from VS2015 to VS2017
-
------------------------
Installation Manuals
@@ -28,7 +17,7 @@ The manuals will guide you to build and install PaddlePaddle on your 64-bit desk
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
* Windows 7 / 8 / 10, Pro/Enterprise
-* Ubuntu 16.04 / 18.04
+* Ubuntu 16.04 / 18.04 / 20.04 / 22.04
* CentOS 7
* MacOS 10.11 / 10.12 / 10.13 / 10.14
* 64-bit operating system is required
@@ -42,7 +31,7 @@ The manuals will guide you to build and install PaddlePaddle on your 64-bit desk
3. Version requirements of python and pip:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
-* Python requires version 3.6/3.7/3.8/3.9
+* Python requires version 3.6/3.7/3.8/3.9/3.10
* Python needs pip, and pip requires version 20.2.2 or above
* Python and pip requires 64-bit
@@ -50,7 +39,7 @@ The manuals will guide you to build and install PaddlePaddle on your 64-bit desk
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
* Currently, **PaddlePaddle** supports **CUDA** driver of **NVIDIA** graphics card and **ROCm** driver of **AMD** card.
-* You need to install `cuDNN `_ , and version 7.6+ is required(For CUDA10.1/10.2)
+* You need to install `cuDNN `_ , and version 7.6 is required (For CUDA10.2)
* If you need GPU multi-card mode, you need to install `NCCL 2 `_
* Only Ubuntu/CentOS support NCCL 2
@@ -58,22 +47,21 @@ The manuals will guide you to build and install PaddlePaddle on your 64-bit desk
* Windows install GPU version
- * Windows 7 / 8 / 10 support CUDA 10.1/10.2/11.2 single-card mode, but don't support CUDA 9.1/9.2/10.1
+ * Windows 7 / 8 / 10 support CUDA 10.2/11.2/11.6/11.7 single-card mode
* don't support install using **nvidia-docker**
* Ubuntu install GPU version
- * Ubuntu 16.04 supports CUDA 10.1/10.2/11.2
- * Ubuntu 18.04 supports CUDA 10.1/10.2/11.2
- * If you install using **nvidia-docker** , it supports CUDA 10.2/11.2
+ * Ubuntu 16.04 / 18.04 / 20.04 / 22.04 supports CUDA 10.2/11.2/11.6/11.7
+ * If you install using **nvidia-docker** , it supports CUDA 10.2/11.2/11.7
* CentOS install GPU version
* If you install using native **pip** :
- * CentOS 7 supports CUDA 10.1/10.2/11.2
+ * CentOS 7 supports CUDA 10.2/11.2/11.6/11.7
* If you compile and install using native source code:
- * CentOS 7 supports CUDA 10.1/10.2/11.2
- * If you install using **nvidia-docker** , CentOS 7 supports CUDA 10.2/11.2
+ * CentOS 7 supports CUDA 10.2/11.2/11.6/11.7
+ * If you install using **nvidia-docker** , CentOS 7 supports CUDA 10.2/11.2/11.7
* MacOS isn't supported: PaddlePaddle has no GPU support in Mac OS platform
Please make sure your environment meets the above conditions. If you have other requirements, please refer to `Appendix `_ .
@@ -86,12 +74,9 @@ Please make sure your environment meets the above conditions. If you have other
* not support NCCL
* Support for Ubuntu
- * Ubuntu 16.04:
+ * Ubuntu 16.04 / 18.04 / 20.04 / 22.04:
-   * support NCCL v2.4.2-v2.4.8 under CUDA10.1
+   * support NCCL v2.7.8 and later
- * Ubuntu 18.04:
-
- * support v2.4.2-v2.4.8 under CUDA10.1
* Support for CentOS
* CentOS 6: not support NCCL
@@ -133,7 +118,7 @@ This section describes how to use pip to install.
4. Check the version of Python
- Confirm the Python is 3.6/3.7/3.8/3.9 using command
+ Confirm the Python is 3.6/3.7/3.8/3.9/3.10 using command
::
python --version
@@ -160,11 +145,11 @@ This section describes how to use pip to install.
Command to install CPU version is:
::
- python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+ python -m pip install paddlepaddle==2.4.2 -i https://mirror.baidu.com/pypi/simple
or
- python -m pip install paddlepaddle -i https://pypi.tuna.tsinghua.edu.cn/simple
+ python -m pip install paddlepaddle==2.4.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
(2). **GPU version** : If you only want to install GPU version, please refer to command below
@@ -177,11 +162,11 @@ This section describes how to use pip to install.
Please note that PaddlePaddle installed through the command below only supports CUDA10.2 under Windows, Ubuntu and CentOS:
::
- python -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+ python -m pip install paddlepaddle-gpu==2.4.2 -i https://mirror.baidu.com/pypi/simple
or
- python -m pip install paddlepaddle-gpu -i https://pypi.tuna.tsinghua.edu.cn/simple
+ python -m pip install paddlepaddle-gpu==2.4.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
Please confirm that the Python where you need to install PaddlePaddle is your expected location, because your computer may have multiple Python. Depending on the environment, you may need to replace Python in all command lines in the instructions with Python 3 or specific Python path.
@@ -208,10 +193,12 @@ The second way to install: compile and install with source code
- If you only use PaddlePaddle, we suggest you use the **pip** installation method.
- If you need to develop PaddlePaddle, please refer to `compile from source code `_
-.. toctree::
+.. toctree::
:hidden:
pip/frompip_en.rst
+ conda/fromconda_en.rst
+ docker/fromdocker_en.rst
compile/fromsource_en.rst
install_Kunlun_en.md
install_NGC_PaddlePaddle_en.rst
diff --git a/docs/install/install_NGC_PaddlePaddle_ch.rst b/docs/install/install_NGC_PaddlePaddle_ch.rst
new file mode 100644
index 00000000000..0621c2a873d
--- /dev/null
+++ b/docs/install/install_NGC_PaddlePaddle_ch.rst
@@ -0,0 +1,110 @@
+.. _install_NGC_PaddlePaddle_container introduction:
+
+================================
+NGC 飞桨容器安装指南
+================================
+
+----------------------
+ 整体介绍
+----------------------
+
+NGC 飞桨容器针对 NVIDIA GPU 加速进行了优化,并包含一组经过验证的库,可启用和优化 NVIDIA GPU 性能。此容器还可能包含对 PaddlePaddle 源代码的修改,以最大限度地提高性能和兼容性。此容器还包含用于加速 ETL(`DALI `_、`RAPIDS `_)、训练(`cuDNN `_、`NCCL `_)和推理(`TensorRT `_)工作负载的软件。
+
+----------------------
+ 环境准备
+----------------------
+
+使用 NGC 飞桨容器需要主机系统安装以下内容:
+
+* `Docker 引擎 `_
+
+* `NVIDIA GPU 驱动程序 `_
+
+* `NVIDIA 容器工具包 `_
+
+有关支持的版本,请参阅 `NVIDIA 框架容器支持矩阵 `_ 和 `NVIDIA 容器工具包文档 `_。
+
+不需要其他安装、编译或依赖管理。 无需安装 NVIDIA CUDA Toolkit。
+
+----------------------
+ 安装步骤
+----------------------
+
+要运行容器,请按照 NVIDIA Containers For Deep Learning Frameworks User's Guide 中 `Running A Container `_ 一章中的说明发出适当的命令,并指定注册表、存储库和标签。 有关使用 NGC 的更多信息,请参阅 NGC 容器用户指南。
+如果您有 Docker 19.03 或更高版本,启动容器的典型命令是:
+
+ ::
+
+ docker run --gpus all --shm-size=1g --ulimit memlock=-1 -it --rm nvcr.io/nvidia/paddlepaddle:22.07-py3
+
+
+如果您有 Docker 19.02 或更早版本,启动容器的典型命令是:
+
+ ::
+
+ nvidia-docker run --shm-size=1g --ulimit memlock=-1 -it --rm nvcr.io/nvidia/paddlepaddle:22.07-py3
+
+
+
+其中:
+
+* 22.07 是容器版本。
+
+PaddlePaddle 通过将其作为 Python 模块导入来运行:
+
+ ::
+
+ $ python -c 'import paddle; paddle.utils.run_check()'
+ Running verify PaddlePaddle program ...
+ W0516 06:36:54.208734 442 device_context.cc:451] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 11.7, Runtime API Version: 11.7
+ W0516 06:36:54.212574 442 device_context.cc:469] device: 0, cuDNN Version: 8.4.
+ PaddlePaddle works well on 1 GPU.
+ W0516 06:37:12.706600 442 fuse_all_reduce_op_pass.cc:76] Find all_reduce operators: 2. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 2.
+ PaddlePaddle works well on 8 GPUs.
+ PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
+
+有关入门和自定义 PaddlePaddle 映像的信息,请参阅容器内的 /workspace/README.md。
+
+您可能希望从容器外部的位置提取数据和模型描述以供 PaddlePaddle 使用。 为此,最简单的方法是将一个或多个主机目录挂载为 `Docker 绑定挂载 `_。 例如:
+
+ ::
+
+ docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/paddlepaddle:22.07-py3
+
+
+注意:为了在进程之间共享数据,NCCL 可能需要共享系统内存用于 IPC 和固定(页面锁定)系统内存资源。操作系统对这些资源的限制可能需要相应增加,有关详细信息,请参阅系统文档。特别是,Docker 容器默认使用有限的共享和固定内存资源。在容器内使用 NCCL 时,建议您在 docker run 命令中加入以下参数来增加这些资源:
+
+ ::
+
+ --shm-size=1g --ulimit memlock=-1
+
+
+----------------------
+ NGC 容器介绍
+----------------------
+
+有关内容的完整列表,请参阅 `NGC 飞桨容器发行说明 `_。
+此容器映像包含 NVIDIA 版 PaddlePaddle 的完整源代码,位于 /opt/paddle/paddle。它是作为系统 Python 模块预构建和安装的。
+NVIDIA PaddlePaddle 容器针对与 NVIDIA GPU 一起使用进行了优化,并包含以下用于 GPU 加速的软件:
+
+* `CUDA `_
+
+* `cuBLAS `_
+
+* `NVIDIA cuDNN `_
+
+* `NVIDIA NCCL `_ (optimized for `NVLink `_ )
+
+* `NVIDIA Data Loading Library (DALI) `_
+
+* `TensorRT `__
+
+* `PaddlePaddle with TensorRT (Paddle-TRT) `_
+
+此容器中的软件堆栈已经过兼容性验证,不需要最终用户进行任何额外的安装或编译。此容器可以帮助您从端到端加速深度学习工作流程。
+
+
+--------------------------------------------
+ NGC 飞桨容器软件许可协议
+--------------------------------------------
+
+当您下载或使用 NGC 飞桨容器时,即表示您已经同意并接受此 `最终用户许可协议 `_ 的条款及其对应约束。
diff --git a/docs/install/install_NGC_PaddlePaddle_en.rst b/docs/install/install_NGC_PaddlePaddle_en.rst
index 9ee73559770..95f3c746ffc 100644
--- a/docs/install/install_NGC_PaddlePaddle_en.rst
+++ b/docs/install/install_NGC_PaddlePaddle_en.rst
@@ -16,11 +16,11 @@ The PaddlePaddle NGC Container is optimized for GPU acceleration, and contains a
Using the PaddlePaddle NGC Container requires the host system to have the following installed:
-* `Docker Engine `_
+* `Docker Engine `_
-* `NVIDIA GPU Drivers `_
+* `NVIDIA GPU Drivers `_
-* `NVIDIA Container Toolkit `_
+* `NVIDIA Container Toolkit `_
For supported versions, see the `Framework Containers Support Matrix `_ and the `NVIDIA Container Toolkit Documentation `_ .
@@ -50,7 +50,7 @@ If you have Docker 19.02 or earlier, a typical command to launch the container i
Where:
-* 22.07 is the container version.
+* 22.07 is the container version.
PaddlePaddle is run by importing it as a Python module:
@@ -96,19 +96,19 @@ This container image contains the complete source of the NVIDIA version of Paddl
The NVIDIA PaddlePaddle Container is optimized for use with NVIDIA GPUs, and contains the following software for GPU acceleration:
-* `CUDA `_
+* `CUDA `_
-* `cuBLAS `_
+* `cuBLAS `_
-* `NVIDIA cuDNN `_
+* `NVIDIA cuDNN `_
-* `NVIDIA NCCL `_ (optimized for `NVLink `_ )
+* `NVIDIA NCCL `_ (optimized for `NVLink `_ )
-* `NVIDIA Data Loading Library (DALI) `_
+* `NVIDIA Data Loading Library (DALI) `_
-* `TensorRT `__
+* `TensorRT `__
-* `PaddlePaddle with TensorRT (Paddle-TRT) `_
+* `PaddlePaddle with TensorRT (Paddle-TRT) `_
The software stack in this container has been validated for compatibility, and does not require any additional installation or compilation from the end user. This container can help accelerate your deep learning workflow from end to end.
diff --git a/docs/install/install_script.md b/docs/install/install_script.md
index 72300a97a7e..8ef978d944c 100644
--- a/docs/install/install_script.md
+++ b/docs/install/install_script.md
@@ -8,17 +8,17 @@
脚本会执行以下几步:
-1. GPU 检测
+1. GPU 检测
- 检测您的机器是否含有我们支持的 GPU,如果有,会安装 GPU 版本的 PaddlePaddle,否则会安装 CPU 版本。
- (PaddlePaddle 目前支持 NVIDIA[官网](https://developer.nvidia.com/cuda-gpus#collapseOne)列出的,算力 7.0 以下的 GPU 和 v100 系列的 GPU)
+ 检测您的机器是否含有我们支持的 GPU,如果有,会安装 GPU 版本的 PaddlePaddle,否则会安装 CPU 版本。
+ (PaddlePaddle 目前支持 NVIDIA[官网](https://developer.nvidia.com/cuda-gpus#collapseOne)列出的,算力 7.0 以下的 GPU 和 v100 系列的 GPU)
2. CUDA,cuDNN 检测
- 检测您的机器是否安装我们支持的 CUDA,cuDNN,具体地:
+ 检测您的机器是否安装我们支持的 CUDA,cuDNN,具体地:
- 1. 在`/usr/local/` 及其子目录下寻找 `cuda10.1/cuda10.2/cuda11.0/cuda11.2` 目录下的`version.txt`文件(通常如果您以默认方式安装了 CUDA)。 如果提示未找到 CUDA 请使用命令`find / -name version.txt`找到您所需要的 CUDA 目录下的“version.txt”路径,然后按照提示输入。
- 2. 在`/usr` 及其子目录下寻找文件 `cudnn.h` , 如果您的 cuDNN 未安装在默认路径请使用命令`find / -name cudnn.h`寻找您希望使用的 cuDNN 版本的`cudnn.h`路径并按提示输入
+ 1. 在`/usr/local/` 及其子目录下寻找 `cuda10.1/cuda10.2/cuda11.0/cuda11.2/cuda11.6/cuda11.7` 目录下的`version.txt`文件(通常如果您以默认方式安装了 CUDA)。 如果提示未找到 CUDA 请使用命令`find / -name version.txt`找到您所需要的 CUDA 目录下的“version.txt”路径,然后按照提示输入。
+ 2. 在`/usr` 及其子目录下寻找文件 `cudnn.h` , 如果您的 cuDNN 未安装在默认路径请使用命令`find / -name cudnn.h`寻找您希望使用的 cuDNN 版本的`cudnn.h`路径并按提示输入
如果未找到相应文件,则会安装 CPU 版本的 PaddlePaddle
@@ -39,14 +39,14 @@
以上检查完成后就会为您安装对应您系统的 PaddlePaddle 了,安装一般需要 1~2 分钟会根据您的网络来决定,请您耐心等待。
-### macOS
+### MacOS
脚本会执行以下几步:
1. 选择 PaddlePaddle 版本
我们为您提供 2 种版本:开发版和稳定版,推荐您选择测试验证过的稳定版
-2. 检查 Python 版本
-由于 macOS 自带的 Python 通常依赖于系统环境,因此我们不支持 macOS 自带的 Python 环境,请重新从 Python.org 安装 Python,然后根据提示输入您希望使用的 Python 的路径
+2. 检查 Python 版本
+由于 MacOS 自带的 Python 通常依赖于系统环境,因此我们不支持 MacOS 自带的 Python 环境,请重新从 Python.org 安装 Python,然后根据提示输入您希望使用的 Python 的路径
3. 检查是否支持 [AVX](https://zh.wikipedia.org/zh-hans/AVX指令集) 指令集
diff --git a/docs/install/pip/frompip.rst b/docs/install/pip/frompip.rst
index 931460df602..c6e9e3c2a2b 100644
--- a/docs/install/pip/frompip.rst
+++ b/docs/install/pip/frompip.rst
@@ -2,7 +2,7 @@
**Pip 安装**
===========================
-.. toctree::
+.. toctree::
:maxdepth: 1
linux-pip.md
diff --git a/docs/install/pip/frompip_en.rst b/docs/install/pip/frompip_en.rst
index 7706c500279..273c7c9ab96 100644
--- a/docs/install/pip/frompip_en.rst
+++ b/docs/install/pip/frompip_en.rst
@@ -2,7 +2,7 @@
**Install via pip**
==============================
-.. toctree::
+.. toctree::
linux-pip_en.md
diff --git a/docs/install/pip/linux-pip.md b/docs/install/pip/linux-pip.md
index 4300319363f..eb3c030cc9a 100644
--- a/docs/install/pip/linux-pip.md
+++ b/docs/install/pip/linux-pip.md
@@ -1,20 +1,10 @@
# Linux 下的 PIP 安装
-## 一、环境准备
-
-### 1.1 目前飞桨支持的环境
-
-* **Linux 版本 (64 bit)**
-
- * **CentOS 7 (GPU 版本支持 CUDA 10.1/10.2/11.2)**
- * **Ubuntu 16.04 (GPU 版本支持 CUDA 10.1/10.2/11.1/11.2)**
- * **Ubuntu 18.04 (GPU 版本支持 CUDA 10.1/10.2/11.1/11.2)**
+[The Python Package Index(PyPI)](https://pypi.org/)是 Python 的软件包索引。本文档为您介绍使用 pip 从 PyPI 安装飞桨的方式,飞桨提供的 PyPI 安装包支持分布式训练(多机多卡)和 TensorRT 推理功能。
-* **Python 版本 3.6/3.7/3.8/3.9 (64 bit)**
-
-* **pip 或 pip3 版本 20.2.2 或更高版本 (64 bit)**
+## 一、环境准备
-### 1.2 如何查看您的环境
+### 1.1 如何查看您的环境
* 可以使用以下命令查看本机的操作系统和位数信息:
@@ -26,67 +16,71 @@
* 确认需要安装 PaddlePaddle 的 Python 是您预期的位置,因为您计算机可能有多个 Python
- * 根据您的环境您可能需要将说明中所有命令行中的 python 替换为具体的 Python 路径
+ * 根据您的环境您可能需要将说明中所有命令行中的 python3 替换为具体的 Python 路径
```
- which python
+ which python3
```
* 需要确认 python 的版本是否满足要求
- * 使用以下命令确认是 3.6/3.7/3.8/3.9
+ * 使用以下命令确认是 3.6/3.7/3.8/3.9/3.10
- python --version
+ python3 --version
* 需要确认 pip 的版本是否满足要求,要求 pip 版本为 20.2.2 或更高版本
```
- python -m ensurepip
+ python3 -m ensurepip
```
```
- python -m pip --version
+ python3 -m pip --version
```
-* 需要确认 Python 和 pip 是 64bit,并且处理器架构是 x86_64(或称作 x64、Intel 64、AMD64)架构,目前 PaddlePaddle 不支持 arm64 架构。下面的第一行输出的是"64bit",第二行输出的是"x86_64"、"x64"或"AMD64"即可:
+* 需要确认 Python 和 pip 是 64bit,并且处理器架构是 x86_64(或称作 x64、Intel 64、AMD64)架构。下面的第一行输出的是"64bit",第二行输出的是"x86_64"、"x64"或"AMD64"即可:
```
- python -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
+ python3 -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
```
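上面的单行命令也可以写成一段更完整的检查脚本(仅为示意,脚本名称与输出注释均为假设,以实际环境输出为准):

```python
import platform
import sys

# 位数:期望输出 "64bit"
print(platform.architecture()[0])
# 处理器架构:期望输出 "x86_64"、"x64" 或 "AMD64"
print(platform.machine())
# Python 版本:Paddle 2.4 要求 3.6~3.10
print("{}.{}".format(sys.version_info.major, sys.version_info.minor))
```

将三项检查放在一个脚本里,便于一次性确认环境是否满足要求。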
* 默认提供的安装包需要计算机支持 MKL
-* 如果您对机器环境不了解,请下载使用[快速安装脚本](https://fast-install.bj.bcebos.com/fast_install.sh),配套说明请参考[这里](https://github.com/PaddlePaddle/docs/blob/develop/docs/install/install_script.md)。
+* 如果您对机器环境不了解,请下载使用[快速安装脚本](https://fast-install.bj.bcebos.com/fast_install.sh),配套说明请参考[这里](https://github.com/PaddlePaddle/FluidDoc/tree/develop/doc/fluid/install/install_script.md)。
## 二、开始安装
-本文档为您介绍 pip 安装方式
-
-### 首先请您选择您的版本
+### 首先请选择您的版本
* 如果您的计算机没有 NVIDIA® GPU,请安装[CPU 版的 PaddlePaddle](#cpu)
-* 如果您的计算机有 NVIDIA® GPU,请确保满足以下条件并且安装[GPU 版 PaddlePaddle](#gpu)
+* 如果您的计算机有 NVIDIA® GPU,请确保满足以下条件并且安装[GPU 版 PaddlePaddle](#gpu),依赖库环境版本要求如下:
+
+ * **CUDA 工具包 10.2 配合 cuDNN v7.6.5, 如需使用 PaddleTensorRT 推理,需配合 TensorRT7.0.0.11**
- * **CUDA 工具包 10.1/10.2 配合 cuDNN v7.6+(如需多卡支持,需配合 NCCL2.7 及更高)**
+ * **CUDA 工具包 11.2 配合 cuDNN v8.2.1, 如需使用 PaddleTensorRT 推理,需配合 TensorRT8.0.3.4**
- * **CUDA 工具包 11.2 配合 cuDNN v8.1.1(如需多卡支持,需配合 NCCL2.7 及更高)**
+ * **CUDA 工具包 11.6 配合 cuDNN v8.4.0, 如需使用 PaddleTensorRT 推理,需配合 TensorRT8.4.0.6**
+
+ * **CUDA 工具包 11.7 配合 cuDNN v8.4.1, 如需使用 PaddleTensorRT 推理,需配合 TensorRT8.4.2.4**
+
+ * **如需使用分布式多卡环境,需配合 NCCL>=2.7**
* **GPU 运算能力超过 3.5 的硬件设备**
- 您可参考 NVIDIA 官方文档了解 CUDA 和 CUDNN 的安装流程和配置方法,请见[CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/),[cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/)
+ 您可参考 NVIDIA 官方文档了解 CUDA、cuDNN 和 TensorRT 的安装流程和配置方法,请见[CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/),[cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/),[TensorRT](https://developer.nvidia.com/tensorrt)
-* 如果您需要使用多卡环境请确保您已经正确安装 nccl2,或者按照以下指令安装 nccl2(这里提供的是 CUDA9,cuDNN7 下 nccl2 的安装指令,更多版本的安装信息请参考 NVIDIA[官方网站](https://developer.nvidia.com/nccl)):
+* 如果您需要使用多卡环境请确保您已经正确安装 nccl2,或者按照以下指令安装 nccl2(这里提供的是 CUDA10.2,cuDNN7 下 nccl2 的安装指令,更多版本的安装信息请参考 NVIDIA[官方网站](https://developer.nvidia.com/nccl)):
- * **CentOS 系统可以参考以下命令**
+ * **CentOS 系统可以参考以下命令**
wget http://developer.download.nvidia.com/compute/machine-learning/repos/rhel7/x86_64/nvidia-machine-learning-repo-rhel7-1.0.0-1.x86_64.rpm
@@ -99,7 +93,7 @@
```
```
- yum install -y libnccl-2.3.7-2+cuda9.0 libnccl-devel-2.3.7-2+cuda9.0 libnccl-static-2.3.7-2+cuda9.0
+ yum install -y libnccl-2.7.8-1+cuda10.2 libnccl-devel-2.7.8-1+cuda10.2 libnccl-static-2.7.8-1+cuda10.2
```
* **Ubuntu 系统可以参考以下命令**
@@ -113,74 +107,105 @@
```
```
- sudo apt-get install -y libnccl2=2.3.7-1+cuda9.0 libnccl-dev=2.3.7-1+cuda9.0
+ sudo apt install -y libnccl2=2.7.8-1+cuda10.2 libnccl-dev=2.7.8-1+cuda10.2
```
-#### 2.1 CPU 版的 PaddlePaddle
+#### 2.1 CPU 版的 PaddlePaddle
```
- python -m pip install paddlepaddle==0.0.0 -f https://www.paddlepaddle.org.cn/whl/linux/cpu-mkl/develop.html
+ python3 -m pip install paddlepaddle==2.4.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
-#### 2.2 GPU 版的 PaddlePaddle
+#### 2.2 GPU 版的 PaddlePaddle
-2.2.1 CUDA10.1 的 PaddlePaddle
+2.2.1 CUDA10.2 的 PaddlePaddle
+
```
- python -m pip install paddlepaddle-gpu==0.0.0.post101 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html
+ python3 -m pip install paddlepaddle-gpu==2.4.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
+2.2.2 CUDA11.2 的 PaddlePaddle
+
+
+ ```
+ python3 -m pip install paddlepaddle-gpu==2.4.2.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
+ ```
+
-2.2.2 CUDA10.2 的 PaddlePaddle
+2.2.3 CUDA11.6 的 PaddlePaddle
```
- python -m pip install paddlepaddle-gpu==0.0.0.post102 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html
+ python3 -m pip install paddlepaddle-gpu==2.4.2.post116 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
```
-2.2.3 CUDA11.0 的 PaddlePaddle
+2.2.4 CUDA11.7 的 PaddlePaddle
```
- python -m pip install paddlepaddle-gpu==0.0.0.post110 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html
+ python3 -m pip install paddlepaddle-gpu==2.4.2.post117 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
```
-2.2.4 CUDA11.1 的 PaddlePaddle
+注:
+
+* 如果您使用的是安培(Ampere)架构的 GPU,推荐使用 CUDA 11 及以上版本;如果您使用的是非安培架构的 GPU,推荐使用 CUDA 10.2,性能更优。
+
+* 飞桨对主流的各个 Python 版本均提供了对应的安装包,而您的环境中可能有多个 Python,请确认您想使用的 Python 版本并下载对应的 paddlepaddle 安装包。例如您想使用 python3.7 的环境,则安装命令为 `python3.7 -m pip install paddlepaddle==2.4.2`。
+
+* 如果您需要使用清华源,可以通过以下命令
```
- python -m pip install paddlepaddle-gpu==0.0.0.post111 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html
+ python3 -m pip install paddlepaddle-gpu==[版本号] -i https://pypi.tuna.tsinghua.edu.cn/simple
```
+* 上述命令默认安装`avx`的包。如果您的机器不支持`avx`,需要安装`noavx`的 Paddle 包。判断您的机器是否支持`avx`,可以输入以下命令,如果输出中包含`avx`,则表示机器支持`avx`
+ ```
+ cat /proc/cpuinfo | grep -i avx
+ ```
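也可以用一段 Python 小脚本做同样的判断(仅为示意,假设系统为 Linux 且存在 /proc/cpuinfo;非 Linux 系统上该函数会保守地返回 False):

```python
def cpu_supports_avx(cpuinfo_path="/proc/cpuinfo"):
    # 读取 CPU 标志位,检查其中是否包含 avx(仅适用于 Linux)
    try:
        with open(cpuinfo_path) as f:
            return "avx" in f.read().lower()
    except OSError:
        # 无法读取 /proc/cpuinfo(例如非 Linux 系统)时返回 False
        return False

print(cpu_supports_avx())
```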
-2.2.5 CUDA11.2 的 PaddlePaddle
+ 首先使用如下命令将 wheel 包下载到本地:
+ * cpu、mkl 版本 noavx 机器安装:
```
- python -m pip install paddlepaddle-gpu==0.0.0.post112 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html
+ python3 -m pip download paddlepaddle==2.4.2 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/noavx/stable.html --no-index --no-deps
```
+ * cpu、openblas 版本 noavx 机器安装:
+ ```
+ python3 -m pip download paddlepaddle==2.4.2 -f https://www.paddlepaddle.org.cn/whl/linux/openblas/noavx/stable.html --no-index --no-deps
+ ```
-注:
-* 如果你使用的是安培架构的 GPU,推荐使用 CUDA11.2。如果你使用的是非安培架构的 GPU,推荐使用 CUDA10.2,性能更优。请参考: [GPU 架构对照表](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#nvidia-gpu)
+ * gpu 版本 cuda10.2 noavx 机器安装:
+
+ ```
+ python3 -m pip download paddlepaddle-gpu==2.4.2 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/noavx/stable.html --no-index --no-deps
+ ```
-* 请确认需要安装 PaddlePaddle 的 Python 是您预期的位置,因为您计算机可能有多个 Python。根据您的环境您可能需要将说明中所有命令行中的 python 替换为 python3 或者替换为具体的 Python 路径。
+ 再使用`python3 -m pip install [name].whl`本地安装([name]为 wheel 包名称)。
+* 如果你想安装`avx`、`openblas`的 Paddle 包,可以通过以下命令将 wheel 包下载到本地,再使用`python3 -m pip install [name].whl`本地安装([name]为 wheel 包名称):
+
+ ```
+ python3 -m pip download paddlepaddle==2.4.2 -f https://www.paddlepaddle.org.cn/whl/linux/openblas/avx/stable.html --no-index --no-deps
+ ```
## **三、验证安装**
-安装完成后您可以使用 `python` 或 `python3` 进入 python 解释器,输入`import paddle` ,再输入
+安装完成后您可以使用 `python3` 进入 Python 解释器,输入 `import paddle`,再输入
`paddle.utils.run_check()`
如果出现`PaddlePaddle is installed successfully!`,说明您已成功安装。
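上述验证步骤也可以写成一个小脚本(仅为示意;脚本先检查 paddle 模块是否已安装,避免在未安装的环境中直接报错):

```python
import importlib.util

# 先确认 paddle 模块是否已安装,再运行官方自检
if importlib.util.find_spec("paddle") is not None:
    import paddle
    paddle.utils.run_check()
else:
    print("paddle is not installed")
```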
@@ -190,6 +215,6 @@
请使用以下命令卸载 PaddlePaddle:
-* **CPU 版本的 PaddlePaddle**: `python -m pip uninstall paddlepaddle`
+* **CPU 版本的 PaddlePaddle**: `python3 -m pip uninstall paddlepaddle`
-* **GPU 版本的 PaddlePaddle**: `python -m pip uninstall paddlepaddle-gpu`
+* **GPU 版本的 PaddlePaddle**: `python3 -m pip uninstall paddlepaddle-gpu`
diff --git a/docs/install/pip/linux-pip_en.md b/docs/install/pip/linux-pip_en.md
index 1478aeb5d0e..74b9ec447cd 100644
--- a/docs/install/pip/linux-pip_en.md
+++ b/docs/install/pip/linux-pip_en.md
@@ -2,18 +2,7 @@
## Environmental preparation
-### 1.1 PREQUISITES
-
-* **Linux Version (64 bit)**
- * **CentOS 7 (GPUVersion Supports CUDA 10.1/10.2/11.2**)**
- * **Ubuntu 16.04 (GPUVersion Supports CUDA 10.1/10.2/11.2)**
- * **Ubuntu 18.04 (GPUVersion Supports CUDA 10.1/10.2/11.2)**
-
-* **Python Version: 3.6/3.7/3.8/3.9 (64 bit)**
-
-* **pip or pip3 Version 20.2.2 or above (64 bit)**
-
-### 1.2 How to check your environment
+### 1.1 How to check your environment
* You can use the following commands to view the local operating system and bit information
@@ -28,65 +17,67 @@
* Use the following command to output Python path. Depending on the environment, you may need to replace Python in all command lines in the description with specific Python path
```
- which python
+ which python3
```
* You need to confirm whether the version of Python meets the requirements
- * Use the following command to confirm that it is 3.6/3.7/3.8/3.9
+ * Use the following command to confirm that it is 3.6/3.7/3.8/3.9/3.10
- python --version
+ python3 --version
* It is required to confirm whether the version of pip meets the requirements. The version of pip is required to be 20.2.2 or above
```
- python -m ensurepip
+ python3 -m ensurepip
```
```
- python -m pip --version
+ python3 -m pip --version
```
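As a supplementary sketch (not part of the official steps; the function name `pip_meets_minimum` is illustrative), the minimum-version requirement above can also be checked programmatically with the standard library:

```python
import importlib.metadata

def pip_meets_minimum(minimum=(20, 2, 2)):
    """Return True if the installed pip is at or above the documented minimum."""
    version = importlib.metadata.version("pip")  # e.g. "23.0.1"
    parts = tuple(int(p) for p in version.split(".")[:3])
    # Pad short versions like "23.0" so the tuple comparison is well-defined.
    parts = parts + (0,) * (3 - len(parts))
    return parts >= minimum

print(pip_meets_minimum())
```

This mirrors the `python3 -m pip --version` check without parsing its output by hand.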
-* You need to confirm that Python and pip are 64bit, and the processor architecture is x86_64(or called x64、Intel 64、AMD64). Currently, paddlepaddle does not support arm64 architecture. The first line below outputs "64bit", and the second line outputs "x86_64", "x64" or "AMD64"
+* You need to confirm that Python and pip are 64bit, and the processor architecture is x86_64 (also called x64, Intel 64, or AMD64). The first line below outputs "64bit", and the second line outputs "x86_64", "x64" or "AMD64"
```
- python -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
+ python3 -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
```
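As a sketch of what the one-liner above reports, the same check can be written as a short standard-library script:

```python
import platform

# Interpreter bitness, e.g. "64bit", and machine architecture,
# e.g. "x86_64", "x64" or "AMD64".
bits = platform.architecture()[0]
arch = platform.machine()
print(bits, arch)
```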
* The installation package provided by default requires computer support for MKL
-* If you do not know the machine environment, please download and use[Quick install script](https://fast-install.bj.bcebos.com/fast_install.sh), for instructions please refer to[here](https://github.com/PaddlePaddle/docs/blob/develop/docs/install/install_script.md)。
+* If you do not know the machine environment, please download and use the [Quick install script](https://fast-install.bj.bcebos.com/fast_install.sh); for instructions please refer to [here](https://github.com/PaddlePaddle/docs/blob/develop/docs/install/install_script.md).
## INSTALLATION
-If you installed Python via Homebrew or the Python website, `pip` was installed with it. If you installed Python 3.x, then you will be using the command `pip3`.
-
### Choose CPU/GPU
* If your computer doesn't have NVIDIA® GPU, please install [the CPU Version of PaddlePaddle](#cpu)
* If your computer has NVIDIA® GPU, please make sure that the following conditions are met and install [the GPU Version of PaddlePaddle](#gpu)
- * **CUDA toolkit 10.1/10.2 with cuDNN v7.6+(for multi card support, NCCL2.7 or higher)**
+ * **CUDA toolkit 10.2 with cuDNN v7.6.5 (for multi card support, NCCL2.7 or higher; for PaddleTensorRT deployment, TensorRT7.0.0.11)**
- * **CUDA toolkit 11.2 with cuDNN v8.1.1(for multi card support, NCCL2.7 or higher)**
+ * **CUDA toolkit 11.2 with cuDNN v8.2.1 (for multi card support, NCCL2.7 or higher; for PaddleTensorRT deployment, TensorRT8.0.3.4)**
+
+ * **CUDA toolkit 11.6 with cuDNN v8.4.0 (for multi card support, NCCL2.7 or higher; for PaddleTensorRT deployment, TensorRT8.4.0.6)**
+
+ * **CUDA toolkit 11.7 with cuDNN v8.4.1 (for multi card support, NCCL2.7 or higher; for PaddleTensorRT deployment, TensorRT8.4.2.4)**
* **Hardware devices with GPU computing power over 3.5**
- You can refer to NVIDIA official documents for installation process and configuration method of CUDA and cudnn. Please refer to [CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/),[cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/)
+ You can refer to NVIDIA official documents for the installation process and configuration method of CUDA, cuDNN and TensorRT. Please refer to [CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/), [cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/), [TensorRT](https://developer.nvidia.com/tensorrt)
-* If you need to use a multi-card environment, please make sure that you have installed nccl2 correctly, or install nccl2 according to the following instructions (here are the installation instructions of nccl2 under CUDA9 and cuDNN7. For more version installation information, please refer to NVIDIA [Official Website](https://developer.nvidia.com/nccl)):
+* If you need to use a multi-card environment, please make sure that you have installed nccl2 correctly, or install nccl2 according to the following instructions (here are the installation instructions of nccl2 under CUDA10.2 and cuDNN7. For more version installation information, please refer to NVIDIA [Official Website](https://developer.nvidia.com/nccl)):
- * **CentOS system can refer to the following commands**
+ * **CentOS system can refer to the following commands**
wget http://developer.download.nvidia.com/compute/machine-learning/repos/rhel7/x86_64/nvidia-machine-learning-repo-rhel7-1.0.0-1.x86_64.rpm
@@ -99,7 +90,7 @@ If you installed Python via Homebrew or the Python website, `pip` was installed
```
```
- yum install -y libnccl-2.3.7-2+cuda9.0 libnccl-devel-2.3.7-2+cuda9.0 libnccl-static-2.3.7-2+cuda9.0
+ yum install -y libnccl-2.7.8-1+cuda10.2 libnccl-devel-2.7.8-1+cuda10.2 libnccl-static-2.7.8-1+cuda10.2
```
* **Ubuntu system can refer to the following commands**
@@ -113,7 +104,7 @@ If you installed Python via Homebrew or the Python website, `pip` was installed
```
```
- sudo apt-get install -y libnccl2=2.3.7-1+cuda9.0 libnccl-dev=2.3.7-1+cuda9.0
+ sudo apt install -y libnccl2=2.7.8-1+cuda10.2 libnccl-dev=2.7.8-1+cuda10.2
```
@@ -124,73 +115,100 @@ You can choose the following version of PaddlePaddle to start installation:
-#### 2.1 CPU Versoion of PaddlePaddle
+#### 2.1 CPU Version of PaddlePaddle
```
- python -m pip install paddlepaddle==0.0.0 -f https://www.paddlepaddle.org.cn/whl/linux/cpu-mkl/develop.html
+ python3 -m pip install paddlepaddle==2.4.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
-#### 2.2 GPU Version of PaddlePaddle
+#### 2.2 GPU Version of PaddlePaddle
-2.2.1 If you are using CUDA 10.1
+2.2.1 If you are using CUDA 10.2
```
- python -m pip install paddlepaddle-gpu==0.0.0.post101 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html
+ python3 -m pip install paddlepaddle-gpu==2.4.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
-
-2.2.2 If you are using CUDA 10.2
+2.2.2 If you are using CUDA 11.2
```
- python -m pip install paddlepaddle-gpu==0.0.0.post102 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html
+ python3 -m pip install paddlepaddle-gpu==2.4.2.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
```
-2.2.3 If you are using CUDA 11.0
+2.2.3 If you are using CUDA 11.6
```
- python -m pip install paddlepaddle-gpu==0.0.0.post110 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html
+ python3 -m pip install paddlepaddle-gpu==2.4.2.post116 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
```
-
-2.2.4 If you are using CUDA 11.1
+2.2.4 If you are using CUDA 11.7
```
- python -m pip install paddlepaddle-gpu==0.0.0.post111 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html
+ python3 -m pip install paddlepaddle-gpu==2.4.2.post117 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
```
+Note:
+* If you are using an Ampere-based GPU, CUDA 11 or above is recommended; otherwise CUDA 10.2 is recommended for better performance.
-2.2.5 If you are using CUDA 11.2
+* Please confirm that the Python where you want to install PaddlePaddle is in your expected location, because your computer may have multiple Pythons. Depending on the environment, you may need to replace python3 in all command lines in the instructions with the specific Python path.
+* If you want to use the Tsinghua PyPI mirror, you can use the following command:
```
- python -m pip install paddlepaddle-gpu==0.0.0.post112 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html
+ python3 -m pip install paddlepaddle-gpu==[Version] -i https://pypi.tuna.tsinghua.edu.cn/simple
```
+* The above commands install the `avx` package by default. If your machine does not support `avx`, you need to install the `noavx` Paddle package (the noavx wheel only supports python3.8). You can install it as follows:
+ First use the following command to download the wheel package locally, and then use `python3 -m pip install [name].whl` to install it ([name] is the name of the wheel package):
+
+ * cpu, mkl version for noavx machines:
+
+ ```
+ python3 -m pip download paddlepaddle==2.4.2 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/noavx/stable.html --no-index --no-deps
+ ```
+
+ * cpu, openblas version for noavx machines:
+
+ ```
+ python3 -m pip download paddlepaddle==2.4.2 -f https://www.paddlepaddle.org.cn/whl/linux/openblas/noavx/stable.html --no-index --no-deps
+ ```
-Note:
-* If you are using ampere-based GPU, CUDA 11.2 is recommended; otherwise CUDA 10.2 is recommended for better performance. please refer to: [GPU architecture comparison table](https://www.paddlepaddle.org.cn/documentation/docs/en/install/Tables.html#nvidia-gpu)
+ * gpu cuda10.2 version for noavx machines:
-* Please confirm that the Python where you need to install PaddlePaddle is your expected location, because your computer may have multiple Python. Depending on the environment, you may need to replace Python in all command lines in the instructions with Python 3 or specific Python path.
+ ```
+ python3 -m pip download paddlepaddle-gpu==2.4.2 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/noavx/stable.html --no-index --no-deps
+ ```
+
+ To determine whether your machine supports `avx`, you can use the following command. If the output contains `avx`, it means that the machine supports `avx`:
+ ```
+ cat /proc/cpuinfo | grep -i avx
+ ```
+
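As a supplementary sketch (Linux-only, assuming `/proc/cpuinfo` is readable; the function name `cpu_supports_avx` is illustrative), the same AVX check can be done from Python:

```python
# Sketch: detect AVX support on Linux by scanning the CPU flags in /proc/cpuinfo.
def cpu_supports_avx(cpuinfo_path="/proc/cpuinfo"):
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                if line.lower().startswith("flags"):
                    # The flags line lists one token per CPU feature, e.g. "avx", "avx2".
                    return "avx" in line.lower().split()
    except OSError:
        pass
    return False

print("avx supported" if cpu_supports_avx() else "avx not supported")
```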
+* If you want to install the Paddle package with `avx` and `openblas`, you can use the following command to download the wheel package locally, and then use `python3 -m pip install [name].whl` to install it ([name] is the name of the wheel package):
+
+ ```
+ python3 -m pip download paddlepaddle==2.4.2 -f https://www.paddlepaddle.org.cn/whl/linux/openblas/avx/stable.html --no-index --no-deps
+ ```
## Verify installation
-After the installation is complete, you can use `python` or `python3` to enter the Python interpreter and then use `import paddle` and `paddle.utils.run_check()`
+After the installation is complete, you can use `python3` to enter the Python interpreter and then use `import paddle` and `paddle.utils.run_check()`
If `PaddlePaddle is installed successfully!` appears, to verify that the installation was successful.
@@ -198,5 +216,5 @@ If `PaddlePaddle is installed successfully!` appears, to verify that the install
Please use the following command to uninstall PaddlePaddle:
-- ***CPU version of PaddlePaddle\***: `python -m pip uninstall paddlepaddle`
-- ***GPU version of PaddlePaddle\***: `python -m pip uninstall paddlepaddle-gpu`
+- **CPU version of PaddlePaddle**: `python3 -m pip uninstall paddlepaddle`
+- **GPU version of PaddlePaddle**: `python3 -m pip uninstall paddlepaddle-gpu`
diff --git a/docs/install/pip/macos-pip.md b/docs/install/pip/macos-pip.md
index 849d08b03ff..44f1fa59904 100644
--- a/docs/install/pip/macos-pip.md
+++ b/docs/install/pip/macos-pip.md
@@ -1,17 +1,10 @@
-# macOS 下的 PIP 安装
+# macOS 下的 PIP 安装
-## 一、环境准备
-
-### 1.1 目前飞桨支持的环境
-
-* **macOS 版本 10.11/10.12/10.13/10.14 (64 bit) (不支持 GPU 版本)**
-
-* **Python 版本 3.6/3.7/3.8/3.9 (64 bit)**
-
-* **pip 或 pip3 版本 20.2.2 或更高版本 (64 bit)**
+[The Python Package Index (PyPI)](https://pypi.org/) 是 Python 的软件包索引。本文档为你介绍 PyPI 安装方式,飞桨提供的 PyPI 安装包支持 TensorRT 推理功能。
+## 一、环境准备
-### 1.2 如何查看您的环境
+### 1.1 如何查看您的环境
* 可以使用以下命令查看本机的操作系统和位数信息:
@@ -23,7 +16,7 @@
* 确认需要安装 PaddlePaddle 的 Python 是您预期的位置,因为您计算机可能有多个 Python
- * 使用以下命令输出 Python 路径,根据的环境您可能需要将说明中所有命令行中的 python 替换为具体的 Python 路径
+ * 使用以下命令输出 Python 路径,根据您的环境您可能需要将说明中所有命令行中的 python3 替换为具体的 Python 路径
```
which python
@@ -33,46 +26,42 @@
* 需要确认 python 的版本是否满足要求
- * 使用以下命令确认是 3.6/3.7/3.8/3.9
+ * 使用以下命令确认是 3.6/3.7/3.8/3.9/3.10
```
- python --version
+ python3 --version
```
* 需要确认 pip 的版本是否满足要求,要求 pip 版本为 20.2.2 或更高版本
```
- python -m ensurepip
+ python3 -m ensurepip
```
```
- python -m pip --version
+ python3 -m pip --version
```
-* 需要确认 Python 和 pip 是 64bit,并且处理器架构是 x86_64(或称作 x64、Intel 64、AMD64)架构,目前 PaddlePaddle 不支持 arm64 架构。下面的第一行输出的是"64bit",第二行输出的是"x86_64"、"x64"或"AMD64"即可:
+* 需要确认 Python 和 pip 是 64bit,并且处理器架构是 x86_64(或称作 x64、Intel 64、AMD64)架构或 arm64 架构(paddle 已原生支持 Mac M1 芯片):
```
- python -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
+ python3 -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
```
-* 默认提供的安装包需要计算机支持 MKL
-
-* 如果您对机器环境不了解,请下载使用[快速安装脚本](https://fast-install.bj.bcebos.com/fast_install.sh),配套说明请参考[这里](https://github.com/PaddlePaddle/docs/blob/develop/docs/install/install_script.md)。
+* 如果您对机器环境不了解,请下载使用[快速安装脚本](https://fast-install.bj.bcebos.com/fast_install.sh),配套说明请参考[这里](https://github.com/PaddlePaddle/docs/blob/develop/docs/install/install_script.md)。
## 二、开始安装
-本文档为您介绍 pip 安装方式
+### 首先请选择您的版本
-### 首先请您选择您的版本
-
-* 目前在 macOS 环境仅支持 CPU 版 PaddlePaddle
+* 目前在 macOS 环境仅支持 CPU 版 PaddlePaddle
### 根据版本进行安装
@@ -81,15 +70,31 @@
```
- python -m pip install paddlepaddle==0.0.0 -f https://www.paddlepaddle.org.cn/whl/mac/cpu/develop.html
+ python3 -m pip install paddlepaddle==2.4.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
-* 注:
-* macOS 上您需要安装 unrar 以支持 PaddlePaddle,可以使用命令 `brew install rar`
-* 请确认需要安装 PaddlePaddle 的 Python 是您预期的位置,因为您计算机可能有多个 Python。根据您的环境您可能需要将说明中所有命令行中的 python 替换为具体的 Python 路径。
-* 默认下载最新稳定版的安装包,如需获取开发版安装包,请参考[这里](https://www.paddlepaddle.org.cn/install/quick/zh/1.8.5-windows-pip)
-* 使用 macOS 中自带 Python 可能会导致安装失败。请使用[Python.org](https://www.python.org/downloads/mac-osx/)提供的 python3.6.x、python3.7.x、python3.8.x 或 python3.9.x。
+注:
+* macOS 上您需要安装 unrar 以支持 PaddlePaddle,可以使用命令`brew install unrar`
+* 请确认需要安装 PaddlePaddle 的 Python 是您预期的位置,因为您计算机可能有多个 Python。根据您的环境您可能需要将说明中所有命令行中的 python3 替换为具体的 Python 路径。
+* 默认下载最新稳定版的安装包,如需获取 develop 版本(nightly build)的安装包,请参考[这里](https://www.paddlepaddle.org.cn/install/quick/zh/1.8.5-windows-pip)
+* 使用 macOS 中自带 Python 可能会导致安装失败。请使用[python 官网](https://www.python.org/downloads/mac-osx/)提供的 python3.6.x、python3.7.x、python3.8.x、python3.9.x、python3.10.x。
+* 上述命令默认安装`avx`的包,如果想要安装`noavx`的包,可以使用如下命令:
+
+ 首先使用如下命令将 wheel 包下载到本地,再使用`python3 -m pip install [name].whl`本地安装([name]为 wheel 包名称):
+
+ ```
+ python3 -m pip download paddlepaddle==2.4.2 -f https://www.paddlepaddle.org.cn/whl/mac/openblas/noavx/stable.html --no-index --no-deps
+ ```
+
+ 判断你的机器是否支持`avx`,可以输入以下命令,如果输出中包含`avx`,则表示机器支持`avx`:
+ ```
+ sysctl machdep.cpu.features | grep -i avx
+ ```
+ 或
+ ```
+ sysctl machdep.cpu.leaf7_features | grep -i avx
+ ```
## **三、验证安装**
@@ -102,4 +107,4 @@
请使用以下命令卸载 PaddlePaddle:
-* `python -m pip uninstall paddlepaddle`
+* `python3 -m pip uninstall paddlepaddle`
diff --git a/docs/install/pip/macos-pip_en.md b/docs/install/pip/macos-pip_en.md
index 29cd9e7762b..440aa2d70a6 100644
--- a/docs/install/pip/macos-pip_en.md
+++ b/docs/install/pip/macos-pip_en.md
@@ -1,17 +1,8 @@
-# Install on macOS via PIP
+# Install on macOS via PIP
## Environmental preparation
-### 1.1 PREQUISITES
-
-* **macOS version 10.11/10.12/10.13/10.14 (64 bit) (not support GPU version)**
-
-* **Python version 3.6/3.7/3.8/3.9 (64 bit)**
-
-* **pip or pip3 版本 20.2.2 or above (64 bit)**
-
-
-### 1.2 How to check your environment
+### 1.1 How to check your environment
* You can use the following commands to view the local operating system and bit information
@@ -23,52 +14,48 @@
* Confirm that the Python where you need to install PaddlePaddle is your expected location, because your computer may have multiple Python
- * Use the following command to output Python path. Depending on the environment, you may need to replace Python in all command lines in the description with specific Python path
+ * Use the following command to output Python path. Depending on the environment, you may need to replace python3 in all command lines in the description with specific Python path
```
- which python
+ which python3
```
* You need to confirm whether the version of Python meets the requirements
- * Use the following command to confirm that it is 3.6/3.7/3.8/3.9
+ * Use the following command to confirm that it is 3.6/3.7/3.8/3.9/3.10
- python --version
+ python3 --version
* It is required to confirm whether the version of pip meets the requirements. The version of pip is required to be 20.2.2 or above
```
- python -m ensurepip
+ python3 -m ensurepip
```
```
- python -m pip --version
+ python3 -m pip --version
```
-* You need to confirm that Python and pip are 64bit, and the processor architecture is x86_64(or called x64、Intel 64、AMD64). Currently, paddlepaddle does not support arm64 architecture. The first line below outputs "64bit", and the second line outputs "x86_64", "x64" or "AMD64"
+* You need to confirm that Python and pip are 64bit, and the processor architecture is x86_64 (also called x64, Intel 64, or AMD64) or arm64 (PaddlePaddle already supports Mac M1):
```
- python -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
+ python3 -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
```
-* The installation package provided by default requires computer support for MKL
-
-* If you do not know the machine environment, please download and use[Quick install script](https://fast-install.bj.bcebos.com/fast_install.sh), for instructions please refer to[here](https://github.com/PaddlePaddle/docs/blob/develop/docs/install/install_script.md)。
+* If you do not know the machine environment, please download and use the [Quick install script](https://fast-install.bj.bcebos.com/fast_install.sh); for instructions please refer to [here](https://github.com/PaddlePaddle/docs/blob/develop/docs/install/install_script.md).
## INSTALLATION
-If you installed Python via Homebrew or the Python website, `pip` was installed with it. If you installed Python 3.x, then you will be using the command `pip3`. We will introduce pip installation here.
-
### Choose CPU/GPU
-* Currently, only the CPU version of PaddlePaddle is supported in the macOS environment
+* Currently, only the CPU version of PaddlePaddle is supported in the macOS environment
### Installation Step
@@ -79,13 +66,29 @@ You can choose the following version of PaddlePaddle to start installation:
```
-python -m pip install paddlepaddle==0.0.0 -f https://www.paddlepaddle.org.cn/whl/mac/cpu/develop.html
+python3 -m pip install paddlepaddle==2.4.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
Note:
-* Please confirm that the Python where you need to install PaddlePaddle is your expected location, because your computer may have multiple Python. Depending on the environment, you may need to replace Python in all command lines in the instructions with specific Python path.
+* Please confirm that the Python where you need to install PaddlePaddle is your expected location, because your computer may have multiple Python. Depending on the environment, you may need to replace python3 in all command lines in the instructions with specific Python path.
+* The above commands install the `avx` package by default. If you want to install the `noavx` Paddle package, you can use the following command:
+
+ First use the following command to download the wheel package locally, and then use `python3 -m pip install [name].whl` to install it ([name] is the name of the wheel package):
+
+ ```
+ python3 -m pip download paddlepaddle==2.4.2 -f https://www.paddlepaddle.org.cn/whl/mac/openblas/noavx/stable.html --no-index --no-deps
+ ```
+
+ To determine whether your machine supports `avx`, you can use the following command. If the output contains `avx`, it means that the machine supports `avx`:
+ ```
+ sysctl machdep.cpu.features | grep -i avx
+ ```
+ or
+ ```
+ sysctl machdep.cpu.leaf7_features | grep -i avx
+ ```
@@ -100,5 +103,5 @@ If `PaddlePaddle is installed successfully!` appears, to verify that the install
Please use the following command to uninstall PaddlePaddle:
```
-python -m pip uninstall paddlepaddle
+python3 -m pip uninstall paddlepaddle
```
diff --git a/docs/install/pip/windows-pip.md b/docs/install/pip/windows-pip.md
index 44f7d443175..e9f8f34647e 100644
--- a/docs/install/pip/windows-pip.md
+++ b/docs/install/pip/windows-pip.md
@@ -1,19 +1,14 @@
# Windows 下的 PIP 安装
-## 一、环境准备
-
-### 1.1 目前飞桨支持的环境
+[The Python Package Index (PyPI)](https://pypi.org/) 是 Python 的软件包索引。本文档为你介绍 PyPI 安装方式,飞桨提供的 PyPI 安装包支持 TensorRT 推理功能。
-* **Windows 7/8/10 专业版/企业版 (64bit)**
-* **GPU 版本支持 CUDA 10.1/10.2/11.0/11.1/11.2,且仅支持单卡**
-* **Python 版本 3.6+/3.7+/3.8+/3.9+ (64 bit)**
-* **pip 版本 20.2.2 或更高版本 (64 bit)**
+## 一、环境准备
-### 1.2 如何查看您的环境
+### 1.1 如何查看您的环境
* 需要确认 python 的版本是否满足要求
- * 使用以下命令确认是 3.6/3.7/3.8/3.9
+ * 使用以下命令确认是 3.6/3.7/3.8/3.9/3.10
```
python --version
@@ -30,7 +25,7 @@
```
-* 需要确认 Python 和 pip 是 64bit,并且处理器架构是 x86_64(或称作 x64、Intel 64、AMD64)架构,目前 PaddlePaddle 不支持 arm64 架构。下面的第一行输出的是"64bit",第二行输出的是"x86_64"、"x64"或"AMD64"即可:
+* 需要确认 Python 和 pip 是 64bit,并且处理器架构是 x86_64(或称作 x64、Intel 64、AMD64)架构。下面的第一行输出的是"64bit",第二行输出的是"x86_64"、"x64"或"AMD64"即可:
```
python -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
@@ -51,17 +46,17 @@
* 如果您的计算机有 NVIDIA® GPU,请确保满足以下条件并且安装 GPU 版 PaddlePaddle
- * **CUDA 工具包 10.1/10.2 配合 cuDNN v7.6.5+**
+ * **CUDA 工具包 10.2 配合 cuDNN v7.6.5,如需使用 PaddleTensorRT 推理,需配合 TensorRT7.0.0.11**
- * **CUDA 工具包 11.0 配合 cuDNN v8.0.2**
+ * **CUDA 工具包 11.2 配合 cuDNN v8.2.1,如需使用 PaddleTensorRT 推理,需配合 TensorRT8.2.4.2**
- * **CUDA 工具包 11.1 配合 cuDNN v8.1.1**
+ * **CUDA 工具包 11.6 配合 cuDNN v8.4.0,如需使用 PaddleTensorRT 推理,需配合 TensorRT8.4.0.6**
- * **CUDA 工具包 11.2 配合 cuDNN v8.2.1**
+ * **CUDA 工具包 11.7 配合 cuDNN v8.4.1,如需使用 PaddleTensorRT 推理,需配合 TensorRT8.4.2.4**
* **GPU 运算能力超过 3.5 的硬件设备**
- * 注:目前官方发布的 windows 安装包仅包含 CUDA 10.1/10.2/11.0/11.1/11.2,如需使用其他 cuda 版本,请通过源码自行编译。您可参考 NVIDIA 官方文档了解 CUDA 和 CUDNN 的安装流程和配置方法,请见[CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html),[cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/)
+ * 注:目前官方发布的 windows 安装包仅包含 CUDA 10.2/11.2/11.6/11.7,如需使用其他 cuda 版本,请通过源码自行编译。您可参考 NVIDIA 官方文档了解 CUDA、CUDNN 和 TensorRT 的安装流程和配置方法,请见[CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html),[cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/),[TensorRT](https://developer.nvidia.com/tensorrt)
@@ -70,61 +65,82 @@
-#### 2.1 CPU 版的 PaddlePaddle
+#### 2.1 CPU 版的 PaddlePaddle
```
- python -m pip install paddlepaddle==0.0.0 -f https://www.paddlepaddle.org.cn/whl/windows/cpu-mkl-avx/develop.html
+ python -m pip install paddlepaddle==2.4.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
-#### 2.2 GPU 版的 PaddlePaddle
+#### 2.2 GPU 版的 PaddlePaddle
-2.2.1 CUDA10.1 的 PaddlePaddle
+2.2.1 CUDA10.2 的 PaddlePaddle
```
- python -m pip install paddlepaddle-gpu==0.0.0.post101 -f https://www.paddlepaddle.org.cn/whl/windows/gpu/develop.html
+ python -m pip install paddlepaddle-gpu==2.4.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
-2.2.2 CUDA10.2 的 PaddlePaddle
-
+2.2.2 CUDA11.2 的 PaddlePaddle
```
- python -m pip install paddlepaddle-gpu==0.0.0.post102 -f https://www.paddlepaddle.org.cn/whl/windows/gpu/develop.html
+ python -m pip install paddlepaddle-gpu==2.4.2.post112 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html
```
-2.2.3 CUDA11.0 的 PaddlePaddle
+2.2.3 CUDA11.6 的 PaddlePaddle
+ ```
+ python -m pip install paddlepaddle-gpu==2.4.2.post116 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html
+ ```
+
+2.2.4 CUDA11.7 的 PaddlePaddle
```
- python -m pip install paddlepaddle-gpu==0.0.0.post110 -f https://www.paddlepaddle.org.cn/whl/windows/gpu/develop.html
+ python -m pip install paddlepaddle-gpu==2.4.2.post117 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html
```
-2.2.4 CUDA11.1 的 PaddlePaddle
+注:
+
+* 如果你使用的是安培架构的 GPU,推荐使用 CUDA11 以上。如果你使用的是非安培架构的 GPU,推荐使用 CUDA10.2,性能更优。
+
+* 请确认需要安装 PaddlePaddle 的 Python 是您预期的位置,因为您计算机可能有多个 Python。根据您的环境您可能需要将说明中所有命令行中的 python 替换为具体的 Python 路径。
+
+* 上述命令默认安装`avx`的包。如果你的机器不支持`avx`,需要安装`noavx`的 Paddle 包。判断你的机器是否支持`avx`,可以安装[CPU-Z](https://www.cpuid.com/softwares/cpu-z.html)工具查看“处理器-指令集”。
+ 首先使用如下命令将 wheel 包下载到本地(noavx 版本的 wheel 包仅支持 python3.8):
+
+ * cpu、mkl 版本 noavx 机器安装:
```
- python -m pip install paddlepaddle-gpu==0.0.0.post111 -f https://www.paddlepaddle.org.cn/whl/windows/gpu/develop.html
+ python -m pip download paddlepaddle==2.4.2 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/noavx/stable.html --no-index --no-deps
```
+ * cpu、openblas 版本 noavx 机器安装:
+
+ ```
+ python -m pip download paddlepaddle==2.4.2 -f https://www.paddlepaddle.org.cn/whl/windows/openblas/noavx/stable.html --no-index --no-deps
+ ```
-2.2.5 CUDA11.2 的 PaddlePaddle
+ * gpu 版本 cuda10.2 noavx 机器安装:
```
- python -m pip install paddlepaddle-gpu==0.0.0.post112 -f https://www.paddlepaddle.org.cn/whl/windows/gpu/develop.html
+ python -m pip download paddlepaddle-gpu==2.4.2 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/noavx/stable.html --no-index --no-deps
```
+ 再使用`python -m pip install [name].whl`本地安装([name]为 wheel 包名称)。
-注:
+* 如果你想安装`avx`、`openblas`的 Paddle 包,可以通过以下命令将 wheel 包下载到本地,再使用`python -m pip install [name].whl`本地安装([name]为 wheel 包名称):
+
+ ```
+ python -m pip download paddlepaddle==2.4.2 -f https://www.paddlepaddle.org.cn/whl/windows/openblas/avx/stable.html --no-index --no-deps
+ ```
-* 如果你使用的是安培架构的 GPU,推荐使用 CUDA11.2。如果你使用的是非安培架构的 GPU,推荐使用 CUDA10.2,性能更优。请参考: [GPU 架构对照表](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#nvidia-gpu)
-* 请确认需要安装 PaddlePaddle 的 Python 是您预期的位置,因为您计算机可能有多个 Python。根据您的环境,可能需要将上述命令行中所有 `python` 替换为具体的 `Python 解释器` 路径(例如 C:\Python37\python.exe)。
## **三、验证安装**
@@ -133,10 +149,6 @@
如果出现`PaddlePaddle is installed successfully!`,说明您已成功安装。
-注:
-
-* 由于飞桨使用 Visual Studio 进行编译,使用时需要操作系统自带 Visual C++运行时库,大部分情况下 Windows 系统已默认自带,但对于某些纯净版系统可能未安装,若 `import paddle` 后出现 `DLL load failed` 报错,请下载 https://aka.ms/vs/17/release/vc_redist.x64.exe 安装后再次尝试。
-
## **四、如何卸载**
请使用以下命令卸载 PaddlePaddle:
diff --git a/docs/install/pip/windows-pip_en.md b/docs/install/pip/windows-pip_en.md
index 3ca6cf109be..d109f1d8c20 100644
--- a/docs/install/pip/windows-pip_en.md
+++ b/docs/install/pip/windows-pip_en.md
@@ -2,18 +2,11 @@
## Environmental preparation
-### 1.1 PREQUISITES
-
-* **Windows 7/8/10 Pro/Enterprise (64bit)**
-* **GPU Version support CUDA 10.1/10.2/11.0/11.1/11.2, and only support single GPU**
-* **Python version 3.6+/3.7+/3.8+/3.9+(64bit)**
-* **pip version 20.2.2 or above (64bit)**
-
-### 1.2 How to check your environment
+### 1.1 How to check your environment
* Confirm whether the Python version meets the requirements
- * Use the following command to confirm that it is 3.6+/3.7+/3.8+/3.9+
+ * Use the following command to confirm that it is 3.6+/3.7+/3.8+/3.9+/3.10+
python --version
@@ -28,7 +21,7 @@
python -m pip --version
```
-* You need to confirm that Python and pip are 64bit, and the processor architecture is x86_64(or called x64、Intel 64、AMD64). Currently, paddlepaddle does not support arm64 architecture. The first line below outputs "64bit", and the second line outputs "x86_64", "x64" or "AMD64"
+* You need to confirm that Python and pip are 64bit, and the processor architecture is x86_64 (also called x64, Intel 64, or AMD64). The first line below outputs "64bit", and the second line outputs "x86_64", "x64" or "AMD64"
```
python -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
@@ -50,17 +43,17 @@ If you installed Python via Homebrew or the Python website, `pip` was installed
* If your computer has NVIDIA® GPU, please make sure that the following conditions are met and install [the GPU Version of PaddlePaddle](#gpu)
- * **CUDA toolkit 10.1/10.2 with cuDNN v7.6.5+**
+ * **CUDA toolkit 10.2 with cuDNN v7.6.5 (for PaddleTensorRT deployment, TensorRT7.0.0.11)**
- * **CUDA toolkit 11.0 with cuDNN v8.0.2**
+ * **CUDA toolkit 11.2 with cuDNN v8.2.1 (for PaddleTensorRT deployment, TensorRT8.2.4.2)**
- * **CUDA toolkit 11.1 with cuDNN v8.1.1**
+ * **CUDA toolkit 11.6 with cuDNN v8.4.0 (for PaddleTensorRT deployment, TensorRT8.4.0.6)**
- * **CUDA toolkit 11.2 with cuDNN v8.2.1**
+ * **CUDA toolkit 11.7 with cuDNN v8.4.1 (for PaddleTensorRT deployment, TensorRT8.4.2.4)**
* **GPU CUDA capability over 3.5**
- You can refer to NVIDIA official documents for installation process and configuration method of CUDA and cudnn. Please refer to [CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/),[cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/)
+ You can refer to NVIDIA official documents for the installation process and configuration method of CUDA, cuDNN and TensorRT. Please refer to [CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html), [cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/), [TensorRT](https://developer.nvidia.com/tensorrt)
## Installation Step
@@ -68,60 +61,80 @@ If you installed Python via Homebrew or the Python website, `pip` was installed
You can choose the following version of PaddlePaddle to start installation:
-#### 2.1 CPU Versoion of PaddlePaddle
+
+#### 2.1 CPU Version of PaddlePaddle
```
- python -m pip install paddlepaddle==0.0.0 -f https://www.paddlepaddle.org.cn/whl/windows/cpu-mkl-avx/develop.html
+ python -m pip install paddlepaddle==2.4.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
-#### 2.2 GPU Version of PaddlePaddle
+#### 2.2 GPU Version of PaddlePaddle
-2.2.1 If you are using CUDA 10.1
-
+2.2.1 If you are using CUDA 10.2
```
- python -m pip install paddlepaddle-gpu==0.0.0.post101 -f https://www.paddlepaddle.org.cn/whl/windows/gpu/develop.html
+ python -m pip install paddlepaddle-gpu==2.4.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
-2.2.2 If you are using CUDA 10.2
+2.2.2 If you are using CUDA 11.2
```
- python -m pip install paddlepaddle-gpu==0.0.0.post102 -f https://www.paddlepaddle.org.cn/whl/windows/gpu/develop.html
+ python -m pip install paddlepaddle-gpu==2.4.2.post112 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html
```
-2.2.3 If you are using CUDA 11.0
+2.2.3 If you are using CUDA 11.6
+ ```
+ python -m pip install paddlepaddle-gpu==2.4.2.post116 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html
+ ```
+
+2.2.4 If you are using CUDA 11.7
```
- python -m pip install paddlepaddle-gpu==0.0.0.post110 -f https://www.paddlepaddle.org.cn/whl/windows/gpu/develop.html
+ python -m pip install paddlepaddle-gpu==2.4.2.post117 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html
```
+Note:
-2.2.4 If you are using CUDA 11.1
+* If you are using an Ampere-based GPU, CUDA 11 or above is recommended; otherwise CUDA 10.2 is recommended for better performance.
+* Please confirm that the Python where you want to install PaddlePaddle is in your expected location, because your computer may have multiple Pythons. Depending on the environment, you may need to replace Python in all command lines in the instructions with the specific Python path.
- ```
- python -m pip install paddlepaddle-gpu==0.0.0.post111 -f https://www.paddlepaddle.org.cn/whl/windows/gpu/develop.html
- ```
+* The above commands install the `avx` package by default. If your machine does not support `avx`, you need to install the `noavx` Paddle package (the noavx wheel only supports python3.8). You can install it as follows:
+ First use the following command to download the wheel package locally, and then use `python -m pip install [name].whl` to install it ([name] is the name of the wheel package):
-2.2.5 If you are using CUDA 11.2
+ * CPU version with `mkl` for `noavx` machines:
- ```
- python -m pip install paddlepaddle-gpu==0.0.0.post112 -f https://www.paddlepaddle.org.cn/whl/windows/gpu/develop.html
- ```
+ ```
+ python -m pip download paddlepaddle==2.4.2 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/noavx/stable.html --no-index --no-deps
+ ```
-Note:
+ * CPU version with `openblas` for `noavx` machines:
-* If you are using ampere-based GPU, CUDA 11.2 is recommended; otherwise CUDA 10.2 is recommended for better performance. please refer to: [GPU architecture comparison table](https://www.paddlepaddle.org.cn/documentation/docs/en/install/Tables.html#nvidia-gpu)
+ ```
+ python -m pip download paddlepaddle==2.4.2 -f https://www.paddlepaddle.org.cn/whl/windows/openblas/noavx/stable.html --no-index --no-deps
+ ```
-* Please confirm that the Python where you need to install PaddlePaddle is your expected location, because your computer may have multiple Python. Depending on the environment, you may need to replace Python in all command lines in the instructions with specific Python path.
+ * GPU version with CUDA 10.2 for `noavx` machines:
+
+ ```
+ python -m pip download paddlepaddle-gpu==2.4.2 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/noavx/stable.html --no-index --no-deps
+ ```
+ To determine whether your machine supports `avx`, you can install the [CPU-Z](https://www.cpuid.com/softwares/cpu-z.html) tool and check the instruction set list shown for the processor.
+
+
+* If you want to install the Paddle package with `avx` and `openblas`, use the following command to download the wheel package locally, and then install it with `python -m pip install [name].whl` ([name] is the name of the downloaded wheel package):
+
+ ```
+ python -m pip download paddlepaddle==2.4.2 -f https://www.paddlepaddle.org.cn/whl/windows/openblas/avx/stable.html --no-index --no-deps
+ ```
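Besides the CPU-Z tool mentioned above (Windows only), on Linux you can check for `avx` support by reading `/proc/cpuinfo`. A minimal sketch — the `supports_avx` helper below is illustrative, not part of PaddlePaddle:

```python
import platform

def supports_avx():
    """Best-effort AVX detection: parses /proc/cpuinfo on Linux,
    returns False on other systems or on any read error."""
    if platform.system() != "Linux":
        return False
    try:
        with open("/proc/cpuinfo") as f:
            flags = f.read().replace("\n", " ")
        # 'avx' must appear as a whole flag, so 'avx2'/'avx512f' alone do not match
        return " avx " in flags
    except OSError:
        return False

print("AVX supported:", supports_avx())
```

If this prints `False`, use the `noavx` wheels described above.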
## Verify installation
diff --git a/docs/release_note_cn.md b/docs/release_note_cn.md
index 0c6b321de6e..d8864319f7c 100644
--- a/docs/release_note_cn.md
+++ b/docs/release_note_cn.md
@@ -1,3 +1,249 @@
+# 2.4.1 Release Note
+
+
+去除飞桨对 python.so 的依赖,修复在包括 conda 在内的特定的环境下,因无法找到 python.so 而造成运行失败的 Bug。
+
+
+
+# 2.4.0 Release Note
+
+## 1. 重要更新
+
+- **新动态图架构正式生效**:新动态图框架大幅提升了调度性能,超 90% 的 API 调度性能提升超过 50%,超 50% 的套件模型性能提升超过 5%,功能架构更加清晰,二次开发能力和体验显著增强。
+
+- **全面提升了飞桨的动静统一能力:** 动转静功能提供了更加丰富的 Python 语法支持,飞桨的 Python 语法覆盖率达到 90%,对语法转写逻辑进行了重点地优化,完备地支持了控制流语法,提供了更加流畅的一键转静态图体验;借助全新升级的静态图执行器,让动转静训练具有更优的加速能力,重点模型测试显示接近静态图最佳水平;提升了动转静的可扩展性,新增支持多函数合并导出和推理,支持用户使用 PHI 算子库进行二次开发和灵活部署,有效支撑语音领域 U2++特色模型的自定义解码。
+
+- **新增稀疏计算类 API:** 新增 55 个稀疏 API `paddle.sparse.*`,支持稀疏计算主流场景,已应用于 3D 点云目标检测、Sparse Transformers 等任务的稀疏训练和推理部署,高稀疏度场景下相比使用 DenseTensor 提速 105.75%,相比同类产品稀疏计算提速 4.01%~58.55%;支持多种稀疏 Tensor(SparseCoo 和 SparseCsr 等)的计算,极致节省显存;同时保持了一致的使用体验,和稠密 Tensor 的 API 使用方式一致。
+
+- **大规模图神经网络 GPU 训练引擎:** 通过 SSD、内存、显存的异构层次化存储技术,突破显存瓶颈,支持超大规模图的全 GPU 存储和训练;实现了游走、采样、训练的全 GPU 一体化解决方案,相比传统的分布式 CPU 解决方案,相同成本的情况下训练速度提升 10+倍。
+
+- **环境适配:** 新增了适配 CUDA 11.7 版本的预编译安装包,新增了对 Ubuntu 22.04 及以上版本的支持。
+
+### 前瞻性预告
+
+- 飞桨框架将在 2.5 版本废弃对 python 3.6 的支持。
+- 飞桨框架将会逐步废弃 python 端的`paddle.fluid`命名空间下的 API,在 2.5 版本时,部分该命名空间下的 API 将会被直接删除。
+
+## 2. 不兼容升级
+
+- 取消了适配 CUDA10.1 版本的预编译安装包。
+- Tensor.clear_gradient(bool set_to_zero)接口不再接收 kwargs 传入的值,只能通过 args 传入 set_to_zero 的 bool 变量。
+- 为了提高显存利用效率,动态图默认仅保留前向叶子结点变量的梯度如训练中网络参数的梯度,而不再支持默认保留非叶子结点的梯度。如果需要保留特定 Tensor 的梯度,可以在反向执行前调用 Tensor.retain_grads()接口。
+- paddle.autograd.PyLayer 将不再支持输入是 tuple 的情况,如果输入希望是一组 Tensor 的情况请传入 list of Tensor。
+
+## 3. 训练框架(含分布式)
+
+### (1)新增 API 和增强 API 功能
+- **新增稀疏计算类 API**:paddle.sparse
+ - 新增 55 个稀疏 API,支持稀疏计算主流场景,已应用于 3D 点云目标检测、Sparse Transformers 等任务的稀疏训练和推理部署,高稀疏度场景下相比使用 DenseTensor 提速 105.75%,相比同类产品稀疏计算提速 4.01%~58.55%;支持多种稀疏 Tensor(SparseCoo 和 SparseCsr 等)的计算,极致节省显存;同时保持了一致的使用体验,和稠密 Tensor 的 API 使用方式一致。[#45849](https://github.com/PaddlePaddle/Paddle/pull/45849), [#46694](https://github.com/PaddlePaddle/Paddle/pull/46694), [#45086](https://github.com/PaddlePaddle/Paddle/pull/45086), [#41857](https://github.com/PaddlePaddle/Paddle/pull/41857), [#42935](https://github.com/PaddlePaddle/Paddle/pull/42935), [#43475](https://github.com/PaddlePaddle/Paddle/pull/43475), [#43668](https://github.com/PaddlePaddle/Paddle/pull/43668), [#43966](https://github.com/PaddlePaddle/Paddle/pull/43966), [#44022](https://github.com/PaddlePaddle/Paddle/pull/44022), [#44346](https://github.com/PaddlePaddle/Paddle/pull/44346), [#44432](https://github.com/PaddlePaddle/Paddle/pull/44432), [#44451](https://github.com/PaddlePaddle/Paddle/pull/44451), [#44743](https://github.com/PaddlePaddle/Paddle/pull/44743), [#42013](https://github.com/PaddlePaddle/Paddle/pull/42013), [#43520](https://github.com/PaddlePaddle/Paddle/pull/43520), [#41434](https://github.com/PaddlePaddle/Paddle/pull/41434), [#42130](https://github.com/PaddlePaddle/Paddle/pull/42130), [#41276](https://github.com/PaddlePaddle/Paddle/pull/41276), [#41857](https://github.com/PaddlePaddle/Paddle/pull/41857), [#41356](https://github.com/PaddlePaddle/Paddle/pull/41356)
+- **新增语音领域 API:** paddle.audio
+  - 新增 MFCC、Spectrogram、LogMelSpectrogram 等特征提取 API,支持 GPU 计算,相比 CPU 实现处理性能提升 15 倍以上,可大幅提升语音模型训练 GPU 利用率。[#45424](https://github.com/PaddlePaddle/Paddle/pull/45424)
+ - 新增窗函数、离散余弦变换等特征提取基础 API,方便用户自定义语音特征提取。[#45424](https://github.com/PaddlePaddle/Paddle/pull/45424)
+  - 新增语音 IO 模块,提供 2 种音频 I/O backend,支持 6 种编解码,便捷地实现语音数据的加载。 [#45939](https://github.com/PaddlePaddle/Paddle/pull/45939)
+  - 新增 TESS,ESC50 语音分类数据集,方便用户完成经典语音分类模型的训练。[#45939](https://github.com/PaddlePaddle/Paddle/pull/45939)
+- **新增图学习领域 API:** paddle.geometric
+ - 图学习逐渐成为机器学习领域的关键技术,飞桨新增 paddle.geometric 模块提供更好的图学习建模和训练开发体验。
+  - 消息传递:图学习消息传递机制是图建模的基础,因此新增 7 个图学习消息传递 API,更方便完成图学习建模。其中,新增的 3 个消息传递融合算子可大幅减少图模型训练显存占用,稠密图场景下 GCN 系列模型可节省 50%+显存,训练速度可提升 20%+。[#44848](https://github.com/PaddlePaddle/Paddle/pull/44848), [#44580](https://github.com/PaddlePaddle/Paddle/pull/44580), [#43174](https://github.com/PaddlePaddle/Paddle/pull/43174), [#44970](https://github.com/PaddlePaddle/Paddle/pull/44970)
+ - 图采样:图采样是图模型训练的性能瓶颈,此次新增了高性能图采样算子,支持高并发图采样,GraphSage 的采样速度可提升 32 倍以上,模型训练速度可提升 12 倍以上。[#44970](https://github.com/PaddlePaddle/Paddle/pull/44970)
+- **新增视觉领域 API**
+ - paddle.vision 新增目标检测领域算子 paddle.vision.distribute_fpn_proposals([#43736](https://github.com/PaddlePaddle/Paddle/pull/43736)), paddle.vision.generate_proposals([#43611](https://github.com/PaddlePaddle/Paddle/pull/43611)), paddle.vision.matrix_nms([#44357](https://github.com/PaddlePaddle/Paddle/pull/44357)), paddle.vision.prior_box 和 paddle.vision.box_coder([#47282](https://github.com/PaddlePaddle/Paddle/pull/47282))。
+
+- **新增其他 API**
+ - 新增 iinfo([#45321](https://github.com/PaddlePaddle/Paddle/pull/45321)), count_nonzero([#44169](https://github.com/PaddlePaddle/Paddle/pull/44169)), nanmedian([#42385](https://github.com/PaddlePaddle/Paddle/pull/42385)), remainder\_ ([#45266](https://github.com/PaddlePaddle/Paddle/pull/45266)), take([#44741](https://github.com/PaddlePaddle/Paddle/pull/44741)), triu_indices([#45168](https://github.com/PaddlePaddle/Paddle/pull/45168)), sgn([#44568](https://github.com/PaddlePaddle/Paddle/pull/44568)), bucketize([#44195](https://github.com/PaddlePaddle/Paddle/pull/44195)), nanquantile([#41343](https://github.com/PaddlePaddle/Paddle/pull/41343)), frac([#41226](https://github.com/PaddlePaddle/Paddle/pull/41226)), logcumsumexp([#42267](https://github.com/PaddlePaddle/Paddle/pull/42267)), pairwise_distance([#44161](https://github.com/PaddlePaddle/Paddle/pull/44161)), heaviside([#41872](https://github.com/PaddlePaddle/Paddle/pull/41872)), logspace([#41261](https://github.com/PaddlePaddle/Paddle/pull/41261)), corrcoef([#40690](https://github.com/PaddlePaddle/Paddle/pull/40690))
+ - 新增 RReLU([#41823](https://github.com/PaddlePaddle/Paddle/pull/41823)), CyclicLR([#40698](https://github.com/PaddlePaddle/Paddle/pull/40698)), OneCycleLR([#41825](https://github.com/PaddlePaddle/Paddle/pull/41825)), Softmax2D([#40910](https://github.com/PaddlePaddle/Paddle/pull/40910)), SoftMarginLoss([#42364](https://github.com/PaddlePaddle/Paddle/pull/42364)), MultiLabelSoftMarginLoss([#41183](https://github.com/PaddlePaddle/Paddle/pull/41183)), TripletMarginLoss([#40487](https://github.com/PaddlePaddle/Paddle/pull/40487)), TripletMarginWithDistanceLoss([#40545](https://github.com/PaddlePaddle/Paddle/pull/40545)), CosineEmbeddingLoss 和 cosine_embedding_loss([#41680](https://github.com/PaddlePaddle/Paddle/pull/41680)), PixelUnshuffle([#40728](https://github.com/PaddlePaddle/Paddle/pull/40728)), ChannelShuffle([#40743](https://github.com/PaddlePaddle/Paddle/pull/40743))
+- **增强 API 功能**
+ - 增加 BatchNorm1D 的大 batch_size 计算功能 [#43072](https://github.com/PaddlePaddle/Paddle/pull/43072)
+- **完善集合通信分布式训练 API**
+ - 完善`fleet.init`函数,增加`log_level`参数,方便用户查看运行过程中的日志 [#45909](https://github.com/PaddlePaddle/Paddle/pull/45909)
+ - 新增`paddle.distributed.fleet.recompute_sequential paddle.distributed.fleet.recompute_hybrid`接口,方便用户使用 recompute 功能[#45348](https://github.com/PaddlePaddle/Paddle/pull/45348)
+ - 新增`paddle.distributed.fleet.layers.mpu` package,方便用户使用张量并行功能 [#45803](https://github.com/PaddlePaddle/Paddle/pull/45803)
+ - 新增通信 API `paddle.distributed.destroy_process_group paddle.distributed.isend paddle.distributed.irecv paddle.distributed.all_to_all_single`,提升了通信的功能完备性和易用性 [#43918](https://github.com/PaddlePaddle/Paddle/pull/43918)
+ - 新增`paddle.distributed.stream` 通信 package,性能比基础版本提升 5%到 10% [#46023](https://github.com/PaddlePaddle/Paddle/pull/46023) [#45282](https://github.com/PaddlePaddle/Paddle/pull/45282)
+ - 通信 API 新增多种数据类型`Char/Byte/Bool`等的支持,提升了通信的功能完备性和易用性 [#45574](https://github.com/PaddlePaddle/Paddle/pull/45574) [#45440](https://github.com/PaddlePaddle/Paddle/pull/45440)
+ - 通信 API 异步参数从`use_calc_stream`变成`sync_op`,增强了接口的语义可读性 [#46493](https://github.com/PaddlePaddle/Paddle/pull/46493)
+- **增强高层 API**
+ - 高层 API 中视觉模型 ResNeXt 实现复用 ResNet 代码进行重构。 [#40588](https://github.com/PaddlePaddle/Paddle/pull/40588)
+ - 高层 API 中视觉模型 Inceptionv3、MobileNetv1、MobileNetv2、ShuffleNetv2 实现改进。[#40431](https://github.com/PaddlePaddle/Paddle/pull/40431)
+
+### (2)新功能及重要功能升级
+
+- **新动态图架构正式上线**:新动态图框架相比原有架构大幅提升了调度性能,超 90% 的 API 调度性能提升超过 50%,超 50% 的套件模型性能提升超过 5%;新动态图架构清晰,耦合度低,基于新架构实现 Hook、PyLayer 等扩展模块的学习与开发成本显著降低。[#37550](https://github.com/PaddlePaddle/Paddle/pull/37550),[#37574](https://github.com/PaddlePaddle/Paddle/pull/37574),[#37813](https://github.com/PaddlePaddle/Paddle/pull/37813),[#37926](https://github.com/PaddlePaddle/Paddle/pull/37926),[#39192](https://github.com/PaddlePaddle/Paddle/pull/39192),[#37599](https://github.com/PaddlePaddle/Paddle/pull/37599),[#37406](https://github.com/PaddlePaddle/Paddle/pull/37406),[#37466](https://github.com/PaddlePaddle/Paddle/pull/37466),[#37599](https://github.com/PaddlePaddle/Paddle/pull/37599),[#40945](https://github.com/PaddlePaddle/Paddle/pull/40945),[#39989](https://github.com/PaddlePaddle/Paddle/pull/39989)
+
+- **高阶自动微分机制**:为了更好支持科学计算等场景,飞桨框架针对高阶自动微分能力进一步完善优化。目前,已在`paddle.incubate.autograd` 目录下提供了支持前反向高阶自动微分相关试用功能及 API(当前处于孵化状态,相关功能及 API 签名可能会发生变化)。如果想自行实现相关模型、探索自动微分机制,请仔细阅读[高阶自动微分使用方法及限制](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/incubate/autograd/Overview_cn.html)。具体的升级包括:
+ 1. 静态图高阶微分机制升级,通过基础算子体系和程序变换,支持高阶前向及反向微分,并打通编译器、分布式功能。[#41919](https://github.com/PaddlePaddle/Paddle/pull/41919), [#41201](https://github.com/PaddlePaddle/Paddle/pull/41201)
+ 2. 新增前向和反向高阶自动微分 API, `paddle.incubate.autograd.forward_grad`, `paddle.incubate.autograd.grad`。[#43354](https://github.com/PaddlePaddle/Paddle/pull/43354)
+ 3. 新增 18 个高阶自动微分算子`sin`, `cos`, `exp`, `erf`, `abs`, `log`, `cast`, `where`, `equal`, `not_equal`, `greater_than`, `greater_equal`, `elementwise_pow` `square`, `elementwise_max`, `gelu`, `reduce_mean`, `size`。[#46184](https://github.com/PaddlePaddle/Paddle/pull/46184), [#46024](https://github.com/PaddlePaddle/Paddle/pull/46024), [#45888](https://github.com/PaddlePaddle/Paddle/pull/45888), [#45338](https://github.com/PaddlePaddle/Paddle/pull/45338), [#44345](https://github.com/PaddlePaddle/Paddle/pull/44345)
+ 4. 修复现有`elementwise_div`, `reduce_sum`, `p_norm`等算子缺陷。[#46514](https://github.com/PaddlePaddle/Paddle/pull/46514), [#46184](https://github.com/PaddlePaddle/Paddle/pull/46184)
+
+- **通用异构参数服务器架构**:
+ - 参数服务器 GPUGraph 基础架构升级,满足大规模应用落地:针对传统 CPU 存储和训练大规模图神经网络的成本高,稳定性低,性能不足的问题打造了纯 GPU 图训练引擎(PGLBox),通过 SSD、内存、显存的异构层次化存储技术,支持超大规模图模型训练,同等成本下训练性能相对 CPU 图训练引擎提升 10+倍,任务失败率下降到极低。[#44594](https://github.com/PaddlePaddle/Paddle/pull/44594)
+ - 大规模联邦参数服务器架构:针对大规模个性化推荐场景,基于异构 PS 基础架构,开发了大规模联邦参数服务器训练,支持千亿参数下的横向纵向联邦,它包括两个特性:用户私有参数本地更新,公共参数在远端更新,用户可灵活配置私有参数和公共参数的切分策略;新增中心调度节点 Coordinator,用户可从基类进行二次开发,自定义 Client 选择策略。[#42682](https://github.com/PaddlePaddle/Paddle/pull/42682),[#44864](https://github.com/PaddlePaddle/Paddle/pull/44864),[#44327](https://github.com/PaddlePaddle/Paddle/pull/44327)
+- **自适应并行**
+ - 设计并推出了完善的自动并行接口体系,支持自动动转静分布式训练、自动分布式数据加载、自动分布式保存与加载、自动参数转换、自定义切分标记和自定义执行过程等。用户只需要基于单机组网就可以非常容易获得自动分布式训练能力,支持数据并行、模型并行、流水线并行和混合并行。[#45776](https://github.com/PaddlePaddle/Paddle/pull/45776) ,[#46552](https://github.com/PaddlePaddle/Paddle/pull/46552),[#44202](https://github.com/PaddlePaddle/Paddle/pull/44202),[#45840](https://github.com/PaddlePaddle/Paddle/pull/45840),[#45518](https://github.com/PaddlePaddle/Paddle/pull/45518),[#40528](https://github.com/PaddlePaddle/Paddle/pull/40528),[#42838](https://github.com/PaddlePaddle/Paddle/pull/42838),[#43093](https://github.com/PaddlePaddle/Paddle/pull/43093),[#43312](https://github.com/PaddlePaddle/Paddle/pull/43312),[#45053](https://github.com/PaddlePaddle/Paddle/pull/45053)。
+ - 完善了自适应并行底层机制,包括升级分布式 cost model 设计和实现,为切分策略提供更好评价;为 Program IR 添加了原生分布式属性,丰富了 Cluster 功能。[#40457](https://github.com/PaddlePaddle/Paddle/pull/40457),[#42601](https://github.com/PaddlePaddle/Paddle/pull/42601),[#42727](https://github.com/PaddlePaddle/Paddle/pull/42727),[#42874](https://github.com/PaddlePaddle/Paddle/pull/42784),[#43114](https://github.com/PaddlePaddle/Paddle/pull/43114),[#44095](https://github.com/PaddlePaddle/Paddle/pull/44095),[#44146](https://github.com/PaddlePaddle/Paddle/pull/44146),[#44701](https://github.com/PaddlePaddle/Paddle/pull/44701),[#44973](https://github.com/PaddlePaddle/Paddle/pull/44973),[#45002](https://github.com/PaddlePaddle/Paddle/pull/45002),[#45118](https://github.com/PaddlePaddle/Paddle/pull/45118),[#45237](https://github.com/PaddlePaddle/Paddle/pull/45237),[#42576](https://github.com/PaddlePaddle/Paddle/pull/42576),[#41722](https://github.com/PaddlePaddle/Paddle/pull/41722),[#44150](https://github.com/PaddlePaddle/Paddle/pull/44150), [#44989](https://github.com/PaddlePaddle/Paddle/pull/44989), [#44951](https://github.com/PaddlePaddle/Paddle/pull/44951), [#44963](https://github.com/PaddlePaddle/Paddle/pull/44963)。
+ - 新增数据并行下 Sharding stage1/2/3 自动调优功能,在保证满足显存约束情况下,自动选择吞吐最高的 Sharding stage 策略。[#43782](https://github.com/PaddlePaddle/Paddle/pull/43782)。
+
+- **训练硬件接入-插件式方案**:新增了自定义 Runtime/Kernel/CCL/Graph/Pass 等方案,硬件厂商可以根据硬件特性按需选择实现哪些模块。
+
+- **ONNX 格式导出**
+ - 支持量化模型导出,导出后的 ONNX 模型使用 TensorRT 或 ONNXRuntime 加载推理,可获得 1.5~4 倍的推理加速 [#856](https://github.com/PaddlePaddle/Paddle2ONNX/pull/856),[#782](https://github.com/PaddlePaddle/Paddle2ONNX/pull/782)
+ - 新增大于 2GB 的大模型导出 [#942](https://github.com/PaddlePaddle/Paddle2ONNX/pull/942)
+
+### (3)功能优化
+- **动转静分析转换 & 扩展能力全面提升**
+ - 为了提升模型动转静转换成功率和使用体验,重构了控制流语法的转写逻辑,升级核心语法为 JIT (just-in-time)范式,实现与 Python 代码的等价转写,并完善了 break、return、continue 等语法功能。[#43666](https://github.com/PaddlePaddle/Paddle/pull/43666),[#43846](https://github.com/PaddlePaddle/Paddle/pull/43846),[#43848](https://github.com/PaddlePaddle/Paddle/pull/43848),[#43880](https://github.com/PaddlePaddle/Paddle/pull/43880),[#43957](https://github.com/PaddlePaddle/Paddle/pull/43957),[#43328](https://github.com/PaddlePaddle/Paddle/pull/43328),[#43348](https://github.com/PaddlePaddle/Paddle/pull/43348),[#43998](https://github.com/PaddlePaddle/Paddle/pull/43998),[#44465](https://github.com/PaddlePaddle/Paddle/pull/44465),[#44504](https://github.com/PaddlePaddle/Paddle/pull/44504),[#43713](https://github.com/PaddlePaddle/Paddle/pull/43713),[#43864](https://github.com/PaddlePaddle/Paddle/pull/43864),[#43967](https://github.com/PaddlePaddle/Paddle/pull/43967),[#44155](https://github.com/PaddlePaddle/Paddle/pull/44155),[#44487](https://github.com/PaddlePaddle/Paddle/pull/44487),[#44527](https://github.com/PaddlePaddle/Paddle/pull/44527),[#45105](https://github.com/PaddlePaddle/Paddle/pull/45105),[#45900](https://github.com/PaddlePaddle/Paddle/pull/45900)
+ - 为了支撑语音等场景自定义解码灵活部署场景,扩展了 jit.save/load 接口功能,支持用户多函数合并导出,并新增了 JITLayer 组件,支持类函数式调用,同时配合 PHI 算子库 C++ API 实现了自定义推理部署功能。[#44283](https://github.com/PaddlePaddle/Paddle/pull/44283),[#41783](https://github.com/PaddlePaddle/Paddle/pull/41783),[#43607](https://github.com/PaddlePaddle/Paddle/pull/43607),[#43754](https://github.com/PaddlePaddle/Paddle/pull/43754),[#43758](https://github.com/PaddlePaddle/Paddle/pull/43758),[#43798](https://github.com/PaddlePaddle/Paddle/pull/43798),[#44010](https://github.com/PaddlePaddle/Paddle/pull/44010),[#44351](https://github.com/PaddlePaddle/Paddle/pull/44351),[#44465](https://github.com/PaddlePaddle/Paddle/pull/44465),[#44504](https://github.com/PaddlePaddle/Paddle/pull/44504),[#44597](https://github.com/PaddlePaddle/Paddle/pull/44597),[#44738](https://github.com/PaddlePaddle/Paddle/pull/44738),[#44984](https://github.com/PaddlePaddle/Paddle/pull/44984),[#46249](https://github.com/PaddlePaddle/Paddle/pull/46249)
+ - 为了统一 API 动静行为,升级了 20 个算子,支持在静态图中 Op 的 attribute 信息可变,保证动静行为一致,提升模型的动转静转换成功率。包括`pad2d`、`depthwise_conv2d_transpose`、`conv2d_transpose`、`adaptive_avg_pool2d`、`reverse`、`bincount`、`multinomial`、`reduce_sum`、`reduce_mean`、`reduce_prod`、`reduce_min`、`reduce_max`、`uniform`、`squeeze`、`max_unpool2d`、`dropout`、`cumsum`、`eye`、`argmin`、`argmax`,[#44737](https://github.com/PaddlePaddle/Paddle/pull/44737),[#45084](https://github.com/PaddlePaddle/Paddle/pull/45084),[#45189](https://github.com/PaddlePaddle/Paddle/pull/45189),[#45391](https://github.com/PaddlePaddle/Paddle/pull/45391),[#45417](https://github.com/PaddlePaddle/Paddle/pull/45417),[#45427](https://github.com/PaddlePaddle/Paddle/pull/45427)、[#45514](https://github.com/PaddlePaddle/Paddle/pull/45514)、[#45525](https://github.com/PaddlePaddle/Paddle/pull/45525)、[#45543](https://github.com/PaddlePaddle/Paddle/pull/45543)、[#45660](https://github.com/PaddlePaddle/Paddle/pull/45660)、[#46352](https://github.com/PaddlePaddle/Paddle/pull/46352/)、[#46433](https://github.com/PaddlePaddle/Paddle/pull/46433)、[#45078](https://github.com/PaddlePaddle/Paddle/pull/45078),[#45342](https://github.com/PaddlePaddle/Paddle/pull/45342),[#45372](https://github.com/PaddlePaddle/Paddle/pull/45372),[#45453](https://github.com/PaddlePaddle/Paddle/pull/45453),[#45522](https://github.com/PaddlePaddle/Paddle/pull/45522),[#45620](https://github.com/PaddlePaddle/Paddle/pull/45620)
+ - 为了解决用户动转静报错栈偶尔丢失问题,优化了报错模块的逻辑,提升了报错栈的可读性以及用户调试的使用体验。[#44054](https://github.com/PaddlePaddle/Paddle/pull/44054),[#44083](https://github.com/PaddlePaddle/Paddle/pull/44083),[#44781](https://github.com/PaddlePaddle/Paddle/pull/44781),[#44996](https://github.com/PaddlePaddle/Paddle/pull/44996)
+ - 为了全面支持 Python 类型 Type Hint 语法,新增了 TypeHint 语法识别和转写模块。[#47121](https://github.com/PaddlePaddle/Paddle/pull/47121)
+
+- **PHI 算子库覆盖全量运算类算子**:继续建设高可复用算子库 PHI,将剩余的飞桨 2.x 运算类 PythonAPI 关联的算子以及相关内核均迁移到 PHI 算子库,并改写为函数式,新增了约 180 个前反向算子的 CPU&GPU 内核,以及 170 个 Kunlun 专用算子内核,进一步提升了新增算子时可复用的内核函数集。同时,新增了 100 余个 C++运算类 API,可支持在自定义算子中使用,进一步提升了基于飞桨进行外部扩展开发的易用性。[#44577](https://github.com/PaddlePaddle/Paddle/pull/44577),[#44631](https://github.com/PaddlePaddle/Paddle/pull/44631),[#44434](https://github.com/PaddlePaddle/Paddle/pull/44434),[#44605](https://github.com/PaddlePaddle/Paddle/pull/44605),[#44676](https://github.com/PaddlePaddle/Paddle/pull/44676),[#44742](https://github.com/PaddlePaddle/Paddle/pull/44742),[#44436](https://github.com/PaddlePaddle/Paddle/pull/44436),[#45887](https://github.com/PaddlePaddle/Paddle/pull/45887),[#45851](https://github.com/PaddlePaddle/Paddle/pull/45851),[#45623](https://github.com/PaddlePaddle/Paddle/pull/45623),[#45397](https://github.com/PaddlePaddle/Paddle/pull/45397),[#45863](https://github.com/PaddlePaddle/Paddle/pull/45863)
+
+- **规范化算子定义,大幅提升模型简洁度**:针对飞桨 1.x 历史算子定义存在诸多冗余参数,理解适配成本高的问题,对约 150 个高频算子的冗余参数进行了集中清理,基本上将数学无关的参数清理完毕。这些冗余参数清理后,飞桨存储的推理模型中信息量明显减少,普遍裁减掉了约 40%的属性变量,显著提升了飞桨算子定义的清晰程度,提升了模型分析调试的体验;同时,也显著减小了飞桨存储推理模型的体积,普遍减小超过 70%,显著提升了飞桨模型的轻量化程度。[#44310](https://github.com/PaddlePaddle/Paddle/pull/44310) , [#45613](https://github.com/PaddlePaddle/Paddle/pull/45613) , [#45684](https://github.com/PaddlePaddle/Paddle/pull/45684) , [#45708](https://github.com/PaddlePaddle/Paddle/pull/45708) , [#45758](https://github.com/PaddlePaddle/Paddle/pull/45758) , [#45786](https://github.com/PaddlePaddle/Paddle/pull/45786) , [#45772](https://github.com/PaddlePaddle/Paddle/pull/45772) , [#45845](https://github.com/PaddlePaddle/Paddle/pull/45845) , [#45984](https://github.com/PaddlePaddle/Paddle/pull/45984) , [#46218](https://github.com/PaddlePaddle/Paddle/pull/46218) , [#46553](https://github.com/PaddlePaddle/Paddle/pull/46553)
+
+### (4)性能优化
+
+- AMP 性能及精度优化
+  - 更多算子增加 FP16 数据类型支持,包括 elementwise 系列算子, compare 系列算子, strided_slice, set_value, uniform_random 等。([#45504](https://github.com/PaddlePaddle/Paddle/pull/45504) [#44405](https://github.com/PaddlePaddle/Paddle/pull/44405) [#45496](https://github.com/PaddlePaddle/Paddle/pull/45496) [#46641](https://github.com/PaddlePaddle/Paddle/pull/46641) [#46906](https://github.com/PaddlePaddle/Paddle/pull/46906))
+  - 优化 hard_swish 算子 FP16 Kernel 实现方案,保证精度无损。([#35386](https://github.com/PaddlePaddle/Paddle/pull/35386))
+ - 更多算子增加 BF16 数据类型支持,包括 fused_linear、empty、selu、pow、adam、clip、embedding、gelu、pad3d、pixel_shuffle、tile、where 等。[#46364](https://github.com/PaddlePaddle/Paddle/pull/46364),[#47177](https://github.com/PaddlePaddle/Paddle/pull/47177)
+- 单机训练性能自动调优
+ - Transpose OP 支持自动 Kernel 选择机制,可以针对不同模型配置自动搜索到性能最优的 Kernel 实现,提升模型性能。[#43310](https://github.com/PaddlePaddle/Paddle/pull/43310) (Transpose Op 接入自动调优功能)
+ - AMP Layout 自动切换支持新动态图模式,ResNet50、TSM、DeepLabV3 等模型在新动态图下通过 Layout 自动调整获得性能提升 9%~21%。([#45409](https://github.com/PaddlePaddle/Paddle/pull/45409), [#45751](https://github.com/PaddlePaddle/Paddle/pull/45751), [#45826](https://github.com/PaddlePaddle/Paddle/pull/45826), [#46880](https://github.com/PaddlePaddle/Paddle/pull/46880))
+- GPU 单机训练通用性能优化
+ - 优化 Conv 类算子 cuDNN 算法的 Cache 方案,并 Cache 所有算法获取方式下的结果,大幅减少算子的 CPU 开销。([#41891](https://github.com/PaddlePaddle/Paddle/pull/41891) [#47197](https://github.com/PaddlePaddle/Paddle/pull/47197))
+ - 进一步优化多个算子的 GPU Kernel 和 Python 端性能,包括 dist, poisson, depthwise_conv2d、transpose, eigh, broadcast 类计算,reduce 类计算,layer_norm,cross_entropy 等,在更多配置场景下达到更优性能。([#44946](https://github.com/PaddlePaddle/Paddle/pull/44946), [#45057](https://github.com/PaddlePaddle/Paddle/pull/45057), [#45160](https://github.com/PaddlePaddle/Paddle/pull/45160), [#42491](https://github.com/PaddlePaddle/Paddle/pull/42491), [#42704](https://github.com/PaddlePaddle/Paddle/pull/42704), [#42853](https://github.com/PaddlePaddle/Paddle/pull/42853), [#46287](https://github.com/PaddlePaddle/Paddle/pull/46287), [#46362](https://github.com/PaddlePaddle/Paddle/pull/46362), [#46490](https://github.com/PaddlePaddle/Paddle/pull/46490), [#46412](https://github.com/PaddlePaddle/Paddle/pull/46412), [#46623](https://github.com/PaddlePaddle/Paddle/pull/46623), [#40051](https://github.com/PaddlePaddle/Paddle/pull/40051))
+- 集合通信分布式训练性能优化
+ - 为提高流水线并行调度效率,支持动态图 Interleaving 1F1B 调度策略,在 GPT-3 模型上性能提升 3%~4%。[#45797](https://github.com/PaddlePaddle/Paddle/pull/45797),[#45869](https://github.com/PaddlePaddle/Paddle/pull/45869),[#45922](https://github.com/PaddlePaddle/Paddle/pull/45922),[#46209](https://github.com/PaddlePaddle/Paddle/pull/46209),[#45402](https://github.com/PaddlePaddle/Paddle/pull/45402),[#45444](https://github.com/PaddlePaddle/Paddle/pull/45444),[#45497](https://github.com/PaddlePaddle/Paddle/pull/45497),[#45797](https://github.com/PaddlePaddle/Paddle/pull/45797),[#45869](https://github.com/PaddlePaddle/Paddle/pull/45869),[#45922](https://github.com/PaddlePaddle/Paddle/pull/45922),[#46209](https://github.com/PaddlePaddle/Paddle/pull/46209),[#46399](https://github.com/PaddlePaddle/Paddle/pull/46399),[#46483](https://github.com/PaddlePaddle/Paddle/pull/46483),[#46876](https://github.com/PaddlePaddle/Paddle/pull/46876),[#47242](https://github.com/PaddlePaddle/Paddle/pull/47242),[#47249](https://github.com/PaddlePaddle/Paddle/pull/47249),[#47497](https://github.com/PaddlePaddle/Paddle/pull/47497),[#47517](https://github.com/PaddlePaddle/Paddle/pull/47517)
+ - 为提升 MLPerf BERT 模型的分布式训练性能,DistributedFusedLamb 分布式优化器支持分层 AllReduce,在 DCU 1024 卡上 MLPerf BERT 性能提升 17%。[#44821](https://github.com/PaddlePaddle/Paddle/pull/44821),[#44843](https://github.com/PaddlePaddle/Paddle/pull/44843)
+ - 为优化使用数据并行 Data Parallel 时的显存占用,支持 Tensor Fusion 时的 Buffer Lazy 初始化策略,可降低等于模型参数量的显存占用量。[#45631](https://github.com/PaddlePaddle/Paddle/pull/45631)。
+ - 分布式并行策略 Data Parallel 和 Sharding 支持 BF16 训练。[#46846](https://github.com/PaddlePaddle/Paddle/pull/46846),[#47246](https://github.com/PaddlePaddle/Paddle/pull/47246)
+ - 为支持 Sequence Parallel 等策略,分布式流水线并行策略支持 enable_partial_send_recv 策略,支持传输 sequence parallel 切分后的 tensor。[#46992](https://github.com/PaddlePaddle/Paddle/pull/46992),[#47083](https://github.com/PaddlePaddle/Paddle/pull/47083)
+ - 为提升 sharding stage 2 策略的性能,实现了 sharding stage 2 optimizer broadcast 参数与下一个 step forward 的 overlap,并使用多 CUDA Stream 进行通信,GPT 6.7B 模型 16 卡训练性能提升 11%。[#46495](https://github.com/PaddlePaddle/Paddle/pull/46495),[#46656](https://github.com/PaddlePaddle/Paddle/pull/46656),[#47061](https://github.com/PaddlePaddle/Paddle/pull/47061)
+
+### (5)问题修复
+
+- 动转静
+ - 修复了模型在多卡训练时 Parameter 无梯度场景下,动转静会报错的问题。[#44485](https://github.com/PaddlePaddle/Paddle/pull/44485)
+ - 修复了动转静时终端会有多余的框架日志误输出的问题。[#45754](https://github.com/PaddlePaddle/Paddle/pull/45754),[#46800](https://github.com/PaddlePaddle/Paddle/pull/46800)
+ - 修复了模型中控制流中包含无需梯度的 Tensor 时,在动转静训练时会报错的问题。[#43034](https://github.com/PaddlePaddle/Paddle/pull/43034)
+ - 修复了动转静训练在梯度聚合时计算值错误的问题。[#44893](https://github.com/PaddlePaddle/Paddle/pull/44893)
+ - 修复了函数被@staticmethod 装饰时动转静会报错的问题。[#44983](https://github.com/PaddlePaddle/Paddle/pull/44983),[#45268](https://github.com/PaddlePaddle/Paddle/pull/45268),[#45277](https://github.com/PaddlePaddle/Paddle/pull/45277)
+ - 修复了部分场景下模型包含控制动转静训练时,显存占用过多的问题。[#45380](https://github.com/PaddlePaddle/Paddle/pull/45380)
+ - 修复了模型中包含复杂控制流时,动转静在组网阶段 shape 推导报错的问题。[#45916](https://github.com/PaddlePaddle/Paddle/pull/45916),[#46020](https://github.com/PaddlePaddle/Paddle/pull/46020)
+- 报错机制修复
+  - 使用 np.testing.assert_allclose 替换 self.assertTrue(np.allclose(...)),获得更充分的报错信息 ([#44947](https://github.com/PaddlePaddle/Paddle/pull/44947), [#44988](https://github.com/PaddlePaddle/Paddle/pull/44988),[#45213](https://github.com/PaddlePaddle/Paddle/pull/45213))
+- 集合通信分布式训练
+ - 修复了通信库初始化、通信过程中的若干 bug,增强了系统运行稳定性 [#44964](https://github.com/PaddlePaddle/Paddle/pull/44964) [#45100](https://github.com/PaddlePaddle/Paddle/pull/45100) [#44758](https://github.com/PaddlePaddle/Paddle/pull/44758)
+ - 修复流水线并行容易 hang 的问题,增强策略的易用性 [#47201](https://github.com/PaddlePaddle/Paddle/pull/47201);增强流水线功能支持不均衡的输入 [#47199](https://github.com/PaddlePaddle/Paddle/pull/47199)
+ - 修复新动态图 MP/PP 策略下性能低于老动态图的问题 [#47071](https://github.com/PaddlePaddle/Paddle/pull/47071)
+ - 修复 sharding stage2 策略错误维护参数 trainable 属性的 bug [#47240](https://github.com/PaddlePaddle/Paddle/pull/47240)
+ - 修复一系列 OP 在 tensor numel 大于 INT32_MAX 时的 bug。[#45711](https://github.com/PaddlePaddle/Paddle/pull/45711),[#45741](https://github.com/PaddlePaddle/Paddle/pull/45741),[#45897](https://github.com/PaddlePaddle/Paddle/pull/45897),[#46158](https://github.com/PaddlePaddle/Paddle/pull/46158),[#46767](https://github.com/PaddlePaddle/Paddle/pull/46767),[#47191](https://github.com/PaddlePaddle/Paddle/pull/47191),[#46045](https://github.com/PaddlePaddle/Paddle/pull/46045),[#46160](https://github.com/PaddlePaddle/Paddle/pull/46160)
+ - 修复 FusedAttention 和 FusedFeedForward OP 显存占用过大的 bug。[#47236](https://github.com/PaddlePaddle/Paddle/pull/47236),[#47235](https://github.com/PaddlePaddle/Paddle/pull/47235)
+ - 修复 multi_tensor_adam 和 multi_tensor_momentum OP 在传入的 parameters 是 list of dict 时参数更新错误的 bug。[#47352](https://github.com/PaddlePaddle/Paddle/pull/47352),[#47372](https://github.com/PaddlePaddle/Paddle/pull/47372)
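上面"报错机制修复"中提到以 np.testing.assert_allclose 替换 self.assertTrue(np.allclose(...)) 以获得更充分的报错信息,二者的差别可用如下最小示例说明(假设环境中已安装 NumPy,示例数据为虚构):

```python
import numpy as np

actual = np.array([1.0, 2.0, 3.000001])
desired = np.array([1.0, 2.0, 3.0])

# 在容差范围内时 assert_allclose 静默通过
np.testing.assert_allclose(actual, desired, rtol=1e-5)

# 超出容差时抛出 AssertionError,错误信息中包含逐元素的差异报告,
# 而 assertTrue(np.allclose(...)) 只会报告 "False is not true"
try:
    np.testing.assert_allclose(actual, np.array([1.0, 2.0, 4.0]), rtol=1e-5)
except AssertionError:
    print("mismatch detected")
```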
+
+## 4. 部署方向(Paddle Inference)
+
+### (1)新增特性
+
+- 后端图引擎集成方案优化
+  - 为了减少 Paddle-TensorRT 插件代码开发,以及减少 Paddle-TensorRT 子图数量从而降低资源占用率,开发了通用插件机制,可以自动对框架内丰富的 Phi 算子提供统一的 TensorRT 插件接口,在多数场景下可以有效减少显存占用。 [#46970](https://github.com/PaddlePaddle/Paddle/pull/46970),[#46179](https://github.com/PaddlePaddle/Paddle/pull/46179),[#46580](https://github.com/PaddlePaddle/Paddle/pull/46580)
+  - 为了方便用户在框架内定制算子且使 Paddle-TensorRT 高效推理,升级了自定义 Paddle-TensorRT 插件的功能。[#46970](https://github.com/PaddlePaddle/Paddle/pull/46970)
+- Inference 推理库构建系统优化,体积可按需裁剪
+ - 预编译的安装包默认支持 TensorRT:训练用的预编译安装包与部署用的预编译安装包(Paddle Inference)统一为一个预编译安装包,且优化了构建系统,使得预编译的安装包默认支持 TensorRT,减少用户使用 PaddleTensorRT 时的切换成本。[#46008](https://github.com/PaddlePaddle/Paddle/pull/46008),[#45824](https://github.com/PaddlePaddle/Paddle/pull/45824),[#46058](https://github.com/PaddlePaddle/Paddle/pull/46058)
+ - 体积可按需裁剪:可依据模型算子进行裁剪。[#47033](https://github.com/PaddlePaddle/Paddle/pull/47033) , [#47049](https://github.com/PaddlePaddle/Paddle/pull/47049) , [#47047](https://github.com/PaddlePaddle/Paddle/pull/47047)
+- Inference 支持原生 AMP
+ - 为了充分利用 GPU Tensor Core 计算能力,提升模型的推理性能,开发了模型精度转换工具,Inference GPU 原生支持了混合精度模型的推理。使用方式可参考[文档](https://github.com/PaddlePaddle/Paddle-Inference-Demo/blob/release/v2.4/docs-official/guides/nv_gpu_infer/gpu_mixed_precision.md)。[#43814](https://github.com/PaddlePaddle/Paddle/pull/43814),[#43881](https://github.com/PaddlePaddle/Paddle/pull/43881),[#44057](https://github.com/PaddlePaddle/Paddle/pull/44057),[#44307](https://github.com/PaddlePaddle/Paddle/pull/44307),[#44457](https://github.com/PaddlePaddle/Paddle/pull/44457),[#44866](https://github.com/PaddlePaddle/Paddle/pull/44866),[#45050](https://github.com/PaddlePaddle/Paddle/pull/45050),[#45346](https://github.com/PaddlePaddle/Paddle/pull/45346),[#45379](https://github.com/PaddlePaddle/Paddle/pull/45379),[#45406](https://github.com/PaddlePaddle/Paddle/pull/45406),[#45882](https://github.com/PaddlePaddle/Paddle/pull/45882)
+ - 为了提升混合精度下模型的推理性能,补充了未支持 FP16 计算的高频算子的 FP16 kernel,减少了由于输入精度不匹配插入 cast 算子的可能性,提升推理性能。[#44642](https://github.com/PaddlePaddle/Paddle/pull/44642),[#45061](https://github.com/PaddlePaddle/Paddle/pull/45061),[#44653](https://github.com/PaddlePaddle/Paddle/pull/44653),[#45504](https://github.com/PaddlePaddle/Paddle/pull/45504),[#45061](https://github.com/PaddlePaddle/Paddle/pull/45061),[#44969](https://github.com/PaddlePaddle/Paddle/pull/44969),[#44558](https://github.com/PaddlePaddle/Paddle/pull/44558),[#44710](https://github.com/PaddlePaddle/Paddle/pull/44710),[#43871](https://github.com/PaddlePaddle/Paddle/pull/43871),[#44792](https://github.com/PaddlePaddle/Paddle/pull/44792)
+- 压缩与推理引擎打通升级
+  - 升级量化模型存储格式,新格式支持 Paddle Inference、PaddleLite 和 Paddle2ONNX 3 种部署方式,支持芯片类型包括 X86 CPU、NVIDIA GPU、Arm CPU。([#46305](https://github.com/PaddlePaddle/Paddle/pull/46305) [#46283](https://github.com/PaddlePaddle/Paddle/pull/46283) [#46022](https://github.com/PaddlePaddle/Paddle/pull/46022))
+ - 新增兼容 SoC/NPU 芯片的 INT8 全量化功能,可保证产出的 INT8 量化模型在 SoC/NPU 芯片上有最佳推理加速和精度。
+- 推理引擎与飞桨编译器(CINN)打通升级
+ - 升级飞桨框架与编译器的接口模块,支持推理模型通过 Paddle Inference 接入编译器进行优化([#44499](https://github.com/PaddlePaddle/Paddle/pull/44499) [#44708](https://github.com/PaddlePaddle/Paddle/pull/44708) )
+
+### (2) Underlying optimization
+
+- **GPU performance optimization**
+  - Add TensorRT mappings for the matmul_v2, LSTM, reshape, fill_constant, swish, multiclass_nms3, bilinear_interp_v2, split, silu, and shuffle_channel operators, and improve dynamic-shape support. Performance of several key models improves by 7%-90%. ([#46177](https://github.com/PaddlePaddle/Paddle/pull/46177), [#44678](https://github.com/PaddlePaddle/Paddle/pull/44678), [#44314](https://github.com/PaddlePaddle/Paddle/pull/44314), [#44561](https://github.com/PaddlePaddle/Paddle/pull/44561), [#45166](https://github.com/PaddlePaddle/Paddle/pull/45166), [#44411](https://github.com/PaddlePaddle/Paddle/pull/44411), [#43424](https://github.com/PaddlePaddle/Paddle/pull/43424), [#44516](https://github.com/PaddlePaddle/Paddle/pull/44516))
+  - Add a constant-folding PASS for inference performance optimization, improving the performance of models such as SwinTransformer, HifiGAN, and FastSpeech2. ([#45494](https://github.com/PaddlePaddle/Paddle/pull/45494))
+  - Add a cache of the conv_fusion workspace size, improving conv_fusion compute performance. ([#45902](https://github.com/PaddlePaddle/Paddle/pull/45902))
+- **Vision ViT model optimization**
+  - Add the ViT model Attention-structure fusion PASS, with support for the OSS Plugin and automatic padding. ViT inference speed increases by 30%-40%. [#45019](https://github.com/PaddlePaddle/Paddle/pull/45019) [#45506](https://github.com/PaddlePaddle/Paddle/pull/45506)
+- **Inference performance optimization for large models**
+  - To improve the inference speed of very large generative models and save video memory, add an INT8 implementation (fused_multi_transformer_int8_op) of the multi-layer Transformer fusion operator (fused_multi_transformer_op), supporting quantized inference of generative models. Optimize performance through matrix-multiplication algorithm selection and fusion of the quantize/dequantize kernels. [#46169](https://github.com/PaddlePaddle/Paddle/pull/46169)
+  - To make fused_multi_transformer fusion easier to use in large-model inference, add a Pass for automatic matching and fusion.
+- **CPU performance optimization**
+  - Optimize the speech U2++ model: FP32 model inference speed increases by 35%, and INT8 model inference speed increases by 69%. ([#47592](https://github.com/PaddlePaddle/Paddle/pull/47592) [#47127](https://github.com/PaddlePaddle/Paddle/pull/47127) [#47391](https://github.com/PaddlePaddle/Paddle/pull/47391) [#47234](https://github.com/PaddlePaddle/Paddle/pull/47234) [#47009](https://github.com/PaddlePaddle/Paddle/pull/47009) [#47080](https://github.com/PaddlePaddle/Paddle/pull/47080))
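+
+The constant-folding PASS above is an instance of a classic compiler optimization: any operator whose inputs are all compile-time constants can be evaluated once ahead of time and replaced by a constant. A toy sketch over a hypothetical (name, op, inputs) graph representation - not Paddle's actual IR - looks like:
+
+```python
+# Toy constant-folding pass over a hypothetical graph representation
+# (assumed node structure for illustration; not Paddle's IR or PASS API).
+import operator
+
+OPS = {"add": operator.add, "mul": operator.mul}
+
+def fold_constants(graph):
+    """graph: list of (name, op, inputs); inputs are node names."""
+    env = {}        # names known to be constant -> value
+    folded = []
+    for name, op, inputs in graph:
+        if op == "const":
+            env[name] = inputs[0]
+            folded.append((name, "const", inputs))
+            continue
+        vals = [env.get(i) for i in inputs]
+        if all(v is not None for v in vals):
+            env[name] = OPS[op](*vals)              # evaluate ahead of time
+            folded.append((name, "const", [env[name]]))
+        else:
+            folded.append((name, op, inputs))       # depends on runtime input
+    return folded
+
+g = [("a", "const", [2]), ("b", "const", [3]),
+     ("c", "mul", ["a", "b"]),      # foldable: 2 * 3
+     ("d", "add", ["c", "x"])]      # "x" is a runtime input, not foldable
+print(fold_constants(g))
+```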
+
+
+### (3) Bug fixing
+
+- Support int64 for the TensorRT workspace size setting. ([#44469](https://github.com/PaddlePaddle/Paddle/pull/44469))
+- In Paddle-TRT, fully support Op inputs as weights. ([#45545](https://github.com/PaddlePaddle/Paddle/pull/45545))
+- In Paddle-TRT, support conv2d_transpose/conv3d_transpose with the output_padding attribute. ([#45004](https://github.com/PaddlePaddle/Paddle/pull/45004))
+- In Paddle-TRT, enhance strided_slice support for dynamic shape. ([#46819](https://github.com/PaddlePaddle/Paddle/pull/46819))
+- In Paddle-TRT, optimize the video memory footprint of the runtime context in multi-threaded scenarios. ([#45468](https://github.com/PaddlePaddle/Paddle/pull/45468))
+- In Paddle-TRT, fix the problem that serialization files were repeatedly generated when multiple models ran in the same process and the initialization order changed. ([#43942](https://github.com/PaddlePaddle/Paddle/pull/43942))
+- Fix an occasional crash when the Predictor was initialized and run multiple times in the same process. ([#45203](https://github.com/PaddlePaddle/Paddle/pull/45203))
+- Fix abnormal inference accuracy of quantized models such as MobileNetV3_large, ERNIE 3.0-Medium, and bert ([#45416](https://github.com/PaddlePaddle/Paddle/pull/45416) [#46283](https://github.com/PaddlePaddle/Paddle/pull/46283) [#45920](https://github.com/PaddlePaddle/Paddle/pull/45920) [#47573](https://github.com/PaddlePaddle/Paddle/pull/47574))
+
+## 5. Environment Adaptation
+
+- The pre-compiled installer for training and the pre-compiled installer for deployment (Paddle Inference) are unified into a single pre-compiled installer, and the build system is optimized so that the pre-compiled installer supports TensorRT by default.
+- The pre-compiled installer for CUDA version 10.1 is cancelled.
+- Add the pre-compiled installer for CUDA version 11.7.
+- Shorten the source compilation time: reduce inter-module dependencies, increase parallelism, and optimize the compilation speed of some modules, together cutting the full compilation time by about 20 minutes.
+- Support running PaddlePaddle on Windows 11, Centos 8, Ubuntu 22.04, and Jetson 5.02 system environments, and support running the PaddlePaddle Linux installer on Windows with the WSL 2 tool.
+- Fix the running error of PaddlePaddle in glibc 2.34+ environments.
+- Optimize the code style of C++, Python, and CMake across the whole repository, and introduce or upgrade the following code-style checking tools.
+  - pre-commit is upgraded from 1.10.4 to 2.17.0: [#43103](https://github.com/PaddlePaddle/Paddle/pull/43103)
+  - pylint is changed from the default version to the pinned version 2.12.0: [#43103](https://github.com/PaddlePaddle/Paddle/pull/43103)
+  - remove-crlf is upgraded from 1.0.1 to 1.1.14: [#43103](https://github.com/PaddlePaddle/Paddle/pull/43103)
+  - cpplint is changed from the default version to the pinned version 1.6.0: [#43175](https://github.com/PaddlePaddle/Paddle/pull/43175), [#43978](https://github.com/PaddlePaddle/Paddle/pull/43978), [#43673](https://github.com/PaddlePaddle/Paddle/pull/43673), [#43679](https://github.com/PaddlePaddle/Paddle/pull/43679), [#43695](https://github.com/PaddlePaddle/Paddle/pull/43695), [#43733](https://github.com/PaddlePaddle/Paddle/pull/43733), [#43740](https://github.com/PaddlePaddle/Paddle/pull/43740)
+  - clang-format is upgraded from 3.8 to 13.0: [#42840](https://github.com/PaddlePaddle/Paddle/pull/42840), [#43248](https://github.com/PaddlePaddle/Paddle/pull/43248), [#43329](https://github.com/PaddlePaddle/Paddle/pull/43329), [#43333](https://github.com/PaddlePaddle/Paddle/pull/43333), [#43633](https://github.com/PaddlePaddle/Paddle/pull/43633), [#43678](https://github.com/PaddlePaddle/Paddle/pull/43678)
+  - Introduce the black tool for Python code style checking: [#46014](https://github.com/PaddlePaddle/Paddle/pull/46014)
+  - Introduce the cmakelint tool for CMake file checking, version 1.4.2: [#43222](https://github.com/PaddlePaddle/Paddle/pull/43222), [#43406](https://github.com/PaddlePaddle/Paddle/pull/43406), [#43414](https://github.com/PaddlePaddle/Paddle/pull/43414), [#43428](https://github.com/PaddlePaddle/Paddle/pull/43428)
+  - Introduce cmake-format for automatic formatting of CMake files, version 0.6.13: [#43057](https://github.com/PaddlePaddle/Paddle/pull/43057)
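+
+Tools like these are typically wired together through a `.pre-commit-config.yaml` at the repository root. The fragment below is an illustrative example only - the pinned revisions and the selection of hooks are assumptions, not Paddle's exact configuration:
+
+```yaml
+# Example .pre-commit-config.yaml (illustrative; revisions are assumptions)
+repos:
+  - repo: https://github.com/psf/black
+    rev: 22.8.0            # Python formatter introduced in this release
+    hooks:
+      - id: black
+  - repo: https://github.com/cheshirekow/cmake-format-precommit
+    rev: v0.6.13           # cmake-format for automatic CMake formatting
+    hooks:
+      - id: cmake-format
+```
+
+Running `pre-commit run --all-files` then applies every configured hook to the repository.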
+
+## 6. Hardware Adaptation
+### Hygon DCU
+- Add the Profiler function on DCU, to collect, count, and display performance data of the model running process on DCU, with kernel-level DCU utilization display.
+### Kunlunxin chip
+- Add the Profiler function on Kunlunxin Gen-2 chips, to collect, count, and display performance data of the model running process, with kernel-level chip utilization display.
+- Training/inference support for Kunlunxin Gen-2 chips (Kunlunxin AI acceleration cards R200, R300, R200-8F, R200-8FS, RG800). A total of 51 models such as PPYOLOE, PP-OCR, ERNIE 3.0, PP-TSM, PP-TTS, DLRM, and PPO have been verified, supporting static graph + dynamic graph training and mixed-precision training, and supporting single-machine single-card and single-machine multi-card training, covering 5 fields: intelligent vision, natural language processing, intelligent speech, intelligent recommendation, and reinforcement learning.
+### Cambricon
+- Training/inference support for Cambricon MLU chips (MLU370 series boards). Models such as ResNet50, BERT, YoloV3, OCR-DB, and Deeplabv3 have been verified, supporting static graph + dynamic graph training, mixed-precision training, and single-machine single-card and single-machine multi-card training.
+### Graphcore
+- Training/inference support for Graphcore IPU chips (including IPU Mk2 GC200 and Bow IPU), supporting models such as ResNet50 and BERT, static graph and dynamic-to-static training, and single-chip, single-machine, and multi-machine distributed training.
+- Add support for more operators
+- Upgrade to Poplar SDK v3.0.0 [#46892](https://github.com/PaddlePaddle/Paddle/pull/46892)
+- Support training models in dynamic-to-static mode, adding a new paddle.incubate.identity_loss op to assist graph construction [#43770](https://github.com/PaddlePaddle/Paddle/pull/43770)
+- Support the Paddle native distributed training API paddle.distributed.launch [#43311](https://github.com/PaddlePaddle/Paddle/pull/43311)
+- Support training models with mixed precision [#41733](https://github.com/PaddlePaddle/Paddle/pull/41733)
+- Paddle Inference supports custom operators with PopART [#45235](https://github.com/PaddlePaddle/Paddle/pull/45235)
+
+### Intel
+- Migrate the oneDNN operators transpose2_grad([#46139](https://github.com/PaddlePaddle/Paddle/pull/46139)), relu6_grad([#46501](https://github.com/PaddlePaddle/Paddle/pull/46501)), gaussian_random([#46747](https://github.com/PaddlePaddle/Paddle/pull/46747), [#45481](https://github.com/PaddlePaddle/Paddle/pull/45481)), sgd and stack([#46374](https://github.com/PaddlePaddle/Paddle/pull/46374)), concat+grad, expand+grad, fill_constant([#45863](https://github.com/PaddlePaddle/Paddle/pull/45863)), slice, slice_grad, split, pad and pad3d([#46101](https://github.com/PaddlePaddle/Paddle/pull/46101)), softmax_grad([#46257](https://github.com/PaddlePaddle/Paddle/pull/46257)), Shape([#46051](https://github.com/PaddlePaddle/Paddle/pull/46051)), Sum([#46239](https://github.com/PaddlePaddle/Paddle/pull/46239)), Transpose2_grad([#46139](https://github.com/PaddlePaddle/Paddle/pull/46139)), Cast, clip+grad and pool+grad([#45775](https://github.com/PaddlePaddle/Paddle/pull/45775)), Reduce sum+grad, mean+grad, min and max([#45536](https://github.com/PaddlePaddle/Paddle/pull/45536)), Relu and abs([#45397](https://github.com/PaddlePaddle/Paddle/pull/45397)), Gelu([#45596](https://github.com/PaddlePaddle/Paddle/pull/45596)), Scale([#45537](https://github.com/PaddlePaddle/Paddle/pull/45537))
+- Optimize the kernels of several operators such as fill_constant, fc, and conv
+- Add several Pass fusion optimizations
+- Optimize the Adam-W CPU FP32 optimizer ([#42522](https://github.com/PaddlePaddle/Paddle/pull/42522))
+- Optimize the pad3d fp32 onednn operator kernel implementation ([#43990](https://github.com/PaddlePaddle/Paddle/pull/43990))
+- Improve the concurrent execution of the matmul, FC, and lookup_v2 kernels ([#44023](https://github.com/PaddlePaddle/Paddle/pull/44023), [#44078](https://github.com/PaddlePaddle/Paddle/pull/44078), [#44640](https://github.com/PaddlePaddle/Paddle/pull/44640), [#44744](https://github.com/PaddlePaddle/Paddle/pull/44744), [#45249](https://github.com/PaddlePaddle/Paddle/pull/45249))
+- The FC onednn operator kernel supports bf16 ( [#42758](https://github.com/PaddlePaddle/Paddle/pull/42758), [#43154](https://github.com/PaddlePaddle/Paddle/pull/43154), [#43109](https://github.com/PaddlePaddle/Paddle/pull/43109))
+- Add the fusion of matrix multiplication and activation functions ([#43519](https://github.com/PaddlePaddle/Paddle/pull/43519), [#43198](https://github.com/PaddlePaddle/Paddle/pull/43198))
+- Support IR passes that produce int8 parameters for the convolution operator ( [#44680](https://github.com/PaddlePaddle/Paddle/pull/44680), [#42625](https://github.com/PaddlePaddle/Paddle/pull/42625))
+- Add pool/avg quantization and scales correction ([#44186](https://github.com/PaddlePaddle/Paddle/pull/44186))
+- Add the fusion of matmul and elementwise onednn operator kernels ([#45077](https://github.com/PaddlePaddle/Paddle/pull/45077))
+- Fix QAT accuracy issues ([#43693](https://github.com/PaddlePaddle/Paddle/pull/43693), [#45936](https://github.com/PaddlePaddle/Paddle/pull/45936), [#46378](https://github.com/PaddlePaddle/Paddle/pull/46378))
+- Migrate 42 oneDNN operator kernels to the PHI operator library ([#46374](https://github.com/PaddlePaddle/Paddle/pull/46374), [#46101](https://github.com/PaddlePaddle/Paddle/pull/46101), [#45989](https://github.com/PaddlePaddle/Paddle/pull/45989), [#45863](https://github.com/PaddlePaddle/Paddle/pull/45863), [#45775](https://github.com/PaddlePaddle/Paddle/pull/45775), [#45626](https://github.com/PaddlePaddle/Paddle/pull/45626), [#45536](https://github.com/PaddlePaddle/Paddle/pull/45536), [#46501](https://github.com/PaddlePaddle/Paddle/pull/46501), [#46257](https://github.com/PaddlePaddle/Paddle/pull/46257), [#45596](https://github.com/PaddlePaddle/Paddle/pull/45596), [#45537](https://github.com/PaddlePaddle/Paddle/pull/45537), [#45481](https://github.com/PaddlePaddle/Paddle/pull/45481), [#45397](https://github.com/PaddlePaddle/Paddle/pull/45397), [#46239](https://github.com/PaddlePaddle/Paddle/pull/46239), [#46139](https://github.com/PaddlePaddle/Paddle/pull/46139), [#46051](https://github.com/PaddlePaddle/Paddle/pull/46051))
+- Quantize the elementwise_sub and shape operator kernels ([#42854](https://github.com/PaddlePaddle/Paddle/pull/42854), [#44124](https://github.com/PaddlePaddle/Paddle/pull/44124))
+
+## Thanks to our Contributors
+
+This release contains contributions from:
+
+0x45f, Aganlengzi, Ainavo, Allen Guo, Asthestarsfalll, Aurelius84, Baibaifan, baoachun, BiynXu, Bo Zhang, BrilliantYuKaimin, cambriconhsq, caozhou, carryyu, ccrrong, ceci3, chalsliu, Chang Xu, Charles-hit, Chen Long, Chen Weihang, chenjian, chentianyu03, Chenxiao Niu, cifar10, crystal, csy0225, danleifeng, David Nicolas, dc-cheny, denglin-github, dongfangshenzhu, duanboqiang, duanyanhui, engineer, enzodechine, Fan Zhang, feifei-111, Feiyu Chan, Feng Ni, feng_shuai, FlyingQianMM, freeliuzc, furnace, fuyou765, fwenguang, Ghost Screaming, gongweibao, Guanghua Yu, guguguzi, Guoxia Wang, Haipeng Wang, handiz, Haohongxiang, haosicheng, helen88, heliqi, hong, HongyuJia, houj04, huangxu96, Hui Zhang, Huihuang Zheng, huzhiqiang, Jacek Czaja, Jack Zhou, jack603047588, Jackwaterveg, jakpiase, james, Jiabin Yang, jiangcheng, Jiaqi Liu, JingZhuangzhuang, joanna.wozna.intel, JYChen, JZ-LIANG, Kaipeng Deng, kangguangli, kuizhiqing, Leo Chen, Leo Guo, levi131, Li Min, Li-fAngyU, lidanqing, LielinJiang, Ligoml, Lijunhui, lilong12, limingshu, Lin Manhui, Linjie Chen, liqitong-a, littletomatodonkey, liu zhengxi, Liu-xiandong, liutiexing, Liyulingyue, LiYuRio, Lux et Veritas, lyq, Matsumoto Ruko, MayYouBeProsperous, mengqingchun02, Ming-Xu Huang, ming1753, minghaoBD, moyan, mrcangye, Netpunk, niuliling123, Nyakku Shigure, OccupyMars2025, onecatcn, pangyoki, parap1uie-s, peachlcy, piotrekobi, Qi Li, QingshuChen, qipengh, Rayman, Regan Yue, RichardWooSJTU, risemeup1, Roc, ronnywang, Rui Li, Ruibiao Chen, seemingwang, Shang Zhizhou, shangliang Xu, ShenLiang, shentanyue, Shijie, ShiningZhang, shixingbo, shiyutang, Shuangchi He, Siming Dai, Sing_chan, Skr Bang, SmirnovKol, sneaxiy, sprouteer, Sylwester Fraczek, Sławomir Siwek, taixiurong, Tao CHANG, TeFeng Chen, Thomas Young, thunder95, Thunderbrook, tiancaishaonvjituizi, tianshuo78520a, Tomasz Socha, TTerror, USTCKAY, Vigi Zhang, Walter, Wang Bojun, wangguanqun, wangguanzhong, wanghuancoder, wangna11BD, WangXi, wangxinxin08, Wangzheee, 
WangZhen, wangzhen38, wawltor, wbn, Wei Shengyu, Weilong Wu, weishengying, Wen Sun, wenbin, whs, Wilber, WJJ1995, wuhuachaocoding, wuhuanzhou, wuyefeilin, XiaoguangHu, xiaoguoguo626807, xiaohemaikoo, xiaoting, xiaoxiaohehe001, Xiaoxu Chen, xiayanming, Xingyuan Zhang, xiongkun, yang131313, yangguohao, YangZhou, Yanxing Shi, Yao Zihang, yaoxuefeng, yaozhixin, yeliang2258, Yilingyelu, Yiqun Liu, ykkk2333, Yuang Liu, Yuanle Liu, YuanRisheng, yuguo, Yulong Ao, Yulv-git, YUNSHEN XIE, Zhang Jun, Zhang Ting, Zhang Zheng, zhangbo9674, zhangbopd, zhangchunle, Zhangjingyu06, zhangkaihuo, zhangxiaoci, zhangyikun02, zhangzhenguo, Zhanlue Yang, zhaocaibei123, zhaoying9105, zhaoyingli, Zhen Wang, Zhengyang Song, zhiboniu, Zhong Hui, Zhou Wei, zhoutianzi666, zhupengyang, ziyoujiyi, zlsh80826, zmxdream, zn, Zuza Gawrysiak, zyfncg, 傅剑寒, 六个骨头, 津, 熊峻峰, 王明冬, 石晓伟
+
# 2.3.1 Release Note
diff --git a/docs/release_note_en.md b/docs/release_note_en.md
index c98913471db..9f9bec6a0a0 100644
--- a/docs/release_note_en.md
+++ b/docs/release_note_en.md
@@ -1,3 +1,246 @@
+# 2.4.1 Release Note
+
+
+Removed the dependency of Paddle on python.so, fixing the bug that execution fails in specific environments, including conda, due to the inability to find python.so.
+
+
+# 2.4.0 Release Note
+
+## 1. Important Updates
+
+- **New dynamic graph architecture is officially effective**: The new dynamic graph framework has significantly improved the scheduling performance. The scheduling performance of more than 90% of APIs is improved by over 50%, and the model performance of more than 50% of kits is improved by over 5%. The functional architecture is clearer, and the secondary development capability and experience are significantly enhanced.
+
+- **Comprehensive improvement of the dynamic-static unification ability of PaddlePaddle**: The dynamic-to-static function supports richer Python syntax, with the Python syntax coverage of PaddlePaddle reaching 90%. The syntax transcription logic is optimized to fully support the control-flow syntax, providing a smooth, one-click dynamic-to-static experience. With the newly upgraded static graph executor, dynamic-to-static training has better acceleration capability, and key model tests show it is close to the best level of the static graph. The dynamic-to-static scalability is improved, with new support for multi-function merge export and inference, so users can use the PHI operator library for secondary development and flexible deployment. This effectively supports the custom decoding of U2++ featured models in the speech domain.
+
+- **Add sparse computing APIs**: Add 55 sparse APIs `paddle.sparse.*`, supporting mainstream sparse computing scenarios. The APIs have been applied to sparse training and inference deployment for 3D point cloud target detection, Sparse Transformers, and other tasks, with a speedup of 105.75% compared to DenseTensor in highly sparse scenarios. Compared to similar products, the speed of sparse computing is increased by 4.01%-58.55%. Support the computing of a variety of sparse Tensors (SparseCoo and SparseCsr), minimizing video memory usage while keeping a consistent usage experience, with the same usage as the dense Tensor API.
+
+- **Large-scale graph neural network GPU training engine**: Through the heterogeneous hierarchical storage technology of SSD, memory, and video memory, it breaks through the video memory bottleneck and supports all-GPU storage and training of super-large-scale graphs. It realizes the all-GPU integrated solution of walk, sampling and training. This can increase the training speed by more than 10x under the same costs, compared to the traditional distributed CPU solution.
+
+- **Environment adaptation**: Add a pre-compiled installer adapted to CUDA version 11.7, and add support for running on Ubuntu 22.04 and later.
+
+### Forward-looking forecast
+
+- The PaddlePaddle framework will deprecate support for Python 3.6 in version 2.5.
+- The PaddlePaddle framework will gradually deprecate the APIs under the `paddle.fluid` namespace on the Python side, and some of the APIs under this namespace will be removed outright in version 2.5.
+
+## 2. Incompatibility upgrade
+
+- The pre-compiled installer for CUDA version 10.1 is cancelled.
+- The `Tensor.clear_gradient(bool set_to_zero)` interface will no longer accept the value passed via kwargs; the bool variable `set_to_zero` must be passed through args.
+- In order to improve the utilization efficiency of video memory, only the gradients of forward leaf node variables, such as the gradients of network parameters in training, are retained in the dynamic graph by default, instead of the gradients of non-leaf nodes. If you need to preserve a specific Tensor gradient, you can call the Tensor.retain_grads() interface before reverse execution.
+- `paddle.autograd.PyLayer` will no longer support the case where the input is a tuple; pass in a list of Tensors if you want a group of them.
+
+## 3. Training framework (including the distributed feature)
+
+### (1)New APIs and enhanced API functions
+- **Add the sparse computing class API**: paddle.sparse
+  - Add 55 sparse APIs, supporting mainstream sparse computing scenarios. The APIs have been applied to sparse training and inference deployment for 3D point cloud target detection, Sparse Transformers, and other tasks, with a speedup of 105.75% compared to DenseTensor in highly sparse scenarios. Compared to similar products, the speed of sparse computing is increased by 4.01%-58.55%. Support the computing of a variety of sparse Tensors (SparseCoo and SparseCsr), minimizing video memory usage while keeping a consistent usage experience, with the same usage as the dense Tensor API. [#45849](https://github.com/PaddlePaddle/Paddle/pull/45849), [#46694](https://github.com/PaddlePaddle/Paddle/pull/46694), [#45086](https://github.com/PaddlePaddle/Paddle/pull/45086), [#41857](https://github.com/PaddlePaddle/Paddle/pull/41857), [#42935](https://github.com/PaddlePaddle/Paddle/pull/42935), [#43475](https://github.com/PaddlePaddle/Paddle/pull/43475), [#43668](https://github.com/PaddlePaddle/Paddle/pull/43668), [#43966](https://github.com/PaddlePaddle/Paddle/pull/43966), [#44022](https://github.com/PaddlePaddle/Paddle/pull/44022), [#44346](https://github.com/PaddlePaddle/Paddle/pull/44346), [#44432](https://github.com/PaddlePaddle/Paddle/pull/44432), [#44451](https://github.com/PaddlePaddle/Paddle/pull/44451), [#44743](https://github.com/PaddlePaddle/Paddle/pull/44743), [#42013](https://github.com/PaddlePaddle/Paddle/pull/42013), [#43520](https://github.com/PaddlePaddle/Paddle/pull/43520), [#41434](https://github.com/PaddlePaddle/Paddle/pull/41434), [#42130](https://github.com/PaddlePaddle/Paddle/pull/42130), [#41276](https://github.com/PaddlePaddle/Paddle/pull/41276), [#41857](https://github.com/PaddlePaddle/Paddle/pull/41857), [#41356](https://github.com/PaddlePaddle/Paddle/pull/41356)
+- **Add the audio field API:** paddle.audio
+ - Add the feature extraction APIs such as MFCC, Spectrogram, and LogMelSpectrogram. Support the GPU computing. The performance increases by more than 15x compared to the CPU. This can significantly improve the GPU utilization in speech model training.[#45424](https://github.com/PaddlePaddle/Paddle/pull/45424)
+ - Add the feature extraction basic APIs such as Window Function and Discrete Cosine Transform. This can facilitate users to customize the speech feature extraction.[#45424](https://github.com/PaddlePaddle/Paddle/pull/45424)
+ - Add the speech I/O module. It provides 2 types of audio I/O backend and supports 6 types of codecs for convenient loading of speech data. [#45939](https://github.com/PaddlePaddle/Paddle/pull/45939)
+ - Add TESS and ESC50 speech classification datasets. It is convenient for users to complete the classical speech classification model.[#45939](https://github.com/PaddlePaddle/Paddle/pull/45939)
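+
+The mel scale underlying the MelSpectrogram/MFCC features above is a nonlinear warp of frequency. As an illustration, one common (HTK-style) definition of the Hz-to-mel mapping and its inverse is sketched below; the library's own `hz_to_mel`/`mel_to_hz` functions may default to a different variant (e.g. Slaney's), so treat this as a sketch of the idea rather than the exact formula used:
+
+```python
+# HTK-style Hz <-> mel conversion (one common definition; Paddle's
+# functions may use a different variant such as Slaney's scale).
+import math
+
+def hz_to_mel(f):
+    return 2595.0 * math.log10(1.0 + f / 700.0)
+
+def mel_to_hz(m):
+    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
+
+# the mapping is monotonic and invertible
+assert hz_to_mel(1000.0) > hz_to_mel(500.0)
+assert abs(mel_to_hz(hz_to_mel(440.0)) - 440.0) < 1e-9
+```
+
+Mel filter banks built from this mapping space filters evenly in mel units, which is why low frequencies get finer resolution than high ones.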
+- **Add the graph learning domain API:** paddle.geometric
+ - Graph learning is gradually becoming a key technology in the field of machine learning. The new paddle.geometric module of PaddlePaddle provides a better modeling and training development experience of graph learning.
+ - Message passing: The message passing mechanism of the graph learning is the basis of graph modeling. We add 7 graph learning message passing APIs to make it more convenient to complete the modeling of the graph learning. Among them, 3 newly added message passing fusion operators can significantly reduce the GPU memory consumption in the GNN model training. In the dense graph scenarios, more than 50% of GPU memory can be saved in the models of GCN series, and the training speed can increase by more than 20%.[#44848](https://github.com/PaddlePaddle/Paddle/pull/44848), [#44580](https://github.com/PaddlePaddle/Paddle/pull/44580), [#43174](https://github.com/PaddlePaddle/Paddle/pull/43174), [#44970](https://github.com/PaddlePaddle/Paddle/pull/44970)
+ - Graph sampling: Graph sampling is the performance bottleneck of GNN model training. This newly added high-performance graph sampling operator supports high concurrent graph sampling. It can increase the sampling speed of GraphSage by more than 32 times and the model training speed by more than 12 times.[#44970](https://github.com/PaddlePaddle/Paddle/pull/44970)
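+
+The message passing described above is, at its core, a gather/scatter-add over edges: each destination node aggregates the features of its source neighbors. A pure-Python toy version of sum aggregation is sketched below; per the note above, the fused paddle.geometric operators implement this pattern on GPU without materializing per-edge messages:
+
+```python
+# Toy sum-aggregation message passing over directed edges (src -> dst).
+# Illustrative only; paddle.geometric's fused kernels do this on GPU.
+def send_recv_sum(edges, feats):
+    out = [0.0] * len(feats)
+    for src, dst in edges:
+        out[dst] += feats[src]   # gather from src, scatter-add into dst
+    return out
+
+edges = [(0, 2), (1, 2), (2, 0)]
+print(send_recv_sum(edges, [1.0, 2.0, 3.0]))   # -> [3.0, 0.0, 3.0]
+```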
+- **Add the vision domain API**
+  - Add target detection domain operators to paddle.vision ([#43736](https://github.com/PaddlePaddle/Paddle/pull/43736)): paddle.vision.generate_proposals([#43611](https://github.com/PaddlePaddle/Paddle/pull/43611)), paddle.vision.matrix_nms([#44357](https://github.com/PaddlePaddle/Paddle/pull/44357)), paddle.vision.prior_box and paddle.vision.box_coder( [#47282](https://github.com/PaddlePaddle/Paddle/pull/47282) ).
+
+- **Add other APIs**
+ - Add the iinfo([#45321](https://github.com/PaddlePaddle/Paddle/pull/45321)), count_nonzero([#44169](https://github.com/PaddlePaddle/Paddle/pull/44169)), nanmedian([#42385](https://github.com/PaddlePaddle/Paddle/pull/42385)), remainder\_ ([#45266](https://github.com/PaddlePaddle/Paddle/pull/45266)), take([#44741](https://github.com/PaddlePaddle/Paddle/pull/44741)), triu_indices([#45168](https://github.com/PaddlePaddle/Paddle/pull/45168)), sgn([#44568](https://github.com/PaddlePaddle/Paddle/pull/44568)), bucketize([#44195](https://github.com/PaddlePaddle/Paddle/pull/44195)), nanquantile([#41343](https://github.com/PaddlePaddle/Paddle/pull/41343)), frac([#41226](https://github.com/PaddlePaddle/Paddle/pull/41226)), logcumsumexp([#42267](https://github.com/PaddlePaddle/Paddle/pull/42267)), pairwise_distance([#44161](https://github.com/PaddlePaddle/Paddle/pull/44161)), heaviside([#41872](https://github.com/PaddlePaddle/Paddle/pull/41872)), logspace([#41261](https://github.com/PaddlePaddle/Paddle/pull/41261)), corrcoef([#40690](https://github.com/PaddlePaddle/Paddle/pull/40690))
+  - Add the RReLU([#41823](https://github.com/PaddlePaddle/Paddle/pull/41823)), CyclicLR([#40698](https://github.com/PaddlePaddle/Paddle/pull/40698)), OneCycleLR([#41825](https://github.com/PaddlePaddle/Paddle/pull/41825)), Softmax2D([#40910](https://github.com/PaddlePaddle/Paddle/pull/40910)), SoftMarginLoss([#42364](https://github.com/PaddlePaddle/Paddle/pull/42364)), MultiLabelSoftMarginLoss([#41183](https://github.com/PaddlePaddle/Paddle/pull/41183)), TripletMarginLoss([#40487](https://github.com/PaddlePaddle/Paddle/pull/40487)), TripletMarginWithDistanceLoss([#40545](https://github.com/PaddlePaddle/Paddle/pull/40545)), CosineEmbeddingLoss and cosine_embedding_loss([#41680](https://github.com/PaddlePaddle/Paddle/pull/41680)), PixelUnshuffle([#40728](https://github.com/PaddlePaddle/Paddle/pull/40728)), ChannelShuffle([#40743](https://github.com/PaddlePaddle/Paddle/pull/40743))
+- **Enhanced API functions**
+ - Add the large batch_size calculation function of BatchNorm1D [#43072](https://github.com/PaddlePaddle/Paddle/pull/43072)
+- **Optimize the collective communications distributed training API**
+ - Optimize the `fleet.init` function, and add the `log_level` parameter to facilitate users to view logs during operation [#45909](https://github.com/PaddlePaddle/Paddle/pull/45909)
+  - Add the `paddle.distributed.fleet.recompute_sequential` and `paddle.distributed.fleet.recompute_hybrid` interfaces, making it convenient for users to use the recompute function [#45348](https://github.com/PaddlePaddle/Paddle/pull/45348)
+  - Add the `paddle.distributed.fleet.layers.mpu` package, making it convenient for users to use the tensor parallel function [#45803](https://github.com/PaddlePaddle/Paddle/pull/45803)
+  - Add the communication APIs `paddle.distributed.destroy_process_group`, `paddle.distributed.isend`, `paddle.distributed.irecv`, and `paddle.distributed.all_to_all_single`, improving the completeness and ease of use of communication [#43918](https://github.com/PaddlePaddle/Paddle/pull/43918)
+  - Add the `paddle.distributed.stream` package. The performance is increased by 5% to 10% compared to the base version [#46023](https://github.com/PaddlePaddle/Paddle/pull/46023) [#45282](https://github.com/PaddlePaddle/Paddle/pull/45282)
+ - The communication API is added with the support of multiple data types such as `Char/Byte/Bool`. It improves the completeness and ease of use of communication [#45574](https://github.com/PaddlePaddle/Paddle/pull/45574) [#45440](https://github.com/PaddlePaddle/Paddle/pull/45440)
+  - The communication API asynchronous parameter is changed from `use_calc_stream` to `sync_op`, enhancing the semantic readability of the interface [#46493](https://github.com/PaddlePaddle/Paddle/pull/46493)
+- **Enhanced high-level API**
+  - The visual model ResNeXt in the high-level API is refactored to reuse the ResNet code. [#40588](https://github.com/PaddlePaddle/Paddle/pull/40588)
+  - The visual models Inceptionv3, MobileNetv1, MobileNetv2, and ShuffleNetv2 in the high-level API are improved. [#40431](https://github.com/PaddlePaddle/Paddle/pull/40431)
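+
+Among the newly added schedulers listed earlier, CyclicLR is built on the "triangular" policy: the learning rate rises linearly from a base value to a maximum over half a cycle, then falls back. A minimal sketch of that policy (illustrative only; not Paddle's exact API or defaults) is:
+
+```python
+# Triangular cyclic learning-rate policy (the idea behind CyclicLR-style
+# schedulers; an illustrative sketch, not Paddle's API).
+def triangular_lr(step, base_lr, max_lr, half_cycle):
+    cycle_pos = step % (2 * half_cycle)
+    # rise for half_cycle steps, then fall back toward base_lr
+    if cycle_pos < half_cycle:
+        frac = cycle_pos / half_cycle
+    else:
+        frac = 2.0 - cycle_pos / half_cycle
+    return base_lr + (max_lr - base_lr) * frac
+
+# peak at step 4, back to base at step 8 with half_cycle = 4
+lrs = [triangular_lr(s, 0.001, 0.01, 4) for s in range(9)]
+print([round(v, 5) for v in lrs])
+```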
+
+### (2)New functions and important upgrades
+
+- **The new dynamic graph architecture is officially launched**: The scheduling performance of the new dynamic graph framework is greatly improved compared with the original architecture. The scheduling performance of more than 90% of APIs is improved by over 50%, and the model performance of more than 50% of kits is improved by over 5%. The new dynamic graph architecture is clear, and the coupling is low. The learning and development costs of extension modules such as Hook and PyLayer are significantly reduced based on the new architecture. [#37550](https://github.com/PaddlePaddle/Paddle/pull/37550) , [#37574](https://github.com/PaddlePaddle/Paddle/pull/37574) , [#37813](https://github.com/PaddlePaddle/Paddle/pull/37813) , [#37926](https://github.com/PaddlePaddle/Paddle/pull/37926) , [#39192](https://github.com/PaddlePaddle/Paddle/pull/39192) , [#37599](https://github.com/PaddlePaddle/Paddle/pull/37599) , [#37406](https://github.com/PaddlePaddle/Paddle/pull/37406) , [#37466](https://github.com/PaddlePaddle/Paddle/pull/37466) , [#40945](https://github.com/PaddlePaddle/Paddle/pull/40945) , [#39989](https://github.com/PaddlePaddle/Paddle/pull/39989)
+
+- **High-order auto-differentiation mechanism**: To better support scientific computing and other scenarios, the PaddlePaddle framework has been further improved and optimized for higher-order auto-differentiation capabilities. At present, the `paddle.incubate.autograd` directory provides trial functions and APIs for forward/reverse higher-order auto-differentiation (currently in incubation; related functions and API signatures may change). If you intend to implement related models and explore the auto-differentiation mechanism by yourself, please read the [usage and limitations of higher-order auto-differentiation](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/incubate/autograd/Overview_cn.html) carefully. Specific upgrades include:
+ 1. Static graph higher-order differentiation mechanism upgrade. Through the base operator system and program transformation, it supports higher-order forward and reverse differentiation, with the availability of the compiler and distributed functions.[#41919](https://github.com/PaddlePaddle/Paddle/pull/41919), [#41201](https://github.com/PaddlePaddle/Paddle/pull/41201)
+ 2. Add the forward and reverse higher-order auto-differentiation API, `paddle.incubate.autograd.forward_grad`, `paddle.incubate.autograd.grad`. [#43354](https://github.com/PaddlePaddle/Paddle/pull/43354)
+ 3. Add 18 higher-order auto-differentiation operators:`sin`, `cos`, `exp`, `erf`, `abs`, `log`, `cast`, `where`, `equal`, `not_equal`, `greater_than`, `greater_equal`, `elementwise_pow` `square`, `elementwise_max`, `gelu`, `reduce_mean`, `size`. [#46184](https://github.com/PaddlePaddle/Paddle/pull/46184), [#46024](https://github.com/PaddlePaddle/Paddle/pull/46024), [#45888](https://github.com/PaddlePaddle/Paddle/pull/45888), [#45338](https://github.com/PaddlePaddle/Paddle/pull/45338), [#44345](https://github.com/PaddlePaddle/Paddle/pull/44345)
+ 4. Fix the existing bugs of the operators such as`elementwise_div`, `reduce_sum`, `p_norm`. [#46514](https://github.com/PaddlePaddle/Paddle/pull/46514), [#46184](https://github.com/PaddlePaddle/Paddle/pull/46184)
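+
+Forward-mode differentiation of the kind exposed by `paddle.incubate.autograd.forward_grad` can be understood through dual numbers: each value carries a tangent that is propagated by the chain rule. The sketch below illustrates the idea in plain Python; it is not Paddle's implementation:
+
+```python
+# Forward-mode autodiff via dual numbers (conceptual sketch only).
+import math
+
+class Dual:
+    def __init__(self, val, dot=0.0):
+        self.val, self.dot = val, dot        # value and tangent
+    def __add__(self, o):
+        return Dual(self.val + o.val, self.dot + o.dot)
+    def __mul__(self, o):                     # product rule
+        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)
+
+def sin(x):                                   # chain rule for sin
+    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)
+
+# d/dx [x * sin(x)] at x = 2.0 equals sin(2) + 2*cos(2)
+x = Dual(2.0, 1.0)                            # seed tangent of 1.0
+y = x * sin(x)
+assert abs(y.dot - (math.sin(2.0) + 2.0 * math.cos(2.0))) < 1e-12
+```
+
+Applying the same transformation again (duals of duals) yields second derivatives, which is the spirit of the higher-order mechanism described above.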
+- **Generic heterogeneous parameter server architecture**:
+  - The parameter server GPUGraph infrastructure is upgraded to meet the implementation needs of large-scale applications: The storage and training of large-scale graph neural networks based on traditional CPUs suffer from high cost, low stability, and poor performance. To overcome these problems, we have built a pure GPU graph training engine (PGLBox). Through the heterogeneous hierarchical storage technology of SSD, memory, and video memory, it supports the training of ultra-large-scale graph models. The training performance is improved by more than 10x compared with the CPU graph training engine at equal cost, and the task failure rate is extremely low. [#44594](https://github.com/PaddlePaddle/Paddle/pull/44594)
+ - Large-scale federation parameter server architecture: For large-scale personalized recommendation scenarios, the large-scale federation parameter server training is developed based on the heterogeneous PS infrastructure, to support horizontal and vertical federation under hundreds of billions of parameters. It includes two features: User private parameters updated locally and public parameters updated remotely. Users can flexibly configure the slicing policy for private and public parameters. A new central scheduling node Coordinator is added. Users can perform secondary development from the base class to customize the Client selection policy. [#42682](https://github.com/PaddlePaddle/Paddle/pull/42682) , [#44864](https://github.com/PaddlePaddle/Paddle/pull/44864) , [#44327](https://github.com/PaddlePaddle/Paddle/pull/44327)
+- **Adaptive parallel**
+ - Design and launch a complete automatic parallelism interface system: Support automatic dynamic-to-static distributed training, automatic distributed data loading, automatic distributed saving and loading, automatic parameter conversion, custom slice markers, and custom execution processes. Users can easily obtain automatic distributed training capability from a single-machine network definition. It supports data parallel, model parallel, pipeline parallel, and hybrid parallel. [#45776](https://github.com/PaddlePaddle/Paddle/pull/45776) ,[#46552](https://github.com/PaddlePaddle/Paddle/pull/46552) , [#44202](https://github.com/PaddlePaddle/Paddle/pull/44202) , [#45840](https://github.com/PaddlePaddle/Paddle/pull/45840) , [#45518](https://github.com/PaddlePaddle/Paddle/pull/45518) , [#40528](https://github.com/PaddlePaddle/Paddle/pull/40528), [#42838](https://github.com/PaddlePaddle/Paddle/pull/42838), [#43093](https://github.com/PaddlePaddle/Paddle/pull/43093), [#43312](https://github.com/PaddlePaddle/Paddle/pull/43312), [#45053](https://github.com/PaddlePaddle/Paddle/pull/45053).
+ - Improve the underlying adaptive parallel mechanism, including the upgrade of the distributed cost model design and implementation, to provide better evaluation of the slicing policy. Add the native distributed properties to Program IR and enrich the Cluster functions. [#40457](https://github.com/PaddlePaddle/Paddle/pull/40457) , [#42601](https://github.com/PaddlePaddle/Paddle/pull/42601) , [#42727](https://github.com/PaddlePaddle/Paddle/pull/42727) , [#42874](https://github.com/PaddlePaddle/Paddle/pull/42784) , [#43114](https://github.com/PaddlePaddle/Paddle/pull/43114) , [#44095](https://github.com/PaddlePaddle/Paddle/pull/44095) , [#44146](https://github.com/PaddlePaddle/Paddle/pull/44146) , [#44701](https://github.com/PaddlePaddle/Paddle/pull/44701) , [#44973](https://github.com/PaddlePaddle/Paddle/pull/44973) , [#45002](https://github.com/PaddlePaddle/Paddle/pull/45002) , [#45118](https://github.com/PaddlePaddle/Paddle/pull/45118) , [#45237](https://github.com/PaddlePaddle/Paddle/pull/45237) , [#42576](https://github.com/PaddlePaddle/Paddle/pull/42576) , [#41722](https://github.com/PaddlePaddle/Paddle/pull/41722) , [#44150](https://github.com/PaddlePaddle/Paddle/pull/44150) , [#44989](https://github.com/PaddlePaddle/Paddle/pull/44989), [#44951](https://github.com/PaddlePaddle/Paddle/pull/44951), [#44963](https://github.com/PaddlePaddle/Paddle/pull/44963) .
+ - Add the Sharding stage 1/2/3 AutoTuning feature under data parallel. This allows automatic selection of the highest-throughput Sharding stage policy while ensuring that the video memory constraints are met. [#43782](https://github.com/PaddlePaddle/Paddle/pull/43782) .
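+
+The idea behind this AutoTuning step can be sketched as follows (illustrative only; the stage names and the throughput/memory numbers below are hypothetical, not Paddle's actual tuner):

```python
# Hypothetical sketch of Sharding-stage AutoTuning: among candidate stages
# whose estimated memory fits the device budget, pick the one with the
# highest estimated throughput. All names and numbers are illustrative.

def pick_sharding_stage(candidates, memory_budget_gb):
    """candidates: dict mapping stage name -> (throughput, memory_gb)."""
    feasible = {s: tput for s, (tput, mem) in candidates.items()
                if mem <= memory_budget_gb}
    if not feasible:
        raise ValueError("no sharding stage fits the memory budget")
    return max(feasible, key=feasible.get)

candidates = {
    "stage1": (120.0, 38.0),  # fastest, but over a 32 GB budget
    "stage2": (100.0, 28.0),
    "stage3": (80.0, 18.0),
}
print(pick_sharding_stage(candidates, memory_budget_gb=32.0))  # stage2
```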
+
+- **Training hardware access - Plug-in solutions**: Add custom Runtime/Kernel/CCL/Graph/Pass solutions. Hardware vendors can choose which modules to implement on demand, based on their hardware characteristics.
+
+- **ONNX format export**
+ - Support the quantized model export. The exported ONNX model can be loaded for inference with TensorRT or ONNX Runtime, obtaining about 1.5x-4x inference acceleration. [#856](https://github.com/PaddlePaddle/Paddle2ONNX/pull/856), [#782](https://github.com/PaddlePaddle/Paddle2ONNX/pull/782)
+ - Add support for exporting models larger than 2 GB. [#942](https://github.com/PaddlePaddle/Paddle2ONNX/pull/942)
+
+### (3)Function optimization
+- **Comprehensive increase of dynamic-to-static analysis conversion & extension capabilities**
+ - In order to improve the success rate and experience of model dynamic-to-static conversion, the transcription logic of control flow syntax is reconstructed. The core syntax has been upgraded to the JIT (just-in-time) paradigm to achieve equivalent transcription with Python code. Syntax features such as break, return, and continue are improved. [#43666](https://github.com/PaddlePaddle/Paddle/pull/43666) , [#43846](https://github.com/PaddlePaddle/Paddle/pull/43846) , [#43848](https://github.com/PaddlePaddle/Paddle/pull/43848) , [#43880](https://github.com/PaddlePaddle/Paddle/pull/43880) , [#43957](https://github.com/PaddlePaddle/Paddle/pull/43957) , [#43328](https://github.com/PaddlePaddle/Paddle/pull/43328) , [#43348](https://github.com/PaddlePaddle/Paddle/pull/43348) , [#43998](https://github.com/PaddlePaddle/Paddle/pull/43998) , [#44465](https://github.com/PaddlePaddle/Paddle/pull/44465) , [#44504](https://github.com/PaddlePaddle/Paddle/pull/44504) , [#43713](https://github.com/PaddlePaddle/Paddle/pull/43713) , [#43864](https://github.com/PaddlePaddle/Paddle/pull/43864) , [#43967](https://github.com/PaddlePaddle/Paddle/pull/43967) , [#44155](https://github.com/PaddlePaddle/Paddle/pull/44155) , [#44487](https://github.com/PaddlePaddle/Paddle/pull/44487) , [#44527](https://github.com/PaddlePaddle/Paddle/pull/44527) , [#45105](https://github.com/PaddlePaddle/Paddle/pull/45105) , [#45900](https://github.com/PaddlePaddle/Paddle/pull/45900)
+ - In order to support flexible deployment scenarios such as custom speech decoding, the jit.save/load interface is extended to support merging and exporting multiple user functions. A new JITLayer component is added to support the invocation of class functions. Meanwhile, the custom inference deployment function is implemented with the PHI operator library C++ API. [#44283](https://github.com/PaddlePaddle/Paddle/pull/44283), [#41783](https://github.com/PaddlePaddle/Paddle/pull/41783), [#43607](https://github.com/PaddlePaddle/Paddle/pull/43607), [#43754](https://github.com/PaddlePaddle/Paddle/pull/43754), [#43758](https://github.com/PaddlePaddle/Paddle/pull/43758), [#43798](https://github.com/PaddlePaddle/Paddle/pull/43798), [#44010](https://github.com/PaddlePaddle/Paddle/pull/44010), [#44351](https://github.com/PaddlePaddle/Paddle/pull/44351), [#44465](https://github.com/PaddlePaddle/Paddle/pull/44465), [#44504](https://github.com/PaddlePaddle/Paddle/pull/44504), [#44597](https://github.com/PaddlePaddle/Paddle/pull/44597), [#44738](https://github.com/PaddlePaddle/Paddle/pull/44738), [#44984](https://github.com/PaddlePaddle/Paddle/pull/44984), [#46249](https://github.com/PaddlePaddle/Paddle/pull/46249)
+ - In order to unify API dynamic and static behaviors, 20 operators are upgraded to support variable attribute information of the Op in static graphs, ensuring consistent dynamic and static behaviors and improving the success rate of dynamic-to-static conversion of models. These include `pad2d`,`depthwise_conv2d_transpose`,`conv2d_transpose`,`adaptive_avg_pool2d`,`reverse`,`bincount`,`multinomial`,`reduce_sum`,`reduce_mean`,`reduce_prod`,`reduce_min`,`reduce_max`,`uniform`,`squeeze`,`max_unpool2d`,`dropout`,`cumsum`,`eye`,`argmin`,`argmax`. [#44737](https://github.com/PaddlePaddle/Paddle/pull/44737), [#45084](https://github.com/PaddlePaddle/Paddle/pull/45084), [#45189](https://github.com/PaddlePaddle/Paddle/pull/45189), [#45391](https://github.com/PaddlePaddle/Paddle/pull/45391), [#45417](https://github.com/PaddlePaddle/Paddle/pull/45417), [#45427](https://github.com/PaddlePaddle/Paddle/pull/45427), [#45514](https://github.com/PaddlePaddle/Paddle/pull/45514), [#45525](https://github.com/PaddlePaddle/Paddle/pull/45525), [#45543](https://github.com/PaddlePaddle/Paddle/pull/45543), [#45660](https://github.com/PaddlePaddle/Paddle/pull/45660), [#46352](https://github.com/PaddlePaddle/Paddle/pull/46352/), [#46433](https://github.com/PaddlePaddle/Paddle/pull/46433), [#45078](https://github.com/PaddlePaddle/Paddle/pull/45078), [#45342](https://github.com/PaddlePaddle/Paddle/pull/45342), [#45372](https://github.com/PaddlePaddle/Paddle/pull/45372), [#45453](https://github.com/PaddlePaddle/Paddle/pull/45453), [#45522](https://github.com/PaddlePaddle/Paddle/pull/45522), [#45620](https://github.com/PaddlePaddle/Paddle/pull/45620)
+ - In order to solve the problem of occasional loss of the error reporting stack in user dynamic-to-static conversion, the logic of the error reporting module is optimized to improve the readability of the error stack and the user debugging experience. [#44054](https://github.com/PaddlePaddle/Paddle/pull/44054), [#44083](https://github.com/PaddlePaddle/Paddle/pull/44083), [#44781](https://github.com/PaddlePaddle/Paddle/pull/44781), [#44996](https://github.com/PaddlePaddle/Paddle/pull/44996)
+ - Add the TypeHint syntax recognition and transcription module to fully support Python Type Hint syntax. [#47121](https://github.com/PaddlePaddle/Paddle/pull/47121)
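+
+As a rough illustration of what the control-flow transcription above has to do, a Python `break` has no direct static-graph counterpart, so it is rewritten into an equivalent flag-guarded form. The plain-Python sketch below only demonstrates the equivalence; the real pass emits Paddle's control-flow operators rather than Python loops:

```python
# Illustrative only: dynamic-to-static must rewrite Python control flow such
# as `break` into a form expressible in a static graph. A `break` can be
# simulated with a boolean flag guarding the loop body, which is roughly
# what the transcription does (the real pass targets graph operators).

def first_negative_dynamic(xs):
    found = None
    for x in xs:
        if x < 0:
            found = x
            break                     # dynamic Python control flow
    return found

def first_negative_transcribed(xs):
    keep_running = True               # flag replacing `break`
    found = None
    for x in xs:
        if keep_running and x < 0:
            found = x
            keep_running = False      # disables the loop body from here on
    return found

assert first_negative_dynamic([3, -1, -5]) == first_negative_transcribed([3, -1, -5]) == -1
```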
+
+- **PHI operator library covering all arithmetic operators**: Continuously build the highly reusable PHI operator library. The remaining arithmetic Python API-associated operators and related kernels of PaddlePaddle 2.x are migrated to the PHI operator library and rewritten in functional style. Add about 180 forward/reverse operator CPU & GPU kernels, and 170 Kunlun-specific arithmetic kernels. This further enhances the set of kernel functions that can be reused when new operators are added. In addition, add more than 100 C++ arithmetic APIs. These APIs can be used in custom operators, further improving the ease of external extension development based on PaddlePaddle. [#44577](https://github.com/PaddlePaddle/Paddle/pull/44577), [#44631](https://github.com/PaddlePaddle/Paddle/pull/44631), [#44434](https://github.com/PaddlePaddle/Paddle/pull/44434), [#44605](https://github.com/PaddlePaddle/Paddle/pull/44605), [#44676](https://github.com/PaddlePaddle/Paddle/pull/44676), [#44742](https://github.com/PaddlePaddle/Paddle/pull/44742), [#44436](https://github.com/PaddlePaddle/Paddle/pull/44436) , [#45887](https://github.com/PaddlePaddle/Paddle/pull/45887), [#45851](https://github.com/PaddlePaddle/Paddle/pull/45851), [#45623](https://github.com/PaddlePaddle/Paddle/pull/45623), [#45397](https://github.com/PaddlePaddle/Paddle/pull/45397), [#45863](https://github.com/PaddlePaddle/Paddle/pull/45863)
+
+- **Normalized operator definitions, significantly improving model simplicity**: To address the many redundant parameters in PaddlePaddle 1.x historical operator definitions and the high cost of understanding and adapting to them, the redundant parameters of about 150 high-frequency operators are cleaned up centrally; essentially, the mathematically irrelevant parameters are removed. After this cleanup, the amount of information in the inference models stored by PaddlePaddle is significantly reduced: about 40% of the attribute variables are removed in general, significantly improving the clarity of PaddlePaddle operator definitions and the experience of model analysis and debugging. Meanwhile, the size of the inference model stored by PaddlePaddle is also reduced by more than 70%, making PaddlePaddle models significantly more lightweight. [#44310](https://github.com/PaddlePaddle/Paddle/pull/44310) , [#45613](https://github.com/PaddlePaddle/Paddle/pull/45613) , [#45684](https://github.com/PaddlePaddle/Paddle/pull/45684) , [#45708](https://github.com/PaddlePaddle/Paddle/pull/45708) , [#45758](https://github.com/PaddlePaddle/Paddle/pull/45758) , [#45786](https://github.com/PaddlePaddle/Paddle/pull/45786) , [#45772](https://github.com/PaddlePaddle/Paddle/pull/45772) , [#45845](https://github.com/PaddlePaddle/Paddle/pull/45845) , [#45984](https://github.com/PaddlePaddle/Paddle/pull/45984) , [#46218](https://github.com/PaddlePaddle/Paddle/pull/46218) , [#46553](https://github.com/PaddlePaddle/Paddle/pull/46553)
+
+### (4)Performance optimization
+
+- AMP performance and accuracy optimization
+ - FP16 support is added for more operators, including the elementwise series operators, compare series operators, strided_slice, set_value, uniform_random, etc. ([#45504](https://github.com/PaddlePaddle/Paddle/pull/45504), [#44405](https://github.com/PaddlePaddle/Paddle/pull/44405), [#45496](https://github.com/PaddlePaddle/Paddle/pull/45496), [#46641](https://github.com/PaddlePaddle/Paddle/pull/46641), [#46906](https://github.com/PaddlePaddle/Paddle/pull/46906))
+ - Optimize the implementation scheme of the hard_swish operator FP16 kernel, guaranteeing accuracy without loss. ([#35386](https://github.com/PaddlePaddle/Paddle/pull/35386))
+ - More operators are added with the support of BF16 data types, including fused_linear, empty, selu, pow, adam, clip, embedding, gelu, pad3d, pixel_shuffle, tile, where, etc. [#46364](https://github.com/PaddlePaddle/Paddle/pull/46364), [#47177](https://github.com/PaddlePaddle/Paddle/pull/47177)
+- AutoTuning of single machine training performance
+ - The transpose OP supports an automatic kernel selection mechanism. This allows automatic search for the best kernel implementation for different model configurations, improving model performance. [#43310](https://github.com/PaddlePaddle/Paddle/pull/43310) (transpose OP accesses the AutoTuning function)
+ - AMP Layout auto-switching supports the new dynamic graph mode. For the ResNet50, TSM, and DeepLabV3 models, the performance increases by 9%-21% by Layout AutoTuning in the new dynamic graph. ([#45409](https://github.com/PaddlePaddle/Paddle/pull/45409), [#45751](https://github.com/PaddlePaddle/Paddle/pull/45751), [#45826](https://github.com/PaddlePaddle/Paddle/pull/45826), [#46880](https://github.com/PaddlePaddle/Paddle/pull/46880))
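+
+The kernel auto-selection idea above can be sketched in plain Python. This is an illustrative sketch only: `autotune_kernel` and the toy transpose kernels are hypothetical, and Paddle's actual AutoTuning additionally caches the chosen kernel per input configuration instead of re-timing every call:

```python
import time

def autotune_kernel(kernels, arg, warmup=1, repeats=3):
    """Time each candidate kernel on the given input, return the fastest."""
    best, best_time = None, float("inf")
    for kernel in kernels:
        for _ in range(warmup):          # warm-up runs, not timed
            kernel(arg)
        start = time.perf_counter()
        for _ in range(repeats):
            kernel(arg)
        elapsed = (time.perf_counter() - start) / repeats
        if elapsed < best_time:
            best, best_time = kernel, elapsed
    return best

# Two toy candidate "transpose" implementations over a nested list:
def transpose_zip(m):
    return [list(row) for row in zip(*m)]

def transpose_index(m):
    return [[m[r][c] for r in range(len(m))] for c in range(len(m[0]))]

matrix = [[1, 2, 3], [4, 5, 6]]
chosen = autotune_kernel([transpose_zip, transpose_index], matrix)
assert chosen(matrix) == [[1, 4], [2, 5], [3, 6]]
```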
+- Generic performance optimization of GPU single machine training
+ - Optimize the cache scheme of the conv operator cuDNN algorithm and cache the results in all algorithm acquisition methods. This can significantly reduce the CPU overhead of the operator. ([#41891](https://github.com/PaddlePaddle/Paddle/pull/41891), [#47197](https://github.com/PaddlePaddle/Paddle/pull/47197))
+ - Further optimize the GPU Kernel and Python side performance of multiple operators, including dist, poisson, depthwise_conv2d, transpose, eigh, broadcast computation, reduce computation, layer_norm, cross_entropy, etc. This can achieve better performance in more configuration scenarios. ([#44946](https://github.com/PaddlePaddle/Paddle/pull/44946), [#45057](https://github.com/PaddlePaddle/Paddle/pull/45057), [#45160](https://github.com/PaddlePaddle/Paddle/pull/45160), [#42491](https://github.com/PaddlePaddle/Paddle/pull/42491), [#42704](https://github.com/PaddlePaddle/Paddle/pull/42704), [#42853](https://github.com/PaddlePaddle/Paddle/pull/42853), [#46287](https://github.com/PaddlePaddle/Paddle/pull/46287), [#46362](https://github.com/PaddlePaddle/Paddle/pull/46362), [#46490](https://github.com/PaddlePaddle/Paddle/pull/46490), [#46412](https://github.com/PaddlePaddle/Paddle/pull/46412), [#46623](https://github.com/PaddlePaddle/Paddle/pull/46623), [#40051](https://github.com/PaddlePaddle/Paddle/pull/40051) )
+- Performance optimization of distributed training for collective communications
+ - To improve pipeline parallel scheduling efficiency, support the dynamic graph Interleaving 1F1B scheduling policy. In the GPT-3 model, the performance is improved by 3%-4%. [#45797](https://github.com/PaddlePaddle/Paddle/pull/45797) , [#45869](https://github.com/PaddlePaddle/Paddle/pull/45869) , [#45922](https://github.com/PaddlePaddle/Paddle/pull/45922) , [#46209](https://github.com/PaddlePaddle/Paddle/pull/46209) , [#45402](https://github.com/PaddlePaddle/Paddle/pull/45402) , [#45444](https://github.com/PaddlePaddle/Paddle/pull/45444) , [#45497](https://github.com/PaddlePaddle/Paddle/pull/45497) , [#46399](https://github.com/PaddlePaddle/Paddle/pull/46399) , [#46483](https://github.com/PaddlePaddle/Paddle/pull/46483) , [#46876](https://github.com/PaddlePaddle/Paddle/pull/46876) , [#47242](https://github.com/PaddlePaddle/Paddle/pull/47242) , [#47249](https://github.com/PaddlePaddle/Paddle/pull/47249) , [#47497](https://github.com/PaddlePaddle/Paddle/pull/47497) , [#47517](https://github.com/PaddlePaddle/Paddle/pull/47517)
+ - To improve the distributed training performance of the MLPerf BERT model, the DistributedFusedLamb distributed optimizer supports hierarchical AllReduce. It improves MLPerf BERT performance by 17% on 1024 DCU cards. [#44821](https://github.com/PaddlePaddle/Paddle/pull/44821) , [#44843](https://github.com/PaddlePaddle/Paddle/pull/44843)
+ - To optimize the video memory footprint when using DataParallel, the Buffer Lazy initialization policy for Tensor Fusion is supported, thus reducing the video memory footprint by the size of one copy of the model parameters. [#45631](https://github.com/PaddlePaddle/Paddle/pull/45631).
+ - Distributed parallel policies DataParallel and Sharding support BF16 training. [#46846](https://github.com/PaddlePaddle/Paddle/pull/46846) , [#47246](https://github.com/PaddlePaddle/Paddle/pull/47246)
+ - To support the Sequence Parallel policy, Distributed Pipeline Parallel supports the enable_partial_send_recv policy and the transmission of tensors sliced by sequence parallel. [#46992](https://github.com/PaddlePaddle/Paddle/pull/46992) , [#47083](https://github.com/PaddlePaddle/Paddle/pull/47083)
+ - To improve the performance of the sharding stage 2 policy, overlap the sharding stage 2 optimizer's parameter broadcast with the next step's forward computation, and use multiple CUDA streams for communication. In the GPT 6.7B model, the 16-card training performance is improved by 11%. [#46495](https://github.com/PaddlePaddle/Paddle/pull/46495) , [#46656](https://github.com/PaddlePaddle/Paddle/pull/46656) , [#47061](https://github.com/PaddlePaddle/Paddle/pull/47061)
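+
+The hierarchical AllReduce mentioned for DistributedFusedLamb above can be sketched with plain lists standing in for ranks: reduce inside each node first, exchange only among one leader per node, then broadcast back within the node, which divides cross-node traffic by the node size. An illustrative sketch, not Paddle's implementation:

```python
# Sketch of hierarchical AllReduce over scalar gradients, one value per
# rank. Step 1: intra-node reduce to a per-node leader. Step 2: inter-node
# allreduce among leaders only. Step 3: intra-node broadcast of the result.

def hierarchical_allreduce(values_per_rank, ranks_per_node):
    nodes = [values_per_rank[i:i + ranks_per_node]
             for i in range(0, len(values_per_rank), ranks_per_node)]
    node_sums = [sum(node) for node in nodes]   # step 1: intra-node reduce
    total = sum(node_sums)                      # step 2: leaders only
    return [total] * len(values_per_rank)       # step 3: broadcast back

assert hierarchical_allreduce([1, 2, 3, 4], ranks_per_node=2) == [10, 10, 10, 10]
```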
+
+### (5)Bug fix
+
+- Dynamic-to-static
+ - Fix the bug of reporting an error in dynamic-to-static of the model in a Parameter no-gradient scenario during multi-card training. [#44485](https://github.com/PaddlePaddle/Paddle/pull/44485)
+ - Fix the bug where redundant frame logs are mistakenly output to the terminal in dynamic-to-static. [#45754](https://github.com/PaddlePaddle/Paddle/pull/45754), [#46800](https://github.com/PaddlePaddle/Paddle/pull/46800)
+ - Fix the bug of reporting an error in the dynamic-to-static training when the control flow in the model contains a Tensor that does not require a gradient. [#43034](https://github.com/PaddlePaddle/Paddle/pull/43034)
+ - Fix the bug of incorrect computation value during gradient aggregation in the dynamic-to-static training. [#44893](https://github.com/PaddlePaddle/Paddle/pull/44893)
+ - Fix the bug of reporting an error in the dynamic-to-static when the function is decorated with @staticmethod. [#44983](https://github.com/PaddlePaddle/Paddle/pull/44983), [#45268](https://github.com/PaddlePaddle/Paddle/pull/45268), [#45277](https://github.com/PaddlePaddle/Paddle/pull/45277)
+ - Fix the bug of excessive video memory footprint in some dynamic-to-static training scenarios. [#45380](https://github.com/PaddlePaddle/Paddle/pull/45380)
+ - Fix the bug of reporting an error of dynamic-to-static shape derivation in the networking phase when the model contains a complex control flow. [#45916](https://github.com/PaddlePaddle/Paddle/pull/45916), [#46020](https://github.com/PaddlePaddle/Paddle/pull/46020)
+- Fix the error reporting mechanism
+ - Replace self.assertTrue(np.allclose(...)) with np.testing.assert_allclose to get fuller error reporting information ( [#44947](https://github.com/PaddlePaddle/Paddle/pull/44947), [#44988](https://github.com/PaddlePaddle/Paddle/pull/44988), [#45213](https://github.com/PaddlePaddle/Paddle/pull/45213))
+- Distributed training in collective communications
+ - Fix several bugs in communication library initialization and communication process, and enhance the system operation stability. [#44964](https://github.com/PaddlePaddle/Paddle/pull/44964) [#45100](https://github.com/PaddlePaddle/Paddle/pull/45100) [#44758](https://github.com/PaddlePaddle/Paddle/pull/44758)
+ - Fix the bug of frequent occurrences of hang in pipeline parallel, and enhance the ease of use of the policy [#47201](https://github.com/PaddlePaddle/Paddle/pull/47201); enhance the pipeline function to support unbalanced input. [#47199](https://github.com/PaddlePaddle/Paddle/pull/47199)
+ - Fix the bug that the performance of the new dynamic graph MP/PP policy is lower than the old dynamic graph. [#47071](https://github.com/PaddlePaddle/Paddle/pull/47071)
+ - Fix the bug that the sharding stage 2 policy incorrectly maintains the parameter trainable property. [#47240](https://github.com/PaddlePaddle/Paddle/pull/47240)
+ - Fix the bug in a series of OPs when the tensor numel is greater than INT32_MAX. [#45711](https://github.com/PaddlePaddle/Paddle/pull/45711), [#45741](https://github.com/PaddlePaddle/Paddle/pull/45741), [#45897](https://github.com/PaddlePaddle/Paddle/pull/45897), [#46158](https://github.com/PaddlePaddle/Paddle/pull/46158), [#46767](https://github.com/PaddlePaddle/Paddle/pull/46767), [#47191](https://github.com/PaddlePaddle/Paddle/pull/47191), [#46045](https://github.com/PaddlePaddle/Paddle/pull/46045), [#46160](https://github.com/PaddlePaddle/Paddle/pull/46160)
+ - Fix the bug of excessive video memory footprint in the FusedAttention and FusedFeedForward OPs. [#47236](https://github.com/PaddlePaddle/Paddle/pull/47236), [#47235](https://github.com/PaddlePaddle/Paddle/pull/47235)
+ - Fix the bug of incorrect parameter update in the multi_tensor_adam and multi_tensor_momentum OPs when the parameters passed in are a list of dict. [#47352](https://github.com/PaddlePaddle/Paddle/pull/47352), [#47372](https://github.com/PaddlePaddle/Paddle/pull/47372)
+
+## 4. Deployment direction (Paddle Inference)
+
+### (1)New features
+
+- Optimize the back-end graph engine integration scheme
+ - In order to reduce Paddle-TensorRT plugin code development and reduce the number of Paddle-TensorRT subgraphs, and thus reduce resource usage, a generic plugin mechanism has been developed, to automatically provide a unified TensorRT plugin interface for the rich PHI operators in the framework. As a result, the video memory footprint can be effectively reduced in most scenarios. [#46970](https://github.com/PaddlePaddle/Paddle/pull/46070), [#46179](https://github.com/PaddlePaddle/Paddle/pull/46179), [#46580](https://github.com/PaddlePaddle/Paddle/pull/46580)
+ - In order to let users customize operators in the framework and make Paddle-TensorRT perform efficient inference, the function is upgraded to support Paddle-TensorRT plugins for framework custom operators. [#46970](https://github.com/PaddlePaddle/Paddle/pull/46070)
+- Optimize the Inference library build system. The size can be pruned on demand
+ - Pre-compiled installer supports TensorRT by default: The pre-compiled installer for training and the pre-compiled installer for deployment (Paddle Inference) are unified into one pre-compiled installer. The build system is optimized so that the pre-compiled installer supports TensorRT by default, reducing the switching cost for users of Paddle-TensorRT. [#46008](https://github.com/PaddlePaddle/Paddle/pull/46008), [#45824](https://github.com/PaddlePaddle/Paddle/pull/45824), [#46058](https://github.com/PaddlePaddle/Paddle/pull/46058)
+ - The size can be pruned on demand: Pruned according to the model operator. [#47033](https://github.com/PaddlePaddle/Paddle/pull/47033) , [#47049](https://github.com/PaddlePaddle/Paddle/pull/47049) , [#47047](https://github.com/PaddlePaddle/Paddle/pull/47047)
+- Inference supports native AMP
+ - In order to make full use of GPU Tensor Core computation capability and improve the model inference performance, a model accuracy conversion tool has been developed. Paddle Inference on GPU natively supports the inference of mixed precision models. For usage, refer to the [documentation](https://github.com/PaddlePaddle/Paddle-Inference-Demo/blob/release/v2.4/docs-official/guides/nv_gpu_infer/gpu_mixed_precision.md). [#43814](https://github.com/PaddlePaddle/Paddle/pull/43814), [#43881](https://github.com/PaddlePaddle/Paddle/pull/43881), [#44057](https://github.com/PaddlePaddle/Paddle/pull/44057), [#44307](https://github.com/PaddlePaddle/Paddle/pull/44307), [#44457](https://github.com/PaddlePaddle/Paddle/pull/44457), [#44866](https://github.com/PaddlePaddle/Paddle/pull/44866), [#45050](https://github.com/PaddlePaddle/Paddle/pull/45050), [#45346](https://github.com/PaddlePaddle/Paddle/pull/45346), [#45379](https://github.com/PaddlePaddle/Paddle/pull/45379), [#45406](https://github.com/PaddlePaddle/Paddle/pull/45406), [#45882](https://github.com/PaddlePaddle/Paddle/pull/45882)
+ - In order to improve the inference performance of mixed precision models, the FP16 kernels of high-frequency operators that did not support FP16 computation are supplemented, thus reducing the possibility of inserting cast operators due to input precision mismatch. The inference performance is improved. [#44642](https://github.com/PaddlePaddle/Paddle/pull/44642), [#45061](https://github.com/PaddlePaddle/Paddle/pull/45061), [#44653](https://github.com/PaddlePaddle/Paddle/pull/44653), [#45504](https://github.com/PaddlePaddle/Paddle/pull/45504), [#44969](https://github.com/PaddlePaddle/Paddle/pull/44969), [#44558](https://github.com/PaddlePaddle/Paddle/pull/44558), [#44710](https://github.com/PaddlePaddle/Paddle/pull/44710), [#43871](https://github.com/PaddlePaddle/Paddle/pull/43871), [#44792](https://github.com/PaddlePaddle/Paddle/pull/44792)
+- Upgrade the compression and inference engine
+ - Upgrade the quantized model storage format. The new format supports three deployment methods: Paddle Inference, Paddle Lite, and Paddle2ONNX. The supported chips include X86 CPU, NVIDIA GPU, and Arm CPU. ([#46305](https://github.com/PaddlePaddle/Paddle/pull/46305), [#46283](https://github.com/PaddlePaddle/Paddle/pull/46283), [#46022](https://github.com/PaddlePaddle/Paddle/pull/46022) )
+ - Add the INT8 full quantization function compatible with SoC/NPU chips. This can ensure the output INT8 quantization model has the best inference acceleration and precision on SoC/NPU chips.
+ - Upgrade the interface module between the PaddlePaddle framework and the compiler, so that inference models can access the compiler for optimization via Paddle Inference. ([#44499](https://github.com/PaddlePaddle/Paddle/pull/44499), [#44708](https://github.com/PaddlePaddle/Paddle/pull/44708) )
+
+### (2)Underlying optimization
+
+- **GPU performance optimization**
+ - Add the TensorRT mapping for operators such as matmul_v2, LSTM, reshape, fill_constant, swish, multiclass_nms3, bilinear_interp_v2, split, silu, and shuffle_channel. Optimize the support for dynamic shape. Performance is improved by 7% to 90% for a number of key models. ([#46177](https://github.com/PaddlePaddle/Paddle/pull/46177), [#44678](https://github.com/PaddlePaddle/Paddle/pull/44678), [#44314](https://github.com/PaddlePaddle/Paddle/pull/44314), [#44561](https://github.com/PaddlePaddle/Paddle/pull/44561), [#45166](https://github.com/PaddlePaddle/Paddle/pull/45166), [#44411](https://github.com/PaddlePaddle/Paddle/pull/44411), [#43424](https://github.com/PaddlePaddle/Paddle/pull/43424), [#44516](https://github.com/PaddlePaddle/Paddle/pull/44516))
+ - Add constant folding PASS for inference performance optimization, to improve the performance of SwinTransformer, HifiGAN, FastSpeech2, and other models.([#45494](https://github.com/PaddlePaddle/Paddle/pull/45494))
+ - Add cache of the conv_fusion workspace size, to improve the computation performance of conv_fusion. ([#45902](https://github.com/PaddlePaddle/Paddle/pull/45902))
+- **Vision ViT model optimization**
+ - Add the ViT model Attention structure fusion PASS, and support OSSPlugin and auto padding. The ViT inference speed increases by 30%-40%. [#45019](https://github.com/PaddlePaddle/Paddle/pull/45019) [#45506](https://github.com/PaddlePaddle/Paddle/pull/45506)
+- **Inference performance optimization of large model**
+ - To improve the inference speed of very large generative models and save video memory, add an INT8 implementation (fused_multi_transformer_int8_op) of the multi-layer Transformer fusion operator (fused_multi_transformer_op), supporting quantized inference of generative models. Performance is optimized through matrix multiplication algorithm selection and quantize/de-quantize kernel fusion. [#46169](https://github.com/PaddlePaddle/Paddle/pull/46169)
+ - Add a Pass for automatic matching and fusion, to improve the ease of use of the fused_multi_transformer fusion for large model inference.
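+
+The INT8 path above rests on symmetric quantization. A minimal sketch of the quantize/de-quantize round trip, illustrative only, using a simple per-tensor max-abs scale:

```python
# Sketch of symmetric INT8 quantization as used for quantized inference:
# values are scaled into [-127, 127], computed in int8, and de-quantized
# back with the same scale. Illustrative only (per-tensor max-abs scale).

def quantize(xs):
    scale = max(abs(x) for x in xs) / 127.0
    q = [max(-127, min(127, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

xs = [0.5, -1.0, 0.25]
q, scale = quantize(xs)
assert all(isinstance(v, int) and -127 <= v <= 127 for v in q)
restored = dequantize(q, scale)
# the round-trip error is bounded by one quantization step
assert all(abs(a - b) <= scale for a, b in zip(xs, restored))
```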
+- **CPU performance optimization**
+ - Optimize the speech U2++ model. The FP32 model inference speed is improved by 35%. The INT8 model inference speed is improved by 69%. ([#47592](https://github.com/PaddlePaddle/Paddle/pull/47592), [#47127](https://github.com/PaddlePaddle/Paddle/pull/47127), [#47391](https://github.com/PaddlePaddle/Paddle/pull/47391), [#47234](https://github.com/PaddlePaddle/Paddle/pull/47234), [#47009](https://github.com/PaddlePaddle/Paddle/pull/47009), [#47080](https://github.com/PaddlePaddle/Paddle/pull/47080))
+
+
+### (3)Bug fix
+
+- TensorRT workspace size supports int64. ([#44469](https://github.com/PaddlePaddle/Paddle/pull/44469) )
+- In Paddle-TRT, fully support Op's input as weight.([#45545](https://github.com/PaddlePaddle/Paddle/pull/45545) )
+- In Paddle-TRT, support conv2d_transpose/conv3d_transpose to have the output_padding attribute.([#45004](https://github.com/PaddlePaddle/Paddle/pull/45004) )
+- In Paddle-TRT, enhance the strided_slice support for dynamic shape. ([#46819](https://github.com/PaddlePaddle/Paddle/pull/46819) )
+- In Paddle-TRT, optimize the video memory footprint of context when running in multi-thread scenarios.([#45468](https://github.com/PaddlePaddle/Paddle/pull/45468) )
+- In Paddle-TRT, fix the bug of repeatedly generating serialization files in case of change of initialization sequences when multiple models run in the same process.([#43942](https://github.com/PaddlePaddle/Paddle/pull/43942) )
+- Fix the bug of occasional crash when Predictor is initialized to run for multiple times in the same process.([#45203](https://github.com/PaddlePaddle/Paddle/pull/45203) )
+- Fix the bug of abnormal inference accuracy of quantization models such as MobileNetV3_large, ERNIE 3.0-Medium, and BERT ([#45416](https://github.com/PaddlePaddle/Paddle/pull/45416), [#46283](https://github.com/PaddlePaddle/Paddle/pull/46283), [#45920](https://github.com/PaddlePaddle/Paddle/pull/45920), [#47574](https://github.com/PaddlePaddle/Paddle/pull/47574))
+
+## 5. Environment adaptation
+
+- The pre-compiled installer for training and the pre-compiled installer for deployment (Paddle Inference) are unified into one pre-compiled installer. The build system is optimized so that the pre-compiled installer supports TensorRT by default.
+- The pre-compiled installer for CUDA version 10.1 is discontinued.
+- Add the pre-compiled installer for CUDA 11.7.
+- Reduced source compilation time: Reduce inter-module dependencies, improve parallelism, and optimize the compilation speed of some modules. The full compilation time is reduced by about 20 minutes in total.
+- Support running PaddlePaddle on Windows 11, CentOS 8, Ubuntu 22.04, and Jetson 5.02 system environments. Support running the PaddlePaddle Linux installer on Windows via the WSL 2 tool.
+- Fix the bug that PaddlePaddle fails to run in glibc 2.34+ environments.
+- Optimize the code style of C++, Python, CMake in the whole code repository. Introduce or upgrade the following code style checking tools.
+ - pre-commit is upgraded from 1.10.4 to 2.17.0: [#43103](https://github.com/PaddlePaddle/Paddle/pull/43103)
+ - pylint is changed from the default version to a pinned version: [#43103](https://github.com/PaddlePaddle/Paddle/pull/43103)
+ - remove-crlf is upgraded from 1.0.1 to 1.1.14 : [#43103](https://github.com/PaddlePaddle/Paddle/pull/43103)
+ - cpplint is changed from the default version to a pinned 1.6.0 : [#43175](https://github.com/PaddlePaddle/Paddle/pull/43175), [#43978](https://github.com/PaddlePaddle/Paddle/pull/43978), [#43673](https://github.com/PaddlePaddle/Paddle/pull/43673), [#43679](https://github.com/PaddlePaddle/Paddle/pull/43679), [#43695](https://github.com/PaddlePaddle/Paddle/pull/43695), [#43733](https://github.com/PaddlePaddle/Paddle/pull/43733), [#43740](https://github.com/PaddlePaddle/Paddle/pull/43740)
+ - clang-format is upgraded from 3.8 to 13.0 : [#42840](https://github.com/PaddlePaddle/Paddle/pull/42840), [#43248](https://github.com/PaddlePaddle/Paddle/pull/43248), [#43329](https://github.com/PaddlePaddle/Paddle/pull/43329), [#43333](https://github.com/PaddlePaddle/Paddle/pull/43333), [#43633](https://github.com/PaddlePaddle/Paddle/pull/43633), [#43678](https://github.com/PaddlePaddle/Paddle/pull/43678)
+ - Introduce the black tool for python code style checking :[#46014](https://github.com/PaddlePaddle/Paddle/pull/46014)
+ - Introduce the cmakelint tool for cmake file code checking. Version is 1.4.2 : [#43222](https://github.com/PaddlePaddle/Paddle/pull/43222), [#43406](https://github.com/PaddlePaddle/Paddle/pull/43406), [#43414](https://github.com/PaddlePaddle/Paddle/pull/43414), [#43428](https://github.com/PaddlePaddle/Paddle/pull/43428)
+ - Introduce cmake-format for automatic formatting of cmake files. Version is 0.6.13 : [#43057](https://github.com/PaddlePaddle/Paddle/pull/43057)
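Version pins like those above are typically expressed in a repository's `.pre-commit-config.yaml`. A minimal hypothetical sketch of such a file follows; the hook repositories and revisions shown are illustrative only, not Paddle's actual configuration:

```yaml
# Hypothetical .pre-commit-config.yaml sketch; repos and revs are illustrative.
repos:
  - repo: https://github.com/pre-commit/mirrors-clang-format
    rev: v13.0.0            # pin clang-format 13.0
    hooks:
      - id: clang-format
  - repo: https://github.com/psf/black
    rev: 22.3.0             # black for Python formatting
    hooks:
      - id: black
```

Pinning revisions this way makes `pre-commit run --all-files` reproducible across contributor machines and CI.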
+
+## 6. Hardware adaptation
+### Hygon DCU
+- Add the Profiler function on DCU, which collects, aggregates, and displays performance data of models running on DCU, with kernel-level DCU occupancy display.
+### Kunlunxin Chip
+- Add the Profiler function on Kunlunxin 2nd-generation chips, which collects, aggregates, and displays performance data of models running on them, with kernel-level occupancy display.
+- Training/inference support for Kunlunxin 2nd-generation chips (Kunlunxin AI accelerator cards R200, R300, R200-8F, R200-8FS, RG800). A total of 51 models, such as PPYOLOE, PP-OCR, ERNIE3.0, PP-TSM, PP-TTS, DLRM, and PPO, have been verified. Support static graph and dynamic graph training, mixed precision training, and single-machine single-card and single-machine multi-card training, covering 5 fields: intelligent vision, natural language processing, intelligent speech, intelligent recommendation, and reinforcement learning.
+### Cambricon
+- Support training/inference on Cambricon MLU chips (MLU370 series boards): ResNet50, BERT, YoloV3, OCR-DB, Deeplabv3, and many other models are verified. Support static graph and dynamic graph training, mixed precision training, and single-machine single-card and single-machine multi-card training.
+### Graphcore
+- Support training/inference on Graphcore IPU chips (including IPU Mk2 GC200 and Bow IPU). Support ResNet50, BERT, and other models. Support static graph and dynamic-to-static mode training. Support single-chip, single-machine, and multi-machine distributed training.
+- Add support for more operators.
+- Upgrade to Poplar SDK v3.0.0 [#46892](https://github.com/PaddlePaddle/Paddle/pull/46892)
+- Support training models in dynamic-to-static mode. Add a new paddle.incubate.identity_loss op to assist with composition [#43770](https://github.com/PaddlePaddle/Paddle/pull/43770)
+- Support the Paddle native distributed training API paddle.distributed.launch [#43311](https://github.com/PaddlePaddle/Paddle/pull/43311)
+- Support training models with mixed precision [#41733](https://github.com/PaddlePaddle/Paddle/pull/41733)
+- Paddle Inference supports custom operators via PopART [#45235](https://github.com/PaddlePaddle/Paddle/pull/45235)
+
+### Intel
+- Migrate oneDNN operators: transpose2_grad([#46139](https://github.com/PaddlePaddle/Paddle/pull/46139)), relu6_grad([#46501](https://github.com/PaddlePaddle/Paddle/pull/46501)), gaussian_random([#46747](https://github.com/PaddlePaddle/Paddle/pull/46747), [#45481](https://github.com/PaddlePaddle/Paddle/pull/45481)), sgd and stack([#46374](https://github.com/PaddlePaddle/Paddle/pull/46374)), concat+grad, expand+grad, fill_constant([#45863](https://github.com/PaddlePaddle/Paddle/pull/45863)), slice, slice_grad, split, pad and pad3d([#46101](https://github.com/PaddlePaddle/Paddle/pull/46101)), softmax_grad([#46257](https://github.com/PaddlePaddle/Paddle/pull/46257)), Shape([#46051](https://github.com/PaddlePaddle/Paddle/pull/46051)), Sum([#46239](https://github.com/PaddlePaddle/Paddle/pull/46239)), Cast, clip+grad and pool+grad([#45775](https://github.com/PaddlePaddle/Paddle/pull/45775)), Reduce sum+grad, mean+grad, min and max([#45536](https://github.com/PaddlePaddle/Paddle/pull/45536)), Relu and abs([#45397](https://github.com/PaddlePaddle/Paddle/pull/45397)), Gelu([#45596](https://github.com/PaddlePaddle/Paddle/pull/45596)), Scale([#45537](https://github.com/PaddlePaddle/Paddle/pull/45537))
+- Optimize the kernels of fill_constant, fc, conv, and a number of other operators
+- Add several fusion pass optimizations
+- Optimize the Adam-W CPU FP32 optimizer ([#42522](https://github.com/PaddlePaddle/Paddle/pull/42522))
+- Optimize pad3d fp32 onednn operator kernel implementation ([#43990](https://github.com/PaddlePaddle/Paddle/pull/43990))
+- Optimize the concurrent execution of matmul, FC and lookup_v2 kernels ([#44023](https://github.com/PaddlePaddle/Paddle/pull/44023), [#44078](https://github.com/PaddlePaddle/Paddle/pull/44078), [#44640](https://github.com/PaddlePaddle/Paddle/pull/44640), [#44744](https://github.com/PaddlePaddle/Paddle/pull/44744), [#45249](https://github.com/PaddlePaddle/Paddle/pull/45249))
+- FC onednn operator kernel supports bf16 ([#42758](https://github.com/PaddlePaddle/Paddle/pull/42758), [#43154](https://github.com/PaddlePaddle/Paddle/pull/43154), [#43109](https://github.com/PaddlePaddle/Paddle/pull/43109))
+- Add the fusion of matrix multiplication and activation functions ([#43519](https://github.com/PaddlePaddle/Paddle/pull/43519), [#43198](https://github.com/PaddlePaddle/Paddle/pull/43198))
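Conceptually, fusing a matrix multiplication with its following activation removes a separate pass over the output tensor: the activation is applied while each output element is still at hand, rather than in a second traversal. A toy pure-Python sketch of the idea (illustrative only, not Paddle's oneDNN implementation):

```python
import math

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def matmul_then_gelu(a, b):
    # Unfused: materialize the matmul result, then make a second pass for GELU.
    out = [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
           for i in range(len(a))]
    return [[gelu(v) for v in row] for row in out]

def fused_matmul_gelu(a, b):
    # Fused: apply GELU to each output element as soon as it is computed.
    return [[gelu(sum(a[i][k] * b[k][j] for k in range(len(b))))
             for j in range(len(b[0]))] for i in range(len(a))]

a = [[1.0, 2.0], [3.0, -1.0]]
b = [[0.5, -0.5], [1.0, 2.0]]
assert matmul_then_gelu(a, b) == fused_matmul_gelu(a, b)
```

The fused form produces identical results while avoiding the intermediate-tensor write and re-read, which is the memory-traffic saving such fusion passes target.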
+- Support IR passes that produce int8 parameters for the convolution operator ([#44680](https://github.com/PaddlePaddle/Paddle/pull/44680), [#42625](https://github.com/PaddlePaddle/Paddle/pull/42625))
+- Add pool/avg quantization and scales correction ([#44186](https://github.com/PaddlePaddle/Paddle/pull/44186))
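For background, per-tensor quantization of this kind derives a scale from the tensor's value range and corrects it so the full int8 range is used. A toy symmetric int8 round-trip sketch (illustrative only, not the oneDNN pass itself):

```python
def quantize_int8(values):
    # Symmetric scheme: the scale maps the largest magnitude onto [-127, 127].
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values from the int8 codes.
    return [x * scale for x in q]

vals = [0.1, -0.5, 0.25, 0.9]
q, scale = quantize_int8(vals)
restored = dequantize(q, scale)
# Round-trip error is bounded by half a quantization step.
assert all(abs(a - b) <= scale / 2 for a, b in zip(vals, restored))
```

A mis-estimated scale wastes part of the int8 range or clips large values, which is why scale correction matters for quantized-kernel accuracy.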
+- Add the matmul and elementwise onednn operator kernel fusion ([#45077](https://github.com/PaddlePaddle/Paddle/pull/45077))
+- Fix the QAT precision bug ([#43693](https://github.com/PaddlePaddle/Paddle/pull/43693), [#45936](https://github.com/PaddlePaddle/Paddle/pull/45936), [#46378](https://github.com/PaddlePaddle/Paddle/pull/46378))
+- Migrate 42 oneDNN operator kernels to PHI operator library ([#46374](https://github.com/PaddlePaddle/Paddle/pull/46374), [#46101](https://github.com/PaddlePaddle/Paddle/pull/46101), [#45989](https://github.com/PaddlePaddle/Paddle/pull/45989), [#45863](https://github.com/PaddlePaddle/Paddle/pull/45863), [#45775](https://github.com/PaddlePaddle/Paddle/pull/45775), [#45626](https://github.com/PaddlePaddle/Paddle/pull/45626), [#45536](https://github.com/PaddlePaddle/Paddle/pull/45536), [#46501](https://github.com/PaddlePaddle/Paddle/pull/46501), [#46257](https://github.com/PaddlePaddle/Paddle/pull/46257), [#45596](https://github.com/PaddlePaddle/Paddle/pull/45596), [#45537](https://github.com/PaddlePaddle/Paddle/pull/45537), [#45481](https://github.com/PaddlePaddle/Paddle/pull/45481), [#45397](https://github.com/PaddlePaddle/Paddle/pull/45397), [#46239](https://github.com/PaddlePaddle/Paddle/pull/46239), [#46139](https://github.com/PaddlePaddle/Paddle/pull/46139), [#46051](https://github.com/PaddlePaddle/Paddle/pull/46051))
+- Quantize the elementwise_sub and shape operator kernels ([#42854](https://github.com/PaddlePaddle/Paddle/pull/42854), [#44124](https://github.com/PaddlePaddle/Paddle/pull/44124))
+
+## Thanks to our Contributors
+
+This release contains contributions from:
+
+0x45f, Aganlengzi, Ainavo, Allen Guo, Asthestarsfalll, Aurelius84, Baibaifan, baoachun, BiynXu, Bo Zhang, BrilliantYuKaimin, cambriconhsq, caozhou, carryyu, ccrrong, ceci3, chalsliu, Chang Xu, Charles-hit, Chen Long, Chen Weihang, chenjian, chentianyu03, Chenxiao Niu, cifar10, crystal, csy0225, danleifeng, David Nicolas, dc-cheny, denglin-github, dongfangshenzhu, duanboqiang, duanyanhui, engineer, enzodechine, Fan Zhang, feifei-111, Feiyu Chan, Feng Ni, feng_shuai, FlyingQianMM, freeliuzc, furnace, fuyou765, fwenguang, Ghost Screaming, gongweibao, Guanghua Yu, guguguzi, Guoxia Wang, Haipeng Wang, handiz, Haohongxiang, haosicheng, helen88, heliqi, hong, HongyuJia, houj04, huangxu96, Hui Zhang, Huihuang Zheng, huzhiqiang, Jacek Czaja, Jack Zhou, jack603047588, Jackwaterveg, jakpiase, james, Jiabin Yang, jiangcheng, Jiaqi Liu, JingZhuangzhuang, joanna.wozna.intel, JYChen, JZ-LIANG, Kaipeng Deng, kangguangli, kuizhiqing, Leo Chen, Leo Guo, levi131, Li Min, Li-fAngyU, lidanqing, LielinJiang, Ligoml, Lijunhui, lilong12, limingshu, Lin Manhui, Linjie Chen, liqitong-a, littletomatodonkey, liu zhengxi, Liu-xiandong, liutiexing, Liyulingyue, LiYuRio, Lux et Veritas, lyq, Matsumoto Ruko, MayYouBeProsperous, mengqingchun02, Ming-Xu Huang, ming1753, minghaoBD, moyan, mrcangye, Netpunk, niuliling123, Nyakku Shigure, OccupyMars2025, onecatcn, pangyoki, parap1uie-s, peachlcy, piotrekobi, Qi Li, QingshuChen, qipengh, Rayman, Regan Yue, RichardWooSJTU, risemeup1, Roc, ronnywang, Rui Li, Ruibiao Chen, seemingwang, Shang Zhizhou, shangliang Xu, ShenLiang, shentanyue, Shijie, ShiningZhang, shixingbo, shiyutang, Shuangchi He, Siming Dai, Sing_chan, Skr Bang, SmirnovKol, sneaxiy, sprouteer, Sylwester Fraczek, Sławomir Siwek, taixiurong, Tao CHANG, TeFeng Chen, Thomas Young, thunder95, Thunderbrook, tiancaishaonvjituizi, tianshuo78520a, Tomasz Socha, TTerror, USTCKAY, Vigi Zhang, Walter, Wang Bojun, wangguanqun, wangguanzhong, wanghuancoder, wangna11BD, WangXi, wangxinxin08, Wangzheee, 
WangZhen, wangzhen38, wawltor, wbn, Wei Shengyu, Weilong Wu, weishengying, Wen Sun, wenbin, whs, Wilber, WJJ1995, wuhuachaocoding, wuhuanzhou, wuyefeilin, XiaoguangHu, xiaoguoguo626807, xiaohemaikoo, xiaoting, xiaoxiaohehe001, Xiaoxu Chen, xiayanming, Xingyuan Zhang, xiongkun, yang131313, yangguohao, YangZhou, Yanxing Shi, Yao Zihang, yaoxuefeng, yaozhixin, yeliang2258, Yilingyelu, Yiqun Liu, ykkk2333, Yuang Liu, Yuanle Liu, YuanRisheng, yuguo, Yulong Ao, Yulv-git, YUNSHEN XIE, Zhang Jun, Zhang Ting, Zhang Zheng, zhangbo9674, zhangbopd, zhangchunle, Zhangjingyu06, zhangkaihuo, zhangxiaoci, zhangyikun02, zhangzhenguo, Zhanlue Yang, zhaocaibei123, zhaoying9105, zhaoyingli, Zhen Wang, Zhengyang Song, zhiboniu, Zhong Hui, Zhou Wei, zhoutianzi666, zhupengyang, ziyoujiyi, zlsh80826, zmxdream, zn, Zuza Gawrysiak, zyfncg, 傅剑寒, 六个骨头, 津, 熊峻峰, 王明冬, 石晓伟
# 2.3.1 Release Note