fixes to build on mac os x #17

djinn · 2016-09-01T11:22:35Z

Overall port for build process to Mac OS X. There is a problem with c++ stdlib but that is for another day.

reyoung · 2016-09-02T02:11:26Z

We will pull this code & test locally as soon as possible.

Thank you very much to patch this code.

The continuous integration sever for github is also under constructing.

reyoung · 2016-09-02T03:32:43Z

The std::exp not found, should include . And there are several issues for linux tcp. If you are not hurry for fixing it, we will fix these problems in next two weeks.

djinn · 2016-09-02T03:39:26Z

Yes. Please fix it. I just did a preliminary check on, if it works or not. Sadly, it will be long work before it will be usable.

kashif · 2016-09-02T13:55:55Z

I have also extended the cmake file to compile against the Accelerate framework... let me know if you want that snippet

reyoung · 2016-09-29T05:02:36Z

See #138 .

merge to local

add evaluate and predict for model

fix dropout bugs

fix pybind of FeatureNode

* Add Instruction data structure. * Add more methods for Instruction. * Update the cmake dependency of note_ir. * Add Function class for Note IR. * Print attribute values of instruction. * Add the assert statement in FunctionToString UT. * Add the unit test for Function and Instruction. * Add the default action for VisitParameter. * Fix errors of layout_test.cc and shape_test.cc. * Add comments for Instruction and Function. * Add non-const Accept for Instruction class. * Add comments of constructors in Instruction and Function.

* update the tensor base class, test=develop * remove two funcs, test=develop * update the error msg, test=develop Co-authored-by: Chen Weihang <[email protected]>

make scheduler works

* initial tensor design & sign kernel demo * add move constructor for meta & add lodtensor * add dirs & sign xpu kernel * add mean cpu&cuda kernel impl * move sign & mean xpu & npu kernel * add selected_rows basic impl * refactor design, BaseTensor to DenseTensor, etc. * add scale mkldnn kernel * polish xpu & npu impl details * fix mkldnn reuse compile failed * change tensor operation lib name * rename util filename * add more comments * change TensorImplInterface to TensorInterface * add kernel key and factory * remove MKLDNNTensorMeta, add MKLDNNDenseTensor * change XXDeviceContext to XXContext * add base kernel registrar utils & test on sign * replace boost::any by paddle::any * fix several ci failed * fix npu compile error * add ordered map util * fix multiple ordered_map compile errors * move dev into include dir * support sign op in static op run * fix static op run error * fix new executor compile failed * add dygraph branch & remove sign_op.h * fix test_infer_no_need_buffer_slots * fix rocm compile link error * fix unitybuild error & clear glog * fix npu compile failed * skip quant trans test * fix part windows compile problem * fix xpu enforce error * fix inference test failed * remove ordered_map to solve quant failed * fix part of rcom compile faild * add more register kernels * revert scale kernel temporarily * fix code format error * add new kernel registrar marco * rename top to tcmpt * revert xpu, npu, mkldnn impl & remove op def * add kernel args parse functor to auto parse args * revert some change & add scale kernels * add op proto in dygraph kernelcontext building * polish kernel dispatch logic & nameing rule * fix scale kernel match error * fix scale test failed * add mean API and unittest * test mean api success * add branch to solve compiled error * skip clang format error * add mean skip rule in op_library * add dot kernel, api and unittest (#6) * remove old kernel and add symbol link * fix dot compiled failed * add merco for module declare * fix npu and xpu compile error * revert sign, mean, scale, dot kernel removing * add comment for keeping old kernel impl * fix mutable_data error * fix bfloat16 conflit * fix inference undef error * adapt to msvc compile rules * polish comment for template inst * add cmake template instantiation for win * fix backend to place device id bug * fix ifdef error * Op2functor (#7) * add kernel args maker class * make args maker non-const * remove debug log * modify codes by review options * split constructPrKernelContext function * fix output name bug * fix test_mean_op test_sign_op failed * fill_any_like kernel refactor (#10) * fill_any_like kernel refactor * remove useless code of full_like c++ api * skip dtype for fill_any_like * add attrs for kernel key constrcut * add use_pt_kernel Flags to control whether to use pt kernel (#13) * add use_pt_kernel Flags to control whether to use pt kernel * change the default value to true for cheking pt kernels * fix mutable_data cuda place error * move high level apis into hapi * remove selectedrows adapting temporarily * Support Scalar in Tensor Compute Library (#14) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * remove mkldnn tensor & polish details * use flat_hash_map and small_vector in kernel factory * Refactor flatten kernel (#12) * refactor flatten kernel * update infershape function * fix compile bugs * fix bugs when merge * fix compiler bugs * fix bugs when run test_flatten_api * fix bugs when run test * Revert "use flat_hash_map and small_vector in kernel factory" This reverts commit 2309149. * Move cpu, cuda and other device code into kernels (#15) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Perfect unitests (#16) * perfect unittest * update license * replace with flat_hash_map, small_vector (#19) * fix small_vector build error on windows platform * replace with flat_hash_map, small_vector * remove todo * Perfect unitests (#20) * perfect unittest * update license * fix bug when run tcmpt_utils_test * refactor execution adapting impl * fix insert conflit * Fix CI bug of test_yolov3 (#21) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Fix CI bug of test_yolov3 * add the tensor base class, test=develop (#17) * update the tensor base class, test=develop * remove two funcs, test=develop * update the error msg, test=develop Co-authored-by: Chen Weihang <[email protected]> * [no-verify] commit backend and tensor signature changes * Rename tcmpt to pten (#23) * rename tcmpt to pten * update omitted files for rename to pten * update omitted file for rename to pten * remove k of all enum var * remove kernel_instantiate (#26) * remove symbols and spatial_tensor * change common to functions * readd share tensor impl methods * add a candidate dense tensor class, test=develop (#28) * change all Pt to Pten * resolve conflit with xiaowei * Op2functor opt1 (#27) * replace to small vector and change to const & * add std::move Co-authored-by: Chen Weihang <[email protected]> * polish kernel factory and kernel registry * fix operator test error msg mismatch * remove tensor signature and backend set member * move scalar and polish enforce * revert dtype layout change to fix error * fix enum operator override error * add several base unittests * add pten utils tests * polish some details * Dev/op2func refactor 3 (#30) * add a candidate dense tensor class, test=develop * remove TensorBase::backend(), test=develop * remove some ops, test=develop * cherry-pick the pr of tensor meta, test=develop * moves the dense tensor and some ops, test=develop * update the linalg operator, test=develop * update other operators, test=develop * fix errors, test=develop * fix bugs, test=develop * try to resolve the problem of windows ci, test=develop * updates codes, test=develop * fix the tensor_utils.cc, test=develop * modify the dense tensor, test=develop * fix the data type, test=develop Co-authored-by: shixiaowei02 <[email protected]> * polish some details * polish kernel signature details * fix a bug about offsets of the tensor, test=develop (#31) Co-authored-by: shixiaowei02 <[email protected]> * polish some details Co-authored-by: chentianyu03 <[email protected]> Co-authored-by: zyfncg <[email protected]> Co-authored-by: YuanRisheng <[email protected]> Co-authored-by: 石晓伟 <[email protected]>

* initial tensor design & sign kernel demo * add move constructor for meta & add lodtensor * add dirs & sign xpu kernel * add mean cpu&cuda kernel impl * move sign & mean xpu & npu kernel * add selected_rows basic impl * refactor design, BaseTensor to DenseTensor, etc. * add scale mkldnn kernel * polish xpu & npu impl details * fix mkldnn reuse compile failed * change tensor operation lib name * rename util filename * add more comments * change TensorImplInterface to TensorInterface * add kernel key and factory * remove MKLDNNTensorMeta, add MKLDNNDenseTensor * change XXDeviceContext to XXContext * add base kernel registrar utils & test on sign * replace boost::any by paddle::any * fix several ci failed * fix npu compile error * add ordered map util * fix multiple ordered_map compile errors * move dev into include dir * support sign op in static op run * fix static op run error * fix new executor compile failed * add dygraph branch & remove sign_op.h * fix test_infer_no_need_buffer_slots * fix rocm compile link error * fix unitybuild error & clear glog * fix npu compile failed * skip quant trans test * fix part windows compile problem * fix xpu enforce error * fix inference test failed * remove ordered_map to solve quant failed * fix part of rcom compile faild * add more register kernels * revert scale kernel temporarily * fix code format error * add new kernel registrar marco * rename top to tcmpt * revert xpu, npu, mkldnn impl & remove op def * add kernel args parse functor to auto parse args * revert some change & add scale kernels * add op proto in dygraph kernelcontext building * polish kernel dispatch logic & nameing rule * fix scale kernel match error * fix scale test failed * add mean API and unittest * test mean api success * add branch to solve compiled error * skip clang format error * add mean skip rule in op_library * add dot kernel, api and unittest (#6) * remove old kernel and add symbol link * fix dot compiled failed * add merco for module declare * fix npu and xpu compile error * revert sign, mean, scale, dot kernel removing * add comment for keeping old kernel impl * fix mutable_data error * fix bfloat16 conflit * fix inference undef error * adapt to msvc compile rules * polish comment for template inst * add cmake template instantiation for win * fix backend to place device id bug * fix ifdef error * Op2functor (#7) * add kernel args maker class * make args maker non-const * remove debug log * modify codes by review options * split constructPrKernelContext function * fix output name bug * fix test_mean_op test_sign_op failed * fill_any_like kernel refactor (#10) * fill_any_like kernel refactor * remove useless code of full_like c++ api * skip dtype for fill_any_like * add attrs for kernel key constrcut * add use_pt_kernel Flags to control whether to use pt kernel (#13) * add use_pt_kernel Flags to control whether to use pt kernel * change the default value to true for cheking pt kernels * fix mutable_data cuda place error * move high level apis into hapi * remove selectedrows adapting temporarily * Support Scalar in Tensor Compute Library (#14) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * remove mkldnn tensor & polish details * use flat_hash_map and small_vector in kernel factory * Refactor flatten kernel (#12) * refactor flatten kernel * update infershape function * fix compile bugs * fix bugs when merge * fix compiler bugs * fix bugs when run test_flatten_api * fix bugs when run test * Revert "use flat_hash_map and small_vector in kernel factory" This reverts commit 2309149. * Move cpu, cuda and other device code into kernels (#15) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Perfect unitests (#16) * perfect unittest * update license * replace with flat_hash_map, small_vector (#19) * fix small_vector build error on windows platform * replace with flat_hash_map, small_vector * remove todo * Perfect unitests (#20) * perfect unittest * update license * fix bug when run tcmpt_utils_test * refactor execution adapting impl * fix insert conflit * Fix CI bug of test_yolov3 (#21) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Fix CI bug of test_yolov3 * add the tensor base class, test=develop (#17) * update the tensor base class, test=develop * remove two funcs, test=develop * update the error msg, test=develop Co-authored-by: Chen Weihang <[email protected]> * [no-verify] commit backend and tensor signature changes * Rename tcmpt to pten (#23) * rename tcmpt to pten * update omitted files for rename to pten * update omitted file for rename to pten * remove k of all enum var * remove kernel_instantiate (#26) * remove symbols and spatial_tensor * change common to functions * readd share tensor impl methods * add a candidate dense tensor class, test=develop (#28) * change all Pt to Pten * resolve conflit with xiaowei * Op2functor opt1 (#27) * replace to small vector and change to const & * add std::move Co-authored-by: Chen Weihang <[email protected]> * polish kernel factory and kernel registry * fix operator test error msg mismatch * remove tensor signature and backend set member * move scalar and polish enforce * revert dtype layout change to fix error * fix enum operator override error * add several base unittests * add pten utils tests * polish some details * Dev/op2func refactor 3 (#30) * add a candidate dense tensor class, test=develop * remove TensorBase::backend(), test=develop * remove some ops, test=develop * cherry-pick the pr of tensor meta, test=develop * moves the dense tensor and some ops, test=develop * update the linalg operator, test=develop * update other operators, test=develop * fix errors, test=develop * fix bugs, test=develop * try to resolve the problem of windows ci, test=develop * updates codes, test=develop * fix the tensor_utils.cc, test=develop * modify the dense tensor, test=develop * fix the data type, test=develop Co-authored-by: shixiaowei02 <[email protected]> * polish some details * polish kernel signature details * fix a bug about offsets of the tensor, test=develop (#31) Co-authored-by: shixiaowei02 <[email protected]> * add matmul kernel in pten * add unittest for new matmul_v2 kernel * fix bug of CI compile * fix bug of CI compile * merge conflict * remove useless file Co-authored-by: Chen Weihang <[email protected]> Co-authored-by: chentianyu03 <[email protected]> Co-authored-by: YuanRisheng <[email protected]> Co-authored-by: 石晓伟 <[email protected]>

* initial tensor design & sign kernel demo * add move constructor for meta & add lodtensor * add dirs & sign xpu kernel * add mean cpu&cuda kernel impl * move sign & mean xpu & npu kernel * add selected_rows basic impl * refactor design, BaseTensor to DenseTensor, etc. * add scale mkldnn kernel * polish xpu & npu impl details * fix mkldnn reuse compile failed * change tensor operation lib name * rename util filename * add more comments * change TensorImplInterface to TensorInterface * add kernel key and factory * remove MKLDNNTensorMeta, add MKLDNNDenseTensor * change XXDeviceContext to XXContext * add base kernel registrar utils & test on sign * replace boost::any by paddle::any * fix several ci failed * fix npu compile error * add ordered map util * fix multiple ordered_map compile errors * move dev into include dir * support sign op in static op run * fix static op run error * fix new executor compile failed * add dygraph branch & remove sign_op.h * fix test_infer_no_need_buffer_slots * fix rocm compile link error * fix unitybuild error & clear glog * fix npu compile failed * skip quant trans test * fix part windows compile problem * fix xpu enforce error * fix inference test failed * remove ordered_map to solve quant failed * fix part of rcom compile faild * add more register kernels * revert scale kernel temporarily * fix code format error * add new kernel registrar marco * rename top to tcmpt * revert xpu, npu, mkldnn impl & remove op def * add kernel args parse functor to auto parse args * revert some change & add scale kernels * add op proto in dygraph kernelcontext building * polish kernel dispatch logic & nameing rule * fix scale kernel match error * fix scale test failed * add mean API and unittest * test mean api success * add branch to solve compiled error * skip clang format error * add mean skip rule in op_library * add dot kernel, api and unittest (#6) * remove old kernel and add symbol link * fix dot compiled failed * add merco for module declare * fix npu and xpu compile error * revert sign, mean, scale, dot kernel removing * add comment for keeping old kernel impl * fix mutable_data error * fix bfloat16 conflit * fix inference undef error * adapt to msvc compile rules * polish comment for template inst * add cmake template instantiation for win * fix backend to place device id bug * fix ifdef error * Op2functor (#7) * add kernel args maker class * make args maker non-const * remove debug log * modify codes by review options * split constructPrKernelContext function * fix output name bug * fix test_mean_op test_sign_op failed * fill_any_like kernel refactor (#10) * fill_any_like kernel refactor * remove useless code of full_like c++ api * skip dtype for fill_any_like * add attrs for kernel key constrcut * add use_pt_kernel Flags to control whether to use pt kernel (#13) * add use_pt_kernel Flags to control whether to use pt kernel * change the default value to true for cheking pt kernels * fix mutable_data cuda place error * move high level apis into hapi * remove selectedrows adapting temporarily * Support Scalar in Tensor Compute Library (#14) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * remove mkldnn tensor & polish details * use flat_hash_map and small_vector in kernel factory * Refactor flatten kernel (#12) * refactor flatten kernel * update infershape function * fix compile bugs * fix bugs when merge * fix compiler bugs * fix bugs when run test_flatten_api * fix bugs when run test * Revert "use flat_hash_map and small_vector in kernel factory" This reverts commit 2309149. * Move cpu, cuda and other device code into kernels (#15) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Perfect unitests (#16) * perfect unittest * update license * replace with flat_hash_map, small_vector (#19) * fix small_vector build error on windows platform * replace with flat_hash_map, small_vector * remove todo * Perfect unitests (#20) * perfect unittest * update license * fix bug when run tcmpt_utils_test * refactor execution adapting impl * fix insert conflit * Fix CI bug of test_yolov3 (#21) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Fix CI bug of test_yolov3 * add the tensor base class, test=develop (#17) * update the tensor base class, test=develop * remove two funcs, test=develop * update the error msg, test=develop Co-authored-by: Chen Weihang <[email protected]> * [no-verify] commit backend and tensor signature changes * Rename tcmpt to pten (#23) * rename tcmpt to pten * update omitted files for rename to pten * update omitted file for rename to pten * remove k of all enum var * remove kernel_instantiate (#26) * remove symbols and spatial_tensor * change common to functions * readd share tensor impl methods * add a candidate dense tensor class, test=develop (#28) * change all Pt to Pten * resolve conflit with xiaowei * Op2functor opt1 (#27) * replace to small vector and change to const & * add std::move Co-authored-by: Chen Weihang <[email protected]> * polish kernel factory and kernel registry * fix operator test error msg mismatch * remove tensor signature and backend set member * move scalar and polish enforce * revert dtype layout change to fix error * fix enum operator override error * Add Intermediate API layer * add several base unittests * add pten utils tests * polish some details * Dev/op2func refactor 3 (#30) * add a candidate dense tensor class, test=develop * remove TensorBase::backend(), test=develop * remove some ops, test=develop * cherry-pick the pr of tensor meta, test=develop * moves the dense tensor and some ops, test=develop * update the linalg operator, test=develop * update other operators, test=develop * fix errors, test=develop * fix bugs, test=develop * try to resolve the problem of windows ci, test=develop * updates codes, test=develop * fix the tensor_utils.cc, test=develop * modify the dense tensor, test=develop * fix the data type, test=develop Co-authored-by: shixiaowei02 <[email protected]> * intermediate api adapt to new dense tensor * add some TODO and delete include header Co-authored-by: Chen Weihang <[email protected]> Co-authored-by: chentianyu03 <[email protected]> Co-authored-by: zyfncg <[email protected]> Co-authored-by: 石晓伟 <[email protected]>

* initial tensor design & sign kernel demo * add move constructor for meta & add lodtensor * add dirs & sign xpu kernel * add mean cpu&cuda kernel impl * move sign & mean xpu & npu kernel * add selected_rows basic impl * refactor design, BaseTensor to DenseTensor, etc. * add scale mkldnn kernel * polish xpu & npu impl details * fix mkldnn reuse compile failed * change tensor operation lib name * rename util filename * add more comments * change TensorImplInterface to TensorInterface * add kernel key and factory * remove MKLDNNTensorMeta, add MKLDNNDenseTensor * change XXDeviceContext to XXContext * add base kernel registrar utils & test on sign * replace boost::any by paddle::any * fix several ci failed * fix npu compile error * add ordered map util * fix multiple ordered_map compile errors * move dev into include dir * support sign op in static op run * fix static op run error * fix new executor compile failed * add dygraph branch & remove sign_op.h * fix test_infer_no_need_buffer_slots * fix rocm compile link error * fix unitybuild error & clear glog * fix npu compile failed * skip quant trans test * fix part windows compile problem * fix xpu enforce error * fix inference test failed * remove ordered_map to solve quant failed * fix part of rcom compile faild * add more register kernels * revert scale kernel temporarily * fix code format error * add new kernel registrar marco * rename top to tcmpt * revert xpu, npu, mkldnn impl & remove op def * add kernel args parse functor to auto parse args * revert some change & add scale kernels * add op proto in dygraph kernelcontext building * polish kernel dispatch logic & nameing rule * fix scale kernel match error * fix scale test failed * add mean API and unittest * test mean api success * add branch to solve compiled error * skip clang format error * add mean skip rule in op_library * add dot kernel, api and unittest (#6) * remove old kernel and add symbol link * fix dot compiled failed * add merco for module declare * fix npu and xpu compile error * revert sign, mean, scale, dot kernel removing * add comment for keeping old kernel impl * fix mutable_data error * fix bfloat16 conflit * fix inference undef error * adapt to msvc compile rules * polish comment for template inst * add cmake template instantiation for win * fix backend to place device id bug * fix ifdef error * Op2functor (#7) * add kernel args maker class * make args maker non-const * remove debug log * modify codes by review options * split constructPrKernelContext function * fix output name bug * fix test_mean_op test_sign_op failed * fill_any_like kernel refactor (#10) * fill_any_like kernel refactor * remove useless code of full_like c++ api * skip dtype for fill_any_like * add attrs for kernel key constrcut * add use_pt_kernel Flags to control whether to use pt kernel (#13) * add use_pt_kernel Flags to control whether to use pt kernel * change the default value to true for cheking pt kernels * fix mutable_data cuda place error * move high level apis into hapi * remove selectedrows adapting temporarily * Support Scalar in Tensor Compute Library (#14) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * remove mkldnn tensor & polish details * use flat_hash_map and small_vector in kernel factory * Refactor flatten kernel (#12) * refactor flatten kernel * update infershape function * fix compile bugs * fix bugs when merge * fix compiler bugs * fix bugs when run test_flatten_api * fix bugs when run test * Revert "use flat_hash_map and small_vector in kernel factory" This reverts commit 2309149. * Move cpu, cuda and other device code into kernels (#15) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Perfect unitests (#16) * perfect unittest * update license * replace with flat_hash_map, small_vector (#19) * fix small_vector build error on windows platform * replace with flat_hash_map, small_vector * remove todo * Perfect unitests (#20) * perfect unittest * update license * fix bug when run tcmpt_utils_test * refactor execution adapting impl * fix insert conflit * Fix CI bug of test_yolov3 (#21) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Fix CI bug of test_yolov3 * add the tensor base class, test=develop (#17) * update the tensor base class, test=develop * remove two funcs, test=develop * update the error msg, test=develop Co-authored-by: Chen Weihang <[email protected]> * [no-verify] commit backend and tensor signature changes * Rename tcmpt to pten (#23) * rename tcmpt to pten * update omitted files for rename to pten * update omitted file for rename to pten * remove k of all enum var * remove kernel_instantiate (#26) * remove symbols and spatial_tensor * change common to functions * readd share tensor impl methods * add a candidate dense tensor class, test=develop (#28) * change all Pt to Pten * resolve conflit with xiaowei * Op2functor opt1 (#27) * replace to small vector and change to const & * add std::move Co-authored-by: Chen Weihang <[email protected]> * polish kernel factory and kernel registry * fix operator test error msg mismatch * remove tensor signature and backend set member * move scalar and polish enforce * revert dtype layout change to fix error * fix enum operator override error * add several base unittests * add pten utils tests * polish some details * Dev/op2func refactor 3 (#30) * add a candidate dense tensor class, test=develop * remove TensorBase::backend(), test=develop * remove some ops, test=develop * cherry-pick the pr of tensor meta, test=develop * moves the dense tensor and some ops, test=develop * update the linalg operator, test=develop * update other operators, test=develop * fix errors, test=develop * fix bugs, test=develop * try to resolve the problem of windows ci, test=develop * updates codes, test=develop * fix the tensor_utils.cc, test=develop * modify the dense tensor, test=develop * fix the data type, test=develop Co-authored-by: shixiaowei02 <[email protected]> * polish some details * polish kernel signature details * fix a bug about offsets of the tensor, test=develop (#31) Co-authored-by: shixiaowei02 <[email protected]> * support multiply inputs and outputs * rm attrs {} * fix multioutputs bug * merge develop * remove unsed header file * add missing & in const reference * modify inputAt, outputAt to inputBetween, outputBetween Co-authored-by: Chen Weihang <[email protected]> Co-authored-by: zyfncg <[email protected]> Co-authored-by: YuanRisheng <[email protected]> Co-authored-by: 石晓伟 <[email protected]>

…34425) * initial tensor design & sign kernel demo * add move constructor for meta & add lodtensor * add dirs & sign xpu kernel * add mean cpu&cuda kernel impl * move sign & mean xpu & npu kernel * add selected_rows basic impl * refactor design, BaseTensor to DenseTensor, etc. * add scale mkldnn kernel * polish xpu & npu impl details * fix mkldnn reuse compile failed * change tensor operation lib name * rename util filename * add more comments * change TensorImplInterface to TensorInterface * add kernel key and factory * remove MKLDNNTensorMeta, add MKLDNNDenseTensor * change XXDeviceContext to XXContext * add base kernel registrar utils & test on sign * replace boost::any by paddle::any * fix several ci failed * fix npu compile error * add ordered map util * fix multiple ordered_map compile errors * move dev into include dir * support sign op in static op run * fix static op run error * fix new executor compile failed * add dygraph branch & remove sign_op.h * fix test_infer_no_need_buffer_slots * fix rocm compile link error * fix unitybuild error & clear glog * fix npu compile failed * skip quant trans test * fix part windows compile problem * fix xpu enforce error * fix inference test failed * remove ordered_map to solve quant failed * fix part of rcom compile faild * add more register kernels * revert scale kernel temporarily * fix code format error * add new kernel registrar marco * rename top to tcmpt * revert xpu, npu, mkldnn impl & remove op def * add kernel args parse functor to auto parse args * revert some change & add scale kernels * add op proto in dygraph kernelcontext building * polish kernel dispatch logic & nameing rule * fix scale kernel match error * fix scale test failed * add mean API and unittest * test mean api success * add branch to solve compiled error * skip clang format error * add mean skip rule in op_library * add dot kernel, api and unittest (PaddlePaddle#6) * remove old kernel and add symbol link * fix dot compiled failed * add merco for module declare * fix npu and xpu compile error * revert sign, mean, scale, dot kernel removing * add comment for keeping old kernel impl * fix mutable_data error * fix bfloat16 conflit * fix inference undef error * adapt to msvc compile rules * polish comment for template inst * add cmake template instantiation for win * fix backend to place device id bug * fix ifdef error * Op2functor (PaddlePaddle#7) * add kernel args maker class * make args maker non-const * remove debug log * modify codes by review options * split constructPrKernelContext function * fix output name bug * fix test_mean_op test_sign_op failed * fill_any_like kernel refactor (PaddlePaddle#10) * fill_any_like kernel refactor * remove useless code of full_like c++ api * skip dtype for fill_any_like * add attrs for kernel key constrcut * add use_pt_kernel Flags to control whether to use pt kernel (PaddlePaddle#13) * add use_pt_kernel Flags to control whether to use pt kernel * change the default value to true for cheking pt kernels * fix mutable_data cuda place error * move high level apis into hapi * remove selectedrows adapting temporarily * Support Scalar in Tensor Compute Library (PaddlePaddle#14) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * remove mkldnn tensor & polish details * use flat_hash_map and small_vector in kernel factory * Refactor flatten kernel (PaddlePaddle#12) * refactor flatten kernel * update infershape function * fix compile bugs * fix bugs when merge * fix compiler bugs * fix bugs when run test_flatten_api * fix bugs when run test * Revert "use flat_hash_map and small_vector in kernel factory" This reverts commit 2309149. * Move cpu, cuda and other device code into kernels (PaddlePaddle#15) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Perfect unitests (PaddlePaddle#16) * perfect unittest * update license * replace with flat_hash_map, small_vector (PaddlePaddle#19) * fix small_vector build error on windows platform * replace with flat_hash_map, small_vector * remove todo * Perfect unitests (PaddlePaddle#20) * perfect unittest * update license * fix bug when run tcmpt_utils_test * refactor execution adapting impl * fix insert conflit * Fix CI bug of test_yolov3 (PaddlePaddle#21) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Fix CI bug of test_yolov3 * add the tensor base class, test=develop (PaddlePaddle#17) * update the tensor base class, test=develop * remove two funcs, test=develop * update the error msg, test=develop Co-authored-by: Chen Weihang <[email protected]> * [no-verify] commit backend and tensor signature changes * Rename tcmpt to pten (PaddlePaddle#23) * rename tcmpt to pten * update omitted files for rename to pten * update omitted file for rename to pten * remove k of all enum var * remove kernel_instantiate (PaddlePaddle#26) * remove symbols and spatial_tensor * change common to functions * readd share tensor impl methods * add a candidate dense tensor class, test=develop (PaddlePaddle#28) * change all Pt to Pten * resolve conflit with xiaowei * Op2functor opt1 (PaddlePaddle#27) * replace to small vector and change to const & * add std::move Co-authored-by: Chen Weihang <[email protected]> * polish kernel factory and kernel registry * fix operator test error msg mismatch * remove tensor signature and backend set member * move scalar and polish enforce * revert dtype layout change to fix error * fix enum operator override error * add several base unittests * add pten utils tests * polish some details * Dev/op2func refactor 3 (PaddlePaddle#30) * add a candidate dense tensor class, test=develop * remove TensorBase::backend(), test=develop * remove some ops, test=develop * cherry-pick the pr of tensor meta, test=develop * moves the dense tensor and some ops, test=develop * update the linalg operator, test=develop * update other operators, test=develop * fix errors, test=develop * fix bugs, test=develop * try to resolve the problem of windows ci, test=develop * updates codes, test=develop * fix the tensor_utils.cc, test=develop * modify the dense tensor, test=develop * fix the data type, test=develop Co-authored-by: shixiaowei02 <[email protected]> * polish some details * polish kernel signature details * fix a bug about offsets of the tensor, test=develop (PaddlePaddle#31) Co-authored-by: shixiaowei02 <[email protected]> * polish some details Co-authored-by: chentianyu03 <[email protected]> Co-authored-by: zyfncg <[email protected]> Co-authored-by: YuanRisheng <[email protected]> Co-authored-by: 石晓伟 <[email protected]>

* initial tensor design & sign kernel demo * add move constructor for meta & add lodtensor * add dirs & sign xpu kernel * add mean cpu&cuda kernel impl * move sign & mean xpu & npu kernel * add selected_rows basic impl * refactor design, BaseTensor to DenseTensor, etc. * add scale mkldnn kernel * polish xpu & npu impl details * fix mkldnn reuse compile failed * change tensor operation lib name * rename util filename * add more comments * change TensorImplInterface to TensorInterface * add kernel key and factory * remove MKLDNNTensorMeta, add MKLDNNDenseTensor * change XXDeviceContext to XXContext * add base kernel registrar utils & test on sign * replace boost::any by paddle::any * fix several ci failed * fix npu compile error * add ordered map util * fix multiple ordered_map compile errors * move dev into include dir * support sign op in static op run * fix static op run error * fix new executor compile failed * add dygraph branch & remove sign_op.h * fix test_infer_no_need_buffer_slots * fix rocm compile link error * fix unitybuild error & clear glog * fix npu compile failed * skip quant trans test * fix part windows compile problem * fix xpu enforce error * fix inference test failed * remove ordered_map to solve quant failed * fix part of rcom compile faild * add more register kernels * revert scale kernel temporarily * fix code format error * add new kernel registrar marco * rename top to tcmpt * revert xpu, npu, mkldnn impl & remove op def * add kernel args parse functor to auto parse args * revert some change & add scale kernels * add op proto in dygraph kernelcontext building * polish kernel dispatch logic & nameing rule * fix scale kernel match error * fix scale test failed * add mean API and unittest * test mean api success * add branch to solve compiled error * skip clang format error * add mean skip rule in op_library * add dot kernel, api and unittest (PaddlePaddle#6) * remove old kernel and add symbol link * fix dot compiled failed * add merco for module declare * fix npu and xpu compile error * revert sign, mean, scale, dot kernel removing * add comment for keeping old kernel impl * fix mutable_data error * fix bfloat16 conflit * fix inference undef error * adapt to msvc compile rules * polish comment for template inst * add cmake template instantiation for win * fix backend to place device id bug * fix ifdef error * Op2functor (PaddlePaddle#7) * add kernel args maker class * make args maker non-const * remove debug log * modify codes by review options * split constructPrKernelContext function * fix output name bug * fix test_mean_op test_sign_op failed * fill_any_like kernel refactor (PaddlePaddle#10) * fill_any_like kernel refactor * remove useless code of full_like c++ api * skip dtype for fill_any_like * add attrs for kernel key constrcut * add use_pt_kernel Flags to control whether to use pt kernel (PaddlePaddle#13) * add use_pt_kernel Flags to control whether to use pt kernel * change the default value to true for cheking pt kernels * fix mutable_data cuda place error * move high level apis into hapi * remove selectedrows adapting temporarily * Support Scalar in Tensor Compute Library (PaddlePaddle#14) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * remove mkldnn tensor & polish details * use flat_hash_map and small_vector in kernel factory * Refactor flatten kernel (PaddlePaddle#12) * refactor flatten kernel * update infershape function * fix compile bugs * fix bugs when merge * fix compiler bugs * fix bugs when run test_flatten_api * fix bugs when run test * Revert "use flat_hash_map and small_vector in kernel factory" This reverts commit 2309149. * Move cpu, cuda and other device code into kernels (PaddlePaddle#15) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Perfect unitests (PaddlePaddle#16) * perfect unittest * update license * replace with flat_hash_map, small_vector (PaddlePaddle#19) * fix small_vector build error on windows platform * replace with flat_hash_map, small_vector * remove todo * Perfect unitests (PaddlePaddle#20) * perfect unittest * update license * fix bug when run tcmpt_utils_test * refactor execution adapting impl * fix insert conflit * Fix CI bug of test_yolov3 (PaddlePaddle#21) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Fix CI bug of test_yolov3 * add the tensor base class, test=develop (PaddlePaddle#17) * update the tensor base class, test=develop * remove two funcs, test=develop * update the error msg, test=develop Co-authored-by: Chen Weihang <[email protected]> * [no-verify] commit backend and tensor signature changes * Rename tcmpt to pten (PaddlePaddle#23) * rename tcmpt to pten * update omitted files for rename to pten * update omitted file for rename to pten * remove k of all enum var * remove kernel_instantiate (PaddlePaddle#26) * remove symbols and spatial_tensor * change common to functions * readd share tensor impl methods * add a candidate dense tensor class, test=develop (PaddlePaddle#28) * change all Pt to Pten * resolve conflit with xiaowei * Op2functor opt1 (PaddlePaddle#27) * replace to small vector and change to const & * add std::move Co-authored-by: Chen Weihang <[email protected]> * polish kernel factory and kernel registry * fix operator test error msg mismatch * remove tensor signature and backend set member * move scalar and polish enforce * revert dtype layout change to fix error * fix enum operator override error * add several base unittests * add pten utils tests * polish some details * Dev/op2func refactor 3 (PaddlePaddle#30) * add a candidate dense tensor class, test=develop * remove TensorBase::backend(), test=develop * remove some ops, test=develop * cherry-pick the pr of tensor meta, test=develop * moves the dense tensor and some ops, test=develop * update the linalg operator, test=develop * update other operators, test=develop * fix errors, test=develop * fix bugs, test=develop * try to resolve the problem of windows ci, test=develop * updates codes, test=develop * fix the tensor_utils.cc, test=develop * modify the dense tensor, test=develop * fix the data type, test=develop Co-authored-by: shixiaowei02 <[email protected]> * polish some details * polish kernel signature details * fix a bug about offsets of the tensor, test=develop (PaddlePaddle#31) Co-authored-by: shixiaowei02 <[email protected]> * add matmul kernel in pten * add unittest for new matmul_v2 kernel * fix bug of CI compile * fix bug of CI compile * merge conflict * remove useless file Co-authored-by: Chen Weihang <[email protected]> Co-authored-by: chentianyu03 <[email protected]> Co-authored-by: YuanRisheng <[email protected]> Co-authored-by: 石晓伟 <[email protected]>

* initial tensor design & sign kernel demo * add move constructor for meta & add lodtensor * add dirs & sign xpu kernel * add mean cpu&cuda kernel impl * move sign & mean xpu & npu kernel * add selected_rows basic impl * refactor design, BaseTensor to DenseTensor, etc. * add scale mkldnn kernel * polish xpu & npu impl details * fix mkldnn reuse compile failed * change tensor operation lib name * rename util filename * add more comments * change TensorImplInterface to TensorInterface * add kernel key and factory * remove MKLDNNTensorMeta, add MKLDNNDenseTensor * change XXDeviceContext to XXContext * add base kernel registrar utils & test on sign * replace boost::any by paddle::any * fix several ci failed * fix npu compile error * add ordered map util * fix multiple ordered_map compile errors * move dev into include dir * support sign op in static op run * fix static op run error * fix new executor compile failed * add dygraph branch & remove sign_op.h * fix test_infer_no_need_buffer_slots * fix rocm compile link error * fix unitybuild error & clear glog * fix npu compile failed * skip quant trans test * fix part windows compile problem * fix xpu enforce error * fix inference test failed * remove ordered_map to solve quant failed * fix part of rcom compile faild * add more register kernels * revert scale kernel temporarily * fix code format error * add new kernel registrar marco * rename top to tcmpt * revert xpu, npu, mkldnn impl & remove op def * add kernel args parse functor to auto parse args * revert some change & add scale kernels * add op proto in dygraph kernelcontext building * polish kernel dispatch logic & nameing rule * fix scale kernel match error * fix scale test failed * add mean API and unittest * test mean api success * add branch to solve compiled error * skip clang format error * add mean skip rule in op_library * add dot kernel, api and unittest (PaddlePaddle#6) * remove old kernel and add symbol link * fix dot compiled failed * add merco for module declare * fix npu and xpu compile error * revert sign, mean, scale, dot kernel removing * add comment for keeping old kernel impl * fix mutable_data error * fix bfloat16 conflit * fix inference undef error * adapt to msvc compile rules * polish comment for template inst * add cmake template instantiation for win * fix backend to place device id bug * fix ifdef error * Op2functor (PaddlePaddle#7) * add kernel args maker class * make args maker non-const * remove debug log * modify codes by review options * split constructPrKernelContext function * fix output name bug * fix test_mean_op test_sign_op failed * fill_any_like kernel refactor (PaddlePaddle#10) * fill_any_like kernel refactor * remove useless code of full_like c++ api * skip dtype for fill_any_like * add attrs for kernel key constrcut * add use_pt_kernel Flags to control whether to use pt kernel (PaddlePaddle#13) * add use_pt_kernel Flags to control whether to use pt kernel * change the default value to true for cheking pt kernels * fix mutable_data cuda place error * move high level apis into hapi * remove selectedrows adapting temporarily * Support Scalar in Tensor Compute Library (PaddlePaddle#14) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * remove mkldnn tensor & polish details * use flat_hash_map and small_vector in kernel factory * Refactor flatten kernel (PaddlePaddle#12) * refactor flatten kernel * update infershape function * fix compile bugs * fix bugs when merge * fix compiler bugs * fix bugs when run test_flatten_api * fix bugs when run test * Revert "use flat_hash_map and small_vector in kernel factory" This reverts commit 2309149. * Move cpu, cuda and other device code into kernels (PaddlePaddle#15) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Perfect unitests (PaddlePaddle#16) * perfect unittest * update license * replace with flat_hash_map, small_vector (PaddlePaddle#19) * fix small_vector build error on windows platform * replace with flat_hash_map, small_vector * remove todo * Perfect unitests (PaddlePaddle#20) * perfect unittest * update license * fix bug when run tcmpt_utils_test * refactor execution adapting impl * fix insert conflit * Fix CI bug of test_yolov3 (PaddlePaddle#21) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Fix CI bug of test_yolov3 * add the tensor base class, test=develop (PaddlePaddle#17) * update the tensor base class, test=develop * remove two funcs, test=develop * update the error msg, test=develop Co-authored-by: Chen Weihang <[email protected]> * [no-verify] commit backend and tensor signature changes * Rename tcmpt to pten (PaddlePaddle#23) * rename tcmpt to pten * update omitted files for rename to pten * update omitted file for rename to pten * remove k of all enum var * remove kernel_instantiate (PaddlePaddle#26) * remove symbols and spatial_tensor * change common to functions * readd share tensor impl methods * add a candidate dense tensor class, test=develop (PaddlePaddle#28) * change all Pt to Pten * resolve conflit with xiaowei * Op2functor opt1 (PaddlePaddle#27) * replace to small vector and change to const & * add std::move Co-authored-by: Chen Weihang <[email protected]> * polish kernel factory and kernel registry * fix operator test error msg mismatch * remove tensor signature and backend set member * move scalar and polish enforce * revert dtype layout change to fix error * fix enum operator override error * Add Intermediate API layer * add several base unittests * add pten utils tests * polish some details * Dev/op2func refactor 3 (PaddlePaddle#30) * add a candidate dense tensor class, test=develop * remove TensorBase::backend(), test=develop * remove some ops, test=develop * cherry-pick the pr of tensor meta, test=develop * moves the dense tensor and some ops, test=develop * update the linalg operator, test=develop * update other operators, test=develop * fix errors, test=develop * fix bugs, test=develop * try to resolve the problem of windows ci, test=develop * updates codes, test=develop * fix the tensor_utils.cc, test=develop * modify the dense tensor, test=develop * fix the data type, test=develop Co-authored-by: shixiaowei02 <[email protected]> * intermediate api adapt to new dense tensor * add some TODO and delete include header Co-authored-by: Chen Weihang <[email protected]> Co-authored-by: chentianyu03 <[email protected]> Co-authored-by: zyfncg <[email protected]> Co-authored-by: 石晓伟 <[email protected]>

* initial tensor design & sign kernel demo * add move constructor for meta & add lodtensor * add dirs & sign xpu kernel * add mean cpu&cuda kernel impl * move sign & mean xpu & npu kernel * add selected_rows basic impl * refactor design, BaseTensor to DenseTensor, etc. * add scale mkldnn kernel * polish xpu & npu impl details * fix mkldnn reuse compile failed * change tensor operation lib name * rename util filename * add more comments * change TensorImplInterface to TensorInterface * add kernel key and factory * remove MKLDNNTensorMeta, add MKLDNNDenseTensor * change XXDeviceContext to XXContext * add base kernel registrar utils & test on sign * replace boost::any by paddle::any * fix several ci failed * fix npu compile error * add ordered map util * fix multiple ordered_map compile errors * move dev into include dir * support sign op in static op run * fix static op run error * fix new executor compile failed * add dygraph branch & remove sign_op.h * fix test_infer_no_need_buffer_slots * fix rocm compile link error * fix unitybuild error & clear glog * fix npu compile failed * skip quant trans test * fix part windows compile problem * fix xpu enforce error * fix inference test failed * remove ordered_map to solve quant failed * fix part of rcom compile faild * add more register kernels * revert scale kernel temporarily * fix code format error * add new kernel registrar marco * rename top to tcmpt * revert xpu, npu, mkldnn impl & remove op def * add kernel args parse functor to auto parse args * revert some change & add scale kernels * add op proto in dygraph kernelcontext building * polish kernel dispatch logic & nameing rule * fix scale kernel match error * fix scale test failed * add mean API and unittest * test mean api success * add branch to solve compiled error * skip clang format error * add mean skip rule in op_library * add dot kernel, api and unittest (PaddlePaddle#6) * remove old kernel and add symbol link * fix dot compiled failed * add merco for module declare * fix npu and xpu compile error * revert sign, mean, scale, dot kernel removing * add comment for keeping old kernel impl * fix mutable_data error * fix bfloat16 conflit * fix inference undef error * adapt to msvc compile rules * polish comment for template inst * add cmake template instantiation for win * fix backend to place device id bug * fix ifdef error * Op2functor (PaddlePaddle#7) * add kernel args maker class * make args maker non-const * remove debug log * modify codes by review options * split constructPrKernelContext function * fix output name bug * fix test_mean_op test_sign_op failed * fill_any_like kernel refactor (PaddlePaddle#10) * fill_any_like kernel refactor * remove useless code of full_like c++ api * skip dtype for fill_any_like * add attrs for kernel key constrcut * add use_pt_kernel Flags to control whether to use pt kernel (PaddlePaddle#13) * add use_pt_kernel Flags to control whether to use pt kernel * change the default value to true for cheking pt kernels * fix mutable_data cuda place error * move high level apis into hapi * remove selectedrows adapting temporarily * Support Scalar in Tensor Compute Library (PaddlePaddle#14) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * remove mkldnn tensor & polish details * use flat_hash_map and small_vector in kernel factory * Refactor flatten kernel (PaddlePaddle#12) * refactor flatten kernel * update infershape function * fix compile bugs * fix bugs when merge * fix compiler bugs * fix bugs when run test_flatten_api * fix bugs when run test * Revert "use flat_hash_map and small_vector in kernel factory" This reverts commit 2309149. * Move cpu, cuda and other device code into kernels (PaddlePaddle#15) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Perfect unitests (PaddlePaddle#16) * perfect unittest * update license * replace with flat_hash_map, small_vector (PaddlePaddle#19) * fix small_vector build error on windows platform * replace with flat_hash_map, small_vector * remove todo * Perfect unitests (PaddlePaddle#20) * perfect unittest * update license * fix bug when run tcmpt_utils_test * refactor execution adapting impl * fix insert conflit * Fix CI bug of test_yolov3 (PaddlePaddle#21) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Fix CI bug of test_yolov3 * add the tensor base class, test=develop (PaddlePaddle#17) * update the tensor base class, test=develop * remove two funcs, test=develop * update the error msg, test=develop Co-authored-by: Chen Weihang <[email protected]> * [no-verify] commit backend and tensor signature changes * Rename tcmpt to pten (PaddlePaddle#23) * rename tcmpt to pten * update omitted files for rename to pten * update omitted file for rename to pten * remove k of all enum var * remove kernel_instantiate (PaddlePaddle#26) * remove symbols and spatial_tensor * change common to functions * readd share tensor impl methods * add a candidate dense tensor class, test=develop (PaddlePaddle#28) * change all Pt to Pten * resolve conflit with xiaowei * Op2functor opt1 (PaddlePaddle#27) * replace to small vector and change to const & * add std::move Co-authored-by: Chen Weihang <[email protected]> * polish kernel factory and kernel registry * fix operator test error msg mismatch * remove tensor signature and backend set member * move scalar and polish enforce * revert dtype layout change to fix error * fix enum operator override error * add several base unittests * add pten utils tests * polish some details * Dev/op2func refactor 3 (PaddlePaddle#30) * add a candidate dense tensor class, test=develop * remove TensorBase::backend(), test=develop * remove some ops, test=develop * cherry-pick the pr of tensor meta, test=develop * moves the dense tensor and some ops, test=develop * update the linalg operator, test=develop * update other operators, test=develop * fix errors, test=develop * fix bugs, test=develop * try to resolve the problem of windows ci, test=develop * updates codes, test=develop * fix the tensor_utils.cc, test=develop * modify the dense tensor, test=develop * fix the data type, test=develop Co-authored-by: shixiaowei02 <[email protected]> * polish some details * polish kernel signature details * fix a bug about offsets of the tensor, test=develop (PaddlePaddle#31) Co-authored-by: shixiaowei02 <[email protected]> * support multiply inputs and outputs * rm attrs {} * fix multioutputs bug * merge develop * remove unsed header file * add missing & in const reference * modify inputAt, outputAt to inputBetween, outputBetween Co-authored-by: Chen Weihang <[email protected]> Co-authored-by: zyfncg <[email protected]> Co-authored-by: YuanRisheng <[email protected]> Co-authored-by: 石晓伟 <[email protected]>

* initial tensor design & sign kernel demo * add move constructor for meta & add lodtensor * add dirs & sign xpu kernel * add mean cpu&cuda kernel impl * move sign & mean xpu & npu kernel * add selected_rows basic impl * refactor design, BaseTensor to DenseTensor, etc. * add scale mkldnn kernel * polish xpu & npu impl details * fix mkldnn reuse compile failed * change tensor operation lib name * rename util filename * add more comments * change TensorImplInterface to TensorInterface * add kernel key and factory * remove MKLDNNTensorMeta, add MKLDNNDenseTensor * change XXDeviceContext to XXContext * add base kernel registrar utils & test on sign * replace boost::any by paddle::any * fix several ci failed * fix npu compile error * add ordered map util * fix multiple ordered_map compile errors * move dev into include dir * support sign op in static op run * fix static op run error * fix new executor compile failed * add dygraph branch & remove sign_op.h * fix test_infer_no_need_buffer_slots * fix rocm compile link error * fix unitybuild error & clear glog * fix npu compile failed * skip quant trans test * fix part windows compile problem * fix xpu enforce error * fix inference test failed * remove ordered_map to solve quant failed * fix part of rcom compile faild * add more register kernels * revert scale kernel temporarily * fix code format error * add new kernel registrar marco * rename top to tcmpt * revert xpu, npu, mkldnn impl & remove op def * add kernel args parse functor to auto parse args * revert some change & add scale kernels * add op proto in dygraph kernelcontext building * polish kernel dispatch logic & nameing rule * fix scale kernel match error * fix scale test failed * add mean API and unittest * test mean api success * add branch to solve compiled error * skip clang format error * add mean skip rule in op_library * add dot kernel, api and unittest (#6) * remove old kernel and add symbol link * fix dot compiled failed * add merco for module declare * fix npu and xpu compile error * revert sign, mean, scale, dot kernel removing * add comment for keeping old kernel impl * fix mutable_data error * fix bfloat16 conflit * fix inference undef error * adapt to msvc compile rules * polish comment for template inst * add cmake template instantiation for win * fix backend to place device id bug * fix ifdef error * Op2functor (#7) * add kernel args maker class * make args maker non-const * remove debug log * modify codes by review options * split constructPrKernelContext function * fix output name bug * fix test_mean_op test_sign_op failed * fill_any_like kernel refactor (#10) * fill_any_like kernel refactor * remove useless code of full_like c++ api * skip dtype for fill_any_like * add attrs for kernel key constrcut * add use_pt_kernel Flags to control whether to use pt kernel (#13) * add use_pt_kernel Flags to control whether to use pt kernel * change the default value to true for cheking pt kernels * fix mutable_data cuda place error * move high level apis into hapi * remove selectedrows adapting temporarily * Support Scalar in Tensor Compute Library (#14) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * remove mkldnn tensor & polish details * use flat_hash_map and small_vector in kernel factory * Refactor flatten kernel (#12) * refactor flatten kernel * update infershape function * fix compile bugs * fix bugs when merge * fix compiler bugs * fix bugs when run test_flatten_api * fix bugs when run test * Revert "use flat_hash_map and small_vector in kernel factory" This reverts commit 2309149. * Move cpu, cuda and other device code into kernels (#15) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Perfect unitests (#16) * perfect unittest * update license * replace with flat_hash_map, small_vector (#19) * fix small_vector build error on windows platform * replace with flat_hash_map, small_vector * remove todo * Perfect unitests (#20) * perfect unittest * update license * fix bug when run tcmpt_utils_test * refactor execution adapting impl * fix insert conflit * Fix CI bug of test_yolov3 (#21) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Fix CI bug of test_yolov3 * add the tensor base class, test=develop (#17) * update the tensor base class, test=develop * remove two funcs, test=develop * update the error msg, test=develop Co-authored-by: Chen Weihang <[email protected]> * [no-verify] commit backend and tensor signature changes * Rename tcmpt to pten (#23) * rename tcmpt to pten * update omitted files for rename to pten * update omitted file for rename to pten * remove k of all enum var * remove kernel_instantiate (#26) * remove symbols and spatial_tensor * change common to functions * readd share tensor impl methods * add a candidate dense tensor class, test=develop (#28) * change all Pt to Pten * resolve conflit with xiaowei * Op2functor opt1 (#27) * replace to small vector and change to const & * add std::move Co-authored-by: Chen Weihang <[email protected]> * polish kernel factory and kernel registry * fix operator test error msg mismatch * remove tensor signature and backend set member * move scalar and polish enforce * revert dtype layout change to fix error * fix enum operator override error * Add Intermediate API layer * add several base unittests * add pten utils tests * polish some details * Dev/op2func refactor 3 (#30) * add a candidate dense tensor class, test=develop * remove TensorBase::backend(), test=develop * remove some ops, test=develop * cherry-pick the pr of tensor meta, test=develop * moves the dense tensor and some ops, test=develop * update the linalg operator, test=develop * update other operators, test=develop * fix errors, test=develop * fix bugs, test=develop * try to resolve the problem of windows ci, test=develop * updates codes, test=develop * fix the tensor_utils.cc, test=develop * modify the dense tensor, test=develop * fix the data type, test=develop Co-authored-by: shixiaowei02 <[email protected]> * intermediate api adapt to new dense tensor * add some TODO and delete include header * Support XPU for Flatten Kernel * fix bugs when run kunlun ci * fix compile bugs * fix bugs for kunlun ci * fix compile bugs when run kunlun * fix compile bugs in kunlun * fix compile bugs in kunlun * fix bugs when compile * fix bugs when compile * fix compile bug * delete useless annotation Co-authored-by: Chen Weihang <[email protected]> Co-authored-by: chentianyu03 <[email protected]> Co-authored-by: zyfncg <[email protected]> Co-authored-by: 石晓伟 <[email protected]>

* initial tensor design & sign kernel demo * add move constructor for meta & add lodtensor * add dirs & sign xpu kernel * add mean cpu&cuda kernel impl * move sign & mean xpu & npu kernel * add selected_rows basic impl * refactor design, BaseTensor to DenseTensor, etc. * add scale mkldnn kernel * polish xpu & npu impl details * fix mkldnn reuse compile failed * change tensor operation lib name * rename util filename * add more comments * change TensorImplInterface to TensorInterface * add kernel key and factory * remove MKLDNNTensorMeta, add MKLDNNDenseTensor * change XXDeviceContext to XXContext * add base kernel registrar utils & test on sign * replace boost::any by paddle::any * fix several ci failed * fix npu compile error * add ordered map util * fix multiple ordered_map compile errors * move dev into include dir * support sign op in static op run * fix static op run error * fix new executor compile failed * add dygraph branch & remove sign_op.h * fix test_infer_no_need_buffer_slots * fix rocm compile link error * fix unitybuild error & clear glog * fix npu compile failed * skip quant trans test * fix part windows compile problem * fix xpu enforce error * fix inference test failed * remove ordered_map to solve quant failed * fix part of rcom compile faild * add more register kernels * revert scale kernel temporarily * fix code format error * add new kernel registrar marco * rename top to tcmpt * revert xpu, npu, mkldnn impl & remove op def * add kernel args parse functor to auto parse args * revert some change & add scale kernels * add op proto in dygraph kernelcontext building * polish kernel dispatch logic & nameing rule * fix scale kernel match error * fix scale test failed * add mean API and unittest * test mean api success * add branch to solve compiled error * skip clang format error * add mean skip rule in op_library * add dot kernel, api and unittest (#6) * remove old kernel and add symbol link * fix dot compiled failed * add merco for module declare * fix npu and xpu compile error * revert sign, mean, scale, dot kernel removing * add comment for keeping old kernel impl * fix mutable_data error * fix bfloat16 conflit * fix inference undef error * adapt to msvc compile rules * polish comment for template inst * add cmake template instantiation for win * fix backend to place device id bug * fix ifdef error * Op2functor (#7) * add kernel args maker class * make args maker non-const * remove debug log * modify codes by review options * split constructPrKernelContext function * fix output name bug * fix test_mean_op test_sign_op failed * fill_any_like kernel refactor (#10) * fill_any_like kernel refactor * remove useless code of full_like c++ api * skip dtype for fill_any_like * add attrs for kernel key constrcut * add use_pt_kernel Flags to control whether to use pt kernel (#13) * add use_pt_kernel Flags to control whether to use pt kernel * change the default value to true for cheking pt kernels * fix mutable_data cuda place error * move high level apis into hapi * remove selectedrows adapting temporarily * Support Scalar in Tensor Compute Library (#14) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * remove mkldnn tensor & polish details * use flat_hash_map and small_vector in kernel factory * Refactor flatten kernel (#12) * refactor flatten kernel * update infershape function * fix compile bugs * fix bugs when merge * fix compiler bugs * fix bugs when run test_flatten_api * fix bugs when run test * Revert "use flat_hash_map and small_vector in kernel factory" This reverts commit 2309149. * Move cpu, cuda and other device code into kernels (#15) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Perfect unitests (#16) * perfect unittest * update license * replace with flat_hash_map, small_vector (#19) * fix small_vector build error on windows platform * replace with flat_hash_map, small_vector * remove todo * Perfect unitests (#20) * perfect unittest * update license * fix bug when run tcmpt_utils_test * refactor execution adapting impl * fix insert conflit * Fix CI bug of test_yolov3 (#21) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Fix CI bug of test_yolov3 * add the tensor base class, test=develop (#17) * update the tensor base class, test=develop * remove two funcs, test=develop * update the error msg, test=develop Co-authored-by: Chen Weihang <[email protected]> * [no-verify] commit backend and tensor signature changes * Rename tcmpt to pten (#23) * rename tcmpt to pten * update omitted files for rename to pten * update omitted file for rename to pten * remove k of all enum var * remove kernel_instantiate (#26) * remove symbols and spatial_tensor * change common to functions * readd share tensor impl methods * add a candidate dense tensor class, test=develop (#28) * change all Pt to Pten * resolve conflit with xiaowei * Op2functor opt1 (#27) * replace to small vector and change to const & * add std::move Co-authored-by: Chen Weihang <[email protected]> * polish kernel factory and kernel registry * fix operator test error msg mismatch * remove tensor signature and backend set member * move scalar and polish enforce * revert dtype layout change to fix error * fix enum operator override error * add several base unittests * add pten utils tests * polish some details * Dev/op2func refactor 3 (#30) * add a candidate dense tensor class, test=develop * remove TensorBase::backend(), test=develop * remove some ops, test=develop * cherry-pick the pr of tensor meta, test=develop * moves the dense tensor and some ops, test=develop * update the linalg operator, test=develop * update other operators, test=develop * fix errors, test=develop * fix bugs, test=develop * try to resolve the problem of windows ci, test=develop * updates codes, test=develop * fix the tensor_utils.cc, test=develop * modify the dense tensor, test=develop * fix the data type, test=develop Co-authored-by: shixiaowei02 <[email protected]> * polish some details * polish kernel signature details * fix a bug about offsets of the tensor, test=develop (#31) Co-authored-by: shixiaowei02 <[email protected]> * polish some details * add fill_constant kernel in pten * fix bug of full api (c++) * remove the support for SelectRows in new fill_constant kernel * fix bug of setting fill_any_like kernel key * merge code confilct * modify fill_constant GetExpectedKernelType * fix fill_constant KernelType bug * polish code of build pten KernelContext * refactor code of fill_constant in pten Co-authored-by: Chen Weihang <[email protected]> Co-authored-by: chentianyu03 <[email protected]> Co-authored-by: YuanRisheng <[email protected]> Co-authored-by: 石晓伟 <[email protected]>

2 python api

* apply lint * add ipu_build_strategy * del unused code * add popart_canonicalization_utils to ipu_backend deps * Update compiler.py * Update ipu_build_strategy.cc

…nt-for-distill-lstm Improve data augmentation for distilling Bi-LSTM

adapt uint64 and iterate edge table

fix ins bug, add mean logloss gpu op

* add fp16 doc

[DRR] Suport Tensor Assgin API in ResultPattern

Paddlebox

[MTAI-484] fix(build): update submodule eigen3 commit id

README: Change Mandarin to Chinese

support LiftToAnchorPattern for reduce tree pattern

…lt (PaddlePaddle#17)

add lvdm

add v3 api

* [fix] fix fail test when backend is mack * [metax]change_cupti_and_fix_softmax (#7) * [Metax_change_ut] * fix sum&collect_fpn_proposals op register * modify profile * [Metax] fix paddle bug replace 'MoeGradDispatchKernel' to 'MoeGateDispatchKernel' * [Metax] register bce_loss_grad & bce_loss & index_add_grad kernels * [Metax] con2d_grad use gpudnn * blas handle support * [Metax] register some kernels & update CMakeLists * [Metax] fix metax unittest fail * [Metax] add group_norm & label_smooth kernel and update matmul kernel * [Metax] fix rmsprop kernel register and add meshgrid & meshgrid_grad kernel register * add test * add test * [test] chang the logic of workspace_host in cholesky_kernel_register alloc(cpuplace,size), test pass alloc(cpuplace, size, stream), crash * [Metax] fix compile fail * Revert "[Metax] fix compile fail" This reverts commit 83bc87f686227962b0262e044225c6ed5507b824. * [Metax] fix compile fail by 'conv_transpose_grad_kernel_impl.h' * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] con2d_grad use gpudnn * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] change_patch * [Metax] update unit test CMakeLists.txt * [Metax] update unit test CMakeLists.txt * [feature] add unique_consecutive kernel * [metax] add some kernel * [metax] add some kernel * [Metax] register baddbmm kernel & update blas api * [Metax] register baddbmm kernel & update blas api * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [feature] add add unique_consecutive kernel.cu * [fix] fix some test case due to missing op register * [fix] fix some fail text * [metax]fix lu eigvalshsqueeze rnn kernel * [metax]fix lu eigvalshsqueeze rnn kernel * add and fix some kernels * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [Metax] fix conflict * [Metax] adapt to paddle-cpu-20250901 & resolve the issue of 'test_elementwise_mul_op_metax' failure * [Metax] update repeat_interleave kernel & ignore max op test * [metax]fix lu eigvalshsqueeze rnn kernel * [metax] chang patch fix copy * [metax] chang patch fix copy * [Metax] update metax_gpu unit test * [Metax] fix test CMakeList.txt * [metax]change_cupti_and_fix_softmax * [metax]change_patch * [metax]change_patch --------- Co-authored-by: Mingkun.Zhang <[email protected]> Co-authored-by: metax666 <[email protected]> Co-authored-by: jiaxinWang-metax <[email protected]> Co-authored-by: MingkunZhang <[email protected]> Co-authored-by: chezhang <[email protected]> Co-authored-by: zhang-chenyi <[email protected]> Co-authored-by: ZhouDuan <[email protected]> * [Metax] fix dgc & mklml compile product path problem (#8) * [Metax] fix accuracy kernel & add test_accuracy_op_metax.py unit test (#9) * [Metax] fix dgc & mklml compile product path problem * [Metax] fix accuracy kernel & add test_accuracy_op_metax.py unit test * [Metax] add mixed_vector fix & update change patch * [Metax] update metax_gpu CMakeLists.txt (#10) * [Metax] fix dgc & mklml compile product path problem * [Metax] fix accuracy kernel & add test_accuracy_op_metax.py unit test * [Metax] add mixed_vector fix & update change patch * [Metax] update metax_gpu CMakeLists.txt * [metax] updata_qr_kernel (#11) * [metax] chang patch fix copy * [metax] chang patch fix copy * [Metax] update metax_gpu unit test * [Metax] fix test CMakeList.txt * [metax]change_cupti_and_fix_softmax * [metax]change_patch * [metax]change_patch * [metax] updata_qr_kernel * [metax] updata_qr_kernel --------- Co-authored-by: Mingkun.Zhang <[email protected]> Co-authored-by: metax666 <[email protected]> Co-authored-by: jiaxinWang-metax <[email protected]> Co-authored-by: MingkunZhang <[email protected]> Co-authored-by: chezhang <[email protected]> Co-authored-by: zhang-chenyi <[email protected]> Co-authored-by: ZhouDuan <[email protected]> * [Metax] fix illegal address access error in test_momentum_op (#12) * [Metax] fix illegal address access error in test_momentum_op * [Metax] fix cufft and fix some blas kernel apply (#13) * [Metax] fix cufft and fix some blas kernel apply * [metax] add warpctc_warprnn (#14) * [metax] fix bug * [Metax] update metax CI (#15) * [Metax] update metax CI * [Metax] update metax CI CMakeLists (#16) * [Metax] update metax CI * [Metax] update metax CI CMakeLists * [Metax] add github action (#18) * [Metax] add github action --------- Co-authored-by: Mingkun.Zhang <[email protected]> Co-authored-by: metax666 <[email protected]> Co-authored-by: jiaxinWang-metax <[email protected]> Co-authored-by: MingkunZhang <[email protected]> Co-authored-by: chezhang <[email protected]> Co-authored-by: zhang-chenyi <[email protected]> Co-authored-by: ZhouDuan <[email protected]> * [metax] chang build (#19) * [metax]chaneg build --------- Co-authored-by: Mingkun.Zhang <[email protected]> Co-authored-by: metax666 <[email protected]> Co-authored-by: jiaxinWang-metax <[email protected]> Co-authored-by: MingkunZhang <[email protected]> Co-authored-by: chezhang <[email protected]> Co-authored-by: zhang-chenyi <[email protected]> Co-authored-by: ZhouDuan <[email protected]> * change_build (#20) * [metax]chaneg build --------- * change_build (#21) * change_build (#22) * [Metax_change_ut] * fix sum&collect_fpn_proposals op register * modify profile * [Metax] fix paddle bug replace 'MoeGradDispatchKernel' to 'MoeGateDispatchKernel' * [Metax] register bce_loss_grad & bce_loss & index_add_grad kernels * [Metax] con2d_grad use gpudnn * blas handle support * [Metax] register some kernels & update CMakeLists * [Metax] fix metax unittest fail * [Metax] add group_norm & label_smooth kernel and update matmul kernel * [Metax] fix rmsprop kernel register and add meshgrid & meshgrid_grad kernel register * add test * add test * [test] chang the logic of workspace_host in cholesky_kernel_register alloc(cpuplace,size), test pass alloc(cpuplace, size, stream), crash * [Metax] fix compile fail * Revert "[Metax] fix compile fail" This reverts commit 83bc87f686227962b0262e044225c6ed5507b824. * [Metax] fix compile fail by 'conv_transpose_grad_kernel_impl.h' * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] con2d_grad use gpudnn * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] change_patch * [Metax] update unit test CMakeLists.txt * [Metax] update unit test CMakeLists.txt * [feature] add unique_consecutive kernel * [metax] add some kernel * [metax] add some kernel * [Metax] register baddbmm kernel & update blas api * [Metax] register baddbmm kernel & update blas api * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [feature] add add unique_consecutive kernel.cu * [fix] fix some test case due to missing op register * [fix] fix some fail text * [metax]fix lu eigvalshsqueeze rnn kernel * [metax]fix lu eigvalshsqueeze rnn kernel * add and fix some kernels * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [Metax] fix conflict * [Metax] adapt to paddle-cpu-20250901 & resolve the issue of 'test_elementwise_mul_op_metax' failure * [Metax] update repeat_interleave kernel & ignore max op test * [metax]fix lu eigvalshsqueeze rnn kernel * [metax] chang patch fix copy * [metax] chang patch fix copy * [Metax] update metax_gpu unit test * [Metax] fix test CMakeList.txt * [metax]change_cupti_and_fix_softmax * [metax]change_patch * [metax]change_patch * [metax] updata_qr_kernel * [metax] updata_qr_kernel * [Metax] fix cufft and fix some blas kernel apply * [metax] fix bug * [Metax] add github action * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build --------- Co-authored-by: Mingkun.Zhang <[email protected]> Co-authored-by: metax666 <[email protected]> Co-authored-by: jiaxinWang-metax <[email protected]> Co-authored-by: MingkunZhang <[email protected]> Co-authored-by: chezhang <[email protected]> Co-authored-by: zhang-chenyi <[email protected]> Co-authored-by: ZhouDuan <[email protected]> * 【metax】modify cmake for warpctc and warprnnt (#17) * modify cmake for warpctc and warprnnt * modify conv for tf32 and fp32 * modify conv kernel * [metax]modify library to static library (#24) * modify cmake for warpctc and warprnnt * modify conv for tf32 and fp32 * modify conv kernel * modify library to static library * [Metax] organize documents (#25) * [Metax] fix dgc & mklml compile product path problem * [Metax] update metax_gpu CMakeLists.txt * [Metax] organize documents * [metax]fix_code style and index_elementwise_put_kernel (#27) * [Metax_change_ut] * fix sum&collect_fpn_proposals op register * modify profile * [Metax] fix paddle bug replace 'MoeGradDispatchKernel' to 'MoeGateDispatchKernel' * [Metax] register bce_loss_grad & bce_loss & index_add_grad kernels * [Metax] con2d_grad use gpudnn * blas handle support * [Metax] register some kernels & update CMakeLists * [Metax] fix metax unittest fail * [Metax] add group_norm & label_smooth kernel and update matmul kernel * [Metax] fix rmsprop kernel register and add meshgrid & meshgrid_grad kernel register * add test * add test * [test] chang the logic of workspace_host in cholesky_kernel_register alloc(cpuplace,size), test pass alloc(cpuplace, size, stream), crash * [Metax] fix compile fail * Revert "[Metax] fix compile fail" This reverts commit 83bc87f686227962b0262e044225c6ed5507b824. * [Metax] fix compile fail by 'conv_transpose_grad_kernel_impl.h' * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] con2d_grad use gpudnn * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] change_patch * [Metax] update unit test CMakeLists.txt * [Metax] update unit test CMakeLists.txt * [feature] add unique_consecutive kernel * [metax] add some kernel * [metax] add some kernel * [Metax] register baddbmm kernel & update blas api * [Metax] register baddbmm kernel & update blas api * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [feature] add add unique_consecutive kernel.cu * [fix] fix some test case due to missing op register * [fix] fix some fail text * [metax]fix lu eigvalshsqueeze rnn kernel * [metax]fix lu eigvalshsqueeze rnn kernel * add and fix some kernels * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [Metax] fix conflict * [Metax] adapt to paddle-cpu-20250901 & resolve the issue of 'test_elementwise_mul_op_metax' failure * [Metax] update repeat_interleave kernel & ignore max op test * [metax]fix lu eigvalshsqueeze rnn kernel * [metax] chang patch fix copy * [metax] chang patch fix copy * [Metax] update metax_gpu unit test * [Metax] fix test CMakeList.txt * [metax]change_cupti_and_fix_softmax * [metax]change_patch * [metax]change_patch * [metax] updata_qr_kernel * [metax] updata_qr_kernel * [Metax] fix cufft and fix some blas kernel apply * [metax] fix bug * [Metax] add github action * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]fix_code style and index_elementwise_put_kernel --------- Co-authored-by: Mingkun.Zhang <[email protected]> Co-authored-by: metax666 <[email protected]> Co-authored-by: jiaxinWang-metax <[email protected]> Co-authored-by: MingkunZhang <[email protected]> Co-authored-by: chezhang <[email protected]> Co-authored-by: zhang-chenyi <[email protected]> Co-authored-by: ZhouDuan <[email protected]> * change_build_917 (#29) * [Metax_change_ut] * fix sum&collect_fpn_proposals op register * modify profile * [Metax] fix paddle bug replace 'MoeGradDispatchKernel' to 'MoeGateDispatchKernel' * [Metax] register bce_loss_grad & bce_loss & index_add_grad kernels * [Metax] con2d_grad use gpudnn * blas handle support * [Metax] register some kernels & update CMakeLists * [Metax] fix metax unittest fail * [Metax] add group_norm & label_smooth kernel and update matmul kernel * [Metax] fix rmsprop kernel register and add meshgrid & meshgrid_grad kernel register * add test * add test * [test] chang the logic of workspace_host in cholesky_kernel_register alloc(cpuplace,size), test pass alloc(cpuplace, size, stream), crash * [Metax] fix compile fail * Revert "[Metax] fix compile fail" This reverts commit 83bc87f686227962b0262e044225c6ed5507b824. * [Metax] fix compile fail by 'conv_transpose_grad_kernel_impl.h' * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] con2d_grad use gpudnn * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] change_patch * [Metax] update unit test CMakeLists.txt * [Metax] update unit test CMakeLists.txt * [feature] add unique_consecutive kernel * [metax] add some kernel * [metax] add some kernel * [Metax] register baddbmm kernel & update blas api * [Metax] register baddbmm kernel & update blas api * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [feature] add add unique_consecutive kernel.cu * [fix] fix some test case due to missing op register * [fix] fix some fail text * [metax]fix lu eigvalshsqueeze rnn kernel * [metax]fix lu eigvalshsqueeze rnn kernel * add and fix some kernels * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [Metax] fix conflict * [Metax] adapt to paddle-cpu-20250901 & resolve the issue of 'test_elementwise_mul_op_metax' failure * [Metax] update repeat_interleave kernel & ignore max op test * [metax]fix lu eigvalshsqueeze rnn kernel * [metax] chang patch fix copy * [metax] chang patch fix copy * [Metax] update metax_gpu unit test * [Metax] fix test CMakeList.txt * [metax]change_cupti_and_fix_softmax * [metax]change_patch * [metax]change_patch * [metax] updata_qr_kernel * [metax] updata_qr_kernel * [Metax] fix cufft and fix some blas kernel apply * [metax] fix bug * [Metax] add github action * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]fix_code style and index_elementwise_put_kernel * [metax]change_build --------- Co-authored-by: Mingkun.Zhang <[email protected]> Co-authored-by: metax666 <[email protected]> Co-authored-by: jiaxinWang-metax <[email protected]> Co-authored-by: MingkunZhang <[email protected]> Co-authored-by: chezhang <[email protected]> Co-authored-by: zhang-chenyi <[email protected]> Co-authored-by: ZhouDuan <[email protected]> * chang_build (#30) * [Metax_change_ut] * fix sum&collect_fpn_proposals op register * modify profile * [Metax] fix paddle bug replace 'MoeGradDispatchKernel' to 'MoeGateDispatchKernel' * [Metax] register bce_loss_grad & bce_loss & index_add_grad kernels * [Metax] con2d_grad use gpudnn * blas handle support * [Metax] register some kernels & update CMakeLists * [Metax] fix metax unittest fail * [Metax] add group_norm & label_smooth kernel and update matmul kernel * [Metax] fix rmsprop kernel register and add meshgrid & meshgrid_grad kernel register * add test * add test * [test] chang the logic of workspace_host in cholesky_kernel_register alloc(cpuplace,size), test pass alloc(cpuplace, size, stream), crash * [Metax] fix compile fail * Revert "[Metax] fix compile fail" This reverts commit 83bc87f686227962b0262e044225c6ed5507b824. * [Metax] fix compile fail by 'conv_transpose_grad_kernel_impl.h' * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] con2d_grad use gpudnn * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] change_patch * [Metax] update unit test CMakeLists.txt * [Metax] update unit test CMakeLists.txt * [feature] add unique_consecutive kernel * [metax] add some kernel * [metax] add some kernel * [Metax] register baddbmm kernel & update blas api * [Metax] register baddbmm kernel & update blas api * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [feature] add add unique_consecutive kernel.cu * [fix] fix some test case due to missing op register * [fix] fix some fail text * [metax]fix lu eigvalshsqueeze rnn kernel * [metax]fix lu eigvalshsqueeze rnn kernel * add and fix some kernels * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [Metax] fix conflict * [Metax] adapt to paddle-cpu-20250901 & resolve the issue of 'test_elementwise_mul_op_metax' failure * [Metax] update repeat_interleave kernel & ignore max op test * [metax]fix lu eigvalshsqueeze rnn kernel * [metax] chang patch fix copy * [metax] chang patch fix copy * [Metax] update metax_gpu unit test * [Metax] fix test CMakeList.txt * [metax]change_cupti_and_fix_softmax * [metax]change_patch * [metax]change_patch * [metax] updata_qr_kernel * [metax] updata_qr_kernel * [Metax] fix cufft and fix some blas kernel apply * [metax] fix bug * [Metax] add github action * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]fix_code style and index_elementwise_put_kernel * [metax]change_build * [metax]change_build --------- Co-authored-by: Mingkun.Zhang <[email protected]> Co-authored-by: metax666 <[email protected]> Co-authored-by: jiaxinWang-metax <[email protected]> Co-authored-by: MingkunZhang <[email protected]> Co-authored-by: chezhang <[email protected]> Co-authored-by: zhang-chenyi <[email protected]> Co-authored-by: ZhouDuan <[email protected]> * [metax]modify kernel (#31) * modify cmake for warpctc and warprnnt * modify conv for tf32 and fp32 * modify conv kernel * modify library to static library * modify kernel * change_metax_work (#32) * [Metax_change_ut] * fix sum&collect_fpn_proposals op register * modify profile * [Metax] fix paddle bug replace 'MoeGradDispatchKernel' to 'MoeGateDispatchKernel' * [Metax] register bce_loss_grad & bce_loss & index_add_grad kernels * [Metax] con2d_grad use gpudnn * blas handle support * [Metax] register some kernels & update CMakeLists * [Metax] fix metax unittest fail * [Metax] add group_norm & label_smooth kernel and update matmul kernel * [Metax] fix rmsprop kernel register and add meshgrid & meshgrid_grad kernel register * add test * add test * [test] chang the logic of workspace_host in cholesky_kernel_register alloc(cpuplace,size), test pass alloc(cpuplace, size, stream), crash * [Metax] fix compile fail * Revert "[Metax] fix compile fail" This reverts commit 83bc87f686227962b0262e044225c6ed5507b824. * [Metax] fix compile fail by 'conv_transpose_grad_kernel_impl.h' * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] con2d_grad use gpudnn * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] change_patch * [Metax] update unit test CMakeLists.txt * [Metax] update unit test CMakeLists.txt * [feature] add unique_consecutive kernel * [metax] add some kernel * [metax] add some kernel * [Metax] register baddbmm kernel & update blas api * [Metax] register baddbmm kernel & update blas api * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [feature] add add unique_consecutive kernel.cu * [fix] fix some test case due to missing op register * [fix] fix some fail text * [metax]fix lu eigvalshsqueeze rnn kernel * [metax]fix lu eigvalshsqueeze rnn kernel * add and fix some kernels * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [Metax] fix conflict * [Metax] adapt to paddle-cpu-20250901 & resolve the issue of 'test_elementwise_mul_op_metax' failure * [Metax] update repeat_interleave kernel & ignore max op test * [metax]fix lu eigvalshsqueeze rnn kernel * [metax] chang patch fix copy * [metax] chang patch fix copy * [Metax] update metax_gpu unit test * [Metax] fix test CMakeList.txt * [metax]change_cupti_and_fix_softmax * [metax]change_patch * [metax]change_patch * [metax] updata_qr_kernel * [metax] updata_qr_kernel * [Metax] fix cufft and fix some blas kernel apply * [metax] fix bug * [Metax] add github action * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]fix_code style and index_elementwise_put_kernel * [metax]change_build * [metax]change_build * change_metax_work * change_metax_work --------- Co-authored-by: Mingkun.Zhang <[email protected]> Co-authored-by: metax666 <[email protected]> Co-authored-by: jiaxinWang-metax <[email protected]> Co-authored-by: MingkunZhang <[email protected]> Co-authored-by: chezhang <[email protected]> Co-authored-by: zhang-chenyi <[email protected]> Co-authored-by: ZhouDuan <[email protected]> * change_build (#33) * [Metax_change_ut] * fix sum&collect_fpn_proposals op register * modify profile * [Metax] fix paddle bug replace 'MoeGradDispatchKernel' to 'MoeGateDispatchKernel' * [Metax] register bce_loss_grad & bce_loss & index_add_grad kernels * [Metax] con2d_grad use gpudnn * blas handle support * [Metax] register some kernels & update CMakeLists * [Metax] fix metax unittest fail * [Metax] add group_norm & label_smooth kernel and update matmul kernel * [Metax] fix rmsprop kernel register and add meshgrid & meshgrid_grad kernel register * add test * add test * [test] chang the logic of workspace_host in cholesky_kernel_register alloc(cpuplace,size), test pass alloc(cpuplace, size, stream), crash * [Metax] fix compile fail * Revert "[Metax] fix compile fail" This reverts commit 83bc87f686227962b0262e044225c6ed5507b824. * [Metax] fix compile fail by 'conv_transpose_grad_kernel_impl.h' * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] con2d_grad use gpudnn * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] change_patch * [Metax] update unit test CMakeLists.txt * [Metax] update unit test CMakeLists.txt * [feature] add unique_consecutive kernel * [metax] add some kernel * [metax] add some kernel * [Metax] register baddbmm kernel & update blas api * [Metax] register baddbmm kernel & update blas api * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [feature] add add unique_consecutive kernel.cu * [fix] fix some test case due to missing op register * [fix] fix some fail text * [metax]fix lu eigvalshsqueeze rnn kernel * [metax]fix lu eigvalshsqueeze rnn kernel * add and fix some kernels * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [Metax] fix conflict * [Metax] adapt to paddle-cpu-20250901 & resolve the issue of 'test_elementwise_mul_op_metax' failure * [Metax] update repeat_interleave kernel & ignore max op test * [metax]fix lu eigvalshsqueeze rnn kernel * [metax] chang patch fix copy * [metax] chang patch fix copy * [Metax] update metax_gpu unit test * [Metax] fix test CMakeList.txt * [metax]change_cupti_and_fix_softmax * [metax]change_patch * [metax]change_patch * [metax] updata_qr_kernel * [metax] updata_qr_kernel * [Metax] fix cufft and fix some blas kernel apply * [metax] fix bug * [Metax] add github action * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]fix_code style and index_elementwise_put_kernel * [metax]change_build * [metax]change_build * change_metax_work * change_metax_work * change_metax_work --------- Co-authored-by: Mingkun.Zhang <[email protected]> Co-authored-by: metax666 <[email protected]> Co-authored-by: jiaxinWang-metax <[email protected]> Co-authored-by: MingkunZhang <[email protected]> Co-authored-by: chezhang <[email protected]> Co-authored-by: zhang-chenyi <[email protected]> Co-authored-by: ZhouDuan <[email protected]> * [metax] modify fused_bias_dropout_residual_layer_norm (#34) * modify cmake for warpctc and warprnnt * modify conv for tf32 and fp32 * modify conv kernel * modify library to static library * modify kernel * modify fused_bias_dropout_residual_layer_norm * change_build (#35) * [Metax_change_ut] * fix sum&collect_fpn_proposals op register * modify profile * [Metax] fix paddle bug replace 'MoeGradDispatchKernel' to 'MoeGateDispatchKernel' * [Metax] register bce_loss_grad & bce_loss & index_add_grad kernels * [Metax] con2d_grad use gpudnn * blas handle support * [Metax] register some kernels & update CMakeLists * [Metax] fix metax unittest fail * [Metax] add group_norm & label_smooth kernel and update matmul kernel * [Metax] fix rmsprop kernel register and add meshgrid & meshgrid_grad kernel register * add test * add test * [test] chang the logic of workspace_host in cholesky_kernel_register alloc(cpuplace,size), test pass alloc(cpuplace, size, stream), crash * [Metax] fix compile fail * Revert "[Metax] fix compile fail" This reverts commit 83bc87f686227962b0262e044225c6ed5507b824. * [Metax] fix compile fail by 'conv_transpose_grad_kernel_impl.h' * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] con2d_grad use gpudnn * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] change_patch * [Metax] update unit test CMakeLists.txt * [Metax] update unit test CMakeLists.txt * [feature] add unique_consecutive kernel * [metax] add some kernel * [metax] add some kernel * [Metax] register baddbmm kernel & update blas api * [Metax] register baddbmm kernel & update blas api * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [feature] add add unique_consecutive kernel.cu * [fix] fix some test case due to missing op register * [fix] fix some fail text * [metax]fix lu eigvalshsqueeze rnn kernel * [metax]fix lu eigvalshsqueeze rnn kernel * add and fix some kernels * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [Metax] fix conflict * [Metax] adapt to paddle-cpu-20250901 & resolve the issue of 'test_elementwise_mul_op_metax' failure * [Metax] update repeat_interleave kernel & ignore max op test * [metax]fix lu eigvalshsqueeze rnn kernel * [metax] chang patch fix copy * [metax] chang patch fix copy * [Metax] update metax_gpu unit test * [Metax] fix test CMakeList.txt * [metax]change_cupti_and_fix_softmax * [metax]change_patch * [metax]change_patch * [metax] updata_qr_kernel * [metax] updata_qr_kernel * [Metax] fix cufft and fix some blas kernel apply * [metax] fix bug * [Metax] add github action * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]fix_code style and index_elementwise_put_kernel * [metax]change_build * [metax]change_build * change_metax_work * change_metax_work * change_metax_work * change_metax_work --------- Co-authored-by: Mingkun.Zhang <[email protected]> Co-authored-by: metax666 <[email protected]> Co-authored-by: jiaxinWang-metax <[email protected]> Co-authored-by: MingkunZhang <[email protected]> Co-authored-by: chezhang <[email protected]> Co-authored-by: zhang-chenyi <[email protected]> Co-authored-by: ZhouDuan <[email protected]> * change_build (#36) * [Metax_change_ut] * fix sum&collect_fpn_proposals op register * modify profile * [Metax] fix paddle bug replace 'MoeGradDispatchKernel' to 'MoeGateDispatchKernel' * [Metax] register bce_loss_grad & bce_loss & index_add_grad kernels * [Metax] con2d_grad use gpudnn * blas handle support * [Metax] register some kernels & update CMakeLists * [Metax] fix metax unittest fail * [Metax] add group_norm & label_smooth kernel and update matmul kernel * [Metax] fix rmsprop kernel register and add meshgrid & meshgrid_grad kernel register * add test * add test * [test] chang the logic of workspace_host in cholesky_kernel_register alloc(cpuplace,size), test pass alloc(cpuplace, size, stream), crash * [Metax] fix compile fail * Revert "[Metax] fix compile fail" This reverts commit 83bc87f686227962b0262e044225c6ed5507b824. * [Metax] fix compile fail by 'conv_transpose_grad_kernel_impl.h' * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] con2d_grad use gpudnn * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] change_patch * [Metax] update unit test CMakeLists.txt * [Metax] update unit test CMakeLists.txt * [feature] add unique_consecutive kernel * [metax] add some kernel * [metax] add some kernel * [Metax] register baddbmm kernel & update blas api * [Metax] register baddbmm kernel & update blas api * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [feature] add add unique_consecutive kernel.cu * [fix] fix some test case due to missing op register * [fix] fix some fail text * [metax]fix lu eigvalshsqueeze rnn kernel * [metax]fix lu eigvalshsqueeze rnn kernel * add and fix some kernels * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [Metax] fix conflict * [Metax] adapt to paddle-cpu-20250901 & resolve the issue of 'test_elementwise_mul_op_metax' failure * [Metax] update repeat_interleave kernel & ignore max op test * [metax]fix lu eigvalshsqueeze rnn kernel * [metax] chang patch fix copy * [metax] chang patch fix copy * [Metax] update metax_gpu unit test * [Metax] fix test CMakeList.txt * [metax]change_cupti_and_fix_softmax * [metax]change_patch * [metax]change_patch * [metax] updata_qr_kernel * [metax] updata_qr_kernel * [Metax] fix cufft and fix some blas kernel apply * [metax] fix bug * [Metax] add github action * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]fix_code style and index_elementwise_put_kernel * [metax]change_build * [metax]change_build * change_metax_work * change_metax_work * change_metax_work * change_metax_work * change_metax_work --------- Co-authored-by: Mingkun.Zhang <[email protected]> Co-authored-by: metax666 <[email protected]> Co-authored-by: jiaxinWang-metax <[email protected]> Co-authored-by: MingkunZhang <[email protected]> Co-authored-by: chezhang <[email protected]> Co-authored-by: zhang-chenyi <[email protected]> Co-authored-by: ZhouDuan <[email protected]> * change_warpctc.cmake (#38) * change_warpctc.cmake * change_warpctc.cmake (#39) * change warpctc.cmake * test (#40) * test --------- * test_ut (#41) * change_run_ut --------- * tets (#43) * remove_tets --------- * test (#44) * test --------- * [metax] modify compile (#42) * modify cmake for warpctc and warprnnt * modify conv for tf32 and fp32 * modify conv kernel * modify library to static library * modify kernel * modify fused_bias_dropout_residual_layer_norm * modify compile * modify blas * [Metax] add log analysis script (#46) * [Metax] fix dgc & mklml compile product path problem * [Metax] update metax_gpu CMakeLists.txt * [Metax] organize documents * [Metax] add log analysis script * add_generate_pb (#47) * add_generate_pb --------- * modify blas (#51) * modify cmake for warpctc and warprnnt * modify conv for tf32 and fp32 * modify conv kernel * modify library to static library * modify kernel * modify fused_bias_dropout_residual_layer_norm * modify compile * modify blas * modify blas * modify blas * modify blas * [metax] modify tf32 (#52) * modify cmake for warpctc and warprnnt * modify conv for tf32 and fp32 * modify conv kernel * modify library to static library * modify kernel * modify fused_bias_dropout_residual_layer_norm * modify compile * modify blas * modify blas * modify blas * modify blas * modify context * [Metax] update metax backend CI test (#53) * [Metax] fix dgc & mklml compile product path problem * [Metax] update metax_gpu CMakeLists.txt * [Metax] organize documents * [Metax] add log analysis script * [Metax] update metax backend CI test * [Metax] fix log_analysis.py bug (#54) * [Metax] fix dgc & mklml compile product path problem * [Metax] update metax_gpu CMakeLists.txt * [Metax] organize documents * [Metax] add log analysis script * [Metax] update metax backend CI test * [Metax] fix log_analysis.py bug * [Metax] update metax CI CMakeLists & scripts (#56) * [Metax] fix dgc & mklml compile product path problem * [Metax] update metax_gpu CMakeLists.txt * [Metax] organize documents * [Metax] add log analysis script * [Metax] update metax backend CI test * [Metax] fix log_analysis.py bug * [Metax] update metax CI CMakeLists & scripts * [Metax] fix MatmulKernel problem (#57) * [Metax] fix dgc & mklml compile product path problem * [Metax] update metax_gpu CMakeLists.txt * [Metax] organize documents * [Metax] add log analysis script * [Metax] update metax backend CI test * [Metax] fix log_analysis.py bug * [Metax] update metax CI CMakeLists & scripts * [Metax] fix MatmulKernel problem * [Metax] update metax CI program * [metax]fix paddle bug" (#58) * [metax]fix paddle bug * change—ut (#59) * change_ut * change_ut (#60) * change_ut --------- * change_ut (#63) * change_ut * change_ut --------- * [Metax] add keyword filter in CI CMakeLists.txt (#64) * [Metax] add keyword filter in CI CMakeLists.txt * [Metax] add ignore case list * [metax] modify kernels (#67) * modify cmake for warpctc and warprnnt * modify conv for tf32 and fp32 * modify conv kernel * modify library to static library * modify kernel * modify fused_bias_dropout_residual_layer_norm * modify compile * modify blas * modify blas * modify blas * modify blas * modify context * modify kernels * Fix part of the missing kernel issues (#66) Co-authored-by: root <[email protected]> * [Metax] fix index_elementwise_get kernel (#68) * [Metax] add keyword filter in CI CMakeLists.txt * [Metax] add ignore case list * [Metax] fix phi::backends::gpu::DnnVersion() symbol not found * Revert "[Metax] fix phi::backends::gpu::DnnVersion() symbol not found" This reverts commit 087a9c1240f024210d536e543a2fc55db1175529. * [Metax] fix index_elementwise_get kernel * [metax]fix patch and fix missing kernel (#72) * [metax]fix patch and fix missing kernel * [metax] modify kernels (#73) * modify kernels * [metax] modify kernels (#74) * modify kernels * [metax] link mccl and fix missing kernel (#76) * [metax] link mccl and fix missing kernel * [metax] rename yaml file (#77) * [metax]fix patch and fix missing kernel * [metax] link mccl and fix missing kernel * [metax] rename yaml file --------- * [metax] rm file (#78) * [metax]fix patch and fix missing kernel * [metax] link mccl and fix missing kernel * [metax] rename yaml file * [metax] rm file * [metax] rm file --------- * metax_fix_ci (#79) * [metax] add Rules --------- * [metax] add print tensor (#91) * modify cmake for warpctc and warprnnt * modify conv for tf32 and fp32 * modify conv kernel * modify library to static library * modify kernel * modify fused_bias_dropout_residual_layer_norm * modify compile * modify blas * modify blas * modify blas * modify blas * modify context * modify kernels * modify kernels * modify kernels * add print tensor * [Metax] change_patch (#94) * [metax] change_patch --------- * update paddle (#95) * update paddle --------- * [metax] fix dot error (#96) * [metax] fix dot error --------- * Update metax_work.yaml * [metax]rm opt path and fix activation_kernel bug (#98) * [metax]rm opt path and fix activation_kernel bug --------- * updata_paddle (#99) * updata paddle --------- * [Metax] Fix some tests (#102) * fix some tests * [metax] support wint4 in quantize (#103) * updata_metax (#104) * test * test --------- * updata_metax (#105) * chang_meatx_yaml * chang_meatx_yaml * updata_metax * test * test * test * test --------- * add one test to metax (#107) * [Metax_change_ut] * fix sum&collect_fpn_proposals op register * modify profile * [Metax] fix paddle bug replace 'MoeGradDispatchKernel' to 'MoeGateDispatchKernel' * [Metax] register bce_loss_grad & bce_loss & index_add_grad kernels * [Metax] con2d_grad use gpudnn * blas handle support * [Metax] register some kernels & update CMakeLists * [Metax] fix metax unittest fail * [Metax] add group_norm & label_smooth kernel and update matmul kernel * [Metax] fix rmsprop kernel register and add meshgrid & meshgrid_grad kernel register * add test * add test * [test] chang the logic of workspace_host in cholesky_kernel_register alloc(cpuplace,size), test pass alloc(cpuplace, size, stream), crash * [Metax] fix compile fail * Revert "[Metax] fix compile fail" This reverts commit 83bc87f686227962b0262e044225c6ed5507b824. * [Metax] fix compile fail by 'conv_transpose_grad_kernel_impl.h' * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] con2d_grad use gpudnn * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] change_patch * [Metax] update unit test CMakeLists.txt * [Metax] update unit test CMakeLists.txt * [feature] add unique_consecutive kernel * [metax] add some kernel * [metax] add some kernel * [Metax] register baddbmm kernel & update blas api * [Metax] register baddbmm kernel & update blas api * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [feature] add add unique_consecutive kernel.cu * [fix] fix some test case due to missing op register * [fix] fix some fail text * [metax]fix lu eigvalshsqueeze rnn kernel * [metax]fix lu eigvalshsqueeze rnn kernel * add and fix some kernels * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [Metax] fix conflict * [Metax] adapt to paddle-cpu-20250901 & resolve the issue of 'test_elementwise_mul_op_metax' failure * [Metax] update repeat_interleave kernel & ignore max op test * [metax]fix lu eigvalshsqueeze rnn kernel * [metax] chang patch fix copy * [metax] chang patch fix copy * [Metax] update metax_gpu unit test * [Metax] fix test CMakeList.txt * fix some tests * add one test --------- Co-authored-by: sw <[email protected]> Co-authored-by: duqimeng <[email protected]> Co-authored-by: Mingkun.Zhang <[email protected]> Co-authored-by: metax666 <[email protected]> Co-authored-by: jiaxinWang-metax <[email protected]> Co-authored-by: MingkunZhang <[email protected]> Co-authored-by: chezhang <[email protected]> Co-authored-by: zhang-chenyi <[email protected]> * uodata_metax (#106) * [Metax_change_ut] * fix sum&collect_fpn_proposals op register * modify profile * [Metax] fix paddle bug replace 'MoeGradDispatchKernel' to 'MoeGateDispatchKernel' * [Metax] register bce_loss_grad & bce_loss & index_add_grad kernels * [Metax] con2d_grad use gpudnn * blas handle support * [Metax] register some kernels & update CMakeLists * [Metax] fix metax unittest fail * [Metax] add group_norm & label_smooth kernel and update matmul kernel * [Metax] fix rmsprop kernel register and add meshgrid & meshgrid_grad kernel register * add test * add test * [test] chang the logic of workspace_host in cholesky_kernel_register alloc(cpuplace,size), test pass alloc(cpuplace, size, stream), crash * [Metax] fix compile fail * Revert "[Metax] fix compile fail" This reverts commit 83bc87f686227962b0262e044225c6ed5507b824. * [Metax] fix compile fail by 'conv_transpose_grad_kernel_impl.h' * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] con2d_grad use gpudnn * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] change_patch * [Metax] update unit test CMakeLists.txt * [Metax] update unit test CMakeLists.txt * [feature] add unique_consecutive kernel * [metax] add some kernel * [metax] add some kernel * [Metax] register baddbmm kernel & update blas api * [Metax] register baddbmm kernel & update blas api * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [feature] add add unique_consecutive kernel.cu * [fix] fix some test case due to missing op register * [fix] fix some fail text * [metax]fix lu eigvalshsqueeze rnn kernel * [metax]fix lu eigvalshsqueeze rnn kernel * add and fix some kernels * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [Metax] fix conflict * [Metax] adapt to paddle-cpu-20250901 & resolve the issue of 'test_elementwise_mul_op_metax' failure * [Metax] update repeat_interleave kernel & ignore max op test * [metax]fix lu eigvalshsqueeze rnn kernel * [metax] chang patch fix copy * [metax] chang patch fix copy * [Metax] update metax_gpu unit test * [Metax] fix test CMakeList.txt * [metax]change_cupti_and_fix_softmax * [metax]change_patch * [metax]change_patch * [metax] updata_qr_kernel * [metax] updata_qr_kernel * [Metax] fix cufft and fix some blas kernel apply * [metax] fix bug * [Metax] add github action * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]fix_code style and index_elementwise_put_kernel * [metax]change_build * [metax]change_build * change_metax_work * change_metax_work * change_metax_work * change_metax_work * change_metax_work * change_warpctc.cmake * change warpctc.cmake * test * change_run_ut * remove_tets * test * add_generate_pb * [metax]fix paddle bug * change_ut * change_ut * change_ut * [metax]fix patch and fix missing kernel * [metax] link mccl and fix missing kernel * [metax] rename yaml file * [metax] rm file * [metax] rm file * [metax] add Rules * [metax] change_patch * update paddle * [metax] fix dot error * [metax]rm opt path and fix activation_kernel bug * updata paddle * chang_meatx_yaml * chang_meatx_yaml * updata_metax * test * test * test * test * test * test * test * test * test * test * test * test --------- Co-authored-by: Mingkun.Zhang <[email protected]> Co-authored-by: metax666 <[email protected]> Co-authored-by: jiaxinWang-metax <[email protected]> Co-authored-by: MingkunZhang <[email protected]> Co-authored-by: chezhang <[email protected]> Co-authored-by: zhang-chenyi <[email protected]> Co-authored-by: ZhouDuan <[email protected]> * updata eigen_and fix_bug (#109) * [Metax_change_ut] * fix sum&collect_fpn_proposals op register * modify profile * [Metax] fix paddle bug replace 'MoeGradDispatchKernel' to 'MoeGateDispatchKernel' * [Metax] register bce_loss_grad & bce_loss & index_add_grad kernels * [Metax] con2d_grad use gpudnn * blas handle support * [Metax] register some kernels & update CMakeLists * [Metax] fix metax unittest fail * [Metax] add group_norm & label_smooth kernel and update matmul kernel * [Metax] fix rmsprop kernel register and add meshgrid & meshgrid_grad kernel register * add test * add test * [test] chang the logic of workspace_host in cholesky_kernel_register alloc(cpuplace,size), test pass alloc(cpuplace, size, stream), crash * [Metax] fix compile fail * Revert "[Metax] fix compile fail" This reverts commit 83bc87f686227962b0262e044225c6ed5507b824. * [Metax] fix compile fail by 'conv_transpose_grad_kernel_impl.h' * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] con2d_grad use gpudnn * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] change_patch * [Metax] update unit test CMakeLists.txt * [Metax] update unit test CMakeLists.txt * [feature] add unique_consecutive kernel * [metax] add some kernel * [metax] add some kernel * [Metax] register baddbmm kernel & update blas api * [Metax] register baddbmm kernel & update blas api * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [feature] add add unique_consecutive kernel.cu * [fix] fix some test case due to missing op register * [fix] fix some fail text * [metax]fix lu eigvalshsqueeze rnn kernel * [metax]fix lu eigvalshsqueeze rnn kernel * add and fix some kernels * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [Metax] fix conflict * [Metax] adapt to paddle-cpu-20250901 & resolve the issue of 'test_elementwise_mul_op_metax' failure * [Metax] update repeat_interleave kernel & ignore max op test * [metax]fix lu eigvalshsqueeze rnn kernel * [metax] chang patch fix copy * [metax] chang patch fix copy * [Metax] update metax_gpu unit test * [Metax] fix test CMakeList.txt * [metax]change_cupti_and_fix_softmax * [metax]change_patch * [metax]change_patch * [metax] updata_qr_kernel * [metax] updata_qr_kernel * [Metax] fix cufft and fix some blas kernel apply * [metax] fix bug * [Metax] add github action * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]fix_code style and index_elementwise_put_kernel * [metax]change_build * [metax]change_build * change_metax_work * change_metax_work * change_metax_work * change_metax_work * change_metax_work * change_warpctc.cmake * change warpctc.cmake * test * change_run_ut * remove_tets * test * add_generate_pb * [metax]fix paddle bug * change_ut * change_ut * change_ut * [metax]fix patch and fix missing kernel * [metax] link mccl and fix missing kernel * [metax] rename yaml file * [metax] rm file * [metax] rm file * [metax] add Rules * [metax] change_patch * update paddle * [metax] fix dot error * [metax]rm opt path and fix activation_kernel bug * updata paddle * chang_meatx_yaml * chang_meatx_yaml * updata_metax * test * test * test * test * test * test * test * test * test * test * test * test * updata_enigen --------- Co-authored-by: Mingkun.Zhang <[email protected]> Co-authored-by: metax666 <[email protected]> Co-authored-by: jiaxinWang-metax <[email protected]> Co-authored-by: MingkunZhang <[email protected]> Co-authored-by: chezhang <[email protected]> Co-authored-by: zhang-chenyi <[email protected]> Co-authored-by: ZhouDuan <[email protected]> * updata paddle (#110) * [Metax_change_ut] * fix sum&collect_fpn_proposals op register * modify profile * [Metax] fix paddle bug replace 'MoeGradDispatchKernel' to 'MoeGateDispatchKernel' * [Metax] register bce_loss_grad & bce_loss & index_add_grad kernels * [Metax] con2d_grad use gpudnn * blas handle support * [Metax] register some kernels & update CMakeLists * [Metax] fix metax unittest fail * [Metax] add group_norm & label_smooth kernel and update matmul kernel * [Metax] fix rmsprop kernel register and add meshgrid & meshgrid_grad kernel register * add test * add test * [test] chang the logic of workspace_host in cholesky_kernel_register alloc(cpuplace,size), test pass alloc(cpuplace, size, stream), crash * [Metax] fix compile fail * Revert "[Metax] fix compile fail" This reverts commit 83bc87f686227962b0262e044225c6ed5507b824. * [Metax] fix compile fail by 'conv_transpose_grad_kernel_impl.h' * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] con2d_grad use gpudnn * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] change_patch * [Metax] update unit test CMakeLists.txt * [Metax] update unit test CMakeLists.txt * [feature] add unique_consecutive kernel * [metax] add some kernel * [metax] add some kernel * [Metax] register baddbmm kernel & update blas api * [Metax] register baddbmm kernel & update blas api * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [feature] add add unique_consecutive kernel.cu * [fix] fix some test case due to missing op register * [fix] fix some fail text * [metax]fix lu eigvalshsqueeze rnn kernel * [metax]fix lu eigvalshsqueeze rnn kernel * add and fix some kernels * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [Metax] fix conflict * [Metax] adapt to paddle-cpu-20250901 & resolve the issue of 'test_elementwise_mul_op_metax' failure * [Metax] update repeat_interleave kernel & ignore max op test * [metax]fix lu eigvalshsqueeze rnn kernel * [metax] chang patch fix copy * [metax] chang patch fix copy * [Metax] update metax_gpu unit test * [Metax] fix test CMakeList.txt * [metax]change_cupti_and_fix_softmax * [metax]change_patch * [metax]change_patch * [metax] updata_qr_kernel * [metax] updata_qr_kernel * [Metax] fix cufft and fix some blas kernel apply * [metax] fix bug * [Metax] add github action * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]fix_code style and index_elementwise_put_kernel * [metax]change_build * [metax]change_build * change_metax_work * change_metax_work * change_metax_work * change_metax_work * change_metax_work * change_warpctc.cmake * change warpctc.cmake * test * change_run_ut * remove_tets * test * add_generate_pb * [metax]fix paddle bug * change_ut * change_ut * change_ut * [metax]fix patch and fix missing kernel * [metax] link mccl and fix missing kernel * [metax] rename yaml file * [metax] rm file * [metax] rm file * [metax] add Rules * [metax] change_patch * update paddle * [metax] fix dot error * [metax]rm opt path and fix activation_kernel bug * updata paddle * chang_meatx_yaml * chang_meatx_yaml * updata_metax * test * test * test * test * test * test * test * test * test * test * test * test * updata_enigen * updata_paddle --------- Co-authored-by: Mingkun.Zhang <[email protected]> Co-authored-by: metax666 <[email protected]> Co-authored-by: jiaxinWang-metax <[email protected]> Co-authored-by: MingkunZhang <[email protected]> Co-authored-by: chezhang <[email protected]> Co-authored-by: zhang-chenyi <[email protected]> Co-authored-by: ZhouDuan <[email protected]> * test * [metax] modify kernels (#117) * modify kernels * modify kernels * fix activation_grad kernel (#118) * [Metax_change_ut] * fix sum&collect_fpn_proposals op register * modify profile * [Metax] fix paddle bug replace 'MoeGradDispatchKernel' to 'MoeGateDispatchKernel' * [Metax] register bce_loss_grad & bce_loss & index_add_grad kernels * [Metax] con2d_grad use gpudnn * blas handle support * [Metax] register some kernels & update CMakeLists * [Metax] fix metax unittest fail * [Metax] add group_norm & label_smooth kernel and update matmul kernel * [Metax] fix rmsprop kernel register and add meshgrid & meshgrid_grad kernel register * add test * add test * [test] chang the logic of workspace_host in cholesky_kernel_register alloc(cpuplace,size), test pass alloc(cpuplace, size, stream), crash * [Metax] fix compile fail * Revert "[Metax] fix compile fail" This reverts commit 83bc87f686227962b0262e044225c6ed5507b824. * [Metax] fix compile fail by 'conv_transpose_grad_kernel_impl.h' * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] con2d_grad use gpudnn * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] change_patch * [Metax] update unit test CMakeLists.txt * [Metax] update unit test CMakeLists.txt * [feature] add unique_consecutive kernel * [metax] add some kernel * [metax] add some kernel * [Metax] register baddbmm kernel & update blas api * [Metax] register baddbmm kernel & update blas api * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [feature] add add unique_consecutive kernel.cu * [fix] fix some test case due to missing op register * [fix] fix some fail text * [metax]fix lu eigvalshsqueeze rnn kernel * [metax]fix lu eigvalshsqueeze rnn kernel * add and fix some kernels * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [Metax] fix conflict * [Metax] adapt to paddle-cpu-20250901 & resolve the issue of 'test_elementwise_mul_op_metax' failure * [Metax] update repeat_interleave kernel & ignore max op test * [metax]fix lu eigvalshsqueeze rnn kernel * [metax] chang patch fix copy * [metax] chang patch fix copy * [Metax] update metax_gpu unit test * [Metax] fix test CMakeList.txt * fix some tests * add one test * fix one kernel --------- Co-authored-by: sw <[email protected]> Co-authored-by: duqimeng <[email protected]> Co-authored-by: Mingkun.Zhang <[email protected]> Co-authored-by: metax666 <[email protected]> Co-authored-by: jiaxinWang-metax <[email protected]> Co-authored-by: MingkunZhang <[email protected]> Co-authored-by: chezhang <[email protected]> Co-authored-by: zhang-chenyi <[email protected]> * updata flag_and_fix_activation * updata flag_and_fix_activation * updataignore --------- * updata_patch (#120) * updata_patch --------- * Update Paddle submodule to latest develop (#121) Co-authored-by: tianshuo78520a <[email protected]> * [metax] modify kernels (#122) * modify kernels * [Metax] fix weight_quant & weight_only_linear bug (#125) * [Metax] fix weight_quant & weight_only_linear bug * fix and add some kernels (#126) * fix and add some kernels * [Metax] fix 'WeightQuantizeKernel' wint4 branch (#133) * [Metax] fix 'WeightQuantizeKernel' wint4 branch * [Metax] add quanted weight layout transformation using CPU programming (#135) * [Metax] adjust quanted weight layout transformation * [Metax] add quanted weight layout transformation using GPU programming (#136) * [Metax] add quanted weight layout transformation using GPU programming * [Metax] updata_softmax (#138) * updata_softmax * udata patch (#139) * updata_patch --------- * [Metax] optimize wint4 quantization implementation (#140) * [Metax] optimize wint4 quantization implementation * change_flag (#141) * change_flag * [Metax] register fused_fc_elementwise_layernorm kernel (#143) * [Metax] register fused_fc_elementwise_layernorm kernel * updata paddle * [Metax] add private CI (#144) * [Metax] add private CI * [Metax] add Upload (#145) * [Metax] add Upload * test (#154) * ReRun CI (#150) * [metax]fix collect_fpn_proposals (#157) * [Metax_change_ut] * fix sum&collect_fpn_proposals op register * modify profile * [Metax] fix paddle bug replace 'MoeGradDispatchKernel' to 'MoeGateDispatchKernel' * [Metax] register bce_loss_grad & bce_loss & index_add_grad kernels * [Metax] con2d_grad use gpudnn * blas handle support * [Metax] register some kernels & update CMakeLists * [Metax] fix metax unittest fail * [Metax] add group_norm & label_smooth kernel and update matmul kernel * [Metax] fix rmsprop kernel register and add meshgrid & meshgrid_grad kernel register * add test * add test * [test] chang the logic of workspace_host in cholesky_kernel_register alloc(cpuplace,size), test pass alloc(cpuplace, size, stream), crash * [Metax] fix compile fail * Revert "[Metax] fix compile fail" This reverts commit 83bc87f686227962b0262e044225c6ed5507b824. * [Metax] fix compile fail by 'conv_transpose_grad_kernel_impl.h' * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] con2d_grad use gpudnn * [Metax]fix bug and add qr lstsq logsoftmax * [Metax] change_patch * [Metax] update unit test CMakeLists.txt * [Metax] update unit test CMakeLists.txt * [feature] add unique_consecutive kernel * [metax] add some kernel * [metax] add some kernel * [Metax] register baddbmm kernel & update blas api * [Metax] register baddbmm kernel & update blas api * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [feature] add add unique_consecutive kernel.cu * [fix] fix some test case due to missing op register * [fix] fix some fail text * [metax]fix lu eigvalshsqueeze rnn kernel * [metax]fix lu eigvalshsqueeze rnn kernel * add and fix some kernels * [Metax] register deformable_conv kernel & fix 'ModulatedDeformableCol2imCoord' symbol undefined * [Metax] fix conflict * [Metax] adapt to paddle-cpu-20250901 & resolve the issue of 'test_elementwise_mul_op_metax' failure * [Metax] update repeat_interleave kernel & ignore max op test * [metax]fix lu eigvalshsqueeze rnn kernel * [metax] chang patch fix copy * [metax] chang patch fix copy * [Metax] update metax_gpu unit test * [Metax] fix test CMakeList.txt * [metax]change_cupti_and_fix_softmax * [metax]change_patch * [metax]change_patch * [metax] updata_qr_kernel * [metax] updata_qr_kernel * [Metax] fix cufft and fix some blas kernel apply * [metax] fix bug * [Metax] add github action * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]chaneg build * [metax]fix_code style and index_elementwise_put_kernel * [metax]change_build * [metax]change_build * change_metax_work * change_metax_work * change_metax_work * change_metax_work * change_metax_work * change_warpctc.cmake * change warpctc.cmake * test * change_run_ut * remove_tets * test * add_generate_pb * [metax]fix paddle bug * change_ut * change_ut * change_ut * [metax]fix patch and fix missing kernel * [metax] link mccl and fix missing kernel * [metax] rename yaml file * [metax] rm file * [metax] rm file * [metax] add Rules * [metax] change_patch * update paddle * [metax] fix dot error * [metax]rm opt path and fix activation_kernel bug * updata paddle * chang_meatx_yaml * chang_meatx_yaml * updata_metax * test * test * test * test * test * test * test * test * test * test * test * test * updata_enigen * updata_paddle * test * updata ignore * updata_ignore * updata flag_and_fix_activation * updataignore * updata_patch * feat: add gammaln_grad_kernel.cu * updata_softmax * updata_patch * change_flag * [metax] add private CI * [metax] add private CI * [metax] add private CI * [Metax] add private CI * [Metax] add private CI * [Metax] add private CI * [Metax] add private CI * [Metax] add private CI * [Metax] add private CI * [Metax] add Upload * chang yaml * chang ut * updata_paddle * [metax] add schedule * test * [metax]fix collect_fpn_proposals --------- Co-authored-by: Mingkun.Zhang <[email protected]> Co-authored-by: metax666 <[email protected]> Co-authored-by: jiaxinWang-metax <[email protected]> Co-authored-by: MingkunZhang <[email protected]> Co-authored-by: chezhang <[email protected]> Co-authored-by: zhang-chenyi <[email protected]> Co-authored-by: ZhouDuan <[email protected]> Co-authored-by: root <[email protected]> * [Metax]Update version information (#158) * [Metax] update env (#163) * [metax] Timed trigger (#164) * 【Metax】update (#165) * [Metax] fix version (#166) * [Metax] fix nterpolate_grad_kernel (#167) * [metax]fix version.txt (#169) * test (#170) * update yaml (#171) * [Metax]add parameterized (#172) * [Metax] Assign data stream to CUDA (#174) * [Metax] fix CUDA Kernel No.50 (#175) * [metax] change yaml (#176) * [metax] Add some tests for CI (#173) * Change test script to use 8 jobs instead of 16 * 【Metax】fix patch (#178) * [METAX] Modify CI logic (#179) * [Metax] fix patch (#180) * ignore bilinear_interp_v2_op (#181) * change yaml-yml (#182) * test (#183) * rm metax ci (#184) * updata paddle (#185) * updata_paddle (#186) * tets --------- Co-authored-by: chezhang <[email protected]> Co-authored-by: duqimeng <[email protected]> Co-authored-by: Mingkun.Zhang <[email protected]> Co-authored-by: jiaxinWang-metax <[email protected]> Co-authored-by: MingkunZhang <[email protected]> Co-authored-by: zhang-chenyi <[email protected]> Co-authored-by: ZhouDuan <[email protected]> Co-authored-by: Theendlessofhell <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: ZhouDuan <[email protected]> Co-authored-by: sw <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: tianshuo78520a <[email protected]> Co-authored-by: Yuqiang Ge <[email protected]> Co-authored-by: metax666 <[email protected]>

fixes to build on mac os x

8d12842

reyoung added the enhancement label Sep 2, 2016

reyoung closed this Sep 29, 2016

sdujq mentioned this pull request May 5, 2017

paddle exp计算出core （vsExp），浮点计算溢出？ #2024

Closed

xiehongweiscut mentioned this pull request Aug 3, 2017

使用paddle capi在线化部署服务出现core #3207

Closed

fty8788 mentioned this pull request Jan 23, 2018

capi forward函数core： Check failed: size != 0 allocate 0 bytes #7774

Closed

likeqinqin mentioned this pull request Mar 28, 2018

使用libpaddle_capi_shared.so动态链接库，偶发core #9436

Closed

wangshuohuan mentioned this pull request Apr 25, 2018

Seq2Seq网络（对示例网络的部分layer做了修改），报Check failed: size != 0 allocate 0 bytes，输入数据和batch数正常非空，麻烦帮忙看下原因，谢谢 #10187

Closed

xuezhong mentioned this pull request Jun 12, 2018

distribution trainning for transformer core dump #11387

Closed

junjun315 added a commit that referenced this pull request Apr 3, 2019

Merge pull request #17 from PaddlePaddle/develop

1d6dc8c

merge to local

kangzhangqi mentioned this pull request Jun 26, 2019

c++多线程预测给压力之后，输入越界，出core，报warning。 #18337

Closed

songyiting mentioned this pull request Aug 22, 2019

多线程环境下使用fluid在线预估库，释放clone的predictor出core #19361

Closed

xiuechen mentioned this pull request Oct 28, 2019

预测出core，能帮忙看下啥原因不？paddle训练和预测的版本都是v1.3.0 #20859

Closed

lost-little-lamb mentioned this pull request Dec 2, 2019

训练好的模型，c++预测接口出core #21472

Closed

qingqing01 pushed a commit to qingqing01/Paddle that referenced this pull request Apr 30, 2020

Merge pull request PaddlePaddle#17 from LielinJiang/evaluate

4d22fee

add evaluate and predict for model

joey12300 pushed a commit to joey12300/Paddle that referenced this pull request Oct 20, 2020

Merge pull request PaddlePaddle#17 from ZHUI/lstm_cpu

9844fd2

fix dropout bugs

zhangyushan223706 mentioned this pull request Oct 23, 2020

Unable to cast Python instance to C++ type #28227

Closed

PengheLiu mentioned this pull request Oct 27, 2020

C++预测时出 core，不知道原因在哪，同样的模型使用 pathon 预测 api 无问题 #28268

Closed

DemoMoon mentioned this pull request Mar 24, 2021

oneDNN 如何能提升DeepSpeech的语音处理性能 #31838

Closed

inpiredhss pushed a commit to inpiredhss/Paddle that referenced this pull request May 4, 2021

Merge pull request PaddlePaddle#17 from Liwb5/develop

2cb2eaf

fix pybind of FeatureNode

paddle-bot-old bot referenced this pull request Oct 18, 2021

remove two funcs, test=develop

864e602

paddle-bot-old bot referenced this pull request Oct 19, 2021

Merge branch 'op2func_refactor' into dev/op2func_refactor_

4ec1c2c

thisjiang pushed a commit to thisjiang/Paddle that referenced this pull request Oct 28, 2021

Merge pull request PaddlePaddle#17 from Superjomn/fea/add-scheduler

7f64f9c

make scheduler works

paddle-bot-old bot pushed a commit that referenced this pull request Nov 10, 2021

Merge pull request #17 from JiabinYang/run_backward

2119c06

2 python api

wangxicoding pushed a commit to wangxicoding/Paddle that referenced this pull request Dec 9, 2021

Merge pull request PaddlePaddle#17 from LiuChiachi/improve-data-augme…

c8ebb53

…nt-for-distill-lstm Improve data augmentation for distilling Bi-LSTM

Thunderbrook added a commit to Thunderbrook/Paddle that referenced this pull request Jun 7, 2022

Merge pull request PaddlePaddle#17 from Thunderbrook/gpugraph_0602

20a1aa3

adapt uint64 and iterate edge table

jack603047588 referenced this pull request in jack603047588/Paddle Nov 9, 2022

Merge pull request #17 from qingshui/paddlebox

638d3f1

fix ins bug, add mean logloss gpu op

marsbzp mentioned this pull request Jan 11, 2023

多线程调用C++推理库进行RNN算子崩溃问题！！！！ #49737

Open

qizhaoaoe pushed a commit to qizhaoaoe/Paddle that referenced this pull request Mar 3, 2023

add mixed precision doc (PaddlePaddle#17)

cd69252

* add fp16 doc

chlyzzo mentioned this pull request Mar 29, 2023

paddle/fluid/core_avx.so paddle::memory::allocation::MemoryMapFdSet::Clear() #52269

Closed

zyfncg pushed a commit to zyfncg/Paddle that referenced this pull request Aug 22, 2023

Merge pull request PaddlePaddle#17 from zyfncg/drr_opt

1c3ab40

[DRR] Suport Tensor Assgin API in ResultPattern

zmxdream pushed a commit to zmxdream/Paddle that referenced this pull request Nov 4, 2023

Merge pull request PaddlePaddle#17 from tiancaitzp/paddlebox

174455f

Paddlebox

lizexu123 pushed a commit to lizexu123/Paddle that referenced this pull request Feb 23, 2024

Add one-shot NAS API and mnasnet based search space. (PaddlePaddle#17)

02144bc

hanhaowen-mt pushed a commit to hanhaowen-mt/Paddle that referenced this pull request Feb 29, 2024

Merge pull request PaddlePaddle#17 from mthreads/fix_eigen3

ea4eb3e

[MTAI-484] fix(build): update submodule eigen3 commit id

NKNaN pushed a commit to NKNaN/Paddle that referenced this pull request Mar 3, 2024

Merge pull request PaddlePaddle#17 from umhan35/patch-1

ffd008a

README: Change Mandarin to Chinese

Fridge003 pushed a commit to Fridge003/Paddle that referenced this pull request May 8, 2024

Merge pull request PaddlePaddle#17 from Fridge003/multi-down

f84c7f4

support LiftToAnchorPattern for reduce tree pattern

wwbitejotunn pushed a commit to wwbitejotunn/Paddle that referenced this pull request Jun 21, 2024

Add num_splits in flash_attn backward api to support determistic resu…

e6b9d0d

…lt (PaddlePaddle#17)

WAYKEN-TSE pushed a commit to WAYKEN-TSE/Paddle that referenced this pull request Dec 6, 2024

Merge pull request PaddlePaddle#17 from westfish/add_lvdm

a5a680b

add lvdm

zhoutianzi666 pushed a commit to zhoutianzi666/Paddle that referenced this pull request Sep 28, 2025

Merge pull request PaddlePaddle#17 from l1351868270/m2n_dev_lsl_0804

f79b7d2

add v3 api

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fixes to build on mac os x #17

fixes to build on mac os x #17

Uh oh!

djinn commented Sep 1, 2016

Uh oh!

reyoung commented Sep 2, 2016

Uh oh!

reyoung commented Sep 2, 2016

Uh oh!

djinn commented Sep 2, 2016

Uh oh!

kashif commented Sep 2, 2016 •

edited

Loading

Uh oh!

reyoung commented Sep 29, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fixes to build on mac os x #17

fixes to build on mac os x #17

Uh oh!

Conversation

djinn commented Sep 1, 2016

Uh oh!

reyoung commented Sep 2, 2016

Uh oh!

reyoung commented Sep 2, 2016

Uh oh!

djinn commented Sep 2, 2016

Uh oh!

kashif commented Sep 2, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

reyoung commented Sep 29, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kashif commented Sep 2, 2016 •

edited

Loading