Research on the Performance of Coding Calculation for Large-Scale Matrix Multiplication

CLC Number: TP391

    Abstract:

    As machine learning models and data sets grow, a single node can no longer meet the computing and storage requirements of large-scale training. A common solution is to run large-scale machine learning algorithms on distributed clusters. However, the performance of distributed clusters is significantly degraded by stragglers. Recent studies have used coding calculation to mitigate the straggler problem, but the performance of coding calculation schemes for large-scale matrix multiplication has not been fully studied and analyzed. This paper examines the task completion time of coding calculation schemes for large-scale matrix multiplication and also considers the total computation overhead of all nodes participating in the distributed computation. Expressions are given for the task completion time, for the case where the time each worker node takes to complete its computation task follows a uniform distribution, and for the total computing time of the cluster machines. The performance of three coding schemes is compared and analyzed. Experiments compare how different conditions affect the task completion time and the total computing cost of the computing nodes, and a heuristic algorithm is proposed to provide a basis for selecting among the coding calculation schemes.
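
To make the notion of task completion time under uniformly distributed worker times concrete, the following is a minimal sketch, not the paper's model: the abstract does not state its expressions or name its three coding schemes. The sketch assumes a generic (n, k) coded scheme in which each of n workers is given 1/k of the work and the master can finish as soon as any k workers respond, and it assumes each worker's time per unit of work is Uniform(lo, hi). The function names, the parameters `lo` and `hi`, and the cost proxy n/k are all hypothetical choices made for illustration.

```python
import random


def completion_time(n, k, rng, lo=1.0, hi=2.0):
    """One simulated run: time until the k-th fastest of n workers finishes.

    Assumption: each worker holds 1/k of the work, and its time per unit of
    work is drawn independently from Uniform(lo, hi).
    """
    load = 1.0 / k
    times = sorted(load * rng.uniform(lo, hi) for _ in range(n))
    return times[k - 1]


def mean_completion_time(n, k, trials=20000, seed=0, lo=1.0, hi=2.0):
    """Monte Carlo estimate of the expected completion time."""
    rng = random.Random(seed)
    total = sum(completion_time(n, k, rng, lo, hi) for _ in range(trials))
    return total / trials


def expected_completion_time(n, k, lo=1.0, hi=2.0):
    """Closed form under the same assumptions.

    The k-th order statistic of n i.i.d. Uniform(0, 1) variables has mean
    k / (n + 1); shifting and scaling gives the Uniform(lo, hi) case.
    """
    return (1.0 / k) * (lo + (hi - lo) * k / (n + 1))


def assigned_work(n, k):
    """Total work assigned across all workers, in units of the original job.

    A crude proxy for total computing cost: n workers, each with load 1/k.
    """
    return n / k


if __name__ == "__main__":
    n = 10
    for k in (10, 8, 5):  # k == n corresponds to the uncoded case
        sim = mean_completion_time(n, k)
        exact = expected_completion_time(n, k)
        print(f"k={k}: simulated={sim:.4f} expected={exact:.4f} "
              f"assigned work={assigned_work(n, k):.2f}")
```

Setting k equal to n recovers the uncoded case, where the master must wait for every worker. Lowering k lets the master ignore the slowest n - k workers but raises each worker's load and the total assigned work, which is the trade-off between completion time and total computing cost that the abstract describes.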

Citation:

王艳,王希龄,赖宏达,李念爽.面向大规模矩阵乘法的编码计算性能研究[J].华东交通大学学报英文版,2021,38(3):41-51.
Wang Yan, Wang Xiling, Lai Hongda, Li Nianshuang. Research on the Performance of Coding Calculation for Large-Scale Matrix Multiplication[J]. JOURNAL OF EAST CHINA JIAOTONG UNIVERSITY, 2021, 38(3): 41-51.

History
  • Online: August 02, 2021