Performance of large files is really bad #382

@jjmartin

I see your performance metrics in the docs, but the files I'm dealing with can often be up to half a million rows.

I'm importing from a CSV, and as I repeatedly call SetCellValue while processing each line, adding those lines gets slower and slower.

	reader := csv.NewReader(csvBody)
	const headerRow = 1
	row := headerRow
	var headers []string
	rowstart := time.Now()
	for {
		line, err := reader.Read()
		if err == io.EOF {
			break
		} else if err != nil {
			log.Printf("Error Reading CSV: %+v ", err)
			return err
		}
		// The first record holds the column names; it is still written to the sheet below.
		if row == headerRow {
			headers = line
		}
		for cellIndex, cellValue := range line {
			// Build the cell reference (such as "A1") once per cell.
			axis := fmt.Sprintf("%s%d", excelize.ToAlphaString(cellIndex), row)
			// Quantity and price columns are stored as numbers, everything else as text.
			// stringInSlice is a helper defined elsewhere.
			if row > headerRow && stringInSlice(headers[cellIndex], []string{"quantity", "price"}) {
				cellParsedValue, err := strconv.ParseFloat(cellValue, 64)
				if err != nil {
					log.Printf("%s\ncell %d in line %d had invalid value %s", err, cellIndex, row, cellValue)
					return err
				}
				xlsx.SetCellValue("details", axis, cellParsedValue)
			} else {
				xlsx.SetCellValue("details", axis, cellValue)
			}
		}
		// Log the time since the previous log line at progressively sparser row counts.
		// findPow10 is a helper defined elsewhere.
		pow10row := findPow10(row)
		if row <= pow10row*10 && (row%pow10row == 0 || row%10000 == 0) {
			elapsed := time.Since(rowstart)
			log.Printf("Row %d, time elapsed %s", row, elapsed)
			rowstart = time.Now()
		}
		row++
	}
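
The loop above calls two helpers, stringInSlice and findPow10, that are not included in this report. A minimal sketch of what they might look like follows. Both bodies are assumptions, chosen so that the logging condition fires at the row numbers that appear in the log (1 to 10, 20 to 100, 200 to 1000, and so on).

```go
package main

import "fmt"

// stringInSlice reports whether s equals one of the entries in list.
// Hypothetical stand-in for the helper used in the report.
func stringInSlice(s string, list []string) bool {
	for _, item := range list {
		if item == s {
			return true
		}
	}
	return false
}

// findPow10 returns the largest power of ten that is <= n, or 1 for n < 10.
// Hypothetical stand-in for the helper used in the report.
func findPow10(n int) int {
	p := 1
	for n >= 10 {
		n /= 10
		p *= 10
	}
	return p
}

func main() {
	fmt.Println(stringInSlice("price", []string{"quantity", "price"}))
	fmt.Println(findPow10(7), findPow10(4200), findPow10(420000))
}
```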

Describe the results you received:
In the log output below, each elapsed time is measured from the previous log line. You can see that after about 10,000 rows it starts to get really bad: each additional 10,000 rows takes noticeably longer to process than the previous 10,000.

2019/04/08 21:38:37 Row 1, time elapsed 1.505084ms
2019/04/08 21:38:37 Row 2, time elapsed 96.097µs
2019/04/08 21:38:37 Row 3, time elapsed 101.482µs
2019/04/08 21:38:37 Row 4, time elapsed 94.35µs
2019/04/08 21:38:37 Row 5, time elapsed 94.585µs
2019/04/08 21:38:37 Row 6, time elapsed 96.621µs
2019/04/08 21:38:37 Row 7, time elapsed 97.552µs
2019/04/08 21:38:37 Row 8, time elapsed 99.631µs
2019/04/08 21:38:37 Row 9, time elapsed 108.849µs
2019/04/08 21:38:37 Row 10, time elapsed 74.078µs
2019/04/08 21:38:37 Row 20, time elapsed 545.239µs
2019/04/08 21:38:37 Row 30, time elapsed 513.501µs
2019/04/08 21:38:37 Row 40, time elapsed 532.816µs
2019/04/08 21:38:37 Row 50, time elapsed 564.326µs
2019/04/08 21:38:37 Row 60, time elapsed 669.845µs
2019/04/08 21:38:37 Row 70, time elapsed 1.508732ms
2019/04/08 21:38:37 Row 80, time elapsed 666.172µs
2019/04/08 21:38:37 Row 90, time elapsed 594.624µs
2019/04/08 21:38:37 Row 100, time elapsed 630.948µs
2019/04/08 21:38:37 Row 200, time elapsed 7.519094ms
2019/04/08 21:38:37 Row 300, time elapsed 6.852758ms
2019/04/08 21:38:37 Row 400, time elapsed 8.674476ms
2019/04/08 21:38:37 Row 500, time elapsed 8.159781ms
2019/04/08 21:38:37 Row 600, time elapsed 9.568621ms
2019/04/08 21:38:37 Row 700, time elapsed 8.916284ms
2019/04/08 21:38:37 Row 800, time elapsed 10.846477ms
2019/04/08 21:38:37 Row 900, time elapsed 9.282789ms
2019/04/08 21:38:37 Row 1000, time elapsed 12.92103ms
2019/04/08 21:38:37 Row 2000, time elapsed 128.488664ms
2019/04/08 21:38:37 Row 3000, time elapsed 189.107883ms
2019/04/08 21:38:37 Row 4000, time elapsed 278.586948ms
2019/04/08 21:38:38 Row 5000, time elapsed 391.341065ms
2019/04/08 21:38:38 Row 6000, time elapsed 471.830863ms
2019/04/08 21:38:39 Row 7000, time elapsed 530.416468ms
2019/04/08 21:38:39 Row 8000, time elapsed 602.603427ms
2019/04/08 21:38:40 Row 9000, time elapsed 652.277227ms
2019/04/08 21:38:41 Row 10000, time elapsed 729.849772ms
2019/04/08 21:38:52 Row 20000, time elapsed 10.977776474s
2019/04/08 21:39:10 Row 30000, time elapsed 18.55464695s
2019/04/08 21:39:37 Row 40000, time elapsed 26.336721766s
2019/04/08 21:40:10 Row 50000, time elapsed 33.489274657s
2019/04/08 21:40:51 Row 60000, time elapsed 40.729296603s
2019/04/08 21:41:39 Row 70000, time elapsed 47.928431496s
2019/04/08 21:42:34 Row 80000, time elapsed 55.148409674s
2019/04/08 21:43:36 Row 90000, time elapsed 1m2.4532031s
2019/04/08 21:44:46 Row 100000, time elapsed 1m9.608536367s
2019/04/08 21:46:03 Row 110000, time elapsed 1m16.990387462s
2019/04/08 21:47:27 Row 120000, time elapsed 1m24.146257207s
2019/04/08 21:48:59 Row 130000, time elapsed 1m31.308584865s
2019/04/08 21:50:37 Row 140000, time elapsed 1m38.654884213s
2019/04/08 21:52:23 Row 150000, time elapsed 1m46.000199696s
2019/04/08 21:54:16 Row 160000, time elapsed 1m53.238934707s
2019/04/08 21:56:17 Row 170000, time elapsed 2m0.485714266s
2019/04/08 21:58:25 Row 180000, time elapsed 2m7.897305904s
2019/04/08 22:00:40 Row 190000, time elapsed 2m15.234462928s
2019/04/08 22:03:03 Row 200000, time elapsed 2m23.134322152s
2019/04/08 22:05:34 Row 210000, time elapsed 2m30.40930936s
2019/04/08 22:08:11 Row 220000, time elapsed 2m37.873410076s
2019/04/08 22:10:59 Row 230000, time elapsed 2m47.92659603s
2019/04/08 22:13:58 Row 240000, time elapsed 2m58.625053178s
2019/04/08 22:17:08 Row 250000, time elapsed 3m10.348595584s
2019/04/08 22:20:29 Row 260000, time elapsed 3m20.726383957s
2019/04/08 22:24:08 Row 270000, time elapsed 3m38.840478421s
2019/04/08 22:28:12 Row 280000, time elapsed 4m4.294031488s
2019/04/08 22:32:43 Row 290000, time elapsed 4m30.85305806s
2019/04/08 22:37:45 Row 300000, time elapsed 5m2.183625905s
2019/04/08 22:43:18 Row 310000, time elapsed 5m33.135633645s
2019/04/08 22:49:22 Row 320000, time elapsed 6m3.47749514s
2019/04/08 22:55:56 Row 330000, time elapsed 6m33.647828s
2019/04/08 23:02:59 Row 340000, time elapsed 7m3.546443285s
2019/04/08 23:10:35 Row 350000, time elapsed 7m35.978277292s
2019/04/08 23:18:43 Row 360000, time elapsed 8m8.039533099s
2019/04/08 23:27:22 Row 370000, time elapsed 8m38.447390938s
2019/04/08 23:36:33 Row 380000, time elapsed 9m11.603785808s
2019/04/08 23:46:15 Row 390000, time elapsed 9m41.515021912s
2019/04/08 23:56:17 Row 400000, time elapsed 10m2.085553551s
2019/04/09 00:06:42 Row 410000, time elapsed 10m25.252517462s
2019/04/09 00:17:38 Row 420000, time elapsed 10m55.909756693s
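
As a rough check on the log above, the elapsed times for the 10,000-row blocks ending at rows 20,000 through 100,000 rise by a nearly constant amount per block, roughly 7.3 seconds. That pattern is consistent with the cost of each SetCellValue call growing in proportion to the number of rows already in the sheet, which would make the total import time roughly quadratic. This is an inference from the timings, not something confirmed against the library source. The sketch below computes the mean increase from the logged values; the function and variable names are illustrative.

```go
package main

import (
	"fmt"
	"time"
)

// blockTimes holds the elapsed times logged for the 10,000-row blocks
// ending at rows 20,000 through 100,000, copied from the log.
var blockTimes = []string{
	"10.977776474s", "18.55464695s", "26.336721766s",
	"33.489274657s", "40.729296603s", "47.928431496s",
	"55.148409674s", "1m2.4532031s", "1m9.608536367s",
}

// averageIncreaseSeconds returns the mean difference, in seconds, between
// consecutive block times. The mean of consecutive differences equals
// (last - first) / (number of blocks - 1).
func averageIncreaseSeconds(times []string) (float64, error) {
	if len(times) < 2 {
		return 0, fmt.Errorf("need at least two blocks, got %d", len(times))
	}
	first, err := time.ParseDuration(times[0])
	if err != nil {
		return 0, err
	}
	last, err := time.ParseDuration(times[len(times)-1])
	if err != nil {
		return 0, err
	}
	return (last - first).Seconds() / float64(len(times)-1), nil
}

func main() {
	avg, err := averageIncreaseSeconds(blockTimes)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("each 10,000-row block takes about %.1fs longer than the previous one\n", avg)
}
```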

Describe the results you expected:

If there were a direct way to import a CSV, or some method to speed up this sort of import, it would be really useful.
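
To make the request concrete: one shape such an import could take is to convert each CSV record into a typed row once and hand the whole row to a row-oriented writer, instead of addressing cells one at a time. The sketch below covers only the conversion step and uses the standard library alone. Whether excelize v1.4.1 offers a row-level write call is not established in this report, so none is shown. The names numericColumns and convertRow are illustrative; the "quantity" and "price" column names come from the code above.

```go
package main

import (
	"fmt"
	"strconv"
)

// numericColumns marks, per column index, whether that column must parse
// as a float. It is computed once from the header record rather than per cell.
func numericColumns(headers []string) []bool {
	numeric := make([]bool, len(headers))
	for i, h := range headers {
		numeric[i] = h == "quantity" || h == "price"
	}
	return numeric
}

// convertRow turns one CSV record into typed cell values: float64 for
// numeric columns, string for everything else.
func convertRow(record []string, numeric []bool) ([]interface{}, error) {
	values := make([]interface{}, len(record))
	for i, field := range record {
		if i < len(numeric) && numeric[i] {
			f, err := strconv.ParseFloat(field, 64)
			if err != nil {
				return nil, fmt.Errorf("column %d: invalid number %q: %w", i, field, err)
			}
			values[i] = f
			continue
		}
		values[i] = field
	}
	return values, nil
}

func main() {
	headers := []string{"sku", "quantity", "price"}
	row, err := convertRow([]string{"A-1", "3", "9.50"}, numericColumns(headers))
	fmt.Println(row, err)
}
```

This step alone would not remove the slowdown shown in the log; it only prepares data in the form a per-row API would need.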

Excelize version or commit ID:

  digest = "1:9b67e96a030cc96a3bef1d7cb1143f1e13440f1087eee5999fa9ba5514c1027c"
  name = "github.com/360EntSecGroup-Skylar/excelize"
  packages = ["."]
  pruneopts = ""
  revision = "dea7ba0ec43a4c29a6642d02b6edc73b8b0369f0"
  version = "v1.4.1"

Environment details (OS, Microsoft Excel™ version, physical, etc.):
The above log was captured from an AWS Fargate Docker task running with 4096 CPU units and 30720 MiB of memory.
