q6_K get_rows and dequantize#6

Open
swetha097 wants to merge 2 commits into swe_fork/q6_K_base from swe_fork/q6_K_get_rows_dequantize

Conversation

@swetha097
Owner

No description provided.


@Srihari-mcw left a comment


Please address the review comments. Thanks.


switch (src0->type) {
case GGML_TYPE_Q6_K:
ggml_compute_forward_get_rows_q6_Kx8(params, dst);

Isn't it ideal for this to be within the AVX2 condition, similar to Q4_0? - https://github.com/ggml-org/whisper.cpp/pull/3223/files


And is there any reason the if (src0->ne[1] % 8 == 0) check is skipped? Though add it only if necessary.


I think that block_q4_0x8 is one among the three variants, and that's why the condition was necessary there, but if it's better to add it here as well, maybe try adding it.

Owner Author


Exactly

#undef MMID_MATRIX_ROW
}

void forward_get_rows(const ggml_compute_params * params,

Keep all params on the same line; follow the same formatting as the other functions in the file.

const ggml_tensor * src0 = dst->src[0];
const ggml_tensor * src1 = dst->src[1];

GGML_TENSOR_BINARY_OP_LOCALS

}
}

static void ggml_compute_forward_get_rows_q6_Kx8(

Same comment here, please follow the same formatting as the other functions.

/**
* Dequantizes a single logical row from the repacked q6_Kx8 data format.
*
* @param p_repacked_blocks Pointer to the start of the 'block_q6_Kx8' structures for the entire row.

Please make sure that the variable names are uniform across all PRs

float * GGML_RESTRICT y,
int64_t k,
int row_idx_in_group) {
assert(k % QK_K == 0);

Maybe an empty line after the function params would be nice; please update formatting wherever necessary.

}
}

static inline int8_t read_scale_from_repacked(const uint8_t* ptr_repacked_scales, int row_idx_in_group, int scale_idx) {

Just add a line of comment explaining the helper overall? And can we keep these helpers at the start of the file, since some other helpers AFAIK are also grouped together at the start.

y[l + 96] = d_super_block * sc3 * q4;
}
y += 128;
ptr_repacked_scales = (uint8_t *)current_block->scales + 64;

Check formatting here


const uint8_t * ptr_ql_base = current_block->ql;
const uint8_t * ptr_qh_base = current_block->qh;
uint8_t * ptr_repacked_scales = (uint8_t *)current_block->scales; // 16*8 scales repacked - 2bytes of each super block stored together

16 * 8, 2 bytes

const int8_t q3 = (int8_t)((ql_l0 >> 4) | (((qh_l >> 4) & 3) << 4)) - 32;
const int8_t q4 = (int8_t)((ql_l32 >> 4) | (((qh_l >> 6) & 3) << 4)) - 32;

y[l + 0] = d_super_block * sc0 * q1;

y[l] should be sufficient imo


const int8_t q1 = (int8_t)((ql_l0 & 0xF) | (((qh_l >> 0) & 3) << 4)) - 32;
const int8_t q2 = (int8_t)((ql_l32 & 0xF) | (((qh_l >> 2) & 3) << 4)) - 32;
const int8_t q3 = (int8_t)((ql_l0 >> 4) | (((qh_l >> 4) & 3) << 4)) - 32;

Check whitespaces between expressions
