
Calculation of mean log probability (GPT-3) #3

@kaixqu


Hello Wenlong,

I think there might be an error in how the mean log probability is calculated when using GPT-3. The main issue is that the GPT-3 response contains more than the generated text: its logprobs field (including token_logprobs) also covers tokens sampled after the stop sequence. Therefore, to calculate the mean log probability, we cannot simply use

# calculate mean log prob across tokens
mean_log_probs = [np.mean(response['choices'][i]['logprobs']['token_logprobs']) for i in range(sampling_params['n'])]

Instead, we should stop counting at the first stop token, so that neither the stop token nor anything after it enters the mean.
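One way the truncation could look is sketched below. This is only an illustration, not the repository's code: the helper name `mean_logprob_until_stop` and the variable `example_logprobs` are hypothetical, and the sketch assumes the stop sequence is a single token that appears verbatim in the `tokens` list (a multi-token stop sequence would need a different cutoff rule). The sample values are copied from the response in this issue.

```python
def mean_logprob_until_stop(logprobs, stop="\n"):
    """Mean log prob of the tokens generated before the first stop token.

    Assumption: `stop` is tokenized as exactly one token, so it can be
    found by a plain lookup in logprobs["tokens"].
    """
    tokens = logprobs["tokens"]
    token_logprobs = logprobs["token_logprobs"]
    # Keep tokens up to, but not including, the first stop token.
    # If no stop token is present, keep every token.
    cutoff = tokens.index(stop) if stop in tokens else len(tokens)
    kept = token_logprobs[:cutoff]
    if not kept:
        raise ValueError("no tokens before the stop sequence")
    return sum(kept) / len(kept)


# The "tokens" and "token_logprobs" fields from the response in this issue.
example_logprobs = {
    "tokens": [" Walk", " to", " kitchen", "\n", "Step", " 2", ":", " Walk"],
    "token_logprobs": [-0.2976162, -0.00012346054, -0.5069456, -0.0011470452,
                       -0.0060894582, -0.00028055036, -6.838237e-05, -0.054386232],
}
truncated_mean = mean_logprob_until_stop(example_logprobs)
print(truncated_mean)
```

With a helper like this, the list comprehension could call it on `response['choices'][i]['logprobs']` for each `i` instead of taking `np.mean` over the full `token_logprobs` list.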

For example, here is a response with a stop sequence of "\n". The generated text is " Walk to kitchen", but the logprobs returned by GPT-3 cover more tokens than that:

response: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": {
        "text_offset": [
          317,
          322,
          325,
          333,
          333,
          333,
          333,
          333
        ],
        "token_logprobs": [
          -0.2976162,
          -0.00012346054,
          -0.5069456,
          -0.0011470452,
          -0.0060894582,
          -0.00028055036,
          -6.838237e-05,
          -0.054386232
        ],
        "tokens": [
          " Walk",
          " to",
          " kitchen",
          "\n",
          "Step",
          " 2",
          ":",
          " Walk"
        ],
        "top_logprobs": [
          {
            " Get": -3.9821253,
            " Go": -3.5860093,
            " Make": -3.1428235,
            " Wake": -2.513738,
            " Walk": -0.2976162
          },
          {
            " To": -12.335158,
            " in": -11.411637,
            " into": -9.384543,
            " to": -0.00012346054,
            " upstairs": -12.2138815
          },
          {
            " bedroom": -5.3587174,
            " dining": -1.0860167,
            " kitchen": -0.5069456,
            " living": -4.34434,
            " the": -3.2986841
          },
          {
            "\n": -0.0011470452,
            " ": -7.6692185,
            " table": -9.372099,
            ".": -8.122213,
            "ette": -9.167303
          },
          {
            "\n": -5.1904135,
            " Step": -7.8304586,
            "Step": -0.0060894582,
            "Task": -9.905375,
            "step": -10.6300955
          },
          {
            " 1": -10.295448,
            " 2": -0.00028055036,
            " 3": -11.589857,
            " 4": -12.77457,
            "2": -8.387781
          },
          {
            "\n": -11.062581,
            " :": -11.94543,
            ",": -12.268325,
            ".": -10.367215,
            ":": -6.838237e-05
          },
          {
            " Find": -3.783928,
            " Open": -4.0909195,
            " Turn": -5.903181,
            " Walk": -0.054386232,
            "Walk": -5.14835
          }
        ]
      },
      "text": " Walk to kitchen"
    }
  ],
  "model": "text-davinci-001",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 3,
    "prompt_tokens": 94,
    "total_tokens": 97
  }
}

The current way of calculating the mean log prob gives -0.10833211608375, whereas it should be mean(-0.2976162, -0.00012346054, -0.5069456) = -0.26822842018.
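Both numbers can be reproduced from the `token_logprobs` list in the response above. The snippet is just an arithmetic check with the values copied from that response; the variable names are chosen here for illustration. The cut at three tokens also agrees with `completion_tokens: 3` in the response's `usage` field.

```python
from statistics import fmean

# token_logprobs copied from the response in this issue.
token_logprobs = [
    -0.2976162, -0.00012346054, -0.5069456, -0.0011470452,
    -0.0060894582, -0.00028055036, -6.838237e-05, -0.054386232,
]

# Current behaviour: average over all eight returned tokens.
mean_all = fmean(token_logprobs)

# Proposed behaviour: average over " Walk", " to", " kitchen" only,
# i.e. the three tokens before the "\n" stop token.
mean_generated = fmean(token_logprobs[:3])

print(mean_all)        # about -0.1083
print(mean_generated)  # about -0.2682
```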

Please let me know what you think. Great work!

Cheers,
Kaixian
