Grant preparation – Estimating pricing for OpenAI

The number of prompt tokens sent in and completion tokens returned is the best basis for estimating the cost of interacting with the model, and therefore for setting a cap on it.


Example

Suppose you are using an OpenAI model to conduct zero-shot sentiment analysis on a dataset containing customer interactions, similar to what was done here: https://doi.org/10.1016/j.mlwa.2023.100508


Prompt Tokens

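Say each row in your dataset contains about 200 words of text.

Words and tokens do not map 1:1. OpenAI's rule of thumb for English is roughly 0.75 words per token, so 200 words is closer to 270 tokens. For simplicity, this example uses a working figure of 200 prompt tokens per row. Keep in mind that the instruction text in your prompt also counts toward prompt tokens, so the real figure will be somewhat higher.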

For a more accurate count, you can tokenize a sample of your data here: https://platform.openai.com/tokenizer
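For a quick offline estimate before using the tokenizer page, the word count can be converted to an approximate token count. The sketch below is only a heuristic: it assumes OpenAI's rule of thumb of about 0.75 English words per token, and the function name estimate_tokens and the sample row are hypothetical, introduced here for illustration.

```python
import math

# Assumption: roughly 0.75 English words per token (OpenAI's rule of thumb).
# Real counts depend on the model's tokenizer and on the text itself.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Return a rough token estimate from a whitespace word count, rounded up."""
    word_count = len(text.split())
    return math.ceil(word_count / words_per_token)


# Hypothetical stand-in for one 200-word row of the dataset.
sample_row = " ".join(["word"] * 200)
print(estimate_tokens(sample_row))  # prints 267
```

Running this over a sample of rows and averaging the result gives a per-row prompt estimate; the instruction text of the prompt should be added on top.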


Completion Tokens

You want the model to return a paragraph containing a sentiment result along with an explanation of reasoning and some of the keywords associated with the sentiment.

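On average, this result could be about 150 words, which this example treats as roughly 150 tokens (a simplification, since English text usually needs somewhat more tokens than words).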

Sending 200 prompt tokens as input to GPT-4 8K costs ~$0.006 (at $0.03 per 1K prompt tokens).

Generating 150 completion tokens as output from GPT-4 8K costs ~$0.009 (at $0.06 per 1K completion tokens).

https://openai.com/pricing
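The per-query arithmetic above can be sketched as a small function. The rates used here ($0.03 and $0.06 per 1,000 tokens) are assumptions inferred from the GPT-4 8K figures in this example; check the pricing page for current values. The function name cost_per_query is hypothetical.

```python
# Assumed GPT-4 8K rates in USD per 1,000 tokens, inferred from the
# example figures above. Replace with the current published rates.
PROMPT_RATE_PER_1K = 0.03
COMPLETION_RATE_PER_1K = 0.06


def cost_per_query(
    prompt_tokens: int,
    completion_tokens: int,
    prompt_rate: float = PROMPT_RATE_PER_1K,
    completion_rate: float = COMPLETION_RATE_PER_1K,
) -> float:
    """Return the estimated USD cost of one request."""
    prompt_cost = prompt_tokens / 1000 * prompt_rate
    completion_cost = completion_tokens / 1000 * completion_rate
    return prompt_cost + completion_cost


# 200 prompt tokens in, 150 completion tokens out, as in the example.
print(f"${cost_per_query(200, 150):.3f}")  # prints $0.015
```

Prompt and completion tokens are billed at different rates, so keeping them as separate arguments makes it easy to see how a longer requested explanation changes the total.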


Cap on Expenses

In this example, each query costs about $0.015 ($0.006 for the prompt plus $0.009 for the completion), so a cap of $500 would allow ~33,333 queries.

Round down to account for testing, prompt engineering, and other learning-curve costs: say, 30,000 queries.

If your dataset has fewer than 30,000 rows, then a cap of $500 makes sense.
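The same check can be run for any dataset size and cap. This is a minimal sketch assuming a per-query cost of $0.015 (the sum of the two example costs above); the function names max_queries and budget_check are hypothetical.

```python
import math

# Assumption: per-query cost from the example ($0.006 prompt + $0.009 completion).
COST_PER_QUERY_USD = 0.015


def max_queries(cap_usd: float, cost_per_query_usd: float = COST_PER_QUERY_USD) -> int:
    """Return the number of whole queries that fit under the spending cap."""
    return math.floor(cap_usd / cost_per_query_usd)


def budget_check(
    rows: int, cap_usd: float, cost_per_query_usd: float = COST_PER_QUERY_USD
) -> tuple[float, float]:
    """Return (projected cost, headroom left for testing) in USD."""
    projected = rows * cost_per_query_usd
    return projected, cap_usd - projected


print(max_queries(500))  # prints 33333

projected, headroom = budget_check(rows=30_000, cap_usd=500)
print(f"projected ${projected:.2f}, headroom ${headroom:.2f}")
```

A negative headroom means the dataset does not fit under the cap, so either the cap, the completion length, or the number of rows would need to change.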