Feat/sentiment analysis by tbittencourt · Pull Request #34 · Montreal-Analytics/dbt-snowflake-utils

tbittencourt · 2023-05-19T16:25:29Z

This macro iterates through a piece of text to return the overall sentiment of that text.

First, the macro pre-processes the text, removing unnecessary punctuation and stopwords to help increase the accuracy of the model. Then, using the transformers library, it applies a sentiment analysis pipeline based on a pre-trained model that returns either a score or a label for the text.
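The two steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `preprocess`, `analyze`, and the tiny `STOPWORDS` set are hypothetical names, and `classifier` is assumed to behave like a transformers text-classification pipeline (called with a string, it returns a list containing a dict with `label` and `score` keys).

```python
import string

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in"}


def preprocess(text: str) -> str:
    """Lowercase the text, strip ASCII punctuation, and drop stopwords."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(word for word in no_punct.split() if word not in STOPWORDS)


def analyze(text: str, classifier, output: str = "label") -> str:
    """Classify the cleaned text and return the requested field as a string.

    `classifier` is assumed to return [{"label": ..., "score": ...}],
    the shape a transformers sentiment pipeline produces for one input.
    """
    result = classifier(preprocess(text))[0]
    return str(result[output])


print(preprocess("The food is GREAT, and the service... wow!"))
# prints: food great service wow
```

In a real UDF the `classifier` argument would presumably be built once with `transformers.pipeline("sentiment-analysis", model=...)` rather than passed in; it is injected here only so the sketch runs without downloading a model.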

The recommendation is to use one of the following popular models:

cardiffnlp/twitter-roberta-base-sentiment-latest:
(https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest)
This model is trained on 124M tweets from January 2018 to December 2021, and is fine-tuned for sentiment analysis.
It outputs a label (Neutral, Positive or Negative) and a score ranging from 0 to 1, with 0 being the most negative and 1 the most positive.

nlptown/bert-base-multilingual-uncased-sentiment:
(https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment)
This model is fine-tuned for sentiment analysis on product reviews in six languages: English, Dutch, German, French, Spanish and Italian.
It outputs a label (1 to 5 stars) and a score ranging from 0 to 1, with 0 being the most negative and 1 the most positive.

The macro returns a STRING data type. If 'score' is used as the output, the result has to be cast to a FLOAT data type.

cris-seaton

First iteration of review. Let's chat about it when you're back in the 'office'.

cris-seaton · 2023-06-02T00:57:01Z

macros/udfs/sentiment_analysis.yml

+          This model is trained on 124M tweets from January 2018 to December 2021,
+          and is finetuned for sentiment analysis.
+          It outputs a label - Neutral, Positive or Negative - and a score ranging
+          from 0 to 1 - 0 being the most negative and 1, the most positive.


I think you've misinterpreted score here. Score is more akin to the model's confidence in the assessment (the label it assigned), not a position on a negative-to-positive scale.

I actually have mixed results with the nlptown model as well: the same positive text in the screenshots only yields 1 star with a score of 0.76.
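To make the distinction concrete, here is a small sketch of how a label and its confidence could be combined if a single polarity number is actually wanted. It assumes predictions shaped like the dicts a transformers sentiment pipeline returns (`{"label": ..., "score": ...}`) and label strings such as "positive" or "1 star"; `signed_polarity` and `stars_from_label` are hypothetical helpers, not part of the PR.

```python
def signed_polarity(prediction: dict) -> float:
    """Map a 3-class prediction to [-1, 1] using the label for the sign.

    The pipeline's "score" is the confidence in the predicted label, so a
    negative label with score 0.76 becomes -0.76, not "0.76 positive".
    """
    label = prediction["label"].lower()
    confidence = float(prediction["score"])
    if label == "positive":
        return confidence
    if label == "negative":
        return -confidence
    return 0.0  # neutral (or any unrecognised label) carries no polarity


def stars_from_label(label: str) -> int:
    """Extract the star count from labels such as "1 star" or "5 stars"."""
    return int(label.split()[0])


# The nlptown result mentioned above: 1 star, predicted with 0.76 confidence.
prediction = {"label": "1 star", "score": 0.76}
print(stars_from_label(prediction["label"]), prediction["score"])
```

Read this way, "1 star with a score of 0.76" means the model is fairly confident the text is very negative, which is why the result looks wrong for a positive text.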

cris-seaton · 2023-06-02T01:01:04Z

macros/udfs/sentiment_analysis.sql

+create or replace function {{target.schema}}.udf_sentiment_analysis(text STRING, output INTEGER)
+returns STRING


Your arguments need revision:
text STRING, model INTEGER or STRING [more on this later], output VARIANT (if you still want to differentiate between label and score)

And since the function returns STRING, you have to make sure the value returned on L32 is cast to a string.
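A rough sketch of what a handler with the revised arguments might look like on the Python side. It assumes, for simplicity, that `model` and `output` are both passed as strings; `load_classifier` is a hypothetical stand-in for building the real pipeline (for example with `transformers.pipeline("sentiment-analysis", model=model)`) so that the sketch runs without downloading a model.

```python
VALID_OUTPUTS = ("label", "score")


def load_classifier(model: str):
    """Hypothetical stand-in for the real pipeline; always answers 'neutral'."""
    def classify(text: str):
        return [{"label": "neutral", "score": 1.0}]
    return classify


def sentiment_analysis(text: str, model: str, output: str) -> str:
    """Handler taking one parameter per UDF argument: text, model, output."""
    if output not in VALID_OUTPUTS:
        raise ValueError(f"output must be one of {VALID_OUTPUTS}, got {output!r}")
    prediction = load_classifier(model)(text)[0]
    # The UDF is declared `returns STRING`, so cast explicitly: the score
    # comes back from the pipeline as a float.
    return str(prediction[output])


print(sentiment_analysis("great product", "some-model-id", "score"))
# prints: 1.0
```

Casting inside the handler keeps the declared return type honest for both outputs; the caller can still cast the score back to FLOAT in SQL.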

cris-seaton · 2023-06-02T01:04:04Z

macros/udfs/sentiment_analysis.sql

+model = "cardiffnlp/twitter-roberta-base-sentiment-latest"
+-- model = 'nlptown/bert-base-multilingual-uncased-sentiment'
+
+def sentiment_analysis(text, model_id, output='score'):


Your function name (sentiment_analysis) and your handler on L8 have to match. You should also have the same number of arguments here and in your UDF call.

Mayurjit · 2024-09-06T07:11:58Z

Function name and handler should be equivalent.
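One way to catch this kind of mismatch before deploying is a small local check that compares the handler name and parameter count against the UDF's argument list. The sketch below is only an illustration: `check_handler` is a hypothetical helper, and the argument names passed to it are assumed to be copied by hand from the `create or replace function` statement.

```python
import inspect


def check_handler(namespace: dict, handler_name: str, udf_arg_names: list) -> None:
    """Raise ValueError if the handler is missing or its arity differs."""
    func = namespace.get(handler_name)
    if not callable(func):
        raise ValueError(f"handler {handler_name!r} is not defined")
    n_params = len(inspect.signature(func).parameters)
    if n_params != len(udf_arg_names):
        raise ValueError(
            f"{handler_name} takes {n_params} parameters, "
            f"but the UDF declares {len(udf_arg_names)}: {udf_arg_names}"
        )


# Placeholder with the same signature as the function quoted in the diff.
def sentiment_analysis(text, model_id, output="score"):
    return "placeholder"


namespace = {"sentiment_analysis": sentiment_analysis}

# Passes: three UDF arguments, three handler parameters.
check_handler(namespace, "sentiment_analysis", ["text", "model", "output"])

# The signature in the PR declares only (text, output), which would be flagged.
try:
    check_handler(namespace, "sentiment_analysis", ["text", "output"])
except ValueError as err:
    print(err)
```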

tbittencourt added 2 commits May 19, 2023 12:16

feat: sentiment analysis UDF

d8d1589

fix: sentiment analysis UDF, fix to macro description

e2ed51f

cris-seaton suggested changes Jun 2, 2023

tbittencourt wants to merge 2 commits into master from feat/sentiment_analysis

3 participants
