Skip to content

Add Facemesh model#42

Merged
gohai merged 12 commits intomainfrom
facemesh-noeventestr
Sep 7, 2023
Merged

Add Facemesh model#42
gohai merged 12 commits intomainfrom
facemesh-noeventestr

Conversation

@B2xx
Copy link
Contributor

@B2xx B2xx commented Aug 30, 2023

Hi everyone,

I think the facemesh model facemesh-noeventest is already cleaned up and be ready to merge to the main.

Copy link
Member

@ziyuan-linn ziyuan-linn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @B2xx, this looks great! You might want to run yarn add @tensorflow-models/face-landmarks-detection @mediapipe/face_mesh in the terminal to add the facemesh dependencies to package.json. After installing those the model should work perfectly. I also commented on a small thing inline.

I also think this should be good to merge after these small fixes!

Copy link
Member

@shiffman shiffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wonderful work! Two comments, one small, one bigger!

  • I think prettier could probably be run over the code, I'm noticing some small inconsitencies with whitespace and indentation.
  • Would it be possible to create an example or two that demonstrates how to single out specific elements of the face? Let's say I wanted to only track the eyes or look at the lips. Seeing how to use individual keypoints (by name or through array access) as well as bounding boxes for individual face elements would be helpful.

This second comment can happen later and I'm happy to work on it! We should open an issue to make a list after merging.

@gohai I would love for you to review this too!

B2xx and others added 2 commits September 1, 2023 17:26
Co-authored-by: Ziyuan Lin <93690311+ziyuan-linn@users.noreply.github.com>
…arts to create an example of how to single out specific elements of the face and get its keypoints and bounding boxes
@B2xx
Copy link
Contributor Author

B2xx commented Sep 2, 2023

Hi @shiffman and @gohai,

Thank you for your advice!

  • I fixed the problem of some inconsistency with whitespace and indentation, and @UpKindLikeWater will double-check our codes.
  • I created a folder named "Facemesh-parts", and I added two functions to draw the keypoints and bounding box of specific face elements there.

Normalize comment style a bit, and comment-out the console.log() by default. (It's more instructive and clear, if users/instructors enable this intentionally, rather than the console being printed to from the get-go imho.)
let facemesh;
let video;
let faces = [];
let options = { maxFaces: 1, refineLandmarks: false, flipHorizontal: false };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@B2xx @UpKindLikeWater I am wondering whether having an options object in the examples is worth having or not. flipHorizontal seems useful, but perhaps something that ML5 should better do uniformly across all models (e.g. with the existing utils class)? @shiffman

Is there a drawback to having the default for maxFaces at a higher value?
When would one want to enable refineLandmarks? (can we do it always, or never?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"facemesh" might be an exception b/c it's a bit more complex of a model than bodypose or handpose, but I would lean towards not including the options in the beginner, first example and set defaults. I would also guess that most starter example and demos would be for one face only, but maybe allowing more than one is best by default. Does it affect performance?

I was curious about flipHorizontal, I realize it doesn't actually change the video itself it probably just mirrors all the points? This goes back to questions around the flipVideo() utility function we have in the current ml5.js. Do we want to keep it? Some options are:

  • Build a utility in ml5.js that mirrors video
  • Clear examples that show how to mirror the video when drawing.... and maybe all the models would include an option that mirrors the landmarks like this one does?

@ziyuan-linn what do you think in terms of bodypose and handpose?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used to take the approach of mirroring the video while drawing - but in models that don't have a flipHorizontal property, that leaves one with either some more convoluted code that does the arithmetic on the result - or running the whole draw function mirrored, which then quickly runs into issues when displaying text 😅.

I am not sure how feasible this is to implement, but my preference would be to have a class that transparently flips the image of the underlying video element (so that you can use it even in continuous mode), e.g.:

let video;
let flippedVideo;

function setup() {
  video = createCapture(VIDEO);
  // ..
  flippedVideo = flipVideo(video);
  facemesh.detectStart(flippedVideo, gotFaces);
}

function draw() {
  image(flippedVideo, ...);
}

Or, a flipHorizontal() method for videos in p5.js? 🤔

Copy link
Contributor Author

@B2xx B2xx Sep 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @gohai and @shiffman, here's the documentation of tensorflow.js face mesh models and it talks about all the options that we could use.

I love the idea of giving a general flip video option for ml5.js. however, I'm not that sure about the feasibility of it. Flipping the image of the underlying video element definitely works better for me. Maybe we could use copy() and scale() to realize this?

According to my tests, there is no drawback to having maxFaces to a higher value.

As for the refineLandmarks option, the documentation says

refineLandmarks: Defaults to false. If set to true, refines the landmark coordinates around the eyes and lips, and outputs additional landmarks around the irises.

Since the users will set refineLandmarks to true only when they need the irises, which I think is not that common. Maybe we could add a comment for this option or let it always to false for first example/ beginners?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi all! In terms of bodypose and handpose, I agree with @gohai that a function like ml5.flipVideo would work better. Trying to flip the video using the p5 functions has caused a lot of problems for me personally, and it would be nice to have a way to flip the webcam video with one line of code and not interfere with other things.

As far as implementing a function like ml5.flipVideo(video) I am imaging taking these steps:

  1. Get the original video from the p5 video object (or directly if the user inputs an HTML video)
  2. Create an HTML canvas and draw the video on it with the x-axis flipped.
  3. Convert the HTML canvas into a media stream using captureStream()
  4. Return the media steam ad a video or p5 video depending on the input type.

I am not 100% sure whether this would work, but I am willing to try to implement this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with all thoughts! @ziyuan-linn I think a good next step would be for you to open a new issue duplicating your comment here and your proposal. We can then also tag in and check with p5.js team to make sure we aren't doing redundant work (in case they have something like this planned.) Or maybe they would prefer we add something to p5 directly! (I would ask here, but I think it may be hard to follow this conversation buried in an inline code comment for a specific pull request). Also, I don't think we need to resolve the "flipVideo" question as part of facemesh since it applies across the board.

Thanks all!!!

@gohai gohai changed the title Facemesh noeventestr Add Facemesh model Sep 5, 2023
@gohai
Copy link
Member

gohai commented Sep 5, 2023

@B2xx @UpKindLikeWater Really appreciate your work on this 🎉 Good job!
I did some small fixes and tweaks as extra commits on your branch. Please pull before you do any more work (if you do any more work) ;)

I believe this PR would be essentially ready to go in (best via squash-and-merge). There are two items that we might want to discuss still - I'll add those as comments in-line for now

max(lipsX) - min(lipsX),
max(lipsY) - min(lipsY)
);
}
Copy link
Member

@gohai gohai Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@B2xx @UpKindLikeWater The API design you selected has keys for different landmarks (.lips.) - but inside, you chose to keep the original structure of having arrays of points. This makes it very consistent with the "vanilla" model, but it also makes it a bit harder to use for simple tracking applications - since now it's up to the caller (user) to calculate e.g. the center point, or the extent of this landmark. (That's quite a bit of code above...)

Perhaps we could something like this instead:

[{
  lips: {
    cx: 300,   // center point calculated from all the points
    cy: 300,
    w: 10,     // width calculated from all the points
    h: 3,
    points: [  // actual points (if you want to draw them)
     { ... },
     { ... },
    ]
  },
  ...
}]

@shiffman Would this make sense to you also?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gohai yes, I like this idea! I think cx and cy might be a little confusing, if we just use x and y is that clear? Should we set them at the top left or the center? I would also perhaps opt for width and height instead of w and h.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also use centerX and centerY, but maybe x y width height that are aligned with the default "rectMode" makes the most sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not very sure about this... having instant access to the center point could make a lot of fun examples even more concise. (e.g. drawing a circle over your nose, drawing a line between the eyes, calculating the distance between mouse and...) But I understand the call to not mixing rectModes in APIs.

How about we provide both? 🤔

  lips: {
    x: 250,   // top left
    y: 250,
    width: 100,
    height: 100,
    centerX: 300,
    centerY: 300,
    points: [  // actual points (if you want to draw them)
     { ... },
     { ... },
    ]
  },

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to provide both because I think it is useful for facemesh models that contains so many points in one face element.

I think this structure will also affect the output of bodypose and handpose models, maybe @ziyuan-linn will have some thoughts as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, both sounds great!!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not too sure if bodypose and handpose have any bounding boxes. There might be one for the entire body and entire hand, which is less useful than the facemesh bounding boxes. I don't really have a strong preference for this. I can see some cases where x, y, width, height is more useful and other cases where centerX, centerY is more useful, so keeping both might be a good idea.

An alternative to keeping both is to keep x and y at the top left corner by default and add a function facemesh.rectMode() which can switch x and y to center mode. However, this might become confusing and may introduce unnecessary complexity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes @ziyuan-linn I agree, I don't think we need to add this for handpose or bodypose, I think it's more specific to the way people may want to use facemesh and the complexity of so many keypoints.

I believe this doesn't need mirroring? (without, the points align better to the former president's face).
@ziyuan-linn ziyuan-linn mentioned this pull request Sep 6, 2023
@gohai gohai merged commit 75bd3ce into main Sep 7, 2023
@gohai gohai deleted the facemesh-noeventestr branch September 16, 2023 08:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants