Conversation
ziyuan-linn
left a comment
There was a problem hiding this comment.
Hi @B2xx, this looks great! You might want to run yarn add @tensorflow-models/face-landmarks-detection @mediapipe/face_mesh in the terminal to add the facemesh dependencies to package.json. After installing those the model should work perfectly. I also commented on a small thing inline.
I also think this should be good to merge after these small fixes!
shiffman
left a comment
There was a problem hiding this comment.
This is wonderful work! Two comments, one small, one bigger!
- I think prettier could probably be run over the code, I'm noticing some small inconsitencies with whitespace and indentation.
- Would it be possible to create an example or two that demonstrates how to single out specific elements of the face? Let's say I wanted to only track the eyes or look at the lips. Seeing how to use individual keypoints (by name or through array access) as well as bounding boxes for individual face elements would be helpful.
This second comment can happen later and I'm happy to work on it! We should open an issue to make a list after merging.
@gohai I would love for you to review this too!
Co-authored-by: Ziyuan Lin <93690311+ziyuan-linn@users.noreply.github.com>
…arts to create an example of how to single out specific elements of the face and get its keypoints and bounding boxes
|
Thank you for your advice!
|
As pointed out by @ziyuan-linn
Normalize comment style a bit, and comment-out the console.log() by default. (It's more instructive and clear, if users/instructors enable this intentionally, rather than the console being printed to from the get-go imho.)
| let facemesh; | ||
| let video; | ||
| let faces = []; | ||
| let options = { maxFaces: 1, refineLandmarks: false, flipHorizontal: false }; |
There was a problem hiding this comment.
@B2xx @UpKindLikeWater I am wondering whether having an options object in the examples is worth having or not. flipHorizontal seems useful, but perhaps something that ML5 should better do uniformly across all models (e.g. with the existing utils class)? @shiffman
Is there a drawback to having the default for maxFaces at a higher value?
When would one want to enable refineLandmarks? (can we do it always, or never?)
There was a problem hiding this comment.
"facemesh" might be an exception b/c it's a bit more complex of a model than bodypose or handpose, but I would lean towards not including the options in the beginner, first example and set defaults. I would also guess that most starter example and demos would be for one face only, but maybe allowing more than one is best by default. Does it affect performance?
I was curious about flipHorizontal, I realize it doesn't actually change the video itself it probably just mirrors all the points? This goes back to questions around the flipVideo() utility function we have in the current ml5.js. Do we want to keep it? Some options are:
- Build a utility in ml5.js that mirrors video
- Clear examples that show how to mirror the video when drawing.... and maybe all the models would include an option that mirrors the landmarks like this one does?
@ziyuan-linn what do you think in terms of bodypose and handpose?
There was a problem hiding this comment.
I used to take the approach of mirroring the video while drawing - but in models that don't have a flipHorizontal property, that leaves one with either some more convoluted code that does the arithmetic on the result - or running the whole draw function mirrored, which then quickly runs into issues when displaying text 😅.
I am not sure how feasible this is to implement, but my preference would be to have a class that transparently flips the image of the underlying video element (so that you can use it even in continuous mode), e.g.:
let video;
let flippedVideo;
function setup() {
video = createCapture(VIDEO);
// ..
flippedVideo = flipVideo(video);
facemesh.detectStart(flippedVideo, gotFaces);
}
function draw() {
image(flippedVideo, ...);
}
Or, a flipHorizontal() method for videos in p5.js? 🤔
There was a problem hiding this comment.
Hi @gohai and @shiffman, here's the documentation of tensorflow.js face mesh models and it talks about all the options that we could use.
I love the idea of giving a general flip video option for ml5.js. however, I'm not that sure about the feasibility of it. Flipping the image of the underlying video element definitely works better for me. Maybe we could use copy() and scale() to realize this?
According to my tests, there is no drawback to having maxFaces to a higher value.
As for the refineLandmarks option, the documentation says
refineLandmarks: Defaults to false. If set to true, refines the landmark coordinates around the eyes and lips, and outputs additional landmarks around the irises.
Since the users will set refineLandmarks to true only when they need the irises, which I think is not that common. Maybe we could add a comment for this option or let it always to false for first example/ beginners?
There was a problem hiding this comment.
Hi all! In terms of bodypose and handpose, I agree with @gohai that a function like ml5.flipVideo would work better. Trying to flip the video using the p5 functions has caused a lot of problems for me personally, and it would be nice to have a way to flip the webcam video with one line of code and not interfere with other things.
As far as implementing a function like ml5.flipVideo(video) I am imaging taking these steps:
- Get the original video from the p5 video object (or directly if the user inputs an HTML video)
- Create an HTML canvas and draw the video on it with the x-axis flipped.
- Convert the HTML canvas into a media stream using
captureStream() - Return the media steam ad a video or p5 video depending on the input type.
I am not 100% sure whether this would work, but I am willing to try to implement this.
There was a problem hiding this comment.
Agreed with all thoughts! @ziyuan-linn I think a good next step would be for you to open a new issue duplicating your comment here and your proposal. We can then also tag in and check with p5.js team to make sure we aren't doing redundant work (in case they have something like this planned.) Or maybe they would prefer we add something to p5 directly! (I would ask here, but I think it may be hard to follow this conversation buried in an inline code comment for a specific pull request). Also, I don't think we need to resolve the "flipVideo" question as part of facemesh since it applies across the board.
Thanks all!!!
|
@B2xx @UpKindLikeWater Really appreciate your work on this 🎉 Good job! I believe this PR would be essentially ready to go in (best via squash-and-merge). There are two items that we might want to discuss still - I'll add those as comments in-line for now |
| max(lipsX) - min(lipsX), | ||
| max(lipsY) - min(lipsY) | ||
| ); | ||
| } |
There was a problem hiding this comment.
@B2xx @UpKindLikeWater The API design you selected has keys for different landmarks (.lips.) - but inside, you chose to keep the original structure of having arrays of points. This makes it very consistent with the "vanilla" model, but it also makes it a bit harder to use for simple tracking applications - since now it's up to the caller (user) to calculate e.g. the center point, or the extent of this landmark. (That's quite a bit of code above...)
Perhaps we could something like this instead:
[{
lips: {
cx: 300, // center point calculated from all the points
cy: 300,
w: 10, // width calculated from all the points
h: 3,
points: [ // actual points (if you want to draw them)
{ ... },
{ ... },
]
},
...
}]
@shiffman Would this make sense to you also?
There was a problem hiding this comment.
@gohai yes, I like this idea! I think cx and cy might be a little confusing, if we just use x and y is that clear? Should we set them at the top left or the center? I would also perhaps opt for width and height instead of w and h.
There was a problem hiding this comment.
We could also use centerX and centerY, but maybe x y width height that are aligned with the default "rectMode" makes the most sense?
There was a problem hiding this comment.
I am not very sure about this... having instant access to the center point could make a lot of fun examples even more concise. (e.g. drawing a circle over your nose, drawing a line between the eyes, calculating the distance between mouse and...) But I understand the call to not mixing rectModes in APIs.
How about we provide both? 🤔
lips: {
x: 250, // top left
y: 250,
width: 100,
height: 100,
centerX: 300,
centerY: 300,
points: [ // actual points (if you want to draw them)
{ ... },
{ ... },
]
},
There was a problem hiding this comment.
I'd like to provide both because I think it is useful for facemesh models that contains so many points in one face element.
I think this structure will also affect the output of bodypose and handpose models, maybe @ziyuan-linn will have some thoughts as well?
There was a problem hiding this comment.
I am not too sure if bodypose and handpose have any bounding boxes. There might be one for the entire body and entire hand, which is less useful than the facemesh bounding boxes. I don't really have a strong preference for this. I can see some cases where x, y, width, height is more useful and other cases where centerX, centerY is more useful, so keeping both might be a good idea.
An alternative to keeping both is to keep x and y at the top left corner by default and add a function facemesh.rectMode() which can switch x and y to center mode. However, this might become confusing and may introduce unnecessary complexity.
There was a problem hiding this comment.
Yes @ziyuan-linn I agree, I don't think we need to add this for handpose or bodypose, I think it's more specific to the way people may want to use facemesh and the complexity of so many keypoints.
I believe this doesn't need mirroring? (without, the points align better to the former president's face).
Hi everyone,
I think the facemesh model facemesh-noeventest is already cleaned up and be ready to merge to the main.