I do some applied ML on photos/videos, and I feel like this should be pretty simple.
Manually map out the coordinates of the counter, then detect bounding boxes for coffee cups/mugs in addition to your baristas. Count how many times an overlapping mug box and barista box enter the bounding box of the counter.
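To make that concrete, here is a rough sketch of just the counting step in Python. It assumes some detector (not shown) already gives you per-frame boxes as (x_min, y_min, x_max, y_max) tuples. The frame layout and all function names are made up for illustration, and "enters the counter" is approximated as the cup box starting to overlap the hand-mapped counter box. There is no object tracking, so treat it as a starting point rather than a working system.

```python
from typing import Dict, List, Tuple

# Axis-aligned box in pixel coordinates: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]
# Hypothetical per-frame detector output: {"baristas": [...], "cups": [...]}.
Frame = Dict[str, List[Box]]


def overlaps(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def carried_cup_at_counter(frame: Frame, counter: Box) -> bool:
    """True if some cup overlaps both a barista box and the counter box."""
    for person in frame["baristas"]:
        for cup in frame["cups"]:
            if overlaps(person, cup) and overlaps(cup, counter):
                return True
    return False


def count_counter_entries(frames: List[Frame], counter: Box) -> int:
    """Count transitions from 'no carried cup at counter' to 'carried cup at counter'."""
    count = 0
    was_inside = False
    for frame in frames:
        inside = carried_cup_at_counter(frame, counter)
        if inside and not was_inside:
            count += 1  # only the first frame of each visit is counted
        was_inside = inside
    return count
```

Counting only the transition keeps a barista who lingers at the counter for many frames from being counted once per frame. It will still miscount when two baristas arrive at the same time or a detection flickers for a frame, which is where tracking would have to come in.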
I guess that would work if you structured the shop's entire workflow around being recognizable by the program. Even then, once you pair this with employee recognition and consider all the edge cases, it would be very hard to pull off. It would be a really cool problem to hash out if it weren't for such a cartoonishly evil application.