This approach uses Google’s MediaPipe library to detect and track hand movement.
mediapipe/hands.py at master · google/mediapipe
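import cv2
import mediapipe as mp
mpHands = mp.solutions.hands
hands = mpHands.Hands()  # assumed setup; not shown in the original snippet

def findHandPos(capture: cv2.Mat, color: int):
    # Convert to the color space MediaPipe expects (RGB), then run detection
    imgColor = cv2.cvtColor(capture, color)
    processed = hands.process(imgColor)
    return processed.multi_hand_landmarks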
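This function returns a list with one entry per detected hand, or None if no hand is detected. Each entry contains 21 landmarks, and each landmark has x, y, and z coordinates. The x and y coordinates are normalized by the image width and height, so they range from 0 to 1. The z coordinate is a relative depth with the wrist as the origin: the smaller (more negative) the value, the closer the landmark is to the camera.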
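Because x and y are given as fractions of the frame rather than as pixels, they usually need to be scaled by the frame size before they can be drawn on the image. A minimal sketch of that conversion follows. The helper name to_pixel_coords is hypothetical (it is not part of MediaPipe or the original code), and the 640x480 frame size is only an example:

```python
from typing import Tuple


def to_pixel_coords(x_norm: float, y_norm: float, width: int, height: int) -> Tuple[int, int]:
    """Scale normalized coordinates to integer pixel coordinates.

    Hypothetical helper for illustration. The result is clamped to the
    frame in case a value falls slightly outside the 0..1 range.
    """
    px = min(max(int(x_norm * width), 0), width - 1)
    py = min(max(int(y_norm * height), 0), height - 1)
    return px, py


# Example: the centre of a 640x480 frame
print(to_pixel_coords(0.5, 0.5, 640, 480))  # (320, 240)
```

Clamping is a design choice here: it keeps the result usable as an image index, at the cost of hiding how far outside the frame a point was.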
Pictured here is a drawing of the axes described previously (it contains a minor error: the y maximum should be 1) and a primitive demonstration of how this can apply to the arm.
“Landmarks” are used to describe certain points on the hand.
class HandLandmark(enum.IntEnum):
    WRIST = 0
    THUMB_CMC = 1
    THUMB_MCP = 2
    THUMB_IP = 3
    THUMB_TIP = 4
    INDEX_FINGER_MCP = 5
    INDEX_FINGER_PIP = 6
    INDEX_FINGER_DIP = 7
    INDEX_FINGER_TIP = 8
    MIDDLE_FINGER_MCP = 9
    MIDDLE_FINGER_PIP = 10
    MIDDLE_FINGER_DIP = 11
    MIDDLE_FINGER_TIP = 12
    RING_FINGER_MCP = 13
    RING_FINGER_PIP = 14
    RING_FINGER_DIP = 15
    RING_FINGER_TIP = 16
    PINKY_MCP = 17
    PINKY_PIP = 18
    PINKY_DIP = 19
    PINKY_TIP = 20
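The numbering follows a regular pattern: index 0 is the wrist, and each finger takes four consecutive values ending at its tip. As an illustration only, the indices can be grouped by finger as below. The names FINGER_LANDMARKS and FINGERTIPS are hypothetical and are not part of MediaPipe:

```python
# Landmark indices grouped by finger, taken from the enum values listed above.
WRIST = 0
FINGER_LANDMARKS = {
    "THUMB": [1, 2, 3, 4],
    "INDEX_FINGER": [5, 6, 7, 8],
    "MIDDLE_FINGER": [9, 10, 11, 12],
    "RING_FINGER": [13, 14, 15, 16],
    "PINKY": [17, 18, 19, 20],
}

# The last index in each group is that finger's tip.
FINGERTIPS = {name: indices[-1] for name, indices in FINGER_LANDMARKS.items()}
print(FINGERTIPS)
```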
To retrieve a position on the hand, index the landmark list with one of these values. This example uses the wrist landmark to read the wrist's x and y coordinates:
coords = findHandPos(frame, cv2.COLOR_BGR2RGB)
# if no hand is detected, findHandPos() returns None
if coords is not None:
    xcoord = coords[0].landmark[mpHands.HandLandmark.WRIST].x
    ycoord = coords[0].landmark[mpHands.HandLandmark.WRIST].y
    print(f"{xcoord}, {ycoord}")
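The None check and the indexing can be wrapped in a small helper so that callers do not repeat them. The sketch below is an assumption for illustration, not the author's code: landmark_xy is a hypothetical name, and the stand-in objects only mimic the shape of the real result (a list of hands, each with a landmark sequence) so the example runs without a camera:

```python
from types import SimpleNamespace
from typing import Optional, Tuple


def landmark_xy(hands_found, landmark_index: int, hand_index: int = 0) -> Optional[Tuple[float, float]]:
    """Return (x, y) for one landmark, or None if that hand is not available."""
    if hands_found is None or hand_index >= len(hands_found):
        return None
    point = hands_found[hand_index].landmark[landmark_index]
    return point.x, point.y


# Stand-in data shaped like the detection result (one hand, wrist only);
# real results would come from findHandPos().
fake_wrist = SimpleNamespace(x=0.25, y=0.75, z=0.0)
fake_hands = [SimpleNamespace(landmark=[fake_wrist])]

print(landmark_xy(fake_hands, 0))  # wrist has index 0 -> (0.25, 0.75)
print(landmark_xy(None, 0))        # no hand detected -> None
```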