Hands on MediaPipe and OpenCV

Trying out machine learning with computer vision

While learning about computer vision, I came across OpenCV, a library for real-time computer vision applications. It is a cross-platform library originally developed by Intel. OpenCV is written mainly in C++, but it also has bindings for other languages, such as Python, Java, JavaScript and MATLAB.
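In OpenCV's Python bindings, an image is simply a NumPy array of shape (height, width, channels). Colour channels are stored in BGR order rather than RGB, which matters in the MediaPipe code later on.

The sketch below builds a tiny image by hand, so it needs neither an image file nor OpenCV. It then swaps the channel order with NumPy slicing. For a 3-channel image this gives the same pixel values as `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)`. The names `bgr` and `rgb` are chosen only for illustration.

```python
import numpy as np

# A hand-made 1x2 colour image in OpenCV's BGR channel order:
# shape is (height, width, channels).
bgr = np.array([[[255, 0, 0],    # pure blue, stored as (B, G, R)
                 [0, 0, 255]]],  # pure red, stored as (B, G, R)
               dtype=np.uint8)

# Reversing the channel axis swaps B and R. For a 3-channel image this yields
# the same pixel values as cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB).
rgb = bgr[..., ::-1]

height, width, channels = bgr.shape
print(height, width, channels)  # 1 2 3
print(rgb[0, 0].tolist())       # [0, 0, 255]: the blue pixel, now as (R, G, B)
```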

Let us start with an interesting question. Have you ever wondered what technology is behind features like the Google Assistant on your phone? One framework Google uses for on-device machine learning is MediaPipe.

MediaPipe is a cross-platform framework for building machine learning pipelines that process time-series data such as video and audio. It provides a suite of libraries and tools for applying machine learning and artificial intelligence techniques in applications. These include cross-platform APIs, libraries, pre-trained models and MediaPipe Studio, a browser-based tool for trying the models out.
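Internally, MediaPipe organizes this kind of processing as a graph of calculators that exchange timestamped packets. The toy below is only a plain-Python analogy of that idea, not MediaPipe's API. `Packet`, `run_pipeline` and the two lambda stages are hypothetical names chosen for illustration.

Each stage transforms the payload while the timestamp travels with it. That is what lets a result be matched back to the frame it came from.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass
class Packet:
    """A payload tagged with a timestamp (a loose analogy to MediaPipe packets)."""
    timestamp: int
    payload: Any


Stage = Callable[[Any], Any]


def run_pipeline(packets: Iterable[Packet], stages: List[Stage]) -> List[Packet]:
    """Pass every packet through each stage in order, keeping its timestamp."""
    outputs = []
    for packet in packets:
        value = packet.payload
        for stage in stages:
            value = stage(value)
        outputs.append(Packet(packet.timestamp, value))
    return outputs


# Hypothetical stages standing in for steps like "convert frame" and "run model".
frames = [Packet(t, t * 10) for t in range(3)]
result = run_pipeline(frames, [lambda x: x + 1, lambda x: x * 2])
print([(p.timestamp, p.payload) for p in result])  # [(0, 2), (1, 22), (2, 42)]
```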

Let us look at one application of MediaPipe: hand landmark and movement detection. MediaPipe provides a hand and finger tracking solution. Making this hand perception functionality available to the wider research and development community is meant to encourage creative use cases, stimulating new applications and new research avenues.

Identifying hand and finger movement

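The hand tracking solution uses two models, and only a short piece of code is needed to use it.

First, a palm detection model locates the hand in the image. It detects palms rather than whole hands, because estimating bounding boxes of rigid objects like palms and fists is much simpler than detecting hands with articulated fingers. It also has to cope with difficult cases such as a hand that is partly hidden while holding an object.

Then, a hand landmark model performs precise keypoint localization of 21 3D hand-knuckle coordinates inside the detected hand regions. It does this via regression, that is, it predicts the coordinates directly.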
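The two stages can be sketched as a loop. According to Google's description of the pipeline, the palm detector does not have to run on every video frame. Instead, the hand region for the next frame is derived from the current frame's landmarks, and palm detection is invoked again only when the landmark model loses the hand. In the legacy Python API this corresponds to `static_image_mode=False`, with `min_tracking_confidence` acting as the threshold.

The sketch below is not MediaPipe's code, and it tracks only a single hand. `detect_palm`, `predict_landmarks`, `crop_from_landmarks` and `presence_threshold` are hypothetical placeholders for the two models and the glue between them.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical hand-region format: (x, y, width, height).
Box = Tuple[float, float, float, float]


def track_hands(
    frames: Iterable[object],
    detect_palm: Callable[[object], Optional[Box]],
    predict_landmarks: Callable[[object, Box], Tuple[list, float]],
    crop_from_landmarks: Callable[[list], Box],
    presence_threshold: float = 0.5,
) -> List[Optional[list]]:
    """Detect-then-track loop for a single hand over a sequence of frames.

    Returns one landmark list per frame, or None where no hand was found.
    """
    results: List[Optional[list]] = []
    region: Optional[Box] = None
    for frame in frames:
        if region is None:
            region = detect_palm(frame)  # stage 1: palm detection
        if region is None:
            results.append(None)  # the detector found no hand
            continue
        landmarks, presence = predict_landmarks(frame, region)  # stage 2
        if presence < presence_threshold:
            region = None  # hand lost: run the palm detector on the next frame
            results.append(None)
            continue
        results.append(landmarks)
        # Track: derive the next frame's hand region from these landmarks.
        region = crop_from_landmarks(landmarks)
    return results
```

The expensive detector runs only at the start and after the hand is lost. In a live video, that saving is the point of the design.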

From these 21 landmarks, one can locate any part of the hand, such as the wrist or an individual fingertip.
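For reference, each of the 21 landmarks has a fixed index. The enum below mirrors the names and order of `mp.solutions.hands.HandLandmark`, the enum used in the code that follows. It is redefined locally so the snippet runs without MediaPipe installed. `FINGERTIPS` is a hypothetical helper added for illustration.

```python
from enum import IntEnum


class HandLandmark(IntEnum):
    """Indices of the 21 hand landmarks, named as in MediaPipe's HandLandmark."""
    WRIST = 0
    THUMB_CMC = 1
    THUMB_MCP = 2
    THUMB_IP = 3
    THUMB_TIP = 4
    INDEX_FINGER_MCP = 5
    INDEX_FINGER_PIP = 6
    INDEX_FINGER_DIP = 7
    INDEX_FINGER_TIP = 8
    MIDDLE_FINGER_MCP = 9
    MIDDLE_FINGER_PIP = 10
    MIDDLE_FINGER_DIP = 11
    MIDDLE_FINGER_TIP = 12
    RING_FINGER_MCP = 13
    RING_FINGER_PIP = 14
    RING_FINGER_DIP = 15
    RING_FINGER_TIP = 16
    PINKY_MCP = 17
    PINKY_PIP = 18
    PINKY_DIP = 19
    PINKY_TIP = 20


# Hypothetical helper: the five fingertip landmarks (every fourth index from 4).
FINGERTIPS = [lm for lm in HandLandmark if lm.name.endswith("_TIP")]

print(len(HandLandmark))               # 21
print(HandLandmark(8).name)            # INDEX_FINGER_TIP
print([int(lm) for lm in FINGERTIPS])  # [4, 8, 12, 16, 20]
```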

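import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
IMAGE_FILES = ['hand.jpg']  # hypothetical example path: use your own images

with mp_hands.Hands(static_image_mode=True, max_num_hands=2,
                    min_detection_confidence=0.5) as hands:
    for file in IMAGE_FILES:
        image = cv2.imread(file)
        if image is None:
            continue  # file missing or unreadable
        # Flip around the y-axis so that the handedness output is correct.
        image = cv2.flip(image, 1)
        # Convert the BGR image to RGB before processing.
        results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
        print('Handedness:', results.multi_handedness)
        if not results.multi_hand_landmarks:
            continue  # no hand found in this image
        image_height, image_width, _ = image.shape
        for hand_landmarks in results.multi_hand_landmarks:
            print('hand_landmarks:', hand_landmarks)
            # Landmark x and y are normalized to [0, 1]; scale them to pixels.
            tip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
            print(f'Index finger tip coordinates: '
                  f'({tip.x * image_width}, {tip.y * image_height})')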

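This prints the pixel coordinates of the index fingertip in the 2D image. MediaPipe returns each landmark's x and y normalized to [0, 1] by the image width and height. That is why the code multiplies them by image_width and image_height.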
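The scaling step can be written as a small helper. This is a sketch, and `to_pixel` is a hypothetical name. It does three things:

- rounds down to whole pixels;
- clamps a coordinate of exactly 1.0 to the last valid pixel;
- returns `None` for points outside the image, since a landmark may be predicted slightly outside the frame when the hand is partly out of view.

```python
import math
from typing import Optional, Tuple


def to_pixel(x: float, y: float, width: int, height: int) -> Optional[Tuple[int, int]]:
    """Convert normalized [0, 1] landmark coordinates to integer pixel coordinates.

    Returns None when the point lies outside the image.
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        return None
    # Clamp so that x == 1.0 maps to the last column (width - 1), not width.
    px = min(math.floor(x * width), width - 1)
    py = min(math.floor(y * height), height - 1)
    return px, py


print(to_pixel(0.5, 0.25, 640, 480))  # (320, 120)
print(to_pixel(1.0, 1.0, 640, 480))   # (639, 479)
print(to_pixel(1.2, 0.5, 640, 480))   # None
```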

To draw the hand annotations as shown in the video above, one can use the following code:

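import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles

def draw_hand_annotations(image, results):
    """Draw landmarks and connections on an RGB frame; return it as BGR."""
    # Draw the hand annotations on the image.
    # Re-allow writes in case the frame was marked read-only for processing.
    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_drawing.draw_landmarks(
                image,
                hand_landmarks,
                mp_hands.HAND_CONNECTIONS,
                mp_drawing_styles.get_default_hand_landmarks_style(),
                mp_drawing_styles.get_default_hand_connections_style())
    return image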
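Conceptually, drawing the connections means joining pairs of landmark indices with line segments. With OpenCV, each segment could then be drawn with `cv2.line`. The sketch below only computes those segments, in plain Python.

`INDEX_FINGER_CONNECTIONS` is a hand-copied subset of MediaPipe's `HAND_CONNECTIONS`: the link from the wrist to the index base, plus the three index-finger bones. `connection_segments` is a hypothetical helper. Endpoints given as `None`, for example points outside the image, are skipped.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[int, int]

# Hand-copied subset of MediaPipe's HAND_CONNECTIONS: wrist (0) to the base
# of the index finger (5), then the three index-finger bones up to the tip (8).
INDEX_FINGER_CONNECTIONS = [(0, 5), (5, 6), (6, 7), (7, 8)]


def connection_segments(
    points: Sequence[Optional[Point]],
    connections: Sequence[Tuple[int, int]],
) -> List[Tuple[Point, Point]]:
    """Return the (start, end) pixel pairs to draw, one per connection.

    A connection is skipped when either endpoint is None (e.g. off-image).
    """
    segments = []
    for start, end in connections:
        a, b = points[start], points[end]
        if a is not None and b is not None:
            segments.append((a, b))
    return segments


# Made-up pixel positions for the 21 landmarks, for illustration only.
points = [(10 * i, 5 * i) for i in range(21)]
points[7] = None  # pretend landmark 7 fell outside the image
print(connection_segments(points, INDEX_FINGER_CONNECTIONS))
# [((0, 0), (50, 25)), ((50, 25), (60, 30))]
```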

OpenCV can be used to capture frames from a camera:

cap = cv2.VideoCapture(0)  # 0 is the index of the default camera

Here cv2, the name of OpenCV's Python module, opens the default camera (device index 0), so that each incoming video frame can be passed to MediaPipe.
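A typical capture loop reads frames until the stream ends or the user quits, and always releases the camera afterwards. The sketch below writes that loop against only the three `cv2.VideoCapture` methods it needs: `isOpened`, `read` and `release`. Any object with the same methods can therefore stand in for a camera, for example in tests. `run_capture_loop`, `handle_frame` and `max_frames` are hypothetical names.

```python
from typing import Any, Callable, Protocol, Tuple


class Capture(Protocol):
    """The subset of cv2.VideoCapture's interface this loop relies on."""
    def isOpened(self) -> bool: ...
    def read(self) -> Tuple[bool, Any]: ...
    def release(self) -> None: ...


def run_capture_loop(cap: Capture, handle_frame: Callable[[Any], bool],
                     max_frames: int = 1000) -> int:
    """Feed frames to handle_frame until it returns False, a read fails,
    or max_frames is reached. Returns the number of frames handled."""
    handled = 0
    try:
        while cap.isOpened() and handled < max_frames:
            ok, frame = cap.read()
            if not ok:
                break  # end of a video file, or the camera stopped delivering
            handled += 1
            if not handle_frame(frame):
                break  # the handler asked to stop
    finally:
        cap.release()  # always free the camera, even if the handler raises
    return handled
```

With a real camera this could be called as `run_capture_loop(cv2.VideoCapture(0), handler)`. There, `handler` might show the frame with `cv2.imshow` and return `cv2.waitKey(5) & 0xFF != 27`, so that pressing Esc stops the loop.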

MediaPipe also offers solutions for many other tasks, including:

  • Object detection

  • Image classification

  • Image segmentation

  • Interactive segmentation

  • Gesture recognition

  • Hand landmark detection

  • Image embedding

  • Face detection

  • Face landmark detection

  • Pose landmark detection
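Several of these tasks can build on hand landmarks. As a simple illustration of gesture logic, the sketch below lists extended fingers using a common heuristic: a finger counts as extended when its tip lies above its PIP joint in the image.

This rule assumes an upright hand. It is not MediaPipe's Gesture Recognizer, which uses a trained model. `FINGERS` and `extended_fingers` are hypothetical names, and the indices follow the 21-landmark hand model.

```python
from typing import List, Sequence, Tuple

# (tip, PIP joint) landmark indices for each finger in the 21-point hand model.
# The thumb is left out: it folds sideways, so this up/down rule does not fit it.
FINGERS = {
    "index": (8, 6),
    "middle": (12, 10),
    "ring": (16, 14),
    "pinky": (20, 18),
}


def extended_fingers(landmarks: Sequence[Tuple[float, float]]) -> List[str]:
    """Heuristic: a finger is extended if its tip is above its PIP joint.

    Assumes an upright hand. Image y grows downward, so "above" means a
    smaller y value.
    """
    return [name for name, (tip, pip) in FINGERS.items()
            if landmarks[tip][1] < landmarks[pip][1]]


# Made-up normalized landmarks: index finger raised, middle finger folded.
hand = [(0.5, 0.5)] * 21
hand[8], hand[6] = (0.5, 0.2), (0.5, 0.4)    # index tip above its PIP
hand[12], hand[10] = (0.5, 0.6), (0.5, 0.4)  # middle tip below its PIP
print(extended_fingers(hand))  # ['index']
```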

With that, I conclude this article.

I hope you liked it.

Thank you

Akhil Soni

To connect with me: https://www.linkedin.com/in/akhil-soni-9827181a1/