Lecture 6: Deep Learning Applications (Computer Vision)
This is Lecture 6 of the series. The focus is no longer on a single foundational model but on the overall landscape of common tasks in the visual domain. It serves well as a follow-up to the CNN lecture, helping you move from "being able to do image classification" to "knowing what other visual tasks exist, what their output formats look like, and what their respective challenges are."
What This Lecture Covers
This lecture brings together multiple sub-tasks in computer vision, making it ideal for building a clear picture of how the tasks differ from one another.
- Object Detection: introduces two-stage methods, one-stage methods, and subsequent developments, helping you understand how detection fundamentally differs from classification: it must localize objects with bounding boxes as well as label them.
- Typical Image Analysis Tasks: covers common directions such as image segmentation, image retrieval, and object tracking.
- Specialized Image Analysis Tasks: extends to more specific topics including fine-grained classification, style transfer, image captioning, and super-resolution.
- Vertical Applications and Practice: applies visual methods to scenarios like medical image analysis and text detection/recognition, with object detection as the hands-on case.
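A concrete way to feel the classification-versus-detection difference mentioned above is the overlap metric that detection evaluation is built on: Intersection over Union (IoU) between a predicted box and a ground-truth box. The sketch below is illustrative (not code from the lecture), assuming the common `(x1, y1, x2, y2)` corner format for axis-aligned boxes.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Partial overlap: intersection area 1, union area 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # ≈ 0.1429
```

Classification needs no such metric, because its output is just a label; in detection, a prediction typically only counts as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5.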
How to Study This
- When studying this lecture, start by pinning down the input and output of each task: detection produces bounding boxes with category labels, segmentation makes pixel-level predictions, and tracking adds temporal continuity across frames.
- If you are planning to work on engineering projects, object detection and segmentation are usually the most worthwhile to dive into first, as they have the most direct connection to real-world scenarios.
- You do not need to master all visual topics at once. First build a map of the tasks, then go deep in the one area that best matches your intended project direction.
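The input/output contrast recommended above can be made tangible with toy arrays. The shapes and layouts below are illustrative conventions I am assuming for the sketch (e.g., the six-column detection row), not definitions from the lecture itself.

```python
import numpy as np

H, W, num_classes = 4, 4, 3  # tiny sizes for illustration

# Classification: one class distribution per image.
cls_out = np.array([0.1, 0.7, 0.2])         # shape (num_classes,)

# Detection: a variable-length set of boxes,
# here as rows of (x1, y1, x2, y2, class_id, score).
det_out = np.array([[0, 0, 2, 2, 1, 0.9]])  # shape (num_boxes, 6)

# Segmentation: one class prediction per pixel.
seg_out = np.zeros((H, W), dtype=np.int64)  # shape (H, W)

# Tracking: per-frame boxes plus a persistent track id,
# so the same object (id 7) is linked across frames.
track_out = [
    [(7, 0, 0, 2, 2)],  # frame 0
    [(7, 1, 0, 3, 2)],  # frame 1: same object, shifted
]
```

Comparing these output formats side by side is often the fastest way to see why each task needs different network heads and loss functions, even when they share a CNN backbone.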