There is tremendous potential for the application of Image Recognition and Computer Vision technologies in solving real-world challenges at scale, with reliability and accuracy!

Kumar Vaibhav, our next pathbreaker, Associate Manager at Accenture, works on AI and ML projects as well as envisions future innovations for R&D.

Vaibhav talks to Shyam Krishnamurthy from The Interview Portal about his PhD (Computer Vision) and his work on Digital Image Processing across a diverse range of sectors such as healthcare, transportation, insurance, robotics.

For students, there is probably no domain or sector today that has not been positively impacted by AI/ML, to the benefit of our society!

Vaibhav, what can you tell us about your background?

Hi, my name is Kumar Vaibhav. I was born in the holy city of Prayagraj, the meeting point of three divine rivers: Ganga, Yamuna and Saraswati. When I was in class 3, my father was transferred to Agra. So, from class 3 till post-graduation, I lived in Agra. I completed my schooling at Saraswati Vidya Mandir, Kamala Nagar, Agra. It was not only a school for education, but also an institution from which I learned values about life, discipline, and morality along with love for the nation.

That was the time when I got fascinated by science and technology. I started to read about science and performed small experiments related to Electromagnetism and Physics on my own.

As my father was also an Engineer, he used to make me understand the concepts behind those experiments in detail. My mother was a government teacher. She taught me elementary Mathematics and English too. 

What did you do for graduation/post-graduation?

After completing my schooling, I pursued Engineering. I joined Dr. B. R. Ambedkar University (formerly Agra University) for Engineering in 2001. My branch was Electronics and Communication, as I had been fascinated by Electronics since childhood. During Engineering, along with Electronics, I developed an interest in Computers. I learned computer languages like C, C++ and Java. To gain more knowledge about computers, I completed a one-year Higher Software Diploma from NIIT in 2006.

I completed my Masters in Technology (M.Tech.) from Dayalbagh Deemed University, Dayalbagh, Agra in Engineering Systems in 2011. Here I specialized in the field of Digital Image Processing. I wrote my master’s thesis in ‘Content Based Image Retrieval’.

I recently earned a PhD degree in Computer Vision from Amity University Rajasthan in 2022.

What prompted you to pursue such an offbeat, unconventional and uncommon career?

During my Masters, I realized that I should choose a career where I could utilize the knowledge that I had gained with a lot of perseverance, focus and interest in my field. 

I had many mentors at different stages of life. During my school days, my history teacher, Dr. Dheerendra Sharma inspired me a lot. I liked his way of teaching and his deep knowledge. 

During my Bachelors, I was inspired by my Electrical Networks professor, Dr. Jitendra Kumar Dwivedi. He explained this complex subject, and the mathematics behind Electrical Networks, in an easy-to-understand way.

During my Masters, two of my professors, Dr. D.K. Chaturvedi and my M.Tech. guide Dr. C. Patvardhan, inspired me. It was my masters studies that laid the path for my future development.

During my doctorate, my PhD guide Dr. Jagdish Prasad guided me towards developing a research attitude and solving research problems.

How did you plan the steps to get into the career you wanted?

After the completion of my bachelors, I worked as a trainer at NIIT Ltd. It was the first job of my career. Here I started to grow from a student into a professional. My colleagues were very senior people who helped me mature professionally. In my first job, I learned deeper concepts of computers.

But I realized that I should gain more knowledge and hence I decided to do a masters. I did my M.Tech. with Digital Image Processing as my major at Dayalbagh Educational Institute.

After completing my masters, I got a chance to work at the Indian Institute of Technology Jodhpur as a Senior Research Associate in 2011, on a project under the Ministry of Human Resource Development. Here I applied, in a professional setting, the knowledge that I had gained during my master’s thesis. The problem statement was to extract text from the video lectures delivered by IIT professors. To solve it, we first extracted the video frames and identified the distinct frames among them. We then kept only the distinct frames that contained text, preprocessed them, and finally ran OCR (Optical Character Recognition) on the preprocessed images.
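
The "distinct frames" step can be illustrated with a minimal sketch. This is not the project's actual code: the frame representation (flattened grayscale pixel lists), the mean-absolute-difference measure, and the threshold of 10 are all assumptions chosen for illustration.

```python
from typing import List, Sequence

Frame = Sequence[int]  # assumed format: flattened grayscale pixels, 0-255


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equal-sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_distinct_frames(frames: List[Frame], threshold: float = 10.0) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame."""
    kept: List[int] = []
    for i, frame in enumerate(frames):
        # Always keep the first frame; afterwards compare with the last kept one.
        if not kept or mean_abs_diff(frame, frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


# Toy "video": two slides, each repeated, with slight noise on one repeat.
slide_a = [0, 0, 255, 255]
slide_b = [255, 255, 0, 0]
video = [slide_a, slide_a, [0, 1, 254, 255], slide_b, slide_b]
print(select_distinct_frames(video))  # [0, 3]
```

Only the frames at indices 0 and 3 survive, so each slide would be passed to OCR once instead of once per repeated frame.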

At what point in time did you decide that you wanted to do a PhD?

In 2012, I joined Amity University Rajasthan as an Assistant Professor. Here I taught students while further refining my own knowledge. I have always considered teaching a noble profession and I like to teach students. During my Amity days, I was also involved in several research activities in the field of Digital Image Processing, the main topic of my masters. One of the research papers I wrote was on “Detection of Brain Tumor from MRI Scans”. To solve the problem, we first created a dataset of brain MRI images of benign and malignant tumors, collected from one of the nearby hospitals. We divided the images into the two categories, benign and malignant, and created the training and test sets. After that, we extracted Haralick texture features and trained a Support Vector Machine (SVM) on them. Once trained, the SVM could classify an image as showing a benign or a malignant tumor. More details about the paper can be read here.
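
For readers unfamiliar with Haralick texture features: they are statistics computed from a gray-level co-occurrence matrix (GLCM), which counts how often pairs of gray levels occur at a fixed pixel offset. The sketch below computes two of them, contrast and energy (angular second moment), in plain Python. It is an illustration only, not the paper's code: the function names, the single horizontal offset, and the non-symmetric matrix are simplifying assumptions, and the SVM training step is omitted.

```python
from typing import List

Matrix = List[List[float]]


def glcm(image: List[List[int]], levels: int, dx: int = 1, dy: int = 0) -> Matrix:
    """Normalized gray-level co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("image too small for the given offset")
    return [[n / total for n in row] for row in counts]


def contrast(p: Matrix) -> float:
    """Sum of (i - j)^2 * p[i][j]; large when neighbouring pixels differ a lot."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))


def energy(p: Matrix) -> float:
    """Angular second moment, sum of p[i][j]^2; large for uniform textures."""
    return sum(v * v for row in p for v in row)


flat = [[1, 1], [1, 1]]      # uniform patch
checker = [[0, 1], [1, 0]]   # alternating patch
for name, img in (("flat", flat), ("checker", checker)):
    p = glcm(img, levels=2)
    print(name, contrast(p), energy(p))
```

The uniform patch gives contrast 0 and energy 1, while the alternating patch gives contrast 1 and energy 0.5. A vector of such values per image is the kind of input a classifier such as an SVM could be trained on.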

It was then that I realized I should extend my knowledge further by enrolling in a PhD. So, I enrolled in a PhD program at Amity University in Computer Vision (Electronics and Communication Engineering). As I was a working professional, I took up a part-time PhD. It was extremely challenging to manage my job, research and corresponding studies simultaneously. A PhD program requires a lot of patience and a “never give up” attitude. Many times, my experiments produced results that did not match my expectations, so I had to refine my approaches again and again. Many times, it felt like a dead end. Then I had to explore more research papers, formulate the theory and perform the experiments again.

My PhD thesis topic was “Content Based Image Retrieval (CBIR) using Nature Inspired Algorithms”. As I had also worked on CBIR during my masters, I decided to extend it further. I introduced nature-inspired algorithms into CBIR, especially the Convolutional Neural Network (CNN) and its different architectures. I proposed several architectural modifications and performed experiments covering several use cases like Estimation of Traffic Density using Progressive Neural Architecture Search (PNAS) DOI: 10.1080/09720510.2020.1736336, Classification of Indian Jewellery using Convolutional Neural Network (

Can you talk a little about your career path in the industry, especially on the diverse real world problems you addressed through Machine Learning and AI?

In 2016, I got an interview call from Manipal Technologies Limited (MTL), Pune. After several rounds of technical discussions, I cleared the interview, and they invited me to join them in Pune. It was a career shift from academics to the corporate world. So, initially I was a bit nervous, but I had confidence in my knowledge, so I joined MTL.

At MTL, I worked on an interesting use case: detecting hidden text in holograms. The final aim was that the holograms could be used to verify whether a product is genuine or fake. You can find more details below.

After quitting MTL, I joined Accenture in 2017. There I utilized Artificial Intelligence and Computer Vision for automatic detection of damage to houses and automobiles. The final aim was that a customer should be able to upload images of their damaged house or automobile, and the AI system should accordingly suggest the insurance claim amount to be issued. To solve the problem, we first sorted the images into five categories: no damage, fully damaged, and three intermediate levels. After that, we experimented with various CNN architectures and settled on the Inception V3 model. We trained the model on the training set and evaluated the trained model on the test set, and this solved the problem.

After Accenture, I joined Persistent Systems Limited in 2018. Here I implemented Artificial Intelligence and Machine Learning concepts on robots and drones. The problem statement was that a robot should be able to navigate in a hospital without colliding with any person or object and deliver a payload from source to destination. So, we divided the problem into two parts: first, finding the optimal path from source to destination, and second, collision avoidance. For finding the optimal path, we tried several path-finding algorithms and settled on Dijkstra’s algorithm. For collision avoidance, we collected point cloud data from lidars along with camera feeds. With the lidar data, we were able to get the position of an object, and with the camera, we were able to identify the object. For object identification, we used the SSD MobileNet object detection architecture.
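
As a rough illustration of the path-finding part, here is a minimal sketch of Dijkstra’s algorithm on a small occupancy grid. The grid encoding (`#` for blocked cells), the 4-connected moves and the uniform step cost are assumptions made for this example; the map representation used on the actual robot is not described in the interview.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # (row, column)


def shortest_path(grid: List[str], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Dijkstra on a 4-connected grid; '#' cells are blocked, each move costs 1."""
    rows, cols = len(grid), len(grid[0])
    dist: Dict[Cell, int] = {start: 0}
    prev: Dict[Cell, Cell] = {}
    heap: List[Tuple[int, Cell]] = [(0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            # Walk the predecessor links back to the start.
            path = [cell]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist[cell]:
            continue  # stale queue entry; a shorter route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                nd = d + 1
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    return None  # goal unreachable


# Hypothetical 3x3 floor plan: S = source, G = destination, # = obstacle.
ward = [
    "S.#",
    ".##",
    "..G",
]
print(shortest_path(ward, (0, 0), (2, 2)))
# [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
```

With uniform step costs a breadth-first search would find the same route; Dijkstra’s algorithm becomes useful once edges carry different costs, for example to penalise narrow or crowded corridors.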

I joined Neilsoft Limited in 2020, where we worked on the problem of generating the footprint (the network of warehouses a company has at its disposal to store inventory and fulfill customer orders) for warehouses. The client provided us with historical data and wanted us to generate the footprints for their future warehouses using Artificial Intelligence. The footprints had to satisfy several conditions. As it was a new problem, we initially tried to solve it using numerical dimensions and a regression approach, which worked for simple cases but not for complex ones. So, we revised our approach and introduced Generative Adversarial Networks (GAN) to generate warehouse footprints. To train the GAN model, we first generated ground truth and input images and then trained the model on the training set. Once the model was trained, we tested it on query images to verify whether it could generate the footprints. Although it did generate footprints, they were not very accurate. So, we thought of ways to improve the accuracy. We introduced a new idea: color coding the various entities in the image and then training the network again, so that it could better distinguish between entities. Our approach worked and finally the model was able to generate accurate images.

How did you get your first break?

I got an interview call from Manipal Technologies Limited (MTL), Pune. That was my first break.

What were some of the challenges you faced? How did you address them?

I faced several challenges during various phases of my career. When I joined Amity University as an Assistant Professor, I realized teaching is not an easy task. You should have a lot of patience along with inter-personal skills and deep knowledge.

During my first corporate job at MTL, I was assigned a project for the detection of hidden text in specially designed holograms with a mobile application. I was working as the chief contributor and initially had no knowledge of holograms or mobile application programming. So, I started reading research papers on the topic. Along with one of my colleagues, I started to try the approaches discussed in those papers. After several months of research work, we succeeded in breaking down the problem and, at last, we were able to develop the solution.

During my next job at Accenture, I entered the field of Artificial Intelligence and Machine Learning. It was again a new technological change and I had to learn it in a short span of time. With several months of hard work and training, I learned AI/ML.

Again, in my next job at Persistent Systems Ltd, I was assigned to implement AI/ML concepts on robots and drones. As I had never worked on robots and drones, I learned ROS (Robot Operating System) and then successfully implemented AI/ML use cases.

Where do you work now? What problems do you solve?

Currently, I am working at Accenture on the Google Cloud Platform. It is again a new area for me. I underwent training from Google and acquired knowledge and have now become a Google Cloud Platform Associate Cloud Engineer (ACE).

I work on translating a video from a source language to a target language using a web application. The pipeline first extracts the audio from the video, then converts the audio to text using Google’s Speech-to-Text service. Once the audio has been converted to text, the text is translated into the target language using Google’s Translation service. After the translation, the text is converted back to audio using Google’s Text-to-Speech service. Finally, we merge the translated audio with the original video frames. One of the challenges we faced was matching the duration of the translated audio to that of the video. To tackle this problem, we used two approaches. First, we devised a speed coefficient, based on our experiments, to increase or decrease the pace of the audio. Second, we introduced pauses where the audio was shorter than the original video segment.
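
The duration-matching idea can be sketched as follows. This is an assumption-laden illustration, not the team’s implementation: the interview does not say how the speed coefficient is derived, so the example simply uses the ratio of audio length to slot length, clamped to hypothetical bounds of 0.9 and 1.3, and fills any remaining gap with silence.

```python
from typing import NamedTuple


class Fit(NamedTuple):
    speed: float    # playback-rate multiplier applied to the translated audio
    pause_s: float  # seconds of silence appended so the segment fills its slot


def fit_segment(audio_s: float, slot_s: float,
                min_speed: float = 0.9, max_speed: float = 1.3) -> Fit:
    """Fit translated audio of length audio_s into a video slot of length slot_s."""
    if audio_s <= 0 or slot_s <= 0:
        raise ValueError("durations must be positive")
    # Speed up long audio or slow down short audio, within the allowed bounds.
    speed = min(max(audio_s / slot_s, min_speed), max_speed)
    # If the adjusted audio is still shorter than the slot, pad with a pause.
    pause_s = max(0.0, slot_s - audio_s / speed)
    return Fit(speed, pause_s)


print(fit_segment(6.0, 5.0))  # longer translation: played faster, no pause
print(fit_segment(2.0, 4.0))  # shorter translation: slowed slightly, then padded
```

If the translated audio is so long that even the maximum speed cannot fit it into the slot, this sketch lets it overrun; a real pipeline would need a policy for that case.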

What’s a typical day like?

My typical day starts with a project scrum call. As I am working as a project owner, it is my responsibility to deliver the project on time, along with other responsibilities like requirement analysis, project plans, resource allocation and preparation of timelines. I coordinate with team members to understand day-to-day progress and help them resolve technical challenges.

I am also involved in developing research proposals and activities for future innovation, along with preparing documents for filing patents.

How does your work benefit society? 

I am working in Artificial Intelligence and Computer Vision, which are cutting-edge technologies. We are developing applications which could help people make their lives more comfortable.

AI is being applied in many fields, such as:

  • Development of Autonomous Vehicles
  • AI in Agriculture, like
    • Agricultural Robots
    • Crop and Soil Monitoring
    • Predictive Analysis of crops, weather conditions
  • AI in Financial Services, like
    • Credit scoring
    • Financial Lending
    • Legal compliance
    • Fraud detection
    • Algorithmic trading
  • AI in Marketing and Advertising
    • Natural Language Processing
    • Structured Data Analysis
    • Personalized advertisements
    • AI powered Augmented Reality (AR)
  • AI in Science
    • Faster scientific discovery
    • Cheaper experimentation
    • Easier Training
  • AI in Healthcare
    • Improving patient care (precision medicine, mobile health)
    • Managing health systems
    • Understanding and managing population and public health
    • Telemedicine and Telesurgery

The fields mentioned above are only a few examples; the applications of AI are very wide.

Tell us an example of a specific memorable work you did that is very close to you!

Implementing AI for robots and drones has always fascinated me. I implemented AI on robots and would love to do more work in that domain.

Your advice to students based on your experience?

My advice to students is as follows:

  • Learn fundamentals deeply, not just superficial knowledge. That will also develop your interest in the subject.
  • Practice daily.
  • Prepare a time-table and adhere to it; learn to live a disciplined life.
  • Take care of your physical, mental and spiritual health.

Future Plans?

My future plans are to continue research in the field of Artificial Intelligence and Machine Learning along with other new technologies.