Each point in the scatter plot above represents one job listed by Google LLC. Using the TF-IDF technique, each job description can be embedded as a vector.
With 2000 jobs and 9000 unique words, we construct a 2000 x 9000 matrix, with one row per job and one column per word. Using Principal Component Analysis (PCA), we can identify the pivotal words that most effectively distinguish between these job roles.
The scatter plot above shows the projection of the jobs onto the first two principal components, PCA_0 and PCA_1.
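The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code behind the plot: the six-document corpus is made up, and the smoothed IDF with L2-normalized rows is an assumed TF-IDF variant (the original tooling is not stated).

```python
import numpy as np

# Toy stand-in corpus (made up); the real data would be ~2000 job descriptions.
docs = [
    "software engineer backend distributed systems",
    "software engineer machine learning systems",
    "account manager customer sales",
    "sales executive customer cloud",
    "silicon cpu hardware design engineer",
    "cloud platform customer engineer",
]

# Vocabulary: one matrix column per unique word.
vocab = sorted({w for d in docs for w in d.split()})
col = {w: j for j, w in enumerate(vocab)}

# Raw term counts, shape (n_docs, n_words).
counts = np.zeros((len(docs), len(vocab)))
for i, d in enumerate(docs):
    for w in d.split():
        counts[i, col[w]] += 1

# TF-IDF (assumed variant): smoothed IDF, then each row scaled to unit length.
df = (counts > 0).sum(axis=0)
idf = np.log((1 + len(docs)) / (1 + df)) + 1
tfidf = counts * idf
tfidf /= np.linalg.norm(tfidf, axis=1, keepdims=True)

# PCA via SVD of the mean-centered matrix.
centered = tfidf - tfidf.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
components = Vt[:2]                # PCA_0 and PCA_1, shape (2, n_words)
coords = centered @ components.T   # 2-D scatter coordinates, shape (n_docs, 2)
```

Each row of `coords` is one point in a scatter plot of this kind, and each row of `components` assigns one weight to every word. The sign of a principal component is arbitrary, so which group of jobs lands on the positive side of an axis carries no meaning by itself.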
An interesting dichotomy emerges along each principal dimension:
Software engineering vs. customer sales (PCA_0)
Cloud-related roles vs. hardware-oriented positions (PCA_1)
So not everyone at Google is a software engineer; in fact, a large portion of the jobs are in business and sales.
Also, in addition to Google Cloud Platform, Google has a large portion of hardware jobs. Hover the mouse near the negative side of the Y-axis and you will see more keywords like cpu, silicon, etc.
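An axis is interpreted by looking at the words that carry the largest positive and negative weights in its principal component. A minimal sketch of that ranking follows; the function name `top_words` and the example weights are hypothetical and chosen only for illustration.

```python
def top_words(component, vocab, k=4):
    """Return the k most positive and k most negative (word, weight) pairs."""
    # Word indices sorted from the most negative to the most positive weight.
    order = sorted(range(len(vocab)), key=lambda i: component[i])
    negative = [(vocab[i], component[i]) for i in order[:k]]
    positive = [(vocab[i], component[i]) for i in reversed(order[-k:])]
    return positive, negative


# Hypothetical weights for a six-word vocabulary (not real results).
vocab = ["cloud", "cpu", "sales", "silicon", "software", "customer"]
weights = [0.5, -0.6, 0.1, -0.4, 0.3, 0.0]

positive, negative = top_words(weights, vocab, k=2)
```

With a real component, `component` would be one 9000-dimensional vector and `vocab` the matching list of 9000 words.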
Below are the word components of PCA_0 and PCA_1. Each is a 9000-dimensional vector; only the four most positive and four most negative word components are shown here: