Data Scientist (PhD)

Remote
Full Time
Experienced
Location: 100% Remote within the United States

Job Overview:
As a Data Scientist, you will manage the complete Model Development Life Cycle (MDLC), from problem definition to model deployment and monitoring. You will work closely with cross-functional teams to deliver machine learning models that support business objectives and drive innovation. The ideal candidate has a strong background in data analysis, feature engineering, and model selection, along with a deep understanding of model deployment and ongoing model maintenance.

Key Responsibilities:

  • Problem Definition: Collaborate with business stakeholders to define and structure data-driven problems. Translate business objectives into machine learning tasks (e.g., classification, regression, clustering).
  • Data Collection & Preprocessing: Gather, clean, and preprocess data from multiple sources (e.g., databases, APIs, publicly available datasets). Handle missing data and outliers, and apply normalization techniques.
  • Exploratory Data Analysis (EDA): Use statistical analysis and data visualization techniques to identify key patterns, trends, and correlations in the data.
  • Feature Engineering: Create, select, and transform features to improve model performance.
  • Model Selection & Training: Select the appropriate machine learning models based on the problem at hand (e.g., supervised learning, unsupervised learning, deep learning). Train models using tools like Scikit-learn, TensorFlow, or PyTorch. Evaluate model performance using relevant metrics (e.g., RMSE, accuracy, F1-score, ROC-AUC) and optimize hyperparameters to ensure robustness.
  • Model Deployment: Deploy models in a production environment using tools like Flask, FastAPI, Docker, and Kubernetes. Ensure scalability and integration with existing systems.
  • Model Monitoring & Maintenance: Monitor model performance post-deployment, address model drift, and retrain models as needed. Ensure continuous accuracy and relevance of models in real-world scenarios.
  • Model Interpretation & Communication: Provide clear and actionable insights through model interpretation techniques such as feature importance and SHAP values. Present results to both technical and non-technical stakeholders.

Qualifications:

  • PhD in Computer Science, Data Science, Statistics, Engineering, or a related field.
  • 3+ years of experience in machine learning, statistical modeling, and data science.
  • Proficiency in Python and SQL, and experience with libraries such as Pandas, NumPy, Scikit-learn, TensorFlow, and Keras.
  • Hands-on experience with model deployment tools such as Flask, Docker, Kubernetes, and cloud platforms like AWS, Azure, or Google Cloud.
  • Strong knowledge of data preprocessing techniques, feature engineering, and exploratory data analysis.
  • Experience with hyperparameter tuning techniques (e.g., Grid Search, Bayesian Optimization).
  • Familiarity with model monitoring tools such as MLflow, Prometheus, or Grafana.
  • Excellent communication skills, with the ability to translate technical results into actionable insights for stakeholders.
  • Strong problem-solving skills and the ability to work on complex, data-driven projects.

Preferred Qualifications:

  • Experience with deep learning models (e.g., CNNs, RNNs, LSTMs).
  • Familiarity with NLP and time-series analysis.
  • Knowledge of big data tools like Spark or Hadoop.
  • Experience in sectors such as healthcare, finance, or e-commerce.