
From Script Usage to Computational Reasoning: My Progression in Python, R, and Bioinformatics

This article is a reflective and technical account of how my understanding and practical usage of Python, R, RStudio, and bioinformatics workflows have evolved over time. It is written deliberately without hype. The goal is not to exaggerate proficiency, but to document growth, limitations, the scale of data handled, and a statistically reasoned trajectory of where my computational capacity is realistically headed over the next three years.


1. Starting Point: Computational Literacy, Not Expertise

My initial engagement with programming languages was not as a formally trained computer scientist, but as a biomedical researcher responding to data pressure. Early usage of R and Python was functional and problem-driven: plotting figures, running basic statistics, reshaping tables, and automating repetitive tasks. I did not begin with algorithmic depth; I began with necessity.

At this stage, my interaction with code was characterized by:

  • Script reuse with modification
  • Heavy reliance on documentation and examples
  • Minimal abstraction or modularity
  • Strong biological intuition guiding computational decisions

This is an important baseline. Many scientists skip documenting this phase, but it represents the statistically dominant entry point into computational biology.


2. Evolution of Python Usage: From Scripts to Pipelines

Dimension | Early Phase | Current State (2025)
Primary use | Single-file scripts | Multi-step task pipelines
Libraries | pandas, matplotlib | pandas, numpy, matplotlib, PyPDF2, PIL, scikit-learn (basic)
Data size handled | KB–MB | GB-scale structured datasets
Error handling | Minimal | Explicit checks, logging awareness

Python gradually became my preferred language for automation and data engineering tasks. Examples include the following (a short sketch follows the list):

  • Batch processing of hundreds to thousands of images and PDFs
  • Automated file system operations across multi-TB external drives
  • Programmatic annotation, merging, and transformation of research artifacts
  • Custom scripts to support wet-lab documentation and reporting
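
To make the first two items concrete, here is a minimal sketch of the kind of batch image-conversion script I mean, using pathlib, logging, and PIL (Pillow). The directory names, file extensions, and function name are hypothetical placeholders, not my actual project layout.

import logging
from pathlib import Path

from PIL import Image  # Pillow; PIL is listed among the libraries above

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

def batch_convert(src_dir: str, dst_dir: str, target_ext: str = ".png") -> None:
    """Convert every TIFF under src_dir to target_ext, preserving file stems."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for tiff in sorted(src.rglob("*.tif")):
        out_path = dst / (tiff.stem + target_ext)
        try:
            with Image.open(tiff) as img:
                img.save(out_path)
            logging.info("converted %s -> %s", tiff.name, out_path.name)
        except OSError as exc:  # explicit check instead of silent failure
            logging.warning("skipped %s: %s", tiff.name, exc)

if __name__ == "__main__":
    # Hypothetical paths; in practice these point at multi-TB external drives
    batch_convert("raw_ihc_images", "converted_images")

The try/except and logging calls reflect the "explicit checks, logging awareness" row in the table above: no file fails silently.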

Importantly, I remain honest about limits: my Python usage is applied and utilitarian, not algorithmically novel. However, it is reproducible, scalable, and grounded in real biomedical workflows.


3. R and RStudio: Statistical Thinking Over Syntax

R became central to my scientific reasoning because it enforced statistical discipline. Unlike Python, where syntax flexibility can obscure logic, R workflows forced explicit engagement with assumptions, distributions, and model validity.

Competency Area | Demonstrated Capability
Data wrangling | tidyverse pipelines, factor control, reshaping
Statistics | t-tests, ANOVA, non-parametric tests, ROC analysis
Visualization | ggplot2 with publication-oriented formatting
Reproducibility | Script-based analysis, session-aware workflows

RStudio, in particular, reshaped how I think about analysis as a narrative process: raw data → cleaned data → tested assumptions → statistical inference → figure generation.

The scale of data handled in R typically spans:

  • Dozens to hundreds of patient samples
  • Thousands of molecular features (e.g., miRNAs, cytokines)
  • Multiple linked datasets spanning clinical, molecular, and experimental domains

4. Bioinformatics Literacy: Honest Scope

My bioinformatics expertise is best described as literate and integrative, rather than deeply algorithmic. I can (see the sketch after this list):

  • Understand and interpret NGS workflows end-to-end
  • Perform downstream analyses using guided and reproducible scripts
  • Critically evaluate outputs (QC metrics, fold changes, statistical validity)
  • Translate computational results into biological meaning
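
The sketch below illustrates the third point: a small example of evaluating fold changes from a normalized expression table. The gene names and values here are invented purely for illustration, not drawn from any real experiment.

import numpy as np
import pandas as pd

# Hypothetical normalized mean expression values, invented for illustration
counts = pd.DataFrame({
    "gene": ["KRAS", "TP53", "GAPDH"],
    "tumour_mean": [850.0, 120.0, 5000.0],
    "normal_mean": [210.0, 115.0, 4900.0],
})

pseudo = 1.0  # pseudocount guards against division by zero
counts["log2FC"] = np.log2(
    (counts["tumour_mean"] + pseudo) / (counts["normal_mean"] + pseudo)
)

# Sanity check: a housekeeping gene such as GAPDH should sit near log2FC = 0
print(counts[["gene", "log2FC"]])

The point is not the arithmetic but the habit: checking that housekeeping genes behave as expected before trusting any differential signal.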

However, I do not claim:

  • Independent development of novel bioinformatics algorithms
  • Deep optimization of alignment or variant-calling engines

This honesty matters. In translational science, correct interpretation often outweighs raw computational virtuosity.


5. Machine Learning and LLM Understanding: Conceptual, Not Performative

Aspect | Current Level | Annotation
Classical ML | Foundational | Regression, classification, ROC interpretation
Model evaluation | Moderate | Overfitting awareness, validation logic
LLMs | Advanced user | Prompt structuring, constraint design, critical validation

I treat ML and LLMs as statistical instruments, not magic engines. My strength lies in knowing when a model is inappropriate, biologically implausible, or statistically underpowered.
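
As one concrete illustration of that validation logic, here is a minimal scikit-learn sketch on synthetic data. Every value is simulated; nothing is drawn from a real cohort.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))  # 200 simulated samples, 5 simulated features
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)  # outcome tied to feature 0

# A held-out test set is the minimal guard against the overfitting noted above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = LogisticRegression().fit(X_train, y_train)
train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"train AUC = {train_auc:.2f}, test AUC = {test_auc:.2f}")
# A wide train-test gap flags the model as untrustworthy before any biology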


6. Quantifying Scale: How Much Data Can I Handle?

Data Type | Typical Scale | Handling Strategy
Tabular clinical data | 10⁴–10⁵ rows | Chunking, vectorized operations
Images (IHC, microscopy) | 10³–10⁴ files | Batch automation (Python)
NGS outputs | GB-scale | Downstream analysis & interpretation

This places me comfortably above casual scripting users, while remaining grounded in the realities of biomedical infrastructure.
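
For the first row of the table, here is a minimal sketch of the chunking strategy, assuming a hypothetical CSV of clinical records too large to load comfortably at once; the file and column names are placeholders.

import pandas as pd

total, count = 0.0, 0
# Stream the file in 10,000-row chunks instead of loading it whole
for chunk in pd.read_csv("clinical_records.csv", chunksize=10_000):
    eligible = chunk[chunk["age"] >= 18]  # vectorized filter, no row-level loops
    total += eligible["biomarker_level"].sum()
    count += len(eligible)

if count:  # guard against an empty selection
    print(f"mean biomarker level (eligible adults): {total / count:.3f}")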


7. Trajectory Projection (2026–2028)

Computational Maturity Index
│
│                          ██████████████  (2028)
│                     █████████████
│                ████████████
│           ██████████
│      █████████
│ ███████
│
└──────────────────────────────────────────
   2023    2024    2025    2026    2027    2028

This projection is based on:

  • Observed learning velocity (2023–2025)
  • Increasing data complexity
  • Integration of computation into daily research practice
  • Shift from reactive coding to anticipatory design

Statistically, such trajectories follow a logistic growth curve, not exponential hype. Plateaus are expected—and healthy.
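
For reference, the logistic form behind that statement can be written as follows; the parameters here are conceptual, not fitted to any measured data:

M(t) = \frac{K}{1 + e^{-r(t - t_0)}}

where K is the capacity ceiling (the plateau), r the learning rate, and t_0 the inflection point. Growth is fastest near t_0 and flattens as M(t) approaches K, which is why the plateau is a feature of the model rather than a failure.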


8. Concluding Reflection

My computational evolution is not defined by mastery claims, but by increasing alignment between biological questions and computational tools. Python, R, and bioinformatics have become extensions of my scientific reasoning rather than separate skills.

This post serves as a transparent record: of growth, limits, scale, and direction. If scientific credibility depends on reproducibility and honesty, then this trajectory is one I am comfortable documenting publicly.

Keywords: Python, R, RStudio, bioinformatics literacy, computational biology, reproducible research, scientific growth

In 2025, my interaction with artificial intelligence transitioned from occasional consultation to sustained intellectual collaboration. This post documents that trajectory using structured metrics, annotated summaries, and reflective analysis, treating AI usage as a measurable component of scientific skill development rather than a casual productivity aid. 1. Temporal Scope and Engagement Intensity Metric Scientifically Interpretable Value Annotation Calendar window January–December 2025 Continuous annual engagement, not episodic usage Total active days Multi-month distributed activity Indicates integration into routine research workflow Session depth High (20–30+ conversational turns/session) Reflects iterative hypothesis refinement rather than query–response use Cumulative active time Equivalent to several full working weeks Comparable to time invested in a structured training module Interpretation: Engagement patterns resemble supervised intelle...