## Core Concept
Applied Scientist takes two inputs and produces one output.

Inputs:

- `current_notebook`: your baseline Jupyter notebook with a working model
- `research_paper`: a PDF describing the method you want to try

Output:

- An `ExperimentResult` with a verdict (`BETTER`, `WORSE`, `INCONCLUSIVE`, or `FAILED`), a structured comparison table of metrics, and written explanations of what was tried and why the verdict was reached.

Everything the agent produces is persisted to `{workspace}/experiments/{name}/` so you can come back to it later.
## Why Use the Upsonic Prebuilt Autonomous Agent
A question we hear a lot: why use this instead of doing the same thing in Cursor or Claude Code? The short answer is that those are general coding copilots, while Applied Scientist is a purpose-built experiment runner. The differences come down to four things:

- Fully isolated workspace folder. Every experiment runs inside its own `{workspace}/experiments/{name}/` directory. The agent cannot touch your editor's working tree or anything outside the workspace.
- Fully structured outputs. Every run produces the same shape of result: a verdict, a comparison table, a summary, an explanation. No free-form chat to parse, no screenshot to reread.
- Tested, battle-hardened workflow. The pipeline for reading the paper, running the baseline, applying the method, and benchmarking is fixed and pre-tested. You do not assemble it yourself from prompts.
- Runs directly inside Jupyter. The agent is designed to live in a notebook cell. Progress bars, log timelines, and result cards all render as HTML right where you are working.
### Cursor & Claude Code vs Upsonic Prebuilt Autonomous Agents
| Dimension | Cursor & Claude Code | Upsonic Applied Scientist |
|---|---|---|
| Workspace | Runs in your working repo, shared with your editor | Fully isolated workspace folder per experiment |
| Output | Free-form chat and file edits | Structured ExperimentResult (verdict, comparison table, metrics) |
| Workflow | Assembled case by case in the chat | Pre-tested, battle-hardened pipeline |
| Environment | Outside the notebook | Runs directly inside Jupyter |
## The Experiment
The central unit of Applied Scientist is the experiment. Each experiment is a named run with its own folder and its own set of JSON files on disk. When you call `scientist.new_experiment("my_run", ...)`, the agent creates that folder and registers the run.

The name you pass becomes the `"name"` field in every JSON file the agent writes. You supply it yourself; the agent never derives it from the paper title or adds suffixes. That means `scientist.experiments["my_run"]` always points at this exact run.
## Phases & Progress
During a run, Applied Scientist moves through a fixed sequence of phases (reading the paper, running the baseline, extracting metrics, applying the new method, benchmarking, writing the result). You can watch what it is doing in three ways:

- `experiment.progress_bar`: a snapshot of the current phase and activity, rendered as an HTML progress bar. Re-run the cell to get a fresh snapshot.
- `scientist.progress_bar_live(experiment, interval=5)`: a blocking live view that auto-refreshes every `interval` seconds until the run finishes.
- `experiment.last_logs(n)`: a timeline of the last `n` entries from `log.json`, rendered as HTML.
`progress.json` is a snapshot: it is overwritten on every tick and only tells you the current state. `log.json` is append-only: every phase writes a timestamped entry with its action, status, and structured details. This is where you go to see what the agent actually did.
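The two files follow different write patterns. Here is a minimal, self-contained sketch of the distinction using plain `json` (this is not the real agent; the file names come from the docs above, and the entry fields here are illustrative):

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def write_progress(folder: Path, phase: str, activity: str) -> None:
    # progress.json is a snapshot: each tick replaces the whole file.
    (folder / "progress.json").write_text(
        json.dumps({"phase": phase, "activity": activity})
    )

def append_log(folder: Path, action: str, status: str) -> None:
    # log.json is append-only: load, append a structured entry, rewrite.
    path = folder / "log.json"
    entries = json.loads(path.read_text()) if path.exists() else []
    entries.append({"action": action, "status": status})
    path.write_text(json.dumps(entries))

with TemporaryDirectory() as tmp:
    folder = Path(tmp)
    append_log(folder, "read_paper", "done")
    write_progress(folder, "baseline", "running notebook")
    append_log(folder, "run_baseline", "done")
    write_progress(folder, "benchmark", "comparing metrics")  # overwrites

    snapshot = json.loads((folder / "progress.json").read_text())
    history = json.loads((folder / "log.json").read_text())
    print(snapshot["phase"])   # only the latest state survives
    print(len(history))        # every entry survives
```

This is why you poll `progress.json` (or the progress bar) for "where is it now?" but read `log.json` (or `last_logs(n)`) for "what happened?".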
## Running an Experiment
The steps below walk through a full run end to end. Each step maps to a cell in the companion demo notebook.

### 1. Install and configure
### 2. Create the agent
`workspace` is the root directory the agent is allowed to work in. All experiment folders are created inside it.
### 3. Prepare the experiment
The first positional argument is the experiment name. It becomes the folder name and the registry key.

| Parameter | Purpose |
|---|---|
| `name` (positional) | Experiment name, used as folder name and registry key |
| `research_paper` | Path to the PDF of the method you want to try |
| `current_notebook` | Path to your baseline notebook |
| `current_data` | Short description of where the data comes from |
| `experiments_directory` | Where the experiment folder is created (relative to `workspace`) |
| `inputs` | Extra files or folders the agent should have read access to |
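Putting the table together, the folder for a run is derived from `workspace`, `experiments_directory`, and `name`. A self-contained sketch of that derivation with plain `pathlib` (`experiment_folder` is an illustrative helper, not part of the Upsonic API; the layout follows the `{workspace}/experiments/{name}/` convention described above):

```python
from pathlib import Path

def experiment_folder(workspace: str, name: str,
                      experiments_directory: str = "experiments") -> Path:
    # {workspace}/{experiments_directory}/{name}/ — experiments_directory
    # is resolved relative to the workspace root.
    return Path(workspace) / experiments_directory / name

folder = experiment_folder("./workspace", "my_run")
```

Because `name` doubles as the folder name and the registry key, keeping it short and filesystem-safe avoids surprises.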
### 4. Run in the background
`run_in_background()` starts the run in a daemon thread, silences the agent's printing, and returns immediately.
- `experiment.is_running`: `True` while the thread is alive and has not finished
- `experiment.is_done`: `True` once the run has either succeeded or errored
- `experiment.error`: the exception object if the run raised, otherwise `None`
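The three flags follow the usual background-thread pattern. A self-contained sketch of the semantics with plain `threading` (`BackgroundRun` is an illustrative stand-in, not the real experiment object):

```python
import threading

class BackgroundRun:
    """Illustrative stand-in for a run started with run_in_background()."""

    def __init__(self, target):
        self._result = None
        self.error = None          # exception object if the run raised
        self._finished = False
        self._thread = threading.Thread(
            target=self._wrap, args=(target,), daemon=True
        )
        self._thread.start()

    def _wrap(self, target):
        try:
            self._result = target()
        except Exception as exc:   # surface any failure via .error
            self.error = exc
        finally:
            self._finished = True

    @property
    def is_running(self):
        return self._thread.is_alive() and not self._finished

    @property
    def is_done(self):
        return self._finished

    def wait(self):
        # Block until the run ends; re-raise any exception it produced.
        self._thread.join()
        if self.error is not None:
            raise self.error
        return self._result

run = BackgroundRun(lambda: "ExperimentResult placeholder")
result = run.wait()
```

The daemon flag means the thread will not keep the kernel alive on shutdown, which matches the "fire and forget, poll later" workflow the docs describe.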
### 5. Watch progress
### 6. Stop or wait
If you change your mind mid-run, `stop()` requests a cooperative cancel; the agent raises at its next pipeline checkpoint. `wait()` blocks until the run finishes, returns the `ExperimentResult`, and re-raises any exception the run produced.
### 7. Read the result
Once the run finishes, `experiment.result` returns an `ExperimentResult` parsed from `result.json`. It renders as an HTML card in Jupyter and also exposes four Python attributes, including `result.table` and `result.record`.
`result.table` is the metric comparison table; each row has these fields:
| Field | Type | Meaning |
|---|---|---|
| `name` | `str` | Metric name (e.g. accuracy, f1, auroc) |
| `current` | `float` | Value from the baseline run |
| `new` | `float` | Value from the new method |
| `diff` | `float` | Raw difference `new - current` |
| `diff_display` | `str` | Human-friendly diff (e.g. `+1.2%`) |
| `unit` | `str` | Unit of the metric |
| `higher_is_better` | `bool` | Whether larger values are better for this metric |
| `better` | `bool` | Whether the new method won on this metric |
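A self-contained sketch of one comparison row with the fields above (a plain dataclass, not the real `ExperimentResult` type; deriving `diff` and `better` from the other fields is an assumption based on the field descriptions):

```python
from dataclasses import dataclass, field

@dataclass
class MetricRow:
    name: str
    current: float            # value from the baseline run
    new: float                # value from the new method
    unit: str = ""
    higher_is_better: bool = True
    diff: float = field(init=False)
    better: bool = field(init=False)

    def __post_init__(self):
        # Raw difference new - current, as the field table describes.
        self.diff = self.new - self.current
        # The new method "wins" when the diff moves in the right direction.
        self.better = (self.diff > 0) if self.higher_is_better else (self.diff < 0)

row = MetricRow(name="accuracy", current=0.91, new=0.93)
print(round(row.diff, 3), row.better)  # 0.02 True
```

For a metric where lower is better (say, loss), you would pass `higher_is_better=False` and a negative `diff` would count as a win.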
`result.record` gives you the underlying `ExperimentRecord` with access to `log.json`, `progress.json`, and registry metadata.
## Managing Experiments
Every experiment you create is recorded in `experiments.json`. The registry is re-read from disk on every call, so it always reflects the current state.
List every experiment, newest first:
Each entry shows `name`, `date`, `status`, `verdict`, `baseline_model`, `new_method`, `paper`, and `path`.
To access an experiment programmatically by name:
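Listing newest-first and looking a run up by name can be sketched over a plain `experiments.json` (`load_registry` is a hypothetical helper, not the Upsonic API; the entry fields are the ones listed above, and re-reading from disk on every call mirrors the documented registry behavior):

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def load_registry(workspace: str) -> dict:
    # Re-read experiments.json from disk so the view is always current.
    path = Path(workspace) / "experiments.json"
    entries = json.loads(path.read_text()) if path.exists() else []
    return {entry["name"]: entry for entry in entries}

with TemporaryDirectory() as ws:
    (Path(ws) / "experiments.json").write_text(json.dumps([
        {"name": "my_run", "date": "2025-01-01", "status": "done",
         "verdict": "BETTER"},
        {"name": "older_run", "date": "2024-12-01", "status": "done",
         "verdict": "WORSE"},
    ]))
    experiments = load_registry(ws)
    # Newest first, as in the listing view.
    newest_first = sorted(experiments.values(),
                          key=lambda e: e["date"], reverse=True)

print(experiments["my_run"]["verdict"])
```

With the real agent, `scientist.experiments["my_run"]` is the documented accessor for the same lookup.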
## API Reference
The full demo notebook for this agent lives in the Upsonic repo under `prebuilt_autonomous_agents`.

