Plug your Python model

Contents:

1 - PythonTask syntax
2 - Embedding a Python script
3 - Using Python packages


PythonTask syntax

Preliminary remarks

The PythonTask uses the Singularity container system. Singularity must be installed on your system; otherwise you won't be able to use the PythonTask.

The PythonTask supports files and directories as both inputs and outputs. To learn how to handle them, read this page.
The PythonTask relies on an underlying ContainerTask but is designed to be transparent: it takes only Python-related arguments.

Arguments of the PythonTask

It takes the following arguments:
  • script: String or file, mandatory. The Python script to be executed.
  • version: String, optional. The version of Python to run.
  • install: Sequence of strings, optional (default = empty). System commands executed before any Python package installation and before the script runs (for example, to install libraries on the system).
  • libraries: Sequence of strings, optional (default = empty). The names of the Python libraries used by the script, installed through pip before it runs. Note that these installations are only performed during the first execution of the script; the result is then stored in a cached docker image. To force an update, use the forceUpdate argument.
  • forceUpdate: Boolean, optional (default = false). Whether the library installation should be forced (for example, to ensure an update). If true, the task performs the installation, and thus the update, even if the library is already installed.
  • prepare: Sequence of strings, optional (default = empty). System commands executed on the execution node just before Python runs.
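To make the order of these steps concrete, here is a conceptual model in Python of when each argument takes effect, built only from the descriptions above: library installation happens once and is cached unless forceUpdate is true, while prepare commands run before every execution. The function plan_steps is hypothetical, and caching the install commands together with the libraries (keyed on their values) is an assumption; this is not OpenMOLE's implementation.

```python
# Conceptual model only (hypothetical, not OpenMOLE's implementation) of the
# PythonTask setup order described above. Assumption: the install commands and
# the pip libraries are cached together, keyed on their values.

def plan_steps(install, libraries, prepare, force_update, cache):
    """Return the ordered steps one execution would perform.

    `cache` is a set holding the keys of setups that were already built.
    """
    key = (tuple(install), tuple(libraries))
    steps = []
    if force_update or key not in cache:
        # First execution, or forced update: system commands, then pip installs.
        steps += [("install", cmd) for cmd in install]
        steps += [("pip install", lib) for lib in libraries]
        cache.add(key)
    # prepare commands run on the execution node before every Python run.
    steps += [("prepare", cmd) for cmd in prepare]
    steps.append(("python", "script"))
    return steps


if __name__ == "__main__":
    cache = set()
    # Illustrative argument values only.
    args = (["echo system setup"], ["numpy"], ["echo prepare"])
    print(plan_steps(*args, force_update=False, cache=cache))  # full setup
    print(plan_steps(*args, force_update=False, cache=cache))  # cached: prepare + script
    print(plan_steps(*args, force_update=True, cache=cache))   # forced: full setup again
```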

Embedding a Python script

The toy Python script for this test case is:

# arg is not defined in this file: OpenMOLE provides it from the mapped input (inputs += arg.mapped)
with open("output.txt", "w") as f:
    f.write(str(arg))

We save this to hello.py. It does nothing but write the value of the variable arg, provided by OpenMOLE, to the file output.txt.
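Because arg is supplied by OpenMOLE rather than defined in hello.py, running the script directly with python hello.py fails with a NameError. A minimal sketch for trying it locally, assuming OpenMOLE's input mapping can be approximated by pre-defining arg as a global variable, uses the standard library function runpy.run_path. The helper run_hello is hypothetical and does not reproduce how OpenMOLE actually runs the script inside its container.

```python
import os
import runpy
import tempfile

# Equivalent to hello.py above; arg is expected to come from OpenMOLE.
HELLO_PY = '''
with open("output.txt", "w") as f:
    f.write(str(arg))
'''


def run_hello(arg_value):
    """Run hello.py in a fresh directory with arg pre-defined; return output.txt."""
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "hello.py")
        with open(script, "w") as f:
            f.write(HELLO_PY)
        previous = os.getcwd()
        os.chdir(workdir)  # the script writes output.txt relative to the cwd
        try:
            # init_globals approximates OpenMOLE injecting the mapped input.
            runpy.run_path(script, init_globals={"arg": arg_value})
            with open("output.txt") as f:
                return f.read()
        finally:
            os.chdir(previous)  # leave the directory before it is deleted


if __name__ == "__main__":
    print(run_hello(7))  # prints 7
```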

To run this script in OpenMOLE, upload hello.py to your workspace. You can then use the following script:

// Declare the variables
val arg = Val[Int]
val output = Val[File]

// Python task
val pythonTask = PythonTask(workDirectory / "hello.py") set (
    inputs += arg.mapped,
    outputs += arg,
    outputs += output mapped "output.txt",
)

// Workflow
DirectSampling(
    evaluation = pythonTask,
    sampling = arg in (0 to 10)
) hook (workDirectory / "result")

This example reuses several OpenMOLE notions. If you are not familiar with Environments, Groupings, Hooks or Samplings, check the relevant sections of the documentation.
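In the workflow above, DirectSampling evaluates pythonTask once per value of arg, and since the OpenMOLE script is Scala-based, 0 to 10 is an inclusive range: the task runs 11 times. The plain-Python sketch below mimics only that loop; hello_task is a hypothetical stand-in returning what hello.py would write to output.txt, and the sketch does not model the hook's actual output format.

```python
# Hypothetical stand-in for pythonTask: returns what hello.py writes to output.txt.
def hello_task(arg):
    return str(arg)


def direct_sampling(task, values):
    """Evaluate task once per input value, keeping (input, output) pairs."""
    return [(value, task(value)) for value in values]


# Scala's `0 to 10` includes both ends, i.e. Python's range(0, 11).
results = direct_sampling(hello_task, range(0, 11))

if __name__ == "__main__":
    print(len(results))  # prints 11
    print(results[-1])   # prints (10, '10')
```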

Using Python packages

One crucial advantage of the Python programming environment is its broad ecosystem of packages, widely used for instance in the machine learning community. You can use Python packages in your script by listing them in the libraries argument.
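Entries in libraries are pip distribution names, which do not always match the module names imported in the script: scikit-learn, for instance, is installed as scikit-learn but imported as sklearn. The sketch below, using only the standard importlib.util.find_spec, shows how a script could report missing modules early; REQUIRED_MODULES and missing_modules are illustrative names, not part of OpenMOLE.

```python
import importlib.util

# Illustrative mapping from pip distribution names to importable module names.
REQUIRED_MODULES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
}


def missing_modules(required):
    """Return the pip names whose module cannot be found by this interpreter."""
    return [dist for dist, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all modules available")
```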

Below is an example, available on the marketplace, that applies a very basic "machine learning" technique (logistic regression), using the scikit-learn Python package, to the outputs of a NetLogo model. It provides a sort of "meta-model" that predicts the outputs of the simulation model as a function of its parameters, without running it.

The syntax for the Python task is the following:

// Declare variables
val training = Val[File]
val validation = Val[File]
val errdensity = Val[Array[Double]]
val errresistance = Val[Array[Double]]
val score = Val[Double]

// Define task
val sklearnclassifier = PythonTask(
    workDirectory / "logisticregression.py",
    // pip package names: scikit-learn is installed as "scikit-learn" and imported as sklearn
    libraries = Seq("pandas", "numpy", "scikit-learn")
) set (
    inputs += training mapped "data/training.csv",
    inputs += validation mapped "data/validation.csv",
    outputs += errdensity mapped "errdensity",
    outputs += errresistance mapped "errresistance",
    outputs += score mapped "score"
)
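Each mapped output reads a Python variable back into an OpenMOLE variable of the declared type: here errdensity and errresistance are Val[Array[Double]] and score is Val[Double]. Below is a hedged sketch of a pre-flight check a script author might run on such values, assuming (as the comment in the Python script below says) that only basic types and nested lists are accepted. is_standard and matches_declared are hypothetical helpers, and OpenMOLE's actual conversion rules may differ.

```python
# Hypothetical pre-flight checks; OpenMOLE's actual conversion rules may differ.

def is_standard(value):
    """True for basic types and for (possibly nested) lists of basic types."""
    if isinstance(value, (bool, int, float, str)):
        return True
    if isinstance(value, list):
        return all(is_standard(item) for item in value)
    return False


def matches_declared(value, declared):
    """Rough check against the two OpenMOLE types declared in this example."""
    if declared == "Double":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if declared == "Array[Double]":
        return isinstance(value, list) and all(matches_declared(v, "Double") for v in value)
    raise ValueError("unsupported declared type: " + declared)


if __name__ == "__main__":
    # Made-up values shaped like the script's outputs.
    print(matches_declared([0.5, 0.6], "Array[Double]"))  # prints True
    print(matches_declared(0.75, "Double"))                # prints True
    print(is_standard({"score": 0.75}))                    # prints False
```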

with the following Python script:

from sklearn.linear_model import LogisticRegression
import pandas
import numpy

d = pandas.read_csv('data/training.csv')
dp = pandas.read_csv('data/validation.csv')

X = d[['density','resistance']]
y = d['binaryburnt']

Xp = dp[['density','resistance']]
yp = dp['binaryburnt']

clf = LogisticRegression(random_state=0, solver='lbfgs').fit(X, y)
pred = clf.predict(Xp)
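# keep the validation rows whose predicted class differs from the observed one
prederror = dp.loc[abs(pred - yp)==1]

# define outputs - must be "standard types", not objects (basic types and multidimensional lists);
# .tolist() and float() convert pandas/numpy values into plain Python lists and floats
errdensity = prederror['density'].tolist()
errresistance = prederror['resistance'].tolist()
score = float(clf.score(Xp, yp))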
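The line prederror = dp.loc[abs(pred - yp)==1] keeps the validation rows where the 0/1 prediction differs from the observed binaryburnt value, and clf.score returns the mean accuracy on the validation set. The same computation on plain Python lists may help check the logic without pandas or scikit-learn; this is a sketch with hypothetical function names and made-up rows.

```python
def misclassified(rows, predictions, label="binaryburnt"):
    """Rows whose 0/1 prediction differs from the observed label.

    For 0/1 labels, abs(pred - y) == 1 is the same test as pred != y.
    """
    return [row for row, pred in zip(rows, predictions) if abs(pred - row[label]) == 1]


def accuracy(rows, predictions, label="binaryburnt"):
    """Fraction of correct predictions, the quantity clf.score reports."""
    correct = sum(1 for row, pred in zip(rows, predictions) if pred == row[label])
    return correct / len(rows)


if __name__ == "__main__":
    # Made-up validation rows and predictions, for illustration only.
    validation = [
        {"density": 0.5, "resistance": 0.2, "binaryburnt": 1},
        {"density": 0.6, "resistance": 0.9, "binaryburnt": 1},
        {"density": 0.3, "resistance": 0.4, "binaryburnt": 0},
        {"density": 0.8, "resistance": 0.7, "binaryburnt": 0},
    ]
    predictions = [1, 0, 0, 1]
    errors = misclassified(validation, predictions)
    errdensity = [row["density"] for row in errors]
    errresistance = [row["resistance"] for row in errors]
    score = accuracy(validation, predictions)
    print(errdensity, errresistance, score)  # prints [0.6, 0.8] [0.9, 0.7] 0.5
```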

See the marketplace entry for the complete script and for how it is plugged into the NetLogo model.