Dongming Jin
Date: 07/14/2018
The following map summarizes my general WorkFlow for data analysis based on Python, which is a popular, community-driven and user-friendly language.
I hope the WorkFlow and Methodology below can also serve as a reference for other languages/projects, where Python is found useful as a prototyping/demonstration tool.
Some of the basic concepts include,
Use a mind map to conveniently annotate and organize the outline of a project.
Content type in MindNode
http://
Export the mind map to a Markdown document to extend the details on each topic.
Markdown Basics: the key formatting syntax. Markdown also accepts HTML markup most of the time.
Containerize: a container balances system isolation and performance, like a sandbox for a micro-service.
Basic concepts
docker pull: command to download the busybox image.
docker run: runs a container from an image, which we did using the busybox image that we downloaded. A list of running containers can be seen using the docker ps command.
1. Start from base image
Basic commands
docker pull image_name # pull public image/repository from a registry
docker build [--no-cache] -t image_name [-f renamed-dockerfile] path/to/build_context # build an image; the path is the directory containing the Dockerfile
docker run -it # interactive
--rm # rm container when exit
-d # run as detached
-p 8888:8888 # port fwd to host
-e DISPLAY=$DISPLAY # set environment variable
-u user # username/uid in image
-v path/to/local:path/to/container # mount directory
image_name
[command]
docker port container_id # show the open ports of a container instance
docker start/attach/stop/rm container_id # manage a container instance
docker rmi image_id # remove an image
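As a rough illustration of how the docker run options above combine into a single command, here is a minimal Python sketch that assembles the argument list. The helper name `docker_run_args` and its parameters are assumptions made for this example; the function only builds the list and does not call Docker itself.

```python
import shlex


def docker_run_args(image, command=None, interactive=True, remove=True,
                    ports=None, env=None, volumes=None, user=None):
    """Assemble a `docker run` argument list (hypothetical helper).

    Only the options listed in the slide are covered; nothing is executed.
    """
    args = ["docker", "run"]
    if interactive:
        args.append("-it")   # interactive terminal
    if remove:
        args.append("--rm")  # remove the container on exit
    for host_port, container_port in (ports or {}).items():
        args += ["-p", f"{host_port}:{container_port}"]   # port forwarding
    for name, value in (env or {}).items():
        args += ["-e", f"{name}={value}"]                 # environment variable
    for local_path, container_path in (volumes or {}).items():
        args += ["-v", f"{local_path}:{container_path}"]  # mount directory
    if user:
        args += ["-u", user]                              # username/uid in image
    args.append(image)
    if command:
        args += shlex.split(command)
    return args
```

For example, `docker_run_args("busybox", command="echo hello", ports={8888: 8888})` yields a list that can be handed to `subprocess.run` on a machine where Docker is installed.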
2. Record needed ingredients in requirement.txt
While developing, record the additional Python packages in a text file named requirement.txt, which will be useful to construct the Dockerfile to automatically configure the development environment, as well as to host an interactive Jupyter notebook with mybinder.
pip freeze > requirement.txt: list installed packages
3. Compose building recipe to Dockerfile
Some resources
Steps
RUN sudo pip install -r requirement.txt
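To make the recipe concrete, the sketch below renders a minimal Dockerfile as text from Python. The base image `python:3.6`, the `/app` working directory and the helper name `render_dockerfile` are assumptions chosen for illustration, not the author's actual recipe. `sudo` is left out here because the official Python images run as root by default.

```python
def render_dockerfile(base_image="python:3.6", requirement_file="requirement.txt"):
    """Return the text of a minimal Dockerfile (illustrative sketch only)."""
    lines = [
        f"FROM {base_image}",                      # 1. start from a base image
        "WORKDIR /app",                            # assumed working directory
        f"COPY {requirement_file} .",              # 2. bring in the ingredient list
        f"RUN pip install -r {requirement_file}",  # 3. install the ingredients
        "COPY . .",                                # copy the project itself
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file named Dockerfile and running docker build on that directory would then produce the image.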
Some principles
A few suggestions
Code Styling: PEP8
_ for built-in variables
func(a=1, b=3) or c = a/b
>2 space # 1 space: inline comments, docstring/
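The styling hints above can be seen together in a short sketch. The function `scale` and its arguments are made up for this example; the points it illustrates (trailing underscore, keyword-argument spacing, inline comment spacing, docstring) are standard PEP8 recommendations.

```python
def scale(values, factor=2, type_=float):
    """Return each value multiplied by ``factor`` and converted with ``type_``.

    The trailing underscore in ``type_`` avoids shadowing the built-in ``type``.
    """
    scaled = [type_(v * factor) for v in values]  # two spaces before '#', one after
    return scaled


result = scale([1, 2, 3], factor=3)  # no spaces around '=' in keyword arguments
ratio = result[0] / result[1]        # spaces around '=' in assignments
```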
Practical steps
1. Dev with Jupyter, note issues: an interactive notebook is very handy during development
2. Aggregate to python script: modularize code into functions
More to read and adopt
3. Checkpoint scripts with git: git log the progress
Github git cheat sheet: some basic operations
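As a sketch of what step 2 can look like, the loose cells of a notebook are gathered into small functions inside a script. The function names and the toy data format below are assumptions for illustration only.

```python
import statistics


def load_values(text):
    """Parse whitespace-separated numbers (stand-in for a data-loading cell)."""
    return [float(token) for token in text.split()]


def summarize(values):
    """Compute a few summary statistics (stand-in for an analysis cell)."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # needs at least two values
    }


def run(text):
    """Chain the steps so the whole analysis is a single call."""
    return summarize(load_values(text))
```

Once the code lives in functions like these, each git checkpoint in step 3 records a meaningful unit of change, and the same functions can later be imported by tests and notebooks.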
When the code is ready to share,
python setup.py sdist
1. Modularize function: if not done earlier
2. Unittest: remember the issues we noted down during development? These are good cases to write tests for. A more proactive approach is test-driven development.
├── __init__.py
├── code.py
├── func_a.py
├── func_b.py
├── func_c.py
└── tests
    ├── __init__.py
    ├── test_funcs.py
    └── test_something.py
testing framework
pytest pytest-cov
tests/
py.test
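A file such as tests/test_funcs.py could look like the sketch below. Here `add` is a stand-in for a function that would normally be imported from func_a.py; the function and the test names are hypothetical. pytest collects functions whose names start with `test_` and reports any failing `assert`.

```python
def add(a, b):
    """Stand-in for a project function; normally `from func_a import add`."""
    return a + b


def test_add_integers():
    assert add(1, 2) == 3


def test_add_is_commutative():
    assert add(2, 5) == add(5, 2)


def test_add_rejects_mixed_types():
    # An issue noted during development becomes a regression test.
    try:
        add(1, "2")
    except TypeError:
        pass
    else:
        raise AssertionError("expected TypeError for int + str")
```

Running py.test from the project root discovers and runs these tests.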
3. Continuous Integration: use continuous integration to run the tests automatically whenever something changes in the repository.
.travis.yml: file to configure the build
4. Profiling & Optimization
Premature optimization is the root of all evil. – Donald Knuth
Tips from Cameron Hummels
Parallel computing
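In the spirit of the Knuth quote, it helps to measure before optimizing. The sketch below uses the standard-library `cProfile` and `pstats` modules to profile a single call; `slow_sum_of_squares` and `profile_call` are made-up names for illustration.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop used as the profiling target."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # five most expensive entries
    return result, stream.getvalue()
```

Printing the report returned by `profile_call(slow_sum_of_squares, 100000)` shows where the time goes, which tells us whether optimization or parallel computing is worth the effort.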
5. Documenting: essential for future revisit or further development.
versioning: x.y.z (E.g., 0.2.3, 2.7.12, 3.6)
• change x for breaking changes
• change y for non-breaking changes
• change z for bug-fixes
docstring and comments tips
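The x.y.z rule above can be sketched as a small function. The name `bump_version` and the change labels are hypothetical; resetting the lower fields to zero after a bump follows the common semantic-versioning convention and is an assumption, since the slide does not state it.

```python
def bump_version(version, change):
    """Return the next x.y.z version string.

    change is one of 'breaking' (x), 'feature' (y) or 'bugfix' (z).
    """
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad short versions such as "3.6" to x.y.z
    x, y, z = parts[:3]
    if change == "breaking":
        return f"{x + 1}.0.0"
    if change == "feature":
        return f"{x}.{y + 1}.0"
    if change == "bugfix":
        return f"{x}.{y}.{z + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

The docstring doubles as an example of the documentation habit: one summary line, then the meaning of each argument.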
Following my WorkFlow, most of the work has been done at this stage. The rest can be carried out with minimal effort and a decent finish.
Example finish: use ? for keyboard shortcuts to control the slides.
1. MindNode -> Markdown: re-arrange and convert the outline mind map to markdown.
Warning: Jupyter notebook should be re-organized for presentation, especially for a dissertation defense! The order of work is not necessarily the order of the talk! Check my LSST talk for some tips.
2. Markdown -> HTML: extend the details in markdown and convert to html.
Pandoc: powerful tool for conversion.
brew install pandoc
% for frontpage info
pandoc -s --mathjax -i -t revealjs --slide-level=2 WorkFlow.md -o WorkFlow.html
3. Slideshow: HTML + reveal.js
Add -V revealjs-url=http://lab.hakim.se/reveal-js when using pandoc to convert, or download reveal.js to the same directory as the converted file.
Now it's ready: open the html file to start the slideshow, and use ? for keyboard shortcuts to control the slides.
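The pandoc conversion for steps 2 and 3 can be wrapped in a small helper so the options do not have to be retyped. This is a sketch only: `pandoc_revealjs_command` is a hypothetical name, and it just builds the argument list from the options shown in the slides.

```python
def pandoc_revealjs_command(source, output, slide_level=2, revealjs_url=None):
    """Build the pandoc argument list for a reveal.js slideshow (sketch)."""
    args = ["pandoc", "-s", "--mathjax", "-i", "-t", "revealjs",
            f"--slide-level={slide_level}"]
    if revealjs_url:
        # Point the slides at a hosted copy of reveal.js instead of a local one.
        args += ["-V", f"revealjs-url={revealjs_url}"]
    args += [source, "-o", output]
    return args
```

With pandoc installed, passing the result of `pandoc_revealjs_command("WorkFlow.md", "WorkFlow.html")` to `subprocess.run(..., check=True)` performs the conversion.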
To take the research/project to a workshop, we need to recall what we’ve done.
1. Config env.: Dockerfile
With everything done, it is now easy to put all the ingredients and recipe together into a Dockerfile.
2. Demo: Markdown + Scripts -> Jupyter notebook
Following the mind map and the resulting markdown file, we can put the outline structure into a Jupyter notebook, since it natively supports markdown-formatted cells. We can fill in the function calls and visualization code in between.
My secret on toggling code cells
nbextension codefolding: $ jupyter nbextension enable codefolding/main
nbconvert with --template template.tpl
Another wheel to edit slide styles on the fly
3. Slideshow: nbconvert + reveal.js
Wrap up commands to convert notebook into slideshow
When it is ready to take the project to the public, there are a few wheels very handy to make it more appealing.
Live slideshow: add a prefix and suffix to the url of the html file in the repository to render the slideshow live; this does not always work.
reveal html file on github:
go to http://htmlpreview.github.io/? + git_html_url + ?print-pdf
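The URL recipe above is plain string concatenation, as the sketch below shows. The helper name `live_slideshow_url` and the repository address in the usage note are hypothetical.

```python
def live_slideshow_url(git_html_url, print_pdf=False):
    """Compose the htmlpreview address for an html file hosted on GitHub."""
    url = "http://htmlpreview.github.io/?" + git_html_url
    if print_pdf:
        url += "?print-pdf"  # reveal.js switch for its printable layout
    return url
```

For a file at https://github.com/user/repo/blob/master/WorkFlow.html (a made-up address), `live_slideshow_url(...)` returns the link to share.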
Live notebook demo: binder
Everything is ready, just paste the repository link to mybinder.org
Project webpage: github.io + HTML
github.io: use any of the converted html files to set it up in 3 steps
username.github.io
https://username.github.io
Documentation: Read the Docs
This step depends on how often and how well the project is documented. If the earlier guide is followed, there is no pain at all.
sphinx-quickstart -a "Name" -p Repo -v 0.1 --ext-autodoc -q
Building test: Travis CI
Follow the manual/documentation!
The End!
email: contact@domij.info