2021-02-16

Loading data into pandas from scratch (personal memo).

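When you search for how to use pandas, most examples seem to start by preparing a sample CSV file and reading it in.
Those examples assume that the fields in the file are separated by commas or tabs.
In the numerical-computing community, however, numbers are usually separated by (one or more) spaces, and if you simply specify a single space as the separator when reading such a file into pandas, the result is full of NaN and not usable right away.
For everyday use, then, it is easier to read numerical output files with numpy.loadtxt.
On the other hand, filtering with a pandas DataFrame is extremely convenient and has many benefits. So what I want to do is build a DataFrame from lists prepared in some other way, and then tidy it up or analyze it.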
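To see the NaN problem concretely, here is a minimal sketch that reads an in-memory copy of the sample.dat rows shown below (io.StringIO is used only to keep the example self-contained). With sep=' ', every double space produces an empty field. A regex separator such as r'\s+' is one way to let pandas treat a run of whitespace as a single delimiter, although this post goes through numpy instead.

```python
import io

import pandas as pd

# In-memory stand-in for a whitespace-separated result file (two spaces between fields)
text = (
    "Energy  Intensity  Quantity\n"
    "0.0  0.0  0.0\n"
    "0.5  0.25  1.0\n"
    "1.0  0.75  3.0\n"
)

# A single space as separator: each double space yields an empty "Unnamed" column full of NaN
df_space = pd.read_csv(io.StringIO(text), sep=' ')
print(df_space.columns.tolist())
print(df_space.isna().sum().sum())

# A regex separator treats any run of whitespace as one delimiter
df_ws = pd.read_csv(io.StringIO(text), sep=r'\s+')
print(df_ws)
```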

For example, suppose we have a calculation output file (sample.dat) like the following.

Energy  Intensity  Quantity
0.0  0.0  0.0
0.5  0.25  1.0
1.0  0.75  3.0
...

To plot this with Energy on the horizontal axis and Intensity on the vertical axis, transpose the array read by numpy.loadtxt.

import numpy as np
from matplotlib import pyplot as plt

data_T = np.loadtxt( 'sample.dat', skiprows=1 ).T  # Transpose
plt.plot( data_T[0], data_T[1] )
plt.show()

After transposing, you can pick Energy or Intensity simply by choosing the row.

When feeding the data into a pandas.DataFrame, on the other hand, we want to keep the table structure, so we use numpy.loadtxt without transposing.

import pandas as pd

data = np.loadtxt( 'sample.dat', skiprows=1 )  # No Transpose (skip the header line)
df = pd.DataFrame( data, columns=('Energy', 'Intensity', 'Quantity') )
# df.values holds the same numbers as data

To change columns or index, you have to assign the whole list at once rather than individual elements.

df.columns = ('E', 'I', 'Q')

To modify the contents, go through element-access methods such as DataFrame.loc or DataFrame.at rather than through DataFrame.values.

df.loc[ :, 'I' ] *= 10

There are several ways to add data, but personally I find insert, which can put a column at any position, the easiest to understand.

def func( e_ ):
   ...

df.insert( len(df.columns), 'Q2', func( df.loc[ :, 'E' ].values ) )

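Here some function func that depends on the energy is defined, and its result is inserted as the last column, Q2.
Note, however, that insert modifies the original df in place (a destructive operation) and returns None, so it cannot always be used in a method chain. In that case it is safer to use another method.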
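One such non-destructive alternative is DataFrame.assign, which returns a new DataFrame and therefore fits in a method chain. A minimal sketch using the three rows of sample.dat; the body of func is a hypothetical stand-in for whatever function of the energy you actually need.

```python
import numpy as np
import pandas as pd

# The three sample.dat rows from above, with the short column names
df = pd.DataFrame(
    np.array([[0.0, 0.0, 0.0], [0.5, 0.25, 1.0], [1.0, 0.75, 3.0]]),
    columns=('E', 'I', 'Q'),
)

def func(e_):
    # hypothetical stand-in for "some function of the energy"
    return e_ ** 2

# assign returns a new DataFrame, so it can be chained; df itself is unchanged
df2 = (
    df.assign(Q2=lambda d: func(d['E']))
      .query('Q2 > 0')
)
print(df.columns.tolist())   # ['E', 'I', 'Q']
print(df2)
```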

Rows or columns you don't need can be dropped with DataFrame.drop.

df = df.drop( index =[...], columns=[...] )
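A concrete sketch of the same call, with illustrative labels filled in (row label 0 and column 'Q' are chosen only for the example). drop returns a new DataFrame, which is why the result is reassigned to df.

```python
import pandas as pd

df = pd.DataFrame(
    [[0.0, 0.0, 0.0], [0.5, 0.25, 1.0], [1.0, 0.75, 3.0]],
    columns=('E', 'I', 'Q'),
)

# Drop the row with index label 0 and the column 'Q' in one call
df = df.drop(index=[0], columns=['Q'])
print(df)
```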

The main attraction, filtering, can be done with DataFrame.query.

e0 = 10
max_intensity = 1
print( df.query('E > @e0 and I < @max_intensity') )
# Not like df.query('"E" > @e0 and "I" < @max_intensity')

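Note that the condition is passed as a string. Column names do not need to be quoted again inside that string, even though they are strings themselves. Variables are referenced with an @ prefix.
One caveat: query fails if a column name contains spaces or periods. Even without DataFrame.query, you can filter on multiple conditions as follows.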

print( df[ (df['E'] > e0 ) & (df['I'] < max_intensity) ] )
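A small check that both forms select the same rows, using the sample.dat values and thresholds chosen for illustration. The last lines show backtick quoting of a column name containing a space, which, as far as I know, query supports from pandas 0.25 on; the column name 'Energy value' is made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    [[0.0, 0.0, 0.0], [0.5, 0.25, 1.0], [1.0, 0.75, 3.0]],
    columns=('E', 'I', 'Q'),
)
e0 = 0.2
max_intensity = 0.5

# Same condition written with query and with a boolean mask
by_query = df.query('E > @e0 and I < @max_intensity')
by_mask = df[(df['E'] > e0) & (df['I'] < max_intensity)]
print(by_query.equals(by_mask))  # True

# A column name with a space can be quoted with backticks inside query
df2 = df.rename(columns={'E': 'Energy value'})
print(df2.query('`Energy value` > @e0'))
```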

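Finally, sorting.

df.sort_values( by='Q2', ascending=True )

"ascending=True" sorts in ascending order. sort_values returns a new DataFrame, so assign the result if you want to keep it.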

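References
Structure of pandas.DataFrame and how to create one | note.nkmk.me
Adding columns and rows to a pandas.DataFrame (assign, append, etc.) | note.nkmk.me
Deleting rows and columns of a pandas.DataFrame with drop | note.nkmk.me
Extracting rows of a pandas.DataFrame by condition with query | note.nkmk.me
Basic pandas operations frequently used in data analysis - Qiita
Reading csv/tsv files with pandas (read_csv, read_table) | note.nkmk.me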

2021-01-27

The difference between == and is in Python.

Programming

Take a look at the following code.

x = 0; y = 0

if (x, y) is (0, 0):
    print( True )
else:
    print( False )

if (x, y) == (0, 0):
    print( True )
else:
    print( False )

Both are meant to do roughly the same thing, but the "is" version prints "False" while the "==" version prints "True".

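I found an explanation on the following site:
Difference between == and is | Python-izm

"is" does not check whether the values are equal but whether the two operands are the same object, which is why the comparison was rejected.

It scared me to think that some of my old programs might contain this kind of bug...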
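A minimal sketch of the distinction. The `a is b` result below is what CPython gives here because the tuple built from x and y is a fresh object; identity checks are really meant for singletons such as None.

```python
x = 0; y = 0

a = (x, y)
b = (0, 0)
print(a == b)   # True: same values
print(a is b)   # False in CPython: two distinct tuple objects

# identity checks are intended for singletons such as None
value = None
print(value is None)  # True

# two equal lists are still different objects
p = [1, 2]
q = [1, 2]
print(p == q, p is q)  # True False
```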

2021-01-25

Editing a file by calling awk from Python.

Programming

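I want to run a program from Python while editing its input file with awk.

References:
An introduction to AWK, occasionally handy for text processing - Qiita
Overwriting the original file with awk's output | 俺的備忘録〜なんかいろいろ〜
Escaping curly braces in format strings - Qiita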

from subprocess import call

input_file = 'input_file'
n_line = 999
replace_string = '"{:}"'.format( 'string' )

cmd = "awk -i inplace '{{ if (NR == {:}) print {:} ; else print $0 }}' {:}".format( n_line, replace_string, input_file )
call( cmd, shell = True )

call( 'path to external program', shell = True )

With the values above, the command that is passed to the shell (and thus to awk) becomes

awk -i inplace '{ if (NR == 999) print "string" ; else print $0 }' input_file

Key points:

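The awk program itself must be enclosed in single quotes, so the Python string holding the command is written with double quotes.
"-i inplace" makes awk overwrite the file it reads (this is a GNU awk feature).
To write a literal curly brace in a Python format string, double it ({{ or }}).
NR is awk's line-number variable, updated as awk reads the file. In the command above, the line is replaced with replace_string when NR equals n_line; in other words, line n_line is changed.
To print a string in awk it must be enclosed in double quotes, so the Python variable is defined in the form '"string"', i.e. the "..." is part of the string itself.
"$0" is the entire content of the current line, so lines you are not interested in can simply be printed back unchanged.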
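To check the brace escaping and quoting without touching any file, the command string can be built and inspected on its own. build_awk_cmd is a hypothetical helper wrapping the same format string as above; actually running the result needs GNU awk 4.1 or later for -i inplace.

```python
def build_awk_cmd(n_line, text, input_file):
    """Hypothetical helper: build the gawk command that replaces line n_line with text."""
    replace_string = '"{:}"'.format(text)   # keep the double quotes awk needs
    # {{ and }} become literal { and } after str.format
    return "awk -i inplace '{{ if (NR == {:}) print {:} ; else print $0 }}' {:}".format(
        n_line, replace_string, input_file)

print(build_awk_cmd(999, 'string', 'input_file'))
# awk -i inplace '{ if (NR == 999) print "string" ; else print $0 }' input_file
```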

2021-01-15

Personal notes on graph theory (part 2)

Mathematics

Every cycle in a bipartite graph has even length.

Bounds on the number of edges

For a simple graph with $n$ vertices and $k$ connected components, the number of edges $m$ satisfies the following inequality.
$\displaystyle n-k \le m \le\frac{1}{2}(n-k)(n-k+1)$
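The lower bound corresponds to a graph with no cycles (every component a tree), and the upper bound to the case where one component is the complete graph $K_{n-k+1}$ and the others are isolated vertices.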
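For small $n$ the inequality can be checked exhaustively. This sketch enumerates every simple graph on $n$ labelled vertices, counts components with a small union-find, and tests $n-k \le m \le \frac{1}{2}(n-k)(n-k+1)$; it is a brute-force sanity check, not a proof.

```python
from itertools import combinations

def components(n, edges):
    """Number of connected components of a graph on vertices 0..n-1 (union-find)."""
    parent = list(range(n))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in range(n)})

def check_bounds(n):
    """Check n-k <= m <= (n-k)(n-k+1)/2 for every simple graph on n labelled vertices."""
    all_edges = list(combinations(range(n), 2))
    for mask in range(1 << len(all_edges)):
        edges = [e for i, e in enumerate(all_edges) if mask >> i & 1]
        m, k = len(edges), components(n, edges)
        if not (n - k <= m <= (n - k) * (n - k + 1) // 2):
            return False
    return True

print(check_bounds(5))  # True
```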

Condition for connectivity

A simple graph with $n$ vertices and $m \ge \frac{1}{2}(n-1)(n-2) + 1$ edges is connected.
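A brute-force check of this threshold for small $n$, plus the example showing it cannot be lowered: $K_{n-1}$ together with one isolated vertex has exactly $\frac{1}{2}(n-1)(n-2)$ edges and is disconnected. The helper names are illustrative only.

```python
from itertools import combinations

def is_connected(n, edges):
    """DFS from vertex 0; the graph is connected if every vertex is reached."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def check_threshold(n):
    """Every simple graph on n vertices with m >= (n-1)(n-2)/2 + 1 edges is connected."""
    threshold = (n - 1) * (n - 2) // 2 + 1
    all_edges = list(combinations(range(n), 2))
    for m in range(threshold, len(all_edges) + 1):
        for edges in combinations(all_edges, m):
            if not is_connected(n, edges):
                return False
    return True

# The bound is sharp: K_{n-1} plus one isolated vertex has (n-1)(n-2)/2 edges
n = 5
k4_plus_isolated = list(combinations(range(n - 1), 2))
print(check_threshold(n), is_connected(n, k4_plus_isolated))  # True False
```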

disconnecting set

A subset of the edge set whose removal leaves the graph disconnected (Japanese: 非連結化集合).

cutset

A disconnecting set none of whose proper subsets is itself a disconnecting set.

bridge

A cutset consisting of a single edge.

edge connectivity

The size of the smallest cutset (Japanese: 辺連結度).

separating set

A subset of the vertex set whose removal, together with the edges incident to those vertices, leaves the graph disconnected (Japanese: 分離集合).

cut vertex

The vertex of a separating set consisting of a single vertex.

girth

The length of the shortest cycle (Japanese: 内周).

distance

The length of the shortest path between two vertices (Japanese: 距離).

Turán's extremal theorem

A simple graph with $2k$ vertices and no triangles has at most $k^2$ edges.
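A brute-force sketch that finds the largest triangle-free simple graph on $2k$ vertices for $k = 1, 2, 3$. The maximum should equal $k^2$, which is attained by the complete bipartite graph $K_{k,k}$. Exhaustive search is only feasible for very small graphs.

```python
from itertools import combinations

def max_triangle_free_edges(n):
    """Brute force: largest edge count of a triangle-free simple graph on n vertices."""
    all_edges = list(combinations(range(n), 2))
    triangles = list(combinations(range(n), 3))
    best = 0
    for mask in range(1 << len(all_edges)):
        edge_set = {e for i, e in enumerate(all_edges) if mask >> i & 1}
        if len(edge_set) <= best:
            continue
        # skip graphs that contain all three edges of some triangle
        if any({(a, b), (a, c), (b, c)} <= edge_set for a, b, c in triangles):
            continue
        best = len(edge_set)
    return best

for k in (1, 2, 3):
    print(2 * k, max_triangle_free_edges(2 * k), k ** 2)
```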

Independence

An edge set that contains no cycle is called independent.

handshaking lemma

In any graph, the sum of the degrees of all vertices is even, because each edge contributes 2 to that sum (Japanese: 握手補題).
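A tiny check of the lemma on an arbitrary example graph (made up for illustration, including a repeated edge and a loop). Each edge adds 1 to the degree of both endpoints, so a loop adds 2 to its vertex and the total is always twice the number of edges.

```python
from collections import Counter

# Arbitrary example: a multiple edge (1, 2) and a loop (4, 4)
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 4), (1, 2)]

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1   # for a loop u == v, so that vertex gets +2

total = sum(degree.values())
print(total, 2 * len(edges))  # 12 12
```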

Lemma 6.1

If every vertex has degree at least 2, the graph contains a cycle.

Theorem 6.2

A connected graph is an Eulerian graph if and only if every vertex has even degree.

Corollary 6.3

A connected graph is an Eulerian graph if and only if its edge set can be split into disjoint cycles.

Corollary 6.4

A connected graph is a semi-Eulerian graph if and only if it has exactly two vertices of odd degree.
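Theorem 6.2 and Corollary 6.4 reduce the question to counting odd-degree vertices. A minimal sketch; euler_type is a hypothetical helper name, and it assumes the input graph is connected without checking it.

```python
from collections import Counter

def euler_type(edges):
    """Classify a connected graph by degree parity (connectivity is assumed, not checked)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    odd = [v for v, d in degree.items() if d % 2 == 1]
    if not odd:
        return 'Eulerian'
    if len(odd) == 2:
        return 'semi-Eulerian'
    return 'neither'

print(euler_type([(0, 1), (1, 2), (2, 0)]))   # Eulerian (triangle)
print(euler_type([(0, 1), (1, 2), (2, 3)]))   # semi-Eulerian (path P_4)
print(euler_type([(0, 1), (0, 2), (0, 3)]))   # neither (star K_{1,3})
```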

Fleury's algorithm

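Starting from any initial vertex, traverse the edges freely while following these rules:
1. Erase each edge after traversing it; if this leaves an isolated vertex, erase that vertex too.
2. At no stage cross a bridge unless there is no other edge to take.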
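The two rules translate directly into code. This is a straightforward sketch for a connected simple graph given as an undirected edge list, assuming every vertex has even degree (or exactly two odd vertices, one of them the start). An edge u-v counts as a bridge of the current graph if removing it reduces the number of vertices reachable from u. Isolated vertices are left in the dictionary with empty neighbour sets, which has the same effect as erasing them because they are never reachable again.

```python
from collections import defaultdict

def fleury(edges, start):
    """Return an Eulerian trail (list of vertices) starting at start, following Fleury's rules."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def reachable_count(s):
        # number of vertices reachable from s in the current graph (DFS)
        seen, stack = {s}, [s]
        while stack:
            for y in adj[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return len(seen)

    def is_bridge(u, v):
        # u-v is a bridge if removing it reduces the number of vertices reachable from u
        before = reachable_count(u)
        adj[u].discard(v); adj[v].discard(u)
        after = reachable_count(u)
        adj[u].add(v); adj[v].add(u)
        return after < before

    trail, u = [start], start
    while adj[u]:
        candidates = sorted(adj[u])  # sorted only to make the output deterministic
        # rule 2: take a bridge only if every remaining edge is a bridge
        v = next((w for w in candidates if not is_bridge(u, w)), candidates[0])
        # rule 1: erase the traversed edge
        adj[u].discard(v)
        adj[v].discard(u)
        trail.append(v)
        u = v
    return trail

# Two triangles sharing vertex 0 (every degree is even)
print(fleury([(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)], 0))  # [0, 1, 2, 0, 3, 4, 0]
```

Recomputing reachability for every candidate edge is slow on large graphs, but it keeps the code close to the rule as stated.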

Ore's theorem

In a simple graph with $n \ge 3$ vertices, if $\deg(v) + \deg(w) \ge n$ for every pair of non-adjacent vertices $v, w$, then the graph is Hamiltonian.

Dirac's theorem

In a simple graph with $n \ge 3$ vertices, if $\deg(v) \ge n/2$ for every vertex $v$, then the graph is Hamiltonian.
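A small sketch comparing the two conditions with a brute-force Hamiltonian-cycle search (only practical for a handful of vertices). The example graphs are chosen for illustration: $K_{3,3}$ satisfies both conditions, the star $K_{1,3}$ satisfies neither, and the cycle $C_5$ shows that the conditions are sufficient but not necessary.

```python
from itertools import combinations, permutations

def adjacency(n, edges):
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def has_hamiltonian_cycle(n, edges):
    """Brute force over vertex orders starting at 0 (fine only for small n)."""
    adj = adjacency(n, edges)
    for order in permutations(range(1, n)):
        cycle = (0,) + order + (0,)
        if all(b in adj[a] for a, b in zip(cycle, cycle[1:])):
            return True
    return False

def dirac(n, edges):
    adj = adjacency(n, edges)
    return n >= 3 and all(len(adj[v]) >= n / 2 for v in range(n))

def ore(n, edges):
    adj = adjacency(n, edges)
    return n >= 3 and all(len(adj[v]) + len(adj[w]) >= n
                          for v, w in combinations(range(n), 2) if w not in adj[v])

# K_{3,3}: parts {0,1,2} and {3,4,5}; every degree is 3 = n/2
k33 = [(a, b) for a in range(3) for b in range(3, 6)]
print(dirac(6, k33), ore(6, k33), has_hamiltonian_cycle(6, k33))     # True True True

# Star K_{1,3}: fails both conditions and is not Hamiltonian
star = [(0, 1), (0, 2), (0, 3)]
print(dirac(4, star), ore(4, star), has_hamiltonian_cycle(4, star))  # False False False

# C_5: Hamiltonian even though both conditions fail
c5 = [(i, (i + 1) % 5) for i in range(5)]
print(dirac(5, c5), ore(5, c5), has_hamiltonian_cycle(5, c5))        # False False True
```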

Reference: グラフ理論入門 (Introduction to Graph Theory)

2021-01-11

Personal notes on graph theory.

Mathematics

graph

A set of vertices together with a set of edges.

vertex

A point.

edge

An unordered pair of two vertices (Japanese: 辺).

adjacent

The relation between two vertices joined by an edge, or between two edges that share a vertex (Japanese: 隣接).

incident

The relation between an edge and a vertex that holds when the edge contains the vertex (Japanese: 接続). Do not confuse it with adjacent.

degree

The number of edges incident to a vertex.

isolated vertex

A vertex of degree 0.

end vertex

A vertex of degree 1.

loop

An edge joining a vertex to itself.

simple graph

A graph with no loops and no multiple edges.

directed graph (digraph)

A graph whose edges are ordered pairs rather than unordered pairs.

walk

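A way of getting from one vertex to another: a finite sequence of vertices in which every pair of consecutive terms must belong to the edge set. The number of edges in a walk is defined as its length (Japanese: 歩道).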

trail

A walk whose edges are all distinct (Japanese: 小道).

path

A walk in which no vertex appears more than once (Japanese: 道).

cycle

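A closed path, i.e. a walk that returns to its starting vertex without repeating any other vertex. I'm not sure whether a closed walk in general is also called a cycle.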
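The walk, trail, path, and cycle definitions above can be turned into a small classifier for a vertex sequence in a simple graph. classify is a hypothetical helper; following the note above, it only calls a closed sequence a cycle when no other vertex repeats, and it requires at least three edges, as a cycle in a simple graph must have.

```python
def classify(seq, edges):
    """Return the most specific of 'cycle', 'path', 'trail', 'walk', or None if not a walk."""
    edge_set = {frozenset(e) for e in edges}
    steps = [frozenset(p) for p in zip(seq, seq[1:])]
    if any(s not in edge_set for s in steps):
        return None                              # some consecutive pair is not an edge
    if len(set(seq)) == len(seq):
        return 'path'                            # no vertex repeated
    closed = seq[0] == seq[-1]
    if closed and len(set(seq[:-1])) == len(seq) - 1 and len(steps) >= 3:
        return 'cycle'                           # closed, and only the endpoint repeats
    if len(set(steps)) == len(steps):
        return 'trail'                           # no edge repeated
    return 'walk'

# A square 0-1-2-3-0 with the diagonal 0-2
square = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(classify([0, 1, 2, 3], square))        # path
print(classify([0, 1, 2, 3, 0], square))     # cycle
print(classify([0, 1, 2, 0, 3, 2], square))  # trail
print(classify([0, 1, 0, 2], square))        # walk
print(classify([1, 3], square))              # None
```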

Eulerian graph

A graph having a closed walk that passes through every edge exactly once and returns to its starting point.

Hamiltonian graph

A graph that has a cycle passing through every vertex (each exactly once, since a cycle is a closed path).
The vertex version of an Eulerian graph.

connected graph

A graph in which any two vertices can be joined by a walk.

disconnected graph

A graph that is not connected, i.e. some pair of vertices cannot be joined by a walk. It is a union of connected graphs.

subgraph

A graph whose vertex set and edge set are subsets of those of the original graph.

complement

The graph on the same vertex set in which two vertices are joined exactly when they were not adjacent in the original graph (Japanese: 補グラフ).

tree

A graph in which there is exactly one path between any two vertices.

planar graph

A graph that can be redrawn without any edge crossings.

regular graph

A simple graph in which every vertex has the same degree.

complete graph

A simple graph in which every pair of distinct vertices is adjacent. It is written $K_n$ and has $n(n-1)/2$ edges.
It can also be described as a regular graph of degree $n-1$.

cycle graph

A connected regular graph of degree 2. The cycle graph with $n$ vertices is written $C_n$.

path graph

The graph obtained by removing one edge from a cycle graph, written $P_n$.

wheel

The graph obtained from $C_{n-1}$ by adding one vertex and joining it by an edge to every other vertex, written $W_n$.

bipartite graph

If the vertex set of a graph can be split into two disjoint sets so that every edge joins a vertex of one set to a vertex of the other, the graph is called a bipartite graph.

complete bipartite graph

If, in addition, every vertex in one set is adjacent to every vertex in the other set, the graph is called a complete bipartite graph.

contraction

Removing an edge and merging the two vertices incident to it into a single vertex (Japanese: 縮約).

isomorphic

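Two graphs are isomorphic if there is a one-to-one correspondence between their vertices such that two vertices are adjacent in one graph exactly when the corresponding vertices are adjacent in the other (Japanese: 同形).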

self complementary

A graph whose complement is isomorphic to the graph itself (Japanese: 自己補対).

automorphism

A bijection from the vertex set to itself that preserves the edge structure, i.e. adjacency (Japanese: 自己同形写像).
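The last three definitions can be explored by brute force on a small graph. This sketch computes the complement of the path graph $P_4$, lists all isomorphisms between two graphs by trying every vertex permutation, and uses that to confirm that $P_4$ is self-complementary and has exactly two automorphisms (the identity and the reversal). The helper names are illustrative.

```python
from itertools import combinations, permutations

def complement(n, edges):
    """Same vertex set; join exactly the pairs that were not adjacent."""
    present = {frozenset(e) for e in edges}
    return [e for e in combinations(range(n), 2) if frozenset(e) not in present]

def isomorphisms(n, edges1, edges2):
    """All vertex bijections f with {u, v} in edges1 <=> {f(u), f(v)} in edges2 (brute force)."""
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    result = []
    for f in permutations(range(n)):
        mapped = {frozenset((f[u], f[v])) for u, v in map(tuple, e1)}
        if mapped == e2:
            result.append(f)
    return result

# Path graph P_4: 0 - 1 - 2 - 3
p4 = [(0, 1), (1, 2), (2, 3)]
comp = complement(4, p4)
print(comp)                                 # [(0, 2), (0, 3), (1, 3)]
print(len(isomorphisms(4, p4, comp)) > 0)   # True: P_4 is self-complementary
print(isomorphisms(4, p4, p4))              # [(0, 1, 2, 3), (3, 2, 1, 0)]: the automorphisms
```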

Reference: グラフ理論入門 (Introduction to Graph Theory)

2020-12-03

On Napier's constant.

Mathematics

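Using the property that $e^x$ is unchanged by differentiation, i.e. looking for the base $a$ of $f(x) = a^x$ with $f'(x) = f(x)$: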
$\displaystyle f(x) = a^x \\ \displaystyle \lim_{h \rightarrow 0} \frac{ f(x + h) - f(x)}{h} = f( x ) \\ \displaystyle \lim_{h \rightarrow 0} \frac{ a^h - 1 }{h} = 1 \\ \displaystyle \lim_{h \rightarrow 0} ( a^h - 1 ) = \lim_{h \rightarrow 0} h \\ \displaystyle \lim_{h \rightarrow 0} a^h = \lim_{h \rightarrow 0} ( 1 + h ) \\ \displaystyle a = \lim_{h \rightarrow 0} ( 1 + h )^{1/h} \equiv e$
Taking the right- and left-hand limits into account, we can also write:
$\displaystyle e = \lim_{h \rightarrow 0+} ( 1 + h )^{1/h} = \lim_{h \rightarrow 0-} ( 1 + h )^{1/h} \\ \displaystyle \qquad = \lim_{H \rightarrow \infty} ( 1 + 1/H )^{H} = \lim_{H \rightarrow \infty} ( 1 - 1/H )^{-H} = \lim_{H \rightarrow \infty} ( 1 + 1/(-H) )^{-H}$

Similarly, from $(e^{-x})' = -e^{-x}$, looking for the base $a$ with $f'(x) = -f(x)$:
$\displaystyle f(x) = a^x \\ \displaystyle \lim_{h \rightarrow 0} \frac{ f(x + h) - f(x)}{h} = - f( x ) \\ \displaystyle \lim_{h \rightarrow 0} \frac{ a^h - 1 }{h} = - 1 \\ \displaystyle \lim_{h \rightarrow 0} ( a^h - 1 ) = \lim_{h \rightarrow 0} ( - h ) \\ \displaystyle \lim_{h \rightarrow 0} a^h = \lim_{h \rightarrow 0} ( 1 - h ) \\ \displaystyle a = \lim_{h \rightarrow 0} ( 1 - h )^{1/h} = \lim_{h \rightarrow 0} ( 1 + (-h) )^{1/h} = \lim_{h' \rightarrow 0} ( 1 + h' )^{-1/h'} \equiv e^{-1}$

What, then, about $e^x$ itself?
$\displaystyle e^x = \lim_{h \rightarrow 0} ( 1 + h )^{x/h} = \lim_{H \rightarrow \infty} \left( 1 + \frac{|x|}{H} \right)^{{\rm sgn}(x) H} = \lim_{H \rightarrow \infty} \left( 1 + {\rm sgn}( x ) \frac{x}{H} \right)^{{\rm sgn}(x) H} \\ \displaystyle \qquad = \lim_{H \rightarrow \infty} \left( 1 + \frac{x}{{\rm sgn}( x ) H} \right)^{{\rm sgn}(x) H} = \lim_{H' \rightarrow \infty} \left( 1 + \frac{x}{H'} \right)^{H'} \quad ( H = |x|/h ) \\ \displaystyle e^{-x} = \lim_{h \rightarrow 0} ( 1 - h )^{x/h} = \lim_{H \rightarrow \infty} \left( 1 - \frac{x}{H} \right)^{H}$

Also, using the Maclaurin expansion:
$\displaystyle e^x = \sum_{n=0}^\infty \frac{ x^n }{n!} \\ \displaystyle e^{-x} = \sum_{n=0}^\infty \frac{ (-x)^n }{n!}$
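A quick numerical comparison of the two representations against math.exp. The choices of $n = 10^6$ and 30 series terms are arbitrary; the limit form converges slowly (the error shrinks roughly like $1/n$), while the series converges very quickly for moderate $|x|$.

```python
import math

def e_limit(x, n):
    """(1 + x/n)^n, which tends to e^x as n -> infinity."""
    return (1 + x / n) ** n

def e_series(x, terms=30):
    """Partial sum of the Maclaurin series: sum of x^k / k! for k = 0 .. terms-1."""
    return sum(x ** k / math.factorial(k) for k in range(terms))

for x in (1.0, -1.0, 2.0):
    print(x, math.exp(x), e_limit(x, 10**6), e_series(x))
```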

2020-11-24

Running an external program in a loop with Python (personal memo).

Programming

Use Python to run an external program repeatedly in a loop.

Since errors are expected to occur, I prepared the source of a Fortran program that deliberately raises an error.

! test.f90
program test
open( 10, file = 'test.inp', status = 'old' )
end program test

Compile it in the usual way.

gfortran test.f90 -o test.out

To simulate passing files around during the loop, create a dummy text file.

touch a.txt

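A Python script that runs test.out repeatedly in a simple loop.
To trigger an error, it runs without preparing test.inp, the file test.out tries to read.
The error message is redirected to test.err, and call only returns the exit status of test.out instead of raising an exception, so the script does not stop and the loop keeps going.
The script below assumes that each iteration's results are stored in a separate directory, so it includes directory and path operations.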

# test.py

import os
import shutil
import time
from subprocess import call

time_start = time.time()

# current working directory
cwd = os.getcwd()

# arbitrary loop
for i in range(3):

    # directory path which stores results
    path = './test{}'.format( i )

    # check path and make directory
    if os.path.exists( path ):
        print( '{} exists.'.format( path ) )
    else:
        os.makedirs( path )

    # copy required files from cwd
    shutil.copy( cwd + '/a.txt', path + '/a.txt' )

    # change directory
    os.chdir( path )
    
    # main operation
    cmd = '{}/test.out 2> test.err'.format( cwd )
    call( cmd, shell = True )

    # return to the original directory
    os.chdir( cwd )

time_end = time.time()
dt = time_end - time_start
print( 'time: {:.1f}'.format( dt ) )
print( 'time: {:d}h {:d}m'.format( int(dt/3600), int(dt%3600/60) ) )
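For comparison, here is a sketch of the same loop using subprocess.run with its cwd argument instead of os.chdir, and pathlib instead of string paths. This is a swapped-in variant, not the script above: run_in_dirs is a hypothetical helper, and the command is passed as a list, so no shell is involved. To keep the example self-contained, test.out is replaced by a Python one-liner that writes an error message to stderr and exits with status 1.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

def run_in_dirs(cmd, base, n_runs, files_to_copy=()):
    """Run cmd once in each base/test{i} with stderr sent to test.err; return the exit codes."""
    base = Path(base)
    codes = []
    for i in range(n_runs):
        path = base / 'test{}'.format(i)
        path.mkdir(parents=True, exist_ok=True)      # no error if it already exists
        for name in files_to_copy:
            shutil.copy(base / name, path / name)
        # cwd= runs the program inside path, so no os.chdir is needed
        with open(path / 'test.err', 'w') as err:
            result = subprocess.run(cmd, cwd=path, stderr=err)
        codes.append(result.returncode)              # a failure does not raise; it is recorded
    return codes

if __name__ == '__main__':
    # Stand-in for test.out: writes an error message to stderr and exits with status 1
    failing = [sys.executable, '-c', 'import sys; sys.exit("error: test.inp not found")']
    with tempfile.TemporaryDirectory() as base:
        (Path(base) / 'a.txt').touch()
        print(run_in_dirs(failing, base, 3, files_to_copy=['a.txt']))  # [1, 1, 1]
```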

nano_exit

I believe it is precisely the basics that need simple examples.
