It happens often enough that, after cutting out irrelevant details, we end up with a simple model of a complex system, yet the model still doesn’t let us obtain predictions in a straightforward way. Many such models require only simple operations to describe the system, but the number of required operations is astronomically large, and no amount of computing power can tackle the straightforward routine. To circumvent this difficulty, a method of random sampling, the so-called Monte Carlo approach, was invented; since then many modifications have entered the scene.
Since we cannot account for all the details of a complex system, we use simple models that try to incorporate the properties most relevant to our particular application. The correct choice of model is crucial for making predictions about the behavior of complex systems. Consider, for example, a satellite orbiting the Earth: we ignore the influence of Pluto’s gravity and treat the satellite as a point mass, among many other assumptions. To verify that a model is adequate, one needs to compare its predictions with real outcomes. Some models are particularly convenient because they yield predictions directly (simple differential equations, etc.), but often that is not the case. To validate a model, or to use it to describe a real system, we often need to run computer simulations. In some cases a model is inherently stochastic but still exhibits regularity at larger space or time scales, and that regularity may be exactly what matters in a particular situation; financial markets and proteins under physiological conditions are two examples. In both cases the models are often studied by means of Monte Carlo simulations.
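To make the idea of Monte Carlo sampling concrete, here is a minimal sketch of the classic textbook example: estimating π by drawing random points in the unit square. It is written in Python for brevity (the programs in this series will be in Ada 2012); the function name, sample count, and seed are illustrative choices, not anything from the posts themselves.

```python
import random

def estimate_pi(n_samples: int, seed: int = 42) -> float:
    """Estimate pi by sampling points uniformly in the unit square
    and counting the fraction that lands inside the quarter circle."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter unit circle has area pi/4, so the hit fraction
    # times 4 approximates pi.
    return 4.0 * inside / n_samples

print(estimate_pi(100_000))  # close to 3.14 for large sample counts
```

Each individual sample tells us almost nothing, yet the aggregate converges on the answer; this is precisely the kind of regularity emerging from randomness that the paragraph above describes.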
Those properties/rules/laws are hard to derive mathematically but much easier to establish numerically. Numerical results may later be formulated as conjectures to be proven with mathematical rigor. Whether we state a conjecture or apply a numerically obtained law, it is important to get it right: the program doing the calculation must be correct, since a small mistake can have a cumulative effect and make the final result differ significantly from the correct one. Testing may not help when one works with datasets that can hardly be checked by eye and the expected result is not known beforehand. In a series of posts, I’m going to share different small programs written in Ada 2012 and proven with SPARK 2014. All the programs will answer questions about a statistical law or a probability, and each will illustrate different provability issues that arise in such cases.
If you haven’t read Significance, you have probably missed a lot of interesting material about statistics. Its articles are intended for anyone with a curiosity about modern statistics, without assuming any serious prior knowledge. As far as I know, it is a joint journal of the Royal Statistical Society and the American Statistical Association; thus, I believe, its material represents the most current topics as perceived by members of those societies. The October issue (Volume 11, Issue 4) presents an article on data anonymisation titled ‘Data and privacy: Now you see me; New model for data sharing; Modern governance and statisticians’ by Sam Smith. I’m going to briefly discuss it and share some related ideas.