Friday, December 1, 2017

Edward modeling of artificial data with random effects

Overview

Using Edward, I'll build a model with random effects.
There are several ways to do this. In this article, I'll follow the style that the Edward tutorials take.

Data

For the model with random effects, I'll use the artificial data from the article below.

Hierarchical Bayesian model's parameter Interpretation on Stan

A hierarchical Bayesian model usually has many parameters, so interpreting the statistics of the sampled points can look complex. In the article above, I fit a hierarchical Bayesian model to artificial data. Here, using almost the same but simpler data, I'll build a model and try to interpret it.

The following code generates the data.

import numpy as np
np.random.seed(49)

# three groups share the same x values
x_1 = list(range(1, 100))
x_2 = list(range(1, 100))
x_3 = list(range(1, 100))

# each group has its own slope, intercept, and noise level
y_1 = [np.random.normal(3.0 * i + 4, 4.0) for i in x_1]
y_2 = [np.random.normal(4.0 * i + 5, 4.0) for i in x_2]
y_3 = [np.random.normal(2.0 * i + 2, 20.0) for i in x_3]

# group IDs
x_1_sign = [1 for i in x_1]
x_2_sign = [2 for i in x_2]
x_3_sign = [3 for i in x_3]

X = x_1 + x_2 + x_3
Y = y_1 + y_2 + y_3
GID = x_1_sign + x_2_sign + x_3_sign
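
Each of X, Y, and GID now has 297 entries (three groups of 99 points). A quick sanity check:

print(len(X), len(Y), len(GID))  # 297 297 297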

To check the data's behavior, we can plot it.

import matplotlib.pyplot as plt

plt.plot(x_1, y_1)
plt.plot(x_2, y_2)
plt.plot(x_3, y_3)
plt.show()

[Figure: plot of y_1, y_2, y_3 against x]

As you can see, the data is composed of three groups with different slopes and intercepts. The three groups also have different variances.

Edward modeling


In this case, we already know how the data were generated, so an appropriate model is not difficult to construct. But here, as a small trial, I'll build a simple model that is easy to understand.
First, import the libraries and prepare the data.

import tensorflow as tf
import edward as ed
from edward.models import Normal

# data
X_data = np.array(X).reshape([len(X), 1])
g_n = len(set(GID))                        # number of groups
GID_data = np.array([x - 1 for x in GID])  # shift group IDs to 0-based indices

The code below sets up placeholders for the data and defines the fixed and random effects. Here, I assume the following model:

y ~ Normal(b_common * x + a_common + random_effect[GID], 1)
In this case, the random effect only acts on the intercept. Because we know how the data were generated, this is clearly not a good model for the data: the random effect should act on the slope as well (a sketch of that extension follows the inference code below). But here, I just show this case as a simple test.
b_common, a_common, and random_effect are the target parameters.
  • b_common : the slope, ignoring differences between GIDs
  • a_common : the intercept, ignoring differences between GIDs
  • random_effect : the per-GID intercept offset
These parameters follow normal distributions.

# placeholder
x_ph = tf.placeholder(tf.float32, [None, 1])
gid_ph = tf.placeholder(tf.int32, None)

# fixed effect
a_common = Normal(loc=tf.zeros(1), scale=tf.ones(1))
b_common = Normal(loc=tf.zeros(1), scale=tf.ones(1))

# random effect
random_effect = Normal(loc=tf.zeros([g_n]), scale=tf.ones([g_n]))

The following part constructs the model and executes the inference.
tf.gather() is the slightly confusing part. From the official page:
Gather slices from params axis axis according to indices.
For details, check the official page.
With this, the GID works as an index: by passing the GID data to the placeholder, each observation picks up the component of random_effect that corresponds to its GID, so the model effectively uses three different intercepts. If there are mistakes in my interpretation or code, please let me know.
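
As a minimal standalone illustration of what tf.gather() does (a toy sketch, separate from the model above):

import tensorflow as tf

params = tf.constant([10.0, 20.0, 30.0])  # one value per group
indices = tf.constant([0, 0, 1, 2, 2])    # group index for each observation
gathered = tf.gather(params, indices)     # picks params[i] for each index i

with tf.Session() as sess:
    print(sess.run(gathered))  # [10. 10. 20. 30. 30.]

Each observation simply looks up its group's value, which is exactly how the random intercepts are assigned below.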

# model
y_hat = ed.dot(x_ph, b_common) + a_common + tf.gather(random_effect, gid_ph)
output = Normal(loc=y_hat, scale=tf.ones(1))

# inference
# the scale must also wrap a tf.Variable so that it is trained
q_a_common = Normal(loc=tf.Variable(tf.random_normal([1])),
                    scale=tf.nn.softplus(tf.Variable(tf.random_normal([1]))))
q_b_common = Normal(loc=tf.Variable(tf.random_normal([1])),
                    scale=tf.nn.softplus(tf.Variable(tf.random_normal([1]))))
q_random_effect = Normal(loc=tf.Variable(tf.random_normal([g_n])),
                         scale=tf.nn.softplus(tf.Variable(tf.random_normal([g_n]))))

inference = ed.KLqp({a_common: q_a_common, b_common: q_b_common, random_effect: q_random_effect},
                   {x_ph: X_data, output: np.array(Y), gid_ph: GID_data})
inference.run(n_iter=1000)
1000/1000 [100%] ██████████████████████████████ Elapsed: 6s | Loss: 3963688.750
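
As noted above, a better model would give the slope a random effect too. Here is a minimal sketch of that extension, reusing g_n, x_ph, gid_ph, a_common, b_common, and random_effect from above (a variation of my own, not the model fit in this article):

# hypothetical extension: per-GID random slopes in addition to the common slope
random_slope = Normal(loc=tf.zeros([g_n]), scale=tf.ones([g_n]))
slope = b_common + tf.gather(random_slope, gid_ph)  # shape [None]
y_hat_2 = slope * tf.squeeze(x_ph, axis=[1]) + a_common + tf.gather(random_effect, gid_ph)
output_2 = Normal(loc=y_hat_2, scale=tf.ones(1))

To run inference on this variant, a corresponding q_random_slope would be added to the ed.KLqp call.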

Check the sampled points


In Edward, we can check the sampled points through plots and evaluation. Here, there are just three parameters we sampled. Let's plot all of them.

import matplotlib.pyplot as plt
plt.subplot(3,1,1)
plt.title("b")
plt.hist(q_b_common.sample(1000).eval())
plt.subplot(3,1,2)
plt.title("a")
plt.hist(q_a_common.sample(1000).eval())
plt.subplot(3,1,3)
plt.title("random_effect")
plt.hist(q_random_effect.sample(1000).eval())
plt.show()

[Figure: histograms of the sampled values for b, a, and random_effect]

Roughly, for the fixed effects, the slope is around 3.0 and the intercept around 2.0. The random effect has three components corresponding to the GIDs.
Of course, with a statistical function like mean(), we can get representative values from the sampled points.
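
For example, the means of the fitted variational distributions give point estimates (a small sketch using the q distributions defined above):

print(q_b_common.mean().eval())       # approximate posterior mean of the slope
print(q_a_common.mean().eval())       # approximate posterior mean of the intercept
print(q_random_effect.mean().eval())  # one value per GID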

Usually, good practice is to evaluate the model on held-out test data. But here, I'll just compute predicted points from the explanatory variables of the training data and the model.

n_samples = 100
pred_lst = []
for _ in range(n_samples):
    # draw one set of parameter values from the variational distributions
    b_samp = q_b_common.sample()
    a_samp = q_a_common.sample()
    random_samp = q_random_effect.sample()
    # predicted mean for every training point under these parameter values
    pred = ed.dot(tf.cast(X_data, tf.float32), b_samp) + a_samp + tf.gather(random_samp, GID_data)
    pred_lst.append(pred.eval())

# average the predictions over the parameter samples
added_mean = np.mean(pred_lst, axis=0)

With the code above, we get the predicted points. Keep in mind that the parameter values used here are themselves samples from the variational distributions; I simply averaged the resulting predictions.
Let's plot them.

plt.scatter(X_data, added_mean)
plt.show()



Visually, we can confirm that the predictions form three distinct lines.
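
As a side note, Edward also provides ed.evaluate() for quantitative checks. A minimal sketch for the model above, computing mean squared error on the training data (illustration only; with held-out test data this would be a proper evaluation):

# posterior predictive: the model with parameters replaced by their q distributions
output_post = ed.copy(output, {a_common: q_a_common,
                               b_common: q_b_common,
                               random_effect: q_random_effect})
mse = ed.evaluate('mean_squared_error',
                  data={x_ph: X_data, gid_ph: GID_data,
                        output_post: np.array(Y, dtype=np.float32)})
print(mse)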